gvisor

Commit Graph

Author	SHA1	Message	Date
Ian Gudger	45566fa4e4	Add finalizer on AtomicRefCount to check for leaks. PiperOrigin-RevId: 255711454	2019-06-28 20:07:52 -07:00
Adin Scannell	7dae043fec	Drop ashmem and binder. These are unfortunately unused and unmaintained. They can be brought back in the future if need requires it. PiperOrigin-RevId: 255697132	2019-06-28 17:20:25 -07:00
Ayush Ranjan	c4da599e22	ext4: disklayout: SuperBlock interface implementations. PiperOrigin-RevId: 255687771	2019-06-28 16:18:29 -07:00
Nicolas Lacasse	295078fa7a	Automated rollback of changelist 255263686 PiperOrigin-RevId: 255679453	2019-06-28 15:28:41 -07:00
Ayush Ranjan	7c13789818	Superblock interface in the disk layout package for ext4. PiperOrigin-RevId: 255644277	2019-06-28 12:07:28 -07:00
Yong He	c61d7761b4	Fix deadloop in proc subtask list Readdir of /proc/x/task/ will get direntry entries from tasks of specified taskgroup. Now the tasks slice is unsorted, use sort.SearchInts search entry from the slice may cause infinity loops. The fix is sort the slice before search. This issue could be easily reproduced via following steps, revise Readdir in pkg/sentry/fs/proc/task.go, force set taskInts into test slice []int{1, 11, 7, 5, 10, 6, 8, 3, 9, 2, 4}, then run docker image and run ls /proc/1/task, the command will cause infinity loops.	2019-06-28 22:20:57 +08:00
Fabricio Voznika	b2907595e5	Complete pipe support on overlayfs Get/Set pipe size and ioctl support were missing from overlayfs. It required moving the pipe.Sizer interface to fs so that overlay could get access. Fixes #318 PiperOrigin-RevId: 255511125	2019-06-27 17:22:53 -07:00
Michael Pratt	5b41ba5d0e	Fix various spelling issues in the documentation Addresses obvious typos, in the documentation only. COPYBARA_INTEGRATE_REVIEW=https://github.com/google/gvisor/pull/443 from Pixep:fix/documentation-spelling 4d0688164eafaf0b3010e5f4824b35d1e7176d65 PiperOrigin-RevId: 255477779	2019-06-27 14:25:50 -07:00
Michael Pratt	085a907565	Cache directory entries in the overlay Currently, the overlay dirCache is only used for a single logical use of getdents. i.e., it is discard when the FD is closed or seeked back to the beginning. But the initial work of getting the directory contents can be quite expensive (particularly sorting large directories), so we should keep it as long as possible. This is very similar to the readdirCache in fs/gofer. Since the upper filesystem does not have to allow caching readdir entries, the new CacheReaddir MountSourceOperations method controls this behavior. This caching should be trivially movable to all Inodes if desired, though that adds an additional copy step for non-overlay Inodes. (Overlay Inodes already do the extra copy). PiperOrigin-RevId: 255477592	2019-06-27 14:24:03 -07:00
Fabricio Voznika	42e212f6b7	Preserve permissions when checking lower The code was wrongly assuming that only read access was required from the lower overlay when checking for permissions. This allowed non-writable files to be writable in the overlay. Fixes #316 PiperOrigin-RevId: 255263686	2019-06-26 14:24:44 -07:00
Michael Pratt	e98ce4a2c6	Add TODO reminder to remove tmpfs caching options Updates #179 PiperOrigin-RevId: 255081565	2019-06-25 17:12:34 -07:00
Andrei Vagin	e9ea7230f7	fs: synchronize concurrent writes into files with O_APPEND For files with O_APPEND, a file write operation gets a file size and uses it as offset to call an inode write operation. This means that all other operations which can change a file size should be blocked while the write operation doesn't complete. PiperOrigin-RevId: 254873771	2019-06-24 17:45:02 -07:00
Rahat Mahmood	94a6bfab5d	Implement /proc/net/tcp. PiperOrigin-RevId: 254854346	2019-06-24 15:56:36 -07:00
chris.zn	f957fb23cf	Return ENOENT when reading /proc/{pid}/task of an exited process There will be a deadloop when we use getdents to read /proc/{pid}/task of an exited process Like this: Process A is running Process B: open /proc/{pid of A}/task Process A exits Process B: getdents /proc/{pid of A}/task Then, process B will fall into deadloop, and return "." and ".." in loops and never ends. This patch returns ENOENT when use getdents to read /proc/{pid}/task if the process is just exited. Signed-off-by: chris.zn <chris.zn@antfin.com>	2019-06-24 15:49:53 +08:00
Nicolas Lacasse	35719d52c7	Implement statx. We don't have the plumbing for btime yet, so that field is left off. The returned mask indicates that btime is absent. Fixes #343 PiperOrigin-RevId: 254575752	2019-06-22 13:29:26 -07:00
Andrei Vagin	ab6774cebf	gvisor/fs: getdents returns 0 if offset is equal to FileMaxOffset FileMaxOffset is a special case when lseek(d, 0, SEEK_END) has been called. PiperOrigin-RevId: 254498777	2019-06-21 17:25:17 -07:00
Ayush Ranjan	727375321f	ext4 block group descriptor implementation in disk layout package. PiperOrigin-RevId: 254482180	2019-06-21 15:42:46 -07:00
Michael Pratt	292f70cbf7	Add package docs to seqfile and ramfs These are the only packages missing docs: https://godoc.org/gvisor.dev/gvisor PiperOrigin-RevId: 254261022	2019-06-20 13:34:33 -07:00
Nicolas Lacasse	f7428af9c1	Add MountNamespace to task. This allows tasks to have distinct mount namespace, instead of all sharing the kernel's root mount namespace. Currently, the only way for a task to get a different mount namespace than the kernel's root is by explicitly setting a different MountNamespace in CreateProcessArgs, and nothing does this (yet). In a follow-up CL, we will set CreateProcessArgs.MountNamespace when creating a new container inside runsc. Note that "MountNamespace" is a poor term for this thing. It's more like a distinct VFS tree. When we get around to adding real mount namespaces, this will need a better naem. PiperOrigin-RevId: 254009310	2019-06-19 09:21:21 -07:00
Fabricio Voznika	ca245a428b	Attempt to fix TestPipeWritesAccumulate Test fails because it's reading 4KB instead of the expected 64KB. Changed the test to read pipe buffer size instead of hardcode and added some logging in case the reason for failure was not pipe buffer size. PiperOrigin-RevId: 253916040	2019-06-18 19:16:11 -07:00
Andrei Vagin	8ab0848c70	gvisor/fs: don't update file.offset for sockets, pipes, etc sockets, pipes and other non-seekable file descriptors don't use file.offset, so we don't need to update it. With this change, we will be able to call file operations without locking the file.mu mutex. This is already used for pipes in the splice system call. PiperOrigin-RevId: 253746644	2019-06-18 01:43:29 -07:00
Ian Gudger	3e9b8ecbfe	Plumb context through more layers of filesytem. All functions which allocate objects containing AtomicRefCounts will soon need a context. PiperOrigin-RevId: 253147709	2019-06-13 18:40:38 -07:00
Adin Scannell	add40fd6ad	Update canonical repository. This can be merged after: https://github.com/google/gvisor-website/pull/77 or https://github.com/google/gvisor-website/pull/78 PiperOrigin-RevId: 253132620	2019-06-13 16:50:15 -07:00
Fabricio Voznika	fc746efa9a	Add support to mount pod shared tmpfs mounts Parse annotations containing 'gvisor.dev/spec/mount' that gives hints about how mounts are shared between containers inside a pod. This information can be used to better inform how to mount these volumes inside gVisor. For example, a volume that is shared between containers inside a pod can be bind mounted inside the sandbox, instead of being two independent mounts. For now, this information is used to allow the same tmpfs mounts to be shared between containers which wasn't possible before. PiperOrigin-RevId: 252704037	2019-06-11 14:54:31 -07:00
Rahat Mahmood	a00157cc0e	Store more information in the kernel socket table. Store enough information in the kernel socket table to distinguish between different types of sockets. Previously we were only storing the socket family, but this isn't enough to classify sockets. For example, TCPv4 and UDPv4 sockets are both AF_INET, and ICMP sockets are SOCK_DGRAM sockets with a particular protocol. Instead of creating more sub-tables, flatten the socket table and provide a filtering mechanism based on the socket entry. Also generate and store a socket entry index ("sl" in linux) which allows us to output entries in a stable order from procfs. PiperOrigin-RevId: 252495895	2019-06-10 15:17:43 -07:00
Rahat Mahmood	315cf9a523	Use common definition of SockType. SockType isn't specific to unix domain sockets, and the current definition basically mirrors the linux ABI's definition. PiperOrigin-RevId: 251956740	2019-06-06 17:00:27 -07:00
Fabricio Voznika	02ab1f187c	Copy up parent when binding UDS on overlayfs Overlayfs was expecting the parent to exist when bind(2) was called, which may not be the case. The fix is to copy the parent directory to the upper layer before binding the UDS. There is not good place to add tests for it. Syscall tests would be ideal, but it's hard to guarantee that the directory where the socket is created hasn't been touched before (and thus copied the parent to the upper layer). Added it to runsc integration tests for now. If it turns out we have lots of these kind of tests, we can consider moving them somewhere more appropriate. PiperOrigin-RevId: 251954156	2019-06-06 16:45:51 -07:00
Rahat Mahmood	2d2831e354	Track and export socket state. This is necessary for implementing network diagnostic interfaces like /proc/net/{tcp,udp,unix} and sock_diag(7). For pass-through endpoints such as hostinet, we obtain the socket state from the backend. For netstack, we add explicit tracking of TCP states. PiperOrigin-RevId: 251934850	2019-06-06 15:04:47 -07:00
Michael Pratt	57772db2e7	Shutdown host sockets on internal shutdown This is required to make the shutdown visible to peers outside the sandbox. The readClosed / writeClosed fields were dropped, as they were preventing a shutdown socket from reading the remainder of queued bytes. The host syscalls will return the appropriate errors for shutdown. The control message tests have been split out of socket_unix.cc to make the (few) remaining tests accessible to testing inherited host UDS, which don't support sending control messages. Updates #273 PiperOrigin-RevId: 251763060	2019-06-05 18:40:37 -07:00
Michael Pratt	d3ed9baac0	Implement dumpability tracking and checks We don't actually support core dumps, but some applications want to get/set dumpability, which still has an effect in procfs. Lack of support for set-uid binaries or fs creds simplifies things a bit. As-is, processes started via CreateProcess (i.e., init and sentryctl exec) have normal dumpability. I'm a bit torn on whether sentryctl exec tasks should be dumpable, but at least since they have no parent normal UID/GID checks should protect them. PiperOrigin-RevId: 251712714	2019-06-05 14:00:13 -07:00
Yong He	7398f013f0	Drop one dirent reference after referenced by file When pipe is created, a dirent of pipe will be created and its initial reference is set as 0. Cause all dirent will only be destroyed when the reference decreased to -1, so there is already a 'initial reference' of dirent after it created. For destroying dirent after all reference released, the correct way is to drop the 'initial reference' once someone hold a reference to the dirent, such as fs.NewFile, otherwise the reference of dirent will stay 0 all the time, and will cause memory leak of dirent. Except pipe, timerfd/eventfd/epoll has the same problem Here is a simple case to create memory leak of dirent for pipe/timerfd/eventfd/epoll in C langange, after run the case, pprof the runsc process, you will find lots dirents of pipe/timerfd/eventfd/epoll not freed: int main(int argc, char *argv[]) { int i; int n; int pipefd[2]; if (argc != 3) { printf("Usage: %s epoll\|timerfd\|eventfd\|pipe <iterations>\n", argv[0]); } n = strtol(argv[2], NULL, 10); if (strcmp(argv[1], "epoll") == 0) { for (i = 0; i < n; ++i) close(epoll_create(1)); } else if (strcmp(argv[1], "timerfd") == 0) { for (i = 0; i < n; ++i) close(timerfd_create(CLOCK_REALTIME, 0)); } else if (strcmp(argv[1], "eventfd") == 0) { for (i = 0; i < n; ++i) close(eventfd(0, 0)); } else if (strcmp(argv[1], "pipe") == 0) { for (i = 0; i < n; ++i) if (pipe(pipefd) == 0) { close(pipefd[0]); close(pipefd[1]); } } printf("%s %s test finished\r\n",argv[1],argv[2]); return 0; } Change-Id: Ia1b8a1fb9142edb00c040e44ec644d007f81f5d2 PiperOrigin-RevId: 251531096	2019-06-04 15:40:23 -07:00
Andrei Vagin	90a116890f	gvisor/sock/unix: pass creds when a message is sent between unconnected sockets and don't report a sender address if it doesn't have one PiperOrigin-RevId: 251371284	2019-06-03 21:48:19 -07:00
Andrei Vagin	00f8663887	gvisor/fs: return a proper error from FileWriter.Write in case of a short-write The io.Writer contract requires that Write writes all available bytes and does not return short writes. This causes errors with io.Copy, since our own Write interface does not have this same contract. PiperOrigin-RevId: 251368730	2019-06-03 21:26:01 -07:00
Nicolas Lacasse	6f73d79c32	Simplify overlayBoundEndpoint. There is no reason to do the recursion manually, since Inode.BoundEndpoint will do it for us. PiperOrigin-RevId: 250794903	2019-05-30 17:20:20 -07:00
chris.zn	b18df9bed6	Add VmData field to /proc/{pid}/status VmData is the size of private data segments. It has the same meaning as in Linux. Change-Id: Iebf1ae85940a810524a6cde9c2e767d4233ddb2a PiperOrigin-RevId: 250593739	2019-05-30 12:07:40 -07:00
Adin Scannell	2165b77774	Remove obsolete bug. The original bug is no longer relevant, and the FIXME here contains lots of obsolete information. PiperOrigin-RevId: 249924036	2019-05-30 12:03:39 -07:00
Adin Scannell	ed5793808e	Remove obsolete TODO. We don't need to model internal interfaces after the system call interfaces (which are objectively worse and simply use a flag to distinguish between two logically different operations). PiperOrigin-RevId: 249916814 Change-Id: I45d02e0ec0be66b782a685b1f305ea027694cab9	2019-05-24 16:18:09 -07:00
Andrei Vagin	a949133c4b	gvisor: interrupt the sendfile system call if a task has been interrupted sendfile can be called for a big range and it can require significant amount of time to process it, so we need to handle task interrupts in this system call. PiperOrigin-RevId: 249781023 Change-Id: Ifc2ec505d74c06f5ee76f93b8d30d518ec2d4015	2019-05-23 23:21:13 -07:00
Ayush Ranjan	6240abb205	Added boilerplate code for ext4 fs. Initialized BUILD with license Mount is still unimplemented and is not meant to be part of this CL. Rest of the fs interface is implemented. Referenced the Linux kernel appropriately when needed PiperOrigin-RevId: 249741997 Change-Id: Id1e4c7c9e68b3f6946da39896fc6a0c3dcd7f98c	2019-05-23 16:55:42 -07:00
Fabricio Voznika	9006304dfe	Initial support for bind mounts Separate MountSource from Mount. This is needed to allow mounts to be shared by multiple containers within the same pod. PiperOrigin-RevId: 249617810 Change-Id: Id2944feb7e4194951f355cbe6d4944ae3c02e468	2019-05-23 04:16:10 -07:00
Adin Scannell	21915eb58b	Remove obsolete TODO. There no obvious reason to require that BlockSize and StatFS are MountSource operations. Today they are in INodeOperations, and they can be moved elsewhere in the future as part of a normal refactor process. PiperOrigin-RevId: 249549982 Change-Id: Ib832e02faeaf8253674475df4e385bcc53d780f3	2019-05-22 17:00:36 -07:00
Adin Scannell	9cdae51fec	Add basic plumbing for splice and stub implementation. This does not actually implement an efficient splice or sendfile. Rather, it adds a generic plumbing to the file internals so that this can be added. All file implementations use the stub fileutil.NoSplice implementation, which causes sendfile and splice to fall back to an internal copy. A basic splice system call interface is added, along with a test. PiperOrigin-RevId: 249335960 Change-Id: Ic5568be2af0a505c19e7aec66d5af2480ab0939b	2019-05-21 15:18:12 -07:00
Neel Natu	adeb99709b	Remove unused struct member. Remove unused struct member. PiperOrigin-RevId: 249300446 Change-Id: Ifb16538f684bc3200342462c3da927eb564bf52d	2019-05-21 12:20:19 -07:00
Michael Pratt	80cc2c78e5	Forward named pipe creation to the gofer The backing 9p server must allow named pipe creation, which the runsc fsgofer currently does not. There are small changes to the overlay here. GetFile may block when opening a named pipe, which can cause a deadlock: 1. open(O_RDONLY) -> copyMu.Lock() -> GetFile() 2. open(O_WRONLY) -> copyMu.Lock() -> Deadlock A named pipe usable for writing must already be on the upper filesystem, but we are still taking copyMu for write when checking for upper. That can be changed to a read lock to fix the common case. However, a named pipe on the lower filesystem would still deadlock in open(O_WRONLY) when it tries to actually perform copy up (which would simply return EINVAL). Move the copy up type check before taking copyMu for write to avoid this. p9 must be modified, as it was incorrectly removing the file mode when sending messages on the wire. PiperOrigin-RevId: 249154033 Change-Id: Id6637130e567b03758130eb6c7cdbc976384b7d6	2019-05-20 16:53:08 -07:00
Michael Pratt	6588427451	Fix incorrect tmpfs timestamp updates * Creation of files, directories (and other fs objects) in a directory should always update ctime. * Same for removal. * atime should not be updated on lookup, only readdir. I've also renamed some misleading functions that update mtime and ctime. PiperOrigin-RevId: 249115063 Change-Id: I30fa275fa7db96d01aa759ed64628c18bb3a7dc7	2019-05-20 13:35:17 -07:00
Michael Pratt	4a842836e5	Return EPERM for mknod This more directly matches what Linux does with unsupported nodes. PiperOrigin-RevId: 248780425 Change-Id: I17f3dd0b244f6dc4eb00e2e42344851b8367fbec	2019-05-17 13:47:40 -07:00
Michael Pratt	04105781ad	Fix gofer rename ctime and cleanup stat_times test There is a lot of redundancy that we can simplify in the stat_times test. This will make it easier to add new tests. However, the simplification reveals that cached uattrs on goferfs don't properly update ctime on rename. PiperOrigin-RevId: 248773425 Change-Id: I52662728e1e9920981555881f9a85f9ce04041cf	2019-05-17 13:05:47 -07:00
Andrei Vagin	2105158d4b	gofer: don't call hostfile.Close if hostFile is nil PiperOrigin-RevId: 248437159 Change-Id: Ife71f6ca032fca59ec97a82961000ed0af257101	2019-05-15 17:21:10 -07:00
Nicolas Lacasse	dd153c014d	Start of support for /proc/pid/cgroup file. PiperOrigin-RevId: 248263378 Change-Id: Ic057d2bb0b6212110f43ac4df3f0ac9bf931ab98	2019-05-14 20:34:50 -07:00
Michael Pratt	330a1bbd04	Remove false comment PiperOrigin-RevId: 248249285 Change-Id: I9b6d267baa666798b22def590ff20c9a118efd47	2019-05-14 18:06:14 -07:00
Fabricio Voznika	1bee43be13	Implement fallocate(2) Closes #225 PiperOrigin-RevId: 247508791 Change-Id: I04f47cf2770b30043e5a272aba4ba6e11d0476cc	2019-05-09 15:35:49 -07:00
Nicolas Lacasse	bfd9f75ba4	Set the FilesytemType in MountSource from the Filesystem. And stop storing the Filesystem in the MountSource. This allows us to decouple the MountSource filesystem type from the name of the filesystem. PiperOrigin-RevId: 247292982 Change-Id: I49cbcce3c17883b7aa918ba76203dfd6d1b03cc8	2019-05-08 14:35:06 -07:00
Fabricio Voznika	e5432fa1b3	Remove defers from gofer.contextFile Most are single line methods in hot paths. PiperOrigin-RevId: 247050267 Change-Id: I428d78723fe00b57483185899dc8fa9e1f01e2ea	2019-05-07 10:55:09 -07:00
Andrei Vagin	24d8656585	gofer: don't leak file descriptors Fixes #219 PiperOrigin-RevId: 246568639 Change-Id: Ic7afd15dde922638d77f6429c508d1cbe2e4288a	2019-05-03 14:01:50 -07:00
Michael Pratt	23ca9886c6	Update reference to old type PiperOrigin-RevId: 246036806 Change-Id: I5554a43a1f8146c927402db3bf98488a2da0fbe7	2019-04-30 15:42:39 -07:00
Jamie Liu	8bfb83d0ac	Implement async MemoryFile eviction, and use it in CachingInodeOperations. This feature allows MemoryFile to delay eviction of "optional" allocations, such as unused cached file pages. Note that this incidentally makes CachingInodeOperations writeback asynchronous, in the sense that it doesn't occur until eviction; this is necessary because between when a cached page becomes evictable and when it's evicted, file writes (via CachingInodeOperations.Write) may dirty the page. As currently implemented, this feature won't meaningfully impact steady-state memory usage or caching; the reclaimer goroutine will schedule eviction as soon as it runs out of other work to do. Future CLs increase caching by adding constraints on when eviction is scheduled. PiperOrigin-RevId: 246014822 Change-Id: Ia85feb25a2de92a48359eb84434b6ec6f9bea2cb	2019-04-30 13:56:41 -07:00
Ian Gudger	81ecd8b6ea	Implement the MSG_CTRUNC msghdr flag for Unix sockets. Updates google/gvisor#206 PiperOrigin-RevId: 245880573 Change-Id: Ifa715e98d47f64b8a32b04ae9378d6cd6bd4025e	2019-04-29 21:21:08 -07:00
Michael Pratt	4d52a55201	Change copyright notice to "The gVisor Authors" Based on the guidelines at https://opensource.google.com/docs/releasing/authors/. 1. $ rg -l "Google LLC" \| xargs sed -i 's/Google LLC.*/The gVisor Authors./' 2. Manual fixup of "Google Inc" references. 3. Add AUTHORS file. Authors may request to be added to this file. 4. Point netstack AUTHORS to gVisor AUTHORS. Drop CONTRIBUTORS. Fixes #209 PiperOrigin-RevId: 245823212 Change-Id: I64530b24ad021a7d683137459cafc510f5ee1de9	2019-04-29 14:26:23 -07:00
Nicolas Lacasse	f4ce43e1f4	Allow and document bug ids in gVisor codebase. PiperOrigin-RevId: 245818639 Change-Id: I03703ef0fb9b6675955637b9fe2776204c545789	2019-04-29 14:04:14 -07:00
Kevin Krakauer	5f13338d30	Fix reference counting bug in /proc/PID/fdinfo/. PiperOrigin-RevId: 245452217 Change-Id: I7164d8f57fe34c17e601079eb9410a6d95af1869	2019-04-26 11:09:55 -07:00
Jamie Liu	6b76c172b4	Don't enforce NAME_MAX in fs.Dirent.walk(). Maximum filename length is filesystem-dependent, and obtained via statfs::f_namelen. This limit is usually 255 bytes (NAME_MAX), but not always. For example, VFAT supports filenames of up to 255... UCS-2 characters, which Linux conservatively takes to mean UTF-8-encoded bytes: fs/fat/inode.c:fat_statfs(), FAT_LFN_LEN * NLS_MAX_CHARSET_SIZE. As a result, Linux's VFS does not enforce NAME_MAX: $ rg --maxdepth=1 '\WNAME_MAX\W' fs/ include/linux/ fs/libfs.c 38: buf->f_namelen = NAME_MAX; 64: if (dentry->d_name.len > NAME_MAX) include/linux/relay.h 74: char base_filename[NAME_MAX]; /* saved base filename / include/linux/fscrypt.h 149: filenames up to NAME_MAX bytes, since base64 encoding expands the length. include/linux/exportfs.h 176: * understanding that it is already pointing to a a %NAME_MAX+1 sized Remove this check from core VFS, and add it to ramfs (and by extension tmpfs), where it is actually applicable: mm/shmem.c:shmem_dir_inode_operations.lookup == simple_lookup does enforce NAME_MAX. PiperOrigin-RevId: 245324748 Change-Id: I17567c4324bfd60e31746a5270096e75db963fac	2019-04-25 16:05:13 -07:00
Michael Pratt	d6aac9387f	Fix doc typo PiperOrigin-RevId: 244773890 Change-Id: I2d0cd7789771276ba545b38efff6d3e24133baaa	2019-04-22 18:22:19 -07:00
Fabricio Voznika	c8cee7108f	Use FD limit and file size limit from host FD limit and file size limit is read from the host, instead of using hard-coded defaults, given that they effect the sandbox process. Also limit the direct cache to use no more than half if the available FDs. PiperOrigin-RevId: 244050323 Change-Id: I787ad0fdf07c49d589e51aebfeae477324fe26e6	2019-04-17 12:57:40 -07:00
Jamie Liu	4209edafb6	Use open fids when fstat()ing gofer files. PiperOrigin-RevId: 243018347 Change-Id: I1e5b80607c1df0747482abea61db7fcf24536d37	2019-04-11 00:43:04 -07:00
Nicolas Lacasse	d93d19fd4e	Fix uses of RootFromContext. RootFromContext can return a dirent with reference taken, or nil. We must call DecRef if (and only if) a real dirent is returned. PiperOrigin-RevId: 242965515 Change-Id: Ie2b7b4cb19ee09b6ccf788b71f3fd7efcdf35a11	2019-04-10 16:36:28 -07:00
Yong He	89cc8eef9b	DATA RACE in fs.(Dirent).fullName add renameMu.Lock when oldParent == newParent in order to avoid data race in following report: WARNING: DATA RACE Read at 0x00c000ba2160 by goroutine 405: gvisor.googlesource.com/gvisor/pkg/sentry/fs.(Dirent).fullName() pkg/sentry/fs/dirent.go:246 +0x6c gvisor.googlesource.com/gvisor/pkg/sentry/fs.(Dirent).FullName() pkg/sentry/fs/dirent.go:356 +0x8b gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(FDMap).String() pkg/sentry/kernel/fd_map.go:135 +0x1e0 fmt.(pp).handleMethods() GOROOT/src/fmt/print.go:603 +0x404 fmt.(pp).printArg() GOROOT/src/fmt/print.go:686 +0x255 fmt.(pp).doPrintf() GOROOT/src/fmt/print.go:1003 +0x33f fmt.Fprintf() GOROOT/src/fmt/print.go:188 +0x7f gvisor.googlesource.com/gvisor/pkg/log.(Writer).Emit() pkg/log/log.go:121 +0x89 gvisor.googlesource.com/gvisor/pkg/log.GoogleEmitter.Emit() pkg/log/glog.go:162 +0x1acc gvisor.googlesource.com/gvisor/pkg/log.(GoogleEmitter).Emit() <autogenerated>:1 +0xe1 gvisor.googlesource.com/gvisor/pkg/log.(BasicLogger).Debugf() pkg/log/log.go:177 +0x111 gvisor.googlesource.com/gvisor/pkg/log.Debugf() pkg/log/log.go:235 +0x66 gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(Task).Debugf() pkg/sentry/kernel/task_log.go:48 +0xfe gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(Task).DebugDumpState() pkg/sentry/kernel/task_log.go:66 +0x11f gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(runApp).execute() pkg/sentry/kernel/task_run.go:272 +0xc80 gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(Task).run() pkg/sentry/kernel/task_run.go:91 +0x24b Previous write at 0x00c000ba2160 by goroutine 423: gvisor.googlesource.com/gvisor/pkg/sentry/fs.Rename() pkg/sentry/fs/dirent.go:1628 +0x61f gvisor.googlesource.com/gvisor/pkg/sentry/syscalls/linux.renameAt.func1.1() pkg/sentry/syscalls/linux/sys_file.go:1864 +0x1f8 gvisor.googlesource.com/gvisor/pkg/sentry/syscalls/linux.fileOpAt( gvisor.googlesource.com/g/linux/sys_file.go:51 +0x20f gvisor.googlesource.com/gvisor/pkg/sentry/syscalls/linux.renameAt.func1() pkg/sentry/syscalls/linux/sys_file.go:1852 +0x218 gvisor.googlesource.com/gvisor/pkg/sentry/syscalls/linux.fileOpAt() pkg/sentry/syscalls/linux/sys_file.go:51 +0x20f gvisor.googlesource.com/gvisor/pkg/sentry/syscalls/linux.renameAt() pkg/sentry/syscalls/linux/sys_file.go:1840 +0x180 gvisor.googlesource.com/gvisor/pkg/sentry/syscalls/linux.Rename() pkg/sentry/syscalls/linux/sys_file.go:1873 +0x60 gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(Task).executeSyscall() pkg/sentry/kernel/task_syscall.go:165 +0x17a gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(Task).doSyscallInvoke() pkg/sentry/kernel/task_syscall.go:283 +0xb4 gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(Task).doSyscallEnter() pkg/sentry/kernel/task_syscall.go:244 +0x10c gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(Task).doSyscall() pkg/sentry/kernel/task_syscall.go:219 +0x1e3 gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(runApp).execute() pkg/sentry/kernel/task_run.go:215 +0x15a9 gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(Task).run() pkg/sentry/kernel/task_run.go:91 +0x24b Reported-by: syzbot+e1babbf756fab380dfff@syzkaller.appspotmail.com Change-Id: Icd2620bb3ea28b817bf0672d454a22b9d8ee189a PiperOrigin-RevId: 242938741	2019-04-10 14:17:33 -07:00
Nicolas Lacasse	0a0619216e	Start saving MountSource.DirentCache. DirentCache is already a savable type, and it ensures that it is empty at the point of Save. There is no reason not to save it along with the MountSource. This did uncover an issue where not all MountSources were properly flushed before Save. If a mount point has an open file and is then unmounted, we save the MountSource without flushing it first. This CL also fixes that by flushing all MountSources for all open FDs on Save. PiperOrigin-RevId: 242906637 Change-Id: I3acd9d52b6ce6b8c989f835a408016cb3e67018f	2019-04-10 11:27:16 -07:00
Shiva Prasanth	7140b1fdca	Fixed /proc/cpuinfo permissions This also applies these permissions to other static proc files. Change-Id: I4167e585fed49ad271aa4e1f1260babb3239a73d PiperOrigin-RevId: 242898575	2019-04-10 10:49:43 -07:00
Jamie Liu	9471c01348	Export kernel.SignalInfoPriv. Also add kernel.SignalInfoNoInfo, and use it in RLIMIT_FSIZE checks. PiperOrigin-RevId: 242562428 Change-Id: I4887c0e1c8f5fddcabfe6d4281bf76d2f2eafe90	2019-04-08 16:32:11 -07:00
Nicolas Lacasse	70906f1d24	Intermediate ram fs dirs should be writable. We construct a ramfs tree of "scaffolding" directories for all mount points, so that a directory exists that each mount point can be mounted over. We were creating these directories without write permissions, which meant that they were not wribable even when underlayed under a writable filesystem. They should be writable. PiperOrigin-RevId: 242507789 Change-Id: I86645e35417560d862442ff5962da211dbe9b731	2019-04-08 11:56:38 -07:00
Nicolas Lacasse	ee7e6d33b2	Use string type for extended attribute values, instead of []byte. Strings are a better fit for this usage because they are immutable in Go, and can contain arbitrary bytes. It also allows us to avoid casting bytes to string (and the associated allocation) in the hot path when checking for overlay whiteouts. PiperOrigin-RevId: 242208856 Change-Id: I7699ae6302492eca71787dd0b72e0a5a217a3db2	2019-04-05 15:49:39 -07:00
Andrei Vagin	88409e983c	gvisor: Add support for the MS_NOEXEC mount option https://github.com/google/gvisor/issues/145 PiperOrigin-RevId: 242044115 Change-Id: I8f140fe05e32ecd438b6be218e224e4b7fe05878	2019-04-04 17:43:53 -07:00
Nicolas Lacasse	61d8c361c6	Don't release d.mu in checks for child-existence. Dirent.exists() is called in Create to check whether a child with the given name already exists. Dirent.exists() calls walk(), and before this CL allowed walk() to drop d.mu while calling d.Inode.Lookup. During this existence check, a racing Rename() can acquire d.mu and create a new child of the dirent with the same name. (Note that the source and destination of the rename must be in the same directory, otherwise renameMu will be taken preventing the race.) In this case, d.exists() can return false, even though a child with the same name actually does exist. This CL changes d.exists() so that it does not release d.mu while walking, thus preventing the race with Rename. It also adds comments noting that lockForRename may not take renameMu if the source and destination are in the same directory, as this is a bit surprising (at least it was to me). PiperOrigin-RevId: 241842579 Change-Id: I56524870e39dfcd18cab82054eb3088846c34813	2019-04-03 17:53:56 -07:00
Kevin Krakauer	82529becae	Fix index out of bounds in tty implementation. The previous implementation revolved around runes instead of bytes, which caused weird behavior when converting between the two. For example, peekRune would read the byte 0xff from a buffer, convert it to a rune, then return it. As rune is an alias of int32, 0xff was 0-padded to int32(255), which is the hex code point for ?. However, peekRune also returned the length of the byte (1). When calling utf8.EncodeRune, we only allocated 1 byte, but tried the write the 2-byte character ?. tl;dr: I apparently didn't understand runes when I wrote this. PiperOrigin-RevId: 241789081 Change-Id: I14c788af4d9754973137801500ef6af7ab8a8727	2019-04-03 13:00:34 -07:00
Kevin Krakauer	c79e81bd27	Addresses data race in tty implementation. Also makes the safemem reading and writing inline, as it makes it easier to see what locks are held. PiperOrigin-RevId: 241775201 Change-Id: Ib1072f246773ef2d08b5b9a042eb7e9e0284175c	2019-04-03 11:49:55 -07:00
Nicolas Lacasse	1776ab28f0	Add test that symlinking over a directory returns EEXIST. Also remove comments in InodeOperations that required that implementation of some Create* operations ensure that the name does not already exist, since these checks are all centralized in the Dirent. PiperOrigin-RevId: 241637335 Change-Id: Id098dc6063ff7c38347af29d1369075ad1e89a58	2019-04-02 17:28:36 -07:00
Wei Zhang	1fcd40719d	device: fix device major/minor Current gvisor doesn't give devices a right major and minor number. When testing golang supporting of gvisor, I run the test case below: ``` $ docker run -ti --runtime runsc golang:1.12.1 bash -c "cd /usr/local/go/src && ./run.bash " ``` And it reports some errors, one of them is: "--- FAIL: TestDevices (0.00s) --- FAIL: TestDevices//dev/null_1:3 (0.00s) dev_linux_test.go:45: for /dev/null Major(0x0) == 0, want 1 dev_linux_test.go:48: for /dev/null Minor(0x0) == 0, want 3 dev_linux_test.go:51: for /dev/null Mkdev(1, 3) == 0x103, want 0x0 --- FAIL: TestDevices//dev/zero_1:5 (0.00s) dev_linux_test.go:45: for /dev/zero Major(0x0) == 0, want 1 dev_linux_test.go:48: for /dev/zero Minor(0x0) == 0, want 5 dev_linux_test.go:51: for /dev/zero Mkdev(1, 5) == 0x105, want 0x0 --- FAIL: TestDevices//dev/random_1:8 (0.00s) dev_linux_test.go:45: for /dev/random Major(0x0) == 0, want 1 dev_linux_test.go:48: for /dev/random Minor(0x0) == 0, want 8 dev_linux_test.go:51: for /dev/random Mkdev(1, 8) == 0x108, want 0x0 --- FAIL: TestDevices//dev/full_1:7 (0.00s) dev_linux_test.go:45: for /dev/full Major(0x0) == 0, want 1 dev_linux_test.go:48: for /dev/full Minor(0x0) == 0, want 7 dev_linux_test.go:51: for /dev/full Mkdev(1, 7) == 0x107, want 0x0 --- FAIL: TestDevices//dev/urandom_1:9 (0.00s) dev_linux_test.go:45: for /dev/urandom Major(0x0) == 0, want 1 dev_linux_test.go:48: for /dev/urandom Minor(0x0) == 0, want 9 dev_linux_test.go:51: for /dev/urandom Mkdev(1, 9) == 0x109, want 0x0 " So I think we'd better assign to them correct major/minor numbers following linux spec. Signed-off-by: Wei Zhang <zhangwei198900@gmail.com> Change-Id: I4521ee7884b4e214fd3a261929e3b6dac537ada9 PiperOrigin-RevId: 241609021	2019-04-02 14:51:07 -07:00
Andrei Vagin	a4b34e2637	gvisor: convert ilist to ilist:generic_list ilist:generic_list works faster (cl/240185278) and the code looks cleaner without type casting. PiperOrigin-RevId: 241381175 Change-Id: I8487ab1d73637b3e9733c253c56dce9e79f0d35f	2019-04-01 12:53:27 -07:00
Jamie Liu	69afd0438e	Return srclen in proc.idMapFileOperations.Write. PiperOrigin-RevId: 241037926 Change-Id: I4b0381ac1c7575e8b861291b068d3da22bc03850	2019-03-29 13:16:46 -07:00
Googler	e373d3642e	Internal change. PiperOrigin-RevId: 240842801 Change-Id: Ibbd6f849f9613edc1b1dd7a99a97d1ecdb6e9188	2019-03-28 13:43:47 -07:00
Jamie Liu	f005350c93	Clean up gofer handle caching. - Document fsutil.CachedFileObject.FD() requirements on access permissions, and change gofer.inodeFileState.FD() to honor them. Fixes #147. - Combine gofer.inodeFileState.readonly and gofer.inodeFileState.readthrough, and simplify handle caching logic. - Inline gofer.cachePolicy.cacheHandles into gofer.inodeFileState.setSharedHandles, because users with access to gofer.inodeFileState don't necessarily have access to the fs.Inode (predictably, this is a save/restore problem). Before this CL: $ docker run --runtime=runsc-d -v $(pwd)/gvisor/repro:/root/repro -it ubuntu bash root@34d51017ed67:/# /root/repro/runsc-b147 mmap: 0x7f3c01e45000 Segmentation fault After this CL: $ docker run --runtime=runsc-d -v $(pwd)/gvisor/repro:/root/repro -it ubuntu bash root@d3c3cb56bbf9:/# /root/repro/runsc-b147 mmap: 0x7f78987ec000 o PiperOrigin-RevId: 240818413 Change-Id: I49e1d4a81a0cb9177832b0a9f31a10da722a896b	2019-03-28 11:43:51 -07:00
Nicolas Lacasse	9c18897887	Add rsslim field in /proc/pid/stat. PiperOrigin-RevId: 240681675 Change-Id: Ib214106e303669fca2d5c744ed5c18e835775161	2019-03-27 17:44:38 -07:00
Nicolas Lacasse	2d355f0e8f	Add start time to /proc/<pid>/stat. The start time is the number of clock ticks between the boot time and application start time. PiperOrigin-RevId: 240619475 Change-Id: Ic8bd7a73e36627ed563988864b0c551c052492a5	2019-03-27 12:41:27 -07:00
Nicolas Lacasse	645af7cdd8	Dev device methods should take pointer receiver. PiperOrigin-RevId: 240600504 Change-Id: I7dd5f27c8da31f24b68b48acdf8f1c19dbd0c32d	2019-03-27 11:08:50 -07:00
Rahat Mahmood	06ec97a3f8	Implement memfd_create. Memfds are simply anonymous tmpfs files with no associated mounts. Also implementing file seals, which Linux only implements for memfds at the moment. PiperOrigin-RevId: 240450031 Change-Id: I31de78b950101ae8d7a13d0e93fe52d98ea06f2f	2019-03-26 16:16:57 -07:00
Jamie Liu	f3723f8059	Call memmap.Mappable.Translate with more conservative usermem.AccessType. MM.insertPMAsLocked() passes vma.maxPerms to memmap.Mappable.Translate (although it unsets AccessType.Write if the vma is private). This somewhat simplifies handling of pmas, since it means only COW-break needs to replace existing pmas. However, it also means that a MAP_SHARED mapping of a file opened O_RDWR dirties the file, regardless of the mapping's permissions and whether or not the mapping is ever actually written to with I/O that ignores permissions (e.g. ptrace(PTRACE_POKEDATA)). To fix this: - Change the pma-getting path to request only the permissions that are required for the calling access. - Change memmap.Mappable.Translate to take requested permissions, and return allowed permissions. This preserves the existing behavior in the common cases where the memmap.Mappable isn't fsutil.CachingInodeOperations and doesn't care if the translated platform.File pages are written to. - Change the MM.getPMAsLocked path to support permission upgrading of pmas outside of copy-on-write. PiperOrigin-RevId: 240196979 Change-Id: Ie0147c62c1fbc409467a6fa16269a413f3d7d571	2019-03-25 12:42:43 -07:00
Kevin Krakauer	0cd5f20044	Replace manual pty copies to/from userspace with safemem operations. Also, changing queue.writeBuf from a buffer.Bytes to a [][]byte should reduce copying and reallocating of slices. PiperOrigin-RevId: 239713547 Change-Id: I6ee5ff19c3ee2662f1af5749cae7b73db0569e96	2019-03-21 18:05:07 -07:00
Andrei Vagin	87cce0ec08	netstack: reduce MSS from SYN to account tcp options See: https://tools.ietf.org/html/rfc6691#section-2 PiperOrigin-RevId: 239305632 Change-Id: Ie8eb912a43332e6490045dc95570709c5b81855e	2019-03-19 17:33:20 -07:00
Michael Pratt	8a499ae65f	Remove references to replaced child in Rename in ramfs/agentfs In the case of a rename replacing an existing destination inode, ramfs Rename failed to first remove the replaced inode. This caused: 1. A leak of a reference to the inode (making it live indefinitely). 2. For directories, a leak of the replaced directory's .. link to the parent. This would cause the parent's link count to incorrectly increase. (2) is much simpler to test than (1), so that's what I've done. agentfs has a similar bug with link count only, so the Dirent layer informs the Inode if this is a replacing rename. Fixes #133 PiperOrigin-RevId: 239105698 Change-Id: I4450af2462d8ae3339def812287213d2cbeebde0	2019-03-18 18:40:06 -07:00
Jamie Liu	8f4634997b	Decouple filemem from platform and move it to pgalloc.MemoryFile. This is in preparation for improved page cache reclaim, which requires greater integration between the page cache and page allocator. PiperOrigin-RevId: 238444706 Change-Id: Id24141b3678d96c7d7dc24baddd9be555bffafe4	2019-03-14 08:12:48 -07:00
Jamie Liu	fb9919881c	Use WalkGetAttr in gofer.inodeOperations.Create. p9.Twalk.handle() with a non-empty path also stats the walked-to path anyway, so the preceding GetAttr is completely wasted. PiperOrigin-RevId: 238440645 Change-Id: I7fbc7536f46b8157639d0d1f491e6aaa9ab688a3	2019-03-14 07:43:15 -07:00
Nicolas Lacasse	2512cc5617	Allow filesystem.Mount to take an optional interface argument. PiperOrigin-RevId: 238360231 Change-Id: I5eaf8d26f8892f77d71c7fbd6c5225ef471cedf1	2019-03-13 19:24:03 -07:00
Jamie Liu	8930e79ebf	Clarify the platform.File interface. - Redefine some memmap.Mappable, platform.File, and platform.Memory semantics in terms of File reference counts (no functional change). - Make AddressSpace.MapFile take a platform.File instead of a raw FD, and replace platform.File.MapInto with platform.File.FD. This allows kvm.AddressSpace.MapFile to always use platform.File.MapInternal instead of maintaining its own (redundant) cache of file mappings in the sentry address space. PiperOrigin-RevId: 238044504 Change-Id: Ib73a11e4275c0da0126d0194aa6c6017a9cef64f	2019-03-12 10:29:16 -07:00
Nicolas Lacasse	fbacb35039	No need to check for negative uintptr. Fixes #134 PiperOrigin-RevId: 237128306 Change-Id: I396e808484c18931fc5775970ec1f5ae231e1cb9	2019-03-06 15:06:46 -08:00
Nicolas Lacasse	0d683c9961	Make tmpfs respect MountNoATime now that fs.Handle is gone. PiperOrigin-RevId: 236752802 Change-Id: I9e50600b2ae25d5f2ac632c4405a7a185bdc3c92	2019-03-04 16:57:14 -08:00
Nicolas Lacasse	9177bcd0ba	DecRef replaced dirent in inode_overlay. PiperOrigin-RevId: 236352158 Change-Id: Ide5104620999eaef6820917505e7299c7b0c5a03	2019-03-01 11:58:59 -08:00
Ruidong Cao	3851705a73	Fix procfs bugs Current procfs has some bugs. After executing ls twice, many dirs come out with same name like "1" or ".". Files like "cpuinfo" disappear. Here variable names is a slice with cap() > len(). Sort after appending to it will not alloc a new space and impact orignal slice. Same to m. Signed-off-by: Ruidong Cao <crdfrank@gmail.com> Change-Id: I83e5cd1c7968c6fe28c35ea4fee497488d4f9eef PiperOrigin-RevId: 236222270	2019-02-28 16:44:54 -08:00
Jamie Liu	05d721f9ee	Hold dataMu for writing in CachingInodeOperations.WriteOut. fsutil.SyncDirtyAll mutates the DirtySet. PiperOrigin-RevId: 236183349 Change-Id: I7e809d5b406ac843407e61eff17d81259a819b4f	2019-02-28 13:14:43 -08:00
Nicolas Lacasse	d516ee3312	Allow overlay to merge Directories and SepcialDirectories. Needed to mount inside /proc or /sys. PiperOrigin-RevId: 235936529 Change-Id: Iee6f2671721b1b9b58a3989705ea901322ec9206	2019-02-27 09:45:45 -08:00
Fabricio Voznika	23fe059761	Lazily allocate inotify map on inode PiperOrigin-RevId: 235735865 Change-Id: I84223eb18eb51da1fa9768feaae80387ff6bfed0	2019-02-26 09:33:44 -08:00
Googler	532f4b2fba	Internal change. PiperOrigin-RevId: 235053594 Change-Id: Ie3d7b11843d0710184a2463886c7034e8f5305d1	2019-02-21 13:08:34 -08:00
Jamie Liu	22d8b6eba1	Break /proc/[pid]/{uid,gid}_map's dependence on seqfile. In addition to simplifying the implementation, this fixes two bugs: - seqfile.NewSeqFile unconditionally creates an inode with mode 0444, but {uid,gid}_map have mode 0644. - idMapSeqFile.Write implements fs.FileOperations.Write ... but it doesn't implement any other fs.FileOperations methods and is never used as fs.FileOperations. idMapSeqFile.GetFile() => seqfile.SeqFile.GetFile() uses seqfile.seqFileOperations instead, which rejects all writes. PiperOrigin-RevId: 234638212 Change-Id: I4568f741ab07929273a009d7e468c8205a8541bc	2019-02-19 11:21:46 -08:00
Nicolas Lacasse	0a41ea72c1	Don't allow writing or reading to TTY unless process group is in foreground. If a background process tries to read from a TTY, linux sends it a SIGTTIN unless the signal is blocked or ignored, or the process group is an orphan, in which case the syscall returns EIO. See drivers/tty/n_tty.c:n_tty_read()=>job_control(). If a background process tries to write a TTY, set the termios, or set the foreground process group, linux then sends a SIGTTOU. If the signal is ignored or blocked, linux allows the write. If the process group is an orphan, the syscall returns EIO. See drivers/tty/tty_io.c:tty_check_change(). PiperOrigin-RevId: 234044367 Change-Id: I009461352ac4f3f11c5d42c43ac36bb0caa580f9	2019-02-14 15:47:31 -08:00
Googler	7aaa6cf225	Internal change. PiperOrigin-RevId: 233802562 Change-Id: I40e1b13fd571daaf241b00f8df4bcedd034dc3f1	2019-02-13 12:07:34 -08:00
Nicolas Lacasse	f17692d807	Add fs.AsyncWithContext and call it in fs/gofer/inodeOperations.Release. fs/gofer/inodeOperations.Release does some asynchronous work. Previously it was calling fs.Async with an anonymous function, which caused the function to be allocated on the heap. Because Release is relatively hot, this results in a lot of small allocations and increased GC pressure, noticeable in perf profiles. This CL adds a new function, AsyncWithContext, which is just like Async, but passes a context to the async function. It avoids the need for an extra anonymous function in fs/gofer/inodeOperations.Release. The Async function itself still requires a single anonymous function. PiperOrigin-RevId: 233141763 Change-Id: I1dce4a883a7be9a8a5b884db01e654655f16d19c	2019-02-08 15:54:15 -08:00
Rahat Mahmood	2ba74f84be	Implement /proc/net/unix. PiperOrigin-RevId: 232948478 Change-Id: Ib830121e5e79afaf5d38d17aeef5a1ef97913d23	2019-02-07 14:44:21 -08:00
Zach Koopmans	0cf7fc4e11	Change /proc/PID/cmdline to read environment vector. - Change proc to return envp on overwrite of argv with limitations from upstream. - Add unit tests - Change layout of argv/envp on the stack so that end of argv is contiguous with beginning of envp. PiperOrigin-RevId: 232506107 Change-Id: I993880499ab2c1220f6dc456a922235c49304dec	2019-02-05 10:02:06 -08:00
Fabricio Voznika	2d20b121d7	CachingInodeOperations was over-dirtying cached attributes Dirty should be set only when the attribute is changed in the cache only. Instances where the change was also sent to the backing file doesn't need to dirty the attribute. Also remove size update during WriteOut as writing dirty page would naturaly grow the file if needed. RELNOTES: relnotes is needed for the parent CL. PiperOrigin-RevId: 232068978 Change-Id: I00ba54693a2c7adc06efa9e030faf8f2e8e7f188	2019-02-01 17:51:48 -08:00
Nicolas Lacasse	92e85623a0	Factor the subtargets method into a helper method with tests. PiperOrigin-RevId: 232047515 Change-Id: I00f036816e320356219be7b2f2e6d5fe57583a60	2019-02-01 15:23:43 -08:00
Michael Pratt	88b4ce8cac	Fix comment PiperOrigin-RevId: 231861005 Change-Id: I134d4e20cc898d44844219db0a8aacda87e11ef0	2019-01-31 15:03:12 -08:00
Fabricio Voznika	a497f5ed5f	Invalidate COW mappings when file is truncated This changed required making fsutil.HostMappable use a backing file to ensure the correct FD would be used for read/write operations. RELNOTES: relnotes is needed for the parent CL. PiperOrigin-RevId: 231836164 Change-Id: I8ae9639715529874ea7d80a65e2c711a5b4ce254	2019-01-31 12:54:00 -08:00
Michael Pratt	2a0c69b19f	Remove license comments Nothing reads them and they can simply get stale. Generated with: $ sed -i "s/licenses($.$)./licenses(\1)/" **/BUILD PiperOrigin-RevId: 231818945 Change-Id: Ibc3f9838546b7e94f13f217060d31f4ada9d4bf0	2019-01-31 11:12:53 -08:00
Zhaozhong Ni	ae6e37df2a	Convert TODO into FIXME. PiperOrigin-RevId: 231301228 Change-Id: I3e18f3a12a35fb89a22a8c981188268d5887dc61	2019-01-28 15:34:18 -08:00
Nicolas Lacasse	09cf3b40a8	Fix data race in InodeSimpleAttributes.Unstable. We were modifying InodeSimpleAttributes.Unstable.AccessTime without holding the necessary lock. Luckily for us, InodeSimpleAttributes already has a NotifyAccess method that will do the update while holding the lock. In addition, we were holding dfo.dir.mu.Lock while setting AccessTime, which is unnecessary, so that lock has been removed. PiperOrigin-RevId: 231278447 Change-Id: I81ed6d3dbc0b18e3f90c1df5e5a9c06132761769	2019-01-28 13:26:28 -08:00
Jamie Liu	1cedccf8e9	Drop the one-page limit for /proc/[pid]/{cmdline,environ}. It never actually should have applied to environ (the relevant change in Linux 4.2 is c2c0bb44620d "proc: fix PAGE_SIZE limit of /proc/$PID/cmdline"), and we claim to be Linux 4.4 now anyway. PiperOrigin-RevId: 231250661 Change-Id: I37f9c4280a533d1bcb3eebb7803373ac3c7b9f15	2019-01-28 11:00:23 -08:00
Fabricio Voznika	55e8eb775b	Make cacheRemoteRevalidating detect changes to file size When file size changes outside the sandbox, page cache was not refreshing file size which is required for cacheRemoteRevalidating. In fact, cacheRemoteRevalidating should be skipping the cache completely since it's not really benefiting from it. The cache is cache is already bypassed for unstable attributes (see cachePolicy.cacheUAttrs). And althought the cache is called to map pages, they will always miss the cache and map directly from the host. Created a HostMappable struct that maps directly to the host and use it for files with cacheRemoteRevalidating. Closes #124 PiperOrigin-RevId: 230998440 Change-Id: Ic5f632eabe33b47241e05e98c95e9b2090ae08fc	2019-01-25 17:23:07 -08:00
Adin Scannell	b5088ba59c	cleanup: extract the kernel from context Change-Id: I94704a90beebb53164325e0cce1fcb9a0b97d65c PiperOrigin-RevId: 230817308	2019-01-24 17:02:52 -08:00
Rahat Mahmood	8d7c10e908	Display /proc/net entries for all network configurations. Most of the entries are stubbed out at the moment, but even those were only displayed if IPv6 support was enabled. The entries should be displayed with IPv4-support only, and with only loopback devices. PiperOrigin-RevId: 229946441 Change-Id: I18afaa3af386322787f91bf9d168ab66c01d5a4c	2019-01-18 10:02:12 -08:00
Nicolas Lacasse	12bc7834dc	Allow fsync on a directory. PiperOrigin-RevId: 229781337 Change-Id: I1f946cff2771714fb1abd83a83ed454e9febda0a	2019-01-17 11:06:59 -08:00
Nicolas Lacasse	dc8450b567	Remove fs.Handle, ramfs.Entry, and all the DeprecatedFileOperations. More helper structs have been added to the fsutil package to make it easier to implement fs.InodeOperations and fs.FileOperations. PiperOrigin-RevId: 229305982 Change-Id: Ib6f8d3862f4216745116857913dbfa351530223b	2019-01-14 20:34:28 -08:00
Nicolas Lacasse	d321f575e2	Fix lock order violation. overlayFileOperations.Readdir was holding overlay.copyMu while calling DirentReaddir, which then attempts to take take the corresponding Dirent.mu, causing a lock order violation. (See lock order documentation in fs/copy_up.go.) We only actually need to hold copyMu during readdirEntries(), so holding the lock is moved in there, thus avoiding the lock order violation. A new lock was added to protect overlayFileOperations.dirCache. We were inadvertently relying on copyMu to protect this. There is no reason it should not have its own lock. PiperOrigin-RevId: 228542473 Change-Id: I03c3a368c8cbc0b5a79d50cc486fc94adaddc1c2	2019-01-09 10:29:36 -08:00
Jamie Liu	901ed5da44	Implement /proc/[pid]/smaps. PiperOrigin-RevId: 228245523 Change-Id: I5a4d0a6570b93958e51437e917e5331d83e23a7e	2019-01-07 15:17:44 -08:00
Fabricio Voznika	8e586db162	Add /proc/net/psched content FIO reads this file and expects it to be well formed. PiperOrigin-RevId: 227554483 Change-Id: Ia48ae2377626dd6a2daf17b5b4f5119f90ece55b	2019-01-02 11:39:57 -08:00
Fabricio Voznika	46e6577014	Fix deadlock between epoll_wait and getdents epoll_wait acquires EventPoll.listsMu (in EventPoll.ReadEvents) and then calls Inotify.Readiness which tries to acquire Inotify.evMu. getdents acquires Inotify.evMu (in Inotify.queueEvent) and then calls readyCallback.Callback which tries to acquire EventPoll.listsMu. The fix is to release Inotify.evMu before calling Queue.Notify. Queue is thread-safe and doesn't require Inotify.evMu to be held. Closes #121 PiperOrigin-RevId: 227066695 Change-Id: Id29364bb940d1727f33a5dff9a3c52f390c15761	2018-12-27 14:59:50 -08:00
Fabricio Voznika	1679ef31ef	inotify notifies watchers when control events bit are set The code that matches the event being published with events watchers was wronly matching all watchers in case any of the control event bits were set. Issue #121 PiperOrigin-RevId: 226521230 Change-Id: Ie2c42bc4366faaf59fbf80a74e9297499bd93f9e	2018-12-21 11:54:02 -08:00
Nicolas Lacasse	8ba450363f	Deflake gofer_test. We must wait for all lazy resources to be released before closing the rootFile. PiperOrigin-RevId: 226419499 Change-Id: I1d4d961a92b3816e02690cf3eaf0a88944d730cc	2018-12-20 17:23:26 -08:00
Nicolas Lacasse	d3ae74d2a5	overlayBoundEndpoint must be recursive if there is an overlay in the lower. The old overlayBoundEndpoint assumed that the lower is not an overlay. It should check if the lower is an overlay and handle that case. PiperOrigin-RevId: 225882303 Change-Id: I60660c587d91db2826e0719da0983ec8ad024cb8	2018-12-17 13:46:57 -08:00
Adin Scannell	5d8cf31346	Move fdnotifier package to reduce internal confusion. PiperOrigin-RevId: 225632398 Change-Id: I909e7e2925aa369adc28e844c284d9a6108e85ce	2018-12-14 18:05:01 -08:00
Andrei Vagin	3cf84e3bef	Mark sync.Mutex in TTYFileOperations as nosave PiperOrigin-RevId: 225621767 Change-Id: Ie3a42cdf0b0de22a020ff43e307bf86409cff329	2018-12-14 16:24:21 -08:00
Ian Gudger	e1dcf92ec5	Implement SO_SNDTIMEO PiperOrigin-RevId: 225620490 Change-Id: Ia726107b3f58093a5f881634f90b071b32d2c269	2018-12-14 16:15:06 -08:00
Rahat Mahmood	ccce1d4281	Filesystems shouldn't be saving references to Platform. Platform objects are not savable, storing references to them in filesystem datastructures would cause save to fail if someone actually passed in a Platform. Current implementations work because everywhere a Platform is expected, we currently pass in a Kernel object which embeds Platform and thus satisfies the interface. Eliminate this indirection and save pointers to Kernel directly. PiperOrigin-RevId: 225288336 Change-Id: Ica399ff43f425e15bc150a0d7102196c3d54a2ab	2018-12-12 17:47:55 -08:00
Rahat Mahmood	75e39eaa74	Pass information about map writableness to filesystems. This is necessary to implement file seals for memfds. PiperOrigin-RevId: 225239394 Change-Id: Ib3f1ab31385afc4b24e96cd81a05ef1bebbcbb70	2018-12-12 13:09:59 -08:00
Ian Gudger	5d87d8865f	Implement MSG_WAITALL MSG_WAITALL requests that recv family calls do not perform short reads. It only has an effect for SOCK_STREAM sockets, other types ignore it. PiperOrigin-RevId: 224918540 Change-Id: Id97fbf972f1f7cbd4e08eec0138f8cbdf1c94fe7	2018-12-10 17:56:34 -08:00
Zhaozhong Ni	9984138abe	sentry: turn "dynamically-created" procfs files into static creation. PiperOrigin-RevId: 224600982 Change-Id: I547253528e24fb0bb318fc9d2632cb80504acb34	2018-12-07 17:03:54 -08:00
Michael Pratt	673949048e	Add period to comment PiperOrigin-RevId: 224553291 Change-Id: I35d0772c215b71f4319c23f22df5c61c908f8590	2018-12-07 11:53:19 -08:00
Michael Pratt	9f64e64a6e	Enforce directory accessibility before delete Walk By Walking before checking that the directory is writable and executable, MayDelete may return the Walk error (e.g., ENOENT) which would normally be masked by a permission error (EACCES). PiperOrigin-RevId: 224222453 Change-Id: I108a7f730e6bdaa7f277eaddb776267c00805475	2018-12-05 14:31:58 -08:00
Michael Pratt	592f5bdc67	Add context to mount errors This makes it more obvious why a mount failed. PiperOrigin-RevId: 224203880 Change-Id: I7961774a7b6fdbb5493a791f8b3815c49b8f7631	2018-12-05 12:46:30 -08:00
Brian Geffon	82719be42e	Max link traversals should be for an entire path. The number of symbolic links that are allowed to be followed are for a full path and not just a chain of symbolic links. PiperOrigin-RevId: 224047321 Change-Id: I5e3c4caf66a93c17eeddcc7f046d1e8bb9434a40	2018-12-04 14:32:03 -08:00
Zhaozhong Ni	adafc08d7c	sentry: save / restore netstack procfs configuration. PiperOrigin-RevId: 224047120 Change-Id: Ia6cb17fa978595cd73857b6178c4bdba401e185e	2018-12-04 14:30:42 -08:00
Brian Geffon	5a6a1eb420	Enforce name length restriction on paths. NAME_LENGTH must be enforced per component. PiperOrigin-RevId: 224046749 Change-Id: Iba8105b00d951f2509dc768af58e4110dafbe1c9	2018-12-04 14:28:33 -08:00
Nicolas Lacasse	54dd0d0dc5	Fix data race caused by unlocked call of Dirent.descendantOf. PiperOrigin-RevId: 224025363 Change-Id: I98864403c779832e9e1436f7d3c3f6fb2fba9904	2018-12-04 12:24:55 -08:00
Nicolas Lacasse	573622fdca	Fix data race in fs.Async. Replaces the WaitGroup with a RWMutex. Calls to Async hold the mutex for reading, while AsyncBarrier takes the lock for writing. This ensures that all executing Async work finishes before AsyncBarrier returns. Also pushes the Async() call from Inode.Release into gofer/InodeOperations.Release(). This removes a recursive Async call which should not have been allowed in the first place. The gofer Release call is the slow one (since it may make RPCs to the gofer), so putting the Async call there makes sense. PiperOrigin-RevId: 223093067 Change-Id: I116da7b20fce5ebab8d99c2ab0f27db7c89d890e	2018-11-27 18:17:09 -08:00
Nicolas Lacasse	8c84f9a3c1	Parse the tmpfs mode before validating. This gets rid of the problematic modeRegex. PiperOrigin-RevId: 221835959 Change-Id: I566b8d8a43579a4c30c0a08a620a964bbcd826dd	2018-11-20 14:02:39 -08:00
Nicolas Lacasse	6ef08c2bc2	Allow setting sticky bit in tmpfs permissions. PiperOrigin-RevId: 221683127 Change-Id: Ide6a9f41d75aa19d0e2051a05a1e4a114a4fb93c	2018-11-15 13:48:59 -08:00
Googler	25d07fbbed	Internal change. PiperOrigin-RevId: 221189534 Change-Id: Id20d318bed97d5226b454c9351df396d11251e1f	2018-11-12 17:44:46 -08:00
Rahat Mahmood	5a0be6fa20	Create stubs for syscalls upto Linux 4.4. Create syscall stubs for missing syscalls upto Linux 4.4 and advertise a kernel version of 4.4. PiperOrigin-RevId: 220667680 Change-Id: Idbdccde538faabf16debc22f492dd053a8af0ba7	2018-11-08 11:09:46 -08:00
Juan	b23cd33682	modify modeRegexp to adapt the default spec of containerd https://github.com/containerd/containerd/blob/master/oci/spec.go#L206, the mode=755 didn't match the pattern modeRegexp = regexp.MustCompile("0[0-7][0-7][0-7]"). Closes #112 Signed-off-by: Juan <xionghuan.cn@gmail.com> Change-Id: I469e0a68160a1278e34c9e1dbe4b7784c6f97e5a PiperOrigin-RevId: 219672525	2018-11-01 11:57:54 -07:00
Ian Gudger	425dccdd7e	Convert Unix transport to syserr Previously this code used the tcpip error space. Since it is no longer part of netstack, it can use the sentry's error space (except for a few cases where there is still some shared code. This reduces the number of error space conversions required for hot Unix socket operations. PiperOrigin-RevId: 218541611 Change-Id: I3d13047006a8245b5dfda73364d37b8a453784bb	2018-10-24 11:05:08 -07:00
Adin Scannell	75cd70ecc9	Track paths and provide a rename hook. This change also adds extensive testing to the p9 package via mocks. The sanity checks and type checks are moved from the gofer into the core package, where they can be more easily validated. PiperOrigin-RevId: 218296768 Change-Id: I4fc3c326e7bf1e0e140a454cbacbcc6fd617ab55	2018-10-23 00:20:15 -07:00
Fabricio Voznika	b2068cf5a5	Add more unimplemented syscall events Added events for ctl syscalls that may have multiple different commands. For runsc, each syscall event is only logged once. For ctl syscalls, use the cmd as identifier, not only the syscall number. PiperOrigin-RevId: 218015941 Change-Id: Ie3c19131ae36124861e9b492a7dbe1765d9e5e59	2018-10-20 11:14:23 -07:00
Ian Gudger	8fce67af24	Use correct company name in copyright header PiperOrigin-RevId: 217951017 Change-Id: Ie08bf6987f98467d07457bcf35b5f1ff6e43c035	2018-10-19 16:35:11 -07:00
Ian Gudger	8c85f5e9ce	Fix typos in socket_test PiperOrigin-RevId: 217576188 Change-Id: I82e45c306c5c9161e207311c7dbb8a983820c1df	2018-10-17 13:25:45 -07:00
Ian Gudger	6cba410df0	Move Unix transport out of netstack PiperOrigin-RevId: 217557656 Change-Id: I63d27635b1a6c12877279995d2d9847b6a19da9b	2018-10-17 11:37:51 -07:00
Ian Gudger	324ad3564b	Refactor host.ConnectedEndpoint * Integrate recvMsg and sendMsg functions into Recv and Send respectively as they are no longer shared. * Clean up partial read/write error handling code. * Re-order code to make sense given that there is no longer a host.endpoint type. PiperOrigin-RevId: 217255072 Change-Id: Ib43fe9286452f813b8309d969be11f5fa40694cd	2018-10-15 20:23:18 -07:00
Ian Gudger	167f2401c4	Merge host.endpoint into host.ConnectedEndpoint host.endpoint contained duplicated logic from the sockerpair implementation and host.ConnectedEndpoint. Remove host.endpoint in favor of a host.ConnectedEndpoint wrapped in a socketpair end. PiperOrigin-RevId: 217240096 Change-Id: I4a3d51e3fe82bdf30e2d0152458b8499ab4c987c	2018-10-15 17:48:11 -07:00
Nicolas Lacasse	ecd94ea7a6	Clean up Rename and Unlink checks for EBUSY. - Change Dirent.Busy => Dirent.isMountPoint. The function body is unchanged, and it is no longer exported. - fs.MayDelete now checks that the victim is not the process root. This aligns with Linux's namei.c:may_delete(). - Fix "is-ancestor" checks to actually compare all ancestors, not just the parents. - Fix handling of paths that end in dots, which are handled differently in Rename vs. Unlink. PiperOrigin-RevId: 217239274 Change-Id: I7a0eb768e70a1b2915017ce54f7f95cbf8edf1fb	2018-10-15 17:42:30 -07:00
Zhaozhong Ni	4ea69fce8d	sentry: save fs.Dirent deleted info. PiperOrigin-RevId: 217155458 Change-Id: Id3265b1ec784787039e2131c80254ac4937330c7	2018-10-15 09:31:32 -07:00
Zhaozhong Ni	0bfa03d61c	sentry: allow saving of unlinked files with open fds on virtual fs. PiperOrigin-RevId: 216733414 Change-Id: I33cd3eb818f0c39717d6656fcdfff6050b37ebb0	2018-10-11 11:41:44 -07:00
Michael Pratt	ddb34b3690	Enforce message size limits and avoid host calls with too many iovecs Currently, in the face of FileMem fragmentation and a large sendmsg or recvmsg call, host sockets may pass > 1024 iovecs to the host, which will immediately cause the host to return EMSGSIZE. When we detect this case, use a single intermediate buffer to pass to the kernel, copying to/from the src/dst buffer. To avoid creating unbounded intermediate buffers, enforce message size checks and truncation w.r.t. the send buffer size. The same functionality is added to netstack unix sockets for feature parity. PiperOrigin-RevId: 216590198 Change-Id: I719a32e71c7b1098d5097f35e6daf7dd5190eff7	2018-10-10 14:10:17 -07:00
Nicolas Lacasse	213f6688a5	Implement TIOCSCTTY ioctl as a noop. PiperOrigin-RevId: 215658757 Change-Id: If63b33293f3e53a7f607ae72daa79e2b7ef6fcfd	2018-10-03 17:29:56 -07:00
Ian Gudger	4fef31f96c	Add S/R support for FIOASYNC PiperOrigin-RevId: 215655197 Change-Id: I668b1bc7c29daaf2999f8f759138bcbb09c4de6f	2018-10-03 17:03:09 -07:00
Nicolas Lacasse	f1c01ed886	runsc: Support job control signals in "exec -it". Terminal support in runsc relies on host tty file descriptors that are imported into the sandbox. Application tty ioctls are sent directly to the host fd. However, those host tty ioctls are associated in the host kernel with a host process (in this case runsc), and the host kernel intercepts job control characters like ^C and send signals to the host process. Thus, typing ^C into a "runsc exec" shell will send a SIGINT to the runsc process. This change makes "runsc exec" handle all signals, and forward them into the sandbox via the "ContainerSignal" urpc method. Since the "runsc exec" is associated with a particular container process in the sandbox, the signal must be associated with the same container process. One big difficulty is that the signal should not necessarily be sent to the sandbox process started by "exec", but instead must be sent to the foreground process group for the tty. For example, we may exec "bash", and from bash call "sleep 100". A ^C at this point should SIGINT sleep, not bash. To handle this, tty files inside the sandbox must keep track of their foreground process group, which is set/get via ioctls. When an incoming ContainerSignal urpc comes in, we look up the foreground process group via the tty file. Unfortunately, this means we have to expose and cache the tty file in the Loader. Note that "runsc exec" now handles signals properly, but "runs run" does not. That will come in a later CL, as this one is complex enough already. Example: root@:/usr/local/apache2# sleep 100 ^C root@:/usr/local/apache2# sleep 100 ^Z [1]+ Stopped sleep 100 root@:/usr/local/apache2# fg sleep 100 ^C root@:/usr/local/apache2# PiperOrigin-RevId: 215334554 Change-Id: I53cdce39653027908510a5ba8d08c49f9cf24f39	2018-10-01 22:06:56 -07:00
Michael Pratt	3ff24b4f2c	Require AF_UNIX sockets from the gofer host.endpoint already has the check, but it is missing from host.ConnectedEndpoint. PiperOrigin-RevId: 214962762 Change-Id: I88bb13a5c5871775e4e7bf2608433df8a3d348e6	2018-09-28 11:03:11 -07:00
Nicolas Lacasse	b709d23987	Forward ioctl(TCSETSF) calls on host ttys to the host kernel. We already forward TCSETS and TCSETSW. TCSETSF is roughly equivalent but discards pending input. The filters were relaxed to allow host ioctls with TCSETSF argument. This fixes programs like "passwd" that prevent user input from being displayed on the terminal. Before: root@b8a0240fc836:/# passwd Enter new UNIX password: 123 Retype new UNIX password: 123 passwd: password updated successfully After: root@ae6f5dabe402:/# passwd Enter new UNIX password: Retype new UNIX password: passwd: password updated successfully PiperOrigin-RevId: 214869788 Change-Id: I31b4d1373c1388f7b51d0f2f45ce40aa8e8b0b58	2018-09-27 18:17:38 -07:00
Nicolas Lacasse	fd222d62ed	Short-circuit Readdir calls on overlay files when the dirent is frozen. If we have an overlay file whose corresponding Dirent is frozen, then we should not bother calling Readdir on the upper or lower files, since DirentReaddir will calculate children based on the frozen Dirent tree. A test was added that fails without this change. PiperOrigin-RevId: 213531215 Change-Id: I4d6c98f1416541a476a34418f664ba58f936a81d	2018-09-18 15:42:22 -07:00
Ian Gudger	ab6fa44588	Allow kernel.(*Task).Block to accept an extract only channel PiperOrigin-RevId: 213328293 Change-Id: I4164133e6f709ecdb89ffbb5f7df3324c273860a	2018-09-17 13:35:54 -07:00
Michael Pratt	3aa50f18a4	Reuse readlink parameter, add sockaddr max. PiperOrigin-RevId: 213058623 Change-Id: I522598c655d633b9330990951ff1c54d1023ec29	2018-09-14 16:00:02 -07:00
Nicolas Lacasse	b84bfa570d	Make gVisor hard link check match Linux's. Linux permits hard-linking if the target is owned by the user OR the target has Read+Write permission. PiperOrigin-RevId: 213024613 Change-Id: If642066317b568b99084edd33ee4e8822ec9cbb3	2018-09-14 12:29:46 -07:00
Nicolas Lacasse	9751b800a6	runsc: Support multi-container exec. We must use a context.Context with a Root Dirent that corresponds to the container's chroot. Previously we were using the root context, which does not have a chroot. Getting the correct context required refactoring some of the path-lookup code. We can't lookup the path without a context.Context, which requires kernel.CreateProcArgs, which we only get inside control.Execute. So we have to do the path lookup much later than we previously were. PiperOrigin-RevId: 212064734 Change-Id: I84a5cfadacb21fd9c3ab9c393f7e308a40b9b537	2018-09-07 17:39:54 -07:00
Fabricio Voznika	41b56696c4	Imported FD in exec was leaking Imported file needs to be closed after it's been imported. PiperOrigin-RevId: 211732472 Change-Id: Ia9249210558b77be076bcce465b832a22eed301f	2018-09-05 18:07:11 -07:00
Michael Pratt	3944cb41cb	/proc/PID/mounts is not tab-delimited PiperOrigin-RevId: 211513847 Change-Id: Ib484dd2d921c3e5d70d0e410cd973d3bff4f6b73	2018-09-04 13:29:49 -07:00
Adin Scannell	c09f9acd7c	Distinguish Element and Linker for ilist. Furthermore, allow for the specification of an ElementMapper. This allows a single "Element" type to exist on multiple inline lists, and work without having to embed the entry type. This is a requisite change for supporting a per-Inode list of Dirents. PiperOrigin-RevId: 211467497 Change-Id: If2768999b43e03fdaecf8ed15f435fe37518d163	2018-09-04 09:19:11 -07:00
Jamie Liu	b935311e23	Do not use fs.FileOwnerFromContext in fs/proc.file.UnstableAttr(). From //pkg/sentry/context/context.go: // - It is not safe to retain a Context passed to a function beyond the scope // of that function call. Passing a stored kernel.Task as a context.Context to fs.FileOwnerFromContext violates this requirement. PiperOrigin-RevId: 211143021 Change-Id: I4c5b02bd941407be4c9cfdbcbdfe5a26acaec037	2018-08-31 14:17:56 -07:00
Nicolas Lacasse	8bfb5fa919	fs: Add empty dir at /sys/class/power_supply. PiperOrigin-RevId: 210953512 Change-Id: I07d2d7fb0d268aa8eca26d81ef28b5b5c42289ee	2018-08-30 12:01:27 -07:00
Nicolas Lacasse	956fe64ad6	fs: Fix renameMu lock recursion. dirent.walk() takes renameMu, but is often called with renameMu already held, which can lead to a deadlock. Fix this by requiring renameMu to be held for reading when dirent.walk() is called. This causes walks and existence checks to block while a rename operation takes place, but that is what we were already trying to enforce by taking renameMu in walk() anyways. PiperOrigin-RevId: 210760780 Change-Id: Id61018e6e4adbeac53b9c1b3aa24ab77f75d8a54	2018-08-29 11:47:01 -07:00
Nicolas Lacasse	1893247616	fs: Drop reference to over-written file before renaming over it. dirent.go:Rename() walks to the file being replaced and defers replaced.DecRef(). After the rename, the reference is dropped, triggering a writeout and SettAttr call to the gofer. Because of lazyOpenForWrite, the gofer opens the replaced file BY ITS OLD NAME and calls ftruncate on it. This CL changes Remove to drop the reference on replaced (and thus trigger writeout) before the actual rename call. PiperOrigin-RevId: 210756097 Change-Id: I01ea09a5ee6c2e2d464560362f09943641638e0f	2018-08-29 11:22:27 -07:00
Nicolas Lacasse	3b11769c77	fs: Don't bother saving negative dirents. PiperOrigin-RevId: 210616454 Change-Id: I3f536e2b4d603e540cdd9a67c61b8ec3351f4ac3	2018-08-28 15:18:42 -07:00
Nicolas Lacasse	515d9bf43b	fs: Add tests for dirent ref counting with an overlay. PiperOrigin-RevId: 210614669 Change-Id: I408365ff6d6c7765ed7b789446d30e7079cbfc67	2018-08-28 15:09:17 -07:00
Zhaozhong Ni	d724863a31	sentry: optimize dirent weakref map save / restore. Weak references save / restore involves multiple interface indirection and cause material latency overhead when there are lots of dirents, each containing a weak reference map. The nil entries in the map should also be purged. PiperOrigin-RevId: 210593727 Change-Id: Ied6f4c3c0726fcc53a24b983d9b3a79121b6b758	2018-08-28 13:22:07 -07:00
Brian Geffon	f0492d45aa	Add /proc/sys/kernel/shm[all,max,mni]. PiperOrigin-RevId: 210459956 Change-Id: I51859b90fa967631e0a54a390abc3b5541fbee66	2018-08-27 17:21:37 -07:00
Nicolas Lacasse	0b3bfe2ea3	fs: Fix remote-revalidate cache policy. When revalidating a Dirent, if the inode id is the same, then we don't need to throw away the entire Dirent. We can just update the unstable attributes in place. If the inode id has changed, then the remote file has been deleted or moved, and we have no choice but to throw away the dirent we have a look up another. In this case, we may still end up losing a mounted dirent that is a child of the revalidated dirent. However, that seems appropriate here because the entire mount point has been pulled out from underneath us. Because gVisor's overlay is at the Inode level rather than the Dirent level, we must pass the parent Inode and name along with the Inode that is being revalidated. PiperOrigin-RevId: 210431270 Change-Id: I705caef9c68900234972d5aac4ae3a78c61c7d42	2018-08-27 14:26:29 -07:00
Zhaozhong Ni	bd01816c87	sentry: mark fsutil.DirFileOperations as savable. PiperOrigin-RevId: 210405166 Change-Id: I252766015885c418e914007baf2fc058fec39b3e	2018-08-27 11:55:32 -07:00
Kevin Krakauer	2524111fc6	runsc: Terminal resizing support. Implements the TIOCGWINSZ and TIOCSWINSZ ioctls, which allow processes to resize the terminal. This allows, for example, sshd to properly set the window size for ssh sessions. PiperOrigin-RevId: 210392504 Change-Id: I0d4789154d6d22f02509b31d71392e13ee4a50ba	2018-08-27 10:49:16 -07:00
Nicolas Lacasse	106de2182d	runsc: Terminal support for "docker exec -ti". This CL adds terminal support for "docker exec". We previously only supported consoles for the container process, but not exec processes. The SYS_IOCTL syscall was added to the default seccomp filter list, but only for ioctls that get/set winsize and termios structs. We need to allow these ioctl for all containers because it's possible to run "exec -ti" on a container that was started without an attached console, after the filters have been installed. Note that control-character signals are still not properly supported. Tested with: $ docker run --runtime=runsc -it alpine In another terminial: $ docker exec -it <containerid> /bin/sh PiperOrigin-RevId: 210185456 Change-Id: I6d2401e53a7697bb988c120a8961505c335f96d9	2018-08-24 17:43:21 -07:00
Nicolas Lacasse	c48708a041	fs: Drop unused WaitGroup in Dirent.destroy. PiperOrigin-RevId: 210182476 Change-Id: I655a2a801e2069108d30323f7f5ae76deb3ea3ec	2018-08-24 17:15:42 -07:00
Zhaozhong Ni	ba8f6ba8c8	sentry: mark idMapSeqHandle as savable. PiperOrigin-RevId: 209994384 Change-Id: I16186cf79cb4760a134f3968db30c168a5f4340e	2018-08-23 13:59:20 -07:00
Zhaozhong Ni	6b9133ba96	sentry: mark S/R stating errors as save rejections / fs corruptions. PiperOrigin-RevId: 209817767 Change-Id: Iddf2b8441bc44f31f9a8cf6f2bd8e7a5b824b487	2018-08-22 13:19:16 -07:00
Nicolas Lacasse	8d318aac55	fs: Hold Dirent.mu when calling Dirent.flush(). As required by the contract in Dirent.flush(). Also inline Dirent.freeze() into Dirent.Freeze(), since it is only called from there. PiperOrigin-RevId: 209783626 Change-Id: Ie6de4533d93dd299ffa01dabfa257c9cc259b1f4	2018-08-22 10:07:01 -07:00
Zhaozhong Ni	8bb50dab79	sentry: do not release gofer inode file state loading lock upon error. When an inode file state failed to load asynchronuously, we want to report the error instead of potentially panicing in another async loading goroutine incorrectly unblocked. PiperOrigin-RevId: 209683977 Change-Id: I591cde97710bbe3cdc53717ee58f1d28bbda9261	2018-08-21 16:52:27 -07:00
Nicolas Lacasse	0050e3e71c	sysfs: Add (empty) cpu directories for each cpu in /sys/devices/system/cpu. Numpy needs these. Also added the "present" directory, since the contents are the same as possible and online. PiperOrigin-RevId: 209451777 Change-Id: I2048de3f57bf1c57e9b5421d607ca89c2a173684	2018-08-20 11:19:15 -07:00
Chenggang Qin	aeec7a4c00	fs: Support possible and online knobs for cpu Some linux commands depend on /sys/devices/system/cpu/possible, such as 'lscpu'. Add 2 knobs for cpu: /sys/devices/system/cpu/possible /sys/devices/system/cpu/online Both the values are '0 - Kernel.ApplicationCores()-1'. Change-Id: Iabd8a4e559cbb630ed249686b92c22b4e7120663 PiperOrigin-RevId: 209070163	2018-08-16 16:28:14 -07:00
Nicolas Lacasse	e8a4f2e133	runsc: Change cache policy for root fs and volume mounts. Previously, gofer filesystems were configured with the default "fscache" policy, which caches filesystem metadata and contents aggressively. While this setting is best for performance, it means that changes from inside the sandbox may not be immediately propagated outside the sandbox, and vice-versa. This CL changes volumes and the root fs configuration to use a new "remote-revalidate" cache policy which tries to retain as much caching as possible while still making fs changes visible across the sandbox boundary. This cache policy is enabled by default for the root filesystem. The default value for the "--file-access" flag is still "proxy", but the behavior is changed to use the new cache policy. A new value for the "--file-access" flag is added, called "proxy-exclusive", which turns on the previous aggressive caching behavior. As the name implies, this flag should be used when the sandbox has "exclusive" access to the filesystem. All volume mounts are configured to use the new cache policy, since it is safest and most likely to be correct. There is not currently a way to change this behavior, but it's possible to add such a mechanism in the future. The configurability is a smaller issue for volumes, since most of the expensive application fs operations (walking + stating files) will likely served by the root fs. PiperOrigin-RevId: 208735037 Change-Id: Ife048fab1948205f6665df8563434dbc6ca8cfc9	2018-08-14 16:25:58 -07:00
Kevin Krakauer	d4939f6dc2	TTY: Fix data race where calls into tty.queue's waiter were not synchronized. Now, there's a waiter for each end (master and slave) of the TTY, and each waiter.Entry is only enqueued in one of the waiters. PiperOrigin-RevId: 208734483 Change-Id: I06996148f123075f8dd48cde5a553e2be74c6dce	2018-08-14 16:22:56 -07:00
Kevin Krakauer	12a4912aed	Fix `ls -laR \| wc -l` hanging. stat()-ing /proc/PID/fd/FD incremented but didn't decrement the refcount for FD. This behavior wasn't usually noticeable, but in the above case: - ls would never decrement the refcount of the write end of the pipe to 0. - This caused the write end of the pipe never to close. - wc would then hang read()-ing from the pipe. PiperOrigin-RevId: 208728817 Change-Id: I4fca1ba5ca24e4108915a1d30b41dc63da40604d	2018-08-14 15:49:58 -07:00
Nicolas Lacasse	66b0f3e15a	Fix bind() on overlays. InodeOperations.Bind now returns a Dirent which will be cached in the Dirent tree. When an overlay is in-use, Bind cannot return the Dirent created by the upper filesystem because the Dirent does not know about the overlay. Instead, overlayBind must create a new overlay-aware Inode and Dirent and return that. This is analagous to how Lookup and overlayLookup work. PiperOrigin-RevId: 208670710 Change-Id: I6390affbcf94c38656b4b458e248739b4853da29	2018-08-14 10:34:56 -07:00
Adin Scannell	dde836a918	Prevent renames across walk fast path. PiperOrigin-RevId: 208533436 Change-Id: Ifc1a4e2d6438a424650bee831c301b1ac0d670a3	2018-08-13 13:31:18 -07:00
Nicolas Lacasse	a2ec391dfb	fs: Allow overlays to revalidate files from the upper fs. Previously, an overlay would panic if either the upper or lower fs required revalidation for a given Dirent. Now, we allow revalidation from the upper file, but not the lower. If a cached overlay inode does need revalidation (because the upper needs revalidation), then the entire overlay Inode will be discarded and a new overlay Inode will be built with a fresh copy of the upper file. As a side effect of this change, Revalidate must take an Inode instead of a Dirent, since an overlay needs to revalidate individual Inodes. PiperOrigin-RevId: 208293638 Change-Id: Ic8f8d1ffdc09114721745661a09522b54420c5f1	2018-08-10 17:16:38 -07:00
Nicolas Lacasse	567c5eed11	cache policy: Check policy before returning a negative dirent. The cache policy determines whether Lookup should return a negative dirent, or just ENOENT. This CL fixes one spot where we returned a negative dirent without first consulting the policy. PiperOrigin-RevId: 208280230 Change-Id: I8f963bbdb45a95a74ad0ecc1eef47eff2092d3a4	2018-08-10 15:43:03 -07:00
Brielle Broder	4ececd8e8d	Enable checkpoint/restore in cases of UDS use. Previously, processes which used file-system Unix Domain Sockets could not be checkpoint-ed in runsc because the sockets were saved with their inode numbers which do not necessarily remain the same upon restore. Now, the sockets are also saved with their paths so that the new inodes can be determined for the sockets based on these paths after restoring. Tests for cases with UDS use are included. Test cleanup to come. PiperOrigin-RevId: 208268781 Change-Id: Ieaa5d5d9a64914ca105cae199fd8492710b1d7ec	2018-08-10 14:33:20 -07:00
Nicolas Lacasse	a38f41b464	fs: Add new cache policy "remote_revalidate". This CL adds a new cache-policy for gofer filesystems that uses the host page cache, but causes dirents to be reloaded on each Walk, and does not cache readdir results. This policy is useful when the remote filesystem may change out from underneath us, as any remote changes will be reflected on the next Walk. Importantly, this cache policy is only consistent if we do not use gVisor's internal page cache, since that page cache is tied to the Inode and may be thrown away upon Revalidation. This cache policy should only be used when the gofer supports donating host FDs, since then gVisor will make use of the host kernel page cache, which will be consistent for all open files in the gofer. In fact, a panic will be raised if a file is opened without a donated FD. PiperOrigin-RevId: 207752937 Change-Id: I233cb78b4695bbe00a4605ae64080a47629329b8	2018-08-07 11:43:41 -07:00

... 2 3 4 5 6 ...

412 Commits