gvisor

Commit Graph

Author	SHA1	Message	Date
Ayush Ranjan	1c9781a4ed	ext: vfs.FileDescriptionImpl and vfs.FilesystemImpl implementations. - This also gets rid of pipes for now because pipe does not have vfs2 specific support yet. - Added file path resolution logic. - Fixes testing infrastructure. - Does not include unit tests yet. PiperOrigin-RevId: 262213950	2019-08-07 14:23:42 -07:00
Kevin Krakauer	b6a5b950d2	Job control: controlling TTYs and foreground process groups. (Don't worry, this is mostly tests.) Implemented the following ioctls: - TIOCSCTTY - set controlling TTY - TIOCNOTTY - remove controlling tty, maybe signal some other processes - TIOCGPGRP - get foreground process group. Also enables tcgetpgrp(). - TIOCSPGRP - set foreground process group. Also enabled tcsetpgrp(). Next steps are to actually turn terminal-generated control characters (e.g. C^c) into signals to the proper process groups, and to send SIGTTOU and SIGTTIN when appropriate. PiperOrigin-RevId: 261387276	2019-08-02 14:05:48 -07:00
Nicolas Lacasse	aaaefdf9ca	Remove kernel.mounts. We can get the mount namespace from the CreateProcessArgs in all cases where we need it. This also gets rid of kernel.Destroy method, since the only thing it was doing was DecRefing the mounts. Removing the need to call kernel.SetRootMountNamespace also allowed for some more simplifications in the container fs setup code. PiperOrigin-RevId: 261357060	2019-08-02 11:23:11 -07:00
Nicolas Lacasse	f2b25aeac7	tmpfs and ramfs Dirs should drop references on children in Release(). This is the source of many warnings like: AtomicRefCount 0x7f5ff84e3500 owned by "fs.Inode" garbage collected with ref count of 1 (want 0) PiperOrigin-RevId: 261197093	2019-08-01 14:25:14 -07:00
Jamie Liu	a7d5e0d254	Cache pages in CachingInodeOperations.Read when memory evictions are delayed. PiperOrigin-RevId: 260851452	2019-07-30 20:32:29 -07:00
Ayush Ranjan	5afa642deb	ext: Migrate from using fileReader custom interface to using io.Reader. It gets rid of holding state of the io.Reader offset (which is held by the vfs.FileDescriptor struct anyway). It is also odd to use an io.Reader because we use io.ReaderAt to interact with the device, so making an io.ReaderAt wrapper makes more sense. Most importantly, it gets rid of the complexity of extracting the file reader from a regular file implementation and then using it. Now we can just use the regular file implementation as a reader, which is more intuitive. PiperOrigin-RevId: 260846927	2019-07-30 19:43:59 -07:00
Ayush Ranjan	9fbe984dc1	ext: block map file reader implementation. Also adds stress tests for block map reader and intensifies extent reader tests. PiperOrigin-RevId: 260838177	2019-07-30 18:20:31 -07:00
Zach Koopmans	e511c0e05f	Add feature to launch Sentry from an open host FD. Adds feature to launch from an open host FD instead of a binary_path. The FD should point to a valid executable and most likely be statically compiled. If the executable is not statically compiled, the loader will search along the interpreter paths, which must be able to be resolved in the Sandbox's file system or start will fail. PiperOrigin-RevId: 260756825	2019-07-30 11:20:40 -07:00
Ayush Ranjan	8da9f8a12c	Migrate from using io.ReadSeeker to io.ReaderAt. This provides the following benefits: - We can now use the pkg/fd package, which does not take ownership of the file descriptor, so it does not close the fd when garbage collected. This reduces the scope of errors from unexpected garbage collection of os.File. - It enforces the offset parameter in every read call. It does not affect the fd offset, nor is it affected by it, hence reducing the scope of errors from using stale offsets when reading. - We do not need to serialize the usage of any global file descriptor anymore, so this drops the mutual exclusion requirement, hence reducing complexity and contention. PiperOrigin-RevId: 260635174	2019-07-29 20:12:37 -07:00
Ayush Ranjan	ddf25e3331	ext: extent reader implementation. PiperOrigin-RevId: 260629559	2019-07-29 19:17:27 -07:00
Ayush Ranjan	b765eb4589	ext: inode implementations. PiperOrigin-RevId: 260624470	2019-07-29 18:33:55 -07:00
Fabricio Voznika	7052d21dc4	Automated rollback of changelist 255679453 PiperOrigin-RevId: 260047477	2019-07-25 16:48:49 -07:00
Ayush Ranjan	8376757495	ext: filesystem boilerplate code. PiperOrigin-RevId: 259865366	2019-07-24 19:08:21 -07:00
Ayush Ranjan	417096f781	ext: Add tests for root directory inode. PiperOrigin-RevId: 259856442	2019-07-24 17:59:57 -07:00
Ayush Ranjan	2ed832ff86	ext: testing environment setup with VFS2 support. PiperOrigin-RevId: 259835948	2019-07-24 16:03:30 -07:00
Ayush Ranjan	7e38d64333	ext: Inode creation logic. PiperOrigin-RevId: 259666476	2019-07-23 20:36:04 -07:00
Ayush Ranjan	d7bb79b6f1	ext: Add ext2 and ext3 tiny images. PiperOrigin-RevId: 259657917	2019-07-23 19:01:05 -07:00
Ayush Ranjan	bd7708956f	ext: Added extent tree building logic. PiperOrigin-RevId: 259628657	2019-07-23 15:51:50 -07:00
Michael Pratt	6f7e2bb388	Take copyMu in Revalidate copyMu is required to read child.overlay.upper. PiperOrigin-RevId: 258662209	2019-07-17 16:12:01 -07:00
Jamie Liu	2bc398bfd8	Separate O_DSYNC and O_SYNC. PiperOrigin-RevId: 258657913	2019-07-17 15:52:38 -07:00
Ayush Ranjan	84a59de5dc	ext: disklayout: extents support. PiperOrigin-RevId: 258657776	2019-07-17 15:48:58 -07:00
Ayush Ranjan	8e3e021aca	ext: Filesystem init implementation. PiperOrigin-RevId: 258645957	2019-07-17 14:48:04 -07:00
gVisor bot	682fd2d68f	Merge pull request #533 from kevinGC:stub-dev-tty PiperOrigin-RevId: 258607547	2019-07-17 11:28:30 -07:00
Michael Pratt	ca829158e3	Properly invalidate cache in rename and remove We were invalidating the wrong overlayEntry in rename and missing invalidation in rename and remove if lower exists. PiperOrigin-RevId: 258604685	2019-07-17 11:14:57 -07:00
gVisor bot	78a2704bde	Merge pull request #474 from zhuangel:proctasks PiperOrigin-RevId: 258479216	2019-07-16 18:12:07 -07:00
Jianfeng Tan	cf4fc510fd	Support /proc/net/dev This proc file reports the stats of interfaces. We could use ifconfig command to check the result. Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com> Change-Id: Ia7c1e637f5c76c30791ffda68ee61e861b6ef827 COPYBARA_INTEGRATE_REVIEW=https://gvisor-review.googlesource.com/c/gvisor/+/18282/ PiperOrigin-RevId: 258303936	2019-07-15 22:51:05 -07:00
gVisor bot	eff2c264a4	Merge pull request #282 from zhangningdlut:chris_test_proc PiperOrigin-RevId: 257855479	2019-07-12 13:11:01 -07:00
Kevin	ddef7f8078	Fix license year and remove Read.	2019-07-11 21:31:26 -07:00
Kevin	44427d8e26	Add a stub for /dev/tty. Actual implementation to follow, but this will satisfy applications that want it to just exist.	2019-07-11 21:24:27 -07:00
Ayush Ranjan	2eeca68900	Added tiny ext4 image. The image is of size 64Kb which supports 64 1k blocks and 16 inodes. This is the smallest size mkfs.ext4 works with. Added README.md documenting how this was created and included all files on the device under assets. PiperOrigin-RevId: 257712672	2019-07-11 17:17:47 -07:00
Ayush Ranjan	5242face2e	ext: boilerplate code. Renamed ext4 to ext since we are targeting ext(2/3/4). Removed fs.go since we are targeting VFS2. Added ext.go with filesystem struct. PiperOrigin-RevId: 257689775	2019-07-11 15:05:36 -07:00
Ayush Ranjan	7965b1272b	ext4: disklayout: Directory Entry implementation. PiperOrigin-RevId: 257314911	2019-07-09 18:36:02 -07:00
Nicolas Lacasse	659bebab8e	Don't try to execute a file that is not regular. PiperOrigin-RevId: 257037608	2019-07-08 12:56:48 -07:00
Ayush Ranjan	8f9b1ca8e7	ext4: disklayout: inode impl. PiperOrigin-RevId: 257010414	2019-07-08 10:44:11 -07:00
Adin Scannell	753da9604e	Remove map from fd_map, change to fd_table. This renames FDMap to FDTable and drops the kernel.FD type, which had an entire package to itself and didn't serve much use (it was freely cast between types, and served as more of an annoyance than providing any protection.) Based on BenchmarkFDLookupAndDecRef-12, we can expect 5-10 ns per lookup operation, and 10-15 ns per concurrent lookup operation of savings. This also fixes two tangential usage issues with the FDMap. Namely, non-atomic use of NewFDFrom and associated calls to Remove (that are both racy and fail to drop the reference on the underlying file.) PiperOrigin-RevId: 256285890	2019-07-02 19:28:59 -07:00
Ayush Ranjan	d8ec2fb671	Ext4: DiskLayout: Inode interface. PiperOrigin-RevId: 256234390	2019-07-02 14:04:31 -07:00
Ian Gudger	45566fa4e4	Add finalizer on AtomicRefCount to check for leaks. PiperOrigin-RevId: 255711454	2019-06-28 20:07:52 -07:00
Adin Scannell	7dae043fec	Drop ashmem and binder. These are unfortunately unused and unmaintained. They can be brought back in the future if need requires it. PiperOrigin-RevId: 255697132	2019-06-28 17:20:25 -07:00
Ayush Ranjan	c4da599e22	ext4: disklayout: SuperBlock interface implementations. PiperOrigin-RevId: 255687771	2019-06-28 16:18:29 -07:00
Nicolas Lacasse	295078fa7a	Automated rollback of changelist 255263686 PiperOrigin-RevId: 255679453	2019-06-28 15:28:41 -07:00
Ayush Ranjan	7c13789818	Superblock interface in the disk layout package for ext4. PiperOrigin-RevId: 255644277	2019-06-28 12:07:28 -07:00
Yong He	c61d7761b4	Fix deadloop in proc subtask list. Readdir of /proc/x/task/ gets directory entries from the tasks of the specified task group. The tasks slice is currently unsorted, so using sort.SearchInts to search for an entry in it may cause an infinite loop. The fix is to sort the slice before searching. The issue can be reproduced easily: revise Readdir in pkg/sentry/fs/proc/task.go to force taskInts to the test slice []int{1, 11, 7, 5, 10, 6, 8, 3, 9, 2, 4}, then run a docker image and run ls /proc/1/task; the command loops forever.	2019-06-28 22:20:57 +08:00
Fabricio Voznika	b2907595e5	Complete pipe support on overlayfs Get/Set pipe size and ioctl support were missing from overlayfs. It required moving the pipe.Sizer interface to fs so that overlay could get access. Fixes #318 PiperOrigin-RevId: 255511125	2019-06-27 17:22:53 -07:00
Michael Pratt	5b41ba5d0e	Fix various spelling issues in the documentation Addresses obvious typos, in the documentation only. COPYBARA_INTEGRATE_REVIEW=https://github.com/google/gvisor/pull/443 from Pixep:fix/documentation-spelling 4d0688164eafaf0b3010e5f4824b35d1e7176d65 PiperOrigin-RevId: 255477779	2019-06-27 14:25:50 -07:00
Michael Pratt	085a907565	Cache directory entries in the overlay. Currently, the overlay dirCache is only used for a single logical use of getdents, i.e., it is discarded when the FD is closed or seeked back to the beginning. But the initial work of getting the directory contents can be quite expensive (particularly sorting large directories), so we should keep it as long as possible. This is very similar to the readdirCache in fs/gofer. Since the upper filesystem does not have to allow caching readdir entries, the new CacheReaddir MountSourceOperations method controls this behavior. This caching should be trivially movable to all Inodes if desired, though that adds an additional copy step for non-overlay Inodes. (Overlay Inodes already do the extra copy). PiperOrigin-RevId: 255477592	2019-06-27 14:24:03 -07:00
Fabricio Voznika	42e212f6b7	Preserve permissions when checking lower The code was wrongly assuming that only read access was required from the lower overlay when checking for permissions. This allowed non-writable files to be writable in the overlay. Fixes #316 PiperOrigin-RevId: 255263686	2019-06-26 14:24:44 -07:00
Michael Pratt	e98ce4a2c6	Add TODO reminder to remove tmpfs caching options Updates #179 PiperOrigin-RevId: 255081565	2019-06-25 17:12:34 -07:00
Andrei Vagin	e9ea7230f7	fs: synchronize concurrent writes into files with O_APPEND. For files with O_APPEND, a file write operation gets the file size and uses it as the offset to call an inode write operation. This means that all other operations which can change the file size should be blocked until the write operation completes. PiperOrigin-RevId: 254873771	2019-06-24 17:45:02 -07:00
Rahat Mahmood	94a6bfab5d	Implement /proc/net/tcp. PiperOrigin-RevId: 254854346	2019-06-24 15:56:36 -07:00
chris.zn	f957fb23cf	Return ENOENT when reading /proc/{pid}/task of an exited process. Using getdents to read /proc/{pid}/task of an exited process causes a deadloop, like this: Process A is running. Process B: open /proc/{pid of A}/task. Process A exits. Process B: getdents /proc/{pid of A}/task. Process B then falls into a deadloop, returning "." and ".." endlessly. This patch returns ENOENT when getdents is used to read /proc/{pid}/task if the process has just exited. Signed-off-by: chris.zn <chris.zn@antfin.com>	2019-06-24 15:49:53 +08:00
Nicolas Lacasse	35719d52c7	Implement statx. We don't have the plumbing for btime yet, so that field is left off. The returned mask indicates that btime is absent. Fixes #343 PiperOrigin-RevId: 254575752	2019-06-22 13:29:26 -07:00
Andrei Vagin	ab6774cebf	gvisor/fs: getdents returns 0 if offset is equal to FileMaxOffset FileMaxOffset is a special case when lseek(d, 0, SEEK_END) has been called. PiperOrigin-RevId: 254498777	2019-06-21 17:25:17 -07:00
Ayush Ranjan	727375321f	ext4 block group descriptor implementation in disk layout package. PiperOrigin-RevId: 254482180	2019-06-21 15:42:46 -07:00
Michael Pratt	292f70cbf7	Add package docs to seqfile and ramfs These are the only packages missing docs: https://godoc.org/gvisor.dev/gvisor PiperOrigin-RevId: 254261022	2019-06-20 13:34:33 -07:00
Nicolas Lacasse	f7428af9c1	Add MountNamespace to task. This allows tasks to have distinct mount namespaces, instead of all sharing the kernel's root mount namespace. Currently, the only way for a task to get a different mount namespace than the kernel's root is by explicitly setting a different MountNamespace in CreateProcessArgs, and nothing does this (yet). In a follow-up CL, we will set CreateProcessArgs.MountNamespace when creating a new container inside runsc. Note that "MountNamespace" is a poor term for this thing. It's more like a distinct VFS tree. When we get around to adding real mount namespaces, this will need a better name. PiperOrigin-RevId: 254009310	2019-06-19 09:21:21 -07:00
Fabricio Voznika	ca245a428b	Attempt to fix TestPipeWritesAccumulate Test fails because it's reading 4KB instead of the expected 64KB. Changed the test to read pipe buffer size instead of hardcode and added some logging in case the reason for failure was not pipe buffer size. PiperOrigin-RevId: 253916040	2019-06-18 19:16:11 -07:00
Andrei Vagin	8ab0848c70	gvisor/fs: don't update file.offset for sockets, pipes, etc sockets, pipes and other non-seekable file descriptors don't use file.offset, so we don't need to update it. With this change, we will be able to call file operations without locking the file.mu mutex. This is already used for pipes in the splice system call. PiperOrigin-RevId: 253746644	2019-06-18 01:43:29 -07:00
Ian Gudger	3e9b8ecbfe	Plumb context through more layers of filesystem. All functions which allocate objects containing AtomicRefCounts will soon need a context. PiperOrigin-RevId: 253147709	2019-06-13 18:40:38 -07:00
Adin Scannell	add40fd6ad	Update canonical repository. This can be merged after: https://github.com/google/gvisor-website/pull/77 or https://github.com/google/gvisor-website/pull/78 PiperOrigin-RevId: 253132620	2019-06-13 16:50:15 -07:00
Fabricio Voznika	fc746efa9a	Add support to mount pod shared tmpfs mounts Parse annotations containing 'gvisor.dev/spec/mount' that gives hints about how mounts are shared between containers inside a pod. This information can be used to better inform how to mount these volumes inside gVisor. For example, a volume that is shared between containers inside a pod can be bind mounted inside the sandbox, instead of being two independent mounts. For now, this information is used to allow the same tmpfs mounts to be shared between containers which wasn't possible before. PiperOrigin-RevId: 252704037	2019-06-11 14:54:31 -07:00
Rahat Mahmood	a00157cc0e	Store more information in the kernel socket table. Store enough information in the kernel socket table to distinguish between different types of sockets. Previously we were only storing the socket family, but this isn't enough to classify sockets. For example, TCPv4 and UDPv4 sockets are both AF_INET, and ICMP sockets are SOCK_DGRAM sockets with a particular protocol. Instead of creating more sub-tables, flatten the socket table and provide a filtering mechanism based on the socket entry. Also generate and store a socket entry index ("sl" in linux) which allows us to output entries in a stable order from procfs. PiperOrigin-RevId: 252495895	2019-06-10 15:17:43 -07:00
Rahat Mahmood	315cf9a523	Use common definition of SockType. SockType isn't specific to unix domain sockets, and the current definition basically mirrors the linux ABI's definition. PiperOrigin-RevId: 251956740	2019-06-06 17:00:27 -07:00
Fabricio Voznika	02ab1f187c	Copy up parent when binding UDS on overlayfs. Overlayfs was expecting the parent to exist when bind(2) was called, which may not be the case. The fix is to copy the parent directory to the upper layer before binding the UDS. There is no good place to add tests for it. Syscall tests would be ideal, but it's hard to guarantee that the directory where the socket is created hasn't been touched before (and thus copied the parent to the upper layer). Added it to runsc integration tests for now. If it turns out we have lots of these kinds of tests, we can consider moving them somewhere more appropriate. PiperOrigin-RevId: 251954156	2019-06-06 16:45:51 -07:00
Rahat Mahmood	2d2831e354	Track and export socket state. This is necessary for implementing network diagnostic interfaces like /proc/net/{tcp,udp,unix} and sock_diag(7). For pass-through endpoints such as hostinet, we obtain the socket state from the backend. For netstack, we add explicit tracking of TCP states. PiperOrigin-RevId: 251934850	2019-06-06 15:04:47 -07:00
Michael Pratt	57772db2e7	Shutdown host sockets on internal shutdown This is required to make the shutdown visible to peers outside the sandbox. The readClosed / writeClosed fields were dropped, as they were preventing a shutdown socket from reading the remainder of queued bytes. The host syscalls will return the appropriate errors for shutdown. The control message tests have been split out of socket_unix.cc to make the (few) remaining tests accessible to testing inherited host UDS, which don't support sending control messages. Updates #273 PiperOrigin-RevId: 251763060	2019-06-05 18:40:37 -07:00
Michael Pratt	d3ed9baac0	Implement dumpability tracking and checks We don't actually support core dumps, but some applications want to get/set dumpability, which still has an effect in procfs. Lack of support for set-uid binaries or fs creds simplifies things a bit. As-is, processes started via CreateProcess (i.e., init and sentryctl exec) have normal dumpability. I'm a bit torn on whether sentryctl exec tasks should be dumpable, but at least since they have no parent normal UID/GID checks should protect them. PiperOrigin-RevId: 251712714	2019-06-05 14:00:13 -07:00
Yong He	7398f013f0	Drop one dirent reference after referenced by file. When a pipe is created, a dirent for the pipe is created and its initial reference count is set to 0. Because a dirent is only destroyed when its reference count drops to -1, there is already an 'initial reference' on the dirent after it is created. To destroy the dirent once all references are released, the correct way is to drop the 'initial reference' once someone holds a reference to the dirent, such as fs.NewFile; otherwise the dirent's reference count stays at 0 forever and the dirent leaks. Besides pipe, timerfd/eventfd/epoll have the same problem. Here is a simple case in C that leaks dirents for pipe/timerfd/eventfd/epoll; after running it, pprof the runsc process and you will find lots of pipe/timerfd/eventfd/epoll dirents not freed: int main(int argc, char *argv[]) { int i; int n; int pipefd[2]; if (argc != 3) { printf("Usage: %s epoll|timerfd|eventfd|pipe <iterations>\n", argv[0]); } n = strtol(argv[2], NULL, 10); if (strcmp(argv[1], "epoll") == 0) { for (i = 0; i < n; ++i) close(epoll_create(1)); } else if (strcmp(argv[1], "timerfd") == 0) { for (i = 0; i < n; ++i) close(timerfd_create(CLOCK_REALTIME, 0)); } else if (strcmp(argv[1], "eventfd") == 0) { for (i = 0; i < n; ++i) close(eventfd(0, 0)); } else if (strcmp(argv[1], "pipe") == 0) { for (i = 0; i < n; ++i) if (pipe(pipefd) == 0) { close(pipefd[0]); close(pipefd[1]); } } printf("%s %s test finished\r\n",argv[1],argv[2]); return 0; } Change-Id: Ia1b8a1fb9142edb00c040e44ec644d007f81f5d2 PiperOrigin-RevId: 251531096	2019-06-04 15:40:23 -07:00
Andrei Vagin	90a116890f	gvisor/sock/unix: pass creds when a message is sent between unconnected sockets and don't report a sender address if it doesn't have one PiperOrigin-RevId: 251371284	2019-06-03 21:48:19 -07:00
Andrei Vagin	00f8663887	gvisor/fs: return a proper error from FileWriter.Write in case of a short-write The io.Writer contract requires that Write writes all available bytes and does not return short writes. This causes errors with io.Copy, since our own Write interface does not have this same contract. PiperOrigin-RevId: 251368730	2019-06-03 21:26:01 -07:00
Nicolas Lacasse	6f73d79c32	Simplify overlayBoundEndpoint. There is no reason to do the recursion manually, since Inode.BoundEndpoint will do it for us. PiperOrigin-RevId: 250794903	2019-05-30 17:20:20 -07:00
chris.zn	b18df9bed6	Add VmData field to /proc/{pid}/status VmData is the size of private data segments. It has the same meaning as in Linux. Change-Id: Iebf1ae85940a810524a6cde9c2e767d4233ddb2a PiperOrigin-RevId: 250593739	2019-05-30 12:07:40 -07:00
Adin Scannell	2165b77774	Remove obsolete bug. The original bug is no longer relevant, and the FIXME here contains lots of obsolete information. PiperOrigin-RevId: 249924036	2019-05-30 12:03:39 -07:00
Adin Scannell	ed5793808e	Remove obsolete TODO. We don't need to model internal interfaces after the system call interfaces (which are objectively worse and simply use a flag to distinguish between two logically different operations). PiperOrigin-RevId: 249916814 Change-Id: I45d02e0ec0be66b782a685b1f305ea027694cab9	2019-05-24 16:18:09 -07:00
Andrei Vagin	a949133c4b	gvisor: interrupt the sendfile system call if a task has been interrupted sendfile can be called for a big range and it can require significant amount of time to process it, so we need to handle task interrupts in this system call. PiperOrigin-RevId: 249781023 Change-Id: Ifc2ec505d74c06f5ee76f93b8d30d518ec2d4015	2019-05-23 23:21:13 -07:00
Ayush Ranjan	6240abb205	Added boilerplate code for ext4 fs. Initialized BUILD with license Mount is still unimplemented and is not meant to be part of this CL. Rest of the fs interface is implemented. Referenced the Linux kernel appropriately when needed PiperOrigin-RevId: 249741997 Change-Id: Id1e4c7c9e68b3f6946da39896fc6a0c3dcd7f98c	2019-05-23 16:55:42 -07:00
Fabricio Voznika	9006304dfe	Initial support for bind mounts Separate MountSource from Mount. This is needed to allow mounts to be shared by multiple containers within the same pod. PiperOrigin-RevId: 249617810 Change-Id: Id2944feb7e4194951f355cbe6d4944ae3c02e468	2019-05-23 04:16:10 -07:00
Adin Scannell	21915eb58b	Remove obsolete TODO. There is no obvious reason to require that BlockSize and StatFS are MountSource operations. Today they are in INodeOperations, and they can be moved elsewhere in the future as part of a normal refactor process. PiperOrigin-RevId: 249549982 Change-Id: Ib832e02faeaf8253674475df4e385bcc53d780f3	2019-05-22 17:00:36 -07:00
Adin Scannell	9cdae51fec	Add basic plumbing for splice and stub implementation. This does not actually implement an efficient splice or sendfile. Rather, it adds a generic plumbing to the file internals so that this can be added. All file implementations use the stub fileutil.NoSplice implementation, which causes sendfile and splice to fall back to an internal copy. A basic splice system call interface is added, along with a test. PiperOrigin-RevId: 249335960 Change-Id: Ic5568be2af0a505c19e7aec66d5af2480ab0939b	2019-05-21 15:18:12 -07:00
Neel Natu	adeb99709b	Remove unused struct member. Remove unused struct member. PiperOrigin-RevId: 249300446 Change-Id: Ifb16538f684bc3200342462c3da927eb564bf52d	2019-05-21 12:20:19 -07:00
Michael Pratt	80cc2c78e5	Forward named pipe creation to the gofer The backing 9p server must allow named pipe creation, which the runsc fsgofer currently does not. There are small changes to the overlay here. GetFile may block when opening a named pipe, which can cause a deadlock: 1. open(O_RDONLY) -> copyMu.Lock() -> GetFile() 2. open(O_WRONLY) -> copyMu.Lock() -> Deadlock A named pipe usable for writing must already be on the upper filesystem, but we are still taking copyMu for write when checking for upper. That can be changed to a read lock to fix the common case. However, a named pipe on the lower filesystem would still deadlock in open(O_WRONLY) when it tries to actually perform copy up (which would simply return EINVAL). Move the copy up type check before taking copyMu for write to avoid this. p9 must be modified, as it was incorrectly removing the file mode when sending messages on the wire. PiperOrigin-RevId: 249154033 Change-Id: Id6637130e567b03758130eb6c7cdbc976384b7d6	2019-05-20 16:53:08 -07:00
Michael Pratt	6588427451	Fix incorrect tmpfs timestamp updates * Creation of files, directories (and other fs objects) in a directory should always update ctime. * Same for removal. * atime should not be updated on lookup, only readdir. I've also renamed some misleading functions that update mtime and ctime. PiperOrigin-RevId: 249115063 Change-Id: I30fa275fa7db96d01aa759ed64628c18bb3a7dc7	2019-05-20 13:35:17 -07:00
Michael Pratt	4a842836e5	Return EPERM for mknod This more directly matches what Linux does with unsupported nodes. PiperOrigin-RevId: 248780425 Change-Id: I17f3dd0b244f6dc4eb00e2e42344851b8367fbec	2019-05-17 13:47:40 -07:00
Michael Pratt	04105781ad	Fix gofer rename ctime and cleanup stat_times test There is a lot of redundancy that we can simplify in the stat_times test. This will make it easier to add new tests. However, the simplification reveals that cached uattrs on goferfs don't properly update ctime on rename. PiperOrigin-RevId: 248773425 Change-Id: I52662728e1e9920981555881f9a85f9ce04041cf	2019-05-17 13:05:47 -07:00
Andrei Vagin	2105158d4b	gofer: don't call hostfile.Close if hostFile is nil PiperOrigin-RevId: 248437159 Change-Id: Ife71f6ca032fca59ec97a82961000ed0af257101	2019-05-15 17:21:10 -07:00
Nicolas Lacasse	dd153c014d	Start of support for /proc/pid/cgroup file. PiperOrigin-RevId: 248263378 Change-Id: Ic057d2bb0b6212110f43ac4df3f0ac9bf931ab98	2019-05-14 20:34:50 -07:00
Michael Pratt	330a1bbd04	Remove false comment PiperOrigin-RevId: 248249285 Change-Id: I9b6d267baa666798b22def590ff20c9a118efd47	2019-05-14 18:06:14 -07:00
Fabricio Voznika	1bee43be13	Implement fallocate(2) Closes #225 PiperOrigin-RevId: 247508791 Change-Id: I04f47cf2770b30043e5a272aba4ba6e11d0476cc	2019-05-09 15:35:49 -07:00
Nicolas Lacasse	bfd9f75ba4	Set the FilesystemType in MountSource from the Filesystem. And stop storing the Filesystem in the MountSource. This allows us to decouple the MountSource filesystem type from the name of the filesystem. PiperOrigin-RevId: 247292982 Change-Id: I49cbcce3c17883b7aa918ba76203dfd6d1b03cc8	2019-05-08 14:35:06 -07:00
Fabricio Voznika	e5432fa1b3	Remove defers from gofer.contextFile Most are single line methods in hot paths. PiperOrigin-RevId: 247050267 Change-Id: I428d78723fe00b57483185899dc8fa9e1f01e2ea	2019-05-07 10:55:09 -07:00
Andrei Vagin	24d8656585	gofer: don't leak file descriptors Fixes #219 PiperOrigin-RevId: 246568639 Change-Id: Ic7afd15dde922638d77f6429c508d1cbe2e4288a	2019-05-03 14:01:50 -07:00
Michael Pratt	23ca9886c6	Update reference to old type PiperOrigin-RevId: 246036806 Change-Id: I5554a43a1f8146c927402db3bf98488a2da0fbe7	2019-04-30 15:42:39 -07:00
Jamie Liu	8bfb83d0ac	Implement async MemoryFile eviction, and use it in CachingInodeOperations. This feature allows MemoryFile to delay eviction of "optional" allocations, such as unused cached file pages. Note that this incidentally makes CachingInodeOperations writeback asynchronous, in the sense that it doesn't occur until eviction; this is necessary because between when a cached page becomes evictable and when it's evicted, file writes (via CachingInodeOperations.Write) may dirty the page. As currently implemented, this feature won't meaningfully impact steady-state memory usage or caching; the reclaimer goroutine will schedule eviction as soon as it runs out of other work to do. Future CLs increase caching by adding constraints on when eviction is scheduled. PiperOrigin-RevId: 246014822 Change-Id: Ia85feb25a2de92a48359eb84434b6ec6f9bea2cb	2019-04-30 13:56:41 -07:00
Ian Gudger	81ecd8b6ea	Implement the MSG_CTRUNC msghdr flag for Unix sockets. Updates google/gvisor#206 PiperOrigin-RevId: 245880573 Change-Id: Ifa715e98d47f64b8a32b04ae9378d6cd6bd4025e	2019-04-29 21:21:08 -07:00
Michael Pratt	4d52a55201	Change copyright notice to "The gVisor Authors" Based on the guidelines at https://opensource.google.com/docs/releasing/authors/. 1. $ rg -l "Google LLC" | xargs sed -i 's/Google LLC.*/The gVisor Authors./' 2. Manual fixup of "Google Inc" references. 3. Add AUTHORS file. Authors may request to be added to this file. 4. Point netstack AUTHORS to gVisor AUTHORS. Drop CONTRIBUTORS. Fixes #209 PiperOrigin-RevId: 245823212 Change-Id: I64530b24ad021a7d683137459cafc510f5ee1de9	2019-04-29 14:26:23 -07:00
Nicolas Lacasse	f4ce43e1f4	Allow and document bug ids in gVisor codebase. PiperOrigin-RevId: 245818639 Change-Id: I03703ef0fb9b6675955637b9fe2776204c545789	2019-04-29 14:04:14 -07:00
Kevin Krakauer	5f13338d30	Fix reference counting bug in /proc/PID/fdinfo/. PiperOrigin-RevId: 245452217 Change-Id: I7164d8f57fe34c17e601079eb9410a6d95af1869	2019-04-26 11:09:55 -07:00
Jamie Liu	6b76c172b4	Don't enforce NAME_MAX in fs.Dirent.walk(). Maximum filename length is filesystem-dependent, and obtained via statfs::f_namelen. This limit is usually 255 bytes (NAME_MAX), but not always. For example, VFAT supports filenames of up to 255... UCS-2 characters, which Linux conservatively takes to mean UTF-8-encoded bytes: fs/fat/inode.c:fat_statfs(), FAT_LFN_LEN * NLS_MAX_CHARSET_SIZE. As a result, Linux's VFS does not enforce NAME_MAX: $ rg --maxdepth=1 '\WNAME_MAX\W' fs/ include/linux/ fs/libfs.c 38: buf->f_namelen = NAME_MAX; 64: if (dentry->d_name.len > NAME_MAX) include/linux/relay.h 74: char base_filename[NAME_MAX]; /* saved base filename */ include/linux/fscrypt.h 149: filenames up to NAME_MAX bytes, since base64 encoding expands the length. include/linux/exportfs.h 176: * understanding that it is already pointing to a a %NAME_MAX+1 sized Remove this check from core VFS, and add it to ramfs (and by extension tmpfs), where it is actually applicable: mm/shmem.c:shmem_dir_inode_operations.lookup == simple_lookup does enforce NAME_MAX. PiperOrigin-RevId: 245324748 Change-Id: I17567c4324bfd60e31746a5270096e75db963fac	2019-04-25 16:05:13 -07:00
Michael Pratt	d6aac9387f	Fix doc typo PiperOrigin-RevId: 244773890 Change-Id: I2d0cd7789771276ba545b38efff6d3e24133baaa	2019-04-22 18:22:19 -07:00
Fabricio Voznika	c8cee7108f	Use FD limit and file size limit from host. The FD limit and file size limit are read from the host, instead of using hard-coded defaults, given that they affect the sandbox process. Also limit the direct cache to use no more than half of the available FDs. PiperOrigin-RevId: 244050323 Change-Id: I787ad0fdf07c49d589e51aebfeae477324fe26e6	2019-04-17 12:57:40 -07:00
Jamie Liu	4209edafb6	Use open fids when fstat()ing gofer files. PiperOrigin-RevId: 243018347 Change-Id: I1e5b80607c1df0747482abea61db7fcf24536d37	2019-04-11 00:43:04 -07:00
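
The c61d7761b4 entry above (Yong He) turns on a precondition of Go's `sort.SearchInts`: it performs a binary search and only gives meaningful results on a slice sorted in ascending order. The following is a minimal sketch of the "sort before search" fix described in that message, not the actual gVisor code. The helper name `nextIndex` is hypothetical; the sample slice is the test slice quoted in the commit message.

```go
package main

import (
	"fmt"
	"sort"
)

// nextIndex is a hypothetical helper (not from gVisor). It returns a sorted
// copy of ids and the index of the first entry that is >= offset.
// sort.SearchInts requires ascending order, so the copy is sorted first.
func nextIndex(ids []int, offset int) ([]int, int) {
	sorted := append([]int(nil), ids...) // copy, so the caller's slice is untouched
	sort.Ints(sorted)
	return sorted, sort.SearchInts(sorted, offset)
}

func main() {
	// Unsorted slice taken from the reproduction steps in the commit message.
	ids := []int{1, 11, 7, 5, 10, 6, 8, 3, 9, 2, 4}
	sorted, i := nextIndex(ids, 7)
	fmt.Println(sorted[i:]) // entries still to be returned, starting at 7
}
```

Sorting a copy keeps the sketch free of side effects on the caller; code that owns the slice could sort it in place instead.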


348 Commits