gvisor

Commit Graph

Author	SHA1	Message	Date
Kevin Krakauer	2a82d5ad68	Reorder BUILD license and load functions in gvisor. PiperOrigin-RevId: 275139066	2019-10-16 16:40:30 -07:00
Fabricio Voznika	9fb562234e	Fix problem with open FD when copy up is triggered in overlayfs Linux kernel before 4.19 doesn't implement a feature that updates open FD after a file is open for write (and is copied to the upper layer). Already open FD will continue to read the old file content until they are reopened. This is especially problematic for gVisor because it caches open files. Flag was added to force readonly files to be reopened when the same file is open for write. This is only needed if using kernels prior to 4.19. Closes #1006 It's difficult to really test this because we never run tests on older kernels. I'm adding a test in GKE which uses kernels with the overlayfs problem for 1.14 and lower. PiperOrigin-RevId: 275115289	2019-10-16 15:06:24 -07:00
Nicolas Lacasse	fd4e436002	Support O_SYNC and O_DSYNC flags. When any of these flags are set, all writes will trigger a subsequent fsync call. This behavior already existed for "write-through" mounts. O_DIRECT is treated as an alias for O_SYNC. Better support coming soon. PiperOrigin-RevId: 275114392	2019-10-16 15:01:23 -07:00
gVisor bot	d22f0534c0	Merge pull request #736 from tanjianfeng:fix-unix PiperOrigin-RevId: 275114157	2019-10-16 14:41:43 -07:00
Kevin Krakauer	1de0cf3563	Remove unnecessary context parameter for new pipes. PiperOrigin-RevId: 273421634	2019-10-07 18:16:14 -07:00
Nicolas Lacasse	f24c3188b5	Add sanity check that overlayCreate is called with an overlay parent inode. PiperOrigin-RevId: 272987037	2019-10-04 17:03:50 -07:00
gVisor bot	cde7711837	Merge pull request #865 from tanjianfeng:fix-829 PiperOrigin-RevId: 272522508	2019-10-02 14:51:04 -07:00
Andrei Vagin	2016cc283c	fs/proc: report PID-s from a pid namespace of the proc mount Right now, we can find more than one process with the 1 PID in /proc. $ for i in `seq 10`; do > unshare -fp sleep 1000 & > done $ ls /proc 1 1 1 1 12 18 24 29 6 loadavg net sys version 1 1 1 1 16 20 26 32 cpuinfo meminfo self thread-self 1 1 1 1 17 21 28 36 filesystems mounts stat uptime PiperOrigin-RevId: 272506593	2019-10-02 13:29:42 -07:00
Michael Pratt	dd69b49ed1	Disable cpuClockTicker when app is idle Kernel.cpuClockTicker increments kernel.cpuClock, which tasks use as a clock to track their CPU usage. This improves latency in the syscall path by avoiding expensive monotonic clock calls on every syscall entry/exit. However, this timer fires every 10ms. Thus, when all tasks are idle (i.e., blocked or stopped), this forces a sentry wakeup every 10ms, when we may otherwise be able to sleep until the next app-relevant event. These wakeups cause the sentry to utilize approximately 2% CPU when the application is otherwise idle. Updates to clock are not strictly necessary when the app is idle, as there are no readers of cpuClock. This commit reduces idle CPU by disabling the timer when tasks are completely idle, and computing its effects at the next wakeup. Rather than disabling the timer as soon as the app goes idle, we wait until the next tick, which provides a window for short sleeps to sleep and wake up without doing the (relatively) expensive work of disabling and enabling the timer. PiperOrigin-RevId: 272265822	2019-10-01 12:21:01 -07:00
Andrei Vagin	7a234f736f	splice: try another fallback option only if the previous one isn't supported Reported-by: syzbot+bb5ed342be51d39b0cbb@syzkaller.appspotmail.com PiperOrigin-RevId: 272110815	2019-09-30 18:23:42 -07:00
Nicolas Lacasse	3ad17ff597	Force timestamps to update when set via InodeOperations.SetTimestamps. The gofer's CachingInodeOperations implementation contains an optimization for the common open-read-close pattern when we have a host FD. In this case, the host kernel will update the timestamp for us to a reasonably close time, so we don't need an extra RPC to the gofer. However, when the app explicitly sets the timestamps (via futimes or similar) then we actually DO need to update the timestamps, because the host kernel won't do it for us. To fix this, a new boolean `forceSetTimestamps` was added to CachingInodeOperations.SetMaskedAttributes. It is only set by gofer.InodeOperations.SetTimestamps. PiperOrigin-RevId: 272048146	2019-09-30 13:08:45 -07:00
henry.tjf	bc9de939fd	tty: fix sending SIGTTOU on tty write How to reproduce: $ echo "timeout 10 ls" > foo.sh $ chmod +x foo.sh $ ./foo.sh (will hang here for 10 secs, and the output of ls does not show) When the "ls" process writes to stdout, it receives the SIGTTOU signal and hangs there until the "timeout" process times out and kills the "ls" process. The expected result is: "ls" writes its output into tty, and terminates immediately, then the "timeout" process receives SIGCHLD and terminates. The reason for this failure is that we missed the check for TOSTOP (if set, background processes will receive the SIGTTOU signal when they do write). We use drivers/tty/n_tty.c:n_tty_write() as a reference. Fixes: #862 Reported-by: chris.zn <chris.zn@antfin.com> Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com> Signed-off-by: chenglang.hy <chenglang.hy@antfin.com>	2019-09-24 14:18:22 +00:00
Jianfeng Tan	329b6653ff	Implement /proc/net/tcp6 Fixes: #829 Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com> Signed-off-by: Jielong Zhou <jielong.zjl@antfin.com>	2019-09-20 17:20:08 +00:00
Kevin Krakauer	0a8a75f3da	Job control: controlling TTYs and foreground process groups. Addresses a deadlock with the rolled back change: `b6a5b950d2` Creating a session from an orphaned process group was causing a lock to be acquired twice by a single goroutine. This behavior is addressed, and a test (OrphanRegression) has been added to pty.cc. Implemented the following ioctls: - TIOCSCTTY - set controlling TTY - TIOCNOTTY - remove controlling tty, maybe signal some other processes - TIOCGPGRP - get foreground process group. Also enables tcgetpgrp(). - TIOCSPGRP - set foreground process group. Also enabled tcsetpgrp(). Next steps are to actually turn terminal-generated control characters (e.g. C^c) into signals to the proper process groups, and to send SIGTTOU and SIGTTIN when appropriate. PiperOrigin-RevId: 270088599	2019-09-19 11:36:47 -07:00
Andrei Vagin	239a07aabf	gvisor: return ENOTDIR from the unlink syscall ENOTDIR has to be returned when a component used as a directory in pathname is not, in fact, a directory. PiperOrigin-RevId: 269037893	2019-09-13 21:44:57 -07:00
Adin Scannell	7c6ab6a219	Implement splice methods for pipes and sockets. This also allows the tee(2) implementation to be enabled, since dup can now be properly supported via WriteTo. Note that this change necessitated some minor restructuring with the fs.FileOperations splice methods. If the *fs.File is passed through directly, then only public API methods are accessible, which will deadlock immediately since the locking is already done by fs.Splice. Instead, we pass through an abstract io.Reader or io.Writer, which elide locks and use the underlying fs.FileOperations directly. PiperOrigin-RevId: 268805207	2019-09-12 17:43:27 -07:00
Michael Pratt	df5d377521	Remove go_test from go_stateify and go_marshal They are no-ops, so the standard rule works fine. PiperOrigin-RevId: 268776264	2019-09-12 15:10:17 -07:00
Jamie Liu	0352cf5866	Remove support for non-incremental mapped accounting. PiperOrigin-RevId: 266496644	2019-08-30 19:06:55 -07:00
Bhasker Hariharan	54bf2e8eff	Automated rollback of changelist 261387276 PiperOrigin-RevId: 266491264	2019-08-30 18:15:32 -07:00
Rahat Mahmood	863e11ac4d	Implement /proc/net/udp. PiperOrigin-RevId: 266229756	2019-08-29 14:30:41 -07:00
Jamie Liu	36a8949b2a	Add limit_host_fd_translation Gofer mount option. PiperOrigin-RevId: 266177409	2019-08-29 14:01:03 -07:00
Fabricio Voznika	c39564332b	Mount volumes as super user This used to be the case, but regressed after a recent change. Also made a few fixes around it and clean up the code a bit. Closes #720 PiperOrigin-RevId: 265717496	2019-08-27 10:47:16 -07:00
Jianfeng Tan	2c3e2ed2bf	unix: return ECONNRESET if peer closed with data not read For SOCK_STREAM type unix socket, we shall return ECONNRESET if peer is closed with data not read. We explicitly set a flag when closing one end, to differentiate from just shutdown (where zero shall be returned). Fixes: #735 Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com>	2019-08-22 15:25:38 +00:00
Tamir Duberstein	d81d94ac4c	Replace uintptr with int64 when returning lengths This is in accordance with newer parts of the standard library. PiperOrigin-RevId: 263449916	2019-08-14 16:05:56 -07:00
Fabricio Voznika	0e907c4298	Fix file mode check in pipeOperations PiperOrigin-RevId: 263203441	2019-08-13 13:33:33 -07:00
Ayush Ranjan	c8961a6cbd	ext: Move to pkg/sentry/fsimpl. fsimpl is the keeper of all filesystem implementations in VFS2. PiperOrigin-RevId: 262617869	2019-08-09 13:08:28 -07:00
Ayush Ranjan	690308111c	ext: Benchmark tests. Added benchmark tests which emulate memfs benchmarks. Stat benchmarks BenchmarkVFS2Ext4fsStat/1-12 10000000 145 ns/op BenchmarkVFS2Ext4fsStat/2-12 10000000 170 ns/op BenchmarkVFS2Ext4fsStat/3-12 10000000 202 ns/op BenchmarkVFS2Ext4fsStat/8-12 3000000 374 ns/op BenchmarkVFS2Ext4fsStat/64-12 500000 2159 ns/op BenchmarkVFS2Ext4fsStat/100-12 300000 3459 ns/op BenchmarkVFS1TmpfsStat/1-12 5000000 348 ns/op BenchmarkVFS1TmpfsStat/2-12 3000000 487 ns/op BenchmarkVFS1TmpfsStat/3-12 2000000 655 ns/op BenchmarkVFS1TmpfsStat/8-12 1000000 1365 ns/op BenchmarkVFS1TmpfsStat/64-12 200000 9565 ns/op BenchmarkVFS1TmpfsStat/100-12 100000 15158 ns/op BenchmarkVFS2MemfsStat/1-12 10000000 133 ns/op BenchmarkVFS2MemfsStat/2-12 10000000 155 ns/op BenchmarkVFS2MemfsStat/3-12 10000000 182 ns/op BenchmarkVFS2MemfsStat/8-12 5000000 310 ns/op BenchmarkVFS2MemfsStat/64-12 1000000 1659 ns/op BenchmarkVFS2MemfsStat/100-12 500000 2787 ns/op Mount Stat benchmarks BenchmarkVFS2ExtfsMountStat/1-12 5000000 245 ns/op BenchmarkVFS2ExtfsMountStat/2-12 5000000 266 ns/op BenchmarkVFS2ExtfsMountStat/3-12 5000000 304 ns/op BenchmarkVFS2ExtfsMountStat/8-12 3000000 456 ns/op BenchmarkVFS2ExtfsMountStat/64-12 500000 2308 ns/op BenchmarkVFS2ExtfsMountStat/100-12 300000 3482 ns/op BenchmarkVFS1TmpfsMountStat/1-12 3000000 488 ns/op BenchmarkVFS1TmpfsMountStat/2-12 2000000 658 ns/op BenchmarkVFS1TmpfsMountStat/3-12 2000000 806 ns/op BenchmarkVFS1TmpfsMountStat/8-12 1000000 1514 ns/op BenchmarkVFS1TmpfsMountStat/64-12 100000 10037 ns/op BenchmarkVFS1TmpfsMountStat/100-12 100000 15280 ns/op BenchmarkVFS2MemfsMountStat/1-12 10000000 212 ns/op BenchmarkVFS2MemfsMountStat/2-12 5000000 232 ns/op BenchmarkVFS2MemfsMountStat/3-12 5000000 264 ns/op BenchmarkVFS2MemfsMountStat/8-12 3000000 390 ns/op BenchmarkVFS2MemfsMountStat/64-12 1000000 1813 ns/op BenchmarkVFS2MemfsMountStat/100-12 500000 2812 ns/op PiperOrigin-RevId: 262477158	2019-08-08 18:45:37 -07:00
Rahat Mahmood	7bfad8ebb6	Return a well-defined socket address type from socket functions. Previously we were representing socket addresses as an interface{}, which allowed any type which could be binary.Marshal()ed to be used as a socket address. This is fine when the address is passed to userspace via the linux ABI, but is problematic when used from within the sentry such as by networking procfs files. PiperOrigin-RevId: 262460640	2019-08-08 16:50:33 -07:00
Ayush Ranjan	08cd5e1d36	ext: Seek unit tests. PiperOrigin-RevId: 262264674	2019-08-07 19:13:41 -07:00
Ayush Ranjan	40d6d8c15b	ext: StatAt unit tests. PiperOrigin-RevId: 262249166	2019-08-07 17:21:00 -07:00
Ayush Ranjan	3b368cabf9	ext: Read unit tests. PiperOrigin-RevId: 262242410	2019-08-07 16:44:10 -07:00
Ayush Ranjan	ad67e5a7a0	ext: IterDirent unit tests. PiperOrigin-RevId: 262226761	2019-08-07 15:24:33 -07:00
Ayush Ranjan	1c9781a4ed	ext: vfs.FileDescriptionImpl and vfs.FilesystemImpl implementations. - This also gets rid of pipes for now because pipe does not have vfs2 specific support yet. - Added file path resolution logic. - Fixes testing infrastructure. - Does not include unit tests yet. PiperOrigin-RevId: 262213950	2019-08-07 14:23:42 -07:00
Kevin Krakauer	b6a5b950d2	Job control: controlling TTYs and foreground process groups. (Don't worry, this is mostly tests.) Implemented the following ioctls: - TIOCSCTTY - set controlling TTY - TIOCNOTTY - remove controlling tty, maybe signal some other processes - TIOCGPGRP - get foreground process group. Also enables tcgetpgrp(). - TIOCSPGRP - set foreground process group. Also enabled tcsetpgrp(). Next steps are to actually turn terminal-generated control characters (e.g. C^c) into signals to the proper process groups, and to send SIGTTOU and SIGTTIN when appropriate. PiperOrigin-RevId: 261387276	2019-08-02 14:05:48 -07:00
Nicolas Lacasse	aaaefdf9ca	Remove kernel.mounts. We can get the mount namespace from the CreateProcessArgs in all cases where we need it. This also gets rid of kernel.Destroy method, since the only thing it was doing was DecRefing the mounts. Removing the need to call kernel.SetRootMountNamespace also allowed for some more simplifications in the container fs setup code. PiperOrigin-RevId: 261357060	2019-08-02 11:23:11 -07:00
Nicolas Lacasse	f2b25aeac7	tmpfs and ramfs Dirs should drop references on children in Release(). This is the source of many warnings like: AtomicRefCount 0x7f5ff84e3500 owned by "fs.Inode" garbage collected with ref count of 1 (want 0) PiperOrigin-RevId: 261197093	2019-08-01 14:25:14 -07:00
Jamie Liu	a7d5e0d254	Cache pages in CachingInodeOperations.Read when memory evictions are delayed. PiperOrigin-RevId: 260851452	2019-07-30 20:32:29 -07:00
Ayush Ranjan	5afa642deb	ext: Migrate from using fileReader custom interface to using io.Reader. It gets rid of holding state of the io.Reader offset (which is anyways held by the vfs.FileDescriptor struct). It is also odd using an io.Reader because we are using io.ReaderAt to interact with the device. So making an io.ReaderAt wrapper makes more sense. Most importantly, it gets rid of the complexity of extracting the file reader from a regular file implementation and then using it. Now we can just use the regular file implementation as a reader which is more intuitive. PiperOrigin-RevId: 260846927	2019-07-30 19:43:59 -07:00
Ayush Ranjan	9fbe984dc1	ext: block map file reader implementation. Also adds stress tests for block map reader and intensifies extent reader tests. PiperOrigin-RevId: 260838177	2019-07-30 18:20:31 -07:00
Zach Koopmans	e511c0e05f	Add feature to launch Sentry from an open host FD. Adds feature to launch from an open host FD instead of a binary_path. The FD should point to a valid executable and most likely be statically compiled. If the executable is not statically compiled, the loader will search along the interpreter paths, which must be able to be resolved in the Sandbox's file system or start will fail. PiperOrigin-RevId: 260756825	2019-07-30 11:20:40 -07:00
Ayush Ranjan	8da9f8a12c	Migrate from using io.ReadSeeker to io.ReaderAt. This provides the following benefits: - We can now use pkg/fd package which does not take ownership of the file descriptor. So it does not close the fd when garbage collected. This reduces scope of errors from unexpected garbage collection of io.File. - It enforces the offset parameter in every read call. It does not affect the fd offset nor is it affected by it. Hence reducing scope of error of using stale offsets when reading. - We do not need to serialize the usage of any global file descriptor anymore. So this drops the mutual exclusion req hence reducing complexity and congestion. PiperOrigin-RevId: 260635174	2019-07-29 20:12:37 -07:00
Ayush Ranjan	ddf25e3331	ext: extent reader implementation. PiperOrigin-RevId: 260629559	2019-07-29 19:17:27 -07:00
Ayush Ranjan	b765eb4589	ext: inode implementations. PiperOrigin-RevId: 260624470	2019-07-29 18:33:55 -07:00
Fabricio Voznika	7052d21dc4	Automated rollback of changelist 255679453 PiperOrigin-RevId: 260047477	2019-07-25 16:48:49 -07:00
Ayush Ranjan	8376757495	ext: filesystem boilerplate code. PiperOrigin-RevId: 259865366	2019-07-24 19:08:21 -07:00
Ayush Ranjan	417096f781	ext: Add tests for root directory inode. PiperOrigin-RevId: 259856442	2019-07-24 17:59:57 -07:00
Ayush Ranjan	2ed832ff86	ext: testing environment setup with VFS2 support. PiperOrigin-RevId: 259835948	2019-07-24 16:03:30 -07:00
Ayush Ranjan	7e38d64333	ext: Inode creation logic. PiperOrigin-RevId: 259666476	2019-07-23 20:36:04 -07:00
Ayush Ranjan	d7bb79b6f1	ext: Add ext2 and ext3 tiny images. PiperOrigin-RevId: 259657917	2019-07-23 19:01:05 -07:00
Ayush Ranjan	bd7708956f	ext: Added extent tree building logic. PiperOrigin-RevId: 259628657	2019-07-23 15:51:50 -07:00


330 Commits