gvisor

Commit Graph

Author	SHA1	Message	Date
Ayush Ranjan	8376757495	ext: filesystem boilerplate code. PiperOrigin-RevId: 259865366	2019-07-24 19:08:21 -07:00
Ayush Ranjan	417096f781	ext: Add tests for root directory inode. PiperOrigin-RevId: 259856442	2019-07-24 17:59:57 -07:00
Ayush Ranjan	2ed832ff86	ext: testing environment setup with VFS2 support. PiperOrigin-RevId: 259835948	2019-07-24 16:03:30 -07:00
Chris Kuiper	40e682759f	Add support for a subnet prefix length on interface network addresses This allows the user code to add a network address with a subnet prefix length. The prefix length value is stored in the network endpoint and provided back to the user in the ProtocolAddress type. PiperOrigin-RevId: 259807693	2019-07-24 13:42:14 -07:00
chris.zn	1c5b6d9bd2	Use different pidns among different containers The different containers in a sandbox used only one pid namespace before. As a result, a container could see the processes of another container in the same sandbox. This patch uses a different pid namespace for each container. Signed-off-by: chris.zn <chris.zn@antfin.com>	2019-07-24 13:38:23 +08:00
Ayush Ranjan	7e38d64333	ext: Inode creation logic. PiperOrigin-RevId: 259666476	2019-07-23 20:36:04 -07:00
Ayush Ranjan	d7bb79b6f1	ext: Add ext2 and ext3 tiny images. PiperOrigin-RevId: 259657917	2019-07-23 19:01:05 -07:00
Ayush Ranjan	bd7708956f	ext: Added extent tree building logic. PiperOrigin-RevId: 259628657	2019-07-23 15:51:50 -07:00
Nicolas Lacasse	04cbb13ce9	Give each container a distinct MountNamespace. This keeps all container filesystems completely separate from each other (including from the root container filesystem), and allows us to get rid of the "__runsc_containers__" directory. It also simplifies container startup/teardown as we don't have to muck around in the root container's filesystem. PiperOrigin-RevId: 259613346	2019-07-23 14:37:07 -07:00
gVisor bot	d706922d78	Merge pull request #571 from lubinszARM:pr_loader PiperOrigin-RevId: 259427074	2019-07-22 16:12:46 -07:00
Andrei Vagin	ec906e46c0	kvm: fix race between machine.Put and machine.Get m.available.Signal() has to be called under m.mu.RLock, otherwise it can race with machine.Get: (m.Get) m.mu.Lock(); (m.Get) searching for an available vCPU; (m.Put) m.available.Signal(); (m.Get) m.available.Wait(). PiperOrigin-RevId: 259394051	2019-07-22 13:28:16 -07:00
Bin Lu	ffe45f38e6	Add ARM64 support to pkg/sentry/loader Signed-off-by: Bin Lu <bin.lu@arm.com>	2019-07-21 19:30:18 -07:00
gVisor bot	f544509c01	Merge pull request #450 from Pixep:feature/add-clock-boottime-as-monotonic PiperOrigin-RevId: 258996346	2019-07-19 10:44:45 -07:00
Andrei Vagin	eefa817cfd	net/tcp/setsockopt: implement setsockopt(fd, SOL_TCP, TCP_INQ) PiperOrigin-RevId: 258859507	2019-07-18 15:41:04 -07:00
Jamie Liu	163ab5e9ba	Sentry virtual filesystem, v2 Major differences from the current ("v1") sentry VFS: - Path resolution is Filesystem-driven (FilesystemImpl methods call vfs.ResolvingPath methods) rather than VFS-driven (fs package owns a Dirent tree and calls fs.InodeOperations methods to populate it). This drastically improves performance, primarily by reducing overhead from inefficient synchronization and indirection. It also makes it possible to implement remote filesystem protocols that translate FS system calls into single RPCs, rather than having to make (at least) one RPC per path component, significantly reducing the latency of remote filesystems (especially during cold starts and for uncacheable shared filesystems). - Mounts are correctly represented as a separate check based on contextual state (current mount) rather than direct replacement in a fs.Dirent tree. This makes it possible to support (non-recursive) bind mounts and mount namespaces. Included in this CL is fsimpl/memfs, an incomplete in-memory filesystem that exists primarily to demonstrate intended filesystem implementation patterns and for benchmarking: BenchmarkVFS1TmpfsStat/1-6 3000000 497 ns/op BenchmarkVFS1TmpfsStat/2-6 2000000 676 ns/op BenchmarkVFS1TmpfsStat/3-6 2000000 904 ns/op BenchmarkVFS1TmpfsStat/8-6 1000000 1944 ns/op BenchmarkVFS1TmpfsStat/64-6 100000 14067 ns/op BenchmarkVFS1TmpfsStat/100-6 50000 21700 ns/op BenchmarkVFS2MemfsStat/1-6 10000000 197 ns/op BenchmarkVFS2MemfsStat/2-6 5000000 233 ns/op BenchmarkVFS2MemfsStat/3-6 5000000 268 ns/op BenchmarkVFS2MemfsStat/8-6 3000000 477 ns/op BenchmarkVFS2MemfsStat/64-6 500000 2592 ns/op BenchmarkVFS2MemfsStat/100-6 300000 4045 ns/op BenchmarkVFS1TmpfsMountStat/1-6 2000000 679 ns/op BenchmarkVFS1TmpfsMountStat/2-6 2000000 912 ns/op BenchmarkVFS1TmpfsMountStat/3-6 1000000 1113 ns/op BenchmarkVFS1TmpfsMountStat/8-6 1000000 2118 ns/op BenchmarkVFS1TmpfsMountStat/64-6 100000 14251 ns/op BenchmarkVFS1TmpfsMountStat/100-6 100000 22397 ns/op BenchmarkVFS2MemfsMountStat/1-6 5000000 317 ns/op BenchmarkVFS2MemfsMountStat/2-6 5000000 361 ns/op BenchmarkVFS2MemfsMountStat/3-6 5000000 387 ns/op BenchmarkVFS2MemfsMountStat/8-6 3000000 582 ns/op BenchmarkVFS2MemfsMountStat/64-6 500000 2699 ns/op BenchmarkVFS2MemfsMountStat/100-6 300000 4133 ns/op From this we can infer that, on this machine: - Constant cost for tmpfs stat() is ~160ns in VFS2 and ~280ns in VFS1. - Per-path-component cost is ~35ns in VFS2 and ~215ns in VFS1, a difference of about 6x. - The cost of crossing a mount boundary is about 80ns in VFS2 (MemfsMountStat/1 does approximately the same amount of work as MemfsStat/2, except that it also crosses a mount boundary). This is an inescapable cost of the separate mount lookup needed to support bind mounts and mount namespaces. PiperOrigin-RevId: 258853946	2019-07-18 15:10:29 -07:00
Adrien Leravat	2d11fa05f7	sys_time: Wrap comments to 80 columns	2019-07-17 20:25:18 -07:00
Michael Pratt	6f7e2bb388	Take copyMu in Revalidate copyMu is required to read child.overlay.upper. PiperOrigin-RevId: 258662209	2019-07-17 16:12:01 -07:00
Jamie Liu	2bc398bfd8	Separate O_DSYNC and O_SYNC. PiperOrigin-RevId: 258657913	2019-07-17 15:52:38 -07:00
Ayush Ranjan	84a59de5dc	ext: disklayout: extents support. PiperOrigin-RevId: 258657776	2019-07-17 15:48:58 -07:00
Ayush Ranjan	8e3e021aca	ext: Filesystem init implementation. PiperOrigin-RevId: 258645957	2019-07-17 14:48:04 -07:00
gVisor bot	609cd91e3f	Merge pull request #355 from zhuangel:master PiperOrigin-RevId: 258643966	2019-07-17 14:38:22 -07:00
Bhasker Hariharan	542fbd01a7	Fix race in FDTable.GetFDs(). PiperOrigin-RevId: 258635459	2019-07-17 13:56:49 -07:00
Kevin Krakauer	9f1189130e	Add AF_UNIX, SOCK_RAW sockets, which exist for some reason. tcpdump creates these. PiperOrigin-RevId: 258611829	2019-07-17 11:49:16 -07:00
gVisor bot	682fd2d68f	Merge pull request #533 from kevinGC:stub-dev-tty PiperOrigin-RevId: 258607547	2019-07-17 11:28:30 -07:00
Michael Pratt	ca829158e3	Properly invalidate cache in rename and remove We were invalidating the wrong overlayEntry in rename and missing invalidation in rename and remove if lower exists. PiperOrigin-RevId: 258604685	2019-07-17 11:14:57 -07:00
gVisor bot	78a2704bde	Merge pull request #474 from zhuangel:proctasks PiperOrigin-RevId: 258479216	2019-07-16 18:12:07 -07:00
Jianfeng Tan	cf4fc510fd	Support /proc/net/dev This proc file reports the stats of interfaces. We can use the ifconfig command to check the result. Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com> Change-Id: Ia7c1e637f5c76c30791ffda68ee61e861b6ef827 COPYBARA_INTEGRATE_REVIEW=https://gvisor-review.googlesource.com/c/gvisor/+/18282/ PiperOrigin-RevId: 258303936	2019-07-15 22:51:05 -07:00
Andrei Vagin	6a8ff6daef	kvm: wake up all waiters of vCPU.state Currently we call FUTEX_WAKE with ^uintptr(0) as the number of waiters, but in this case only one waiter is woken up. If we want to wake up all of them, the number of waiters has to be set to math.MaxInt32. PiperOrigin-RevId: 258285286	2019-07-15 19:27:18 -07:00
Kevin Krakauer	9b4d3280e1	Add IPPROTO_RAW, which allows raw sockets to write IP headers. iptables also relies on IPPROTO_RAW in a way. It opens such a socket to manipulate the kernel's tables, but it doesn't actually use any of the functionality. Blegh. PiperOrigin-RevId: 257903078	2019-07-12 18:09:12 -07:00
Bhasker Hariharan	6116473b2f	Stub out support for TCP_MAXSEG. Adds support to set/get the TCP_MAXSEG value but does not really change the segment sizes emitted by netstack or alter the MSS advertised by the endpoint. This is currently being added only to unblock iperf3 on gVisor. Plumbing this correctly requires a bit more work which will come in separate CLs. PiperOrigin-RevId: 257859112	2019-07-12 13:35:17 -07:00
gVisor bot	eff2c264a4	Merge pull request #282 from zhangningdlut:chris_test_proc PiperOrigin-RevId: 257855479	2019-07-12 13:11:01 -07:00
Nicolas Lacasse	69e0affaec	Don't emit an event for extended attribute syscalls. These are filesystem-specific, and filesystems are allowed to return ENOTSUP if they are not supported. PiperOrigin-RevId: 257813477	2019-07-12 09:11:04 -07:00
Kevin	ddef7f8078	Fix license year and remove Read.	2019-07-11 21:31:26 -07:00
Kevin	44427d8e26	Add a stub for /dev/tty. Actual implementation to follow, but this will satisfy applications that want it to just exist.	2019-07-11 21:24:27 -07:00
Ayush Ranjan	2eeca68900	Added tiny ext4 image. The image is 64KiB in size, which holds 64 1KiB blocks and 16 inodes. This is the smallest size mkfs.ext4 works with. Added README.md documenting how this was created and included all files on the device under assets. PiperOrigin-RevId: 257712672	2019-07-11 17:17:47 -07:00
Ayush Ranjan	5242face2e	ext: boilerplate code. Renamed ext4 to ext since we are targeting ext(2/3/4). Removed fs.go since we are targeting VFS2. Added ext.go with filesystem struct. PiperOrigin-RevId: 257689775	2019-07-11 15:05:36 -07:00
Liu Hua	7581e84cb6	tss: block userspace access to all I/O ports. A userspace process (CPL=3) can access an i/o port if the bit corresponding to the port is set to 0 in the I/O permission bitmap. Configure the I/O permission bitmap address beyond the last valid byte in the TSS so access to all i/o ports is blocked. Signed-off-by: Liu Hua <sdu.liu@huawei.com> Change-Id: I3df76980c3735491db768f7210e71703f86bb989 PiperOrigin-RevId: 257336518	2019-07-09 22:21:56 -07:00
Ayush Ranjan	7965b1272b	ext4: disklayout: Directory Entry implementation. PiperOrigin-RevId: 257314911	2019-07-09 18:36:02 -07:00
Adin Scannell	dea3cb92f2	build: add nogo for static validation PiperOrigin-RevId: 257297820	2019-07-09 16:44:06 -07:00
Adin Scannell	cceef9d2cf	Cleanup straggling syscall dependencies. PiperOrigin-RevId: 257293198	2019-07-09 16:18:02 -07:00
Nicolas Lacasse	6db3f8d54c	Don't mask errors in createAt loop. The error set in the loop in createAt was being masked by other errors declared with ":=". This allowed an ErrResolveViaReadlink error to escape, which can cause a sentry panic. Added test case which repros without the fix. PiperOrigin-RevId: 257061767	2019-07-08 14:57:15 -07:00
Nicolas Lacasse	659bebab8e	Don't try to execute a file that is not regular. PiperOrigin-RevId: 257037608	2019-07-08 12:56:48 -07:00
Ayush Ranjan	8f9b1ca8e7	ext4: disklayout: inode impl. PiperOrigin-RevId: 257010414	2019-07-08 10:44:11 -07:00
Andrei Vagin	67f2cefce0	Avoid importing platforms from many source files PiperOrigin-RevId: 256494243	2019-07-03 22:51:26 -07:00
Ian Lewis	da57fb9d25	Fix syscall doc for getresgid PiperOrigin-RevId: 256481284	2019-07-03 20:13:19 -07:00
Neel Natu	9f2f9f0cab	futex: compare keys for equality when doing a FUTEX_UNLOCK_PI. PiperOrigin-RevId: 256453827	2019-07-03 16:01:38 -07:00
Andrei Vagin	116cac053e	netstack/udp: connect with the AF_UNSPEC address family means disconnect PiperOrigin-RevId: 256433283	2019-07-03 14:19:02 -07:00
gVisor bot	f10862696c	Merge pull request #493 from ahmetb:reticulating-splines PiperOrigin-RevId: 256319059	2019-07-03 01:10:34 -07:00
Yong He	85b27a9f8f	Solve possible BounceToKernel hang BounceToKernel makes a vCPU exit from guest ring 3 to guest ring 0, but vCPUWaiter is not cleared when the vCPU is unlocked. The next time this vCPU enters guest ring 3, it may do so with the vCPUWaiter bit set, which causes a subsequent BounceToKernel on this vCPU to hang in waitUntilNot. Halting may work around this issue, because halting resets the vCPU state to vCPUUser and notifies all waiters of the state change, but if no exception or syscall occurs in this period, BounceToKernel hangs in waitUntilNot. PiperOrigin-RevId: 256299660	2019-07-02 22:03:28 -07:00
Adin Scannell	753da9604e	Remove map from fd_map, change to fd_table. This renames FDMap to FDTable and drops the kernel.FD type, which had an entire package to itself and didn't serve much use (it was freely cast between types, and served as more of an annoyance than providing any protection.) Based on BenchmarkFDLookupAndDecRef-12, we can expect 5-10 ns per lookup operation, and 10-15 ns per concurrent lookup operation of savings. This also fixes two tangential usage issues with the FDMap. Namely, non-atomic use of NewFDFrom and associated calls to Remove (that are both racy and fail to drop the reference on the underlying file.) PiperOrigin-RevId: 256285890	2019-07-02 19:28:59 -07:00
Ian Lewis	3f14caeb99	Add documentation for remaining syscalls (fixes #197, #186) Adds support level documentation for all syscalls. Removes the Undocumented utility function to discourage usage while leaving SupportUndocumented as the default support level for Syscall structs. PiperOrigin-RevId: 256281927	2019-07-02 18:45:16 -07:00
Ayush Ranjan	d8ec2fb671	Ext4: DiskLayout: Inode interface. PiperOrigin-RevId: 256234390	2019-07-02 14:04:31 -07:00
Nicolas Lacasse	4f2f44320f	Simplify (and fix) refcounts in createAt. fileOpAt holds references on the Dirents passed as arguments to the callback, and drops refs when finished, so we don't need to DecRef those Dirents ourselves. However, all Dirents that we get from FindInode/FindLink must be DecRef'd. This CL cleans up the ref-counting logic, and fixes some refcount issues in the process. PiperOrigin-RevId: 256220882	2019-07-02 12:58:58 -07:00
Ahmet Alp Balkan	4cd28c6e27	sentry/kernel: add syslog message It feels like "reticulating splines" is missing from the list of meaningless syslog messages. Signed-off-by: Ahmet Alp Balkan <ahmetb@google.com>	2019-07-02 12:05:41 -07:00
Ian Gudger	0aa9418a77	Fix unix/transport.queue reference leaks. Fix two leaks for connectionless Unix sockets: * Double connect: Subsequent connects would leak a reference on the previously connected endpoint. * Close unconnected: Sockets which were not connected at the time of closure would leak a reference on their receiver. PiperOrigin-RevId: 256070451	2019-07-01 17:46:24 -07:00
Nicolas Lacasse	06537129a6	Check remaining traversal limit when creating a file through a symlink. This fixes the case when an app tries to create a file that already exists, and is a symlink to itself. A test was added. PiperOrigin-RevId: 256044811	2019-07-01 15:25:22 -07:00
Ian Gudger	45566fa4e4	Add finalizer on AtomicRefCount to check for leaks. PiperOrigin-RevId: 255711454	2019-06-28 20:07:52 -07:00
Adin Scannell	7dae043fec	Drop ashmem and binder. These are unfortunately unused and unmaintained. They can be brought back in the future if need requires it. PiperOrigin-RevId: 255697132	2019-06-28 17:20:25 -07:00
Nicolas Lacasse	d3f97aec49	Remove events from name_to_handle_at and open_by_handle_at. These syscalls require filesystem support that gVisor does not provide, and is not planning to implement. Their absence should not trigger an event. PiperOrigin-RevId: 255692871	2019-06-28 16:50:24 -07:00
Ayush Ranjan	c4da599e22	ext4: disklayout: SuperBlock interface implementations. PiperOrigin-RevId: 255687771	2019-06-28 16:18:29 -07:00
Nicolas Lacasse	295078fa7a	Automated rollback of changelist 255263686 PiperOrigin-RevId: 255679453	2019-06-28 15:28:41 -07:00
Andrei Vagin	e21d49c2d8	platform/ptrace: return more detailed errors Right now, if we can't create a stub process, we will see this error: panic: unable to activate mm: resource temporarily unavailable It would be better to know the root cause of this "resource temporarily unavailable". PiperOrigin-RevId: 255656831	2019-06-28 13:23:36 -07:00
Ayush Ranjan	7c13789818	Superblock interface in the disk layout package for ext4. PiperOrigin-RevId: 255644277	2019-06-28 12:07:28 -07:00
Yong He	c61d7761b4	Fix deadloop in proc subtask list Readdir of /proc/x/task/ gets directory entries from the tasks of the specified task group. The tasks slice is unsorted, so using sort.SearchInts to find an entry in it may cause an infinite loop. The fix is to sort the slice before searching. This issue can be easily reproduced with the following steps: modify Readdir in pkg/sentry/fs/proc/task.go to force taskInts to the test slice []int{1, 11, 7, 5, 10, 6, 8, 3, 9, 2, 4}, then run the docker image and run ls /proc/1/task; the command loops forever.	2019-06-28 22:20:57 +08:00
Fabricio Voznika	b2907595e5	Complete pipe support on overlayfs Get/Set pipe size and ioctl support were missing from overlayfs. It required moving the pipe.Sizer interface to fs so that overlay could get access. Fixes #318 PiperOrigin-RevId: 255511125	2019-06-27 17:22:53 -07:00
Michael Pratt	5b41ba5d0e	Fix various spelling issues in the documentation Addresses obvious typos, in the documentation only. COPYBARA_INTEGRATE_REVIEW=https://github.com/google/gvisor/pull/443 from Pixep:fix/documentation-spelling 4d0688164eafaf0b3010e5f4824b35d1e7176d65 PiperOrigin-RevId: 255477779	2019-06-27 14:25:50 -07:00
Michael Pratt	085a907565	Cache directory entries in the overlay Currently, the overlay dirCache is only used for a single logical use of getdents. i.e., it is discarded when the FD is closed or seeked back to the beginning. But the initial work of getting the directory contents can be quite expensive (particularly sorting large directories), so we should keep it as long as possible. This is very similar to the readdirCache in fs/gofer. Since the upper filesystem does not have to allow caching readdir entries, the new CacheReaddir MountSourceOperations method controls this behavior. This caching should be trivially movable to all Inodes if desired, though that adds an additional copy step for non-overlay Inodes. (Overlay Inodes already do the extra copy). PiperOrigin-RevId: 255477592	2019-06-27 14:24:03 -07:00
Andrei Vagin	e276083903	gvisor/ptrace: grab initial thread registers only once PiperOrigin-RevId: 255465635	2019-06-27 13:59:57 -07:00
Fabricio Voznika	42e212f6b7	Preserve permissions when checking lower The code was wrongly assuming that only read access was required from the lower overlay when checking for permissions. This allowed non-writable files to be writable in the overlay. Fixes #316 PiperOrigin-RevId: 255263686	2019-06-26 14:24:44 -07:00
Nicolas Lacasse	857e5c47e9	Follow symlinks when creating a file, and create the target. If we have a symlink whose target does not exist, creating the symlink (either via 'creat' or 'open' with O_CREAT flag) should create the target of the symlink. Previously, gVisor would fail with EEXIST in this case. PiperOrigin-RevId: 255232944	2019-06-26 11:49:20 -07:00
Michael Pratt	e98ce4a2c6	Add TODO reminder to remove tmpfs caching options Updates #179 PiperOrigin-RevId: 255081565	2019-06-25 17:12:34 -07:00
Andrei Vagin	03ae91c662	gvisor: lockless read access for task credentials Credentials are immutable and even before these changes we could read them without locks, but we needed to take a task lock to get a credential object from a task object. This lock can be avoided if we guarantee that a credential object is never changed after it is set on a task. PiperOrigin-RevId: 254989492	2019-06-25 09:52:49 -07:00
Adrien Leravat	3688e6e99d	Add CLOCK_BOOTTIME as a CLOCK_MONOTONIC alias Makes CLOCK_BOOTTIME available with clock_gettime, timerfd_create, and the clock_gettime vDSO. CLOCK_BOOTTIME is implemented as an alias to CLOCK_MONOTONIC. CLOCK_MONOTONIC already keeps track of time across save and restore. This is the closest possible behavior to Linux CLOCK_BOOTTIME, as there is no concept of suspend/resume. Updates google/gvisor#218	2019-06-24 21:14:38 -07:00
Andrei Vagin	e9ea7230f7	fs: synchronize concurrent writes into files with O_APPEND For files with O_APPEND, a file write operation gets a file size and uses it as offset to call an inode write operation. This means that all other operations that can change a file size should be blocked until the write operation completes. PiperOrigin-RevId: 254873771	2019-06-24 17:45:02 -07:00
Adin Scannell	7f5d0afe52	Add O_EXITKILL to ptrace options. This prevents a race before PDEATH_SIG can take effect during a sentry crash. Discovered and solution by avagin@. PiperOrigin-RevId: 254871534	2019-06-24 17:30:01 -07:00
Rahat Mahmood	94a6bfab5d	Implement /proc/net/tcp. PiperOrigin-RevId: 254854346	2019-06-24 15:56:36 -07:00
Andrei Vagin	c5486f5122	platform/ptrace: specify PTRACE_O_TRACEEXIT for stub-processes The tracee is stopped early during process exit, when registers are still available, allowing the tracer to see where the exit occurred, whereas the normal exit notification is done after the process is finished exiting. Without this option, dumpAndPanic fails to get registers. PiperOrigin-RevId: 254852917	2019-06-24 15:48:58 -07:00
Nicolas Lacasse	87df9aab24	Use correct statx syscall number for amd64. The previous number was for the arm architecture. Also change the statx tests to force them to run on gVisor, which would have caught this issue. PiperOrigin-RevId: 254846831	2019-06-24 15:19:36 -07:00
Fabricio Voznika	b21b1db700	Allow to change logging options using 'runsc debug' New options are: runsc debug --strace=off|all|function1,function2; runsc debug --log-level=warning|info|debug; runsc debug --log-packets=true|false Updates #407 PiperOrigin-RevId: 254843128	2019-06-24 15:03:02 -07:00
chris.zn	f957fb23cf	Return ENOENT when reading /proc/{pid}/task of an exited process Using getdents to read /proc/{pid}/task of an exited process can loop forever, for example: process A is running; process B opens /proc/{pid of A}/task; process A exits; process B calls getdents on /proc/{pid of A}/task. Process B then falls into an infinite loop, returning "." and ".." over and over. This patch returns ENOENT when getdents is used to read /proc/{pid}/task after the process has exited. Signed-off-by: chris.zn <chris.zn@antfin.com>	2019-06-24 15:49:53 +08:00
Nicolas Lacasse	35719d52c7	Implement statx. We don't have the plumbing for btime yet, so that field is left off. The returned mask indicates that btime is absent. Fixes #343 PiperOrigin-RevId: 254575752	2019-06-22 13:29:26 -07:00
Andrei Vagin	ab6774cebf	gvisor/fs: getdents returns 0 if offset is equal to FileMaxOffset FileMaxOffset is a special case when lseek(d, 0, SEEK_END) has been called. PiperOrigin-RevId: 254498777	2019-06-21 17:25:17 -07:00
Ayush Ranjan	727375321f	ext4 block group descriptor implementation in disk layout package. PiperOrigin-RevId: 254482180	2019-06-21 15:42:46 -07:00
Fabricio Voznika	5ba16d51a9	Add list of stuck tasks to panic message PiperOrigin-RevId: 254450309	2019-06-21 12:46:53 -07:00
Andrei Vagin	f94653b3de	kernel: call t.mu.Unlock() explicitly in WithMuLocked defer here doesn't improve readability, but we know it is slower than the explicit call. PiperOrigin-RevId: 254441473	2019-06-21 11:55:42 -07:00
Fabricio Voznika	054b5632ef	Update comment PiperOrigin-RevId: 254428866	2019-06-21 10:56:42 -07:00
Jamie Liu	7db8685100	Preallocate auth.NewAnonymousCredentials() in contexttest.TestContext. Otherwise every call to, say, fs.ContextCanAccessFile() in a benchmark using contexttest allocates new auth.Credentials, a new auth.UserNamespace, ... PiperOrigin-RevId: 254261051	2019-06-20 13:36:14 -07:00
Michael Pratt	292f70cbf7	Add package docs to seqfile and ramfs These are the only packages missing docs: https://godoc.org/gvisor.dev/gvisor PiperOrigin-RevId: 254261022	2019-06-20 13:34:33 -07:00
Neel Natu	0b2135072d	Implement madvise(MADV_DONTFORK) PiperOrigin-RevId: 254253777	2019-06-20 12:56:00 -07:00
Ian Gudger	7e49515696	Deflake SendFileTest_Shutdown. The sendfile syscall's backing doSplice contained a race with regard to blocking. If the first attempt failed with syserror.ErrWouldBlock and then the blocking file became ready before registering a waiter, we would just return the ErrWouldBlock (even if we were supposed to block). PiperOrigin-RevId: 254114432	2019-06-19 18:40:54 -07:00
Nicolas Lacasse	29f9e4fa87	fileOp{On,At} should pass the remaining symlink traversal count. And methods that do more traversals should use the remaining count rather than resetting. PiperOrigin-RevId: 254041720	2019-06-19 11:56:34 -07:00
Nicolas Lacasse	f7428af9c1	Add MountNamespace to task. This allows tasks to have distinct mount namespaces, instead of all sharing the kernel's root mount namespace. Currently, the only way for a task to get a different mount namespace than the kernel's root is by explicitly setting a different MountNamespace in CreateProcessArgs, and nothing does this (yet). In a follow-up CL, we will set CreateProcessArgs.MountNamespace when creating a new container inside runsc. Note that "MountNamespace" is a poor term for this thing. It's more like a distinct VFS tree. When we get around to adding real mount namespaces, this will need a better name. PiperOrigin-RevId: 254009310	2019-06-19 09:21:21 -07:00
Fabricio Voznika	ca245a428b	Attempt to fix TestPipeWritesAccumulate Test fails because it's reading 4KB instead of the expected 64KB. Changed the test to read pipe buffer size instead of hardcode and added some logging in case the reason for failure was not pipe buffer size. PiperOrigin-RevId: 253916040	2019-06-18 19:16:11 -07:00
Andrei Vagin	8ab0848c70	gvisor/fs: don't update file.offset for sockets, pipes, etc sockets, pipes and other non-seekable file descriptors don't use file.offset, so we don't need to update it. With this change, we will be able to call file operations without locking the file.mu mutex. This is already used for pipes in the splice system call. PiperOrigin-RevId: 253746644	2019-06-18 01:43:29 -07:00
Yong He	0dbdca349c	Skip tids that are still in use during allocation When the leader of a process group (session) exits, the process group ID (session ID) is still held by other processes in the process group, so the process group ID (session ID) cannot be reused. Reusing the process group ID (session ID) as the process group ID of a new process causes session creation to fail, and runsc later crashes when it accesses the process group. The fix skips a tid that is in use by a process group (session) when allocating a new tid. We can easily reproduce the runsc crash with these steps: 1. build a test program and run it inside the container int main(int argc, char *argv[]) { pid_t cpid, spid; cpid = fork(); if (cpid == -1) { perror("fork"); exit(EXIT_FAILURE); } if (cpid == 0) { pid_t sid = setsid(); printf("Start New Session %ld\n",sid); printf("Child PID %ld / PPID %ld / PGID %ld / SID %ld\n", getpid(),getppid(),getpgid(getpid()),getsid(getpid())); spid = fork(); if (spid == 0) { setpgid(getpid(), getpid()); printf("Set GrandSon as New Process Group\n"); printf("GrandSon PID %ld / PPID %ld / PGID %ld / SID %ld\n", getpid(),getppid(),getpgid(getpid()),getsid(getpid())); while(1) { usleep(1); } } sleep(3); exit(0); } else { exit(0); } return 0; } 2. build a hello program int main(int argc, char *argv[]) { printf("Current PID is %ld\n", (long) getpid()); return 0; } 3. run a script on the host that runs hello inside the container; you can speed up the test by setting TasksLimit to a lower value. for (( i=0; i<65535; i++ )) do docker exec <container id> /test/hello done 4. when the hello process reuses the process group of the looping process, runsc crashes: panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x79f0c8] goroutine 612475 [running]: gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(*ProcessGroup).decRefWithParent(0x0, 0x0) pkg/sentry/kernel/sessions.go:160 +0x78 gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(*Task).exitNotifyLocked(0xc000663500, 0x0) pkg/sentry/kernel/task_exit.go:672 +0x2b7 gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(*runExitNotify).execute(0x0, 0xc000663500, 0x0, 0x0) pkg/sentry/kernel/task_exit.go:542 +0xc4 gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(*Task).run(0xc000663500, 0xc) pkg/sentry/kernel/task_run.go:91 +0x194 created by gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(*Task).Start pkg/sentry/kernel/task_start.go:286 +0xfe	2019-06-14 14:05:41 +08:00
Bhasker Hariharan	3d71c627fa	Add support for TCP receive buffer auto tuning. The implementation is similar to linux where we track the number of bytes consumed by the application to grow the receive buffer of a given TCP endpoint. This ensures that the advertised window grows at a reasonable rate to accommodate the sender's rate and prevents large amounts of data from being held in stack buffers if the application is not actively reading or not reading fast enough. The original paper that was used to implement the linux receive buffer auto-tuning is available at https://public.lanl.gov/radiant/pubs/drs/lacsi2001.pdf NOTE: Linux does not implement DRS as defined in that paper, it's just a good reference to understand the solution space. Updates #230 PiperOrigin-RevId: 253168283	2019-06-13 22:28:01 -07:00
Ian Gudger	3e9b8ecbfe	Plumb context through more layers of filesystem. All functions which allocate objects containing AtomicRefCounts will soon need a context. PiperOrigin-RevId: 253147709	2019-06-13 18:40:38 -07:00
Ian Gudger	0a5ee6f7b2	Fix deadlock in fasync. The deadlock can occur when both ends of a connected Unix socket which has FIOASYNC enabled on at least one end are closed at the same time. One end notifies that it is closing, calling (*waiter.Queue).Notify which takes waiter.Queue.mu (as a read lock) and then calls (*FileAsync).Callback, which takes FileAsync.mu. The other end tries to unregister for notifications by calling (*FileAsync).Unregister, which takes FileAsync.mu and calls (*waiter.Queue).EventUnregister which takes waiter.Queue.mu. This is fixed by moving the calls to waiter.Waitable.EventRegister and waiter.Waitable.EventUnregister outside of the protection of any mutex used in (*FileAsync).Callback. The new test is related, but does not cover this particular situation. Also fix a data race on FileAsync.e.Callback. (*FileAsync).Callback checked FileAsync.e.Callback under the protection of FileAsync.mu, but the waiter calling (*FileAsync).Callback could not and did not. This is fixed by making FileAsync.e.Callback immutable before passing it to the waiter for the first time. Fixes #346 PiperOrigin-RevId: 253138340	2019-06-13 17:26:22 -07:00
Rahat Mahmood	05ff1ffaad	Implement getsockopt() SO_DOMAIN, SO_PROTOCOL and SO_TYPE. SO_TYPE was already implemented for everything but netlink sockets. PiperOrigin-RevId: 253138157	2019-06-13 17:24:51 -07:00
Adin Scannell	add40fd6ad	Update canonical repository. This can be merged after: https://github.com/google/gvisor-website/pull/77 or https://github.com/google/gvisor-website/pull/78 PiperOrigin-RevId: 253132620	2019-06-13 16:50:15 -07:00


728 Commits