gvisor

Commit Graph

Author	SHA1	Message	Date
Jamie Liu	3a987160aa	Handle gofer blocking opens of host named pipes in VFS2. Using tee instead of read to detect when an O_RDONLY|O_NONBLOCK pipe FD has a writer circumvents the problem of what to do with the byte read from the pipe, avoiding much of the complexity of the fdpipe package. PiperOrigin-RevId: 314216146	2020-06-01 15:33:30 -07:00
Michael Pratt	12f74bd6f6	Include runtime goroutines in panics SetTraceback("all") does not include all goroutines in panics (you didn't think it was that simple, did you?). It includes all _user_ goroutines; those started by the runtime (such as GC workers) are excluded. Switch to "system" to additionally include runtime goroutines, which are useful to track down bugs in the runtime itself. PiperOrigin-RevId: 314204473	2020-06-01 14:32:19 -07:00
Fabricio Voznika	16100d18cb	Make gofer mount readonly when overlay is enabled No writes are expected to the underlying filesystem when using --overlay. PiperOrigin-RevId: 314171457	2020-06-01 11:44:32 -07:00
Nicolas Lacasse	93edb36cbb	Refactor the ResolveExecutablePath logic. PiperOrigin-RevId: 313871804	2020-05-29 16:35:21 -07:00
gVisor bot	f498e46ef9	Merge pull request #2767 from mikaelmello:add-cwd-option-spec PiperOrigin-RevId: 313828906	2020-05-29 12:34:45 -07:00
Fabricio Voznika	f7418e2159	Move Cleanup to its own package PiperOrigin-RevId: 313663382	2020-05-28 14:49:06 -07:00
Fabricio Voznika	a8c1b32660	Automated rollback of changelist 309082540 PiperOrigin-RevId: 313636920	2020-05-28 12:25:57 -07:00
Mikael Mello	9e8000e9fb	Add cwd option to spec cmd	2020-05-24 17:44:03 -03:00
Fabricio Voznika	10abad0040	Add hugetlb and rdma cgroups to runsc Updates #2713 PiperOrigin-RevId: 312559463	2020-05-20 14:49:13 -07:00
Fabricio Voznika	32ab382c80	Improve unsupported syscall message PiperOrigin-RevId: 312104899	2020-05-18 10:23:22 -07:00
Adin Scannell	420b791a3d	Minor formatting updates for gvisor.dev. * Aggregate architecture Overview in "What is gVisor?" as it makes more sense in one place. * Drop "user-space kernel" and use "application kernel". The term "user-space kernel" is confusing when some platform implementations do not run in user-space (instead running in guest ring zero). * Clear up the relationship between the Platform page in the user guide and the Platform page in the architecture guide, and ensure they are cross-linked. * Restore the call-to-action quick start link in the main page, and drop the GitHub link (which also appears in the top-right). * Improve image formatting by centering all doc and blog images, and move the image captions to the alt text. PiperOrigin-RevId: 311845158	2020-05-15 20:05:18 -07:00
Jamie Liu	64afaf0e9b	Fix runsc association of gofers and FDs on VFS2. Updates #1487 PiperOrigin-RevId: 311443628	2020-05-13 18:18:09 -07:00
Jamie Liu	d846077628	Enable overlayfs_stale_read by default for runsc. Linux 4.18 and later make reads and writes coherent between pre-copy-up and post-copy-up FDs representing the same file on an overlay filesystem. However, memory mappings remain incoherent: - Documentation/filesystems/overlayfs.rst, "Non-standard behavior": "If a file residing on a lower layer is opened for read-only and then memory mapped with MAP_SHARED, then subsequent changes to the file are not reflected in the memory mapping." - fs/overlayfs/file.c:ovl_mmap() passes through to the underlying FD without any management of coherence in the overlay. - Experimentally on Linux 5.2: ``` $ cat mmap_cat_page.c #include <err.h> #include <fcntl.h> #include <stdio.h> #include <string.h> #include <sys/mman.h> #include <unistd.h> int main(int argc, char **argv) { if (argc < 2) { errx(1, "syntax: %s [FILE]", argv[0]); } const int fd = open(argv[1], O_RDONLY); if (fd < 0) { err(1, "open(%s)", argv[1]); } const size_t page_size = sysconf(_SC_PAGE_SIZE); void *page = mmap(NULL, page_size, PROT_READ, MAP_SHARED, fd, 0); if (page == MAP_FAILED) { err(1, "mmap"); } for (;;) { write(1, page, strnlen(page, page_size)); if (getc(stdin) == EOF) { break; } } return 0; } $ gcc -O2 -o mmap_cat_page mmap_cat_page.c $ mkdir lowerdir upperdir workdir overlaydir $ echo old > lowerdir/file $ sudo mount -t overlay -o "lowerdir=lowerdir,upperdir=upperdir,workdir=workdir" none overlaydir $ ./mmap_cat_page overlaydir/file old ^Z [1]+ Stopped ./mmap_cat_page overlaydir/file $ echo new > overlaydir/file $ cat overlaydir/file new $ fg ./mmap_cat_page overlaydir/file old ``` Therefore, while the VFS1 gofer client's behavior of reopening read FDs is only necessary pre-4.18, replacing existing memory mappings (in both sentry and application address spaces) with mappings of the new FD is required regardless of kernel version, and this latter behavior is common to both VFS1 and VFS2. 
Re-document accordingly, and change the runsc flag to enabled by default. New test: - Before this CL: https://source.cloud.google.com/results/invocations/5b222d2c-e918-4bae-afc4-407f5bac509b - After this CL: https://source.cloud.google.com/results/invocations/f28c747e-d89c-4d8c-a461-602b33e71aab PiperOrigin-RevId: 311361267	2020-05-13 10:53:37 -07:00
Fabricio Voznika	18cb3d24cb	Use VFS2 mount names Updates #1487 PiperOrigin-RevId: 311356385	2020-05-13 10:31:29 -07:00
Fabricio Voznika	305f786e51	Adjust a few log messages PiperOrigin-RevId: 311234146	2020-05-12 17:26:07 -07:00
Bhasker Hariharan	e838e7ab34	Automated rollback of changelist 310417191 PiperOrigin-RevId: 310963404	2020-05-11 12:09:06 -07:00
Nicolas Lacasse	c52195d258	Stop avoiding preadv2 and pwritev2, and add them to the filters. Some code paths needed these syscalls anyways, so they should be included in the filters. Given that we depend on these syscalls in some cases, there's no real reason to avoid them any more. PiperOrigin-RevId: 310829126	2020-05-10 17:52:20 -07:00
Jamie Liu	9115f26851	Allocate device numbers for VFS2 filesystems. Updates #1197, #1198, #1672 PiperOrigin-RevId: 310432006	2020-05-07 14:01:53 -07:00
Bhasker Hariharan	28b5565fdd	Automated rollback of changelist 309339316 PiperOrigin-RevId: 310417191	2020-05-07 12:48:23 -07:00
Dean Deng	16da7e790f	Update privateunixsocket TODOs. Synthetic sockets do not have the race condition issue in VFS2, and we will get rid of privateunixsocket as well. Fixes #1200. PiperOrigin-RevId: 310386474	2020-05-07 10:20:48 -07:00
Adin Scannell	279f1eb7ab	Fix runsc syscall documentation generation. We can register any number of tables with any number of architectures, and need not limit the definitions to the architecture in question. This allows runsc to generate documentation for all architectures simultaneously. Similarly, this simplifies the VFSv2 patching process. PiperOrigin-RevId: 310224827	2020-05-06 14:13:48 -07:00
Fabricio Voznika	e2b0e0e272	Enable TestRunNonRoot on VFS2 Also added back the default test dimension, which was dropped in a previous refactor. PiperOrigin-RevId: 309797327	2020-05-04 12:29:03 -07:00
Fabricio Voznika	0a307d0072	Mount VFS2 filesystem using root credentials PiperOrigin-RevId: 309787938	2020-05-04 11:48:00 -07:00
Fabricio Voznika	cbc5bef2a6	Add TTY support on VFS2 to runsc Updates #1623, #1487 PiperOrigin-RevId: 309777922	2020-05-04 10:59:20 -07:00
Bhasker Hariharan	8962b7840f	Enable FIFO QDisc by default in runsc. Updates #231 PiperOrigin-RevId: 309339316	2020-04-30 18:29:57 -07:00
Bhasker Hariharan	ae15d90436	FIFO QDisc implementation Updates #231 PiperOrigin-RevId: 309323808	2020-04-30 16:41:00 -07:00
gVisor bot	d5c34ba2ff	Merge pull request #2487 from moricho:fix/bindmount PiperOrigin-RevId: 309082540	2020-04-29 13:13:51 -07:00
gVisor bot	ceb3c0e062	Merge pull request #2558 from prattmic:forward_signal PiperOrigin-RevId: 308829800	2020-04-28 08:43:49 -07:00
gVisor bot	316394ee89	Merge pull request #2544 from prattmic:runsc_do_cleanup PiperOrigin-RevId: 308727526	2020-04-27 17:01:33 -07:00
Michael Pratt	147c8ba1f7	runsc: extend do network cleanup Previously we unconditionally failed to clean up the networking files (hostname, resolv.conf, hosts), and failed to clean up the netns, etc. on partial setup failure. We can drop the iptables commands from cleanup, as the routes automatically go away when the device is deleted. Those commands were failing previously. Forward signals to the container, allowing it to exit normally when a signal is received, and then for runsc to run the cleanup. This doesn't cover cleanup when runsc is signalled before the container starts, but it covers the most common case. Fixes #2539 Fixes #2540	2020-04-27 16:36:07 -04:00
Michael Pratt	b15d49a137	container: use sighandling package Use the sighandling package for Container.ForwardSignals, for consistency with other signal forwarding. Fixes #2546	2020-04-27 11:52:43 -04:00
kevin.xu	9a4ae0322e	Update container.go typo, should be `start` in comments	2020-04-27 21:53:04 +08:00
moricho	fc53d64367	refactor and add test for bindmount Signed-off-by: moricho <ikeda.morito@gmail.com>	2020-04-26 17:24:34 +09:00
Zach Koopmans	17ac90a203	Add container tests passing with VFS2 Several tests are passing after getting TestAppExitStatus (run /bin/true) changes. Make versions that run via VFS2 so that we know what is and isn't working. In addition, fix bug in VFSFile ReadFull. For the TestExePath test in container_test.go, the case "unmasked" will return 0 bytes read with no EOF err, causing the ReadFull call to spin. PiperOrigin-RevId: 308428126	2020-04-25 11:27:23 -07:00
moricho	0b3166f624	add bind/rbind options for mount Signed-off-by: moricho <ikeda.morito@gmail.com>	2020-04-25 22:04:39 +09:00
moricho	93e510e26f	fix behavior of `getMountNameAndOptions` when options include either bind or rbind Signed-off-by: moricho <ikeda.morito@gmail.com>	2020-04-25 22:04:39 +09:00
Zach Koopmans	15a822a193	VFS2: Get HelloWorld image tests to pass with VFS2 This change includes: - Modifications to loader_test.go to get TestCreateMountNamespace to pass with VFS2. - Changes necessary to get TestHelloWorld in image tests to pass with VFS2. This means runsc can run the hello-world container with docker on VFS2. Note: Containers that use sockets will not run with these changes. See "//test/image/...". Any tests here with sockets currently fail (which is all of them but HelloWorld). PiperOrigin-RevId: 308363072	2020-04-24 18:23:37 -07:00
Fabricio Voznika	4af39dd1c5	Propagate PID limit from OCI to sandbox cgroup Closes #2489 PiperOrigin-RevId: 308362434	2020-04-24 18:17:01 -07:00
Dean Deng	632b104aff	Plumb context.Context into kernfs.Inode.Open(). PiperOrigin-RevId: 308304793	2020-04-24 12:37:49 -07:00
Dean Deng	1b88c63b3e	Move hostfs mount to Kernel struct. This is needed to set up host fds passed through a Unix socket. Note that the host package depends on kernel, so we cannot set up the hostfs mount directly in Kernel.Init as we do for sockfs and pipefs. Also, adjust sockfs to make its setup look more like hostfs's and pipefs's. PiperOrigin-RevId: 308274053	2020-04-24 10:03:43 -07:00
Jamie Liu	5042ea7e2c	Add vfs.MkdirOptions.ForSyntheticMountpoint. PiperOrigin-RevId: 308143529	2020-04-23 15:37:10 -07:00
Adin Scannell	1481499fe2	Simplify Docker test infrastructure. This change adds a layer of abstraction around the internal Docker APIs, and eliminates all direct dependencies on Dockerfiles in the infrastructure. A subsequent change will automate the generation of local images (with efficient caching). Note that this change drops the use of bazel container rules, as that experiment does not seem to be viable. PiperOrigin-RevId: 308095430	2020-04-23 11:33:30 -07:00
Nicolas Lacasse	e69a871c7b	Move user home detection to its own library. PiperOrigin-RevId: 307977689	2020-04-22 22:18:21 -07:00
Andrei Vagin	0c586946ea	Specify a memory file in platform.New(). PiperOrigin-RevId: 307941984	2020-04-22 17:50:10 -07:00
Adin Scannell	1a597e01be	Add a functional vm_test for root_test. This change renames the tools/images directory to tools/vm for clarity, and adds a functional vm_test. Sharding is also added to the same test, and some documentation added around key flags & variables to describe how they work. Subsequent changes will add vm_tests for other cases, such as the runtime tests. PiperOrigin-RevId: 307492245	2020-04-20 15:48:27 -07:00
Fabricio Voznika	a80cd43023	Add test name to boot and gofer log files This makes it easier to find the corresponding logs in case a test fails. PiperOrigin-RevId: 307104283	2020-04-17 13:28:54 -07:00
Zach Koopmans	12bde95635	Get /bin/true to run on VFS2 Included: - loader_test.go RunTest and TestStartSignal VFS2 - container_test.go TestAppExitStatus on VFS2 - experimental flag added to runsc to turn on VFS2 Note: shared mounts are not yet supported. PiperOrigin-RevId: 307070753	2020-04-17 10:39:19 -07:00
Fabricio Voznika	5a8ee1beee	Preserve log FD after execve PiperOrigin-RevId: 306908296	2020-04-16 13:17:00 -07:00
gVisor bot	ac9b32c36b	Merge pull request #2212 from aaronlu:dup_stdioFDs PiperOrigin-RevId: 306477639	2020-04-14 11:20:11 -07:00
Ian Lewis	daf3322498	Add logging message for noNewPrivileges OCI option. noNewPrivileges is ignored if set to false since gVisor assumes that PR_SET_NO_NEW_PRIVS is always enabled. PiperOrigin-RevId: 305991947	2020-04-10 20:32:23 -07:00
Fabricio Voznika	96f9142959	Use O_CLOEXEC when dup'ing FDs The sentry doesn't allow execve, but it's a good defense-in-depth measure. PiperOrigin-RevId: 305958737	2020-04-10 15:47:23 -07:00
gVisor bot	78126611e6	Merge pull request #2253 from amscanne:nogo PiperOrigin-RevId: 305807868	2020-04-09 19:16:46 -07:00
Fabricio Voznika	2a28e3e9c3	Don't unconditionally set --panic-signal Closes #2393 PiperOrigin-RevId: 305793027	2020-04-09 17:20:14 -07:00
Fabricio Voznika	6dd5a1f3fe	Clean up TODOs PiperOrigin-RevId: 305592245	2020-04-08 17:58:13 -07:00
Adin Scannell	928a7c60b8	Fix all printf formatting errors. Updates #2243	2020-04-08 10:14:34 -07:00
Adin Scannell	94b793262d	Fix all copy locks violations. This required minor restructuring of how system call tables were saved and restored, but it makes way more sense this way. Updates #2243	2020-04-08 10:00:14 -07:00
Ian Lewis	56054fc1fb	Add friendlier messages for frequently encountered errors. Issue #2270 Issue #1765 PiperOrigin-RevId: 305385436	2020-04-07 18:51:01 -07:00
Ian Lewis	5802051b3d	Update TODO to #238 Move TODO to #238 so that proper synchronization of operations is handled when we create the urpc client. Issue #238 Fixes #512 PiperOrigin-RevId: 305383924	2020-04-07 18:39:33 -07:00
Andrei Vagin	acf0259255	Don't map the 0 uid into a sandbox user namespace Starting with go1.13, we can specify ambient capabilities when we execute a new process with os/exec.Cmd. PiperOrigin-RevId: 305366706	2020-04-07 16:46:05 -07:00
Dean Deng	fc72eb3595	Remove TODOs for local gofer extended attributes. PiperOrigin-RevId: 305344989	2020-04-07 14:48:40 -07:00
Adin Scannell	4e6a1a5adb	Automated rollback of changelist 303799678 PiperOrigin-RevId: 304221302	2020-04-01 11:06:26 -07:00
Aaron Lu	0cfdd47391	checkpoint/restore: make sure the donated stdioFDs have the same value Suppose I start a runsc container using the kvm platform like this: $ sudo runsc --debug=true --debug-log=1.txt --platform=kvm run rootbash The donated FDs and the corresponding cmdline for runsc-sandbox are: D0313 17:50:12.608203 44389 x:0] Donating FD 3: "1.txt" D0313 17:50:12.608214 44389 x:0] Donating FD 4: "control_server_socket" D0313 17:50:12.608224 44389 x:0] Donating FD 5: "|0" D0313 17:50:12.608229 44389 x:0] Donating FD 6: "/home/ziqian.lzq/bundle/bash/runsc/config.json" D0313 17:50:12.608234 44389 x:0] Donating FD 7: "|1" D0313 17:50:12.608238 44389 x:0] Donating FD 8: "sandbox IO FD" D0313 17:50:12.608242 44389 x:0] Donating FD 9: "/dev/kvm" D0313 17:50:12.608246 44389 x:0] Donating FD 10: "/dev/stdin" D0313 17:50:12.608249 44389 x:0] Donating FD 11: "/dev/stdout" D0313 17:50:12.608253 44389 x:0] Donating FD 12: "/dev/stderr" D0313 17:50:12.608257 44389 x:0] Starting sandbox: /proc/self/exe [runsc-sandbox --root=/run/containerd/runsc/default --debug=true --log= --max-threads=256 --reclaim-period=5 --log-format=text --debug-log=1.txt --debug-log-format=text --file-access=exclusive --overlay=false --fsgofer-host-uds=false --network=sandbox --log-packets=false --platform=kvm --strace=false --strace-syscalls= --strace-log-size=1024 --watchdog-action=Panic --panic-signal=-1 --profile=false --net-raw=true --num-network-channels=1 --rootless=false --alsologtostderr=false --ref-leak-mode=disabled --gso=true --software-gso=true --overlayfs-stale-read=false --shared-volume= --debug-log-fd=3 --panic-signal=15 boot --bundle=/home/ziqian.lzq/bundle/bash/runsc --controller-fd=4 --mounts-fd=5 --spec-fd=6 --start-sync-fd=7 --io-fds=8 --device-fd=9 --stdio-fds=10 --stdio-fds=11 --stdio-fds=12 --pidns=true --setup-root --cpu-num 32 --total-memory 4294967296 rootbash] Note that stdioFDs start from 10 with the kvm platform and stderr's FD is 12. 
If I restore a container from the checkpoint image which is derived by checkpointing the above rootbash container, but either omit the platform switch or specify the ptrace platform explicitly: $ sudo runsc --debug=true --debug-log=1.txt restore --image-path=some_path restored_rootbash the donated FDs and the corresponding cmdline for runsc-sandbox are: D0313 17:50:15.258632 44452 x:0] Donating FD 3: "1.txt" D0313 17:50:15.258640 44452 x:0] Donating FD 4: "control_server_socket" D0313 17:50:15.258645 44452 x:0] Donating FD 5: "|0" D0313 17:50:15.258648 44452 x:0] Donating FD 6: "/home/ziqian.lzq/bundle/bash/runsc/config.json" D0313 17:50:15.258653 44452 x:0] Donating FD 7: "|1" D0313 17:50:15.258657 44452 x:0] Donating FD 8: "sandbox IO FD" D0313 17:50:15.258661 44452 x:0] Donating FD 9: "/dev/stdin" D0313 17:50:15.258675 44452 x:0] Donating FD 10: "/dev/stdout" D0313 17:50:15.258680 44452 x:0] Donating FD 11: "/dev/stderr" D0313 17:50:15.258684 44452 x:0] Starting sandbox: /proc/self/exe [runsc-sandbox --root=/run/containerd/runsc/default --debug=true --log= --max-threads=256 --reclaim-period=5 --log-format=text --debug-log=1.txt --debug-log-format=text --file-access=exclusive --overlay=false --fsgofer-host-uds=false --network=sandbox --log-packets=false --platform=ptrace --strace=false --strace-syscalls= --strace-log-size=1024 --watchdog-action=Panic --panic-signal=-1 --profile=false --net-raw=true --num-network-channels=1 --rootless=false --alsologtostderr=false --ref-leak-mode=disabled --gso=true --software-gso=true --overlayfs-stale-read=false --shared-volume= --debug-log-fd=3 --panic-signal=15 boot --bundle=/home/ziqian.lzq/bundle/bash/runsc --controller-fd=4 --mounts-fd=5 --spec-fd=6 --start-sync-fd=7 --io-fds=8 --stdio-fds=9 --stdio-fds=10 --stdio-fds=11 --setup-root --cpu-num 32 --total-memory 4294967296 restored_rootbash] Note that this time, stdioFDs start from 9 and stderr's FD is 11 (so the saved host.descriptor.origFD, which is 12 for stderr, is no longer 
valid). For the three host FD based files, the s.Dev and s.Ino derived from fstat(fd) shall all be the same, and since the two fields are used as device.MultiDeviceKey, the host.inodeFileState.sattr.InodeId, which is the value of MultiDevice.Map(MultiDeviceKey), shall also all be the same. Note that for MultiDevice m, m.cache records the mapping of key to value and m.rcache records the mapping of value to key. If the same value doesn't map to the same key, it will panic on restore. Now that stderr's origFD 12 is no longer valid (it happens to be /memfd:runsc-memory in my test on restore), the s.Dev and s.Ino derived from fstat(fd=12) in host.inodeFileState.afterLoad() will not be correct either. Because its InodeID is still the same as saved, MultiDevice.Load() will complain about the same value (InodeID) being mapped to different keys (different from stdin's and stdout's) and panic with: "MultiDevice's caches are inconsistent". Solve this problem by making sure stdioFDs for the root container's init task are always the same on initial start and on restore, no matter what cmdline the user has used: whether a debug log is specified or not, whether the platform changed or not, etc. shall not affect the ability to restore. Fixes #1844.	2020-03-31 11:37:11 +08:00
Adin Scannell	3fac85da95	kvm: handle exit reasons even under EINTR. In the case of other signals (preemption), inject a normal bounce and defer the signal until the vCPU has been returned from guest mode. PiperOrigin-RevId: 303799678	2020-03-30 12:37:57 -07:00
Dean Deng	137f361400	Use host-defined file owner and mode, when possible, for imported fds. Using the host-defined file owner matches VFS1. It is more correct to use the host-defined mode, since the cached value may become out of date. However, kernfs.Inode.Mode() does not return an error--other filesystems on kernfs are in-memory so retrieving mode should not fail. Therefore, if the host syscall fails, we rely on a cached value instead. Updates #1672. PiperOrigin-RevId: 303220864	2020-03-26 16:47:20 -07:00
Dean Deng	248e46f320	Whitelist utimensat(2). utimensat is used by hostfs for setting timestamps on imported fds. Previously, this would crash the sandbox since utimensat was not allowed. Correct the VFS2 version of hostfs to match the call in VFS1. PiperOrigin-RevId: 301970121	2020-03-19 23:30:21 -07:00
Fabricio Voznika	069f1edbe4	Improve error message when pivot_root fails PiperOrigin-RevId: 301949722	2020-03-19 20:18:03 -07:00
Dean Deng	5e413cad10	Plumb VFS2 imported fds into virtual filesystem. - When setting up the virtual filesystem, mount a host.filesystem to contain all files that need to be imported. - Make read/preadv syscalls to the host in cases where preadv2 may not be supported yet (likewise for writing). - Make save/restore functions in kernel/kernel.go return early if vfs2 is enabled. PiperOrigin-RevId: 300922353	2020-03-14 07:14:33 -07:00
Fabricio Voznika	f2e4b5ab93	Kill sandbox process when parent process terminates When the sandbox runs in attached mode, e.g. runsc do, runsc run, the sandbox lifetime is controlled by the parent process. This wasn't working in all cases because the parent-death signal set with PR_SET_PDEATHSIG doesn't propagate through execve when the process changes uid/gid. So it was getting dropped when the sandbox execve's to change to user nobody. PiperOrigin-RevId: 300601247	2020-03-12 12:32:26 -07:00
Andrei Vagin	d3fa741fb5	runsc: Set asyncpreemptoff for the kvm platform The asynchronous goroutine preemption is a new feature of Go 1.14. When we switched to Go 1.14 (cl/297915917) in the bazel config, the kokoro syscall-kvm job started permanently failing. Let's temporarily set asyncpreemptoff for the kvm platform to unblock tests. PiperOrigin-RevId: 300372387	2020-03-11 11:45:50 -07:00
gVisor bot	6367963c14	Merge pull request #1951 from moricho:moricho/add-profiler-option PiperOrigin-RevId: 299233818	2020-03-05 17:16:54 -08:00
Andrei Vagin	6ec669631f	tests: Don't print log messages on stdout A parser of test results doesn't expect to see any extra messages. PiperOrigin-RevId: 299174138	2020-03-05 13:08:04 -08:00
Andrei Vagin	80b40bbb06	tests: Don't print log messages on stdout A parser of test results doesn't expect to see any extra messages. PiperOrigin-RevId: 298966577	2020-03-04 16:16:35 -08:00
Andrei Vagin	322dbfe06b	Allow specifying a separate log for Go's runtime messages Go's runtime calls the write system call twice to print "panic:" and "the reason of this panic", so there is a race window in which other threads can print something to the log, and we will see something like this: panic: log messages from another thread The reason of the panic. This confuses the syzkaller blacklist and dedup detection. It also makes the logs generally difficult to read. e.g., data races often have one side of the race, followed by a large "diagnosis" dump, finally followed by the other side of the race. PiperOrigin-RevId: 297887895	2020-02-28 11:24:11 -08:00
Fabricio Voznika	88f7369922	Log oom_score_adj value on error Updates #1873 PiperOrigin-RevId: 297695241	2020-02-27 14:59:38 -08:00
moricho	d8ed784311	add profile option	2020-02-26 16:49:51 +09:00
Jamie Liu	471b15b212	Port most syscalls to VFS2. pipe and pipe2 aren't ported, pending a slight rework of pipe FDs for VFS2. mount and umount2 aren't ported out of temporary laziness. access and faccessat need additional FSImpl methods to implement properly, but are stubbed to prevent googletest from CHECK-failing. Other syscalls require additional plumbing. Updates #1623 PiperOrigin-RevId: 297188448	2020-02-25 13:37:34 -08:00
Fabricio Voznika	4d7db46123	Add log during process wait in tests TestMultiContainerKillAll timed out under --race. Without logging, we cannot tell if the process list is still increasing, but slowly, or is stuck. PiperOrigin-RevId: 297158834	2020-02-25 11:14:47 -08:00
gVisor bot	4a73bae269	Initial network namespace support. TCP/IP will work with netstack networking. hostinet doesn't work, and sockets will have the same behavior as they do now. Until userspace is able to create devices, the default loopback device can be used for testing. /proc/net and /sys/net will still be connected to the root network stack; this is the same behavior as now. Issue #1833 PiperOrigin-RevId: 296309389	2020-02-20 15:20:40 -08:00
Adin Scannell	ec5630527b	Add statefile command to runsc. PiperOrigin-RevId: 296105337	2020-02-19 18:28:42 -08:00
gVisor bot	5baf9dc2fb	Synchronize signalling with S/R This is to fix a data race between sending an external signal to a ThreadGroup and kernel saving state for S/R. PiperOrigin-RevId: 295244281	2020-02-14 15:49:09 -08:00
gVisor bot	4075de11be	Plumb VFS2 inside the Sentry - Added fsbridge package with interface that can be used to open and read from VFS1 and VFS2 files. - Converted ELF loader to use fsbridge - Added VFS2 types to FSContext - Added vfs.MountNamespace to ThreadGroup Updates #1623 PiperOrigin-RevId: 295183950	2020-02-14 11:12:47 -08:00
gVisor bot	b8e22e241c	Disallow duplicate NIC names. PiperOrigin-RevId: 294500858	2020-02-11 12:59:11 -08:00
Adin Scannell	afcab8fe9f	Clean-up comments in runsc/BUILD and CONTRIBUTING.md. PiperOrigin-RevId: 294300437	2020-02-10 14:15:36 -08:00
Adin Scannell	3e8b38d08b	Add flag package to limit visibility. PiperOrigin-RevId: 294297004	2020-02-10 13:57:01 -08:00
Dean Deng	17b9f5e662	Support listxattr and removexattr syscalls. Note that these are only implemented for tmpfs, and other impls will still return EOPNOTSUPP. PiperOrigin-RevId: 293899385	2020-02-07 14:47:13 -08:00
Ting-Yu Wang	386a1a1564	Fix TestPauseResume in container test failing with connection refused. Sometimes we get this error under TSAN: """ error getting process data from container: connecting to control server at PID XXXX: connection refused """ The theory is that the top "sleep 20" was too short for TSAN, and the container had already exited, so we get connection refused. This commit changes the test to have the container signal that it is running by touching a file repeatedly for the duration of the test. PiperOrigin-RevId: 293710957	2020-02-06 17:07:07 -08:00
Andrei Vagin	615d661112	runsc/container_test: hide host /etc in test containers The host /etc can contain config files which affect tests. For example, bash reads /etc/passwd and if it is too big a test can fail by timeout. PiperOrigin-RevId: 293670637	2020-02-06 14:02:52 -08:00
Adin Scannell	1b6a12a768	Add notes to relevant tests. These were out-of-band notes that can help provide additional context and simplify automated imports. PiperOrigin-RevId: 293525915	2020-02-05 22:46:35 -08:00
gVisor bot	b29aeebaf6	Merge pull request #1683 from kevinGC:ipt-udp-matchers PiperOrigin-RevId: 293243342	2020-02-04 16:20:16 -08:00
Kevin Krakauer	3f5642c5af	Increase container_test size. container_test was flaking because a small percentage of runs timed out. Tested this fix with --runs_per_test=100. PiperOrigin-RevId: 293240102	2020-02-04 15:38:53 -08:00
Fabricio Voznika	6d8bf405bc	Allow mlock in fsgofer system call filters Go 1.14 has a workaround for a Linux 5.2-5.4 bug which requires mlock'ing the g stack to prevent register corruption. We need to allow this syscall until it is removed from Go. PiperOrigin-RevId: 293212935	2020-02-04 13:42:27 -08:00
Ting-Yu Wang	e7846e50f2	Reduce run time for //test/syscalls:socket_inet_loopback_test_runsc_ptrace. * Tests are picked for a shard differently. It now picks one test from each block, instead of picking the whole block. This makes the same kind of tests spread across different shards. * Reduce the number of connect() calls in TCPListenClose. PiperOrigin-RevId: 293019281	2020-02-03 15:42:21 -08:00
Brad Burlage	80ce7f2537	Tag version_test as noguitar. PiperOrigin-RevId: 292974323	2020-02-03 12:09:52 -08:00
Michael Pratt	4d1a648c7c	Allow mlock in system call filters Go 1.14 has a workaround for a Linux 5.2-5.4 bug which requires mlock'ing the g stack to prevent register corruption. We need to allow this syscall until it is removed from Go. PiperOrigin-RevId: 292967478	2020-02-03 11:39:51 -08:00
Fabricio Voznika	437c986c6a	Add vfs.FileDescription to FD table FD table now holds both VFS1 and VFS2 types and uses the correct one based on what's set. Parts of this CL are just initial changes (e.g. sys_read.go, runsc/main.go) to serve as a template for the remaining changes. Updates #1487 Updates #1623 PiperOrigin-RevId: 292023223	2020-01-28 15:31:03 -08:00
Adin Scannell	253c9e666c	Cleanup glog and add real caller information. In general, we've learned that logging must be avoided at all costs in the hot path. It's unlikely that the optimizations here were significant in any case, since buffer would certainly escape. This also adds a test to ensure that the caller identification works as expected, and so that logging can be benchmarked. Original: BenchmarkGoogleLogging-6 1222255 949 ns/op With this change: BenchmarkGoogleLogging-6 517323 2346 ns/op Fixes #184 PiperOrigin-RevId: 291815420	2020-01-27 16:08:35 -08:00
Adin Scannell	0e2f1b7abd	Update package locations. Because the abi will depend on the core types for marshalling (usermem, context, safemem, safecopy), these need to be flattened from the sentry directory. These packages contain no sentry-specific details. PiperOrigin-RevId: 291811289	2020-01-27 15:31:32 -08:00
Adin Scannell	90ec596166	Fix licenses. The preferred Copyright holder is "The gVisor Authors". PiperOrigin-RevId: 291786657	2020-01-27 13:23:57 -08:00
Adin Scannell	d29e59af9f	Standardize on tools directory. PiperOrigin-RevId: 291745021	2020-01-27 12:21:00 -08:00
Dean Deng	07f2584979	Plumb getting/setting xattrs through InodeOperations and 9p gofer interfaces. There was a very bare get/setxattr in the InodeOperations interface. Add context.Context to both, size to getxattr, and flags to setxattr. Note that extended attributes are passed around as strings in this implementation, so size is automatically encoded into the value. Size is added in getxattr so that implementations can return ERANGE if a value is larger than can fit in the user-allocated buffer. This prevents us from unnecessarily passing around an arbitrarily large xattr when the user buffer is actually too small. Don't use the existing xattrwalk and xattrcreate messages and define our own, mainly for the sake of simplicity. Extended attributes will be implemented in future commits. PiperOrigin-RevId: 290121300	2020-01-16 12:56:33 -08:00
Bhasker Hariharan	f874723e64	Bump SO_SNDBUF for fdbased endpoint used by runsc. Updates #231 PiperOrigin-RevId: 289897881	2020-01-15 11:19:06 -08:00
Ian Gudger	27500d529f	New sync package. * Rename syncutil to sync. * Add aliases to sync types. * Replace existing usage of standard library sync package. This will make it easier to swap out synchronization primitives. For example, this will allow us to use primitives from github.com/sasha-s/go-deadlock to check for lock ordering violations. Updates #1472 PiperOrigin-RevId: 289033387	2020-01-09 22:02:24 -08:00
Bert Muthalaly	e21c584056	Combine various Create*NIC methods into CreateNICWithOptions. PiperOrigin-RevId: 288779416	2020-01-08 14:50:49 -08:00
Bert Muthalaly	0cc1e74b57	Add NIC.isLoopback() ...enabling us to remove the "CreateNamedLoopbackNIC" variant of CreateNIC and all the plumbing to connect it through to where the value is read in FindRoute. PiperOrigin-RevId: 288713093	2020-01-08 09:30:20 -08:00
Fabricio Voznika	0d475cdb01	Increase waitForProcessList timeout It can take more than 10 seconds when running under --race. PiperOrigin-RevId: 286296060	2019-12-18 17:10:44 -08:00
Aleksandr Razumov	67f678be27	Leave minimum CPU number as a constant Remove introduced CPUNumMin config and hard-code it as 2.	2019-12-17 20:41:02 +03:00
Aleksandr Razumov	b661434202	Add minimum CPU number and only lower CPUs on --cpu-num-from-quota * Add `--cpu-num-min` flag to control minimum CPUs * Only lower CPU count * Fix comments	2019-12-17 13:27:13 +03:00
Aleksandr Razumov	8782f0e287	Set CPU number to CPU quota When an application is not cgroups-aware, it can spawn excessive threads which often defaults to CPU number. Introduce an opt-in flag that will set CPU number according to CPU quota (if available). Fixes #1391	2019-12-15 21:12:43 +03:00
Kevin Krakauer	be2754a4b9	Add iptables testing framework. It would be preferable to test iptables via syscall tests, but there are some problems with that approach: * We're limited to loopback-only, as syscall tests involve only a single container. Other link interfaces (e.g. fdbased) should be tested. * We'd have to shell out to call iptables anyways, as the iptables syscall interface itself is too large and complex to work with alone. * Running the Linux/native version of the syscall test will require root, which is a pain to configure, is inherently unsafe, and could leave host iptables misconfigured. Using the go_test target allows there to be no new test runner. PiperOrigin-RevId: 285274275	2019-12-12 14:42:11 -08:00
Bhasker Hariharan	b9aa62b9f9	Enable IPv6 in runsc Fixes #1341 PiperOrigin-RevId: 285108973	2019-12-11 19:14:26 -08:00
Andrei Vagin	f8c5ad061b	runsc/debug: add an option to list all processes runsc debug --ps lists all processes with all threads. This option is added to the debug command but not to the ps command, because it is going to be used for debug purposes and we want to add any useful information without thinking about backward compatibility. This will help to investigate syzkaller issues. PiperOrigin-RevId: 285013668	2019-12-11 11:05:41 -08:00
Dean Deng	1643224af0	Finish incomplete comment. PiperOrigin-RevId: 285012278	2019-12-11 10:37:35 -08:00
Fabricio Voznika	01eadf51ea	Bump up Go 1.13 as minimum requirement PiperOrigin-RevId: 284320186	2019-12-06 23:10:15 -08:00
gVisor bot	e70636d7f1	Merge pull request #1233 from xiaobo55x:compatLog PiperOrigin-RevId: 284305935	2019-12-06 19:41:39 -08:00
Adin Scannell	371e210b83	Add runtime tracing. This adds meaningful annotations to the trace generated by the runtime/trace package. PiperOrigin-RevId: 284290115	2019-12-06 17:00:07 -08:00
Nicolas Lacasse	663fe840f7	Implement TTY field in control.Processes(). Threadgroups already know their TTY (if they have one), which now contains the TTY Index, and is returned in the Processes() call. PiperOrigin-RevId: 284263850	2019-12-06 14:34:13 -08:00
Fabricio Voznika	ea7a100202	Make annotations OCI compliant Changed annotation to follow the standard defined here: https://github.com/opencontainers/image-spec/blob/master/annotations.md PiperOrigin-RevId: 284254847	2019-12-06 13:51:38 -08:00
Fabricio Voznika	40035d7d9c	Fix possible race condition destroying container When the sandbox is destroyed, making URPC calls to destroy the container will fail. The code was checking if the sandbox was running before attempting to make the URPC call, but that is racy. PiperOrigin-RevId: 284093764	2019-12-05 17:58:36 -08:00
Dean Deng	19b2d997ec	Support IP_TOS and IPV6_TCLASS socket options for hostinet sockets. There are two potential ways of sending a TOS byte with outgoing packets: including a control message in sendmsg, or setting the IP_TOS/IPV6_TCLASS socket options (for IPV4 and IPV6 respectively). This change lets hostinet support the latter. Fixes #1188 PiperOrigin-RevId: 283550925	2019-12-03 08:33:22 -08:00
Haibo Xu	61f2274cb6	Enable runsc compatLog support on arm64. Signed-off-by: Haibo Xu <haibo.xu@arm.com> Change-Id: I3fd5e552f5f03b5144ed52647f75af3b8253b1d6	2019-12-03 03:25:54 +00:00
Dean Deng	684f757a22	Add support for receiving TOS and TCLASS control messages in hostinet. This involves allowing getsockopt/setsockopt for the corresponding socket options, as well as allowing hostinet to process control messages received from the actual recvmsg syscall. PiperOrigin-RevId: 282851425	2019-11-27 16:21:05 -08:00
gVisor bot	4a620c436d	Merge pull request #981 from tanjianfeng:fix-898 PiperOrigin-RevId: 282669859	2019-11-26 17:21:43 -08:00
Fabricio Voznika	97d2c9a94e	Use mount hints to determine FileAccessType PiperOrigin-RevId: 282401165	2019-11-25 11:43:05 -08:00
gVisor bot	0416c247ec	Merge pull request #1176 from xiaobo55x:runsc_boot PiperOrigin-RevId: 282382564	2019-11-25 11:01:22 -08:00
Jianfeng Tan	f697d1a33e	gofer: reduce CPU usage on GC as of frequent readdir Refer to golang mallocgc(), each time of allocating an object > 32 KB, a gc will be triggered. When we do readdir, sentry always passes 65535, which leads to a malloc of 65535 * sizeof(p9.Dirent) > 32 KB. Considering we already use slice append, let's avoid defining the capacity for this slice. Command for test: Before this change: (container)$ time tree linux-5.3.1 > /dev/null real 0m54.272s user 0m2.010s sys 0m1.740s (CPU usage of Gofer: ~30 cores) (host)$ perf top -p <pid-of-gofer> 42.57% runsc [.] runtime.gcDrain 23.41% runsc [.] runtime.(lfstack).pop 9.74% runsc [.] runtime.greyobject 8.06% runsc [.] runtime.(lfstack).push 4.33% runsc [.] runtime.scanobject 1.69% runsc [.] runtime.findObject 1.12% runsc [.] runtime.findrunnable 0.69% runsc [.] runtime.runqgrab ... (host)$ mkdir test && cd test (host)$ for i in `seq 1 65536`; do mkdir $i; done (container)$ time ls test/ > /dev/null real 2m10.934s user 0m0.280s sys 0m4.260s (CPU usage of Gofer: ~1 core) After this change: (container)$ time tree linux-5.3.1 > /dev/null real 0m22.465s user 0m1.270s sys 0m1.310s (CPU usage of Gofer: ~1 core) $ perf top -p <pid-of-gofer> 20.57% runsc [.] runtime.gcDrain 7.15% runsc [.] runtime.(lfstack).pop 4.11% runsc [.] runtime.scanobject 3.78% runsc [.] runtime.greyobject 2.78% runsc [.] runtime.(lfstack).push ... (host)$ mkdir test && cd test (host)$ for i in `seq 1 65536`; do mkdir $i; done (container)$ time ls test/ > /dev/null real 0m13.338s user 0m0.190s sys 0m3.980s (CPU usage of Gofer: ~0.8 core) Fixes #898 Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com>	2019-11-23 13:24:46 +00:00
Michael Pratt	5eb522193c	Force timezone initialization before filter installation The first use of time.Local (usually via time.Time.Date, et al.) performs initialization of the local timezone, which involves opening several tzdata files from the host. Since filter installation disallows open, we should explicitly force this initialization rather than implicitly depending on the first logging (or other time) call occurring before filter installation. PiperOrigin-RevId: 282053121	2019-11-22 15:47:15 -08:00
Haibo Xu	05871a1cdc	Enable runsc/boot support on arm64. This patch also includes a minor change to replace syscall.Dup2 with syscall.Dup3 which was missed in a previous commit (ref `a25a976`). Signed-off-by: Haibo Xu <haibo.xu@arm.com> Change-Id: I00beb9cc492e44c762ebaa3750201c63c1f7c2f3	2019-11-13 06:39:11 +00:00
Jamie Liu	f8ffadddb3	Add p9.OpenTruncate. This is required to implement O_TRUNC correctly on filesystems backed by gofers. 9P2000.L: "lopen prepares fid for file I/O. flags contains Linux open(2) flags bits, e.g. O_RDONLY, O_RDWR, O_WRONLY." open(2): "The argument flags must include one of the following access modes: O_RDONLY, O_WRONLY, or O_RDWR. ... In addition, zero or more file creation flags and file status flags can be bitwise-or'd in flags." The reference 9P2000.L implementation also appears to expect arbitrary flags, not just access modes, in Tlopen.flags: https://github.com/chaos/diod/blob/master/diod/ops.c#L703 PiperOrigin-RevId: 278972683	2019-11-06 17:11:58 -08:00
Adin Scannell	e904823833	Fix repository build scripts. This fixes a number of issues with the repository build process: * Fix the overall structure of the repository. * Fix the debian package description. * Fix the broken version number for packages. * Update the digest algorithm used for signing the release. I've validated that installation works from a separate staging bucket. Updates #852 PiperOrigin-RevId: 278716914	2019-11-05 15:16:04 -08:00
Michael Pratt	b23b36e701	Add NETLINK_KOBJECT_UEVENT socket support NETLINK_KOBJECT_UEVENT sockets send udev-style messages for device events. gVisor doesn't have any device events, so our sockets don't need to do anything once created. systemd's device manager needs to be able to create one of these sockets. It also wants to install a BPF filter on the socket. Since we'll never send any messages, the filter would never be invoked, thus we just fake it out. Fixes #1117 Updates #1119 PiperOrigin-RevId: 278405893	2019-11-04 10:07:52 -08:00
gVisor bot	802a3b3bd0	Merge pull request #1109 from xiaobo55x:fsgofer PiperOrigin-RevId: 278032567	2019-11-01 17:37:07 -07:00
Nicolas Lacasse	e70f28664a	Allow the watchdog to detect when the sandbox is stuck during setup. The watchdog currently can find stuck tasks, but has no way to tell if the sandbox is stuck before the application starts executing. This CL adds a startup timeout and action to the watchdog. If Start() is not called before the given timeout (if non-zero), then the watchdog will take the action. PiperOrigin-RevId: 277970577	2019-11-01 11:49:31 -07:00
Ian Lewis	36837c4ad3	Add systemd-cgroup flag option. Adds a systemd-cgroup flag option that prints an error letting the user know that systemd cgroups are not supported and points them to the relevant issue. Issue #193 PiperOrigin-RevId: 277837162	2019-10-31 17:39:06 -07:00
gVisor bot	0202be1ba5	Merge pull request #1058 from cmingxu:master PiperOrigin-RevId: 277623766	2019-10-31 11:26:45 -07:00
Fabricio Voznika	ca90dad0e2	Fix container locking Sandbox root dir was not being saved with the Container state, so it would point to the wrong directory location when attempting to lock the sandbox. This led to race conditions saving and loading container state. Fixing it led to multiple deadlocks. I've moved the saving and locking logic to a separate struct and moved the lock file inside the RootDir (instead of container root dir), which allows the lock to be taken inside Destroy, and removes the need to lock the sandbox. PiperOrigin-RevId: 277599612	2019-10-30 15:39:04 -07:00
Andrei Vagin	db37483cb6	Store endpoints inside multiPortEndpoint in a sorted order It is required to guarantee the same order of endpoints after save/restore. PiperOrigin-RevId: 277598665	2019-10-30 15:33:41 -07:00
Haibo Xu	80d0db274e	Enable runsc/fsgofer support on arm64. newfstatat() syscall is not supported on arm64, so we resort to using the fstatat() syscall. Signed-off-by: Haibo Xu <haibo.xu@arm.com> Change-Id: I9e89d46c5ec9ae07db201c9da5b6dda9bfd2eaf0	2019-10-30 05:21:36 +00:00
Haibo Xu	dec831b493	Cast the Stat_t.Nlink to uint64 on arm64. Since the syscall.Stat_t.Nlink is defined as different types on amd64 and arm64 (uint64 and uint32 respectively), we need to cast them to a unified uint64 type in gVisor code. Signed-off-by: Haibo Xu <haibo.xu@arm.com> Change-Id: I7542b99b195c708f3fc49b1cbe6adebdd2f6e96b	2019-10-28 05:56:03 +00:00
Fabricio Voznika	e8ba10c008	Fix early deletion of rootDir container.startContainers() cannot be called twice in a test (e.g. TestMultiContainerLoadSandbox) because the cleanup function deletes the rootDir, together with information from all other containers that may exist. PiperOrigin-RevId: 276591806	2019-10-24 16:36:54 -07:00
kevin.xu	1f19624fa1	fix typo fix a typo	2019-10-23 15:21:50 +08:00
kevin.xu	3edbdcc191	remove duplicated period remove a duplicated period	2019-10-23 14:56:44 +08:00
gVisor bot	6122b413f1	Merge pull request #1046 from tomlanyon:crio PiperOrigin-RevId: 276172466	2019-10-22 17:05:04 -07:00
Andrei Vagin	8720bd643e	netstack/tcp: software segmentation offload Right now, we send each tcp packet separately, we call one system call per-packet. This patch allows generating multiple tcp packets and sending them by sendmmsg. The arguable part of this CL is a way how to handle multiple headers. This CL adds the next field to the Prependable buffer. Nginx test results: Server Software: nginx/1.15.9 Server Hostname: 10.138.0.2 Server Port: 8080 Document Path: /10m.txt Document Length: 10485760 bytes w/o gso: Concurrency Level: 5 Time taken for tests: 5.491 seconds Complete requests: 100 Failed requests: 0 Total transferred: 1048600200 bytes HTML transferred: 1048576000 bytes Requests per second: 18.21 [#/sec] (mean) Time per request: 274.525 [ms] (mean) Time per request: 54.905 [ms] (mean, across all concurrent requests) Transfer rate: 186508.03 [Kbytes/sec] received sw-gso: Concurrency Level: 5 Time taken for tests: 3.852 seconds Complete requests: 100 Failed requests: 0 Total transferred: 1048600200 bytes HTML transferred: 1048576000 bytes Requests per second: 25.96 [#/sec] (mean) Time per request: 192.576 [ms] (mean) Time per request: 38.515 [ms] (mean, across all concurrent requests) Transfer rate: 265874.92 [Kbytes/sec] received w/o gso: $ ./tcp_benchmark --client --duration 15 --ideal [SUM] 0.0-15.1 sec 2.20 GBytes 1.25 Gbits/sec software gso: $ tcp_benchmark --client --duration 15 --ideal --gso $((1<<16)) --swgso [SUM] 0.0-15.1 sec 3.99 GBytes 2.26 Gbits/sec PiperOrigin-RevId: 276112677	2019-10-22 11:55:56 -07:00
Kevin Krakauer	12235d533a	AF_PACKET support for netstack (aka epsocket). Like (AF_INET, SOCK_RAW) sockets, AF_PACKET sockets require CAP_NET_RAW. With runsc, you'll need to pass `--net-raw=true` to enable them. Binding isn't supported yet. PiperOrigin-RevId: 275909366	2019-10-21 13:23:18 -07:00
Tom Lanyon	7e8b5f4a3a	Add runsc OCI annotations to support CRI-O. Obligatory https://xkcd.com/927 Fixes #626	2019-10-20 21:11:01 +11:00
Michael Pratt	49b596b98d	Cleanup host UDS support This change fixes several issues with the fsgofer host UDS support. Notably, it adds support for SOCK_SEQPACKET and SOCK_DGRAM sockets [1]. It also fixes unsafe use of unet.Socket, which could cause a panic if Socket.FD is called when err != nil, and calls to Socket.FD with nothing to prevent the garbage collector from destroying and closing the socket. A set of tests is added to exercise host UDS access. This required extracting most of the syscall test runner into a library that can be used by custom tests. Updates #235 Updates #1003 [1] N.B. SOCK_DGRAM sockets are likely not particularly useful, as a server can only reply to a client that binds first. We don't allow bind, so these are unlikely to be used. PiperOrigin-RevId: 275558502	2019-10-18 15:33:03 -07:00
Fabricio Voznika	9fb562234e	Fix problem with open FD when copy up is triggered in overlayfs Linux kernel before 4.19 doesn't implement a feature that updates open FD after a file is open for write (and is copied to the upper layer). Already open FD will continue to read the old file content until they are reopened. This is especially problematic for gVisor because it caches open files. Flag was added to force readonly files to be reopened when the same file is open for write. This is only needed if using kernels prior to 4.19. Closes #1006 It's difficult to really test this because we never run tests on older kernels. I'm adding a test in GKE which uses kernels with the overlayfs problem for 1.14 and lower. PiperOrigin-RevId: 275115289	2019-10-16 15:06:24 -07:00
Michael Pratt	a295616326	Make Attach no longer a special snowflake fsgofer.attachPoint.Attach has a bunch of funky special logic to create a RW file or connect a socket rather than creating a standard control file like localFile.Walk. This is unnecessary and error-prone, as the attach point still has to go through Open or Connect which will properly convert the control file to something usable. As such, switch the logic to be equivalent to a simple Walk. Updates #235 PiperOrigin-RevId: 274827872	2019-10-15 10:01:22 -07:00
gVisor bot	35d35ea5d0	Merge pull request #997 from dvrkps:patch-1 PiperOrigin-RevId: 274675428	2019-10-14 15:52:22 -07:00
Davor Kapsa	fec0663bb7	Set base to root	2019-10-11 06:38:26 +02:00

772 Commits