gvisor

Commit Graph

Author	SHA1	Message	Date
Fabricio Voznika	d28f71adcf	Remove 'clearStatus' option from container.Wait*PID() clearStatus was added to allow detached execution to wait on the exec'd process and retrieve its exit status. However, it's not currently used. Both docker and gvisor-containerd-shim wait on the "shim" process and retrieve the exit status from there. We could change gvisor-containerd-shim to use waits, but it will end up also consuming a process for the wait, which is similar to having the shim process. Closes #234 PiperOrigin-RevId: 251349490	2019-06-03 18:16:09 -07:00
Bhasker Hariharan	035a8fa38e	Add support for collecting execution trace to runsc. Updates #220 PiperOrigin-RevId: 250532302	2019-05-30 12:07:11 -07:00
Andrei Vagin	5f8225c009	runsc: don't create an empty network namespace if NetworkHost is set With this change, we will be able to run runsc do in a host network namespace. PiperOrigin-RevId: 246436660 Change-Id: I8ea18b1053c88fe2feed74239b915fe7a151ce34	2019-05-02 19:34:36 -07:00
Fabricio Voznika	bbb6539114	Add [simple] network support to 'runsc do' Sandbox always runsc with IP 192.168.10.2 and the peer network adds 1 to the address (192.168.10.3). Sandbox IP can be changed using --ip flag. Here a few examples: sudo runsc do curl www.google.com sudo runsc do --ip=10.10.10.2 bash -c "echo 123 \| netcat -l -p 8080" PiperOrigin-RevId: 246421277 Change-Id: I7b3dce4af46a57300350dab41cb27e04e4b6e9da	2019-05-02 17:17:39 -07:00
Michael Pratt	4d52a55201	Change copyright notice to "The gVisor Authors" Based on the guidelines at https://opensource.google.com/docs/releasing/authors/. 1. $ rg -l "Google LLC" \| xargs sed -i 's/Google LLC.*/The gVisor Authors./' 2. Manual fixup of "Google Inc" references. 3. Add AUTHORS file. Authors may request to be added to this file. 4. Point netstack AUTHORS to gVisor AUTHORS. Drop CONTRIBUTORS. Fixes #209 PiperOrigin-RevId: 245823212 Change-Id: I64530b24ad021a7d683137459cafc510f5ee1de9	2019-04-29 14:26:23 -07:00
Nicolas Lacasse	f4ce43e1f4	Allow and document bug ids in gVisor codebase. PiperOrigin-RevId: 245818639 Change-Id: I03703ef0fb9b6675955637b9fe2776204c545789	2019-04-29 14:04:14 -07:00
Fabricio Voznika	546a1df7d1	Add 'runsc do' command It provides an easy way to run commands to quickly test gVisor. By default it maps the host root as the container root with a writable overlay on top (so the host root is not modified). Example: sudo runsc do ls -lh --color sudo runsc do ~/src/test/my-test.sh PiperOrigin-RevId: 243178711 Change-Id: I05f3d6ce253fe4b5f1362f4a07b5387f6ddb5dd9	2019-04-11 17:54:34 -07:00
Fabricio Voznika	e420cc3e5d	Add support for mount propagation Properly handle propagation options for root and mounts. Now usage of mount options shared, rshared, and noexec cause error to start. shared/ rshared breaks sandbox=>host isolation. slave however can be supported because changes propagate from host to sandbox. Root FS setup moved inside the gofer. Apart from simplifying the code, it keeps all mounts inside the namespace. And they are torn down when the namespace is destroyed (DestroyFS is no longer needed). PiperOrigin-RevId: 239037661 Change-Id: I8b5ee4d50da33c042ea34fa68e56514ebe20e6e0	2019-03-18 12:30:43 -07:00
Fabricio Voznika	bc9b979b94	Add profiling commands to runsc Example: runsc debug --root=<dir> \ --profile-heap=/tmp/heap.prof \ --profile-cpu=/tmp/cpu.prod --profile-delay=30 \ <container ID> PiperOrigin-RevId: 237848456 Change-Id: Icff3f20c1b157a84d0922599eaea327320dad773	2019-03-11 11:47:30 -07:00
Andrei Vagin	dd577f5410	runsc: reap a sandbox process only in sandbox.Wait() PiperOrigin-RevId: 231504064 Change-Id: I585b769aef04a3ad7e7936027958910a6eed9c8d	2019-01-29 17:15:56 -08:00
Andrei Vagin	5f08f8fd81	Don't bind-mount runsc into a sandbox mntns PiperOrigin-RevId: 230437407 Change-Id: Id9d8ceeb018aad2fe317407c78c6ee0f4b47aa2b	2019-01-22 16:46:42 -08:00
Fabricio Voznika	c1be25b78d	Scrub runsc error messages Removed "error" and "failed to" prefix that don't add value from messages. Adjusted a few other messages. In particular, when the container fail to start, the message returned is easier for humans to read: $ docker run --rm --runtime=runsc alpine foobar docker: Error response from daemon: OCI runtime start failed: <path> did not terminate sucessfully: starting container: starting root container [foobar]: starting sandbox: searching for executable "foobar", cwd: "/", $PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin": no such file or directory Closes #77 PiperOrigin-RevId: 230022798 Change-Id: I83339017c70dae09e4f9f8e0ea2e554c4d5d5cd1	2019-01-18 17:36:02 -08:00
Andrei Vagin	c0a981629c	Start a sandbox process in a new userns only if CAP_SETUID is set In addition, it fixes a race condition in TestMultiContainerGoferStop. There are two scripts copy the same set of files into the same directory and sometime one of this command fails with EXIST. PiperOrigin-RevId: 230011247 Change-Id: I9289f72e65dc407cdcd0e6cd632a509e01f43e9c	2019-01-18 16:08:39 -08:00
Andrei Vagin	c063a1350f	runsc: create a new proc mount if the sandbox process is running in a new pidns PiperOrigin-RevId: 229971902 Change-Id: Ief4fac731e839ef092175908de9375d725eaa3aa	2019-01-18 12:17:34 -08:00
Andrei Vagin	a46b6d453d	runsc: set up a minimal chroot from the sandbox process In this case, new mounts are not created in the host mount namspaces, so tearDownChroot isn't needed, because chroot will be destroyed with a sandbox mount namespace. In additional, pivot_root can't be called instead of chroot. PiperOrigin-RevId: 229250871 Change-Id: I765bdb587d0b8287a6a8efda8747639d37c7e7b6	2019-01-14 14:08:19 -08:00
Andrei Vagin	f8c8f24154	runsc: Collect zombies of sandbox and gofer processes And we need to wait a gofer process before cgroup.Uninstall, because it is running in the sandbox cgroups. PiperOrigin-RevId: 228904020 Change-Id: Iaf8826d5b9626db32d4057a1c505a8d7daaeb8f9	2019-01-11 10:32:26 -08:00
Fabricio Voznika	0d7023d581	Restore to original cgroup after sandbox and gofer processes are created The original code assumed that it was safe to join and not restore cgroup, but Container.Run will not exit after calling start, making cgroup cleanup fail because there were still processes inside the cgroup. PiperOrigin-RevId: 228529199 Change-Id: I12a48d9adab4bbb02f20d71ec99598c336cbfe51	2019-01-09 09:18:15 -08:00
Fabricio Voznika	d033a76fa6	Apply chroot for --network=host too PiperOrigin-RevId: 227747566 Change-Id: Ide9df4ac1391adcd1c56e08d6570e0d149d85bc4	2019-01-03 14:10:44 -08:00
Fabricio Voznika	a891afad6d	Simplify synchronization between runsc and sandbox process Make 'runsc create' join cgroup before creating sandbox process. This removes the need to synchronize platform creation and ensure that sandbox process is charged to the right cgroup from the start. PiperOrigin-RevId: 227166451 Change-Id: Ieb4b18e6ca0daf7b331dc897699ca419bc5ee3a2	2018-12-28 13:48:24 -08:00
Andrei Vagin	1b1a42ba6d	A sandbox process should wait until it has not been moved into cgroups PiperOrigin-RevId: 224418900 Change-Id: I53cf4d7c1c70117875b6920f8fd3d58a3b1497e9	2018-12-06 15:28:29 -08:00
Nicolas Lacasse	adf8138e06	Allow sandbox.Wait to be called after the sandbox has exited. sandbox.Wait is racey, as the sandbox may have exited before it is called, or even during. We already had code to handle the case that the sandbox exits during the Wait call, but we were not properly handling the case where the sandbox has exited before the call. The best we can do in such cases is return the sandbox exit code as the application exit code. PiperOrigin-RevId: 221702517 Change-Id: I290d0333cc094c7c1c3b4ce0f17f61a3e908d787	2018-11-15 15:35:41 -08:00
Fabricio Voznika	86b3f0cd24	Fix race between start and destroy Before this change, a container starting up could race with destroy (aka delete) and leave processes behind. Now, whenever a container is created, Loader.processes gets a new entry. Start now expects the entry to be there, and if it's not it means that the container was deleted. I've also fixed Loader.waitPID to search for the process using the init process's PID namespace. We could use a few more tests for signal and wait. I'll send them in another cl. PiperOrigin-RevId: 220224290 Change-Id: I15146079f69904dc07d43c3b66cc343a2dab4cc4	2018-11-05 21:29:37 -08:00
Ian Lewis	9d69d85bc1	Make error messages a bit more user friendly. Updated error messages so that it doesn't print full Go struct representations when running a new container in a sandbox. For example, this occurs frequently when commands are not found when doing a 'kubectl exec'. PiperOrigin-RevId: 219729141 Change-Id: Ic3a7bc84cd7b2167f495d48a1da241d621d3ca09	2018-11-01 17:40:09 -07:00
Ian Lewis	c2c0f9cb7e	Updated cleanup code to be more explicit about ignoring errors. Errors are shown as being ignored by assigning to the blank identifier. PiperOrigin-RevId: 218103819 Change-Id: I7cc7b9d8ac503a03de5504ebdeb99ed30a531cf2	2018-10-21 19:42:32 -07:00
Ian Gudger	8fce67af24	Use correct company name in copyright header PiperOrigin-RevId: 217951017 Change-Id: Ie08bf6987f98467d07457bcf35b5f1ff6e43c035	2018-10-19 16:35:11 -07:00
Fabricio Voznika	f3ffa4db52	Resolve mount paths while setting up root fs mount It's hard to resolve symlinks inside the sandbox because rootfs and mounts may be read-only, forcing us to create mount points inside lower layer of an overlay, before the volumes are mounted. Since the destination must already be resolved outside the sandbox when creating mounts, take this opportunity to rewrite the spec with paths resolved. "runsc boot" will use the "resolved" spec to load mounts. In addition, symlink traversals were disabled while mounting containers inside the sandbox. It haven't been able to write a good test for it. So I'm relying on manual tests for now. PiperOrigin-RevId: 217749904 Change-Id: I7ac434d5befd230db1488446cda03300cc0751a9	2018-10-18 12:42:24 -07:00
Nicolas Lacasse	4e6f0892c9	runsc: Support job control signals for the root container. Now containers run with "docker run -it" support control characters like ^C and ^Z. This required refactoring our signal handling a bit. Signals delivered to the "runsc boot" process are turned into loader.Signal calls with the appropriate delivery mode. Previously they were always sent directly to PID 1. PiperOrigin-RevId: 217566770 Change-Id: I5b7220d9a0f2b591a56335479454a200c6de8732	2018-10-17 12:29:05 -07:00
Nicolas Lacasse	cea51641d4	Bump sandbox start and stop timeouts. PiperOrigin-RevId: 217433699 Change-Id: Icef08285728c23ee7dd650706aaf18da51c25dff	2018-10-16 20:34:10 -07:00
Nicolas Lacasse	3f05325956	Never send boot process stdio to application stdio. We treat handle the boot process stdio separately from the application stdio (which gets passed via flags), but we were still sending both to same place. As a result, some logs that are written directly to os.Stderr by the boot process were ending up in the application logs. This CL starts sendind boot process stdio to the null device (since we don't have any better options). The boot process is already configured to send all logs (and panics) to the log file, so we won't miss anything important. PiperOrigin-RevId: 217173020 Change-Id: I5ab980da037f34620e7861a3736ba09c18d73794	2018-10-15 11:08:49 -07:00
Fabricio Voznika	f074f0c2c7	Make the gofer process enter namespaces This is done to further isolate the gofer from the host. PiperOrigin-RevId: 216790991 Change-Id: Ia265b77e4e50f815d08f743a05669f9d75ad7a6f	2018-10-11 17:45:51 -07:00
Nicolas Lacasse	ea5f6ed6ec	Make Wait() return the sandbox exit status if the sandbox has exited. It's possible for Start() and Wait() calls to race, if the sandboxed application is short-lived. If the application finishes before (or during) the Wait RPC, then Wait will fail. In practice this looks like "connection refused" or "EOF" errors when waiting for an RPC response. This race is especially bad in tests, where we often run "true" inside a sandbox. This CL does a best-effort fix, by returning the sandbox exit status as the container exit status. In most cases, these are the same. This fixes the remaining flakes in runsc/container:container_test. PiperOrigin-RevId: 216777793 Change-Id: I9dfc6e6ec885b106a736055bc7a75b2008dfff7a	2018-10-11 16:07:05 -07:00
Fabricio Voznika	e68d86e1bd	Make debug log file name configurable This is a breaking change if you're using --debug-log-dir. The fix is to replace it with --debug-log and add a '/' at the end: --debug-log-dir=/tmp/runsc ==> --debug-log=/tmp/runsc/ PiperOrigin-RevId: 216761212 Change-Id: I244270a0a522298c48115719fa08dad55e34ade1	2018-10-11 14:29:37 -07:00
Fabricio Voznika	f413e4b117	Add bare bones unsupported syscall logging This change introduces a new flags to create/run called --user-log. Logs to this files are visible to users and are meant to help debugging problems with their images and containers. For now only unsupported syscalls are sent to this log, and only minimum support was added. We can build more infrastructure around it as needed. PiperOrigin-RevId: 216735977 Change-Id: I54427ca194604991c407d49943ab3680470de2d0	2018-10-11 11:56:54 -07:00
Nicolas Lacasse	1939cd020f	runsc: Pass controlling TTY by FD in the new process, not current process. When setting Cmd.SysProcAttr.Ctty, the FD must be the FD of the controlling TTY in the new process, not the current process. The ioctl call is made after duping all FDs in Cmd.ExtraFiles, which may stomp on the old TTY FD. This fixes the "bad address" flakes in runsc/container:container_test, although some other flakes remain. PiperOrigin-RevId: 216594394 Change-Id: Idfd1677abb866aa82ad7e8be776f0c9087256862	2018-10-10 14:35:03 -07:00
Fabricio Voznika	29cd05a7c6	Add sandbox to cgroup Sandbox creation uses the limits and reservations configured in the OCI spec and set cgroup options accordinly. Then it puts both the sandbox and gofer processes inside the cgroup. It also allows the cgroup to be pre-configured by the caller. If the cgroup already exists, sandbox and gofer processes will join the cgroup but it will not modify the cgroup with spec limits. PiperOrigin-RevId: 216538209 Change-Id: If2c65ffedf55820baab743a0edcfb091b89c1019	2018-10-10 09:00:42 -07:00
Fabricio Voznika	3f46f2e501	Fix sandbox chroot Sandbox was setting chroot, but was not chaging the working dir. Added test to ensure this doesn't happen in the future. PiperOrigin-RevId: 215676270 Change-Id: I14352d3de64a4dcb90e50948119dc8328c9c15e1	2018-10-03 20:44:20 -07:00
Nicolas Lacasse	e215b9970a	runsc: Pass root container's stdio via FD. We were previously using the sandbox process's stdio as the root container's stdio. This makes it difficult/impossible to distinguish output application output from sandbox output, such as panics, which are always written to stderr. Also close the console socket when we are done with it. PiperOrigin-RevId: 215585180 Change-Id: I980b8c69bd61a8b8e0a496fd7bc90a06446764e0	2018-10-03 10:32:03 -07:00
Nicolas Lacasse	f1c01ed886	runsc: Support job control signals in "exec -it". Terminal support in runsc relies on host tty file descriptors that are imported into the sandbox. Application tty ioctls are sent directly to the host fd. However, those host tty ioctls are associated in the host kernel with a host process (in this case runsc), and the host kernel intercepts job control characters like ^C and send signals to the host process. Thus, typing ^C into a "runsc exec" shell will send a SIGINT to the runsc process. This change makes "runsc exec" handle all signals, and forward them into the sandbox via the "ContainerSignal" urpc method. Since the "runsc exec" is associated with a particular container process in the sandbox, the signal must be associated with the same container process. One big difficulty is that the signal should not necessarily be sent to the sandbox process started by "exec", but instead must be sent to the foreground process group for the tty. For example, we may exec "bash", and from bash call "sleep 100". A ^C at this point should SIGINT sleep, not bash. To handle this, tty files inside the sandbox must keep track of their foreground process group, which is set/get via ioctls. When an incoming ContainerSignal urpc comes in, we look up the foreground process group via the tty file. Unfortunately, this means we have to expose and cache the tty file in the Loader. Note that "runsc exec" now handles signals properly, but "runs run" does not. That will come in a later CL, as this one is complex enough already. Example: root@:/usr/local/apache2# sleep 100 ^C root@:/usr/local/apache2# sleep 100 ^Z [1]+ Stopped sleep 100 root@:/usr/local/apache2# fg sleep 100 ^C root@:/usr/local/apache2# PiperOrigin-RevId: 215334554 Change-Id: I53cdce39653027908510a5ba8d08c49f9cf24f39	2018-10-01 22:06:56 -07:00
Fabricio Voznika	2496d9b4b6	Make runsc kill and delete more conformant to the "spec" PiperOrigin-RevId: 214976251 Change-Id: I631348c3886f41f63d0e77e7c4f21b3ede2ab521	2018-09-28 12:22:21 -07:00
Fabricio Voznika	cf226d48ce	Switch to root in userns when CAP_SYS_CHROOT is also missing Some tests check current capabilities and re-run the tests as root inside userns if required capabibilities are missing. It was checking for CAP_SYS_ADMIN only, CAP_SYS_CHROOT is also required now. PiperOrigin-RevId: 214949226 Change-Id: Ic81363969fa76c04da408fae8ea7520653266312	2018-09-28 09:44:13 -07:00
Fabricio Voznika	491faac03b	Implement 'runsc kill --all' In order to implement kill --all correctly, the Sentry needs to track all tasks that belong to a given container. This change introduces ContainerID to the task, that gets inherited by all children. 'kill --all' then iterates over all tasks comparing the ContainerID field to find all processes that need to be signalled. PiperOrigin-RevId: 214841768 Change-Id: I693b2374be8692d88cc441ef13a0ae34abf73ac6	2018-09-27 15:00:58 -07:00
Fabricio Voznika	b514ab0589	Refactor 'runsc boot' to take container ID as argument This makes the flow slightly simpler (no need to call Loader.SetRootContainer). And this is required change to tag tasks with container ID inside the Sentry. PiperOrigin-RevId: 214795210 Change-Id: I6ff4af12e73bb07157f7058bb15fd5bb88760884	2018-09-27 10:26:34 -07:00
Fabricio Voznika	b63c4bfe02	Set Sandbox.Chroot so it gets cleaned up upon destruction I've made several attempts to create a test, but the lack of permission from the test user makes it nearly impossible to test anything useful. PiperOrigin-RevId: 213922174 Change-Id: I5b502ca70cb7a6645f8836f028fb203354b4c625	2018-09-20 18:54:09 -07:00
Kevin Krakauer	ffb5fdd690	runsc: Fix stdin/stdout/stderr in multi-container mode. The issue with the previous change was that the stdin/stdout/stderr passed to the sentry were dup'd by host.ImportFile. This left a dangling FD that by never closing caused containerd to timeout waiting on container stop. PiperOrigin-RevId: 213753032 Change-Id: Ia5e4c0565c42c8610d3b59f65599a5643b0901e4	2018-09-19 22:20:41 -07:00
Nicolas Lacasse	915d76aa92	Add container.Destroy urpc method. This method will: 1. Stop the container process if it is still running. 2. Unmount all sanadbox-internal mounts for the container. 3. Delete the contaner root directory inside the sandbox. Destroy is idempotent, and safe to call concurrantly. This fixes a bug where after stopping a container, we cannot unmount the container root directory on the host. This bug occured because the sandbox dirent cache was holding a dirent with a host fd corresponding to a file inside the container root on the host. The dirent cache did not know that the container had exited, and kept the FD open, preventing us from unmounting on the host. Now that we unmount (and flush) all container mounts inside the sandbox, any host FDs donated by the gofer will be closed, and we can unmount the container root on the host. PiperOrigin-RevId: 213737693 Change-Id: I28c0ff4cd19a08014cdd72fec5154497e92aacc9	2018-09-19 18:54:14 -07:00
Fabricio Voznika	7967d8ecd5	Handle children processes better in tests Reap children more systematically in container tests. Previously, container_test was taking ~5 mins to run because constainer.Destroy() would timeout waiting for the sandbox process to exit. Now the test running in less than a minute. Also made the contract around Container and Sandbox destroy clearer. PiperOrigin-RevId: 213527471 Change-Id: Icca84ee1212bbdcb62bdfc9cc7b71b12c6d1688d	2018-09-18 15:21:28 -07:00
Kevin Krakauer	7e00f37054	Automated rollback of changelist 213307171 PiperOrigin-RevId: 213504354 Change-Id: Iadd42f0ca4b7e7a9eae780bee9900c7233fb4f3f	2018-09-18 13:22:26 -07:00
Kevin Krakauer	bb88c187c5	runsc: Enable waiting on exited processes. This makes `runsc wait` behave more like waitpid()/wait4() in that: - Once a process has run to completion, you can wait on it and get its exit code. - Processes not waited on will consume memory (like a zombie process) PiperOrigin-RevId: 213358916 Change-Id: I5b5eca41ce71eea68e447380df8c38361a4d1558	2018-09-17 16:25:24 -07:00
Kevin Krakauer	25add7b22b	runsc: Fix stdin/out/err in multi-container mode. Stdin/out/err weren't being sent to the sentry. PiperOrigin-RevId: 213307171 Change-Id: Ie4b634a58b1b69aa934ce8597e5cc7a47a2bcda2	2018-09-17 11:31:28 -07:00
Lantao Liu	bde2a91433	runsc: Support container signal/wait. This CL: 1) Fix `runsc wait`, it now also works after the container exits; 2) Generate correct container state in Load; 2) Make sure `Destory` cleanup everything before successfully return. PiperOrigin-RevId: 212900107 Change-Id: Ie129cbb9d74f8151a18364f1fc0b2603eac4109a	2018-09-13 16:38:03 -07:00

1 2

92 Commits