gvisor

Commit Graph

Author	SHA1	Message	Date
Kevin Krakauer	bb88c187c5	runsc: Enable waiting on exited processes. This makes `runsc wait` behave more like waitpid()/wait4() in that: - Once a process has run to completion, you can wait on it and get its exit code. - Processes not waited on will consume memory (like a zombie process) PiperOrigin-RevId: 213358916 Change-Id: I5b5eca41ce71eea68e447380df8c38361a4d1558	2018-09-17 16:25:24 -07:00
Kevin Krakauer	25add7b22b	runsc: Fix stdin/out/err in multi-container mode. Stdin/out/err weren't being sent to the sentry. PiperOrigin-RevId: 213307171 Change-Id: Ie4b634a58b1b69aa934ce8597e5cc7a47a2bcda2	2018-09-17 11:31:28 -07:00
Lantao Liu	bde2a91433	runsc: Support container signal/wait. This CL: 1) Fix `runsc wait`, it now also works after the container exits; 2) Generate correct container state in Load; 2) Make sure `Destory` cleanup everything before successfully return. PiperOrigin-RevId: 212900107 Change-Id: Ie129cbb9d74f8151a18364f1fc0b2603eac4109a	2018-09-13 16:38:03 -07:00
Kevin Krakauer	2eff1fdd06	runsc: Add exec flag that specifies where to save the sandbox-internal pid. This is different from the existing -pid-file flag, which saves a host pid. PiperOrigin-RevId: 212713968 Change-Id: I2c486de8dd5cfd9b923fb0970165ef7c5fc597f0	2018-09-12 15:23:35 -07:00
Nicolas Lacasse	6cc9b311af	platform: Pass device fd into platform constructor. We were previously openining the platform device (i.e. /dev/kvm) inside the platfrom constructor (i.e. kvm.New). This requires that we have RW access to the platform device when constructing the platform. However, now that the runsc sandbox process runs as user "nobody", it is not able to open the platform device. This CL changes the kvm constructor to take the platform device FD, rather than opening the device file itself. The device file is opened outside of the sandbox and passed to the sandbox process. PiperOrigin-RevId: 212505804 Change-Id: I427e1d9de5eb84c84f19d513356e1bb148a52910	2018-09-11 13:09:46 -07:00
Nicolas Lacasse	9751b800a6	runsc: Support multi-container exec. We must use a context.Context with a Root Dirent that corresponds to the container's chroot. Previously we were using the root context, which does not have a chroot. Getting the correct context required refactoring some of the path-lookup code. We can't lookup the path without a context.Context, which requires kernel.CreateProcArgs, which we only get inside control.Execute. So we have to do the path lookup much later than we previously were. PiperOrigin-RevId: 212064734 Change-Id: I84a5cfadacb21fd9c3ab9c393f7e308a40b9b537	2018-09-07 17:39:54 -07:00
Kevin Krakauer	8f0b6e7fc0	runsc: Support runsc kill multi-container. Now, we can kill individual containers rather than the entire sandbox. PiperOrigin-RevId: 211748106 Change-Id: Ic97e91db33d53782f838338c4a6d0aab7a313ead	2018-09-05 21:14:56 -07:00
Nicolas Lacasse	f96b33c73c	runsc: Promote getExecutablePathInternal to getExecutablePath. Remove GetExecutablePath (the non-internal version). This makes path handling more consistent between exec, root, and child containers. The new getExecutablePath now uses MountNamespace.FindInode, which is more robust than Walking the Dirent tree ourselves. This also removes the last use of lstat(2) in the sentry, so that can be removed from the filters. PiperOrigin-RevId: 211683110 Change-Id: Ic8ec960fc1c267aa7d310b8efe6e900c88a9207a	2018-09-05 13:01:21 -07:00
Fabricio Voznika	db81c0b02f	Put fsgofer inside chroot Now each container gets its own dedicated gofer that is chroot'd to the rootfs path. This is done to add an extra layer of security in case the gofer gets compromised. PiperOrigin-RevId: 210396476 Change-Id: Iba21360a59dfe90875d61000db103f8609157ca0	2018-08-27 11:10:14 -07:00
Nicolas Lacasse	106de2182d	runsc: Terminal support for "docker exec -ti". This CL adds terminal support for "docker exec". We previously only supported consoles for the container process, but not exec processes. The SYS_IOCTL syscall was added to the default seccomp filter list, but only for ioctls that get/set winsize and termios structs. We need to allow these ioctl for all containers because it's possible to run "exec -ti" on a container that was started without an attached console, after the filters have been installed. Note that control-character signals are still not properly supported. Tested with: $ docker run --runtime=runsc -it alpine In another terminial: $ docker exec -it <containerid> /bin/sh PiperOrigin-RevId: 210185456 Change-Id: I6d2401e53a7697bb988c120a8961505c335f96d9	2018-08-24 17:43:21 -07:00
Kevin Krakauer	635b0c4593	runsc fsgofer: Support dynamic serving of filesystems. When multiple containers run inside a sentry, each container has its own root filesystem and set of mounts. Containers are also added after sentry boot rather than all configured and known at boot time. The fsgofer needs to be able to serve the root filesystem of each container. Thus, it must be possible to add filesystems after the fsgofer has already started. This change: * Creates a URPC endpoint within the gofer process that listens for requests to serve new content. * Enables the sentry, when starting a new container, to add the new container's filesystem. * Mounts those new filesystems at separate roots within the sentry. PiperOrigin-RevId: 208903248 Change-Id: Ifa91ec9c8caf5f2f0a9eead83c4a57090ce92068	2018-08-15 16:25:22 -07:00
Fabricio Voznika	0d350aac7f	Enable SACK in runsc SACK is disabled by default and needs to be manually enabled. It not only improves performance, but also fixes hangs downloading files from certain websites. PiperOrigin-RevId: 207906742 Change-Id: I4fb7277b67bfdf83ac8195f1b9c38265a0d51e8b	2018-08-08 10:26:18 -07:00
Justine Olshan	c05660373e	Moved restore code out of create and made to be called after create. Docker expects containers to be created before they are restored. However, gVisor restoring requires specificactions regarding the kernel and the file system. These actions were originally in booting the sandbox. Now setting up the file system is deferred until a call to a call to runsc start. In the restore case, the kernel is destroyed and a new kernel is created in the same process, as we need the same process for Docker. These changes required careful execution of concurrent processes which required the use of a channel. Full docker integration still needs the ability to restore into the same container. PiperOrigin-RevId: 205161441 Change-Id: Ie1d2304ead7e06855319d5dc310678f701bd099f	2018-07-18 16:58:30 -07:00
Kevin Krakauer	16d37973eb	runsc: Add the "wait" subcommand. Users can now call "runsc wait <container id>" to wait on a particular process inside the container. -pid can also be used to wait on a specific PID. Manually tested the wait subcommand for a single waiter and multiple waiters (simultaneously 2 processes waiting on the container and 2 processes waiting on a PID within the container). PiperOrigin-RevId: 202548978 Change-Id: Idd507c2cdea613c3a14879b51cfb0f7ea3fb3d4c	2018-06-28 14:56:36 -07:00
Kevin Krakauer	04bdcc7b65	runsc: Enable waiting on individual containers within a sandbox. PiperOrigin-RevId: 201742160 Change-Id: Ia9fa1442287c5f9e1196fb117c41536a80f6bb31	2018-06-22 14:31:25 -07:00
Fabricio Voznika	4ad7315b67	Add 'runsc debug' command It prints sandbox stacks to the log to help debug stuckness. I expect that many more options will be added in the future. PiperOrigin-RevId: 201405931 Change-Id: I87e560800cd5a5a7b210dc25a5661363c8c3a16e	2018-06-20 13:31:31 -07:00
Kevin Krakauer	5397963b5d	runsc: Enable container creation within existing sandboxes. Containers are created as processes in the sandbox. Of the many things that don't work yet, the biggest issue is that the fsgofer is launched with its root as the sandbox's root directory. Thus, when a container is started and wants to read anything (including the init binary of the container), the gofer tries to serve from sandbox's root (which basically just has pause), not the container's. PiperOrigin-RevId: 201294560 Change-Id: I6423aa8830538959c56ae908ce067e4199d627b1	2018-06-19 21:44:33 -07:00
Justine Olshan	a6dbef045f	Added a resume command to unpause a paused container. Resume checks the status of the container and unpauses the kernel if its status is paused. Otherwise nothing happens. Tests were added to ensure that the process is in the correct state after various commands. PiperOrigin-RevId: 201251234 Change-Id: Ifd11b336c33b654fea6238738f864fcf2bf81e19	2018-06-19 15:23:36 -07:00
Justine Olshan	0786707cd9	Added code for a pause command for a container process. Like runc, the pause command will pause the processes of the given container. It will set that container's status to "paused." A resume command will be be added to unpause and continue running the process. PiperOrigin-RevId: 200789624 Change-Id: I72a5d7813d90ecfc4d01cc252d6018855016b1ea	2018-06-15 16:09:09 -07:00
Googler	722275c3d1	Added a function to the controller to checkpoint a container. Functionality for checkpoint is not complete, more to come. PiperOrigin-RevId: 199500803 Change-Id: Iafb0fcde68c584270000fea898e6657a592466f7	2018-06-06 11:43:55 -07:00
Nicolas Lacasse	31386185fe	Push signal-delivery and wait into the sandbox. This is another step towards multi-container support. Previously, we delivered signals directly to the sandbox process (which then forwarded the signal to PID 1 inside the sandbox). Similarly, we waited on a container by waiting on the sandbox process itself. This approach will not work when there are multiple containers inside the sandbox, and we need to signal/wait on individual containers. This CL adds two new messages, ContainerSignal and ContainerWait. These messages include the id of the container to signal/wait. The controller inside the sandbox receives these messages and signals/waits on the appropriate process inside the sandbox. The container id is plumbed into the sandbox, but it currently is not used. We still end up signaling/waiting on PID 1 in all cases. Once we actually have multiple containers inside the sandbox, we will need to keep some sort of map of container id -> pid (or possibly pid namespace), and signal/kill the appropriate process for the container. PiperOrigin-RevId: 197028366 Change-Id: I07b4d5dc91ecd2affc1447e6b4bdd6b0b7360895	2018-05-17 11:55:28 -07:00
Nicolas Lacasse	1bdec86bae	Return better errors from Docker when runsc fails to start. Two changes in this CL: First, make the "boot" process sleep when it encounters an error to give the controller time to send the error back to the "start" process. Otherwise the "boot" process exits immediately and the control connection errors with EOF. Secondly, open the log file with O_APPEND, not O_TRUNC. Docker uses the same log file for all runtime commands, and setting O_TRUNC causes them to get destroyed. Furthermore, containerd parses these log files in the event of an error, and it does not like the file being truncated out from underneath it. Now, when trying to run a binary that does not exist in the image, the error message is more reasonable: $ docker run alpine /not/found docker: Error response from daemon: OCI runtime start failed: /usr/local/google/docker/runtimes/runscd did not terminate sucessfully: error starting sandbox: error starting application [/not/found]: failed to create init process: no such file or directory Fixes #32 PiperOrigin-RevId: 196027084 Change-Id: Iabc24c0bdd8fc327237acc051a1655515f445e68	2018-05-09 14:13:37 -07:00
Googler	d02b74a5dc	Check in gVisor. PiperOrigin-RevId: 194583126 Change-Id: Ica1d8821a90f74e7e745962d71801c598c652463	2018-04-28 01:44:26 -04:00

23 Commits