gvisor

Commit Graph

Author	SHA1	Message	Date
Fabricio Voznika	a2ad8fef13	Make multi-container the default mode for runsc And remove multicontainer option. PiperOrigin-RevId: 215236981 Change-Id: I9fd1d963d987e421e63d5817f91a25c819ced6cb	2018-10-01 10:31:17 -07:00
Fabricio Voznika	2496d9b4b6	Make runsc kill and delete more conformant to the "spec" PiperOrigin-RevId: 214976251 Change-Id: I631348c3886f41f63d0e77e7c4f21b3ede2ab521	2018-09-28 12:22:21 -07:00
Nicolas Lacasse	d489336784	runsc: All non-root bind mounts should be shared. This CL changes the semantics of the "--file-access" flag so that it only affects the root filesystem. The default remains "exclusive" which is the common use case, as neither Docker nor K8s supports sharing the root. Keeping the root fs as "exclusive" means that the fs-intensive work done during application startup will mostly be cacheable, and thus faster. Non-root bind mounts will always be shared. This CL also removes some redundant FSAccessType validations. We validate this flag in main(), so we can assume it is valid afterwards. PiperOrigin-RevId: 214359936 Change-Id: I7e75d7bf52dbd7fa834d0aacd4034868314f3b51	2018-09-24 17:22:15 -07:00
Kevin Krakauer	ffb5fdd690	runsc: Fix stdin/stdout/stderr in multi-container mode. The issue with the previous change was that the stdin/stdout/stderr passed to the sentry were dup'd by host.ImportFile. This left a dangling FD that by never closing caused containerd to timeout waiting on container stop. PiperOrigin-RevId: 213753032 Change-Id: Ia5e4c0565c42c8610d3b59f65599a5643b0901e4	2018-09-19 22:20:41 -07:00
Nicolas Lacasse	915d76aa92	Add container.Destroy urpc method. This method will: 1. Stop the container process if it is still running. 2. Unmount all sanadbox-internal mounts for the container. 3. Delete the contaner root directory inside the sandbox. Destroy is idempotent, and safe to call concurrantly. This fixes a bug where after stopping a container, we cannot unmount the container root directory on the host. This bug occured because the sandbox dirent cache was holding a dirent with a host fd corresponding to a file inside the container root on the host. The dirent cache did not know that the container had exited, and kept the FD open, preventing us from unmounting on the host. Now that we unmount (and flush) all container mounts inside the sandbox, any host FDs donated by the gofer will be closed, and we can unmount the container root on the host. PiperOrigin-RevId: 213737693 Change-Id: I28c0ff4cd19a08014cdd72fec5154497e92aacc9	2018-09-19 18:54:14 -07:00
Fabricio Voznika	e395273301	Fix sandbox and gofer capabilities Capabilities.Set() adds capabilities, but doesn't remove existing ones that might have been loaded. Fixed the code and added tests. PiperOrigin-RevId: 213726369 Change-Id: Id7fa6fce53abf26c29b13b9157bb4c6616986fba	2018-09-19 17:15:14 -07:00
Nicolas Lacasse	2ad3228cd0	runsc: Don't create __runsc_containers__ unless we are in multi-container mode. PiperOrigin-RevId: 213715511 Change-Id: I3e41b583c6138edbdeba036dfb9df4864134fc12	2018-09-19 16:10:47 -07:00
Kevin Krakauer	7e00f37054	Automated rollback of changelist 213307171 PiperOrigin-RevId: 213504354 Change-Id: Iadd42f0ca4b7e7a9eae780bee9900c7233fb4f3f	2018-09-18 13:22:26 -07:00
Kevin Krakauer	25add7b22b	runsc: Fix stdin/out/err in multi-container mode. Stdin/out/err weren't being sent to the sentry. PiperOrigin-RevId: 213307171 Change-Id: Ie4b634a58b1b69aa934ce8597e5cc7a47a2bcda2	2018-09-17 11:31:28 -07:00
Nicolas Lacasse	9751b800a6	runsc: Support multi-container exec. We must use a context.Context with a Root Dirent that corresponds to the container's chroot. Previously we were using the root context, which does not have a chroot. Getting the correct context required refactoring some of the path-lookup code. We can't lookup the path without a context.Context, which requires kernel.CreateProcArgs, which we only get inside control.Execute. So we have to do the path lookup much later than we previously were. PiperOrigin-RevId: 212064734 Change-Id: I84a5cfadacb21fd9c3ab9c393f7e308a40b9b537	2018-09-07 17:39:54 -07:00
Fabricio Voznika	bc81f3fe4a	Remove '--file-access=direct' option It was used before gofer was implemented and it's not supported anymore. BREAKING CHANGE: proxy-shared and proxy-exclusive options are now: shared and exclusive. PiperOrigin-RevId: 212017643 Change-Id: If029d4073fe60583e5ca25f98abb2953de0d78fd	2018-09-07 12:28:48 -07:00
Fabricio Voznika	12aef686af	Enabled bind mounts in sub-containers With multi-gofers, bind mounts in sub-containers should just work. Removed restrictions and added test. There are also a few cleanups along the way, e.g. retry unmounting in case cleanup races with gofer teardown. PiperOrigin-RevId: 211699569 Change-Id: Ic0a69c29d7c31cd7e038909cc686c6ac98703374	2018-09-05 14:30:09 -07:00
Nicolas Lacasse	f96b33c73c	runsc: Promote getExecutablePathInternal to getExecutablePath. Remove GetExecutablePath (the non-internal version). This makes path handling more consistent between exec, root, and child containers. The new getExecutablePath now uses MountNamespace.FindInode, which is more robust than Walking the Dirent tree ourselves. This also removes the last use of lstat(2) in the sentry, so that can be removed from the filters. PiperOrigin-RevId: 211683110 Change-Id: Ic8ec960fc1c267aa7d310b8efe6e900c88a9207a	2018-09-05 13:01:21 -07:00
Fabricio Voznika	db81c0b02f	Put fsgofer inside chroot Now each container gets its own dedicated gofer that is chroot'd to the rootfs path. This is done to add an extra layer of security in case the gofer gets compromised. PiperOrigin-RevId: 210396476 Change-Id: Iba21360a59dfe90875d61000db103f8609157ca0	2018-08-27 11:10:14 -07:00
Kevin Krakauer	02dfceab6d	runsc: Allow runsc to properly search the PATH for executable name. Previously, runsc improperly attempted to find an executable in the container's PATH. We now search the PATH via the container's fsgofer rather than the host FS, eliminating the confusing differences between paths on the host and within a container. PiperOrigin-RevId: 210159488 Change-Id: I228174dbebc4c5356599036d6efaa59f28ff28d2	2018-08-24 14:42:40 -07:00
Kevin Krakauer	635b0c4593	runsc fsgofer: Support dynamic serving of filesystems. When multiple containers run inside a sentry, each container has its own root filesystem and set of mounts. Containers are also added after sentry boot rather than all configured and known at boot time. The fsgofer needs to be able to serve the root filesystem of each container. Thus, it must be possible to add filesystems after the fsgofer has already started. This change: * Creates a URPC endpoint within the gofer process that listens for requests to serve new content. * Enables the sentry, when starting a new container, to add the new container's filesystem. * Mounts those new filesystems at separate roots within the sentry. PiperOrigin-RevId: 208903248 Change-Id: Ifa91ec9c8caf5f2f0a9eead83c4a57090ce92068	2018-08-15 16:25:22 -07:00
Nicolas Lacasse	e8a4f2e133	runsc: Change cache policy for root fs and volume mounts. Previously, gofer filesystems were configured with the default "fscache" policy, which caches filesystem metadata and contents aggressively. While this setting is best for performance, it means that changes from inside the sandbox may not be immediately propagated outside the sandbox, and vice-versa. This CL changes volumes and the root fs configuration to use a new "remote-revalidate" cache policy which tries to retain as much caching as possible while still making fs changes visible across the sandbox boundary. This cache policy is enabled by default for the root filesystem. The default value for the "--file-access" flag is still "proxy", but the behavior is changed to use the new cache policy. A new value for the "--file-access" flag is added, called "proxy-exclusive", which turns on the previous aggressive caching behavior. As the name implies, this flag should be used when the sandbox has "exclusive" access to the filesystem. All volume mounts are configured to use the new cache policy, since it is safest and most likely to be correct. There is not currently a way to change this behavior, but it's possible to add such a mechanism in the future. The configurability is a smaller issue for volumes, since most of the expensive application fs operations (walking + stating files) will likely served by the root fs. PiperOrigin-RevId: 208735037 Change-Id: Ife048fab1948205f6665df8563434dbc6ca8cfc9	2018-08-14 16:25:58 -07:00
Justine Olshan	c05660373e	Moved restore code out of create and made to be called after create. Docker expects containers to be created before they are restored. However, gVisor restoring requires specificactions regarding the kernel and the file system. These actions were originally in booting the sandbox. Now setting up the file system is deferred until a call to a call to runsc start. In the restore case, the kernel is destroyed and a new kernel is created in the same process, as we need the same process for Docker. These changes required careful execution of concurrent processes which required the use of a channel. Full docker integration still needs the ability to restore into the same container. PiperOrigin-RevId: 205161441 Change-Id: Ie1d2304ead7e06855319d5dc310678f701bd099f	2018-07-18 16:58:30 -07:00
Fabricio Voznika	52ddb8571c	Skip overlay on root when its readonly PiperOrigin-RevId: 203161098 Change-Id: Ia1904420cb3ee830899d24a4fe418bba6533be64	2018-07-03 12:01:09 -07:00
Nicolas Lacasse	4500155ffc	runsc: Mount "mandatory" mounts right after mounting the root. The /proc and /sys mounts are "mandatory" in the sense that they should be mounted in the sandbox even when they are not included in the spec. Runsc treats /tmp similarly, because it is faster to use the internal tmpfs implementation instead of proxying to the host. However, the spec may contain submounts of these mandatory mounts (particularly for /tmp). In those cases, we must mount our mandatory mounts before the submount, otherwise the submount will be masked. Since the mandatory mounts are all top-level directories, we can mount them right after the root. PiperOrigin-RevId: 203145635 Change-Id: Id69bae771d32c1a5b67e08c8131b73d9b42b2fbf	2018-07-03 10:36:22 -07:00
Justine Olshan	80bdf8a406	Sets the restore environment for restoring a container. Updated how restoring occurs through boot.go with a separate Restore function. This prevents a new process and new mounts from being created. Added tests to ensure the container is restored. Registered checkpoint and restore commands so they can be used. Docker support for these commands is still limited. Working on #80. PiperOrigin-RevId: 202710950 Change-Id: I2b893ceaef6b9442b1ce3743bd112383cb92af0c	2018-06-29 14:47:40 -07:00
Fabricio Voznika	cecc1e472c	Fix lint errors PiperOrigin-RevId: 201978212 Change-Id: Ie3df1fd41d5293fff66b546a0c68c3bf98126067	2018-06-25 10:41:27 -07:00
Justine Olshan	f2a687001d	Added functionality to create a RestoreEnvironment. Before a container can be restored, the mounts must be configured. The root and submounts and their key information is compiled into a RestoreEnvironment. Future code will be added to set this created environment before restoring a container. Tests to ensure the correct environment were added. PiperOrigin-RevId: 201544637 Change-Id: Ia894a8b0f80f31104d1c732e113b1d65a4697087	2018-06-21 10:18:11 -07:00
Lantao Liu	821aaf531d	runsc: support "rw" mount option. PiperOrigin-RevId: 201018483 Change-Id: I52fe3d01c83c8a2f0e9275d9d88c37e46fa224a2	2018-06-18 10:34:11 -07:00
Lantao Liu	2081c5e7f7	runsc: support /dev bind mount which does not conflict with default /dev mount. PiperOrigin-RevId: 200768923 Change-Id: I4b8da10bcac296e8171fe6754abec5aabfec5e65	2018-06-15 13:58:39 -07:00
Fabricio Voznika	717f2501c9	Fix failure to mount volume that sandbox process has no access Boot loader tries to stat mount to determine whether it's a file or not. This may file if the sandbox process doesn't have access to the file. Instead, add overlay on top of file, which is better anyway since we don't want to propagate changes to the host. PiperOrigin-RevId: 200411261 Change-Id: I14222410e8bc00ed037b779a1883d503843ffebb	2018-06-13 10:20:06 -07:00
Lantao Liu	2506b9b11f	runsc: do not include sub target if it is not started with '/'. PiperOrigin-RevId: 200274828 Change-Id: I956703217df08d8650a881479b7ade8f9f119912	2018-06-12 13:54:54 -07:00
Kevin Krakauer	2dc9cd7bf7	runsc: enable terminals in the sandbox. runsc now mounts the devpts filesystem, so you get a real terminal using ssh+sshd. PiperOrigin-RevId: 200244830 Change-Id: If577c805ad0138fda13103210fa47178d8ac6605	2018-06-12 11:03:25 -07:00
Fabricio Voznika	6c585b8eb6	Create destination mount dir if it doesn't exist PiperOrigin-RevId: 199175296 Change-Id: I694ad1cfa65572c92f77f22421fdcac818f44630	2018-06-04 12:31:35 -07:00
Fabricio Voznika	e48f707876	Configure sandbox as superuser Container user might not have enough priviledge to walk directories and mount filesystems. Instead, create superuser to perform these steps of the configuration. PiperOrigin-RevId: 197953667 Change-Id: I643650ab654e665408e2af1b8e2f2aa12d58d4fb	2018-05-24 14:27:57 -07:00
Fabricio Voznika	ac01f245ff	Skip atime and mtime update when file is backed by host FD When file is backed by host FD, atime and mtime for the host file and the cached attributes in the Sentry must be close together. In this case, the call to update atime and mtime can be skipped. This is important when host filesystem is using overlay because updating atime and mtime explicitly forces a copy up for every file that is touched. PiperOrigin-RevId: 196176413 Change-Id: I3933ea91637a071ba2ea9db9d8ac7cdba5dc0482	2018-05-10 14:59:40 -07:00
Googler	d02b74a5dc	Check in gVisor. PiperOrigin-RevId: 194583126 Change-Id: Ica1d8821a90f74e7e745962d71801c598c652463	2018-04-28 01:44:26 -04:00

32 Commits