gvisor

Commit Graph

Author	SHA1	Message	Date
Nicolas Lacasse	915d76aa92	Add container.Destroy urpc method. This method will: 1. Stop the container process if it is still running. 2. Unmount all sanadbox-internal mounts for the container. 3. Delete the contaner root directory inside the sandbox. Destroy is idempotent, and safe to call concurrantly. This fixes a bug where after stopping a container, we cannot unmount the container root directory on the host. This bug occured because the sandbox dirent cache was holding a dirent with a host fd corresponding to a file inside the container root on the host. The dirent cache did not know that the container had exited, and kept the FD open, preventing us from unmounting on the host. Now that we unmount (and flush) all container mounts inside the sandbox, any host FDs donated by the gofer will be closed, and we can unmount the container root on the host. PiperOrigin-RevId: 213737693 Change-Id: I28c0ff4cd19a08014cdd72fec5154497e92aacc9	2018-09-19 18:54:14 -07:00
Fabricio Voznika	e395273301	Fix sandbox and gofer capabilities Capabilities.Set() adds capabilities, but doesn't remove existing ones that might have been loaded. Fixed the code and added tests. PiperOrigin-RevId: 213726369 Change-Id: Id7fa6fce53abf26c29b13b9157bb4c6616986fba	2018-09-19 17:15:14 -07:00
Nicolas Lacasse	2ad3228cd0	runsc: Don't create __runsc_containers__ unless we are in multi-container mode. PiperOrigin-RevId: 213715511 Change-Id: I3e41b583c6138edbdeba036dfb9df4864134fc12	2018-09-19 16:10:47 -07:00
Lingfu	f0a92b6b67	Add docker command line args support for --cpuset-cpus and --cpus `docker run --cpuset-cpus=/--cpus=` will generate cpu resource info in config.json (runtime spec file). When nginx worker_connections is configured as auto, the worker is generated according to the number of CPUs. If the cgroup is already set on the host, but it is not displayed correctly in the sandbox, performance may be degraded. This patch can get cpus info from spec file and apply to sentry on bootup, so the /proc/cpuinfo can show the correct cpu numbers. `lscpu` and other commands rely on `/sys/devices/system/cpu/online` are also affected by this patch. e.g. --cpuset-cpus=2,3 -> cpu number:2 --cpuset-cpus=4-7 -> cpu number:4 --cpus=2.8 -> cpu number:3 --cpus=0.5 -> cpu number:1 Change-Id: Ideb22e125758d4322a12be7c51795f8018e3d316 PiperOrigin-RevId: 213685199	2018-09-19 13:35:42 -07:00
Kevin Krakauer	7e00f37054	Automated rollback of changelist 213307171 PiperOrigin-RevId: 213504354 Change-Id: Iadd42f0ca4b7e7a9eae780bee9900c7233fb4f3f	2018-09-18 13:22:26 -07:00
Fabricio Voznika	5d9816be41	Remove memory usage static init panic() during init() can be hard to debug. Updates #100 PiperOrigin-RevId: 213391932 Change-Id: Ic103f1981c5b48f1e12da3b42e696e84ffac02a9	2018-09-17 21:34:37 -07:00
Kevin Krakauer	bb88c187c5	runsc: Enable waiting on exited processes. This makes `runsc wait` behave more like waitpid()/wait4() in that: - Once a process has run to completion, you can wait on it and get its exit code. - Processes not waited on will consume memory (like a zombie process) PiperOrigin-RevId: 213358916 Change-Id: I5b5eca41ce71eea68e447380df8c38361a4d1558	2018-09-17 16:25:24 -07:00
Kevin Krakauer	25add7b22b	runsc: Fix stdin/out/err in multi-container mode. Stdin/out/err weren't being sent to the sentry. PiperOrigin-RevId: 213307171 Change-Id: Ie4b634a58b1b69aa934ce8597e5cc7a47a2bcda2	2018-09-17 11:31:28 -07:00
Lantao Liu	bde2a91433	runsc: Support container signal/wait. This CL: 1) Fix `runsc wait`, it now also works after the container exits; 2) Generate correct container state in Load; 2) Make sure `Destory` cleanup everything before successfully return. PiperOrigin-RevId: 212900107 Change-Id: Ie129cbb9d74f8151a18364f1fc0b2603eac4109a	2018-09-13 16:38:03 -07:00
Kevin Krakauer	2eff1fdd06	runsc: Add exec flag that specifies where to save the sandbox-internal pid. This is different from the existing -pid-file flag, which saves a host pid. PiperOrigin-RevId: 212713968 Change-Id: I2c486de8dd5cfd9b923fb0970165ef7c5fc597f0	2018-09-12 15:23:35 -07:00
Michael Pratt	0efde2bfbd	Remove getdents from filters It was only used by whitelistfs, which was removed in `bc81f3fe4a`. PiperOrigin-RevId: 212666374 Change-Id: Ia35e6dc9d68c1a3b015d5b5f71ea3e68e46c5bed	2018-09-12 10:51:25 -07:00
Michael Pratt	b4aed01bf2	Rollback of changelist 212483372 PiperOrigin-RevId: 212557844 Change-Id: I414de848e75d57ecee2c05e851d05b607db4aa57	2018-09-11 17:54:50 -07:00
Nicolas Lacasse	6cc9b311af	platform: Pass device fd into platform constructor. We were previously openining the platform device (i.e. /dev/kvm) inside the platfrom constructor (i.e. kvm.New). This requires that we have RW access to the platform device when constructing the platform. However, now that the runsc sandbox process runs as user "nobody", it is not able to open the platform device. This CL changes the kvm constructor to take the platform device FD, rather than opening the device file itself. The device file is opened outside of the sandbox and passed to the sandbox process. PiperOrigin-RevId: 212505804 Change-Id: I427e1d9de5eb84c84f19d513356e1bb148a52910	2018-09-11 13:09:46 -07:00
Fabricio Voznika	c44bc6612f	Allow fstatat back in syscall filters PiperOrigin-RevId: 212483372 Change-Id: If95f32a8e41126cf3dc8bd6c8b2fb0fcfefedc6d	2018-09-11 11:05:09 -07:00
Nicolas Lacasse	9751b800a6	runsc: Support multi-container exec. We must use a context.Context with a Root Dirent that corresponds to the container's chroot. Previously we were using the root context, which does not have a chroot. Getting the correct context required refactoring some of the path-lookup code. We can't lookup the path without a context.Context, which requires kernel.CreateProcArgs, which we only get inside control.Execute. So we have to do the path lookup much later than we previously were. PiperOrigin-RevId: 212064734 Change-Id: I84a5cfadacb21fd9c3ab9c393f7e308a40b9b537	2018-09-07 17:39:54 -07:00
Fabricio Voznika	8ce3fbf9f8	Only start signal forwarding after init process is created PiperOrigin-RevId: 212028121 Change-Id: If9c2c62f3be103e2bb556b8d154c169888e34369	2018-09-07 13:39:12 -07:00
Fabricio Voznika	bc81f3fe4a	Remove '--file-access=direct' option It was used before gofer was implemented and it's not supported anymore. BREAKING CHANGE: proxy-shared and proxy-exclusive options are now: shared and exclusive. PiperOrigin-RevId: 212017643 Change-Id: If029d4073fe60583e5ca25f98abb2953de0d78fd	2018-09-07 12:28:48 -07:00
Fabricio Voznika	f895cb4d8b	Use root abstract socket namespace for exec PiperOrigin-RevId: 211999211 Change-Id: I5968dd1a8313d3e49bb6e6614e130107495de41d	2018-09-07 10:45:55 -07:00
Nicolas Lacasse	210c252089	runsc: Run sandbox process inside minimal chroot. We construct a dir with the executable bind-mounted at /exe, and proc mounted at /proc. Runsc now executes the sandbox process inside this chroot, thus limiting access to the host filesystem. The mounts and chroot dir are removed when the sandbox is destroyed. Because this requires bind-mounts, we can only do the chroot if we have CAP_SYS_ADMIN. PiperOrigin-RevId: 211994001 Change-Id: Ia71c515e26085e0b69b833e71691830148bc70d1	2018-09-07 10:16:39 -07:00
Kevin Krakauer	8f0b6e7fc0	runsc: Support runsc kill multi-container. Now, we can kill individual containers rather than the entire sandbox. PiperOrigin-RevId: 211748106 Change-Id: Ic97e91db33d53782f838338c4a6d0aab7a313ead	2018-09-05 21:14:56 -07:00
Fabricio Voznika	12aef686af	Enabled bind mounts in sub-containers With multi-gofers, bind mounts in sub-containers should just work. Removed restrictions and added test. There are also a few cleanups along the way, e.g. retry unmounting in case cleanup races with gofer teardown. PiperOrigin-RevId: 211699569 Change-Id: Ic0a69c29d7c31cd7e038909cc686c6ac98703374	2018-09-05 14:30:09 -07:00
Nicolas Lacasse	f96b33c73c	runsc: Promote getExecutablePathInternal to getExecutablePath. Remove GetExecutablePath (the non-internal version). This makes path handling more consistent between exec, root, and child containers. The new getExecutablePath now uses MountNamespace.FindInode, which is more robust than Walking the Dirent tree ourselves. This also removes the last use of lstat(2) in the sentry, so that can be removed from the filters. PiperOrigin-RevId: 211683110 Change-Id: Ic8ec960fc1c267aa7d310b8efe6e900c88a9207a	2018-09-05 13:01:21 -07:00
Nicolas Lacasse	0a9a40abcd	runsc: Run sandbox as user nobody. When starting a sandbox without direct file or network access, we create an empty user namespace and run the sandbox in there. However, the root user in that namespace is still mapped to the root user in the parent namespace. This CL maps the "nobody" user from the parent namespace into the child namespace, and runs the sandbox process as user "nobody" inside the new namespace. PiperOrigin-RevId: 211572223 Change-Id: I1b1f9b1a86c0b4e7e5ca7bc93be7d4887678bab6	2018-09-04 20:33:05 -07:00
Nicolas Lacasse	ad8648c634	runsc: Pass log and config files to sandbox process by FD. This is a prereq for running the sandbox process as user "nobody", when it may not have permissions to open these files. Instead, we must open then before starting the sandbox process, and pass them by FD. The specutils.ReadSpecFromFile method was fixed to always seek to the beginning of the file before reading. This allows Files from the same FD to be read multiple times, as we do in the boot command when the apply-caps flag is set. Tested with --network=host. PiperOrigin-RevId: 211570647 Change-Id: I685be0a290aa7f70731ebdce82ebc0ebcc9d475c	2018-09-04 20:10:01 -07:00
Michael Pratt	ab7174611c	Remove epoll_wait from filters Go 1.11 replaced it with epoll_pwait. PiperOrigin-RevId: 211510006 Change-Id: I48a6cae95ed3d57a4633895358ad05ad8bf2f633	2018-09-04 13:10:09 -07:00
Fabricio Voznika	7e18f158b2	Automated rollback of changelist 210995199 PiperOrigin-RevId: 211116429 Change-Id: I446d149c822177dc9fc3c64ce5e455f7f029aa82	2018-08-31 11:30:47 -07:00
Nicolas Lacasse	5ade9350ad	runsc: Pass log and config files to sandbox process by FD. This is a prereq for running the sandbox process as user "nobody", when it may not have permissions to open these files. Instead, we must open then before starting the sandbox process, and pass them by FD. PiperOrigin-RevId: 210995199 Change-Id: I715875a9553290b4a49394a8fcd93be78b1933dd	2018-08-30 15:47:18 -07:00
Fabricio Voznika	30c025f3ef	Add argument checks to seccomp This is required to increase protection when running in GKE. PiperOrigin-RevId: 210635123 Change-Id: Iaaa8be49e73f7a3a90805313885e75894416f0b5	2018-08-28 17:10:03 -07:00
Michael Pratt	ea113a4380	Drop support for Go 1.10 PiperOrigin-RevId: 210589588 Change-Id: Iba898bc3eb8f13e17c668ceea6dc820fc8180a70	2018-08-28 12:56:28 -07:00
Fabricio Voznika	ae648bafda	Add command-line parameter to trigger panic on signal This is to troubleshoot problems with a hung process that is not responding to 'runsc debug --stack' command. PiperOrigin-RevId: 210483513 Change-Id: I4377b210b4e51bc8a281ad34fd94f3df13d9187d	2018-08-27 20:36:10 -07:00
Fabricio Voznika	db81c0b02f	Put fsgofer inside chroot Now each container gets its own dedicated gofer that is chroot'd to the rootfs path. This is done to add an extra layer of security in case the gofer gets compromised. PiperOrigin-RevId: 210396476 Change-Id: Iba21360a59dfe90875d61000db103f8609157ca0	2018-08-27 11:10:14 -07:00
Nicolas Lacasse	106de2182d	runsc: Terminal support for "docker exec -ti". This CL adds terminal support for "docker exec". We previously only supported consoles for the container process, but not exec processes. The SYS_IOCTL syscall was added to the default seccomp filter list, but only for ioctls that get/set winsize and termios structs. We need to allow these ioctl for all containers because it's possible to run "exec -ti" on a container that was started without an attached console, after the filters have been installed. Note that control-character signals are still not properly supported. Tested with: $ docker run --runtime=runsc -it alpine In another terminial: $ docker exec -it <containerid> /bin/sh PiperOrigin-RevId: 210185456 Change-Id: I6d2401e53a7697bb988c120a8961505c335f96d9	2018-08-24 17:43:21 -07:00
Kevin Krakauer	02dfceab6d	runsc: Allow runsc to properly search the PATH for executable name. Previously, runsc improperly attempted to find an executable in the container's PATH. We now search the PATH via the container's fsgofer rather than the host FS, eliminating the confusing differences between paths on the host and within a container. PiperOrigin-RevId: 210159488 Change-Id: I228174dbebc4c5356599036d6efaa59f28ff28d2	2018-08-24 14:42:40 -07:00
Fabricio Voznika	001a4c2493	Clean up syscall filters Removed syscalls that are only used by whitelistfs which has its own set of filters. PiperOrigin-RevId: 209967259 Change-Id: Idb2e1b9d0201043d7cd25d96894f354729dbd089	2018-08-23 11:15:07 -07:00
Kevin Krakauer	635b0c4593	runsc fsgofer: Support dynamic serving of filesystems. When multiple containers run inside a sentry, each container has its own root filesystem and set of mounts. Containers are also added after sentry boot rather than all configured and known at boot time. The fsgofer needs to be able to serve the root filesystem of each container. Thus, it must be possible to add filesystems after the fsgofer has already started. This change: * Creates a URPC endpoint within the gofer process that listens for requests to serve new content. * Enables the sentry, when starting a new container, to add the new container's filesystem. * Mounts those new filesystems at separate roots within the sentry. PiperOrigin-RevId: 208903248 Change-Id: Ifa91ec9c8caf5f2f0a9eead83c4a57090ce92068	2018-08-15 16:25:22 -07:00
Nicolas Lacasse	2033f61aae	runsc: Fix instances of file access "proxy". This file access type is actually called "proxy-shared", but I forgot to update all locations. PiperOrigin-RevId: 208832491 Change-Id: I7848bc4ec2478f86cf2de1dcd1bfb5264c6276de	2018-08-15 09:34:18 -07:00
Nicolas Lacasse	e8a4f2e133	runsc: Change cache policy for root fs and volume mounts. Previously, gofer filesystems were configured with the default "fscache" policy, which caches filesystem metadata and contents aggressively. While this setting is best for performance, it means that changes from inside the sandbox may not be immediately propagated outside the sandbox, and vice-versa. This CL changes volumes and the root fs configuration to use a new "remote-revalidate" cache policy which tries to retain as much caching as possible while still making fs changes visible across the sandbox boundary. This cache policy is enabled by default for the root filesystem. The default value for the "--file-access" flag is still "proxy", but the behavior is changed to use the new cache policy. A new value for the "--file-access" flag is added, called "proxy-exclusive", which turns on the previous aggressive caching behavior. As the name implies, this flag should be used when the sandbox has "exclusive" access to the filesystem. All volume mounts are configured to use the new cache policy, since it is safest and most likely to be correct. There is not currently a way to change this behavior, but it's possible to add such a mechanism in the future. The configurability is a smaller issue for volumes, since most of the expensive application fs operations (walking + stating files) will likely served by the root fs. PiperOrigin-RevId: 208735037 Change-Id: Ife048fab1948205f6665df8563434dbc6ca8cfc9	2018-08-14 16:25:58 -07:00
Fabricio Voznika	4e171f7590	Basic support for ip link/addr and ifconfig Closes #94 PiperOrigin-RevId: 207997580 Change-Id: I19b426f1586b5ec12f8b0cd5884d5b401d334924	2018-08-08 22:39:58 -07:00
Fabricio Voznika	ea1e39a314	Resend packets back to netstack if destined to itself Add option to redirect packet back to netstack if it's destined to itself. This fixes the problem where connecting to the local NIC address would not work, e.g.: echo bar \| nc -l -p 8080 & echo foo \| nc 192.168.0.2 8080 PiperOrigin-RevId: 207995083 Change-Id: I17adc2a04df48bfea711011a5df206326a1fb8ef	2018-08-08 22:03:35 -07:00
Fabricio Voznika	0d350aac7f	Enable SACK in runsc SACK is disabled by default and needs to be manually enabled. It not only improves performance, but also fixes hangs downloading files from certain websites. PiperOrigin-RevId: 207906742 Change-Id: I4fb7277b67bfdf83ac8195f1b9c38265a0d51e8b	2018-08-08 10:26:18 -07:00
Ian Gudger	3cd7824410	Move stack clock to options struct PiperOrigin-RevId: 207039273 Change-Id: Ib8f55a6dc302052ab4a10ccd70b07f0d73b373df	2018-08-01 20:22:02 -07:00
Michael Pratt	6cad96f38a	Drop dup2 filter It is unused. PiperOrigin-RevId: 206798328 Change-Id: I2d7d27c0e4a0ef51264b900f14f1b3fdad17f2c4	2018-07-31 11:38:57 -07:00
Justine Olshan	c05660373e	Moved restore code out of create and made to be called after create. Docker expects containers to be created before they are restored. However, gVisor restoring requires specificactions regarding the kernel and the file system. These actions were originally in booting the sandbox. Now setting up the file system is deferred until a call to a call to runsc start. In the restore case, the kernel is destroyed and a new kernel is created in the same process, as we need the same process for Docker. These changes required careful execution of concurrent processes which required the use of a channel. Full docker integration still needs the ability to restore into the same container. PiperOrigin-RevId: 205161441 Change-Id: Ie1d2304ead7e06855319d5dc310678f701bd099f	2018-07-18 16:58:30 -07:00
Nicolas Lacasse	9059983fdb	runsc: Fix map access race in boot.Loader.waitContainer. PiperOrigin-RevId: 204522004 Change-Id: I4819dc025f0a1df03ceaaba7951b1902d44562b3	2018-07-13 13:46:14 -07:00
Nicolas Lacasse	67507bd579	runsc: Don't close the control server in a defer. Closing the control server will block until all open requests have completed. If a control server method panics, we end up stuck because the defer'd Destroy function will never return. PiperOrigin-RevId: 204354676 Change-Id: I6bb1d84b31242d7c3f20d5334b1c966bd6a61dbf	2018-07-12 13:36:57 -07:00
Bhasker Hariharan	c15cb8d432	Automated rollback of changelist 203157739 PiperOrigin-RevId: 204196916 Change-Id: If632750fc6368acb835e22cfcee0ae55c8a04d16	2018-07-11 15:07:19 -07:00
Michael Pratt	660f1203ff	Fix runsc VDSO mapping `80bdf8a406` accidentally moved vdso into an inner scope, never assigning the vdso variable passed to the Kernel and thus skipping VDSO mappings. Fix this and remove the ability for loadVDSO to skip VDSO mappings, since tests that do so are gone. PiperOrigin-RevId: 203169135 Change-Id: Ifd8cadcbaf82f959223c501edcc4d83d05327eba	2018-07-03 12:53:39 -07:00
Fabricio Voznika	52ddb8571c	Skip overlay on root when its readonly PiperOrigin-RevId: 203161098 Change-Id: Ia1904420cb3ee830899d24a4fe418bba6533be64	2018-07-03 12:01:09 -07:00
Fabricio Voznika	0ef6066167	Resend packets back to netstack if destined to itself Add option to redirect packet back to netstack if it's destined to itself. This fixes the problem where connecting to the local NIC address would not work, e.g.: echo bar \| nc -l -p 8080 & echo foo \| nc 192.168.0.2 8080 PiperOrigin-RevId: 203157739 Change-Id: I31c9f7c501e3f55007f25e1852c27893a16ac6c4	2018-07-03 11:39:17 -07:00
Nicolas Lacasse	4500155ffc	runsc: Mount "mandatory" mounts right after mounting the root. The /proc and /sys mounts are "mandatory" in the sense that they should be mounted in the sandbox even when they are not included in the spec. Runsc treats /tmp similarly, because it is faster to use the internal tmpfs implementation instead of proxying to the host. However, the spec may contain submounts of these mandatory mounts (particularly for /tmp). In those cases, we must mount our mandatory mounts before the submount, otherwise the submount will be masked. Since the mandatory mounts are all top-level directories, we can mount them right after the root. PiperOrigin-RevId: 203145635 Change-Id: Id69bae771d32c1a5b67e08c8131b73d9b42b2fbf	2018-07-03 10:36:22 -07:00
Dmitry Vyukov	6144751962	runsc/boot/filter: permit SYS_TIME for race glibc's malloc also uses SYS_TIME. Permit it. #0 0x0000000000de6267 in time () #1 0x0000000000db19d8 in get_nprocs () #2 0x0000000000d8a31a in arena_get2.part () #3 0x0000000000d8ab4a in malloc () #4 0x0000000000d3c6b5 in __sanitizer::InternalAlloc(unsigned long, __sanitizer::SizeClassAllocatorLocalCache<__sanitizer::SizeClassAllocator32<0ul, 140737488355328ull, 0ul, __sanitizer::SizeClassMap<3ul, 4ul, 8ul, 17ul, 64ul, 14ul>, 20ul, __sanitizer::TwoLevelByteMap<32768ull, 4096ull, __sanitizer::NoOpMapUnmapCallback>, __sanitizer::NoOpMapUnmapCallback> >*, unsigned long) () #5 0x0000000000d4cd70 in __tsan_go_start () #6 0x00000000004617a3 in racecall () #7 0x00000000010f4ea0 in runtime.findfunctab () #8 0x000000000043f193 in runtime.racegostart () Signed-off-by: Dmitry Vyukov <dvyukov@google.com> [mpratt@google.com: updated comments and commit message] Signed-off-by: Michael Pratt <mpratt@google.com> Change-Id: Ibe2d0dc3035bf5052d5fb802cfaa37c5e0e7a09a PiperOrigin-RevId: 203042627	2018-07-02 17:47:32 -07:00
Fabricio Voznika	fa64c2a151	Make default limits the same as with runc Closes #2 PiperOrigin-RevId: 202997196 Change-Id: I0c9f6f5a8a1abe1ae427bca5f590bdf9f82a6675	2018-07-02 12:51:38 -07:00
Justine Olshan	80bdf8a406	Sets the restore environment for restoring a container. Updated how restoring occurs through boot.go with a separate Restore function. This prevents a new process and new mounts from being created. Added tests to ensure the container is restored. Registered checkpoint and restore commands so they can be used. Docker support for these commands is still limited. Working on #80. PiperOrigin-RevId: 202710950 Change-Id: I2b893ceaef6b9442b1ce3743bd112383cb92af0c	2018-06-29 14:47:40 -07:00
Kevin Krakauer	16d37973eb	runsc: Add the "wait" subcommand. Users can now call "runsc wait <container id>" to wait on a particular process inside the container. -pid can also be used to wait on a specific PID. Manually tested the wait subcommand for a single waiter and multiple waiters (simultaneously 2 processes waiting on the container and 2 processes waiting on a PID within the container). PiperOrigin-RevId: 202548978 Change-Id: Idd507c2cdea613c3a14879b51cfb0f7ea3fb3d4c	2018-06-28 14:56:36 -07:00
Fabricio Voznika	8459390cdd	Error out if spec is invalid Closes #66 PiperOrigin-RevId: 202496258 Change-Id: Ib9287c5bf1279ffba1db21ebd9e6b59305cddf34	2018-06-28 09:57:27 -07:00
Fabricio Voznika	1f207de315	Add option to configure watchdog action PiperOrigin-RevId: 202494747 Change-Id: I4d4a18e71468690b785060e580a5f83c616bd90f	2018-06-28 09:46:50 -07:00
Lantao Liu	000fd8d1e4	runsc: set gofer umask to 0. PiperOrigin-RevId: 202185642 Change-Id: I2eefcc0b2ffadc6ef21d177a8a4ab0cda91f3399	2018-06-26 13:40:04 -07:00
Lantao Liu	e8ae2b85e9	runsc: add a `multi-container` flag to enable multi-container support. PiperOrigin-RevId: 201995800 Change-Id: I770190d135e14ec7da4b3155009fe10121b2a502	2018-06-25 12:08:44 -07:00
Fabricio Voznika	cecc1e472c	Fix lint errors PiperOrigin-RevId: 201978212 Change-Id: Ie3df1fd41d5293fff66b546a0c68c3bf98126067	2018-06-25 10:41:27 -07:00
Kevin Krakauer	04bdcc7b65	runsc: Enable waiting on individual containers within a sandbox. PiperOrigin-RevId: 201742160 Change-Id: Ia9fa1442287c5f9e1196fb117c41536a80f6bb31	2018-06-22 14:31:25 -07:00
Fabricio Voznika	f6be5fe619	Forward SIGUSR2 to the sandbox too SIGUSR2 was being masked out to be used as a way to dump sentry stacks. This could cause compatibility problems in cases anyone uses SIGUSR2 to communicate with the container init process. PiperOrigin-RevId: 201575374 Change-Id: I312246e828f38ad059139bb45b8addc2ed055d74	2018-06-21 13:22:18 -07:00
Justine Olshan	f2a687001d	Added functionality to create a RestoreEnvironment. Before a container can be restored, the mounts must be configured. The root and submounts and their key information is compiled into a RestoreEnvironment. Future code will be added to set this created environment before restoring a container. Tests to ensure the correct environment were added. PiperOrigin-RevId: 201544637 Change-Id: Ia894a8b0f80f31104d1c732e113b1d65a4697087	2018-06-21 10:18:11 -07:00
Nicolas Lacasse	81d13fbd4d	runsc: Default umask should be 0. PiperOrigin-RevId: 201539050 Change-Id: I36cbf270fa5ad25de507ecb919e4005eda6aa16d	2018-06-21 09:43:15 -07:00
Fabricio Voznika	4ad7315b67	Add 'runsc debug' command It prints sandbox stacks to the log to help debug stuckness. I expect that many more options will be added in the future. PiperOrigin-RevId: 201405931 Change-Id: I87e560800cd5a5a7b210dc25a5661363c8c3a16e	2018-06-20 13:31:31 -07:00
Kevin Krakauer	5397963b5d	runsc: Enable container creation within existing sandboxes. Containers are created as processes in the sandbox. Of the many things that don't work yet, the biggest issue is that the fsgofer is launched with its root as the sandbox's root directory. Thus, when a container is started and wants to read anything (including the init binary of the container), the gofer tries to serve from sandbox's root (which basically just has pause), not the container's. PiperOrigin-RevId: 201294560 Change-Id: I6423aa8830538959c56ae908ce067e4199d627b1	2018-06-19 21:44:33 -07:00
Kevin Krakauer	3ebd0e35f4	runsc: Whitelist lstat, as it is now used in specutils. When running multi-container, child containers are added after the filters have been installed. Thus, lstat must be in the set of allowed syscalls. PiperOrigin-RevId: 201269550 Change-Id: I03f2e6675a53d462ed12a0f651c10049b76d4c52	2018-06-19 17:17:41 -07:00
Justine Olshan	a6dbef045f	Added a resume command to unpause a paused container. Resume checks the status of the container and unpauses the kernel if its status is paused. Otherwise nothing happens. Tests were added to ensure that the process is in the correct state after various commands. PiperOrigin-RevId: 201251234 Change-Id: Ifd11b336c33b654fea6238738f864fcf2bf81e19	2018-06-19 15:23:36 -07:00
Justine Olshan	873ec0c414	Modified boot.go to allow for restores. A file descriptor was added as a flag to boot so a state file can restore a container that was checkpointed. PiperOrigin-RevId: 201068699 Change-Id: I18e96069488ffa3add468861397f3877725544aa	2018-06-18 15:20:36 -07:00
Lantao Liu	821aaf531d	runsc: support "rw" mount option. PiperOrigin-RevId: 201018483 Change-Id: I52fe3d01c83c8a2f0e9275d9d88c37e46fa224a2	2018-06-18 10:34:11 -07:00
Justine Olshan	0786707cd9	Added code for a pause command for a container process. Like runc, the pause command will pause the processes of the given container. It will set that container's status to "paused." A resume command will be be added to unpause and continue running the process. PiperOrigin-RevId: 200789624 Change-Id: I72a5d7813d90ecfc4d01cc252d6018855016b1ea	2018-06-15 16:09:09 -07:00
Lantao Liu	2081c5e7f7	runsc: support /dev bind mount which does not conflict with default /dev mount. PiperOrigin-RevId: 200768923 Change-Id: I4b8da10bcac296e8171fe6754abec5aabfec5e65	2018-06-15 13:58:39 -07:00
Fabricio Voznika	ef5dd4df9b	Set kernel.applicationCores to the number of processor on the host The right number to use is the number of processors assigned to the cgroup. But until we make the sandbox join the respective cgroup, just use the number of processors on the host. Closes #65, closes #66 PiperOrigin-RevId: 200725483 Change-Id: I34a566b1a872e26c66f56fa6e3100f42aaf802b1	2018-06-15 09:19:04 -07:00
Michael Pratt	d71f5ef688	Add nanosleep filter for Go 1.11 support golang.org/cl/108538 replaces pselect6 with nanosleep in runtime.usleep. Update the filters accordingly. PiperOrigin-RevId: 200574612 Change-Id: Ifb2296fcb3781518fc047aabbbffedb9ae488cd7	2018-06-14 10:11:05 -07:00
Fabricio Voznika	717f2501c9	Fix failure to mount volume that sandbox process has no access Boot loader tries to stat mount to determine whether it's a file or not. This may file if the sandbox process doesn't have access to the file. Instead, add overlay on top of file, which is better anyway since we don't want to propagate changes to the host. PiperOrigin-RevId: 200411261 Change-Id: I14222410e8bc00ed037b779a1883d503843ffebb	2018-06-13 10:20:06 -07:00
Lantao Liu	2506b9b11f	runsc: do not include sub target if it is not started with '/'. PiperOrigin-RevId: 200274828 Change-Id: I956703217df08d8650a881479b7ade8f9f119912	2018-06-12 13:54:54 -07:00
Brielle Broder	711a9869e5	Runsc checkpoint works. This is the first iteration of checkpoint that actually saves to a file. Tests for checkpoint are included. Ran into an issue when private unix sockets are enabled. An error message was added for this case and the mutex state was set. PiperOrigin-RevId: 200269470 Change-Id: I28d29a9f92c44bf73dc4a4b12ae0509ee4070e93	2018-06-12 13:25:23 -07:00
Kevin Krakauer	2dc9cd7bf7	runsc: enable terminals in the sandbox. runsc now mounts the devpts filesystem, so you get a real terminal using ssh+sshd. PiperOrigin-RevId: 200244830 Change-Id: If577c805ad0138fda13103210fa47178d8ac6605	2018-06-12 11:03:25 -07:00
Fabricio Voznika	48335318a2	Enable debug logging in tests Unit tests call runsc directly now, so all command line arguments are valid. On the other hand, enabling debug in the test binary doesn't affect runsc. It needs to be set in the config. PiperOrigin-RevId: 200237706 Change-Id: I0b5922db17f887f58192dbc2f8dd2fd058b76ec7	2018-06-12 10:25:55 -07:00
Fabricio Voznika	5c51bc51e4	Drop capabilities not needed by Gofer PiperOrigin-RevId: 199808391 Change-Id: Ib37a4fb6193dc85c1f93bc16769d6aa41854b9d4	2018-06-08 09:59:26 -07:00
Googler	722275c3d1	Added a function to the controller to checkpoint a container. Functionality for checkpoint is not complete, more to come. PiperOrigin-RevId: 199500803 Change-Id: Iafb0fcde68c584270000fea898e6657a592466f7	2018-06-06 11:43:55 -07:00
Fabricio Voznika	6c585b8eb6	Create destination mount dir if it doesn't exist PiperOrigin-RevId: 199175296 Change-Id: I694ad1cfa65572c92f77f22421fdcac818f44630	2018-06-04 12:31:35 -07:00
Zhengyu He	d1ca50d49e	Add SyscallRules that supports argument filtering PiperOrigin-RevId: 198919043 Change-Id: I7f1f0a3b3430cd0936a4ee4fc6859aab71820bdf	2018-06-01 13:40:52 -07:00
Fabricio Voznika	e48f707876	Configure sandbox as superuser Container user might not have enough priviledge to walk directories and mount filesystems. Instead, create superuser to perform these steps of the configuration. PiperOrigin-RevId: 197953667 Change-Id: I643650ab654e665408e2af1b8e2f2aa12d58d4fb	2018-05-24 14:27:57 -07:00
Rahat Mahmood	8878a66a56	Implement sysv shm. PiperOrigin-RevId: 197058289 Change-Id: I3946c25028b7e032be4894d61acb48ac0c24d574	2018-05-17 15:06:19 -07:00
Nicolas Lacasse	31386185fe	Push signal-delivery and wait into the sandbox. This is another step towards multi-container support. Previously, we delivered signals directly to the sandbox process (which then forwarded the signal to PID 1 inside the sandbox). Similarly, we waited on a container by waiting on the sandbox process itself. This approach will not work when there are multiple containers inside the sandbox, and we need to signal/wait on individual containers. This CL adds two new messages, ContainerSignal and ContainerWait. These messages include the id of the container to signal/wait. The controller inside the sandbox receives these messages and signals/waits on the appropriate process inside the sandbox. The container id is plumbed into the sandbox, but it currently is not used. We still end up signaling/waiting on PID 1 in all cases. Once we actually have multiple containers inside the sandbox, we will need to keep some sort of map of container id -> pid (or possibly pid namespace), and signal/kill the appropriate process for the container. PiperOrigin-RevId: 197028366 Change-Id: I07b4d5dc91ecd2affc1447e6b4bdd6b0b7360895	2018-05-17 11:55:28 -07:00
Nicolas Lacasse	205f1027e6	Refactor the Sandbox package into Sandbox + Container. This is a necessary prerequisite for supporting multiple containers in a single sandbox. All the commands (in cmd package) now call operations on Containers (container package). When a Container first starts, it will create a Sandbox with the same ID. The Sandbox class is now simpler, as it only knows how to create boot/gofer processes, and how to forward commands into the running boot process. There are TODOs sprinkled around for additional support for multiple containers. Most notably, we need to detect when a container is intended to run in an existing sandbox (by reading the metadata), and then have some way to signal to the sandbox to start a new container. Other urpc calls into the sandbox need to pass the container ID, so the sandbox can run the operation on the given container. These are only half-plummed through right now. PiperOrigin-RevId: 196688269 Change-Id: I1ecf4abbb9dd8987a53ae509df19341aaf42b5b0	2018-05-15 10:18:03 -07:00
Fabricio Voznika	ac01f245ff	Skip atime and mtime update when file is backed by host FD When file is backed by host FD, atime and mtime for the host file and the cached attributes in the Sentry must be close together. In this case, the call to update atime and mtime can be skipped. This is important when host filesystem is using overlay because updating atime and mtime explicitly forces a copy up for every file that is touched. PiperOrigin-RevId: 196176413 Change-Id: I3933ea91637a071ba2ea9db9d8ac7cdba5dc0482	2018-05-10 14:59:40 -07:00
Nicolas Lacasse	1bdec86bae	Return better errors from Docker when runsc fails to start. Two changes in this CL: First, make the "boot" process sleep when it encounters an error to give the controller time to send the error back to the "start" process. Otherwise the "boot" process exits immediately and the control connection errors with EOF. Secondly, open the log file with O_APPEND, not O_TRUNC. Docker uses the same log file for all runtime commands, and setting O_TRUNC causes them to get destroyed. Furthermore, containerd parses these log files in the event of an error, and it does not like the file being truncated out from underneath it. Now, when trying to run a binary that does not exist in the image, the error message is more reasonable: $ docker run alpine /not/found docker: Error response from daemon: OCI runtime start failed: /usr/local/google/docker/runtimes/runscd did not terminate sucessfully: error starting sandbox: error starting application [/not/found]: failed to create init process: no such file or directory Fixes #32 PiperOrigin-RevId: 196027084 Change-Id: Iabc24c0bdd8fc327237acc051a1655515f445e68	2018-05-09 14:13:37 -07:00
Fabricio Voznika	c90fefc116	Fix runsc capabilities There was a typo and one new capability missing from the list PiperOrigin-RevId: 195427713 Change-Id: I6d9e1c6e77b48fe85ef10d9f54c70c8a7271f6e7	2018-05-04 09:39:28 -07:00
Ian Gudger	eb5414ee29	Add support for ping sockets PiperOrigin-RevId: 195049322 Change-Id: I09f6dd58cf10a2e50e53d17d2823d540102913c5	2018-05-01 22:51:41 -07:00
Ian Gudger	3d3deef573	Implement SO_TIMESTAMP PiperOrigin-RevId: 195047018 Change-Id: I6d99528a00a2125f414e1e51e067205289ec9d3d	2018-05-01 22:11:49 -07:00
Googler	d02b74a5dc	Check in gVisor. PiperOrigin-RevId: 194583126 Change-Id: Ica1d8821a90f74e7e745962d71801c598c652463	2018-04-28 01:44:26 -04:00

... 2 3 4 5 6

292 Commits