gvisor

Commit Graph

Author	SHA1	Message	Date
Nicolas Lacasse	e70f28664a	Allow the watchdog to detect when the sandbox is stuck during setup. The watchdog currently can find stuck tasks, but has no way to tell if the sandbox is stuck before the application starts executing. This CL adds a startup timeout and action to the watchdog. If Start() is not called before the given timeout (if non-zero), then the watchdog will take the action. PiperOrigin-RevId: 277970577	2019-11-01 11:49:31 -07:00
kevin.xu	1f19624fa1	fix typo fix a typo	2019-10-23 15:21:50 +08:00
kevin.xu	3edbdcc191	remove duplicated period remove a duplicated period	2019-10-23 14:56:44 +08:00
Fabricio Voznika	9fb562234e	Fix problem with open FD when copy up is triggered in overlayfs Linux kernel before 4.19 doesn't implement a feature that updates open FD after a file is open for write (and is copied to the upper layer). Already open FD will continue to read the old file content until they are reopened. This is especially problematic for gVisor because it caches open files. Flag was added to force readonly files to be reopened when the same file is open for write. This is only needed if using kernels prior to 4.19. Closes #1006 It's difficult to really test this because we never run tests on older kernels. I'm adding a test in GKE which uses kernels with the overlayfs problem for 1.14 and lower. PiperOrigin-RevId: 275115289	2019-10-16 15:06:24 -07:00
Fabricio Voznika	a357fe427b	Remove stale TODO PiperOrigin-RevId: 273630282	2019-10-08 16:23:41 -07:00
Kevin Krakauer	6a98237949	Rename epsocket to netstack. PiperOrigin-RevId: 273365058	2019-10-07 13:57:59 -07:00
Rahat Mahmood	13a98df49e	netstack: Don't start endpoint goroutines too soon on restore. Endpoint protocol goroutines were previously started as part of loading the endpoint. This is potentially too soon, as resources used by these goroutines may not have been loaded. Protocol goroutines may perform meaningful work as soon as they're started (ex: incoming connect) which can cause them to indirectly access resources that haven't been loaded yet. This CL defers resuming all protocol goroutines until the end of restore. PiperOrigin-RevId: 262409429	2019-08-08 12:33:11 -07:00
Fabricio Voznika	b461be88a8	Stops container if gofer is killed Each gofer now has a goroutine that polls on the FDs used to communicate with the sandbox. The respective gofer is destroyed if any of the FDs is closed. Closes #601 PiperOrigin-RevId: 261383725	2019-08-02 13:47:55 -07:00
Nicolas Lacasse	aaaefdf9ca	Remove kernel.mounts. We can get the mount namespace from the CreateProcessArgs in all cases where we need it. This also gets rid of kernel.Destroy method, since the only thing it was doing was DecRefing the mounts. Removing the need to call kernel.SetRootMountNamespace also allowed for some more simplifications in the container fs setup code. PiperOrigin-RevId: 261357060	2019-08-02 11:23:11 -07:00
Fabricio Voznika	b21b1db700	Allow to change logging options using 'runsc debug' New options are: runsc debug --strace=off\|all\|function1,function2 runsc debug --log-level=warning\|info\|debug runsc debug --log-packets=true\|false Updates #407 PiperOrigin-RevId: 254843128	2019-06-24 15:03:02 -07:00
Adin Scannell	add40fd6ad	Update canonical repository. This can be merged after: https://github.com/google/gvisor-website/pull/77 or https://github.com/google/gvisor-website/pull/78 PiperOrigin-RevId: 253132620	2019-06-13 16:50:15 -07:00
Andrei Vagin	bb849bad29	gvisor/runsc: apply seccomp filters before parsing a state file PiperOrigin-RevId: 252869983	2019-06-12 11:55:24 -07:00
Fabricio Voznika	fc746efa9a	Add support to mount pod shared tmpfs mounts Parse annotations containing 'gvisor.dev/spec/mount' that gives hints about how mounts are shared between containers inside a pod. This information can be used to better inform how to mount these volumes inside gVisor. For example, a volume that is shared between containers inside a pod can be bind mounted inside the sandbox, instead of being two independent mounts. For now, this information is used to allow the same tmpfs mounts to be shared between containers which wasn't possible before. PiperOrigin-RevId: 252704037	2019-06-11 14:54:31 -07:00
Fabricio Voznika	f1aee6a7ad	Refactor container FS setup No change in functionality. Added containerMounter object to keep state while the mounts are processed. This will help upcoming changes to share mounts per-pod. PiperOrigin-RevId: 251350096	2019-06-03 18:20:57 -07:00
Fabricio Voznika	d28f71adcf	Remove 'clearStatus' option from container.Wait*PID() clearStatus was added to allow detached execution to wait on the exec'd process and retrieve its exit status. However, it's not currently used. Both docker and gvisor-containerd-shim wait on the "shim" process and retrieve the exit status from there. We could change gvisor-containerd-shim to use waits, but it will end up also consuming a process for the wait, which is similar to having the shim process. Closes #234 PiperOrigin-RevId: 251349490	2019-06-03 18:16:09 -07:00
Bhasker Hariharan	035a8fa38e	Add support for collecting execution trace to runsc. Updates #220 PiperOrigin-RevId: 250532302	2019-05-30 12:07:11 -07:00
Fabricio Voznika	ecb0f00e10	Cleanup around urpc file payload handling urpc always closes all files once the RPC function returns. PiperOrigin-RevId: 248406857 Change-Id: I400a8562452ec75c8e4bddc2154948567d572950	2019-05-15 14:36:28 -07:00
Andrei Vagin	bf0ac565d2	Fix runsc restore to be compatible with docker start --checkpoint ... Change-Id: I02b30de13f1393df66edf8829fedbf32405d18f8 PiperOrigin-RevId: 246621192	2019-05-03 21:41:45 -07:00
Michael Pratt	4d52a55201	Change copyright notice to "The gVisor Authors" Based on the guidelines at https://opensource.google.com/docs/releasing/authors/. 1. $ rg -l "Google LLC" \| xargs sed -i 's/Google LLC.*/The gVisor Authors./' 2. Manual fixup of "Google Inc" references. 3. Add AUTHORS file. Authors may request to be added to this file. 4. Point netstack AUTHORS to gVisor AUTHORS. Drop CONTRIBUTORS. Fixes #209 PiperOrigin-RevId: 245823212 Change-Id: I64530b24ad021a7d683137459cafc510f5ee1de9	2019-04-29 14:26:23 -07:00
Nicolas Lacasse	f4ce43e1f4	Allow and document bug ids in gVisor codebase. PiperOrigin-RevId: 245818639 Change-Id: I03703ef0fb9b6675955637b9fe2776204c545789	2019-04-29 14:04:14 -07:00
Kevin Krakauer	f9431fb20f	Remove obsolete TODO. PiperOrigin-RevId: 241637164 Change-Id: I65476a739cf38f1818dc47f6ce60638dec8b77a8	2019-04-02 17:27:05 -07:00
Kevin Krakauer	a40ee4f4b8	Change bug number for duplicate bug. PiperOrigin-RevId: 241567897 Change-Id: I580eac04f52bb15f4aab7df9822c4aa92e743021	2019-04-02 11:28:06 -07:00
Jamie Liu	8f4634997b	Decouple filemem from platform and move it to pgalloc.MemoryFile. This is in preparation for improved page cache reclaim, which requires greater integration between the page cache and page allocator. PiperOrigin-RevId: 238444706 Change-Id: Id24141b3678d96c7d7dc24baddd9be555bffafe4	2019-03-14 08:12:48 -07:00
Fabricio Voznika	bc9b979b94	Add profiling commands to runsc Example: runsc debug --root=<dir> \ --profile-heap=/tmp/heap.prof \ --profile-cpu=/tmp/cpu.prod --profile-delay=30 \ <container ID> PiperOrigin-RevId: 237848456 Change-Id: Icff3f20c1b157a84d0922599eaea327320dad773	2019-03-11 11:47:30 -07:00
Andrei Vagin	dd577f5410	runsc: reap a sandbox process only in sandbox.Wait() PiperOrigin-RevId: 231504064 Change-Id: I585b769aef04a3ad7e7936027958910a6eed9c8d	2019-01-29 17:15:56 -08:00
Fabricio Voznika	c1be25b78d	Scrub runsc error messages Removed "error" and "failed to" prefixes that don't add value from messages. Adjusted a few other messages. In particular, when the container fails to start, the message returned is easier for humans to read: $ docker run --rm --runtime=runsc alpine foobar docker: Error response from daemon: OCI runtime start failed: <path> did not terminate sucessfully: starting container: starting root container [foobar]: starting sandbox: searching for executable "foobar", cwd: "/", $PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin": no such file or directory Closes #77 PiperOrigin-RevId: 230022798 Change-Id: I83339017c70dae09e4f9f8e0ea2e554c4d5d5cd1	2019-01-18 17:36:02 -08:00
Fabricio Voznika	a891afad6d	Simplify synchronization between runsc and sandbox process Make 'runsc create' join cgroup before creating sandbox process. This removes the need to synchronize platform creation and ensure that sandbox process is charged to the right cgroup from the start. PiperOrigin-RevId: 227166451 Change-Id: Ieb4b18e6ca0daf7b331dc897699ca419bc5ee3a2	2018-12-28 13:48:24 -08:00
Zhaozhong Ni	9984138abe	sentry: turn "dynamically-created" procfs files into static creation. PiperOrigin-RevId: 224600982 Change-Id: I547253528e24fb0bb318fc9d2632cb80504acb34	2018-12-07 17:03:54 -08:00
Fabricio Voznika	d97ccfa346	Close donated files if containerManager.Start() fails PiperOrigin-RevId: 220869535 Change-Id: I9917e5daf02499f7aab6e2aa4051c54ff4461b9a	2018-11-09 14:54:34 -08:00
Fabricio Voznika	c92b9b7086	Add more logging to controller.go PiperOrigin-RevId: 220519632 Change-Id: Iaeec007fc1aa3f0b72569b288826d45f2534c4bf	2018-11-07 13:33:19 -08:00
Fabricio Voznika	86b3f0cd24	Fix race between start and destroy Before this change, a container starting up could race with destroy (aka delete) and leave processes behind. Now, whenever a container is created, Loader.processes gets a new entry. Start now expects the entry to be there, and if it's not it means that the container was deleted. I've also fixed Loader.waitPID to search for the process using the init process's PID namespace. We could use a few more tests for signal and wait. I'll send them in another cl. PiperOrigin-RevId: 220224290 Change-Id: I15146079f69904dc07d43c3b66cc343a2dab4cc4	2018-11-05 21:29:37 -08:00
Fabricio Voznika	a467f09261	Log when external signal is received PiperOrigin-RevId: 220204591 Change-Id: I21a9c6f5c12a376d18da5d10c1871837c4f49ad2	2018-11-05 17:42:24 -08:00
Ian Gudger	8fce67af24	Use correct company name in copyright header PiperOrigin-RevId: 217951017 Change-Id: Ie08bf6987f98467d07457bcf35b5f1ff6e43c035	2018-10-19 16:35:11 -07:00
Nicolas Lacasse	4e6f0892c9	runsc: Support job control signals for the root container. Now containers run with "docker run -it" support control characters like ^C and ^Z. This required refactoring our signal handling a bit. Signals delivered to the "runsc boot" process are turned into loader.Signal calls with the appropriate delivery mode. Previously they were always sent directly to PID 1. PiperOrigin-RevId: 217566770 Change-Id: I5b7220d9a0f2b591a56335479454a200c6de8732	2018-10-17 12:29:05 -07:00
Nicolas Lacasse	f1c01ed886	runsc: Support job control signals in "exec -it". Terminal support in runsc relies on host tty file descriptors that are imported into the sandbox. Application tty ioctls are sent directly to the host fd. However, those host tty ioctls are associated in the host kernel with a host process (in this case runsc), and the host kernel intercepts job control characters like ^C and sends signals to the host process. Thus, typing ^C into a "runsc exec" shell will send a SIGINT to the runsc process. This change makes "runsc exec" handle all signals, and forward them into the sandbox via the "ContainerSignal" urpc method. Since the "runsc exec" is associated with a particular container process in the sandbox, the signal must be associated with the same container process. One big difficulty is that the signal should not necessarily be sent to the sandbox process started by "exec", but instead must be sent to the foreground process group for the tty. For example, we may exec "bash", and from bash call "sleep 100". A ^C at this point should SIGINT sleep, not bash. To handle this, tty files inside the sandbox must keep track of their foreground process group, which is set/get via ioctls. When an incoming ContainerSignal urpc comes in, we look up the foreground process group via the tty file. Unfortunately, this means we have to expose and cache the tty file in the Loader. Note that "runsc exec" now handles signals properly, but "runsc run" does not. That will come in a later CL, as this one is complex enough already. Example: root@:/usr/local/apache2# sleep 100 ^C root@:/usr/local/apache2# sleep 100 ^Z [1]+ Stopped sleep 100 root@:/usr/local/apache2# fg sleep 100 ^C root@:/usr/local/apache2# PiperOrigin-RevId: 215334554 Change-Id: I53cdce39653027908510a5ba8d08c49f9cf24f39	2018-10-01 22:06:56 -07:00
Fabricio Voznika	2496d9b4b6	Make runsc kill and delete more conformant to the "spec" PiperOrigin-RevId: 214976251 Change-Id: I631348c3886f41f63d0e77e7c4f21b3ede2ab521	2018-09-28 12:22:21 -07:00
Fabricio Voznika	6779bd1187	Merge Loader.containerRootTGs and execProcess into a single map It's easier to manage a single map with processes that we're interested to track. This will make the next change to clean up the map on destroy easier. PiperOrigin-RevId: 214894210 Change-Id: I099247323a0487cd0767120df47ba786fac0926d	2018-09-27 23:55:05 -07:00
Fabricio Voznika	491faac03b	Implement 'runsc kill --all' In order to implement kill --all correctly, the Sentry needs to track all tasks that belong to a given container. This change introduces ContainerID to the task, that gets inherited by all children. 'kill --all' then iterates over all tasks comparing the ContainerID field to find all processes that need to be signalled. PiperOrigin-RevId: 214841768 Change-Id: I693b2374be8692d88cc441ef13a0ae34abf73ac6	2018-09-27 15:00:58 -07:00
Fabricio Voznika	b514ab0589	Refactor 'runsc boot' to take container ID as argument This makes the flow slightly simpler (no need to call Loader.SetRootContainer). And this is required change to tag tasks with container ID inside the Sentry. PiperOrigin-RevId: 214795210 Change-Id: I6ff4af12e73bb07157f7058bb15fd5bb88760884	2018-09-27 10:26:34 -07:00
Nicolas Lacasse	cbaec4d614	Wait for all async fs operations to complete before returning from Destroy. Destroy flushes dirent references, which triggers many async close operations. We must wait for those to finish before returning from Destroy, otherwise we may kill the gofer, causing a cascade of failing RPCs and leading to an inconsistent FS state. PiperOrigin-RevId: 213884637 Change-Id: Id054b47fc0f97adc5e596d747c08d3b97a1d1f71	2018-09-20 14:37:53 -07:00
Kevin Krakauer	ffb5fdd690	runsc: Fix stdin/stdout/stderr in multi-container mode. The issue with the previous change was that the stdin/stdout/stderr passed to the sentry were dup'd by host.ImportFile. This left a dangling FD that was never closed, which caused containerd to time out waiting on container stop. PiperOrigin-RevId: 213753032 Change-Id: Ia5e4c0565c42c8610d3b59f65599a5643b0901e4	2018-09-19 22:20:41 -07:00
Nicolas Lacasse	915d76aa92	Add container.Destroy urpc method. This method will: 1. Stop the container process if it is still running. 2. Unmount all sandbox-internal mounts for the container. 3. Delete the container root directory inside the sandbox. Destroy is idempotent, and safe to call concurrently. This fixes a bug where after stopping a container, we cannot unmount the container root directory on the host. This bug occurred because the sandbox dirent cache was holding a dirent with a host fd corresponding to a file inside the container root on the host. The dirent cache did not know that the container had exited, and kept the FD open, preventing us from unmounting on the host. Now that we unmount (and flush) all container mounts inside the sandbox, any host FDs donated by the gofer will be closed, and we can unmount the container root on the host. PiperOrigin-RevId: 213737693 Change-Id: I28c0ff4cd19a08014cdd72fec5154497e92aacc9	2018-09-19 18:54:14 -07:00
Kevin Krakauer	7e00f37054	Automated rollback of changelist 213307171 PiperOrigin-RevId: 213504354 Change-Id: Iadd42f0ca4b7e7a9eae780bee9900c7233fb4f3f	2018-09-18 13:22:26 -07:00
Kevin Krakauer	bb88c187c5	runsc: Enable waiting on exited processes. This makes `runsc wait` behave more like waitpid()/wait4() in that: - Once a process has run to completion, you can wait on it and get its exit code. - Processes not waited on will consume memory (like a zombie process) PiperOrigin-RevId: 213358916 Change-Id: I5b5eca41ce71eea68e447380df8c38361a4d1558	2018-09-17 16:25:24 -07:00
Kevin Krakauer	25add7b22b	runsc: Fix stdin/out/err in multi-container mode. Stdin/out/err weren't being sent to the sentry. PiperOrigin-RevId: 213307171 Change-Id: Ie4b634a58b1b69aa934ce8597e5cc7a47a2bcda2	2018-09-17 11:31:28 -07:00
Lantao Liu	bde2a91433	runsc: Support container signal/wait. This CL: 1) Fixes `runsc wait` so that it also works after the container exits; 2) Generates correct container state in Load; 3) Makes sure `Destroy` cleans up everything before returning successfully. PiperOrigin-RevId: 212900107 Change-Id: Ie129cbb9d74f8151a18364f1fc0b2603eac4109a	2018-09-13 16:38:03 -07:00
Kevin Krakauer	2eff1fdd06	runsc: Add exec flag that specifies where to save the sandbox-internal pid. This is different from the existing -pid-file flag, which saves a host pid. PiperOrigin-RevId: 212713968 Change-Id: I2c486de8dd5cfd9b923fb0970165ef7c5fc597f0	2018-09-12 15:23:35 -07:00
Nicolas Lacasse	6cc9b311af	platform: Pass device fd into platform constructor. We were previously opening the platform device (i.e. /dev/kvm) inside the platform constructor (i.e. kvm.New). This requires that we have RW access to the platform device when constructing the platform. However, now that the runsc sandbox process runs as user "nobody", it is not able to open the platform device. This CL changes the kvm constructor to take the platform device FD, rather than opening the device file itself. The device file is opened outside of the sandbox and passed to the sandbox process. PiperOrigin-RevId: 212505804 Change-Id: I427e1d9de5eb84c84f19d513356e1bb148a52910	2018-09-11 13:09:46 -07:00
Nicolas Lacasse	9751b800a6	runsc: Support multi-container exec. We must use a context.Context with a Root Dirent that corresponds to the container's chroot. Previously we were using the root context, which does not have a chroot. Getting the correct context required refactoring some of the path-lookup code. We can't lookup the path without a context.Context, which requires kernel.CreateProcArgs, which we only get inside control.Execute. So we have to do the path lookup much later than we previously were. PiperOrigin-RevId: 212064734 Change-Id: I84a5cfadacb21fd9c3ab9c393f7e308a40b9b537	2018-09-07 17:39:54 -07:00
Kevin Krakauer	8f0b6e7fc0	runsc: Support runsc kill multi-container. Now, we can kill individual containers rather than the entire sandbox. PiperOrigin-RevId: 211748106 Change-Id: Ic97e91db33d53782f838338c4a6d0aab7a313ead	2018-09-05 21:14:56 -07:00

66 Commits