93 Commits

Author SHA1 Message Date
Nicolas Lacasse ad8648c634 runsc: Pass log and config files to sandbox process by FD.
This is a prereq for running the sandbox process as user "nobody", when it may
not have permissions to open these files.

Instead, we must open them before starting the sandbox process, and pass them
by FD.

The specutils.ReadSpecFromFile method was fixed to always seek to the beginning
of the file before reading. This allows Files from the same FD to be read
multiple times, as we do in the boot command when the apply-caps flag is set.
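
A minimal sketch of the seek-before-read idea, not the actual specutils code.
The helper names readFromStart and readTwice are hypothetical, and a temp file
stands in for the spec file.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// readFromStart rewinds f to offset 0 and then reads it to EOF, so the
// result does not depend on where a previous reader left the file offset.
func readFromStart(f *os.File) ([]byte, error) {
	if _, err := f.Seek(0, io.SeekStart); err != nil {
		return nil, fmt.Errorf("seeking to start: %w", err)
	}
	return io.ReadAll(f)
}

// readTwice writes content to a temp file and reads it back twice through
// the same *os.File, which shares a single file offset across both reads.
func readTwice(content string) (string, string, error) {
	f, err := os.CreateTemp("", "spec-*.json")
	if err != nil {
		return "", "", err
	}
	defer os.Remove(f.Name())
	defer f.Close()

	if _, err := f.WriteString(content); err != nil {
		return "", "", err
	}
	first, err := readFromStart(f)
	if err != nil {
		return "", "", err
	}
	second, err := readFromStart(f)
	if err != nil {
		return "", "", err
	}
	return string(first), string(second), nil
}

func main() {
	first, second, err := readTwice(`{"ociVersion":"1.0.0"}`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("reads match:", first == second)
}
```

Without the Seek call, the second read would start at EOF and return no data.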

Tested with --network=host.

PiperOrigin-RevId: 211570647
Change-Id: I685be0a290aa7f70731ebdce82ebc0ebcc9d475c
2018-09-04 20:10:01 -07:00
Fabricio Voznika 7e18f158b2 Automated rollback of changelist 210995199
PiperOrigin-RevId: 211116429
Change-Id: I446d149c822177dc9fc3c64ce5e455f7f029aa82
2018-08-31 11:30:47 -07:00
Nicolas Lacasse 5ade9350ad runsc: Pass log and config files to sandbox process by FD.
This is a prereq for running the sandbox process as user "nobody", when it may
not have permissions to open these files.

Instead, we must open them before starting the sandbox process, and pass them
by FD.

PiperOrigin-RevId: 210995199
Change-Id: I715875a9553290b4a49394a8fcd93be78b1933dd
2018-08-30 15:47:18 -07:00
Fabricio Voznika db81c0b02f Put fsgofer inside chroot
Now each container gets its own dedicated gofer that is chroot'd to the
rootfs path. This is done to add an extra layer of security in case the
gofer gets compromised.

PiperOrigin-RevId: 210396476
Change-Id: Iba21360a59dfe90875d61000db103f8609157ca0
2018-08-27 11:10:14 -07:00
Nicolas Lacasse 106de2182d runsc: Terminal support for "docker exec -ti".
This CL adds terminal support for "docker exec".  We previously only supported
consoles for the container process, but not exec processes.

The SYS_IOCTL syscall was added to the default seccomp filter list, but only
for ioctls that get/set winsize and termios structs. We need to allow these
ioctls for all containers because it's possible to run "exec -ti" on a
container that was started without an attached console, after the filters
have been installed.
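
As a rough illustration of filtering ioctls by request number, the sketch
below allows only termios and winsize requests. It is not the actual seccomp
rule set from this change: the function name allowedIoctl and the exact list
of requests are assumptions, and the constants are the Linux asm-generic
values.

```go
package main

import "fmt"

// Linux ioctl request numbers from <asm-generic/ioctls.h>.
const (
	tcgets     = 0x5401 // get termios
	tcsets     = 0x5402 // set termios immediately
	tcsetsw    = 0x5403 // set termios after draining output
	tcsetsf    = 0x5404 // set termios after draining output and flushing input
	tiocgwinsz = 0x5413 // get window size
	tiocswinsz = 0x5414 // set window size
	fionread   = 0x541B // not terminal-related; used below as a denied example
)

// allowedIoctl reports whether an ioctl request is one of the termios or
// winsize requests in this illustrative allow-list.
func allowedIoctl(req uintptr) bool {
	switch req {
	case tcgets, tcsets, tcsetsw, tcsetsf, tiocgwinsz, tiocswinsz:
		return true
	default:
		return false
	}
}

func main() {
	fmt.Println("TIOCGWINSZ allowed:", allowedIoctl(tiocgwinsz))
	fmt.Println("FIONREAD allowed:", allowedIoctl(fionread))
}
```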

Note that control-character signals are still not properly supported.

Tested with:
	$ docker run --runtime=runsc -it alpine
In another terminal:
	$ docker exec -it <containerid> /bin/sh

PiperOrigin-RevId: 210185456
Change-Id: I6d2401e53a7697bb988c120a8961505c335f96d9
2018-08-24 17:43:21 -07:00
Fabricio Voznika a81a4402a2 Add option to panic gofer if writes are attempted over RO mounts
This is used when '--overlay=true' to guarantee writes are not sent to gofer.

PiperOrigin-RevId: 210116288
Change-Id: I7616008c4c0e8d3668e07a205207f46e2144bf30
2018-08-24 10:17:42 -07:00
Fabricio Voznika d6d165cb0b Initial change for multi-gofer support
PiperOrigin-RevId: 209647293
Change-Id: I980fca1257ea3fcce796388a049c353b0303a8a5
2018-08-21 13:14:43 -07:00
Fabricio Voznika 0fc7b30695 Standardize mounts in tests
Tests get a readonly rootfs mapped to / (which was the case before)
and writable TEST_TMPDIR. This makes it easier to set up containers to
write to files and to share state between test and containers.

PiperOrigin-RevId: 209453224
Change-Id: I4d988e45dc0909a0450a3bb882fe280cf9c24334
2018-08-20 11:26:39 -07:00
Kevin Krakauer 635b0c4593 runsc fsgofer: Support dynamic serving of filesystems.
When multiple containers run inside a sentry, each container has its own root
filesystem and set of mounts. Containers are also added after sentry boot rather
than all configured and known at boot time.

The fsgofer needs to be able to serve the root filesystem of each container.
Thus, it must be possible to add filesystems after the fsgofer has already
started.

This change:
* Creates a URPC endpoint within the gofer process that listens for requests to
  serve new content.
* Enables the sentry, when starting a new container, to add the new container's
  filesystem.
* Mounts those new filesystems at separate roots within the sentry.
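
The core of the change is that the set of served filesystems can grow after
startup. A minimal sketch of that idea, assuming a hypothetical rootRegistry
type keyed by container ID; the real change uses a URPC endpoint in the gofer,
which is not modeled here.

```go
package main

import (
	"fmt"
	"sync"
)

// rootRegistry tracks which host path is served for each container.
// Entries can be added at any time, including after serving has begun.
type rootRegistry struct {
	mu    sync.RWMutex
	roots map[string]string // container ID -> host rootfs path
}

func newRootRegistry() *rootRegistry {
	return &rootRegistry{roots: make(map[string]string)}
}

// Add registers a new container root. It fails if the ID is already present.
func (r *rootRegistry) Add(id, path string) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, ok := r.roots[id]; ok {
		return fmt.Errorf("container %q already has a root", id)
	}
	r.roots[id] = path
	return nil
}

// Lookup returns the root path registered for a container, if any.
func (r *rootRegistry) Lookup(id string) (string, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	path, ok := r.roots[id]
	return path, ok
}

func main() {
	reg := newRootRegistry()
	if err := reg.Add("container-a", "/run/rootfs/a"); err != nil {
		fmt.Println("error:", err)
	}
	path, ok := reg.Lookup("container-a")
	fmt.Println(path, ok)
}
```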

PiperOrigin-RevId: 208903248
Change-Id: Ifa91ec9c8caf5f2f0a9eead83c4a57090ce92068
2018-08-15 16:25:22 -07:00
Nicolas Lacasse e8a4f2e133 runsc: Change cache policy for root fs and volume mounts.
Previously, gofer filesystems were configured with the default "fscache"
policy, which caches filesystem metadata and contents aggressively.  While this
setting is best for performance, it means that changes from inside the sandbox
may not be immediately propagated outside the sandbox, and vice-versa.

This CL changes volumes and the root fs configuration to use a new
"remote-revalidate" cache policy which tries to retain as much caching as
possible while still making fs changes visible across the sandbox boundary.

This cache policy is enabled by default for the root filesystem. The default
value for the "--file-access" flag is still "proxy", but the behavior is
changed to use the new cache policy.

A new value for the "--file-access" flag is added, called "proxy-exclusive",
which turns on the previous aggressive caching behavior. As the name implies,
this flag should be used when the sandbox has "exclusive" access to the
filesystem.

All volume mounts are configured to use the new cache policy, since it is
safest and most likely to be correct. There is not currently a way to change
this behavior, but it's possible to add such a mechanism in the future. The
configurability is a smaller issue for volumes, since most of the expensive
application fs operations (walking + stating files) will likely be served by the
root fs.
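
The flag-to-policy mapping described above can be sketched as follows. The
function names rootCachePolicy and volumeCachePolicy are hypothetical, and the
sketch covers only the two "--file-access" values named in this message; any
other value is treated as an error here, which is an assumption.

```go
package main

import "fmt"

const (
	policyFSCache          = "fscache"
	policyRemoteRevalidate = "remote-revalidate"
)

// rootCachePolicy maps a --file-access value to the cache policy used for
// the root filesystem.
func rootCachePolicy(fileAccess string) (string, error) {
	switch fileAccess {
	case "proxy":
		// Default: keep caching where possible, but revalidate so that
		// changes are visible across the sandbox boundary.
		return policyRemoteRevalidate, nil
	case "proxy-exclusive":
		// The sandbox has exclusive access, so aggressive caching is safe.
		return policyFSCache, nil
	default:
		return "", fmt.Errorf("file access %q not covered by this sketch", fileAccess)
	}
}

// volumeCachePolicy returns the policy for volume mounts, which is fixed
// regardless of the flag value.
func volumeCachePolicy() string {
	return policyRemoteRevalidate
}

func main() {
	for _, fa := range []string{"proxy", "proxy-exclusive"} {
		policy, err := rootCachePolicy(fa)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("root fs with --file-access=%s uses %s\n", fa, policy)
	}
	fmt.Println("volumes use", volumeCachePolicy())
}
```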

PiperOrigin-RevId: 208735037
Change-Id: Ife048fab1948205f6665df8563434dbc6ca8cfc9
2018-08-14 16:25:58 -07:00
Fabricio Voznika bc9a1fca23 Tiny reordering to network code
PiperOrigin-RevId: 207581723
Change-Id: I6e4eb1227b5ed302de5e6c891040b670955f1eea
2018-08-06 11:48:29 -07:00
Fabricio Voznika d7a34790a0 Add KVM and overlay dimensions to container_test
PiperOrigin-RevId: 205714667
Change-Id: I317a2ca98ac3bdad97c4790fcc61b004757d99ef
2018-07-23 13:31:42 -07:00
Justine Olshan f543ada150 Removed a now incorrect reference to restoreFile.
PiperOrigin-RevId: 205470108
Change-Id: I226878a887fe1133561005357a9e3b09428b06b6
2018-07-20 16:18:07 -07:00
Lantao Liu f62d6dd453 runsc: copy gateway from the pod network interface.
PiperOrigin-RevId: 205334841
Change-Id: Ia60d486f9aae70182fdc4af50cf7c915986126d7
2018-07-19 18:09:56 -07:00
Justine Olshan c05660373e Moved restore code out of create and made to be called after create.
Docker expects containers to be created before they are restored.
However, gVisor restoring requires specific actions regarding the kernel
and the file system. These actions were originally in booting the sandbox.

Now setting up the file system is deferred until a call to
runsc start. In the restore case, the kernel is destroyed and a new kernel
is created in the same process, as we need the same process for Docker.

These changes required careful execution of concurrent processes which
required the use of a channel.

Full docker integration still needs the ability to restore into the same
container.

PiperOrigin-RevId: 205161441
Change-Id: Ie1d2304ead7e06855319d5dc310678f701bd099f
2018-07-18 16:58:30 -07:00
Kevin Krakauer 16d37973eb runsc: Add the "wait" subcommand.
Users can now call "runsc wait <container id>" to wait on a particular process
inside the container. -pid can also be used to wait on a specific PID.

Manually tested the wait subcommand for a single waiter and multiple waiters
(simultaneously 2 processes waiting on the container and 2 processes waiting on
a PID within the container).

PiperOrigin-RevId: 202548978
Change-Id: Idd507c2cdea613c3a14879b51cfb0f7ea3fb3d4c
2018-06-28 14:56:36 -07:00
Fabricio Voznika bb31a11903 Wait for sandbox process when waiting for root container
Closes #71

PiperOrigin-RevId: 202532762
Change-Id: I80a446ff638672ff08e6fd853cd77e28dd05d540
2018-06-28 13:23:04 -07:00
Kevin Krakauer 04bdcc7b65 runsc: Enable waiting on individual containers within a sandbox.
PiperOrigin-RevId: 201742160
Change-Id: Ia9fa1442287c5f9e1196fb117c41536a80f6bb31
2018-06-22 14:31:25 -07:00
Brielle Broder 7d6149063a Restore implementation added to runsc.
Restore creates a new container and uses the given image-path to load a saved
image of a previous container. Restore command is plumbed through container
and sandbox. This command does not work yet - more to come.

PiperOrigin-RevId: 201541229
Change-Id: I864a14c799ce3717d99bcdaaebc764281863d06f
2018-06-21 09:58:24 -07:00
Fabricio Voznika 4ad7315b67 Add 'runsc debug' command
It prints sandbox stacks to the log to help debug stuckness. I expect
that many more options will be added in the future.

PiperOrigin-RevId: 201405931
Change-Id: I87e560800cd5a5a7b210dc25a5661363c8c3a16e
2018-06-20 13:31:31 -07:00
Kevin Krakauer 5397963b5d runsc: Enable container creation within existing sandboxes.
Containers are created as processes in the sandbox. Of the many things that
don't work yet, the biggest issue is that the fsgofer is launched with its root
as the sandbox's root directory. Thus, when a container is started and wants to
read anything (including the init binary of the container), the gofer tries to
serve from sandbox's root (which basically just has pause), not the container's.

PiperOrigin-RevId: 201294560
Change-Id: I6423aa8830538959c56ae908ce067e4199d627b1
2018-06-19 21:44:33 -07:00
Justine Olshan a6dbef045f Added a resume command to unpause a paused container.
Resume checks the status of the container and unpauses the kernel
if its status is paused. Otherwise nothing happens.
Tests were added to ensure that the process is in the correct state
after various commands.
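
The status check can be sketched as a small state transition. The container
type and status names below are hypothetical stand-ins, and the real command
also unpauses the kernel, which is not modeled.

```go
package main

import "fmt"

type status string

const (
	statusRunning status = "running"
	statusPaused  status = "paused"
	statusStopped status = "stopped"
)

// container is a stand-in that holds only the status field.
type container struct {
	status status
}

// resume moves a paused container back to running and reports whether it
// changed anything. For any other status it is a no-op.
func (c *container) resume() bool {
	if c.status != statusPaused {
		return false
	}
	c.status = statusRunning
	return true
}

func main() {
	c := &container{status: statusPaused}
	fmt.Println(c.resume(), c.status)
	fmt.Println(c.resume(), c.status)
}
```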

PiperOrigin-RevId: 201251234
Change-Id: Ifd11b336c33b654fea6238738f864fcf2bf81e19
2018-06-19 15:23:36 -07:00
Fabricio Voznika 775982ed4b Automated rollback of changelist 200770591
PiperOrigin-RevId: 201012131
Change-Id: I5cd69e795555129319eb41135ecf26db9a0b1fcb
2018-06-18 10:00:04 -07:00
Justine Olshan 0786707cd9 Added code for a pause command for a container process.
Like runc, the pause command will pause the processes of the given container.
It will set that container's status to "paused."
A resume command will be added to unpause and continue running the process.

PiperOrigin-RevId: 200789624
Change-Id: I72a5d7813d90ecfc4d01cc252d6018855016b1ea
2018-06-15 16:09:09 -07:00
Kevin Krakauer 437890dc4b runsc: Make gofer logs show up in test output.
PiperOrigin-RevId: 200770591
Change-Id: Ifc096d88615b63135210d93c2b4cee2eaecf1eee
2018-06-15 14:07:54 -07:00
Brielle Broder 711a9869e5 Runsc checkpoint works.
This is the first iteration of checkpoint that actually saves to a file.
Tests for checkpoint are included.

Ran into an issue when private unix sockets are enabled. An error message
was added for this case and the mutex state was set.

PiperOrigin-RevId: 200269470
Change-Id: I28d29a9f92c44bf73dc4a4b12ae0509ee4070e93
2018-06-12 13:25:23 -07:00
Fabricio Voznika 5c51bc51e4 Drop capabilities not needed by Gofer
PiperOrigin-RevId: 199808391
Change-Id: Ib37a4fb6193dc85c1f93bc16769d6aa41854b9d4
2018-06-08 09:59:26 -07:00
Googler 0c34b460f2 Add runsc checkpoint command.
Checkpoint command is plumbed through container and sandbox.
Restore has also been added but it is only a stub. None of this
works yet. More changes to come.

PiperOrigin-RevId: 199510105
Change-Id: Ibd08d57f4737847eb25ca20b114518e487320185
2018-06-06 12:31:53 -07:00
Fabricio Voznika 78ccd1298e Return 'running' if gofer is still alive
Containerd will start deleting container and rootfs after container
is stopped. However, if gofer is still running, rootfs cleanup will
fail because of device busy.

This CL makes sure that gofer is not running when container state is
stopped.

Change from: lantaol@google.com

PiperOrigin-RevId: 199172668
Change-Id: I9d874eec3ecf74fd9c8edd7f62d9f998edef66fe
2018-06-04 12:14:23 -07:00
Fabricio Voznika 55a37ceef1 Fix leaky FD
9P socket was being created without CLOEXEC and was being inherited
by the children. This would prevent the gofer from detecting that the
sandbox had exited, because the socket would not be closed.
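
A Linux-only sketch of the difference, using a Unix socket pair in place of
the 9P socket. The helper names are hypothetical; the code only inspects the
close-on-exec flag and does not spawn any child process.

```go
package main

import (
	"fmt"
	"syscall"
)

// isCloexec reports whether the close-on-exec flag is set on fd.
func isCloexec(fd int) (bool, error) {
	flags, _, errno := syscall.Syscall(syscall.SYS_FCNTL, uintptr(fd), syscall.F_GETFD, 0)
	if errno != 0 {
		return false, errno
	}
	return flags&syscall.FD_CLOEXEC != 0, nil
}

// socketpairIsCloexec creates a Unix socket pair, optionally with
// SOCK_CLOEXEC, and reports whether the first FD is close-on-exec.
func socketpairIsCloexec(cloexec bool) (bool, error) {
	typ := syscall.SOCK_STREAM
	if cloexec {
		// Set the flag atomically at creation time so there is no window
		// in which a concurrent fork+exec could inherit the socket.
		typ |= syscall.SOCK_CLOEXEC
	}
	fds, err := syscall.Socketpair(syscall.AF_UNIX, typ, 0)
	if err != nil {
		return false, err
	}
	defer syscall.Close(fds[0])
	defer syscall.Close(fds[1])
	return isCloexec(fds[0])
}

func main() {
	for _, cloexec := range []bool{false, true} {
		set, err := socketpairIsCloexec(cloexec)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Printf("created with SOCK_CLOEXEC=%v, FD_CLOEXEC set=%v\n", cloexec, set)
	}
}
```

An FD without the flag stays open in every exec'd child, so the peer never
sees the socket close while any such child is alive.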

PiperOrigin-RevId: 199168959
Change-Id: I3ee1a07cbe7331b0aeb1cf2b697e728ce24f85a7
2018-06-04 11:52:17 -07:00
Fabricio Voznika 65dadc0029 Ignores IPv6 addresses when configuring network
Closes #60

PiperOrigin-RevId: 198887885
Change-Id: I9bf990ee3fde9259836e57d67257bef5b85c6008
2018-06-01 10:09:37 -07:00
Nicolas Lacasse 31386185fe Push signal-delivery and wait into the sandbox.
This is another step towards multi-container support.

Previously, we delivered signals directly to the sandbox process (which then
forwarded the signal to PID 1 inside the sandbox). Similarly, we waited on a
container by waiting on the sandbox process itself. This approach will not work
when there are multiple containers inside the sandbox, and we need to
signal/wait on individual containers.

This CL adds two new messages, ContainerSignal and ContainerWait. These
messages include the id of the container to signal/wait. The controller inside
the sandbox receives these messages and signals/waits on the appropriate
process inside the sandbox.

The container id is plumbed into the sandbox, but it currently is not used. We
still end up signaling/waiting on PID 1 in all cases.  Once we actually have
multiple containers inside the sandbox, we will need to keep some sort of map
of container id -> pid (or possibly pid namespace), and signal/kill the
appropriate process for the container.

PiperOrigin-RevId: 197028366
Change-Id: I07b4d5dc91ecd2affc1447e6b4bdd6b0b7360895
2018-05-17 11:55:28 -07:00
Nicolas Lacasse 205f1027e6 Refactor the Sandbox package into Sandbox + Container.
This is a necessary prerequisite for supporting multiple containers in a single
sandbox.

All the commands (in cmd package) now call operations on Containers (container
package). When a Container first starts, it will create a Sandbox with the same
ID.

The Sandbox class is now simpler, as it only knows how to create boot/gofer
processes, and how to forward commands into the running boot process.

There are TODOs sprinkled around for additional support for multiple
containers. Most notably, we need to detect when a container is intended to run
in an existing sandbox (by reading the metadata), and then have some way to
signal to the sandbox to start a new container. Other urpc calls into the
sandbox need to pass the container ID, so the sandbox can run the operation on
the given container. These are only half-plumbed through right now.

PiperOrigin-RevId: 196688269
Change-Id: I1ecf4abbb9dd8987a53ae509df19341aaf42b5b0
2018-05-15 10:18:03 -07:00
Nicolas Lacasse 1bdec86bae Return better errors from Docker when runsc fails to start.
Two changes in this CL:

First, make the "boot" process sleep when it encounters an error to give the
controller time to send the error back to the "start" process. Otherwise the
"boot" process exits immediately and the control connection errors with EOF.

Secondly, open the log file with O_APPEND, not O_TRUNC. Docker uses the same
log file for all runtime commands, and setting O_TRUNC causes them to get
destroyed. Furthermore, containerd parses these log files in the event of an
error, and it does not like the file being truncated out from underneath it.
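
A small sketch of why the open flag matters when several commands share one
log file. The helpers writeLog and twoCommands are hypothetical, and the two
writes stand in for two separate runtime invocations.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// writeLog opens path for writing with the given extra flag (os.O_APPEND or
// os.O_TRUNC) and writes a single line.
func writeLog(path string, flag int, line string) error {
	f, err := os.OpenFile(path, os.O_WRONLY|os.O_CREATE|flag, 0644)
	if err != nil {
		return err
	}
	defer f.Close()
	_, err = f.WriteString(line + "\n")
	return err
}

// twoCommands simulates two runtime commands logging to the same file and
// returns the final file contents.
func twoCommands(appendMode bool) (string, error) {
	dir, err := os.MkdirTemp("", "runtime-log")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	flag := os.O_TRUNC
	if appendMode {
		flag = os.O_APPEND
	}
	path := filepath.Join(dir, "log")
	for _, line := range []string{"create", "start"} {
		if err := writeLog(path, flag, line); err != nil {
			return "", err
		}
	}
	data, err := os.ReadFile(path)
	return string(data), err
}

func main() {
	for _, appendMode := range []bool{false, true} {
		out, err := twoCommands(appendMode)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Printf("append=%v log=%q\n", appendMode, out)
	}
}
```

With O_TRUNC only the last command's line survives; with O_APPEND both lines
are kept in order.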

Now, when trying to run a binary that does not exist in the image, the error
message is more reasonable:

$ docker run alpine /not/found
docker: Error response from daemon: OCI runtime start failed: /usr/local/google/docker/runtimes/runscd did not terminate sucessfully: error starting sandbox: error starting application [/not/found]: failed to create init process: no such file or directory

Fixes #32

PiperOrigin-RevId: 196027084
Change-Id: Iabc24c0bdd8fc327237acc051a1655515f445e68
2018-05-09 14:13:37 -07:00
Nicolas Lacasse 32cabad8da Use the containerd annotation instead of detecting the "pause" application.
FIXED=72380268
PiperOrigin-RevId: 195846596
Change-Id: Ic87fed1433482a514631e1e72f5ee208e11290d1
2018-05-08 11:11:50 -07:00
Fabricio Voznika e1b412d660 Error if container requires AppArmor, SELinux or seccomp
Closes #35

PiperOrigin-RevId: 195840128
Change-Id: I31c1ad9b51ec53abb6f0b485d35622d4e9764b29
2018-05-08 10:34:11 -07:00
Ian Gudger 7c8c3705ea Fix misspellings
PiperOrigin-RevId: 195742598
Change-Id: Ibd4a8e4394e268c87700b6d1e50b4b37dfce5182
2018-05-07 16:38:01 -07:00
Ian Gudger f47174f06b Run gofmt -s on everything
PiperOrigin-RevId: 195469901
Change-Id: I66d5c7a334bbb8b47e40d266a2661291c2d91c7f
2018-05-04 14:16:11 -07:00
Fabricio Voznika c186ebb62a Return error when child exits early
PiperOrigin-RevId: 195365050
Change-Id: I8754dc7a3fc2975d422cae453762a455478a8e6a
2018-05-03 21:09:31 -07:00
Cyrille Hemidy 04b79137ba Fix misspellings.
PiperOrigin-RevId: 195307689
Change-Id: I499f19af49875a43214797d63376f20ae788d2f4
2018-05-03 14:06:13 -07:00
Fabricio Voznika a61def1b36 Remove detach for exec options
Detachable exec commands are handled in the client entirely and the detach option is not used anymore.

PiperOrigin-RevId: 195181272
Change-Id: I6e82a2876d2c173709c099be59670f71702e5bf0
2018-05-02 17:40:01 -07:00
Fabricio Voznika 5eab7a41a3 Remove stale TODO
PiperOrigin-RevId: 194949678
Change-Id: I60a30c4bb7418e17583c66f437273fd17e9e99ba
2018-05-01 09:45:45 -07:00
Googler d02b74a5dc Check in gVisor.
PiperOrigin-RevId: 194583126
Change-Id: Ica1d8821a90f74e7e745962d71801c598c652463
2018-04-28 01:44:26 -04:00