Commit Graph

165 Commits

Author SHA1 Message Date
Andrei Vagin c0a981629c Start a sandbox process in a new userns only if CAP_SETUID is set
In addition, it fixes a race condition in TestMultiContainerGoferStop.
There are two scripts copy the same set of files into the same directory
and sometime one of this command fails with EXIST.

PiperOrigin-RevId: 230011247
Change-Id: I9289f72e65dc407cdcd0e6cd632a509e01f43e9c
2019-01-18 16:08:39 -08:00
Fabricio Voznika e4d3ca7263 Prevent internal tmpfs mount to override files in /tmp
Runsc wants to mount /tmp using internal tmpfs implementation for
performance. However, it risks hiding files that may exist under
/tmp in case it's present in the container. Now, it only mounts
over /tmp iff:
  - /tmp was not explicitly asked to be mounted
  - /tmp is empty

If any of this is not true, then /tmp maps to the container's
image /tmp.

Note: checkpoint doesn't have sentry FS mounted to check if /tmp
is empty. It simply looks for explicit mounts right now.
PiperOrigin-RevId: 229607856
Change-Id: I10b6dae7ac157ef578efc4dfceb089f3b94cde06
2019-01-16 12:48:32 -08:00
Fabricio Voznika 92cf3764e0 Create working directory if it doesn't yet exist
PiperOrigin-RevId: 229438125
Change-Id: I58eb0d10178d1adfc709d7b859189d1acbcb2f22
2019-01-15 14:13:27 -08:00
Andrei Vagin f8c8f24154 runsc: Collect zombies of sandbox and gofer processes
And we need to wait a gofer process before cgroup.Uninstall,
because it is running in the sandbox cgroups.

PiperOrigin-RevId: 228904020
Change-Id: Iaf8826d5b9626db32d4057a1c505a8d7daaeb8f9
2019-01-11 10:32:26 -08:00
Fabricio Voznika 0d7023d581 Restore to original cgroup after sandbox and gofer processes are created
The original code assumed that it was safe to join and not restore cgroup,
but Container.Run will not exit after calling start, making cgroup cleanup
fail because there were still processes inside the cgroup.

PiperOrigin-RevId: 228529199
Change-Id: I12a48d9adab4bbb02f20d71ec99598c336cbfe51
2019-01-09 09:18:15 -08:00
Nicolas Lacasse 1775a0e11e container.Destroy should clean up container metadata even if other cleanups fail
If the sandbox process is dead (because of a panic or some other problem),
container.Destroy will never remove the container metadata file, since it will
always fail when calling container.stop().

This CL changes container.Destroy() to always perform the three necessary
cleanup operations:
* Stop the sandbox and gofer processes.
* Remove the container fs on the host.
* Delete the container metadata directory.

Errors from these three operations will be concatenated and returned from
Destroy().

PiperOrigin-RevId: 225448164
Change-Id: I99c6311b2e4fe5f6e2ca991424edf1ebeae9df32
2018-12-13 15:38:10 -08:00
Brian Geffon d3bc79bc84 Open source system call tests.
PiperOrigin-RevId: 224886231
Change-Id: I0fccb4d994601739d8b16b1d4e6b31f40297fb22
2018-12-10 14:42:34 -08:00
Googler 613899f852 Internal change.
PiperOrigin-RevId: 223893409
Change-Id: I58869c7fb0012f6c3f7612a96cb649348b56335f
2018-12-03 17:27:35 -08:00
Nicolas Lacasse 845836c578 Internal change.
PiperOrigin-RevId: 221848471
Change-Id: I882fbe5ce7737048b2e1f668848e9c14ed355665
2018-11-20 14:03:11 -08:00
Nicolas Lacasse adf8138e06 Allow sandbox.Wait to be called after the sandbox has exited.
sandbox.Wait is racey, as the sandbox may have exited before it is called, or
even during.

We already had code to handle the case that the sandbox exits during the Wait
call, but we were not properly handling the case where the sandbox has exited
before the call.

The best we can do in such cases is return the sandbox exit code as the
application exit code.

PiperOrigin-RevId: 221702517
Change-Id: I290d0333cc094c7c1c3b4ce0f17f61a3e908d787
2018-11-15 15:35:41 -08:00
Nicolas Lacasse c57b92a0c7 Internal change.
PiperOrigin-RevId: 221178413
Change-Id: I0e615c5e945cb924d8df767c894a9e402f0b8ff2
2018-11-12 16:29:08 -08:00
Fabricio Voznika 93e88760b0 Add tests multicontainer start/stop
Each container has its respective gofer. Test that
gofer can be shutdown when a container stops and that
it doesn't affect other containers.

PiperOrigin-RevId: 220829898
Change-Id: I2a44a3cf2a88577e6ad1133afc622bbf4a5f6591
2018-11-09 10:58:32 -08:00
Fabricio Voznika d12a0dd6b8 Fix test --race violation
SetupContainerInRoot was setting Config.RootDir unnecessarily
and causing a --race violation in TestMultiContainerDestroyStarting.

PiperOrigin-RevId: 220580073
Change-Id: Ie0b28c19846106c7458a92681b708ae70f87d25a
2018-11-07 21:30:59 -08:00
Fabricio Voznika 86b3f0cd24 Fix race between start and destroy
Before this change, a container starting up could race with
destroy (aka delete) and leave processes behind.

Now, whenever a container is created, Loader.processes gets
a new entry. Start now expects the entry to be there, and if
it's not it means that the container was deleted.

I've also fixed Loader.waitPID to search for the process using
the init process's PID namespace.

We could use a few more tests for signal and wait. I'll send
them in another cl.

PiperOrigin-RevId: 220224290
Change-Id: I15146079f69904dc07d43c3b66cc343a2dab4cc4
2018-11-05 21:29:37 -08:00
Fabricio Voznika 5cd55cd90f Use spec with clean paths for gofer
Otherwise the gofer's attach point may be different from sandbox when there
symlinks in the path.

PiperOrigin-RevId: 219730492
Change-Id: Ia9c4c2d16228c6a1a9e790e0cb673fd881003fe1
2018-11-01 17:52:11 -07:00
Adin Scannell 75cd70ecc9 Track paths and provide a rename hook.
This change also adds extensive testing to the p9 package via mocks. The sanity
checks and type checks are moved from the gofer into the core package, where
they can be more easily validated.

PiperOrigin-RevId: 218296768
Change-Id: I4fc3c326e7bf1e0e140a454cbacbcc6fd617ab55
2018-10-23 00:20:15 -07:00
Ian Lewis c2c0f9cb7e Updated cleanup code to be more explicit about ignoring errors.
Errors are shown as being ignored by assigning to the blank identifier.

PiperOrigin-RevId: 218103819
Change-Id: I7cc7b9d8ac503a03de5504ebdeb99ed30a531cf2
2018-10-21 19:42:32 -07:00
Ian Gudger 8fce67af24 Use correct company name in copyright header
PiperOrigin-RevId: 217951017
Change-Id: Ie08bf6987f98467d07457bcf35b5f1ff6e43c035
2018-10-19 16:35:11 -07:00
Fabricio Voznika f3ffa4db52 Resolve mount paths while setting up root fs mount
It's hard to resolve symlinks inside the sandbox because rootfs and mounts
may be read-only, forcing us to create mount points inside lower layer of an
overlay, **before** the volumes are mounted.

Since the destination must already be resolved outside the sandbox when creating
mounts, take this opportunity to rewrite the spec with paths resolved.
"runsc boot" will use the "resolved" spec to load mounts. In addition, symlink
traversals were disabled while mounting containers inside the sandbox.

It haven't been able to write a good test for it. So I'm relying on manual tests
for now.

PiperOrigin-RevId: 217749904
Change-Id: I7ac434d5befd230db1488446cda03300cc0751a9
2018-10-18 12:42:24 -07:00
Nicolas Lacasse e4277cb6ff Relativize all socket paths in tests.
Otherwise they may exceed the maximum.

PiperOrigin-RevId: 217584658
Change-Id: I869e400d3409599c0d3b85c6590702c052f49550
2018-10-17 14:11:30 -07:00
Nicolas Lacasse 4e6f0892c9 runsc: Support job control signals for the root container.
Now containers run with "docker run -it" support control characters like ^C and
^Z.

This required refactoring our signal handling a bit. Signals delivered to the
"runsc boot" process are turned into loader.Signal calls with the appropriate
delivery mode. Previously they were always sent directly to PID 1.

PiperOrigin-RevId: 217566770
Change-Id: I5b7220d9a0f2b591a56335479454a200c6de8732
2018-10-17 12:29:05 -07:00
Kevin Krakauer 9b3550f70b runsc: Add --pid flag to runsc kill.
--pid allows specific processes to be signalled rather than the container root
process or all processes in the container. containerd needs to SIGKILL exec'd
processes that timeout and check whether processes are still alive.

PiperOrigin-RevId: 217547636
Change-Id: I2058ebb548b51c8eb748f5884fb88bad0b532e45
2018-10-17 10:51:39 -07:00
Nicolas Lacasse cea51641d4 Bump sandbox start and stop timeouts.
PiperOrigin-RevId: 217433699
Change-Id: Icef08285728c23ee7dd650706aaf18da51c25dff
2018-10-16 20:34:10 -07:00
Fabricio Voznika f074f0c2c7 Make the gofer process enter namespaces
This is done to further isolate the gofer from the host.

PiperOrigin-RevId: 216790991
Change-Id: Ia265b77e4e50f815d08f743a05669f9d75ad7a6f
2018-10-11 17:45:51 -07:00
Nicolas Lacasse ea5f6ed6ec Make Wait() return the sandbox exit status if the sandbox has exited.
It's possible for Start() and Wait() calls to race, if the sandboxed
application is short-lived. If the application finishes before (or during) the
Wait RPC, then Wait will fail.  In practice this looks like "connection
refused" or "EOF" errors when waiting for an RPC response.

This race is especially bad in tests, where we often run "true" inside a
sandbox.

This CL does a best-effort fix, by returning the sandbox exit status as the
container exit status.  In most cases, these are the same.

This fixes the remaining flakes in runsc/container:container_test.

PiperOrigin-RevId: 216777793
Change-Id: I9dfc6e6ec885b106a736055bc7a75b2008dfff7a
2018-10-11 16:07:05 -07:00
Fabricio Voznika f413e4b117 Add bare bones unsupported syscall logging
This change introduces a new flags to create/run called
--user-log. Logs to this files are visible to users and
are meant to help debugging problems with their images
and containers.

For now only unsupported syscalls are sent to this log,
and only minimum support was added. We can build more
infrastructure around it as needed.

PiperOrigin-RevId: 216735977
Change-Id: I54427ca194604991c407d49943ab3680470de2d0
2018-10-11 11:56:54 -07:00
Fabricio Voznika 29cd05a7c6 Add sandbox to cgroup
Sandbox creation uses the limits and reservations configured in the
OCI spec and set cgroup options accordinly. Then it puts both the
sandbox and gofer processes inside the cgroup.

It also allows the cgroup to be pre-configured by the caller. If the
cgroup already exists, sandbox and gofer processes will join the
cgroup but it will not modify the cgroup with spec limits.

PiperOrigin-RevId: 216538209
Change-Id: If2c65ffedf55820baab743a0edcfb091b89c1019
2018-10-10 09:00:42 -07:00
Fabricio Voznika 20508bafb8 Add tests to verify gofer is chroot'ed
PiperOrigin-RevId: 216472439
Change-Id: Ic4cb86c8e0a9cb022d3ceed9dc5615266c307cf9
2018-10-09 21:07:14 -07:00
Nicolas Lacasse 7a6412cb0b runsc: Allow state transition from Creating to Stopped.
This can happen if an error is encountered during Create() which causes the
container to be destroyed and set to state Stopped.

Without this transition, errors during Create get hidden by the later panic.

PiperOrigin-RevId: 215599193
Change-Id: Icd3f42e12c685cbf042f46b3929bccdf30ad55b0
2018-10-03 11:49:40 -07:00
Nicolas Lacasse 37e57a903c Fix arithmetic error in multi_container_test.
We add an additional (2^3)-1=7 processes, but the code was only waiting for 3.

I switched back to Math.Pow format to make the arithmetic easier to inspect.

PiperOrigin-RevId: 215588140
Change-Id: Iccad4d6f977c1bfc5c4b08d3493afe553fe25733
2018-10-03 10:47:52 -07:00
Nicolas Lacasse f1c01ed886 runsc: Support job control signals in "exec -it".
Terminal support in runsc relies on host tty file descriptors that are imported
into the sandbox. Application tty ioctls are sent directly to the host fd.

However, those host tty ioctls are associated in the host kernel with a host
process (in this case runsc), and the host kernel intercepts job control
characters like ^C and send signals to the host process. Thus, typing ^C into a
"runsc exec" shell will send a SIGINT to the runsc process.

This change makes "runsc exec" handle all signals, and forward them into the
sandbox via the "ContainerSignal" urpc method. Since the "runsc exec" is
associated with a particular container process in the sandbox, the signal must
be associated with the same container process.

One big difficulty is that the signal should not necessarily be sent to the
sandbox process started by "exec", but instead must be sent to the foreground
process group for the tty. For example, we may exec "bash", and from bash call
"sleep 100". A ^C at this point should SIGINT sleep, not bash.

To handle this, tty files inside the sandbox must keep track of their
foreground process group, which is set/get via ioctls. When an incoming
ContainerSignal urpc comes in, we look up the foreground process group via the
tty file. Unfortunately, this means we have to expose and cache the tty file in
the Loader.

Note that "runsc exec" now handles signals properly, but "runs run" does not.
That will come in a later CL, as this one is complex enough already.

Example:
	root@:/usr/local/apache2# sleep 100
	^C

	root@:/usr/local/apache2# sleep 100
	^Z
	[1]+  Stopped                 sleep 100

	root@:/usr/local/apache2# fg
	sleep 100
	^C

	root@:/usr/local/apache2#

PiperOrigin-RevId: 215334554
Change-Id: I53cdce39653027908510a5ba8d08c49f9cf24f39
2018-10-01 22:06:56 -07:00
Fabricio Voznika a2ad8fef13 Make multi-container the default mode for runsc
And remove multicontainer option.

PiperOrigin-RevId: 215236981
Change-Id: I9fd1d963d987e421e63d5817f91a25c819ced6cb
2018-10-01 10:31:17 -07:00
Fabricio Voznika 43e6aff50e Don't fail if Root is readonly and is not a mount point
This makes runsc more friendly to run without docker or K8s.

PiperOrigin-RevId: 215165586
Change-Id: Id45a9fc24a3c09b1645f60dbaf70e64711a7a4cd
2018-09-30 23:23:03 -07:00
Fabricio Voznika 9c7eb13079 Removed duplicate/stale TODOs
PiperOrigin-RevId: 215162121
Change-Id: I35f06ac3235cf31c9e8a158dcf6261a7ded6c4c4
2018-09-30 22:22:18 -07:00
Fabricio Voznika 50c283b9f5 Add test for 'signall --all' with stopped container
PiperOrigin-RevId: 215025517
Change-Id: I04b9d8022b3d9dfe279e466ddb91310b9860b9af
2018-09-28 18:16:10 -07:00
Lantao Liu f21dde5666 runsc: allow `kill --all` when container is in stopped state.
PiperOrigin-RevId: 215009105
Change-Id: I1ab12eddf7694c4db98f6dafca9dae352a33f7c4
2018-09-28 15:53:25 -07:00
Fabricio Voznika 2496d9b4b6 Make runsc kill and delete more conformant to the "spec"
PiperOrigin-RevId: 214976251
Change-Id: I631348c3886f41f63d0e77e7c4f21b3ede2ab521
2018-09-28 12:22:21 -07:00
Fabricio Voznika 1166c088fc Move common test code to function
PiperOrigin-RevId: 214890335
Change-Id: I42743f0ce46a5a42834133bce2f32d187194fc87
2018-09-27 22:53:18 -07:00
Fabricio Voznika 491faac03b Implement 'runsc kill --all'
In order to implement kill --all correctly, the Sentry needs
to track all tasks that belong to a given container. This change
introduces ContainerID to the task, that gets inherited by all
children. 'kill --all' then iterates over all tasks comparing the
ContainerID field to find all processes that need to be signalled.

PiperOrigin-RevId: 214841768
Change-Id: I693b2374be8692d88cc441ef13a0ae34abf73ac6
2018-09-27 15:00:58 -07:00
Fabricio Voznika 6910ff3643 Move uds_test_app to common test_app
This was done so it's easier to add more functionality
to this file for other tests.

PiperOrigin-RevId: 214782043
Change-Id: I1f38b9ee1219b3ce7b789044ada8e52bdc1e6279
2018-09-27 08:58:23 -07:00
Ian Gudger 7ce13ebcad Run gofmt -s on everything
PiperOrigin-RevId: 214040901
Change-Id: I74d79497a053da3624921ad2b7c5193ca4a87942
2018-09-21 14:06:59 -07:00
Nicolas Lacasse d260e808f4 The "action" in container.Signal should be "signal".
PiperOrigin-RevId: 214038776
Change-Id: I4ad212540ec4ef4fb5ab5fdcb7f0865c4f746895
2018-09-21 13:54:35 -07:00
Nicolas Lacasse b4321f4447 runsc: Synchronize container metadata changes with a file lock.
Each container has associated metadata (particularly the container status) that
is manipulated by various runsc commands. This metadata is stored in a file
identified by the container id.

Different runsc processes may manipulate the same container metadata, and each
will read/write to the metadata file.

This CL adds a file lock per container which must be held when reading the
container metadata file, and when modifying and writing the container metadata.

PiperOrigin-RevId: 214019179
Change-Id: Ice4390ad233bc7f216c9a9a6cf05fb456c9ec0ad
2018-09-21 11:42:06 -07:00
Fabricio Voznika b63c4bfe02 Set Sandbox.Chroot so it gets cleaned up upon destruction
I've made several attempts to create a test, but the lack of
permission from the test user makes it nearly impossible to
test anything useful.

PiperOrigin-RevId: 213922174
Change-Id: I5b502ca70cb7a6645f8836f028fb203354b4c625
2018-09-20 18:54:09 -07:00
Lantao Liu 8a938a3f9d runsc: allow `runsc wait` on a container for multiple times.
PiperOrigin-RevId: 213908919
Change-Id: I74eff99a5360bb03511b946f4cb5658bb5fc40c7
2018-09-20 16:59:42 -07:00
Lantao Liu 9464b82a06 runsc: Fix a bug that `runsc wait` doesn't work after container exits.
PiperOrigin-RevId: 213849165
Change-Id: I5120b2f568850c0c42a08e8706e7f8653ef1bd94
2018-09-20 11:23:26 -07:00
Nicolas Lacasse 915d76aa92 Add container.Destroy urpc method.
This method will:
1. Stop the container process if it is still running.
2. Unmount all sanadbox-internal mounts for the container.
3. Delete the contaner root directory inside the sandbox.

Destroy is idempotent, and safe to call concurrantly.

This fixes a bug where after stopping a container, we cannot unmount the
container root directory on the host. This bug occured because the sandbox
dirent cache was holding a dirent with a host fd corresponding to a file inside
the container root on the host. The dirent cache did not know that the
container had exited, and kept the FD open, preventing us from unmounting on
the host.

Now that we unmount (and flush) all container mounts inside the sandbox, any
host FDs donated by the gofer will be closed, and we can unmount the container
root on the host.

PiperOrigin-RevId: 213737693
Change-Id: I28c0ff4cd19a08014cdd72fec5154497e92aacc9
2018-09-19 18:54:14 -07:00
Kevin Krakauer 639226c3d9 runsc: Mark container_test flaky.
PiperOrigin-RevId: 213732520
Change-Id: Ife292987ec8b1de4c2e7e3b7d4452b00c1582e91
2018-09-19 18:03:35 -07:00
Fabricio Voznika 8aec7473a1 Added state machine checks for Container.Status
For my own sanitity when thinking about possible transitions and state.

PiperOrigin-RevId: 213559482
Change-Id: I25588c86cf6098be4eda01f4e7321c102ceef33c
2018-09-18 19:12:54 -07:00
Fabricio Voznika 7967d8ecd5 Handle children processes better in tests
Reap children more systematically in container tests. Previously,
container_test was taking ~5 mins to run because constainer.Destroy()
would timeout waiting for the sandbox process to exit. Now the test
running in less than a minute.

Also made the contract around Container and Sandbox destroy clearer.

PiperOrigin-RevId: 213527471
Change-Id: Icca84ee1212bbdcb62bdfc9cc7b71b12c6d1688d
2018-09-18 15:21:28 -07:00
Fabricio Voznika 26b08e182c Rename container in test
's' used to stand for sandbox, before container exited.

PiperOrigin-RevId: 213390641
Change-Id: I7bda94a50398c46721baa92227e32a7a1d817412
2018-09-17 21:18:27 -07:00
Kevin Krakauer bb88c187c5 runsc: Enable waiting on exited processes.
This makes `runsc wait` behave more like waitpid()/wait4() in that:
- Once a process has run to completion, you can wait on it and get its exit
  code.
- Processes not waited on will consume memory (like a zombie process)

PiperOrigin-RevId: 213358916
Change-Id: I5b5eca41ce71eea68e447380df8c38361a4d1558
2018-09-17 16:25:24 -07:00
Lantao Liu bde2a91433 runsc: Support container signal/wait.
This CL:
1) Fix `runsc wait`, it now also works after the container exits;
2) Generate correct container state in Load;
2) Make sure `Destory` cleanup everything before successfully return.

PiperOrigin-RevId: 212900107
Change-Id: Ie129cbb9d74f8151a18364f1fc0b2603eac4109a
2018-09-13 16:38:03 -07:00
Kevin Krakauer 2eff1fdd06 runsc: Add exec flag that specifies where to save the sandbox-internal pid.
This is different from the existing -pid-file flag, which saves a host pid.

PiperOrigin-RevId: 212713968
Change-Id: I2c486de8dd5cfd9b923fb0970165ef7c5fc597f0
2018-09-12 15:23:35 -07:00
Fabricio Voznika bc81f3fe4a Remove '--file-access=direct' option
It was used before gofer was implemented and it's not
supported anymore.
BREAKING CHANGE: proxy-shared and proxy-exclusive options
are now: shared and exclusive.

PiperOrigin-RevId: 212017643
Change-Id: If029d4073fe60583e5ca25f98abb2953de0d78fd
2018-09-07 12:28:48 -07:00
Nicolas Lacasse 210c252089 runsc: Run sandbox process inside minimal chroot.
We construct a dir with the executable bind-mounted at /exe, and proc mounted
at /proc.  Runsc now executes the sandbox process inside this chroot, thus
limiting access to the host filesystem.  The mounts and chroot dir are removed
when the sandbox is destroyed.

Because this requires bind-mounts, we can only do the chroot if we have
CAP_SYS_ADMIN.

PiperOrigin-RevId: 211994001
Change-Id: Ia71c515e26085e0b69b833e71691830148bc70d1
2018-09-07 10:16:39 -07:00
Kevin Krakauer d95663a6b9 runsc testing: Move TestMultiContainerSignal to multi_container_test.
PiperOrigin-RevId: 211831396
Change-Id: Id67f182cb43dccb696180ec967f5b96176f252e0
2018-09-06 10:41:55 -07:00
Kevin Krakauer 8f0b6e7fc0 runsc: Support runsc kill multi-container.
Now, we can kill individual containers rather than the entire sandbox.

PiperOrigin-RevId: 211748106
Change-Id: Ic97e91db33d53782f838338c4a6d0aab7a313ead
2018-09-05 21:14:56 -07:00
Fabricio Voznika 12aef686af Enabled bind mounts in sub-containers
With multi-gofers, bind mounts in sub-containers should
just work. Removed restrictions and added test. There are
also a few cleanups along the way, e.g. retry unmounting
in case cleanup races with gofer teardown.

PiperOrigin-RevId: 211699569
Change-Id: Ic0a69c29d7c31cd7e038909cc686c6ac98703374
2018-09-05 14:30:09 -07:00
Fabricio Voznika 0c7cfca0da Running container should have a valid sandbox
PiperOrigin-RevId: 211693868
Change-Id: Iea340dd78bf26ae6409c310b63c17cc611c2055f
2018-09-05 14:02:45 -07:00
Fabricio Voznika 1d22d87fdc Move multi-container test to a single file
PiperOrigin-RevId: 211685288
Change-Id: I7872f2a83fcaaa54f385e6e567af6e72320c5aa0
2018-09-05 13:13:26 -07:00
Nicolas Lacasse f96b33c73c runsc: Promote getExecutablePathInternal to getExecutablePath.
Remove GetExecutablePath (the non-internal version).  This makes path handling
more consistent between exec, root, and child containers.

The new getExecutablePath now uses MountNamespace.FindInode, which is more
robust than Walking the Dirent tree ourselves.

This also removes the last use of lstat(2) in the sentry, so that can be
removed from the filters.

PiperOrigin-RevId: 211683110
Change-Id: Ic8ec960fc1c267aa7d310b8efe6e900c88a9207a
2018-09-05 13:01:21 -07:00
Lantao Liu 9ae4e28f75 runsc: fix container rootfs path.
PiperOrigin-RevId: 211515350
Change-Id: Ia495af57447c799909aa97bb873a50b87bee2625
2018-09-04 13:37:40 -07:00
Fabricio Voznika 66c03b3dd7 Mounting over '/tmp' may fail
PiperOrigin-RevId: 211160120
Change-Id: Ie5f280bdac17afd01cb16562ffff6222b3184c34
2018-08-31 16:12:08 -07:00
Lantao Liu be9f454eb6 runsc: Set volume mount rslave.
PiperOrigin-RevId: 211111376
Change-Id: I27b8cb4e070d476fa4781ed6ecfa0cf1dcaf85f5
2018-08-31 11:03:22 -07:00
Lantao Liu d8f0db9bcf runsc: unmount volume mounts when destroy container.
PiperOrigin-RevId: 210579178
Change-Id: Iae20639c5186b1a976cbff6d05bda134cd00d0da
2018-08-28 11:54:07 -07:00
Kevin Krakauer a4529c1b5b runsc: Fix readonly filesystem causing failure to create containers.
For readonly filesystems specified via relative path, we were forgetting to
mount relative to the container's bundle directory.

PiperOrigin-RevId: 210483388
Change-Id: I84809fce4b1f2056d0e225547cb611add5f74177
2018-08-27 20:34:27 -07:00
Nicolas Lacasse 0b3bfe2ea3 fs: Fix remote-revalidate cache policy.
When revalidating a Dirent, if the inode id is the same, then we don't need to
throw away the entire Dirent. We can just update the unstable attributes in
place.

If the inode id has changed, then the remote file has been deleted or moved,
and we have no choice but to throw away the dirent we have a look up another.
In this case, we may still end up losing a mounted dirent that is a child of
the revalidated dirent. However, that seems appropriate here because the entire
mount point has been pulled out from underneath us.

Because gVisor's overlay is at the Inode level rather than the Dirent level, we
must pass the parent Inode and name along with the Inode that is being
revalidated.

PiperOrigin-RevId: 210431270
Change-Id: I705caef9c68900234972d5aac4ae3a78c61c7d42
2018-08-27 14:26:29 -07:00
Fabricio Voznika db81c0b02f Put fsgofer inside chroot
Now each container gets its own dedicated gofer that is chroot'd to the
rootfs path. This is done to add an extra layer of security in case the
gofer gets compromised.

PiperOrigin-RevId: 210396476
Change-Id: Iba21360a59dfe90875d61000db103f8609157ca0
2018-08-27 11:10:14 -07:00
Kevin Krakauer a78df1d874 runsc: De-flakes container_test TestMultiContainerSanity.
The bug was caused by os.File's finalizer, which closes the file. Because
fsgofer.serve() was passed a file descriptor as an int rather than a os.File,
callers would pass os.File.Fd(), and the os.File would go out of scope. Thus,
the file would get GC'd and finalized nondeterministically, causing failures
when the file was used.

PiperOrigin-RevId: 209861834
Change-Id: Idf24d5c1f04c9b28659e62c97202ab3b4d72e994
2018-08-22 17:55:15 -07:00
Fabricio Voznika e2ab7ec39e Fix TestUnixDomainSockets failure when path is too large
UDS has a lower size limit than regular files. When running under bazel
this limit is exceeded. Test was changed to always mount /tmp and use
it for the test.

PiperOrigin-RevId: 209717830
Change-Id: I1dbe19fe2051ffdddbaa32b188a9167f446ed193
2018-08-21 23:07:39 -07:00
Kevin Krakauer ae68e9e751 Temporarily skip multi-container tests in container_test until deflaked.
PiperOrigin-RevId: 209679235
Change-Id: I527e779eeb113d0c162f5e27a2841b9486f0e39f
2018-08-21 16:21:05 -07:00
Fabricio Voznika 19ef2ad1fe nonExclusiveFS is causing timeout with --race
Not sure why, just removed for now to unblock the tests.

PiperOrigin-RevId: 209661403
Change-Id: I72785c071687d54e22bda9073d36b447d52a7018
2018-08-21 14:35:08 -07:00
Fabricio Voznika a854678bc3 Move container_test to the container package
PiperOrigin-RevId: 209655274
Change-Id: Id381114bdb3197c73e14f74b3f6cf1afd87d60cb
2018-08-21 14:02:19 -07:00
Fabricio Voznika d6d165cb0b Initial change for multi-gofer support
PiperOrigin-RevId: 209647293
Change-Id: I980fca1257ea3fcce796388a049c353b0303a8a5
2018-08-21 13:14:43 -07:00
Fabricio Voznika 0fc7b30695 Standardize mounts in tests
Tests get a readonly rootfs mapped to / (which was the case before)
and writable TEST_TMPDIR. This makes it easier to setup containers to
write to files and to share state between test and containers.

PiperOrigin-RevId: 209453224
Change-Id: I4d988e45dc0909a0450a3bb882fe280cf9c24334
2018-08-20 11:26:39 -07:00
Fabricio Voznika 11800311a5 Add nonExclusiveFS dimension to more tests
The ones using 'kvm' actually mean that they don't want overlay.

PiperOrigin-RevId: 209194318
Change-Id: I941a443cb6d783e2c80cf66eb8d8630bcacdb574
2018-08-17 13:07:09 -07:00
Fabricio Voznika da087e66cc Combine functions to search for file under one common function
Bazel adds the build type in front of directories making it hard to
refer to binaries in code.

PiperOrigin-RevId: 209010854
Change-Id: I6c9da1ac3bbe79766868a3b14222dd42d03b4ec5
2018-08-16 10:55:45 -07:00
Kevin Krakauer 635b0c4593 runsc fsgofer: Support dynamic serving of filesystems.
When multiple containers run inside a sentry, each container has its own root
filesystem and set of mounts. Containers are also added after sentry boot rather
than all configured and known at boot time.

The fsgofer needs to be able to serve the root filesystem of each container.
Thus, it must be possible to add filesystems after the fsgofer has already
started.

This change:
* Creates a URPC endpoint within the gofer process that listens for requests to
  serve new content.
* Enables the sentry, when starting a new container, to add the new container's
  filesystem.
* Mounts those new filesystems at separate roots within the sentry.

PiperOrigin-RevId: 208903248
Change-Id: Ifa91ec9c8caf5f2f0a9eead83c4a57090ce92068
2018-08-15 16:25:22 -07:00
Nicolas Lacasse e8a4f2e133 runsc: Change cache policy for root fs and volume mounts.
Previously, gofer filesystems were configured with the default "fscache"
policy, which caches filesystem metadata and contents aggressively.  While this
setting is best for performance, it means that changes from inside the sandbox
may not be immediately propagated outside the sandbox, and vice-versa.

This CL changes volumes and the root fs configuration to use a new
"remote-revalidate" cache policy which tries to retain as much caching as
possible while still making fs changes visible across the sandbox boundary.

This cache policy is enabled by default for the root filesystem. The default
value for the "--file-access" flag is still "proxy", but the behavior is
changed to use the new cache policy.

A new value for the "--file-access" flag is added, called "proxy-exclusive",
which turns on the previous aggressive caching behavior. As the name implies,
this flag should be used when the sandbox has "exclusive" access to the
filesystem.

All volume mounts are configured to use the new cache policy, since it is
safest and most likely to be correct. There is not currently a way to change
this behavior, but it's possible to add such a mechanism in the future. The
configurability is a smaller issue for volumes, since most of the expensive
application fs operations (walking + stating files) will likely served by the
root fs.

PiperOrigin-RevId: 208735037
Change-Id: Ife048fab1948205f6665df8563434dbc6ca8cfc9
2018-08-14 16:25:58 -07:00
Brielle Broder 4ececd8e8d Enable checkpoint/restore in cases of UDS use.
Previously, processes which used file-system Unix Domain Sockets could not be
checkpoint-ed in runsc because the sockets were saved with their inode
numbers which do not necessarily remain the same upon restore. Now,
the sockets are also saved with their paths so that the new inodes
can be determined for the sockets based on these paths after restoring.
Tests for cases with UDS use are included. Test cleanup to come.

PiperOrigin-RevId: 208268781
Change-Id: Ieaa5d5d9a64914ca105cae199fd8492710b1d7ec
2018-08-10 14:33:20 -07:00
Fabricio Voznika 9752174a7f Disable KVM dimension because it's making the test flaky
PiperOrigin-RevId: 207642348
Change-Id: Iacec9f097ab93b91c0c8eea61b1347e864f57a8b
2018-08-06 18:08:25 -07:00
Fabricio Voznika b8f96a9d0b Replace sleeps with waits in tests - part II
PiperOrigin-RevId: 206333130
Change-Id: Ic85874dbd53c5de2164a7bb75769d52d43666c2a
2018-07-27 10:10:10 -07:00
Fabricio Voznika e5adf42f66 Replace sleeps with waits in tests - part I
PiperOrigin-RevId: 206084473
Change-Id: I44e1b64b9cdd2964357799dca27cc0cbc19ce07d
2018-07-25 17:37:53 -07:00
Fabricio Voznika d7a34790a0 Add KVM and overlay dimensions to container_test
PiperOrigin-RevId: 205714667
Change-Id: I317a2ca98ac3bdad97c4790fcc61b004757d99ef
2018-07-23 13:31:42 -07:00
Justine Olshan c05660373e Moved restore code out of create and made to be called after create.
Docker expects containers to be created before they are restored.
However, gVisor restoring requires specificactions regarding the kernel
and the file system. These actions were originally in booting the sandbox.

Now setting up the file system is deferred until a call to a call to
runsc start. In the restore case, the kernel is destroyed and a new kernel
is created in the same process, as we need the same process for Docker.

These changes required careful execution of concurrent processes which
required the use of a channel.

Full docker integration still needs the ability to restore into the same
container.

PiperOrigin-RevId: 205161441
Change-Id: Ie1d2304ead7e06855319d5dc310678f701bd099f
2018-07-18 16:58:30 -07:00
Nicolas Lacasse e5d8f99c60 runsc: Fixes to CheckpointRestoreTest.
We must delete the output file at the beginning of the test, otherwise the test
fails immediately.

Also some minor cleanups in readOutputFile.

PiperOrigin-RevId: 205150525
Change-Id: I6bae1acd5b315320a2c6e25a59afcfc06267fb17
2018-07-18 15:46:37 -07:00
Fabricio Voznika 52ddb8571c Skip overlay on root when its readonly
PiperOrigin-RevId: 203161098
Change-Id: Ia1904420cb3ee830899d24a4fe418bba6533be64
2018-07-03 12:01:09 -07:00
Lantao Liu 126296ce2a runsc: fix panic for `runsc wait` on stopped container.
PiperOrigin-RevId: 203016694
Change-Id: Ic51ef754aa6d7d1b3b35491aff96a63d7992e122
2018-07-02 14:52:21 -07:00
Brielle Broder ca353b53ed Fix typo.
PiperOrigin-RevId: 202720658
Change-Id: Iff42fd23f831ee7f29ddd6eb867020b76ed1eb23
2018-06-29 15:51:32 -07:00
Justine Olshan 80bdf8a406 Sets the restore environment for restoring a container.
Updated how restoring occurs through boot.go with a separate Restore function.
This prevents a new process and new mounts from being created.
Added tests to ensure the container is restored.
Registered checkpoint and restore commands so they can be used.
Docker support for these commands is still limited.
Working on #80.

PiperOrigin-RevId: 202710950
Change-Id: I2b893ceaef6b9442b1ce3743bd112383cb92af0c
2018-06-29 14:47:40 -07:00
Brielle Broder 25e315c2e1 Added leave-running flag for checkpoint.
The leave-running flag allows the container to continue running after a
checkpoint has occurred by doing an immediate restore into a new
container with the same container ID after the old container is destroyed.

Updates #80.

PiperOrigin-RevId: 202695426
Change-Id: Iac50437f5afda018dc18b24bb8ddb935983cf336
2018-06-29 13:09:33 -07:00
Kevin Krakauer 16d37973eb runsc: Add the "wait" subcommand.
Users can now call "runsc wait <container id>" to wait on a particular process
inside the container. -pid can also be used to wait on a specific PID.

Manually tested the wait subcommand for a single waiter and multiple waiters
(simultaneously 2 processes waiting on the container and 2 processes waiting on
a PID within the container).

PiperOrigin-RevId: 202548978
Change-Id: Idd507c2cdea613c3a14879b51cfb0f7ea3fb3d4c
2018-06-28 14:56:36 -07:00
Fabricio Voznika bb31a11903 Wait for sandbox process when waiting for root container
Closes #71

PiperOrigin-RevId: 202532762
Change-Id: I80a446ff638672ff08e6fd853cd77e28dd05d540
2018-06-28 13:23:04 -07:00
Fabricio Voznika 8459390cdd Error out if spec is invalid
Closes #66

PiperOrigin-RevId: 202496258
Change-Id: Ib9287c5bf1279ffba1db21ebd9e6b59305cddf34
2018-06-28 09:57:27 -07:00
Lantao Liu e8ae2b85e9 runsc: add a `multi-container` flag to enable multi-container support.
PiperOrigin-RevId: 201995800
Change-Id: I770190d135e14ec7da4b3155009fe10121b2a502
2018-06-25 12:08:44 -07:00
Kevin Krakauer 04bdcc7b65 runsc: Enable waiting on individual containers within a sandbox.
PiperOrigin-RevId: 201742160
Change-Id: Ia9fa1442287c5f9e1196fb117c41536a80f6bb31
2018-06-22 14:31:25 -07:00
Brielle Broder 7d6149063a Restore implementation added to runsc.
Restore creates a new container and uses the given image-path to load a saved
image of a previous container. Restore command is plumbed through container
and sandbox. This command does not work yet - more to come.

PiperOrigin-RevId: 201541229
Change-Id: I864a14c799ce3717d99bcdaaebc764281863d06f
2018-06-21 09:58:24 -07:00
Fabricio Voznika 95cb01e0a9 Reduce test sleep time
PiperOrigin-RevId: 201428433
Change-Id: I72de1e46788ec84f61513416bb690956e515907e
2018-06-20 15:32:15 -07:00
Kevin Krakauer 5397963b5d runsc: Enable container creation within existing sandboxes.
Containers are created as processes in the sandbox. Of the many things that
don't work yet, the biggest issue is that the fsgofer is launched with its root
as the sandbox's root directory. Thus, when a container is started and wants to
read anything (including the init binary of the container), the gofer tries to
serve from sandbox's root (which basically just has pause), not the container's.

PiperOrigin-RevId: 201294560
Change-Id: I6423aa8830538959c56ae908ce067e4199d627b1
2018-06-19 21:44:33 -07:00