Commit Graph

87 Commits

Author SHA1 Message Date
Nicolas Lacasse 7a6412cb0b runsc: Allow state transition from Creating to Stopped.
This can happen if an error is encountered during Create() which causes the
container to be destroyed and set to state Stopped.

Without this transition, errors during Create get hidden by the later panic.

PiperOrigin-RevId: 215599193
Change-Id: Icd3f42e12c685cbf042f46b3929bccdf30ad55b0
2018-10-03 11:49:40 -07:00
Nicolas Lacasse 37e57a903c Fix arithmetic error in multi_container_test.
We add an additional (2^3)-1=7 processes, but the code was only waiting for 3.

I switched back to Math.Pow format to make the arithmetic easier to inspect.

PiperOrigin-RevId: 215588140
Change-Id: Iccad4d6f977c1bfc5c4b08d3493afe553fe25733
2018-10-03 10:47:52 -07:00
Nicolas Lacasse f1c01ed886 runsc: Support job control signals in "exec -it".
Terminal support in runsc relies on host tty file descriptors that are imported
into the sandbox. Application tty ioctls are sent directly to the host fd.

However, those host tty ioctls are associated in the host kernel with a host
process (in this case runsc), and the host kernel intercepts job control
characters like ^C and send signals to the host process. Thus, typing ^C into a
"runsc exec" shell will send a SIGINT to the runsc process.

This change makes "runsc exec" handle all signals, and forward them into the
sandbox via the "ContainerSignal" urpc method. Since the "runsc exec" is
associated with a particular container process in the sandbox, the signal must
be associated with the same container process.

One big difficulty is that the signal should not necessarily be sent to the
sandbox process started by "exec", but instead must be sent to the foreground
process group for the tty. For example, we may exec "bash", and from bash call
"sleep 100". A ^C at this point should SIGINT sleep, not bash.

To handle this, tty files inside the sandbox must keep track of their
foreground process group, which is set/get via ioctls. When an incoming
ContainerSignal urpc comes in, we look up the foreground process group via the
tty file. Unfortunately, this means we have to expose and cache the tty file in
the Loader.

Note that "runsc exec" now handles signals properly, but "runs run" does not.
That will come in a later CL, as this one is complex enough already.

Example:
	root@:/usr/local/apache2# sleep 100
	^C

	root@:/usr/local/apache2# sleep 100
	^Z
	[1]+  Stopped                 sleep 100

	root@:/usr/local/apache2# fg
	sleep 100
	^C

	root@:/usr/local/apache2#

PiperOrigin-RevId: 215334554
Change-Id: I53cdce39653027908510a5ba8d08c49f9cf24f39
2018-10-01 22:06:56 -07:00
Fabricio Voznika a2ad8fef13 Make multi-container the default mode for runsc
And remove multicontainer option.

PiperOrigin-RevId: 215236981
Change-Id: I9fd1d963d987e421e63d5817f91a25c819ced6cb
2018-10-01 10:31:17 -07:00
Fabricio Voznika 43e6aff50e Don't fail if Root is readonly and is not a mount point
This makes runsc more friendly to run without docker or K8s.

PiperOrigin-RevId: 215165586
Change-Id: Id45a9fc24a3c09b1645f60dbaf70e64711a7a4cd
2018-09-30 23:23:03 -07:00
Fabricio Voznika 9c7eb13079 Removed duplicate/stale TODOs
PiperOrigin-RevId: 215162121
Change-Id: I35f06ac3235cf31c9e8a158dcf6261a7ded6c4c4
2018-09-30 22:22:18 -07:00
Fabricio Voznika 50c283b9f5 Add test for 'signall --all' with stopped container
PiperOrigin-RevId: 215025517
Change-Id: I04b9d8022b3d9dfe279e466ddb91310b9860b9af
2018-09-28 18:16:10 -07:00
Lantao Liu f21dde5666 runsc: allow `kill --all` when container is in stopped state.
PiperOrigin-RevId: 215009105
Change-Id: I1ab12eddf7694c4db98f6dafca9dae352a33f7c4
2018-09-28 15:53:25 -07:00
Fabricio Voznika 2496d9b4b6 Make runsc kill and delete more conformant to the "spec"
PiperOrigin-RevId: 214976251
Change-Id: I631348c3886f41f63d0e77e7c4f21b3ede2ab521
2018-09-28 12:22:21 -07:00
Fabricio Voznika 1166c088fc Move common test code to function
PiperOrigin-RevId: 214890335
Change-Id: I42743f0ce46a5a42834133bce2f32d187194fc87
2018-09-27 22:53:18 -07:00
Fabricio Voznika 491faac03b Implement 'runsc kill --all'
In order to implement kill --all correctly, the Sentry needs
to track all tasks that belong to a given container. This change
introduces ContainerID to the task, that gets inherited by all
children. 'kill --all' then iterates over all tasks comparing the
ContainerID field to find all processes that need to be signalled.

PiperOrigin-RevId: 214841768
Change-Id: I693b2374be8692d88cc441ef13a0ae34abf73ac6
2018-09-27 15:00:58 -07:00
Fabricio Voznika 6910ff3643 Move uds_test_app to common test_app
This was done so it's easier to add more functionality
to this file for other tests.

PiperOrigin-RevId: 214782043
Change-Id: I1f38b9ee1219b3ce7b789044ada8e52bdc1e6279
2018-09-27 08:58:23 -07:00
Ian Gudger 7ce13ebcad Run gofmt -s on everything
PiperOrigin-RevId: 214040901
Change-Id: I74d79497a053da3624921ad2b7c5193ca4a87942
2018-09-21 14:06:59 -07:00
Nicolas Lacasse d260e808f4 The "action" in container.Signal should be "signal".
PiperOrigin-RevId: 214038776
Change-Id: I4ad212540ec4ef4fb5ab5fdcb7f0865c4f746895
2018-09-21 13:54:35 -07:00
Nicolas Lacasse b4321f4447 runsc: Synchronize container metadata changes with a file lock.
Each container has associated metadata (particularly the container status) that
is manipulated by various runsc commands. This metadata is stored in a file
identified by the container id.

Different runsc processes may manipulate the same container metadata, and each
will read/write to the metadata file.

This CL adds a file lock per container which must be held when reading the
container metadata file, and when modifying and writing the container metadata.

PiperOrigin-RevId: 214019179
Change-Id: Ice4390ad233bc7f216c9a9a6cf05fb456c9ec0ad
2018-09-21 11:42:06 -07:00
Fabricio Voznika b63c4bfe02 Set Sandbox.Chroot so it gets cleaned up upon destruction
I've made several attempts to create a test, but the lack of
permission from the test user makes it nearly impossible to
test anything useful.

PiperOrigin-RevId: 213922174
Change-Id: I5b502ca70cb7a6645f8836f028fb203354b4c625
2018-09-20 18:54:09 -07:00
Lantao Liu 8a938a3f9d runsc: allow `runsc wait` on a container for multiple times.
PiperOrigin-RevId: 213908919
Change-Id: I74eff99a5360bb03511b946f4cb5658bb5fc40c7
2018-09-20 16:59:42 -07:00
Lantao Liu 9464b82a06 runsc: Fix a bug that `runsc wait` doesn't work after container exits.
PiperOrigin-RevId: 213849165
Change-Id: I5120b2f568850c0c42a08e8706e7f8653ef1bd94
2018-09-20 11:23:26 -07:00
Nicolas Lacasse 915d76aa92 Add container.Destroy urpc method.
This method will:
1. Stop the container process if it is still running.
2. Unmount all sanadbox-internal mounts for the container.
3. Delete the contaner root directory inside the sandbox.

Destroy is idempotent, and safe to call concurrantly.

This fixes a bug where after stopping a container, we cannot unmount the
container root directory on the host. This bug occured because the sandbox
dirent cache was holding a dirent with a host fd corresponding to a file inside
the container root on the host. The dirent cache did not know that the
container had exited, and kept the FD open, preventing us from unmounting on
the host.

Now that we unmount (and flush) all container mounts inside the sandbox, any
host FDs donated by the gofer will be closed, and we can unmount the container
root on the host.

PiperOrigin-RevId: 213737693
Change-Id: I28c0ff4cd19a08014cdd72fec5154497e92aacc9
2018-09-19 18:54:14 -07:00
Kevin Krakauer 639226c3d9 runsc: Mark container_test flaky.
PiperOrigin-RevId: 213732520
Change-Id: Ife292987ec8b1de4c2e7e3b7d4452b00c1582e91
2018-09-19 18:03:35 -07:00
Fabricio Voznika 8aec7473a1 Added state machine checks for Container.Status
For my own sanitity when thinking about possible transitions and state.

PiperOrigin-RevId: 213559482
Change-Id: I25588c86cf6098be4eda01f4e7321c102ceef33c
2018-09-18 19:12:54 -07:00
Fabricio Voznika 7967d8ecd5 Handle children processes better in tests
Reap children more systematically in container tests. Previously,
container_test was taking ~5 mins to run because constainer.Destroy()
would timeout waiting for the sandbox process to exit. Now the test
running in less than a minute.

Also made the contract around Container and Sandbox destroy clearer.

PiperOrigin-RevId: 213527471
Change-Id: Icca84ee1212bbdcb62bdfc9cc7b71b12c6d1688d
2018-09-18 15:21:28 -07:00
Fabricio Voznika 26b08e182c Rename container in test
's' used to stand for sandbox, before container exited.

PiperOrigin-RevId: 213390641
Change-Id: I7bda94a50398c46721baa92227e32a7a1d817412
2018-09-17 21:18:27 -07:00
Kevin Krakauer bb88c187c5 runsc: Enable waiting on exited processes.
This makes `runsc wait` behave more like waitpid()/wait4() in that:
- Once a process has run to completion, you can wait on it and get its exit
  code.
- Processes not waited on will consume memory (like a zombie process)

PiperOrigin-RevId: 213358916
Change-Id: I5b5eca41ce71eea68e447380df8c38361a4d1558
2018-09-17 16:25:24 -07:00
Lantao Liu bde2a91433 runsc: Support container signal/wait.
This CL:
1) Fix `runsc wait`, it now also works after the container exits;
2) Generate correct container state in Load;
2) Make sure `Destory` cleanup everything before successfully return.

PiperOrigin-RevId: 212900107
Change-Id: Ie129cbb9d74f8151a18364f1fc0b2603eac4109a
2018-09-13 16:38:03 -07:00
Kevin Krakauer 2eff1fdd06 runsc: Add exec flag that specifies where to save the sandbox-internal pid.
This is different from the existing -pid-file flag, which saves a host pid.

PiperOrigin-RevId: 212713968
Change-Id: I2c486de8dd5cfd9b923fb0970165ef7c5fc597f0
2018-09-12 15:23:35 -07:00
Fabricio Voznika bc81f3fe4a Remove '--file-access=direct' option
It was used before gofer was implemented and it's not
supported anymore.
BREAKING CHANGE: proxy-shared and proxy-exclusive options
are now: shared and exclusive.

PiperOrigin-RevId: 212017643
Change-Id: If029d4073fe60583e5ca25f98abb2953de0d78fd
2018-09-07 12:28:48 -07:00
Nicolas Lacasse 210c252089 runsc: Run sandbox process inside minimal chroot.
We construct a dir with the executable bind-mounted at /exe, and proc mounted
at /proc.  Runsc now executes the sandbox process inside this chroot, thus
limiting access to the host filesystem.  The mounts and chroot dir are removed
when the sandbox is destroyed.

Because this requires bind-mounts, we can only do the chroot if we have
CAP_SYS_ADMIN.

PiperOrigin-RevId: 211994001
Change-Id: Ia71c515e26085e0b69b833e71691830148bc70d1
2018-09-07 10:16:39 -07:00
Kevin Krakauer d95663a6b9 runsc testing: Move TestMultiContainerSignal to multi_container_test.
PiperOrigin-RevId: 211831396
Change-Id: Id67f182cb43dccb696180ec967f5b96176f252e0
2018-09-06 10:41:55 -07:00
Kevin Krakauer 8f0b6e7fc0 runsc: Support runsc kill multi-container.
Now, we can kill individual containers rather than the entire sandbox.

PiperOrigin-RevId: 211748106
Change-Id: Ic97e91db33d53782f838338c4a6d0aab7a313ead
2018-09-05 21:14:56 -07:00
Fabricio Voznika 12aef686af Enabled bind mounts in sub-containers
With multi-gofers, bind mounts in sub-containers should
just work. Removed restrictions and added test. There are
also a few cleanups along the way, e.g. retry unmounting
in case cleanup races with gofer teardown.

PiperOrigin-RevId: 211699569
Change-Id: Ic0a69c29d7c31cd7e038909cc686c6ac98703374
2018-09-05 14:30:09 -07:00
Fabricio Voznika 0c7cfca0da Running container should have a valid sandbox
PiperOrigin-RevId: 211693868
Change-Id: Iea340dd78bf26ae6409c310b63c17cc611c2055f
2018-09-05 14:02:45 -07:00
Fabricio Voznika 1d22d87fdc Move multi-container test to a single file
PiperOrigin-RevId: 211685288
Change-Id: I7872f2a83fcaaa54f385e6e567af6e72320c5aa0
2018-09-05 13:13:26 -07:00
Nicolas Lacasse f96b33c73c runsc: Promote getExecutablePathInternal to getExecutablePath.
Remove GetExecutablePath (the non-internal version).  This makes path handling
more consistent between exec, root, and child containers.

The new getExecutablePath now uses MountNamespace.FindInode, which is more
robust than Walking the Dirent tree ourselves.

This also removes the last use of lstat(2) in the sentry, so that can be
removed from the filters.

PiperOrigin-RevId: 211683110
Change-Id: Ic8ec960fc1c267aa7d310b8efe6e900c88a9207a
2018-09-05 13:01:21 -07:00
Lantao Liu 9ae4e28f75 runsc: fix container rootfs path.
PiperOrigin-RevId: 211515350
Change-Id: Ia495af57447c799909aa97bb873a50b87bee2625
2018-09-04 13:37:40 -07:00
Fabricio Voznika 66c03b3dd7 Mounting over '/tmp' may fail
PiperOrigin-RevId: 211160120
Change-Id: Ie5f280bdac17afd01cb16562ffff6222b3184c34
2018-08-31 16:12:08 -07:00
Lantao Liu be9f454eb6 runsc: Set volume mount rslave.
PiperOrigin-RevId: 211111376
Change-Id: I27b8cb4e070d476fa4781ed6ecfa0cf1dcaf85f5
2018-08-31 11:03:22 -07:00
Lantao Liu d8f0db9bcf runsc: unmount volume mounts when destroy container.
PiperOrigin-RevId: 210579178
Change-Id: Iae20639c5186b1a976cbff6d05bda134cd00d0da
2018-08-28 11:54:07 -07:00
Kevin Krakauer a4529c1b5b runsc: Fix readonly filesystem causing failure to create containers.
For readonly filesystems specified via relative path, we were forgetting to
mount relative to the container's bundle directory.

PiperOrigin-RevId: 210483388
Change-Id: I84809fce4b1f2056d0e225547cb611add5f74177
2018-08-27 20:34:27 -07:00
Nicolas Lacasse 0b3bfe2ea3 fs: Fix remote-revalidate cache policy.
When revalidating a Dirent, if the inode id is the same, then we don't need to
throw away the entire Dirent. We can just update the unstable attributes in
place.

If the inode id has changed, then the remote file has been deleted or moved,
and we have no choice but to throw away the dirent we have a look up another.
In this case, we may still end up losing a mounted dirent that is a child of
the revalidated dirent. However, that seems appropriate here because the entire
mount point has been pulled out from underneath us.

Because gVisor's overlay is at the Inode level rather than the Dirent level, we
must pass the parent Inode and name along with the Inode that is being
revalidated.

PiperOrigin-RevId: 210431270
Change-Id: I705caef9c68900234972d5aac4ae3a78c61c7d42
2018-08-27 14:26:29 -07:00
Fabricio Voznika db81c0b02f Put fsgofer inside chroot
Now each container gets its own dedicated gofer that is chroot'd to the
rootfs path. This is done to add an extra layer of security in case the
gofer gets compromised.

PiperOrigin-RevId: 210396476
Change-Id: Iba21360a59dfe90875d61000db103f8609157ca0
2018-08-27 11:10:14 -07:00
Kevin Krakauer a78df1d874 runsc: De-flakes container_test TestMultiContainerSanity.
The bug was caused by os.File's finalizer, which closes the file. Because
fsgofer.serve() was passed a file descriptor as an int rather than a os.File,
callers would pass os.File.Fd(), and the os.File would go out of scope. Thus,
the file would get GC'd and finalized nondeterministically, causing failures
when the file was used.

PiperOrigin-RevId: 209861834
Change-Id: Idf24d5c1f04c9b28659e62c97202ab3b4d72e994
2018-08-22 17:55:15 -07:00
Fabricio Voznika e2ab7ec39e Fix TestUnixDomainSockets failure when path is too large
UDS has a lower size limit than regular files. When running under bazel
this limit is exceeded. Test was changed to always mount /tmp and use
it for the test.

PiperOrigin-RevId: 209717830
Change-Id: I1dbe19fe2051ffdddbaa32b188a9167f446ed193
2018-08-21 23:07:39 -07:00
Kevin Krakauer ae68e9e751 Temporarily skip multi-container tests in container_test until deflaked.
PiperOrigin-RevId: 209679235
Change-Id: I527e779eeb113d0c162f5e27a2841b9486f0e39f
2018-08-21 16:21:05 -07:00
Fabricio Voznika 19ef2ad1fe nonExclusiveFS is causing timeout with --race
Not sure why, just removed for now to unblock the tests.

PiperOrigin-RevId: 209661403
Change-Id: I72785c071687d54e22bda9073d36b447d52a7018
2018-08-21 14:35:08 -07:00
Fabricio Voznika a854678bc3 Move container_test to the container package
PiperOrigin-RevId: 209655274
Change-Id: Id381114bdb3197c73e14f74b3f6cf1afd87d60cb
2018-08-21 14:02:19 -07:00
Fabricio Voznika d6d165cb0b Initial change for multi-gofer support
PiperOrigin-RevId: 209647293
Change-Id: I980fca1257ea3fcce796388a049c353b0303a8a5
2018-08-21 13:14:43 -07:00
Fabricio Voznika 0fc7b30695 Standardize mounts in tests
Tests get a readonly rootfs mapped to / (which was the case before)
and writable TEST_TMPDIR. This makes it easier to setup containers to
write to files and to share state between test and containers.

PiperOrigin-RevId: 209453224
Change-Id: I4d988e45dc0909a0450a3bb882fe280cf9c24334
2018-08-20 11:26:39 -07:00
Fabricio Voznika 11800311a5 Add nonExclusiveFS dimension to more tests
The ones using 'kvm' actually mean that they don't want overlay.

PiperOrigin-RevId: 209194318
Change-Id: I941a443cb6d783e2c80cf66eb8d8630bcacdb574
2018-08-17 13:07:09 -07:00
Fabricio Voznika da087e66cc Combine functions to search for file under one common function
Bazel adds the build type in front of directories making it hard to
refer to binaries in code.

PiperOrigin-RevId: 209010854
Change-Id: I6c9da1ac3bbe79766868a3b14222dd42d03b4ec5
2018-08-16 10:55:45 -07:00