Commit Graph

68 Commits

Author SHA1 Message Date
Fabricio Voznika 92cf3764e0 Create working directory if it doesn't yet exist
PiperOrigin-RevId: 229438125
Change-Id: I58eb0d10178d1adfc709d7b859189d1acbcb2f22
2019-01-15 14:13:27 -08:00
Andrei Vagin f8c8f24154 runsc: Collect zombies of sandbox and gofer processes
And we need to wait a gofer process before cgroup.Uninstall,
because it is running in the sandbox cgroups.

PiperOrigin-RevId: 228904020
Change-Id: Iaf8826d5b9626db32d4057a1c505a8d7daaeb8f9
2019-01-11 10:32:26 -08:00
Nicolas Lacasse adf8138e06 Allow sandbox.Wait to be called after the sandbox has exited.
sandbox.Wait is racey, as the sandbox may have exited before it is called, or
even during.

We already had code to handle the case that the sandbox exits during the Wait
call, but we were not properly handling the case where the sandbox has exited
before the call.

The best we can do in such cases is return the sandbox exit code as the
application exit code.

PiperOrigin-RevId: 221702517
Change-Id: I290d0333cc094c7c1c3b4ce0f17f61a3e908d787
2018-11-15 15:35:41 -08:00
Fabricio Voznika d12a0dd6b8 Fix test --race violation
SetupContainerInRoot was setting Config.RootDir unnecessarily
and causing a --race violation in TestMultiContainerDestroyStarting.

PiperOrigin-RevId: 220580073
Change-Id: Ie0b28c19846106c7458a92681b708ae70f87d25a
2018-11-07 21:30:59 -08:00
Fabricio Voznika 86b3f0cd24 Fix race between start and destroy
Before this change, a container starting up could race with
destroy (aka delete) and leave processes behind.

Now, whenever a container is created, Loader.processes gets
a new entry. Start now expects the entry to be there, and if
it's not it means that the container was deleted.

I've also fixed Loader.waitPID to search for the process using
the init process's PID namespace.

We could use a few more tests for signal and wait. I'll send
them in another cl.

PiperOrigin-RevId: 220224290
Change-Id: I15146079f69904dc07d43c3b66cc343a2dab4cc4
2018-11-05 21:29:37 -08:00
Ian Gudger 8fce67af24 Use correct company name in copyright header
PiperOrigin-RevId: 217951017
Change-Id: Ie08bf6987f98467d07457bcf35b5f1ff6e43c035
2018-10-19 16:35:11 -07:00
Fabricio Voznika f3ffa4db52 Resolve mount paths while setting up root fs mount
It's hard to resolve symlinks inside the sandbox because rootfs and mounts
may be read-only, forcing us to create mount points inside lower layer of an
overlay, **before** the volumes are mounted.

Since the destination must already be resolved outside the sandbox when creating
mounts, take this opportunity to rewrite the spec with paths resolved.
"runsc boot" will use the "resolved" spec to load mounts. In addition, symlink
traversals were disabled while mounting containers inside the sandbox.

It haven't been able to write a good test for it. So I'm relying on manual tests
for now.

PiperOrigin-RevId: 217749904
Change-Id: I7ac434d5befd230db1488446cda03300cc0751a9
2018-10-18 12:42:24 -07:00
Nicolas Lacasse 4e6f0892c9 runsc: Support job control signals for the root container.
Now containers run with "docker run -it" support control characters like ^C and
^Z.

This required refactoring our signal handling a bit. Signals delivered to the
"runsc boot" process are turned into loader.Signal calls with the appropriate
delivery mode. Previously they were always sent directly to PID 1.

PiperOrigin-RevId: 217566770
Change-Id: I5b7220d9a0f2b591a56335479454a200c6de8732
2018-10-17 12:29:05 -07:00
Kevin Krakauer 9b3550f70b runsc: Add --pid flag to runsc kill.
--pid allows specific processes to be signalled rather than the container root
process or all processes in the container. containerd needs to SIGKILL exec'd
processes that timeout and check whether processes are still alive.

PiperOrigin-RevId: 217547636
Change-Id: I2058ebb548b51c8eb748f5884fb88bad0b532e45
2018-10-17 10:51:39 -07:00
Nicolas Lacasse ea5f6ed6ec Make Wait() return the sandbox exit status if the sandbox has exited.
It's possible for Start() and Wait() calls to race, if the sandboxed
application is short-lived. If the application finishes before (or during) the
Wait RPC, then Wait will fail.  In practice this looks like "connection
refused" or "EOF" errors when waiting for an RPC response.

This race is especially bad in tests, where we often run "true" inside a
sandbox.

This CL does a best-effort fix, by returning the sandbox exit status as the
container exit status.  In most cases, these are the same.

This fixes the remaining flakes in runsc/container:container_test.

PiperOrigin-RevId: 216777793
Change-Id: I9dfc6e6ec885b106a736055bc7a75b2008dfff7a
2018-10-11 16:07:05 -07:00
Fabricio Voznika f413e4b117 Add bare bones unsupported syscall logging
This change introduces a new flags to create/run called
--user-log. Logs to this files are visible to users and
are meant to help debugging problems with their images
and containers.

For now only unsupported syscalls are sent to this log,
and only minimum support was added. We can build more
infrastructure around it as needed.

PiperOrigin-RevId: 216735977
Change-Id: I54427ca194604991c407d49943ab3680470de2d0
2018-10-11 11:56:54 -07:00
Nicolas Lacasse f1c01ed886 runsc: Support job control signals in "exec -it".
Terminal support in runsc relies on host tty file descriptors that are imported
into the sandbox. Application tty ioctls are sent directly to the host fd.

However, those host tty ioctls are associated in the host kernel with a host
process (in this case runsc), and the host kernel intercepts job control
characters like ^C and send signals to the host process. Thus, typing ^C into a
"runsc exec" shell will send a SIGINT to the runsc process.

This change makes "runsc exec" handle all signals, and forward them into the
sandbox via the "ContainerSignal" urpc method. Since the "runsc exec" is
associated with a particular container process in the sandbox, the signal must
be associated with the same container process.

One big difficulty is that the signal should not necessarily be sent to the
sandbox process started by "exec", but instead must be sent to the foreground
process group for the tty. For example, we may exec "bash", and from bash call
"sleep 100". A ^C at this point should SIGINT sleep, not bash.

To handle this, tty files inside the sandbox must keep track of their
foreground process group, which is set/get via ioctls. When an incoming
ContainerSignal urpc comes in, we look up the foreground process group via the
tty file. Unfortunately, this means we have to expose and cache the tty file in
the Loader.

Note that "runsc exec" now handles signals properly, but "runs run" does not.
That will come in a later CL, as this one is complex enough already.

Example:
	root@:/usr/local/apache2# sleep 100
	^C

	root@:/usr/local/apache2# sleep 100
	^Z
	[1]+  Stopped                 sleep 100

	root@:/usr/local/apache2# fg
	sleep 100
	^C

	root@:/usr/local/apache2#

PiperOrigin-RevId: 215334554
Change-Id: I53cdce39653027908510a5ba8d08c49f9cf24f39
2018-10-01 22:06:56 -07:00
Fabricio Voznika 43e6aff50e Don't fail if Root is readonly and is not a mount point
This makes runsc more friendly to run without docker or K8s.

PiperOrigin-RevId: 215165586
Change-Id: Id45a9fc24a3c09b1645f60dbaf70e64711a7a4cd
2018-09-30 23:23:03 -07:00
Fabricio Voznika 1166c088fc Move common test code to function
PiperOrigin-RevId: 214890335
Change-Id: I42743f0ce46a5a42834133bce2f32d187194fc87
2018-09-27 22:53:18 -07:00
Fabricio Voznika 491faac03b Implement 'runsc kill --all'
In order to implement kill --all correctly, the Sentry needs
to track all tasks that belong to a given container. This change
introduces ContainerID to the task, that gets inherited by all
children. 'kill --all' then iterates over all tasks comparing the
ContainerID field to find all processes that need to be signalled.

PiperOrigin-RevId: 214841768
Change-Id: I693b2374be8692d88cc441ef13a0ae34abf73ac6
2018-09-27 15:00:58 -07:00
Fabricio Voznika 6910ff3643 Move uds_test_app to common test_app
This was done so it's easier to add more functionality
to this file for other tests.

PiperOrigin-RevId: 214782043
Change-Id: I1f38b9ee1219b3ce7b789044ada8e52bdc1e6279
2018-09-27 08:58:23 -07:00
Ian Gudger 7ce13ebcad Run gofmt -s on everything
PiperOrigin-RevId: 214040901
Change-Id: I74d79497a053da3624921ad2b7c5193ca4a87942
2018-09-21 14:06:59 -07:00
Fabricio Voznika 7967d8ecd5 Handle children processes better in tests
Reap children more systematically in container tests. Previously,
container_test was taking ~5 mins to run because constainer.Destroy()
would timeout waiting for the sandbox process to exit. Now the test
running in less than a minute.

Also made the contract around Container and Sandbox destroy clearer.

PiperOrigin-RevId: 213527471
Change-Id: Icca84ee1212bbdcb62bdfc9cc7b71b12c6d1688d
2018-09-18 15:21:28 -07:00
Fabricio Voznika 26b08e182c Rename container in test
's' used to stand for sandbox, before container exited.

PiperOrigin-RevId: 213390641
Change-Id: I7bda94a50398c46721baa92227e32a7a1d817412
2018-09-17 21:18:27 -07:00
Kevin Krakauer bb88c187c5 runsc: Enable waiting on exited processes.
This makes `runsc wait` behave more like waitpid()/wait4() in that:
- Once a process has run to completion, you can wait on it and get its exit
  code.
- Processes not waited on will consume memory (like a zombie process)

PiperOrigin-RevId: 213358916
Change-Id: I5b5eca41ce71eea68e447380df8c38361a4d1558
2018-09-17 16:25:24 -07:00
Lantao Liu bde2a91433 runsc: Support container signal/wait.
This CL:
1) Fix `runsc wait`, it now also works after the container exits;
2) Generate correct container state in Load;
2) Make sure `Destory` cleanup everything before successfully return.

PiperOrigin-RevId: 212900107
Change-Id: Ie129cbb9d74f8151a18364f1fc0b2603eac4109a
2018-09-13 16:38:03 -07:00
Kevin Krakauer 2eff1fdd06 runsc: Add exec flag that specifies where to save the sandbox-internal pid.
This is different from the existing -pid-file flag, which saves a host pid.

PiperOrigin-RevId: 212713968
Change-Id: I2c486de8dd5cfd9b923fb0970165ef7c5fc597f0
2018-09-12 15:23:35 -07:00
Fabricio Voznika bc81f3fe4a Remove '--file-access=direct' option
It was used before gofer was implemented and it's not
supported anymore.
BREAKING CHANGE: proxy-shared and proxy-exclusive options
are now: shared and exclusive.

PiperOrigin-RevId: 212017643
Change-Id: If029d4073fe60583e5ca25f98abb2953de0d78fd
2018-09-07 12:28:48 -07:00
Kevin Krakauer d95663a6b9 runsc testing: Move TestMultiContainerSignal to multi_container_test.
PiperOrigin-RevId: 211831396
Change-Id: Id67f182cb43dccb696180ec967f5b96176f252e0
2018-09-06 10:41:55 -07:00
Kevin Krakauer 8f0b6e7fc0 runsc: Support runsc kill multi-container.
Now, we can kill individual containers rather than the entire sandbox.

PiperOrigin-RevId: 211748106
Change-Id: Ic97e91db33d53782f838338c4a6d0aab7a313ead
2018-09-05 21:14:56 -07:00
Fabricio Voznika 1d22d87fdc Move multi-container test to a single file
PiperOrigin-RevId: 211685288
Change-Id: I7872f2a83fcaaa54f385e6e567af6e72320c5aa0
2018-09-05 13:13:26 -07:00
Nicolas Lacasse f96b33c73c runsc: Promote getExecutablePathInternal to getExecutablePath.
Remove GetExecutablePath (the non-internal version).  This makes path handling
more consistent between exec, root, and child containers.

The new getExecutablePath now uses MountNamespace.FindInode, which is more
robust than Walking the Dirent tree ourselves.

This also removes the last use of lstat(2) in the sentry, so that can be
removed from the filters.

PiperOrigin-RevId: 211683110
Change-Id: Ic8ec960fc1c267aa7d310b8efe6e900c88a9207a
2018-09-05 13:01:21 -07:00
Fabricio Voznika 66c03b3dd7 Mounting over '/tmp' may fail
PiperOrigin-RevId: 211160120
Change-Id: Ie5f280bdac17afd01cb16562ffff6222b3184c34
2018-08-31 16:12:08 -07:00
Nicolas Lacasse 0b3bfe2ea3 fs: Fix remote-revalidate cache policy.
When revalidating a Dirent, if the inode id is the same, then we don't need to
throw away the entire Dirent. We can just update the unstable attributes in
place.

If the inode id has changed, then the remote file has been deleted or moved,
and we have no choice but to throw away the dirent we have a look up another.
In this case, we may still end up losing a mounted dirent that is a child of
the revalidated dirent. However, that seems appropriate here because the entire
mount point has been pulled out from underneath us.

Because gVisor's overlay is at the Inode level rather than the Dirent level, we
must pass the parent Inode and name along with the Inode that is being
revalidated.

PiperOrigin-RevId: 210431270
Change-Id: I705caef9c68900234972d5aac4ae3a78c61c7d42
2018-08-27 14:26:29 -07:00
Fabricio Voznika db81c0b02f Put fsgofer inside chroot
Now each container gets its own dedicated gofer that is chroot'd to the
rootfs path. This is done to add an extra layer of security in case the
gofer gets compromised.

PiperOrigin-RevId: 210396476
Change-Id: Iba21360a59dfe90875d61000db103f8609157ca0
2018-08-27 11:10:14 -07:00
Kevin Krakauer a78df1d874 runsc: De-flakes container_test TestMultiContainerSanity.
The bug was caused by os.File's finalizer, which closes the file. Because
fsgofer.serve() was passed a file descriptor as an int rather than a os.File,
callers would pass os.File.Fd(), and the os.File would go out of scope. Thus,
the file would get GC'd and finalized nondeterministically, causing failures
when the file was used.

PiperOrigin-RevId: 209861834
Change-Id: Idf24d5c1f04c9b28659e62c97202ab3b4d72e994
2018-08-22 17:55:15 -07:00
Fabricio Voznika e2ab7ec39e Fix TestUnixDomainSockets failure when path is too large
UDS has a lower size limit than regular files. When running under bazel
this limit is exceeded. Test was changed to always mount /tmp and use
it for the test.

PiperOrigin-RevId: 209717830
Change-Id: I1dbe19fe2051ffdddbaa32b188a9167f446ed193
2018-08-21 23:07:39 -07:00
Kevin Krakauer ae68e9e751 Temporarily skip multi-container tests in container_test until deflaked.
PiperOrigin-RevId: 209679235
Change-Id: I527e779eeb113d0c162f5e27a2841b9486f0e39f
2018-08-21 16:21:05 -07:00
Fabricio Voznika 19ef2ad1fe nonExclusiveFS is causing timeout with --race
Not sure why, just removed for now to unblock the tests.

PiperOrigin-RevId: 209661403
Change-Id: I72785c071687d54e22bda9073d36b447d52a7018
2018-08-21 14:35:08 -07:00
Fabricio Voznika a854678bc3 Move container_test to the container package
PiperOrigin-RevId: 209655274
Change-Id: Id381114bdb3197c73e14f74b3f6cf1afd87d60cb
2018-08-21 14:02:19 -07:00
Fabricio Voznika d6d165cb0b Initial change for multi-gofer support
PiperOrigin-RevId: 209647293
Change-Id: I980fca1257ea3fcce796388a049c353b0303a8a5
2018-08-21 13:14:43 -07:00
Fabricio Voznika 0fc7b30695 Standardize mounts in tests
Tests get a readonly rootfs mapped to / (which was the case before)
and writable TEST_TMPDIR. This makes it easier to setup containers to
write to files and to share state between test and containers.

PiperOrigin-RevId: 209453224
Change-Id: I4d988e45dc0909a0450a3bb882fe280cf9c24334
2018-08-20 11:26:39 -07:00
Fabricio Voznika 11800311a5 Add nonExclusiveFS dimension to more tests
The ones using 'kvm' actually mean that they don't want overlay.

PiperOrigin-RevId: 209194318
Change-Id: I941a443cb6d783e2c80cf66eb8d8630bcacdb574
2018-08-17 13:07:09 -07:00
Fabricio Voznika da087e66cc Combine functions to search for file under one common function
Bazel adds the build type in front of directories making it hard to
refer to binaries in code.

PiperOrigin-RevId: 209010854
Change-Id: I6c9da1ac3bbe79766868a3b14222dd42d03b4ec5
2018-08-16 10:55:45 -07:00
Nicolas Lacasse e8a4f2e133 runsc: Change cache policy for root fs and volume mounts.
Previously, gofer filesystems were configured with the default "fscache"
policy, which caches filesystem metadata and contents aggressively.  While this
setting is best for performance, it means that changes from inside the sandbox
may not be immediately propagated outside the sandbox, and vice-versa.

This CL changes volumes and the root fs configuration to use a new
"remote-revalidate" cache policy which tries to retain as much caching as
possible while still making fs changes visible across the sandbox boundary.

This cache policy is enabled by default for the root filesystem. The default
value for the "--file-access" flag is still "proxy", but the behavior is
changed to use the new cache policy.

A new value for the "--file-access" flag is added, called "proxy-exclusive",
which turns on the previous aggressive caching behavior. As the name implies,
this flag should be used when the sandbox has "exclusive" access to the
filesystem.

All volume mounts are configured to use the new cache policy, since it is
safest and most likely to be correct. There is not currently a way to change
this behavior, but it's possible to add such a mechanism in the future. The
configurability is a smaller issue for volumes, since most of the expensive
application fs operations (walking + stating files) will likely served by the
root fs.

PiperOrigin-RevId: 208735037
Change-Id: Ife048fab1948205f6665df8563434dbc6ca8cfc9
2018-08-14 16:25:58 -07:00
Brielle Broder 4ececd8e8d Enable checkpoint/restore in cases of UDS use.
Previously, processes which used file-system Unix Domain Sockets could not be
checkpoint-ed in runsc because the sockets were saved with their inode
numbers which do not necessarily remain the same upon restore. Now,
the sockets are also saved with their paths so that the new inodes
can be determined for the sockets based on these paths after restoring.
Tests for cases with UDS use are included. Test cleanup to come.

PiperOrigin-RevId: 208268781
Change-Id: Ieaa5d5d9a64914ca105cae199fd8492710b1d7ec
2018-08-10 14:33:20 -07:00
Fabricio Voznika 9752174a7f Disable KVM dimension because it's making the test flaky
PiperOrigin-RevId: 207642348
Change-Id: Iacec9f097ab93b91c0c8eea61b1347e864f57a8b
2018-08-06 18:08:25 -07:00
Fabricio Voznika b8f96a9d0b Replace sleeps with waits in tests - part II
PiperOrigin-RevId: 206333130
Change-Id: Ic85874dbd53c5de2164a7bb75769d52d43666c2a
2018-07-27 10:10:10 -07:00
Fabricio Voznika e5adf42f66 Replace sleeps with waits in tests - part I
PiperOrigin-RevId: 206084473
Change-Id: I44e1b64b9cdd2964357799dca27cc0cbc19ce07d
2018-07-25 17:37:53 -07:00
Fabricio Voznika d7a34790a0 Add KVM and overlay dimensions to container_test
PiperOrigin-RevId: 205714667
Change-Id: I317a2ca98ac3bdad97c4790fcc61b004757d99ef
2018-07-23 13:31:42 -07:00
Justine Olshan c05660373e Moved restore code out of create and made to be called after create.
Docker expects containers to be created before they are restored.
However, gVisor restoring requires specificactions regarding the kernel
and the file system. These actions were originally in booting the sandbox.

Now setting up the file system is deferred until a call to a call to
runsc start. In the restore case, the kernel is destroyed and a new kernel
is created in the same process, as we need the same process for Docker.

These changes required careful execution of concurrent processes which
required the use of a channel.

Full docker integration still needs the ability to restore into the same
container.

PiperOrigin-RevId: 205161441
Change-Id: Ie1d2304ead7e06855319d5dc310678f701bd099f
2018-07-18 16:58:30 -07:00
Nicolas Lacasse e5d8f99c60 runsc: Fixes to CheckpointRestoreTest.
We must delete the output file at the beginning of the test, otherwise the test
fails immediately.

Also some minor cleanups in readOutputFile.

PiperOrigin-RevId: 205150525
Change-Id: I6bae1acd5b315320a2c6e25a59afcfc06267fb17
2018-07-18 15:46:37 -07:00
Fabricio Voznika 52ddb8571c Skip overlay on root when its readonly
PiperOrigin-RevId: 203161098
Change-Id: Ia1904420cb3ee830899d24a4fe418bba6533be64
2018-07-03 12:01:09 -07:00
Lantao Liu 126296ce2a runsc: fix panic for `runsc wait` on stopped container.
PiperOrigin-RevId: 203016694
Change-Id: Ic51ef754aa6d7d1b3b35491aff96a63d7992e122
2018-07-02 14:52:21 -07:00
Justine Olshan 80bdf8a406 Sets the restore environment for restoring a container.
Updated how restoring occurs through boot.go with a separate Restore function.
This prevents a new process and new mounts from being created.
Added tests to ensure the container is restored.
Registered checkpoint and restore commands so they can be used.
Docker support for these commands is still limited.
Working on #80.

PiperOrigin-RevId: 202710950
Change-Id: I2b893ceaef6b9442b1ce3743bd112383cb92af0c
2018-06-29 14:47:40 -07:00