342 Commits

Author SHA1 Message Date
Fabricio Voznika 7967d8ecd5 Handle children processes better in tests
Reap children more systematically in container tests. Previously,
container_test was taking ~5 mins to run because container.Destroy()
would time out waiting for the sandbox process to exit. Now the test
runs in less than a minute.

Also made the contract around Container and Sandbox destroy clearer.

PiperOrigin-RevId: 213527471
Change-Id: Icca84ee1212bbdcb62bdfc9cc7b71b12c6d1688d
2018-09-18 15:21:28 -07:00
Kevin Krakauer 7e00f37054 Automated rollback of changelist 213307171
PiperOrigin-RevId: 213504354
Change-Id: Iadd42f0ca4b7e7a9eae780bee9900c7233fb4f3f
2018-09-18 13:22:26 -07:00
Fabricio Voznika 5d9816be41 Remove memory usage static init
panic() during init() can be hard to debug.

Updates #100

PiperOrigin-RevId: 213391932
Change-Id: Ic103f1981c5b48f1e12da3b42e696e84ffac02a9
2018-09-17 21:34:37 -07:00
Fabricio Voznika 26b08e182c Rename container in test
's' used to stand for sandbox, before container existed.

PiperOrigin-RevId: 213390641
Change-Id: I7bda94a50398c46721baa92227e32a7a1d817412
2018-09-17 21:18:27 -07:00
Kevin Krakauer bb88c187c5 runsc: Enable waiting on exited processes.
This makes `runsc wait` behave more like waitpid()/wait4() in that:
- Once a process has run to completion, you can wait on it and get its exit
  code.
- Processes not waited on will consume memory (like a zombie process)

PiperOrigin-RevId: 213358916
Change-Id: I5b5eca41ce71eea68e447380df8c38361a4d1558
2018-09-17 16:25:24 -07:00
Kevin Krakauer 25add7b22b runsc: Fix stdin/out/err in multi-container mode.
Stdin/out/err weren't being sent to the sentry.

PiperOrigin-RevId: 213307171
Change-Id: Ie4b634a58b1b69aa934ce8597e5cc7a47a2bcda2
2018-09-17 11:31:28 -07:00
Lantao Liu bde2a91433 runsc: Support container signal/wait.
This CL:
1) Fix `runsc wait` so that it also works after the container exits;
2) Generate correct container state in Load;
3) Make sure `Destroy` cleans up everything before returning successfully.

PiperOrigin-RevId: 212900107
Change-Id: Ie129cbb9d74f8151a18364f1fc0b2603eac4109a
2018-09-13 16:38:03 -07:00
Kevin Krakauer 2eff1fdd06 runsc: Add exec flag that specifies where to save the sandbox-internal pid.
This is different from the existing -pid-file flag, which saves a host pid.

PiperOrigin-RevId: 212713968
Change-Id: I2c486de8dd5cfd9b923fb0970165ef7c5fc597f0
2018-09-12 15:23:35 -07:00
Michael Pratt 0efde2bfbd Remove getdents from filters
It was only used by whitelistfs, which was removed in
bc81f3fe4a.

PiperOrigin-RevId: 212666374
Change-Id: Ia35e6dc9d68c1a3b015d5b5f71ea3e68e46c5bed
2018-09-12 10:51:25 -07:00
Michael Pratt b4aed01bf2 Rollback of changelist 212483372
PiperOrigin-RevId: 212557844
Change-Id: I414de848e75d57ecee2c05e851d05b607db4aa57
2018-09-11 17:54:50 -07:00
Nicolas Lacasse 6cc9b311af platform: Pass device fd into platform constructor.
We were previously opening the platform device (i.e. /dev/kvm) inside the
platform constructor (i.e. kvm.New).  This requires that we have RW access to
the platform device when constructing the platform.

However, now that the runsc sandbox process runs as user "nobody", it is not
able to open the platform device.

This CL changes the kvm constructor to take the platform device FD, rather than
opening the device file itself. The device file is opened outside of the
sandbox and passed to the sandbox process.

PiperOrigin-RevId: 212505804
Change-Id: I427e1d9de5eb84c84f19d513356e1bb148a52910
2018-09-11 13:09:46 -07:00
Fabricio Voznika c44bc6612f Allow fstatat back in syscall filters
PiperOrigin-RevId: 212483372
Change-Id: If95f32a8e41126cf3dc8bd6c8b2fb0fcfefedc6d
2018-09-11 11:05:09 -07:00
Nicolas Lacasse e198f9ab02 runsc: Chmod all mounted files to 777 inside chroot.
Inside the chroot, we run as user nobody, so all mounted files and directories
must be accessible to all users.

PiperOrigin-RevId: 212284805
Change-Id: I705e0dbbf15e01e04e0c7f378a99daffe6866807
2018-09-10 10:00:16 -07:00
Nicolas Lacasse 0c0c942327 Automated rollback of changelist 212059579
PiperOrigin-RevId: 212069131
Change-Id: I01476f957bbf29d4ee5a3c11d59d4f863ba9f2df
2018-09-07 18:23:27 -07:00
Nicolas Lacasse 922d8c3c8c Automated rollback of changelist 211992321
PiperOrigin-RevId: 212066419
Change-Id: Icded56e7e117bfd9b644e6541bddcd110460a9b8
2018-09-07 17:56:07 -07:00
Nicolas Lacasse 9751b800a6 runsc: Support multi-container exec.
We must use a context.Context with a Root Dirent that corresponds to the
container's chroot. Previously we were using the root context, which does not
have a chroot.

Getting the correct context required refactoring some of the path-lookup code.
We can't lookup the path without a context.Context, which requires
kernel.CreateProcArgs, which we only get inside control.Execute.  So we have to
do the path lookup much later than we previously were.

PiperOrigin-RevId: 212064734
Change-Id: I84a5cfadacb21fd9c3ab9c393f7e308a40b9b537
2018-09-07 17:39:54 -07:00
Fabricio Voznika cf5006ff24 Disable test until we figure out what's broken
PiperOrigin-RevId: 212059579
Change-Id: I052c2192d3483d7bd0fd2232ef2023a12da66446
2018-09-07 17:00:41 -07:00
Adin Scannell 6cfb5cd56d Add additional sanity checks for walk.
PiperOrigin-RevId: 212058684
Change-Id: I319709b9ffcfccb3231bac98df345d2a20eca24b
2018-09-07 16:53:12 -07:00
Fabricio Voznika 8ce3fbf9f8 Only start signal forwarding after init process is created
PiperOrigin-RevId: 212028121
Change-Id: If9c2c62f3be103e2bb556b8d154c169888e34369
2018-09-07 13:39:12 -07:00
Fabricio Voznika bc81f3fe4a Remove '--file-access=direct' option
It was used before gofer was implemented and it's not
supported anymore.
BREAKING CHANGE: proxy-shared and proxy-exclusive options
are now: shared and exclusive.

PiperOrigin-RevId: 212017643
Change-Id: If029d4073fe60583e5ca25f98abb2953de0d78fd
2018-09-07 12:28:48 -07:00
Fabricio Voznika f895cb4d8b Use root abstract socket namespace for exec
PiperOrigin-RevId: 211999211
Change-Id: I5968dd1a8313d3e49bb6e6614e130107495de41d
2018-09-07 10:45:55 -07:00
Nicolas Lacasse 210c252089 runsc: Run sandbox process inside minimal chroot.
We construct a dir with the executable bind-mounted at /exe, and proc mounted
at /proc.  Runsc now executes the sandbox process inside this chroot, thus
limiting access to the host filesystem.  The mounts and chroot dir are removed
when the sandbox is destroyed.

Because this requires bind-mounts, we can only do the chroot if we have
CAP_SYS_ADMIN.

PiperOrigin-RevId: 211994001
Change-Id: Ia71c515e26085e0b69b833e71691830148bc70d1
2018-09-07 10:16:39 -07:00
Nicolas Lacasse 590d832099 runsc: Dup debug log file to stderr, so sentry panics don't get lost.
Docker and containerd do not expose runsc's stderr, so tracking down sentry
panics can be painful.

If we have a debug log file, we should send panics (and all stderr data) to the
log file.

PiperOrigin-RevId: 211992321
Change-Id: I5f0d2f45f35c110a38dab86bafc695aaba42f7a3
2018-09-07 10:05:21 -07:00
Lantao Liu 4f3053cb4e runsc: do not delete in paused state.
PiperOrigin-RevId: 211835570
Change-Id: Ied7933732cad5bc60b762e9c964986cb49a8d9b9
2018-09-06 11:06:19 -07:00
Fabricio Voznika efac28976c Enable network for multi-container
PiperOrigin-RevId: 211834411
Change-Id: I52311a6c5407f984e5069359d9444027084e4d2a
2018-09-06 11:00:08 -07:00
Kevin Krakauer d95663a6b9 runsc testing: Move TestMultiContainerSignal to multi_container_test.
PiperOrigin-RevId: 211831396
Change-Id: Id67f182cb43dccb696180ec967f5b96176f252e0
2018-09-06 10:41:55 -07:00
Kevin Krakauer 8f0b6e7fc0 runsc: Support runsc kill multi-container.
Now, we can kill individual containers rather than the entire sandbox.

PiperOrigin-RevId: 211748106
Change-Id: Ic97e91db33d53782f838338c4a6d0aab7a313ead
2018-09-05 21:14:56 -07:00
Fabricio Voznika 5f0002fc83 Use container's capabilities in exec
When no capabilities are specified in exec, use the
container's capabilities to match runc's behavior.

PiperOrigin-RevId: 211735186
Change-Id: Icd372ed64410c81144eae94f432dffc9fe3a86ce
2018-09-05 18:32:50 -07:00
Fabricio Voznika 12aef686af Enabled bind mounts in sub-containers
With multi-gofers, bind mounts in sub-containers should
just work. Removed restrictions and added test. There are
also a few cleanups along the way, e.g. retry unmounting
in case cleanup races with gofer teardown.

PiperOrigin-RevId: 211699569
Change-Id: Ic0a69c29d7c31cd7e038909cc686c6ac98703374
2018-09-05 14:30:09 -07:00
Fabricio Voznika 0c7cfca0da Running container should have a valid sandbox
PiperOrigin-RevId: 211693868
Change-Id: Iea340dd78bf26ae6409c310b63c17cc611c2055f
2018-09-05 14:02:45 -07:00
Fabricio Voznika 4b57fd920d Add MADVISE to fsgofer seccomp profile
PiperOrigin-RevId: 211686037
Change-Id: I0e776ca760b65ba100e495f471b6e811dbd6590a
2018-09-05 13:18:06 -07:00
Fabricio Voznika 1d22d87fdc Move multi-container test to a single file
PiperOrigin-RevId: 211685288
Change-Id: I7872f2a83fcaaa54f385e6e567af6e72320c5aa0
2018-09-05 13:13:26 -07:00
Nicolas Lacasse f96b33c73c runsc: Promote getExecutablePathInternal to getExecutablePath.
Remove GetExecutablePath (the non-internal version).  This makes path handling
more consistent between exec, root, and child containers.

The new getExecutablePath now uses MountNamespace.FindInode, which is more
robust than Walking the Dirent tree ourselves.

This also removes the last use of lstat(2) in the sentry, so that can be
removed from the filters.

PiperOrigin-RevId: 211683110
Change-Id: Ic8ec960fc1c267aa7d310b8efe6e900c88a9207a
2018-09-05 13:01:21 -07:00
Nicolas Lacasse 0a9a40abcd runsc: Run sandbox as user nobody.
When starting a sandbox without direct file or network access, we create an
empty user namespace and run the sandbox in there.  However, the root user in
that namespace is still mapped to the root user in the parent namespace.

This CL maps the "nobody" user from the parent namespace into the child
namespace, and runs the sandbox process as user "nobody" inside the new
namespace.

PiperOrigin-RevId: 211572223
Change-Id: I1b1f9b1a86c0b4e7e5ca7bc93be7d4887678bab6
2018-09-04 20:33:05 -07:00
Nicolas Lacasse ad8648c634 runsc: Pass log and config files to sandbox process by FD.
This is a prereq for running the sandbox process as user "nobody", when it may
not have permissions to open these files.

Instead, we must open them before starting the sandbox process, and pass them
by FD.

The specutils.ReadSpecFromFile method was fixed to always seek to the beginning
of the file before reading. This allows Files from the same FD to be read
multiple times, as we do in the boot command when the apply-caps flag is set.

Tested with --network=host.

PiperOrigin-RevId: 211570647
Change-Id: I685be0a290aa7f70731ebdce82ebc0ebcc9d475c
2018-09-04 20:10:01 -07:00
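
The seek-before-read fix mentioned in ad8648c634 can be sketched as follows. This is an illustration only, not the body of specutils.ReadSpecFromFile: `readFromStart` and `readTwice` are hypothetical helpers. The point is that an open file keeps its offset between reads, so a second read returns nothing unless the reader rewinds first.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// readFromStart rewinds f and reads it to the end, so repeated calls on the
// same open file return the same bytes.
func readFromStart(f *os.File) ([]byte, error) {
	if _, err := f.Seek(0, io.SeekStart); err != nil {
		return nil, err
	}
	return io.ReadAll(f)
}

// readTwice writes content to a temporary file and reads it back twice
// through the same *os.File.
func readTwice(content string) (string, string, error) {
	f, err := os.CreateTemp("", "spec-*.json")
	if err != nil {
		return "", "", err
	}
	defer os.Remove(f.Name())
	defer f.Close()
	if _, err := f.WriteString(content); err != nil {
		return "", "", err
	}
	first, err := readFromStart(f)
	if err != nil {
		return "", "", err
	}
	second, err := readFromStart(f)
	if err != nil {
		return "", "", err
	}
	return string(first), string(second), nil
}

func main() {
	a, b, err := readTwice(`{"ociVersion": "1.0.0"}`)
	fmt.Println(a == b, err)
}
```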
Lantao Liu 9ae4e28f75 runsc: fix container rootfs path.
PiperOrigin-RevId: 211515350
Change-Id: Ia495af57447c799909aa97bb873a50b87bee2625
2018-09-04 13:37:40 -07:00
Michael Pratt ab7174611c Remove epoll_wait from filters
Go 1.11 replaced it with epoll_pwait.

PiperOrigin-RevId: 211510006
Change-Id: I48a6cae95ed3d57a4633895358ad05ad8bf2f633
2018-09-04 13:10:09 -07:00
Fabricio Voznika 66c03b3dd7 Mounting over '/tmp' may fail
PiperOrigin-RevId: 211160120
Change-Id: Ie5f280bdac17afd01cb16562ffff6222b3184c34
2018-08-31 16:12:08 -07:00
Fabricio Voznika 7713e2cb75 Remove not used deps
PiperOrigin-RevId: 211147521
Change-Id: I9b8b67df50a3ba084c07a48c72a874d7e2007f23
2018-08-31 14:47:46 -07:00
Fabricio Voznika 7e18f158b2 Automated rollback of changelist 210995199
PiperOrigin-RevId: 211116429
Change-Id: I446d149c822177dc9fc3c64ce5e455f7f029aa82
2018-08-31 11:30:47 -07:00
Lantao Liu be9f454eb6 runsc: Set volume mount rslave.
PiperOrigin-RevId: 211111376
Change-Id: I27b8cb4e070d476fa4781ed6ecfa0cf1dcaf85f5
2018-08-31 11:03:22 -07:00
Michael Pratt 08bfb5643c Add other missing dep
runsc and runsc-race need the same deps.

PiperOrigin-RevId: 211103766
Change-Id: Ib0c97078a469656c1e5b019648589a1d07915625
2018-08-31 10:22:09 -07:00
Fabricio Voznika e669697241 Fix RunAsRoot arguments forwarding
It was including the path to the executable twice in the
arguments.

PiperOrigin-RevId: 211098311
Change-Id: I5357c51c63f38dfab551b17bb0e04011a0575010
2018-08-31 09:45:32 -07:00
Tamir Duberstein 3f04bd68b2 Add missing import
GoCompile: missing strict dependencies:
	/tmpfs/tmp/bazel/sandbox/linux-sandbox/1744/execroot/__main__/runsc/main.go:
	import of "gvisor.googlesource.com/gvisor/runsc/specutils"

This was broken in 210995199.

PiperOrigin-RevId: 211086595
Change-Id: I166b9a2ed8e4d6e624def944b720190940d7537c
2018-08-31 08:07:52 -07:00
Fabricio Voznika 3e493adf7a Add seccomp filter to fsgofer
PiperOrigin-RevId: 211011542
Change-Id: Ib5a83a00f8eb6401603c6fb5b59afc93bac52558
2018-08-30 17:30:19 -07:00
Nicolas Lacasse 5ade9350ad runsc: Pass log and config files to sandbox process by FD.
This is a prereq for running the sandbox process as user "nobody", when it may
not have permissions to open these files.

Instead, we must open them before starting the sandbox process, and pass them
by FD.

PiperOrigin-RevId: 210995199
Change-Id: I715875a9553290b4a49394a8fcd93be78b1933dd
2018-08-30 15:47:18 -07:00
Fabricio Voznika 30c025f3ef Add argument checks to seccomp
This is required to increase protection when running in GKE.

PiperOrigin-RevId: 210635123
Change-Id: Iaaa8be49e73f7a3a90805313885e75894416f0b5
2018-08-28 17:10:03 -07:00
Michael Pratt ea113a4380 Drop support for Go 1.10
PiperOrigin-RevId: 210589588
Change-Id: Iba898bc3eb8f13e17c668ceea6dc820fc8180a70
2018-08-28 12:56:28 -07:00
Lantao Liu d8f0db9bcf runsc: unmount volume mounts when destroy container.
PiperOrigin-RevId: 210579178
Change-Id: Iae20639c5186b1a976cbff6d05bda134cd00d0da
2018-08-28 11:54:07 -07:00
Fabricio Voznika f7366e4e64 Consolidate image tests into a single file
This is to keep it consistent with other tests, and
it's easier to maintain them in a single file.
Also increase python test timeout to deflake it.

PiperOrigin-RevId: 210575042
Change-Id: I2ef5bcd5d97c08549f0c5f645c4b694253ef0b4d
2018-08-28 11:31:04 -07:00
Fabricio Voznika ae648bafda Add command-line parameter to trigger panic on signal
This is to troubleshoot problems with a hung process that is
not responding to 'runsc debug --stack' command.

PiperOrigin-RevId: 210483513
Change-Id: I4377b210b4e51bc8a281ad34fd94f3df13d9187d
2018-08-27 20:36:10 -07:00
Kevin Krakauer a4529c1b5b runsc: Fix readonly filesystem causing failure to create containers.
For readonly filesystems specified via relative path, we were forgetting to
mount relative to the container's bundle directory.

PiperOrigin-RevId: 210483388
Change-Id: I84809fce4b1f2056d0e225547cb611add5f74177
2018-08-27 20:34:27 -07:00
Nicolas Lacasse 0b3bfe2ea3 fs: Fix remote-revalidate cache policy.
When revalidating a Dirent, if the inode id is the same, then we don't need to
throw away the entire Dirent. We can just update the unstable attributes in
place.

If the inode id has changed, then the remote file has been deleted or moved,
and we have no choice but to throw away the dirent we have and look up another.
In this case, we may still end up losing a mounted dirent that is a child of
the revalidated dirent. However, that seems appropriate here because the entire
mount point has been pulled out from underneath us.

Because gVisor's overlay is at the Inode level rather than the Dirent level, we
must pass the parent Inode and name along with the Inode that is being
revalidated.

PiperOrigin-RevId: 210431270
Change-Id: I705caef9c68900234972d5aac4ae3a78c61c7d42
2018-08-27 14:26:29 -07:00
Nicolas Lacasse 5999767d53 runsc: fsgofer should return a unique QID.Path for each file.
Previously, we were only using the host inode id as the QID path. But the host
filesystem can have multiple devices with conflicting inode ids. This resulted
in duplicate inode ids in the sentry.

This CL generates a unique QID for each <host inode, host device> pair.

PiperOrigin-RevId: 210424813
Change-Id: I16d106f61c7c8f910c0da4ceec562a010ffca2fb
2018-08-27 13:52:14 -07:00
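
One way to picture the change in 5999767d53 is a table keyed on the <host inode, host device> pair that hands out a fresh number for each new pair. The sketch below is an assumption about the general technique, not the fsgofer code; `hostFile`, `qidPaths`, and `pathFor` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// hostFile identifies a file on the host. The inode number alone is not
// unique when several devices are mounted, so the device is part of the key.
type hostFile struct {
	dev uint64
	ino uint64
}

// qidPaths assigns one stable, unique number to each hostFile it sees.
type qidPaths struct {
	mu    sync.Mutex
	next  uint64
	paths map[hostFile]uint64
}

func newQIDPaths() *qidPaths {
	return &qidPaths{paths: make(map[hostFile]uint64)}
}

// pathFor returns the number assigned to (dev, ino), allocating a new one
// the first time the pair is seen.
func (q *qidPaths) pathFor(dev, ino uint64) uint64 {
	q.mu.Lock()
	defer q.mu.Unlock()
	key := hostFile{dev: dev, ino: ino}
	if p, ok := q.paths[key]; ok {
		return p
	}
	q.next++
	q.paths[key] = q.next
	return q.next
}

func main() {
	q := newQIDPaths()
	a := q.pathFor(1, 100) // inode 100 on device 1
	b := q.pathFor(2, 100) // same inode number on a different device
	fmt.Println(a != b, a == q.pathFor(1, 100))
}
```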
Adin Scannell b9ded9bf39 Add runsc-race target.
PiperOrigin-RevId: 210422178
Change-Id: I984dd348d467908bc3180a20fc79b8387fcca05e
2018-08-27 13:37:03 -07:00
Fabricio Voznika db81c0b02f Put fsgofer inside chroot
Now each container gets its own dedicated gofer that is chroot'd to the
rootfs path. This is done to add an extra layer of security in case the
gofer gets compromised.

PiperOrigin-RevId: 210396476
Change-Id: Iba21360a59dfe90875d61000db103f8609157ca0
2018-08-27 11:10:14 -07:00
Nicolas Lacasse 106de2182d runsc: Terminal support for "docker exec -ti".
This CL adds terminal support for "docker exec".  We previously only supported
consoles for the container process, but not exec processes.

The SYS_IOCTL syscall was added to the default seccomp filter list, but only
for ioctls that get/set winsize and termios structs. We need to allow these
ioctls for all containers because it's possible to run "exec -ti" on a
container that was started without an attached console, after the filters
have been installed.

Note that control-character signals are still not properly supported.

Tested with:
	$ docker run --runtime=runsc -it alpine
In another terminal:
	$ docker exec -it <containerid> /bin/sh

PiperOrigin-RevId: 210185456
Change-Id: I6d2401e53a7697bb988c120a8961505c335f96d9
2018-08-24 17:43:21 -07:00
Kevin Krakauer 02dfceab6d runsc: Allow runsc to properly search the PATH for executable name.
Previously, runsc improperly attempted to find an executable in the container's
PATH.

We now search the PATH via the container's fsgofer rather than the host FS,
eliminating the confusing differences between paths on the host and within a
container.

PiperOrigin-RevId: 210159488
Change-Id: I228174dbebc4c5356599036d6efaa59f28ff28d2
2018-08-24 14:42:40 -07:00
Fabricio Voznika a81a4402a2 Add option to panic gofer if writes are attempted over RO mounts
This is used when '--overlay=true' to guarantee writes are not sent to gofer.

PiperOrigin-RevId: 210116288
Change-Id: I7616008c4c0e8d3668e07a205207f46e2144bf30
2018-08-24 10:17:42 -07:00
Fabricio Voznika 001a4c2493 Clean up syscall filters
Removed syscalls that are only used by whitelistfs
which has its own set of filters.

PiperOrigin-RevId: 209967259
Change-Id: Idb2e1b9d0201043d7cd25d96894f354729dbd089
2018-08-23 11:15:07 -07:00
Kevin Krakauer a78df1d874 runsc: De-flakes container_test TestMultiContainerSanity.
The bug was caused by os.File's finalizer, which closes the file. Because
fsgofer.serve() was passed a file descriptor as an int rather than a os.File,
callers would pass os.File.Fd(), and the os.File would go out of scope. Thus,
the file would get GC'd and finalized nondeterministically, causing failures
when the file was used.

PiperOrigin-RevId: 209861834
Change-Id: Idf24d5c1f04c9b28659e62c97202ab3b4d72e994
2018-08-22 17:55:15 -07:00
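
The finalizer hazard described in a78df1d874 can be shown with a small sketch. It assumes a Unix-like system (it uses syscall.Pread), and `readAt`, `serve`, and `demo` are hypothetical names rather than the fsgofer API. The risky pattern is calling `f.Fd()` and then never touching `f` again: the garbage collector may then finalize `f` and close the descriptor while the raw integer is still in use. Keeping the `*os.File` reachable until the descriptor is no longer needed avoids that.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"syscall"
)

// readAt reads up to n bytes at offset 0 from a raw descriptor. It stands in
// for any code that works on an int fd and knows nothing about *os.File.
func readAt(fd int, n int) (string, error) {
	buf := make([]byte, n)
	got, err := syscall.Pread(fd, buf, 0)
	if err != nil {
		return "", err
	}
	return string(buf[:got]), nil
}

// serve takes the *os.File rather than an int. runtime.KeepAlive marks f as
// reachable until readAt has returned, so the finalizer cannot close the
// descriptor in the middle of the read.
func serve(f *os.File, n int) (string, error) {
	s, err := readAt(int(f.Fd()), n)
	runtime.KeepAlive(f)
	return s, err
}

// demo writes a short temporary file and reads it back through serve.
func demo() (string, error) {
	f, err := os.CreateTemp("", "fd-*")
	if err != nil {
		return "", err
	}
	defer os.Remove(f.Name())
	defer f.Close()
	if _, err := f.WriteString("hello"); err != nil {
		return "", err
	}
	return serve(f, 5)
}

func main() {
	fmt.Println(demo())
}
```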
Fabricio Voznika e2ab7ec39e Fix TestUnixDomainSockets failure when path is too large
UDS has a lower size limit than regular files. When running under bazel
this limit is exceeded. Test was changed to always mount /tmp and use
it for the test.

PiperOrigin-RevId: 209717830
Change-Id: I1dbe19fe2051ffdddbaa32b188a9167f446ed193
2018-08-21 23:07:39 -07:00
Kevin Krakauer ae68e9e751 Temporarily skip multi-container tests in container_test until deflaked.
PiperOrigin-RevId: 209679235
Change-Id: I527e779eeb113d0c162f5e27a2841b9486f0e39f
2018-08-21 16:21:05 -07:00
Fabricio Voznika 19ef2ad1fe nonExclusiveFS is causing timeout with --race
Not sure why, just removed for now to unblock the tests.

PiperOrigin-RevId: 209661403
Change-Id: I72785c071687d54e22bda9073d36b447d52a7018
2018-08-21 14:35:08 -07:00
Fabricio Voznika a854678bc3 Move container_test to the container package
PiperOrigin-RevId: 209655274
Change-Id: Id381114bdb3197c73e14f74b3f6cf1afd87d60cb
2018-08-21 14:02:19 -07:00
Fabricio Voznika d6d165cb0b Initial change for multi-gofer support
PiperOrigin-RevId: 209647293
Change-Id: I980fca1257ea3fcce796388a049c353b0303a8a5
2018-08-21 13:14:43 -07:00
Fabricio Voznika 0fc7b30695 Standardize mounts in tests
Tests get a readonly rootfs mapped to / (which was the case before)
and writable TEST_TMPDIR. This makes it easier to set up containers to
write to files and to share state between test and containers.

PiperOrigin-RevId: 209453224
Change-Id: I4d988e45dc0909a0450a3bb882fe280cf9c24334
2018-08-20 11:26:39 -07:00
Fabricio Voznika 11800311a5 Add nonExclusiveFS dimension to more tests
The ones using 'kvm' actually mean that they don't want overlay.

PiperOrigin-RevId: 209194318
Change-Id: I941a443cb6d783e2c80cf66eb8d8630bcacdb574
2018-08-17 13:07:09 -07:00
Fabricio Voznika da087e66cc Combine functions to search for file under one common function
Bazel adds the build type in front of directories making it hard to
refer to binaries in code.

PiperOrigin-RevId: 209010854
Change-Id: I6c9da1ac3bbe79766868a3b14222dd42d03b4ec5
2018-08-16 10:55:45 -07:00
Kevin Krakauer 635b0c4593 runsc fsgofer: Support dynamic serving of filesystems.
When multiple containers run inside a sentry, each container has its own root
filesystem and set of mounts. Containers are also added after sentry boot rather
than all configured and known at boot time.

The fsgofer needs to be able to serve the root filesystem of each container.
Thus, it must be possible to add filesystems after the fsgofer has already
started.

This change:
* Creates a URPC endpoint within the gofer process that listens for requests to
  serve new content.
* Enables the sentry, when starting a new container, to add the new container's
  filesystem.
* Mounts those new filesystems at separate roots within the sentry.

PiperOrigin-RevId: 208903248
Change-Id: Ifa91ec9c8caf5f2f0a9eead83c4a57090ce92068
2018-08-15 16:25:22 -07:00
Nicolas Lacasse 2033f61aae runsc: Fix instances of file access "proxy".
This file access type is actually called "proxy-shared", but I forgot to update
all locations.

PiperOrigin-RevId: 208832491
Change-Id: I7848bc4ec2478f86cf2de1dcd1bfb5264c6276de
2018-08-15 09:34:18 -07:00
Nicolas Lacasse e8a4f2e133 runsc: Change cache policy for root fs and volume mounts.
Previously, gofer filesystems were configured with the default "fscache"
policy, which caches filesystem metadata and contents aggressively.  While this
setting is best for performance, it means that changes from inside the sandbox
may not be immediately propagated outside the sandbox, and vice-versa.

This CL changes volumes and the root fs configuration to use a new
"remote-revalidate" cache policy which tries to retain as much caching as
possible while still making fs changes visible across the sandbox boundary.

This cache policy is enabled by default for the root filesystem. The default
value for the "--file-access" flag is still "proxy", but the behavior is
changed to use the new cache policy.

A new value for the "--file-access" flag is added, called "proxy-exclusive",
which turns on the previous aggressive caching behavior. As the name implies,
this flag should be used when the sandbox has "exclusive" access to the
filesystem.

All volume mounts are configured to use the new cache policy, since it is
safest and most likely to be correct. There is not currently a way to change
this behavior, but it's possible to add such a mechanism in the future. The
configurability is a smaller issue for volumes, since most of the expensive
application fs operations (walking + stating files) will likely be served by the
root fs.

PiperOrigin-RevId: 208735037
Change-Id: Ife048fab1948205f6665df8563434dbc6ca8cfc9
2018-08-14 16:25:58 -07:00
Nicolas Lacasse 36c940b093 Move checkpoint/restore readme to g3doc directory.
PiperOrigin-RevId: 208282383
Change-Id: Ifa4aaf5d925b17d9a0672ea951a4570d35855300
2018-08-10 15:57:49 -07:00
Brielle Broder f213a5e0fd README for Checkpoint/Restore.
PiperOrigin-RevId: 208274833
Change-Id: Iddda875a87205f7b8fa6f5c60b547522b94a6696
2018-08-10 15:08:26 -07:00
Brielle Broder 4ececd8e8d Enable checkpoint/restore in cases of UDS use.
Previously, processes which used file-system Unix Domain Sockets could not be
checkpointed in runsc because the sockets were saved with their inode
numbers which do not necessarily remain the same upon restore. Now,
the sockets are also saved with their paths so that the new inodes
can be determined for the sockets based on these paths after restoring.
Tests for cases with UDS use are included. Test cleanup to come.

PiperOrigin-RevId: 208268781
Change-Id: Ieaa5d5d9a64914ca105cae199fd8492710b1d7ec
2018-08-10 14:33:20 -07:00
Fabricio Voznika 0ac912f99e Fix runsc integration_test when using --network=host
inethost doesn't support netlink and 'ifconfig' call to retrieve IP address
fails. Look up IP address in /etc/hosts instead.

PiperOrigin-RevId: 208135641
Change-Id: I3c2ce15db6fc7c3306a45e4bfb9cc5d4423ffad3
2018-08-09 17:05:24 -07:00
Fabricio Voznika 4e171f7590 Basic support for ip link/addr and ifconfig
Closes #94

PiperOrigin-RevId: 207997580
Change-Id: I19b426f1586b5ec12f8b0cd5884d5b401d334924
2018-08-08 22:39:58 -07:00
Fabricio Voznika ea1e39a314 Resend packets back to netstack if destined to itself
Add option to redirect packet back to netstack if it's destined to itself.
This fixes the problem where connecting to the local NIC address would
not work, e.g.:
echo bar | nc -l -p 8080 &
echo foo | nc 192.168.0.2 8080

PiperOrigin-RevId: 207995083
Change-Id: I17adc2a04df48bfea711011a5df206326a1fb8ef
2018-08-08 22:03:35 -07:00
Fabricio Voznika 0d350aac7f Enable SACK in runsc
SACK is disabled by default and needs to be manually enabled. It not only
improves performance, but also fixes hangs downloading files from certain
websites.

PiperOrigin-RevId: 207906742
Change-Id: I4fb7277b67bfdf83ac8195f1b9c38265a0d51e8b
2018-08-08 10:26:18 -07:00
Fabricio Voznika cb23232c37 Fix build break in test
integration_test runs manually and breakage wasn't detected. Added test to
kokoro to ensure breakages are detected in the future.

PiperOrigin-RevId: 207772835
Change-Id: Iada81b579b558477d4db3516b38366ef6a2e933d
2018-08-07 13:48:35 -07:00
Fabricio Voznika 9752174a7f Disable KVM dimension because it's making the test flaky
PiperOrigin-RevId: 207642348
Change-Id: Iacec9f097ab93b91c0c8eea61b1347e864f57a8b
2018-08-06 18:08:25 -07:00
Fabricio Voznika bc9a1fca23 Tiny reordering to network code
PiperOrigin-RevId: 207581723
Change-Id: I6e4eb1227b5ed302de5e6c891040b670955f1eea
2018-08-06 11:48:29 -07:00
Fabricio Voznika 4c1167de4e Isolate image pulling time from container startup
mysql image test is timing out sporadically and it's hard to tell
where the slowdown is coming from.

PiperOrigin-RevId: 207147237
Change-Id: I05a4d2c116292695d63cf861f3b89cd1c54b6106
2018-08-02 12:42:07 -07:00
Ian Gudger 3cd7824410 Move stack clock to options struct
PiperOrigin-RevId: 207039273
Change-Id: Ib8f55a6dc302052ab4a10ccd70b07f0d73b373df
2018-08-01 20:22:02 -07:00
Fabricio Voznika 413bfb39a9 Use backoff package for retry logic
PiperOrigin-RevId: 206834838
Change-Id: I9a44c6fa5f4766a01f86e90810f025cefecdf2d4
2018-07-31 15:07:53 -07:00
Michael Pratt 6cad96f38a Drop dup2 filter
It is unused.

PiperOrigin-RevId: 206798328
Change-Id: I2d7d27c0e4a0ef51264b900f14f1b3fdad17f2c4
2018-07-31 11:38:57 -07:00
Brielle Broder 543c997978 Cleans up files created if there is a failure.
PiperOrigin-RevId: 206674267
Change-Id: Ifc4eb19e0882e8bed566e9c553af910925fe6ae2
2018-07-30 17:18:02 -07:00
Adin Scannell 3188859742 Make runsc visibility public.
(Why not?)

PiperOrigin-RevId: 206401282
Change-Id: Iadcb7fb8472de7aef7c4bf5182e9a1d339e4d259
2018-07-27 17:57:42 -07:00
Fabricio Voznika b8f96a9d0b Replace sleeps with waits in tests - part II
PiperOrigin-RevId: 206333130
Change-Id: Ic85874dbd53c5de2164a7bb75769d52d43666c2a
2018-07-27 10:10:10 -07:00
Fabricio Voznika e5adf42f66 Replace sleeps with waits in tests - part I
PiperOrigin-RevId: 206084473
Change-Id: I44e1b64b9cdd2964357799dca27cc0cbc19ce07d
2018-07-25 17:37:53 -07:00
Nicolas Lacasse 1129b35c92 runsc: Fix "exec" command when called without --pid-file.
When "exec" command is called without the "--detach" flag, we spawn a second
"exec" command and wait for that one to start. We use the pid file passed in
--pid-file to detect when this second command has started running.

However if "exec" is called with no --pid-file flag, this system breaks down,
as we don't have a pid file to wait for.

This CL ensures that the second instance of the "exec" command always writes a
pid-file, so the wait is successful.

PiperOrigin-RevId: 206002403
Change-Id: If9f2be31eb6e831734b1b833f25054ec71ab94a6
2018-07-25 09:11:45 -07:00
Justine Olshan b5113574fe Created a docker integration test for a tomcat image.
PiperOrigin-RevId: 205718733
Change-Id: I200b23af064d256f157baf9da5005ab16cc55928
2018-07-23 13:55:28 -07:00
Fabricio Voznika d7a34790a0 Add KVM and overlay dimensions to container_test
PiperOrigin-RevId: 205714667
Change-Id: I317a2ca98ac3bdad97c4790fcc61b004757d99ef
2018-07-23 13:31:42 -07:00
Justine Olshan f543ada150 Removed a now incorrect reference to restoreFile.
PiperOrigin-RevId: 205470108
Change-Id: I226878a887fe1133561005357a9e3b09428b06b6
2018-07-20 16:18:07 -07:00
Lantao Liu f62d6dd453 runsc: copy gateway from the pod network interface.
PiperOrigin-RevId: 205334841
Change-Id: Ia60d486f9aae70182fdc4af50cf7c915986126d7
2018-07-19 18:09:56 -07:00
Justine Olshan c05660373e Moved restore code out of create and made to be called after create.
Docker expects containers to be created before they are restored.
However, gVisor restoring requires specific actions regarding the kernel
and the file system. These actions were originally in booting the sandbox.

Now setting up the file system is deferred until a call to
runsc start. In the restore case, the kernel is destroyed and a new kernel
is created in the same process, as we need the same process for Docker.

These changes required careful execution of concurrent processes which
required the use of a channel.

Full docker integration still needs the ability to restore into the same
container.

PiperOrigin-RevId: 205161441
Change-Id: Ie1d2304ead7e06855319d5dc310678f701bd099f
2018-07-18 16:58:30 -07:00
Nicolas Lacasse e5d8f99c60 runsc: Fixes to CheckpointRestoreTest.
We must delete the output file at the beginning of the test, otherwise the test
fails immediately.

Also some minor cleanups in readOutputFile.

PiperOrigin-RevId: 205150525
Change-Id: I6bae1acd5b315320a2c6e25a59afcfc06267fb17
2018-07-18 15:46:37 -07:00
Nicolas Lacasse 9059983fdb runsc: Fix map access race in boot.Loader.waitContainer.
PiperOrigin-RevId: 204522004
Change-Id: I4819dc025f0a1df03ceaaba7951b1902d44562b3
2018-07-13 13:46:14 -07:00
Nicolas Lacasse 6dce46d4c0 Bump the timeout when waiting for python HTTP server.
PiperOrigin-RevId: 204511630
Change-Id: Ib841a7144f3833321b0e69b8585b03c4ed55a265
2018-07-13 12:34:04 -07:00
Nicolas Lacasse 67507bd579 runsc: Don't close the control server in a defer.
Closing the control server will block until all open requests have completed.
If a control server method panics, we end up stuck because the defer'd Destroy
function will never return.

PiperOrigin-RevId: 204354676
Change-Id: I6bb1d84b31242d7c3f20d5334b1c966bd6a61dbf
2018-07-12 13:36:57 -07:00
Bhasker Hariharan c15cb8d432 Automated rollback of changelist 203157739
PiperOrigin-RevId: 204196916
Change-Id: If632750fc6368acb835e22cfcee0ae55c8a04d16
2018-07-11 15:07:19 -07:00
Justine Olshan 81ae5f3df5 Created runsc and docker integration tests.
Moved some of the docker image functions to testutil.go.
Test runsc commands create, start, stop, pause, and resume.

PiperOrigin-RevId: 204138452
Change-Id: Id00bc58d2ad230db5e9e905eed942187e68e7c7b
2018-07-11 09:37:28 -07:00
Brielle Broder b763b3992a Modified error message for clarity.
Previously, the error message showed only "<nil>" when child and pid were the
same (since no error is returned by the Wait4 syscall in this case), which
occurs when the process has terminated incorrectly. A new error message
was added to improve clarity in this case. Tests for this function were
modified to reflect the improved distinction between process termination
and error.

PiperOrigin-RevId: 204018107
Change-Id: Ib38481c9590405e5bafcb6efe27fd49b3948910c
2018-07-10 14:58:12 -07:00
Justine Olshan f107a5b1a0 Tests pause and resume functionality on a Python container.
PiperOrigin-RevId: 203488336
Change-Id: I55e1b646f1fae73c27a49e064875d55f5605b200
2018-07-06 09:39:01 -07:00
Michael Pratt 660f1203ff Fix runsc VDSO mapping
80bdf8a406 accidentally moved vdso into an
inner scope, never assigning the vdso variable passed to the Kernel and
thus skipping VDSO mappings.

Fix this and remove the ability for loadVDSO to skip VDSO mappings,
since tests that do so are gone.

PiperOrigin-RevId: 203169135
Change-Id: Ifd8cadcbaf82f959223c501edcc4d83d05327eba
2018-07-03 12:53:39 -07:00
Fabricio Voznika 52ddb8571c Skip overlay on root when it's readonly
PiperOrigin-RevId: 203161098
Change-Id: Ia1904420cb3ee830899d24a4fe418bba6533be64
2018-07-03 12:01:09 -07:00
Lantao Liu 138cb8da50 runsc: `runsc wait` print wait status.
PiperOrigin-RevId: 203160639
Change-Id: I8fb2787ba0efb7eacd9d4c934238a26eb5ae79d5
2018-07-03 11:58:12 -07:00
Fabricio Voznika 0ef6066167 Resend packets back to netstack if destined to itself
Add option to redirect packet back to netstack if it's destined to itself.
This fixes the problem where connecting to the local NIC address would
not work, e.g.:
echo bar | nc -l -p 8080 &
echo foo | nc 192.168.0.2 8080

PiperOrigin-RevId: 203157739
Change-Id: I31c9f7c501e3f55007f25e1852c27893a16ac6c4
2018-07-03 11:39:17 -07:00
Fabricio Voznika c1b4c1ffee Fix flaky image_test
- Some failures were being ignored in run_tests.sh
- Give more time for mysql to set up
- Fix typo with network=host tests
- Change httpd test to wait on http server being available, not only output

PiperOrigin-RevId: 203156896
Change-Id: Ie1801dcd76e9b5fe4722c4d8695c76e40988dd74
2018-07-03 11:34:15 -07:00
Nicolas Lacasse 4500155ffc runsc: Mount "mandatory" mounts right after mounting the root.
The /proc and /sys mounts are "mandatory" in the sense that they should be
mounted in the sandbox even when they are not included in the spec. Runsc
treats /tmp similarly, because it is faster to use the internal tmpfs
implementation instead of proxying to the host.

However, the spec may contain submounts of these mandatory mounts (particularly
for /tmp). In those cases, we must mount our mandatory mounts before the
submount, otherwise the submount will be masked.

Since the mandatory mounts are all top-level directories, we can mount them
right after the root.

PiperOrigin-RevId: 203145635
Change-Id: Id69bae771d32c1a5b67e08c8131b73d9b42b2fbf
2018-07-03 10:36:22 -07:00
Dmitry Vyukov 6144751962 runsc/boot/filter: permit SYS_TIME for race
glibc's malloc also uses SYS_TIME. Permit it.

#0  0x0000000000de6267 in time ()
#1  0x0000000000db19d8 in get_nprocs ()
#2  0x0000000000d8a31a in arena_get2.part ()
#3  0x0000000000d8ab4a in malloc ()
#4  0x0000000000d3c6b5 in __sanitizer::InternalAlloc(unsigned long, __sanitizer::SizeClassAllocatorLocalCache<__sanitizer::SizeClassAllocator32<0ul, 140737488355328ull, 0ul, __sanitizer::SizeClassMap<3ul, 4ul, 8ul, 17ul, 64ul, 14ul>, 20ul, __sanitizer::TwoLevelByteMap<32768ull, 4096ull, __sanitizer::NoOpMapUnmapCallback>, __sanitizer::NoOpMapUnmapCallback> >*, unsigned long) ()
#5  0x0000000000d4cd70 in __tsan_go_start ()
#6  0x00000000004617a3 in racecall ()
#7  0x00000000010f4ea0 in runtime.findfunctab ()
#8  0x000000000043f193 in runtime.racegostart ()

Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
[mpratt@google.com: updated comments and commit message]
Signed-off-by: Michael Pratt <mpratt@google.com>

Change-Id: Ibe2d0dc3035bf5052d5fb802cfaa37c5e0e7a09a
PiperOrigin-RevId: 203042627
2018-07-02 17:47:32 -07:00
Lantao Liu 126296ce2a runsc: fix panic for `runsc wait` on stopped container.
PiperOrigin-RevId: 203016694
Change-Id: Ic51ef754aa6d7d1b3b35491aff96a63d7992e122
2018-07-02 14:52:21 -07:00
Fabricio Voznika fa64c2a151 Make default limits the same as with runc
Closes #2

PiperOrigin-RevId: 202997196
Change-Id: I0c9f6f5a8a1abe1ae427bca5f590bdf9f82a6675
2018-07-02 12:51:38 -07:00
Brielle Broder ca353b53ed Fix typo.
PiperOrigin-RevId: 202720658
Change-Id: Iff42fd23f831ee7f29ddd6eb867020b76ed1eb23
2018-06-29 15:51:32 -07:00
Justine Olshan 80bdf8a406 Sets the restore environment for restoring a container.
Updated how restoring occurs through boot.go with a separate Restore function.
This prevents a new process and new mounts from being created.
Added tests to ensure the container is restored.
Registered checkpoint and restore commands so they can be used.
Docker support for these commands is still limited.
Working on #80.

PiperOrigin-RevId: 202710950
Change-Id: I2b893ceaef6b9442b1ce3743bd112383cb92af0c
2018-06-29 14:47:40 -07:00
Brielle Broder 25e315c2e1 Added leave-running flag for checkpoint.
The leave-running flag allows the container to continue running after a
checkpoint has occurred by doing an immediate restore into a new
container with the same container ID after the old container is destroyed.

Updates #80.

PiperOrigin-RevId: 202695426
Change-Id: Iac50437f5afda018dc18b24bb8ddb935983cf336
2018-06-29 13:09:33 -07:00
Kevin Krakauer 16d37973eb runsc: Add the "wait" subcommand.
Users can now call "runsc wait <container id>" to wait on a particular process
inside the container. -pid can also be used to wait on a specific PID.

Manually tested the wait subcommand for a single waiter and multiple waiters
(simultaneously 2 processes waiting on the container and 2 processes waiting on
a PID within the container).

PiperOrigin-RevId: 202548978
Change-Id: Idd507c2cdea613c3a14879b51cfb0f7ea3fb3d4c
2018-06-28 14:56:36 -07:00
Fabricio Voznika 5a8e014c3d Add more image tests
PiperOrigin-RevId: 202537696
Change-Id: I900fe8fd36cc7a4edb44fe2d03f8ba6768db53cb
2018-06-28 13:54:04 -07:00
Fabricio Voznika bb31a11903 Wait for sandbox process when waiting for root container
Closes #71

PiperOrigin-RevId: 202532762
Change-Id: I80a446ff638672ff08e6fd853cd77e28dd05d540
2018-06-28 13:23:04 -07:00
Fabricio Voznika 8459390cdd Error out if spec is invalid
Closes #66

PiperOrigin-RevId: 202496258
Change-Id: Ib9287c5bf1279ffba1db21ebd9e6b59305cddf34
2018-06-28 09:57:27 -07:00
Fabricio Voznika 1f207de315 Add option to configure watchdog action
PiperOrigin-RevId: 202494747
Change-Id: I4d4a18e71468690b785060e580a5f83c616bd90f
2018-06-28 09:46:50 -07:00
Brielle Broder f93043615f Added MkdirAll capabilities for Checkpoint's image-path.
Now able to save the state file (checkpoint.img) at an image-path that had
previously not existed. This is important because there can only be one
checkpoint.img file per directory so this will enable users to create as many
directories as needed for proper organization.

PiperOrigin-RevId: 202360414
Change-Id: If5dd2b72e08ab52834a2b605571186d107b64526
2018-06-27 13:32:53 -07:00
Fabricio Voznika c186e408cc Add KVM, overlay and host network to image tests
PiperOrigin-RevId: 202236006
Change-Id: I4ea964a70fc49e8b51c9da27d77301c4eadaae71
2018-06-26 19:05:50 -07:00
Lantao Liu 000fd8d1e4 runsc: set gofer umask to 0.
PiperOrigin-RevId: 202185642
Change-Id: I2eefcc0b2ffadc6ef21d177a8a4ab0cda91f3399
2018-06-26 13:40:04 -07:00
Lantao Liu e8ae2b85e9 runsc: add a `multi-container` flag to enable multi-container support.
PiperOrigin-RevId: 201995800
Change-Id: I770190d135e14ec7da4b3155009fe10121b2a502
2018-06-25 12:08:44 -07:00
Fabricio Voznika cecc1e472c Fix lint errors
PiperOrigin-RevId: 201978212
Change-Id: Ie3df1fd41d5293fff66b546a0c68c3bf98126067
2018-06-25 10:41:27 -07:00
Kevin Krakauer 04bdcc7b65 runsc: Enable waiting on individual containers within a sandbox.
PiperOrigin-RevId: 201742160
Change-Id: Ia9fa1442287c5f9e1196fb117c41536a80f6bb31
2018-06-22 14:31:25 -07:00
Brielle Broder e1aee51d09 Modified Checkpoint/Restore flags to improve compatibility with Docker.
Added a number of unimplemented flags required for using runsc's
Checkpoint and Restore with Docker. Modified the "image-path" flag to
require a directory instead of a file.

PiperOrigin-RevId: 201697486
Change-Id: I55883df2f1bbc3ec3c395e0ca160ce189e5e7eba
2018-06-22 09:41:26 -07:00
Fabricio Voznika f6be5fe619 Forward SIGUSR2 to the sandbox too
SIGUSR2 was being masked out to be used as a way to dump sentry
stacks. This could cause compatibility problems in cases anyone
uses SIGUSR2 to communicate with the container init process.

PiperOrigin-RevId: 201575374
Change-Id: I312246e828f38ad059139bb45b8addc2ed055d74
2018-06-21 13:22:18 -07:00
Justine Olshan f2a687001d Added functionality to create a RestoreEnvironment.
Before a container can be restored, the mounts must be configured.
The root mount, the submounts, and their key information are compiled into a
RestoreEnvironment.
Future code will be added to set this created environment before
restoring a container.
Tests to ensure the correct environment were added.

PiperOrigin-RevId: 201544637
Change-Id: Ia894a8b0f80f31104d1c732e113b1d65a4697087
2018-06-21 10:18:11 -07:00
Brielle Broder 7d6149063a Restore implementation added to runsc.
Restore creates a new container and uses the given image-path to load a saved
image of a previous container. Restore command is plumbed through container
and sandbox. This command does not work yet - more to come.

PiperOrigin-RevId: 201541229
Change-Id: I864a14c799ce3717d99bcdaaebc764281863d06f
2018-06-21 09:58:24 -07:00
Nicolas Lacasse 81d13fbd4d runsc: Default umask should be 0.
PiperOrigin-RevId: 201539050
Change-Id: I36cbf270fa5ad25de507ecb919e4005eda6aa16d
2018-06-21 09:43:15 -07:00
Ian Gudger ef4f239c79 Fix typo in runsc gofer flag description
PiperOrigin-RevId: 201529295
Change-Id: I55eb516ec6d14fbcd48593a3d61f724adc253a23
2018-06-21 08:34:51 -07:00
Fabricio Voznika 95cb01e0a9 Reduce test sleep time
PiperOrigin-RevId: 201428433
Change-Id: I72de1e46788ec84f61513416bb690956e515907e
2018-06-20 15:32:15 -07:00
Fabricio Voznika 2f59ba0e2d Include image test as part of kokoro tests
PiperOrigin-RevId: 201427731
Change-Id: I5cbee383ec51c02b7892ec7812cbbdc426be8991
2018-06-20 15:28:12 -07:00
Fabricio Voznika 2b5bdb525e Add end-to-end image tests
PiperOrigin-RevId: 201418619
Change-Id: I7961b027394d98422642f829bc54745838c138bd
2018-06-20 14:38:45 -07:00
Fabricio Voznika 4ad7315b67 Add 'runsc debug' command
It prints sandbox stacks to the log to help debug stuckness. I expect
that many more options will be added in the future.

PiperOrigin-RevId: 201405931
Change-Id: I87e560800cd5a5a7b210dc25a5661363c8c3a16e
2018-06-20 13:31:31 -07:00
Fabricio Voznika af6f9f56f8 Add tool to configure runtime settings in docker
This will be used with the upcoming e2e image tests.

PiperOrigin-RevId: 201400832
Change-Id: I49509314e16ea54655ea8060dbf511a04a7a8f79
2018-06-20 13:01:16 -07:00
Kevin Krakauer 5397963b5d runsc: Enable container creation within existing sandboxes.
Containers are created as processes in the sandbox. Of the many things that
don't work yet, the biggest issue is that the fsgofer is launched with its root
as the sandbox's root directory. Thus, when a container is started and wants to
read anything (including the init binary of the container), the gofer tries to
serve from sandbox's root (which basically just has pause), not the container's.

PiperOrigin-RevId: 201294560
Change-Id: I6423aa8830538959c56ae908ce067e4199d627b1
2018-06-19 21:44:33 -07:00
Kevin Krakauer 3ebd0e35f4 runsc: Whitelist lstat, as it is now used in specutils.
When running multi-container, child containers are added after the filters have
been installed. Thus, lstat must be in the set of allowed syscalls.

PiperOrigin-RevId: 201269550
Change-Id: I03f2e6675a53d462ed12a0f651c10049b76d4c52
2018-06-19 17:17:41 -07:00
Kevin Krakauer 33f29c730f runsc: Fix flakey container_test.
Verified that this is no longer flakey over 10K repetitions.

PiperOrigin-RevId: 201267499
Change-Id: I793c916fe725412aec25953f764cb4f52c9fbed3
2018-06-19 17:04:51 -07:00
Justine Olshan a6dbef045f Added a resume command to unpause a paused container.
Resume checks the status of the container and unpauses the kernel
if its status is paused. Otherwise nothing happens.
Tests were added to ensure that the process is in the correct state
after various commands.

PiperOrigin-RevId: 201251234
Change-Id: Ifd11b336c33b654fea6238738f864fcf2bf81e19
2018-06-19 15:23:36 -07:00
Justine Olshan 873ec0c414 Modified boot.go to allow for restores.
A file descriptor was added as a flag to boot so a state file can restore a
container that was checkpointed.

PiperOrigin-RevId: 201068699
Change-Id: I18e96069488ffa3add468861397f3877725544aa
2018-06-18 15:20:36 -07:00
Lantao Liu f3727528e5 runsc: support symlink to the exec path.
PiperOrigin-RevId: 201049912
Change-Id: Idd937492217a4c2ca3d59c602e41576a3b203dd9
2018-06-18 13:37:59 -07:00
Lantao Liu 821aaf531d runsc: support "rw" mount option.
PiperOrigin-RevId: 201018483
Change-Id: I52fe3d01c83c8a2f0e9275d9d88c37e46fa224a2
2018-06-18 10:34:11 -07:00
Fabricio Voznika 775982ed4b Automated rollback of changelist 200770591
PiperOrigin-RevId: 201012131
Change-Id: I5cd69e795555129319eb41135ecf26db9a0b1fcb
2018-06-18 10:00:04 -07:00
Justine Olshan 0786707cd9 Added code for a pause command for a container process.
Like runc, the pause command will pause the processes of the given container.
It will set that container's status to "paused."
A resume command will be added to unpause and continue running the process.

PiperOrigin-RevId: 200789624
Change-Id: I72a5d7813d90ecfc4d01cc252d6018855016b1ea
2018-06-15 16:09:09 -07:00
Kevin Krakauer 437890dc4b runsc: Make gofer logs show up in test output.
PiperOrigin-RevId: 200770591
Change-Id: Ifc096d88615b63135210d93c2b4cee2eaecf1eee
2018-06-15 14:07:54 -07:00
Lantao Liu 2081c5e7f7 runsc: support /dev bind mount which does not conflict with default /dev mount.
PiperOrigin-RevId: 200768923
Change-Id: I4b8da10bcac296e8171fe6754abec5aabfec5e65
2018-06-15 13:58:39 -07:00
Dmitry Vyukov 52110bfc33 runsc/cmd: fix kill signal parsing
Signal is arg 1, not 2.
Killing with SIGABRT is useful to get Go traces.

Signed-off-by: Dmitry Vyukov <dvyukov@google.com>

Change-Id: I0b78e34a9de3fb3385108e26fdb4ff6e9347aeff
PiperOrigin-RevId: 200742743
2018-06-15 11:06:07 -07:00
Fabricio Voznika ef5dd4df9b Set kernel.applicationCores to the number of processors on the host
The right number to use is the number of processors assigned to the cgroup. But until
we make the sandbox join the respective cgroup, just use the number of processors on
the host.

Closes #65, closes #66

PiperOrigin-RevId: 200725483
Change-Id: I34a566b1a872e26c66f56fa6e3100f42aaf802b1
2018-06-15 09:19:04 -07:00
Brielle Broder bd1e83ff60 Fix typo.
PiperOrigin-RevId: 200631795
Change-Id: I297fe3e30fb06b04fccd8358c933e45019dcc1fa
2018-06-14 15:45:10 -07:00
Michael Pratt d71f5ef688 Add nanosleep filter for Go 1.11 support
golang.org/cl/108538 replaces pselect6 with nanosleep in runtime.usleep. Update
the filters accordingly.

PiperOrigin-RevId: 200574612
Change-Id: Ifb2296fcb3781518fc047aabbbffedb9ae488cd7
2018-06-14 10:11:05 -07:00
Fabricio Voznika 717f2501c9 Fix failure to mount volume that sandbox process has no access
Boot loader tries to stat mount to determine whether it's a file or not. This
may fail if the sandbox process doesn't have access to the file. Instead, add
overlay on top of file, which is better anyway since we don't want to propagate
changes to the host.

PiperOrigin-RevId: 200411261
Change-Id: I14222410e8bc00ed037b779a1883d503843ffebb
2018-06-13 10:20:06 -07:00
Lantao Liu 2506b9b11f runsc: do not include sub target if it is not started with '/'.
PiperOrigin-RevId: 200274828
Change-Id: I956703217df08d8650a881479b7ade8f9f119912
2018-06-12 13:54:54 -07:00
Brielle Broder 711a9869e5 Runsc checkpoint works.
This is the first iteration of checkpoint that actually saves to a file.
Tests for checkpoint are included.

Ran into an issue when private unix sockets are enabled. An error message
was added for this case and the mutex state was set.

PiperOrigin-RevId: 200269470
Change-Id: I28d29a9f92c44bf73dc4a4b12ae0509ee4070e93
2018-06-12 13:25:23 -07:00
Kevin Krakauer 2dc9cd7bf7 runsc: enable terminals in the sandbox.
runsc now mounts the devpts filesystem, so you get a real terminal using
ssh+sshd.

PiperOrigin-RevId: 200244830
Change-Id: If577c805ad0138fda13103210fa47178d8ac6605
2018-06-12 11:03:25 -07:00
Fabricio Voznika 48335318a2 Enable debug logging in tests
Unit tests call runsc directly now, so all command line arguments
are valid. On the other hand, enabling debug in the test binary
doesn't affect runsc. It needs to be set in the config.

PiperOrigin-RevId: 200237706
Change-Id: I0b5922db17f887f58192dbc2f8dd2fd058b76ec7
2018-06-12 10:25:55 -07:00
Fabricio Voznika 5c51bc51e4 Drop capabilities not needed by Gofer
PiperOrigin-RevId: 199808391
Change-Id: Ib37a4fb6193dc85c1f93bc16769d6aa41854b9d4
2018-06-08 09:59:26 -07:00
Kevin Krakauer 206e90d057 runsc: Support abbreviated container IDs.
Just a UI/usability addition. It's a lot easier to type "60" than
"60185c721d7e10c00489f1fa210ee0d35c594873d6376b457fb1815e4fdbfc2c".

PiperOrigin-RevId: 199547932
Change-Id: I19011b5061a88aba48a9ad7f8cf954a6782de854
2018-06-06 16:13:53 -07:00
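Abbreviated IDs as introduced in 206e90d057 amount to a unique-prefix lookup. The sketch below shows one way to do it. `resolveID` and the two error values are hypothetical names, and the choice to reject ambiguous or empty prefixes is an assumption of this example rather than documented runsc behavior.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

var (
	errNotFound  = errors.New("no container matches the given ID")
	errAmbiguous = errors.New("ID prefix matches more than one container")
)

// resolveID expands an abbreviated ID to a full container ID. An exact match
// always wins; otherwise the prefix must match exactly one known ID.
func resolveID(abbrev string, ids []string) (string, error) {
	if abbrev == "" {
		return "", errNotFound
	}
	var matches []string
	for _, id := range ids {
		if id == abbrev {
			return id, nil
		}
		if strings.HasPrefix(id, abbrev) {
			matches = append(matches, id)
		}
	}
	switch len(matches) {
	case 0:
		return "", errNotFound
	case 1:
		return matches[0], nil
	default:
		return "", errAmbiguous
	}
}

func main() {
	ids := []string{
		"60185c721d7e10c00489f1fa210ee0d35c594873d6376b457fb1815e4fdbfc2c",
		"7a0000000000000000000000000000000000000000000000000000000000beef", // made-up ID
	}
	fmt.Println(resolveID("60", ids))
}
```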
Googler 0c34b460f2 Add runsc checkpoint command.
Checkpoint command is plumbed through container and sandbox.
Restore has also been added but it is only a stub. None of this
works yet. More changes to come.

PiperOrigin-RevId: 199510105
Change-Id: Ibd08d57f4737847eb25ca20b114518e487320185
2018-06-06 12:31:53 -07:00
Googler 722275c3d1 Added a function to the controller to checkpoint a container.
Functionality for checkpoint is not complete, more to come.

PiperOrigin-RevId: 199500803
Change-Id: Iafb0fcde68c584270000fea898e6657a592466f7
2018-06-06 11:43:55 -07:00
Fabricio Voznika 19a0e83b50 Make fsgofer attach more strict
Refuse to mount paths with "." and ".." in the path to prevent
a compromised Sentry from mounting "../../secrets". Only allow
Attach to be called once per mount point.

PiperOrigin-RevId: 199225929
Change-Id: I2a3eb7ea0b23f22eb8dde2e383e32563ec003bd5
2018-06-04 18:04:54 -07:00
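The two checks described in 19a0e83b50 can be sketched as follows. This is not the fsgofer implementation: `attachPoint`, `validatePath`, and `Attach` here are hypothetical names, and the sketch only models the stated rules (no "." or ".." path components, and a single attach per mount point).

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// attachPoint is a hypothetical model of one mount point served by a gofer.
type attachPoint struct {
	mu       sync.Mutex
	attached bool
}

// validatePath rejects any path that has a "." or ".." component.
func validatePath(p string) error {
	for _, c := range strings.Split(p, "/") {
		if c == "." || c == ".." {
			return fmt.Errorf("path %q contains %q", p, c)
		}
	}
	return nil
}

// Attach validates the path and allows only one successful attach.
func (a *attachPoint) Attach(p string) error {
	if err := validatePath(p); err != nil {
		return err
	}
	a.mu.Lock()
	defer a.mu.Unlock()
	if a.attached {
		return fmt.Errorf("attach point already attached: %q", p)
	}
	a.attached = true
	return nil
}

func main() {
	a := &attachPoint{}
	fmt.Println(a.Attach("../../secrets")) // rejected: contains ".."
	fmt.Println(a.Attach("/rootfs"))       // first attach succeeds
	fmt.Println(a.Attach("/rootfs"))       // second attach is refused
}
```

Checking components rather than substrings keeps legitimate names such as `..cache` usable while still refusing traversal.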
Fabricio Voznika 6c585b8eb6 Create destination mount dir if it doesn't exist
PiperOrigin-RevId: 199175296
Change-Id: I694ad1cfa65572c92f77f22421fdcac818f44630
2018-06-04 12:31:35 -07:00
Fabricio Voznika 78ccd1298e Return 'running' if gofer is still alive
Containerd will start deleting container and rootfs after container
is stopped. However, if gofer is still running, rootfs cleanup will
fail because of device busy.

This CL makes sure that gofer is not running when container state is
stopped.

Change from: lantaol@google.com

PiperOrigin-RevId: 199172668
Change-Id: I9d874eec3ecf74fd9c8edd7f62d9f998edef66fe
2018-06-04 12:14:23 -07:00
Fabricio Voznika 55a37ceef1 Fix leaky FD
9P socket was being created without CLOEXEC and was being inherited
by the children. This would prevent the gofer from detecting that the
sandbox had exited, because the socket would not be closed.

PiperOrigin-RevId: 199168959
Change-Id: I3ee1a07cbe7331b0aeb1cf2b697e728ce24f85a7
2018-06-04 11:52:17 -07:00
Fabricio Voznika a0e2126be4 Refactor container_test in preparation for sandbox_test
Common code to setup and run sandbox is moved to testutil. Also, don't
link "boot" and "gofer" commands with test binary. Instead, use runsc
binary from the build. This not only makes the test setup simpler, but
also resolves a dependency issue with sandbox_tests not depending on
container package.

PiperOrigin-RevId: 199164478
Change-Id: I27226286ca3f914d4d381358270dd7d70ee8372f
2018-06-04 11:26:30 -07:00
Zhengyu He d1ca50d49e Add SyscallRules that supports argument filtering
PiperOrigin-RevId: 198919043
Change-Id: I7f1f0a3b3430cd0936a4ee4fc6859aab71820bdf
2018-06-01 13:40:52 -07:00
Fabricio Voznika 65dadc0029 Ignores IPv6 addresses when configuring network
Closes #60

PiperOrigin-RevId: 198887885
Change-Id: I9bf990ee3fde9259836e57d67257bef5b85c6008
2018-06-01 10:09:37 -07:00
Fabricio Voznika 812e83d3bb Suppress error when deleting non-existing container with --force
This addresses the first issue reported in #59. CRI-O expects runsc to
return success to delete when --force is used with a non-existing container.

PiperOrigin-RevId: 198487418
Change-Id: If7660e8fdab1eb29549d0a7a45ea82e20a1d4f4a
2018-05-29 17:58:12 -07:00
Fabricio Voznika e48f707876 Configure sandbox as superuser
Container user might not have enough privilege to walk directories and
mount filesystems. Instead, create superuser to perform these steps of
the configuration.

PiperOrigin-RevId: 197953667
Change-Id: I643650ab654e665408e2af1b8e2f2aa12d58d4fb
2018-05-24 14:27:57 -07:00
Fabricio Voznika ed2b86a549 Fix test failure when user can't mount temp dir
PiperOrigin-RevId: 197491098
Change-Id: Ifb75bd4e4f41b84256b6d7afc4b157f6ce3839f3
2018-05-21 17:48:04 -07:00
Rahat Mahmood 8878a66a56 Implement sysv shm.
PiperOrigin-RevId: 197058289
Change-Id: I3946c25028b7e032be4894d61acb48ac0c24d574
2018-05-17 15:06:19 -07:00
Nicolas Lacasse 31386185fe Push signal-delivery and wait into the sandbox.
This is another step towards multi-container support.

Previously, we delivered signals directly to the sandbox process (which then
forwarded the signal to PID 1 inside the sandbox). Similarly, we waited on a
container by waiting on the sandbox process itself. This approach will not work
when there are multiple containers inside the sandbox, and we need to
signal/wait on individual containers.

This CL adds two new messages, ContainerSignal and ContainerWait. These
messages include the id of the container to signal/wait. The controller inside
the sandbox receives these messages and signals/waits on the appropriate
process inside the sandbox.

The container id is plumbed into the sandbox, but it currently is not used. We
still end up signaling/waiting on PID 1 in all cases.  Once we actually have
multiple containers inside the sandbox, we will need to keep some sort of map
of container id -> pid (or possibly pid namespace), and signal/kill the
appropriate process for the container.

PiperOrigin-RevId: 197028366
Change-Id: I07b4d5dc91ecd2affc1447e6b4bdd6b0b7360895
2018-05-17 11:55:28 -07:00
Nicolas Lacasse 205f1027e6 Refactor the Sandbox package into Sandbox + Container.
This is a necessary prerequisite for supporting multiple containers in a single
sandbox.

All the commands (in cmd package) now call operations on Containers (container
package). When a Container first starts, it will create a Sandbox with the same
ID.

The Sandbox class is now simpler, as it only knows how to create boot/gofer
processes, and how to forward commands into the running boot process.

There are TODOs sprinkled around for additional support for multiple
containers. Most notably, we need to detect when a container is intended to run
in an existing sandbox (by reading the metadata), and then have some way to
signal to the sandbox to start a new container. Other urpc calls into the
sandbox need to pass the container ID, so the sandbox can run the operation on
the given container. These are only half-plumbed through right now.

PiperOrigin-RevId: 196688269
Change-Id: I1ecf4abbb9dd8987a53ae509df19341aaf42b5b0
2018-05-15 10:18:03 -07:00
Fabricio Voznika 7cff8489de Fix failure to rename directory
os.Rename validates that the target doesn't exist, which is different from
syscall.Rename, which replaces the target if both are directories. fsgofer needs
the syscall behavior.

PiperOrigin-RevId: 196194630
Change-Id: I87d08cad88b5ef310b245cd91647c4f5194159d8
2018-05-10 17:13:10 -07:00
Chanwit Kaewkasi 7b6111b695 Display the current git revision in the info block
Change-Id: I9737cc680968033ba82c95bb04cc482fcaa12642
PiperOrigin-RevId: 196192683
2018-05-10 16:57:41 -07:00
Fabricio Voznika ac01f245ff Skip atime and mtime update when file is backed by host FD
When file is backed by host FD, atime and mtime for the host file and the
cached attributes in the Sentry must be close together. In this case,
the call to update atime and mtime can be skipped. This is important when
host filesystem is using overlay because updating atime and mtime explicitly
forces a copy up for every file that is touched.

PiperOrigin-RevId: 196176413
Change-Id: I3933ea91637a071ba2ea9db9d8ac7cdba5dc0482
2018-05-10 14:59:40 -07:00
Fabricio Voznika 5a509c47a2 Open file as read-write when mount points to a file
This is to allow files mapped directly, like /etc/hosts, to be writable.
Closes #40

PiperOrigin-RevId: 196155920
Change-Id: Id2027e421cef5f94a0951c3e18b398a77c285bbd
2018-05-10 12:38:36 -07:00
Nicolas Lacasse 1bdec86bae Return better errors from Docker when runsc fails to start.
Two changes in this CL:

First, make the "boot" process sleep when it encounters an error to give the
controller time to send the error back to the "start" process. Otherwise the
"boot" process exits immediately and the control connection errors with EOF.

Secondly, open the log file with O_APPEND, not O_TRUNC. Docker uses the same
log file for all runtime commands, and setting O_TRUNC causes them to get
destroyed. Furthermore, containerd parses these log files in the event of an
error, and it does not like the file being truncated out from underneath it.

Now, when trying to run a binary that does not exist in the image, the error
message is more reasonable:

$ docker run alpine /not/found
docker: Error response from daemon: OCI runtime start failed: /usr/local/google/docker/runtimes/runscd did not terminate sucessfully: error starting sandbox: error starting application [/not/found]: failed to create init process: no such file or directory

Fixes #32

PiperOrigin-RevId: 196027084
Change-Id: Iabc24c0bdd8fc327237acc051a1655515f445e68
2018-05-09 14:13:37 -07:00
Nicolas Lacasse 32cabad8da Use the containerd annotation instead of detecting the "pause" application.
FIXED=72380268
PiperOrigin-RevId: 195846596
Change-Id: Ic87fed1433482a514631e1e72f5ee208e11290d1
2018-05-08 11:11:50 -07:00
Fabricio Voznika e1b412d660 Error if container requires AppArmor, SELinux or seccomp
Closes #35

PiperOrigin-RevId: 195840128
Change-Id: I31c1ad9b51ec53abb6f0b485d35622d4e9764b29
2018-05-08 10:34:11 -07:00
Ian Gudger 7c8c3705ea Fix misspellings
PiperOrigin-RevId: 195742598
Change-Id: Ibd4a8e4394e268c87700b6d1e50b4b37dfce5182
2018-05-07 16:38:01 -07:00
Ian Gudger f47174f06b Run gofmt -s on everything
PiperOrigin-RevId: 195469901
Change-Id: I66d5c7a334bbb8b47e40d266a2661291c2d91c7f
2018-05-04 14:16:11 -07:00
Fabricio Voznika c90fefc116 Fix runsc capabilities
There was a typo and one new capability missing from the list

PiperOrigin-RevId: 195427713
Change-Id: I6d9e1c6e77b48fe85ef10d9f54c70c8a7271f6e7
2018-05-04 09:39:28 -07:00
Fabricio Voznika c186ebb62a Return error when child exits early
PiperOrigin-RevId: 195365050
Change-Id: I8754dc7a3fc2975d422cae453762a455478a8e6a
2018-05-03 21:09:31 -07:00
Cyrille Hemidy 04b79137ba Fix misspellings.
PiperOrigin-RevId: 195307689
Change-Id: I499f19af49875a43214797d63376f20ae788d2f4
2018-05-03 14:06:13 -07:00
Fabricio Voznika a61def1b36 Remove detach for exec options
Detachable exec commands are handled in the client entirely and the detach option is not used anymore.

PiperOrigin-RevId: 195181272
Change-Id: I6e82a2876d2c173709c099be59670f71702e5bf0
2018-05-02 17:40:01 -07:00
Ian Gudger eb5414ee29 Add support for ping sockets
PiperOrigin-RevId: 195049322
Change-Id: I09f6dd58cf10a2e50e53d17d2823d540102913c5
2018-05-01 22:51:41 -07:00
Ian Gudger 3d3deef573 Implement SO_TIMESTAMP
PiperOrigin-RevId: 195047018
Change-Id: I6d99528a00a2125f414e1e51e067205289ec9d3d
2018-05-01 22:11:49 -07:00
Fabricio Voznika 5eab7a41a3 Remove stale TODO
PiperOrigin-RevId: 194949678
Change-Id: I60a30c4bb7418e17583c66f437273fd17e9e99ba
2018-05-01 09:45:45 -07:00
Googler d02b74a5dc Check in gVisor.
PiperOrigin-RevId: 194583126
Change-Id: Ica1d8821a90f74e7e745962d71801c598c652463
2018-04-28 01:44:26 -04:00