Commit Graph

71 Commits

Author SHA1 Message Date
Lantao Liu 4cd4b60352 runsc: generate exec pidfile after everything is ready.
PiperOrigin-RevId: 221123160
Change-Id: Ia7061d60d114d69f49aba853fe6bae3c733522b5
2018-11-12 11:12:39 -08:00
Fabricio Voznika 5cd55cd90f Use spec with clean paths for gofer
Otherwise the gofer's attach point may be different from sandbox when there
symlinks in the path.

PiperOrigin-RevId: 219730492
Change-Id: Ia9c4c2d16228c6a1a9e790e0cb673fd881003fe1
2018-11-01 17:52:11 -07:00
Fabricio Voznika ccc3d7ca11 Make lazy open the mode of operation for fsgofer
With recent changes to 9P server, path walks are now safe inside
open, create, rename and setattr calls. To simplify the code, remove
the lazyopen=false mode that was used for bind mounts, and converge
all mounts to using lazy open.

PiperOrigin-RevId: 219508628
Change-Id: I073e7e1e2e9a9972d150eaf4cb29e553997a9b76
2018-10-31 11:28:27 -07:00
Adin Scannell 75cd70ecc9 Track paths and provide a rename hook.
This change also adds extensive testing to the p9 package via mocks. The sanity
checks and type checks are moved from the gofer into the core package, where
they can be more easily validated.

PiperOrigin-RevId: 218296768
Change-Id: I4fc3c326e7bf1e0e140a454cbacbcc6fd617ab55
2018-10-23 00:20:15 -07:00
Ian Gudger 8fce67af24 Use correct company name in copyright header
PiperOrigin-RevId: 217951017
Change-Id: Ie08bf6987f98467d07457bcf35b5f1ff6e43c035
2018-10-19 16:35:11 -07:00
Fabricio Voznika f3ffa4db52 Resolve mount paths while setting up root fs mount
It's hard to resolve symlinks inside the sandbox because rootfs and mounts
may be read-only, forcing us to create mount points inside lower layer of an
overlay, **before** the volumes are mounted.

Since the destination must already be resolved outside the sandbox when creating
mounts, take this opportunity to rewrite the spec with paths resolved.
"runsc boot" will use the "resolved" spec to load mounts. In addition, symlink
traversals were disabled while mounting containers inside the sandbox.

It haven't been able to write a good test for it. So I'm relying on manual tests
for now.

PiperOrigin-RevId: 217749904
Change-Id: I7ac434d5befd230db1488446cda03300cc0751a9
2018-10-18 12:42:24 -07:00
Kevin Krakauer 9b3550f70b runsc: Add --pid flag to runsc kill.
--pid allows specific processes to be signalled rather than the container root
process or all processes in the container. containerd needs to SIGKILL exec'd
processes that timeout and check whether processes are still alive.

PiperOrigin-RevId: 217547636
Change-Id: I2058ebb548b51c8eb748f5884fb88bad0b532e45
2018-10-17 10:51:39 -07:00
Ian Lewis a771775f3a Added spec command to create OCI spec config.json
The spec command is analygous to the 'runc spec' command and allows for
the convenient creation of a config.json file for users that don't have
runc handy.

Change-Id: Ifdfec37e023048ea461c32da1a9042a45b37d856
PiperOrigin-RevId: 216907826
2018-10-12 12:59:49 -07:00
Fabricio Voznika f413e4b117 Add bare bones unsupported syscall logging
This change introduces a new flags to create/run called
--user-log. Logs to this files are visible to users and
are meant to help debugging problems with their images
and containers.

For now only unsupported syscalls are sent to this log,
and only minimum support was added. We can build more
infrastructure around it as needed.

PiperOrigin-RevId: 216735977
Change-Id: I54427ca194604991c407d49943ab3680470de2d0
2018-10-11 11:56:54 -07:00
Fabricio Voznika 29cd05a7c6 Add sandbox to cgroup
Sandbox creation uses the limits and reservations configured in the
OCI spec and set cgroup options accordinly. Then it puts both the
sandbox and gofer processes inside the cgroup.

It also allows the cgroup to be pre-configured by the caller. If the
cgroup already exists, sandbox and gofer processes will join the
cgroup but it will not modify the cgroup with spec limits.

PiperOrigin-RevId: 216538209
Change-Id: If2c65ffedf55820baab743a0edcfb091b89c1019
2018-10-10 09:00:42 -07:00
Fabricio Voznika 20508bafb8 Add tests to verify gofer is chroot'ed
PiperOrigin-RevId: 216472439
Change-Id: Ic4cb86c8e0a9cb022d3ceed9dc5615266c307cf9
2018-10-09 21:07:14 -07:00
Nicolas Lacasse e215b9970a runsc: Pass root container's stdio via FD.
We were previously using the sandbox process's stdio as the root container's
stdio. This makes it difficult/impossible to distinguish output application
output from sandbox output, such as panics, which are always written to stderr.

Also close the console socket when we are done with it.

PiperOrigin-RevId: 215585180
Change-Id: I980b8c69bd61a8b8e0a496fd7bc90a06446764e0
2018-10-03 10:32:03 -07:00
Nicolas Lacasse f1c01ed886 runsc: Support job control signals in "exec -it".
Terminal support in runsc relies on host tty file descriptors that are imported
into the sandbox. Application tty ioctls are sent directly to the host fd.

However, those host tty ioctls are associated in the host kernel with a host
process (in this case runsc), and the host kernel intercepts job control
characters like ^C and send signals to the host process. Thus, typing ^C into a
"runsc exec" shell will send a SIGINT to the runsc process.

This change makes "runsc exec" handle all signals, and forward them into the
sandbox via the "ContainerSignal" urpc method. Since the "runsc exec" is
associated with a particular container process in the sandbox, the signal must
be associated with the same container process.

One big difficulty is that the signal should not necessarily be sent to the
sandbox process started by "exec", but instead must be sent to the foreground
process group for the tty. For example, we may exec "bash", and from bash call
"sleep 100". A ^C at this point should SIGINT sleep, not bash.

To handle this, tty files inside the sandbox must keep track of their
foreground process group, which is set/get via ioctls. When an incoming
ContainerSignal urpc comes in, we look up the foreground process group via the
tty file. Unfortunately, this means we have to expose and cache the tty file in
the Loader.

Note that "runsc exec" now handles signals properly, but "runs run" does not.
That will come in a later CL, as this one is complex enough already.

Example:
	root@:/usr/local/apache2# sleep 100
	^C

	root@:/usr/local/apache2# sleep 100
	^Z
	[1]+  Stopped                 sleep 100

	root@:/usr/local/apache2# fg
	sleep 100
	^C

	root@:/usr/local/apache2#

PiperOrigin-RevId: 215334554
Change-Id: I53cdce39653027908510a5ba8d08c49f9cf24f39
2018-10-01 22:06:56 -07:00
Fabricio Voznika 9c7eb13079 Removed duplicate/stale TODOs
PiperOrigin-RevId: 215162121
Change-Id: I35f06ac3235cf31c9e8a158dcf6261a7ded6c4c4
2018-09-30 22:22:18 -07:00
Fabricio Voznika 2496d9b4b6 Make runsc kill and delete more conformant to the "spec"
PiperOrigin-RevId: 214976251
Change-Id: I631348c3886f41f63d0e77e7c4f21b3ede2ab521
2018-09-28 12:22:21 -07:00
Fabricio Voznika cf226d48ce Switch to root in userns when CAP_SYS_CHROOT is also missing
Some tests check current capabilities and re-run the tests as root inside
userns if required capabibilities are missing. It was checking for
CAP_SYS_ADMIN only, CAP_SYS_CHROOT is also required now.

PiperOrigin-RevId: 214949226
Change-Id: Ic81363969fa76c04da408fae8ea7520653266312
2018-09-28 09:44:13 -07:00
Fabricio Voznika 491faac03b Implement 'runsc kill --all'
In order to implement kill --all correctly, the Sentry needs
to track all tasks that belong to a given container. This change
introduces ContainerID to the task, that gets inherited by all
children. 'kill --all' then iterates over all tasks comparing the
ContainerID field to find all processes that need to be signalled.

PiperOrigin-RevId: 214841768
Change-Id: I693b2374be8692d88cc441ef13a0ae34abf73ac6
2018-09-27 15:00:58 -07:00
Fabricio Voznika b514ab0589 Refactor 'runsc boot' to take container ID as argument
This makes the flow slightly simpler (no need to call
Loader.SetRootContainer). And this is required change to tag
tasks with container ID inside the Sentry.

PiperOrigin-RevId: 214795210
Change-Id: I6ff4af12e73bb07157f7058bb15fd5bb88760884
2018-09-27 10:26:34 -07:00
Lantao Liu a003e041c8 runsc: fix pid file race condition in exec detach mode.
PiperOrigin-RevId: 214700295
Change-Id: I73d8490572eebe5da584af91914650d1953aeb91
2018-09-26 17:41:20 -07:00
Fabricio Voznika e395273301 Fix sandbox and gofer capabilities
Capabilities.Set() adds capabilities,
but doesn't remove existing ones that might have been loaded. Fixed
the code and added tests.

PiperOrigin-RevId: 213726369
Change-Id: Id7fa6fce53abf26c29b13b9157bb4c6616986fba
2018-09-19 17:15:14 -07:00
Fabricio Voznika 7967d8ecd5 Handle children processes better in tests
Reap children more systematically in container tests. Previously,
container_test was taking ~5 mins to run because constainer.Destroy()
would timeout waiting for the sandbox process to exit. Now the test
running in less than a minute.

Also made the contract around Container and Sandbox destroy clearer.

PiperOrigin-RevId: 213527471
Change-Id: Icca84ee1212bbdcb62bdfc9cc7b71b12c6d1688d
2018-09-18 15:21:28 -07:00
Kevin Krakauer bb88c187c5 runsc: Enable waiting on exited processes.
This makes `runsc wait` behave more like waitpid()/wait4() in that:
- Once a process has run to completion, you can wait on it and get its exit
  code.
- Processes not waited on will consume memory (like a zombie process)

PiperOrigin-RevId: 213358916
Change-Id: I5b5eca41ce71eea68e447380df8c38361a4d1558
2018-09-17 16:25:24 -07:00
Lantao Liu bde2a91433 runsc: Support container signal/wait.
This CL:
1) Fix `runsc wait`, it now also works after the container exits;
2) Generate correct container state in Load;
2) Make sure `Destory` cleanup everything before successfully return.

PiperOrigin-RevId: 212900107
Change-Id: Ie129cbb9d74f8151a18364f1fc0b2603eac4109a
2018-09-13 16:38:03 -07:00
Kevin Krakauer 2eff1fdd06 runsc: Add exec flag that specifies where to save the sandbox-internal pid.
This is different from the existing -pid-file flag, which saves a host pid.

PiperOrigin-RevId: 212713968
Change-Id: I2c486de8dd5cfd9b923fb0970165ef7c5fc597f0
2018-09-12 15:23:35 -07:00
Nicolas Lacasse 6cc9b311af platform: Pass device fd into platform constructor.
We were previously openining the platform device (i.e. /dev/kvm) inside the
platfrom constructor (i.e. kvm.New).  This requires that we have RW access to
the platform device when constructing the platform.

However, now that the runsc sandbox process runs as user "nobody", it is not
able to open the platform device.

This CL changes the kvm constructor to take the platform device FD, rather than
opening the device file itself. The device file is opened outside of the
sandbox and passed to the sandbox process.

PiperOrigin-RevId: 212505804
Change-Id: I427e1d9de5eb84c84f19d513356e1bb148a52910
2018-09-11 13:09:46 -07:00
Fabricio Voznika bc81f3fe4a Remove '--file-access=direct' option
It was used before gofer was implemented and it's not
supported anymore.
BREAKING CHANGE: proxy-shared and proxy-exclusive options
are now: shared and exclusive.

PiperOrigin-RevId: 212017643
Change-Id: If029d4073fe60583e5ca25f98abb2953de0d78fd
2018-09-07 12:28:48 -07:00
Lantao Liu 4f3053cb4e runsc: do not delete in paused state.
PiperOrigin-RevId: 211835570
Change-Id: Ied7933732cad5bc60b762e9c964986cb49a8d9b9
2018-09-06 11:06:19 -07:00
Kevin Krakauer 8f0b6e7fc0 runsc: Support runsc kill multi-container.
Now, we can kill individual containers rather than the entire sandbox.

PiperOrigin-RevId: 211748106
Change-Id: Ic97e91db33d53782f838338c4a6d0aab7a313ead
2018-09-05 21:14:56 -07:00
Fabricio Voznika 5f0002fc83 Use container's capabilities in exec
When no capabilities are specified in exec, use the
container's capabilities to match runc's behavior.

PiperOrigin-RevId: 211735186
Change-Id: Icd372ed64410c81144eae94f432dffc9fe3a86ce
2018-09-05 18:32:50 -07:00
Nicolas Lacasse f96b33c73c runsc: Promote getExecutablePathInternal to getExecutablePath.
Remove GetExecutablePath (the non-internal version).  This makes path handling
more consistent between exec, root, and child containers.

The new getExecutablePath now uses MountNamespace.FindInode, which is more
robust than Walking the Dirent tree ourselves.

This also removes the last use of lstat(2) in the sentry, so that can be
removed from the filters.

PiperOrigin-RevId: 211683110
Change-Id: Ic8ec960fc1c267aa7d310b8efe6e900c88a9207a
2018-09-05 13:01:21 -07:00
Nicolas Lacasse ad8648c634 runsc: Pass log and config files to sandbox process by FD.
This is a prereq for running the sandbox process as user "nobody", when it may
not have permissions to open these files.

Instead, we must open then before starting the sandbox process, and pass them
by FD.

The specutils.ReadSpecFromFile method was fixed to always seek to the beginning
of the file before reading. This allows Files from the same FD to be read
multiple times, as we do in the boot command when the apply-caps flag is set.

Tested with --network=host.

PiperOrigin-RevId: 211570647
Change-Id: I685be0a290aa7f70731ebdce82ebc0ebcc9d475c
2018-09-04 20:10:01 -07:00
Lantao Liu 9ae4e28f75 runsc: fix container rootfs path.
PiperOrigin-RevId: 211515350
Change-Id: Ia495af57447c799909aa97bb873a50b87bee2625
2018-09-04 13:37:40 -07:00
Fabricio Voznika 7e18f158b2 Automated rollback of changelist 210995199
PiperOrigin-RevId: 211116429
Change-Id: I446d149c822177dc9fc3c64ce5e455f7f029aa82
2018-08-31 11:30:47 -07:00
Fabricio Voznika 3e493adf7a Add seccomp filter to fsgofer
PiperOrigin-RevId: 211011542
Change-Id: Ib5a83a00f8eb6401603c6fb5b59afc93bac52558
2018-08-30 17:30:19 -07:00
Nicolas Lacasse 5ade9350ad runsc: Pass log and config files to sandbox process by FD.
This is a prereq for running the sandbox process as user "nobody", when it may
not have permissions to open these files.

Instead, we must open then before starting the sandbox process, and pass them
by FD.

PiperOrigin-RevId: 210995199
Change-Id: I715875a9553290b4a49394a8fcd93be78b1933dd
2018-08-30 15:47:18 -07:00
Fabricio Voznika ae648bafda Add command-line parameter to trigger panic on signal
This is to troubleshoot problems with a hung process that is
not responding to 'runsc debug --stack' command.

PiperOrigin-RevId: 210483513
Change-Id: I4377b210b4e51bc8a281ad34fd94f3df13d9187d
2018-08-27 20:36:10 -07:00
Fabricio Voznika db81c0b02f Put fsgofer inside chroot
Now each container gets its own dedicated gofer that is chroot'd to the
rootfs path. This is done to add an extra layer of security in case the
gofer gets compromised.

PiperOrigin-RevId: 210396476
Change-Id: Iba21360a59dfe90875d61000db103f8609157ca0
2018-08-27 11:10:14 -07:00
Nicolas Lacasse 106de2182d runsc: Terminal support for "docker exec -ti".
This CL adds terminal support for "docker exec".  We previously only supported
consoles for the container process, but not exec processes.

The SYS_IOCTL syscall was added to the default seccomp filter list, but only
for ioctls that get/set winsize and termios structs. We need to allow these
ioctl for all containers because it's possible to run "exec -ti" on a
container that was started without an attached console, after the filters
have been installed.

Note that control-character signals are still not properly supported.

Tested with:
	$ docker run --runtime=runsc -it alpine
In another terminial:
	$ docker exec -it <containerid> /bin/sh

PiperOrigin-RevId: 210185456
Change-Id: I6d2401e53a7697bb988c120a8961505c335f96d9
2018-08-24 17:43:21 -07:00
Fabricio Voznika a81a4402a2 Add option to panic gofer if writes are attempted over RO mounts
This is used when '--overlay=true' to guarantee writes are not sent to gofer.

PiperOrigin-RevId: 210116288
Change-Id: I7616008c4c0e8d3668e07a205207f46e2144bf30
2018-08-24 10:17:42 -07:00
Fabricio Voznika d6d165cb0b Initial change for multi-gofer support
PiperOrigin-RevId: 209647293
Change-Id: I980fca1257ea3fcce796388a049c353b0303a8a5
2018-08-21 13:14:43 -07:00
Kevin Krakauer 635b0c4593 runsc fsgofer: Support dynamic serving of filesystems.
When multiple containers run inside a sentry, each container has its own root
filesystem and set of mounts. Containers are also added after sentry boot rather
than all configured and known at boot time.

The fsgofer needs to be able to serve the root filesystem of each container.
Thus, it must be possible to add filesystems after the fsgofer has already
started.

This change:
* Creates a URPC endpoint within the gofer process that listens for requests to
  serve new content.
* Enables the sentry, when starting a new container, to add the new container's
  filesystem.
* Mounts those new filesystems at separate roots within the sentry.

PiperOrigin-RevId: 208903248
Change-Id: Ifa91ec9c8caf5f2f0a9eead83c4a57090ce92068
2018-08-15 16:25:22 -07:00
Nicolas Lacasse 1129b35c92 runsc: Fix "exec" command when called without --pid-file.
When "exec" command is called without the "--detach" flag, we spawn a second
"exec" command and wait for that one to start. We use the pid file passed in
--pid-file to detect when this second command has started running.

However if "exec" is called with no --pid-file flag, this system breaks down,
as we don't have a pid file to wait for.

This CL ensures that the second instance of the "exec" command always writes a
pid-file, so the wait is successful.

PiperOrigin-RevId: 206002403
Change-Id: If9f2be31eb6e831734b1b833f25054ec71ab94a6
2018-07-25 09:11:45 -07:00
Justine Olshan c05660373e Moved restore code out of create and made to be called after create.
Docker expects containers to be created before they are restored.
However, gVisor restoring requires specificactions regarding the kernel
and the file system. These actions were originally in booting the sandbox.

Now setting up the file system is deferred until a call to a call to
runsc start. In the restore case, the kernel is destroyed and a new kernel
is created in the same process, as we need the same process for Docker.

These changes required careful execution of concurrent processes which
required the use of a channel.

Full docker integration still needs the ability to restore into the same
container.

PiperOrigin-RevId: 205161441
Change-Id: Ie1d2304ead7e06855319d5dc310678f701bd099f
2018-07-18 16:58:30 -07:00
Nicolas Lacasse 67507bd579 runsc: Don't close the control server in a defer.
Closing the control server will block until all open requests have completed.
If a control server method panics, we end up stuck because the defer'd Destroy
function will never return.

PiperOrigin-RevId: 204354676
Change-Id: I6bb1d84b31242d7c3f20d5334b1c966bd6a61dbf
2018-07-12 13:36:57 -07:00
Lantao Liu 138cb8da50 runsc: `runsc wait` print wait status.
PiperOrigin-RevId: 203160639
Change-Id: I8fb2787ba0efb7eacd9d4c934238a26eb5ae79d5
2018-07-03 11:58:12 -07:00
Justine Olshan 80bdf8a406 Sets the restore environment for restoring a container.
Updated how restoring occurs through boot.go with a separate Restore function.
This prevents a new process and new mounts from being created.
Added tests to ensure the container is restored.
Registered checkpoint and restore commands so they can be used.
Docker support for these commands is still limited.
Working on #80.

PiperOrigin-RevId: 202710950
Change-Id: I2b893ceaef6b9442b1ce3743bd112383cb92af0c
2018-06-29 14:47:40 -07:00
Brielle Broder 25e315c2e1 Added leave-running flag for checkpoint.
The leave-running flag allows the container to continue running after a
checkpoint has occurred by doing an immediate restore into a new
container with the same container ID after the old container is destroyed.

Updates #80.

PiperOrigin-RevId: 202695426
Change-Id: Iac50437f5afda018dc18b24bb8ddb935983cf336
2018-06-29 13:09:33 -07:00
Kevin Krakauer 16d37973eb runsc: Add the "wait" subcommand.
Users can now call "runsc wait <container id>" to wait on a particular process
inside the container. -pid can also be used to wait on a specific PID.

Manually tested the wait subcommand for a single waiter and multiple waiters
(simultaneously 2 processes waiting on the container and 2 processes waiting on
a PID within the container).

PiperOrigin-RevId: 202548978
Change-Id: Idd507c2cdea613c3a14879b51cfb0f7ea3fb3d4c
2018-06-28 14:56:36 -07:00
Fabricio Voznika 8459390cdd Error out if spec is invalid
Closes #66

PiperOrigin-RevId: 202496258
Change-Id: Ib9287c5bf1279ffba1db21ebd9e6b59305cddf34
2018-06-28 09:57:27 -07:00
Brielle Broder f93043615f Added MkdirAll capabilities for Checkpoint's image-path.
Now able to save the state file (checkpoint.img) at an image-path that had
previously not existed. This is important because there can only be one
checkpoint.img file per directory so this will enable users to create as many
directories as needed for proper organization.

PiperOrigin-RevId: 202360414
Change-Id: If5dd2b72e08ab52834a2b605571186d107b64526
2018-06-27 13:32:53 -07:00