Commit Graph

238 Commits

Author SHA1 Message Date
Nicolas Lacasse ae5122eb87 Job control signals must be sent to all processes in the FG process group.
We were previously only sending to the originator of the process group.

Integration test was changed to test this behavior. It fails without the
corresponding code change.

PiperOrigin-RevId: 216297263
Change-Id: I7e41cfd6bdd067f4b9dc215e28f555fb5088916f
2018-10-08 20:48:54 -07:00
Michael Pratt b8048f75da Uncapitalize error
PiperOrigin-RevId: 216281263
Change-Id: Ie0c189e7f5934b77c6302336723bc1181fd2866c
2018-10-08 17:44:39 -07:00
Nicolas Lacasse 4a00ea557c Capture boot panics in debug log.
Docker and Containerd both eat the boot processes stderr, making it difficult
to track down panics (which are always written to stderr).

This CL makes the boot process dup its debug log FD to stderr, so that panics
will be captured in the debug log, which is better than nothing.

This is the 3rd try at this CL.  Previous attempts were foiled because Docker
expects the 'create' command to pass its stdio directly to the container, so
duping stderr in 'create' caused the applications stderr to go to the log file,
which breaks many applications (including our mysql test).

I added a new image_test that makes sure stdout and stderr are handled
correctly.

PiperOrigin-RevId: 215767328
Change-Id: Icebac5a5dcf39b623b79d7a0e2f968e059130059
2018-10-04 11:01:44 -07:00
Fabricio Voznika 3f46f2e501 Fix sandbox chroot
Sandbox was setting chroot, but was not chaging the working
dir. Added test to ensure this doesn't happen in the future.

PiperOrigin-RevId: 215676270
Change-Id: I14352d3de64a4dcb90e50948119dc8328c9c15e1
2018-10-03 20:44:20 -07:00
Nicolas Lacasse 9f2ba6ac3e Automated rollback of changelist 215585559
PiperOrigin-RevId: 215633475
Change-Id: I7bc471e3b9a2c725fb5e15b3bbcba2ee1ea574b1
2018-10-03 14:54:21 -07:00
Nicolas Lacasse 7a6412cb0b runsc: Allow state transition from Creating to Stopped.
This can happen if an error is encountered during Create() which causes the
container to be destroyed and set to state Stopped.

Without this transition, errors during Create get hidden by the later panic.

PiperOrigin-RevId: 215599193
Change-Id: Icd3f42e12c685cbf042f46b3929bccdf30ad55b0
2018-10-03 11:49:40 -07:00
Nicolas Lacasse 37e57a903c Fix arithmetic error in multi_container_test.
We add an additional (2^3)-1=7 processes, but the code was only waiting for 3.

I switched back to Math.Pow format to make the arithmetic easier to inspect.

PiperOrigin-RevId: 215588140
Change-Id: Iccad4d6f977c1bfc5c4b08d3493afe553fe25733
2018-10-03 10:47:52 -07:00
Nicolas Lacasse 55d28fb124 runsc: Dup debug log file to stderr, so sentry panics don't get lost.
Docker and containerd do not expose runsc's stderr, so tracking down sentry
panics can be painful.

If we have a debug log file, we should send panics (and all stderr data) to the
log file.

PiperOrigin-RevId: 215585559
Change-Id: I3844259ed0cd26e26422bcdb40dded302740b8b6
2018-10-03 10:33:56 -07:00
Nicolas Lacasse e215b9970a runsc: Pass root container's stdio via FD.
We were previously using the sandbox process's stdio as the root container's
stdio. This makes it difficult/impossible to distinguish output application
output from sandbox output, such as panics, which are always written to stderr.

Also close the console socket when we are done with it.

PiperOrigin-RevId: 215585180
Change-Id: I980b8c69bd61a8b8e0a496fd7bc90a06446764e0
2018-10-03 10:32:03 -07:00
Fabricio Voznika 77e43adeab Add TIOCINQ to allowed seccomp when hostinet is used
PiperOrigin-RevId: 215574070
Change-Id: Ib36e804adebaf756adb9cbc2752be9789691530b
2018-10-03 09:32:54 -07:00
Nicolas Lacasse 0a13042d48 Bump some timeouts in the image tests.
PiperOrigin-RevId: 215489101
Change-Id: Iaf96aa8edb1101b70548030c62995841215237d9
2018-10-02 17:28:09 -07:00
Nicolas Lacasse cf3dc2f8a5 Fix compilation bug.
Docker.Run only returns a single argument.

PiperOrigin-RevId: 215427309
Change-Id: I1eebbc628853ca57f79d25e18d4f04dfa5a2a003
2018-10-02 11:36:50 -07:00
Nicolas Lacasse f1c01ed886 runsc: Support job control signals in "exec -it".
Terminal support in runsc relies on host tty file descriptors that are imported
into the sandbox. Application tty ioctls are sent directly to the host fd.

However, those host tty ioctls are associated in the host kernel with a host
process (in this case runsc), and the host kernel intercepts job control
characters like ^C and send signals to the host process. Thus, typing ^C into a
"runsc exec" shell will send a SIGINT to the runsc process.

This change makes "runsc exec" handle all signals, and forward them into the
sandbox via the "ContainerSignal" urpc method. Since the "runsc exec" is
associated with a particular container process in the sandbox, the signal must
be associated with the same container process.

One big difficulty is that the signal should not necessarily be sent to the
sandbox process started by "exec", but instead must be sent to the foreground
process group for the tty. For example, we may exec "bash", and from bash call
"sleep 100". A ^C at this point should SIGINT sleep, not bash.

To handle this, tty files inside the sandbox must keep track of their
foreground process group, which is set/get via ioctls. When an incoming
ContainerSignal urpc comes in, we look up the foreground process group via the
tty file. Unfortunately, this means we have to expose and cache the tty file in
the Loader.

Note that "runsc exec" now handles signals properly, but "runs run" does not.
That will come in a later CL, as this one is complex enough already.

Example:
	root@:/usr/local/apache2# sleep 100
	^C

	root@:/usr/local/apache2# sleep 100
	^Z
	[1]+  Stopped                 sleep 100

	root@:/usr/local/apache2# fg
	sleep 100
	^C

	root@:/usr/local/apache2#

PiperOrigin-RevId: 215334554
Change-Id: I53cdce39653027908510a5ba8d08c49f9cf24f39
2018-10-01 22:06:56 -07:00
Nicolas Lacasse d185552e79 Fix ruby image tests.
PiperOrigin-RevId: 215274663
Change-Id: I051721f459084db3aa608432831170cd47ae7df0
2018-10-01 13:57:36 -07:00
Fabricio Voznika a2ad8fef13 Make multi-container the default mode for runsc
And remove multicontainer option.

PiperOrigin-RevId: 215236981
Change-Id: I9fd1d963d987e421e63d5817f91a25c819ced6cb
2018-10-01 10:31:17 -07:00
Fabricio Voznika 43e6aff50e Don't fail if Root is readonly and is not a mount point
This makes runsc more friendly to run without docker or K8s.

PiperOrigin-RevId: 215165586
Change-Id: Id45a9fc24a3c09b1645f60dbaf70e64711a7a4cd
2018-09-30 23:23:03 -07:00
Fabricio Voznika 9c7eb13079 Removed duplicate/stale TODOs
PiperOrigin-RevId: 215162121
Change-Id: I35f06ac3235cf31c9e8a158dcf6261a7ded6c4c4
2018-09-30 22:22:18 -07:00
Fabricio Voznika 50c283b9f5 Add test for 'signall --all' with stopped container
PiperOrigin-RevId: 215025517
Change-Id: I04b9d8022b3d9dfe279e466ddb91310b9860b9af
2018-09-28 18:16:10 -07:00
Fabricio Voznika cfdd418fe2 Made a few changes to make testutil.Docker easier to use
PiperOrigin-RevId: 215023376
Change-Id: I139569bd15c013e5dd0f60d0c98a64eaa0ba9e8e
2018-09-28 17:48:14 -07:00
Lantao Liu f21dde5666 runsc: allow `kill --all` when container is in stopped state.
PiperOrigin-RevId: 215009105
Change-Id: I1ab12eddf7694c4db98f6dafca9dae352a33f7c4
2018-09-28 15:53:25 -07:00
Fabricio Voznika 49ff81a42b Add ruby image tests
PiperOrigin-RevId: 215009066
Change-Id: I54ab920fa649cf4d0817f7cb8ea76f9126523330
2018-09-28 15:52:33 -07:00
Fabricio Voznika 2496d9b4b6 Make runsc kill and delete more conformant to the "spec"
PiperOrigin-RevId: 214976251
Change-Id: I631348c3886f41f63d0e77e7c4f21b3ede2ab521
2018-09-28 12:22:21 -07:00
Googler fb65b0b471 Change tcpip.Route.Mask to tcpip.AddressMask.
PiperOrigin-RevId: 214975659
Change-Id: I7bd31a2c54f03ff52203109da312e4206701c44c
2018-09-28 12:18:15 -07:00
Fabricio Voznika cf226d48ce Switch to root in userns when CAP_SYS_CHROOT is also missing
Some tests check current capabilities and re-run the tests as root inside
userns if required capabibilities are missing. It was checking for
CAP_SYS_ADMIN only, CAP_SYS_CHROOT is also required now.

PiperOrigin-RevId: 214949226
Change-Id: Ic81363969fa76c04da408fae8ea7520653266312
2018-09-28 09:44:13 -07:00
Fabricio Voznika 6779bd1187 Merge Loader.containerRootTGs and execProcess into a single map
It's easier to manage a single map with processes that we're interested
to track. This will make the next change to clean up the map on destroy
easier.

PiperOrigin-RevId: 214894210
Change-Id: I099247323a0487cd0767120df47ba786fac0926d
2018-09-27 23:55:05 -07:00
Fabricio Voznika 1166c088fc Move common test code to function
PiperOrigin-RevId: 214890335
Change-Id: I42743f0ce46a5a42834133bce2f32d187194fc87
2018-09-27 22:53:18 -07:00
Nicolas Lacasse b709d23987 Forward ioctl(TCSETSF) calls on host ttys to the host kernel.
We already forward TCSETS and TCSETSW.  TCSETSF is roughly equivalent but
discards pending input.

The filters were relaxed to allow host ioctls with TCSETSF argument.

This fixes programs like "passwd" that prevent user input from being displayed
on the terminal.

Before:
	root@b8a0240fc836:/# passwd
	Enter new UNIX password: 123
	Retype new UNIX password: 123
	passwd: password updated successfully

After:
	root@ae6f5dabe402:/# passwd
	Enter new UNIX password:
	Retype new UNIX password:
	passwd: password updated successfully
PiperOrigin-RevId: 214869788
Change-Id: I31b4d1373c1388f7b51d0f2f45ce40aa8e8b0b58
2018-09-27 18:17:38 -07:00
Fabricio Voznika 491faac03b Implement 'runsc kill --all'
In order to implement kill --all correctly, the Sentry needs
to track all tasks that belong to a given container. This change
introduces ContainerID to the task, that gets inherited by all
children. 'kill --all' then iterates over all tasks comparing the
ContainerID field to find all processes that need to be signalled.

PiperOrigin-RevId: 214841768
Change-Id: I693b2374be8692d88cc441ef13a0ae34abf73ac6
2018-09-27 15:00:58 -07:00
Fabricio Voznika b514ab0589 Refactor 'runsc boot' to take container ID as argument
This makes the flow slightly simpler (no need to call
Loader.SetRootContainer). And this is required change to tag
tasks with container ID inside the Sentry.

PiperOrigin-RevId: 214795210
Change-Id: I6ff4af12e73bb07157f7058bb15fd5bb88760884
2018-09-27 10:26:34 -07:00
Fabricio Voznika 6910ff3643 Move uds_test_app to common test_app
This was done so it's easier to add more functionality
to this file for other tests.

PiperOrigin-RevId: 214782043
Change-Id: I1f38b9ee1219b3ce7b789044ada8e52bdc1e6279
2018-09-27 08:58:23 -07:00
Lantao Liu a003e041c8 runsc: fix pid file race condition in exec detach mode.
PiperOrigin-RevId: 214700295
Change-Id: I73d8490572eebe5da584af91914650d1953aeb91
2018-09-26 17:41:20 -07:00
Nicolas Lacasse d489336784 runsc: All non-root bind mounts should be shared.
This CL changes the semantics of the "--file-access" flag so that it only
affects the root filesystem.  The default remains "exclusive" which is the
common use case, as neither Docker nor K8s supports sharing the root.

Keeping the root fs as "exclusive" means that the fs-intensive work done during
application startup will mostly be cacheable, and thus faster.

Non-root bind mounts will always be shared.

This CL also removes some redundant FSAccessType validations.  We validate this
flag in main(), so we can assume it is valid afterwards.

PiperOrigin-RevId: 214359936
Change-Id: I7e75d7bf52dbd7fa834d0aacd4034868314f3b51
2018-09-24 17:22:15 -07:00
Ian Gudger 7ce13ebcad Run gofmt -s on everything
PiperOrigin-RevId: 214040901
Change-Id: I74d79497a053da3624921ad2b7c5193ca4a87942
2018-09-21 14:06:59 -07:00
Nicolas Lacasse d260e808f4 The "action" in container.Signal should be "signal".
PiperOrigin-RevId: 214038776
Change-Id: I4ad212540ec4ef4fb5ab5fdcb7f0865c4f746895
2018-09-21 13:54:35 -07:00
Nicolas Lacasse b4321f4447 runsc: Synchronize container metadata changes with a file lock.
Each container has associated metadata (particularly the container status) that
is manipulated by various runsc commands. This metadata is stored in a file
identified by the container id.

Different runsc processes may manipulate the same container metadata, and each
will read/write to the metadata file.

This CL adds a file lock per container which must be held when reading the
container metadata file, and when modifying and writing the container metadata.

PiperOrigin-RevId: 214019179
Change-Id: Ice4390ad233bc7f216c9a9a6cf05fb456c9ec0ad
2018-09-21 11:42:06 -07:00
Fabricio Voznika b63c4bfe02 Set Sandbox.Chroot so it gets cleaned up upon destruction
I've made several attempts to create a test, but the lack of
permission from the test user makes it nearly impossible to
test anything useful.

PiperOrigin-RevId: 213922174
Change-Id: I5b502ca70cb7a6645f8836f028fb203354b4c625
2018-09-20 18:54:09 -07:00
Lantao Liu 8a938a3f9d runsc: allow `runsc wait` on a container for multiple times.
PiperOrigin-RevId: 213908919
Change-Id: I74eff99a5360bb03511b946f4cb5658bb5fc40c7
2018-09-20 16:59:42 -07:00
Nicolas Lacasse cbaec4d614 Wait for all async fs operations to complete before returning from Destroy.
Destroy flushes dirent references, which triggers many async close operations.
We must wait for those to finish before returning from Destroy, otherwise we
may kill the gofer, causing a cascade of failing RPCs and leading to an
inconsistent FS state.

PiperOrigin-RevId: 213884637
Change-Id: Id054b47fc0f97adc5e596d747c08d3b97a1d1f71
2018-09-20 14:37:53 -07:00
Lantao Liu 9464b82a06 runsc: Fix a bug that `runsc wait` doesn't work after container exits.
PiperOrigin-RevId: 213849165
Change-Id: I5120b2f568850c0c42a08e8706e7f8653ef1bd94
2018-09-20 11:23:26 -07:00
Kevin Krakauer ffb5fdd690 runsc: Fix stdin/stdout/stderr in multi-container mode.
The issue with the previous change was that the stdin/stdout/stderr passed to
the sentry were dup'd by host.ImportFile. This left a dangling FD that by never
closing caused containerd to timeout waiting on container stop.

PiperOrigin-RevId: 213753032
Change-Id: Ia5e4c0565c42c8610d3b59f65599a5643b0901e4
2018-09-19 22:20:41 -07:00
Nicolas Lacasse 915d76aa92 Add container.Destroy urpc method.
This method will:
1. Stop the container process if it is still running.
2. Unmount all sanadbox-internal mounts for the container.
3. Delete the contaner root directory inside the sandbox.

Destroy is idempotent, and safe to call concurrantly.

This fixes a bug where after stopping a container, we cannot unmount the
container root directory on the host. This bug occured because the sandbox
dirent cache was holding a dirent with a host fd corresponding to a file inside
the container root on the host. The dirent cache did not know that the
container had exited, and kept the FD open, preventing us from unmounting on
the host.

Now that we unmount (and flush) all container mounts inside the sandbox, any
host FDs donated by the gofer will be closed, and we can unmount the container
root on the host.

PiperOrigin-RevId: 213737693
Change-Id: I28c0ff4cd19a08014cdd72fec5154497e92aacc9
2018-09-19 18:54:14 -07:00
Kevin Krakauer 639226c3d9 runsc: Mark container_test flaky.
PiperOrigin-RevId: 213732520
Change-Id: Ife292987ec8b1de4c2e7e3b7d4452b00c1582e91
2018-09-19 18:03:35 -07:00
Fabricio Voznika e395273301 Fix sandbox and gofer capabilities
Capabilities.Set() adds capabilities,
but doesn't remove existing ones that might have been loaded. Fixed
the code and added tests.

PiperOrigin-RevId: 213726369
Change-Id: Id7fa6fce53abf26c29b13b9157bb4c6616986fba
2018-09-19 17:15:14 -07:00
Nicolas Lacasse 2ad3228cd0 runsc: Don't create __runsc_containers__ unless we are in multi-container mode.
PiperOrigin-RevId: 213715511
Change-Id: I3e41b583c6138edbdeba036dfb9df4864134fc12
2018-09-19 16:10:47 -07:00
Lingfu f0a92b6b67 Add docker command line args support for --cpuset-cpus and --cpus
`docker run --cpuset-cpus=/--cpus=` will generate cpu resource info in config.json
(runtime spec file). When nginx worker_connections is configured as auto, the worker is
generated according to the number of CPUs. If the cgroup is already set on the host, but
it is not displayed correctly in the sandbox, performance may be degraded.

This patch can get cpus info from spec file and apply to sentry on bootup, so the
/proc/cpuinfo can show the correct cpu numbers. `lscpu` and other commands rely on
`/sys/devices/system/cpu/online` are also affected by this patch.

e.g.

--cpuset-cpus=2,3   ->  cpu number:2
--cpuset-cpus=4-7   ->  cpu number:4
--cpus=2.8          ->  cpu number:3
--cpus=0.5          ->  cpu number:1
Change-Id: Ideb22e125758d4322a12be7c51795f8018e3d316
PiperOrigin-RevId: 213685199
2018-09-19 13:35:42 -07:00
Fabricio Voznika 8aec7473a1 Added state machine checks for Container.Status
For my own sanitity when thinking about possible transitions and state.

PiperOrigin-RevId: 213559482
Change-Id: I25588c86cf6098be4eda01f4e7321c102ceef33c
2018-09-18 19:12:54 -07:00
Fabricio Voznika 7967d8ecd5 Handle children processes better in tests
Reap children more systematically in container tests. Previously,
container_test was taking ~5 mins to run because constainer.Destroy()
would timeout waiting for the sandbox process to exit. Now the test
running in less than a minute.

Also made the contract around Container and Sandbox destroy clearer.

PiperOrigin-RevId: 213527471
Change-Id: Icca84ee1212bbdcb62bdfc9cc7b71b12c6d1688d
2018-09-18 15:21:28 -07:00
Kevin Krakauer 7e00f37054 Automated rollback of changelist 213307171
PiperOrigin-RevId: 213504354
Change-Id: Iadd42f0ca4b7e7a9eae780bee9900c7233fb4f3f
2018-09-18 13:22:26 -07:00
Fabricio Voznika 5d9816be41 Remove memory usage static init
panic() during init() can be hard to debug.

Updates #100

PiperOrigin-RevId: 213391932
Change-Id: Ic103f1981c5b48f1e12da3b42e696e84ffac02a9
2018-09-17 21:34:37 -07:00
Fabricio Voznika 26b08e182c Rename container in test
's' used to stand for sandbox, before container exited.

PiperOrigin-RevId: 213390641
Change-Id: I7bda94a50398c46721baa92227e32a7a1d817412
2018-09-17 21:18:27 -07:00