Commit Graph

341 Commits

Author SHA1 Message Date
Nicolas Lacasse 92e85623a0 Factor the subtargets method into a helper method with tests.
PiperOrigin-RevId: 232047515
Change-Id: I00f036816e320356219be7b2f2e6d5fe57583a60
2019-02-01 15:23:43 -08:00
Andrei Vagin 4e695adcd0 gvisor/gofer: Use pivot_root instead of chroot
PiperOrigin-RevId: 231864273
Change-Id: I8545b72b615f5c2945df374b801b80be64ec3e13
2019-01-31 15:19:04 -08:00
Michael Pratt 2a0c69b19f Remove license comments
Nothing reads them and they can simply get stale.

Generated with:
$ sed -i "s/licenses(\(.*\)).*/licenses(\1)/" **/BUILD

PiperOrigin-RevId: 231818945
Change-Id: Ibc3f9838546b7e94f13f217060d31f4ada9d4bf0
2019-01-31 11:12:53 -08:00
Andrei Vagin 7e8a56087b runsc: check whether a container is deleted or not before setupContainerFS
PiperOrigin-RevId: 231811387
Change-Id: Ib143fb9a4d0fa1f105d1a3a3bd533dfc44e792af
2019-01-31 10:34:15 -08:00
Andrei Vagin dd577f5410 runsc: reap a sandbox process only in sandbox.Wait()
PiperOrigin-RevId: 231504064
Change-Id: I585b769aef04a3ad7e7936027958910a6eed9c8d
2019-01-29 17:15:56 -08:00
Bhasker Hariharan 24cb2c0a72 Use recvmmsg() instead of readv() to read packets from NIC.
This should reduce the number of syscalls required to process packets
significantly and improve throughputs.

PiperOrigin-RevId: 231366886
Change-Id: I8b38077262bf9c53176bc4a94b530188d3d7c0ca
2019-01-29 01:39:01 -08:00
Shijiang Wei b44699c529 check isRootNS by ns inode
Signed-off-by: Shijiang Wei <mountkin@gmail.com>
Change-Id: I032f834edae5c716fb2d3538285eec07aa11a902
PiperOrigin-RevId: 231318438
2019-01-28 17:20:20 -08:00
Lantao Liu 52b3cd873d runsc: Only uninstall cgroup for sandbox stop.
PiperOrigin-RevId: 231263114
Change-Id: I57467a34fe94e395fdd3685462c4fe9776d040a3
2019-01-28 11:58:25 -08:00
Fabricio Voznika 55e8eb775b Make cacheRemoteRevalidating detect changes to file size
When file size changes outside the sandbox, page cache was not
refreshing file size which is required for cacheRemoteRevalidating.
In fact, cacheRemoteRevalidating should be skipping the cache
completely since it's not really benefiting from it. The cache is
cache is already bypassed for unstable attributes (see
cachePolicy.cacheUAttrs). And althought the cache is called to
map pages, they will always miss the cache and map directly from
the host.

Created a HostMappable struct that maps directly to the host and
use it for files with cacheRemoteRevalidating.

Closes #124

PiperOrigin-RevId: 230998440
Change-Id: Ic5f632eabe33b47241e05e98c95e9b2090ae08fc
2019-01-25 17:23:07 -08:00
ShiruRen c6facd0358 Fix a nil pointer dereference bug in Container.Destroy()
In Container.Destroy(), we call c.stop() before calling
executeHooksBestEffort(), therefore, when we call
executeHooksBestEffort(c.Spec.Hooks.Poststop, c.State()) to execute
the poststop hook, it results in a nil pointer dereference since it
reads c.Sandbox.Pid in c.State() after the sandbox has been destroyed.
To fix this bug, we can change container's status to "stopped" before
executing the poststop hook.

Signed-off-by: ShiruRen <renshiru2000@gmail.com>
Change-Id: I4d835e430066fab7e599e188f945291adfc521ef
PiperOrigin-RevId: 230975505
2019-01-25 15:03:17 -08:00
Fabricio Voznika c28f886c0b Execute statically linked binary
Mounting lib and lib64 are not necessary anymore and simplifies the test.

PiperOrigin-RevId: 230971195
Change-Id: Ib91a3ffcec4b322cd3687c337eedbde9641685ed
2019-01-25 14:39:20 -08:00
Andrei Vagin 5f08f8fd81 Don't bind-mount runsc into a sandbox mntns
PiperOrigin-RevId: 230437407
Change-Id: Id9d8ceeb018aad2fe317407c78c6ee0f4b47aa2b
2019-01-22 16:46:42 -08:00
Fabricio Voznika c1be25b78d Scrub runsc error messages
Removed "error" and "failed to" prefix that don't add value
from messages. Adjusted a few other messages.  In particular,
when the container fail to start, the message returned is easier
for humans to read:

$ docker run --rm --runtime=runsc alpine foobar
docker: Error response from daemon: OCI runtime start failed: <path> did not terminate sucessfully: starting container: starting root container [foobar]: starting sandbox: searching for executable "foobar", cwd: "/", $PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin": no such file or directory

Closes #77

PiperOrigin-RevId: 230022798
Change-Id: I83339017c70dae09e4f9f8e0ea2e554c4d5d5cd1
2019-01-18 17:36:02 -08:00
Andrei Vagin c0a981629c Start a sandbox process in a new userns only if CAP_SETUID is set
In addition, it fixes a race condition in TestMultiContainerGoferStop.
There are two scripts copy the same set of files into the same directory
and sometime one of this command fails with EXIST.

PiperOrigin-RevId: 230011247
Change-Id: I9289f72e65dc407cdcd0e6cd632a509e01f43e9c
2019-01-18 16:08:39 -08:00
Andrei Vagin c063a1350f runsc: create a new proc mount if the sandbox process is running in a new pidns
PiperOrigin-RevId: 229971902
Change-Id: Ief4fac731e839ef092175908de9375d725eaa3aa
2019-01-18 12:17:34 -08:00
Fabricio Voznika e4d3ca7263 Prevent internal tmpfs mount to override files in /tmp
Runsc wants to mount /tmp using internal tmpfs implementation for
performance. However, it risks hiding files that may exist under
/tmp in case it's present in the container. Now, it only mounts
over /tmp iff:
  - /tmp was not explicitly asked to be mounted
  - /tmp is empty

If any of this is not true, then /tmp maps to the container's
image /tmp.

Note: checkpoint doesn't have sentry FS mounted to check if /tmp
is empty. It simply looks for explicit mounts right now.
PiperOrigin-RevId: 229607856
Change-Id: I10b6dae7ac157ef578efc4dfceb089f3b94cde06
2019-01-16 12:48:32 -08:00
Fabricio Voznika 92cf3764e0 Create working directory if it doesn't yet exist
PiperOrigin-RevId: 229438125
Change-Id: I58eb0d10178d1adfc709d7b859189d1acbcb2f22
2019-01-15 14:13:27 -08:00
Nicolas Lacasse dc8450b567 Remove fs.Handle, ramfs.Entry, and all the DeprecatedFileOperations.
More helper structs have been added to the fsutil package to make it easier to
implement fs.InodeOperations and fs.FileOperations.

PiperOrigin-RevId: 229305982
Change-Id: Ib6f8d3862f4216745116857913dbfa351530223b
2019-01-14 20:34:28 -08:00
Andrei Vagin a46b6d453d runsc: set up a minimal chroot from the sandbox process
In this case, new mounts are not created in the host mount namspaces, so
tearDownChroot isn't needed, because chroot will be destroyed with a
sandbox mount namespace.

In additional, pivot_root can't be called instead of chroot.

PiperOrigin-RevId: 229250871
Change-Id: I765bdb587d0b8287a6a8efda8747639d37c7e7b6
2019-01-14 14:08:19 -08:00
Andrei Vagin f8c8f24154 runsc: Collect zombies of sandbox and gofer processes
And we need to wait a gofer process before cgroup.Uninstall,
because it is running in the sandbox cgroups.

PiperOrigin-RevId: 228904020
Change-Id: Iaf8826d5b9626db32d4057a1c505a8d7daaeb8f9
2019-01-11 10:32:26 -08:00
Fabricio Voznika 0d7023d581 Restore to original cgroup after sandbox and gofer processes are created
The original code assumed that it was safe to join and not restore cgroup,
but Container.Run will not exit after calling start, making cgroup cleanup
fail because there were still processes inside the cgroup.

PiperOrigin-RevId: 228529199
Change-Id: I12a48d9adab4bbb02f20d71ec99598c336cbfe51
2019-01-09 09:18:15 -08:00
Fabricio Voznika 5ce542ecc7 Undo changes in case of failure to create file/dir/symlink
File/dir/symlink creation is multi-step and may leave state behind in
case of failure in one of the steps. Added best effort attempt to
clean up.

PiperOrigin-RevId: 228286612
Change-Id: Ib03c27cd3d3e4f44d0352edc6ee212a53412d7f1
2019-01-07 23:02:19 -08:00
Fabricio Voznika d033a76fa6 Apply chroot for --network=host too
PiperOrigin-RevId: 227747566
Change-Id: Ide9df4ac1391adcd1c56e08d6570e0d149d85bc4
2019-01-03 14:10:44 -08:00
Michael Pratt 33191e1cc4 Automated rollback of changelist 225089593
PiperOrigin-RevId: 227595007
Change-Id: If14cc5aab869c5fd7a4ebd95929c887ab690e94c
2019-01-02 15:48:00 -08:00
Fabricio Voznika a891afad6d Simplify synchronization between runsc and sandbox process
Make 'runsc create' join cgroup before creating sandbox process.
This removes the need to synchronize platform creation and ensure
that sandbox process is charged to the right cgroup from the start.

PiperOrigin-RevId: 227166451
Change-Id: Ieb4b18e6ca0daf7b331dc897699ca419bc5ee3a2
2018-12-28 13:48:24 -08:00
Jamie Liu 194ef586fc Rename limits.MemoryPagesLocked to limits.MemoryLocked.
"RLIMIT_MEMLOCK: This is the maximum number of bytes of memory that may
be locked into RAM." - getrlimit(2)

PiperOrigin-RevId: 226384346
Change-Id: Iefac4a1bb69f7714dc813b5b871226a8344dc800
2018-12-20 13:28:46 -08:00
Googler 86c9bd2547 Automated rollback of changelist 225861605
PiperOrigin-RevId: 226224230
Change-Id: Id24c7d3733722fd41d5fe74ef64e0ce8c68f0b12
2018-12-19 13:30:08 -08:00
Michael Pratt b62591e6a8 Expose internal testing flag
Never to used outside of runsc tests!

PiperOrigin-RevId: 225919013
Change-Id: Ib3b14aa2a2564b5246fb3f8933d95e01027ed186
2018-12-17 17:35:06 -08:00
Jamie Liu 2421006426 Implement mlock(), kind of.
Currently mlock() and friends do nothing whatsoever. However, mlocking
is directly application-visible in a number of ways; for example,
madvise(MADV_DONTNEED) and msync(MS_INVALIDATE) both fail on mlocked
regions. We handle this inconsistently: MADV_DONTNEED is too important
to not work, but MS_INVALIDATE is rejected.

Change MM to track mlocked regions in a manner consistent with Linux.
It still will not actually pin pages into host physical memory, but:

- mlock() will now cause sentry memory management to precommit mlocked
pages.

- MADV_DONTNEED and MS_INVALIDATE will interact with mlocked pages as
described above.

PiperOrigin-RevId: 225861605
Change-Id: Iee187204979ac9a4d15d0e037c152c0902c8d0ee
2018-12-17 11:38:59 -08:00
Nicolas Lacasse 1775a0e11e container.Destroy should clean up container metadata even if other cleanups fail
If the sandbox process is dead (because of a panic or some other problem),
container.Destroy will never remove the container metadata file, since it will
always fail when calling container.stop().

This CL changes container.Destroy() to always perform the three necessary
cleanup operations:
* Stop the sandbox and gofer processes.
* Remove the container fs on the host.
* Delete the container metadata directory.

Errors from these three operations will be concatenated and returned from
Destroy().

PiperOrigin-RevId: 225448164
Change-Id: I99c6311b2e4fe5f6e2ca991424edf1ebeae9df32
2018-12-13 15:38:10 -08:00
Michael Pratt 24c1158b9c Add "trace signal" option
This option is effectively equivalent to -panic-signal, except that the
sandbox does not die after logging the traceback.

PiperOrigin-RevId: 225089593
Change-Id: Ifb1c411210110b6104613f404334bd02175e484e
2018-12-11 16:12:41 -08:00
Brian Geffon d3bc79bc84 Open source system call tests.
PiperOrigin-RevId: 224886231
Change-Id: I0fccb4d994601739d8b16b1d4e6b31f40297fb22
2018-12-10 14:42:34 -08:00
Nicolas Lacasse 833edbd10b Internal change.
PiperOrigin-RevId: 224865061
Change-Id: I6aa31f880931980ad2fc4c4b3cc4c532aacb31f4
2018-12-10 12:51:54 -08:00
Zhaozhong Ni 9984138abe sentry: turn "dynamically-created" procfs files into static creation.
PiperOrigin-RevId: 224600982
Change-Id: I547253528e24fb0bb318fc9d2632cb80504acb34
2018-12-07 17:03:54 -08:00
Andrei Vagin 1b1a42ba6d A sandbox process should wait until it has not been moved into cgroups
PiperOrigin-RevId: 224418900
Change-Id: I53cf4d7c1c70117875b6920f8fd3d58a3b1497e9
2018-12-06 15:28:29 -08:00
Brian Geffon 82719be42e Max link traversals should be for an entire path.
The number of symbolic links that are allowed to be followed
are for a full path and not just a chain of symbolic links.

PiperOrigin-RevId: 224047321
Change-Id: I5e3c4caf66a93c17eeddcc7f046d1e8bb9434a40
2018-12-04 14:32:03 -08:00
Googler 613899f852 Internal change.
PiperOrigin-RevId: 223893409
Change-Id: I58869c7fb0012f6c3f7612a96cb649348b56335f
2018-12-03 17:27:35 -08:00
Googler 4d0da37cbb Internal change.
PiperOrigin-RevId: 223231273
Change-Id: I8fb97ea91f7507b4918f7ce6562890611513fc30
2018-11-28 14:01:48 -08:00
Kevin Krakauer 7b86d36a63 Fix crictl tests.
gvisor-containerd-shim moved. It now has a stable URL that run_tests.sh always
uses.

PiperOrigin-RevId: 223188822
Change-Id: I5687c78289404da27becd8d5949371e580fdb360
2018-11-28 10:10:13 -08:00
Michael Pratt 071aeea9d3 Disable crictl tests
gvisor-containerd-shim installation is currently broken.

PiperOrigin-RevId: 223002877
Change-Id: I2b890c5bf602a96c475c3805f24852ead8593a35
2018-11-27 09:25:20 -08:00
Fabricio Voznika eaac94d91c Use RET_KILL_PROCESS if available in kernel
RET_KILL_THREAD doesn't work well for Go because it will
kill only the offending thread and leave the process hanging.
RET_TRAP can be masked out and it's not guaranteed to kill
the process. RET_KILL_PROCESS is available since 4.14.

For older kernel, continue to use RET_TRAP as this is the
best option (likely to kill process, easy to debug).

PiperOrigin-RevId: 222357867
Change-Id: Icc1d7d731274b16c2125b7a1ba4f7883fbdb2cbd
2018-11-20 22:56:51 -08:00
Nicolas Lacasse f894610c57 Use math.Rand to generate a random test container id.
We were relying on time.UnixNano, but that was causing collisions.

Now we generate 20 bytes of entropy from rand.Read, and base32-encode it to get
a valid container id.

PiperOrigin-RevId: 222313867
Change-Id: Iaeea9b9582d36de55f9f02f55de6a5de3f739371
2018-11-20 15:10:18 -08:00
Nicolas Lacasse 9363edcf06 Internal change.
PiperOrigin-RevId: 222170431
Change-Id: I26a6d6ad5d6910a94bb8b0a05fc2d12e23098399
2018-11-20 14:04:41 -08:00
Fabricio Voznika fadffa2ff8 Add unsupported syscall events for get/setsockopt
PiperOrigin-RevId: 222148953
Change-Id: I21500a9f08939c45314a6414e0824490a973e5aa
2018-11-20 14:04:12 -08:00
Fabricio Voznika 237f9c7a5e Don't fail when destroyContainerFS is called more than once
This can happen when destroy is called multiple times or when destroy
failed previously and is being called again.

PiperOrigin-RevId: 221882034
Change-Id: I8d069af19cf66c4e2419bdf0d4b789c5def8d19e
2018-11-20 14:03:42 -08:00
Nicolas Lacasse 845836c578 Internal change.
PiperOrigin-RevId: 221848471
Change-Id: I882fbe5ce7737048b2e1f668848e9c14ed355665
2018-11-20 14:03:11 -08:00
Nicolas Lacasse adf8138e06 Allow sandbox.Wait to be called after the sandbox has exited.
sandbox.Wait is racey, as the sandbox may have exited before it is called, or
even during.

We already had code to handle the case that the sandbox exits during the Wait
call, but we were not properly handling the case where the sandbox has exited
before the call.

The best we can do in such cases is return the sandbox exit code as the
application exit code.

PiperOrigin-RevId: 221702517
Change-Id: I290d0333cc094c7c1c3b4ce0f17f61a3e908d787
2018-11-15 15:35:41 -08:00
Nicolas Lacasse 7b938ef78c Internal change.
PiperOrigin-RevId: 221462069
Change-Id: Id469ed21fe12e582c78340189b932989afa13c67
2018-11-14 09:58:43 -08:00
Nicolas Lacasse 40f843fc78 Internal change.
PiperOrigin-RevId: 221343626
Change-Id: I03d57293a555cf4da9952a81803b9f8463173c89
2018-11-13 15:18:17 -08:00
Nicolas Lacasse 7f558eda44 Internal change.
PiperOrigin-RevId: 221343421
Change-Id: I418b5204c5ed4fe1e0af25ef36ee66b9b571928e
2018-11-13 15:17:19 -08:00