Commit Graph

368 Commits

Author SHA1 Message Date
Kevin Krakauer f9431fb20f Remove obsolete TODO.
PiperOrigin-RevId: 241637164
Change-Id: I65476a739cf38f1818dc47f6ce60638dec8b77a8
2019-04-02 17:27:05 -07:00
Kevin Krakauer a40ee4f4b8 Change bug number for duplicate bug.
PiperOrigin-RevId: 241567897
Change-Id: I580eac04f52bb15f4aab7df9822c4aa92e743021
2019-04-02 11:28:06 -07:00
Fabricio Voznika 1df3fa6997 Automated rollback of changelist 240657604
PiperOrigin-RevId: 241434161
Change-Id: I9ec734e50cef5b39203e8bf37de2d91d24943f1e
2019-04-01 17:30:11 -07:00
Adin Scannell 7543e9ec20 Add release hook and version flag
PiperOrigin-RevId: 241421671
Change-Id: Ic0cebfe3efd458dc42c49f7f812c13318705199a
2019-04-01 16:18:43 -07:00
Liu Hua 33c644bc0b gofer: ignore unsupported files
'ls' will hang if there is any FIFO in this path. So
return EPERM if unsupported file occurs and add NONBLOCK flag
when opening file to avoid blocking on FIFO read.

Signed-off-by: Liu Hua <sdu.liu@huawei.com>
Change-Id: I8b9a2a48322118d8ad531dd226395438123eb047
PiperOrigin-RevId: 241406726
2019-04-01 15:01:53 -07:00
Andrei Vagin a046054ba3 gvisor/runsc: enable generic segmentation offload (GSO)
The linux packet socket can handle GSO packets, so we can segment packets to
64K instead of the MTU which is usually 1500.

Here are numbers for the nginx-1m test:
runsc:		579330.01 [Kbytes/sec] received
runsc-gso:	1794121.66 [Kbytes/sec] received
runc:		2122139.06 [Kbytes/sec] received

and for tcp_benchmark:

$ tcp_benchmark  --duration 15   --ideal
[  4]  0.0-15.0 sec  86647 MBytes  48456 Mbits/sec

$ tcp_benchmark --client --duration 15   --ideal
[  4]  0.0-15.0 sec  2173 MBytes  1214 Mbits/sec

$ tcp_benchmark --client --duration 15   --ideal --gso 65536
[  4]  0.0-15.0 sec  19357 MBytes  10825 Mbits/sec

PiperOrigin-RevId: 241072403
Change-Id: I20b03063a1a6649362b43609cbbc9b59be06e6d5
2019-03-29 16:27:38 -07:00
Nicolas Lacasse dcf6613331 Set container.CreatedAt in Create().
PiperOrigin-RevId: 241056805
Change-Id: I13ea8f5dbfb01ca02a3b0ab887b8c3bdf4d556a6
2019-03-29 14:55:22 -07:00
Liu Hua 1d7e2bc377 gofer: some fixs in setupRootFS
1.use root instead of spec.Root.path as mountpoint
2.put remount readonly logic ahead to avoid device busy errors

Signed-off-by: Liu Hua <sdu.liu@huawei.com>
Change-Id: I9222b4695f917136a97b0898ac6f75fcff296e5d
PiperOrigin-RevId: 240818182
2019-03-28 11:42:41 -07:00
Fabricio Voznika 6cb0b1881a Automated rollback of changelist 240502097
PiperOrigin-RevId: 240657604
Change-Id: Ida15dee83337867c560427eae0b4b9ce1051dbb8
2019-03-27 15:46:49 -07:00
Andrei Vagin 5d94c893ae gvisor/runsc: address typos from github
Fixes: https://github.com/google/gvisor/issues/143
Fixes #143
PiperOrigin-RevId: 240600719
Change-Id: Id1731b9969f98e32e52e144a6643e12b0b70f168
2019-03-27 11:10:15 -07:00
Fabricio Voznika beb71ab681 Merge fsgofer 'controlFile' and 'openedFile'
This reduces the number of FDs used for writable files.

#149

PiperOrigin-RevId: 240502097
Change-Id: Ib44489f65bce23dd1a995f620d69e65dce003f7c
2019-03-26 23:44:34 -07:00
Fabricio Voznika c7877b0a14 Fail in case mount option is unknown
PiperOrigin-RevId: 239425816
Change-Id: I3b1479c61b4222c3931a416c4efc909157044330
2019-03-20 10:36:20 -07:00
Andrei Vagin 87cce0ec08 netstack: reduce MSS from SYN to account tcp options
See: https://tools.ietf.org/html/rfc6691#section-2
PiperOrigin-RevId: 239305632
Change-Id: Ie8eb912a43332e6490045dc95570709c5b81855e
2019-03-19 17:33:20 -07:00
Fabricio Voznika e420cc3e5d Add support for mount propagation
Properly handle propagation options for root and mounts. Now usage of
mount options shared, rshared, and noexec cause error to start. shared/
rshared breaks sandbox=>host isolation. slave however can be supported
because changes propagate from host to sandbox.

Root FS setup moved inside the gofer. Apart from simplifying the code,
it keeps all mounts inside the namespace. And they are torn down when
the namespace is destroyed (DestroyFS is no longer needed).

PiperOrigin-RevId: 239037661
Change-Id: I8b5ee4d50da33c042ea34fa68e56514ebe20e6e0
2019-03-18 12:30:43 -07:00
Jamie Liu 8f4634997b Decouple filemem from platform and move it to pgalloc.MemoryFile.
This is in preparation for improved page cache reclaim, which requires
greater integration between the page cache and page allocator.

PiperOrigin-RevId: 238444706
Change-Id: Id24141b3678d96c7d7dc24baddd9be555bffafe4
2019-03-14 08:12:48 -07:00
Nicolas Lacasse 2512cc5617 Allow filesystem.Mount to take an optional interface argument.
PiperOrigin-RevId: 238360231
Change-Id: I5eaf8d26f8892f77d71c7fbd6c5225ef471cedf1
2019-03-13 19:24:03 -07:00
Ian Gudger a16f6e50c5 Make HandleLocal apply to all non-loopback interfaces.
HandleLocal is very similar conceptually to MULTICAST_LOOP, so we can unify
the implementations. This has the benefit of making HandleLocal apply even when
the fdbased link endpoint isn't in use.

In addition, move looping logic to route creation so that it doesn't need to be
run for each packet. This should improve performance.

PiperOrigin-RevId: 238099480
Change-Id: I72839f16f25310471453bc9d3fb8544815b25c23
2019-03-12 14:37:56 -07:00
Fabricio Voznika bc9b979b94 Add profiling commands to runsc
Example:
  runsc debug --root=<dir> \
      --profile-heap=/tmp/heap.prof \
      --profile-cpu=/tmp/cpu.prod --profile-delay=30 \
      <container ID>
PiperOrigin-RevId: 237848456
Change-Id: Icff3f20c1b157a84d0922599eaea327320dad773
2019-03-11 11:47:30 -07:00
Ian Gudger 56a6128295 Implement IP_MULTICAST_LOOP.
IP_MULTICAST_LOOP controls whether or not multicast packets sent on the default
route are looped back. In order to implement this switch, support for sending
and looping back multicast packets on the default route had to be implemented.

For now we only support IPv4 multicast.

PiperOrigin-RevId: 237534603
Change-Id: I490ac7ff8e8ebef417c7eb049a919c29d156ac1c
2019-03-08 15:49:17 -08:00
Fabricio Voznika 0b76887147 Priority-inheritance futex implementation
It is Implemented without the priority inheritance part given
that gVisor defers scheduling decisions to Go runtime and doesn't
have control over it.

PiperOrigin-RevId: 236989545
Change-Id: I714c8ca0798743ecf3167b14ffeb5cd834302560
2019-03-05 23:40:18 -08:00
Fabricio Voznika fcba4e8f04 Add uncaught signal message to the user log
This help troubleshoot cases where the container is killed and the
app logs don't show the reason.

PiperOrigin-RevId: 236982883
Change-Id: I361892856a146cea5b04abaa3aedbf805e123724
2019-03-05 22:20:17 -08:00
Fabricio Voznika 3dbd4a16f8 Add semctl(GETPID) syscall
Also added unimplemented notification for semctl(2)
commands.

PiperOrigin-RevId: 236340672
Change-Id: I0795e3bd2e6d41d7936fabb731884df426a42478
2019-03-01 10:57:02 -08:00
Fabricio Voznika 6df212b831 Don't log twice to debug log when --log isn't set
PiperOrigin-RevId: 235940853
Change-Id: I9c5b4cf18b199fb74044a5edb131bfff59dec945
2019-02-27 10:06:35 -08:00
Fabricio Voznika 52a2abfca4 Fix cgroup when path is relative
This can happen when 'docker run --cgroup-parent=' flag is set.

PiperOrigin-RevId: 235645559
Change-Id: Ieea3ae66939abadab621053551bf7d62d412e7ee
2019-02-25 19:21:47 -08:00
Kevin Krakauer b75aa51504 Rename ping endpoints to icmp endpoints.
PiperOrigin-RevId: 235248572
Change-Id: I5b0538b6feb365a98712c2a2d56d856fe80a8a09
2019-02-22 13:34:47 -08:00
Nicolas Lacasse 0a41ea72c1 Don't allow writing or reading to TTY unless process group is in foreground.
If a background process tries to read from a TTY, linux sends it a SIGTTIN
unless the signal is blocked or ignored, or the process group is an orphan, in
which case the syscall returns EIO.

See drivers/tty/n_tty.c:n_tty_read()=>job_control().

If a background process tries to write a TTY, set the termios, or set the
foreground process group, linux then sends a SIGTTOU. If the signal is ignored
or blocked, linux allows the write. If the process group is an orphan, the
syscall returns EIO.

See drivers/tty/tty_io.c:tty_check_change().

PiperOrigin-RevId: 234044367
Change-Id: I009461352ac4f3f11c5d42c43ac36bb0caa580f9
2019-02-14 15:47:31 -08:00
Bhasker Hariharan e0b3d3323f Add support for using PACKET_RX_RING to receive packets.
PACKET_RX_RING allows the use of an mmapped buffer to receive packets from the
kernel. This should cut down the number of host syscalls that need to be made
to receive packets when the underlying fd is a socket of the AF_PACKET type.

PiperOrigin-RevId: 233834998
Change-Id: I8060025c6ced206986e94cc46b8f382b81bfa47f
2019-02-13 14:53:03 -08:00
Nicolas Lacasse 92e85623a0 Factor the subtargets method into a helper method with tests.
PiperOrigin-RevId: 232047515
Change-Id: I00f036816e320356219be7b2f2e6d5fe57583a60
2019-02-01 15:23:43 -08:00
Andrei Vagin 4e695adcd0 gvisor/gofer: Use pivot_root instead of chroot
PiperOrigin-RevId: 231864273
Change-Id: I8545b72b615f5c2945df374b801b80be64ec3e13
2019-01-31 15:19:04 -08:00
Michael Pratt 2a0c69b19f Remove license comments
Nothing reads them and they can simply get stale.

Generated with:
$ sed -i "s/licenses(\(.*\)).*/licenses(\1)/" **/BUILD

PiperOrigin-RevId: 231818945
Change-Id: Ibc3f9838546b7e94f13f217060d31f4ada9d4bf0
2019-01-31 11:12:53 -08:00
Andrei Vagin 7e8a56087b runsc: check whether a container is deleted or not before setupContainerFS
PiperOrigin-RevId: 231811387
Change-Id: Ib143fb9a4d0fa1f105d1a3a3bd533dfc44e792af
2019-01-31 10:34:15 -08:00
Andrei Vagin dd577f5410 runsc: reap a sandbox process only in sandbox.Wait()
PiperOrigin-RevId: 231504064
Change-Id: I585b769aef04a3ad7e7936027958910a6eed9c8d
2019-01-29 17:15:56 -08:00
Bhasker Hariharan 24cb2c0a72 Use recvmmsg() instead of readv() to read packets from NIC.
This should reduce the number of syscalls required to process packets
significantly and improve throughputs.

PiperOrigin-RevId: 231366886
Change-Id: I8b38077262bf9c53176bc4a94b530188d3d7c0ca
2019-01-29 01:39:01 -08:00
Shijiang Wei b44699c529 check isRootNS by ns inode
Signed-off-by: Shijiang Wei <mountkin@gmail.com>
Change-Id: I032f834edae5c716fb2d3538285eec07aa11a902
PiperOrigin-RevId: 231318438
2019-01-28 17:20:20 -08:00
Lantao Liu 52b3cd873d runsc: Only uninstall cgroup for sandbox stop.
PiperOrigin-RevId: 231263114
Change-Id: I57467a34fe94e395fdd3685462c4fe9776d040a3
2019-01-28 11:58:25 -08:00
Fabricio Voznika 55e8eb775b Make cacheRemoteRevalidating detect changes to file size
When file size changes outside the sandbox, page cache was not
refreshing file size which is required for cacheRemoteRevalidating.
In fact, cacheRemoteRevalidating should be skipping the cache
completely since it's not really benefiting from it. The cache is
cache is already bypassed for unstable attributes (see
cachePolicy.cacheUAttrs). And althought the cache is called to
map pages, they will always miss the cache and map directly from
the host.

Created a HostMappable struct that maps directly to the host and
use it for files with cacheRemoteRevalidating.

Closes #124

PiperOrigin-RevId: 230998440
Change-Id: Ic5f632eabe33b47241e05e98c95e9b2090ae08fc
2019-01-25 17:23:07 -08:00
ShiruRen c6facd0358 Fix a nil pointer dereference bug in Container.Destroy()
In Container.Destroy(), we call c.stop() before calling
executeHooksBestEffort(), therefore, when we call
executeHooksBestEffort(c.Spec.Hooks.Poststop, c.State()) to execute
the poststop hook, it results in a nil pointer dereference since it
reads c.Sandbox.Pid in c.State() after the sandbox has been destroyed.
To fix this bug, we can change container's status to "stopped" before
executing the poststop hook.

Signed-off-by: ShiruRen <renshiru2000@gmail.com>
Change-Id: I4d835e430066fab7e599e188f945291adfc521ef
PiperOrigin-RevId: 230975505
2019-01-25 15:03:17 -08:00
Fabricio Voznika c28f886c0b Execute statically linked binary
Mounting lib and lib64 are not necessary anymore and simplifies the test.

PiperOrigin-RevId: 230971195
Change-Id: Ib91a3ffcec4b322cd3687c337eedbde9641685ed
2019-01-25 14:39:20 -08:00
Andrei Vagin 5f08f8fd81 Don't bind-mount runsc into a sandbox mntns
PiperOrigin-RevId: 230437407
Change-Id: Id9d8ceeb018aad2fe317407c78c6ee0f4b47aa2b
2019-01-22 16:46:42 -08:00
Fabricio Voznika c1be25b78d Scrub runsc error messages
Removed "error" and "failed to" prefix that don't add value
from messages. Adjusted a few other messages.  In particular,
when the container fail to start, the message returned is easier
for humans to read:

$ docker run --rm --runtime=runsc alpine foobar
docker: Error response from daemon: OCI runtime start failed: <path> did not terminate sucessfully: starting container: starting root container [foobar]: starting sandbox: searching for executable "foobar", cwd: "/", $PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin": no such file or directory

Closes #77

PiperOrigin-RevId: 230022798
Change-Id: I83339017c70dae09e4f9f8e0ea2e554c4d5d5cd1
2019-01-18 17:36:02 -08:00
Andrei Vagin c0a981629c Start a sandbox process in a new userns only if CAP_SETUID is set
In addition, it fixes a race condition in TestMultiContainerGoferStop.
There are two scripts copy the same set of files into the same directory
and sometime one of this command fails with EXIST.

PiperOrigin-RevId: 230011247
Change-Id: I9289f72e65dc407cdcd0e6cd632a509e01f43e9c
2019-01-18 16:08:39 -08:00
Andrei Vagin c063a1350f runsc: create a new proc mount if the sandbox process is running in a new pidns
PiperOrigin-RevId: 229971902
Change-Id: Ief4fac731e839ef092175908de9375d725eaa3aa
2019-01-18 12:17:34 -08:00
Fabricio Voznika e4d3ca7263 Prevent internal tmpfs mount to override files in /tmp
Runsc wants to mount /tmp using internal tmpfs implementation for
performance. However, it risks hiding files that may exist under
/tmp in case it's present in the container. Now, it only mounts
over /tmp iff:
  - /tmp was not explicitly asked to be mounted
  - /tmp is empty

If any of this is not true, then /tmp maps to the container's
image /tmp.

Note: checkpoint doesn't have sentry FS mounted to check if /tmp
is empty. It simply looks for explicit mounts right now.
PiperOrigin-RevId: 229607856
Change-Id: I10b6dae7ac157ef578efc4dfceb089f3b94cde06
2019-01-16 12:48:32 -08:00
Fabricio Voznika 92cf3764e0 Create working directory if it doesn't yet exist
PiperOrigin-RevId: 229438125
Change-Id: I58eb0d10178d1adfc709d7b859189d1acbcb2f22
2019-01-15 14:13:27 -08:00
Nicolas Lacasse dc8450b567 Remove fs.Handle, ramfs.Entry, and all the DeprecatedFileOperations.
More helper structs have been added to the fsutil package to make it easier to
implement fs.InodeOperations and fs.FileOperations.

PiperOrigin-RevId: 229305982
Change-Id: Ib6f8d3862f4216745116857913dbfa351530223b
2019-01-14 20:34:28 -08:00
Andrei Vagin a46b6d453d runsc: set up a minimal chroot from the sandbox process
In this case, new mounts are not created in the host mount namspaces, so
tearDownChroot isn't needed, because chroot will be destroyed with a
sandbox mount namespace.

In additional, pivot_root can't be called instead of chroot.

PiperOrigin-RevId: 229250871
Change-Id: I765bdb587d0b8287a6a8efda8747639d37c7e7b6
2019-01-14 14:08:19 -08:00
Andrei Vagin f8c8f24154 runsc: Collect zombies of sandbox and gofer processes
And we need to wait a gofer process before cgroup.Uninstall,
because it is running in the sandbox cgroups.

PiperOrigin-RevId: 228904020
Change-Id: Iaf8826d5b9626db32d4057a1c505a8d7daaeb8f9
2019-01-11 10:32:26 -08:00
Fabricio Voznika 0d7023d581 Restore to original cgroup after sandbox and gofer processes are created
The original code assumed that it was safe to join and not restore cgroup,
but Container.Run will not exit after calling start, making cgroup cleanup
fail because there were still processes inside the cgroup.

PiperOrigin-RevId: 228529199
Change-Id: I12a48d9adab4bbb02f20d71ec99598c336cbfe51
2019-01-09 09:18:15 -08:00
Fabricio Voznika 5ce542ecc7 Undo changes in case of failure to create file/dir/symlink
File/dir/symlink creation is multi-step and may leave state behind in
case of failure in one of the steps. Added best effort attempt to
clean up.

PiperOrigin-RevId: 228286612
Change-Id: Ib03c27cd3d3e4f44d0352edc6ee212a53412d7f1
2019-01-07 23:02:19 -08:00
Fabricio Voznika d033a76fa6 Apply chroot for --network=host too
PiperOrigin-RevId: 227747566
Change-Id: Ide9df4ac1391adcd1c56e08d6570e0d149d85bc4
2019-01-03 14:10:44 -08:00