Commit Graph

573 Commits

Author SHA1 Message Date
Dean Deng 07f2584979 Plumb getting/setting xattrs through InodeOperations and 9p gofer interfaces.
There was a very bare get/setxattr in the InodeOperations interface. Add
context.Context to both, size to getxattr, and flags to setxattr.
Note that extended attributes are passed around as strings in this
implementation, so size is automatically encoded into the value. Size is
added in getxattr so that implementations can return ERANGE if a value is larger
than can fit in the user-allocated buffer. This prevents us from unnecessarily
passing around an arbitrarily large xattr when the user buffer is actually too
small.

Don't use the existing xattrwalk and xattrcreate messages and define our
own, mainly for the sake of simplicity.

Extended attributes will be implemented in future commits.

PiperOrigin-RevId: 290121300
2020-01-16 12:56:33 -08:00
Bhasker Hariharan f874723e64 Bump SO_SNDBUF for fdbased endpoint used by runsc.
Updates #231

PiperOrigin-RevId: 289897881
2020-01-15 11:19:06 -08:00
Ian Gudger 27500d529f New sync package.
* Rename syncutil to sync.
* Add aliases to sync types.
* Replace existing usage of standard library sync package.

This will make it easier to swap out synchronization primitives. For example,
this will allow us to use primitives from github.com/sasha-s/go-deadlock to
check for lock ordering violations.

Updates #1472

PiperOrigin-RevId: 289033387
2020-01-09 22:02:24 -08:00
Bert Muthalaly e21c584056 Combine various Create*NIC methods into CreateNICWithOptions.
PiperOrigin-RevId: 288779416
2020-01-08 14:50:49 -08:00
Bert Muthalaly 0cc1e74b57 Add NIC.isLoopback()
...enabling us to remove the "CreateNamedLoopbackNIC" variant of
CreateNIC and all the plumbing to connect it through to where the value
is read in FindRoute.

PiperOrigin-RevId: 288713093
2020-01-08 09:30:20 -08:00
Fabricio Voznika 0d475cdb01 Increase waitForProcessList timeout
It can take more than 10 seconds when running under --race.

PiperOrigin-RevId: 286296060
2019-12-18 17:10:44 -08:00
Aleksandr Razumov 67f678be27
Leave minimum CPU number as a constant
Remove introduced CPUNumMin config and hard-code it as 2.
2019-12-17 20:41:02 +03:00
Aleksandr Razumov b661434202
Add minimum CPU number and only lower CPUs on --cpu-num-from-quota
* Add `--cpu-num-min` flag to control minimum CPUs
* Only lower CPU count
* Fix comments
2019-12-17 13:27:13 +03:00
Aleksandr Razumov 8782f0e287
Set CPU number to CPU quota
When application is not cgroups-aware, it can spawn excessive threads
which often defaults to CPU number.
Introduce a opt-in flag that will set CPU number accordingly to CPU
quota (if available).

Fixes #1391
2019-12-15 21:12:43 +03:00
Kevin Krakauer be2754a4b9 Add iptables testing framework.
It would be preferrable to test iptables via syscall tests, but there are some
problems with that approach:

* We're limited to loopback-only, as syscall tests involve only a single
  container. Other link interfaces (e.g. fdbased) should be tested.
* We'd have to shell out to call iptables anyways, as the iptables syscall
  interface itself is too large and complex to work with alone.
* Running the Linux/native version of the syscall test will require root, which
  is a pain to configure, is inherently unsafe, and could leave host iptables
  misconfigured.

Using the go_test target allows there to be no new test runner.

PiperOrigin-RevId: 285274275
2019-12-12 14:42:11 -08:00
Bhasker Hariharan b9aa62b9f9 Enable IPv6 in runsc
Fixes #1341

PiperOrigin-RevId: 285108973
2019-12-11 19:14:26 -08:00
Andrei Vagin f8c5ad061b runsc/debug: add an option to list all processes
runsc debug --ps list all processes with all threads. This option is added to
the debug command but not to the ps command, because it is going to be used for
debug purposes and we want to add any useful information without thinking about
backward compatibility.

This will help to investigate syzkaller issues.

PiperOrigin-RevId: 285013668
2019-12-11 11:05:41 -08:00
Dean Deng 1643224af0 Finish incomplete comment.
PiperOrigin-RevId: 285012278
2019-12-11 10:37:35 -08:00
Fabricio Voznika 01eadf51ea Bump up Go 1.13 as minimum requirement
PiperOrigin-RevId: 284320186
2019-12-06 23:10:15 -08:00
gVisor bot e70636d7f1 Merge pull request #1233 from xiaobo55x:compatLog
PiperOrigin-RevId: 284305935
2019-12-06 19:41:39 -08:00
Adin Scannell 371e210b83 Add runtime tracing.
This adds meaningful annotations to the trace generated by the runtime/trace
package.

PiperOrigin-RevId: 284290115
2019-12-06 17:00:07 -08:00
Nicolas Lacasse 663fe840f7 Implement TTY field in control.Processes().
Threadgroups already know their TTY (if they have one), which now contains the
TTY Index, and is returned in the Processes() call.

PiperOrigin-RevId: 284263850
2019-12-06 14:34:13 -08:00
Fabricio Voznika ea7a100202 Make annotations OCI compliant
Changed annotation to follow the standard defined here:
https://github.com/opencontainers/image-spec/blob/master/annotations.md

PiperOrigin-RevId: 284254847
2019-12-06 13:51:38 -08:00
Fabricio Voznika 40035d7d9c Fix possible race condition destroying container
When the sandbox is destroyed, making URPC calls to destroy the
container will fail. The code was checking if the sandbox was
running before attempting to make the URPC call, but that is racy.

PiperOrigin-RevId: 284093764
2019-12-05 17:58:36 -08:00
Dean Deng 19b2d997ec Support IP_TOS and IPV6_TCLASS socket options for hostinet sockets.
There are two potential ways of sending a TOS byte with outgoing packets:
including a control message in sendmsg, or setting the IP_TOS/IPV6_TCLASS
socket options (for IPV4 and IPV6 respectively). This change lets hostinet
support the latter.

Fixes #1188

PiperOrigin-RevId: 283550925
2019-12-03 08:33:22 -08:00
Haibo Xu 61f2274cb6 Enable runsc compatLog support on arm64.
Signed-off-by: Haibo Xu <haibo.xu@arm.com>
Change-Id: I3fd5e552f5f03b5144ed52647f75af3b8253b1d6
2019-12-03 03:25:54 +00:00
Dean Deng 684f757a22 Add support for receiving TOS and TCLASS control messages in hostinet.
This involves allowing getsockopt/setsockopt for the corresponding socket
options, as well as allowing hostinet to process control messages received from
the actual recvmsg syscall.

PiperOrigin-RevId: 282851425
2019-11-27 16:21:05 -08:00
gVisor bot 4a620c436d Merge pull request #981 from tanjianfeng:fix-898
PiperOrigin-RevId: 282669859
2019-11-26 17:21:43 -08:00
Fabricio Voznika 97d2c9a94e Use mount hints to determine FileAccessType
PiperOrigin-RevId: 282401165
2019-11-25 11:43:05 -08:00
gVisor bot 0416c247ec Merge pull request #1176 from xiaobo55x:runsc_boot
PiperOrigin-RevId: 282382564
2019-11-25 11:01:22 -08:00
Jianfeng Tan f697d1a33e gofer: reduce CPU usage on GC as of frequent readdir
Refer to golang mallocgc(), each time of allocating an object > 32 KB,
a gc will be triggered.

When we do readdir, sentry always passes 65535, which leads to a malloc
of 65535 * sizeof(p9.Direnta) > 32 KB.

Considering we already use slice append, let's avoid defining the
capability for this slide.

Command for test:

Before this change:

  (container)$ time tree linux-5.3.1 > /dev/null

  real    0m54.272s
  user    0m2.010s
  sys     0m1.740s
  (CPU usage of Gofer: ~30 cores)

  (host)$ perf top -p <pid-of-gofer>

    42.57%  runsc        [.] runtime.gcDrain
    23.41%  runsc        [.] runtime.(*lfstack).pop
     9.74%  runsc        [.] runtime.greyobject
     8.06%  runsc        [.] runtime.(*lfstack).push
     4.33%  runsc        [.] runtime.scanobject
     1.69%  runsc        [.] runtime.findObject
     1.12%  runsc        [.] runtime.findrunnable
     0.69%  runsc        [.] runtime.runqgrab
    ...

  (host)$ mkdir test && cd test
  (host)$ for i in `seq 1 65536`; do mkdir $i; done
  (container)$ time ls test/ > /dev/null

  real    2m10.934s
  user    0m0.280s
  sys     0m4.260s
  (CPU usage of Gofer: ~1 core)

After this change:

  (container)$ time tree linux-5.3.1 > /dev/null

  real    0m22.465s
  user    0m1.270s
  sys     0m1.310s
  (CPU usage of Gofer: ~1 core)

  $ perf top -p <pid-of-gofer>

    20.57%  runsc        [.] runtime.gcDrain
     7.15%  runsc        [.] runtime.(*lfstack).pop
     4.11%  runsc        [.] runtime.scanobject
     3.78%  runsc        [.] runtime.greyobject
     2.78%  runsc        [.] runtime.(*lfstack).push
    ...

  (host)$ mkdir test && cd test
  (host)$ for i in `seq 1 65536`; do mkdir $i; done
  (container)$ time ls test/ > /dev/null

  real    0m13.338s
  user    0m0.190s
  sys     0m3.980s
  (CPU usage of Gofer: ~0.8 core)

Fixes #898

Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com>
2019-11-23 13:24:46 +00:00
Michael Pratt 5eb522193c Force timezone initialization before filter installation
The first use of time.Local (usually via time.Time.Date, et. al) performs
initialization of the local timezone, which involves open several tzdata files
from the host.

Since filter installation disallows open, we should explicitly force this
initialization rather than implicitly depending on the first logging (or other
time) call occurring before filter installation.

PiperOrigin-RevId: 282053121
2019-11-22 15:47:15 -08:00
Haibo Xu 05871a1cdc Enable runsc/boot support on arm64.
This patch also include a minor change to replace syscall.Dup2
with syscall.Dup3 which was missed in a previous commit(ref a25a976).

Signed-off-by: Haibo Xu <haibo.xu@arm.com>
Change-Id: I00beb9cc492e44c762ebaa3750201c63c1f7c2f3
2019-11-13 06:39:11 +00:00
Jamie Liu f8ffadddb3 Add p9.OpenTruncate.
This is required to implement O_TRUNC correctly on filesystems backed by
gofers.

9P2000.L: "lopen prepares fid for file I/O. flags contains Linux open(2) flags
bits, e.g. O_RDONLY, O_RDWR, O_WRONLY."

open(2): "The argument flags must include one of the following access modes:
O_RDONLY, O_WRONLY, or O_RDWR. ... In addition, zero or more file creation
flags and file status flags can be bitwise-or'd in flags."

The reference 9P2000.L implementation also appears to expect arbitrary flags,
not just access modes, in Tlopen.flags:
https://github.com/chaos/diod/blob/master/diod/ops.c#L703

PiperOrigin-RevId: 278972683
2019-11-06 17:11:58 -08:00
Adin Scannell e904823833 Fix repository build scripts.
This fixes a number of issues with the repository build process:

 * Fix the overall structure of the repository.
 * Fix the debian package description.
 * Fix the broken version number for packages.
 * Update the digest algorithm used for signing the release.

I've validated that installation works from a separate staging bucket.

Updates #852

PiperOrigin-RevId: 278716914
2019-11-05 15:16:04 -08:00
Michael Pratt b23b36e701 Add NETLINK_KOBJECT_UEVENT socket support
NETLINK_KOBJECT_UEVENT sockets send udev-style messages for device events.
gVisor doesn't have any device events, so our sockets don't need to do anything
once created.

systemd's device manager needs to be able to create one of these sockets. It
also wants to install a BPF filter on the socket. Since we'll never send any
messages, the filter would never be invoked, thus we just fake it out.

Fixes #1117
Updates #1119

PiperOrigin-RevId: 278405893
2019-11-04 10:07:52 -08:00
gVisor bot 802a3b3bd0 Merge pull request #1109 from xiaobo55x:fsgofer
PiperOrigin-RevId: 278032567
2019-11-01 17:37:07 -07:00
Nicolas Lacasse e70f28664a Allow the watchdog to detect when the sandbox is stuck during setup.
The watchdog currently can find stuck tasks, but has no way to tell if the
sandbox is stuck before the application starts executing.

This CL adds a startup timeout and action to the watchdog. If Start() is not
called before the given timeout (if non-zero), then the watchdog will take the
action.

PiperOrigin-RevId: 277970577
2019-11-01 11:49:31 -07:00
Ian Lewis 36837c4ad3 Add systemd-cgroup flag option.
Adds a systemd-cgroup flag option that prints an error letting the user know
that systemd cgroups are not supported and points them to the relevant issue.

Issue #193

PiperOrigin-RevId: 277837162
2019-10-31 17:39:06 -07:00
gVisor bot 0202be1ba5 Merge pull request #1058 from cmingxu:master
PiperOrigin-RevId: 277623766
2019-10-31 11:26:45 -07:00
Fabricio Voznika ca90dad0e2 Fix container locking
Sandbox root dir was not being saved with the Container state,
so it would point to the wrong directory location when attempting
to lock the sandbox. This led to race conditions saving and
loading container state. Fixing it, led to multiple deadlocks.

I've moved the saving and locking logic to a separate struct and
moved the lock file inside the RootDir (instead of container
root dir), which allows the lock to be taken inside Destroy,
and removes the need to lock the sandbox.

PiperOrigin-RevId: 277599612
2019-10-30 15:39:04 -07:00
Andrei Vagin db37483cb6 Store endpoints inside multiPortEndpoint in a sorted order
It is required to guarantee the same order of endpoints after save/restore.

PiperOrigin-RevId: 277598665
2019-10-30 15:33:41 -07:00
Haibo Xu 80d0db274e Enable runsc/fsgofer support on arm64.
newfstatat() syscall is not supported on arm64, so we resort
to use the fstatat() syscall.

Signed-off-by: Haibo Xu <haibo.xu@arm.com>
Change-Id: I9e89d46c5ec9ae07db201c9da5b6dda9bfd2eaf0
2019-10-30 05:21:36 +00:00
Haibo Xu dec831b493 Cast the Stat_t.Nlink to uint64 on arm64.
Since the syscall.Stat_t.Nlink is defined as different types on
amd64 and arm64(uint64 and uint32 respectively), we need to cast
them to a unified uint64 type in gVisor code.

Signed-off-by: Haibo Xu <haibo.xu@arm.com>
Change-Id: I7542b99b195c708f3fc49b1cbe6adebdd2f6e96b
2019-10-28 05:56:03 +00:00
Fabricio Voznika e8ba10c008 Fix early deletion of rootDir
container.startContainers() cannot be called twice in a test
(e.g. TestMultiContainerLoadSandbox) because the cleanup
function deletes the rootDir, together with information from
all other containers that may exist.

PiperOrigin-RevId: 276591806
2019-10-24 16:36:54 -07:00
kevin.xu 1f19624fa1
fix typo
fix a typo
2019-10-23 15:21:50 +08:00
kevin.xu 3edbdcc191
remove duplicated period
remove a duplicated period
2019-10-23 14:56:44 +08:00
gVisor bot 6122b413f1 Merge pull request #1046 from tomlanyon:crio
PiperOrigin-RevId: 276172466
2019-10-22 17:05:04 -07:00
Andrei Vagin 8720bd643e netstack/tcp: software segmentation offload
Right now, we send each tcp packet separately, we call one system
call per-packet. This patch allows to generate multiple tcp packets
and send them by sendmmsg.

The arguable part of this CL is a way how to handle multiple headers.
This CL adds the next field to the Prepandable buffer.

Nginx test results:

Server Software:        nginx/1.15.9
Server Hostname:        10.138.0.2
Server Port:            8080

Document Path:          /10m.txt
Document Length:        10485760 bytes

w/o gso:
Concurrency Level:      5
Time taken for tests:   5.491 seconds
Complete requests:      100
Failed requests:        0
Total transferred:      1048600200 bytes
HTML transferred:       1048576000 bytes
Requests per second:    18.21 [#/sec] (mean)
Time per request:       274.525 [ms] (mean)
Time per request:       54.905 [ms] (mean, across all concurrent requests)
Transfer rate:          186508.03 [Kbytes/sec] received

sw-gso:

Concurrency Level:      5
Time taken for tests:   3.852 seconds
Complete requests:      100
Failed requests:        0
Total transferred:      1048600200 bytes
HTML transferred:       1048576000 bytes
Requests per second:    25.96 [#/sec] (mean)
Time per request:       192.576 [ms] (mean)
Time per request:       38.515 [ms] (mean, across all concurrent requests)
Transfer rate:          265874.92 [Kbytes/sec] received

w/o gso:
$ ./tcp_benchmark --client --duration 15  --ideal
[SUM]  0.0-15.1 sec  2.20 GBytes  1.25 Gbits/sec

software gso:
$ tcp_benchmark --client --duration 15  --ideal --gso $((1<<16)) --swgso
[SUM]  0.0-15.1 sec  3.99 GBytes  2.26 Gbits/sec

PiperOrigin-RevId: 276112677
2019-10-22 11:55:56 -07:00
Kevin Krakauer 12235d533a AF_PACKET support for netstack (aka epsocket).
Like (AF_INET, SOCK_RAW) sockets, AF_PACKET sockets require CAP_NET_RAW. With
runsc, you'll need to pass `--net-raw=true` to enable them.

Binding isn't supported yet.

PiperOrigin-RevId: 275909366
2019-10-21 13:23:18 -07:00
Tom Lanyon 7e8b5f4a3a Add runsc OCI annotations to support CRI-O.
Obligatory https://xkcd.com/927

Fixes #626
2019-10-20 21:11:01 +11:00
Michael Pratt 49b596b98d Cleanup host UDS support
This change fixes several issues with the fsgofer host UDS support. Notably, it
adds support for SOCK_SEQPACKET and SOCK_DGRAM sockets [1]. It also fixes
unsafe use of unet.Socket, which could cause a panic if Socket.FD is called
when err != nil, and calls to Socket.FD with nothing to prevent the garbage
collector from destroying and closing the socket.

A set of tests is added to exercise host UDS access. This required extracting
most of the syscall test runner into a library that can be used by custom
tests.

Updates #235
Updates #1003

[1] N.B. SOCK_DGRAM sockets are likely not particularly useful, as a server can
only reply to a client that binds first. We don't allow bind, so these are
unlikely to be used.

PiperOrigin-RevId: 275558502
2019-10-18 15:33:03 -07:00
Fabricio Voznika 9fb562234e Fix problem with open FD when copy up is triggered in overlayfs
Linux kernel before 4.19 doesn't implement a feature that updates
open FD after a file is open for write (and is copied to the upper
layer). Already open FD will continue to read the old file content
until they are reopened. This is especially problematic for gVisor
because it caches open files.

Flag was added to force readonly files to be reopenned when the
same file is open for write. This is only needed if using kernels
prior to 4.19.

Closes #1006

It's difficult to really test this because we never run on tests
on older kernels. I'm adding a test in GKE which uses kernels
with the overlayfs problem for 1.14 and lower.

PiperOrigin-RevId: 275115289
2019-10-16 15:06:24 -07:00
Michael Pratt a295616326 Make Attach no longer a special snowflake
fsgofer.attachPoint.Attach has a bunch of funky special logic to create a RW
file or connect a socket rather than creating a standard control file like
localFile.Walk.

This is unecessary and error-prone, as the attach point still has to go through
Open or Connect which will properly convert the control file to something
usable. As such, switch the logic to be equivalent to a simple Walk.

Updates #235

PiperOrigin-RevId: 274827872
2019-10-15 10:01:22 -07:00
gVisor bot 35d35ea5d0 Merge pull request #997 from dvrkps:patch-1
PiperOrigin-RevId: 274675428
2019-10-14 15:52:22 -07:00