549 Commits

Author SHA1 Message Date
Fabricio Voznika 97d2c9a94e Use mount hints to determine FileAccessType
PiperOrigin-RevId: 282401165
2019-11-25 11:43:05 -08:00
gVisor bot 0416c247ec Merge pull request #1176 from xiaobo55x:runsc_boot
PiperOrigin-RevId: 282382564
2019-11-25 11:01:22 -08:00
Michael Pratt 5eb522193c Force timezone initialization before filter installation
The first use of time.Local (usually via time.Time.Date, et al.) performs
initialization of the local timezone, which involves opening several tzdata files
from the host.

Since filter installation disallows open, we should explicitly force this
initialization rather than implicitly depending on the first logging (or other
time) call occurring before filter installation.

PiperOrigin-RevId: 282053121
2019-11-22 15:47:15 -08:00
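A minimal sketch of the idea behind 5eb522193c, not the actual runsc change: touch time.Local once, before any syscall filter is installed, so the tzdata files are opened while open is still permitted. The helper name forceTimezoneInit is hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// forceTimezoneInit is a hypothetical helper. Calling a method on time.Local
// triggers the one-time lazy load of the local timezone data, so later time
// formatting does not need to open tzdata files.
func forceTimezoneInit() string {
	return time.Local.String()
}

func main() {
	name := forceTimezoneInit()
	// Filter installation would happen here, after initialization.
	fmt.Println("local timezone initialized:", name)
}
```

Doing this explicitly removes the dependency on some earlier logging call happening to run first.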
Haibo Xu 05871a1cdc Enable runsc/boot support on arm64.
This patch also includes a minor change to replace syscall.Dup2
with syscall.Dup3, which was missed in a previous commit (ref a25a976).

Signed-off-by: Haibo Xu <haibo.xu@arm.com>
Change-Id: I00beb9cc492e44c762ebaa3750201c63c1f7c2f3
2019-11-13 06:39:11 +00:00
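For the Dup2 to Dup3 replacement in 05871a1cdc, the following Linux-only sketch shows one way to get dup2-like behavior from syscall.Dup3, which Go provides on both amd64 and arm64. The helper names dupTo and demoDup are hypothetical and are not taken from the gVisor source.

```go
package main

import (
	"fmt"
	"syscall"
)

// dupTo is a hypothetical helper with dup2-like semantics built on dup3.
// dup3 rejects oldfd == newfd with EINVAL, so that case is handled first.
// Simplification: unlike dup2, it does not validate oldfd in that case.
func dupTo(oldfd, newfd int) error {
	if oldfd == newfd {
		return nil
	}
	return syscall.Dup3(oldfd, newfd, 0)
}

// demoDup points the write end of a second pipe at the first pipe and
// returns what arrives on the first pipe's read end.
func demoDup() (string, error) {
	var p1, p2 [2]int
	if err := syscall.Pipe2(p1[:], syscall.O_CLOEXEC); err != nil {
		return "", err
	}
	defer syscall.Close(p1[0])
	defer syscall.Close(p1[1])
	if err := syscall.Pipe2(p2[:], syscall.O_CLOEXEC); err != nil {
		return "", err
	}
	defer syscall.Close(p2[0])
	defer syscall.Close(p2[1])

	// After this call, p2[1] refers to the same pipe as p1[1].
	if err := dupTo(p1[1], p2[1]); err != nil {
		return "", err
	}
	if _, err := syscall.Write(p2[1], []byte("hi")); err != nil {
		return "", err
	}
	buf := make([]byte, 2)
	n, err := syscall.Read(p1[0], buf)
	if err != nil {
		return "", err
	}
	return string(buf[:n]), nil
}

func main() {
	got, err := demoDup()
	fmt.Println(got, err)
}
```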
Jamie Liu f8ffadddb3 Add p9.OpenTruncate.
This is required to implement O_TRUNC correctly on filesystems backed by
gofers.

9P2000.L: "lopen prepares fid for file I/O. flags contains Linux open(2) flags
bits, e.g. O_RDONLY, O_RDWR, O_WRONLY."

open(2): "The argument flags must include one of the following access modes:
O_RDONLY, O_WRONLY, or O_RDWR. ... In addition, zero or more file creation
flags and file status flags can be bitwise-or'd in flags."

The reference 9P2000.L implementation also appears to expect arbitrary flags,
not just access modes, in Tlopen.flags:
https://github.com/chaos/diod/blob/master/diod/ops.c#L703

PiperOrigin-RevId: 278972683
2019-11-06 17:11:58 -08:00
Adin Scannell e904823833 Fix repository build scripts.
This fixes a number of issues with the repository build process:

 * Fix the overall structure of the repository.
 * Fix the debian package description.
 * Fix the broken version number for packages.
 * Update the digest algorithm used for signing the release.

I've validated that installation works from a separate staging bucket.

Updates #852

PiperOrigin-RevId: 278716914
2019-11-05 15:16:04 -08:00
Michael Pratt b23b36e701 Add NETLINK_KOBJECT_UEVENT socket support
NETLINK_KOBJECT_UEVENT sockets send udev-style messages for device events.
gVisor doesn't have any device events, so our sockets don't need to do anything
once created.

systemd's device manager needs to be able to create one of these sockets. It
also wants to install a BPF filter on the socket. Since we'll never send any
messages, the filter would never be invoked, thus we just fake it out.

Fixes #1117
Updates #1119

PiperOrigin-RevId: 278405893
2019-11-04 10:07:52 -08:00
gVisor bot 802a3b3bd0 Merge pull request #1109 from xiaobo55x:fsgofer
PiperOrigin-RevId: 278032567
2019-11-01 17:37:07 -07:00
Nicolas Lacasse e70f28664a Allow the watchdog to detect when the sandbox is stuck during setup.
The watchdog currently can find stuck tasks, but has no way to tell if the
sandbox is stuck before the application starts executing.

This CL adds a startup timeout and action to the watchdog. If Start() is not
called before the given timeout (if non-zero), then the watchdog will take the
action.

PiperOrigin-RevId: 277970577
2019-11-01 11:49:31 -07:00
Ian Lewis 36837c4ad3 Add systemd-cgroup flag option.
Adds a systemd-cgroup flag option that prints an error letting the user know
that systemd cgroups are not supported and points them to the relevant issue.

Issue #193

PiperOrigin-RevId: 277837162
2019-10-31 17:39:06 -07:00
gVisor bot 0202be1ba5 Merge pull request #1058 from cmingxu:master
PiperOrigin-RevId: 277623766
2019-10-31 11:26:45 -07:00
Fabricio Voznika ca90dad0e2 Fix container locking
Sandbox root dir was not being saved with the Container state,
so it would point to the wrong directory location when attempting
to lock the sandbox. This led to race conditions saving and
loading container state. Fixing it led to multiple deadlocks.

I've moved the saving and locking logic to a separate struct and
moved the lock file inside the RootDir (instead of container
root dir), which allows the lock to be taken inside Destroy,
and removes the need to lock the sandbox.

PiperOrigin-RevId: 277599612
2019-10-30 15:39:04 -07:00
Andrei Vagin db37483cb6 Store endpoints inside multiPortEndpoint in a sorted order
It is required to guarantee the same order of endpoints after save/restore.

PiperOrigin-RevId: 277598665
2019-10-30 15:33:41 -07:00
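Why sorting matters for db37483cb6: Go map iteration order is not stable, so any order derived from a map can differ before and after a restore. A minimal sketch, assuming each endpoint carries some stable ID (the endpoint type and its id field are hypothetical):

```go
package main

import (
	"fmt"
	"sort"
)

// endpoint is a hypothetical stand-in for a transport endpoint.
type endpoint struct {
	id uint64
}

// sortedEndpoints returns the map's endpoints ordered by id, so the result
// is the same no matter how the map happens to iterate.
func sortedEndpoints(m map[uint64]*endpoint) []*endpoint {
	eps := make([]*endpoint, 0, len(m))
	for _, ep := range m {
		eps = append(eps, ep)
	}
	sort.Slice(eps, func(i, j int) bool { return eps[i].id < eps[j].id })
	return eps
}

func main() {
	m := map[uint64]*endpoint{3: {id: 3}, 1: {id: 1}, 2: {id: 2}}
	for _, ep := range sortedEndpoints(m) {
		fmt.Println(ep.id)
	}
}
```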
Haibo Xu 80d0db274e Enable runsc/fsgofer support on arm64.
The newfstatat() syscall is not supported on arm64, so we fall back
to the fstatat() syscall.

Signed-off-by: Haibo Xu <haibo.xu@arm.com>
Change-Id: I9e89d46c5ec9ae07db201c9da5b6dda9bfd2eaf0
2019-10-30 05:21:36 +00:00
Haibo Xu dec831b493 Cast the Stat_t.Nlink to uint64 on arm64.
Since syscall.Stat_t.Nlink is defined with different types on
amd64 and arm64 (uint64 and uint32 respectively), we need to cast
it to a unified uint64 type in gVisor code.

Signed-off-by: Haibo Xu <haibo.xu@arm.com>
Change-Id: I7542b99b195c708f3fc49b1cbe6adebdd2f6e96b
2019-10-28 05:56:03 +00:00
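The cast described in dec831b493 looks roughly like this in portable Go. The function name linkCount is hypothetical; the explicit conversion is what lets the same source compile on both architectures.

```go
package main

import (
	"fmt"
	"syscall"
)

// linkCount is a hypothetical helper that returns the hard-link count of
// path as a uint64 on every architecture.
func linkCount(path string) (uint64, error) {
	var st syscall.Stat_t
	if err := syscall.Stat(path, &st); err != nil {
		return 0, err
	}
	// Explicit conversion: a no-op on amd64 and a widening on arm64.
	return uint64(st.Nlink), nil
}

func main() {
	n, err := linkCount("/")
	fmt.Println(n, err)
}
```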
Fabricio Voznika e8ba10c008 Fix early deletion of rootDir
container.startContainers() cannot be called twice in a test
(e.g. TestMultiContainerLoadSandbox) because the cleanup
function deletes the rootDir, together with information from
all other containers that may exist.

PiperOrigin-RevId: 276591806
2019-10-24 16:36:54 -07:00
kevin.xu 1f19624fa1
fix typo
fix a typo
2019-10-23 15:21:50 +08:00
kevin.xu 3edbdcc191
remove duplicated period
remove a duplicated period
2019-10-23 14:56:44 +08:00
gVisor bot 6122b413f1 Merge pull request #1046 from tomlanyon:crio
PiperOrigin-RevId: 276172466
2019-10-22 17:05:04 -07:00
Andrei Vagin 8720bd643e netstack/tcp: software segmentation offload
Right now, we send each TCP packet separately, making one system
call per packet. This patch makes it possible to generate multiple TCP
packets and send them with sendmmsg.

The debatable part of this CL is how it handles multiple headers.
This CL adds a next field to the Prependable buffer.

Nginx test results:

Server Software:        nginx/1.15.9
Server Hostname:        10.138.0.2
Server Port:            8080

Document Path:          /10m.txt
Document Length:        10485760 bytes

w/o gso:
Concurrency Level:      5
Time taken for tests:   5.491 seconds
Complete requests:      100
Failed requests:        0
Total transferred:      1048600200 bytes
HTML transferred:       1048576000 bytes
Requests per second:    18.21 [#/sec] (mean)
Time per request:       274.525 [ms] (mean)
Time per request:       54.905 [ms] (mean, across all concurrent requests)
Transfer rate:          186508.03 [Kbytes/sec] received

sw-gso:

Concurrency Level:      5
Time taken for tests:   3.852 seconds
Complete requests:      100
Failed requests:        0
Total transferred:      1048600200 bytes
HTML transferred:       1048576000 bytes
Requests per second:    25.96 [#/sec] (mean)
Time per request:       192.576 [ms] (mean)
Time per request:       38.515 [ms] (mean, across all concurrent requests)
Transfer rate:          265874.92 [Kbytes/sec] received

w/o gso:
$ ./tcp_benchmark --client --duration 15  --ideal
[SUM]  0.0-15.1 sec  2.20 GBytes  1.25 Gbits/sec

software gso:
$ tcp_benchmark --client --duration 15  --ideal --gso $((1<<16)) --swgso
[SUM]  0.0-15.1 sec  3.99 GBytes  2.26 Gbits/sec

PiperOrigin-RevId: 276112677
2019-10-22 11:55:56 -07:00
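The segmentation half of 8720bd643e can be pictured with the sketch below, which only splits a payload into chunks of at most mss bytes. It is an illustration, not netstack code: the function name segment and the mss parameter are hypothetical, and both header generation and the sendmmsg batching step are omitted.

```go
package main

import "fmt"

// segment is a hypothetical helper that splits payload into consecutive
// chunks of at most mss bytes. The chunks share payload's backing array.
func segment(payload []byte, mss int) [][]byte {
	if mss <= 0 {
		return nil
	}
	var segs [][]byte
	for len(payload) > 0 {
		n := mss
		if len(payload) < n {
			n = len(payload)
		}
		segs = append(segs, payload[:n])
		payload = payload[n:]
	}
	return segs
}

func main() {
	for _, s := range segment(make([]byte, 10), 4) {
		fmt.Println(len(s))
	}
}
```

Each chunk would then need its own TCP header before the batch is handed to the kernel in one call.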
Kevin Krakauer 12235d533a AF_PACKET support for netstack (aka epsocket).
Like (AF_INET, SOCK_RAW) sockets, AF_PACKET sockets require CAP_NET_RAW. With
runsc, you'll need to pass `--net-raw=true` to enable them.

Binding isn't supported yet.

PiperOrigin-RevId: 275909366
2019-10-21 13:23:18 -07:00
Tom Lanyon 7e8b5f4a3a Add runsc OCI annotations to support CRI-O.
Obligatory https://xkcd.com/927

Fixes #626
2019-10-20 21:11:01 +11:00
Michael Pratt 49b596b98d Cleanup host UDS support
This change fixes several issues with the fsgofer host UDS support. Notably, it
adds support for SOCK_SEQPACKET and SOCK_DGRAM sockets [1]. It also fixes
unsafe use of unet.Socket, which could cause a panic if Socket.FD is called
when err != nil, and calls to Socket.FD with nothing to prevent the garbage
collector from destroying and closing the socket.

A set of tests is added to exercise host UDS access. This required extracting
most of the syscall test runner into a library that can be used by custom
tests.

Updates #235
Updates #1003

[1] N.B. SOCK_DGRAM sockets are likely not particularly useful, as a server can
only reply to a client that binds first. We don't allow bind, so these are
unlikely to be used.

PiperOrigin-RevId: 275558502
2019-10-18 15:33:03 -07:00
Fabricio Voznika 9fb562234e Fix problem with open FD when copy up is triggered in overlayfs
Linux kernels before 4.19 don't implement the feature that updates
open FDs after a file is opened for write (and is copied to the upper
layer). Already-open FDs will continue to read the old file content
until they are reopened. This is especially problematic for gVisor
because it caches open files.

A flag was added to force read-only files to be reopened when the
same file is opened for write. This is only needed when using kernels
prior to 4.19.

Closes #1006

It's difficult to really test this because we never run tests
on older kernels. I'm adding a test in GKE, which uses kernels
with the overlayfs problem for 1.14 and lower.

PiperOrigin-RevId: 275115289
2019-10-16 15:06:24 -07:00
Michael Pratt a295616326 Make Attach no longer a special snowflake
fsgofer.attachPoint.Attach has a bunch of funky special logic to create a RW
file or connect a socket rather than creating a standard control file like
localFile.Walk.

This is unnecessary and error-prone, as the attach point still has to go through
Open or Connect which will properly convert the control file to something
usable. As such, switch the logic to be equivalent to a simple Walk.

Updates #235

PiperOrigin-RevId: 274827872
2019-10-15 10:01:22 -07:00
gVisor bot 35d35ea5d0 Merge pull request #997 from dvrkps:patch-1
PiperOrigin-RevId: 274675428
2019-10-14 15:52:22 -07:00
Davor Kapsa fec0663bb7
Set base to root
2019-10-11 06:38:26 +02:00
Ian Lewis 065339193e Update TODO for OCI seccomp support.
PiperOrigin-RevId: 274042343
2019-10-10 14:43:03 -07:00
Davor Kapsa 53960d48c7
Remove unnecessary assignment to path
2019-10-10 23:06:17 +02:00
Michael Pratt a5170fd825 Allow rt_sigreturn in runsc gofer
rt_sigreturn is required for signal handling (e.g., SIGSEGV for nil-pointer
dereference). Before this, nil-pointer dereferences cause a syscall violation
instead of a panic.

PiperOrigin-RevId: 274028767
2019-10-10 13:41:29 -07:00
Fabricio Voznika a357fe427b Remove stale TODO
PiperOrigin-RevId: 273630282
2019-10-08 16:23:41 -07:00
Fabricio Voznika b9cdbc26bc Ignore mount options that are not supported in shared mounts
Options that do not change mount behavior inside the Sentry are
irrelevant and should not be used when looking for possible
incompatibilities between master and slave mounts.

PiperOrigin-RevId: 273593486
2019-10-08 13:36:16 -07:00
Ian Gudger 7c1587e340 Implement IP_TTL.
Also change the default TTL to 64 to match Linux.

PiperOrigin-RevId: 273430341
2019-10-07 19:29:51 -07:00
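From the application side, the option implemented in 7c1587e340 is used as in this sketch, which sets IP_TTL on an unbound UDP socket and reads it back. It uses only calls from Go's syscall package; the helper name setAndGetTTL is hypothetical.

```go
package main

import (
	"fmt"
	"syscall"
)

// setAndGetTTL is a hypothetical helper that creates a UDP socket, sets its
// IP_TTL option, and returns the value the socket reports afterwards.
func setAndGetTTL(ttl int) (int, error) {
	fd, err := syscall.Socket(syscall.AF_INET, syscall.SOCK_DGRAM, 0)
	if err != nil {
		return 0, err
	}
	defer syscall.Close(fd)
	if err := syscall.SetsockoptInt(fd, syscall.IPPROTO_IP, syscall.IP_TTL, ttl); err != nil {
		return 0, err
	}
	return syscall.GetsockoptInt(fd, syscall.IPPROTO_IP, syscall.IP_TTL)
}

func main() {
	got, err := setAndGetTTL(32)
	fmt.Println(got, err)
}
```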
Ian Lewis da9e18f24d Add tests for $HOME
Adds two tests. One to make sure that $HOME is set when starting a container
via 'docker run' and one to make sure that $HOME is set for each container in a
multi-container sandbox.

Issue #701

PiperOrigin-RevId: 273395763
2019-10-07 15:55:39 -07:00
Kevin Krakauer 6a98237949 Rename epsocket to netstack.
PiperOrigin-RevId: 273365058
2019-10-07 13:57:59 -07:00
Andrei Vagin 29207cef14 runsc: remove todo from the build file
b/135475885 was fixed by cl/271434565.

PiperOrigin-RevId: 272320178
2019-10-01 16:25:34 -07:00
gVisor bot 90e908f419 Merge pull request #917 from KentaTada:fix-clone-flags
PiperOrigin-RevId: 272262368
2019-10-01 12:06:30 -07:00
Fabricio Voznika 0b02c3d5e5 Prevent CAP_NET_RAW from appearing in exec
'docker exec' was getting CAP_NET_RAW even when --net-raw=false
because it was not filtered out when copying the container's
capabilities.

PiperOrigin-RevId: 272260451
2019-10-01 11:49:49 -07:00
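The fix in 0b02c3d5e5 amounts to filtering while copying. A minimal sketch, assuming capabilities are held as a slice of names (the functions dropCapability and execCaps are hypothetical, not the runsc implementation):

```go
package main

import "fmt"

// dropCapability returns a copy of caps without any entry equal to name.
func dropCapability(caps []string, name string) []string {
	out := make([]string, 0, len(caps))
	for _, c := range caps {
		if c != name {
			out = append(out, c)
		}
	}
	return out
}

// execCaps is a hypothetical helper that copies the container's
// capabilities for an exec'd process, removing CAP_NET_RAW unless the
// sandbox was started with raw sockets enabled.
func execCaps(containerCaps []string, netRaw bool) []string {
	if netRaw {
		return append([]string(nil), containerCaps...)
	}
	return dropCapability(containerCaps, "CAP_NET_RAW")
}

func main() {
	caps := []string{"CAP_CHOWN", "CAP_NET_RAW", "CAP_KILL"}
	fmt.Println(execCaps(caps, false))
}
```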
Andrei Vagin fa15fda6c4 bazel: use rules_pkg from https://github.com/bazelbuild/
BUILD:85:1: in _pkg_deb rule //runsc:runsc-debian: target
'//runsc:runsc-debian' depends on deprecated target
'@bazel_tools//tools/build_defs/pkg:make_deb': The internal version of
make_deb is deprecated. Please use the replacement for pkg_deb from
https://github.com/bazelbuild/rules_pkg/blob/master/pkg.
PiperOrigin-RevId: 271590386
2019-09-27 09:50:18 -07:00
Fabricio Voznika 8337e4f509 Disallow opening of sockets if --fsgofer-host-uds=false
Updates #235

PiperOrigin-RevId: 271475319
2019-09-26 18:16:02 -07:00
Kenta Tada 69f3c79b24 runsc: add the clone flag of cgroup namespace
Signed-off-by: Kenta Tada <Kenta.Tada@sony.com>
2019-09-26 12:02:01 +09:00
gVisor bot dd0e5eedae Merge pull request #765 from trailofbits:uds_support
PiperOrigin-RevId: 271235134
2019-09-25 16:44:22 -07:00
Fabricio Voznika 129c67d68e Fix runsc log collection in kokoro
PiperOrigin-RevId: 271207152
2019-09-25 14:33:11 -07:00
Kevin Krakauer 59ccbb1044 Remove centralized registration of protocols.
Also removes the need for protocol names.

PiperOrigin-RevId: 271186030
2019-09-25 12:57:05 -07:00
Robert Tonic 9ebd498a55 Remove unnecessary seccomp permission.
This removes the F_DUPFD_CLOEXEC support for the gofer, previously 
required when depending on the STL net package.
2019-09-24 18:37:25 -04:00
Robert Tonic 7810b30983 Refactor command line options and remove the allowed terminology for uds
2019-09-24 18:24:10 -04:00
gVisor bot 91abeb1dbc Merge pull request #812 from lubinszARM:pr_dup3_arm
PiperOrigin-RevId: 270957224
2019-09-24 12:06:38 -07:00
Nicolas Lacasse f2ea8e6b24 Always set HOME env var with `runsc exec`.
We already do this for `runsc run`, but need to do the same for `runsc exec`.

PiperOrigin-RevId: 270793459
2019-09-23 17:06:02 -07:00
Robert Tonic e975184bc5 Update InstallUDSFilters documentation to be accurate to functionality.
2019-09-19 17:44:46 -04:00
Robert Tonic 46beb91912 Fix documentation, clean up seccomp filter installation, rename helpers.
Filter installation has been streamlined and functions renamed. 
Documentation has been fixed to be standards compliant, and missing 
documentation added. gofmt has also been applied to modified files.
2019-09-19 17:10:50 -04:00