Commit Graph

282 Commits

Author SHA1 Message Date
gVisor bot 5baf9dc2fb Synchronize signalling with S/R
This fixes a data race between sending an external signal to
a ThreadGroup and the kernel saving state for S/R.

PiperOrigin-RevId: 295244281
2020-02-14 15:49:09 -08:00
gVisor bot 4075de11be Plumb VFS2 inside the Sentry
- Added fsbridge package with interface that can be used to open
  and read from VFS1 and VFS2 files.
- Converted ELF loader to use fsbridge
- Added VFS2 types to FSContext
- Added vfs.MountNamespace to ThreadGroup

Updates #1623

PiperOrigin-RevId: 295183950
2020-02-14 11:12:47 -08:00
Adin Scannell 1b6a12a768 Add notes to relevant tests.
These were out-of-band notes that can help provide additional context
and simplify automated imports.

PiperOrigin-RevId: 293525915
2020-02-05 22:46:35 -08:00
Michael Pratt 4d1a648c7c Allow mlock in system call filters
Go 1.14 has a workaround for a Linux 5.2-5.4 bug which requires mlock'ing the g
stack to prevent register corruption. We need to allow this syscall until it is
removed from Go.

PiperOrigin-RevId: 292967478
2020-02-03 11:39:51 -08:00
Fabricio Voznika 437c986c6a Add vfs.FileDescription to FD table
FD table now holds both VFS1 and VFS2 types and uses the correct
one based on what's set.

Parts of this CL are just initial changes (e.g. sys_read.go,
runsc/main.go) to serve as a template for the remaining changes.

Updates #1487
Updates #1623

PiperOrigin-RevId: 292023223
2020-01-28 15:31:03 -08:00
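The idea of an FD table entry that holds either a VFS1 or a VFS2 object, and uses whichever is set, can be sketched roughly as below. All type, field, and method names here are hypothetical stand-ins chosen for illustration; they are not the Sentry's actual definitions.

```go
package main

import "fmt"

// vfs1File and vfs2FileDescription are hypothetical stand-ins for the
// VFS1 file type and vfs.FileDescription.
type vfs1File struct{ name string }
type vfs2FileDescription struct{ name string }

// descriptor holds at most one of the two file types.
type descriptor struct {
	file     *vfs1File
	fileVFS2 *vfs2FileDescription
}

// fdTable maps FD numbers to descriptors (assumed layout).
type fdTable struct {
	descriptors map[int32]descriptor
}

func newFDTable() *fdTable {
	return &fdTable{descriptors: make(map[int32]descriptor)}
}

func (t *fdTable) set(fd int32, f *vfs1File) {
	t.descriptors[fd] = descriptor{file: f}
}

func (t *fdTable) setVFS2(fd int32, f *vfs2FileDescription) {
	t.descriptors[fd] = descriptor{fileVFS2: f}
}

// describe picks the implementation based on which field is set.
func (t *fdTable) describe(fd int32) string {
	d, ok := t.descriptors[fd]
	switch {
	case !ok:
		return "closed"
	case d.fileVFS2 != nil:
		return "vfs2:" + d.fileVFS2.name
	case d.file != nil:
		return "vfs1:" + d.file.name
	default:
		return "closed"
	}
}

func main() {
	tbl := newFDTable()
	tbl.set(3, &vfs1File{name: "a"})
	tbl.setVFS2(4, &vfs2FileDescription{name: "b"})
	fmt.Println(tbl.describe(3), tbl.describe(4), tbl.describe(5))
}
```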
Adin Scannell 253c9e666c Cleanup glog and add real caller information.
In general, we've learned that logging must be avoided at all
costs in the hot path. It's unlikely that the optimizations
here were significant in any case, since the buffer would certainly
escape.

This also adds a test to ensure that the caller identification
works as expected, and so that logging can be benchmarked.

Original:
BenchmarkGoogleLogging-6   	 1222255	       949 ns/op

With this change:
BenchmarkGoogleLogging-6   	  517323	      2346 ns/op

Fixes #184

PiperOrigin-RevId: 291815420
2020-01-27 16:08:35 -08:00
Adin Scannell 0e2f1b7abd Update package locations.
Because the abi will depend on the core types for marshalling (usermem,
context, safemem, safecopy), these need to be flattened from the sentry
directory. These packages contain no sentry-specific details.

PiperOrigin-RevId: 291811289
2020-01-27 15:31:32 -08:00
Adin Scannell d29e59af9f Standardize on tools directory.
PiperOrigin-RevId: 291745021
2020-01-27 12:21:00 -08:00
Ian Gudger 27500d529f New sync package.
* Rename syncutil to sync.
* Add aliases to sync types.
* Replace existing usage of standard library sync package.

This will make it easier to swap out synchronization primitives. For example,
this will allow us to use primitives from github.com/sasha-s/go-deadlock to
check for lock ordering violations.

Updates #1472

PiperOrigin-RevId: 289033387
2020-01-09 22:02:24 -08:00
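The alias approach named in this commit can be sketched as follows. This is a minimal illustration rather than the actual gVisor package: the aliases are placed in a single `main` file, only two types are shown, and the `counter` caller is hypothetical.

```go
package main

import (
	"fmt"
	stdsync "sync"
)

// Mutex and WaitGroup are type aliases (note the "="), so they are identical
// to the standard library types. Swapping the right-hand side for another
// implementation would change every caller at once.
type (
	Mutex     = stdsync.Mutex
	WaitGroup = stdsync.WaitGroup
)

// counter is a hypothetical caller that depends only on the aliases.
type counter struct {
	mu Mutex
	n  int
}

func (c *counter) inc() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.n++
}

// countTo increments a counter from the given number of goroutines and
// returns the final value.
func countTo(workers int) int {
	var c counter
	var wg WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.inc()
		}()
	}
	wg.Wait()
	return c.n
}

func main() {
	fmt.Println(countTo(100))
}
```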
Bert Muthalaly e21c584056 Combine various Create*NIC methods into CreateNICWithOptions.
PiperOrigin-RevId: 288779416
2020-01-08 14:50:49 -08:00
Bert Muthalaly 0cc1e74b57 Add NIC.isLoopback()
...enabling us to remove the "CreateNamedLoopbackNIC" variant of
CreateNIC and all the plumbing to connect it through to where the value
is read in FindRoute.

PiperOrigin-RevId: 288713093
2020-01-08 09:30:20 -08:00
Aleksandr Razumov 67f678be27 Leave minimum CPU number as a constant
Remove introduced CPUNumMin config and hard-code it as 2.
2019-12-17 20:41:02 +03:00
Aleksandr Razumov b661434202 Add minimum CPU number and only lower CPUs on --cpu-num-from-quota
* Add `--cpu-num-min` flag to control minimum CPUs
* Only lower CPU count
* Fix comments
2019-12-17 13:27:13 +03:00
Aleksandr Razumov 8782f0e287 Set CPU number to CPU quota
When an application is not cgroups-aware, it can spawn an excessive
number of threads, since thread counts often default to the CPU number.
Introduce an opt-in flag that sets the CPU number according to the CPU
quota (if available).

Fixes #1391
2019-12-15 21:12:43 +03:00
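Taken together with the two follow-up commits listed above (only lower the CPU count, and keep a hard-coded minimum of 2), the quota-to-CPU mapping might look roughly like this. The function name, parameter names, and the choice to round the quota up are assumptions made for illustration, not details taken from the commits.

```go
package main

import "fmt"

// minCPUs mirrors the hard-coded minimum of 2 mentioned in commit 67f678be27.
const minCPUs = 2

// cpusFromQuota derives a CPU count from a cgroup CPU quota and period
// (both in microseconds). Rounding up is an assumption of this sketch.
func cpusFromQuota(quotaUs, periodUs int64, hostCPUs int) int {
	if quotaUs <= 0 || periodUs <= 0 {
		// No quota available: keep the host CPU count.
		return hostCPUs
	}
	n := int((quotaUs + periodUs - 1) / periodUs) // ceiling division
	if n < minCPUs {
		n = minCPUs
	}
	if n > hostCPUs {
		// Only ever lower the CPU count, never raise it.
		n = hostCPUs
	}
	return n
}

func main() {
	fmt.Println(cpusFromQuota(150000, 100000, 8)) // 1.5 CPUs of quota
	fmt.Println(cpusFromQuota(-1, 100000, 8))     // no quota set
}
```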
Bhasker Hariharan b9aa62b9f9 Enable IPv6 in runsc
Fixes #1341

PiperOrigin-RevId: 285108973
2019-12-11 19:14:26 -08:00
Fabricio Voznika 01eadf51ea Bump up Go 1.13 as minimum requirement
PiperOrigin-RevId: 284320186
2019-12-06 23:10:15 -08:00
gVisor bot e70636d7f1 Merge pull request #1233 from xiaobo55x:compatLog
PiperOrigin-RevId: 284305935
2019-12-06 19:41:39 -08:00
Adin Scannell 371e210b83 Add runtime tracing.
This adds meaningful annotations to the trace generated by the runtime/trace
package.

PiperOrigin-RevId: 284290115
2019-12-06 17:00:07 -08:00
Fabricio Voznika ea7a100202 Make annotations OCI compliant
Changed annotation to follow the standard defined here:
https://github.com/opencontainers/image-spec/blob/master/annotations.md

PiperOrigin-RevId: 284254847
2019-12-06 13:51:38 -08:00
Dean Deng 19b2d997ec Support IP_TOS and IPV6_TCLASS socket options for hostinet sockets.
There are two potential ways of sending a TOS byte with outgoing packets:
including a control message in sendmsg, or setting the IP_TOS/IPV6_TCLASS
socket options (for IPv4 and IPv6 respectively). This change lets hostinet
support the latter.

Fixes #1188

PiperOrigin-RevId: 283550925
2019-12-03 08:33:22 -08:00
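For readers unfamiliar with the second approach, setting the IP_TOS socket option on an ordinary Linux socket looks roughly like the sketch below. It uses the Go standard library `syscall` package directly on a host socket; it is not the hostinet implementation, and the helper name `setAndReadTOS` is hypothetical.

```go
package main

import (
	"fmt"
	"syscall"
)

// setAndReadTOS creates a UDP socket, sets IP_TOS on it, and reads the value
// back. Linux-only: it relies on the Linux constants in package syscall.
func setAndReadTOS(tos int) (int, error) {
	fd, err := syscall.Socket(syscall.AF_INET, syscall.SOCK_DGRAM, 0)
	if err != nil {
		return 0, err
	}
	defer syscall.Close(fd)

	if err := syscall.SetsockoptInt(fd, syscall.IPPROTO_IP, syscall.IP_TOS, tos); err != nil {
		return 0, err
	}
	return syscall.GetsockoptInt(fd, syscall.IPPROTO_IP, syscall.IP_TOS)
}

func main() {
	got, err := setAndReadTOS(0x10) // 0x10 is IPTOS_LOWDELAY
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("IP_TOS = %#x\n", got)
}
```

Once the option is set, outgoing packets on that socket carry the TOS byte without a per-call control message.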
Haibo Xu 61f2274cb6 Enable runsc compatLog support on arm64.
Signed-off-by: Haibo Xu <haibo.xu@arm.com>
Change-Id: I3fd5e552f5f03b5144ed52647f75af3b8253b1d6
2019-12-03 03:25:54 +00:00
Dean Deng 684f757a22 Add support for receiving TOS and TCLASS control messages in hostinet.
This involves allowing getsockopt/setsockopt for the corresponding socket
options, as well as allowing hostinet to process control messages received from
the actual recvmsg syscall.

PiperOrigin-RevId: 282851425
2019-11-27 16:21:05 -08:00
Fabricio Voznika 97d2c9a94e Use mount hints to determine FileAccessType
PiperOrigin-RevId: 282401165
2019-11-25 11:43:05 -08:00
gVisor bot 0416c247ec Merge pull request #1176 from xiaobo55x:runsc_boot
PiperOrigin-RevId: 282382564
2019-11-25 11:01:22 -08:00
Haibo Xu 05871a1cdc Enable runsc/boot support on arm64.
This patch also includes a minor change to replace syscall.Dup2
with syscall.Dup3, which was missed in a previous commit (ref a25a976).

Signed-off-by: Haibo Xu <haibo.xu@arm.com>
Change-Id: I00beb9cc492e44c762ebaa3750201c63c1f7c2f3
2019-11-13 06:39:11 +00:00
Michael Pratt b23b36e701 Add NETLINK_KOBJECT_UEVENT socket support
NETLINK_KOBJECT_UEVENT sockets send udev-style messages for device events.
gVisor doesn't have any device events, so our sockets don't need to do anything
once created.

systemd's device manager needs to be able to create one of these sockets. It
also wants to install a BPF filter on the socket. Since we'll never send any
messages, the filter would never be invoked, thus we just fake it out.

Fixes #1117
Updates #1119

PiperOrigin-RevId: 278405893
2019-11-04 10:07:52 -08:00
Nicolas Lacasse e70f28664a Allow the watchdog to detect when the sandbox is stuck during setup.
The watchdog currently can find stuck tasks, but has no way to tell if the
sandbox is stuck before the application starts executing.

This CL adds a startup timeout and action to the watchdog. If Start() is not
called before the given timeout (if non-zero), then the watchdog will take the
action.

PiperOrigin-RevId: 277970577
2019-11-01 11:49:31 -07:00
gVisor bot 0202be1ba5 Merge pull request #1058 from cmingxu:master
PiperOrigin-RevId: 277623766
2019-10-31 11:26:45 -07:00
Andrei Vagin db37483cb6 Store endpoints inside multiPortEndpoint in a sorted order
It is required to guarantee the same order of endpoints after save/restore.

PiperOrigin-RevId: 277598665
2019-10-30 15:33:41 -07:00
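Keeping a slice sorted on insertion, so that iteration order does not depend on arrival order, can be sketched like this. The `id` sort key and the method names are assumptions for illustration; the commit does not state which key the real multiPortEndpoint sorts by.

```go
package main

import (
	"fmt"
	"sort"
)

// endpoint is a hypothetical stand-in; id is an assumed sort key.
type endpoint struct{ id int }

type multiPortEndpoint struct {
	endpoints []endpoint // kept sorted by id
}

// insert places e at its sorted position using binary search.
func (m *multiPortEndpoint) insert(e endpoint) {
	i := sort.Search(len(m.endpoints), func(i int) bool {
		return m.endpoints[i].id >= e.id
	})
	m.endpoints = append(m.endpoints, endpoint{})
	copy(m.endpoints[i+1:], m.endpoints[i:])
	m.endpoints[i] = e
}

// ids returns the endpoint ids in stored order.
func (m *multiPortEndpoint) ids() []int {
	out := make([]int, 0, len(m.endpoints))
	for _, e := range m.endpoints {
		out = append(out, e.id)
	}
	return out
}

func main() {
	var mpe multiPortEndpoint
	for _, id := range []int{3, 1, 2} {
		mpe.insert(endpoint{id: id})
	}
	fmt.Println(mpe.ids())
}
```

Because the order is a function of the contents alone, rebuilding the structure after a restore yields the same sequence regardless of the order in which endpoints are re-registered.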
kevin.xu 1f19624fa1 fix typo
fix a typo
2019-10-23 15:21:50 +08:00
kevin.xu 3edbdcc191 remove duplicated period
remove a duplicated period
2019-10-23 14:56:44 +08:00
Andrei Vagin 8720bd643e netstack/tcp: software segmentation offload
Right now, we send each TCP packet separately, with one system
call per packet. This patch generates multiple TCP packets
and sends them with sendmmsg.

The debatable part of this CL is how it handles multiple headers.
This CL adds a next field to the Prependable buffer.

Nginx test results:

Server Software:        nginx/1.15.9
Server Hostname:        10.138.0.2
Server Port:            8080

Document Path:          /10m.txt
Document Length:        10485760 bytes

w/o gso:
Concurrency Level:      5
Time taken for tests:   5.491 seconds
Complete requests:      100
Failed requests:        0
Total transferred:      1048600200 bytes
HTML transferred:       1048576000 bytes
Requests per second:    18.21 [#/sec] (mean)
Time per request:       274.525 [ms] (mean)
Time per request:       54.905 [ms] (mean, across all concurrent requests)
Transfer rate:          186508.03 [Kbytes/sec] received

sw-gso:

Concurrency Level:      5
Time taken for tests:   3.852 seconds
Complete requests:      100
Failed requests:        0
Total transferred:      1048600200 bytes
HTML transferred:       1048576000 bytes
Requests per second:    25.96 [#/sec] (mean)
Time per request:       192.576 [ms] (mean)
Time per request:       38.515 [ms] (mean, across all concurrent requests)
Transfer rate:          265874.92 [Kbytes/sec] received

w/o gso:
$ ./tcp_benchmark --client --duration 15  --ideal
[SUM]  0.0-15.1 sec  2.20 GBytes  1.25 Gbits/sec

software gso:
$ tcp_benchmark --client --duration 15  --ideal --gso $((1<<16)) --swgso
[SUM]  0.0-15.1 sec  3.99 GBytes  2.26 Gbits/sec

PiperOrigin-RevId: 276112677
2019-10-22 11:55:56 -07:00
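The core step of software segmentation, cutting one large payload into segments no larger than the MSS so they can be handed off as a batch, can be sketched as below. This shows only the splitting; it does not build TCP headers or call sendmmsg, and the function name is hypothetical.

```go
package main

import "fmt"

// segment splits payload into chunks of at most mss bytes. The returned
// slices alias the input and are not copies.
func segment(payload []byte, mss int) [][]byte {
	if mss <= 0 {
		return nil
	}
	var segs [][]byte
	for len(payload) > 0 {
		n := mss
		if len(payload) < n {
			n = len(payload)
		}
		segs = append(segs, payload[:n])
		payload = payload[n:]
	}
	return segs
}

func main() {
	for i, s := range segment(make([]byte, 3000), 1460) {
		fmt.Printf("segment %d: %d bytes\n", i, len(s))
	}
}
```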
Kevin Krakauer 12235d533a AF_PACKET support for netstack (aka epsocket).
Like (AF_INET, SOCK_RAW) sockets, AF_PACKET sockets require CAP_NET_RAW. With
runsc, you'll need to pass `--net-raw=true` to enable them.

Binding isn't supported yet.

PiperOrigin-RevId: 275909366
2019-10-21 13:23:18 -07:00
Fabricio Voznika 9fb562234e Fix problem with open FD when copy up is triggered in overlayfs
Linux kernels before 4.19 don't implement a feature that updates
open FDs after a file is opened for write (and is copied to the upper
layer). Already open FDs will continue to read the old file content
until they are reopened. This is especially problematic for gVisor
because it caches open files.

A flag was added to force read-only files to be reopened when the
same file is opened for write. This is only needed if using kernels
prior to 4.19.

Closes #1006

It's difficult to really test this because we never run tests
on older kernels. I'm adding a test in GKE, which uses kernels
with the overlayfs problem for 1.14 and lower.

PiperOrigin-RevId: 275115289
2019-10-16 15:06:24 -07:00
Fabricio Voznika a357fe427b Remove stale TODO
PiperOrigin-RevId: 273630282
2019-10-08 16:23:41 -07:00
Fabricio Voznika b9cdbc26bc Ignore mount options that are not supported in shared mounts
Options that do not change mount behavior inside the Sentry are
irrelevant and should not be used when looking for possible
incompatibilities between master and slave mounts.

PiperOrigin-RevId: 273593486
2019-10-08 13:36:16 -07:00
Ian Gudger 7c1587e340 Implement IP_TTL.
Also change the default TTL to 64 to match Linux.

PiperOrigin-RevId: 273430341
2019-10-07 19:29:51 -07:00
Kevin Krakauer 6a98237949 Rename epsocket to netstack.
PiperOrigin-RevId: 273365058
2019-10-07 13:57:59 -07:00
gVisor bot dd0e5eedae Merge pull request #765 from trailofbits:uds_support
PiperOrigin-RevId: 271235134
2019-09-25 16:44:22 -07:00
Kevin Krakauer 59ccbb1044 Remove centralized registration of protocols.
Also removes the need for protocol names.

PiperOrigin-RevId: 271186030
2019-09-25 12:57:05 -07:00
Robert Tonic 7810b30983 Refactor command line options and remove the allowed terminology for uds
2019-09-24 18:24:10 -04:00
Nicolas Lacasse f2ea8e6b24 Always set HOME env var with `runsc exec`.
We already do this for `runsc run`, but need to do the same for `runsc exec`.

PiperOrigin-RevId: 270793459
2019-09-23 17:06:02 -07:00
Robert Tonic 46beb91912 Fix documentation, clean up seccomp filter installation, rename helpers.
Filter installation has been streamlined and functions renamed. 
Documentation has been fixed to be standards compliant, and missing 
documentation added. gofmt has also been applied to modified files.
2019-09-19 17:10:50 -04:00
Robert Tonic ac38a7ead0 Place the host UDS mounting behind --fsgofer-host-uds-allowed.
This commit allows the use of the `--fsgofer-host-uds-allowed` flag to 
enable mounting sockets and add the appropriate seccomp filters.
2019-09-19 12:37:15 -04:00
Fabricio Voznika 010b093258 Bring back to life features lost in recent refactor
- Sandbox logs are generated when running tests
- Kokoro uploads the sandbox logs
- Supports multiple parallel runs
- Revive script to install locally built runsc with docker

PiperOrigin-RevId: 269337274
2019-09-16 08:17:00 -07:00
Adin Scannell a8834fc555 Update p9 to support flipcall.
PiperOrigin-RevId: 268845090
2019-09-12 23:37:31 -07:00
Ian Gudger fe1f521077 Remove redundant global tcpip.LinkEndpointID.
PiperOrigin-RevId: 267709597
2019-09-06 18:01:14 -07:00
Fabricio Voznika 0f5cdc1e00 Resolve flakes with TestMultiContainerDestroy
Some processes are reparented to the root container depending
on the kill order and the root container would not reap in time.
So some zombie processes were still present when the test checked.

Fix it by running the second container inside a PID namespace.

PiperOrigin-RevId: 267278591
2019-09-04 18:56:49 -07:00
gVisor bot 0789b9cc08 Merge pull request #655 from praveensastry:feature/runsc-ref-chk-leak
PiperOrigin-RevId: 266226714
2019-08-29 14:17:32 -07:00
Fabricio Voznika c39564332b Mount volumes as super user
This used to be the case, but regressed after a recent change.
Also made a few fixes around it and cleaned up the code a bit.

Closes #720

PiperOrigin-RevId: 265717496
2019-08-27 10:47:16 -07:00