Commit Graph

5309 Commits

Author SHA1 Message Date
gVisor bot cc762235ce Internal change.
PiperOrigin-RevId: 366462448
2021-04-02 09:58:19 -07:00
Bhasker Hariharan b2ea37401e Internal changes
PiperOrigin-RevId: 366344805
2021-04-01 15:40:07 -07:00
Adin Scannell 513de4039c Remove invalid dependency.
PiperOrigin-RevId: 366344222
2021-04-01 15:34:56 -07:00
gVisor bot 6c10c772e4 Internal change.
PiperOrigin-RevId: 366292533
2021-04-01 11:24:04 -07:00
Fabricio Voznika 71f3dccbb3 Fix panic when overriding /dev files with VFS2
VFS1 skips over mounts that override files in /dev because the list of
files is hardcoded. This is not needed for VFS2 and a recent change
lifted this restriction. However, parts of the code were still skipping
/dev mounts even in VFS2, causing the loader to panic when it ran short
of FDs to connect to the gofer.

PiperOrigin-RevId: 365858436
2021-03-30 11:36:55 -07:00
Zach Koopmans 8a2f7e716d [syserror] Split usermem package
Split usermem package to help remove syserror dependency in go_marshal.
New hostarch package contains code not dependent on syserror.

PiperOrigin-RevId: 365651233
2021-03-29 13:30:21 -07:00
gVisor bot b125afba41 Merge pull request #5728 from zhlhahaha:2091
PiperOrigin-RevId: 365613394
2021-03-29 10:57:46 -07:00
Ayush Ranjan da6ddd1df8 [perf] Reduce contention in ptrace.threadPool.lookupOrCreate().
lookupOrCreate is called from subprocess.switchToApp() and subprocess.syscall().
lookupOrCreate() looks for a thread already created for the current TID. If a
thread exists (common case), it returns immediately. Otherwise it creates a new
one.

This change switches to using a sync.RWMutex. The initial thread existence
lookup is now done only with the read lock. So multiple successful lookups can
occur concurrently. Only when a new thread is created will it acquire the lock
for writing and update the map (which is not the common case).
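
A minimal sketch of the read-mostly lookup described above. It uses `sync.RWMutex` and re-checks the map under the write lock. `threadPool`, `thread` and the `newThread` callback are simplified, hypothetical stand-ins, not gVisor's actual types:

```go
package main

import (
	"fmt"
	"sync"
)

// thread is a hypothetical stand-in for a traced stub thread.
type thread struct {
	tid int32
}

// threadPool maps a TID to its thread. The names follow the commit message;
// the fields are illustrative assumptions.
type threadPool struct {
	mu      sync.RWMutex
	threads map[int32]*thread
}

// lookupOrCreate returns the thread for tid, calling newThread only if none
// exists yet.
func (tp *threadPool) lookupOrCreate(tid int32, newThread func() *thread) *thread {
	// Fast path: concurrent lookups share the read lock.
	tp.mu.RLock()
	t, ok := tp.threads[tid]
	tp.mu.RUnlock()
	if ok {
		return t
	}

	// Slow path: take the write lock and check again, because another
	// goroutine may have created the thread between RUnlock and Lock.
	tp.mu.Lock()
	defer tp.mu.Unlock()
	if t, ok := tp.threads[tid]; ok {
		return t
	}
	t = newThread()
	if tp.threads == nil {
		tp.threads = make(map[int32]*thread)
	}
	tp.threads[tid] = t
	return t
}

func main() {
	var tp threadPool
	created := 0
	mk := func() *thread { created++; return &thread{tid: 42} }
	a := tp.lookupOrCreate(42, mk)
	b := tp.lookupOrCreate(42, mk)
	fmt.Println(a == b, created) // true 1
}
```

The second lookup under the write lock matters. Two goroutines can both miss on the read path, and only one of them should create the thread.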

Discovered in mutex profiles from the various ptrace benchmarks.
Example: https://gvisor.dev/profile/gvisor-buildkite/fd14bfad-b30f-44dc-859b-80ebac50beb4/843827db-da50-4dc9-a2ea-ecf734dde2d5/tmp/profile/ptrace/BenchmarkFio/operation.write/blockSize.4K/filesystem.tmpfs/benchmarks/fio/mutex.pprof/flamegraph
PiperOrigin-RevId: 365612094
2021-03-29 10:52:19 -07:00
Jamie Liu fbec65fc3f Use seqfile.SeqHandles correctly in VFS1 /proc/net/.
Before this change:

```
$ docker run --runtime=runsc --rm -it -v ~/tmp:/hosttmp ubuntu:focal /hosttmp/issue5732 --bytes1=128 --bytes2=1024
#1: read(128) = 128
#2: read(1024) = EOF
$ docker run --runtime=runsc-vfs2 --rm -it -v ~/tmp:/hosttmp ubuntu:focal /hosttmp/issue5732 --bytes1=128 --bytes2=1024
#1: read(128) = 128
#2: read(1024) = 256
```

After this change:

```
$ docker run --runtime=runsc --rm -it -v ~/tmp:/hosttmp ubuntu:focal /hosttmp/issue5732 --bytes1=128 --bytes2=1024
#1: read(128) = 128
#2: read(1024) = 256
$ docker run --runtime=runsc-vfs2 --rm -it -v ~/tmp:/hosttmp ubuntu:focal /hosttmp/issue5732 --bytes1=128 --bytes2=1024
#1: read(128) = 128
#2: read(1024) = 256
```

Fixes #5732

PiperOrigin-RevId: 365178386
2021-03-25 20:27:38 -07:00
Jamie Liu 79bc446fac Lock TaskSet mutex for writing in ptraceClone().
This is necessary since ptraceClone() mutates tracer.ptraceTracees.

PiperOrigin-RevId: 365152396
2021-03-25 16:50:06 -07:00
Kevin Krakauer 6b085ba477 setgid: skip tests when we can't find usable GIDs
PiperOrigin-RevId: 365092320
2021-03-25 12:00:24 -07:00
Howard Zhang 253f180c69 Fix comments error
Signed-off-by: Howard Zhang <howard.zhang@arm.com>
2021-03-25 17:39:45 +08:00
Howard Zhang a01fc7108f Fix nogo test error
Signed-off-by: Howard Zhang <howard.zhang@arm.com>
2021-03-25 17:39:28 +08:00
Ian Lewis c27fac421b Fix path to runsc in CNI tutorial.
PiperOrigin-RevId: 364931406
2021-03-24 18:01:05 -07:00
Ian Lewis e4772bd845 Fix highlighting sidebar menu on the website
Previously, multiple items in the sidebar were highlighted if they had the
same page name (not the full URL). This change simplifies this by adding the
highlight class in the Jekyll template rather than in JavaScript, and
highlights only the correct page.

PiperOrigin-RevId: 364931350
2021-03-24 17:56:40 -07:00
Bhasker Hariharan e7ca2a51a8 Add POLLRDNORM/POLLWRNORM support.
On Linux these are meant to be equivalent to POLLIN/POLLOUT. Rather
than hack these on in sys_poll etc., it felt cleaner to just clean up
the call sites to notify for both events. This is what Linux does
as well.
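
The sketch below shows what "notify for both events" means. Each readiness condition reports both the classic bit and the matching "normal data" bit. The constants are the values from Linux's `include/uapi/asm-generic/poll.h` (used on amd64 and arm64; a few architectures define POLLWRNORM differently). `readiness` is a hypothetical helper, not gVisor's code:

```go
package main

import "fmt"

// Event bits from Linux's asm-generic/poll.h.
const (
	POLLIN     = 0x0001
	POLLOUT    = 0x0004
	POLLRDNORM = 0x0040
	POLLWRNORM = 0x0100
)

// Masks a file notifies with: readable implies POLLIN and POLLRDNORM,
// writable implies POLLOUT and POLLWRNORM.
const (
	readableEvents = POLLIN | POLLRDNORM
	writableEvents = POLLOUT | POLLWRNORM
)

// readiness returns the events to report for a file in the given state.
func readiness(canRead, canWrite bool) uint32 {
	var mask uint32
	if canRead {
		mask |= readableEvents
	}
	if canWrite {
		mask |= writableEvents
	}
	return mask
}

func main() {
	// A caller polling only for POLLRDNORM still observes readiness.
	got := readiness(true, false)
	fmt.Println(got&POLLRDNORM != 0, got&POLLWRNORM != 0) // true false
}
```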

Fixes #5544

PiperOrigin-RevId: 364859977
2021-03-24 12:11:44 -07:00
Bhasker Hariharan 72ff6a1cac Fix data race in fdbased when accessing fanoutID.
PiperOrigin-RevId: 364859173
2021-03-24 12:07:10 -07:00
Nick Brown ec0aa657ed Unexpose immutable fields in stack.Route
This change sets the inner `routeInfo` struct to be a named private member
and replaces direct access with access through getters. Note that direct
access to the fields of `routeInfo` is still possible through the `RouteInfo`
struct.
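
A minimal sketch of the getter pattern described above. The `Route`, `routeInfo` and `RouteInfo` types here are trimmed down and hypothetical (the real `stack.Route` has many more fields), and `GetFields` is an assumed accessor name. The copy-returning accessor is what keeps direct field access possible through `RouteInfo` without letting callers mutate the route:

```go
package main

import "fmt"

// routeInfo holds a route's immutable fields (trimmed, hypothetical).
type routeInfo struct {
	LocalAddress  string
	RemoteAddress string
}

// Route keeps routeInfo in a named, unexported field, so callers outside
// the package cannot modify it directly.
type Route struct {
	routeInfo routeInfo
}

// LocalAddress returns the route's local address.
func (r *Route) LocalAddress() string { return r.routeInfo.LocalAddress }

// RemoteAddress returns the route's remote address.
func (r *Route) RemoteAddress() string { return r.routeInfo.RemoteAddress }

// RouteInfo embeds routeInfo, so its exported fields are promoted and
// directly accessible, but it is only ever a copy.
type RouteInfo struct {
	routeInfo
}

// GetFields returns a copy of the route's immutable fields.
func (r *Route) GetFields() RouteInfo { return RouteInfo{routeInfo: r.routeInfo} }

func main() {
	r := Route{routeInfo: routeInfo{LocalAddress: "10.0.0.1", RemoteAddress: "10.0.0.2"}}
	info := r.GetFields()
	info.LocalAddress = "changed" // modifies the copy only
	fmt.Println(r.LocalAddress()) // 10.0.0.1
}
```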

Fixes #4902

PiperOrigin-RevId: 364822872
2021-03-24 09:38:27 -07:00
gVisor bot 8ee4a3f6d0 Merge pull request #5677 from avagin:kvm-mmio
PiperOrigin-RevId: 364728696
2021-03-23 22:50:14 -07:00
Andrei Vagin 56a9a13976 Move the code that manages floating-point state to a separate package
This change is inspired by Adin's cl/355256448.

PiperOrigin-RevId: 364695931
2021-03-23 18:46:37 -07:00
Fabricio Voznika 960155cdaa Add --file-access-mounts flag
--file-access-mounts flag is similar to --file-access, but controls
non-root mounts that were previously mounted in shared mode only.
This gives more flexibility to control how mounts are shared within
a container.

PiperOrigin-RevId: 364669882
2021-03-23 16:21:12 -07:00
Kevin Krakauer 92374e5197 setgid directory support in goferfs
Also adds support for clearing the setuid bit when appropriate (writing,
truncating, changing size, changing UID, or changing GID).

VFS2 only.
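
The sketch below illustrates the Linux rules this change follows. It is simplified: `newFileOwnership` and `clearSetIDOnModify` are hypothetical helpers, privilege checks (such as CAP_FSETID) are omitted, and modes are plain `uint32` bit masks. A file created in a setgid directory takes the directory's GID, and a new subdirectory also inherits the setgid bit. An unprivileged modification clears setuid. It also clears setgid, but only when group-execute is set:

```go
package main

import "fmt"

// Mode bits as defined by POSIX/Linux.
const (
	modeSetUID    = 0o4000
	modeSetGID    = 0o2000
	modeGroupExec = 0o010
	modeDir       = 0o040000 // S_IFDIR
)

// newFileOwnership returns the GID and mode for a file created with
// childMode by a caller whose GID is callerGID, inside a directory with
// parentMode and parentGID.
func newFileOwnership(parentMode, parentGID, callerGID, childMode uint32) (gid, mode uint32) {
	if parentMode&modeSetGID == 0 {
		return callerGID, childMode
	}
	if childMode&modeDir != 0 {
		childMode |= modeSetGID // subdirectories inherit setgid
	}
	return parentGID, childMode
}

// clearSetIDOnModify returns the mode after an unprivileged write, truncate
// or ownership change: setuid is always dropped, and setgid is dropped only
// together with group-execute (setgid without it marks mandatory locking).
func clearSetIDOnModify(mode uint32) uint32 {
	mode &^= modeSetUID
	if mode&modeGroupExec != 0 {
		mode &^= modeSetGID
	}
	return mode
}

func main() {
	gid, mode := newFileOwnership(modeDir|modeSetGID|0o775, 100, 1000, modeDir|0o755)
	fmt.Printf("%d %o\n", gid, mode)              // 100 42755
	fmt.Printf("%o\n", clearSetIDOnModify(0o6755)) // 755
}
```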

PiperOrigin-RevId: 364661835
2021-03-23 15:42:12 -07:00
Rahat Mahmood acb4c62885 Skip checklocks analysis for stateify generated code.
Stateify methods are always called without holding the appropriate
locks. The system is paused and we know there will be no mutations
when we call Save/Load, so this is perfectly safe. However, checklocks
can't know about this, and it will always complain.

Mark stateify generated methods that touch struct fields as
"checklocksignore" to avoid this.
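
A small illustration of how the annotations fit together. The `+checklocks:mu` field annotation and the `+checklocksignore` function annotation are plain comments that gVisor's checklocks analyzer reads. The `counter` type and its `saveState` method are hypothetical stand-ins for generated stateify code:

```go
package main

import (
	"fmt"
	"sync"
)

// counter is a hypothetical type with one lock-guarded field.
type counter struct {
	mu sync.Mutex

	// +checklocks:mu
	value int
}

// Inc is a normal accessor and holds mu as the annotation requires.
func (c *counter) Inc() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.value++
}

// saveState mimics a generated save method. It runs while the sandbox is
// paused, so it reads value without mu; the annotation below tells the
// analyzer not to report the unlocked access.
//
// +checklocksignore
func (c *counter) saveState() int {
	return c.value
}

func main() {
	var c counter
	c.Inc()
	fmt.Println(c.saveState()) // 1
}
```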

PiperOrigin-RevId: 364610241
2021-03-23 11:56:59 -07:00
Chong Cai beb11cec76 Allow FSETXATTR/FGETXATTR host calls for Verity
These host calls are needed for Verity fs to generate/verify hashes.

PiperOrigin-RevId: 364598180
2021-03-23 11:06:02 -07:00
Nayana Bidari dc75f08c2a Use constant (TestInitialSequenceNumber) instead of integer (789) in tests.
PiperOrigin-RevId: 364596526
2021-03-23 10:59:57 -07:00
Zach Koopmans 98f378d9ef Split fio read/write and randread/randwrite operations
The fio benchmark was changed to a fixed size read/write amount
because the timed benchmark was overwhelming machine memory on
tmpfs mounts.

Now rand(read|write) operations are prohibitively long, leading to timeouts.

Split the benchmarks as they were in python bm-tools: the read/write as
fixed sized (1GB) and the rand(read|write) as timed operations (15s).

PiperOrigin-RevId: 364584436
2021-03-23 10:11:26 -07:00
Ghanan Gowripalan 409a114454 Explicitly allow martian loopback packets
...instead of opting out of them.

Loopback traffic should be stack-local but gVisor has some clients
that depend on the ability to receive loopback traffic that originated
from outside of the stack. Because of this, we guard this change behind
IP protocol options.

A previous change provided the facility to deny these martian loopback
packets, but this change requires clients to opt in to accepting them, since
martian loopback packets are not meant to be accepted, as per RFC 1122
section 3.2.1.3.g:

        (g)  { 127, <any> }

             Internal host loopback address.  Addresses of this form
             MUST NOT appear outside a host.

PiperOrigin-RevId: 364581174
2021-03-23 09:57:01 -07:00
Adin Scannell 7dbd6924a3 Update apt repository to limit to supported architectures.
Fixes #5703

PiperOrigin-RevId: 364492235
2021-03-22 23:16:41 -07:00
Ayush Ranjan c0bd71c5a5 [lisa] Support dynamic types for all types.
We were only supporting dynamic struct types. With this change, users can make
any type dynamic. The tool (correctly) just blindly generates the remaining
methods needed to implement Marshallable using the 3 methods defined by the
user on the dynamic type.

This is helpful in situations like:
type StringArray []string

Added a test for such a use case.
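
A sketch of the three user-supplied methods for the `StringArray` example. The method names and the `MarshalBytes(dst []byte)` shape mirror the dynamic-type contract described above, but the code is self-contained and does not import go_marshal. The length-prefixed little-endian encoding is an assumption for illustration:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// StringArray is the example type from the commit message.
type StringArray []string

// SizeBytes returns the encoded size: a 4-byte count, then a 4-byte length
// and the bytes of each string.
func (s *StringArray) SizeBytes() int {
	n := 4
	for _, str := range *s {
		n += 4 + len(str)
	}
	return n
}

// MarshalBytes encodes s into dst, which must be at least SizeBytes() long.
func (s *StringArray) MarshalBytes(dst []byte) {
	binary.LittleEndian.PutUint32(dst, uint32(len(*s)))
	dst = dst[4:]
	for _, str := range *s {
		binary.LittleEndian.PutUint32(dst, uint32(len(str)))
		dst = dst[4:]
		copy(dst, str)
		dst = dst[len(str):]
	}
}

// UnmarshalBytes decodes src, as produced by MarshalBytes, into s.
func (s *StringArray) UnmarshalBytes(src []byte) {
	count := binary.LittleEndian.Uint32(src)
	src = src[4:]
	out := make(StringArray, 0, count)
	for i := uint32(0); i < count; i++ {
		l := binary.LittleEndian.Uint32(src)
		src = src[4:]
		out = append(out, string(src[:l]))
		src = src[l:]
	}
	*s = out
}

func main() {
	in := StringArray{"a", "bc"}
	buf := make([]byte, in.SizeBytes())
	in.MarshalBytes(buf)
	var out StringArray
	out.UnmarshalBytes(buf)
	fmt.Println(len(buf), out) // 15 [a bc]
}
```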

PiperOrigin-RevId: 364463164
2021-03-22 19:17:49 -07:00
Zeling Feng 9e86dfc9c5 Fix logs for packetimpact tests cleanup
- Don't cleanup containers in Network.Cleanup, otherwise containers will
  be killed and removed several times.
- Don't set AutoRemove for containers. This will prevent the confusing
  'removal already in progress' messages.

Fixes #3795

PiperOrigin-RevId: 364404414
2021-03-22 14:10:00 -07:00
Ghanan Gowripalan a073d76979 Return tcpip.Error from (*Stack).GetMainNICAddress
PiperOrigin-RevId: 364381970
2021-03-22 12:31:46 -07:00
Rahat Mahmood 6bd2c6ce73 Emit comment about build tags in gomarshal generated files.
This may be useful for tracking down where build tags come from and
understanding tag import issues in generated files.

PiperOrigin-RevId: 364374931
2021-03-22 12:02:03 -07:00
Nicolas Lacasse b428fd02e6 Avoid calling sync on each write in writethrough mode.
PiperOrigin-RevId: 364370595
2021-03-22 11:44:31 -07:00
Zeling Feng cbac2d9f97 Fix and merge tcp_{outside_the_window,tcp_unacc_seq_ack}_closing
The tests were not using the correct windowSize so the testing segments were
actually within the window for seqNumOffset=0 tests. The issue is already fixed
by #5674.

PiperOrigin-RevId: 364252630
2021-03-22 00:06:18 -07:00
Fabricio Voznika 7fac7e32f3 Translate syserror when validating partial IO errors
syserror allows packages to register translators for errors. These
translators should be called prior to checking if the error is valid,
otherwise it may not account for possible errors that can be returned
from different packages, e.g. safecopy.BusError => syserror.EFAULT.

Second attempt, it passes tests now :-)

PiperOrigin-RevId: 363714508
2021-03-18 12:19:57 -07:00
Zach Koopmans 29be908ab6 Address post submit comments for fs benchmarks.
Also, drop fio total reads/writes to 1GB as 10GB is
prohibitively slow.

PiperOrigin-RevId: 363714060
2021-03-18 12:14:27 -07:00
Jamie Liu 5c4f4ed9eb Skip /dev submount hack on VFS2.
containerd usually configures both /dev and /dev/shm as tmpfs mounts, e.g.:

```
  "mounts": [
    ...
    {
      "destination": "/dev",
      "type": "tmpfs",
      "source": "/run/containerd/io.containerd.runtime.v2.task/moby/10eedbd6a0e7937ddfcab90f2c25bd9a9968b734c4ae361318142165d445e67e/tmpfs",
      "options": [
        "nosuid",
        "strictatime",
        "mode=755",
        "size=65536k"
      ]
    },
    ...
    {
      "destination": "/dev/shm",
      "type": "tmpfs",
      "source": "/run/containerd/io.containerd.runtime.v2.task/moby/10eedbd6a0e7937ddfcab90f2c25bd9a9968b734c4ae361318142165d445e67e/shm",
      "options": [
        "nosuid",
        "noexec",
        "nodev",
        "mode=1777",
        "size=67108864"
      ]
    },
    ...
```

(This is mostly consistent with how Linux is usually configured, except that
/dev is conventionally devtmpfs, not regular tmpfs. runc/libcontainer
implements OCI-runtime-spec-undocumented behavior to create
/dev/{ptmx,fd,stdin,stdout,stderr} in non-bind /dev mounts. runsc silently
switches /dev to devtmpfs. In VFS1, this is necessary to get device files like
/dev/null at all, since VFS1 doesn't support real device special files, only
what is hardcoded in devfs. VFS2 does support device special files, but using
devtmpfs is the easiest way to get pre-created files in /dev.)

runsc ignores many /dev submounts in the spec, including /dev/shm. In VFS1,
this appears to be to avoid introducing a submount overlay for /dev, and is
mostly fine since the typical mode for the /dev/shm mount is ~consistent with
the mode of the /dev/shm directory provided by devfs (modulo the sticky bit).
In VFS2, this is vestigial (VFS2 does not use submount overlays), and devtmpfs'
/dev/shm mode is correct for the mount point but not the mount. So turn off
this behavior for VFS2.

After this change:

```
$ docker run --rm -it ubuntu:focal ls -lah /dev/shm
total 0
drwxrwxrwt 2 root root  40 Mar 18 00:16 .
drwxr-xr-x 5 root root 360 Mar 18 00:16 ..

$ docker run --runtime=runsc --rm -it ubuntu:focal ls -lah /dev/shm
total 0
drwxrwxrwx 1 root root 0 Mar 18 00:16 .
dr-xr-xr-x 1 root root 0 Mar 18 00:16 ..

$ docker run --runtime=runsc-vfs2 --rm -it ubuntu:focal ls -lah /dev/shm
total 0
drwxrwxrwt 2 root root  40 Mar 18 00:16 .
drwxr-xr-x 5 root root 320 Mar 18 00:16 ..
```

Fixes #5687

PiperOrigin-RevId: 363699385
2021-03-18 11:12:43 -07:00
Ghanan Gowripalan d3a433caae Do not use martian loopback packets in tests
Transport demuxer and UDP tests should not use a loopback address as the
source address for packets injected into the stack as martian loopback
packets will be dropped in a later change.

PiperOrigin-RevId: 363479681
2021-03-17 12:29:08 -07:00
Ghanan Gowripalan 4065604e1b Drop loopback traffic from outside of the stack
Loopback traffic should be stack-local but gVisor has some clients
that depend on the ability to receive loopback traffic that originated
from outside of the stack. Because of this, we guard this change behind
IP protocol options.

Test: integration_test.TestExternalLoopbackTraffic
PiperOrigin-RevId: 363461242
2021-03-17 11:12:06 -07:00
Andrei Vagin 2f3dac78ca kvm: prefault a floating point state before restoring it
If physical pages of a memory region are not mapped yet, the kernel will
trigger KVM_EXIT_MMIO and we will map physical pages in bluepillHandler().

An instruction that triggered a fault will not be re-executed; instead, it
will be emulated in the kernel, but the kernel can't emulate complex
instructions like xsave and xrstor. We can touch the memory with simple
instructions to work around this problem.
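
The sketch below shows the prefault idea. It touches one byte per page with a plain load and store, so that any missing mapping is resolved by a simple access rather than inside xrstor. `prefault` is hypothetical, and the 4 KiB page size and page-aligned buffer are assumptions. Real code would need assembly (or similar) to guarantee the compiler keeps these accesses:

```go
package main

import "fmt"

// pageSize is assumed to be 4 KiB for this sketch.
const pageSize = 4096

// prefault loads and stores back one byte in every page of buf (assumed to
// start on a page boundary), leaving the contents unchanged, and returns
// the number of pages touched.
func prefault(buf []byte) (touched int) {
	for i := 0; i < len(buf); i += pageSize {
		v := buf[i]
		buf[i] = v
		touched++
	}
	return touched
}

func main() {
	state := make([]byte, 3*pageSize+100) // spans four pages
	fmt.Println(prefault(state))          // 4
}
```
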
2021-03-16 21:55:20 -07:00
Zeling Feng 3dd7ad13b4 Fix tcp_fin_retransmission_netstack_test
Netstack does not check ACK number for FIN-ACK packets and goes into TIMEWAIT
unconditionally. Fixing the state machine will give us back the retransmission
of FIN.
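
For reference, the sketch below shows the RFC 793 transition that the fix restores. When a FIN arrives in FIN-WAIT-1, the next state depends on whether the segment's ACK covers our own FIN. Because a FIN consumes one sequence number and nothing is sent after it, "covers" means `ack == finSeq+1`. The `state` type and `onFIN` are hypothetical, not netstack's code:

```go
package main

import "fmt"

type state int

const (
	finWait1 state = iota
	finWait2
	closing
	timeWait
)

func (s state) String() string {
	return [...]string{"FIN-WAIT-1", "FIN-WAIT-2", "CLOSING", "TIME-WAIT"}[s]
}

// onFIN returns the next state when a segment carrying FIN arrives. finSeq
// is the sequence number of our FIN; ack is the segment's ACK number.
func onFIN(s state, ack, finSeq uint32) state {
	switch s {
	case finWait1:
		if ack == finSeq+1 { // our FIN is acknowledged (wraps correctly)
			return timeWait
		}
		return closing // our FIN is still unacknowledged; keep retransmitting it
	case finWait2:
		return timeWait
	}
	return s
}

func main() {
	fmt.Println(onFIN(finWait1, 1000, 1000)) // CLOSING
	fmt.Println(onFIN(finWait1, 1001, 1000)) // TIME-WAIT
}
```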

PiperOrigin-RevId: 363301883
2021-03-16 16:59:26 -07:00
Mithun Iyer 5eede4e756 Fix a race with synRcvdCount and accept
There is a race in handling new incoming connections on a listening
endpoint that causes the endpoint to reply to more incoming SYNs than
what is permitted by the listen backlog.

The race occurs when there is a successful passive connection handshake
and the synRcvdCount counter is decremented, followed by the endpoint
being delivered to the accept queue. In the window of time between
synRcvdCount decrementing and the endpoint being enqueued for accept,
new incoming SYNs can be handled without honoring the listen backlog
value, as the backlog could be perceived as not full.

Fixes #5637

PiperOrigin-RevId: 363279372
2021-03-16 15:08:09 -07:00
Kevin Krakauer 607a1e481c setgid directory support in overlayfs
PiperOrigin-RevId: 363276495
2021-03-16 14:55:29 -07:00
Ghanan Gowripalan 05193de1cc Unexport methods on NDPOption
They are not used outside of the header package.

PiperOrigin-RevId: 363237708
2021-03-16 12:04:52 -07:00
Ghanan Gowripalan 68065d1ceb Detect looped-back NDP DAD messages
...as per RFC 7527.

If a looped-back DAD message is received, do not fail DAD since our own
DAD message does not indicate that a neighbor has the address assigned.
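
RFC 7527 detects looped-back solicitations with the Nonce option from RFC 3971. A node remembers the nonce it sent in its own DAD Neighbor Solicitation. A received solicitation carrying that same nonce is its own message coming back and is not a conflict. The sketch below is minimal: `dadState`, `newDADState` and `isConflict` are hypothetical, and the 6-byte nonce is the minimum length RFC 3971 allows:

```go
package main

import (
	"bytes"
	"crypto/rand"
	"fmt"
)

// dadState records the nonce sent for one tentative address.
type dadState struct {
	nonce []byte
}

// newDADState picks a random 6-byte nonce.
func newDADState() (*dadState, error) {
	n := make([]byte, 6)
	if _, err := rand.Read(n); err != nil {
		return nil, err
	}
	return &dadState{nonce: n}, nil
}

// isConflict reports whether a received DAD solicitation for the tentative
// address indicates another node. A solicitation carrying our own nonce is
// looped back and must not fail DAD.
func (d *dadState) isConflict(receivedNonce []byte) bool {
	return !bytes.Equal(receivedNonce, d.nonce)
}

func main() {
	d := &dadState{nonce: []byte{1, 2, 3, 4, 5, 6}}
	fmt.Println(d.isConflict([]byte{1, 2, 3, 4, 5, 6})) // false: looped back
	fmt.Println(d.isConflict([]byte{9, 9, 9, 9, 9, 9})) // true: another node
}
```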

Test: ndp_test.TestDADResolveLoopback
PiperOrigin-RevId: 363224288
2021-03-16 11:09:26 -07:00
Ghanan Gowripalan ebd7c1b889 Do not call into Stack from LinkAddressRequest
Calling into the stack from LinkAddressRequest is not needed as we
already have a reference to the network endpoint (IPv6) or network
interface (IPv4/ARP).

PiperOrigin-RevId: 363213973
2021-03-16 10:29:49 -07:00
Etienne Perot f7e841c2ce Turn sys_thread constants into variables.
PiperOrigin-RevId: 363092268
2021-03-15 20:16:48 -07:00
Etienne Perot f4b7421820 Move `MaxIovs` back to a variable in `iovec.go`.
PiperOrigin-RevId: 363091954
2021-03-15 20:11:41 -07:00
Fabricio Voznika 34d0d72067 Deflake proc_test_native
Terminating tasks from other tests can mess up the task
list of the current test. Tests were changed to look for added/removed
tasks, ignoring other tasks that may exist while the test is running.
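
The approach above amounts to comparing two snapshots of the task list as sets. The sketch below uses a hypothetical `taskDiff` helper, not the test's actual code. It returns only the tasks that appeared or disappeared, so a test can assert on the tasks it created or killed and ignore unrelated ones:

```go
package main

import (
	"fmt"
	"sort"
)

// taskDiff returns the TIDs present in after but not before (added) and
// in before but not after (removed), each sorted.
func taskDiff(before, after []int) (added, removed []int) {
	inBefore := make(map[int]bool, len(before))
	for _, t := range before {
		inBefore[t] = true
	}
	inAfter := make(map[int]bool, len(after))
	for _, t := range after {
		inAfter[t] = true
		if !inBefore[t] {
			added = append(added, t)
		}
	}
	for _, t := range before {
		if !inAfter[t] {
			removed = append(removed, t)
		}
	}
	sort.Ints(added)
	sort.Ints(removed)
	return added, removed
}

func main() {
	// Task 5 (from another test) exits while task 9 (ours) appears.
	added, removed := taskDiff([]int{1, 5}, []int{1, 9})
	fmt.Println(added, removed) // [9] [5]
}
```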

PiperOrigin-RevId: 363084261
2021-03-15 19:06:03 -07:00
Kevin Krakauer b1d5787726 Make netstack (//pkg/tcpip) buildable for 32 bit
Doing so involved breaking dependencies between //pkg/tcpip and the rest
of gVisor, which are discouraged anyway.

Tested on the Go branch via:
  gvisor.dev/gvisor/pkg/tcpip/...

Addresses #1446.

PiperOrigin-RevId: 363081778
2021-03-15 18:49:59 -07:00