Commit Graph

4066 Commits

Author SHA1 Message Date
Bhasker Hariharan b69352245a Fix Accept to not return error for sockets in accept queue.
Accept on gVisor will return an error if a socket in the accept queue was closed
before Accept() was called. Linux will return the new fd even if the returned
socket is already closed by the peer say due to a RST being sent by the peer.

This seems to be intentional in linux more details on the github issue.

Fixes #3780

PiperOrigin-RevId: 329828404
2020-09-02 18:21:47 -07:00
Ayush Ranjan 1fec861939 [vfs] Implement xattr for overlayfs.
PiperOrigin-RevId: 329825497
2020-09-02 17:58:05 -07:00
Ayush Ranjan 0ca0d8e011 [vfs] Fix error handling in overlayfs OpenAt.
Updates #1199

PiperOrigin-RevId: 329802274
2020-09-02 15:43:13 -07:00
Jamie Liu 5c66011200 Update Go version constraint on sync/spin_unsafe.go.
PiperOrigin-RevId: 329801584
2020-09-02 15:37:26 -07:00
Jamie Liu 9bd0164237 Improve sync.SeqCount performance.
- Make sync.SeqCountEpoch not a struct. This allows sync.SeqCount.BeginRead()
  to be inlined.

- Mark sync.SeqAtomicLoad<T> nosplit to mitigate the Go compiler's refusal to
  inline it. (Best I could get was "cost 92 exceeds budget 80".)

- Use runtime-guided spinning in SeqCount.BeginRead().

Benchmarks:
name                               old time/op  new time/op   delta
pkg:pkg/sync/sync goos:linux goarch:amd64
SeqCountWriteUncontended-12        8.24ns ± 0%  11.40ns ± 0%  +38.35%  (p=0.000 n=10+10)
SeqCountReadUncontended-12         0.33ns ± 0%   0.14ns ± 3%  -57.77%  (p=0.000 n=7+8)
pkg:pkg/sync/seqatomictest/seqatomic goos:linux goarch:amd64
SeqAtomicLoadIntUncontended-12     0.64ns ± 1%   0.41ns ± 1%  -36.40%  (p=0.000 n=10+8)
SeqAtomicTryLoadIntUncontended-12  0.18ns ± 4%   0.18ns ± 1%     ~     (p=0.206 n=10+8)
AtomicValueLoadIntUncontended-12   0.27ns ± 3%   0.27ns ± 0%   -1.77%  (p=0.000 n=10+8)

(atomic.Value.Load is, of course, inlined. We would expect an uncontended
inline SeqAtomicLoad<int> to perform identically to SeqAtomicTryLoad<int>.) The
"regression" in BenchmarkSeqCountWriteUncontended, despite this CL changing
nothing in that path, is attributed to microarchitectural subtlety; the
benchmark loop is unchanged except for its address:

Before this CL:
  :0                    0x4e62d1                48ffc2                  INCQ DX
  :0                    0x4e62d4                48399110010000          CMPQ DX, 0x110(CX)
  :0                    0x4e62db                7e26                    JLE 0x4e6303
  :0                    0x4e62dd                90                      NOPL
  :0                    0x4e62de                bb01000000              MOVL $0x1, BX
  :0                    0x4e62e3                f00fc118                LOCK XADDL BX, 0(AX)
  :0                    0x4e62e7                ffc3                    INCL BX
  :0                    0x4e62e9                0fbae300                BTL $0x0, BX
  :0                    0x4e62ed                733a                    JAE 0x4e6329
  :0                    0x4e62ef                90                      NOPL
  :0                    0x4e62f0                bb01000000              MOVL $0x1, BX
  :0                    0x4e62f5                f00fc118                LOCK XADDL BX, 0(AX)
  :0                    0x4e62f9                ffc3                    INCL BX
  :0                    0x4e62fb                0fbae300                BTL $0x0, BX
  :0                    0x4e62ff                73d0                    JAE 0x4e62d1

After this CL:
  :0                    0x4e6361                48ffc2                  INCQ DX
  :0                    0x4e6364                48399110010000          CMPQ DX, 0x110(CX)
  :0                    0x4e636b                7e26                    JLE 0x4e6393
  :0                    0x4e636d                90                      NOPL
  :0                    0x4e636e                bb01000000              MOVL $0x1, BX
  :0                    0x4e6373                f00fc118                LOCK XADDL BX, 0(AX)
  :0                    0x4e6377                ffc3                    INCL BX
  :0                    0x4e6379                0fbae300                BTL $0x0, BX
  :0                    0x4e637d                733a                    JAE 0x4e63b9
  :0                    0x4e637f                90                      NOPL
  :0                    0x4e6380                bb01000000              MOVL $0x1, BX
  :0                    0x4e6385                f00fc118                LOCK XADDL BX, 0(AX)
  :0                    0x4e6389                ffc3                    INCL BX
  :0                    0x4e638b                0fbae300                BTL $0x0, BX
  :0                    0x4e638f                73d0                    JAE 0x4e6361

PiperOrigin-RevId: 329754148
2020-09-02 11:37:31 -07:00
Zach Koopmans b9b6660dc4 Add Docs to nginx benchmark.
Adds docs to nginx and refactors both Httpd and Nginx benchmarks.

Key changes:
- Add docs and make nginx tests the same as httpd (reverse, all docs, etc.).
- Make requests scale on c * b.N -> a request per thread. This works well
with both --test.benchtime=10m (do a run that lasts at least 10m) and
--test.benchtime=10x (do b.N = 10).
-- Remove a doc from both tests (1000Kb) as 1024Kb exists.

PiperOrigin-RevId: 329751091
2020-09-02 11:22:17 -07:00
Ayush Ranjan 8ab08cdc01 [runtime tests] Exclude flaky nodejs test
PiperOrigin-RevId: 329749191
2020-09-02 11:13:02 -07:00
gVisor bot a0e4310384 Merge pull request #3822 from btw616:fix/issue-3821
PiperOrigin-RevId: 329710371
2020-09-02 07:42:19 -07:00
Zach Koopmans 563f28b7d5 Fix statfs test for opensource.
PiperOrigin-RevId: 329638946
2020-09-01 21:03:48 -07:00
Fabricio Voznika 37a217aca4 Implement setattr+clunk in 9P
This is to cover the common pattern: open->read/write->close,
where SetAttr needs to be called to update atime/mtime before
the file is closed.

Benchmark results:

BM_OpenReadClose/10240 CPU
setattr+clunk: 63783 ns
VFS2:          68109 ns
VFS1:          72507 ns

Updates #1198

PiperOrigin-RevId: 329628461
2020-09-01 19:22:12 -07:00
Mithun Iyer 40faeaa180 Fix handling of unacceptable ACKs during close.
On receiving an ACK with unacceptable ACK number, in a closing state,
TCP, needs to reply back with an ACK with correct seq and ack numbers and
remain in same state. This change is as per RFC793 page 37, but with a
difference that it does not apply to ESTABLISHED state, just as in Linux.
Also add more tests to check for OTW sequence number and unacceptable
ack numbers in these states.

Fixes #3785

PiperOrigin-RevId: 329616283
2020-09-01 17:45:04 -07:00
Dean Deng c67d8ece09 Test opening file handles with different permissions.
These were problematic for vfs2 gofers before correctly implementing separate
read/write handles.

PiperOrigin-RevId: 329613261
2020-09-01 17:21:22 -07:00
Ayush Ranjan 2eaf54dd59 Refactor tty codebase to use master-replica terminology.
Updates #2972

PiperOrigin-RevId: 329584905
2020-09-01 14:43:41 -07:00
Nayana Bidari 04c284f8c2 Fix panic when calling dup2().
PiperOrigin-RevId: 329572337
2020-09-01 13:41:01 -07:00
Ayush Ranjan 723fb5c116 [go-marshal] Enable auto-marshalling for fs/tty.
PiperOrigin-RevId: 329564614
2020-09-01 13:02:17 -07:00
Fabricio Voznika 71589b7f7e Let flags be overriden from OCI annotations
This allows runsc flags to be set per sandbox instance. For
example, K8s pod annotations can be used to enable
--debug for a single pod, making troubleshoot much easier.
Similarly, features like --vfs2 can be enabled for
experimentation without affecting other pods in the node.

Closes #3494

PiperOrigin-RevId: 329542815
2020-09-01 11:12:19 -07:00
Nayana Bidari 0eae08bc9e Automated rollback of changelist 328350576
PiperOrigin-RevId: 329526153
2020-09-01 09:54:55 -07:00
Ian Lewis f4be726fde Use 1080p background image.
This makes the background image on the top page 1/3 as big and allows it to
load in roughly half the time.

PiperOrigin-RevId: 329462030
2020-09-01 01:28:19 -07:00
Tiwei Bie 66ee7c0e98 Dup stdio FDs for VFS2 when starting a child container
Currently the stdio FDs are not dupped and will be closed
unexpectedly in VFS2 when starting a child container. This
patch fixes this issue.

Fixes: #3821

Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
2020-09-01 15:51:08 +08:00
Zach Koopmans 6748438493 Fix bug in bazel build benchmark.
PiperOrigin-RevId: 329409802
2020-08-31 17:17:09 -07:00
Adin Scannell 101c97d6f8 Change nogo failures to test failures, instead of build failures.
PiperOrigin-RevId: 329408633
2020-08-31 17:09:20 -07:00
Jay Zhuang 170560cec0 Set errno on response when syscall actually fails
This prevents setting stale errno on responses.

Also fixes TestDiscardsUDPPacketsWithMcastSourceAddressV6 to use correct
multicast addresses in test.

Fixes #3793

PiperOrigin-RevId: 329391155
2020-08-31 15:31:42 -07:00
Jamie Liu 6cdfa4fee0 Don't use read-only host FD for writable gofer dentries in VFS2.
As documented for gofer.dentry.hostFD.

PiperOrigin-RevId: 329372319
2020-08-31 13:57:19 -07:00
Tamir Duberstein 9d0d82088a Remove __fuchsia__ defines
These mostly guard linux-only headers; check for linux instead.

PiperOrigin-RevId: 329362762
2020-08-31 13:10:56 -07:00
gVisor bot 911cecaa34 Implement walk in gvisor verity fs
Implement walk directories in gvisor verity file system. For each step,
the child dentry is verified against a verified parent root hash.

PiperOrigin-RevId: 329358747
2020-08-31 12:52:21 -07:00
Ting-Yu Wang ba25485d96 stateify: Bring back struct field and type names in pretty print
PiperOrigin-RevId: 329349158
2020-08-31 12:06:00 -07:00
Rahat Mahmood a3d189301d Run syscall tests in uts namespaces.
Some syscall tests, namely uname_test_* modify the host and domain
name, which modifies the execution environment and can have unintended
consequences on other tests. For example, modifying the hostname
causes some networking tests to fail DNS lookups. Run all syscall
tests in their own uts namespaces to isolate these changes.

PiperOrigin-RevId: 329348127
2020-08-31 11:58:45 -07:00
Ian Lewis 3bee863aee Add code search badge
PiperOrigin-RevId: 329042549
2020-08-28 18:09:13 -07:00
Nicolas Lacasse f6ddcbefac Fix kernfs.Dentry reference leak.
PiperOrigin-RevId: 329036994
2020-08-28 17:20:17 -07:00
Tamir Duberstein d3057717dc Include command output on error
Currently the logs produce

  TestOne: packetimpact_test.go:182: listing devices on ... container: process terminated with status: 126

which is not actionable; presumably the `ip` command output is interesting.

PiperOrigin-RevId: 329032105
2020-08-28 16:45:18 -07:00
Ghanan Gowripalan d5787f628c Don't bind loopback to all IPs in an IPv6 subnet
An earlier change considered the loopback bound to all addresses in an
assigned subnet. This should have only be done for IPv4 to maintain
compatability with Linux:

```
$ ip addr show dev lo
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group ...
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
$ ping 2001:db8::1
PING 2001:db8::1(2001:db8::1) 56 data bytes
^C
--- 2001:db8::1 ping statistics ---
4 packets transmitted, 0 received, 100% packet loss, time 3062ms

$ ping 2001:db8::2
PING 2001:db8::2(2001:db8::2) 56 data bytes
^C
--- 2001:db8::2 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2030ms

$ sudo ip addr add 2001:db8::1/64 dev lo
$ ping 2001:db8::1
PING 2001:db8::1(2001:db8::1) 56 data bytes
64 bytes from 2001:db8::1: icmp_seq=1 ttl=64 time=0.055 ms
64 bytes from 2001:db8::1: icmp_seq=2 ttl=64 time=0.074 ms
64 bytes from 2001:db8::1: icmp_seq=3 ttl=64 time=0.073 ms
64 bytes from 2001:db8::1: icmp_seq=4 ttl=64 time=0.071 ms
^C
--- 2001:db8::1 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3075ms
rtt min/avg/max/mdev = 0.055/0.068/0.074/0.007 ms
$ ping 2001:db8::2
PING 2001:db8::2(2001:db8::2) 56 data bytes
From 2001:db8::1 icmp_seq=1 Destination unreachable: No route
From 2001:db8::1 icmp_seq=2 Destination unreachable: No route
From 2001:db8::1 icmp_seq=3 Destination unreachable: No route
From 2001:db8::1 icmp_seq=4 Destination unreachable: No route
^C
--- 2001:db8::2 ping statistics ---
4 packets transmitted, 0 received, +4 errors, 100% packet loss, time 3070ms
```

Test: integration_test.TestLoopbackAcceptAllInSubnet
PiperOrigin-RevId: 329011566
2020-08-28 14:39:30 -07:00
Rahat Mahmood b4820e5986 Implement StatFS for various VFS2 filesystems.
This mainly involved enabling kernfs' client filesystems to provide a
StatFS implementation.

Fixes #3411, #3515.

PiperOrigin-RevId: 329009864
2020-08-28 14:31:11 -07:00
Ghanan Gowripalan bdd5996a73 Improve type safety for network protocol options
The existing implementation for NetworkProtocol.{Set}Option take
arguments of an empty interface type which all types (implicitly)
implement; any type may be passed to the functions.

This change introduces marker interfaces for network protocol options
that may be set or queried which network protocol option types implement
to ensure that invalid types are caught at compile time. Different
interfaces are used to allow the compiler to enforce read-only or
set-only socket options.

PiperOrigin-RevId: 328980359
2020-08-28 11:50:17 -07:00
Dean Deng 8b9cb36d1c Fix EOF handling for splice.
Also, add corresponding EOF tests for splice/sendfile.

Discovered by syzkaller.

PiperOrigin-RevId: 328975990
2020-08-28 11:28:28 -07:00
Kevin Krakauer b3ff31d041 fix panic when calling SO_ORIGINAL_DST without initializing iptables
Reported-by: syzbot+074ec22c42305725b79f@syzkaller.appspotmail.com
PiperOrigin-RevId: 328963899
2020-08-28 10:35:18 -07:00
Tamir Duberstein 7bc9f9b47f Add test demonstrating accept bug
Updates #3780.

PiperOrigin-RevId: 328922573
2020-08-28 05:33:49 -07:00
Ghanan Gowripalan 8ae0ab722c Use a single NetworkEndpoint per address
This change was already done as of
https://github.com/google/gvisor/commit/1736b2208f but
https://github.com/google/gvisor/commit/a174aa7597 conflicted with that
change and it was missed in reviews.

This change fixes the conflict.

PiperOrigin-RevId: 328920372
2020-08-28 05:09:15 -07:00
Ayush Ranjan 421e35020b [go-marshal] Enable auto-marshalling for tundev.
PiperOrigin-RevId: 328863725
2020-08-27 19:27:18 -07:00
Dean Deng 84f04909c2 Fix vfs2 pipe behavior when splicing to a non-pipe.
Fixes *.sh Java runtime tests, where splice()-ing from a pipe to /dev/zero
would not actually empty the pipe.

There was no guarantee that the data would actually be consumed on a splice
operation unless the output file's implementation of Write/PWrite actually
called VFSPipeFD.CopyIn. Now, whatever bytes are "written" are consumed
regardless of whether CopyIn is called or not.

Furthermore, the number of bytes in the IOSequence for reads is now capped at
the amount of data actually available. Before, splicing to /dev/zero would
always return the requested splice size without taking the actual available
data into account.

This change also refactors the case where an input file is spliced into an
output pipe so that it follows a similar pattern, which is arguably cleaner
anyway.

Updates #3576.

PiperOrigin-RevId: 328843954
2020-08-27 16:57:40 -07:00
Andrei Vagin dc008fbbcc unix: return ECONNREFUSE if a socket file exists but a socket isn't bound to it
PiperOrigin-RevId: 328843560
2020-08-27 16:52:02 -07:00
Ayush Ranjan 57877b420c [go-marshal] Support for usermem.IOOpts.
PiperOrigin-RevId: 328839759
2020-08-27 16:30:05 -07:00
Ghanan Gowripalan 6f8fb7e0db Improve type safety for socket options
The existing implementation for {G,S}etSockOpt take arguments of an
empty interface type which all types (implicitly) implement; any
type may be passed to the functions.

This change introduces marker interfaces for socket options that may be
set or queried which socket option types implement to ensure that invalid
types are caught at compile time. Different interfaces are used to allow
the compiler to enforce read-only or set-only socket options.

Fixes #3714.

RELNOTES: n/a
PiperOrigin-RevId: 328832161
2020-08-27 15:46:44 -07:00
gVisor bot 29d528399c Merge pull request #3077 from jinmouil:beef-write-syscall
PiperOrigin-RevId: 328824023
2020-08-27 15:03:40 -07:00
Zach Koopmans 26c588f063 Fix BadSocketPair for open source.
BadSocketPair test will return several errnos (EPREM, ESOCKTNOSUPPORT,
EAFNOSUPPORT) meaning the test is just too specific. Checking the syscall
fails is appropriate.

PiperOrigin-RevId: 328813071
2020-08-27 14:11:23 -07:00
Ghanan Gowripalan a5f1e74260 Skip IPv6UDPUnboundSocketNetlinkTest on native linux
...while we figure out of we want to consider the loopback interface
bound to all IPs in an assigned IPv6 subnet, or not (to maintain
compatibility with Linux).

PiperOrigin-RevId: 328807974
2020-08-27 13:45:36 -07:00
Ghanan Gowripalan dc81eb9c37 Add function to get error from a tcpip.Endpoint
In an upcoming CL, socket option types are made to implement a marker
interface with pointer receivers. Since this results in calling methods
of an interface with a pointer, we incur an allocation when attempting
to get an Endpoint's last error with the current implementation.

When calling the method of an interface, the compiler is unable to
determine what the interface implementation does with the pointer
(since calling a method on an interface uses virtual dispatch at runtime
so the compiler does not know what the interface method will do) so it
allocates on the heap to be safe incase an implementation continues to
hold the pointer after the functioon returns (the reference escapes the
scope of the object).

In the example below, the compiler does not know what b.foo does with
the reference to a it allocates a on the heap as the reference to a may
escape the scope of a.
```
var a int
var b someInterface
b.foo(&a)
```

This change removes the opportunity for that allocation.

RELNOTES: n/a
PiperOrigin-RevId: 328796559
2020-08-27 12:50:19 -07:00
Kevin Krakauer 01a35a2f19 ip6tables: (de)serialize ip6tables structs
More implementation+testing to follow.

#3549.

PiperOrigin-RevId: 328770160
2020-08-27 10:53:49 -07:00
Zach Koopmans 140ffb6007 Fix JobControl tests for open source.
ioctl calls with TIOCSCTTY fail if the calling process already has a
controlling terminal, which occurs on a 5.4 kernel like our Ubuntu 18 CI.
Thus, run tests calling ioctl TTOCSCTTY in clean subprocess.

Also, while we're here, switch out non-inclusive master/slave for main/replica.

PiperOrigin-RevId: 328756598
2020-08-27 09:52:49 -07:00
Fabricio Voznika 32e7a54f7f Make flag propagation automatic
Use reflection and tags to provide automatic conversion from
Config to flags. This makes adding new flags less error-prone,
skips flags using default values (easier to read), and makes
tests correctly use default flag values for test Configs.

Updates #3494

PiperOrigin-RevId: 328662070
2020-08-26 20:24:41 -07:00
gVisor bot a4b1c6f5a4 Merge pull request #3742 from lubinszARM:pr_n1_1
PiperOrigin-RevId: 328639254
2020-08-26 17:10:16 -07:00