Previously, we did not check the kcov mode when performing task work. As a
result, disabling kcov did not do anything.
Also avoid expensive atomic RMW when consuming coverage data. We don't need the
swap if the value is already zero (which is most of the time), and it is ok if
there are slight inconsistencies due to a race between coverage data generation
(incrementing the value) and consumption (reading a nonzero value and writing
zero).
PiperOrigin-RevId: 334049207
Regarding ThreadCpuTimeArray.java: The test starts 10 threads, each of which
does some computation, then blocks. When all threads are blocked, the test
sleeps for 200ms, then checks that less than 100ns of CPU time in userspace
elapse over the course of the sleep; AFAICT, the 100ns of slop is because a
thread indicates that it's in the WAITING state before it actually blocks, and
because signals can cause threads to be temporarily woken. gVisor's CPU clocks
have a granularity of 10ms (the interval of Kernel.cpuClockTicker is
//pkg/abi/linux.ClockTick), so a single tick pushes the test over the
threshold.
PiperOrigin-RevId: 333830287
segment_queue today has its own standalone limit of MaxUnprocessedSegments but
this can be a problem in UnlockUser() we do not release the lock till there are
segments to be processed. What can happen is as handleSegments dequeues packets
more keep getting queued and we will never release the lock. This can keep
happening even if the receive buffer is full because nothing can read() till we
release the lock.
Further having a separate limit for pending segments makes it harder to track
memory usage etc. Unifying the limits makes it easier to reason about memory in
use and makes the overall buffer behaviour more consistent.
PiperOrigin-RevId: 333508122
Mostly simplifies SKIP_IF statements and adds some more documentation.
Also, mknod is now supported by gofer fs, so remove SKIP_IFs related to this.
PiperOrigin-RevId: 333449932
"DefaultValueEqZero" is only valid if the test is in a
sandbox. Our CI VMs often have "/proc/sys/net/ipv4/ip_forward" set
to 1.
PiperOrigin-RevId: 332910859
`recv` calls with MSG_DONTWAIT can fail with EAGAIN randomly
in tests. Fix this by calling `select` on sockets with a timeout
prior to attempting a `recv`.
PiperOrigin-RevId: 332873735
Unfortunately, I think TSC misalignment means that we can't really expect any
consistent correspondence between a TSC-based VDSO and the sentry's view of
time on the KVM platform.
PiperOrigin-RevId: 332576147
This is more consistent with Linux (see comment on MM.NewSharedAnonMappable()).
We don't do the same thing on VFS1 for reasons documented by the updated
comment.
PiperOrigin-RevId: 332514849
TCP needs to enqueue any send requests arriving when the connection is in
SYN_SENT state. The data should be sent out soon after completion of the
connection handshake.
Fixes#3995
PiperOrigin-RevId: 332482041
SO_LINGER is a socket level option and should be stored on all endpoints even
though it is used to linger only for TCP endpoints.
PiperOrigin-RevId: 332369252
Docker does not have IPv6 port forwarding as tracked by the following issue:
https://github.com/moby/moby/issues/11518
So when running bazel itself inside a docker container, we can not use the host
port bindings to communicate with sockets inside the container. This was causing
integration tests and image tests to fail when run through our Makefile targets.
PiperOrigin-RevId: 332355051
There are two device names on the test net.
- The sniffer/injector device which is always a linux device. Only the
testbench library is interested in this device.
- The device which is on the DUT. It happens to be the same device as
the former if DUT is linux. An individual test might be interested in
this device if the test cares about the device name.
PiperOrigin-RevId: 332112968
Added a README describing what these tests are, how they work and how to run
them locally. Also reorganized the exclude files into a directory.
PiperOrigin-RevId: 332079697
When a broadcast packet is received by the stack, the packet should be
delivered to each endpoint that may be interested in the packet. This
includes all any address and specified broadcast address listeners.
Test: integration_test.TestReuseAddrAndBroadcast
PiperOrigin-RevId: 332060652
This commit implements FUSE_SETATTR command. When a system call modifies
the metadata of a regular file or a folder by chown(2), chmod(2),
truncate(2), utime(2), or utimes(2), they should be translated to
corresponding FUSE_SETATTR command and sent to the FUSE server.
Fixes#3332
According to Linux 4.4's FUSE behavior, the flags and fh attributes in
FUSE_GETATTR are only used in read, write, and lseek. fstat(2) doesn't
use them either. Add tests to ensure the requests sent from FUSE module
are consistent with Linux's.
Updates #3655
This change adds the following:
- Add support for containerizing syscall tests for FUSE
- Mount tmpfs in the container so we can run benchmarks against it
- Run the server in a background process
- benchmarks for fuse syscall
Co-authored-by: Ridwan Sharif <ridwanmsharif@google.com>
Adds a function for the testing thread to set up a fake inode with a
specific path under mount point. After this function is called, each
subsequent FUSE_LOOKUP request with the same path will be served with
the fixed stub response.
Fixes#3539
This commit adds 3 utility functions to ensure all received requests
and preset responses are consumed.
1. Get number of unconsumed requests (received by the FUSE server but
not consumed by the testing thread).
2. Get number of unsent responses (set by the testing thread but not
processed by the FUSE server).
3. Get total bytes of the received requests (to ensure some operations
don't trigger FUSE requests).
Fixes#3607
Original FUSE integration test has limited capabilities. To test more
situations, the new integration test framework introduces a protocol
to communicate between testing thread and the FUSE server. In summary,
this change includes:
1. Remove CompareResult() and break SetExpected() into
SetServerResponse() and GetServerActualRequest(). We no longer set
up an expected request because we want to retrieve the actual FUSE
request made to the FUSE server and check in the testing thread.
2. Declare a serial buffer data structure to save the received requests
and expected responses sequentially. The data structure contains a
cursor to indicate the progress of accessing. This change makes
sequential SetServerResponse() and GetServerActualRequest() possible.
3. Replace 2 single directional pipes with 1 bi-directional socketpair.
A protocol which starts with FuseTestCmd is used between the testing
thread and the FUSE server to provide various functionality.
Fixes#3405
Make setting STATX_SIZE a no-op, if it is valid for the given permissions and
file type.
Also update proc tests, which were overfitted before.
Fixes#3842.
Updates #1193.
PiperOrigin-RevId: 331861087
Currently the returned offset is an index, and we can't
use it to find the next fd to serialize, because getdents
should iterate correctly despite mutation of fds. Instead,
we can return the next fd to serialize plus 2 (which
accounts for "." and "..") as the offset.
Fixes: #3894
Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
gVisor stack ignores RSTs when in TIME_WAIT which is not the default
Linux behavior. Add a packetimpact test to test the same.
Also update code comments to reflect the rationale for the current
gVisor behavior.
PiperOrigin-RevId: 331629879
These operations require CAP_SYS_ADMIN in the root user namespace. There's no
easy way to check that other than trying the operation and seeing what happens.
PiperOrigin-RevId: 331242256
Overlayfs does not persist a directory's inode number even while it is mounted.
See fs/overlayfs/inode.c:ovl_map_dev_ino(). VFS2 generates a new inode number
for directories everytime in lookup.
PiperOrigin-RevId: 331045037
Overlayfs intentionally does not compute nlink for directories (because it can
be really expensive). Linux returns 1, VFS2 returns 2 and VFS1 actually
calculates the correct value.
PiperOrigin-RevId: 330967139
- BindSocketThenOpen test was expecting the incorrect error when opening
a socket. Fixed that.
- VirtualFilesystem.BindEndpointAt should not require pop.Path.Begin.Ok()
because the filesystem implementations do not need to walk to the parent
dentry. This check also exists for MknodAt, MkdirAt, RmdirAt, SymlinkAt and
UnlinkAt but those filesystem implementations also need to walk to the parent
denty. So that check is valid. Added some syscall tests to test this.
PiperOrigin-RevId: 330625220
The existing implementation for TransportProtocol.{Set}Option take
arguments of an empty interface type which all types (implicitly)
implement; any type may be passed to the functions.
This change introduces marker interfaces for transport protocol options
that may be set or queried which transport protocol option types
implement to ensure that invalid types are caught at compile time.
Different interfaces are used to allow the compiler to enforce read-only
or set-only socket options.
RELNOTES: n/a
PiperOrigin-RevId: 330559811
Accept on gVisor will return an error if a socket in the accept queue was closed
before Accept() was called. Linux will return the new fd even if the returned
socket is already closed by the peer say due to a RST being sent by the peer.
This seems to be intentional in linux more details on the github issue.
Fixes#3780
PiperOrigin-RevId: 329828404
Adds docs to nginx and refactors both Httpd and Nginx benchmarks.
Key changes:
- Add docs and make nginx tests the same as httpd (reverse, all docs, etc.).
- Make requests scale on c * b.N -> a request per thread. This works well
with both --test.benchtime=10m (do a run that lasts at least 10m) and
--test.benchtime=10x (do b.N = 10).
-- Remove a doc from both tests (1000Kb) as 1024Kb exists.
PiperOrigin-RevId: 329751091
This is to cover the common pattern: open->read/write->close,
where SetAttr needs to be called to update atime/mtime before
the file is closed.
Benchmark results:
BM_OpenReadClose/10240 CPU
setattr+clunk: 63783 ns
VFS2: 68109 ns
VFS1: 72507 ns
Updates #1198
PiperOrigin-RevId: 329628461
On receiving an ACK with unacceptable ACK number, in a closing state,
TCP, needs to reply back with an ACK with correct seq and ack numbers and
remain in same state. This change is as per RFC793 page 37, but with a
difference that it does not apply to ESTABLISHED state, just as in Linux.
Also add more tests to check for OTW sequence number and unacceptable
ack numbers in these states.
Fixes#3785
PiperOrigin-RevId: 329616283
This prevents setting stale errno on responses.
Also fixes TestDiscardsUDPPacketsWithMcastSourceAddressV6 to use correct
multicast addresses in test.
Fixes#3793
PiperOrigin-RevId: 329391155
Some syscall tests, namely uname_test_* modify the host and domain
name, which modifies the execution environment and can have unintended
consequences on other tests. For example, modifying the hostname
causes some networking tests to fail DNS lookups. Run all syscall
tests in their own uts namespaces to isolate these changes.
PiperOrigin-RevId: 329348127
Currently the logs produce
TestOne: packetimpact_test.go:182: listing devices on ... container: process terminated with status: 126
which is not actionable; presumably the `ip` command output is interesting.
PiperOrigin-RevId: 329032105
An earlier change considered the loopback bound to all addresses in an
assigned subnet. This should have only be done for IPv4 to maintain
compatability with Linux:
```
$ ip addr show dev lo
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group ...
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
$ ping 2001:db8::1
PING 2001:db8::1(2001:db8::1) 56 data bytes
^C
--- 2001:db8::1 ping statistics ---
4 packets transmitted, 0 received, 100% packet loss, time 3062ms
$ ping 2001:db8::2
PING 2001:db8::2(2001:db8::2) 56 data bytes
^C
--- 2001:db8::2 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2030ms
$ sudo ip addr add 2001:db8::1/64 dev lo
$ ping 2001:db8::1
PING 2001:db8::1(2001:db8::1) 56 data bytes
64 bytes from 2001:db8::1: icmp_seq=1 ttl=64 time=0.055 ms
64 bytes from 2001:db8::1: icmp_seq=2 ttl=64 time=0.074 ms
64 bytes from 2001:db8::1: icmp_seq=3 ttl=64 time=0.073 ms
64 bytes from 2001:db8::1: icmp_seq=4 ttl=64 time=0.071 ms
^C
--- 2001:db8::1 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3075ms
rtt min/avg/max/mdev = 0.055/0.068/0.074/0.007 ms
$ ping 2001:db8::2
PING 2001:db8::2(2001:db8::2) 56 data bytes
From 2001:db8::1 icmp_seq=1 Destination unreachable: No route
From 2001:db8::1 icmp_seq=2 Destination unreachable: No route
From 2001:db8::1 icmp_seq=3 Destination unreachable: No route
From 2001:db8::1 icmp_seq=4 Destination unreachable: No route
^C
--- 2001:db8::2 ping statistics ---
4 packets transmitted, 0 received, +4 errors, 100% packet loss, time 3070ms
```
Test: integration_test.TestLoopbackAcceptAllInSubnet
PiperOrigin-RevId: 329011566
Fixes *.sh Java runtime tests, where splice()-ing from a pipe to /dev/zero
would not actually empty the pipe.
There was no guarantee that the data would actually be consumed on a splice
operation unless the output file's implementation of Write/PWrite actually
called VFSPipeFD.CopyIn. Now, whatever bytes are "written" are consumed
regardless of whether CopyIn is called or not.
Furthermore, the number of bytes in the IOSequence for reads is now capped at
the amount of data actually available. Before, splicing to /dev/zero would
always return the requested splice size without taking the actual available
data into account.
This change also refactors the case where an input file is spliced into an
output pipe so that it follows a similar pattern, which is arguably cleaner
anyway.
Updates #3576.
PiperOrigin-RevId: 328843954
BadSocketPair test will return several errnos (EPREM, ESOCKTNOSUPPORT,
EAFNOSUPPORT) meaning the test is just too specific. Checking the syscall
fails is appropriate.
PiperOrigin-RevId: 328813071
...while we figure out of we want to consider the loopback interface
bound to all IPs in an assigned IPv6 subnet, or not (to maintain
compatibility with Linux).
PiperOrigin-RevId: 328807974
ioctl calls with TIOCSCTTY fail if the calling process already has a
controlling terminal, which occurs on a 5.4 kernel like our Ubuntu 18 CI.
Thus, run tests calling ioctl TTOCSCTTY in clean subprocess.
Also, while we're here, switch out non-inclusive master/slave for main/replica.
PiperOrigin-RevId: 328756598
Added a few tests for write(2) and pwrite(2)
1. Regular Files
For write(2)
- write zero bytes should not move the offset
- write non-zero bytes should increment the offset the exact amount
- write non-zero bytes after a lseek() should move the offset the exact amount after the seek
- write non-zero bytes with O_APPEND should move the offset the exact amount after original EOF
For pwrite(2), offset is not affected when
- pwrite zero bytes
- pwrite non-zero bytes
For EOF, added a test asserting the EOF (indicated by lseek(SEEK_END)) is updated properly after writing non-zero bytes
2. Symlink
Added one pwite64() call for symlink that is written as a counterpart of the existing test using pread64()
In Linux, a kernel configuration is set that compiles the kernel with a
custom function that is called at the beginning of every basic block, which
updates the memory-mapped coverage information. The Go coverage tool does not
allow us to inject arbitrary instructions into basic blocks, but it does
provide data that we can convert to a kcov-like format and transfer them to
userspace through a memory mapping.
Note that this is not a strict implementation of kcov, which is especially
tricky to do because we do not have the same coverage tools available in Go
that that are available for the actual Linux kernel. In Linux, a kernel
configuration is set that compiles the kernel with a custom function that is
called at the beginning of every basic block to write program counters to the
kcov memory mapping. In Go, however, coverage tools only give us a count of
basic blocks as they are executed. Every time we return to userspace, we
collect the coverage information and write out PCs for each block that was
executed, providing userspace with the illusion that the kcov data is always
up to date. For convenience, we also generate a unique synthetic PC for each
block instead of using actual PCs. Finally, we do not provide thread-specific
coverage data (each kcov instance only contains PCs executed by the thread
owning it); instead, we will supply data for any file specified by --
instrumentation_filter.
Also, fix issue in nogo that was causing pkg/coverage:coverage_nogo
compilation to fail.
PiperOrigin-RevId: 328426526
iptables sockopts were kludged into an unnecessary check, this properly
relegates them to the {get,set}SockOptIP functions.
PiperOrigin-RevId: 328395135
When SO_LINGER option is enabled, the close will not return until all the
queued messages are sent and acknowledged for the socket or linger timeout is
reached. If the option is not set, close will return immediately. This option
is mainly supported for connection oriented protocols such as TCP.
PiperOrigin-RevId: 328350576
These tests print disk_free_space()/disk_total_space() and expect the printed
result to be an integer (despite the fact that both the documented and returned
type is float). After cl/297213789, free/total disk space on tmpfs is
sufficiently large that PHP prints the result in scientific notation instead:
========DIFF========
012+ float(9.2233720368548E+18)
013+ float(9.2233720368548E+18)
012- float(%d)
013- float(%d)
========DONE========
FAIL disk_total_space() and disk_free_space() tests [ext/standard/tests/file/disk.phpt]
PiperOrigin-RevId: 328349906
We still deviate a bit from linux in how long we will actually wait in
FIN-WAIT-2. Linux seems to cap it with TIME_WAIT_LEN and it's not completely
obvious as to why it's done that way. For now I think we can ignore that and
fix it if it really is an issue.
PiperOrigin-RevId: 328324922
When a loopback interface is configurd with an address and associated
subnet, the loopback should treat all addresses in that subnet as an
address it owns.
This is mimicking linux behaviour as seen below:
```
$ ip addr show dev lo
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group ...
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
$ ping 192.0.2.1
PING 192.0.2.1 (192.0.2.1) 56(84) bytes of data.
^C
--- 192.0.2.1 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1018ms
$ ping 192.0.2.2
PING 192.0.2.2 (192.0.2.2) 56(84) bytes of data.
^C
--- 192.0.2.2 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2039ms
$ sudo ip addr add 192.0.2.1/24 dev lo
$ ip addr show dev lo
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group ...
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet 192.0.2.1/24 scope global lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
$ ping 192.0.2.1
PING 192.0.2.1 (192.0.2.1) 56(84) bytes of data.
64 bytes from 192.0.2.1: icmp_seq=1 ttl=64 time=0.131 ms
64 bytes from 192.0.2.1: icmp_seq=2 ttl=64 time=0.046 ms
64 bytes from 192.0.2.1: icmp_seq=3 ttl=64 time=0.048 ms
^C
--- 192.0.2.1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2042ms
rtt min/avg/max/mdev = 0.046/0.075/0.131/0.039 ms
$ ping 192.0.2.2
PING 192.0.2.2 (192.0.2.2) 56(84) bytes of data.
64 bytes from 192.0.2.2: icmp_seq=1 ttl=64 time=0.131 ms
64 bytes from 192.0.2.2: icmp_seq=2 ttl=64 time=0.069 ms
64 bytes from 192.0.2.2: icmp_seq=3 ttl=64 time=0.049 ms
64 bytes from 192.0.2.2: icmp_seq=4 ttl=64 time=0.035 ms
^C
--- 192.0.2.2 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3049ms
rtt min/avg/max/mdev = 0.035/0.071/0.131/0.036 ms
```
Test: integration_test.TestLoopbackAcceptAllInSubnet
PiperOrigin-RevId: 328188546
Add tests for socket re-bind/listen of client and server sockets
with the older connection still in TIME_WAIT state and with
SO_REUSEADDR enabled.
PiperOrigin-RevId: 327924702
This is done to ease troubleshooting when tests fail. runsc
logs are not stored when tests passe, so this will only
affect failing tests and should not increase log storage
too badly.
PiperOrigin-RevId: 327717551
Accept 128 + SIGNAL as well as SIGNAL as valid
returns for fork/exec tests.
Also, make changes so that test compiles in opensource. Test
had compile errors on latest Ubuntu 16.04 image with updated bazel to
3.4.0 (as well as base 2.0) used for Kokoro tests.
PiperOrigin-RevId: 327510310
Skip InvalidOffset and InvalidLength for Linux as the test is invalid for
later Kernel versions.
Add UnsupportedFile test as this check is in all kernel versions.
PiperOrigin-RevId: 327248035
Setting timeouts for sockets on GCP images (debian) for usecs only
respects multiples of 4K. Set the test with a multiple of 4K with a comment.
PiperOrigin-RevId: 327093848
Fixes python runtime test test_glob.
Updates #3515
We were checking is the to-be-opened dentry is a dir or not before resolving
symlinks. We should check that after resolving symlinks.
This was preventing us from opening a symlink which pointed to a directory
with O_DIRECTORY.
Also added this check in tmpfs and removed a duplicate check.
PiperOrigin-RevId: 327085895
This is a preparatory commit for a larger commit working on
ICMP generation in error cases.
This is removal of technical debt and cleanup in the gvisor code
as part of gvisor issue 2211.
Updates #2211.
PiperOrigin-RevId: 326615389
Fixes php runtime test ext/standard/tests/file/readfile_basic.phpt
Fixes#3516
fsgofers only want the access mode in the OpenFlags passed to Create(). If more
flags are supplied (like O_APPEND in this case), read/write from that fd will
fail with EBADF. See runsc/fsgofer/fsgofer.go:WriteAt()
VFS2 was providing more than just access modes. So filtering the flags using
p9.OpenFlagsModeMask == linux.O_ACCMODE fixes the issue.
Gofer in VFS1 also only extracts the access mode flags while making the create
RPC. See pkg/sentry/fs/gofer/path.go:Create()
Even in VFS2, when we open a handle, we extract out only the access mode flags
+ O_TRUNC.
See third_party/gvisor/pkg/sentry/fsimpl/gofer/handle.go:openHandle()
Added a test for this.
PiperOrigin-RevId: 326574829
Netstack's TIME-WAIT state for a TCP socket could be terminated prematurely if
the socket entered TIME-WAIT using shutdown(..., SHUT_RDWR) and then was closed
using close(). This fixes that bug and updates the tests to verify that Netstack
correctly honors TIME-WAIT under such conditions.
Fixes#3106
PiperOrigin-RevId: 326456443
Fixes php test ext/standard/tests/file/touch_variation5.phpt on vfs2.
Updates #3516
Also spotted a bug with O_EXCL, where we did not return EEXIST when we tried
to open the root of the filesystem with O_EXCL | O_CREAT.
Added some more tests for open() corner cases.
PiperOrigin-RevId: 326346863
The code was deleting logs for all tests when a single test
passed. Change it to delete only the logs relevant to the
test at hand.
Also fixed the benchmark lookup code, which was always generating
a single empty benchmark entry if there were not benchmarks.
PiperOrigin-RevId: 326311477
Mark all tests passing for VFS2 in:
image_test
integration_test
There's no way to do negative look ahead/behind in golang test regex,
so check if the tests uses VFS2 and skip CheckPointRestore if it does.
PiperOrigin-RevId: 326050915
//test/iptables:iptables_test runs 30 seconds faster on my machine.
* Using contexts instead of many smaller timeouts makes the tests less
likely to flake and removes unnecessary complexity.
* We also use context to properly shut down concurrent goroutines and
the test container.
* Container logs are always logged.
Remove the old benchmark-tools directory, including
imports in the WORKSPACE file and associated bazel rules.
The new Golang benchmark-tools can be found at //test/benchmarks
and it is functionally equivalent, excepting syscall_test
which can be found in //test/perf/linux.
PiperOrigin-RevId: 325529075