We still deviate a bit from Linux in how long we will actually wait in
FIN-WAIT-2. Linux seems to cap it with TIME_WAIT_LEN and it's not completely
obvious as to why it's done that way. For now I think we can ignore that and
fix it if it really is an issue.
PiperOrigin-RevId: 328324922
When a loopback interface is configured with an address and associated
subnet, the loopback should treat all addresses in that subnet as an
address it owns.
This mimics Linux behaviour, as seen below:
```
$ ip addr show dev lo
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group ...
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
$ ping 192.0.2.1
PING 192.0.2.1 (192.0.2.1) 56(84) bytes of data.
^C
--- 192.0.2.1 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1018ms
$ ping 192.0.2.2
PING 192.0.2.2 (192.0.2.2) 56(84) bytes of data.
^C
--- 192.0.2.2 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2039ms
$ sudo ip addr add 192.0.2.1/24 dev lo
$ ip addr show dev lo
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group ...
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet 192.0.2.1/24 scope global lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
$ ping 192.0.2.1
PING 192.0.2.1 (192.0.2.1) 56(84) bytes of data.
64 bytes from 192.0.2.1: icmp_seq=1 ttl=64 time=0.131 ms
64 bytes from 192.0.2.1: icmp_seq=2 ttl=64 time=0.046 ms
64 bytes from 192.0.2.1: icmp_seq=3 ttl=64 time=0.048 ms
^C
--- 192.0.2.1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2042ms
rtt min/avg/max/mdev = 0.046/0.075/0.131/0.039 ms
$ ping 192.0.2.2
PING 192.0.2.2 (192.0.2.2) 56(84) bytes of data.
64 bytes from 192.0.2.2: icmp_seq=1 ttl=64 time=0.131 ms
64 bytes from 192.0.2.2: icmp_seq=2 ttl=64 time=0.069 ms
64 bytes from 192.0.2.2: icmp_seq=3 ttl=64 time=0.049 ms
64 bytes from 192.0.2.2: icmp_seq=4 ttl=64 time=0.035 ms
^C
--- 192.0.2.2 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3049ms
rtt min/avg/max/mdev = 0.035/0.071/0.131/0.036 ms
```
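The ownership rule can be sketched in a few lines. This is only an illustration, not netstack code: `loopback_owns` is a hypothetical helper, and the address list mirrors the assignments shown in the transcript above.

```python
import ipaddress


def loopback_owns(assigned, candidate):
    """Return True if candidate falls inside any subnet assigned to loopback.

    assigned is an iterable of "address/prefix" strings (hypothetical config).
    """
    addr = ipaddress.ip_address(candidate)
    for entry in assigned:
        # strict=False accepts a host address combined with a prefix length.
        network = ipaddress.ip_network(entry, strict=False)
        if addr.version == network.version and addr in network:
            return True
    return False


assigned = ["127.0.0.1/8", "192.0.2.1/24"]
in_subnet = loopback_owns(assigned, "192.0.2.2")
outside_subnet = loopback_owns(assigned, "198.51.100.1")
```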
Test: integration_test.TestLoopbackAcceptAllInSubnet
PiperOrigin-RevId: 328188546
Add tests for socket re-bind/listen of client and server sockets
with the older connection still in TIME_WAIT state and with
SO_REUSEADDR enabled.
PiperOrigin-RevId: 327924702
This is done to ease troubleshooting when tests fail. runsc
logs are not stored when tests pass, so this will only
affect failing tests and should not increase log storage
too badly.
PiperOrigin-RevId: 327717551
Accept 128 + SIGNAL as well as SIGNAL as valid
returns for fork/exec tests.
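A minimal sketch of the relaxed check, assuming the usual shell convention of reporting a signal death as 128 plus the signal number. `is_expected_signal_status` is a hypothetical name, not the helper used by the tests.

```python
import signal


def is_expected_signal_status(status, sig):
    """Accept either the raw signal number or the shell-style 128 + signal."""
    signum = int(sig)
    return status in (signum, 128 + signum)


raw_ok = is_expected_signal_status(int(signal.SIGTERM), signal.SIGTERM)
shell_ok = is_expected_signal_status(128 + int(signal.SIGTERM), signal.SIGTERM)
```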
Also, make changes so that test compiles in opensource. Test
had compile errors on latest Ubuntu 16.04 image with updated bazel to
3.4.0 (as well as base 2.0) used for Kokoro tests.
PiperOrigin-RevId: 327510310
Skip InvalidOffset and InvalidLength for Linux as the test is invalid for
later Kernel versions.
Add UnsupportedFile test as this check is in all kernel versions.
PiperOrigin-RevId: 327248035
On GCP images (Debian), socket timeouts specified in microseconds only
respect multiples of 4K. Set the test to use a multiple of 4K, with a comment.
PiperOrigin-RevId: 327093848
Fixes python runtime test test_glob.
Updates #3515
We were checking whether the to-be-opened dentry is a directory before
resolving symlinks. We should check that after resolving symlinks.
This was preventing us from opening a symlink which pointed to a directory
with O_DIRECTORY.
Also added this check in tmpfs and removed a duplicate check.
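The expected behaviour can be reproduced from user space, assuming a Linux host. The sketch below is illustrative only; `open_directory` and the temporary paths are made up for the example.

```python
import errno
import os
import tempfile


def open_directory(path):
    """Open path with O_DIRECTORY; return None if it is not a directory."""
    try:
        return os.open(path, os.O_RDONLY | os.O_DIRECTORY)
    except OSError as e:
        if e.errno == errno.ENOTDIR:
            return None
        raise


with tempfile.TemporaryDirectory() as tmp:
    target_dir = os.path.join(tmp, "dir")
    target_file = os.path.join(tmp, "file")
    os.mkdir(target_dir)
    open(target_file, "w").close()

    dir_link = os.path.join(tmp, "dir_link")
    file_link = os.path.join(tmp, "file_link")
    os.symlink(target_dir, dir_link)
    os.symlink(target_file, file_link)

    # The symlink is resolved first, so a link to a directory opens fine.
    fd = open_directory(dir_link)
    link_to_dir_opened = fd is not None
    if fd is not None:
        os.close(fd)

    # A link to a regular file still fails with ENOTDIR.
    link_to_file_opened = open_directory(file_link) is not None
```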
PiperOrigin-RevId: 327085895
Fixes php runtime test ext/standard/tests/file/readfile_basic.phpt
Fixes #3516
fsgofers only want the access mode in the OpenFlags passed to Create(). If more
flags are supplied (like O_APPEND in this case), read/write from that fd will
fail with EBADF. See runsc/fsgofer/fsgofer.go:WriteAt()
VFS2 was providing more than just access modes. So filtering the flags using
p9.OpenFlagsModeMask == linux.O_ACCMODE fixes the issue.
Gofer in VFS1 also only extracts the access mode flags while making the create
RPC. See pkg/sentry/fs/gofer/path.go:Create()
Even in VFS2, when we open a handle, we extract out only the access mode flags
+ O_TRUNC.
See third_party/gvisor/pkg/sentry/fsimpl/gofer/handle.go:openHandle()
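As a rough illustration of the filtering, using Python's `os.O_ACCMODE` in place of p9.OpenFlagsModeMask (`access_mode_only` is a hypothetical helper, not gofer code):

```python
import os


def access_mode_only(flags):
    """Keep only the access mode bits (O_RDONLY, O_WRONLY or O_RDWR)."""
    return flags & os.O_ACCMODE


requested = os.O_RDWR | os.O_APPEND | os.O_CREAT
filtered = access_mode_only(requested)
```

Here O_APPEND and O_CREAT are dropped and only O_RDWR reaches the create call.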
Added a test for this.
PiperOrigin-RevId: 326574829
Netstack's TIME-WAIT state for a TCP socket could be terminated prematurely if
the socket entered TIME-WAIT using shutdown(..., SHUT_RDWR) and then was closed
using close(). This fixes that bug and updates the tests to verify that Netstack
correctly honors TIME-WAIT under such conditions.
Fixes #3106
PiperOrigin-RevId: 326456443
Fixes php test ext/standard/tests/file/touch_variation5.phpt on vfs2.
Updates #3516
Also spotted a bug with O_EXCL, where we did not return EEXIST when we tried
to open the root of the filesystem with O_EXCL | O_CREAT.
Added some more tests for open() corner cases.
PiperOrigin-RevId: 326346863
It was changed in the Linux kernel:
commit f0628c524fd188c3f9418e12478dfdfadacba815
Date: Fri Apr 24 16:06:16 2020 +0800
net: Replace the limit of TCP_LINGER2 with TCP_FIN_TIMEOUT_MAX
PiperOrigin-RevId: 325493859
Before kernel version 4.16-rc6, FUSE mounts were protected by
capable(CAP_SYS_ADMIN); from that version on, the check is
ns_capable(CAP_SYS_ADMIN). As a result, kernels before 4.16 do not
allow mounting FUSE file systems without the global CAP_SYS_ADMIN.
Fixes #3360
Helps in fixing open syscall tests: AppendConcurrentWrite and AppendOnly.
We also now update the file size for seekable special files (regular files)
which we were not doing earlier.
Updates #2923
PiperOrigin-RevId: 322670843
Packet sockets also seem to allow double binding and do not return an error on
linux. This was tested by running the syscall test in a linux namespace as root
and the current test DoubleBind fails at HEAD.
Passes after this change.
Updates #173
PiperOrigin-RevId: 321445137
gVisor returns the wrong ARP type for SIOCGIFHWADDR. This breaks
tcpdump, which then interprets the packets incorrectly.
Similarly, SIOCETHTOOL is used by tcpdump to query interface properties which
fails with an EINVAL since we don't implement it. For now change it to return
EOPNOTSUPP to indicate that we don't support the query rather than return
EINVAL.
NOTE: ARPHRD types for link endpoints are distinct from NIC capabilities
and NIC flags. In Linux all three exist: ARPHRD types are stored in the
dev->type field; NIC capabilities are more like device features, which can be
queried using SIOCETHTOOL but not modified; and NIC flags are fields that can
be modified from user space, e.g. NIC status (UP/DOWN/MULTICAST/BROADCAST).
Updates #2746
PiperOrigin-RevId: 321436525
This change gates all FUSE commands (by gating /dev/fuse) behind a runsc
flag. In order to use FUSE commands, use the --fuse flag with the --vfs2
flag. Check if FUSE is enabled by running dmesg in the sandbox.
- Only use MAXSYMLINKS/2+1 symlinks for each of the interpreter and script
paths in SymlinkLimitRefreshedForInterpreter to tolerate cases where the
original paths (/tmp, /bin, or /bin/echo) themselves contain symlinks.
- Ensure that UnshareFiles performs execve immediately after clone(CLONE_VFORK)
(no heap allocation for ExecveArray/RunfilesPath).
- Use lstat() rather than stat() for the existence check in fs_util's Exists;
the latter will fail if the symlink target does not exist, even if the
symlink does.
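The lstat() versus stat() difference in the last item is easy to see with a dangling symlink. This is a Python sketch of the idea only, not the fs_util code; `exists` and the paths are illustrative.

```python
import os
import tempfile


def exists(path):
    """Existence check that does not follow a trailing symlink (lstat)."""
    try:
        os.lstat(path)
        return True
    except FileNotFoundError:
        return False


with tempfile.TemporaryDirectory() as tmp:
    dangling = os.path.join(tmp, "dangling")
    os.symlink(os.path.join(tmp, "missing"), dangling)
    lstat_result = exists(dangling)          # the link itself is present
    stat_result = os.path.exists(dangling)   # follows the link to a missing target
```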
PiperOrigin-RevId: 320110156
RFC 6864 imposes various restrictions on the uniqueness of the IPv4
Identification field for non-atomic datagrams, defined as an IP datagram that
either can be fragmented (DF=0) or is already a fragment (MF=1 or positive
fragment offset). In order to be compliant, the ID field is assigned for all
non-atomic datagrams.
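The RFC 6864 definition used above reduces to a small predicate. The sketch below is illustrative; `is_atomic` is a hypothetical name and not a netstack function.

```python
def is_atomic(df, mf, fragment_offset):
    """A datagram is atomic if it cannot be fragmented and is not a fragment."""
    return bool(df) and not mf and fragment_offset == 0


def needs_unique_id(df, mf, fragment_offset):
    """Non-atomic datagrams are the ones that get an ID assigned."""
    return not is_atomic(df, mf, fragment_offset)
```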
Add a TCP unit test that induces retransmissions and checks that the IPv4
ID field is unique every time. Add basic handling of the IP_MTU_DISCOVER
socket option so that the option can be used to disable PMTU discovery,
effectively setting DF=0. Attempting to set the sockopt to anything other
than disabled will fail because PMTU discovery is currently not implemented,
and the default behavior matches that of disabled.
PiperOrigin-RevId: 320081842
This change fixes a few things:
- creating sockets using mknod(2) is supported via vfs2
- fsgofer can create regular files via mknod(2)
- mode = 0 for mknod(2) will be interpreted as regular file in vfs2 as well
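The last item matches what mknod(2) does on Linux when no file type bits are set. A small check, assuming a Linux host and an unprivileged user (the path is made up for the example):

```python
import os
import stat
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "node")
    # Only permission bits are passed; no S_IF* type bits are set.
    os.mknod(path, 0o600)
    created_regular = stat.S_ISREG(os.stat(path).st_mode)
```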
Updates #2923
PiperOrigin-RevId: 320074267
We do not support RWF_SYNC/RWF_DSYNC and probably shouldn't silently accept
them, since the user may incorrectly believe that we are synchronizing I/O.
Remove the pwritev2 test verifying that we support these flags.
gvisor.dev/issue/2601 is the tracking bug for deciding which RWF_.* flags
we need and supporting them.
Updates #2923, #2601.
PiperOrigin-RevId: 319351286
We were not invalidating mappings when the file size changed in shared mode.
Enabled the syscall test for vfs2.
Updates #2923
PiperOrigin-RevId: 319346569
Currently, we always perform a full-file sync which could be extremely
expensive for some applications. Although vfs1 did not fully support
sync_file_range, there were some optimizations that allowed us to skip some
unnecessary write-outs.
Updates #2923, #1897.
PiperOrigin-RevId: 319324213
After we change credentials, it is possible that we no longer have access to
the sticky directory where we are trying to delete files. Use an open fd so
this is not an issue.
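The approach can be sketched as follows: open the directory once, then delete relative to that fd. The credential change itself is not reproduced here, and `unlink_all` is a hypothetical helper.

```python
import os
import tempfile


def unlink_all(dir_fd):
    """Unlink every entry relative to an already-open directory fd."""
    removed = []
    for name in os.listdir(dir_fd):
        os.unlink(name, dir_fd=dir_fd)
        removed.append(name)
    return sorted(removed)


with tempfile.TemporaryDirectory() as tmp:
    for name in ("a", "b"):
        open(os.path.join(tmp, name), "w").close()
    # Open the directory up front, while access is still permitted.
    dir_fd = os.open(tmp, os.O_RDONLY | os.O_DIRECTORY)
    try:
        removed = unlink_all(dir_fd)
        remaining = os.listdir(tmp)
    finally:
        os.close(dir_fd)
```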
PiperOrigin-RevId: 319306255
- Support FIOASYNC, FIO{SET,GET}OWN, SIOC{G,S}PGRP (refactor getting/setting
owner in the process).
- Unset signal recipient when setting owner with pid == 0 and
valid owner type.
Updates #2923.
PiperOrigin-RevId: 319231420
This includes the provisional style guide in the website and fixes the broken
link from CONTRIBUTING.md. The style guide will be located under the "Community"
category as it's related to contributing to the project.
Also, add missing includes that were causing some presubmits to fail.
PiperOrigin-RevId: 319061410
Bring udp_socket_test into compliance by:
- Eliminating IsRunningOnGvisor() invocations.
- Wrapping sockets in RAII FileDescriptor objects.
- Creating a Bind() method so that the first bind happens on port 0.
PiperOrigin-RevId: 318909396
SO_NO_CHECK is used to skip the UDP checksum generation on a TX socket
(UDP checksum is optional on IPv4).
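From user space the option looks roughly like this. The sketch assumes a Linux host; the numeric value 11 is the one used on x86 and Arm Linux and is an assumption for any other platform.

```python
import socket

# Fall back to the x86/Arm Linux value if the socket module lacks the name.
SO_NO_CHECK = getattr(socket, "SO_NO_CHECK", 11)

with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
    # Checksum generation is on by default, so the option reads back as 0.
    default_value = s.getsockopt(socket.SOL_SOCKET, SO_NO_CHECK)
    s.setsockopt(socket.SOL_SOCKET, SO_NO_CHECK, 1)
    enabled_value = s.getsockopt(socket.SOL_SOCKET, SO_NO_CHECK)
```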
Test:
- TestNoChecksum
- SoNoCheckOffByDefault (UdpSocketTest)
- SoNoCheck (UdpSocketTest)
Fixes #3055
PiperOrigin-RevId: 318575215
Also, while we're here, make sure that gofer inotify events are generated when
files are created in remote revalidating mode.
Updates #1479.
PiperOrigin-RevId: 318536354
This change adds a FUSE character device backed by devtmpfs. This
device will be used to establish a connection between the FUSE
server daemon and fusefs. The FileDescriptionImpl methods will
be implemented as we flesh out fusefs some more. The tests assert
that the device can be opened and used.
For TCP sockets, SO_REUSEADDR relaxes the rules for binding addresses.
gVisor/netstack already supported a behavior similar to SO_REUSEADDR, but did
not allow disabling it. This change brings the SO_REUSEADDR behavior closer to
the behavior implemented by Linux and adds a new SO_REUSEADDR disabled
behavior. Like Linux, SO_REUSEADDR is now disabled by default.
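The default can be observed on any TCP socket. A minimal check, assuming a Linux host:

```python
import socket

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    # A freshly created socket has SO_REUSEADDR disabled.
    default_value = s.getsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    enabled_value = s.getsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR)
```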
PiperOrigin-RevId: 317984380
Events were only skipped on parent directories after their children were
unlinked; events on the unlinked file itself need to be skipped as well.
As a result, all Watches.Notify() calls need to know whether the dentry where
the call came from was unlinked.
Updates #1479.
PiperOrigin-RevId: 317979476
Because there is no inode structure stored in the sandbox, inotify watches
must be held on the dentry. This would be an issue in the presence of hard
links, where multiple dentries would need to share the same set of watches,
but in VFS2, we do not support the internal creation of hard links on gofer
fs. As a result, we make the assumption that every dentry corresponds to a
unique inode.
Furthermore, dentries can be cached and then evicted, even if the underlying
file has not been deleted. We must prevent this from occurring if there are any
watches that would be lost. Note that if the dentry was deleted or invalidated
(d.vfsd.IsDead()), we should still destroy it along with its watches.
Additionally, when a dentry’s last watch is removed, we cache it if it also
has zero references. This way, the dentry can eventually be evicted from
memory if it is no longer needed. This is accomplished with a new dentry
method, OnZeroWatches(), which is called by Inotify.RmWatch and
Inotify.Release. Note that it must be called after all inotify locks are
released to avoid violating lock order. Stress tests are added to make sure
that inotify operations don't deadlock with gofer.OnZeroWatches.
Updates #1479.
PiperOrigin-RevId: 317958034
Despite what the man page says, Linux will return EINVAL when calling
getdents() on a /proc/[tid]/net file corresponding to a zombie task. This
causes readdir() to return a null pointer AND errno=EINVAL.
See fs/proc/proc_net.c:proc_tgid_net_readdir() for where this occurs.
We have tests that recursively read /proc, and are likely to hit this when
running natively, so we must catch and handle this case.
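One way a recursive reader might tolerate this case is sketched below. It is not the test code; `list_dir_tolerant` and the injectable `listdir` parameter are hypothetical and exist only so the EINVAL path can be exercised.

```python
import errno
import os
import tempfile


def list_dir_tolerant(path, listdir=os.listdir):
    """List a directory, treating EINVAL as 'nothing to read here'."""
    try:
        return sorted(listdir(path))
    except OSError as e:
        if e.errno == errno.EINVAL:
            return []
        raise


def _raise_einval(path):
    # Stand-in for a directory read that fails the way described above.
    raise OSError(errno.EINVAL, "Invalid argument")


with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "entry"), "w").close()
    normal_result = list_dir_tolerant(tmp)

einval_result = list_dir_tolerant("placeholder", listdir=_raise_einval)
```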
PiperOrigin-RevId: 317674168
I forgot to update getdents earlier. Several thousand runs of the fsync and
proc_net_unix tests all passed as well.
Updates #2923.
PiperOrigin-RevId: 317415488
Check for unsupported flags, and silently support RWF_HIPRI by doing nothing.
From pkg/abi/linux/file.go: "gVisor does not implement the RWF_HIPRI feature,
but the flag is accepted as a valid flag argument for preadv2/pwritev2."
Updates #2923.
PiperOrigin-RevId: 317330631
Always check if a synthetic file already exists at a location before creating a
file there, and do not try to delete synthetic gofer files from the remote fs.
This fixes runsc_ptrace socket tests that create/unlink synthetic, named socket
files.
Updates #2923.
PiperOrigin-RevId: 317293648
The test was expecting that the root mount pathname was "/", but it doesn't
need to be. Only the mount point actually should be "/" (otherwise it is not
the root).
PiperOrigin-RevId: 316968025
- Change FileDescriptionImpl Lock/UnlockPOSIX signature to
take {start,length,whence}, so the correct offset can be
calculated in the implementations.
- Create PosixLocker interface to make it possible to share
the same locking code from different implementations.
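Passing {start,length,whence} matters because the absolute start of the lock depends on state only the implementation knows. A rough sketch of that calculation, with `absolute_lock_start` as a hypothetical helper rather than the vfs code:

```python
import os


def absolute_lock_start(start, whence, file_offset, file_size):
    """Resolve a POSIX lock start to an absolute file offset."""
    if whence == os.SEEK_SET:
        base = 0
    elif whence == os.SEEK_CUR:
        base = file_offset   # current position of the file description
    elif whence == os.SEEK_END:
        base = file_size     # size known to the filesystem implementation
    else:
        raise ValueError("invalid whence")
    result = base + start
    if result < 0:
        raise ValueError("lock start before beginning of file")
    return result
```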
Closes #1480
PiperOrigin-RevId: 316910286
On UDP sockets, SO_REUSEADDR allows multiple sockets to bind to the same
address, but only delivers packets to the most recently bound socket. This
differs from the behavior of SO_REUSEADDR on TCP sockets. SO_REUSEADDR for TCP
sockets will likely need an almost completely independent implementation.
SO_REUSEADDR has some odd interactions with the similar SO_REUSEPORT. These
interactions are tested fairly extensively and all but one particularly odd
one (that honestly seems like a bug) behave the same on gVisor and Linux.
PiperOrigin-RevId: 315844832
LockFD is the generic implementation that can be embedded in
FileDescriptionImpl implementations. Unique lock ID is
maintained in vfs.FileDescription and is created on demand.
Updates #1480
PiperOrigin-RevId: 315604825
The current task can share its fdtable with a few other tasks,
but after exec, this should be a completely separate process.
PiperOrigin-RevId: 314999565
For TCP sockets gVisor incorrectly returns EAGAIN when no ephemeral ports are
available to bind during a connect. Linux returns EADDRNOTAVAIL. This change
fixes gVisor to return the correct code and adds a test for the same.
This change also fixes a minor bug for ping sockets where connect() would fail
with EINVAL unless the socket was bound first.
Also added tests for testing UDP Port exhaustion and Ping socket port
exhaustion.
PiperOrigin-RevId: 314988525
b/36576592 calls out an edge case previously not supported
by HostFS. HostFS is currently being removed, meaning gVisor
supports this feature. Simply add the test to open_test.
PiperOrigin-RevId: 314610226
Splice, setxattr and removexattr should generate events. Note that VFS2 already
generates events for extended attributes.
Updates #1479.
PiperOrigin-RevId: 314244261
Support in other filesystem impls is still needed. Unlike in Linux and vfs1, we
need to plumb inotify down to each filesystem implementation in order to keep
track of links/inode structures properly.
IN_EXCL_UNLINK still needs to be implemented, as well as a few inotify hooks
that are not present in either vfs1 or vfs2. Those will be addressed in
subsequent changes.
Updates #1479.
PiperOrigin-RevId: 313781995
With additional logging, the issue described by the new comment looks like:
D0518 21:28:08.416810 6777 task_signals.go:459] [ 8] Notified of signal 27
D0518 21:28:08.416852 6777 task_block.go:223] [ 8] Interrupt queued
D0518 21:28:08.417013 6777 task_run.go:250] [ 8] Switching to sentry
D0518 21:28:08.417033 6777 task_signals.go:220] [ 8] Signal 27: delivering to handler
D0518 21:28:08.417127 6777 task_run.go:248] [ 8] Switching to app
D0518 21:28:08.443765 6777 task_signals.go:519] [ 8] Refusing masked signal 27 // ED: note the ~26ms elapsed since TID 8 "switched to app"
D0518 21:28:08.443814 6777 task_signals.go:465] [ 6] Notified of group signal 27
D0518 21:28:08.443832 6777 task_block.go:223] [ 6] Interrupt queued
D0518 21:28:08.443914 6777 task_block.go:223] [ 6] Interrupt queued
D0518 21:28:08.443859 6777 task_run.go:250] [ 8] Switching to sentry
I0518 21:28:08.443936 6777 strace.go:576] [ 8] exe E rt_sigreturn()
Slow context switches on ptrace are probably due to kernel scheduling delays.
Slow context switches on KVM are less clear, so leave that bug and TODO open.
PiperOrigin-RevId: 312322782
On native Linux, calling recv/read right after send/write sometimes returns
EWOULDBLOCK, if the data has not made it to the receiving socket (even though
the endpoints are on the same host). Poll before reading to avoid this.
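The poll-before-read pattern looks roughly like this. It is a sketch using a socket pair rather than the sockets from the tests; `recv_when_ready` and the timeout value are illustrative.

```python
import select
import socket


def recv_when_ready(sock, nbytes, timeout_ms=5000):
    """Wait until sock is readable, then read up to nbytes."""
    poller = select.poll()
    poller.register(sock, select.POLLIN)
    if not poller.poll(timeout_ms):
        raise TimeoutError("socket not readable before timeout")
    return sock.recv(nbytes)


a, b = socket.socketpair()
try:
    b.setblocking(False)   # a bare recv() here could raise EWOULDBLOCK
    a.sendall(b"ping")
    received = recv_when_ready(b, 4)
finally:
    a.close()
    b.close()
```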
Making this change also uncovered a hostinet bug (gvisor.dev/issue/2726),
which is noted in this CL.
PiperOrigin-RevId: 312320587
Some functions were added for Arm64 platform:
a, get_fp/set_fp
b, inline_tgkill
Test step:
bazel test //test/syscalls:fpsig_nested_test_runsc_ptrace
Signed-off-by: Bin Lu <bin.lu@arm.com>
This change adds support for TCP_SYNCNT and TCP_WINDOW_CLAMP options
in GetSockOpt/SetSockOpt. This change does not really change any
behaviour in Netstack and only stores/returns the stored value.
Actual honoring of these options will be added as required.
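A user-space view of the store-and-return behaviour, assuming a Linux host where Python exposes both constants. The values 3 and 65536 are arbitrary examples.

```python
import socket

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    # Number of SYN retransmits before connect() gives up.
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_SYNCNT, 3)
    syn_count = s.getsockopt(socket.IPPROTO_TCP, socket.TCP_SYNCNT)

    # Upper bound on the advertised receive window.
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_WINDOW_CLAMP, 65536)
    window_clamp = s.getsockopt(socket.IPPROTO_TCP, socket.TCP_WINDOW_CLAMP)
```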
Fixes #2626, #2625
PiperOrigin-RevId: 311453777
Some functions were added for Arm64 platform:
a, get_fp/set_fp
b, inline_tgkill
Test step:
bazel test //test/syscalls:fpsig_fork_test_runsc_ptrace
Signed-off-by: Bin Lu <bin.lu@arm.com>
There is a known issue in the Linux procfs: two consecutive calls to
readdir can return the same entry twice if one or more entries were
removed from the directory between those calls.
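A reader that has to cope with this can drop repeated names while keeping the original order. The sketch below shows one possible approach and is not taken from the test; `unique_entries` is a hypothetical helper.

```python
def unique_entries(entries):
    """Drop repeated directory entries, keeping first-seen order."""
    seen = set()
    result = []
    for name in entries:
        if name not in seen:
            seen.add(name)
            result.append(name)
    return result


deduplicated = unique_entries(["1", "2", "2", "self"])
```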
PiperOrigin-RevId: 309803066
Enforce write permission checks in BoundEndpointAt, which corresponds to the
permission checks in Linux (net/unix/af_unix.c:unix_find_other).
Also, create bound socket files with the correct permissions in VFS2.
Fixes #2324.
PiperOrigin-RevId: 308949084
TempPath's destructor runs at the end of the named pipe creation functions,
deleting the named pipe. If the named pipe is backed by a "non-virtual"
filesystem (!fs.Inode.IsVirtual()), this causes the following save attempt to
fail because there are FDs holding the deleted named pipe open.
PiperOrigin-RevId: 308861999