Commit Graph

94 Commits

Author SHA1 Message Date
Tamir Duberstein debd213da6 Allow dual stack sockets to operate on AF_INET
Fixes #1490
Fixes #1495

PiperOrigin-RevId: 289523250
2020-01-13 14:47:22 -08:00
Ian Lewis fbb2c008e2 Return correct length with MSG_TRUNC for unix sockets.
This change calls a new Truncate method on the EndpointReader in RecvMsg for
both netlink and unix sockets.  This allows readers such as sockets to peek at
the length of data without actually reading it to a buffer.

Fixes #993 #1240

PiperOrigin-RevId: 288800167
2020-01-08 17:24:05 -08:00
Jay Zhuang 18d6e59b45 Switch to netinet/tcp.h and poll.h to for better platform portability.
PiperOrigin-RevId: 286249699
2019-12-18 13:58:38 -08:00
Jay Zhuang 65f53c5833 Put GetSocketPairs() in unnamed namespace
This avoids conflicting definitions of GetSocketPairs() in outer namespace when
multiple such cc files are complied for one binary.

PiperOrigin-RevId: 286243045
2019-12-18 12:50:04 -08:00
gVisor bot 67000b929b Explicitly export files needed by other packages
PiperOrigin-RevId: 285968611
2019-12-17 06:33:08 -08:00
Dean Deng 1601e78a52 Add syscall tests for getxattr and setxattr.
Support for getxattr and setxattr are in subsequent commits.

PiperOrigin-RevId: 285088817
2019-12-11 16:41:17 -08:00
Bhasker Hariharan cb5f9b8f86 Mark test as non flaky.
PiperOrigin-RevId: 284606133
2019-12-09 12:04:51 -08:00
Michael Pratt 498595d543 Add tests for rseq(2)
Add a decent set of syscall tests for rseq(2). These are a bit awkward because
of issues with library integration. libc may register rseq on thread start
(including before main on the initial thread), precluding much testing. Thus we
run tests in a libc-free subprocess.

Support for rseq(2) in gVisor will come in a later commit.

PiperOrigin-RevId: 284595994
2019-12-09 11:22:31 -08:00
Ian Gudger 13f0f6069a Implement F_GETOWN_EX and F_SETOWN_EX.
Some versions of glibc will convert F_GETOWN fcntl(2) calls into F_GETOWN_EX in
some cases.

PiperOrigin-RevId: 284089373
2019-12-05 17:28:52 -08:00
gVisor bot 05758f34b2 Explicitly export files needed by other packages
PiperOrigin-RevId: 283955946
2019-12-05 05:45:09 -08:00
Dean Deng 80b7ba0c97 Clean up readv_socket test suite.
Get rid of the SocketTest class, which is only extended by ReadvSocketTest.
Also, get rid of TCP sockets (which were unused anyway) from readv_socket.cc.
This is a very old test suite that isn't the right place for TCP loopback
tests.

PiperOrigin-RevId: 283672772
2019-12-03 19:42:20 -08:00
Bhasker Hariharan 27e2c4ddca Fix panic due to early transition to Closed.
The code in rcv.consumeSegment incorrectly transitions to
CLOSED state from LAST-ACK before the final ACK for the FIN.

Further if receiving a segment changes a socket to a closed state
then we should not invoke the sender as the socket is now closed
and sending any segments is incorrect.

PiperOrigin-RevId: 283625300
2019-12-03 14:41:55 -08:00
Jay Zhuang aa70523da2 Port tests in udp_socket.cc to Fuchsia
Separate out a test in udp_socket.cc that depends on <linux/errqueue.h> so the
rest of the tests can run on Fuchsia.

PiperOrigin-RevId: 283322633
2019-12-02 05:38:30 -08:00
Ian Gudger 57a2a5ea33 Add tests for SO_REUSEADDR and SO_REUSEPORT.
* Basic tests for the SO_REUSEADDR and SO_REUSEPORT options.
* SO_REUSEADDR functional tests for TCP and UDP.
* SO_REUSEADDR and SO_REUSEPORT interaction tests for UDP.
* Stubbed support for UDP getsockopt(SO_REUSEADDR).

PiperOrigin-RevId: 280049265
2019-11-12 14:04:14 -08:00
Bhasker Hariharan 66ebb6575f Add support for TIME_WAIT timeout.
This change adds explicit support for honoring the 2MSL timeout
for sockets in TIME_WAIT state. It also adds support for the
TCP_LINGER2 option that allows modification of the FIN_WAIT2
state timeout duration for a given socket.

It also adds an option to modify the Stack wide TIME_WAIT timeout
but this is only for testing. On Linux this is fixed at 60s.

Further, we also now correctly process RST's in CLOSE_WAIT and
close the socket similar to linux without moving it to error
state.

We also now handle SYN in ESTABLISHED state as per
RFC5961#section-4.1. Earlier we would just drop these SYNs.
Which can result in some tests that pass on linux to fail on
gVisor.

Netstack now honors TIME_WAIT correctly as well as handles the
following cases correctly.

- TCP RSTs in TIME_WAIT are ignored.
- A duplicate TCP FIN during TIME_WAIT extends the TIME_WAIT
  and a dup ACK is sent in response to the FIN as the dup FIN
  indicates potential loss of the original final ACK.
- An out of order segment during TIME_WAIT generates a dup ACK.
- A new SYN w/ a sequence number > the highest sequence number
  in the previous connection closes the TIME_WAIT early and
  opens a new connection.

Further to make the SYN case work correctly the ISN (Initial
Sequence Number) generation for Netstack has been updated to
be as per RFC. Its not a pure random number anymore and follows
the recommendation in https://tools.ietf.org/html/rfc6528#page-3.

The current hash used is not a cryptographically secure hash
function. A separate change will update the hash function used
to Siphash similar to what is used in Linux.

PiperOrigin-RevId: 279106406
2019-11-07 09:46:55 -08:00
Michael Pratt b23b36e701 Add NETLINK_KOBJECT_UEVENT socket support
NETLINK_KOBJECT_UEVENT sockets send udev-style messages for device events.
gVisor doesn't have any device events, so our sockets don't need to do anything
once created.

systemd's device manager needs to be able to create one of these sockets. It
also wants to install a BPF filter on the socket. Since we'll never send any
messages, the filter would never be invoked, thus we just fake it out.

Fixes #1117
Updates #1119

PiperOrigin-RevId: 278405893
2019-11-04 10:07:52 -08:00
gVisor bot 6d4d9564e3 Merge pull request #641 from tanjianfeng:master
PiperOrigin-RevId: 276380008
2019-10-23 16:55:15 -07:00
Dean Deng 0b569b7cae Add basic implementation of execveat syscall and associated tests.
Allow file descriptors of directories as well as AT_FDCWD.

PiperOrigin-RevId: 275929668
2019-10-21 14:55:18 -07:00
Michael Pratt 49b596b98d Cleanup host UDS support
This change fixes several issues with the fsgofer host UDS support. Notably, it
adds support for SOCK_SEQPACKET and SOCK_DGRAM sockets [1]. It also fixes
unsafe use of unet.Socket, which could cause a panic if Socket.FD is called
when err != nil, and calls to Socket.FD with nothing to prevent the garbage
collector from destroying and closing the socket.

A set of tests is added to exercise host UDS access. This required extracting
most of the syscall test runner into a library that can be used by custom
tests.

Updates #235
Updates #1003

[1] N.B. SOCK_DGRAM sockets are likely not particularly useful, as a server can
only reply to a client that binds first. We don't allow bind, so these are
unlikely to be used.

PiperOrigin-RevId: 275558502
2019-10-18 15:33:03 -07:00
Jianfeng Tan e3d4a67739 support /proc/net/snmp
This proc file contains statistics according to [1].

[1] https://tools.ietf.org/html/rfc2013

Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com>
Change-Id: I9662132085edd8a7783d356ce4237d7ac0800d94
2019-10-15 16:38:40 +00:00
Ian Gudger 7c1587e340 Implement IP_TTL.
Also change the default TTL to 64 to match Linux.

PiperOrigin-RevId: 273430341
2019-10-07 19:29:51 -07:00
Andrei Vagin db218fdfcf Don't report partialResult errors from sendfile
The input file descriptor is always a regular file, so sendfile can't lose any
data if it will not be able to write them to the output file descriptor.

Reported-by: syzbot+22d22330a35fa1c02155@syzkaller.appspotmail.com
PiperOrigin-RevId: 272730357
2019-10-03 13:38:30 -07:00
Michael Pratt 61e40819d9 Sanity test that open(2) on a UDS fails
Spoiler alert: it doesn't.

PiperOrigin-RevId: 272513529
2019-10-02 14:01:49 -07:00
Adin Scannell c8bb20865d Automated rollback of changelist 256276198
PiperOrigin-RevId: 271665517
2019-09-27 15:58:51 -07:00
gVisor bot abbee5615f Implement SO_BINDTODEVICE sockopt
PiperOrigin-RevId: 271644926
2019-09-27 14:14:04 -07:00
Adin Scannell 502f8f238e Stub out readahead implementation.
Closes #261

PiperOrigin-RevId: 270973347
2019-09-24 13:29:46 -07:00
Kevin Krakauer 0a8a75f3da Job control: controlling TTYs and foreground process groups.
Adresses a deadlock with the rolled back change:
b6a5b950d2
Creating a session from an orphaned process group was causing a lock to be
acquired twice by a single goroutine. This behavior is addressed, and a test
(OrphanRegression) has been added to pty.cc.

Implemented the following ioctls:
- TIOCSCTTY - set controlling TTY
- TIOCNOTTY - remove controlling tty, maybe signal some other processes
- TIOCGPGRP - get foreground process group. Also enables tcgetpgrp().
- TIOCSPGRP - set foreground process group. Also enabled tcsetpgrp().

Next steps are to actually turn terminal-generated control characters (e.g. C^c)
into signals to the proper process groups, and to send SIGTTOU and SIGTTIN when
appropriate.

PiperOrigin-RevId: 270088599
2019-09-19 11:36:47 -07:00
Adin Scannell c98e7f0d19 Signalfd support
Note that the exact semantics for these signalfds are slightly different from
Linux. These signalfds are bound to the process at creation time. Reads, polls,
etc. are all associated with signals directed at that task. In Linux, all
signalfd operations are associated with current, regardless of where the
signalfd originated.

In practice, this should not be an issue given how signalfds are used. In order
to fix this however, we will need to plumb the context through all the event
APIs. This gets complicated really quickly, because the waiter APIs are all
netstack-specific, and not generally exposed to the context.  Probably not
worthwhile fixing immediately.

PiperOrigin-RevId: 269901749
2019-09-18 15:16:42 -07:00
Michael Pratt 56cb004218 Migrate from gflags to absl flags
absl flags are more modern and we can easily depend on them directly.

The repo now successfully builds with --incompatible_load_cc_rules_from_bzl.

PiperOrigin-RevId: 269387081
2019-09-16 11:58:27 -07:00
Adin Scannell 7c6ab6a219 Implement splice methods for pipes and sockets.
This also allows the tee(2) implementation to be enabled, since dup can now be
properly supported via WriteTo.

Note that this change necessitated some minor restructoring with the
fs.FileOperations splice methods. If the *fs.File is passed through directly,
then only public API methods are accessible, which will deadlock immediately
since the locking is already done by fs.Splice. Instead, we pass through an
abstract io.Reader or io.Writer, which elide locks and use the underlying
fs.FileOperations directly.

PiperOrigin-RevId: 268805207
2019-09-12 17:43:27 -07:00
Michael Pratt 98f7fbb59f Load C++ rules from @rules_cc
See https://github.com/bazelbuild/bazel/issues/8743. This will be required in
Bazel 1.0.

Protobuf was updated in
bf0c69e130 (diff-96239ee297e0a92ac6ff96a6bc434ef0).

GoogleTest was updated in
6fd262ecf7.

gflags has not yet been updated, so the repo still won't build with
--incompatible_load_cc_rules_from_bzl.

Tested with buildifier -warnings=native-cc -lint=warn **/BUILD.

PiperOrigin-RevId: 267638515
2019-09-06 11:29:00 -07:00
Jamie Liu fbdd3ff1da Deflake aio_test.
- Most AIO tests call io_setup(nr_events = 128). sizeof(struct io_event)
(128*32 = 4096). However, the actual size of the mapping created by
io_setup() is determined by:

(from fs/aio.c:ioctx_alloc())
/*
 * We keep track of the number of available ringbuffer slots, to prevent
 * overflow (reqs_available), and we also use percpu counters for this.
 *
 * So since up to half the slots might be on other cpu's percpu counters
 * and unavailable, double nr_events so userspace sees what they
 * expected: additionally, we move req_batch slots to/from percpu
 * counters at a time, so make sure that isn't 0:
 */
nr_events = max(nr_events, num_possible_cpus() * 4);
nr_events *= 2;

(from fs/aio.c:aio_setup_ring())
/* Compensate for the ring buffer's head/tail overlap entry */
nr_events += 2; /* 1 is required, 2 for good luck */
size = sizeof(struct aio_ring);
size += sizeof(struct io_event) * nr_events;
nr_pages = PFN_UP(size);

When we mremap() only the first page of a multi-page AIO ring buffer
mapping, fs/aio.c:aio_ring_mremap() updates struct kioctx::mmap_base -
but struct kioctx::mmap_size is untouched, so sys_io_destroy() =>
kill_ioctx() vm_unmaps() the mremapped page, plus some number of pages
after it. Just get the actual size of the mapping from /proc/self/maps.

- Delete test case MremapOver; while it is correct that Linux will not
complain if you overwrite the AIO ring buffer with another mapping, it
won't actually work in the sense that AIO events will not be written to
the new mapping, because Linux stores the struct pages of the ring
buffer in struct kioctx::ring_pages and writes to those through kmap()
rather than using userspace addresses.

- Don't munmap() after mremap(MREMAP_FIXED) returns EFAULT; see new
comment in factored-out test case MremapExpansion.

PiperOrigin-RevId: 267482903
2019-09-05 16:36:44 -07:00
Ian Gudger fbbb2f7ed6 Run proc_net tests.
PiperOrigin-RevId: 267280086
2019-09-04 19:08:12 -07:00
Bhasker Hariharan 54bf2e8eff Automated rollback of changelist 261387276
PiperOrigin-RevId: 266491264
2019-08-30 18:15:32 -07:00
Adin Scannell 888e87909e Add C++ toolchain and fix compile issues.
This was accidentally introduced in 31f05d5d4f.

Fixes #788.

PiperOrigin-RevId: 266462843
2019-08-30 15:03:15 -07:00
Rahat Mahmood 863e11ac4d Implement /proc/net/udp.
PiperOrigin-RevId: 266229756
2019-08-29 14:30:41 -07:00
gVisor bot baf4d8aaca Internal change.
PiperOrigin-RevId: 265535438
2019-08-26 14:07:17 -07:00
Kevin Krakauer 6c3a242143 Add tests for raw AF_PACKET sockets.
PiperOrigin-RevId: 264494359
2019-08-20 16:36:06 -07:00
Kevin Krakauer bd826092fe Read iptables via sockopts.
PiperOrigin-RevId: 264180125
2019-08-19 10:05:59 -07:00
Kevin Krakauer ef045b914b Add tests for "cooked" AF_PACKET sockets.
PiperOrigin-RevId: 263666789
2019-08-15 16:31:35 -07:00
Bhasker Hariharan 570fb1db6b Improve SendMsg performance.
SendMsg before this change would copy all the data over into a
new slice even if the underlying socket could only accept a
small amount of data. This is really inefficient with non-blocking
sockets and under high throughput where large writes could get
ErrWouldBlock or if there was say a timeout associated with the sendmsg()
syscall.

With this change we delay copying bytes in till they are needed and only
copy what can be potentially sent/held in the socket buffer. Reducing
the need to repeatedly copy data over.

Also a minor fix to change state FIN-WAIT-1 when shutdown(..., SHUT_WR) is called
instead of when we transmit the actual FIN. Otherwise the socket could remain in
CONNECTED state even though the user has called shutdown() on the socket.

Updates #627

PiperOrigin-RevId: 263430505
2019-08-14 14:34:27 -07:00
Kevin Krakauer b6a5b950d2 Job control: controlling TTYs and foreground process groups.
(Don't worry, this is mostly tests.)

Implemented the following ioctls:
- TIOCSCTTY - set controlling TTY
- TIOCNOTTY - remove controlling tty, maybe signal some other processes
- TIOCGPGRP - get foreground process group. Also enables tcgetpgrp().
- TIOCSPGRP - set foreground process group. Also enabled tcsetpgrp().

Next steps are to actually turn terminal-generated control characters (e.g. C^c)
into signals to the proper process groups, and to send SIGTTOU and SIGTTIN when
appropriate.

PiperOrigin-RevId: 261387276
2019-08-02 14:05:48 -07:00
Kevin Krakauer 09be87bbee Add iptables types for syscalls tests.
Unfortunately, Linux's ip_tables.h header doesn't compile in C++ because it
implicitly converts from void* to struct xt_entry_target*. C allows this, but
C++ does not. So we have to re-implement many types ourselves.

Relevant code here:
https://github.com/torvalds/linux/blob/master/include/uapi/linux/netfilter_ipv4/ip_tables.h#L222

PiperOrigin-RevId: 260565570
2019-07-29 13:20:09 -07:00
Kevin Krakauer 9b4d3280e1 Add IPPROTO_RAW, which allows raw sockets to write IP headers.
iptables also relies on IPPROTO_RAW in a way. It opens such a socket to
manipulate the kernel's tables, but it doesn't actually use any of the
functionality. Blegh.

PiperOrigin-RevId: 257903078
2019-07-12 18:09:12 -07:00
Neel Natu 1178a278ae Mark timers_test flaky because setrlimit(RLIMIT_CPU) is broken in some kernels.
https://bugzilla.redhat.com/show_bug.cgi?id=1568337

PiperOrigin-RevId: 256276198
2019-07-02 17:58:15 -07:00
Andrei Vagin e9ea7230f7 fs: synchronize concurrent writes into files with O_APPEND
For files with O_APPEND, a file write operation gets a file size and uses it as
offset to call an inode write operation. This means that all other operations
which can change a file size should be blocked while the write operation doesn't
complete.

PiperOrigin-RevId: 254873771
2019-06-24 17:45:02 -07:00
Rahat Mahmood 94a6bfab5d Implement /proc/net/tcp.
PiperOrigin-RevId: 254854346
2019-06-24 15:56:36 -07:00
Neel Natu 0b2135072d Implement madvise(MADV_DONTFORK)
PiperOrigin-RevId: 254253777
2019-06-20 12:56:00 -07:00
Michael Pratt c2d87d5d7c Mark tcp_socket test flaky (for real)
The tag on the binary has no effect. It must be on the test.

PiperOrigin-RevId: 254103480
2019-06-19 17:18:11 -07:00
Neel Natu 0d1dc50b70 Mark tcp_socket test flaky.
PiperOrigin-RevId: 253997465
2019-06-19 08:08:12 -07:00