Commit Graph

541 Commits

Author SHA1 Message Date
Jay Zhuang 18d6e59b45 Switch to netinet/tcp.h and poll.h to for better platform portability.
PiperOrigin-RevId: 286249699
2019-12-18 13:58:38 -08:00
Jay Zhuang 65f53c5833 Put GetSocketPairs() in unnamed namespace
This avoids conflicting definitions of GetSocketPairs() in outer namespace when
multiple such cc files are complied for one binary.

PiperOrigin-RevId: 286243045
2019-12-18 12:50:04 -08:00
gVisor bot 3f4d8fefb4 Internal change.
PiperOrigin-RevId: 286003946
2019-12-17 10:10:06 -08:00
gVisor bot 67000b929b Explicitly export files needed by other packages
PiperOrigin-RevId: 285968611
2019-12-17 06:33:08 -08:00
Dean Deng e6f4124afd Implement checks for get/setxattr at the syscall layer.
Add checks for input arguments, file type, permissions, etc. that match
the Linux implementation. A call to get/setxattr that passes all the
checks will still currently return EOPNOTSUPP. Actual support will be
added in following commits.

Only allow user.* extended attributes for the time being.

PiperOrigin-RevId: 285835159
2019-12-16 13:20:07 -08:00
Andrei Vagin 378d6c1f36 unix: allow to bind unix sockets only to AF_UNIX addresses
Reported-by: syzbot+2c0bcfd87fb4e8b7b009@syzkaller.appspotmail.com
PiperOrigin-RevId: 285228312
2019-12-12 11:08:56 -08:00
Bhasker Hariharan 6fc9f0aefd Add support for TCP_USER_TIMEOUT option.
The implementation follows the linux behavior where specifying
a TCP_USER_TIMEOUT will cause the resend timer to honor the
user specified timeout rather than the default rto based timeout.

Further it alters when connections are timedout due to keepalive
failures. It does not alter the behavior of when keepalives are
sent. This is as per the linux behavior.

PiperOrigin-RevId: 285099795
2019-12-11 17:52:53 -08:00
Dean Deng 1601e78a52 Add syscall tests for getxattr and setxattr.
Support for getxattr and setxattr are in subsequent commits.

PiperOrigin-RevId: 285088817
2019-12-11 16:41:17 -08:00
Dean Deng 769e1cdcbe Re-enable execveat test that was causing files in /bin to be deleted.
Test now no longer deletes files incorrectly, due to a fix in fs utils
used by TempPath (github.com/google/gvisor/pull/1368).

Fixes #1366

PiperOrigin-RevId: 284814605
2019-12-10 11:42:03 -08:00
Dean Deng aadbf322c6 Disable execveat test that is causing files in /bin to be deleted.
Disable until gvisor.dev/issue/1366 is resolved.

Updates #1366

PiperOrigin-RevId: 284786895
2019-12-10 09:41:07 -08:00
Dean Deng 4a19ebd431 Add hostinet tests for sendmsg and recvmsg with TOS/TCLASS.
PiperOrigin-RevId: 284786069
2019-12-10 09:34:38 -08:00
Ian Gudger 98aafb1334 Add test for SO_BINDTODEVICE state bug.
This was accidentally dropped from the change which fixed the bug.

Updates #1217

PiperOrigin-RevId: 284689362
2019-12-09 20:09:23 -08:00
Ian Gudger 18af75db9d Add UDP SO_REUSEADDR support to the port manager.
Next steps include adding support to the transport demuxer and the UDP endpoint.

PiperOrigin-RevId: 284652151
2019-12-09 15:53:00 -08:00
Jay Zhuang 17867c88f7 Include <netinet/tcp.h> for TCP enums in proc_net tests
These are currently duplicated in ip_socket_test_util, so tests including
both netinet/tcp.h and ip_socket_test_util won't compile.

PiperOrigin-RevId: 284623958
2019-12-09 13:37:32 -08:00
Bhasker Hariharan cb5f9b8f86 Mark test as non flaky.
PiperOrigin-RevId: 284606133
2019-12-09 12:04:51 -08:00
Michael Pratt 498595d543 Add tests for rseq(2)
Add a decent set of syscall tests for rseq(2). These are a bit awkward because
of issues with library integration. libc may register rseq on thread start
(including before main on the initial thread), precluding much testing. Thus we
run tests in a libc-free subprocess.

Support for rseq(2) in gVisor will come in a later commit.

PiperOrigin-RevId: 284595994
2019-12-09 11:22:31 -08:00
Dean Deng b0066217ec Add hostinet tests for UDP sockets.
We need to skip a subset of the tests, because of features that hostinet does
not currently support.

Fixes #1209

PiperOrigin-RevId: 284235911
2019-12-06 12:14:23 -08:00
Ian Gudger 13f0f6069a Implement F_GETOWN_EX and F_SETOWN_EX.
Some versions of glibc will convert F_GETOWN fcntl(2) calls into F_GETOWN_EX in
some cases.

PiperOrigin-RevId: 284089373
2019-12-05 17:28:52 -08:00
Bhasker Hariharan f053c52812 Reduce flakiness under gotsan runs.
TcpPortReuseMultiThread creates lots of connections which result in
a lot of goroutines in the sentry. This can cause gotsan runs to
take really long and timeout. Increasing listen backlog and
reducing number of connections should help the connections complete
faster as well as reduce the number of goroutines that gotsan needs
to track.

PiperOrigin-RevId: 284046018
2019-12-05 13:57:08 -08:00
Zach Koopmans 0a32c02357 Create correct file for /proc/[pid]/task/[tid]/io
PiperOrigin-RevId: 284038840
2019-12-05 13:24:05 -08:00
gVisor bot 05758f34b2 Explicitly export files needed by other packages
PiperOrigin-RevId: 283955946
2019-12-05 05:45:09 -08:00
Dean Deng 80b7ba0c97 Clean up readv_socket test suite.
Get rid of the SocketTest class, which is only extended by ReadvSocketTest.
Also, get rid of TCP sockets (which were unused anyway) from readv_socket.cc.
This is a very old test suite that isn't the right place for TCP loopback
tests.

PiperOrigin-RevId: 283672772
2019-12-03 19:42:20 -08:00
Fabricio Voznika bb641c5403 Point TODO to gvisor.dev
PiperOrigin-RevId: 283657725
2019-12-03 17:33:50 -08:00
Andrei Vagin cf7f27c167 net/udp: return a local route address as the bound-to address
If the socket is bound to ANY and connected to a loopback address,
getsockname() has to return the loopback address. Without this fix,
getsockname() returns ANY.

PiperOrigin-RevId: 283647781
2019-12-03 16:32:13 -08:00
Bhasker Hariharan 27e2c4ddca Fix panic due to early transition to Closed.
The code in rcv.consumeSegment incorrectly transitions to
CLOSED state from LAST-ACK before the final ACK for the FIN.

Further if receiving a segment changes a socket to a closed state
then we should not invoke the sender as the socket is now closed
and sending any segments is incorrect.

PiperOrigin-RevId: 283625300
2019-12-03 14:41:55 -08:00
Andrei Vagin 43643752f0 strace: don't create a slice with a negative value
PiperOrigin-RevId: 283613824
2019-12-03 13:49:38 -08:00
Michael Pratt d7cc2480cb Add RunfilesPath to test_util
A few tests have their own ad-hoc implementations. Add a single common one.

PiperOrigin-RevId: 283601666
2019-12-03 12:47:03 -08:00
Andrei Vagin b41277049c test/syscal: Don't skip ClockGettime.CputimeId
We skipped it due to the issue in the golang scheduler
which has been fixed in go1.13.

PiperOrigin-RevId: 283432226
2019-12-02 15:37:17 -08:00
Jay Zhuang 1518f7fd38 Fix typo, s/Convertable/Convertible/g
PiperOrigin-RevId: 283345791
2019-12-02 08:33:43 -08:00
Jay Zhuang aa70523da2 Port tests in udp_socket.cc to Fuchsia
Separate out a test in udp_socket.cc that depends on <linux/errqueue.h> so the
rest of the tests can run on Fuchsia.

PiperOrigin-RevId: 283322633
2019-12-02 05:38:30 -08:00
Michael Pratt 58afb4be69 Add floating point exception tests
PiperOrigin-RevId: 282828273
2019-11-27 13:49:12 -08:00
Ian Lewis 20279c305e Allow open(O_TRUNC) and (f)truncate for proc files.
This allows writable proc and devices files to be opened with O_CREAT|O_TRUNC.
This is encountered most frequently when interacting with proc or devices files
via the command line.
e.g. $ echo 8192 1048576 4194304 > /proc/sys/net/ipv4/tcp_rmem

Also adds a test to test the behavior of open(O_TRUNC), truncate, and ftruncate
on named pipes.

Fixes #1116

PiperOrigin-RevId: 282677425
2019-11-26 18:21:09 -08:00
Andrei Vagin 4e27ba372e tests: include sys/socket.h before linux/if_arp.h
This is how it has to be accoding to the man page.

PiperOrigin-RevId: 281998068
2019-11-22 10:57:11 -08:00
Adin Scannell c0f89eba6e Import and structure cleanup.
PiperOrigin-RevId: 281795269
2019-11-21 11:41:30 -08:00
Ting-Yu Wang af323eb7c1 Fix return codes for {get,set}sockopt for some nullptr cases.
Updates #1092

PiperOrigin-RevId: 280547239
2019-11-14 17:04:34 -08:00
Kevin Krakauer 339536de5e Check that a file is a regular file with open(O_TRUNC).
It was possible to panic the sentry by opening a cache revalidating folder with
O_TRUNC|O_CREAT.

Avoids breaking php tests.

PiperOrigin-RevId: 280533213
2019-11-14 16:08:34 -08:00
Andrei Vagin 1e55eb3800 test/syscalls/proc: check an return code of waitid
PiperOrigin-RevId: 280295208
2019-11-13 15:48:12 -08:00
Ian Gudger 2c6c9af904 Add UDP SO_REUSEADDR/SO_REUSEPORT conversion tests.
Add additional tests for UDP SO_REUSEADDR and SO_REUSEPORT interaction.

If all existing all currently bound sockets as well as the current binding
socket have SO_REUSEADDR, or if all existing all currently bound sockets as
well as the current binding socket have SO_REUSEPORT, binding a currently bound
address is allowed. This seems odd since it means that the
SO_REUSEADDR/SO_REUSEPORT behavior can change with the binding of additional
sockets.

PiperOrigin-RevId: 280116163
2019-11-12 20:39:04 -08:00
Ian Gudger 57a2a5ea33 Add tests for SO_REUSEADDR and SO_REUSEPORT.
* Basic tests for the SO_REUSEADDR and SO_REUSEPORT options.
* SO_REUSEADDR functional tests for TCP and UDP.
* SO_REUSEADDR and SO_REUSEPORT interaction tests for UDP.
* Stubbed support for UDP getsockopt(SO_REUSEADDR).

PiperOrigin-RevId: 280049265
2019-11-12 14:04:14 -08:00
Ian Gudger b82bd24f94 Update ephemeral port reservation tests.
The existing tests which are disabled on gVisor are failing because we default
to SO_REUSEADDR being enabled for TCP sockets. Update the test comments.

Also add new tests for enabled SO_REUSEADDR.

PiperOrigin-RevId: 279862275
2019-11-11 18:35:48 -08:00
Bhasker Hariharan 2b0e4dc6aa Remove obsolete TODO. This is now fixed.
PiperOrigin-RevId: 279835100
2019-11-11 15:51:10 -08:00
gVisor bot 7730716800 Make `connect` on socket returned by `accept` correctly error out with EISCONN
PiperOrigin-RevId: 279814493
2019-11-11 14:15:06 -08:00
Bhasker Hariharan 66ebb6575f Add support for TIME_WAIT timeout.
This change adds explicit support for honoring the 2MSL timeout
for sockets in TIME_WAIT state. It also adds support for the
TCP_LINGER2 option that allows modification of the FIN_WAIT2
state timeout duration for a given socket.

It also adds an option to modify the Stack wide TIME_WAIT timeout
but this is only for testing. On Linux this is fixed at 60s.

Further, we also now correctly process RST's in CLOSE_WAIT and
close the socket similar to linux without moving it to error
state.

We also now handle SYN in ESTABLISHED state as per
RFC5961#section-4.1. Earlier we would just drop these SYNs.
Which can result in some tests that pass on linux to fail on
gVisor.

Netstack now honors TIME_WAIT correctly as well as handles the
following cases correctly.

- TCP RSTs in TIME_WAIT are ignored.
- A duplicate TCP FIN during TIME_WAIT extends the TIME_WAIT
  and a dup ACK is sent in response to the FIN as the dup FIN
  indicates potential loss of the original final ACK.
- An out of order segment during TIME_WAIT generates a dup ACK.
- A new SYN w/ a sequence number > the highest sequence number
  in the previous connection closes the TIME_WAIT early and
  opens a new connection.

Further to make the SYN case work correctly the ISN (Initial
Sequence Number) generation for Netstack has been updated to
be as per RFC. Its not a pure random number anymore and follows
the recommendation in https://tools.ietf.org/html/rfc6528#page-3.

The current hash used is not a cryptographically secure hash
function. A separate change will update the hash function used
to Siphash similar to what is used in Linux.

PiperOrigin-RevId: 279106406
2019-11-07 09:46:55 -08:00
Kevin Krakauer e1b21f3c8c Use PacketBuffers, rather than VectorisedViews, in netstack.
PacketBuffers are analogous to Linux's sk_buff. They hold all information about
a packet, headers, and payload. This is important for:

* iptables to access various headers of packets
* Preventing the clutter of passing different net and link headers along with
  VectorisedViews to packet handling functions.

This change only affects the incoming packet path, and a future change will
change the outgoing path.

Benchmark               Regular         PacketBufferPtr  PacketBufferConcrete
--------------------------------------------------------------------------------
BM_Recvmsg             400.715MB/s      373.676MB/s      396.276MB/s
BM_Sendmsg             361.832MB/s      333.003MB/s      335.571MB/s
BM_Recvfrom            453.336MB/s      393.321MB/s      381.650MB/s
BM_Sendto              378.052MB/s      372.134MB/s      341.342MB/s
BM_SendmsgTCP/0/1k     353.711MB/s      316.216MB/s      322.747MB/s
BM_SendmsgTCP/0/2k     600.681MB/s      588.776MB/s      565.050MB/s
BM_SendmsgTCP/0/4k     995.301MB/s      888.808MB/s      941.888MB/s
BM_SendmsgTCP/0/8k     1.517GB/s        1.274GB/s        1.345GB/s
BM_SendmsgTCP/0/16k    1.872GB/s        1.586GB/s        1.698GB/s
BM_SendmsgTCP/0/32k    1.017GB/s        1.020GB/s        1.133GB/s
BM_SendmsgTCP/0/64k    475.626MB/s      584.587MB/s      627.027MB/s
BM_SendmsgTCP/0/128k   416.371MB/s      503.434MB/s      409.850MB/s
BM_SendmsgTCP/0/256k   323.449MB/s      449.599MB/s      388.852MB/s
BM_SendmsgTCP/0/512k   243.992MB/s      267.676MB/s      314.474MB/s
BM_SendmsgTCP/0/1M     95.138MB/s       95.874MB/s       95.417MB/s
BM_SendmsgTCP/0/2M     96.261MB/s       94.977MB/s       96.005MB/s
BM_SendmsgTCP/0/4M     96.512MB/s       95.978MB/s       95.370MB/s
BM_SendmsgTCP/0/8M     95.603MB/s       95.541MB/s       94.935MB/s
BM_SendmsgTCP/0/16M    94.598MB/s       94.696MB/s       94.521MB/s
BM_SendmsgTCP/0/32M    94.006MB/s       94.671MB/s       94.768MB/s
BM_SendmsgTCP/0/64M    94.133MB/s       94.333MB/s       94.746MB/s
BM_SendmsgTCP/0/128M   93.615MB/s       93.497MB/s       93.573MB/s
BM_SendmsgTCP/0/256M   93.241MB/s       95.100MB/s       93.272MB/s
BM_SendmsgTCP/1/1k     303.644MB/s      316.074MB/s      308.430MB/s
BM_SendmsgTCP/1/2k     537.093MB/s      584.962MB/s      529.020MB/s
BM_SendmsgTCP/1/4k     882.362MB/s      939.087MB/s      892.285MB/s
BM_SendmsgTCP/1/8k     1.272GB/s        1.394GB/s        1.296GB/s
BM_SendmsgTCP/1/16k    1.802GB/s        2.019GB/s        1.830GB/s
BM_SendmsgTCP/1/32k    2.084GB/s        2.173GB/s        2.156GB/s
BM_SendmsgTCP/1/64k    2.515GB/s        2.463GB/s        2.473GB/s
BM_SendmsgTCP/1/128k   2.811GB/s        3.004GB/s        2.946GB/s
BM_SendmsgTCP/1/256k   3.008GB/s        3.159GB/s        3.171GB/s
BM_SendmsgTCP/1/512k   2.980GB/s        3.150GB/s        3.126GB/s
BM_SendmsgTCP/1/1M     2.165GB/s        2.233GB/s        2.163GB/s
BM_SendmsgTCP/1/2M     2.370GB/s        2.219GB/s        2.453GB/s
BM_SendmsgTCP/1/4M     2.005GB/s        2.091GB/s        2.214GB/s
BM_SendmsgTCP/1/8M     2.111GB/s        2.013GB/s        2.109GB/s
BM_SendmsgTCP/1/16M    1.902GB/s        1.868GB/s        1.897GB/s
BM_SendmsgTCP/1/32M    1.655GB/s        1.665GB/s        1.635GB/s
BM_SendmsgTCP/1/64M    1.575GB/s        1.547GB/s        1.575GB/s
BM_SendmsgTCP/1/128M   1.524GB/s        1.584GB/s        1.580GB/s
BM_SendmsgTCP/1/256M   1.579GB/s        1.607GB/s        1.593GB/s

PiperOrigin-RevId: 278940079
2019-11-06 14:25:59 -08:00
Andrei Vagin 493334f8b5 kokoro: run KVM syscall tests
We don't know how stable they are, so let's start with warning.

PiperOrigin-RevId: 278484186
2019-11-04 16:00:34 -08:00
Michael Pratt b23b36e701 Add NETLINK_KOBJECT_UEVENT socket support
NETLINK_KOBJECT_UEVENT sockets send udev-style messages for device events.
gVisor doesn't have any device events, so our sockets don't need to do anything
once created.

systemd's device manager needs to be able to create one of these sockets. It
also wants to install a BPF filter on the socket. Since we'll never send any
messages, the filter would never be invoked, thus we just fake it out.

Fixes #1117
Updates #1119

PiperOrigin-RevId: 278405893
2019-11-04 10:07:52 -08:00
Michael Pratt 515fee5b6d Add SO_PASSCRED support to netlink sockets
Since we only supporting sending messages from the kernel, the peer is always
the kernel, simplifying handling.

There are currently no known users of SO_PASSCRED that would actually receive
messages from gVisor, but adding full support is barely more work than stubbing
out fake support.

Updates #1117
Fixes #1119

PiperOrigin-RevId: 277981465
2019-11-01 12:45:11 -07:00
Andrei Vagin af6af2c341 tests: don't use ASSERT_THAT after fork
PiperOrigin-RevId: 277965624
2019-11-01 11:22:21 -07:00
Andrei Vagin db37483cb6 Store endpoints inside multiPortEndpoint in a sorted order
It is required to guarantee the same order of endpoints after save/restore.

PiperOrigin-RevId: 277598665
2019-10-30 15:33:41 -07:00
Dean Deng 8bc7b8dba2 Clean up typos in test names.
PiperOrigin-RevId: 277572791
2019-10-30 13:31:12 -07:00
Dean Deng 38330e9377 Update symlink traversal limit when resolving interpreter path.
When execveat is called on an interpreter script, the symlink count for
resolving the script path should be separate from the count for resolving the
the corresponding interpreter. An ELOOP error should not occur if we do not hit
the symlink limit along any individual path, even if the total number of
symlinks encountered exceeds the limit.

Closes #574

PiperOrigin-RevId: 277358474
2019-10-29 13:59:28 -07:00
Bhasker Hariharan 392c561495 Fix PollWithFullBufferBlocks.
Set the snd/rcv buffer sizes so that the test is deterministic and runs in a
reasonable amount of time. It also ensures that we disable any auto-tuning of
the send/receive buffer which may happen.

PiperOrigin-RevId: 277337232
2019-10-29 12:17:06 -07:00
Dean Deng 29273b0384 Disallow execveat on interpreter scripts with fd opened with O_CLOEXEC.
When an interpreter script is opened with O_CLOEXEC and the resulting fd is
passed into execveat, an ENOENT error should occur (the script would otherwise
be inaccessible to the interpreter). This matches the actual behavior of
Linux's execveat.

PiperOrigin-RevId: 277306680
2019-10-29 10:04:39 -07:00
Haibo e0c84f284c test/syscall: Remove duplicated gtest/gtest.h.
Signed-off-by: Haibo Xu <haibo.xu@arm.com>
Change-Id: I05a7ec69b98b88931ba4a8adb3e8a7b822006001
COPYBARA_INTEGRATE_REVIEW=https://github.com/google/gvisor/pull/1023 from xiaobo55x:syscall_test d44a8b1f827ed4081997af96cd58ba7449e0a9e1
PiperOrigin-RevId: 276740442
2019-10-25 12:40:36 -07:00
Dean Deng d9fd536340 Handle AT_SYMLINK_NOFOLLOW flag for execveat.
PiperOrigin-RevId: 276441249
2019-10-24 01:45:25 -07:00
Dean Deng 7ca50236c4 Handle AT_EMPTY_PATH flag in execveat.
PiperOrigin-RevId: 276419967
2019-10-23 22:23:05 -07:00
Kevin Krakauer 072af49059 Add check for proper settings to AF_PACKET tests.
As in packet_socket_raw.cc, we should check that certain proc files are set
correctly.

PiperOrigin-RevId: 276384534
2019-10-23 17:21:12 -07:00
gVisor bot 6d4d9564e3 Merge pull request #641 from tanjianfeng:master
PiperOrigin-RevId: 276380008
2019-10-23 16:55:15 -07:00
Michael Pratt c0065e296f Remove comparison between signed and unsigned int
Some compilers don't like the comparison between int and size_t. Remove it.

The other changes are minor style cleanups.

PiperOrigin-RevId: 276333450
2019-10-23 12:59:48 -07:00
Dean Deng 0b569b7cae Add basic implementation of execveat syscall and associated tests.
Allow file descriptors of directories as well as AT_FDCWD.

PiperOrigin-RevId: 275929668
2019-10-21 14:55:18 -07:00
Kevin Krakauer 12235d533a AF_PACKET support for netstack (aka epsocket).
Like (AF_INET, SOCK_RAW) sockets, AF_PACKET sockets require CAP_NET_RAW. With
runsc, you'll need to pass `--net-raw=true` to enable them.

Binding isn't supported yet.

PiperOrigin-RevId: 275909366
2019-10-21 13:23:18 -07:00
Michael Pratt 49b596b98d Cleanup host UDS support
This change fixes several issues with the fsgofer host UDS support. Notably, it
adds support for SOCK_SEQPACKET and SOCK_DGRAM sockets [1]. It also fixes
unsafe use of unet.Socket, which could cause a panic if Socket.FD is called
when err != nil, and calls to Socket.FD with nothing to prevent the garbage
collector from destroying and closing the socket.

A set of tests is added to exercise host UDS access. This required extracting
most of the syscall test runner into a library that can be used by custom
tests.

Updates #235
Updates #1003

[1] N.B. SOCK_DGRAM sockets are likely not particularly useful, as a server can
only reply to a client that binds first. We don't allow bind, so these are
unlikely to be used.

PiperOrigin-RevId: 275558502
2019-10-18 15:33:03 -07:00
Andrei Vagin 4c7f849b25 test: use a bigger buffer to fill a socket
Otherwise we need to do a lot of system calls and cooperative_save tests work
slow.

PiperOrigin-RevId: 275536957
2019-10-18 13:40:31 -07:00
gVisor bot d22f0534c0 Merge pull request #736 from tanjianfeng:fix-unix
PiperOrigin-RevId: 275114157
2019-10-16 14:41:43 -07:00
Michael Pratt de9a8e0eb7 Remove death from exec test names
These aren't actually death tests in the GUnit sense. i.e., they don't call
EXPECT_EXIT or EXPECT_DEATH.

PiperOrigin-RevId: 275099957
2019-10-16 13:25:11 -07:00
Jianfeng Tan d277bfba27 epsocket: support /proc/net/snmp
Netstack has its own stats, we use this to fill /proc/net/snmp.

Note that some metrics are not recorded in Netstack, which will be shown
as 0 in the proc file.

Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com>
Change-Id: Ie0089184507d16f49bc0057b4b0482094417ebe1
2019-10-15 16:38:41 +00:00
Jianfeng Tan e3d4a67739 support /proc/net/snmp
This proc file contains statistics according to [1].

[1] https://tools.ietf.org/html/rfc2013

Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com>
Change-Id: I9662132085edd8a7783d356ce4237d7ac0800d94
2019-10-15 16:38:40 +00:00
gVisor bot bfa0bb24dd Internal change.
PiperOrigin-RevId: 274700093
2019-10-14 17:46:52 -07:00
Ian Lewis 470997ca99 Allow for zero byte iovec with MSG_PEEK | MSG_TRUNC in recvmsg.
This allows for peeking at the length of the next message on a netlink socket
without pulling it off the socket's buffer/queue, allowing tools like 'ip' to
work.

This CL also fixes an issue where dump_done_errno was not included in the
NLMSG_DONE messages payload.

Issue #769

PiperOrigin-RevId: 274068637
2019-10-10 16:55:48 -07:00
Adin Scannell f8b1859319 Fix signalfd polling.
The signalfd descriptors otherwise always show as available. This can lead
programs to spin, assuming they are looking to see what signals are pending.

Updates #139

PiperOrigin-RevId: 274017890
2019-10-10 12:51:22 -07:00
Ian Gudger 7c1587e340 Implement IP_TTL.
Also change the default TTL to 64 to match Linux.

PiperOrigin-RevId: 273430341
2019-10-07 19:29:51 -07:00
Chris Kuiper 4874525161 Implement proper local broadcast behavior
The behavior for sending and receiving local broadcast (255.255.255.255)
traffic is as follows:

Outgoing
--------
* A broadcast packet sent on a socket that is bound to an interface goes out
  that interface
* A broadcast packet sent on an unbound socket follows the route table to
  select the outgoing interface
  + if an explicit route entry exists for 255.255.255.255/32, use that one
  + else use the default route
* Broadcast packets are looped back and delivered following the rules for
  incoming packets (see next). This is the same behavior as for multicast
  packets, except that it cannot be disabled via sockopt.

Incoming
--------
* Sockets wishing to receive broadcast packets must bind to either INADDR_ANY
  (0.0.0.0) or INADDR_BROADCAST (255.255.255.255). No other socket receives
  broadcast packets.
* Broadcast packets are multiplexed to all sockets matching it. This is the
  same behavior as for multicast packets.
* A socket can bind to 255.255.255.255:<port> and then receive its own
  broadcast packets sent to 255.255.255.255:<port>

In addition, this change implicitly fixes an issue with multicast reception. If
two sockets want to receive a given multicast stream and one is bound to ANY
while the other is bound to the multicast address, only one of them will
receive the traffic.

PiperOrigin-RevId: 272792377
2019-10-03 19:31:35 -07:00
Andrei Vagin db218fdfcf Don't report partialResult errors from sendfile
The input file descriptor is always a regular file, so sendfile can't lose any
data if it will not be able to write them to the output file descriptor.

Reported-by: syzbot+22d22330a35fa1c02155@syzkaller.appspotmail.com
PiperOrigin-RevId: 272730357
2019-10-03 13:38:30 -07:00
gVisor bot cde7711837 Merge pull request #865 from tanjianfeng:fix-829
PiperOrigin-RevId: 272522508
2019-10-02 14:51:04 -07:00
Michael Pratt 61e40819d9 Sanity test that open(2) on a UDS fails
Spoiler alert: it doesn't.

PiperOrigin-RevId: 272513529
2019-10-02 14:01:49 -07:00
Michael Pratt 0d483985c5 Include AT_SECURE in the aux vector
gVisor does not currently implement the functionality that would result in
AT_SECURE = 1, but Linux includes AT_SECURE = 0 in the normal case, so we
should do the same.
PiperOrigin-RevId: 272311488
2019-10-01 15:43:14 -07:00
Michael Pratt 277f84ad20 Support new interpreter requirements in test
Refactoring in 0036d1f7eb95bcc52977f15507f00dd07018e7e2 (v4.10) caused Linux to
start unconditionally zeroing the remainder of the last page in the
interpreter. Previously it did not due so if filesz == memsz, and *still* does
not do so when filesz == memsz for loading binaries, only interpreter.

This inconsistency is not worth replicating in gVisor, as it is arguably a bug,
but our tests must ensure we create interpreter ELFs compatible with this new
requirement.

PiperOrigin-RevId: 272266401
2019-10-01 12:25:11 -07:00
Michael Pratt dd69b49ed1 Disable cpuClockTicker when app is idle
Kernel.cpuClockTicker increments kernel.cpuClock, which tasks use as a clock to
track their CPU usage. This improves latency in the syscall path by avoid
expensive monotonic clock calls on every syscall entry/exit.

However, this timer fires every 10ms. Thus, when all tasks are idle (i.e.,
blocked or stopped), this forces a sentry wakeup every 10ms, when we may
otherwise be able to sleep until the next app-relevant event. These wakeups
cause the sentry to utilize approximately 2% CPU when the application is
otherwise idle.

Updates to clock are not strictly necessary when the app is idle, as there are
no readers of cpuClock. This commit reduces idle CPU by disabling the timer
when tasks are completely idle, and computing its effects at the next wakeup.

Rather than disabling the timer as soon as the app goes idle, we wait until the
next tick, which provides a window for short sleeps to sleep and wakeup without
doing the (relatively) expensive work of disabling and enabling the timer.

PiperOrigin-RevId: 272265822
2019-10-01 12:21:01 -07:00
Michael Pratt 53cc72da90 Honor X bit on extra anon pages in PT_LOAD segments
Linux changed this behavior in 16e72e9b30986ee15f17fbb68189ca842c32af58
(v4.11). Previously, extra pages were always mapped RW. Now, those pages will
be executable if the segment specified PF_X. They still must be writeable.

PiperOrigin-RevId: 272256280
2019-10-01 11:30:36 -07:00
Kevin Krakauer c06cca6678 De-flake SetForegroundProcessGroupDifferentSession.
PiperOrigin-RevId: 272059043
2019-09-30 13:59:36 -07:00
Michael Pratt 981fc188f0 Only copy out remaining time on nanosleep success
It looks like the old code attempted to do this, but didn't realize that err !=
nil even in the happy case.

PiperOrigin-RevId: 272005887
2019-09-30 13:07:32 -07:00
Adin Scannell c8bb20865d Automated rollback of changelist 256276198
PiperOrigin-RevId: 271665517
2019-09-27 15:58:51 -07:00
gVisor bot 8539abc0df Merge pull request #864 from tanjianfeng:fix-861
PiperOrigin-RevId: 271649711
2019-09-27 15:18:09 -07:00
gVisor bot abbee5615f Implement SO_BINDTODEVICE sockopt
PiperOrigin-RevId: 271644926
2019-09-27 14:14:04 -07:00
Kevin Krakauer 543492650d Make raw socket tests pass in environments with or without CAP_NET_RAW.
PiperOrigin-RevId: 271442321
2019-09-26 15:09:20 -07:00
Andrei Vagin 2fb34c8d5c test: don't use designated initializers
This change fixes compile errors:
pty.cc:1460:7: error: expected primary-expression before '.' token
...

PiperOrigin-RevId: 271033729
2019-09-24 19:05:12 -07:00
Adin Scannell 502f8f238e Stub out readahead implementation.
Closes #261

PiperOrigin-RevId: 270973347
2019-09-24 13:29:46 -07:00
Bhasker Hariharan 9846da5e65 Fix bug in RstCausesPollHUP.
The test is checking the wrong poll_fd for POLLHUP. The only
reason it passed till now was because it was also checking
for POLLIN which was always true on the other fd from the
previous poll!

PiperOrigin-RevId: 270780401
2019-09-23 16:00:50 -07:00
Jianfeng Tan 223481e927 fix set hostname
Previously, when we set hostname:

$ strace hostname abc
...
sethostname("abc", 3) = -1 ENAMETOOLONG (File name too long)
...

According to man 2 sethostname:

"The len argument specifies the number of bytes in name. (Thus, name
does not require a terminating null byte.)"

We wrongly use the CopyStringIn() to check terminating zero byte in
the implementation of sethostname syscall.

To fix this, we use CopyInBytes() instead.

Fixes: #861

Reported-by: chenglang.hy <chenglang.hy@antfin.com>
Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com>
2019-09-20 17:57:25 +00:00
Jianfeng Tan 329b6653ff Implement /proc/net/tcp6
Fixes: #829

Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com>
Signed-off-by: Jielong Zhou <jielong.zjl@antfin.com>
2019-09-20 17:20:08 +00:00
Kevin Krakauer 0a8a75f3da Job control: controlling TTYs and foreground process groups.
Adresses a deadlock with the rolled back change:
b6a5b950d2
Creating a session from an orphaned process group was causing a lock to be
acquired twice by a single goroutine. This behavior is addressed, and a test
(OrphanRegression) has been added to pty.cc.

Implemented the following ioctls:
- TIOCSCTTY - set controlling TTY
- TIOCNOTTY - remove controlling tty, maybe signal some other processes
- TIOCGPGRP - get foreground process group. Also enables tcgetpgrp().
- TIOCSPGRP - set foreground process group. Also enabled tcsetpgrp().

Next steps are to actually turn terminal-generated control characters (e.g. C^c)
into signals to the proper process groups, and to send SIGTTOU and SIGTTIN when
appropriate.

PiperOrigin-RevId: 270088599
2019-09-19 11:36:47 -07:00
Adin Scannell c98e7f0d19 Signalfd support
Note that the exact semantics for these signalfds are slightly different from
Linux. These signalfds are bound to the process at creation time. Reads, polls,
etc. are all associated with signals directed at that task. In Linux, all
signalfd operations are associated with current, regardless of where the
signalfd originated.

In practice, this should not be an issue given how signalfds are used. In order
to fix this however, we will need to plumb the context through all the event
APIs. This gets complicated really quickly, because the waiter APIs are all
netstack-specific, and not generally exposed to the context.  Probably not
worthwhile fixing immediately.

PiperOrigin-RevId: 269901749
2019-09-18 15:16:42 -07:00
Michael Pratt 56cb004218 Migrate from gflags to absl flags
absl flags are more modern and we can easily depend on them directly.

The repo now successfully builds with --incompatible_load_cc_rules_from_bzl.

PiperOrigin-RevId: 269387081
2019-09-16 11:58:27 -07:00
Andrei Vagin 239a07aabf gvisor: return ENOTDIR from the unlink syscall
ENOTDIR has to be returned when a component used as a directory in
pathname is not, in  fact,  a directory.

PiperOrigin-RevId: 269037893
2019-09-13 21:44:57 -07:00
Adin Scannell 7c6ab6a219 Implement splice methods for pipes and sockets.
This also allows the tee(2) implementation to be enabled, since dup can now be
properly supported via WriteTo.

Note that this change necessitated some minor restructoring with the
fs.FileOperations splice methods. If the *fs.File is passed through directly,
then only public API methods are accessible, which will deadlock immediately
since the locking is already done by fs.Splice. Instead, we pass through an
abstract io.Reader or io.Writer, which elide locks and use the underlying
fs.FileOperations directly.

PiperOrigin-RevId: 268805207
2019-09-12 17:43:27 -07:00
Adin Scannell 849c57314f Fix minor Kokoro issues.
A recent Kokoro change pointed to go_tests.cfg (in line with the
other configurations), which unfortunately broke the presubmits.

This change also enabled the KVM tests, which were still using a
remote execution strategy.

This fixes both of these issues and allows presubmits to pass.

One additional test was caught with this case, which seems to
have been broken. It's unclear why this was not being caught.

PiperOrigin-RevId: 268166291
2019-09-10 00:38:52 -07:00
Michael Pratt 98f7fbb59f Load C++ rules from @rules_cc
See https://github.com/bazelbuild/bazel/issues/8743. This will be required in
Bazel 1.0.

Protobuf was updated in
bf0c69e130 (diff-96239ee297e0a92ac6ff96a6bc434ef0).

GoogleTest was updated in
6fd262ecf7.

gflags has not yet been updated, so the repo still won't build with
--incompatible_load_cc_rules_from_bzl.

Tested with buildifier -warnings=native-cc -lint=warn **/BUILD.

PiperOrigin-RevId: 267638515
2019-09-06 11:29:00 -07:00
Bhasker Hariharan eb074a61f2 Fix bug in proc_test.
TestNoDuplicates is racy as it tries to read the /proc file system
while the test is running. But it's possible that from the time a
directory entries are read and each entry processed something could
change and in some cases the entry being processed could have been
deleted. In such cases we should not fail the test but just
ignore the error and move on.

PiperOrigin-RevId: 267483094
2019-09-05 16:40:46 -07:00
Jamie Liu fbdd3ff1da Deflake aio_test.
- Most AIO tests call io_setup(nr_events = 128). sizeof(struct io_event)
(128*32 = 4096). However, the actual size of the mapping created by
io_setup() is determined by:

(from fs/aio.c:ioctx_alloc())
/*
 * We keep track of the number of available ringbuffer slots, to prevent
 * overflow (reqs_available), and we also use percpu counters for this.
 *
 * So since up to half the slots might be on other cpu's percpu counters
 * and unavailable, double nr_events so userspace sees what they
 * expected: additionally, we move req_batch slots to/from percpu
 * counters at a time, so make sure that isn't 0:
 */
nr_events = max(nr_events, num_possible_cpus() * 4);
nr_events *= 2;

(from fs/aio.c:aio_setup_ring())
/* Compensate for the ring buffer's head/tail overlap entry */
nr_events += 2; /* 1 is required, 2 for good luck */
size = sizeof(struct aio_ring);
size += sizeof(struct io_event) * nr_events;
nr_pages = PFN_UP(size);

When we mremap() only the first page of a multi-page AIO ring buffer
mapping, fs/aio.c:aio_ring_mremap() updates struct kioctx::mmap_base -
but struct kioctx::mmap_size is untouched, so sys_io_destroy() =>
kill_ioctx() vm_unmaps() the mremapped page, plus some number of pages
after it. Just get the actual size of the mapping from /proc/self/maps.

- Delete test case MremapOver; while it is correct that Linux will not
complain if you overwrite the AIO ring buffer with another mapping, it
won't actually work in the sense that AIO events will not be written to
the new mapping, because Linux stores the struct pages of the ring
buffer in struct kioctx::ring_pages and writes to those through kmap()
rather than using userspace addresses.

- Don't munmap() after mremap(MREMAP_FIXED) returns EFAULT; see new
comment in factored-out test case MremapExpansion.

PiperOrigin-RevId: 267482903
2019-09-05 16:36:44 -07:00
Ian Gudger fbbb2f7ed6 Run proc_net tests.
PiperOrigin-RevId: 267280086
2019-09-04 19:08:12 -07:00
Bhasker Hariharan 54bf2e8eff Automated rollback of changelist 261387276
PiperOrigin-RevId: 266491264
2019-08-30 18:15:32 -07:00
Jamie Liu f3dabdfc48 Fix async-signal-unsafety in MlockallTest_Future.
PiperOrigin-RevId: 266491246
2019-08-30 18:11:15 -07:00
Fabricio Voznika 502c47f7a7 Return correct buffer size for ioctl(socket, FIONREAD)
Ioctl was returning just the buffer size from epsocket.endpoint
and it was not considering data from epsocket.SocketOperations
that was read from the endpoint, but not yet sent to the caller.

PiperOrigin-RevId: 266485461
2019-08-30 17:19:09 -07:00
Adin Scannell 888e87909e Add C++ toolchain and fix compile issues.
This was accidentally introduced in 31f05d5d4f.

Fixes #788.

PiperOrigin-RevId: 266462843
2019-08-30 15:03:15 -07:00
Rahat Mahmood f74affe203 Handle new representation of abstract UDS paths.
When abstract unix domain socket paths are displayed in
/proc/net/unix, Linux historically emitted null bytes as padding at
the end of the path. Newer versions of Linux (v4.9,
e7947ea770d0de434d38a0f823e660d3fd4bebb5) display these as '@'
characters.

Update proc_net_unix test to handle both version of the padding.

PiperOrigin-RevId: 266230200
2019-08-29 14:37:47 -07:00
Rahat Mahmood 863e11ac4d Implement /proc/net/udp.
PiperOrigin-RevId: 266229756
2019-08-29 14:30:41 -07:00
gVisor bot 31f05d5d4f Internal change.
PiperOrigin-RevId: 266199211
2019-08-29 14:01:47 -07:00
Zach Koopmans f64d9a7d93 Fix pwritev2 flaky test.
Fix a uninitialized memory bug in pwritev2 test.

PiperOrigin-RevId: 265772176
2019-08-27 14:50:03 -07:00
Fabricio Voznika 8fd89fd7a2 Fix sendfile(2) error code
When output file is in append mode, sendfile(2) should fail
with EINVAL and not EBADF.

Closes #721

PiperOrigin-RevId: 265718958
2019-08-27 10:52:46 -07:00
gVisor bot baf4d8aaca Internal change.
PiperOrigin-RevId: 265535438
2019-08-26 14:07:17 -07:00
Zach Koopmans a5d0115943 Second try at flaky futex test.
The flake had the call to futex_unlock_pi() returning EINVAL with the
FUTEX_OWNER_DIED set. In this case, userspace has to clean up stale
state. So instead of calling FUTEX_UNLOCK_PI outright, we'll use the
advised atomic compare_exchange as advised in the man page.

PiperOrigin-RevId: 265163920
2019-08-23 16:54:18 -07:00
Michael Pratt 52e674b44d Remove ASSERT from fork child
The gunit macros are not safe to use in the child.

PiperOrigin-RevId: 264904348
2019-08-22 13:21:04 -07:00
Jianfeng Tan 2c3e2ed2bf unix: return ECONNRESET if peer closed with data not read
For SOCK_STREAM type unix socket, we shall return ECONNRESET if peer is
closed with data not read.

We explictly set a flag when closing one end, to differentiate from
just shutdown (where zero shall be returned).

Fixes: #735

Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com>
2019-08-22 15:25:38 +00:00
Jianfeng Tan 96f78e2466 unix: return zero if peer is closed
Previously, recvmsg() on a unix stream socket with its peer closed will
never return, with goroutine call trace like this:

  ...
  2  in gvisor.dev/gvisor/pkg/sentry/kernel.(*Task).block
     at pkg/sentry/kernel/task_block.go:124
  3  in gvisor.dev/gvisor/pkg/sentry/kernel.(*Task).BlockWithDeadline
     at pkg/sentry/kernel/task_block.go:69
  4  in gvisor.dev/gvisor/pkg/sentry/socket/unix.(*SocketOperations).RecvMsg
     at pkg/sentry/socket/unix/unix.go:612
  5  in gvisor.dev/gvisor/pkg/sentry/syscalls/linux.recvFrom
     at pkg/sentry/syscalls/linux/sys_socket.go:885
  6  in gvisor.dev/gvisor/pkg/sentry/syscalls/linux.RecvFrom
     at pkg/sentry/syscalls/linux/sys_socket.go:910
  ...

The issue is caused by that ErrClosedForReceive returned by
unix/transport.queue is turned into nil in
unix.(*EndpointReader).ReadToBlocks():

  err.ToError()

As a result, in unix.(*SocketOperations).RecvMsg():

  n == 0 and err == nil

We shall differentiate it from another case - no data to read where
ErrWouldBlock shall be returned; and return 0 immediately.

Fixes: #734

Reported-by: chenglang.hy <chenglang.hy@antfin.com>
Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com>
2019-08-22 15:25:38 +00:00
Chris Kuiper 8d9276ed56 Support binding to multicast and broadcast addresses
This fixes the issue of not being able to bind to either a multicast or
broadcast address as well as to send and receive data from it. The way to solve
this is to treat these addresses similar to the ANY address and register their
transport endpoint ID with the global stack's demuxer rather than the NIC's.
That way there is no need to require an endpoint with that multicast or
broadcast address. The stack's demuxer is in fact the only correct one to use,
because neither broadcast- nor multicast-bound sockets care which NIC a
packet was received on (for multicast a join is still needed to receive packets
on a NIC).

I also took the liberty of refactoring udp_test.go to consolidate a lot of
duplicate code and make it easier to create repetitive tests that test the same
feature for a variety of packet and socket types. For this purpose I created a
"flowType" that represents two things: 1) the type of packet being sent or
received and 2) the type of socket used for the test. E.g., a "multicastV4in6"
flow represents a V4-mapped multicast packet run through a V6-dual socket.

This allows writing significantly simpler tests. A nice example is testTTL().

PiperOrigin-RevId: 264766909
2019-08-21 22:54:25 -07:00
Andrei Vagin 5fd63d1c7f tests: retry connect if it fails with EINTR
test/syscalls/linux/proc_net_tcp.cc:252: Failure
 Value of: connect(client->get(), &addr, addrlen)
 Expected: not -1 (success)
   Actual: -1 (of type int), with errno PosixError(errno=4 Interrupted system call)

PiperOrigin-RevId: 264743815
2019-08-21 19:07:11 -07:00
Kevin Krakauer 6c3a242143 Add tests for raw AF_PACKET sockets.
PiperOrigin-RevId: 264494359
2019-08-20 16:36:06 -07:00
Zach Koopmans 3d0715b3f8 Fix flaky futex test.
The test is long running (175128 ms or so) which causes timeouts.
The test simply makes sure that private futexes can acquire
locks concurrently. Dropping current threads and increasing the
number of locks each thread tests the same concurrency concerns
but drops execution time to ~1411 ms.

PiperOrigin-RevId: 264476144
2019-08-20 15:06:54 -07:00
Kevin Krakauer bd826092fe Read iptables via sockopts.
PiperOrigin-RevId: 264180125
2019-08-19 10:05:59 -07:00
Andrei Vagin 3e4102b2ea netstack: disconnect an unix socket only if the address family is AF_UNSPEC
Linux allows to call connect for ANY and the zero port.

PiperOrigin-RevId: 263892534
2019-08-16 19:32:14 -07:00
Kevin Krakauer ef045b914b Add tests for "cooked" AF_PACKET sockets.
PiperOrigin-RevId: 263666789
2019-08-15 16:31:35 -07:00
Bhasker Hariharan 570fb1db6b Improve SendMsg performance.
SendMsg before this change would copy all the data over into a
new slice even if the underlying socket could only accept a
small amount of data. This is really inefficient with non-blocking
sockets and under high throughput where large writes could get
ErrWouldBlock or if there was say a timeout associated with the sendmsg()
syscall.

With this change we delay copying bytes in till they are needed and only
copy what can be potentially sent/held in the socket buffer. Reducing
the need to repeatedly copy data over.

Also a minor fix to change state FIN-WAIT-1 when shutdown(..., SHUT_WR) is called
instead of when we transmit the actual FIN. Otherwise the socket could remain in
CONNECTED state even though the user has called shutdown() on the socket.

Updates #627

PiperOrigin-RevId: 263430505
2019-08-14 14:34:27 -07:00
Rahat Mahmood 691c2f8173 Compute size of struct tcp_info instead of hardcoding it.
PiperOrigin-RevId: 263040624
2019-08-12 17:34:38 -07:00
Andrei Vagin af90e68623 netlink: return an error in nlmsgerr
Now if a process sends an unsupported netlink requests,
an error is returned from the send system call.

The linux kernel works differently in this case. It returns errors in the
nlmsgerr netlink message.

Reported-by: syzbot+571d99510c6f935202da@syzkaller.appspotmail.com
PiperOrigin-RevId: 262690453
2019-08-09 22:34:54 -07:00
Bhasker Hariharan dfbc0b0a4c Fix for a panic due to writing to a closed accept channel.
This can happen because endpoint.Close() closes the accept channel first and
then drains/resets any accepted but not delivered connections. But there can be
connections that are connected but not delivered to the channel as the channel
was full. But closing the channel can cause these writes to fail with a write to
a closed channel.

The correct solution is to abort any connections in SYN-RCVD state and
drain/abort all completed connections before closing the accept channel.

PiperOrigin-RevId: 261951132
2019-08-06 11:01:27 -07:00
Michael Pratt 704f9610f3 Require pread/pwrite for splice file offsets
If there is an offset, the file must support pread/pwrite. See
fs/splice.c:do_splice.

PiperOrigin-RevId: 261944932
2019-08-06 10:35:28 -07:00
Kevin Krakauer b6a5b950d2 Job control: controlling TTYs and foreground process groups.
(Don't worry, this is mostly tests.)

Implemented the following ioctls:
- TIOCSCTTY - set controlling TTY
- TIOCNOTTY - remove controlling tty, maybe signal some other processes
- TIOCGPGRP - get foreground process group. Also enables tcgetpgrp().
- TIOCSPGRP - set foreground process group. Also enabled tcsetpgrp().

Next steps are to actually turn terminal-generated control characters (e.g. C^c)
into signals to the proper process groups, and to send SIGTTOU and SIGTTIN when
appropriate.

PiperOrigin-RevId: 261387276
2019-08-02 14:05:48 -07:00
Rahat Mahmood 2906dffcdb Automated rollback of changelist 261191548
PiperOrigin-RevId: 261373749
2019-08-02 12:52:40 -07:00
Rahat Mahmood 79511e8a50 Implement getsockopt(TCP_INFO).
Export some readily-available fields for TCP_INFO and stub out the rest.

PiperOrigin-RevId: 261191548
2019-08-01 13:58:48 -07:00
Ian Lewis 0a246fab80 Basic support for 'ip route'
Implements support for RTM_GETROUTE requests for netlink sockets.

Fixes #507

PiperOrigin-RevId: 261051045
2019-07-31 20:30:09 -07:00
Austin Kiekintveld 12c4eb294a Fix ICMPv4 EchoReply packet checksum
The checksum was not being reset before being re-calculated and sent out.
This caused the sent checksum to always be `0x0800`.

Fixes #605.

PiperOrigin-RevId: 260965059
2019-07-31 11:26:41 -07:00
Tamir Duberstein c6e6d92cb1 Test connecting UDP sockets to the ANY address
This doesn't currently pass on gVisor.

While I'm here, fix a bug where connecting to the v6-mapped v4 address doesn't
work in gVisor.

PiperOrigin-RevId: 260923961
2019-07-31 07:41:20 -07:00
Zach Koopmans f0507e1db1 Fix flaky stat.cc test.
This test flaked on my current CL. Linux makes no guarantee
that two inodes will consecutive (overflows happen).

https://github.com/avagin/linux-task-diag/blob/master/fs/inode.c#L880

PiperOrigin-RevId: 260608240
2019-07-29 16:47:58 -07:00
Kevin Krakauer 09be87bbee Add iptables types for syscalls tests.
Unfortunately, Linux's ip_tables.h header doesn't compile in C++ because it
implicitly converts from void* to struct xt_entry_target*. C allows this, but
C++ does not. So we have to re-implement many types ourselves.

Relevant code here:
https://github.com/torvalds/linux/blob/master/include/uapi/linux/netfilter_ipv4/ip_tables.h#L222

PiperOrigin-RevId: 260565570
2019-07-29 13:20:09 -07:00
Tamir Duberstein 12c256568b Deduplicate EndpointState.connected some
This fixes a bug introduced in cl/251934850 that caused
connect-accept-close-connect races to result in the second connect call
failiing when it should have succeeded.

PiperOrigin-RevId: 259584525
2019-07-23 12:10:18 -07:00
gVisor bot f544509c01 Merge pull request #450 from Pixep:feature/add-clock-boottime-as-monotonic
PiperOrigin-RevId: 258996346
2019-07-19 10:44:45 -07:00
Chris Kuiper 0e040ba6e8 Handle interfaceAddr and NIC options separately for IP_MULTICAST_IF
This tweaks the handling code for IP_MULTICAST_IF to ignore the InterfaceAddr
if a NICID is given.

PiperOrigin-RevId: 258982541
2019-07-19 09:29:04 -07:00
Andrei Vagin eefa817cfd net/tcp/setockopt: impelment setsockopt(fd, SOL_TCP, TCP_INQ)
PiperOrigin-RevId: 258859507
2019-07-18 15:41:04 -07:00
Kevin Krakauer 9f1189130e Add AF_UNIX, SOCK_RAW sockets, which exist for some reason.
tcpdump creates these.

PiperOrigin-RevId: 258611829
2019-07-17 11:49:16 -07:00
gVisor bot 682fd2d68f Merge pull request #533 from kevinGC:stub-dev-tty
PiperOrigin-RevId: 258607547
2019-07-17 11:28:30 -07:00
Michael Pratt ca829158e3 Properly invalidate cache in rename and remove
We were invalidating the wrong overlayEntry in rename and missing invalidation
in rename and remove if lower exists.

PiperOrigin-RevId: 258604685
2019-07-17 11:14:57 -07:00
Adrien Leravat 02d1bd67f0 Add CLOCK_BOOTTIME tests to timerfd.cc 2019-07-16 21:30:48 -07:00
gVisor bot 74dc663bbb Internal change.
PiperOrigin-RevId: 258424489
2019-07-16 13:03:37 -07:00
Kevin Krakauer 3d78baf06d Replace vector of arrays with array of arrays.
C++ does not like vectors of arrays (because arrays are not copy-constructable).

PiperOrigin-RevId: 258270980
2019-07-15 17:29:13 -07:00
Neel Natu ab44d145bb Fix initialization of badhandler_low_water_mark in SigaltstackTest.
It is now correctly initialized to the top of the signal stack.
Previously it was initialized to the address of 'stack.ss_sp' on
the main thread stack.

PiperOrigin-RevId: 258248363
2019-07-15 15:22:24 -07:00
Kevin Krakauer 9b4d3280e1 Add IPPROTO_RAW, which allows raw sockets to write IP headers.
iptables also relies on IPPROTO_RAW in a way. It opens such a socket to
manipulate the kernel's tables, but it doesn't actually use any of the
functionality. Blegh.

PiperOrigin-RevId: 257903078
2019-07-12 18:09:12 -07:00
Kevin Krakauer 6ebb925acd Add permission, char device, and uid checks.
Change-Id: I8307bfb390a56424aaa651285a218aad277c4aed
2019-07-12 15:16:01 -07:00
Bhasker Hariharan 6116473b2f Stub out support for TCP_MAXSEG.
Adds support to set/get the TCP_MAXSEG value but does not
really change the segment sizes emitted by netstack or
alter the MSS advertised by the endpoint. This is currently
being added only to unblock iperf3 on gVisor. Plumbing
this correctly requires a bit more work which will come
in separate CLs.

PiperOrigin-RevId: 257859112
2019-07-12 13:35:17 -07:00
gVisor bot eff2c264a4 Merge pull request #282 from zhangningdlut:chris_test_proc
PiperOrigin-RevId: 257855479
2019-07-12 13:11:01 -07:00
Kevin 44427d8e26 Add a stub for /dev/tty.
Actual implementation to follow, but this will satisfy applications that
want it to just exist.
2019-07-11 21:24:27 -07:00
Liu Hua 7581e84cb6 tss: block userspace access to all I/O ports.
A userspace process (CPL=3) can access an i/o port if the bit corresponding to
the port is set to 0 in the I/O permission bitmap.

Configure the I/O permission bitmap address beyond the last valid byte in the
TSS so access to all i/o ports is blocked.

Signed-off-by: Liu Hua <sdu.liu@huawei.com>
Change-Id: I3df76980c3735491db768f7210e71703f86bb989
PiperOrigin-RevId: 257336518
2019-07-09 22:21:56 -07:00
Nicolas Lacasse 6db3f8d54c Don't mask errors in createAt loop.
The error set in the loop in createAt was being masked
by other errors declared with ":=". This allowed an
ErrResolveViaReadlink error to escape, which can cause
a sentry panic.

Added test case which repros without the fix.

PiperOrigin-RevId: 257061767
2019-07-08 14:57:15 -07:00
Andrei Vagin 116cac053e netstack/udp: connect with the AF_UNSPEC address family means disconnect
PiperOrigin-RevId: 256433283
2019-07-03 14:19:02 -07:00
Neel Natu 1178a278ae Mark timers_test flaky because setrlimit(RLIMIT_CPU) is broken in some kernels.
https://bugzilla.redhat.com/show_bug.cgi?id=1568337

PiperOrigin-RevId: 256276198
2019-07-02 17:58:15 -07:00
Nicolas Lacasse 06537129a6 Check remaining traversal limit when creating a file through a symlink.
This fixes the case when an app tries to create a file that already exists, and
is a symlink to itself. A test was added.

PiperOrigin-RevId: 256044811
2019-07-01 15:25:22 -07:00
Nicolas Lacasse cf51e77d6d Fix suggestions from clang.
PiperOrigin-RevId: 255679603
2019-06-28 15:32:29 -07:00
Fabricio Voznika b2907595e5 Complete pipe support on overlayfs
Get/Set pipe size and ioctl support were missing from
overlayfs. It required moving the pipe.Sizer interface
to fs so that overlay could get access.

Fixes #318

PiperOrigin-RevId: 255511125
2019-06-27 17:22:53 -07:00
Michael Pratt 5b41ba5d0e Fix various spelling issues in the documentation
Addresses obvious typos, in the documentation only.

COPYBARA_INTEGRATE_REVIEW=https://github.com/google/gvisor/pull/443 from Pixep:fix/documentation-spelling 4d0688164eafaf0b3010e5f4824b35d1e7176d65
PiperOrigin-RevId: 255477779
2019-06-27 14:25:50 -07:00
gVisor bot 7188790f92 Merge pull request #461 from brb-g:128_procseekend
PiperOrigin-RevId: 255462850
2019-06-27 13:58:14 -07:00
Nicolas Lacasse 857e5c47e9 Follow symlinks when creating a file, and create the target.
If we have a symlink whose target does not exist, creating the symlink (either
via 'creat' or 'open' with O_CREAT flag) should create the target of the
symlink. Previously, gVisor would error with EEXIST in this case

PiperOrigin-RevId: 255232944
2019-06-26 11:49:20 -07:00
Adrien Leravat 3688e6e99d Add CLOCK_BOOTTIME as a CLOCK_MONOTONIC alias
Makes CLOCK_BOOTTIME available with
* clock_gettime
* timerfd_create
* clock_gettime vDSO

CLOCK_BOOTTIME is implemented as an alias to CLOCK_MONOTONIC.
CLOCK_MONOTONIC already keeps track of time across save
and restore. This is the closest possible behavior to Linux
CLOCK_BOOTIME, as there is no concept of suspend/resume.

Updates google/gvisor#218
2019-06-24 21:14:38 -07:00
Andrei Vagin e9ea7230f7 fs: synchronize concurrent writes into files with O_APPEND
For files with O_APPEND, a file write operation gets a file size and uses it as
offset to call an inode write operation. This means that all other operations
which can change a file size should be blocked while the write operation doesn't
complete.

PiperOrigin-RevId: 254873771
2019-06-24 17:45:02 -07:00
Rahat Mahmood 94a6bfab5d Implement /proc/net/tcp.
PiperOrigin-RevId: 254854346
2019-06-24 15:56:36 -07:00
Nicolas Lacasse 87df9aab24 Use correct statx syscall number for amd64.
The previous number was for the arm architecture.

Also change the statx tests to force them to run on gVisor, which would have
caught this issue.

PiperOrigin-RevId: 254846831
2019-06-24 15:19:36 -07:00
brb-g 6f0a7de44b Add regression test for #128 (fixed in ab6774ce)
Tests run at HEAD (35719d52):
```
$ bazel test $(bazel query 'filter(".*getdents.*", //test/syscalls:all)')
<snip>
//test/syscalls:getdents_test_native                                     PASSED in 0.3s
//test/syscalls:getdents_test_runsc_ptrace                               PASSED in 4.9s
//test/syscalls:getdents_test_runsc_ptrace_overlay                       PASSED in 4.7s
//test/syscalls:getdents_test_runsc_ptrace_shared                        PASSED in 5.2s
//test/syscalls:getdents_test_runsc_kvm                                  FAILED in 4.0s
```

Tests run at ab6774ce~1 (6f933a93):
```
$ bazel test $(bazel query 'filter(".*getdents.*", //test/syscalls:all)')
//test/syscalls:getdents_test_native                                     PASSED in 0.2s
//test/syscalls:getdents_test_runsc_kvm                                  FAILED in 4.2s
  /usr/local/google/home/brb/.cache/bazel/_bazel_brb/967240a6aae7d353a221d73f4375e038/execroot/__main__/bazel-out/k8-fastbuild/testlogs/test/syscalls/getdents_test_runsc_kvm/test.log
//test/syscalls:getdents_test_runsc_ptrace                               FAILED in 5.3s
  /usr/local/google/home/brb/.cache/bazel/_bazel_brb/967240a6aae7d353a221d73f4375e038/execroot/__main__/bazel-out/k8-fastbuild/testlogs/test/syscalls/getdents_test_runsc_ptrace/test.log
//test/syscalls:getdents_test_runsc_ptrace_overlay                       FAILED in 4.9s
  /usr/local/google/home/brb/.cache/bazel/_bazel_brb/967240a6aae7d353a221d73f4375e038/execroot/__main__/bazel-out/k8-fastbuild/testlogs/test/syscalls/getdents_test_runsc_ptrace_overlay/test.log
//test/syscalls:getdents_test_runsc_ptrace_shared                        FAILED in 5.2s
  /usr/local/google/home/brb/.cache/bazel/_bazel_brb/967240a6aae7d353a221d73f4375e038/execroot/__main__/bazel-out/k8-fastbuild/testlogs/test/syscalls/getdents_test_runsc_ptrace_shared/test.log
```

(I think all runsc_kvm tests are broken on my machine -- I'll rerun them
if you can point me at the documentation to set it up)
2019-06-24 14:37:14 -07:00
chris.zn f957fb23cf Return ENOENT when reading /proc/{pid}/task of an exited process
There will be a deadloop when we use getdents to read /proc/{pid}/task
of an exited process

Like this:

Process A is running
                         Process B: open /proc/{pid of A}/task
Process A exits
                         Process B: getdents /proc/{pid of A}/task

Then, process B will fall into deadloop, and return "." and ".."
in loops and never ends.

This patch returns ENOENT when use getdents to read /proc/{pid}/task
if the process is just exited.

Signed-off-by: chris.zn <chris.zn@antfin.com>
2019-06-24 15:49:53 +08:00
Nicolas Lacasse 35719d52c7 Implement statx.
We don't have the plumbing for btime yet, so that field is left off. The
returned mask indicates that btime is absent.

Fixes #343

PiperOrigin-RevId: 254575752
2019-06-22 13:29:26 -07:00
Ian Gudger dc36c34a76 Close FD on TcpSocketTest loop failure.
This helps prevent the blocking call from getting stuck and causing a test
timeout.

PiperOrigin-RevId: 254325926
2019-06-20 20:40:31 -07:00
Neel Natu 3c7448ab6f Deflake TestSIGALRMToMainThread.
Bump up the threshold on number of SIGALRMs received by worker
threads from 50 to 200. Even with the new threshold we still
expect that the majority of SIGALRMs are received by the
thread group leader.

PiperOrigin-RevId: 254289787
2019-06-20 15:58:18 -07:00
Neel Natu 0b2135072d Implement madvise(MADV_DONTFORK)
PiperOrigin-RevId: 254253777
2019-06-20 12:56:00 -07:00
Michael Pratt c2d87d5d7c Mark tcp_socket test flaky (for real)
The tag on the binary has no effect. It must be on the test.

PiperOrigin-RevId: 254103480
2019-06-19 17:18:11 -07:00
Nicolas Lacasse 9781128d5a Deflake mount_test.
Inode ids are only stable across Save/Restore if we have an open FD on the
inode. All tests that compare inode ids must therefor hold an FD open.

PiperOrigin-RevId: 254086603
2019-06-19 15:46:11 -07:00
Michael Pratt 773423a997 Abort loop on failure
As-is, on failure these will infinite loop, resulting in test timeout
instead of failure.

PiperOrigin-RevId: 254074989
2019-06-19 14:48:18 -07:00
Neel Natu 0d1dc50b70 Mark tcp_socket test flaky.
PiperOrigin-RevId: 253997465
2019-06-19 08:08:12 -07:00
Rahat Mahmood 546b2948cb Use return values from syscalls in eventfd tests.
PiperOrigin-RevId: 253890611
2019-06-18 16:21:56 -07:00
Brad Burlage 2e1379867a Replace usage of deprecated strtoul/strtoull
PiperOrigin-RevId: 253864770
2019-06-18 14:18:47 -07:00
Fabricio Voznika ec15fb1162 Fix PipeTest_Streaming timeout
Test was calling Size() inside read and write loops. Size()
makes 2 syscalls to return the pipe size, making the test
do a lot more work than it should.

PiperOrigin-RevId: 253824690
2019-06-18 11:03:33 -07:00
Ian Gudger 0a5ee6f7b2 Fix deadlock in fasync.
The deadlock can occur when both ends of a connected Unix socket which has
FIOASYNC enabled on at least one end are closed at the same time. One end
notifies that it is closing, calling (*waiter.Queue).Notify which takes
waiter.Queue.mu (as a read lock) and then calls (*FileAsync).Callback, which
takes FileAsync.mu. The other end tries to unregister for notifications by
calling (*FileAsync).Unregister, which takes FileAsync.mu and calls
(*waiter.Queue).EventUnregister which takes waiter.Queue.mu.

This is fixed by moving the calls to waiter.Waitable.EventRegister and
waiter.Waitable.EventUnregister outside of the protection of any mutex used
in (*FileAsync).Callback.

The new test is related, but does not cover this particular situation.

Also fix a data race on FileAsync.e.Callback. (*FileAsync).Callback checked
FileAsync.e.Callback under the protection of FileAsync.mu, but the waiter
calling (*FileAsync).Callback could not and did not. This is fixed by making
FileAsync.e.Callback immutable before passing it to the waiter for the first
time.

Fixes #346

PiperOrigin-RevId: 253138340
2019-06-13 17:26:22 -07:00
Rahat Mahmood 05ff1ffaad Implement getsockopt() SO_DOMAIN, SO_PROTOCOL and SO_TYPE.
SO_TYPE was already implemented for everything but netlink sockets.

PiperOrigin-RevId: 253138157
2019-06-13 17:24:51 -07:00
Bhasker Hariharan 9f77b36fa1 Set optlen correctly when calling getsockopt.
PiperOrigin-RevId: 253096085
2019-06-13 13:41:39 -07:00
Bhasker Hariharan 70578806e8 Add support for TCP_CONGESTION socket option.
This CL also cleans up the error returned for setting congestion
control which was incorrectly returning EINVAL instead of ENOENT.

PiperOrigin-RevId: 252889093
2019-06-12 13:35:50 -07:00
Adin Scannell df110ad4fe Eat sendfile partial error
For sendfile(2), we propagate a TCP error through the system call layer.
This should be eaten if there is a partial result. This change also adds
a test to ensure that there is no panic in this case, for both TCP sockets
and unix domain sockets.

PiperOrigin-RevId: 252746192
2019-06-11 19:24:35 -07:00
Bhasker Hariharan 3933dd5c04 Fixes to listen backlog handling.
Changes netstack to confirm to current linux behaviour where if the backlog is
full then we drop the SYN and do not send a SYN-ACK. Similarly we allow upto
backlog connections to be in SYN-RCVD state as long as the backlog is not full.

We also now drop a SYN if syn cookies are in use and the backlog for the
listening endpoint is full.

Added new tests to confirm the behaviour.

Also reverted the change to increase the backlog in TcpPortReuseMultiThread
syscall test.

Fixes #236

PiperOrigin-RevId: 252500462
2019-06-10 15:40:44 -07:00
Jamie Liu b3f104507d "Implement" mbind(2).
We still only advertise a single NUMA node, and ignore mempolicy
accordingly, but mbind() at least now succeeds and has effects reflected
by get_mempolicy().

Also fix handling of nodemasks: round sizes to unsigned long (as
documented and done by Linux), and zero trailing bits when copying them
out.

PiperOrigin-RevId: 251950859
2019-06-06 16:29:46 -07:00
Rahat Mahmood 2d2831e354 Track and export socket state.
This is necessary for implementing network diagnostic interfaces like
/proc/net/{tcp,udp,unix} and sock_diag(7).

For pass-through endpoints such as hostinet, we obtain the socket
state from the backend. For netstack, we add explicit tracking of TCP
states.

PiperOrigin-RevId: 251934850
2019-06-06 15:04:47 -07:00
Rahat Mahmood 8b8bd8d5b2 Try increase listen backlog.
PiperOrigin-RevId: 251928000
2019-06-06 14:32:04 -07:00
Googler 81eafb2c5e Internal change.
PiperOrigin-RevId: 251902567
2019-06-06 12:29:12 -07:00
Michael Pratt 57772db2e7 Shutdown host sockets on internal shutdown
This is required to make the shutdown visible to peers outside the
sandbox.

The readClosed / writeClosed fields were dropped, as they were
preventing a shutdown socket from reading the remainder of queued bytes.
The host syscalls will return the appropriate errors for shutdown.

The control message tests have been split out of socket_unix.cc to make
the (few) remaining tests accessible to testing inherited host UDS,
which don't support sending control messages.

Updates #273

PiperOrigin-RevId: 251763060
2019-06-05 18:40:37 -07:00
Chris Kuiper d18bb4f38a Adjust route when looping multicast packets
Multicast packets are special in that their destination address does not
identify a specific interface. When sending out such a packet the multicast
address is the remote address, but for incoming packets it is the local
address. Hence, when looping a multicast packet, the route needs to be
tweaked to reflect this.

PiperOrigin-RevId: 251739298
2019-06-05 16:08:29 -07:00
Ian Gudger c08fcaa364 Give test instantiations meaningful names.
PiperOrigin-RevId: 251737069
2019-06-05 15:57:27 -07:00
Michael Pratt d3ed9baac0 Implement dumpability tracking and checks
We don't actually support core dumps, but some applications want to
get/set dumpability, which still has an effect in procfs.

Lack of support for set-uid binaries or fs creds simplifies things a
bit.

As-is, processes started via CreateProcess (i.e., init and sentryctl
exec) have normal dumpability. I'm a bit torn on whether sentryctl exec
tasks should be dumpable, but at least since they have no parent normal
UID/GID checks should protect them.

PiperOrigin-RevId: 251712714
2019-06-05 14:00:13 -07:00
Andrei Vagin 90a116890f gvisor/sock/unix: pass creds when a message is sent between unconnected sockets
and don't report a sender address if it doesn't have one

PiperOrigin-RevId: 251371284
2019-06-03 21:48:19 -07:00
Michael Pratt 6e1f51f3eb Remove duplicate socket tests
socket_unix_abstract.cc: Subset of socket_abstract.cc
socket_unix_filesystem.cc: Subset of socket_filesystem.cc

PiperOrigin-RevId: 251297117
2019-06-03 13:31:47 -07:00
chris.zn b18df9bed6 Add VmData field to /proc/{pid}/status
VmData is the size of private data segments.
It has the same meaning as in Linux.

Change-Id: Iebf1ae85940a810524a6cde9c2e767d4233ddb2a
PiperOrigin-RevId: 250593739
2019-05-30 12:07:40 -07:00
Andrei Vagin 4b9cb38157 gvisor: socket() returns EPROTONOSUPPORT if protocol is not supported
PiperOrigin-RevId: 250426407
2019-05-30 12:06:15 -07:00
Michael Pratt 507a15dce9 Always wait on tracee children
After bf959931ddb88c4e4366e96dd22e68fa0db9527c ("wait/ptrace: assume
__WALL if the child is traced") (Linux 4.7), tracees are always eligible
for waiting, regardless of type.

PiperOrigin-RevId: 250399527
2019-05-30 12:05:46 -07:00
Tamir Duberstein 9119478830 Extract SleepSafe from test_util
Allows socket tests that rely on test_util to compile on Fuchsia.

PiperOrigin-RevId: 249884084
Change-Id: I17617e3f1baaba4c85c689f40db4a42a8de1597e
2019-05-24 12:58:46 -07:00
Michael Pratt f65dfec096 Add WCLONE / WALL support to waitid
The previous commit adds WNOTHREAD support to waitid, so we may as well
complete the upstream change.

Linux added WCLONE, WALL, WNOTHREAD support to waitid(2) in
91c4e8ea8f05916df0c8a6f383508ac7c9e10dba ("wait: allow sys_waitid() to
accept __WNOTHREAD/__WCLONE/__WALL"). i.e., Linux 4.7.

PiperOrigin-RevId: 249560587
Change-Id: Iff177b0848a3f7bae6cb5592e44500c5a942fbeb
2019-05-22 18:11:50 -07:00
Michael Pratt 711290a7f6 Add support for wait(WNOTHREAD)
PiperOrigin-RevId: 249537694
Change-Id: Iaa4bca73a2d8341e03064d59a2eb490afc3f80da
2019-05-22 15:54:23 -07:00
Kevin Krakauer c1cdf18e7b UDP and TCP raw socket support.
PiperOrigin-RevId: 249511348
Change-Id: I34539092cc85032d9473ff4dd308fc29dc9bfd6b
2019-05-22 13:45:15 -07:00
Adin Scannell ae1bb08871 Clean up pipe internals and add fcntl support
Pipe internals are made more efficient by avoiding garbage collection.
A pool is now used that can be shared by all pipes, and buffers are
chained via an intrusive list. The documentation for pipe structures
and methods is also simplified and clarified.

The pipe tests are now parameterized, so that they are run on all
different variants (named pipes, small buffers, default buffers).

The pipe buffer sizes are exposed by fcntl, which is now supported
by this change. A size change test has been added to the suite.

These new tests uncovered a bug regarding the semantics of open
named pipes with O_NONBLOCK, which is also fixed by this CL. This
fix also addresses the lack of the O_LARGEFILE flag for named pipes.

PiperOrigin-RevId: 249375888
Change-Id: I48e61e9c868aedb0cadda2dff33f09a560dee773
2019-05-21 20:12:27 -07:00
Michael Pratt c8857f7269 Fix inconsistencies in ELF anonymous mappings
* A segment with filesz == 0, memsz > 0 should be an anonymous only
  mapping. We were failing to load such an ELF.
* Anonymous pages are always mapped RW, regardless of the segment
  protections.

PiperOrigin-RevId: 249355239
Change-Id: I251e5c0ce8848cf8420c3aadf337b0d77b1ad991
2019-05-21 17:06:05 -07:00
Adin Scannell 9cdae51fec Add basic plumbing for splice and stub implementation.
This does not actually implement an efficient splice or sendfile. Rather, it
adds a generic plumbing to the file internals so that this can be added. All
file implementations use the stub fileutil.NoSplice implementation, which
causes sendfile and splice to fall back to an internal copy.

A basic splice system call interface is added, along with a test.

PiperOrigin-RevId: 249335960
Change-Id: Ic5568be2af0a505c19e7aec66d5af2480ab0939b
2019-05-21 15:18:12 -07:00
Michael Pratt 6588427451 Fix incorrect tmpfs timestamp updates
* Creation of files, directories (and other fs objects) in a directory
  should always update ctime.
* Same for removal.
* atime should not be updated on lookup, only readdir.

I've also renamed some misleading functions that update mtime and ctime.

PiperOrigin-RevId: 249115063
Change-Id: I30fa275fa7db96d01aa759ed64628c18bb3a7dc7
2019-05-20 13:35:17 -07:00
Michael Pratt 04105781ad Fix gofer rename ctime and cleanup stat_times test
There is a lot of redundancy that we can simplify in the stat_times
test. This will make it easier to add new tests. However, the
simplification reveals that cached uattrs on goferfs don't properly
update ctime on rename.

PiperOrigin-RevId: 248773425
Change-Id: I52662728e1e9920981555881f9a85f9ce04041cf
2019-05-17 13:05:47 -07:00
Ian Gudger 40419a16eb Add test for duplicate proc entries.
The issue with duplicate /proc/sys entries seems to have been fixed in:
PiperOrigin-RevId 229305982
Git hash dc8450b567

Fixes google/gvisor#125

PiperOrigin-RevId: 248571903
Change-Id: I76ff3b525c93dafb92da6e5cf56e440187f14579
2019-05-16 11:59:01 -07:00
Michael Pratt c61a2e709a Modernize mknod test
PiperOrigin-RevId: 247704588
Change-Id: I1e63e2b310145695fbe38429b91e44d72473fcd6
2019-05-10 17:37:43 -07:00
Fabricio Voznika 1bee43be13 Implement fallocate(2)
Closes #225

PiperOrigin-RevId: 247508791
Change-Id: I04f47cf2770b30043e5a272aba4ba6e11d0476cc
2019-05-09 15:35:49 -07:00
Googler c3b6d4587e Fix types that are subtly incorrect.
PiperOrigin-RevId: 247294093
Change-Id: Iac8c76e50bbc15c240ae7da7f5786f9968e7057c
2019-05-08 14:40:09 -07:00
Kevin Krakauer ff8ed5e6a5 Fix raw socket behavior and tests.
Some behavior was broken due to the difficulty of running automated raw
socket tests.

Change-Id: I152ca53916bb24a0208f2dc1c4f5bc87f4724ff6
PiperOrigin-RevId: 246747067
2019-05-05 16:07:25 -07:00
Andrei Vagin 9e1c253fe8 gvisor: run bazel in a docker container
bazel has a lot of dependencies and users don't want to install them
just to build gvisor.

These changes allows to run bazel in a docker container.
A bazel cache is on the local file system (~/.cache/bazel), so
incremental builds should be fast event after recreating a bazel
container.

Here is an example how to build runsc:
make BAZEL_OPTIONS="build runsc:runsc" bazel

Change-Id: I8c0a6d0c30e835892377fb6dd5f4af7a0052d12a
PiperOrigin-RevId: 246570877
2019-05-03 14:13:08 -07:00
Fabricio Voznika 6b9ab65163 Skip flaky ClockGettime.CputimeId take 2
The test also times out when GCE machine has 2 CPUs. I cannot
repro it locally with a 2 CPU cgroup though. Let's skip the
test when there are 2 CPUs to stop the flakiness and retest it
once the fix is available.

PiperOrigin-RevId: 246523363
Change-Id: I9d9d922a5be3aa7bc91dff5a1807ca99f3f4a4f9
2019-05-03 09:42:10 -07:00
Chris Kuiper 2d8e90b311 Proper cleanup of sockets that used REUSEPORT
Fixed a small logic error that broke proper accounting of MultiPortEndpoints.

PiperOrigin-RevId: 246502126
Change-Id: I1a7d6ea134f811612e545676212899a3707bc2c2
2019-05-03 07:02:51 -07:00
Chris Kuiper 8972e47a2e Support reception of multicast data on more than one socket
This requires two changes:
1) Support for more than one socket to join a given multicast group.

2) Duplicate delivery of incoming multicast packets to all sockets listening
for it.

In addition, I tweaked the code (and added a test) to disallow duplicates
IP_ADD_MEMBERSHIP calls for the same group and NIC. This is how Linux does
it.

PiperOrigin-RevId: 246437315
Change-Id: Icad8300b4a8c3f501d9b4cd283bd3beabef88b72
2019-05-02 19:41:00 -07:00
Kevin Krakauer bf40fa2129 Replace dynamic macros with constants in memfd test.
PiperOrigin-RevId: 246433167
Change-Id: Idb9b6c20ee1da193176288dfd2f9d85ec0e69c54
2019-05-02 18:57:58 -07:00
Ian Gudger 81ecd8b6ea Implement the MSG_CTRUNC msghdr flag for Unix sockets.
Updates google/gvisor#206

PiperOrigin-RevId: 245880573
Change-Id: Ifa715e98d47f64b8a32b04ae9378d6cd6bd4025e
2019-04-29 21:21:08 -07:00
Fabricio Voznika 2843f2a956 Skip flaky ClockGettime.CputimeId
Test times out when it runs on a single core. Skip until the
bug in the Go runtime is fixed.

PiperOrigin-RevId: 245866466
Change-Id: Ic3e72131c27136d58b71f6b11acc78abf55895d4
2019-04-29 18:41:54 -07:00
Michael Pratt 4d52a55201 Change copyright notice to "The gVisor Authors"
Based on the guidelines at
https://opensource.google.com/docs/releasing/authors/.

1. $ rg -l "Google LLC" | xargs sed -i 's/Google LLC.*/The gVisor Authors./'
2. Manual fixup of "Google Inc" references.
3. Add AUTHORS file. Authors may request to be added to this file.
4. Point netstack AUTHORS to gVisor AUTHORS. Drop CONTRIBUTORS.

Fixes #209

PiperOrigin-RevId: 245823212
Change-Id: I64530b24ad021a7d683137459cafc510f5ee1de9
2019-04-29 14:26:23 -07:00
Nicolas Lacasse f4ce43e1f4 Allow and document bug ids in gVisor codebase.
PiperOrigin-RevId: 245818639
Change-Id: I03703ef0fb9b6675955637b9fe2776204c545789
2019-04-29 14:04:14 -07:00
Tamir Duberstein ac8fca1ef4 Appease googletest deprecation
PiperOrigin-RevId: 245788366
Change-Id: I17bbecf8493132dbe95564c34c45b838194bfabb
2019-04-29 11:34:16 -07:00
Nicolas Lacasse 2df64cd6d2 createAt should return all errors from FindInode except ENOENT.
Previously, createAt was eating all errors from FindInode except for EACCES and
proceeding with the creation. This is incorrect, as FindInode can return many
other errors (like ENAMETOOLONG) that should stop creation.

This CL changes createAt to return all errors encountered except for ENOENT,
which we can ignore because we are about to create the thing.

PiperOrigin-RevId: 245773222
Change-Id: I1b317021de70f0550fb865506f6d8147d4aebc56
2019-04-29 10:30:24 -07:00
Tamir Duberstein 59442238d4 Remove syscall tests' dependency on glog
PiperOrigin-RevId: 245469859
Change-Id: I0610e477cc3a884275852e83028ecfb501f2c039
2019-04-26 12:47:46 -07:00
Kevin Krakauer 5f13338d30 Fix reference counting bug in /proc/PID/fdinfo/.
PiperOrigin-RevId: 245452217
Change-Id: I7164d8f57fe34c17e601079eb9410a6d95af1869
2019-04-26 11:09:55 -07:00
Kevin Krakauer f4d34b420b Change name of sticky test arg.
PiperOrigin-RevId: 245451875
Change-Id: Icee2c4ed74564e77454c60d60f456454443ccadf
2019-04-26 11:08:08 -07:00
Jamie Liu 6b76c172b4 Don't enforce NAME_MAX in fs.Dirent.walk().
Maximum filename length is filesystem-dependent, and obtained via
statfs::f_namelen. This limit is usually 255 bytes (NAME_MAX), but not
always. For example, VFAT supports filenames of up to 255... UCS-2
characters, which Linux conservatively takes to mean UTF-8-encoded
bytes: fs/fat/inode.c:fat_statfs(), FAT_LFN_LEN * NLS_MAX_CHARSET_SIZE.
As a result, Linux's VFS does not enforce NAME_MAX:

$ rg --maxdepth=1 '\WNAME_MAX\W' fs/ include/linux/
fs/libfs.c
38:     buf->f_namelen = NAME_MAX;
64:     if (dentry->d_name.len > NAME_MAX)

include/linux/relay.h
74:     char base_filename[NAME_MAX];   /* saved base filename */

include/linux/fscrypt.h
149: * filenames up to NAME_MAX bytes, since base64 encoding expands the length.

include/linux/exportfs.h
176: *    understanding that it is already pointing to a a %NAME_MAX+1 sized

Remove this check from core VFS, and add it to ramfs (and by extension
tmpfs), where it is actually applicable:
mm/shmem.c:shmem_dir_inode_operations.lookup == simple_lookup *does*
enforce NAME_MAX.

PiperOrigin-RevId: 245324748
Change-Id: I17567c4324bfd60e31746a5270096e75db963fac
2019-04-25 16:05:13 -07:00
Tamir Duberstein 7219781040 s,sys/poll.h/,poll.h,g
See https://git.musl-libc.org/cgit/musl/tree/include/sys/poll.h

PiperOrigin-RevId: 245312375
Change-Id: If749ae3f94ccedc82eb6b594b32155924a354b58
2019-04-25 14:57:06 -07:00
Ian Gudger 962567aafd Add Unix socket tests for the MSG_CTRUNC msghdr flag.
TCP tests and the implementation will come in followup CLs.

Updates google/gvisor#206
Updates google/gvisor#207

PiperOrigin-RevId: 245121470
Change-Id: Ib50b62724d3ba0cbfb1374e1f908798431ee2b21
2019-04-24 14:51:42 -07:00
Wei Zhang 17ff6063a3 Bugfix: fix fstatat symbol link to dir
For a symbol link to some directory, eg.

`/tmp/symlink -> /tmp/dir`

`fstatat("/tmp/symlink")` should return symbol link data, but
`fstatat("/tmp/symlink/")` (symlink with trailing slash) should return
directory data it points following linux behaviour.

Currently fstatat() a symlink with trailing slash will get "not a
directory" error which is wrong.

Signed-off-by: Wei Zhang <zhangwei198900@gmail.com>
Change-Id: I63469b1fb89d083d1c1255d32d52864606fbd7e2
PiperOrigin-RevId: 244783916
2019-04-22 20:07:06 -07:00
Ben Burkert 56927e5317 tcpip/transport/tcp: read side only shutdown of an endpoint
Support shutdown on only the read side of an endpoint. Reads performed
after a call to Shutdown with only the ShutdownRead flag will return
ErrClosedForReceive without data.

Break out the shutdown(2) with SHUT_RD syscall test into to two tests.
The first tests that no packets are sent when shutting down the read
side of a socket. The second tests that, after shutting down the read
side of a socket, unread data can still be read, or an EOF if there is
no more data to read.

Change-Id: I9d7c0a06937909cbb466b7591544a4bcaebb11ce
PiperOrigin-RevId: 244459430
2019-04-19 19:29:05 -07:00
Ian Gudger 358eb52a76 Add support for the MSG_TRUNC msghdr flag.
The MSG_TRUNC flag is set in the msghdr when a message is truncated.

Fixes google/gvisor#200

PiperOrigin-RevId: 244440486
Change-Id: I03c7d5e7f5935c0c6b8d69b012db1780ac5b8456
2019-04-19 16:17:01 -07:00
Nicolas Lacasse ce64d9ebf0 Keep symlink target open while in test that compares inode ids.
Inode ids are only guaranteed to be stable across save/restore if the file is
held open. This CL fixes a simple stat test to allow it to compare symlink and
target by inode id, as long as the link target is held open.

PiperOrigin-RevId: 244238343
Change-Id: I74c5115915b1cc032a4c16515a056a480f218f00
2019-04-18 12:39:35 -07:00
Michael Pratt b52cbd6028 Don't allow sigtimedwait to catch unblockable signals
The existing logic attempting to do this is incorrect. Unary ^ has
higher precedence than &^, so mask always has UnblockableSignals
cleared, allowing dequeueSignalLocked to dequeue unblockable signals
(which allows userspace to ignore them).

Switch the logic so that unblockable signals are always masked.

PiperOrigin-RevId: 244058487
Change-Id: Ib19630ac04068a1fbfb9dc4a8eab1ccbdb21edc3
2019-04-17 13:43:20 -07:00
Fabricio Voznika c8cee7108f Use FD limit and file size limit from host
FD limit and file size limit is read from the host, instead
of using hard-coded defaults, given that they effect the sandbox
process. Also limit the direct cache to use no more than half
if the available FDs.

PiperOrigin-RevId: 244050323
Change-Id: I787ad0fdf07c49d589e51aebfeae477324fe26e6
2019-04-17 12:57:40 -07:00
Michael Pratt cc48969bb7 Internal change
PiperOrigin-RevId: 242978508
Change-Id: I0ea59ac5ba1dd499e87c53f2e24709371048679b
2019-04-10 18:00:18 -07:00
Kevin Krakauer c8368e477b rlimits test: don't exceed nr_open.
Even superuser cannot raise RLIMIT_NOFILE above /proc/sys/fs/nr_open, so
start the test by lowering the limits before raising.

Change-Id: Ied6021c64178a6cb9098088a1a3384db523a226f
PiperOrigin-RevId: 242965249
2019-04-10 16:34:50 -07:00
Kevin Krakauer f7aff0aaa4 Allow threads with CAP_SYS_RESOURCE to raise hard rlimits.
PiperOrigin-RevId: 242919489
Change-Id: Ie3267b3bcd8a54b54bc16a6556369a19e843376f
2019-04-10 12:36:45 -07:00
Shiva Prasanth 7140b1fdca Fixed /proc/cpuinfo permissions
This also applies these permissions to other static proc files.

Change-Id: I4167e585fed49ad271aa4e1f1260babb3239a73d
PiperOrigin-RevId: 242898575
2019-04-10 10:49:43 -07:00
Michael Pratt 0e14e48b84 Match multi-word State
From a recent test failure:

"State:\tD (disk sleep)\n"

"disk sleep" does not match \w+. We need to allow spaces.

PiperOrigin-RevId: 242762469
Change-Id: Ic8d05a16669412a72c1e76b498373e5b22fe64c4
2019-04-09 16:26:11 -07:00
Michael Pratt 05979a7547 Internal change
PiperOrigin-RevId: 242573252
Change-Id: Ibb4c6bfae2c2e322bf1cec23181a0ab663d8530a
2019-04-08 17:35:51 -07:00
Jamie Liu 9471c01348 Export kernel.SignalInfoPriv.
Also add kernel.SignalInfoNoInfo, and use it in RLIMIT_FSIZE checks.

PiperOrigin-RevId: 242562428
Change-Id: I4887c0e1c8f5fddcabfe6d4281bf76d2f2eafe90
2019-04-08 16:32:11 -07:00
Michael Pratt 218a7b5449 Add TODO
PiperOrigin-RevId: 242531141
Change-Id: I2a3bd815bda09f392f511f47120d5d9e6e86a40d
2019-04-08 13:48:40 -07:00
Jamie Liu 124bafc81c Deflake PtraceTest.SeizeSetOptions.
PiperOrigin-RevId: 242226319
Change-Id: Iefc78656841315f6b7d48bd85db451486850264d
2019-04-05 17:54:31 -07:00
Andrei Vagin 88409e983c gvisor: Add support for the MS_NOEXEC mount option
https://github.com/google/gvisor/issues/145

PiperOrigin-RevId: 242044115
Change-Id: I8f140fe05e32ecd438b6be218e224e4b7fe05878
2019-04-04 17:43:53 -07:00
Kevin Krakauer 82529becae Fix index out of bounds in tty implementation.
The previous implementation revolved around runes instead of bytes, which caused
weird behavior when converting between the two. For example, peekRune would read
the byte 0xff from a buffer, convert it to a rune, then return it. As rune is an
alias of int32, 0xff was 0-padded to int32(255), which is the hex code point for
?. However, peekRune also returned the length of the byte (1). When calling
utf8.EncodeRune, we only allocated 1 byte, but tried the write the 2-byte
character ?.

tl;dr: I apparently didn't understand runes when I wrote this.

PiperOrigin-RevId: 241789081
Change-Id: I14c788af4d9754973137801500ef6af7ab8a8727
2019-04-03 13:00:34 -07:00
Jamie Liu c4caccd540 Set options on the correct Task in PTRACE_SEIZE.
$ docker run --rm --runtime=runsc -it --cap-add=SYS_PTRACE debian bash -c "apt-get update && apt-get install strace && strace ls"
...
Setting up strace (4.15-2) ...
execve("/bin/ls", ["ls"], [/* 6 vars */]) = 0
brk(NULL)                               = 0x5646d8c1e000
uname({sysname="Linux", nodename="114ef93d2db3", ...}) = 0
...

PiperOrigin-RevId: 241643321
Change-Id: Ie4bce27a7fb147eef07bbae5895c6ef3f529e177
2019-04-02 18:13:19 -07:00
Nicolas Lacasse 1776ab28f0 Add test that symlinking over a directory returns EEXIST.
Also remove comments in InodeOperations that required that implementation of
some Create* operations ensure that the name does not already exist, since
these checks are all centralized in the Dirent.

PiperOrigin-RevId: 241637335
Change-Id: Id098dc6063ff7c38347af29d1369075ad1e89a58
2019-04-02 17:28:36 -07:00
Kevin Krakauer 52a51a8e20 Add a raw socket transport endpoint and use it for raw ICMP sockets.
Having raw socket code together will make it easier to add support for other raw
network protocols. Currently, only ICMP uses the raw endpoint. However, adding
support for other protocols such as UDP shouldn't be much more difficult than
adding a few switch cases.

PiperOrigin-RevId: 241564875
Change-Id: I77e03adafe4ce0fd29ba2d5dfdc547d2ae8f25bf
2019-04-02 11:13:49 -07:00
Rahat Mahmood 7cff746ef2 Save/restore simple devices.
We weren't saving simple devices' last allocated inode numbers, which
caused inode number reuse across S/R.

PiperOrigin-RevId: 241414245
Change-Id: I964289978841ef0a57d2fa48daf8eab7633c1284
2019-04-01 15:39:16 -07:00
Jamie Liu 1a02ba3e6e Trim trailing newline when reading /proc/[pid]/{uid,gid}_map in test.
This reveals a bug in the tests that require CAP_SET{UID,GID}: After the
child process enters the new user namespace, it ceases to have the
relevant capability in the parent user namespace, so the privileged
write must be done by the parent process. Change tests accordingly.

PiperOrigin-RevId: 241412765
Change-Id: I587c1f24aa6f2180fb2e5e5c0162691ba5bac1bc
2019-04-01 15:31:37 -07:00
Jamie Liu 60efd53822 Fix MemfdTest_OtherProcessCanOpenFromProcfs.
- Make the body of InForkedProcess async-signal-safe.

- Pass the correct path to open().

PiperOrigin-RevId: 241348774
Change-Id: I753dfa36e4fb05521e659c173e3b7db0c7fc159b
2019-04-01 10:18:36 -07:00