Commit Graph

186 Commits

Author SHA1 Message Date
gVisor bot bfa0bb24dd Internal change.
PiperOrigin-RevId: 274700093
2019-10-14 17:46:52 -07:00
Kevin Krakauer 2302afb53d Reorder BUILD license and load functions in netstack.
PiperOrigin-RevId: 274672346
2019-10-14 15:21:59 -07:00
gVisor bot bf870c1a42 Internal change.
PiperOrigin-RevId: 273861936
2019-10-09 17:56:05 -07:00
Ian Gudger 7c1587e340 Implement IP_TTL.
Also change the default TTL to 64 to match Linux.

PiperOrigin-RevId: 273430341
2019-10-07 19:29:51 -07:00
Chris Kuiper 4874525161 Implement proper local broadcast behavior
The behavior for sending and receiving local broadcast (255.255.255.255)
traffic is as follows:

Outgoing
--------
* A broadcast packet sent on a socket that is bound to an interface goes out
  that interface
* A broadcast packet sent on an unbound socket follows the route table to
  select the outgoing interface
  + if an explicit route entry exists for 255.255.255.255/32, use that one
  + else use the default route
* Broadcast packets are looped back and delivered following the rules for
  incoming packets (see next). This is the same behavior as for multicast
  packets, except that it cannot be disabled via sockopt.

Incoming
--------
* Sockets wishing to receive broadcast packets must bind to either INADDR_ANY
  (0.0.0.0) or INADDR_BROADCAST (255.255.255.255). No other socket receives
  broadcast packets.
* Broadcast packets are multiplexed to all sockets matching it. This is the
  same behavior as for multicast packets.
* A socket can bind to 255.255.255.255:<port> and then receive its own
  broadcast packets sent to 255.255.255.255:<port>

In addition, this change implicitly fixes an issue with multicast reception. If
two sockets want to receive a given multicast stream and one is bound to ANY
while the other is bound to the multicast address, only one of them will
receive the traffic.

PiperOrigin-RevId: 272792377
2019-10-03 19:31:35 -07:00
Bhasker Hariharan 61f6fbd0ce Fix bugs in PickEphemeralPort for TCP.
Netstack always picks a random start point everytime PickEphemeralPort
is called. While this is required for UDP so that DNS requests go
out through a randomized set of ports it is not required for TCP. Infact
Linux explicitly hashes the (srcip, dstip, dstport) and a one time secret
initialized at start of the application to get a random offset. But to
ensure it doesn't start from the same point on every scan it uses a static
hint that is incremented by 2 in every call to pick ephemeral ports.

The reason for 2 is Linux seems to split the port ranges where active connects
seem to use even ones while odd ones are used by listening sockets.

This CL implements a similar strategy where we use a hash + hint to generate
the offset to start the search for a free Ephemeral port.

This ensures that we cycle through the available port space in order for
repeated connects to the same destination and significantly reduces the
chance of picking a recently released port.

PiperOrigin-RevId: 272058370
2019-09-30 13:55:22 -07:00
gVisor bot abbee5615f Implement SO_BINDTODEVICE sockopt
PiperOrigin-RevId: 271644926
2019-09-27 14:14:04 -07:00
Kevin Krakauer 59ccbb1044 Remove centralized registration of protocols.
Also removes the need for protocol names.

PiperOrigin-RevId: 271186030
2019-09-25 12:57:05 -07:00
Andrei Vagin 03ee55cc62 netstack: convert more socket options to {Set,Get}SockOptInt
PiperOrigin-RevId: 270763208
2019-09-23 14:39:14 -07:00
Adin Scannell 7c6ab6a219 Implement splice methods for pipes and sockets.
This also allows the tee(2) implementation to be enabled, since dup can now be
properly supported via WriteTo.

Note that this change necessitated some minor restructoring with the
fs.FileOperations splice methods. If the *fs.File is passed through directly,
then only public API methods are accessible, which will deadlock immediately
since the locking is already done by fs.Splice. Instead, we pass through an
abstract io.Reader or io.Writer, which elide locks and use the underlying
fs.FileOperations directly.

PiperOrigin-RevId: 268805207
2019-09-12 17:43:27 -07:00
Michael Pratt df5d377521 Remove go_test from go_stateify and go_marshal
They are no-ops, so the standard rule works fine.

PiperOrigin-RevId: 268776264
2019-09-12 15:10:17 -07:00
Ian Gudger 9dfcd8b09f Fix ephemeral port leak.
Fix a bug where udp.(*endpoint).Disconnect [accessible in gVisor via
epsocket.(*SocketOperations).Connect with AF_UNSPEC] would leak a port
reservation if the socket/endpoint had an ephemeral port assigned to it.

glibc's getaddrinfo uses connect with AF_UNSPEC, causing each call of
getaddrinfo to leak a port. Call getaddrinfo too many times and you run out of
ports (shows up as connect returning EAGAIN and getaddrinfo returning
EAI_NONAME "Name or service not known").

PiperOrigin-RevId: 268071160
2019-09-09 14:02:00 -07:00
Ian Gudger fe1f521077 Remove reundant global tcpip.LinkEndpointID.
PiperOrigin-RevId: 267709597
2019-09-06 18:01:14 -07:00
Bhasker Hariharan 3dc3cffb2d Fix RST generation bugs.
There are a few cases addressed by this change

- We no longer generate a RST in response to a RST packet.

- When we receive a RST we cleanup and release all reservations immediately as
  the connection is now aborted.

- An ACK received by a listening socket generates a RST when SYN cookies are not
  in-use. The only reason an ACK should land at the listening socket is if we
  are using SYN cookies otherwise the goroutine for the handshake in progress
  should have gotten the packet and it should never have arrived at the
  listening endpoint.

- Also fixes the error returned when a connection times out due to a
  Keepalive timer expiration from ECONNRESET to a ETIMEDOUT.

PiperOrigin-RevId: 267238427
2019-09-04 14:59:53 -07:00
Bhasker Hariharan 3789c34b22 Make UDP traceroute work.
Adds support to generate Port Unreachable messages for UDP
datagrams received on a port for which there is no valid
endpoint.

Fixes #703

PiperOrigin-RevId: 267034418
2019-09-03 16:01:17 -07:00
Rahat Mahmood 863e11ac4d Implement /proc/net/udp.
PiperOrigin-RevId: 266229756
2019-08-29 14:30:41 -07:00
Rahat Mahmood 1fdefd41c5 netstack/tcp: Add LastAck transition.
Add missing state transition to LastAck, which should happen when the
endpoint has already recieved a FIN from the remote side, and is
sending its own FIN.

PiperOrigin-RevId: 265568314
2019-08-26 16:39:13 -07:00
Chris Kuiper 8d9276ed56 Support binding to multicast and broadcast addresses
This fixes the issue of not being able to bind to either a multicast or
broadcast address as well as to send and receive data from it. The way to solve
this is to treat these addresses similar to the ANY address and register their
transport endpoint ID with the global stack's demuxer rather than the NIC's.
That way there is no need to require an endpoint with that multicast or
broadcast address. The stack's demuxer is in fact the only correct one to use,
because neither broadcast- nor multicast-bound sockets care which NIC a
packet was received on (for multicast a join is still needed to receive packets
on a NIC).

I also took the liberty of refactoring udp_test.go to consolidate a lot of
duplicate code and make it easier to create repetitive tests that test the same
feature for a variety of packet and socket types. For this purpose I created a
"flowType" that represents two things: 1) the type of packet being sent or
received and 2) the type of socket used for the test. E.g., a "multicastV4in6"
flow represents a V4-mapped multicast packet run through a V6-dual socket.

This allows writing significantly simpler tests. A nice example is testTTL().

PiperOrigin-RevId: 264766909
2019-08-21 22:54:25 -07:00
Tamir Duberstein 573e6e4bba Use tcpip.Subnet in tcpip.Route
This is the first step in replacing some of the redundant types with the
standard library equivalents.

PiperOrigin-RevId: 264706552
2019-08-21 15:31:18 -07:00
Andrei Vagin 3e4102b2ea netstack: disconnect an unix socket only if the address family is AF_UNSPEC
Linux allows to call connect for ANY and the zero port.

PiperOrigin-RevId: 263892534
2019-08-16 19:32:14 -07:00
Tamir Duberstein fe74bba2bd Don't dereference errors passed to panic()
These errors are always pointers; there's no sense in dereferencing them
in the panic call. Changed one false positive for clarity.

PiperOrigin-RevId: 263611579
2019-08-15 11:58:16 -07:00
Tamir Duberstein 816a9211e9 netstack: move resumption logic into *_state.go
13a98df rearranged some of this code in a way that broke compilation of
the netstack-only export at github.com/google/netstack because
*_state.go files are not included in that export.

This commit moves resumption logic back into *_state.go, fixing the
compilation breakage.

PiperOrigin-RevId: 263601629
2019-08-15 11:13:46 -07:00
Tamir Duberstein d81d94ac4c Replace uinptr with int64 when returning lengths
This is in accordance with newer parts of the standard library.

PiperOrigin-RevId: 263449916
2019-08-14 16:05:56 -07:00
Bhasker Hariharan 570fb1db6b Improve SendMsg performance.
SendMsg before this change would copy all the data over into a
new slice even if the underlying socket could only accept a
small amount of data. This is really inefficient with non-blocking
sockets and under high throughput where large writes could get
ErrWouldBlock or if there was say a timeout associated with the sendmsg()
syscall.

With this change we delay copying bytes in till they are needed and only
copy what can be potentially sent/held in the socket buffer. Reducing
the need to repeatedly copy data over.

Also a minor fix to change state FIN-WAIT-1 when shutdown(..., SHUT_WR) is called
instead of when we transmit the actual FIN. Otherwise the socket could remain in
CONNECTED state even though the user has called shutdown() on the socket.

Updates #627

PiperOrigin-RevId: 263430505
2019-08-14 14:34:27 -07:00
Bhasker Hariharan 5a38eb120a Add congestion control states to sender.
This change just introduces different congestion control states and
ensures the sender.state is updated to reflect the current state
of the connection.

It is not used for any decisions yet but this is required before
algorithms like Eiffel/PRR can be implemented.

Fixes #394

PiperOrigin-RevId: 262638292
2019-08-09 14:50:30 -07:00
Rahat Mahmood 13a98df49e netstack: Don't start endpoint goroutines too soon on restore.
Endpoint protocol goroutines were previously started as part of
loading the endpoint. This is potentially too soon, as resources used
by these goroutine may not have been loaded. Protocol goroutines may
perform meaningful work as soon as they're started (ex: incoming
connect) which can cause them to indirectly access resources that
haven't been loaded yet.

This CL defers resuming all protocol goroutines until the end of
restore.

PiperOrigin-RevId: 262409429
2019-08-08 12:33:11 -07:00
Bhasker Hariharan dfbc0b0a4c Fix for a panic due to writing to a closed accept channel.
This can happen because endpoint.Close() closes the accept channel first and
then drains/resets any accepted but not delivered connections. But there can be
connections that are connected but not delivered to the channel as the channel
was full. But closing the channel can cause these writes to fail with a write to
a closed channel.

The correct solution is to abort any connections in SYN-RCVD state and
drain/abort all completed connections before closing the accept channel.

PiperOrigin-RevId: 261951132
2019-08-06 11:01:27 -07:00
Kevin Krakauer 810cc07aab Plumbing for iptables sockopts.
PiperOrigin-RevId: 261413396
2019-08-02 16:26:48 -07:00
Rahat Mahmood 2906dffcdb Automated rollback of changelist 261191548
PiperOrigin-RevId: 261373749
2019-08-02 12:52:40 -07:00
Rahat Mahmood 79511e8a50 Implement getsockopt(TCP_INFO).
Export some readily-available fields for TCP_INFO and stub out the rest.

PiperOrigin-RevId: 261191548
2019-08-01 13:58:48 -07:00
Tamir Duberstein c6e6d92cb1 Test connecting UDP sockets to the ANY address
This doesn't currently pass on gVisor.

While I'm here, fix a bug where connecting to the v6-mapped v4 address doesn't
work in gVisor.

PiperOrigin-RevId: 260923961
2019-07-31 07:41:20 -07:00
Chris Kuiper 40e682759f Add support for a subnet prefix length on interface network addresses
This allows the user code to add a network address with a subnet prefix length.
The prefix length value is stored in the network endpoint and provided back to
the user in the ProtocolAddress type.

PiperOrigin-RevId: 259807693
2019-07-24 13:42:14 -07:00
Tamir Duberstein 12c256568b Deduplicate EndpointState.connected some
This fixes a bug introduced in cl/251934850 that caused
connect-accept-close-connect races to result in the second connect call
failiing when it should have succeeded.

PiperOrigin-RevId: 259584525
2019-07-23 12:10:18 -07:00
Chris Kuiper 0e040ba6e8 Handle interfaceAddr and NIC options separately for IP_MULTICAST_IF
This tweaks the handling code for IP_MULTICAST_IF to ignore the InterfaceAddr
if a NICID is given.

PiperOrigin-RevId: 258982541
2019-07-19 09:29:04 -07:00
Andrei Vagin eefa817cfd net/tcp/setockopt: impelment setsockopt(fd, SOL_TCP, TCP_INQ)
PiperOrigin-RevId: 258859507
2019-07-18 15:41:04 -07:00
gVisor bot 74dc663bbb Internal change.
PiperOrigin-RevId: 258424489
2019-07-16 13:03:37 -07:00
Kevin Krakauer 9b4d3280e1 Add IPPROTO_RAW, which allows raw sockets to write IP headers.
iptables also relies on IPPROTO_RAW in a way. It opens such a socket to
manipulate the kernel's tables, but it doesn't actually use any of the
functionality. Blegh.

PiperOrigin-RevId: 257903078
2019-07-12 18:09:12 -07:00
Bhasker Hariharan 6116473b2f Stub out support for TCP_MAXSEG.
Adds support to set/get the TCP_MAXSEG value but does not
really change the segment sizes emitted by netstack or
alter the MSS advertised by the endpoint. This is currently
being added only to unblock iperf3 on gVisor. Plumbing
this correctly requires a bit more work which will come
in separate CLs.

PiperOrigin-RevId: 257859112
2019-07-12 13:35:17 -07:00
Andrei Vagin 116cac053e netstack/udp: connect with the AF_UNSPEC address family means disconnect
PiperOrigin-RevId: 256433283
2019-07-03 14:19:02 -07:00
Michael Pratt 5b41ba5d0e Fix various spelling issues in the documentation
Addresses obvious typos, in the documentation only.

COPYBARA_INTEGRATE_REVIEW=https://github.com/google/gvisor/pull/443 from Pixep:fix/documentation-spelling 4d0688164eafaf0b3010e5f4824b35d1e7176d65
PiperOrigin-RevId: 255477779
2019-06-27 14:25:50 -07:00
Bhasker Hariharan c1761378a9 Fix the logic for sending zero window updates.
Today we have the logic split in two places between endpoint Read() and the
worker goroutine which actually sends a zero window. This change makes it so
that when a zero window ACK is sent we set a flag in the endpoint which can be
read by the endpoint to decide if it should notify the worker to send a
nonZeroWindow update.

The worker now does not do the check again but instead sends an ACK and flips
the flag right away.

Similarly today when SO_RECVBUF is set the SetSockOpt call has logic
to decide if a zero window update is required. Rather than do that we move
the logic to the worker goroutine and it can check the zeroWindow flag
and send an update if required.

PiperOrigin-RevId: 254505447
2019-06-21 18:31:31 -07:00
Bhasker Hariharan 3d71c627fa Add support for TCP receive buffer auto tuning.
The implementation is similar to linux where we track the number of bytes
consumed by the application to grow the receive buffer of a given TCP endpoint.

This ensures that the advertised window grows at a reasonable rate to accomodate
for the sender's rate and prevents large amounts of data being held in stack
buffers if the application is not actively reading or not reading fast enough.

The original paper that was used to implement the linux receive buffer auto-
tuning is available @ https://public.lanl.gov/radiant/pubs/drs/lacsi2001.pdf

NOTE: Linux does not implement DRS as defined in that paper, it's just a good
reference to understand the solution space.

Updates #230

PiperOrigin-RevId: 253168283
2019-06-13 22:28:01 -07:00
Adin Scannell add40fd6ad Update canonical repository.
This can be merged after:
https://github.com/google/gvisor-website/pull/77
  or
https://github.com/google/gvisor-website/pull/78

PiperOrigin-RevId: 253132620
2019-06-13 16:50:15 -07:00
Adin Scannell e352f46478 Minor BUILD file cleanup.
PiperOrigin-RevId: 252918338
2019-06-12 15:59:46 -07:00
Bhasker Hariharan 70578806e8 Add support for TCP_CONGESTION socket option.
This CL also cleans up the error returned for setting congestion
control which was incorrectly returning EINVAL instead of ENOENT.

PiperOrigin-RevId: 252889093
2019-06-12 13:35:50 -07:00
Bhasker Hariharan 3933dd5c04 Fixes to listen backlog handling.
Changes netstack to confirm to current linux behaviour where if the backlog is
full then we drop the SYN and do not send a SYN-ACK. Similarly we allow upto
backlog connections to be in SYN-RCVD state as long as the backlog is not full.

We also now drop a SYN if syn cookies are in use and the backlog for the
listening endpoint is full.

Added new tests to confirm the behaviour.

Also reverted the change to increase the backlog in TcpPortReuseMultiThread
syscall test.

Fixes #236

PiperOrigin-RevId: 252500462
2019-06-10 15:40:44 -07:00
Rahat Mahmood 2d2831e354 Track and export socket state.
This is necessary for implementing network diagnostic interfaces like
/proc/net/{tcp,udp,unix} and sock_diag(7).

For pass-through endpoints such as hostinet, we obtain the socket
state from the backend. For netstack, we add explicit tracking of TCP
states.

PiperOrigin-RevId: 251934850
2019-06-06 15:04:47 -07:00
Andrei Vagin a12848ffeb netstack/tcp: fix calculating a number of outstanding packets
In case of GSO, a segment can container more than one packet
and we need to use the pCount() helper to get a number of packets.

PiperOrigin-RevId: 251743020
2019-06-05 16:30:45 -07:00
Bhasker Hariharan e0fb921205 Fix data race in synRcvdState.
When checking the length of the acceptedChan we should hold the
endpoint mutex otherwise a syn received while the listening socket
is being closed can result in a data race where the cleanupLocked
routine sets acceptedChan to nil while a handshake goroutine
in progress could try and check it at the same time.

PiperOrigin-RevId: 251537697
2019-06-04 16:17:24 -07:00
Bhasker Hariharan bfe3220992 Delete debug log lines left by mistake.
Updates #236

PiperOrigin-RevId: 251337915
2019-06-03 17:00:18 -07:00