Commit Graph

158 Commits

Author SHA1 Message Date
Tamir Duberstein 3830786883 Map IPv{4,6} addresses to ethernet addresses
...in accordance with RFCs 1112 and 2464.

Fixes IPv4 multicast when IP_MULTICAST_IF is specified.

Don't return ErrNoRoute when no route is needed.
Don't set Route.NextHop when no route is needed.

PiperOrigin-RevId: 236199813
Change-Id: I48ed33e1b7f760deaa37e18ad7f1b8b62819ab43
2019-02-28 14:38:32 -08:00
Kevin Krakauer 121db29a93 Ping support via IPv4 raw sockets.
Broadly, this change:
* Enables sockets to be created via `socket(AF_INET, SOCK_RAW, IPPROTO_ICMP)`.
* Passes the network-layer (IP) header up the stack to the transport endpoint,
  which can pass it up to the socket layer. This allows a raw socket to return
  the entire IP packet to users.
* Adds functions to stack.TransportProtocol, stack.Stack, stack.transportDemuxer
  that enable incoming packets to be delivered to raw endpoints. New raw sockets
  of other protocols (not ICMP) just need to register with the stack.
* Enables ping.endpoint to return IP headers when created via SOCK_RAW.

PiperOrigin-RevId: 235993280
Change-Id: I60ed994f5ff18b2cbd79f063a7fdf15d093d845a
2019-02-27 14:31:21 -08:00
Googler 12d9cf6fab Adds a WriteRawPacket method to the InjectableLinkEndpoint interface.
Also exposes ipv4.MaxTotalSize since it is a generally useful constant.

PiperOrigin-RevId: 235799755
Change-Id: I1fa8d5294bf355acf5527cfdf274b3687d3c8b13
2019-02-26 14:58:37 -08:00
Amanda Tait 33d0e824c7 Use more conservative locking in NIC.DeliverNetworkPacket
An earlier CL excessively minimized the period in which it
held a lock on NIC. This earlier CL had done this out of
the mistaken impression it fixed a broken test, when in
fact it just reduced the rate of failure of a flaky test
in tcp_test.go. This new change holds the lock on NIC
for the duration of the loop over n.endpoints.

PiperOrigin-RevId: 235732487
Change-Id: I53ee6df264f093ddc4d29e9acdcba6b4838cb112
2019-02-26 09:10:37 -08:00
Bhasker Hariharan 26be25e4ec Add a SACK scoreboard to TCP endpoints.
This change does not make use of SACK information but adds support to track
SACK information and store it in the endpoint.

The actual SACK based recovery will be in a separate CL.

Part of commits to add RFC 6675 support to Netstack.

PiperOrigin-RevId: 235612264
Change-Id: I261f94844d7bad5abda803152ce6cc6125a467ff
2019-02-25 15:20:04 -08:00
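
As a rough idea of what tracking SACK information involves (commit 26be25e4ec above), the sketch below keeps a sorted list of SACKed ranges and merges overlapping or adjacent blocks on insert. It is a simplified illustration under stated assumptions, not netstack's scoreboard: the type and method names are hypothetical, and sequence-number wraparound is ignored.

```go
package main

import "fmt"

// sackBlock is a half-open range [start, end) of SACKed sequence numbers.
// Hypothetical type; wraparound of 32-bit sequence numbers is NOT handled.
type sackBlock struct{ start, end uint32 }

// scoreboard holds non-overlapping blocks sorted by start.
type scoreboard struct{ blocks []sackBlock }

// insert adds b, merging it with any blocks it overlaps or touches.
func (s *scoreboard) insert(b sackBlock) {
	merged := make([]sackBlock, 0, len(s.blocks)+1)
	placed := false
	for _, cur := range s.blocks {
		switch {
		case cur.end < b.start:
			// cur lies entirely before b.
			merged = append(merged, cur)
		case b.end < cur.start:
			// cur lies entirely after b; emit b once, then cur.
			if !placed {
				merged = append(merged, b)
				placed = true
			}
			merged = append(merged, cur)
		default:
			// cur overlaps or touches b; absorb it into b.
			if cur.start < b.start {
				b.start = cur.start
			}
			if cur.end > b.end {
				b.end = cur.end
			}
		}
	}
	if !placed {
		merged = append(merged, b)
	}
	s.blocks = merged
}

// isSACKed reports whether seq falls inside a recorded block.
func (s *scoreboard) isSACKed(seq uint32) bool {
	for _, b := range s.blocks {
		if seq >= b.start && seq < b.end {
			return true
		}
	}
	return false
}

func main() {
	var s scoreboard
	s.insert(sackBlock{100, 200})
	s.insert(sackBlock{300, 400})
	s.insert(sackBlock{200, 300}) // bridges the two blocks above
	fmt.Println(s.blocks, s.isSACKed(250), s.isSACKed(400))
}
```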
Amanda Tait c14a1a1618 Fix race condition in NIC.DeliverNetworkPacket
cl/234850781 introduced a race condition in NIC.DeliverNetworkPacket
by failing to hold a lock. This change fixes this regression by acquiring
a read lock before iterating through n.endpoints, and then releasing the lock
once iteration is complete.

PiperOrigin-RevId: 235549770
Change-Id: Ib0133288be512d478cf759c3314dc95ec3205d4b
2019-02-25 10:02:29 -08:00
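
The locking pattern described in commits c14a1a1618 and 33d0e824c7 above (take a read lock, iterate over the endpoints, release once iteration is complete) might look roughly like this. The `nic` type, its fields, and its method names here are hypothetical stand-ins, not the real NIC implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// nic is a hypothetical stand-in for the real NIC type; only the locking
// pattern is of interest.
type nic struct {
	mu        sync.RWMutex
	endpoints map[string]func(pkt []byte)
}

// addEndpoint takes the write lock because it mutates the map.
func (n *nic) addEndpoint(addr string, deliver func(pkt []byte)) {
	n.mu.Lock()
	defer n.mu.Unlock()
	n.endpoints[addr] = deliver
}

// deliverToAll holds the read lock for the whole loop, so the map cannot be
// modified by a concurrent writer while it is being iterated.
func (n *nic) deliverToAll(pkt []byte) int {
	n.mu.RLock()
	defer n.mu.RUnlock()
	delivered := 0
	for _, deliver := range n.endpoints {
		deliver(pkt)
		delivered++
	}
	return delivered
}

func main() {
	n := &nic{endpoints: make(map[string]func([]byte))}
	n.addEndpoint("10.0.0.1", func(p []byte) { fmt.Println("got", len(p), "bytes") })
	fmt.Println("delivered to", n.deliverToAll([]byte("ping")), "endpoint(s)")
}
```

Holding the lock across the callbacks is the conservative choice: it trades some concurrency for the guarantee that the endpoint set is stable during delivery. A callback that tries to take the write lock on the same `nic` would deadlock under this scheme.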
Kevin Krakauer b75aa51504 Rename ping endpoints to icmp endpoints.
PiperOrigin-RevId: 235248572
Change-Id: I5b0538b6feb365a98712c2a2d56d856fe80a8a09
2019-02-22 13:34:47 -08:00
Amanda Tait ea070b9d5f Implement Broadcast support
This change adds support for the SO_BROADCAST socket option in gVisor Netstack.
This support includes getsockopt()/setsockopt() functionality for both UDP and
TCP endpoints (the latter being a NOOP), dispatching broadcast messages up and
down the stack, and route finding/creation for broadcast packets. Finally, a
suite of tests has been implemented, exercising this functionality through the
Linux syscall API.

PiperOrigin-RevId: 234850781
Change-Id: If3e666666917d39f55083741c78314a06defb26c
2019-02-20 12:54:13 -08:00
Bhasker Hariharan 3e3a1ef9d6 Updates tcp_proxy to use an AF_PACKET and veth devices.
tcp_proxy now uses an AF_PACKET socket as the FD for netstack link layer
endpoint instead of a tap device. It also changes the link layer endpoint to use
PacketMMap dispatch instead of Readv. This reduces overall cpu and reflects the
current runsc setup which uses PacketMMap and also uses veth devices to receive
packets.

Also fixed a bug in gonet where Read() was not coalescing reads and would
read small amounts at a time.

PiperOrigin-RevId: 234714768
Change-Id: Idabf8e600e4512489d3ba441c4096dc74deba5d7
2019-02-19 18:23:54 -08:00
Ian Gudger c611dbc5a7 Implement IP_MULTICAST_IF.
This allows setting a default send interface for IPv4 multicast. IPv6 support
will come later.

PiperOrigin-RevId: 234251379
Change-Id: I65922341cd8b8880f690fae3eeb7ddfa47c8c173
2019-02-15 18:40:15 -08:00
Kevin Krakauer a9cb3dcd9d Move SO_TIMESTAMP from different transport endpoints to epsocket.
SO_TIMESTAMP is reimplemented in ping and UDP sockets (and needs to be added for
TCP), but can just be implemented in epsocket for simplicity. This will also
make SIOCGSTAMP easier to implement.

PiperOrigin-RevId: 234179300
Change-Id: Ib5ea0b1261dc218c1a8b15a65775de0050fe3230
2019-02-15 11:18:44 -08:00
Googler d60ce17a21 Internal change.
PiperOrigin-RevId: 234011346
Change-Id: Ic69375ddb3794dd0d3d6e62ee4dc60fdf4baf2c7
2019-02-14 12:54:27 -08:00
Bhasker Hariharan e0b3d3323f Add support for using PACKET_RX_RING to receive packets.
PACKET_RX_RING allows the use of an mmapped buffer to receive packets from the
kernel. This should cut down the number of host syscalls that need to be made
to receive packets when the underlying fd is a socket of the AF_PACKET type.

PiperOrigin-RevId: 233834998
Change-Id: I8060025c6ced206986e94cc46b8f382b81bfa47f
2019-02-13 14:53:03 -08:00
Bhasker Hariharan efe5e737d7 Do not drop packets w/ missing TCP timestamps.
RFC7323 recommends that if the timestamp option was negotiated
then all packets should carry a TCP Timestamp and any packets that
do not should be dropped.

Netstack implemented this behaviour. Linux OTOH does not and will
accept such packets. This change makes Netstack behaviour compatible
with Linux.

Also now that we allow such packets, we do need to update RTO calculations
based on these packets even if timestamp option is enabled.

PiperOrigin-RevId: 233432268
Change-Id: I9f4742ae6b63930ac3b5e37d8c238761e6a4b29f
2019-02-11 10:23:43 -08:00
Ian Gudger 967326131a Fix build error.
PiperOrigin-RevId: 233139020
Change-Id: I2e7089fa25d20e5662eb941054a684d41f5d3e12
2019-02-08 15:37:20 -08:00
Ian Gudger 80f901b16b Plumb IP_ADD_MEMBERSHIP and IP_DROP_MEMBERSHIP to netstack.
Also includes a few fixes for IPv4 multicast support. IPv6 support is coming in
a followup CL.

PiperOrigin-RevId: 233008638
Change-Id: If7dae6222fef43fda48033f0292af77832d95e82
2019-02-07 23:15:23 -08:00
Googler e0afa87899 Internal change.
PiperOrigin-RevId: 232937200
Change-Id: I5c3709cc8f1313313ff618a45e48c14a3a111cb4
2019-02-07 13:46:26 -08:00
Michael Pratt 2a0c69b19f Remove license comments
Nothing reads them and they can simply get stale.

Generated with:
$ sed -i "s/licenses(\(.*\)).*/licenses(\1)/" **/BUILD

PiperOrigin-RevId: 231818945
Change-Id: Ibc3f9838546b7e94f13f217060d31f4ada9d4bf0
2019-01-31 11:12:53 -08:00
Bhasker Hariharan f03c7e48e7 Fix IsLost check to match the description in RFC6675.
Quoting what "rscheff@gmx.at" pointed out over email:
"IsLost in RFC3517 is defined as  >=  (DupThresh * SMSS) while
RFC6675 improves upon this, and defines IsLost as  >
((DupThresh - 1) * SMSS + 1).

The latter addresses situations where partial segments (size < MSS)
are sent (eg. last segment of a http protocol message sent with PSH
being less than MSS is common)."

PiperOrigin-RevId: 231512331
Change-Id: I1addd4a92e3e7baeb0bdda46463ebfae435da958
2019-01-29 18:13:48 -08:00
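
The difference between the two thresholds quoted in commit f03c7e48e7 above is easiest to see with numbers. The sketch below encodes both comparisons exactly as they appear in the quoted email; the function names are hypothetical, and DupThresh = 3 is assumed.

```go
package main

import "fmt"

// dupThresh is assumed to be 3 for this illustration.
const dupThresh = 3

// isLostRFC3517 applies the older threshold as quoted in the commit message:
// SACKed bytes above the segment >= DupThresh * SMSS.
func isLostRFC3517(sackedBytes, smss int) bool {
	return sackedBytes >= dupThresh*smss
}

// isLostRFC6675 applies the newer threshold as quoted in the commit message:
// SACKed bytes above the segment > (DupThresh - 1) * SMSS + 1.
func isLostRFC6675(sackedBytes, smss int) bool {
	return sackedBytes > (dupThresh-1)*smss+1
}

func main() {
	// Two full 1000-byte segments plus a 500-byte partial segment were SACKed.
	const smss, sacked = 1000, 2500
	fmt.Println("RFC 3517:", isLostRFC3517(sacked, smss))
	fmt.Println("RFC 6675:", isLostRFC6675(sacked, smss))
}
```

With a short final segment, the older check never reaches three full segments' worth of SACKed bytes, while the newer one does declare the loss.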
Ian Gudger ff1c3bb0b5 Fix NIC endpoint forwarding.
Also adds a test for regular NIC forwarding.

PiperOrigin-RevId: 231495279
Change-Id: Ic7edec249568e9ad0280cea77eac14478c9073e1
2019-01-29 16:23:30 -08:00
Bhasker Hariharan 24cb2c0a72 Use recvmmsg() instead of readv() to read packets from NIC.
This should reduce the number of syscalls required to process packets
significantly and improve throughputs.

PiperOrigin-RevId: 231366886
Change-Id: I8b38077262bf9c53176bc4a94b530188d3d7c0ca
2019-01-29 01:39:01 -08:00
Kevin Krakauer 9a01287d23 test: Tag tcp_test as flaky.
PiperOrigin-RevId: 229427852
Change-Id: I9de8ed63f4a7672dacd3b282c863c599d00acd52
2019-01-15 13:21:00 -08:00
Zhaozhong Ni 7182b9cf52 netstack: release port inline for listening sockets only.
PiperOrigin-RevId: 229243918
Change-Id: Ie14ef34e66ae851ed080f57b7d26a369a66f7664
2019-01-14 13:33:47 -08:00
Googler 1e1dae50ca Internal change.
PiperOrigin-RevId: 228979583
Change-Id: I69bd82def48ceb19bc8558c890622b8528d98764
2019-01-11 18:52:36 -08:00
Bert Muthalaly 3f45878b73 Implement Stringer for tcpip.StatCounter
This enables formatting tcpip.Stats readably with %+v.

PiperOrigin-RevId: 228379088
Change-Id: I6a9876454a22f151ee752cf94589b4188729458f
2019-01-08 12:35:35 -08:00
Andrei Vagin 652d068119 Implement SO_REUSEPORT for TCP and UDP sockets
This option allows multiple sockets to be bound to the same port.

Incoming packets are distributed to sockets using a hash based on source and
destination addresses. This means that all packets from one sender will be
received by the same server socket.

PiperOrigin-RevId: 227153413
Change-Id: I59b6edda9c2209d5b8968671e9129adb675920cf
2018-12-28 11:27:14 -08:00
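
The distribution scheme described in commit 652d068119 above can be sketched as hashing the address pair and reducing the result modulo the number of bound sockets. The hash function (FNV-1a) and the helper name `pickSocket` are assumptions for illustration; the commit message does not say which hash netstack uses.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// pickSocket is a hypothetical helper that selects one of numSockets sockets
// sharing a port. FNV-1a is an assumption made for this sketch.
// numSockets must be greater than zero.
func pickSocket(srcAddr, dstAddr string, numSockets int) int {
	h := fnv.New32a()
	h.Write([]byte(srcAddr))
	h.Write([]byte{0}) // separator so ("ab","c") and ("a","bc") hash differently
	h.Write([]byte(dstAddr))
	return int(h.Sum32() % uint32(numSockets))
}

func main() {
	const sockets = 4
	for _, src := range []string{"10.0.0.1", "10.0.0.2", "10.0.0.1"} {
		fmt.Println(src, "->", pickSocket(src, "192.168.1.1", sockets))
	}
}
```

Because the choice depends only on the address pair, repeated packets from the same sender always map to the same socket, which is the property the commit message calls out.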
Ian Gudger 0df0df35fc Stub out SO_OOBINLINE.
We don't explicitly support out-of-band data and treat it like normal in-band
data. This is equivalent to SO_OOBINLINE being enabled, so always report that
it is enabled.

PiperOrigin-RevId: 226572742
Change-Id: I4c30ccb83265e76c30dea631cbf86822e6ee1c1b
2018-12-21 19:46:55 -08:00
Michael Pratt 71f0d5108b Internal Change
PiperOrigin-RevId: 226542979
Change-Id: Ife11ebd0a85b8a63078e6daa71b4a99a82080ac9
2018-12-21 14:29:35 -08:00
Ian Gudger b515556519 Implement SO_KEEPALIVE, TCP_KEEPIDLE, and TCP_KEEPINTVL.
Within gVisor, plumb new socket options to netstack.

Within netstack, fix GetSockOpt and SetSockOpt return value logic.

PiperOrigin-RevId: 226532229
Change-Id: If40734e119eed633335f40b4c26facbebc791c74
2018-12-21 13:13:45 -08:00
Chris Kuiper e491ebbacf Allow sending of multicast and IPv6 link-local packets w/o route.
Same as with broadcast packets, sending of a multicast packet shouldn't require
accessing the route table. The same applies to IPv6 link-local addresses, which
aren't routable at all (they don't belong to any subnet by definition).

PiperOrigin-RevId: 225775870
Change-Id: Ic53e6560c125a83be2be9c3d112e66b36e8dfe7b
2018-12-16 23:05:59 -08:00
Ian Gudger 6253d32cc9 transport/tcp: remove unused error return values
PiperOrigin-RevId: 225421480
Change-Id: I1e9259b0b7e8490164e830b73338a615129c7f0e
2018-12-13 13:02:49 -08:00
Ian Gudger 25b8424d75 Stub out TCP_QUICKACK
PiperOrigin-RevId: 224696233
Change-Id: I45c425d9e32adee5dcce29ca7439a06567b26014
2018-12-09 00:50:33 -08:00
Chris Kuiper 1b3442cae0 Allow sending of broadcast packets w/o route.
Currently sending a broadcast packet (for DHCP, e.g.) requires a "default
route" of the format "0.0.0.0/0 via 0.0.0.0 <intf>". There is no good reason
for this and on devices with several ports this creates a rather awkward route
table with lots of such default routes (which defeats the purpose of a default
route).

PiperOrigin-RevId: 224378769
Change-Id: Icd7ec8a206eb08083cff9a837f6f9ab231c73a19
2018-12-06 11:48:12 -08:00
Ian Gudger 000fa84a3b Fix tcpip.Endpoint.Write contract regarding short writes
* Clarify tcpip.Endpoint.Write contract regarding short writes.
* Enforce tcpip.Endpoint.Write contract regarding short writes.
* Update relevant users of tcpip.Endpoint.Write.

PiperOrigin-RevId: 224377586
Change-Id: I24299ecce902eb11317ee13dae3b8d8a7c5b097d
2018-12-06 11:41:33 -08:00
Zhaozhong Ni 7f35daddd2 sentry: support save / restore of TCP bind socket after shutdown.
PiperOrigin-RevId: 224227677
Change-Id: I08b0e0c0574170556269900653e5bcf9e9e5c9c9
2018-12-05 15:02:40 -08:00
Zhaozhong Ni fda4557e3d sentry: skip waiting for undrain for netstack TCP endpoints in error state.
PiperOrigin-RevId: 224214981
Change-Id: I4c1dd5b1c856f7a4f9866a5dda44a5297e92486a
2018-12-05 13:51:16 -08:00
Chris Kuiper fab029c50b Remove incorrect code and improve testing of Stack.GetMainNICAddress
This removes code that should have never made it in in the first place, but did so due to incomplete testing. With the new tests the original code fails, the new code passes.

PiperOrigin-RevId: 224086966
Change-Id: I646fef76977f4528f3705f497b95fad6b3ec32bc
2018-12-04 19:09:11 -08:00
Ian Gudger d209f71b9f Whitelist Go 1.12 for tcpip/time_unsafe.go
The signature of time.now has remained unchanged:
c2412a7681/src/time/time.go (L1072)

PiperOrigin-RevId: 224061160
Change-Id: Ic84bd6ee8fb9952cd9ab580bcb0892444ce7c2da
2018-12-04 15:52:14 -08:00
Ian Gudger 8cbd6153a6 Fix available calculation when merging TCP segments
PiperOrigin-RevId: 224033418
Change-Id: I780be973e8be68ac93e8c9e7a100002e912f40d2
2018-12-04 13:15:25 -08:00
Zhaozhong Ni ad8f293e1a sentry: save copy of tcp segment's delivered views to avoid in-struct pointers.
PiperOrigin-RevId: 224033238
Change-Id: Ie5b1854b29340843b02c123766d290a8738d7631
2018-12-04 13:14:24 -08:00
Ian Gudger 99fb113869 Test that full segments will be sent when delay/cork is enabled
PiperOrigin-RevId: 223425575
Change-Id: Idd777e04c69e6ffcbfb0bdbea828a8b8b42d7672
2018-11-29 15:46:38 -08:00
Ian Gudger 1918563525 Make ToView non-allocating for single VectorizedViews containing a single View
PiperOrigin-RevId: 222483471
Change-Id: I6720690b20167dd541fdfa5218eba7c9f7483347
2018-11-21 18:11:13 -08:00
Ian Gudger 9d8e49d950 Process delayed packets when delay is disabled
Moving the wakeup logic into the disable blocks is an optimization.

PiperOrigin-RevId: 221677028
Change-Id: Ib5a5a6d52cc77b4bbc5dedcad9ee1dbb3da98deb
2018-11-15 13:17:06 -08:00
Bert Muthalaly bc41e4761b Rename incorrectly named (dst, src) arguments in DeliverNetworkPacket prototype
...to (remote, local), reflecting the (correct) names in the implementation of
DeliverNetworkPacket (see tcpip/stack/nic.go).

Also trim the names in DeliverNetworkPacket and elsewhere to avoid stuttering;
since the type is tcpip.LinkAddress, there's no need to include "LinkAddr" in
the parameter names.

Note that every callsite passes arguments in the order (src, dst).

PiperOrigin-RevId: 221514396
Change-Id: I3637454ad0d6e62a19e4dcbc2a16493798bd0f09
2018-11-14 14:46:24 -08:00
Ian Gudger b5e91eaa52 Clean up tcp.sendData
PiperOrigin-RevId: 221484739
Change-Id: I44c71f79f99d0d00a2e70a7f06d7024a62a5de0a
2018-11-14 11:58:41 -08:00
Ian Gudger 7f60294a73 Implement TCP_NODELAY and TCP_CORK
Previously, TCP_NODELAY was always enabled and we would lie about it being
configurable. TCP_NODELAY is now disabled by default (to match Linux) in the
socket layer so that non-gVisor users don't automatically start using this
questionable optimization.

PiperOrigin-RevId: 221368472
Change-Id: Ib0240f66d94455081f4e0ca94f09d9338b2c1356
2018-11-13 18:02:43 -08:00
Ian Gudger c22da3e705 Remove obsolete TODO
PiperOrigin-RevId: 221117846
Change-Id: I2a43fd8135b1d1194ff81e98644ce6b6182ece50
2018-11-12 10:45:19 -08:00
Bhasker Hariharan 33089561b1 Add an implementation of a SACK scoreboard as per RFC6675.
PiperOrigin-RevId: 220866996
Change-Id: I89d48215df57c00d6a6ec512fc18712a2ea9080b
2018-11-09 14:38:46 -08:00
Fabricio Voznika dce61075c0 Fix flaky TestCacheResolutionTimeout
Increase timeout to prevent the entry from being
found when there is delay on the address resolution
goroutine that doesn't mark the request as failed.

PiperOrigin-RevId: 220504789
Change-Id: I7e44fd95d8624bd69962f862fbf5517a81395f2a
2018-11-07 12:01:48 -08:00
Googler 9256ed5283 Internal change.
PiperOrigin-RevId: 220314735
Change-Id: Ic519567e43f6caf042b9f223e517da40640b7d38
2018-11-06 11:08:22 -08:00