Commit Graph

69 Commits

Author SHA1 Message Date
Bhasker Hariharan 458fe955a7 Implement support for SACK based recovery(RFC 6675).
PiperOrigin-RevId: 246536003
Change-Id: I118b745f45040be9c70cb6a1028acdb06c78d8c9
2019-05-03 10:51:18 -07:00
Chris Kuiper 2d8e90b311 Proper cleanup of sockets that used REUSEPORT
Fixed a small logic error that broke proper accounting of MultiPortEndpoints.

PiperOrigin-RevId: 246502126
Change-Id: I1a7d6ea134f811612e545676212899a3707bc2c2
2019-05-03 07:02:51 -07:00
Chris Kuiper 8972e47a2e Support reception of multicast data on more than one socket
This requires two changes:
1) Support for more than one socket to join a given multicast group.

2) Duplicate delivery of incoming multicast packets to all sockets listening
for it.

In addition, I tweaked the code (and added a test) to disallow duplicates
IP_ADD_MEMBERSHIP calls for the same group and NIC. This is how Linux does
it.

PiperOrigin-RevId: 246437315
Change-Id: Icad8300b4a8c3f501d9b4cd283bd3beabef88b72
2019-05-02 19:41:00 -07:00
Michael Pratt 4d52a55201 Change copyright notice to "The gVisor Authors"
Based on the guidelines at
https://opensource.google.com/docs/releasing/authors/.

1. $ rg -l "Google LLC" | xargs sed -i 's/Google LLC.*/The gVisor Authors./'
2. Manual fixup of "Google Inc" references.
3. Add AUTHORS file. Authors may request to be added to this file.
4. Point netstack AUTHORS to gVisor AUTHORS. Drop CONTRIBUTORS.

Fixes #209

PiperOrigin-RevId: 245823212
Change-Id: I64530b24ad021a7d683137459cafc510f5ee1de9
2019-04-29 14:26:23 -07:00
Nicolas Lacasse f4ce43e1f4 Allow and document bug ids in gVisor codebase.
PiperOrigin-RevId: 245818639
Change-Id: I03703ef0fb9b6675955637b9fe2776204c545789
2019-04-29 14:04:14 -07:00
Kevin Krakauer 43dff57b87 Make raw sockets a toggleable feature disabled by default.
PiperOrigin-RevId: 245511019
Change-Id: Ia9562a301b46458988a6a1f0bbd5f07cbfcb0615
2019-04-26 16:51:46 -07:00
Bhasker Hariharan eaac2806ff Add TCP checksum verification.
PiperOrigin-RevId: 242704699
Change-Id: I87db368ca343b3b4bf4f969b17d3aa4ce2f8bd4f
2019-04-09 11:23:47 -07:00
Kevin Krakauer 52a51a8e20 Add a raw socket transport endpoint and use it for raw ICMP sockets.
Having raw socket code together will make it easier to add support for other raw
network protocols. Currently, only ICMP uses the raw endpoint. However, adding
support for other protocols such as UDP shouldn't be much more difficult than
adding a few switch cases.

PiperOrigin-RevId: 241564875
Change-Id: I77e03adafe4ce0fd29ba2d5dfdc547d2ae8f25bf
2019-04-02 11:13:49 -07:00
Andrei Vagin f4105ac21a netstack/fdbased: add generic segmentation offload (GSO) support
The linux packet socket can handle GSO packets, so we can segment packets to
64K instead of the MTU which is usually 1500.

Here are numbers for the nginx-1m test:
runsc:		579330.01 [Kbytes/sec] received
runsc-gso:	1794121.66 [Kbytes/sec] received
runc:		2122139.06 [Kbytes/sec] received

and for tcp_benchmark:

$ tcp_benchmark  --duration 15   --ideal
[  4]  0.0-15.0 sec  86647 MBytes  48456 Mbits/sec

$ tcp_benchmark --client --duration 15   --ideal
[  4]  0.0-15.0 sec  2173 MBytes  1214 Mbits/sec

$ tcp_benchmark --client --duration 15   --ideal --gso 65536
[  4]  0.0-15.0 sec  19357 MBytes  10825 Mbits/sec

PiperOrigin-RevId: 240809103
Change-Id: I2637f104db28b5d4c64e1e766c610162a195775a
2019-03-28 11:03:41 -07:00
Andrei Vagin 654e878abb netstack: Don't exclude length when a pseudo-header checksum is calculated
This is a preparation for GSO changes (cl/234508902).

RELNOTES[gofers]: Refactor checksum code to include length, which
it already did, but in a convoluted way. Should be a no-op.

PiperOrigin-RevId: 240460794
Change-Id: I537381bc670b5a9f5d70a87aa3eb7252e8f5ace2
2019-03-26 17:15:13 -07:00
Bert Muthalaly 928809fa7d Add layer 2 stats (tx, rx) X (packets, bytes) to netstack
PiperOrigin-RevId: 239194420
Change-Id: Ie193e8ac2b7a6db21195ac85824a335930483971
2019-03-19 08:30:43 -07:00
Ian Gudger a16f6e50c5 Make HandleLocal apply to all non-loopback interfaces.
HandleLocal is very similar conceptually to MULTICAST_LOOP, so we can unify
the implementations. This has the benefit of making HandleLocal apply even when
the fdbased link endpoint isn't in use.

In addition, move looping logic to route creation so that it doesn't need to be
run for each packet. This should improve performance.

PiperOrigin-RevId: 238099480
Change-Id: I72839f16f25310471453bc9d3fb8544815b25c23
2019-03-12 14:37:56 -07:00
Ian Gudger 56a6128295 Implement IP_MULTICAST_LOOP.
IP_MULTICAST_LOOP controls whether or not multicast packets sent on the default
route are looped back. In order to implement this switch, support for sending
and looping back multicast packets on the default route had to be implemented.

For now we only support IPv4 multicast.

PiperOrigin-RevId: 237534603
Change-Id: I490ac7ff8e8ebef417c7eb049a919c29d156ac1c
2019-03-08 15:49:17 -08:00
Kevin Krakauer 23e66ee96d Remove unused commit() function argument to Bind.
PiperOrigin-RevId: 236926132
Change-Id: I5cf103f22766e6e65a581de780c7bb9ca0fa3181
2019-03-05 14:53:34 -08:00
Tamir Duberstein 3830786883 Map IPv{4,6} addresses to ethernet addresses
...in accordance with RFCs 1112 and 2464.

Fixes IPv4 multicast when IP_MULTICAST_IF is specified.

Don't return ErrNoRoute when no route is needed.
Don't set Route.NextHop when no route is needed.

PiperOrigin-RevId: 236199813
Change-Id: I48ed33e1b7f760deaa37e18ad7f1b8b62819ab43
2019-02-28 14:38:32 -08:00
Kevin Krakauer 121db29a93 Ping support via IPv4 raw sockets.
Broadly, this change:
* Enables sockets to be created via `socket(AF_INET, SOCK_RAW, IPPROTO_ICMP)`.
* Passes the network-layer (IP) header up the stack to the transport endpoint,
  which can pass it up to the socket layer. This allows a raw socket to return
  the entire IP packet to users.
* Adds functions to stack.TransportProtocol, stack.Stack, stack.transportDemuxer
  that enable incoming packets to be delivered to raw endpoints. New raw sockets
  of other protocols (not ICMP) just need to register with the stack.
* Enables ping.endpoint to return IP headers when created via SOCK_RAW.

PiperOrigin-RevId: 235993280
Change-Id: I60ed994f5ff18b2cbd79f063a7fdf15d093d845a
2019-02-27 14:31:21 -08:00
Googler 12d9cf6fab Adds a WriteRawPacket method to the InjectableLinkEndpoint interface.
Also exposes ipv4.MaxTotalSize since it is a generally useful constant.

PiperOrigin-RevId: 235799755
Change-Id: I1fa8d5294bf355acf5527cfdf274b3687d3c8b13
2019-02-26 14:58:37 -08:00
Amanda Tait 33d0e824c7 Use more conservative locking in NIC.DeliverNetworkPacket
An earlier CL excessively minimizes the period in which it
holds a lock on NIC. This earlier CL had done this out of
the mistaken impression it fixed a broken test, when in
fact it just reduced the rate of failure of a flaky test
in tcp_test.go. This new change holds the lock on NIC
for the duration of the loop over n.endpoints.

PiperOrigin-RevId: 235732487
Change-Id: I53ee6df264f093ddc4d29e9acdcba6b4838cb112
2019-02-26 09:10:37 -08:00
Amanda Tait c14a1a1618 Fix race condition in NIC.DeliverNetworkPacket
cl/234850781 introduced a race condition in NIC.DeliverNetworkPacket
by failing to hold a lock. This change fixes this regressesion by acquiring
a read lock before iterating through n.endpoints, and then releasing the lock
once iteration is complete.

PiperOrigin-RevId: 235549770
Change-Id: Ib0133288be512d478cf759c3314dc95ec3205d4b
2019-02-25 10:02:29 -08:00
Amanda Tait ea070b9d5f Implement Broadcast support
This change adds support for the SO_BROADCAST socket option in gVisor Netstack.
This support includes getsockopt()/setsockopt() functionality for both UDP and
TCP endpoints (the latter being a NOOP), dispatching broadcast messages up and
down the stack, and route finding/creation for broadcast packets. Finally, a
suite of tests have been implemented, exercising this functionality through the
Linux syscall API.

PiperOrigin-RevId: 234850781
Change-Id: If3e666666917d39f55083741c78314a06defb26c
2019-02-20 12:54:13 -08:00
Ian Gudger c611dbc5a7 Implement IP_MULTICAST_IF.
This allows setting a default send interface for IPv4 multicast. IPv6 support
will come later.

PiperOrigin-RevId: 234251379
Change-Id: I65922341cd8b8880f690fae3eeb7ddfa47c8c173
2019-02-15 18:40:15 -08:00
Ian Gudger 80f901b16b Plumb IP_ADD_MEMBERSHIP and IP_DROP_MEMBERSHIP to netstack.
Also includes a few fixes for IPv4 multicast support. IPv6 support is coming in
a followup CL.

PiperOrigin-RevId: 233008638
Change-Id: If7dae6222fef43fda48033f0292af77832d95e82
2019-02-07 23:15:23 -08:00
Googler e0afa87899 Internal change.
PiperOrigin-RevId: 232937200
Change-Id: I5c3709cc8f1313313ff618a45e48c14a3a111cb4
2019-02-07 13:46:26 -08:00
Michael Pratt 2a0c69b19f Remove license comments
Nothing reads them and they can simply get stale.

Generated with:
$ sed -i "s/licenses(\(.*\)).*/licenses(\1)/" **/BUILD

PiperOrigin-RevId: 231818945
Change-Id: Ibc3f9838546b7e94f13f217060d31f4ada9d4bf0
2019-01-31 11:12:53 -08:00
Ian Gudger ff1c3bb0b5 Fix NIC endpoint forwarding.
Also adds a test for regular NIC forwarding.

PiperOrigin-RevId: 231495279
Change-Id: Ic7edec249568e9ad0280cea77eac14478c9073e1
2019-01-29 16:23:30 -08:00
Andrei Vagin 652d068119 Implement SO_REUSEPORT for TCP and UDP sockets
This option allows multiple sockets to be bound to the same port.

Incoming packets are distributed to sockets using a hash based on source and
destination addresses. This means that all packets from one sender will be
received by the same server socket.

PiperOrigin-RevId: 227153413
Change-Id: I59b6edda9c2209d5b8968671e9129adb675920cf
2018-12-28 11:27:14 -08:00
Chris Kuiper e491ebbacf Allow sending of multicast and IPv6 link-local packets w/o route.
Same as with broadcast packets, sending of a multicast packet shouldn't require
accessing the route table. The same applies to IPv6 link-local addresses, which
aren't routable at all (they don't belong to any subnet by definition).

PiperOrigin-RevId: 225775870
Change-Id: Ic53e6560c125a83be2be9c3d112e66b36e8dfe7b
2018-12-16 23:05:59 -08:00
Chris Kuiper 1b3442cae0 Allow sending of broadcast packets w/o route.
Currently sending a broadcast packet (for DHCP, e.g.) requires a "default
route" of the format "0.0.0.0/0 via 0.0.0.0 <intf>". There is no good reason
for this and on devices with several ports this creates a rather akward route
table with lots of such default routes (which defeats the purpose of a default
route).

PiperOrigin-RevId: 224378769
Change-Id: Icd7ec8a206eb08083cff9a837f6f9ab231c73a19
2018-12-06 11:48:12 -08:00
Chris Kuiper fab029c50b Remove incorrect code and improve testing of Stack.GetMainNICAddress
This removes code that should have never made it in in the first place, but did so due to incomplete testing. With the new tests the original code fails, the new code passes.

PiperOrigin-RevId: 224086966
Change-Id: I646fef76977f4528f3705f497b95fad6b3ec32bc
2018-12-04 19:09:11 -08:00
Bert Muthalaly bc41e4761b Rename incorrectly named (dst, src) arguments in DeliverNetworkPacket prototype
...to (remote, local), reflecting the (correct) names in the implementation of
DeliverNetworkPacket (see tcpip/stack/nic.go).

Also trim the names in DeliverNetworkPacket and elsewhere to avoid stuttering;
since the type is tcpip.LinkAddress, there's no need to include "LinkAddr" in
the parameter names.

Note that every callsite passes arguments in the order (src, dst).

PiperOrigin-RevId: 221514396
Change-Id: I3637454ad0d6e62a19e4dcbc2a16493798bd0f09
2018-11-14 14:46:24 -08:00
Bhasker Hariharan 33089561b1 Add an implementation of a SACK scoreboard as per RFC6675.
PiperOrigin-RevId: 220866996
Change-Id: I89d48215df57c00d6a6ec512fc18712a2ea9080b
2018-11-09 14:38:46 -08:00
Fabricio Voznika dce61075c0 Fix flaky TestCacheResolutionTimeout
Increase timeout to prevent the entry from being
found when there is delay on the address resolution
goroutine that doesn't mark the request as failed.

PiperOrigin-RevId: 220504789
Change-Id: I7e44fd95d8624bd69962f862fbf5517a81395f2a
2018-11-07 12:01:48 -08:00
Tamir Duberstein 692df85673 Simplify channel management
The channels {cancel,resCh} have roughly the same lifetime and are used for
roughly the same purpose as an entry's waiters; we can unify the state
management of the two mechanisms, while also reducing unncessary mutex locking
and unlocking.

Made some cosmetic changes while I'm here.

PiperOrigin-RevId: 218343915
Change-Id: Ic69546a2b7b390162b2231f07f335dd6199472d7
2018-10-23 08:16:13 -07:00
Ian Gudger 8fce67af24 Use correct company name in copyright header
PiperOrigin-RevId: 217951017
Change-Id: Ie08bf6987f98467d07457bcf35b5f1ff6e43c035
2018-10-19 16:35:11 -07:00
Sepehr Raissian c17ea8c6e2 Block for link address resolution
Previously, if address resolution for UDP or Ping sockets required sending
packets using Write in Transport layer, Resolve would return ErrWouldBlock
and Write would return ErrNoLinkAddress. Meanwhile startAddressResolution
would run in background. Further calls to Write using same address would also
return ErrNoLinkAddress until resolution has been completed successfully.

Since Write is not allowed to block and System Calls need to be
interruptible in System Call layer, the caller to Write is responsible for
blocking upon return of ErrWouldBlock.

Now, when startAddressResolution is called a notification channel for
the completion of the address resolution is returned.
The channel will traverse up to the calling function of Write as well as
ErrNoLinkAddress. Once address resolution is complete (success or not) the
channel is closed. The caller would call Write again to send packets and
check if address resolution was compeleted successfully or not.

Fixes google/gvisor#5

Change-Id: Idafaf31982bee1915ca084da39ae7bd468cebd93
PiperOrigin-RevId: 214962200
2018-09-28 11:00:16 -07:00
Bert Muthalaly 2e497de2d9 Pass local link address to DeliverNetworkPacket
This allows a NetworkDispatcher to implement transparent bridging,
assuming all implementations of LinkEndpoint.WritePacket call eth.Encode
with header.EthernetFields.SrcAddr set to the passed
Route.LocalLinkAddress, if it is provided.

PiperOrigin-RevId: 213686651
Change-Id: I446a4ac070970202f0724ef796ff1056ae4dd72a
2018-09-19 13:43:58 -07:00
Tamir Duberstein 23258ca284 Implement packet forwarding to enable NAT
PiperOrigin-RevId: 213323501
Change-Id: I0996ddbdcf097588745efe35481085d42dbaf446
2018-09-17 13:05:36 -07:00
Tamir Duberstein d7a05b4e63 Pass buffer.Prependable by value
PiperOrigin-RevId: 213053370
Change-Id: I60ea89572b4fca53fd126c870fcbde74fcf52562
2018-09-14 15:23:58 -07:00
Tamir Duberstein d689f8422f Always pass buffer.VectorisedView by value
PiperOrigin-RevId: 212757571
Change-Id: I04200df9e45c21eb64951cd2802532fa84afcb1a
2018-09-12 21:57:55 -07:00
Tamir Duberstein 5adb3468d4 Add multicast support
PiperOrigin-RevId: 212750821
Change-Id: I822fd63e48c684b45fd91f9ce057867b7eceb792
2018-09-12 20:39:24 -07:00
Tamir Duberstein 156b49ca85 Fix race condition introduced in 211135505
Now that it's possible to remove subnets, we must iterate over them with locks
held.

Also do the removal more efficiently while I'm here.

PiperOrigin-RevId: 211737416
Change-Id: I29025ec8b0c3ad11f22d4447e8ad473f1c785463
2018-09-05 18:59:16 -07:00
Bert Muthalaly 5685d6b5ad Update {LinkEndpoint,NetworkEndpoint}#WritePacket to take a VectorisedView
Makes it possible to avoid copying or allocating in cases where DeliverNetworkPacket (rx)
needs to turn around and call WritePacket (tx) with its VectorisedView.

Also removes the restriction on having VectorisedViews with multiple views in the write path.

PiperOrigin-RevId: 211728717
Change-Id: Ie03a65ecb4e28bd15ebdb9c69f05eced18fdfcff
2018-09-05 17:34:25 -07:00
Tamir Duberstein fe8ca76c22 Implement Subnet removal
This was used to implement https://fuchsia-review.googlesource.com/c/garnet/+/177771.

PiperOrigin-RevId: 211725098
Change-Id: Ib0acc7c13430b7341e8e0ec6eb5fc35f5cee5083
2018-09-05 17:06:29 -07:00
Bhasker Hariharan 2cff07381a Automated rollback of changelist 211156845
PiperOrigin-RevId: 211525182
Change-Id: I462c20328955c77ecc7bfd8ee803ac91f15858e6
2018-09-04 14:31:52 -07:00
Googler f0d8817654 Automated rollback of changelist 211103930
PiperOrigin-RevId: 211156845
Change-Id: Ie28011d7eb5f45f3a0158dbee2a68c5edf22f6e0
2018-08-31 15:48:50 -07:00
Tamir Duberstein 625edb9f28 ipv6: ICMP support
This CL does NDP link-address discovery for IPv6.

It includes several small changes necessary to get linux to talk to
this implementation. In particular, a hop limit of 255 is necessary
for ICMPv6.

PiperOrigin-RevId: 211103930
Change-Id: If25370ab84c6b1decfb15de917f3b0020f2c4e0e
2018-08-31 10:23:32 -07:00
Tamir Duberstein 0923bcf06b Add various statistics
PiperOrigin-RevId: 210442599
Change-Id: I9498351f461dc69c77b7f815d526c5693bec8e4a
2018-08-27 15:29:55 -07:00
Tamir Duberstein b17e80ef5a Upstreaming DHCP changes from Fuchsia
PiperOrigin-RevId: 210221388
Change-Id: Ic82d592b8c4778855fa55ba913f6b9a10b2d511f
2018-08-25 06:17:32 -07:00
Ian Gudger abe7764928 Encapsulate netstack metrics
PiperOrigin-RevId: 209943212
Change-Id: I96dcbc7c2ab2426e510b94a564436505256c5c79
2018-08-23 08:55:23 -07:00
Googler a316f83977 Expose route table
PiperOrigin-RevId: 209670528
Change-Id: I2890bcdef36f0b5f24b372b42cf628b38dd5764e
2018-08-21 15:27:09 -07:00