Commit Graph

35 Commits

Author SHA1 Message Date
Ghanan Gowripalan 87c5c0ad25 Receive broadcast packets on interested endpoints
When a broadcast packet is received by the stack, the packet should be
delivered to each endpoint that may be interested in the packet. This
includes all any address and specified broadcast address listeners.

Test: integration_test.TestReuseAddrAndBroadcast
PiperOrigin-RevId: 332060652
2020-09-16 12:20:45 -07:00
Ian Gudger 2141013dce Add support for SO_REUSEADDR to TCP sockets/endpoints.
For TCP sockets, SO_REUSEADDR relaxes the rules for binding addresses.

gVisor/netstack already supported a behavior similar to SO_REUSEADDR, but did
not allow disabling it. This change brings the SO_REUSEADDR behavior closer to
the behavior implemented by Linux and adds a new SO_REUSEADDR disabled
behavior. Like Linux, SO_REUSEADDR is now disabled by default.

PiperOrigin-RevId: 317984380
2020-06-23 19:15:38 -07:00
Ian Gudger a085e562d0 Add support for SO_REUSEADDR to UDP sockets/endpoints.
On UDP sockets, SO_REUSEADDR allows multiple sockets to bind to the same
address, but only delivers packets to the most recently bound socket. This
differs from the behavior of SO_REUSEADDR on TCP sockets. SO_REUSEADDR for TCP
sockets will likely need an almost completely independent implementation.

SO_REUSEADDR has some odd interactions with the similar SO_REUSEPORT. These
interactions are tested fairly extensively and all but one particularly odd
one (that honestly seems like a bug) behave the same on gVisor and Linux.

PiperOrigin-RevId: 315844832
2020-06-10 23:49:26 -07:00
Ting-Yu Wang d3a8bffe04 Pass PacketBuffer as pointer.
Historically we've been passing PacketBuffer by shallow copying through out
the stack. Right now, this is only correct as the caller would not use
PacketBuffer after passing into the next layer in netstack.

With new buffer management effort in gVisor/netstack, PacketBuffer will
own a Buffer (to be added). Internally, both PacketBuffer and Buffer may
have pointers and shallow copying shouldn't be used.

Updates #2404.

PiperOrigin-RevId: 314610879
2020-06-03 15:00:42 -07:00
Jay Zhuang c64796748c Clean up transport_demuxer.go and test
- Change receiver of endpoint lookup functions
- Remove unused struct fields and functions in test
- s/%v/%s/ for errors
- Capitalize NIC
  https://github.com/golang/go/wiki/CodeReviewComments#initialisms

PiperOrigin-RevId: 303119580
2020-03-26 08:50:17 -07:00
Bhasker Hariharan 7e4073af12 Move tcpip.PacketBuffer and IPTables to stack package.
This is a precursor to be being able to build an intrusive list
of PacketBuffers for use in queuing disciplines being implemented.

Updates #2214

PiperOrigin-RevId: 302677662
2020-03-24 09:06:26 -07:00
Jamie Liu 86409c9181 Avoid unnecessary work in transportDemuxer.deliverPacket().
- Don't allocate []*endpointsByNic in transportDemuxer.deliverPacket() unless
  actually needed for UDP broadcast/multicast.

- Don't allocate []*endpointsByNic via transportDemuxer.findEndpointLocked()
  => transportDemuxer.findAllEndpointsLocked().

- Skip unnecessary map lookups in transportDemuxer.findEndpointLocked() =>
  transportDemuxer.findAllEndpointsLocked() (now iterEndpointsLocked).

For most deliverable packets other than UDP broadcast/multicast packets, this
saves two slice allocations and three map lookups per packet.

PiperOrigin-RevId: 300804135
2020-03-13 12:22:19 -07:00
Tamir Duberstein 035f7434e9 Use a heap in transport demuxer
...instead of sorting at various times. Plug a memory leak by setting
removed elements to nil.

PiperOrigin-RevId: 300471087
2020-03-11 21:13:46 -07:00
Ian Gudger c37b196455 Add support for tearing down protocol dispatchers and TIME_WAIT endpoints.
Protocol dispatchers were previously leaked. Bypassing TIME_WAIT is required to
test this change.

Also fix a race when a socket in SYN-RCVD is closed. This is also required to
test this change.

PiperOrigin-RevId: 296922548
2020-02-24 10:32:17 -08:00
Bhasker Hariharan a611fdaee3 Changes TCP packet dispatch to use a pool of goroutines.
All inbound segments for connections in ESTABLISHED state are delivered to the
endpoint's queue but for every segment delivered we also queue the endpoint for
processing to a selected processor. This ensures that when there are a large
number of connections in ESTABLISHED state the inbound packets are all handled
by a small number of goroutines and significantly reduces the amount of work the
goscheduler has to perform.

We let connections in other states follow the current path where the
endpoint's goroutine directly handles the segments.

Updates #231

PiperOrigin-RevId: 289728325
2020-01-14 14:15:50 -08:00
Ian Gudger 27500d529f New sync package.
* Rename syncutil to sync.
* Add aliases to sync types.
* Replace existing usage of standard library sync package.

This will make it easier to swap out synchronization primitives. For example,
this will allow us to use primitives from github.com/sasha-s/go-deadlock to
check for lock ordering violations.

Updates #1472

PiperOrigin-RevId: 289033387
2020-01-09 22:02:24 -08:00
Ghanan Gowripalan 3f51bef8cd Do not handle TCP packets that include a non-unicast IP address
This change drops TCP packets with a non-unicast IP address as the source or
destination address as TCP is meant for communication between two endpoints.

Test: Make sure that if the source or destination address contains a non-unicast
address, no TCP packet is sent in response and the packet is dropped.
PiperOrigin-RevId: 280073731
2019-11-12 15:50:02 -08:00
Bhasker Hariharan 66ebb6575f Add support for TIME_WAIT timeout.
This change adds explicit support for honoring the 2MSL timeout
for sockets in TIME_WAIT state. It also adds support for the
TCP_LINGER2 option that allows modification of the FIN_WAIT2
state timeout duration for a given socket.

It also adds an option to modify the Stack wide TIME_WAIT timeout
but this is only for testing. On Linux this is fixed at 60s.

Further, we also now correctly process RST's in CLOSE_WAIT and
close the socket similar to linux without moving it to error
state.

We also now handle SYN in ESTABLISHED state as per
RFC5961#section-4.1. Earlier we would just drop these SYNs.
Which can result in some tests that pass on linux to fail on
gVisor.

Netstack now honors TIME_WAIT correctly as well as handles the
following cases correctly.

- TCP RSTs in TIME_WAIT are ignored.
- A duplicate TCP FIN during TIME_WAIT extends the TIME_WAIT
  and a dup ACK is sent in response to the FIN as the dup FIN
  indicates potential loss of the original final ACK.
- An out of order segment during TIME_WAIT generates a dup ACK.
- A new SYN w/ a sequence number > the highest sequence number
  in the previous connection closes the TIME_WAIT early and
  opens a new connection.

Further to make the SYN case work correctly the ISN (Initial
Sequence Number) generation for Netstack has been updated to
be as per RFC. Its not a pure random number anymore and follows
the recommendation in https://tools.ietf.org/html/rfc6528#page-3.

The current hash used is not a cryptographically secure hash
function. A separate change will update the hash function used
to Siphash similar to what is used in Linux.

PiperOrigin-RevId: 279106406
2019-11-07 09:46:55 -08:00
Kevin Krakauer e1b21f3c8c Use PacketBuffers, rather than VectorisedViews, in netstack.
PacketBuffers are analogous to Linux's sk_buff. They hold all information about
a packet, headers, and payload. This is important for:

* iptables to access various headers of packets
* Preventing the clutter of passing different net and link headers along with
  VectorisedViews to packet handling functions.

This change only affects the incoming packet path, and a future change will
change the outgoing path.

Benchmark               Regular         PacketBufferPtr  PacketBufferConcrete
--------------------------------------------------------------------------------
BM_Recvmsg             400.715MB/s      373.676MB/s      396.276MB/s
BM_Sendmsg             361.832MB/s      333.003MB/s      335.571MB/s
BM_Recvfrom            453.336MB/s      393.321MB/s      381.650MB/s
BM_Sendto              378.052MB/s      372.134MB/s      341.342MB/s
BM_SendmsgTCP/0/1k     353.711MB/s      316.216MB/s      322.747MB/s
BM_SendmsgTCP/0/2k     600.681MB/s      588.776MB/s      565.050MB/s
BM_SendmsgTCP/0/4k     995.301MB/s      888.808MB/s      941.888MB/s
BM_SendmsgTCP/0/8k     1.517GB/s        1.274GB/s        1.345GB/s
BM_SendmsgTCP/0/16k    1.872GB/s        1.586GB/s        1.698GB/s
BM_SendmsgTCP/0/32k    1.017GB/s        1.020GB/s        1.133GB/s
BM_SendmsgTCP/0/64k    475.626MB/s      584.587MB/s      627.027MB/s
BM_SendmsgTCP/0/128k   416.371MB/s      503.434MB/s      409.850MB/s
BM_SendmsgTCP/0/256k   323.449MB/s      449.599MB/s      388.852MB/s
BM_SendmsgTCP/0/512k   243.992MB/s      267.676MB/s      314.474MB/s
BM_SendmsgTCP/0/1M     95.138MB/s       95.874MB/s       95.417MB/s
BM_SendmsgTCP/0/2M     96.261MB/s       94.977MB/s       96.005MB/s
BM_SendmsgTCP/0/4M     96.512MB/s       95.978MB/s       95.370MB/s
BM_SendmsgTCP/0/8M     95.603MB/s       95.541MB/s       94.935MB/s
BM_SendmsgTCP/0/16M    94.598MB/s       94.696MB/s       94.521MB/s
BM_SendmsgTCP/0/32M    94.006MB/s       94.671MB/s       94.768MB/s
BM_SendmsgTCP/0/64M    94.133MB/s       94.333MB/s       94.746MB/s
BM_SendmsgTCP/0/128M   93.615MB/s       93.497MB/s       93.573MB/s
BM_SendmsgTCP/0/256M   93.241MB/s       95.100MB/s       93.272MB/s
BM_SendmsgTCP/1/1k     303.644MB/s      316.074MB/s      308.430MB/s
BM_SendmsgTCP/1/2k     537.093MB/s      584.962MB/s      529.020MB/s
BM_SendmsgTCP/1/4k     882.362MB/s      939.087MB/s      892.285MB/s
BM_SendmsgTCP/1/8k     1.272GB/s        1.394GB/s        1.296GB/s
BM_SendmsgTCP/1/16k    1.802GB/s        2.019GB/s        1.830GB/s
BM_SendmsgTCP/1/32k    2.084GB/s        2.173GB/s        2.156GB/s
BM_SendmsgTCP/1/64k    2.515GB/s        2.463GB/s        2.473GB/s
BM_SendmsgTCP/1/128k   2.811GB/s        3.004GB/s        2.946GB/s
BM_SendmsgTCP/1/256k   3.008GB/s        3.159GB/s        3.171GB/s
BM_SendmsgTCP/1/512k   2.980GB/s        3.150GB/s        3.126GB/s
BM_SendmsgTCP/1/1M     2.165GB/s        2.233GB/s        2.163GB/s
BM_SendmsgTCP/1/2M     2.370GB/s        2.219GB/s        2.453GB/s
BM_SendmsgTCP/1/4M     2.005GB/s        2.091GB/s        2.214GB/s
BM_SendmsgTCP/1/8M     2.111GB/s        2.013GB/s        2.109GB/s
BM_SendmsgTCP/1/16M    1.902GB/s        1.868GB/s        1.897GB/s
BM_SendmsgTCP/1/32M    1.655GB/s        1.665GB/s        1.635GB/s
BM_SendmsgTCP/1/64M    1.575GB/s        1.547GB/s        1.575GB/s
BM_SendmsgTCP/1/128M   1.524GB/s        1.584GB/s        1.580GB/s
BM_SendmsgTCP/1/256M   1.579GB/s        1.607GB/s        1.593GB/s

PiperOrigin-RevId: 278940079
2019-11-06 14:25:59 -08:00
Andrei Vagin db37483cb6 Store endpoints inside multiPortEndpoint in a sorted order
It is required to guarantee the same order of endpoints after save/restore.

PiperOrigin-RevId: 277598665
2019-10-30 15:33:41 -07:00
Ian Gudger a2c51efe36 Add endpoint tracking to the stack.
In the future this will replace DanglingEndpoints. DanglingEndpoints must be
kept for now due to issues with save/restore.

This is arguably a cleaner design and allows the stack to know which transport
endpoints might still be using its link endpoints.

Updates #837

PiperOrigin-RevId: 277386633
2019-10-29 16:14:51 -07:00
Ian Gudger 7d80e85835 Allow waiting for Endpoint worker goroutines to finish.
Updates #837

PiperOrigin-RevId: 277325162
2019-10-29 11:32:48 -07:00
Kevin Krakauer 12235d533a AF_PACKET support for netstack (aka epsocket).
Like (AF_INET, SOCK_RAW) sockets, AF_PACKET sockets require CAP_NET_RAW. With
runsc, you'll need to pass `--net-raw=true` to enable them.

Binding isn't supported yet.

PiperOrigin-RevId: 275909366
2019-10-21 13:23:18 -07:00
Chris Kuiper 4874525161 Implement proper local broadcast behavior
The behavior for sending and receiving local broadcast (255.255.255.255)
traffic is as follows:

Outgoing
--------
* A broadcast packet sent on a socket that is bound to an interface goes out
  that interface
* A broadcast packet sent on an unbound socket follows the route table to
  select the outgoing interface
  + if an explicit route entry exists for 255.255.255.255/32, use that one
  + else use the default route
* Broadcast packets are looped back and delivered following the rules for
  incoming packets (see next). This is the same behavior as for multicast
  packets, except that it cannot be disabled via sockopt.

Incoming
--------
* Sockets wishing to receive broadcast packets must bind to either INADDR_ANY
  (0.0.0.0) or INADDR_BROADCAST (255.255.255.255). No other socket receives
  broadcast packets.
* Broadcast packets are multiplexed to all sockets matching it. This is the
  same behavior as for multicast packets.
* A socket can bind to 255.255.255.255:<port> and then receive its own
  broadcast packets sent to 255.255.255.255:<port>

In addition, this change implicitly fixes an issue with multicast reception. If
two sockets want to receive a given multicast stream and one is bound to ANY
while the other is bound to the multicast address, only one of them will
receive the traffic.

PiperOrigin-RevId: 272792377
2019-10-03 19:31:35 -07:00
gVisor bot abbee5615f Implement SO_BINDTODEVICE sockopt
PiperOrigin-RevId: 271644926
2019-09-27 14:14:04 -07:00
Adin Scannell add40fd6ad Update canonical repository.
This can be merged after:
https://github.com/google/gvisor-website/pull/77
  or
https://github.com/google/gvisor-website/pull/78

PiperOrigin-RevId: 253132620
2019-06-13 16:50:15 -07:00
Kevin Krakauer c1cdf18e7b UDP and TCP raw socket support.
PiperOrigin-RevId: 249511348
Change-Id: I34539092cc85032d9473ff4dd308fc29dc9bfd6b
2019-05-22 13:45:15 -07:00
Chris Kuiper 2d8e90b311 Proper cleanup of sockets that used REUSEPORT
Fixed a small logic error that broke proper accounting of MultiPortEndpoints.

PiperOrigin-RevId: 246502126
Change-Id: I1a7d6ea134f811612e545676212899a3707bc2c2
2019-05-03 07:02:51 -07:00
Chris Kuiper 8972e47a2e Support reception of multicast data on more than one socket
This requires two changes:
1) Support for more than one socket to join a given multicast group.

2) Duplicate delivery of incoming multicast packets to all sockets listening
for it.

In addition, I tweaked the code (and added a test) to disallow duplicates
IP_ADD_MEMBERSHIP calls for the same group and NIC. This is how Linux does
it.

PiperOrigin-RevId: 246437315
Change-Id: Icad8300b4a8c3f501d9b4cd283bd3beabef88b72
2019-05-02 19:41:00 -07:00
Michael Pratt 4d52a55201 Change copyright notice to "The gVisor Authors"
Based on the guidelines at
https://opensource.google.com/docs/releasing/authors/.

1. $ rg -l "Google LLC" | xargs sed -i 's/Google LLC.*/The gVisor Authors./'
2. Manual fixup of "Google Inc" references.
3. Add AUTHORS file. Authors may request to be added to this file.
4. Point netstack AUTHORS to gVisor AUTHORS. Drop CONTRIBUTORS.

Fixes #209

PiperOrigin-RevId: 245823212
Change-Id: I64530b24ad021a7d683137459cafc510f5ee1de9
2019-04-29 14:26:23 -07:00
Kevin Krakauer 52a51a8e20 Add a raw socket transport endpoint and use it for raw ICMP sockets.
Having raw socket code together will make it easier to add support for other raw
network protocols. Currently, only ICMP uses the raw endpoint. However, adding
support for other protocols such as UDP shouldn't be much more difficult than
adding a few switch cases.

PiperOrigin-RevId: 241564875
Change-Id: I77e03adafe4ce0fd29ba2d5dfdc547d2ae8f25bf
2019-04-02 11:13:49 -07:00
Kevin Krakauer 121db29a93 Ping support via IPv4 raw sockets.
Broadly, this change:
* Enables sockets to be created via `socket(AF_INET, SOCK_RAW, IPPROTO_ICMP)`.
* Passes the network-layer (IP) header up the stack to the transport endpoint,
  which can pass it up to the socket layer. This allows a raw socket to return
  the entire IP packet to users.
* Adds functions to stack.TransportProtocol, stack.Stack, stack.transportDemuxer
  that enable incoming packets to be delivered to raw endpoints. New raw sockets
  of other protocols (not ICMP) just need to register with the stack.
* Enables ping.endpoint to return IP headers when created via SOCK_RAW.

PiperOrigin-RevId: 235993280
Change-Id: I60ed994f5ff18b2cbd79f063a7fdf15d093d845a
2019-02-27 14:31:21 -08:00
Amanda Tait ea070b9d5f Implement Broadcast support
This change adds support for the SO_BROADCAST socket option in gVisor Netstack.
This support includes getsockopt()/setsockopt() functionality for both UDP and
TCP endpoints (the latter being a NOOP), dispatching broadcast messages up and
down the stack, and route finding/creation for broadcast packets. Finally, a
suite of tests have been implemented, exercising this functionality through the
Linux syscall API.

PiperOrigin-RevId: 234850781
Change-Id: If3e666666917d39f55083741c78314a06defb26c
2019-02-20 12:54:13 -08:00
Andrei Vagin 652d068119 Implement SO_REUSEPORT for TCP and UDP sockets
This option allows multiple sockets to be bound to the same port.

Incoming packets are distributed to sockets using a hash based on source and
destination addresses. This means that all packets from one sender will be
received by the same server socket.

PiperOrigin-RevId: 227153413
Change-Id: I59b6edda9c2209d5b8968671e9129adb675920cf
2018-12-28 11:27:14 -08:00
Ian Gudger 8fce67af24 Use correct company name in copyright header
PiperOrigin-RevId: 217951017
Change-Id: Ie08bf6987f98467d07457bcf35b5f1ff6e43c035
2018-10-19 16:35:11 -07:00
Tamir Duberstein d689f8422f Always pass buffer.VectorisedView by value
PiperOrigin-RevId: 212757571
Change-Id: I04200df9e45c21eb64951cd2802532fa84afcb1a
2018-09-12 21:57:55 -07:00
Tamir Duberstein 0923bcf06b Add various statistics
PiperOrigin-RevId: 210442599
Change-Id: I9498351f461dc69c77b7f815d526c5693bec8e4a
2018-08-27 15:29:55 -07:00
Nicolas Lacasse bf0fa09537 Switch netstack licenses to Apache 2.0.
Fixes #27

PiperOrigin-RevId: 203825288
Change-Id: Ie9f3a2b2c1e296b026b024f75c07da1a7e118633
2018-07-09 14:04:40 -07:00
Ian Gudger b4765f782d Fix warning: redundant if ...; err != nil check, just return error instead.
This warning is produced by golint.

PiperOrigin-RevId: 195833381
Change-Id: Idd6a7e57e3cfdf00819f2374b19fc113585dc1e1
2018-05-08 09:51:56 -07:00
Googler d02b74a5dc Check in gVisor.
PiperOrigin-RevId: 194583126
Change-Id: Ica1d8821a90f74e7e745962d71801c598c652463
2018-04-28 01:44:26 -04:00