gvisor

Commit Graph

Author	SHA1	Message	Date
Ghanan Gowripalan	87c5c0ad25	Receive broadcast packets on interested endpoints When a broadcast packet is received by the stack, the packet should be delivered to each endpoint that may be interested in the packet. This includes all any address and specified broadcast address listeners. Test: integration_test.TestReuseAddrAndBroadcast PiperOrigin-RevId: 332060652	2020-09-16 12:20:45 -07:00
Ian Gudger	2141013dce	Add support for SO_REUSEADDR to TCP sockets/endpoints. For TCP sockets, SO_REUSEADDR relaxes the rules for binding addresses. gVisor/netstack already supported a behavior similar to SO_REUSEADDR, but did not allow disabling it. This change brings the SO_REUSEADDR behavior closer to the behavior implemented by Linux and adds a new SO_REUSEADDR disabled behavior. Like Linux, SO_REUSEADDR is now disabled by default. PiperOrigin-RevId: 317984380	2020-06-23 19:15:38 -07:00
Ian Gudger	a085e562d0	Add support for SO_REUSEADDR to UDP sockets/endpoints. On UDP sockets, SO_REUSEADDR allows multiple sockets to bind to the same address, but only delivers packets to the most recently bound socket. This differs from the behavior of SO_REUSEADDR on TCP sockets. SO_REUSEADDR for TCP sockets will likely need an almost completely independent implementation. SO_REUSEADDR has some odd interactions with the similar SO_REUSEPORT. These interactions are tested fairly extensively and all but one particularly odd one (that honestly seems like a bug) behave the same on gVisor and Linux. PiperOrigin-RevId: 315844832	2020-06-10 23:49:26 -07:00
Ting-Yu Wang	d3a8bffe04	Pass PacketBuffer as pointer. Historically we've been passing PacketBuffer by shallow copying through out the stack. Right now, this is only correct as the caller would not use PacketBuffer after passing into the next layer in netstack. With new buffer management effort in gVisor/netstack, PacketBuffer will own a Buffer (to be added). Internally, both PacketBuffer and Buffer may have pointers and shallow copying shouldn't be used. Updates #2404. PiperOrigin-RevId: 314610879	2020-06-03 15:00:42 -07:00
Jay Zhuang	c64796748c	Clean up transport_demuxer.go and test - Change receiver of endpoint lookup functions - Remove unused struct fields and functions in test - s/%v/%s/ for errors - Capitalize NIC https://github.com/golang/go/wiki/CodeReviewComments#initialisms PiperOrigin-RevId: 303119580	2020-03-26 08:50:17 -07:00
Bhasker Hariharan	7e4073af12	Move tcpip.PacketBuffer and IPTables to stack package. This is a precursor to be being able to build an intrusive list of PacketBuffers for use in queuing disciplines being implemented. Updates #2214 PiperOrigin-RevId: 302677662	2020-03-24 09:06:26 -07:00
Jamie Liu	86409c9181	Avoid unnecessary work in transportDemuxer.deliverPacket(). - Don't allocate []endpointsByNic in transportDemuxer.deliverPacket() unless actually needed for UDP broadcast/multicast. - Don't allocate []endpointsByNic via transportDemuxer.findEndpointLocked() => transportDemuxer.findAllEndpointsLocked(). - Skip unnecessary map lookups in transportDemuxer.findEndpointLocked() => transportDemuxer.findAllEndpointsLocked() (now iterEndpointsLocked). For most deliverable packets other than UDP broadcast/multicast packets, this saves two slice allocations and three map lookups per packet. PiperOrigin-RevId: 300804135	2020-03-13 12:22:19 -07:00
Tamir Duberstein	035f7434e9	Use a heap in transport demuxer ...instead of sorting at various times. Plug a memory leak by setting removed elements to nil. PiperOrigin-RevId: 300471087	2020-03-11 21:13:46 -07:00
Ian Gudger	c37b196455	Add support for tearing down protocol dispatchers and TIME_WAIT endpoints. Protocol dispatchers were previously leaked. Bypassing TIME_WAIT is required to test this change. Also fix a race when a socket in SYN-RCVD is closed. This is also required to test this change. PiperOrigin-RevId: 296922548	2020-02-24 10:32:17 -08:00
Bhasker Hariharan	a611fdaee3	Changes TCP packet dispatch to use a pool of goroutines. All inbound segments for connections in ESTABLISHED state are delivered to the endpoint's queue but for every segment delivered we also queue the endpoint for processing to a selected processor. This ensures that when there are a large number of connections in ESTABLISHED state the inbound packets are all handled by a small number of goroutines and significantly reduces the amount of work the goscheduler has to perform. We let connections in other states follow the current path where the endpoint's goroutine directly handles the segments. Updates #231 PiperOrigin-RevId: 289728325	2020-01-14 14:15:50 -08:00
Ian Gudger	27500d529f	New sync package. * Rename syncutil to sync. * Add aliases to sync types. * Replace existing usage of standard library sync package. This will make it easier to swap out synchronization primitives. For example, this will allow us to use primitives from github.com/sasha-s/go-deadlock to check for lock ordering violations. Updates #1472 PiperOrigin-RevId: 289033387	2020-01-09 22:02:24 -08:00
Ghanan Gowripalan	3f51bef8cd	Do not handle TCP packets that include a non-unicast IP address This change drops TCP packets with a non-unicast IP address as the source or destination address as TCP is meant for communication between two endpoints. Test: Make sure that if the source or destination address contains a non-unicast address, no TCP packet is sent in response and the packet is dropped. PiperOrigin-RevId: 280073731	2019-11-12 15:50:02 -08:00
Bhasker Hariharan	66ebb6575f	Add support for TIME_WAIT timeout. This change adds explicit support for honoring the 2MSL timeout for sockets in TIME_WAIT state. It also adds support for the TCP_LINGER2 option that allows modification of the FIN_WAIT2 state timeout duration for a given socket. It also adds an option to modify the Stack wide TIME_WAIT timeout but this is only for testing. On Linux this is fixed at 60s. Further, we also now correctly process RST's in CLOSE_WAIT and close the socket similar to linux without moving it to error state. We also now handle SYN in ESTABLISHED state as per RFC5961#section-4.1. Earlier we would just drop these SYNs. Which can result in some tests that pass on linux to fail on gVisor. Netstack now honors TIME_WAIT correctly as well as handles the following cases correctly. - TCP RSTs in TIME_WAIT are ignored. - A duplicate TCP FIN during TIME_WAIT extends the TIME_WAIT and a dup ACK is sent in response to the FIN as the dup FIN indicates potential loss of the original final ACK. - An out of order segment during TIME_WAIT generates a dup ACK. - A new SYN w/ a sequence number > the highest sequence number in the previous connection closes the TIME_WAIT early and opens a new connection. Further to make the SYN case work correctly the ISN (Initial Sequence Number) generation for Netstack has been updated to be as per RFC. Its not a pure random number anymore and follows the recommendation in https://tools.ietf.org/html/rfc6528#page-3. The current hash used is not a cryptographically secure hash function. A separate change will update the hash function used to Siphash similar to what is used in Linux. PiperOrigin-RevId: 279106406	2019-11-07 09:46:55 -08:00
Kevin Krakauer	e1b21f3c8c	Use PacketBuffers, rather than VectorisedViews, in netstack. PacketBuffers are analogous to Linux's sk_buff. They hold all information about a packet, headers, and payload. This is important for: * iptables to access various headers of packets * Preventing the clutter of passing different net and link headers along with VectorisedViews to packet handling functions. This change only affects the incoming packet path, and a future change will change the outgoing path. Benchmark Regular PacketBufferPtr PacketBufferConcrete -------------------------------------------------------------------------------- BM_Recvmsg 400.715MB/s 373.676MB/s 396.276MB/s BM_Sendmsg 361.832MB/s 333.003MB/s 335.571MB/s BM_Recvfrom 453.336MB/s 393.321MB/s 381.650MB/s BM_Sendto 378.052MB/s 372.134MB/s 341.342MB/s BM_SendmsgTCP/0/1k 353.711MB/s 316.216MB/s 322.747MB/s BM_SendmsgTCP/0/2k 600.681MB/s 588.776MB/s 565.050MB/s BM_SendmsgTCP/0/4k 995.301MB/s 888.808MB/s 941.888MB/s BM_SendmsgTCP/0/8k 1.517GB/s 1.274GB/s 1.345GB/s BM_SendmsgTCP/0/16k 1.872GB/s 1.586GB/s 1.698GB/s BM_SendmsgTCP/0/32k 1.017GB/s 1.020GB/s 1.133GB/s BM_SendmsgTCP/0/64k 475.626MB/s 584.587MB/s 627.027MB/s BM_SendmsgTCP/0/128k 416.371MB/s 503.434MB/s 409.850MB/s BM_SendmsgTCP/0/256k 323.449MB/s 449.599MB/s 388.852MB/s BM_SendmsgTCP/0/512k 243.992MB/s 267.676MB/s 314.474MB/s BM_SendmsgTCP/0/1M 95.138MB/s 95.874MB/s 95.417MB/s BM_SendmsgTCP/0/2M 96.261MB/s 94.977MB/s 96.005MB/s BM_SendmsgTCP/0/4M 96.512MB/s 95.978MB/s 95.370MB/s BM_SendmsgTCP/0/8M 95.603MB/s 95.541MB/s 94.935MB/s BM_SendmsgTCP/0/16M 94.598MB/s 94.696MB/s 94.521MB/s BM_SendmsgTCP/0/32M 94.006MB/s 94.671MB/s 94.768MB/s BM_SendmsgTCP/0/64M 94.133MB/s 94.333MB/s 94.746MB/s BM_SendmsgTCP/0/128M 93.615MB/s 93.497MB/s 93.573MB/s BM_SendmsgTCP/0/256M 93.241MB/s 95.100MB/s 93.272MB/s BM_SendmsgTCP/1/1k 303.644MB/s 316.074MB/s 308.430MB/s BM_SendmsgTCP/1/2k 537.093MB/s 584.962MB/s 529.020MB/s BM_SendmsgTCP/1/4k 882.362MB/s 939.087MB/s 892.285MB/s BM_SendmsgTCP/1/8k 1.272GB/s 1.394GB/s 1.296GB/s BM_SendmsgTCP/1/16k 1.802GB/s 2.019GB/s 1.830GB/s BM_SendmsgTCP/1/32k 2.084GB/s 2.173GB/s 2.156GB/s BM_SendmsgTCP/1/64k 2.515GB/s 2.463GB/s 2.473GB/s BM_SendmsgTCP/1/128k 2.811GB/s 3.004GB/s 2.946GB/s BM_SendmsgTCP/1/256k 3.008GB/s 3.159GB/s 3.171GB/s BM_SendmsgTCP/1/512k 2.980GB/s 3.150GB/s 3.126GB/s BM_SendmsgTCP/1/1M 2.165GB/s 2.233GB/s 2.163GB/s BM_SendmsgTCP/1/2M 2.370GB/s 2.219GB/s 2.453GB/s BM_SendmsgTCP/1/4M 2.005GB/s 2.091GB/s 2.214GB/s BM_SendmsgTCP/1/8M 2.111GB/s 2.013GB/s 2.109GB/s BM_SendmsgTCP/1/16M 1.902GB/s 1.868GB/s 1.897GB/s BM_SendmsgTCP/1/32M 1.655GB/s 1.665GB/s 1.635GB/s BM_SendmsgTCP/1/64M 1.575GB/s 1.547GB/s 1.575GB/s BM_SendmsgTCP/1/128M 1.524GB/s 1.584GB/s 1.580GB/s BM_SendmsgTCP/1/256M 1.579GB/s 1.607GB/s 1.593GB/s PiperOrigin-RevId: 278940079	2019-11-06 14:25:59 -08:00
Andrei Vagin	db37483cb6	Store endpoints inside multiPortEndpoint in a sorted order It is required to guarantee the same order of endpoints after save/restore. PiperOrigin-RevId: 277598665	2019-10-30 15:33:41 -07:00
Ian Gudger	a2c51efe36	Add endpoint tracking to the stack. In the future this will replace DanglingEndpoints. DanglingEndpoints must be kept for now due to issues with save/restore. This is arguably a cleaner design and allows the stack to know which transport endpoints might still be using its link endpoints. Updates #837 PiperOrigin-RevId: 277386633	2019-10-29 16:14:51 -07:00
Ian Gudger	7d80e85835	Allow waiting for Endpoint worker goroutines to finish. Updates #837 PiperOrigin-RevId: 277325162	2019-10-29 11:32:48 -07:00
Kevin Krakauer	12235d533a	AF_PACKET support for netstack (aka epsocket). Like (AF_INET, SOCK_RAW) sockets, AF_PACKET sockets require CAP_NET_RAW. With runsc, you'll need to pass `--net-raw=true` to enable them. Binding isn't supported yet. PiperOrigin-RevId: 275909366	2019-10-21 13:23:18 -07:00
Chris Kuiper	4874525161	Implement proper local broadcast behavior The behavior for sending and receiving local broadcast (255.255.255.255) traffic is as follows: Outgoing -------- * A broadcast packet sent on a socket that is bound to an interface goes out that interface * A broadcast packet sent on an unbound socket follows the route table to select the outgoing interface + if an explicit route entry exists for 255.255.255.255/32, use that one + else use the default route * Broadcast packets are looped back and delivered following the rules for incoming packets (see next). This is the same behavior as for multicast packets, except that it cannot be disabled via sockopt. Incoming -------- * Sockets wishing to receive broadcast packets must bind to either INADDR_ANY (0.0.0.0) or INADDR_BROADCAST (255.255.255.255). No other socket receives broadcast packets. * Broadcast packets are multiplexed to all sockets matching it. This is the same behavior as for multicast packets. * A socket can bind to 255.255.255.255:<port> and then receive its own broadcast packets sent to 255.255.255.255:<port> In addition, this change implicitly fixes an issue with multicast reception. If two sockets want to receive a given multicast stream and one is bound to ANY while the other is bound to the multicast address, only one of them will receive the traffic. PiperOrigin-RevId: 272792377	2019-10-03 19:31:35 -07:00
gVisor bot	abbee5615f	Implement SO_BINDTODEVICE sockopt PiperOrigin-RevId: 271644926	2019-09-27 14:14:04 -07:00
Adin Scannell	add40fd6ad	Update canonical repository. This can be merged after: https://github.com/google/gvisor-website/pull/77 or https://github.com/google/gvisor-website/pull/78 PiperOrigin-RevId: 253132620	2019-06-13 16:50:15 -07:00
Kevin Krakauer	c1cdf18e7b	UDP and TCP raw socket support. PiperOrigin-RevId: 249511348 Change-Id: I34539092cc85032d9473ff4dd308fc29dc9bfd6b	2019-05-22 13:45:15 -07:00
Chris Kuiper	2d8e90b311	Proper cleanup of sockets that used REUSEPORT Fixed a small logic error that broke proper accounting of MultiPortEndpoints. PiperOrigin-RevId: 246502126 Change-Id: I1a7d6ea134f811612e545676212899a3707bc2c2	2019-05-03 07:02:51 -07:00
Chris Kuiper	8972e47a2e	Support reception of multicast data on more than one socket This requires two changes: 1) Support for more than one socket to join a given multicast group. 2) Duplicate delivery of incoming multicast packets to all sockets listening for it. In addition, I tweaked the code (and added a test) to disallow duplicates IP_ADD_MEMBERSHIP calls for the same group and NIC. This is how Linux does it. PiperOrigin-RevId: 246437315 Change-Id: Icad8300b4a8c3f501d9b4cd283bd3beabef88b72	2019-05-02 19:41:00 -07:00
Michael Pratt	4d52a55201	Change copyright notice to "The gVisor Authors" Based on the guidelines at https://opensource.google.com/docs/releasing/authors/. 1. $ rg -l "Google LLC" \| xargs sed -i 's/Google LLC.*/The gVisor Authors./' 2. Manual fixup of "Google Inc" references. 3. Add AUTHORS file. Authors may request to be added to this file. 4. Point netstack AUTHORS to gVisor AUTHORS. Drop CONTRIBUTORS. Fixes #209 PiperOrigin-RevId: 245823212 Change-Id: I64530b24ad021a7d683137459cafc510f5ee1de9	2019-04-29 14:26:23 -07:00
Kevin Krakauer	52a51a8e20	Add a raw socket transport endpoint and use it for raw ICMP sockets. Having raw socket code together will make it easier to add support for other raw network protocols. Currently, only ICMP uses the raw endpoint. However, adding support for other protocols such as UDP shouldn't be much more difficult than adding a few switch cases. PiperOrigin-RevId: 241564875 Change-Id: I77e03adafe4ce0fd29ba2d5dfdc547d2ae8f25bf	2019-04-02 11:13:49 -07:00
Kevin Krakauer	121db29a93	Ping support via IPv4 raw sockets. Broadly, this change: * Enables sockets to be created via `socket(AF_INET, SOCK_RAW, IPPROTO_ICMP)`. * Passes the network-layer (IP) header up the stack to the transport endpoint, which can pass it up to the socket layer. This allows a raw socket to return the entire IP packet to users. * Adds functions to stack.TransportProtocol, stack.Stack, stack.transportDemuxer that enable incoming packets to be delivered to raw endpoints. New raw sockets of other protocols (not ICMP) just need to register with the stack. * Enables ping.endpoint to return IP headers when created via SOCK_RAW. PiperOrigin-RevId: 235993280 Change-Id: I60ed994f5ff18b2cbd79f063a7fdf15d093d845a	2019-02-27 14:31:21 -08:00
Amanda Tait	ea070b9d5f	Implement Broadcast support This change adds support for the SO_BROADCAST socket option in gVisor Netstack. This support includes getsockopt()/setsockopt() functionality for both UDP and TCP endpoints (the latter being a NOOP), dispatching broadcast messages up and down the stack, and route finding/creation for broadcast packets. Finally, a suite of tests have been implemented, exercising this functionality through the Linux syscall API. PiperOrigin-RevId: 234850781 Change-Id: If3e666666917d39f55083741c78314a06defb26c	2019-02-20 12:54:13 -08:00
Andrei Vagin	652d068119	Implement SO_REUSEPORT for TCP and UDP sockets This option allows multiple sockets to be bound to the same port. Incoming packets are distributed to sockets using a hash based on source and destination addresses. This means that all packets from one sender will be received by the same server socket. PiperOrigin-RevId: 227153413 Change-Id: I59b6edda9c2209d5b8968671e9129adb675920cf	2018-12-28 11:27:14 -08:00
Ian Gudger	8fce67af24	Use correct company name in copyright header PiperOrigin-RevId: 217951017 Change-Id: Ie08bf6987f98467d07457bcf35b5f1ff6e43c035	2018-10-19 16:35:11 -07:00
Tamir Duberstein	d689f8422f	Always pass buffer.VectorisedView by value PiperOrigin-RevId: 212757571 Change-Id: I04200df9e45c21eb64951cd2802532fa84afcb1a	2018-09-12 21:57:55 -07:00
Tamir Duberstein	0923bcf06b	Add various statistics PiperOrigin-RevId: 210442599 Change-Id: I9498351f461dc69c77b7f815d526c5693bec8e4a	2018-08-27 15:29:55 -07:00
Nicolas Lacasse	bf0fa09537	Switch netstack licenses to Apache 2.0. Fixes #27 PiperOrigin-RevId: 203825288 Change-Id: Ie9f3a2b2c1e296b026b024f75c07da1a7e118633	2018-07-09 14:04:40 -07:00
Ian Gudger	b4765f782d	Fix warning: redundant if ...; err != nil check, just return error instead. This warning is produced by golint. PiperOrigin-RevId: 195833381 Change-Id: Idd6a7e57e3cfdf00819f2374b19fc113585dc1e1	2018-05-08 09:51:56 -07:00
Googler	d02b74a5dc	Check in gVisor. PiperOrigin-RevId: 194583126 Change-Id: Ica1d8821a90f74e7e745962d71801c598c652463	2018-04-28 01:44:26 -04:00

35 Commits