gvisor

Commit Graph

Author	SHA1	Message	Date
Tamir Duberstein	573e6e4bba	Use tcpip.Subnet in tcpip.Route This is the first step in replacing some of the redundant types with the standard library equivalents. PiperOrigin-RevId: 264706552	2019-08-21 15:31:18 -07:00
Andrei Vagin	3e4102b2ea	netstack: disconnect a unix socket only if the address family is AF_UNSPEC Linux allows calling connect for ANY and the zero port. PiperOrigin-RevId: 263892534	2019-08-16 19:32:14 -07:00
Tamir Duberstein	816a9211e9	netstack: move resumption logic into _state.go `13a98df` rearranged some of this code in a way that broke compilation of the netstack-only export at github.com/google/netstack because _state.go files are not included in that export. This commit moves resumption logic back into *_state.go, fixing the compilation breakage. PiperOrigin-RevId: 263601629	2019-08-15 11:13:46 -07:00
Tamir Duberstein	d81d94ac4c	Replace uintptr with int64 when returning lengths This is in accordance with newer parts of the standard library. PiperOrigin-RevId: 263449916	2019-08-14 16:05:56 -07:00
Bhasker Hariharan	570fb1db6b	Improve SendMsg performance. SendMsg before this change would copy all the data over into a new slice even if the underlying socket could only accept a small amount of data. This is really inefficient with non-blocking sockets and under high throughput where large writes could get ErrWouldBlock or if there was say a timeout associated with the sendmsg() syscall. With this change we delay copying bytes until they are needed and only copy what can potentially be sent/held in the socket buffer, reducing the need to repeatedly copy data over. Also a minor fix to change state to FIN-WAIT-1 when shutdown(..., SHUT_WR) is called instead of when we transmit the actual FIN. Otherwise the socket could remain in CONNECTED state even though the user has called shutdown() on the socket. Updates #627 PiperOrigin-RevId: 263430505	2019-08-14 14:34:27 -07:00
Bhasker Hariharan	5a38eb120a	Add congestion control states to sender. This change just introduces different congestion control states and ensures the sender.state is updated to reflect the current state of the connection. It is not used for any decisions yet but this is required before algorithms like Eiffel/PRR can be implemented. Fixes #394 PiperOrigin-RevId: 262638292	2019-08-09 14:50:30 -07:00
Rahat Mahmood	13a98df49e	netstack: Don't start endpoint goroutines too soon on restore. Endpoint protocol goroutines were previously started as part of loading the endpoint. This is potentially too soon, as resources used by these goroutine may not have been loaded. Protocol goroutines may perform meaningful work as soon as they're started (ex: incoming connect) which can cause them to indirectly access resources that haven't been loaded yet. This CL defers resuming all protocol goroutines until the end of restore. PiperOrigin-RevId: 262409429	2019-08-08 12:33:11 -07:00
Bhasker Hariharan	dfbc0b0a4c	Fix for a panic due to writing to a closed accept channel. This can happen because endpoint.Close() closes the accept channel first and then drains/resets any accepted but not delivered connections. But there can be connections that are connected but not delivered to the channel as the channel was full. But closing the channel can cause these writes to fail with a write to a closed channel. The correct solution is to abort any connections in SYN-RCVD state and drain/abort all completed connections before closing the accept channel. PiperOrigin-RevId: 261951132	2019-08-06 11:01:27 -07:00
Kevin Krakauer	810cc07aab	Plumbing for iptables sockopts. PiperOrigin-RevId: 261413396	2019-08-02 16:26:48 -07:00
Rahat Mahmood	2906dffcdb	Automated rollback of changelist 261191548 PiperOrigin-RevId: 261373749	2019-08-02 12:52:40 -07:00
Rahat Mahmood	79511e8a50	Implement getsockopt(TCP_INFO). Export some readily-available fields for TCP_INFO and stub out the rest. PiperOrigin-RevId: 261191548	2019-08-01 13:58:48 -07:00
Tamir Duberstein	12c256568b	Deduplicate EndpointState.connected some This fixes a bug introduced in cl/251934850 that caused connect-accept-close-connect races to result in the second connect call failing when it should have succeeded. PiperOrigin-RevId: 259584525	2019-07-23 12:10:18 -07:00
Andrei Vagin	eefa817cfd	net/tcp/setsockopt: implement setsockopt(fd, SOL_TCP, TCP_INQ) PiperOrigin-RevId: 258859507	2019-07-18 15:41:04 -07:00
gVisor bot	74dc663bbb	Internal change. PiperOrigin-RevId: 258424489	2019-07-16 13:03:37 -07:00
Bhasker Hariharan	6116473b2f	Stub out support for TCP_MAXSEG. Adds support to set/get the TCP_MAXSEG value but does not really change the segment sizes emitted by netstack or alter the MSS advertised by the endpoint. This is currently being added only to unblock iperf3 on gVisor. Plumbing this correctly requires a bit more work which will come in separate CLs. PiperOrigin-RevId: 257859112	2019-07-12 13:35:17 -07:00
Andrei Vagin	116cac053e	netstack/udp: connect with the AF_UNSPEC address family means disconnect PiperOrigin-RevId: 256433283	2019-07-03 14:19:02 -07:00
Bhasker Hariharan	c1761378a9	Fix the logic for sending zero window updates. Today we have the logic split in two places between endpoint Read() and the worker goroutine which actually sends a zero window. This change makes it so that when a zero window ACK is sent we set a flag in the endpoint which can be read by the endpoint to decide if it should notify the worker to send a nonZeroWindow update. The worker now does not do the check again but instead sends an ACK and flips the flag right away. Similarly today when SO_RECVBUF is set the SetSockOpt call has logic to decide if a zero window update is required. Rather than do that we move the logic to the worker goroutine and it can check the zeroWindow flag and send an update if required. PiperOrigin-RevId: 254505447	2019-06-21 18:31:31 -07:00
Bhasker Hariharan	3d71c627fa	Add support for TCP receive buffer auto tuning. The implementation is similar to linux where we track the number of bytes consumed by the application to grow the receive buffer of a given TCP endpoint. This ensures that the advertised window grows at a reasonable rate to accommodate the sender's rate and prevents large amounts of data being held in stack buffers if the application is not actively reading or not reading fast enough. The original paper that was used to implement the linux receive buffer auto-tuning is available @ https://public.lanl.gov/radiant/pubs/drs/lacsi2001.pdf NOTE: Linux does not implement DRS as defined in that paper, it's just a good reference to understand the solution space. Updates #230 PiperOrigin-RevId: 253168283	2019-06-13 22:28:01 -07:00
Adin Scannell	add40fd6ad	Update canonical repository. This can be merged after: https://github.com/google/gvisor-website/pull/77 or https://github.com/google/gvisor-website/pull/78 PiperOrigin-RevId: 253132620	2019-06-13 16:50:15 -07:00
Bhasker Hariharan	70578806e8	Add support for TCP_CONGESTION socket option. This CL also cleans up the error returned for setting congestion control which was incorrectly returning EINVAL instead of ENOENT. PiperOrigin-RevId: 252889093	2019-06-12 13:35:50 -07:00
Bhasker Hariharan	3933dd5c04	Fixes to listen backlog handling. Changes netstack to conform to current linux behaviour where if the backlog is full then we drop the SYN and do not send a SYN-ACK. Similarly we allow up to backlog connections to be in SYN-RCVD state as long as the backlog is not full. We also now drop a SYN if syn cookies are in use and the backlog for the listening endpoint is full. Added new tests to confirm the behaviour. Also reverted the change to increase the backlog in TcpPortReuseMultiThread syscall test. Fixes #236 PiperOrigin-RevId: 252500462	2019-06-10 15:40:44 -07:00
Rahat Mahmood	2d2831e354	Track and export socket state. This is necessary for implementing network diagnostic interfaces like /proc/net/{tcp,udp,unix} and sock_diag(7). For pass-through endpoints such as hostinet, we obtain the socket state from the backend. For netstack, we add explicit tracking of TCP states. PiperOrigin-RevId: 251934850	2019-06-06 15:04:47 -07:00
Andrei Vagin	a12848ffeb	netstack/tcp: fix calculating the number of outstanding packets In case of GSO, a segment can contain more than one packet and we need to use the pCount() helper to get the number of packets. PiperOrigin-RevId: 251743020	2019-06-05 16:30:45 -07:00
Bhasker Hariharan	e0fb921205	Fix data race in synRcvdState. When checking the length of the acceptedChan we should hold the endpoint mutex otherwise a syn received while the listening socket is being closed can result in a data race where the cleanupLocked routine sets acceptedChan to nil while a handshake goroutine in progress could try and check it at the same time. PiperOrigin-RevId: 251537697	2019-06-04 16:17:24 -07:00
Bhasker Hariharan	bfe3220992	Delete debug log lines left by mistake. Updates #236 PiperOrigin-RevId: 251337915	2019-06-03 17:00:18 -07:00
Bhasker Hariharan	3577a4f691	Disable certain tests that are flaky under race detector. PiperOrigin-RevId: 250976665	2019-05-31 16:19:49 -07:00
Bhasker Hariharan	033f96cc93	Change segment queue limit to be of fixed size. Netstack sets the unprocessed segment queue size to match the receive buffer size. This is not required as this queue only needs to hold enough for a short duration before the endpoint goroutine can process it. Updates #230 PiperOrigin-RevId: 250976323	2019-05-31 16:17:33 -07:00
Bhasker Hariharan	ae26b2c425	Fixes to TCP listen behavior. Netstack listen loop can get stuck if cookies are in-use and the app is slow to accept incoming connections. Further we continue to complete handshake for a connection even if the backlog is full. This creates a problem when a lot of connections come in rapidly and we end up with lots of completed connections just hanging around to be delivered. These fixes change netstack behaviour to mirror what linux does as described in the following article http://veithen.io/2014/01/01/how-tcp-backlog-works-in-linux.html Now when cookies are not in-use Netstack will silently drop the ACK to a SYN-ACK and not complete the handshake if the backlog is full. This will result in the connection staying in a half-complete state. Eventually the sender will retransmit the ACK and if backlog has space we will transition to a connected state and deliver the endpoint. Similarly when cookies are in use we do not try and create an endpoint unless there is space in the accept queue to accept the newly created endpoint. If there is no space then we again silently drop the ACK as we can just recreate it when the ACK is retransmitted by the peer. We also now use the backlog to cap the size of the SYN-RCVD queue for a given endpoint. So at any time there can be N connections in the backlog and N in a SYN-RCVD state if the application is not accepting connections. Any new SYNs will be dropped. This CL also fixes another small bug where we mark a new endpoint which has not completed handshake as connected. We should wait till handshake successfully completes before marking it connected. Updates #236 PiperOrigin-RevId: 250717817	2019-05-30 12:08:41 -07:00
Kevin Krakauer	c1cdf18e7b	UDP and TCP raw socket support. PiperOrigin-RevId: 249511348 Change-Id: I34539092cc85032d9473ff4dd308fc29dc9bfd6b	2019-05-22 13:45:15 -07:00
Googler	f2699b76c8	Support IPv4 fragmentation in netstack Testing: Unit tests and also large ping in Fuchsia OS PiperOrigin-RevId: 246563592 Change-Id: Ia12ab619f64f4be2c8d346ce81341a91724aef95	2019-05-03 13:30:35 -07:00
Bhasker Hariharan	458fe955a7	Implement support for SACK based recovery(RFC 6675). PiperOrigin-RevId: 246536003 Change-Id: I118b745f45040be9c70cb6a1028acdb06c78d8c9	2019-05-03 10:51:18 -07:00
Michael Pratt	4d52a55201	Change copyright notice to "The gVisor Authors" Based on the guidelines at https://opensource.google.com/docs/releasing/authors/. 1. $ rg -l "Google LLC" | xargs sed -i 's/Google LLC.*/The gVisor Authors./' 2. Manual fixup of "Google Inc" references. 3. Add AUTHORS file. Authors may request to be added to this file. 4. Point netstack AUTHORS to gVisor AUTHORS. Drop CONTRIBUTORS. Fixes #209 PiperOrigin-RevId: 245823212 Change-Id: I64530b24ad021a7d683137459cafc510f5ee1de9	2019-04-29 14:26:23 -07:00
Nicolas Lacasse	f4ce43e1f4	Allow and document bug ids in gVisor codebase. PiperOrigin-RevId: 245818639 Change-Id: I03703ef0fb9b6675955637b9fe2776204c545789	2019-04-29 14:04:14 -07:00
Kevin Krakauer	43dff57b87	Make raw sockets a toggleable feature disabled by default. PiperOrigin-RevId: 245511019 Change-Id: Ia9562a301b46458988a6a1f0bbd5f07cbfcb0615	2019-04-26 16:51:46 -07:00
Ben Burkert	56927e5317	tcpip/transport/tcp: read side only shutdown of an endpoint Support shutdown on only the read side of an endpoint. Reads performed after a call to Shutdown with only the ShutdownRead flag will return ErrClosedForReceive without data. Break out the shutdown(2) with SHUT_RD syscall test into two tests. The first tests that no packets are sent when shutting down the read side of a socket. The second tests that, after shutting down the read side of a socket, unread data can still be read, or an EOF if there is no more data to read. Change-Id: I9d7c0a06937909cbb466b7591544a4bcaebb11ce PiperOrigin-RevId: 244459430	2019-04-19 19:29:05 -07:00
Andrei Vagin	4524790ff6	netstack: use a proper network protocol to set gso.L3HdrLen It is possible to create a listening socket which will accept IPv4 and IPv6 connections. In this case, we set IPv6ProtocolNumber for all accepted endpoints, even if they handle IPv4 connections. This means that we can't use endpoint.netProto to set gso.L3HdrLen. PiperOrigin-RevId: 244227948 Change-Id: I5e1863596cb9f3d216febacdb7dc75651882eef1	2019-04-18 11:42:23 -07:00
Bhasker Hariharan	eaac2806ff	Add TCP checksum verification. PiperOrigin-RevId: 242704699 Change-Id: I87db368ca343b3b4bf4f969b17d3aa4ce2f8bd4f	2019-04-09 11:23:47 -07:00
Kevin Krakauer	52a51a8e20	Add a raw socket transport endpoint and use it for raw ICMP sockets. Having raw socket code together will make it easier to add support for other raw network protocols. Currently, only ICMP uses the raw endpoint. However, adding support for other protocols such as UDP shouldn't be much more difficult than adding a few switch cases. PiperOrigin-RevId: 241564875 Change-Id: I77e03adafe4ce0fd29ba2d5dfdc547d2ae8f25bf	2019-04-02 11:13:49 -07:00
Bhasker Hariharan	45c54b1f4e	Fix incorrect checksums in TCP and UDP tests. PiperOrigin-RevId: 241025361 Change-Id: I292e7aea9a4b294b11e4f736e107010d9524586b	2019-03-29 12:05:43 -07:00
Bhasker Hariharan	cc0e96a4bd	Fix Panic in SACKScoreboard.Delete. The panic was caused by modifying the tree while iterating which invalidated the iterator. Also fixes another bug in SACKScoreboard.Insert() which was causing blocks to be merged incorrectly. PiperOrigin-RevId: 240895053 Change-Id: Ia72b8244297962df5c04283346da5226434740af	2019-03-28 18:18:39 -07:00
Andrei Vagin	f4105ac21a	netstack/fdbased: add generic segmentation offload (GSO) support The linux packet socket can handle GSO packets, so we can segment packets to 64K instead of the MTU which is usually 1500. Here are numbers for the nginx-1m test: runsc: 579330.01 [Kbytes/sec] received runsc-gso: 1794121.66 [Kbytes/sec] received runc: 2122139.06 [Kbytes/sec] received and for tcp_benchmark: $ tcp_benchmark --duration 15 --ideal [ 4] 0.0-15.0 sec 86647 MBytes 48456 Mbits/sec $ tcp_benchmark --client --duration 15 --ideal [ 4] 0.0-15.0 sec 2173 MBytes 1214 Mbits/sec $ tcp_benchmark --client --duration 15 --ideal --gso 65536 [ 4] 0.0-15.0 sec 19357 MBytes 10825 Mbits/sec PiperOrigin-RevId: 240809103 Change-Id: I2637f104db28b5d4c64e1e766c610162a195775a	2019-03-28 11:03:41 -07:00
Andrei Vagin	654e878abb	netstack: Don't exclude length when a pseudo-header checksum is calculated This is a preparation for GSO changes (cl/234508902). RELNOTES[gofers]: Refactor checksum code to include length, which it already did, but in a convoluted way. Should be a no-op. PiperOrigin-RevId: 240460794 Change-Id: I537381bc670b5a9f5d70a87aa3eb7252e8f5ace2	2019-03-26 17:15:13 -07:00
Andrei Vagin	9f4e1cb797	netstack: adjust the sequence number after trimming the packet PiperOrigin-RevId: 239417224 Change-Id: I14a9adc31a6330a79a6156c105969cd5f1f63d20	2019-03-20 09:58:10 -07:00
Andrei Vagin	87cce0ec08	netstack: reduce MSS from SYN to account tcp options See: https://tools.ietf.org/html/rfc6691#section-2 PiperOrigin-RevId: 239305632 Change-Id: Ie8eb912a43332e6490045dc95570709c5b81855e	2019-03-19 17:33:20 -07:00
Tamir Duberstein	5496be7c5d	Remove duplicate TCP flag definitions PiperOrigin-RevId: 238467634 Change-Id: If4cd8efff7386fbee1195f051d15549b495910a9	2019-03-14 10:19:21 -07:00
Ian Gudger	56a6128295	Implement IP_MULTICAST_LOOP. IP_MULTICAST_LOOP controls whether or not multicast packets sent on the default route are looped back. In order to implement this switch, support for sending and looping back multicast packets on the default route had to be implemented. For now we only support IPv4 multicast. PiperOrigin-RevId: 237534603 Change-Id: I490ac7ff8e8ebef417c7eb049a919c29d156ac1c	2019-03-08 15:49:17 -08:00
Bhasker Hariharan	1718fdd1a8	Add new retransmissions and recovery related metrics. PiperOrigin-RevId: 236945145 Change-Id: I051760d95154ea5574c8bb6aea526f488af5e07b	2019-03-05 16:41:44 -08:00
Kevin Krakauer	23e66ee96d	Remove unused commit() function argument to Bind. PiperOrigin-RevId: 236926132 Change-Id: I5cf103f22766e6e65a581de780c7bb9ca0fa3181	2019-03-05 14:53:34 -08:00
Kevin Krakauer	121db29a93	Ping support via IPv4 raw sockets. Broadly, this change: * Enables sockets to be created via `socket(AF_INET, SOCK_RAW, IPPROTO_ICMP)`. * Passes the network-layer (IP) header up the stack to the transport endpoint, which can pass it up to the socket layer. This allows a raw socket to return the entire IP packet to users. * Adds functions to stack.TransportProtocol, stack.Stack, stack.transportDemuxer that enable incoming packets to be delivered to raw endpoints. New raw sockets of other protocols (not ICMP) just need to register with the stack. * Enables ping.endpoint to return IP headers when created via SOCK_RAW. PiperOrigin-RevId: 235993280 Change-Id: I60ed994f5ff18b2cbd79f063a7fdf15d093d845a	2019-02-27 14:31:21 -08:00
Bhasker Hariharan	26be25e4ec	Add a SACK scoreboard to TCP endpoints. This change does not make use of SACK information but adds support to track SACK information and store it in the endpoint. The actual SACK based recovery will be in a separate CL. Part of commits to add RFC 6675 support to Netstack. PiperOrigin-RevId: 235612264 Change-Id: I261f94844d7bad5abda803152ce6cc6125a467ff	2019-02-25 15:20:04 -08:00
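The listen-backlog rules described in 3933dd5c04 and ae26b2c425 (drop a SYN when the accept queue is full, cap SYN-RCVD at the backlog, silently drop the final ACK while the accept queue is full) can be modelled in a few lines. This is a hedged sketch: the `listener` type and its methods are hypothetical and are not netstack's data structures, and the SYN-cookie path is left out.

```go
package main

import "fmt"

// listener is a hypothetical model of a listening endpoint, not netstack's
// real type. accepted counts completed connections waiting for accept();
// synRcvd counts handshakes in SYN-RCVD. Both are capped by backlog.
type listener struct {
	backlog  int
	accepted int
	synRcvd  int
}

func (l *listener) acceptQueueFull() bool { return l.accepted >= l.backlog }

// onSYN reports whether a SYN-ACK is sent for an incoming SYN. If the accept
// queue is full, or SYN-RCVD already holds backlog entries, the SYN is
// dropped and the peer is left to retransmit it.
func (l *listener) onSYN() bool {
	if l.acceptQueueFull() || l.synRcvd >= l.backlog {
		return false
	}
	l.synRcvd++
	return true
}

// onFinalACK handles the ACK of our SYN-ACK for a connection admitted by
// onSYN. If the accept queue is full, the ACK is silently dropped and the
// connection stays in SYN-RCVD until the peer retransmits the ACK.
func (l *listener) onFinalACK() bool {
	if l.acceptQueueFull() {
		return false
	}
	l.synRcvd--
	l.accepted++
	return true
}

// accept models the application taking one completed connection.
func (l *listener) accept() bool {
	if l.accepted == 0 {
		return false
	}
	l.accepted--
	return true
}

func main() {
	l := &listener{backlog: 1}
	fmt.Println(l.onSYN())      // true: enters SYN-RCVD
	fmt.Println(l.onSYN())      // false: SYN-RCVD is at the backlog cap
	fmt.Println(l.onFinalACK()) // true: moved to the accept queue
	fmt.Println(l.onSYN())      // false: accept queue full, SYN dropped
	fmt.Println(l.accept())     // true: application drains the queue
	fmt.Println(l.onSYN())      // true: room again
}
```

Dropping rather than rejecting is what lets the peer's own retransmission act as the retry mechanism, so no extra state is kept for connections that cannot yet be delivered.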

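Commit a12848ffeb notes that with GSO one segment can carry several packets, so outstanding data must be counted with a helper such as pCount(). A minimal sketch of that counting, assuming each wire packet carries at most `mss` payload bytes; `packetCount` and `outstanding` are hypothetical names, not netstack's functions.

```go
package main

import "fmt"

// packetCount is a hypothetical helper (not netstack's pCount) returning how
// many wire packets a segment becomes once GSO splits it into mss-sized
// pieces. A segment without payload, such as a bare SYN or FIN, counts as one.
func packetCount(payloadLen, mss int) int {
	if payloadLen == 0 {
		return 1
	}
	return (payloadLen-1)/mss + 1 // ceiling division
}

// outstanding sums packet counts over unacknowledged segments, so that one
// large GSO segment is not mistaken for a single packet when compared
// against a congestion window measured in packets.
func outstanding(segmentLens []int, mss int) int {
	n := 0
	for _, l := range segmentLens {
		n += packetCount(l, mss)
	}
	return n
}

func main() {
	fmt.Println(packetCount(1448, 1448))                  // 1
	fmt.Println(packetCount(1449, 1448))                  // 2
	fmt.Println(outstanding([]int{65000, 1448, 0}, 1448)) // 47
}
```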

130 Commits
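Commit 87cce0ec08 reduces the MSS received in the SYN to account for TCP options, following RFC 6691 section 2: the advertised MSS is computed from fixed-size IP and TCP headers, so a sender that adds options must shrink its payload accordingly. A minimal sketch, assuming the option length per data segment is known; `sendMSS` is a hypothetical helper, not netstack's API.

```go
package main

import "fmt"

// sendMSS is a hypothetical helper, not netstack's API. Following RFC 6691,
// it assumes the peer's advertised MSS excludes TCP options, so the option
// bytes carried on every data segment are subtracted from it.
func sendMSS(advertisedMSS, optionsLen int) int {
	if mss := advertisedMSS - optionsLen; mss > 0 {
		return mss
	}
	return 1 // illustrative floor so the result is always usable
}

func main() {
	// 1460 = 1500-byte Ethernet MTU - 20 (IPv4 header) - 20 (TCP header).
	// 12 = 10-byte timestamp option padded with two NOPs.
	fmt.Println(sendMSS(1460, 12)) // 1448
}
```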