Commit Graph

482 Commits

Author SHA1 Message Date
Ghanan Gowripalan e63db5e7bb Discover default routers from Router Advertisements
This change allows the netstack to do NDP's Router Discovery as outlined by
RFC 4861 section 6.3.4.

Note, this change will not break existing uses of netstack as the default
configuration for the stack options is set in such a way that Router Discovery
will not be performed. See `stack.Options` and `stack.NDPConfigurations` for
more details.

This change introduces 2 options required to take advantage of Router Discovery,
all available under NDPConfigurations:
- HandleRAs: Whether or not NDP RAs are processes
- DiscoverDefaultRouters: Whether or not Router Discovery is performed

Another note: for a NIC to process Router Advertisements, it must not be a
router itself. Currently the netstack does not have per-interface routing
configuration; the routing/forwarding configuration is controlled stack-wide.
Therefore, if the stack is configured to enable forwarding/routing, no Router
Advertisements will be processed.

Tests: Unittest to make sure that Router Discovery and updates to the routing
table only occur if explicitly configured to do so. Unittest to make sure at
max stack.MaxDiscoveredDefaultRouters discovered default routers are remembered.
PiperOrigin-RevId: 278965143
2019-11-06 16:29:58 -08:00
Kevin Krakauer e1b21f3c8c Use PacketBuffers, rather than VectorisedViews, in netstack.
PacketBuffers are analogous to Linux's sk_buff. They hold all information about
a packet, headers, and payload. This is important for:

* iptables to access various headers of packets
* Preventing the clutter of passing different net and link headers along with
  VectorisedViews to packet handling functions.

This change only affects the incoming packet path, and a future change will
change the outgoing path.

Benchmark               Regular         PacketBufferPtr  PacketBufferConcrete
--------------------------------------------------------------------------------
BM_Recvmsg             400.715MB/s      373.676MB/s      396.276MB/s
BM_Sendmsg             361.832MB/s      333.003MB/s      335.571MB/s
BM_Recvfrom            453.336MB/s      393.321MB/s      381.650MB/s
BM_Sendto              378.052MB/s      372.134MB/s      341.342MB/s
BM_SendmsgTCP/0/1k     353.711MB/s      316.216MB/s      322.747MB/s
BM_SendmsgTCP/0/2k     600.681MB/s      588.776MB/s      565.050MB/s
BM_SendmsgTCP/0/4k     995.301MB/s      888.808MB/s      941.888MB/s
BM_SendmsgTCP/0/8k     1.517GB/s        1.274GB/s        1.345GB/s
BM_SendmsgTCP/0/16k    1.872GB/s        1.586GB/s        1.698GB/s
BM_SendmsgTCP/0/32k    1.017GB/s        1.020GB/s        1.133GB/s
BM_SendmsgTCP/0/64k    475.626MB/s      584.587MB/s      627.027MB/s
BM_SendmsgTCP/0/128k   416.371MB/s      503.434MB/s      409.850MB/s
BM_SendmsgTCP/0/256k   323.449MB/s      449.599MB/s      388.852MB/s
BM_SendmsgTCP/0/512k   243.992MB/s      267.676MB/s      314.474MB/s
BM_SendmsgTCP/0/1M     95.138MB/s       95.874MB/s       95.417MB/s
BM_SendmsgTCP/0/2M     96.261MB/s       94.977MB/s       96.005MB/s
BM_SendmsgTCP/0/4M     96.512MB/s       95.978MB/s       95.370MB/s
BM_SendmsgTCP/0/8M     95.603MB/s       95.541MB/s       94.935MB/s
BM_SendmsgTCP/0/16M    94.598MB/s       94.696MB/s       94.521MB/s
BM_SendmsgTCP/0/32M    94.006MB/s       94.671MB/s       94.768MB/s
BM_SendmsgTCP/0/64M    94.133MB/s       94.333MB/s       94.746MB/s
BM_SendmsgTCP/0/128M   93.615MB/s       93.497MB/s       93.573MB/s
BM_SendmsgTCP/0/256M   93.241MB/s       95.100MB/s       93.272MB/s
BM_SendmsgTCP/1/1k     303.644MB/s      316.074MB/s      308.430MB/s
BM_SendmsgTCP/1/2k     537.093MB/s      584.962MB/s      529.020MB/s
BM_SendmsgTCP/1/4k     882.362MB/s      939.087MB/s      892.285MB/s
BM_SendmsgTCP/1/8k     1.272GB/s        1.394GB/s        1.296GB/s
BM_SendmsgTCP/1/16k    1.802GB/s        2.019GB/s        1.830GB/s
BM_SendmsgTCP/1/32k    2.084GB/s        2.173GB/s        2.156GB/s
BM_SendmsgTCP/1/64k    2.515GB/s        2.463GB/s        2.473GB/s
BM_SendmsgTCP/1/128k   2.811GB/s        3.004GB/s        2.946GB/s
BM_SendmsgTCP/1/256k   3.008GB/s        3.159GB/s        3.171GB/s
BM_SendmsgTCP/1/512k   2.980GB/s        3.150GB/s        3.126GB/s
BM_SendmsgTCP/1/1M     2.165GB/s        2.233GB/s        2.163GB/s
BM_SendmsgTCP/1/2M     2.370GB/s        2.219GB/s        2.453GB/s
BM_SendmsgTCP/1/4M     2.005GB/s        2.091GB/s        2.214GB/s
BM_SendmsgTCP/1/8M     2.111GB/s        2.013GB/s        2.109GB/s
BM_SendmsgTCP/1/16M    1.902GB/s        1.868GB/s        1.897GB/s
BM_SendmsgTCP/1/32M    1.655GB/s        1.665GB/s        1.635GB/s
BM_SendmsgTCP/1/64M    1.575GB/s        1.547GB/s        1.575GB/s
BM_SendmsgTCP/1/128M   1.524GB/s        1.584GB/s        1.580GB/s
BM_SendmsgTCP/1/256M   1.579GB/s        1.607GB/s        1.593GB/s

PiperOrigin-RevId: 278940079
2019-11-06 14:25:59 -08:00
Ghanan Gowripalan d0d89ceedd Send a TCP RST in response to a TCP SYN-ACK on a listening endpoint
This change better follows what is outlined in RFC 793 section 3.4 figure 12
where a listening socket should not accept a SYN-ACK segment in response to a
(potentially) old SYN segment.

Tests: Test that checks the TCP RST segment sent in response to a TCP SYN-ACK
segment received on a listening TCP endpoint.
PiperOrigin-RevId: 278893114
2019-11-06 10:44:20 -08:00
Ghanan Gowripalan a824b48cea Validate incoming NDP Router Advertisements, as per RFC 4861 section 6.1.2
This change validates incoming NDP Router Advertisements as per RFC 4861 section
6.1.2. It also includes the skeleton to handle Router Advertiements that arrive
on some NIC.

Tests: Unittest to make sure only valid NDP Router Advertisements are received/
not dropped.
PiperOrigin-RevId: 278891972
2019-11-06 10:39:29 -08:00
Kevin Krakauer 3246040447 Deep copy dispatcher views.
When VectorisedViews were passed up the stack from packet_dispatchers, we were
passing a sub-slice of the dispatcher's views fields. The dispatchers then
immediately set those views to nil.

This wasn't caught before because every implementer copied the data in these
views before returning.

PiperOrigin-RevId: 277615351
2019-10-30 17:12:57 -07:00
Andrei Vagin db37483cb6 Store endpoints inside multiPortEndpoint in a sorted order
It is required to guarantee the same order of endpoints after save/restore.

PiperOrigin-RevId: 277598665
2019-10-30 15:33:41 -07:00
Ian Gudger dc21c5ca16 Add Close and Wait methods to stack.
Link endpoints still don't have a unified way to be requested to stop.

Updates #837

PiperOrigin-RevId: 277398952
2019-10-29 17:22:32 -07:00
Ian Gudger a2c51efe36 Add endpoint tracking to the stack.
In the future this will replace DanglingEndpoints. DanglingEndpoints must be
kept for now due to issues with save/restore.

This is arguably a cleaner design and allows the stack to know which transport
endpoints might still be using its link endpoints.

Updates #837

PiperOrigin-RevId: 277386633
2019-10-29 16:14:51 -07:00
Michael Pratt c0b8fd4b6a Update build tags to allow Go 1.14
Currently there are no ABI changes. We should check again closer to release.

PiperOrigin-RevId: 277349744
2019-10-29 13:18:16 -07:00
Ian Gudger 7d80e85835 Allow waiting for Endpoint worker goroutines to finish.
Updates #837

PiperOrigin-RevId: 277325162
2019-10-29 11:32:48 -07:00
Ghanan Gowripalan 41e2df1bde Support iterating an NDP options buffer.
This change helps support iterating over an NDP options buffer so that
implementations can handle all the NDP options present in an NDP packet.

Note, this change does not yet actually handle these options, it just provides
the tools to do so (in preparation for NDP's Prefix, Parameter, and a complete
implementation of Neighbor Discovery).

Tests: Unittests to make sure we can iterate over a valid NDP options buffer
that may contain multiple options. Also tests to check an iterator before
using it to see if the NDP options buffer is malformed.
PiperOrigin-RevId: 277312487
2019-10-29 10:30:21 -07:00
Ghanan Gowripalan 0864549ecc Use the user supplied TCP MSS when creating a new active socket
This change supports using a user supplied TCP MSS for new active TCP
connections. Note, the user supplied MSS must be less than or equal to the
maximum possible MSS for a TCP connection's route. If it is greater than the
maximum possible MSS, the maximum possible MSS will be used as the connection's
MSS instead.

This change does not use this user supplied MSS for connections accepted from
listening sockets - that will come in a later change.

Test: Test that outgoing TCP SYN segments contain a TCP MSS option with the user
supplied MSS if it is not greater than the maximum possible MSS for the route.
PiperOrigin-RevId: 277185125
2019-10-28 18:20:36 -07:00
Ghanan Gowripalan 5a421058a0 Validate the checksum for incoming ICMPv6 packets
This change validates the ICMPv6 checksum field before further processing an
ICMPv6 packet.

Tests: Unittests to make sure that only ICMPv6 packets with a valid checksum
are accepted/processed. Existing tests using checker.ICMPv6 now also check the
ICMPv6 checksum field.
PiperOrigin-RevId: 276779148
2019-10-25 16:06:55 -07:00
Ian Gudger 8f029b3f82 Convert DelayOption to the newer/faster SockOpt int type.
DelayOption is set on all new endpoints in gVisor.

PiperOrigin-RevId: 276746791
2019-10-25 13:15:34 -07:00
Ghanan Gowripalan 27e896f290 Add a type to represent the NDP Prefix Information option.
This change is in preparation for NDP Prefix Discovery and SLAAC where the stack
will need to handle NDP Prefix Information options.

Tests: Test that given an NDP Prefix Information option buffer, correct values
are returned by the field getters.
PiperOrigin-RevId: 276594592
2019-10-24 16:53:08 -07:00
Ghanan Gowripalan e50a1f5739 Remove the amss field from tcpip.tcp.handshake as it was unused
The amss field in the tcpip.tcp.handshake was not used anywhere. Removed it to
not cause confusion with the amss field in the tcpip.tcp.endpoint struct, which
was documented to be used (and is actually being used) for the same purpose.

PiperOrigin-RevId: 276577088
2019-10-24 15:23:43 -07:00
Ghanan Gowripalan f034790ad8 Use interface-specific NDP configurations instead of the stack-wide default.
This change makes it so that NDP work is done using the per-interface NDP
configurations instead of the stack-wide default NDP configurations to correctly
implement RFC 4861 section 6.3.2 (note here, a host is a single NIC operating
as a host device), and RFC 4862 section 5.1.

Test: Test that we can set NDP configurations on a per-interface basis without
affecting the configurations of other interfaces or the stack-wide default. Also
make sure that after the configurations are updated, the updated configurations
are used for NDP processes (e.g. Duplicate Address Detection).
PiperOrigin-RevId: 276525661
2019-10-24 11:09:18 -07:00
gVisor bot 6d4d9564e3 Merge pull request #641 from tanjianfeng:master
PiperOrigin-RevId: 276380008
2019-10-23 16:55:15 -07:00
Ghanan Gowripalan de3dbf8a09 Inform netstack integrator when Duplicate Address Detection completes
This change introduces a new interface, stack.NDPDispatcher. It can be
implemented by the netstack integrator to receive NDP related events. As of this
change, only DAD related events are supported.

Tests: Existing tests were modified to use the NDPDispatcher's DAD events for
DAD tests where it needed to wait for DAD completing (failing and resolving).
PiperOrigin-RevId: 276338733
2019-10-23 13:26:35 -07:00
Ghanan Gowripalan 515e0558d4 Add a type to represent the NDP Router Advertisement message.
This change is in preparation for NDP Router Discovery where the stack will need
to handle NDP Router Advertisments.

Tests: Test that given an NDP Router Advertisement buffer (body of an ICMPv6
packet, correct values are returned by the field getters).
PiperOrigin-RevId: 276146817
2019-10-22 14:41:51 -07:00
Ghanan Gowripalan c356fe2ebb Respect new PrimaryEndpointBehavior when addresses gets promoted to permanent
This change makes sure that when an address which is already known by a NIC and
has kind = permanentExpired gets promoted to permanent, the new
PrimaryEndpointBehavior is respected.

PiperOrigin-RevId: 276136317
2019-10-22 13:54:33 -07:00
Andrei Vagin 8720bd643e netstack/tcp: software segmentation offload
Right now, we send each tcp packet separately, we call one system
call per-packet. This patch allows to generate multiple tcp packets
and send them by sendmmsg.

The arguable part of this CL is a way how to handle multiple headers.
This CL adds the next field to the Prepandable buffer.

Nginx test results:

Server Software:        nginx/1.15.9
Server Hostname:        10.138.0.2
Server Port:            8080

Document Path:          /10m.txt
Document Length:        10485760 bytes

w/o gso:
Concurrency Level:      5
Time taken for tests:   5.491 seconds
Complete requests:      100
Failed requests:        0
Total transferred:      1048600200 bytes
HTML transferred:       1048576000 bytes
Requests per second:    18.21 [#/sec] (mean)
Time per request:       274.525 [ms] (mean)
Time per request:       54.905 [ms] (mean, across all concurrent requests)
Transfer rate:          186508.03 [Kbytes/sec] received

sw-gso:

Concurrency Level:      5
Time taken for tests:   3.852 seconds
Complete requests:      100
Failed requests:        0
Total transferred:      1048600200 bytes
HTML transferred:       1048576000 bytes
Requests per second:    25.96 [#/sec] (mean)
Time per request:       192.576 [ms] (mean)
Time per request:       38.515 [ms] (mean, across all concurrent requests)
Transfer rate:          265874.92 [Kbytes/sec] received

w/o gso:
$ ./tcp_benchmark --client --duration 15  --ideal
[SUM]  0.0-15.1 sec  2.20 GBytes  1.25 Gbits/sec

software gso:
$ tcp_benchmark --client --duration 15  --ideal --gso $((1<<16)) --swgso
[SUM]  0.0-15.1 sec  3.99 GBytes  2.26 Gbits/sec

PiperOrigin-RevId: 276112677
2019-10-22 11:55:56 -07:00
Ghanan Gowripalan fb69de696b Auto-generate an IPv6 link-local address based on the NIC's MAC Address.
This change adds support for optionally auto-generating an IPv6 link-local
address based on the NIC's MAC Address on NIC enable.

Note, this change will not break existing uses of netstack as the default
configuration for the stack options is set in such a way that a link-local
address will not be auto-generated unless the stack is explicitly configured.
See `stack.Options` for more details. Specifically, see
`stack.Options.AutoGenIPv6LinkLocal`.

Tests: Tests to make sure that the IPb6 link-local address is only
auto-generated if the stack is specifically configured to do so. Also tests to
make sure that an auto-generated address goes through the DAD process.
PiperOrigin-RevId: 276059813
2019-10-22 07:26:54 -07:00
Kevin Krakauer 12235d533a AF_PACKET support for netstack (aka epsocket).
Like (AF_INET, SOCK_RAW) sockets, AF_PACKET sockets require CAP_NET_RAW. With
runsc, you'll need to pass `--net-raw=true` to enable them.

Binding isn't supported yet.

PiperOrigin-RevId: 275909366
2019-10-21 13:23:18 -07:00
Tamir Duberstein 51538c973e Store primary endpoints in a slice
There's no need for a linked list here.

PiperOrigin-RevId: 275565920
2019-10-18 16:14:09 -07:00
Mithun Iyer 487d3b2358 Fix typo while initializing protocol for UDP endpoints.
Fixes #763

PiperOrigin-RevId: 275563222
2019-10-18 16:00:11 -07:00
Tamir Duberstein 4e6f3a0c71 Remove restrictions on the sending address
It is quite legal to send from the ANY address (it is required for
DHCP). I can't figure out why the broadcast address was included here,
so removing that as well.

PiperOrigin-RevId: 275541954
2019-10-18 14:10:30 -07:00
Ghanan Gowripalan 962aa235de NDP Neighbor Solicitations sent during DAD must have an IP hop limit of 255
NDP Neighbor Solicitations sent during Duplicate Address Detection must have an
IP hop limit of 255, as all NDP Neighbor Solicitations should have.

Test: Test that DAD messages have the IPv6 hop limit field set to 255.
PiperOrigin-RevId: 275321680
2019-10-17 13:06:15 -07:00
Ghanan Gowripalan 06ed9e329d Do Duplicate Address Detection on permanent IPv6 addresses.
This change adds support for Duplicate Address Detection on IPv6 addresses
as defined by RFC 4862 section 5.4.

Note, this change will not break existing uses of netstack as the default
configuration for the stack options is set in such a way that DAD will not be
performed. See `stack.Options` and `stack.NDPConfigurations` for more details.

Tests: Tests to make sure that the DAD process properly resolves or fails.
That is, tests make sure that DAD resolves only if:
  - No other node is performing DAD for the same address
  - No other node owns the same address
PiperOrigin-RevId: 275189471
2019-10-16 22:54:45 -07:00
Bhasker Hariharan f98c3ee32c Remove panic when reassembly fails.
Reassembly can fail due to an invalid sequence of fragments
being received. eg. Multiple fragments with same id which
claim to be the last one by setting the more flag to 0 etc.
It's safer to just drop the reassembler and increment a metric
than to panic when reassembly fails.

PiperOrigin-RevId: 274920901
2019-10-15 17:04:44 -07:00
Tamir Duberstein db1ca5c786 Set NDP hop limit in accordance with RFC 4861
...and do not populate link address cache at dispatch. This partially
reverts 313c767b00, which caused malformed
packets (e.g. NDP Neighbor Adverts with incorrect hop limit values) to
populate the address cache. In particular, this masked a bug that was
introduced to the Neighbor Advert generation code in
7c1587e340.

PiperOrigin-RevId: 274865182
2019-10-15 12:43:25 -07:00
Jianfeng Tan d277bfba27 epsocket: support /proc/net/snmp
Netstack has its own stats, we use this to fill /proc/net/snmp.

Note that some metrics are not recorded in Netstack, which will be shown
as 0 in the proc file.

Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com>
Change-Id: Ie0089184507d16f49bc0057b4b0482094417ebe1
2019-10-15 16:38:41 +00:00
Jianfeng Tan aee2c93366 netstack: add counters for tcp CurrEstab and EstabResets
Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com>
2019-10-15 16:38:40 +00:00
gVisor bot bfa0bb24dd Internal change.
PiperOrigin-RevId: 274700093
2019-10-14 17:46:52 -07:00
Kevin Krakauer 2302afb53d Reorder BUILD license and load functions in netstack.
PiperOrigin-RevId: 274672346
2019-10-14 15:21:59 -07:00
Bhasker Hariharan a296425970 Use a different fanoutID for each new fdbased endpoint.
PiperOrigin-RevId: 274638272
2019-10-14 13:10:16 -07:00
Bhasker Hariharan c7e901f47a Fix bugs in fragment handling.
Strengthen the header.IPv4.IsValid check to correctly check
for IHL/TotalLength fields. Also add a check to make sure
fragmentOffsets + size of the fragment do not cause a wrap
around for the end of the fragment.

PiperOrigin-RevId: 274049313
2019-10-10 15:14:55 -07:00
gVisor bot bf870c1a42 Internal change.
PiperOrigin-RevId: 273861936
2019-10-09 17:56:05 -07:00
Ian Gudger 7c1587e340 Implement IP_TTL.
Also change the default TTL to 64 to match Linux.

PiperOrigin-RevId: 273430341
2019-10-07 19:29:51 -07:00
Chris Kuiper 4874525161 Implement proper local broadcast behavior
The behavior for sending and receiving local broadcast (255.255.255.255)
traffic is as follows:

Outgoing
--------
* A broadcast packet sent on a socket that is bound to an interface goes out
  that interface
* A broadcast packet sent on an unbound socket follows the route table to
  select the outgoing interface
  + if an explicit route entry exists for 255.255.255.255/32, use that one
  + else use the default route
* Broadcast packets are looped back and delivered following the rules for
  incoming packets (see next). This is the same behavior as for multicast
  packets, except that it cannot be disabled via sockopt.

Incoming
--------
* Sockets wishing to receive broadcast packets must bind to either INADDR_ANY
  (0.0.0.0) or INADDR_BROADCAST (255.255.255.255). No other socket receives
  broadcast packets.
* Broadcast packets are multiplexed to all sockets matching it. This is the
  same behavior as for multicast packets.
* A socket can bind to 255.255.255.255:<port> and then receive its own
  broadcast packets sent to 255.255.255.255:<port>

In addition, this change implicitly fixes an issue with multicast reception. If
two sockets want to receive a given multicast stream and one is bound to ANY
while the other is bound to the multicast address, only one of them will
receive the traffic.

PiperOrigin-RevId: 272792377
2019-10-03 19:31:35 -07:00
Bhasker Hariharan bcbb3ef317 Add a Stringer implementation to PacketDispatchMode
PiperOrigin-RevId: 272083936
2019-09-30 15:52:55 -07:00
Bhasker Hariharan 61f6fbd0ce Fix bugs in PickEphemeralPort for TCP.
Netstack always picks a random start point everytime PickEphemeralPort
is called. While this is required for UDP so that DNS requests go
out through a randomized set of ports it is not required for TCP. Infact
Linux explicitly hashes the (srcip, dstip, dstport) and a one time secret
initialized at start of the application to get a random offset. But to
ensure it doesn't start from the same point on every scan it uses a static
hint that is incremented by 2 in every call to pick ephemeral ports.

The reason for 2 is Linux seems to split the port ranges where active connects
seem to use even ones while odd ones are used by listening sockets.

This CL implements a similar strategy where we use a hash + hint to generate
the offset to start the search for a free Ephemeral port.

This ensures that we cycle through the available port space in order for
repeated connects to the same destination and significantly reduces the
chance of picking a recently released port.

PiperOrigin-RevId: 272058370
2019-09-30 13:55:22 -07:00
gVisor bot abbee5615f Implement SO_BINDTODEVICE sockopt
PiperOrigin-RevId: 271644926
2019-09-27 14:14:04 -07:00
Kevin Krakauer 59ccbb1044 Remove centralized registration of protocols.
Also removes the need for protocol names.

PiperOrigin-RevId: 271186030
2019-09-25 12:57:05 -07:00
Chris Kuiper 6704d625ef Return only primary addresses in Stack.NICInfo()
Non-primary addresses are used for endpoints created to accept multicast and
broadcast packets, as well as "helper" endpoints (0.0.0.0) that allow sending
packets when no proper address has been assigned yet (e.g., for DHCP). These
addresses are not real addresses from a user point of view and should not be
part of the NICInfo() value. Also see b/127321246 for more info.

This switches NICInfo() to call a new NIC.PrimaryAddresses() function. To still
allow an option to get all addresses (mostly for testing) I added
Stack.GetAllAddresses() and NIC.AllAddresses().

In addition, the return value for GetMainNICAddress() was changed for the case
where the NIC has no primary address. Instead of returning an error here,
it now returns an empty AddressWithPrefix() value. The rational for this
change is that it is a valid case for a NIC to have no primary addresses.

Lastly, I refactored the code based on the new additions.

PiperOrigin-RevId: 270971764
2019-09-24 13:21:20 -07:00
Tamir Duberstein bbaaa1fcc2 Simplify ICMPRateLimiter
https://github.com/golang/time/commit/c4c64ca added SetBurst upstream.

PiperOrigin-RevId: 270925077
2019-09-24 09:50:51 -07:00
Andrei Vagin 03ee55cc62 netstack: convert more socket options to {Set,Get}SockOptInt
PiperOrigin-RevId: 270763208
2019-09-23 14:39:14 -07:00
Ian Gudger 002f1d4aae Allow waiting for LinkEndpoint worker goroutines to finish.
Previously, the only safe way to use an fdbased endpoint was to leak the FD.
This change makes it possible to safely close the FD.

This is the first step towards having stoppable stacks.

Updates #837

PiperOrigin-RevId: 270346582
2019-09-20 14:10:02 -07:00
Ghanan Gowripalan 60fe8719e1 Automated rollback of changelist 268047073
PiperOrigin-RevId: 269658971
2019-09-17 14:47:09 -07:00
Ian Gudger 747320a7aa Update remaining users of LinkEndpoints to not refer to them as an ID.
PiperOrigin-RevId: 269614517
2019-09-17 11:31:00 -07:00
Adin Scannell 7c6ab6a219 Implement splice methods for pipes and sockets.
This also allows the tee(2) implementation to be enabled, since dup can now be
properly supported via WriteTo.

Note that this change necessitated some minor restructoring with the
fs.FileOperations splice methods. If the *fs.File is passed through directly,
then only public API methods are accessible, which will deadlock immediately
since the locking is already done by fs.Splice. Instead, we pass through an
abstract io.Reader or io.Writer, which elide locks and use the underlying
fs.FileOperations directly.

PiperOrigin-RevId: 268805207
2019-09-12 17:43:27 -07:00
Michael Pratt df5d377521 Remove go_test from go_stateify and go_marshal
They are no-ops, so the standard rule works fine.

PiperOrigin-RevId: 268776264
2019-09-12 15:10:17 -07:00
Ghanan Gowripalan 857940d30d Automated rollback of changelist 268047073
PiperOrigin-RevId: 268757842
2019-09-12 13:52:25 -07:00
Ian Gudger 9dfcd8b09f Fix ephemeral port leak.
Fix a bug where udp.(*endpoint).Disconnect [accessible in gVisor via
epsocket.(*SocketOperations).Connect with AF_UNSPEC] would leak a port
reservation if the socket/endpoint had an ephemeral port assigned to it.

glibc's getaddrinfo uses connect with AF_UNSPEC, causing each call of
getaddrinfo to leak a port. Call getaddrinfo too many times and you run out of
ports (shows up as connect returning EAGAIN and getaddrinfo returning
EAI_NONAME "Name or service not known").

PiperOrigin-RevId: 268071160
2019-09-09 14:02:00 -07:00
Ghanan Gowripalan a8943325db Join IPv6 all-nodes and solicited-node multicast addresses where appropriate.
The IPv6 all-nodes multicast address will be joined on NIC enable, and the
appropriate IPv6 solicited-node multicast address will be joined when IPv6
addresses are added.

Tests: Test receiving packets destined to the IPv6 link-local all-nodes
multicast address and the IPv6 solicted node address of an added IPv6 address.
PiperOrigin-RevId: 268047073
2019-09-09 12:06:06 -07:00
Ian Gudger fe1f521077 Remove reundant global tcpip.LinkEndpointID.
PiperOrigin-RevId: 267709597
2019-09-06 18:01:14 -07:00
Bhasker Hariharan 3dc3cffb2d Fix RST generation bugs.
There are a few cases addressed by this change

- We no longer generate a RST in response to a RST packet.

- When we receive a RST we cleanup and release all reservations immediately as
  the connection is now aborted.

- An ACK received by a listening socket generates a RST when SYN cookies are not
  in-use. The only reason an ACK should land at the listening socket is if we
  are using SYN cookies otherwise the goroutine for the handshake in progress
  should have gotten the packet and it should never have arrived at the
  listening endpoint.

- Also fixes the error returned when a connection times out due to a
  Keepalive timer expiration from ECONNRESET to a ETIMEDOUT.

PiperOrigin-RevId: 267238427
2019-09-04 14:59:53 -07:00
Chris Kuiper 7bf1d426d5 Handle subnet and broadcast addresses correctly with NIC.subnets
This also renames "subnet" to "addressRange" to avoid any more confusion with
an interface IP's subnet.

Lastly, this also removes the Stack.ContainsSubnet(..) API since it isn't used
by anyone. Plus the same information can be obtained from
Stack.NICAddressRanges().

PiperOrigin-RevId: 267229843
2019-09-04 14:19:32 -07:00
Ghanan Gowripalan 144127e5e1 Validate IPv6 Hop Limit field for received NDP packets
Make sure that NDP packets are only received if their IP header's hop limit
field is set to 255, as per RFC 4861.

PiperOrigin-RevId: 267061457
2019-09-03 18:43:12 -07:00
Bhasker Hariharan 3789c34b22 Make UDP traceroute work.
Adds support to generate Port Unreachable messages for UDP
datagrams received on a port for which there is no valid
endpoint.

Fixes #703

PiperOrigin-RevId: 267034418
2019-09-03 16:01:17 -07:00
Haibo Xu fa151e3971 Remove duplicated file in pkg/tcpip/link/rawfile.
The blockingpoll_unsafe.go was copied to blockingpoll_noyield_unsafe.go
during merging commit 7206202bb9. If it still stay here, it would
cause build errors on non-amd64 platform.

ERROR:
pkg/tcpip/link/rawfile/BUILD:5:1:
GoCompilePkg
pkg/tcpip/link/rawfile.a
failed (Exit 1) builder failed: error executing command
bazel-out/host/bin/external/go_sdk/builder compilepkg -sdk
external/go_sdk -installsuffix linux_arm64 -src
pkg/tcpip/link/rawfile/blockingpoll_noyield_unsafe.go -src ...
(remaining 33 argument(s) skipped)

Use --sandbox_debug to see verbose messages from the sandbox
compilepkg: error running subcommand: exit status 2
pkg/tcpip/link/rawfile/blockingpoll_yield_unsafe.go:35:6:
BlockingPoll redeclared in this block
        previous declaration at
pkg/tcpip/link/rawfile/blockingpoll_unsafe.go:26:78
Target //pkg/tcpip/link/rawfile:rawfile failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 25.531s, Critical Path: 21.08s
INFO: 262 processes: 262 linux-sandbox.
FAILED: Build did NOT complete successfully

Signed-off-by: Haibo Xu <haibo.xu@arm.com>
Change-Id: I4e21f82984225d0aa173de456f7a7c66053a053e
2019-09-02 02:49:41 +00:00
Chris Kuiper afbdf2f212 Fix data race accessing referencedNetworkEndpoint.kind
Wrapping "kind" into atomic access functions.

Fixes #789

PiperOrigin-RevId: 266485501
2019-08-30 17:23:53 -07:00
Rahat Mahmood 863e11ac4d Implement /proc/net/udp.
PiperOrigin-RevId: 266229756
2019-08-29 14:30:41 -07:00
Tamir Duberstein 24ecce5dbf Export generated linkAddrEntryEntry
PiperOrigin-RevId: 266000128
2019-08-28 14:56:33 -07:00
Tamir Duberstein 313c767b00 Populate link address cache at dispatch
This allows the stack to learn remote link addresses on incoming
packets, reducing the need to ARP to send responses.

This also reduces the number of round trips to the system clock,
since that may also prove to be performance-sensitive.

Fixes #739.

PiperOrigin-RevId: 265815816
2019-08-27 18:54:56 -07:00
Rahat Mahmood 1fdefd41c5 netstack/tcp: Add LastAck transition.
Add missing state transition to LastAck, which should happen when the
endpoint has already recieved a FIN from the remote side, and is
sending its own FIN.

PiperOrigin-RevId: 265568314
2019-08-26 16:39:13 -07:00
gVisor bot 7206202bb9 Merge pull request #696 from xiaobo55x:tcpip_link
PiperOrigin-RevId: 265534854
2019-08-26 14:03:30 -07:00
Chris Kuiper ac2200b8a9 Prevent a network endpoint to send/rcv if its address was removed
This addresses the problem where an endpoint has its address removed but still
has outstanding references held by routes used in connected TCP/UDP sockets
which prevent the removal of the endpoint.

The fix adds a new "expired" flag to the referenced network endpoint, which is
set when an endpoint has its address removed. Incoming packets are not
delivered to an expired endpoint (unless in promiscuous mode), while sending
outgoing packets triggers an error to the caller (unless in spoofing mode).

In addition, a few helper functions were added to stack_test.go to reduce
code duplications.

PiperOrigin-RevId: 265514326
2019-08-26 12:29:47 -07:00
Tamir Duberstein e75a12e89d Implement fmt.Stringer on Route by value
This is more convenient, since it implements the interface for both
value and pointer.

PiperOrigin-RevId: 265086510
2019-08-23 10:44:11 -07:00
Chris Kuiper 8d9276ed56 Support binding to multicast and broadcast addresses
This fixes the issue of not being able to bind to either a multicast or
broadcast address as well as to send and receive data from it. The way to solve
this is to treat these addresses similar to the ANY address and register their
transport endpoint ID with the global stack's demuxer rather than the NIC's.
That way there is no need to require an endpoint with that multicast or
broadcast address. The stack's demuxer is in fact the only correct one to use,
because neither broadcast- nor multicast-bound sockets care which NIC a
packet was received on (for multicast a join is still needed to receive packets
on a NIC).

I also took the liberty of refactoring udp_test.go to consolidate a lot of
duplicate code and make it easier to create repetitive tests that test the same
feature for a variety of packet and socket types. For this purpose I created a
"flowType" that represents two things: 1) the type of packet being sent or
received and 2) the type of socket used for the test. E.g., a "multicastV4in6"
flow represents a V4-mapped multicast packet run through a V6-dual socket.

This allows writing significantly simpler tests. A nice example is testTTL().

PiperOrigin-RevId: 264766909
2019-08-21 22:54:25 -07:00
Tamir Duberstein 573e6e4bba Use tcpip.Subnet in tcpip.Route
This is the first step in replacing some of the redundant types with the
standard library equivalents.

PiperOrigin-RevId: 264706552
2019-08-21 15:31:18 -07:00
Chris Kuiper 7e79ca0225 Add tcpip.Route.String and tcpip.AddressMask.Prefix
PiperOrigin-RevId: 264544163
2019-08-20 23:28:52 -07:00
gVisor bot 3ffbdffd7e Internal change.
PiperOrigin-RevId: 264218306
2019-08-19 12:43:22 -07:00
Andrei Vagin 3e4102b2ea netstack: disconnect an unix socket only if the address family is AF_UNSPEC
Linux allows to call connect for ANY and the zero port.

PiperOrigin-RevId: 263892534
2019-08-16 19:32:14 -07:00
Chris Kuiper f7114e0a27 Add subnet checking to NIC.findEndpoint and consolidate with NIC.getRef
This adds the same logic to NIC.findEndpoint that is already done in
NIC.getRef. Since this makes the two functions very similar they were combined
into one with the originals being wrappers.

PiperOrigin-RevId: 263864708
2019-08-16 15:58:58 -07:00
Tamir Duberstein fe74bba2bd Don't dereference errors passed to panic()
These errors are always pointers; there's no sense in dereferencing them
in the panic call. Changed one false positive for clarity.

PiperOrigin-RevId: 263611579
2019-08-15 11:58:16 -07:00
Tamir Duberstein 816a9211e9 netstack: move resumption logic into *_state.go
13a98df rearranged some of this code in a way that broke compilation of
the netstack-only export at github.com/google/netstack because
*_state.go files are not included in that export.

This commit moves resumption logic back into *_state.go, fixing the
compilation breakage.

PiperOrigin-RevId: 263601629
2019-08-15 11:13:46 -07:00
Haibo Xu 1b1e39d7a1 Enabling pkg/tcpip/link support on arm64.
Signed-off-by: Haibo Xu haibo.xu@arm.com
Change-Id: Ib6b4aa2db19032e58bf0395f714e6883caee460a
2019-08-15 03:19:30 +00:00
Haibo Xu 52843719ca Rename fdbased/mmap.go to fdbased/mmap_stub.go.
Signed-off-by: Haibo Xu haibo.xu@arm.com
Change-Id: Id4489554b9caa332695df8793d361f8332f6a13b
2019-08-15 03:19:22 +00:00
Haibo Xu 0624858593 Rename rawfile/blockingpoll_unsafe.go to rawfile/blockingpoll_stub_unsafe.go.
Signed-off-by: Haibo Xu haibo.xu@arm.com
Change-Id: I2376e502c1a860d5e624c8a8e3afab5da4c53022
2019-08-15 03:19:14 +00:00
Tamir Duberstein d81d94ac4c Replace uinptr with int64 when returning lengths
This is in accordance with newer parts of the standard library.

PiperOrigin-RevId: 263449916
2019-08-14 16:05:56 -07:00
Tamir Duberstein 69d1414a32 Add tcpip.AddressWithPrefix.String
PiperOrigin-RevId: 263436592
2019-08-14 15:02:14 -07:00
Bhasker Hariharan 570fb1db6b Improve SendMsg performance.
SendMsg before this change would copy all the data over into a
new slice even if the underlying socket could only accept a
small amount of data. This is really inefficient with non-blocking
sockets and under high throughput where large writes could get
ErrWouldBlock or if there was say a timeout associated with the sendmsg()
syscall.

With this change we delay copying bytes in till they are needed and only
copy what can be potentially sent/held in the socket buffer. Reducing
the need to repeatedly copy data over.

Also a minor fix to change state FIN-WAIT-1 when shutdown(..., SHUT_WR) is called
instead of when we transmit the actual FIN. Otherwise the socket could remain in
CONNECTED state even though the user has called shutdown() on the socket.

Updates #627

PiperOrigin-RevId: 263430505
2019-08-14 14:34:27 -07:00
Ian Gudger 99bf75a6dc gonet: Replace NewPacketConn with DialUDP.
This better matches the standard library and allows creating connected
PacketConns.

PiperOrigin-RevId: 263187462
2019-08-13 12:11:09 -07:00
Ian Gudger eac690e358 Fix netstack build error on non-AMD64.
This stub had the wrong function signature.

PiperOrigin-RevId: 262992682
2019-08-12 13:31:16 -07:00
Bhasker Hariharan 5a38eb120a Add congestion control states to sender.
This change just introduces different congestion control states and
ensures the sender.state is updated to reflect the current state
of the connection.

It is not used for any decisions yet but this is required before
algorithms like Eiffel/PRR can be implemented.

Fixes #394

PiperOrigin-RevId: 262638292
2019-08-09 14:50:30 -07:00
Rahat Mahmood 13a98df49e netstack: Don't start endpoint goroutines too soon on restore.
Endpoint protocol goroutines were previously started as part of
loading the endpoint. This is potentially too soon, as resources used
by these goroutine may not have been loaded. Protocol goroutines may
perform meaningful work as soon as they're started (ex: incoming
connect) which can cause them to indirectly access resources that
haven't been loaded yet.

This CL defers resuming all protocol goroutines until the end of
restore.

PiperOrigin-RevId: 262409429
2019-08-08 12:33:11 -07:00
Tamir Duberstein 67a3f4039d Set target address in ARP Reply
PiperOrigin-RevId: 262163794
2019-08-07 10:27:43 -07:00
Bhasker Hariharan dfbc0b0a4c Fix for a panic due to writing to a closed accept channel.
This can happen because endpoint.Close() closes the accept channel first and
then drains/resets any accepted but not delivered connections. But there can be
connections that are connected but not delivered to the channel as the channel
was full. But closing the channel can cause these writes to fail with a write to
a closed channel.

The correct solution is to abort any connections in SYN-RCVD state and
drain/abort all completed connections before closing the accept channel.

PiperOrigin-RevId: 261951132
2019-08-06 11:01:27 -07:00
Kevin Krakauer 810cc07aab Plumbing for iptables sockopts.
PiperOrigin-RevId: 261413396
2019-08-02 16:26:48 -07:00
Rahat Mahmood 2906dffcdb Automated rollback of changelist 261191548
PiperOrigin-RevId: 261373749
2019-08-02 12:52:40 -07:00
Rahat Mahmood 79511e8a50 Implement getsockopt(TCP_INFO).
Export some readily-available fields for TCP_INFO and stub out the rest.

PiperOrigin-RevId: 261191548
2019-08-01 13:58:48 -07:00
Austin Kiekintveld 12c4eb294a Fix ICMPv4 EchoReply packet checksum
The checksum was not being reset before being re-calculated and sent out.
This caused the sent checksum to always be `0x0800`.

Fixes #605.

PiperOrigin-RevId: 260965059
2019-07-31 11:26:41 -07:00
Tamir Duberstein c6e6d92cb1 Test connecting UDP sockets to the ANY address
This doesn't currently pass on gVisor.

While I'm here, fix a bug where connecting to the v6-mapped v4 address doesn't
work in gVisor.

PiperOrigin-RevId: 260923961
2019-07-31 07:41:20 -07:00
Tamir Duberstein 7369c63e42 Pass ProtocolAddress instead of its fields
PiperOrigin-RevId: 260803517
2019-07-30 15:06:39 -07:00
Haibo Xu 1decf76471 Change syscall.POLL to syscall.PPOLL.
syscall.POLL is not supported on arm64, using syscall.PPOLL
to support both the x86 and arm64. refs #63

Signed-off-by: Haibo Xu <haibo.xu@arm.com>
Change-Id: I2c81a063d3ec4e7e6b38fe62f17a0924977f505e
COPYBARA_INTEGRATE_REVIEW=https://github.com/google/gvisor/pull/543 from xiaobo55x:master ba598263fd3748d1addd48e4194080aa12085164
PiperOrigin-RevId: 260752049
2019-07-30 11:01:29 -07:00
Chris Kuiper 40e682759f Add support for a subnet prefix length on interface network addresses
This allows the user code to add a network address with a subnet prefix length.
The prefix length value is stored in the network endpoint and provided back to
the user in the ProtocolAddress type.

PiperOrigin-RevId: 259807693
2019-07-24 13:42:14 -07:00
Tamir Duberstein 12c256568b Deduplicate EndpointState.connected some
This fixes a bug introduced in cl/251934850 that caused
connect-accept-close-connect races to result in the second connect call
failiing when it should have succeeded.

PiperOrigin-RevId: 259584525
2019-07-23 12:10:18 -07:00
Chris Kuiper 0e040ba6e8 Handle interfaceAddr and NIC options separately for IP_MULTICAST_IF
This tweaks the handling code for IP_MULTICAST_IF to ignore the InterfaceAddr
if a NICID is given.

PiperOrigin-RevId: 258982541
2019-07-19 09:29:04 -07:00
Andrei Vagin eefa817cfd net/tcp/setockopt: impelment setsockopt(fd, SOL_TCP, TCP_INQ)
PiperOrigin-RevId: 258859507
2019-07-18 15:41:04 -07:00
gVisor bot 74dc663bbb Internal change.
PiperOrigin-RevId: 258424489
2019-07-16 13:03:37 -07:00
Kevin Krakauer 9b4d3280e1 Add IPPROTO_RAW, which allows raw sockets to write IP headers.
iptables also relies on IPPROTO_RAW in a way. It opens such a socket to
manipulate the kernel's tables, but it doesn't actually use any of the
functionality. Blegh.

PiperOrigin-RevId: 257903078
2019-07-12 18:09:12 -07:00
Tamir Duberstein 17bab652af Check that IP headers contain correct version
PiperOrigin-RevId: 257888338
2019-07-12 16:19:18 -07:00
Bhasker Hariharan 6116473b2f Stub out support for TCP_MAXSEG.
Adds support to set/get the TCP_MAXSEG value but does not
really change the segment sizes emitted by netstack or
alter the MSS advertised by the endpoint. This is currently
being added only to unblock iperf3 on gVisor. Plumbing
this correctly requires a bit more work which will come
in separate CLs.

PiperOrigin-RevId: 257859112
2019-07-12 13:35:17 -07:00
Andrei Vagin 116cac053e netstack/udp: connect with the AF_UNSPEC address family means disconnect
PiperOrigin-RevId: 256433283
2019-07-03 14:19:02 -07:00
gVisor bot d60ae0ddee Merge pull request #279 from kevinGC:iptables-1-pkg
PiperOrigin-RevId: 256231055
2019-07-02 13:48:06 -07:00
Michael Pratt 5b41ba5d0e Fix various spelling issues in the documentation
Addresses obvious typos, in the documentation only.

COPYBARA_INTEGRATE_REVIEW=https://github.com/google/gvisor/pull/443 from Pixep:fix/documentation-spelling 4d0688164eafaf0b3010e5f4824b35d1e7176d65
PiperOrigin-RevId: 255477779
2019-06-27 14:25:50 -07:00
Bhasker Hariharan c1761378a9 Fix the logic for sending zero window updates.
Today we have the logic split in two places between endpoint Read() and the
worker goroutine which actually sends a zero window. This change makes it so
that when a zero window ACK is sent we set a flag in the endpoint which can be
read by the endpoint to decide if it should notify the worker to send a
nonZeroWindow update.

The worker now does not do the check again but instead sends an ACK and flips
the flag right away.

Similarly today when SO_RECVBUF is set the SetSockOpt call has logic
to decide if a zero window update is required. Rather than do that we move
the logic to the worker goroutine and it can check the zeroWindow flag
and send an update if required.

PiperOrigin-RevId: 254505447
2019-06-21 18:31:31 -07:00
Brad Burlage ae4ef32b8c Deflake TestSimpleReceive failures due to timeouts
This test will occasionally fail waiting to read a packet. From repeated runs,
I've seen it up to 1.5s for waitForPackets to complete.

PiperOrigin-RevId: 254484627
2019-06-21 15:56:12 -07:00
Bhasker Hariharan 3d71c627fa Add support for TCP receive buffer auto tuning.
The implementation is similar to linux where we track the number of bytes
consumed by the application to grow the receive buffer of a given TCP endpoint.

This ensures that the advertised window grows at a reasonable rate to accomodate
for the sender's rate and prevents large amounts of data being held in stack
buffers if the application is not actively reading or not reading fast enough.

The original paper that was used to implement the linux receive buffer auto-
tuning is available @ https://public.lanl.gov/radiant/pubs/drs/lacsi2001.pdf

NOTE: Linux does not implement DRS as defined in that paper, it's just a good
reference to understand the solution space.

Updates #230

PiperOrigin-RevId: 253168283
2019-06-13 22:28:01 -07:00
Adin Scannell add40fd6ad Update canonical repository.
This can be merged after:
https://github.com/google/gvisor-website/pull/77
  or
https://github.com/google/gvisor-website/pull/78

PiperOrigin-RevId: 253132620
2019-06-13 16:50:15 -07:00
Adin Scannell e352f46478 Minor BUILD file cleanup.
PiperOrigin-RevId: 252918338
2019-06-12 15:59:46 -07:00
Kevin Krakauer 0bbbcafd68 Merge branch 'master' into iptables-1-pkg
Change-Id: I7457a11de4725e1bf3811420c505d225b1cb6943
2019-06-12 15:21:22 -07:00
Bhasker Hariharan 70578806e8 Add support for TCP_CONGESTION socket option.
This CL also cleans up the error returned for setting congestion
control which was incorrectly returning EINVAL instead of ENOENT.

PiperOrigin-RevId: 252889093
2019-06-12 13:35:50 -07:00
Bhasker Hariharan 3933dd5c04 Fixes to listen backlog handling.
Changes netstack to confirm to current linux behaviour where if the backlog is
full then we drop the SYN and do not send a SYN-ACK. Similarly we allow upto
backlog connections to be in SYN-RCVD state as long as the backlog is not full.

We also now drop a SYN if syn cookies are in use and the backlog for the
listening endpoint is full.

Added new tests to confirm the behaviour.

Also reverted the change to increase the backlog in TcpPortReuseMultiThread
syscall test.

Fixes #236

PiperOrigin-RevId: 252500462
2019-06-10 15:40:44 -07:00
Kevin Krakauer 06a83df533 Address more comments.
Change-Id: I83ae1079f3dcba6b018f59ab7898decab5c211d2
2019-06-10 12:43:54 -07:00
Kevin Krakauer 8afbd974da Address Ian's comments.
Change-Id: I7445033b1970cbba3f2ed0682fe520dce02d8fad
2019-06-07 12:54:53 -07:00
Rahat Mahmood 2d2831e354 Track and export socket state.
This is necessary for implementing network diagnostic interfaces like
/proc/net/{tcp,udp,unix} and sock_diag(7).

For pass-through endpoints such as hostinet, we obtain the socket
state from the backend. For netstack, we add explicit tracking of TCP
states.

PiperOrigin-RevId: 251934850
2019-06-06 15:04:47 -07:00
Bhasker Hariharan 85be01b42d Add multi-fd support to fdbased endpoint.
This allows an fdbased endpoint to have multiple underlying fd's from which
packets can be read and dispatched/written to.

This should allow for higher throughput as well as better scalability of the
network stack as number of connections increases.

Updates #231

PiperOrigin-RevId: 251852825
2019-06-06 08:07:02 -07:00
Andrei Vagin 79f7cb6c1c netstack/sniffer: log GSO attributes
PiperOrigin-RevId: 251788534
2019-06-05 22:51:53 -07:00
Andrei Vagin a12848ffeb netstack/tcp: fix calculating a number of outstanding packets
In case of GSO, a segment can container more than one packet
and we need to use the pCount() helper to get a number of packets.

PiperOrigin-RevId: 251743020
2019-06-05 16:30:45 -07:00
Chris Kuiper d18bb4f38a Adjust route when looping multicast packets
Multicast packets are special in that their destination address does not
identify a specific interface. When sending out such a packet the multicast
address is the remote address, but for incoming packets it is the local
address. Hence, when looping a multicast packet, the route needs to be
tweaked to reflect this.

PiperOrigin-RevId: 251739298
2019-06-05 16:08:29 -07:00
Bhasker Hariharan e0fb921205 Fix data race in synRcvdState.
When checking the length of the acceptedChan we should hold the
endpoint mutex otherwise a syn received while the listening socket
is being closed can result in a data race where the cleanupLocked
routine sets acceptedChan to nil while a handshake goroutine
in progress could try and check it at the same time.

PiperOrigin-RevId: 251537697
2019-06-04 16:17:24 -07:00
Bhasker Hariharan bfe3220992 Delete debug log lines left by mistake.
Updates #236

PiperOrigin-RevId: 251337915
2019-06-03 17:00:18 -07:00
Bhasker Hariharan 3577a4f691 Disable certain tests that are flaky under race detector.
PiperOrigin-RevId: 250976665
2019-05-31 16:19:49 -07:00
Bhasker Hariharan 033f96cc93 Change segment queue limit to be of fixed size.
Netstack sets the unprocessed segment queue size to match the receive
buffer size. This is not required as this queue only needs to hold enough
for a short duration before the endpoint goroutine can process it.

Updates #230

PiperOrigin-RevId: 250976323
2019-05-31 16:17:33 -07:00
Kevin Krakauer d58eb9ce82 Add basic iptables structures to netstack.
Change-Id: Ib589906175a59dae315405a28f2d7f525ff8877f
2019-05-31 16:14:04 -07:00
Fabricio Voznika 38de91b028 Add build guard to files using go:linkname
Funcion signatures are not validated during compilation. Since
they are not exported, they can change at any time. The guard
ensures that they are verified at least on every version upgrade.

PiperOrigin-RevId: 250733742
2019-05-30 12:09:39 -07:00
Bhasker Hariharan ae26b2c425 Fixes to TCP listen behavior.
Netstack listen loop can get stuck if cookies are in-use and the app is slow to
accept incoming connections. Further we continue to complete handshake for a
connection even if the backlog is full. This creates a problem when a lots of
connections come in rapidly and we end up with lots of completed connections
just hanging around to be delivered.

These fixes change netstack behaviour to mirror what linux does as described
here in the following article

http://veithen.io/2014/01/01/how-tcp-backlog-works-in-linux.html

Now when cookies are not in-use Netstack will silently drop the ACK to a SYN-ACK
and not complete the handshake if the backlog is full.  This will result in the
connection staying in a half-complete state. Eventually the sender will
retransmit the ACK and if backlog has space we will transition to a connected
state and deliver the endpoint.

Similarly when cookies are in use we do not try and create an endpoint unless
there is space in the accept queue to accept the newly created endpoint. If
there is no space then we again silently drop the ACK as we can just recreate it
when the ACK is retransmitted by the peer.

We also now use the backlog to cap the size of the SYN-RCVD queue for a given
endpoint. So at any time there can be N connections in the backlog and N in a
SYN-RCVD state if the application is not accepting connections. Any new SYNs
will be dropped.

This CL also fixes another small bug where we mark a new endpoint which has not
completed handshake as connected. We should wait till handshake successfully
completes before marking it connected.

Updates #236

PiperOrigin-RevId: 250717817
2019-05-30 12:08:41 -07:00
Tamir Duberstein e4b395db49 Remove unused wakers
These wakers are uselessly allocated and passed around; nothing ever
listens for notifications on them. The code here appears to be
vestigial, so removing it and allowing a nil waker to be passed seems
appropriate.

PiperOrigin-RevId: 249879320
Change-Id: Icd209fb77cc0dd4e5c49d7a9f2adc32bf88b4b71
2019-05-24 12:29:14 -07:00
Kevin Krakauer c1cdf18e7b UDP and TCP raw socket support.
PiperOrigin-RevId: 249511348
Change-Id: I34539092cc85032d9473ff4dd308fc29dc9bfd6b
2019-05-22 13:45:15 -07:00
Bhasker Hariharan 2ac0aeeb42 Refactor fdbased endpoint dispatcher code.
This is in preparation to support an fdbased endpoint that can read/dispatch
packets from multiple underlying fds.

Updates #231

PiperOrigin-RevId: 249337074
Change-Id: Id7d375186cffcf55ae5e38986e7d605a96916d35
2019-05-21 15:24:25 -07:00
Nicolas Lacasse bfd9f75ba4 Set the FilesytemType in MountSource from the Filesystem.
And stop storing the Filesystem in the MountSource.

This allows us to decouple the MountSource filesystem type from the name of the
filesystem.

PiperOrigin-RevId: 247292982
Change-Id: I49cbcce3c17883b7aa918ba76203dfd6d1b03cc8
2019-05-08 14:35:06 -07:00
Googler cbf6ab9697 Check GSO for nil in WritePacket
Testing:
Unit tests added
PiperOrigin-RevId: 247096269
Change-Id: I849c010eadcb53caf45896a15ef38162d66a9568
2019-05-07 14:57:03 -07:00
Ian Gudger 20862f0db2 Add gonet.DialContextTCP.
Allows cancellation and timeouts.

PiperOrigin-RevId: 247090428
Change-Id: I91907f12e218677dcd0e0b6d72819deedbd9f20c
2019-05-07 14:27:36 -07:00
Kevin Krakauer ff8ed5e6a5 Fix raw socket behavior and tests.
Some behavior was broken due to the difficulty of running automated raw
socket tests.

Change-Id: I152ca53916bb24a0208f2dc1c4f5bc87f4724ff6
PiperOrigin-RevId: 246747067
2019-05-05 16:07:25 -07:00
Ian Gudger b4a9f18687 Update tcpip Clock description.
The tcpip.Clock comment stated that times provided by it should not be used for
netstack internal timekeeping. This comment was from before the interface
supported monotonic times. The monotonic times that it provides are now be the
preferred time source for netstack internal timekeeping.

PiperOrigin-RevId: 246618772
Change-Id: I853b720e3d719b03fabd6156d2431da05d354bda
2019-05-03 21:01:42 -07:00
Googler f2699b76c8 Support IPv4 fragmentation in netstack
Testing:
Unit tests and also large ping in Fuchsia OS
PiperOrigin-RevId: 246563592
Change-Id: Ia12ab619f64f4be2c8d346ce81341a91724aef95
2019-05-03 13:30:35 -07:00
Tamir Duberstein 0e1cc476db Fix transport/raw copybara export
- include packet_list.go
- exclude state.go (by renaming to include an underscore)

Also rename raw.go to endpoint.go for consistency.

PiperOrigin-RevId: 246547912
Change-Id: I19c8331c794ba683a940cc96a8be6497b53ff24d
2019-05-03 11:52:59 -07:00
Bhasker Hariharan 458fe955a7 Implement support for SACK based recovery(RFC 6675).
PiperOrigin-RevId: 246536003
Change-Id: I118b745f45040be9c70cb6a1028acdb06c78d8c9
2019-05-03 10:51:18 -07:00
Chris Kuiper 2d8e90b311 Proper cleanup of sockets that used REUSEPORT
Fixed a small logic error that broke proper accounting of MultiPortEndpoints.

PiperOrigin-RevId: 246502126
Change-Id: I1a7d6ea134f811612e545676212899a3707bc2c2
2019-05-03 07:02:51 -07:00
Chris Kuiper 8972e47a2e Support reception of multicast data on more than one socket
This requires two changes:
1) Support for more than one socket to join a given multicast group.

2) Duplicate delivery of incoming multicast packets to all sockets listening
for it.

In addition, I tweaked the code (and added a test) to disallow duplicates
IP_ADD_MEMBERSHIP calls for the same group and NIC. This is how Linux does
it.

PiperOrigin-RevId: 246437315
Change-Id: Icad8300b4a8c3f501d9b4cd283bd3beabef88b72
2019-05-02 19:41:00 -07:00
Michael Pratt 4d52a55201 Change copyright notice to "The gVisor Authors"
Based on the guidelines at
https://opensource.google.com/docs/releasing/authors/.

1. $ rg -l "Google LLC" | xargs sed -i 's/Google LLC.*/The gVisor Authors./'
2. Manual fixup of "Google Inc" references.
3. Add AUTHORS file. Authors may request to be added to this file.
4. Point netstack AUTHORS to gVisor AUTHORS. Drop CONTRIBUTORS.

Fixes #209

PiperOrigin-RevId: 245823212
Change-Id: I64530b24ad021a7d683137459cafc510f5ee1de9
2019-04-29 14:26:23 -07:00
Nicolas Lacasse f4ce43e1f4 Allow and document bug ids in gVisor codebase.
PiperOrigin-RevId: 245818639
Change-Id: I03703ef0fb9b6675955637b9fe2776204c545789
2019-04-29 14:04:14 -07:00
Ben Burkert 66bca6fc22 tcpip/adapters/gonet: add CloseRead & CloseWrite methods to Conn
Add the CloseRead & CloseWrite methods that performs shutdown on the
corresponding Read & Write sides of a connection.

Change-Id: I3996a2abdc7cd68a2becba44dc4bd9f0919d2ce1
PiperOrigin-RevId: 245537950
2019-04-26 22:46:45 -07:00
Kevin Krakauer 43dff57b87 Make raw sockets a toggleable feature disabled by default.
PiperOrigin-RevId: 245511019
Change-Id: Ia9562a301b46458988a6a1f0bbd5f07cbfcb0615
2019-04-26 16:51:46 -07:00
Bhasker Hariharan 228dc15fd1 Bump the AF_PACKET socket rcv buf size to 4MB by default.
Packet socket receive buffers default to the sysctl value of
net.core.rmem_default and are capped by net.core.rmem_max both
which are usually set to 208KB on most systems.

Since we can't expect every gVisor user to bump these we use
SO_RCVBUFFORCE to exceed the limit. This is possible as runsc runs
with CAP_NET_ADMIN outside the sandbox and can do this before
the FD is passed to the sentry inside the sandbox.

Updates #211

iperf output w/ 4MB buffer.

 iperf3 -c 172.17.0.2 -t 100
 Connecting to host 172.17.0.2, port 5201
 [  4] local 172.17.0.1 port 40378 connected to 172.17.0.2 port 5201
 [ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
 [  4]   0.00-1.00   sec  1.15 GBytes  9.89 Gbits/sec    0   1.02 MBytes
 [  4]   1.00-2.00   sec  1.18 GBytes  10.2 Gbits/sec    0   1.02 MBytes
 [  4]   2.00-3.00   sec   965 MBytes  8.09 Gbits/sec    0   1.02 MBytes
 [  4]   3.00-4.00   sec   942 MBytes  7.90 Gbits/sec    0   1.02 MBytes
 [  4]   4.00-5.00   sec   952 MBytes  7.99 Gbits/sec    0   1.02 MBytes
 [  4]   5.00-6.00   sec  1.14 GBytes  9.81 Gbits/sec    0   1.02 MBytes
 [  4]   6.00-7.00   sec  1.13 GBytes  9.68 Gbits/sec    0   1.02 MBytes
 [  4]   7.00-8.00   sec   930 MBytes  7.80 Gbits/sec    0   1.02 MBytes
 [  4]   8.00-9.00   sec  1.15 GBytes  9.91 Gbits/sec    0   1.02 MBytes
 [  4]   9.00-10.00  sec   938 MBytes  7.87 Gbits/sec    0   1.02 MBytes
 [  4]  10.00-11.00  sec   737 MBytes  6.18 Gbits/sec    0   1.02 MBytes
 [  4]  11.00-12.00  sec  1.16 GBytes  9.93 Gbits/sec    0   1.02 MBytes
 [  4]  12.00-13.00  sec   917 MBytes  7.69 Gbits/sec    0   1.02 MBytes
 [  4]  13.00-14.00  sec  1.19 GBytes  10.2 Gbits/sec    0   1.02 MBytes
 [  4]  14.00-15.00  sec  1.01 GBytes  8.70 Gbits/sec    0   1.02 MBytes
 [  4]  15.00-16.00  sec  1.20 GBytes  10.3 Gbits/sec    0   1.02 MBytes
 [  4]  16.00-17.00  sec  1.14 GBytes  9.80 Gbits/sec    0   1.02 MBytes
 ^C[  4]  17.00-17.60  sec   718 MBytes  10.1 Gbits/sec    0   1.02 MBytes
 - - - - - - - - - - - - - - - - - - - - - - - - -
 [ ID] Interval           Transfer     Bandwidth       Retr
 [  4]   0.00-17.60  sec  18.4 GBytes  8.98 Gbits/sec    0             sender
 [  4]   0.00-17.60  sec  0.00 Bytes  0.00 bits/sec                  receiver

PiperOrigin-RevId: 245470590
Change-Id: I1c08c5ee8345de6ac070513656a4703312dc3c00
2019-04-26 12:52:02 -07:00
Bhasker Hariharan 56cadcac4e Fixes to PacketMMap dispatcher.
This CL fixes the following bugs:

- Uses atomic to set/read status instead of binary.LittleEndian.PutUint32 etc
which are not atomic.

- Increments ringOffsets for frames that are truncated (i.e status is
  tpStatusCopy)

- Does not ignore frames with tpStatusLost bit set as they are valid frames and
  only indicate that there some frames were lost before this one and metrics can
  be retrieved with a getsockopt call.

- Adds checks to make sure blockSize is a multiple of page size. This is
  required as the kernel allocates in pages per block and rejects sizes that are
  not page aligned with an EINVAL.

Updates #210

PiperOrigin-RevId: 244959464
Change-Id: I5d61337b7e4c0f8a3063dcfc07791d4c4521ba1f
2019-04-23 17:47:56 -07:00
Ben Burkert 56927e5317 tcpip/transport/tcp: read side only shutdown of an endpoint
Support shutdown on only the read side of an endpoint. Reads performed
after a call to Shutdown with only the ShutdownRead flag will return
ErrClosedForReceive without data.

Break out the shutdown(2) with SHUT_RD syscall test into to two tests.
The first tests that no packets are sent when shutting down the read
side of a socket. The second tests that, after shutting down the read
side of a socket, unread data can still be read, or an EOF if there is
no more data to read.

Change-Id: I9d7c0a06937909cbb466b7591544a4bcaebb11ce
PiperOrigin-RevId: 244459430
2019-04-19 19:29:05 -07:00
Ben Burkert cec2cdc12f tcpip/transport/udp: add Forwarder type
Add a UDP forwarder for intercepting and forwarding UDP sessions.

Change-Id: I2d83c900c1931adfc59a532dd4f6b33a0db406c9
PiperOrigin-RevId: 244293576
2019-04-18 17:49:57 -07:00
Andrei Vagin 4524790ff6 netstack: use a proper network protocol to set gso.L3HdrLen
It is possible to create a listening socket which will accept
IPv4 and IPv6 connections. In this case, we set IPv6ProtocolNumber
for all accepted endpoints, even if they handle IPv4 connections.

This means that we can't use endpoint.netProto to set gso.L3HdrLen.

PiperOrigin-RevId: 244227948
Change-Id: I5e1863596cb9f3d216febacdb7dc75651882eef1
2019-04-18 11:42:23 -07:00
Fabricio Voznika 9f8c89fc7f Return error from fdbased.New
RELNOTES: n/a
PiperOrigin-RevId: 244031742
Change-Id: Id0cdb73194018fb5979e67b58510ead19b5a2b81
2019-04-17 11:16:35 -07:00
Bhasker Hariharan eaac2806ff Add TCP checksum verification.
PiperOrigin-RevId: 242704699
Change-Id: I87db368ca343b3b4bf4f969b17d3aa4ce2f8bd4f
2019-04-09 11:23:47 -07:00
Kevin Krakauer 52a51a8e20 Add a raw socket transport endpoint and use it for raw ICMP sockets.
Having raw socket code together will make it easier to add support for other raw
network protocols. Currently, only ICMP uses the raw endpoint. However, adding
support for other protocols such as UDP shouldn't be much more difficult than
adding a few switch cases.

PiperOrigin-RevId: 241564875
Change-Id: I77e03adafe4ce0fd29ba2d5dfdc547d2ae8f25bf
2019-04-02 11:13:49 -07:00
Bhasker Hariharan 45c54b1f4e Fix incorrect checksums in TCP and UDP tests.
PiperOrigin-RevId: 241025361
Change-Id: I292e7aea9a4b294b11e4f736e107010d9524586b
2019-03-29 12:05:43 -07:00
Bhasker Hariharan cc0e96a4bd Fix Panic in SACKScoreboard.Delete.
The panic was caused by modifying the tree while iterating which invalidated the
iterator.

Also fixes another bug in SACKScoreboard.Insert() which was causing blocks to be
merged incorrectly.

PiperOrigin-RevId: 240895053
Change-Id: Ia72b8244297962df5c04283346da5226434740af
2019-03-28 18:18:39 -07:00
Bert Muthalaly f2e5dcf21c Add ICMP stats
PiperOrigin-RevId: 240848882
Change-Id: I23dd4599f073263437aeab357c3f767e1a432b82
2019-03-28 14:09:20 -07:00
Andrei Vagin f4105ac21a netstack/fdbased: add generic segmentation offload (GSO) support
The linux packet socket can handle GSO packets, so we can segment packets to
64K instead of the MTU which is usually 1500.

Here are numbers for the nginx-1m test:
runsc:		579330.01 [Kbytes/sec] received
runsc-gso:	1794121.66 [Kbytes/sec] received
runc:		2122139.06 [Kbytes/sec] received

and for tcp_benchmark:

$ tcp_benchmark  --duration 15   --ideal
[  4]  0.0-15.0 sec  86647 MBytes  48456 Mbits/sec

$ tcp_benchmark --client --duration 15   --ideal
[  4]  0.0-15.0 sec  2173 MBytes  1214 Mbits/sec

$ tcp_benchmark --client --duration 15   --ideal --gso 65536
[  4]  0.0-15.0 sec  19357 MBytes  10825 Mbits/sec

PiperOrigin-RevId: 240809103
Change-Id: I2637f104db28b5d4c64e1e766c610162a195775a
2019-03-28 11:03:41 -07:00
Tamir Duberstein 8406504817 Avoid mutating memory passed to DeliverTransportPacket
PiperOrigin-RevId: 240642903
Change-Id: I16625015123a827d267d60b328a202057264bbd6
2019-03-27 14:36:57 -07:00
Tamir Duberstein 9c20a88bd7 Remove polling from ICMP test
PiperOrigin-RevId: 240483396
Change-Id: Ie75d3ae38af83f1d92f167ff9ba58fa10f5b372b
2019-03-26 20:20:52 -07:00
Andrei Vagin 654e878abb netstack: Don't exclude length when a pseudo-header checksum is calculated
This is a preparation for GSO changes (cl/234508902).

RELNOTES[gofers]: Refactor checksum code to include length, which
it already did, but in a convoluted way. Should be a no-op.

PiperOrigin-RevId: 240460794
Change-Id: I537381bc670b5a9f5d70a87aa3eb7252e8f5ace2
2019-03-26 17:15:13 -07:00
Tamir Duberstein 9cd2b66f10 Remove echoReplier
Mirror the ICMPv6 echo implementation in ICMPv4 echo. This removes
unnecessary asynchrony, reduces copying, and reduces complexity.

PiperOrigin-RevId: 240394525
Change-Id: If8f53254154f86772f5e51159765aa23b3b328b8
2019-03-26 11:45:01 -07:00
Andrei Vagin 9f4e1cb797 netstack: adjust the sequence number after trimming the packet
PiperOrigin-RevId: 239417224
Change-Id: I14a9adc31a6330a79a6156c105969cd5f1f63d20
2019-03-20 09:58:10 -07:00
Andrei Vagin 87cce0ec08 netstack: reduce MSS from SYN to account tcp options
See: https://tools.ietf.org/html/rfc6691#section-2
PiperOrigin-RevId: 239305632
Change-Id: Ie8eb912a43332e6490045dc95570709c5b81855e
2019-03-19 17:33:20 -07:00
Bert Muthalaly 928809fa7d Add layer 2 stats (tx, rx) X (packets, bytes) to netstack
PiperOrigin-RevId: 239194420
Change-Id: Ie193e8ac2b7a6db21195ac85824a335930483971
2019-03-19 08:30:43 -07:00
Tamir Duberstein 5496be7c5d Remove duplicate TCP flag definitions
PiperOrigin-RevId: 238467634
Change-Id: If4cd8efff7386fbee1195f051d15549b495910a9
2019-03-14 10:19:21 -07:00
Kevin Krakauer f97c4f1b7a Remove unused function.
PiperOrigin-RevId: 238336475
Change-Id: I8131e04699028246ebc233953ebb3feca5673940
2019-03-13 16:40:10 -07:00
Fabricio Voznika 70d0613444 Reduce PACKET_RX_RING memory usage
Previous memory allocation was excessive (80 MB). Changed
it to use 2 MB instead. There is no drop in perfomance due
to this change:

ab -n 100 -c 10 http://server/latin10m.txt  ==> 10 MB file
80 MB: 178 MB/s
 2 MB: 181 MB/s

PiperOrigin-RevId: 238321594
Change-Id: I1c8aed13cad5d75f4506d2b406b305117055fbe5
2019-03-13 15:25:13 -07:00
Noah Gold 8003bd6a5c Make gonet.PacketConn implement net.Conn.
gonet.PacketConn now implements net.Conn, allowing it to be returned from
net.Dial.Dialer functions.

PiperOrigin-RevId: 238111980
Change-Id: I174884385ff4d9b8e9918fac7bbb5b93ca366ba7
2019-03-12 15:36:33 -07:00
Ian Gudger a16f6e50c5 Make HandleLocal apply to all non-loopback interfaces.
HandleLocal is very similar conceptually to MULTICAST_LOOP, so we can unify
the implementations. This has the benefit of making HandleLocal apply even when
the fdbased link endpoint isn't in use.

In addition, move looping logic to route creation so that it doesn't need to be
run for each packet. This should improve performance.

PiperOrigin-RevId: 238099480
Change-Id: I72839f16f25310471453bc9d3fb8544815b25c23
2019-03-12 14:37:56 -07:00
Ian Gudger 86036f979b Validate multicast addresses in multicast group operations.
PiperOrigin-RevId: 237559843
Change-Id: I93a9d83a08cd3d49d5fc7fcad5b0710d0aa04aaa
2019-03-08 19:05:26 -08:00
Ian Gudger 56a6128295 Implement IP_MULTICAST_LOOP.
IP_MULTICAST_LOOP controls whether or not multicast packets sent on the default
route are looped back. In order to implement this switch, support for sending
and looping back multicast packets on the default route had to be implemented.

For now we only support IPv4 multicast.

PiperOrigin-RevId: 237534603
Change-Id: I490ac7ff8e8ebef417c7eb049a919c29d156ac1c
2019-03-08 15:49:17 -08:00
Bhasker Hariharan 1718fdd1a8 Add new retransmissions and recovery related metrics.
PiperOrigin-RevId: 236945145
Change-Id: I051760d95154ea5574c8bb6aea526f488af5e07b
2019-03-05 16:41:44 -08:00
Kevin Krakauer 23e66ee96d Remove unused commit() function argument to Bind.
PiperOrigin-RevId: 236926132
Change-Id: I5cf103f22766e6e65a581de780c7bb9ca0fa3181
2019-03-05 14:53:34 -08:00
Tamir Duberstein 3830786883 Map IPv{4,6} addresses to ethernet addresses
...in accordance with RFCs 1112 and 2464.

Fixes IPv4 multicast when IP_MULTICAST_IF is specified.

Don't return ErrNoRoute when no route is needed.
Don't set Route.NextHop when no route is needed.

PiperOrigin-RevId: 236199813
Change-Id: I48ed33e1b7f760deaa37e18ad7f1b8b62819ab43
2019-02-28 14:38:32 -08:00
Kevin Krakauer 121db29a93 Ping support via IPv4 raw sockets.
Broadly, this change:
* Enables sockets to be created via `socket(AF_INET, SOCK_RAW, IPPROTO_ICMP)`.
* Passes the network-layer (IP) header up the stack to the transport endpoint,
  which can pass it up to the socket layer. This allows a raw socket to return
  the entire IP packet to users.
* Adds functions to stack.TransportProtocol, stack.Stack, stack.transportDemuxer
  that enable incoming packets to be delivered to raw endpoints. New raw sockets
  of other protocols (not ICMP) just need to register with the stack.
* Enables ping.endpoint to return IP headers when created via SOCK_RAW.

PiperOrigin-RevId: 235993280
Change-Id: I60ed994f5ff18b2cbd79f063a7fdf15d093d845a
2019-02-27 14:31:21 -08:00
Googler 12d9cf6fab Adds a WriteRawPacket method to the InjectableLinkEndpoint interface.
Also exposes ipv4.MaxTotalSize since it is a generally useful constant.

PiperOrigin-RevId: 235799755
Change-Id: I1fa8d5294bf355acf5527cfdf274b3687d3c8b13
2019-02-26 14:58:37 -08:00
Amanda Tait 33d0e824c7 Use more conservative locking in NIC.DeliverNetworkPacket
An earlier CL excessively minimizes the period in which it
holds a lock on NIC. This earlier CL had done this out of
the mistaken impression it fixed a broken test, when in
fact it just reduced the rate of failure of a flaky test
in tcp_test.go. This new change holds the lock on NIC
for the duration of the loop over n.endpoints.

PiperOrigin-RevId: 235732487
Change-Id: I53ee6df264f093ddc4d29e9acdcba6b4838cb112
2019-02-26 09:10:37 -08:00
Bhasker Hariharan 26be25e4ec Add a SACK scoreboard to TCP endpoints.
This change does not make use of SACK information but adds support to track
SACK information and store it in the endpoint.

The actual SACK based recovery will be in a separate CL.

Part of commits to add RFC 6675 support to Netstack.

PiperOrigin-RevId: 235612264
Change-Id: I261f94844d7bad5abda803152ce6cc6125a467ff
2019-02-25 15:20:04 -08:00
Amanda Tait c14a1a1618 Fix race condition in NIC.DeliverNetworkPacket
cl/234850781 introduced a race condition in NIC.DeliverNetworkPacket
by failing to hold a lock. This change fixes this regressesion by acquiring
a read lock before iterating through n.endpoints, and then releasing the lock
once iteration is complete.

PiperOrigin-RevId: 235549770
Change-Id: Ib0133288be512d478cf759c3314dc95ec3205d4b
2019-02-25 10:02:29 -08:00
Kevin Krakauer b75aa51504 Rename ping endpoints to icmp endpoints.
PiperOrigin-RevId: 235248572
Change-Id: I5b0538b6feb365a98712c2a2d56d856fe80a8a09
2019-02-22 13:34:47 -08:00
Amanda Tait ea070b9d5f Implement Broadcast support
This change adds support for the SO_BROADCAST socket option in gVisor Netstack.
This support includes getsockopt()/setsockopt() functionality for both UDP and
TCP endpoints (the latter being a NOOP), dispatching broadcast messages up and
down the stack, and route finding/creation for broadcast packets. Finally, a
suite of tests have been implemented, exercising this functionality through the
Linux syscall API.

PiperOrigin-RevId: 234850781
Change-Id: If3e666666917d39f55083741c78314a06defb26c
2019-02-20 12:54:13 -08:00
Bhasker Hariharan 3e3a1ef9d6 Updates tcp_proxy to use an AF_PACKET and veth devices.
tcp_proxy now uses an AF_PACKET socket as the FD for netstack link layer
endpoint instead of a tap device. It also changes the link layer endpoint to use
PacketMMap dispatch instead of Readv. This reduces overall cpu and reflects the
current runsc setup which uses PacketMMap and also uses veth devices to receive
packets.

Also fixed a bug in gonet where Read() was not doing coalescing read and would
read small amounts at a time.

PiperOrigin-RevId: 234714768
Change-Id: Idabf8e600e4512489d3ba441c4096dc74deba5d7
2019-02-19 18:23:54 -08:00
Ian Gudger c611dbc5a7 Implement IP_MULTICAST_IF.
This allows setting a default send interface for IPv4 multicast. IPv6 support
will come later.

PiperOrigin-RevId: 234251379
Change-Id: I65922341cd8b8880f690fae3eeb7ddfa47c8c173
2019-02-15 18:40:15 -08:00
Kevin Krakauer a9cb3dcd9d Move SO_TIMESTAMP from different transport endpoints to epsocket.
SO_TIMESTAMP is reimplemented in ping and UDP sockets (and needs to be added for
TCP), but can just be implemented in epsocket for simplicity. This will also
make SIOCGSTAMP easier to implement.

PiperOrigin-RevId: 234179300
Change-Id: Ib5ea0b1261dc218c1a8b15a65775de0050fe3230
2019-02-15 11:18:44 -08:00
Googler d60ce17a21 Internal change.
PiperOrigin-RevId: 234011346
Change-Id: Ic69375ddb3794dd0d3d6e62ee4dc60fdf4baf2c7
2019-02-14 12:54:27 -08:00
Bhasker Hariharan e0b3d3323f Add support for using PACKET_RX_RING to receive packets.
PACKET_RX_RING allows the use of an mmapped buffer to receive packets from the
kernel. This should cut down the number of host syscalls that need to be made
to receive packets when the underlying fd is a socket of the AF_PACKET type.

PiperOrigin-RevId: 233834998
Change-Id: I8060025c6ced206986e94cc46b8f382b81bfa47f
2019-02-13 14:53:03 -08:00
Bhasker Hariharan efe5e737d7 Do not drop packets w/ missing TCP timestamps.
RFC7323 recommends that if the timestamp option was negotiated
then all packets should carry a TCP Timestamp and any packets that
do not should be dropped.

Netstack implemented this behaviour. Linux OTOH does not and will
accept such packets. This change makes Netstack behaviour compatible
with Linux.

Also now that we allow such packets, we do need to update RTO calculations
based on these packets even if timestamp option is enabled.

PiperOrigin-RevId: 233432268
Change-Id: I9f4742ae6b63930ac3b5e37d8c238761e6a4b29f
2019-02-11 10:23:43 -08:00
Ian Gudger 967326131a Fix build error.
PiperOrigin-RevId: 233139020
Change-Id: I2e7089fa25d20e5662eb941054a684d41f5d3e12
2019-02-08 15:37:20 -08:00
Ian Gudger 80f901b16b Plumb IP_ADD_MEMBERSHIP and IP_DROP_MEMBERSHIP to netstack.
Also includes a few fixes for IPv4 multicast support. IPv6 support is coming in
a followup CL.

PiperOrigin-RevId: 233008638
Change-Id: If7dae6222fef43fda48033f0292af77832d95e82
2019-02-07 23:15:23 -08:00
Googler e0afa87899 Internal change.
PiperOrigin-RevId: 232937200
Change-Id: I5c3709cc8f1313313ff618a45e48c14a3a111cb4
2019-02-07 13:46:26 -08:00
Michael Pratt 2a0c69b19f Remove license comments
Nothing reads them and they can simply get stale.

Generated with:
$ sed -i "s/licenses(\(.*\)).*/licenses(\1)/" **/BUILD

PiperOrigin-RevId: 231818945
Change-Id: Ibc3f9838546b7e94f13f217060d31f4ada9d4bf0
2019-01-31 11:12:53 -08:00
Bhasker Hariharan f03c7e48e7 Fix IsLost check to match the description in RFC6675.
quoting what "rscheff@gmx.at" pointed out over email.
"IsLost in RFC3517 is defined as  >=  (DupThresh * SMSS) while
RFC6675 improves upon this, and defines IsLost as  >
((DupThresh - 1) * SMSS + 1).

The latter addresses situations where partial segments (size < MSS)
are sent (eg. last segment of a http protocol message sent with PSH
being less than MSS is common)."

PiperOrigin-RevId: 231512331
Change-Id: I1addd4a92e3e7baeb0bdda46463ebfae435da958
2019-01-29 18:13:48 -08:00
Ian Gudger ff1c3bb0b5 Fix NIC endpoint forwarding.
Also adds a test for regular NIC forwarding.

PiperOrigin-RevId: 231495279
Change-Id: Ic7edec249568e9ad0280cea77eac14478c9073e1
2019-01-29 16:23:30 -08:00
Bhasker Hariharan 24cb2c0a72 Use recvmmsg() instead of readv() to read packets from NIC.
This should reduce the number of syscalls required to process packets
significantly and improve throughputs.

PiperOrigin-RevId: 231366886
Change-Id: I8b38077262bf9c53176bc4a94b530188d3d7c0ca
2019-01-29 01:39:01 -08:00
Kevin Krakauer 9a01287d23 test: Tag tcp_test as flaky.
PiperOrigin-RevId: 229427852
Change-Id: I9de8ed63f4a7672dacd3b282c863c599d00acd52
2019-01-15 13:21:00 -08:00
Zhaozhong Ni 7182b9cf52 netstack: release port inline for listening sockets only.
PiperOrigin-RevId: 229243918
Change-Id: Ie14ef34e66ae851ed080f57b7d26a369a66f7664
2019-01-14 13:33:47 -08:00
Googler 1e1dae50ca Internal change.
PiperOrigin-RevId: 228979583
Change-Id: I69bd82def48ceb19bc8558c890622b8528d98764
2019-01-11 18:52:36 -08:00
Bert Muthalaly 3f45878b73 Implement Stringer for tcpip.StatCounter
This enables formatting tcpip.Stats readably with %+v.

PiperOrigin-RevId: 228379088
Change-Id: I6a9876454a22f151ee752cf94589b4188729458f
2019-01-08 12:35:35 -08:00
Andrei Vagin 652d068119 Implement SO_REUSEPORT for TCP and UDP sockets
This option allows multiple sockets to be bound to the same port.

Incoming packets are distributed to sockets using a hash based on source and
destination addresses. This means that all packets from one sender will be
received by the same server socket.

PiperOrigin-RevId: 227153413
Change-Id: I59b6edda9c2209d5b8968671e9129adb675920cf
2018-12-28 11:27:14 -08:00