Commit Graph

222 Commits

Author SHA1 Message Date
Ghanan Gowripalan 815df2959a Solicit IPv6 routers when a NIC becomes enabled as a host
This change adds support to send NDP Router Solicitation messages when a NIC
becomes enabled as a host, as per RFC 4861 section 6.3.7.

Note, Router Solicitations will only be sent when the stack has forwarding
disabled.

Tests: Unittests to make sure that the initial Router Solicitations are sent
as configured. The tests also validate the sent Router Solicitations' fields.
PiperOrigin-RevId: 289964095
2020-01-15 17:10:48 -08:00
Bhasker Hariharan a611fdaee3 Changes TCP packet dispatch to use a pool of goroutines.
All inbound segments for connections in ESTABLISHED state are delivered to the
endpoint's queue but for every segment delivered we also queue the endpoint for
processing to a selected processor. This ensures that when there are a large
number of connections in ESTABLISHED state the inbound packets are all handled
by a small number of goroutines and significantly reduces the amount of work the
goscheduler has to perform.

We let connections in other states follow the current path where the
endpoint's goroutine directly handles the segments.

Updates #231

PiperOrigin-RevId: 289728325
2020-01-14 14:15:50 -08:00
Tamir Duberstein 50625cee59 Implement {g,s}etsockopt(IP_RECVTOS) for UDP sockets
PiperOrigin-RevId: 289718534
2020-01-14 13:33:23 -08:00
Ghanan Gowripalan 1ad8381eac Do Source Address Selection when choosing an IPv6 source address
Do Source Address Selection when choosing an IPv6 source address as per RFC 6724
section 5 rules 1-3:
1) Prefer same address
2) Prefer appropriate scope
3) Avoid deprecated addresses.

A later change will update Source Address Selection to follow rules 4-8.

Tests:
Rule 1 & 2: stack.TestIPv6SourceAddressSelectionScopeAndSameAddress,
Rule 3:     stack.TestAutoGenAddrTimerDeprecation,
            stack.TestAutoGenAddrDeprecateFromPI
PiperOrigin-RevId: 289559373
2020-01-13 17:58:28 -08:00
Kevin Krakauer 1c3d3c70b9 Fix test building. 2020-01-13 14:54:32 -08:00
Tamir Duberstein debd213da6 Allow dual stack sockets to operate on AF_INET
Fixes #1490
Fixes #1495

PiperOrigin-RevId: 289523250
2020-01-13 14:47:22 -08:00
Kevin Krakauer 31e49f4b19 Merge branch 'master' into iptables-write-input-drop 2020-01-13 12:22:15 -08:00
Ghanan Gowripalan d27208463e Automated rollback of changelist 288990597
PiperOrigin-RevId: 289169518
2020-01-10 14:58:47 -08:00
Ian Gudger 27500d529f New sync package.
* Rename syncutil to sync.
* Add aliases to sync types.
* Replace existing usage of standard library sync package.

This will make it easier to swap out synchronization primitives. For example,
this will allow us to use primitives from github.com/sasha-s/go-deadlock to
check for lock ordering violations.

Updates #1472

PiperOrigin-RevId: 289033387
2020-01-09 22:02:24 -08:00
Ghanan Gowripalan 26c5653bb5 Inform NDPDispatcher when Stack learns about available configurations via DHCPv6
Inform the Stack's NDPDispatcher when it receives an NDP Router Advertisement
that updates the available configurations via DHCPv6. The Stack makes sure that
its NDPDispatcher isn't informed unless the avaiable configurations via DHCPv6
for a NIC is updated.

Tests: Test that a Stack's NDPDispatcher is informed when it receives an NDP
Router Advertisement that informs it of new configurations available via DHCPv6.
PiperOrigin-RevId: 289001283
2020-01-09 16:56:28 -08:00
Ghanan Gowripalan 8fafd3142e Separate NDP tests into its own package
Internal tools timeout after 60s during tests that are required to pass before
changes can be submitted. Separate out NDP tests into its own package to help
prevent timeouts when testing.

PiperOrigin-RevId: 288990597
2020-01-09 15:56:44 -08:00
Eyal Soha 8643933d6e Change BindToDeviceOption to store NICID
This makes it possible to call the sockopt from go even when the NIC has no
name.

PiperOrigin-RevId: 288955236
2020-01-09 13:07:53 -08:00
Bert Muthalaly e752ddbb72 Allow clients to store an opaque NICContext with NICs
...retrievable later via stack.NICInfo().

Clients of this library can use it to add metadata that should be tracked
alongside a NIC, to avoid having to keep a map[tcpip.NICID]metadata mirroring
stack.Stack's nic map.

PiperOrigin-RevId: 288924900
2020-01-09 10:46:01 -08:00
Ghanan Gowripalan d057871f41 CancellableTimer to encapsulate the work of safely stopping timers
Add a new CancellableTimer type to encapsulate the work of safely stopping
timers when it fires at the same time some "related work" is being handled. The
term "related work" is some work that needs to be done while having obtained
some common lock (L).

Example: Say we have an invalidation timer that may be extended or cancelled by
some event. Creating a normal timer and simply cancelling may not be sufficient
as the timer may have already fired when the event handler attemps to cancel it.
Even if the timer and event handler obtains L before doing work, once the event
handler releases L, the timer will eventually obtain L and do some unwanted
work.

To prevent the timer from doing unwanted work, it checks if it should early
return instead of doing the normal work after obtaining L. When stopping the
timer callers must have L locked so the timer can be safely informed that it
should early return.

Test: Tests that CancellableTimer fires and resets properly. Test to make sure
the timer fn is not called after being stopped within the lock L.
PiperOrigin-RevId: 288806984
2020-01-08 17:50:54 -08:00
Kevin Krakauer 0999ae8b34 Getting a panic when running tests. For some reason the filter table is
ending up with the wrong chains and is indexing -1 into rules.
2020-01-08 15:57:25 -08:00
Tamir Duberstein d530df2f95 Introduce tcpip.SockOptBool
...and port V6OnlyOption to it.

PiperOrigin-RevId: 288789451
2020-01-08 15:40:48 -08:00
Bert Muthalaly e21c584056 Combine various Create*NIC methods into CreateNICWithOptions.
PiperOrigin-RevId: 288779416
2020-01-08 14:50:49 -08:00
Tamir Duberstein a271bccfc6 Rename tcpip.SockOpt{,Int}
PiperOrigin-RevId: 288772878
2020-01-08 14:20:07 -08:00
Tamir Duberstein 9df018767c Remove redundant function argument
PacketLooping is already a member on the passed Route.

PiperOrigin-RevId: 288721500
2020-01-08 10:22:51 -08:00
Bert Muthalaly 0cc1e74b57 Add NIC.isLoopback()
...enabling us to remove the "CreateNamedLoopbackNIC" variant of
CreateNIC and all the plumbing to connect it through to where the value
is read in FindRoute.

PiperOrigin-RevId: 288713093
2020-01-08 09:30:20 -08:00
Ghanan Gowripalan 4e19d165cc Support deprecating SLAAC addresses after the preferred lifetime
Support deprecating network endpoints on a NIC. If an endpoint is deprecated, it
should not be used for new connections unless a more preferred endpoint is not
available, or unless the deprecated endpoint was explicitly requested.

Test: Test that deprecated endpoints are only returned when more preferred
endpoints are not available and SLAAC addresses are deprecated after its
preferred lifetime
PiperOrigin-RevId: 288562705
2020-01-07 13:42:08 -08:00
Ghanan Gowripalan 2031cc4701 Disable auto-generation of IPv6 link-local addresses for loopback NICs
Test: Test that an IPv6 link-local address is not auto-generated for loopback
NICs, even when it is enabled for non-loopback NICS.
PiperOrigin-RevId: 288519591
2020-01-07 10:09:39 -08:00
Ghanan Gowripalan 8dfd922840 Pass the NIC-internal name to the NIC name function when generating opaque IIDs
Pass the NIC-internal name to the NIC name function when generating opaque IIDs
so implementations can use the name that was provided when the NIC was created.
Previously, explicit NICID to NIC name resolution was required from the netstack
integrator.

Tests: Test that the name provided when creating a NIC is passed to the NIC name
function when generating opaque IIDs.
PiperOrigin-RevId: 288395359
2020-01-06 16:21:31 -08:00
Ghanan Gowripalan 83ab47e87b Use opaque interface identifiers when generating IPv6 addresses via SLAAC
Support using opaque interface identifiers when generating IPv6 addresses via
SLAAC when configured to do so.

Note, this change does not handle retries in response to DAD conflicts yet.
That will also come in a later change.

Test: Test that when SLAAC addresses are generated, they use opaque interface
identifiers when configured to do so.
PiperOrigin-RevId: 288078605
2020-01-03 18:28:22 -08:00
Ghanan Gowripalan d1d878a801 Support generating opaque interface identifiers as defined by RFC 7217
Support generating opaque interface identifiers as defined by RFC 7217 for
auto-generated IPv6 link-local addresses. Opaque interface identifiers will also
be used for IPv6 addresses auto-generated via SLAAC in a later change.

Note, this change does not handle retries in response to DAD conflicts yet.
That will also come in a later change.

Tests: Test that when configured to generated opaque IIDs, they are properly
generated as outlined by RFC 7217.
PiperOrigin-RevId: 288035349
2020-01-03 13:00:19 -08:00
gVisor bot 87e4d03fdf Automated rollback of changelist 287029703
PiperOrigin-RevId: 287217899
2019-12-26 13:05:52 -08:00
Ryan Heacock e013c48c78 Enable IP_RECVTOS socket option for datagram sockets
Added the ability to get/set the IP_RECVTOS socket option on UDP endpoints. If
enabled, TOS from the incoming Network Header passed as ancillary data in the
ControlMessages.

Test:
* Added unit test to udp_test.go that tests getting/setting as well as
verifying that we receive expected TOS from incoming packet.
* Added a syscall test
PiperOrigin-RevId: 287029703
2019-12-24 08:49:39 -08:00
Ghanan Gowripalan 5bc4ae9d57 Clear any host-specific NDP state when becoming a router
This change supports clearing all host-only NDP state when NICs become routers.
All discovered routers, discovered on-link prefixes and auto-generated addresses
will be invalidated when becoming a router. This is because normally, routers do
not process Router Advertisements to discover routers or on-link prefixes, and
do not do SLAAC.

Tests: Unittest to make sure that all discovered routers, discovered prefixes
and auto-generated addresses get invalidated when transitioning from a host to
a router.
PiperOrigin-RevId: 286902309
2019-12-23 08:55:25 -08:00
Ghanan Gowripalan 8e6e87f8e8 Allow 'out-of-line' routing table updates for Router and Prefix discovery events
This change removes the requirement that a new routing table be provided when a
router or prefix discovery event happens so that an updated routing table may
be provided to the stack at a later time from the event.

This change is to address the use case where the netstack integrator may need to
obtain a lock before providing updated routes in response to the events above.

As an example, say we have an integrator that performs the below two operations
operations as described:
A. Normal route update:
  1. Obtain integrator lock
  2. Update routes in the integrator
  3. Call Stack.SetRouteTable with the updated routes
    3.1. Obtain Stack lock
    3.2. Update routes in Stack
    3.3. Release Stack lock
  4. Release integrator lock
B. NDP event triggered route update:
  1. Obtain Stack lock
  2. Call event handler
    2.1. Obtain integrator lock
    2.2. Update routes in the integrator
    2.3. Release integrator lock
    2.4. Return updated routes to update Stack
  3. Update routes in Stack
  4. Release Stack lock

A deadlock may occur if a Normal route update was attemped at the same time an
NDP event triggered route update was attempted. With threads T1 and T2:
1) T1 -> A.1, A.2
2) T2 -> B.1
3) T1 -> A.3 (hangs at A.3.1 since Stack lock is taken in step 2)
4) T2 -> B.2 (hangs at B.2.1 since integrator lock is taken in step 1)

Test: Existing tests were modified to not provide or expect routing table
changes in response to Router and Prefix discovery events.
PiperOrigin-RevId: 286274712
2019-12-18 15:18:33 -08:00
Ghanan Gowripalan 628948b1e1 Cleanup NDP Tests
This change makes sure that test variables are captured before running tests
in parallel, and removes unneeded buffered channel allocations. This change also
removes unnecessary timeouts.

PiperOrigin-RevId: 286255066
2019-12-18 14:24:39 -08:00
Ghanan Gowripalan ad80dcf470 Properly generate the EUI64 interface identifier from an Ethernet address
Fixed a bug where the interface identifier was not properly generated from an
Ethernet address.

Tests: Unittests to make sure the functions generating the EUI64 interface
identifier are correct.
PiperOrigin-RevId: 285494562
2019-12-13 16:41:41 -08:00
Ghanan Gowripalan 4ff71b5be4 Inform the integrator on receipt of an NDP Recursive DNS Server option
This change adds support to let an integrator know when it receives an NDP
Router Advertisement message with the NDP Recursive DNS Server option with at
least one DNS server's address. The stack will not maintain any state related to
the DNS servers - the integrator is expected to maintain any required state and
invalidate the servers after its valid lifetime expires, or refresh the lifetime
when a new one is received for a known DNS server.

Test: Unittest to make sure that an event is sent to the integrator when an NDP
Recursive DNS Server option is received with at least one address.
PiperOrigin-RevId: 284890502
2019-12-10 18:05:23 -08:00
Ghanan Gowripalan ab3f7bc393 Do IPv6 Stateless Address Auto-Configuration (SLAAC)
This change allows the netstack to do SLAAC as outlined by RFC 4862 section 5.5.

Note, this change will not break existing uses of netstack as the default
configuration for the stack options is set in such a way that SLAAC
will not be performed. See `stack.Options` and `stack.NDPConfigurations` for
more details.

This change reuses 1 option and introduces a new one that is required to take
advantage of SLAAC, all available under NDPConfigurations:
- HandleRAs: Whether or not NDP RAs are processes
- AutoGenGlobalAddresses: Whether or not SLAAC is performed.

Also note, this change does not deprecate SLAAC generated addresses after the
preferred lifetime. That will come in a later change (b/143713887). Currently,
only the valid lifetime is honoured.

Tests: Unittest to make sure that SLAAC generates and adds addresses only when
configured to do so. Tests also makes sure that conflicts with static addresses
do not modify the static address.
PiperOrigin-RevId: 284265317
2019-12-06 14:41:30 -08:00
Ghanan Gowripalan 10f7b109ab Add a type to represent the NDP Recursive DNS Server option
This change adds a type to represent the NDP Recursive DNS Server option, as
defined by RFC 8106 section 5.1.

PiperOrigin-RevId: 284005493
2019-12-05 10:41:45 -08:00
Adin Scannell c3b93afeaf Cleanup visibility.
PiperOrigin-RevId: 282194656
2019-11-23 23:54:41 -08:00
Kevin Krakauer 9db08c4e58 Use PacketBuffers with GSO.
PiperOrigin-RevId: 282045221
2019-11-22 14:52:35 -08:00
Kevin Krakauer 3f7d937090 Use PacketBuffers for outgoing packets.
PiperOrigin-RevId: 280455453
2019-11-14 10:15:38 -08:00
Ghanan Gowripalan 3f51bef8cd Do not handle TCP packets that include a non-unicast IP address
This change drops TCP packets with a non-unicast IP address as the source or
destination address as TCP is meant for communication between two endpoints.

Test: Make sure that if the source or destination address contains a non-unicast
address, no TCP packet is sent in response and the packet is dropped.
PiperOrigin-RevId: 280073731
2019-11-12 15:50:02 -08:00
Ghanan Gowripalan 5398530e45 Discover on-link prefixes from Router Advertisements' Prefix Information options
This change allows the netstack to do NDP's Prefix Discovery as outlined by
RFC 4861 section 6.3.4. If configured to do so, when a new on-link prefix is
discovered, the routing table will be updated with a device route through
the nic the RA arrived at. Likewise, when such a prefix gets invalidated, the
device route will be removed.

Note, this change will not break existing uses of netstack as the default
configuration for the stack options is set in such a way that Prefix Discovery
will not be performed. See `stack.Options` and `stack.NDPConfigurations` for
more details.

This change reuses 1 option and introduces a new one that is required to take
advantage of Prefix Discovery, all available under NDPConfigurations:
- HandleRAs: Whether or not NDP RAs are processes
- DiscoverOnLinkPrefixes: Whether or not Prefix Discovery is performed (new)

Another note: for a NIC to process Prefix Information options (in Router
Advertisements), it must not be a router itself. Currently the netstack does not
have per-interface routing configuration; the routing/forwarding configuration
is controlled stack-wide. Therefore, if the stack is configured to enable
forwarding/routing, no router Advertisements (and by extension the Prefix
Information options) will be processed.

Tests: Unittest to make sure that Prefix Discovery and updates to the routing
table only occur if explicitly configured to do so. Unittest to make sure at
max stack.MaxDiscoveredOnLinkPrefixes discovered on-link prefixes are
remembered.
PiperOrigin-RevId: 280049278
2019-11-12 14:09:43 -08:00
Bhasker Hariharan 66ebb6575f Add support for TIME_WAIT timeout.
This change adds explicit support for honoring the 2MSL timeout
for sockets in TIME_WAIT state. It also adds support for the
TCP_LINGER2 option that allows modification of the FIN_WAIT2
state timeout duration for a given socket.

It also adds an option to modify the Stack wide TIME_WAIT timeout
but this is only for testing. On Linux this is fixed at 60s.

Further, we also now correctly process RST's in CLOSE_WAIT and
close the socket similar to linux without moving it to error
state.

We also now handle SYN in ESTABLISHED state as per
RFC5961#section-4.1. Earlier we would just drop these SYNs.
Which can result in some tests that pass on linux to fail on
gVisor.

Netstack now honors TIME_WAIT correctly as well as handles the
following cases correctly.

- TCP RSTs in TIME_WAIT are ignored.
- A duplicate TCP FIN during TIME_WAIT extends the TIME_WAIT
  and a dup ACK is sent in response to the FIN as the dup FIN
  indicates potential loss of the original final ACK.
- An out of order segment during TIME_WAIT generates a dup ACK.
- A new SYN w/ a sequence number > the highest sequence number
  in the previous connection closes the TIME_WAIT early and
  opens a new connection.

Further to make the SYN case work correctly the ISN (Initial
Sequence Number) generation for Netstack has been updated to
be as per RFC. Its not a pure random number anymore and follows
the recommendation in https://tools.ietf.org/html/rfc6528#page-3.

The current hash used is not a cryptographically secure hash
function. A separate change will update the hash function used
to Siphash similar to what is used in Linux.

PiperOrigin-RevId: 279106406
2019-11-07 09:46:55 -08:00
Ghanan Gowripalan 0c424ea731 Rename nicid to nicID to follow go-readability initialisms
https://github.com/golang/go/wiki/CodeReviewComments#initialisms

This change does not introduce any new functionality. It just renames variables
from `nicid` to `nicID`.

PiperOrigin-RevId: 278992966
2019-11-06 19:41:25 -08:00
Ghanan Gowripalan e63db5e7bb Discover default routers from Router Advertisements
This change allows the netstack to do NDP's Router Discovery as outlined by
RFC 4861 section 6.3.4.

Note, this change will not break existing uses of netstack as the default
configuration for the stack options is set in such a way that Router Discovery
will not be performed. See `stack.Options` and `stack.NDPConfigurations` for
more details.

This change introduces 2 options required to take advantage of Router Discovery,
all available under NDPConfigurations:
- HandleRAs: Whether or not NDP RAs are processes
- DiscoverDefaultRouters: Whether or not Router Discovery is performed

Another note: for a NIC to process Router Advertisements, it must not be a
router itself. Currently the netstack does not have per-interface routing
configuration; the routing/forwarding configuration is controlled stack-wide.
Therefore, if the stack is configured to enable forwarding/routing, no Router
Advertisements will be processed.

Tests: Unittest to make sure that Router Discovery and updates to the routing
table only occur if explicitly configured to do so. Unittest to make sure at
max stack.MaxDiscoveredDefaultRouters discovered default routers are remembered.
PiperOrigin-RevId: 278965143
2019-11-06 16:29:58 -08:00
Kevin Krakauer e1b21f3c8c Use PacketBuffers, rather than VectorisedViews, in netstack.
PacketBuffers are analogous to Linux's sk_buff. They hold all information about
a packet, headers, and payload. This is important for:

* iptables to access various headers of packets
* Preventing the clutter of passing different net and link headers along with
  VectorisedViews to packet handling functions.

This change only affects the incoming packet path, and a future change will
change the outgoing path.

Benchmark               Regular         PacketBufferPtr  PacketBufferConcrete
--------------------------------------------------------------------------------
BM_Recvmsg             400.715MB/s      373.676MB/s      396.276MB/s
BM_Sendmsg             361.832MB/s      333.003MB/s      335.571MB/s
BM_Recvfrom            453.336MB/s      393.321MB/s      381.650MB/s
BM_Sendto              378.052MB/s      372.134MB/s      341.342MB/s
BM_SendmsgTCP/0/1k     353.711MB/s      316.216MB/s      322.747MB/s
BM_SendmsgTCP/0/2k     600.681MB/s      588.776MB/s      565.050MB/s
BM_SendmsgTCP/0/4k     995.301MB/s      888.808MB/s      941.888MB/s
BM_SendmsgTCP/0/8k     1.517GB/s        1.274GB/s        1.345GB/s
BM_SendmsgTCP/0/16k    1.872GB/s        1.586GB/s        1.698GB/s
BM_SendmsgTCP/0/32k    1.017GB/s        1.020GB/s        1.133GB/s
BM_SendmsgTCP/0/64k    475.626MB/s      584.587MB/s      627.027MB/s
BM_SendmsgTCP/0/128k   416.371MB/s      503.434MB/s      409.850MB/s
BM_SendmsgTCP/0/256k   323.449MB/s      449.599MB/s      388.852MB/s
BM_SendmsgTCP/0/512k   243.992MB/s      267.676MB/s      314.474MB/s
BM_SendmsgTCP/0/1M     95.138MB/s       95.874MB/s       95.417MB/s
BM_SendmsgTCP/0/2M     96.261MB/s       94.977MB/s       96.005MB/s
BM_SendmsgTCP/0/4M     96.512MB/s       95.978MB/s       95.370MB/s
BM_SendmsgTCP/0/8M     95.603MB/s       95.541MB/s       94.935MB/s
BM_SendmsgTCP/0/16M    94.598MB/s       94.696MB/s       94.521MB/s
BM_SendmsgTCP/0/32M    94.006MB/s       94.671MB/s       94.768MB/s
BM_SendmsgTCP/0/64M    94.133MB/s       94.333MB/s       94.746MB/s
BM_SendmsgTCP/0/128M   93.615MB/s       93.497MB/s       93.573MB/s
BM_SendmsgTCP/0/256M   93.241MB/s       95.100MB/s       93.272MB/s
BM_SendmsgTCP/1/1k     303.644MB/s      316.074MB/s      308.430MB/s
BM_SendmsgTCP/1/2k     537.093MB/s      584.962MB/s      529.020MB/s
BM_SendmsgTCP/1/4k     882.362MB/s      939.087MB/s      892.285MB/s
BM_SendmsgTCP/1/8k     1.272GB/s        1.394GB/s        1.296GB/s
BM_SendmsgTCP/1/16k    1.802GB/s        2.019GB/s        1.830GB/s
BM_SendmsgTCP/1/32k    2.084GB/s        2.173GB/s        2.156GB/s
BM_SendmsgTCP/1/64k    2.515GB/s        2.463GB/s        2.473GB/s
BM_SendmsgTCP/1/128k   2.811GB/s        3.004GB/s        2.946GB/s
BM_SendmsgTCP/1/256k   3.008GB/s        3.159GB/s        3.171GB/s
BM_SendmsgTCP/1/512k   2.980GB/s        3.150GB/s        3.126GB/s
BM_SendmsgTCP/1/1M     2.165GB/s        2.233GB/s        2.163GB/s
BM_SendmsgTCP/1/2M     2.370GB/s        2.219GB/s        2.453GB/s
BM_SendmsgTCP/1/4M     2.005GB/s        2.091GB/s        2.214GB/s
BM_SendmsgTCP/1/8M     2.111GB/s        2.013GB/s        2.109GB/s
BM_SendmsgTCP/1/16M    1.902GB/s        1.868GB/s        1.897GB/s
BM_SendmsgTCP/1/32M    1.655GB/s        1.665GB/s        1.635GB/s
BM_SendmsgTCP/1/64M    1.575GB/s        1.547GB/s        1.575GB/s
BM_SendmsgTCP/1/128M   1.524GB/s        1.584GB/s        1.580GB/s
BM_SendmsgTCP/1/256M   1.579GB/s        1.607GB/s        1.593GB/s

PiperOrigin-RevId: 278940079
2019-11-06 14:25:59 -08:00
Ghanan Gowripalan a824b48cea Validate incoming NDP Router Advertisements, as per RFC 4861 section 6.1.2
This change validates incoming NDP Router Advertisements as per RFC 4861 section
6.1.2. It also includes the skeleton to handle Router Advertiements that arrive
on some NIC.

Tests: Unittest to make sure only valid NDP Router Advertisements are received/
not dropped.
PiperOrigin-RevId: 278891972
2019-11-06 10:39:29 -08:00
Kevin Krakauer 3246040447 Deep copy dispatcher views.
When VectorisedViews were passed up the stack from packet_dispatchers, we were
passing a sub-slice of the dispatcher's views fields. The dispatchers then
immediately set those views to nil.

This wasn't caught before because every implementer copied the data in these
views before returning.

PiperOrigin-RevId: 277615351
2019-10-30 17:12:57 -07:00
Andrei Vagin db37483cb6 Store endpoints inside multiPortEndpoint in a sorted order
It is required to guarantee the same order of endpoints after save/restore.

PiperOrigin-RevId: 277598665
2019-10-30 15:33:41 -07:00
Ian Gudger dc21c5ca16 Add Close and Wait methods to stack.
Link endpoints still don't have a unified way to be requested to stop.

Updates #837

PiperOrigin-RevId: 277398952
2019-10-29 17:22:32 -07:00
Ian Gudger a2c51efe36 Add endpoint tracking to the stack.
In the future this will replace DanglingEndpoints. DanglingEndpoints must be
kept for now due to issues with save/restore.

This is arguably a cleaner design and allows the stack to know which transport
endpoints might still be using its link endpoints.

Updates #837

PiperOrigin-RevId: 277386633
2019-10-29 16:14:51 -07:00
Ian Gudger 7d80e85835 Allow waiting for Endpoint worker goroutines to finish.
Updates #837

PiperOrigin-RevId: 277325162
2019-10-29 11:32:48 -07:00
Ghanan Gowripalan 0864549ecc Use the user supplied TCP MSS when creating a new active socket
This change supports using a user supplied TCP MSS for new active TCP
connections. Note, the user supplied MSS must be less than or equal to the
maximum possible MSS for a TCP connection's route. If it is greater than the
maximum possible MSS, the maximum possible MSS will be used as the connection's
MSS instead.

This change does not use this user supplied MSS for connections accepted from
listening sockets - that will come in a later change.

Test: Test that outgoing TCP SYN segments contain a TCP MSS option with the user
supplied MSS if it is not greater than the maximum possible MSS for the route.
PiperOrigin-RevId: 277185125
2019-10-28 18:20:36 -07:00
Ghanan Gowripalan f034790ad8 Use interface-specific NDP configurations instead of the stack-wide default.
This change makes it so that NDP work is done using the per-interface NDP
configurations instead of the stack-wide default NDP configurations to correctly
implement RFC 4861 section 6.3.2 (note here, a host is a single NIC operating
as a host device), and RFC 4862 section 5.1.

Test: Test that we can set NDP configurations on a per-interface basis without
affecting the configurations of other interfaces or the stack-wide default. Also
make sure that after the configurations are updated, the updated configurations
are used for NDP processes (e.g. Duplicate Address Detection).
PiperOrigin-RevId: 276525661
2019-10-24 11:09:18 -07:00
Ghanan Gowripalan de3dbf8a09 Inform netstack integrator when Duplicate Address Detection completes
This change introduces a new interface, stack.NDPDispatcher. It can be
implemented by the netstack integrator to receive NDP related events. As of this
change, only DAD related events are supported.

Tests: Existing tests were modified to use the NDPDispatcher's DAD events for
DAD tests where it needed to wait for DAD completing (failing and resolving).
PiperOrigin-RevId: 276338733
2019-10-23 13:26:35 -07:00
Ghanan Gowripalan c356fe2ebb Respect new PrimaryEndpointBehavior when addresses gets promoted to permanent
This change makes sure that when an address which is already known by a NIC and
has kind = permanentExpired gets promoted to permanent, the new
PrimaryEndpointBehavior is respected.

PiperOrigin-RevId: 276136317
2019-10-22 13:54:33 -07:00
Andrei Vagin 8720bd643e netstack/tcp: software segmentation offload
Right now, we send each tcp packet separately, we call one system
call per-packet. This patch allows to generate multiple tcp packets
and send them by sendmmsg.

The arguable part of this CL is a way how to handle multiple headers.
This CL adds the next field to the Prepandable buffer.

Nginx test results:

Server Software:        nginx/1.15.9
Server Hostname:        10.138.0.2
Server Port:            8080

Document Path:          /10m.txt
Document Length:        10485760 bytes

w/o gso:
Concurrency Level:      5
Time taken for tests:   5.491 seconds
Complete requests:      100
Failed requests:        0
Total transferred:      1048600200 bytes
HTML transferred:       1048576000 bytes
Requests per second:    18.21 [#/sec] (mean)
Time per request:       274.525 [ms] (mean)
Time per request:       54.905 [ms] (mean, across all concurrent requests)
Transfer rate:          186508.03 [Kbytes/sec] received

sw-gso:

Concurrency Level:      5
Time taken for tests:   3.852 seconds
Complete requests:      100
Failed requests:        0
Total transferred:      1048600200 bytes
HTML transferred:       1048576000 bytes
Requests per second:    25.96 [#/sec] (mean)
Time per request:       192.576 [ms] (mean)
Time per request:       38.515 [ms] (mean, across all concurrent requests)
Transfer rate:          265874.92 [Kbytes/sec] received

w/o gso:
$ ./tcp_benchmark --client --duration 15  --ideal
[SUM]  0.0-15.1 sec  2.20 GBytes  1.25 Gbits/sec

software gso:
$ tcp_benchmark --client --duration 15  --ideal --gso $((1<<16)) --swgso
[SUM]  0.0-15.1 sec  3.99 GBytes  2.26 Gbits/sec

PiperOrigin-RevId: 276112677
2019-10-22 11:55:56 -07:00
Ghanan Gowripalan fb69de696b Auto-generate an IPv6 link-local address based on the NIC's MAC Address.
This change adds support for optionally auto-generating an IPv6 link-local
address based on the NIC's MAC Address on NIC enable.

Note, this change will not break existing uses of netstack as the default
configuration for the stack options is set in such a way that a link-local
address will not be auto-generated unless the stack is explicitly configured.
See `stack.Options` for more details. Specifically, see
`stack.Options.AutoGenIPv6LinkLocal`.

Tests: Tests to make sure that the IPb6 link-local address is only
auto-generated if the stack is specifically configured to do so. Also tests to
make sure that an auto-generated address goes through the DAD process.
PiperOrigin-RevId: 276059813
2019-10-22 07:26:54 -07:00
Kevin Krakauer 12235d533a AF_PACKET support for netstack (aka epsocket).
Like (AF_INET, SOCK_RAW) sockets, AF_PACKET sockets require CAP_NET_RAW. With
runsc, you'll need to pass `--net-raw=true` to enable them.

Binding isn't supported yet.

PiperOrigin-RevId: 275909366
2019-10-21 13:23:18 -07:00
Tamir Duberstein 51538c973e Store primary endpoints in a slice
There's no need for a linked list here.

PiperOrigin-RevId: 275565920
2019-10-18 16:14:09 -07:00
Tamir Duberstein 4e6f3a0c71 Remove restrictions on the sending address
It is quite legal to send from the ANY address (it is required for
DHCP). I can't figure out why the broadcast address was included here,
so removing that as well.

PiperOrigin-RevId: 275541954
2019-10-18 14:10:30 -07:00
Ghanan Gowripalan 962aa235de NDP Neighbor Solicitations sent during DAD must have an IP hop limit of 255
NDP Neighbor Solicitations sent during Duplicate Address Detection must have an
IP hop limit of 255, as all NDP Neighbor Solicitations should have.

Test: Test that DAD messages have the IPv6 hop limit field set to 255.
PiperOrigin-RevId: 275321680
2019-10-17 13:06:15 -07:00
Ghanan Gowripalan 06ed9e329d Do Duplicate Address Detection on permanent IPv6 addresses.
This change adds support for Duplicate Address Detection on IPv6 addresses
as defined by RFC 4862 section 5.4.

Note, this change will not break existing uses of netstack as the default
configuration for the stack options is set in such a way that DAD will not be
performed. See `stack.Options` and `stack.NDPConfigurations` for more details.

Tests: Tests to make sure that the DAD process properly resolves or fails.
That is, tests make sure that DAD resolves only if:
  - No other node is performing DAD for the same address
  - No other node owns the same address
PiperOrigin-RevId: 275189471
2019-10-16 22:54:45 -07:00
Tamir Duberstein db1ca5c786 Set NDP hop limit in accordance with RFC 4861
...and do not populate link address cache at dispatch. This partially
reverts 313c767b00, which caused malformed
packets (e.g. NDP Neighbor Adverts with incorrect hop limit values) to
populate the address cache. In particular, this masked a bug that was
introduced to the Neighbor Advert generation code in
7c1587e340.

PiperOrigin-RevId: 274865182
2019-10-15 12:43:25 -07:00
gVisor bot bfa0bb24dd Internal change.
PiperOrigin-RevId: 274700093
2019-10-14 17:46:52 -07:00
Kevin Krakauer 2302afb53d Reorder BUILD license and load functions in netstack.
PiperOrigin-RevId: 274672346
2019-10-14 15:21:59 -07:00
gVisor bot bf870c1a42 Internal change.
PiperOrigin-RevId: 273861936
2019-10-09 17:56:05 -07:00
Ian Gudger 7c1587e340 Implement IP_TTL.
Also change the default TTL to 64 to match Linux.

PiperOrigin-RevId: 273430341
2019-10-07 19:29:51 -07:00
Chris Kuiper 4874525161 Implement proper local broadcast behavior
The behavior for sending and receiving local broadcast (255.255.255.255)
traffic is as follows:

Outgoing
--------
* A broadcast packet sent on a socket that is bound to an interface goes out
  that interface
* A broadcast packet sent on an unbound socket follows the route table to
  select the outgoing interface
  + if an explicit route entry exists for 255.255.255.255/32, use that one
  + else use the default route
* Broadcast packets are looped back and delivered following the rules for
  incoming packets (see next). This is the same behavior as for multicast
  packets, except that it cannot be disabled via sockopt.

Incoming
--------
* Sockets wishing to receive broadcast packets must bind to either INADDR_ANY
  (0.0.0.0) or INADDR_BROADCAST (255.255.255.255). No other socket receives
  broadcast packets.
* Broadcast packets are multiplexed to all sockets matching it. This is the
  same behavior as for multicast packets.
* A socket can bind to 255.255.255.255:<port> and then receive its own
  broadcast packets sent to 255.255.255.255:<port>

In addition, this change implicitly fixes an issue with multicast reception. If
two sockets want to receive a given multicast stream and one is bound to ANY
while the other is bound to the multicast address, only one of them will
receive the traffic.

PiperOrigin-RevId: 272792377
2019-10-03 19:31:35 -07:00
Bhasker Hariharan 61f6fbd0ce Fix bugs in PickEphemeralPort for TCP.
Netstack always picks a random start point everytime PickEphemeralPort
is called. While this is required for UDP so that DNS requests go
out through a randomized set of ports it is not required for TCP. Infact
Linux explicitly hashes the (srcip, dstip, dstport) and a one time secret
initialized at start of the application to get a random offset. But to
ensure it doesn't start from the same point on every scan it uses a static
hint that is incremented by 2 in every call to pick ephemeral ports.

The reason for 2 is Linux seems to split the port ranges where active connects
seem to use even ones while odd ones are used by listening sockets.

This CL implements a similar strategy where we use a hash + hint to generate
the offset to start the search for a free Ephemeral port.

This ensures that we cycle through the available port space in order for
repeated connects to the same destination and significantly reduces the
chance of picking a recently released port.

PiperOrigin-RevId: 272058370
2019-09-30 13:55:22 -07:00
gVisor bot abbee5615f Implement SO_BINDTODEVICE sockopt
PiperOrigin-RevId: 271644926
2019-09-27 14:14:04 -07:00
Kevin Krakauer 59ccbb1044 Remove centralized registration of protocols.
Also removes the need for protocol names.

PiperOrigin-RevId: 271186030
2019-09-25 12:57:05 -07:00
Chris Kuiper 6704d625ef Return only primary addresses in Stack.NICInfo()
Non-primary addresses are used for endpoints created to accept multicast and
broadcast packets, as well as "helper" endpoints (0.0.0.0) that allow sending
packets when no proper address has been assigned yet (e.g., for DHCP). These
addresses are not real addresses from a user point of view and should not be
part of the NICInfo() value. Also see b/127321246 for more info.

This switches NICInfo() to call a new NIC.PrimaryAddresses() function. To still
allow an option to get all addresses (mostly for testing) I added
Stack.GetAllAddresses() and NIC.AllAddresses().

In addition, the return value for GetMainNICAddress() was changed for the case
where the NIC has no primary address. Instead of returning an error here,
it now returns an empty AddressWithPrefix() value. The rational for this
change is that it is a valid case for a NIC to have no primary addresses.

Lastly, I refactored the code based on the new additions.

PiperOrigin-RevId: 270971764
2019-09-24 13:21:20 -07:00
Tamir Duberstein bbaaa1fcc2 Simplify ICMPRateLimiter
https://github.com/golang/time/commit/c4c64ca added SetBurst upstream.

PiperOrigin-RevId: 270925077
2019-09-24 09:50:51 -07:00
Andrei Vagin 03ee55cc62 netstack: convert more socket options to {Set,Get}SockOptInt
PiperOrigin-RevId: 270763208
2019-09-23 14:39:14 -07:00
Ian Gudger 002f1d4aae Allow waiting for LinkEndpoint worker goroutines to finish.
Previously, the only safe way to use an fdbased endpoint was to leak the FD.
This change makes it possible to safely close the FD.

This is the first step towards having stoppable stacks.

Updates #837

PiperOrigin-RevId: 270346582
2019-09-20 14:10:02 -07:00
Ghanan Gowripalan 60fe8719e1 Automated rollback of changelist 268047073
PiperOrigin-RevId: 269658971
2019-09-17 14:47:09 -07:00
Adin Scannell 7c6ab6a219 Implement splice methods for pipes and sockets.
This also allows the tee(2) implementation to be enabled, since dup can now be
properly supported via WriteTo.

Note that this change necessitated some minor restructoring with the
fs.FileOperations splice methods. If the *fs.File is passed through directly,
then only public API methods are accessible, which will deadlock immediately
since the locking is already done by fs.Splice. Instead, we pass through an
abstract io.Reader or io.Writer, which elide locks and use the underlying
fs.FileOperations directly.

PiperOrigin-RevId: 268805207
2019-09-12 17:43:27 -07:00
Michael Pratt df5d377521 Remove go_test from go_stateify and go_marshal
They are no-ops, so the standard rule works fine.

PiperOrigin-RevId: 268776264
2019-09-12 15:10:17 -07:00
Ghanan Gowripalan 857940d30d Automated rollback of changelist 268047073
PiperOrigin-RevId: 268757842
2019-09-12 13:52:25 -07:00
Ghanan Gowripalan a8943325db Join IPv6 all-nodes and solicited-node multicast addresses where appropriate.
The IPv6 all-nodes multicast address will be joined on NIC enable, and the
appropriate IPv6 solicited-node multicast address will be joined when IPv6
addresses are added.

Tests: Test receiving packets destined to the IPv6 link-local all-nodes
multicast address and the IPv6 solicted node address of an added IPv6 address.
PiperOrigin-RevId: 268047073
2019-09-09 12:06:06 -07:00
Ian Gudger fe1f521077 Remove reundant global tcpip.LinkEndpointID.
PiperOrigin-RevId: 267709597
2019-09-06 18:01:14 -07:00
Chris Kuiper 7bf1d426d5 Handle subnet and broadcast addresses correctly with NIC.subnets
This also renames "subnet" to "addressRange" to avoid any more confusion with
an interface IP's subnet.

Lastly, this also removes the Stack.ContainsSubnet(..) API since it isn't used
by anyone. Plus the same information can be obtained from
Stack.NICAddressRanges().

PiperOrigin-RevId: 267229843
2019-09-04 14:19:32 -07:00
Bhasker Hariharan 3789c34b22 Make UDP traceroute work.
Adds support to generate Port Unreachable messages for UDP
datagrams received on a port for which there is no valid
endpoint.

Fixes #703

PiperOrigin-RevId: 267034418
2019-09-03 16:01:17 -07:00
Chris Kuiper afbdf2f212 Fix data race accessing referencedNetworkEndpoint.kind
Wrapping "kind" into atomic access functions.

Fixes #789

PiperOrigin-RevId: 266485501
2019-08-30 17:23:53 -07:00
Tamir Duberstein 24ecce5dbf Export generated linkAddrEntryEntry
PiperOrigin-RevId: 266000128
2019-08-28 14:56:33 -07:00
Tamir Duberstein 313c767b00 Populate link address cache at dispatch
This allows the stack to learn remote link addresses on incoming
packets, reducing the need to ARP to send responses.

This also reduces the number of round trips to the system clock,
since that may also prove to be performance-sensitive.

Fixes #739.

PiperOrigin-RevId: 265815816
2019-08-27 18:54:56 -07:00
Chris Kuiper ac2200b8a9 Prevent a network endpoint to send/rcv if its address was removed
This addresses the problem where an endpoint has its address removed but still
has outstanding references held by routes used in connected TCP/UDP sockets
which prevent the removal of the endpoint.

The fix adds a new "expired" flag to the referenced network endpoint, which is
set when an endpoint has its address removed. Incoming packets are not
delivered to an expired endpoint (unless in promiscuous mode), while sending
outgoing packets triggers an error to the caller (unless in spoofing mode).

In addition, a few helper functions were added to stack_test.go to reduce
code duplications.

PiperOrigin-RevId: 265514326
2019-08-26 12:29:47 -07:00
Tamir Duberstein 573e6e4bba Use tcpip.Subnet in tcpip.Route
This is the first step in replacing some of the redundant types with the
standard library equivalents.

PiperOrigin-RevId: 264706552
2019-08-21 15:31:18 -07:00
Andrei Vagin 3e4102b2ea netstack: disconnect an unix socket only if the address family is AF_UNSPEC
Linux allows to call connect for ANY and the zero port.

PiperOrigin-RevId: 263892534
2019-08-16 19:32:14 -07:00
Chris Kuiper f7114e0a27 Add subnet checking to NIC.findEndpoint and consolidate with NIC.getRef
This adds the same logic to NIC.findEndpoint that is already done in
NIC.getRef. Since this makes the two functions very similar they were combined
into one with the originals being wrappers.

PiperOrigin-RevId: 263864708
2019-08-16 15:58:58 -07:00
Tamir Duberstein d81d94ac4c Replace uinptr with int64 when returning lengths
This is in accordance with newer parts of the standard library.

PiperOrigin-RevId: 263449916
2019-08-14 16:05:56 -07:00
Rahat Mahmood 13a98df49e netstack: Don't start endpoint goroutines too soon on restore.
Endpoint protocol goroutines were previously started as part of
loading the endpoint. This is potentially too soon, as resources used
by these goroutine may not have been loaded. Protocol goroutines may
perform meaningful work as soon as they're started (ex: incoming
connect) which can cause them to indirectly access resources that
haven't been loaded yet.

This CL defers resuming all protocol goroutines until the end of
restore.

PiperOrigin-RevId: 262409429
2019-08-08 12:33:11 -07:00
Kevin Krakauer 810cc07aab Plumbing for iptables sockopts.
PiperOrigin-RevId: 261413396
2019-08-02 16:26:48 -07:00
Rahat Mahmood 2906dffcdb Automated rollback of changelist 261191548
PiperOrigin-RevId: 261373749
2019-08-02 12:52:40 -07:00
Rahat Mahmood 79511e8a50 Implement getsockopt(TCP_INFO).
Export some readily-available fields for TCP_INFO and stub out the rest.

PiperOrigin-RevId: 261191548
2019-08-01 13:58:48 -07:00
Tamir Duberstein 7369c63e42 Pass ProtocolAddress instead of its fields
PiperOrigin-RevId: 260803517
2019-07-30 15:06:39 -07:00
Chris Kuiper 40e682759f Add support for a subnet prefix length on interface network addresses
This allows the user code to add a network address with a subnet prefix length.
The prefix length value is stored in the network endpoint and provided back to
the user in the ProtocolAddress type.

PiperOrigin-RevId: 259807693
2019-07-24 13:42:14 -07:00
Andrei Vagin eefa817cfd net/tcp/setockopt: impelment setsockopt(fd, SOL_TCP, TCP_INQ)
PiperOrigin-RevId: 258859507
2019-07-18 15:41:04 -07:00
Kevin Krakauer 9b4d3280e1 Add IPPROTO_RAW, which allows raw sockets to write IP headers.
iptables also relies on IPPROTO_RAW in a way. It opens such a socket to
manipulate the kernel's tables, but it doesn't actually use any of the
functionality. Blegh.

PiperOrigin-RevId: 257903078
2019-07-12 18:09:12 -07:00
Bhasker Hariharan 3d71c627fa Add support for TCP receive buffer auto tuning.
The implementation is similar to linux where we track the number of bytes
consumed by the application to grow the receive buffer of a given TCP endpoint.

This ensures that the advertised window grows at a reasonable rate to accomodate
for the sender's rate and prevents large amounts of data being held in stack
buffers if the application is not actively reading or not reading fast enough.

The original paper that was used to implement the linux receive buffer auto-
tuning is available @ https://public.lanl.gov/radiant/pubs/drs/lacsi2001.pdf

NOTE: Linux does not implement DRS as defined in that paper, it's just a good
reference to understand the solution space.

Updates #230

PiperOrigin-RevId: 253168283
2019-06-13 22:28:01 -07:00
Adin Scannell add40fd6ad Update canonical repository.
This can be merged after:
https://github.com/google/gvisor-website/pull/77
  or
https://github.com/google/gvisor-website/pull/78

PiperOrigin-RevId: 253132620
2019-06-13 16:50:15 -07:00
Rahat Mahmood 2d2831e354 Track and export socket state.
This is necessary for implementing network diagnostic interfaces like
/proc/net/{tcp,udp,unix} and sock_diag(7).

For pass-through endpoints such as hostinet, we obtain the socket
state from the backend. For netstack, we add explicit tracking of TCP
states.

PiperOrigin-RevId: 251934850
2019-06-06 15:04:47 -07:00