Commit Graph

444 Commits

Author SHA1 Message Date
Kevin Krakauer aeb3a4017b Working on filtering by protocol. 2020-01-08 22:10:35 -08:00
Kevin Krakauer 06e2366e96 Merge branch 'iptables-write' into iptables-write-input-drop 2020-01-08 20:05:02 -08:00
Ghanan Gowripalan d057871f41 CancellableTimer to encapsulate the work of safely stopping timers
Add a new CancellableTimer type to encapsulate the work of safely stopping
timers when it fires at the same time some "related work" is being handled. The
term "related work" is some work that needs to be done while having obtained
some common lock (L).

Example: Say we have an invalidation timer that may be extended or cancelled by
some event. Creating a normal timer and simply cancelling may not be sufficient
as the timer may have already fired when the event handler attemps to cancel it.
Even if the timer and event handler obtains L before doing work, once the event
handler releases L, the timer will eventually obtain L and do some unwanted
work.

To prevent the timer from doing unwanted work, it checks if it should early
return instead of doing the normal work after obtaining L. When stopping the
timer callers must have L locked so the timer can be safely informed that it
should early return.

Test: Tests that CancellableTimer fires and resets properly. Test to make sure
the timer fn is not called after being stopped within the lock L.
PiperOrigin-RevId: 288806984
2020-01-08 17:50:54 -08:00
Kevin Krakauer ae060a63d9 More GH comments. 2020-01-08 17:30:08 -08:00
Kevin Krakauer 0999ae8b34 Getting a panic when running tests. For some reason the filter table is
ending up with the wrong chains and is indexing -1 into rules.
2020-01-08 15:57:25 -08:00
Tamir Duberstein d530df2f95 Introduce tcpip.SockOptBool
...and port V6OnlyOption to it.

PiperOrigin-RevId: 288789451
2020-01-08 15:40:48 -08:00
Bert Muthalaly e21c584056 Combine various Create*NIC methods into CreateNICWithOptions.
PiperOrigin-RevId: 288779416
2020-01-08 14:50:49 -08:00
Kevin Krakauer b2a881784c Built dead-simple traversal, but now getting depedency cycle error :'( 2020-01-08 14:48:47 -08:00
Tamir Duberstein a271bccfc6 Rename tcpip.SockOpt{,Int}
PiperOrigin-RevId: 288772878
2020-01-08 14:20:07 -08:00
Kevin Krakauer 446a250996 Comment cleanup. 2020-01-08 11:20:48 -08:00
Kevin Krakauer 1e1921e2ac Minor fixes to comments and logging 2020-01-08 11:15:46 -08:00
Tamir Duberstein 9df018767c Remove redundant function argument
PacketLooping is already a member on the passed Route.

PiperOrigin-RevId: 288721500
2020-01-08 10:22:51 -08:00
Kevin Krakauer 8cc1c35bbd Write simple ACCEPT rules to the filter table.
This gets us closer to passing the iptables tests and opens up iptables
so it can be worked on by multiple people.

A few restrictions are enforced for security (i.e. we don't want to let
users write a bunch of iptables rules and then just not enforce them):

- Only the filter table is writable.
- Only ACCEPT rules with no matching criteria can be added.
2020-01-08 10:08:14 -08:00
Bert Muthalaly 0cc1e74b57 Add NIC.isLoopback()
...enabling us to remove the "CreateNamedLoopbackNIC" variant of
CreateNIC and all the plumbing to connect it through to where the value
is read in FindRoute.

PiperOrigin-RevId: 288713093
2020-01-08 09:30:20 -08:00
Marek Majkowski c276e4740f Fix #1522 - implement silly window sydrome protection on rx side
Before, each of small read()'s that raises window either from zero
or above threshold of aMSS, would generate an ACK. In a classic
silly-window-syndrome scenario, we can imagine a pessimistic case
when small read()'s generate a stream of ACKs.

This PR fixes that, essentially treating window size < aMSS as zero.
We send ACK exactly in a moment when window increases to >= aMSS
or half of receive buffer size (whichever smaller).
2020-01-08 12:56:39 +00:00
Ghanan Gowripalan 4e19d165cc Support deprecating SLAAC addresses after the preferred lifetime
Support deprecating network endpoints on a NIC. If an endpoint is deprecated, it
should not be used for new connections unless a more preferred endpoint is not
available, or unless the deprecated endpoint was explicitly requested.

Test: Test that deprecated endpoints are only returned when more preferred
endpoints are not available and SLAAC addresses are deprecated after its
preferred lifetime
PiperOrigin-RevId: 288562705
2020-01-07 13:42:08 -08:00
Marek Majkowski 08a97a6d19 #1398 - send ACK when available buffer space gets larger than 1 MSS
When receiving data, netstack avoids sending spurious acks. When
user does recv() should netstack send ack telling the sender that
the window was increased? It depends. Before this patch, netstack
_will_ send the ack in the case when window was zero or window >>
scale was zero. Basically - when recv space increased from zero.

This is not working right with silly-window-avoidance on the sender
side. Some network stacks refuse to transmit segments, that will fill
the window but are below MSS. Before this patch, this confuses
netstack. On one hand if the window was like 3 bytes, netstack
will _not_ send ack if the window increases. On the other hand
sending party will refuse to transmit 3-byte packet.

This patch changes that, making netstack will send an ACK when
the available buffer size increases to or above 1*MSS. This will
inform other party buffer is large enough, and hopefully uncork it.


Signed-off-by: Marek Majkowski <marek@cloudflare.com>
2020-01-07 20:35:39 +00:00
Ghanan Gowripalan 2031cc4701 Disable auto-generation of IPv6 link-local addresses for loopback NICs
Test: Test that an IPv6 link-local address is not auto-generated for loopback
NICs, even when it is enabled for non-loopback NICS.
PiperOrigin-RevId: 288519591
2020-01-07 10:09:39 -08:00
Ghanan Gowripalan 8dfd922840 Pass the NIC-internal name to the NIC name function when generating opaque IIDs
Pass the NIC-internal name to the NIC name function when generating opaque IIDs
so implementations can use the name that was provided when the NIC was created.
Previously, explicit NICID to NIC name resolution was required from the netstack
integrator.

Tests: Test that the name provided when creating a NIC is passed to the NIC name
function when generating opaque IIDs.
PiperOrigin-RevId: 288395359
2020-01-06 16:21:31 -08:00
Ghanan Gowripalan 83ab47e87b Use opaque interface identifiers when generating IPv6 addresses via SLAAC
Support using opaque interface identifiers when generating IPv6 addresses via
SLAAC when configured to do so.

Note, this change does not handle retries in response to DAD conflicts yet.
That will also come in a later change.

Test: Test that when SLAAC addresses are generated, they use opaque interface
identifiers when configured to do so.
PiperOrigin-RevId: 288078605
2020-01-03 18:28:22 -08:00
Ghanan Gowripalan d1d878a801 Support generating opaque interface identifiers as defined by RFC 7217
Support generating opaque interface identifiers as defined by RFC 7217 for
auto-generated IPv6 link-local addresses. Opaque interface identifiers will also
be used for IPv6 addresses auto-generated via SLAAC in a later change.

Note, this change does not handle retries in response to DAD conflicts yet.
That will also come in a later change.

Tests: Test that when configured to generated opaque IIDs, they are properly
generated as outlined by RFC 7217.
PiperOrigin-RevId: 288035349
2020-01-03 13:00:19 -08:00
Marek Majkowski 200cf245c4 netstack: minor fix typo in "if err" handler 2019-12-31 16:49:30 +01:00
gVisor bot 87e4d03fdf Automated rollback of changelist 287029703
PiperOrigin-RevId: 287217899
2019-12-26 13:05:52 -08:00
Ryan Heacock e013c48c78 Enable IP_RECVTOS socket option for datagram sockets
Added the ability to get/set the IP_RECVTOS socket option on UDP endpoints. If
enabled, TOS from the incoming Network Header passed as ancillary data in the
ControlMessages.

Test:
* Added unit test to udp_test.go that tests getting/setting as well as
verifying that we receive expected TOS from incoming packet.
* Added a syscall test
PiperOrigin-RevId: 287029703
2019-12-24 08:49:39 -08:00
Ghanan Gowripalan 5bc4ae9d57 Clear any host-specific NDP state when becoming a router
This change supports clearing all host-only NDP state when NICs become routers.
All discovered routers, discovered on-link prefixes and auto-generated addresses
will be invalidated when becoming a router. This is because normally, routers do
not process Router Advertisements to discover routers or on-link prefixes, and
do not do SLAAC.

Tests: Unittest to make sure that all discovered routers, discovered prefixes
and auto-generated addresses get invalidated when transitioning from a host to
a router.
PiperOrigin-RevId: 286902309
2019-12-23 08:55:25 -08:00
Kevin Krakauer 08c39e2587 Change TODO to track correct bug.
PiperOrigin-RevId: 286639163
2019-12-20 14:19:21 -08:00
Andrei Vagin 57ce26c0b4 net/tcp: allow to call listen without bind
When listen(2) is called on an unbound socket, the socket is
automatically bound to a random free port with the local address
set to INADDR_ANY.

PiperOrigin-RevId: 286305906
2019-12-18 18:24:17 -08:00
Ghanan Gowripalan 8e6e87f8e8 Allow 'out-of-line' routing table updates for Router and Prefix discovery events
This change removes the requirement that a new routing table be provided when a
router or prefix discovery event happens so that an updated routing table may
be provided to the stack at a later time from the event.

This change is to address the use case where the netstack integrator may need to
obtain a lock before providing updated routes in response to the events above.

As an example, say we have an integrator that performs the below two operations
operations as described:
A. Normal route update:
  1. Obtain integrator lock
  2. Update routes in the integrator
  3. Call Stack.SetRouteTable with the updated routes
    3.1. Obtain Stack lock
    3.2. Update routes in Stack
    3.3. Release Stack lock
  4. Release integrator lock
B. NDP event triggered route update:
  1. Obtain Stack lock
  2. Call event handler
    2.1. Obtain integrator lock
    2.2. Update routes in the integrator
    2.3. Release integrator lock
    2.4. Return updated routes to update Stack
  3. Update routes in Stack
  4. Release Stack lock

A deadlock may occur if a Normal route update was attemped at the same time an
NDP event triggered route update was attempted. With threads T1 and T2:
1) T1 -> A.1, A.2
2) T2 -> B.1
3) T1 -> A.3 (hangs at A.3.1 since Stack lock is taken in step 2)
4) T2 -> B.2 (hangs at B.2.1 since integrator lock is taken in step 1)

Test: Existing tests were modified to not provide or expect routing table
changes in response to Router and Prefix discovery events.
PiperOrigin-RevId: 286274712
2019-12-18 15:18:33 -08:00
Ghanan Gowripalan 628948b1e1 Cleanup NDP Tests
This change makes sure that test variables are captured before running tests
in parallel, and removes unneeded buffered channel allocations. This change also
removes unnecessary timeouts.

PiperOrigin-RevId: 286255066
2019-12-18 14:24:39 -08:00
gVisor bot 3f4d8fefb4 Internal change.
PiperOrigin-RevId: 286003946
2019-12-17 10:10:06 -08:00
Ghanan Gowripalan ad80dcf470 Properly generate the EUI64 interface identifier from an Ethernet address
Fixed a bug where the interface identifier was not properly generated from an
Ethernet address.

Tests: Unittests to make sure the functions generating the EUI64 interface
identifier are correct.
PiperOrigin-RevId: 285494562
2019-12-13 16:41:41 -08:00
Bhasker Hariharan 6fc9f0aefd Add support for TCP_USER_TIMEOUT option.
The implementation follows the linux behavior where specifying
a TCP_USER_TIMEOUT will cause the resend timer to honor the
user specified timeout rather than the default rto based timeout.

Further it alters when connections are timedout due to keepalive
failures. It does not alter the behavior of when keepalives are
sent. This is as per the linux behavior.

PiperOrigin-RevId: 285099795
2019-12-11 17:52:53 -08:00
Michael Pratt 0d027262e0 Add additional packages to go branch
We're missing several packages that runsc doesn't depend on. Most notable are
several tcpip link packages.

To find packages, I looked at a diff of directories on master vs go:

$ bazel build //:gopath
$ find bazel-bin/gopath/src/gvisor.dev/gvisor/ -type d > /tmp/gopath.txt
$ find . -type d > /tmp/master.txt
$ sed 's|bazel-bin/gopath/src/gvisor.dev/gvisor/||' < /tmp/gopath.txt > /tmp/gopath.trunc.txt
$ sed 's|./||' < /tmp/master.txt > /tmp/master.trunc.txt
$ vimdiff /tmp/gopath.trunc.txt /tmp/master.trunc.txt

Testing packages are still left out because :gopath can't depend on testonly
targets...

PiperOrigin-RevId: 285049029
2019-12-11 14:22:36 -08:00
Ghanan Gowripalan 4ff71b5be4 Inform the integrator on receipt of an NDP Recursive DNS Server option
This change adds support to let an integrator know when it receives an NDP
Router Advertisement message with the NDP Recursive DNS Server option with at
least one DNS server's address. The stack will not maintain any state related to
the DNS servers - the integrator is expected to maintain any required state and
invalidate the servers after its valid lifetime expires, or refresh the lifetime
when a new one is received for a known DNS server.

Test: Unittest to make sure that an event is sent to the integrator when an NDP
Recursive DNS Server option is received with at least one address.
PiperOrigin-RevId: 284890502
2019-12-10 18:05:23 -08:00
Ian Gudger 18af75db9d Add UDP SO_REUSEADDR support to the port manager.
Next steps include adding support to the transport demuxer and the UDP endpoint.

PiperOrigin-RevId: 284652151
2019-12-09 15:53:00 -08:00
Mithun Iyer b1d44be7ad Add TCP stats for connection close and keep-alive timeouts.
Fix bugs in updates to TCP CurrentEstablished stat.

Fixes #1277

PiperOrigin-RevId: 284292459
2019-12-06 17:17:33 -08:00
Bhasker Hariharan 3e84777d2e Fix flakiness in tcp_test.
This change marks the socket as ESTABLISHED and creates the receiver and sender
the moment we send the final ACK in case of an active TCP handshake or when we
receive the final ACK for a passive TCP handshake. Before this change there was
a short window in which an ACK can be received and processed but the state on
the socket is not yet ESTABLISHED.

This can be seen in TestConnectBindToDevice which is flaky because sometimes
the socket is in SYN-SENT and not ESTABLISHED even though the other side has
already received the final ACK of the handshake.

PiperOrigin-RevId: 284277713
2019-12-06 15:46:26 -08:00
Ghanan Gowripalan ab3f7bc393 Do IPv6 Stateless Address Auto-Configuration (SLAAC)
This change allows the netstack to do SLAAC as outlined by RFC 4862 section 5.5.

Note, this change will not break existing uses of netstack as the default
configuration for the stack options is set in such a way that SLAAC
will not be performed. See `stack.Options` and `stack.NDPConfigurations` for
more details.

This change reuses 1 option and introduces a new one that is required to take
advantage of SLAAC, all available under NDPConfigurations:
- HandleRAs: Whether or not NDP RAs are processes
- AutoGenGlobalAddresses: Whether or not SLAAC is performed.

Also note, this change does not deprecate SLAAC generated addresses after the
preferred lifetime. That will come in a later change (b/143713887). Currently,
only the valid lifetime is honoured.

Tests: Unittest to make sure that SLAAC generates and adds addresses only when
configured to do so. Tests also makes sure that conflicts with static addresses
do not modify the static address.
PiperOrigin-RevId: 284265317
2019-12-06 14:41:30 -08:00
Ghanan Gowripalan 10f7b109ab Add a type to represent the NDP Recursive DNS Server option
This change adds a type to represent the NDP Recursive DNS Server option, as
defined by RFC 8106 section 5.1.

PiperOrigin-RevId: 284005493
2019-12-05 10:41:45 -08:00
Andrei Vagin cf7f27c167 net/udp: return a local route address as the bound-to address
If the socket is bound to ANY and connected to a loopback address,
getsockname() has to return the loopback address. Without this fix,
getsockname() returns ANY.

PiperOrigin-RevId: 283647781
2019-12-03 16:32:13 -08:00
Bhasker Hariharan 27e2c4ddca Fix panic due to early transition to Closed.
The code in rcv.consumeSegment incorrectly transitions to
CLOSED state from LAST-ACK before the final ACK for the FIN.

Further if receiving a segment changes a socket to a closed state
then we should not invoke the sender as the socket is now closed
and sending any segments is incorrect.

PiperOrigin-RevId: 283625300
2019-12-03 14:41:55 -08:00
Ghanan Gowripalan 10bbcf97d2 Test handling segments on completed but not yet accepted TCP connections
This change does not introduce any new features, or modify existing ones.

This change tests handling TCP segments right away for connections that were
completed from a listening endpoint.

PiperOrigin-RevId: 282986457
2019-11-28 17:15:07 -08:00
Dean Deng 684f757a22 Add support for receiving TOS and TCLASS control messages in hostinet.
This involves allowing getsockopt/setsockopt for the corresponding socket
options, as well as allowing hostinet to process control messages received from
the actual recvmsg syscall.

PiperOrigin-RevId: 282851425
2019-11-27 16:21:05 -08:00
Kevin Krakauer 1641338b14 Set transport and network headers on outbound packets.
These are necessary for iptables to read and parse headers for packet filtering.

PiperOrigin-RevId: 282372811
2019-11-25 09:37:53 -08:00
Kevin Krakauer 2b1b51f1d7 Fix panic in sniffer.
Packets written via SOCK_RAW are guaranteed to have network headers, but not
transport headers. Check first whether there are enough bytes left in the packet
to contain a transport header before attempting to parse it.

PiperOrigin-RevId: 282363895
2019-11-25 09:11:05 -08:00
Adin Scannell c3b93afeaf Cleanup visibility.
PiperOrigin-RevId: 282194656
2019-11-23 23:54:41 -08:00
Adin Scannell b0a1bbd3e2 Internal change.
PiperOrigin-RevId: 282068093
2019-11-22 16:56:31 -08:00
Ian Gudger 8eb68912e4 Store SO_BINDTODEVICE state at bind.
This allows us to ensure that the correct port reservation is released.

Fixes #1217

PiperOrigin-RevId: 282048155
2019-11-22 15:20:52 -08:00
Kevin Krakauer 9db08c4e58 Use PacketBuffers with GSO.
PiperOrigin-RevId: 282045221
2019-11-22 14:52:35 -08:00
Mithun Iyer f27f38d137 Add segment dequeue check while emptying segment queue.
PiperOrigin-RevId: 282023891
2019-11-22 13:15:33 -08:00
Bhasker Hariharan 5107e6b6bd Automated rollback of changelist 280594395
PiperOrigin-RevId: 280763655
2019-11-15 16:52:34 -08:00
Mithun Iyer 3e534f2974 Handle in-flight TCP segments when moving to CLOSE.
As we move to CLOSE state from LAST-ACK or TIME-WAIT,
ensure that we re-match all in-flight segments to any
listening endpoint.

Also fix LISTEN state handling of any ACK segments as per RFC793.

Fixes #1153

PiperOrigin-RevId: 280703556
2019-11-15 12:11:36 -08:00
Kevin Krakauer 23574b1b87 Fix panic when logging raw packets via sniffer.
Sniffer assumed that outgoing packets have transport headers, but
users can write packets via SOCK_RAW with arbitrary transport headers that
netstack doesn't know about. We now explicitly check for the presence of network
and transport headers before assuming they exist.

PiperOrigin-RevId: 280594395
2019-11-14 22:55:15 -08:00
Kevin Krakauer 3f7d937090 Use PacketBuffers for outgoing packets.
PiperOrigin-RevId: 280455453
2019-11-14 10:15:38 -08:00
Bhasker Hariharan 6dd4c9ee74 Fix flaky behaviour during S/R.
PiperOrigin-RevId: 280280156
2019-11-13 14:40:08 -08:00
Ghanan Gowripalan 3f51bef8cd Do not handle TCP packets that include a non-unicast IP address
This change drops TCP packets with a non-unicast IP address as the source or
destination address as TCP is meant for communication between two endpoints.

Test: Make sure that if the source or destination address contains a non-unicast
address, no TCP packet is sent in response and the packet is dropped.
PiperOrigin-RevId: 280073731
2019-11-12 15:50:02 -08:00
Ghanan Gowripalan 5398530e45 Discover on-link prefixes from Router Advertisements' Prefix Information options
This change allows the netstack to do NDP's Prefix Discovery as outlined by
RFC 4861 section 6.3.4. If configured to do so, when a new on-link prefix is
discovered, the routing table will be updated with a device route through
the nic the RA arrived at. Likewise, when such a prefix gets invalidated, the
device route will be removed.

Note, this change will not break existing uses of netstack as the default
configuration for the stack options is set in such a way that Prefix Discovery
will not be performed. See `stack.Options` and `stack.NDPConfigurations` for
more details.

This change reuses 1 option and introduces a new one that is required to take
advantage of Prefix Discovery, all available under NDPConfigurations:
- HandleRAs: Whether or not NDP RAs are processes
- DiscoverOnLinkPrefixes: Whether or not Prefix Discovery is performed (new)

Another note: for a NIC to process Prefix Information options (in Router
Advertisements), it must not be a router itself. Currently the netstack does not
have per-interface routing configuration; the routing/forwarding configuration
is controlled stack-wide. Therefore, if the stack is configured to enable
forwarding/routing, no router Advertisements (and by extension the Prefix
Information options) will be processed.

Tests: Unittest to make sure that Prefix Discovery and updates to the routing
table only occur if explicitly configured to do so. Unittest to make sure at
max stack.MaxDiscoveredOnLinkPrefixes discovered on-link prefixes are
remembered.
PiperOrigin-RevId: 280049278
2019-11-12 14:09:43 -08:00
Ian Gudger 57a2a5ea33 Add tests for SO_REUSEADDR and SO_REUSEPORT.
* Basic tests for the SO_REUSEADDR and SO_REUSEPORT options.
* SO_REUSEADDR functional tests for TCP and UDP.
* SO_REUSEADDR and SO_REUSEPORT interaction tests for UDP.
* Stubbed support for UDP getsockopt(SO_REUSEADDR).

PiperOrigin-RevId: 280049265
2019-11-12 14:04:14 -08:00
gVisor bot 7730716800 Make `connect` on socket returned by `accept` correctly error out with EISCONN
PiperOrigin-RevId: 279814493
2019-11-11 14:15:06 -08:00
Bhasker Hariharan 66ebb6575f Add support for TIME_WAIT timeout.
This change adds explicit support for honoring the 2MSL timeout
for sockets in TIME_WAIT state. It also adds support for the
TCP_LINGER2 option that allows modification of the FIN_WAIT2
state timeout duration for a given socket.

It also adds an option to modify the Stack wide TIME_WAIT timeout
but this is only for testing. On Linux this is fixed at 60s.

Further, we also now correctly process RST's in CLOSE_WAIT and
close the socket similar to linux without moving it to error
state.

We also now handle SYN in ESTABLISHED state as per
RFC5961#section-4.1. Earlier we would just drop these SYNs.
Which can result in some tests that pass on linux to fail on
gVisor.

Netstack now honors TIME_WAIT correctly as well as handles the
following cases correctly.

- TCP RSTs in TIME_WAIT are ignored.
- A duplicate TCP FIN during TIME_WAIT extends the TIME_WAIT
  and a dup ACK is sent in response to the FIN as the dup FIN
  indicates potential loss of the original final ACK.
- An out of order segment during TIME_WAIT generates a dup ACK.
- A new SYN w/ a sequence number > the highest sequence number
  in the previous connection closes the TIME_WAIT early and
  opens a new connection.

Further to make the SYN case work correctly the ISN (Initial
Sequence Number) generation for Netstack has been updated to
be as per RFC. Its not a pure random number anymore and follows
the recommendation in https://tools.ietf.org/html/rfc6528#page-3.

The current hash used is not a cryptographically secure hash
function. A separate change will update the hash function used
to Siphash similar to what is used in Linux.

PiperOrigin-RevId: 279106406
2019-11-07 09:46:55 -08:00
Ghanan Gowripalan 0c424ea731 Rename nicid to nicID to follow go-readability initialisms
https://github.com/golang/go/wiki/CodeReviewComments#initialisms

This change does not introduce any new functionality. It just renames variables
from `nicid` to `nicID`.

PiperOrigin-RevId: 278992966
2019-11-06 19:41:25 -08:00
gVisor bot adb10f4d53 Internal change.
PiperOrigin-RevId: 278979065
2019-11-06 17:56:25 -08:00
Ghanan Gowripalan e63db5e7bb Discover default routers from Router Advertisements
This change allows the netstack to do NDP's Router Discovery as outlined by
RFC 4861 section 6.3.4.

Note, this change will not break existing uses of netstack as the default
configuration for the stack options is set in such a way that Router Discovery
will not be performed. See `stack.Options` and `stack.NDPConfigurations` for
more details.

This change introduces 2 options required to take advantage of Router Discovery,
all available under NDPConfigurations:
- HandleRAs: Whether or not NDP RAs are processes
- DiscoverDefaultRouters: Whether or not Router Discovery is performed

Another note: for a NIC to process Router Advertisements, it must not be a
router itself. Currently the netstack does not have per-interface routing
configuration; the routing/forwarding configuration is controlled stack-wide.
Therefore, if the stack is configured to enable forwarding/routing, no Router
Advertisements will be processed.

Tests: Unittest to make sure that Router Discovery and updates to the routing
table only occur if explicitly configured to do so. Unittest to make sure at
max stack.MaxDiscoveredDefaultRouters discovered default routers are remembered.
PiperOrigin-RevId: 278965143
2019-11-06 16:29:58 -08:00
Kevin Krakauer e1b21f3c8c Use PacketBuffers, rather than VectorisedViews, in netstack.
PacketBuffers are analogous to Linux's sk_buff. They hold all information about
a packet, headers, and payload. This is important for:

* iptables to access various headers of packets
* Preventing the clutter of passing different net and link headers along with
  VectorisedViews to packet handling functions.

This change only affects the incoming packet path, and a future change will
change the outgoing path.

Benchmark               Regular         PacketBufferPtr  PacketBufferConcrete
--------------------------------------------------------------------------------
BM_Recvmsg             400.715MB/s      373.676MB/s      396.276MB/s
BM_Sendmsg             361.832MB/s      333.003MB/s      335.571MB/s
BM_Recvfrom            453.336MB/s      393.321MB/s      381.650MB/s
BM_Sendto              378.052MB/s      372.134MB/s      341.342MB/s
BM_SendmsgTCP/0/1k     353.711MB/s      316.216MB/s      322.747MB/s
BM_SendmsgTCP/0/2k     600.681MB/s      588.776MB/s      565.050MB/s
BM_SendmsgTCP/0/4k     995.301MB/s      888.808MB/s      941.888MB/s
BM_SendmsgTCP/0/8k     1.517GB/s        1.274GB/s        1.345GB/s
BM_SendmsgTCP/0/16k    1.872GB/s        1.586GB/s        1.698GB/s
BM_SendmsgTCP/0/32k    1.017GB/s        1.020GB/s        1.133GB/s
BM_SendmsgTCP/0/64k    475.626MB/s      584.587MB/s      627.027MB/s
BM_SendmsgTCP/0/128k   416.371MB/s      503.434MB/s      409.850MB/s
BM_SendmsgTCP/0/256k   323.449MB/s      449.599MB/s      388.852MB/s
BM_SendmsgTCP/0/512k   243.992MB/s      267.676MB/s      314.474MB/s
BM_SendmsgTCP/0/1M     95.138MB/s       95.874MB/s       95.417MB/s
BM_SendmsgTCP/0/2M     96.261MB/s       94.977MB/s       96.005MB/s
BM_SendmsgTCP/0/4M     96.512MB/s       95.978MB/s       95.370MB/s
BM_SendmsgTCP/0/8M     95.603MB/s       95.541MB/s       94.935MB/s
BM_SendmsgTCP/0/16M    94.598MB/s       94.696MB/s       94.521MB/s
BM_SendmsgTCP/0/32M    94.006MB/s       94.671MB/s       94.768MB/s
BM_SendmsgTCP/0/64M    94.133MB/s       94.333MB/s       94.746MB/s
BM_SendmsgTCP/0/128M   93.615MB/s       93.497MB/s       93.573MB/s
BM_SendmsgTCP/0/256M   93.241MB/s       95.100MB/s       93.272MB/s
BM_SendmsgTCP/1/1k     303.644MB/s      316.074MB/s      308.430MB/s
BM_SendmsgTCP/1/2k     537.093MB/s      584.962MB/s      529.020MB/s
BM_SendmsgTCP/1/4k     882.362MB/s      939.087MB/s      892.285MB/s
BM_SendmsgTCP/1/8k     1.272GB/s        1.394GB/s        1.296GB/s
BM_SendmsgTCP/1/16k    1.802GB/s        2.019GB/s        1.830GB/s
BM_SendmsgTCP/1/32k    2.084GB/s        2.173GB/s        2.156GB/s
BM_SendmsgTCP/1/64k    2.515GB/s        2.463GB/s        2.473GB/s
BM_SendmsgTCP/1/128k   2.811GB/s        3.004GB/s        2.946GB/s
BM_SendmsgTCP/1/256k   3.008GB/s        3.159GB/s        3.171GB/s
BM_SendmsgTCP/1/512k   2.980GB/s        3.150GB/s        3.126GB/s
BM_SendmsgTCP/1/1M     2.165GB/s        2.233GB/s        2.163GB/s
BM_SendmsgTCP/1/2M     2.370GB/s        2.219GB/s        2.453GB/s
BM_SendmsgTCP/1/4M     2.005GB/s        2.091GB/s        2.214GB/s
BM_SendmsgTCP/1/8M     2.111GB/s        2.013GB/s        2.109GB/s
BM_SendmsgTCP/1/16M    1.902GB/s        1.868GB/s        1.897GB/s
BM_SendmsgTCP/1/32M    1.655GB/s        1.665GB/s        1.635GB/s
BM_SendmsgTCP/1/64M    1.575GB/s        1.547GB/s        1.575GB/s
BM_SendmsgTCP/1/128M   1.524GB/s        1.584GB/s        1.580GB/s
BM_SendmsgTCP/1/256M   1.579GB/s        1.607GB/s        1.593GB/s

PiperOrigin-RevId: 278940079
2019-11-06 14:25:59 -08:00
Ghanan Gowripalan d0d89ceedd Send a TCP RST in response to a TCP SYN-ACK on a listening endpoint
This change better follows what is outlined in RFC 793 section 3.4 figure 12
where a listening socket should not accept a SYN-ACK segment in response to a
(potentially) old SYN segment.

Tests: Test that checks the TCP RST segment sent in response to a TCP SYN-ACK
segment received on a listening TCP endpoint.
PiperOrigin-RevId: 278893114
2019-11-06 10:44:20 -08:00
Ghanan Gowripalan a824b48cea Validate incoming NDP Router Advertisements, as per RFC 4861 section 6.1.2
This change validates incoming NDP Router Advertisements as per RFC 4861 section
6.1.2. It also includes the skeleton to handle Router Advertiements that arrive
on some NIC.

Tests: Unittest to make sure only valid NDP Router Advertisements are received/
not dropped.
PiperOrigin-RevId: 278891972
2019-11-06 10:39:29 -08:00
Kevin Krakauer 3246040447 Deep copy dispatcher views.
When VectorisedViews were passed up the stack from packet_dispatchers, we were
passing a sub-slice of the dispatcher's views fields. The dispatchers then
immediately set those views to nil.

This wasn't caught before because every implementer copied the data in these
views before returning.

PiperOrigin-RevId: 277615351
2019-10-30 17:12:57 -07:00
Andrei Vagin db37483cb6 Store endpoints inside multiPortEndpoint in a sorted order
It is required to guarantee the same order of endpoints after save/restore.

PiperOrigin-RevId: 277598665
2019-10-30 15:33:41 -07:00
Ian Gudger dc21c5ca16 Add Close and Wait methods to stack.
Link endpoints still don't have a unified way to be requested to stop.

Updates #837

PiperOrigin-RevId: 277398952
2019-10-29 17:22:32 -07:00
Ian Gudger a2c51efe36 Add endpoint tracking to the stack.
In the future this will replace DanglingEndpoints. DanglingEndpoints must be
kept for now due to issues with save/restore.

This is arguably a cleaner design and allows the stack to know which transport
endpoints might still be using its link endpoints.

Updates #837

PiperOrigin-RevId: 277386633
2019-10-29 16:14:51 -07:00
Michael Pratt c0b8fd4b6a Update build tags to allow Go 1.14
Currently there are no ABI changes. We should check again closer to release.

PiperOrigin-RevId: 277349744
2019-10-29 13:18:16 -07:00
Ian Gudger 7d80e85835 Allow waiting for Endpoint worker goroutines to finish.
Updates #837

PiperOrigin-RevId: 277325162
2019-10-29 11:32:48 -07:00
Ghanan Gowripalan 41e2df1bde Support iterating an NDP options buffer.
This change helps support iterating over an NDP options buffer so that
implementations can handle all the NDP options present in an NDP packet.

Note, this change does not yet actually handle these options, it just provides
the tools to do so (in preparation for NDP's Prefix, Parameter, and a complete
implementation of Neighbor Discovery).

Tests: Unittests to make sure we can iterate over a valid NDP options buffer
that may contain multiple options. Also tests to check an iterator before
using it to see if the NDP options buffer is malformed.
PiperOrigin-RevId: 277312487
2019-10-29 10:30:21 -07:00
Ghanan Gowripalan 0864549ecc Use the user supplied TCP MSS when creating a new active socket
This change supports using a user supplied TCP MSS for new active TCP
connections. Note, the user supplied MSS must be less than or equal to the
maximum possible MSS for a TCP connection's route. If it is greater than the
maximum possible MSS, the maximum possible MSS will be used as the connection's
MSS instead.

This change does not use this user supplied MSS for connections accepted from
listening sockets - that will come in a later change.

Test: Test that outgoing TCP SYN segments contain a TCP MSS option with the user
supplied MSS if it is not greater than the maximum possible MSS for the route.
PiperOrigin-RevId: 277185125
2019-10-28 18:20:36 -07:00
Ghanan Gowripalan 5a421058a0 Validate the checksum for incoming ICMPv6 packets
This change validates the ICMPv6 checksum field before further processing an
ICMPv6 packet.

Tests: Unittests to make sure that only ICMPv6 packets with a valid checksum
are accepted/processed. Existing tests using checker.ICMPv6 now also check the
ICMPv6 checksum field.
PiperOrigin-RevId: 276779148
2019-10-25 16:06:55 -07:00
Ian Gudger 8f029b3f82 Convert DelayOption to the newer/faster SockOpt int type.
DelayOption is set on all new endpoints in gVisor.

PiperOrigin-RevId: 276746791
2019-10-25 13:15:34 -07:00
Ghanan Gowripalan 27e896f290 Add a type to represent the NDP Prefix Information option.
This change is in preparation for NDP Prefix Discovery and SLAAC where the stack
will need to handle NDP Prefix Information options.

Tests: Test that given an NDP Prefix Information option buffer, correct values
are returned by the field getters.
PiperOrigin-RevId: 276594592
2019-10-24 16:53:08 -07:00
Ghanan Gowripalan e50a1f5739 Remove the amss field from tcpip.tcp.handshake as it was unused
The amss field in the tcpip.tcp.handshake was not used anywhere. Removed it to
not cause confusion with the amss field in the tcpip.tcp.endpoint struct, which
was documented to be used (and is actually being used) for the same purpose.

PiperOrigin-RevId: 276577088
2019-10-24 15:23:43 -07:00
Ghanan Gowripalan f034790ad8 Use interface-specific NDP configurations instead of the stack-wide default.
This change makes it so that NDP work is done using the per-interface NDP
configurations instead of the stack-wide default NDP configurations to correctly
implement RFC 4861 section 6.3.2 (note here, a host is a single NIC operating
as a host device), and RFC 4862 section 5.1.

Test: Test that we can set NDP configurations on a per-interface basis without
affecting the configurations of other interfaces or the stack-wide default. Also
make sure that after the configurations are updated, the updated configurations
are used for NDP processes (e.g. Duplicate Address Detection).
PiperOrigin-RevId: 276525661
2019-10-24 11:09:18 -07:00
gVisor bot 6d4d9564e3 Merge pull request #641 from tanjianfeng:master
PiperOrigin-RevId: 276380008
2019-10-23 16:55:15 -07:00
Ghanan Gowripalan de3dbf8a09 Inform netstack integrator when Duplicate Address Detection completes
This change introduces a new interface, stack.NDPDispatcher. It can be
implemented by the netstack integrator to receive NDP related events. As of this
change, only DAD related events are supported.

Tests: Existing tests were modified to use the NDPDispatcher's DAD events for
DAD tests where it needed to wait for DAD completing (failing and resolving).
PiperOrigin-RevId: 276338733
2019-10-23 13:26:35 -07:00
Ghanan Gowripalan 515e0558d4 Add a type to represent the NDP Router Advertisement message.
This change is in preparation for NDP Router Discovery where the stack will need
to handle NDP Router Advertisments.

Tests: Test that given an NDP Router Advertisement buffer (body of an ICMPv6
packet, correct values are returned by the field getters).
PiperOrigin-RevId: 276146817
2019-10-22 14:41:51 -07:00
Ghanan Gowripalan c356fe2ebb Respect new PrimaryEndpointBehavior when addresses gets promoted to permanent
This change makes sure that when an address which is already known by a NIC and
has kind = permanentExpired gets promoted to permanent, the new
PrimaryEndpointBehavior is respected.

PiperOrigin-RevId: 276136317
2019-10-22 13:54:33 -07:00
Andrei Vagin 8720bd643e netstack/tcp: software segmentation offload
Right now, we send each tcp packet separately, we call one system
call per-packet. This patch allows to generate multiple tcp packets
and send them by sendmmsg.

The arguable part of this CL is a way how to handle multiple headers.
This CL adds the next field to the Prepandable buffer.

Nginx test results:

Server Software:        nginx/1.15.9
Server Hostname:        10.138.0.2
Server Port:            8080

Document Path:          /10m.txt
Document Length:        10485760 bytes

w/o gso:
Concurrency Level:      5
Time taken for tests:   5.491 seconds
Complete requests:      100
Failed requests:        0
Total transferred:      1048600200 bytes
HTML transferred:       1048576000 bytes
Requests per second:    18.21 [#/sec] (mean)
Time per request:       274.525 [ms] (mean)
Time per request:       54.905 [ms] (mean, across all concurrent requests)
Transfer rate:          186508.03 [Kbytes/sec] received

sw-gso:

Concurrency Level:      5
Time taken for tests:   3.852 seconds
Complete requests:      100
Failed requests:        0
Total transferred:      1048600200 bytes
HTML transferred:       1048576000 bytes
Requests per second:    25.96 [#/sec] (mean)
Time per request:       192.576 [ms] (mean)
Time per request:       38.515 [ms] (mean, across all concurrent requests)
Transfer rate:          265874.92 [Kbytes/sec] received

w/o gso:
$ ./tcp_benchmark --client --duration 15  --ideal
[SUM]  0.0-15.1 sec  2.20 GBytes  1.25 Gbits/sec

software gso:
$ tcp_benchmark --client --duration 15  --ideal --gso $((1<<16)) --swgso
[SUM]  0.0-15.1 sec  3.99 GBytes  2.26 Gbits/sec

PiperOrigin-RevId: 276112677
2019-10-22 11:55:56 -07:00
Ghanan Gowripalan fb69de696b Auto-generate an IPv6 link-local address based on the NIC's MAC Address.
This change adds support for optionally auto-generating an IPv6 link-local
address based on the NIC's MAC Address on NIC enable.

Note, this change will not break existing uses of netstack as the default
configuration for the stack options is set in such a way that a link-local
address will not be auto-generated unless the stack is explicitly configured.
See `stack.Options` for more details. Specifically, see
`stack.Options.AutoGenIPv6LinkLocal`.

Tests: Tests to make sure that the IPb6 link-local address is only
auto-generated if the stack is specifically configured to do so. Also tests to
make sure that an auto-generated address goes through the DAD process.
PiperOrigin-RevId: 276059813
2019-10-22 07:26:54 -07:00
Kevin Krakauer 12235d533a AF_PACKET support for netstack (aka epsocket).
Like (AF_INET, SOCK_RAW) sockets, AF_PACKET sockets require CAP_NET_RAW. With
runsc, you'll need to pass `--net-raw=true` to enable them.

Binding isn't supported yet.

PiperOrigin-RevId: 275909366
2019-10-21 13:23:18 -07:00
Tamir Duberstein 51538c973e Store primary endpoints in a slice
There's no need for a linked list here.

PiperOrigin-RevId: 275565920
2019-10-18 16:14:09 -07:00
Mithun Iyer 487d3b2358 Fix typo while initializing protocol for UDP endpoints.
Fixes #763

PiperOrigin-RevId: 275563222
2019-10-18 16:00:11 -07:00
Tamir Duberstein 4e6f3a0c71 Remove restrictions on the sending address
It is quite legal to send from the ANY address (it is required for
DHCP). I can't figure out why the broadcast address was included here,
so removing that as well.

PiperOrigin-RevId: 275541954
2019-10-18 14:10:30 -07:00
Ghanan Gowripalan 962aa235de NDP Neighbor Solicitations sent during DAD must have an IP hop limit of 255
NDP Neighbor Solicitations sent during Duplicate Address Detection must have an
IP hop limit of 255, as all NDP Neighbor Solicitations should have.

Test: Test that DAD messages have the IPv6 hop limit field set to 255.
PiperOrigin-RevId: 275321680
2019-10-17 13:06:15 -07:00
Ghanan Gowripalan 06ed9e329d Do Duplicate Address Detection on permanent IPv6 addresses.
This change adds support for Duplicate Address Detection on IPv6 addresses
as defined by RFC 4862 section 5.4.

Note, this change will not break existing uses of netstack as the default
configuration for the stack options is set in such a way that DAD will not be
performed. See `stack.Options` and `stack.NDPConfigurations` for more details.

Tests: Tests to make sure that the DAD process properly resolves or fails.
That is, tests make sure that DAD resolves only if:
  - No other node is performing DAD for the same address
  - No other node owns the same address
PiperOrigin-RevId: 275189471
2019-10-16 22:54:45 -07:00
Bhasker Hariharan f98c3ee32c Remove panic when reassembly fails.
Reassembly can fail due to an invalid sequence of fragments
being received. eg. Multiple fragments with same id which
claim to be the last one by setting the more flag to 0 etc.
It's safer to just drop the reassembler and increment a metric
than to panic when reassembly fails.

PiperOrigin-RevId: 274920901
2019-10-15 17:04:44 -07:00
Tamir Duberstein db1ca5c786 Set NDP hop limit in accordance with RFC 4861
...and do not populate link address cache at dispatch. This partially
reverts 313c767b00, which caused malformed
packets (e.g. NDP Neighbor Adverts with incorrect hop limit values) to
populate the address cache. In particular, this masked a bug that was
introduced to the Neighbor Advert generation code in
7c1587e340.

PiperOrigin-RevId: 274865182
2019-10-15 12:43:25 -07:00
Jianfeng Tan d277bfba27 epsocket: support /proc/net/snmp
Netstack has its own stats, we use this to fill /proc/net/snmp.

Note that some metrics are not recorded in Netstack, which will be shown
as 0 in the proc file.

Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com>
Change-Id: Ie0089184507d16f49bc0057b4b0482094417ebe1
2019-10-15 16:38:41 +00:00
Jianfeng Tan aee2c93366 netstack: add counters for tcp CurrEstab and EstabResets
Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com>
2019-10-15 16:38:40 +00:00
gVisor bot bfa0bb24dd Internal change.
PiperOrigin-RevId: 274700093
2019-10-14 17:46:52 -07:00
Kevin Krakauer 2302afb53d Reorder BUILD license and load functions in netstack.
PiperOrigin-RevId: 274672346
2019-10-14 15:21:59 -07:00
Bhasker Hariharan a296425970 Use a different fanoutID for each new fdbased endpoint.
PiperOrigin-RevId: 274638272
2019-10-14 13:10:16 -07:00
Bhasker Hariharan c7e901f47a Fix bugs in fragment handling.
Strengthen the header.IPv4.IsValid check to correctly check
for IHL/TotalLength fields. Also add a check to make sure
fragmentOffsets + size of the fragment do not cause a wrap
around for the end of the fragment.

PiperOrigin-RevId: 274049313
2019-10-10 15:14:55 -07:00
gVisor bot bf870c1a42 Internal change.
PiperOrigin-RevId: 273861936
2019-10-09 17:56:05 -07:00