Commit Graph

301 Commits

Author SHA1 Message Date
Kevin Krakauer b3ff31d041 fix panic when calling SO_ORIGINAL_DST without initializing iptables
Reported-by: syzbot+074ec22c42305725b79f@syzkaller.appspotmail.com
PiperOrigin-RevId: 328963899
2020-08-28 10:35:18 -07:00
Ghanan Gowripalan 8ae0ab722c Use a single NetworkEndpoint per address
This change was already done as of
https://github.com/google/gvisor/commit/1736b2208f but
https://github.com/google/gvisor/commit/a174aa7597 conflicted with that
change and it was missed in reviews.

This change fixes the conflict.

PiperOrigin-RevId: 328920372
2020-08-28 05:09:15 -07:00
Ghanan Gowripalan 6f8fb7e0db Improve type safety for socket options
The existing implementation for {G,S}etSockOpt take arguments of an
empty interface type which all types (implicitly) implement; any
type may be passed to the functions.

This change introduces marker interfaces for socket options that may be
set or queried which socket option types implement to ensure that invalid
types are caught at compile time. Different interfaces are used to allow
the compiler to enforce read-only or set-only socket options.

Fixes #3714.

RELNOTES: n/a
PiperOrigin-RevId: 328832161
2020-08-27 15:46:44 -07:00
Ghanan Gowripalan dc81eb9c37 Add function to get error from a tcpip.Endpoint
In an upcoming CL, socket option types are made to implement a marker
interface with pointer receivers. Since this results in calling methods
of an interface with a pointer, we incur an allocation when attempting
to get an Endpoint's last error with the current implementation.

When calling the method of an interface, the compiler is unable to
determine what the interface implementation does with the pointer
(since calling a method on an interface uses virtual dispatch at runtime
so the compiler does not know what the interface method will do) so it
allocates on the heap to be safe incase an implementation continues to
hold the pointer after the functioon returns (the reference escapes the
scope of the object).

In the example below, the compiler does not know what b.foo does with
the reference to a it allocates a on the heap as the reference to a may
escape the scope of a.
```
var a int
var b someInterface
b.foo(&a)
```

This change removes the opportunity for that allocation.

RELNOTES: n/a
PiperOrigin-RevId: 328796559
2020-08-27 12:50:19 -07:00
Kevin Krakauer 01a35a2f19 ip6tables: (de)serialize ip6tables structs
More implementation+testing to follow.

#3549.

PiperOrigin-RevId: 328770160
2020-08-27 10:53:49 -07:00
Sam Balana a174aa7597 Add option to replace linkAddrCache with neighborCache
This change adds an option to replace the current implementation of ARP through
linkAddrCache, with an implementation of NUD through neighborCache. Switching
to using NUD for both ARP and NDP is beneficial for the reasons described by
RFC 4861 Section 3.1:

  "[Using NUD] significantly improves the robustness of packet delivery in the
  presence of failing routers, partially failing or partitioned links, or nodes
  that change their link-layer addresses. For instance, mobile nodes can move
  off-link without losing any connectivity due to stale ARP caches."

  "Unlike ARP, Neighbor Unreachability Detection detects half-link failures and
  avoids sending traffic to neighbors with which two-way connectivity is
  absent."

Along with these changes exposes the API for querying and operating the
neighbor cache. Operations include:
  - Create a static entry
  - List all entries
  - Delete all entries
  - Remove an entry by address

This also exposes the API to change the NUD protocol constants on a per-NIC
basis to allow Neighbor Discovery to operate over links with widely varying
performance characteristics. See [RFC 4861 Section 10][1] for the list of
constants.

Finally, an API for subscribing to NUD state changes is exposed through
NUDDispatcher. See [RFC 4861 Appendix C][3] for the list of edges.

Tests:
 pkg/tcpip/network/arp:arp_test
 + TestDirectRequest

 pkg/tcpip/network/ipv6:ipv6_test
 + TestLinkResolution
 + TestNDPValidation
 + TestNeighorAdvertisementWithTargetLinkLayerOption
 + TestNeighorSolicitationResponse
 + TestNeighorSolicitationWithSourceLinkLayerOption
 + TestRouterAdvertValidation

 pkg/tcpip/stack:stack_test
 + TestCacheWaker
 + TestForwardingWithFakeResolver
 + TestForwardingWithFakeResolverManyPackets
 + TestForwardingWithFakeResolverManyResolutions
 + TestForwardingWithFakeResolverPartialTimeout
 + TestForwardingWithFakeResolverTwoPackets
 + TestIPv6SourceAddressSelectionScopeAndSameAddress

[1]: https://tools.ietf.org/html/rfc4861#section-10
[2]: https://tools.ietf.org/html/rfc4861#appendix-C

Fixes #1889
Fixes #1894
Fixes #1895
Fixes #1947
Fixes #1948
Fixes #1949
Fixes #1950

PiperOrigin-RevId: 328365034
2020-08-25 11:09:33 -07:00
Ghanan Gowripalan 339d266be4 Consider loopback bound to all addresses in subnet
When a loopback interface is configurd with an address and associated
subnet, the loopback should treat all addresses in that subnet as an
address it owns.

This is mimicking linux behaviour as seen below:
```
$ ip addr show dev lo
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group ...
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
$ ping 192.0.2.1
PING 192.0.2.1 (192.0.2.1) 56(84) bytes of data.
^C
--- 192.0.2.1 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1018ms

$ ping 192.0.2.2
PING 192.0.2.2 (192.0.2.2) 56(84) bytes of data.
^C
--- 192.0.2.2 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2039ms

$ sudo ip addr add 192.0.2.1/24 dev lo
$ ip addr show dev lo
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group ...
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet 192.0.2.1/24 scope global lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
$ ping 192.0.2.1
PING 192.0.2.1 (192.0.2.1) 56(84) bytes of data.
64 bytes from 192.0.2.1: icmp_seq=1 ttl=64 time=0.131 ms
64 bytes from 192.0.2.1: icmp_seq=2 ttl=64 time=0.046 ms
64 bytes from 192.0.2.1: icmp_seq=3 ttl=64 time=0.048 ms
^C
--- 192.0.2.1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2042ms
rtt min/avg/max/mdev = 0.046/0.075/0.131/0.039 ms
$ ping 192.0.2.2
PING 192.0.2.2 (192.0.2.2) 56(84) bytes of data.
64 bytes from 192.0.2.2: icmp_seq=1 ttl=64 time=0.131 ms
64 bytes from 192.0.2.2: icmp_seq=2 ttl=64 time=0.069 ms
64 bytes from 192.0.2.2: icmp_seq=3 ttl=64 time=0.049 ms
64 bytes from 192.0.2.2: icmp_seq=4 ttl=64 time=0.035 ms
^C
--- 192.0.2.2 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3049ms
rtt min/avg/max/mdev = 0.035/0.071/0.131/0.036 ms
```

Test: integration_test.TestLoopbackAcceptAllInSubnet
PiperOrigin-RevId: 328188546
2020-08-24 12:28:35 -07:00
Michael Pratt 129018ab3d Consistent precondition formatting
Our "Preconditions:" blocks are very useful to determine the input invariants,
but they are bit inconsistent throughout the codebase, which makes them harder
to read (particularly cases with 5+ conditions in a single paragraph).

I've reformatted all of the cases to fit in simple rules:

1. Cases with a single condition are placed on a single line.
2. Cases with multiple conditions are placed in a bulleted list.

This format has been added to the style guide.

I've also mentioned "Postconditions:", though those are much less frequently
used, and all uses already match this style.

PiperOrigin-RevId: 327687465
2020-08-20 13:32:24 -07:00
Ghanan Gowripalan e3c4bbd10a Remove address range functions
Should have been removed in cl/326791119
9a7b5830aa

PiperOrigin-RevId: 327074156
2020-08-17 12:30:05 -07:00
Ghanan Gowripalan 9a7b5830aa Don't support address ranges
Previously the netstack supported assignment of a range of addresses.
This feature is not used so remove it.

PiperOrigin-RevId: 326791119
2020-08-15 00:06:29 -07:00
Ghanan Gowripalan 1736b2208f Use a single NetworkEndpoint per NIC per protocol
The NetworkEndpoint does not need to be created for each address.
Most of the work the NetworkEndpoint does is address agnostic.

PiperOrigin-RevId: 326759605
2020-08-14 17:30:01 -07:00
Ting-Yu Wang 47515f4751 Migrate to PacketHeader API for PacketBuffer.
Formerly, when a packet is constructed or parsed, all headers are set by the
client code. This almost always involved prepending to pk.Header buffer or
trimming pk.Data portion. This is known to prone to bugs, due to the complexity
and number of the invariants assumed across netstack to maintain.

In the new PacketHeader API, client will call Push()/Consume() method to
construct/parse an outgoing/incoming packet. All invariants, such as slicing
and trimming, are maintained by the API itself.

NewPacketBuffer() is introduced to create new PacketBuffer. Zero value is no
longer valid.

PacketBuffer now assumes the packet is a concatenation of following portions:
* LinkHeader
* NetworkHeader
* TransportHeader
* Data
Any of them could be empty, or zero-length.

PiperOrigin-RevId: 326507688
2020-08-13 13:08:57 -07:00
Kevin Krakauer 8e31f0dc57 Set the NetworkProtocolNumber of all PacketBuffers.
NetworkEndpoints set the number on outgoing packets in Write() and
NetworkProtocols set them on incoming packets in Parse().

Needed for #3549.

PiperOrigin-RevId: 325938745
2020-08-10 19:34:28 -07:00
Ghanan Gowripalan b404b5c255 Use unicast source for ICMP echo replies
Packets MUST NOT use a non-unicast source address for ICMP
Echo Replies.

Test: integration_test.TestPingMulticastBroadcast
PiperOrigin-RevId: 325634380
2020-08-08 17:45:19 -07:00
Sam Balana 94447aeab3 Fix panic during Address Resolution of neighbor entry created by NS
When a Neighbor Solicitation is received, a neighbor entry is created with the
remote host's link layer address, but without a link layer address resolver. If
the host decides to send a packet addressed to the IP address of that neighbor
entry, Address Resolution starts with a nil pointer to the link layer address
resolver. This causes the netstack to panic and crash.

This change ensures that when a packet is sent in that situation, the link
layer address resolver will be set before Address Resolution begins.

Tests:
 pkg/tcpip/stack:stack_test
 + TestEntryUnknownToStaleToProbeToReachable
 - TestNeighborCacheEntryNoLinkAddress

Updates #1889
Updates #1894
Updates #1895
Updates #1947
Updates #1948
Updates #1949
Updates #1950

PiperOrigin-RevId: 325516471
2020-08-07 15:07:33 -07:00
Ghanan Gowripalan fc4dd3ef45 Join IPv4 all-systems group on NIC enable
Test:
- stack_test.TestJoinLeaveMulticastOnNICEnableDisable
- integration_test.TestIncomingMulticastAndBroadcast
PiperOrigin-RevId: 325185259
2020-08-06 01:32:21 -07:00
Ghanan Gowripalan 90a2d4e823 Support receiving broadcast IPv4 packets
Test: integration_test.TestIncomingSubnetBroadcast
PiperOrigin-RevId: 325135617
2020-08-05 17:32:54 -07:00
Bhasker Hariharan e7b232a5b8 Prefer RLock over Lock in functions that don't need Lock().
Updates #231

PiperOrigin-RevId: 325097683
2020-08-05 14:11:29 -07:00
Nayana Bidari 0e6f7a12c2 Update variables for implementation of RACK in TCP
RACK (Recent Acknowledgement) is a new loss detection
algorithm in TCP. These are the fields which should be
stored on connections to implement RACK algorithm.

PiperOrigin-RevId: 324948703
2020-08-04 20:59:34 -07:00
Kevin Krakauer 2a7b2a61e3 iptables: support SO_ORIGINAL_DST
Envoy (#170) uses this to get the original destination of redirected
packets.
2020-07-31 10:47:26 -07:00
Sam Balana ab4bb38455 Implement neighbor unreachability detection for ARP and NDP.
This change implements the Neighbor Unreachability Detection (NUD) state
machine, as per RFC 4861 [1]. The state machine operates on a single neighbor
in the local network. This requires the state machine to be implemented on each
entry of the neighbor table.

This change also adds, but does not expose, several APIs. The first API is for
performing basic operations on the neighbor table:
 - Create a static entry
 - List all entries
 - Delete all entries
 - Remove an entry by address

The second API is used for changing the NUD protocol constants on a per-NIC
basis to allow Neighbor Discovery to operate over links with widely varying
performance characteristics. See [RFC 4861 Section 10][2] for the list of
constants.

Finally, the last API is for allowing users to subscribe to NUD state changes.
See [RFC 4861 Appendix C][3] for the list of edges.

[1]: https://tools.ietf.org/html/rfc4861
[2]: https://tools.ietf.org/html/rfc4861#section-10
[3]: https://tools.ietf.org/html/rfc4861#appendix-C

Tests:
 pkg/tcpip/stack:stack_test
 - TestNeighborCacheAddStaticEntryThenOverflow
 - TestNeighborCacheClear
 - TestNeighborCacheClearThenOverflow
 - TestNeighborCacheConcurrent
 - TestNeighborCacheDuplicateStaticEntryWithDifferentLinkAddress
 - TestNeighborCacheDuplicateStaticEntryWithSameLinkAddress
 - TestNeighborCacheEntry
 - TestNeighborCacheEntryNoLinkAddress
 - TestNeighborCacheGetConfig
 - TestNeighborCacheKeepFrequentlyUsed
 - TestNeighborCacheNotifiesWaker
 - TestNeighborCacheOverflow
 - TestNeighborCacheOverwriteWithStaticEntryThenOverflow
 - TestNeighborCacheRemoveEntry
 - TestNeighborCacheRemoveEntryThenOverflow
 - TestNeighborCacheRemoveStaticEntry
 - TestNeighborCacheRemoveStaticEntryThenOverflow
 - TestNeighborCacheRemoveWaker
 - TestNeighborCacheReplace
 - TestNeighborCacheResolutionFailed
 - TestNeighborCacheResolutionTimeout
 - TestNeighborCacheSetConfig
 - TestNeighborCacheStaticResolution
 - TestEntryAddsAndClearsWakers
 - TestEntryDelayToProbe
 - TestEntryDelayToReachableWhenSolicitedOverrideConfirmation
 - TestEntryDelayToReachableWhenUpperLevelConfirmation
 - TestEntryDelayToStaleWhenConfirmationWithDifferentAddress
 - TestEntryDelayToStaleWhenProbeWithDifferentAddress
 - TestEntryFailedGetsDeleted
 - TestEntryIncompleteToFailed
 - TestEntryIncompleteToIncompleteDoesNotChangeUpdatedAt
 - TestEntryIncompleteToReachable
 - TestEntryIncompleteToReachableWithRouterFlag
 - TestEntryIncompleteToStale
 - TestEntryInitiallyUnknown
 - TestEntryProbeToFailed
 - TestEntryProbeToReachableWhenSolicitedConfirmationWithSameAddress
 - TestEntryProbeToReachableWhenSolicitedOverrideConfirmation
 - TestEntryProbeToStaleWhenConfirmationWithDifferentAddress
 - TestEntryProbeToStaleWhenProbeWithDifferentAddress
 - TestEntryReachableToStaleWhenConfirmationWithDifferentAddress
 - TestEntryReachableToStaleWhenConfirmationWithDifferentAddressAndOverride
 - TestEntryReachableToStaleWhenProbeWithDifferentAddress
 - TestEntryReachableToStaleWhenTimeout
 - TestEntryStaleToDelay
 - TestEntryStaleToReachableWhenSolicitedOverrideConfirmation
 - TestEntryStaleToStaleWhenOverrideConfirmation
 - TestEntryStaleToStaleWhenProbeUpdateAddress
 - TestEntryStaysDelayWhenOverrideConfirmationWithSameAddress
 - TestEntryStaysProbeWhenOverrideConfirmationWithSameAddress
 - TestEntryStaysReachableWhenConfirmationWithRouterFlag
 - TestEntryStaysReachableWhenProbeWithSameAddress
 - TestEntryStaysStaleWhenProbeWithSameAddress
 - TestEntryUnknownToIncomplete
 - TestEntryUnknownToStale
 - TestEntryUnknownToUnknownWhenConfirmationWithUnknownAddress

 pkg/tcpip/stack:stack_x_test
 - TestDefaultNUDConfigurations
 - TestNUDConfigurationFailsForNotSupported
 - TestNUDConfigurationsBaseReachableTime
 - TestNUDConfigurationsDelayFirstProbeTime
 - TestNUDConfigurationsMaxMulticastProbes
 - TestNUDConfigurationsMaxRandomFactor
 - TestNUDConfigurationsMaxUnicastProbes
 - TestNUDConfigurationsMinRandomFactor
 - TestNUDConfigurationsRetransmitTimer
 - TestNUDConfigurationsUnreachableTime
 - TestNUDStateReachableTime
 - TestNUDStateRecomputeReachableTime
 - TestSetNUDConfigurationFailsForBadNICID
 - TestSetNUDConfigurationFailsForNotSupported

[1]: https://tools.ietf.org/html/rfc4861
[2]: https://tools.ietf.org/html/rfc4861#section-10
[3]: https://tools.ietf.org/html/rfc4861#appendix-C

Updates #1889
Updates #1894
Updates #1895
Updates #1947
Updates #1948
Updates #1949
Updates #1950

PiperOrigin-RevId: 324070795
2020-07-30 13:30:16 -07:00
Ghanan Gowripalan b00858d075 Use brodcast MAC for broadcast IPv4 packets
When sending packets to a known network's broadcast address, use the
broadcast MAC address.

Test:
- stack_test.TestOutgoingSubnetBroadcast
- udp_test.TestOutgoingSubnetBroadcast
PiperOrigin-RevId: 324062407
2020-07-30 12:50:02 -07:00
Fabricio Voznika f82dd8ddb4 Redirect TODO to GitHub issues
PiperOrigin-RevId: 323715260
2020-07-28 21:24:26 -07:00
Sam Balana 8dbf428a12 Add ability to send unicast ARP requests and Neighbor Solicitations
The previous implementation of LinkAddressRequest only supported sending
broadcast ARP requests and multicast Neighbor Solicitations. The ability to
send these packets as unicast is required for Neighbor Unreachability
Detection.

Tests:
 pkg/tcpip/network/arp:arp_test
 - TestLinkAddressRequest

 pkg/tcpip/network/ipv6:ipv6_test
 - TestLinkAddressRequest

Updates #1889
Updates #1894
Updates #1895
Updates #1947
Updates #1948
Updates #1949
Updates #1950

PiperOrigin-RevId: 323451569
2020-07-27 15:21:17 -07:00
Sam Balana 82a5cada59 Add AfterFunc to tcpip.Clock
Changes the API of tcpip.Clock to also provide a method for scheduling and
rescheduling work after a specified duration. This change also implements the
AfterFunc method for existing implementations of tcpip.Clock.

This is the groundwork required to mock time within tests. All references to
CancellableTimer has been replaced with the tcpip.Job interface, allowing for
custom implementations of scheduling work.

This is a BREAKING CHANGE for clients that implement their own tcpip.Clock or
use tcpip.CancellableTimer. Migration plan:
 1. Add AfterFunc(d, f) to tcpip.Clock
 2. Replace references of tcpip.CancellableTimer with tcpip.Job
 3. Replace calls to tcpip.CancellableTimer#StopLocked with tcpip.Job#Cancel
 4. Replace calls to tcpip.CancellableTimer#Reset with tcpip.Job#Schedule
 5. Replace calls to tcpip.NewCancellableTimer with tcpip.NewJob.

PiperOrigin-RevId: 322906897
2020-07-23 18:00:43 -07:00
Kevin Krakauer dd530eeeff iptables: use keyed array literals
PiperOrigin-RevId: 322882426
2020-07-23 15:46:03 -07:00
gVisor bot fc26b3764e Merge pull request #3207 from kevinGC:icmp-connect
PiperOrigin-RevId: 322853192
2020-07-23 13:25:33 -07:00
Kevin Krakauer fb8be7e627 make connect(2) fail when dest is unreachable
Previously, ICMP destination unreachable datagrams were ignored by TCP
endpoints. This caused connect to hang when an intermediate router
couldn't find a route to the host.

This manifested as a Kokoro error when Docker IPv6 was enabled. The Ruby
image test would try to install the sinatra gem and hang indefinitely
attempting to use an IPv6 address.

Fixes #3079.
2020-07-22 16:51:42 -07:00
Kevin Krakauer 89bd71c942 iptables: don't NAT existing connections
Fixes a NAT bug that manifested as:
- A SYN was sent from gVisor to another host, unaffected by iptables.
- The corresponding SYN/ACK was NATted by a PREROUTING REDIRECT rule
  despite being part of the existing connection.
- The socket that sent the SYN never received the SYN/ACK and thus a
  connection could not be established.

We handle this (as Linux does) by tracking all connections, inserting a
no-op conntrack rule for new connections with no rules of their own.

Needed for istio support (#170).
2020-07-22 16:49:11 -07:00
Kevin Krakauer bd98f82014 iptables: replace maps with arrays
For iptables users, Check() is a hot path called for every packet one or more
times. Let's avoid a bunch of map lookups.

PiperOrigin-RevId: 322678699
2020-07-22 16:23:55 -07:00
Bhasker Hariharan 71bf90c55b Support for receiving outbound packets in AF_PACKET.
Updates #173

PiperOrigin-RevId: 322665518
2020-07-22 15:33:33 -07:00
Kevin Krakauer e92f38ff0c iptables: remove check for NetworkHeader
This is no longer necessary, as we always set NetworkHeader before calling
iptables.Check.

PiperOrigin-RevId: 321461978
2020-07-15 16:35:59 -07:00
Bhasker Hariharan fef90c61c6 Fix minor bugs in a couple of interface IOCTLs.
gVisor incorrectly returns the wrong ARP type for SIOGIFHWADDR. This breaks
tcpdump as it tries to interpret the packets incorrectly.

Similarly, SIOCETHTOOL is used by tcpdump to query interface properties which
fails with an EINVAL since we don't implement it. For now change it to return
EOPNOTSUPP to indicate that we don't support the query rather than return
EINVAL.

NOTE: ARPHRD types for link endpoints are distinct from NIC capabilities
and NIC flags. In Linux all 3 exist eg. ARPHRD types are stored in dev->type
field while NIC capabilities are more like the device features which can be
queried using SIOCETHTOOL but not modified and NIC Flags are fields that can
be modified from user space. eg. NIC status (UP/DOWN/MULTICAST/BROADCAST) etc.

Updates #2746

PiperOrigin-RevId: 321436525
2020-07-15 14:15:44 -07:00
gVisor bot c81ac8ec3b Merge pull request #2672 from amscanne:shim-integrated
PiperOrigin-RevId: 321053634
2020-07-13 16:10:58 -07:00
Kevin Krakauer 43c209f48e garbage collect connections
As in Linux, we must periodically clean up unused connections.

PiperOrigin-RevId: 321003353
2020-07-13 12:00:46 -07:00
Ghanan Gowripalan 9c32fd3f4d Do not copy sleep.Waker
sleep.Waker's fields are modified as values.

PiperOrigin-RevId: 320873451
2020-07-12 17:22:08 -07:00
Ting-Yu Wang 7e4d2d63ee icmp: When setting TransportHeader, remove from the Data portion.
The current convention is when a header is set to pkt.XxxHeader field, it
gets removed from pkt.Data. ICMP does not currently follow this convention.

PiperOrigin-RevId: 320078606
2020-07-07 15:56:46 -07:00
Ting-Yu Wang 1e5b0a9732 Shard some slow tests.
stack_x_test: 2m -> 20s
tcp_x_test: 80s -> 25s
PiperOrigin-RevId: 319828101
2020-07-06 12:14:08 -07:00
Kevin Krakauer 7fb6cc286f conntrack refactor, no behavior changes
- Split connTrackForPacket into 2 functions instead of switching on flag
- Replace hash with struct keys.
- Remove prefixes where possible
- Remove unused connStatus, timeout
- Flatten ConnTrack struct a bit - some intermediate structs had no meaning
  outside of the context of their parent.
- Protect conn.tcb with a mutex
- Remove redundant error checking (e.g. when is pkt.NetworkHeader valid)
- Clarify that HandlePacket and CreateConnFor are the expected entrypoints for
  ConnTrack

PiperOrigin-RevId: 318407168
2020-06-25 21:21:57 -07:00
Bhasker Hariharan b070e218c6 Add support for Stack level options.
Linux controls socket send/receive buffers using a few sysctl variables
  - net.core.rmem_default
  - net.core.rmem_max
  - net.core.wmem_max
  - net.core.wmem_default
  - net.ipv4.tcp_rmem
  - net.ipv4.tcp_wmem

The first 4 control the default socket buffer sizes for all sockets
raw/packet/tcp/udp and also the maximum permitted socket buffer that can be
specified in setsockopt(SOL_SOCKET, SO_(RCV|SND)BUF,...).

The last two control the TCP auto-tuning limits and override the default
specified in rmem_default/wmem_default as well as the max limits.

Netstack today only implements tcp_rmem/tcp_wmem and incorrectly uses it
to limit the maximum size in setsockopt() as well as uses it for raw/udp
sockets.

This changelist introduces the other 4 and updates the udp/raw sockets to use
the newly introduced variables. The values for min/max match the current
tcp_rmem/wmem values and the default value buffers for UDP/RAW sockets is
updated to match the linux value of 212KiB up from the really low current value
of 32 KiB.

Updates #3043
Fixes #3043

PiperOrigin-RevId: 318089805
2020-06-24 10:24:20 -07:00
Ian Gudger 2141013dce Add support for SO_REUSEADDR to TCP sockets/endpoints.
For TCP sockets, SO_REUSEADDR relaxes the rules for binding addresses.

gVisor/netstack already supported a behavior similar to SO_REUSEADDR, but did
not allow disabling it. This change brings the SO_REUSEADDR behavior closer to
the behavior implemented by Linux and adds a new SO_REUSEADDR disabled
behavior. Like Linux, SO_REUSEADDR is now disabled by default.

PiperOrigin-RevId: 317984380
2020-06-23 19:15:38 -07:00
Kevin Krakauer 0c169b6ad5 iptables: skip iptables if no rules are set
Users that never set iptables rules shouldn't incur the iptables performance
cost. Suggested by Ian (@iangudger).

PiperOrigin-RevId: 317232921
2020-06-18 19:46:36 -07:00
Kevin Krakauer 28b8a5cc3a iptables: remove metadata struct
Metadata was useful for debugging and safety, but enough tests exist that we
should see failures when (de)serialization is broken. It made stack
initialization more cumbersome and it's also getting in the way of ip6tables.

PiperOrigin-RevId: 317210653
2020-06-18 17:02:16 -07:00
Ghanan Gowripalan 09b2fca40c Cleanup tcp.timer and tcpip.Route
When a tcp.timer or tcpip.Route is no longer used, clean up its
resources so that unused memory may be released.

PiperOrigin-RevId: 317046582
2020-06-18 00:10:05 -07:00
Ghanan Gowripalan 57286eb642 Increase timeouts for NDP tests
... to help reduce flakes.

When waiting for an event to occur, use a timeout of 10s. When waiting
for an event to not occur, use a timeout of 1s.

Test: Ran test locally w/ run count of 1000 with and without gotsan.
PiperOrigin-RevId: 316998128
2020-06-17 17:22:43 -07:00
Ghanan Gowripalan 4c0a8bdaf5 Do not use tentative addresses for routes
Tentative addresses should not be used when finding a route. This change
fixes a bug where a tentative address may have been used.

Test: stack_test.TestDADResolve
PiperOrigin-RevId: 315997624
2020-06-11 16:09:45 -07:00
Ian Gudger a085e562d0 Add support for SO_REUSEADDR to UDP sockets/endpoints.
On UDP sockets, SO_REUSEADDR allows multiple sockets to bind to the same
address, but only delivers packets to the most recently bound socket. This
differs from the behavior of SO_REUSEADDR on TCP sockets. SO_REUSEADDR for TCP
sockets will likely need an almost completely independent implementation.

SO_REUSEADDR has some odd interactions with the similar SO_REUSEPORT. These
interactions are tested fairly extensively and all but one particularly odd
one (that honestly seems like a bug) behave the same on gVisor and Linux.

PiperOrigin-RevId: 315844832
2020-06-10 23:49:26 -07:00
Ghanan Gowripalan 2d3b9d18e7 Handle removed NIC in NDP timer for packet tx
NDP packets are sent periodically from NDP timers. These timers do not
hold the NIC lock when sending packets as the packet write operation
may take some time. While the lock is not held, the NIC may be removed
by some other goroutine. This change handles that scenario gracefully.

Test: stack_test.TestRemoveNICWhileHandlingRSTimer
PiperOrigin-RevId: 315524143
2020-06-09 11:33:20 -07:00
Kevin Krakauer 32b823fcdb netstack: parse incoming packet headers up-front
Netstack has traditionally parsed headers on-demand as a packet moves up the
stack. This is conceptually simple and convenient, but incompatible with
iptables, where headers can be inspected and mangled before even a routing
decision is made.

This changes header parsing to happen early in the incoming packet path, as soon
as the NIC gets the packet from a link endpoint. Even if an invalid packet is
found (e.g. a TCP header of insufficient length), the packet is passed up the
stack for proper stats bookkeeping.

PiperOrigin-RevId: 315179302
2020-06-07 13:38:43 -07:00
gVisor bot 427d208216 Merge pull request #2872 from kevinGC:ipt-skip-prerouting
PiperOrigin-RevId: 315041419
2020-06-05 20:44:01 -07:00