Commit Graph

91 Commits

Author SHA1 Message Date
Arthur Sfez 58a3c2d44f Add underflow check when calculating the MTU
Also enforce the minimum MTU for IPv4 and IPv6, and discard packets if the
minimum is not met.

PiperOrigin-RevId: 338404225
2020-10-21 22:12:13 -07:00
Julian Elischer 4d27f33b09 Make IPv4 check the IP header checksum
The IPv4 header checksum has not been checked, at least in recent times,
so add code to do so. Fix all the tests that fail because they never
needed to set the checksum.

Fixes #4484

PiperOrigin-RevId: 337556243
2020-10-16 12:31:05 -07:00
Arthur Sfez edc1068244 Enable IPv4 fragmentation for every code path.
Currently, fragmentation can only occur during WritePacket(). This enables
it for WritePackets() and WriteIncludedHeaderPacket() as well.

IPv4 unit tests were refactored to be consistent with the IPv6 unit tests.

This removes the extraHeaderReserveLength field and the related
"prependable bytes" unit tests (for both IPv4 and IPv6) because it was only
testing a panic condition when the value was too low.

Fixes #3796

PiperOrigin-RevId: 337550061
2020-10-16 11:57:27 -07:00
Ghanan Gowripalan fbfcf8144c Enable IPv6 WriteHeaderIncludedPacket
Allow writing an IPv6 packet where the IPv6 header is a provided by
the user.

* Introduce an error to let callers know a header is malformed.
We previously useed tcpip.ErrInvalidOptionValue but that did not seem
appropriate for generic malformed header errors.

* Populate network header in WriteHeaderIncludedPacket
IPv4's implementation of WriteHeaderIncludedPacket did not previously
populate the packet buffer's network header. This change fixes that.

Fixes #4527

Test: ip_test.TestWriteHeaderIncludedPacket
PiperOrigin-RevId: 337534548
2020-10-16 10:42:34 -07:00
Ghanan Gowripalan 257703c050 Automated rollback of changelist 336304024
PiperOrigin-RevId: 336339194
2020-10-09 12:09:12 -07:00
Bhasker Hariharan 8566decab0 Automated rollback of changelist 336185457
PiperOrigin-RevId: 336304024
2020-10-09 09:11:18 -07:00
Ghanan Gowripalan 6768e6c59e Do not resolve routes immediately
When a response needs to be sent to an incoming packet, the stack should
consult its neighbour table to determine the remote address's link
address.

When an entry does not exist in the stack's neighbor table, the stack
should queue the packet while link resolution completes. See comments.

PiperOrigin-RevId: 336185457
2020-10-08 16:15:59 -07:00
Arthur Sfez 0c3134028d Change IPv6 reassembly timeout to 60s
It was originally set to 30s for IPv6 (same as IPv4) but this is not
what RFC 8200 prescibes. Linux also defaults to 60s [1].

[1] 47ec5303d7/include/net/ipv6.h (L456)

PiperOrigin-RevId: 336034636
2020-10-08 00:56:16 -07:00
Arthur Sfez 99bf022c2a Add support for IPv6 fragmentation
Most of the IPv4 fragmentation code was moved in the fragmentation
package and it is reused by IPv6 fragmentation.

Test:
  - pkg/tcpip/network/ipv4:ipv4_test
  - pkg/tcpip/network/ipv6:ipv6_test
  - pkg/tcpip/network/fragmentation:fragmentation_test

Fixes #4389

PiperOrigin-RevId: 335714280
2020-10-06 14:03:39 -07:00
Julian Elischer 798cc6b04d Fix IPv4 ICMP echo handler to copy options
The IPv4 RFCs are specific (though obtuse) that an echo response
packet needs to contain all the options from the echo request,
much as if it been routed back to the sender, though apparently
with a new TTL. They suggest copying the incoming packet header
to achieve this so that is what this patch does.

PiperOrigin-RevId: 335559176
2020-10-05 20:43:55 -07:00
Arthur Sfez 7f9e13053e Count IP OutgoingPacketErrors in the NetworkEndpoint methods
Before this change, OutgoingPacketErrors was incremented in the
stack.Route methods. This was going to be a problem once
IPv4/IPv6 WritePackets support fragmentation because Route.WritePackets
might now know how many packets are left after an error occurs.

Test:
  - pkg/tcpip/network/ipv4:ipv4_test
  - pkg/tcpip/network/ipv6:ipv6_test
PiperOrigin-RevId: 334687983
2020-09-30 15:09:38 -07:00
Julian Elischer 694d6ae32f Use the ICMP error response facility
Add code in IPv6 to send ICMP packets while processing extension headers.

Add some accounting in processing IPV6 Extension headers which
allows us to report meaningful information back in ICMP parameter
problem packets.

IPv4 also needs to send a message when an unsupported protocol
is requested.

Add some tests to generate both ipv4 and ipv6 packets with
various errors and check the responses.

Add some new checkers and cleanup some inconsistencies in
the messages in that file.

Add new error types for the ICMPv4/6 generators.

Fix a bug in the ICMPv4 generator that stopped it from generating
"Unknown protocol" messages.

Updates #2211

PiperOrigin-RevId: 334661716
2020-09-30 13:05:14 -07:00
Ghanan Gowripalan e5ece9aea7 Return permanent addresses when NIC is down
Test: stack_test.TestGetMainNICAddressWhenNICDisabled
PiperOrigin-RevId: 334513286
2020-09-29 19:46:50 -07:00
Ghanan Gowripalan 6ae83404af Don't allow broadcast/multicast source address
As per relevant IP RFCS (see code comments), broadcast (for IPv4) and
multicast addresses are not allowed. Currently checks for these are
done at the transport layer, but since it is explicitly forbidden at
the IP layers, check for them there.

This change also removes the UDP.InvalidSourceAddress stat since there
is no longer a need for it.

Test: ip_test.TestSourceAddressValidation
PiperOrigin-RevId: 334490971
2020-09-29 16:54:23 -07:00
Toshi Kikuchi f15182243e Discard IP fragments as soon as it expires
Currently expired IP fragments are discarded only if another fragment for the
same IP datagram is received after timeout or the total size of the fragment
queue exceeded a predefined value.

Test: fragmentation.TestReassemblingTimeout

Fixes #3960

PiperOrigin-RevId: 334423710
2020-09-29 11:29:50 -07:00
Ghanan Gowripalan 5075d0342f Trim Network/Transport Endpoint/Protocol
* Remove Capabilities and NICID methods from NetworkEndpoint.

* Remove linkEP and stack parameters from NetworkProtocol.NewEndpoint.
The LinkEndpoint can be fetched from the NetworkInterface. The stack
is passed to the NetworkProtocol when it is created so the
NetworkEndpoint can get it from its protocol.

* Remove stack parameter from TransportProtocol.NewEndpoint.
Like the NetworkProtocol/Endpoint, the stack is passed to the
TransportProtocol when it is created.

PiperOrigin-RevId: 334332721
2020-09-29 02:05:50 -07:00
Ghanan Gowripalan 48915bdedb Move IP state from NIC to NetworkEndpoint/Protocol
* Add network address to network endpoints.
Hold network-specific state in the NetworkEndpoint instead of the stack.
This results in the stack no longer needing to "know" about the network
endpoints and special case certain work for various endpoints
(e.g. IPv6 DAD).

* Provide NetworkEndpoints with an NetworkInterface interface.
Instead of just passing the NIC ID of a NIC, pass an interface so the
network endpoint may query other information about the NIC such as
whether or not it is a loopback device.

* Move NDP code and state to the IPv6 package.
NDP is IPv6 specific so there is no need for it to live in the stack.

* Control forwarding through NetworkProtocols instead of Stack
Forwarding should be controlled on a per-network protocol basis so
forwarding configurations are now controlled through network protocols.

* Remove stack.referencedNetworkEndpoint.
Now that addresses are exposed via AddressEndpoint and only one
NetworkEndpoint is created per interface, there is no need for a
referenced NetworkEndpoint.

* Assume network teardown methods are infallible.

Fixes #3871, #3916

PiperOrigin-RevId: 334319433
2020-09-29 00:20:41 -07:00
Ghanan Gowripalan a5acc0616c Support creating protocol instances with Stack ref
Network or transport protocols may want to reach the stack. Support this
by letting the stack create the protocol instances so it can pass a
reference to itself at protocol creation time.

Note, protocols do not yet use the stack in this CL but later CLs will
make use of the stack from protocols.

PiperOrigin-RevId: 334260210
2020-09-28 16:24:04 -07:00
Ghanan Gowripalan a376a0baf3 Remove generic ICMP errors
Generic ICMP errors were required because the transport dispatcher was
given the responsibility of sending ICMP errors in response to transport
packet delivery failures. Instead, the transport dispatcher should let
network layer know it failed to deliver a packet (and why) and let the
network layer make the decision as to what error to send (if any).

Fixes #4068

PiperOrigin-RevId: 333962333
2020-09-26 19:24:41 -07:00
Julian Elischer 99decaadd6 Extract ICMP error sender from UDP
Store transport protocol number on packet buffers for use in ICMP error
generation.

Updates #2211.

PiperOrigin-RevId: 333252762
2020-09-23 02:28:43 -07:00
Kevin Krakauer bd69afdcd1 Count packets dropped by iptables in IPStats
PiperOrigin-RevId: 332486383
2020-09-18 11:13:19 -07:00
Ghanan Gowripalan 360006d894 Use common parsing utilities when sniffing
Extract parsing utilities so they can be used by the sniffer.

Fixes #3930

PiperOrigin-RevId: 332401880
2020-09-18 00:48:09 -07:00
Kevin Krakauer 0b8d306e64 ip6tables: filter table support
`ip6tables -t filter` is now usable. NAT support will come in a future CL.

#3549

PiperOrigin-RevId: 332381801
2020-09-17 21:54:48 -07:00
Toshi Kikuchi b6ca96b9b9 Cap reassembled IPv6 packets at 65535 octets
IPv4 can accept 65536-octet reassembled packets.

Test:
- ipv4_test.TestInvalidFragments
- ipv4_test.TestReceiveFragments
- ipv6.TestInvalidIPv6Fragments
- ipv6.TestReceiveIPv6Fragments

Fixes #3770

PiperOrigin-RevId: 331382977
2020-09-12 23:21:27 -07:00
Ghanan Gowripalan bdd5996a73 Improve type safety for network protocol options
The existing implementation for NetworkProtocol.{Set}Option take
arguments of an empty interface type which all types (implicitly)
implement; any type may be passed to the functions.

This change introduces marker interfaces for network protocol options
that may be set or queried which network protocol option types implement
to ensure that invalid types are caught at compile time. Different
interfaces are used to allow the compiler to enforce read-only or
set-only socket options.

PiperOrigin-RevId: 328980359
2020-08-28 11:50:17 -07:00
Sam Balana a174aa7597 Add option to replace linkAddrCache with neighborCache
This change adds an option to replace the current implementation of ARP through
linkAddrCache, with an implementation of NUD through neighborCache. Switching
to using NUD for both ARP and NDP is beneficial for the reasons described by
RFC 4861 Section 3.1:

  "[Using NUD] significantly improves the robustness of packet delivery in the
  presence of failing routers, partially failing or partitioned links, or nodes
  that change their link-layer addresses. For instance, mobile nodes can move
  off-link without losing any connectivity due to stale ARP caches."

  "Unlike ARP, Neighbor Unreachability Detection detects half-link failures and
  avoids sending traffic to neighbors with which two-way connectivity is
  absent."

Along with these changes exposes the API for querying and operating the
neighbor cache. Operations include:
  - Create a static entry
  - List all entries
  - Delete all entries
  - Remove an entry by address

This also exposes the API to change the NUD protocol constants on a per-NIC
basis to allow Neighbor Discovery to operate over links with widely varying
performance characteristics. See [RFC 4861 Section 10][1] for the list of
constants.

Finally, an API for subscribing to NUD state changes is exposed through
NUDDispatcher. See [RFC 4861 Appendix C][3] for the list of edges.

Tests:
 pkg/tcpip/network/arp:arp_test
 + TestDirectRequest

 pkg/tcpip/network/ipv6:ipv6_test
 + TestLinkResolution
 + TestNDPValidation
 + TestNeighorAdvertisementWithTargetLinkLayerOption
 + TestNeighorSolicitationResponse
 + TestNeighorSolicitationWithSourceLinkLayerOption
 + TestRouterAdvertValidation

 pkg/tcpip/stack:stack_test
 + TestCacheWaker
 + TestForwardingWithFakeResolver
 + TestForwardingWithFakeResolverManyPackets
 + TestForwardingWithFakeResolverManyResolutions
 + TestForwardingWithFakeResolverPartialTimeout
 + TestForwardingWithFakeResolverTwoPackets
 + TestIPv6SourceAddressSelectionScopeAndSameAddress

[1]: https://tools.ietf.org/html/rfc4861#section-10
[2]: https://tools.ietf.org/html/rfc4861#appendix-C

Fixes #1889
Fixes #1894
Fixes #1895
Fixes #1947
Fixes #1948
Fixes #1949
Fixes #1950

PiperOrigin-RevId: 328365034
2020-08-25 11:09:33 -07:00
Arthur Sfez 7ca62b9daa Only use the NextHeader value of the first IPv6 fragment extension header.
As per RFC 8200 Section 4.5:
  The Next Header field of the last header of the Per-Fragment
  headers is obtained from the Next Header field of the first
  fragment's Fragment header.

Test:
  - pkg/tcpip/network/ipv6:ipv6_test
  - pkg/tcpip/network/ipv4:ipv4_test
  - pkg/tcpip/network/fragmentation:fragmentation_test

Updates #2197

PiperOrigin-RevId: 327671635
2020-08-20 12:06:45 -07:00
Ghanan Gowripalan 1736b2208f Use a single NetworkEndpoint per NIC per protocol
The NetworkEndpoint does not need to be created for each address.
Most of the work the NetworkEndpoint does is address agnostic.

PiperOrigin-RevId: 326759605
2020-08-14 17:30:01 -07:00
Ting-Yu Wang 47515f4751 Migrate to PacketHeader API for PacketBuffer.
Formerly, when a packet is constructed or parsed, all headers are set by the
client code. This almost always involved prepending to pk.Header buffer or
trimming pk.Data portion. This is known to prone to bugs, due to the complexity
and number of the invariants assumed across netstack to maintain.

In the new PacketHeader API, client will call Push()/Consume() method to
construct/parse an outgoing/incoming packet. All invariants, such as slicing
and trimming, are maintained by the API itself.

NewPacketBuffer() is introduced to create new PacketBuffer. Zero value is no
longer valid.

PacketBuffer now assumes the packet is a concatenation of following portions:
* LinkHeader
* NetworkHeader
* TransportHeader
* Data
Any of them could be empty, or zero-length.

PiperOrigin-RevId: 326507688
2020-08-13 13:08:57 -07:00
Kevin Krakauer 8e31f0dc57 Set the NetworkProtocolNumber of all PacketBuffers.
NetworkEndpoints set the number on outgoing packets in Write() and
NetworkProtocols set them on incoming packets in Parse().

Needed for #3549.

PiperOrigin-RevId: 325938745
2020-08-10 19:34:28 -07:00
Ghanan Gowripalan 00993130e5 Use 1 fragmentation component per IP stack
This will help manage memory consumption by IP reassembly when
receiving IP fragments on multiple network endpoints. Previously,
each endpoint would cap memory consumption at 4MB, but with this
change, each IP stack will cap memory consumption at 4MB.

No behaviour changes.

PiperOrigin-RevId: 324913904
2020-08-04 16:27:00 -07:00
Ghanan Gowripalan ade4ff95fc Support fragments from different sources
Prevent fragments with different source-destination pairs from
conflicting with each other.

Test:
    - ipv6_test.TestReceiveIPv6Fragments
    - ipv4_test.TestReceiveIPv6Fragments
PiperOrigin-RevId: 324283246
2020-07-31 14:19:49 -07:00
Ghanan Gowripalan 9960a816a9 Enforce fragment block size and validate args
Allow configuring fragmentation.Fragmentation with a fragment
block size which will be enforced when processing fragments. Also
validate arguments when processing fragments.

Test:
    - fragmentation.TestErrors
    - ipv6_test.TestReceiveIPv6Fragments
    - ipv4_test.TestReceiveIPv6Fragments
PiperOrigin-RevId: 324081521
2020-07-30 14:25:53 -07:00
Tony Gong 76c7bc51b7 Set IPv4 ID on all non-atomic datagrams
RFC 6864 imposes various restrictions on the uniqueness of the IPv4
Identification field for non-atomic datagrams, defined as an IP datagram that
either can be fragmented (DF=0) or is already a fragment (MF=1 or positive
fragment offset). In order to be compliant, the ID field is assigned for all
non-atomic datagrams.

Add a TCP unit test that induces retransmissions and checks that the IPv4
ID field is unique every time. Add basic handling of the IP_MTU_DISCOVER
socket option so that the option can be used to disable PMTU discovery,
effectively setting DF=0. Attempting to set the sockopt to anything other
than disabled will fail because PMTU discovery is currently not implemented,
and the default behavior matches that of disabled.

PiperOrigin-RevId: 320081842
2020-07-07 16:14:49 -07:00
Kevin Krakauer 32b823fcdb netstack: parse incoming packet headers up-front
Netstack has traditionally parsed headers on-demand as a packet moves up the
stack. This is conceptually simple and convenient, but incompatible with
iptables, where headers can be inspected and mangled before even a routing
decision is made.

This changes header parsing to happen early in the incoming packet path, as soon
as the NIC gets the packet from a link endpoint. Even if an invalid packet is
found (e.g. a TCP header of insufficient length), the packet is passed up the
stack for proper stats bookkeeping.

PiperOrigin-RevId: 315179302
2020-06-07 13:38:43 -07:00
Kevin Krakauer 74a7d76c97 iptables: loopback traffic skips prerouting chain
Loopback traffic is not affected by rules in the PREROUTING chain.

This change is also necessary for istio's envoy to talk to other
components in the same pod.
2020-06-05 16:43:50 -07:00
Ting-Yu Wang d3a8bffe04 Pass PacketBuffer as pointer.
Historically we've been passing PacketBuffer by shallow copying through out
the stack. Right now, this is only correct as the caller would not use
PacketBuffer after passing into the next layer in netstack.

With new buffer management effort in gVisor/netstack, PacketBuffer will
own a Buffer (to be added). Internally, both PacketBuffer and Buffer may
have pointers and shallow copying shouldn't be used.

Updates #2404.

PiperOrigin-RevId: 314610879
2020-06-03 15:00:42 -07:00
gVisor bot cfd30665c1 iptables - filter packets using outgoing interface.
Enables commands with -o (--out-interface) for iptables rules.
$ iptables -A OUTPUT -o eth0 -j ACCEPT

PiperOrigin-RevId: 310642286
2020-05-08 15:44:54 -07:00
Nayana Bidari b660f16d18 Support for connection tracking of TCP packets.
Connection tracking is used to track packets in prerouting and
output hooks of iptables. The NAT rules modify the tuples in
connections. The connection tracking code modifies the packets by
looking at the modified tuples.
2020-05-01 16:59:40 -07:00
Kevin Krakauer 5e1e61fbcb Automated rollback of changelist 308674219
PiperOrigin-RevId: 309491861
2020-05-01 16:09:53 -07:00
Bhasker Hariharan ae15d90436 FIFO QDisc implementation
Updates #231

PiperOrigin-RevId: 309323808
2020-04-30 16:41:00 -07:00
gVisor bot 55f0c3316a Automated rollback of changelist 308163542
PiperOrigin-RevId: 308674219
2020-04-27 12:26:32 -07:00
Kevin Krakauer eccae0f77d Remove View.First() and View.RemoveFirst()
These methods let users eaily break the VectorisedView abstraction, and
allowed netstack to slip into pseudo-enforcement of the "all headers are
in the first View" invariant. Removing them and replacing with PullUp(n)
breaks this reliance and will make it easier to add iptables support and
rework network buffer management.

The new View.PullUp(n) method is low cost in the common case, when when
all the headers fit in the first View.

PiperOrigin-RevId: 308163542
2020-04-23 17:28:49 -07:00
gVisor bot 120d3b50f4 Automated rollback of changelist 307477185
PiperOrigin-RevId: 307598974
2020-04-21 07:16:30 -07:00
Kevin Krakauer a551add5d8 Remove View.First() and View.RemoveFirst()
These methods let users eaily break the VectorisedView abstraction, and
allowed netstack to slip into pseudo-enforcement of the "all headers are
in the first View" invariant. Removing them and replacing with PullUp(n)
breaks this reliance and will make it easier to add iptables support and
rework network buffer management.

The new View.PullUp(n) method is low cost in the common case, when when
all the headers fit in the first View.
2020-04-17 13:25:57 -07:00
Bhasker Hariharan fc99a7ebf0 Refactor software GSO code.
Software GSO implementation currently has a complicated code path with
implicit assumptions that all packets to WritePackets carry same Data
and it does this to avoid allocations on the path etc. But this makes it
hard to reuse the WritePackets API.

This change breaks all such assumptions by introducing a new Vectorised
View API ReadToVV which can be used to cleanly split a VV into multiple
independent VVs. Further this change also makes packet buffers linkable
to form an intrusive list. This allows us to get rid of the array of
packet buffers that are passed in the WritePackets API call and replace
it with a list of packet buffers.

While this code does introduce some more allocations in the benchmarks
it doesn't cause any degradation.

Updates #231

PiperOrigin-RevId: 304731742
2020-04-03 18:35:55 -07:00
Nayana Bidari 92b9069b67 Support owner matching for iptables.
This feature will match UID and GID of the packet creator, for locally
generated packets. This match is only valid in the OUTPUT and POSTROUTING
chains. Forwarded packets do not have any socket associated with them.
Packets from kernel threads do have a socket, but usually no owner.
2020-03-26 12:21:24 -07:00
Bhasker Hariharan 7e4073af12 Move tcpip.PacketBuffer and IPTables to stack package.
This is a precursor to be being able to build an intrusive list
of PacketBuffers for use in queuing disciplines being implemented.

Updates #2214

PiperOrigin-RevId: 302677662
2020-03-24 09:06:26 -07:00
Ian Gudger c37b196455 Add support for tearing down protocol dispatchers and TIME_WAIT endpoints.
Protocol dispatchers were previously leaked. Bypassing TIME_WAIT is required to
test this change.

Also fix a race when a socket in SYN-RCVD is closed. This is also required to
test this change.

PiperOrigin-RevId: 296922548
2020-02-24 10:32:17 -08:00
Kevin Krakauer 9143fcd7fd Add UDP matchers. 2020-01-21 14:47:17 -08:00