173 Commits

Author SHA1 Message Date
Ghanan Gowripalan c19e049f2c Check local address directly through NIC
Network endpoints that wish to check addresses on another NIC-local
network endpoint may now do so through the NetworkInterface.

This fixes a lock ordering issue between NIC removal and link
resolution. Before this change:

  NIC Removal takes the stack lock, neighbor cache lock then neighbor
  entries' locks.

  When performing IPv4 link resolution, we take the entry lock, then ARP
  would try to check IPv4 local addresses through the stack, which tries
  to obtain the stack's lock.

Now that ARP can check IPv4 addresses through the NIC, we avoid the lock
ordering issue, while also removing the need for the stack to look up the
NIC.
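
A minimal sketch of the lock-ordering fix, assuming heavily simplified stand-ins: the `NetworkInterface` interface, the `nic` and `arpEndpoint` types, string addresses, and the `stackMu` variable are all hypothetical and are not gVisor's actual definitions. The point it shows is that the ARP-side lookup only takes the NIC's own lock, so it cannot wait on the stack lock that NIC removal holds.

```go
package main

import (
	"fmt"
	"sync"
)

// stackMu is a hypothetical stand-in for the stack-wide lock that NIC
// removal holds while it goes on to take neighbor cache and entry locks.
var stackMu sync.Mutex

// NetworkInterface is a hypothetical, trimmed-down view of the NIC that a
// network endpoint such as ARP can query directly.
type NetworkInterface interface {
	CheckLocalAddress(proto, addr string) bool
}

// nic guards its address table with its own lock, independent of stackMu.
type nic struct {
	mu    sync.RWMutex
	addrs map[string]map[string]bool // protocol -> set of local addresses
}

// CheckLocalAddress reports whether addr is assigned to this NIC for proto.
func (n *nic) CheckLocalAddress(proto, addr string) bool {
	n.mu.RLock()
	defer n.mu.RUnlock()
	return n.addrs[proto][addr]
}

// arpEndpoint asks its own NIC, so a lookup made while holding a neighbor
// entry lock never waits on stackMu.
type arpEndpoint struct {
	nic NetworkInterface
}

func (e *arpEndpoint) isLocalIPv4(addr string) bool {
	return e.nic.CheckLocalAddress("ipv4", addr)
}

func main() {
	n := &nic{addrs: map[string]map[string]bool{"ipv4": {"10.0.0.1": true}}}
	e := &arpEndpoint{nic: n}

	// Even with the stack lock held (as during NIC removal), the lookup
	// completes because it never touches stackMu.
	stackMu.Lock()
	fmt.Println(e.isLocalIPv4("10.0.0.1"), e.isLocalIPv4("10.0.0.2")) // true false
	stackMu.Unlock()
}
```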

PiperOrigin-RevId: 356034245
2021-02-06 09:09:19 -08:00
Ghanan Gowripalan 83b764d9d2 Batch write packets after iptables checks
After IPTables checks a batch of packets, we can write packets that are
not dropped or locally destined as a batch instead of individually.

This previously caused a bug since WritePacket* functions expect to take
ownership of the passed PacketBuffer{List}. WritePackets assumed that the
list of PacketBuffers would not be invalidated when calling WritePacket for
each PacketBuffer in the list, but this is not true. WritePacket may add the
passed PacketBuffer into a different list which would modify the
PacketBuffer in such a way that it no longer points to the next
PacketBuffer to write.

Example: Given a PB list of
    PB_a -> PB_b -> PB_c

WritePackets may be iterating over the list and calling WritePacket for
each PB. When WritePacket takes PB_a, it may add it to a new list which
would update pointers such that PB_a no longer points to PB_b.

Test: integration_test.TestIPTableWritePackets
PiperOrigin-RevId: 355969560
2021-02-05 18:44:04 -08:00
Ghanan Gowripalan 24416032ab Refactor locally delivered packets
Make it clear that failing to parse a looped-back packet is not a packet
sending error but a malformed received packet error.

FindNetworkEndpoint returns nil when no network endpoint is found
instead of an error.

PiperOrigin-RevId: 355954946
2021-02-05 16:47:11 -08:00
Ghanan Gowripalan ebd3912c0f Refactor HandleControlPacket/SockError
...to remove the need for the transport layer to deduce the type of
error it received.

Rename HandleControlPacket to HandleError as HandleControlPacket only
handles errors.

tcpip.SockError now holds a tcpip.SockErrorCause interface that
different errors can implement.

PiperOrigin-RevId: 354994306
2021-02-01 12:04:03 -08:00
Ghanan Gowripalan daeb06d2cb Hide neighbor table kind from NetworkEndpoint
The network endpoint should not need to have logic to handle different
kinds of neighbor tables. Network endpoints can let the NIC know about
different neighbor discovery messages and let the NIC decide which table
to update.

This allows us to remove the LinkAddressCache interface.

PiperOrigin-RevId: 354812584
2021-01-31 10:03:46 -08:00
Ting-Yu Wang 825c185dc5 Make fragmentation return a reassembled PacketBuffer
This allows later decoupling of the backing network buffer implementation.

PiperOrigin-RevId: 354643297
2021-01-29 17:37:29 -08:00
Ghanan Gowripalan 45fe9fe9c6 Clear IGMPv1 present flag on NIC down
This is dynamic state that can be re-learned when the NIC comes
back up.

Test: ipv4_test.TestIgmpV1Present
PiperOrigin-RevId: 354630921
2021-01-29 16:10:49 -08:00
Tamir Duberstein 8d1afb4185 Change tcpip.Error to an interface
This makes it possible to add data to types that implement tcpip.Error.
ErrBadLinkEndpoint is removed as it is unused.
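
A sketch of what an interface-based error type makes possible, assuming a simplified `Error` interface; the method set shown and the `ErrPacketTooBig` type with its `MTU` field are hypothetical illustrations, not tcpip's actual declarations. Because each error is its own type, callers recover attached data with a type switch.

```go
package main

import "fmt"

// Error is a hypothetical, simplified interface-style error: each error is
// its own type, so a type can carry whatever data it needs.
type Error interface {
	fmt.Stringer
	isError()
}

// ErrConnectionRefused carries no data.
type ErrConnectionRefused struct{}

func (*ErrConnectionRefused) isError()        {}
func (*ErrConnectionRefused) String() string { return "connection was refused" }

// ErrPacketTooBig is a hypothetical error type that carries extra data.
type ErrPacketTooBig struct{ MTU uint32 }

func (*ErrPacketTooBig) isError() {}
func (e *ErrPacketTooBig) String() string {
	return fmt.Sprintf("packet too big (MTU %d)", e.MTU)
}

// describe shows how a caller can pull the data back out.
func describe(err Error) string {
	switch e := err.(type) {
	case *ErrPacketTooBig:
		return fmt.Sprintf("shrink to %d bytes", e.MTU)
	default:
		return e.String()
	}
}

func main() {
	fmt.Println(describe(&ErrConnectionRefused{}))     // connection was refused
	fmt.Println(describe(&ErrPacketTooBig{MTU: 1280})) // shrink to 1280 bytes
}
```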

PiperOrigin-RevId: 354437314
2021-01-28 17:59:58 -08:00
Julian Elischer 3731ebb3fe Adjust included data size on icmp errors
The RFC for ICMPv6 specifies that an errant packet should be included
in the returned ICMP packet, and that it should include up to the amount
needed to fill the minimum MTU (1280 bytes) if possible. The current code
included the link header in that calculation, but the RFC is referring
to the IP MTU, not the link MTU. Some conformance tests check this and
report an error against the stack for this.

The full header length should, however, continue to be used when
allocating header space.

Make the same change for IPv4 for consistency.

Add a test for ICMP payload sizing.
Test that the included data in an ICMP error packet conforms to the
requirements of RFC 792, RFC 4443 section 2.4 and RFC 1812 section 4.3.2.3.
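
The sizing rule reduces to simple arithmetic: for IPv6, 1280 - 40 (IPv6 header) - 8 (ICMPv6 header) = 1232 bytes of the offending packet at most. This sketch assumes a fixed 40-byte IPv6 header and a 20-byte IPv4 header without options, and uses 576 bytes as the IPv4 limit from RFC 1812 section 4.3.2.3; the function and constant names are hypothetical.

```go
package main

import "fmt"

const (
	ipv6MinimumMTU   = 1280 // minimum IPv6 MTU (RFC 8200)
	ipv6HeaderLen    = 40
	icmpv6HeaderLen  = 8
	ipv4ErrorLimit   = 576 // RFC 1812 4.3.2.3 upper bound for ICMPv4 errors
	ipv4MinHeaderLen = 20
	icmpv4HeaderLen  = 8
)

// includedLen returns how many bytes of the offending packet to quote in an
// ICMP error so that the error's IP packet fits in limit. Only the IP and
// ICMP headers are subtracted: limit is an IP-level size, so link-layer
// header space is reserved separately when the buffer is allocated.
func includedLen(offendingLen, limit, ipHdrLen, icmpHdrLen int) int {
	if avail := limit - ipHdrLen - icmpHdrLen; offendingLen > avail {
		return avail
	}
	return offendingLen
}

func main() {
	fmt.Println(includedLen(1500, ipv6MinimumMTU, ipv6HeaderLen, icmpv6HeaderLen))    // 1232
	fmt.Println(includedLen(100, ipv6MinimumMTU, ipv6HeaderLen, icmpv6HeaderLen))     // 100
	fmt.Println(includedLen(1500, ipv4ErrorLimit, ipv4MinHeaderLen, icmpv4HeaderLen)) // 548
}
```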

Fixes #5311

PiperOrigin-RevId: 353790203
2021-01-25 20:50:02 -08:00
Arthur Sfez 39db3b9355 Add per endpoint ARP statistics
The ARP stat NetworkUnreachable was removed, and was replaced by
InterfaceHasNoLocalAddress. No stats are recorded when dealing with a
missing endpoint (ErrNotConnected), because if there is no endpoint,
there are no valid per-endpoint stats.

PiperOrigin-RevId: 353759462
2021-01-25 16:52:05 -08:00
Arthur Sfez 18ebec0ec9 Refactor GetMainNICAddress
It previously returned an error but it could only be UnknownNICID. It now
returns a boolean to indicate whether the NIC exists or not.

PiperOrigin-RevId: 353337489
2021-01-22 16:12:12 -08:00
Ghanan Gowripalan 76da673a0d Do not modify IGMP packets when verifying checksum
PiperOrigin-RevId: 353336894
2021-01-22 16:06:05 -08:00
Toshi Kikuchi cfbf209173 iptables: support matching the input interface name
We have support for the output interface name, but not for the input
interface name.
This change adds the support for the input interface name, and adds the
test cases for it.

Fixes #5300

PiperOrigin-RevId: 353179389
2021-01-21 23:19:19 -08:00
Julian Elischer 2865166403 Change the way the IP options report problems
The error messages are not needed or used as these are not processing errors
so much as errors to be reported back to the packet sender. Implicitly
describe whether each error should generate ICMP packets or not. Most do
but there are a couple that do not.

Slightly alter some test expectations for Linux compatibility and add a
couple more. Improve Linux compatibility on error packet returns. Some
cosmetic changes to tests to match the upcoming packetimpact version
of the same tests.

PiperOrigin-RevId: 352889785
2021-01-20 15:36:03 -08:00
Arthur Sfez be17b94446 Per NIC NetworkEndpoint statistics
To facilitate the debugging of multi-homed setups, track network
protocol statistics for each endpoint. Note that the original
stack-wide stats still exist.

A new type of statistic counter is introduced, which tracks two
versions of a stat at the same time. This lets a network endpoint
increment both the local stat and the stack-wide stat at the same
time.
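
A minimal sketch of such a two-way counter, assuming illustrative names: `StatCounter`, `MultiCounterStat`, and their methods are chosen for this example, and gVisor's real types may differ. Two endpoints share one stack-wide counter while each keeps its own.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// StatCounter is a minimal atomic counter used for illustration.
type StatCounter struct{ count uint64 }

// Increment adds one to the counter.
func (s *StatCounter) Increment() { atomic.AddUint64(&s.count, 1) }

// Value returns the current count.
func (s *StatCounter) Value() uint64 { return atomic.LoadUint64(&s.count) }

// MultiCounterStat points at two counters, here a per-endpoint one and the
// stack-wide one, and increments both on every call.
type MultiCounterStat struct{ a, b *StatCounter }

// Init wires the stat to its two underlying counters.
func (m *MultiCounterStat) Init(a, b *StatCounter) { m.a, m.b = a, b }

// Increment bumps both counters.
func (m *MultiCounterStat) Increment() {
	m.a.Increment()
	m.b.Increment()
}

func main() {
	var stackWide, nic1, nic2 StatCounter
	var ep1Rx, ep2Rx MultiCounterStat
	ep1Rx.Init(&nic1, &stackWide)
	ep2Rx.Init(&nic2, &stackWide)

	ep1Rx.Increment()
	ep1Rx.Increment()
	ep2Rx.Increment()
	fmt.Println(nic1.Value(), nic2.Value(), stackWide.Value()) // 2 1 3
}
```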

Fixes #4605

PiperOrigin-RevId: 352663276
2021-01-19 15:07:39 -08:00
Tamir Duberstein 12d9790833 Remove count argument from tcpip.Endpoint.Read
The same intent can be specified via the io.Writer.

PiperOrigin-RevId: 352098747
2021-01-15 15:49:15 -08:00
Ting-Yu Wang ec9e263f21 Correctly return EMSGSIZE when packet is too big in raw socket.
Previously, IPv4 accepted the packet while IPv6 panicked. Neither matches the
behavior in Linux.

splice() in Linux has different behavior than in gVisor. This change documents
it in the SpliceTooLong test.

Reported-by: syzbot+b550e78e5c24d1d521f2@syzkaller.appspotmail.com
PiperOrigin-RevId: 352091286
2021-01-15 15:10:27 -08:00
Ghanan Gowripalan 25b5ec7135 Do not resolve remote link address at transport layer
Link address resolution is performed at the link layer (if required) so
we can defer it from the transport layer. When link resolution is
required, packets will be queued and sent once link resolution
completes. If link resolution fails, the transport layer will receive a
control message indicating that the stack failed to route the packet.

tcpip.Endpoint.Write no longer returns a channel now that writes do not
wait for link resolution at the transport layer.

tcpip.ErrNoLinkAddress is no longer used so it is removed.

Removed calls to stack.Route.ResolveWith from the transport layer so
that link resolution is performed when a route is created in response
to an incoming packet (e.g. to complete TCP handshakes or send a RST).

Tests:
- integration_test.TestForwarding
- integration_test.TestTCPLinkResolutionFailure

Fixes #4458

RELNOTES: n/a
PiperOrigin-RevId: 351684158
2021-01-13 16:04:33 -08:00
Ting-Yu Wang b1de1da318 netstack: Refactor tcpip.Endpoint.Read
Read now takes a destination io.Writer, a count, and options. The method keeps
the name Read, in contrast to the Write method.

This enables:
* direct transfer of views under VV
* zero copy

It also eliminates the need for the sentry to keep a slice of a view when
userspace requests a read that is smaller than the view returned, removing
that complexity.

Read/Peek/ReadPacket are now consolidated together and some duplicate code is
removed.

PiperOrigin-RevId: 350636322
2021-01-07 14:17:18 -08:00
Peter Johnston fee2cd640f Invoke address resolution upon subsequent traffic to Failed neighbor
Removes the period of time in which subsequent traffic to a Failed neighbor
immediately fails with ErrNoLinkAddress. A Failed neighbor is one for which
address resolution fails; in other words, the neighbor's IP address cannot
be translated to a MAC address.

This means removing the Failed state for linkAddrCache and allowing transitioning
out of Failed into Incomplete for neighborCache. Previously, both caches would
transition entries to Failed after address resolution failed. In this state, any
subsequent traffic requested within an unreachable time would immediately fail
with ErrNoLinkAddress. This does not follow RFC 4861 section 7.3.3:

  If address resolution fails, the entry SHOULD be deleted, so that subsequent
  traffic to that neighbor invokes the next-hop determination procedure again.
  Invoking next-hop determination at this point ensures that alternate default
  routers are tried.

The API for getting a link address for a given address, whether through the link
address cache or the neighbor table, is updated to optionally take a callback
which will be called when address resolution completes. This allows `Route` to
handle completing link resolution internally, so callers of (*Route).Resolve
(e.g. endpoints) don’t have to keep track of when it completes and update the
Route accordingly.

This change also removes the wakers from LinkAddressCache, NeighborCache, and
Route in favor of the callbacks, and callers that previously used a waker can
now just pass a callback to (*Route).Resolve that will notify the waker on
resolution completion.

Fixes #4796

Startblock:
  has LGTM from sbalana
  and then
  add reviewer ghanan
PiperOrigin-RevId: 348597478
2020-12-22 01:37:05 -08:00
Ghanan Gowripalan 50c658a9f6 Don't split enabled flag across multicast group state
Startblock:
  has LGTM from asfez
  and then
  add reviewer brunodalbo
PiperOrigin-RevId: 347716242
2020-12-15 16:28:53 -08:00
Ghanan Gowripalan 53a95ad0df Use specified source address for IGMP/MLD packets
This change also considers interfaces and network endpoints enabled up
to the point that all work to disable them is complete. This was needed
so that protocols can perform shutdown work while being disabled (e.g.
sending a packet which requires the endpoint to be enabled to obtain a
source address).

Bug #4682, #4861
Fixes #4888

Startblock:
  has LGTM from peterjohnston
  and then
  add reviewer brunodalbo
PiperOrigin-RevId: 346869702
2020-12-10 14:50:20 -08:00
Ghanan Gowripalan 50189b0d6f Do not perform IGMP/MLD on loopback interfaces
The loopback interface will never have any neighbouring nodes so
advertising its interest in multicast groups is unnecessary.

Bug #4682, #4861

Startblock:
  has LGTM from asfez
  and then
  add reviewer tamird
PiperOrigin-RevId: 346587604
2020-12-09 15:54:18 -08:00
Zeling Feng 96d14de0fa export MountTempDirectory
PiperOrigin-RevId: 346487763
2020-12-09 15:50:35 -08:00
Ghanan Gowripalan df2dbe3e38 Remove stack.ReadOnlyAddressableEndpointState
Startblock:
  has LGTM from asfez
  and then
  add reviewer tamird
PiperOrigin-RevId: 345815146
2020-12-04 22:04:23 -08:00
Bruno Dal Bo fd28ccfaa4 Introduce IPv4 options serializer and add RouterAlert to IGMP
PiperOrigin-RevId: 345701623
2020-12-04 10:10:56 -08:00
Peter Johnston 3ff1aef544 Make `stack.Route` thread safe
Currently we rely on the user to take the lock on the endpoint that owns the
route, in order to modify it safely. We can instead move
`Route.RemoteLinkAddress` under `Route`'s mutex, and allow non-locking and
thread-safe access to other fields of `Route`.

PiperOrigin-RevId: 345461586
2020-12-03 08:54:24 -08:00
Arthur Sfez bdaae08ee2 Extract ICMPv4/v6 specific stats to their own types
This change lets us split the v4 stats from the v6 stats, which will be
useful when adding stats for each network endpoint.

PiperOrigin-RevId: 345322615
2020-12-02 15:17:20 -08:00
Ghanan Gowripalan 25570ac4f3 Track join count in multicast group protocol state
Before this change, the join count and the state for IGMP/MLD was held
across different types which required multiple locks to be held when
accessing a multicast group's state.
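
A sketch of keeping the join count in the same per-group record as the protocol state, so one mutex guards both. `multicastGroups`, `groupState`, and the string group key are hypothetical simplifications.

```go
package main

import (
	"fmt"
	"sync"
)

// groupState is a hypothetical per-group record that keeps the join count
// beside the IGMP/MLD protocol state so a single lock covers both.
type groupState struct {
	joins         int
	delayedReport bool // stand-in for the protocol's per-group state
}

type multicastGroups struct {
	mu     sync.Mutex
	groups map[string]*groupState
}

func newMulticastGroups() *multicastGroups {
	return &multicastGroups{groups: make(map[string]*groupState)}
}

// join reports whether this was the first join, i.e. when the protocol
// would start announcing membership.
func (m *multicastGroups) join(group string) bool {
	m.mu.Lock()
	defer m.mu.Unlock()
	if g, ok := m.groups[group]; ok {
		g.joins++
		return false
	}
	m.groups[group] = &groupState{joins: 1, delayedReport: true}
	return true
}

// leave reports whether the last join was removed (when a Leave/Done
// message would be sent) and whether the group was joined at all.
func (m *multicastGroups) leave(group string) (last, found bool) {
	m.mu.Lock()
	defer m.mu.Unlock()
	g, ok := m.groups[group]
	if !ok {
		return false, false
	}
	g.joins--
	if g.joins > 0 {
		return false, true
	}
	delete(m.groups, group)
	return true, true
}

func main() {
	m := newMulticastGroups()
	fmt.Println(m.join("224.0.0.251"), m.join("224.0.0.251")) // true false
	last, _ := m.leave("224.0.0.251")
	fmt.Println(last) // false
	last, _ = m.leave("224.0.0.251")
	fmt.Println(last) // true
}
```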

Bug #4682, #4861
Fixes #4916

PiperOrigin-RevId: 345019091
2020-12-01 07:52:40 -08:00
Toshi Kikuchi 54ad145f2e Add more fragment reassembly tests
These tests check if a maximum-sized (64k) packet is reassembled without
receiving a fragment with MF flag set to zero.

PiperOrigin-RevId: 344913172
2020-11-30 16:36:51 -08:00
Ghanan Gowripalan e813008664 Perform IGMP/MLD when the NIC is enabled/disabled
Test: ip_test.TestMGPWithNICLifecycle

Bug #4682, #4861

PiperOrigin-RevId: 344888091
2020-11-30 14:24:47 -08:00
Ghanan Gowripalan bc81fcceda Support listener-side MLDv1
...as defined by RFC 2710. Querier (router)-side MLDv1 is not yet
supported.

The core state machine is shared with IGMPv2.

This is guarded behind a flag (ipv6.Options.MLDEnabled).

Tests: ip_test.TestMGP*

Bug #4861

PiperOrigin-RevId: 344344095
2020-11-25 18:00:41 -08:00
Ghanan Gowripalan 2485a4e2cb Make stack.Route safe to access concurrently
Multiple goroutines may use the same stack.Route concurrently so
the stack.Route should make sure that any functions called on it
are thread-safe.

Fixes #4073

PiperOrigin-RevId: 344320491
2020-11-25 14:52:59 -08:00
Ghanan Gowripalan 732e989855 Extract IGMPv2 core state machine
The IGMPv2 core state machine can be shared with MLDv1 since they are
almost identical, ignoring specific addresses, constants and packets.

Bug #4682, #4861

PiperOrigin-RevId: 344102615
2020-11-24 11:50:00 -08:00
Ghanan Gowripalan ba2d5cb7e1 Use time.Duration for IGMP Max Response Time field
Bug #4682

PiperOrigin-RevId: 343993297
2020-11-23 22:47:55 -08:00
Ryan Heacock fbc4a8dbd1 Perform IGMPv2 when joining IPv4 multicast groups
Added headers, stats, and checksum parsing capabilities for IGMPv2, as
described in RFC 2236.

IGMPv2 state machine is implemented for each condition, sending and receiving
IGMP Membership Reports and Leave Group messages with backwards compatibility
with IGMPv1 routers.
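
IGMP messages carry the standard Internet checksum (RFC 1071), computed over the whole IGMP message with the checksum field set to zero, as RFC 2236 specifies. A minimal Go sketch, with a hypothetical function name and an example IGMPv2 general query (a Max Resp Time of 100 means 10 seconds, since the field is in units of 1/10 second):

```go
package main

import "fmt"

// internetChecksum returns the 16-bit ones' complement of the ones'
// complement sum of b (RFC 1071).
func internetChecksum(b []byte) uint16 {
	var sum uint32
	for i := 0; i+1 < len(b); i += 2 {
		sum += uint32(b[i])<<8 | uint32(b[i+1])
	}
	if len(b)%2 == 1 {
		sum += uint32(b[len(b)-1]) << 8 // pad an odd length with a zero byte
	}
	for sum>>16 != 0 {
		sum = sum&0xffff + sum>>16 // fold carries back into the low 16 bits
	}
	return ^uint16(sum)
}

func main() {
	// IGMPv2 general query: type 0x11, Max Resp Time 100, checksum 0, group 0.0.0.0.
	msg := []byte{0x11, 0x64, 0x00, 0x00, 0, 0, 0, 0}
	c := internetChecksum(msg)
	fmt.Printf("%#x\n", c) // 0xee9b

	// With the checksum filled in, the message sums to 0xffff, so
	// verification returns 0 without zeroing the checksum field first.
	msg[2], msg[3] = byte(c>>8), byte(c)
	fmt.Println(internetChecksum(msg) == 0) // true
}
```

Because verification runs over the received bytes as-is, the receiver never needs to modify the packet to check it.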

Test:
* Implemented IGMP header parser and checksum calculator in header/igmp_test.go
* ipv4/igmp_test.go tests incoming and outgoing IGMP messages and pathways.
* Added unit test coverage for IGMPv2 RFC behavior + IGMPv1 backwards
   compatibility in ipv4/igmp_test.go.

Fixes #4682

PiperOrigin-RevId: 343408809
2020-11-19 18:15:25 -08:00
Julian Elischer 49adf36ed7 Fix possible panic due to bad data.
Found by a fuzzer.

Reported-by: syzbot+619fa10be366d553ef7f@syzkaller.appspotmail.com
PiperOrigin-RevId: 343379575
2020-11-19 15:17:00 -08:00
Ghanan Gowripalan 27ee4fe76a Don't hold AddressEndpoints for multicast addresses
Group addressable endpoints can simply check if they have joined the
multicast group without maintaining address endpoints. This also
helps remove the dependency on AddressableEndpoint from
GroupAddressableEndpoint.

Now that group addresses are not tracked with address endpoints, we can
avoid accidentally obtaining a route with a multicast local address.

PiperOrigin-RevId: 343336912
2020-11-19 11:48:15 -08:00
Ghanan Gowripalan cc5cfce4c6 Remove ARP address workaround
- Make AddressableEndpoint optional for NetworkEndpoint.
Not all NetworkEndpoints need to support addressing (e.g. ARP), so
AddressableEndpoint should only be implemented for protocols that
support addressing such as IPv4 and IPv6.

With this change, tcpip.ErrNotSupported will be returned by the stack
when attempting to modify addresses on a network endpoint that does
not support addressing.

Now that packets are fully handled at the network layer, and (with this
change) addresses are optional for network endpoints, we no longer need
the workaround for ARP where a fake ARP address was added to each NIC
that performs ARP so that packets would be delivered to the ARP layer.

PiperOrigin-RevId: 342722547
2020-11-16 14:36:10 -08:00
Toshi Kikuchi 758e45618f Clean up fragmentation.Process
- Pass a PacketBuffer directly instead of releaseCB
- No longer pass a VectorisedView, which is included in the PacketBuffer
- Make it an error if data size is not equal to (last - first + 1)
- Set the callback for the reassembly timeout on NewFragmentation

PiperOrigin-RevId: 342702432
2020-11-16 13:06:38 -08:00
Julian Elischer 0fee59c8c8 Requested Comment/Message wording changes
PiperOrigin-RevId: 342366891
2020-11-13 17:13:11 -08:00
Ghanan Gowripalan 6c0f53002a Decrement TTL/Hop Limit when forwarding IP packets
If the packet must no longer be forwarded because its TTL/Hop Limit
reaches 0, send an ICMP Time Exceeded error to the source.
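
A minimal sketch of the forwarding check, assuming a hypothetical `nextHopTTL` helper and `errTimeExceeded` sentinel; building and sending the ICMP message itself is left out.

```go
package main

import (
	"errors"
	"fmt"
)

// errTimeExceeded is a hypothetical signal telling the caller to drop the
// packet and send ICMP Time Exceeded (code 0, "in transit") to its source.
var errTimeExceeded = errors.New("TTL/hop limit exceeded in transit")

// nextHopTTL returns the TTL (IPv4) or Hop Limit (IPv6) to write into a
// forwarded packet. A value that would reach 0 means the packet must not
// be forwarded.
func nextHopTTL(ttl uint8) (uint8, error) {
	if ttl <= 1 {
		return 0, errTimeExceeded
	}
	return ttl - 1, nil
}

func main() {
	for _, ttl := range []uint8{64, 1, 0} {
		next, err := nextHopTTL(ttl)
		fmt.Println(ttl, next, err)
	}
}
```

For IPv4 the header checksum covers the TTL, so a forwarder must also update the checksum after decrementing; IPv6 has no header checksum.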

Required as per relevant RFCs. See comments in code for RFC references.

Fixes #1085

Tests:
  - ipv4_test.TestForwarding
  - ipv6.TestForwarding
PiperOrigin-RevId: 342323610
2020-11-13 13:13:21 -08:00
Julian Elischer 638d64c633 Change AllocationSize to SizeWithPadding as requested
RELNOTES: n/a
PiperOrigin-RevId: 342176296
2020-11-12 18:38:43 -08:00
Julian Elischer d700ba22ab Pad with a loop rather than a copy from an allocation.
Add a unit test for ipv4.Encode and a round trip test.

PiperOrigin-RevId: 342169517
2020-11-12 17:50:24 -08:00
Ghanan Gowripalan 1a972411b3 Move packet handling to NetworkEndpoint
The NIC should not hold network-layer state or logic - network packet
handling/forwarding should be performed at the network layer instead
of the NIC.

Fixes #4688

PiperOrigin-RevId: 342166985
2020-11-12 17:33:21 -08:00
Julian Elischer 9c4102896d Teach netstack how to add options to IPv4 packets
Most packets don't have options, but they are an integral part of the
standard. Teaching the ipv4 code how to handle them will simplify future
testing and use. Because options are so rare, it is worth making sure
that the extra work is kept out of the fast path as much as possible.

Prior to this change, all usages of the IHL field of the IPv4Fields/Encode
system set it to the same constant value except in a couple of tests
for bad values. With this change, IHL is no longer constant, as it
depends on the size of any options. Since ipv4.Encode() now handles the
options, letting callers set this value becomes a possible source of
errors, so remove it entirely and calculate the value from the size of
the options (if any), therefore guaranteeing a correct value.
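
A sketch of deriving IHL from the options rather than accepting it from callers, together with zero-padding via a loop. The helper names are hypothetical; the arithmetic follows RFC 791, where IHL counts 32-bit words (so at most 15 * 4 = 60 bytes) and options are padded to a 4-byte boundary. For example, one NOP followed by a 39-byte Record Route option fills the full 40 bytes of option space.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	ipv4MinHeaderLen = 20
	ipv4MaxHeaderLen = 60 // IHL is 4 bits counting 32-bit words: 15 * 4
)

// headerLen derives the header length and IHL from the options length
// instead of trusting a caller-supplied IHL. Options are padded to a
// 4-byte boundary.
func headerLen(optLen int) (hdrLen int, ihl uint8, err error) {
	hdrLen = ipv4MinHeaderLen + ((optLen + 3) &^ 3)
	if hdrLen > ipv4MaxHeaderLen {
		return 0, 0, errors.New("IPv4 options too long")
	}
	return hdrLen, uint8(hdrLen / 4), nil
}

// writeOptions copies opts into dst, the options area of a header buffer
// that may still hold stale bytes, then zero-fills up to the 4-byte
// boundary with a plain loop. It returns the padded length; dst must be
// at least that long.
func writeOptions(dst, opts []byte) int {
	padded := (len(opts) + 3) &^ 3
	n := copy(dst[:padded], opts)
	for i := n; i < padded; i++ {
		dst[i] = 0 // 0 is the End of Option List type
	}
	return padded
}

func main() {
	h, ihl, err := headerLen(40)
	fmt.Println(h, ihl, err) // 60 15 <nil>
	_, _, err = headerLen(41)
	fmt.Println(err) // IPv4 options too long

	buf := []byte{0xff, 0xff, 0xff, 0xff}
	fmt.Println(writeOptions(buf, []byte{1, 1, 1}), buf) // 4 [1 1 1 0]
}
```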

Fixes #4709
RELNOTES: n/a
PiperOrigin-RevId: 341864765
2020-11-11 10:59:35 -08:00
Bhasker Hariharan 06e33cd737 Cache addressEndpoint.addr.Subnet() to avoid allocations.
This change adds a Subnet() method to AddressableEndpoint so that we
can avoid repeated calls to AddressableEndpoint.AddressWithPrefix().Subnet().

Updates #231

PiperOrigin-RevId: 340969877
2020-11-05 19:12:09 -08:00
Ghanan Gowripalan 8c0701462a Use stack.Route exclusively for writing packets
* Remove stack.Route from incoming packet path.
There is no need to pass around a stack.Route during the incoming path
of a packet. Instead, pass around the packet's link/network layer
information in the packet buffer since all layers may need this
information.

* Support address bound and outgoing packet NIC in routes.
When forwarding is enabled, the source address of a packet may be bound
to a different interface than the outgoing interface. This change
updates stack.Route to hold both NICs so that one can be used to write
packets while the other is used to check if the route's bound address
is valid. Note, we need to hold the address's interface so we can check
if the address is a spoofed address.

* Introduce the concept of a local route.
Local routes are routes where the packet never needs to leave the stack;
the destination is stack-local. We can now route between interfaces
within a stack if the packet never needs to leave the stack, even when
forwarding is disabled.

* Always obtain a route from the stack before sending a packet.
If a packet needs to be sent in response to an incoming packet, a route
must be obtained from the stack to ensure the stack is configured to
send packets to the packet's source from the packet's destination.

* Enable spoofing if a stack may send packets from unowned addresses.
This change required changes to some netgophers since previously,
promiscuous mode was enough to let the netstack respond to all
incoming packets regardless of the packet's destination address. Now
that a stack.Route is not held for each incoming packet, finding a route
may fail with local addresses we don't own but accepted packets for
while in promiscuous mode. Since we also want to be able to send from
any address (in response to the received promiscuous mode packets), we need
to enable spoofing.

* Skip transport layer checksum checks for locally generated packets.
If a packet is locally generated, the stack can safely assume that no
errors were introduced while being locally routed since the packet is
never sent out on the wire.

Some bugs fixed:
- transport layer checksum was never calculated after NAT.
- handleLocal didn't handle routing across interfaces.
- stack didn't support forwarding across interfaces.
- always consult the routing table before creating an endpoint.

Updates #4688
Fixes #3906

PiperOrigin-RevId: 340943442
2020-11-05 15:52:16 -08:00
Kevin Krakauer 02fe467b47 Keep magic constants out of netstack
PiperOrigin-RevId: 339721152
2020-10-29 12:22:21 -07:00
Julian Elischer 035b1c8272 Add support for Timestamp and RecordRoute IP options
IPv4 options extend the size of the IP header and have a basic known
format. The framework can process that format without needing to know
about every possible option. We can add more code to handle additional
option types as we need them. Bad options or mangled option entries
can result in ICMP Parameter Problem packets. The first types we
support are the Timestamp option and the Record Route option, included
in this change.
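
IPv4 options follow the layout in RFC 791: End of Option List (type 0) and No Operation (type 1) are single bytes, and every other option is a type byte, a length byte (counting the type and length bytes), then data. Record Route is type 7 and Timestamp is type 68. A minimal parsing sketch under those rules; the `option` type and `parseOptions` function are hypothetical, and the returned offset is what a caller might turn into an ICMP Parameter Problem pointer by adding the 20-byte fixed header.

```go
package main

import "fmt"

const (
	optEOL = 0  // End of Option List
	optNOP = 1  // No Operation
	optRR  = 7  // Record Route
	optTS  = 68 // Timestamp
)

// option is one parsed type-length-value option.
type option struct {
	typ  byte
	data []byte // bytes after the type and length octets
}

// parseOptions walks the IPv4 options area. On malformed input it returns
// ok == false and the offset, within the options area, of the offending byte.
func parseOptions(b []byte) (opts []option, badOffset int, ok bool) {
	for i := 0; i < len(b); {
		switch t := b[i]; t {
		case optEOL:
			return opts, 0, true
		case optNOP:
			i++
		default:
			if i+1 >= len(b) {
				return nil, i, false // no room for the length byte
			}
			l := int(b[i+1])
			if l < 2 || i+l > len(b) {
				return nil, i + 1, false // length is impossible or overruns the area
			}
			opts = append(opts, option{typ: t, data: b[i+2 : i+l]})
			i += l
		}
	}
	return opts, 0, true
}

func main() {
	// One NOP, then a Record Route option with room for 9 addresses.
	area := make([]byte, 40)
	area[0], area[1], area[2], area[3] = optNOP, optRR, 39, 4
	opts, _, ok := parseOptions(area)
	fmt.Println(ok, len(opts), opts[0].typ, len(opts[0].data)) // true 1 7 37

	_, off, ok := parseOptions([]byte{optTS, 1})
	fmt.Println(ok, off) // false 1
}
```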

The options are processed at several points in the packet flow within
the Network stack, with slightly different requirements. The framework
includes a mechanism to control this at each point. Support has been
added for points that are only present in upcoming CLs, such as
during packet forwarding and fragmentation.

With this change, 'ping -R' and 'ping -T' work against gVisor and Fuchsia.

$ ping -R 192.168.1.2
PING 192.168.1.2 (192.168.1.2) 56(124) bytes of data.
64 bytes from 192.168.1.2: icmp_seq=1 ttl=64 time=0.990 ms
NOP
RR:     192.168.1.1
        192.168.1.2
        192.168.1.1

$ ping -T tsprespec 192.168.1.2 192.168.1.1 192.168.1.2
PING 192.168.1.2 (192.168.1.2) 56(124) bytes of data.
64 bytes from 192.168.1.2: icmp_seq=1 ttl=64 time=1.20 ms
TS:     192.168.1.2    71486821 absolute
        192.168.1.1    746

Unit tests included for generic options, Timestamp options
and Record Route options.

PiperOrigin-RevId: 339379076
2020-10-27 19:32:09 -07:00