Prevents the following deadlock:
- Raw packet is sent via e.Write(), which read locks e.mu
- Connect() is called, blocking on write locking e.mu
- The packet is routed to loopback and back to e.HandlePacket(), which read
locks e.mu
Per the atomic.RWMutex documentation, this deadlocks:
"If a goroutine holds a RWMutex for reading and another goroutine might call
Lock, no goroutine should expect to be able to acquire a read lock until the
initial read lock is released. In particular, this prohibits recursive read
locking. This is to ensure that the lock eventually becomes available; a blocked
Lock call excludes new readers from acquiring the lock."
Also, release eps.mu earlier in deliverRawPacket.
PiperOrigin-RevId: 359600926
- This CL will initialize the function handler used for getting the send
buffer size limits during endpoint creation and does not require the caller of
SetSendBufferSize(..) to know the endpoint type(tcp/udp/..)
PiperOrigin-RevId: 353992634
connect() can be invoked multiple times on UDP/RAW sockets and in such
a case we should release the cached route from the previous connect.
Fixes#5359
PiperOrigin-RevId: 353919891
This CL moves {S,G}etsockopt of SO_SNDBUF from all endpoints to socketops. For
unix sockets, we do not support setting of this option.
PiperOrigin-RevId: 353871484
Link address resolution is performed at the link layer (if required) so
we can defer it from the transport layer. When link resolution is
required, packets will be queued and sent once link resolution
completes. If link resolution fails, the transport layer will receive a
control message indicating that the stack failed to route the packet.
tcpip.Endpoint.Write no longer returns a channel now that writes do not
wait for link resolution at the transport layer.
tcpip.ErrNoLinkAddress is no longer used so it is removed.
Removed calls to stack.Route.ResolveWith from the transport layer so
that link resolution is performed when a route is created in response
to an incoming packet (e.g. to complete TCP handshakes or send a RST).
Tests:
- integration_test.TestForwarding
- integration_test.TestTCPLinkResolutionFailure
Fixes#4458
RELNOTES: n/a
PiperOrigin-RevId: 351684158
Read now takes a destination io.Writer, count, options. Keeping the method name
Read, in contrast to the Write method.
This enables:
* direct transfer of views under VV
* zero copy
It also eliminates the need for sentry to keep a slice of view because
userspace had requested a read that is smaller than the view returned, removing
the complexity there.
Read/Peek/ReadPacket are now consolidated together and some duplicate code is
removed.
PiperOrigin-RevId: 350636322
Removes the period of time in which subseqeuent traffic to a Failed neighbor
immediately fails with ErrNoLinkAddress. A Failed neighbor is one in which
address resolution fails; or in other words, the neighbor's IP address cannot
be translated to a MAC address.
This means removing the Failed state for linkAddrCache and allowing transitiong
out of Failed into Incomplete for neighborCache. Previously, both caches would
transition entries to Failed after address resolution fails. In this state, any
subsequent traffic requested within an unreachable time would immediately fail
with ErrNoLinkAddress. This does not follow RFC 4861 section 7.3.3:
If address resolution fails, the entry SHOULD be deleted, so that subsequent
traffic to that neighbor invokes the next-hop determination procedure again.
Invoking next-hop determination at this point ensures that alternate default
routers are tried.
The API for getting a link address for a given address, whether through the link
address cache or the neighbor table, is updated to optionally take a callback
which will be called when address resolution completes. This allows `Route` to
handle completing link resolution internally, so callers of (*Route).Resolve
(e.g. endpoints) don’t have to keep track of when it completes and update the
Route accordingly.
This change also removes the wakers from LinkAddressCache, NeighborCache, and
Route in favor of the callbacks, and callers that previously used a waker can
now just pass a callback to (*Route).Resolve that will notify the waker on
resolution completion.
Fixes#4796
Startblock:
has LGTM from sbalana
and then
add reviewer ghanan
PiperOrigin-RevId: 348597478
We do not rely on error for getsockopt options(which have boolean values)
anymore. This will cause issue in sendmsg where we used to return error
for IPV6_V6Only option. Fix the panic by returning error (for sockets other
than TCP and UDP) if the address does not match the type(AF_INET/AF_INET6) of
the socket.
PiperOrigin-RevId: 347063838
tcpip.ControlMessages can not contain Linux specific structures which makes it
painful to convert back and forth from Linux to tcpip back to Linux when passing
around control messages in hostinet and raw sockets.
Now we convert to the Linux version of the control message as soon as we are
out of tcpip.
PiperOrigin-RevId: 347027065
Ports the following options:
- TCP_NODELAY
- TCP_CORK
- TCP_QUICKACK
Also deletes the {Get/Set}SockOptBool interface methods from all implementations
PiperOrigin-RevId: 344378824
We will use SocketOptions for all kinds of options, not just SOL_SOCKET options
because (1) it is consistent with Linux which defines all option variables on
the top level socket struct, (2) avoid code complexity. Appropriate checks
have been added for matching option level to the endpoint type.
Ported the following options to this new utility:
- IP_MULTICAST_LOOP
- IP_RECVTOS
- IPV6_RECVTCLASS
- IP_PKTINFO
- IP_HDRINCL
- IPV6_V6ONLY
Changes in behavior (these are consistent with what Linux does AFAICT):
- Now IP_MULTICAST_LOOP can be set for TCP (earlier it was a noop) but does not
affect the endpoint itself.
- We can now getsockopt IP_HDRINCL (earlier we would get an error).
- Now we return ErrUnknownProtocolOption if SOL_IP or SOL_IPV6 options are used
on unix sockets.
- Now we return ErrUnknownProtocolOption if SOL_IPV6 options are used on non
AF_INET6 endpoints.
This change additionally makes the following modifications:
- Add State() uint32 to commonEndpoint because both tcpip.Endpoint and
transport.Endpoint interfaces have it. It proves to be quite useful.
- Gets rid of SocketOptionsHandler.IsListening(). It was an anomaly as it was
not a handler. It is now implemented on netstack itself.
- Gets rid of tcp.endpoint.EndpointInfo and directly embeds
stack.TransportEndpointInfo. There was an unnecessary level of embedding
which served no purpose.
- Removes some checks dual_stack_test.go that used the errors from
GetSockOptBool(tcpip.V6OnlyOption) to confirm some state. This is not
consistent with the new design and also seemed to be testing the
implementation instead of behavior.
PiperOrigin-RevId: 344354051
Multiple goroutines may use the same stack.Route concurrently so
the stack.Route should make sure that any functions called on it
are thread-safe.
Fixes#4073
PiperOrigin-RevId: 344320491
This changes also introduces:
- `SocketOptionsHandler` interface which can be implemented by endpoints to
handle endpoint specific behavior on SetSockOpt. This is analogous to what
Linux does.
- `DefaultSocketOptionsHandler` which is a default implementation of the above.
This is embedded in all endpoints so that we don't have to uselessly
implement empty functions. Endpoints with specific behavior can override the
embedded method by manually defining its own implementation.
PiperOrigin-RevId: 343158301
This change also makes the following fixes:
- Make SocketOptions use atomic operations instead of having to acquire/drop
locks upon each get/set option.
- Make documentation more consistent.
- Remove tcpip.SocketOptions from socketOpsCommon because it already exists
in transport.Endpoint.
- Refactors get/set socket options tests to be easily extendable.
PiperOrigin-RevId: 343103780
Store all the socket level options in a struct and call {Get/Set}SockOpt on
this struct. This will avoid implementing socket level options on all
endpoints. This CL contains implementing one socket level option for tcp and
udp endpoints.
PiperOrigin-RevId: 342203981
* Remove stack.Route from incoming packet path.
There is no need to pass around a stack.Route during the incoming path
of a packet. Instead, pass around the packet's link/network layer
information in the packet buffer since all layers may need this
information.
* Support address bound and outgoing packet NIC in routes.
When forwarding is enabled, the source address of a packet may be bound
to a different interface than the outgoing interface. This change
updates stack.Route to hold both NICs so that one can be used to write
packets while the other is used to check if the route's bound address
is valid. Note, we need to hold the address's interface so we can check
if the address is a spoofed address.
* Introduce the concept of a local route.
Local routes are routes where the packet never needs to leave the stack;
the destination is stack-local. We can now route between interfaces
within a stack if the packet never needs to leave the stack, even when
forwarding is disabled.
* Always obtain a route from the stack before sending a packet.
If a packet needs to be sent in response to an incoming packet, a route
must be obtained from the stack to ensure the stack is configured to
send packets to the packet's source from the packet's destination.
* Enable spoofing if a stack may send packets from unowned addresses.
This change required changes to some netgophers since previously,
promiscuous mode was enough to let the netstack respond to all
incoming packets regardless of the packet's destination address. Now
that a stack.Route is not held for each incoming packet, finding a route
may fail with local addresses we don't own but accepted packets for
while in promiscuous mode. Since we also want to be able to send from
any address (in response the received promiscuous mode packets), we need
to enable spoofing.
* Skip transport layer checksum checks for locally generated packets.
If a packet is locally generated, the stack can safely assume that no
errors were introduced while being locally routed since the packet is
never sent out the wire.
Some bugs fixed:
- transport layer checksum was never calculated after NAT.
- handleLocal didn't handle routing across interfaces.
- stack didn't support forwarding across interfaces.
- always consult the routing table before creating an endpoint.
Updates #4688Fixes#3906
PiperOrigin-RevId: 340943442