444 Commits

Author SHA1 Message Date
Ian Gudger 7c1587e340 Implement IP_TTL.
Also change the default TTL to 64 to match Linux.

PiperOrigin-RevId: 273430341
2019-10-07 19:29:51 -07:00
Chris Kuiper 4874525161 Implement proper local broadcast behavior
The behavior for sending and receiving local broadcast (255.255.255.255)
traffic is as follows:

Outgoing
--------
* A broadcast packet sent on a socket that is bound to an interface goes out
  that interface
* A broadcast packet sent on an unbound socket follows the route table to
  select the outgoing interface
  + if an explicit route entry exists for 255.255.255.255/32, use that one
  + else use the default route
* Broadcast packets are looped back and delivered following the rules for
  incoming packets (see next). This is the same behavior as for multicast
  packets, except that it cannot be disabled via sockopt.

Incoming
--------
* Sockets wishing to receive broadcast packets must bind to either INADDR_ANY
  (0.0.0.0) or INADDR_BROADCAST (255.255.255.255). No other socket receives
  broadcast packets.
* Broadcast packets are multiplexed to all sockets matching them. This is the
  same behavior as for multicast packets.
* A socket can bind to 255.255.255.255:<port> and then receive its own
  broadcast packets sent to 255.255.255.255:<port>
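
As a rough illustration of the incoming rule above, the eligibility check for a bound socket might look like the following. This is only a sketch: the helper name `receivesBroadcast` is hypothetical, and it uses the Go standard library's `net` package rather than netstack's own address types.

```go
package main

import (
	"fmt"
	"net"
)

// receivesBroadcast reports whether a socket bound to bindAddr is eligible
// to receive packets sent to 255.255.255.255. Per the rules above, only
// INADDR_ANY (0.0.0.0) and INADDR_BROADCAST (255.255.255.255) qualify.
// Hypothetical helper, not part of netstack.
func receivesBroadcast(bindAddr string) bool {
	ip := net.ParseIP(bindAddr)
	if ip == nil {
		return false
	}
	return ip.Equal(net.IPv4zero) || ip.Equal(net.IPv4bcast)
}

func main() {
	for _, addr := range []string{"0.0.0.0", "255.255.255.255", "192.168.1.10"} {
		fmt.Printf("%-15s eligible=%v\n", addr, receivesBroadcast(addr))
	}
}
```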

In addition, this change implicitly fixes an issue with multicast reception:
previously, if two sockets wanted to receive a given multicast stream and one
was bound to ANY while the other was bound to the multicast address, only one
of them would receive the traffic.

PiperOrigin-RevId: 272792377
2019-10-03 19:31:35 -07:00
Bhasker Hariharan bcbb3ef317 Add a Stringer implementation to PacketDispatchMode
PiperOrigin-RevId: 272083936
2019-09-30 15:52:55 -07:00
Bhasker Hariharan 61f6fbd0ce Fix bugs in PickEphemeralPort for TCP.
Netstack always picks a random start point every time PickEphemeralPort
is called. While this is required for UDP so that DNS requests go
out through a randomized set of ports, it is not required for TCP. In fact,
Linux explicitly hashes the (srcip, dstip, dstport) and a one-time secret
initialized at start of the application to get a random offset. But to
ensure it doesn't start from the same point on every scan, it uses a static
hint that is incremented by 2 in every call to pick ephemeral ports.

The reason for 2 is that Linux seems to split the port range: active connects
seem to use even ports while odd ones are used by listening sockets.

This CL implements a similar strategy where we use a hash + hint to generate
the offset to start the search for a free Ephemeral port.

This ensures that we cycle through the available port space in order for
repeated connects to the same destination and significantly reduces the
chance of picking a recently released port.
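
The hash + hint idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from this CL: the function name `pickOffset`, the use of FNV as the hash, the port-range constants, and the way the secret is mixed in are all made up for the example.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
	"sync/atomic"
)

const (
	firstEphemeral = 16000 // assumed first port of the ephemeral range
	numEphemeral   = 49536 // assumed size of the range (16000..65535)
)

// hint is a process-wide counter that advances by 2 on every pick, so that
// repeated connects to the same destination walk through the port space.
var hint uint32

// pickOffset returns the offset within the ephemeral range at which the
// search for a free port should start. Hypothetical helper.
func pickOffset(srcIP, dstIP string, dstPort uint16, secret uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(srcIP))
	h.Write([]byte(dstIP))
	var buf [6]byte
	binary.BigEndian.PutUint16(buf[:2], dstPort)
	binary.BigEndian.PutUint32(buf[2:], secret)
	h.Write(buf[:])

	// AddUint32 returns the new value; subtract 2 to get the hint that was
	// current before this call.
	step := atomic.AddUint32(&hint, 2) - 2

	// Add in 64 bits so the sum cannot wrap before the modulo.
	return uint32((uint64(h.Sum32()) + uint64(step)) % numEphemeral)
}

func main() {
	for i := 0; i < 3; i++ {
		off := pickOffset("10.0.0.1", "10.0.0.2", 443, 0xdeadbeef)
		fmt.Println("start search at port", firstEphemeral+off)
	}
}
```

With a fixed (source, destination, port) tuple, successive calls start two ports apart, which is what makes a recently released port unlikely to be picked again immediately.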

PiperOrigin-RevId: 272058370
2019-09-30 13:55:22 -07:00
gVisor bot abbee5615f Implement SO_BINDTODEVICE sockopt
PiperOrigin-RevId: 271644926
2019-09-27 14:14:04 -07:00
Kevin Krakauer 59ccbb1044 Remove centralized registration of protocols.
Also removes the need for protocol names.

PiperOrigin-RevId: 271186030
2019-09-25 12:57:05 -07:00
Chris Kuiper 6704d625ef Return only primary addresses in Stack.NICInfo()
Non-primary addresses are used for endpoints created to accept multicast and
broadcast packets, as well as "helper" endpoints (0.0.0.0) that allow sending
packets when no proper address has been assigned yet (e.g., for DHCP). These
addresses are not real addresses from a user point of view and should not be
part of the NICInfo() value. Also see b/127321246 for more info.

This switches NICInfo() to call a new NIC.PrimaryAddresses() function. To still
allow an option to get all addresses (mostly for testing) I added
Stack.GetAllAddresses() and NIC.AllAddresses().

In addition, the return value for GetMainNICAddress() was changed for the case
where the NIC has no primary address. Instead of returning an error here,
it now returns an empty AddressWithPrefix() value. The rationale for this
change is that it is a valid case for a NIC to have no primary addresses.

Lastly, I refactored the code based on the new additions.

PiperOrigin-RevId: 270971764
2019-09-24 13:21:20 -07:00
Tamir Duberstein bbaaa1fcc2 Simplify ICMPRateLimiter
https://github.com/golang/time/commit/c4c64ca added SetBurst upstream.

PiperOrigin-RevId: 270925077
2019-09-24 09:50:51 -07:00
Andrei Vagin 03ee55cc62 netstack: convert more socket options to {Set,Get}SockOptInt
PiperOrigin-RevId: 270763208
2019-09-23 14:39:14 -07:00
Ian Gudger 002f1d4aae Allow waiting for LinkEndpoint worker goroutines to finish.
Previously, the only safe way to use an fdbased endpoint was to leak the FD.
This change makes it possible to safely close the FD.

This is the first step towards having stoppable stacks.

Updates #837

PiperOrigin-RevId: 270346582
2019-09-20 14:10:02 -07:00
Ghanan Gowripalan 60fe8719e1 Automated rollback of changelist 268047073
PiperOrigin-RevId: 269658971
2019-09-17 14:47:09 -07:00
Ian Gudger 747320a7aa Update remaining users of LinkEndpoints to not refer to them as an ID.
PiperOrigin-RevId: 269614517
2019-09-17 11:31:00 -07:00
Adin Scannell 7c6ab6a219 Implement splice methods for pipes and sockets.
This also allows the tee(2) implementation to be enabled, since dup can now be
properly supported via WriteTo.

Note that this change necessitated some minor restructuring of the
fs.FileOperations splice methods. If the *fs.File is passed through directly,
then only public API methods are accessible, which will deadlock immediately
since the locking is already done by fs.Splice. Instead, we pass through an
abstract io.Reader or io.Writer, which elide locks and use the underlying
fs.FileOperations directly.

PiperOrigin-RevId: 268805207
2019-09-12 17:43:27 -07:00
Michael Pratt df5d377521 Remove go_test from go_stateify and go_marshal
They are no-ops, so the standard rule works fine.

PiperOrigin-RevId: 268776264
2019-09-12 15:10:17 -07:00
Ghanan Gowripalan 857940d30d Automated rollback of changelist 268047073
PiperOrigin-RevId: 268757842
2019-09-12 13:52:25 -07:00
Ian Gudger 9dfcd8b09f Fix ephemeral port leak.
Fix a bug where udp.(*endpoint).Disconnect [accessible in gVisor via
epsocket.(*SocketOperations).Connect with AF_UNSPEC] would leak a port
reservation if the socket/endpoint had an ephemeral port assigned to it.

glibc's getaddrinfo uses connect with AF_UNSPEC, causing each call of
getaddrinfo to leak a port. Call getaddrinfo too many times and you run out of
ports (shows up as connect returning EAGAIN and getaddrinfo returning
EAI_NONAME "Name or service not known").

PiperOrigin-RevId: 268071160
2019-09-09 14:02:00 -07:00
Ghanan Gowripalan a8943325db Join IPv6 all-nodes and solicited-node multicast addresses where appropriate.
The IPv6 all-nodes multicast address will be joined on NIC enable, and the
appropriate IPv6 solicited-node multicast address will be joined when IPv6
addresses are added.

Tests: Test receiving packets destined to the IPv6 link-local all-nodes
multicast address and the IPv6 solicited-node address of an added IPv6 address.
PiperOrigin-RevId: 268047073
2019-09-09 12:06:06 -07:00
Ian Gudger fe1f521077 Remove redundant global tcpip.LinkEndpointID.
PiperOrigin-RevId: 267709597
2019-09-06 18:01:14 -07:00
Bhasker Hariharan 3dc3cffb2d Fix RST generation bugs.
There are a few cases addressed by this change

- We no longer generate a RST in response to a RST packet.

- When we receive a RST we cleanup and release all reservations immediately as
  the connection is now aborted.

- An ACK received by a listening socket generates a RST when SYN cookies are not
  in use. The only reason an ACK should land at the listening socket is if we
  are using SYN cookies; otherwise the goroutine for the handshake in progress
  should have gotten the packet and it should never have arrived at the
  listening endpoint.

- Also fixes the error returned when a connection times out due to a
  Keepalive timer expiration, from ECONNRESET to ETIMEDOUT.

PiperOrigin-RevId: 267238427
2019-09-04 14:59:53 -07:00
Chris Kuiper 7bf1d426d5 Handle subnet and broadcast addresses correctly with NIC.subnets
This also renames "subnet" to "addressRange" to avoid any more confusion with
an interface IP's subnet.

Lastly, this also removes the Stack.ContainsSubnet(..) API since it isn't used
by anyone. Plus the same information can be obtained from
Stack.NICAddressRanges().

PiperOrigin-RevId: 267229843
2019-09-04 14:19:32 -07:00
Ghanan Gowripalan 144127e5e1 Validate IPv6 Hop Limit field for received NDP packets
Make sure that NDP packets are only received if their IP header's hop limit
field is set to 255, as per RFC 4861.
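
A minimal sketch of such a check is shown below. The function name `acceptNDP` is hypothetical; the only facts relied on are that the hop limit is the eighth byte (index 7) of the fixed 40-byte IPv6 header and that RFC 4861 requires it to be 255, which guarantees the packet was not forwarded by a router.

```go
package main

import "fmt"

const (
	ipv6HeaderLen  = 40  // fixed IPv6 header size in bytes
	hopLimitOffset = 7   // byte index of the Hop Limit field
	ndpHopLimit    = 255 // value RFC 4861 requires on NDP packets
)

// acceptNDP reports whether an NDP packet carried in the given IPv6 header
// should be processed. Hypothetical helper, not the netstack function.
func acceptNDP(ipv6Header []byte) bool {
	if len(ipv6Header) < ipv6HeaderLen {
		return false
	}
	return ipv6Header[hopLimitOffset] == ndpHopLimit
}

func main() {
	hdr := make([]byte, ipv6HeaderLen)
	hdr[hopLimitOffset] = 255
	fmt.Println("hop limit 255:", acceptNDP(hdr))
	hdr[hopLimitOffset] = 64
	fmt.Println("hop limit 64: ", acceptNDP(hdr))
}
```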

PiperOrigin-RevId: 267061457
2019-09-03 18:43:12 -07:00
Bhasker Hariharan 3789c34b22 Make UDP traceroute work.
Adds support to generate Port Unreachable messages for UDP
datagrams received on a port for which there is no valid
endpoint.

Fixes #703

PiperOrigin-RevId: 267034418
2019-09-03 16:01:17 -07:00
Haibo Xu fa151e3971 Remove duplicated file in pkg/tcpip/link/rawfile.
blockingpoll_unsafe.go was copied to blockingpoll_noyield_unsafe.go
while merging commit 7206202bb9. If it stays there, it causes
build errors on non-amd64 platforms.

ERROR:
pkg/tcpip/link/rawfile/BUILD:5:1:
GoCompilePkg
pkg/tcpip/link/rawfile.a
failed (Exit 1) builder failed: error executing command
bazel-out/host/bin/external/go_sdk/builder compilepkg -sdk
external/go_sdk -installsuffix linux_arm64 -src
pkg/tcpip/link/rawfile/blockingpoll_noyield_unsafe.go -src ...
(remaining 33 argument(s) skipped)

Use --sandbox_debug to see verbose messages from the sandbox
compilepkg: error running subcommand: exit status 2
pkg/tcpip/link/rawfile/blockingpoll_yield_unsafe.go:35:6:
BlockingPoll redeclared in this block
        previous declaration at
pkg/tcpip/link/rawfile/blockingpoll_unsafe.go:26:78
Target //pkg/tcpip/link/rawfile:rawfile failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 25.531s, Critical Path: 21.08s
INFO: 262 processes: 262 linux-sandbox.
FAILED: Build did NOT complete successfully

Signed-off-by: Haibo Xu <haibo.xu@arm.com>
Change-Id: I4e21f82984225d0aa173de456f7a7c66053a053e
2019-09-02 02:49:41 +00:00
Chris Kuiper afbdf2f212 Fix data race accessing referencedNetworkEndpoint.kind
Wrapping "kind" into atomic access functions.
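
The general pattern of hiding a field behind atomic accessors can be sketched as below. The type and method names (`endpointRef`, `getKind`, `setKind`) and the kind values are placeholders chosen for the example and may not match the real code.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

type endpointKind int32

const (
	permanent endpointKind = iota
	temporary
)

// endpointRef stands in for referencedNetworkEndpoint. The kind field must
// only be touched through getKind and setKind.
type endpointRef struct {
	kind int32
}

func (r *endpointRef) getKind() endpointKind {
	return endpointKind(atomic.LoadInt32(&r.kind))
}

func (r *endpointRef) setKind(k endpointKind) {
	atomic.StoreInt32(&r.kind, int32(k))
}

func main() {
	ref := &endpointRef{}
	var wg sync.WaitGroup
	wg.Add(2)
	// A concurrent writer and reader no longer race on the field.
	go func() { defer wg.Done(); ref.setKind(temporary) }()
	go func() { defer wg.Done(); _ = ref.getKind() }()
	wg.Wait()
	fmt.Println("kind is temporary:", ref.getKind() == temporary)
}
```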

Fixes #789

PiperOrigin-RevId: 266485501
2019-08-30 17:23:53 -07:00
Rahat Mahmood 863e11ac4d Implement /proc/net/udp.
PiperOrigin-RevId: 266229756
2019-08-29 14:30:41 -07:00
Tamir Duberstein 24ecce5dbf Export generated linkAddrEntryEntry
PiperOrigin-RevId: 266000128
2019-08-28 14:56:33 -07:00
Tamir Duberstein 313c767b00 Populate link address cache at dispatch
This allows the stack to learn remote link addresses on incoming
packets, reducing the need to ARP to send responses.

This also reduces the number of round trips to the system clock,
since that may also prove to be performance-sensitive.

Fixes #739.

PiperOrigin-RevId: 265815816
2019-08-27 18:54:56 -07:00
Rahat Mahmood 1fdefd41c5 netstack/tcp: Add LastAck transition.
Add missing state transition to LastAck, which should happen when the
endpoint has already received a FIN from the remote side, and is
sending its own FIN.
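
In terms of the standard TCP state machine, the transition looks roughly like this. The state names and the `onSendFIN` helper are illustrative only and do not mirror netstack's identifiers.

```go
package main

import "fmt"

type state int

const (
	established state = iota
	finWait1
	closeWait
	lastAck
)

// onSendFIN returns the state an endpoint moves to when it sends its own FIN.
// Hypothetical helper covering only the two states relevant here.
func onSendFIN(s state) state {
	switch s {
	case established:
		// We are closing first: wait for the peer to ACK our FIN.
		return finWait1
	case closeWait:
		// The peer's FIN was already received, so after our FIN we only
		// wait for the final ACK.
		return lastAck
	default:
		return s
	}
}

func main() {
	fmt.Println(onSendFIN(closeWait) == lastAck)
	fmt.Println(onSendFIN(established) == finWait1)
}
```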

PiperOrigin-RevId: 265568314
2019-08-26 16:39:13 -07:00
gVisor bot 7206202bb9 Merge pull request #696 from xiaobo55x:tcpip_link
PiperOrigin-RevId: 265534854
2019-08-26 14:03:30 -07:00
Chris Kuiper ac2200b8a9 Prevent a network endpoint from sending/receiving if its address was removed
This addresses the problem where an endpoint has its address removed but still
has outstanding references held by routes used in connected TCP/UDP sockets
which prevent the removal of the endpoint.

The fix adds a new "expired" flag to the referenced network endpoint, which is
set when an endpoint has its address removed. Incoming packets are not
delivered to an expired endpoint (unless in promiscuous mode), while sending
outgoing packets triggers an error to the caller (unless in spoofing mode).
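
The two rules can be summarized in a small sketch. The struct and method names below (`endpoint`, `nic`, `canReceive`, `canSend`) are hypothetical and only model the decision described above.

```go
package main

import "fmt"

// endpoint models a referenced network endpoint whose address may have been
// removed while routes still hold references to it.
type endpoint struct {
	expired bool
}

// nic carries the two modes that override the expired check.
type nic struct {
	promiscuous bool
	spoofing    bool
}

// canReceive: incoming packets are not delivered to an expired endpoint
// unless the NIC is in promiscuous mode.
func (n *nic) canReceive(e *endpoint) bool {
	return !e.expired || n.promiscuous
}

// canSend: sending through an expired endpoint is an error unless the NIC
// is in spoofing mode.
func (n *nic) canSend(e *endpoint) bool {
	return !e.expired || n.spoofing
}

func main() {
	e := &endpoint{expired: true}
	fmt.Println((&nic{}).canReceive(e), (&nic{promiscuous: true}).canReceive(e))
	fmt.Println((&nic{}).canSend(e), (&nic{spoofing: true}).canSend(e))
}
```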

In addition, a few helper functions were added to stack_test.go to reduce
code duplication.

PiperOrigin-RevId: 265514326
2019-08-26 12:29:47 -07:00
Tamir Duberstein e75a12e89d Implement fmt.Stringer on Route by value
This is more convenient, since it implements the interface for both
value and pointer.
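
The Go rule behind this is that a method declared on the value receiver is in the method set of both `T` and `*T`. A small self-contained sketch, using a stand-in `Route` type whose fields are assumptions for the example:

```go
package main

import "fmt"

// Route is a simplified stand-in for tcpip.Route.
type Route struct {
	Destination string
	NIC         int
}

// String has a value receiver, so both Route and *Route satisfy fmt.Stringer.
func (r Route) String() string {
	return fmt.Sprintf("%s via NIC %d", r.Destination, r.NIC)
}

func main() {
	r := Route{Destination: "10.0.0.0/8", NIC: 1}
	var byValue fmt.Stringer = r
	var byPointer fmt.Stringer = &r
	fmt.Println(byValue)
	fmt.Println(byPointer)
}
```

Had `String` been declared on `*Route`, the assignment from the plain value `r` would not compile.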

PiperOrigin-RevId: 265086510
2019-08-23 10:44:11 -07:00
Chris Kuiper 8d9276ed56 Support binding to multicast and broadcast addresses
This fixes the issue of not being able to bind to either a multicast or
broadcast address as well as to send and receive data from it. The way to solve
this is to treat these addresses similar to the ANY address and register their
transport endpoint ID with the global stack's demuxer rather than the NIC's.
That way there is no need to require an endpoint with that multicast or
broadcast address. The stack's demuxer is in fact the only correct one to use,
because neither broadcast- nor multicast-bound sockets care which NIC a
packet was received on (for multicast a join is still needed to receive packets
on a NIC).

I also took the liberty of refactoring udp_test.go to consolidate a lot of
duplicate code and make it easier to create repetitive tests that test the same
feature for a variety of packet and socket types. For this purpose I created a
"flowType" that represents two things: 1) the type of packet being sent or
received and 2) the type of socket used for the test. E.g., a "multicastV4in6"
flow represents a V4-mapped multicast packet run through a V6-dual socket.

This allows writing significantly simpler tests. A nice example is testTTL().

PiperOrigin-RevId: 264766909
2019-08-21 22:54:25 -07:00
Tamir Duberstein 573e6e4bba Use tcpip.Subnet in tcpip.Route
This is the first step in replacing some of the redundant types with the
standard library equivalents.

PiperOrigin-RevId: 264706552
2019-08-21 15:31:18 -07:00
Chris Kuiper 7e79ca0225 Add tcpip.Route.String and tcpip.AddressMask.Prefix
PiperOrigin-RevId: 264544163
2019-08-20 23:28:52 -07:00
gVisor bot 3ffbdffd7e Internal change.
PiperOrigin-RevId: 264218306
2019-08-19 12:43:22 -07:00
Andrei Vagin 3e4102b2ea netstack: disconnect a unix socket only if the address family is AF_UNSPEC
Linux allows calling connect with the ANY address and port zero.

PiperOrigin-RevId: 263892534
2019-08-16 19:32:14 -07:00
Chris Kuiper f7114e0a27 Add subnet checking to NIC.findEndpoint and consolidate with NIC.getRef
This adds the same logic to NIC.findEndpoint that is already done in
NIC.getRef. Since this makes the two functions very similar they were combined
into one with the originals being wrappers.

PiperOrigin-RevId: 263864708
2019-08-16 15:58:58 -07:00
Tamir Duberstein fe74bba2bd Don't dereference errors passed to panic()
These errors are always pointers; there's no sense in dereferencing them
in the panic call. Changed one false positive for clarity.

PiperOrigin-RevId: 263611579
2019-08-15 11:58:16 -07:00
Tamir Duberstein 816a9211e9 netstack: move resumption logic into *_state.go
13a98df rearranged some of this code in a way that broke compilation of
the netstack-only export at github.com/google/netstack because
*_state.go files are not included in that export.

This commit moves resumption logic back into *_state.go, fixing the
compilation breakage.

PiperOrigin-RevId: 263601629
2019-08-15 11:13:46 -07:00
Haibo Xu 1b1e39d7a1 Enabling pkg/tcpip/link support on arm64.
Signed-off-by: Haibo Xu <haibo.xu@arm.com>
Change-Id: Ib6b4aa2db19032e58bf0395f714e6883caee460a
2019-08-15 03:19:30 +00:00
Haibo Xu 52843719ca Rename fdbased/mmap.go to fdbased/mmap_stub.go.
Signed-off-by: Haibo Xu <haibo.xu@arm.com>
Change-Id: Id4489554b9caa332695df8793d361f8332f6a13b
2019-08-15 03:19:22 +00:00
Haibo Xu 0624858593 Rename rawfile/blockingpoll_unsafe.go to rawfile/blockingpoll_stub_unsafe.go.
Signed-off-by: Haibo Xu <haibo.xu@arm.com>
Change-Id: I2376e502c1a860d5e624c8a8e3afab5da4c53022
2019-08-15 03:19:14 +00:00
Tamir Duberstein d81d94ac4c Replace uintptr with int64 when returning lengths
This is in accordance with newer parts of the standard library.

PiperOrigin-RevId: 263449916
2019-08-14 16:05:56 -07:00
Tamir Duberstein 69d1414a32 Add tcpip.AddressWithPrefix.String
PiperOrigin-RevId: 263436592
2019-08-14 15:02:14 -07:00
Bhasker Hariharan 570fb1db6b Improve SendMsg performance.
SendMsg before this change would copy all the data over into a
new slice even if the underlying socket could only accept a
small amount of data. This is really inefficient with non-blocking
sockets and under high throughput, where large writes could get
ErrWouldBlock, or if there was, say, a timeout associated with the sendmsg()
syscall.

With this change we delay copying bytes in until they are needed and only
copy what can potentially be sent/held in the socket buffer, reducing
the need to repeatedly copy data over.

Also a minor fix to change state to FIN-WAIT-1 when shutdown(..., SHUT_WR) is called
instead of when we transmit the actual FIN. Otherwise the socket could remain in
CONNECTED state even though the user has called shutdown() on the socket.

Updates #627

PiperOrigin-RevId: 263430505
2019-08-14 14:34:27 -07:00
Ian Gudger 99bf75a6dc gonet: Replace NewPacketConn with DialUDP.
This better matches the standard library and allows creating connected
PacketConns.

PiperOrigin-RevId: 263187462
2019-08-13 12:11:09 -07:00
Ian Gudger eac690e358 Fix netstack build error on non-AMD64.
This stub had the wrong function signature.

PiperOrigin-RevId: 262992682
2019-08-12 13:31:16 -07:00
Bhasker Hariharan 5a38eb120a Add congestion control states to sender.
This change just introduces different congestion control states and
ensures the sender.state is updated to reflect the current state
of the connection.

It is not used for any decisions yet but this is required before
algorithms like Eiffel/PRR can be implemented.

Fixes #394

PiperOrigin-RevId: 262638292
2019-08-09 14:50:30 -07:00
Rahat Mahmood 13a98df49e netstack: Don't start endpoint goroutines too soon on restore.
Endpoint protocol goroutines were previously started as part of
loading the endpoint. This is potentially too soon, as resources used
by these goroutines may not have been loaded. Protocol goroutines may
perform meaningful work as soon as they're started (e.g., an incoming
connect), which can cause them to indirectly access resources that
haven't been loaded yet.

This CL defers resuming all protocol goroutines until the end of
restore.

PiperOrigin-RevId: 262409429
2019-08-08 12:33:11 -07:00
Tamir Duberstein 67a3f4039d Set target address in ARP Reply
PiperOrigin-RevId: 262163794
2019-08-07 10:27:43 -07:00
Bhasker Hariharan dfbc0b0a4c Fix for a panic due to writing to a closed accept channel.
This can happen because endpoint.Close() closes the accept channel first and
then drains/resets any accepted but not delivered connections. However, there
can be connections that are connected but not yet delivered to the channel
because the channel was full. Closing the channel then causes these pending
writes to panic with a write to a closed channel.

The correct solution is to abort any connections in SYN-RCVD state and
drain/abort all completed connections before closing the accept channel.
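
The hazard and one way to avoid it can be sketched as below. Note that this sketch uses a mutex-protected `closed` flag so that late deliveries are rejected instead of written; that is an assumption made for the example and not necessarily the mechanism used in this change. The names `listener`, `deliver`, and `shutdown` are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// listener models a listening endpoint with a bounded accept channel.
type listener struct {
	mu       sync.Mutex
	closed   bool
	accepted chan int // completed connections, identified by an int here
}

// deliver tries to hand a completed connection to the accept channel. It
// returns false if the listener is closed or the backlog is full, in which
// case the caller is expected to abort the connection.
func (l *listener) deliver(conn int) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.closed {
		return false // never write to a closed channel
	}
	select {
	case l.accepted <- conn:
		return true
	default:
		return false
	}
}

// shutdown drains all completed but unaccepted connections first and only
// then closes the channel. It returns the connections that must be aborted.
func (l *listener) shutdown() (aborted []int) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.closed {
		return nil
	}
	l.closed = true
	for {
		select {
		case c := <-l.accepted:
			aborted = append(aborted, c)
		default:
			close(l.accepted)
			return aborted
		}
	}
}

func main() {
	l := &listener{accepted: make(chan int, 2)}
	fmt.Println(l.deliver(1), l.deliver(2), l.deliver(3))
	fmt.Println("aborted on shutdown:", l.shutdown())
	fmt.Println("deliver after shutdown:", l.deliver(4))
}
```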

PiperOrigin-RevId: 261951132
2019-08-06 11:01:27 -07:00
Kevin Krakauer 810cc07aab Plumbing for iptables sockopts.
PiperOrigin-RevId: 261413396
2019-08-02 16:26:48 -07:00
Rahat Mahmood 2906dffcdb Automated rollback of changelist 261191548
PiperOrigin-RevId: 261373749
2019-08-02 12:52:40 -07:00
Rahat Mahmood 79511e8a50 Implement getsockopt(TCP_INFO).
Export some readily-available fields for TCP_INFO and stub out the rest.

PiperOrigin-RevId: 261191548
2019-08-01 13:58:48 -07:00
Austin Kiekintveld 12c4eb294a Fix ICMPv4 EchoReply packet checksum
The checksum was not being reset before being re-calculated and sent out.
This caused the sent checksum to always be `0x0800`.
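
The underlying rule is that the Internet checksum must be computed with the checksum field itself set to zero. A minimal sketch of building an echo reply from an echo request follows; `checksum` and `makeEchoReply` are illustrative helpers written for this example, not netstack functions.

```go
package main

import "fmt"

// checksum computes the 16-bit one's-complement Internet checksum of b.
func checksum(b []byte) uint16 {
	var sum uint32
	for i := 0; i+1 < len(b); i += 2 {
		sum += uint32(b[i])<<8 | uint32(b[i+1])
	}
	if len(b)%2 == 1 {
		sum += uint32(b[len(b)-1]) << 8 // pad the odd trailing byte
	}
	for sum>>16 != 0 {
		sum = sum&0xffff + sum>>16 // fold carries back in
	}
	return ^uint16(sum)
}

// makeEchoReply turns an ICMPv4 echo request into an echo reply. The
// checksum field (bytes 2 and 3) is reset to zero before recalculating.
func makeEchoReply(req []byte) []byte {
	if len(req) < 8 {
		return nil // shorter than an ICMP echo header
	}
	reply := append([]byte(nil), req...)
	reply[0] = 0              // type 0: echo reply
	reply[2], reply[3] = 0, 0 // reset the stale checksum first
	c := checksum(reply)
	reply[2], reply[3] = byte(c>>8), byte(c)
	return reply
}

func main() {
	// Echo request: type 8, code 0, stale checksum, id 1, seq 2, payload "hi".
	req := []byte{8, 0, 0x12, 0x34, 0, 1, 0, 2, 'h', 'i'}
	reply := makeEchoReply(req)
	fmt.Printf("checksum field: %#04x\n", uint16(reply[2])<<8|uint16(reply[3]))
	fmt.Println("packet verifies:", checksum(reply) == 0)
}
```

A receiver validates the packet by running the same sum over the whole message, checksum field included; a correct packet yields zero.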

Fixes #605.

PiperOrigin-RevId: 260965059
2019-07-31 11:26:41 -07:00
Tamir Duberstein c6e6d92cb1 Test connecting UDP sockets to the ANY address
This doesn't currently pass on gVisor.

While I'm here, fix a bug where connecting to the v6-mapped v4 address doesn't
work in gVisor.

PiperOrigin-RevId: 260923961
2019-07-31 07:41:20 -07:00
Tamir Duberstein 7369c63e42 Pass ProtocolAddress instead of its fields
PiperOrigin-RevId: 260803517
2019-07-30 15:06:39 -07:00
Haibo Xu 1decf76471 Change syscall.POLL to syscall.PPOLL.
syscall.POLL is not supported on arm64; use syscall.PPOLL
to support both x86 and arm64. Refs #63

Signed-off-by: Haibo Xu <haibo.xu@arm.com>
Change-Id: I2c81a063d3ec4e7e6b38fe62f17a0924977f505e
COPYBARA_INTEGRATE_REVIEW=https://github.com/google/gvisor/pull/543 from xiaobo55x:master ba598263fd3748d1addd48e4194080aa12085164
PiperOrigin-RevId: 260752049
2019-07-30 11:01:29 -07:00
Chris Kuiper 40e682759f Add support for a subnet prefix length on interface network addresses
This allows the user code to add a network address with a subnet prefix length.
The prefix length value is stored in the network endpoint and provided back to
the user in the ProtocolAddress type.

PiperOrigin-RevId: 259807693
2019-07-24 13:42:14 -07:00
Tamir Duberstein 12c256568b Deduplicate EndpointState.connected some
This fixes a bug introduced in cl/251934850 that caused
connect-accept-close-connect races to result in the second connect call
failing when it should have succeeded.

PiperOrigin-RevId: 259584525
2019-07-23 12:10:18 -07:00
Chris Kuiper 0e040ba6e8 Handle interfaceAddr and NIC options separately for IP_MULTICAST_IF
This tweaks the handling code for IP_MULTICAST_IF to ignore the InterfaceAddr
if a NICID is given.

PiperOrigin-RevId: 258982541
2019-07-19 09:29:04 -07:00
Andrei Vagin eefa817cfd net/tcp/setsockopt: implement setsockopt(fd, SOL_TCP, TCP_INQ)
PiperOrigin-RevId: 258859507
2019-07-18 15:41:04 -07:00
gVisor bot 74dc663bbb Internal change.
PiperOrigin-RevId: 258424489
2019-07-16 13:03:37 -07:00
Kevin Krakauer 9b4d3280e1 Add IPPROTO_RAW, which allows raw sockets to write IP headers.
iptables also relies on IPPROTO_RAW in a way. It opens such a socket to
manipulate the kernel's tables, but it doesn't actually use any of the
functionality. Blegh.

PiperOrigin-RevId: 257903078
2019-07-12 18:09:12 -07:00
Tamir Duberstein 17bab652af Check that IP headers contain correct version
PiperOrigin-RevId: 257888338
2019-07-12 16:19:18 -07:00
Bhasker Hariharan 6116473b2f Stub out support for TCP_MAXSEG.
Adds support to set/get the TCP_MAXSEG value but does not
really change the segment sizes emitted by netstack or
alter the MSS advertised by the endpoint. This is currently
being added only to unblock iperf3 on gVisor. Plumbing
this correctly requires a bit more work which will come
in separate CLs.

PiperOrigin-RevId: 257859112
2019-07-12 13:35:17 -07:00
Andrei Vagin 116cac053e netstack/udp: connect with the AF_UNSPEC address family means disconnect
PiperOrigin-RevId: 256433283
2019-07-03 14:19:02 -07:00
gVisor bot d60ae0ddee Merge pull request #279 from kevinGC:iptables-1-pkg
PiperOrigin-RevId: 256231055
2019-07-02 13:48:06 -07:00
Michael Pratt 5b41ba5d0e Fix various spelling issues in the documentation
Addresses obvious typos, in the documentation only.

COPYBARA_INTEGRATE_REVIEW=https://github.com/google/gvisor/pull/443 from Pixep:fix/documentation-spelling 4d0688164eafaf0b3010e5f4824b35d1e7176d65
PiperOrigin-RevId: 255477779
2019-06-27 14:25:50 -07:00
Bhasker Hariharan c1761378a9 Fix the logic for sending zero window updates.
Today we have the logic split in two places between endpoint Read() and the
worker goroutine which actually sends a zero window. This change makes it so
that when a zero window ACK is sent we set a flag in the endpoint which can be
read by the endpoint to decide if it should notify the worker to send a
nonZeroWindow update.

The worker now does not do the check again but instead sends an ACK and flips
the flag right away.

Similarly today when SO_RECVBUF is set the SetSockOpt call has logic
to decide if a zero window update is required. Rather than do that we move
the logic to the worker goroutine and it can check the zeroWindow flag
and send an update if required.

PiperOrigin-RevId: 254505447
2019-06-21 18:31:31 -07:00
Brad Burlage ae4ef32b8c Deflake TestSimpleReceive failures due to timeouts
This test will occasionally fail waiting to read a packet. From repeated runs,
I've seen it take up to 1.5s for waitForPackets to complete.

PiperOrigin-RevId: 254484627
2019-06-21 15:56:12 -07:00
Bhasker Hariharan 3d71c627fa Add support for TCP receive buffer auto tuning.
The implementation is similar to Linux, where we track the number of bytes
consumed by the application to grow the receive buffer of a given TCP endpoint.

This ensures that the advertised window grows at a reasonable rate to accommodate
the sender's rate and prevents large amounts of data being held in stack
buffers if the application is not actively reading or not reading fast enough.

The original paper that was used to implement the Linux receive buffer
auto-tuning is available at https://public.lanl.gov/radiant/pubs/drs/lacsi2001.pdf

NOTE: Linux does not implement DRS as defined in that paper, it's just a good
reference to understand the solution space.

Updates #230

PiperOrigin-RevId: 253168283
2019-06-13 22:28:01 -07:00
Adin Scannell add40fd6ad Update canonical repository.
This can be merged after:
https://github.com/google/gvisor-website/pull/77
  or
https://github.com/google/gvisor-website/pull/78

PiperOrigin-RevId: 253132620
2019-06-13 16:50:15 -07:00
Adin Scannell e352f46478 Minor BUILD file cleanup.
PiperOrigin-RevId: 252918338
2019-06-12 15:59:46 -07:00
Kevin Krakauer 0bbbcafd68 Merge branch 'master' into iptables-1-pkg
Change-Id: I7457a11de4725e1bf3811420c505d225b1cb6943
2019-06-12 15:21:22 -07:00
Bhasker Hariharan 70578806e8 Add support for TCP_CONGESTION socket option.
This CL also cleans up the error returned for setting congestion
control which was incorrectly returning EINVAL instead of ENOENT.

PiperOrigin-RevId: 252889093
2019-06-12 13:35:50 -07:00
Bhasker Hariharan 3933dd5c04 Fixes to listen backlog handling.
Changes netstack to conform to current Linux behaviour where if the backlog is
full then we drop the SYN and do not send a SYN-ACK. Similarly we allow up to
backlog connections to be in SYN-RCVD state as long as the backlog is not full.

We also now drop a SYN if syn cookies are in use and the backlog for the
listening endpoint is full.

Added new tests to confirm the behaviour.

Also reverted the change to increase the backlog in TcpPortReuseMultiThread
syscall test.

Fixes #236

PiperOrigin-RevId: 252500462
2019-06-10 15:40:44 -07:00
Kevin Krakauer 06a83df533 Address more comments.
Change-Id: I83ae1079f3dcba6b018f59ab7898decab5c211d2
2019-06-10 12:43:54 -07:00
Kevin Krakauer 8afbd974da Address Ian's comments.
Change-Id: I7445033b1970cbba3f2ed0682fe520dce02d8fad
2019-06-07 12:54:53 -07:00
Rahat Mahmood 2d2831e354 Track and export socket state.
This is necessary for implementing network diagnostic interfaces like
/proc/net/{tcp,udp,unix} and sock_diag(7).

For pass-through endpoints such as hostinet, we obtain the socket
state from the backend. For netstack, we add explicit tracking of TCP
states.

PiperOrigin-RevId: 251934850
2019-06-06 15:04:47 -07:00
Bhasker Hariharan 85be01b42d Add multi-fd support to fdbased endpoint.
This allows an fdbased endpoint to have multiple underlying fd's from which
packets can be read and dispatched/written to.

This should allow for higher throughput as well as better scalability of the
network stack as number of connections increases.

Updates #231

PiperOrigin-RevId: 251852825
2019-06-06 08:07:02 -07:00
Andrei Vagin 79f7cb6c1c netstack/sniffer: log GSO attributes
PiperOrigin-RevId: 251788534
2019-06-05 22:51:53 -07:00
Andrei Vagin a12848ffeb netstack/tcp: fix calculating a number of outstanding packets
In case of GSO, a segment can contain more than one packet
and we need to use the pCount() helper to get a number of packets.
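
The body of pCount() is not shown here, so the following is only a guess at the kind of calculation involved: a segment larger than one MSS counts as several packets. The function name `packetCount` and its treatment of empty segments are assumptions.

```go
package main

import "fmt"

// packetCount returns how many MSS-sized packets a segment with payloadLen
// bytes of data occupies on the wire. A segment without payload (for
// example a pure ACK) is counted as one packet. Hypothetical helper.
func packetCount(payloadLen, mss int) int {
	if payloadLen <= 0 || mss <= 0 {
		return 1
	}
	return (payloadLen + mss - 1) / mss // ceiling division
}

func main() {
	for _, n := range []int{0, 1, 1460, 1461, 4380} {
		fmt.Printf("payload %4d bytes -> %d packet(s)\n", n, packetCount(n, 1460))
	}
}
```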

PiperOrigin-RevId: 251743020
2019-06-05 16:30:45 -07:00
Chris Kuiper d18bb4f38a Adjust route when looping multicast packets
Multicast packets are special in that their destination address does not
identify a specific interface. When sending out such a packet the multicast
address is the remote address, but for incoming packets it is the local
address. Hence, when looping a multicast packet, the route needs to be
tweaked to reflect this.

PiperOrigin-RevId: 251739298
2019-06-05 16:08:29 -07:00
Bhasker Hariharan e0fb921205 Fix data race in synRcvdState.
When checking the length of the acceptedChan we should hold the
endpoint mutex otherwise a syn received while the listening socket
is being closed can result in a data race where the cleanupLocked
routine sets acceptedChan to nil while a handshake goroutine
in progress could try and check it at the same time.

PiperOrigin-RevId: 251537697
2019-06-04 16:17:24 -07:00
Bhasker Hariharan bfe3220992 Delete debug log lines left by mistake.
Updates #236

PiperOrigin-RevId: 251337915
2019-06-03 17:00:18 -07:00
Bhasker Hariharan 3577a4f691 Disable certain tests that are flaky under race detector.
PiperOrigin-RevId: 250976665
2019-05-31 16:19:49 -07:00
Bhasker Hariharan 033f96cc93 Change segment queue limit to be of fixed size.
Netstack sets the unprocessed segment queue size to match the receive
buffer size. This is not required as this queue only needs to hold enough
for a short duration before the endpoint goroutine can process it.

Updates #230

PiperOrigin-RevId: 250976323
2019-05-31 16:17:33 -07:00
Kevin Krakauer d58eb9ce82 Add basic iptables structures to netstack.
Change-Id: Ib589906175a59dae315405a28f2d7f525ff8877f
2019-05-31 16:14:04 -07:00
Fabricio Voznika 38de91b028 Add build guard to files using go:linkname
Function signatures are not validated during compilation. Since
they are not exported, they can change at any time. The guard
ensures that they are verified at least on every version upgrade.

PiperOrigin-RevId: 250733742
2019-05-30 12:09:39 -07:00
Bhasker Hariharan ae26b2c425 Fixes to TCP listen behavior.
Netstack listen loop can get stuck if cookies are in use and the app is slow to
accept incoming connections. Further, we continue to complete the handshake for a
connection even if the backlog is full. This creates a problem when lots of
connections come in rapidly and we end up with lots of completed connections
just hanging around to be delivered.

These fixes change netstack behaviour to mirror what Linux does, as described
in the following article:

http://veithen.io/2014/01/01/how-tcp-backlog-works-in-linux.html

Now when cookies are not in-use Netstack will silently drop the ACK to a SYN-ACK
and not complete the handshake if the backlog is full.  This will result in the
connection staying in a half-complete state. Eventually the sender will
retransmit the ACK and if backlog has space we will transition to a connected
state and deliver the endpoint.

Similarly when cookies are in use we do not try and create an endpoint unless
there is space in the accept queue to accept the newly created endpoint. If
there is no space then we again silently drop the ACK as we can just recreate it
when the ACK is retransmitted by the peer.

We also now use the backlog to cap the size of the SYN-RCVD queue for a given
endpoint. So at any time there can be N connections in the backlog and N in a
SYN-RCVD state if the application is not accepting connections. Any new SYNs
will be dropped.
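
The accounting described in this paragraph can be modelled with two counters capped by the same backlog value. This is a simplified model with hypothetical names (`listenQueue`, `onSYN`, `onFinalACK`, `onAccept`); it ignores SYN cookies and retransmission timers.

```go
package main

import "fmt"

// listenQueue tracks connections for one listening endpoint.
type listenQueue struct {
	backlog  int // value passed to listen()
	synRcvd  int // handshakes in progress
	accepted int // completed connections waiting for accept()
}

// onSYN reports whether a new SYN is admitted into the SYN-RCVD queue.
func (q *listenQueue) onSYN() bool {
	if q.synRcvd >= q.backlog {
		return false // drop the SYN
	}
	q.synRcvd++
	return true
}

// onFinalACK reports whether the ACK completing a handshake is processed.
// If the accept queue is full the ACK is silently dropped and the
// connection stays in SYN-RCVD until the peer retransmits.
func (q *listenQueue) onFinalACK() bool {
	if q.synRcvd == 0 || q.accepted >= q.backlog {
		return false
	}
	q.synRcvd--
	q.accepted++
	return true
}

// onAccept models the application accepting one connection.
func (q *listenQueue) onAccept() bool {
	if q.accepted == 0 {
		return false
	}
	q.accepted--
	return true
}

func main() {
	q := &listenQueue{backlog: 2}
	fmt.Println(q.onSYN(), q.onSYN(), q.onSYN())
	fmt.Println(q.onFinalACK(), q.onFinalACK())
	fmt.Println(q.onSYN(), q.onSYN(), q.onFinalACK())
	fmt.Printf("%+v\n", *q)
}
```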

This CL also fixes another small bug where we mark a new endpoint which has not
completed handshake as connected. We should wait till handshake successfully
completes before marking it connected.

Updates #236

PiperOrigin-RevId: 250717817
2019-05-30 12:08:41 -07:00
Tamir Duberstein e4b395db49 Remove unused wakers
These wakers are uselessly allocated and passed around; nothing ever
listens for notifications on them. The code here appears to be
vestigial, so removing it and allowing a nil waker to be passed seems
appropriate.

PiperOrigin-RevId: 249879320
Change-Id: Icd209fb77cc0dd4e5c49d7a9f2adc32bf88b4b71
2019-05-24 12:29:14 -07:00
Kevin Krakauer c1cdf18e7b UDP and TCP raw socket support.
PiperOrigin-RevId: 249511348
Change-Id: I34539092cc85032d9473ff4dd308fc29dc9bfd6b
2019-05-22 13:45:15 -07:00
Bhasker Hariharan 2ac0aeeb42 Refactor fdbased endpoint dispatcher code.
This is in preparation to support an fdbased endpoint that can read/dispatch
packets from multiple underlying fds.

Updates #231

PiperOrigin-RevId: 249337074
Change-Id: Id7d375186cffcf55ae5e38986e7d605a96916d35
2019-05-21 15:24:25 -07:00
Nicolas Lacasse bfd9f75ba4 Set the FilesystemType in MountSource from the Filesystem.
And stop storing the Filesystem in the MountSource.

This allows us to decouple the MountSource filesystem type from the name of the
filesystem.

PiperOrigin-RevId: 247292982
Change-Id: I49cbcce3c17883b7aa918ba76203dfd6d1b03cc8
2019-05-08 14:35:06 -07:00
Googler cbf6ab9697 Check GSO for nil in WritePacket
Testing:
Unit tests added
PiperOrigin-RevId: 247096269
Change-Id: I849c010eadcb53caf45896a15ef38162d66a9568
2019-05-07 14:57:03 -07:00
Ian Gudger 20862f0db2 Add gonet.DialContextTCP.
Allows cancellation and timeouts.

PiperOrigin-RevId: 247090428
Change-Id: I91907f12e218677dcd0e0b6d72819deedbd9f20c
2019-05-07 14:27:36 -07:00
Kevin Krakauer ff8ed5e6a5 Fix raw socket behavior and tests.
Some behavior was broken due to the difficulty of running automated raw
socket tests.

Change-Id: I152ca53916bb24a0208f2dc1c4f5bc87f4724ff6
PiperOrigin-RevId: 246747067
2019-05-05 16:07:25 -07:00
Ian Gudger b4a9f18687 Update tcpip Clock description.
The tcpip.Clock comment stated that times provided by it should not be used for
netstack internal timekeeping. This comment was from before the interface
supported monotonic times. The monotonic times that it provides are now the
preferred time source for netstack internal timekeeping.

PiperOrigin-RevId: 246618772
Change-Id: I853b720e3d719b03fabd6156d2431da05d354bda
2019-05-03 21:01:42 -07:00
Googler f2699b76c8 Support IPv4 fragmentation in netstack
Testing:
Unit tests and also large ping in Fuchsia OS
PiperOrigin-RevId: 246563592
Change-Id: Ia12ab619f64f4be2c8d346ce81341a91724aef95
2019-05-03 13:30:35 -07:00
Tamir Duberstein 0e1cc476db Fix transport/raw copybara export
- include packet_list.go
- exclude state.go (by renaming to include an underscore)

Also rename raw.go to endpoint.go for consistency.

PiperOrigin-RevId: 246547912
Change-Id: I19c8331c794ba683a940cc96a8be6497b53ff24d
2019-05-03 11:52:59 -07:00
Bhasker Hariharan 458fe955a7 Implement support for SACK based recovery(RFC 6675).
PiperOrigin-RevId: 246536003
Change-Id: I118b745f45040be9c70cb6a1028acdb06c78d8c9
2019-05-03 10:51:18 -07:00
Chris Kuiper 2d8e90b311 Proper cleanup of sockets that used REUSEPORT
Fixed a small logic error that broke proper accounting of MultiPortEndpoints.

PiperOrigin-RevId: 246502126
Change-Id: I1a7d6ea134f811612e545676212899a3707bc2c2
2019-05-03 07:02:51 -07:00
Chris Kuiper 8972e47a2e Support reception of multicast data on more than one socket
This requires two changes:
1) Support for more than one socket to join a given multicast group.

2) Duplicate delivery of incoming multicast packets to all sockets listening
for it.

In addition, I tweaked the code (and added a test) to disallow duplicate
IP_ADD_MEMBERSHIP calls for the same group and NIC. This is how Linux does
it.

PiperOrigin-RevId: 246437315
Change-Id: Icad8300b4a8c3f501d9b4cd283bd3beabef88b72
2019-05-02 19:41:00 -07:00
Michael Pratt 4d52a55201 Change copyright notice to "The gVisor Authors"
Based on the guidelines at
https://opensource.google.com/docs/releasing/authors/.

1. $ rg -l "Google LLC" | xargs sed -i 's/Google LLC.*/The gVisor Authors./'
2. Manual fixup of "Google Inc" references.
3. Add AUTHORS file. Authors may request to be added to this file.
4. Point netstack AUTHORS to gVisor AUTHORS. Drop CONTRIBUTORS.

Fixes #209

PiperOrigin-RevId: 245823212
Change-Id: I64530b24ad021a7d683137459cafc510f5ee1de9
2019-04-29 14:26:23 -07:00
Nicolas Lacasse f4ce43e1f4 Allow and document bug ids in gVisor codebase.
PiperOrigin-RevId: 245818639
Change-Id: I03703ef0fb9b6675955637b9fe2776204c545789
2019-04-29 14:04:14 -07:00
Ben Burkert 66bca6fc22 tcpip/adapters/gonet: add CloseRead & CloseWrite methods to Conn
Add the CloseRead & CloseWrite methods that perform shutdown on the
corresponding Read & Write sides of a connection.

Change-Id: I3996a2abdc7cd68a2becba44dc4bd9f0919d2ce1
PiperOrigin-RevId: 245537950
2019-04-26 22:46:45 -07:00
Kevin Krakauer 43dff57b87 Make raw sockets a toggleable feature disabled by default.
PiperOrigin-RevId: 245511019
Change-Id: Ia9562a301b46458988a6a1f0bbd5f07cbfcb0615
2019-04-26 16:51:46 -07:00
Bhasker Hariharan 228dc15fd1 Bump the AF_PACKET socket rcv buf size to 4MB by default.
Packet socket receive buffers default to the sysctl value of
net.core.rmem_default and are capped by net.core.rmem_max, both of
which are usually set to 208KB on most systems.

Since we can't expect every gVisor user to bump these we use
SO_RCVBUFFORCE to exceed the limit. This is possible as runsc runs
with CAP_NET_ADMIN outside the sandbox and can do this before
the FD is passed to the sentry inside the sandbox.

Updates #211

iperf output w/ 4MB buffer.

 iperf3 -c 172.17.0.2 -t 100
 Connecting to host 172.17.0.2, port 5201
 [  4] local 172.17.0.1 port 40378 connected to 172.17.0.2 port 5201
 [ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
 [  4]   0.00-1.00   sec  1.15 GBytes  9.89 Gbits/sec    0   1.02 MBytes
 [  4]   1.00-2.00   sec  1.18 GBytes  10.2 Gbits/sec    0   1.02 MBytes
 [  4]   2.00-3.00   sec   965 MBytes  8.09 Gbits/sec    0   1.02 MBytes
 [  4]   3.00-4.00   sec   942 MBytes  7.90 Gbits/sec    0   1.02 MBytes
 [  4]   4.00-5.00   sec   952 MBytes  7.99 Gbits/sec    0   1.02 MBytes
 [  4]   5.00-6.00   sec  1.14 GBytes  9.81 Gbits/sec    0   1.02 MBytes
 [  4]   6.00-7.00   sec  1.13 GBytes  9.68 Gbits/sec    0   1.02 MBytes
 [  4]   7.00-8.00   sec   930 MBytes  7.80 Gbits/sec    0   1.02 MBytes
 [  4]   8.00-9.00   sec  1.15 GBytes  9.91 Gbits/sec    0   1.02 MBytes
 [  4]   9.00-10.00  sec   938 MBytes  7.87 Gbits/sec    0   1.02 MBytes
 [  4]  10.00-11.00  sec   737 MBytes  6.18 Gbits/sec    0   1.02 MBytes
 [  4]  11.00-12.00  sec  1.16 GBytes  9.93 Gbits/sec    0   1.02 MBytes
 [  4]  12.00-13.00  sec   917 MBytes  7.69 Gbits/sec    0   1.02 MBytes
 [  4]  13.00-14.00  sec  1.19 GBytes  10.2 Gbits/sec    0   1.02 MBytes
 [  4]  14.00-15.00  sec  1.01 GBytes  8.70 Gbits/sec    0   1.02 MBytes
 [  4]  15.00-16.00  sec  1.20 GBytes  10.3 Gbits/sec    0   1.02 MBytes
 [  4]  16.00-17.00  sec  1.14 GBytes  9.80 Gbits/sec    0   1.02 MBytes
 ^C[  4]  17.00-17.60  sec   718 MBytes  10.1 Gbits/sec    0   1.02 MBytes
 - - - - - - - - - - - - - - - - - - - - - - - - -
 [ ID] Interval           Transfer     Bandwidth       Retr
 [  4]   0.00-17.60  sec  18.4 GBytes  8.98 Gbits/sec    0             sender
 [  4]   0.00-17.60  sec  0.00 Bytes  0.00 bits/sec                  receiver

PiperOrigin-RevId: 245470590
Change-Id: I1c08c5ee8345de6ac070513656a4703312dc3c00
2019-04-26 12:52:02 -07:00
Bhasker Hariharan 56cadcac4e Fixes to PacketMMap dispatcher.
This CL fixes the following bugs:

- Uses atomic to set/read status instead of binary.LittleEndian.PutUint32 etc
which are not atomic.

- Increments ringOffsets for frames that are truncated (i.e status is
  tpStatusCopy)

- Does not ignore frames with tpStatusLost bit set as they are valid frames and
  only indicate that some frames were lost before this one and metrics can
  be retrieved with a getsockopt call.

- Adds checks to make sure blockSize is a multiple of page size. This is
  required as the kernel allocates in pages per block and rejects sizes that are
  not page aligned with an EINVAL.
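
The page-alignment check in the last item could look roughly like the sketch below. The helper name `validateBlockSize` is hypothetical and is not taken from the change itself; it only illustrates the "multiple of page size" rule.

```go
package main

import (
	"fmt"
	"os"
)

// validateBlockSize is a hypothetical helper: it rejects ring block sizes
// that are not a positive multiple of the page size, mirroring the condition
// under which the kernel would return EINVAL. pageSize must be > 0.
func validateBlockSize(blockSize, pageSize int) error {
	if blockSize <= 0 || blockSize%pageSize != 0 {
		return fmt.Errorf("block size %d is not a positive multiple of page size %d", blockSize, pageSize)
	}
	return nil
}

func main() {
	page := os.Getpagesize()
	fmt.Println(validateBlockSize(16*page, page))   // aligned: no error
	fmt.Println(validateBlockSize(16*page+1, page)) // unaligned: error
}
```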

Updates #210

PiperOrigin-RevId: 244959464
Change-Id: I5d61337b7e4c0f8a3063dcfc07791d4c4521ba1f
2019-04-23 17:47:56 -07:00
Ben Burkert 56927e5317 tcpip/transport/tcp: read side only shutdown of an endpoint
Support shutdown on only the read side of an endpoint. Reads performed
after a call to Shutdown with only the ShutdownRead flag will return
ErrClosedForReceive without data.

Break out the shutdown(2) with SHUT_RD syscall test into two tests.
The first tests that no packets are sent when shutting down the read
side of a socket. The second tests that, after shutting down the read
side of a socket, unread data can still be read, or an EOF if there is
no more data to read.

Change-Id: I9d7c0a06937909cbb466b7591544a4bcaebb11ce
PiperOrigin-RevId: 244459430
2019-04-19 19:29:05 -07:00
Ben Burkert cec2cdc12f tcpip/transport/udp: add Forwarder type
Add a UDP forwarder for intercepting and forwarding UDP sessions.

Change-Id: I2d83c900c1931adfc59a532dd4f6b33a0db406c9
PiperOrigin-RevId: 244293576
2019-04-18 17:49:57 -07:00
Andrei Vagin 4524790ff6 netstack: use a proper network protocol to set gso.L3HdrLen
It is possible to create a listening socket which will accept
IPv4 and IPv6 connections. In this case, we set IPv6ProtocolNumber
for all accepted endpoints, even if they handle IPv4 connections.

This means that we can't use endpoint.netProto to set gso.L3HdrLen.

PiperOrigin-RevId: 244227948
Change-Id: I5e1863596cb9f3d216febacdb7dc75651882eef1
2019-04-18 11:42:23 -07:00
Fabricio Voznika 9f8c89fc7f Return error from fdbased.New
RELNOTES: n/a
PiperOrigin-RevId: 244031742
Change-Id: Id0cdb73194018fb5979e67b58510ead19b5a2b81
2019-04-17 11:16:35 -07:00
Bhasker Hariharan eaac2806ff Add TCP checksum verification.
PiperOrigin-RevId: 242704699
Change-Id: I87db368ca343b3b4bf4f969b17d3aa4ce2f8bd4f
2019-04-09 11:23:47 -07:00
Kevin Krakauer 52a51a8e20 Add a raw socket transport endpoint and use it for raw ICMP sockets.
Having raw socket code together will make it easier to add support for other raw
network protocols. Currently, only ICMP uses the raw endpoint. However, adding
support for other protocols such as UDP shouldn't be much more difficult than
adding a few switch cases.

PiperOrigin-RevId: 241564875
Change-Id: I77e03adafe4ce0fd29ba2d5dfdc547d2ae8f25bf
2019-04-02 11:13:49 -07:00
Bhasker Hariharan 45c54b1f4e Fix incorrect checksums in TCP and UDP tests.
PiperOrigin-RevId: 241025361
Change-Id: I292e7aea9a4b294b11e4f736e107010d9524586b
2019-03-29 12:05:43 -07:00
Bhasker Hariharan cc0e96a4bd Fix Panic in SACKScoreboard.Delete.
The panic was caused by modifying the tree while iterating which invalidated the
iterator.

Also fixes another bug in SACKScoreboard.Insert() which was causing blocks to be
merged incorrectly.

PiperOrigin-RevId: 240895053
Change-Id: Ia72b8244297962df5c04283346da5226434740af
2019-03-28 18:18:39 -07:00
Bert Muthalaly f2e5dcf21c Add ICMP stats
PiperOrigin-RevId: 240848882
Change-Id: I23dd4599f073263437aeab357c3f767e1a432b82
2019-03-28 14:09:20 -07:00
Andrei Vagin f4105ac21a netstack/fdbased: add generic segmentation offload (GSO) support
The linux packet socket can handle GSO packets, so we can segment packets to
64K instead of the MTU which is usually 1500.

Here are numbers for the nginx-1m test:
runsc:		579330.01 [Kbytes/sec] received
runsc-gso:	1794121.66 [Kbytes/sec] received
runc:		2122139.06 [Kbytes/sec] received

and for tcp_benchmark:

$ tcp_benchmark  --duration 15   --ideal
[  4]  0.0-15.0 sec  86647 MBytes  48456 Mbits/sec

$ tcp_benchmark --client --duration 15   --ideal
[  4]  0.0-15.0 sec  2173 MBytes  1214 Mbits/sec

$ tcp_benchmark --client --duration 15   --ideal --gso 65536
[  4]  0.0-15.0 sec  19357 MBytes  10825 Mbits/sec

PiperOrigin-RevId: 240809103
Change-Id: I2637f104db28b5d4c64e1e766c610162a195775a
2019-03-28 11:03:41 -07:00
Tamir Duberstein 8406504817 Avoid mutating memory passed to DeliverTransportPacket
PiperOrigin-RevId: 240642903
Change-Id: I16625015123a827d267d60b328a202057264bbd6
2019-03-27 14:36:57 -07:00
Tamir Duberstein 9c20a88bd7 Remove polling from ICMP test
PiperOrigin-RevId: 240483396
Change-Id: Ie75d3ae38af83f1d92f167ff9ba58fa10f5b372b
2019-03-26 20:20:52 -07:00
Andrei Vagin 654e878abb netstack: Don't exclude length when a pseudo-header checksum is calculated
This is a preparation for GSO changes (cl/234508902).

RELNOTES[gofers]: Refactor checksum code to include length, which
it already did, but in a convoluted way. Should be a no-op.

PiperOrigin-RevId: 240460794
Change-Id: I537381bc670b5a9f5d70a87aa3eb7252e8f5ace2
2019-03-26 17:15:13 -07:00
Tamir Duberstein 9cd2b66f10 Remove echoReplier
Mirror the ICMPv6 echo implementation in ICMPv4 echo. This removes
unnecessary asynchrony, reduces copying, and reduces complexity.

PiperOrigin-RevId: 240394525
Change-Id: If8f53254154f86772f5e51159765aa23b3b328b8
2019-03-26 11:45:01 -07:00
Andrei Vagin 9f4e1cb797 netstack: adjust the sequence number after trimming the packet
PiperOrigin-RevId: 239417224
Change-Id: I14a9adc31a6330a79a6156c105969cd5f1f63d20
2019-03-20 09:58:10 -07:00
Andrei Vagin 87cce0ec08 netstack: reduce MSS from SYN to account for TCP options
See: https://tools.ietf.org/html/rfc6691#section-2
PiperOrigin-RevId: 239305632
Change-Id: Ie8eb912a43332e6490045dc95570709c5b81855e
2019-03-19 17:33:20 -07:00
Bert Muthalaly 928809fa7d Add layer 2 stats (tx, rx) X (packets, bytes) to netstack
PiperOrigin-RevId: 239194420
Change-Id: Ie193e8ac2b7a6db21195ac85824a335930483971
2019-03-19 08:30:43 -07:00
Tamir Duberstein 5496be7c5d Remove duplicate TCP flag definitions
PiperOrigin-RevId: 238467634
Change-Id: If4cd8efff7386fbee1195f051d15549b495910a9
2019-03-14 10:19:21 -07:00
Kevin Krakauer f97c4f1b7a Remove unused function.
PiperOrigin-RevId: 238336475
Change-Id: I8131e04699028246ebc233953ebb3feca5673940
2019-03-13 16:40:10 -07:00
Fabricio Voznika 70d0613444 Reduce PACKET_RX_RING memory usage
Previous memory allocation was excessive (80 MB). Changed
it to use 2 MB instead. There is no drop in performance due
to this change:

ab -n 100 -c 10 http://server/latin10m.txt  ==> 10 MB file
80 MB: 178 MB/s
 2 MB: 181 MB/s

PiperOrigin-RevId: 238321594
Change-Id: I1c8aed13cad5d75f4506d2b406b305117055fbe5
2019-03-13 15:25:13 -07:00
Noah Gold 8003bd6a5c Make gonet.PacketConn implement net.Conn.
gonet.PacketConn now implements net.Conn, allowing it to be returned from
net.Dial.Dialer functions.

PiperOrigin-RevId: 238111980
Change-Id: I174884385ff4d9b8e9918fac7bbb5b93ca366ba7
2019-03-12 15:36:33 -07:00
Ian Gudger a16f6e50c5 Make HandleLocal apply to all non-loopback interfaces.
HandleLocal is very similar conceptually to MULTICAST_LOOP, so we can unify
the implementations. This has the benefit of making HandleLocal apply even when
the fdbased link endpoint isn't in use.

In addition, move looping logic to route creation so that it doesn't need to be
run for each packet. This should improve performance.

PiperOrigin-RevId: 238099480
Change-Id: I72839f16f25310471453bc9d3fb8544815b25c23
2019-03-12 14:37:56 -07:00
Ian Gudger 86036f979b Validate multicast addresses in multicast group operations.
PiperOrigin-RevId: 237559843
Change-Id: I93a9d83a08cd3d49d5fc7fcad5b0710d0aa04aaa
2019-03-08 19:05:26 -08:00
Ian Gudger 56a6128295 Implement IP_MULTICAST_LOOP.
IP_MULTICAST_LOOP controls whether or not multicast packets sent on the default
route are looped back. In order to implement this switch, support for sending
and looping back multicast packets on the default route had to be implemented.

For now we only support IPv4 multicast.

PiperOrigin-RevId: 237534603
Change-Id: I490ac7ff8e8ebef417c7eb049a919c29d156ac1c
2019-03-08 15:49:17 -08:00
Bhasker Hariharan 1718fdd1a8 Add new retransmissions and recovery related metrics.
PiperOrigin-RevId: 236945145
Change-Id: I051760d95154ea5574c8bb6aea526f488af5e07b
2019-03-05 16:41:44 -08:00
Kevin Krakauer 23e66ee96d Remove unused commit() function argument to Bind.
PiperOrigin-RevId: 236926132
Change-Id: I5cf103f22766e6e65a581de780c7bb9ca0fa3181
2019-03-05 14:53:34 -08:00
Tamir Duberstein 3830786883 Map IPv{4,6} addresses to ethernet addresses
...in accordance with RFCs 1112 and 2464.

Fixes IPv4 multicast when IP_MULTICAST_IF is specified.

Don't return ErrNoRoute when no route is needed.
Don't set Route.NextHop when no route is needed.

PiperOrigin-RevId: 236199813
Change-Id: I48ed33e1b7f760deaa37e18ad7f1b8b62819ab43
2019-02-28 14:38:32 -08:00
Kevin Krakauer 121db29a93 Ping support via IPv4 raw sockets.
Broadly, this change:
* Enables sockets to be created via `socket(AF_INET, SOCK_RAW, IPPROTO_ICMP)`.
* Passes the network-layer (IP) header up the stack to the transport endpoint,
  which can pass it up to the socket layer. This allows a raw socket to return
  the entire IP packet to users.
* Adds functions to stack.TransportProtocol, stack.Stack, stack.transportDemuxer
  that enable incoming packets to be delivered to raw endpoints. New raw sockets
  of other protocols (not ICMP) just need to register with the stack.
* Enables ping.endpoint to return IP headers when created via SOCK_RAW.

PiperOrigin-RevId: 235993280
Change-Id: I60ed994f5ff18b2cbd79f063a7fdf15d093d845a
2019-02-27 14:31:21 -08:00
Googler 12d9cf6fab Adds a WriteRawPacket method to the InjectableLinkEndpoint interface.
Also exposes ipv4.MaxTotalSize since it is a generally useful constant.

PiperOrigin-RevId: 235799755
Change-Id: I1fa8d5294bf355acf5527cfdf274b3687d3c8b13
2019-02-26 14:58:37 -08:00
Amanda Tait 33d0e824c7 Use more conservative locking in NIC.DeliverNetworkPacket
An earlier CL excessively minimizes the period in which it
holds a lock on NIC. This earlier CL had done this out of
the mistaken impression it fixed a broken test, when in
fact it just reduced the rate of failure of a flaky test
in tcp_test.go. This new change holds the lock on NIC
for the duration of the loop over n.endpoints.

PiperOrigin-RevId: 235732487
Change-Id: I53ee6df264f093ddc4d29e9acdcba6b4838cb112
2019-02-26 09:10:37 -08:00
Bhasker Hariharan 26be25e4ec Add a SACK scoreboard to TCP endpoints.
This change does not make use of SACK information but adds support to track
SACK information and store it in the endpoint.

The actual SACK based recovery will be in a separate CL.

Part of commits to add RFC 6675 support to Netstack.

PiperOrigin-RevId: 235612264
Change-Id: I261f94844d7bad5abda803152ce6cc6125a467ff
2019-02-25 15:20:04 -08:00
Amanda Tait c14a1a1618 Fix race condition in NIC.DeliverNetworkPacket
cl/234850781 introduced a race condition in NIC.DeliverNetworkPacket
by failing to hold a lock. This change fixes this regression by acquiring
a read lock before iterating through n.endpoints, and then releasing the lock
once iteration is complete.

PiperOrigin-RevId: 235549770
Change-Id: Ib0133288be512d478cf759c3314dc95ec3205d4b
2019-02-25 10:02:29 -08:00
Kevin Krakauer b75aa51504 Rename ping endpoints to icmp endpoints.
PiperOrigin-RevId: 235248572
Change-Id: I5b0538b6feb365a98712c2a2d56d856fe80a8a09
2019-02-22 13:34:47 -08:00
Amanda Tait ea070b9d5f Implement Broadcast support
This change adds support for the SO_BROADCAST socket option in gVisor Netstack.
This support includes getsockopt()/setsockopt() functionality for both UDP and
TCP endpoints (the latter being a NOOP), dispatching broadcast messages up and
down the stack, and route finding/creation for broadcast packets. Finally, a
suite of tests has been implemented, exercising this functionality through the
Linux syscall API.

PiperOrigin-RevId: 234850781
Change-Id: If3e666666917d39f55083741c78314a06defb26c
2019-02-20 12:54:13 -08:00
Bhasker Hariharan 3e3a1ef9d6 Updates tcp_proxy to use an AF_PACKET and veth devices.
tcp_proxy now uses an AF_PACKET socket as the FD for netstack link layer
endpoint instead of a tap device. It also changes the link layer endpoint to use
PacketMMap dispatch instead of Readv. This reduces overall cpu and reflects the
current runsc setup which uses PacketMMap and also uses veth devices to receive
packets.

Also fixed a bug in gonet where Read() was not doing a coalescing read and would
read small amounts at a time.

PiperOrigin-RevId: 234714768
Change-Id: Idabf8e600e4512489d3ba441c4096dc74deba5d7
2019-02-19 18:23:54 -08:00
Ian Gudger c611dbc5a7 Implement IP_MULTICAST_IF.
This allows setting a default send interface for IPv4 multicast. IPv6 support
will come later.

PiperOrigin-RevId: 234251379
Change-Id: I65922341cd8b8880f690fae3eeb7ddfa47c8c173
2019-02-15 18:40:15 -08:00
Kevin Krakauer a9cb3dcd9d Move SO_TIMESTAMP from different transport endpoints to epsocket.
SO_TIMESTAMP is reimplemented in ping and UDP sockets (and needs to be added for
TCP), but can just be implemented in epsocket for simplicity. This will also
make SIOCGSTAMP easier to implement.

PiperOrigin-RevId: 234179300
Change-Id: Ib5ea0b1261dc218c1a8b15a65775de0050fe3230
2019-02-15 11:18:44 -08:00
Googler d60ce17a21 Internal change.
PiperOrigin-RevId: 234011346
Change-Id: Ic69375ddb3794dd0d3d6e62ee4dc60fdf4baf2c7
2019-02-14 12:54:27 -08:00
Bhasker Hariharan e0b3d3323f Add support for using PACKET_RX_RING to receive packets.
PACKET_RX_RING allows the use of an mmapped buffer to receive packets from the
kernel. This should cut down the number of host syscalls that need to be made
to receive packets when the underlying fd is a socket of the AF_PACKET type.

PiperOrigin-RevId: 233834998
Change-Id: I8060025c6ced206986e94cc46b8f382b81bfa47f
2019-02-13 14:53:03 -08:00
Bhasker Hariharan efe5e737d7 Do not drop packets w/ missing TCP timestamps.
RFC7323 recommends that if the timestamp option was negotiated
then all packets should carry a TCP Timestamp and any packets that
do not should be dropped.

Netstack implemented this behaviour. Linux OTOH does not and will
accept such packets. This change makes Netstack behaviour compatible
with Linux.

Also now that we allow such packets, we do need to update RTO calculations
based on these packets even if timestamp option is enabled.

PiperOrigin-RevId: 233432268
Change-Id: I9f4742ae6b63930ac3b5e37d8c238761e6a4b29f
2019-02-11 10:23:43 -08:00
Ian Gudger 967326131a Fix build error.
PiperOrigin-RevId: 233139020
Change-Id: I2e7089fa25d20e5662eb941054a684d41f5d3e12
2019-02-08 15:37:20 -08:00
Ian Gudger 80f901b16b Plumb IP_ADD_MEMBERSHIP and IP_DROP_MEMBERSHIP to netstack.
Also includes a few fixes for IPv4 multicast support. IPv6 support is coming in
a followup CL.

PiperOrigin-RevId: 233008638
Change-Id: If7dae6222fef43fda48033f0292af77832d95e82
2019-02-07 23:15:23 -08:00
Googler e0afa87899 Internal change.
PiperOrigin-RevId: 232937200
Change-Id: I5c3709cc8f1313313ff618a45e48c14a3a111cb4
2019-02-07 13:46:26 -08:00
Michael Pratt 2a0c69b19f Remove license comments
Nothing reads them and they can simply get stale.

Generated with:
$ sed -i "s/licenses(\(.*\)).*/licenses(\1)/" **/BUILD

PiperOrigin-RevId: 231818945
Change-Id: Ibc3f9838546b7e94f13f217060d31f4ada9d4bf0
2019-01-31 11:12:53 -08:00
Bhasker Hariharan f03c7e48e7 Fix IsLost check to match the description in RFC6675.
Quoting what "rscheff@gmx.at" pointed out over email:
"IsLost in RFC3517 is defined as  >=  (DupThresh * SMSS) while
RFC6675 improves upon this, and defines IsLost as  >
((DupThresh - 1) * SMSS + 1).

The latter addresses situations where partial segments (size < MSS)
are sent (eg. last segment of a http protocol message sent with PSH
being less than MSS is common)."
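
The difference between the two thresholds can be seen in a small sketch. The function names are hypothetical, only the byte-count condition as quoted above is modelled (RFC 6675 also has a rule based on discontiguous SACKed sequences, omitted here), and this is not netstack's code.

```go
package main

import "fmt"

// isLostRFC3517 applies the older threshold: a segment is considered lost
// once at least DupThresh*SMSS bytes above it have been SACKed.
func isLostRFC3517(sackedBytes, dupThresh, smss int) bool {
	return sackedBytes >= dupThresh*smss
}

// isLostRFC6675 applies the byte threshold exactly as quoted in the message
// above: more than (DupThresh-1)*SMSS + 1 bytes SACKed above the segment.
func isLostRFC6675(sackedBytes, dupThresh, smss int) bool {
	return sackedBytes > (dupThresh-1)*smss+1
}

func main() {
	const smss, dupThresh = 1460, 3
	// Two full segments plus a short 500-byte final segment were SACKed.
	sacked := 2*smss + 500
	fmt.Println("RFC 3517:", isLostRFC3517(sacked, dupThresh, smss))
	fmt.Println("RFC 6675:", isLostRFC6675(sacked, dupThresh, smss))
}
```

With a short trailing segment the older rule never reaches three full segments' worth of SACKed bytes, while the newer rule does declare the loss.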

PiperOrigin-RevId: 231512331
Change-Id: I1addd4a92e3e7baeb0bdda46463ebfae435da958
2019-01-29 18:13:48 -08:00
Ian Gudger ff1c3bb0b5 Fix NIC endpoint forwarding.
Also adds a test for regular NIC forwarding.

PiperOrigin-RevId: 231495279
Change-Id: Ic7edec249568e9ad0280cea77eac14478c9073e1
2019-01-29 16:23:30 -08:00
Bhasker Hariharan 24cb2c0a72 Use recvmmsg() instead of readv() to read packets from NIC.
This should reduce the number of syscalls required to process packets
significantly and improve throughputs.

PiperOrigin-RevId: 231366886
Change-Id: I8b38077262bf9c53176bc4a94b530188d3d7c0ca
2019-01-29 01:39:01 -08:00
Kevin Krakauer 9a01287d23 test: Tag tcp_test as flaky.
PiperOrigin-RevId: 229427852
Change-Id: I9de8ed63f4a7672dacd3b282c863c599d00acd52
2019-01-15 13:21:00 -08:00
Zhaozhong Ni 7182b9cf52 netstack: release port inline for listening sockets only.
PiperOrigin-RevId: 229243918
Change-Id: Ie14ef34e66ae851ed080f57b7d26a369a66f7664
2019-01-14 13:33:47 -08:00
Googler 1e1dae50ca Internal change.
PiperOrigin-RevId: 228979583
Change-Id: I69bd82def48ceb19bc8558c890622b8528d98764
2019-01-11 18:52:36 -08:00
Bert Muthalaly 3f45878b73 Implement Stringer for tcpip.StatCounter
This enables formatting tcpip.Stats readably with %+v.

PiperOrigin-RevId: 228379088
Change-Id: I6a9876454a22f151ee752cf94589b4188729458f
2019-01-08 12:35:35 -08:00
Andrei Vagin 652d068119 Implement SO_REUSEPORT for TCP and UDP sockets
This option allows multiple sockets to be bound to the same port.

Incoming packets are distributed to sockets using a hash based on source and
destination addresses. This means that all packets from one sender will be
received by the same server socket.
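
The distribution idea can be sketched as below. The helper `pickSocket` and the choice of FNV-1a are assumptions made for illustration; the change itself does not state which hash function is used.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// pickSocket is a hypothetical helper: it hashes the source and destination
// addresses and maps the result onto one of n sockets bound to the same
// port. n must be > 0. FNV-1a is an arbitrary choice for this sketch.
func pickSocket(src, dst string, n int) int {
	h := fnv.New32a()
	h.Write([]byte(src))
	h.Write([]byte{0}) // separator so ("ab", "c") and ("a", "bc") hash differently
	h.Write([]byte(dst))
	return int(h.Sum32() % uint32(n))
}

func main() {
	const sockets = 4
	// Every packet from the same sender to the same destination lands on the
	// same socket index, because the hash input is identical.
	first := pickSocket("10.0.0.1", "10.0.0.2", sockets)
	second := pickSocket("10.0.0.1", "10.0.0.2", sockets)
	fmt.Println(first == second)
}
```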

PiperOrigin-RevId: 227153413
Change-Id: I59b6edda9c2209d5b8968671e9129adb675920cf
2018-12-28 11:27:14 -08:00
Ian Gudger 0df0df35fc Stub out SO_OOBINLINE.
We don't explicitly support out-of-band data and treat it like normal in-band
data. This is equivalent to SO_OOBINLINE being enabled, so always report that
it is enabled.

PiperOrigin-RevId: 226572742
Change-Id: I4c30ccb83265e76c30dea631cbf86822e6ee1c1b
2018-12-21 19:46:55 -08:00
Michael Pratt 71f0d5108b Internal Change
PiperOrigin-RevId: 226542979
Change-Id: Ife11ebd0a85b8a63078e6daa71b4a99a82080ac9
2018-12-21 14:29:35 -08:00
Ian Gudger b515556519 Implement SO_KEEPALIVE, TCP_KEEPIDLE, and TCP_KEEPINTVL.
Within gVisor, plumb new socket options to netstack.

Within netstack, fix GetSockOpt and SetSockOpt return value logic.

PiperOrigin-RevId: 226532229
Change-Id: If40734e119eed633335f40b4c26facbebc791c74
2018-12-21 13:13:45 -08:00
Chris Kuiper e491ebbacf Allow sending of multicast and IPv6 link-local packets w/o route.
Same as with broadcast packets, sending of a multicast packet shouldn't require
accessing the route table. The same applies to IPv6 link-local addresses, which
aren't routable at all (they don't belong to any subnet by definition).

PiperOrigin-RevId: 225775870
Change-Id: Ic53e6560c125a83be2be9c3d112e66b36e8dfe7b
2018-12-16 23:05:59 -08:00
Ian Gudger 6253d32cc9 transport/tcp: remove unused error return values
PiperOrigin-RevId: 225421480
Change-Id: I1e9259b0b7e8490164e830b73338a615129c7f0e
2018-12-13 13:02:49 -08:00
Ian Gudger 25b8424d75 Stub out TCP_QUICKACK
PiperOrigin-RevId: 224696233
Change-Id: I45c425d9e32adee5dcce29ca7439a06567b26014
2018-12-09 00:50:33 -08:00
Chris Kuiper 1b3442cae0 Allow sending of broadcast packets w/o route.
Currently sending a broadcast packet (for DHCP, e.g.) requires a "default
route" of the format "0.0.0.0/0 via 0.0.0.0 <intf>". There is no good reason
for this and on devices with several ports this creates a rather awkward route
table with lots of such default routes (which defeats the purpose of a default
route).

PiperOrigin-RevId: 224378769
Change-Id: Icd7ec8a206eb08083cff9a837f6f9ab231c73a19
2018-12-06 11:48:12 -08:00
Ian Gudger 000fa84a3b Fix tcpip.Endpoint.Write contract regarding short writes
* Clarify tcpip.Endpoint.Write contract regarding short writes.
* Enforce tcpip.Endpoint.Write contract regarding short writes.
* Update relevant users of tcpip.Endpoint.Write.

PiperOrigin-RevId: 224377586
Change-Id: I24299ecce902eb11317ee13dae3b8d8a7c5b097d
2018-12-06 11:41:33 -08:00
Zhaozhong Ni 7f35daddd2 sentry: support save / restore of TCP bind socket after shutdown.
PiperOrigin-RevId: 224227677
Change-Id: I08b0e0c0574170556269900653e5bcf9e9e5c9c9
2018-12-05 15:02:40 -08:00
Zhaozhong Ni fda4557e3d sentry: skip waiting for undrain for netstack TCP endpoints in error state.
PiperOrigin-RevId: 224214981
Change-Id: I4c1dd5b1c856f7a4f9866a5dda44a5297e92486a
2018-12-05 13:51:16 -08:00
Chris Kuiper fab029c50b Remove incorrect code and improve testing of Stack.GetMainNICAddress
This removes code that should have never made it in in the first place, but did so due to incomplete testing. With the new tests the original code fails, the new code passes.

PiperOrigin-RevId: 224086966
Change-Id: I646fef76977f4528f3705f497b95fad6b3ec32bc
2018-12-04 19:09:11 -08:00
Ian Gudger d209f71b9f Whitelist Go 1.12 for tcpip/time_unsafe.go
The signature of time.now has remained unchanged:
c2412a7681/src/time/time.go (L1072)

PiperOrigin-RevId: 224061160
Change-Id: Ic84bd6ee8fb9952cd9ab580bcb0892444ce7c2da
2018-12-04 15:52:14 -08:00
Ian Gudger 8cbd6153a6 Fix available calculation when merging TCP segments
PiperOrigin-RevId: 224033418
Change-Id: I780be973e8be68ac93e8c9e7a100002e912f40d2
2018-12-04 13:15:25 -08:00
Zhaozhong Ni ad8f293e1a sentry: save copy of tcp segment's delivered views to avoid in-struct pointers.
PiperOrigin-RevId: 224033238
Change-Id: Ie5b1854b29340843b02c123766d290a8738d7631
2018-12-04 13:14:24 -08:00
Ian Gudger 99fb113869 Test that full segments will be sent when delay/cork is enabled
PiperOrigin-RevId: 223425575
Change-Id: Idd777e04c69e6ffcbfb0bdbea828a8b8b42d7672
2018-11-29 15:46:38 -08:00
Ian Gudger 1918563525 Make ToView non-allocating for single VectorizedViews containing a single View
PiperOrigin-RevId: 222483471
Change-Id: I6720690b20167dd541fdfa5218eba7c9f7483347
2018-11-21 18:11:13 -08:00
Ian Gudger 9d8e49d950 Process delayed packets when delay is disabled
Moving the wakeup logic into the disable blocks is an optimization.

PiperOrigin-RevId: 221677028
Change-Id: Ib5a5a6d52cc77b4bbc5dedcad9ee1dbb3da98deb
2018-11-15 13:17:06 -08:00
Bert Muthalaly bc41e4761b Rename incorrectly named (dst, src) arguments in DeliverNetworkPacket prototype
...to (remote, local), reflecting the (correct) names in the implementation of
DeliverNetworkPacket (see tcpip/stack/nic.go).

Also trim the names in DeliverNetworkPacket and elsewhere to avoid stuttering;
since the type is tcpip.LinkAddress, there's no need to include "LinkAddr" in
the parameter names.

Note that every callsite passes arguments in the order (src, dst).

PiperOrigin-RevId: 221514396
Change-Id: I3637454ad0d6e62a19e4dcbc2a16493798bd0f09
2018-11-14 14:46:24 -08:00
Ian Gudger b5e91eaa52 Clean up tcp.sendData
PiperOrigin-RevId: 221484739
Change-Id: I44c71f79f99d0d00a2e70a7f06d7024a62a5de0a
2018-11-14 11:58:41 -08:00
Ian Gudger 7f60294a73 Implement TCP_NODELAY and TCP_CORK
Previously, TCP_NODELAY was always enabled and we would lie about it being
configurable. TCP_NODELAY is now disabled by default (to match Linux) in the
socket layer so that non-gVisor users don't automatically start using this
questionable optimization.

PiperOrigin-RevId: 221368472
Change-Id: Ib0240f66d94455081f4e0ca94f09d9338b2c1356
2018-11-13 18:02:43 -08:00
Ian Gudger c22da3e705 Remove obsolete TODO
PiperOrigin-RevId: 221117846
Change-Id: I2a43fd8135b1d1194ff81e98644ce6b6182ece50
2018-11-12 10:45:19 -08:00
Bhasker Hariharan 33089561b1 Add an implementation of a SACK scoreboard as per RFC6675.
PiperOrigin-RevId: 220866996
Change-Id: I89d48215df57c00d6a6ec512fc18712a2ea9080b
2018-11-09 14:38:46 -08:00
Fabricio Voznika dce61075c0 Fix flaky TestCacheResolutionTimeout
Increase timeout to prevent the entry from being
found when there is delay on the address resolution
goroutine that doesn't mark the request as failed.

PiperOrigin-RevId: 220504789
Change-Id: I7e44fd95d8624bd69962f862fbf5517a81395f2a
2018-11-07 12:01:48 -08:00
Googler 9256ed5283 Internal change.
PiperOrigin-RevId: 220314735
Change-Id: Ic519567e43f6caf042b9f223e517da40640b7d38
2018-11-06 11:08:22 -08:00
Ian Gudger 37cbce1f91 Merge segments in sender's writeList
PiperOrigin-RevId: 220185891
Change-Id: Iaea73fd7b2fa8c399b989cdcaabf4885f370df4b
2018-11-05 15:39:30 -08:00
Ian Gudger 59b7766af7 Fix a race where keepalives could be sent while there is pending data
PiperOrigin-RevId: 219571556
Change-Id: I5a1042c1cb05eb2711eb01627fd298bad6c543a6
2018-10-31 18:42:44 -07:00
Ian Gudger eeddae1199 Use syserr style error translation in netstack's rawfile
Replacing map lookups with slice indexing is higher performance.

PiperOrigin-RevId: 219569901
Change-Id: I9b7cd22abd4b95383025edbd5a80d1c1a4496936
2018-10-31 18:22:05 -07:00
Tamir Duberstein 0692ad72ef Remove ipv4.endpoint.address
This field was added in the initial implementation, before Route existed,
to pass the local and remote addresses to the packet-writing path.
Today, the Route's members should be respected. A similar bug was
previously fixed in 214650822.

PiperOrigin-RevId: 219474095
Change-Id: Id2a8ee4421d2841c8d88ccb3c193c455086350ee
2018-10-31 08:04:57 -07:00
Fabricio Voznika c99006a240 Mark netstack/tcpip/transport/tcp:tcp_test flaky
PiperOrigin-RevId: 218537640
Change-Id: I1c5f55a46390174e1f5caeff74b1a364fa3268d9
2018-10-24 10:46:25 -07:00
Adin Scannell 1369e17504 Remove blanket TODO, as it is self-evident.
PiperOrigin-RevId: 218390517
Change-Id: Ic891c1626e62a6c4ed57f8180740872bcd1be177
2018-10-23 12:52:27 -07:00
Tamir Duberstein 692df85673 Simplify channel management
The channels {cancel,resCh} have roughly the same lifetime and are used for
roughly the same purpose as an entry's waiters; we can unify the state
management of the two mechanisms, while also reducing unnecessary mutex locking
and unlocking.

Made some cosmetic changes while I'm here.

PiperOrigin-RevId: 218343915
Change-Id: Ic69546a2b7b390162b2231f07f335dd6199472d7
2018-10-23 08:16:13 -07:00
Adin Scannell 75cd70ecc9 Track paths and provide a rename hook.
This change also adds extensive testing to the p9 package via mocks. The sanity
checks and type checks are moved from the gofer into the core package, where
they can be more easily validated.

PiperOrigin-RevId: 218296768
Change-Id: I4fc3c326e7bf1e0e140a454cbacbcc6fd617ab55
2018-10-23 00:20:15 -07:00
Ian Gudger 8fce67af24 Use correct company name in copyright header
PiperOrigin-RevId: 217951017
Change-Id: Ie08bf6987f98467d07457bcf35b5f1ff6e43c035
2018-10-19 16:35:11 -07:00
Ian Gudger 6cba410df0 Move Unix transport out of netstack
PiperOrigin-RevId: 217557656
Change-Id: I63d27635b1a6c12877279995d2d9847b6a19da9b
2018-10-17 11:37:51 -07:00
Ian Gudger 324ad3564b Refactor host.ConnectedEndpoint
* Integrate recvMsg and sendMsg functions into Recv and Send respectively as
  they are no longer shared.
* Clean up partial read/write error handling code.
* Re-order code to make sense given that there is no longer a host.endpoint
  type.

PiperOrigin-RevId: 217255072
Change-Id: Ib43fe9286452f813b8309d969be11f5fa40694cd
2018-10-15 20:23:18 -07:00
Ian Gudger 167f2401c4 Merge host.endpoint into host.ConnectedEndpoint
host.endpoint contained duplicated logic from the socketpair implementation and
host.ConnectedEndpoint. Remove host.endpoint in favor of a
host.ConnectedEndpoint wrapped in a socketpair end.

PiperOrigin-RevId: 217240096
Change-Id: I4a3d51e3fe82bdf30e2d0152458b8499ab4c987c
2018-10-15 17:48:11 -07:00
Fabricio Voznika 86680fa002 Add String() method to AddressMask
PiperOrigin-RevId: 216770391
Change-Id: Idcdc28b2fe9e1b0b63b8119d445f05a8bcbce81e
2018-10-11 15:22:02 -07:00
Michael Pratt ddb34b3690 Enforce message size limits and avoid host calls with too many iovecs
Currently, in the face of FileMem fragmentation and a large sendmsg or
recvmsg call, host sockets may pass > 1024 iovecs to the host, which
will immediately cause the host to return EMSGSIZE.

When we detect this case, use a single intermediate buffer to pass to
the kernel, copying to/from the src/dst buffer.

To avoid creating unbounded intermediate buffers, enforce message size
checks and truncation w.r.t. the send buffer size. The same
functionality is added to netstack unix sockets for feature parity.
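
The fallback described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from this change: coalesce, maxIovs, and the limit parameter are hypothetical names, and the sketch truncates only on the fallback path, whereas the change enforces size limits more broadly.

```go
package main

import "fmt"

// maxIovs mirrors the 1024-iovec host limit mentioned in the commit message.
const maxIovs = 1024

// coalesce returns bufs unchanged when the host would accept that many
// iovecs. Otherwise it copies the data into one intermediate buffer,
// truncated to at most limit bytes so the buffer size stays bounded.
func coalesce(bufs [][]byte, limit int) [][]byte {
	if len(bufs) <= maxIovs {
		return bufs
	}
	total := 0
	for _, b := range bufs {
		total += len(b)
	}
	if total > limit {
		total = limit
	}
	out := make([]byte, 0, total)
	for _, b := range bufs {
		room := total - len(out)
		if room == 0 {
			break
		}
		if len(b) > room {
			b = b[:room] // truncate the final piece to fit the limit
		}
		out = append(out, b...)
	}
	return [][]byte{out}
}

func main() {
	// A heavily fragmented message: 2000 one-byte buffers.
	frag := make([][]byte, 2000)
	for i := range frag {
		frag[i] = []byte{byte(i)}
	}
	out := coalesce(frag, 1500)
	fmt.Println(len(out), len(out[0]))
}
```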

PiperOrigin-RevId: 216590198
Change-Id: I719a32e71c7b1098d5097f35e6daf7dd5190eff7
2018-10-10 14:10:17 -07:00