Commit Graph

36 Commits

Author SHA1 Message Date
Ting-Yu Wang 1cd76d958a Make dedicated methods for data operations in PacketBuffer
One of the preparation to decouple underlying buffer implementation.
There are still some methods that tie to VectorisedView, and they will be
changed gradually in later CLs.

This CL also introduce a new ICMPv6ChecksumParams to replace long list of
parameters when calling ICMPv6Checksum, aiming to be more descriptive.

PiperOrigin-RevId: 360778149
2021-03-03 16:05:16 -08:00
Tamir Duberstein 8d1afb4185 Change tcpip.Error to an interface
This makes it possible to add data to types that implement tcpip.Error.
ErrBadLinkEndpoint is removed as it is unused.

PiperOrigin-RevId: 354437314
2021-01-28 17:59:58 -08:00
Ghanan Gowripalan 8c0701462a Use stack.Route exclusively for writing packets
* Remove stack.Route from incoming packet path.
There is no need to pass around a stack.Route during the incoming path
of a packet. Instead, pass around the packet's link/network layer
information in the packet buffer since all layers may need this
information.

* Support address bound and outgoing packet NIC in routes.
When forwarding is enabled, the source address of a packet may be bound
to a different interface than the outgoing interface. This change
updates stack.Route to hold both NICs so that one can be used to write
packets while the other is used to check if the route's bound address
is valid. Note, we need to hold the address's interface so we can check
if the address is a spoofed address.

* Introduce the concept of a local route.
Local routes are routes where the packet never needs to leave the stack;
the destination is stack-local. We can now route between interfaces
within a stack if the packet never needs to leave the stack, even when
forwarding is disabled.

* Always obtain a route from the stack before sending a packet.
If a packet needs to be sent in response to an incoming packet, a route
must be obtained from the stack to ensure the stack is configured to
send packets to the packet's source from the packet's destination.

* Enable spoofing if a stack may send packets from unowned addresses.
This change required changes to some netgophers since previously,
promiscuous mode was enough to let the netstack respond to all
incoming packets regardless of the packet's destination address. Now
that a stack.Route is not held for each incoming packet, finding a route
may fail with local addresses we don't own but accepted packets for
while in promiscuous mode. Since we also want to be able to send from
any address (in response the received promiscuous mode packets), we need
to enable spoofing.

* Skip transport layer checksum checks for locally generated packets.
If a packet is locally generated, the stack can safely assume that no
errors were introduced while being locally routed since the packet is
never sent out the wire.

Some bugs fixed:
- transport layer checksum was never calculated after NAT.
- handleLocal didn't handle routing across interfaces.
- stack didn't support forwarding across interfaces.
- always consult the routing table before creating an endpoint.

Updates #4688
Fixes #3906

PiperOrigin-RevId: 340943442
2020-11-05 15:52:16 -08:00
Ghanan Gowripalan 5075d0342f Trim Network/Transport Endpoint/Protocol
* Remove Capabilities and NICID methods from NetworkEndpoint.

* Remove linkEP and stack parameters from NetworkProtocol.NewEndpoint.
The LinkEndpoint can be fetched from the NetworkInterface. The stack
is passed to the NetworkProtocol when it is created so the
NetworkEndpoint can get it from its protocol.

* Remove stack parameter from TransportProtocol.NewEndpoint.
Like the NetworkProtocol/Endpoint, the stack is passed to the
TransportProtocol when it is created.

PiperOrigin-RevId: 334332721
2020-09-29 02:05:50 -07:00
Ghanan Gowripalan a5acc0616c Support creating protocol instances with Stack ref
Network or transport protocols may want to reach the stack. Support this
by letting the stack create the protocol instances so it can pass a
reference to itself at protocol creation time.

Note, protocols do not yet use the stack in this CL but later CLs will
make use of the stack from protocols.

PiperOrigin-RevId: 334260210
2020-09-28 16:24:04 -07:00
Julian Elischer 99decaadd6 Extract ICMP error sender from UDP
Store transport protocol number on packet buffers for use in ICMP error
generation.

Updates #2211.

PiperOrigin-RevId: 333252762
2020-09-23 02:28:43 -07:00
Ghanan Gowripalan 360006d894 Use common parsing utilities when sniffing
Extract parsing utilities so they can be used by the sniffer.

Fixes #3930

PiperOrigin-RevId: 332401880
2020-09-18 00:48:09 -07:00
Ghanan Gowripalan d35f07b36a Improve type safety for transport protocol options
The existing implementation for TransportProtocol.{Set}Option take
arguments of an empty interface type which all types (implicitly)
implement; any type may be passed to the functions.

This change introduces marker interfaces for transport protocol options
that may be set or queried which transport protocol option types
implement to ensure that invalid types are caught at compile time.
Different interfaces are used to allow the compiler to enforce read-only
or set-only socket options.

RELNOTES: n/a
PiperOrigin-RevId: 330559811
2020-09-08 12:17:39 -07:00
Toshi Kikuchi 70a7a3ac70 Only send an ICMP error message if UDP checksum is valid.
Test:
 - TestV4UnknownDestination
 - TestV6UnknownDestination
PiperOrigin-RevId: 328424137
2020-08-25 16:15:29 -07:00
Ting-Yu Wang 47515f4751 Migrate to PacketHeader API for PacketBuffer.
Formerly, when a packet is constructed or parsed, all headers are set by the
client code. This almost always involved prepending to pk.Header buffer or
trimming pk.Data portion. This is known to prone to bugs, due to the complexity
and number of the invariants assumed across netstack to maintain.

In the new PacketHeader API, client will call Push()/Consume() method to
construct/parse an outgoing/incoming packet. All invariants, such as slicing
and trimming, are maintained by the API itself.

NewPacketBuffer() is introduced to create new PacketBuffer. Zero value is no
longer valid.

PacketBuffer now assumes the packet is a concatenation of following portions:
* LinkHeader
* NetworkHeader
* TransportHeader
* Data
Any of them could be empty, or zero-length.

PiperOrigin-RevId: 326507688
2020-08-13 13:08:57 -07:00
Bhasker Hariharan b070e218c6 Add support for Stack level options.
Linux controls socket send/receive buffers using a few sysctl variables
  - net.core.rmem_default
  - net.core.rmem_max
  - net.core.wmem_max
  - net.core.wmem_default
  - net.ipv4.tcp_rmem
  - net.ipv4.tcp_wmem

The first 4 control the default socket buffer sizes for all sockets
raw/packet/tcp/udp and also the maximum permitted socket buffer that can be
specified in setsockopt(SOL_SOCKET, SO_(RCV|SND)BUF,...).

The last two control the TCP auto-tuning limits and override the default
specified in rmem_default/wmem_default as well as the max limits.

Netstack today only implements tcp_rmem/tcp_wmem and incorrectly uses it
to limit the maximum size in setsockopt() as well as uses it for raw/udp
sockets.

This changelist introduces the other 4 and updates the udp/raw sockets to use
the newly introduced variables. The values for min/max match the current
tcp_rmem/wmem values and the default value buffers for UDP/RAW sockets is
updated to match the linux value of 212KiB up from the really low current value
of 32 KiB.

Updates #3043
Fixes #3043

PiperOrigin-RevId: 318089805
2020-06-24 10:24:20 -07:00
Bhasker Hariharan 07ff909e76 Support setsockopt SO_SNDBUF/SO_RCVBUF for raw/udp sockets.
Updates #173,#6
Fixes #2888

PiperOrigin-RevId: 317087652
2020-06-18 06:07:20 -07:00
Kevin Krakauer 32b823fcdb netstack: parse incoming packet headers up-front
Netstack has traditionally parsed headers on-demand as a packet moves up the
stack. This is conceptually simple and convenient, but incompatible with
iptables, where headers can be inspected and mangled before even a routing
decision is made.

This changes header parsing to happen early in the incoming packet path, as soon
as the NIC gets the packet from a link endpoint. Even if an invalid packet is
found (e.g. a TCP header of insufficient length), the packet is passed up the
stack for proper stats bookkeeping.

PiperOrigin-RevId: 315179302
2020-06-07 13:38:43 -07:00
Ting-Yu Wang d3a8bffe04 Pass PacketBuffer as pointer.
Historically we've been passing PacketBuffer by shallow copying through out
the stack. Right now, this is only correct as the caller would not use
PacketBuffer after passing into the next layer in netstack.

With new buffer management effort in gVisor/netstack, PacketBuffer will
own a Buffer (to be added). Internally, both PacketBuffer and Buffer may
have pointers and shallow copying shouldn't be used.

Updates #2404.

PiperOrigin-RevId: 314610879
2020-06-03 15:00:42 -07:00
Kevin Krakauer 5e1e61fbcb Automated rollback of changelist 308674219
PiperOrigin-RevId: 309491861
2020-05-01 16:09:53 -07:00
gVisor bot 55f0c3316a Automated rollback of changelist 308163542
PiperOrigin-RevId: 308674219
2020-04-27 12:26:32 -07:00
Kevin Krakauer eccae0f77d Remove View.First() and View.RemoveFirst()
These methods let users eaily break the VectorisedView abstraction, and
allowed netstack to slip into pseudo-enforcement of the "all headers are
in the first View" invariant. Removing them and replacing with PullUp(n)
breaks this reliance and will make it easier to add iptables support and
rework network buffer management.

The new View.PullUp(n) method is low cost in the common case, when when
all the headers fit in the first View.

PiperOrigin-RevId: 308163542
2020-04-23 17:28:49 -07:00
gVisor bot 120d3b50f4 Automated rollback of changelist 307477185
PiperOrigin-RevId: 307598974
2020-04-21 07:16:30 -07:00
Kevin Krakauer a551add5d8 Remove View.First() and View.RemoveFirst()
These methods let users eaily break the VectorisedView abstraction, and
allowed netstack to slip into pseudo-enforcement of the "all headers are
in the first View" invariant. Removing them and replacing with PullUp(n)
breaks this reliance and will make it easier to add iptables support and
rework network buffer management.

The new View.PullUp(n) method is low cost in the common case, when when
all the headers fit in the first View.
2020-04-17 13:25:57 -07:00
Bhasker Hariharan 7e4073af12 Move tcpip.PacketBuffer and IPTables to stack package.
This is a precursor to be being able to build an intrusive list
of PacketBuffers for use in queuing disciplines being implemented.

Updates #2214

PiperOrigin-RevId: 302677662
2020-03-24 09:06:26 -07:00
Ian Gudger c37b196455 Add support for tearing down protocol dispatchers and TIME_WAIT endpoints.
Protocol dispatchers were previously leaked. Bypassing TIME_WAIT is required to
test this change.

Also fix a race when a socket in SYN-RCVD is closed. This is also required to
test this change.

PiperOrigin-RevId: 296922548
2020-02-24 10:32:17 -08:00
Kevin Krakauer 3f7d937090 Use PacketBuffers for outgoing packets.
PiperOrigin-RevId: 280455453
2019-11-14 10:15:38 -08:00
Kevin Krakauer e1b21f3c8c Use PacketBuffers, rather than VectorisedViews, in netstack.
PacketBuffers are analogous to Linux's sk_buff. They hold all information about
a packet, headers, and payload. This is important for:

* iptables to access various headers of packets
* Preventing the clutter of passing different net and link headers along with
  VectorisedViews to packet handling functions.

This change only affects the incoming packet path, and a future change will
change the outgoing path.

Benchmark               Regular         PacketBufferPtr  PacketBufferConcrete
--------------------------------------------------------------------------------
BM_Recvmsg             400.715MB/s      373.676MB/s      396.276MB/s
BM_Sendmsg             361.832MB/s      333.003MB/s      335.571MB/s
BM_Recvfrom            453.336MB/s      393.321MB/s      381.650MB/s
BM_Sendto              378.052MB/s      372.134MB/s      341.342MB/s
BM_SendmsgTCP/0/1k     353.711MB/s      316.216MB/s      322.747MB/s
BM_SendmsgTCP/0/2k     600.681MB/s      588.776MB/s      565.050MB/s
BM_SendmsgTCP/0/4k     995.301MB/s      888.808MB/s      941.888MB/s
BM_SendmsgTCP/0/8k     1.517GB/s        1.274GB/s        1.345GB/s
BM_SendmsgTCP/0/16k    1.872GB/s        1.586GB/s        1.698GB/s
BM_SendmsgTCP/0/32k    1.017GB/s        1.020GB/s        1.133GB/s
BM_SendmsgTCP/0/64k    475.626MB/s      584.587MB/s      627.027MB/s
BM_SendmsgTCP/0/128k   416.371MB/s      503.434MB/s      409.850MB/s
BM_SendmsgTCP/0/256k   323.449MB/s      449.599MB/s      388.852MB/s
BM_SendmsgTCP/0/512k   243.992MB/s      267.676MB/s      314.474MB/s
BM_SendmsgTCP/0/1M     95.138MB/s       95.874MB/s       95.417MB/s
BM_SendmsgTCP/0/2M     96.261MB/s       94.977MB/s       96.005MB/s
BM_SendmsgTCP/0/4M     96.512MB/s       95.978MB/s       95.370MB/s
BM_SendmsgTCP/0/8M     95.603MB/s       95.541MB/s       94.935MB/s
BM_SendmsgTCP/0/16M    94.598MB/s       94.696MB/s       94.521MB/s
BM_SendmsgTCP/0/32M    94.006MB/s       94.671MB/s       94.768MB/s
BM_SendmsgTCP/0/64M    94.133MB/s       94.333MB/s       94.746MB/s
BM_SendmsgTCP/0/128M   93.615MB/s       93.497MB/s       93.573MB/s
BM_SendmsgTCP/0/256M   93.241MB/s       95.100MB/s       93.272MB/s
BM_SendmsgTCP/1/1k     303.644MB/s      316.074MB/s      308.430MB/s
BM_SendmsgTCP/1/2k     537.093MB/s      584.962MB/s      529.020MB/s
BM_SendmsgTCP/1/4k     882.362MB/s      939.087MB/s      892.285MB/s
BM_SendmsgTCP/1/8k     1.272GB/s        1.394GB/s        1.296GB/s
BM_SendmsgTCP/1/16k    1.802GB/s        2.019GB/s        1.830GB/s
BM_SendmsgTCP/1/32k    2.084GB/s        2.173GB/s        2.156GB/s
BM_SendmsgTCP/1/64k    2.515GB/s        2.463GB/s        2.473GB/s
BM_SendmsgTCP/1/128k   2.811GB/s        3.004GB/s        2.946GB/s
BM_SendmsgTCP/1/256k   3.008GB/s        3.159GB/s        3.171GB/s
BM_SendmsgTCP/1/512k   2.980GB/s        3.150GB/s        3.126GB/s
BM_SendmsgTCP/1/1M     2.165GB/s        2.233GB/s        2.163GB/s
BM_SendmsgTCP/1/2M     2.370GB/s        2.219GB/s        2.453GB/s
BM_SendmsgTCP/1/4M     2.005GB/s        2.091GB/s        2.214GB/s
BM_SendmsgTCP/1/8M     2.111GB/s        2.013GB/s        2.109GB/s
BM_SendmsgTCP/1/16M    1.902GB/s        1.868GB/s        1.897GB/s
BM_SendmsgTCP/1/32M    1.655GB/s        1.665GB/s        1.635GB/s
BM_SendmsgTCP/1/64M    1.575GB/s        1.547GB/s        1.575GB/s
BM_SendmsgTCP/1/128M   1.524GB/s        1.584GB/s        1.580GB/s
BM_SendmsgTCP/1/256M   1.579GB/s        1.607GB/s        1.593GB/s

PiperOrigin-RevId: 278940079
2019-11-06 14:25:59 -08:00
Kevin Krakauer 12235d533a AF_PACKET support for netstack (aka epsocket).
Like (AF_INET, SOCK_RAW) sockets, AF_PACKET sockets require CAP_NET_RAW. With
runsc, you'll need to pass `--net-raw=true` to enable them.

Binding isn't supported yet.

PiperOrigin-RevId: 275909366
2019-10-21 13:23:18 -07:00
gVisor bot bfa0bb24dd Internal change.
PiperOrigin-RevId: 274700093
2019-10-14 17:46:52 -07:00
Ian Gudger 7c1587e340 Implement IP_TTL.
Also change the default TTL to 64 to match Linux.

PiperOrigin-RevId: 273430341
2019-10-07 19:29:51 -07:00
Kevin Krakauer 59ccbb1044 Remove centralized registration of protocols.
Also removes the need for protocol names.

PiperOrigin-RevId: 271186030
2019-09-25 12:57:05 -07:00
Bhasker Hariharan 3789c34b22 Make UDP traceroute work.
Adds support to generate Port Unreachable messages for UDP
datagrams received on a port for which there is no valid
endpoint.

Fixes #703

PiperOrigin-RevId: 267034418
2019-09-03 16:01:17 -07:00
Adin Scannell add40fd6ad Update canonical repository.
This can be merged after:
https://github.com/google/gvisor-website/pull/77
  or
https://github.com/google/gvisor-website/pull/78

PiperOrigin-RevId: 253132620
2019-06-13 16:50:15 -07:00
Kevin Krakauer c1cdf18e7b UDP and TCP raw socket support.
PiperOrigin-RevId: 249511348
Change-Id: I34539092cc85032d9473ff4dd308fc29dc9bfd6b
2019-05-22 13:45:15 -07:00
Michael Pratt 4d52a55201 Change copyright notice to "The gVisor Authors"
Based on the guidelines at
https://opensource.google.com/docs/releasing/authors/.

1. $ rg -l "Google LLC" | xargs sed -i 's/Google LLC.*/The gVisor Authors./'
2. Manual fixup of "Google Inc" references.
3. Add AUTHORS file. Authors may request to be added to this file.
4. Point netstack AUTHORS to gVisor AUTHORS. Drop CONTRIBUTORS.

Fixes #209

PiperOrigin-RevId: 245823212
Change-Id: I64530b24ad021a7d683137459cafc510f5ee1de9
2019-04-29 14:26:23 -07:00
Kevin Krakauer 121db29a93 Ping support via IPv4 raw sockets.
Broadly, this change:
* Enables sockets to be created via `socket(AF_INET, SOCK_RAW, IPPROTO_ICMP)`.
* Passes the network-layer (IP) header up the stack to the transport endpoint,
  which can pass it up to the socket layer. This allows a raw socket to return
  the entire IP packet to users.
* Adds functions to stack.TransportProtocol, stack.Stack, stack.transportDemuxer
  that enable incoming packets to be delivered to raw endpoints. New raw sockets
  of other protocols (not ICMP) just need to register with the stack.
* Enables ping.endpoint to return IP headers when created via SOCK_RAW.

PiperOrigin-RevId: 235993280
Change-Id: I60ed994f5ff18b2cbd79f063a7fdf15d093d845a
2019-02-27 14:31:21 -08:00
Ian Gudger 8fce67af24 Use correct company name in copyright header
PiperOrigin-RevId: 217951017
Change-Id: Ie08bf6987f98467d07457bcf35b5f1ff6e43c035
2018-10-19 16:35:11 -07:00
Tamir Duberstein d689f8422f Always pass buffer.VectorisedView by value
PiperOrigin-RevId: 212757571
Change-Id: I04200df9e45c21eb64951cd2802532fa84afcb1a
2018-09-12 21:57:55 -07:00
Nicolas Lacasse bf0fa09537 Switch netstack licenses to Apache 2.0.
Fixes #27

PiperOrigin-RevId: 203825288
Change-Id: Ie9f3a2b2c1e296b026b024f75c07da1a7e118633
2018-07-09 14:04:40 -07:00
Googler d02b74a5dc Check in gVisor.
PiperOrigin-RevId: 194583126
Change-Id: Ica1d8821a90f74e7e745962d71801c598c652463
2018-04-28 01:44:26 -04:00