Right now, we send each tcp packet separately, we call one system
call per-packet. This patch allows to generate multiple tcp packets
and send them by sendmmsg.
The arguable part of this CL is a way how to handle multiple headers.
This CL adds the next field to the Prepandable buffer.
Nginx test results:
Server Software: nginx/1.15.9
Server Hostname: 10.138.0.2
Server Port: 8080
Document Path: /10m.txt
Document Length: 10485760 bytes
w/o gso:
Concurrency Level: 5
Time taken for tests: 5.491 seconds
Complete requests: 100
Failed requests: 0
Total transferred: 1048600200 bytes
HTML transferred: 1048576000 bytes
Requests per second: 18.21 [#/sec] (mean)
Time per request: 274.525 [ms] (mean)
Time per request: 54.905 [ms] (mean, across all concurrent requests)
Transfer rate: 186508.03 [Kbytes/sec] received
sw-gso:
Concurrency Level: 5
Time taken for tests: 3.852 seconds
Complete requests: 100
Failed requests: 0
Total transferred: 1048600200 bytes
HTML transferred: 1048576000 bytes
Requests per second: 25.96 [#/sec] (mean)
Time per request: 192.576 [ms] (mean)
Time per request: 38.515 [ms] (mean, across all concurrent requests)
Transfer rate: 265874.92 [Kbytes/sec] received
w/o gso:
$ ./tcp_benchmark --client --duration 15 --ideal
[SUM] 0.0-15.1 sec 2.20 GBytes 1.25 Gbits/sec
software gso:
$ tcp_benchmark --client --duration 15 --ideal --gso $((1<<16)) --swgso
[SUM] 0.0-15.1 sec 3.99 GBytes 2.26 Gbits/sec
PiperOrigin-RevId: 276112677
It is quite legal to send from the ANY address (it is required for
DHCP). I can't figure out why the broadcast address was included here,
so removing that as well.
PiperOrigin-RevId: 275541954
Reassembly can fail due to an invalid sequence of fragments
being received. eg. Multiple fragments with same id which
claim to be the last one by setting the more flag to 0 etc.
It's safer to just drop the reassembler and increment a metric
than to panic when reassembly fails.
PiperOrigin-RevId: 274920901
Strengthen the header.IPv4.IsValid check to correctly check
for IHL/TotalLength fields. Also add a check to make sure
fragmentOffsets + size of the fragment do not cause a wrap
around for the end of the fragment.
PiperOrigin-RevId: 274049313
The behavior for sending and receiving local broadcast (255.255.255.255)
traffic is as follows:
Outgoing
--------
* A broadcast packet sent on a socket that is bound to an interface goes out
that interface
* A broadcast packet sent on an unbound socket follows the route table to
select the outgoing interface
+ if an explicit route entry exists for 255.255.255.255/32, use that one
+ else use the default route
* Broadcast packets are looped back and delivered following the rules for
incoming packets (see next). This is the same behavior as for multicast
packets, except that it cannot be disabled via sockopt.
Incoming
--------
* Sockets wishing to receive broadcast packets must bind to either INADDR_ANY
(0.0.0.0) or INADDR_BROADCAST (255.255.255.255). No other socket receives
broadcast packets.
* Broadcast packets are multiplexed to all sockets matching it. This is the
same behavior as for multicast packets.
* A socket can bind to 255.255.255.255:<port> and then receive its own
broadcast packets sent to 255.255.255.255:<port>
In addition, this change implicitly fixes an issue with multicast reception. If
two sockets want to receive a given multicast stream and one is bound to ANY
while the other is bound to the multicast address, only one of them will
receive the traffic.
PiperOrigin-RevId: 272792377
Adds support to generate Port Unreachable messages for UDP
datagrams received on a port for which there is no valid
endpoint.
Fixes#703
PiperOrigin-RevId: 267034418
The checksum was not being reset before being re-calculated and sent out.
This caused the sent checksum to always be `0x0800`.
Fixes#605.
PiperOrigin-RevId: 260965059
This allows the user code to add a network address with a subnet prefix length.
The prefix length value is stored in the network endpoint and provided back to
the user in the ProtocolAddress type.
PiperOrigin-RevId: 259807693
iptables also relies on IPPROTO_RAW in a way. It opens such a socket to
manipulate the kernel's tables, but it doesn't actually use any of the
functionality. Blegh.
PiperOrigin-RevId: 257903078
Multicast packets are special in that their destination address does not
identify a specific interface. When sending out such a packet the multicast
address is the remote address, but for incoming packets it is the local
address. Hence, when looping a multicast packet, the route needs to be
tweaked to reflect this.
PiperOrigin-RevId: 251739298
Some behavior was broken due to the difficulty of running automated raw
socket tests.
Change-Id: I152ca53916bb24a0208f2dc1c4f5bc87f4724ff6
PiperOrigin-RevId: 246747067
Based on the guidelines at
https://opensource.google.com/docs/releasing/authors/.
1. $ rg -l "Google LLC" | xargs sed -i 's/Google LLC.*/The gVisor Authors./'
2. Manual fixup of "Google Inc" references.
3. Add AUTHORS file. Authors may request to be added to this file.
4. Point netstack AUTHORS to gVisor AUTHORS. Drop CONTRIBUTORS.
Fixes#209
PiperOrigin-RevId: 245823212
Change-Id: I64530b24ad021a7d683137459cafc510f5ee1de9
The linux packet socket can handle GSO packets, so we can segment packets to
64K instead of the MTU which is usually 1500.
Here are numbers for the nginx-1m test:
runsc: 579330.01 [Kbytes/sec] received
runsc-gso: 1794121.66 [Kbytes/sec] received
runc: 2122139.06 [Kbytes/sec] received
and for tcp_benchmark:
$ tcp_benchmark --duration 15 --ideal
[ 4] 0.0-15.0 sec 86647 MBytes 48456 Mbits/sec
$ tcp_benchmark --client --duration 15 --ideal
[ 4] 0.0-15.0 sec 2173 MBytes 1214 Mbits/sec
$ tcp_benchmark --client --duration 15 --ideal --gso 65536
[ 4] 0.0-15.0 sec 19357 MBytes 10825 Mbits/sec
PiperOrigin-RevId: 240809103
Change-Id: I2637f104db28b5d4c64e1e766c610162a195775a
IP_MULTICAST_LOOP controls whether or not multicast packets sent on the default
route are looped back. In order to implement this switch, support for sending
and looping back multicast packets on the default route had to be implemented.
For now we only support IPv4 multicast.
PiperOrigin-RevId: 237534603
Change-Id: I490ac7ff8e8ebef417c7eb049a919c29d156ac1c
Broadly, this change:
* Enables sockets to be created via `socket(AF_INET, SOCK_RAW, IPPROTO_ICMP)`.
* Passes the network-layer (IP) header up the stack to the transport endpoint,
which can pass it up to the socket layer. This allows a raw socket to return
the entire IP packet to users.
* Adds functions to stack.TransportProtocol, stack.Stack, stack.transportDemuxer
that enable incoming packets to be delivered to raw endpoints. New raw sockets
of other protocols (not ICMP) just need to register with the stack.
* Enables ping.endpoint to return IP headers when created via SOCK_RAW.
PiperOrigin-RevId: 235993280
Change-Id: I60ed994f5ff18b2cbd79f063a7fdf15d093d845a
Also exposes ipv4.MaxTotalSize since it is a generally useful constant.
PiperOrigin-RevId: 235799755
Change-Id: I1fa8d5294bf355acf5527cfdf274b3687d3c8b13
Nothing reads them and they can simply get stale.
Generated with:
$ sed -i "s/licenses(\(.*\)).*/licenses(\1)/" **/BUILD
PiperOrigin-RevId: 231818945
Change-Id: Ibc3f9838546b7e94f13f217060d31f4ada9d4bf0
This field was added in the intial implementation, before Route existed
to pass the local and remote addresses to the packet-writing path.
Today, the Route's members should be respected. A similar bug was
previously fixed in 214650822.
PiperOrigin-RevId: 219474095
Change-Id: Id2a8ee4421d2841c8d88ccb3c193c455086350ee
This change also adds extensive testing to the p9 package via mocks. The sanity
checks and type checks are moved from the gofer into the core package, where
they can be more easily validated.
PiperOrigin-RevId: 218296768
Change-Id: I4fc3c326e7bf1e0e140a454cbacbcc6fd617ab55
Makes it possible to avoid copying or allocating in cases where DeliverNetworkPacket (rx)
needs to turn around and call WritePacket (tx) with its VectorisedView.
Also removes the restriction on having VectorisedViews with multiple views in the write path.
PiperOrigin-RevId: 211728717
Change-Id: Ie03a65ecb4e28bd15ebdb9c69f05eced18fdfcff
This CL does NDP link-address discovery for IPv6.
It includes several small changes necessary to get linux to talk to
this implementation. In particular, a hop limit of 255 is necessary
for ICMPv6.
PiperOrigin-RevId: 211103930
Change-Id: If25370ab84c6b1decfb15de917f3b0020f2c4e0e
We have been unnecessarily creating too many savable types implicitly.
PiperOrigin-RevId: 206334201
Change-Id: Idc5a3a14bfb7ee125c4f2bb2b1c53164e46f29a8