Commit Graph

76 Commits

Author SHA1 Message Date
gVisor bot cfd30665c1 iptables - filter packets using outgoing interface.
Enables commands with -o (--out-interface) for iptables rules.
$ iptables -A OUTPUT -o eth0 -j ACCEPT

PiperOrigin-RevId: 310642286
2020-05-08 15:44:54 -07:00
Kevin Krakauer 763b5ad596 Add basic incoming ipv4 fragment tests
Based on ipv6's TestReceiveIPv6Fragments.
2020-05-06 22:45:21 -07:00
Nayana Bidari b660f16d18 Support for connection tracking of TCP packets.
Connection tracking is used to track packets in prerouting and
output hooks of iptables. The NAT rules modify the tuples in
connections. The connection tracking code modifies the packets by
looking at the modified tuples.
2020-05-01 16:59:40 -07:00
Kevin Krakauer 5e1e61fbcb Automated rollback of changelist 308674219
PiperOrigin-RevId: 309491861
2020-05-01 16:09:53 -07:00
Bhasker Hariharan ae15d90436 FIFO QDisc implementation
Updates #231

PiperOrigin-RevId: 309323808
2020-04-30 16:41:00 -07:00
gVisor bot 55f0c3316a Automated rollback of changelist 308163542
PiperOrigin-RevId: 308674219
2020-04-27 12:26:32 -07:00
Kevin Krakauer eccae0f77d Remove View.First() and View.RemoveFirst()
These methods let users eaily break the VectorisedView abstraction, and
allowed netstack to slip into pseudo-enforcement of the "all headers are
in the first View" invariant. Removing them and replacing with PullUp(n)
breaks this reliance and will make it easier to add iptables support and
rework network buffer management.

The new View.PullUp(n) method is low cost in the common case, when when
all the headers fit in the first View.

PiperOrigin-RevId: 308163542
2020-04-23 17:28:49 -07:00
gVisor bot 120d3b50f4 Automated rollback of changelist 307477185
PiperOrigin-RevId: 307598974
2020-04-21 07:16:30 -07:00
Kevin Krakauer a551add5d8 Remove View.First() and View.RemoveFirst()
These methods let users eaily break the VectorisedView abstraction, and
allowed netstack to slip into pseudo-enforcement of the "all headers are
in the first View" invariant. Removing them and replacing with PullUp(n)
breaks this reliance and will make it easier to add iptables support and
rework network buffer management.

The new View.PullUp(n) method is low cost in the common case, when when
all the headers fit in the first View.
2020-04-17 13:25:57 -07:00
Bhasker Hariharan fc99a7ebf0 Refactor software GSO code.
Software GSO implementation currently has a complicated code path with
implicit assumptions that all packets to WritePackets carry same Data
and it does this to avoid allocations on the path etc. But this makes it
hard to reuse the WritePackets API.

This change breaks all such assumptions by introducing a new Vectorised
View API ReadToVV which can be used to cleanly split a VV into multiple
independent VVs. Further this change also makes packet buffers linkable
to form an intrusive list. This allows us to get rid of the array of
packet buffers that are passed in the WritePackets API call and replace
it with a list of packet buffers.

While this code does introduce some more allocations in the benchmarks
it doesn't cause any degradation.

Updates #231

PiperOrigin-RevId: 304731742
2020-04-03 18:35:55 -07:00
Nayana Bidari 92b9069b67 Support owner matching for iptables.
This feature will match UID and GID of the packet creator, for locally
generated packets. This match is only valid in the OUTPUT and POSTROUTING
chains. Forwarded packets do not have any socket associated with them.
Packets from kernel threads do have a socket, but usually no owner.
2020-03-26 12:21:24 -07:00
Bhasker Hariharan 7e4073af12 Move tcpip.PacketBuffer and IPTables to stack package.
This is a precursor to be being able to build an intrusive list
of PacketBuffers for use in queuing disciplines being implemented.

Updates #2214

PiperOrigin-RevId: 302677662
2020-03-24 09:06:26 -07:00
Ian Gudger c37b196455 Add support for tearing down protocol dispatchers and TIME_WAIT endpoints.
Protocol dispatchers were previously leaked. Bypassing TIME_WAIT is required to
test this change.

Also fix a race when a socket in SYN-RCVD is closed. This is also required to
test this change.

PiperOrigin-RevId: 296922548
2020-02-24 10:32:17 -08:00
gVisor bot b29aeebaf6 Merge pull request #1683 from kevinGC:ipt-udp-matchers
PiperOrigin-RevId: 293243342
2020-02-04 16:20:16 -08:00
Adin Scannell d29e59af9f Standardize on tools directory.
PiperOrigin-RevId: 291745021
2020-01-27 12:21:00 -08:00
Kevin Krakauer 9143fcd7fd Add UDP matchers. 2020-01-21 14:47:17 -08:00
Kevin Krakauer d51eaa59c0 Merge branch 'iptables-write-input-drop' into iptables-write-filter-proto 2020-01-13 16:06:29 -08:00
Kevin Krakauer 31e49f4b19 Merge branch 'master' into iptables-write-input-drop 2020-01-13 12:22:15 -08:00
Kevin Krakauer d793677cd4 I think INPUT works with protocol 2020-01-10 18:07:15 -08:00
Kevin Krakauer 0999ae8b34 Getting a panic when running tests. For some reason the filter table is
ending up with the wrong chains and is indexing -1 into rules.
2020-01-08 15:57:25 -08:00
Kevin Krakauer b2a881784c Built dead-simple traversal, but now getting depedency cycle error :'( 2020-01-08 14:48:47 -08:00
Tamir Duberstein 9df018767c Remove redundant function argument
PacketLooping is already a member on the passed Route.

PiperOrigin-RevId: 288721500
2020-01-08 10:22:51 -08:00
Kevin Krakauer 1641338b14 Set transport and network headers on outbound packets.
These are necessary for iptables to read and parse headers for packet filtering.

PiperOrigin-RevId: 282372811
2019-11-25 09:37:53 -08:00
Adin Scannell c3b93afeaf Cleanup visibility.
PiperOrigin-RevId: 282194656
2019-11-23 23:54:41 -08:00
Kevin Krakauer 9db08c4e58 Use PacketBuffers with GSO.
PiperOrigin-RevId: 282045221
2019-11-22 14:52:35 -08:00
Kevin Krakauer 3f7d937090 Use PacketBuffers for outgoing packets.
PiperOrigin-RevId: 280455453
2019-11-14 10:15:38 -08:00
Ghanan Gowripalan 0c424ea731 Rename nicid to nicID to follow go-readability initialisms
https://github.com/golang/go/wiki/CodeReviewComments#initialisms

This change does not introduce any new functionality. It just renames variables
from `nicid` to `nicID`.

PiperOrigin-RevId: 278992966
2019-11-06 19:41:25 -08:00
Kevin Krakauer e1b21f3c8c Use PacketBuffers, rather than VectorisedViews, in netstack.
PacketBuffers are analogous to Linux's sk_buff. They hold all information about
a packet, headers, and payload. This is important for:

* iptables to access various headers of packets
* Preventing the clutter of passing different net and link headers along with
  VectorisedViews to packet handling functions.

This change only affects the incoming packet path, and a future change will
change the outgoing path.

Benchmark               Regular         PacketBufferPtr  PacketBufferConcrete
--------------------------------------------------------------------------------
BM_Recvmsg             400.715MB/s      373.676MB/s      396.276MB/s
BM_Sendmsg             361.832MB/s      333.003MB/s      335.571MB/s
BM_Recvfrom            453.336MB/s      393.321MB/s      381.650MB/s
BM_Sendto              378.052MB/s      372.134MB/s      341.342MB/s
BM_SendmsgTCP/0/1k     353.711MB/s      316.216MB/s      322.747MB/s
BM_SendmsgTCP/0/2k     600.681MB/s      588.776MB/s      565.050MB/s
BM_SendmsgTCP/0/4k     995.301MB/s      888.808MB/s      941.888MB/s
BM_SendmsgTCP/0/8k     1.517GB/s        1.274GB/s        1.345GB/s
BM_SendmsgTCP/0/16k    1.872GB/s        1.586GB/s        1.698GB/s
BM_SendmsgTCP/0/32k    1.017GB/s        1.020GB/s        1.133GB/s
BM_SendmsgTCP/0/64k    475.626MB/s      584.587MB/s      627.027MB/s
BM_SendmsgTCP/0/128k   416.371MB/s      503.434MB/s      409.850MB/s
BM_SendmsgTCP/0/256k   323.449MB/s      449.599MB/s      388.852MB/s
BM_SendmsgTCP/0/512k   243.992MB/s      267.676MB/s      314.474MB/s
BM_SendmsgTCP/0/1M     95.138MB/s       95.874MB/s       95.417MB/s
BM_SendmsgTCP/0/2M     96.261MB/s       94.977MB/s       96.005MB/s
BM_SendmsgTCP/0/4M     96.512MB/s       95.978MB/s       95.370MB/s
BM_SendmsgTCP/0/8M     95.603MB/s       95.541MB/s       94.935MB/s
BM_SendmsgTCP/0/16M    94.598MB/s       94.696MB/s       94.521MB/s
BM_SendmsgTCP/0/32M    94.006MB/s       94.671MB/s       94.768MB/s
BM_SendmsgTCP/0/64M    94.133MB/s       94.333MB/s       94.746MB/s
BM_SendmsgTCP/0/128M   93.615MB/s       93.497MB/s       93.573MB/s
BM_SendmsgTCP/0/256M   93.241MB/s       95.100MB/s       93.272MB/s
BM_SendmsgTCP/1/1k     303.644MB/s      316.074MB/s      308.430MB/s
BM_SendmsgTCP/1/2k     537.093MB/s      584.962MB/s      529.020MB/s
BM_SendmsgTCP/1/4k     882.362MB/s      939.087MB/s      892.285MB/s
BM_SendmsgTCP/1/8k     1.272GB/s        1.394GB/s        1.296GB/s
BM_SendmsgTCP/1/16k    1.802GB/s        2.019GB/s        1.830GB/s
BM_SendmsgTCP/1/32k    2.084GB/s        2.173GB/s        2.156GB/s
BM_SendmsgTCP/1/64k    2.515GB/s        2.463GB/s        2.473GB/s
BM_SendmsgTCP/1/128k   2.811GB/s        3.004GB/s        2.946GB/s
BM_SendmsgTCP/1/256k   3.008GB/s        3.159GB/s        3.171GB/s
BM_SendmsgTCP/1/512k   2.980GB/s        3.150GB/s        3.126GB/s
BM_SendmsgTCP/1/1M     2.165GB/s        2.233GB/s        2.163GB/s
BM_SendmsgTCP/1/2M     2.370GB/s        2.219GB/s        2.453GB/s
BM_SendmsgTCP/1/4M     2.005GB/s        2.091GB/s        2.214GB/s
BM_SendmsgTCP/1/8M     2.111GB/s        2.013GB/s        2.109GB/s
BM_SendmsgTCP/1/16M    1.902GB/s        1.868GB/s        1.897GB/s
BM_SendmsgTCP/1/32M    1.655GB/s        1.665GB/s        1.635GB/s
BM_SendmsgTCP/1/64M    1.575GB/s        1.547GB/s        1.575GB/s
BM_SendmsgTCP/1/128M   1.524GB/s        1.584GB/s        1.580GB/s
BM_SendmsgTCP/1/256M   1.579GB/s        1.607GB/s        1.593GB/s

PiperOrigin-RevId: 278940079
2019-11-06 14:25:59 -08:00
Andrei Vagin 8720bd643e netstack/tcp: software segmentation offload
Right now, we send each tcp packet separately, we call one system
call per-packet. This patch allows to generate multiple tcp packets
and send them by sendmmsg.

The arguable part of this CL is a way how to handle multiple headers.
This CL adds the next field to the Prepandable buffer.

Nginx test results:

Server Software:        nginx/1.15.9
Server Hostname:        10.138.0.2
Server Port:            8080

Document Path:          /10m.txt
Document Length:        10485760 bytes

w/o gso:
Concurrency Level:      5
Time taken for tests:   5.491 seconds
Complete requests:      100
Failed requests:        0
Total transferred:      1048600200 bytes
HTML transferred:       1048576000 bytes
Requests per second:    18.21 [#/sec] (mean)
Time per request:       274.525 [ms] (mean)
Time per request:       54.905 [ms] (mean, across all concurrent requests)
Transfer rate:          186508.03 [Kbytes/sec] received

sw-gso:

Concurrency Level:      5
Time taken for tests:   3.852 seconds
Complete requests:      100
Failed requests:        0
Total transferred:      1048600200 bytes
HTML transferred:       1048576000 bytes
Requests per second:    25.96 [#/sec] (mean)
Time per request:       192.576 [ms] (mean)
Time per request:       38.515 [ms] (mean, across all concurrent requests)
Transfer rate:          265874.92 [Kbytes/sec] received

w/o gso:
$ ./tcp_benchmark --client --duration 15  --ideal
[SUM]  0.0-15.1 sec  2.20 GBytes  1.25 Gbits/sec

software gso:
$ tcp_benchmark --client --duration 15  --ideal --gso $((1<<16)) --swgso
[SUM]  0.0-15.1 sec  3.99 GBytes  2.26 Gbits/sec

PiperOrigin-RevId: 276112677
2019-10-22 11:55:56 -07:00
Tamir Duberstein 4e6f3a0c71 Remove restrictions on the sending address
It is quite legal to send from the ANY address (it is required for
DHCP). I can't figure out why the broadcast address was included here,
so removing that as well.

PiperOrigin-RevId: 275541954
2019-10-18 14:10:30 -07:00
Bhasker Hariharan f98c3ee32c Remove panic when reassembly fails.
Reassembly can fail due to an invalid sequence of fragments
being received. eg. Multiple fragments with same id which
claim to be the last one by setting the more flag to 0 etc.
It's safer to just drop the reassembler and increment a metric
than to panic when reassembly fails.

PiperOrigin-RevId: 274920901
2019-10-15 17:04:44 -07:00
gVisor bot bfa0bb24dd Internal change.
PiperOrigin-RevId: 274700093
2019-10-14 17:46:52 -07:00
Bhasker Hariharan c7e901f47a Fix bugs in fragment handling.
Strengthen the header.IPv4.IsValid check to correctly check
for IHL/TotalLength fields. Also add a check to make sure
fragmentOffsets + size of the fragment do not cause a wrap
around for the end of the fragment.

PiperOrigin-RevId: 274049313
2019-10-10 15:14:55 -07:00
Ian Gudger 7c1587e340 Implement IP_TTL.
Also change the default TTL to 64 to match Linux.

PiperOrigin-RevId: 273430341
2019-10-07 19:29:51 -07:00
Chris Kuiper 4874525161 Implement proper local broadcast behavior
The behavior for sending and receiving local broadcast (255.255.255.255)
traffic is as follows:

Outgoing
--------
* A broadcast packet sent on a socket that is bound to an interface goes out
  that interface
* A broadcast packet sent on an unbound socket follows the route table to
  select the outgoing interface
  + if an explicit route entry exists for 255.255.255.255/32, use that one
  + else use the default route
* Broadcast packets are looped back and delivered following the rules for
  incoming packets (see next). This is the same behavior as for multicast
  packets, except that it cannot be disabled via sockopt.

Incoming
--------
* Sockets wishing to receive broadcast packets must bind to either INADDR_ANY
  (0.0.0.0) or INADDR_BROADCAST (255.255.255.255). No other socket receives
  broadcast packets.
* Broadcast packets are multiplexed to all sockets matching it. This is the
  same behavior as for multicast packets.
* A socket can bind to 255.255.255.255:<port> and then receive its own
  broadcast packets sent to 255.255.255.255:<port>

In addition, this change implicitly fixes an issue with multicast reception. If
two sockets want to receive a given multicast stream and one is bound to ANY
while the other is bound to the multicast address, only one of them will
receive the traffic.

PiperOrigin-RevId: 272792377
2019-10-03 19:31:35 -07:00
Kevin Krakauer 59ccbb1044 Remove centralized registration of protocols.
Also removes the need for protocol names.

PiperOrigin-RevId: 271186030
2019-09-25 12:57:05 -07:00
Michael Pratt df5d377521 Remove go_test from go_stateify and go_marshal
They are no-ops, so the standard rule works fine.

PiperOrigin-RevId: 268776264
2019-09-12 15:10:17 -07:00
Ian Gudger fe1f521077 Remove reundant global tcpip.LinkEndpointID.
PiperOrigin-RevId: 267709597
2019-09-06 18:01:14 -07:00
Bhasker Hariharan 3789c34b22 Make UDP traceroute work.
Adds support to generate Port Unreachable messages for UDP
datagrams received on a port for which there is no valid
endpoint.

Fixes #703

PiperOrigin-RevId: 267034418
2019-09-03 16:01:17 -07:00
Tamir Duberstein 573e6e4bba Use tcpip.Subnet in tcpip.Route
This is the first step in replacing some of the redundant types with the
standard library equivalents.

PiperOrigin-RevId: 264706552
2019-08-21 15:31:18 -07:00
Austin Kiekintveld 12c4eb294a Fix ICMPv4 EchoReply packet checksum
The checksum was not being reset before being re-calculated and sent out.
This caused the sent checksum to always be `0x0800`.

Fixes #605.

PiperOrigin-RevId: 260965059
2019-07-31 11:26:41 -07:00
Chris Kuiper 40e682759f Add support for a subnet prefix length on interface network addresses
This allows the user code to add a network address with a subnet prefix length.
The prefix length value is stored in the network endpoint and provided back to
the user in the ProtocolAddress type.

PiperOrigin-RevId: 259807693
2019-07-24 13:42:14 -07:00
gVisor bot 74dc663bbb Internal change.
PiperOrigin-RevId: 258424489
2019-07-16 13:03:37 -07:00
Kevin Krakauer 9b4d3280e1 Add IPPROTO_RAW, which allows raw sockets to write IP headers.
iptables also relies on IPPROTO_RAW in a way. It opens such a socket to
manipulate the kernel's tables, but it doesn't actually use any of the
functionality. Blegh.

PiperOrigin-RevId: 257903078
2019-07-12 18:09:12 -07:00
Adin Scannell add40fd6ad Update canonical repository.
This can be merged after:
https://github.com/google/gvisor-website/pull/77
  or
https://github.com/google/gvisor-website/pull/78

PiperOrigin-RevId: 253132620
2019-06-13 16:50:15 -07:00
Chris Kuiper d18bb4f38a Adjust route when looping multicast packets
Multicast packets are special in that their destination address does not
identify a specific interface. When sending out such a packet the multicast
address is the remote address, but for incoming packets it is the local
address. Hence, when looping a multicast packet, the route needs to be
tweaked to reflect this.

PiperOrigin-RevId: 251739298
2019-06-05 16:08:29 -07:00
Googler cbf6ab9697 Check GSO for nil in WritePacket
Testing:
Unit tests added
PiperOrigin-RevId: 247096269
Change-Id: I849c010eadcb53caf45896a15ef38162d66a9568
2019-05-07 14:57:03 -07:00
Kevin Krakauer ff8ed5e6a5 Fix raw socket behavior and tests.
Some behavior was broken due to the difficulty of running automated raw
socket tests.

Change-Id: I152ca53916bb24a0208f2dc1c4f5bc87f4724ff6
PiperOrigin-RevId: 246747067
2019-05-05 16:07:25 -07:00
Googler f2699b76c8 Support IPv4 fragmentation in netstack
Testing:
Unit tests and also large ping in Fuchsia OS
PiperOrigin-RevId: 246563592
Change-Id: Ia12ab619f64f4be2c8d346ce81341a91724aef95
2019-05-03 13:30:35 -07:00
Michael Pratt 4d52a55201 Change copyright notice to "The gVisor Authors"
Based on the guidelines at
https://opensource.google.com/docs/releasing/authors/.

1. $ rg -l "Google LLC" | xargs sed -i 's/Google LLC.*/The gVisor Authors./'
2. Manual fixup of "Google Inc" references.
3. Add AUTHORS file. Authors may request to be added to this file.
4. Point netstack AUTHORS to gVisor AUTHORS. Drop CONTRIBUTORS.

Fixes #209

PiperOrigin-RevId: 245823212
Change-Id: I64530b24ad021a7d683137459cafc510f5ee1de9
2019-04-29 14:26:23 -07:00