Commit Graph

5303 Commits

Author SHA1 Message Date
Zach Koopmans ba51999fa6 Fix bug with iperf and don't profile runc.
Fix issue with iperf where b.N wasn't changing across runs.
Also, if the given runtime is runc/not given, don't run a profile against it.

PiperOrigin-RevId: 357231450
2021-02-12 11:28:16 -08:00
Andrei Vagin a6d813ad55 tests: getsockname expects that addrlen will be initialized
PiperOrigin-RevId: 357224877
2021-02-12 10:58:17 -08:00
Ayush Ranjan 845d0a65f4 [rack] TLP: ACK Processing and PTO scheduling.
This change implements TLP details enumerated in
https://tools.ietf.org/html/draft-ietf-tcpm-rack-08#section-7.5.3

Fixes #5085

PiperOrigin-RevId: 357125037
2021-02-11 22:06:09 -08:00
Jamie Liu 34614c3986 Unconditionally check for directory-ness in overlay.filesystem.UnlinkAt().
PiperOrigin-RevId: 357106080
2021-02-11 19:10:22 -08:00
Ayush Ranjan 91cf7b3ca4 [netstack] Fix recovery entry and exit checks.
Entry check:

- Earlier implementation was preventing us from entering recovery even if
  SND.UNA is lost but dupAckCount is still below threshold. Fixed that.
- We should only enter recovery when at least one more byte of data beyond the
  highest byte that was outstanding when fast retransmit was last entered is
  acked. Added that check.

Exit check:

- Earlier we were checking if SEG.ACK is in range [SND.UNA, SND.NXT]. The
  intention was to check if any unacknowledged data was ACKed. Note that
  (SEG.ACK - 1) is actually the sequence number which was ACKed. So we were
  incorrectly including (SND.UNA - 1) in the range. Fixed the check to now be
  (SEG.ACK - 1) in range [SND.UNA, SND.NXT).

Additionally, moved a RACK specific test to the rack tests file.
Added tests for the changes I made.

PiperOrigin-RevId: 357091322
2021-02-11 17:19:47 -08:00
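
The corrected exit check in 91cf7b3ca4 can be sketched as below. This is a minimal illustration, not the netstack code: the seqNum type and the inWindow and acksOutstandingData helpers are hypothetical names. It assumes 32-bit sequence numbers compared with modular arithmetic, so the check also holds across wraparound.

```go
package main

import "fmt"

// seqNum is a hypothetical 32-bit TCP sequence number type (an assumption,
// not the netstack type).
type seqNum uint32

// inWindow reports whether v lies in the half-open range [first, last).
// Unsigned subtraction wraps, so the comparison stays correct when the
// range straddles the 2^32 boundary.
func inWindow(v, first, last seqNum) bool {
	return v-first < last-first
}

// acksOutstandingData reports whether segAck acknowledges at least one byte
// of unacknowledged data. The last byte covered by an ACK is segAck-1, so
// that is the value tested against [sndUna, sndNxt).
func acksOutstandingData(segAck, sndUna, sndNxt seqNum) bool {
	return inWindow(segAck-1, sndUna, sndNxt)
}

func main() {
	const sndUna, sndNxt seqNum = 1000, 1500
	for _, ack := range []seqNum{1000, 1001, 1500, 1501} {
		fmt.Printf("SEG.ACK=%d acks outstanding data: %v\n",
			ack, acksOutstandingData(ack, sndUna, sndNxt))
	}
}
```

With SND.UNA = 1000 and SND.NXT = 1500, an ACK of 1000 covers only byte 999, which was already acknowledged, so it is rejected; testing SEG.ACK itself against the closed range [SND.UNA, SND.NXT] would have accepted it.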
gVisor bot 4314bb0b2b Internal change.
PiperOrigin-RevId: 357090170
2021-02-11 17:12:23 -08:00
Kevin Krakauer c39284f457 Let sentry understand tcpip.ErrMalformedHeader
Added a LINT IfChange/ThenChange check to catch this in the future.

PiperOrigin-RevId: 357077564
2021-02-11 16:01:43 -08:00
Toshi Kikuchi 2129dfff61 iptables test: Implement testCase interface on pointers
Implementing interfaces on value types causes the interface to be
implemented by both the value type and the pointer type of the
implementer. This complicates type assertion as it requires the
assertion to check for both the pointer type and the value type.

PiperOrigin-RevId: 357061063
2021-02-11 14:39:41 -08:00
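
The point made in 2129dfff61 can be illustrated with a small sketch. The valueCase and pointerCase types and the describe helper are hypothetical; only the interface name testCase comes from the commit title, and its single Name method is an assumption.

```go
package main

import "fmt"

// testCase is a stand-in for the interface named in the commit title.
type testCase interface{ Name() string }

// valueCase implements testCase on the value receiver, so both valueCase
// and *valueCase satisfy the interface.
type valueCase struct{}

func (valueCase) Name() string { return "value" }

// pointerCase implements testCase on the pointer receiver, so only
// *pointerCase satisfies the interface.
type pointerCase struct{}

func (*pointerCase) Name() string { return "pointer" }

// describe reports the dynamic type held by tc. The value-receiver type
// needs two cases; the pointer-receiver type needs one.
func describe(tc testCase) string {
	switch tc.(type) {
	case valueCase:
		return "valueCase"
	case *valueCase:
		return "*valueCase"
	case *pointerCase:
		return "*pointerCase"
	default:
		return "unknown"
	}
}

func main() {
	fmt.Println(describe(valueCase{}))
	fmt.Println(describe(&valueCase{}))
	fmt.Println(describe(&pointerCase{}))
}
```

A case for the bare pointerCase type would not even compile, because that type does not implement testCase; this is what removes the ambiguity when asserting.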
Jing Chen c833eed80a Implement semtimedop.
PiperOrigin-RevId: 357031904
2021-02-11 12:21:59 -08:00
Kevin Krakauer ae8d966f5a Assign controlling terminal when tty is opened and support NOCTTY
PiperOrigin-RevId: 357015186
2021-02-11 11:09:22 -08:00
Fabricio Voznika 192780946f Allow rt_sigaction in gofer seccomp
rt_sigaction may be called by Go runtime when trying to panic:

https://cs.opensource.google/go/go/+/master:src/runtime/signal_unix.go;drc=ed3e4afa12d655a0c5606bcf3dd4e1cdadcb1476;bpv=1;bpt=1;l=780?q=rt_sigaction&ss=go

Updates #5038

PiperOrigin-RevId: 357013186
2021-02-11 11:01:21 -08:00
Kevin Krakauer 81ea0016e6 Support setgid directories in tmpfs and kernfs
PiperOrigin-RevId: 356868412
2021-02-10 17:45:18 -08:00
Nayana Bidari ff04d019e3 RACK: Fix re-transmitting the segment twice when entering recovery.
TestRACKWithDuplicateACK is flaky as the reorder window can expire before
receiving three duplicate ACKs which will result in sending the first
unacknowledged segment twice: when reorder timer expired and again after
receiving the third duplicate ACK.

This CL will fix this behavior and will not resend the segment again if it was
already re-transmitted when reorder timer expired.

Update the TestRACKWithDuplicateACK to test that the first segment is
considered as lost and is re-transmitted.

PiperOrigin-RevId: 356855168
2021-02-10 16:38:55 -08:00
Andrei Vagin 97a36d1696 Don't allow to umount the namespace root mount
Linux does the same thing.

Reported-by: syzbot+6c79385c930c929d1d9e@syzkaller.appspotmail.com
PiperOrigin-RevId: 356854562
2021-02-10 16:32:45 -08:00
Ayush Ranjan 96d3b3188b Fix broken IFTTT link in tcpip.
PiperOrigin-RevId: 356852625
2021-02-10 16:22:53 -08:00
Zach Koopmans 36e4100a28 Update benchmarks README.md
PiperOrigin-RevId: 356843249
2021-02-10 15:42:48 -08:00
Mithun Iyer 380ede9b73 Retry RST expectation in tcp_synrcvd_reset_test
Deflake this test by retransmitting the ACK and retrying RST
expectation after the supposed state transition to CLOSED.
This gives time for the state transition to complete.

Without such a retransmit from the test, the ACK could get silently
dropped by the listener when the passively connecting endpoint
has not yet completely updated the state (in gVisor this would be
endpoint state and decrement of synRcvdCount).

PiperOrigin-RevId: 356825562
2021-02-10 14:22:16 -08:00
Rahat Mahmood c2f204658e Add proposal for io_uring project.
PiperOrigin-RevId: 356807933
2021-02-10 13:06:42 -08:00
Matt LaPlante 458bf12c13 Internal change.
PiperOrigin-RevId: 356784956
2021-02-10 11:36:15 -08:00
Zach Koopmans 1ac58cc23e Add mitigate command to runsc
PiperOrigin-RevId: 356772367
2021-02-10 10:48:48 -08:00
gVisor bot b9db7db3bd Merge pull request #5267 from lubinszARM:pr_usr_lazy_fp
PiperOrigin-RevId: 356762859
2021-02-10 10:10:17 -08:00
Bhasker Hariharan 298c129cc1 Add support for setting SO_SNDBUF for unix domain sockets.
The limits for snd/rcv buffers for unix domain sockets are controlled by the
following sysctls on Linux

 - net.core.rmem_default
 - net.core.rmem_max
 - net.core.wmem_default
 - net.core.wmem_max

Today in gVisor we do not expose these sysctls but we do support setting the
equivalent in netstack via stack.Options() method. But AF_UNIX sockets in gVisor
can be used without netstack, with hostinet or even without any networking stack
at all, which means ideally these sysctls need to live as globals in gVisor.

But rather than make this a big change, for now we hardcode the limits in the
AF_UNIX implementation itself (which in itself is better than where we were
before, where SO_SNDBUF was hardcoded to 16 KiB). Further, we bump the initial
limit to a default value of 208 KiB to match Linux, up from the paltry 16 KiB we
use today.

Updates #5132

PiperOrigin-RevId: 356665498
2021-02-09 21:55:16 -08:00
Zeling Feng 2de36e44ed Make RPCTimeout for udp_send_recv_dgram to be 500 milliseconds.
The test will sometimes fail on Bind calls using the old RPCTimeout.

PiperOrigin-RevId: 356646668
2021-02-09 19:32:47 -08:00
Dean Deng f6de413c39 Add cleanup TODO for integer-based proc files.
PiperOrigin-RevId: 356645022
2021-02-09 19:18:09 -08:00
Tamir Duberstein c41583fb20 Update and tidy Go modules
PiperOrigin-RevId: 356624256
2021-02-09 16:56:09 -08:00
Fabricio Voznika 0f84ea5afe Fix fd leak from test
PiperOrigin-RevId: 356587965
2021-02-09 14:12:53 -08:00
Ghanan Gowripalan 18e993eb4f Move network internal code to internal package
Utilities written to be common across IPv4/IPv6 are not planned to be
available for public use.

https://golang.org/doc/go1.4#internalpackages

PiperOrigin-RevId: 356554862
2021-02-09 11:52:31 -08:00
Sam Balana d0c0549e60 Deprecate Failed state in favor of Unreachable state
... as per RFC 7048. The Failed state is an internal state that is not
specified by any RFC; replacing it with the Unreachable state enables us to
expose this state while keeping our terminology consistent with RFC 4861 and
RFC 7048.

Unreachable state replaces all internal references for Failed state. However,
unlike the Failed state, change events are dispatched when moving into
Unreachable state. This gives developers insight into whether a neighbor entry
failed address resolution or whether it was explicitly removed.

The Failed state will be removed entirely once all references to it are
removed. This is done to avoid a Fuchsia roll failure.

Updates #4667

PiperOrigin-RevId: 356554104
2021-02-09 11:47:06 -08:00
Tamir Duberstein 2b978d8743 Collapse code that always returns error
PiperOrigin-RevId: 356536548
2021-02-09 10:42:38 -08:00
Andrei Vagin fe4f478960 kernel: reparentLocked has to update children maps of old and new parents
Reported-by: syzbot+9ffc71246fe72c73fc25@syzkaller.appspotmail.com
PiperOrigin-RevId: 356536113
2021-02-09 10:37:04 -08:00
Andrei Vagin d6dbe6e5ca pipe: writeLocked has to return ErrWouldBlock if the pipe is full
PiperOrigin-RevId: 356450303
2021-02-09 01:34:45 -08:00
Julian Elischer 3f802e7180 add IPv4 options processing for forwarding and reassembly
IPv4 forwarding and reassembly need support for option processing,
and regular processing also needs options to be processed before
being passed to the transport layer. This patch extends option processing
to those cases and provides additional testing. A small change to the ICMP
error generation API code was required to allow it to know when a packet was
being forwarded or not.

Updates #4586

PiperOrigin-RevId: 356446681
2021-02-09 01:02:43 -08:00
Ghanan Gowripalan 6671a42d60 Remove unnecessary locking
The thing the lock protects will never be accessed concurrently.

PiperOrigin-RevId: 356423331
2021-02-08 21:41:17 -08:00
Zeling Feng 95500ece56 Allow UDP sockets connect()ing to port 0
We previously returned EINVAL when connecting to port 0; however, this is not the
observed behavior on Linux. One of the observable effects after connecting to
port 0 on Linux is that getpeername() will fail with ENOTCONN.

PiperOrigin-RevId: 356413451
2021-02-08 20:13:17 -08:00
Andrei Vagin bf4968e17d exec: don't panic if an elf file is malformed
Reported-by: syzbot+d54bc27a15aefe52c330@syzkaller.appspotmail.com
PiperOrigin-RevId: 356406975
2021-02-08 19:18:03 -08:00
Ghanan Gowripalan 39251f31cb Support performing DAD for any address
...as long as the network protocol supports duplicate address detection.

This CL provides the facilities for a netstack integrator to perform
DAD.

DHCP recommends that clients effectively perform DAD before accepting an
offer. As per RFC 2131 section 4.4.1 pg 38,

  The client SHOULD perform a check on the suggested address to ensure
  that the address is not already in use.  For example, if the client
  is on a network that supports ARP, the client may issue an ARP request
  for the suggested request.

The implementation of ARP-based IPv4 DAD effectively operates the same
as IPv6's NDP DAD - using ARP requests and responses in place of
NDP neighbour solicitations and advertisements, respectively.

DAD performed by calls to (*Stack).CheckDuplicateAddress doesn't interfere
with DAD performed when a new IPv6 address is added. This is so that
integrator requests to check for duplicate addresses aren't unexpectedly
aborted when addresses are removed.

An internal package under the network package provides protocol-agnostic DAD
state management that protocols supporting DAD can use.

Fixes #4550.

Tests:
  - internal/ip_test.*
  - integration_test.TestDAD
  - arp_test.TestDADARPRequestPacket
  - ipv6.TestCheckDuplicateAddress
PiperOrigin-RevId: 356405593
2021-02-08 19:05:45 -08:00
Ayush Ranjan cfa4633c3d [go-marshal] Add dynamic tag in go_marshal.
This makes it easier to implement dynamically sized types in go-marshal. You
really only need to implement MarshalBytes, UnmarshalBytes and SizeBytes to
implement the entire interface.

By using the `dynamic` tag, the autogenerator will generate the rest of the
methods for us.

This change also simplifies how KernelIPTGetEntries implements Marshallable
using the newly added utility.

PiperOrigin-RevId: 356397114
2021-02-08 18:08:29 -08:00
Ayush Ranjan e51f775cbb [go-marshal] Remove binary package reference from syscalls package.
Fixes a bug in our getsockopt(2) implementation which was incorrectly using
binary.Size() instead of Marshallable.SizeBytes().

PiperOrigin-RevId: 356396551
2021-02-08 18:02:35 -08:00
Nayana Bidari fe63db2e96 RACK: Detect loss
Detect packet loss using reorder window and re-transmit them after the reorder
timer expires.

PiperOrigin-RevId: 356321786
2021-02-08 12:09:54 -08:00
Ghanan Gowripalan 3853a94f10 Remove linkAddrCache
It was replaced by NUD/neighborCache.

Fixes #4658.

PiperOrigin-RevId: 356085221
2021-02-06 21:37:15 -08:00
Ghanan Gowripalan 554c405e87 Synchronously send packets over pipe link endpoint
Before this change, packets were delivered asynchronously to the remote
end of a pipe. This was to avoid a deadlock during link resolution where
the stack would attempt to double-lock a mutex (see removed comments in
the parent commit for details).

As of https://github.com/google/gvisor/commit/4943347137, we do not hold
locks while sending link resolution probes so the deadlock will no
longer occur.

PiperOrigin-RevId: 356066224
2021-02-06 16:58:25 -08:00
Ghanan Gowripalan 11ce8ba992 Use fine grained locks while sending NDP packets
Previously when sending NDP DAD or RS messages, we would hold a shared
lock which led to deadlocks (due to synchronous packet looping
(e.g. pipe and loopback link endpoints)) and lock contention.

Writing packets may be an expensive operation which could prevent other
goroutines from doing meaningful work if a shared lock is held while
writing packets.

This change updates the NDP DAD/RS timers to not hold shared locks while
sending packets.

PiperOrigin-RevId: 356053146
2021-02-06 13:47:37 -08:00
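
The locking pattern described in 11ce8ba992 (read what is needed under the lock, release it, then send) might look like the sketch below. The prober type, its fields, and the write callback are hypothetical and are not the netstack types; a plain sync.Mutex stands in for the shared lock mentioned in the commit.

```go
package main

import (
	"fmt"
	"sync"
)

// prober is a hypothetical component that periodically sends probes.
type prober struct {
	mu     sync.Mutex
	target string
	sent   int
}

// sendProbe snapshots the state it needs while holding the lock, releases
// the lock, and only then writes the packet. A synchronous link endpoint may
// loop the packet straight back into code that takes the same lock, which
// would deadlock if the lock were still held here.
func (p *prober) sendProbe(write func(target string)) {
	p.mu.Lock()
	target := p.target
	p.sent++
	p.mu.Unlock()

	write(target) // The lock is not held during the (possibly slow) write.
}

// handlePacket simulates the receive path, which needs the same lock.
func (p *prober) handlePacket() int {
	p.mu.Lock()
	defer p.mu.Unlock()
	return p.sent
}

func main() {
	p := &prober{target: "fe80::1"}
	// The write callback re-enters the prober, as a loopback endpoint would.
	p.sendProbe(func(target string) {
		fmt.Printf("probe to %s looped back; probes sent so far: %d\n",
			target, p.handlePacket())
	})
}
```

The trade-off is that the state may change between the unlock and the write, so the sender must tolerate acting on a slightly stale snapshot.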
Ghanan Gowripalan c5afaf2854 Remove (*stack.Stack).FindNetworkEndpoint
The network endpoints only look for other network endpoints of the
same kind. Since the network protocols keep track of all endpoints,
go through the protocol to find an endpoint with an address instead
of the stack.

PiperOrigin-RevId: 356051498
2021-02-06 13:25:28 -08:00
Ghanan Gowripalan 4943347137 Use fine grained locks while sending NUD probes
Previously when sending probe messages, we would hold a shared lock
which led to deadlocks (due to synchronous packet looping (e.g. pipe
and loopback link endpoints)) and lock contention.

Writing packets may be an expensive operation which could prevent other
goroutines from doing meaningful work if a shared lock is held while
writing packets.

This change updates the NUD timers to not hold shared locks while
sending packets.

PiperOrigin-RevId: 356048697
2021-02-06 12:44:15 -08:00
Ghanan Gowripalan a83c8585af Use embedded mutex pattern in neighbor cache/entry
Also while I'm here, update neighbor cache/entry tests to use the
stack's RNG instead of creating a neighbor cache/entry specific one.

PiperOrigin-RevId: 356040581
2021-02-06 10:47:28 -08:00
Ghanan Gowripalan 9530f624e9 Unexpose NIC
The NIC structure is not to be used outside of the stack package
directly.

PiperOrigin-RevId: 356036737
2021-02-06 09:49:14 -08:00
Ghanan Gowripalan c19e049f2c Check local address directly through NIC
Network endpoints that wish to check addresses on another NIC-local
network endpoint may now do so through the NetworkInterface.

This fixes a lock ordering issue between NIC removal and link
resolution. Before this change:

  NIC Removal takes the stack lock, neighbor cache lock then neighbor
  entries' locks.

  When performing IPv4 link resolution, we take the entry lock then ARP
  would try to check IPv4 local addresses through the stack which tries to
  obtain the stack's lock.

Now that ARP can check IPv4 addresses through the NIC, we avoid the lock
ordering issue, while also removing the need for stack to lookup the
NIC.

PiperOrigin-RevId: 356034245
2021-02-06 09:09:19 -08:00
Ghanan Gowripalan 83b764d9d2 Batch write packets after iptables checks
After IPTables checks a batch of packets, we can write packets that are
not dropped or locally destined as a batch instead of individually.

This previously caused a bug since WritePacket* functions expect to take
ownership of passed PacketBuffer{List}. WritePackets assumed the list of
PacketBuffers will not be invalidated when calling WritePacket for each
PacketBuffer in the list, but this is not true. WritePacket may add the
passed PacketBuffer into a different list which would modify the
PacketBuffer in such a way that it no longer points to the next
PacketBuffer to write.

Example: Given a PB list of
    PB_a -> PB_b -> PB_c

WritePackets may be iterating over the list and calling WritePacket for
each PB. When WritePacket takes PB_a, it may add it to a new list which
would update pointers such that PB_a no longer points to PB_b.

Test: integration_test.TestIPTableWritePackets
PiperOrigin-RevId: 355969560
2021-02-05 18:44:04 -08:00
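
The list-invalidation hazard described in 83b764d9d2 can be reproduced with a small intrusive list. Everything here is a hypothetical stand-in: packet, list, writePacket, and the two writeAll helpers are not netstack names. Saving the next pointer before handing a packet off is one generic way to avoid this class of bug; the commit itself writes the surviving packets as a batch instead.

```go
package main

import "fmt"

// packet is a hypothetical packet buffer with an intrusive list link.
type packet struct {
	id   string
	next *packet
}

// list is a minimal intrusive singly linked list.
type list struct{ head, tail *packet }

// pushBack appends p and rewrites p.next, as joining a new list must.
func (l *list) pushBack(p *packet) {
	p.next = nil
	if l.tail == nil {
		l.head = p
	} else {
		l.tail.next = p
	}
	l.tail = p
}

// writePacket takes ownership of p by moving it onto queue.
func writePacket(queue *list, p *packet) { queue.pushBack(p) }

// writeAllBuggy reads p.next after ownership has been handed off, so it
// stops after the first packet.
func writeAllBuggy(queue, batch *list) int {
	n := 0
	for p := batch.head; p != nil; p = p.next {
		writePacket(queue, p)
		n++
	}
	return n
}

// writeAllSafe saves the next pointer before handing the packet off.
func writeAllSafe(queue, batch *list) int {
	n := 0
	for p := batch.head; p != nil; {
		next := p.next
		writePacket(queue, p)
		n++
		p = next
	}
	return n
}

// newBatch builds the PB_a -> PB_b -> PB_c list from the commit message.
func newBatch() *list {
	b := &list{}
	for _, id := range []string{"PB_a", "PB_b", "PB_c"} {
		b.pushBack(&packet{id: id})
	}
	return b
}

func main() {
	fmt.Println("buggy wrote:", writeAllBuggy(&list{}, newBatch()))
	fmt.Println("safe wrote:", writeAllSafe(&list{}, newBatch()))
}
```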
Ting-Yu Wang 120c8e3468 Replace TaskFromContext(ctx).Kernel() with KernelFromContext(ctx)
A panic was seen on code paths such as control.ExecAsync, where
ctx does not have a Task.

Reported-by: syzbot+55ce727161cf94a7b7d6@syzkaller.appspotmail.com
PiperOrigin-RevId: 355960596
2021-02-05 17:28:01 -08:00
Ayush Ranjan 09afd68326 [vfs] Handle `.` and `..` as last path component names in kernfs Rename.
According to vfs.FilesystemImpl.RenameAt documentation:

- If the last path component in rp is "." or "..", and opts.Flags contains
  RENAME_NOREPLACE, RenameAt returns EEXIST.
- If the last path component in rp is "." or "..", and opts.Flags does not
  contain RENAME_NOREPLACE, RenameAt returns EBUSY.

Reported-by: syzbot+6189786e64fe13fe43f8@syzkaller.appspotmail.com
PiperOrigin-RevId: 355959266
2021-02-05 17:17:30 -08:00