827 Commits

Author SHA1 Message Date
gVisor bot 387730e4ab Merge release-20210301.0-8-g3e69f5d08 (automated) 2021-03-03 20:29:39 +00:00
Bhasker Hariharan 3e69f5d088 Add checklocks analyzer.
This validates that struct fields annotated with "// +checklocks:mu", where
"mu" is a mutex field in the same struct, are only accessed with "mu"
locked.

All types that are guarded by a mutex must be annotated with

// +checklocks:<mutex field name>

For more details please refer to README.md.

PiperOrigin-RevId: 360729328
2021-03-03 12:24:21 -08:00
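As a rough illustration of the annotation described in 3e69f5d088, the sketch below guards one field with a mutex. The type `counter` and its method are hypothetical and only show where the comment goes; the analyzer's exact rules are in the README.md the commit refers to.

```go
package main

import (
	"fmt"
	"sync"
)

// counter is a hypothetical type used only for illustration.
type counter struct {
	mu sync.Mutex

	// n is the current count.
	//
	// +checklocks:mu
	n int
}

// Inc takes mu before touching n, which is the access pattern the
// annotation on n asks for.
func (c *counter) Inc() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.n++
	return c.n
}

func main() {
	var c counter
	c.Inc()
	fmt.Println(c.Inc())
}
```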
gVisor bot 6b785c5e3d Merge release-20210208.0-106-g865ca64ee (automated) 2021-03-01 20:21:56 +00:00
Andrei Vagin 865ca64ee8 tcp: endpoint.Write has to send all data that has been read from payload
io.ReadFull returns the number of bytes copied and an error if fewer
bytes were read.

PiperOrigin-RevId: 360247614
2021-03-01 12:17:20 -08:00
gVisor bot 0c7b403661 Merge release-20210208.0-105-g037bb2f45 (automated) 2021-02-27 04:21:37 +00:00
Bhasker Hariharan 037bb2f45a Fix panic due to zero length writes in TCP.
There is a short race where in Write an endpoint can transition from writable
to non-writable state due to say an incoming RST during the time we release
the endpoint lock and reacquire after copying the payload. In such a case
if the write happens to be a zero sized write we end up trying to call
sendData() even though nothing was queued.

This can panic when trying to enable/disable TCP timers if the endpoint had
already transitioned to a CLOSED/ERROR state due to the incoming RST as we
clean up timers when the protocol goroutine terminates.

Sadly the race window is small enough that my attempts at reproducing the panic
in a syscall test have not been successful.

PiperOrigin-RevId: 359887905
2021-02-26 20:16:48 -08:00
gVisor bot 796fd943e2 Merge release-20210208.0-101-gda2505df9 (automated) 2021-02-26 19:23:18 +00:00
Tamir Duberstein da2505df94 Use closure to avoid manual unlocking
Also increase refcount of raw.endpoint.route while in use.

Avoid allocating an array of size zero.

PiperOrigin-RevId: 359797788
2021-02-26 11:18:30 -08:00
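The closure pattern named in the title of da2505df94 might look like the sketch below. The `endpoint` type, its fields, and `connect` are hypothetical stand-ins, not the raw endpoint from the commit.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// endpoint is a hypothetical type used only for illustration.
type endpoint struct {
	mu   sync.Mutex
	peer string
}

// connect wraps its critical section in a closure so that defer releases
// the lock on every return path, including the early error return.
func (e *endpoint) connect(addr string) error {
	err := func() error {
		e.mu.Lock()
		defer e.mu.Unlock()
		if addr == "" {
			return errors.New("empty address")
		}
		e.peer = addr
		return nil
	}()
	// e.mu is no longer held here, so follow-up work runs unlocked.
	return err
}

// Peer returns the connected address.
func (e *endpoint) Peer() string {
	e.mu.Lock()
	defer e.mu.Unlock()
	return e.peer
}

func main() {
	e := &endpoint{}
	fmt.Println(e.connect(""))
	fmt.Println(e.connect("192.0.2.1:80"), e.Peer())
}
```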
gVisor bot 05e99cbff6 Merge release-20210208.0-99-gf3de211bb (automated) 2021-02-26 00:34:30 +00:00
Nayana Bidari f3de211bb7 RACK: recovery logic should check for receive window before re-transmitting.
Use maybeSendSegment while sending segments in RACK recovery which checks if
the receiver has space and splits the segments when the segment size is
greater than MSS.

PiperOrigin-RevId: 359641097
2021-02-25 16:29:28 -08:00
gVisor bot c35f73b493 Merge release-20210208.0-97-g38c42bbf4 (automated) 2021-02-25 21:40:11 +00:00
Kevin Krakauer 38c42bbf4a Remove deadlock in raw.endpoint caused by recursive read locking
Prevents the following deadlock:
- Raw packet is sent via e.Write(), which read locks e.mu
- Connect() is called, blocking on write locking e.mu
- The packet is routed to loopback and back to e.HandlePacket(), which read
  locks e.mu

Per the sync.RWMutex documentation, this deadlocks:

"If a goroutine holds a RWMutex for reading and another goroutine might call
Lock, no goroutine should expect to be able to acquire a read lock until the
initial read lock is released. In particular, this prohibits recursive read
locking. This is to ensure that the lock eventually becomes available; a blocked
Lock call excludes new readers from acquiring the lock."

Also, release eps.mu earlier in deliverRawPacket.

PiperOrigin-RevId: 359600926
2021-02-25 13:35:44 -08:00
gVisor bot 40dd18cd67 Merge release-20210208.0-52-g845d0a65f (automated) 2021-02-12 06:10:43 +00:00
Ayush Ranjan 845d0a65f4 [rack] TLP: ACK Processing and PTO scheduling.
This change implements TLP details enumerated in
https://tools.ietf.org/html/draft-ietf-tcpm-rack-08#section-7.5.3

Fixes #5085

PiperOrigin-RevId: 357125037
2021-02-11 22:06:09 -08:00
gVisor bot a5c96a9f60 Merge release-20210201.0-91-g91cf7b3ca (automated) 2021-02-12 01:27:06 +00:00
Ayush Ranjan 91cf7b3ca4 [netstack] Fix recovery entry and exit checks.
Entry check:

- Earlier implementation was preventing us from entering recovery even if
  SND.UNA is lost but dupAckCount is still below threshold. Fixed that.
- We should only enter recovery when at least one more byte of data beyond the
  highest byte that was outstanding when fast retransmit was last entered is
  acked. Added that check.

Exit check:

- Earlier we were checking if SEG.ACK is in range [SND.UNA, SND.NXT]. The
  intention was to check if any unacknowledged data was ACKed. Note that
  (SEG.ACK - 1) is actually the sequence number which was ACKed. So we were
  incorrectly including (SND.UNA - 1) in the range. Fixed the check to now be
  (SEG.ACK - 1) in range [SND.UNA, SND.NXT).

Additionally, moved a RACK specific test to the rack tests file.
Added tests for the changes I made.

PiperOrigin-RevId: 357091322
2021-02-11 17:19:47 -08:00
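The corrected exit check from 91cf7b3ca4, (SEG.ACK - 1) in range [SND.UNA, SND.NXT), can be written with wraparound-safe sequence arithmetic. The `seqnum` type and function names below are assumptions for this sketch, not the netstack identifiers.

```go
package main

import "fmt"

// seqnum is a 32-bit TCP sequence number; subtraction wraps modulo 2^32.
type seqnum uint32

// inRange reports whether v lies in the half-open range [a, b),
// treating the sequence space as circular.
func (v seqnum) inRange(a, b seqnum) bool {
	return v-a < b-a
}

// ackedNewData applies the check (SEG.ACK - 1) in [SND.UNA, SND.NXT).
func ackedNewData(segAck, sndUna, sndNxt seqnum) bool {
	return (segAck - 1).inRange(sndUna, sndNxt)
}

func main() {
	// SEG.ACK == SND.UNA acknowledges nothing new.
	fmt.Println(ackedNewData(100, 100, 200))
	// SEG.ACK == SND.UNA + 1 acknowledges the byte at SND.UNA.
	fmt.Println(ackedNewData(101, 100, 200))
}
```

With the earlier closed range [SND.UNA, SND.NXT] applied to SEG.ACK directly, the first call would have been accepted even though no outstanding byte was acknowledged.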
gVisor bot 9994360861 Merge release-20210201.0-83-gff04d019e (automated) 2021-02-11 00:43:48 +00:00
Nayana Bidari ff04d019e3 RACK: Fix re-transmitting the segment twice when entering recovery.
TestRACKWithDuplicateACK is flaky as the reorder window can expire before
receiving three duplicate ACKs which will result in sending the first
unacknowledged segment twice: when reorder timer expired and again after
receiving the third duplicate ACK.

This CL will fix this behavior and will not resend the segment again if it was
already re-transmitted when reorder timer expired.

Update the TestRACKWithDuplicateACK to test that the first segment is
considered as lost and is re-transmitted.

PiperOrigin-RevId: 356855168
2021-02-10 16:38:55 -08:00
gVisor bot 549583ee9f Merge release-20210201.0-72-g298c129cc (automated) 2021-02-10 06:00:13 +00:00
Bhasker Hariharan 298c129cc1 Add support for setting SO_SNDBUF for unix domain sockets.
The limits for snd/rcv buffers for unix domain sockets are controlled by the
following sysctls on Linux

 - net.core.rmem_default
 - net.core.rmem_max
 - net.core.wmem_default
 - net.core.wmem_max

Today in gVisor we do not expose these sysctls but we do support setting the
equivalent in netstack via stack.Options() method. But AF_UNIX sockets in gVisor
can be used without netstack, with hostinet or even without any networking stack
at all, which means that ideally these sysctls need to live as globals in gVisor.

But rather than make this a big change, for now we hardcode the limits in the
AF_UNIX implementation itself, which is still better than before, when
SO_SNDBUF was hardcoded to 16 KiB. Further, we bump the initial limit to a
default value of 208 KiB to match Linux, up from the paltry 16 KiB we use
today.

Updates #5132

PiperOrigin-RevId: 356665498
2021-02-09 21:55:16 -08:00
gVisor bot ccf8cbfe80 Merge release-20210201.0-60-g95500ece5 (automated) 2021-02-09 04:18:27 +00:00
Zeling Feng 95500ece56 Allow UDP sockets connect()ing to port 0
We previously returned EINVAL when connecting to port 0, however this is not the
observed behavior on Linux. One of the observable effects after connecting to
port 0 on Linux is that getpeername() will fail with ENOTCONN.

PiperOrigin-RevId: 356413451
2021-02-08 20:13:17 -08:00
gVisor bot d2f0aebf95 Merge release-20210201.0-55-gfe63db2e9 (automated) 2021-02-08 20:15:57 +00:00
Nayana Bidari fe63db2e96 RACK: Detect loss
Detect packet loss using reorder window and re-transmit them after the reorder
timer expires.

PiperOrigin-RevId: 356321786
2021-02-08 12:09:54 -08:00
gVisor bot 939f5cc51a Merge release-20210125.0-74-ge3bce9689 (automated) 2021-02-03 19:14:39 +00:00
Nayana Bidari e3bce9689f Add a function to enable RACK in tests.
- Adds a function to enable RACK in tests.
- RACK update functions are guarded behind the flag tcpRecovery.

PiperOrigin-RevId: 355435973
2021-02-03 11:09:23 -08:00
gVisor bot ef8d37b838 Merge release-20210125.0-63-g49f783fb6 (automated) 2021-02-02 21:37:30 +00:00
Nayana Bidari 49f783fb65 Rename HandleNDupAcks in TCP.
Rename HandleNDupAcks() to HandleLossDetected() as it will be entered when loss
is detected after:
- reorder window expires and TLP (in case of RACK)
- dupAckCount >= 3

PiperOrigin-RevId: 355237858
2021-02-02 13:21:40 -08:00
gVisor bot 831751d3d1 Merge release-20210125.0-58-g8c7c5abaf (automated) 2021-02-02 19:20:47 +00:00
Bhasker Hariharan 8c7c5abafb Add support for rate limiting out of window ACKs.
Netstack today will send dupACKs with no rate limit for incoming out of
window segments. This can result in ACK loops, for example if a TCP socket
connects to itself (actually permitted by TCP), where the ACK sent in
response to packets being out of order itself gets considered as an out
of window segment, resulting in another ACK being generated.

PiperOrigin-RevId: 355206877
2021-02-02 11:05:28 -08:00
gVisor bot e82d147017 Merge release-20210125.0-47-gebd3912c0 (automated) 2021-02-01 20:21:37 +00:00
Ghanan Gowripalan ebd3912c0f Refactor HandleControlPacket/SockError
...to remove the need for the transport layer to deduce the type of
error it received.

Rename HandleControlPacket to HandleError as HandleControlPacket only
handles errors.

tcpip.SockError now holds a tcpip.SockErrorCause interface that
different errors can implement.

PiperOrigin-RevId: 354994306
2021-02-01 12:04:03 -08:00
Ghanan Gowripalan daeb06d2cb Hide neighbor table kind from NetworkEndpoint
The network endpoint should not need to have logic to handle different
kinds of neighbor tables. Network endpoints can let the NIC know about
different neighbor discovery messages and let the NIC decide which table
to update.

This allows us to remove the LinkAddressCache interface.

PiperOrigin-RevId: 354812584
2021-01-31 10:03:46 -08:00
gVisor bot 95a799b374 Merge release-20210125.0-24-gff4fc4278 (automated) 2021-01-29 04:25:45 +00:00
Nayana Bidari ff4fc42784 RACK: Update reorder window.
After receiving an ACK (cumulative or selective), RACK will update the reorder
window which is used as a settling time before marking the packet as lost.
This change will add an init function to initialize the variables in RACK and
also store the reference to sender in rackControl.
The reorder window is calculated as per rfc:
https://tools.ietf.org/html/draft-ietf-tcpm-rack-08#section-7.2 Step 4.

PiperOrigin-RevId: 354453528
2021-01-28 20:08:23 -08:00
gVisor bot ed0a3c9243 Merge release-20210125.0-21-g8d1afb418 (automated) 2021-01-29 02:16:39 +00:00
Tamir Duberstein 8d1afb4185 Change tcpip.Error to an interface
This makes it possible to add data to types that implement tcpip.Error.
ErrBadLinkEndpoint is removed as it is unused.

PiperOrigin-RevId: 354437314
2021-01-28 17:59:58 -08:00
gVisor bot 8226803c10 Merge release-20210125.0-12-g6012fe9b5 (automated) 2021-01-28 14:42:15 +00:00
Marina Ciocea 6012fe9b59 Respect SO_BINDTODEVICE in unconnected UDP writes
Previously, sending on an unconnected UDP socket would ignore the
SO_BINDTODEVICE option. Send on the configured interface when a UDP socket
is bound to an interface through setsockopt SO_BINDTODEVICE.

Add packetimpact tests exercising UDP reads and writes with every combination
of bound/unbound, broadcast/multicast/unicast destination, and bound/not-bound
to device.

PiperOrigin-RevId: 354299670
2021-01-28 06:24:46 -08:00
gVisor bot d34fe8d385 Merge release-20210125.0-11-gb85b23e50 (automated) 2021-01-28 03:23:43 +00:00
Ghanan Gowripalan b85b23e50d Confirm neighbor reachability with TCP ACKs
As per RFC 4861 section 7.3.1,
  A neighbor is considered reachable if the node has recently received
  a confirmation that packets sent recently to the neighbor were
  received by its IP layer. Positive confirmation can be gathered in
  two ways: hints from upper-layer protocols that indicate a connection
  is making "forward progress", or receipt of a Neighbor Advertisement
  message that is a response to a Neighbor Solicitation message.

This change adds support for TCP to let the IP/link layers know that a
neighbor is reachable.

Test: integration_test.TestTCPConfirmNeighborReachability
PiperOrigin-RevId: 354222833
2021-01-27 19:08:51 -08:00
gVisor bot b161e5d2a3 Merge release-20210112.0-104-g99988e45e (automated) 2021-01-28 00:27:45 +00:00
Nayana Bidari 99988e45ed Add support for more fields in netstack for TCP_INFO
This CL adds support for the following fields:
- RTT, RTTVar, RTO
- send congestion window (sndCwnd) and send slow start threshold (sndSsthresh)
- congestion control state (CaState)
- ReorderSeen

PiperOrigin-RevId: 354195361
2021-01-27 16:14:50 -08:00
gVisor bot 14ee229ad6 Merge release-20210112.0-98-g8e6604474 (automated) 2021-01-27 02:19:18 +00:00
Nayana Bidari 8e66044741 Initialize the send buffer handler in endpoint creation.
- This CL initializes the function handler used for getting the send buffer
size limits during endpoint creation, so the caller of SetSendBufferSize(..)
does not need to know the endpoint type (tcp/udp/..)

PiperOrigin-RevId: 353992634
2021-01-26 18:05:29 -08:00
gVisor bot 7edc1ce12f Merge release-20210112.0-96-gce39f8298 (automated) 2021-01-26 21:26:00 +00:00
Tamir Duberstein ce39f82985 Implement error on pointers
This improves type-assertion safety.

PiperOrigin-RevId: 353931228
2021-01-26 13:03:40 -08:00
gVisor bot fcb89d2d29 Merge release-20210112.0-95-ga90661654 (automated) 2021-01-26 20:48:14 +00:00
Bhasker Hariharan a90661654d Fix a couple of potential route leaks.
connect() can be invoked multiple times on UDP/RAW sockets and in such
a case we should release the cached route from the previous connect.

Fixes #5359

PiperOrigin-RevId: 353919891
2021-01-26 12:09:10 -08:00
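The leak described in a90661654d comes down to replacing a cached, reference-counted route without releasing the old one. The sketch below uses a hypothetical `route` type with a plain counter to show the release-on-reconnect step; it is not the stack's route implementation.

```go
package main

import "fmt"

// route is a hypothetical reference-counted route.
type route struct {
	dest string
	refs int
}

// release drops one reference to the route.
func (r *route) release() { r.refs-- }

// endpoint caches the route chosen by the most recent connect.
type endpoint struct {
	route *route
}

// connect may be called repeatedly; each call releases the route cached
// by the previous call before storing the new one.
func (e *endpoint) connect(dest string) {
	// Stands in for a route lookup that returns an owned reference.
	r := &route{dest: dest, refs: 1}
	if e.route != nil {
		e.route.release()
	}
	e.route = r
}

func main() {
	e := &endpoint{}
	e.connect("192.0.2.1")
	old := e.route
	e.connect("192.0.2.2")
	fmt.Println(old.refs, e.route.refs)
}
```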
gVisor bot 270d619546 Merge release-20210112.0-92-g9ba24d449 (automated) 2021-01-26 19:04:34 +00:00