Commit Graph

2249 Commits

Author SHA1 Message Date
Dean Deng b918d97850 Add reference counting utility to VFS2.
The utility has several differences from the VFS1 equivalent:
- There are no weak references, which have a significant overhead
- In order to print useful debug messages with the type of the reference-
  counted object, we use a generic Refs object with the owner type as a
  template parameter. In vfs1, this was accomplished by storing a type name
  and caller stack directly in the ref count, which increases the
  struct size by 6x. (Note that the caller stack was needed because fs types
  like Dirent were shared by all fs implementations; in vfs2, each impl has
  its own data structures, so this is no longer necessary.)
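
For illustration only, a minimal sketch of a reference counter without weak references. The `Refs` type, its methods, and the `destroy` callback are hypothetical names chosen for this example; this is not the gVisor implementation, and it omits the owner-type debug messages the commit describes.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Refs is a hypothetical minimal reference counter (not the gVisor type).
// The count starts at one, which is owned by the creator.
type Refs struct {
	count int64
}

// NewRefs returns a counter holding a single reference.
func NewRefs() *Refs { return &Refs{count: 1} }

// IncRef takes an additional reference. It panics if the object has
// already been released.
func (r *Refs) IncRef() {
	if atomic.AddInt64(&r.count, 1) <= 1 {
		panic("IncRef on a released object")
	}
}

// DecRef drops one reference and runs destroy when the last one is gone.
func (r *Refs) DecRef(destroy func()) {
	switch n := atomic.AddInt64(&r.count, -1); {
	case n < 0:
		panic("DecRef below zero")
	case n == 0:
		if destroy != nil {
			destroy()
		}
	}
}

// ReadRefs returns the current count, intended for debugging.
func (r *Refs) ReadRefs() int64 { return atomic.LoadInt64(&r.count) }

func main() {
	r := NewRefs()
	r.IncRef()
	r.DecRef(nil)
	r.DecRef(func() { fmt.Println("destroyed") })
}
```

Because there are no weak references, the counter is a single atomic integer and needs no auxiliary bookkeeping.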

Updates #1486.

PiperOrigin-RevId: 325271469
2020-08-06 11:40:03 -07:00
Dean Deng 63447e5afa Only register /dev/net/tun if supported.
PiperOrigin-RevId: 325266487
2020-08-06 11:03:04 -07:00
Ghanan Gowripalan fc4dd3ef45 Join IPv4 all-systems group on NIC enable
Test:
- stack_test.TestJoinLeaveMulticastOnNICEnableDisable
- integration_test.TestIncomingMulticastAndBroadcast
PiperOrigin-RevId: 325185259
2020-08-06 01:32:21 -07:00
Nayana Bidari 35312a95c4 Add loss recovery option for TCP.
/proc/sys/net/ipv4/tcp_recovery is used to enable RACK loss
recovery in TCP.

PiperOrigin-RevId: 325157807
2020-08-05 20:50:06 -07:00
Dean Deng 7ed4b2b5a6 Correctly decrement link counts in tmpfs rename operations.
When a directory is replaced by a rename operation, its link count should
reach zero. We were missing the link from `dir/.`

PiperOrigin-RevId: 325141730
2020-08-05 18:16:57 -07:00
Ghanan Gowripalan 90a2d4e823 Support receiving broadcast IPv4 packets
Test: integration_test.TestIncomingSubnetBroadcast
PiperOrigin-RevId: 325135617
2020-08-05 17:32:54 -07:00
Dean Deng 1403a88c67 Release extra memfd reference.
PiperOrigin-RevId: 325122849
2020-08-05 16:16:32 -07:00
Bhasker Hariharan e7b232a5b8 Prefer RLock over Lock in functions that don't need Lock().
Updates #231

PiperOrigin-RevId: 325097683
2020-08-05 14:11:29 -07:00
Fabricio Voznika 190b1e6bd4 Stop profiling when the sentry exits
Also removes `--profile-goroutine` because it's equivalent
to `debug --stacks`.

PiperOrigin-RevId: 325061502
2020-08-05 11:30:11 -07:00
Dean Deng a2e129b540 Add missing case in tmpfs.inode.direntType.
This was discovered by syzkaller.

PiperOrigin-RevId: 325025193
2020-08-05 08:35:41 -07:00
Nayana Bidari 0e6f7a12c2 Update variables for implementation of RACK in TCP
RACK (Recent Acknowledgement) is a new loss detection
algorithm in TCP. This change adds the fields that must be
stored on each connection to implement the RACK algorithm.

PiperOrigin-RevId: 324948703
2020-08-04 20:59:34 -07:00
Dean Deng 87ee3898f7 Handle EOF in vfs2 sendfile.
Discovered by syzkaller.

PiperOrigin-RevId: 324938438
2020-08-04 19:12:31 -07:00
Fabricio Voznika 102735bfb4 Inline gofer.regularFileFD.pwriteLocked
The Go compiler barely inlines anything, so inline pwriteLocked
by hand since it's called from a single place.

PiperOrigin-RevId: 324937734
2020-08-04 19:05:55 -07:00
Dean Deng b44408b40e Automated rollback of changelist 324906582
PiperOrigin-RevId: 324931854
2020-08-04 18:20:20 -07:00
Ghanan Gowripalan 00993130e5 Use 1 fragmentation component per IP stack
This will help manage memory consumption by IP reassembly when
receiving IP fragments on multiple network endpoints. Previously,
each endpoint would cap memory consumption at 4MB, but with this
change, each IP stack will cap memory consumption at 4MB.

No behaviour changes.

PiperOrigin-RevId: 324913904
2020-08-04 16:27:00 -07:00
Dean Deng 0500f84b6f Add reference counting utility to VFS2.
The utility has several differences from the VFS1 equivalent:
- There are no weak references, which have a significant overhead
- In order to print useful debug messages with the type of the reference-
  counted object, we use a generic Refs object with the owner type as a
  template parameter. In vfs1, this was accomplished by storing a type name
  and caller stack directly in the ref count, which increases the
  struct size by 6x. (Note that the caller stack was needed because fs types
  like Dirent were shared by all fs implementations; in vfs2, each impl has
  its own data structures, so this is no longer necessary.)

As an example, the utility is added to tmpfs.inode.

Updates #1486.

PiperOrigin-RevId: 324906582
2020-08-04 15:48:27 -07:00
gVisor bot d64ba89da3 Internal change.
PiperOrigin-RevId: 324826968
2020-08-04 09:31:11 -07:00
gVisor bot 7142a86a2c Internal change.
PiperOrigin-RevId: 324819246
2020-08-04 08:49:02 -07:00
Andrei Vagin 25798f214c Add callbacks to support lazy loading/restoring thread states
PiperOrigin-RevId: 324748508
2020-08-03 22:08:25 -07:00
Ayush Ranjan ad7c9fc4c3 [vfs2] Implement /sys/devices/system/cpu/cpuX.
Fixes #3364

PiperOrigin-RevId: 324724614
2020-08-03 18:16:54 -07:00
gVisor bot fe441dd251 Internal change.
PiperOrigin-RevId: 324695672
2020-08-03 15:30:30 -07:00
Dean Deng 5626ccf61f Remove old TODO.
Fixes #2920.

PiperOrigin-RevId: 324695118
2020-08-03 15:24:54 -07:00
Nayana Bidari b2ae7ea1bb Plumbing context.Context to DecRef() and Release().
A context is now passed to DecRef() and Release(), which is
needed for the SO_LINGER implementation.

PiperOrigin-RevId: 324672584
2020-08-03 13:36:05 -07:00
gVisor bot ef11bb936b Merge pull request #3460 from zhlhahaha:1927
PiperOrigin-RevId: 324658881
2020-08-03 12:38:15 -07:00
Dean Deng 1fbbc795ef Add inotify events for fallocate and tests for fallocate/sendfile.
Updates #1479, #2923.

PiperOrigin-RevId: 324658826
2020-08-03 12:36:14 -07:00
Howard Zhang b9a49f2065 AARCH64: fix variable name collision with register name
The variable is named g, which collides with the reserved name
for R28. This leads to a bazel build failure on ARM with the following
message:
(register+register) not supported on this architecture

Rename it from g to ptr (following the naming in the golang source
code).

Signed-off-by: Howard Zhang <howard.zhang@arm.com>
2020-08-03 16:38:51 +08:00
gVisor bot d5b31458aa Merge pull request #3300 from lubinszARM:pr_fpsimd_usr
PiperOrigin-RevId: 324309862
2020-07-31 16:48:24 -07:00
Ghanan Gowripalan ade4ff95fc Support fragments from different sources
Prevent fragments with different source-destination pairs from
conflicting with each other.
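
A minimal sketch of the idea, assuming a reassembly map keyed by a struct: including the source and destination in the key keeps fragments from different senders apart even when they share an ID. The `fragmentKey` and `reassembler` names are hypothetical and do not come from this change.

```go
package main

import "fmt"

// fragmentKey is a hypothetical reassembly key. Including the source and
// destination keeps two senders that reuse the same ID apart.
type fragmentKey struct {
	src, dst string
	id       uint32
}

// reassembler groups fragment payloads by key (illustrative only).
type reassembler struct {
	pending map[fragmentKey][][]byte
}

func newReassembler() *reassembler {
	return &reassembler{pending: make(map[fragmentKey][][]byte)}
}

// add stores a fragment and returns how many fragments share its key.
func (r *reassembler) add(k fragmentKey, payload []byte) int {
	r.pending[k] = append(r.pending[k], payload)
	return len(r.pending[k])
}

func main() {
	r := newReassembler()
	a := fragmentKey{src: "10.0.0.1", dst: "10.0.0.3", id: 7}
	b := fragmentKey{src: "10.0.0.2", dst: "10.0.0.3", id: 7}
	fmt.Println(r.add(a, []byte("x")), r.add(b, []byte("y")), r.add(a, []byte("z")))
}
```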

Test:
    - ipv6_test.TestReceiveIPv6Fragments
    - ipv4_test.TestReceiveIPv6Fragments
PiperOrigin-RevId: 324283246
2020-07-31 14:19:49 -07:00
gVisor bot b22c2ab1d7 Merge pull request #3348 from kevinGC:so-orig-dst
PiperOrigin-RevId: 324279280
2020-07-31 14:01:03 -07:00
gVisor bot 8908baaf79 Internal change.
PiperOrigin-RevId: 324259991
2020-07-31 12:25:38 -07:00
Kevin Krakauer 2a7b2a61e3 iptables: support SO_ORIGINAL_DST
Envoy (#170) uses this to get the original destination of redirected
packets.
2020-07-31 10:47:26 -07:00
Dean Deng 68a7da9549 Clean up vfs2 fallocate.
Move to setstat.go and add a FileDescription wrapper method.

PiperOrigin-RevId: 324165277
2020-07-31 00:40:52 -07:00
Mithun Iyer ad8164bb50 Fix TCP CurrentConnected counter updates.
The CurrentConnected counter was incorrectly decremented when closing an
endpoint that had not yet connected.

Fixes #3443

PiperOrigin-RevId: 324155171
2020-07-30 22:49:30 -07:00
gVisor bot 6a4bcbdb28 Merge pull request #3448 from lubinszARM:pr_tls_tests
PiperOrigin-RevId: 324127810
2020-07-30 18:44:17 -07:00
gVisor bot c9515dcca3 Merge pull request #3028 from lubinszARM:pr_kvm_hello1
PiperOrigin-RevId: 324125938
2020-07-30 18:29:32 -07:00
gVisor bot 103a178026 Merge pull request #3179 from jinmouil:fuse_init
PiperOrigin-RevId: 324100220
2020-07-30 15:53:40 -07:00
Bhasker Hariharan f15d5a8d0f Revert change to default buffer size.
In ca6bded95d we reduced the default buffer size to 32KB. This mostly works fine
except at high throughput, where we hit zero window very quickly and the TCP
receive buffer moderation is not able to grow the window. This can be seen in
the benchmarks: with 100 connections downloading a 10MB file, a 32KB buffer
gives about 30 requests/s, whereas a 1MB buffer gives about 53 requests/s.

A proper fix requires a few changes to when we send a zero window as well as
when we decide to send a zero window update. Today we consider available space
below 1MSS as zero and send an update when it crosses 1MSS of available space.
This is way too low and results in the window staying very small once we hit
a zero window condition as we keep sending updates with size barely over 1MSS.

Linux and BSD are smarter about this and use different thresholds. We should
separately update our logic to match Linux or BSD so that we don't send
window updates that are really tiny or wait until we drop below 1MSS to
advertise a zero window.
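
The current behaviour described above can be modelled roughly as follows. This is a simplified sketch, not the netstack code: `advertisedWindow` and `shouldSendWindowUpdate` are hypothetical helpers, and a real implementation would also account for window scaling and buffer limits.

```go
package main

import "fmt"

// advertisedWindow models the rule described in the message: available
// space below one MSS is advertised as a zero window.
func advertisedWindow(avail, mss int) int {
	if avail < mss {
		return 0
	}
	return avail
}

// shouldSendWindowUpdate reports whether available space has just crossed
// the one-MSS threshold, which is when an update would be sent.
func shouldSendWindowUpdate(prevAvail, avail, mss int) bool {
	return prevAvail < mss && avail >= mss
}

func main() {
	const mss = 1460
	// Space grows from below one MSS to just above it: the update that
	// reopens the window advertises barely more than one segment.
	fmt.Println(advertisedWindow(1000, mss), shouldSendWindowUpdate(1000, 1500, mss), advertisedWindow(1500, mss))
}
```

With a threshold this low, the update that reopens the window advertises only slightly more than one segment, which matches the small-window behaviour the message describes.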

PiperOrigin-RevId: 324087019
2020-07-30 14:46:49 -07:00
Ghanan Gowripalan 9960a816a9 Enforce fragment block size and validate args
Allow configuring fragmentation.Fragmentation with a fragment
block size which will be enforced when processing fragments. Also
validate arguments when processing fragments.

Test:
    - fragmentation.TestErrors
    - ipv6_test.TestReceiveIPv6Fragments
    - ipv4_test.TestReceiveIPv6Fragments
PiperOrigin-RevId: 324081521
2020-07-30 14:25:53 -07:00
Sam Balana ab4bb38455 Implement neighbor unreachability detection for ARP and NDP.
This change implements the Neighbor Unreachability Detection (NUD) state
machine, as per RFC 4861 [1]. The state machine operates on a single neighbor
in the local network. This requires the state machine to be implemented on each
entry of the neighbor table.
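
As a rough sketch of such a per-entry state machine, the table below encodes only the state changes named by the TestEntry* tests listed in this commit message (self-transitions omitted). The `nudState` type, the `edges` map, and `canTransition` are hypothetical names for illustration, not the API added by this change.

```go
package main

import "fmt"

// nudState is a hypothetical enumeration of neighbor entry states.
type nudState int

const (
	Unknown nudState = iota
	Incomplete
	Reachable
	Stale
	Delay
	Probe
	Failed
)

// edges lists the state changes named by the tests in the commit message,
// for example TestEntryStaleToDelay and TestEntryProbeToFailed.
var edges = map[nudState][]nudState{
	Unknown:    {Incomplete, Stale},
	Incomplete: {Reachable, Stale, Failed},
	Reachable:  {Stale},
	Stale:      {Delay, Reachable},
	Delay:      {Probe, Reachable, Stale},
	Probe:      {Reachable, Stale, Failed},
}

// canTransition reports whether the table allows moving from one state
// to another.
func canTransition(from, to nudState) bool {
	for _, s := range edges[from] {
		if s == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(canTransition(Stale, Delay), canTransition(Reachable, Probe))
}
```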

This change also adds, but does not expose, several APIs. The first API is for
performing basic operations on the neighbor table:
 - Create a static entry
 - List all entries
 - Delete all entries
 - Remove an entry by address

The second API is used for changing the NUD protocol constants on a per-NIC
basis to allow Neighbor Discovery to operate over links with widely varying
performance characteristics. See [RFC 4861 Section 10][2] for the list of
constants.

Finally, the last API is for allowing users to subscribe to NUD state changes.
See [RFC 4861 Appendix C][3] for the list of edges.

[1]: https://tools.ietf.org/html/rfc4861
[2]: https://tools.ietf.org/html/rfc4861#section-10
[3]: https://tools.ietf.org/html/rfc4861#appendix-C

Tests:
 pkg/tcpip/stack:stack_test
 - TestNeighborCacheAddStaticEntryThenOverflow
 - TestNeighborCacheClear
 - TestNeighborCacheClearThenOverflow
 - TestNeighborCacheConcurrent
 - TestNeighborCacheDuplicateStaticEntryWithDifferentLinkAddress
 - TestNeighborCacheDuplicateStaticEntryWithSameLinkAddress
 - TestNeighborCacheEntry
 - TestNeighborCacheEntryNoLinkAddress
 - TestNeighborCacheGetConfig
 - TestNeighborCacheKeepFrequentlyUsed
 - TestNeighborCacheNotifiesWaker
 - TestNeighborCacheOverflow
 - TestNeighborCacheOverwriteWithStaticEntryThenOverflow
 - TestNeighborCacheRemoveEntry
 - TestNeighborCacheRemoveEntryThenOverflow
 - TestNeighborCacheRemoveStaticEntry
 - TestNeighborCacheRemoveStaticEntryThenOverflow
 - TestNeighborCacheRemoveWaker
 - TestNeighborCacheReplace
 - TestNeighborCacheResolutionFailed
 - TestNeighborCacheResolutionTimeout
 - TestNeighborCacheSetConfig
 - TestNeighborCacheStaticResolution
 - TestEntryAddsAndClearsWakers
 - TestEntryDelayToProbe
 - TestEntryDelayToReachableWhenSolicitedOverrideConfirmation
 - TestEntryDelayToReachableWhenUpperLevelConfirmation
 - TestEntryDelayToStaleWhenConfirmationWithDifferentAddress
 - TestEntryDelayToStaleWhenProbeWithDifferentAddress
 - TestEntryFailedGetsDeleted
 - TestEntryIncompleteToFailed
 - TestEntryIncompleteToIncompleteDoesNotChangeUpdatedAt
 - TestEntryIncompleteToReachable
 - TestEntryIncompleteToReachableWithRouterFlag
 - TestEntryIncompleteToStale
 - TestEntryInitiallyUnknown
 - TestEntryProbeToFailed
 - TestEntryProbeToReachableWhenSolicitedConfirmationWithSameAddress
 - TestEntryProbeToReachableWhenSolicitedOverrideConfirmation
 - TestEntryProbeToStaleWhenConfirmationWithDifferentAddress
 - TestEntryProbeToStaleWhenProbeWithDifferentAddress
 - TestEntryReachableToStaleWhenConfirmationWithDifferentAddress
 - TestEntryReachableToStaleWhenConfirmationWithDifferentAddressAndOverride
 - TestEntryReachableToStaleWhenProbeWithDifferentAddress
 - TestEntryReachableToStaleWhenTimeout
 - TestEntryStaleToDelay
 - TestEntryStaleToReachableWhenSolicitedOverrideConfirmation
 - TestEntryStaleToStaleWhenOverrideConfirmation
 - TestEntryStaleToStaleWhenProbeUpdateAddress
 - TestEntryStaysDelayWhenOverrideConfirmationWithSameAddress
 - TestEntryStaysProbeWhenOverrideConfirmationWithSameAddress
 - TestEntryStaysReachableWhenConfirmationWithRouterFlag
 - TestEntryStaysReachableWhenProbeWithSameAddress
 - TestEntryStaysStaleWhenProbeWithSameAddress
 - TestEntryUnknownToIncomplete
 - TestEntryUnknownToStale
 - TestEntryUnknownToUnknownWhenConfirmationWithUnknownAddress

 pkg/tcpip/stack:stack_x_test
 - TestDefaultNUDConfigurations
 - TestNUDConfigurationFailsForNotSupported
 - TestNUDConfigurationsBaseReachableTime
 - TestNUDConfigurationsDelayFirstProbeTime
 - TestNUDConfigurationsMaxMulticastProbes
 - TestNUDConfigurationsMaxRandomFactor
 - TestNUDConfigurationsMaxUnicastProbes
 - TestNUDConfigurationsMinRandomFactor
 - TestNUDConfigurationsRetransmitTimer
 - TestNUDConfigurationsUnreachableTime
 - TestNUDStateReachableTime
 - TestNUDStateRecomputeReachableTime
 - TestSetNUDConfigurationFailsForBadNICID
 - TestSetNUDConfigurationFailsForNotSupported

Updates #1889
Updates #1894
Updates #1895
Updates #1947
Updates #1948
Updates #1949
Updates #1950

PiperOrigin-RevId: 324070795
2020-07-30 13:30:16 -07:00
Ghanan Gowripalan b00858d075 Use broadcast MAC for broadcast IPv4 packets
When sending packets to a known network's broadcast address, use the
broadcast MAC address.

Test:
- stack_test.TestOutgoingSubnetBroadcast
- udp_test.TestOutgoingSubnetBroadcast
PiperOrigin-RevId: 324062407
2020-07-30 12:50:02 -07:00
Kevin Krakauer bc8201d01b Have dockerutil.Wait* respect the context deadline
PiperOrigin-RevId: 324044634
2020-07-30 11:29:24 -07:00
Dean Deng c43305731e Fix SETOWN_EX return value.
The return value on success should be 0, not the size of the struct copied out.

PiperOrigin-RevId: 324029193
2020-07-30 10:27:44 -07:00
Bin Lu bb25c9611b add usr-tls test cases for Arm64
Signed-off-by: Bin Lu <bin.lu@arm.com>
2020-07-30 03:44:23 -04:00
Jinmou Li 2e19a8b951 Add FUSE_INIT
This change allows the sentry to send a FUSE_INIT request and process
the reply. It adds the corresponding structs, uses the fuse
device to send and read the message, and stores the results of the negotiation
in the corresponding places (inside the connection struct).

It adds a CallAsync() function to the FUSE connection interface:

- Like Call(), but for requests that do not expect an immediate response (init, release, interrupt, etc.).
- Like Call(), it blocks if the connection has not been initialized.
2020-07-29 22:52:12 +00:00
Jamie Liu 4cd4759238 Force registration for EPOLLHUP, not EPOLLRDHUP, in vfs2's epoll.
Compare Linux's fs/eventpoll.c:do_epoll_ctl(). I don't know where EPOLLRDHUP
came from.

PiperOrigin-RevId: 323874419
2020-07-29 14:57:48 -07:00
Bin Lu 267f48ebe2 load/store user fpsimd on Arm64
Full context switch: add fpsimd load/store support for container
applications.

Signed-off-by: Bin Lu <bin.lu@arm.com>
2020-07-29 09:42:07 -04:00
Fabricio Voznika f82dd8ddb4 Redirect TODO to GitHub issues
PiperOrigin-RevId: 323715260
2020-07-28 21:24:26 -07:00
Kevin Krakauer d9c9420335 ip6tables testing
We skip gVisor tests for now, as ip6tables aren't yet implemented.
2020-07-28 10:51:14 -07:00
Jamie Liu 18c2463596 Fix strace for epoll event arrays.
PiperOrigin-RevId: 323491461
2020-07-27 19:27:14 -07:00
gVisor bot b0eafc7454 Merge pull request #3201 from lubinszARM:pr_sys64_2
PiperOrigin-RevId: 323456118
2020-07-27 15:46:33 -07:00