1936 Commits

Author SHA1 Message Date
Zeling Feng 5f3eeb4728 Test that we have PAWS mechanism
If there is a Timestamps option in the arriving segment and SEG.TSval
< TS.Recent and if TS.Recent is valid, then treat the arriving segment
as not acceptable: Send an acknowledgement in reply as specified in
RFC-793 page 69 and drop the segment.

https://tools.ietf.org/html/rfc1323#page-19
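
The check quoted above can be sketched in a few lines of Go. This is a minimal illustration, not the code under test: the function name and parameters are hypothetical, and the comparison assumes the usual modulo-2^32 ordering of TCP timestamps.

```go
package main

import "fmt"

// pawsReject reports whether an arriving segment fails the PAWS check:
// it carries a Timestamps option, TS.Recent is valid, and SEG.TSval is
// older than TS.Recent. All names here are hypothetical.
func pawsReject(hasTS bool, segTSval, tsRecent uint32, tsRecentValid bool) bool {
	if !hasTS || !tsRecentValid {
		return false
	}
	// The signed difference keeps the comparison correct across
	// wraparound of the 32-bit timestamp clock.
	return int32(segTSval-tsRecent) < 0
}

func main() {
	fmt.Println(pawsReject(true, 100, 200, true))      // older timestamp
	fmt.Println(pawsReject(true, 200, 100, true))      // newer timestamp
	fmt.Println(pawsReject(true, 5, 0xfffffff0, true)) // newer, after wraparound
	fmt.Println(pawsReject(true, 100, 200, false))     // TS.Recent not valid
}
```

A segment for which pawsReject returns true would be answered with an ACK and dropped, as the text above describes.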

PiperOrigin-RevId: 312590678
2020-05-20 17:53:35 -07:00
gVisor bot a338eed1d8 Internal change.
PiperOrigin-RevId: 312559963
2020-05-20 14:57:59 -07:00
Dean Deng 76369b6480 Move fsimpl/host file offset from inode to fileDescription.
PiperOrigin-RevId: 312559861
2020-05-20 14:53:51 -07:00
gVisor bot 6efce83821 Merge pull request #2688 from lubinszARM:pr_goid
PiperOrigin-RevId: 312524376
2020-05-20 11:48:12 -07:00
Dean Deng 05c89af6ed Implement mmap for host fs in vfs2.
In VFS1, both fs/host and fs/gofer used the same utils for host file mappings.
Refactor parts of fsimpl/gofer to create similar utils to share with
fsimpl/host (memory accounting code moved to fsutil, page rounding arithmetic
moved to usermem).

Updates #1476.

PiperOrigin-RevId: 312345090
2020-05-19 13:46:42 -07:00
Dean Deng d06de1bede Fix flaky udp tests by polling before reading.
On native Linux, calling recv/read right after send/write sometimes returns
EWOULDBLOCK, if the data has not made it to the receiving socket (even though
the endpoints are on the same host). Poll before reading to avoid this.

Making this change also uncovered a hostinet bug (gvisor.dev/issue/2726),
which is noted in this CL.

PiperOrigin-RevId: 312320587
2020-05-19 11:41:52 -07:00
gVisor bot 5823629442 Merge pull request #2687 from lubinszARM:pr_tls_1
PiperOrigin-RevId: 312299234
2020-05-19 09:55:20 -07:00
Fabricio Voznika 20e6efd302 Remove IfChange/ThenChange lint from VFS2
As new functionality is added to VFS2, corresponding files in VFS1
don't need to be changed.

PiperOrigin-RevId: 312153799
2020-05-18 14:26:09 -07:00
Adin Scannell 420b791a3d Minor formatting updates for gvisor.dev.
* Aggregate architecture Overview in "What is gVisor?" as it makes more sense
  in one place.

* Drop "user-space kernel" and use "application kernel". The term "user-space
  kernel" is confusing when some platform implementations do not run in
  user-space (instead running in guest ring zero).

* Clear up the relationship between the Platform page in the user guide and the
  Platform page in the architecture guide, and ensure they are cross-linked.

* Restore the call-to-action quick start link in the main page, and drop the
  GitHub link (which also appears in the top-right).

* Improve image formatting by centering all doc and blog images, and move the
  image captions to the alt text.

PiperOrigin-RevId: 311845158
2020-05-15 20:05:18 -07:00
Bhasker Hariharan 679fd2527b Remove debug log left behind by mistake.
PiperOrigin-RevId: 311808460
2020-05-15 15:06:08 -07:00
Jamie Liu fb7e5f1676 Make utimes_test pass on VFS2.
PiperOrigin-RevId: 311657502
2020-05-14 20:09:55 -07:00
Nicolas Lacasse 47dfba7661 Port memfd_create to vfs2 and finish implementation of file seals.
Closes #2612.

PiperOrigin-RevId: 311548074
2020-05-14 09:35:54 -07:00
Mithun Iyer f1ad2d54ab Fix TCP segment retransmit timeout handling.
As per RFC 1122 and Linux retransmit timeout handling:
- The segment retransmit timeout needs to exponentially increase and
  cap at a predefined value.
- TCP connection needs to timeout after a predefined number of
  segment retransmissions.
- TCP connection should not time out when the retransmission timeout
  exceeds MaxRTO, the predefined upper bound.
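
The three rules above can be sketched as follows. This is an illustration under stated assumptions, not the netstack implementation: the type, the method, and the limit values (minRTO, maxRTO, maxRetries) are all hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// Hypothetical limits, chosen only for illustration.
const (
	minRTO     = 200 * time.Millisecond
	maxRTO     = 120 * time.Second
	maxRetries = 15
)

// retransmitTimer tracks the backoff state of one connection.
type retransmitTimer struct {
	rto     time.Duration
	retries int
}

// onTimeout is called when the retransmit timer fires. It returns the next
// timeout to arm and whether the connection is still alive. The timeout
// doubles on every expiry and is capped at maxRTO. Reaching the cap does
// not abort the connection; only exceeding the retry count does.
func (r *retransmitTimer) onTimeout() (next time.Duration, alive bool) {
	r.retries++
	if r.retries > maxRetries {
		return 0, false
	}
	r.rto *= 2
	if r.rto > maxRTO {
		r.rto = maxRTO
	}
	return r.rto, true
}

func main() {
	t := retransmitTimer{rto: minRTO}
	for {
		next, alive := t.onTimeout()
		if !alive {
			fmt.Printf("giving up after %d retransmissions\n", maxRetries)
			return
		}
		fmt.Printf("retry %d: next timeout %v\n", t.retries, next)
	}
}
```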

Fixes #2673

PiperOrigin-RevId: 311463961
2020-05-13 21:26:54 -07:00
Bhasker Hariharan 8b8774d715 Stub support for TCP_SYNCNT and TCP_WINDOW_CLAMP.
This change adds support for TCP_SYNCNT and TCP_WINDOW_CLAMP options
in GetSockOpt/SetSockOpt. This change does not really change any
behaviour in Netstack and only stores/returns the stored value.

Actual honoring of these options will be added as required.
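
A stub of this kind might look like the sketch below: values are stored and returned, and nothing else in the stack reads them. The types and method names are hypothetical stand-ins, not the Netstack API.

```go
package main

import (
	"fmt"
	"sync"
)

// sockOpt identifies a socket option. The constant names echo the Linux
// options, but the type and its values are hypothetical.
type sockOpt int

const (
	tcpSynCnt sockOpt = iota
	tcpWindowClamp
)

// endpoint stores option values without acting on them.
type endpoint struct {
	mu   sync.Mutex
	opts map[sockOpt]int
}

func newEndpoint() *endpoint {
	return &endpoint{opts: make(map[sockOpt]int)}
}

// setSockOptInt records the value for later retrieval.
func (e *endpoint) setSockOptInt(opt sockOpt, v int) {
	e.mu.Lock()
	defer e.mu.Unlock()
	e.opts[opt] = v
}

// getSockOptInt returns the stored value, or zero if none was set.
func (e *endpoint) getSockOptInt(opt sockOpt) int {
	e.mu.Lock()
	defer e.mu.Unlock()
	return e.opts[opt]
}

func main() {
	ep := newEndpoint()
	ep.setSockOptInt(tcpSynCnt, 3)
	fmt.Println(ep.getSockOptInt(tcpSynCnt), ep.getSockOptInt(tcpWindowClamp))
}
```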

Fixes #2626, #2625

PiperOrigin-RevId: 311453777
2020-05-13 19:49:09 -07:00
Nicolas Lacasse db655f020e Resolve remaining TODOs for tmpfs.
Closes #1197

PiperOrigin-RevId: 311438223
2020-05-13 17:36:37 -07:00
Bhasker Hariharan 8605c97136 Automated rollback of changelist 311285868
PiperOrigin-RevId: 311424257
2020-05-13 16:13:37 -07:00
Jamie Liu d846077628 Enable overlayfs_stale_read by default for runsc.
Linux 4.18 and later make reads and writes coherent between pre-copy-up and
post-copy-up FDs representing the same file on an overlay filesystem. However,
memory mappings remain incoherent:

- Documentation/filesystems/overlayfs.rst, "Non-standard behavior": "If a file
  residing on a lower layer is opened for read-only and then memory mapped with
  MAP_SHARED, then subsequent changes to the file are not reflected in the
  memory mapping."

- fs/overlayfs/file.c:ovl_mmap() passes through to the underlying FD without any
  management of coherence in the overlay.

- Experimentally on Linux 5.2:

```
$ cat mmap_cat_page.c
#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(int argc, char **argv) {
  if (argc < 2) {
    errx(1, "syntax: %s [FILE]", argv[0]);
  }
  const int fd = open(argv[1], O_RDONLY);
  if (fd < 0) {
    err(1, "open(%s)", argv[1]);
  }
  const size_t page_size = sysconf(_SC_PAGE_SIZE);
  void* page = mmap(NULL, page_size, PROT_READ, MAP_SHARED, fd, 0);
  if (page == MAP_FAILED) {
    err(1, "mmap");
  }
  for (;;) {
    write(1, page, strnlen(page, page_size));
    if (getc(stdin) == EOF) {
      break;
    }
  }
  return 0;
}

$ gcc -O2 -o mmap_cat_page mmap_cat_page.c
$ mkdir lowerdir upperdir workdir overlaydir
$ echo old > lowerdir/file
$ sudo mount -t overlay -o "lowerdir=lowerdir,upperdir=upperdir,workdir=workdir" none overlaydir
$ ./mmap_cat_page overlaydir/file
old
^Z
[1]+  Stopped                 ./mmap_cat_page overlaydir/file
$ echo new > overlaydir/file
$ cat overlaydir/file
new
$ fg
./mmap_cat_page overlaydir/file

old
```

Therefore, while the VFS1 gofer client's behavior of reopening read FDs is only
necessary pre-4.18, replacing existing memory mappings (in both sentry and
application address spaces) with mappings of the new FD is required regardless
of kernel version, and this latter behavior is common to both VFS1 and VFS2.
Re-document accordingly, and change the runsc flag to enabled by default.

New test:
- Before this CL: https://source.cloud.google.com/results/invocations/5b222d2c-e918-4bae-afc4-407f5bac509b
- After this CL: https://source.cloud.google.com/results/invocations/f28c747e-d89c-4d8c-a461-602b33e71aab

PiperOrigin-RevId: 311361267
2020-05-13 10:53:37 -07:00
Bin Lu ba27514083 add arm64 support to goid
Adding a method to get g on Arm64

Signed-off-by: Bin Lu <bin.lu@arm.com>
2020-05-13 04:35:47 -04:00
Bin Lu a19c8f0b92 adding the methods to get/set TLS for Arm64 kvm platform
Signed-off-by: Bin Lu <bin.lu@arm.com>
2020-05-13 04:23:35 -04:00
Ian Gudger e4058c0355 Replace test_runner.sh bash script with Go.
PiperOrigin-RevId: 311285868
2020-05-13 01:22:42 -07:00
gVisor bot 725afc6f25 Merge pull request #2678 from nybidari:iptables
PiperOrigin-RevId: 311203776
2020-05-12 14:37:00 -07:00
Nicolas Lacasse 7b691ab73c Don't allow rename across different gofer or tmpfs mounts.
Fixes #2651.

PiperOrigin-RevId: 311193661
2020-05-12 13:43:48 -07:00
gVisor bot 6a4466a46c Merge pull request #2671 from kevinGC:skip-output
PiperOrigin-RevId: 311181084
2020-05-12 12:39:03 -07:00
Jamie Liu 8dd1d5b75a Don't call kernel.Task.Block() from netstack.SocketOperations.Write().
kernel.Task.Block() requires that the caller is running on the task goroutine.
netstack.SocketOperations.Write() uses kernel.TaskFromContext() to call
kernel.Task.Block() even if it's not running on the task goroutine. Stop doing
that.

PiperOrigin-RevId: 311178335
2020-05-12 12:26:05 -07:00
Nayana Bidari 27b1f19cab iptables: support gid match for owner matching.
- Added support for matching gid owner and invert flag for uid
and gid.
$ iptables -A OUTPUT -p tcp -m owner --gid-owner root -j ACCEPT
$ iptables -A OUTPUT -p tcp -m owner ! --uid-owner root -j ACCEPT
$ iptables -A OUTPUT -p tcp -m owner ! --gid-owner root -j DROP

- Added tests for uid, gid and invert flags.
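
The uid/gid comparison with invert flags can be sketched like this. The struct and its fields are hypothetical and much simpler than a real iptables matcher; the sketch only shows how an invert flag flips each comparison.

```go
package main

import "fmt"

// ownerMatcher is a hypothetical, simplified owner match.
type ownerMatcher struct {
	matchUID, matchGID   bool
	uid, gid             uint32
	invertUID, invertGID bool
}

// matches reports whether a packet owned by (pktUID, pktGID) satisfies the
// rule. Each enabled comparison must hold, with its result flipped when the
// corresponding invert flag ("!") is set.
func (m ownerMatcher) matches(pktUID, pktGID uint32) bool {
	if m.matchUID && (pktUID == m.uid) == m.invertUID {
		return false
	}
	if m.matchGID && (pktGID == m.gid) == m.invertGID {
		return false
	}
	return true
}

func main() {
	gidRoot := ownerMatcher{matchGID: true, gid: 0}                    // --gid-owner root
	notUIDRoot := ownerMatcher{matchUID: true, uid: 0, invertUID: true} // ! --uid-owner root
	fmt.Println(gidRoot.matches(1000, 0), gidRoot.matches(1000, 1000))
	fmt.Println(notUIDRoot.matches(0, 0), notUIDRoot.matches(1000, 0))
}
```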
2020-05-12 12:20:47 -07:00
gVisor bot 06ded1c437 Merge pull request #2664 from lubinszARM:pr_sigfp
PiperOrigin-RevId: 311153824
2020-05-12 10:32:16 -07:00
Jamie Liu 94251aedb4 Internal change.
PiperOrigin-RevId: 311046755
2020-05-11 20:03:25 -07:00
Kevin Krakauer 87225fad2a iptables: check for truly unconditional rules
We weren't properly checking whether the inserted default rule was
unconditional.
2020-05-11 19:50:25 -07:00
Bin Lu 9bd9882b81 Add fpsimd support in sigreturn on Arm64
Signed-off-by: Bin Lu <bin.lu@arm.com>
2020-05-11 21:53:29 -04:00
Jamie Liu 15de8cc9e0 Add fsimpl/gofer.InternalFilesystemOptions.OpenSocketsByConnecting.
PiperOrigin-RevId: 311014995
2020-05-11 16:14:36 -07:00
Bhasker Hariharan e838e7ab34 Automated rollback of changelist 310417191
PiperOrigin-RevId: 310963404
2020-05-11 12:09:06 -07:00
Bhasker Hariharan 0cb9e1d021 Fix view.ToVectorisedView().
view.ToVectorisedView() now just returns an empty vectorised
view if the view is of zero length. Earlier it would return
a VectorisedView of zero length but with 1 empty view. This
has been a source of bugs as lower layers don't expect
zero length views in VectorisedViews.

VectorisedView.AppendView() now is a no-op if the view being
appended is of zero length.
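
The behavior described above can be illustrated with simplified stand-in types. View and VectorisedView below are hypothetical reductions for this sketch, not the netstack buffer types.

```go
package main

import "fmt"

// View is a simplified stand-in for a byte buffer.
type View []byte

// VectorisedView is a simplified stand-in for a list of Views plus their
// total size.
type VectorisedView struct {
	views []View
	size  int
}

// ToVectorisedView returns an empty VectorisedView for a zero-length View
// instead of one that holds a single empty View.
func (v View) ToVectorisedView() VectorisedView {
	if len(v) == 0 {
		return VectorisedView{}
	}
	return VectorisedView{views: []View{v}, size: len(v)}
}

// AppendView is a no-op for zero-length views, so lower layers never see an
// empty View inside a VectorisedView.
func (vv *VectorisedView) AppendView(v View) {
	if len(v) == 0 {
		return
	}
	vv.views = append(vv.views, v)
	vv.size += len(v)
}

func main() {
	vv := View(nil).ToVectorisedView()
	vv.AppendView(View{})
	fmt.Println(len(vv.views), vv.size)
	vv.AppendView(View("abc"))
	fmt.Println(len(vv.views), vv.size)
}
```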

Fixes #2658

PiperOrigin-RevId: 310942269
2020-05-11 10:35:28 -07:00
Nicolas Lacasse c52195d258 Stop avoiding preadv2 and pwritev2, and add them to the filters.
Some code paths needed these syscalls anyways, so they should be included in
the filters. Given that we depend on these syscalls in some cases, there's no
real reason to avoid them any more.

PiperOrigin-RevId: 310829126
2020-05-10 17:52:20 -07:00
gVisor bot cfd30665c1 iptables - filter packets using outgoing interface.
Enables commands with -o (--out-interface) for iptables rules.
$ iptables -A OUTPUT -o eth0 -j ACCEPT

PiperOrigin-RevId: 310642286
2020-05-08 15:44:54 -07:00
Jamie Liu 21b71395a6 Pass flags to fsimpl/host.inode.open().
This has two effects: It makes flags passed to open("/proc/[pid]/fd/[hostfd]")
effective, and it prevents imported pipes/sockets/character devices from being
opened with O_NONBLOCK unconditionally (because the underlying host FD was set
to non-blocking in ImportFD()).

PiperOrigin-RevId: 310596062
2020-05-08 11:35:41 -07:00
Zeling Feng 5d7d5ed7d6 Send ACK to OTW SEQs/unacc ACKs in CLOSE_WAIT
This fixed the corresponding packetimpact test.

PiperOrigin-RevId: 310593470
2020-05-08 11:23:24 -07:00
Adin Scannell 7b4a913f36 Fix ARM64 build.
The common syscall definitions mean that ARM64-exclusive files need stubs in
the ARM64 build.

PiperOrigin-RevId: 310446698
2020-05-07 15:18:47 -07:00
Sam Balana 9242d3493d Capture range variable in parallel subtests
Only the last test was running before since the goroutines won't be executed
until after this loop. I added t.Log(test.name) and this was the result:

TestListenNoAcceptNonUnicastV4/SourceUnspecified:    DestOtherMulticast
TestListenNoAcceptNonUnicastV4/DestUnspecified:      DestOtherMulticast
TestListenNoAcceptNonUnicastV4/DestOtherMulticast:   DestOtherMulticast
TestListenNoAcceptNonUnicastV4/SourceBroadcast:      DestOtherMulticast
TestListenNoAcceptNonUnicastV4/DestOurMulticast:     DestOtherMulticast
TestListenNoAcceptNonUnicastV4/DestBroadcast:        DestOtherMulticast
TestListenNoAcceptNonUnicastV4/SourceOtherMulticast: DestOtherMulticast
TestListenNoAcceptNonUnicastV4/SourceOurMulticast:   DestOtherMulticast

https://github.com/golang/go/wiki/TableDrivenTests#parallel-testing
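
The fix pattern can be shown outside the testing package with plain goroutines; the same per-iteration copy applies inside a t.Run closure that calls t.Parallel(). The function below is a hypothetical illustration. Note that since Go 1.22 each loop iteration gets its own variable, so the explicit copy is only needed on older toolchains and is harmless on newer ones.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// runAll starts one goroutine per name and records which name each
// goroutine observed.
func runAll(names []string) []string {
	var (
		mu   sync.Mutex
		wg   sync.WaitGroup
		seen []string
	)
	for _, name := range names {
		name := name // Per-iteration copy; required before Go 1.22.
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock()
			defer mu.Unlock()
			seen = append(seen, name)
		}()
	}
	wg.Wait()
	sort.Strings(seen)
	return seen
}

func main() {
	fmt.Println(runAll([]string{"SourceUnspecified", "DestUnspecified", "DestOtherMulticast"}))
}
```

Without the copy, on toolchains before Go 1.22, every closure would share one variable and could all observe the last element, which is the symptom in the log above.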

PiperOrigin-RevId: 310440629
2020-05-07 14:46:51 -07:00
Jamie Liu 9115f26851 Allocate device numbers for VFS2 filesystems.
Updates #1197, #1198, #1672

PiperOrigin-RevId: 310432006
2020-05-07 14:01:53 -07:00
Bhasker Hariharan 28b5565fdd Automated rollback of changelist 309339316
PiperOrigin-RevId: 310417191
2020-05-07 12:48:23 -07:00
Nicolas Lacasse d0b1d0233d Move pkg/sentry/vfs/{eventfd,timerfd} to new packages in pkg/sentry/fsimpl.
They don't depend on anything in VFS2, so they should be their own packages.

PiperOrigin-RevId: 310416807
2020-05-07 12:44:03 -07:00
Nicolas Lacasse 26c60d7d5d Port signalfd to vfs2.
PiperOrigin-RevId: 310404113
2020-05-07 11:41:50 -07:00
Bhasker Hariharan 08f4846ebe Fix bugs in SACK recovery.
Every call to sender.NextSeg does not need to iterate from the
front of the writeList as in a given recovery episode we can cache
the last nextSeg returned. There cannot be a lower sequenced segment
that matches the next call to NextSeg as otherwise we would have
returned that instead in the previous call.

This fixes the issue of excessive CPU usage w/ large send buffers
where we spend a lot of time iterating from the front of the list on
every NextSeg invocation.

Further, the following bugs were also fixed:
  * Iteration of segments never sent in NextSeg() when looking for segments for
    retransmission that match step1/3/4 of the NextSeg algorithm
  * Correctly setting rescueRxt only if the rescue segment was actually sent.
  * Correctly initializing rescueRxt/highRxt when entering SACK recovery.
  * Correctly re-arming the timer only on retransmissions when SACK is in use
    and not for every segment being sent as it was being done before.
  * Copy over xmitTime and xmitCount on segment clone.
  * Move writeNext along when skipping over SACKED segments. This is required
    to prevent spurious retransmissions where we end up retransmitting data
    that was never lost.
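
The caching idea from the first paragraph of this message can be sketched loosely as below. This shows only the "resume from the last position" optimization; it does not implement the actual NextSeg rules, and every type and field name is hypothetical.

```go
package main

import "fmt"

// segment is a hypothetical, reduced segment record.
type segment struct {
	seq    uint32
	sacked bool
}

// sender keeps the write list in sequence order plus a scan hint.
type sender struct {
	writeList []segment
	hint      int // index at which the next scan resumes
}

// enterRecovery resets the hint at the start of a recovery episode.
func (s *sender) enterRecovery() { s.hint = 0 }

// nextSeg returns the index of the next un-SACKed segment, or -1 if none is
// left. It resumes from the cached hint instead of the front of the list:
// anything before the hint was already returned or skipped in this episode.
func (s *sender) nextSeg() int {
	for i := s.hint; i < len(s.writeList); i++ {
		if !s.writeList[i].sacked {
			s.hint = i + 1
			return i
		}
	}
	s.hint = len(s.writeList)
	return -1
}

func main() {
	s := sender{writeList: []segment{
		{seq: 0},
		{seq: 100, sacked: true},
		{seq: 200},
		{seq: 300, sacked: true},
	}}
	s.enterRecovery()
	for i := s.nextSeg(); i >= 0; i = s.nextSeg() {
		fmt.Println("retransmit candidate at seq", s.writeList[i].seq)
	}
}
```

With the hint, a full recovery episode walks the list once in total rather than once per call, which is the CPU saving the message describes for large send buffers.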

PiperOrigin-RevId: 310387671
2020-05-07 10:26:00 -07:00
Dean Deng 16da7e790f Update privateunixsocket TODOs.
Synthetic sockets do not have the race condition issue in VFS2, and we will
get rid of privateunixsocket as well.

Fixes #1200.

PiperOrigin-RevId: 310386474
2020-05-07 10:20:48 -07:00
gVisor bot 553da2cdc8 Merge pull request #2639 from kevinGC:ipv4-frag-reassembly-test
PiperOrigin-RevId: 310380911
2020-05-07 09:58:30 -07:00
Dean Deng e0089a20e4 Remove outdated TODO for VFS2 AccessAt.
Fixes #1965.

PiperOrigin-RevId: 310380433
2020-05-07 09:53:52 -07:00
Kevin Krakauer 763b5ad596 Add basic incoming ipv4 fragment tests
Based on ipv6's TestReceiveIPv6Fragments.
2020-05-06 22:45:21 -07:00
gVisor bot feece24bf5 Merge pull request #2570 from lubinszARM:pr_clean
PiperOrigin-RevId: 310259686
2020-05-06 17:19:55 -07:00
Jamie Liu 7cd54c1f14 Remove vfs.FileDescriptionOptions.InvalidWrite.
Compare:
https://elixir.bootlin.com/linux/v5.6/source/fs/timerfd.c#L431
PiperOrigin-RevId: 310246908
2020-05-06 16:08:12 -07:00
Ghanan Gowripalan 485ca36adf Do not assume no DHCPv6 configurations
Do not assume that networks need any DHCPv6 configurations. Instead,
notify the NDP dispatcher in response to the first NDP RA's DHCPv6
flags, even if the flags indicate no DHCPv6 configurations are
available.
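
One way to picture the change is to keep an explicit "nothing seen yet" state that differs from "no DHCPv6 configuration", so the first RA always triggers a notification. The sketch below rests on that assumption; the types, constants, and callback are hypothetical and not the netstack NDP API.

```go
package main

import "fmt"

// dhcpv6Config is a hypothetical enumeration of what an RA's flags say.
type dhcpv6Config int

const (
	dhcpv6Unset       dhcpv6Config = iota // no RA processed yet
	dhcpv6None                            // RA says no DHCPv6 configuration
	dhcpv6Managed                         // Managed flag set
	dhcpv6OtherConfig                     // only the Other flag set
)

// ndpState remembers the last configuration reported to the dispatcher.
// Its zero value starts at dhcpv6Unset, so nothing is assumed up front.
type ndpState struct {
	last   dhcpv6Config
	notify func(dhcpv6Config)
}

// handleRA notifies the dispatcher whenever the configuration derived from
// the RA's flags differs from the last reported one. Because the initial
// state is dhcpv6Unset, the first RA is reported even when both flags are
// clear.
func (s *ndpState) handleRA(managed, other bool) {
	cfg := dhcpv6None
	switch {
	case managed:
		cfg = dhcpv6Managed
	case other:
		cfg = dhcpv6OtherConfig
	}
	if cfg != s.last {
		s.last = cfg
		s.notify(cfg)
	}
}

func main() {
	s := ndpState{notify: func(c dhcpv6Config) {
		fmt.Println("DHCPv6 configuration changed:", c)
	}}
	s.handleRA(false, false) // first RA: reported even though no flags are set
	s.handleRA(false, false) // unchanged: no notification
	s.handleRA(true, false)  // changed: reported
}
```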

PiperOrigin-RevId: 310245068
2020-05-06 15:59:08 -07:00