Commit Graph

5382 Commits

Author SHA1 Message Date
Andrei Vagin 0e55b57452 perf/getpid: add a case when syscalls are executed via mov $XXX, %eax; syscall
This is the most common pattern for invoking system calls in real applications.

PiperOrigin-RevId: 367320048
2021-04-07 16:25:02 -07:00
Tamir Duberstein e6133abfca Remove flock suppression
PiperOrigin-RevId: 367312275
2021-04-07 15:41:17 -07:00
Ghanan Gowripalan d7fd00bad1 Do not perform MLD for certain multicast scopes
...as per RFC 2710 section 5 page 10.

Test: ipv6_test.TestMLDSkipProtocol
PiperOrigin-RevId: 367031126
2021-04-06 10:15:46 -07:00
Ayush Ranjan fb4c700d06 Update gofer dentry permissions only when needed.
Without this change, we ask the gofer server to update the permissions
whenever the UID, GID or size is updated via SetStat. Consequently, we do not
generate inotify events when the permissions actually change due to SGID bit
getting cleared.

With this change, we will update the permissions only when needed and generate
inotify events.

PiperOrigin-RevId: 366946842
2021-04-05 23:48:26 -07:00
Mithun Iyer 56c69fb0e7 Fix listen backlog handling to be in parity with Linux
- Change the accept queue full condition for a listening endpoint
  to only honor completed (and delivered) connections.
- Use syncookies if the number of incomplete connections is beyond
  listen backlog. This also cleans up the SynThreshold option code
  as that is no longer used with this change.
- Added a new stack option to unconditionally generate syncookies.
  Similar to sysctl -w net.ipv4.tcp_syncookies=2 on Linux.
- Enable keeping of incomplete connections beyond listen backlog.
- Drop incoming SYNs only if the accept queue is filled up.
- Drop incoming ACKs that complete handshakes when accept queue is full
- Enable the stack to accept one more connection than programmed by
  listen backlog.
- Handle a zero or negative backlog argument to listen the same way Linux does.
- Add syscall and packetimpact tests to reflect the changes above.
- Remove TCPConnectBacklog test which is polling for completed
  connections on the client side which is not reflective of whether
  the accept queue is filled up by the test. The modified syscall test
  in this CL addresses testing of connecting sockets.
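
The queueing rules in the list above can be sketched roughly as follows. This is a minimal illustration, not the netstack code: the `listener` type, its fields, and the `synAction` names are hypothetical, the exact boundary for "beyond listen backlog" is an assumption, and the handling of negative backlog values is left out.

```go
package main

import "fmt"

// listener is a hypothetical model of a listening endpoint's counters.
type listener struct {
	backlog             int  // value passed to listen(2), assumed non-negative here
	completed           int  // handshakes finished and waiting in the accept queue
	incomplete          int  // handshakes still in progress
	alwaysUseSynCookies bool // models the "unconditionally generate syncookies" option
}

// acceptQueueFull counts only completed connections and allows one more
// connection than the programmed backlog.
func (l listener) acceptQueueFull() bool {
	return l.completed > l.backlog
}

type synAction int

const (
	dropSYN synAction = iota
	replyWithSynCookie
	trackHandshake
)

// onSYN decides what to do with an incoming SYN.
func (l listener) onSYN() synAction {
	switch {
	case l.acceptQueueFull():
		// SYNs are dropped only when the accept queue is full.
		return dropSYN
	case l.alwaysUseSynCookies || l.incomplete >= l.backlog:
		// Assumed boundary: switch to syncookies once the number of
		// incomplete connections reaches the backlog.
		return replyWithSynCookie
	default:
		return trackHandshake
	}
}

// dropFinalACK reports whether an ACK that would complete a handshake
// should be dropped because the accept queue has no room.
func (l listener) dropFinalACK() bool {
	return l.acceptQueueFull()
}

func main() {
	l := listener{backlog: 1, completed: 1, incomplete: 0}
	fmt.Println(l.acceptQueueFull(), l.onSYN() == trackHandshake)
}
```

With `backlog: 1`, the sketch still admits a second completed connection before reporting the queue as full, which mirrors the "one more connection than programmed" item.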

Fixes #3153

PiperOrigin-RevId: 366935921
2021-04-05 21:53:41 -07:00
Rahat Mahmood 7a7fcf2dba Report task CPU usage through the cpuacct cgroup controller.
PiperOrigin-RevId: 366923274
2021-04-05 19:46:26 -07:00
Chong Cai 63340e6138 Add initial verity ioctl syscall tests
PiperOrigin-RevId: 366907152
2021-04-05 17:31:48 -07:00
Fabricio Voznika 661e5ae7ae Enable Checkpoint/Restore test with VFS2
Closes #3373

PiperOrigin-RevId: 366903991
2021-04-05 17:10:01 -07:00
Fabricio Voznika 198e0dcde2 Add fsstress on tmpfs to presubmit
Updates #5273

PiperOrigin-RevId: 366902314
2021-04-05 17:00:25 -07:00
Rahat Mahmood 88f198c2a9 Allow default control values to be set for cgroupfs.
PiperOrigin-RevId: 366891806
2021-04-05 16:06:11 -07:00
Ayush Ranjan 2d9095c7a6 Actually don't run unlink_benchmark with TSAN.
This benchmark currently takes > 15 minutes to run in that case.

PiperOrigin-RevId: 366891726
2021-04-05 16:01:39 -07:00
Kevin Krakauer e7b2023647 deflake semaphore test
There's no reason to actually increment the semaphore, it just introduces the
chance of a race.

PiperOrigin-RevId: 366851795
2021-04-05 12:37:31 -07:00
Chong Cai e21a71bff1 Allow user mount for verity fs
Allow user mounting a verity fs on an existing mount by specifying mount
flags root_hash and lower_path.

PiperOrigin-RevId: 366843846
2021-04-05 12:01:44 -07:00
Chong Cai 58afd120d3 Set Verity bit in verity_prepare cmd
This is needed to enable Xattrs features required by verity.

PiperOrigin-RevId: 366843640
2021-04-05 11:56:59 -07:00
Fabricio Voznika 3007ae647d Fail tests when container returns non-zero status
PiperOrigin-RevId: 366839955
2021-04-05 11:39:53 -07:00
Adin Scannell 8161ed4110 Don't run unlink_benchmark with TSAN.
This benchmark currently takes > 15 minutes to run in that case.

PiperOrigin-RevId: 366817185
2021-04-05 09:57:35 -07:00
Adin Scannell 9a8692c82a Remove eternal and enormous tests.
PiperOrigin-RevId: 366573366
2021-04-03 00:18:34 -07:00
Rahat Mahmood 932c8abd0f Implement cgroupfs.
A skeleton implementation of cgroupfs. It supports trivial cpu and
memory controllers with no support for hierarchies.

PiperOrigin-RevId: 366561126
2021-04-02 21:10:44 -07:00
gVisor bot a0c1674478 Internal change.
PiperOrigin-RevId: 366555466
2021-04-02 20:02:26 -07:00
Rahat Mahmood 491b106d62 Implement the runsc verity-prepare command.
Implement a new runsc command to set up a sandbox with verityfs and
run the measure tool. This is loosely forked from the do command, and
currently requires the caller to provide the measure tool binary.

PiperOrigin-RevId: 366553769
2021-04-02 19:34:50 -07:00
Zach Koopmans 1b53550e55 Add vfs1 to go/runsc-benchmarks
PiperOrigin-RevId: 366470480
2021-04-02 10:41:23 -07:00
gVisor bot cc762235ce Internal change.
PiperOrigin-RevId: 366462448
2021-04-02 09:58:19 -07:00
Bhasker Hariharan b2ea37401e Internal changes
PiperOrigin-RevId: 366344805
2021-04-01 15:40:07 -07:00
Adin Scannell 513de4039c Remove invalid dependency.
PiperOrigin-RevId: 366344222
2021-04-01 15:34:56 -07:00
Andrei Vagin eb9b8e53a3 platform/kvm/x86: restore mxcsr when switching from guest to sentry
The Go runtime sets mxcsr once and never changes it.

Reported-by: syzbot+ec55cea6e57ec083b7a6@syzkaller.appspotmail.com
Fixes: #5754
2021-04-01 13:28:15 -07:00
gVisor bot 6c10c772e4 Internal change.
PiperOrigin-RevId: 366292533
2021-04-01 11:24:04 -07:00
Fabricio Voznika 71f3dccbb3 Fix panic when overriding /dev files with VFS2
VFS1 skips over mounts that override files in /dev because the list of
files is hardcoded. This is not needed for VFS2 and a recent change
lifted this restriction. However, parts of the code were still skipping
/dev mounts even in VFS2, causing the loader to panic when it ran short
of FDs to connect to the gofer.

PiperOrigin-RevId: 365858436
2021-03-30 11:36:55 -07:00
Zach Koopmans 8a2f7e716d [syserror] Split usermem package
Split usermem package to help remove syserror dependency in go_marshal.
New hostarch package contains code not dependent on syserror.

PiperOrigin-RevId: 365651233
2021-03-29 13:30:21 -07:00
gVisor bot b125afba41 Merge pull request #5728 from zhlhahaha:2091
PiperOrigin-RevId: 365613394
2021-03-29 10:57:46 -07:00
Ayush Ranjan da6ddd1df8 [perf] Reduce contention in ptrace.threadPool.lookupOrCreate().
lookupOrCreate is called from subprocess.switchToApp() and subprocess.syscall().
lookupOrCreate() looks for a thread already created for the current TID. If a
thread exists (common case), it returns immediately. Otherwise it creates a new
one.

This change switches to using a sync.RWMutex. The initial thread existence
lookup is now done only with the read lock. So multiple successful lookups can
occur concurrently. Only when a new thread is created will it acquire the lock
for writing and update the map (which is not the common case).
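
The locking pattern described above can be sketched as follows. This is an illustrative reduction, not the ptrace platform code: the `thread` type, the `newThread` callback, and the re-check under the write lock are assumptions made so the example is safe for arbitrary callers.

```go
package main

import (
	"fmt"
	"sync"
)

// thread is a hypothetical stand-in for a per-TID traced thread.
type thread struct {
	tid int
}

// threadPool caches one thread per TID.
type threadPool struct {
	mu      sync.RWMutex
	threads map[int]*thread
}

// lookupOrCreate returns the thread for tid, creating it on first use.
func (p *threadPool) lookupOrCreate(tid int, newThread func(int) *thread) *thread {
	// Common case: the thread exists, so only the read lock is taken and
	// concurrent lookups do not block each other.
	p.mu.RLock()
	t, ok := p.threads[tid]
	p.mu.RUnlock()
	if ok {
		return t
	}

	// Uncommon case: take the write lock to update the map. The map is
	// checked again because another goroutine may have inserted the entry
	// between RUnlock and Lock.
	p.mu.Lock()
	defer p.mu.Unlock()
	if t, ok := p.threads[tid]; ok {
		return t
	}
	t = newThread(tid)
	p.threads[tid] = t
	return t
}

func main() {
	p := &threadPool{threads: make(map[int]*thread)}
	create := func(tid int) *thread { return &thread{tid: tid} }
	a := p.lookupOrCreate(42, create)
	b := p.lookupOrCreate(42, create)
	fmt.Println(a == b) // both calls return the same cached thread
}
```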

Discovered in mutex profiles from the various ptrace benchmarks.
Example: https://gvisor.dev/profile/gvisor-buildkite/fd14bfad-b30f-44dc-859b-80ebac50beb4/843827db-da50-4dc9-a2ea-ecf734dde2d5/tmp/profile/ptrace/BenchmarkFio/operation.write/blockSize.4K/filesystem.tmpfs/benchmarks/fio/mutex.pprof/flamegraph
PiperOrigin-RevId: 365612094
2021-03-29 10:52:19 -07:00
Robin Luk 72cd22163f arm64 ring0: don't use inner-sharable to invalidate tlb
It is enough to invalidate the tlb of local vcpu in switch().
TLBI with inner-sharable will invalidate the tlb in other vcpu.

Arm64 hardware supports at least 256 pcid, so I think it's ok
to set the length of pcid pool to 128.

Signed-off-by: Robin Luk <lubin.lu@antgroup.com>
2021-03-26 16:10:21 +08:00
Jamie Liu fbec65fc3f Use seqfile.SeqHandles correctly in VFS1 /proc/net/.
Before this change:

```
$ docker run --runtime=runsc --rm -it -v ~/tmp:/hosttmp ubuntu:focal /hosttmp/issue5732 --bytes1=128 --bytes2=1024
#1: read(128) = 128
#2: read(1024) = EOF
$ docker run --runtime=runsc-vfs2 --rm -it -v ~/tmp:/hosttmp ubuntu:focal /hosttmp/issue5732 --bytes1=128 --bytes2=1024
#1: read(128) = 128
#2: read(1024) = 256
```

After this change:

```
$ docker run --runtime=runsc --rm -it -v ~/tmp:/hosttmp ubuntu:focal /hosttmp/issue5732 --bytes1=128 --bytes2=1024
#1: read(128) = 128
#2: read(1024) = 256
$ docker run --runtime=runsc-vfs2 --rm -it -v ~/tmp:/hosttmp ubuntu:focal /hosttmp/issue5732 --bytes1=128 --bytes2=1024
#1: read(128) = 128
#2: read(1024) = 256
```

Fixes #5732

PiperOrigin-RevId: 365178386
2021-03-25 20:27:38 -07:00
Jamie Liu 79bc446fac Lock TaskSet mutex for writing in ptraceClone().
This is necessary since ptraceClone() mutates tracer.ptraceTracees.

PiperOrigin-RevId: 365152396
2021-03-25 16:50:06 -07:00
Kevin Krakauer 6b085ba477 setgid: skip tests when we can't find usable GIDs
PiperOrigin-RevId: 365092320
2021-03-25 12:00:24 -07:00
Howard Zhang 253f180c69 Fix comments error
Signed-off-by: Howard Zhang <howard.zhang@arm.com>
2021-03-25 17:39:45 +08:00
Howard Zhang a01fc7108f Fix nogo test error
Signed-off-by: Howard Zhang <howard.zhang@arm.com>
2021-03-25 17:39:28 +08:00
Ian Lewis c27fac421b Fix path to runsc in CNI tutorial.
PiperOrigin-RevId: 364931406
2021-03-24 18:01:05 -07:00
Ian Lewis e4772bd845 Fix highlighting sidebar menu on the website
Highlighting previously highlighted multiple items in the sidebar if they had
the same page name (not full url). This change simplifies this by adding the
highlight class in the jekyll template rather than javascript, and highlights
only the correct page.

PiperOrigin-RevId: 364931350
2021-03-24 17:56:40 -07:00
Bhasker Hariharan e7ca2a51a8 Add POLLRDNORM/POLLWRNORM support.
On Linux these are meant to be equivalent to POLLIN/POLLOUT. Rather
than hack these on in sys_poll etc., it felt cleaner to just clean up
the call sites to notify for both events. This is what Linux does
as well.
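
As a rough illustration of notifying for both events at the call site, the sketch below combines each pair of bits into one mask. The bit values are the generic Linux poll(2) constants; the mask names and the `ready` helper are hypothetical and do not come from the gVisor source.

```go
package main

import "fmt"

// Linux poll(2) event bits (generic architecture values).
const (
	pollIn     = 0x0001
	pollOut    = 0x0004
	pollRdNorm = 0x0040
	pollWrNorm = 0x0100
)

// Hypothetical combined masks: a notifier that signals readability or
// writability raises both the classic bit and its *NORM equivalent.
const (
	readableEvents = pollIn | pollRdNorm
	writableEvents = pollOut | pollWrNorm
)

// ready returns the subset of requested events present in notified.
func ready(requested, notified uint32) uint32 {
	return requested & notified
}

func main() {
	// A waiter that asked only for POLLRDNORM is still woken when the
	// notifier signals readableEvents.
	fmt.Printf("%#x\n", ready(pollRdNorm, readableEvents))
}
```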

Fixes #5544

PiperOrigin-RevId: 364859977
2021-03-24 12:11:44 -07:00
Bhasker Hariharan 72ff6a1cac Fix data race in fdbased when accessing fanoutID.
PiperOrigin-RevId: 364859173
2021-03-24 12:07:10 -07:00
Nick Brown ec0aa657ed Unexpose immutable fields in stack.Route
This change sets the inner `routeInfo` struct to be a named private member
and replaces direct access with access through getters. Note that direct
access to the fields of `routeInfo` is still possible through the `RouteInfo`
struct.

Fixes #4902

PiperOrigin-RevId: 364822872
2021-03-24 09:38:27 -07:00
gVisor bot 8ee4a3f6d0 Merge pull request #5677 from avagin:kvm-mmio
PiperOrigin-RevId: 364728696
2021-03-23 22:50:14 -07:00
Andrei Vagin 56a9a13976 Move the code that manages floating-point state to a separate package
This change is inspired by Adin's cl/355256448.

PiperOrigin-RevId: 364695931
2021-03-23 18:46:37 -07:00
Fabricio Voznika 960155cdaa Add --file-access-mounts flag
--file-access-mounts flag is similar to --file-access, but controls
non-root mounts that were previously mounted in shared mode only.
This gives more flexibility to control how mounts are shared within
a container.

PiperOrigin-RevId: 364669882
2021-03-23 16:21:12 -07:00
Kevin Krakauer 92374e5197 setgid directory support in goferfs
Also adds support for clearing the setuid bit when appropriate (writing,
truncating, changing size, changing UID, or changing GID).

VFS2 only.

PiperOrigin-RevId: 364661835
2021-03-23 15:42:12 -07:00
Rahat Mahmood acb4c62885 Skip checklocks analysis for stateify generated code.
Stateify methods are always called without holding the appropriate
locks. The system is paused and we know there will be no mutations
when we call Save/Load, so this is perfectly safe. However, checklocks
can't know about this, and it will always complain.

Mark stateify generated methods that touch struct fields as
"checklocksignore" to avoid this.

PiperOrigin-RevId: 364610241
2021-03-23 11:56:59 -07:00
Chong Cai beb11cec76 Allow FSETXATTR/FGETXATTR host calls for Verity
These host calls are needed for Verity fs to generate/verify hashes.

PiperOrigin-RevId: 364598180
2021-03-23 11:06:02 -07:00
Nayana Bidari dc75f08c2a Use constant (TestInitialSequenceNumber) instead of integer (789) in tests.
PiperOrigin-RevId: 364596526
2021-03-23 10:59:57 -07:00
Zach Koopmans 98f378d9ef Split fio read/write and randread/randwrite operations
The fio benchmark was changed to a fixed size read/write amount
because the timed benchmark was overwhelming machine memory on
tmpfs mounts.

Now rand(read|write) operations are prohibitively long, leading to timeouts.

Split the benchmarks as they were in python bm-tools: the read/write as
fixed sized (1GB) and the rand(read|write) as timed operations (15s).

PiperOrigin-RevId: 364584436
2021-03-23 10:11:26 -07:00
Ghanan Gowripalan 409a114454 Explicitly allow martian loopback packets
...instead of opting out of them.

Loopback traffic should be stack-local but gVisor has some clients
that depend on the ability to receive loopback traffic that originated
from outside of the stack. Because of this, we guard this change behind
IP protocol options.

A previous change provided the facility to deny these martian loopback
packets, but this change requires clients to opt in to accepting them,
since martian loopback packets are not meant to be accepted, as per
RFC 1122 section 3.2.1.3.g:

        (g)  { 127, <any> }

             Internal host loopback address.  Addresses of this form
             MUST NOT appear outside a host.

PiperOrigin-RevId: 364581174
2021-03-23 09:57:01 -07:00