Commit Graph

1402 Commits

Author SHA1 Message Date
Andrei Vagin e9ea7230f7 fs: synchronize concurrent writes into files with O_APPEND
For files with O_APPEND, a file write operation gets a file size and uses it as
offset to call an inode write operation. This means that all other operations
which can change a file size should be blocked while the write operation doesn't
complete.

PiperOrigin-RevId: 254873771
2019-06-24 17:45:02 -07:00
Adin Scannell 7f5d0afe52 Add O_EXITKILL to ptrace options.
This prevents a race before PDEATH_SIG can take effect during
a sentry crash.

Discovered and solution by avagin@.

PiperOrigin-RevId: 254871534
2019-06-24 17:30:01 -07:00
Rahat Mahmood 94a6bfab5d Implement /proc/net/tcp.
PiperOrigin-RevId: 254854346
2019-06-24 15:56:36 -07:00
Andrei Vagin c5486f5122 platform/ptrace: specify PTRACE_O_TRACEEXIT for stub-processes
The tracee is stopped early  during  process  exit,  when registers are still
available, allowing the tracer to see where the exit occurred, whereas the
normal exit  notifi? cation  is  done  after  the process is finished exiting.

Without this option, dumpAndPanic fails to get registers.

PiperOrigin-RevId: 254852917
2019-06-24 15:48:58 -07:00
Nicolas Lacasse 87df9aab24 Use correct statx syscall number for amd64.
The previous number was for the arm architecture.

Also change the statx tests to force them to run on gVisor, which would have
caught this issue.

PiperOrigin-RevId: 254846831
2019-06-24 15:19:36 -07:00
Fabricio Voznika b21b1db700 Allow to change logging options using 'runsc debug'
New options are:
  runsc debug --strace=off|all|function1,function2
  runsc debug --log-level=warning|info|debug
  runsc debug --log-packets=true|false

Updates #407

PiperOrigin-RevId: 254843128
2019-06-24 15:03:02 -07:00
Nicolas Lacasse 35719d52c7 Implement statx.
We don't have the plumbing for btime yet, so that field is left off. The
returned mask indicates that btime is absent.

Fixes #343

PiperOrigin-RevId: 254575752
2019-06-22 13:29:26 -07:00
Bhasker Hariharan c1761378a9 Fix the logic for sending zero window updates.
Today we have the logic split in two places between endpoint Read() and the
worker goroutine which actually sends a zero window. This change makes it so
that when a zero window ACK is sent we set a flag in the endpoint which can be
read by the endpoint to decide if it should notify the worker to send a
nonZeroWindow update.

The worker now does not do the check again but instead sends an ACK and flips
the flag right away.

Similarly today when SO_RECVBUF is set the SetSockOpt call has logic
to decide if a zero window update is required. Rather than do that we move
the logic to the worker goroutine and it can check the zeroWindow flag
and send an update if required.

PiperOrigin-RevId: 254505447
2019-06-21 18:31:31 -07:00
Andrei Vagin ab6774cebf gvisor/fs: getdents returns 0 if offset is equal to FileMaxOffset
FileMaxOffset is a special case when lseek(d, 0, SEEK_END) has been called.

PiperOrigin-RevId: 254498777
2019-06-21 17:25:17 -07:00
Michael Pratt 6f933a934f Remove O(n) lookup on unlink/rename
Currently, the path tracking in the gofer involves an O(n) lookup of
child fidRefs. This causes a significant overhead on unlinks in
directories with lots of child fidRefs (<4k).

In this transition, pathNode moves from sync.Map to normal synchronized
maps. There is a small chance of contention in walk, but the lock is
held for a very short time (and sync.Map also had a chance of requiring
locking).

OTOH, sync.Map makes it very difficult to add a fidRef reverse map.

PiperOrigin-RevId: 254489952
2019-06-21 16:27:26 -07:00
Brad Burlage ae4ef32b8c Deflake TestSimpleReceive failures due to timeouts
This test will occasionally fail waiting to read a packet. From repeated runs,
I've seen it up to 1.5s for waitForPackets to complete.

PiperOrigin-RevId: 254484627
2019-06-21 15:56:12 -07:00
Ayush Ranjan 727375321f ext4 block group descriptor implementation in disk layout package.
PiperOrigin-RevId: 254482180
2019-06-21 15:42:46 -07:00
Jamie Liu e806466fc5 Add //pkg/flipcall.
Flipcall is a (conceptually) simple local-only RPC mechanism. Compared
to unet, Flipcall does not support passing FDs (support for which will
be provided out of band by another package), requires users to establish
connections manually, and requires user management of concurrency since
each connected Endpoint pair supports only a single RPC at a time;
however, it improves performance by using shared memory for data
(reducing memory copies) and using futexes for control signaling (which
is much cheaper than sendto/recvfrom/sendmsg/recvmsg).

PiperOrigin-RevId: 254471986
2019-06-21 14:47:04 -07:00
Fabricio Voznika 5ba16d51a9 Add list of stuck tasks to panic message
PiperOrigin-RevId: 254450309
2019-06-21 12:46:53 -07:00
Michael Pratt c0317b28cb Update pathNode documentation to reflect reality
Neither fidRefs or children are (directly) synchronized by mu. Remove
the preconditions that say so.

That said, the surrounding does enforce some synchronization guarantees
(e.g., fidRef.renameChildTo does not atomically replace the child in the
maps). I've tried to note the need for callers to do this
synchronization.

I've also renamed the maps to what are (IMO) clearer names. As is, it is
not obvious that pathNode.fidRefs is a map of *child* fidRefs rather
than self fidRefs.

PiperOrigin-RevId: 254446965
2019-06-21 12:26:42 -07:00
Andrei Vagin f94653b3de kernel: call t.mu.Unlock() explicitly in WithMuLocked
defer here doesn't improve readability, but we know it slower that
the explicit call.

PiperOrigin-RevId: 254441473
2019-06-21 11:55:42 -07:00
Nicolas Lacasse 335fd987b0 Delete dangling comment line.
This was from an old comment, which was superseded by the
existing comment which is correct.

PiperOrigin-RevId: 254434535
2019-06-21 11:24:12 -07:00
Fabricio Voznika 054b5632ef Update comment
PiperOrigin-RevId: 254428866
2019-06-21 10:56:42 -07:00
Ian Gudger dc36c34a76 Close FD on TcpSocketTest loop failure.
This helps prevent the blocking call from getting stuck and causing a test
timeout.

PiperOrigin-RevId: 254325926
2019-06-20 20:40:31 -07:00
Neel Natu 3c7448ab6f Deflake TestSIGALRMToMainThread.
Bump up the threshold on number of SIGALRMs received by worker
threads from 50 to 200. Even with the new threshold we still
expect that the majority of SIGALRMs are received by the
thread group leader.

PiperOrigin-RevId: 254289787
2019-06-20 15:58:18 -07:00
Jamie Liu 7db8685100 Preallocate auth.NewAnonymousCredentials() in contexttest.TestContext.
Otherwise every call to, say, fs.ContextCanAccessFile() in a benchmark
using contexttest allocates new auth.Credentials, a new
auth.UserNamespace, ...

PiperOrigin-RevId: 254261051
2019-06-20 13:36:14 -07:00
Michael Pratt 292f70cbf7 Add package docs to seqfile and ramfs
These are the only packages missing docs:
https://godoc.org/gvisor.dev/gvisor

PiperOrigin-RevId: 254261022
2019-06-20 13:34:33 -07:00
Rahat Mahmood ddc1d94a37 Unmark amutex_test as flaky.
PiperOrigin-RevId: 254254058
2019-06-20 12:58:04 -07:00
Neel Natu 0b2135072d Implement madvise(MADV_DONTFORK)
PiperOrigin-RevId: 254253777
2019-06-20 12:56:00 -07:00
Michael Pratt b46ec3704b Drop extra character
PiperOrigin-RevId: 254237530
2019-06-20 11:31:17 -07:00
Ian Gudger 7e49515696 Deflake SendFileTest_Shutdown.
The sendfile syscall's backing doSplice contained a race with regard to
blocking. If the first attempt failed with syserror.ErrWouldBlock and then
the blocking file became ready before registering a waiter, we would just
return the ErrWouldBlock (even if we were supposed to block).

PiperOrigin-RevId: 254114432
2019-06-19 18:40:54 -07:00
Michael Pratt c2d87d5d7c Mark tcp_socket test flaky (for real)
The tag on the binary has no effect. It must be on the test.

PiperOrigin-RevId: 254103480
2019-06-19 17:18:11 -07:00
Nicolas Lacasse 9781128d5a Deflake mount_test.
Inode ids are only stable across Save/Restore if we have an open FD on the
inode. All tests that compare inode ids must therefor hold an FD open.

PiperOrigin-RevId: 254086603
2019-06-19 15:46:11 -07:00
Michael Pratt 773423a997 Abort loop on failure
As-is, on failure these will infinite loop, resulting in test timeout
instead of failure.

PiperOrigin-RevId: 254074989
2019-06-19 14:48:18 -07:00
Michael Pratt 9d2efaac5a Add renamed children pathNodes to target parent
Otherwise future renames may miss Renamed calls.

PiperOrigin-RevId: 254060946
2019-06-19 13:41:07 -07:00
Nicolas Lacasse 29f9e4fa87 fileOp{On,At} should pass the remaning symlink traversal count.
And methods that do more traversals should use the remaining count rather than
resetting.

PiperOrigin-RevId: 254041720
2019-06-19 11:56:34 -07:00
Nicolas Lacasse f7428af9c1 Add MountNamespace to task.
This allows tasks to have distinct mount namespace, instead of all sharing the
kernel's root mount namespace.

Currently, the only way for a task to get a different mount namespace than the
kernel's root is by explicitly setting a different MountNamespace in
CreateProcessArgs, and nothing does this (yet).

In a follow-up CL, we will set CreateProcessArgs.MountNamespace when creating a
new container inside runsc.

Note that "MountNamespace" is a poor term for this thing. It's more like a
distinct VFS tree. When we get around to adding real mount namespaces, this
will need a better naem.

PiperOrigin-RevId: 254009310
2019-06-19 09:21:21 -07:00
Neel Natu 0d1dc50b70 Mark tcp_socket test flaky.
PiperOrigin-RevId: 253997465
2019-06-19 08:08:12 -07:00
Fabricio Voznika ca245a428b Attempt to fix TestPipeWritesAccumulate
Test fails because it's reading 4KB instead of the
expected 64KB. Changed the test to read pipe buffer
size instead of hardcode and added some logging in
case the reason for failure was not pipe buffer size.

PiperOrigin-RevId: 253916040
2019-06-18 19:16:11 -07:00
Rahat Mahmood 546b2948cb Use return values from syscalls in eventfd tests.
PiperOrigin-RevId: 253890611
2019-06-18 16:21:56 -07:00
Fabricio Voznika 0e07c94d54 Kill sandbox process when 'runsc do' exits
PiperOrigin-RevId: 253882115
2019-06-18 15:36:17 -07:00
Fabricio Voznika bdb19b82ef Add Container/Sandbox args struct for creation
There were 3 string arguments that could be easily misplaced
and it makes it easier to add new arguments, especially for
Container that has dozens of callers.

PiperOrigin-RevId: 253872074
2019-06-18 14:46:49 -07:00
Brad Burlage 2e1379867a Replace usage of deprecated strtoul/strtoull
PiperOrigin-RevId: 253864770
2019-06-18 14:18:47 -07:00
Fabricio Voznika ec15fb1162 Fix PipeTest_Streaming timeout
Test was calling Size() inside read and write loops. Size()
makes 2 syscalls to return the pipe size, making the test
do a lot more work than it should.

PiperOrigin-RevId: 253824690
2019-06-18 11:03:33 -07:00
Andrei Vagin 8ab0848c70 gvisor/fs: don't update file.offset for sockets, pipes, etc
sockets, pipes and other non-seekable file descriptors don't
use file.offset, so we don't need to update it.

With this change, we will be able to call file operations
without locking the file.mu mutex. This is already used for
pipes in the splice system call.

PiperOrigin-RevId: 253746644
2019-06-18 01:43:29 -07:00
Andrei Vagin 3d1e44a677 gvisor/kokoro: don't modify tests names in the BUILD file
PiperOrigin-RevId: 253746380
2019-06-18 01:41:29 -07:00
Andrei Vagin 66cc0e9f92 gvisor/bazel: use python2 to build runsc-debian
$ bazel build runsc:runsc-debian
  File ".../bazel_tools/tools/build_defs/pkg/make_deb.py", line 311,
  in GetFlagValue:
    flagvalue = flagvalue.decode('utf-8')
AttributeError: 'str' object has no attribute 'decode'

make_deb.py is incompatible with Python3.
https://github.com/bazelbuild/bazel/issues/8443

PiperOrigin-RevId: 253691923
2019-06-17 17:09:06 -07:00
gVisor bot 99d286370d Internal change.
PiperOrigin-RevId: 253559564
2019-06-17 05:13:55 -07:00
Bhasker Hariharan a8608c501b Enable Receive Buffer Auto-Tuning for runsc.
Updates #230

PiperOrigin-RevId: 253225078
2019-06-14 07:31:45 -07:00
Bhasker Hariharan 3d71c627fa Add support for TCP receive buffer auto tuning.
The implementation is similar to linux where we track the number of bytes
consumed by the application to grow the receive buffer of a given TCP endpoint.

This ensures that the advertised window grows at a reasonable rate to accomodate
for the sender's rate and prevents large amounts of data being held in stack
buffers if the application is not actively reading or not reading fast enough.

The original paper that was used to implement the linux receive buffer auto-
tuning is available @ https://public.lanl.gov/radiant/pubs/drs/lacsi2001.pdf

NOTE: Linux does not implement DRS as defined in that paper, it's just a good
reference to understand the solution space.

Updates #230

PiperOrigin-RevId: 253168283
2019-06-13 22:28:01 -07:00
Ian Gudger 3e9b8ecbfe Plumb context through more layers of filesytem.
All functions which allocate objects containing AtomicRefCounts will soon need
a context.

PiperOrigin-RevId: 253147709
2019-06-13 18:40:38 -07:00
Ian Gudger 0a5ee6f7b2 Fix deadlock in fasync.
The deadlock can occur when both ends of a connected Unix socket which has
FIOASYNC enabled on at least one end are closed at the same time. One end
notifies that it is closing, calling (*waiter.Queue).Notify which takes
waiter.Queue.mu (as a read lock) and then calls (*FileAsync).Callback, which
takes FileAsync.mu. The other end tries to unregister for notifications by
calling (*FileAsync).Unregister, which takes FileAsync.mu and calls
(*waiter.Queue).EventUnregister which takes waiter.Queue.mu.

This is fixed by moving the calls to waiter.Waitable.EventRegister and
waiter.Waitable.EventUnregister outside of the protection of any mutex used
in (*FileAsync).Callback.

The new test is related, but does not cover this particular situation.

Also fix a data race on FileAsync.e.Callback. (*FileAsync).Callback checked
FileAsync.e.Callback under the protection of FileAsync.mu, but the waiter
calling (*FileAsync).Callback could not and did not. This is fixed by making
FileAsync.e.Callback immutable before passing it to the waiter for the first
time.

Fixes #346

PiperOrigin-RevId: 253138340
2019-06-13 17:26:22 -07:00
Rahat Mahmood 05ff1ffaad Implement getsockopt() SO_DOMAIN, SO_PROTOCOL and SO_TYPE.
SO_TYPE was already implemented for everything but netlink sockets.

PiperOrigin-RevId: 253138157
2019-06-13 17:24:51 -07:00
Shentubot 7b0f068258 Merge pull request #306 from amscanne:add_readme
PiperOrigin-RevId: 253137638
2019-06-13 17:20:49 -07:00
Adin Scannell add40fd6ad Update canonical repository.
This can be merged after:
https://github.com/google/gvisor-website/pull/77
  or
https://github.com/google/gvisor-website/pull/78

PiperOrigin-RevId: 253132620
2019-06-13 16:50:15 -07:00