Commit Graph

1534 Commits

Author SHA1 Message Date
Nicolas Lacasse d3f97aec49 Remove events from name_to_handle_at and open_by_handle_at.
These syscalls require filesystem support that gVisor does not provide, and is
not planning to implement. Their absense should not trigger an event.

PiperOrigin-RevId: 255692871
2019-06-28 16:50:24 -07:00
Ayush Ranjan c4da599e22 ext4: disklayout: SuperBlock interface implementations.
PiperOrigin-RevId: 255687771
2019-06-28 16:18:29 -07:00
Nicolas Lacasse cf51e77d6d Fix suggestions from clang.
PiperOrigin-RevId: 255679603
2019-06-28 15:32:29 -07:00
Nicolas Lacasse 295078fa7a Automated rollback of changelist 255263686
PiperOrigin-RevId: 255679453
2019-06-28 15:28:41 -07:00
Andrei Vagin e21d49c2d8 platform/ptrace: return more detailed errors
Right now, if we can't create a stub process, we will see this error:
panic: unable to activate mm: resource temporarily unavailable

It would be better to know the root cause of this "resource temporarily
unavailable".

PiperOrigin-RevId: 255656831
2019-06-28 13:23:36 -07:00
Ayush Ranjan 7c13789818 Superblock interface in the disk layout package for ext4.
PiperOrigin-RevId: 255644277
2019-06-28 12:07:28 -07:00
Andrei Vagin 8a625ceeb1 runsc: allow openat for runsc-race
I see that runsc-race is killed by SIGSYS, because openat isn't
allowed by seccomp filters:
60052 openat(AT_FDCWD, "/proc/sys/vm/overcommit_memory",
			O_RDONLY|O_CLOEXEC <unfinished ...>
60052 <... openat resumed> )            = 257
60052 --- SIGSYS {si_signo=SIGSYS, si_code=SYS_SECCOMP, si_call_addr=0xfaacf1,
		si_syscall=__NR_openat, si_arch=AUDIT_ARCH_X86_64} ---

PiperOrigin-RevId: 255640808
2019-06-28 11:49:45 -07:00
Yong He c61d7761b4 Fix deadloop in proc subtask list
Readdir of /proc/x/task/ will get direntry entries
from tasks of specified taskgroup. Now the tasks
slice is unsorted, use sort.SearchInts search entry
from the slice may cause infinity loops.
The fix is sort the slice before search.
This issue could be easily reproduced via following
steps, revise Readdir in pkg/sentry/fs/proc/task.go,
force set taskInts into test slice
[]int{1, 11, 7, 5, 10, 6, 8, 3, 9, 2, 4},
then run docker image and run ls /proc/1/task, the
command will cause infinity loops.
2019-06-28 22:20:57 +08:00
Fabricio Voznika b2907595e5 Complete pipe support on overlayfs
Get/Set pipe size and ioctl support were missing from
overlayfs. It required moving the pipe.Sizer interface
to fs so that overlay could get access.

Fixes #318

PiperOrigin-RevId: 255511125
2019-06-27 17:22:53 -07:00
Michael Pratt 5b41ba5d0e Fix various spelling issues in the documentation
Addresses obvious typos, in the documentation only.

COPYBARA_INTEGRATE_REVIEW=https://github.com/google/gvisor/pull/443 from Pixep:fix/documentation-spelling 4d0688164eafaf0b3010e5f4824b35d1e7176d65
PiperOrigin-RevId: 255477779
2019-06-27 14:25:50 -07:00
Michael Pratt 085a907565 Cache directory entries in the overlay
Currently, the overlay dirCache is only used for a single logical use of
getdents. i.e., it is discard when the FD is closed or seeked back to
the beginning.

But the initial work of getting the directory contents can be quite
expensive (particularly sorting large directories), so we should keep it
as long as possible.

This is very similar to the readdirCache in fs/gofer.

Since the upper filesystem does not have to allow caching readdir
entries, the new CacheReaddir MountSourceOperations method controls this
behavior.

This caching should be trivially movable to all Inodes if desired,
though that adds an additional copy step for non-overlay Inodes.
(Overlay Inodes already do the extra copy).

PiperOrigin-RevId: 255477592
2019-06-27 14:24:03 -07:00
Andrei Vagin e276083903 gvisor/ptrace: grub initial thread registers only once
PiperOrigin-RevId: 255465635
2019-06-27 13:59:57 -07:00
gVisor bot 7188790f92 Merge pull request #461 from brb-g:128_procseekend
PiperOrigin-RevId: 255462850
2019-06-27 13:58:14 -07:00
Fabricio Voznika 42e212f6b7 Preserve permissions when checking lower
The code was wrongly assuming that only read access was
required from the lower overlay when checking for permissions.
This allowed non-writable files to be writable in the overlay.

Fixes #316

PiperOrigin-RevId: 255263686
2019-06-26 14:24:44 -07:00
Nicolas Lacasse 857e5c47e9 Follow symlinks when creating a file, and create the target.
If we have a symlink whose target does not exist, creating the symlink (either
via 'creat' or 'open' with O_CREAT flag) should create the target of the
symlink. Previously, gVisor would error with EEXIST in this case

PiperOrigin-RevId: 255232944
2019-06-26 11:49:20 -07:00
Nicolas Lacasse 67e2f227aa Always set SysProcAttr.Ctty to an FD in the child's FD table.
Go was going to change the behavior of SysProcAttr.Ctty such that it must be an
FD in the *parent* FD table:
https://go-review.googlesource.com/c/go/+/178919/

However, after some debate, it was decided that this change was too
backwards-incompatible, and so it was reverted.
https://github.com/golang/go/issues/29458

The behavior going forward is unchanged: the Ctty FD must be an FD in the
*child* FD table.

PiperOrigin-RevId: 255228476
2019-06-26 11:27:31 -07:00
Michael Pratt e98ce4a2c6 Add TODO reminder to remove tmpfs caching options
Updates #179

PiperOrigin-RevId: 255081565
2019-06-25 17:12:34 -07:00
Jamie Liu ffee0f36b1 Add //pkg/fdchannel.
To accompany flipcall connections in cases where passing FDs is required
(as for gofers).

PiperOrigin-RevId: 255062277
2019-06-25 15:38:11 -07:00
Nicolas Lacasse a8f148b8e4 Use different Ctty FDs based on the go version.
An upcoming change in Go 1.13 [1] changes the semantics of the SysProcAttr.Ctty
field. Prior to the change, the FD must be an FD in the child process's FD
table (aka "post-shuffle"). After the change, the FD must be an FD in the
current process's FD table (aka "pre-shuffle").

To be compatible with both versions this CL introduces a new boolean
"CttyFdIsPostShuffle" which indicates whether a pre- or post-shuffle FD should
be provided. We use build tags to chose the correct one.

1: https://go-review.googlesource.com/c/go/+/178919/
PiperOrigin-RevId: 255015303
2019-06-25 11:47:27 -07:00
Andrei Vagin 03ae91c662 gvisor: lockless read access for task credentials
Credentials are immutable and even before these changes we could read them
without locks, but we needed to take a task lock to get a credential object
from a task object.

It is possible to avoid this lock, if we will guarantee that a credential
object will not be changed after setting it on a task.

PiperOrigin-RevId: 254989492
2019-06-25 09:52:49 -07:00
Andrei Vagin fd16a329ce fsgopher: reopen files via /proc/self/fd
When we reopen file by path, we can't be sure that
we will open exactly the same file. The file can be
deleted and another one with the same name can be
created.

PiperOrigin-RevId: 254898594
2019-06-24 21:44:27 -07:00
Adrien Leravat 3688e6e99d Add CLOCK_BOOTTIME as a CLOCK_MONOTONIC alias
Makes CLOCK_BOOTTIME available with
* clock_gettime
* timerfd_create
* clock_gettime vDSO

CLOCK_BOOTTIME is implemented as an alias to CLOCK_MONOTONIC.
CLOCK_MONOTONIC already keeps track of time across save
and restore. This is the closest possible behavior to Linux
CLOCK_BOOTIME, as there is no concept of suspend/resume.

Updates google/gvisor#218
2019-06-24 21:14:38 -07:00
Andrei Vagin e9ea7230f7 fs: synchronize concurrent writes into files with O_APPEND
For files with O_APPEND, a file write operation gets a file size and uses it as
offset to call an inode write operation. This means that all other operations
which can change a file size should be blocked while the write operation doesn't
complete.

PiperOrigin-RevId: 254873771
2019-06-24 17:45:02 -07:00
Adin Scannell 7f5d0afe52 Add O_EXITKILL to ptrace options.
This prevents a race before PDEATH_SIG can take effect during
a sentry crash.

Discovered and solution by avagin@.

PiperOrigin-RevId: 254871534
2019-06-24 17:30:01 -07:00
Rahat Mahmood 94a6bfab5d Implement /proc/net/tcp.
PiperOrigin-RevId: 254854346
2019-06-24 15:56:36 -07:00
Andrei Vagin c5486f5122 platform/ptrace: specify PTRACE_O_TRACEEXIT for stub-processes
The tracee is stopped early  during  process  exit,  when registers are still
available, allowing the tracer to see where the exit occurred, whereas the
normal exit  notifi? cation  is  done  after  the process is finished exiting.

Without this option, dumpAndPanic fails to get registers.

PiperOrigin-RevId: 254852917
2019-06-24 15:48:58 -07:00
Nicolas Lacasse 87df9aab24 Use correct statx syscall number for amd64.
The previous number was for the arm architecture.

Also change the statx tests to force them to run on gVisor, which would have
caught this issue.

PiperOrigin-RevId: 254846831
2019-06-24 15:19:36 -07:00
Fabricio Voznika b21b1db700 Allow to change logging options using 'runsc debug'
New options are:
  runsc debug --strace=off|all|function1,function2
  runsc debug --log-level=warning|info|debug
  runsc debug --log-packets=true|false

Updates #407

PiperOrigin-RevId: 254843128
2019-06-24 15:03:02 -07:00
brb-g 6f0a7de44b Add regression test for #128 (fixed in ab6774ce)
Tests run at HEAD (35719d52):
```
$ bazel test $(bazel query 'filter(".*getdents.*", //test/syscalls:all)')
<snip>
//test/syscalls:getdents_test_native                                     PASSED in 0.3s
//test/syscalls:getdents_test_runsc_ptrace                               PASSED in 4.9s
//test/syscalls:getdents_test_runsc_ptrace_overlay                       PASSED in 4.7s
//test/syscalls:getdents_test_runsc_ptrace_shared                        PASSED in 5.2s
//test/syscalls:getdents_test_runsc_kvm                                  FAILED in 4.0s
```

Tests run at ab6774ce~1 (6f933a93):
```
$ bazel test $(bazel query 'filter(".*getdents.*", //test/syscalls:all)')
//test/syscalls:getdents_test_native                                     PASSED in 0.2s
//test/syscalls:getdents_test_runsc_kvm                                  FAILED in 4.2s
  /usr/local/google/home/brb/.cache/bazel/_bazel_brb/967240a6aae7d353a221d73f4375e038/execroot/__main__/bazel-out/k8-fastbuild/testlogs/test/syscalls/getdents_test_runsc_kvm/test.log
//test/syscalls:getdents_test_runsc_ptrace                               FAILED in 5.3s
  /usr/local/google/home/brb/.cache/bazel/_bazel_brb/967240a6aae7d353a221d73f4375e038/execroot/__main__/bazel-out/k8-fastbuild/testlogs/test/syscalls/getdents_test_runsc_ptrace/test.log
//test/syscalls:getdents_test_runsc_ptrace_overlay                       FAILED in 4.9s
  /usr/local/google/home/brb/.cache/bazel/_bazel_brb/967240a6aae7d353a221d73f4375e038/execroot/__main__/bazel-out/k8-fastbuild/testlogs/test/syscalls/getdents_test_runsc_ptrace_overlay/test.log
//test/syscalls:getdents_test_runsc_ptrace_shared                        FAILED in 5.2s
  /usr/local/google/home/brb/.cache/bazel/_bazel_brb/967240a6aae7d353a221d73f4375e038/execroot/__main__/bazel-out/k8-fastbuild/testlogs/test/syscalls/getdents_test_runsc_ptrace_shared/test.log
```

(I think all runsc_kvm tests are broken on my machine -- I'll rerun them
if you can point me at the documentation to set it up)
2019-06-24 14:37:14 -07:00
chris.zn f957fb23cf Return ENOENT when reading /proc/{pid}/task of an exited process
There will be a deadloop when we use getdents to read /proc/{pid}/task
of an exited process

Like this:

Process A is running
                         Process B: open /proc/{pid of A}/task
Process A exits
                         Process B: getdents /proc/{pid of A}/task

Then, process B will fall into deadloop, and return "." and ".."
in loops and never ends.

This patch returns ENOENT when use getdents to read /proc/{pid}/task
if the process is just exited.

Signed-off-by: chris.zn <chris.zn@antfin.com>
2019-06-24 15:49:53 +08:00
Nicolas Lacasse 35719d52c7 Implement statx.
We don't have the plumbing for btime yet, so that field is left off. The
returned mask indicates that btime is absent.

Fixes #343

PiperOrigin-RevId: 254575752
2019-06-22 13:29:26 -07:00
Bhasker Hariharan c1761378a9 Fix the logic for sending zero window updates.
Today we have the logic split in two places between endpoint Read() and the
worker goroutine which actually sends a zero window. This change makes it so
that when a zero window ACK is sent we set a flag in the endpoint which can be
read by the endpoint to decide if it should notify the worker to send a
nonZeroWindow update.

The worker now does not do the check again but instead sends an ACK and flips
the flag right away.

Similarly today when SO_RECVBUF is set the SetSockOpt call has logic
to decide if a zero window update is required. Rather than do that we move
the logic to the worker goroutine and it can check the zeroWindow flag
and send an update if required.

PiperOrigin-RevId: 254505447
2019-06-21 18:31:31 -07:00
Andrei Vagin ab6774cebf gvisor/fs: getdents returns 0 if offset is equal to FileMaxOffset
FileMaxOffset is a special case when lseek(d, 0, SEEK_END) has been called.

PiperOrigin-RevId: 254498777
2019-06-21 17:25:17 -07:00
Michael Pratt 6f933a934f Remove O(n) lookup on unlink/rename
Currently, the path tracking in the gofer involves an O(n) lookup of
child fidRefs. This causes a significant overhead on unlinks in
directories with lots of child fidRefs (<4k).

In this transition, pathNode moves from sync.Map to normal synchronized
maps. There is a small chance of contention in walk, but the lock is
held for a very short time (and sync.Map also had a chance of requiring
locking).

OTOH, sync.Map makes it very difficult to add a fidRef reverse map.

PiperOrigin-RevId: 254489952
2019-06-21 16:27:26 -07:00
Brad Burlage ae4ef32b8c Deflake TestSimpleReceive failures due to timeouts
This test will occasionally fail waiting to read a packet. From repeated runs,
I've seen it up to 1.5s for waitForPackets to complete.

PiperOrigin-RevId: 254484627
2019-06-21 15:56:12 -07:00
Ayush Ranjan 727375321f ext4 block group descriptor implementation in disk layout package.
PiperOrigin-RevId: 254482180
2019-06-21 15:42:46 -07:00
Jamie Liu e806466fc5 Add //pkg/flipcall.
Flipcall is a (conceptually) simple local-only RPC mechanism. Compared
to unet, Flipcall does not support passing FDs (support for which will
be provided out of band by another package), requires users to establish
connections manually, and requires user management of concurrency since
each connected Endpoint pair supports only a single RPC at a time;
however, it improves performance by using shared memory for data
(reducing memory copies) and using futexes for control signaling (which
is much cheaper than sendto/recvfrom/sendmsg/recvmsg).

PiperOrigin-RevId: 254471986
2019-06-21 14:47:04 -07:00
Fabricio Voznika 5ba16d51a9 Add list of stuck tasks to panic message
PiperOrigin-RevId: 254450309
2019-06-21 12:46:53 -07:00
Michael Pratt c0317b28cb Update pathNode documentation to reflect reality
Neither fidRefs or children are (directly) synchronized by mu. Remove
the preconditions that say so.

That said, the surrounding does enforce some synchronization guarantees
(e.g., fidRef.renameChildTo does not atomically replace the child in the
maps). I've tried to note the need for callers to do this
synchronization.

I've also renamed the maps to what are (IMO) clearer names. As is, it is
not obvious that pathNode.fidRefs is a map of *child* fidRefs rather
than self fidRefs.

PiperOrigin-RevId: 254446965
2019-06-21 12:26:42 -07:00
Andrei Vagin f94653b3de kernel: call t.mu.Unlock() explicitly in WithMuLocked
defer here doesn't improve readability, but we know it slower that
the explicit call.

PiperOrigin-RevId: 254441473
2019-06-21 11:55:42 -07:00
Nicolas Lacasse 335fd987b0 Delete dangling comment line.
This was from an old comment, which was superseded by the
existing comment which is correct.

PiperOrigin-RevId: 254434535
2019-06-21 11:24:12 -07:00
Fabricio Voznika 054b5632ef Update comment
PiperOrigin-RevId: 254428866
2019-06-21 10:56:42 -07:00
Ian Gudger dc36c34a76 Close FD on TcpSocketTest loop failure.
This helps prevent the blocking call from getting stuck and causing a test
timeout.

PiperOrigin-RevId: 254325926
2019-06-20 20:40:31 -07:00
Neel Natu 3c7448ab6f Deflake TestSIGALRMToMainThread.
Bump up the threshold on number of SIGALRMs received by worker
threads from 50 to 200. Even with the new threshold we still
expect that the majority of SIGALRMs are received by the
thread group leader.

PiperOrigin-RevId: 254289787
2019-06-20 15:58:18 -07:00
Jamie Liu 7db8685100 Preallocate auth.NewAnonymousCredentials() in contexttest.TestContext.
Otherwise every call to, say, fs.ContextCanAccessFile() in a benchmark
using contexttest allocates new auth.Credentials, a new
auth.UserNamespace, ...

PiperOrigin-RevId: 254261051
2019-06-20 13:36:14 -07:00
Michael Pratt 292f70cbf7 Add package docs to seqfile and ramfs
These are the only packages missing docs:
https://godoc.org/gvisor.dev/gvisor

PiperOrigin-RevId: 254261022
2019-06-20 13:34:33 -07:00
Rahat Mahmood ddc1d94a37 Unmark amutex_test as flaky.
PiperOrigin-RevId: 254254058
2019-06-20 12:58:04 -07:00
Neel Natu 0b2135072d Implement madvise(MADV_DONTFORK)
PiperOrigin-RevId: 254253777
2019-06-20 12:56:00 -07:00
Michael Pratt b46ec3704b Drop extra character
PiperOrigin-RevId: 254237530
2019-06-20 11:31:17 -07:00
Ian Gudger 7e49515696 Deflake SendFileTest_Shutdown.
The sendfile syscall's backing doSplice contained a race with regard to
blocking. If the first attempt failed with syserror.ErrWouldBlock and then
the blocking file became ready before registering a waiter, we would just
return the ErrWouldBlock (even if we were supposed to block).

PiperOrigin-RevId: 254114432
2019-06-19 18:40:54 -07:00