Commit Graph

1375 Commits

Author SHA1 Message Date
Nicolas Lacasse 9781128d5a Deflake mount_test.
Inode ids are only stable across Save/Restore if we have an open FD on the
inode. All tests that compare inode ids must therefor hold an FD open.

PiperOrigin-RevId: 254086603
2019-06-19 15:46:11 -07:00
Michael Pratt 773423a997 Abort loop on failure
As-is, on failure these will infinite loop, resulting in test timeout
instead of failure.

PiperOrigin-RevId: 254074989
2019-06-19 14:48:18 -07:00
Michael Pratt 9d2efaac5a Add renamed children pathNodes to target parent
Otherwise future renames may miss Renamed calls.

PiperOrigin-RevId: 254060946
2019-06-19 13:41:07 -07:00
Nicolas Lacasse 29f9e4fa87 fileOp{On,At} should pass the remaning symlink traversal count.
And methods that do more traversals should use the remaining count rather than
resetting.

PiperOrigin-RevId: 254041720
2019-06-19 11:56:34 -07:00
Nicolas Lacasse f7428af9c1 Add MountNamespace to task.
This allows tasks to have distinct mount namespace, instead of all sharing the
kernel's root mount namespace.

Currently, the only way for a task to get a different mount namespace than the
kernel's root is by explicitly setting a different MountNamespace in
CreateProcessArgs, and nothing does this (yet).

In a follow-up CL, we will set CreateProcessArgs.MountNamespace when creating a
new container inside runsc.

Note that "MountNamespace" is a poor term for this thing. It's more like a
distinct VFS tree. When we get around to adding real mount namespaces, this
will need a better naem.

PiperOrigin-RevId: 254009310
2019-06-19 09:21:21 -07:00
Neel Natu 0d1dc50b70 Mark tcp_socket test flaky.
PiperOrigin-RevId: 253997465
2019-06-19 08:08:12 -07:00
Fabricio Voznika ca245a428b Attempt to fix TestPipeWritesAccumulate
Test fails because it's reading 4KB instead of the
expected 64KB. Changed the test to read pipe buffer
size instead of hardcode and added some logging in
case the reason for failure was not pipe buffer size.

PiperOrigin-RevId: 253916040
2019-06-18 19:16:11 -07:00
Rahat Mahmood 546b2948cb Use return values from syscalls in eventfd tests.
PiperOrigin-RevId: 253890611
2019-06-18 16:21:56 -07:00
Fabricio Voznika 0e07c94d54 Kill sandbox process when 'runsc do' exits
PiperOrigin-RevId: 253882115
2019-06-18 15:36:17 -07:00
Fabricio Voznika bdb19b82ef Add Container/Sandbox args struct for creation
There were 3 string arguments that could be easily misplaced
and it makes it easier to add new arguments, especially for
Container that has dozens of callers.

PiperOrigin-RevId: 253872074
2019-06-18 14:46:49 -07:00
Brad Burlage 2e1379867a Replace usage of deprecated strtoul/strtoull
PiperOrigin-RevId: 253864770
2019-06-18 14:18:47 -07:00
Fabricio Voznika ec15fb1162 Fix PipeTest_Streaming timeout
Test was calling Size() inside read and write loops. Size()
makes 2 syscalls to return the pipe size, making the test
do a lot more work than it should.

PiperOrigin-RevId: 253824690
2019-06-18 11:03:33 -07:00
Andrei Vagin 8ab0848c70 gvisor/fs: don't update file.offset for sockets, pipes, etc
sockets, pipes and other non-seekable file descriptors don't
use file.offset, so we don't need to update it.

With this change, we will be able to call file operations
without locking the file.mu mutex. This is already used for
pipes in the splice system call.

PiperOrigin-RevId: 253746644
2019-06-18 01:43:29 -07:00
Andrei Vagin 3d1e44a677 gvisor/kokoro: don't modify tests names in the BUILD file
PiperOrigin-RevId: 253746380
2019-06-18 01:41:29 -07:00
Andrei Vagin 66cc0e9f92 gvisor/bazel: use python2 to build runsc-debian
$ bazel build runsc:runsc-debian
  File ".../bazel_tools/tools/build_defs/pkg/make_deb.py", line 311,
  in GetFlagValue:
    flagvalue = flagvalue.decode('utf-8')
AttributeError: 'str' object has no attribute 'decode'

make_deb.py is incompatible with Python3.
https://github.com/bazelbuild/bazel/issues/8443

PiperOrigin-RevId: 253691923
2019-06-17 17:09:06 -07:00
gVisor bot 99d286370d Internal change.
PiperOrigin-RevId: 253559564
2019-06-17 05:13:55 -07:00
Bhasker Hariharan a8608c501b Enable Receive Buffer Auto-Tuning for runsc.
Updates #230

PiperOrigin-RevId: 253225078
2019-06-14 07:31:45 -07:00
Bhasker Hariharan 3d71c627fa Add support for TCP receive buffer auto tuning.
The implementation is similar to linux where we track the number of bytes
consumed by the application to grow the receive buffer of a given TCP endpoint.

This ensures that the advertised window grows at a reasonable rate to accomodate
for the sender's rate and prevents large amounts of data being held in stack
buffers if the application is not actively reading or not reading fast enough.

The original paper that was used to implement the linux receive buffer auto-
tuning is available @ https://public.lanl.gov/radiant/pubs/drs/lacsi2001.pdf

NOTE: Linux does not implement DRS as defined in that paper, it's just a good
reference to understand the solution space.

Updates #230

PiperOrigin-RevId: 253168283
2019-06-13 22:28:01 -07:00
Ian Gudger 3e9b8ecbfe Plumb context through more layers of filesytem.
All functions which allocate objects containing AtomicRefCounts will soon need
a context.

PiperOrigin-RevId: 253147709
2019-06-13 18:40:38 -07:00
Ian Gudger 0a5ee6f7b2 Fix deadlock in fasync.
The deadlock can occur when both ends of a connected Unix socket which has
FIOASYNC enabled on at least one end are closed at the same time. One end
notifies that it is closing, calling (*waiter.Queue).Notify which takes
waiter.Queue.mu (as a read lock) and then calls (*FileAsync).Callback, which
takes FileAsync.mu. The other end tries to unregister for notifications by
calling (*FileAsync).Unregister, which takes FileAsync.mu and calls
(*waiter.Queue).EventUnregister which takes waiter.Queue.mu.

This is fixed by moving the calls to waiter.Waitable.EventRegister and
waiter.Waitable.EventUnregister outside of the protection of any mutex used
in (*FileAsync).Callback.

The new test is related, but does not cover this particular situation.

Also fix a data race on FileAsync.e.Callback. (*FileAsync).Callback checked
FileAsync.e.Callback under the protection of FileAsync.mu, but the waiter
calling (*FileAsync).Callback could not and did not. This is fixed by making
FileAsync.e.Callback immutable before passing it to the waiter for the first
time.

Fixes #346

PiperOrigin-RevId: 253138340
2019-06-13 17:26:22 -07:00
Rahat Mahmood 05ff1ffaad Implement getsockopt() SO_DOMAIN, SO_PROTOCOL and SO_TYPE.
SO_TYPE was already implemented for everything but netlink sockets.

PiperOrigin-RevId: 253138157
2019-06-13 17:24:51 -07:00
Shentubot 7b0f068258 Merge pull request #306 from amscanne:add_readme
PiperOrigin-RevId: 253137638
2019-06-13 17:20:49 -07:00
Adin Scannell add40fd6ad Update canonical repository.
This can be merged after:
https://github.com/google/gvisor-website/pull/77
  or
https://github.com/google/gvisor-website/pull/78

PiperOrigin-RevId: 253132620
2019-06-13 16:50:15 -07:00
Jamie Liu 0c8603084d Add p9 and unet benchmarks.
PiperOrigin-RevId: 253122166
2019-06-13 15:53:43 -07:00
Ian Lewis 4fdd560b76 Set the HOME environment variable (fixes #293)
runsc will now set the HOME environment variable as required by POSIX. The
user's home directory is retrieved from the /etc/passwd file located on the
container's file system during boot.

PiperOrigin-RevId: 253120627
2019-06-13 15:45:25 -07:00
Bhasker Hariharan 9f77b36fa1 Set optlen correctly when calling getsockopt.
PiperOrigin-RevId: 253096085
2019-06-13 13:41:39 -07:00
Nicolas Lacasse 94ab253d7e Bump rules_go to v0.18.6, and go toolchain to v1.12.6.
PiperOrigin-RevId: 253061432
2019-06-13 10:47:29 -07:00
Adin Scannell e352f46478 Minor BUILD file cleanup.
PiperOrigin-RevId: 252918338
2019-06-12 15:59:46 -07:00
Bhasker Hariharan 70578806e8 Add support for TCP_CONGESTION socket option.
This CL also cleans up the error returned for setting congestion
control which was incorrectly returning EINVAL instead of ENOENT.

PiperOrigin-RevId: 252889093
2019-06-12 13:35:50 -07:00
Andrei Vagin bb849bad29 gvisor/runsc: apply seccomp filters before parsing a state file
PiperOrigin-RevId: 252869983
2019-06-12 11:55:24 -07:00
Andrei Vagin 0d05a12fd3 gvisor/ptrace: print guest registers if a stub stopped with unexpected code
PiperOrigin-RevId: 252855280
2019-06-12 10:48:46 -07:00
Fabricio Voznika 356d1be140 Allow 'runsc do' to run without root
'--rootless' flag lets a non-root user execute 'runsc do'.
The drawback is that the sandbox and gofer processes will
run as root inside a user namespace that is mapped to the
caller's user, intead of nobody. And network is defaulted
to '--network=host' inside the root network namespace. On
the bright side, it's very convenient for testing:

runsc --rootless do ls
runsc --rootless do curl www.google.com

PiperOrigin-RevId: 252840970
2019-06-12 09:41:50 -07:00
Adin Scannell df110ad4fe Eat sendfile partial error
For sendfile(2), we propagate a TCP error through the system call layer.
This should be eaten if there is a partial result. This change also adds
a test to ensure that there is no panic in this case, for both TCP sockets
and unix domain sockets.

PiperOrigin-RevId: 252746192
2019-06-11 19:24:35 -07:00
Andrei Vagin 69c8657a66 kokoro: don't overwrite test results for different runtimes
PiperOrigin-RevId: 252724255
2019-06-11 16:36:53 -07:00
Michael Pratt 478a0873e1 Explicitly reference workspace root in test command
oh-my-zsh aliases ... to ../.. [1]. Add an explicit reference
to workspace root to work around the alias.

[1] https://github.com/robbyrussell/oh-my-zsh/blob/master/lib/directories.zsh

Fixes #341

PiperOrigin-RevId: 252720590
2019-06-11 16:15:39 -07:00
Fabricio Voznika fc746efa9a Add support to mount pod shared tmpfs mounts
Parse annotations containing 'gvisor.dev/spec/mount' that gives
hints about how mounts are shared between containers inside a
pod. This information can be used to better inform how to mount
these volumes inside gVisor. For example, a volume that is shared
between containers inside a pod can be bind mounted inside the
sandbox, instead of being two independent mounts.

For now, this information is used to allow the same tmpfs mounts
to be shared between containers which wasn't possible before.

PiperOrigin-RevId: 252704037
2019-06-11 14:54:31 -07:00
Fabricio Voznika 847c4b9759 Use net.HardwareAddr for FDBasedLink.LinkAddress
It prints formatted to the log.

PiperOrigin-RevId: 252699551
2019-06-11 14:31:46 -07:00
Fabricio Voznika a775ae82fe Fix broken pipe error building version file
(11:34:09) ERROR: /tmpfs/src/github/repo/runsc/BUILD:82:1: Couldn't build file runsc/version.txt: Executing genrule //runsc:deb-version failed (Broken pipe): bash failed: error executing command

PiperOrigin-RevId: 252691902
2019-06-11 13:56:32 -07:00
Andrei Vagin 307a9854ed gvisor/test: create a per-testcase directory for runsc logs
Otherwise it's hard to find a directory for a specific test case.

PiperOrigin-RevId: 252636901
2019-06-11 09:38:07 -07:00
Ian Lewis 74e397e39a Add introspection for Linux/AMD64 syscalls
Adds simple introspection for syscall compatibility information to Linux/AMD64.
Syscalls registered in the syscall table now have associated metadata like
name, support level, notes, and URLs to relevant issues.

Syscall information can be exported as a table, JSON, or CSV using the new
'runsc help syscalls' command. Users can use this info to debug and get info
on the compatibility of the version of runsc they are running or to generate
documentation.

PiperOrigin-RevId: 252558304
2019-06-10 23:38:36 -07:00
Jamie Liu 589f36ac4a Move //pkg/sentry/platform/procid to //pkg/procid.
PiperOrigin-RevId: 252501653
2019-06-10 15:47:25 -07:00
Bhasker Hariharan 3933dd5c04 Fixes to listen backlog handling.
Changes netstack to confirm to current linux behaviour where if the backlog is
full then we drop the SYN and do not send a SYN-ACK. Similarly we allow upto
backlog connections to be in SYN-RCVD state as long as the backlog is not full.

We also now drop a SYN if syn cookies are in use and the backlog for the
listening endpoint is full.

Added new tests to confirm the behaviour.

Also reverted the change to increase the backlog in TcpPortReuseMultiThread
syscall test.

Fixes #236

PiperOrigin-RevId: 252500462
2019-06-10 15:40:44 -07:00
Rahat Mahmood a00157cc0e Store more information in the kernel socket table.
Store enough information in the kernel socket table to distinguish
between different types of sockets. Previously we were only storing
the socket family, but this isn't enough to classify sockets. For
example, TCPv4 and UDPv4 sockets are both AF_INET, and ICMP sockets
are SOCK_DGRAM sockets with a particular protocol.

Instead of creating more sub-tables, flatten the socket table and
provide a filtering mechanism based on the socket entry.

Also generate and store a socket entry index ("sl" in linux) which
allows us to output entries in a stable order from procfs.

PiperOrigin-RevId: 252495895
2019-06-10 15:17:43 -07:00
Jamie Liu 48961d27a8 Move //pkg/sentry/memutil to //pkg/memutil.
PiperOrigin-RevId: 252124156
2019-06-07 14:52:27 -07:00
Adin Scannell e5fb3aab12 BUILD: Use runsc to generate version
This also ensures BUILD files are correctly formatted.

PiperOrigin-RevId: 251990267
2019-06-06 22:09:55 -07:00
Jamie Liu c933f3eede Change visibility of //pkg/sentry/time.
PiperOrigin-RevId: 251965598
2019-06-06 17:58:55 -07:00
Fabricio Voznika 2e43dcb26b Add alsologtostderr option
When set sends log messages to the error log:

sudo ./runsc --logtostderr do ls
I0531 17:59:58.105064  144564 x:0] ***************************
I0531 17:59:58.105087  144564 x:0] Args: [runsc --logtostderr do ls]
I0531 17:59:58.105112  144564 x:0] PID: 144564
I0531 17:59:58.105125  144564 x:0] UID: 0, GID: 0
[...]

PiperOrigin-RevId: 251964377
2019-06-06 17:50:07 -07:00
Jamie Liu 9ea248489b Cap initial usermem.CopyStringIn buffer size.
Almost (?) all uses of CopyStringIn are via linux.copyInPath(), which
passes maxlen = linux.PATH_MAX = 4096. Pre-allocating a buffer of this
size is measurably inefficient in most cases: most paths will not be
this long, 4 KB is a lot of bytes to zero, and as of this writing the Go
runtime allocator maps only two 4 KB objects to each 8 KB span,
necessitating a call to runtime.mcache.refill() on ~every other call.
Limit the initial buffer size to 256 B instead, and geometrically
reallocate if necessary.

PiperOrigin-RevId: 251960441
2019-06-06 17:22:00 -07:00
Rahat Mahmood 315cf9a523 Use common definition of SockType.
SockType isn't specific to unix domain sockets, and the current
definition basically mirrors the linux ABI's definition.

PiperOrigin-RevId: 251956740
2019-06-06 17:00:27 -07:00
Ian Lewis 6a4c006564 Add the gVisor gitter badge to the README
Moves the build badge to just below the logo and adds the gitter badge next to
it for consistency.

PiperOrigin-RevId: 251956383
2019-06-06 16:58:29 -07:00