Commit Graph

81 Commits

Author SHA1 Message Date
Bhasker Hariharan ebc81fadfc Add openat() to list of permitted syscalls in gotsan runs.
PiperOrigin-RevId: 333853498
2020-09-25 19:36:01 -07:00
Michael Pratt 13a9a622e1 Allow CLONE_SETTLS for Go 1.16
https://go.googlesource.com/go/+/0941fc3 switches the Go runtime (on amd64)
from using arch_prctl(ARCH_SET_FS) to CLONE_SETTLS to set the TLS.

PiperOrigin-RevId: 333100550
2020-09-22 09:58:09 -07:00
Michael Pratt f134f873fc Force clone parent_tidptr and child_tidptr to zero
Neither CLONE_PARENT_SETTID nor CLONE_CHILD_SETTID are used, so these arguments
will always be NULL.

PiperOrigin-RevId: 333085326
2020-09-22 08:41:07 -07:00
Michael Pratt 313e1988c4 Drop ARCH_GET_FS
Go does not call arch_prctl(ARCH_GET_FS), nor am I sure it ever did. Drop the
filter.

PiperOrigin-RevId: 332470532
2020-09-18 09:57:27 -07:00
Fabricio Voznika da07e38f7c Remove option to panic gofer
Gofer panics are suppressed by p9 server and an error
is returned to the caller, making it effectively the
same as returning EROFS.

PiperOrigin-RevId: 332282959
2020-09-17 12:01:45 -07:00
Ian Lewis dcd532e2e4 Add support for OCI seccomp filters in the sandbox.
OCI configuration includes support for specifying seccomp filters. In runc,
these filter configurations are converted into seccomp BPF programs and loaded
into the kernel via libseccomp. runsc needs to be a static binary so, for
runsc, we cannot rely on a C library and need to implement the functionality
in Go.

The generator added here implements basic support for taking OCI seccomp
configuration and converting it into a seccomp BPF program with the same
behavior as a program generated by libseccomp.

- New conditional operations were added to pkg/seccomp to support operations
  available in OCI.
- AllowAny and AllowValue were renamed to MatchAny and EqualTo to better reflect
  that syscalls matching the conditionals result in the provided action not
  simply SCMP_RET_ALLOW.
- BuildProgram in pkg/seccomp no longer panics if provided an empty list of
  rules. It now builds a program with the architecture sanity check only.
- ProgramBuilder now allows adding labels that are unused. However, backwards
  jumps are still not permitted.

Fixes #510

PiperOrigin-RevId: 331938697
2020-09-15 23:19:17 -07:00
Fabricio Voznika 37a217aca4 Implement setattr+clunk in 9P
This is to cover the common pattern: open->read/write->close,
where SetAttr needs to be called to update atime/mtime before
the file is closed.

Benchmark results:

BM_OpenReadClose/10240 CPU
setattr+clunk: 63783 ns
VFS2:          68109 ns
VFS1:          72507 ns

Updates #1198

PiperOrigin-RevId: 329628461
2020-09-01 19:22:12 -07:00
Fabricio Voznika be76c7ce6e Move boot.Config to its own package
Updates #3494

PiperOrigin-RevId: 327548511
2020-08-19 18:37:42 -07:00
Fabricio Voznika 6335704625 Remove path walk from localFile.Mknod
Replace mknod call with mknodat equivalent to protect
against symlink attacks. Also added Mknod tests.

Remove goferfs reliance on gofer to check for file
existence before creating a synthetic entry.

Updates #2923

PiperOrigin-RevId: 327544516
2020-08-19 18:05:54 -07:00
Fabricio Voznika 760c131da1 Return EROFS if mount is read-only
PiperOrigin-RevId: 327300635
2020-08-18 13:58:42 -07:00
Jamie Liu 1b11326ecd Call lseek(0, SEEK_CUR) unconditionally in runsc fsgofer's Readdir(offset=0).
9P2000.L is silent as to how readdir RPCs interact with directory mutation. The
most performant option is for Treaddir with offset=0 to restart iteration,
avoiding needing to walk+open+clunk a new directory fid between invocations of
getdents64(2), and the VFS2 gofer client assumes this is the case. Make this
actually true for the runsc fsgofer.

Fixes #3344, #3345, #3355

PiperOrigin-RevId: 324090384
2020-07-30 15:02:22 -07:00
Andrei Vagin d6b676ae6a test/syscall: run each test case in a separate network namespace
... when it is possible.

The guitar gVisorKernel*Workflow-s runs test with the local execution_method.
In this case, blaze runs test cases locally without sandboxes. This means
that all tests run in the same network namespace. We have a few tests which
use hard-coded network ports and they can fail if one of these port will be
used by someone else or by another test cases.

PiperOrigin-RevId: 323137254
2020-07-25 01:04:45 -07:00
Fabricio Voznika bd97206fa1 Reduce walk and open cost in fsgofer
Implement WalkGetAttr() to reuse the stat that is already
needed for Walk(). In addition, cache file QID, so it
doesn't need to stat the file to compute it.

open(2) time improved by 10%:

Baseline: 6780 ns
Change: 6083 ns

Also fixed file type which was not being set in all places.

PiperOrigin-RevId: 323102560
2020-07-24 17:20:33 -07:00
Fabricio Voznika 384369e01e Fix fsgofer Open() when control file is using O_PATH
Open tries to reuse the control file to save syscalls and
file descriptors when opening a file. However, when the
control file was opened using O_PATH (e.g. no file permission
to open readonly), Open() would not check for it.

PiperOrigin-RevId: 322821729
2020-07-23 11:02:35 -07:00
Ayush Ranjan 10930189c3 Fix mknod and inotify syscall test
This change fixes a few things:
- creating sockets using mknod(2) is supported via vfs2
- fsgofer can create regular files via mknod(2)
- mode = 0 for mknod(2) will be interpreted as regular file in vfs2 as well

Updates #2923

PiperOrigin-RevId: 320074267
2020-07-07 15:35:01 -07:00
Fabricio Voznika bae1475603 Print spec as json when --debug is enabled
The previous format skipped many important structs that
are pointers, especially for cgroups. Change to print
as json, removing parts of the spec that are not relevant.

Also removed debug message from gofer that can be very
noisy when directories are large.

PiperOrigin-RevId: 316713267
2020-06-16 10:52:17 -07:00
Fabricio Voznika f7418e2159 Move Cleanup to its own package
PiperOrigin-RevId: 313663382
2020-05-28 14:49:06 -07:00
Dean Deng fc72eb3595 Remove TODOs for local gofer extended attributes.
PiperOrigin-RevId: 305344989
2020-04-07 14:48:40 -07:00
Dean Deng 17b9f5e662 Support listxattr and removexattr syscalls.
Note that these are only implemented for tmpfs, and other impls will still
return EOPNOTSUPP.

PiperOrigin-RevId: 293899385
2020-02-07 14:47:13 -08:00
Fabricio Voznika 6d8bf405bc Allow mlock in fsgofer system call filters
Go 1.14 has a workaround for a Linux 5.2-5.4 bug which requires mlock'ing the g
stack to prevent register corruption. We need to allow this syscall until it is
removed from Go.

PiperOrigin-RevId: 293212935
2020-02-04 13:42:27 -08:00
Adin Scannell d29e59af9f Standardize on tools directory.
PiperOrigin-RevId: 291745021
2020-01-27 12:21:00 -08:00
Dean Deng 07f2584979 Plumb getting/setting xattrs through InodeOperations and 9p gofer interfaces.
There was a very bare get/setxattr in the InodeOperations interface. Add
context.Context to both, size to getxattr, and flags to setxattr.
Note that extended attributes are passed around as strings in this
implementation, so size is automatically encoded into the value. Size is
added in getxattr so that implementations can return ERANGE if a value is larger
than can fit in the user-allocated buffer. This prevents us from unnecessarily
passing around an arbitrarily large xattr when the user buffer is actually too
small.

Don't use the existing xattrwalk and xattrcreate messages and define our
own, mainly for the sake of simplicity.

Extended attributes will be implemented in future commits.

PiperOrigin-RevId: 290121300
2020-01-16 12:56:33 -08:00
Ian Gudger 27500d529f New sync package.
* Rename syncutil to sync.
* Add aliases to sync types.
* Replace existing usage of standard library sync package.

This will make it easier to swap out synchronization primitives. For example,
this will allow us to use primitives from github.com/sasha-s/go-deadlock to
check for lock ordering violations.

Updates #1472

PiperOrigin-RevId: 289033387
2020-01-09 22:02:24 -08:00
Dean Deng 1643224af0 Finish incomplete comment.
PiperOrigin-RevId: 285012278
2019-12-11 10:37:35 -08:00
Jianfeng Tan f697d1a33e gofer: reduce CPU usage on GC as of frequent readdir
Refer to golang mallocgc(), each time of allocating an object > 32 KB,
a gc will be triggered.

When we do readdir, sentry always passes 65535, which leads to a malloc
of 65535 * sizeof(p9.Direnta) > 32 KB.

Considering we already use slice append, let's avoid defining the
capability for this slide.

Command for test:

Before this change:

  (container)$ time tree linux-5.3.1 > /dev/null

  real    0m54.272s
  user    0m2.010s
  sys     0m1.740s
  (CPU usage of Gofer: ~30 cores)

  (host)$ perf top -p <pid-of-gofer>

    42.57%  runsc        [.] runtime.gcDrain
    23.41%  runsc        [.] runtime.(*lfstack).pop
     9.74%  runsc        [.] runtime.greyobject
     8.06%  runsc        [.] runtime.(*lfstack).push
     4.33%  runsc        [.] runtime.scanobject
     1.69%  runsc        [.] runtime.findObject
     1.12%  runsc        [.] runtime.findrunnable
     0.69%  runsc        [.] runtime.runqgrab
    ...

  (host)$ mkdir test && cd test
  (host)$ for i in `seq 1 65536`; do mkdir $i; done
  (container)$ time ls test/ > /dev/null

  real    2m10.934s
  user    0m0.280s
  sys     0m4.260s
  (CPU usage of Gofer: ~1 core)

After this change:

  (container)$ time tree linux-5.3.1 > /dev/null

  real    0m22.465s
  user    0m1.270s
  sys     0m1.310s
  (CPU usage of Gofer: ~1 core)

  $ perf top -p <pid-of-gofer>

    20.57%  runsc        [.] runtime.gcDrain
     7.15%  runsc        [.] runtime.(*lfstack).pop
     4.11%  runsc        [.] runtime.scanobject
     3.78%  runsc        [.] runtime.greyobject
     2.78%  runsc        [.] runtime.(*lfstack).push
    ...

  (host)$ mkdir test && cd test
  (host)$ for i in `seq 1 65536`; do mkdir $i; done
  (container)$ time ls test/ > /dev/null

  real    0m13.338s
  user    0m0.190s
  sys     0m3.980s
  (CPU usage of Gofer: ~0.8 core)

Fixes #898

Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com>
2019-11-23 13:24:46 +00:00
Jamie Liu f8ffadddb3 Add p9.OpenTruncate.
This is required to implement O_TRUNC correctly on filesystems backed by
gofers.

9P2000.L: "lopen prepares fid for file I/O. flags contains Linux open(2) flags
bits, e.g. O_RDONLY, O_RDWR, O_WRONLY."

open(2): "The argument flags must include one of the following access modes:
O_RDONLY, O_WRONLY, or O_RDWR. ... In addition, zero or more file creation
flags and file status flags can be bitwise-or'd in flags."

The reference 9P2000.L implementation also appears to expect arbitrary flags,
not just access modes, in Tlopen.flags:
https://github.com/chaos/diod/blob/master/diod/ops.c#L703

PiperOrigin-RevId: 278972683
2019-11-06 17:11:58 -08:00
Haibo Xu 80d0db274e Enable runsc/fsgofer support on arm64.
newfstatat() syscall is not supported on arm64, so we resort
to use the fstatat() syscall.

Signed-off-by: Haibo Xu <haibo.xu@arm.com>
Change-Id: I9e89d46c5ec9ae07db201c9da5b6dda9bfd2eaf0
2019-10-30 05:21:36 +00:00
Haibo Xu dec831b493 Cast the Stat_t.Nlink to uint64 on arm64.
Since the syscall.Stat_t.Nlink is defined as different types on
amd64 and arm64(uint64 and uint32 respectively), we need to cast
them to a unified uint64 type in gVisor code.

Signed-off-by: Haibo Xu <haibo.xu@arm.com>
Change-Id: I7542b99b195c708f3fc49b1cbe6adebdd2f6e96b
2019-10-28 05:56:03 +00:00
Michael Pratt 49b596b98d Cleanup host UDS support
This change fixes several issues with the fsgofer host UDS support. Notably, it
adds support for SOCK_SEQPACKET and SOCK_DGRAM sockets [1]. It also fixes
unsafe use of unet.Socket, which could cause a panic if Socket.FD is called
when err != nil, and calls to Socket.FD with nothing to prevent the garbage
collector from destroying and closing the socket.

A set of tests is added to exercise host UDS access. This required extracting
most of the syscall test runner into a library that can be used by custom
tests.

Updates #235
Updates #1003

[1] N.B. SOCK_DGRAM sockets are likely not particularly useful, as a server can
only reply to a client that binds first. We don't allow bind, so these are
unlikely to be used.

PiperOrigin-RevId: 275558502
2019-10-18 15:33:03 -07:00
Michael Pratt a295616326 Make Attach no longer a special snowflake
fsgofer.attachPoint.Attach has a bunch of funky special logic to create a RW
file or connect a socket rather than creating a standard control file like
localFile.Walk.

This is unecessary and error-prone, as the attach point still has to go through
Open or Connect which will properly convert the control file to something
usable. As such, switch the logic to be equivalent to a simple Walk.

Updates #235

PiperOrigin-RevId: 274827872
2019-10-15 10:01:22 -07:00
Michael Pratt a5170fd825 Allow rt_sigreturn in runsc gofer
rt_sigreturn is required for signal handling (e.g., SIGSEGV for nil-pointer
dereference). Before this, nil-pointer dereferences cause a syscall violation
instead of a panic.

PiperOrigin-RevId: 274028767
2019-10-10 13:41:29 -07:00
Fabricio Voznika 8337e4f509 Disallow opening of sockets if --fsgofer-host-uds=false
Updates #235

PiperOrigin-RevId: 271475319
2019-09-26 18:16:02 -07:00
gVisor bot dd0e5eedae Merge pull request #765 from trailofbits:uds_support
PiperOrigin-RevId: 271235134
2019-09-25 16:44:22 -07:00
Robert Tonic 9ebd498a55 Remove unecessary seccomp permission.
This removes the F_DUPFD_CLOEXEC support for the gofer, previously 
required when depending on the STL net package.
2019-09-24 18:37:25 -04:00
Robert Tonic 7810b30983 Refactor command line options and remove the allowed terminology for uds 2019-09-24 18:24:10 -04:00
Robert Tonic e975184bc5 Update InstallUDSFilters documentation to be accurate to functionality. 2019-09-19 17:44:46 -04:00
Robert Tonic 46beb91912 Fix documentation, clean up seccomp filter installation, rename helpers.
Filter installation has been streamlined and functions renamed. 
Documentation has been fixed to be standards compliant, and missing 
documentation added. gofmt has also been applied to modified files.
2019-09-19 17:10:50 -04:00
Robert Tonic ac38a7ead0 Place the host UDS mounting behind --fsgofer-host-uds-allowed.
This commit allows the use of the `--fsgofer-host-uds-allowed` flag to 
enable mounting sockets and add the appropriate seccomp filters.
2019-09-19 12:37:15 -04:00
Adin Scannell a8834fc555 Update p9 to support flipcall.
PiperOrigin-RevId: 268845090
2019-09-12 23:37:31 -07:00
Robert Tonic c2ae77a607 Apply go fmt to the fsgofer changes. 2019-09-05 17:12:09 -04:00
Robert Tonic 4288a57883 Remove seccomp permissions, and clean up the Attach logic. 2019-09-05 15:26:16 -04:00
Robert Tonic 07d329d89f Restrict seccomp filters for UDS support.
This commit further restricts the seccomp filters required for Gofer 
access ot Unix Domain Sockets (UDS).
2019-08-27 13:08:56 -04:00
Robert Tonic c319b360d1 First pass at implementing Unix Domain Socket support. No tests.
This commit adds support for detecting the socket file type, connecting 
to a Unix Domain Socket, and providing bidirectional communication 
(without file descriptor transfer support).
2019-08-27 13:08:56 -04:00
Andrei Vagin c9c52c024c fsgofer_test.go: a socket file path can't be longer than UNIX_MAX_PATH (108)
PiperOrigin-RevId: 265478578
2019-08-26 09:59:43 -07:00
Fabricio Voznika c386f046c1 Fix file mode check in fsgofer Attach
PiperOrigin-RevId: 263189654
2019-08-13 12:23:02 -07:00
Haibo Xu 1decf76471 Change syscall.POLL to syscall.PPOLL.
syscall.POLL is not supported on arm64, using syscall.PPOLL
to support both the x86 and arm64. refs #63

Signed-off-by: Haibo Xu <haibo.xu@arm.com>
Change-Id: I2c81a063d3ec4e7e6b38fe62f17a0924977f505e
COPYBARA_INTEGRATE_REVIEW=https://github.com/google/gvisor/pull/543 from xiaobo55x:master ba598263fd3748d1addd48e4194080aa12085164
PiperOrigin-RevId: 260752049
2019-07-30 11:01:29 -07:00
Michael Pratt 5b41ba5d0e Fix various spelling issues in the documentation
Addresses obvious typos, in the documentation only.

COPYBARA_INTEGRATE_REVIEW=https://github.com/google/gvisor/pull/443 from Pixep:fix/documentation-spelling 4d0688164eafaf0b3010e5f4824b35d1e7176d65
PiperOrigin-RevId: 255477779
2019-06-27 14:25:50 -07:00
Andrei Vagin fd16a329ce fsgopher: reopen files via /proc/self/fd
When we reopen file by path, we can't be sure that
we will open exactly the same file. The file can be
deleted and another one with the same name can be
created.

PiperOrigin-RevId: 254898594
2019-06-24 21:44:27 -07:00
Adin Scannell add40fd6ad Update canonical repository.
This can be merged after:
https://github.com/google/gvisor-website/pull/77
  or
https://github.com/google/gvisor-website/pull/78

PiperOrigin-RevId: 253132620
2019-06-13 16:50:15 -07:00
Michael Pratt 4a842836e5 Return EPERM for mknod
This more directly matches what Linux does with unsupported
nodes.

PiperOrigin-RevId: 248780425
Change-Id: I17f3dd0b244f6dc4eb00e2e42344851b8367fbec
2019-05-17 13:47:40 -07:00