Commit Graph

180 Commits

Author SHA1 Message Date
Ian Lewis dcd532e2e4 Add support for OCI seccomp filters in the sandbox.
OCI configuration includes support for specifying seccomp filters. In runc,
these filter configurations are converted into seccomp BPF programs and loaded
into the kernel via libseccomp. runsc needs to be a static binary so, for
runsc, we cannot rely on a C library and need to implement the functionality
in Go.

The generator added here implements basic support for taking OCI seccomp
configuration and converting it into a seccomp BPF program with the same
behavior as a program generated by libseccomp.

- New conditional operations were added to pkg/seccomp to support operations
  available in OCI.
- AllowAny and AllowValue were renamed to MatchAny and EqualTo to better reflect
  that syscalls matching the conditionals result in the provided action not
  simply SCMP_RET_ALLOW.
- BuildProgram in pkg/seccomp no longer panics if provided an empty list of
  rules. It now builds a program with the architecture sanity check only.
- ProgramBuilder now allows adding labels that are unused. However, backwards
  jumps are still not permitted.

Fixes #510

PiperOrigin-RevId: 331938697
2020-09-15 23:19:17 -07:00
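As a rough illustration of the conditional operations mentioned in the commit above, the sketch below evaluates OCI-style seccomp argument comparisons in plain Go. It is not the pkg/seccomp API and does not generate BPF. The `ArgCond` type and its `Matches` method are hypothetical names introduced here, and only the operator strings are taken from the OCI runtime spec.

```go
package main

import "fmt"

// Op is an OCI seccomp comparison operator name.
type Op string

const (
	OpNotEqual     Op = "SCMP_CMP_NE"
	OpLessThan     Op = "SCMP_CMP_LT"
	OpLessEqual    Op = "SCMP_CMP_LE"
	OpEqualTo      Op = "SCMP_CMP_EQ"
	OpGreaterEqual Op = "SCMP_CMP_GE"
	OpGreaterThan  Op = "SCMP_CMP_GT"
	OpMaskedEqual  Op = "SCMP_CMP_MASKED_EQ"
)

// ArgCond is a hypothetical mirror of one OCI seccomp argument condition.
type ArgCond struct {
	Index    uint   // which syscall argument to inspect (0-5)
	Value    uint64 // comparison operand, or the mask for MASKED_EQ
	ValueTwo uint64 // expected value after masking (MASKED_EQ only)
	Op       Op
}

// Matches reports whether the given syscall arguments satisfy the condition.
func (c ArgCond) Matches(args [6]uint64) (bool, error) {
	if c.Index >= uint(len(args)) {
		return false, fmt.Errorf("argument index %d out of range", c.Index)
	}
	arg := args[c.Index]
	switch c.Op {
	case OpNotEqual:
		return arg != c.Value, nil
	case OpLessThan:
		return arg < c.Value, nil
	case OpLessEqual:
		return arg <= c.Value, nil
	case OpEqualTo:
		return arg == c.Value, nil
	case OpGreaterEqual:
		return arg >= c.Value, nil
	case OpGreaterThan:
		return arg > c.Value, nil
	case OpMaskedEqual:
		// Mask the argument with Value, then compare against ValueTwo.
		return arg&c.Value == c.ValueTwo, nil
	default:
		return false, fmt.Errorf("unsupported operator %q", c.Op)
	}
}

func main() {
	cond := ArgCond{Index: 1, Value: 0xff00, ValueTwo: 0x1200, Op: OpMaskedEqual}
	ok, err := cond.Matches([6]uint64{0, 0x1234})
	fmt.Println(ok, err)
}
```

A real generator would emit the equivalent comparisons as BPF instructions over the seccomp data structure. The sketch only shows the matching semantics that decide whether a rule's action applies.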
Chong Cai cb2e3c946a Implement gvisor verity fs ioctl with GETFLAGS
PiperOrigin-RevId: 331905347
2020-09-15 19:01:59 -07:00
Rahat Mahmood 3ca73841d7 Move the 'marshal' and 'primitive' packages to the 'pkg' directory.
PiperOrigin-RevId: 331256608
2020-09-11 17:42:49 -07:00
gVisor bot 360f1535c7 Implement ioctl with enable verity
ioctl with FS_IOC_ENABLE_VERITY is added to verity file system to enable
a file as verity file. For a file, a Merkle tree is built with its data.
For a directory, a Merkle tree is built with the root hashes of its
children.

PiperOrigin-RevId: 330604368
2020-09-08 15:54:21 -07:00
Ayush Ranjan 2eaf54dd59 Refactor tty codebase to use master-replica terminology.
Updates #2972

PiperOrigin-RevId: 329584905
2020-09-01 14:43:41 -07:00
Ayush Ranjan 723fb5c116 [go-marshal] Enable auto-marshalling for fs/tty.
PiperOrigin-RevId: 329564614
2020-09-01 13:02:17 -07:00
Rahat Mahmood b4820e5986 Implement StatFS for various VFS2 filesystems.
This mainly involved enabling kernfs' client filesystems to provide a
StatFS implementation.

Fixes #3411, #3515.

PiperOrigin-RevId: 329009864
2020-08-28 14:31:11 -07:00
Kevin Krakauer 01a35a2f19 ip6tables: (de)serialize ip6tables structs
More implementation+testing to follow.

#3549.

PiperOrigin-RevId: 328770160
2020-08-27 10:53:49 -07:00
Nicolas Lacasse 83a8b309e9 tmpfs: Allow xattrs in the trusted namespace if creds has CAP_SYS_ADMIN.
This is needed to support the overlay opaque attribute.

PiperOrigin-RevId: 328552985
2020-08-26 10:05:34 -07:00
Jamie Liu 247dcd62d4 Return non-zero size for tmpfs statfs(2).
This does not implement accepting or enforcing any size limit, which will be
more complex and has performance implications; it just returns a fixed non-zero
size.

Updates #1936

PiperOrigin-RevId: 328428588
2020-08-25 16:40:02 -07:00
Dean Deng cb573c8e0b Expose basic coverage information to userspace through kcov interface.
In Linux, a kernel configuration is set that compiles the kernel with a
custom function that is called at the beginning of every basic block, which
updates the memory-mapped coverage information. The Go coverage tool does not
allow us to inject arbitrary instructions into basic blocks, but it does
provide data that we can convert to a kcov-like format and transfer them to
userspace through a memory mapping.

Note that this is not a strict implementation of kcov, which is especially
tricky to do because we do not have the same coverage tools available in Go
that are available for the actual Linux kernel. In Linux, a kernel
configuration is set that compiles the kernel with a custom function that is
called at the beginning of every basic block to write program counters to the
kcov memory mapping. In Go, however, coverage tools only give us a count of
basic blocks as they are executed. Every time we return to userspace, we
collect the coverage information and write out PCs for each block that was
executed, providing userspace with the illusion that the kcov data is always
up to date. For convenience, we also generate a unique synthetic PC for each
block instead of using actual PCs. Finally, we do not provide thread-specific
coverage data (each kcov instance only contains PCs executed by the thread
owning it); instead, we will supply data for any file specified by
--instrumentation_filter.

Also, fix issue in nogo that was causing pkg/coverage:coverage_nogo
compilation to fail.

PiperOrigin-RevId: 328426526
2020-08-25 16:28:45 -07:00
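The conversion from per-block execution counts to a kcov-like list of PCs, as described in the commit above, might look roughly like the sketch below. It assumes a buffer in the Linux kcov style, where the first word holds the number of PCs that follow. The `syntheticPC` encoding (file index in the high 32 bits, block index in the low 32 bits) and the `flush` helper are hypothetical and are not taken from the gVisor implementation.

```go
package main

import "fmt"

// syntheticPC builds a made-up, unique program counter for a basic block
// from its file index and block index (an assumed encoding).
func syntheticPC(file, block int) uint64 {
	return uint64(file)<<32 | uint64(block)
}

// flush writes one synthetic PC into area for every block whose counter is
// non-zero, then clears that counter. area[0] holds the number of PCs stored
// in area[1:]. PCs that do not fit in the buffer are dropped.
func flush(counters [][]uint32, area []uint64) {
	if len(area) == 0 {
		return
	}
	for f, blocks := range counters {
		for b, n := range blocks {
			if n == 0 {
				continue
			}
			blocks[b] = 0
			next := area[0] + 1
			if next >= uint64(len(area)) {
				continue // buffer full
			}
			area[next] = syntheticPC(f, b)
			area[0] = next
		}
	}
}

func main() {
	// Two instrumented files; counts as a coverage tool might report them.
	counters := [][]uint32{{0, 3}, {1, 0, 2}}
	area := make([]uint64, 8)
	flush(counters, area)
	fmt.Println(area[0], area[1:area[0]+1])
}
```

Calling something like `flush` each time control returns to userspace is what gives the appearance of continuously updated coverage, even though the counters are only inspected at those points.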
Ayush Ranjan 430487c9e7 [go-marshal] Enable auto-marshalling for host tty.
PiperOrigin-RevId: 328415633
2020-08-25 15:29:03 -07:00
Kevin Krakauer d50f2e2c76 ip6tables: ABI structs and constants
Part of #3549.

PiperOrigin-RevId: 326329028
2020-08-12 16:20:51 -07:00
Craig Chi 51e64d2fc5 Implement FUSE_GETATTR
FUSE_GETATTR is called when a stat(2), fstat(2), or lstat(2) is issued
from VFS2 layer to a FUSE filesystem.

Fixes #3175
2020-08-10 18:15:32 -07:00
gVisor bot 7142a86a2c Internal change.
PiperOrigin-RevId: 324819246
2020-08-04 08:49:02 -07:00
Kevin Krakauer 2a7b2a61e3 iptables: support SO_ORIGINAL_DST
Envoy (#170) uses this to get the original destination of redirected
packets.
2020-07-31 10:47:26 -07:00
Jinmou Li 2e19a8b951 Add FUSE_INIT
This change allows the sentry to send FUSE_INIT request and process
the reply. It adds the corresponding structs, employs the fuse
device to send and read the message, and stores the results of negotiation
in corresponding places (inside connection struct).

It adds a CallAsync() function to the FUSE connection interface:

- like Call(), but it's for requests that do not expect immediate response (init, release, interrupt etc.)
- will block if the connection hasn't initialized, which is the same for Call()
2020-07-29 22:52:12 +00:00
Ridwan Sharif 112eb0c5b9 Add device implementation for /dev/fuse
This PR adds the following:
  - [x] Marshall-able structs for fuse headers
  - [x] Data structures needed in /dev/fuse to communicate with the daemon server
  - [x] Implementation of the device interface
  - [x] Go unit tests

This change adds the `/dev/fuse` implementation. `Connection` controls the
communication between the server and the sentry.  The FUSE server uses
the `FileDescription` interface to interact with the Sentry. The Sentry
implementation of fusefs uses `Connection` and the Connection interface
to interact with the Server. All communication messages are in the form
of `go_marshal` backed structs defined in the ABI package.

This change also adds some go unit tests that test (pretty basically)
the interfaces and should be used as an example of an end to end FUSE
operation.

COPYBARA_INTEGRATE_REVIEW=https://github.com/google/gvisor/pull/3083 from ridwanmsharif:ridwanmsharif/fuse-device-impl 69aa2ce970004938fe9f918168dfe57636ab856e
PiperOrigin-RevId: 323428180
2020-07-27 13:34:44 -07:00
Ayush Ranjan e2c70ee981 Enable automated marshalling for netstack.
PiperOrigin-RevId: 322954792
2020-07-24 01:25:39 -07:00
Nicolas Lacasse 4ec3516332 Implement get/set_robust_list.
PiperOrigin-RevId: 322904430
2020-07-23 17:42:50 -07:00
Ayush Ranjan 6f7f739967 Marshallable socket options.
Socket option values are now required to implement marshal.Marshallable.

Co-authored-by: Rahat Mahmood <rahat@google.com>
PiperOrigin-RevId: 322831612
2020-07-23 11:45:10 -07:00
Bhasker Hariharan 71bf90c55b Support for receiving outbound packets in AF_PACKET.
Updates #173

PiperOrigin-RevId: 322665518
2020-07-22 15:33:33 -07:00
Bhasker Hariharan fef90c61c6 Fix minor bugs in a couple of interface IOCTLs.
gVisor returns the wrong ARP type for SIOCGIFHWADDR. This breaks
tcpdump as it tries to interpret the packets incorrectly.

Similarly, SIOCETHTOOL is used by tcpdump to query interface properties which
fails with an EINVAL since we don't implement it. For now change it to return
EOPNOTSUPP to indicate that we don't support the query rather than return
EINVAL.

NOTE: ARPHRD types for link endpoints are distinct from NIC capabilities
and NIC flags. In Linux all three exist: ARPHRD types are stored in the
dev->type field; NIC capabilities are more like device features, which can be
queried using SIOCETHTOOL but not modified; and NIC flags are fields that can
be modified from user space, e.g. NIC status (UP/DOWN/MULTICAST/BROADCAST).

Updates #2746

PiperOrigin-RevId: 321436525
2020-07-15 14:15:44 -07:00
Dean Deng 6c099d8300 Update preadv2/pwritev2 flag handling in vfs2.
We do not support RWF_SYNC/RWF_DSYNC and probably shouldn't silently accept
them, since the user may incorrectly believe that we are synchronizing I/O.
Remove the pwritev2 test verifying that we support these flags.

gvisor.dev/issue/2601 is the tracking bug for deciding which RWF_.* flags
we need and supporting them.

Updates #2923, #2601.

PiperOrigin-RevId: 319351286
2020-07-01 22:04:42 -07:00
Dean Deng e8f1a5c1f6 Port GETOWN, SETOWN fcntls to vfs2.
Also make some fixes to vfs1's F_SETOWN. The fcntl test now entirely passes
on vfs2.

Fixes #2920.

PiperOrigin-RevId: 318669529
2020-06-27 21:33:37 -07:00
Ridwan Sharif bd5f0e2dc4 Add FUSE character device
This change adds a FUSE character device backed by devtmpfs. This
device will be used to establish a connection between the FUSE
server daemon and fusefs. The FileDescriptionImpl methods will
be implemented as we flesh out fusefs some more. The tests assert
that the device can be opened and used.
2020-06-25 14:22:21 -04:00
Dean Deng 7db196c4db Port fadvise64 to vfs2.
Like vfs1, we have a trivial implementation that ignores all valid advice.

Updates #2923.

PiperOrigin-RevId: 317349505
2020-06-19 11:50:09 -07:00
Nicolas Lacasse 810748f5c9 Port aio to VFS2.
In order to make sure all aio goroutines have stopped during S/R, a new
WaitGroup was added to TaskSet, analogous to runningGoroutines. This WaitGroup
is incremented with each aio goroutine, and waited on during kernel.Pause.

The old VFS1 aio code was changed to use this new WaitGroup, rather than
fs.Async. The only uses of fs.Async are now inode and mount Release operations,
which do not call fs.Async recursively. This fixes a lock-ordering violation
that can cause deadlocks.

Updates #1035.

PiperOrigin-RevId: 316689380
2020-06-16 08:49:06 -07:00
Nayana Bidari 4b9652d63b {S,G}etsockopt for TCP_KEEPCNT option.
TCP_KEEPCNT is used to set the maximum keepalive probes to be
sent before dropping the connection.

WANT_LGTM=jchacon
PiperOrigin-RevId: 315758094
2020-06-10 13:37:27 -07:00
gVisor bot 0baba92ad9 Internal change.
PiperOrigin-RevId: 313821986
2020-05-29 11:52:22 -07:00
gVisor bot cfd30665c1 iptables - filter packets using outgoing interface.
Enables commands with -o (--out-interface) for iptables rules.
$ iptables -A OUTPUT -o eth0 -j ACCEPT

PiperOrigin-RevId: 310642286
2020-05-08 15:44:54 -07:00
Jamie Liu 9115f26851 Allocate device numbers for VFS2 filesystems.
Updates #1197, #1198, #1672

PiperOrigin-RevId: 310432006
2020-05-07 14:01:53 -07:00
Rahat Mahmood 3c67754663 Enable automated marshalling for signals and the arch package.
PiperOrigin-RevId: 308472331
2020-04-25 23:56:04 -07:00
Rahat Mahmood f01f2132d8 Enable automated marshalling for mempolicy syscalls.
PiperOrigin-RevId: 308170679
2020-04-23 18:20:21 -07:00
Rahat Mahmood 93dd471461 Enable automated marshalling for epoll events.
Ensure we use the correct architecture-specific definition of epoll
event, and use go-marshal for serialization.

PiperOrigin-RevId: 308145677
2020-04-23 15:49:05 -07:00
Andrei Vagin 0c586946ea Specify a memory file in platform.New().
PiperOrigin-RevId: 307941984
2020-04-22 17:50:10 -07:00
Nayana Bidari 92b9069b67 Support owner matching for iptables.
This feature will match UID and GID of the packet creator, for locally
generated packets. This match is only valid in the OUTPUT and POSTROUTING
chains. Forwarded packets do not have any socket associated with them.
Packets from kernel threads do have a socket, but usually no owner.
2020-03-26 12:21:24 -07:00
Fabricio Voznika de694e5484 Combine file mode and isDir arguments
Updates #1035

PiperOrigin-RevId: 303021328
2020-03-26 08:48:04 -07:00
gVisor bot 159a230b9b Merge pull request #1943 from kevinGC:ipt-filter-ip
PiperOrigin-RevId: 301197007
2020-03-16 11:13:14 -07:00
Dean Deng 5e413cad10 Plumb VFS2 imported fds into virtual filesystem.
- When setting up the virtual filesystem, mount a host.filesystem to contain
  all files that need to be imported.
- Make read/preadv syscalls to the host in cases where preadv2 may not be
  supported yet (likewise for writing).
- Make save/restore functions in kernel/kernel.go return early if vfs2 is
  enabled.

PiperOrigin-RevId: 300922353
2020-03-14 07:14:33 -07:00
gVisor bot 2c2622b942 Merge pull request #1975 from nybidari:iptables
PiperOrigin-RevId: 300362789
2020-03-11 11:02:04 -07:00
Haibo Xu c04958e2fa Enable thread local storage support on arm64.
Linux uses the task.thread.uw.tp_value field to store the
TLS pointer on arm64 platform, and we use a similar way
in gvisor to store it in the arch/State struct.

Signed-off-by: Haibo Xu <haibo.xu@arm.com>
Change-Id: Ie76b5c6d109bc27ccfd594008a96753806db7764
2020-03-09 01:04:55 +00:00
Kevin Krakauer 408979e619 iptables: filter by IP address (and range)
Enables commands such as:
$ iptables -A INPUT -d 127.0.0.1 -j ACCEPT
$ iptables -t nat -A PREROUTING ! -d 127.0.0.1 -j REDIRECT

Also adds a bunch of REDIRECT+destination tests.
2020-02-26 11:04:00 -08:00
nybidari 818abc2bd5 Merge branch 'master' into iptables
2020-02-25 15:33:59 -08:00
Nayana Bidari acc405ba60 Add nat table support for iptables.
- commit the changes for the comments.
2020-02-25 15:03:51 -08:00
Jamie Liu 471b15b212 Port most syscalls to VFS2.
pipe and pipe2 aren't ported, pending a slight rework of pipe FDs for VFS2.
mount and umount2 aren't ported out of temporary laziness. access and faccessat
need additional FSImpl methods to implement properly, but are stubbed to
prevent googletest from CHECK-failing. Other syscalls require additional
plumbing.

Updates #1623

PiperOrigin-RevId: 297188448
2020-02-25 13:37:34 -08:00
Ting-Yu Wang b8f56c79be Implement tap/tun device in vfs.
PiperOrigin-RevId: 296526279
2020-02-21 15:42:56 -08:00
gVisor bot 9bad87339a Better strace logging for epoll syscalls.
Example:

epoll_ctl(0x3 anon_inode:[eventpoll], EPOLL_CTL_ADD, 0x6 anon_inode:[eventfd], 0x7efe2fd92a80 {events=EPOLLIN|EPOLLOUT data=0x10203040506070a}) = 0x0 (4.411µs)

epoll_wait(0x3 anon_inode:[eventpoll], 0x7efe2fd92b50 {{events=EPOLLOUT data=0x102030405060708}{events=EPOLLOUT data=0x102030405060708}{events=EPOLLOUT data=0x102030405060708}}, 0x3, 0xffffffff) = 0x3 (29.891µs)

PiperOrigin-RevId: 296258146
2020-02-20 11:31:00 -08:00
gVisor bot 7fdb609b3e Merge pull request #1850 from kevinGC:jump2
PiperOrigin-RevId: 295785052
2020-02-18 11:41:54 -08:00
Nayana Bidari b30b7f3422 Add nat table support for iptables.
Add nat table support for Prerouting hook with Redirect option.
Add tests to check redirect of ports.
2020-02-18 11:30:42 -08:00