Commit Graph

113 Commits

Author SHA1 Message Date
gVisor bot 0693fb05d1 Merge pull request #1505 from xiaobo55x:fcntl_flags
PiperOrigin-RevId: 290840484
2020-01-21 17:02:56 -08:00
Rahat Mahmood ad1968ed56 Implement sysfs.
PiperOrigin-RevId: 290822487
2020-01-21 15:13:26 -08:00
Michael Pratt dc99897205 Add missing verb
PiperOrigin-RevId: 290821997
2020-01-21 14:47:41 -08:00
Nicolas Lacasse d6fb1ec6c7 Add timestamps to VFS2 tmpfs, and implement some of SetStat.
PiperOrigin-RevId: 289962040
2020-01-15 16:32:55 -08:00
Kevin Krakauer ae060a63d9 More GH comments. 2020-01-08 17:30:08 -08:00
Kevin Krakauer f26a576984 Addressed GH comments 2020-01-08 16:35:01 -08:00
Kevin Krakauer 446a250996 Comment cleanup. 2020-01-08 11:20:48 -08:00
Kevin Krakauer 8cc1c35bbd Write simple ACCEPT rules to the filter table.
This gets us closer to passing the iptables tests and opens up iptables
so it can be worked on by multiple people.

A few restrictions are enforced for security (i.e. we don't want to let
users write a bunch of iptables rules and then just not enforce them):

- Only the filter table is writable.
- Only ACCEPT rules with no matching criteria can be added.
2020-01-08 10:08:14 -08:00
Michael Pratt 354a15a234 Implement rseq(2)
PiperOrigin-RevId: 288342928
2020-01-06 11:42:44 -08:00
Haibo Xu de0d127ae6 Make some of the fcntl flags arch specific..
Some of the flags in the file system related system call
are architecture specific(O_NOFOLLOW/O_DIRECT..). Ref to
the fcntl.h file in the Linux src codes.

Signed-off-by: Haibo Xu <haibo.xu@arm.com>
Change-Id: I354d988073bfd0c9ff5371d4e0be9da2b8fd019f
2020-01-06 06:11:07 +00:00
Dean Deng e6f4124afd Implement checks for get/setxattr at the syscall layer.
Add checks for input arguments, file type, permissions, etc. that match
the Linux implementation. A call to get/setxattr that passes all the
checks will still currently return EOPNOTSUPP. Actual support will be
added in following commits.

Only allow user.* extended attributes for the time being.

PiperOrigin-RevId: 285835159
2019-12-16 13:20:07 -08:00
Rahat Mahmood 007707a072 Implement kernfs.
PiperOrigin-RevId: 285231002
2019-12-12 11:20:47 -08:00
Jamie Liu 46651a7d26 Add most VFS methods for syscalls.
PiperOrigin-RevId: 284892289
2019-12-10 18:21:07 -08:00
Ian Gudger 13f0f6069a Implement F_GETOWN_EX and F_SETOWN_EX.
Some versions of glibc will convert F_GETOWN fcntl(2) calls into F_GETOWN_EX in
some cases.

PiperOrigin-RevId: 284089373
2019-12-05 17:28:52 -08:00
Dean Deng 684f757a22 Add support for receiving TOS and TCLASS control messages in hostinet.
This involves allowing getsockopt/setsockopt for the corresponding socket
options, as well as allowing hostinet to process control messages received from
the actual recvmsg syscall.

PiperOrigin-RevId: 282851425
2019-11-27 16:21:05 -08:00
Bin Lu 7f9c391cf1 slight changes to pkg/abi
In glibc, some structures are defined differently on different
platforms.
Such as: C.struct_stat

Signed-off-by: Bin Lu <bin.lu@arm.com>
2019-10-24 09:15:29 +00:00
gVisor bot 6d4d9564e3 Merge pull request #641 from tanjianfeng:master
PiperOrigin-RevId: 276380008
2019-10-23 16:55:15 -07:00
Kevin Krakauer 12235d533a AF_PACKET support for netstack (aka epsocket).
Like (AF_INET, SOCK_RAW) sockets, AF_PACKET sockets require CAP_NET_RAW. With
runsc, you'll need to pass `--net-raw=true` to enable them.

Binding isn't supported yet.

PiperOrigin-RevId: 275909366
2019-10-21 13:23:18 -07:00
Kevin Krakauer 2a82d5ad68 Reorder BUILD license and load functions in gvisor.
PiperOrigin-RevId: 275139066
2019-10-16 16:40:30 -07:00
Jianfeng Tan b94505ecc0 support /proc/net/route
This proc file reports routing information to applications inside the
container.

Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com>
Change-Id: I498e47f8c4c185419befbb42d849d0b099ec71f3
2019-10-15 16:38:40 +00:00
Kevin Krakauer 7ef1c44a7f Change linux.FileMode from uint to uint16, and update VFS to use FileMode.
In Linux (include/linux/types.h), mode_t is an unsigned short.

PiperOrigin-RevId: 272956350
2019-10-04 14:20:32 -07:00
Adin Scannell c98e7f0d19 Signalfd support
Note that the exact semantics for these signalfds are slightly different from
Linux. These signalfds are bound to the process at creation time. Reads, polls,
etc. are all associated with signals directed at that task. In Linux, all
signalfd operations are associated with current, regardless of where the
signalfd originated.

In practice, this should not be an issue given how signalfds are used. In order
to fix this however, we will need to plumb the context through all the event
APIs. This gets complicated really quickly, because the waiter APIs are all
netstack-specific, and not generally exposed to the context.  Probably not
worthwhile fixing immediately.

PiperOrigin-RevId: 269901749
2019-09-18 15:16:42 -07:00
Michael Pratt df5d377521 Remove go_test from go_stateify and go_marshal
They are no-ops, so the standard rule works fine.

PiperOrigin-RevId: 268776264
2019-09-12 15:10:17 -07:00
Zach Koopmans 67d7864f83 Document RWF_HIPRI not implemented for preadv2/pwritev2.
Document limitation of no reasonable implementation for RWF_HIPRI
flag (High Priority Read/Write for block-based file systems).

PiperOrigin-RevId: 264237589
2019-08-19 14:07:44 -07:00
Rahat Mahmood 6cfc76798b Document source and versioning of the TCPInfo struct.
PiperOrigin-RevId: 263637194
2019-08-15 14:05:59 -07:00
Rahat Mahmood 691c2f8173 Compute size of struct tcp_info instead of hardcoding it.
PiperOrigin-RevId: 263040624
2019-08-12 17:34:38 -07:00
Andrei Vagin af90e68623 netlink: return an error in nlmsgerr
Now if a process sends an unsupported netlink requests,
an error is returned from the send system call.

The linux kernel works differently in this case. It returns errors in the
nlmsgerr netlink message.

Reported-by: syzbot+571d99510c6f935202da@syzkaller.appspotmail.com
PiperOrigin-RevId: 262690453
2019-08-09 22:34:54 -07:00
Haibo Xu 1c9da886e7 Add initial ptrace stub and syscall support for arm64.
Signed-off-by: Haibo Xu <haibo.xu@arm.com>
Change-Id: I1dbd23bb240cca71d0cc30fc75ca5be28cb4c37c
PiperOrigin-RevId: 262619519
2019-08-09 13:18:11 -07:00
Rahat Mahmood 7bfad8ebb6 Return a well-defined socket address type from socket funtions.
Previously we were representing socket addresses as an interface{},
which allowed any type which could be binary.Marshal()ed to be used as
a socket address. This is fine when the address is passed to userspace
via the linux ABI, but is problematic when used from within the sentry
such as by networking procfs files.

PiperOrigin-RevId: 262460640
2019-08-08 16:50:33 -07:00
Ian Lewis 0a246fab80 Basic support for 'ip route'
Implements support for RTM_GETROUTE requests for netlink sockets.

Fixes #507

PiperOrigin-RevId: 261051045
2019-07-31 20:30:09 -07:00
Kevin Krakauer 5ddf9adb2b Fix up and add some iptables ABI.
PiperOrigin-RevId: 259437060
2019-07-22 17:06:18 -07:00
Jamie Liu fdac770f31 Fix struct statx field alignment.
PiperOrigin-RevId: 259376740
2019-07-22 12:04:21 -07:00
Jamie Liu 163ab5e9ba Sentry virtual filesystem, v2
Major differences from the current ("v1") sentry VFS:

- Path resolution is Filesystem-driven (FilesystemImpl methods call
vfs.ResolvingPath methods) rather than VFS-driven (fs package owns a
Dirent tree and calls fs.InodeOperations methods to populate it). This
drastically improves performance, primarily by reducing overhead from
inefficient synchronization and indirection. It also makes it possible
to implement remote filesystem protocols that translate FS system calls
into single RPCs, rather than having to make (at least) one RPC per path
component, significantly reducing the latency of remote filesystems
(especially during cold starts and for uncacheable shared filesystems).

- Mounts are correctly represented as a separate check based on
contextual state (current mount) rather than direct replacement in a
fs.Dirent tree. This makes it possible to support (non-recursive) bind
mounts and mount namespaces.

Included in this CL is fsimpl/memfs, an incomplete in-memory filesystem
that exists primarily to demonstrate intended filesystem implementation
patterns and for benchmarking:

BenchmarkVFS1TmpfsStat/1-6               3000000               497 ns/op
BenchmarkVFS1TmpfsStat/2-6               2000000               676 ns/op
BenchmarkVFS1TmpfsStat/3-6               2000000               904 ns/op
BenchmarkVFS1TmpfsStat/8-6               1000000              1944 ns/op
BenchmarkVFS1TmpfsStat/64-6               100000             14067 ns/op
BenchmarkVFS1TmpfsStat/100-6               50000             21700 ns/op
BenchmarkVFS2MemfsStat/1-6              10000000               197 ns/op
BenchmarkVFS2MemfsStat/2-6               5000000               233 ns/op
BenchmarkVFS2MemfsStat/3-6               5000000               268 ns/op
BenchmarkVFS2MemfsStat/8-6               3000000               477 ns/op
BenchmarkVFS2MemfsStat/64-6               500000              2592 ns/op
BenchmarkVFS2MemfsStat/100-6              300000              4045 ns/op
BenchmarkVFS1TmpfsMountStat/1-6          2000000               679 ns/op
BenchmarkVFS1TmpfsMountStat/2-6          2000000               912 ns/op
BenchmarkVFS1TmpfsMountStat/3-6          1000000              1113 ns/op
BenchmarkVFS1TmpfsMountStat/8-6          1000000              2118 ns/op
BenchmarkVFS1TmpfsMountStat/64-6                  100000             14251 ns/op
BenchmarkVFS1TmpfsMountStat/100-6                 100000             22397 ns/op
BenchmarkVFS2MemfsMountStat/1-6                  5000000               317 ns/op
BenchmarkVFS2MemfsMountStat/2-6                  5000000               361 ns/op
BenchmarkVFS2MemfsMountStat/3-6                  5000000               387 ns/op
BenchmarkVFS2MemfsMountStat/8-6                  3000000               582 ns/op
BenchmarkVFS2MemfsMountStat/64-6                  500000              2699 ns/op
BenchmarkVFS2MemfsMountStat/100-6                 300000              4133 ns/op

From this we can infer that, on this machine:

- Constant cost for tmpfs stat() is ~160ns in VFS2 and ~280ns in VFS1.

- Per-path-component cost is ~35ns in VFS2 and ~215ns in VFS1, a
difference of about 6x.

- The cost of crossing a mount boundary is about 80ns in VFS2
(MemfsMountStat/1 does approximately the same amount of work as
MemfsStat/2, except that it also crosses a mount boundary). This is an
inescapable cost of the separate mount lookup needed to support bind
mounts and mount namespaces.

PiperOrigin-RevId: 258853946
2019-07-18 15:10:29 -07:00
Jamie Liu 2bc398bfd8 Separate O_DSYNC and O_SYNC.
PiperOrigin-RevId: 258657913
2019-07-17 15:52:38 -07:00
Ayush Ranjan 8e3e021aca ext: Filesystem init implementation.
PiperOrigin-RevId: 258645957
2019-07-17 14:48:04 -07:00
Adin Scannell cceef9d2cf Cleanup straggling syscall dependencies.
PiperOrigin-RevId: 257293198
2019-07-09 16:18:02 -07:00
Adin Scannell 7dae043fec Drop ashmem and binder.
These are unfortunately unused and unmaintained. They can be brought back in
the future if need requires it.

PiperOrigin-RevId: 255697132
2019-06-28 17:20:25 -07:00
Michael Pratt 5b41ba5d0e Fix various spelling issues in the documentation
Addresses obvious typos, in the documentation only.

COPYBARA_INTEGRATE_REVIEW=https://github.com/google/gvisor/pull/443 from Pixep:fix/documentation-spelling 4d0688164eafaf0b3010e5f4824b35d1e7176d65
PiperOrigin-RevId: 255477779
2019-06-27 14:25:50 -07:00
Nicolas Lacasse 35719d52c7 Implement statx.
We don't have the plumbing for btime yet, so that field is left off. The
returned mask indicates that btime is absent.

Fixes #343

PiperOrigin-RevId: 254575752
2019-06-22 13:29:26 -07:00
Fabricio Voznika 054b5632ef Update comment
PiperOrigin-RevId: 254428866
2019-06-21 10:56:42 -07:00
Adin Scannell add40fd6ad Update canonical repository.
This can be merged after:
https://github.com/google/gvisor-website/pull/77
  or
https://github.com/google/gvisor-website/pull/78

PiperOrigin-RevId: 253132620
2019-06-13 16:50:15 -07:00
Ian Lewis 74e397e39a Add introspection for Linux/AMD64 syscalls
Adds simple introspection for syscall compatibility information to Linux/AMD64.
Syscalls registered in the syscall table now have associated metadata like
name, support level, notes, and URLs to relevant issues.

Syscall information can be exported as a table, JSON, or CSV using the new
'runsc help syscalls' command. Users can use this info to debug and get info
on the compatibility of the version of runsc they are running or to generate
documentation.

PiperOrigin-RevId: 252558304
2019-06-10 23:38:36 -07:00
Rahat Mahmood 315cf9a523 Use common definition of SockType.
SockType isn't specific to unix domain sockets, and the current
definition basically mirrors the linux ABI's definition.

PiperOrigin-RevId: 251956740
2019-06-06 17:00:27 -07:00
Jamie Liu b3f104507d "Implement" mbind(2).
We still only advertise a single NUMA node, and ignore mempolicy
accordingly, but mbind() at least now succeeds and has effects reflected
by get_mempolicy().

Also fix handling of nodemasks: round sizes to unsigned long (as
documented and done by Linux), and zero trailing bits when copying them
out.

PiperOrigin-RevId: 251950859
2019-06-06 16:29:46 -07:00
Rahat Mahmood 2d2831e354 Track and export socket state.
This is necessary for implementing network diagnostic interfaces like
/proc/net/{tcp,udp,unix} and sock_diag(7).

For pass-through endpoints such as hostinet, we obtain the socket
state from the backend. For netstack, we add explicit tracking of TCP
states.

PiperOrigin-RevId: 251934850
2019-06-06 15:04:47 -07:00
Michael Pratt d3ed9baac0 Implement dumpability tracking and checks
We don't actually support core dumps, but some applications want to
get/set dumpability, which still has an effect in procfs.

Lack of support for set-uid binaries or fs creds simplifies things a
bit.

As-is, processes started via CreateProcess (i.e., init and sentryctl
exec) have normal dumpability. I'm a bit torn on whether sentryctl exec
tasks should be dumpable, but at least since they have no parent normal
UID/GID checks should protect them.

PiperOrigin-RevId: 251712714
2019-06-05 14:00:13 -07:00
Michael Pratt 6cdec6fadf Wrap comments and reword in common present tense
PiperOrigin-RevId: 249888234
Change-Id: Icfef32c3ed34809c34100c07e93e9581c786776e
2019-05-24 13:23:53 -07:00
Michael Pratt 69eac1198f Move wait constants to abi/linux package
Updates #214

PiperOrigin-RevId: 249483756
Change-Id: I0d3cf4112bed75a863d5eb08c2063fbc506cd875
2019-05-22 11:15:33 -07:00
Adin Scannell ae1bb08871 Clean up pipe internals and add fcntl support
Pipe internals are made more efficient by avoiding garbage collection.
A pool is now used that can be shared by all pipes, and buffers are
chained via an intrusive list. The documentation for pipe structures
and methods is also simplified and clarified.

The pipe tests are now parameterized, so that they are run on all
different variants (named pipes, small buffers, default buffers).

The pipe buffer sizes are exposed by fcntl, which is now supported
by this change. A size change test has been added to the suite.

These new tests uncovered a bug regarding the semantics of open
named pipes with O_NONBLOCK, which is also fixed by this CL. This
fix also addresses the lack of the O_LARGEFILE flag for named pipes.

PiperOrigin-RevId: 249375888
Change-Id: I48e61e9c868aedb0cadda2dff33f09a560dee773
2019-05-21 20:12:27 -07:00
Adin Scannell 9cdae51fec Add basic plumbing for splice and stub implementation.
This does not actually implement an efficient splice or sendfile. Rather, it
adds a generic plumbing to the file internals so that this can be added. All
file implementations use the stub fileutil.NoSplice implementation, which
causes sendfile and splice to fall back to an internal copy.

A basic splice system call interface is added, along with a test.

PiperOrigin-RevId: 249335960
Change-Id: Ic5568be2af0a505c19e7aec66d5af2480ab0939b
2019-05-21 15:18:12 -07:00