Commit Graph

162 Commits

Author SHA1 Message Date
Michael Pratt dc99897205 Add missing verb
PiperOrigin-RevId: 290821997
2020-01-21 14:47:41 -08:00
Kevin Krakauer 9143fcd7fd Add UDP matchers. 2020-01-21 14:47:17 -08:00
Nicolas Lacasse d6fb1ec6c7 Add timestamps to VFS2 tmpfs, and implement some of SetStat.
PiperOrigin-RevId: 289962040
2020-01-15 16:32:55 -08:00
Kevin Krakauer ae060a63d9 More GH comments. 2020-01-08 17:30:08 -08:00
Kevin Krakauer f26a576984 Addressed GH comments 2020-01-08 16:35:01 -08:00
Kevin Krakauer 446a250996 Comment cleanup. 2020-01-08 11:20:48 -08:00
Kevin Krakauer 8cc1c35bbd Write simple ACCEPT rules to the filter table.
This gets us closer to passing the iptables tests and opens up iptables
so it can be worked on by multiple people.

A few restrictions are enforced for security (i.e. we don't want to let
users write a bunch of iptables rules and then just not enforce them):

- Only the filter table is writable.
- Only ACCEPT rules with no matching criteria can be added.
2020-01-08 10:08:14 -08:00
Michael Pratt 354a15a234 Implement rseq(2)
PiperOrigin-RevId: 288342928
2020-01-06 11:42:44 -08:00
Haibo Xu de0d127ae6 Make some of the fcntl flags arch specific..
Some of the flags in the file system related system call
are architecture specific(O_NOFOLLOW/O_DIRECT..). Ref to
the fcntl.h file in the Linux src codes.

Signed-off-by: Haibo Xu <haibo.xu@arm.com>
Change-Id: I354d988073bfd0c9ff5371d4e0be9da2b8fd019f
2020-01-06 06:11:07 +00:00
Dean Deng e6f4124afd Implement checks for get/setxattr at the syscall layer.
Add checks for input arguments, file type, permissions, etc. that match
the Linux implementation. A call to get/setxattr that passes all the
checks will still currently return EOPNOTSUPP. Actual support will be
added in following commits.

Only allow user.* extended attributes for the time being.

PiperOrigin-RevId: 285835159
2019-12-16 13:20:07 -08:00
Rahat Mahmood 007707a072 Implement kernfs.
PiperOrigin-RevId: 285231002
2019-12-12 11:20:47 -08:00
Jamie Liu 46651a7d26 Add most VFS methods for syscalls.
PiperOrigin-RevId: 284892289
2019-12-10 18:21:07 -08:00
Ian Gudger 13f0f6069a Implement F_GETOWN_EX and F_SETOWN_EX.
Some versions of glibc will convert F_GETOWN fcntl(2) calls into F_GETOWN_EX in
some cases.

PiperOrigin-RevId: 284089373
2019-12-05 17:28:52 -08:00
Dean Deng 684f757a22 Add support for receiving TOS and TCLASS control messages in hostinet.
This involves allowing getsockopt/setsockopt for the corresponding socket
options, as well as allowing hostinet to process control messages received from
the actual recvmsg syscall.

PiperOrigin-RevId: 282851425
2019-11-27 16:21:05 -08:00
Bin Lu 7f9c391cf1 slight changes to pkg/abi
In glibc, some structures are defined differently on different
platforms.
Such as: C.struct_stat

Signed-off-by: Bin Lu <bin.lu@arm.com>
2019-10-24 09:15:29 +00:00
gVisor bot 6d4d9564e3 Merge pull request #641 from tanjianfeng:master
PiperOrigin-RevId: 276380008
2019-10-23 16:55:15 -07:00
Kevin Krakauer 12235d533a AF_PACKET support for netstack (aka epsocket).
Like (AF_INET, SOCK_RAW) sockets, AF_PACKET sockets require CAP_NET_RAW. With
runsc, you'll need to pass `--net-raw=true` to enable them.

Binding isn't supported yet.

PiperOrigin-RevId: 275909366
2019-10-21 13:23:18 -07:00
Kevin Krakauer 2a82d5ad68 Reorder BUILD license and load functions in gvisor.
PiperOrigin-RevId: 275139066
2019-10-16 16:40:30 -07:00
Jianfeng Tan b94505ecc0 support /proc/net/route
This proc file reports routing information to applications inside the
container.

Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com>
Change-Id: I498e47f8c4c185419befbb42d849d0b099ec71f3
2019-10-15 16:38:40 +00:00
Kevin Krakauer 7ef1c44a7f Change linux.FileMode from uint to uint16, and update VFS to use FileMode.
In Linux (include/linux/types.h), mode_t is an unsigned short.

PiperOrigin-RevId: 272956350
2019-10-04 14:20:32 -07:00
Adin Scannell c98e7f0d19 Signalfd support
Note that the exact semantics for these signalfds are slightly different from
Linux. These signalfds are bound to the process at creation time. Reads, polls,
etc. are all associated with signals directed at that task. In Linux, all
signalfd operations are associated with current, regardless of where the
signalfd originated.

In practice, this should not be an issue given how signalfds are used. In order
to fix this however, we will need to plumb the context through all the event
APIs. This gets complicated really quickly, because the waiter APIs are all
netstack-specific, and not generally exposed to the context.  Probably not
worthwhile fixing immediately.

PiperOrigin-RevId: 269901749
2019-09-18 15:16:42 -07:00
Michael Pratt df5d377521 Remove go_test from go_stateify and go_marshal
They are no-ops, so the standard rule works fine.

PiperOrigin-RevId: 268776264
2019-09-12 15:10:17 -07:00
Zach Koopmans 67d7864f83 Document RWF_HIPRI not implemented for preadv2/pwritev2.
Document limitation of no reasonable implementation for RWF_HIPRI
flag (High Priority Read/Write for block-based file systems).

PiperOrigin-RevId: 264237589
2019-08-19 14:07:44 -07:00
Rahat Mahmood 6cfc76798b Document source and versioning of the TCPInfo struct.
PiperOrigin-RevId: 263637194
2019-08-15 14:05:59 -07:00
Rahat Mahmood 691c2f8173 Compute size of struct tcp_info instead of hardcoding it.
PiperOrigin-RevId: 263040624
2019-08-12 17:34:38 -07:00
Andrei Vagin af90e68623 netlink: return an error in nlmsgerr
Now if a process sends an unsupported netlink requests,
an error is returned from the send system call.

The linux kernel works differently in this case. It returns errors in the
nlmsgerr netlink message.

Reported-by: syzbot+571d99510c6f935202da@syzkaller.appspotmail.com
PiperOrigin-RevId: 262690453
2019-08-09 22:34:54 -07:00
Haibo Xu 1c9da886e7 Add initial ptrace stub and syscall support for arm64.
Signed-off-by: Haibo Xu <haibo.xu@arm.com>
Change-Id: I1dbd23bb240cca71d0cc30fc75ca5be28cb4c37c
PiperOrigin-RevId: 262619519
2019-08-09 13:18:11 -07:00
Rahat Mahmood 7bfad8ebb6 Return a well-defined socket address type from socket funtions.
Previously we were representing socket addresses as an interface{},
which allowed any type which could be binary.Marshal()ed to be used as
a socket address. This is fine when the address is passed to userspace
via the linux ABI, but is problematic when used from within the sentry
such as by networking procfs files.

PiperOrigin-RevId: 262460640
2019-08-08 16:50:33 -07:00
Ian Lewis 0a246fab80 Basic support for 'ip route'
Implements support for RTM_GETROUTE requests for netlink sockets.

Fixes #507

PiperOrigin-RevId: 261051045
2019-07-31 20:30:09 -07:00
Kevin Krakauer 5ddf9adb2b Fix up and add some iptables ABI.
PiperOrigin-RevId: 259437060
2019-07-22 17:06:18 -07:00
Jamie Liu fdac770f31 Fix struct statx field alignment.
PiperOrigin-RevId: 259376740
2019-07-22 12:04:21 -07:00
Jamie Liu 163ab5e9ba Sentry virtual filesystem, v2
Major differences from the current ("v1") sentry VFS:

- Path resolution is Filesystem-driven (FilesystemImpl methods call
vfs.ResolvingPath methods) rather than VFS-driven (fs package owns a
Dirent tree and calls fs.InodeOperations methods to populate it). This
drastically improves performance, primarily by reducing overhead from
inefficient synchronization and indirection. It also makes it possible
to implement remote filesystem protocols that translate FS system calls
into single RPCs, rather than having to make (at least) one RPC per path
component, significantly reducing the latency of remote filesystems
(especially during cold starts and for uncacheable shared filesystems).

- Mounts are correctly represented as a separate check based on
contextual state (current mount) rather than direct replacement in a
fs.Dirent tree. This makes it possible to support (non-recursive) bind
mounts and mount namespaces.

Included in this CL is fsimpl/memfs, an incomplete in-memory filesystem
that exists primarily to demonstrate intended filesystem implementation
patterns and for benchmarking:

BenchmarkVFS1TmpfsStat/1-6               3000000               497 ns/op
BenchmarkVFS1TmpfsStat/2-6               2000000               676 ns/op
BenchmarkVFS1TmpfsStat/3-6               2000000               904 ns/op
BenchmarkVFS1TmpfsStat/8-6               1000000              1944 ns/op
BenchmarkVFS1TmpfsStat/64-6               100000             14067 ns/op
BenchmarkVFS1TmpfsStat/100-6               50000             21700 ns/op
BenchmarkVFS2MemfsStat/1-6              10000000               197 ns/op
BenchmarkVFS2MemfsStat/2-6               5000000               233 ns/op
BenchmarkVFS2MemfsStat/3-6               5000000               268 ns/op
BenchmarkVFS2MemfsStat/8-6               3000000               477 ns/op
BenchmarkVFS2MemfsStat/64-6               500000              2592 ns/op
BenchmarkVFS2MemfsStat/100-6              300000              4045 ns/op
BenchmarkVFS1TmpfsMountStat/1-6          2000000               679 ns/op
BenchmarkVFS1TmpfsMountStat/2-6          2000000               912 ns/op
BenchmarkVFS1TmpfsMountStat/3-6          1000000              1113 ns/op
BenchmarkVFS1TmpfsMountStat/8-6          1000000              2118 ns/op
BenchmarkVFS1TmpfsMountStat/64-6                  100000             14251 ns/op
BenchmarkVFS1TmpfsMountStat/100-6                 100000             22397 ns/op
BenchmarkVFS2MemfsMountStat/1-6                  5000000               317 ns/op
BenchmarkVFS2MemfsMountStat/2-6                  5000000               361 ns/op
BenchmarkVFS2MemfsMountStat/3-6                  5000000               387 ns/op
BenchmarkVFS2MemfsMountStat/8-6                  3000000               582 ns/op
BenchmarkVFS2MemfsMountStat/64-6                  500000              2699 ns/op
BenchmarkVFS2MemfsMountStat/100-6                 300000              4133 ns/op

From this we can infer that, on this machine:

- Constant cost for tmpfs stat() is ~160ns in VFS2 and ~280ns in VFS1.

- Per-path-component cost is ~35ns in VFS2 and ~215ns in VFS1, a
difference of about 6x.

- The cost of crossing a mount boundary is about 80ns in VFS2
(MemfsMountStat/1 does approximately the same amount of work as
MemfsStat/2, except that it also crosses a mount boundary). This is an
inescapable cost of the separate mount lookup needed to support bind
mounts and mount namespaces.

PiperOrigin-RevId: 258853946
2019-07-18 15:10:29 -07:00
Jamie Liu 2bc398bfd8 Separate O_DSYNC and O_SYNC.
PiperOrigin-RevId: 258657913
2019-07-17 15:52:38 -07:00
Ayush Ranjan 8e3e021aca ext: Filesystem init implementation.
PiperOrigin-RevId: 258645957
2019-07-17 14:48:04 -07:00
Adin Scannell cceef9d2cf Cleanup straggling syscall dependencies.
PiperOrigin-RevId: 257293198
2019-07-09 16:18:02 -07:00
Adin Scannell 7dae043fec Drop ashmem and binder.
These are unfortunately unused and unmaintained. They can be brought back in
the future if need requires it.

PiperOrigin-RevId: 255697132
2019-06-28 17:20:25 -07:00
Michael Pratt 5b41ba5d0e Fix various spelling issues in the documentation
Addresses obvious typos, in the documentation only.

COPYBARA_INTEGRATE_REVIEW=https://github.com/google/gvisor/pull/443 from Pixep:fix/documentation-spelling 4d0688164eafaf0b3010e5f4824b35d1e7176d65
PiperOrigin-RevId: 255477779
2019-06-27 14:25:50 -07:00
Nicolas Lacasse 35719d52c7 Implement statx.
We don't have the plumbing for btime yet, so that field is left off. The
returned mask indicates that btime is absent.

Fixes #343

PiperOrigin-RevId: 254575752
2019-06-22 13:29:26 -07:00
Fabricio Voznika 054b5632ef Update comment
PiperOrigin-RevId: 254428866
2019-06-21 10:56:42 -07:00
Adin Scannell add40fd6ad Update canonical repository.
This can be merged after:
https://github.com/google/gvisor-website/pull/77
  or
https://github.com/google/gvisor-website/pull/78

PiperOrigin-RevId: 253132620
2019-06-13 16:50:15 -07:00
Ian Lewis 74e397e39a Add introspection for Linux/AMD64 syscalls
Adds simple introspection for syscall compatibility information to Linux/AMD64.
Syscalls registered in the syscall table now have associated metadata like
name, support level, notes, and URLs to relevant issues.

Syscall information can be exported as a table, JSON, or CSV using the new
'runsc help syscalls' command. Users can use this info to debug and get info
on the compatibility of the version of runsc they are running or to generate
documentation.

PiperOrigin-RevId: 252558304
2019-06-10 23:38:36 -07:00
Rahat Mahmood 315cf9a523 Use common definition of SockType.
SockType isn't specific to unix domain sockets, and the current
definition basically mirrors the linux ABI's definition.

PiperOrigin-RevId: 251956740
2019-06-06 17:00:27 -07:00
Jamie Liu b3f104507d "Implement" mbind(2).
We still only advertise a single NUMA node, and ignore mempolicy
accordingly, but mbind() at least now succeeds and has effects reflected
by get_mempolicy().

Also fix handling of nodemasks: round sizes to unsigned long (as
documented and done by Linux), and zero trailing bits when copying them
out.

PiperOrigin-RevId: 251950859
2019-06-06 16:29:46 -07:00
Rahat Mahmood 2d2831e354 Track and export socket state.
This is necessary for implementing network diagnostic interfaces like
/proc/net/{tcp,udp,unix} and sock_diag(7).

For pass-through endpoints such as hostinet, we obtain the socket
state from the backend. For netstack, we add explicit tracking of TCP
states.

PiperOrigin-RevId: 251934850
2019-06-06 15:04:47 -07:00
Michael Pratt d3ed9baac0 Implement dumpability tracking and checks
We don't actually support core dumps, but some applications want to
get/set dumpability, which still has an effect in procfs.

Lack of support for set-uid binaries or fs creds simplifies things a
bit.

As-is, processes started via CreateProcess (i.e., init and sentryctl
exec) have normal dumpability. I'm a bit torn on whether sentryctl exec
tasks should be dumpable, but at least since they have no parent normal
UID/GID checks should protect them.

PiperOrigin-RevId: 251712714
2019-06-05 14:00:13 -07:00
Michael Pratt 6cdec6fadf Wrap comments and reword in common present tense
PiperOrigin-RevId: 249888234
Change-Id: Icfef32c3ed34809c34100c07e93e9581c786776e
2019-05-24 13:23:53 -07:00
Michael Pratt 69eac1198f Move wait constants to abi/linux package
Updates #214

PiperOrigin-RevId: 249483756
Change-Id: I0d3cf4112bed75a863d5eb08c2063fbc506cd875
2019-05-22 11:15:33 -07:00
Adin Scannell ae1bb08871 Clean up pipe internals and add fcntl support
Pipe internals are made more efficient by avoiding garbage collection.
A pool is now used that can be shared by all pipes, and buffers are
chained via an intrusive list. The documentation for pipe structures
and methods is also simplified and clarified.

The pipe tests are now parameterized, so that they are run on all
different variants (named pipes, small buffers, default buffers).

The pipe buffer sizes are exposed by fcntl, which is now supported
by this change. A size change test has been added to the suite.

These new tests uncovered a bug regarding the semantics of open
named pipes with O_NONBLOCK, which is also fixed by this CL. This
fix also addresses the lack of the O_LARGEFILE flag for named pipes.

PiperOrigin-RevId: 249375888
Change-Id: I48e61e9c868aedb0cadda2dff33f09a560dee773
2019-05-21 20:12:27 -07:00
Adin Scannell 9cdae51fec Add basic plumbing for splice and stub implementation.
This does not actually implement an efficient splice or sendfile. Rather, it
adds a generic plumbing to the file internals so that this can be added. All
file implementations use the stub fileutil.NoSplice implementation, which
causes sendfile and splice to fall back to an internal copy.

A basic splice system call interface is added, along with a test.

PiperOrigin-RevId: 249335960
Change-Id: Ic5568be2af0a505c19e7aec66d5af2480ab0939b
2019-05-21 15:18:12 -07:00
Fabricio Voznika 1bee43be13 Implement fallocate(2)
Closes #225

PiperOrigin-RevId: 247508791
Change-Id: I04f47cf2770b30043e5a272aba4ba6e11d0476cc
2019-05-09 15:35:49 -07:00
Kevin Krakauer 264d012d81 Add netfilter ABI for iptables support.
Change-Id: Ifbd2abf63ea8062a89b83e948d3e9735480d8216
PiperOrigin-RevId: 246559904
2019-05-03 13:06:09 -07:00
Michael Pratt 4d52a55201 Change copyright notice to "The gVisor Authors"
Based on the guidelines at
https://opensource.google.com/docs/releasing/authors/.

1. $ rg -l "Google LLC" | xargs sed -i 's/Google LLC.*/The gVisor Authors./'
2. Manual fixup of "Google Inc" references.
3. Add AUTHORS file. Authors may request to be added to this file.
4. Point netstack AUTHORS to gVisor AUTHORS. Drop CONTRIBUTORS.

Fixes #209

PiperOrigin-RevId: 245823212
Change-Id: I64530b24ad021a7d683137459cafc510f5ee1de9
2019-04-29 14:26:23 -07:00
Ian Gudger 133700007a Only emit unimplemented syscall events for unsupported values.
Only emit unimplemented syscall events for setting SO_OOBINLINE and SO_LINGER
when attempting to set unsupported values.

PiperOrigin-RevId: 244229675
Change-Id: Icc4562af8f733dd75a90404621711f01a32a9fc1
2019-04-18 11:51:41 -07:00
Kevin Krakauer 82529becae Fix index out of bounds in tty implementation.
The previous implementation revolved around runes instead of bytes, which caused
weird behavior when converting between the two. For example, peekRune would read
the byte 0xff from a buffer, convert it to a rune, then return it. As rune is an
alias of int32, 0xff was 0-padded to int32(255), which is the hex code point for
?. However, peekRune also returned the length of the byte (1). When calling
utf8.EncodeRune, we only allocated 1 byte, but tried the write the 2-byte
character ?.

tl;dr: I apparently didn't understand runes when I wrote this.

PiperOrigin-RevId: 241789081
Change-Id: I14c788af4d9754973137801500ef6af7ab8a8727
2019-04-03 13:00:34 -07:00
Rahat Mahmood 06ec97a3f8 Implement memfd_create.
Memfds are simply anonymous tmpfs files with no associated
mounts. Also implementing file seals, which Linux only implements for
memfds at the moment.

PiperOrigin-RevId: 240450031
Change-Id: I31de78b950101ae8d7a13d0e93fe52d98ea06f2f
2019-03-26 16:16:57 -07:00
Fabricio Voznika 0b76887147 Priority-inheritance futex implementation
It is Implemented without the priority inheritance part given
that gVisor defers scheduling decisions to Go runtime and doesn't
have control over it.

PiperOrigin-RevId: 236989545
Change-Id: I714c8ca0798743ecf3167b14ffeb5cd834302560
2019-03-05 23:40:18 -08:00
Fabricio Voznika 3dbd4a16f8 Add semctl(GETPID) syscall
Also added unimplemented notification for semctl(2)
commands.

PiperOrigin-RevId: 236340672
Change-Id: I0795e3bd2e6d41d7936fabb731884df426a42478
2019-03-01 10:57:02 -08:00
Nicolas Lacasse e884168e1e Encode stat to bytes manually, instead of calling CopyObjectOut.
CopyObjectOut grows its destination byte slice incrementally, causing
many small slice allocations on the heap. This leads to increased GC and
noticeably slower stat calls.

PiperOrigin-RevId: 233140904
Change-Id: Ieb90295dd8dd45b3e56506fef9d7f86c92e97d97
2019-02-08 15:48:23 -08:00
Ian Gudger 80f901b16b Plumb IP_ADD_MEMBERSHIP and IP_DROP_MEMBERSHIP to netstack.
Also includes a few fixes for IPv4 multicast support. IPv6 support is coming in
a followup CL.

PiperOrigin-RevId: 233008638
Change-Id: If7dae6222fef43fda48033f0292af77832d95e82
2019-02-07 23:15:23 -08:00
Rahat Mahmood 2ba74f84be Implement /proc/net/unix.
PiperOrigin-RevId: 232948478
Change-Id: Ib830121e5e79afaf5d38d17aeef5a1ef97913d23
2019-02-07 14:44:21 -08:00
Michael Pratt 2a0c69b19f Remove license comments
Nothing reads them and they can simply get stale.

Generated with:
$ sed -i "s/licenses(\(.*\)).*/licenses(\1)/" **/BUILD

PiperOrigin-RevId: 231818945
Change-Id: Ibc3f9838546b7e94f13f217060d31f4ada9d4bf0
2019-01-31 11:12:53 -08:00
Nicolas Lacasse dc8450b567 Remove fs.Handle, ramfs.Entry, and all the DeprecatedFileOperations.
More helper structs have been added to the fsutil package to make it easier to
implement fs.InodeOperations and fs.FileOperations.

PiperOrigin-RevId: 229305982
Change-Id: Ib6f8d3862f4216745116857913dbfa351530223b
2019-01-14 20:34:28 -08:00
Ian Gudger b709997d78 Rename linux.Errno.Error to linux.Errno.String.
Using linux.Errno as an error doesn't work very well as none of the sentry code
expects error to contain a linux.Errno.

This moves using syserr.Error.ToLinux as an error in a syscall handler from a
runtime error to a compile error.

PiperOrigin-RevId: 227744312
Change-Id: Iea63108a5b198296c908614e09c01733dd684da0
2019-01-03 13:53:43 -08:00
Ian Gudger b515556519 Implement SO_KEEPALIVE, TCP_KEEPIDLE, and TCP_KEEPINTVL.
Within gVisor, plumb new socket options to netstack.

Within netstack, fix GetSockOpt and SetSockOpt return value logic.

PiperOrigin-RevId: 226532229
Change-Id: If40734e119eed633335f40b4c26facbebc791c74
2018-12-21 13:13:45 -08:00
Jamie Liu 9a442fa4b5 Automated rollback of changelist 226224230
PiperOrigin-RevId: 226493053
Change-Id: Ia98d1cb6dd0682049e4d907ef69619831de5c34a
2018-12-21 08:23:34 -08:00
Googler 86c9bd2547 Automated rollback of changelist 225861605
PiperOrigin-RevId: 226224230
Change-Id: Id24c7d3733722fd41d5fe74ef64e0ce8c68f0b12
2018-12-19 13:30:08 -08:00
Zach Koopmans ff7178a4d1 Implement pwritev2.
Implement pwritev2 and associated unit tests.
Clean up preadv2 unit tests.
Tag RWF_ flags in both preadv2 and pwritev2 with associated bug tickets.

PiperOrigin-RevId: 226222119
Change-Id: Ieb22672418812894ba114bbc88e67f1dd50de620
2018-12-19 13:16:06 -08:00
Fabricio Voznika 03226cd950 Add BPFAction type with Stringer
PiperOrigin-RevId: 226018694
Change-Id: I98965e26fe565f37e98e5df5f997363ab273c91b
2018-12-18 10:28:28 -08:00
Jamie Liu 2421006426 Implement mlock(), kind of.
Currently mlock() and friends do nothing whatsoever. However, mlocking
is directly application-visible in a number of ways; for example,
madvise(MADV_DONTNEED) and msync(MS_INVALIDATE) both fail on mlocked
regions. We handle this inconsistently: MADV_DONTNEED is too important
to not work, but MS_INVALIDATE is rejected.

Change MM to track mlocked regions in a manner consistent with Linux.
It still will not actually pin pages into host physical memory, but:

- mlock() will now cause sentry memory management to precommit mlocked
pages.

- MADV_DONTNEED and MS_INVALIDATE will interact with mlocked pages as
described above.

PiperOrigin-RevId: 225861605
Change-Id: Iee187204979ac9a4d15d0e037c152c0902c8d0ee
2018-12-17 11:38:59 -08:00
Michael Pratt 42e2e5cae9 Format sigaction in strace
Sample:

I1206 14:24:56.768520    3700 x:0] [   1] ioctl_test E rt_sigaction(SIGSEGV, 0x7ee6edb0c590 {Handler: 0x559c6d915cf0, Flags: SA_SIGINFO|SA_RESTORER|SA_ONSTACK|SA_NODEFER, Restorer: 0x2a9901a259a0, Mask: []}, 0x7ee6edb0c630)
I1206 14:24:56.768530    3700 x:0] [   1] ioctl_test X rt_sigaction(SIGSEGV, 0x7ee6edb0c590 {Handler: 0x559c6d915cf0, Flags: SA_SIGINFO|SA_RESTORER|SA_ONSTACK|SA_NODEFER, Restorer: 0x2a9901a259a0, Mask: []}, 0x7ee6edb0c630 {Handler: SIG_DFL, Flags: 0x0, Restorer: 0x0, Mask: []}) = 0x0 (2.701?s)

PiperOrigin-RevId: 224596606
Change-Id: I3512493aed99d3d75600249263da46686b1dc0e7
2018-12-07 16:28:54 -08:00
Michael Pratt 51900fe3a4 Format signals, signal masks in strace
Sample:

I1205 16:51:49.869701    2492 x:0] [   1] ioctl_test E rt_sigaction(SIGIO, 0x7e0e5b5e8500, 0x7e0e5b5e85a0)
I1205 16:51:49.869766    2492 x:0] [   1] ioctl_test X rt_sigaction(SIGIO, 0x7e0e5b5e8500, 0x7e0e5b5e85a0) = 0x0 (44.336?s)
I1205 16:51:49.869831    2492 x:0] [   1] ioctl_test E rt_sigprocmask(SIG_UNBLOCK, 0x7e0e5b5e8878 [SIGIO], 0x7e0e5b5e87c0, 0x8)
I1205 16:51:49.869866    2492 x:0] [   1] ioctl_test X rt_sigprocmask(SIG_UNBLOCK, 0x7e0e5b5e8878 [SIGIO], 0x7e0e5b5e87c0 [SIGIO 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64], 0x8) = 0x0 (2.575?s)

PiperOrigin-RevId: 224422404
Change-Id: I3ed3f2ec6b1a639baa9cacd37ce7ee325c3703e4
2018-12-06 15:47:06 -08:00
Michael Pratt 666db00c26 Convert ValueSet to a map
Unlike FlagSet, order doesn't matter here, so it can simply be a map.

PiperOrigin-RevId: 224377910
Change-Id: I15810c698a7f02d8614bf09b59583ab73cba0514
2018-12-06 11:43:11 -08:00
Bin Lu c3dd68cea7 Add ARM64 support to pkg/abi/linux
Signed-off-by: Bin Lu <bin.lu@arm.com>
Change-Id: I73cc4c406fadccb054e8e83c9464f6bef6280b0f
PiperOrigin-RevId: 224025309
2018-12-04 12:24:07 -08:00
Zach Koopmans b3b60ea29a Implementation of preadv2 for Linux 4.4 support
Implement RWF_HIPRI (4.6) silently passes the read call.
Implement -1 offset calls readv.

PiperOrigin-RevId: 222840324
Change-Id: If9ddc1e8d086e1a632bdf5e00bae08205f95b6b0
2018-11-26 09:50:47 -08:00
Fabricio Voznika eaac94d91c Use RET_KILL_PROCESS if available in kernel
RET_KILL_THREAD doesn't work well for Go because it will
kill only the offending thread and leave the process hanging.
RET_TRAP can be masked out and it's not guaranteed to kill
the process. RET_KILL_PROCESS is available since 4.14.

For older kernel, continue to use RET_TRAP as this is the
best option (likely to kill process, easy to debug).

PiperOrigin-RevId: 222357867
Change-Id: Icc1d7d731274b16c2125b7a1ba4f7883fbdb2cbd
2018-11-20 22:56:51 -08:00
Fabricio Voznika fadffa2ff8 Add unsupported syscall events for get/setsockopt
PiperOrigin-RevId: 222148953
Change-Id: I21500a9f08939c45314a6414e0824490a973e5aa
2018-11-20 14:04:12 -08:00
Andrei Vagin 2ef122da35 Implement sync_file_range()
sync_file_range - sync a file segment with disk

In Linux, sync_file_range() accepts three flags:

       SYNC_FILE_RANGE_WAIT_BEFORE
              Wait  upon  write-out  of  all pages in the specified range that
              have already been submitted to the device driver  for  write-out
              before performing any write.

       SYNC_FILE_RANGE_WRITE
              Initiate  write-out  of  all  dirty pages in the specified range
              which are not presently submitted  write-out.   Note  that  even
              this  may  block if you attempt to write more than request queue
              size.

       SYNC_FILE_RANGE_WAIT_AFTER
              Wait upon write-out of all pages in the range  after  performing
              any write.

In this implementation:

SYNC_FILE_RANGE_WAIT_BEFORE without SYNC_FILE_RANGE_WAIT_AFTER isn't
supported right now.

SYNC_FILE_RANGE_WRITE is skipped. It should initiate write-out of  all
dirty pages, but it doesn't wait, so it should be safe to do nothing
while nobody uses SYNC_FILE_RANGE_WAIT_BEFORE.

SYNC_FILE_RANGE_WAIT_AFTER is equal to fdatasync(). In Linux,
sync_file_range() doesn't writes out the  file's  meta-data, but
fdatasync() does if a file size is changed.

PiperOrigin-RevId: 220730840
Change-Id: Iae5dfb23c2c916967d67cf1a1ad32f25eb3f6286
2018-11-08 17:39:51 -08:00
Fabricio Voznika b2068cf5a5 Add more unimplemented syscall events
Added events for *ctl syscalls that may have multiple different commands.
For runsc, each syscall event is only logged once. For *ctl syscalls, use
the cmd as identifier, not only the syscall number.

PiperOrigin-RevId: 218015941
Change-Id: Ie3c19131ae36124861e9b492a7dbe1765d9e5e59
2018-10-20 11:14:23 -07:00
Ian Gudger 8fce67af24 Use correct company name in copyright header
PiperOrigin-RevId: 217951017
Change-Id: Ie08bf6987f98467d07457bcf35b5f1ff6e43c035
2018-10-19 16:35:11 -07:00
Adin Scannell 463e73d46d Add seccomp filter configuration to ptrace stubs.
This is a defense-in-depth measure. If the sentry is compromised, this prevents
system call injection to the stubs. There is some complexity with respect to
ptrace and seccomp interactions, so this protection is not really available
for kernel versions < 4.8; this is detected dynamically.

Note that this also solves the vsyscall emulation issue by adding in
appropriate trapping for those system calls. It does mean that a compromised
sentry could theoretically inject these into the stub (ignoring the trap and
resume, thereby allowing execution), but they are harmless.

PiperOrigin-RevId: 216647581
Change-Id: Id06c232cbac1f9489b1803ec97f83097fcba8eb8
2018-10-10 22:40:28 -07:00
Brian Geffon acf7a95189 Add memunit to sysinfo(2).
Also properly add padding after Procs in the linux.Sysinfo
structure. This will be implicitly padded to 64bits so we
need to do the same.

PiperOrigin-RevId: 216372907
Change-Id: I6eb6a27800da61d8f7b7b6e87bf0391a48fdb475
2018-10-09 09:52:14 -07:00
Michael Pratt 569c2b06c4 Statfs Namelen should be NAME_MAX not PATH_MAX
We accidentally set the wrong maximum. I've also added PATH_MAX and
NAME_MAX to the linux abi package.

PiperOrigin-RevId: 216221311
Change-Id: I44805fcf21508831809692184a0eba4cee469633
2018-10-08 11:39:54 -07:00
Nicolas Lacasse 213f6688a5 Implement TIOCSCTTY ioctl as a noop.
PiperOrigin-RevId: 215658757
Change-Id: If63b33293f3e53a7f607ae72daa79e2b7ef6fcfd
2018-10-03 17:29:56 -07:00
Michael Pratt 0400e54592 Add itimer types to linux package, strace
PiperOrigin-RevId: 215278262
Change-Id: Icd10384c99802be6097be938196044386441e282
2018-10-01 14:16:53 -07:00
Nicolas Lacasse b709d23987 Forward ioctl(TCSETSF) calls on host ttys to the host kernel.
We already forward TCSETS and TCSETSW.  TCSETSF is roughly equivalent but
discards pending input.

The filters were relaxed to allow host ioctls with TCSETSF argument.

This fixes programs like "passwd" that prevent user input from being displayed
on the terminal.

Before:
	root@b8a0240fc836:/# passwd
	Enter new UNIX password: 123
	Retype new UNIX password: 123
	passwd: password updated successfully

After:
	root@ae6f5dabe402:/# passwd
	Enter new UNIX password:
	Retype new UNIX password:
	passwd: password updated successfully
PiperOrigin-RevId: 214869788
Change-Id: I31b4d1373c1388f7b51d0f2f45ce40aa8e8b0b58
2018-09-27 18:17:38 -07:00
Michael Pratt 3aa50f18a4 Reuse readlink parameter, add sockaddr max.
PiperOrigin-RevId: 213058623
Change-Id: I522598c655d633b9330990951ff1c54d1023ec29
2018-09-14 16:00:02 -07:00
Brian Geffon 2b8dae0bc5 Open(2) isn't honoring O_NOFOLLOW
PiperOrigin-RevId: 211644897
Change-Id: I882ed827a477d6c03576463ca5bf2d6351892b90
2018-09-05 09:21:28 -07:00
Brian Geffon f0492d45aa Add /proc/sys/kernel/shm[all,max,mni].
PiperOrigin-RevId: 210459956
Change-Id: I51859b90fa967631e0a54a390abc3b5541fbee66
2018-08-27 17:21:37 -07:00
Kevin Krakauer 2524111fc6 runsc: Terminal resizing support.
Implements the TIOCGWINSZ and TIOCSWINSZ ioctls, which allow processes to resize
the terminal. This allows, for example, sshd to properly set the window size for
ssh sessions.

PiperOrigin-RevId: 210392504
Change-Id: I0d4789154d6d22f02509b31d71392e13ee4a50ba
2018-08-27 10:49:16 -07:00
Nicolas Lacasse 106de2182d runsc: Terminal support for "docker exec -ti".
This CL adds terminal support for "docker exec".  We previously only supported
consoles for the container process, but not exec processes.

The SYS_IOCTL syscall was added to the default seccomp filter list, but only
for ioctls that get/set winsize and termios structs. We need to allow these
ioctl for all containers because it's possible to run "exec -ti" on a
container that was started without an attached console, after the filters
have been installed.

Note that control-character signals are still not properly supported.

Tested with:
	$ docker run --runtime=runsc -it alpine
In another terminial:
	$ docker exec -it <containerid> /bin/sh

PiperOrigin-RevId: 210185456
Change-Id: I6d2401e53a7697bb988c120a8961505c335f96d9
2018-08-24 17:43:21 -07:00
Jamie Liu 64403265a0 Implement POSIX per-process interval timers.
PiperOrigin-RevId: 210021612
Change-Id: If7c161e6fd08cf17942bfb6bc5a8d2c4e271c61e
2018-08-23 16:32:36 -07:00
Nicolas Lacasse 6cf2278167 Automated rollback of changelist 208284483
PiperOrigin-RevId: 208685417
Change-Id: Ie2849c4811e3a2d14a002f521cef018ded0c6c4a
2018-08-14 11:50:49 -07:00
Justine Olshan ae6f092fe1 Implemented the splice(2) syscall.
Currently the implementation matches the behavior of moving data
between two file descriptors. However, it does not implement this
through zero-copy movement. Thus, this code is a starting point
to build the more complex implementation.

PiperOrigin-RevId: 208284483
Change-Id: Ibde79520a3d50bc26aead7ad4f128d2be31db14e
2018-08-10 16:11:01 -07:00
Michael Pratt 2e06b23aa6 Fix missing O_LARGEFILE from O_CREAT files
Cleanup some more syscall.O_* references while we're here.

PiperOrigin-RevId: 208133460
Change-Id: I48db71a38f817e4f4673977eafcc0e3874eb9a25
2018-08-09 16:50:37 -07:00
Fabricio Voznika 4e171f7590 Basic support for ip link/addr and ifconfig
Closes #94

PiperOrigin-RevId: 207997580
Change-Id: I19b426f1586b5ec12f8b0cd5884d5b401d334924
2018-08-08 22:39:58 -07:00
Michael Pratt b6a37ab9d9 Update comment reference
PiperOrigin-RevId: 207180809
Change-Id: I08c264812919e81b2c56fdd4a9ef06924de8b52f
2018-08-02 15:56:40 -07:00
Zhaozhong Ni 57d0fcbdbf Automated rollback of changelist 207037226
PiperOrigin-RevId: 207125440
Change-Id: I6c572afb4d693ee72a0c458a988b0e96d191cd49
2018-08-02 10:42:48 -07:00
Michael Pratt 60add78980 Automated rollback of changelist 207007153
PiperOrigin-RevId: 207037226
Change-Id: I8b5f1a056d4f3eab17846f2e0193bb737ecb5428
2018-08-01 19:57:32 -07:00
Zhaozhong Ni b9e1cf8404 stateify: convert all packages to use explicit mode.
PiperOrigin-RevId: 207007153
Change-Id: Ifedf1cc3758dc18be16647a4ece9c840c1c636c9
2018-08-01 15:43:24 -07:00
Justine Olshan 2793f7ac5f Added the O_LARGEFILE flag.
This flag will always be true for gVisor files.

PiperOrigin-RevId: 206355963
Change-Id: I2f03d2412e2609042df43b06d1318cba674574d0
2018-07-27 12:27:46 -07:00