Commit Graph

176 Commits

Author SHA1 Message Date
Adin Scannell ae1bb08871 Clean up pipe internals and add fcntl support
Pipe internals are made more efficient by avoiding garbage collection.
A pool is now used that can be shared by all pipes, and buffers are
chained via an intrusive list. The documentation for pipe structures
and methods is also simplified and clarified.

The pipe tests are now parameterized, so that they are run on all
different variants (named pipes, small buffers, default buffers).

The pipe buffer sizes are exposed by fcntl, which is now supported
by this change. A size change test has been added to the suite.

These new tests uncovered a bug regarding the semantics of open
named pipes with O_NONBLOCK, which is also fixed by this CL. This
fix also addresses the lack of the O_LARGEFILE flag for named pipes.

PiperOrigin-RevId: 249375888
Change-Id: I48e61e9c868aedb0cadda2dff33f09a560dee773
2019-05-21 20:12:27 -07:00
Michael Pratt c8857f7269 Fix inconsistencies in ELF anonymous mappings
* A segment with filesz == 0, memsz > 0 should be an anonymous only
  mapping. We were failing to load such an ELF.
* Anonymous pages are always mapped RW, regardless of the segment
  protections.

PiperOrigin-RevId: 249355239
Change-Id: I251e5c0ce8848cf8420c3aadf337b0d77b1ad991
2019-05-21 17:06:05 -07:00
Adin Scannell 9cdae51fec Add basic plumbing for splice and stub implementation.
This does not actually implement an efficient splice or sendfile. Rather, it
adds a generic plumbing to the file internals so that this can be added. All
file implementations use the stub fileutil.NoSplice implementation, which
causes sendfile and splice to fall back to an internal copy.

A basic splice system call interface is added, along with a test.

PiperOrigin-RevId: 249335960
Change-Id: Ic5568be2af0a505c19e7aec66d5af2480ab0939b
2019-05-21 15:18:12 -07:00
Michael Pratt 6588427451 Fix incorrect tmpfs timestamp updates
* Creation of files, directories (and other fs objects) in a directory
  should always update ctime.
* Same for removal.
* atime should not be updated on lookup, only readdir.

I've also renamed some misleading functions that update mtime and ctime.

PiperOrigin-RevId: 249115063
Change-Id: I30fa275fa7db96d01aa759ed64628c18bb3a7dc7
2019-05-20 13:35:17 -07:00
Michael Pratt 04105781ad Fix gofer rename ctime and cleanup stat_times test
There is a lot of redundancy that we can simplify in the stat_times
test. This will make it easier to add new tests. However, the
simplification reveals that cached uattrs on goferfs don't properly
update ctime on rename.

PiperOrigin-RevId: 248773425
Change-Id: I52662728e1e9920981555881f9a85f9ce04041cf
2019-05-17 13:05:47 -07:00
Ian Gudger 40419a16eb Add test for duplicate proc entries.
The issue with duplicate /proc/sys entries seems to have been fixed in:
PiperOrigin-RevId 229305982
Git hash dc8450b567

Fixes google/gvisor#125

PiperOrigin-RevId: 248571903
Change-Id: I76ff3b525c93dafb92da6e5cf56e440187f14579
2019-05-16 11:59:01 -07:00
Michael Pratt dc4a042f3a Update out of date comment
PiperOrigin-RevId: 248265524
Change-Id: Ib9082f08d24ba10535079cf89c714fb22a4fdf10
2019-05-14 20:58:53 -07:00
Michael Pratt c61a2e709a Modernize mknod test
PiperOrigin-RevId: 247704588
Change-Id: I1e63e2b310145695fbe38429b91e44d72473fcd6
2019-05-10 17:37:43 -07:00
Fabricio Voznika 1bee43be13 Implement fallocate(2)
Closes #225

PiperOrigin-RevId: 247508791
Change-Id: I04f47cf2770b30043e5a272aba4ba6e11d0476cc
2019-05-09 15:35:49 -07:00
Googler c3b6d4587e Fix types that are subtly incorrect.
PiperOrigin-RevId: 247294093
Change-Id: Iac8c76e50bbc15c240ae7da7f5786f9968e7057c
2019-05-08 14:40:09 -07:00
Kevin Krakauer ff8ed5e6a5 Fix raw socket behavior and tests.
Some behavior was broken due to the difficulty of running automated raw
socket tests.

Change-Id: I152ca53916bb24a0208f2dc1c4f5bc87f4724ff6
PiperOrigin-RevId: 246747067
2019-05-05 16:07:25 -07:00
Andrei Vagin 9e1c253fe8 gvisor: run bazel in a docker container
bazel has a lot of dependencies and users don't want to install them
just to build gvisor.

These changes allows to run bazel in a docker container.
A bazel cache is on the local file system (~/.cache/bazel), so
incremental builds should be fast event after recreating a bazel
container.

Here is an example how to build runsc:
make BAZEL_OPTIONS="build runsc:runsc" bazel

Change-Id: I8c0a6d0c30e835892377fb6dd5f4af7a0052d12a
PiperOrigin-RevId: 246570877
2019-05-03 14:13:08 -07:00
Fabricio Voznika 6b9ab65163 Skip flaky ClockGettime.CputimeId take 2
The test also times out when GCE machine has 2 CPUs. I cannot
repro it locally with a 2 CPU cgroup though. Let's skip the
test when there are 2 CPUs to stop the flakiness and retest it
once the fix is available.

PiperOrigin-RevId: 246523363
Change-Id: I9d9d922a5be3aa7bc91dff5a1807ca99f3f4a4f9
2019-05-03 09:42:10 -07:00
Chris Kuiper 2d8e90b311 Proper cleanup of sockets that used REUSEPORT
Fixed a small logic error that broke proper accounting of MultiPortEndpoints.

PiperOrigin-RevId: 246502126
Change-Id: I1a7d6ea134f811612e545676212899a3707bc2c2
2019-05-03 07:02:51 -07:00
Chris Kuiper 8972e47a2e Support reception of multicast data on more than one socket
This requires two changes:
1) Support for more than one socket to join a given multicast group.

2) Duplicate delivery of incoming multicast packets to all sockets listening
for it.

In addition, I tweaked the code (and added a test) to disallow duplicates
IP_ADD_MEMBERSHIP calls for the same group and NIC. This is how Linux does
it.

PiperOrigin-RevId: 246437315
Change-Id: Icad8300b4a8c3f501d9b4cd283bd3beabef88b72
2019-05-02 19:41:00 -07:00
Kevin Krakauer bf40fa2129 Replace dynamic macros with constants in memfd test.
PiperOrigin-RevId: 246433167
Change-Id: Idb9b6c20ee1da193176288dfd2f9d85ec0e69c54
2019-05-02 18:57:58 -07:00
Ian Gudger 81ecd8b6ea Implement the MSG_CTRUNC msghdr flag for Unix sockets.
Updates google/gvisor#206

PiperOrigin-RevId: 245880573
Change-Id: Ifa715e98d47f64b8a32b04ae9378d6cd6bd4025e
2019-04-29 21:21:08 -07:00
Fabricio Voznika 2843f2a956 Skip flaky ClockGettime.CputimeId
Test times out when it runs on a single core. Skip until the
bug in the Go runtime is fixed.

PiperOrigin-RevId: 245866466
Change-Id: Ic3e72131c27136d58b71f6b11acc78abf55895d4
2019-04-29 18:41:54 -07:00
Michael Pratt 4d52a55201 Change copyright notice to "The gVisor Authors"
Based on the guidelines at
https://opensource.google.com/docs/releasing/authors/.

1. $ rg -l "Google LLC" | xargs sed -i 's/Google LLC.*/The gVisor Authors./'
2. Manual fixup of "Google Inc" references.
3. Add AUTHORS file. Authors may request to be added to this file.
4. Point netstack AUTHORS to gVisor AUTHORS. Drop CONTRIBUTORS.

Fixes #209

PiperOrigin-RevId: 245823212
Change-Id: I64530b24ad021a7d683137459cafc510f5ee1de9
2019-04-29 14:26:23 -07:00
Nicolas Lacasse f4ce43e1f4 Allow and document bug ids in gVisor codebase.
PiperOrigin-RevId: 245818639
Change-Id: I03703ef0fb9b6675955637b9fe2776204c545789
2019-04-29 14:04:14 -07:00
Tamir Duberstein ac8fca1ef4 Appease googletest deprecation
PiperOrigin-RevId: 245788366
Change-Id: I17bbecf8493132dbe95564c34c45b838194bfabb
2019-04-29 11:34:16 -07:00
Nicolas Lacasse 2df64cd6d2 createAt should return all errors from FindInode except ENOENT.
Previously, createAt was eating all errors from FindInode except for EACCES and
proceeding with the creation. This is incorrect, as FindInode can return many
other errors (like ENAMETOOLONG) that should stop creation.

This CL changes createAt to return all errors encountered except for ENOENT,
which we can ignore because we are about to create the thing.

PiperOrigin-RevId: 245773222
Change-Id: I1b317021de70f0550fb865506f6d8147d4aebc56
2019-04-29 10:30:24 -07:00
Tamir Duberstein 59442238d4 Remove syscall tests' dependency on glog
PiperOrigin-RevId: 245469859
Change-Id: I0610e477cc3a884275852e83028ecfb501f2c039
2019-04-26 12:47:46 -07:00
Kevin Krakauer 5f13338d30 Fix reference counting bug in /proc/PID/fdinfo/.
PiperOrigin-RevId: 245452217
Change-Id: I7164d8f57fe34c17e601079eb9410a6d95af1869
2019-04-26 11:09:55 -07:00
Kevin Krakauer f4d34b420b Change name of sticky test arg.
PiperOrigin-RevId: 245451875
Change-Id: Icee2c4ed74564e77454c60d60f456454443ccadf
2019-04-26 11:08:08 -07:00
Jamie Liu 6b76c172b4 Don't enforce NAME_MAX in fs.Dirent.walk().
Maximum filename length is filesystem-dependent, and obtained via
statfs::f_namelen. This limit is usually 255 bytes (NAME_MAX), but not
always. For example, VFAT supports filenames of up to 255... UCS-2
characters, which Linux conservatively takes to mean UTF-8-encoded
bytes: fs/fat/inode.c:fat_statfs(), FAT_LFN_LEN * NLS_MAX_CHARSET_SIZE.
As a result, Linux's VFS does not enforce NAME_MAX:

$ rg --maxdepth=1 '\WNAME_MAX\W' fs/ include/linux/
fs/libfs.c
38:     buf->f_namelen = NAME_MAX;
64:     if (dentry->d_name.len > NAME_MAX)

include/linux/relay.h
74:     char base_filename[NAME_MAX];   /* saved base filename */

include/linux/fscrypt.h
149: * filenames up to NAME_MAX bytes, since base64 encoding expands the length.

include/linux/exportfs.h
176: *    understanding that it is already pointing to a a %NAME_MAX+1 sized

Remove this check from core VFS, and add it to ramfs (and by extension
tmpfs), where it is actually applicable:
mm/shmem.c:shmem_dir_inode_operations.lookup == simple_lookup *does*
enforce NAME_MAX.

PiperOrigin-RevId: 245324748
Change-Id: I17567c4324bfd60e31746a5270096e75db963fac
2019-04-25 16:05:13 -07:00
Tamir Duberstein 7219781040 s,sys/poll.h/,poll.h,g
See https://git.musl-libc.org/cgit/musl/tree/include/sys/poll.h

PiperOrigin-RevId: 245312375
Change-Id: If749ae3f94ccedc82eb6b594b32155924a354b58
2019-04-25 14:57:06 -07:00
Tamir Duberstein 992b66e688 Handle glibc and XSI variants of strerror_r
PiperOrigin-RevId: 245306581
Change-Id: I44a034310809f8e9e651be8023ff1985561602fc
2019-04-25 14:23:46 -07:00
Tamir Duberstein 9c638f1beb Remove useless modifiers
PiperOrigin-RevId: 245304611
Change-Id: Ie0e9bfc03d064e41d50157eeb4df22b2635f41e2
2019-04-25 14:12:51 -07:00
Ian Gudger 962567aafd Add Unix socket tests for the MSG_CTRUNC msghdr flag.
TCP tests and the implementation will come in followup CLs.

Updates google/gvisor#206
Updates google/gvisor#207

PiperOrigin-RevId: 245121470
Change-Id: Ib50b62724d3ba0cbfb1374e1f908798431ee2b21
2019-04-24 14:51:42 -07:00
Wei Zhang 17ff6063a3 Bugfix: fix fstatat symbol link to dir
For a symbol link to some directory, eg.

`/tmp/symlink -> /tmp/dir`

`fstatat("/tmp/symlink")` should return symbol link data, but
`fstatat("/tmp/symlink/")` (symlink with trailing slash) should return
directory data it points following linux behaviour.

Currently fstatat() a symlink with trailing slash will get "not a
directory" error which is wrong.

Signed-off-by: Wei Zhang <zhangwei198900@gmail.com>
Change-Id: I63469b1fb89d083d1c1255d32d52864606fbd7e2
PiperOrigin-RevId: 244783916
2019-04-22 20:07:06 -07:00
Ben Burkert 56927e5317 tcpip/transport/tcp: read side only shutdown of an endpoint
Support shutdown on only the read side of an endpoint. Reads performed
after a call to Shutdown with only the ShutdownRead flag will return
ErrClosedForReceive without data.

Break out the shutdown(2) with SHUT_RD syscall test into to two tests.
The first tests that no packets are sent when shutting down the read
side of a socket. The second tests that, after shutting down the read
side of a socket, unread data can still be read, or an EOF if there is
no more data to read.

Change-Id: I9d7c0a06937909cbb466b7591544a4bcaebb11ce
PiperOrigin-RevId: 244459430
2019-04-19 19:29:05 -07:00
Ian Gudger 358eb52a76 Add support for the MSG_TRUNC msghdr flag.
The MSG_TRUNC flag is set in the msghdr when a message is truncated.

Fixes google/gvisor#200

PiperOrigin-RevId: 244440486
Change-Id: I03c7d5e7f5935c0c6b8d69b012db1780ac5b8456
2019-04-19 16:17:01 -07:00
Nicolas Lacasse ce64d9ebf0 Keep symlink target open while in test that compares inode ids.
Inode ids are only guaranteed to be stable across save/restore if the file is
held open. This CL fixes a simple stat test to allow it to compare symlink and
target by inode id, as long as the link target is held open.

PiperOrigin-RevId: 244238343
Change-Id: I74c5115915b1cc032a4c16515a056a480f218f00
2019-04-18 12:39:35 -07:00
Michael Pratt b52cbd6028 Don't allow sigtimedwait to catch unblockable signals
The existing logic attempting to do this is incorrect. Unary ^ has
higher precedence than &^, so mask always has UnblockableSignals
cleared, allowing dequeueSignalLocked to dequeue unblockable signals
(which allows userspace to ignore them).

Switch the logic so that unblockable signals are always masked.

PiperOrigin-RevId: 244058487
Change-Id: Ib19630ac04068a1fbfb9dc4a8eab1ccbdb21edc3
2019-04-17 13:43:20 -07:00
Fabricio Voznika c8cee7108f Use FD limit and file size limit from host
FD limit and file size limit is read from the host, instead
of using hard-coded defaults, given that they effect the sandbox
process. Also limit the direct cache to use no more than half
if the available FDs.

PiperOrigin-RevId: 244050323
Change-Id: I787ad0fdf07c49d589e51aebfeae477324fe26e6
2019-04-17 12:57:40 -07:00
Adin Scannell efacb8d900 CONTRIBUTING: add style guide pointer
Change-Id: I93a78a6b2bb2eaa69046c6cfecee2e4cfcf20e44
PiperOrigin-RevId: 243140359
2019-04-11 14:18:01 -07:00
Michael Pratt cc48969bb7 Internal change
PiperOrigin-RevId: 242978508
Change-Id: I0ea59ac5ba1dd499e87c53f2e24709371048679b
2019-04-10 18:00:18 -07:00
Kevin Krakauer c8368e477b rlimits test: don't exceed nr_open.
Even superuser cannot raise RLIMIT_NOFILE above /proc/sys/fs/nr_open, so
start the test by lowering the limits before raising.

Change-Id: Ied6021c64178a6cb9098088a1a3384db523a226f
PiperOrigin-RevId: 242965249
2019-04-10 16:34:50 -07:00
Kevin Krakauer f7aff0aaa4 Allow threads with CAP_SYS_RESOURCE to raise hard rlimits.
PiperOrigin-RevId: 242919489
Change-Id: Ie3267b3bcd8a54b54bc16a6556369a19e843376f
2019-04-10 12:36:45 -07:00
Shiva Prasanth 7140b1fdca Fixed /proc/cpuinfo permissions
This also applies these permissions to other static proc files.

Change-Id: I4167e585fed49ad271aa4e1f1260babb3239a73d
PiperOrigin-RevId: 242898575
2019-04-10 10:49:43 -07:00
Michael Pratt 0e14e48b84 Match multi-word State
From a recent test failure:

"State:\tD (disk sleep)\n"

"disk sleep" does not match \w+. We need to allow spaces.

PiperOrigin-RevId: 242762469
Change-Id: Ic8d05a16669412a72c1e76b498373e5b22fe64c4
2019-04-09 16:26:11 -07:00
Michael Pratt 05979a7547 Internal change
PiperOrigin-RevId: 242573252
Change-Id: Ibb4c6bfae2c2e322bf1cec23181a0ab663d8530a
2019-04-08 17:35:51 -07:00
Jamie Liu 9471c01348 Export kernel.SignalInfoPriv.
Also add kernel.SignalInfoNoInfo, and use it in RLIMIT_FSIZE checks.

PiperOrigin-RevId: 242562428
Change-Id: I4887c0e1c8f5fddcabfe6d4281bf76d2f2eafe90
2019-04-08 16:32:11 -07:00
Michael Pratt 218a7b5449 Add TODO
PiperOrigin-RevId: 242531141
Change-Id: I2a3bd815bda09f392f511f47120d5d9e6e86a40d
2019-04-08 13:48:40 -07:00
Jamie Liu 124bafc81c Deflake PtraceTest.SeizeSetOptions.
PiperOrigin-RevId: 242226319
Change-Id: Iefc78656841315f6b7d48bd85db451486850264d
2019-04-05 17:54:31 -07:00
Andrei Vagin 88409e983c gvisor: Add support for the MS_NOEXEC mount option
https://github.com/google/gvisor/issues/145

PiperOrigin-RevId: 242044115
Change-Id: I8f140fe05e32ecd438b6be218e224e4b7fe05878
2019-04-04 17:43:53 -07:00
Kevin Krakauer 82529becae Fix index out of bounds in tty implementation.
The previous implementation revolved around runes instead of bytes, which caused
weird behavior when converting between the two. For example, peekRune would read
the byte 0xff from a buffer, convert it to a rune, then return it. As rune is an
alias of int32, 0xff was 0-padded to int32(255), which is the hex code point for
?. However, peekRune also returned the length of the byte (1). When calling
utf8.EncodeRune, we only allocated 1 byte, but tried the write the 2-byte
character ?.

tl;dr: I apparently didn't understand runes when I wrote this.

PiperOrigin-RevId: 241789081
Change-Id: I14c788af4d9754973137801500ef6af7ab8a8727
2019-04-03 13:00:34 -07:00
Jamie Liu c4caccd540 Set options on the correct Task in PTRACE_SEIZE.
$ docker run --rm --runtime=runsc -it --cap-add=SYS_PTRACE debian bash -c "apt-get update && apt-get install strace && strace ls"
...
Setting up strace (4.15-2) ...
execve("/bin/ls", ["ls"], [/* 6 vars */]) = 0
brk(NULL)                               = 0x5646d8c1e000
uname({sysname="Linux", nodename="114ef93d2db3", ...}) = 0
...

PiperOrigin-RevId: 241643321
Change-Id: Ie4bce27a7fb147eef07bbae5895c6ef3f529e177
2019-04-02 18:13:19 -07:00
Kevin Krakauer 5c465603b6 Add build rule for raw socket tests so they are runnable via:
bazel test test/syscalls:raw_socket_ipv4_test_{native,runsc_ptrace,runsc_kvm}

PiperOrigin-RevId: 241640049
Change-Id: Iac4dbdd7fd1827399a472059ac7d85fb6b506577
2019-04-02 17:48:33 -07:00