Commit Graph

1191 Commits

Author SHA1 Message Date
Adin Scannell 5749f64314 kvm: remove non-sane sanity check
Apparently some platforms don't have pSize < vSize.

Fixes #208

PiperOrigin-RevId: 245480998
Change-Id: I2a98229912f4ccbfcd8e79dfa355104f14275a9c
2019-04-26 13:53:12 -07:00
Bhasker Hariharan 228dc15fd1 Bump the AF_PACKET socket rcv buf size to 4MB by default.
Packet socket receive buffers default to the sysctl value of
net.core.rmem_default and are capped by net.core.rmem_max both
which are usually set to 208KB on most systems.

Since we can't expect every gVisor user to bump these we use
SO_RCVBUFFORCE to exceed the limit. This is possible as runsc runs
with CAP_NET_ADMIN outside the sandbox and can do this before
the FD is passed to the sentry inside the sandbox.

Updates #211

iperf output w/ 4MB buffer.

 iperf3 -c 172.17.0.2 -t 100
 Connecting to host 172.17.0.2, port 5201
 [  4] local 172.17.0.1 port 40378 connected to 172.17.0.2 port 5201
 [ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
 [  4]   0.00-1.00   sec  1.15 GBytes  9.89 Gbits/sec    0   1.02 MBytes
 [  4]   1.00-2.00   sec  1.18 GBytes  10.2 Gbits/sec    0   1.02 MBytes
 [  4]   2.00-3.00   sec   965 MBytes  8.09 Gbits/sec    0   1.02 MBytes
 [  4]   3.00-4.00   sec   942 MBytes  7.90 Gbits/sec    0   1.02 MBytes
 [  4]   4.00-5.00   sec   952 MBytes  7.99 Gbits/sec    0   1.02 MBytes
 [  4]   5.00-6.00   sec  1.14 GBytes  9.81 Gbits/sec    0   1.02 MBytes
 [  4]   6.00-7.00   sec  1.13 GBytes  9.68 Gbits/sec    0   1.02 MBytes
 [  4]   7.00-8.00   sec   930 MBytes  7.80 Gbits/sec    0   1.02 MBytes
 [  4]   8.00-9.00   sec  1.15 GBytes  9.91 Gbits/sec    0   1.02 MBytes
 [  4]   9.00-10.00  sec   938 MBytes  7.87 Gbits/sec    0   1.02 MBytes
 [  4]  10.00-11.00  sec   737 MBytes  6.18 Gbits/sec    0   1.02 MBytes
 [  4]  11.00-12.00  sec  1.16 GBytes  9.93 Gbits/sec    0   1.02 MBytes
 [  4]  12.00-13.00  sec   917 MBytes  7.69 Gbits/sec    0   1.02 MBytes
 [  4]  13.00-14.00  sec  1.19 GBytes  10.2 Gbits/sec    0   1.02 MBytes
 [  4]  14.00-15.00  sec  1.01 GBytes  8.70 Gbits/sec    0   1.02 MBytes
 [  4]  15.00-16.00  sec  1.20 GBytes  10.3 Gbits/sec    0   1.02 MBytes
 [  4]  16.00-17.00  sec  1.14 GBytes  9.80 Gbits/sec    0   1.02 MBytes
 ^C[  4]  17.00-17.60  sec   718 MBytes  10.1 Gbits/sec    0   1.02 MBytes
 - - - - - - - - - - - - - - - - - - - - - - - - -
 [ ID] Interval           Transfer     Bandwidth       Retr
 [  4]   0.00-17.60  sec  18.4 GBytes  8.98 Gbits/sec    0             sender
 [  4]   0.00-17.60  sec  0.00 Bytes  0.00 bits/sec                  receiver

PiperOrigin-RevId: 245470590
Change-Id: I1c08c5ee8345de6ac070513656a4703312dc3c00
2019-04-26 12:52:02 -07:00
Tamir Duberstein 59442238d4 Remove syscall tests' dependency on glog
PiperOrigin-RevId: 245469859
Change-Id: I0610e477cc3a884275852e83028ecfb501f2c039
2019-04-26 12:47:46 -07:00
Kevin Krakauer 5f13338d30 Fix reference counting bug in /proc/PID/fdinfo/.
PiperOrigin-RevId: 245452217
Change-Id: I7164d8f57fe34c17e601079eb9410a6d95af1869
2019-04-26 11:09:55 -07:00
Kevin Krakauer f4d34b420b Change name of sticky test arg.
PiperOrigin-RevId: 245451875
Change-Id: Icee2c4ed74564e77454c60d60f456454443ccadf
2019-04-26 11:08:08 -07:00
Michael Pratt f17cfa4d53 Perform explicit CPUID and FP state compatibility checks on restore
PiperOrigin-RevId: 245341004
Change-Id: Ic4d581039d034a8ae944b43e45e84eb2c3973657
2019-04-25 17:47:05 -07:00
Jamie Liu 6b76c172b4 Don't enforce NAME_MAX in fs.Dirent.walk().
Maximum filename length is filesystem-dependent, and obtained via
statfs::f_namelen. This limit is usually 255 bytes (NAME_MAX), but not
always. For example, VFAT supports filenames of up to 255... UCS-2
characters, which Linux conservatively takes to mean UTF-8-encoded
bytes: fs/fat/inode.c:fat_statfs(), FAT_LFN_LEN * NLS_MAX_CHARSET_SIZE.
As a result, Linux's VFS does not enforce NAME_MAX:

$ rg --maxdepth=1 '\WNAME_MAX\W' fs/ include/linux/
fs/libfs.c
38:     buf->f_namelen = NAME_MAX;
64:     if (dentry->d_name.len > NAME_MAX)

include/linux/relay.h
74:     char base_filename[NAME_MAX];   /* saved base filename */

include/linux/fscrypt.h
149: * filenames up to NAME_MAX bytes, since base64 encoding expands the length.

include/linux/exportfs.h
176: *    understanding that it is already pointing to a a %NAME_MAX+1 sized

Remove this check from core VFS, and add it to ramfs (and by extension
tmpfs), where it is actually applicable:
mm/shmem.c:shmem_dir_inode_operations.lookup == simple_lookup *does*
enforce NAME_MAX.

PiperOrigin-RevId: 245324748
Change-Id: I17567c4324bfd60e31746a5270096e75db963fac
2019-04-25 16:05:13 -07:00
Tamir Duberstein 7219781040 s,sys/poll.h/,poll.h,g
See https://git.musl-libc.org/cgit/musl/tree/include/sys/poll.h

PiperOrigin-RevId: 245312375
Change-Id: If749ae3f94ccedc82eb6b594b32155924a354b58
2019-04-25 14:57:06 -07:00
Tamir Duberstein 992b66e688 Handle glibc and XSI variants of strerror_r
PiperOrigin-RevId: 245306581
Change-Id: I44a034310809f8e9e651be8023ff1985561602fc
2019-04-25 14:23:46 -07:00
Tamir Duberstein 9c638f1beb Remove useless modifiers
PiperOrigin-RevId: 245304611
Change-Id: Ie0e9bfc03d064e41d50157eeb4df22b2635f41e2
2019-04-25 14:12:51 -07:00
Ian Lewis 03be9ae88c Update required bazel version to 0.23.0 in README
Bazel 0.23.0 is required due to the use of cc_flags_supplier.bzl in the vdso
package. cc_flags_supplier.bzl was added in 0.23.0.

PiperOrigin-RevId: 245192715
Change-Id: I4258c064e5cc3bac2a587c887e0d8f87b6678ec7
2019-04-25 01:16:33 -07:00
Ian Gudger 962567aafd Add Unix socket tests for the MSG_CTRUNC msghdr flag.
TCP tests and the implementation will come in followup CLs.

Updates google/gvisor#206
Updates google/gvisor#207

PiperOrigin-RevId: 245121470
Change-Id: Ib50b62724d3ba0cbfb1374e1f908798431ee2b21
2019-04-24 14:51:42 -07:00
Bhasker Hariharan 99b877fa1d Revert runsc to use RecvMMsg packet dispatcher.
PacketMMap mode has issues due to a kernel bug. This change
reverts us to using recvmmsg instead of a shared ring buffer to
dispatch inbound packets. This will reduce performance but should
be more stable under heavy load till PacketMMap is updated to
use TPacketv3.

See #210 for details.

Perf difference between recvmmsg vs packetmmap.

RecvMMsg :
iperf3 -c 172.17.0.2
Connecting to host 172.17.0.2, port 5201
[  4] local 172.17.0.1 port 43478 connected to 172.17.0.2 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec   778 MBytes  6.53 Gbits/sec  4349    188 KBytes
[  4]   1.00-2.00   sec   786 MBytes  6.59 Gbits/sec  4395    212 KBytes
[  4]   2.00-3.00   sec   756 MBytes  6.34 Gbits/sec  3655    161 KBytes
[  4]   3.00-4.00   sec   782 MBytes  6.56 Gbits/sec  4419    175 KBytes
[  4]   4.00-5.00   sec   755 MBytes  6.34 Gbits/sec  4317    187 KBytes
[  4]   5.00-6.00   sec   774 MBytes  6.49 Gbits/sec  4002    173 KBytes
[  4]   6.00-7.00   sec   737 MBytes  6.18 Gbits/sec  3904    191 KBytes
[  4]   7.00-8.00   sec   530 MBytes  4.44 Gbits/sec  3318    189 KBytes
[  4]   8.00-9.00   sec   487 MBytes  4.09 Gbits/sec  2627    188 KBytes
[  4]   9.00-10.00  sec   770 MBytes  6.46 Gbits/sec  4221    170 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  6.99 GBytes  6.00 Gbits/sec  39207             sender
[  4]   0.00-10.00  sec  6.99 GBytes  6.00 Gbits/sec                  receiver

iperf Done.

PacketMMap:

bhaskerh@gvisor-bench:~/tensorflow$ iperf3 -c 172.17.0.2
Connecting to host 172.17.0.2, port 5201
[  4] local 172.17.0.1 port 43496 connected to 172.17.0.2 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec   657 MBytes  5.51 Gbits/sec    0   1.01 MBytes
[  4]   1.00-2.00   sec  1021 MBytes  8.56 Gbits/sec    0   1.01 MBytes
[  4]   2.00-3.00   sec  1.21 GBytes  10.4 Gbits/sec   45   1.01 MBytes
[  4]   3.00-4.00   sec  1018 MBytes  8.54 Gbits/sec   15   1.01 MBytes
[  4]   4.00-5.00   sec  1.28 GBytes  11.0 Gbits/sec   45   1.01 MBytes
[  4]   5.00-6.00   sec  1.38 GBytes  11.9 Gbits/sec    0   1.01 MBytes
[  4]   6.00-7.00   sec  1.34 GBytes  11.5 Gbits/sec   45    856 KBytes
[  4]   7.00-8.00   sec  1.23 GBytes  10.5 Gbits/sec    0    901 KBytes
[  4]   8.00-9.00   sec  1010 MBytes  8.48 Gbits/sec    0    923 KBytes
[  4]   9.00-10.00  sec  1.39 GBytes  11.9 Gbits/sec    0    960 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  11.4 GBytes  9.83 Gbits/sec  150             sender
[  4]   0.00-10.00  sec  11.4 GBytes  9.83 Gbits/sec                  receiver

Updates #210

PiperOrigin-RevId: 244968438
Change-Id: Id461b5cbff2dea6fa55cfc108ea246d8f83da20b
2019-04-23 19:07:06 -07:00
Bhasker Hariharan 56cadcac4e Fixes to PacketMMap dispatcher.
This CL fixes the following bugs:

- Uses atomic to set/read status instead of binary.LittleEndian.PutUint32 etc
which are not atomic.

- Increments ringOffsets for frames that are truncated (i.e status is
  tpStatusCopy)

- Does not ignore frames with tpStatusLost bit set as they are valid frames and
  only indicate that there some frames were lost before this one and metrics can
  be retrieved with a getsockopt call.

- Adds checks to make sure blockSize is a multiple of page size. This is
  required as the kernel allocates in pages per block and rejects sizes that are
  not page aligned with an EINVAL.

Updates #210

PiperOrigin-RevId: 244959464
Change-Id: I5d61337b7e4c0f8a3063dcfc07791d4c4521ba1f
2019-04-23 17:47:56 -07:00
Fabricio Voznika 7c9c5fd36d Add Linux version to requirements section
PiperOrigin-RevId: 244959388
Change-Id: Ifb08678d975cf9f694a21012f9a1e9f45b1f197c
2019-04-23 17:46:44 -07:00
Fabricio Voznika 1b10f52d59 Remember file position during Readdir()
The caller must call Readdir() at least twice to detect
EOF. The old code was always restarting the directory
search and then skipping elements already seen, effectively
doubling the cost to read a directory. The code now
remembers the last offset and doesn't reposition the cursor
if next request comes at the same offset.

PiperOrigin-RevId: 244957816
Change-Id: If21a8dc68b76614adbcf4301439adfda40f2643f
2019-04-23 17:34:51 -07:00
Fabricio Voznika db334f7154 Remove reflection from 9P serving path
p9.messageByType was taking 7% of p9.recv before, spending time
with reflection and map lookup. Now it's reduced to 1%.

PiperOrigin-RevId: 244947313
Change-Id: I42813f920557b7656f8b29157eb32acd79e11fa5
2019-04-23 16:26:10 -07:00
Fabricio Voznika 908edee04f Replace os.File with fd.FD in fsgofer
os.NewFile() accounts for 38% of CPU time in localFile.Walk().
This change switchs to use fd.FD which is much cheaper to create.
Now, fd.New() in localFile.Walk() accounts for only 4%.

PiperOrigin-RevId: 244944983
Change-Id: Ic892df96cf2633e78ad379227a213cb93ee0ca46
2019-04-23 16:10:54 -07:00
Kevin Krakauer df21460cfd Fix container_test flakes.
Create, Start, and Destroy were racing to create and destroy the
metadata directory of containers.

This is a re-upload of
https://gvisor-review.googlesource.com/c/gvisor/+/16260, but with the
correct account.

Change-Id: I16b7a9d0971f0df873e7f4145e6ac8f72730a4f1
PiperOrigin-RevId: 244892991
2019-04-23 11:33:40 -07:00
Wei Zhang 17ff6063a3 Bugfix: fix fstatat symbol link to dir
For a symbol link to some directory, eg.

`/tmp/symlink -> /tmp/dir`

`fstatat("/tmp/symlink")` should return symbol link data, but
`fstatat("/tmp/symlink/")` (symlink with trailing slash) should return
directory data it points following linux behaviour.

Currently fstatat() a symlink with trailing slash will get "not a
directory" error which is wrong.

Signed-off-by: Wei Zhang <zhangwei198900@gmail.com>
Change-Id: I63469b1fb89d083d1c1255d32d52864606fbd7e2
PiperOrigin-RevId: 244783916
2019-04-22 20:07:06 -07:00
Michael Pratt d6aac9387f Fix doc typo
PiperOrigin-RevId: 244773890
Change-Id: I2d0cd7789771276ba545b38efff6d3e24133baaa
2019-04-22 18:22:19 -07:00
Michael Pratt f86c35a51f Clean up state error handling
PiperOrigin-RevId: 244773836
Change-Id: I32223f79d2314fe1ac4ddfc63004fc22ff634adf
2019-04-22 18:20:51 -07:00
Ben Burkert 56927e5317 tcpip/transport/tcp: read side only shutdown of an endpoint
Support shutdown on only the read side of an endpoint. Reads performed
after a call to Shutdown with only the ShutdownRead flag will return
ErrClosedForReceive without data.

Break out the shutdown(2) with SHUT_RD syscall test into to two tests.
The first tests that no packets are sent when shutting down the read
side of a socket. The second tests that, after shutting down the read
side of a socket, unread data can still be read, or an EOF if there is
no more data to read.

Change-Id: I9d7c0a06937909cbb466b7591544a4bcaebb11ce
PiperOrigin-RevId: 244459430
2019-04-19 19:29:05 -07:00
Ian Gudger 358eb52a76 Add support for the MSG_TRUNC msghdr flag.
The MSG_TRUNC flag is set in the msghdr when a message is truncated.

Fixes google/gvisor#200

PiperOrigin-RevId: 244440486
Change-Id: I03c7d5e7f5935c0c6b8d69b012db1780ac5b8456
2019-04-19 16:17:01 -07:00
Ben Burkert cec2cdc12f tcpip/transport/udp: add Forwarder type
Add a UDP forwarder for intercepting and forwarding UDP sessions.

Change-Id: I2d83c900c1931adfc59a532dd4f6b33a0db406c9
PiperOrigin-RevId: 244293576
2019-04-18 17:49:57 -07:00
Haibo Xu f4d434c180 Enable vDSO support on arm64.
Signed-off-by: Haibo Xu <haibo.xu@arm.com>
Change-Id: I20103cd6d193431ab7e8120005da1f567b9bc2eb
PiperOrigin-RevId: 244280119
2019-04-18 16:22:08 -07:00
Michael Pratt c931c8e082 Format struct pollfd in poll(2)/ppoll(2)
I0410 15:40:38.854295    3776 x:0] [   1] poll_test E poll(0x2b00bfb5c020 [{FD: 0x3 anon_inode:[eventfd], Events: POLLOUT, REvents: ...}], 0x1, 0x1)
I0410 15:40:38.854348    3776 x:0] [   1] poll_test X poll(0x2b00bfb5c020 [{FD: 0x3 anon_inode:[eventfd], Events: POLLOUT|POLLERR|POLLHUP, REvents: POLLOUT}], 0x1, 0x1) = 0x1 (10.765?s)

PiperOrigin-RevId: 244269879
Change-Id: If07ba54a486fdeaaedfc0123769b78d1da862307
2019-04-18 15:24:07 -07:00
Nicolas Lacasse ce64d9ebf0 Keep symlink target open while in test that compares inode ids.
Inode ids are only guaranteed to be stable across save/restore if the file is
held open. This CL fixes a simple stat test to allow it to compare symlink and
target by inode id, as long as the link target is held open.

PiperOrigin-RevId: 244238343
Change-Id: I74c5115915b1cc032a4c16515a056a480f218f00
2019-04-18 12:39:35 -07:00
Ian Gudger 133700007a Only emit unimplemented syscall events for unsupported values.
Only emit unimplemented syscall events for setting SO_OOBINLINE and SO_LINGER
when attempting to set unsupported values.

PiperOrigin-RevId: 244229675
Change-Id: Icc4562af8f733dd75a90404621711f01a32a9fc1
2019-04-18 11:51:41 -07:00
Andrei Vagin 4524790ff6 netstack: use a proper network protocol to set gso.L3HdrLen
It is possible to create a listening socket which will accept
IPv4 and IPv6 connections. In this case, we set IPv6ProtocolNumber
for all accepted endpoints, even if they handle IPv4 connections.

This means that we can't use endpoint.netProto to set gso.L3HdrLen.

PiperOrigin-RevId: 244227948
Change-Id: I5e1863596cb9f3d216febacdb7dc75651882eef1
2019-04-18 11:42:23 -07:00
Michael Pratt b52cbd6028 Don't allow sigtimedwait to catch unblockable signals
The existing logic attempting to do this is incorrect. Unary ^ has
higher precedence than &^, so mask always has UnblockableSignals
cleared, allowing dequeueSignalLocked to dequeue unblockable signals
(which allows userspace to ignore them).

Switch the logic so that unblockable signals are always masked.

PiperOrigin-RevId: 244058487
Change-Id: Ib19630ac04068a1fbfb9dc4a8eab1ccbdb21edc3
2019-04-17 13:43:20 -07:00
Fabricio Voznika c8cee7108f Use FD limit and file size limit from host
FD limit and file size limit is read from the host, instead
of using hard-coded defaults, given that they effect the sandbox
process. Also limit the direct cache to use no more than half
if the available FDs.

PiperOrigin-RevId: 244050323
Change-Id: I787ad0fdf07c49d589e51aebfeae477324fe26e6
2019-04-17 12:57:40 -07:00
Michael Pratt 08d99c5fbe Convert poll/select to operate more directly on linux.PollFD
Current, doPoll copies the user struct pollfd array into a
[]syscalls.PollFD, which contains internal kdefs.FD and
waiter.EventMask types. While these are currently binary-compatible with
the Linux versions, we generally discourage copying directly to internal
types (someone may inadvertantly change kdefs.FD to uint64).

Instead, copy directly to a []linux.PollFD, which will certainly be
binary compatible. Most of syscalls/polling.go is included directly into
syscalls/linux/sys_poll.go, as it can then operate directly on
linux.PollFD. The additional syscalls.PollFD type is providing little
value.

I've also added explicit conversion functions for waiter.EventMask,
which creates the possibility of a different binary format.

PiperOrigin-RevId: 244042947
Change-Id: I24e5b642002a32b3afb95a9dcb80d4acd1288abf
2019-04-17 12:15:01 -07:00
Googler e091b4e7c0 Internal change.
PiperOrigin-RevId: 244036529
Change-Id: I280f9632a65d2e40d844e0d5ec3a101d808434ee
2019-04-17 11:40:11 -07:00
Fabricio Voznika 9f8c89fc7f Return error from fdbased.New
RELNOTES: n/a
PiperOrigin-RevId: 244031742
Change-Id: Id0cdb73194018fb5979e67b58510ead19b5a2b81
2019-04-17 11:16:35 -07:00
Lantao Liu e815666717 Fix gvisor-containerd-shim download in the test.
The file layout in the bucket is changed a little bit recently to support both v1 shim and v2 shim.

PiperOrigin-RevId: 243682904
Change-Id: Ic1373c6dc088ef41f829e7ce3ea3762e1e2b0292
2019-04-15 13:56:52 -07:00
Fabricio Voznika 546a1df7d1 Add 'runsc do' command
It provides an easy way to run commands to quickly test gVisor.
By default it maps the host root as the container root with a
writable overlay on top (so the host root is not modified).

Example:
  sudo runsc do ls -lh --color
  sudo runsc do ~/src/test/my-test.sh
PiperOrigin-RevId: 243178711
Change-Id: I05f3d6ce253fe4b5f1362f4a07b5387f6ddb5dd9
2019-04-11 17:54:34 -07:00
Michael Pratt 6b24f7ab08 Format FDs in strace logs
Normal files display their path in the current mount namespace:

I0410 10:57:54.964196  216336 x:0] [   1] ls X read(0x3 /proc/filesystems, 0x55cee3bdb2c0 "nodev\t9p\nnodev\tdevpts \nnodev\tdevtmpfs\nnodev\tproc\nnodev\tramdiskfs\nnodev\tsysfs\nnodev\ttmpfs\n", 0x1000) = 0x58 (24.462?s)

AT_FDCWD includes the CWD:

I0411 12:58:48.278427    1526 x:0] [   1] stat_test E newfstatat(AT_FDCWD /home/prattmic, 0x55ea719b564e /proc/self, 0x7ef5cefc2be8, 0x0)

Sockets (and other non-vfs files) display an inode number (like
/proc/PID/fd):

I0410 10:54:38.909123  207684 x:0] [   1] nc E bind(0x3 socket:[1], 0x55b5a1652040 {Family: AF_INET, Addr: , Port: 8080}, 0x10)

I also fixed a few syscall args that should be Path.

PiperOrigin-RevId: 243169025
Change-Id: Ic7dda6a82ae27062fe2a4a371557acfd6a21fa2a
2019-04-11 16:48:39 -07:00
Adin Scannell efacb8d900 CONTRIBUTING: add style guide pointer
Change-Id: I93a78a6b2bb2eaa69046c6cfecee2e4cfcf20e44
PiperOrigin-RevId: 243140359
2019-04-11 14:18:01 -07:00
Adin Scannell fab6352ac8 README: add build badge
Change-Id: Ie6b73ac729c8c85b1229e09da5b113be9780fa95
PiperOrigin-RevId: 243131814
2019-04-11 13:36:53 -07:00
Jamie Liu 4209edafb6 Use open fids when fstat()ing gofer files.
PiperOrigin-RevId: 243018347
Change-Id: I1e5b80607c1df0747482abea61db7fcf24536d37
2019-04-11 00:43:04 -07:00
Michael Pratt cc48969bb7 Internal change
PiperOrigin-RevId: 242978508
Change-Id: I0ea59ac5ba1dd499e87c53f2e24709371048679b
2019-04-10 18:00:18 -07:00
Nicolas Lacasse d93d19fd4e Fix uses of RootFromContext.
RootFromContext can return a dirent with reference taken, or nil. We must call
DecRef if (and only if) a real dirent is returned.

PiperOrigin-RevId: 242965515
Change-Id: Ie2b7b4cb19ee09b6ccf788b71f3fd7efcdf35a11
2019-04-10 16:36:28 -07:00
Kevin Krakauer c8368e477b rlimits test: don't exceed nr_open.
Even superuser cannot raise RLIMIT_NOFILE above /proc/sys/fs/nr_open, so
start the test by lowering the limits before raising.

Change-Id: Ied6021c64178a6cb9098088a1a3384db523a226f
PiperOrigin-RevId: 242965249
2019-04-10 16:34:50 -07:00
Yong He 89cc8eef9b DATA RACE in fs.(*Dirent).fullName
add renameMu.Lock when oldParent == newParent
in order to avoid data race in following report:

WARNING: DATA RACE
Read at 0x00c000ba2160 by goroutine 405:
  gvisor.googlesource.com/gvisor/pkg/sentry/fs.(*Dirent).fullName()
      pkg/sentry/fs/dirent.go:246 +0x6c
  gvisor.googlesource.com/gvisor/pkg/sentry/fs.(*Dirent).FullName()
      pkg/sentry/fs/dirent.go:356 +0x8b
  gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(*FDMap).String()
      pkg/sentry/kernel/fd_map.go:135 +0x1e0
  fmt.(*pp).handleMethods()
      GOROOT/src/fmt/print.go:603 +0x404
  fmt.(*pp).printArg()
      GOROOT/src/fmt/print.go:686 +0x255
  fmt.(*pp).doPrintf()
      GOROOT/src/fmt/print.go:1003 +0x33f
  fmt.Fprintf()
      GOROOT/src/fmt/print.go:188 +0x7f
  gvisor.googlesource.com/gvisor/pkg/log.(*Writer).Emit()
      pkg/log/log.go:121 +0x89
  gvisor.googlesource.com/gvisor/pkg/log.GoogleEmitter.Emit()
      pkg/log/glog.go:162 +0x1acc
  gvisor.googlesource.com/gvisor/pkg/log.(*GoogleEmitter).Emit()
      <autogenerated>:1 +0xe1
  gvisor.googlesource.com/gvisor/pkg/log.(*BasicLogger).Debugf()
      pkg/log/log.go:177 +0x111
  gvisor.googlesource.com/gvisor/pkg/log.Debugf()
      pkg/log/log.go:235 +0x66
  gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(*Task).Debugf()
      pkg/sentry/kernel/task_log.go:48 +0xfe
  gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(*Task).DebugDumpState()
      pkg/sentry/kernel/task_log.go:66 +0x11f
  gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(*runApp).execute()
      pkg/sentry/kernel/task_run.go:272 +0xc80
  gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(*Task).run()
      pkg/sentry/kernel/task_run.go:91 +0x24b

Previous write at 0x00c000ba2160 by goroutine 423:
  gvisor.googlesource.com/gvisor/pkg/sentry/fs.Rename()
      pkg/sentry/fs/dirent.go:1628 +0x61f
  gvisor.googlesource.com/gvisor/pkg/sentry/syscalls/linux.renameAt.func1.1()
      pkg/sentry/syscalls/linux/sys_file.go:1864 +0x1f8
  gvisor.googlesource.com/gvisor/pkg/sentry/syscalls/linux.fileOpAt(  gvisor.googlesource.com/g/linux/sys_file.go:51 +0x20f
  gvisor.googlesource.com/gvisor/pkg/sentry/syscalls/linux.renameAt.func1()
      pkg/sentry/syscalls/linux/sys_file.go:1852 +0x218
  gvisor.googlesource.com/gvisor/pkg/sentry/syscalls/linux.fileOpAt()
      pkg/sentry/syscalls/linux/sys_file.go:51 +0x20f
  gvisor.googlesource.com/gvisor/pkg/sentry/syscalls/linux.renameAt()
      pkg/sentry/syscalls/linux/sys_file.go:1840 +0x180
  gvisor.googlesource.com/gvisor/pkg/sentry/syscalls/linux.Rename()
      pkg/sentry/syscalls/linux/sys_file.go:1873 +0x60
  gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(*Task).executeSyscall()
      pkg/sentry/kernel/task_syscall.go:165 +0x17a
  gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(*Task).doSyscallInvoke()
      pkg/sentry/kernel/task_syscall.go:283 +0xb4
  gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(*Task).doSyscallEnter()
      pkg/sentry/kernel/task_syscall.go:244 +0x10c
  gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(*Task).doSyscall()
      pkg/sentry/kernel/task_syscall.go:219 +0x1e3
  gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(*runApp).execute()
      pkg/sentry/kernel/task_run.go:215 +0x15a9
  gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(*Task).run()
      pkg/sentry/kernel/task_run.go:91 +0x24b

Reported-by: syzbot+e1babbf756fab380dfff@syzkaller.appspotmail.com
Change-Id: Icd2620bb3ea28b817bf0672d454a22b9d8ee189a
PiperOrigin-RevId: 242938741
2019-04-10 14:17:33 -07:00
Kevin Krakauer f7aff0aaa4 Allow threads with CAP_SYS_RESOURCE to raise hard rlimits.
PiperOrigin-RevId: 242919489
Change-Id: Ie3267b3bcd8a54b54bc16a6556369a19e843376f
2019-04-10 12:36:45 -07:00
Nicolas Lacasse 0a0619216e Start saving MountSource.DirentCache.
DirentCache is already a savable type, and it ensures that it is empty at the
point of Save.  There is no reason not to save it along with the MountSource.

This did uncover an issue where not all MountSources were properly flushed
before Save.  If a mount point has an open file and is then unmounted, we save
the MountSource without flushing it first.  This CL also fixes that by flushing
all MountSources for all open FDs on Save.

PiperOrigin-RevId: 242906637
Change-Id: I3acd9d52b6ce6b8c989f835a408016cb3e67018f
2019-04-10 11:27:16 -07:00
Shiva Prasanth 7140b1fdca Fixed /proc/cpuinfo permissions
This also applies these permissions to other static proc files.

Change-Id: I4167e585fed49ad271aa4e1f1260babb3239a73d
PiperOrigin-RevId: 242898575
2019-04-10 10:49:43 -07:00
Michael Pratt 0e14e48b84 Match multi-word State
From a recent test failure:

"State:\tD (disk sleep)\n"

"disk sleep" does not match \w+. We need to allow spaces.

PiperOrigin-RevId: 242762469
Change-Id: Ic8d05a16669412a72c1e76b498373e5b22fe64c4
2019-04-09 16:26:11 -07:00
Li Qiang b3b140ea4f syscalls: sendfile: limit the count to MAX_RW_COUNT
From sendfile spec and also the linux kernel code, we should
limit the count arg to 'MAX_RW_COUNT'. This patch export
'MAX_RW_COUNT' in kernel pkg and use it in the implementation
of sendfile syscall.

Signed-off-by: Li Qiang <pangpei.lq@antfin.com>
Change-Id: I1086fec0685587116984555abd22b07ac233fbd2
PiperOrigin-RevId: 242745831
2019-04-09 14:57:05 -07:00