Commit Graph

441 Commits

Author SHA1 Message Date
Zach Koopmans b3b60ea29a Implementation of preadv2 for Linux 4.4 support
Implement RWF_HIPRI (4.6) silently passes the read call.
Implement -1 offset calls readv.

PiperOrigin-RevId: 222840324
Change-Id: If9ddc1e8d086e1a632bdf5e00bae08205f95b6b0
2018-11-26 09:50:47 -08:00
Ian Gudger 1918563525 Make ToView non-allocating for single VectorizedViews containing a single View
PiperOrigin-RevId: 222483471
Change-Id: I6720690b20167dd541fdfa5218eba7c9f7483347
2018-11-21 18:11:13 -08:00
Fabricio Voznika eaac94d91c Use RET_KILL_PROCESS if available in kernel
RET_KILL_THREAD doesn't work well for Go because it will
kill only the offending thread and leave the process hanging.
RET_TRAP can be masked out and it's not guaranteed to kill
the process. RET_KILL_PROCESS is available since 4.14.

For older kernel, continue to use RET_TRAP as this is the
best option (likely to kill process, easy to debug).

PiperOrigin-RevId: 222357867
Change-Id: Icc1d7d731274b16c2125b7a1ba4f7883fbdb2cbd
2018-11-20 22:56:51 -08:00
Fabricio Voznika 5236b78242 Dumps stacks if watchdog thread is stuck
PiperOrigin-RevId: 222332703
Change-Id: Id5c3cf79591c5d2949895b4e323e63c48c679820
2018-11-20 17:24:19 -08:00
Fabricio Voznika 8b314b0bf4 Fix recursive read lock taken on TaskSet
SyncSyscallFiltersToThreadGroup and Task.TheadID() both acquired TaskSet RWLock
in R mode and could deadlock if a writer comes in between.

PiperOrigin-RevId: 222313551
Change-Id: I4221057d8d46fec544cbfa55765c9a284fe7ebfa
2018-11-20 15:07:56 -08:00
Michael Pratt 03c1eb78b5 Reference upstream licenses
Include copyright notices and the referenced LICENSE file.

PiperOrigin-RevId: 222171321
Change-Id: I0cc0b167ca51b536d1087bf1c4742fdf1430bc2a
2018-11-20 14:05:16 -08:00
Fabricio Voznika fadffa2ff8 Add unsupported syscall events for get/setsockopt
PiperOrigin-RevId: 222148953
Change-Id: I21500a9f08939c45314a6414e0824490a973e5aa
2018-11-20 14:04:12 -08:00
Nicolas Lacasse 8c84f9a3c1 Parse the tmpfs mode before validating.
This gets rid of the problematic modeRegex.

PiperOrigin-RevId: 221835959
Change-Id: I566b8d8a43579a4c30c0a08a620a964bbcd826dd
2018-11-20 14:02:39 -08:00
Adin Scannell bb9a2bb62e Update futex to use usermem abstractions.
This eliminates the indirection that existed in task_futex.

PiperOrigin-RevId: 221832498
Change-Id: Ifb4c926d493913aa6694e193deae91616a29f042
2018-11-20 14:02:07 -08:00
Rahat Mahmood f7aa937124 Advertise vsyscall support via /proc/<pid>/maps.
Also update test utilities for probing vsyscall support and add a
metric to see if vsyscalls are actually used in sandboxes.

PiperOrigin-RevId: 221698834
Change-Id: I57870ecc33ea8c864bd7437833f21aa1e8117477
2018-11-15 15:14:38 -08:00
Nicolas Lacasse 6ef08c2bc2 Allow setting sticky bit in tmpfs permissions.
PiperOrigin-RevId: 221683127
Change-Id: Ide6a9f41d75aa19d0e2051a05a1e4a114a4fb93c
2018-11-15 13:48:59 -08:00
Ian Gudger 9d8e49d950 Process delayed packets when delay is disabled
Moving the wakeup logic into the disable blocks is an optimization.

PiperOrigin-RevId: 221677028
Change-Id: Ib5a5a6d52cc77b4bbc5dedcad9ee1dbb3da98deb
2018-11-15 13:17:06 -08:00
Bert Muthalaly bc41e4761b Rename incorrectly named (dst, src) arguments in DeliverNetworkPacket prototype
...to (remote, local), reflecting the (correct) names in the implementation of
DeliverNetworkPacket (see tcpip/stack/nic.go).

Also trim the names in DeliverNetworkPacket and elsewhere to avoid stuttering;
since the type is tcpip.LinkAddress, there's no need to include "LinkAddr" in
the parameter names.

Note that every callsite passes arguments in the order (src, dst).

PiperOrigin-RevId: 221514396
Change-Id: I3637454ad0d6e62a19e4dcbc2a16493798bd0f09
2018-11-14 14:46:24 -08:00
Ian Gudger b5e91eaa52 Clean up tcp.sendData
PiperOrigin-RevId: 221484739
Change-Id: I44c71f79f99d0d00a2e70a7f06d7024a62a5de0a
2018-11-14 11:58:41 -08:00
Ian Gudger 7f60294a73 Implement TCP_NODELAY and TCP_CORK
Previously, TCP_NODELAY was always enabled and we would lie about it being
configurable. TCP_NODELAY is now disabled by default (to match Linux) in the
socket layer so that non-gVisor users don't automatically start using this
questionable optimization.

PiperOrigin-RevId: 221368472
Change-Id: Ib0240f66d94455081f4e0ca94f09d9338b2c1356
2018-11-13 18:02:43 -08:00
Googler 25d07fbbed Internal change.
PiperOrigin-RevId: 221189534
Change-Id: Id20d318bed97d5226b454c9351df396d11251e1f
2018-11-12 17:44:46 -08:00
Ian Gudger c22da3e705 Remove obsolete TODO
PiperOrigin-RevId: 221117846
Change-Id: I2a43fd8135b1d1194ff81e98644ce6b6182ece50
2018-11-12 10:45:19 -08:00
Bhasker Hariharan 33089561b1 Add an implementation of a SACK scoreboard as per RFC6675.
PiperOrigin-RevId: 220866996
Change-Id: I89d48215df57c00d6a6ec512fc18712a2ea9080b
2018-11-09 14:38:46 -08:00
Andrei Vagin 2ef122da35 Implement sync_file_range()
sync_file_range - sync a file segment with disk

In Linux, sync_file_range() accepts three flags:

       SYNC_FILE_RANGE_WAIT_BEFORE
              Wait  upon  write-out  of  all pages in the specified range that
              have already been submitted to the device driver  for  write-out
              before performing any write.

       SYNC_FILE_RANGE_WRITE
              Initiate  write-out  of  all  dirty pages in the specified range
              which are not presently submitted  write-out.   Note  that  even
              this  may  block if you attempt to write more than request queue
              size.

       SYNC_FILE_RANGE_WAIT_AFTER
              Wait upon write-out of all pages in the range  after  performing
              any write.

In this implementation:

SYNC_FILE_RANGE_WAIT_BEFORE without SYNC_FILE_RANGE_WAIT_AFTER isn't
supported right now.

SYNC_FILE_RANGE_WRITE is skipped. It should initiate write-out of  all
dirty pages, but it doesn't wait, so it should be safe to do nothing
while nobody uses SYNC_FILE_RANGE_WAIT_BEFORE.

SYNC_FILE_RANGE_WAIT_AFTER is equal to fdatasync(). In Linux,
sync_file_range() doesn't writes out the  file's  meta-data, but
fdatasync() does if a file size is changed.

PiperOrigin-RevId: 220730840
Change-Id: Iae5dfb23c2c916967d67cf1a1ad32f25eb3f6286
2018-11-08 17:39:51 -08:00
Rahat Mahmood 5a0be6fa20 Create stubs for syscalls upto Linux 4.4.
Create syscall stubs for missing syscalls upto Linux 4.4 and advertise
a kernel version of 4.4.

PiperOrigin-RevId: 220667680
Change-Id: Idbdccde538faabf16debc22f492dd053a8af0ba7
2018-11-08 11:09:46 -08:00
Fabricio Voznika dce61075c0 Fix flaky TestCacheResolutionTimeout
Increase timeout to prevent the entry from being
found when there is delay on the address resolution
goroutine that doesn't mark the request as failed.

PiperOrigin-RevId: 220504789
Change-Id: I7e44fd95d8624bd69962f862fbf5517a81395f2a
2018-11-07 12:01:48 -08:00
Googler 9256ed5283 Internal change.
PiperOrigin-RevId: 220314735
Change-Id: Ic519567e43f6caf042b9f223e517da40640b7d38
2018-11-06 11:08:22 -08:00
Ian Gudger 95722dc4dd Use correct company name in copyright header
These files were added with the wrong name after all of the existing files
were corrected.

PiperOrigin-RevId: 220202068
Change-Id: Ia0d15233c1aa69330356a7cf16b5aa00d978e09c
2018-11-05 17:23:56 -08:00
Ian Gudger 37cbce1f91 Merge segments in sender's writeList
PiperOrigin-RevId: 220185891
Change-Id: Iaea73fd7b2fa8c399b989cdcaabf4885f370df4b
2018-11-05 15:39:30 -08:00
Fabricio Voznika b6b81fd04b Add new log format that is compatible with Kubernetes
Fluentd configuration uses 'log' for the log message
while containerd uses 'msg'. Since we can't have a single
JSON format for both, add another log format and make
debug log configurable.

PiperOrigin-RevId: 219729658
Change-Id: I2a6afc4034d893ab90bafc63b394c4fb62b2a7a0
2018-11-01 17:44:58 -07:00
Ian Lewis 9d69d85bc1 Make error messages a bit more user friendly.
Updated error messages so that it doesn't print full Go struct representations
when running a new container in a sandbox. For example, this occurs frequently
when commands are not found when doing a 'kubectl exec'.

PiperOrigin-RevId: 219729141
Change-Id: Ic3a7bc84cd7b2167f495d48a1da241d621d3ca09
2018-11-01 17:40:09 -07:00
Rahat Mahmood 0e277a39c8 Prevent premature destruction of shm segments.
Shm segments can be marked for lazy destruction via shmctl(IPC_RMID),
which destroys a segment once it is no longer attached to any
processes. We were unconditionally decrementing the segment refcount
on shmctl(IPC_RMID) which allowed a user to force a segment to be
destroyed by repeatedly calling shmctl(IPC_RMID), with outstanding
memory maps to the segment.

This is problematic because the memory released by a segment destroyed
this way can be reused by a different process while remaining
accessible by the process with outstanding maps to the segment.

PiperOrigin-RevId: 219713660
Change-Id: I443ab838322b4fb418ed87b2722c3413ead21845
2018-11-01 15:54:14 -07:00
Juan b23cd33682 modify modeRegexp to adapt the default spec of containerd
https://github.com/containerd/containerd/blob/master/oci/spec.go#L206, the mode=755
didn't match the pattern modeRegexp = regexp.MustCompile("0[0-7][0-7][0-7]").

Closes #112

Signed-off-by: Juan <xionghuan.cn@gmail.com>
Change-Id: I469e0a68160a1278e34c9e1dbe4b7784c6f97e5a
PiperOrigin-RevId: 219672525
2018-11-01 11:57:54 -07:00
Rahat Mahmood c2249d6472 Mark amutex_test as flaky.
PiperOrigin-RevId: 219575226
Change-Id: If4e67b29110332c94013513fb111ec7e019f2915
2018-10-31 19:33:09 -07:00
Ian Gudger 59b7766af7 Fix a race where keepalives could be sent while there is pending data
PiperOrigin-RevId: 219571556
Change-Id: I5a1042c1cb05eb2711eb01627fd298bad6c543a6
2018-10-31 18:42:44 -07:00
Ian Gudger eeddae1199 Use syserr style error translation in netstack's rawfile
Replacing map lookups with slice indexing is higher performance.

PiperOrigin-RevId: 219569901
Change-Id: I9b7cd22abd4b95383025edbd5a80d1c1a4496936
2018-10-31 18:22:05 -07:00
Adin Scannell fb613020c7 kvm: simplify floating point logic.
This reduces the number of floating point save/restore cycles required (since
we don't need to restore immediately following the switch, this always happens
in a known context) and allows the kernel hooks to capture state. This lets us
remove calls like "Current()".

PiperOrigin-RevId: 219552844
Change-Id: I7676fa2f6c18b9919718458aa888b832a7db8cab
2018-10-31 15:59:23 -07:00
Adin Scannell c4bbb54168 kvm: add detailed traces on vCPU errors.
This improves debuggability greatly.

PiperOrigin-RevId: 219551560
Change-Id: I2ecaffdd1c17b0d9f25911538ea6f693e2bc699f
2018-10-31 15:50:10 -07:00
Adin Scannell e9dbd5ab67 kvm: avoid siginfo allocations.
PiperOrigin-RevId: 219492587
Change-Id: I47f6fc0b74a4907ab0aff03d5f26453bdb983bb5
2018-10-31 10:08:06 -07:00
Tamir Duberstein 0692ad72ef Remove ipv4.endpoint.address
This field was added in the intial implementation, before Route existed
to pass the local and remote addresses to the packet-writing path.
Today, the Route's members should be respected. A similar bug was
previously fixed in 214650822.

PiperOrigin-RevId: 219474095
Change-Id: Id2a8ee4421d2841c8d88ccb3c193c455086350ee
2018-10-31 08:04:57 -07:00
Adin Scannell 0091db9cbd kvm: use private futexes.
Use private futexes for performance and to align with other runtime uses.

PiperOrigin-RevId: 219422634
Change-Id: Ief2af5e8302847ea6dc246e8d1ee4d64684ca9dd
2018-10-30 22:46:42 -07:00
Michael Pratt 245d81561b Clean up cpuid_parse_test
Actually parse flags from cpuinfo to avoid mistakenly matching
substrings in cpuinfo that happen to match a flags.

Some features were only exposed in recent versions of Linux. Don't
require them to appear in cpuinfo on old versions of Linux.

Move PREFETCHWT1 back to parse only features. It isn't actually exposed
in Linux yet. Move SDBG to shown features. It has been visible since
Linux 4.3.

PiperOrigin-RevId: 219381731
Change-Id: Ied7c0ee7c8a9879683e81933de56c9074b01108f
2018-10-30 15:56:12 -07:00
Michael Pratt b15a7267e1 Add AMD-specific features to cpuid package
Extend the cpuid package to parse and emulate cpuid features that exist
only on AMD and not Intel. The least straightforward part of this is
that AMD duplicates several block 1 features in block 6. Thus we ignore
those features when parsing block 6 and add them when emulating.

PiperOrigin-RevId: 218935032
Change-Id: Id41bf1c24720b0d9b968e2c19ab5bc00a6d62bd4
2018-10-26 16:55:39 -07:00
Michael Pratt e60525e4dd Add block 3 features to /proc/cpuinfo
Linux added these block 3 features to the end of /proc/cpuinfo in
dfb4a70f20c5b3880da56ee4c9484bdb4e8f1e65.

This also fixes that block 3 features were completely missing from
FeatureSet.FlagsString(false) because FlagsString only prints Linux
blocks regardless of the cpuinfo option.

PiperOrigin-RevId: 218913816
Change-Id: I2f9c38c7c9da4b247a140877d4aca782e80684bd
2018-10-26 14:26:25 -07:00
Michael Pratt 624cc329d8 Order feature strings by block
PiperOrigin-RevId: 218894181
Change-Id: I97d0c74175f4aa528363f768a0a85d6953ea0bfd
2018-10-26 12:18:36 -07:00
Adin Scannell e7191f058f Use TRAP to simplify vsyscall emulation.
PiperOrigin-RevId: 218592058
Change-Id: I373a2d813aa6cc362500dd5a894c0b214a1959d7
2018-10-24 15:52:44 -07:00
Ian Gudger 425dccdd7e Convert Unix transport to syserr
Previously this code used the tcpip error space. Since it is no longer part of
netstack, it can use the sentry's error space (except for a few cases where
there is still some shared code. This reduces the number of error space
conversions required for hot Unix socket operations.

PiperOrigin-RevId: 218541611
Change-Id: I3d13047006a8245b5dfda73364d37b8a453784bb
2018-10-24 11:05:08 -07:00
Fabricio Voznika c99006a240 Mark netstack/tcpip/transport/tcp:tcp_test flaky
PiperOrigin-RevId: 218537640
Change-Id: I1c5f55a46390174e1f5caeff74b1a364fa3268d9
2018-10-24 10:46:25 -07:00
Nicolas Lacasse 4a1a2dead9 Run ptrace stubs in their own session and process group.
Pseudoterminal job control signals are meant to be received and handled by the
sandbox process, but if the ptrace stubs are running in the same process group,
they will receive the signals as well and inject then into the sentry kernel.

This can result in duplicate signals being delivered (often to the wrong
process), or a sentry panic if the ptrace stub is inactive.

This CL makes the ptrace stub run in a new session.

PiperOrigin-RevId: 218536851
Change-Id: Ie593c5687439bbfbf690ada3b2197ea71ed60a0e
2018-10-24 10:42:35 -07:00
Rahat Mahmood 46603b569c Fix panic on creation of zero-len shm segments.
Attempting to create a zero-len shm segment causes a panic since we
try to allocate a zero-len filemem region. The existing code had a
guard to disallow this, but the check didn't encode the fact that
requesting a private segment implies a segment creation regardless of
whether IPC_CREAT is explicitly specified.

PiperOrigin-RevId: 218405743
Change-Id: I30aef1232b2125ebba50333a73352c2f907977da
2018-10-23 14:18:54 -07:00
Adin Scannell 1369e17504 Remove blanket TODO, as it is self-evident.
PiperOrigin-RevId: 218390517
Change-Id: Ic891c1626e62a6c4ed57f8180740872bcd1be177
2018-10-23 12:52:27 -07:00
Adin Scannell ce3a762038 Remove artificial name length check.
This should be determined by the filesystem.

PiperOrigin-RevId: 218376553
Change-Id: I55d176e2cdf8acdd6642789af057b98bb8ca25b8
2018-10-23 11:27:28 -07:00
Tamir Duberstein 692df85673 Simplify channel management
The channels {cancel,resCh} have roughly the same lifetime and are used for
roughly the same purpose as an entry's waiters; we can unify the state
management of the two mechanisms, while also reducing unncessary mutex locking
and unlocking.

Made some cosmetic changes while I'm here.

PiperOrigin-RevId: 218343915
Change-Id: Ic69546a2b7b390162b2231f07f335dd6199472d7
2018-10-23 08:16:13 -07:00
Adin Scannell 75cd70ecc9 Track paths and provide a rename hook.
This change also adds extensive testing to the p9 package via mocks. The sanity
checks and type checks are moved from the gofer into the core package, where
they can be more easily validated.

PiperOrigin-RevId: 218296768
Change-Id: I4fc3c326e7bf1e0e140a454cbacbcc6fd617ab55
2018-10-23 00:20:15 -07:00
Ian Gudger d7c11c7417 Refcount Unix transport queue
This allows us to release messages in the queue when all users close.

PiperOrigin-RevId: 218033550
Change-Id: I2f6e87650fced87a3977e3b74c64775c7b885c1b
2018-10-20 17:58:26 -07:00