Commit Graph

631 Commits

Author SHA1 Message Date
Michael Pratt 2b6df6a204 Format unshare flags
unshare actually takes a subset of clone flags, but has no unique flags,
so formatting as clone flags is close enough.

PiperOrigin-RevId: 225082774
Change-Id: I5b580f18607c7785f323e37809094115520a17c0
2018-12-11 15:33:14 -08:00
Christopher Koch 5934fad1d7 Remove unused envv variable from two funcs.
PiperOrigin-RevId: 225041520
Change-Id: Ib1afc693e592d308d60db82022c5b7743fd3c646
2018-12-11 11:40:16 -08:00
Haibo Xu 52fe3b87a4 Add safecopy support for arm64 platform.
Signed-off-by: Haibo Xu <haibo.xu@arm.com>
Change-Id: I565214581eeb44045169da7f44d45a489082ac3a
PiperOrigin-RevId: 224938170
2018-12-10 21:35:02 -08:00
Ian Gudger 5d87d8865f Implement MSG_WAITALL
MSG_WAITALL requests that recv family calls do not perform short reads. It only
has an effect for SOCK_STREAM sockets, other types ignore it.

PiperOrigin-RevId: 224918540
Change-Id: Id97fbf972f1f7cbd4e08eec0138f8cbdf1c94fe7
2018-12-10 17:56:34 -08:00
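
The "no short reads" behavior described for MSG_WAITALL can be sketched roughly as follows. This is only an illustration over a generic io.Reader, not the netstack code: chunkReader and recvWaitAll are hypothetical names introduced here.

```go
package main

import (
	"fmt"
	"io"
)

// chunkReader is a toy stream (hypothetical) that returns at most chunk
// bytes per Read, forcing short reads the way a slow stream socket might.
type chunkReader struct {
	data  []byte
	chunk int
}

func (c *chunkReader) Read(p []byte) (int, error) {
	if len(c.data) == 0 {
		return 0, io.EOF
	}
	n := c.chunk
	if n > len(p) {
		n = len(p)
	}
	if n > len(c.data) {
		n = len(c.data)
	}
	copy(p, c.data[:n])
	c.data = c.data[n:]
	return n, nil
}

// recvWaitAll is a hypothetical helper that mimics MSG_WAITALL on a stream:
// it keeps reading until buf is full, the stream ends, or an error occurs.
func recvWaitAll(r io.Reader, buf []byte) (int, error) {
	total := 0
	for total < len(buf) {
		n, err := r.Read(buf[total:])
		total += n
		if err == io.EOF {
			return total, nil // stream ended: return what was received
		}
		if err != nil {
			return total, err
		}
	}
	return total, nil
}

func main() {
	r := &chunkReader{data: []byte("hello world"), chunk: 2}
	buf := make([]byte, 5)
	n, err := recvWaitAll(r, buf)
	fmt.Println(n, string(buf[:n]), err)
}
```

Without the loop, a single Read on this toy stream would return only two bytes, which is the short read that MSG_WAITALL asks the recv family to avoid.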
Rahat Mahmood fc29770251 Add type safety to shm ids and keys.
PiperOrigin-RevId: 224864380
Change-Id: I49542279ad56bf15ba462d3de1ef2b157b31830a
2018-12-10 12:48:02 -08:00
Michael Pratt 99d5958693 Validate FS_BASE in Task.Clone
arch_prctl already verified that the new FS_BASE was canonical, but
Task.Clone did not. Centralize these checks in the arch packages.

Failure to validate could cause an error in PTRACE_SET_REGS when we try
to switch to the app.

PiperOrigin-RevId: 224862398
Change-Id: Iefe63b3f9aa6c4810326b8936e501be3ec407f14
2018-12-10 12:37:16 -08:00
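
For readers unfamiliar with the term, a canonical x86-64 address can be checked roughly as below. This is a sketch assuming 48-bit virtual addressing; isCanonical is a hypothetical helper, and the check actually centralized in the arch packages may differ (for example, it may also bound the address to the user range).

```go
package main

import "fmt"

// isCanonical (hypothetical) reports whether addr is canonical under 48-bit
// virtual addressing, i.e. bits 63..47 are all copies of bit 47.
func isCanonical(addr uint64) bool {
	// Shift the top 16 bits out, then sign-extend back from bit 47.
	return uint64(int64(addr<<16)>>16) == addr
}

func main() {
	addrs := []uint64{
		0x00007fffffffffff, // top of the lower half
		0x0000800000000000, // first non-canonical address
		0xffff800000000000, // bottom of the upper half
	}
	for _, a := range addrs {
		fmt.Printf("%#x canonical=%t\n", a, isCanonical(a))
	}
}
```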
Ian Gudger 25b8424d75 Stub out TCP_QUICKACK
PiperOrigin-RevId: 224696233
Change-Id: I45c425d9e32adee5dcce29ca7439a06567b26014
2018-12-09 00:50:33 -08:00
Zhaozhong Ni 9984138abe sentry: turn "dynamically-created" procfs files into static creation.
PiperOrigin-RevId: 224600982
Change-Id: I547253528e24fb0bb318fc9d2632cb80504acb34
2018-12-07 17:03:54 -08:00
Michael Pratt 42e2e5cae9 Format sigaction in strace
Sample:

I1206 14:24:56.768520    3700 x:0] [   1] ioctl_test E rt_sigaction(SIGSEGV, 0x7ee6edb0c590 {Handler: 0x559c6d915cf0, Flags: SA_SIGINFO|SA_RESTORER|SA_ONSTACK|SA_NODEFER, Restorer: 0x2a9901a259a0, Mask: []}, 0x7ee6edb0c630)
I1206 14:24:56.768530    3700 x:0] [   1] ioctl_test X rt_sigaction(SIGSEGV, 0x7ee6edb0c590 {Handler: 0x559c6d915cf0, Flags: SA_SIGINFO|SA_RESTORER|SA_ONSTACK|SA_NODEFER, Restorer: 0x2a9901a259a0, Mask: []}, 0x7ee6edb0c630 {Handler: SIG_DFL, Flags: 0x0, Restorer: 0x0, Mask: []}) = 0x0 (2.701µs)

PiperOrigin-RevId: 224596606
Change-Id: I3512493aed99d3d75600249263da46686b1dc0e7
2018-12-07 16:28:54 -08:00
Michael Pratt 673949048e Add period to comment
PiperOrigin-RevId: 224553291
Change-Id: I35d0772c215b71f4319c23f22df5c61c908f8590
2018-12-07 11:53:19 -08:00
Michael Pratt 51900fe3a4 Format signals, signal masks in strace
Sample:

I1205 16:51:49.869701    2492 x:0] [   1] ioctl_test E rt_sigaction(SIGIO, 0x7e0e5b5e8500, 0x7e0e5b5e85a0)
I1205 16:51:49.869766    2492 x:0] [   1] ioctl_test X rt_sigaction(SIGIO, 0x7e0e5b5e8500, 0x7e0e5b5e85a0) = 0x0 (44.336µs)
I1205 16:51:49.869831    2492 x:0] [   1] ioctl_test E rt_sigprocmask(SIG_UNBLOCK, 0x7e0e5b5e8878 [SIGIO], 0x7e0e5b5e87c0, 0x8)
I1205 16:51:49.869866    2492 x:0] [   1] ioctl_test X rt_sigprocmask(SIG_UNBLOCK, 0x7e0e5b5e8878 [SIGIO], 0x7e0e5b5e87c0 [SIGIO 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64], 0x8) = 0x0 (2.575µs)

PiperOrigin-RevId: 224422404
Change-Id: I3ed3f2ec6b1a639baa9cacd37ce7ee325c3703e4
2018-12-06 15:47:06 -08:00
Chris Kuiper 1b3442cae0 Allow sending of broadcast packets w/o route.
Currently sending a broadcast packet (for DHCP, e.g.) requires a "default
route" of the format "0.0.0.0/0 via 0.0.0.0 <intf>". There is no good reason
for this and on devices with several ports this creates a rather awkward route
table with lots of such default routes (which defeats the purpose of a default
route).

PiperOrigin-RevId: 224378769
Change-Id: Icd7ec8a206eb08083cff9a837f6f9ab231c73a19
2018-12-06 11:48:12 -08:00
Michael Pratt 666db00c26 Convert ValueSet to a map
Unlike FlagSet, order doesn't matter here, so it can simply be a map.

PiperOrigin-RevId: 224377910
Change-Id: I15810c698a7f02d8614bf09b59583ab73cba0514
2018-12-06 11:43:11 -08:00
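
A value set of this kind can be pictured as a plain map from number to name. The sketch below is an assumption about the general shape, not the strace package's actual code; the Parse method and its hex fallback are introduced here for illustration.

```go
package main

import (
	"fmt"
	"strconv"
)

// ValueSet maps a numeric value to its symbolic name. Each value is looked
// up exactly, so iteration order never matters and a map is sufficient.
type ValueSet map[uint64]string

// Parse (hypothetical signature) returns the symbolic name for v, or a hex
// literal when v is not in the set.
func (s ValueSet) Parse(v uint64) string {
	if name, ok := s[v]; ok {
		return name
	}
	return "0x" + strconv.FormatUint(v, 16)
}

func main() {
	// The "how" argument of rt_sigprocmask on Linux.
	how := ValueSet{0: "SIG_BLOCK", 1: "SIG_UNBLOCK", 2: "SIG_SETMASK"}
	fmt.Println(how.Parse(1), how.Parse(7))
}
```

A flag set, by contrast, has to test bits one at a time and print them in a stable order, which is why order matters there.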
Ian Gudger 000fa84a3b Fix tcpip.Endpoint.Write contract regarding short writes
* Clarify tcpip.Endpoint.Write contract regarding short writes.
* Enforce tcpip.Endpoint.Write contract regarding short writes.
* Update relevant users of tcpip.Endpoint.Write.

PiperOrigin-RevId: 224377586
Change-Id: I24299ecce902eb11317ee13dae3b8d8a7c5b097d
2018-12-06 11:41:33 -08:00
Rahat Mahmood 685eaf119f Add counters for memory events.
Also ensure an event is emitted at startup.

PiperOrigin-RevId: 224372065
Change-Id: I5f642b6d6b13c6468ee8f794effe285fcbbf29cf
2018-12-06 11:15:47 -08:00
Zach Koopmans 4d8c7ae869 Fixing O_TRUNC behavior to match Linux.
PiperOrigin-RevId: 224351139
Change-Id: I9453bd75e5a8d38db406bb47fdc01038ac60922e
2018-12-06 09:26:49 -08:00
Zhaozhong Ni 7f35daddd2 sentry: support save / restore of TCP bind socket after shutdown.
PiperOrigin-RevId: 224227677
Change-Id: I08b0e0c0574170556269900653e5bcf9e9e5c9c9
2018-12-05 15:02:40 -08:00
Michael Pratt 9f64e64a6e Enforce directory accessibility before delete Walk
By Walking before checking that the directory is writable and
executable, MayDelete may return the Walk error (e.g., ENOENT) which
would normally be masked by a permission error (EACCES).

PiperOrigin-RevId: 224222453
Change-Id: I108a7f730e6bdaa7f277eaddb776267c00805475
2018-12-05 14:31:58 -08:00
Jamie Liu 23438b3632 Update MM.usageAS when mremap copies or moves a mapping.
PiperOrigin-RevId: 224221509
Change-Id: I7aaea74629227d682786d3e435737364921249bf
2018-12-05 14:27:23 -08:00
Zhaozhong Ni fda4557e3d sentry: skip waiting for undrain for netstack TCP endpoints in error state.
PiperOrigin-RevId: 224214981
Change-Id: I4c1dd5b1c856f7a4f9866a5dda44a5297e92486a
2018-12-05 13:51:16 -08:00
Michael Pratt 592f5bdc67 Add context to mount errors
This makes it more obvious why a mount failed.

PiperOrigin-RevId: 224203880
Change-Id: I7961774a7b6fdbb5493a791f8b3815c49b8f7631
2018-12-05 12:46:30 -08:00
Zach Koopmans 06131fe749 Check for CAP_SYS_RESOURCE in prctl(PR_SET_MM, ...)
If sys_prctl is called with PR_SET_MM without CAP_SYS_RESOURCE,
the syscall should return failure with errno set to EPERM.
See: http://man7.org/linux/man-pages/man2/prctl.2.html
PiperOrigin-RevId: 224182874
Change-Id: I630d1dd44af8b444dd16e8e58a0764a0cf1ad9a3
2018-12-05 10:53:51 -08:00
Chris Kuiper fab029c50b Remove incorrect code and improve testing of Stack.GetMainNICAddress
This removes code that should have never made it in in the first place, but did so due to incomplete testing. With the new tests the original code fails, the new code passes.

PiperOrigin-RevId: 224086966
Change-Id: I646fef76977f4528f3705f497b95fad6b3ec32bc
2018-12-04 19:09:11 -08:00
Michael Pratt 076f107643 Remove initRegs arg from clone
It is always the same as t.initRegs.

PiperOrigin-RevId: 224085550
Change-Id: I5cc4ddc3b481d4748c3c43f6f4bb50da1dbac694
2018-12-04 18:53:43 -08:00
Brian Geffon ffcbda0c8b Partial writes should loop in rpcinet.
FileOperations.Write should return ErrWouldBlock to allow the upper
layer to loop and sendmsg should continue writing where it left off
on a partial write.

PiperOrigin-RevId: 224081631
Change-Id: Ic61f6943ea6b7abbd82e4279decea215347eac48
2018-12-04 18:15:10 -08:00
Ian Gudger d209f71b9f Whitelist Go 1.12 for tcpip/time_unsafe.go
The signature of time.now has remained unchanged:
c2412a7681/src/time/time.go (L1072)

PiperOrigin-RevId: 224061160
Change-Id: Ic84bd6ee8fb9952cd9ab580bcb0892444ce7c2da
2018-12-04 15:52:14 -08:00
Brian Geffon 2cab0e82ad Linkat(2) should sanity check flags.
PiperOrigin-RevId: 224047765
Change-Id: I6f3c75b33c32bf8f8910ea3fab35406d7d672d87
2018-12-04 14:34:19 -08:00
Brian Geffon 82719be42e Max link traversals should be for an entire path.
The limit on the number of symbolic links that may be followed
applies to a full path, not just to a single chain of symbolic links.

PiperOrigin-RevId: 224047321
Change-Id: I5e3c4caf66a93c17eeddcc7f046d1e8bb9434a40
2018-12-04 14:32:03 -08:00
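
The difference between a per-chain and a per-path limit can be illustrated with a toy resolver. Everything below is hypothetical (the links type, resolve, and the in-memory layout); only the limit of 40, Linux's MAXSYMLINKS, is a known constant. Targets are assumed to be absolute and not the root.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

const maxTraversals = 40 // Linux MAXSYMLINKS

var errLoop = errors.New("too many levels of symbolic links")

// links is a toy filesystem: absolute symlink path -> absolute target.
type links map[string]string

// resolve walks path one component at a time. The traversal budget is
// shared by the whole path rather than reset for each symlink chain.
func (l links) resolve(path string) (string, error) {
	remaining := maxTraversals
	cur := ""
	parts := strings.Split(strings.Trim(path, "/"), "/")
	for len(parts) > 0 {
		next := cur + "/" + parts[0]
		parts = parts[1:]
		if target, ok := l[next]; ok {
			if remaining == 0 {
				return "", errLoop
			}
			remaining--
			// Restart from the root: target components, then the rest.
			parts = append(strings.Split(strings.Trim(target, "/"), "/"), parts...)
			cur = ""
			continue
		}
		cur = next
	}
	return cur, nil
}

func main() {
	// Every "s" component is a chain of length one, yet each one counts.
	fs := links{"/s": "/d", "/d/s": "/d"}
	for _, n := range []int{40, 41} {
		p, err := fs.resolve("/" + strings.Repeat("s/", n))
		fmt.Println(n, p, err)
	}
}
```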
Zhaozhong Ni adafc08d7c sentry: save / restore netstack procfs configuration.
PiperOrigin-RevId: 224047120
Change-Id: Ia6cb17fa978595cd73857b6178c4bdba401e185e
2018-12-04 14:30:42 -08:00
Brian Geffon 5a6a1eb420 Enforce name length restriction on paths.
NAME_LENGTH must be enforced per component.

PiperOrigin-RevId: 224046749
Change-Id: Iba8105b00d951f2509dc768af58e4110dafbe1c9
2018-12-04 14:28:33 -08:00
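
Per-component enforcement can be sketched as follows. checkComponents and errNameTooLong are hypothetical names; 255 is Linux's NAME_MAX, and the real check presumably returns ENAMETOOLONG.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

const nameMax = 255 // Linux NAME_MAX

var errNameTooLong = errors.New("file name too long")

// checkComponents (hypothetical) rejects a path if any single component
// exceeds nameMax; the total path length is not what is being limited here.
func checkComponents(path string) error {
	for _, c := range strings.Split(path, "/") {
		if len(c) > nameMax {
			return errNameTooLong
		}
	}
	return nil
}

func main() {
	long := strings.Repeat("x", nameMax+1)
	fmt.Println(checkComponents("/tmp/" + long))
	fmt.Println(checkComponents(strings.Repeat("/dir", 100)))
}
```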
Rahat Mahmood 806e346491 Fix mempolicy_test on bazel.
Bazel runs multiple test cases on the same thread. Some of the test
cases rely on the test thread starting with the default memory policy,
while other tests modify the test thread's memory policy. This
obviously breaks when the test framework doesn't run each test case on
a new thread.

Also fixing an incompatibility where set_mempolicy(2) was prevented
from specifying an empty nodemask, which is allowed for some modes.

PiperOrigin-RevId: 224038957
Change-Id: Ibf780766f2706ebc9b129dbc8cf1b85c2a275074
2018-12-04 13:45:58 -08:00
Ian Gudger 8cbd6153a6 Fix available calculation when merging TCP segments
PiperOrigin-RevId: 224033418
Change-Id: I780be973e8be68ac93e8c9e7a100002e912f40d2
2018-12-04 13:15:25 -08:00
Zhaozhong Ni ad8f293e1a sentry: save copy of tcp segment's delivered views to avoid in-struct pointers.
PiperOrigin-RevId: 224033238
Change-Id: Ie5b1854b29340843b02c123766d290a8738d7631
2018-12-04 13:14:24 -08:00
Nicolas Lacasse 54dd0d0dc5 Fix data race caused by unlocked call of Dirent.descendantOf.
PiperOrigin-RevId: 224025363
Change-Id: I98864403c779832e9e1436f7d3c3f6fb2fba9904
2018-12-04 12:24:55 -08:00
Bin Lu c3dd68cea7 Add ARM64 support to pkg/abi/linux
Signed-off-by: Bin Lu <bin.lu@arm.com>
Change-Id: I73cc4c406fadccb054e8e83c9464f6bef6280b0f
PiperOrigin-RevId: 224025309
2018-12-04 12:24:07 -08:00
Ian Gudger 5560615c53 Return an int32 for netlink SO_RCVBUF
Untyped integer constants default to type int and the binary package will panic
if one tries to encode an int.

PiperOrigin-RevId: 223890001
Change-Id: Iccc3afd6d74bad24c35d764508e450fd317b76ec
2018-12-03 17:03:15 -08:00
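
The underlying Go behavior can be reproduced with the standard library. Note the assumption: this sketch uses encoding/binary, which returns an error for a platform-sized int, whereas the commit message says gVisor's own binary package panics. The encode helper is hypothetical.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// encode (hypothetical) serializes v as little-endian bytes.
func encode(v interface{}) ([]byte, error) {
	var buf bytes.Buffer
	if err := binary.Write(&buf, binary.LittleEndian, v); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	const rcvBuf = 4096 // untyped constant: becomes int when boxed

	_, err := encode(rcvBuf)
	fmt.Println("int:", err)

	b, err := encode(int32(rcvBuf))
	fmt.Println("int32:", b, err)
}
```

The explicit int32 conversion gives the value a fixed size, which is what a binary encoder needs.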
Ian Gudger 99fb113869 Test that full segments will be sent when delay/cork is enabled
PiperOrigin-RevId: 223425575
Change-Id: Idd777e04c69e6ffcbfb0bdbea828a8b8b42d7672
2018-11-29 15:46:38 -08:00
Nicolas Lacasse 573622fdca Fix data race in fs.Async.
Replaces the WaitGroup with a RWMutex. Calls to Async hold the mutex for
reading, while AsyncBarrier takes the lock for writing. This ensures that all
executing Async work finishes before AsyncBarrier returns.

Also pushes the Async() call from Inode.Release into
gofer/InodeOperations.Release(). This removes a recursive Async call which
should not have been allowed in the first place. The gofer Release call is the
slow one (since it may make RPCs to the gofer), so putting the Async call there
makes sense.

PiperOrigin-RevId: 223093067
Change-Id: I116da7b20fce5ebab8d99c2ab0f27db7c89d890e
2018-11-27 18:17:09 -08:00
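
The locking scheme described above can be sketched in a few lines. This follows the commit message's description (Async holds the mutex for reading, AsyncBarrier takes it for writing); the variable name workMu and the exact function bodies are assumptions.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// workMu is held for reading by every in-flight Async call.
var workMu sync.RWMutex

// Async runs f in a goroutine. The read lock is taken before the goroutine
// starts, so a later AsyncBarrier cannot miss this piece of work.
func Async(f func()) {
	workMu.RLock()
	go func() {
		defer workMu.RUnlock()
		f()
	}()
}

// AsyncBarrier blocks until all Async work started before it has finished:
// the write lock is only granted once every reader has released.
func AsyncBarrier() {
	workMu.Lock()
	workMu.Unlock()
}

func main() {
	var done int32
	for i := 0; i < 100; i++ {
		Async(func() { atomic.AddInt32(&done, 1) })
	}
	AsyncBarrier()
	fmt.Println(atomic.LoadInt32(&done))
}
```

This also shows why a recursive Async call is hazardous: if a writer is waiting in AsyncBarrier, a nested RLock from inside f can block behind it while the outer read lock is still held.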
Brian Geffon 5bd02b224f Save shutdown flags first.
With rpcinet if shutdown flags are not saved before making
the rpc a race is possible where blocked threads are woken
up before the flags have been persisted. This would mean
that threads can block indefinitely in a recvmsg after a
shutdown(SHUT_RD) has happened.

PiperOrigin-RevId: 223089783
Change-Id: If595e7add12aece54bcdf668ab64c570910d061a
2018-11-27 17:48:05 -08:00
Haibo Xu 9e0f132377 Add procid support for arm64 platform
Change-Id: I7c3db8dfdf95a125d7384c1d67c3300dbb99a47e
PiperOrigin-RevId: 223039923
2018-11-27 12:46:39 -08:00
Zach Koopmans b3b60ea29a Implementation of preadv2 for Linux 4.4 support
RWF_HIPRI (added in Linux 4.6) is accepted and the read proceeds as usual.
An offset of -1 falls back to readv.

PiperOrigin-RevId: 222840324
Change-Id: If9ddc1e8d086e1a632bdf5e00bae08205f95b6b0
2018-11-26 09:50:47 -08:00
Ian Gudger 1918563525 Make ToView non-allocating for single VectorizedViews containing a single View
PiperOrigin-RevId: 222483471
Change-Id: I6720690b20167dd541fdfa5218eba7c9f7483347
2018-11-21 18:11:13 -08:00
Fabricio Voznika eaac94d91c Use RET_KILL_PROCESS if available in kernel
RET_KILL_THREAD doesn't work well for Go because it will
kill only the offending thread and leave the process hanging.
RET_TRAP can be masked out and it's not guaranteed to kill
the process. RET_KILL_PROCESS is available since 4.14.

For older kernels, continue to use RET_TRAP as this is the
best option (likely to kill process, easy to debug).

PiperOrigin-RevId: 222357867
Change-Id: Icc1d7d731274b16c2125b7a1ba4f7883fbdb2cbd
2018-11-20 22:56:51 -08:00
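
The choice of action can be sketched as below. The three constant values are the Linux seccomp return actions; selecting by kernel version is an assumption made for illustration (the commit only says "if available in kernel", and the real code may probe for availability instead). defaultKillAction is a hypothetical name.

```go
package main

import "fmt"

// Linux seccomp filter return actions.
const (
	retKillThread  uint32 = 0x00000000
	retTrap        uint32 = 0x00030000
	retKillProcess uint32 = 0x80000000
)

// defaultKillAction (hypothetical) picks the action for disallowed
// syscalls. RET_KILL_PROCESS exists since Linux 4.14; older kernels get
// RET_TRAP, since RET_KILL_THREAD would leave a Go process hanging.
func defaultKillAction(major, minor int) uint32 {
	if major > 4 || (major == 4 && minor >= 14) {
		return retKillProcess
	}
	return retTrap
}

func main() {
	fmt.Printf("4.4:  %#x\n", defaultKillAction(4, 4))
	fmt.Printf("4.14: %#x\n", defaultKillAction(4, 14))
	fmt.Printf("never chosen: %#x (RET_KILL_THREAD)\n", retKillThread)
}
```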
Fabricio Voznika 5236b78242 Dumps stacks if watchdog thread is stuck
PiperOrigin-RevId: 222332703
Change-Id: Id5c3cf79591c5d2949895b4e323e63c48c679820
2018-11-20 17:24:19 -08:00
Fabricio Voznika 8b314b0bf4 Fix recursive read lock taken on TaskSet
SyncSyscallFiltersToThreadGroup and Task.TheadID() both acquired TaskSet RWLock
in R mode and could deadlock if a writer comes in between.

PiperOrigin-RevId: 222313551
Change-Id: I4221057d8d46fec544cbfa55765c9a284fe7ebfa
2018-11-20 15:07:56 -08:00
Michael Pratt 03c1eb78b5 Reference upstream licenses
Include copyright notices and the referenced LICENSE file.

PiperOrigin-RevId: 222171321
Change-Id: I0cc0b167ca51b536d1087bf1c4742fdf1430bc2a
2018-11-20 14:05:16 -08:00
Fabricio Voznika fadffa2ff8 Add unsupported syscall events for get/setsockopt
PiperOrigin-RevId: 222148953
Change-Id: I21500a9f08939c45314a6414e0824490a973e5aa
2018-11-20 14:04:12 -08:00
Nicolas Lacasse 8c84f9a3c1 Parse the tmpfs mode before validating.
This gets rid of the problematic modeRegex.

PiperOrigin-RevId: 221835959
Change-Id: I566b8d8a43579a4c30c0a08a620a964bbcd826dd
2018-11-20 14:02:39 -08:00
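
Parsing first and validating the number afterwards might look like the sketch below. parseMode and the 07777 upper bound are assumptions for illustration, not the code from this change.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseMode (hypothetical) parses an octal mode string such as "755" or
// "0755" and then validates the numeric value, rather than matching the
// string against a regular expression.
func parseMode(s string) (uint32, error) {
	m, err := strconv.ParseUint(s, 8, 32)
	if err != nil {
		return 0, fmt.Errorf("invalid mode %q: %v", s, err)
	}
	if m > 07777 {
		return 0, fmt.Errorf("mode %q out of range", s)
	}
	return uint32(m), nil
}

func main() {
	for _, s := range []string{"755", "0755", "1777", "789"} {
		m, err := parseMode(s)
		fmt.Printf("%q -> %#o, err=%v\n", s, m, err)
	}
}
```

Because the string is parsed as a number, "755" and "0755" are treated identically, which a pattern such as "0[0-7][0-7][0-7]" cannot do.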
Adin Scannell bb9a2bb62e Update futex to use usermem abstractions.
This eliminates the indirection that existed in task_futex.

PiperOrigin-RevId: 221832498
Change-Id: Ifb4c926d493913aa6694e193deae91616a29f042
2018-11-20 14:02:07 -08:00
Rahat Mahmood f7aa937124 Advertise vsyscall support via /proc/<pid>/maps.
Also update test utilities for probing vsyscall support and add a
metric to see if vsyscalls are actually used in sandboxes.

PiperOrigin-RevId: 221698834
Change-Id: I57870ecc33ea8c864bd7437833f21aa1e8117477
2018-11-15 15:14:38 -08:00
Nicolas Lacasse 6ef08c2bc2 Allow setting sticky bit in tmpfs permissions.
PiperOrigin-RevId: 221683127
Change-Id: Ide6a9f41d75aa19d0e2051a05a1e4a114a4fb93c
2018-11-15 13:48:59 -08:00
Ian Gudger 9d8e49d950 Process delayed packets when delay is disabled
Moving the wakeup logic into the disable blocks is an optimization.

PiperOrigin-RevId: 221677028
Change-Id: Ib5a5a6d52cc77b4bbc5dedcad9ee1dbb3da98deb
2018-11-15 13:17:06 -08:00
Bert Muthalaly bc41e4761b Rename incorrectly named (dst, src) arguments in DeliverNetworkPacket prototype
...to (remote, local), reflecting the (correct) names in the implementation of
DeliverNetworkPacket (see tcpip/stack/nic.go).

Also trim the names in DeliverNetworkPacket and elsewhere to avoid stuttering;
since the type is tcpip.LinkAddress, there's no need to include "LinkAddr" in
the parameter names.

Note that every callsite passes arguments in the order (src, dst).

PiperOrigin-RevId: 221514396
Change-Id: I3637454ad0d6e62a19e4dcbc2a16493798bd0f09
2018-11-14 14:46:24 -08:00
Ian Gudger b5e91eaa52 Clean up tcp.sendData
PiperOrigin-RevId: 221484739
Change-Id: I44c71f79f99d0d00a2e70a7f06d7024a62a5de0a
2018-11-14 11:58:41 -08:00
Ian Gudger 7f60294a73 Implement TCP_NODELAY and TCP_CORK
Previously, TCP_NODELAY was always enabled and we would lie about it being
configurable. TCP_NODELAY is now disabled by default (to match Linux) in the
socket layer so that non-gVisor users don't automatically start using this
questionable optimization.

PiperOrigin-RevId: 221368472
Change-Id: Ib0240f66d94455081f4e0ca94f09d9338b2c1356
2018-11-13 18:02:43 -08:00
Googler 25d07fbbed Internal change.
PiperOrigin-RevId: 221189534
Change-Id: Id20d318bed97d5226b454c9351df396d11251e1f
2018-11-12 17:44:46 -08:00
Ian Gudger c22da3e705 Remove obsolete TODO
PiperOrigin-RevId: 221117846
Change-Id: I2a43fd8135b1d1194ff81e98644ce6b6182ece50
2018-11-12 10:45:19 -08:00
Bhasker Hariharan 33089561b1 Add an implementation of a SACK scoreboard as per RFC6675.
PiperOrigin-RevId: 220866996
Change-Id: I89d48215df57c00d6a6ec512fc18712a2ea9080b
2018-11-09 14:38:46 -08:00
Andrei Vagin 2ef122da35 Implement sync_file_range()
sync_file_range - sync a file segment with disk

In Linux, sync_file_range() accepts three flags:

       SYNC_FILE_RANGE_WAIT_BEFORE
              Wait  upon  write-out  of  all pages in the specified range that
              have already been submitted to the device driver  for  write-out
              before performing any write.

       SYNC_FILE_RANGE_WRITE
              Initiate  write-out  of  all  dirty pages in the specified range
              which are not presently submitted  write-out.   Note  that  even
              this  may  block if you attempt to write more than request queue
              size.

       SYNC_FILE_RANGE_WAIT_AFTER
              Wait upon write-out of all pages in the range  after  performing
              any write.

In this implementation:

SYNC_FILE_RANGE_WAIT_BEFORE without SYNC_FILE_RANGE_WAIT_AFTER isn't
supported right now.

SYNC_FILE_RANGE_WRITE is skipped. It should initiate write-out of  all
dirty pages, but it doesn't wait, so it should be safe to do nothing
while nobody uses SYNC_FILE_RANGE_WAIT_BEFORE.

SYNC_FILE_RANGE_WAIT_AFTER is equal to fdatasync(). In Linux,
sync_file_range() doesn't write out the file's metadata, but
fdatasync() does if the file size has changed.

PiperOrigin-RevId: 220730840
Change-Id: Iae5dfb23c2c916967d67cf1a1ad32f25eb3f6286
2018-11-08 17:39:51 -08:00
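
The flag handling described in this commit message can be summarized in a small decision function. The flag values are the Linux ones; planSync, the action type, and both error values are hypothetical, and the error actually returned for the unsupported combination is not stated in the message.

```go
package main

import (
	"errors"
	"fmt"
)

// Linux sync_file_range(2) flag values.
const (
	syncFileRangeWaitBefore = 1
	syncFileRangeWrite      = 2
	syncFileRangeWaitAfter  = 4
)

type action int

const (
	actionNone     action = iota // nothing to do
	actionDataSync               // behave like fdatasync()
)

var (
	errInvalid      = errors.New("invalid flags")
	errNotSupported = errors.New("WAIT_BEFORE without WAIT_AFTER is not supported")
)

// planSync (hypothetical) maps the flags to what the sketch would do.
func planSync(flags uint32) (action, error) {
	const all = syncFileRangeWaitBefore | syncFileRangeWrite | syncFileRangeWaitAfter
	if flags&^all != 0 {
		return actionNone, errInvalid
	}
	if flags&syncFileRangeWaitBefore != 0 && flags&syncFileRangeWaitAfter == 0 {
		return actionNone, errNotSupported
	}
	if flags&syncFileRangeWaitAfter != 0 {
		return actionDataSync, nil
	}
	// Only WRITE (or no flags): initiating write-out is skipped.
	return actionNone, nil
}

func main() {
	for _, f := range []uint32{2, 4, 1, 7} {
		a, err := planSync(f)
		fmt.Println(f, a, err)
	}
}
```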
Rahat Mahmood 5a0be6fa20 Create stubs for syscalls up to Linux 4.4.
Create syscall stubs for missing syscalls up to Linux 4.4 and advertise
a kernel version of 4.4.

PiperOrigin-RevId: 220667680
Change-Id: Idbdccde538faabf16debc22f492dd053a8af0ba7
2018-11-08 11:09:46 -08:00
Fabricio Voznika dce61075c0 Fix flaky TestCacheResolutionTimeout
Increase timeout to prevent the entry from being
found when there is delay on the address resolution
goroutine that doesn't mark the request as failed.

PiperOrigin-RevId: 220504789
Change-Id: I7e44fd95d8624bd69962f862fbf5517a81395f2a
2018-11-07 12:01:48 -08:00
Googler 9256ed5283 Internal change.
PiperOrigin-RevId: 220314735
Change-Id: Ic519567e43f6caf042b9f223e517da40640b7d38
2018-11-06 11:08:22 -08:00
Ian Gudger 95722dc4dd Use correct company name in copyright header
These files were added with the wrong name after all of the existing files
were corrected.

PiperOrigin-RevId: 220202068
Change-Id: Ia0d15233c1aa69330356a7cf16b5aa00d978e09c
2018-11-05 17:23:56 -08:00
Ian Gudger 37cbce1f91 Merge segments in sender's writeList
PiperOrigin-RevId: 220185891
Change-Id: Iaea73fd7b2fa8c399b989cdcaabf4885f370df4b
2018-11-05 15:39:30 -08:00
Fabricio Voznika b6b81fd04b Add new log format that is compatible with Kubernetes
Fluentd configuration uses 'log' for the log message
while containerd uses 'msg'. Since we can't have a single
JSON format for both, add another log format and make
debug log configurable.

PiperOrigin-RevId: 219729658
Change-Id: I2a6afc4034d893ab90bafc63b394c4fb62b2a7a0
2018-11-01 17:44:58 -07:00
Ian Lewis 9d69d85bc1 Make error messages a bit more user friendly.
Updated error messages so that they don't print full Go struct representations
when running a new container in a sandbox. For example, this occurs frequently
when commands are not found when doing a 'kubectl exec'.

PiperOrigin-RevId: 219729141
Change-Id: Ic3a7bc84cd7b2167f495d48a1da241d621d3ca09
2018-11-01 17:40:09 -07:00
Rahat Mahmood 0e277a39c8 Prevent premature destruction of shm segments.
Shm segments can be marked for lazy destruction via shmctl(IPC_RMID),
which destroys a segment once it is no longer attached to any
processes. We were unconditionally decrementing the segment refcount
on shmctl(IPC_RMID) which allowed a user to force a segment to be
destroyed by repeatedly calling shmctl(IPC_RMID), with outstanding
memory maps to the segment.

This is problematic because the memory released by a segment destroyed
this way can be reused by a different process while remaining
accessible by the process with outstanding maps to the segment.

PiperOrigin-RevId: 219713660
Change-Id: I443ab838322b4fb418ed87b2722c3413ead21845
2018-11-01 15:54:14 -07:00
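
The fix amounts to dropping the extra reference only on the first IPC_RMID. A simplified sketch of that idea follows; the segment type, its fields, and its methods are hypothetical and much smaller than the real shm implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// segment is a toy shm segment. It starts with one reference held by the
// registry; each attach adds one more.
type segment struct {
	mu                 sync.Mutex
	refs               int
	pendingDestruction bool
	destroyed          bool
}

func newSegment() *segment { return &segment{refs: 1} }

func (s *segment) decRefLocked() {
	s.refs--
	if s.refs == 0 {
		s.destroyed = true // memory would be released here
	}
}

func (s *segment) attach() {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.refs++
}

func (s *segment) detach() {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.decRefLocked()
}

// markDestroyed models shmctl(IPC_RMID). The registry's reference is
// dropped at most once, so repeated calls cannot free an attached segment.
func (s *segment) markDestroyed() {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.pendingDestruction {
		return
	}
	s.pendingDestruction = true
	s.decRefLocked()
}

func (s *segment) isDestroyed() bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.destroyed
}

func main() {
	s := newSegment()
	s.attach()
	s.markDestroyed()
	s.markDestroyed() // repeated IPC_RMID must be harmless
	fmt.Println("destroyed while attached:", s.isDestroyed())
	s.detach()
	fmt.Println("destroyed after detach:", s.isDestroyed())
}
```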
Juan b23cd33682 modify modeRegexp to adapt the default spec of containerd
https://github.com/containerd/containerd/blob/master/oci/spec.go#L206, the mode=755
didn't match the pattern modeRegexp = regexp.MustCompile("0[0-7][0-7][0-7]").

Closes #112

Signed-off-by: Juan <xionghuan.cn@gmail.com>
Change-Id: I469e0a68160a1278e34c9e1dbe4b7784c6f97e5a
PiperOrigin-RevId: 219672525
2018-11-01 11:57:54 -07:00
Rahat Mahmood c2249d6472 Mark amutex_test as flaky.
PiperOrigin-RevId: 219575226
Change-Id: If4e67b29110332c94013513fb111ec7e019f2915
2018-10-31 19:33:09 -07:00
Ian Gudger 59b7766af7 Fix a race where keepalives could be sent while there is pending data
PiperOrigin-RevId: 219571556
Change-Id: I5a1042c1cb05eb2711eb01627fd298bad6c543a6
2018-10-31 18:42:44 -07:00
Ian Gudger eeddae1199 Use syserr style error translation in netstack's rawfile
Replacing map lookups with slice indexing is higher performance.

PiperOrigin-RevId: 219569901
Change-Id: I9b7cd22abd4b95383025edbd5a80d1c1a4496936
2018-10-31 18:22:05 -07:00
Adin Scannell fb613020c7 kvm: simplify floating point logic.
This reduces the number of floating point save/restore cycles required (since
we don't need to restore immediately following the switch, this always happens
in a known context) and allows the kernel hooks to capture state. This lets us
remove calls like "Current()".

PiperOrigin-RevId: 219552844
Change-Id: I7676fa2f6c18b9919718458aa888b832a7db8cab
2018-10-31 15:59:23 -07:00
Adin Scannell c4bbb54168 kvm: add detailed traces on vCPU errors.
This improves debuggability greatly.

PiperOrigin-RevId: 219551560
Change-Id: I2ecaffdd1c17b0d9f25911538ea6f693e2bc699f
2018-10-31 15:50:10 -07:00
Adin Scannell e9dbd5ab67 kvm: avoid siginfo allocations.
PiperOrigin-RevId: 219492587
Change-Id: I47f6fc0b74a4907ab0aff03d5f26453bdb983bb5
2018-10-31 10:08:06 -07:00
Tamir Duberstein 0692ad72ef Remove ipv4.endpoint.address
This field was added in the initial implementation, before Route existed
to pass the local and remote addresses to the packet-writing path.
Today, the Route's members should be respected. A similar bug was
previously fixed in 214650822.

PiperOrigin-RevId: 219474095
Change-Id: Id2a8ee4421d2841c8d88ccb3c193c455086350ee
2018-10-31 08:04:57 -07:00
Adin Scannell 0091db9cbd kvm: use private futexes.
Use private futexes for performance and to align with other runtime uses.

PiperOrigin-RevId: 219422634
Change-Id: Ief2af5e8302847ea6dc246e8d1ee4d64684ca9dd
2018-10-30 22:46:42 -07:00
Michael Pratt 245d81561b Clean up cpuid_parse_test
Actually parse flags from cpuinfo to avoid mistakenly matching
substrings in cpuinfo that happen to match a flag.

Some features were only exposed in recent versions of Linux. Don't
require them to appear in cpuinfo on old versions of Linux.

Move PREFETCHWT1 back to parse only features. It isn't actually exposed
in Linux yet. Move SDBG to shown features. It has been visible since
Linux 4.3.

PiperOrigin-RevId: 219381731
Change-Id: Ied7c0ee7c8a9879683e81933de56c9074b01108f
2018-10-30 15:56:12 -07:00
Michael Pratt b15a7267e1 Add AMD-specific features to cpuid package
Extend the cpuid package to parse and emulate cpuid features that exist
only on AMD and not Intel. The least straightforward part of this is
that AMD duplicates several block 1 features in block 6. Thus we ignore
those features when parsing block 6 and add them when emulating.

PiperOrigin-RevId: 218935032
Change-Id: Id41bf1c24720b0d9b968e2c19ab5bc00a6d62bd4
2018-10-26 16:55:39 -07:00
Michael Pratt e60525e4dd Add block 3 features to /proc/cpuinfo
Linux added these block 3 features to the end of /proc/cpuinfo in
dfb4a70f20c5b3880da56ee4c9484bdb4e8f1e65.

This also fixes that block 3 features were completely missing from
FeatureSet.FlagsString(false) because FlagsString only prints Linux
blocks regardless of the cpuinfo option.

PiperOrigin-RevId: 218913816
Change-Id: I2f9c38c7c9da4b247a140877d4aca782e80684bd
2018-10-26 14:26:25 -07:00
Michael Pratt 624cc329d8 Order feature strings by block
PiperOrigin-RevId: 218894181
Change-Id: I97d0c74175f4aa528363f768a0a85d6953ea0bfd
2018-10-26 12:18:36 -07:00
Adin Scannell e7191f058f Use TRAP to simplify vsyscall emulation.
PiperOrigin-RevId: 218592058
Change-Id: I373a2d813aa6cc362500dd5a894c0b214a1959d7
2018-10-24 15:52:44 -07:00
Ian Gudger 425dccdd7e Convert Unix transport to syserr
Previously this code used the tcpip error space. Since it is no longer part of
netstack, it can use the sentry's error space (except for a few cases where
there is still some shared code). This reduces the number of error space
conversions required for hot Unix socket operations.

PiperOrigin-RevId: 218541611
Change-Id: I3d13047006a8245b5dfda73364d37b8a453784bb
2018-10-24 11:05:08 -07:00
Fabricio Voznika c99006a240 Mark netstack/tcpip/transport/tcp:tcp_test flaky
PiperOrigin-RevId: 218537640
Change-Id: I1c5f55a46390174e1f5caeff74b1a364fa3268d9
2018-10-24 10:46:25 -07:00
Nicolas Lacasse 4a1a2dead9 Run ptrace stubs in their own session and process group.
Pseudoterminal job control signals are meant to be received and handled by the
sandbox process, but if the ptrace stubs are running in the same process group,
they will receive the signals as well and inject them into the sentry kernel.

This can result in duplicate signals being delivered (often to the wrong
process), or a sentry panic if the ptrace stub is inactive.

This CL makes the ptrace stub run in a new session.

PiperOrigin-RevId: 218536851
Change-Id: Ie593c5687439bbfbf690ada3b2197ea71ed60a0e
2018-10-24 10:42:35 -07:00
Rahat Mahmood 46603b569c Fix panic on creation of zero-len shm segments.
Attempting to create a zero-len shm segment causes a panic since we
try to allocate a zero-len filemem region. The existing code had a
guard to disallow this, but the check didn't encode the fact that
requesting a private segment implies a segment creation regardless of
whether IPC_CREAT is explicitly specified.

PiperOrigin-RevId: 218405743
Change-Id: I30aef1232b2125ebba50333a73352c2f907977da
2018-10-23 14:18:54 -07:00
Adin Scannell 1369e17504 Remove blanket TODO, as it is self-evident.
PiperOrigin-RevId: 218390517
Change-Id: Ic891c1626e62a6c4ed57f8180740872bcd1be177
2018-10-23 12:52:27 -07:00
Adin Scannell ce3a762038 Remove artificial name length check.
This should be determined by the filesystem.

PiperOrigin-RevId: 218376553
Change-Id: I55d176e2cdf8acdd6642789af057b98bb8ca25b8
2018-10-23 11:27:28 -07:00
Tamir Duberstein 692df85673 Simplify channel management
The channels {cancel,resCh} have roughly the same lifetime and are used for
roughly the same purpose as an entry's waiters; we can unify the state
management of the two mechanisms, while also reducing unnecessary mutex locking
and unlocking.

Made some cosmetic changes while I'm here.

PiperOrigin-RevId: 218343915
Change-Id: Ic69546a2b7b390162b2231f07f335dd6199472d7
2018-10-23 08:16:13 -07:00
Adin Scannell 75cd70ecc9 Track paths and provide a rename hook.
This change also adds extensive testing to the p9 package via mocks. The sanity
checks and type checks are moved from the gofer into the core package, where
they can be more easily validated.

PiperOrigin-RevId: 218296768
Change-Id: I4fc3c326e7bf1e0e140a454cbacbcc6fd617ab55
2018-10-23 00:20:15 -07:00
Ian Gudger d7c11c7417 Refcount Unix transport queue
This allows us to release messages in the queue when all users close.

PiperOrigin-RevId: 218033550
Change-Id: I2f6e87650fced87a3977e3b74c64775c7b885c1b
2018-10-20 17:58:26 -07:00
Fabricio Voznika b2068cf5a5 Add more unimplemented syscall events
Added events for *ctl syscalls that may have multiple different commands.
For runsc, each syscall event is only logged once. For *ctl syscalls, use
the cmd as identifier, not only the syscall number.

PiperOrigin-RevId: 218015941
Change-Id: Ie3c19131ae36124861e9b492a7dbe1765d9e5e59
2018-10-20 11:14:23 -07:00
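
Logging each event once, keyed by both syscall number and command, could be sketched like this. The emitter type, eventKey, and emit are hypothetical names; only the idea of using the cmd as part of the identifier comes from the commit message.

```go
package main

import (
	"fmt"
	"sync"
)

// eventKey identifies an unimplemented-syscall event. For *ctl syscalls
// the command is part of the key, so each distinct command is reported.
type eventKey struct {
	sysno uintptr
	cmd   uintptr
}

// emitter (hypothetical) remembers which events were already logged.
type emitter struct {
	mu   sync.Mutex
	seen map[eventKey]bool
}

func newEmitter() *emitter {
	return &emitter{seen: make(map[eventKey]bool)}
}

// emit reports whether the event is new and should therefore be logged.
func (e *emitter) emit(sysno, cmd uintptr) bool {
	e.mu.Lock()
	defer e.mu.Unlock()
	k := eventKey{sysno: sysno, cmd: cmd}
	if e.seen[k] {
		return false
	}
	e.seen[k] = true
	return true
}

func main() {
	e := newEmitter()
	const sysIoctl = 16 // ioctl on x86-64
	fmt.Println(e.emit(sysIoctl, 0x5401)) // TCGETS, first time
	fmt.Println(e.emit(sysIoctl, 0x5401)) // TCGETS again
	fmt.Println(e.emit(sysIoctl, 0x5402)) // TCSETS, different cmd
}
```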
Ian Gudger 8fce67af24 Use correct company name in copyright header
PiperOrigin-RevId: 217951017
Change-Id: Ie08bf6987f98467d07457bcf35b5f1ff6e43c035
2018-10-19 16:35:11 -07:00
Ian Gudger f7419fec26 Use generic ilist in Unix transport queue
This should improve performance.

PiperOrigin-RevId: 217610560
Change-Id: I370f196ea2396f1715a460b168ecbee197f94d6c
2018-10-17 16:31:15 -07:00
Jamie Liu b2a88ff471 Check thread group CPU timers in the CPU clock ticker.
This reduces the number of goroutines and runtime timers when
ITIMER_VIRTUAL or ITIMER_PROF are enabled, or when RLIMIT_CPU is set.
This also ensures that thread group CPU timers only advance if running
tasks are observed at the time the CPU clock advances, mostly
eliminating the possibility that a CPU timer expiration observes no
running tasks and falls back to the group leader.

PiperOrigin-RevId: 217603396
Change-Id: Ia24ce934d5574334857d9afb5ad8ca0b6a6e65f4
2018-10-17 15:50:02 -07:00
Ian Gudger 6922eee649 Merge queue into Unix transport
This queue only has a single user, so there is no need for it to use an
interface. Merging it into the same package as its sole user allows us to avoid
a circular dependency.

This simplifies the code and should slightly improve performance.

PiperOrigin-RevId: 217595889
Change-Id: Iabbd5164240b935f79933618c61581bc8dcd2822
2018-10-17 15:10:20 -07:00
Ian Gudger 8c85f5e9ce Fix typos in socket_test
PiperOrigin-RevId: 217576188
Change-Id: I82e45c306c5c9161e207311c7dbb8a983820c1df
2018-10-17 13:25:45 -07:00
Michael Pratt 8fa6f6fe76 Reflow comment to 80 columns
PiperOrigin-RevId: 217573168
Change-Id: Ic1914d0ef71bab020e3ee11cf9c4a50a702bd8dd
2018-10-17 13:06:16 -07:00
Nicolas Lacasse 4e6f0892c9 runsc: Support job control signals for the root container.
Now containers run with "docker run -it" support control characters like ^C and
^Z.

This required refactoring our signal handling a bit. Signals delivered to the
"runsc boot" process are turned into loader.Signal calls with the appropriate
delivery mode. Previously they were always sent directly to PID 1.

PiperOrigin-RevId: 217566770
Change-Id: I5b7220d9a0f2b591a56335479454a200c6de8732
2018-10-17 12:29:05 -07:00
Michael Pratt 578fe5a50d Fix PTRACE_GETREGSET write size
The existing logic is backwards and writes iov_len == 0 for a full write.

PiperOrigin-RevId: 217560377
Change-Id: I5a39c31bf0ba9063a8495993bfef58dc8ab7c5fa
2018-10-17 11:53:04 -07:00
Ian Gudger 6cba410df0 Move Unix transport out of netstack
PiperOrigin-RevId: 217557656
Change-Id: I63d27635b1a6c12877279995d2d9847b6a19da9b
2018-10-17 11:37:51 -07:00
Zhaozhong Ni 9d17eba121 compressio: do not schedule new I/Os when there is no worker (stream closed).
PiperOrigin-RevId: 217536677
Change-Id: Ib9a5a2542df12d0bc5592b91463ffd646e2ec295
2018-10-17 09:57:57 -07:00
Ian Gudger 324ad3564b Refactor host.ConnectedEndpoint
* Integrate recvMsg and sendMsg functions into Recv and Send respectively as
  they are no longer shared.
* Clean up partial read/write error handling code.
* Re-order code to make sense given that there is no longer a host.endpoint
  type.

PiperOrigin-RevId: 217255072
Change-Id: Ib43fe9286452f813b8309d969be11f5fa40694cd
2018-10-15 20:23:18 -07:00
Ian Gudger 167f2401c4 Merge host.endpoint into host.ConnectedEndpoint
host.endpoint contained duplicated logic from the socketpair implementation and
host.ConnectedEndpoint. Remove host.endpoint in favor of a
host.ConnectedEndpoint wrapped in a socketpair end.

PiperOrigin-RevId: 217240096
Change-Id: I4a3d51e3fe82bdf30e2d0152458b8499ab4c987c
2018-10-15 17:48:11 -07:00
Nicolas Lacasse ecd94ea7a6 Clean up Rename and Unlink checks for EBUSY.
- Change Dirent.Busy => Dirent.isMountPoint. The function body is unchanged,
  and it is no longer exported.

- fs.MayDelete now checks that the victim is not the process root. This aligns
  with Linux's namei.c:may_delete().

- Fix "is-ancestor" checks to actually compare all ancestors, not just the
  parents.

- Fix handling of paths that end in dots, which are handled differently in
  Rename vs. Unlink.

PiperOrigin-RevId: 217239274
Change-Id: I7a0eb768e70a1b2915017ce54f7f95cbf8edf1fb
2018-10-15 17:42:30 -07:00
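
The "is-ancestor" fix in the entry above can be illustrated with a minimal sketch. The dirent type and isDescendantOf helper below are hypothetical stand-ins, not gVisor's fs.Dirent API; the point is only that the walk follows the whole parent chain instead of stopping at the immediate parent.

```go
package main

import "fmt"

// dirent is a hypothetical stand-in for a directory entry; it is not
// gVisor's fs.Dirent.
type dirent struct {
	name   string
	parent *dirent
}

// isDescendantOf reports whether d is p itself or lies anywhere below p.
// It walks every ancestor rather than comparing only the direct parent.
func isDescendantOf(d, p *dirent) bool {
	for cur := d; cur != nil; cur = cur.parent {
		if cur == p {
			return true
		}
	}
	return false
}

func main() {
	root := &dirent{name: "/"}
	a := &dirent{name: "a", parent: root}
	b := &dirent{name: "b", parent: a}
	c := &dirent{name: "c", parent: b}
	fmt.Println(isDescendantOf(c, a)) // a is the grandparent of c
	fmt.Println(isDescendantOf(a, c)) // a is above c, not below it
}
```

A parent-only comparison would miss the first case, because a is two levels above c.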
Zhaozhong Ni 4ea69fce8d sentry: save fs.Dirent deleted info.
PiperOrigin-RevId: 217155458
Change-Id: Id3265b1ec784787039e2131c80254ac4937330c7
2018-10-15 09:31:32 -07:00
Kevin Krakauer 47d3862c33 runsc: Support retrieving MTU via netdevice ioctl.
This enables ifconfig to display MTU.

PiperOrigin-RevId: 216917021
Change-Id: Id513b23d9d76899bcb71b0b6a25036f41629a923
2018-10-12 13:58:32 -07:00
Fabricio Voznika 86680fa002 Add String() method to AddressMask
PiperOrigin-RevId: 216770391
Change-Id: Idcdc28b2fe9e1b0b63b8119d445f05a8bcbce81e
2018-10-11 15:22:02 -07:00
Adin Scannell 96c68b36f6 Add client sanity checking for P9.
This should reduce use-after-free errors and accidental close via create or
remove. This change includes one functional fix as well: when closing via
remove, the closed field was not set and the finalizer was not freed, so the
file would have been clunked at some random point in the future.

PiperOrigin-RevId: 216750000
Change-Id: Ice3292c6feb953ae97abac308afbafd2d9410402
2018-10-11 13:23:59 -07:00
Zhaozhong Ni 0bfa03d61c sentry: allow saving of unlinked files with open fds on virtual fs.
PiperOrigin-RevId: 216733414
Change-Id: I33cd3eb818f0c39717d6656fcdfff6050b37ebb0
2018-10-11 11:41:44 -07:00
Adin Scannell 463e73d46d Add seccomp filter configuration to ptrace stubs.
This is a defense-in-depth measure. If the sentry is compromised, this prevents
system call injection to the stubs. There is some complexity with respect to
ptrace and seccomp interactions, so this protection is not really available
for kernel versions < 4.8; this is detected dynamically.

Note that this also solves the vsyscall emulation issue by adding in
appropriate trapping for those system calls. It does mean that a compromised
sentry could theoretically inject these into the stub (ignoring the trap and
resume, thereby allowing execution), but they are harmless.

PiperOrigin-RevId: 216647581
Change-Id: Id06c232cbac1f9489b1803ec97f83097fcba8eb8
2018-10-10 22:40:28 -07:00
Jonathan Giannuzzi 8388a505e7 Support for older Linux kernels without getrandom
Change-Id: I1fb9f5b47a264a7617912f6f56f995f3c4c5e578
PiperOrigin-RevId: 216591484
2018-10-10 14:18:47 -07:00
Michael Pratt ddb34b3690 Enforce message size limits and avoid host calls with too many iovecs
Currently, in the face of FileMem fragmentation and a large sendmsg or
recvmsg call, host sockets may pass > 1024 iovecs to the host, which
will immediately cause the host to return EMSGSIZE.

When we detect this case, use a single intermediate buffer to pass to
the kernel, copying to/from the src/dst buffer.

To avoid creating unbounded intermediate buffers, enforce message size
checks and truncation w.r.t. the send buffer size. The same
functionality is added to netstack unix sockets for feature parity.

PiperOrigin-RevId: 216590198
Change-Id: I719a32e71c7b1098d5097f35e6daf7dd5190eff7
2018-10-10 14:10:17 -07:00
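
The fallback described in the entry above can be sketched as follows for the send direction. This is an illustration under stated assumptions, not the gVisor implementation: buildIovecs and maxLen are hypothetical names, with maxLen standing in for the send buffer size, and only the 1024-iovec limit comes from the commit message.

```go
package main

import "fmt"

// maxIovs is the host limit on iovecs per call cited in the commit message.
const maxIovs = 1024

// buildIovecs is a hypothetical helper. If the fragmented source fits
// within the iovec limit it is passed through unchanged. Otherwise the
// blocks are copied into a single intermediate buffer, truncated to
// maxLen bytes so that the buffer cannot grow without bound.
func buildIovecs(blocks [][]byte, maxLen int) [][]byte {
	if len(blocks) <= maxIovs {
		return blocks
	}
	buf := make([]byte, 0, maxLen)
	for _, b := range blocks {
		room := maxLen - len(buf)
		if room <= 0 {
			break
		}
		if len(b) > room {
			b = b[:room]
		}
		buf = append(buf, b...)
	}
	return [][]byte{buf}
}

func main() {
	// 2000 one-byte blocks simulate heavy fragmentation.
	blocks := make([][]byte, 2000)
	for i := range blocks {
		blocks[i] = []byte{'x'}
	}
	out := buildIovecs(blocks, 1500)
	fmt.Println(len(out), len(out[0]))
}
```

The receive direction would be the mirror image: read into one buffer, then copy out to the destination blocks.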
Nicolas Lacasse b78552d30e When creating a new process group, add it to the session.
PiperOrigin-RevId: 216554791
Change-Id: Ia6b7a2e6eaad80a81b2a8f2e3241e93ebc2bda35
2018-10-10 10:42:11 -07:00
Ian Gudger c36d2ef373 Add new netstack metrics to the sentry
PiperOrigin-RevId: 216431260
Change-Id: Ia6e5c8d506940148d10ff2884cf4440f470e5820
2018-10-09 15:12:44 -07:00
Brian Geffon acf7a95189 Add memunit to sysinfo(2).
Also properly add padding after Procs in the linux.Sysinfo
structure. This will be implicitly padded to 64bits so we
need to do the same.

PiperOrigin-RevId: 216372907
Change-Id: I6eb6a27800da61d8f7b7b6e87bf0391a48fdb475
2018-10-09 09:52:14 -07:00
Michael Pratt 569c2b06c4 Statfs Namelen should be NAME_MAX not PATH_MAX
We accidentally set the wrong maximum. I've also added PATH_MAX and
NAME_MAX to the linux abi package.

PiperOrigin-RevId: 216221311
Change-Id: I44805fcf21508831809692184a0eba4cee469633
2018-10-08 11:39:54 -07:00
Jamie Liu e9e8be6613 Implement shared futexes.
- Shared futex objects on shared mappings are represented by Mappable +
  offset, analogous to Linux's use of inode + offset. Add type
  futex.Key, and change the futex.Manager bucket API to use futex.Keys
  instead of addresses.

- Extend the futex.Checker interface to be able to return Keys for
  memory mappings. It returns Keys rather than just mappings because
  whether the address or the target of the mapping is used in the Key
  depends on whether the mapping is MAP_SHARED or MAP_PRIVATE; this
  matters because using mapping target for a futex on a MAP_PRIVATE
  mapping causes it to stop working across COW-breaking.

- futex.Manager.WaitComplete depends on atomic updates to
  futex.Waiter.addr to determine when it has locked the right bucket,
  which is much less straightforward for struct futex.Waiter.key. Switch
  to an atomically-accessed futex.Waiter.bucket pointer.

- futex.Manager.Wake now needs to take a futex.Checker to resolve
  addresses for shared futexes. CLONE_CHILD_CLEARTID requires the exit
  path to perform a shared futex wakeup (Linux:
  kernel/fork.c:mm_release() => sys_futex(tsk->clear_child_tid,
  FUTEX_WAKE, ...)). This is a problem because futexChecker is in the
  syscalls/linux package. Move it to kernel.

PiperOrigin-RevId: 216207039
Change-Id: I708d68e2d1f47e526d9afd95e7fed410c84afccf
2018-10-08 10:20:38 -07:00
Ian Gudger beac59b37a Fix panic if FIOASYNC callback is registered and triggered without target
PiperOrigin-RevId: 215674589
Change-Id: I4f8871b64c570dc6da448d2fe351cec8a406efeb
2018-10-03 20:22:31 -07:00
Nicolas Lacasse 213f6688a5 Implement TIOCSCTTY ioctl as a noop.
PiperOrigin-RevId: 215658757
Change-Id: If63b33293f3e53a7f607ae72daa79e2b7ef6fcfd
2018-10-03 17:29:56 -07:00
Ian Gudger 4fef31f96c Add S/R support for FIOASYNC
PiperOrigin-RevId: 215655197
Change-Id: I668b1bc7c29daaf2999f8f759138bcbb09c4de6f
2018-10-03 17:03:09 -07:00
Jamie Liu 8e729e0e1f Add //pkg/sync:generic_atomicptr.
PiperOrigin-RevId: 215620949
Change-Id: I519da4b44386d950443e5784fb8c48ff9a36c5d3
2018-10-03 13:52:15 -07:00
Nicolas Lacasse 0a13042d48 Bump some timeouts in the image tests.
PiperOrigin-RevId: 215489101
Change-Id: Iaf96aa8edb1101b70548030c62995841215237d9
2018-10-02 17:28:09 -07:00
Nicolas Lacasse f1c01ed886 runsc: Support job control signals in "exec -it".
Terminal support in runsc relies on host tty file descriptors that are imported
into the sandbox. Application tty ioctls are sent directly to the host fd.

However, those host tty ioctls are associated in the host kernel with a host
process (in this case runsc), and the host kernel intercepts job control
characters like ^C and sends signals to the host process. Thus, typing ^C into a
"runsc exec" shell will send a SIGINT to the runsc process.

This change makes "runsc exec" handle all signals, and forward them into the
sandbox via the "ContainerSignal" urpc method. Since the "runsc exec" is
associated with a particular container process in the sandbox, the signal must
be associated with the same container process.

One big difficulty is that the signal should not necessarily be sent to the
sandbox process started by "exec", but instead must be sent to the foreground
process group for the tty. For example, we may exec "bash", and from bash call
"sleep 100". A ^C at this point should SIGINT sleep, not bash.

To handle this, tty files inside the sandbox must keep track of their
foreground process group, which is set/get via ioctls. When an incoming
ContainerSignal urpc comes in, we look up the foreground process group via the
tty file. Unfortunately, this means we have to expose and cache the tty file in
the Loader.

Note that "runsc exec" now handles signals properly, but "runsc run" does not.
That will come in a later CL, as this one is complex enough already.

Example:
	root@:/usr/local/apache2# sleep 100
	^C

	root@:/usr/local/apache2# sleep 100
	^Z
	[1]+  Stopped                 sleep 100

	root@:/usr/local/apache2# fg
	sleep 100
	^C

	root@:/usr/local/apache2#

PiperOrigin-RevId: 215334554
Change-Id: I53cdce39653027908510a5ba8d08c49f9cf24f39
2018-10-01 22:06:56 -07:00
Michael Pratt 0400e54592 Add itimer types to linux package, strace
PiperOrigin-RevId: 215278262
Change-Id: Icd10384c99802be6097be938196044386441e282
2018-10-01 14:16:53 -07:00
Nicolas Lacasse 07aa040842 Fix possible panic in control.Processes.
There was a race where we checked task.Parent() != nil, and then later called
task.Parent() again, assuming that it is not nil.  If the task is exiting, the
parent may have been set to nil in between the two calls, causing a panic.

This CL changes the code to only call task.Parent() once.

PiperOrigin-RevId: 215274456
Change-Id: Ib5a537312c917773265ec72016014f7bc59a5f59
2018-10-01 13:56:07 -07:00
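
The race fixed in the entry above is a check-then-use pattern. A minimal sketch of the single-call form, using a hypothetical task type rather than gVisor's kernel.Task:

```go
package main

import (
	"fmt"
	"sync"
)

// task is a hypothetical stand-in for a kernel task.
type task struct {
	mu     sync.Mutex
	id     int
	parent *task
}

// Parent returns the current parent, which may become nil at any time
// once the task starts exiting.
func (t *task) Parent() *task {
	t.mu.Lock()
	defer t.mu.Unlock()
	return t.parent
}

// clearParent simulates the exit path dropping the parent link.
func (t *task) clearParent() {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.parent = nil
}

// parentID calls Parent exactly once and uses only the saved result, so a
// concurrent clearParent cannot slip in between the nil check and the use.
func parentID(t *task) int {
	p := t.Parent()
	if p == nil {
		return 0
	}
	return p.id
}

func main() {
	child := &task{id: 2, parent: &task{id: 1}}
	fmt.Println(parentID(child))
	child.clearParent()
	fmt.Println(parentID(child))
}
```

The racy form would be `if t.Parent() != nil { return t.Parent().id }`, where the second call can observe nil.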
Googler fb65b0b471 Change tcpip.Route.Mask to tcpip.AddressMask.
PiperOrigin-RevId: 214975659
Change-Id: I7bd31a2c54f03ff52203109da312e4206701c44c
2018-09-28 12:18:15 -07:00
Michael Pratt 3ff24b4f2c Require AF_UNIX sockets from the gofer
host.endpoint already has the check, but it is missing from
host.ConnectedEndpoint.

PiperOrigin-RevId: 214962762
Change-Id: I88bb13a5c5871775e4e7bf2608433df8a3d348e6
2018-09-28 11:03:11 -07:00
Sepehr Raissian c17ea8c6e2 Block for link address resolution
Previously, if address resolution for UDP or Ping sockets required sending
packets using Write in Transport layer, Resolve would return ErrWouldBlock
and Write would return ErrNoLinkAddress. Meanwhile startAddressResolution
would run in background. Further calls to Write using same address would also
return ErrNoLinkAddress until resolution has been completed successfully.

Since Write is not allowed to block and System Calls need to be
interruptible in System Call layer, the caller to Write is responsible for
blocking upon return of ErrWouldBlock.

Now, when startAddressResolution is called a notification channel for
the completion of the address resolution is returned.
The channel will traverse up to the calling function of Write as well as
ErrNoLinkAddress. Once address resolution is complete (success or not) the
channel is closed. The caller would call Write again to send packets and
check if address resolution was completed successfully or not.

Fixes google/gvisor#5

Change-Id: Idafaf31982bee1915ca084da39ae7bd468cebd93
PiperOrigin-RevId: 214962200
2018-09-28 11:00:16 -07:00
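
The notification-channel pattern described in the entry above can be sketched like this. All names (resolver, write, blockingWrite, errNoLinkAddress) are hypothetical and only mirror the shape of the description; resolution here always succeeds after a short delay, and interruption of the blocked caller is not modeled.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

var errNoLinkAddress = errors.New("no link address")

// resolver is a hypothetical stand-in for the link address cache.
type resolver struct {
	mu       sync.Mutex
	resolved bool
	done     chan struct{}
	delay    time.Duration
}

func newResolver() *resolver {
	return &resolver{delay: 10 * time.Millisecond}
}

// write never blocks. If the link address is unknown it starts resolution
// in the background (once) and returns a channel that is closed when
// resolution finishes, together with errNoLinkAddress.
func (r *resolver) write(pkt string) (<-chan struct{}, error) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.resolved {
		return nil, nil // the packet would be sent here
	}
	if r.done == nil {
		done := make(chan struct{})
		r.done = done
		go func() {
			time.Sleep(r.delay) // simulated resolution
			r.mu.Lock()
			r.resolved = true
			r.mu.Unlock()
			close(done)
		}()
	}
	return r.done, errNoLinkAddress
}

// blockingWrite is the caller's side: block on the channel, then call
// write again to find out whether resolution succeeded.
func blockingWrite(r *resolver, pkt string) error {
	ch, err := r.write(pkt)
	if err == errNoLinkAddress {
		<-ch
		_, err = r.write(pkt)
	}
	return err
}

func main() {
	r := newResolver()
	_, err := r.write("ping")
	fmt.Println(err)
	fmt.Println(blockingWrite(r, "ping"))
}
```

Keeping the blocking in the caller leaves write non-blocking, which is the constraint the commit message states.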
Nicolas Lacasse b709d23987 Forward ioctl(TCSETSF) calls on host ttys to the host kernel.
We already forward TCSETS and TCSETSW.  TCSETSF is roughly equivalent but
discards pending input.

The filters were relaxed to allow host ioctls with TCSETSF argument.

This fixes programs like "passwd" that prevent user input from being displayed
on the terminal.

Before:
	root@b8a0240fc836:/# passwd
	Enter new UNIX password: 123
	Retype new UNIX password: 123
	passwd: password updated successfully

After:
	root@ae6f5dabe402:/# passwd
	Enter new UNIX password:
	Retype new UNIX password:
	passwd: password updated successfully
PiperOrigin-RevId: 214869788
Change-Id: I31b4d1373c1388f7b51d0f2f45ce40aa8e8b0b58
2018-09-27 18:17:38 -07:00
Fabricio Voznika 491faac03b Implement 'runsc kill --all'
In order to implement kill --all correctly, the Sentry needs
to track all tasks that belong to a given container. This change
introduces ContainerID to the task, that gets inherited by all
children. 'kill --all' then iterates over all tasks comparing the
ContainerID field to find all processes that need to be signalled.

PiperOrigin-RevId: 214841768
Change-Id: I693b2374be8692d88cc441ef13a0ae34abf73ac6
2018-09-27 15:00:58 -07:00
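
A minimal sketch of the approach in the entry above, assuming hypothetical task, fork, and killAll names that are not the Sentry's types: the container ID is copied to every child, and kill --all is a linear scan that compares it.

```go
package main

import "fmt"

// task is a hypothetical stand-in for a Sentry task.
type task struct {
	pid         int
	containerID string
	signalled   bool
}

// fork creates a child that inherits the parent's container ID.
func (t *task) fork(pid int) *task {
	return &task{pid: pid, containerID: t.containerID}
}

// killAll marks every task in the given container as signalled and
// returns how many tasks matched.
func killAll(tasks []*task, containerID string) int {
	n := 0
	for _, t := range tasks {
		if t.containerID == containerID {
			t.signalled = true
			n++
		}
	}
	return n
}

func main() {
	initA := &task{pid: 1, containerID: "a"}
	initB := &task{pid: 2, containerID: "b"}
	tasks := []*task{initA, initB, initA.fork(3), initA.fork(4)}
	fmt.Println(killAll(tasks, "a"))
}
```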
Anton Gyllenberg 68ac2ad1e1 netstack: make go:linkname work for all architectures
The //go:linkname directive requires the presence of
assembly files in the package. Even an empty file will do.
There was an empty assembly file commit_arm64.s, but
that is limited to GOARCH=arm64. Renaming to empty.s will
remove the unnecessary build constraint and allow building
netstack for other architectures than amd64 and arm64.

Without this, building directly with go (not bazel)
for e.g., GOARCH=arm gives:

sleep/sleep_unsafe.go:88:6: missing function body
sleep/sleep_unsafe.go:91:6: missing function body

Change-Id: I29d1d13e1ff31506a174d4595b8cd57fa58bf52b
PiperOrigin-RevId: 214820299
2018-09-27 12:53:10 -07:00
Zhaozhong Ni 234f36b6f2 sentry: export cpuTime function.
PiperOrigin-RevId: 214798278
Change-Id: Id59d1ceb35037cda0689d3a1c4844e96c6957615
2018-09-27 12:52:25 -07:00
Fabricio Voznika fca9a390db Return correct parent PID
Old code was returning ID of the thread that created
the child process. It should be returning the ID of
the parent process instead.

PiperOrigin-RevId: 214720910
Change-Id: I95715c535bcf468ecf1ae771cccd04a4cd345b36
2018-09-26 22:00:04 -07:00
Tamir Duberstein 539df2940d Use the ICMP target address in responses
There is a subtle bug that is the result of two changes made when upstreaming
ICMPv6 support from Fuchsia:
1) ipv6.endpoint.WritePacket writes the local address it was initialized with,
rather than the provided route's local address
2) ipv6.endpoint.handleICMP doesn't set its route's local address to the ICMP
target address before writing the response

The result is that the ICMP response erroneously uses the target ipv6 address
(rather than icmp) as its source address in the response. When trying to debug
this by fixing (2), we ran into problems with bad ipv6 checksums because (1)
didn't respect the local address of the route being passed to it.

This fixes both problems.

PiperOrigin-RevId: 214650822
Change-Id: Ib6148bf432e6428d760ef9da35faef8e4b610d69
2018-09-26 12:41:04 -07:00
Tamir Duberstein bee264f0c5 Export ipv6 address helpers
This is useful for Fuchsia.

PiperOrigin-RevId: 214619681
Change-Id: If5a60dd82365c2eae51a12bbc819e5aae8c76ee9
2018-09-26 09:49:52 -07:00
Ian Gudger 4094480b28 Remove unnecessary defer
PiperOrigin-RevId: 214073949
Change-Id: I8fab916cd77362c13dac2c9dcf2ecc1710d87a5e
2018-09-21 18:14:38 -07:00
Ian Gudger 7ce13ebcad Run gofmt -s on everything
PiperOrigin-RevId: 214040901
Change-Id: I74d79497a053da3624921ad2b7c5193ca4a87942
2018-09-21 14:06:59 -07:00
Tamir Duberstein 4634cd66ad Extend tcpip.Address.String to ipv6 addresses
PiperOrigin-RevId: 214039349
Change-Id: Ia7d09c5f85eddd1e5634f3c21b0bd60b10be6bd2
2018-09-21 13:58:31 -07:00
Tamir Duberstein 95f30ef67b Deflake TestSimpleReceive
...by increasing the allotted timeout and using direct comparison rather than
reflect.DeepEqual (which should be faster).

PiperOrigin-RevId: 214027024
Change-Id: I0a2690e65c7e14b4cc118c7312dbbf5267dc78bc
2018-09-21 12:33:21 -07:00
Tamir Duberstein 7fa57ee579 Export read-only tcpip.Subnet.Mask
PiperOrigin-RevId: 214023383
Change-Id: I5a7572f949840fb68a3ffb7342e6a3524bd00864
2018-09-21 12:07:29 -07:00
Ian Gudger 117ac8bc5b Fix data race on tcp.endpoint.hardError in tcp.(*endpoint).Read
tcp.endpoint.hardError is protected by tcp.endpoint.mu.

PiperOrigin-RevId: 213730698
Change-Id: I4e4f322ac272b145b500b1a652fbee0c7b985be2
2018-09-19 17:49:18 -07:00
Bert Muthalaly 2e497de2d9 Pass local link address to DeliverNetworkPacket
This allows a NetworkDispatcher to implement transparent bridging,
assuming all implementations of LinkEndpoint.WritePacket call eth.Encode
with header.EthernetFields.SrcAddr set to the passed
Route.LocalLinkAddress, if it is provided.

PiperOrigin-RevId: 213686651
Change-Id: I446a4ac070970202f0724ef796ff1056ae4dd72a
2018-09-19 13:43:58 -07:00
Bhasker Hariharan bd12e95247 Fix RTT estimation when timestamp option is enabled.
From RFC7323#Section-4

The [RFC6298] RTT estimator has weighting factors, alpha and beta, based on an
implicit assumption that at most one RTTM will be sampled per RTT.  When
multiple RTTMs per RTT are available to update the RTT estimator, an
implementation SHOULD try to adhere to the spirit of the history specified in
[RFC6298].  An implementation suggestion is detailed in Appendix G.

From RFC7323#appendix-G
Appendix G.  RTO Calculation Modification

   Taking multiple RTT samples per window would shorten the history calculated
   by the RTO mechanism in [RFC6298], and the below algorithm aims to maintain a
   similar history as originally intended by [RFC6298].

   It is roughly known how many samples a congestion window worth of data will
   yield, not accounting for ACK compression, and ACK losses.  Such events will
   result in more history of the path being reflected in the final value for
   RTO, and are uncritical.  This modification will ensure that a similar amount
   of time is taken into account for the RTO estimation, regardless of how many
   samples are taken per window:

      ExpectedSamples = ceiling(FlightSize / (SMSS * 2))

      alpha' = alpha / ExpectedSamples

      beta' = beta / ExpectedSamples

   Note that the factor 2 in ExpectedSamples is due to "Delayed ACKs".

   Instead of using alpha and beta in the algorithm of [RFC6298], use alpha' and
   beta' instead:

      RTTVAR <- (1 - beta') * RTTVAR + beta' * |SRTT - R'|

      SRTT <- (1 - alpha') * SRTT + alpha' * R'

      (for each sample R')

PiperOrigin-RevId: 213644795
Change-Id: I52278b703540408938a8edb8c38be97b37f4a10e
2018-09-19 09:59:12 -07:00
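
The Appendix G update quoted in the entry above can be written out as a short sketch. The alpha = 1/8 and beta = 1/4 values are the RFC 6298 defaults; the rttEstimator type, the use of float64 seconds, and the clamp to at least one expected sample are assumptions made for illustration, and the first-sample initialization from RFC 6298 is not modeled.

```go
package main

import (
	"fmt"
	"math"
)

// RFC 6298 weighting factors.
const (
	alpha = 1.0 / 8
	beta  = 1.0 / 4
)

// rttEstimator is a hypothetical holder for SRTT and RTTVAR.
type rttEstimator struct {
	srtt, rttvar float64
}

// expectedSamples computes ceiling(FlightSize / (SMSS * 2)). The clamp to
// 1 is an assumption that avoids dividing by zero when nothing is in flight.
func expectedSamples(flightSize, smss int) float64 {
	n := math.Ceil(float64(flightSize) / float64(smss*2))
	if n < 1 {
		n = 1
	}
	return n
}

// update applies one sample r using alpha' and beta'. RTTVAR is updated
// first so that it uses the previous SRTT, as in the quoted formulas.
func (e *rttEstimator) update(r float64, flightSize, smss int) {
	n := expectedSamples(flightSize, smss)
	a, b := alpha/n, beta/n
	e.rttvar = (1-b)*e.rttvar + b*math.Abs(e.srtt-r)
	e.srtt = (1-a)*e.srtt + a*r
}

func main() {
	e := rttEstimator{srtt: 0.100, rttvar: 0.010}
	// Ten segments in flight give five expected samples per window.
	e.update(0.120, 14600, 1460)
	fmt.Printf("srtt=%.4f rttvar=%.4f\n", e.srtt, e.rttvar)
}
```

With five expected samples, each sample moves SRTT by alpha/5 instead of alpha, so a full window of samples carries roughly the weight that a single sample has in plain RFC 6298.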
Nicolas Lacasse fd222d62ed Short-circuit Readdir calls on overlay files when the dirent is frozen.
If we have an overlay file whose corresponding Dirent is frozen, then we should
not bother calling Readdir on the upper or lower files, since DirentReaddir
will calculate children based on the frozen Dirent tree.

A test was added that fails without this change.

PiperOrigin-RevId: 213531215
Change-Id: I4d6c98f1416541a476a34418f664ba58f936a81d
2018-09-18 15:42:22 -07:00
Michael Pratt dd05c96d99 Increase state test timeout
PiperOrigin-RevId: 213519378
Change-Id: Iffdb987da3a7209a297ea2df171d2ae5fa9b2b34
2018-09-18 14:38:42 -07:00
Brian Geffon ed08597d12 Allow for MSG_CTRUNC in input flags for recv.
PiperOrigin-RevId: 213481363
Change-Id: I8150ea20cebeb207afe031ed146244de9209e745
2018-09-18 11:14:37 -07:00
Fabricio Voznika da20559137 Provide better message when memfd_create fails with ENOSYS
Updates #100

PiperOrigin-RevId: 213414821
Change-Id: I90c2e6c18c54a6afcd7ad6f409f670aa31577d37
2018-09-18 02:09:28 -07:00
Fabricio Voznika 5d9816be41 Remove memory usage static init
panic() during init() can be hard to debug.

Updates #100

PiperOrigin-RevId: 213391932
Change-Id: Ic103f1981c5b48f1e12da3b42e696e84ffac02a9
2018-09-17 21:34:37 -07:00
Tamir Duberstein d6409b6564 Prevent TCP connect from picking bound ports
PiperOrigin-RevId: 213387851
Change-Id: Icc6850761bc11afd0525f34863acd77584155140
2018-09-17 20:44:04 -07:00
Kevin Krakauer bb88c187c5 runsc: Enable waiting on exited processes.
This makes `runsc wait` behave more like waitpid()/wait4() in that:
- Once a process has run to completion, you can wait on it and get its exit
  code.
- Processes not waited on will consume memory (like a zombie process)

PiperOrigin-RevId: 213358916
Change-Id: I5b5eca41ce71eea68e447380df8c38361a4d1558
2018-09-17 16:25:24 -07:00
Ian Gudger ab6fa44588 Allow kernel.(*Task).Block to accept an extract only channel
PiperOrigin-RevId: 213328293
Change-Id: I4164133e6f709ecdb89ffbb5f7df3324c273860a
2018-09-17 13:35:54 -07:00
Tamir Duberstein a452971630 Add empty .s file to allow `//go:linkname`
This was previously broken in 212917409, resulting in "missing function body"
compilation errors.

PiperOrigin-RevId: 213323695
Change-Id: I32a95b76a1c73fd731f223062ec022318b979bd4
2018-09-17 13:06:55 -07:00
Tamir Duberstein 23258ca284 Implement packet forwarding to enable NAT
PiperOrigin-RevId: 213323501
Change-Id: I0996ddbdcf097588745efe35481085d42dbaf446
2018-09-17 13:05:36 -07:00
Michael Pratt d639c3d61b Allow NULL data in mount(2)
PiperOrigin-RevId: 213315267
Change-Id: I7562bcd81fb22e90aa9c7dd9eeb94803fcb8c5af
2018-09-17 12:16:29 -07:00
newmanwang de5a590ee2 Avoid reuse of pending SignalInfo objects
runApp.execute -> Task.SendSignal -> sendSignalLocked -> sendSignalTimerLocked
-> pendingSignals.enqueue assumes that it owns the arch.SignalInfo returned
from platform.Context.Switch.

On the other hand, ptrace.context.Switch assumes that it owns the returned
SignalInfo and can safely reuse it on the next call to Switch. The KVM platform
always returns a unique SignalInfo.

This becomes a problem when the returned signal is not immediately delivered,
allowing a future signal in Switch to change the previous pending SignalInfo.

This is noticeable in #38 when external SIGINTs are delivered from the PTY
slave FD. Note that the ptrace stubs are in the same process group as the
sentry, so they are eligible to receive the PTY signals. This should probably
change, but is not the only possible cause of this bug.

Updates #38

Original change by newmanwang <wcs1011@gmail.com>, updated by Michael Pratt
<mpratt@google.com>.

Change-Id: I5383840272309df70a29f67b25e8221f933622cd
PiperOrigin-RevId: 213071072
2018-09-14 17:39:25 -07:00
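
The ownership conflict described in the entry above is an aliasing problem, sketched below with hypothetical types (signalInfo, reusingContext, pendingQueue) that are not gVisor's. Copying on enqueue is shown as one possible way to break the alias; it is not necessarily the fix that was committed.

```go
package main

import "fmt"

// signalInfo is a hypothetical stand-in for arch.SignalInfo.
type signalInfo struct {
	signo int
}

// reusingContext mimics a Switch implementation that assumes it owns the
// returned object: every call hands back a pointer to the same struct.
type reusingContext struct {
	info signalInfo
}

func (c *reusingContext) switchTo(signo int) *signalInfo {
	c.info.signo = signo
	return &c.info
}

// pendingQueue holds signals that have not been delivered yet.
type pendingQueue struct {
	q []*signalInfo
}

// enqueue stores the pointer as-is, assuming the queue now owns it.
func (p *pendingQueue) enqueue(si *signalInfo) {
	p.q = append(p.q, si)
}

// enqueueCopy stores a private copy, so later reuse cannot change it.
func (p *pendingQueue) enqueueCopy(si *signalInfo) {
	c := *si
	p.q = append(p.q, &c)
}

func main() {
	ctx := &reusingContext{}
	var aliased, copied pendingQueue

	si := ctx.switchTo(2) // SIGINT is queued but not yet delivered
	aliased.enqueue(si)
	copied.enqueueCopy(si)

	ctx.switchTo(15) // a later signal reuses the same object

	fmt.Println(aliased.q[0].signo, copied.q[0].signo)
}
```

The aliased entry silently becomes the later signal, while the copied entry still holds the original one.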
Tamir Duberstein 75c66f871b Remove buffer.Prependable.UsedBytes
It is the same as buffer.Prependable.View.

PiperOrigin-RevId: 213064166
Change-Id: Ib33b8a2c4da864209d9a0be0a1c113be10b520d3
2018-09-14 16:39:56 -07:00
Michael Pratt 3aa50f18a4 Reuse readlink parameter, add sockaddr max.
PiperOrigin-RevId: 213058623
Change-Id: I522598c655d633b9330990951ff1c54d1023ec29
2018-09-14 16:00:02 -07:00
Tamir Duberstein d7a05b4e63 Pass buffer.Prependable by value
PiperOrigin-RevId: 213053370
Change-Id: I60ea89572b4fca53fd126c870fcbde74fcf52562
2018-09-14 15:23:58 -07:00
Nicolas Lacasse b84bfa570d Make gVisor hard link check match Linux's.
Linux permits hard-linking if the target is owned by the user OR the target has
Read+Write permission.

PiperOrigin-RevId: 213024613
Change-Id: If642066317b568b99084edd33ee4e8822ec9cbb3
2018-09-14 12:29:46 -07:00
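
The rule stated in the entry above reduces to a two-part predicate. The sketch below uses hypothetical names (fileInfo, mayHardLink) and treats canRead and canWrite as the already-computed result of a permission check for the caller; any further conditions Linux applies are outside what the commit message describes and are not modeled.

```go
package main

import "fmt"

// fileInfo is a hypothetical summary of the link target as seen by the
// caller. canRead and canWrite stand for the outcome of a permission
// check performed elsewhere.
type fileInfo struct {
	ownerUID uint32
	canRead  bool
	canWrite bool
}

// mayHardLink allows the link if the caller owns the target, or if the
// caller has both read and write permission on it.
func mayHardLink(callerUID uint32, f fileInfo) bool {
	return f.ownerUID == callerUID || (f.canRead && f.canWrite)
}

func main() {
	fmt.Println(mayHardLink(1000, fileInfo{ownerUID: 1000}))
	fmt.Println(mayHardLink(1000, fileInfo{ownerUID: 0, canRead: true}))
	fmt.Println(mayHardLink(1000, fileInfo{ownerUID: 0, canRead: true, canWrite: true}))
}
```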
Jamie Liu 0380bcb3a4 Fix interaction between rt_sigtimedwait and ignored signals.
PiperOrigin-RevId: 213011782
Change-Id: I716c6ea3c586b0c6c5a892b6390d2d11478bc5af
2018-09-14 11:10:50 -07:00
Chenggang faa34a0738 platform/kvm: Get max vcpu number dynamically by ioctl
Older kernel versions, such as 4.4, only support 255 vcpus.
When gVisor runs on these kernels, it can panic because the
vcpu id and the number of vcpus exceed max_vcpus.
Use ioctl(vmfd, _KVM_CHECK_EXTENSION, _KVM_CAP_MAX_VCPUS) to get max
vcpus number dynamically.

Change-Id: I50dd859a11b1c2cea854a8e27d4bf11a411aa45c
PiperOrigin-RevId: 212929704
2018-09-13 21:47:11 -07:00
Ian Gudger 29a7271f5d Plumb monotonic time to netstack
Netstack needs to be portable, so this seems to be preferable to using raw
system calls.

PiperOrigin-RevId: 212917409
Change-Id: I7b2073e7db4b4bf75300717ca23aea4c15be944c
2018-09-13 19:12:15 -07:00
Rahat Mahmood adf8f33970 Extend memory usage events to report mapped memory usage.
PiperOrigin-RevId: 212887555
Change-Id: I3545383ce903cbe9f00d9b5288d9ef9a049b9f4f
2018-09-13 15:16:47 -07:00
Michael Pratt 9c6b38e295 Format struct itimerspec
PiperOrigin-RevId: 212874745
Change-Id: I0c3e8e6a9e8976631cee03bf0b8891b336ddb8c8
2018-09-13 14:07:47 -07:00
Nicolas Lacasse e2d79480f5 initArgs must hold a reference on the Root if it is not nil.
The contract in ExecArgs says that a reference on ExecArgs.Root must be held
for the lifetime of the struct, but the caller is free to drop the ref after
that.

As a result, proc.Exec must take an additional ref on Root when it constructs
the CreateProcessArgs, since that holds a pointer to Root as well. That ref is
dropped in CreateProcess.

PiperOrigin-RevId: 212828348
Change-Id: I7f44a612f337ff51a02b873b8a845d3119408707
2018-09-13 09:50:35 -07:00
Tamir Duberstein d689f8422f Always pass buffer.VectorisedView by value
PiperOrigin-RevId: 212757571
Change-Id: I04200df9e45c21eb64951cd2802532fa84afcb1a
2018-09-12 21:57:55 -07:00
Tamir Duberstein 5adb3468d4 Add multicast support
PiperOrigin-RevId: 212750821
Change-Id: I822fd63e48c684b45fd91f9ce057867b7eceb792
2018-09-12 20:39:24 -07:00
Zhaozhong Ni 9dec7a3db9 compressio: stop worker-pool reference / dependency loop.
PiperOrigin-RevId: 212732300
Change-Id: I9a0b9b7c28e7b7439d34656dd4f2f6114d173e22
2018-09-12 17:24:53 -07:00
Kevin Krakauer 2eff1fdd06 runsc: Add exec flag that specifies where to save the sandbox-internal pid.
This is different from the existing -pid-file flag, which saves a host pid.

PiperOrigin-RevId: 212713968
Change-Id: I2c486de8dd5cfd9b923fb0970165ef7c5fc597f0
2018-09-12 15:23:35 -07:00
Tamir Duberstein cbf3980464 Prevent UDP sockets from binding to bound ports
PiperOrigin-RevId: 212653818
Change-Id: Ib4e1d754d9cdddeaa428a066cb675e6ec44d91ad
2018-09-12 09:39:01 -07:00
Nicolas Lacasse 6cc9b311af platform: Pass device fd into platform constructor.
We were previously opening the platform device (i.e. /dev/kvm) inside the
platform constructor (i.e. kvm.New).  This requires that we have RW access to
the platform device when constructing the platform.

However, now that the runsc sandbox process runs as user "nobody", it is not
able to open the platform device.

This CL changes the kvm constructor to take the platform device FD, rather than
opening the device file itself. The device file is opened outside of the
sandbox and passed to the sandbox process.

PiperOrigin-RevId: 212505804
Change-Id: I427e1d9de5eb84c84f19d513356e1bb148a52910
2018-09-11 13:09:46 -07:00
Jamie Liu a29c39aa62 Map committed chunks concurrently in FileMem.LoadFrom.
PiperOrigin-RevId: 212345401
Change-Id: Iac626ee87ba312df88ab1019ade6ecd62c04c75c
2018-09-10 15:23:44 -07:00
Fabricio Voznika 7e9e6745ca Allow '/dev/zero' to be mapped with unaligned length
PiperOrigin-RevId: 212321271
Change-Id: I79d71c2e6f4b8fcd3b9b923fe96c2256755f4c48
2018-09-10 13:24:55 -07:00
Bert Muthalaly da9ecb748c Simplify some code in VectorisedView#ToView.
PiperOrigin-RevId: 212317717
Change-Id: Ic77449c53bf2f8be92c9f0a7a726c45bd35ec435
2018-09-10 13:04:06 -07:00
Michael Pratt 7045828a31 Update cleanup TODO
PiperOrigin-RevId: 212068327
Change-Id: I3f360cdf7d6caa1c96fae68ae3a1caaf440f0cbe
2018-09-07 18:14:57 -07:00
Nicolas Lacasse 9751b800a6 runsc: Support multi-container exec.
We must use a context.Context with a Root Dirent that corresponds to the
container's chroot. Previously we were using the root context, which does not
have a chroot.

Getting the correct context required refactoring some of the path-lookup code.
We can't lookup the path without a context.Context, which requires
kernel.CreateProcArgs, which we only get inside control.Execute.  So we have to
do the path lookup much later than we previously were.

PiperOrigin-RevId: 212064734
Change-Id: I84a5cfadacb21fd9c3ab9c393f7e308a40b9b537
2018-09-07 17:39:54 -07:00
Fabricio Voznika 172860a059 Add 'Starting gVisor...' message to syslog
This allows applications to verify they are running with gVisor. It
also helps debugging when running with a mix of container runtimes.

Closes #54

PiperOrigin-RevId: 212059457
Change-Id: I51d9595ee742b58c1f83f3902ab2e2ecbd5cedec
2018-09-07 16:59:27 -07:00
Adin Scannell 6cfb5cd56d Add additional sanity checks for walk.
PiperOrigin-RevId: 212058684
Change-Id: I319709b9ffcfccb3231bac98df345d2a20eca24b
2018-09-07 16:53:12 -07:00
Fabricio Voznika f895cb4d8b Use root abstract socket namespace for exec
PiperOrigin-RevId: 211999211
Change-Id: I5968dd1a8313d3e49bb6e6614e130107495de41d
2018-09-07 10:45:55 -07:00
Michael Pratt 169e2efc5a Continue handling signals after disabling forwarding
Before destroying the Kernel, we disable signal forwarding,
relinquishing control to the Go runtime. External signals that arrive
after disabling forwarding but before the sandbox exits thus may use
runtime.raise (i.e., tkill(2)) and violate the syscall filters.

Adjust forwardSignals to handle signals received after disabling
forwarding the same way they are handled before starting forwarding.
i.e., by implementing the standard Go runtime behavior using tgkill(2)
instead of tkill(2).

This also makes the stop callback block until forwarding actually stops.
This isn't required to avoid tkill(2) but is a saner interface.

PiperOrigin-RevId: 211995946
Change-Id: I3585841644409260eec23435cf65681ad41f5f03
2018-09-07 10:28:25 -07:00
Nicolas Lacasse 6516b5648b createProcessArgs.RootFromContext should return process Root if it exists.
It was always returning the MountNamespace root, which may be different from
the process Root if the process is in a chroot environment.

PiperOrigin-RevId: 211862181
Change-Id: I63bfeb610e2b0affa9fdbdd8147eba3c39014480
2018-09-06 13:47:49 -07:00
Tamir Duberstein 156b49ca85 Fix race condition introduced in 211135505
Now that it's possible to remove subnets, we must iterate over them with locks
held.

Also do the removal more efficiently while I'm here.

PiperOrigin-RevId: 211737416
Change-Id: I29025ec8b0c3ad11f22d4447e8ad473f1c785463
2018-09-05 18:59:16 -07:00
Fabricio Voznika 41b56696c4 Imported FD in exec was leaking
Imported file needs to be closed after it's
been imported.

PiperOrigin-RevId: 211732472
Change-Id: Ia9249210558b77be076bcce465b832a22eed301f
2018-09-05 18:07:11 -07:00
Bert Muthalaly 5685d6b5ad Update {LinkEndpoint,NetworkEndpoint}#WritePacket to take a VectorisedView
Makes it possible to avoid copying or allocating in cases where DeliverNetworkPacket (rx)
needs to turn around and call WritePacket (tx) with its VectorisedView.

Also removes the restriction on having VectorisedViews with multiple views in the write path.

PiperOrigin-RevId: 211728717
Change-Id: Ie03a65ecb4e28bd15ebdb9c69f05eced18fdfcff
2018-09-05 17:34:25 -07:00
Tamir Duberstein fe8ca76c22 Implement Subnet removal
This was used to implement https://fuchsia-review.googlesource.com/c/garnet/+/177771.

PiperOrigin-RevId: 211725098
Change-Id: Ib0acc7c13430b7341e8e0ec6eb5fc35f5cee5083
2018-09-05 17:06:29 -07:00
Bert Muthalaly b3b66dbd1f Enable constructing a Prependable from a View without allocating.
PiperOrigin-RevId: 211722525
Change-Id: Ie73753fd09d67d6a2ce70cfe2d4ecf7275f09ce0
2018-09-05 16:47:51 -07:00
Tamir Duberstein bc5e18c9d1 Implement TCP keepalives
PiperOrigin-RevId: 211670620
Change-Id: Ia8a3d8ae53a7fece1dee08ee9c74964bd7f71bb7
2018-09-05 11:48:23 -07:00
Brian Geffon 2b8dae0bc5 Open(2) isn't honoring O_NOFOLLOW
PiperOrigin-RevId: 211644897
Change-Id: I882ed827a477d6c03576463ca5bf2d6351892b90
2018-09-05 09:21:28 -07:00
Bhasker Hariharan 2cff07381a Automated rollback of changelist 211156845
PiperOrigin-RevId: 211525182
Change-Id: I462c20328955c77ecc7bfd8ee803ac91f15858e6
2018-09-04 14:31:52 -07:00
Michael Pratt 3944cb41cb /proc/PID/mounts is not tab-delimited
PiperOrigin-RevId: 211513847
Change-Id: Ib484dd2d921c3e5d70d0e410cd973d3bff4f6b73
2018-09-04 13:29:49 -07:00
Tamir Duberstein 3794cb6bff Expose TCP RTT
PiperOrigin-RevId: 211504634
Change-Id: I9a7bcbbdd40e5036894930f709278725ef477293
2018-09-04 12:39:47 -07:00
Adin Scannell c09f9acd7c Distinguish Element and Linker for ilist.
Furthermore, allow for the specification of an ElementMapper. This allows a
single "Element" type to exist on multiple inline lists, and work without
having to embed the entry type.

This is a requisite change for supporting a per-Inode list of Dirents.

PiperOrigin-RevId: 211467497
Change-Id: If2768999b43e03fdaecf8ed15f435fe37518d163
2018-09-04 09:19:11 -07:00
Googler f0d8817654 Automated rollback of changelist 211103930
PiperOrigin-RevId: 211156845
Change-Id: Ie28011d7eb5f45f3a0158dbee2a68c5edf22f6e0
2018-08-31 15:48:50 -07:00
Jamie Liu f8ccfbbed4 Document more task-goroutine-owned fields in kernel.Task.
Task.creds can only be changed by the task's own set*id and execve
syscalls, and Task namespaces can only be changed by the task's own
unshare/setns syscalls.

PiperOrigin-RevId: 211156279
Change-Id: I94d57105d34e8739d964400995a8a5d76306b2a0
2018-08-31 15:44:40 -07:00
Jamie Liu b935311e23 Do not use fs.FileOwnerFromContext in fs/proc.file.UnstableAttr().
From //pkg/sentry/context/context.go:

// - It is *not safe* to retain a Context passed to a function beyond the scope
// of that function call.

Passing a stored kernel.Task as a context.Context to
fs.FileOwnerFromContext violates this requirement.

PiperOrigin-RevId: 211143021
Change-Id: I4c5b02bd941407be4c9cfdbcbdfe5a26acaec037
2018-08-31 14:17:56 -07:00
Jamie Liu 098046ba19 Disintegrate kernel.TaskResources.
This allows us to call kernel.FDMap.DecRef without holding mutexes
cleanly.

PiperOrigin-RevId: 211139657
Change-Id: Ie59d5210fb9282e1950e2e40323df7264a01bcec
2018-08-31 13:58:04 -07:00
Jamie Liu b1c1afa3cc Delete the long-obsolete kernel.TaskMaybe interface.
PiperOrigin-RevId: 211131855
Change-Id: Ia7799561ccd65d16269e0ae6f408ab53749bca37
2018-08-31 13:07:34 -07:00
Tamir Duberstein 625edb9f28 ipv6: ICMP support
This CL does NDP link-address discovery for IPv6.

It includes several small changes necessary to get linux to talk to
this implementation. In particular, a hop limit of 255 is necessary
for ICMPv6.

PiperOrigin-RevId: 211103930
Change-Id: If25370ab84c6b1decfb15de917f3b0020f2c4e0e
2018-08-31 10:23:32 -07:00
Nicolas Lacasse 8bfb5fa919 fs: Add empty dir at /sys/class/power_supply.
PiperOrigin-RevId: 210953512
Change-Id: I07d2d7fb0d268aa8eca26d81ef28b5b5c42289ee
2018-08-30 12:01:27 -07:00
Ian Gudger 313d4af52d ping: update comment about UDP
PiperOrigin-RevId: 210788012
Change-Id: I5ebdcf3d02bfab3484a1374fbccba870c9d68954
2018-08-29 14:15:58 -07:00