Commit Graph

126 Commits

Author SHA1 Message Date
Ian Gudger d571a4359c Implement ioctl(FIOASYNC)
FIOASYNC and friends are used to send signals when a file is ready for IO.

This may or may not be needed by Nginx. While Nginx does use it, it is unclear
if the code that uses it has any effect.

PiperOrigin-RevId: 201550828
Change-Id: I7ba05a7db4eb2dfffde11e9bd9a35b65b98d7f50
2018-06-21 10:53:21 -07:00
Fabricio Voznika 4ad7315b67 Add 'runsc debug' command
It prints sandbox stacks to the log to help debug stuckness. I expect
that many more options will be added in the future.

PiperOrigin-RevId: 201405931
Change-Id: I87e560800cd5a5a7b210dc25a5661363c8c3a16e
2018-06-20 13:31:31 -07:00
Nicolas Lacasse d93f55e863 Remove some defers in hot paths in the filesystem code.
PiperOrigin-RevId: 201401727
Change-Id: Ia5589882ba58a00efb522ab372e206b7e8e62aee
2018-06-20 13:05:54 -07:00
Zhaozhong Ni 4e9f0e91d7 sentry: pending signals S/R optimization.
Almost all of the hundreds of pending signal queues are empty upon save.

PiperOrigin-RevId: 201380318
Change-Id: I40747072435299de681d646e0862efac0637e172
2018-06-20 11:02:41 -07:00
Brian Geffon db66e383c3 Epsocket has incorrect recv(2) behavior after SHUT_RD.
After shutdown(SHUT_RD), calls to recv(2) with MSG_DONTWAIT or on an
O_NONBLOCK socket should fail with EAGAIN rather than return 0. Blocking
sockets should return 0, as they would otherwise block indefinitely.

PiperOrigin-RevId: 201271123
Change-Id: If589b69c17fa5b9ff05bcf9e44024da9588c8876
2018-06-19 17:29:11 -07:00
Zhaozhong Ni 18d8992453 state: pretty-print primitive type arrays.
PiperOrigin-RevId: 201269072
Change-Id: Ia542c5a42b5b5d21c1104a003ddff5279644d309
2018-06-19 17:13:35 -07:00
Adin Scannell be76cad5bc Make KVM more scalable by removing CPU cap.
Instead, CPUs will be created dynamically. We also allow a relatively
efficient mechanism for stealing and notifying when a vCPU becomes
available via unlock.

Since the number of vCPUs is no longer fixed at machine creation time,
we make the dirtySet packing more efficient. This has the pleasant side
effect of cutting out the unsafe address space code.

PiperOrigin-RevId: 201266691
Change-Id: I275c73525a4f38e3714b9ac0fd88731c26adfe66
2018-06-19 17:00:30 -07:00
Zhaozhong Ni aa14a2c1be sentry: futex S/R optimization.
No need to save thousands of zero-value buckets.

PiperOrigin-RevId: 201258598
Change-Id: I5d3ea7b6a5345117ab4f610332d5288ca550be33
2018-06-19 16:08:00 -07:00
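
The optimization in the commit above amounts to sparse serialization: write out only the buckets that differ from the zero value, and recreate the rest as zero on restore. A minimal sketch, assuming a simplified `bucket` type holding a slice of waiter IDs; `bucket`, `saveBuckets`, and `loadBuckets` are hypothetical, not the sentry's futex types.

```go
package main

import "fmt"

// bucket is a hypothetical futex hash bucket; its zero value means
// no waiters are queued.
type bucket struct {
	waiters []int // IDs of waiting tasks (illustrative)
}

func (b *bucket) empty() bool { return len(b.waiters) == 0 }

// saveBuckets records only non-empty buckets, keyed by index, so a
// table of thousands of idle buckets serializes to almost nothing.
func saveBuckets(table []bucket) map[int]bucket {
	saved := make(map[int]bucket)
	for i := range table {
		if !table[i].empty() {
			saved[i] = table[i]
		}
	}
	return saved
}

// loadBuckets rebuilds the full table, leaving unsaved entries at the
// zero value.
func loadBuckets(n int, saved map[int]bucket) []bucket {
	table := make([]bucket, n)
	for i, b := range saved {
		table[i] = b
	}
	return table
}

func main() {
	table := make([]bucket, 4096)
	table[7].waiters = []int{42}
	saved := saveBuckets(table)
	fmt.Println(len(saved)) // 1
	restored := loadBuckets(len(table), saved)
	fmt.Println(restored[7].waiters) // [42]
}
```
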
Justine Olshan a6dbef045f Added a resume command to unpause a paused container.
Resume checks the status of the container and unpauses the kernel
if its status is paused. Otherwise nothing happens.
Tests were added to ensure that the process is in the correct state
after various commands.

PiperOrigin-RevId: 201251234
Change-Id: Ifd11b336c33b654fea6238738f864fcf2bf81e19
2018-06-19 15:23:36 -07:00
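
The resume rule from the commit above, written as a minimal state-machine sketch. The `container` and `status` types and the `unpauseKernel` hook are hypothetical stand-ins for runsc's real types. Only the rule itself comes from the commit message: unpause when paused, otherwise do nothing.

```go
package main

import "fmt"

// status is a hypothetical container status, not runsc's actual type.
type status int

const (
	running status = iota
	paused
	stopped
)

type container struct {
	status status
	// unpauseKernel stands in for resuming the sandboxed kernel.
	unpauseKernel func() error
}

// resume unpauses the kernel only when the container is paused;
// in any other state it does nothing.
func (c *container) resume() error {
	if c.status != paused {
		return nil
	}
	if err := c.unpauseKernel(); err != nil {
		return err
	}
	c.status = running
	return nil
}

func main() {
	calls := 0
	c := &container{status: paused, unpauseKernel: func() error { calls++; return nil }}
	_ = c.resume()
	_ = c.resume() // no-op: the container is already running
	fmt.Println(c.status == running, calls) // true 1
}
```
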
Brian Geffon bda2a1ed35 Rpcinet is racy around shutdown flags.
Correct a data race in rpcinet where shutdown and recvmsg can
race on the shutdown flags.

PiperOrigin-RevId: 201238366
Change-Id: I5eb06df4a2b4eba331eeb5de19076213081d581f
2018-06-19 14:12:52 -07:00
Nicolas Lacasse 9db7cfad93 Add a new cache policy FSCACHE_WRITETHROUGH.
The new policy is identical to FSCACHE (which caches everything in memory), but
it also flushes writes to the backing fs agent immediately.

All gofer cache policy decisions have been moved into the cachePolicy type.
Previously they were sprinkled around the codebase.

There are many different things that we cache (page cache, negative dirents,
dirent LRU, unstable attrs, readdir results, ...), and I don't think we should
have individual flags to control each of these. Instead, we should have a few
high-level cache policies that are consistent and useful to users. This
refactoring makes it easy to add more such policies.

PiperOrigin-RevId: 201206937
Change-Id: I6e225c382b2e5e1b0ad4ccf8ca229873f4cd389d
2018-06-19 11:10:11 -07:00
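
A minimal sketch of the refactoring idea in the commit above: a single policy type answers every cache question, so adding a policy means adding one constant and updating a few methods. The constant and method names are hypothetical, and the sketch covers only two of the cached items the commit lists (page cache and negative dirents).

```go
package main

import "fmt"

// cachePolicy is a hypothetical policy type: every caching decision is
// a method on it, instead of a separate flag per cached item.
type cachePolicy int

const (
	cacheNone            cachePolicy = iota // no caching
	cacheAll                                // FSCACHE: cache everything in memory
	cacheAllWritethrough                    // FSCACHE_WRITETHROUGH: cache, but flush writes immediately
)

// usePageCache reports whether file contents may be cached in memory.
func (p cachePolicy) usePageCache() bool {
	return p == cacheAll || p == cacheAllWritethrough
}

// cacheNegativeDirents reports whether failed lookups may be remembered.
func (p cachePolicy) cacheNegativeDirents() bool {
	return p.usePageCache()
}

// writeThrough reports whether each write must be flushed to the backing
// file system agent before it completes.
func (p cachePolicy) writeThrough() bool {
	return p == cacheAllWritethrough
}

func main() {
	for _, p := range []cachePolicy{cacheNone, cacheAll, cacheAllWritethrough} {
		fmt.Println(p.usePageCache(), p.writeThrough())
	}
	// Output:
	// false false
	// true false
	// true true
}
```
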
Zhaozhong Ni 5581256f87 state: include I/O and protobuf time in kernel S/R timing stats.
PiperOrigin-RevId: 201205733
Change-Id: I300307b0668989ba7776ab9e3faee71efdd33f46
2018-06-19 11:04:54 -07:00
Brian Geffon 4fd1d40e1d Rpcinet needs to track shutdown state for blocking sockets.
Rpcinet emulates a blocking socket on top of an RPC-based non-blocking
socket. After a shutdown(SHUT_RD), a read on a non-blocking socket may
return EWOULDBLOCK. A blocking socket, however, knows it cannot receive
any more data and would otherwise block indefinitely, so in that case
Linux returns 0. We have to track this state on the rpcinet sentry side
to emulate that behavior, because the remote side has no way to know
whether the socket is actually blocking within the sentry.

PiperOrigin-RevId: 201201618
Change-Id: I4ac3a7b74b5dae471ab97c2e7d33b83f425aedac
2018-06-19 10:43:30 -07:00
Brian Geffon 563a71ef24 Add rpcinet support for control messages.
Add support for control messages, but at this time the only
control message that the sentry will support here is SO_TIMESTAMP.

PiperOrigin-RevId: 200922230
Change-Id: I63a852d9305255625d9df1d989bd46a66e93c446
2018-06-17 17:06:40 -07:00
Michael Pratt bd2d1aaa16 Replace crypto/rand with internal rand package
PiperOrigin-RevId: 200784607
Change-Id: I39aa6ee632936dcbb00fc298adccffa606e9f4c0
2018-06-15 15:36:00 -07:00
Zhaozhong Ni fc8ca72a32 sentry: do not start delivering external signal immediately.
PiperOrigin-RevId: 200765756
Change-Id: Ie4266f32e4e977df3925eb29f3fbb756e0337606
2018-06-15 13:38:14 -07:00
Brian Geffon fa6db05e0c FIFOs should support O_TRUNC as a no-op.
PiperOrigin-RevId: 200759323
Change-Id: I683b2edcc2188304c4ca563e46af457e23625905
2018-06-15 12:55:29 -07:00
Adin Scannell b31ac4e1df Use notify explicitly on unlock path.
There are circumstances under which the redpill call will not generate
the appropriate action and notification. Replace this call with an
explicit notification, which is guaranteed to transition as well as
perform the futex wake.

PiperOrigin-RevId: 200726934
Change-Id: Ie19e008a6007692dd7335a31a8b59f0af6e54aaa
2018-06-15 09:30:08 -07:00
Fabricio Voznika 119a302ceb Implement /proc/thread-self
Closes #68

PiperOrigin-RevId: 200725401
Change-Id: I4827009b8aee89d22887c3af67291ccf7058d420
2018-06-15 09:18:00 -07:00
Jamie Liu 657db692b2 Ignore expiration count in kernelCPUClockListener.Notify.
PiperOrigin-RevId: 200590832
Change-Id: I35b817ecccc9414a742dee4815dfc67d0c7d0496
2018-06-14 11:35:11 -07:00
Ian Gudger f5d0c59f5c Fix reference leak in VDSO validation
PiperOrigin-RevId: 200496070
Change-Id: I33adb717c44e5b4bcadece882be3ab1ee3920556
2018-06-13 20:00:55 -07:00
Brian Geffon 1170039e78 Fix missing returns in rpcinet.
PiperOrigin-RevId: 200472634
Change-Id: I3f0fb9e3b2f8616e6aa1569188258f330bf1ed31
2018-06-13 16:21:23 -07:00
Adin Scannell 7b7b199ed0 Deflake kvm_test.
PiperOrigin-RevId: 200439846
Change-Id: I9970fe0716cb02f0f41b754891d55db7e0729f56
2018-06-13 13:05:33 -07:00
Fabricio Voznika 717f2501c9 Fix failure to mount volume that sandbox process has no access to
The boot loader tries to stat the mount to determine whether it's a file or not.
This may fail if the sandbox process doesn't have access to the file. Instead,
add an overlay on top of the file, which is better anyway since we don't want to
propagate changes to the host.

PiperOrigin-RevId: 200411261
Change-Id: I14222410e8bc00ed037b779a1883d503843ffebb
2018-06-13 10:20:06 -07:00
Zhaozhong Ni 686093669e sentry: do not treat all save errors as state file errors.
PiperOrigin-RevId: 200410220
Change-Id: I6a8745e33be949e335719083501f18b24f6ba471
2018-06-13 10:14:15 -07:00
Jamie Liu 55b9058456 Log filemem state when panicking due to invalid refcount.
PiperOrigin-RevId: 200408305
Change-Id: I676ee49ec77697105723577928c7f82088cd378e
2018-06-13 10:03:54 -07:00
Ian Gudger ba426f7782 Fix reference leak for negative dirents
PiperOrigin-RevId: 200306715
Change-Id: I7c80059c77ebd3d9a5d7d48b05c8e7a597f10850
2018-06-12 17:04:20 -07:00
Brian Geffon c2b3f04d1c Rpcinet doesn't handle SO_RCVTIMEO properly.
Rpcinet already inherits socket.ReceiveTimeout; however, it's
never set on setsockopt(2). The value is currently forwarded
as an RPC and ignored, since all sockets are non-blocking
on the RPC side.

PiperOrigin-RevId: 200299260
Change-Id: I6c610ea22c808ff6420c63759dccfaeab17959dd
2018-06-12 16:16:15 -07:00
Brielle Broder 711a9869e5 Runsc checkpoint works.
This is the first iteration of checkpoint that actually saves to a file.
Tests for checkpoint are included.

Ran into an issue when private unix sockets are enabled. An error message
was added for this case and the mutex state was set.

PiperOrigin-RevId: 200269470
Change-Id: I28d29a9f92c44bf73dc4a4b12ae0509ee4070e93
2018-06-12 13:25:23 -07:00
Jamie Liu 7a10df454b Drop MMapOpts.MappingIdentity reference in loader.mapSegment.
PiperOrigin-RevId: 200261995
Change-Id: I7e460b18ceab2c23096bdeb7416159d6e774aaf7
2018-06-12 12:38:02 -07:00
Adin Scannell 41f766893a Minor ring0 interface cleanup.
- Remove unused methods.
- Provide declaration for asm function.

PiperOrigin-RevId: 200146850
Change-Id: Ic455c96ffe0d2e78ef15f824eb65d7de705b054a
2018-06-11 18:17:15 -07:00
Adin Scannell 1397a413b4 Make page tables split-safe.
In order to minimize the likelihood of exit during page table
modifications, make the full set of page table functions split-safe.
This is not strictly necessary (and you may still incur splits due to
allocations from the allocator pool) but should make retries a very rare
occurrence.

PiperOrigin-RevId: 200146688
Change-Id: I8fa36aa16b807beda2f0b057be60038258e8d597
2018-06-11 18:15:14 -07:00
Adin Scannell 09b0a9c320 Handle all exception vectors.
PiperOrigin-RevId: 200144655
Change-Id: I5a753c74b75007b7714d6fe34aa0d2e845dc5c41
2018-06-11 17:57:19 -07:00
Fabricio Voznika ea4a468fba Set CLOEXEC option to sockets
hostinet/socket.go: the Sentry doesn't spawn new processes, but it doesn't hurt to protect the socket from leaking.
unet/unet.go: should set close-on-exec. The FD is explicitly donated to children when needed.

PiperOrigin-RevId: 200135682
Change-Id: Ia8a45ced1e00a19420c8611b12e7a8ee770f89cb
2018-06-11 16:45:50 -07:00
Brian Geffon ab2c2575d6 Rpcinet is incorrectly handling MSG_TRUNC with SOCK_STREAM
SOCK_STREAM has special behavior with respect to MSG_TRUNC. Specifically,
the data isn't actually copied back out to userspace when MSG_TRUNC is
provided on a SOCK_STREAM.

According to tcp(7): "Since version 2.4, Linux supports the use of
MSG_TRUNC in the flags argument of recv(2) (and recvmsg(2)). This flag
causes the received bytes of data to be discarded, rather than passed
back in a caller-supplied buffer."

PiperOrigin-RevId: 200134860
Change-Id: I70f17a5f60ffe7794c3f0cfafd131c069202e90d
2018-06-11 16:40:38 -07:00
Brian Geffon 0412f17e06 rpcinet is treating EAGAIN and EWOULDBLOCK as different errnos.
PiperOrigin-RevId: 200124614
Change-Id: I38a7b083f1464a2a586fe24db648e624c455fec5
2018-06-11 15:34:08 -07:00
Fabricio Voznika 7260363751 Add O_TRUNC handling in openat
PiperOrigin-RevId: 200103677
Change-Id: I3efb565c30c64d35f8fd7b5c05ed78dcc2990c51
2018-06-11 13:35:21 -07:00
Kevin Krakauer 032b0398a5 Sentry: split tty.queue into its own file.
Minor refactor. line_discipline.go was home to 2 large structs (lineDiscipline
and queue), and queue is now large enough IMO to get its own file.

Also moves queue locks into the queue struct, making locking simpler.

PiperOrigin-RevId: 200080301
Change-Id: Ia75a0e9b3d9ac8d7e5a0f0099a54e1f5b8bdea34
2018-06-11 11:09:43 -07:00
Adin Scannell c0ab059e7b Fix kernel flags handling and add missing vectors.
PiperOrigin-RevId: 199877174
Change-Id: I9d19ea301608c2b989df0a6123abb1e779427853
2018-06-08 17:51:50 -07:00
Brian Geffon 2fbd1cf57c Add checks for short CopyOut in rpcinet
PiperOrigin-RevId: 199864753
Change-Id: Ibace6a1fdf99ee6ce368ac12c390aa8a02dbdfb7
2018-06-08 15:58:22 -07:00
Adin Scannell 6728f09910 Fix sigaltstack semantics.
Walking off the bottom of the sigaltstack, for example with recursive faults,
results in forced signal delivery, rather than resetting the stack or pushing
the signal frame onto whatever happens to lie below the signal stack.

PiperOrigin-RevId: 199856085
Change-Id: I0004d2523f0df35d18714de2685b3eaa147837e0
2018-06-08 15:01:21 -07:00
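
A simplified model of signal-frame placement that follows the rule in the commit above. It assumes an alternate stack is configured and ignores alignment and the x86-64 red zone; the `altStack` type and `placeFrame` function are hypothetical. If the thread is already on the alternate stack, or switches to it for an SA_ONSTACK handler, and the frame would extend below the stack's base, placement fails. Linux then forces a SIGSEGV.

```go
package main

import "fmt"

// altStack describes a downward-growing alternate signal stack
// occupying [base, base+size).
type altStack struct {
	base, size uint64
}

// contains mirrors the kernel's on-signal-stack test for a
// downward-growing stack.
func (s altStack) contains(sp uint64) bool {
	return sp > s.base && sp-s.base <= s.size
}

// placeFrame returns the stack pointer for a signal frame of frameSize
// bytes, or false when the frame would walk off the bottom of the
// alternate stack (the caller must then force SIGSEGV).
func placeFrame(sp, frameSize uint64, alt altStack, saOnStack bool) (uint64, bool) {
	useAlt := alt.contains(sp) // already on the alternate stack, e.g. a recursive fault
	if !useAlt && saOnStack {
		sp = alt.base + alt.size // switch to the top of the alternate stack
		useAlt = true
	}
	if useAlt && frameSize >= sp-alt.base {
		// The frame does not fit above the bottom of the alternate stack.
		return 0, false
	}
	return sp - frameSize, true
}

func main() {
	alt := altStack{base: 0x1000, size: 0x1000}
	sp, ok := placeFrame(0x9000, 0x400, alt, true)
	fmt.Printf("%#x %v\n", sp, ok) // 0x1c00 true
	_, ok = placeFrame(0x1300, 0x400, alt, true) // recursive fault near the bottom
	fmt.Println(ok) // false
}
```
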
Bhasker Hariharan de8dba205f Add a protocol option to set congestion control algorithm.
Also adds support to query available congestion control algorithms.

PiperOrigin-RevId: 199826897
Change-Id: I2b338b709820ee9cf58bb56d83aa7b1a39f4eab2
2018-06-08 11:46:23 -07:00
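
A minimal sketch of the option pair described in the commit above. The setter accepts only algorithms the stack implements, and the query lists them space-separated, in the same format as Linux's net.ipv4.tcp_available_congestion_control. The `stack` type, its method names, and the algorithm names in the example are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// stack is a hypothetical network stack holding the TCP congestion
// control setting.
type stack struct {
	available []string // algorithms this stack implements
	current   string
}

var errUnknownAlgorithm = errors.New("unknown congestion control algorithm")

// setCongestionControl selects an algorithm only if it is available;
// otherwise the current setting is left unchanged.
func (s *stack) setCongestionControl(name string) error {
	for _, a := range s.available {
		if a == name {
			s.current = name
			return nil
		}
	}
	return errUnknownAlgorithm
}

// availableCongestionControl lists the supported algorithms,
// space-separated.
func (s *stack) availableCongestionControl() string {
	return strings.Join(s.available, " ")
}

func main() {
	s := &stack{available: []string{"reno", "cubic"}, current: "reno"}
	fmt.Println(s.availableCongestionControl()) // reno cubic
	err := s.setCongestionControl("bbr")
	fmt.Println(err, s.current) // unknown congestion control algorithm reno
}
```
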
Brian Geffon 2f3895d6f7 rpcinet is not correctly handling MSG_TRUNC on recvmsg(2).
MSG_TRUNC can cause recvmsg(2) to return a value larger than
the buffer size. In this situation it's an indication that the
buffer was completely filled and that the msg was truncated.
Previously, rpcinet returned the buffer size, but it should
return the payload length as returned by the syscall.

PiperOrigin-RevId: 199814221
Change-Id: If09aa364219c1bf193603896fcc0dc5c55e85d21
2018-06-08 10:33:25 -07:00
Brian Geffon 5c37097e34 rpcinet should not block in read(2) rpcs.
PiperOrigin-RevId: 199703609
Change-Id: I8153b0396b22a230a68d4b69c46652a5545f7630
2018-06-07 15:10:15 -07:00
Brian Geffon 7e9893eeb5 Add missing rpcinet ioctls.
PiperOrigin-RevId: 199669120
Change-Id: I0be88cdbba29760f967e9a5bb4144ca62c1ed7aa
2018-06-07 11:37:16 -07:00
Kevin Krakauer 9170303105 Sentry: very basic terminal echo support.
Adds support for echo to terminals. Echoing is just copying input back out to
the user, e.g. when I type "foo" into a terminal, I expect "foo" to be echoed
back to my terminal.

Also makes the transform function part of the queue, eliminating the need to
pass them around together and the possibility of using the wrong transform for a
queue.

PiperOrigin-RevId: 199655147
Change-Id: I37c490d4fc1ee91da20ae58ba1f884a5c14fd0d8
2018-06-07 10:21:22 -07:00
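
A much-simplified sketch of the two ideas in the commit above: each queue owns its lock and its transform, and the input transform echoes bytes to the output queue. The `queue` type, its fields, and the wiring in `main` are hypothetical. A real line discipline also handles canonical mode, special characters, and readers.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// queue is a hypothetical terminal queue. It owns its lock and its
// transform, so a caller cannot pair a queue with the wrong transform.
type queue struct {
	mu        sync.Mutex
	buf       bytes.Buffer
	transform func(q *queue, b []byte) // applied to every write into this queue
}

// write applies the queue's transform while holding the queue's lock.
func (q *queue) write(b []byte) {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.transform(q, b)
}

func (q *queue) String() string {
	q.mu.Lock()
	defer q.mu.Unlock()
	return q.buf.String()
}

func main() {
	output := &queue{transform: func(q *queue, b []byte) { q.buf.Write(b) }}
	// The input transform stores the bytes for the reader and, with echo
	// enabled, also copies them to the output queue.
	input := &queue{transform: func(q *queue, b []byte) {
		q.buf.Write(b)
		output.write(b)
	}}
	input.write([]byte("foo"))
	fmt.Println(input.String(), output.String()) // foo foo
}
```

Locks are always taken input first, then output, so the echo path cannot deadlock against itself.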
Adin Scannell d269845159 Ensure guest-mode for page table modifications.
Because of the KVM shadow page table implementation, modifications made
to guest page tables from host mode may not be synchronized correctly,
resulting in undefined behavior. This is a KVM bug: page table pages
should also be tracked for host modifications and resynced appropriately
(e.g. the guest could "DMA" into a page table page in theory).

However, since we can't rely on this being fixed everywhere, work around
the issue by forcing page table modifications to be in guest mode. This
will generally be the case anyway, but now if an exit occurs during
modifications, we will re-enter and perform the modifications again.

PiperOrigin-RevId: 199587895
Change-Id: I83c20b4cf2a9f9fa56f59f34939601dd34538fb0
2018-06-06 23:26:14 -07:00
Adin Scannell 3374849cb5 Split PCID implementation from page tables.
Instead of associating a single PCID with each set of page tables (which
will reach the maximum quickly), allow a dynamic pool for each vCPU.
This is the same way that Linux operates. We also split management of
PCIDs out of the page tables themselves for simplicity.

PiperOrigin-RevId: 199585631
Change-Id: I42f3486ada3cb2a26f623c65ac279b473ae63201
2018-06-06 22:52:55 -07:00
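
A minimal sketch of a per-vCPU PCID pool. It assumes simple FIFO reclamation when the pool runs out; Linux and gVisor may choose victims differently. The `pcidPool` and `pageTables` types are hypothetical, and keeping PCID 0 out of the pool is a choice made only for this sketch. A fresh assignment (`reused == false`) signals that stale TLB entries tagged with that PCID must be flushed before use.

```go
package main

import "fmt"

// pageTables is a hypothetical stand-in for one address space's page tables.
type pageTables struct{ name string }

// pcidPool is a hypothetical per-vCPU PCID allocator: each vCPU owns one
// pool, and the oldest assignment is reclaimed when the pool is empty.
type pcidPool struct {
	avail    []uint16               // free PCIDs
	assigned map[*pageTables]uint16 // page tables currently holding a PCID
	order    []*pageTables          // assignment order, oldest first
}

func newPCIDPool(n int) *pcidPool {
	p := &pcidPool{assigned: make(map[*pageTables]uint16)}
	for id := 1; id <= n; id++ { // this sketch never hands out PCID 0
		p.avail = append(p.avail, uint16(id))
	}
	return p
}

// assign returns the PCID for pt and whether pt already held it.
func (p *pcidPool) assign(pt *pageTables) (uint16, bool) {
	if existing, ok := p.assigned[pt]; ok {
		return existing, true
	}
	if len(p.avail) == 0 {
		victim := p.order[0] // reclaim the oldest assignment
		p.order = p.order[1:]
		p.avail = append(p.avail, p.assigned[victim])
		delete(p.assigned, victim)
	}
	id := p.avail[len(p.avail)-1]
	p.avail = p.avail[:len(p.avail)-1]
	p.assigned[pt] = id
	p.order = append(p.order, pt)
	return id, false
}

func main() {
	pool := newPCIDPool(2)
	a, b, c := &pageTables{"a"}, &pageTables{"b"}, &pageTables{"c"}
	idA, _ := pool.assign(a)
	pool.assign(b)
	idC, _ := pool.assign(c) // pool exhausted: a's PCID is reclaimed
	_, reusedA := pool.assign(a)
	fmt.Println(idA == idC, reusedA) // true false
}
```
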
Adin Scannell 1b5062263b Add allocator abstraction for page tables.
In order to prevent possible garbage collection and reuse of page table
pages prior to invalidation, introduce a formal allocator abstraction
that can ensure entries are held during a single traversal. This also
cleans up the abstraction and splits it out of the machine itself.

PiperOrigin-RevId: 199581636
Change-Id: I2257d5d7ffd9c36f9b7ecd42f769261baeaf115c
2018-06-06 21:48:24 -07:00
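
A minimal sketch of the deferred-reuse idea in the commit above. It assumes freed page-table pages are parked until the caller has invalidated and calls `recycle`; the `allocator` type and its method names are hypothetical. Holding the pointers in `pending` also keeps the Go garbage collector from reclaiming a page that may still be referenced through a physical address.

```go
package main

import "fmt"

// ptePage is a hypothetical page-table page (512 eight-byte entries).
type ptePage struct{ entries [512]uint64 }

// allocator parks freed pages instead of reusing them immediately, so a
// page cannot be handed out again before invalidation has completed.
type allocator struct {
	pool    []*ptePage // pages safe to reuse
	pending []*ptePage // freed, but not yet safe to reuse
}

// newPTEs returns a zeroed page, preferring recycled pages.
func (a *allocator) newPTEs() *ptePage {
	if n := len(a.pool); n > 0 {
		p := a.pool[n-1]
		a.pool = a.pool[:n-1]
		*p = ptePage{}
		return p
	}
	return new(ptePage)
}

// freePTEs parks p until the next recycle.
func (a *allocator) freePTEs(p *ptePage) {
	a.pending = append(a.pending, p)
}

// recycle is called after invalidation; only then do freed pages become
// reusable.
func (a *allocator) recycle() {
	a.pool = append(a.pool, a.pending...)
	a.pending = a.pending[:0]
}

func main() {
	var a allocator
	p := a.newPTEs()
	a.freePTEs(p)
	q := a.newPTEs() // p is still pending, so a different page is returned
	a.recycle()
	r := a.newPTEs() // after recycle, p is reused
	fmt.Println(p == q, p == r) // false true
}
```
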
Brian Geffon 79fef54eb1 Add support for rpcinet ioctl(2).
This change adds support for the ioctls that were previously
supported by netstack.

PiperOrigin-RevId: 199544114
Change-Id: I3769202c19502c3b7d05e06ea9552acfd9255893
2018-06-06 15:53:26 -07:00