Commit Graph

1526 Commits

Author SHA1 Message Date
Adin Scannell b31ac4e1df Use notify explicitly on unlock path.
There are circumstances under which the redpill call will not generate
the appropriate action and notification. Replace this call with an
explicit notification, which is guaranteed to transition as well as
perform the futex wake.

PiperOrigin-RevId: 200726934
Change-Id: Ie19e008a6007692dd7335a31a8b59f0af6e54aaa
2018-06-15 09:30:08 -07:00
Fabricio Voznika ef5dd4df9b Set kernel.applicationCores to the number of processor on the host
The right number to use is the number of processors assigned to the cgroup. But until
we make the sandbox join the respective cgroup, just use the number of processors on
the host.

Closes #65, closes #66

PiperOrigin-RevId: 200725483
Change-Id: I34a566b1a872e26c66f56fa6e3100f42aaf802b1
2018-06-15 09:19:04 -07:00
Fabricio Voznika 119a302ceb Implement /proc/thread-self
Closes #68

PiperOrigin-RevId: 200725401
Change-Id: I4827009b8aee89d22887c3af67291ccf7058d420
2018-06-15 09:18:00 -07:00
Adin Scannell 1eb1bf8670 Update contributing guidelines with an example.
Fixes #69

PiperOrigin-RevId: 200683809
Change-Id: I1312ebb3775d5f9088e9108359c19e2dedbb7b70
2018-06-15 01:22:08 -07:00
Brielle Broder bd1e83ff60 Fix typo.
PiperOrigin-RevId: 200631795
Change-Id: I297fe3e30fb06b04fccd8358c933e45019dcc1fa
2018-06-14 15:45:10 -07:00
Jamie Liu 657db692b2 Ignore expiration count in kernelCPUClockListener.Notify.
PiperOrigin-RevId: 200590832
Change-Id: I35b817ecccc9414a742dee4815dfc67d0c7d0496
2018-06-14 11:35:11 -07:00
Michael Pratt d71f5ef688 Add nanosleep filter for Go 1.11 support
golang.org/cl/108538 replaces pselect6 with nanosleep in runtime.usleep. Update
the filters accordingly.

PiperOrigin-RevId: 200574612
Change-Id: Ifb2296fcb3781518fc047aabbbffedb9ae488cd7
2018-06-14 10:11:05 -07:00
Ian Gudger f5d0c59f5c Fix reference leak in VDSO validation
PiperOrigin-RevId: 200496070
Change-Id: I33adb717c44e5b4bcadece882be3ab1ee3920556
2018-06-13 20:00:55 -07:00
Brian Geffon 1170039e78 Fix missing returns in rpcinet.
PiperOrigin-RevId: 200472634
Change-Id: I3f0fb9e3b2f8616e6aa1569188258f330bf1ed31
2018-06-13 16:21:23 -07:00
Adin Scannell 7b7b199ed0 Deflake kvm_test.
PiperOrigin-RevId: 200439846
Change-Id: I9970fe0716cb02f0f41b754891d55db7e0729f56
2018-06-13 13:05:33 -07:00
Fabricio Voznika 717f2501c9 Fix failure to mount volume that sandbox process has no access
Boot loader tries to stat mount to determine whether it's a file or not. This
may file if the sandbox process doesn't have access to the file. Instead, add
overlay on top of file, which is better anyway since we don't want to propagate
changes to the host.

PiperOrigin-RevId: 200411261
Change-Id: I14222410e8bc00ed037b779a1883d503843ffebb
2018-06-13 10:20:06 -07:00
Zhaozhong Ni 686093669e sentry: do not treat all save errors as state file errors.
PiperOrigin-RevId: 200410220
Change-Id: I6a8745e33be949e335719083501f18b24f6ba471
2018-06-13 10:14:15 -07:00
Jamie Liu 55b9058456 Log filemem state when panicing due to invalid refcount.
PiperOrigin-RevId: 200408305
Change-Id: I676ee49ec77697105723577928c7f82088cd378e
2018-06-13 10:03:54 -07:00
Ian Gudger ba426f7782 Fix reference leak for negative dirents
PiperOrigin-RevId: 200306715
Change-Id: I7c80059c77ebd3d9a5d7d48b05c8e7a597f10850
2018-06-12 17:04:20 -07:00
Brian Geffon c2b3f04d1c Rpcinet doensn't handle SO_RCVTIMEO properly.
Rpcinet already inherits socket.ReceiveTimeout; however, it's
never set on setsockopt(2). The value is currently forwarded
as an RPC and ignored as all sockets will be non-blocking
on the RPC side.

PiperOrigin-RevId: 200299260
Change-Id: I6c610ea22c808ff6420c63759dccfaeab17959dd
2018-06-12 16:16:15 -07:00
Lantao Liu 2506b9b11f runsc: do not include sub target if it is not started with '/'.
PiperOrigin-RevId: 200274828
Change-Id: I956703217df08d8650a881479b7ade8f9f119912
2018-06-12 13:54:54 -07:00
Brielle Broder 711a9869e5 Runsc checkpoint works.
This is the first iteration of checkpoint that actually saves to a file.
Tests for checkpoint are included.

Ran into an issue when private unix sockets are enabled. An error message
was added for this case and the mutex state was set.

PiperOrigin-RevId: 200269470
Change-Id: I28d29a9f92c44bf73dc4a4b12ae0509ee4070e93
2018-06-12 13:25:23 -07:00
Jamie Liu 7a10df454b Drop MMapOpts.MappingIdentity reference in loader.mapSegment.
PiperOrigin-RevId: 200261995
Change-Id: I7e460b18ceab2c23096bdeb7416159d6e774aaf7
2018-06-12 12:38:02 -07:00
Kevin Krakauer 2dc9cd7bf7 runsc: enable terminals in the sandbox.
runsc now mounts the devpts filesystem, so you get a real terminal using
ssh+sshd.

PiperOrigin-RevId: 200244830
Change-Id: If577c805ad0138fda13103210fa47178d8ac6605
2018-06-12 11:03:25 -07:00
Fabricio Voznika 48335318a2 Enable debug logging in tests
Unit tests call runsc directly now, so all command line arguments
are valid. On the other hand, enabling debug in the test binary
doesn't affect runsc. It needs to be set in the config.

PiperOrigin-RevId: 200237706
Change-Id: I0b5922db17f887f58192dbc2f8dd2fd058b76ec7
2018-06-12 10:25:55 -07:00
Adin Scannell 41f766893a Minor ring0 interface cleanup.
- Remove unused methods.
- Provide declaration for asm function.

PiperOrigin-RevId: 200146850
Change-Id: Ic455c96ffe0d2e78ef15f824eb65d7de705b054a
2018-06-11 18:17:15 -07:00
Adin Scannell 1397a413b4 Make page tables split-safe.
In order to minimize the likelihood of exit during page table
modifications, make the full set of page table functions split-safe.
This is not strictly necessary (and you may still incur splits due to
allocations from the allocator pool) but should make retries a very rare
occurance.

PiperOrigin-RevId: 200146688
Change-Id: I8fa36aa16b807beda2f0b057be60038258e8d597
2018-06-11 18:15:14 -07:00
Adin Scannell 09b0a9c320 Handle all exception vectors.
PiperOrigin-RevId: 200144655
Change-Id: I5a753c74b75007b7714d6fe34aa0d2e845dc5c41
2018-06-11 17:57:19 -07:00
Fabricio Voznika ea4a468fba Set CLOEXEC option to sockets
hostinet/socket.go: the Sentry doesn't spawn new processes, but it doesn't hurt to protect the socket from leaking.
unet/unet.go: should be setting closing on exec. The FD is explicitly donated to children when needed.

PiperOrigin-RevId: 200135682
Change-Id: Ia8a45ced1e00a19420c8611b12e7a8ee770f89cb
2018-06-11 16:45:50 -07:00
Brian Geffon ab2c2575d6 Rpcinet is incorrectly handling MSG_TRUNC with SOCK_STREAM
SOCK_STREAM has special behavior with respect to MSG_TRUNC. Specifically,
the data isn't actually copied back out to userspace when MSG_TRUNC is
provided on a SOCK_STREAM.

According to tcp(7): "Since version 2.4, Linux supports the use of
MSG_TRUNC in the flags argument of recv(2) (and recvmsg(2)). This flag
causes the received bytes of data to be discarded, rather than passed
back in a caller-supplied buffer."

PiperOrigin-RevId: 200134860
Change-Id: I70f17a5f60ffe7794c3f0cfafd131c069202e90d
2018-06-11 16:40:38 -07:00
Brian Geffon 0412f17e06 rpcinet is treating EAGAIN and EWOULDBLOCK as different errnos.
PiperOrigin-RevId: 200124614
Change-Id: I38a7b083f1464a2a586fe24db648e624c455fec5
2018-06-11 15:34:08 -07:00
Fabricio Voznika 7260363751 Add O_TRUNC handling in openat
PiperOrigin-RevId: 200103677
Change-Id: I3efb565c30c64d35f8fd7b5c05ed78dcc2990c51
2018-06-11 13:35:21 -07:00
Kevin Krakauer 032b0398a5 Sentry: split tty.queue into its own file.
Minor refactor. line_discipline.go was home to 2 large structs (lineDiscipline
and queue), and queue is now large enough IMO to get its own file.

Also moves queue locks into the queue struct, making locking simpler.

PiperOrigin-RevId: 200080301
Change-Id: Ia75a0e9b3d9ac8d7e5a0f0099a54e1f5b8bdea34
2018-06-11 11:09:43 -07:00
Adin Scannell c0ab059e7b Fix kernel flags handling and add missing vectors.
PiperOrigin-RevId: 199877174
Change-Id: I9d19ea301608c2b989df0a6123abb1e779427853
2018-06-08 17:51:50 -07:00
Brian Geffon 2fbd1cf57c Add checks for short CopyOut in rpcinet
PiperOrigin-RevId: 199864753
Change-Id: Ibace6a1fdf99ee6ce368ac12c390aa8a02dbdfb7
2018-06-08 15:58:22 -07:00
Adin Scannell 6728f09910 Fix sigaltstack semantics.
Walking off the bottom of the sigaltstack, for example with recursive faults,
results in forced signal delivery, not resetting the stack or pushing signal
stack to whatever happens to lie below the signal stack.

PiperOrigin-RevId: 199856085
Change-Id: I0004d2523f0df35d18714de2685b3eaa147837e0
2018-06-08 15:01:21 -07:00
Bhasker Hariharan de8dba205f Add a protocol option to set congestion control algorithm.
Also adds support to query available congestion control algorithms.

PiperOrigin-RevId: 199826897
Change-Id: I2b338b709820ee9cf58bb56d83aa7b1a39f4eab2
2018-06-08 11:46:23 -07:00
Brian Geffon 2f3895d6f7 rpcinet is not correctly handling MSG_TRUNC on recvmsg(2).
MSG_TRUNC can cause recvmsg(2) to return a value larger than
the buffer size. In this situation it's an indication that the
buffer was completely filled and that the msg was truncated.
Previously in rpcinet we were returning the buffer size but we
should actually be returning the payload length as returned by
the syscall.

PiperOrigin-RevId: 199814221
Change-Id: If09aa364219c1bf193603896fcc0dc5c55e85d21
2018-06-08 10:33:25 -07:00
Fabricio Voznika 5c51bc51e4 Drop capabilities not needed by Gofer
PiperOrigin-RevId: 199808391
Change-Id: Ib37a4fb6193dc85c1f93bc16769d6aa41854b9d4
2018-06-08 09:59:26 -07:00
Brian Geffon 5c37097e34 rpcinet should not block in read(2) rpcs.
PiperOrigin-RevId: 199703609
Change-Id: I8153b0396b22a230a68d4b69c46652a5545f7630
2018-06-07 15:10:15 -07:00
Brian Geffon 7e9893eeb5 Add missing rpcinet ioctls.
PiperOrigin-RevId: 199669120
Change-Id: I0be88cdbba29760f967e9a5bb4144ca62c1ed7aa
2018-06-07 11:37:16 -07:00
Kevin Krakauer 9170303105 Sentry: very basic terminal echo support.
Adds support for echo to terminals. Echoing is just copying input back out to
the user, e.g. when I type "foo" into a terminal, I expect "foo" to be echoed
back to my terminal.

Also makes the transform function part of the queue, eliminating the need to
pass them around together and the possibility of using the wrong transform for a
queue.

PiperOrigin-RevId: 199655147
Change-Id: I37c490d4fc1ee91da20ae58ba1f884a5c14fd0d8
2018-06-07 10:21:22 -07:00
Adin Scannell d269845159 Ensure guest-mode for page table modifications.
Because of the KVM shadow page table implementation, modifications made
to guest page tables from host mode may not be syncronized correctly,
resulting in undefined behavior. This is a KVM bug: page table pages
should also be tracked for host modifications and resynced appropriately
(e.g. the guest could "DMA" into a page table page in theory).

However, since we can't rely on this being fixed everywhere, workaround
the issue by forcing page table modifications to be in guest mode. This
will generally be the case anyways, but now if an exit occurs during
modifications, we will re-enter and perform the modifications again.

PiperOrigin-RevId: 199587895
Change-Id: I83c20b4cf2a9f9fa56f59f34939601dd34538fb0
2018-06-06 23:26:14 -07:00
Adin Scannell 3374849cb5 Split PCID implementation from page tables.
Instead of associating a single PCID with each set of page tables (which
will reach the maximum quickly), allow a dynamic pool for each vCPU.
This is the same way that Linux operates. We also split management of
PCIDs out of the page tables themselves for simplicity.

PiperOrigin-RevId: 199585631
Change-Id: I42f3486ada3cb2a26f623c65ac279b473ae63201
2018-06-06 22:52:55 -07:00
Adin Scannell 1b5062263b Add allocator abstraction for page tables.
In order to prevent possible garbage collection and reuse of page table
pages prior to invalidation, introduce a former allocator abstraction
that can ensure entries are held during a single traversal. This also
cleans up the abstraction and splits it out of the machine itself.

PiperOrigin-RevId: 199581636
Change-Id: I2257d5d7ffd9c36f9b7ecd42f769261baeaf115c
2018-06-06 21:48:24 -07:00
Kevin Krakauer 206e90d057 runsc: Support abbreviated container IDs.
Just a UI/usability addition. It's a lot easier to type "60" than
"60185c721d7e10c00489f1fa210ee0d35c594873d6376b457fb1815e4fdbfc2c".

PiperOrigin-RevId: 199547932
Change-Id: I19011b5061a88aba48a9ad7f8cf954a6782de854
2018-06-06 16:13:53 -07:00
Brian Geffon 79fef54eb1 Add support for rpcinet ioctl(2).
This change will add support for ioctls that have previously
been supported by netstack.

LINE_LENGTH_IGNORE

PiperOrigin-RevId: 199544114
Change-Id: I3769202c19502c3b7d05e06ea9552acfd9255893
2018-06-06 15:53:26 -07:00
Googler 0c34b460f2 Add runsc checkpoint command.
Checkpoint command is plumbed through container and sandbox.
Restore has also been added but it is only a stub. None of this
works yet. More changes to come.

PiperOrigin-RevId: 199510105
Change-Id: Ibd08d57f4737847eb25ca20b114518e487320185
2018-06-06 12:31:53 -07:00
Googler 722275c3d1 Added a function to the controller to checkpoint a container.
Functionality for checkpoint is not complete, more to come.

PiperOrigin-RevId: 199500803
Change-Id: Iafb0fcde68c584270000fea898e6657a592466f7
2018-06-06 11:43:55 -07:00
Brian Geffon ff7b4a156f Add support for rpcinet owned procfs files.
This change will add support for /proc/sys/net and /proc/net which will
be managed and owned by rpcinet. This will allow these inodes to be forward
as rpcs.

PiperOrigin-RevId: 199370799
Change-Id: I2c876005d98fe55dd126145163bee5a645458ce4
2018-06-05 15:45:35 -07:00
Zhaozhong Ni 343020ca27 netstack: make TCP endpoint closed and error state cleanup work synchronous.
So that when saving TCP endpoint in these states, there is no pending or
background activities.

Also lift tcp network save rejection error to tcpip package.

PiperOrigin-RevId: 199370748
Change-Id: Ief7b45c2a7338d12414cd7c23db95de6a9c22700
2018-06-05 15:44:38 -07:00
Fabricio Voznika 19a0e83b50 Make fsgofer attach more strict
Refuse to mount paths with "." and ".." in the path to prevent
a compromised Sentry to mount "../../secrets". Only allow
Attach to be called once per mount point.

PiperOrigin-RevId: 199225929
Change-Id: I2a3eb7ea0b23f22eb8dde2e383e32563ec003bd5
2018-06-04 18:04:54 -07:00
Fabricio Voznika 6c585b8eb6 Create destination mount dir if it doesn't exist
PiperOrigin-RevId: 199175296
Change-Id: I694ad1cfa65572c92f77f22421fdcac818f44630
2018-06-04 12:31:35 -07:00
Fabricio Voznika 78ccd1298e Return 'running' if gofer is still alive
Containerd will start deleting container and rootfs after container
is stopped. However, if gofer is still running, rootfs cleanup will
fail because of device busy.

This CL makes sure that gofer is not running when container state is
stopped.

Change from: lantaol@google.com

PiperOrigin-RevId: 199172668
Change-Id: I9d874eec3ecf74fd9c8edd7f62d9f998edef66fe
2018-06-04 12:14:23 -07:00
Fabricio Voznika 55a37ceef1 Fix leaky FD
9P socket was being created without CLOEXEC and was being inherited
by the children. This would prevent the gofer from detecting that the
sandbox had exited, because the socket would not be closed.

PiperOrigin-RevId: 199168959
Change-Id: I3ee1a07cbe7331b0aeb1cf2b697e728ce24f85a7
2018-06-04 11:52:17 -07:00