Commit Graph

171 Commits

Author SHA1 Message Date
Jamie Liu 657db692b2 Ignore expiration count in kernelCPUClockListener.Notify.
PiperOrigin-RevId: 200590832
Change-Id: I35b817ecccc9414a742dee4815dfc67d0c7d0496
2018-06-14 11:35:11 -07:00
Michael Pratt d71f5ef688 Add nanosleep filter for Go 1.11 support
golang.org/cl/108538 replaces pselect6 with nanosleep in runtime.usleep. Update
the filters accordingly.

PiperOrigin-RevId: 200574612
Change-Id: Ifb2296fcb3781518fc047aabbbffedb9ae488cd7
2018-06-14 10:11:05 -07:00
Ian Gudger f5d0c59f5c Fix reference leak in VDSO validation
PiperOrigin-RevId: 200496070
Change-Id: I33adb717c44e5b4bcadece882be3ab1ee3920556
2018-06-13 20:00:55 -07:00
Brian Geffon 1170039e78 Fix missing returns in rpcinet.
PiperOrigin-RevId: 200472634
Change-Id: I3f0fb9e3b2f8616e6aa1569188258f330bf1ed31
2018-06-13 16:21:23 -07:00
Adin Scannell 7b7b199ed0 Deflake kvm_test.
PiperOrigin-RevId: 200439846
Change-Id: I9970fe0716cb02f0f41b754891d55db7e0729f56
2018-06-13 13:05:33 -07:00
Fabricio Voznika 717f2501c9 Fix failure to mount volume that sandbox process has no access
Boot loader tries to stat mount to determine whether it's a file or not. This
may file if the sandbox process doesn't have access to the file. Instead, add
overlay on top of file, which is better anyway since we don't want to propagate
changes to the host.

PiperOrigin-RevId: 200411261
Change-Id: I14222410e8bc00ed037b779a1883d503843ffebb
2018-06-13 10:20:06 -07:00
Zhaozhong Ni 686093669e sentry: do not treat all save errors as state file errors.
PiperOrigin-RevId: 200410220
Change-Id: I6a8745e33be949e335719083501f18b24f6ba471
2018-06-13 10:14:15 -07:00
Jamie Liu 55b9058456 Log filemem state when panicing due to invalid refcount.
PiperOrigin-RevId: 200408305
Change-Id: I676ee49ec77697105723577928c7f82088cd378e
2018-06-13 10:03:54 -07:00
Ian Gudger ba426f7782 Fix reference leak for negative dirents
PiperOrigin-RevId: 200306715
Change-Id: I7c80059c77ebd3d9a5d7d48b05c8e7a597f10850
2018-06-12 17:04:20 -07:00
Brian Geffon c2b3f04d1c Rpcinet doensn't handle SO_RCVTIMEO properly.
Rpcinet already inherits socket.ReceiveTimeout; however, it's
never set on setsockopt(2). The value is currently forwarded
as an RPC and ignored as all sockets will be non-blocking
on the RPC side.

PiperOrigin-RevId: 200299260
Change-Id: I6c610ea22c808ff6420c63759dccfaeab17959dd
2018-06-12 16:16:15 -07:00
Lantao Liu 2506b9b11f runsc: do not include sub target if it is not started with '/'.
PiperOrigin-RevId: 200274828
Change-Id: I956703217df08d8650a881479b7ade8f9f119912
2018-06-12 13:54:54 -07:00
Brielle Broder 711a9869e5 Runsc checkpoint works.
This is the first iteration of checkpoint that actually saves to a file.
Tests for checkpoint are included.

Ran into an issue when private unix sockets are enabled. An error message
was added for this case and the mutex state was set.

PiperOrigin-RevId: 200269470
Change-Id: I28d29a9f92c44bf73dc4a4b12ae0509ee4070e93
2018-06-12 13:25:23 -07:00
Jamie Liu 7a10df454b Drop MMapOpts.MappingIdentity reference in loader.mapSegment.
PiperOrigin-RevId: 200261995
Change-Id: I7e460b18ceab2c23096bdeb7416159d6e774aaf7
2018-06-12 12:38:02 -07:00
Kevin Krakauer 2dc9cd7bf7 runsc: enable terminals in the sandbox.
runsc now mounts the devpts filesystem, so you get a real terminal using
ssh+sshd.

PiperOrigin-RevId: 200244830
Change-Id: If577c805ad0138fda13103210fa47178d8ac6605
2018-06-12 11:03:25 -07:00
Fabricio Voznika 48335318a2 Enable debug logging in tests
Unit tests call runsc directly now, so all command line arguments
are valid. On the other hand, enabling debug in the test binary
doesn't affect runsc. It needs to be set in the config.

PiperOrigin-RevId: 200237706
Change-Id: I0b5922db17f887f58192dbc2f8dd2fd058b76ec7
2018-06-12 10:25:55 -07:00
Adin Scannell 41f766893a Minor ring0 interface cleanup.
- Remove unused methods.
- Provide declaration for asm function.

PiperOrigin-RevId: 200146850
Change-Id: Ic455c96ffe0d2e78ef15f824eb65d7de705b054a
2018-06-11 18:17:15 -07:00
Adin Scannell 1397a413b4 Make page tables split-safe.
In order to minimize the likelihood of exit during page table
modifications, make the full set of page table functions split-safe.
This is not strictly necessary (and you may still incur splits due to
allocations from the allocator pool) but should make retries a very rare
occurance.

PiperOrigin-RevId: 200146688
Change-Id: I8fa36aa16b807beda2f0b057be60038258e8d597
2018-06-11 18:15:14 -07:00
Adin Scannell 09b0a9c320 Handle all exception vectors.
PiperOrigin-RevId: 200144655
Change-Id: I5a753c74b75007b7714d6fe34aa0d2e845dc5c41
2018-06-11 17:57:19 -07:00
Fabricio Voznika ea4a468fba Set CLOEXEC option to sockets
hostinet/socket.go: the Sentry doesn't spawn new processes, but it doesn't hurt to protect the socket from leaking.
unet/unet.go: should be setting closing on exec. The FD is explicitly donated to children when needed.

PiperOrigin-RevId: 200135682
Change-Id: Ia8a45ced1e00a19420c8611b12e7a8ee770f89cb
2018-06-11 16:45:50 -07:00
Brian Geffon ab2c2575d6 Rpcinet is incorrectly handling MSG_TRUNC with SOCK_STREAM
SOCK_STREAM has special behavior with respect to MSG_TRUNC. Specifically,
the data isn't actually copied back out to userspace when MSG_TRUNC is
provided on a SOCK_STREAM.

According to tcp(7): "Since version 2.4, Linux supports the use of
MSG_TRUNC in the flags argument of recv(2) (and recvmsg(2)). This flag
causes the received bytes of data to be discarded, rather than passed
back in a caller-supplied buffer."

PiperOrigin-RevId: 200134860
Change-Id: I70f17a5f60ffe7794c3f0cfafd131c069202e90d
2018-06-11 16:40:38 -07:00
Brian Geffon 0412f17e06 rpcinet is treating EAGAIN and EWOULDBLOCK as different errnos.
PiperOrigin-RevId: 200124614
Change-Id: I38a7b083f1464a2a586fe24db648e624c455fec5
2018-06-11 15:34:08 -07:00
Fabricio Voznika 7260363751 Add O_TRUNC handling in openat
PiperOrigin-RevId: 200103677
Change-Id: I3efb565c30c64d35f8fd7b5c05ed78dcc2990c51
2018-06-11 13:35:21 -07:00
Kevin Krakauer 032b0398a5 Sentry: split tty.queue into its own file.
Minor refactor. line_discipline.go was home to 2 large structs (lineDiscipline
and queue), and queue is now large enough IMO to get its own file.

Also moves queue locks into the queue struct, making locking simpler.

PiperOrigin-RevId: 200080301
Change-Id: Ia75a0e9b3d9ac8d7e5a0f0099a54e1f5b8bdea34
2018-06-11 11:09:43 -07:00
Adin Scannell c0ab059e7b Fix kernel flags handling and add missing vectors.
PiperOrigin-RevId: 199877174
Change-Id: I9d19ea301608c2b989df0a6123abb1e779427853
2018-06-08 17:51:50 -07:00
Brian Geffon 2fbd1cf57c Add checks for short CopyOut in rpcinet
PiperOrigin-RevId: 199864753
Change-Id: Ibace6a1fdf99ee6ce368ac12c390aa8a02dbdfb7
2018-06-08 15:58:22 -07:00
Adin Scannell 6728f09910 Fix sigaltstack semantics.
Walking off the bottom of the sigaltstack, for example with recursive faults,
results in forced signal delivery, not resetting the stack or pushing signal
stack to whatever happens to lie below the signal stack.

PiperOrigin-RevId: 199856085
Change-Id: I0004d2523f0df35d18714de2685b3eaa147837e0
2018-06-08 15:01:21 -07:00
Bhasker Hariharan de8dba205f Add a protocol option to set congestion control algorithm.
Also adds support to query available congestion control algorithms.

PiperOrigin-RevId: 199826897
Change-Id: I2b338b709820ee9cf58bb56d83aa7b1a39f4eab2
2018-06-08 11:46:23 -07:00
Brian Geffon 2f3895d6f7 rpcinet is not correctly handling MSG_TRUNC on recvmsg(2).
MSG_TRUNC can cause recvmsg(2) to return a value larger than
the buffer size. In this situation it's an indication that the
buffer was completely filled and that the msg was truncated.
Previously in rpcinet we were returning the buffer size but we
should actually be returning the payload length as returned by
the syscall.

PiperOrigin-RevId: 199814221
Change-Id: If09aa364219c1bf193603896fcc0dc5c55e85d21
2018-06-08 10:33:25 -07:00
Fabricio Voznika 5c51bc51e4 Drop capabilities not needed by Gofer
PiperOrigin-RevId: 199808391
Change-Id: Ib37a4fb6193dc85c1f93bc16769d6aa41854b9d4
2018-06-08 09:59:26 -07:00
Brian Geffon 5c37097e34 rpcinet should not block in read(2) rpcs.
PiperOrigin-RevId: 199703609
Change-Id: I8153b0396b22a230a68d4b69c46652a5545f7630
2018-06-07 15:10:15 -07:00
Brian Geffon 7e9893eeb5 Add missing rpcinet ioctls.
PiperOrigin-RevId: 199669120
Change-Id: I0be88cdbba29760f967e9a5bb4144ca62c1ed7aa
2018-06-07 11:37:16 -07:00
Kevin Krakauer 9170303105 Sentry: very basic terminal echo support.
Adds support for echo to terminals. Echoing is just copying input back out to
the user, e.g. when I type "foo" into a terminal, I expect "foo" to be echoed
back to my terminal.

Also makes the transform function part of the queue, eliminating the need to
pass them around together and the possibility of using the wrong transform for a
queue.

PiperOrigin-RevId: 199655147
Change-Id: I37c490d4fc1ee91da20ae58ba1f884a5c14fd0d8
2018-06-07 10:21:22 -07:00
Adin Scannell d269845159 Ensure guest-mode for page table modifications.
Because of the KVM shadow page table implementation, modifications made
to guest page tables from host mode may not be syncronized correctly,
resulting in undefined behavior. This is a KVM bug: page table pages
should also be tracked for host modifications and resynced appropriately
(e.g. the guest could "DMA" into a page table page in theory).

However, since we can't rely on this being fixed everywhere, workaround
the issue by forcing page table modifications to be in guest mode. This
will generally be the case anyways, but now if an exit occurs during
modifications, we will re-enter and perform the modifications again.

PiperOrigin-RevId: 199587895
Change-Id: I83c20b4cf2a9f9fa56f59f34939601dd34538fb0
2018-06-06 23:26:14 -07:00
Adin Scannell 3374849cb5 Split PCID implementation from page tables.
Instead of associating a single PCID with each set of page tables (which
will reach the maximum quickly), allow a dynamic pool for each vCPU.
This is the same way that Linux operates. We also split management of
PCIDs out of the page tables themselves for simplicity.

PiperOrigin-RevId: 199585631
Change-Id: I42f3486ada3cb2a26f623c65ac279b473ae63201
2018-06-06 22:52:55 -07:00
Adin Scannell 1b5062263b Add allocator abstraction for page tables.
In order to prevent possible garbage collection and reuse of page table
pages prior to invalidation, introduce a former allocator abstraction
that can ensure entries are held during a single traversal. This also
cleans up the abstraction and splits it out of the machine itself.

PiperOrigin-RevId: 199581636
Change-Id: I2257d5d7ffd9c36f9b7ecd42f769261baeaf115c
2018-06-06 21:48:24 -07:00
Kevin Krakauer 206e90d057 runsc: Support abbreviated container IDs.
Just a UI/usability addition. It's a lot easier to type "60" than
"60185c721d7e10c00489f1fa210ee0d35c594873d6376b457fb1815e4fdbfc2c".

PiperOrigin-RevId: 199547932
Change-Id: I19011b5061a88aba48a9ad7f8cf954a6782de854
2018-06-06 16:13:53 -07:00
Brian Geffon 79fef54eb1 Add support for rpcinet ioctl(2).
This change will add support for ioctls that have previously
been supported by netstack.

LINE_LENGTH_IGNORE

PiperOrigin-RevId: 199544114
Change-Id: I3769202c19502c3b7d05e06ea9552acfd9255893
2018-06-06 15:53:26 -07:00
Googler 0c34b460f2 Add runsc checkpoint command.
Checkpoint command is plumbed through container and sandbox.
Restore has also been added but it is only a stub. None of this
works yet. More changes to come.

PiperOrigin-RevId: 199510105
Change-Id: Ibd08d57f4737847eb25ca20b114518e487320185
2018-06-06 12:31:53 -07:00
Googler 722275c3d1 Added a function to the controller to checkpoint a container.
Functionality for checkpoint is not complete, more to come.

PiperOrigin-RevId: 199500803
Change-Id: Iafb0fcde68c584270000fea898e6657a592466f7
2018-06-06 11:43:55 -07:00
Brian Geffon ff7b4a156f Add support for rpcinet owned procfs files.
This change will add support for /proc/sys/net and /proc/net which will
be managed and owned by rpcinet. This will allow these inodes to be forward
as rpcs.

PiperOrigin-RevId: 199370799
Change-Id: I2c876005d98fe55dd126145163bee5a645458ce4
2018-06-05 15:45:35 -07:00
Zhaozhong Ni 343020ca27 netstack: make TCP endpoint closed and error state cleanup work synchronous.
So that when saving TCP endpoint in these states, there is no pending or
background activities.

Also lift tcp network save rejection error to tcpip package.

PiperOrigin-RevId: 199370748
Change-Id: Ief7b45c2a7338d12414cd7c23db95de6a9c22700
2018-06-05 15:44:38 -07:00
Fabricio Voznika 19a0e83b50 Make fsgofer attach more strict
Refuse to mount paths with "." and ".." in the path to prevent
a compromised Sentry to mount "../../secrets". Only allow
Attach to be called once per mount point.

PiperOrigin-RevId: 199225929
Change-Id: I2a3eb7ea0b23f22eb8dde2e383e32563ec003bd5
2018-06-04 18:04:54 -07:00
Fabricio Voznika 6c585b8eb6 Create destination mount dir if it doesn't exist
PiperOrigin-RevId: 199175296
Change-Id: I694ad1cfa65572c92f77f22421fdcac818f44630
2018-06-04 12:31:35 -07:00
Fabricio Voznika 78ccd1298e Return 'running' if gofer is still alive
Containerd will start deleting container and rootfs after container
is stopped. However, if gofer is still running, rootfs cleanup will
fail because of device busy.

This CL makes sure that gofer is not running when container state is
stopped.

Change from: lantaol@google.com

PiperOrigin-RevId: 199172668
Change-Id: I9d874eec3ecf74fd9c8edd7f62d9f998edef66fe
2018-06-04 12:14:23 -07:00
Fabricio Voznika 55a37ceef1 Fix leaky FD
9P socket was being created without CLOEXEC and was being inherited
by the children. This would prevent the gofer from detecting that the
sandbox had exited, because the socket would not be closed.

PiperOrigin-RevId: 199168959
Change-Id: I3ee1a07cbe7331b0aeb1cf2b697e728ce24f85a7
2018-06-04 11:52:17 -07:00
Fabricio Voznika a0e2126be4 Refactor container_test in preparation for sandbox_test
Common code to setup and run sandbox is moved to testutil. Also, don't
link "boot" and "gofer" commands with test binary. Instead, use runsc
binary from the build. This not only make the test setup simpler, but
also resolves a dependency issue with sandbox_tests not depending on
container package.

PiperOrigin-RevId: 199164478
Change-Id: I27226286ca3f914d4d381358270dd7d70ee8372f
2018-06-04 11:26:30 -07:00
Fabricio Voznika 0929bdee34 Fix checksum file for today's build
PiperOrigin-RevId: 199153448
Change-Id: Ic1f0456191080117a8586f77dd2fb44dc53754ca
2018-06-04 10:28:22 -07:00
Fabricio Voznika 43dd424f42 Add SHA512 pointer to README
PiperOrigin-RevId: 199008198
Change-Id: I6d1a0107ae1b11f160b42a2cabaf1fb8ce419edf
2018-06-02 15:22:21 -07:00
Brian Geffon 0212f222c7 Fix refcount bug in rpcinet socketOperations.Accept.
PiperOrigin-RevId: 198931222
Change-Id: I69ee12318e87b9a6a4a94b18a9bf0ae4e39d7eaf
2018-06-01 14:59:47 -07:00
Adin Scannell 659b10d1a6 Move page tables lock into the address space.
This is necessary to prevent races with invalidation. It is currently
possible that page tables are garbage collected while paging caches
refer to them. We must ensure that pages are held until caches can be
invalidated. This is not achieved by this goal alone, but moving locking
to outside the page tables themselves is a requisite.

PiperOrigin-RevId: 198920784
Change-Id: I66fffecd49cb14aa2e676a84a68cabfc0c8b3e9a
2018-06-01 13:51:16 -07:00