Commit Graph

137 Commits

Author SHA1 Message Date
Adin Scannell 1b5062263b Add allocator abstraction for page tables.
In order to prevent possible garbage collection and reuse of page table
pages prior to invalidation, introduce a former allocator abstraction
that can ensure entries are held during a single traversal. This also
cleans up the abstraction and splits it out of the machine itself.

PiperOrigin-RevId: 199581636
Change-Id: I2257d5d7ffd9c36f9b7ecd42f769261baeaf115c
2018-06-06 21:48:24 -07:00
Kevin Krakauer 206e90d057 runsc: Support abbreviated container IDs.
Just a UI/usability addition. It's a lot easier to type "60" than
"60185c721d7e10c00489f1fa210ee0d35c594873d6376b457fb1815e4fdbfc2c".

PiperOrigin-RevId: 199547932
Change-Id: I19011b5061a88aba48a9ad7f8cf954a6782de854
2018-06-06 16:13:53 -07:00
Brian Geffon 79fef54eb1 Add support for rpcinet ioctl(2).
This change will add support for ioctls that have previously
been supported by netstack.

LINE_LENGTH_IGNORE

PiperOrigin-RevId: 199544114
Change-Id: I3769202c19502c3b7d05e06ea9552acfd9255893
2018-06-06 15:53:26 -07:00
Googler 0c34b460f2 Add runsc checkpoint command.
Checkpoint command is plumbed through container and sandbox.
Restore has also been added but it is only a stub. None of this
works yet. More changes to come.

PiperOrigin-RevId: 199510105
Change-Id: Ibd08d57f4737847eb25ca20b114518e487320185
2018-06-06 12:31:53 -07:00
Googler 722275c3d1 Added a function to the controller to checkpoint a container.
Functionality for checkpoint is not complete, more to come.

PiperOrigin-RevId: 199500803
Change-Id: Iafb0fcde68c584270000fea898e6657a592466f7
2018-06-06 11:43:55 -07:00
Brian Geffon ff7b4a156f Add support for rpcinet owned procfs files.
This change will add support for /proc/sys/net and /proc/net which will
be managed and owned by rpcinet. This will allow these inodes to be forward
as rpcs.

PiperOrigin-RevId: 199370799
Change-Id: I2c876005d98fe55dd126145163bee5a645458ce4
2018-06-05 15:45:35 -07:00
Zhaozhong Ni 343020ca27 netstack: make TCP endpoint closed and error state cleanup work synchronous.
So that when saving TCP endpoint in these states, there is no pending or
background activities.

Also lift tcp network save rejection error to tcpip package.

PiperOrigin-RevId: 199370748
Change-Id: Ief7b45c2a7338d12414cd7c23db95de6a9c22700
2018-06-05 15:44:38 -07:00
Fabricio Voznika 19a0e83b50 Make fsgofer attach more strict
Refuse to mount paths with "." and ".." in the path to prevent
a compromised Sentry to mount "../../secrets". Only allow
Attach to be called once per mount point.

PiperOrigin-RevId: 199225929
Change-Id: I2a3eb7ea0b23f22eb8dde2e383e32563ec003bd5
2018-06-04 18:04:54 -07:00
Fabricio Voznika 6c585b8eb6 Create destination mount dir if it doesn't exist
PiperOrigin-RevId: 199175296
Change-Id: I694ad1cfa65572c92f77f22421fdcac818f44630
2018-06-04 12:31:35 -07:00
Fabricio Voznika 78ccd1298e Return 'running' if gofer is still alive
Containerd will start deleting container and rootfs after container
is stopped. However, if gofer is still running, rootfs cleanup will
fail because of device busy.

This CL makes sure that gofer is not running when container state is
stopped.

Change from: lantaol@google.com

PiperOrigin-RevId: 199172668
Change-Id: I9d874eec3ecf74fd9c8edd7f62d9f998edef66fe
2018-06-04 12:14:23 -07:00
Fabricio Voznika 55a37ceef1 Fix leaky FD
9P socket was being created without CLOEXEC and was being inherited
by the children. This would prevent the gofer from detecting that the
sandbox had exited, because the socket would not be closed.

PiperOrigin-RevId: 199168959
Change-Id: I3ee1a07cbe7331b0aeb1cf2b697e728ce24f85a7
2018-06-04 11:52:17 -07:00
Fabricio Voznika a0e2126be4 Refactor container_test in preparation for sandbox_test
Common code to setup and run sandbox is moved to testutil. Also, don't
link "boot" and "gofer" commands with test binary. Instead, use runsc
binary from the build. This not only make the test setup simpler, but
also resolves a dependency issue with sandbox_tests not depending on
container package.

PiperOrigin-RevId: 199164478
Change-Id: I27226286ca3f914d4d381358270dd7d70ee8372f
2018-06-04 11:26:30 -07:00
Fabricio Voznika 0929bdee34 Fix checksum file for today's build
PiperOrigin-RevId: 199153448
Change-Id: Ic1f0456191080117a8586f77dd2fb44dc53754ca
2018-06-04 10:28:22 -07:00
Fabricio Voznika 43dd424f42 Add SHA512 pointer to README
PiperOrigin-RevId: 199008198
Change-Id: I6d1a0107ae1b11f160b42a2cabaf1fb8ce419edf
2018-06-02 15:22:21 -07:00
Brian Geffon 0212f222c7 Fix refcount bug in rpcinet socketOperations.Accept.
PiperOrigin-RevId: 198931222
Change-Id: I69ee12318e87b9a6a4a94b18a9bf0ae4e39d7eaf
2018-06-01 14:59:47 -07:00
Adin Scannell 659b10d1a6 Move page tables lock into the address space.
This is necessary to prevent races with invalidation. It is currently
possible that page tables are garbage collected while paging caches
refer to them. We must ensure that pages are held until caches can be
invalidated. This is not achieved by this goal alone, but moving locking
to outside the page tables themselves is a requisite.

PiperOrigin-RevId: 198920784
Change-Id: I66fffecd49cb14aa2e676a84a68cabfc0c8b3e9a
2018-06-01 13:51:16 -07:00
Zhengyu He d1ca50d49e Add SyscallRules that supports argument filtering
PiperOrigin-RevId: 198919043
Change-Id: I7f1f0a3b3430cd0936a4ee4fc6859aab71820bdf
2018-06-01 13:40:52 -07:00
Fabricio Voznika 65dadc0029 Ignores IPv6 addresses when configuring network
Closes #60

PiperOrigin-RevId: 198887885
Change-Id: I9bf990ee3fde9259836e57d67257bef5b85c6008
2018-06-01 10:09:37 -07:00
Fabricio Voznika 3547c48867 Add SHA512 file to nightly build
PiperOrigin-RevId: 198745666
Change-Id: I38d4163cd65f1236b09ce4f6481197a9a9fd29f2
2018-05-31 10:53:54 -07:00
Adin Scannell 57edd0ee19 Restore FS on resume.
Previously, the vCPU FS was always correct because it relied on the
reset coming out of the switch. When that doesn't occur, for example,
using bluepill directly, the FS value can be incorrect leading to
strange corruption.

This change is necessary for a subsequent change that enforces guest
mode for page table modifications, and it may reduce test flakiness.
(The problematic path may occur in tests, but does not occur in the
actual platform.)

PiperOrigin-RevId: 198648137
Change-Id: I513910a973dd8666c9a1d18cf78990964d6a644d
2018-05-30 17:37:51 -07:00
Adin Scannell c59475599d Change ring0 & page tables arguments to structs.
This is a refactor of ring0 and ring0/pagetables that changes from
individual arguments to opts structures. This should involve no
functional changes, but sets the stage for subsequent changes.

PiperOrigin-RevId: 198627556
Change-Id: Id4460340f6a73f0c793cd879324398139cd58ae9
2018-05-30 15:14:44 -07:00
Fabricio Voznika 812e83d3bb Supress error when deleting non-existing container with --force
This addresses the first issue reported in #59. CRI-O expects runsc to
return success to delete when --force is used with a non-existing container.

PiperOrigin-RevId: 198487418
Change-Id: If7660e8fdab1eb29549d0a7a45ea82e20a1d4f4a
2018-05-29 17:58:12 -07:00
Fabricio Voznika c5dc873e44 Automated rollback of changelist 196886839
PiperOrigin-RevId: 198457660
Change-Id: I6ea5cf0b4cfe2b5ba455325a7e5299880e5a088a
2018-05-29 14:24:07 -07:00
Brian Geffon a8b90a7158 Poll should wake up on ECONNREFUSED with no mask.
Today poll will not wake up on a ECONNREFUSED if no poll mask
is specified, which is equivalent to POLLHUP | POLLERR which are
implicitly added during the poll syscall.

PiperOrigin-RevId: 197967183
Change-Id: I668d0730c33701228913f2d0843b48491b642efb
2018-05-24 15:46:50 -07:00
Brian Geffon 7f62e9c32e rpcinet connect doesn't handle all errnos correctly.
These were causing non-blocking related errnos to be returned to
the sentry when they were created as blocking FDs internally.

PiperOrigin-RevId: 197962932
Change-Id: I3f843535ff87ebf4cb5827e9f3d26abfb79461b0
2018-05-24 15:18:21 -07:00
Fabricio Voznika e48f707876 Configure sandbox as superuser
Container user might not have enough priviledge to walk directories and
mount filesystems. Instead, create superuser to perform these steps of
the configuration.

PiperOrigin-RevId: 197953667
Change-Id: I643650ab654e665408e2af1b8e2f2aa12d58d4fb
2018-05-24 14:27:57 -07:00
Brian Geffon 7996ae7ccf Adding test case for RST acceptable ack panic
PiperOrigin-RevId: 197795613
Change-Id: I759dd04995d900cba6b984649fa48bbc880946d6
2018-05-23 15:01:48 -07:00
Ian Gudger 02ad0dc3d9 Fix typo in TCP transport
PiperOrigin-RevId: 197789418
Change-Id: I86b1574c8d3b8b321348d9b101ffaef7aa15f722
2018-05-23 14:28:43 -07:00
Fabricio Voznika 51c95c270b Remove offset check to match with Linux implementation.
PiperOrigin-RevId: 197644246
Change-Id: I63eb0a58889e69fbc4af2af8232f6fa1c399d43f
2018-05-22 16:36:40 -07:00
Brian Geffon 257ab8de93 When sending a RST the acceptable ACK window shouldn't change.
Today when we transmit a RST it's happening during the time-wait
flow. Because a FIN is allowed to advance the acceptable ACK window
we're incorrectly doing that for a RST.

PiperOrigin-RevId: 197637565
Change-Id: I080190b06bd0225326cd68c1fbf37bd3fdbd414e
2018-05-22 15:52:41 -07:00
Chanwit Kaewkasi 7b2b7a3946 Change length type, and let fadvise64 return ESPIPE if file is a pipe
Kernel before 2.6.16 return EINVAL, but later return ESPIPE for this case.
Also change type of "length" from Uint(uint32) to Int64.
Because C header uses type "size_t" (unsigned long) or "off_t" (long) for length.
And it makes more sense to check length < 0 with Int64 because Uint cannot be negative.

Change-Id: Ifd7fea2dcded7577a30760558d0d31f479f074c4
PiperOrigin-RevId: 197616743
2018-05-22 13:48:14 -07:00
Kevin Krakauer 705605f901 sentry: Add simple SIOCGIFFLAGS support (IFF_RUNNING and IFF_PROMIS).
Establishes a way of communicating interface flags between netstack and
epsocket. More flags can be added over time.

PiperOrigin-RevId: 197616669
Change-Id: I230448c5fb5b7d2e8d69b41a451eb4e1096a0e30
2018-05-22 13:47:33 -07:00
Ian Gudger 3a6070dc98 Clarify that syserr.New must only be called during init
PiperOrigin-RevId: 197599402
Change-Id: I23eb0336195ab0d3e5fb49c0c57fc9e0715a9b75
2018-05-22 11:54:31 -07:00
Fabricio Voznika ed2b86a549 Fix test failure when user can't mount temp dir
PiperOrigin-RevId: 197491098
Change-Id: Ifb75bd4e4f41b84256b6d7afc4b157f6ce3839f3
2018-05-21 17:48:04 -07:00
Adin Scannell 61b0b19497 Dramatically improve handling of KVM vCPU pool.
Especially in situations with small numbers of vCPUs, the existing
system resulted in excessive thrashing. Now, execution contexts
co-ordinate as smoothly as they can to share a small number of cores.

PiperOrigin-RevId: 197483323
Change-Id: I0afc0c5363ea9386994355baf3904bf5fe08c56c
2018-05-21 16:49:40 -07:00
Kevin Krakauer d4c81b7a21 sentry: Get "ip link" working.
In Linux, many UDS ioctls are passed through to the NIC driver. We do the same
here, passing ioctl calls to Unix sockets through to epsocket.

In Linux you can see this path at net/socket.c:sock_ioctl, which calls
sock_do_ioctl, which calls net/core/dev_ioctl.c:dev_ioctl.

SIOCGIFNAME is also added.

PiperOrigin-RevId: 197167508
Change-Id: I62c326a4792bd0a473e9c9108aafb6a6354f2b64
2018-05-18 10:43:41 -07:00
Fabricio Voznika a1e5862f3c Move postgres to list of supported images
PiperOrigin-RevId: 197104043
Change-Id: I377c0727ebf0c44361ed221e1b197787825bfb7b
2018-05-17 23:22:40 -07:00
Michael Pratt b960559fdb Cleanup docs
This brings the proc document more up-to-date.

PiperOrigin-RevId: 197070161
Change-Id: Iae2cf9dc44e3e748a33f497bb95bd3c10d0c094a
2018-05-17 16:26:42 -07:00
Rahat Mahmood b904250b86 Fix capability check for sysv semaphores.
Capabilities for sysv sem operations were being checked against the
current task's user namespace. They should be checked against the user
namespace owning the ipc namespace for the sems instead, per
ipc/util.c:ipcperms().

PiperOrigin-RevId: 197063111
Change-Id: Iba29486b316f2e01ee331dda4e48a6ab7960d589
2018-05-17 15:38:11 -07:00
Rahat Mahmood 8878a66a56 Implement sysv shm.
PiperOrigin-RevId: 197058289
Change-Id: I3946c25028b7e032be4894d61acb48ac0c24d574
2018-05-17 15:06:19 -07:00
Ian Gudger a8d7cee3e8 Fix sendto for dual stack UDP sockets
Previously, dual stack UDP sockets bound to an IPv4 address could not use
sendto to communicate with IPv4 addresses. Further, dual stack UDP sockets
bound to an IPv6 address could use sendto to communicate with IPv4 addresses.
Neither of these behaviors are consistent with Linux.

PiperOrigin-RevId: 197036024
Change-Id: Ic3713efc569f26196e35bb41e6ad63f23675fc90
2018-05-17 12:50:22 -07:00
Nicolas Lacasse 31386185fe Push signal-delivery and wait into the sandbox.
This is another step towards multi-container support.

Previously, we delivered signals directly to the sandbox process (which then
forwarded the signal to PID 1 inside the sandbox). Similarly, we waited on a
container by waiting on the sandbox process itself. This approach will not work
when there are multiple containers inside the sandbox, and we need to
signal/wait on individual containers.

This CL adds two new messages, ContainerSignal and ContainerWait. These
messages include the id of the container to signal/wait. The controller inside
the sandbox receives these messages and signals/waits on the appropriate
process inside the sandbox.

The container id is plumbed into the sandbox, but it currently is not used. We
still end up signaling/waiting on PID 1 in all cases.  Once we actually have
multiple containers inside the sandbox, we will need to keep some sort of map
of container id -> pid (or possibly pid namespace), and signal/kill the
appropriate process for the container.

PiperOrigin-RevId: 197028366
Change-Id: I07b4d5dc91ecd2affc1447e6b4bdd6b0b7360895
2018-05-17 11:55:28 -07:00
Christopher Koch 8e1deb2ab8 Fix another socket Dirent refcount.
PiperOrigin-RevId: 196893452
Change-Id: I5ea0f851fcabc5eac5859e61f15213323d996337
2018-05-16 14:54:48 -07:00
Chanwit Kaewkasi 3131a6b131 Verify that when offset address is not null, infile must be seekable
Change-Id: Id247399baeac58f6cd774acabd5d1da05e5b5697
PiperOrigin-RevId: 196887768
2018-05-16 14:20:24 -07:00
Zhaozhong Ni 5b4c20e1b8 netstack: make TCP endpoint closed and error state cleanup work synchronous.
So that when saving TCP endpoint in these states, there is no pending or
background activities.

Also lift tcp network save rejection error to tcpip package.

PiperOrigin-RevId: 196886839
Change-Id: I0fe73750f2743ec7e62d139eb2cec758c5dd6698
2018-05-16 14:15:24 -07:00
Christopher Koch d154c6a25f Refcount socket Dirents correctly.
This should fix the socket Dirent memory leak.

fs.NewFile takes a new reference. It should hold the *only* reference.
DecRef that socket Dirent.

Before the globalDirentMap was introduced, a mis-refcounted Dirent
would be garbage collected when all references to it were gone. For
socket Dirents, this meant that they would be garbage collected when
the associated fs.Files disappeared.

After the globalDirentMap, Dirents *must* be reference-counted
correctly to be garbage collected, as Dirents remove themselves
from the global map when their refcount goes to -1 (see Dirent.destroy).
That removes the last pointer to that Dirent.

PiperOrigin-RevId: 196878973
Change-Id: Ic7afcd1de97c7101ccb13be5fc31de0fb50963f0
2018-05-16 13:29:17 -07:00
Brian Geffon f295e26b8a Release mutex in BidirectionalConnect to avoid deadlock.
When doing a BidirectionalConnect we don't need to continue holding
the ConnectingEndpoint's mutex when creating the NewConnectedEndpoint
as it was held during the Connect. Additionally, we're not holding
the baseEndpoint mutex while Unregistering an event.

PiperOrigin-RevId: 196875557
Change-Id: Ied4ceed89de883121c6cba81bc62aa3a8549b1e9
2018-05-16 13:07:12 -07:00
Adin Scannell 4b7e4f3d36 Fix KVM EFAULT handling.
PiperOrigin-RevId: 196781718
Change-Id: I889766eed871929cdc247c6b9aa634398adea9c9
2018-05-15 22:44:40 -07:00
Adin Scannell 00adea3a3f Simplify KVM invalidation logic.
PiperOrigin-RevId: 196780209
Change-Id: I89f39eec914ce54a7c6c4f28e1b6d5ff5a7dd38d
2018-05-15 22:21:36 -07:00
Adin Scannell 310a99228b Simplify KVM state handling.
This also removes the dependency on tmutex.

PiperOrigin-RevId: 196764317
Change-Id: I523fb67454318e1a2ca9da3a08e63bfa3c1eeed3
2018-05-15 18:34:09 -07:00