Commit Graph

126 Commits

Author SHA1 Message Date
Fabricio Voznika a0e2126be4 Refactor container_test in preparation for sandbox_test
Common code to setup and run sandbox is moved to testutil. Also, don't
link "boot" and "gofer" commands with test binary. Instead, use runsc
binary from the build. This not only make the test setup simpler, but
also resolves a dependency issue with sandbox_tests not depending on
container package.

PiperOrigin-RevId: 199164478
Change-Id: I27226286ca3f914d4d381358270dd7d70ee8372f
2018-06-04 11:26:30 -07:00
Fabricio Voznika 0929bdee34 Fix checksum file for today's build
PiperOrigin-RevId: 199153448
Change-Id: Ic1f0456191080117a8586f77dd2fb44dc53754ca
2018-06-04 10:28:22 -07:00
Fabricio Voznika 43dd424f42 Add SHA512 pointer to README
PiperOrigin-RevId: 199008198
Change-Id: I6d1a0107ae1b11f160b42a2cabaf1fb8ce419edf
2018-06-02 15:22:21 -07:00
Brian Geffon 0212f222c7 Fix refcount bug in rpcinet socketOperations.Accept.
PiperOrigin-RevId: 198931222
Change-Id: I69ee12318e87b9a6a4a94b18a9bf0ae4e39d7eaf
2018-06-01 14:59:47 -07:00
Adin Scannell 659b10d1a6 Move page tables lock into the address space.
This is necessary to prevent races with invalidation. It is currently
possible that page tables are garbage collected while paging caches
refer to them. We must ensure that pages are held until caches can be
invalidated. This is not achieved by this goal alone, but moving locking
to outside the page tables themselves is a requisite.

PiperOrigin-RevId: 198920784
Change-Id: I66fffecd49cb14aa2e676a84a68cabfc0c8b3e9a
2018-06-01 13:51:16 -07:00
Zhengyu He d1ca50d49e Add SyscallRules that supports argument filtering
PiperOrigin-RevId: 198919043
Change-Id: I7f1f0a3b3430cd0936a4ee4fc6859aab71820bdf
2018-06-01 13:40:52 -07:00
Fabricio Voznika 65dadc0029 Ignores IPv6 addresses when configuring network
Closes #60

PiperOrigin-RevId: 198887885
Change-Id: I9bf990ee3fde9259836e57d67257bef5b85c6008
2018-06-01 10:09:37 -07:00
Fabricio Voznika 3547c48867 Add SHA512 file to nightly build
PiperOrigin-RevId: 198745666
Change-Id: I38d4163cd65f1236b09ce4f6481197a9a9fd29f2
2018-05-31 10:53:54 -07:00
Adin Scannell 57edd0ee19 Restore FS on resume.
Previously, the vCPU FS was always correct because it relied on the
reset coming out of the switch. When that doesn't occur, for example,
using bluepill directly, the FS value can be incorrect leading to
strange corruption.

This change is necessary for a subsequent change that enforces guest
mode for page table modifications, and it may reduce test flakiness.
(The problematic path may occur in tests, but does not occur in the
actual platform.)

PiperOrigin-RevId: 198648137
Change-Id: I513910a973dd8666c9a1d18cf78990964d6a644d
2018-05-30 17:37:51 -07:00
Adin Scannell c59475599d Change ring0 & page tables arguments to structs.
This is a refactor of ring0 and ring0/pagetables that changes from
individual arguments to opts structures. This should involve no
functional changes, but sets the stage for subsequent changes.

PiperOrigin-RevId: 198627556
Change-Id: Id4460340f6a73f0c793cd879324398139cd58ae9
2018-05-30 15:14:44 -07:00
Fabricio Voznika 812e83d3bb Supress error when deleting non-existing container with --force
This addresses the first issue reported in #59. CRI-O expects runsc to
return success to delete when --force is used with a non-existing container.

PiperOrigin-RevId: 198487418
Change-Id: If7660e8fdab1eb29549d0a7a45ea82e20a1d4f4a
2018-05-29 17:58:12 -07:00
Fabricio Voznika c5dc873e44 Automated rollback of changelist 196886839
PiperOrigin-RevId: 198457660
Change-Id: I6ea5cf0b4cfe2b5ba455325a7e5299880e5a088a
2018-05-29 14:24:07 -07:00
Brian Geffon a8b90a7158 Poll should wake up on ECONNREFUSED with no mask.
Today poll will not wake up on a ECONNREFUSED if no poll mask
is specified, which is equivalent to POLLHUP | POLLERR which are
implicitly added during the poll syscall.

PiperOrigin-RevId: 197967183
Change-Id: I668d0730c33701228913f2d0843b48491b642efb
2018-05-24 15:46:50 -07:00
Brian Geffon 7f62e9c32e rpcinet connect doesn't handle all errnos correctly.
These were causing non-blocking related errnos to be returned to
the sentry when they were created as blocking FDs internally.

PiperOrigin-RevId: 197962932
Change-Id: I3f843535ff87ebf4cb5827e9f3d26abfb79461b0
2018-05-24 15:18:21 -07:00
Fabricio Voznika e48f707876 Configure sandbox as superuser
Container user might not have enough priviledge to walk directories and
mount filesystems. Instead, create superuser to perform these steps of
the configuration.

PiperOrigin-RevId: 197953667
Change-Id: I643650ab654e665408e2af1b8e2f2aa12d58d4fb
2018-05-24 14:27:57 -07:00
Brian Geffon 7996ae7ccf Adding test case for RST acceptable ack panic
PiperOrigin-RevId: 197795613
Change-Id: I759dd04995d900cba6b984649fa48bbc880946d6
2018-05-23 15:01:48 -07:00
Ian Gudger 02ad0dc3d9 Fix typo in TCP transport
PiperOrigin-RevId: 197789418
Change-Id: I86b1574c8d3b8b321348d9b101ffaef7aa15f722
2018-05-23 14:28:43 -07:00
Fabricio Voznika 51c95c270b Remove offset check to match with Linux implementation.
PiperOrigin-RevId: 197644246
Change-Id: I63eb0a58889e69fbc4af2af8232f6fa1c399d43f
2018-05-22 16:36:40 -07:00
Brian Geffon 257ab8de93 When sending a RST the acceptable ACK window shouldn't change.
Today when we transmit a RST it's happening during the time-wait
flow. Because a FIN is allowed to advance the acceptable ACK window
we're incorrectly doing that for a RST.

PiperOrigin-RevId: 197637565
Change-Id: I080190b06bd0225326cd68c1fbf37bd3fdbd414e
2018-05-22 15:52:41 -07:00
Chanwit Kaewkasi 7b2b7a3946 Change length type, and let fadvise64 return ESPIPE if file is a pipe
Kernel before 2.6.16 return EINVAL, but later return ESPIPE for this case.
Also change type of "length" from Uint(uint32) to Int64.
Because C header uses type "size_t" (unsigned long) or "off_t" (long) for length.
And it makes more sense to check length < 0 with Int64 because Uint cannot be negative.

Change-Id: Ifd7fea2dcded7577a30760558d0d31f479f074c4
PiperOrigin-RevId: 197616743
2018-05-22 13:48:14 -07:00
Kevin Krakauer 705605f901 sentry: Add simple SIOCGIFFLAGS support (IFF_RUNNING and IFF_PROMIS).
Establishes a way of communicating interface flags between netstack and
epsocket. More flags can be added over time.

PiperOrigin-RevId: 197616669
Change-Id: I230448c5fb5b7d2e8d69b41a451eb4e1096a0e30
2018-05-22 13:47:33 -07:00
Ian Gudger 3a6070dc98 Clarify that syserr.New must only be called during init
PiperOrigin-RevId: 197599402
Change-Id: I23eb0336195ab0d3e5fb49c0c57fc9e0715a9b75
2018-05-22 11:54:31 -07:00
Fabricio Voznika ed2b86a549 Fix test failure when user can't mount temp dir
PiperOrigin-RevId: 197491098
Change-Id: Ifb75bd4e4f41b84256b6d7afc4b157f6ce3839f3
2018-05-21 17:48:04 -07:00
Adin Scannell 61b0b19497 Dramatically improve handling of KVM vCPU pool.
Especially in situations with small numbers of vCPUs, the existing
system resulted in excessive thrashing. Now, execution contexts
co-ordinate as smoothly as they can to share a small number of cores.

PiperOrigin-RevId: 197483323
Change-Id: I0afc0c5363ea9386994355baf3904bf5fe08c56c
2018-05-21 16:49:40 -07:00
Kevin Krakauer d4c81b7a21 sentry: Get "ip link" working.
In Linux, many UDS ioctls are passed through to the NIC driver. We do the same
here, passing ioctl calls to Unix sockets through to epsocket.

In Linux you can see this path at net/socket.c:sock_ioctl, which calls
sock_do_ioctl, which calls net/core/dev_ioctl.c:dev_ioctl.

SIOCGIFNAME is also added.

PiperOrigin-RevId: 197167508
Change-Id: I62c326a4792bd0a473e9c9108aafb6a6354f2b64
2018-05-18 10:43:41 -07:00
Fabricio Voznika a1e5862f3c Move postgres to list of supported images
PiperOrigin-RevId: 197104043
Change-Id: I377c0727ebf0c44361ed221e1b197787825bfb7b
2018-05-17 23:22:40 -07:00
Michael Pratt b960559fdb Cleanup docs
This brings the proc document more up-to-date.

PiperOrigin-RevId: 197070161
Change-Id: Iae2cf9dc44e3e748a33f497bb95bd3c10d0c094a
2018-05-17 16:26:42 -07:00
Rahat Mahmood b904250b86 Fix capability check for sysv semaphores.
Capabilities for sysv sem operations were being checked against the
current task's user namespace. They should be checked against the user
namespace owning the ipc namespace for the sems instead, per
ipc/util.c:ipcperms().

PiperOrigin-RevId: 197063111
Change-Id: Iba29486b316f2e01ee331dda4e48a6ab7960d589
2018-05-17 15:38:11 -07:00
Rahat Mahmood 8878a66a56 Implement sysv shm.
PiperOrigin-RevId: 197058289
Change-Id: I3946c25028b7e032be4894d61acb48ac0c24d574
2018-05-17 15:06:19 -07:00
Ian Gudger a8d7cee3e8 Fix sendto for dual stack UDP sockets
Previously, dual stack UDP sockets bound to an IPv4 address could not use
sendto to communicate with IPv4 addresses. Further, dual stack UDP sockets
bound to an IPv6 address could use sendto to communicate with IPv4 addresses.
Neither of these behaviors are consistent with Linux.

PiperOrigin-RevId: 197036024
Change-Id: Ic3713efc569f26196e35bb41e6ad63f23675fc90
2018-05-17 12:50:22 -07:00
Nicolas Lacasse 31386185fe Push signal-delivery and wait into the sandbox.
This is another step towards multi-container support.

Previously, we delivered signals directly to the sandbox process (which then
forwarded the signal to PID 1 inside the sandbox). Similarly, we waited on a
container by waiting on the sandbox process itself. This approach will not work
when there are multiple containers inside the sandbox, and we need to
signal/wait on individual containers.

This CL adds two new messages, ContainerSignal and ContainerWait. These
messages include the id of the container to signal/wait. The controller inside
the sandbox receives these messages and signals/waits on the appropriate
process inside the sandbox.

The container id is plumbed into the sandbox, but it currently is not used. We
still end up signaling/waiting on PID 1 in all cases.  Once we actually have
multiple containers inside the sandbox, we will need to keep some sort of map
of container id -> pid (or possibly pid namespace), and signal/kill the
appropriate process for the container.

PiperOrigin-RevId: 197028366
Change-Id: I07b4d5dc91ecd2affc1447e6b4bdd6b0b7360895
2018-05-17 11:55:28 -07:00
Christopher Koch 8e1deb2ab8 Fix another socket Dirent refcount.
PiperOrigin-RevId: 196893452
Change-Id: I5ea0f851fcabc5eac5859e61f15213323d996337
2018-05-16 14:54:48 -07:00
Chanwit Kaewkasi 3131a6b131 Verify that when offset address is not null, infile must be seekable
Change-Id: Id247399baeac58f6cd774acabd5d1da05e5b5697
PiperOrigin-RevId: 196887768
2018-05-16 14:20:24 -07:00
Zhaozhong Ni 5b4c20e1b8 netstack: make TCP endpoint closed and error state cleanup work synchronous.
So that when saving TCP endpoint in these states, there is no pending or
background activities.

Also lift tcp network save rejection error to tcpip package.

PiperOrigin-RevId: 196886839
Change-Id: I0fe73750f2743ec7e62d139eb2cec758c5dd6698
2018-05-16 14:15:24 -07:00
Christopher Koch d154c6a25f Refcount socket Dirents correctly.
This should fix the socket Dirent memory leak.

fs.NewFile takes a new reference. It should hold the *only* reference.
DecRef that socket Dirent.

Before the globalDirentMap was introduced, a mis-refcounted Dirent
would be garbage collected when all references to it were gone. For
socket Dirents, this meant that they would be garbage collected when
the associated fs.Files disappeared.

After the globalDirentMap, Dirents *must* be reference-counted
correctly to be garbage collected, as Dirents remove themselves
from the global map when their refcount goes to -1 (see Dirent.destroy).
That removes the last pointer to that Dirent.

PiperOrigin-RevId: 196878973
Change-Id: Ic7afcd1de97c7101ccb13be5fc31de0fb50963f0
2018-05-16 13:29:17 -07:00
Brian Geffon f295e26b8a Release mutex in BidirectionalConnect to avoid deadlock.
When doing a BidirectionalConnect we don't need to continue holding
the ConnectingEndpoint's mutex when creating the NewConnectedEndpoint
as it was held during the Connect. Additionally, we're not holding
the baseEndpoint mutex while Unregistering an event.

PiperOrigin-RevId: 196875557
Change-Id: Ied4ceed89de883121c6cba81bc62aa3a8549b1e9
2018-05-16 13:07:12 -07:00
Adin Scannell 4b7e4f3d36 Fix KVM EFAULT handling.
PiperOrigin-RevId: 196781718
Change-Id: I889766eed871929cdc247c6b9aa634398adea9c9
2018-05-15 22:44:40 -07:00
Adin Scannell 00adea3a3f Simplify KVM invalidation logic.
PiperOrigin-RevId: 196780209
Change-Id: I89f39eec914ce54a7c6c4f28e1b6d5ff5a7dd38d
2018-05-15 22:21:36 -07:00
Adin Scannell 310a99228b Simplify KVM state handling.
This also removes the dependency on tmutex.

PiperOrigin-RevId: 196764317
Change-Id: I523fb67454318e1a2ca9da3a08e63bfa3c1eeed3
2018-05-15 18:34:09 -07:00
Kevin Krakauer 96c28a4368 sentry: Replaces saving of inet.Stack with retrieval via context.
Previously, inet.Stack was referenced in 2 structs in sentry/socket that can be
saved/restored.  If an app is saved and restored on another machine, it may try
to use the old stack, which will have been replaced by a new stack on the new
machine.

PiperOrigin-RevId: 196733985
Change-Id: I6a8cfe73b5d7a90749734677dada635ab3389cb9
2018-05-15 14:56:18 -07:00
Fabricio Voznika 9889c29d6d Fix problem with sendfile(2) writing less data
When the amount of data read is more than the amount written, sendfile would not
adjust 'in file' position and would resume from the wrong location.

Closes #33

PiperOrigin-RevId: 196731287
Change-Id: Ia219895dd765016ed9e571fd5b366963c99afb27
2018-05-15 14:39:20 -07:00
Nicolas Lacasse 205f1027e6 Refactor the Sandbox package into Sandbox + Container.
This is a necessary prerequisite for supporting multiple containers in a single
sandbox.

All the commands (in cmd package) now call operations on Containers (container
package). When a Container first starts, it will create a Sandbox with the same
ID.

The Sandbox class is now simpler, as it only knows how to create boot/gofer
processes, and how to forward commands into the running boot process.

There are TODOs sprinkled around for additional support for multiple
containers. Most notably, we need to detect when a container is intended to run
in an existing sandbox (by reading the metadata), and then have some way to
signal to the sandbox to start a new container. Other urpc calls into the
sandbox need to pass the container ID, so the sandbox can run the operation on
the given container. These are only half-plummed through right now.

PiperOrigin-RevId: 196688269
Change-Id: I1ecf4abbb9dd8987a53ae509df19341aaf42b5b0
2018-05-15 10:18:03 -07:00
Adin Scannell ed02ac4f66 Disable INVPCID check; it's not used.
PiperOrigin-RevId: 196615029
Change-Id: Idfa383a9aee6a9397167a4231ce99d0b0e5b9912
2018-05-14 21:40:21 -07:00
Adin Scannell 2ab754cff7 Make KVM system call first check.
PiperOrigin-RevId: 196613447
Change-Id: Ib76902896798f072c3031b0c5cf7b433718928b7
2018-05-14 21:14:17 -07:00
Adin Scannell 825e9ea809 Simplify KVM host map handling.
PiperOrigin-RevId: 196611084
Change-Id: I6afa6b01e1dcd2aa9776dfc0f910874cc6b8d72c
2018-05-14 20:45:41 -07:00
Adin Scannell 17a0fa3af0 Ignore spurious KVM emulation failures.
PiperOrigin-RevId: 196609789
Change-Id: Ie261eea3b7fa05b6c348ca93e229de26cbd4dc7d
2018-05-14 20:27:21 -07:00
Kevin Krakauer 08879266fe sentry: Adds canonical mode support.
PiperOrigin-RevId: 196331627
Change-Id: Ifef4485f8202c52481af317cedd52d2ef48cea6a
2018-05-11 17:19:46 -07:00
Zhaozhong Ni 987f7841a6 netstack: TCP connecting state endpoint save / restore support.
PiperOrigin-RevId: 196325647
Change-Id: I850eb4a29b9c679da4db10eb164bbdf967690663
2018-05-11 16:28:39 -07:00
Zhaozhong Ni 85fd5d40ff netstack: release rcv lock after ping socket save is done.
PiperOrigin-RevId: 196324694
Change-Id: Ia3a48976433f21622eacb4a38fefe7143ca5e31b
2018-05-11 16:20:50 -07:00
Michael Pratt 8deabbaae1 Remove error return from AddressSpace.Release()
PiperOrigin-RevId: 196291289
Change-Id: Ie3487be029850b0b410b82416750853a6c4a2b00
2018-05-11 12:24:15 -07:00