957 Commits

Author SHA1 Message Date
Chanwit Kaewkasi 7b2b7a3946 Change length type, and let fadvise64 return ESPIPE if file is a pipe
Kernels before 2.6.16 return EINVAL, but later kernels return ESPIPE for this case.
Also change the type of "length" from Uint (uint32) to Int64, because the C header
uses type "size_t" (unsigned long) or "off_t" (long) for length.
It also makes more sense to check length < 0 with Int64, because a Uint cannot be negative.

Change-Id: Ifd7fea2dcded7577a30760558d0d31f479f074c4
PiperOrigin-RevId: 197616743
2018-05-22 13:48:14 -07:00
Kevin Krakauer 705605f901 sentry: Add simple SIOCGIFFLAGS support (IFF_RUNNING and IFF_PROMISC).
Establishes a way of communicating interface flags between netstack and
epsocket. More flags can be added over time.

PiperOrigin-RevId: 197616669
Change-Id: I230448c5fb5b7d2e8d69b41a451eb4e1096a0e30
2018-05-22 13:47:33 -07:00
Ian Gudger 3a6070dc98 Clarify that syserr.New must only be called during init
PiperOrigin-RevId: 197599402
Change-Id: I23eb0336195ab0d3e5fb49c0c57fc9e0715a9b75
2018-05-22 11:54:31 -07:00
Fabricio Voznika ed2b86a549 Fix test failure when user can't mount temp dir
PiperOrigin-RevId: 197491098
Change-Id: Ifb75bd4e4f41b84256b6d7afc4b157f6ce3839f3
2018-05-21 17:48:04 -07:00
Adin Scannell 61b0b19497 Dramatically improve handling of KVM vCPU pool.
Especially in situations with small numbers of vCPUs, the existing
system resulted in excessive thrashing. Now, execution contexts
co-ordinate as smoothly as they can to share a small number of cores.

PiperOrigin-RevId: 197483323
Change-Id: I0afc0c5363ea9386994355baf3904bf5fe08c56c
2018-05-21 16:49:40 -07:00
Kevin Krakauer d4c81b7a21 sentry: Get "ip link" working.
In Linux, many UDS ioctls are passed through to the NIC driver. We do the same
here, passing ioctl calls to Unix sockets through to epsocket.

In Linux you can see this path at net/socket.c:sock_ioctl, which calls
sock_do_ioctl, which calls net/core/dev_ioctl.c:dev_ioctl.

SIOCGIFNAME is also added.

PiperOrigin-RevId: 197167508
Change-Id: I62c326a4792bd0a473e9c9108aafb6a6354f2b64
2018-05-18 10:43:41 -07:00
Fabricio Voznika a1e5862f3c Move postgres to list of supported images
PiperOrigin-RevId: 197104043
Change-Id: I377c0727ebf0c44361ed221e1b197787825bfb7b
2018-05-17 23:22:40 -07:00
Michael Pratt b960559fdb Cleanup docs
This brings the proc document more up-to-date.

PiperOrigin-RevId: 197070161
Change-Id: Iae2cf9dc44e3e748a33f497bb95bd3c10d0c094a
2018-05-17 16:26:42 -07:00
Rahat Mahmood b904250b86 Fix capability check for sysv semaphores.
Capabilities for sysv sem operations were being checked against the
current task's user namespace. They should be checked against the user
namespace owning the ipc namespace for the sems instead, per
ipc/util.c:ipcperms().

PiperOrigin-RevId: 197063111
Change-Id: Iba29486b316f2e01ee331dda4e48a6ab7960d589
2018-05-17 15:38:11 -07:00
Rahat Mahmood 8878a66a56 Implement sysv shm.
PiperOrigin-RevId: 197058289
Change-Id: I3946c25028b7e032be4894d61acb48ac0c24d574
2018-05-17 15:06:19 -07:00
Ian Gudger a8d7cee3e8 Fix sendto for dual stack UDP sockets
Previously, dual stack UDP sockets bound to an IPv4 address could not use
sendto to communicate with IPv4 addresses. Further, dual stack UDP sockets
bound to an IPv6 address could use sendto to communicate with IPv4 addresses.
Neither of these behaviors is consistent with Linux.

PiperOrigin-RevId: 197036024
Change-Id: Ic3713efc569f26196e35bb41e6ad63f23675fc90
2018-05-17 12:50:22 -07:00
Nicolas Lacasse 31386185fe Push signal-delivery and wait into the sandbox.
This is another step towards multi-container support.

Previously, we delivered signals directly to the sandbox process (which then
forwarded the signal to PID 1 inside the sandbox). Similarly, we waited on a
container by waiting on the sandbox process itself. This approach will not work
when there are multiple containers inside the sandbox, and we need to
signal/wait on individual containers.

This CL adds two new messages, ContainerSignal and ContainerWait. These
messages include the id of the container to signal/wait. The controller inside
the sandbox receives these messages and signals/waits on the appropriate
process inside the sandbox.

The container id is plumbed into the sandbox, but it currently is not used. We
still end up signaling/waiting on PID 1 in all cases.  Once we actually have
multiple containers inside the sandbox, we will need to keep some sort of map
of container id -> pid (or possibly pid namespace), and signal/kill the
appropriate process for the container.

PiperOrigin-RevId: 197028366
Change-Id: I07b4d5dc91ecd2affc1447e6b4bdd6b0b7360895
2018-05-17 11:55:28 -07:00
Christopher Koch 8e1deb2ab8 Fix another socket Dirent refcount.
PiperOrigin-RevId: 196893452
Change-Id: I5ea0f851fcabc5eac5859e61f15213323d996337
2018-05-16 14:54:48 -07:00
Chanwit Kaewkasi 3131a6b131 Verify that when offset address is not null, infile must be seekable
Change-Id: Id247399baeac58f6cd774acabd5d1da05e5b5697
PiperOrigin-RevId: 196887768
2018-05-16 14:20:24 -07:00
Zhaozhong Ni 5b4c20e1b8 netstack: make TCP endpoint closed and error state cleanup work synchronous.
This ensures that when a TCP endpoint in one of these states is saved, there
are no pending or background activities.

Also lift the TCP network save rejection error to the tcpip package.

PiperOrigin-RevId: 196886839
Change-Id: I0fe73750f2743ec7e62d139eb2cec758c5dd6698
2018-05-16 14:15:24 -07:00
Christopher Koch d154c6a25f Refcount socket Dirents correctly.
This should fix the socket Dirent memory leak.

fs.NewFile takes a new reference. It should hold the *only* reference.
DecRef that socket Dirent.

Before the globalDirentMap was introduced, a mis-refcounted Dirent
would be garbage collected when all references to it were gone. For
socket Dirents, this meant that they would be garbage collected when
the associated fs.Files disappeared.

After the globalDirentMap, Dirents *must* be reference-counted
correctly to be garbage collected, as Dirents remove themselves
from the global map when their refcount goes to -1 (see Dirent.destroy).
That removes the last pointer to that Dirent.

PiperOrigin-RevId: 196878973
Change-Id: Ic7afcd1de97c7101ccb13be5fc31de0fb50963f0
2018-05-16 13:29:17 -07:00
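
To illustrate why an extra reference leaks once a global map is involved, here is a simplified sketch. All names (entry, globalMap, newEntry) are hypothetical, and it uses a plain count that destroys at zero rather than the convention in the commit, where destruction happens when the count reaches -1.

```go
package main

import (
	"fmt"
	"sync"
)

var (
	mu        sync.Mutex
	globalMap = map[string]*entry{} // keeps every live entry reachable
)

// entry is a hypothetical stand-in for a reference-counted Dirent.
type entry struct {
	name string
	refs int
}

// newEntry registers the entry and hands the creator one reference.
func newEntry(name string) *entry {
	mu.Lock()
	defer mu.Unlock()
	e := &entry{name: name, refs: 1}
	globalMap[name] = e
	return e
}

func (e *entry) IncRef() {
	mu.Lock()
	defer mu.Unlock()
	e.refs++
}

// DecRef drops one reference and unregisters the entry on the last drop.
func (e *entry) DecRef() {
	mu.Lock()
	defer mu.Unlock()
	e.refs--
	if e.refs == 0 {
		delete(globalMap, e.name)
	}
}

func liveEntries() int {
	mu.Lock()
	defer mu.Unlock()
	return len(globalMap)
}

func main() {
	e := newEntry("socket")
	e.IncRef() // the file object takes its own reference
	e.DecRef() // the creator drops its reference, as in the fix
	fmt.Println("live after creator DecRef:", liveEntries())
	e.DecRef() // the file is closed
	fmt.Println("live after file close:", liveEntries())
}
```

If the creator's DecRef is omitted, the count never reaches zero and the map entry keeps the object alive indefinitely, which is the leak the commit describes.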
Brian Geffon f295e26b8a Release mutex in BidirectionalConnect to avoid deadlock.
When doing a BidirectionalConnect we don't need to continue holding
the ConnectingEndpoint's mutex when creating the NewConnectedEndpoint
as it was held during the Connect. Additionally, we're not holding
the baseEndpoint mutex while Unregistering an event.

PiperOrigin-RevId: 196875557
Change-Id: Ied4ceed89de883121c6cba81bc62aa3a8549b1e9
2018-05-16 13:07:12 -07:00
Adin Scannell 4b7e4f3d36 Fix KVM EFAULT handling.
PiperOrigin-RevId: 196781718
Change-Id: I889766eed871929cdc247c6b9aa634398adea9c9
2018-05-15 22:44:40 -07:00
Adin Scannell 00adea3a3f Simplify KVM invalidation logic.
PiperOrigin-RevId: 196780209
Change-Id: I89f39eec914ce54a7c6c4f28e1b6d5ff5a7dd38d
2018-05-15 22:21:36 -07:00
Adin Scannell 310a99228b Simplify KVM state handling.
This also removes the dependency on tmutex.

PiperOrigin-RevId: 196764317
Change-Id: I523fb67454318e1a2ca9da3a08e63bfa3c1eeed3
2018-05-15 18:34:09 -07:00
Kevin Krakauer 96c28a4368 sentry: Replaces saving of inet.Stack with retrieval via context.
Previously, inet.Stack was referenced in 2 structs in sentry/socket that can be
saved/restored.  If an app is saved and restored on another machine, it may try
to use the old stack, which will have been replaced by a new stack on the new
machine.

PiperOrigin-RevId: 196733985
Change-Id: I6a8cfe73b5d7a90749734677dada635ab3389cb9
2018-05-15 14:56:18 -07:00
Fabricio Voznika 9889c29d6d Fix problem with sendfile(2) writing less data
When the amount of data read is more than the amount written, sendfile would not
adjust 'in file' position and would resume from the wrong location.

Closes #33

PiperOrigin-RevId: 196731287
Change-Id: Ia219895dd765016ed9e571fd5b366963c99afb27
2018-05-15 14:39:20 -07:00
Nicolas Lacasse 205f1027e6 Refactor the Sandbox package into Sandbox + Container.
This is a necessary prerequisite for supporting multiple containers in a single
sandbox.

All the commands (in cmd package) now call operations on Containers (container
package). When a Container first starts, it will create a Sandbox with the same
ID.

The Sandbox class is now simpler, as it only knows how to create boot/gofer
processes, and how to forward commands into the running boot process.

There are TODOs sprinkled around for additional support for multiple
containers. Most notably, we need to detect when a container is intended to run
in an existing sandbox (by reading the metadata), and then have some way to
signal to the sandbox to start a new container. Other urpc calls into the
sandbox need to pass the container ID, so the sandbox can run the operation on
the given container. These are only half-plumbed through right now.

PiperOrigin-RevId: 196688269
Change-Id: I1ecf4abbb9dd8987a53ae509df19341aaf42b5b0
2018-05-15 10:18:03 -07:00
Adin Scannell ed02ac4f66 Disable INVPCID check; it's not used.
PiperOrigin-RevId: 196615029
Change-Id: Idfa383a9aee6a9397167a4231ce99d0b0e5b9912
2018-05-14 21:40:21 -07:00
Adin Scannell 2ab754cff7 Make KVM system call first check.
PiperOrigin-RevId: 196613447
Change-Id: Ib76902896798f072c3031b0c5cf7b433718928b7
2018-05-14 21:14:17 -07:00
Adin Scannell 825e9ea809 Simplify KVM host map handling.
PiperOrigin-RevId: 196611084
Change-Id: I6afa6b01e1dcd2aa9776dfc0f910874cc6b8d72c
2018-05-14 20:45:41 -07:00
Adin Scannell 17a0fa3af0 Ignore spurious KVM emulation failures.
PiperOrigin-RevId: 196609789
Change-Id: Ie261eea3b7fa05b6c348ca93e229de26cbd4dc7d
2018-05-14 20:27:21 -07:00
Kevin Krakauer 08879266fe sentry: Adds canonical mode support.
PiperOrigin-RevId: 196331627
Change-Id: Ifef4485f8202c52481af317cedd52d2ef48cea6a
2018-05-11 17:19:46 -07:00
Zhaozhong Ni 987f7841a6 netstack: TCP connecting state endpoint save / restore support.
PiperOrigin-RevId: 196325647
Change-Id: I850eb4a29b9c679da4db10eb164bbdf967690663
2018-05-11 16:28:39 -07:00
Zhaozhong Ni 85fd5d40ff netstack: release rcv lock after ping socket save is done.
PiperOrigin-RevId: 196324694
Change-Id: Ia3a48976433f21622eacb4a38fefe7143ca5e31b
2018-05-11 16:20:50 -07:00
Michael Pratt 8deabbaae1 Remove error return from AddressSpace.Release()
PiperOrigin-RevId: 196291289
Change-Id: Ie3487be029850b0b410b82416750853a6c4a2b00
2018-05-11 12:24:15 -07:00
Jamie Liu 12c161f278 Implement MAP_32BIT.
PiperOrigin-RevId: 196281052
Change-Id: Ie620a0f983a1bf2570d0003d4754611879335c1c
2018-05-11 11:18:31 -07:00
Nicolas Lacasse f24db99498 Update README to point to nightly builds.
The "install from source" section is moved under "advanced" header, right
before the testing section.

PiperOrigin-RevId: 196271666
Change-Id: I653ac0a2fa4661c96a0cb3daf3528c2109fed8d7
2018-05-11 10:23:41 -07:00
Fabricio Voznika 7cff8489de Fix failure to rename directory
os.Rename validates that the target doesn't exist, which is different from
syscall.Rename, which replaces the target if both are directories. fsgofer needs
the syscall behavior.

PiperOrigin-RevId: 196194630
Change-Id: I87d08cad88b5ef310b245cd91647c4f5194159d8
2018-05-10 17:13:10 -07:00
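
The difference described above can be observed with a short experiment. This sketch assumes a Linux host, where rename(2) may replace an empty target directory; the function name renameOntoEmptyDir and the directory names are made up for the example.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"syscall"
)

// renameOntoEmptyDir creates two empty directories and tries to rename one
// onto the other, first with os.Rename and then with syscall.Rename.
func renameOntoEmptyDir() (osErr, sysErr error) {
	tmp, err := os.MkdirTemp("", "rename-demo")
	if err != nil {
		return err, err
	}
	defer os.RemoveAll(tmp)

	src := filepath.Join(tmp, "src")
	dst := filepath.Join(tmp, "dst")
	if err := os.Mkdir(src, 0755); err != nil {
		return err, err
	}
	if err := os.Mkdir(dst, 0755); err != nil {
		return err, err
	}

	// os.Rename refuses because the target is an existing directory.
	osErr = os.Rename(src, dst)
	// The raw system call replaces the empty target directory.
	sysErr = syscall.Rename(src, dst)
	return osErr, sysErr
}

func main() {
	osErr, sysErr := renameOntoEmptyDir()
	fmt.Println("os.Rename:     ", osErr)
	fmt.Println("syscall.Rename:", sysErr)
}
```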
Chanwit Kaewkasi 7b6111b695 Display the current git revision in the info block
Change-Id: I9737cc680968033ba82c95bb04cc482fcaa12642
PiperOrigin-RevId: 196192683
2018-05-10 16:57:41 -07:00
Fabricio Voznika ac01f245ff Skip atime and mtime update when file is backed by host FD
When a file is backed by a host FD, the atime and mtime of the host file and the
cached attributes in the Sentry must be close together. In this case,
the call to update atime and mtime can be skipped. This is important when
the host filesystem uses overlay, because updating atime and mtime explicitly
forces a copy-up for every file that is touched.

PiperOrigin-RevId: 196176413
Change-Id: I3933ea91637a071ba2ea9db9d8ac7cdba5dc0482
2018-05-10 14:59:40 -07:00
Fabricio Voznika 31a4fefbe0 Make cachePolicy int to avoid string comparison
PiperOrigin-RevId: 196157086
Change-Id: Ia7f7ffe1bf486b21ef8091e2e8ef9a9faf733dfc
2018-05-10 12:47:15 -07:00
Nicolas Lacasse 9d91c44d77 Fix nightly release upload path.
The "nightly/latest" was duplicated.

PiperOrigin-RevId: 196156453
Change-Id: Iccac65d870f3eb44c4bd97bcbed5cc436cb1d3c9
2018-05-10 12:42:17 -07:00
Fabricio Voznika 5a509c47a2 Open file as read-write when mount points to a file
This is to allow files mapped directly, like /etc/hosts, to be writable.
Closes #40

PiperOrigin-RevId: 196155920
Change-Id: Id2027e421cef5f94a0951c3e18b398a77c285bbd
2018-05-10 12:38:36 -07:00
Nicolas Lacasse 2d3c6dc2ef Upload the nightly release to a "nightly/latest" bucket for easy download.
We also upload to a path with the current date, so that previous builds are
archived. Since these builds only include the date (and not time) their links
are somewhat discoverable as well.

PiperOrigin-RevId: 196147475
Change-Id: I54792d7a4ba2a7af24a51cd9b9f153c7744b310b
2018-05-10 11:40:29 -07:00
Nicolas Lacasse e2720f91dc Put the http dependencies first in the WORKSPACE file.
PiperOrigin-RevId: 196131690
Change-Id: I3a4eec0dcca654380ea229e3ae388ca416200110
2018-05-10 10:07:14 -07:00
Nicolas Lacasse 3271d549f0 Build nightly runsc releases with Kokoro.
PiperOrigin-RevId: 196129010
Change-Id: I655eb3eecf24ffff475b3882ec55a8b55e6d2f36
2018-05-10 09:48:50 -07:00
Nicolas Lacasse 0ca020dcb3 Use the go_repository rule from the Gazelle repo.
The one from rules_go is being deprecated.

PiperOrigin-RevId: 196128132
Change-Id: I7a4ab32696a1bcd221b0585b7a4e8109462a3609
2018-05-10 09:41:58 -07:00
Nicolas Lacasse c97f0978b7 Cache symlinks in addition to files and directories.
PiperOrigin-RevId: 196051326
Change-Id: I4195b110e9a7d38d1ce1ed9c613971dea1be3bf0
2018-05-09 16:58:21 -07:00
Nicolas Lacasse b3bfb24991 Small readme tweak.
Change-Id: Ibbb94cfd901d72d879657aca38bf3db1580f0d62
PiperOrigin-RevId: 196043734
2018-05-09 16:01:24 -07:00
Fabricio Voznika 4453b56bd9 Increment link count in CreateHardlink
Closes #28

PiperOrigin-RevId: 196041391
Change-Id: I5d79f1735b9d72744e8bebc6897002b27df9aa7a
2018-05-09 15:44:26 -07:00
Nicolas Lacasse 1bdec86bae Return better errors from Docker when runsc fails to start.
Two changes in this CL:

First, make the "boot" process sleep when it encounters an error to give the
controller time to send the error back to the "start" process. Otherwise the
"boot" process exits immediately and the control connection errors with EOF.

Secondly, open the log file with O_APPEND, not O_TRUNC. Docker uses the same
log file for all runtime commands, and setting O_TRUNC causes them to get
destroyed. Furthermore, containerd parses these log files in the event of an
error, and it does not like the file being truncated out from underneath it.

Now, when trying to run a binary that does not exist in the image, the error
message is more reasonable:

$ docker run alpine /not/found
docker: Error response from daemon: OCI runtime start failed: /usr/local/google/docker/runtimes/runscd did not terminate sucessfully: error starting sandbox: error starting application [/not/found]: failed to create init process: no such file or directory

Fixes #32

PiperOrigin-RevId: 196027084
Change-Id: Iabc24c0bdd8fc327237acc051a1655515f445e68
2018-05-09 14:13:37 -07:00
Googler 5ed969aff0 Internal change.
PiperOrigin-RevId: 195980843
Change-Id: I066f9696b69e92e144c2c8d2c2aa52c546df94fb
2018-05-09 09:21:25 -07:00
Zhaozhong Ni ad278d6944 state: serialize string as bytes instead of protobuf string.
Protobuf strings have to be UTF-8 encoded or 7-bit ASCII.

PiperOrigin-RevId: 195902557
Change-Id: I9800afd47ecfa6615e28a2cce7f2532f04f10763
2018-05-08 17:23:50 -07:00
Jamie Liu 10a2cfc6a9 Implement /proc/[pid]/statm.
PiperOrigin-RevId: 195893391
Change-Id: I645b7042d7f4f9dd54723afde3e5df0986e43160
2018-05-08 16:14:48 -07:00