55c553ae8c
Package syncevent is intended to subsume ~all uses of channels in the sentry (including //pkg/waiter), as well as //pkg/sleep.
Compared to channels:
- Delivery of events to a syncevent.Receiver allows *synchronous* execution of an arbitrary callback, whereas delivery of events to a channel requires a goroutine to receive from that channel, resulting in substantial scheduling overhead. (This is also part of the motivation for the waiter package.)
- syncevent.Waiter can wait on multiple event sources without the high O(N) overhead of select. (This is the same motivation as for the sleep package.)
Compared to the waiter package:
- syncevent.Waiters are intended to be persistent (i.e. per-kernel.Task), and syncevent.Broadcaster (analogous to waiter.Queue) is a hash table rather than a linked list, such that blocking is (usually) allocation-free.
- syncevent.Source (analogous to waiter.Waitable) does not include an equivalent to waiter.Waitable.Readiness(), since this is inappropriate for transient events (see e.g. //pkg/sentry/kernel/time.ClockEventSource).
Compared to the sleep package:
- syncevent events are represented by bits in a bitmask rather than discrete sleep.Waker objects, reducing overhead and making it feasible to broadcast events to multiple syncevent.Receivers.
- syncevent.Receiver invokes an arbitrary callback, which is required by the sentry's epoll implementation. (syncevent.Waiter, which is analogous to sleep.Sleeper, pairs a syncevent.Receiver with a callback that wakes a waiting goroutine; the implementation of this aspect is nearly identical to that of sleep.Sleeper, except that it represents *runtime.g as unsafe.Pointer rather than uintptr.)
- syncevent.Waiter.Wait (analogous to sleep.Sleeper.Fetch(block=true)) does not automatically un-assert returned events. This is useful in cases where the path for handling an event is not the same as the path that observes it, such as for application signals (a la Linux's TIF_SIGPENDING).
- Unlike sleep.Sleeper, which Fetches Wakers in the order that they were Asserted, the event bitmasks used by syncevent.Receiver have no way of preserving event arrival order. (This is similar to select, which goes out of its way to randomize event ordering.)
The disadvantage of the syncevent package is that, since events are represented by bits in a uint64 bitmask, each syncevent.Receiver can "only" multiplex between 64 distinct events; this does not affect any known use case.
Benchmarks:
BenchmarkBroadcasterSubscribeUnsubscribe-12 45133884 26.3 ns/op
BenchmarkMapSubscribeUnsubscribe-12 28504662 41.8 ns/op
BenchmarkQueueSubscribeUnsubscribe-12 22747668 45.6 ns/op
BenchmarkBroadcasterSubscribeUnsubscribeBatch-12 31609177 37.8 ns/op
BenchmarkMapSubscribeUnsubscribeBatch-12 17563906 62.1 ns/op
BenchmarkQueueSubscribeUnsubscribeBatch-12 26248838 46.6 ns/op
BenchmarkBroadcasterBroadcastRedundant/0-12 100907563 11.8 ns/op
BenchmarkBroadcasterBroadcastRedundant/1-12 85103068 13.3 ns/op
BenchmarkBroadcasterBroadcastRedundant/4-12 52716502 22.3 ns/op
BenchmarkBroadcasterBroadcastRedundant/16-12 20278165 58.7 ns/op
BenchmarkBroadcasterBroadcastRedundant/64-12 5905428 205 ns/op
BenchmarkMapBroadcastRedundant/0-12 87532734 13.5 ns/op
BenchmarkMapBroadcastRedundant/1-12 28488411 36.3 ns/op
BenchmarkMapBroadcastRedundant/4-12 19628920 60.9 ns/op
BenchmarkMapBroadcastRedundant/16-12 6026980 192 ns/op
BenchmarkMapBroadcastRedundant/64-12 1640858 754 ns/op
BenchmarkQueueBroadcastRedundant/0-12 96904807 12.0 ns/op
BenchmarkQueueBroadcastRedundant/1-12 73521873 16.3 ns/op
BenchmarkQueueBroadcastRedundant/4-12 39209468 31.2 ns/op
BenchmarkQueueBroadcastRedundant/16-12 10810058 105 ns/op
BenchmarkQueueBroadcastRedundant/64-12 2998046 376 ns/op
BenchmarkBroadcasterBroadcastAck/1-12 44472397 26.4 ns/op
BenchmarkBroadcasterBroadcastAck/4-12 17653509 69.7 ns/op
BenchmarkBroadcasterBroadcastAck/16-12 4082617 260 ns/op
BenchmarkBroadcasterBroadcastAck/64-12 1220534 1027 ns/op
BenchmarkMapBroadcastAck/1-12 26760705 44.2 ns/op
BenchmarkMapBroadcastAck/4-12 11495636 100 ns/op
BenchmarkMapBroadcastAck/16-12 2937590 343 ns/op
BenchmarkMapBroadcastAck/64-12 861037 1344 ns/op
BenchmarkQueueBroadcastAck/1-12 19832679 55.0 ns/op
BenchmarkQueueBroadcastAck/4-12 5618214 189 ns/op
BenchmarkQueueBroadcastAck/16-12 1569980 713 ns/op
BenchmarkQueueBroadcastAck/64-12 437672 2814 ns/op
BenchmarkWaiterNotifyRedundant-12 650823090 1.96 ns/op
BenchmarkSleeperNotifyRedundant-12 619871544 1.61 ns/op
BenchmarkChannelNotifyRedundant-12 298903778 3.67 ns/op
BenchmarkWaiterNotifyWaitAck-12 68358360 17.8 ns/op
BenchmarkSleeperNotifyWaitAck-12 25044883 41.2 ns/op
BenchmarkChannelNotifyWaitAck-12 29572404 40.2 ns/op
BenchmarkSleeperMultiNotifyWaitAck-12 16122969 73.8 ns/op
BenchmarkWaiterTempNotifyWaitAck-12 46111489 25.8 ns/op
BenchmarkSleeperTempNotifyWaitAck-12 15541882 73.6 ns/op
BenchmarkWaiterNotifyWaitMultiAck-12 65878500 18.2 ns/op
BenchmarkSleeperNotifyWaitMultiAck-12 28798623 41.5 ns/op
BenchmarkChannelNotifyWaitMultiAck-12 11308468 101 ns/op
BenchmarkWaiterNotifyAsyncWaitAck-12 2475387 492 ns/op
BenchmarkSleeperNotifyAsyncWaitAck-12 2184507 518 ns/op
BenchmarkChannelNotifyAsyncWaitAck-12 2120365 562 ns/op
BenchmarkWaiterNotifyAsyncWaitMultiAck-12 2351247 494 ns/op
BenchmarkSleeperNotifyAsyncWaitMultiAck-12 2205799 522 ns/op
BenchmarkChannelNotifyAsyncWaitMultiAck-12 1238079 928 ns/op
Updates #1074
PiperOrigin-RevId: 295834087
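The commit message above describes events as bits in a uint64 bitmask, delivered by synchronously invoking a callback and left pending until explicitly acknowledged. A minimal sketch of that idea follows. The names eventSet, receiver, notify, pendingEvents, and ack, and the three event constants, are hypothetical and chosen for illustration; this is not the syncevent package's actual API or implementation.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// eventSet is a bitmask of events; each bit is one distinct event, so a
// single receiver can multiplex at most 64 events. (Hypothetical type.)
type eventSet uint64

// Hypothetical event bits used only for this illustration.
const (
	eventReadable eventSet = 1 << iota
	eventWritable
	eventSignal
)

// receiver accumulates pending events in one word and runs a callback
// synchronously, on the notifier's goroutine, when new bits become pending.
type receiver struct {
	pending  uint64 // accessed atomically
	callback func(eventSet)
}

// notify marks es as pending. The callback sees only the bits that were not
// already pending, so a redundant notification does no further work.
func (r *receiver) notify(es eventSet) {
	for {
		old := atomic.LoadUint64(&r.pending)
		newly := uint64(es) &^ old
		if newly == 0 {
			return // every bit in es was already pending
		}
		if atomic.CompareAndSwapUint64(&r.pending, old, old|newly) {
			if r.callback != nil {
				r.callback(eventSet(newly))
			}
			return
		}
	}
}

// pendingEvents returns the pending bits without clearing them, so the code
// that observes an event need not be the code that handles it.
func (r *receiver) pendingEvents() eventSet {
	return eventSet(atomic.LoadUint64(&r.pending))
}

// ack clears the given bits once the corresponding events are handled.
func (r *receiver) ack(es eventSet) {
	for {
		old := atomic.LoadUint64(&r.pending)
		if atomic.CompareAndSwapUint64(&r.pending, old, old&^uint64(es)) {
			return
		}
	}
}

func main() {
	r := &receiver{callback: func(es eventSet) {
		fmt.Printf("callback: newly pending %03b\n", es)
	}}
	r.notify(eventReadable | eventSignal) // invokes the callback once
	r.notify(eventReadable)               // redundant: no callback
	fmt.Printf("pending before ack: %03b\n", r.pendingEvents())
	r.ack(eventReadable)
	fmt.Printf("pending after ack:  %03b\n", r.pendingEvents())
}
```

Because the pending set is a single word, arrival order among events is not recorded, which matches the ordering caveat in the commit message.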
README.md
What is gVisor?
gVisor is a user-space kernel, written in Go, that implements a substantial portion of the Linux system surface. It includes an Open Container Initiative (OCI) runtime called runsc that provides an isolation boundary between the application and the host kernel. The runsc runtime integrates with Docker and Kubernetes, making it simple to run sandboxed containers.
Why does gVisor exist?
Containers are not a sandbox. While containers have revolutionized how we develop, package, and deploy applications, running untrusted or potentially malicious code without additional isolation is not a good idea. The efficiency and performance gains from using a single, shared kernel also mean that container escape is possible with a single vulnerability.
gVisor is a user-space kernel for containers. It limits the host kernel surface accessible to the application while still giving the application access to all the features it expects. Unlike most kernels, gVisor does not assume or require a fixed set of physical resources; instead, it leverages existing host kernel functionality and runs as a normal user-space process. In other words, gVisor implements Linux by way of Linux.
gVisor should not be confused with technologies and tools to harden containers against external threats, provide additional integrity checks, or limit the scope of access for a service. One should always be careful about what data is made available to a container.
Documentation
User documentation and technical architecture, including quick start guides, can be found at gvisor.dev.
Installing from source
gVisor currently requires x86_64 Linux to build, though support for other architectures may become available in the future.
Requirements
Make sure the following dependencies are installed:
- Linux 4.14.77+ (see the documentation for notes on older Linux kernels)
- git
- Bazel 1.2+
- Python
- Docker version 17.09.0 or greater
- C++ toolchain supporting C++17 (GCC 7+, Clang 5+)
- Gold linker (e.g. binutils-gold package on Ubuntu)
Building
Build and install the runsc binary:
bazel build runsc
sudo cp ./bazel-bin/runsc/linux_amd64_pure_stripped/runsc /usr/local/bin
If you don't want to install Bazel on your system, you can build runsc in a Docker container:
make runsc
sudo cp ./bazel-bin/runsc/linux_amd64_pure_stripped/runsc /usr/local/bin
Testing
The test suite can be run with Bazel:
bazel test //...
or in a Docker container:
make unit-tests
make tests
Using remote execution
If you have a Remote Build Execution environment, you can use it to speed up build and test cycles.
You must authenticate with the project first:
gcloud auth application-default login --no-launch-browser
Then invoke bazel with the following flags:
--config=remote
--project_id=$PROJECT
--remote_instance_name=projects/$PROJECT/instances/default_instance
You can also add those flags to your local ~/.bazelrc to avoid needing to specify them each time on the command line.
Using go get
This project uses Bazel to build and manage dependencies. A synthetic go branch is maintained that is compatible with standard go tooling for convenience.
For example, to build runsc directly from this branch:
echo "module runsc" > go.mod
GO111MODULE=on go get gvisor.dev/gvisor/runsc@go
CGO_ENABLED=0 GO111MODULE=on go install gvisor.dev/gvisor/runsc
Note that this branch is supported on a best-effort basis, and direct development on this branch is not supported. Development should occur on the master branch, which is then reflected into the go branch.
Community & Governance
The governance model is documented in our community repository.
The gvisor-users mailing list and gvisor-dev mailing list are good starting points for questions and discussion.
Security Policy
See SECURITY.md.
Contributing
See CONTRIBUTING.md.