gvisor


Jamie Liu 163ab5e9ba Sentry virtual filesystem, v2

Major differences from the current ("v1") sentry VFS:

- Path resolution is Filesystem-driven (FilesystemImpl methods call vfs.ResolvingPath methods) rather than VFS-driven (fs package owns a Dirent tree and calls fs.InodeOperations methods to populate it). This drastically improves performance, primarily by reducing overhead from inefficient synchronization and indirection. It also makes it possible to implement remote filesystem protocols that translate FS system calls into single RPCs, rather than having to make (at least) one RPC per path component, significantly reducing the latency of remote filesystems (especially during cold starts and for uncacheable shared filesystems).

- Mounts are correctly represented as a separate check based on contextual state (current mount) rather than direct replacement in a fs.Dirent tree. This makes it possible to support (non-recursive) bind mounts and mount namespaces.

Included in this CL is fsimpl/memfs, an incomplete in-memory filesystem that exists primarily to demonstrate intended filesystem implementation patterns and for benchmarking:

BenchmarkVFS1TmpfsStat/1-6            3000000      497 ns/op
BenchmarkVFS1TmpfsStat/2-6            2000000      676 ns/op
BenchmarkVFS1TmpfsStat/3-6            2000000      904 ns/op
BenchmarkVFS1TmpfsStat/8-6            1000000     1944 ns/op
BenchmarkVFS1TmpfsStat/64-6            100000    14067 ns/op
BenchmarkVFS1TmpfsStat/100-6            50000    21700 ns/op
BenchmarkVFS2MemfsStat/1-6           10000000      197 ns/op
BenchmarkVFS2MemfsStat/2-6            5000000      233 ns/op
BenchmarkVFS2MemfsStat/3-6            5000000      268 ns/op
BenchmarkVFS2MemfsStat/8-6            3000000      477 ns/op
BenchmarkVFS2MemfsStat/64-6            500000     2592 ns/op
BenchmarkVFS2MemfsStat/100-6           300000     4045 ns/op
BenchmarkVFS1TmpfsMountStat/1-6       2000000      679 ns/op
BenchmarkVFS1TmpfsMountStat/2-6       2000000      912 ns/op
BenchmarkVFS1TmpfsMountStat/3-6       1000000     1113 ns/op
BenchmarkVFS1TmpfsMountStat/8-6       1000000     2118 ns/op
BenchmarkVFS1TmpfsMountStat/64-6       100000    14251 ns/op
BenchmarkVFS1TmpfsMountStat/100-6      100000    22397 ns/op
BenchmarkVFS2MemfsMountStat/1-6       5000000      317 ns/op
BenchmarkVFS2MemfsMountStat/2-6       5000000      361 ns/op
BenchmarkVFS2MemfsMountStat/3-6       5000000      387 ns/op
BenchmarkVFS2MemfsMountStat/8-6       3000000      582 ns/op
BenchmarkVFS2MemfsMountStat/64-6       500000     2699 ns/op
BenchmarkVFS2MemfsMountStat/100-6      300000     4133 ns/op

From this we can infer that, on this machine:

- Constant cost for tmpfs stat() is ~160ns in VFS2 and ~280ns in VFS1.

- Per-path-component cost is ~35ns in VFS2 and ~215ns in VFS1, a difference of about 6x.

- The cost of crossing a mount boundary is about 80ns in VFS2 (MemfsMountStat/1 does approximately the same amount of work as MemfsStat/2, except that it also crosses a mount boundary). This is an inescapable cost of the separate mount lookup needed to support bind mounts and mount namespaces.

PiperOrigin-RevId: 258853946

2019-07-18 15:10:29 -07:00
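
The estimates in the commit message can be roughly reproduced from the benchmark figures. The following is a minimal sketch, assuming the cost of stat() is linear in the number of path components; the `sample` type and `fit` function are hypothetical names introduced here for illustration and are not part of gVisor.

```go
package main

import "fmt"

// sample is one benchmark result: the number of path components
// traversed and the measured cost in ns/op. (Hypothetical helper type.)
type sample struct {
	components int
	nsPerOp    float64
}

// fit returns the slope (per-component cost) and intercept (constant
// cost) of the straight line through two samples.
func fit(a, b sample) (perComponent, constant float64) {
	perComponent = (b.nsPerOp - a.nsPerOp) / float64(b.components-a.components)
	constant = a.nsPerOp - perComponent*float64(a.components)
	return perComponent, constant
}

func main() {
	// Figures copied from the commit message: the /1 and /100 rows of
	// BenchmarkVFS1TmpfsStat and BenchmarkVFS2MemfsStat.
	vfs1Slope, vfs1Const := fit(sample{1, 497}, sample{100, 21700})
	vfs2Slope, vfs2Const := fit(sample{1, 197}, sample{100, 4045})

	fmt.Printf("VFS1: %.0f ns constant, %.0f ns per component\n", vfs1Const, vfs1Slope)
	fmt.Printf("VFS2: %.0f ns constant, %.0f ns per component\n", vfs2Const, vfs2Slope)
	fmt.Printf("per-component ratio VFS1/VFS2: %.1fx\n", vfs1Slope/vfs2Slope)

	// Mount crossing in VFS2: MemfsMountStat/1 minus MemfsStat/2.
	fmt.Printf("VFS2 mount crossing: %.0f ns\n", 317.0-233.0)
}
```

The result depends on which pair of rows is used. The /1 and /100 rows give a VFS2 per-component cost a little under 40 ns, while the /1 and /3 rows give about 35 ns, so the commit message's figures are best read as approximations.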
.github	Update CONTRIBUTING.md	2019-05-30 12:09:10 -07:00
cloudbuild	Allow specification of origin in cloudbuild.	2019-06-03 18:05:59 -07:00
g3doc	Merge pull request #306 from amscanne:add_readme	2019-06-13 17:20:49 -07:00
kokoro	Fix Kokoro revision and 'go get usage'	2019-06-04 11:07:27 -07:00
pkg	Sentry virtual filesystem, v2	2019-07-18 15:10:29 -07:00
runsc	test/integration: wait a background process	2019-07-16 15:06:17 -07:00
test	Add AF_UNIX, SOCK_RAW sockets, which exist for some reason.	2019-07-17 11:49:16 -07:00
third_party/gvsync	build: add nogo for static validation	2019-07-09 16:44:06 -07:00
tools	Merge pull request #504 from matthyx:master	2019-07-17 15:32:59 -07:00
vdso	Fix various spelling issues in the documentation	2019-06-27 14:25:50 -07:00
.bazelrc	Update straggling copyright holder	2019-06-03 12:51:55 -07:00
.gitignore	Add .gitignore	2018-05-01 09:37:49 -04:00
AUTHORS	Change copyright notice to "The gVisor Authors"	2019-04-29 14:26:23 -07:00
BUILD	build: add nogo for static validation	2019-07-09 16:44:06 -07:00
CODE_OF_CONDUCT.md	Adds Code of Conduct	2018-12-14 18:13:52 -08:00
CONTRIBUTING.md	Merge pull request #350 from kshithijiyer:patch-1	2019-07-12 16:15:51 -07:00
Dockerfile	gvisor/bazel: use python2 to build runsc-debian	2019-06-17 17:09:06 -07:00
LICENSE	Check in gVisor.	2018-04-28 01:44:26 -04:00
Makefile	gvisor: run bazel in a docker container	2019-05-03 14:13:08 -07:00
README.md	Update canonical repository.	2019-06-13 16:50:15 -07:00
WORKSPACE	Bump rules_go to v0.18.7 and go toolchain to v1.12.7.	2019-07-11 16:20:43 -07:00
go.mod	Update canonical repository.	2019-06-13 16:50:15 -07:00
go.sum	Update canonical repository.	2019-06-13 16:50:15 -07:00

README.md

What is gVisor?

gVisor is a user-space kernel, written in Go, that implements a substantial portion of the Linux system surface. It includes an Open Container Initiative (OCI) runtime called runsc that provides an isolation boundary between the application and the host kernel. The runsc runtime integrates with Docker and Kubernetes, making it simple to run sandboxed containers.

Why does gVisor exist?

Containers are not a sandbox. While containers have revolutionized how we develop, package, and deploy applications, running untrusted or potentially malicious code without additional isolation is not a good idea. The efficiency and performance gains from using a single, shared kernel also mean that container escape is possible with a single vulnerability.

gVisor is a user-space kernel for containers. It limits the host kernel surface accessible to the application while still giving the application access to all the features it expects. Unlike most kernels, gVisor does not assume or require a fixed set of physical resources; instead, it leverages existing host kernel functionality and runs as a normal user-space process. In other words, gVisor implements Linux by way of Linux.

gVisor should not be confused with technologies and tools to harden containers against external threats, provide additional integrity checks, or limit the scope of access for a service. One should always be careful about what data is made available to a container.

Documentation

User documentation and technical architecture, including quick start guides, can be found at gvisor.dev.

Installing from source

gVisor currently requires x86_64 Linux to build, though support for other architectures may become available in the future.

Requirements

Make sure the following dependencies are installed:

Linux 4.14.77+ (for older Linux versions, see the documentation at gvisor.dev)
git
Bazel 0.23.0+
Python
Docker version 17.09.0 or greater
Gold linker (e.g. binutils-gold package on Ubuntu)

Building

Build and install the runsc binary:

bazel build runsc
sudo cp ./bazel-bin/runsc/linux_amd64_pure_stripped/runsc /usr/local/bin

If you don't want to install bazel on your system, you can build runsc in a Docker container:

make runsc
sudo cp ./bazel-bin/runsc/linux_amd64_pure_stripped/runsc /usr/local/bin

Testing

The test suite can be run with Bazel:

bazel test //...

or in a Docker container:

make unit-tests
make tests

Using remote execution

If you have a Remote Build Execution environment, you can use it to speed up build and test cycles.

You must authenticate with the project first:

gcloud auth application-default login --no-launch-browser

Then invoke bazel with the following flags:

--config=remote
--project_id=$PROJECT
--remote_instance_name=projects/$PROJECT/instances/default_instance

You can also add those flags to your local ~/.bazelrc to avoid needing to specify them each time on the command line.

Using `go get`

This project uses bazel to build and manage dependencies. For convenience, a synthetic go branch is maintained that is compatible with standard Go tooling.

For example, to build runsc directly from this branch:

echo "module runsc" > go.mod
GO111MODULE=on go get gvisor.dev/gvisor/runsc@go
CGO_ENABLED=0 GO111MODULE=on go install gvisor.dev/gvisor/runsc

Note that this branch is supported on a best-effort basis, and direct development on it is not supported. Development should occur on the master branch, which is then reflected into the go branch.

Community & Governance

The governance model is documented in our community repository.

The gvisor-users mailing list and gvisor-dev mailing list are good starting points for questions and discussion.

Security

Sensitive security-related questions, comments and disclosures can be sent to the gvisor-security mailing list. The full security disclosure policy is defined in the community repository.

Contributing

See CONTRIBUTING.md.