Commit 492229d017

Updates #1198

Opening host pipes (by spinning in fdpipe) and host sockets is not yet complete, and will be done in a future CL.

Major differences from the VFS1 gofer client (sentry/fs/gofer), with varying levels of backportability:

- "Cache policies" are replaced by InteropMode, which controls the behavior of timestamps in addition to caching. Under InteropModeExclusive (analogous to cacheAll) and InteropModeWritethrough (analogous to cacheAllWritethrough), client timestamps are *not* written back to the server (it is not possible in 9P or Linux for clients to set ctime, so writing back client-authoritative timestamps results in incoherence between atime/mtime and ctime). Under InteropModeShared (analogous to cacheRemoteRevalidating), client timestamps are not used at all (remote filesystem clocks are authoritative). cacheNone is translated to InteropModeShared plus the new option filesystemOptions.specialRegularFiles (see the Go sketch after this commit message).

- Under InteropModeShared, "unstable attribute" reloading for permission checks, lookup, and revalidation is fused, which is feasible in VFS2 since gofer.filesystem controls path resolution. This results in a ~33% reduction in RPCs for filesystem operations compared to cacheRemoteRevalidating. For example, consider stat("/foo/bar/baz") where "/foo/bar/baz" fails revalidation, resulting in the instantiation of a new dentry.

  VFS1 RPCs:

      getattr("/")                          // fs.MountNamespace.FindLink() => fs.Inode.CheckPermission() => gofer.inodeOperations.check() => gofer.inodeOperations.UnstableAttr()
      walkgetattr("/", "foo") = fid1        // fs.Dirent.walk() => gofer.session.Revalidate() => gofer.cachePolicy.Revalidate()
      clunk(fid1)
      getattr("/foo")                       // CheckPermission
      walkgetattr("/foo", "bar") = fid2     // Revalidate
      clunk(fid2)
      getattr("/foo/bar")                   // CheckPermission
      walkgetattr("/foo/bar", "baz") = fid3 // Revalidate
      clunk(fid3)
      walkgetattr("/foo/bar", "baz") = fid4 // fs.Dirent.walk() => gofer.inodeOperations.Lookup
      getattr("/foo/bar/baz")               // linux.stat() => gofer.inodeOperations.UnstableAttr()

  VFS2 RPCs:

      getattr("/")                          // gofer.filesystem.walkExistingLocked()
      walkgetattr("/", "foo") = fid1        // gofer.filesystem.stepExistingLocked()
      clunk(fid1)
      // No getattr: walkgetattr already updated metadata for the permission check
      walkgetattr("/foo", "bar") = fid2
      clunk(fid2)
      walkgetattr("/foo/bar", "baz") = fid3
      // No clunk: fid3 is used for the new gofer.dentry
      // No getattr: walkgetattr already updated metadata for stat()

- gofer.filesystem.unlinkAt() does not require instantiation of a dentry that represents the file to be deleted. Updates #898.

- gofer.regularFileFD.OnClose() skips Tflushf for regular files under InteropModeExclusive, since it is nonsensical to request a remote file flush without first flushing locally-buffered writes to that remote file.

- Symlink targets are cached when InteropModeShared is not in effect.

- p9.QID.Path (which is already required to be unique for each file within a server, and is accordingly already synthesized from device/inode numbers in all known gofers) is used as-is for inode numbers, rather than being mapped along with attr.RDev in the client to yet another synthetic inode number.

- Relevant parts of fsutil.CachingInodeOperations are inlined directly into gofer package code. This avoids having to duplicate part of its functionality in fsutil.HostMappable.

PiperOrigin-RevId: 293190213
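To make the cache-policy translation above concrete, here is a minimal Go sketch of the mapping from VFS1 cache policies to VFS2 interop modes. The InteropMode constants and the specialRegularFiles option are named in the commit message; the translateCachePolicy helper, the field names, and the fallback for unknown policies are hypothetical illustrations, not gVisor's actual code.

```go
// Package gofer: a sketch of the VFS1 cache policy => VFS2 InteropMode
// translation described in the commit message above.
package gofer

// InteropMode controls caching and timestamp behavior together.
type InteropMode int

const (
	// InteropModeExclusive: the sandbox is the only user of the
	// filesystem, so cached state can be trusted (analogous to cacheAll).
	InteropModeExclusive InteropMode = iota
	// InteropModeWritethrough: cache, but write changes back eagerly
	// (analogous to cacheAllWritethrough).
	InteropModeWritethrough
	// InteropModeShared: the remote filesystem may change underneath us;
	// remote state, including remote clocks, is authoritative (analogous
	// to cacheRemoteRevalidating).
	InteropModeShared
)

// filesystemOptions holds mount-time options; only fields relevant to this
// sketch are shown, and the field names are illustrative.
type filesystemOptions struct {
	interop             InteropMode
	specialRegularFiles bool
}

// translateCachePolicy maps a VFS1 cache policy name to the equivalent VFS2
// options, including cacheNone => InteropModeShared + specialRegularFiles.
func translateCachePolicy(policy string) filesystemOptions {
	switch policy {
	case "cacheAll":
		return filesystemOptions{interop: InteropModeExclusive}
	case "cacheAllWritethrough":
		return filesystemOptions{interop: InteropModeWritethrough}
	case "cacheRemoteRevalidating":
		return filesystemOptions{interop: InteropModeShared}
	case "cacheNone":
		return filesystemOptions{
			interop:             InteropModeShared,
			specialRegularFiles: true,
		}
	default:
		// Hypothetical fallback: treat unknown policies conservatively.
		return filesystemOptions{interop: InteropModeShared}
	}
}
```

Folding timestamp behavior into the same knob as caching means one option decides both whether cached state can be trusted and whose clock is authoritative, avoiding the atime/mtime vs. ctime incoherence the commit message describes.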
README.md
What is gVisor?
gVisor is a user-space kernel, written in Go, that implements a substantial portion of the Linux system surface. It includes an Open Container Initiative (OCI) runtime called runsc that provides an isolation boundary between the application and the host kernel. The runsc runtime integrates with Docker and Kubernetes, making it simple to run sandboxed containers.
Why does gVisor exist?
Containers are not a sandbox. While containers have revolutionized how we develop, package, and deploy applications, running untrusted or potentially malicious code without additional isolation is not a good idea. The efficiency and performance gains from using a single, shared kernel also mean that container escape is possible with a single vulnerability.
gVisor is a user-space kernel for containers. It limits the host kernel surface accessible to the application while still giving the application access to all the features it expects. Unlike most kernels, gVisor does not assume or require a fixed set of physical resources; instead, it leverages existing host kernel functionality and runs as a normal user-space process. In other words, gVisor implements Linux by way of Linux.
gVisor should not be confused with technologies and tools to harden containers against external threats, provide additional integrity checks, or limit the scope of access for a service. One should always be careful about what data is made available to a container.
Documentation
User documentation and technical architecture, including quick start guides, can be found at gvisor.dev.
Installing from source
gVisor currently requires x86_64 Linux to build, though support for other architectures may become available in the future.
Requirements
Make sure the following dependencies are installed:
- Linux 4.14.77+ (older Linux)
- git
- Bazel 1.2+
- Python
- Docker version 17.09.0 or greater
- C++ toolchain supporting C++17 (GCC 7+, Clang 5+)
- Gold linker (e.g. the binutils-gold package on Ubuntu)
Building
Build and install the runsc binary:

```
bazel build runsc
sudo cp ./bazel-bin/runsc/linux_amd64_pure_stripped/runsc /usr/local/bin
```

If you don't want to install bazel on your system, you can build runsc in a Docker container:

```
make runsc
sudo cp ./bazel-bin/runsc/linux_amd64_pure_stripped/runsc /usr/local/bin
```
Testing
The test suite can be run with Bazel:

```
bazel test //...
```

or in a Docker container:

```
make unit-tests
make tests
```
Using remote execution
If you have a Remote Build Execution environment, you can use it to speed up build and test cycles.
You must authenticate with the project first:

```
gcloud auth application-default login --no-launch-browser
```

Then invoke bazel with the following flags:

```
--config=remote
--project_id=$PROJECT
--remote_instance_name=projects/$PROJECT/instances/default_instance
```
You can also add those flags to your local ~/.bazelrc to avoid needing to specify them each time on the command line.
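For instance, a user-level ~/.bazelrc along these lines would pass the flags shown above on every invocation (a sketch; my-project stands in for your actual project, since .bazelrc does not expand shell variables like $PROJECT):

```
build --config=remote
build --project_id=my-project
build --remote_instance_name=projects/my-project/instances/default_instance
```

Flags on `build` lines in a bazelrc are inherited by commands such as `test`, so a plain `bazel test //...` inside the gVisor workspace should then pick up the remote configuration automatically.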
Using go get
This project uses bazel to build and manage dependencies. A synthetic go branch is maintained that is compatible with standard go tooling for convenience.

For example, to build runsc directly from this branch:

```
echo "module runsc" > go.mod
GO111MODULE=on go get gvisor.dev/gvisor/runsc@go
CGO_ENABLED=0 GO111MODULE=on go install gvisor.dev/gvisor/runsc
```
Note that this branch is supported in a best-effort capacity, and direct development on this branch is not supported. Development should occur on the master branch, which is then reflected into the go branch.
Community & Governance
The governance model is documented in our community repository.
The gvisor-users mailing list and gvisor-dev mailing list are good starting points for questions and discussion.
Security Policy
See SECURITY.md.
Contributing
See CONTRIBUTING.md.