Commit Graph

141 Commits

Author SHA1 Message Date
Adin Scannell 364ac92baf Support for saving pointers to fields in the state package.
Previously, it was not possible to encode/decode an object graph which
contained a pointer to a field within another type. This was because the
encoder was previously unable to disambiguate a pointer to an object and a
pointer within the object.

This CL remedies this by constructing an address map tracking the full memory
range object occupy. The encoded Refvalue message has been extended to allow
references to children objects within another object. Because the encoding
process may learn about object structure over time, we cannot encode any
objects under the entire graph has been generated.

This CL also updates the state package to use standard interfaces intead of
reflection-based dispatch in order to improve performance overall. This
includes a custom wire protocol to significantly reduce the number of
allocations and take advantage of structure packing.

As part of these changes, there are a small number of minor changes in other
places of the code base:

* The lists used during encoding are changed to use intrusive lists with the
  objectEncodeState directly, which required that the ilist Len() method is
  updated to work properly with the ElementMapper mechanism.

* A bug is fixed in the list code wherein Remove() called on an element that is
  already removed can corrupt the list (removing the element if there's only a
  single element). Now the behavior is correct.

* Standard error wrapping is introduced.

* Compressio was updated to implement the new wire.Reader and wire.Writer
  inteface methods directly. The lack of a ReadByte and WriteByte caused issues
  not due to interface dispatch, but because underlying slices for a Read or
  Write call through an interface would always escape to the heap!

* Statify has been updated to support the new APIs.

See README.md for a description of how the new mechanism works.

PiperOrigin-RevId: 318010298
2020-06-23 23:34:06 -07:00
Fabricio Voznika 0ae5bd24d7 Mount root and volumes as read-only if --overlay is enabled
PiperOrigin-RevId: 315583963
2020-06-09 16:31:38 -07:00
Michael Pratt 12f74bd6f6 Include runtime goroutines in panics
SetTraceback("all") does not include all goroutines in panics (you didn't think
it was that simple, did you?). It includes all _user_ goroutines; those started
by the runtime (such as GC workers) are excluded.

Switch to "system" to additionally include runtime goroutines, which are useful
to track down bugs in the runtime itself.

PiperOrigin-RevId: 314204473
2020-06-01 14:32:19 -07:00
Fabricio Voznika 16100d18cb Make gofer mount readonly when overlay is enabled
No writes are expected to the underlying filesystem when
using --overlay.

PiperOrigin-RevId: 314171457
2020-06-01 11:44:32 -07:00
Mikael Mello 9e8000e9fb
Add cwd option to spec cmd 2020-05-24 17:44:03 -03:00
Adin Scannell 420b791a3d Minor formatting updates for gvisor.dev.
* Aggregate architecture Overview in "What is gVisor?" as it makes more sense
  in one place.

* Drop "user-space kernel" and use "application kernel". The term "user-space
  kernel" is confusing when some platform implementation do not run in
  user-space (instead running in guest ring zero).

* Clear up the relationship between the Platform page in the user guide and the
  Platform page in the architecture guide, and ensure they are cross-linked.

* Restore the call-to-action quick start link in the main page, and drop the
  GitHub link (which also appears in the top-right).

* Improve image formatting by centering all doc and blog images, and move the
  image captions to the alt text.

PiperOrigin-RevId: 311845158
2020-05-15 20:05:18 -07:00
Adin Scannell 279f1eb7ab Fix runsc syscall documentation generation.
We can register any number of tables with any number of architectures, and
need not limit the definitions to the architecture in question. This allows
runsc to generate documentation for all architectures simultaneously.

Similarly, this simplifies the VFSv2 patching process.

PiperOrigin-RevId: 310224827
2020-05-06 14:13:48 -07:00
Michael Pratt 147c8ba1f7 runsc: extend do network cleanup
Previously we unconditionally failed to cleanup the networking files
(hostname, resolve.conf, hosts), and failed to cleanup the netns, etc on
partial setup failure.

We can drop the iptables commands from cleanup, as the routes
automatically go away when the device is deleted. Those commands were
failing previously.

Forward signals to the container, allowing it to exit normally when a
signal is received, and then for runsc to run the cleanup. This doesn't
cover cleanup when runsc is signalled before the container start, it
covers the most common case.

Fixes #2539
Fixes #2540
2020-04-27 16:36:07 -04:00
Adin Scannell 1481499fe2 Simplify Docker test infrastructure.
This change adds a layer of abstraction around the internal Docker APIs,
and eliminates all direct dependencies on Dockerfiles in the infrastructure.

A subsequent change will automated the generation of local images (with
efficient caching). Note that this change drops the use of bazel container
rules, as that experiment does not seem to be viable.

PiperOrigin-RevId: 308095430
2020-04-23 11:33:30 -07:00
Andrei Vagin 0c586946ea Specify a memory file in platform.New().
PiperOrigin-RevId: 307941984
2020-04-22 17:50:10 -07:00
Fabricio Voznika a80cd43023 Add test name to boot and gofer log files
This is to make easier to find corresponding logs in
case test fails.

PiperOrigin-RevId: 307104283
2020-04-17 13:28:54 -07:00
Fabricio Voznika 6dd5a1f3fe Clean up TODOs
PiperOrigin-RevId: 305592245
2020-04-08 17:58:13 -07:00
Fabricio Voznika 069f1edbe4 Improve error message when pivot_root fails
PiperOrigin-RevId: 301949722
2020-03-19 20:18:03 -07:00
Fabricio Voznika f2e4b5ab93 Kill sandbox process when parent process terminates
When the sandbox runs in attached more, e.g. runsc do, runsc run, the
sandbox lifetime is controlled by the parent process. This wasn't working
in all cases because PR_GET_PDEATHSIG doesn't propagate through execve
when the process changes uid/gid. So it was getting dropped when the
sandbox execve's to change to user nobody.

PiperOrigin-RevId: 300601247
2020-03-12 12:32:26 -07:00
moricho d8ed784311 add profile option 2020-02-26 16:49:51 +09:00
Adin Scannell ec5630527b Add statefile command to runsc.
PiperOrigin-RevId: 296105337
2020-02-19 18:28:42 -08:00
Adin Scannell 3e8b38d08b Add flag package to limit visibility.
PiperOrigin-RevId: 294297004
2020-02-10 13:57:01 -08:00
Adin Scannell 90ec596166 Fix licenses.
The preferred Copyright holder is "The gVisor Authors".

PiperOrigin-RevId: 291786657
2020-01-27 13:23:57 -08:00
Adin Scannell d29e59af9f Standardize on tools directory.
PiperOrigin-RevId: 291745021
2020-01-27 12:21:00 -08:00
Ian Gudger 27500d529f New sync package.
* Rename syncutil to sync.
* Add aliases to sync types.
* Replace existing usage of standard library sync package.

This will make it easier to swap out synchronization primitives. For example,
this will allow us to use primitives from github.com/sasha-s/go-deadlock to
check for lock ordering violations.

Updates #1472

PiperOrigin-RevId: 289033387
2020-01-09 22:02:24 -08:00
Andrei Vagin f8c5ad061b runsc/debug: add an option to list all processes
runsc debug --ps list all processes with all threads. This option is added to
the debug command but not to the ps command, because it is going to be used for
debug purposes and we want to add any useful information without thinking about
backward compatibility.

This will help to investigate syzkaller issues.

PiperOrigin-RevId: 285013668
2019-12-11 11:05:41 -08:00
Adin Scannell 371e210b83 Add runtime tracing.
This adds meaningful annotations to the trace generated by the runtime/trace
package.

PiperOrigin-RevId: 284290115
2019-12-06 17:00:07 -08:00
Fabricio Voznika 9fb562234e Fix problem with open FD when copy up is triggered in overlayfs
Linux kernel before 4.19 doesn't implement a feature that updates
open FD after a file is open for write (and is copied to the upper
layer). Already open FD will continue to read the old file content
until they are reopened. This is especially problematic for gVisor
because it caches open files.

Flag was added to force readonly files to be reopenned when the
same file is open for write. This is only needed if using kernels
prior to 4.19.

Closes #1006

It's difficult to really test this because we never run on tests
on older kernels. I'm adding a test in GKE which uses kernels
with the overlayfs problem for 1.14 and lower.

PiperOrigin-RevId: 275115289
2019-10-16 15:06:24 -07:00
Davor Kapsa fec0663bb7
Set base to root 2019-10-11 06:38:26 +02:00
Davor Kapsa 53960d48c7
Remove unnecessary assignment to path 2019-10-10 23:06:17 +02:00
Fabricio Voznika 0b02c3d5e5 Prevent CAP_NET_RAW from appearing in exec
'docker exec' was getting CAP_NET_RAW even when --net-raw=false
because it was not filtered out from when copying container's
capabilities.

PiperOrigin-RevId: 272260451
2019-10-01 11:49:49 -07:00
gVisor bot dd0e5eedae Merge pull request #765 from trailofbits:uds_support
PiperOrigin-RevId: 271235134
2019-09-25 16:44:22 -07:00
Robert Tonic 7810b30983 Refactor command line options and remove the allowed terminology for uds 2019-09-24 18:24:10 -04:00
Nicolas Lacasse f2ea8e6b24 Always set HOME env var with `runsc exec`.
We already do this for `runsc run`, but need to do the same for `runsc exec`.

PiperOrigin-RevId: 270793459
2019-09-23 17:06:02 -07:00
Robert Tonic 46beb91912 Fix documentation, clean up seccomp filter installation, rename helpers.
Filter installation has been streamlined and functions renamed. 
Documentation has been fixed to be standards compliant, and missing 
documentation added. gofmt has also been applied to modified files.
2019-09-19 17:10:50 -04:00
Robert Tonic ac38a7ead0 Place the host UDS mounting behind --fsgofer-host-uds-allowed.
This commit allows the use of the `--fsgofer-host-uds-allowed` flag to 
enable mounting sockets and add the appropriate seccomp filters.
2019-09-19 12:37:15 -04:00
Adin Scannell 67a2ab1438 Impose order on test scripts.
The simple test script has gotten out of control. Shard this script into
different pieces and attempt to impose order on overall test structure. This
change helps lay some of the foundations for future improvements.

 * The runsc/test directories are moved into just test/.
 * The runsc/test/testutil package is split into logical pieces.
 * The scripts/ directory contains new top-level targets.
 * Each test is now responsible for building targets it requires.
 * The install functionality is moved into `runsc` itself for simplicity.
 * The existing kokoro run_tests.sh file now just calls all (can be split).

After this change is merged,  I will create multiple distinct workflows for
Kokoro, one for each of the scripts currently targeted by `run_tests.sh` today,
which should dramatically reduce the time-to-run for the Kokoro tests, and
provides a better foundation for further improvements to the infrastructure.

PiperOrigin-RevId: 267081397
2019-09-03 22:02:43 -07:00
Andrei Vagin 67f2cefce0 Avoid importing platforms from many source files
PiperOrigin-RevId: 256494243
2019-07-03 22:51:26 -07:00
Michael Pratt 5b41ba5d0e Fix various spelling issues in the documentation
Addresses obvious typos, in the documentation only.

COPYBARA_INTEGRATE_REVIEW=https://github.com/google/gvisor/pull/443 from Pixep:fix/documentation-spelling 4d0688164eafaf0b3010e5f4824b35d1e7176d65
PiperOrigin-RevId: 255477779
2019-06-27 14:25:50 -07:00
Nicolas Lacasse 67e2f227aa Always set SysProcAttr.Ctty to an FD in the child's FD table.
Go was going to change the behavior of SysProcAttr.Ctty such that it must be an
FD in the *parent* FD table:
https://go-review.googlesource.com/c/go/+/178919/

However, after some debate, it was decided that this change was too
backwards-incompatible, and so it was reverted.
https://github.com/golang/go/issues/29458

The behavior going forward is unchanged: the Ctty FD must be an FD in the
*child* FD table.

PiperOrigin-RevId: 255228476
2019-06-26 11:27:31 -07:00
Nicolas Lacasse a8f148b8e4 Use different Ctty FDs based on the go version.
An upcoming change in Go 1.13 [1] changes the semantics of the SysProcAttr.Ctty
field. Prior to the change, the FD must be an FD in the child process's FD
table (aka "post-shuffle"). After the change, the FD must be an FD in the
current process's FD table (aka "pre-shuffle").

To be compatible with both versions this CL introduces a new boolean
"CttyFdIsPostShuffle" which indicates whether a pre- or post-shuffle FD should
be provided. We use build tags to chose the correct one.

1: https://go-review.googlesource.com/c/go/+/178919/
PiperOrigin-RevId: 255015303
2019-06-25 11:47:27 -07:00
Andrei Vagin fd16a329ce fsgopher: reopen files via /proc/self/fd
When we reopen file by path, we can't be sure that
we will open exactly the same file. The file can be
deleted and another one with the same name can be
created.

PiperOrigin-RevId: 254898594
2019-06-24 21:44:27 -07:00
Fabricio Voznika b21b1db700 Allow to change logging options using 'runsc debug'
New options are:
  runsc debug --strace=off|all|function1,function2
  runsc debug --log-level=warning|info|debug
  runsc debug --log-packets=true|false

Updates #407

PiperOrigin-RevId: 254843128
2019-06-24 15:03:02 -07:00
Fabricio Voznika 0e07c94d54 Kill sandbox process when 'runsc do' exits
PiperOrigin-RevId: 253882115
2019-06-18 15:36:17 -07:00
Fabricio Voznika bdb19b82ef Add Container/Sandbox args struct for creation
There were 3 string arguments that could be easily misplaced
and it makes it easier to add new arguments, especially for
Container that has dozens of callers.

PiperOrigin-RevId: 253872074
2019-06-18 14:46:49 -07:00
Adin Scannell add40fd6ad Update canonical repository.
This can be merged after:
https://github.com/google/gvisor-website/pull/77
  or
https://github.com/google/gvisor-website/pull/78

PiperOrigin-RevId: 253132620
2019-06-13 16:50:15 -07:00
Fabricio Voznika 356d1be140 Allow 'runsc do' to run without root
'--rootless' flag lets a non-root user execute 'runsc do'.
The drawback is that the sandbox and gofer processes will
run as root inside a user namespace that is mapped to the
caller's user, intead of nobody. And network is defaulted
to '--network=host' inside the root network namespace. On
the bright side, it's very convenient for testing:

runsc --rootless do ls
runsc --rootless do curl www.google.com

PiperOrigin-RevId: 252840970
2019-06-12 09:41:50 -07:00
Ian Lewis 74e397e39a Add introspection for Linux/AMD64 syscalls
Adds simple introspection for syscall compatibility information to Linux/AMD64.
Syscalls registered in the syscall table now have associated metadata like
name, support level, notes, and URLs to relevant issues.

Syscall information can be exported as a table, JSON, or CSV using the new
'runsc help syscalls' command. Users can use this info to debug and get info
on the compatibility of the version of runsc they are running or to generate
documentation.

PiperOrigin-RevId: 252558304
2019-06-10 23:38:36 -07:00
Fabricio Voznika 720ec3590d Send error message to docker/kubectl exec on failure
Containerd uses the last error message sent to the log to
print as failure cause for create/exec. This required a
few changes in the logging logic for runsc:
  - cmd.Errorf/Fatalf: now writes a message with 'error'
    level to containerd log, in addition to stderr and
    debug logs, like before.
  - log.Infof/Warningf/Fatalf: are not sent to containerd
    log anymore. They are mostly used for debugging and not
    useful to containerd. In most cases, --debug-log is
    enabled and this avoids the logs messages from being
    duplicated.
  - stderr is not used as default log destination anymore.
    Some commands assume stdio is for the container/process
    running inside the sandbox and it's better to never use
    it for logging. By default, logs are supressed now.

PiperOrigin-RevId: 251881815
2019-06-06 10:49:43 -07:00
Fabricio Voznika d28f71adcf Remove 'clearStatus' option from container.Wait*PID()
clearStatus was added to allow detached execution to wait
on the exec'd process and retrieve its exit status. However,
it's not currently used. Both docker and gvisor-containerd-shim
wait on the "shim" process and retrieve the exit status from
there. We could change gvisor-containerd-shim to use waits, but
it will end up also consuming a process for the wait, which is
similar to having the shim process.

Closes #234

PiperOrigin-RevId: 251349490
2019-06-03 18:16:09 -07:00
Bhasker Hariharan 035a8fa38e Add support for collecting execution trace to runsc.
Updates #220

PiperOrigin-RevId: 250532302
2019-05-30 12:07:11 -07:00
Andrei Vagin b52e571a61 runsc/do: don't specify the read-only flag for the root mount
The root mount is an overlay mount.

PiperOrigin-RevId: 250429317
2019-05-30 12:06:42 -07:00
Andrei Vagin 673358c0d9 runsc/do: allow to run commands in a host network namespace
PiperOrigin-RevId: 250329795
2019-05-30 12:05:16 -07:00
Fabricio Voznika 1e42b4cfca Update internal flag name and documentation
Updates #234

PiperOrigin-RevId: 250323553
2019-05-30 12:04:49 -07:00
Andrei Vagin 409e8eea60 runsc/do: do a proper cleanup if a command failed due to internal errors
Fatalf calls os.Exit and a process exits without calling defer callbacks.

Should we do this for other runsc commands?

PiperOrigin-RevId: 249776310
Change-Id: If9d8b54d0ae37db443895906eb33bd9e9b600cc9
2019-05-23 22:28:38 -07:00