Commit Graph

48 Commits

Author SHA1 Message Date
Aleksandr Razumov 67f678be27
Leave minimum CPU number as a constant
Remove introduced CPUNumMin config and hard-code it as 2.
2019-12-17 20:41:02 +03:00
Aleksandr Razumov b661434202
Add minimum CPU number and only lower CPUs on --cpu-num-from-quota
* Add `--cpu-num-min` flag to control minimum CPUs
* Only lower CPU count
* Fix comments
2019-12-17 13:27:13 +03:00
Aleksandr Razumov 8782f0e287
Set CPU number to CPU quota
When application is not cgroups-aware, it can spawn excessive threads
which often defaults to CPU number.
Introduce a opt-in flag that will set CPU number accordingly to CPU
quota (if available).

Fixes #1391
2019-12-15 21:12:43 +03:00
Andrei Vagin 8720bd643e netstack/tcp: software segmentation offload
Right now, we send each tcp packet separately, we call one system
call per-packet. This patch allows to generate multiple tcp packets
and send them by sendmmsg.

The arguable part of this CL is a way how to handle multiple headers.
This CL adds the next field to the Prepandable buffer.

Nginx test results:

Server Software:        nginx/1.15.9
Server Hostname:        10.138.0.2
Server Port:            8080

Document Path:          /10m.txt
Document Length:        10485760 bytes

w/o gso:
Concurrency Level:      5
Time taken for tests:   5.491 seconds
Complete requests:      100
Failed requests:        0
Total transferred:      1048600200 bytes
HTML transferred:       1048576000 bytes
Requests per second:    18.21 [#/sec] (mean)
Time per request:       274.525 [ms] (mean)
Time per request:       54.905 [ms] (mean, across all concurrent requests)
Transfer rate:          186508.03 [Kbytes/sec] received

sw-gso:

Concurrency Level:      5
Time taken for tests:   3.852 seconds
Complete requests:      100
Failed requests:        0
Total transferred:      1048600200 bytes
HTML transferred:       1048576000 bytes
Requests per second:    25.96 [#/sec] (mean)
Time per request:       192.576 [ms] (mean)
Time per request:       38.515 [ms] (mean, across all concurrent requests)
Transfer rate:          265874.92 [Kbytes/sec] received

w/o gso:
$ ./tcp_benchmark --client --duration 15  --ideal
[SUM]  0.0-15.1 sec  2.20 GBytes  1.25 Gbits/sec

software gso:
$ tcp_benchmark --client --duration 15  --ideal --gso $((1<<16)) --swgso
[SUM]  0.0-15.1 sec  3.99 GBytes  2.26 Gbits/sec

PiperOrigin-RevId: 276112677
2019-10-22 11:55:56 -07:00
Fabricio Voznika 9fb562234e Fix problem with open FD when copy up is triggered in overlayfs
Linux kernel before 4.19 doesn't implement a feature that updates
open FD after a file is open for write (and is copied to the upper
layer). Already open FD will continue to read the old file content
until they are reopened. This is especially problematic for gVisor
because it caches open files.

Flag was added to force readonly files to be reopenned when the
same file is open for write. This is only needed if using kernels
prior to 4.19.

Closes #1006

It's difficult to really test this because we never run on tests
on older kernels. I'm adding a test in GKE which uses kernels
with the overlayfs problem for 1.14 and lower.

PiperOrigin-RevId: 275115289
2019-10-16 15:06:24 -07:00
gVisor bot dd0e5eedae Merge pull request #765 from trailofbits:uds_support
PiperOrigin-RevId: 271235134
2019-09-25 16:44:22 -07:00
Robert Tonic 7810b30983 Refactor command line options and remove the allowed terminology for uds 2019-09-24 18:24:10 -04:00
Robert Tonic 46beb91912 Fix documentation, clean up seccomp filter installation, rename helpers.
Filter installation has been streamlined and functions renamed. 
Documentation has been fixed to be standards compliant, and missing 
documentation added. gofmt has also been applied to modified files.
2019-09-19 17:10:50 -04:00
Robert Tonic ac38a7ead0 Place the host UDS mounting behind --fsgofer-host-uds-allowed.
This commit allows the use of the `--fsgofer-host-uds-allowed` flag to 
enable mounting sockets and add the appropriate seccomp filters.
2019-09-19 12:37:15 -04:00
Fabricio Voznika 010b093258 Bring back to life features lost in recent refactor
- Sandbox logs are generated when running tests
- Kokoro uploads the sandbox logs
- Supports multiple parallel runs
- Revive script to install locally built runsc with docker

PiperOrigin-RevId: 269337274
2019-09-16 08:17:00 -07:00
gVisor bot 0789b9cc08 Merge pull request #655 from praveensastry:feature/runsc-ref-chk-leak
PiperOrigin-RevId: 266226714
2019-08-29 14:17:32 -07:00
praveensastry 7672eaae25 Add log prefix for better clarity 2019-08-22 22:52:43 +10:00
praveensastry 73985c6545 Fix the Stringer for leak mode 2019-08-09 17:13:06 +10:00
praveensastry 8d89c0d92b Remove traces option for ref leak mode 2019-08-06 11:57:50 +10:00
praveensastry 607be0585f Add option to configure reference leak checking 2019-08-06 01:15:48 +10:00
Andrei Vagin 4183b9021a runsc: propagate the alsologtostderr to sub-commands
PiperOrigin-RevId: 260239119
2019-07-26 16:53:54 -07:00
Andrei Vagin 67f2cefce0 Avoid importing platforms from many source files
PiperOrigin-RevId: 256494243
2019-07-03 22:51:26 -07:00
Adin Scannell add40fd6ad Update canonical repository.
This can be merged after:
https://github.com/google/gvisor-website/pull/77
  or
https://github.com/google/gvisor-website/pull/78

PiperOrigin-RevId: 253132620
2019-06-13 16:50:15 -07:00
Fabricio Voznika 356d1be140 Allow 'runsc do' to run without root
'--rootless' flag lets a non-root user execute 'runsc do'.
The drawback is that the sandbox and gofer processes will
run as root inside a user namespace that is mapped to the
caller's user, intead of nobody. And network is defaulted
to '--network=host' inside the root network namespace. On
the bright side, it's very convenient for testing:

runsc --rootless do ls
runsc --rootless do curl www.google.com

PiperOrigin-RevId: 252840970
2019-06-12 09:41:50 -07:00
Bhasker Hariharan 85be01b42d Add multi-fd support to fdbased endpoint.
This allows an fdbased endpoint to have multiple underlying fd's from which
packets can be read and dispatched/written to.

This should allow for higher throughput as well as better scalability of the
network stack as number of connections increases.

Updates #231

PiperOrigin-RevId: 251852825
2019-06-06 08:07:02 -07:00
Andrei Vagin bf0ac565d2 Fix runsc restore to be compatible with docker start --checkpoint ...
Change-Id: I02b30de13f1393df66edf8829fedbf32405d18f8
PiperOrigin-RevId: 246621192
2019-05-03 21:41:45 -07:00
Michael Pratt 4d52a55201 Change copyright notice to "The gVisor Authors"
Based on the guidelines at
https://opensource.google.com/docs/releasing/authors/.

1. $ rg -l "Google LLC" | xargs sed -i 's/Google LLC.*/The gVisor Authors./'
2. Manual fixup of "Google Inc" references.
3. Add AUTHORS file. Authors may request to be added to this file.
4. Point netstack AUTHORS to gVisor AUTHORS. Drop CONTRIBUTORS.

Fixes #209

PiperOrigin-RevId: 245823212
Change-Id: I64530b24ad021a7d683137459cafc510f5ee1de9
2019-04-29 14:26:23 -07:00
Kevin Krakauer 43dff57b87 Make raw sockets a toggleable feature disabled by default.
PiperOrigin-RevId: 245511019
Change-Id: Ia9562a301b46458988a6a1f0bbd5f07cbfcb0615
2019-04-26 16:51:46 -07:00
Andrei Vagin a046054ba3 gvisor/runsc: enable generic segmentation offload (GSO)
The linux packet socket can handle GSO packets, so we can segment packets to
64K instead of the MTU which is usually 1500.

Here are numbers for the nginx-1m test:
runsc:		579330.01 [Kbytes/sec] received
runsc-gso:	1794121.66 [Kbytes/sec] received
runc:		2122139.06 [Kbytes/sec] received

and for tcp_benchmark:

$ tcp_benchmark  --duration 15   --ideal
[  4]  0.0-15.0 sec  86647 MBytes  48456 Mbits/sec

$ tcp_benchmark --client --duration 15   --ideal
[  4]  0.0-15.0 sec  2173 MBytes  1214 Mbits/sec

$ tcp_benchmark --client --duration 15   --ideal --gso 65536
[  4]  0.0-15.0 sec  19357 MBytes  10825 Mbits/sec

PiperOrigin-RevId: 241072403
Change-Id: I20b03063a1a6649362b43609cbbc9b59be06e6d5
2019-03-29 16:27:38 -07:00
Fabricio Voznika bc9b979b94 Add profiling commands to runsc
Example:
  runsc debug --root=<dir> \
      --profile-heap=/tmp/heap.prof \
      --profile-cpu=/tmp/cpu.prod --profile-delay=30 \
      <container ID>
PiperOrigin-RevId: 237848456
Change-Id: Icff3f20c1b157a84d0922599eaea327320dad773
2019-03-11 11:47:30 -07:00
Michael Pratt 33191e1cc4 Automated rollback of changelist 225089593
PiperOrigin-RevId: 227595007
Change-Id: If14cc5aab869c5fd7a4ebd95929c887ab690e94c
2019-01-02 15:48:00 -08:00
Michael Pratt b62591e6a8 Expose internal testing flag
Never to used outside of runsc tests!

PiperOrigin-RevId: 225919013
Change-Id: Ib3b14aa2a2564b5246fb3f8933d95e01027ed186
2018-12-17 17:35:06 -08:00
Michael Pratt 24c1158b9c Add "trace signal" option
This option is effectively equivalent to -panic-signal, except that the
sandbox does not die after logging the traceback.

PiperOrigin-RevId: 225089593
Change-Id: Ifb1c411210110b6104613f404334bd02175e484e
2018-12-11 16:12:41 -08:00
Fabricio Voznika b6b81fd04b Add new log format that is compatible with Kubernetes
Fluentd configuration uses 'log' for the log message
while containerd uses 'msg'. Since we can't have a single
JSON format for both, add another log format and make
debug log configurable.

PiperOrigin-RevId: 219729658
Change-Id: I2a6afc4034d893ab90bafc63b394c4fb62b2a7a0
2018-11-01 17:44:58 -07:00
Ian Gudger 8fce67af24 Use correct company name in copyright header
PiperOrigin-RevId: 217951017
Change-Id: Ie08bf6987f98467d07457bcf35b5f1ff6e43c035
2018-10-19 16:35:11 -07:00
Fabricio Voznika f3ffa4db52 Resolve mount paths while setting up root fs mount
It's hard to resolve symlinks inside the sandbox because rootfs and mounts
may be read-only, forcing us to create mount points inside lower layer of an
overlay, **before** the volumes are mounted.

Since the destination must already be resolved outside the sandbox when creating
mounts, take this opportunity to rewrite the spec with paths resolved.
"runsc boot" will use the "resolved" spec to load mounts. In addition, symlink
traversals were disabled while mounting containers inside the sandbox.

It haven't been able to write a good test for it. So I'm relying on manual tests
for now.

PiperOrigin-RevId: 217749904
Change-Id: I7ac434d5befd230db1488446cda03300cc0751a9
2018-10-18 12:42:24 -07:00
Fabricio Voznika e68d86e1bd Make debug log file name configurable
This is a breaking change if you're using --debug-log-dir.
The fix is to replace it with --debug-log and add a '/' at
the end:
  --debug-log-dir=/tmp/runsc ==> --debug-log=/tmp/runsc/

PiperOrigin-RevId: 216761212
Change-Id: I244270a0a522298c48115719fa08dad55e34ade1
2018-10-11 14:29:37 -07:00
Fabricio Voznika a2ad8fef13 Make multi-container the default mode for runsc
And remove multicontainer option.

PiperOrigin-RevId: 215236981
Change-Id: I9fd1d963d987e421e63d5817f91a25c819ced6cb
2018-10-01 10:31:17 -07:00
Fabricio Voznika bc81f3fe4a Remove '--file-access=direct' option
It was used before gofer was implemented and it's not
supported anymore.
BREAKING CHANGE: proxy-shared and proxy-exclusive options
are now: shared and exclusive.

PiperOrigin-RevId: 212017643
Change-Id: If029d4073fe60583e5ca25f98abb2953de0d78fd
2018-09-07 12:28:48 -07:00
Nicolas Lacasse 210c252089 runsc: Run sandbox process inside minimal chroot.
We construct a dir with the executable bind-mounted at /exe, and proc mounted
at /proc.  Runsc now executes the sandbox process inside this chroot, thus
limiting access to the host filesystem.  The mounts and chroot dir are removed
when the sandbox is destroyed.

Because this requires bind-mounts, we can only do the chroot if we have
CAP_SYS_ADMIN.

PiperOrigin-RevId: 211994001
Change-Id: Ia71c515e26085e0b69b833e71691830148bc70d1
2018-09-07 10:16:39 -07:00
Nicolas Lacasse 0a9a40abcd runsc: Run sandbox as user nobody.
When starting a sandbox without direct file or network access, we create an
empty user namespace and run the sandbox in there.  However, the root user in
that namespace is still mapped to the root user in the parent namespace.

This CL maps the "nobody" user from the parent namespace into the child
namespace, and runs the sandbox process as user "nobody" inside the new
namespace.

PiperOrigin-RevId: 211572223
Change-Id: I1b1f9b1a86c0b4e7e5ca7bc93be7d4887678bab6
2018-09-04 20:33:05 -07:00
Nicolas Lacasse ad8648c634 runsc: Pass log and config files to sandbox process by FD.
This is a prereq for running the sandbox process as user "nobody", when it may
not have permissions to open these files.

Instead, we must open then before starting the sandbox process, and pass them
by FD.

The specutils.ReadSpecFromFile method was fixed to always seek to the beginning
of the file before reading. This allows Files from the same FD to be read
multiple times, as we do in the boot command when the apply-caps flag is set.

Tested with --network=host.

PiperOrigin-RevId: 211570647
Change-Id: I685be0a290aa7f70731ebdce82ebc0ebcc9d475c
2018-09-04 20:10:01 -07:00
Fabricio Voznika 7e18f158b2 Automated rollback of changelist 210995199
PiperOrigin-RevId: 211116429
Change-Id: I446d149c822177dc9fc3c64ce5e455f7f029aa82
2018-08-31 11:30:47 -07:00
Nicolas Lacasse 5ade9350ad runsc: Pass log and config files to sandbox process by FD.
This is a prereq for running the sandbox process as user "nobody", when it may
not have permissions to open these files.

Instead, we must open then before starting the sandbox process, and pass them
by FD.

PiperOrigin-RevId: 210995199
Change-Id: I715875a9553290b4a49394a8fcd93be78b1933dd
2018-08-30 15:47:18 -07:00
Fabricio Voznika ae648bafda Add command-line parameter to trigger panic on signal
This is to troubleshoot problems with a hung process that is
not responding to 'runsc debug --stack' command.

PiperOrigin-RevId: 210483513
Change-Id: I4377b210b4e51bc8a281ad34fd94f3df13d9187d
2018-08-27 20:36:10 -07:00
Nicolas Lacasse 2033f61aae runsc: Fix instances of file access "proxy".
This file access type is actually called "proxy-shared", but I forgot to update
all locations.

PiperOrigin-RevId: 208832491
Change-Id: I7848bc4ec2478f86cf2de1dcd1bfb5264c6276de
2018-08-15 09:34:18 -07:00
Nicolas Lacasse e8a4f2e133 runsc: Change cache policy for root fs and volume mounts.
Previously, gofer filesystems were configured with the default "fscache"
policy, which caches filesystem metadata and contents aggressively.  While this
setting is best for performance, it means that changes from inside the sandbox
may not be immediately propagated outside the sandbox, and vice-versa.

This CL changes volumes and the root fs configuration to use a new
"remote-revalidate" cache policy which tries to retain as much caching as
possible while still making fs changes visible across the sandbox boundary.

This cache policy is enabled by default for the root filesystem. The default
value for the "--file-access" flag is still "proxy", but the behavior is
changed to use the new cache policy.

A new value for the "--file-access" flag is added, called "proxy-exclusive",
which turns on the previous aggressive caching behavior. As the name implies,
this flag should be used when the sandbox has "exclusive" access to the
filesystem.

All volume mounts are configured to use the new cache policy, since it is
safest and most likely to be correct. There is not currently a way to change
this behavior, but it's possible to add such a mechanism in the future. The
configurability is a smaller issue for volumes, since most of the expensive
application fs operations (walking + stating files) will likely served by the
root fs.

PiperOrigin-RevId: 208735037
Change-Id: Ife048fab1948205f6665df8563434dbc6ca8cfc9
2018-08-14 16:25:58 -07:00
Fabricio Voznika 1f207de315 Add option to configure watchdog action
PiperOrigin-RevId: 202494747
Change-Id: I4d4a18e71468690b785060e580a5f83c616bd90f
2018-06-28 09:46:50 -07:00
Lantao Liu e8ae2b85e9 runsc: add a `multi-container` flag to enable multi-container support.
PiperOrigin-RevId: 201995800
Change-Id: I770190d135e14ec7da4b3155009fe10121b2a502
2018-06-25 12:08:44 -07:00
Fabricio Voznika cecc1e472c Fix lint errors
PiperOrigin-RevId: 201978212
Change-Id: Ie3df1fd41d5293fff66b546a0c68c3bf98126067
2018-06-25 10:41:27 -07:00
Fabricio Voznika 48335318a2 Enable debug logging in tests
Unit tests call runsc directly now, so all command line arguments
are valid. On the other hand, enabling debug in the test binary
doesn't affect runsc. It needs to be set in the config.

PiperOrigin-RevId: 200237706
Change-Id: I0b5922db17f887f58192dbc2f8dd2fd058b76ec7
2018-06-12 10:25:55 -07:00
Nicolas Lacasse 205f1027e6 Refactor the Sandbox package into Sandbox + Container.
This is a necessary prerequisite for supporting multiple containers in a single
sandbox.

All the commands (in cmd package) now call operations on Containers (container
package). When a Container first starts, it will create a Sandbox with the same
ID.

The Sandbox class is now simpler, as it only knows how to create boot/gofer
processes, and how to forward commands into the running boot process.

There are TODOs sprinkled around for additional support for multiple
containers. Most notably, we need to detect when a container is intended to run
in an existing sandbox (by reading the metadata), and then have some way to
signal to the sandbox to start a new container. Other urpc calls into the
sandbox need to pass the container ID, so the sandbox can run the operation on
the given container. These are only half-plummed through right now.

PiperOrigin-RevId: 196688269
Change-Id: I1ecf4abbb9dd8987a53ae509df19341aaf42b5b0
2018-05-15 10:18:03 -07:00
Googler d02b74a5dc Check in gVisor.
PiperOrigin-RevId: 194583126
Change-Id: Ica1d8821a90f74e7e745962d71801c598c652463
2018-04-28 01:44:26 -04:00