Sandbox root dir was not being saved with the Container state,
so it would point to the wrong directory location when attempting
to lock the sandbox. This led to race conditions saving and
loading container state. Fixing it, led to multiple deadlocks.
I've moved the saving and locking logic to a separate struct and
moved the lock file inside the RootDir (instead of container
root dir), which allows the lock to be taken inside Destroy,
and removes the need to lock the sandbox.
PiperOrigin-RevId: 277599612
Since the syscall.Stat_t.Nlink is defined as different types on
amd64 and arm64(uint64 and uint32 respectively), we need to cast
them to a unified uint64 type in gVisor code.
Signed-off-by: Haibo Xu <haibo.xu@arm.com>
Change-Id: I7542b99b195c708f3fc49b1cbe6adebdd2f6e96b
container.startContainers() cannot be called twice in a test
(e.g. TestMultiContainerLoadSandbox) because the cleanup
function deletes the rootDir, together with information from
all other containers that may exist.
PiperOrigin-RevId: 276591806
Right now, we send each tcp packet separately, we call one system
call per-packet. This patch allows to generate multiple tcp packets
and send them by sendmmsg.
The arguable part of this CL is a way how to handle multiple headers.
This CL adds the next field to the Prepandable buffer.
Nginx test results:
Server Software: nginx/1.15.9
Server Hostname: 10.138.0.2
Server Port: 8080
Document Path: /10m.txt
Document Length: 10485760 bytes
w/o gso:
Concurrency Level: 5
Time taken for tests: 5.491 seconds
Complete requests: 100
Failed requests: 0
Total transferred: 1048600200 bytes
HTML transferred: 1048576000 bytes
Requests per second: 18.21 [#/sec] (mean)
Time per request: 274.525 [ms] (mean)
Time per request: 54.905 [ms] (mean, across all concurrent requests)
Transfer rate: 186508.03 [Kbytes/sec] received
sw-gso:
Concurrency Level: 5
Time taken for tests: 3.852 seconds
Complete requests: 100
Failed requests: 0
Total transferred: 1048600200 bytes
HTML transferred: 1048576000 bytes
Requests per second: 25.96 [#/sec] (mean)
Time per request: 192.576 [ms] (mean)
Time per request: 38.515 [ms] (mean, across all concurrent requests)
Transfer rate: 265874.92 [Kbytes/sec] received
w/o gso:
$ ./tcp_benchmark --client --duration 15 --ideal
[SUM] 0.0-15.1 sec 2.20 GBytes 1.25 Gbits/sec
software gso:
$ tcp_benchmark --client --duration 15 --ideal --gso $((1<<16)) --swgso
[SUM] 0.0-15.1 sec 3.99 GBytes 2.26 Gbits/sec
PiperOrigin-RevId: 276112677
Like (AF_INET, SOCK_RAW) sockets, AF_PACKET sockets require CAP_NET_RAW. With
runsc, you'll need to pass `--net-raw=true` to enable them.
Binding isn't supported yet.
PiperOrigin-RevId: 275909366
This change fixes several issues with the fsgofer host UDS support. Notably, it
adds support for SOCK_SEQPACKET and SOCK_DGRAM sockets [1]. It also fixes
unsafe use of unet.Socket, which could cause a panic if Socket.FD is called
when err != nil, and calls to Socket.FD with nothing to prevent the garbage
collector from destroying and closing the socket.
A set of tests is added to exercise host UDS access. This required extracting
most of the syscall test runner into a library that can be used by custom
tests.
Updates #235
Updates #1003
[1] N.B. SOCK_DGRAM sockets are likely not particularly useful, as a server can
only reply to a client that binds first. We don't allow bind, so these are
unlikely to be used.
PiperOrigin-RevId: 275558502
Linux kernel before 4.19 doesn't implement a feature that updates
open FD after a file is open for write (and is copied to the upper
layer). Already open FD will continue to read the old file content
until they are reopened. This is especially problematic for gVisor
because it caches open files.
Flag was added to force readonly files to be reopenned when the
same file is open for write. This is only needed if using kernels
prior to 4.19.
Closes#1006
It's difficult to really test this because we never run on tests
on older kernels. I'm adding a test in GKE which uses kernels
with the overlayfs problem for 1.14 and lower.
PiperOrigin-RevId: 275115289
fsgofer.attachPoint.Attach has a bunch of funky special logic to create a RW
file or connect a socket rather than creating a standard control file like
localFile.Walk.
This is unecessary and error-prone, as the attach point still has to go through
Open or Connect which will properly convert the control file to something
usable. As such, switch the logic to be equivalent to a simple Walk.
Updates #235
PiperOrigin-RevId: 274827872
rt_sigreturn is required for signal handling (e.g., SIGSEGV for nil-pointer
dereference). Before this, nil-pointer dereferences cause a syscall violation
instead of a panic.
PiperOrigin-RevId: 274028767
Options that do not change mount behavior inside the Sentry are
irrelevant and should not be used when looking for possible
incompatibilities between master and slave mounts.
PiperOrigin-RevId: 273593486
Adds two tests. One to make sure that $HOME is set when starting a container
via 'docker run' and one to make sure that $HOME is set for each container in a
multi-container sandbox.
Issue #701
PiperOrigin-RevId: 273395763
'docker exec' was getting CAP_NET_RAW even when --net-raw=false
because it was not filtered out from when copying container's
capabilities.
PiperOrigin-RevId: 272260451
BUILD:85:1: in _pkg_deb rule //runsc:runsc-debian: target
'//runsc:runsc-debian' depends on deprecated target
'@bazel_tools//tools/build_defs/pkg:make_deb': The internal version of
make_deb is deprecated. Please use the replacement for pkg_deb from
https://github.com/bazelbuild/rules_pkg/blob/master/pkg.
PiperOrigin-RevId: 271590386
Filter installation has been streamlined and functions renamed.
Documentation has been fixed to be standards compliant, and missing
documentation added. gofmt has also been applied to modified files.
We need to include the `--stamp` flag in `tools/workspace_status.sh` for
the version to be picked up by the linker. Not sure why.
Also changes the VERSION string to STABLE_VERSION, which will cause the
program to be re-linked if the string changes.
Fixes#830
This is done because the root container for CRI is the infrastructure (pause)
container and always gets a low oom_score_adj. We do this to ensure that only
the oom_score_adj of user containers is used to calculated the sandbox
oom_score_adj.
Implemented in runsc rather than the containerd shim as it's a bit cleaner to
implement here (in the shim it would require overwriting the oomScoreAdj and
re-writing out the config.json again). This processing is Kubernetes(CRI)
specific but we are currently only supporting CRI for multi-container support
anyway.
PiperOrigin-RevId: 267507706
Some processes are reparented to the root container depending
on the kill order and the root container would not reap in time.
So some zombie processes were still present when the test checked.
Fix it by running the second container inside a PID namespace.
PiperOrigin-RevId: 267278591
The simple test script has gotten out of control. Shard this script into
different pieces and attempt to impose order on overall test structure. This
change helps lay some of the foundations for future improvements.
* The runsc/test directories are moved into just test/.
* The runsc/test/testutil package is split into logical pieces.
* The scripts/ directory contains new top-level targets.
* Each test is now responsible for building targets it requires.
* The install functionality is moved into `runsc` itself for simplicity.
* The existing kokoro run_tests.sh file now just calls all (can be split).
After this change is merged, I will create multiple distinct workflows for
Kokoro, one for each of the scripts currently targeted by `run_tests.sh` today,
which should dramatically reduce the time-to-run for the Kokoro tests, and
provides a better foundation for further improvements to the infrastructure.
PiperOrigin-RevId: 267081397