Commit Graph

240 Commits

Author SHA1 Message Date
Fabricio Voznika d81fcbf85c Set RLimits during `runsc exec`
PiperOrigin-RevId: 378726430
2021-06-10 13:55:10 -07:00
Fabricio Voznika 9f33fe64f2 Fixes to runsc cgroups
When loading cgroups for another process, `/proc/self` was used in
a few places, causing the end state to be a mix of the process
and self. This is now fixed to always use the proper `/proc/[pid]`
path.

Added net_prio and net_cls to the list of optional controllers. This
is to allow runsc to execute when these cgroups are disabled as long
as there are no net_prio and net_cls limits that need to be applied.

Deflake TestMultiContainerEvent.

Closes #5875
Closes #5887

PiperOrigin-RevId: 372242687
2021-05-05 17:39:29 -07:00
Fabricio Voznika 95df852bf2 Make Mount.Type optional for bind mounts
According to the OCI spec Mount.Type is an optional field and it
defaults to "bind" when any of "bind" or "rbind" is included in
Mount.Options.

Also fix the shim to remove bind/rbind from options when mount is
converted from bind to tmpfs inside the Sentry.

Fixes #2330
Fixes #3274

PiperOrigin-RevId: 371996891
2021-05-04 14:36:06 -07:00
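The defaulting rule described in 95df852bf2 can be sketched as below. This is a minimal illustration, not the code from the commit: `effectiveMountType` is a hypothetical helper name, and the function takes plain strings rather than the OCI spec's Mount struct.

```go
package main

import "fmt"

// effectiveMountType is a hypothetical helper (not runsc's actual code).
// It returns typ unchanged when it is set. When typ is empty and the
// options include "bind" or "rbind", it returns "bind".
func effectiveMountType(typ string, options []string) string {
	if typ != "" {
		return typ
	}
	for _, opt := range options {
		if opt == "bind" || opt == "rbind" {
			return "bind"
		}
	}
	return typ
}

func main() {
	fmt.Println(effectiveMountType("", []string{"rbind", "ro"})) // prints "bind"
	fmt.Println(effectiveMountType("tmpfs", nil))                // prints "tmpfs"
}
```

An explicit type always wins in this sketch; the options are only consulted when the type is missing.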
Adin Scannell 8192cccda6 Clean test tags.
PiperOrigin-RevId: 369505182
2021-04-20 13:11:25 -07:00
Fabricio Voznika 71f3dccbb3 Fix panic when overriding /dev files with VFS2
VFS1 skips over mounts that override files in /dev because the list of
files is hardcoded. This is not needed for VFS2 and a recent change
lifted this restriction. However, parts of the code were still skipping
/dev mounts even in VFS2, causing the loader to panic when it ran short
of FDs to connect to the gofer.

PiperOrigin-RevId: 365858436
2021-03-30 11:36:55 -07:00
Rahat Mahmood c5667022b6 Report filesystem-specific mount options.
PiperOrigin-RevId: 362406813
2021-03-11 16:49:36 -08:00
Fabricio Voznika 14fc2ddd6c Update flock to v0.8.0
PiperOrigin-RevId: 361962416
2021-03-09 20:54:15 -08:00
Ayush Ranjan e668288faf [op] Replace syscall package usage with golang.org/x/sys/unix in runsc/.
The syscall package has been deprecated in favor of golang.org/x/sys.

Note that syscall is still used in some places because the following don't seem
to have an equivalent in unix package:
- syscall.SysProcIDMap
- syscall.Credential

Updates #214

PiperOrigin-RevId: 361381490
2021-03-06 22:07:07 -08:00
Daniel Dao 306a9477da
return root pids with runsc ps
`runsc ps` currently returns the pid in a task's immediate pid namespace,
which is confusing when there are multiple pid namespaces. We should
return only pids in the root namespace.

Before:

```
1000      1         0         0         ?         02:24     250ms     chrome
1000      1         0         0         ?         02:24     40ms      dumb-init
1000      1         0         0         ?         02:24     240ms     chrome
1000      2         1         0         ?         02:24     2.78s     node
```

After:

```
UID       PID       PPID      C         TTY       STIME     TIME      CMD
1000      1         0         0         ?         12:35     0s        dumb-init
1000      2         1         7         ?         12:35     240ms     node
1000      13        2         21        ?         12:35     2.33s     chrome
1000      27        13        3         ?         12:35     260ms     chrome
```

Signed-off-by: Daniel Dao <dqminh@cloudflare.com>
2021-02-24 15:20:43 +00:00
Fabricio Voznika 34e2cda9ad Return nicer error message when cgroups v1 isn't available
Updates #3481
Closes #5430

PiperOrigin-RevId: 358923208
2021-02-22 15:57:07 -08:00
Fabricio Voznika 19fe3a2bfb Fix `runsc kill --pid`
Previously, loader.signalProcess was inconsistently using both root and
container's PID namespace to find the process. It used root namespace
for the exec'd process and container's PID namespace for other processes.
This fixes the code to use the root PID namespace across the board, which
is the same PID reported in `runsc ps` (or soon will be after
https://github.com/google/gvisor/pull/5519).

PiperOrigin-RevId: 358836297
2021-02-22 09:33:46 -08:00
Ting-Yu Wang 120c8e3468 Replace TaskFromContext(ctx).Kernel() with KernelFromContext(ctx)
A panic was seen on code paths such as control.ExecAsync, where
ctx does not have a Task.

Reported-by: syzbot+55ce727161cf94a7b7d6@syzkaller.appspotmail.com
PiperOrigin-RevId: 355960596
2021-02-05 17:28:01 -08:00
Kevin Krakauer 5f7bf31526 Stub out basic `runsc events --stat` CPU functionality
Because we lack gVisor-internal cgroups, we take the CPU usage of the entire pod
and divide it proportionally according to sentry-internal usage stats.

This fixes `kubectl top pods`, which gets a pod's CPU usage by summing the usage
of its containers.

Addresses #172.

PiperOrigin-RevId: 355229833
2021-02-02 12:47:23 -08:00
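The proportional split described in 5f7bf31526 can be sketched roughly as follows. This is an assumption-laden illustration, not the commit's implementation: `splitCPUUsage`, the container IDs, and the use of plain `uint64` counters are all hypothetical.

```go
package main

import "fmt"

// splitCPUUsage divides total (the CPU usage of the whole pod) among
// containers in proportion to each container's sentry-internal usage.
// The name and signature are hypothetical, chosen for illustration only.
func splitCPUUsage(total uint64, internal map[string]uint64) map[string]uint64 {
	var sum uint64
	for _, v := range internal {
		sum += v
	}
	out := make(map[string]uint64, len(internal))
	for id, v := range internal {
		if sum == 0 {
			// No internal usage recorded, so nothing can be attributed.
			out[id] = 0
			continue
		}
		// Floating point avoids overflow of total*v for large counters,
		// at the cost of rounding down by up to one unit per container.
		out[id] = uint64(float64(total) * (float64(v) / float64(sum)))
	}
	return out
}

func main() {
	shares := splitCPUUsage(1000, map[string]uint64{"a": 1, "b": 3})
	fmt.Println(shares["a"], shares["b"]) // prints "250 750"
}
```

Because the shares are derived from the same total, summing them over a pod's containers gives back (up to rounding) the pod-wide figure, which is what a per-pod aggregate needs.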
Fabricio Voznika aae4803808 Enable container checkpoint/restore tests with VFS2
Updates #1663

PiperOrigin-RevId: 355077816
2021-02-01 19:29:29 -08:00
Fabricio Voznika f14f3ba3ef Fix TestDuplicateEnvVariable flakiness
Updates #5226

PiperOrigin-RevId: 353262133
2021-01-22 09:57:44 -08:00
Adin Scannell 4e03e87547 Fix simple mistakes identified by goreportcard.
These are primarily simplification and lint mistakes. However, minor
fixes are also included and tests added where appropriate.

PiperOrigin-RevId: 351425971
2021-01-12 12:38:22 -08:00
Fabricio Voznika 7e462a1c7f OCI spec may contain duplicate environment variables
Closes #5226

PiperOrigin-RevId: 351259576
2021-01-11 16:25:50 -08:00
Fabricio Voznika 8ea19b5818 Add sandbox ID to state file name
This allows finding all containers inside a sandbox more efficiently.
This operation is required every time a container starts and stops,
and previously required loading *all* container state files to check
whether the container belonged to the sandbox.

Apart from being inefficient, it has caused problems when state files
are stale or corrupt, making it impossible to create any container.

Also adjust commands `list` and `debug` to skip over files that fail
to load.

Resolves #5052

PiperOrigin-RevId: 348050637
2020-12-17 10:52:44 -08:00
Adin Scannell 80552b936d Support partitions for other tests.
PiperOrigin-RevId: 345399936
2020-12-03 01:00:21 -08:00
Fabricio Voznika 7158095d68 Fix race condition in multi-container wait test
Container is not thread-safe, locking must be done in the caller.
The test was calling Container.Wait() from multiple threads with
no synchronization.

Also removed Container.WaitPID from test because the process might
have already exited when wait is called.

PiperOrigin-RevId: 343176280
2020-11-18 16:06:31 -08:00
Fabricio Voznika e2d9a68eef Add support for TTY in multi-container
Fixes #2714

PiperOrigin-RevId: 342950412
2020-11-17 14:51:24 -08:00
Fabricio Voznika 0e8fdfd388 Re-add start/stop container tests
Due to a typo, doDestroyNotStartedTest was being tested
2x instead of doDestroyStartingTest.

PiperOrigin-RevId: 340969797
2020-11-05 19:06:43 -08:00
Fabricio Voznika 62b0e845b7 Return failure when `runsc events` queries a stopped container
This was causing gvisor-containerd-shim to crash because the command
succeeded, but there was no stat present.

PiperOrigin-RevId: 340964921
2020-11-05 18:18:21 -08:00
Fabricio Voznika c47f8afe23 Fix failure setting OOM score adjustment
When OOM score adjustment needs to be set, all the containers need to be
loaded to find all containers that belong to the sandbox. However, each
load signals the container to ensure it is still alive. OOM score
adjustment is set during creation and deletion of every container, generating
a flood of signals to all containers. The fix removes the signal check
when it's not needed.

There is also a race fetching OOM score adjustment value from the parent when
the sandbox exits at the same time (the time it took to signal containers above
made this window quite large). The fix is to store the original value
in the sandbox state file and use it when the value needs to be restored.

Also add more logging and make the existing log messages more consistent to help with
debugging.

PiperOrigin-RevId: 340940799
2020-11-05 15:36:20 -08:00
gVisor bot 1a5eb49a43 Merge pull request #3957 from workato:auto-cgroup
PiperOrigin-RevId: 338372736
2020-10-21 17:24:06 -07:00
Konstantin Baranov d579ed8505 Do not even try forcing cgroups in tests
2020-10-20 20:03:04 -07:00
Fabricio Voznika 4b4d12d5bb Fixes to cgroups
There were a few problems with cgroups:
- cleanup loop was breaking too early
- parse of /proc/[pid]/cgroup was skipping "name=systemd"
  because "name=" was not being removed from name.
- When no limits are specified, fillFromAncestor was not being
  called, causing a failure to set cpuset.mems

Updates #4536

PiperOrigin-RevId: 337947356
2020-10-19 15:32:50 -07:00
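The "name=" handling mentioned in 4b4d12d5bb can be illustrated with a small parser for the `hierarchy-ID:controller-list:path` lines found in a process's cgroup file. This is only a sketch under that format assumption; `parseCgroupFile` is a hypothetical name and is not the function changed by the commit.

```go
package main

import (
	"fmt"
	"strings"
)

// parseCgroupFile maps each controller name to its cgroup path, given the
// contents of a per-process cgroup file. Named hierarchies such as
// "name=systemd" are stored under the bare name ("systemd").
// Hypothetical helper for illustration only.
func parseCgroupFile(data string) map[string]string {
	paths := make(map[string]string)
	for _, line := range strings.Split(data, "\n") {
		// Each line has the form "hierarchy-ID:controller-list:path".
		fields := strings.SplitN(line, ":", 3)
		if len(fields) != 3 {
			continue // blank or malformed line
		}
		for _, ctrl := range strings.Split(fields[1], ",") {
			ctrl = strings.TrimPrefix(ctrl, "name=")
			if ctrl == "" {
				continue // entry with an empty controller list
			}
			paths[ctrl] = fields[2]
		}
	}
	return paths
}

func main() {
	data := "5:cpu,cpuacct:/docker/abc\n1:name=systemd:/user.slice\n"
	paths := parseCgroupFile(data)
	fmt.Println(paths["systemd"]) // prints "/user.slice"
	fmt.Println(paths["cpuacct"]) // prints "/docker/abc"
}
```

Without the `TrimPrefix` step, a lookup for "systemd" would miss the entry, which is the kind of skip the commit message describes.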
Konstantin Baranov a2a27eedf4 Ignore errors in rootless and test modes
2020-10-06 15:34:02 -07:00
Fabricio Voznika 9e64b9f3a5 Fix gofer monitor prematurely destroying container
When all container tasks finish, they release the mount which in turn
will close the 9P session to the gofer. The gofer exits when the connection
closes, triggering the gofer monitor. The gofer monitor will _think_ that
the gofer died prematurely and destroy the container. Then when the caller
attempts to wait for the container, e.g. to get the exit code, wait fails
saying the container doesn't exist.

Gofer monitor now just SIGKILLs the container and lets the normal teardown
process happen, which will eventually destroy the container at the right
time. Also, fixed an issue with exec racing with container's init process
exiting.

Closes #1487

PiperOrigin-RevId: 335537350
2020-10-05 17:40:23 -07:00
Fabricio Voznika 9e9fec3a09 Enable more VFS2 tests
Updates #1487

PiperOrigin-RevId: 335516732
2020-10-05 15:54:36 -07:00
Konstantin Baranov 6321eccddc Treat absent "linux" section as empty "cgroupsPath" too
2020-10-02 14:37:55 -07:00
Howard Zhang d47209b86d fix TestUserLog for multi-arch
based on arch, apply different syscall number for
sched_rr_get_interval

Signed-off-by: Howard Zhang <howard.zhang@arm.com>
2020-09-25 14:48:37 +08:00
Fabricio Voznika da07e38f7c Remove option to panic gofer
Gofer panics are suppressed by p9 server and an error
is returned to the caller, making it effectively the
same as returning EROFS.

PiperOrigin-RevId: 332282959
2020-09-17 12:01:45 -07:00
Fabricio Voznika a11061d78a Add VFS2 overlay support in runsc
All tests under runsc are passing with overlay enabled.

Updates #1487, #1199

PiperOrigin-RevId: 332181267
2020-09-17 01:09:42 -07:00
Fabricio Voznika 326a1dbb73 Refactor removed default test dimension
ptrace was always selected as a dimension before, but not
anymore. Some tests were specifying "overlay" expecting that
to be in addition to the default.

PiperOrigin-RevId: 332004111
2020-09-16 07:47:28 -07:00
Konstantin Baranov b8dc9a889f Use container ID as cgroup name if not provided
Useful when you want to run multiple containers with the same config.
And runc does that too.
2020-09-15 20:50:07 -07:00
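The fallback described in b8dc9a889f, together with the absent "linux" section case from 6321eccddc, might look something like the sketch below. The `spec` and `linuxSection` types are simplified, hypothetical stand-ins for the OCI spec structures, and `cgroupName` is an illustrative name rather than the function from these commits.

```go
package main

import "fmt"

// linuxSection and spec are minimal, hypothetical stand-ins for the
// corresponding parts of an OCI runtime spec.
type linuxSection struct {
	CgroupsPath string
}

type spec struct {
	Linux *linuxSection
}

// cgroupName returns the configured cgroups path, falling back to the
// container ID when the path is empty or the "linux" section is absent.
func cgroupName(s *spec, containerID string) string {
	if s.Linux == nil || s.Linux.CgroupsPath == "" {
		return containerID
	}
	return s.Linux.CgroupsPath
}

func main() {
	fmt.Println(cgroupName(&spec{}, "abc123"))                                    // prints "abc123"
	fmt.Println(cgroupName(&spec{Linux: &linuxSection{CgroupsPath: "/pod/c1"}}, "abc123")) // prints "/pod/c1"
}
```

With a fallback like this, two containers started from the same config end up with distinct cgroup names because their IDs differ.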
Fabricio Voznika c8f1ce288d Honor readonly flag for root mount
Updates #1487

PiperOrigin-RevId: 330580699
2020-09-08 14:00:43 -07:00
Fabricio Voznika 2202812e07 Simplify FD handling for container start/exec
VFS1 and VFS2 host FDs have different dupping behavior,
making it error prone to code for both. Change the contract
so that FDs are released as they are used, so the caller
can simply defer a block that closes all remaining files.
This also addresses handling of partial failures.

With this fix, more VFS2 tests can be enabled.

Updates #1487

PiperOrigin-RevId: 330112266
2020-09-04 11:42:02 -07:00
Ayush Ranjan 2eaf54dd59 Refactor tty codebase to use master-replica terminology.
Updates #2972

PiperOrigin-RevId: 329584905
2020-09-01 14:43:41 -07:00
Fabricio Voznika be76c7ce6e Move boot.Config to its own package
Updates #3494

PiperOrigin-RevId: 327548511
2020-08-19 18:37:42 -07:00
Adin Scannell d0fd97541a Clean-up bazel wrapper.
The bazel server was being started as the wrong user, leading to issues
where the container would suddenly exit during a build.

We can also simplify the waiting logic by starting the container in two
separate steps: those that must complete first, then the asynchronous bit.

PiperOrigin-RevId: 323391161
2020-07-27 10:40:29 -07:00
gVisor bot bdbab2702a Merge pull request #3022 from prattmic:runsc_do_pdeathsig
PiperOrigin-RevId: 321449877
2020-07-15 15:21:32 -07:00
Michael Pratt 1481673178 Apply pdeathsig to gofer for runsc run/do
Much like the boot process, apply pdeathsig to the gofer for cases where
the sandbox lifecycle is attached to the parent (runsc run/do).

This isn't strictly necessary, as the gofer normally exits once the
sentry disappears, but this makes that extra reliable.
2020-07-15 15:15:11 -04:00
Fabricio Voznika 1bfb556ccd Prepare boot.Loader to support multi-container TTY
- Combine process creation code that is shared between
  root and subcontainer processes
- Move root container information into a struct for
  clarity

Updates #2714

PiperOrigin-RevId: 321204798
2020-07-14 12:02:03 -07:00
gVisor bot c81ac8ec3b Merge pull request #2672 from amscanne:shim-integrated
PiperOrigin-RevId: 321053634
2020-07-13 16:10:58 -07:00
Fabricio Voznika c4815af947 Add shared mount hints to VFS2
Container restart test is disabled for VFS2 for now.

Updates #1487

PiperOrigin-RevId: 320296401
2020-07-08 17:12:29 -07:00
Ian Lewis 8ea99d58ff Set the HOME environment variable for sub-containers.
Fixes #701

PiperOrigin-RevId: 316025635
2020-06-11 19:31:24 -07:00
Fabricio Voznika 4e96b94915 Combine executable lookup code
Run vs. exec and VFS1 vs. VFS2 executable lookups were
slightly different from each other. Combine them all
into the same logic.

PiperOrigin-RevId: 315426443
2020-06-08 23:08:23 -07:00
Fabricio Voznika ca5912d13c More runsc changes for VFS2
- Add /tmp handling
- Apply mount options
- Enable more container_test tests
- Forward signals to child process when test respawns process
  to run as root inside namespace.

Updates #1487

PiperOrigin-RevId: 314263281
2020-06-01 21:32:09 -07:00
Fabricio Voznika f7418e2159 Move Cleanup to its own package
PiperOrigin-RevId: 313663382
2020-05-28 14:49:06 -07:00