Commit Graph

829 Commits

Author SHA1 Message Date
Fabricio Voznika 19fe3a2bfb Fix `runsc kill --pid`
Previously, loader.signalProcess was inconsistently using both root and
container's PID namespace to find the process. It used root namespace
for the exec'd process and container's PID namespace for other processes.
This fixes the code to use the root PID namespace across the board, which
is the same PID reported in `runsc ps` (or soon will after
https://github.com/google/gvisor/pull/5519).

PiperOrigin-RevId: 358836297
2021-02-22 09:33:46 -08:00
Adin Scannell 3ef012944d Stop the control server only once.
Operations are now shut down automatically by the main Stop
command, and it is not necessary to call Stop during Destroy.

Fixes #5454

PiperOrigin-RevId: 357295930
2021-02-12 17:13:44 -08:00
Fabricio Voznika 192780946f Allow rt_sigaction in gofer seccomp
rt_sigaction may be called by Go runtime when trying to panic:

https://cs.opensource.google/go/go/+/master:src/runtime/signal_unix.go;drc=ed3e4afa12d655a0c5606bcf3dd4e1cdadcb1476;bpv=1;bpt=1;l=780?q=rt_sigaction&ss=go

Updates #5038

PiperOrigin-RevId: 357013186
2021-02-11 11:01:21 -08:00
Zach Koopmans 1ac58cc23e Add mitigate command to runsc
PiperOrigin-RevId: 356772367
2021-02-10 10:48:48 -08:00
Ting-Yu Wang 120c8e3468 Replace TaskFromContext(ctx).Kernel() with KernelFromContext(ctx)
A panic was seen on code paths such as control.ExecAsync, where
ctx does not have a Task.

Reported-by: syzbot+55ce727161cf94a7b7d6@syzkaller.appspotmail.com
PiperOrigin-RevId: 355960596
2021-02-05 17:28:01 -08:00
Michael Pratt 41510d2746 Move getcpu() to core filter list
Some versions of the Go runtime call getcpu(), so add it for compatibility. The
hostcpu package already uses getcpu() on arm64.

PiperOrigin-RevId: 355717757
2021-02-04 14:56:26 -08:00
Zach Koopmans fcc2468db5 Add CPUSet for runsc mitigate.
PiperOrigin-RevId: 355242055
2021-02-02 13:40:46 -08:00
Kevin Krakauer 5f7bf31526 Stub out basic `runsc events --stat` CPU functionality
Because we lack gVisor-internal cgroups, we take the CPU usage of the entire pod
and divide it proportionally according to sentry-internal usage stats.

This fixes `kubectl top pods`, which gets a pod's CPU usage by summing the usage
of its containers.
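
The proportional split described above can be sketched roughly as follows. This is an illustration only, not the runsc implementation: `splitCPUUsage`, its parameters, and the container IDs are hypothetical names chosen for the example.

```go
package main

import "fmt"

// splitCPUUsage is a hypothetical helper. It divides the pod-wide CPU usage
// (podUsage, e.g. a value read from the host cgroup) among containers in
// proportion to the per-container usage reported by the sentry.
func splitCPUUsage(podUsage uint64, sentryUsage map[string]uint64) map[string]uint64 {
	var total uint64
	for _, u := range sentryUsage {
		total += u
	}
	out := make(map[string]uint64, len(sentryUsage))
	for id, u := range sentryUsage {
		if total == 0 {
			// No sentry-side usage recorded yet; attribute nothing.
			out[id] = 0
			continue
		}
		// Use float64 so that podUsage*u cannot overflow uint64.
		out[id] = uint64(float64(podUsage) * (float64(u) / float64(total)))
	}
	return out
}

func main() {
	usage := splitCPUUsage(1000, map[string]uint64{"a": 30, "b": 10})
	fmt.Println(usage["a"], usage["b"])
}
```

Because each share is derived from the same pod total, summing the per-container values gives back (up to rounding) the pod's usage, which is what a consumer that adds up container usage expects.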

Addresses #172.

PiperOrigin-RevId: 355229833
2021-02-02 12:47:23 -08:00
Fabricio Voznika aae4803808 Enable container checkpoint/restore tests with VFS2
Updates #1663

PiperOrigin-RevId: 355077816
2021-02-01 19:29:29 -08:00
gVisor bot 25284ae3c9 Merge pull request #4503 from dqminh:nested-cgroup
PiperOrigin-RevId: 354568091
2021-01-29 11:06:55 -08:00
Zach Koopmans 814cfd7c4a Internal change
PiperOrigin-RevId: 354170726
2021-01-27 14:18:30 -08:00
Daniel Dao d8b590581a Clean cgroup mountinfo and add more test cases
Signed-off-by: Daniel Dao <dqminh89@gmail.com>
2021-01-27 12:02:27 +00:00
Daniel Dao bd5eb8a9db runsc: check for nested cgroup when generating cgroup paths
In a nested container, we see paths from the host in /proc/self/cgroup, so we
need to re-process that path to get a relative path to be used inside
the container.

Without it, runsc generates ugly paths that may trip other cgroup
watchers that expect clean paths. An example of ugly path is:

```
/sys/fs/cgroup/memory/docker/e383892b29290ae8005d535f2dadc4a583bb354d5bb1ba8c10bf900d92c4db93/docker/e383892b29290ae8005d535f2dadc4a583bb354d5bb1ba8c10bf900d92c4db93/cgroupPath
```

Notice duplication of `docker/e383892b29290ae8005d535f2dadc4a583bb354d5bb1ba8c10bf900d92c4db93`

`/proc/1/cgroup` looks like

```
12:perf_event:/docker/e383892b29290ae8005d535f2dadc4a583bb354d5bb1ba8c10bf900d92c4db93
11:blkio:/docker/e383892b29290ae8005d535f2dadc4a583bb354d5bb1ba8c10bf900d92c4db93
10:freezer:/docker/e383892b29290ae8005d535f2dadc4a583bb354d5bb1ba8c10bf900d92c4db93
9:hugetlb:/docker/e383892b29290ae8005d535f2dadc4a583bb354d5bb1ba8c10bf900d92c4db93
8:devices:/docker/e383892b29290ae8005d535f2dadc4a583bb354d5bb1ba8c10bf900d92c4db93
7:rdma:/
6:pids:/docker/e383892b29290ae8005d535f2dadc4a583bb354d5bb1ba8c10bf900d92c4db93
5:cpuset:/docker/e383892b29290ae8005d535f2dadc4a583bb354d5bb1ba8c10bf900d92c4db93
4:cpu,cpuacct:/docker/e383892b29290ae8005d535f2dadc4a583bb354d5bb1ba8c10bf900d92c4db93
3:memory:/docker/e383892b29290ae8005d535f2dadc4a583bb354d5bb1ba8c10bf900d92c4db93
2:net_cls,net_prio:/docker/e383892b29290ae8005d535f2dadc4a583bb354d5bb1ba8c10bf900d92c4db93
1:name=systemd:/docker/e383892b29290ae8005d535f2dadc4a583bb354d5bb1ba8c10bf900d92c4db93
0::/system.slice/containerd.service
```

This is not necessary when the parent container was created with cgroup
namespace, but that setup is not very common right now.
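
A minimal sketch of the re-processing idea, assuming the root of the cgroup mount as seen from inside the container is already known (for instance from /proc/self/mountinfo). `relativeToMountRoot` and its parameters are hypothetical and are not taken from the runsc code.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// relativeToMountRoot is a hypothetical helper. hostPath is a controller path
// as listed in /proc/self/cgroup; mountRoot is the root of the cgroup mount as
// seen from inside the container. It returns hostPath relative to mountRoot,
// or hostPath unchanged when it does not live under mountRoot.
func relativeToMountRoot(hostPath, mountRoot string) string {
	rel, err := filepath.Rel(mountRoot, hostPath)
	if err != nil || rel == ".." || strings.HasPrefix(rel, "../") {
		// hostPath is outside mountRoot; leave it alone.
		return hostPath
	}
	if rel == "." {
		return "/"
	}
	return "/" + rel
}

func main() {
	root := "/docker/abc"
	fmt.Println(relativeToMountRoot("/docker/abc/child", root))
	fmt.Println(relativeToMountRoot("/docker/abc", root))
	fmt.Println(relativeToMountRoot("/other", root))
}
```

Joining the returned path to the controller's mount point avoids repeating the `docker/<id>` component.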

Signed-off-by: Daniel Dao <dqminh89@gmail.com>
2021-01-26 15:01:21 +00:00
Zach Koopmans 16b81308cf Add initial mitigate code and cpu parsing.
PiperOrigin-RevId: 353274135
2021-01-22 10:52:57 -08:00
Fabricio Voznika 9b4f4655ed Remove dependency to abi/linux
abi package is to be used by the Sentry to implement the Linux ABI.
Code dealing with the host should use x/sys/unix.

PiperOrigin-RevId: 353272679
2021-01-22 10:47:28 -08:00
Fabricio Voznika f14f3ba3ef Fix TestDuplicateEnvVariable flakiness
Updates #5226

PiperOrigin-RevId: 353262133
2021-01-22 09:57:44 -08:00
Fabricio Voznika 7bf656f4c6 Fix ownership change logic
Previously fsgofer was skipping chown call if the uid and gid
were the same as the current user/group. However, when setgid
is set, the group may not be the same as the caller. Instead,
compare the actual uid/gid of the file after it has been
created and change ownership only if needed.
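
The "check the file, then chown only if needed" approach could look roughly like the sketch below. It is not the fsgofer code: `needsChown` and `chownIfNeeded` are hypothetical helpers, the example works on a path rather than an open file descriptor, and it assumes a Unix-like host where `os.FileInfo.Sys()` returns a `*syscall.Stat_t`.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// needsChown reports whether a file's current owner differs from the wanted one.
func needsChown(curUID, curGID uint32, wantUID, wantGID int) bool {
	return int(curUID) != wantUID || int(curGID) != wantGID
}

// chownIfNeeded stats the file after creation and changes ownership only when
// the actual uid/gid differ from the requested ones. With a setgid parent
// directory the file's group may differ from the caller's group, so the check
// must look at the file itself rather than at the current process.
func chownIfNeeded(path string, uid, gid int) (bool, error) {
	fi, err := os.Lstat(path)
	if err != nil {
		return false, err
	}
	st, ok := fi.Sys().(*syscall.Stat_t)
	if !ok {
		return false, fmt.Errorf("unexpected stat type %T", fi.Sys())
	}
	if !needsChown(st.Uid, st.Gid, uid, gid) {
		return false, nil
	}
	return true, os.Lchown(path, uid, gid)
}

func main() {
	f, err := os.CreateTemp("", "chown-example")
	if err != nil {
		fmt.Println("create:", err)
		return
	}
	defer os.Remove(f.Name())
	defer f.Close()

	changed, err := chownIfNeeded(f.Name(), os.Getuid(), os.Getgid())
	fmt.Println("changed:", changed, "err:", err)
}
```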

Updates #180

PiperOrigin-RevId: 353118733
2021-01-21 15:47:23 -08:00
Dean Deng 1efe0ebc59 Switch uses of os.Getenv that check for empty string to os.LookupEnv.
Whether the variable was found is already returned by syscall.Getenv.
os.Getenv drops this value while os.LookupEnv passes it along.
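
As a small illustration of the difference (the variable names and the `envOrDefault` helper are made up for this example):

```go
package main

import (
	"fmt"
	"os"
)

// envOrDefault returns the value of key if it is set, even when it is set to
// the empty string, and def otherwise. os.Getenv cannot make this distinction
// because it returns "" in both cases.
func envOrDefault(key, def string) string {
	if v, ok := os.LookupEnv(key); ok {
		return v
	}
	return def
}

func main() {
	os.Setenv("EXAMPLE_EMPTY", "")
	os.Unsetenv("EXAMPLE_UNSET")

	fmt.Printf("%q\n", envOrDefault("EXAMPLE_EMPTY", "fallback"))
	fmt.Printf("%q\n", envOrDefault("EXAMPLE_UNSET", "fallback"))
}
```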

PiperOrigin-RevId: 351674032
2021-01-13 15:15:20 -08:00
Adin Scannell 4e03e87547 Fix simple mistakes identified by goreportcard.
These are primarily simplification and lint mistakes. However, minor
fixes are also included and tests added where appropriate.

PiperOrigin-RevId: 351425971
2021-01-12 12:38:22 -08:00
Fabricio Voznika 7e462a1c7f OCI spec may contain duplicate environment variables
Closes #5226

PiperOrigin-RevId: 351259576
2021-01-11 16:25:50 -08:00
Adin Scannell b06e5bc5b0 Add benchmarks targets to BuildKite.
This includes minor fix-ups:

* Handle SIGTERM in runsc debug, to exit gracefully.
* Fix cmd.debug.go opening all profiles as RDONLY.
* Fix the test name in fio_test.go, and encode the block size in the test.

PiperOrigin-RevId: 350205718
2021-01-05 13:21:54 -08:00
Andrei Vagin 622db84e4b Internal changes.
PiperOrigin-RevId: 350159657
2021-01-05 09:53:42 -08:00
Fabricio Voznika 1b66bad7c4 Fix condition checking in `runsc debug`
Closes #5052

PiperOrigin-RevId: 349579814
2020-12-30 11:18:36 -08:00
Adin Scannell 85c1c3ed4b Make profiling commands synchronous.
This allows for a model of profiling where you can start collection, and
it will terminate when the sandbox terminates. Without this synchronous
call, it is effectively impossible to collect lengthy blocking and mutex
profiles.

PiperOrigin-RevId: 349483418
2020-12-29 16:23:01 -08:00
Etienne Perot 9a72730f24 Typo fix.
PiperOrigin-RevId: 348106699
2020-12-17 15:39:03 -08:00
Ayush Ranjan 028271b530 [netstack] Implement IP(V6)_RECVERR socket option.
PiperOrigin-RevId: 348055514
2020-12-17 11:10:41 -08:00
Fabricio Voznika 8ea19b5818 Add sandbox ID to state file name
This allows finding all containers inside a sandbox more efficiently.
This operation is required every time a container starts and stops,
and previously required loading *all* container state files to check
whether the container belonged to the sandbox.

Apart from being inefficient, it has caused problems when state files
are stale or corrupt, making it impossible to create any container.

Also adjust commands `list` and `debug` to skip over files that fail
to load.

Resolves #5052

PiperOrigin-RevId: 348050637
2020-12-17 10:52:44 -08:00
Fabricio Voznika e7493a9e23 Set max memory not min
Closes #5048

PiperOrigin-RevId: 348050472
2020-12-17 10:46:47 -08:00
Fabricio Voznika 12ac31ed04 fsgofer optimizations
- Skip chown call in case owner change is not needed
- Skip filepath.Clean() calls when joining paths
- Pass unix.Stat_t by value to reduce runtime.duffcopy calls.
  This change allows for better inlining in localFile.walk().

                              Change         Baseline     Improvement
BenchmarkWalkOne-6             2912 ns/op     3082 ns/op      5.5%
BenchmarkCreate-6             15915 ns/op    19126 ns/op     16.8%
BenchmarkCreateDiffOwner-6    18795 ns/op    19741 ns/op      4.8%

PiperOrigin-RevId: 347667833
2020-12-15 12:23:55 -08:00
Ayush Ranjan a1c56bc227 [netstack] Update raw socket and hostinet control message parsing.
There are surprisingly few syscall tests that run with hostinet. For example
running the following command only returns two results:
`bazel query test/syscalls:all | grep hostnet`

I think as a result, as our control messages evolved, hostinet was left
behind. Update it to support all control messages netstack supports.

This change also updates sentry's control message parsing logic to make it up to
date with all the control messages we support.

PiperOrigin-RevId: 347508892
2020-12-14 18:00:55 -08:00
Dean Deng 80379894d3 Add runsc symbolize command.
This command takes instruction pointers from stdin and converts them into their
corresponding file names and line/column numbers in the runsc source code. The
inputs are not interpreted as actual addresses, but as synthetic values that are
exposed through /sys/kernel/debug/kcov. One can extract coverage information
from kcov and translate those values into locations in the source code by
running symbolize on the same runsc binary.

This will allow us to generate syzkaller coverage reports.

PiperOrigin-RevId: 347089624
2020-12-11 15:43:22 -08:00
Adin Scannell 4cba3904f4 Remove existing nogo exceptions.
PiperOrigin-RevId: 347047550
2020-12-11 12:06:49 -08:00
Bhasker Hariharan bcb97a3bb7 Disable host reassembly for fragments.
fdbased endpoint was enabling fragment reassembly on the host AF_PACKET socket
to ensure that fragments are delivered in order to the right dispatcher. But this
prevents fragments from being delivered to gvisor at all and makes testing of
gvisor's fragment reassembly code impossible.

The potential impact from this is minimal since IP fragmentation is not really
that prevalent, and in cases where we do get fragments we may deliver the
fragment out of order to the TCP layer as multiple network dispatchers may
process the fragments and deliver a reassembled fragment after the next packet
has been delivered to the TCP endpoint. While not desirable I believe the impact
from this is minimal due to low prevalence of fragmentation.

Also removed PktType and Hatype fields when binding the socket as these are not
used when binding. It's just confusing to have them specified.

See: https://man7.org/linux/man-pages/man7/packet.7.html
"Fields used for binding are
       sll_family (should be AF_PACKET), sll_protocol, and sll_ifindex."

Fixes #5055

PiperOrigin-RevId: 346919439
2020-12-10 20:08:59 -08:00
Adin Scannell 65a2242db4 Tweak aarch64 support.
A few images were broken with respect to aarch64. We should now
be able to run push-all-images with ARCH=aarch64 as part of the
regular continuous integration builds, and add aarch64 smoke tests
(via user emulation for now) to the regular test suite (future).

PiperOrigin-RevId: 346685462
2020-12-09 18:51:17 -08:00
Peter Johnston eeb23531eb Support icmpv6 transport protocol
PiperOrigin-RevId: 346101076
2020-12-07 08:44:44 -08:00
Jamie Liu b80021afd2 Overlay runsc regular file mounts with regular files.
Fixes #4991

PiperOrigin-RevId: 345800333
2020-12-04 19:13:24 -08:00
Dean Deng bec8cea651 Surface usage message for `runsc do`.
c.Usage() only returns a string; f.Usage() will print the usage message.

PiperOrigin-RevId: 345500123
2020-12-03 11:47:30 -08:00
Adin Scannell 80552b936d Support partitions for other tests.
PiperOrigin-RevId: 345399936
2020-12-03 01:00:21 -08:00
Fabricio Voznika 209a95a35a Propagate IP address prefix from host to netstack
Closes #4022

PiperOrigin-RevId: 343378647
2020-11-19 15:11:17 -08:00
Andrei Vagin 764504c38f runsc: check whether cgroup exists or not for each controller
We have seen a case when a memory cgroup exists but a perf_event one doesn't.

Reported-by: syzbot+f31468b61d1a27e629dc@syzkaller.appspotmail.com
Reported-by: syzbot+1f163ec0321768f1497e@syzkaller.appspotmail.com
PiperOrigin-RevId: 343200070
2020-11-18 18:37:31 -08:00
Fabricio Voznika 7158095d68 Fix race condition in multi-container wait test
Container is not thread-safe, locking must be done in the caller.
The test was calling Container.Wait() from multiple threads with
no synchronization.
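
The caller-side locking pattern mentioned above can be sketched as follows. The `container` type here is a stand-in invented for the example and is unrelated to the real Container type; only the locking pattern is the point.

```go
package main

import (
	"fmt"
	"sync"
)

// container stands in for a type that is not safe for concurrent use.
type container struct {
	waits int
}

// wait mutates state without any internal locking.
func (c *container) wait() { c.waits++ }

// waitAll calls wait from n goroutines. Because container does no locking of
// its own, the caller serializes access with a mutex.
func waitAll(c *container, n int) int {
	var (
		mu sync.Mutex
		wg sync.WaitGroup
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock()
			defer mu.Unlock()
			c.wait()
		}()
	}
	wg.Wait()
	return c.waits
}

func main() {
	fmt.Println(waitAll(&container{}, 50))
}
```

Dropping the mutex from `waitAll` would leave a data race on `waits`, which is the kind of problem `go test -race` reports.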

Also removed Container.WaitPID from test because the process might
have already exited when wait is called.

PiperOrigin-RevId: 343176280
2020-11-18 16:06:31 -08:00
Fabricio Voznika e2d9a68eef Add support for TTY in multi-container
Fixes #2714

PiperOrigin-RevId: 342950412
2020-11-17 14:51:24 -08:00
Ghanan Gowripalan cc5cfce4c6 Remove ARP address workaround
- Make AddressableEndpoint optional for NetworkEndpoint.
Not all NetworkEndpoints need to support addressing (e.g. ARP), so
AddressableEndpoint should only be implemented for protocols that
support addressing such as IPv4 and IPv6.

With this change, tcpip.ErrNotSupported will be returned by the stack
when attempting to modify addresses on a network endpoint that does
not support addressing.

Now that packets are fully handled at the network layer, and (with this
change) addresses are optional for network endpoints, we no longer need
the workaround for ARP where a fake ARP address was added to each NIC
that performs ARP so that packets would be delivered to the ARP layer.

PiperOrigin-RevId: 342722547
2020-11-16 14:36:10 -08:00
Fabricio Voznika 74be0dd0d5 Remove TESTONLY tag from vfs2 flag
Updates #1035

PiperOrigin-RevId: 342168926
2020-11-12 17:44:53 -08:00
Fabricio Voznika 0e8fdfd388 Re-add start/stop container tests
Due to a typo, doDestroyNotStartedTest was being tested
2x instead of doDestroyStartingTest.

PiperOrigin-RevId: 340969797
2020-11-05 19:06:43 -08:00
Fabricio Voznika 62b0e845b7 Return failure when `runsc events` queries a stopped container
This was causing gvisor-containerd-shim to crash because the command
succeeded, but there was no stat present.

PiperOrigin-RevId: 340964921
2020-11-05 18:18:21 -08:00
Fabricio Voznika c47f8afe23 Fix failure setting OOM score adjustment
When OOM score adjustment needs to be set, all the containers need to be
loaded to find all containers that belong to the sandbox. However, each
load signals the container to ensure it is still alive. OOM score
adjustment is set during creation and deletion of every container, generating
a flood of signals to all containers. The fix removes the signal check
when it's not needed.

There is also a race fetching OOM score adjustment value from the parent when
the sandbox exits at the same time (the time it took to signal containers above
made this window quite large). The fix is to store the original value
in the sandbox state file and use it when the value needs to be restored.

Also add more logging and make the existing log messages more consistent to
help with debugging.

PiperOrigin-RevId: 340940799
2020-11-05 15:36:20 -08:00
Ting-Yu Wang 1cfa8d58f6 Fix more nogo tests
PiperOrigin-RevId: 340536306
2020-11-03 15:23:32 -08:00
Kevin Krakauer 02fe467b47 Keep magic constants out of netstack
PiperOrigin-RevId: 339721152
2020-10-29 12:22:21 -07:00
Dean Deng 3b4674ffe0 Add logging option to leak checker.
Also refactor the template and CheckedObject interface to make this cleaner.

Updates #1486.

PiperOrigin-RevId: 339577120
2020-10-28 18:23:29 -07:00