816 Commits

Author SHA1 Message Date
Zach Koopmans 16b81308cf Add initial mitigate code and cpu parsing.
PiperOrigin-RevId: 353274135
2021-01-22 10:52:57 -08:00
Fabricio Voznika 9b4f4655ed Remove dependency to abi/linux
abi package is to be used by the Sentry to implement the Linux ABI.
Code dealing with the host should use x/sys/unix.

PiperOrigin-RevId: 353272679
2021-01-22 10:47:28 -08:00
Fabricio Voznika f14f3ba3ef Fix TestDuplicateEnvVariable flakiness
Updates #5226

PiperOrigin-RevId: 353262133
2021-01-22 09:57:44 -08:00
Fabricio Voznika 7bf656f4c6 Fix ownership change logic
Previously fsgofer was skipping chown call if the uid and gid
were the same as the current user/group. However, when setgid
is set, the group may not be the same as the caller. Instead,
compare the actual uid/gid of the file after it has been
created and change ownership only if needed.

Updates #180

PiperOrigin-RevId: 353118733
2021-01-21 15:47:23 -08:00
Dean Deng 1efe0ebc59 Switch uses of os.Getenv that check for empty string to os.LookupEnv.
Whether the variable was found is already returned by syscall.Getenv.
os.Getenv drops this value while os.LookupEnv passes it along.

PiperOrigin-RevId: 351674032
2021-01-13 15:15:20 -08:00
Adin Scannell 4e03e87547 Fix simple mistakes identified by goreportcard.
These are primarily simplification and lint mistakes. However, minor
fixes are also included and tests added where appropriate.

PiperOrigin-RevId: 351425971
2021-01-12 12:38:22 -08:00
Fabricio Voznika 7e462a1c7f OCI spec may contain duplicate environment variables
Closes #5226

PiperOrigin-RevId: 351259576
2021-01-11 16:25:50 -08:00
Adin Scannell b06e5bc5b0 Add benchmarks targets to BuildKite.
This includes minor fix-ups:

* Handle SIGTERM in runsc debug, to exit gracefully.
* Fix cmd.debug.go opening all profiles as RDONLY.
* Fix the test name in fio_test.go, and encode the block size in the test.

PiperOrigin-RevId: 350205718
2021-01-05 13:21:54 -08:00
Andrei Vagin 622db84e4b Internal changes.
PiperOrigin-RevId: 350159657
2021-01-05 09:53:42 -08:00
Fabricio Voznika 1b66bad7c4 Fix condition checking in `runsc debug`
Closes #5052

PiperOrigin-RevId: 349579814
2020-12-30 11:18:36 -08:00
Adin Scannell 85c1c3ed4b Make profiling commands synchronous.
This allows for a model of profiling where you can start collection, and
it will terminate when the sandbox terminates. Without this synchronous
call, it is effectively impossible to collect lengthy blocking and mutex
profiles.

PiperOrigin-RevId: 349483418
2020-12-29 16:23:01 -08:00
Etienne Perot 9a72730f24 Typo fix.
PiperOrigin-RevId: 348106699
2020-12-17 15:39:03 -08:00
Ayush Ranjan 028271b530 [netstack] Implement IP(V6)_RECVERR socket option.
PiperOrigin-RevId: 348055514
2020-12-17 11:10:41 -08:00
Fabricio Voznika 8ea19b5818 Add sandbox ID to state file name
This makes it possible to find all containers inside a sandbox more efficiently.
This operation is required every time a container starts and stops,
and previously required loading *all* container state files to check
whether the container belonged to the sandbox.

Apart from being inefficient, it has caused problems when state files
are stale or corrupt, making it impossible to create any container.

Also adjust commands `list` and `debug` to skip over files that fail
to load.

Resolves #5052

PiperOrigin-RevId: 348050637
2020-12-17 10:52:44 -08:00
Fabricio Voznika e7493a9e23 Set max memory not min
Closes #5048

PiperOrigin-RevId: 348050472
2020-12-17 10:46:47 -08:00
Fabricio Voznika 12ac31ed04 fsgofer optimizations
- Skip chown call in case owner change is not needed
- Skip filepath.Clean() calls when joining paths
- Pass unix.Stat_t by value to reduce runtime.duffcopy calls.
  This change allows for better inlining in localFile.walk().

                             Change        Baseline      Improvement
BenchmarkWalkOne-6            2912 ns/op    3082 ns/op    5.5%
BenchmarkCreate-6            15915 ns/op   19126 ns/op   16.8%
BenchmarkCreateDiffOwner-6   18795 ns/op   19741 ns/op    4.8%

PiperOrigin-RevId: 347667833
2020-12-15 12:23:55 -08:00
Ayush Ranjan a1c56bc227 [netstack] Update raw socket and hostinet control message parsing.
There are surprisingly few syscall tests that run with hostinet. For example
running the following command only returns two results:
`bazel query test/syscalls:all | grep hostnet`

I think as a result, as our control messages evolved, hostinet was left
behind. Update it to support all control messages netstack supports.

This change also updates sentry's control message parsing logic to make it up to
date with all the control messages we support.

PiperOrigin-RevId: 347508892
2020-12-14 18:00:55 -08:00
Dean Deng 80379894d3 Add runsc symbolize command.
This command takes instruction pointers from stdin and converts them into their
corresponding file names and line/column numbers in the runsc source code. The
inputs are not interpreted as actual addresses, but as synthetic values that are
exposed through /sys/kernel/debug/kcov. One can extract coverage information
from kcov and translate those values into locations in the source code by
running symbolize on the same runsc binary.

This will allow us to generate syzkaller coverage reports.

PiperOrigin-RevId: 347089624
2020-12-11 15:43:22 -08:00
Adin Scannell 4cba3904f4 Remove existing nogo exceptions.
PiperOrigin-RevId: 347047550
2020-12-11 12:06:49 -08:00
Bhasker Hariharan bcb97a3bb7 Disable host reassembly for fragments.
fdbased endpoint was enabling fragment reassembly on the host AF_PACKET socket
to ensure that fragments are delivered in order to the right dispatcher. But this
prevents fragments from being delivered to gvisor at all and makes testing of
gvisor's fragment reassembly code impossible.

The potential impact from this is minimal since IP Fragmentation is not really
that prevalent and in cases where we do get fragments we may deliver the
fragment out of order to the TCP layer as multiple network dispatchers may
process the fragments and deliver a reassembled fragment after the next packet
has been delivered to the TCP endpoint. While not desirable I believe the impact
from this is minimal due to low prevalence of fragmentation.

Also removed PktType and Hatype fields when binding the socket as these are not
used when binding. It's just confusing to have them specified.

See: https://man7.org/linux/man-pages/man7/packet.7.html
"Fields used for binding are
       sll_family (should be AF_PACKET), sll_protocol, and sll_ifindex."

Fixes #5055

PiperOrigin-RevId: 346919439
2020-12-10 20:08:59 -08:00
Adin Scannell 65a2242db4 Tweak aarch64 support.
A few images were broken with respect to aarch64. We should now
be able to run push-all-images with ARCH=aarch64 as part of the
regular continuous integration builds, and add aarch64 smoke tests
(via user emulation for now) to the regular test suite (future).

PiperOrigin-RevId: 346685462
2020-12-09 18:51:17 -08:00
Peter Johnston eeb23531eb Support icmpv6 transport protocol
PiperOrigin-RevId: 346101076
2020-12-07 08:44:44 -08:00
Jamie Liu b80021afd2 Overlay runsc regular file mounts with regular files.
Fixes #4991

PiperOrigin-RevId: 345800333
2020-12-04 19:13:24 -08:00
Dean Deng bec8cea651 Surface usage message for `runsc do`.
c.Usage() only returns a string; f.Usage() will print the usage message.

PiperOrigin-RevId: 345500123
2020-12-03 11:47:30 -08:00
Adin Scannell 80552b936d Support partitions for other tests.
PiperOrigin-RevId: 345399936
2020-12-03 01:00:21 -08:00
Fabricio Voznika 209a95a35a Propagate IP address prefix from host to netstack
Closes #4022

PiperOrigin-RevId: 343378647
2020-11-19 15:11:17 -08:00
Andrei Vagin 764504c38f runsc: check whether cgroup exists or not for each controller
We have seen a case when a memory cgroup exists but a perf_event one doesn't.

Reported-by: syzbot+f31468b61d1a27e629dc@syzkaller.appspotmail.com
Reported-by: syzbot+1f163ec0321768f1497e@syzkaller.appspotmail.com
PiperOrigin-RevId: 343200070
2020-11-18 18:37:31 -08:00
Fabricio Voznika 7158095d68 Fix race condition in multi-container wait test
Container is not thread-safe, locking must be done in the caller.
The test was calling Container.Wait() from multiple threads with
no synchronization.

Also removed Container.WaitPID from test because the process might
have already exited when wait is called.

PiperOrigin-RevId: 343176280
2020-11-18 16:06:31 -08:00
Fabricio Voznika e2d9a68eef Add support for TTY in multi-container
Fixes #2714

PiperOrigin-RevId: 342950412
2020-11-17 14:51:24 -08:00
Ghanan Gowripalan cc5cfce4c6 Remove ARP address workaround
- Make AddressableEndpoint optional for NetworkEndpoint.
Not all NetworkEndpoints need to support addressing (e.g. ARP), so
AddressableEndpoint should only be implemented for protocols that
support addressing such as IPv4 and IPv6.

With this change, tcpip.ErrNotSupported will be returned by the stack
when attempting to modify addresses on a network endpoint that does
not support addressing.

Now that packets are fully handled at the network layer, and (with this
change) addresses are optional for network endpoints, we no longer need
the workaround for ARP where a fake ARP address was added to each NIC
that performs ARP so that packets would be delivered to the ARP layer.

PiperOrigin-RevId: 342722547
2020-11-16 14:36:10 -08:00
Fabricio Voznika 74be0dd0d5 Remove TESTONLY tag from vfs2 flag
Updates #1035

PiperOrigin-RevId: 342168926
2020-11-12 17:44:53 -08:00
Fabricio Voznika 0e8fdfd388 Re-add start/stop container tests
Due to a typo doDestroyNotStartedTest was being tested
2x instead of doDestroyStartingTest.

PiperOrigin-RevId: 340969797
2020-11-05 19:06:43 -08:00
Fabricio Voznika 62b0e845b7 Return failure when `runsc events` queries a stopped container
This was causing gvisor-containerd-shim to crash because the command
succeeded, but there was no stat present.

PiperOrigin-RevId: 340964921
2020-11-05 18:18:21 -08:00
Fabricio Voznika c47f8afe23 Fix failure setting OOM score adjustment
When OOM score adjustment needs to be set, all the containers need to be
loaded to find all containers that belong to the sandbox. However, each
load signals the container to ensure it is still alive. OOM score
adjustment is set during creation and deletion of every container, generating
a flood of signals to all containers. The fix removes the signal check
when it's not needed.

There is also a race fetching OOM score adjustment value from the parent when
the sandbox exits at the same time (the time it took to signal containers above
made this window quite large). The fix is to store the original value
in the sandbox state file and use it when the value needs to be restored.

Also add more logging and make the existing log messages more consistent to
help with debugging.

PiperOrigin-RevId: 340940799
2020-11-05 15:36:20 -08:00
Ting-Yu Wang 1cfa8d58f6 Fix more nogo tests
PiperOrigin-RevId: 340536306
2020-11-03 15:23:32 -08:00
Kevin Krakauer 02fe467b47 Keep magic constants out of netstack
PiperOrigin-RevId: 339721152
2020-10-29 12:22:21 -07:00
Dean Deng 3b4674ffe0 Add logging option to leak checker.
Also refactor the template and CheckedObject interface to make this cleaner.

Updates #1486.

PiperOrigin-RevId: 339577120
2020-10-28 18:23:29 -07:00
gVisor bot 22ac9b0723 Merge pull request #4587 from lnsp:stacktrace
PiperOrigin-RevId: 339385609
2020-10-27 20:43:02 -07:00
Fabricio Voznika 93d2d37a93 Add more cgroup unit tests
PiperOrigin-RevId: 339380431
2020-10-27 19:46:51 -07:00
gVisor bot 013d79d8e4 Merge pull request #4420 from workato:dev-options
PiperOrigin-RevId: 339363816
2020-10-27 17:22:26 -07:00
Konstantin Baranov 2b72da8bf9 Allow overriding mount options for /dev and /dev/pts
This is useful to optionally set /dev ro,noexec.

Treat /dev and /dev/pts the same as /proc and /sys.
Make sure the Type is right though. Many config.json snippets
on the Internet suggest /dev is tmpfs, not devtmpfs.
2020-10-26 18:02:52 -07:00
Fabricio Voznika 3ed8ace871 Fix nogo errors in specutils
PiperOrigin-RevId: 338780793
2020-10-23 18:35:45 -07:00
Jamie Liu 9f87400f08 Support VFS2 save/restore.
Inode number consistency checks are now skipped in save/restore tests for
reasons described in greatest detail in StatTest.StateDoesntChangeAfterRename.
They pass in VFS1 due to the bug described in new test case
SimpleStatTest.DifferentFilesHaveDifferentDeviceInodeNumberPairs.

Fixes #1663

PiperOrigin-RevId: 338776148
2020-10-23 17:48:33 -07:00
Dean Deng 9ca66ec598 Rewrite reference leak checker without finalizers.
Our current reference leak checker uses finalizers to verify whether an object
has reached zero references before it is garbage collected. There are multiple
problems with this mechanism, so a rewrite is in order.

With finalizers, there is no way to guarantee that a finalizer will run before
the program exits. When an unreachable object with a finalizer is garbage
collected, its finalizer will be added to a queue and run asynchronously. The
best we can do is run garbage collection upon sandbox exit to make sure that
all finalizers are enqueued.

Furthermore, if there is a chain of finalized objects, e.g. A points to B
points to C, garbage collection needs to run multiple times before all of the
finalizers are enqueued. The first GC run will register the finalizer for A but
not free it. It takes another GC run to free A, at which point B's finalizer
can be registered. As a result, we need to run GC as many times as the length
of the longest such chain to have a somewhat reliable leak checker.

Finally, a cyclical chain of structs pointing to one another will never be
garbage collected if a finalizer is set. This is a well-known issue with Go
finalizers (https://github.com/golang/go/issues/7358). Using leak checking on
filesystem objects that produce cycles will not work and even result in memory
leaks.

The new leak checker stores reference counted objects in a global map when
leak check is enabled and removes them once they are destroyed. At sandbox
exit, any remaining objects in the map are considered leaked. This provides
a deterministic way of detecting leaks without relying on the complexities of
finalizers and garbage collection.

This approach has several benefits over the former, including:
- Always detects leaks of objects that should be destroyed very close to
  sandbox exit. The old checker very rarely detected these leaks, because it
  relied on garbage collection to be run in a short window of time.
- Panics if we forgot to enable leak check on a ref-counted object (we will try
  to remove it from the map when it is destroyed, but it will never have been
  added).
- Can store extra logging information in the map values without adding to the
  size of the ref count struct itself. With the size of just an int64, the ref
  count object remains compact, meaning frequent operations like IncRef/DecRef
  are more cache-efficient.
- Can aggregate leak results in a single report after the sandbox exits.
  Instead of having warnings littered in the log, which were
  non-deterministically triggered by garbage collection, we can print all
  warning messages at once. Note that this could also be a limitation--the
  sandbox must exit properly for leaks to be detected.

Some basic benchmarking indicates that this change does not significantly
affect performance when leak checking is enabled, which is understandable
since registering/unregistering is only done once for each filesystem object.

Updates #1486.

PiperOrigin-RevId: 338685972
2020-10-23 09:17:02 -07:00
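The map-based approach described in commit 9ca66ec598 can be sketched roughly as below. This is a simplified illustration, not the gVisor implementation: `leakChecker`, `Register`, `Unregister`, `Report`, and `inode` are hypothetical names, and the real checker's interfaces and logging are not shown in the commit message.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// leakChecker keeps every live reference-counted object in a map. The map
// value holds logging information, so the ref count itself stays small.
type leakChecker struct {
	mu   sync.Mutex
	live map[interface{}]string
}

func newLeakChecker() *leakChecker {
	return &leakChecker{live: make(map[interface{}]string)}
}

// Register records obj when it is created with leak checking enabled.
func (c *leakChecker) Register(obj interface{}, info string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.live[obj] = info
}

// Unregister removes obj when it is destroyed. It panics if obj was never
// registered, which surfaces objects that forgot to enable leak checking.
func (c *leakChecker) Unregister(obj interface{}) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if _, ok := c.live[obj]; !ok {
		panic(fmt.Sprintf("unregistering %v, which was never registered", obj))
	}
	delete(c.live, obj)
}

// Report returns the info of everything still in the map. Called once at
// exit, it yields a single deterministic list of leaked objects.
func (c *leakChecker) Report() []string {
	c.mu.Lock()
	defer c.mu.Unlock()
	leaked := make([]string, 0, len(c.live))
	for _, info := range c.live {
		leaked = append(leaked, info)
	}
	sort.Strings(leaked)
	return leaked
}

// inode stands in for any reference-counted object.
type inode struct{ refs int64 }

func main() {
	c := newLeakChecker()
	a, b := &inode{refs: 1}, &inode{refs: 1}
	c.Register(a, "inode a")
	c.Register(b, "inode b")
	c.Unregister(a) // a dropped to zero references and was destroyed.
	for _, info := range c.Report() {
		fmt.Println("leaked:", info)
	}
}
```

Unlike a finalizer, nothing here depends on when or whether garbage collection runs; the trade-off noted in the commit still applies, since the report is only produced if the program reaches its exit path.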
Lennart Espe 3b8193e762 Add --traceback flag to customize GOTRACEBACK level
2020-10-23 08:46:57 +00:00
Fabricio Voznika 293877cf64 Load spec during "runsc start" to process flag overrides
Subcontainers are only configured when the container starts, however because
start doesn't load the spec, flag annotations that may override flags were
not getting applied to the configuration.

Updates #3494

PiperOrigin-RevId: 338610953
2020-10-22 22:07:06 -07:00
gVisor bot 1a5eb49a43 Merge pull request #3957 from workato:auto-cgroup
PiperOrigin-RevId: 338372736
2020-10-21 17:24:06 -07:00
Konstantin Baranov d579ed8505 Do not even try forcing cgroups in tests
2020-10-20 20:03:04 -07:00
Fabricio Voznika c21d8375d9 Add /dev to mandatory mounts test
PiperOrigin-RevId: 338072845
2020-10-20 09:20:49 -07:00
Jamie Liu cd86bd4931 Fix runsc tests on VFS2 overlay.
- Check the sticky bit in overlay.filesystem.UnlinkAt(). Fixes
  StickyTest.StickyBitPermDenied.

- When configuring a VFS2 overlay in runsc, copy the lower layer's root
  owner/group/mode to the upper layer's root (as in the VFS1 equivalent,
  boot.addOverlay()). This makes the overlay root owned by UID/GID 65534 with
  mode 0755 rather than owned by UID/GID 0 with mode 01777. Fixes
  CreateTest.CreateFailsOnUnpermittedDir, which assumes that the test cannot
  create files in /.

- MknodTest.UnimplementedTypesReturnError assumes that the creation of device
  special files is not supported. However, while the VFS2 gofer client still
  doesn't support device special files, VFS2 tmpfs does, and in the overlay
  test dimension mknod() targets a tmpfs upper layer. The test initially has
  all capabilities, including CAP_MKNOD, so its creation of these files
  succeeds. Constrain these tests to VFS1.

- Rename overlay.nonDirectoryFD to overlay.regularFileFD and only use it for
  regular files, using the original FD for pipes and device special files. This
  is more consistent with Linux (which gets the original inode_operations, and
  therefore file_operations, for these file types from ovl_fill_inode() =>
  init_special_inode()) and fixes remaining mknod and pipe tests.

- Read/write 1KB at a time in PipeTest.Streaming, rather than 4 bytes. This
  isn't strictly necessary, but it makes the test less obnoxiously slow on
  ptrace.

Fixes #4407

PiperOrigin-RevId: 337971042
2020-10-19 17:48:02 -07:00