Commit Graph

908 Commits

Author SHA1 Message Date
Zach Koopmans c07dc3828a [SMT] Refactor runsc mitigate
Refactor mitigate to use /sys/devices/system/cpu/smt/control instead
of individual CPU control files.

PiperOrigin-RevId: 389215975
2021-08-06 11:10:54 -07:00
Chong Cai cbb99336ce Add Fs controls
Add Fs controls and implement "cat" command.

PiperOrigin-RevId: 388812540
2021-08-04 16:44:11 -07:00
Chong Cai 8caf231cb1 Add Lifecycle controls
Also change runsc pause/resume cmd to access Lifecycle instead of
containerManager.

PiperOrigin-RevId: 388534928
2021-08-03 13:49:26 -07:00
gVisor bot 9a96e00f0f Merge pull request #6292 from btw616:local-timezone
PiperOrigin-RevId: 386988406
2021-07-26 16:47:13 -07:00
Lucas Manning 0eea96057a Add support for SIOCGIFCONF ioctl in hostinet.
PiperOrigin-RevId: 386511818
2021-07-23 12:52:44 -07:00
Andrei Vagin 47f025461e runsc: Wait for child processes without timeouts
* First, we don't need to poll child processes.
* Second, a 5-second timeout is too short if the host is overloaded.
* Third, a timeout can hide bugs in code that waits for a process that
  isn't going to exit.

PiperOrigin-RevId: 386337586
2021-07-22 15:40:40 -07:00
Fabricio Voznika 990cd1a950 Don't kill container when volume is unmounted
The gofer session is killed when a gofer-backed volume is unmounted. The
gofer monitor catches the disconnect and kills the container. This change
makes the gofer monitor care only about the rootfs connections, which cannot
be unmounted.

Fixes #6259

PiperOrigin-RevId: 385929039
2021-07-20 20:57:09 -07:00
Jamie Liu 1ad3822200 Add go:build directives as required by Go 1.17's gofmt.
PiperOrigin-RevId: 385894869
2021-07-20 16:28:45 -07:00
Fabricio Voznika 85a0a353ad Replace whitelist with allowlist
PiperOrigin-RevId: 384586164
2021-07-13 17:20:41 -07:00
Fabricio Voznika c16e69a9d5 Use consistent naming for subcontainers
It was confusing to find functions relating to root and non-root
containers. Replace "non-root" with "subcontainer" and make naming
consistent in Sandbox and controller.

PiperOrigin-RevId: 384512518
2021-07-13 11:36:13 -07:00
Fabricio Voznika f51e0486d4 Fix stdios ownership
Set stdio ownership based on the container's user to ensure the
user can open/read/write to/from stdios.

1. stdios in the host are changed to have the owner be the same
uid/gid of the process running the sandbox. This ensures that the
sandbox has full control over it.
2. The owner of stdios inside the sandbox is changed to match the
container's user to give access inside the container and make it
behave the same as runc.

Fixes #6180

PiperOrigin-RevId: 384347009
2021-07-12 16:55:40 -07:00
Fabricio Voznika 7132b9a07b Fix GoLand analyzer errors under runsc/...
PiperOrigin-RevId: 384344990
2021-07-12 16:45:33 -07:00
Tiwei Bie c7ac581049 runsc: fix the local timezone support in logs
This patch fixes the local timezone support in logs by creating
etc/localtime in the rootfs of sandbox process and gofer process
based on the current /etc/localtime on host.

Before this patch, the timestamps in sandbox and gofer logs would
fall back to UTC after re-executing "/proc/self/exe", which makes
the logs inconvenient for users to analyze:

I0708 15:37:43.825100       1 chroot.go:69] Setting up sandbox chroot in "/tmp"
I0708 15:37:43.825189       1 chroot.go:31] Mounting "proc" at "/tmp/proc"
......
I0708 15:37:43.850926       1 cmd.go:73] Execve "/proc/self/exe" again, bye!
I0708 07:37:43.856719       1 main.go:218] ***************************
I0708 07:37:43.856751       1 main.go:219] Args: [runsc-sandbox --root=/run/...]
I0708 07:37:43.856785       1 main.go:220] Version release-20210628.0-27-g02fec8dba5a6
I0708 07:37:43.856795       1 main.go:221] GOOS: linux
I0708 07:37:43.856803       1 main.go:222] GOARCH: amd64
......

Fixes #1984

Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
2021-07-09 10:14:17 +08:00
Tiwei Bie c4c5f4d92a runsc: check the error when preparing tree for pivot_root
Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
2021-07-09 10:14:17 +08:00
Kevin Krakauer f8207a8233 clarify safemount behavior
PiperOrigin-RevId: 383750666
2021-07-08 17:56:11 -07:00
Jamie Liu 052eb90dc1 Replace kernel.ExitStatus with linux.WaitStatus.
PiperOrigin-RevId: 383705129
2021-07-08 13:39:15 -07:00
Kevin Krakauer 3d32a05a35 runsc: validate mount targets
PiperOrigin-RevId: 382845950
2021-07-02 18:15:59 -07:00
Adin Scannell 16b751b6c6 Mix checklocks and atomic analyzers.
This change makes the checklocks analyzer considerably more powerful, adding:
* The ability to traverse complex structures, e.g. to have multiple nested
  fields as part of the annotation.
* The ability to resolve simple anonymous functions and closures, and perform
  lock analysis across these invocations. This does not apply to closures that
  are passed elsewhere, since it is not possible to know the context in which
  they might be invoked.
* The ability to annotate return values in addition to receivers and other
  parameters, with the same complex structures noted above.
* Ignoring locking semantics for "fresh" objects, i.e. objects that are
  allocated in the local frame (typically a new-style function).
* Sanity checking of locking state across block transitions and returns, to
  ensure that no unexpected locks are held.

Note that initially, most of these findings are excluded by a comprehensive
nogo.yaml. The findings that are included are fundamental lock violations.
The changes here should be relatively low risk: minor refactorings that either
add necessary annotations or simplify the code structure (in general,
removing closures in favor of methods) so that the analyzer can easily
track the lock state.

This change additionally includes two changes to nogo itself:
* Sanity checking of all types to ensure that the binary and ast-derived
  types have a consistent objectpath, to prevent the bug above from occurring
  silently (and causing much confusion). This also requires a trick in
  order to ensure that serialized facts are consumable downstream. This can
  be removed once https://go-review.googlesource.com/c/tools/+/331789 is merged.
* A minor refactoring to isolate the objdump settings in their own package.
  This was originally used to implement the sanity check above, but this
  information is now being passed another way. The minor refactor is preserved
  however, since it cleans up the code slightly and is minimal risk.

PiperOrigin-RevId: 382613300
2021-07-01 15:07:56 -07:00
Zach Koopmans 590b8d3e99 [syserror] Update several syserror errors to linuxerr equivalents.
Update/remove most syserror errors to linuxerr equivalents. For the list
of removed errors, see //pkg/syserror/syserror.go.

PiperOrigin-RevId: 382574582
2021-07-01 12:05:19 -07:00
Lucas Manning 90dbb4b0c7 Add SIOCGIFFLAGS ioctl support to hostinet.
PiperOrigin-RevId: 382194711
2021-06-29 17:01:11 -07:00
Ian Lewis 2d899a843b Exit early with error message on checkpoint/pause w/ hostinet.
PiperOrigin-RevId: 381964660
2021-06-28 16:02:29 -07:00
gVisor bot e5526f4f26 Merge pull request #6222 from avagin:stop
PiperOrigin-RevId: 381561785
2021-06-25 15:43:17 -07:00
Zach Koopmans e1dc1c78e7 [syserror] Add conversions to linuxerr with temporary Equals method.
Add Equals method to compare syserror and unix.Errno errors to linuxerr errors.
This will facilitate removal of syserror definitions in a followup, and
finding needed conversions from unix.Errno to linuxerr.

PiperOrigin-RevId: 380909667
2021-06-22 15:53:32 -07:00
Andrei Vagin d703340bc0 runsc: don't kill sandbox, let it stop properly
The typical sequence of calls to start a container looks like this:

ct, err := container.New(conf, containerArgs)
defer ct.Destroy()
ct.Start(conf)
ws, err := ct.Wait()

For the root container, ct.Destroy() kills the sandbox process. This
doesn't look like the right way to stop it. For example, all ongoing RPC
calls are aborted in this case. If everything is going well, we can
just wait for the sandbox to exit on its own.

Reported-by: syzbot+084fca334720887441e7@syzkaller.appspotmail.com
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2021-06-22 11:01:31 -07:00
Tamir Duberstein 3cf1644a3b Move tcpip.Clock impl to Timekeeper
...and pass it explicitly.

This reverts commit b63e61828d.

PiperOrigin-RevId: 380039167
2021-06-17 14:02:05 -07:00
Fabricio Voznika d81fcbf85c Set RLimits during `runsc exec`
PiperOrigin-RevId: 378726430
2021-06-10 13:55:10 -07:00
Fabricio Voznika 8d426b7381 Parse mmap protection and flags in strace
PiperOrigin-RevId: 378712518
2021-06-10 12:51:43 -07:00
Ayush Ranjan 9ede1a6058 [op] Move SignalInfo to abi/linux package.
Fixes #214

PiperOrigin-RevId: 378680466
2021-06-10 10:26:36 -07:00
gVisor bot d3ebc2db68 remove the erroneous (5th) filter argument to sendmmsg.
PiperOrigin-RevId: 378677167
2021-06-10 10:13:45 -07:00
Fabricio Voznika 1ca981f50f Remove --overlayfs-stale-read flag
It defaults to true, and setting it to false can cause filesystem corruption.

PiperOrigin-RevId: 378518663
2021-06-09 15:53:44 -07:00
Fabricio Voznika 86cf56eb71 Add additional mmap seccomp rule
HostFileMapper.RegenerateMappings calls mmap with
MAP_SHARED|MAP_FIXED and these were not allowed.

Closes #6116

PiperOrigin-RevId: 377428463
2021-06-03 20:07:55 -07:00
Tamir Duberstein 758713f4c1 Initialize metrics at init
Avoids a race condition at kernel initialization.

Updates #6057.

PiperOrigin-RevId: 377357723
2021-06-03 13:18:43 -07:00
Ian Lewis 4f37469981 Update comments on ambient caps to point to bug
PiperOrigin-RevId: 376747671
2021-05-31 20:02:43 -07:00
Tamir Duberstein 097efe81a1 Use the stack RNG everywhere
...except in tests.

Note this replaces some uses of a cryptographic RNG with a plain RNG.

PiperOrigin-RevId: 376070666
2021-05-26 18:15:43 -07:00
Tamir Duberstein b63e61828d Initialize Kernel.Timekeeper before network NS
PiperOrigin-RevId: 375843579
2021-05-25 18:57:38 -07:00
Tamir Duberstein a54cb9d8a2 Use specific fmt verbs (avoid %v)
Remove useless conversions. Avoid unhandled errors.

PiperOrigin-RevId: 375834275
2021-05-25 17:48:34 -07:00
Fabricio Voznika ec542dbedf Suppress log message when there is no error
PiperOrigin-RevId: 374981100
2021-05-20 17:14:19 -07:00
Dean Deng 894187b2c6 Resolve remaining O_PATH TODOs.
O_PATH is now implemented in vfs2.

Fixes #2782.

PiperOrigin-RevId: 373861410
2021-05-14 14:04:46 -07:00
gVisor bot 3894c9fcb9 Merge pull request #5983 from btw616:fix/issue-5982
PiperOrigin-RevId: 373661350
2021-05-13 14:50:03 -07:00
Fabricio Voznika f3478b7516 Fix problem with grouped cgroups
cgroup controllers can be grouped together (e.g. cpu,cpuacct), and
that was confusing Cgroup.Install() into thinking that a cgroup
directory was created by the caller, when it had been created by
another controller that is grouped together.

PiperOrigin-RevId: 373661336
2021-05-13 14:44:08 -07:00
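One way to picture the fix: when controllers are grouped as `cpu,cpuacct`, both resolve to the same directory, so the second controller finds it already present. The sketch below remembers which directories were created during the current install. A directory made by a grouped sibling is then not mistaken for one the caller created. The `ownership` helper and its inputs are hypothetical and are not runsc's `Cgroup.Install()`.

```go
package main

// ownership (hypothetical helper) decides, for each controller, whether the
// current install owns its directory. dirs maps controller -> directory;
// preexisting marks directories that existed before the install began.
func ownership(dirs map[string]string, preexisting map[string]bool) map[string]bool {
	created := make(map[string]bool) // directories created by this install
	owned := make(map[string]bool)
	for ctrl, dir := range dirs {
		switch {
		case created[dir]:
			// Already made by a grouped sibling during this install.
			owned[ctrl] = true
		case preexisting[dir]:
			// Genuinely provided by the caller; leave it alone on cleanup.
			owned[ctrl] = false
		default:
			// A real implementation would mkdir here.
			created[dir] = true
			owned[ctrl] = true
		}
	}
	return owned
}
```

Keying the bookkeeping by directory rather than by controller is what makes grouped hierarchies behave like single ones.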
Tiwei Bie ddaa36bde5 Fix file descriptor leak in MultiGetAttr
We need to make sure that all children are closed before
returning. However, the last child, saved in parent, was not closed
after we successfully iterated over all the files in "names".
This patch fixes this issue.

Fixes #5982

Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
2021-05-13 09:08:20 +08:00
gVisor bot 6c349c675c Merge pull request #5764 from zhlhahaha:2126-2
PiperOrigin-RevId: 372993341
2021-05-10 12:59:03 -07:00
gVisor bot e691004e0c Merge pull request #5758 from zhlhahaha:2125
PiperOrigin-RevId: 372608247
2021-05-07 12:39:14 -07:00
howard zhang 0bff4afd0f Init all vCPU when initializing machine on ARM64
This patch solves a problem where the vCPU timer gets messed up when
vCPUs are added dynamically on ARM64. For detailed information,
please refer to:
https://github.com/google/gvisor/issues/5739

There is no impact on x86. The main changes for ARM64 are:
1. Create maxVCPUs vCPUs during machine initialization.
2. To keep the gVisor vCPU count in sync with the host CPU count,
use the smaller of runtime.NumCPU and KVM_CAP_MAX_VCPUS as
maxVCPUs.
3. Put unused vCPUs into the architecture-specific map initialvCPUs.
4. When the machine needs to bind a new vCPU to a tid, rather
than creating a new one, it picks a vCPU from the map initialvCPUs.
5. Change the setSystemTime function. As the number of vCPUs increases,
the time cost of setTSC (which uses a syscall to set cntvoff) grows
linearly from around 300 ns to 100000 ns, which prevents
setSystemTimeLegacy from getting the correct offset value.
6. Initialize StdioFDs and goferFD before the platform to avoid
StdioFDs conflicting with vCPU FDs.

Signed-off-by: howard zhang <howard.zhang@arm.com>
2021-05-07 16:42:58 +08:00
Fabricio Voznika 9f33fe64f2 Fixes to runsc cgroups
When loading cgroups for another process, `/proc/self` was used in
a few places, causing the end state to be a mix of the process
and self. This is now fixed to always use the proper `/proc/[pid]`
path.

Added net_prio and net_cls to the list of optional controllers. This
is to allow runsc to execute when these cgroups are disabled, as long
as there are no net_prio and net_cls limits that need to be applied.

Deflake TestMultiContainerEvent.

Closes #5875
Closes #5887

PiperOrigin-RevId: 372242687
2021-05-05 17:39:29 -07:00
Rahat Mahmood e00bd82816 Remove uses of the binary package from the rest of the sentry.
PiperOrigin-RevId: 372020696
2021-05-04 16:41:08 -07:00
Fabricio Voznika 95df852bf2 Make Mount.Type optional for bind mounts
According to the OCI spec Mount.Type is an optional field and it
defaults to "bind" when any of "bind" or "rbind" is included in
Mount.Options.

Also fix the shim to remove bind/rbind from options when a mount is
converted from bind to tmpfs inside the Sentry.

Fixes #2330
Fixes #3274

PiperOrigin-RevId: 371996891
2021-05-04 14:36:06 -07:00
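The defaulting rule can be expressed as a small predicate. The `Mount` struct below mirrors only the OCI runtime-spec mount fields relevant here; the real Go type lives in the runtime-spec `specs-go` package. The helper names are hypothetical and are not the shim's or runsc's actual functions. The sketch treats an explicit non-bind `Type` as authoritative, which is an assumption about how conflicting fields are resolved.

```go
package main

// Mount mirrors the relevant fields of an OCI runtime-spec mount entry.
type Mount struct {
	Destination string
	Type        string
	Source      string
	Options     []string
}

// isBindMount reports whether m is a bind mount: either Type says so, or
// Type is empty and the options include "bind" or "rbind".
func isBindMount(m Mount) bool {
	if m.Type == "bind" {
		return true
	}
	if m.Type != "" {
		return false
	}
	for _, o := range m.Options {
		if o == "bind" || o == "rbind" {
			return true
		}
	}
	return false
}

// stripBindOptions drops "bind" and "rbind", which no longer apply once a
// bind mount has been converted to tmpfs inside the Sentry.
func stripBindOptions(opts []string) []string {
	var out []string
	for _, o := range opts {
		if o != "bind" && o != "rbind" {
			out = append(out, o)
		}
	}
	return out
}
```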
Fabricio Voznika 26adb3c474 Automated rollback of changelist 369686285
PiperOrigin-RevId: 371015541
2021-04-28 17:02:33 -07:00
Nayana Bidari 0a6eaed50b Add weirdness sentry metric.
The weirdness metric contains fields that track the number of clock
fallbacks, partial results, and vsyscalls. This metric avoids the
overhead of having three different metrics (fallbackMetric,
partialResultMetric, vsyscallCount).

PiperOrigin-RevId: 369970218
2021-04-22 16:07:15 -07:00
Michael Pratt c2955339d8 Automated rollback of changelist 369325957
PiperOrigin-RevId: 369686285
2021-04-21 10:41:28 -07:00