Commit Graph

5727 Commits

Author SHA1 Message Date
Zyad A. Ali 7eae6402c1 Implement Registry.FindOrCreate.
FindOrCreate implements the behaviour of msgget(2).

Updates #135
2021-07-13 22:12:02 +02:00
Zyad A. Ali 7c488fcfe8 Create package msgqueue.
Create package msgqueue, define primitives to be used for message
queues, and add a msgqueue.Registry to IPCNamespace.

Updates #135
2021-07-13 22:12:02 +02:00
Zyad A. Ali c8851be593 Add initial test cases for msgget(2).
Updates #135
2021-07-13 22:12:02 +02:00
Zyad A. Ali 44c8766d2e Add abi definitions for sysv message queues.
Updates #135
2021-07-13 22:12:02 +02:00
Zyad A. Ali 35a1ff8d39 Create ipc.Registry.
Create ipc.Registry to hold fields, and define functionality common to
all SysV registries, and have registries use it.
2021-07-13 22:12:02 +02:00
Zyad A. Ali 7a73169229 Create ipc package and ipc.Object.
Create ipc.Object to define fields and functionality used in SysV
mechanisms, and have them use it.
2021-07-13 22:09:41 +02:00
Fabricio Voznika c16e69a9d5 Use consistent naming for subcontainers
It was confusing to find functions relating to root and non-root
containers. Replace "non-root" with "subcontainer" and make naming
consistent in Sandbox and controller.

PiperOrigin-RevId: 384512518
2021-07-13 11:36:13 -07:00
Kevin Krakauer 1fe6db8c54 netstack: atomically update buffer sizes
Previously, two calls to set the send or receive buffer size could have raced
and left state wherein:
- The actual size depended on one call
- The value returned by getsockopt() depended on the other

PiperOrigin-RevId: 384508720
2021-07-13 11:20:54 -07:00
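
The race described in 1fe6db8c54 can be illustrated with a minimal Go sketch. This is not the netstack code: the bufferSize type and its fields are hypothetical, and a mutex is only one way to make the update atomic. The point is that the value the endpoint uses and the value reported by getsockopt() are written together, so two concurrent setters cannot leave them disagreeing.

```go
package main

import (
	"fmt"
	"sync"
)

// bufferSize is a hypothetical stand-in for a socket's send or receive
// buffer size. It is not a type from //pkg/tcpip.
type bufferSize struct {
	mu       sync.Mutex
	applied  int64 // size the endpoint actually uses
	reported int64 // size returned by getsockopt()
}

// Set updates both values under one lock, so a concurrent Set cannot
// interleave between the two writes.
func (b *bufferSize) Set(v int64) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.applied = v
	b.reported = v
}

// Get returns both values as one consistent snapshot.
func (b *bufferSize) Get() (applied, reported int64) {
	b.mu.Lock()
	defer b.mu.Unlock()
	return b.applied, b.reported
}

func main() {
	var b bufferSize
	var wg sync.WaitGroup
	for _, v := range []int64{4096, 8192} {
		wg.Add(1)
		go func(v int64) {
			defer wg.Done()
			b.Set(v)
		}(v)
	}
	wg.Wait()

	applied, reported := b.Get()
	// Whichever call ran last, the two values agree.
	fmt.Println(applied == reported)
}
```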
Ghanan Gowripalan b4caeaf78f Deflake TestRouterSolicitation
Before this change, transmission of the first router solicitation races
with the adding of an IPv6 link-local address. This change creates the
NIC in the disabled state and enables it only after the address is added
(if required) to avoid this race.

PiperOrigin-RevId: 384493553
2021-07-13 10:22:05 -07:00
Kevin Krakauer e35d20f79c netstack: move SO_SNDBUF/RCVBUF clamping logic out of //pkg/tcpip
- Keeps Linux-specific behavior out of //pkg/tcpip
- Makes it clearer that clamping is done only for setsockopt calls from users
- Removes code duplication

PiperOrigin-RevId: 384389809
2021-07-12 22:37:11 -07:00
Fabricio Voznika 520795aaad Fix deadlock in procfs
Kernfs provides an internal mechanism to defer calls to `DecRef()` because
on the last reference `Filesystem.mu` must be held and most places that
need to call `DecRef()` are inside the lock. The same can be true for
filesystems that extend kernfs. procfs needs to look up files and `DecRef()`
them inside the `kernfs.Filesystem.mu`. If the files happen to be procfs
files, it can deadlock trying to decrement if it's the last reference.
This change extends the mechanism to external callers to defer DecRefs
to `vfs.FileDescription` and `vfs.VirtualDentries`.

PiperOrigin-RevId: 384361647
2021-07-12 18:30:46 -07:00
Adin Scannell 275932bf08 Drop dedicated benchmark lifecycle.
Instead, roll the output scraping into the main runner. Pass a perf flag to
the runner in order to control leak checking, apply tags via the macro and
appropriately disable logging. This may be removed in the future.

PiperOrigin-RevId: 384348035
2021-07-12 17:00:51 -07:00
Fabricio Voznika f51e0486d4 Fix stdios ownership
Set stdio ownership based on the container's user to ensure the
user can open/read/write to/from stdios.

1. stdios in the host are changed to have the owner be the same
uid/gid of the process running the sandbox. This ensures that the
sandbox has full control over it.
2. stdios owner inside the sandbox is changed to match the
container's user to give access inside the container and make it
behave the same as runc.

Fixes #6180

PiperOrigin-RevId: 384347009
2021-07-12 16:55:40 -07:00
Fabricio Voznika 7132b9a07b Fix GoLand analyzer errors under runsc/...
PiperOrigin-RevId: 384344990
2021-07-12 16:45:33 -07:00
Zach Koopmans e3fdd15932 [syserror] Update syserror to linuxerr for more errors.
Update the following from syserror to the linuxerr equivalent:
EEXIST
EFAULT
ENOTDIR
ENOTTY
EOPNOTSUPP
ERANGE
ESRCH

PiperOrigin-RevId: 384329869
2021-07-12 15:26:20 -07:00
Andrei Vagin ebe99977a4 Mark all functions that are called from a forked child with go:norace
PiperOrigin-RevId: 384305599
2021-07-12 13:34:03 -07:00
Jamie Liu 9c09db654e Fix async-signal-unsafety in chroot test.
PiperOrigin-RevId: 384295543
2021-07-12 12:49:48 -07:00
Tamir Duberstein 4742f7d788 Prevent interleaving in sniffer pcap output
Remove "partial write" handling as io.Writer.Write is not permitted to
return a nil error on partial writes, and this code was already
panicking on non-nil errors.

PiperOrigin-RevId: 384289970
2021-07-12 12:24:17 -07:00
Adin Scannell 1f396d8c16 Prevent the cleanup script from destroying any "bootstrap" containers.
PiperOrigin-RevId: 384257460
2021-07-12 10:03:04 -07:00
Michael Pratt 36a17a814b Go 1.17 support for the KVM platform
Go 1.17 adds a new register-based calling convention. While transparent for
most applications, the KVM platform needs special work in a few cases.

First of all, we need the actual address of some assembly functions, rather
than the address of a wrapper. See http://gvisor.dev/pr/5832 for complete
discussion of this.

More relevant to this CL is that ABI0-to-ABIInternal wrappers (i.e., calls from
assembly to Go) access the G via FS_BASE. The KVM platform is quite fast-and-loose about
the Go environment, often calling into (nosplit) Go functions with
uninitialized FS_BASE.

That will no longer work in Go 1.17, so this CL changes the platform to
consistently restore FS_BASE before calling into Go code.

This CL does not affect arm64 code. Go 1.17 does not support the register-based
calling convention for arm64 (it will come in 1.18), but arm64 also does not
use a non-standard register like FS_BASE for TLS, so it may not require any
changes.

PiperOrigin-RevId: 384234305
2021-07-12 08:01:53 -07:00
Adin Scannell d78713e2da Drop unnecessary checklocksignore.
PiperOrigin-RevId: 383940663
2021-07-09 16:00:25 -07:00
Jamie Liu de29d8d415 Fix some //pkg/seccomp bugs.
- LockOSThread() around prctl(PR_SET_NO_NEW_PRIVS) => seccomp(). go:nosplit
  "mostly" prevents async preemption, but IIUC preemption is still permitted
  during function prologues:

funcpctab "".seccomp [valfunc=pctopcdata]
     0     -1 00000 (gvisor/pkg/seccomp/seccomp_unsafe.go:110)	TEXT	"".seccomp(SB), NOSPLIT|ABIInternal, $72-32
     0        00000 (gvisor/pkg/seccomp/seccomp_unsafe.go:110)	TEXT	"".seccomp(SB), NOSPLIT|ABIInternal, $72-32
     0     -1 00000 (gvisor/pkg/seccomp/seccomp_unsafe.go:110)	SUBQ	$72, SP
     4        00004 (gvisor/pkg/seccomp/seccomp_unsafe.go:110)	MOVQ	BP, 64(SP)
     9        00009 (gvisor/pkg/seccomp/seccomp_unsafe.go:110)	LEAQ	64(SP), BP
     e        00014 (gvisor/pkg/seccomp/seccomp_unsafe.go:110)	FUNCDATA	$0, gclocals·ba30782f8935b28ed1adaec603e72627(SB)
     e        00014 (gvisor/pkg/seccomp/seccomp_unsafe.go:110)	FUNCDATA	$1, gclocals·663f8c6bfa83aa777198789ce63d9ab4(SB)
     e        00014 (gvisor/pkg/seccomp/seccomp_unsafe.go:110)	FUNCDATA	$2, "".seccomp.stkobj(SB)
     e        00014 (gvisor/pkg/seccomp/seccomp_unsafe.go:111)	PCDATA	$0, $-2
     e     -2 00014 (gvisor/pkg/seccomp/seccomp_unsafe.go:111)	MOVQ	"".ptr+88(SP), AX

(-1 is objabi.PCDATA_UnsafePointSafe and -2 is objabi.PCDATA_UnsafePointUnsafe,
from Go's cmd/internal/objabi.)

- Handle non-errno failures from seccomp() with SECCOMP_FILTER_FLAG_TSYNC.

PiperOrigin-RevId: 383757580
2021-07-08 18:59:01 -07:00
Kevin Krakauer f8207a8233 clarify safemount behavior
PiperOrigin-RevId: 383750666
2021-07-08 17:56:11 -07:00
Jamie Liu 052eb90dc1 Replace kernel.ExitStatus with linux.WaitStatus.
PiperOrigin-RevId: 383705129
2021-07-08 13:39:15 -07:00
Jamie Liu fbd4ccf333 Fix async-signal-unsafety in socket test.
PiperOrigin-RevId: 383689096
2021-07-08 12:26:56 -07:00
Etienne Perot 07f2c8b56b devpts: Notify of echo'd input queue bytes only after locks have been released.
PiperOrigin-RevId: 383684320
2021-07-08 12:04:56 -07:00
Bhasker Hariharan 1fc7a9eac2 Do not queue zero sized segments.
Commit 16b751b6c6 introduced a bug where writes of
zero size would end up queueing a zero sized segment, which causes the
sandbox to panic when trying to send that segment (e.g. after an RTO), as
netstack asserts that all non-FIN segments have size > 0.

This change adds the check for a zero sized payload back to avoid queueing
such segments. The associated test panics without the fix and passes with it.

PiperOrigin-RevId: 383677884
2021-07-08 11:35:18 -07:00
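
The check restored by 1fc7a9eac2 can be sketched in a few lines of Go. The sendQueue type below is a hypothetical simplification and not the netstack TCP endpoint; it only shows the guard that keeps zero sized payloads out of the queue.

```go
package main

import "fmt"

// sendQueue is a hypothetical, simplified stand-in for a TCP send queue.
type sendQueue struct {
	segments [][]byte
}

// Queue appends payload as a new segment and reports whether it was queued.
// Zero sized payloads are rejected so that no empty segment is ever sent.
func (q *sendQueue) Queue(payload []byte) bool {
	if len(payload) == 0 {
		return false
	}
	// Copy the payload so later changes by the caller do not affect the queue.
	q.segments = append(q.segments, append([]byte(nil), payload...))
	return true
}

func main() {
	var q sendQueue
	queuedEmpty := q.Queue(nil)
	queuedData := q.Queue([]byte("data"))
	fmt.Println(queuedEmpty, queuedData, len(q.segments))
}
```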
Tamir Duberstein 02fec8dba5 Move time.Now() call to sniffer
PiperOrigin-RevId: 383481745
2021-07-07 13:30:31 -07:00
Etienne Perot cd558fcb05 Sentry: Measure the time it takes to initialize the Sentry.
PiperOrigin-RevId: 383472507
2021-07-07 12:48:24 -07:00
Tamir Duberstein b63631b46c Use time package-level variable
PiperOrigin-RevId: 383426091
2021-07-07 09:15:41 -07:00
Ayush Ranjan add8bca5ba [op] Make TCPNonBlockingConnectClose more reasonable.
This test single-handedly causes the syscalls:socket_inet_loopback_test test
variants to take more than an hour to run on some of our testing environments.

Reduce how aggressively this test tries to replicate a fixed flake. This is a
regression test.

PiperOrigin-RevId: 382849039
2021-07-02 18:47:48 -07:00
Kevin Krakauer 3d32a05a35 runsc: validate mount targets
PiperOrigin-RevId: 382845950
2021-07-02 18:15:59 -07:00
gVisor bot fcf0ff2fc1 Merge pull request #6258 from liornm:fix-iptables-input-interface
PiperOrigin-RevId: 382788878
2021-07-02 12:18:14 -07:00
Ghanan Gowripalan a51a4b872e Discover more specific routes as per RFC 4191
More-specific route discovery allows hosts to pick a more appropriate
router for off-link destinations.

Fixes #6172.

PiperOrigin-RevId: 382779880
2021-07-02 11:31:42 -07:00
Adin Scannell 16b751b6c6 Mix checklocks and atomic analyzers.
This change makes the checklocks analyzer considerably more powerful, adding:
* The ability to traverse complex structures, e.g. to have multiple nested
  fields as part of the annotation.
* The ability to resolve simple anonymous functions and closures, and perform
  lock analysis across these invocations. This does not apply to closures that
  are passed elsewhere, since it is not possible to know the context in which
  they might be invoked.
* The ability to annotate return values in addition to receivers and other
  parameters, with the same complex structures noted above.
* Ignoring locking semantics for "fresh" objects, i.e. objects that are
  allocated in the local frame (typically a new-style function).
* Sanity checking of locking state across block transitions and returns, to
  ensure that no unexpected locks are held.

Note that initially, most of these findings are excluded by a comprehensive
nogo.yaml. The findings that are included are fundamental lock violations.
The changes here should be relatively low risk: minor refactorings to either
include necessary annotations or simplify the code structure (in general
removing closures in favor of methods) so that the analyzer can easily
track the lock state.

This change additionally includes two changes to nogo itself:
* Sanity checking of all types to ensure that the binary and ast-derived
  types have a consistent objectpath, to prevent the bug above from occurring
  silently (and causing much confusion). This also requires a trick in
  order to ensure that serialized facts are consumable downstream. This can
  be removed once https://go-review.googlesource.com/c/tools/+/331789 is merged.
* A minor refactoring to isolate the objdump settings in its own package.
  This was originally used to implement the sanity check above, but this
  information is now being passed another way. The minor refactor is preserved
  however, since it cleans up the code slightly and is minimal risk.

PiperOrigin-RevId: 382613300
2021-07-01 15:07:56 -07:00
Bhasker Hariharan 570ca57180 Fix bug with TCP bind w/ SO_REUSEADDR.
In gVisor today it's possible that binding a TCP socket w/ SO_REUSEADDR
specified, while requesting that the kernel pick a port by setting the
port to zero, results in a previously bound port being returned. This
behaviour is incorrect as the user is clearly requesting a free port.
The behaviour is fine when the user explicitly specifies a port.

This change now checks if the user specified a port when making a port
reservation for a TCP port and only returns unbound ports even if
SO_REUSEADDR was specified.

Fixes #6209

PiperOrigin-RevId: 382607638
2021-07-01 14:42:00 -07:00
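
The rule introduced by 570ca57180 can be sketched in Go as below. The portManager type, its fields and its error values are hypothetical, and the SO_REUSEADDR model is deliberately simplified (real semantics also depend on socket state and bound addresses). The sketch only shows the distinction the commit draws: an explicit port may be shared when both sides set SO_REUSEADDR, while a request for port zero only ever receives an unbound port.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	errPortInUse       = errors.New("port in use")
	errNoPortAvailable = errors.New("no port available")
)

// portManager is a hypothetical, simplified port table.
type portManager struct {
	first, last uint16
	// bound maps a reserved port to whether it was bound with SO_REUSEADDR.
	bound map[uint16]bool
}

// Reserve reserves a TCP port. A non-zero port is an explicit request and may
// share a port when both the existing and the new binding set SO_REUSEADDR.
// Port zero asks for any free port, so only unbound ports are returned even
// if reuseAddr is set.
func (m *portManager) Reserve(port uint16, reuseAddr bool) (uint16, error) {
	if port != 0 {
		reusable, inUse := m.bound[port]
		if inUse && !(reusable && reuseAddr) {
			return 0, errPortInUse
		}
		m.bound[port] = reuseAddr
		return port, nil
	}
	// Iterate with int to avoid uint16 wrap-around when last is 65535.
	for p := int(m.first); p <= int(m.last); p++ {
		if _, inUse := m.bound[uint16(p)]; !inUse {
			m.bound[uint16(p)] = reuseAddr
			return uint16(p), nil
		}
	}
	return 0, errNoPortAvailable
}

func main() {
	m := &portManager{first: 40000, last: 40001, bound: map[uint16]bool{40000: true}}
	picked, err := m.Reserve(0, true)
	fmt.Println(picked, err) // 40000 is already bound, so 40001 is picked
}
```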
Fabricio Voznika 3d4a8824f8 Strace: handle null paths
PiperOrigin-RevId: 382603592
2021-07-01 14:23:01 -07:00
Zach Koopmans 590b8d3e99 [syserror] Update several syserror errors to linuxerr equivalents.
Update/remove most syserror errors to linuxerr equivalents. For list
of removed errors, see //pkg/syserror/syserror.go.

PiperOrigin-RevId: 382574582
2021-07-01 12:05:19 -07:00
Ghanan Gowripalan 07ffecef83 Implement fmt.Stringer for NDPRoutePreference
PiperOrigin-RevId: 382427879
2021-06-30 18:33:49 -07:00
Zach Koopmans 6ef2684096 [syserror] Update syserror to linuxerr for EACCES, EBADF, and EPERM.
Update all instances of the above errors to the faster linuxerr implementation.
With the temporary linuxerr.Equals(), no logical changes are made.

PiperOrigin-RevId: 382306655
2021-06-30 08:18:59 -07:00
Ghanan Gowripalan 66a79461a2 Support parsing NDP Route Information option
This change prepares for a later change which supports the NDP
Route Information option to discover more-specific routes, as
per RFC 4191.

Updates #6172.

PiperOrigin-RevId: 382225812
2021-06-29 21:32:09 -07:00
gVisor bot 3e5a6981d6 Merge pull request #6085 from liornm:fix-tun-no_pi
PiperOrigin-RevId: 382202462
2021-06-29 17:54:17 -07:00
Chong Cai 57095bd3bd Sort children map before hash
Go map iteration order is unspecified, so hashing the children map directly
may produce a different hash each time. The children map needs to be sorted
before hashing to avoid false verification failures.

Store the sorted children map in verity dentry to avoid sorting it each
time verification happens.

Also serialize the whole VerityDescriptor struct to hash now that the
map is removed from it.

PiperOrigin-RevId: 382201560
2021-06-29 17:44:53 -07:00
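
The idea behind 57095bd3bd can be shown with a small Go sketch. The hashChildren function is hypothetical and is not the verity implementation; the choice of SHA-256 and of a NUL separator between names are assumptions made for the example. It shows why sorting is needed: the hash depends only on the set of names, not on map iteration order.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// hashChildren is a hypothetical helper that hashes a set of child names
// deterministically by sorting them before feeding them to the hash.
func hashChildren(children map[string]struct{}) string {
	names := make([]string, 0, len(children))
	for name := range children {
		names = append(names, name)
	}
	sort.Strings(names)

	h := sha256.New()
	for _, name := range names {
		h.Write([]byte(name))
		h.Write([]byte{0}) // separator so {"ab","c"} differs from {"a","bc"}
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	a := map[string]struct{}{"bin": {}, "etc": {}, "usr": {}}
	b := map[string]struct{}{"usr": {}, "bin": {}, "etc": {}}
	fmt.Println(hashChildren(a) == hashChildren(b)) // prints "true"
}
```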
Lucas Manning 90dbb4b0c7 Add SIOCGIFFLAGS ioctl support to hostinet.
PiperOrigin-RevId: 382194711
2021-06-29 17:01:11 -07:00
Zach Koopmans 54b71221c0 [syserror] Change syserror to linuxerr for E2BIG, EADDRINUSE, and EINVAL
Remove three syserror entries duplicated in linuxerr. Because of the
linuxerr.Equals method, this is a mere change of return values from
syserror to linuxerr definitions.

Only these three errnos are converted here because a CL removing all of them
would grow too large.

PiperOrigin-RevId: 382173835
2021-06-29 15:08:46 -07:00
Fabricio Voznika d205926f23 Delete PID files right after they are read
The PID files are not used after they are read, so there is
no point in keeping them around until the shim is deleted.

Updates #6225

PiperOrigin-RevId: 382169916
2021-06-29 14:49:33 -07:00
Fabricio Voznika 5f2b3728fc Redirect all calls from `errdefs.ToGRPC` to `utils.ErrToGRPC`
This is to ensure that Go 1.13 error wrapping is correctly
translated to gRPC errors before returning from the shim.

Updates #6225

PiperOrigin-RevId: 382120441
2021-06-29 10:56:17 -07:00
liornm e8bc632d07 Fix iptables List entries Input interface field
In Linux, the list entries command returns the name of the input interface assigned to the iptables rule.
iptables -S
 > -A FORWARD -i docker0 -o docker0 -j ACCEPT

Meanwhile, in gVisor this interface name is ignored.
iptables -S
 > -A FORWARD -o docker0 -j ACCEPT
2021-06-29 15:13:07 +03:00
liornm ddbc273659 Fix TUN IFF_NO_PI bug
When TUN is created with the IFF_NO_PI flag, there will be no Ethernet header and no packet info, so both read and write will fail.

This commit fixes this bug.
2021-06-29 10:51:58 +03:00
Jamie Liu 5b2afd24a7 Allow VFS2 gofer client to mmap from sentry page cache when forced.
PiperOrigin-RevId: 381982257
2021-06-28 17:43:23 -07:00