
4135 Commits

Author SHA1 Message Date
Ghanan Gowripalan 87c5c0ad25 Receive broadcast packets on interested endpoints
When a broadcast packet is received by the stack, the packet should be
delivered to each endpoint that may be interested in the packet. This
includes all listeners bound to the any address or to the specified broadcast
address.

Test: integration_test.TestReuseAddrAndBroadcast
PiperOrigin-RevId: 332060652
2020-09-16 12:20:45 -07:00
Fabricio Voznika 326a1dbb73 Refactor removed default test dimension
ptrace was always selected as a dimension before, but not
anymore. Some tests were specifying "overlay" expecting that
to be in addition to the default.

PiperOrigin-RevId: 332004111
2020-09-16 07:47:28 -07:00
Rahat Mahmood 9ef1c79922 Rename marshal.Task to marshal.CopyContext.
CopyContext is a better name for the interface because from
go-marshal's perspective, the interface has nothing to do with a
task. A kernel.Task happens to implement the interface, but so can
other things like MemoryManager and IO sequences.

PiperOrigin-RevId: 331959678
2020-09-16 02:10:12 -07:00
Rahat Mahmood d201feb8c5 Enable automated marshalling for the syscall package.
PiperOrigin-RevId: 331940975
2020-09-15 23:38:57 -07:00
Ian Lewis dcd532e2e4 Add support for OCI seccomp filters in the sandbox.
OCI configuration includes support for specifying seccomp filters. In runc,
these filter configurations are converted into seccomp BPF programs and loaded
into the kernel via libseccomp. Because runsc needs to be a static binary, it
cannot rely on a C library, so the functionality must be implemented in Go.

The generator added here implements basic support for taking OCI seccomp
configuration and converting it into a seccomp BPF program with the same
behavior as a program generated by libseccomp.

- New conditional operations were added to pkg/seccomp to support operations
  available in OCI.
- AllowAny and AllowValue were renamed to MatchAny and EqualTo to better reflect
  that syscalls matching the conditionals result in the provided action, not
  simply SCMP_RET_ALLOW.
- BuildProgram in pkg/seccomp no longer panics if provided an empty list of
  rules. It now builds a program with the architecture sanity check only.
- ProgramBuilder now allows adding labels that are unused. However, backwards
  jumps are still not permitted.

Fixes #510

PiperOrigin-RevId: 331938697
2020-09-15 23:19:17 -07:00
Ian Lewis c053c4bb03 Fix GitHub issue template.
runsc -v doesn't work; it should be runsc -version.

PiperOrigin-RevId: 331911035
2020-09-15 19:49:56 -07:00
Chong Cai cb2e3c946a Implement gvisor verity fs ioctl with GETFLAGS
PiperOrigin-RevId: 331905347
2020-09-15 19:01:59 -07:00
Jamie Liu 8b15effd9e Improve syserror_test.
- It's very difficult to prevent returnErrnoAsError and returnError from being
  optimized out. Instead, replace BenchmarkReturn* with BenchmarkAssign*, which
  store to globalError.

- Compare to a non-nil globalError in BenchmarkCompare* and BenchmarkSwitch*.

New results:
BenchmarkAssignErrno
BenchmarkAssignErrno-12     	1000000000	         0.615 ns/op
BenchmarkAssignError
BenchmarkAssignError-12     	1000000000	         0.626 ns/op
BenchmarkCompareErrno
BenchmarkCompareErrno-12    	1000000000	         0.522 ns/op
BenchmarkCompareError
BenchmarkCompareError-12    	1000000000	         3.54 ns/op
BenchmarkSwitchErrno
BenchmarkSwitchErrno-12     	1000000000	         1.45 ns/op
BenchmarkSwitchError
BenchmarkSwitchError-12     	536315757	        10.9 ns/op

PiperOrigin-RevId: 331875387
2020-09-15 15:59:25 -07:00
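The first bullet of 8b15effd9e (store the result in a package-level variable so the call cannot be discarded) can be sketched roughly as below. Only the names globalError and BenchmarkAssignError come from the commit message; errSentinel and the body of returnError are hypothetical stand-ins, and the real test may differ.

```go
package main

import (
	"errors"
	"fmt"
	"testing"
)

// globalError is a package-level sink. Storing each result here keeps the
// compiler from eliminating the call under test as dead code.
var globalError error

// errSentinel and the body of returnError are assumptions made for this
// sketch; they are not taken from syserror_test.
var errSentinel = errors.New("sentinel")

func returnError() error { return errSentinel }

// BenchmarkAssignError measures returning an error and storing it globally.
func BenchmarkAssignError(b *testing.B) {
	for i := 0; i < b.N; i++ {
		globalError = returnError()
	}
}

func main() {
	// testing.Benchmark runs a benchmark function outside of "go test".
	res := testing.Benchmark(BenchmarkAssignError)
	fmt.Println("iterations:", res.N, "stored:", globalError == errSentinel)
}
```

Because the loop body has an observable side effect (the store), the measured time reflects the assignment path rather than an empty loop.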
Jamie Liu 456c6c33e1 Invert dependency between the context and amutex packages.
This is to allow the syserror package to depend on the context package in a
future change.

PiperOrigin-RevId: 331866252
2020-09-15 15:14:53 -07:00
Dean Deng a004f0d082 Support setting STATX_SIZE for kernfs.InodeAttrs.
Make setting STATX_SIZE a no-op, if it is valid for the given permissions and
file type.

Also update proc tests, which were overfitted before.

Fixes #3842.
Updates #1193.

PiperOrigin-RevId: 331861087
2020-09-15 14:55:28 -07:00
Arthur Sfez 72a30b1148 Move reusable IPv4 test code into a testutil module and refactor it
The refactor aims to simplify the package, by replacing the Go channel with a
PacketBuffer slice.

This code will be reused by tests for IPv6 fragmentation.

PiperOrigin-RevId: 331860411
2020-09-15 14:49:29 -07:00
Nayana Bidari 7f89a26e18 Release FDTable lock before dropping the fds.
This is needed for SO_LINGER, where close() blocks for the linger timeout.
Holding the FDTable lock for the entire timeout prevents other fds from being
created or deleted, so we have to release the lock first and then drop the
fds.

PiperOrigin-RevId: 331844185
2020-09-15 13:43:50 -07:00
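The locking pattern described in 7f89a26e18 can be sketched as follows. This is a simplified illustration, not gVisor's FDTable: the fdTable and file types and the removeAll and drop names are all hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// file is a hypothetical stand-in for a file description whose release may
// block, for example a socket closing with SO_LINGER set.
type file struct {
	name    string
	dropped bool
}

// drop simulates a potentially slow release.
func (f *file) drop() { f.dropped = true }

// fdTable is a simplified, hypothetical table; it is not gVisor's FDTable.
type fdTable struct {
	mu    sync.Mutex
	files map[int32]*file
}

// removeAll detaches every file while holding the lock, then drops the files
// after unlocking so that a slow drop cannot block other table operations.
func (t *fdTable) removeAll() int {
	t.mu.Lock()
	pending := make([]*file, 0, len(t.files))
	for fd, f := range t.files {
		pending = append(pending, f)
		delete(t.files, fd)
	}
	t.mu.Unlock()

	for _, f := range pending {
		f.drop() // The table lock is not held here.
	}
	return len(pending)
}

func main() {
	tbl := &fdTable{files: map[int32]*file{3: {name: "a"}, 4: {name: "b"}}}
	fmt.Println("dropped:", tbl.removeAll(), "remaining:", len(tbl.files))
}
```

The table is only locked for the cheap bookkeeping step; the expensive step runs on a private slice that no other goroutine can see.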
Jamie Liu 0d790cbaea Read vfs2 epoll events atomically.
Discovered by ayushranjan@:

VFS2 was employing the following algorithm for fetching ready events from an
epoll instance:
- Create a statically sized EpollEvent slice on the stack of size 16.
- Pass that to EpollInstance.ReadEvents() to populate.
   - EpollInstance.ReadEvents() requeues level-triggered events that it returns
     back into the ready queue.
- Write the results to usermem.
- If the number of results was 16, call EpollInstance.ReadEvents() again in
  the hope of getting more. But this duplicates the requeued ready
  level-triggered events.

So if the ready queue has >= 16 ready events, the EpollWait for loop will spin
until it fills the usermem with `maxEvents` events.

Fixes #3521

PiperOrigin-RevId: 331840527
2020-09-15 13:25:58 -07:00
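The duplication described in 0d790cbaea can be reproduced with a small model. Everything here is a hypothetical simplification: instance, readEvents, waitBatched and waitOnce are illustrative names, events are plain ints, and every event is treated as level-triggered. waitOnce shows one way to avoid the duplicates (a single read sized to maxEvents); it is not necessarily how the commit implements the fix.

```go
package main

import "fmt"

// batch is the fixed buffer size used by the old algorithm.
const batch = 16

// instance is a hypothetical model of an epoll instance whose ready queue
// holds only level-triggered events, identified by int.
type instance struct {
	ready []int
}

// readEvents fills buf from the front of the ready queue and requeues each
// returned event at the back, as happens for level-triggered events.
func (e *instance) readEvents(buf []int) int {
	n := len(buf)
	if n > len(e.ready) {
		n = len(e.ready)
	}
	copy(buf, e.ready[:n])
	rotated := make([]int, 0, len(e.ready))
	rotated = append(rotated, e.ready[n:]...)
	rotated = append(rotated, e.ready[:n]...)
	e.ready = rotated
	return n
}

// waitBatched models the old loop: read up to 16 events at a time and call
// readEvents again whenever a batch comes back full.
func waitBatched(e *instance, maxEvents int) []int {
	var out []int
	var buf [batch]int
	for len(out) < maxEvents {
		want := maxEvents - len(out)
		if want > batch {
			want = batch
		}
		n := e.readEvents(buf[:want])
		out = append(out, buf[:n]...)
		if n < batch {
			break
		}
	}
	return out
}

// waitOnce reads up to maxEvents in a single call, so events requeued by
// that call cannot be returned a second time.
func waitOnce(e *instance, maxEvents int) []int {
	buf := make([]int, maxEvents)
	return buf[:e.readEvents(buf)]
}

func newInstance(n int) *instance {
	e := &instance{}
	for i := 0; i < n; i++ {
		e.ready = append(e.ready, i)
	}
	return e
}

func main() {
	fmt.Println("batched:", len(waitBatched(newInstance(16), 64)))
	fmt.Println("single read:", len(waitOnce(newInstance(16), 64)))
}
```

With 16 ready events and maxEvents of 64, the batched loop keeps seeing its own requeued events and fills all 64 slots, while the single read returns each ready event once.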
Jamie Liu 86b31a8077 RFC: design for a 9P replacement
Tentatively `lisafs` (LInux SAndbox FileSystem).

PiperOrigin-RevId: 331839246
2020-09-15 13:19:36 -07:00
gVisor bot 84d48c0fdd Merge pull request #3895 from btw616:fix/issue-3894
PiperOrigin-RevId: 331824411
2020-09-15 12:12:28 -07:00
Ghanan Gowripalan d3880b76cb Don't conclude broadcast from route destination
The routing table (in its current form) should not be used to make
decisions about whether a remote address is a broadcast address or
not (for IPv4).

Note, a destination subnet does not always map to a network.
E.g. RouterA may have a route to 192.168.0.0/22 through RouterB,
but RouterB may be configured with 4x /24 subnets on 4 different
interfaces.

See https://github.com/google/gvisor/issues/3938.

PiperOrigin-RevId: 331819868
2020-09-15 11:53:00 -07:00
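The RouterA/RouterB example in d3880b76cb can be checked with a little arithmetic. The sketch below computes the IPv4 broadcast address of a prefix using only the standard net package; the broadcast helper is a hypothetical name, and it assumes RouterB's four /24 subnets are 192.168.0.0/24 through 192.168.3.0/24, which the commit message does not spell out.

```go
package main

import (
	"fmt"
	"net"
)

// broadcast returns the IPv4 broadcast address of the given CIDR prefix,
// that is, the network address with all host bits set.
func broadcast(cidr string) (net.IP, error) {
	_, n, err := net.ParseCIDR(cidr)
	if err != nil {
		return nil, err
	}
	ip := n.IP.To4()
	if ip == nil || len(n.Mask) != net.IPv4len {
		return nil, fmt.Errorf("not an IPv4 prefix: %s", cidr)
	}
	out := make(net.IP, net.IPv4len)
	for i := range ip {
		out[i] = ip[i] | ^n.Mask[i]
	}
	return out, nil
}

func main() {
	for _, cidr := range []string{"192.168.0.0/22", "192.168.0.0/24"} {
		b, err := broadcast(cidr)
		if err != nil {
			fmt.Println(err)
			return
		}
		fmt.Println(cidr, "broadcast:", b)
	}
}
```

Judged by RouterA's /22 route alone, 192.168.0.255 looks like an ordinary host address, yet under the assumption above it is the broadcast address of one of RouterB's /24 subnets. This is why the route destination is not a reliable signal.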
Tiwei Bie 1adedad81c Fix proc.(*fdDir).IterDirents for VFS2
Currently the returned offset is an index, and we can't
use it to find the next fd to serialize, because getdents
should iterate correctly despite mutation of fds. Instead,
we can return the next fd to serialize plus 2 (which
accounts for "." and "..") as the offset.

Fixes: #3894

Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
2020-09-15 11:12:29 +08:00
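The offset scheme described in 1adedad81c (next fd to serialize plus 2) can be sketched as below. This is an illustrative model only: dirent and iterFDs are hypothetical names, and the real IterDirents works with gVisor's own types and callbacks.

```go
package main

import (
	"fmt"
	"sort"
)

// dirent is a hypothetical, simplified directory entry.
type dirent struct {
	name    string
	nextOff int64 // Offset at which iteration resumes after this entry.
}

// iterFDs returns up to max entries starting at offset. Offsets 0 and 1 are
// "." and ".."; an offset of k+2 means "resume at the first open fd >= k".
func iterFDs(open []int, offset int64, max int) []dirent {
	var out []dirent
	if offset == 0 && len(out) < max {
		out = append(out, dirent{name: ".", nextOff: 1})
		offset = 1
	}
	if offset == 1 && len(out) < max {
		out = append(out, dirent{name: "..", nextOff: 2})
		offset = 2
	}
	if offset < 2 {
		return out
	}
	sorted := append([]int(nil), open...)
	sort.Ints(sorted)
	for _, fd := range sorted {
		if len(out) >= max {
			break
		}
		if int64(fd) < offset-2 {
			continue // Already serialized by an earlier call.
		}
		out = append(out, dirent{name: fmt.Sprint(fd), nextOff: int64(fd) + 3})
	}
	return out
}

func main() {
	first := iterFDs([]int{0, 1, 2, 5, 7}, 0, 4)
	resume := first[len(first)-1].nextOff
	// fd 0 is closed between the two calls.
	for _, d := range iterFDs([]int{1, 2, 5, 7}, resume, 10) {
		fmt.Println(d.name)
	}
}
```

In the usage above, fd 0 is closed between the two calls. An index-based offset would now point one entry too far and skip fd 2, whereas the fd-based offset resumes at the first open fd that is at least 2.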
Fabricio Voznika 52ffeb2d64 Add note about gofer link(2) limitation
PiperOrigin-RevId: 331648296
2020-09-14 16:05:02 -07:00
Tamir Duberstein 2747030ec7 Store multicast memberships in a set
This is simpler and more performant.

PiperOrigin-RevId: 331639978
2020-09-14 15:22:00 -07:00
Mithun Iyer 05d2ebee5e Test RST handling in TIME_WAIT.
The gVisor stack ignores RSTs when in TIME_WAIT, which is not the default
Linux behavior. Add a packetimpact test for this behavior.
Also update code comments to reflect the rationale for the current
gVisor behavior.

PiperOrigin-RevId: 331629879
2020-09-14 14:33:53 -07:00
Jamie Liu 2969b17405 Correct FDSize in /proc/[pid]/status.
In Linux, FDSize is fs/proc/array.c:task_state() => struct fdtable::max_fds,
which is set to the underlying array's length in fs/file.c:alloc_fdtable().

Follow-up changes:

- Remove FDTable.GetRefs() and FDTable.GetRefsVFS2(), which are unused.

- Reset FDTable.used to 0 during restore, since the subsequent calls to
  FDTable.setAll() increment it again, causing its value to be doubled. (After
  this CL, FDTable.used is only used to avoid reallocation in FDTable.GetFDs(),
  so this fix is not very visible.)

PiperOrigin-RevId: 331588190
2020-09-14 11:34:50 -07:00
Kevin Krakauer 833ceb0f14 Fix modprobe dependency
The modprobe command only takes 1 module per invocation. The second module name
is being passed as a module parameter.

PiperOrigin-RevId: 331585765
2020-09-14 11:11:05 -07:00
Toshi Kikuchi b6ca96b9b9 Cap reassembled IPv6 packets at 65535 octets
IPv4 can accept 65536-octet reassembled packets.

Test:
- ipv4_test.TestInvalidFragments
- ipv4_test.TestReceiveFragments
- ipv6.TestInvalidIPv6Fragments
- ipv6.TestReceiveIPv6Fragments

Fixes #3770

PiperOrigin-RevId: 331382977
2020-09-12 23:21:27 -07:00
Rahat Mahmood 3ca73841d7 Move the 'marshal' and 'primitive' packages to the 'pkg' directory.
PiperOrigin-RevId: 331256608
2020-09-11 17:42:49 -07:00
Nicolas Lacasse 1f4fb817c8 Check that we have access to the trusted.* xattr namespace directly.
These operations require CAP_SYS_ADMIN in the root user namespace. There's no
easy way to check that other than trying the operation and seeing what happens.

PiperOrigin-RevId: 331242256
2020-09-11 16:10:12 -07:00
Amanda Tait 325f7036b0 Use correct test device name in Fuchsia packetimpact
Packetimpact on Fuchsia was formerly using the Linux test device name. This
change fixes that.

PiperOrigin-RevId: 331211518
2020-09-11 13:28:57 -07:00
Michael Pratt 490e5c83bd Make nogo more robust to variety of stdlib layouts.
PiperOrigin-RevId: 331206424
2020-09-11 13:07:30 -07:00
Jamie Liu 9a5635eb17 Implement copy-up-coherent mmap for VFS2 overlayfs.
This is very similar to copy-up-coherent mmap in the VFS1 overlay, with the
minor wrinkle that there is no fs.InodeOperations.Mappable().

Updates #1199

PiperOrigin-RevId: 331206314
2020-09-11 13:01:54 -07:00
Bhasker Hariharan 831ab2dd99 Fix host unix socket to not swallow EOF incorrectly.
Fixes an error for host-backed unix dgram sockets: when the receive buffer was
larger than the host send buffer size, we would end up swallowing EOF from the
recvmsg syscall, causing read() to block forever.

PiperOrigin-RevId: 331192810
2020-09-11 11:56:04 -07:00
Tamir Duberstein 964447c8ce Clean up image construction
- Skip `docker inspect`; `docker pull` is idempotent
- Remove unnecessary CMD directives in Dockerfiles
- Run bazel before building images to catch errors sooner

PiperOrigin-RevId: 331107815
2020-09-11 01:57:42 -07:00
Ayush Ranjan 365545855f [vfs] Disable inode number equality check for overlayfs.
Overlayfs does not persist a directory's inode number even while it is mounted.
See fs/overlayfs/inode.c:ovl_map_dev_ino(). VFS2 generates a new inode number
for directories on every lookup.

PiperOrigin-RevId: 331045037
2020-09-10 16:50:18 -07:00
Ayush Ranjan 14e0eb6e0f [vfs] Add vfs2 runtime tests.
PiperOrigin-RevId: 330981912
2020-09-10 11:42:51 -07:00
gVisor bot 7275f293d7 Merge pull request #3892 from lubinszARM:pr_n1_02
PiperOrigin-RevId: 330973856
2020-09-10 11:07:18 -07:00
Ayush Ranjan 50c99a86d1 [vfs] Disable nlink tests for overlayfs.
Overlayfs intentionally does not compute nlink for directories (because it can
be really expensive). Linux returns 1, VFS2 returns 2 and VFS1 actually
calculates the correct value.

PiperOrigin-RevId: 330967139
2020-09-10 10:40:35 -07:00
gVisor bot 9a003835f9 Fix typo, remove duplicate word.
PiperOrigin-RevId: 330898705
2020-09-10 03:12:21 -07:00
Bin Lu 6d68834779 arm64: place an SB sequence following an ERET instruction
Some CPUs (e.g. ampere-emag) can speculate past an ERET instruction and potentially perform
speculative accesses to memory before processing the exception return.
Since the register state is often controlled by a lower privilege level
at the point of an ERET, this could potentially be used as part of a
side-channel attack.

Signed-off-by: Bin Lu <bin.lu@arm.com>
2020-09-10 02:47:13 -04:00
Ian Lewis 1ab097b08f Add note about kubeadm to the FAQ
Fixes #3277

PiperOrigin-RevId: 330853338
2020-09-09 20:13:15 -07:00
Jamie Liu 644ac7b6bc Unlock VFS.mountMu before FilesystemImpl calls for /proc/[pid]/{mounts,mountinfo}.
Also move VFS.MakeSyntheticMountpoint() (which is a utility wrapper around
VFS.MkdirAllAt(), itself a utility wrapper around VFS.MkdirAt()) to not be in
the middle of the implementation of these proc files.

Fixes #3878

PiperOrigin-RevId: 330843106
2020-09-09 18:45:42 -07:00
Jamie Liu 2c7df1a9a5 Don't write VFS2 gofer client timestamps back on dentry destruction.
This feature is too expensive for runsc, even with setattrclunk, because
fsgofer.localFile.SetAttr() ends up needing to call reopenProcFD(), incurring
two string allocations for the FD pathname, an fd.FD allocation, and two calls
to runtime.SetFinalizer() when the fd.FD is created and closed respectively
(b/133767962) (plus the actual cost of the syscalls, which is negligible).

PiperOrigin-RevId: 330843012
2020-09-09 18:39:23 -07:00
gVisor bot f949951144 Merge pull request #3886 from avagin:github-act-feature
PiperOrigin-RevId: 330841374
2020-09-09 18:25:58 -07:00
Andrei Vagin bd3252d5cf github: Don't build the Go branch for feature branches
We can't actually push the Go branch on pushes to feature branches.

Signed-off-by: Andrei Vagin <avagin@google.com>
2020-09-09 14:56:59 -07:00
gVisor bot aead623d93 Merge pull request #3880 from avagin:github-act-feature
PiperOrigin-RevId: 330802067
2020-09-09 14:48:55 -07:00
Jamie Liu f3172c3a11 Don't sched_setaffinity in ptrace platform.
PiperOrigin-RevId: 330777900
2020-09-09 12:48:57 -07:00
Andrei Vagin 27897621da github: run actions for feature branches
Signed-off-by: Andrei Vagin <avagin@google.com>
2020-09-09 10:56:11 -07:00
Ian Lewis fb281eea75 Fix formatting for Kubernetes tutorial
PiperOrigin-RevId: 330745430
2020-09-09 10:09:58 -07:00
Ian Lewis 26439f9a43 Add syntax highlighting to website
Adds a syntax highlighting theme css so that code snippets are highlighted
properly.

PiperOrigin-RevId: 330733737
2020-09-09 09:08:37 -07:00
Ian Lewis 00479af515 Add a Docker Compose tutorial
Adds a Docker Compose tutorial to the website that shows how to start a
Wordpress site and includes information about how to get DNS working.

Fixes #115

PiperOrigin-RevId: 330652842
2020-09-08 21:59:24 -07:00
Jamie Liu 8d3551da6a Implement synthetic mountpoints for kernfs.
PiperOrigin-RevId: 330629897
2020-09-08 18:33:03 -07:00
Ayush Ranjan bca4d99a4b [vfs] overlayfs: Fix socket tests.
- BindSocketThenOpen test was expecting the incorrect error when opening
  a socket. Fixed that.
- VirtualFilesystem.BindEndpointAt should not require pop.Path.Begin.Ok()
  because the filesystem implementations do not need to walk to the parent
  dentry. This check also exists for MknodAt, MkdirAt, RmdirAt, SymlinkAt and
  UnlinkAt but those filesystem implementations also need to walk to the parent
  dentry. So that check is valid. Added some syscall tests to test this.

PiperOrigin-RevId: 330625220
2020-09-08 17:56:22 -07:00
gVisor bot a17d083f3b Add check for both child and childMerkle ENOENT
The check in verity walk returns an error for non-ENOENT cases, and all
ENOENT results should be checked. This case was missing.

PiperOrigin-RevId: 330604771
2020-09-08 16:01:10 -07:00