Commit Graph

4360 Commits

Author SHA1 Message Date
Bin Lu 6d68834779 arm64: place an SB sequence following an ERET instruction
Some CPUs (e.g. Ampere eMAG) can speculate past an ERET instruction and potentially perform
speculative accesses to memory before processing the exception return.
Since the register state is often controlled by a lower privilege level
at the point of an ERET, this could potentially be used as part of a
side-channel attack.

Signed-off-by: Bin Lu <bin.lu@arm.com>
2020-09-10 02:47:13 -04:00
Ian Lewis 1ab097b08f Add note about kubeadm to the FAQ
Fixes #3277

PiperOrigin-RevId: 330853338
2020-09-09 20:13:15 -07:00
Jamie Liu 644ac7b6bc Unlock VFS.mountMu before FilesystemImpl calls for /proc/[pid]/{mounts,mountinfo}.
Also move VFS.MakeSyntheticMountpoint() (which is a utility wrapper around
VFS.MkdirAllAt(), itself a utility wrapper around VFS.MkdirAt()) to not be in
the middle of the implementation of these proc files.

Fixes #3878

PiperOrigin-RevId: 330843106
2020-09-09 18:45:42 -07:00
Jamie Liu 2c7df1a9a5 Don't write VFS2 gofer client timestamps back on dentry destruction.
This feature is too expensive for runsc, even with setattrclunk, because
fsgofer.localFile.SetAttr() ends up needing to call reopenProcFD(), incurring
two string allocations for the FD pathname, an fd.FD allocation, and two calls
to runtime.SetFinalizer() when the fd.FD is created and closed respectively
(b/133767962) (plus the actual cost of the syscalls, which is negligible).

PiperOrigin-RevId: 330843012
2020-09-09 18:39:23 -07:00
gVisor bot f949951144 Merge pull request #3886 from avagin:github-act-feature
PiperOrigin-RevId: 330841374
2020-09-09 18:25:58 -07:00
Andrei Vagin bd3252d5cf github: Don't build the Go branch for feature branches
We can't actually push the Go branch on pushes to feature branches.

Signed-off-by: Andrei Vagin <avagin@google.com>
2020-09-09 14:56:59 -07:00
gVisor bot aead623d93 Merge pull request #3880 from avagin:github-act-feature
PiperOrigin-RevId: 330802067
2020-09-09 14:48:55 -07:00
Jamie Liu f3172c3a11 Don't sched_setaffinity in ptrace platform.
PiperOrigin-RevId: 330777900
2020-09-09 12:48:57 -07:00
Andrei Vagin 27897621da github: run actions for feature branches
Signed-off-by: Andrei Vagin <avagin@google.com>
2020-09-09 10:56:11 -07:00
Ian Lewis fb281eea75 Fix formatting for Kubernetes tutorial
PiperOrigin-RevId: 330745430
2020-09-09 10:09:58 -07:00
Ian Lewis 26439f9a43 Add syntax highlighting to website
Adds a syntax highlighting theme css so that code snippets are highlighted
properly.

PiperOrigin-RevId: 330733737
2020-09-09 09:08:37 -07:00
Ian Lewis 00479af515 Add a Docker Compose tutorial
Adds a Docker Compose tutorial to the website that shows how to start a
WordPress site and includes information about how to get DNS working.

Fixes #115

PiperOrigin-RevId: 330652842
2020-09-08 21:59:24 -07:00
Jamie Liu 8d3551da6a Implement synthetic mountpoints for kernfs.
PiperOrigin-RevId: 330629897
2020-09-08 18:33:03 -07:00
Ayush Ranjan bca4d99a4b [vfs] overlayfs: Fix socket tests.
- The BindSocketThenOpen test was expecting the wrong error when opening
  a socket. Fixed that.
- VirtualFilesystem.BindEndpointAt should not require pop.Path.Begin.Ok()
  because the filesystem implementations do not need to walk to the parent
  dentry. This check also exists for MknodAt, MkdirAt, RmdirAt, SymlinkAt and
  UnlinkAt, but those filesystem implementations do need to walk to the parent
  dentry, so the check is valid there. Added some syscall tests to test this.

PiperOrigin-RevId: 330625220
2020-09-08 17:56:22 -07:00
gVisor bot a17d083f3b Add check for both child and childMerkle ENOENT
The check in the verity walk returns an error for non-ENOENT cases, but all
ENOENT results should also be checked. The case where both child and
childMerkle return ENOENT was missing.

PiperOrigin-RevId: 330604771
2020-09-08 16:01:10 -07:00
gVisor bot 360f1535c7 Implement ioctl with enable verity
An ioctl with FS_IOC_ENABLE_VERITY is added to the verity file system to
enable verity on a file. For a file, a Merkle tree is built from its data.
For a directory, a Merkle tree is built from the root hashes of its
children.

PiperOrigin-RevId: 330604368
2020-09-08 15:54:21 -07:00
Ayush Ranjan 682c0edcdc [vfs] overlayfs: decref VD when not using it.
overlay/filesystem.go:lookupLocked() did not DecRef the VD on some error paths
when it would not end up saving or using the VD.

PiperOrigin-RevId: 330589742
2020-09-08 14:42:39 -07:00
Fabricio Voznika c8f1ce288d Honor readonly flag for root mount
Updates #1487

PiperOrigin-RevId: 330580699
2020-09-08 14:00:43 -07:00
Sam Balana 284e6811e4 Increase resolution timeout for TestCacheResolution
Fixes pkg/tcpip/stack:stack_test flake experienced while running
TestCacheResolution with gotsan. This occurs when the test-runner takes longer
than the resolution timeout to call linkAddrCache.get.

In this test we don't care about the resolution timeout, so set it to the
maximum and rely on test-runner timeouts to avoid deadlocks.

PiperOrigin-RevId: 330566250
2020-09-08 12:52:10 -07:00
gVisor bot a3b87a0cef Merge pull request #3856 from btw616:fix/issue-3855
PiperOrigin-RevId: 330565414
2020-09-08 12:46:25 -07:00
Bhasker Hariharan 38cdb0579b Fix data race in tcp.GetSockOpt.
e.ID can't be read without holding e.mu. GetSockOpt was reading e.ID
when looking up OriginalDst without holding e.mu.

PiperOrigin-RevId: 330562293
2020-09-08 12:31:19 -07:00
Ghanan Gowripalan d35f07b36a Improve type safety for transport protocol options
The existing implementations of TransportProtocol.{Set}Option take
arguments of an empty interface type, which all types (implicitly)
implement, so any type may be passed to these functions.

This change introduces marker interfaces that transport protocol option
types implement to indicate whether they may be set or queried, so that
invalid types are caught at compile time. Separate interfaces are used
so that the compiler can enforce read-only or set-only socket options.
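The marker-interface pattern described above can be sketched like this. All type, interface, and method names here are hypothetical illustrations, not the exact tcpip API; what the sketch shows is that an unexported marker method restricts which types each function accepts at compile time.

```go
package main

import "fmt"

// SettableTransportProtocolOption marks options that may be set.
type SettableTransportProtocolOption interface {
	isSettableTransportProtocolOption()
}

// GettableTransportProtocolOption marks options that may be queried.
type GettableTransportProtocolOption interface {
	isGettableTransportProtocolOption()
}

// DelayEnabledOption (hypothetical) is both settable and gettable.
type DelayEnabledOption bool

func (*DelayEnabledOption) isSettableTransportProtocolOption() {}
func (*DelayEnabledOption) isGettableTransportProtocolOption() {}

// ReadOnlyStatOption (hypothetical) can only be queried, so passing it to
// SetOption is a compile-time error.
type ReadOnlyStatOption uint64

func (*ReadOnlyStatOption) isGettableTransportProtocolOption() {}

type protocol struct {
	delay bool
	stat  uint64
}

// SetOption accepts only settable options; an int or *ReadOnlyStatOption
// would not compile here, unlike with an interface{} parameter.
func (p *protocol) SetOption(opt SettableTransportProtocolOption) error {
	switch v := opt.(type) {
	case *DelayEnabledOption:
		p.delay = bool(*v)
		return nil
	default:
		return fmt.Errorf("unknown option %T", opt)
	}
}

// Option fills in a gettable option.
func (p *protocol) Option(opt GettableTransportProtocolOption) error {
	switch v := opt.(type) {
	case *DelayEnabledOption:
		*v = DelayEnabledOption(p.delay)
		return nil
	case *ReadOnlyStatOption:
		*v = ReadOnlyStatOption(p.stat)
		return nil
	default:
		return fmt.Errorf("unknown option %T", opt)
	}
}
```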

RELNOTES: n/a
PiperOrigin-RevId: 330559811
2020-09-08 12:17:39 -07:00
Ayush Ranjan d84ec6c42b [vfs] Capitalize x in the {Get/Set/Remove/List}xattr functions.
PiperOrigin-RevId: 330554450
2020-09-08 11:51:39 -07:00
Tiwei Bie ceab2e21de Fix the use after nil check on args.MountNamespaceVFS2
args.MountNamespaceVFS2 is used again after the nil check; mntnsVFS2,
which holds the expected reference, should be used instead. This patch
fixes this issue.

Fixes: #3855

Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
2020-09-08 15:50:29 +08:00
Ayush Ranjan fada564c83 Fix make_apt script.
This change makes the following fixes:
- When creating a test repo.key, create a secret keyring as other workflows
  also use secret keyrings only.
- We should not be using both --keyring and --secret-keyring options. Just use
  --secret-keyring.
- Pass homedir to all gpg commands. dpkg-sig takes an arg -g which stands for
  gpgopts. So we need to pass the homedir there too.

PiperOrigin-RevId: 330443280
2020-09-07 21:18:22 -07:00
Fabricio Voznika 2202812e07 Simplify FD handling for container start/exec
VFS1 and VFS2 host FDs have different dup'ing behavior, which makes
it error-prone to write code for both. Change the contract
so that FDs are released as they are used, so the caller
can simply defer a block that closes all remaining files.
This also addresses handling of partial failures.

With this fix, more VFS2 tests can be enabled.

Updates #1487

PiperOrigin-RevId: 330112266
2020-09-04 11:42:02 -07:00
Dean Deng c564293b65 Adjust input file offset when sendfile only completes a partial write.
Fixes #3779.

PiperOrigin-RevId: 330057268
2020-09-03 23:30:47 -07:00
Ayush Ranjan b6d6a120d0 Fix the release workflow.
PiperOrigin-RevId: 330049242
2020-09-03 21:45:10 -07:00
Bhasker Hariharan 805861ca37 Use fine-grained mutex for stack.cleanupEndpoints.
stack.cleanupEndpoints is protected by stack.mu, but that can cause
contention because stack.mu is already acquired in many hot paths during
new endpoint creation, cleanup, etc. Moving this to a fine-grained mutex
should reduce contention on stack.mu.

PiperOrigin-RevId: 330026151
2020-09-03 17:36:41 -07:00
Jamie Liu 76e51c8b9a Use atomic.Value for Stack.tcpProbeFunc.
b/166980357#comment56 shows:

- 837 goroutines blocked in:
gvisor/pkg/sync/sync.(*RWMutex).Lock
gvisor/pkg/tcpip/stack/stack.(*Stack).StartTransportEndpointCleanup
gvisor/pkg/tcpip/transport/tcp/tcp.(*endpoint).cleanupLocked
gvisor/pkg/tcpip/transport/tcp/tcp.(*endpoint).completeWorkerLocked
gvisor/pkg/tcpip/transport/tcp/tcp.(*endpoint).protocolMainLoop.func1
gvisor/pkg/tcpip/transport/tcp/tcp.(*endpoint).protocolMainLoop

- 695 goroutines blocked in:
gvisor/pkg/sync/sync.(*RWMutex).Lock
gvisor/pkg/tcpip/stack/stack.(*Stack).CompleteTransportEndpointCleanup
gvisor/pkg/tcpip/transport/tcp/tcp.(*endpoint).cleanupLocked
gvisor/pkg/tcpip/transport/tcp/tcp.(*endpoint).completeWorkerLocked
gvisor/pkg/tcpip/transport/tcp/tcp.(*endpoint).protocolMainLoop.func1
gvisor/pkg/tcpip/transport/tcp/tcp.(*endpoint).protocolMainLoop

- 3882 goroutines blocked in:
gvisor/pkg/sync/sync.(*RWMutex).Lock
gvisor/pkg/tcpip/stack/stack.(*Stack).GetTCPProbe
gvisor/pkg/tcpip/transport/tcp/tcp.newEndpoint
gvisor/pkg/tcpip/transport/tcp/tcp.(*protocol).NewEndpoint
gvisor/pkg/tcpip/stack/stack.(*Stack).NewEndpoint

All of these are contending on Stack.mu. Stack.StartTransportEndpointCleanup()
and Stack.CompleteTransportEndpointCleanup() insert/delete TransportEndpoints
in a map (Stack.cleanupEndpoints), and the former also does endpoint
unregistration while holding Stack.mu, so it's not immediately clear how
feasible it is to replace the map with a mutex-less implementation or how much
doing so would help. However, Stack.GetTCPProbe() just reads a function object
(Stack.tcpProbeFunc) that is almost always nil (as far as I can tell,
Stack.AddTCPProbe() is only called in tests), and it's called for every new TCP
endpoint. So converting it to an atomic.Value should significantly reduce
contention on Stack.mu, improving TCP endpoint creation latency and allowing
TCP endpoint cleanup to proceed.
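The conversion described above can be sketched with a simplified Stack. TCPProbeFunc and the string argument are hypothetical stand-ins, and RemoveTCPProbe is an illustrative addition; AddTCPProbe and GetTCPProbe mirror the names in the message but not necessarily their real signatures. One subtlety: atomic.Value panics when storing a nil interface and requires a consistent concrete type, so clearing the probe stores a typed nil.

```go
package main

import (
	"sync"
	"sync/atomic"
)

// TCPProbeFunc is a hypothetical stand-in for the probe callback type.
type TCPProbeFunc func(state string)

// Stack is a simplified stand-in: mu still guards other state, but the
// probe lives in an atomic.Value so GetTCPProbe never takes mu.
type Stack struct {
	mu           sync.RWMutex // guards other state; not used for the probe
	tcpProbeFunc atomic.Value // holds a TCPProbeFunc (possibly a typed nil)
}

// AddTCPProbe installs a probe; stores are rare (mostly tests).
func (s *Stack) AddTCPProbe(f TCPProbeFunc) {
	s.tcpProbeFunc.Store(f)
}

// RemoveTCPProbe clears the probe by storing a typed nil, since
// atomic.Value cannot store an untyped nil interface.
func (s *Stack) RemoveTCPProbe() {
	s.tcpProbeFunc.Store(TCPProbeFunc(nil))
}

// GetTCPProbe is called for every new endpoint and is lock-free.
func (s *Stack) GetTCPProbe() TCPProbeFunc {
	p := s.tcpProbeFunc.Load()
	if p == nil {
		return nil // nothing stored yet
	}
	return p.(TCPProbeFunc)
}
```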

PiperOrigin-RevId: 330004140
2020-09-03 15:24:48 -07:00
Nicolas Lacasse 30c20df76f Run getdents_benchmark with fewer files.
This test regularly times out when "shared" filesystem is enabled.

PiperOrigin-RevId: 329950622
2020-09-03 10:54:01 -07:00
Tamir Duberstein 319ce67369 Avoid grpc_impl
PiperOrigin-RevId: 329902747
2020-09-03 05:52:17 -07:00
Ian Lewis a8c174c047 Update version in cni tutorial
Update the cniVersion used in the CNI tutorial so that it works with
containerd 1.2. Containerd 1.2 includes a version of the cri plugin
(release/1.2) that, in turn, includes a version of the
cni library (0.6.0) that only supports up to 0.3.1.
https://github.com/containernetworking/cni/blob/v0.6.0/pkg/version/version.go#L38

PiperOrigin-RevId: 329837188
2020-09-02 19:38:34 -07:00
Zeling Feng 86c1ae095a Add support to run packetimpact tests against Fuchsia
Running blaze test <test_name>_fuchsia_test runs the corresponding packetimpact
test against Fuchsia.

PiperOrigin-RevId: 329835290
2020-09-02 19:19:40 -07:00
Bhasker Hariharan b69352245a Fix Accept to not return error for sockets in accept queue.
Accept on gVisor returns an error if a socket in the accept queue was closed
before Accept() was called. Linux returns the new fd even if the returned
socket has already been closed by the peer, e.g. because the peer sent a RST.

This appears to be intentional in Linux; see the GitHub issue for more details.

Fixes #3780

PiperOrigin-RevId: 329828404
2020-09-02 18:21:47 -07:00
Ayush Ranjan 1fec861939 [vfs] Implement xattr for overlayfs.
PiperOrigin-RevId: 329825497
2020-09-02 17:58:05 -07:00
Ayush Ranjan 0ca0d8e011 [vfs] Fix error handling in overlayfs OpenAt.
Updates #1199

PiperOrigin-RevId: 329802274
2020-09-02 15:43:13 -07:00
Jamie Liu 5c66011200 Update Go version constraint on sync/spin_unsafe.go.
PiperOrigin-RevId: 329801584
2020-09-02 15:37:26 -07:00
Jamie Liu 9bd0164237 Improve sync.SeqCount performance.
- Make sync.SeqCountEpoch not a struct. This allows sync.SeqCount.BeginRead()
  to be inlined.

- Mark sync.SeqAtomicLoad<T> nosplit to mitigate the Go compiler's refusal to
  inline it. (Best I could get was "cost 92 exceeds budget 80".)

- Use runtime-guided spinning in SeqCount.BeginRead().
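For readers unfamiliar with sequence counters, the bullets above can be related to a minimal seqlock sketch in plain Go. This is not gVisor's pkg/sync implementation: it makes SeqCountEpoch a plain integer as in the first bullet, uses sync/atomic for the protected data so the sketch stays clean under the race detector, and substitutes runtime.Gosched() for the runtime-guided spinning in the third bullet. The pair type is a hypothetical example of protected data.

```go
package main

import (
	"runtime"
	"sync/atomic"
)

// SeqCountEpoch is a plain integer rather than a struct.
type SeqCountEpoch uint32

// SeqCount is a sequence counter whose value is odd while a writer is active.
type SeqCount struct {
	epoch uint32
}

// BeginRead waits until no writer is active and returns the current epoch.
func (s *SeqCount) BeginRead() SeqCountEpoch {
	for {
		if e := atomic.LoadUint32(&s.epoch); e&1 == 0 {
			return SeqCountEpoch(e)
		}
		runtime.Gosched() // stand-in for runtime-guided spinning
	}
}

// ReadOk reports whether no writer ran since BeginRead returned e.
func (s *SeqCount) ReadOk(e SeqCountEpoch) bool {
	return atomic.LoadUint32(&s.epoch) == uint32(e)
}

// BeginWrite and EndWrite bracket a write; writers must be serialized.
func (s *SeqCount) BeginWrite() {
	if atomic.AddUint32(&s.epoch, 1)&1 == 0 {
		panic("SeqCount.BeginWrite during active writer")
	}
}

func (s *SeqCount) EndWrite() {
	if atomic.AddUint32(&s.epoch, 1)&1 != 0 {
		panic("SeqCount.EndWrite without BeginWrite")
	}
}

// pair holds two values readers must observe together. The int64 fields
// come first to keep them 64-bit aligned on 32-bit platforms.
type pair struct {
	a, b int64
	seq  SeqCount
}

func (p *pair) Store(a, b int64) {
	p.seq.BeginWrite()
	atomic.StoreInt64(&p.a, a)
	atomic.StoreInt64(&p.b, b)
	p.seq.EndWrite()
}

// Load retries until it reads a and b with no intervening write.
func (p *pair) Load() (int64, int64) {
	for {
		e := p.seq.BeginRead()
		a, b := atomic.LoadInt64(&p.a), atomic.LoadInt64(&p.b)
		if p.seq.ReadOk(e) {
			return a, b
		}
	}
}
```

Readers never block writers; they simply retry when the epoch changed, which is why an uncontended read can be very cheap.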

Benchmarks:
name                               old time/op  new time/op   delta
pkg:pkg/sync/sync goos:linux goarch:amd64
SeqCountWriteUncontended-12        8.24ns ± 0%  11.40ns ± 0%  +38.35%  (p=0.000 n=10+10)
SeqCountReadUncontended-12         0.33ns ± 0%   0.14ns ± 3%  -57.77%  (p=0.000 n=7+8)
pkg:pkg/sync/seqatomictest/seqatomic goos:linux goarch:amd64
SeqAtomicLoadIntUncontended-12     0.64ns ± 1%   0.41ns ± 1%  -36.40%  (p=0.000 n=10+8)
SeqAtomicTryLoadIntUncontended-12  0.18ns ± 4%   0.18ns ± 1%     ~     (p=0.206 n=10+8)
AtomicValueLoadIntUncontended-12   0.27ns ± 3%   0.27ns ± 0%   -1.77%  (p=0.000 n=10+8)

(atomic.Value.Load is, of course, inlined. We would expect an uncontended
inline SeqAtomicLoad<int> to perform identically to SeqAtomicTryLoad<int>.) The
"regression" in BenchmarkSeqCountWriteUncontended, despite this CL changing
nothing in that path, is attributed to microarchitectural subtlety; the
benchmark loop is unchanged except for its address:

Before this CL:
  :0                    0x4e62d1                48ffc2                  INCQ DX
  :0                    0x4e62d4                48399110010000          CMPQ DX, 0x110(CX)
  :0                    0x4e62db                7e26                    JLE 0x4e6303
  :0                    0x4e62dd                90                      NOPL
  :0                    0x4e62de                bb01000000              MOVL $0x1, BX
  :0                    0x4e62e3                f00fc118                LOCK XADDL BX, 0(AX)
  :0                    0x4e62e7                ffc3                    INCL BX
  :0                    0x4e62e9                0fbae300                BTL $0x0, BX
  :0                    0x4e62ed                733a                    JAE 0x4e6329
  :0                    0x4e62ef                90                      NOPL
  :0                    0x4e62f0                bb01000000              MOVL $0x1, BX
  :0                    0x4e62f5                f00fc118                LOCK XADDL BX, 0(AX)
  :0                    0x4e62f9                ffc3                    INCL BX
  :0                    0x4e62fb                0fbae300                BTL $0x0, BX
  :0                    0x4e62ff                73d0                    JAE 0x4e62d1

After this CL:
  :0                    0x4e6361                48ffc2                  INCQ DX
  :0                    0x4e6364                48399110010000          CMPQ DX, 0x110(CX)
  :0                    0x4e636b                7e26                    JLE 0x4e6393
  :0                    0x4e636d                90                      NOPL
  :0                    0x4e636e                bb01000000              MOVL $0x1, BX
  :0                    0x4e6373                f00fc118                LOCK XADDL BX, 0(AX)
  :0                    0x4e6377                ffc3                    INCL BX
  :0                    0x4e6379                0fbae300                BTL $0x0, BX
  :0                    0x4e637d                733a                    JAE 0x4e63b9
  :0                    0x4e637f                90                      NOPL
  :0                    0x4e6380                bb01000000              MOVL $0x1, BX
  :0                    0x4e6385                f00fc118                LOCK XADDL BX, 0(AX)
  :0                    0x4e6389                ffc3                    INCL BX
  :0                    0x4e638b                0fbae300                BTL $0x0, BX
  :0                    0x4e638f                73d0                    JAE 0x4e6361

PiperOrigin-RevId: 329754148
2020-09-02 11:37:31 -07:00
Zach Koopmans b9b6660dc4 Add Docs to nginx benchmark.
Adds docs to nginx and refactors both Httpd and Nginx benchmarks.

Key changes:
- Add docs and make nginx tests the same as httpd (reverse, all docs, etc.).
- Make the number of requests scale as c * b.N, i.e. one request per thread
per iteration. This works well
with both --test.benchtime=10m (do a run that lasts at least 10m) and
--test.benchtime=10x (do b.N = 10).
- Remove a doc (1000Kb) from both tests, since 1024Kb already exists.

PiperOrigin-RevId: 329751091
2020-09-02 11:22:17 -07:00
Ayush Ranjan 8ab08cdc01 [runtime tests] Exclude flaky nodejs test
PiperOrigin-RevId: 329749191
2020-09-02 11:13:02 -07:00
gVisor bot a0e4310384 Merge pull request #3822 from btw616:fix/issue-3821
PiperOrigin-RevId: 329710371
2020-09-02 07:42:19 -07:00
Zach Koopmans 563f28b7d5 Fix statfs test for opensource.
PiperOrigin-RevId: 329638946
2020-09-01 21:03:48 -07:00
Fabricio Voznika 37a217aca4 Implement setattr+clunk in 9P
This is to cover the common pattern: open->read/write->close,
where SetAttr needs to be called to update atime/mtime before
the file is closed.

Benchmark results:

BM_OpenReadClose/10240 CPU
setattr+clunk: 63783 ns
VFS2:          68109 ns
VFS1:          72507 ns

Updates #1198

PiperOrigin-RevId: 329628461
2020-09-01 19:22:12 -07:00
Mithun Iyer 40faeaa180 Fix handling of unacceptable ACKs during close.
On receiving an ACK with an unacceptable ACK number in a closing state,
TCP needs to reply with an ACK carrying the correct sequence and ACK numbers
and remain in the same state. This change follows RFC 793 (page 37), with the
difference that, just as in Linux, it does not apply to the ESTABLISHED state.
Also add more tests to check for out-of-window (OTW) sequence numbers and
unacceptable ACK numbers in these states.
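The rule above can be sketched as a classification over 32-bit sequence space. The helper names are hypothetical, "unacceptable" is assumed here to mean an ACK for data never sent (SEG.ACK > SND.NXT), and the ESTABLISHED behavior is simplified to a plain drop; this is an illustration, not the netstack implementation.

```go
package main

// action is what the receiver does with an incoming ACK (hypothetical).
type action int

const (
	accept       action = iota
	drop                // discard the segment silently
	replyWithACK        // send an ACK with current seq/ack numbers, stay in state
)

// seqLess reports a < b in 32-bit modular sequence space.
func seqLess(a, b uint32) bool { return int32(a-b) < 0 }

// handleACK classifies SEG.ACK against SND.UNA and SND.NXT. An ACK for
// data never sent gets an ACK reply in closing states but not in
// ESTABLISHED (simplified to a drop here). Old ACKs are dropped.
func handleACK(established bool, sndUna, sndNxt, segAck uint32) action {
	switch {
	case seqLess(sndNxt, segAck): // acknowledges data not yet sent
		if established {
			return drop
		}
		return replyWithACK
	case seqLess(segAck, sndUna): // old or duplicate ACK
		return drop
	default:
		return accept
	}
}
```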

Fixes #3785

PiperOrigin-RevId: 329616283
2020-09-01 17:45:04 -07:00
Dean Deng c67d8ece09 Test opening file handles with different permissions.
These were problematic for VFS2 gofers before separate read/write handles
were correctly implemented.

PiperOrigin-RevId: 329613261
2020-09-01 17:21:22 -07:00
Ayush Ranjan 2eaf54dd59 Refactor tty codebase to use master-replica terminology.
Updates #2972

PiperOrigin-RevId: 329584905
2020-09-01 14:43:41 -07:00
Nayana Bidari 04c284f8c2 Fix panic when calling dup2().
PiperOrigin-RevId: 329572337
2020-09-01 13:41:01 -07:00
Ayush Ranjan 723fb5c116 [go-marshal] Enable auto-marshalling for fs/tty.
PiperOrigin-RevId: 329564614
2020-09-01 13:02:17 -07:00
Fabricio Voznika 71589b7f7e Let flags be overridden from OCI annotations
This allows runsc flags to be set per sandbox instance. For
example, K8s pod annotations can be used to enable
--debug for a single pod, making troubleshooting much easier.
Similarly, features like --vfs2 can be enabled for
experimentation without affecting other pods in the node.

Closes #3494

PiperOrigin-RevId: 329542815
2020-09-01 11:12:19 -07:00