Commit Graph

1791 Commits

Author SHA1 Message Date
Haibo Xu c04958e2fa Enable thread local storage support on arm64.
Linux use the task.thread.uw.tp_value field to store the
TLS pointer on arm64 platform, and we use a similar way
in gvisor to store it in the arch/State struct.

Signed-off-by: Haibo Xu <haibo.xu@arm.com>
Change-Id: Ie76b5c6d109bc27ccfd594008a96753806db7764
2020-03-09 01:04:55 +00:00
Dean Deng 228813fd26 Update comments and debug level for profiling options.
PiperOrigin-RevId: 299448307
2020-03-06 15:23:46 -08:00
Dean Deng 960f6a975b Add plumbing for importing fds in VFS2, along with non-socket, non-TTY impl.
In VFS2, imported file descriptors are stored in a kernfs-based filesystem.
Upon calling ImportFD, the host fd can be accessed in two ways:
1. a FileDescription that can be added to the FDTable, and
2. a Dentry in the host.filesystem mount, which we will want to access through
magic symlinks in /proc/[pid]/fd/.

An implementation of the kernfs.Inode interface stores a unique host fd. This
inode can be inserted into file descriptions as well as dentries.

This change also plumbs in three FileDescriptionImpls corresponding to fds for
sockets, TTYs, and other files (only the latter is implemented here).
These implementations will mostly make corresponding syscalls to the host.
Where possible, the logic is ported over from pkg/sentry/fs/host.

Updates #1672

PiperOrigin-RevId: 299417263
2020-03-06 12:59:49 -08:00
Tamir Duberstein 6fa5cee82c Prevent memory leaks in ilist
When list elements are removed from a list but not discarded, it becomes
important to invalidate the references they hold to their former
neighbors to prevent memory leaks.

PiperOrigin-RevId: 299412421
2020-03-06 12:31:43 -08:00
gVisor bot 18d41cf153 Merge pull request #1963 from xiaobo55x:kvm_common
PiperOrigin-RevId: 299405855
2020-03-06 12:05:30 -08:00
gVisor bot 56c4272568 Merge pull request #1946 from xiaobo55x:dieTramp
PiperOrigin-RevId: 299405663
2020-03-06 12:01:23 -08:00
Eyal Soha d5dbe366bf shutdown(s, SHUT_WR) in TIME-WAIT returns ENOTCONN
From RFC 793 s3.9 p61 Event Processing:

CLOSE Call during TIME-WAIT: return with "error: connection closing"

Fixes #1603

PiperOrigin-RevId: 299401353
2020-03-06 11:42:34 -08:00
Ghanan Gowripalan f50d9a31e9 Specify the source of outgoing NDP RS
If the NIC has a valid IPv6 address assigned, use it as the
source address for outgoing NDP Router Solicitation packets.

Test: stack_test.TestRouterSolicitation
PiperOrigin-RevId: 299398763
2020-03-06 11:33:28 -08:00
Nayana Bidari 1e8c0bcedb Add nat table support for iptables. 2020-03-06 09:25:32 -08:00
Ghanan Gowripalan d6f5e71df2 Get strings for stack.DHCPv6ConfigurationFromNDPRA
Useful for logs to print the string representation of the value
instead of the integer value.

PiperOrigin-RevId: 299356847
2020-03-06 08:02:45 -08:00
Ian Lewis da48fc6cca Stub oom_score_adj and oom_score.
Adds an oom_score_adj and oom_score proc file stub. oom_score_adj accepts
writes of values -1000 to 1000 and persists the value with the task. New tasks
inherit the parent's oom_score_adj.

oom_score is a read-only stub that always returns the value '0'.

Issue #202

PiperOrigin-RevId: 299245355
2020-03-05 18:23:01 -08:00
Ting-Yu Wang 9b64b658c1 Fix S/R on inet.Namespace.
PiperOrigin-RevId: 299238067
2020-03-05 17:40:18 -08:00
gVisor bot 6367963c14 Merge pull request #1951 from moricho:moricho/add-profiler-option
PiperOrigin-RevId: 299233818
2020-03-05 17:16:54 -08:00
Ian Gudger 9b3aad33c4 Use a pool of arrays to avoid slice headers from escaping in TCP options pool.
By putting slices into the pool, the slice header escapes. This can be avoided
by not putting the slice header into the pool.

This removes an allocation from the TCP segment send path.

PiperOrigin-RevId: 299215480
2020-03-05 15:56:42 -08:00
Andrei Vagin 80b40bbb06 tests: Don't print log messages on stdout
A parser of test results doesn't expect to see any extra messages.

PiperOrigin-RevId: 298966577
2020-03-04 16:16:35 -08:00
Jamie Liu a690b57624 Ensure that safemem.BlockSeqOf(safemem.Block{}) produces an empty BlockSeq.
PiperOrigin-RevId: 298941855
2020-03-04 14:30:27 -08:00
Fabricio Voznika 122d47aed1 Update cached file size when cache is skipped
gofer.dentryReadWriter.WriteFromBlocks was not updating
gofer.dentry.size after a write operation that skips the
cache.

Updates #1198

PiperOrigin-RevId: 298708646
2020-03-03 15:29:13 -08:00
Tamir Duberstein 371abe00f0 Avoid memory leaks
Properly discard segments from the segment heap.

PiperOrigin-RevId: 298704074
2020-03-03 15:07:09 -08:00
Andrei Vagin 277a0d5a1f platform/ptrace: don't call probeSeccomp on arm64
The support of PTRACE_SYSEMU on arm64 was added in the 5.3 kernel,
so we can be sure that the current version is higher that 5.3.

And this change moves vsyscall seccomp rules to the arch specific file,
because vsyscall isn't supported on arm64.

PiperOrigin-RevId: 298696493
2020-03-03 14:35:42 -08:00
Tamir Duberstein 844e4d284c Extract local variables for readability
PiperOrigin-RevId: 298690552
2020-03-03 14:11:01 -08:00
Ian Gudger c15b8515eb Fix datarace on TransportEndpointInfo.ID and clean up semantics.
Ensures that all access to TransportEndpointInfo.ID is either:
* In a function ending in a Locked suffix.
* While holding the appropriate mutex.

This primary affects the checkV4Mapped method on affected endpoints, which has
been renamed to checkV4MappedLocked. Also document the method and change its
argument to be a value instead of a pointer which had caused some awkwardness.

This race was possible in the udp and icmp endpoints between Connect and uses
of TransportEndpointInfo.ID including in both itself and Bind.

The tcp endpoint did not suffer from this bug, but benefited from better
documentation.

Updates #357

PiperOrigin-RevId: 298682913
2020-03-03 13:42:13 -08:00
Nayana Bidari 43abb24657 Fix panic caused by invalid address for Bind in packet sockets.
PiperOrigin-RevId: 298476533
2020-03-02 16:31:52 -08:00
Bhasker Hariharan 3310175250 Fix data-race when reading/writing e.amss.
PiperOrigin-RevId: 298451319
2020-03-02 14:45:03 -08:00
Ghanan Gowripalan 8821a7104f Do not read-lock NIC recursively
A deadlock may occur if a write lock on a RWMutex is blocked between
nested read lock attempts as the inner read lock attempt will be
blocked in this scenario.

Example (T1 and T2 are differnt goroutines):
  T1: obtain read-lock
  T2: attempt write-lock (blocks)
  T1: attempt inner/nested read-lock (blocks)

Here we can see that T1 and T2 are deadlocked.

Tests: Existing tests pass.
PiperOrigin-RevId: 298426678
2020-03-02 13:16:10 -08:00
gVisor bot f03e19d575 Merge pull request #1885 from avagin:arm64-pcids
PiperOrigin-RevId: 298405064
2020-03-02 11:42:04 -08:00
Andrei Vagin 42fb7d3491 socket: take readMu to access readView
DATA RACE in netstack.(*SocketOperations).fetchReadView

Write at 0x00c001dca138 by goroutine 1001:
  gvisor.dev/gvisor/pkg/sentry/socket/netstack.(*SocketOperations).fetchReadView()
      pkg/sentry/socket/netstack/netstack.go:418 +0x85
  gvisor.dev/gvisor/pkg/sentry/socket/netstack.(*SocketOperations).coalescingRead()
      pkg/sentry/socket/netstack/netstack.go:2309 +0x67
  gvisor.dev/gvisor/pkg/sentry/socket/netstack.(*SocketOperations).nonBlockingRead()
      pkg/sentry/socket/netstack/netstack.go:2378 +0x183d

Previous read at 0x00c001dca138 by goroutine 1111:
  gvisor.dev/gvisor/pkg/sentry/socket/netstack.(*SocketOperations).Ioctl()
      pkg/sentry/socket/netstack/netstack.go:2666 +0x533
  gvisor.dev/gvisor/pkg/sentry/syscalls/linux.Ioctl()

Reported-by: syzbot+d4c3885fcc346f08deb6@syzkaller.appspotmail.com
PiperOrigin-RevId: 298387377
2020-03-02 10:33:15 -08:00
Michael Pratt 62bd3ca8a3 Take write lock when removing xattr
PiperOrigin-RevId: 298380654
2020-03-02 10:07:13 -08:00
gVisor bot 3d9ddeb339 Merge pull request #1929 from avagin:arm64-cpuid
PiperOrigin-RevId: 297982488
2020-02-28 18:47:17 -08:00
Andrei Vagin ab7ecdd66d watchdog: print panic error message before other messages
This is needed for syzkaller to proper classify issues.

Right now, all watchdog issues are duped to one with the
subject "panic: Sentry detected stuck task(s). See stack
trace and message above for more details".

PiperOrigin-RevId: 297975363
2020-02-28 17:54:36 -08:00
Andrei Vagin 413a9b7fdc Define CPUIDInstruction for arm64
There is no cpuid instruction on arm64, so we need to defined it
just to avoid a compile time error.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-28 17:07:01 -08:00
Andrei Vagin 837cf62551 pcids.go isn't arch-specific
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-28 14:34:13 -08:00
Adin Scannell 463f4217d1 Make pipe buffer implementation standard.
A follow-up change will convert the networking code to use this standard
pipe implementation.

PiperOrigin-RevId: 297903206
2020-02-28 12:29:23 -08:00
Ting-Yu Wang 6b4d36e325 Hide /dev/net/tun when using hostinet.
/dev/net/tun does not currently work with hostinet. This has caused some
program starts failing because it thinks the feature exists.

PiperOrigin-RevId: 297876196
2020-02-28 10:39:12 -08:00
Fabricio Voznika 0f8a9e3623 Change dup2 call to dup3
We changed syscalls to allow dup3 for ARM64.

Updates #1198

PiperOrigin-RevId: 297870816
2020-02-28 10:15:20 -08:00
Nayana Bidari af6fab6514 Add nat table support for iptables.
- Fix review comments.
2020-02-28 10:00:38 -08:00
Ian Gudger c6bdc6b05b Fix a race in TCP endpoint teardown and teardown the stack in tcp_test.
Call stack.Close on stacks when we are done with them in tcp_test. This avoids
leaking resources and reduces the test's flakiness when race/gotsan is enabled.
It also provides test coverage for the race also fixed in this change, which
can be reliably triggered with the stack.Close change (and without the other
changes) when race/gotsan is enabled.

The race was possible when calling Abort (via stack.Close) on an endpoint
processing a SYN segment as part of a passive connect.

Updates #1564

PiperOrigin-RevId: 297685432
2020-02-27 14:15:44 -08:00
gVisor bot d9ee81183f Merge of a369c88c0c
PiperOrigin-RevId: 297674924
2020-02-27 13:34:23 -08:00
Nayana Bidari abf7ebcd38 Internal change.
PiperOrigin-RevId: 297638665
2020-02-27 11:00:41 -08:00
Bin Lu 5f0e8e6239 Prepare the vcpu environment for sentry on Arm64
Signed-off-by: Bin Lu <bin.lu@arm.com>
2020-02-27 01:19:28 -05:00
Rahat Mahmood 8fb84f78ad Fix construct of linux.Stat for arm64.
PiperOrigin-RevId: 297494373
2020-02-26 19:29:27 -08:00
gVisor bot 6ddeb35ed4 Merge pull request #1912 from lubinszARM:pr_kvm_build
PiperOrigin-RevId: 297492004
2020-02-26 19:09:45 -08:00
Nayana Bidari 9fccf98c0d Fix merge conflicts. 2020-02-26 13:18:35 -08:00
Kevin Krakauer 408979e619 iptables: filter by IP address (and range)
Enables commands such as:
$ iptables -A INPUT -d 127.0.0.1 -j ACCEPT
$ iptables -t nat -A PREROUTING ! -d 127.0.0.1 -j REDIRECT

Also adds a bunch of REDIRECT+destination tests.
2020-02-26 11:04:00 -08:00
moricho d8ed784311 add profile option 2020-02-26 16:49:51 +09:00
Jamie Liu a92087f0f8 Add VFS.NewDisconnectedMount().
Analogous to Linux's kern_mount().

PiperOrigin-RevId: 297259580
2020-02-25 19:13:30 -08:00
Adin Scannell fba479b3c7 Fix DATA RACE in fs.MayDelete.
MayDelete must lock the directory also, otherwise concurrent renames may
race. Note that this also changes the methods to be aligned with the actual
Remove and RemoveDirectory methods to minimize confusion when reading the
code. (It was hard to see that resolution was correct.)

PiperOrigin-RevId: 297258304
2020-02-25 19:04:15 -08:00
Haibo Xu 73201f4c57 Code Clean: Move arch independent codes to common file in kvm pkg.
Signed-off-by: Haibo Xu <haibo.xu@arm.com>
Change-Id: Iefbdf53e8e8d6d23ae75d8a2ff0d2a6e71f414d8
2020-02-26 01:51:31 +00:00
gVisor bot 813b1b0486 Merge pull request #1271 from lubinszARM:pr_ring0_1
PiperOrigin-RevId: 297230721
2020-02-25 16:24:43 -08:00
Ian Gudger 87288b26a1 Add netlink sockopt logging to strace.
PiperOrigin-RevId: 297220008
2020-02-25 15:35:24 -08:00
nybidari 818abc2bd5
Merge branch 'master' into iptables 2020-02-25 15:33:59 -08:00
Ghanan Gowripalan 5f1f9dd9d2 Use link-local source address for link-local multicast
Tests:
- header_test.TestIsV6LinkLocalMulticastAddress
- header_test.TestScopeForIPv6Address
- stack_test.TestIPv6SourceAddressSelectionScopeAndSameAddress
PiperOrigin-RevId: 297215576
2020-02-25 15:16:16 -08:00
Nayana Bidari acc405ba60 Add nat table support for iptables.
- commit the changes for the comments.
2020-02-25 15:03:51 -08:00
Fabricio Voznika 72e3f3a3ee Add option to skip stuck tasks waiting for address space
PiperOrigin-RevId: 297192390
2020-02-25 13:44:18 -08:00
gVisor bot 430992a67a Merge pull request #1816 from xiaobo55x:trap_flag
PiperOrigin-RevId: 297191168
2020-02-25 13:41:05 -08:00
Jamie Liu 471b15b212 Port most syscalls to VFS2.
pipe and pipe2 aren't ported, pending a slight rework of pipe FDs for VFS2.
mount and umount2 aren't ported out of temporary laziness. access and faccessat
need additional FSImpl methods to implement properly, but are stubbed to
prevent googletest from CHECK-failing. Other syscalls require additional
plumbing.

Updates #1623

PiperOrigin-RevId: 297188448
2020-02-25 13:37:34 -08:00
Adin Scannell 6def8ea6ac Fix nested logging.
PiperOrigin-RevId: 297175316
2020-02-25 12:25:38 -08:00
Adin Scannell 98b693e61b Don't acquire contended lock with the OS thread locked.
Fixes #1049

PiperOrigin-RevId: 297175164
2020-02-25 12:22:29 -08:00
Adin Scannell 53504e29ca Fix mount refcount issue.
Each mount is holds a reference on a root Dirent, but the mount itself may
live beyond it's own reference. This means that a call to Root() can come
after the associated reference has been dropped.

Instead of introducing a separate layer of references for mount objects,
we simply change the Root() method to use TryIncRef() and allow it to return
nil if the mount is already gone. This requires updating a small number of
callers and minimizes the change (since VFSv2 will replace this code shortly).

PiperOrigin-RevId: 297174230
2020-02-25 12:17:52 -08:00
Bhasker Hariharan d7b7379251 Deflake TestCurrentConnectedIncrement.
TestCurrentConnectedIncrement fails consistently under gotsan due to the sleep
to check metrics is exactly the same as the TIME-WAIT duration. Under gotsan
things can be slow enough that the increment test is done before the protocol
goroutine is run after the TIME-WAIT timer expires and does its cleanup.

Increasing the sleep from 1s to 1.2s makes the test pass consistently.

PiperOrigin-RevId: 297160181
2020-02-25 11:19:34 -08:00
Haibo Xu 93e0c37529 Enable bluepill dieTrampoline operation on arm64.
Signed-off-by: Haibo Xu <haibo.xu@arm.com>
Change-Id: I9e1bf2513c23bdd8c387e5b3c874c6ad3ca9aab0
2020-02-25 01:50:58 +00:00
Ian Gudger c37b196455 Add support for tearing down protocol dispatchers and TIME_WAIT endpoints.
Protocol dispatchers were previously leaked. Bypassing TIME_WAIT is required to
test this change.

Also fix a race when a socket in SYN-RCVD is closed. This is also required to
test this change.

PiperOrigin-RevId: 296922548
2020-02-24 10:32:17 -08:00
Ting-Yu Wang b8f56c79be Implement tap/tun device in vfs.
PiperOrigin-RevId: 296526279
2020-02-21 15:42:56 -08:00
Ghanan Gowripalan a155a23480 Attach LinkEndpoint to NetworkDispatcher immediately
Tests: stack_test.TestAttachToLinkEndpointImmediately
PiperOrigin-RevId: 296474068
2020-02-21 11:21:23 -08:00
Ghanan Gowripalan 97c07242c3 Use Route.MaxHeaderLength when constructing NDP RS
Test: stack_test.TestRouterSolicitation
PiperOrigin-RevId: 296454766
2020-02-21 09:54:55 -08:00
gVisor bot 4a73bae269 Initial network namespace support.
TCP/IP will work with netstack networking. hostinet doesn't work, and sockets
will have the same behavior as it is now.

Before the userspace is able to create device, the default loopback device can
be used to test.

/proc/net and /sys/net will still be connected to the root network stack; this
is the same behavior now.

Issue #1833

PiperOrigin-RevId: 296309389
2020-02-20 15:20:40 -08:00
gVisor bot 67b615b86f Support disabling a NIC
- Disabled NICs will have their associated NDP state cleared.
- Disabled NICs will not accept incoming packets.
- Writes through a Route with a disabled NIC will return an invalid
  endpoint state error.
- stack.Stack.FindRoute will not return a route with a disabled NIC.
- NIC's Running flag will report the NIC's enabled status.

Tests:
- stack_test.TestDisableUnknownNIC
- stack_test.TestDisabledNICsNICInfoAndCheckNIC
- stack_test.TestRoutesWithDisabledNIC
- stack_test.TestRouteWritePacketWithDisabledNIC
- stack_test.TestStopStartSolicitingRouters
- stack_test.TestCleanupNDPState
- stack_test.TestAddRemoveIPv4BroadcastAddressOnNICEnableDisable
- stack_test.TestJoinLeaveAllNodesMulticastOnNICEnableDisable
PiperOrigin-RevId: 296298588
2020-02-20 14:32:49 -08:00
gVisor bot d90d71474f Remove bytes read/written from marshal.Marshallable API.
Users of the API only care about whether the copy in/out succeeds in
their entirety, which is already signalled by the returned error.

PiperOrigin-RevId: 296297843
2020-02-20 14:29:26 -08:00
gVisor bot 9bad87339a Better strace logging for epoll syscalls.
Example:

epoll_ctl(0x3 anon_inode:[eventpoll], EPOLL_CTL_ADD, 0x6 anon_inode:[eventfd], 0x7efe2fd92a80 {events=EPOLLIN|EPOLLOUT data=0x10203040506070a}) = 0x0 (4.411µs)

epoll_wait(0x3 anon_inode:[eventpoll], 0x7efe2fd92b50 {{events=EPOLLOUT data=0x102030405060708}{events=EPOLLOUT data=0x102030405060708}{events=EPOLLOUT data=0x102030405060708}}, 0x3, 0xffffffff) = 0x3 (29.891µs)

PiperOrigin-RevId: 296258146
2020-02-20 11:31:00 -08:00
gVisor bot 9a4e3e63ef Re-add atomicbitops_arm64.s to BUILD.
This was inadverently dropped by cl/295811743.

PiperOrigin-RevId: 296254482
2020-02-20 11:16:08 -08:00
gVisor bot 10ed60e477 VFS2: Support memory mapping in tmpfs.
tmpfs.fileDescription now implements ConfigureMMap. And tmpfs.regularFile
implement memmap.Mappable. The methods are mostly unchanged from VFS1 tmpfs.

PiperOrigin-RevId: 296234557
2020-02-20 09:58:10 -08:00
Bin Lu a369c88c0c Lazy-fpsimd support patch series#1: add Arm64-fpsimd support to arch module
This patch defines the structures and
adds the implementations for fpsimd initialization.

Signed-off-by: Bin Lu <bin.lu@arm.com>
2020-02-20 07:46:30 -05:00
Bin Lu de68e1d8c4 Code Clean:Move getUserRegisters into dieArchSetup() and other small changes.
Consistent with QEMU, getUserRegisters() should be an arch-specific
function. So, it should be called in dieArchSetup().

With this patch and the pagetable/pcid patch, the kvm modules on Arm64 can be
built successfully.

Signed-off-by: Bin Lu <bin.lu@arm.com>
2020-02-20 06:43:27 +00:00
gVisor bot 2daa21e4d7 Internal change.
PiperOrigin-RevId: 296088213
2020-02-19 16:48:57 -08:00
Kevin Krakauer 92d2d78876 Fix mis-named comment. 2020-02-18 21:20:41 -08:00
gVisor bot 56fd9504aa Enable IPV6_RECVTCLASS socket option for datagram sockets
Added the ability to get/set the IP_RECVTCLASS socket option on UDP endpoints.
If enabled, traffic class from the incoming Network Header passed as ancillary
data in the ControlMessages.

Adding Get/SetSockOptBool to decrease the overhead of getting/setting simple
options. (This was absorbed in a CL that will be landing before this one).

Test:
* Added unit test to udp_test.go that tests getting/setting as well as
verifying that we receive expected TOS from incoming packet.
* Added a syscall test for verifying getting/setting
* Removed test skip for existing syscall test to enable end to end test.
PiperOrigin-RevId: 295840218
2020-02-18 15:45:36 -08:00
gVisor bot 55c553ae8c Add //pkg/syncevent.
Package syncevent is intended to subsume ~all uses of channels in the sentry
(including //pkg/waiter), as well as //pkg/sleep.

Compared to channels:

- Delivery of events to a syncevent.Receiver allows *synchronous* execution of
  an arbitrary callback, whereas delivery of events to a channel requires a
  goroutine to receive from that channel, resulting in substantial scheduling
  overhead. (This is also part of the motivation for the waiter package.)

- syncevent.Waiter can wait on multiple event sources without the high O(N)
  overhead of select. (This is the same motivation as for the sleep package.)

Compared to the waiter package:

- syncevent.Waiters are intended to be persistent (i.e. per-kernel.Task), and
  syncevent.Broadcaster (analogous to waiter.Queue) is a hash table rather than
  a linked list, such that blocking is (usually) allocation-free.

- syncevent.Source (analogous to waiter.Waitable) does not include an equivalent
  to waiter.Waitable.Readiness(), since this is inappropriate for transient
  events (see e.g. //pkg/sentry/kernel/time.ClockEventSource).

Compared to the sleep package:

- syncevent events are represented by bits in a bitmask rather than discrete
  sleep.Waker objects, reducing overhead and making it feasible to broadcast
  events to multiple syncevent.Receivers.

- syncevent.Receiver invokes an arbitrary callback, which is required by the
  sentry's epoll implementation. (syncevent.Waiter, which is analogous to
  sleep.Sleeper, pairs a syncevent.Receiver with a callback that wakes a
  waiting goroutine; the implementation of this aspect is nearly identical to
  that of sleep.Sleeper, except that it represents *runtime.g as unsafe.Pointer
  rather than uintptr.)

- syncevent.Waiter.Wait (analogous to sleep.Sleeper.Fetch(block=true)) does not
  automatically un-assert returned events. This is useful in cases where the
  path for handling an event is not the same as the path that observes it, such
  as for application signals (a la Linux's TIF_SIGPENDING).

- Unlike sleep.Sleeper, which Fetches Wakers in the order that they were
  Asserted, the event bitmasks used by syncevent.Receiver have no way of
  preserving event arrival order. (This is similar to select, which goes out of
  its way to randomize event ordering.)

The disadvantage of the syncevent package is that, since events are represented
by bits in a uint64 bitmask, each syncevent.Receiver can "only" multiplex
between 64 distinct events; this does not affect any known use case.

Benchmarks:

BenchmarkBroadcasterSubscribeUnsubscribe
BenchmarkBroadcasterSubscribeUnsubscribe-12         	45133884	        26.3 ns/op
BenchmarkMapSubscribeUnsubscribe
BenchmarkMapSubscribeUnsubscribe-12                 	28504662	        41.8 ns/op
BenchmarkQueueSubscribeUnsubscribe
BenchmarkQueueSubscribeUnsubscribe-12               	22747668	        45.6 ns/op
BenchmarkBroadcasterSubscribeUnsubscribeBatch
BenchmarkBroadcasterSubscribeUnsubscribeBatch-12    	31609177	        37.8 ns/op
BenchmarkMapSubscribeUnsubscribeBatch
BenchmarkMapSubscribeUnsubscribeBatch-12            	17563906	        62.1 ns/op
BenchmarkQueueSubscribeUnsubscribeBatch
BenchmarkQueueSubscribeUnsubscribeBatch-12          	26248838	        46.6 ns/op
BenchmarkBroadcasterBroadcastRedundant
BenchmarkBroadcasterBroadcastRedundant/0
BenchmarkBroadcasterBroadcastRedundant/0-12         	100907563	        11.8 ns/op
BenchmarkBroadcasterBroadcastRedundant/1
BenchmarkBroadcasterBroadcastRedundant/1-12         	85103068	        13.3 ns/op
BenchmarkBroadcasterBroadcastRedundant/4
BenchmarkBroadcasterBroadcastRedundant/4-12         	52716502	        22.3 ns/op
BenchmarkBroadcasterBroadcastRedundant/16
BenchmarkBroadcasterBroadcastRedundant/16-12        	20278165	        58.7 ns/op
BenchmarkBroadcasterBroadcastRedundant/64
BenchmarkBroadcasterBroadcastRedundant/64-12        	 5905428	       205 ns/op
BenchmarkMapBroadcastRedundant
BenchmarkMapBroadcastRedundant/0
BenchmarkMapBroadcastRedundant/0-12                 	87532734	        13.5 ns/op
BenchmarkMapBroadcastRedundant/1
BenchmarkMapBroadcastRedundant/1-12                 	28488411	        36.3 ns/op
BenchmarkMapBroadcastRedundant/4
BenchmarkMapBroadcastRedundant/4-12                 	19628920	        60.9 ns/op
BenchmarkMapBroadcastRedundant/16
BenchmarkMapBroadcastRedundant/16-12                	 6026980	       192 ns/op
BenchmarkMapBroadcastRedundant/64
BenchmarkMapBroadcastRedundant/64-12                	 1640858	       754 ns/op
BenchmarkQueueBroadcastRedundant
BenchmarkQueueBroadcastRedundant/0
BenchmarkQueueBroadcastRedundant/0-12               	96904807	        12.0 ns/op
BenchmarkQueueBroadcastRedundant/1
BenchmarkQueueBroadcastRedundant/1-12               	73521873	        16.3 ns/op
BenchmarkQueueBroadcastRedundant/4
BenchmarkQueueBroadcastRedundant/4-12               	39209468	        31.2 ns/op
BenchmarkQueueBroadcastRedundant/16
BenchmarkQueueBroadcastRedundant/16-12              	10810058	       105 ns/op
BenchmarkQueueBroadcastRedundant/64
BenchmarkQueueBroadcastRedundant/64-12              	 2998046	       376 ns/op
BenchmarkBroadcasterBroadcastAck
BenchmarkBroadcasterBroadcastAck/1
BenchmarkBroadcasterBroadcastAck/1-12               	44472397	        26.4 ns/op
BenchmarkBroadcasterBroadcastAck/4
BenchmarkBroadcasterBroadcastAck/4-12               	17653509	        69.7 ns/op
BenchmarkBroadcasterBroadcastAck/16
BenchmarkBroadcasterBroadcastAck/16-12              	 4082617	       260 ns/op
BenchmarkBroadcasterBroadcastAck/64
BenchmarkBroadcasterBroadcastAck/64-12              	 1220534	      1027 ns/op
BenchmarkMapBroadcastAck
BenchmarkMapBroadcastAck/1
BenchmarkMapBroadcastAck/1-12                       	26760705	        44.2 ns/op
BenchmarkMapBroadcastAck/4
BenchmarkMapBroadcastAck/4-12                       	11495636	       100 ns/op
BenchmarkMapBroadcastAck/16
BenchmarkMapBroadcastAck/16-12                      	 2937590	       343 ns/op
BenchmarkMapBroadcastAck/64
BenchmarkMapBroadcastAck/64-12                      	  861037	      1344 ns/op
BenchmarkQueueBroadcastAck
BenchmarkQueueBroadcastAck/1
BenchmarkQueueBroadcastAck/1-12                     	19832679	        55.0 ns/op
BenchmarkQueueBroadcastAck/4
BenchmarkQueueBroadcastAck/4-12                     	 5618214	       189 ns/op
BenchmarkQueueBroadcastAck/16
BenchmarkQueueBroadcastAck/16-12                    	 1569980	       713 ns/op
BenchmarkQueueBroadcastAck/64
BenchmarkQueueBroadcastAck/64-12                    	  437672	      2814 ns/op
BenchmarkWaiterNotifyRedundant
BenchmarkWaiterNotifyRedundant-12                   	650823090	         1.96 ns/op
BenchmarkSleeperNotifyRedundant
BenchmarkSleeperNotifyRedundant-12                  	619871544	         1.61 ns/op
BenchmarkChannelNotifyRedundant
BenchmarkChannelNotifyRedundant-12                  	298903778	         3.67 ns/op
BenchmarkWaiterNotifyWaitAck
BenchmarkWaiterNotifyWaitAck-12                     	68358360	        17.8 ns/op
BenchmarkSleeperNotifyWaitAck
BenchmarkSleeperNotifyWaitAck-12                    	25044883	        41.2 ns/op
BenchmarkChannelNotifyWaitAck
BenchmarkChannelNotifyWaitAck-12                    	29572404	        40.2 ns/op
BenchmarkSleeperMultiNotifyWaitAck
BenchmarkSleeperMultiNotifyWaitAck-12               	16122969	        73.8 ns/op
BenchmarkWaiterTempNotifyWaitAck
BenchmarkWaiterTempNotifyWaitAck-12                 	46111489	        25.8 ns/op
BenchmarkSleeperTempNotifyWaitAck
BenchmarkSleeperTempNotifyWaitAck-12                	15541882	        73.6 ns/op
BenchmarkWaiterNotifyWaitMultiAck
BenchmarkWaiterNotifyWaitMultiAck-12                	65878500	        18.2 ns/op
BenchmarkSleeperNotifyWaitMultiAck
BenchmarkSleeperNotifyWaitMultiAck-12               	28798623	        41.5 ns/op
BenchmarkChannelNotifyWaitMultiAck
BenchmarkChannelNotifyWaitMultiAck-12               	11308468	       101 ns/op
BenchmarkWaiterNotifyAsyncWaitAck
BenchmarkWaiterNotifyAsyncWaitAck-12                	 2475387	       492 ns/op
BenchmarkSleeperNotifyAsyncWaitAck
BenchmarkSleeperNotifyAsyncWaitAck-12               	 2184507	       518 ns/op
BenchmarkChannelNotifyAsyncWaitAck
BenchmarkChannelNotifyAsyncWaitAck-12               	 2120365	       562 ns/op
BenchmarkWaiterNotifyAsyncWaitMultiAck
BenchmarkWaiterNotifyAsyncWaitMultiAck-12           	 2351247	       494 ns/op
BenchmarkSleeperNotifyAsyncWaitMultiAck
BenchmarkSleeperNotifyAsyncWaitMultiAck-12          	 2205799	       522 ns/op
BenchmarkChannelNotifyAsyncWaitMultiAck
BenchmarkChannelNotifyAsyncWaitMultiAck-12          	 1238079	       928 ns/op

Updates #1074

PiperOrigin-RevId: 295834087
2020-02-18 15:18:48 -08:00
gVisor bot a3582de618 cpuid: cache the maximum size of xsave state
perf shows that ExtendedStateSize cosumes more than 20% of cpu:

    23.61%    23.61%  [.] pkg/cpuid/cpuid.HostID

PiperOrigin-RevId: 295813263
2020-02-18 13:50:07 -08:00
gVisor bot 906eb6295d atomicbitops package cleanups
- Redocument memory ordering from "no ordering" to "acquire-release". (No
  functional change: both LOCK WHATEVER on x86, and LDAXR/STLXR loops on ARM64,
  already have this property.)

- Remove IncUnlessZeroInt32 and DecUnlessOneInt32, which were only faster than
  the equivalent loops using sync/atomic before the Go compiler inlined
  non-unsafe.Pointer atomics many releases ago.

PiperOrigin-RevId: 295811743
2020-02-18 13:43:28 -08:00
gVisor bot 7fdb609b3e Merge pull request #1850 from kevinGC:jump2
PiperOrigin-RevId: 295785052
2020-02-18 11:41:54 -08:00
Nayana Bidari b30b7f3422 Add nat table support for iptables.
Add nat table support for Prerouting hook with Redirect option.
Add tests to check redirect of ports.
2020-02-18 11:30:42 -08:00
gVisor bot fae3de21af ring0/pagetables: fix typo
PiperOrigin-RevId: 295770717
2020-02-18 10:50:46 -08:00
gVisor bot a5069f820f Remove linux.EpollEvent.Fd.
glibc defines struct epoll_event in such a way that epoll_event.data.fd exists.
However, the kernel's definition of struct epoll_event makes epoll_event.data
an opaque uint64, so naming half of it "fd" just introduces confusion. Remove
the Fd field, and make Data a [2]int32 to compensate.

Also add required padding to linux.EpollEvent on ARM64.

PiperOrigin-RevId: 295250424
2020-02-14 16:19:48 -08:00
gVisor bot 5baf9dc2fb Synchronize signalling with S/R
This is to fix a data race between sending an external signal to
a ThreadGroup and kernel saving state for S/R.

PiperOrigin-RevId: 295244281
2020-02-14 15:49:09 -08:00
gVisor bot 3557b26651 Allow vfs.IterDirentsCallback.Handle() to return an error.
This is easier than storing errors from e.g. CopyOut in the callback.

PiperOrigin-RevId: 295230021
2020-02-14 14:40:35 -08:00
gVisor bot 87bc2834c9 Enable automated marshalling for RSeqCriticalSection.
PiperOrigin-RevId: 295226468
2020-02-14 14:24:27 -08:00
gVisor bot e4c7f3e6f6 Inline vfs.VirtualFilesystem in Kernel struct
This saves one pointer dereference per VFS access.

Updates #1623

PiperOrigin-RevId: 295216176
2020-02-14 13:40:39 -08:00
gVisor bot 50c493193b Un-export p9 message encode/decode functions.
These are not used outside of the p9 package.

PiperOrigin-RevId: 295200052
2020-02-14 12:23:10 -08:00
gVisor bot 3c26f5ecb0 Enable automated marshalling for struct stat.
This requires fixing a few build issues for non-am64 platforms.

PiperOrigin-RevId: 295196922
2020-02-14 12:08:12 -08:00
gVisor bot 4075de11be Plumb VFS2 inside the Sentry
- Added fsbridge package with interface that can be used to open
  and read from VFS1 and VFS2 files.
- Converted ELF loader to use fsbridge
- Added VFS2 types to FSContext
- Added vfs.MountNamespace to ThreadGroup

Updates #1623

PiperOrigin-RevId: 295183950
2020-02-14 11:12:47 -08:00
gVisor bot b2e86906ea Fix various issues related to enabling go-marshal.
- Add missing build tags to files in the abi package.

- Add the marshal package as a sentry dependency, allowed by deps_test.

- Fix an issue with our top-level go_library BUILD rule, which
  incorrectly shadows the variable containing the input set of source
  files. This caused the expansion for the go_marshal clause to
  silently omit input files.

- Fix formatting when copying build tags to gomarshal-generated files.

- Fix a bug with import statement collision detection in go-marshal.

PiperOrigin-RevId: 295112284
2020-02-14 03:27:34 -08:00
Bin Lu ebaf29abeb passed the kvm test case of "TestKernelSyscall" on Arm64
For kvm test case "TestKernelSyscall",
redpill/syscall(-1) in guest kernel level will be trapped in el1_svc.
And in el1_svc, we use mmio_exit to leave the guest.

Signed-off-by: Bin Lu <bin.lu@arm.com>
2020-02-14 02:04:44 -05:00
gVisor bot a6024f7f5f Add FileExec flag to OpenOptions
This allow callers to say whether the file is being
opened to be executed, so that the proper checks can
be done from FilesystemImpl.OpenAt()

Updates #1623

PiperOrigin-RevId: 295042595
2020-02-13 17:57:36 -08:00
Kevin Krakauer 6ef63cd7da We can now create and jump in iptables. For example:
$ iptables -N foochain
$ iptables -A INPUT -j foochain
2020-02-13 17:02:50 -08:00
gVisor bot 16308b9dc1 Merge pull request #1791 from kevinGC:uchains
PiperOrigin-RevId: 294957297
2020-02-13 11:19:09 -08:00
gVisor bot 69bf39e8a4 Internal change.
PiperOrigin-RevId: 294952610
2020-02-13 10:59:52 -08:00
Haibo Xu d30a884775 Add definition of arch.ARMTrapFlag.
Fixes #1708

Signed-off-by: Haibo Xu haibo.xu@arm.com
Change-Id: Ib15768692ead17c81c06f7666ca3f0a14064c3a0
2020-02-13 00:25:16 +00:00
Kevin Krakauer 6fdf2c53a1 iptables: User chains
- Adds creation of user chains via `-N <chainname>`
- Adds `-j RETURN` support for built-in chains, which triggers the
  chain's underflow rule (usually the default policy).
- Adds tests for chain creation, default policies, and `-j RETURN' from
  built-in chains.
2020-02-12 15:02:47 -08:00
gVisor bot 5205bc7e58 Simplify atomic operations
PiperOrigin-RevId: 294582802
2020-02-11 20:37:01 -08:00
gVisor bot 6dced977ea Ensure fsimpl/gofer.dentryPlatformFile.hostFileMapper is initialized.
Fixes #1812. (The more direct cause of the deadlock is panic unsafety because
the historically high cost of defer means that we avoid it in hot paths,
including much of MM; defer is much cheaper as of Go 1.14, but still a
measurable overhead.)

PiperOrigin-RevId: 294560316
2020-02-11 17:38:57 -08:00
gVisor bot b8e22e241c Disallow duplicate NIC names.
PiperOrigin-RevId: 294500858
2020-02-11 12:59:11 -08:00
Adin Scannell 115898e368 Prevent DATA RACE in UnstableAttr.
The slaveInodeOperations is currently copying the object when
truncate is called (which is a no-op). This may result in a
(unconsequential) data race when being modified concurrently.

PiperOrigin-RevId: 294484276
2020-02-11 11:38:08 -08:00
gVisor bot 762e4761cc Move Align{Up,Down} into binary package.
PiperOrigin-RevId: 294477647
2020-02-11 11:09:31 -08:00
gVisor bot 0dd9ee0d1e Merge pull request #1775 from kevinGC:tcp-matchers-submit
PiperOrigin-RevId: 294340468
2020-02-10 17:21:13 -08:00
Adin Scannell 71af006b6f Cleanup internal package group.
PiperOrigin-RevId: 294339229
2020-02-10 17:12:59 -08:00
Dean Deng 475316e87d Refactor getxattr.
Put most of the logic for getxattr in one place for clarity. This simplifies
FGetXattr and getXattrFromPath, which are just wrappers for getXattr.

PiperOrigin-RevId: 294308332
2020-02-10 14:47:47 -08:00
Adin Scannell 2889ffa84e Add context to note.
PiperOrigin-RevId: 294300040
2020-02-10 14:11:52 -08:00
Adin Scannell a6f9361c2f Add context to comments.
PiperOrigin-RevId: 294295852
2020-02-10 13:52:09 -08:00
Adin Scannell bb22ebd7fb Add contextual comment.
PiperOrigin-RevId: 294289066
2020-02-10 13:21:30 -08:00
Adin Scannell 4d4d47f0c0 Add contextual note.
PiperOrigin-RevId: 294285723
2020-02-10 13:05:27 -08:00
Adin Scannell c9a18b16ad Document MinimumTotalMemoryBytes.
PiperOrigin-RevId: 294273559
2020-02-10 12:08:32 -08:00
Fabricio Voznika bfa0bba72a Redirect FIXME to gvisor.dev
PiperOrigin-RevId: 294272755
2020-02-10 12:04:38 -08:00
Brad Burlage 20840bfec0 Move x86 state definition to its own file.
PiperOrigin-RevId: 294271541
2020-02-10 12:00:46 -08:00
Adin Scannell 0efa8168c7 Update visibility.
PiperOrigin-RevId: 294265019
2020-02-10 11:30:21 -08:00
gVisor bot a03b40ca17 Merge pull request #1453 from xiaobo55x:cpuid
PiperOrigin-RevId: 294257911
2020-02-10 11:01:08 -08:00
Haibo Xu 9cbf5a3dcc Enable pkg/cpuid support on arm64.
Fixes #1255

Signed-off-by: Haibo Xu <haibo.xu@arm.com>
Change-Id: I8614e6f3ee321c2989567e4e712aa8f28cc9db14
2020-02-10 02:46:05 +00:00
Dean Deng 17b9f5e662 Support listxattr and removexattr syscalls.
Note that these are only implemented for tmpfs, and other impls will still
return EOPNOTSUPP.

PiperOrigin-RevId: 293899385
2020-02-07 14:47:13 -08:00
Ian Gudger e1587a2887 Log level, optname, optval and optlen in getsockopt/setsockopt in strace.
Log 8, 16, and 32 int optvals and dump the memory of other sizes.

Updates #1782

PiperOrigin-RevId: 293889388
2020-02-07 14:01:47 -08:00
Kevin Krakauer c141eb5f43 Address GH comments. 2020-02-07 13:47:57 -08:00
Ghanan Gowripalan ca30dfa065 Send DAD event when DAD resolves immediately
Previously, a DAD event would not be sent if DAD was disabled.

This allows integrators to do some work when an IPv6 address is bound to
a NIC without special logic that checks if DAD is enabled.

Without this change, integrators would need to check if a NIC has DAD
enabled when an address is auto-generated. If DAD is enabled, it would
need to delay the work until the DAD completion event; otherwise, it
would need to do the work in the address auto-generated event handler.

Test: stack_test.TestDADDisabled
PiperOrigin-RevId: 293732914
2020-02-06 19:50:34 -08:00
Kevin Krakauer d98287f5eb Merge branch 'master' into tcp-matchers-submit 2020-02-06 17:07:04 -08:00
Ghanan Gowripalan 3700221b1f Auto-generate link-local address as a SLAAC address
Auto-generated link-local addresses should have the same lifecycle hooks
as global SLAAC addresses.

The Stack's NDP dispatcher should be notified when link-local addresses
are auto-generated and invalidated. They should also be removed when a
NIC is disabled (which will be supported in a later change).

Tests:
- stack_test.TestNICAutoGenAddrWithOpaque
- stack_test.TestNICAutoGenAddr
PiperOrigin-RevId: 293706760
2020-02-06 16:43:39 -08:00
Ghanan Gowripalan 940d255971 Perform DAD on IPv6 addresses when enabling a NIC
Addresses may be added before a NIC is enabled. Make sure DAD is
performed on the permanent IPv6 addresses when they get enabled.

Test:
- stack_test.TestDoDADWhenNICEnabled
- stack.TestDisabledRxStatsWhenNICDisabled
PiperOrigin-RevId: 293697429
2020-02-06 15:58:16 -08:00
Ian Gudger 736775e0ac Make gonet consistent both internally and with the net package.
The types gonet.Conn and gonet.PacketConn were confusingly named as both
implemented net.Conn. Further, gonet.Conn was perhaps unexpectedly
TCP-specific (net.Conn is not). This change renames them to gonet.TCPConn and
gonet.UDPConn.

Renames gonet.NewListener to gonet.ListenTCP and adds a new gonet.NewTCPListner
function to be consistent with both the gonet.DialXxx and gonet.NewXxxConn
functions as well as net.ListenTCP.

Updates #1632

PiperOrigin-RevId: 293671303
2020-02-06 14:07:04 -08:00
Ghanan Gowripalan 6bd59b4e08 Update link address for targets of Neighbor Adverts
Get the link address for the target of an NDP Neighbor Advertisement
from the NDP Target Link Layer Address option.

Tests:
- ipv6.TestNeighorAdvertisementWithTargetLinkLayerOption
- ipv6.TestNeighorAdvertisementWithInvalidTargetLinkLayerOption
PiperOrigin-RevId: 293632609
2020-02-06 11:13:29 -08:00
Andrei Vagin 5ff780891e Move p9.pool to a separate package
PiperOrigin-RevId: 293617493
2020-02-06 10:07:45 -08:00
Adin Scannell 1b6a12a768 Add notes to relevant tests.
These were out-of-band notes that can help provide additional context
and simplify automated imports.

PiperOrigin-RevId: 293525915
2020-02-05 22:46:35 -08:00
Eyal Soha f3d9560703 recv() on a closed TCP socket returns ENOTCONN
From RFC 793 s3.9 p58 Event Processing:

If RECEIVE Call arrives in CLOSED state and the user has access to such a
connection, the return should be "error: connection does not exist"

Fixes #1598

PiperOrigin-RevId: 293494287
2020-02-05 17:56:42 -08:00
Kevin Krakauer bf0ea204e9 Merge branch 'master' into tcp-matchers-submit 2020-02-05 14:43:11 -08:00
Nicolas Lacasse eea0eeee93 Disable get/set xattrs until list/remove exist too.
PiperOrigin-RevId: 293411655
2020-02-05 11:26:19 -08:00
Ting-Yu Wang 665b614e4a Support RTM_NEWADDR and RTM_GETLINK in (rt)netlink.
PiperOrigin-RevId: 293271055
2020-02-04 18:05:03 -08:00
gVisor bot b29aeebaf6 Merge pull request #1683 from kevinGC:ipt-udp-matchers
PiperOrigin-RevId: 293243342
2020-02-04 16:20:16 -08:00
Ian Gudger a26a954946 Add socket connection stress test.
Tests 65k connection attempts on common types of sockets to check for port
leaks.

Also fixes a bug where dual-stack sockets wouldn't properly re-queue
segments received while closing.

PiperOrigin-RevId: 293241166
2020-02-04 15:54:49 -08:00
Michael Pratt 6823b5e244 timer_create(2) should return 0 on success
The timer ID is copied out to the argument.

Fixes #1738

PiperOrigin-RevId: 293210801
2020-02-04 13:27:39 -08:00
Adin Scannell f5072caaf8 Fix safecopy test.
This is failing in Go1.14 due to new checkptr constraints. This version
should avoid hitting this constraints by doing "dangerous" pointer math
less dangerously?

PiperOrigin-RevId: 293205764
2020-02-04 12:47:06 -08:00
Fabricio Voznika dcffddf0ca Remove argument from vfs.MountNamespace.DecRef()
Updates #1035

PiperOrigin-RevId: 293194631
2020-02-04 11:48:36 -08:00
Jamie Liu 492229d017 VFS2 gofer client
Updates #1198

Opening host pipes (by spinning in fdpipe) and host sockets is not yet
complete, and will be done in a future CL.

Major differences from VFS1 gofer client (sentry/fs/gofer), with varying levels
of backportability:

- "Cache policies" are replaced by InteropMode, which control the behavior of
  timestamps in addition to caching. Under InteropModeExclusive (analogous to
  cacheAll) and InteropModeWritethrough (analogous to cacheAllWritethrough),
  client timestamps are *not* written back to the server (it is not possible in
  9P or Linux for clients to set ctime, so writing back client-authoritative
  timestamps results in incoherence between atime/mtime and ctime). Under
  InteropModeShared (analogous to cacheRemoteRevalidating), client timestamps
  are not used at all (remote filesystem clocks are authoritative). cacheNone
  is translated to InteropModeShared + new option
  filesystemOptions.specialRegularFiles.

- Under InteropModeShared, "unstable attribute" reloading for permission
  checks, lookup, and revalidation are fused, which is feasible in VFS2 since
  gofer.filesystem controls path resolution. This results in a ~33% reduction
  in RPCs for filesystem operations compared to cacheRemoteRevalidating. For
  example, consider stat("/foo/bar/baz") where "/foo/bar/baz" fails
  revalidation, resulting in the instantiation of a new dentry:

  VFS1 RPCs:
  getattr("/")                          // fs.MountNamespace.FindLink() => fs.Inode.CheckPermission() => gofer.inodeOperations.check() => gofer.inodeOperations.UnstableAttr()
  walkgetattr("/", "foo") = fid1        // fs.Dirent.walk() => gofer.session.Revalidate() => gofer.cachePolicy.Revalidate()
  clunk(fid1)
  getattr("/foo")                       // CheckPermission
  walkgetattr("/foo", "bar") = fid2     // Revalidate
  clunk(fid2)
  getattr("/foo/bar")                   // CheckPermission
  walkgetattr("/foo/bar", "baz") = fid3 // Revalidate
  clunk(fid3)
  walkgetattr("/foo/bar", "baz") = fid4 // fs.Dirent.walk() => gofer.inodeOperations.Lookup
  getattr("/foo/bar/baz")               // linux.stat() => gofer.inodeOperations.UnstableAttr()

  VFS2 RPCs:
  getattr("/")                          // gofer.filesystem.walkExistingLocked()
  walkgetattr("/", "foo") = fid1        // gofer.filesystem.stepExistingLocked()
  clunk(fid1)
                                        // No getattr: walkgetattr already updated metadata for permission check
  walkgetattr("/foo", "bar") = fid2
  clunk(fid2)
  walkgetattr("/foo/bar", "baz") = fid3
                                        // No clunk: fid3 used for new gofer.dentry
                                        // No getattr: walkgetattr already updated metadata for stat()

- gofer.filesystem.unlinkAt() does not require instantiation of a dentry that
  represents the file to be deleted. Updates #898.

- gofer.regularFileFD.OnClose() skips Tflushf for regular files under
  InteropModeExclusive, as it's nonsensical to request a remote file flush
  without flushing locally-buffered writes to that remote file first.

- Symlink targets are cached when InteropModeShared is not in effect.

- p9.QID.Path (which is already required to be unique for each file within a
  server, and is accordingly already synthesized from device/inode numbers in
  all known gofers) is used as-is for inode numbers, rather than being mapped
  along with attr.RDev in the client to yet another synthetic inode number.

- Relevant parts of fsutil.CachingInodeOperations are inlined directly into
  gofer package code. This avoids having to duplicate part of its functionality
  in fsutil.HostMappable.

PiperOrigin-RevId: 293190213
2020-02-04 11:29:22 -08:00
Fabricio Voznika d7cd484091 Add support for sentry internal pipe for gofer mounts
Internal pipes are supported similarly to how internal UDS is done.
It is also controlled by the same flag.

Fixes #1102

PiperOrigin-RevId: 293150045
2020-02-04 08:20:52 -08:00
Andrei Vagin f37e913a35 seccomp: allow to filter syscalls by instruction pointer
PiperOrigin-RevId: 293029446
2020-02-03 16:16:18 -08:00
Ian Gudger 02997af5ab Fix method comment to match method name.
PiperOrigin-RevId: 292624867
2020-01-31 15:09:13 -08:00
Dean Deng 6c3072243d Implement file locks for regular tmpfs files in VFSv2.
Add a file lock implementation that can be embedded into various filesystem
implementations.

Updates #1480

PiperOrigin-RevId: 292614758
2020-01-31 14:15:41 -08:00
Ghanan Gowripalan 77bf586db7 Use multicast Ethernet address for multicast NDP
As per RFC 2464 section 7, an IPv6 packet with a multicast destination
address is transmitted to the mapped Ethernet multicast address.

Test:
- ipv6.TestLinkResolution
- stack_test.TestDADResolve
- stack_test.TestRouterSolicitation
PiperOrigin-RevId: 292610529
2020-01-31 13:55:46 -08:00
Kevin Krakauer 29ad5762e4 Spelling 2020-01-31 13:53:58 -08:00
Kevin Krakauer eba7bdc24d iptables: enable TCP matching with "-m tcp".
A couple other things that changed:

- There's a proper extension registration system for matchers. Anyone
  adding another matcher can use tcp_matcher.go or udp_matcher.go as a
  template.
- All logging and use of syserr.Error in the netfilter package happens at the
  highest possible level (public functions). Lower-level functions just
  return normal, descriptive golang errors.
2020-01-31 13:46:13 -08:00
Ghanan Gowripalan 528dd1ec72 Extract multicast IP to Ethernet address mapping
Test: header.TestEthernetAddressFromMulticastIPAddress
PiperOrigin-RevId: 292604649
2020-01-31 13:25:48 -08:00
gVisor bot bc3a24d627 Internal change.
PiperOrigin-RevId: 292587459
2020-01-31 13:19:42 -08:00
Kevin Krakauer 2142c70118 Merge branch 'master' into ipt-udp-matchers 2020-01-30 14:56:50 -08:00
Bhasker Hariharan 4ee64a248e Fix for panic in endpoint.Close().
When sending a RST on shutdown we need to double check the
state after acquiring the work mutex as the endpoint could
have transitioned out of a connected state from the time
we checked it and we acquired the workMutex.

I added two tests but sadly neither reproduce the panic. I am
going to leave the tests in as they are good to have anyway.

PiperOrigin-RevId: 292393800
2020-01-30 12:00:35 -08:00
gVisor bot 757b2b87fe Merge pull request #1288 from lubinszARM:pr_ring0_6
PiperOrigin-RevId: 292369598
2020-01-30 10:01:31 -08:00
Michael Pratt ede8dfab37 Enforce splice offset limits
Splice must not allow negative offsets. Writes also must not allow offset +
size to overflow int64. Reads are similarly broken, but not just in splice
(b/148095030).

Reported-by: syzbot+0e1ff0b95fb2859b4190@syzkaller.appspotmail.com
PiperOrigin-RevId: 292361208
2020-01-30 09:14:31 -08:00
Ghanan Gowripalan ec0679737e Do not include the Source Link Layer option with an unspecified source address
When sending NDP messages with an unspecified source address, the Source
Link Layer address must not be included.

Test: stack_test.TestDADResolve
PiperOrigin-RevId: 292341334
2020-01-30 07:12:52 -08:00
Ghanan Gowripalan 6f841c304d Do not spawn a goroutine when calling stack.NDPDispatcher's methods
Do not start a new goroutine when calling
stack.NDPDispatcher.OnDuplicateAddressDetectionStatus.

PiperOrigin-RevId: 292268574
2020-01-29 19:55:00 -08:00
Bhasker Hariharan 51b783505b Add support for TCP_DEFER_ACCEPT.
PiperOrigin-RevId: 292233574
2020-01-29 15:53:45 -08:00
Kevin Krakauer b615f94aea Merge branch 'master' into ipt-udp-matchers 2020-01-29 13:21:12 -08:00
Dean Deng 148fda60e8 Add plumbing for file locks in VFS2.
Updates #1480

PiperOrigin-RevId: 292180192
2020-01-29 11:39:28 -08:00
Andrei Vagin 37bb502670 sentry: rename SetRSEQInterruptedIP to SetOldRSeqInterruptedIP for arm64
For amd64, this has been done on cl/288342928.

PiperOrigin-RevId: 292170856
2020-01-29 10:47:28 -08:00
Jamie Liu 8dcedc953a Add //pkg/sentry/devices/memdev.
PiperOrigin-RevId: 292165063
2020-01-29 10:09:31 -08:00
Bin Lu 6adbdfe232 supporting sError in guest kernel on Arm64
For test case 'TestBounce', we use KVM_SET_VCPU_EVENTS to trigger sError
to leave guest.

Signed-off-by: Bin Lu <bin.lu@arm.com>
2020-01-29 07:50:38 -05:00
Dean Deng 4cb55a7a3b Prevent arbitrary size allocation when sending UDS messages.
Currently, Send() will copy data into a new byte slice without regard to the
original size. Size checks should be performed before the allocation takes
place.

Note that for the sake of performance, we avoid putting the buffer
allocation into the critical section. As a result, the size checks need to be
performed again within Enqueue() in case the limit has changed.

PiperOrigin-RevId: 292058147
2020-01-28 18:46:14 -08:00
Fabricio Voznika 396c574db2 Add support for WritableSource in DynamicBytesFileDescriptionImpl
WritableSource is a convenience interface used for files that can
be written to, e.g. /proc/net/ipv4/tpc_sack. It reads max of 4KB
and only from offset 0 which should cover most cases. It can be
extended as neeed.

Updates #1195

PiperOrigin-RevId: 292056924
2020-01-28 18:31:28 -08:00
Fabricio Voznika 3d046fef06 Changes missing in last submit
Updates #1487
Updates #1623

PiperOrigin-RevId: 292040835
2020-01-28 16:53:55 -08:00
Ghanan Gowripalan 431ff52768 Update link address for senders of Neighbor Solicitations
Update link address for senders of NDP Neighbor Solicitations when the NS
contains an NDP Source Link Layer Address option.

Tests:
- ipv6.TestNeighorSolicitationWithSourceLinkLayerOption
- ipv6.TestNeighorSolicitationWithInvalidSourceLinkLayerOption
PiperOrigin-RevId: 292028553
2020-01-28 15:45:36 -08:00
Fabricio Voznika 437c986c6a Add vfs.FileDescription to FD table
FD table now holds both VFS1 and VFS2 types and uses the correct
one based on what's set.

Parts of this CL are just initial changes (e.g. sys_read.go,
runsc/main.go) to serve as a template for the remaining changes.

Updates #1487
Updates #1623

PiperOrigin-RevId: 292023223
2020-01-28 15:31:03 -08:00
Jamie Liu 2862b0b1be Add //pkg/sentry/fsimpl/devtmpfs.
PiperOrigin-RevId: 292021389
2020-01-28 15:05:24 -08:00
Ghanan Gowripalan ce0bac4be9 Include the NDP Source Link Layer option when sending DAD messages
Test: stack_test.TestDADResolve
PiperOrigin-RevId: 292003124
2020-01-28 13:52:04 -08:00
Andrei Vagin f263801a74 fs/splice: don't report partial errors for special files
Special files can have additional requirements for granularity.
For example, read from eventfd returns EINVAL if a size is less 8 bytes.

Reported-by: syzbot+3905f5493bec08eb7b02@syzkaller.appspotmail.com
PiperOrigin-RevId: 292002926
2020-01-28 13:37:19 -08:00
Jamie Liu 34fbd8446c Add VFS2 support for epoll.
PiperOrigin-RevId: 291997879
2020-01-28 13:11:43 -08:00
Jianfeng Tan d99329e584 netlink: add support for RTM_F_LOOKUP_TABLE
Test command:
  $ ip route get 1.1.1.1

Fixes: #1099

Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com>
COPYBARA_INTEGRATE_REVIEW=https://github.com/google/gvisor/pull/1121 from tanjianfeng:fix-1099 e6919f3d4ede5aa51a48b3d2be0d7a4b482dd53d
PiperOrigin-RevId: 291990716
2020-01-28 12:32:59 -08:00
Jamie Liu 1119644080 Implement an anon_inode equivalent for VFS2.
PiperOrigin-RevId: 291986033
2020-01-28 12:08:00 -08:00
Michael Pratt 76483b8b1e Check sigsetsize in rt_sigaction
This isn't in the libc wrapper, but it is in the syscall itself.

Discovered by @xiaobo55x in #1625.

PiperOrigin-RevId: 291973931
2020-01-28 11:26:09 -08:00
Ghanan Gowripalan 2a2da5be31 Add a type to represent the NDP Source Link Layer Address option
Tests:
- header.TestNDPSourceLinkLayerAddressOptionEthernetAddress
- header.TestNDPSourceLinkLayerAddressOptionSerialize
- header.TestNDPOptionsIterCheck
- header.TestNDPOptionsIter
PiperOrigin-RevId: 291856429
2020-01-27 20:51:28 -08:00
Kevin Krakauer d6a2e01d3e Address GH comments. 2020-01-27 16:40:46 -08:00
gVisor bot db68c85ab7 Merge pull request #1561 from zhangningdlut:chris_tty
PiperOrigin-RevId: 291821850
2020-01-27 16:35:38 -08:00
Adin Scannell 253c9e666c Cleanup glog and add real caller information.
In general, we've learned that logging must be avoided at all
costs in the hot path. It's unlikely that the optimizations
here were significant in any case, since buffer would certainly
escape.

This also adds a test to ensure that the caller identification
works as expected, and so that logging can be benchmarked.

Original:
BenchmarkGoogleLogging-6   	 1222255	       949 ns/op

With this change:
BenchmarkGoogleLogging-6   	  517323	      2346 ns/op

Fixes #184

PiperOrigin-RevId: 291815420
2020-01-27 16:08:35 -08:00
Adin Scannell 0e2f1b7abd Update package locations.
Because the abi will depend on the core types for marshalling (usermem,
context, safemem, safecopy), these need to be flattened from the sentry
directory. These packages contain no sentry-specific details.

PiperOrigin-RevId: 291811289
2020-01-27 15:31:32 -08:00
gVisor bot 60d7ff73e1 Merge pull request #1676 from majek:marek/FIX-1632-expose-NewPacketConn
PiperOrigin-RevId: 291803499
2020-01-27 15:30:32 -08:00
Adin Scannell 90ec596166 Fix licenses.
The preferred Copyright holder is "The gVisor Authors".

PiperOrigin-RevId: 291786657
2020-01-27 13:23:57 -08:00
Bhasker Hariharan fbfcfcf5b0 Update ChecksumVVWithoffset to use unrolled version.
Fixes #1656

PiperOrigin-RevId: 291777279
2020-01-27 13:05:26 -08:00
Dean Deng 13c1f38dfa Update bug number for supporting extended attribute namespaces.
PiperOrigin-RevId: 291774815
2020-01-27 12:50:18 -08:00
Ting-Yu Wang 6b14be4246 Refactor to hide C from channel.Endpoint.
This is to aid later implementation for /dev/net/tun device.

PiperOrigin-RevId: 291746025
2020-01-27 12:31:47 -08:00
Kevin Krakauer e889f95671 More cleanup. 2020-01-27 12:30:06 -08:00
Kevin Krakauer 29316e66ad Cleanup for GH review. 2020-01-27 12:27:04 -08:00
Adin Scannell d29e59af9f Standardize on tools directory.
PiperOrigin-RevId: 291745021
2020-01-27 12:21:00 -08:00
Kevin Krakauer 67243ca51c Merge branch 'master' into ipt-udp-matchers 2020-01-27 10:09:51 -08:00
Marek Majkowski 45398b160f Expose gonet.NewPacketConn, for parity with gonet.NewConn API
gonet.Conn can be created with both gonet.NewConn and gonet.Dial.
gonet.PacketConn was created only by gonet.DialUDP. This prevented
us from being able to use PacketConn in udp.NewForwarder() context.

This simple constructor - NewPacketConn, allows user to create
correct structure from that context.
2020-01-27 13:51:15 +00:00
Bhasker Hariharan 6b43cf791a Replace calculateChecksum w/ the unrolled version.
Fixes #1656

PiperOrigin-RevId: 291703760
2020-01-27 05:34:34 -08:00
Bhasker Hariharan 68514d4ba3 Unroll checksum computation loop.
Checksum computation is one of the most expensive bits of
packet processing. Manual unrolling of the loop provides
significant improvement in checksum speed.

Updates #1656

BenchmarkChecksum/checksum_64-12                49834124                23.6 ns/op
BenchmarkChecksum/checksum_128-12               27111997                44.1 ns/op
BenchmarkChecksum/checksum_256-12               11416683                91.5 ns/op
BenchmarkChecksum/checksum_512-12                6375298               174 ns/op
BenchmarkChecksum/checksum_1024-12               3403852               338 ns/op
BenchmarkChecksum/checksum_1500-12               2343576               493 ns/op
BenchmarkChecksum/checksum_2048-12               1730521               656 ns/op
BenchmarkChecksum/checksum_4096-12                920469              1327 ns/op
BenchmarkChecksum/checksum_8192-12                445885              2637 ns/op
BenchmarkChecksum/checksum_16384-12               226342              5268 ns/op
BenchmarkChecksum/checksum_32767-12               114210             10503 ns/op
BenchmarkChecksum/checksum_32768-12                99138             10610 ns/op
BenchmarkChecksum/checksum_65535-12                53438             21158 ns/op
BenchmarkChecksum/checksum_65536-12                52993             21067 ns/op
BenchmarkUnrolledChecksum/checksum_64-12        61035639                19.1 ns/op
BenchmarkUnrolledChecksum/checksum_128-12               36067015                33.6 ns/op
BenchmarkUnrolledChecksum/checksum_256-12               19731220                60.4 ns/op
BenchmarkUnrolledChecksum/checksum_512-12                9091291               116 ns/op
BenchmarkUnrolledChecksum/checksum_1024-12               4976406               226 ns/op
BenchmarkUnrolledChecksum/checksum_1500-12               3685224               328 ns/op
BenchmarkUnrolledChecksum/checksum_2048-12               2579108               447 ns/op
BenchmarkUnrolledChecksum/checksum_4096-12               1350475               887 ns/op
BenchmarkUnrolledChecksum/checksum_8192-12                658248              1780 ns/op
BenchmarkUnrolledChecksum/checksum_16384-12               335869              3534 ns/op
BenchmarkUnrolledChecksum/checksum_32767-12               168650              7095 ns/op
BenchmarkUnrolledChecksum/checksum_32768-12               168075              7098 ns/op
BenchmarkUnrolledChecksum/checksum_65535-12                75085             14277 ns/op
BenchmarkUnrolledChecksum/checksum_65536-12                75921             14127 ns/op

PiperOrigin-RevId: 291643290
2020-01-26 18:35:01 -08:00
Kevin Krakauer 2946fe8162 We can now actually write out the udp matcher. 2020-01-24 17:12:03 -08:00
Jamie Liu 18a7e1309d Add support for device special files to VFS2 tmpfs.
PiperOrigin-RevId: 291471892
2020-01-24 17:07:54 -08:00
Ghanan Gowripalan 878bda6e19 Lock the NIC when checking if an address is tentative
PiperOrigin-RevId: 291426657
2020-01-24 13:10:08 -08:00
Jamie Liu d135b5abf6 Add anonymous device number allocation to VFS2.
Note that in VFS2, filesystem device numbers are per-vfs.FilesystemImpl rather
than global, avoiding the need for a "registry" type to handle save/restore.
(This is more consistent with Linux anyway: compare e.g.
mm/shmem.c:shmem_mount() => fs/super.c:mount_nodev() => (indirectly)
set_anon_super().)

PiperOrigin-RevId: 291425193
2020-01-24 12:54:39 -08:00
Ghanan Gowripalan fb80979e3f Increase timeouts for NDP tests' async events
Increase the timeout to 1s when waiting for async NDP events to help
reduce flakiness. This will not significantly increase test times as the
async events continue to receive an event on a channel. The increased
timeout allows more time for an event to be sent on the channel as the
previous timeout of 100ms caused some flakes.

Test: Existing tests pass
PiperOrigin-RevId: 291420936
2020-01-24 12:30:23 -08:00
Michael Pratt 390bb9c241 Ignore external SIGURG
Go 1.14+ sends SIGURG to Ms to attempt asynchronous preemption of a G. Since it
can't guarantee that a SIGURG is only related to preemption, it continues to
forward them to signal.Notify (see runtime.sighandler).

We should ignore these signals, as applications shouldn't receive them. Note
that this means that truly external SIGURG can no longer be sent to the
application (as with SIGCHLD).

PiperOrigin-RevId: 291415357
2020-01-24 12:01:04 -08:00
Kevin Krakauer 7636478a31 Merge branch 'master' into ipt-udp-matchers 2020-01-24 10:42:43 -08:00
Nicolas Lacasse 3db317390b Remove epoll entry from map when dropping it.
This pattern (delete from map when dropping) is also used in epoll.RemoveEntry,
and seems like generally a good idea.

PiperOrigin-RevId: 291268208
2020-01-23 16:19:10 -08:00
gVisor bot 3d10edc942 Merge pull request #1617 from kevinGC:iptables-write-filter-proto
PiperOrigin-RevId: 291249314
2020-01-23 14:48:39 -08:00
Michael Pratt 7a79715504 Check for EINTR from KVM_CREATE_VM
The kernel may return EINTR from:

kvm_create_vm
  kvm_init_mmu_notifier
    mmu_notifier_register
      do_mmu_notifier_register
        mm_take_all_locks

Go 1.14's preemptive scheduling signals make hitting this much more likely.

PiperOrigin-RevId: 291212669
2020-01-23 11:49:02 -08:00
Rahat Mahmood 896bd654b6 De-duplicate common test functionality for VFS2 filesystems.
PiperOrigin-RevId: 291041576
2020-01-22 15:16:21 -08:00
Ghanan Gowripalan 1d97adaa6d Use embedded mutex pattern for stack.NIC
- Wrap NIC's fields that should only be accessed while holding the mutex in
  an anonymous struct with the embedded mutex.
- Make sure NIC's spoofing and promiscuous mode flags are only read while
  holding the NIC's mutex.
- Use the correct endpoint when sending DAD messages.
- Do not hold the NIC's lock when sending DAD messages.

This change does not introduce any behaviour changes.

Tests: Existing tests continue to pass.
PiperOrigin-RevId: 291036251
2020-01-22 14:51:53 -08:00
Kevin Krakauer b7853f688b Error marshalling the matcher.
The iptables binary is looking for libxt_.so when it should be looking
for libxt_udp.so, so it's having an issue reading the data in
xt_match_entry. I think it may be an alignment issue.

Trying to fix this is leading to me fighting with the metadata struct,
so I'm gonna go kill that.
2020-01-22 14:46:15 -08:00
gVisor bot 0e7f417293 Merge pull request #1631 from majek:fix-gonet-udp.RemoteAddr
PiperOrigin-RevId: 291019296
2020-01-22 13:35:07 -08:00