Linux use the task.thread.uw.tp_value field to store the
TLS pointer on arm64 platform, and we use a similar way
in gvisor to store it in the arch/State struct.
Signed-off-by: Haibo Xu <haibo.xu@arm.com>
Change-Id: Ie76b5c6d109bc27ccfd594008a96753806db7764
In VFS2, imported file descriptors are stored in a kernfs-based filesystem.
Upon calling ImportFD, the host fd can be accessed in two ways:
1. a FileDescription that can be added to the FDTable, and
2. a Dentry in the host.filesystem mount, which we will want to access through
magic symlinks in /proc/[pid]/fd/.
An implementation of the kernfs.Inode interface stores a unique host fd. This
inode can be inserted into file descriptions as well as dentries.
This change also plumbs in three FileDescriptionImpls corresponding to fds for
sockets, TTYs, and other files (only the latter is implemented here).
These implementations will mostly make corresponding syscalls to the host.
Where possible, the logic is ported over from pkg/sentry/fs/host.
Updates #1672
PiperOrigin-RevId: 299417263
When list elements are removed from a list but not discarded, it becomes
important to invalidate the references they hold to their former
neighbors to prevent memory leaks.
PiperOrigin-RevId: 299412421
If the NIC has a valid IPv6 address assigned, use it as the
source address for outgoing NDP Router Solicitation packets.
Test: stack_test.TestRouterSolicitation
PiperOrigin-RevId: 299398763
Adds an oom_score_adj and oom_score proc file stub. oom_score_adj accepts
writes of values -1000 to 1000 and persists the value with the task. New tasks
inherit the parent's oom_score_adj.
oom_score is a read-only stub that always returns the value '0'.
Issue #202
PiperOrigin-RevId: 299245355
By putting slices into the pool, the slice header escapes. This can be avoided
by not putting the slice header into the pool.
This removes an allocation from the TCP segment send path.
PiperOrigin-RevId: 299215480
gofer.dentryReadWriter.WriteFromBlocks was not updating
gofer.dentry.size after a write operation that skips the
cache.
Updates #1198
PiperOrigin-RevId: 298708646
The support of PTRACE_SYSEMU on arm64 was added in the 5.3 kernel,
so we can be sure that the current version is higher that 5.3.
And this change moves vsyscall seccomp rules to the arch specific file,
because vsyscall isn't supported on arm64.
PiperOrigin-RevId: 298696493
Ensures that all access to TransportEndpointInfo.ID is either:
* In a function ending in a Locked suffix.
* While holding the appropriate mutex.
This primary affects the checkV4Mapped method on affected endpoints, which has
been renamed to checkV4MappedLocked. Also document the method and change its
argument to be a value instead of a pointer which had caused some awkwardness.
This race was possible in the udp and icmp endpoints between Connect and uses
of TransportEndpointInfo.ID including in both itself and Bind.
The tcp endpoint did not suffer from this bug, but benefited from better
documentation.
Updates #357
PiperOrigin-RevId: 298682913
A deadlock may occur if a write lock on a RWMutex is blocked between
nested read lock attempts as the inner read lock attempt will be
blocked in this scenario.
Example (T1 and T2 are differnt goroutines):
T1: obtain read-lock
T2: attempt write-lock (blocks)
T1: attempt inner/nested read-lock (blocks)
Here we can see that T1 and T2 are deadlocked.
Tests: Existing tests pass.
PiperOrigin-RevId: 298426678
This is needed for syzkaller to proper classify issues.
Right now, all watchdog issues are duped to one with the
subject "panic: Sentry detected stuck task(s). See stack
trace and message above for more details".
PiperOrigin-RevId: 297975363
/dev/net/tun does not currently work with hostinet. This has caused some
program starts failing because it thinks the feature exists.
PiperOrigin-RevId: 297876196
Call stack.Close on stacks when we are done with them in tcp_test. This avoids
leaking resources and reduces the test's flakiness when race/gotsan is enabled.
It also provides test coverage for the race also fixed in this change, which
can be reliably triggered with the stack.Close change (and without the other
changes) when race/gotsan is enabled.
The race was possible when calling Abort (via stack.Close) on an endpoint
processing a SYN segment as part of a passive connect.
Updates #1564
PiperOrigin-RevId: 297685432
Enables commands such as:
$ iptables -A INPUT -d 127.0.0.1 -j ACCEPT
$ iptables -t nat -A PREROUTING ! -d 127.0.0.1 -j REDIRECT
Also adds a bunch of REDIRECT+destination tests.
MayDelete must lock the directory also, otherwise concurrent renames may
race. Note that this also changes the methods to be aligned with the actual
Remove and RemoveDirectory methods to minimize confusion when reading the
code. (It was hard to see that resolution was correct.)
PiperOrigin-RevId: 297258304
pipe and pipe2 aren't ported, pending a slight rework of pipe FDs for VFS2.
mount and umount2 aren't ported out of temporary laziness. access and faccessat
need additional FSImpl methods to implement properly, but are stubbed to
prevent googletest from CHECK-failing. Other syscalls require additional
plumbing.
Updates #1623
PiperOrigin-RevId: 297188448
Each mount is holds a reference on a root Dirent, but the mount itself may
live beyond it's own reference. This means that a call to Root() can come
after the associated reference has been dropped.
Instead of introducing a separate layer of references for mount objects,
we simply change the Root() method to use TryIncRef() and allow it to return
nil if the mount is already gone. This requires updating a small number of
callers and minimizes the change (since VFSv2 will replace this code shortly).
PiperOrigin-RevId: 297174230
TestCurrentConnectedIncrement fails consistently under gotsan due to the sleep
to check metrics is exactly the same as the TIME-WAIT duration. Under gotsan
things can be slow enough that the increment test is done before the protocol
goroutine is run after the TIME-WAIT timer expires and does its cleanup.
Increasing the sleep from 1s to 1.2s makes the test pass consistently.
PiperOrigin-RevId: 297160181
Protocol dispatchers were previously leaked. Bypassing TIME_WAIT is required to
test this change.
Also fix a race when a socket in SYN-RCVD is closed. This is also required to
test this change.
PiperOrigin-RevId: 296922548
TCP/IP will work with netstack networking. hostinet doesn't work, and sockets
will have the same behavior as it is now.
Before the userspace is able to create device, the default loopback device can
be used to test.
/proc/net and /sys/net will still be connected to the root network stack; this
is the same behavior now.
Issue #1833
PiperOrigin-RevId: 296309389
- Disabled NICs will have their associated NDP state cleared.
- Disabled NICs will not accept incoming packets.
- Writes through a Route with a disabled NIC will return an invalid
endpoint state error.
- stack.Stack.FindRoute will not return a route with a disabled NIC.
- NIC's Running flag will report the NIC's enabled status.
Tests:
- stack_test.TestDisableUnknownNIC
- stack_test.TestDisabledNICsNICInfoAndCheckNIC
- stack_test.TestRoutesWithDisabledNIC
- stack_test.TestRouteWritePacketWithDisabledNIC
- stack_test.TestStopStartSolicitingRouters
- stack_test.TestCleanupNDPState
- stack_test.TestAddRemoveIPv4BroadcastAddressOnNICEnableDisable
- stack_test.TestJoinLeaveAllNodesMulticastOnNICEnableDisable
PiperOrigin-RevId: 296298588
Users of the API only care about whether the copy in/out succeeds in
their entirety, which is already signalled by the returned error.
PiperOrigin-RevId: 296297843
tmpfs.fileDescription now implements ConfigureMMap. And tmpfs.regularFile
implement memmap.Mappable. The methods are mostly unchanged from VFS1 tmpfs.
PiperOrigin-RevId: 296234557
Consistent with QEMU, getUserRegisters() should be an arch-specific
function. So, it should be called in dieArchSetup().
With this patch and the pagetable/pcid patch, the kvm modules on Arm64 can be
built successfully.
Signed-off-by: Bin Lu <bin.lu@arm.com>
Added the ability to get/set the IP_RECVTCLASS socket option on UDP endpoints.
If enabled, traffic class from the incoming Network Header passed as ancillary
data in the ControlMessages.
Adding Get/SetSockOptBool to decrease the overhead of getting/setting simple
options. (This was absorbed in a CL that will be landing before this one).
Test:
* Added unit test to udp_test.go that tests getting/setting as well as
verifying that we receive expected TOS from incoming packet.
* Added a syscall test for verifying getting/setting
* Removed test skip for existing syscall test to enable end to end test.
PiperOrigin-RevId: 295840218
Package syncevent is intended to subsume ~all uses of channels in the sentry
(including //pkg/waiter), as well as //pkg/sleep.
Compared to channels:
- Delivery of events to a syncevent.Receiver allows *synchronous* execution of
an arbitrary callback, whereas delivery of events to a channel requires a
goroutine to receive from that channel, resulting in substantial scheduling
overhead. (This is also part of the motivation for the waiter package.)
- syncevent.Waiter can wait on multiple event sources without the high O(N)
overhead of select. (This is the same motivation as for the sleep package.)
Compared to the waiter package:
- syncevent.Waiters are intended to be persistent (i.e. per-kernel.Task), and
syncevent.Broadcaster (analogous to waiter.Queue) is a hash table rather than
a linked list, such that blocking is (usually) allocation-free.
- syncevent.Source (analogous to waiter.Waitable) does not include an equivalent
to waiter.Waitable.Readiness(), since this is inappropriate for transient
events (see e.g. //pkg/sentry/kernel/time.ClockEventSource).
Compared to the sleep package:
- syncevent events are represented by bits in a bitmask rather than discrete
sleep.Waker objects, reducing overhead and making it feasible to broadcast
events to multiple syncevent.Receivers.
- syncevent.Receiver invokes an arbitrary callback, which is required by the
sentry's epoll implementation. (syncevent.Waiter, which is analogous to
sleep.Sleeper, pairs a syncevent.Receiver with a callback that wakes a
waiting goroutine; the implementation of this aspect is nearly identical to
that of sleep.Sleeper, except that it represents *runtime.g as unsafe.Pointer
rather than uintptr.)
- syncevent.Waiter.Wait (analogous to sleep.Sleeper.Fetch(block=true)) does not
automatically un-assert returned events. This is useful in cases where the
path for handling an event is not the same as the path that observes it, such
as for application signals (a la Linux's TIF_SIGPENDING).
- Unlike sleep.Sleeper, which Fetches Wakers in the order that they were
Asserted, the event bitmasks used by syncevent.Receiver have no way of
preserving event arrival order. (This is similar to select, which goes out of
its way to randomize event ordering.)
The disadvantage of the syncevent package is that, since events are represented
by bits in a uint64 bitmask, each syncevent.Receiver can "only" multiplex
between 64 distinct events; this does not affect any known use case.
Benchmarks:
BenchmarkBroadcasterSubscribeUnsubscribe
BenchmarkBroadcasterSubscribeUnsubscribe-12 45133884 26.3 ns/op
BenchmarkMapSubscribeUnsubscribe
BenchmarkMapSubscribeUnsubscribe-12 28504662 41.8 ns/op
BenchmarkQueueSubscribeUnsubscribe
BenchmarkQueueSubscribeUnsubscribe-12 22747668 45.6 ns/op
BenchmarkBroadcasterSubscribeUnsubscribeBatch
BenchmarkBroadcasterSubscribeUnsubscribeBatch-12 31609177 37.8 ns/op
BenchmarkMapSubscribeUnsubscribeBatch
BenchmarkMapSubscribeUnsubscribeBatch-12 17563906 62.1 ns/op
BenchmarkQueueSubscribeUnsubscribeBatch
BenchmarkQueueSubscribeUnsubscribeBatch-12 26248838 46.6 ns/op
BenchmarkBroadcasterBroadcastRedundant
BenchmarkBroadcasterBroadcastRedundant/0
BenchmarkBroadcasterBroadcastRedundant/0-12 100907563 11.8 ns/op
BenchmarkBroadcasterBroadcastRedundant/1
BenchmarkBroadcasterBroadcastRedundant/1-12 85103068 13.3 ns/op
BenchmarkBroadcasterBroadcastRedundant/4
BenchmarkBroadcasterBroadcastRedundant/4-12 52716502 22.3 ns/op
BenchmarkBroadcasterBroadcastRedundant/16
BenchmarkBroadcasterBroadcastRedundant/16-12 20278165 58.7 ns/op
BenchmarkBroadcasterBroadcastRedundant/64
BenchmarkBroadcasterBroadcastRedundant/64-12 5905428 205 ns/op
BenchmarkMapBroadcastRedundant
BenchmarkMapBroadcastRedundant/0
BenchmarkMapBroadcastRedundant/0-12 87532734 13.5 ns/op
BenchmarkMapBroadcastRedundant/1
BenchmarkMapBroadcastRedundant/1-12 28488411 36.3 ns/op
BenchmarkMapBroadcastRedundant/4
BenchmarkMapBroadcastRedundant/4-12 19628920 60.9 ns/op
BenchmarkMapBroadcastRedundant/16
BenchmarkMapBroadcastRedundant/16-12 6026980 192 ns/op
BenchmarkMapBroadcastRedundant/64
BenchmarkMapBroadcastRedundant/64-12 1640858 754 ns/op
BenchmarkQueueBroadcastRedundant
BenchmarkQueueBroadcastRedundant/0
BenchmarkQueueBroadcastRedundant/0-12 96904807 12.0 ns/op
BenchmarkQueueBroadcastRedundant/1
BenchmarkQueueBroadcastRedundant/1-12 73521873 16.3 ns/op
BenchmarkQueueBroadcastRedundant/4
BenchmarkQueueBroadcastRedundant/4-12 39209468 31.2 ns/op
BenchmarkQueueBroadcastRedundant/16
BenchmarkQueueBroadcastRedundant/16-12 10810058 105 ns/op
BenchmarkQueueBroadcastRedundant/64
BenchmarkQueueBroadcastRedundant/64-12 2998046 376 ns/op
BenchmarkBroadcasterBroadcastAck
BenchmarkBroadcasterBroadcastAck/1
BenchmarkBroadcasterBroadcastAck/1-12 44472397 26.4 ns/op
BenchmarkBroadcasterBroadcastAck/4
BenchmarkBroadcasterBroadcastAck/4-12 17653509 69.7 ns/op
BenchmarkBroadcasterBroadcastAck/16
BenchmarkBroadcasterBroadcastAck/16-12 4082617 260 ns/op
BenchmarkBroadcasterBroadcastAck/64
BenchmarkBroadcasterBroadcastAck/64-12 1220534 1027 ns/op
BenchmarkMapBroadcastAck
BenchmarkMapBroadcastAck/1
BenchmarkMapBroadcastAck/1-12 26760705 44.2 ns/op
BenchmarkMapBroadcastAck/4
BenchmarkMapBroadcastAck/4-12 11495636 100 ns/op
BenchmarkMapBroadcastAck/16
BenchmarkMapBroadcastAck/16-12 2937590 343 ns/op
BenchmarkMapBroadcastAck/64
BenchmarkMapBroadcastAck/64-12 861037 1344 ns/op
BenchmarkQueueBroadcastAck
BenchmarkQueueBroadcastAck/1
BenchmarkQueueBroadcastAck/1-12 19832679 55.0 ns/op
BenchmarkQueueBroadcastAck/4
BenchmarkQueueBroadcastAck/4-12 5618214 189 ns/op
BenchmarkQueueBroadcastAck/16
BenchmarkQueueBroadcastAck/16-12 1569980 713 ns/op
BenchmarkQueueBroadcastAck/64
BenchmarkQueueBroadcastAck/64-12 437672 2814 ns/op
BenchmarkWaiterNotifyRedundant
BenchmarkWaiterNotifyRedundant-12 650823090 1.96 ns/op
BenchmarkSleeperNotifyRedundant
BenchmarkSleeperNotifyRedundant-12 619871544 1.61 ns/op
BenchmarkChannelNotifyRedundant
BenchmarkChannelNotifyRedundant-12 298903778 3.67 ns/op
BenchmarkWaiterNotifyWaitAck
BenchmarkWaiterNotifyWaitAck-12 68358360 17.8 ns/op
BenchmarkSleeperNotifyWaitAck
BenchmarkSleeperNotifyWaitAck-12 25044883 41.2 ns/op
BenchmarkChannelNotifyWaitAck
BenchmarkChannelNotifyWaitAck-12 29572404 40.2 ns/op
BenchmarkSleeperMultiNotifyWaitAck
BenchmarkSleeperMultiNotifyWaitAck-12 16122969 73.8 ns/op
BenchmarkWaiterTempNotifyWaitAck
BenchmarkWaiterTempNotifyWaitAck-12 46111489 25.8 ns/op
BenchmarkSleeperTempNotifyWaitAck
BenchmarkSleeperTempNotifyWaitAck-12 15541882 73.6 ns/op
BenchmarkWaiterNotifyWaitMultiAck
BenchmarkWaiterNotifyWaitMultiAck-12 65878500 18.2 ns/op
BenchmarkSleeperNotifyWaitMultiAck
BenchmarkSleeperNotifyWaitMultiAck-12 28798623 41.5 ns/op
BenchmarkChannelNotifyWaitMultiAck
BenchmarkChannelNotifyWaitMultiAck-12 11308468 101 ns/op
BenchmarkWaiterNotifyAsyncWaitAck
BenchmarkWaiterNotifyAsyncWaitAck-12 2475387 492 ns/op
BenchmarkSleeperNotifyAsyncWaitAck
BenchmarkSleeperNotifyAsyncWaitAck-12 2184507 518 ns/op
BenchmarkChannelNotifyAsyncWaitAck
BenchmarkChannelNotifyAsyncWaitAck-12 2120365 562 ns/op
BenchmarkWaiterNotifyAsyncWaitMultiAck
BenchmarkWaiterNotifyAsyncWaitMultiAck-12 2351247 494 ns/op
BenchmarkSleeperNotifyAsyncWaitMultiAck
BenchmarkSleeperNotifyAsyncWaitMultiAck-12 2205799 522 ns/op
BenchmarkChannelNotifyAsyncWaitMultiAck
BenchmarkChannelNotifyAsyncWaitMultiAck-12 1238079 928 ns/op
Updates #1074
PiperOrigin-RevId: 295834087
- Redocument memory ordering from "no ordering" to "acquire-release". (No
functional change: both LOCK WHATEVER on x86, and LDAXR/STLXR loops on ARM64,
already have this property.)
- Remove IncUnlessZeroInt32 and DecUnlessOneInt32, which were only faster than
the equivalent loops using sync/atomic before the Go compiler inlined
non-unsafe.Pointer atomics many releases ago.
PiperOrigin-RevId: 295811743
glibc defines struct epoll_event in such a way that epoll_event.data.fd exists.
However, the kernel's definition of struct epoll_event makes epoll_event.data
an opaque uint64, so naming half of it "fd" just introduces confusion. Remove
the Fd field, and make Data a [2]int32 to compensate.
Also add required padding to linux.EpollEvent on ARM64.
PiperOrigin-RevId: 295250424
- Added fsbridge package with interface that can be used to open
and read from VFS1 and VFS2 files.
- Converted ELF loader to use fsbridge
- Added VFS2 types to FSContext
- Added vfs.MountNamespace to ThreadGroup
Updates #1623
PiperOrigin-RevId: 295183950
- Add missing build tags to files in the abi package.
- Add the marshal package as a sentry dependency, allowed by deps_test.
- Fix an issue with our top-level go_library BUILD rule, which
incorrectly shadows the variable containing the input set of source
files. This caused the expansion for the go_marshal clause to
silently omit input files.
- Fix formatting when copying build tags to gomarshal-generated files.
- Fix a bug with import statement collision detection in go-marshal.
PiperOrigin-RevId: 295112284
For kvm test case "TestKernelSyscall",
redpill/syscall(-1) in guest kernel level will be trapped in el1_svc.
And in el1_svc, we use mmio_exit to leave the guest.
Signed-off-by: Bin Lu <bin.lu@arm.com>
This allow callers to say whether the file is being
opened to be executed, so that the proper checks can
be done from FilesystemImpl.OpenAt()
Updates #1623
PiperOrigin-RevId: 295042595
- Adds creation of user chains via `-N <chainname>`
- Adds `-j RETURN` support for built-in chains, which triggers the
chain's underflow rule (usually the default policy).
- Adds tests for chain creation, default policies, and `-j RETURN' from
built-in chains.
Fixes#1812. (The more direct cause of the deadlock is panic unsafety because
the historically high cost of defer means that we avoid it in hot paths,
including much of MM; defer is much cheaper as of Go 1.14, but still a
measurable overhead.)
PiperOrigin-RevId: 294560316
The slaveInodeOperations is currently copying the object when
truncate is called (which is a no-op). This may result in a
(unconsequential) data race when being modified concurrently.
PiperOrigin-RevId: 294484276
Put most of the logic for getxattr in one place for clarity. This simplifies
FGetXattr and getXattrFromPath, which are just wrappers for getXattr.
PiperOrigin-RevId: 294308332
Previously, a DAD event would not be sent if DAD was disabled.
This allows integrators to do some work when an IPv6 address is bound to
a NIC without special logic that checks if DAD is enabled.
Without this change, integrators would need to check if a NIC has DAD
enabled when an address is auto-generated. If DAD is enabled, it would
need to delay the work until the DAD completion event; otherwise, it
would need to do the work in the address auto-generated event handler.
Test: stack_test.TestDADDisabled
PiperOrigin-RevId: 293732914
Auto-generated link-local addresses should have the same lifecycle hooks
as global SLAAC addresses.
The Stack's NDP dispatcher should be notified when link-local addresses
are auto-generated and invalidated. They should also be removed when a
NIC is disabled (which will be supported in a later change).
Tests:
- stack_test.TestNICAutoGenAddrWithOpaque
- stack_test.TestNICAutoGenAddr
PiperOrigin-RevId: 293706760
Addresses may be added before a NIC is enabled. Make sure DAD is
performed on the permanent IPv6 addresses when they get enabled.
Test:
- stack_test.TestDoDADWhenNICEnabled
- stack.TestDisabledRxStatsWhenNICDisabled
PiperOrigin-RevId: 293697429
The types gonet.Conn and gonet.PacketConn were confusingly named as both
implemented net.Conn. Further, gonet.Conn was perhaps unexpectedly
TCP-specific (net.Conn is not). This change renames them to gonet.TCPConn and
gonet.UDPConn.
Renames gonet.NewListener to gonet.ListenTCP and adds a new gonet.NewTCPListner
function to be consistent with both the gonet.DialXxx and gonet.NewXxxConn
functions as well as net.ListenTCP.
Updates #1632
PiperOrigin-RevId: 293671303
Get the link address for the target of an NDP Neighbor Advertisement
from the NDP Target Link Layer Address option.
Tests:
- ipv6.TestNeighorAdvertisementWithTargetLinkLayerOption
- ipv6.TestNeighorAdvertisementWithInvalidTargetLinkLayerOption
PiperOrigin-RevId: 293632609
From RFC 793 s3.9 p58 Event Processing:
If RECEIVE Call arrives in CLOSED state and the user has access to such a
connection, the return should be "error: connection does not exist"
Fixes#1598
PiperOrigin-RevId: 293494287
Tests 65k connection attempts on common types of sockets to check for port
leaks.
Also fixes a bug where dual-stack sockets wouldn't properly re-queue
segments received while closing.
PiperOrigin-RevId: 293241166
This is failing in Go1.14 due to new checkptr constraints. This version
should avoid hitting this constraints by doing "dangerous" pointer math
less dangerously?
PiperOrigin-RevId: 293205764
Updates #1198
Opening host pipes (by spinning in fdpipe) and host sockets is not yet
complete, and will be done in a future CL.
Major differences from VFS1 gofer client (sentry/fs/gofer), with varying levels
of backportability:
- "Cache policies" are replaced by InteropMode, which control the behavior of
timestamps in addition to caching. Under InteropModeExclusive (analogous to
cacheAll) and InteropModeWritethrough (analogous to cacheAllWritethrough),
client timestamps are *not* written back to the server (it is not possible in
9P or Linux for clients to set ctime, so writing back client-authoritative
timestamps results in incoherence between atime/mtime and ctime). Under
InteropModeShared (analogous to cacheRemoteRevalidating), client timestamps
are not used at all (remote filesystem clocks are authoritative). cacheNone
is translated to InteropModeShared + new option
filesystemOptions.specialRegularFiles.
- Under InteropModeShared, "unstable attribute" reloading for permission
checks, lookup, and revalidation are fused, which is feasible in VFS2 since
gofer.filesystem controls path resolution. This results in a ~33% reduction
in RPCs for filesystem operations compared to cacheRemoteRevalidating. For
example, consider stat("/foo/bar/baz") where "/foo/bar/baz" fails
revalidation, resulting in the instantiation of a new dentry:
VFS1 RPCs:
getattr("/") // fs.MountNamespace.FindLink() => fs.Inode.CheckPermission() => gofer.inodeOperations.check() => gofer.inodeOperations.UnstableAttr()
walkgetattr("/", "foo") = fid1 // fs.Dirent.walk() => gofer.session.Revalidate() => gofer.cachePolicy.Revalidate()
clunk(fid1)
getattr("/foo") // CheckPermission
walkgetattr("/foo", "bar") = fid2 // Revalidate
clunk(fid2)
getattr("/foo/bar") // CheckPermission
walkgetattr("/foo/bar", "baz") = fid3 // Revalidate
clunk(fid3)
walkgetattr("/foo/bar", "baz") = fid4 // fs.Dirent.walk() => gofer.inodeOperations.Lookup
getattr("/foo/bar/baz") // linux.stat() => gofer.inodeOperations.UnstableAttr()
VFS2 RPCs:
getattr("/") // gofer.filesystem.walkExistingLocked()
walkgetattr("/", "foo") = fid1 // gofer.filesystem.stepExistingLocked()
clunk(fid1)
// No getattr: walkgetattr already updated metadata for permission check
walkgetattr("/foo", "bar") = fid2
clunk(fid2)
walkgetattr("/foo/bar", "baz") = fid3
// No clunk: fid3 used for new gofer.dentry
// No getattr: walkgetattr already updated metadata for stat()
- gofer.filesystem.unlinkAt() does not require instantiation of a dentry that
represents the file to be deleted. Updates #898.
- gofer.regularFileFD.OnClose() skips Tflushf for regular files under
InteropModeExclusive, as it's nonsensical to request a remote file flush
without flushing locally-buffered writes to that remote file first.
- Symlink targets are cached when InteropModeShared is not in effect.
- p9.QID.Path (which is already required to be unique for each file within a
server, and is accordingly already synthesized from device/inode numbers in
all known gofers) is used as-is for inode numbers, rather than being mapped
along with attr.RDev in the client to yet another synthetic inode number.
- Relevant parts of fsutil.CachingInodeOperations are inlined directly into
gofer package code. This avoids having to duplicate part of its functionality
in fsutil.HostMappable.
PiperOrigin-RevId: 293190213
As per RFC 2464 section 7, an IPv6 packet with a multicast destination
address is transmitted to the mapped Ethernet multicast address.
Test:
- ipv6.TestLinkResolution
- stack_test.TestDADResolve
- stack_test.TestRouterSolicitation
PiperOrigin-RevId: 292610529
A couple other things that changed:
- There's a proper extension registration system for matchers. Anyone
adding another matcher can use tcp_matcher.go or udp_matcher.go as a
template.
- All logging and use of syserr.Error in the netfilter package happens at the
highest possible level (public functions). Lower-level functions just
return normal, descriptive golang errors.
When sending a RST on shutdown we need to double check the
state after acquiring the work mutex as the endpoint could
have transitioned out of a connected state from the time
we checked it and we acquired the workMutex.
I added two tests but sadly neither reproduce the panic. I am
going to leave the tests in as they are good to have anyway.
PiperOrigin-RevId: 292393800
Splice must not allow negative offsets. Writes also must not allow offset +
size to overflow int64. Reads are similarly broken, but not just in splice
(b/148095030).
Reported-by: syzbot+0e1ff0b95fb2859b4190@syzkaller.appspotmail.com
PiperOrigin-RevId: 292361208
When sending NDP messages with an unspecified source address, the Source
Link Layer address must not be included.
Test: stack_test.TestDADResolve
PiperOrigin-RevId: 292341334
Currently, Send() will copy data into a new byte slice without regard to the
original size. Size checks should be performed before the allocation takes
place.
Note that for the sake of performance, we avoid putting the buffer
allocation into the critical section. As a result, the size checks need to be
performed again within Enqueue() in case the limit has changed.
PiperOrigin-RevId: 292058147
WritableSource is a convenience interface used for files that can
be written to, e.g. /proc/net/ipv4/tpc_sack. It reads max of 4KB
and only from offset 0 which should cover most cases. It can be
extended as neeed.
Updates #1195
PiperOrigin-RevId: 292056924
Update link address for senders of NDP Neighbor Solicitations when the NS
contains an NDP Source Link Layer Address option.
Tests:
- ipv6.TestNeighorSolicitationWithSourceLinkLayerOption
- ipv6.TestNeighorSolicitationWithInvalidSourceLinkLayerOption
PiperOrigin-RevId: 292028553
FD table now holds both VFS1 and VFS2 types and uses the correct
one based on what's set.
Parts of this CL are just initial changes (e.g. sys_read.go,
runsc/main.go) to serve as a template for the remaining changes.
Updates #1487
Updates #1623
PiperOrigin-RevId: 292023223
Special files can have additional requirements for granularity.
For example, read from eventfd returns EINVAL if a size is less 8 bytes.
Reported-by: syzbot+3905f5493bec08eb7b02@syzkaller.appspotmail.com
PiperOrigin-RevId: 292002926
Test command:
$ ip route get 1.1.1.1
Fixes: #1099
Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com>
COPYBARA_INTEGRATE_REVIEW=https://github.com/google/gvisor/pull/1121 from tanjianfeng:fix-1099 e6919f3d4ede5aa51a48b3d2be0d7a4b482dd53d
PiperOrigin-RevId: 291990716
In general, we've learned that logging must be avoided at all
costs in the hot path. It's unlikely that the optimizations
here were significant in any case, since buffer would certainly
escape.
This also adds a test to ensure that the caller identification
works as expected, and so that logging can be benchmarked.
Original:
BenchmarkGoogleLogging-6 1222255 949 ns/op
With this change:
BenchmarkGoogleLogging-6 517323 2346 ns/op
Fixes#184
PiperOrigin-RevId: 291815420
Because the abi will depend on the core types for marshalling (usermem,
context, safemem, safecopy), these need to be flattened from the sentry
directory. These packages contain no sentry-specific details.
PiperOrigin-RevId: 291811289
gonet.Conn can be created with both gonet.NewConn and gonet.Dial.
gonet.PacketConn was created only by gonet.DialUDP. This prevented
us from being able to use PacketConn in udp.NewForwarder() context.
This simple constructor - NewPacketConn, allows user to create
correct structure from that context.
Note that in VFS2, filesystem device numbers are per-vfs.FilesystemImpl rather
than global, avoiding the need for a "registry" type to handle save/restore.
(This is more consistent with Linux anyway: compare e.g.
mm/shmem.c:shmem_mount() => fs/super.c:mount_nodev() => (indirectly)
set_anon_super().)
PiperOrigin-RevId: 291425193
Increase the timeout to 1s when waiting for async NDP events to help
reduce flakiness. This will not significantly increase test times as the
async events continue to receive an event on a channel. The increased
timeout allows more time for an event to be sent on the channel as the
previous timeout of 100ms caused some flakes.
Test: Existing tests pass
PiperOrigin-RevId: 291420936
Go 1.14+ sends SIGURG to Ms to attempt asynchronous preemption of a G. Since it
can't guarantee that a SIGURG is only related to preemption, it continues to
forward them to signal.Notify (see runtime.sighandler).
We should ignore these signals, as applications shouldn't receive them. Note
that this means that truly external SIGURG can no longer be sent to the
application (as with SIGCHLD).
PiperOrigin-RevId: 291415357
The kernel may return EINTR from:
kvm_create_vm
kvm_init_mmu_notifier
mmu_notifier_register
do_mmu_notifier_register
mm_take_all_locks
Go 1.14's preemptive scheduling signals make hitting this much more likely.
PiperOrigin-RevId: 291212669
- Wrap NIC's fields that should only be accessed while holding the mutex in
an anonymous struct with the embedded mutex.
- Make sure NIC's spoofing and promiscuous mode flags are only read while
holding the NIC's mutex.
- Use the correct endpoint when sending DAD messages.
- Do not hold the NIC's lock when sending DAD messages.
This change does not introduce any behaviour changes.
Tests: Existing tests continue to pass.
PiperOrigin-RevId: 291036251
The iptables binary is looking for libxt_.so when it should be looking
for libxt_udp.so, so it's having an issue reading the data in
xt_match_entry. I think it may be an alignment issue.
Trying to fix this is leading to me fighting with the metadata struct,
so I'm gonna go kill that.