- Document fsutil.CachedFileObject.FD() requirements on access
permissions, and change gofer.inodeFileState.FD() to honor them.
Fixes#147.
- Combine gofer.inodeFileState.readonly and
gofer.inodeFileState.readthrough, and simplify handle caching logic.
- Inline gofer.cachePolicy.cacheHandles into
gofer.inodeFileState.setSharedHandles, because users with access to
gofer.inodeFileState don't necessarily have access to the fs.Inode
(predictably, this is a save/restore problem).
Before this CL:
$ docker run --runtime=runsc-d -v $(pwd)/gvisor/repro:/root/repro -it ubuntu bash
root@34d51017ed67:/# /root/repro/runsc-b147
mmap: 0x7f3c01e45000
Segmentation fault
After this CL:
$ docker run --runtime=runsc-d -v $(pwd)/gvisor/repro:/root/repro -it ubuntu bash
root@d3c3cb56bbf9:/# /root/repro/runsc-b147
mmap: 0x7f78987ec000
o
PiperOrigin-RevId: 240818413
Change-Id: I49e1d4a81a0cb9177832b0a9f31a10da722a896b
The start time is the number of clock ticks between the boot time and
application start time.
PiperOrigin-RevId: 240619475
Change-Id: Ic8bd7a73e36627ed563988864b0c551c052492a5
Memfds are simply anonymous tmpfs files with no associated
mounts. Also implementing file seals, which Linux only implements for
memfds at the moment.
PiperOrigin-RevId: 240450031
Change-Id: I31de78b950101ae8d7a13d0e93fe52d98ea06f2f
.net sets these flags to -1 and then uses their result, especting it to be
zero.
Does not set actual flags (e.g. MSG_TRUNC), but setting to zero is more correct
than what we did before.
PiperOrigin-RevId: 239657951
Change-Id: I89c5f84bc9b94a2cd8ff84e8ecfea09e01142030
In the case of a rename replacing an existing destination inode, ramfs
Rename failed to first remove the replaced inode. This caused:
1. A leak of a reference to the inode (making it live indefinitely).
2. For directories, a leak of the replaced directory's .. link to the
parent. This would cause the parent's link count to incorrectly
increase.
(2) is much simpler to test than (1), so that's what I've done.
agentfs has a similar bug with link count only, so the Dirent layer
informs the Inode if this is a replacing rename.
Fixes#133
PiperOrigin-RevId: 239105698
Change-Id: I4450af2462d8ae3339def812287213d2cbeebde0
getsockopt(IP_MULTICAST_IF) only supports struct in_addr.
Also adds support for setsockopt(IP_MULTICAST_IF) with struct in_addr.
PiperOrigin-RevId: 237620230
Change-Id: I75e7b5b3e08972164eb1906f43ddd67aedffc27c
IP_MULTICAST_LOOP controls whether or not multicast packets sent on the default
route are looped back. In order to implement this switch, support for sending
and looping back multicast packets on the default route had to be implemented.
For now we only support IPv4 multicast.
PiperOrigin-RevId: 237534603
Change-Id: I490ac7ff8e8ebef417c7eb049a919c29d156ac1c
* open_create_test_runsc_ptrace_shared doesn't expect the write access to /
* exec_test_runsc_ptrace_shared could not find /usr/share/zoneinfo/
* clock_gettime_test_runsc_ptrace_shared didn't expect that
a thread cpu time can be zero.
* affinity_test_runsc_ptrace_shared expected minimum 3 cpus
PiperOrigin-RevId: 237509429
Change-Id: I477937e5d2cdf3f8720836bfa972abd35d8220a3
Now that tests aren't running in parallel, this test occassionally
takes too long and times out.
PiperOrigin-RevId: 237106971
Change-Id: I195a4b77315c9f5511c9e8ffadddb7aaa78beafd
ScopedSigaction is not async-signal-safe, so it cannot be used after fork.
Replace it with plain sigaction, which is safe. This is in a unique child
anyways, so it doesn't need any cleanup.
PiperOrigin-RevId: 237102411
Change-Id: I5c6ea373bbac67b9c4db204ceb1db62d338d9178
It is Implemented without the priority inheritance part given
that gVisor defers scheduling decisions to Go runtime and doesn't
have control over it.
PiperOrigin-RevId: 236989545
Change-Id: I714c8ca0798743ecf3167b14ffeb5cd834302560
Tests using DisableSave because a portion of the test is *incompatible*
with S/R clearly cannot use random S/R, as the saves may occur in the
DisableSave critical section.
Most such tests already have NoRandomSave. Add it to the rest.
PiperOrigin-RevId: 236914708
Change-Id: Iee1cf044cfa7cb8d5aba21ddc130926218210c48
When run in parallel, multicast packets can be received by the wrong test. The
tests in the target are run in an isolated network namespace, but if
parallelism is enabled, multiple tests from the same target will run in
parallel within the target's network namespace. Disabling parallelism only
allows one test to run in the network namespace at a time, which prevents
interaction.
PiperOrigin-RevId: 236709160
Change-Id: If828db44f0ae4002af36de6097866137c8d9da5c
The specific issue was:
- Test creates a raw ICMP socket
- Test sends an ICMP echo request (aka ping request) to itself via loopback
- Now two events race:
- The raw socket recieves the ICMP echo request
- Netstack receives the request and generates a reply (aka ping reply),
which it sends back over loopback, where it is eventually received by the
raw socket
- The test was written to expect packets in a specific order, but they can
come in any order.
PiperOrigin-RevId: 236179066
Change-Id: I02c07c919d3d28093add3d18dd9196fbbc870813
Broadly, this change:
* Enables sockets to be created via `socket(AF_INET, SOCK_RAW, IPPROTO_ICMP)`.
* Passes the network-layer (IP) header up the stack to the transport endpoint,
which can pass it up to the socket layer. This allows a raw socket to return
the entire IP packet to users.
* Adds functions to stack.TransportProtocol, stack.Stack, stack.transportDemuxer
that enable incoming packets to be delivered to raw endpoints. New raw sockets
of other protocols (not ICMP) just need to register with the stack.
* Enables ping.endpoint to return IP headers when created via SOCK_RAW.
PiperOrigin-RevId: 235993280
Change-Id: I60ed994f5ff18b2cbd79f063a7fdf15d093d845a
There was a minor bug whth IsPosixErrorOkAndHoldsMatcher where
it wouldn't display the actual value contained. This fixes that
and adds a few other minor improvements.
PiperOrigin-RevId: 235809065
Change-Id: I487e5072e9569eb06104522963e9a1b34204daaf
This solves two problems:
1. Using the host /tmp directly meant that concurrent tests could
collide attempting to use the same file, and that misbehaving tests
never have their /tmp output cleaned up.
2. Host /tmp is not world-accessible on all hosts. Some tests (e.g.,
sticky) access files in /tmp from other users, so we need to ensure
that its /tmp is world-accessible.
PiperOrigin-RevId: 235637873
Change-Id: I7555224685ac5b93af88c403196b09ce1bb2bfe7
This change adds support for the SO_BROADCAST socket option in gVisor Netstack.
This support includes getsockopt()/setsockopt() functionality for both UDP and
TCP endpoints (the latter being a NOOP), dispatching broadcast messages up and
down the stack, and route finding/creation for broadcast packets. Finally, a
suite of tests have been implemented, exercising this functionality through the
Linux syscall API.
PiperOrigin-RevId: 234850781
Change-Id: If3e666666917d39f55083741c78314a06defb26c
- Use new user namespace for namespace creation checks.
- Ensure userns is never nil since it's used by other namespaces.
PiperOrigin-RevId: 234673175
Change-Id: I4b9d9d1e63ce4e24362089793961a996f7540cd9
In addition to simplifying the implementation, this fixes two bugs:
- seqfile.NewSeqFile unconditionally creates an inode with mode 0444,
but {uid,gid}_map have mode 0644.
- idMapSeqFile.Write implements fs.FileOperations.Write ... but it
doesn't implement any other fs.FileOperations methods and is never
used as fs.FileOperations. idMapSeqFile.GetFile() =>
seqfile.SeqFile.GetFile() uses seqfile.seqFileOperations instead,
which rejects all writes.
PiperOrigin-RevId: 234638212
Change-Id: I4568f741ab07929273a009d7e468c8205a8541bc
This allows setting a default send interface for IPv4 multicast. IPv6 support
will come later.
PiperOrigin-RevId: 234251379
Change-Id: I65922341cd8b8880f690fae3eeb7ddfa47c8c173
Also includes a few fixes for IPv4 multicast support. IPv6 support is coming in
a followup CL.
PiperOrigin-RevId: 233008638
Change-Id: If7dae6222fef43fda48033f0292af77832d95e82
- Change proc to return envp on overwrite of argv with limitations from
upstream.
- Add unit tests
- Change layout of argv/envp on the stack so that end of argv is contiguous with
beginning of envp.
PiperOrigin-RevId: 232506107
Change-Id: I993880499ab2c1220f6dc456a922235c49304dec
Multiple tests were creating the same directory before removing it, making it
possible for concurrent tests to fail because the directory already exists.
PiperOrigin-RevId: 232389814
Change-Id: I35d409fff4b3fd864b30fee742cb587b14975c23
Nothing reads them and they can simply get stale.
Generated with:
$ sed -i "s/licenses(\(.*\)).*/licenses(\1)/" **/BUILD
PiperOrigin-RevId: 231818945
Change-Id: Ibc3f9838546b7e94f13f217060d31f4ada9d4bf0
stdout can be (and, in automated testing, often is) a host pipe or
similar resource shared between multiple parallel tests, such that it
can become transiently full during testing.
PiperOrigin-RevId: 231413569
Change-Id: Id14991b5f71e53c894695899e65e1be4dd228cc6
The implementation of O_CLOEXEC is orthogonal to every property tested
by these tests; removing it significantly reduces the number of
redundant tests we run.
Also remove no-op calls to VecCat (calls with a single argument).
PiperOrigin-RevId: 230959537
Change-Id: I83fe7db24e481ef67ca1f1992228af423f640b5c
Lots of tests use /tmp for the tests. Force /tmp to be
mounted over fsgofer instead of tmpfs.
PiperOrigin-RevId: 230788985
Change-Id: Id6597ed88133232d15e808c48126bf77cb32673e
Otherwise, C++11-compliant compilers may select PosixErrorOr(const T&)
as the relevant constructor, and fail because std::vector<Mapping> is
not copyable (because Mapping is not copyable).
This is a C++11 defect that is fixed in C++14 (and in C++11 mode for
Clang, but not GCC). See DR 1579,
http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#1579.
PiperOrigin-RevId: 230767401
Change-Id: I65f481f5188d91db6cbbbd65ed0a60bc55df3401
netlink_autobind() sets a port id to a process ID, if this address is
available. Otherwise, it will set a port id to a random negative value.
PiperOrigin-RevId: 230631956
Change-Id: I11692e4fe9421e77d9406627b4e7772e4d9b105a
Compilation of this test fails in kokoro:
In file included from /usr/include/linux/netdevice.h:28:0,
from /usr/include/linux/if_arp.h:26,
from ./test/syscalls/linux/socket_netlink_util.h:18,
from test/syscalls/linux/socket_netdevice.cc:24:
/usr/include/linux/if.h:143:8: error: redefinition of 'struct ifmap'
struct ifmap {
^~~~~
In file included from test/syscalls/linux/socket_netdevice.cc:18:0:
/usr/include/net/if.h:111:8: note: previous definition of 'struct ifmap'
struct ifmap
^~~~~
In file included from /usr/include/linux/netdevice.h:28:0,
from /usr/include/linux/if_arp.h:26,
from ./test/syscalls/linux/socket_netlink_util.h:18,
from test/syscalls/linux/socket_netdevice.cc:24:
/usr/include/linux/if.h:177:8: error: redefinition of 'struct ifreq'
struct ifreq {
^~~~~
In file included from test/syscalls/linux/socket_netdevice.cc:18:0:
/usr/include/net/if.h:126:8: note: previous definition of 'struct ifreq'
struct ifreq
^~~~~
In file included from /usr/include/linux/netdevice.h:28:0,
from /usr/include/linux/if_arp.h:26,
from ./test/syscalls/linux/socket_netlink_util.h:18,
from test/syscalls/linux/socket_netdevice.cc:24:
/usr/include/linux/if.h:226:8: error: redefinition of 'struct ifconf'
struct ifconf {
^~~~~~
In file included from test/syscalls/linux/socket_netdevice.cc:18:0:
/usr/include/net/if.h:176:8: note: previous definition of 'struct ifconf'
struct ifconf
PiperOrigin-RevId: 230381931
Change-Id: I3c422c53e50cf2b90022778599d3a8a4a61fd1a3
Runsc wants to mount /tmp using internal tmpfs implementation for
performance. However, it risks hiding files that may exist under
/tmp in case it's present in the container. Now, it only mounts
over /tmp iff:
- /tmp was not explicitly asked to be mounted
- /tmp is empty
If any of this is not true, then /tmp maps to the container's
image /tmp.
Note: checkpoint doesn't have sentry FS mounted to check if /tmp
is empty. It simply looks for explicit mounts right now.
PiperOrigin-RevId: 229607856
Change-Id: I10b6dae7ac157ef578efc4dfceb089f3b94cde06
syscall test split testcase via shard count, reset
high bound as begin of next subslice, cause the slice
is half-open range.
Change-Id: I1954f57c93cbfd9be518153315da305a2de377a0
PiperOrigin-RevId: 229405199
More helper structs have been added to the fsutil package to make it easier to
implement fs.InodeOperations and fs.FileOperations.
PiperOrigin-RevId: 229305982
Change-Id: Ib6f8d3862f4216745116857913dbfa351530223b
test/syscalls/linux/wait.cc:626:8: warning: lambda capture 'this' is not
used
302
[this, stack] { ASSERT_THAT(FreeStack(stack), SyscallSucceeds()); });
303
^~~~~
test/syscalls/linux/priority.cc:195:17: warning: lambda capture
'kParentPriority' is not required to be captured for this use
273
ScopedThread([kParentPriority, kChildPriority]() {
274
^~~~~~~~~~~~~~~~
PiperOrigin-RevId: 229275900
Change-Id: I6f0c88efc7891c6c729378a2fa70f70b1b9046a7
- Fix a few cases where async-signal-unsafe code is executed in a forked
process pre-execve.
- Ensure that the return value of fork() is always checked.
PiperOrigin-RevId: 228949310
Change-Id: I3096cb7d7394b8d9ab81b0e0245f2060713ef589
Removing check to RLIMIT_NOFILE in select call.
Adding unit test to select suite to document behavior.
Moving setrlimit class from mlock to a util file for reuse.
Fixing flaky test based on comments from Jamie.
PiperOrigin-RevId: 228726131
Change-Id: Ie9dbe970bbf835ba2cca6e17eec7c2ee6fadf459
Instead just find the syscall_test_runner binary in the shell script.
PiperOrigin-RevId: 228621230
Change-Id: I274ee0874e47d53f59474b1ac730ee45e3dff977
The static local variable `enabled` in CooperativeSaveEnabled() is not
initialized until the first call to CooperativeSaveEnabled(), per the
C++14 standard, section 6.7 ("Declaration statement"), paragraph 4. This
initialization is thread-safe as of C++11, but it is *not* required to
be async-signal-safe. Use a namespace-scope variable instead, since this
is guaranteed to be zero-initialized before main() by section 3.6.2
("Initialization of non-local variables").
getenv() is technically not async-signal-safe either, hence the hedging
in the change summary line. However, glibc's implementation of getenv()
appears to be async-signal-safe in the absence of calls to setenv().
PiperOrigin-RevId: 228588617
Change-Id: I669f555d1c91352d55c606970bb237ec888fa7ca
This option allows multiple sockets to be bound to the same port.
Incoming packets are distributed to sockets using a hash based on source and
destination addresses. This means that all packets from one sender will be
received by the same server socket.
PiperOrigin-RevId: 227153413
Change-Id: I59b6edda9c2209d5b8968671e9129adb675920cf
epoll_wait acquires EventPoll.listsMu (in EventPoll.ReadEvents) and
then calls Inotify.Readiness which tries to acquire Inotify.evMu.
getdents acquires Inotify.evMu (in Inotify.queueEvent) and then calls
readyCallback.Callback which tries to acquire EventPoll.listsMu.
The fix is to release Inotify.evMu before calling Queue.Notify. Queue
is thread-safe and doesn't require Inotify.evMu to be held.
Closes#121
PiperOrigin-RevId: 227066695
Change-Id: Id29364bb940d1727f33a5dff9a3c52f390c15761
We don't explicitly support out-of-band data and treat it like normal in-band
data. This is equilivent to SO_OOBINLINE being enabled, so always report that
it is enabled.
PiperOrigin-RevId: 226572742
Change-Id: I4c30ccb83265e76c30dea631cbf86822e6ee1c1b
We now build all packages (including //test/...) with RBE as part of the Kokoro
presubmit.
The tests do not yet use RBE, because there are some failures. The Golang unit,
integration, and image tests still run locally.
The syscall test suite needs even more work to make it pass on RBE. Those will
be enabled in follow-up CLs. They currently are not enabled at all on Kokoro.
PiperOrigin-RevId: 226562208
Change-Id: Idd2b81b3e8f07bf300c77e68990493ba97d16e23
Within gVisor, plumb new socket options to netstack.
Within netstack, fix GetSockOpt and SetSockOpt return value logic.
PiperOrigin-RevId: 226532229
Change-Id: If40734e119eed633335f40b4c26facbebc791c74
The code that matches the event being published with events watchers
was wronly matching all watchers in case any of the control event bits
were set.
Issue #121
PiperOrigin-RevId: 226521230
Change-Id: Ie2c42bc4366faaf59fbf80a74e9297499bd93f9e
Implement pwritev2 and associated unit tests.
Clean up preadv2 unit tests.
Tag RWF_ flags in both preadv2 and pwritev2 with associated bug tickets.
PiperOrigin-RevId: 226222119
Change-Id: Ieb22672418812894ba114bbc88e67f1dd50de620
Connectionless Unix sockets (DGRAM Unix sockets created with the socket system
call) inherently only have a read queue. They do not establish bidirectional
connections, instead, the connect system call only sets a default send
location. Writes give the data to the other endpoint which has its own read
queue.
To simplify the code, connectionless Unix sockets still get read and write
queues, but the write queue is a dummy and never waited on. The read queue is
the connectionless endpoint's queue. This change fixes a bug where the dummy
queue was incorrectly set as the read queue and the endpoint's queue was
incorrectly set as the write queue. This meant that read notifications went
to the dummy queue and were black holed.
PiperOrigin-RevId: 225921042
Change-Id: I8d9059def787a2c3c305185b92d05093fbd2be2a
Currently mlock() and friends do nothing whatsoever. However, mlocking
is directly application-visible in a number of ways; for example,
madvise(MADV_DONTNEED) and msync(MS_INVALIDATE) both fail on mlocked
regions. We handle this inconsistently: MADV_DONTNEED is too important
to not work, but MS_INVALIDATE is rejected.
Change MM to track mlocked regions in a manner consistent with Linux.
It still will not actually pin pages into host physical memory, but:
- mlock() will now cause sentry memory management to precommit mlocked
pages.
- MADV_DONTNEED and MS_INVALIDATE will interact with mlocked pages as
described above.
PiperOrigin-RevId: 225861605
Change-Id: Iee187204979ac9a4d15d0e037c152c0902c8d0ee
This test suite was creating shm segments without ensuring they were
cleaned up. Shm segments outlive the process creating them, so on a
standard linux machine the test was leaving segments behind after each
run. This would often cause failures as test cases would be affected
by the cases that ran before them and left unexpected segments lying
around.
Also skip some assertions around memory usage when running on a Linux
host, as we can't reason about external users of shm segments.
PiperOrigin-RevId: 225435523
Change-Id: Ia299dacf59045002436f5e30dcc131f679bb7272
This allows ValueOrDie() to be called on PosixErrorOr rvalues (e.g.
temporaries) holding move-only types without extraneous std::move()s.
PiperOrigin-RevId: 225098036
Change-Id: I662862e4f3562141f941845fc6e197edb27ce29b
The argument is --test_tag_filters, not --test_tag_filter.
Also switch to ... instead of :*, as it doesn't require special shell
quoting to avoid * expansion.
PiperOrigin-RevId: 224949618
Change-Id: I45dd6acbaeae29f2cc0baa977b086b5c037c6a88
MSG_WAITALL requests that recv family calls do not perform short reads. It only
has an effect for SOCK_STREAM sockets, other types ignore it.
PiperOrigin-RevId: 224918540
Change-Id: Id97fbf972f1f7cbd4e08eec0138f8cbdf1c94fe7