We install seccomp rules so that the SIGSYS signal is generated for
each mmap system call. Then our signal handler executes the real mmap
syscall and if a new regions is created, it maps it to the guest.
Signed-off-by: Andrei Vagin <avagin@google.com>
Turns out certain features of iptables (e.g. NAT) will not perform
any checks/work unless both the Network and Transport headers are
populated.
With this change, provide the packet directly to the outgoing
network endpoint's `writePacket` method instead of going
through `WriteHeaderIncludedPacket` which expected the headers
to not be set.
PiperOrigin-RevId: 398304004
Before this change, when a new connection was created after receiving
an ACK that matched a SYN-cookie, it was always delivered asynchronously
to the accept queue. There was a chance that the listening endpoint
would process a SYN from another client before the delivery happened,
and the listening endpoint would not know yet that the queue was about
to be full, once the delivery happened.
Now, when an ACK matching a SYN-cookie is received, the new endpoint is
created and moved to the accept queue synchronously, while holding the
accept lock.
Fixes#6545
PiperOrigin-RevId: 398107254
A socket queue can contain sockets (others and this one). We have to avoid
taking locks of the same class where it is possible.
PiperOrigin-RevId: 398100744
Introduces RPC methods in lisafs. Makes that gofer client use lisafs RPCs
instead of p9 when lisafs is enabled.
Implements the handlers for those methods in fsgofer.
Fixes#5465
PiperOrigin-RevId: 398080310
lisafs is only supported in VFS2. Added a runsc flag which enables lisafs.
When the flag is enabled, the gofer process and the client communicate using
lisafs protocol instead of 9P.
Added a filesystem option in fsimpl/gofer which indicates if lisafs is being
used. That will be used to gate lisafs on the gofer client.
Note that this change does not make the gofer client use lisafs just yet.
Updates #5465
PiperOrigin-RevId: 397917844
Once a packet socket is bound to a network protocol, it cannot be
unbound from that protocol; the network protocol binding may only be
updated to a different network protocol.
To comply with Linux.
PiperOrigin-RevId: 397810878
This change mainly aims to define the semantics of communication for the LISAFS
(LInux SAndbox Filesystem) protocol. This protocol aims to replace 9P and
intends to bring some performance benefits with it.
Some of the notable differences from the p9 package are:
- Now the server implementations own the handlers.
- As a result, there is no verbose interface like `p9.File` that all servers
need to implement. Different implementations can extend their File
implementations to varying degrees without imposing those extensions to other
server implementations that might not have anything to do with those features.
- If a server implementation adds a new RPC message, other implementations are
not compelled to support it.
I wrote a benchmark `BenchmarkSendRecv` in connection_test.go which competes
with p9's `BenchmarkSendRecvChannel`. Running these on an AMD Milan machine
shows that lisafs is **45%** faster.
**With 9P**
goos: linux
goarch: amd64
pkg: gvisor/pkg/p9/p9
cpu: AMD EPYC 7B13 64-Core Processor
BenchmarkSendRecvLegacy-256 82830 14053 ns/op 633 B/op 23 allocs/op
BenchmarkSendRecvChannel-256 776971 1551 ns/op 184 B/op 6 allocs/op
**With lisafs**
goos: linux
goarch: amd64
pkg: pkg/lisafs/connection_test
cpu: AMD EPYC 7B13 64-Core Processor
BenchmarkSendRecv-256 1399610 853.5 ns/op 48 B/op 2 allocs/op
Fixes#5464
PiperOrigin-RevId: 397803163
...to change the network protocol a packet socket may receive packets
from.
This CL is a portion of an originally larger CL that was split with
a8ad692fd3
being the dependent CL. That CL (accidentally) included the change in
the endpoint's `afterLoad` method to take the required lock when
accessing the endpoint's netProto field. That change should have been in
this CL.
The CL that made the change mentioned in the commit message is
cl/396946187.
PiperOrigin-RevId: 397412582
...to catch lock-related bugs in nogo tests.
Checklocks also pointed out a locking violation which is fixed
in this change.
Updates #6566.
PiperOrigin-RevId: 397225322
...so that a later change can add a new packet_socket syscall test
target that holds raw/dgram packet socket generic common tests. The
current packet_socket syscall test target holds tests specific to
dgram packet sockets.
While I am here, remove the defines for the packet_socket_raw_test
target as no code is guarded with `__linux__` in the target's sources.
PiperOrigin-RevId: 397217761
In the general case, files may have offsets between MaxInt64 and MaxUint64; in
Linux pgoff is consistently represented by an unsigned long, and in gVisor the
offset types in memmap.MappableRange are uint64. However, regular file mmap is
constrained to int64 offsets (on 64-bit systems) by
mm/mmap.c:file_mmap_size_max() => MAX_LFS_FILESIZE == LLONG_MAX.
As a related fix, check for chunkStart overflow in fsutil.HostFileMapper; chunk
offsets are uint64s, but as noted above some file types may use uint64 offsets
beyond MaxInt64.
Reported-by: syzbot+71342a1585aed97ed9f7@syzkaller.appspotmail.com
PiperOrigin-RevId: 397136751
Add global flags -profile-{block,cpu,heap,mutex} and -trace which
enable collection of the specified profile for the entire duration of a
container execution. This provides a way to definitively start profiling
before that application starts, rather than attempting to race with an
out-of-band `runsc debug`.
Note that only the main boot process is profiled.
This exposed a bug in Task.traceExecEvent: a crash when tracing and
-race are enabled. traceExecEvent is called off of the task goroutine,
but uses the Task as a context, which is a violation of the Task
contract. Switching to the AsyncContext fixes the issue.
Fixes#220
...to catch lock-related bugs in nogo tests.
Also update the endpoint's state field to be accessed while the mutex is
held instead of requiring atomic operations as nothing needs to call the
State method while the mutex is held.
Updates #6566.
PiperOrigin-RevId: 397010316
...to catch lock-related bugs in nogo tests.
This change also disables/enables packet reception before/after
save/restore with a flag that is protected by rcvMu instead of mu.
Updates #6566.
PiperOrigin-RevId: 396946187
Replaced the current AddAddressWithOptions method with
AddAddressWithProperties which passes all address properties in
a single AddressProperties type. More properties that need to be
configured in the future are expected, so adding a type makes adding
them easier.
PiperOrigin-RevId: 396930729
A raw IP endpoint's write and socket option get/set path can use the
datagram-based endpoint.
This change extracts tests from UDP that may also run on Raw IP sockets.
Updates #6565.
Test: Raw IP + datagram-based socket syscall tests.
PiperOrigin-RevId: 396729727
Code to get the loopback interface's index is scattered throughout the
syscall tests. Implement the code once and use that in tests (where
applicable).
While I am here, trim the dependencies/includes for network namespace
tests.
PiperOrigin-RevId: 396718124
Previously, we weren't making a copy when a sysv message queue was
receiving a message with the MSG_COPY flag. This flag indicates the
message being received should be left in the queue and a copy of the
message should be returned to userspace. Without the copy, a racing
process can modify the original message while it's being marshalled to
user memory.
Reported-by: syzbot+cb15e644698b20ff4e17@syzkaller.appspotmail.com
PiperOrigin-RevId: 396712856
Previously, any time a datagram-based network endpoint (e.g. UDP) was
bound, the bound NIC is always set based on the bound address (if
specified). However, we should only consider the endpoint bound to
an NIC if a NIC was explicitly bound to.
If an endpoint has been bound to an address and attempts to send packets
to an unconnected remote, the endpoint will default to sending packets
through the bound address' NIC if not explicitly bound to a NIC.
Updates #6565.
PiperOrigin-RevId: 396712415
Fixed a bug introduced in the following commit:
979d6e7d77
The commit introduced a bug which causes the recvmmsg dispatcher to never exit
as BlockingPoll is now called with two fds and poll will not return an error
anymore if one of the FD is closed. We need to explicitly check the events for
each FD to determine if the sentry FD is closed.
ReadV dispatcher does not have the same issue as Readv does not rely on sk_err
field of the underlying socket to determine if the socket is in an error
state. Recvmmsg OTOH seems to get confused and always returns EAGAIN if poll()
is called which queries the sk_err field and clears it.
PiperOrigin-RevId: 396676135
SOL_UDP is used when get/set-ing socket options to specify the socket
level. When creating normal UDP sockets, the protocol need not be
specified. When creating RAW IP sockets for UDP, use IPPROTO_UDP.
PiperOrigin-RevId: 396675986
Rootless mode seems to work fine for simple containers with runsc run,
so allow its use.
Since runsc run is more widely used, require a workable --network option
is passed rather than automatically switching like runsc do does.
Fixes#3036
...if bound to an address.
We previously checked the source of a packet instead of the destination
of a packet when bound to an address.
PiperOrigin-RevId: 396497647
Otherwise it can fail:
$ bazel cquery pkg/p9/... --output=starlark --starlark:file=tools/show_paths.bzl
...
ERROR: Starlark evaluation error for //pkg/p9/p9test:mockgen:
Traceback (most recent call last):
File "tools/show_paths.bzl", line 8, column 32, in format
Error: 'NoneType' value has no field or method 'get'
PiperOrigin-RevId: 396457764
Setting the ToS for IPv4 packets (SOL_IP, IP_TOS) should not affect the
Traffic Class of IPv6 packets (SOL_IPV6, IPV6_TCLASS).
Also only return the ToS value XOR Traffic Class as a packet cannot be
both an IPv4 and an IPv6 packet; It is invalid to return both the IPv4
ToS and IPv6 Traffic Class control messages when reading packets.
Updates #6389.
PiperOrigin-RevId: 396399096