Commit Graph

270 Commits

Author SHA1 Message Date
Dean Deng 1b88c63b3e Move hostfs mount to Kernel struct.
This is needed to set up host fds passed through a Unix socket. Note that
the host package depends on kernel, so we cannot set up the hostfs mount
directly in Kernel.Init as we do for sockfs and pipefs.

Also, adjust sockfs to make its setup look more like hostfs's and pipefs's.

PiperOrigin-RevId: 308274053
2020-04-24 10:03:43 -07:00
Rahat Mahmood f01f2132d8 Enable automated marshalling for mempolicy syscalls.
PiperOrigin-RevId: 308170679
2020-04-23 18:20:21 -07:00
Rahat Mahmood 93dd471461 Enable automated marshalling for epoll events.
Ensure we use the correct architecture-specific defintion of epoll
event, and use go-marshal for serialization.

PiperOrigin-RevId: 308145677
2020-04-23 15:49:05 -07:00
gVisor bot ded5c963ae Merge pull request #1819 from lubinszARM:pr_signal_2
PiperOrigin-RevId: 308100771
2020-04-23 12:01:38 -07:00
Andrei Vagin 0c586946ea Specify a memory file in platform.New().
PiperOrigin-RevId: 307941984
2020-04-22 17:50:10 -07:00
Zach Koopmans 12bde95635 Get /bin/true to run on VFS2
Included:
- loader_test.go RunTest and TestStartSignal VFS2
- container_test.go TestAppExitStatus on VFS2
- experimental flag added to runsc to turn on VFS2

Note: shared mounts are not yet supported.
PiperOrigin-RevId: 307070753
2020-04-17 10:39:19 -07:00
Jamie Liu f03996c5e9 Implement pipe(2) and pipe2(2) for VFS2.
Updates #1035

PiperOrigin-RevId: 306968644
2020-04-16 19:27:03 -07:00
Fabricio Voznika 28399818fc Make ExtractErrno a function
PiperOrigin-RevId: 306891171
2020-04-16 11:49:27 -07:00
gVisor bot 7e5d67ee90 Merge pull request #2168 from xiaobo55x:ptrace_test
PiperOrigin-RevId: 306306809
2020-04-13 14:17:53 -07:00
Dean Deng 5d885d7fb2 Port socket-related syscalls to VFS2.
Note that most kinds of sockets are not yet supported in VFS2
(only Unix sockets are partially supported at the moment), so
these syscalls will still generally fail. Enabling them allows
us to begin running socket tests for VFS2 as more features are
ported over.

Updates #1476, #1478, #1484, #1485.

PiperOrigin-RevId: 306292294
2020-04-13 13:02:34 -07:00
Jon Budd 6a4d17a31d Remove obsolete TODOs for b/38173783
The comments in the ticket indicate that this behavior
is fine and that the ticket should be closed, so we shouldn't
need pointers to the ticket.

PiperOrigin-RevId: 306266071
2020-04-13 11:02:14 -07:00
Ian Lewis daf3322498 Add logging message for noNewPrivileges OCI option.
noNewPrivileges is ignored if set to false since gVisor assumes that
PR_SET_NO_NEW_PRIVS is always enabled.

PiperOrigin-RevId: 305991947
2020-04-10 20:32:23 -07:00
Fabricio Voznika 1798d6cbee Remove TODO from kernel.Stracer
The dependency strace=>kernel grew over time. strace also depends on
task's FD table and FSContext. It could be fixed with some interfaces
the other way, but then we're trading an interface for another, and
kernel.Stracer is likely cleaner.

Closes #155

PiperOrigin-RevId: 305909678
2020-04-10 11:19:12 -07:00
gVisor bot 78126611e6 Merge pull request #2253 from amscanne:nogo
PiperOrigin-RevId: 305807868
2020-04-09 19:16:46 -07:00
Haibo Xu 7aa5caae71 Enable syscall ptrace test on arm64.
Signed-off-by: Haibo Xu <haibo.xu@arm.com>
Change-Id: I5bb8fa7d580d173b1438d6465e1adb442216c8fa
2020-04-10 10:00:26 +08:00
Fabricio Voznika 6dd5a1f3fe Clean up TODOs
PiperOrigin-RevId: 305592245
2020-04-08 17:58:13 -07:00
Adin Scannell 94b793262d Fix all copy locks violations.
This required minor restructuring of how system call tables were saved
and restored, but it makes way more sense this way.

Updates #2243
2020-04-08 10:00:14 -07:00
Nicolas Lacasse f332a864e8 Port timerfd to VFS2.
PiperOrigin-RevId: 305067208
2020-04-06 10:52:56 -07:00
Dean Deng 24bee1c181 Record VFS2 sockets in global socket map.
Updates #1476, #1478, #1484, #1485.

PiperOrigin-RevId: 304845354
2020-04-04 21:02:42 -07:00
Bhasker Hariharan fc99a7ebf0 Refactor software GSO code.
Software GSO implementation currently has a complicated code path with
implicit assumptions that all packets to WritePackets carry same Data
and it does this to avoid allocations on the path etc. But this makes it
hard to reuse the WritePackets API.

This change breaks all such assumptions by introducing a new Vectorised
View API ReadToVV which can be used to cleanly split a VV into multiple
independent VVs. Further this change also makes packet buffers linkable
to form an intrusive list. This allows us to get rid of the array of
packet buffers that are passed in the WritePackets API call and replace
it with a list of packet buffers.

While this code does introduce some more allocations in the benchmarks
it doesn't cause any degradation.

Updates #231

PiperOrigin-RevId: 304731742
2020-04-03 18:35:55 -07:00
Dean Deng 5818663ebe Add FileDescriptionImpl for Unix sockets.
This change involves several steps:
- Refactor the VFS1 unix socket implementation to share methods between VFS1
  and VFS2 where possible. Re-implement the rest.
- Override the default PRead, Read, PWrite, Write, Ioctl, Release methods in
  FileDescriptionDefaultImpl.
- Add functions to create and initialize a new Dentry/Inode and FileDescription
  for a Unix socket file.

Updates #1476

PiperOrigin-RevId: 304689796
2020-04-03 14:08:54 -07:00
Adin Scannell a94309628e Ensure EOF is handled propertly during splice.
PiperOrigin-RevId: 304684417
2020-04-03 13:40:51 -07:00
Rahat Mahmood 840980aeba Implement automated marshalling for slices of Marshallable types.
PiperOrigin-RevId: 304119255
2020-03-31 22:56:09 -07:00
Bin Lu 5eb41c8fba Arm64 signal#2: signal support in arch module
SA_RESTORER is always used on Intel platform.
But this flag is optional on other platforms.

The vdso is enabled, so we can use the sigreturn trampolines
the vdso provides instead on Arm platform.

Signed-off-by: Bin Lu <bin.lu@arm.com>
2020-03-31 22:53:15 -04:00
Dean Deng 639d94f9f7 Add socket filesystem and global disconnected socket mount for VFS2.
A socket mount where anonymous sockets will reside is added to the
VirtualFilesystem. Socketfs is built on top of kernfs.

Updates #1476, #1478, #1484, #1485.

PiperOrigin-RevId: 304095251
2020-03-31 19:17:12 -07:00
Nayana Bidari 92b9069b67 Support owner matching for iptables.
This feature will match UID and GID of the packet creator, for locally
generated packets. This match is only valid in the OUTPUT and POSTROUTING
chains. Forwarded packets do not have any socket associated with them.
Packets from kernel threads do have a socket, but usually no owner.
2020-03-26 12:21:24 -07:00
Bhasker Hariharan e9e399c25d Remove workMu from tcpip.Endpoint.
workMu is removed and e.mu is now a mutex that supports TryLock.  The packet
processing path tries to lock the mutex and if its locked it will just queue the
packet and move on. The endpoint.UnlockUser() will process any backlog of
packets before unlocking the socket.

This simplifies the locking inside tcp endpoints a lot. Further the
endpoint.LockUser() implements spinning as long as the lock is not held by
another syscall goroutine. This ensures low latency as not spinning leads to the
task thread being put to sleep if the lock is held by the packet dispatch
path. This is suboptimal as the lower layer rarely holds the lock for long so
implementing spinning here helps.

If the lock is held by another task goroutine then we just proceed to call
LockUser() and the task could be put to sleep.

The protocol goroutines themselves just call e.mu.Lock() and block if the
lock is currently not available.

Updates #231, #357

PiperOrigin-RevId: 301808349
2020-03-19 07:19:58 -07:00
Fabricio Voznika f1d1af2a4a Fix FDTable.NewFDVFS2
It was looking at VFS1 table to determine where to
allocate the next FD from.

Updates #1035

PiperOrigin-RevId: 301678858
2020-03-18 15:13:42 -07:00
Andrei Vagin b55f0e5d40 fdtable: don't try to zap fdtable entry if close is called for non-existing fd
FDTable.setAll is used to zap entries, but it grows the table up to
a specified fd.

Reported-by: syzbot+9e281b0750d2d4caa190@syzkaller.appspotmail.com
PiperOrigin-RevId: 301280000
2020-03-16 18:29:58 -07:00
Dean Deng 5e413cad10 Plumb VFS2 imported fds into virtual filesystem.
- When setting up the virtual filesystem, mount a host.filesystem to contain
  all files that need to be imported.
- Make read/preadv syscalls to the host in cases where preadv2 may not be
  supported yet (likewise for writing).
- Make save/restore functions in kernel/kernel.go return early if vfs2 is
  enabled.

PiperOrigin-RevId: 300922353
2020-03-14 07:14:33 -07:00
Fabricio Voznika 45a8ae240d Add remaining procfs files
Closes #1195

PiperOrigin-RevId: 300867055
2020-03-13 18:57:07 -07:00
Fabricio Voznika 829beebf0b Panic if file in FDTable has been destroyed
This will give more information about the file to
identify where possibly the extra DecRef()
would be.

PiperOrigin-RevId: 300855874
2020-03-13 17:18:10 -07:00
Jamie Liu b0f2c3e764 Fix infinite loop in semaphore.sem.wakeWaiters().
PiperOrigin-RevId: 300845134
2020-03-13 16:09:18 -07:00
Jamie Liu 1c05352970 Fix oom_score_adj.
- Make oomScoreAdj a ThreadGroup field (Linux: signal_struct::oom_score_adj).

- Avoid deadlock caused by Task.OOMScoreAdj()/SetOOMScoreAdj() locking Task.mu
  and TaskSet.mu in the wrong order (via Task.ExitState()).

PiperOrigin-RevId: 300814698
2020-03-13 13:19:13 -07:00
Jamie Liu b78cee3bae Fix lock recursion in kernel.ProcessGroup.SendSignal().
PiperOrigin-RevId: 300803515
2020-03-13 12:18:36 -07:00
Tamir Duberstein 6fa5cee82c Prevent memory leaks in ilist
When list elements are removed from a list but not discarded, it becomes
important to invalidate the references they hold to their former
neighbors to prevent memory leaks.

PiperOrigin-RevId: 299412421
2020-03-06 12:31:43 -08:00
Ian Lewis da48fc6cca Stub oom_score_adj and oom_score.
Adds an oom_score_adj and oom_score proc file stub. oom_score_adj accepts
writes of values -1000 to 1000 and persists the value with the task. New tasks
inherit the parent's oom_score_adj.

oom_score is a read-only stub that always returns the value '0'.

Issue #202

PiperOrigin-RevId: 299245355
2020-03-05 18:23:01 -08:00
Andrei Vagin 413a9b7fdc Define CPUIDInstruction for arm64
There is no cpuid instruction on arm64, so we need to defined it
just to avoid a compile time error.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-28 17:07:01 -08:00
Adin Scannell 463f4217d1 Make pipe buffer implementation standard.
A follow-up change will convert the networking code to use this standard
pipe implementation.

PiperOrigin-RevId: 297903206
2020-02-28 12:29:23 -08:00
Ting-Yu Wang 6b4d36e325 Hide /dev/net/tun when using hostinet.
/dev/net/tun does not currently work with hostinet. This has caused some
program starts failing because it thinks the feature exists.

PiperOrigin-RevId: 297876196
2020-02-28 10:39:12 -08:00
Fabricio Voznika 72e3f3a3ee Add option to skip stuck tasks waiting for address space
PiperOrigin-RevId: 297192390
2020-02-25 13:44:18 -08:00
Jamie Liu 471b15b212 Port most syscalls to VFS2.
pipe and pipe2 aren't ported, pending a slight rework of pipe FDs for VFS2.
mount and umount2 aren't ported out of temporary laziness. access and faccessat
need additional FSImpl methods to implement properly, but are stubbed to
prevent googletest from CHECK-failing. Other syscalls require additional
plumbing.

Updates #1623

PiperOrigin-RevId: 297188448
2020-02-25 13:37:34 -08:00
Adin Scannell 6def8ea6ac Fix nested logging.
PiperOrigin-RevId: 297175316
2020-02-25 12:25:38 -08:00
gVisor bot 4a73bae269 Initial network namespace support.
TCP/IP will work with netstack networking. hostinet doesn't work, and sockets
will have the same behavior as it is now.

Before the userspace is able to create device, the default loopback device can
be used to test.

/proc/net and /sys/net will still be connected to the root network stack; this
is the same behavior now.

Issue #1833

PiperOrigin-RevId: 296309389
2020-02-20 15:20:40 -08:00
gVisor bot d90d71474f Remove bytes read/written from marshal.Marshallable API.
Users of the API only care about whether the copy in/out succeeds in
their entirety, which is already signalled by the returned error.

PiperOrigin-RevId: 296297843
2020-02-20 14:29:26 -08:00
gVisor bot 5baf9dc2fb Synchronize signalling with S/R
This is to fix a data race between sending an external signal to
a ThreadGroup and kernel saving state for S/R.

PiperOrigin-RevId: 295244281
2020-02-14 15:49:09 -08:00
gVisor bot 87bc2834c9 Enable automated marshalling for RSeqCriticalSection.
PiperOrigin-RevId: 295226468
2020-02-14 14:24:27 -08:00
gVisor bot e4c7f3e6f6 Inline vfs.VirtualFilesystem in Kernel struct
This saves one pointer dereference per VFS access.

Updates #1623

PiperOrigin-RevId: 295216176
2020-02-14 13:40:39 -08:00
gVisor bot 3c26f5ecb0 Enable automated marshalling for struct stat.
This requires fixing a few build issues for non-am64 platforms.

PiperOrigin-RevId: 295196922
2020-02-14 12:08:12 -08:00
gVisor bot 4075de11be Plumb VFS2 inside the Sentry
- Added fsbridge package with interface that can be used to open
  and read from VFS1 and VFS2 files.
- Converted ELF loader to use fsbridge
- Added VFS2 types to FSContext
- Added vfs.MountNamespace to ThreadGroup

Updates #1623

PiperOrigin-RevId: 295183950
2020-02-14 11:12:47 -08:00