Commit Graph

1180 Commits

Author SHA1 Message Date
Nicolas Lacasse 10f2c8db91 Add FilesystemType.Name method, and FilesystemType field to Filesystem struct.
Both have analogues in Linux:
* struct file_system_type has a char *name field.
* struct super_block keeps a pointer to the file_system_type.

These fields are necessary to support the `filesystem type` field in
/proc/[pid]/mountinfo.

PiperOrigin-RevId: 303434063
2020-03-27 16:56:16 -07:00
Dean Deng 76a7ace751 Add BoundEndpointAt filesystem operation.
BoundEndpointAt() is needed to support Unix sockets bound at a
file path, corresponding to BoundEndpoint() in VFS1.

Updates #1476.

PiperOrigin-RevId: 303258251
2020-03-26 21:52:24 -07:00
Dean Deng 137f361400 Use host-defined file owner and mode, when possible, for imported fds.
Using the host-defined file owner matches VFS1. It is more correct to use the
host-defined mode, since the cached value may become out of date. However,
kernfs.Inode.Mode() does not return an error--other filesystems on kernfs are
in-memory so retrieving mode should not fail. Therefore, if the host syscall
fails, we rely on a cached value instead.

Updates #1672.

PiperOrigin-RevId: 303220864
2020-03-26 16:47:20 -07:00
gVisor bot 0e62a548eb Merge pull request #2130 from nybidari:iptables
PiperOrigin-RevId: 303208407
2020-03-26 15:47:00 -07:00
Nicolas Lacasse e466ab04a2 Add unique ID to Mount type.
Analagous to Linux's mount.mnt_id. This ID is displayed in
/proc/[pid]/mountinfo.

PiperOrigin-RevId: 303185564
2020-03-26 13:49:59 -07:00
Nayana Bidari 92b9069b67 Support owner matching for iptables.
This feature will match UID and GID of the packet creator, for locally
generated packets. This match is only valid in the OUTPUT and POSTROUTING
chains. Forwarded packets do not have any socket associated with them.
Packets from kernel threads do have a socket, but usually no owner.
2020-03-26 12:21:24 -07:00
gVisor bot 7aa388ce74 Merge pull request #1986 from lubinszARM:pr_ring0_clean_1
PiperOrigin-RevId: 303105826
2020-03-26 08:49:12 -07:00
Fabricio Voznika de694e5484 Combine file mode and isDir arguments
Updates #1035

PiperOrigin-RevId: 303021328
2020-03-26 08:48:04 -07:00
Fabricio Voznika f2eba94015 Remove TODO to push down exec permission check
Pushing it down requires all implementation to check for
exec individualy which is not maintanable. Making it part
of GenericCheckPermissions add extra cost to everyone that
calls it. So it's better to keep is in
VirtualFilesystem.OpenAt.

Updates #1193

PiperOrigin-RevId: 302982993
2020-03-25 15:57:37 -07:00
Fabricio Voznika e541ebec2f Misc fixes to make stat_test pass (almost)
The only test failing now requires socket which is not
available in VFS2 yet.

Updates #1198

PiperOrigin-RevId: 302976572
2020-03-25 14:59:15 -07:00
Fabricio Voznika c7f5673529 Set file mode and type to attribute
Makes less error prone to find file type.

Updates #1197

PiperOrigin-RevId: 302974244
2020-03-25 14:49:13 -07:00
Bhasker Hariharan d8c4eff3f7 Automated rollback of changelist 301837227
PiperOrigin-RevId: 302891559
2020-03-25 08:11:21 -07:00
Bhasker Hariharan 7e4073af12 Move tcpip.PacketBuffer and IPTables to stack package.
This is a precursor to be being able to build an intrusive list
of PacketBuffers for use in queuing disciplines being implemented.

Updates #2214

PiperOrigin-RevId: 302677662
2020-03-24 09:06:26 -07:00
Ian Lewis a730d74b32 Support basic /proc/net/dev metrics for netstack
Fixes #506

PiperOrigin-RevId: 302540404
2020-03-23 16:12:58 -07:00
Bhasker Hariharan 369cf38bd7 Fix data race in SetSockOpt.
PiperOrigin-RevId: 302539171
2020-03-23 16:06:33 -07:00
Dean Deng 6eebaea949 Correctly release taskPathOperation for accessAt.
PiperOrigin-RevId: 302518924
2020-03-23 14:33:15 -07:00
Dean Deng 248e46f320 Whitelist utimensat(2).
utimensat is used by hostfs for setting timestamps on imported fds. Previously,
this would crash the sandbox since utimensat was not allowed.

Correct the VFS2 version of hostfs to match the call in VFS1.

PiperOrigin-RevId: 301970121
2020-03-19 23:30:21 -07:00
Zach Koopmans 57d9bd922b Remove the "frozen" bit from dirents.
Frozen was to lock down changes to the host filesystem
for hostFS. Now that hostFS is gone, it can be removed.

PiperOrigin-RevId: 301907923
2020-03-19 15:30:13 -07:00
Bhasker Hariharan 3a37f67917 Change SocketOperations.readMu to an RWMutex.
Also get rid of the readViewHasData as it's not required anymore.

Updates #231, #357

PiperOrigin-RevId: 301837227
2020-03-19 10:00:31 -07:00
Bhasker Hariharan e9e399c25d Remove workMu from tcpip.Endpoint.
workMu is removed and e.mu is now a mutex that supports TryLock.  The packet
processing path tries to lock the mutex and if its locked it will just queue the
packet and move on. The endpoint.UnlockUser() will process any backlog of
packets before unlocking the socket.

This simplifies the locking inside tcp endpoints a lot. Further the
endpoint.LockUser() implements spinning as long as the lock is not held by
another syscall goroutine. This ensures low latency as not spinning leads to the
task thread being put to sleep if the lock is held by the packet dispatch
path. This is suboptimal as the lower layer rarely holds the lock for long so
implementing spinning here helps.

If the lock is held by another task goroutine then we just proceed to call
LockUser() and the task could be put to sleep.

The protocol goroutines themselves just call e.mu.Lock() and block if the
lock is currently not available.

Updates #231, #357

PiperOrigin-RevId: 301808349
2020-03-19 07:19:58 -07:00
Dean Deng 3a42638a0b Port imported TTY fds to vfs2.
Refactor fs/host.TTYFileOperations so that the relevant functionality can be
shared with VFS2 (fsimpl/host.ttyFD).

Incorporate host.defaultFileFD into the default host.fileDescription. This way,
there is no need for a separate default_file.go. As in vfs1, the TTY file
implementation can be built on top of this default and override operations as
necessary (PRead/Read/PWrite/Write, Release, Ioctl).

Note that these changes still need to be plumbed into runsc, which refers to
imported TTYs in control/proc.go:ExecAsync.

Updates #1672.

PiperOrigin-RevId: 301718157
2020-03-18 19:12:10 -07:00
gVisor bot a0fed7ea45 Merge pull request #2061 from lubinszARM:pr_restart_syscall
PiperOrigin-RevId: 301700868
2020-03-18 17:11:43 -07:00
Fabricio Voznika f1d1af2a4a Fix FDTable.NewFDVFS2
It was looking at VFS1 table to determine where to
allocate the next FD from.

Updates #1035

PiperOrigin-RevId: 301678858
2020-03-18 15:13:42 -07:00
Zach Koopmans 42d78ba61b Remove HostFS from Sentry.
PiperOrigin-RevId: 301402181
2020-03-17 10:30:32 -07:00
Andrei Vagin b55f0e5d40 fdtable: don't try to zap fdtable entry if close is called for non-existing fd
FDTable.setAll is used to zap entries, but it grows the table up to
a specified fd.

Reported-by: syzbot+9e281b0750d2d4caa190@syzkaller.appspotmail.com
PiperOrigin-RevId: 301280000
2020-03-16 18:29:58 -07:00
Fabricio Voznika 2a6c4369be Enforce file size rlimits in VFS2
Updates #1035

PiperOrigin-RevId: 301255357
2020-03-16 16:00:49 -07:00
Fabricio Voznika 0f60799a4f Add calls to vfs.CheckSetStat to fsimpls
Only gofer filesystem was calling vfs.CheckSetStat for
vfs.FilesystemImpl.SetStatAt and vfs.FileDescriptionImpl.SetStat.

Updates #1193, #1672, #1197

PiperOrigin-RevId: 301226522
2020-03-16 13:29:12 -07:00
gVisor bot 159a230b9b Merge pull request #1943 from kevinGC:ipt-filter-ip
PiperOrigin-RevId: 301197007
2020-03-16 11:13:14 -07:00
Fabricio Voznika 9712775028 Disallow kernfs.Inode.SetStat for readonly inodes
Updates #1195, #1193

PiperOrigin-RevId: 300950993
2020-03-14 13:48:06 -07:00
Dean Deng 5e413cad10 Plumb VFS2 imported fds into virtual filesystem.
- When setting up the virtual filesystem, mount a host.filesystem to contain
  all files that need to be imported.
- Make read/preadv syscalls to the host in cases where preadv2 may not be
  supported yet (likewise for writing).
- Make save/restore functions in kernel/kernel.go return early if vfs2 is
  enabled.

PiperOrigin-RevId: 300922353
2020-03-14 07:14:33 -07:00
Fabricio Voznika 45a8ae240d Add remaining procfs files
Closes #1195

PiperOrigin-RevId: 300867055
2020-03-13 18:57:07 -07:00
Fabricio Voznika 829beebf0b Panic if file in FDTable has been destroyed
This will give more information about the file to
identify where possibly the extra DecRef()
would be.

PiperOrigin-RevId: 300855874
2020-03-13 17:18:10 -07:00
Jamie Liu b0f2c3e764 Fix infinite loop in semaphore.sem.wakeWaiters().
PiperOrigin-RevId: 300845134
2020-03-13 16:09:18 -07:00
Jamie Liu 1c05352970 Fix oom_score_adj.
- Make oomScoreAdj a ThreadGroup field (Linux: signal_struct::oom_score_adj).

- Avoid deadlock caused by Task.OOMScoreAdj()/SetOOMScoreAdj() locking Task.mu
  and TaskSet.mu in the wrong order (via Task.ExitState()).

PiperOrigin-RevId: 300814698
2020-03-13 13:19:13 -07:00
Jamie Liu b78cee3bae Fix lock recursion in kernel.ProcessGroup.SendSignal().
PiperOrigin-RevId: 300803515
2020-03-13 12:18:36 -07:00
Dean Deng 2e38408f20 Implement access/faccessat for VFS2.
Note that the raw faccessat system call does not actually take a flags argument;
according to faccessat(2), the glibc wrapper implements the flags by using
fstatat(2). Remove the flag argument that we try to extract from vfs1, which
would just be a garbage value.

Updates #1965
Fixes #2101

PiperOrigin-RevId: 300796067
2020-03-13 11:41:08 -07:00
Ting-Yu Wang f458a325e9 Fix "application exiting with {Code:0 Signo:27}" during boot.
2aa9514a06 skips SIGURG, but later code expects
the sigchans array contains consecutive signal numbers.

PiperOrigin-RevId: 300793450
2020-03-13 11:26:45 -07:00
Fabricio Voznika 8f8f16efaf Add support for mount flags
Plumbs MS_NOEXEC and MS_RDONLY. Others are TODO.

Updates #1623 #1193

PiperOrigin-RevId: 300764669
2020-03-13 08:58:04 -07:00
Bin Lu 7df936f359 passed the syscall test case 'alarm' on Arm64 platform
This issue was caused by 'restart_syscall'.
The value of Register R0 should be stored after finishing sysemu.
So that we can restore the value and restart syscall.

Signed-off-by: Bin Lu <bin.lu@arm.com>
2020-03-12 05:57:47 -04:00
Andrei Vagin 22d89ef5cb Import "unsafe" in bluepill_arm64_unsafe.go
This fixes a compile time error:
pkg/sentry/platform/kvm/bluepill_arm64_unsafe.go:45:35: undefined: unsafe

PiperOrigin-RevId: 300375687
2020-03-11 12:01:46 -07:00
gVisor bot 2c2622b942 Merge pull request #1975 from nybidari:iptables
PiperOrigin-RevId: 300362789
2020-03-11 11:02:04 -07:00
Andrei Vagin 2aa9514a06 runsc: don't redirect SIGURG which is used by Go's runtime scheduler
Go 1.14+ sends SIGURG to Ms to attempt asynchronous preemption of a G. Since it
can't guarantee that a SIGURG is only related to preemption, it continues to
forward them to signal.Notify (see runtime.sighandler).

When runsc is running a container, there are three processes: a parent process
and two children (sandbox and gopher). A parent process sets a signal handler
for all signals and redirect them to the container init process. This logic
should ignore SIGURG signals. We already ignore them in the Sentry, but it will
be better to not notify about them when this is possible.

PiperOrigin-RevId: 300345286
2020-03-11 09:50:06 -07:00
gVisor bot 24e7005ab6 Merge pull request #1832 from xiaobo55x:tls_ptrace
PiperOrigin-RevId: 300270894
2020-03-11 01:06:19 -07:00
Ting-Yu Wang b36de6e7be Move /proc/net to /proc/PID/net, and make /proc/net -> /proc/self/net.
Issue #1833

PiperOrigin-RevId: 299998105
2020-03-09 19:59:09 -07:00
Haibo Xu c04958e2fa Enable thread local storage support on arm64.
Linux use the task.thread.uw.tp_value field to store the
TLS pointer on arm64 platform, and we use a similar way
in gvisor to store it in the arch/State struct.

Signed-off-by: Haibo Xu <haibo.xu@arm.com>
Change-Id: Ie76b5c6d109bc27ccfd594008a96753806db7764
2020-03-09 01:04:55 +00:00
Dean Deng 228813fd26 Update comments and debug level for profiling options.
PiperOrigin-RevId: 299448307
2020-03-06 15:23:46 -08:00
Dean Deng 960f6a975b Add plumbing for importing fds in VFS2, along with non-socket, non-TTY impl.
In VFS2, imported file descriptors are stored in a kernfs-based filesystem.
Upon calling ImportFD, the host fd can be accessed in two ways:
1. a FileDescription that can be added to the FDTable, and
2. a Dentry in the host.filesystem mount, which we will want to access through
magic symlinks in /proc/[pid]/fd/.

An implementation of the kernfs.Inode interface stores a unique host fd. This
inode can be inserted into file descriptions as well as dentries.

This change also plumbs in three FileDescriptionImpls corresponding to fds for
sockets, TTYs, and other files (only the latter is implemented here).
These implementations will mostly make corresponding syscalls to the host.
Where possible, the logic is ported over from pkg/sentry/fs/host.

Updates #1672

PiperOrigin-RevId: 299417263
2020-03-06 12:59:49 -08:00
Tamir Duberstein 6fa5cee82c Prevent memory leaks in ilist
When list elements are removed from a list but not discarded, it becomes
important to invalidate the references they hold to their former
neighbors to prevent memory leaks.

PiperOrigin-RevId: 299412421
2020-03-06 12:31:43 -08:00
gVisor bot 18d41cf153 Merge pull request #1963 from xiaobo55x:kvm_common
PiperOrigin-RevId: 299405855
2020-03-06 12:05:30 -08:00
gVisor bot 56c4272568 Merge pull request #1946 from xiaobo55x:dieTramp
PiperOrigin-RevId: 299405663
2020-03-06 12:01:23 -08:00