Commit Graph

72 Commits

Author SHA1 Message Date
Fabricio Voznika 2a6c4369be Enforce file size rlimits in VFS2
Updates #1035

PiperOrigin-RevId: 301255357
2020-03-16 16:00:49 -07:00
Fabricio Voznika 0f60799a4f Add calls to vfs.CheckSetStat to fsimpls
Only gofer filesystem was calling vfs.CheckSetStat for
vfs.FilesystemImpl.SetStatAt and vfs.FileDescriptionImpl.SetStat.

Updates #1193, #1672, #1197

PiperOrigin-RevId: 301226522
2020-03-16 13:29:12 -07:00
Fabricio Voznika 9712775028 Disallow kernfs.Inode.SetStat for readonly inodes
Updates #1195, #1193

PiperOrigin-RevId: 300950993
2020-03-14 13:48:06 -07:00
Dean Deng 5e413cad10 Plumb VFS2 imported fds into virtual filesystem.
- When setting up the virtual filesystem, mount a host.filesystem to contain
  all files that need to be imported.
- Make read/preadv syscalls to the host in cases where preadv2 may not be
  supported yet (likewise for writing).
- Make save/restore functions in kernel/kernel.go return early if vfs2 is
  enabled.

PiperOrigin-RevId: 300922353
2020-03-14 07:14:33 -07:00
Fabricio Voznika 45a8ae240d Add remaining procfs files
Closes #1195

PiperOrigin-RevId: 300867055
2020-03-13 18:57:07 -07:00
Jamie Liu 1c05352970 Fix oom_score_adj.
- Make oomScoreAdj a ThreadGroup field (Linux: signal_struct::oom_score_adj).

- Avoid deadlock caused by Task.OOMScoreAdj()/SetOOMScoreAdj() locking Task.mu
  and TaskSet.mu in the wrong order (via Task.ExitState()).

PiperOrigin-RevId: 300814698
2020-03-13 13:19:13 -07:00
Dean Deng 2e38408f20 Implement access/faccessat for VFS2.
Note that the raw faccessat system call does not actually take a flags argument;
according to faccessat(2), the glibc wrapper implements the flags by using
fstatat(2). Remove the flag argument that we try to extract from vfs1, which
would just be a garbage value.

Updates #1965
Fixes #2101

PiperOrigin-RevId: 300796067
2020-03-13 11:41:08 -07:00
Ting-Yu Wang b36de6e7be Move /proc/net to /proc/PID/net, and make /proc/net -> /proc/self/net.
Issue #1833

PiperOrigin-RevId: 299998105
2020-03-09 19:59:09 -07:00
Dean Deng 960f6a975b Add plumbing for importing fds in VFS2, along with non-socket, non-TTY impl.
In VFS2, imported file descriptors are stored in a kernfs-based filesystem.
Upon calling ImportFD, the host fd can be accessed in two ways:
1. a FileDescription that can be added to the FDTable, and
2. a Dentry in the host.filesystem mount, which we will want to access through
magic symlinks in /proc/[pid]/fd/.

An implementation of the kernfs.Inode interface stores a unique host fd. This
inode can be inserted into file descriptions as well as dentries.

This change also plumbs in three FileDescriptionImpls corresponding to fds for
sockets, TTYs, and other files (only the latter is implemented here).
These implementations will mostly make corresponding syscalls to the host.
Where possible, the logic is ported over from pkg/sentry/fs/host.

Updates #1672

PiperOrigin-RevId: 299417263
2020-03-06 12:59:49 -08:00
Ian Lewis da48fc6cca Stub oom_score_adj and oom_score.
Adds an oom_score_adj and oom_score proc file stub. oom_score_adj accepts
writes of values -1000 to 1000 and persists the value with the task. New tasks
inherit the parent's oom_score_adj.

oom_score is a read-only stub that always returns the value '0'.

Issue #202

PiperOrigin-RevId: 299245355
2020-03-05 18:23:01 -08:00
Fabricio Voznika 122d47aed1 Update cached file size when cache is skipped
gofer.dentryReadWriter.WriteFromBlocks was not updating
gofer.dentry.size after a write operation that skips the
cache.

Updates #1198

PiperOrigin-RevId: 298708646
2020-03-03 15:29:13 -08:00
Fabricio Voznika 0f8a9e3623 Change dup2 call to dup3
We changed syscalls to allow dup3 for ARM64.

Updates #1198

PiperOrigin-RevId: 297870816
2020-02-28 10:15:20 -08:00
Jamie Liu 471b15b212 Port most syscalls to VFS2.
pipe and pipe2 aren't ported, pending a slight rework of pipe FDs for VFS2.
mount and umount2 aren't ported out of temporary laziness. access and faccessat
need additional FSImpl methods to implement properly, but are stubbed to
prevent googletest from CHECK-failing. Other syscalls require additional
plumbing.

Updates #1623

PiperOrigin-RevId: 297188448
2020-02-25 13:37:34 -08:00
gVisor bot 4a73bae269 Initial network namespace support.
TCP/IP will work with netstack networking. hostinet doesn't work, and sockets
will have the same behavior as it is now.

Before the userspace is able to create device, the default loopback device can
be used to test.

/proc/net and /sys/net will still be connected to the root network stack; this
is the same behavior now.

Issue #1833

PiperOrigin-RevId: 296309389
2020-02-20 15:20:40 -08:00
gVisor bot 10ed60e477 VFS2: Support memory mapping in tmpfs.
tmpfs.fileDescription now implements ConfigureMMap. And tmpfs.regularFile
implement memmap.Mappable. The methods are mostly unchanged from VFS1 tmpfs.

PiperOrigin-RevId: 296234557
2020-02-20 09:58:10 -08:00
gVisor bot 3557b26651 Allow vfs.IterDirentsCallback.Handle() to return an error.
This is easier than storing errors from e.g. CopyOut in the callback.

PiperOrigin-RevId: 295230021
2020-02-14 14:40:35 -08:00
gVisor bot e4c7f3e6f6 Inline vfs.VirtualFilesystem in Kernel struct
This saves one pointer dereference per VFS access.

Updates #1623

PiperOrigin-RevId: 295216176
2020-02-14 13:40:39 -08:00
gVisor bot 4075de11be Plumb VFS2 inside the Sentry
- Added fsbridge package with interface that can be used to open
  and read from VFS1 and VFS2 files.
- Converted ELF loader to use fsbridge
- Added VFS2 types to FSContext
- Added vfs.MountNamespace to ThreadGroup

Updates #1623

PiperOrigin-RevId: 295183950
2020-02-14 11:12:47 -08:00
gVisor bot a6024f7f5f Add FileExec flag to OpenOptions
This allow callers to say whether the file is being
opened to be executed, so that the proper checks can
be done from FilesystemImpl.OpenAt()

Updates #1623

PiperOrigin-RevId: 295042595
2020-02-13 17:57:36 -08:00
gVisor bot 6dced977ea Ensure fsimpl/gofer.dentryPlatformFile.hostFileMapper is initialized.
Fixes #1812. (The more direct cause of the deadlock is panic unsafety because
the historically high cost of defer means that we avoid it in hot paths,
including much of MM; defer is much cheaper as of Go 1.14, but still a
measurable overhead.)

PiperOrigin-RevId: 294560316
2020-02-11 17:38:57 -08:00
Adin Scannell bb22ebd7fb Add contextual comment.
PiperOrigin-RevId: 294289066
2020-02-10 13:21:30 -08:00
Adin Scannell 4d4d47f0c0 Add contextual note.
PiperOrigin-RevId: 294285723
2020-02-10 13:05:27 -08:00
Fabricio Voznika bfa0bba72a Redirect FIXME to gvisor.dev
PiperOrigin-RevId: 294272755
2020-02-10 12:04:38 -08:00
Fabricio Voznika dcffddf0ca Remove argument from vfs.MountNamespace.DecRef()
Updates #1035

PiperOrigin-RevId: 293194631
2020-02-04 11:48:36 -08:00
Jamie Liu 492229d017 VFS2 gofer client
Updates #1198

Opening host pipes (by spinning in fdpipe) and host sockets is not yet
complete, and will be done in a future CL.

Major differences from VFS1 gofer client (sentry/fs/gofer), with varying levels
of backportability:

- "Cache policies" are replaced by InteropMode, which control the behavior of
  timestamps in addition to caching. Under InteropModeExclusive (analogous to
  cacheAll) and InteropModeWritethrough (analogous to cacheAllWritethrough),
  client timestamps are *not* written back to the server (it is not possible in
  9P or Linux for clients to set ctime, so writing back client-authoritative
  timestamps results in incoherence between atime/mtime and ctime). Under
  InteropModeShared (analogous to cacheRemoteRevalidating), client timestamps
  are not used at all (remote filesystem clocks are authoritative). cacheNone
  is translated to InteropModeShared + new option
  filesystemOptions.specialRegularFiles.

- Under InteropModeShared, "unstable attribute" reloading for permission
  checks, lookup, and revalidation are fused, which is feasible in VFS2 since
  gofer.filesystem controls path resolution. This results in a ~33% reduction
  in RPCs for filesystem operations compared to cacheRemoteRevalidating. For
  example, consider stat("/foo/bar/baz") where "/foo/bar/baz" fails
  revalidation, resulting in the instantiation of a new dentry:

  VFS1 RPCs:
  getattr("/")                          // fs.MountNamespace.FindLink() => fs.Inode.CheckPermission() => gofer.inodeOperations.check() => gofer.inodeOperations.UnstableAttr()
  walkgetattr("/", "foo") = fid1        // fs.Dirent.walk() => gofer.session.Revalidate() => gofer.cachePolicy.Revalidate()
  clunk(fid1)
  getattr("/foo")                       // CheckPermission
  walkgetattr("/foo", "bar") = fid2     // Revalidate
  clunk(fid2)
  getattr("/foo/bar")                   // CheckPermission
  walkgetattr("/foo/bar", "baz") = fid3 // Revalidate
  clunk(fid3)
  walkgetattr("/foo/bar", "baz") = fid4 // fs.Dirent.walk() => gofer.inodeOperations.Lookup
  getattr("/foo/bar/baz")               // linux.stat() => gofer.inodeOperations.UnstableAttr()

  VFS2 RPCs:
  getattr("/")                          // gofer.filesystem.walkExistingLocked()
  walkgetattr("/", "foo") = fid1        // gofer.filesystem.stepExistingLocked()
  clunk(fid1)
                                        // No getattr: walkgetattr already updated metadata for permission check
  walkgetattr("/foo", "bar") = fid2
  clunk(fid2)
  walkgetattr("/foo/bar", "baz") = fid3
                                        // No clunk: fid3 used for new gofer.dentry
                                        // No getattr: walkgetattr already updated metadata for stat()

- gofer.filesystem.unlinkAt() does not require instantiation of a dentry that
  represents the file to be deleted. Updates #898.

- gofer.regularFileFD.OnClose() skips Tflushf for regular files under
  InteropModeExclusive, as it's nonsensical to request a remote file flush
  without flushing locally-buffered writes to that remote file first.

- Symlink targets are cached when InteropModeShared is not in effect.

- p9.QID.Path (which is already required to be unique for each file within a
  server, and is accordingly already synthesized from device/inode numbers in
  all known gofers) is used as-is for inode numbers, rather than being mapped
  along with attr.RDev in the client to yet another synthetic inode number.

- Relevant parts of fsutil.CachingInodeOperations are inlined directly into
  gofer package code. This avoids having to duplicate part of its functionality
  in fsutil.HostMappable.

PiperOrigin-RevId: 293190213
2020-02-04 11:29:22 -08:00
Dean Deng 6c3072243d Implement file locks for regular tmpfs files in VFSv2.
Add a file lock implementation that can be embedded into various filesystem
implementations.

Updates #1480

PiperOrigin-RevId: 292614758
2020-01-31 14:15:41 -08:00
Fabricio Voznika 396c574db2 Add support for WritableSource in DynamicBytesFileDescriptionImpl
WritableSource is a convenience interface used for files that can
be written to, e.g. /proc/net/ipv4/tpc_sack. It reads max of 4KB
and only from offset 0 which should cover most cases. It can be
extended as neeed.

Updates #1195

PiperOrigin-RevId: 292056924
2020-01-28 18:31:28 -08:00
Jamie Liu 2862b0b1be Add //pkg/sentry/fsimpl/devtmpfs.
PiperOrigin-RevId: 292021389
2020-01-28 15:05:24 -08:00
Adin Scannell 0e2f1b7abd Update package locations.
Because the abi will depend on the core types for marshalling (usermem,
context, safemem, safecopy), these need to be flattened from the sentry
directory. These packages contain no sentry-specific details.

PiperOrigin-RevId: 291811289
2020-01-27 15:31:32 -08:00
Adin Scannell d29e59af9f Standardize on tools directory.
PiperOrigin-RevId: 291745021
2020-01-27 12:21:00 -08:00
Jamie Liu 18a7e1309d Add support for device special files to VFS2 tmpfs.
PiperOrigin-RevId: 291471892
2020-01-24 17:07:54 -08:00
Rahat Mahmood 896bd654b6 De-duplicate common test functionality for VFS2 filesystems.
PiperOrigin-RevId: 291041576
2020-01-22 15:16:21 -08:00
Jamie Liu 5ab1213a6c Move VFS2 handling of FD readability/writability to vfs.FileDescription.
PiperOrigin-RevId: 291006713
2020-01-22 12:29:36 -08:00
Rahat Mahmood ad1968ed56 Implement sysfs.
PiperOrigin-RevId: 290822487
2020-01-21 15:13:26 -08:00
Fabricio Voznika d46c397a1c Add line break to /proc/net files
Some files were missing the last line break.

PiperOrigin-RevId: 290808898
2020-01-21 13:28:24 -08:00
Fabricio Voznika 8e8d0f96f6 Add /proc/[pid]/cgroups file
Updates #1195

PiperOrigin-RevId: 290298266
2020-01-17 10:41:44 -08:00
Fabricio Voznika ff99609858 Add /proc/net/* files
Updates #1195

PiperOrigin-RevId: 290285420
2020-01-17 10:21:46 -08:00
Nicolas Lacasse 70d7c52bd7 Implement tmpfs.SetStat with a size argument.
This is similar to 'Truncate' in vfs1.

Updates https://github.com/google/gvisor/issues/1197

PiperOrigin-RevId: 290139140
2020-01-16 14:39:55 -08:00
Fabricio Voznika 3dd3275da7 Add more files to /proc/[pid]/*
Files not implemented require VFSv2 plumbing into the kernel.
Also, cgroup is not implemented yet.

Updates #1195

PiperOrigin-RevId: 290129176
2020-01-16 14:10:05 -08:00
Fabricio Voznika 7b7c31820b Add remaining /proc/* and /proc/sys/* files
Except for one under /proc/sys/net/ipv4/tcp_sack.
/proc/pid/* is still incomplete.

Updates #1195

PiperOrigin-RevId: 290120438
2020-01-16 12:30:21 -08:00
Nicolas Lacasse d6fb1ec6c7 Add timestamps to VFS2 tmpfs, and implement some of SetStat.
PiperOrigin-RevId: 289962040
2020-01-15 16:32:55 -08:00
Ian Gudger 27500d529f New sync package.
* Rename syncutil to sync.
* Add aliases to sync types.
* Replace existing usage of standard library sync package.

This will make it easier to swap out synchronization primitives. For example,
this will allow us to use primitives from github.com/sasha-s/go-deadlock to
check for lock ordering violations.

Updates #1472

PiperOrigin-RevId: 289033387
2020-01-09 22:02:24 -08:00
Dean Deng 565b641483 Define sizes for extent headers and entries separately to improve clarity.
PiperOrigin-RevId: 288799694
2020-01-08 16:58:12 -08:00
Fabricio Voznika db376e1392 Make /proc/[pid] offset start at TGID_OFFSET
Updates #1195

PiperOrigin-RevId: 288725745
2020-01-08 10:45:12 -08:00
Nicolas Lacasse 51f3ab85e0 Convert memfs into proto-tmpfs.
- Renamed memfs to tmpfs.
- Copied fileRangeSet bits from fs/fsutil/ to fsimpl/tmpfs/
- Changed tmpfs to be backed by filemem instead of byte slice.
- regularFileReadWriter uses a sync.Pool, similar to gofer client.

PiperOrigin-RevId: 288356380
2020-01-06 12:52:55 -08:00
Jamie Liu 1f384ac42b Add VFS2 support for device special files.
- Add FileDescriptionOptions.UseDentryMetadata, which reduces the amount of
  boilerplate needed for device FDs and the like between filesystems.

- Switch back to having FileDescription.Init() take references on the Mount and
  Dentry; otherwise managing refcounts around failed calls to
  OpenDeviceSpecialFile() / Device.Open() is tricky.

PiperOrigin-RevId: 287575574
2019-12-30 11:36:41 -08:00
Jamie Liu 796f53c0be Add VFS2 support for /proc/filesystems.
Updates #1195

PiperOrigin-RevId: 287269106
2019-12-27 00:13:54 -08:00
Fabricio Voznika 3c125eb219 Initial procfs implementation in VFSv2
Updates #1195

PiperOrigin-RevId: 287227722
2019-12-26 14:45:35 -08:00
Fabricio Voznika 574e988f2b Fix deadlock in kernfs.Filesystem.revalidateChildLocked
It was calling Dentry.InsertChild with the dentry's mutex
already locked.

Updates #1035

PiperOrigin-RevId: 286962742
2019-12-23 17:32:46 -08:00
Jamie Liu f45df7505b Clean up vfs.FilesystemImpl methods that operate on parent directories.
- Make FilesystemImpl methods that operate on parent directories require
  !rp.Done() (i.e. there is at least one path component to resolve) as
  precondition and postcondition (in cases where they do not finish path
  resolution due to mount boundary / absolute symlink), and require that they
  do not need to follow the last path component (the file being created /
  deleted) as a symlink. Check for these in VFS.

- Add FilesystemImpl.GetParentDentryAt(), which is required to obtain the old
  parent directory for VFS.RenameAt(). (Passing the Dentry to be renamed
  instead has the wrong semantics if the file named by the old path is a mount
  point since the Dentry will be on the wrong Mount.)

- Update memfs to implement these methods correctly (?), including RenameAt.

- Change fspath.Parse() to allow empty paths (to simplify implementation of
  AT_EMPTY_PATH).

- Change vfs.PathOperation to take a fspath.Path instead of a raw pathname;
  non-test callers will need to fspath.Parse() pathnames themselves anyway in
  order to detect absolute paths and select PathOperation.Start accordingly.

PiperOrigin-RevId: 286934941
2019-12-23 13:18:39 -08:00