88 Commits

Author SHA1 Message Date
Dean Deng 24bee1c181 Record VFS2 sockets in global socket map.
Updates #1476, #1478, #1484, #1485.

PiperOrigin-RevId: 304845354
2020-04-04 21:02:42 -07:00
Dean Deng 5818663ebe Add FileDescriptionImpl for Unix sockets.
This change involves several steps:
- Refactor the VFS1 unix socket implementation to share methods between VFS1
  and VFS2 where possible. Re-implement the rest.
- Override the default PRead, Read, PWrite, Write, Ioctl, Release methods in
  FileDescriptionDefaultImpl.
- Add functions to create and initialize a new Dentry/Inode and FileDescription
  for a Unix socket file.

Updates #1476

PiperOrigin-RevId: 304689796
2020-04-03 14:08:54 -07:00
Fabricio Voznika dd3bc49997 Add NAME_MAX checks and update file times
NAME_MAX should be enforced per filesystem implementation
because other file systems may not have the same restriction.

Gofer filesystem now keeps a reference to the kernel clock to
avoid lookup in the Context on file access to update atime.

Update access, modification, and status change times in tmpfs.

Updates #1197, #1198.

PiperOrigin-RevId: 304527148
2020-04-02 19:39:03 -07:00
Dean Deng 5b2396d244 Fix typo in TODO comments.
PiperOrigin-RevId: 304508083
2020-04-02 17:07:13 -07:00
Nicolas Lacasse 0d1e299079 Pass configurable FilesystemType to tmpfs.
PiperOrigin-RevId: 304234086
2020-04-01 12:06:37 -07:00
Dean Deng 639d94f9f7 Add socket filesystem and global disconnected socket mount for VFS2.
A socket mount where anonymous sockets will reside is added to the
VirtualFilesystem. Socketfs is built on top of kernfs.

Updates #1476, #1478, #1484, #1485.

PiperOrigin-RevId: 304095251
2020-03-31 19:17:12 -07:00
Nicolas Lacasse e1c8eaca8f Fix /proc/self/mounts and /proc/self/mountinfo in VFS2.
Some extra fields were added to the Mount type to expose necessary data to the
proc filesystem.

PiperOrigin-RevId: 304053361
2020-03-31 15:07:26 -07:00
Nicolas Lacasse 9de982ea79 Allow passing root file type to tmpfs.
PiperOrigin-RevId: 304053357
2020-03-31 15:02:57 -07:00
Nicolas Lacasse 10f2c8db91 Add FilesystemType.Name method, and FilesystemType field to Filesystem struct.
Both have analogues in Linux:
* struct file_system_type has a char *name field.
* struct super_block keeps a pointer to the file_system_type.

These fields are necessary to support the `filesystem type` field in
/proc/[pid]/mountinfo.

PiperOrigin-RevId: 303434063
2020-03-27 16:56:16 -07:00
Dean Deng 76a7ace751 Add BoundEndpointAt filesystem operation.
BoundEndpointAt() is needed to support Unix sockets bound at a
file path, corresponding to BoundEndpoint() in VFS1.

Updates #1476.

PiperOrigin-RevId: 303258251
2020-03-26 21:52:24 -07:00
Dean Deng 137f361400 Use host-defined file owner and mode, when possible, for imported fds.
Using the host-defined file owner matches VFS1. It is more correct to use the
host-defined mode, since the cached value may become out of date. However,
kernfs.Inode.Mode() does not return an error--other filesystems on kernfs are
in-memory so retrieving mode should not fail. Therefore, if the host syscall
fails, we rely on a cached value instead.

Updates #1672.

PiperOrigin-RevId: 303220864
2020-03-26 16:47:20 -07:00
Fabricio Voznika de694e5484 Combine file mode and isDir arguments
Updates #1035

PiperOrigin-RevId: 303021328
2020-03-26 08:48:04 -07:00
Fabricio Voznika e541ebec2f Misc fixes to make stat_test pass (almost)
The only test still failing requires sockets, which are not yet
available in VFS2.

Updates #1198

PiperOrigin-RevId: 302976572
2020-03-25 14:59:15 -07:00
Fabricio Voznika c7f5673529 Set file mode and type to attribute
Makes finding the file type less error prone.

Updates #1197

PiperOrigin-RevId: 302974244
2020-03-25 14:49:13 -07:00
Dean Deng 248e46f320 Whitelist utimensat(2).
utimensat is used by hostfs for setting timestamps on imported fds. Previously,
this would crash the sandbox since utimensat was not allowed.

Correct the VFS2 version of hostfs to match the call in VFS1.

PiperOrigin-RevId: 301970121
2020-03-19 23:30:21 -07:00
Dean Deng 3a42638a0b Port imported TTY fds to vfs2.
Refactor fs/host.TTYFileOperations so that the relevant functionality can be
shared with VFS2 (fsimpl/host.ttyFD).

Incorporate host.defaultFileFD into the default host.fileDescription. This way,
there is no need for a separate default_file.go. As in vfs1, the TTY file
implementation can be built on top of this default and override operations as
necessary (PRead/Read/PWrite/Write, Release, Ioctl).

Note that these changes still need to be plumbed into runsc, which refers to
imported TTYs in control/proc.go:ExecAsync.

Updates #1672.

PiperOrigin-RevId: 301718157
2020-03-18 19:12:10 -07:00
Fabricio Voznika 2a6c4369be Enforce file size rlimits in VFS2
Updates #1035

PiperOrigin-RevId: 301255357
2020-03-16 16:00:49 -07:00
Fabricio Voznika 0f60799a4f Add calls to vfs.CheckSetStat to fsimpls
Only the gofer filesystem was calling vfs.CheckSetStat for
vfs.FilesystemImpl.SetStatAt and vfs.FileDescriptionImpl.SetStat.

Updates #1193, #1672, #1197

PiperOrigin-RevId: 301226522
2020-03-16 13:29:12 -07:00
Fabricio Voznika 9712775028 Disallow kernfs.Inode.SetStat for readonly inodes
Updates #1195, #1193

PiperOrigin-RevId: 300950993
2020-03-14 13:48:06 -07:00
Dean Deng 5e413cad10 Plumb VFS2 imported fds into virtual filesystem.
- When setting up the virtual filesystem, mount a host.filesystem to contain
  all files that need to be imported.
- Make read/preadv syscalls to the host in cases where preadv2 may not be
  supported yet (likewise for writing).
- Make save/restore functions in kernel/kernel.go return early if vfs2 is
  enabled.

PiperOrigin-RevId: 300922353
2020-03-14 07:14:33 -07:00
Fabricio Voznika 45a8ae240d Add remaining procfs files
Closes #1195

PiperOrigin-RevId: 300867055
2020-03-13 18:57:07 -07:00
Jamie Liu 1c05352970 Fix oom_score_adj.
- Make oomScoreAdj a ThreadGroup field (Linux: signal_struct::oom_score_adj).

- Avoid deadlock caused by Task.OOMScoreAdj()/SetOOMScoreAdj() locking Task.mu
  and TaskSet.mu in the wrong order (via Task.ExitState()).

PiperOrigin-RevId: 300814698
2020-03-13 13:19:13 -07:00
Dean Deng 2e38408f20 Implement access/faccessat for VFS2.
Note that the raw faccessat system call does not actually take a flags argument;
according to faccessat(2), the glibc wrapper implements the flags by using
fstatat(2). Remove the flag argument that we try to extract from vfs1, which
would just be a garbage value.

Updates #1965
Fixes #2101

PiperOrigin-RevId: 300796067
2020-03-13 11:41:08 -07:00
Ting-Yu Wang b36de6e7be Move /proc/net to /proc/PID/net, and make /proc/net -> /proc/self/net.
Issue #1833

PiperOrigin-RevId: 299998105
2020-03-09 19:59:09 -07:00
Dean Deng 960f6a975b Add plumbing for importing fds in VFS2, along with non-socket, non-TTY impl.
In VFS2, imported file descriptors are stored in a kernfs-based filesystem.
Upon calling ImportFD, the host fd can be accessed in two ways:
1. a FileDescription that can be added to the FDTable, and
2. a Dentry in the host.filesystem mount, which we will want to access through
magic symlinks in /proc/[pid]/fd/.

An implementation of the kernfs.Inode interface stores a unique host fd. This
inode can be inserted into file descriptions as well as dentries.

This change also plumbs in three FileDescriptionImpls corresponding to fds for
sockets, TTYs, and other files (only the latter is implemented here).
These implementations will mostly make corresponding syscalls to the host.
Where possible, the logic is ported over from pkg/sentry/fs/host.

Updates #1672

PiperOrigin-RevId: 299417263
2020-03-06 12:59:49 -08:00
Ian Lewis da48fc6cca Stub oom_score_adj and oom_score.
Adds an oom_score_adj and oom_score proc file stub. oom_score_adj accepts
writes of values -1000 to 1000 and persists the value with the task. New tasks
inherit the parent's oom_score_adj.

oom_score is a read-only stub that always returns the value '0'.

Issue #202

PiperOrigin-RevId: 299245355
2020-03-05 18:23:01 -08:00
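
The behavior of the oom_score_adj stub described above (range-checked writes, value inherited by new tasks) can be sketched roughly as below. The task type, its methods, and the error value are hypothetical and only model the description in the commit message, not the gVisor implementation.

```go
package main

import (
	"errors"
	"fmt"
)

// Accepted range for oom_score_adj, as stated in the commit message.
const (
	oomScoreAdjMin = -1000
	oomScoreAdjMax = 1000
)

// errInvalid is a placeholder for whatever error the real code returns.
var errInvalid = errors.New("invalid argument")

// task is a hypothetical model of a task carrying an oom_score_adj value.
type task struct {
	oomScoreAdj int32
}

// setOOMScoreAdj persists adj with the task if it is within range.
func (t *task) setOOMScoreAdj(adj int32) error {
	if adj < oomScoreAdjMin || adj > oomScoreAdjMax {
		return errInvalid
	}
	t.oomScoreAdj = adj
	return nil
}

// fork models a new task inheriting the parent's oom_score_adj.
func (t *task) fork() *task {
	return &task{oomScoreAdj: t.oomScoreAdj}
}

func main() {
	parent := &task{}
	fmt.Println(parent.setOOMScoreAdj(500), parent.setOOMScoreAdj(2000))
	fmt.Println(parent.fork().oomScoreAdj)
}
```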
Fabricio Voznika 122d47aed1 Update cached file size when cache is skipped
gofer.dentryReadWriter.WriteFromBlocks was not updating
gofer.dentry.size after a write operation that skips the
cache.

Updates #1198

PiperOrigin-RevId: 298708646
2020-03-03 15:29:13 -08:00
Fabricio Voznika 0f8a9e3623 Change dup2 call to dup3
We changed syscalls to allow dup3 for ARM64.

Updates #1198

PiperOrigin-RevId: 297870816
2020-02-28 10:15:20 -08:00
Jamie Liu 471b15b212 Port most syscalls to VFS2.
pipe and pipe2 aren't ported, pending a slight rework of pipe FDs for VFS2.
mount and umount2 aren't ported out of temporary laziness. access and faccessat
need additional FSImpl methods to implement properly, but are stubbed to
prevent googletest from CHECK-failing. Other syscalls require additional
plumbing.

Updates #1623

PiperOrigin-RevId: 297188448
2020-02-25 13:37:34 -08:00
gVisor bot 4a73bae269 Initial network namespace support.
TCP/IP will work with netstack networking. hostinet doesn't work, and sockets
will keep their current behavior.

Until userspace is able to create devices, the default loopback device can
be used for testing.

/proc/net and /sys/net will still be connected to the root network stack; this
matches the current behavior.

Issue #1833

PiperOrigin-RevId: 296309389
2020-02-20 15:20:40 -08:00
gVisor bot 10ed60e477 VFS2: Support memory mapping in tmpfs.
tmpfs.fileDescription now implements ConfigureMMap, and tmpfs.regularFile
implements memmap.Mappable. The methods are mostly unchanged from VFS1 tmpfs.

PiperOrigin-RevId: 296234557
2020-02-20 09:58:10 -08:00
gVisor bot 3557b26651 Allow vfs.IterDirentsCallback.Handle() to return an error.
This is easier than storing errors from e.g. CopyOut in the callback.

PiperOrigin-RevId: 295230021
2020-02-14 14:40:35 -08:00
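
To illustrate why an error-returning callback is simpler than stashing errors inside the callback, here is a small self-contained sketch. All names (dirent, iterDirentsCallback, limitedCollector, iterDirents) are hypothetical and only mirror the shape of the change; they are not the vfs package's definitions.

```go
package main

import (
	"errors"
	"fmt"
)

// dirent is a hypothetical directory entry.
type dirent struct {
	Name string
}

// iterDirentsCallback models a callback whose Handle method may fail.
type iterDirentsCallback interface {
	Handle(d dirent) error
}

var errBufferFull = errors.New("output buffer full")

// limitedCollector accepts at most max entries, standing in for a
// fixed-size user buffer that a copy-out could overflow.
type limitedCollector struct {
	names []string
	max   int
}

func (c *limitedCollector) Handle(d dirent) error {
	if len(c.names) >= c.max {
		return errBufferFull
	}
	c.names = append(c.names, d.Name)
	return nil
}

// iterDirents stops at the first error and propagates it to the caller,
// so the callback does not need to remember the error itself.
func iterDirents(ents []dirent, cb iterDirentsCallback) error {
	for _, d := range ents {
		if err := cb.Handle(d); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	c := &limitedCollector{max: 2}
	err := iterDirents([]dirent{{"a"}, {"b"}, {"c"}}, c)
	fmt.Println(c.names, err)
}
```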
gVisor bot e4c7f3e6f6 Inline vfs.VirtualFilesystem in Kernel struct
This saves one pointer dereference per VFS access.

Updates #1623

PiperOrigin-RevId: 295216176
2020-02-14 13:40:39 -08:00
gVisor bot 4075de11be Plumb VFS2 inside the Sentry
- Added fsbridge package with interface that can be used to open
  and read from VFS1 and VFS2 files.
- Converted ELF loader to use fsbridge
- Added VFS2 types to FSContext
- Added vfs.MountNamespace to ThreadGroup

Updates #1623

PiperOrigin-RevId: 295183950
2020-02-14 11:12:47 -08:00
gVisor bot a6024f7f5f Add FileExec flag to OpenOptions
This allows callers to say whether the file is being
opened to be executed, so that the proper checks can
be done from FilesystemImpl.OpenAt()

Updates #1623

PiperOrigin-RevId: 295042595
2020-02-13 17:57:36 -08:00
gVisor bot 6dced977ea Ensure fsimpl/gofer.dentryPlatformFile.hostFileMapper is initialized.
Fixes #1812. (The more direct cause of the deadlock is panic unsafety because
the historically high cost of defer means that we avoid it in hot paths,
including much of MM; defer is much cheaper as of Go 1.14, but still a
measurable overhead.)

PiperOrigin-RevId: 294560316
2020-02-11 17:38:57 -08:00
Adin Scannell bb22ebd7fb Add contextual comment.
PiperOrigin-RevId: 294289066
2020-02-10 13:21:30 -08:00
Adin Scannell 4d4d47f0c0 Add contextual note.
PiperOrigin-RevId: 294285723
2020-02-10 13:05:27 -08:00
Fabricio Voznika bfa0bba72a Redirect FIXME to gvisor.dev
PiperOrigin-RevId: 294272755
2020-02-10 12:04:38 -08:00
Fabricio Voznika dcffddf0ca Remove argument from vfs.MountNamespace.DecRef()
Updates #1035

PiperOrigin-RevId: 293194631
2020-02-04 11:48:36 -08:00
Jamie Liu 492229d017 VFS2 gofer client
Updates #1198

Opening host pipes (by spinning in fdpipe) and host sockets is not yet
complete, and will be done in a future CL.

Major differences from VFS1 gofer client (sentry/fs/gofer), with varying levels
of backportability:

- "Cache policies" are replaced by InteropMode, which controls the behavior of
  timestamps in addition to caching. Under InteropModeExclusive (analogous to
  cacheAll) and InteropModeWritethrough (analogous to cacheAllWritethrough),
  client timestamps are *not* written back to the server (it is not possible in
  9P or Linux for clients to set ctime, so writing back client-authoritative
  timestamps results in incoherence between atime/mtime and ctime). Under
  InteropModeShared (analogous to cacheRemoteRevalidating), client timestamps
  are not used at all (remote filesystem clocks are authoritative). cacheNone
  is translated to InteropModeShared + new option
  filesystemOptions.specialRegularFiles.

- Under InteropModeShared, "unstable attribute" reloading for permission
  checks, lookup, and revalidation are fused, which is feasible in VFS2 since
  gofer.filesystem controls path resolution. This results in a ~33% reduction
  in RPCs for filesystem operations compared to cacheRemoteRevalidating. For
  example, consider stat("/foo/bar/baz") where "/foo/bar/baz" fails
  revalidation, resulting in the instantiation of a new dentry:

  VFS1 RPCs:
  getattr("/")                          // fs.MountNamespace.FindLink() => fs.Inode.CheckPermission() => gofer.inodeOperations.check() => gofer.inodeOperations.UnstableAttr()
  walkgetattr("/", "foo") = fid1        // fs.Dirent.walk() => gofer.session.Revalidate() => gofer.cachePolicy.Revalidate()
  clunk(fid1)
  getattr("/foo")                       // CheckPermission
  walkgetattr("/foo", "bar") = fid2     // Revalidate
  clunk(fid2)
  getattr("/foo/bar")                   // CheckPermission
  walkgetattr("/foo/bar", "baz") = fid3 // Revalidate
  clunk(fid3)
  walkgetattr("/foo/bar", "baz") = fid4 // fs.Dirent.walk() => gofer.inodeOperations.Lookup
  getattr("/foo/bar/baz")               // linux.stat() => gofer.inodeOperations.UnstableAttr()

  VFS2 RPCs:
  getattr("/")                          // gofer.filesystem.walkExistingLocked()
  walkgetattr("/", "foo") = fid1        // gofer.filesystem.stepExistingLocked()
  clunk(fid1)
                                        // No getattr: walkgetattr already updated metadata for permission check
  walkgetattr("/foo", "bar") = fid2
  clunk(fid2)
  walkgetattr("/foo/bar", "baz") = fid3
                                        // No clunk: fid3 used for new gofer.dentry
                                        // No getattr: walkgetattr already updated metadata for stat()

- gofer.filesystem.unlinkAt() does not require instantiation of a dentry that
  represents the file to be deleted. Updates #898.

- gofer.regularFileFD.OnClose() skips Tflushf for regular files under
  InteropModeExclusive, as it's nonsensical to request a remote file flush
  without flushing locally-buffered writes to that remote file first.

- Symlink targets are cached when InteropModeShared is not in effect.

- p9.QID.Path (which is already required to be unique for each file within a
  server, and is accordingly already synthesized from device/inode numbers in
  all known gofers) is used as-is for inode numbers, rather than being mapped
  along with attr.RDev in the client to yet another synthetic inode number.

- Relevant parts of fsutil.CachingInodeOperations are inlined directly into
  gofer package code. This avoids having to duplicate part of its functionality
  in fsutil.HostMappable.

PiperOrigin-RevId: 293190213
2020-02-04 11:29:22 -08:00
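
The RPC traces in the commit above can be turned into a small counting sketch. The formulas are an extrapolation from the single stat("/foo/bar/baz") trace and are not stated in the commit: they assume a path of n components in which every intermediate dentry passes revalidation and only the final component fails it. Under that assumption VFS1 issues 3n + 2 RPCs and VFS2 issues 2n, which gives 11 and 6 for n = 3, matching the traces, and a reduction that approaches one third as n grows.

```go
package main

import "fmt"

// vfs1RPCs counts RPCs for an n-component path under the assumptions in
// the lead-in: getattr + walkgetattr + clunk per component, plus one
// walkgetattr for the lookup and one getattr for the final stat.
func vfs1RPCs(n int) int {
	return 3*n + 2
}

// vfs2RPCs counts one getattr on the root, walkgetattr + clunk for each
// of the first n-1 components, and a single walkgetattr for the last.
func vfs2RPCs(n int) int {
	return 2 * n
}

// reduction returns the fraction of RPCs saved by VFS2 relative to VFS1.
func reduction(n int) float64 {
	return 1 - float64(vfs2RPCs(n))/float64(vfs1RPCs(n))
}

func main() {
	for _, n := range []int{1, 3, 10, 100} {
		fmt.Printf("n=%d vfs1=%d vfs2=%d reduction=%.1f%%\n",
			n, vfs1RPCs(n), vfs2RPCs(n), 100*reduction(n))
	}
}
```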
Dean Deng 6c3072243d Implement file locks for regular tmpfs files in VFSv2.
Add a file lock implementation that can be embedded into various filesystem
implementations.

Updates #1480

PiperOrigin-RevId: 292614758
2020-01-31 14:15:41 -08:00
Fabricio Voznika 396c574db2 Add support for WritableSource in DynamicBytesFileDescriptionImpl
WritableSource is a convenience interface used for files that can
be written to, e.g. /proc/sys/net/ipv4/tcp_sack. It reads at most 4KB
and only from offset 0, which should cover most cases. It can be
extended as needed.

Updates #1195

PiperOrigin-RevId: 292056924
2020-01-28 18:31:28 -08:00
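
A minimal sketch of the write path described above, under stated assumptions: the writableSource interface, handleWrite, and the error value are hypothetical, and the choices to truncate writes longer than 4KB and to reject non-zero offsets with an error are guesses at reasonable behavior, not a description of the actual gVisor code.

```go
package main

import (
	"errors"
	"fmt"
)

// maxWriteSize is the 4KB limit mentioned in the commit message.
const maxWriteSize = 4096

// errBadOffset is a placeholder error for writes at a non-zero offset.
var errBadOffset = errors.New("write at non-zero offset not supported")

// writableSource is a hypothetical interface for a small writable file.
type writableSource interface {
	Write(data []byte) error
}

// bufferSource simply remembers the last data written to it.
type bufferSource struct {
	data []byte
}

func (b *bufferSource) Write(data []byte) error {
	b.data = append([]byte(nil), data...)
	return nil
}

// handleWrite forwards at most maxWriteSize bytes to dst, and only when
// the write starts at offset 0 (both behaviors are assumptions).
func handleWrite(dst writableSource, src []byte, offset int64) (int, error) {
	if offset != 0 {
		return 0, errBadOffset
	}
	if len(src) > maxWriteSize {
		src = src[:maxWriteSize]
	}
	if err := dst.Write(src); err != nil {
		return 0, err
	}
	return len(src), nil
}

func main() {
	b := &bufferSource{}
	n, err := handleWrite(b, []byte("1\n"), 0)
	fmt.Println(n, err, string(b.data))
}
```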
Jamie Liu 2862b0b1be Add //pkg/sentry/fsimpl/devtmpfs.
PiperOrigin-RevId: 292021389
2020-01-28 15:05:24 -08:00
Adin Scannell 0e2f1b7abd Update package locations.
Because the abi will depend on the core types for marshalling (usermem,
context, safemem, safecopy), these need to be flattened from the sentry
directory. These packages contain no sentry-specific details.

PiperOrigin-RevId: 291811289
2020-01-27 15:31:32 -08:00
Adin Scannell d29e59af9f Standardize on tools directory.
PiperOrigin-RevId: 291745021
2020-01-27 12:21:00 -08:00
Jamie Liu 18a7e1309d Add support for device special files to VFS2 tmpfs.
PiperOrigin-RevId: 291471892
2020-01-24 17:07:54 -08:00
Rahat Mahmood 896bd654b6 De-duplicate common test functionality for VFS2 filesystems.
PiperOrigin-RevId: 291041576
2020-01-22 15:16:21 -08:00
Jamie Liu 5ab1213a6c Move VFS2 handling of FD readability/writability to vfs.FileDescription.
PiperOrigin-RevId: 291006713
2020-01-22 12:29:36 -08:00
Rahat Mahmood ad1968ed56 Implement sysfs.
PiperOrigin-RevId: 290822487
2020-01-21 15:13:26 -08:00