Commit Graph

16 Commits

Author SHA1 Message Date
Jamie Liu 9b5e305e05 Remove filesystem structure from vfs.Dentry.
This change:

- Drastically simplifies the synchronization model: filesystem structure is
  both implementation-defined and implementation-synchronized.

- Allows implementations of vfs.DentryImpl to use implementation-specific
  dentry types, reducing casts during path traversal.

- Doesn't require dentries representing non-directory files to waste space on a
  map of children.

- Allows dentry revalidation and mount lookup to be correctly ordered (fixed
  FIXME in fsimpl/gofer/filesystem.go).

- Removes the need to have two separate maps in gofer.dentry
  (dentry.vfsd.children and dentry.negativeChildren) for positive and negative
  lookups respectively.

//pkg/sentry/fsimpl/tmpfs/benchmark_test.go:
name                        old time/op  new time/op  delta
VFS2TmpfsStat/1-112          172ns ± 4%   165ns ± 3%   -4.08%  (p=0.002 n=9+9)
VFS2TmpfsStat/2-112          199ns ± 3%   195ns ±10%     ~     (p=0.132 n=8+9)
VFS2TmpfsStat/3-112          230ns ± 2%   216ns ± 2%   -6.15%  (p=0.000 n=8+8)
VFS2TmpfsStat/8-112          390ns ± 2%   358ns ± 4%   -8.33%  (p=0.000 n=9+8)
VFS2TmpfsStat/64-112        2.20µs ± 3%  2.01µs ± 3%   -8.48%  (p=0.000 n=10+8)
VFS2TmpfsStat/100-112       3.42µs ± 9%  3.08µs ± 2%   -9.82%  (p=0.000 n=9+8)
VFS2TmpfsMountStat/1-112     278ns ± 1%   286ns ±15%     ~     (p=0.712 n=8+10)
VFS2TmpfsMountStat/2-112     311ns ± 4%   298ns ± 2%   -4.27%  (p=0.000 n=9+8)
VFS2TmpfsMountStat/3-112     339ns ± 3%   330ns ± 9%     ~     (p=0.070 n=8+9)
VFS2TmpfsMountStat/8-112     503ns ± 3%   466ns ± 3%   -7.38%  (p=0.000 n=8+8)
VFS2TmpfsMountStat/64-112   2.53µs ±16%  2.17µs ± 7%  -14.19%  (p=0.000 n=10+9)
VFS2TmpfsMountStat/100-112  3.60µs ± 4%  3.30µs ± 8%   -8.33%  (p=0.001 n=8+9)

Updates #1035

PiperOrigin-RevId: 307655892
2020-04-21 12:18:07 -07:00
Fabricio Voznika 6dd5a1f3fe Clean up TODOs
PiperOrigin-RevId: 305592245
2020-04-08 17:58:13 -07:00
Dean Deng 5b2396d244 Fix typo in TODO comments.
PiperOrigin-RevId: 304508083
2020-04-02 17:07:13 -07:00
Nicolas Lacasse e1c8eaca8f Fix /proc/self/mounts and /proc/self/mountinfo in VFS2.
Some extra fields were added to the Mount type to expose necessary data to the
proc filesystem.

PiperOrigin-RevId: 304053361
2020-03-31 15:07:26 -07:00
Nicolas Lacasse e466ab04a2 Add unique ID to Mount type.
Analagous to Linux's mount.mnt_id. This ID is displayed in
/proc/[pid]/mountinfo.

PiperOrigin-RevId: 303185564
2020-03-26 13:49:59 -07:00
Fabricio Voznika 8f8f16efaf Add support for mount flags
Plumbs MS_NOEXEC and MS_RDONLY. Others are TODO.

Updates #1623 #1193

PiperOrigin-RevId: 300764669
2020-03-13 08:58:04 -07:00
Jamie Liu a92087f0f8 Add VFS.NewDisconnectedMount().
Analogous to Linux's kern_mount().

PiperOrigin-RevId: 297259580
2020-02-25 19:13:30 -08:00
gVisor bot e4c7f3e6f6 Inline vfs.VirtualFilesystem in Kernel struct
This saves one pointer dereference per VFS access.

Updates #1623

PiperOrigin-RevId: 295216176
2020-02-14 13:40:39 -08:00
gVisor bot 4075de11be Plumb VFS2 inside the Sentry
- Added fsbridge package with interface that can be used to open
  and read from VFS1 and VFS2 files.
- Converted ELF loader to use fsbridge
- Added VFS2 types to FSContext
- Added vfs.MountNamespace to ThreadGroup

Updates #1623

PiperOrigin-RevId: 295183950
2020-02-14 11:12:47 -08:00
Fabricio Voznika dcffddf0ca Remove argument from vfs.MountNamespace.DecRef()
Updates #1035

PiperOrigin-RevId: 293194631
2020-02-04 11:48:36 -08:00
Adin Scannell 0e2f1b7abd Update package locations.
Because the abi will depend on the core types for marshalling (usermem,
context, safemem, safecopy), these need to be flattened from the sentry
directory. These packages contain no sentry-specific details.

PiperOrigin-RevId: 291811289
2020-01-27 15:31:32 -08:00
Jamie Liu 796f53c0be Add VFS2 support for /proc/filesystems.
Updates #1195

PiperOrigin-RevId: 287269106
2019-12-27 00:13:54 -08:00
Jamie Liu 46651a7d26 Add most VFS methods for syscalls.
PiperOrigin-RevId: 284892289
2019-12-10 18:21:07 -08:00
Jamie Liu b72e1b3c08 Minor VFS2 interface changes.
- Remove the Filesystem argument from DentryImpl.*Ref(); in general DentryImpls
  that need the Filesystem for reference counting will probably also need it
  for other interface methods that don't plumb Filesystem, so it's easier to
  just store a pointer to the filesystem in the DentryImpl.

- Add a pointer to the VirtualFilesystem to Filesystem, which is needed by the
  gofer client to disown dentries for cache eviction triggered by dentry
  reference count changes.

- Rename FilesystemType.NewFilesystem to GetFilesystem; in some cases (e.g.
  sysfs, cgroupfs) it's much cleaner for there to be only one Filesystem that
  is used by all mounts, and in at least one case (devtmpfs) it's visibly
  incorrect not to do so, so NewFilesystem doesn't always actually create and
  return a *new* Filesystem.

- Require callers of FileDescription.Init() to increment Mount/Dentry
  references. This is because the gofer client may, in the OpenAt() path, take
  a reference on a dentry with 0 references, which is safe due to
  synchronization that is outside the scope of this CL, and it would be safer
  to still have its implementation of DentryImpl.IncRef() check for an
  increment for 0 references in other cases.

- Add FileDescription.TryIncRef. This is used by the gofer client to take
  references on "special file descriptions" (FDs for files such as pipes,
  sockets, and devices), which use per-FD handles (fids) instead of
  dentry-shared handles, for sync() and syncfs().

PiperOrigin-RevId: 282473364
2019-11-25 18:10:31 -08:00
Jamie Liu 128948d6ae Implement basic umounting for vfs2.
This is required to test filesystems with a non-trivial implementation of
FilesystemImpl.Release(). Propagation isn't handled yet, and umount isn't yet
plumbed out to VirtualFilesystem.UmountAt(), but otherwise the implementation
of umount is believed to be correct.

- Move entering mountTable.seq writer critical sections to callers of
  mountTable.{insert,remove}Seqed. This is required since umount(2) must ensure
  that no new references are taken on the candidate mount after checking that
  it isn't busy, which is only possible by entering a vfs.mountTable.seq writer
  critical section before the check and remaining in it until after
  VFS.umountRecursiveLocked() is complete. (Linux does the same thing:
  fs/namespace.c:do_umount() => lock_mount_hash(),
  fs/pnode.c:propagate_mount_busy(), umount_tree(), unlock_mount_hash().)

- It's not possible for dentry deletion to umount while only holding
  VFS.mountMu for reading, but it's also very unappealing to hold VFS.mountMu
  exclusively around e.g. gofer unlink RPCs. Introduce dentry.mu to avoid these
  problems. This means that VFS.mountMu is never acquired for reading, so
  change it to a sync.Mutex.

PiperOrigin-RevId: 282444343
2019-11-25 15:21:49 -08:00
Jamie Liu 163ab5e9ba Sentry virtual filesystem, v2
Major differences from the current ("v1") sentry VFS:

- Path resolution is Filesystem-driven (FilesystemImpl methods call
vfs.ResolvingPath methods) rather than VFS-driven (fs package owns a
Dirent tree and calls fs.InodeOperations methods to populate it). This
drastically improves performance, primarily by reducing overhead from
inefficient synchronization and indirection. It also makes it possible
to implement remote filesystem protocols that translate FS system calls
into single RPCs, rather than having to make (at least) one RPC per path
component, significantly reducing the latency of remote filesystems
(especially during cold starts and for uncacheable shared filesystems).

- Mounts are correctly represented as a separate check based on
contextual state (current mount) rather than direct replacement in a
fs.Dirent tree. This makes it possible to support (non-recursive) bind
mounts and mount namespaces.

Included in this CL is fsimpl/memfs, an incomplete in-memory filesystem
that exists primarily to demonstrate intended filesystem implementation
patterns and for benchmarking:

BenchmarkVFS1TmpfsStat/1-6               3000000               497 ns/op
BenchmarkVFS1TmpfsStat/2-6               2000000               676 ns/op
BenchmarkVFS1TmpfsStat/3-6               2000000               904 ns/op
BenchmarkVFS1TmpfsStat/8-6               1000000              1944 ns/op
BenchmarkVFS1TmpfsStat/64-6               100000             14067 ns/op
BenchmarkVFS1TmpfsStat/100-6               50000             21700 ns/op
BenchmarkVFS2MemfsStat/1-6              10000000               197 ns/op
BenchmarkVFS2MemfsStat/2-6               5000000               233 ns/op
BenchmarkVFS2MemfsStat/3-6               5000000               268 ns/op
BenchmarkVFS2MemfsStat/8-6               3000000               477 ns/op
BenchmarkVFS2MemfsStat/64-6               500000              2592 ns/op
BenchmarkVFS2MemfsStat/100-6              300000              4045 ns/op
BenchmarkVFS1TmpfsMountStat/1-6          2000000               679 ns/op
BenchmarkVFS1TmpfsMountStat/2-6          2000000               912 ns/op
BenchmarkVFS1TmpfsMountStat/3-6          1000000              1113 ns/op
BenchmarkVFS1TmpfsMountStat/8-6          1000000              2118 ns/op
BenchmarkVFS1TmpfsMountStat/64-6                  100000             14251 ns/op
BenchmarkVFS1TmpfsMountStat/100-6                 100000             22397 ns/op
BenchmarkVFS2MemfsMountStat/1-6                  5000000               317 ns/op
BenchmarkVFS2MemfsMountStat/2-6                  5000000               361 ns/op
BenchmarkVFS2MemfsMountStat/3-6                  5000000               387 ns/op
BenchmarkVFS2MemfsMountStat/8-6                  3000000               582 ns/op
BenchmarkVFS2MemfsMountStat/64-6                  500000              2699 ns/op
BenchmarkVFS2MemfsMountStat/100-6                 300000              4133 ns/op

From this we can infer that, on this machine:

- Constant cost for tmpfs stat() is ~160ns in VFS2 and ~280ns in VFS1.

- Per-path-component cost is ~35ns in VFS2 and ~215ns in VFS1, a
difference of about 6x.

- The cost of crossing a mount boundary is about 80ns in VFS2
(MemfsMountStat/1 does approximately the same amount of work as
MemfsStat/2, except that it also crosses a mount boundary). This is an
inescapable cost of the separate mount lookup needed to support bind
mounts and mount namespaces.

PiperOrigin-RevId: 258853946
2019-07-18 15:10:29 -07:00