Commit Graph

21 Commits

Author SHA1 Message Date
Jamie Liu 744401297a Add VFS2 plumbing for extended attributes.
PiperOrigin-RevId: 286281274
2019-12-18 15:48:45 -08:00
Michael Pratt 334a513f11 Add Mems_allowed to /proc/PID/status
PiperOrigin-RevId: 286248378
2019-12-18 13:16:28 -08:00
Rahat Mahmood 007707a072 Implement kernfs.
PiperOrigin-RevId: 285231002
2019-12-12 11:20:47 -08:00
Jamie Liu 481dbfa5ab Add vfs.Pathname{WithDeleted,ForGetcwd}.
The former is needed for vfs.FileDescription to implement
memmap.MappingIdentity, and the latter is needed to implement getcwd(2).

PiperOrigin-RevId: 285051855
2019-12-11 14:26:32 -08:00
Jamie Liu 46651a7d26 Add most VFS methods for syscalls.
PiperOrigin-RevId: 284892289
2019-12-10 18:21:07 -08:00
Fabricio Voznika 3e832bec1b Point TODOs to gvisor.dev
PiperOrigin-RevId: 283610781
2019-12-03 13:32:31 -08:00
Jamie Liu b72e1b3c08 Minor VFS2 interface changes.
- Remove the Filesystem argument from DentryImpl.*Ref(); in general DentryImpls
  that need the Filesystem for reference counting will probably also need it
  for other interface methods that don't plumb Filesystem, so it's easier to
  just store a pointer to the filesystem in the DentryImpl.

- Add a pointer to the VirtualFilesystem to Filesystem, which is needed by the
  gofer client to disown dentries for cache eviction triggered by dentry
  reference count changes.

- Rename FilesystemType.NewFilesystem to GetFilesystem; in some cases (e.g.
  sysfs, cgroupfs) it's much cleaner for there to be only one Filesystem that
  is used by all mounts, and in at least one case (devtmpfs) it's visibly
  incorrect not to do so, so NewFilesystem doesn't always actually create and
  return a *new* Filesystem.

- Require callers of FileDescription.Init() to increment Mount/Dentry
  references. This is because the gofer client may, in the OpenAt() path, take
  a reference on a dentry with 0 references, which is safe due to
  synchronization that is outside the scope of this CL, and it would be safer
  to still have its implementation of DentryImpl.IncRef() check for an
  increment for 0 references in other cases.

- Add FileDescription.TryIncRef. This is used by the gofer client to take
  references on "special file descriptions" (FDs for files such as pipes,
  sockets, and devices), which use per-FD handles (fids) instead of
  dentry-shared handles, for sync() and syncfs().

PiperOrigin-RevId: 282473364
2019-11-25 18:10:31 -08:00
Jamie Liu 128948d6ae Implement basic umounting for vfs2.
This is required to test filesystems with a non-trivial implementation of
FilesystemImpl.Release(). Propagation isn't handled yet, and umount isn't yet
plumbed out to VirtualFilesystem.UmountAt(), but otherwise the implementation
of umount is believed to be correct.

- Move entering mountTable.seq writer critical sections to callers of
  mountTable.{insert,remove}Seqed. This is required since umount(2) must ensure
  that no new references are taken on the candidate mount after checking that
  it isn't busy, which is only possible by entering a vfs.mountTable.seq writer
  critical section before the check and remaining in it until after
  VFS.umountRecursiveLocked() is complete. (Linux does the same thing:
  fs/namespace.c:do_umount() => lock_mount_hash(),
  fs/pnode.c:propagate_mount_busy(), umount_tree(), unlock_mount_hash().)

- It's not possible for dentry deletion to umount while only holding
  VFS.mountMu for reading, but it's also very unappealing to hold VFS.mountMu
  exclusively around e.g. gofer unlink RPCs. Introduce dentry.mu to avoid these
  problems. This means that VFS.mountMu is never acquired for reading, so
  change it to a sync.Mutex.

PiperOrigin-RevId: 282444343
2019-11-25 15:21:49 -08:00
Adin Scannell c0f89eba6e Import and structure cleanup.
PiperOrigin-RevId: 281795269
2019-11-21 11:41:30 -08:00
Kevin Krakauer 652f7b1d0f Add support for pipes in VFS2.
PiperOrigin-RevId: 275650307
2019-10-19 11:49:38 -07:00
Kevin Krakauer 2a82d5ad68 Reorder BUILD license and load functions in gvisor.
PiperOrigin-RevId: 275139066
2019-10-16 16:40:30 -07:00
Jamie Liu 0457a4c4cb Minor vfs.FileDescriptionImpl fixes.
- Pass context.Context to OnClose().

- Pass memmap.MMapOpts to ConfigureMMap() by pointer so that implementations
  can actually mutate it as required.

PiperOrigin-RevId: 274934967
2019-10-15 18:40:45 -07:00
Kevin Krakauer 7ef1c44a7f Change linux.FileMode from uint to uint16, and update VFS to use FileMode.
In Linux (include/linux/types.h), mode_t is an unsigned short.

PiperOrigin-RevId: 272956350
2019-10-04 14:20:32 -07:00
Jamie Liu fb55c2bd0d Change vfs.Dirent.Off to NextOff.
"d_off is the distance from the start of the directory to the start of the next
linux_dirent." - getdents(2).

PiperOrigin-RevId: 270349685
2019-09-20 14:24:29 -07:00
Michael Pratt df5d377521 Remove go_test from go_stateify and go_marshal
They are no-ops, so the standard rule works fine.

PiperOrigin-RevId: 268776264
2019-09-12 15:10:17 -07:00
Adin Scannell 67a2ab1438 Impose order on test scripts.
The simple test script has gotten out of control. Shard this script into
different pieces and attempt to impose order on overall test structure. This
change helps lay some of the foundations for future improvements.

 * The runsc/test directories are moved into just test/.
 * The runsc/test/testutil package is split into logical pieces.
 * The scripts/ directory contains new top-level targets.
 * Each test is now responsible for building targets it requires.
 * The install functionality is moved into `runsc` itself for simplicity.
 * The existing kokoro run_tests.sh file now just calls all (can be split).

After this change is merged,  I will create multiple distinct workflows for
Kokoro, one for each of the scripts currently targeted by `run_tests.sh` today,
which should dramatically reduce the time-to-run for the Kokoro tests, and
provides a better foundation for further improvements to the infrastructure.

PiperOrigin-RevId: 267081397
2019-09-03 22:02:43 -07:00
Ayush Ranjan 661b2b9f69 procfs: Migrate seqfile implementations.
Migrates all (except 3) seqfile implementations to the vfs.DynamicBytesSource
interface. There should not be any change in functionality due to this migration
itself.

Please note that the following seqfile implementations have not been migrated:
- /proc/filesystems in proc/filesystems.go
- /proc/[pid]/mountinfo in proc/mounts.go
- /proc/[pid]/mounts in proc/mounts.go
This is because these depend on pending changes in /pkg/senty/vfs.

PiperOrigin-RevId: 263880719
2019-08-16 17:36:42 -07:00
Ayush Ranjan 4bab7d7f08 vfs: Remove vfs.DefaultDirectoryFD from embedding vfs.DefaultFD.
This fixes the implementation ambiguity issues when a filesystem
implementation embeds vfs.DefaultDirectoryFD to its directory FD along
with an internal common fileDescription utility.

For similar reasons also removes FileDescriptionDefaultImpl from
DynamicBytesFileDescriptionImpl.

PiperOrigin-RevId: 263795513
2019-08-16 10:20:11 -07:00
Ayush Ranjan c8961a6cbd ext: Move to pkg/sentry/fsimpl.
fsimpl is the keeper of all filesystem implementations in VFS2.

PiperOrigin-RevId: 262617869
2019-08-09 13:08:28 -07:00
Jamie Liu 06102af65a memfs fixes.
- Unexport Filesystem/Dentry/Inode.

- Support SEEK_CUR in directoryFD.Seek().

- Hold Filesystem.mu before touching directoryFD.off in
directoryFD.Seek().

- Remove deleted Dentries from their parent directory.childLists.

- Remove invalid FIXMEs.

PiperOrigin-RevId: 262400633
2019-08-08 11:46:38 -07:00
Jamie Liu 163ab5e9ba Sentry virtual filesystem, v2
Major differences from the current ("v1") sentry VFS:

- Path resolution is Filesystem-driven (FilesystemImpl methods call
vfs.ResolvingPath methods) rather than VFS-driven (fs package owns a
Dirent tree and calls fs.InodeOperations methods to populate it). This
drastically improves performance, primarily by reducing overhead from
inefficient synchronization and indirection. It also makes it possible
to implement remote filesystem protocols that translate FS system calls
into single RPCs, rather than having to make (at least) one RPC per path
component, significantly reducing the latency of remote filesystems
(especially during cold starts and for uncacheable shared filesystems).

- Mounts are correctly represented as a separate check based on
contextual state (current mount) rather than direct replacement in a
fs.Dirent tree. This makes it possible to support (non-recursive) bind
mounts and mount namespaces.

Included in this CL is fsimpl/memfs, an incomplete in-memory filesystem
that exists primarily to demonstrate intended filesystem implementation
patterns and for benchmarking:

BenchmarkVFS1TmpfsStat/1-6               3000000               497 ns/op
BenchmarkVFS1TmpfsStat/2-6               2000000               676 ns/op
BenchmarkVFS1TmpfsStat/3-6               2000000               904 ns/op
BenchmarkVFS1TmpfsStat/8-6               1000000              1944 ns/op
BenchmarkVFS1TmpfsStat/64-6               100000             14067 ns/op
BenchmarkVFS1TmpfsStat/100-6               50000             21700 ns/op
BenchmarkVFS2MemfsStat/1-6              10000000               197 ns/op
BenchmarkVFS2MemfsStat/2-6               5000000               233 ns/op
BenchmarkVFS2MemfsStat/3-6               5000000               268 ns/op
BenchmarkVFS2MemfsStat/8-6               3000000               477 ns/op
BenchmarkVFS2MemfsStat/64-6               500000              2592 ns/op
BenchmarkVFS2MemfsStat/100-6              300000              4045 ns/op
BenchmarkVFS1TmpfsMountStat/1-6          2000000               679 ns/op
BenchmarkVFS1TmpfsMountStat/2-6          2000000               912 ns/op
BenchmarkVFS1TmpfsMountStat/3-6          1000000              1113 ns/op
BenchmarkVFS1TmpfsMountStat/8-6          1000000              2118 ns/op
BenchmarkVFS1TmpfsMountStat/64-6                  100000             14251 ns/op
BenchmarkVFS1TmpfsMountStat/100-6                 100000             22397 ns/op
BenchmarkVFS2MemfsMountStat/1-6                  5000000               317 ns/op
BenchmarkVFS2MemfsMountStat/2-6                  5000000               361 ns/op
BenchmarkVFS2MemfsMountStat/3-6                  5000000               387 ns/op
BenchmarkVFS2MemfsMountStat/8-6                  3000000               582 ns/op
BenchmarkVFS2MemfsMountStat/64-6                  500000              2699 ns/op
BenchmarkVFS2MemfsMountStat/100-6                 300000              4133 ns/op

From this we can infer that, on this machine:

- Constant cost for tmpfs stat() is ~160ns in VFS2 and ~280ns in VFS1.

- Per-path-component cost is ~35ns in VFS2 and ~215ns in VFS1, a
difference of about 6x.

- The cost of crossing a mount boundary is about 80ns in VFS2
(MemfsMountStat/1 does approximately the same amount of work as
MemfsStat/2, except that it also crosses a mount boundary). This is an
inescapable cost of the separate mount lookup needed to support bind
mounts and mount namespaces.

PiperOrigin-RevId: 258853946
2019-07-18 15:10:29 -07:00