Commit Graph

46 Commits

Author SHA1 Message Date
Fabricio Voznika c8f1ce288d Honor readonly flag for root mount
Updates #1487

PiperOrigin-RevId: 330580699
2020-09-08 14:00:43 -07:00
Ayush Ranjan d84ec6c42b [vfs] Capitalize x in the {Get/Set/Remove/List}xattr functions.
PiperOrigin-RevId: 330554450
2020-09-08 11:51:39 -07:00
Nicolas Lacasse 5ec3d4ed3e Make mounts ReadWrite first, then later change to ReadOnly.
This lets us create "synthetic" mountpoint directories in ReadOnly mounts
during VFS setup.

Also add context.WithMountNamespace, as some filesystems (like overlay) require
a MountNamespace on ctx to handle vfs.Filesystem Operations.

PiperOrigin-RevId: 327874971
2020-08-21 14:30:03 -07:00
Michael Pratt 129018ab3d Consistent precondition formatting
Our "Preconditions:" blocks are very useful to determine the input invariants,
but they are bit inconsistent throughout the codebase, which makes them harder
to read (particularly cases with 5+ conditions in a single paragraph).

I've reformatted all of the cases to fit in simple rules:

1. Cases with a single condition are placed on a single line.
2. Cases with multiple conditions are placed in a bulleted list.

This format has been added to the style guide.

I've also mentioned "Postconditions:", though those are much less frequently
used, and all uses already match this style.

PiperOrigin-RevId: 327687465
2020-08-20 13:32:24 -07:00
Nayana Bidari b2ae7ea1bb Plumbing context.Context to DecRef() and Release().
context is passed to DecRef() and Release() which is
needed for SO_LINGER implementation.

PiperOrigin-RevId: 324672584
2020-08-03 13:36:05 -07:00
gVisor bot c81ac8ec3b Merge pull request #2672 from amscanne:shim-integrated
PiperOrigin-RevId: 321053634
2020-07-13 16:10:58 -07:00
Dean Deng 0c628c3152 Support inotify in vfs2 gofer fs.
Because there is no inode structure stored in the sandbox, inotify watches
must be held on the dentry. This would be an issue in the presence of hard
links, where multiple dentries would need to share the same set of watches,
but in VFS2, we do not support the internal creation of hard links on gofer
fs. As a result, we make the assumption that every dentry corresponds to a
unique inode.

Furthermore, dentries can be cached and then evicted, even if the underlying
file has not be deleted. We must prevent this from occurring if there are any
watches that would be lost. Note that if the dentry was deleted or invalidated
(d.vfsd.IsDead()), we should still destroy it along with its watches.

Additionally, when a dentry’s last watch is removed, we cache it if it also
has zero references. This way, the dentry can eventually be evicted from
memory if it is no longer needed. This is accomplished with a new dentry
method, OnZeroWatches(), which is called by Inotify.RmWatch and
Inotify.Release. Note that it must be called after all inotify locks are
released to avoid violating lock order. Stress tests are added to make sure
that inotify operations don't deadlock with gofer.OnZeroWatches.

Updates #1479.

PiperOrigin-RevId: 317958034
2020-06-23 16:14:56 -07:00
Fabricio Voznika 96519e2c9d Implement POSIX locks
- Change FileDescriptionImpl Lock/UnlockPOSIX signature to
  take {start,length,whence}, so the correct offset can be
  calculated in the implementations.
- Create PosixLocker interface to make it possible to share
  the same locking code from different implementations.

Closes #1480

PiperOrigin-RevId: 316910286
2020-06-17 10:04:26 -07:00
Fabricio Voznika d58d57606a Don't copy structs with sync.Mutex during initialization
During inititalization inode struct was copied around, but
it isn't great pratice to copy it around since it contains
ref count and sync.Mutex.

Updates #1480

PiperOrigin-RevId: 315983788
2020-06-11 14:56:19 -07:00
Fabricio Voznika 67565078bb Implement flock(2) in VFS2
LockFD is the generic implementation that can be embedded in
FileDescriptionImpl implementations. Unique lock ID is
maintained in vfs.FileDescription and is created on demand.

Updates #1480

PiperOrigin-RevId: 315604825
2020-06-09 18:46:42 -07:00
Dean Deng ccf69bdd7e Implement IN_EXCL_UNLINK inotify option in vfs2.
Limited to tmpfs. Inotify support in other filesystem implementations to
follow.

Updates #1479

PiperOrigin-RevId: 313828648
2020-05-29 12:28:49 -07:00
Dean Deng fe464f44b7 Port inotify to vfs2, with support in tmpfs.
Support in other filesystem impls is still needed. Unlike in Linux and vfs1, we
need to plumb inotify down to each filesystem implementation in order to keep
track of links/inode structures properly.

IN_EXCL_UNLINK still needs to be implemented, as well as a few inotify hooks
that are not present in either vfs1 or vfs2. Those will be addressed in
subsequent changes.

Updates #1479.

PiperOrigin-RevId: 313781995
2020-05-29 08:09:14 -07:00
Jamie Liu 9115f26851 Allocate device numbers for VFS2 filesystems.
Updates #1197, #1198, #1672

PiperOrigin-RevId: 310432006
2020-05-07 14:01:53 -07:00
Dean Deng ce19497c1c Fix Unix socket permissions.
Enforce write permission checks in BoundEndpointAt, which corresponds to the
permission checks in Linux (net/unix/af_unix.c:unix_find_other).
Also, create bound socket files with the correct permissions in VFS2.

Fixes #2324.

PiperOrigin-RevId: 308949084
2020-04-28 20:13:01 -07:00
Dean Deng f3ca5ca82a Support pipes and sockets in VFS2 gofer fs.
Named pipes and sockets can be represented in two ways in gofer fs:
1. As a file on the remote filesystem. In this case, all file operations are
   passed through 9p.
2. As a synthetic file that is internal to the sandbox. In this case, the
   dentry stores an endpoint or VFSPipe for sockets and pipes respectively,
   which replaces interactions with the remote fs through the gofer.
In gofer.filesystem.MknodAt, we attempt to call mknod(2) through 9p,
and if it fails, fall back to the synthetic version.

Updates #1200.

PiperOrigin-RevId: 308828161
2020-04-28 08:34:00 -07:00
Adin Scannell 1481499fe2 Simplify Docker test infrastructure.
This change adds a layer of abstraction around the internal Docker APIs,
and eliminates all direct dependencies on Dockerfiles in the infrastructure.

A subsequent change will automated the generation of local images (with
efficient caching). Note that this change drops the use of bazel container
rules, as that experiment does not seem to be viable.

PiperOrigin-RevId: 308095430
2020-04-23 11:33:30 -07:00
Jamie Liu 9b5e305e05 Remove filesystem structure from vfs.Dentry.
This change:

- Drastically simplifies the synchronization model: filesystem structure is
  both implementation-defined and implementation-synchronized.

- Allows implementations of vfs.DentryImpl to use implementation-specific
  dentry types, reducing casts during path traversal.

- Doesn't require dentries representing non-directory files to waste space on a
  map of children.

- Allows dentry revalidation and mount lookup to be correctly ordered (fixed
  FIXME in fsimpl/gofer/filesystem.go).

- Removes the need to have two separate maps in gofer.dentry
  (dentry.vfsd.children and dentry.negativeChildren) for positive and negative
  lookups respectively.

//pkg/sentry/fsimpl/tmpfs/benchmark_test.go:
name                        old time/op  new time/op  delta
VFS2TmpfsStat/1-112          172ns ± 4%   165ns ± 3%   -4.08%  (p=0.002 n=9+9)
VFS2TmpfsStat/2-112          199ns ± 3%   195ns ±10%     ~     (p=0.132 n=8+9)
VFS2TmpfsStat/3-112          230ns ± 2%   216ns ± 2%   -6.15%  (p=0.000 n=8+8)
VFS2TmpfsStat/8-112          390ns ± 2%   358ns ± 4%   -8.33%  (p=0.000 n=9+8)
VFS2TmpfsStat/64-112        2.20µs ± 3%  2.01µs ± 3%   -8.48%  (p=0.000 n=10+8)
VFS2TmpfsStat/100-112       3.42µs ± 9%  3.08µs ± 2%   -9.82%  (p=0.000 n=9+8)
VFS2TmpfsMountStat/1-112     278ns ± 1%   286ns ±15%     ~     (p=0.712 n=8+10)
VFS2TmpfsMountStat/2-112     311ns ± 4%   298ns ± 2%   -4.27%  (p=0.000 n=9+8)
VFS2TmpfsMountStat/3-112     339ns ± 3%   330ns ± 9%     ~     (p=0.070 n=8+9)
VFS2TmpfsMountStat/8-112     503ns ± 3%   466ns ± 3%   -7.38%  (p=0.000 n=8+8)
VFS2TmpfsMountStat/64-112   2.53µs ±16%  2.17µs ± 7%  -14.19%  (p=0.000 n=10+9)
VFS2TmpfsMountStat/100-112  3.60µs ± 4%  3.30µs ± 8%   -8.33%  (p=0.001 n=8+9)

Updates #1035

PiperOrigin-RevId: 307655892
2020-04-21 12:18:07 -07:00
Dean Deng 09ddb5a426 Port extended attributes to VFS2.
As in VFS1, we only support the user.* namespace. Plumbing is added to tmpfs
and goferfs.
Note that because of the slightly different order of checks between VFS2 and
Linux, one of the xattr tests needs to be relaxed slightly.

Fixes #2363.

PiperOrigin-RevId: 305985121
2020-04-10 19:02:55 -07:00
Nicolas Lacasse 10f2c8db91 Add FilesystemType.Name method, and FilesystemType field to Filesystem struct.
Both have analogues in Linux:
* struct file_system_type has a char *name field.
* struct super_block keeps a pointer to the file_system_type.

These fields are necessary to support the `filesystem type` field in
/proc/[pid]/mountinfo.

PiperOrigin-RevId: 303434063
2020-03-27 16:56:16 -07:00
Dean Deng 76a7ace751 Add BoundEndpointAt filesystem operation.
BoundEndpointAt() is needed to support Unix sockets bound at a
file path, corresponding to BoundEndpoint() in VFS1.

Updates #1476.

PiperOrigin-RevId: 303258251
2020-03-26 21:52:24 -07:00
Fabricio Voznika de694e5484 Combine file mode and isDir arguments
Updates #1035

PiperOrigin-RevId: 303021328
2020-03-26 08:48:04 -07:00
Dean Deng 2e38408f20 Implement access/faccessat for VFS2.
Note that the raw faccessat system call does not actually take a flags argument;
according to faccessat(2), the glibc wrapper implements the flags by using
fstatat(2). Remove the flag argument that we try to extract from vfs1, which
would just be a garbage value.

Updates #1965
Fixes #2101

PiperOrigin-RevId: 300796067
2020-03-13 11:41:08 -07:00
gVisor bot 3557b26651 Allow vfs.IterDirentsCallback.Handle() to return an error.
This is easier than storing errors from e.g. CopyOut in the callback.

PiperOrigin-RevId: 295230021
2020-02-14 14:40:35 -08:00
gVisor bot e4c7f3e6f6 Inline vfs.VirtualFilesystem in Kernel struct
This saves one pointer dereference per VFS access.

Updates #1623

PiperOrigin-RevId: 295216176
2020-02-14 13:40:39 -08:00
gVisor bot a6024f7f5f Add FileExec flag to OpenOptions
This allow callers to say whether the file is being
opened to be executed, so that the proper checks can
be done from FilesystemImpl.OpenAt()

Updates #1623

PiperOrigin-RevId: 295042595
2020-02-13 17:57:36 -08:00
Adin Scannell bb22ebd7fb Add contextual comment.
PiperOrigin-RevId: 294289066
2020-02-10 13:21:30 -08:00
Adin Scannell 0e2f1b7abd Update package locations.
Because the abi will depend on the core types for marshalling (usermem,
context, safemem, safecopy), these need to be flattened from the sentry
directory. These packages contain no sentry-specific details.

PiperOrigin-RevId: 291811289
2020-01-27 15:31:32 -08:00
Adin Scannell d29e59af9f Standardize on tools directory.
PiperOrigin-RevId: 291745021
2020-01-27 12:21:00 -08:00
Jamie Liu 5ab1213a6c Move VFS2 handling of FD readability/writability to vfs.FileDescription.
PiperOrigin-RevId: 291006713
2020-01-22 12:29:36 -08:00
Ian Gudger 27500d529f New sync package.
* Rename syncutil to sync.
* Add aliases to sync types.
* Replace existing usage of standard library sync package.

This will make it easier to swap out synchronization primitives. For example,
this will allow us to use primitives from github.com/sasha-s/go-deadlock to
check for lock ordering violations.

Updates #1472

PiperOrigin-RevId: 289033387
2020-01-09 22:02:24 -08:00
Dean Deng 565b641483 Define sizes for extent headers and entries separately to improve clarity.
PiperOrigin-RevId: 288799694
2020-01-08 16:58:12 -08:00
Jamie Liu 1f384ac42b Add VFS2 support for device special files.
- Add FileDescriptionOptions.UseDentryMetadata, which reduces the amount of
  boilerplate needed for device FDs and the like between filesystems.

- Switch back to having FileDescription.Init() take references on the Mount and
  Dentry; otherwise managing refcounts around failed calls to
  OpenDeviceSpecialFile() / Device.Open() is tricky.

PiperOrigin-RevId: 287575574
2019-12-30 11:36:41 -08:00
Jamie Liu 796f53c0be Add VFS2 support for /proc/filesystems.
Updates #1195

PiperOrigin-RevId: 287269106
2019-12-27 00:13:54 -08:00
Jamie Liu f45df7505b Clean up vfs.FilesystemImpl methods that operate on parent directories.
- Make FilesystemImpl methods that operate on parent directories require
  !rp.Done() (i.e. there is at least one path component to resolve) as
  precondition and postcondition (in cases where they do not finish path
  resolution due to mount boundary / absolute symlink), and require that they
  do not need to follow the last path component (the file being created /
  deleted) as a symlink. Check for these in VFS.

- Add FilesystemImpl.GetParentDentryAt(), which is required to obtain the old
  parent directory for VFS.RenameAt(). (Passing the Dentry to be renamed
  instead has the wrong semantics if the file named by the old path is a mount
  point since the Dentry will be on the wrong Mount.)

- Update memfs to implement these methods correctly (?), including RenameAt.

- Change fspath.Parse() to allow empty paths (to simplify implementation of
  AT_EMPTY_PATH).

- Change vfs.PathOperation to take a fspath.Path instead of a raw pathname;
  non-test callers will need to fspath.Parse() pathnames themselves anyway in
  order to detect absolute paths and select PathOperation.Start accordingly.

PiperOrigin-RevId: 286934941
2019-12-23 13:18:39 -08:00
Jamie Liu 3eb489ed6c Move VFS2 file description status flags to vfs.FileDescription.
PiperOrigin-RevId: 286616668
2019-12-20 11:53:48 -08:00
Jamie Liu 744401297a Add VFS2 plumbing for extended attributes.
PiperOrigin-RevId: 286281274
2019-12-18 15:48:45 -08:00
Jamie Liu 481dbfa5ab Add vfs.Pathname{WithDeleted,ForGetcwd}.
The former is needed for vfs.FileDescription to implement
memmap.MappingIdentity, and the latter is needed to implement getcwd(2).

PiperOrigin-RevId: 285051855
2019-12-11 14:26:32 -08:00
Jamie Liu 46651a7d26 Add most VFS methods for syscalls.
PiperOrigin-RevId: 284892289
2019-12-10 18:21:07 -08:00
Jamie Liu b72e1b3c08 Minor VFS2 interface changes.
- Remove the Filesystem argument from DentryImpl.*Ref(); in general DentryImpls
  that need the Filesystem for reference counting will probably also need it
  for other interface methods that don't plumb Filesystem, so it's easier to
  just store a pointer to the filesystem in the DentryImpl.

- Add a pointer to the VirtualFilesystem to Filesystem, which is needed by the
  gofer client to disown dentries for cache eviction triggered by dentry
  reference count changes.

- Rename FilesystemType.NewFilesystem to GetFilesystem; in some cases (e.g.
  sysfs, cgroupfs) it's much cleaner for there to be only one Filesystem that
  is used by all mounts, and in at least one case (devtmpfs) it's visibly
  incorrect not to do so, so NewFilesystem doesn't always actually create and
  return a *new* Filesystem.

- Require callers of FileDescription.Init() to increment Mount/Dentry
  references. This is because the gofer client may, in the OpenAt() path, take
  a reference on a dentry with 0 references, which is safe due to
  synchronization that is outside the scope of this CL, and it would be safer
  to still have its implementation of DentryImpl.IncRef() check for an
  increment for 0 references in other cases.

- Add FileDescription.TryIncRef. This is used by the gofer client to take
  references on "special file descriptions" (FDs for files such as pipes,
  sockets, and devices), which use per-FD handles (fids) instead of
  dentry-shared handles, for sync() and syncfs().

PiperOrigin-RevId: 282473364
2019-11-25 18:10:31 -08:00
Kevin Krakauer 2a82d5ad68 Reorder BUILD license and load functions in gvisor.
PiperOrigin-RevId: 275139066
2019-10-16 16:40:30 -07:00
Jamie Liu 0457a4c4cb Minor vfs.FileDescriptionImpl fixes.
- Pass context.Context to OnClose().

- Pass memmap.MMapOpts to ConfigureMMap() by pointer so that implementations
  can actually mutate it as required.

PiperOrigin-RevId: 274934967
2019-10-15 18:40:45 -07:00
Jamie Liu fb55c2bd0d Change vfs.Dirent.Off to NextOff.
"d_off is the distance from the start of the directory to the start of the next
linux_dirent." - getdents(2).

PiperOrigin-RevId: 270349685
2019-09-20 14:24:29 -07:00
Michael Pratt df5d377521 Remove go_test from go_stateify and go_marshal
They are no-ops, so the standard rule works fine.

PiperOrigin-RevId: 268776264
2019-09-12 15:10:17 -07:00
Adin Scannell 67a2ab1438 Impose order on test scripts.
The simple test script has gotten out of control. Shard this script into
different pieces and attempt to impose order on overall test structure. This
change helps lay some of the foundations for future improvements.

 * The runsc/test directories are moved into just test/.
 * The runsc/test/testutil package is split into logical pieces.
 * The scripts/ directory contains new top-level targets.
 * Each test is now responsible for building targets it requires.
 * The install functionality is moved into `runsc` itself for simplicity.
 * The existing kokoro run_tests.sh file now just calls all (can be split).

After this change is merged,  I will create multiple distinct workflows for
Kokoro, one for each of the scripts currently targeted by `run_tests.sh` today,
which should dramatically reduce the time-to-run for the Kokoro tests, and
provides a better foundation for further improvements to the infrastructure.

PiperOrigin-RevId: 267081397
2019-09-03 22:02:43 -07:00
Ayush Ranjan 4bab7d7f08 vfs: Remove vfs.DefaultDirectoryFD from embedding vfs.DefaultFD.
This fixes the implementation ambiguity issues when a filesystem
implementation embeds vfs.DefaultDirectoryFD to its directory FD along
with an internal common fileDescription utility.

For similar reasons also removes FileDescriptionDefaultImpl from
DynamicBytesFileDescriptionImpl.

PiperOrigin-RevId: 263795513
2019-08-16 10:20:11 -07:00
Ayush Ranjan c8961a6cbd ext: Move to pkg/sentry/fsimpl.
fsimpl is the keeper of all filesystem implementations in VFS2.

PiperOrigin-RevId: 262617869
2019-08-09 13:08:28 -07:00