79 Commits

Author SHA1 Message Date
Jamie Liu 9115f26851 Allocate device numbers for VFS2 filesystems.
Updates #1197, #1198, #1672

PiperOrigin-RevId: 310432006
2020-05-07 14:01:53 -07:00
Nicolas Lacasse d0b1d0233d Move pkg/sentry/vfs/{eventfd,timerfd} to new packages in pkg/sentry/fsimpl.
They don't depend on anything in VFS2, so they should be their own packages.

PiperOrigin-RevId: 310416807
2020-05-07 12:44:03 -07:00
Nicolas Lacasse 26c60d7d5d Port signalfd to vfs2.
PiperOrigin-RevId: 310404113
2020-05-07 11:41:50 -07:00
Dean Deng e0089a20e4 Remove outdated TODO for VFS2 AccessAt.
Fixes #1965.

PiperOrigin-RevId: 310380433
2020-05-07 09:53:52 -07:00
Jamie Liu 7cd54c1f14 Remove vfs.FileDescriptionOptions.InvalidWrite.
Compare:
https://elixir.bootlin.com/linux/v5.6/source/fs/timerfd.c#L431
PiperOrigin-RevId: 310246908
2020-05-06 16:08:12 -07:00
Nicolas Lacasse da71dc7fdd Port eventfd to VFS2.
And move sys_timerfd.go to just timerfd.go for consistency.

Updates #1475.

PiperOrigin-RevId: 309835029
2020-05-04 16:02:07 -07:00
Fabricio Voznika cbc5bef2a6 Add TTY support on VFS2 to runsc
Updates #1623, #1487

PiperOrigin-RevId: 309777922
2020-05-04 10:59:20 -07:00
Dean Deng ce19497c1c Fix Unix socket permissions.
Enforce write permission checks in BoundEndpointAt, which corresponds to the
permission checks in Linux (net/unix/af_unix.c:unix_find_other).
Also, create bound socket files with the correct permissions in VFS2.

Fixes #2324.

PiperOrigin-RevId: 308949084
2020-04-28 20:13:01 -07:00
Dean Deng f3ca5ca82a Support pipes and sockets in VFS2 gofer fs.
Named pipes and sockets can be represented in two ways in gofer fs:
1. As a file on the remote filesystem. In this case, all file operations are
   passed through 9p.
2. As a synthetic file that is internal to the sandbox. In this case, the
   dentry stores an endpoint or VFSPipe for sockets and pipes respectively,
   which replaces interactions with the remote fs through the gofer.
In gofer.filesystem.MknodAt, we attempt to call mknod(2) through 9p,
and if it fails, fall back to the synthetic version.
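The fallback described above can be sketched roughly as follows. This is a simplified illustration, not the gofer implementation: `remoteMknod`, `dentry`, `mknodAt`, and `errNotPermitted` are hypothetical stand-ins, and the real code stores an endpoint or VFSPipe on the dentry rather than a boolean.

```go
package main

import (
	"errors"
	"fmt"
)

// remoteMknod is a hypothetical stand-in for issuing mknod(2) through 9p.
type remoteMknod func(name string) error

// dentry is a simplified stand-in for a gofer dentry. synthetic marks a
// file that exists only inside the sandbox.
type dentry struct {
	name      string
	synthetic bool
}

// errNotPermitted is a hypothetical error returned by a remote filesystem
// that refuses to create the special file.
var errNotPermitted = errors.New("mknod not permitted by remote")

// mknodAt tries the remote filesystem first and falls back to a synthetic,
// sandbox-internal file if the remote call fails.
func mknodAt(name string, remote remoteMknod) *dentry {
	if err := remote(name); err == nil {
		return &dentry{name: name}
	}
	return &dentry{name: name, synthetic: true}
}

func main() {
	ok := func(string) error { return nil }
	fail := func(string) error { return errNotPermitted }
	fmt.Println(mknodAt("a.sock", ok).synthetic)
	fmt.Println(mknodAt("b.sock", fail).synthetic)
}
```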

Updates #1200.

PiperOrigin-RevId: 308828161
2020-04-28 08:34:00 -07:00
Fabricio Voznika 2cc0fd42f4 Fixes for procfs
- Return ENOENT for /proc/[pid]/task if the task is zombied or terminated
- Allow Seek() to the end of a directory
- Construct synthetic files for /proc/[pid]/ns/*
- Changed GenericDirectoryFD.Init to not register with FileDescription,
  otherwise other implementations cannot change its behavior.

Updates #1195, #1193

PiperOrigin-RevId: 308294649
2020-04-24 11:45:19 -07:00
Jamie Liu 5042ea7e2c Add vfs.MkdirOptions.ForSyntheticMountpoint.
PiperOrigin-RevId: 308143529
2020-04-23 15:37:10 -07:00
Fabricio Voznika 37e01fd2ea Misc VFS2 fixes
- Fix defer operation ordering in kernfs.Filesystem.AccessAt()
- Add AT_NULL entry in /proc/pid/auxv
- Fix line padding in /proc/pid/maps
- Fix linux_dirent serialization for getdents(2)
- Remove file creation flags from vfs.FileDescription.statusFlags()

Updates #1193, #1035

PiperOrigin-RevId: 307704159
2020-04-21 16:31:53 -07:00
Jamie Liu 9b5e305e05 Remove filesystem structure from vfs.Dentry.
This change:

- Drastically simplifies the synchronization model: filesystem structure is
  both implementation-defined and implementation-synchronized.

- Allows implementations of vfs.DentryImpl to use implementation-specific
  dentry types, reducing casts during path traversal.

- Doesn't require dentries representing non-directory files to waste space on a
  map of children.

- Allows dentry revalidation and mount lookup to be correctly ordered (fixed
  FIXME in fsimpl/gofer/filesystem.go).

- Removes the need to have two separate maps in gofer.dentry
  (dentry.vfsd.children and dentry.negativeChildren) for positive and negative
  lookups respectively.

//pkg/sentry/fsimpl/tmpfs/benchmark_test.go:
name                        old time/op  new time/op  delta
VFS2TmpfsStat/1-112          172ns ± 4%   165ns ± 3%   -4.08%  (p=0.002 n=9+9)
VFS2TmpfsStat/2-112          199ns ± 3%   195ns ±10%     ~     (p=0.132 n=8+9)
VFS2TmpfsStat/3-112          230ns ± 2%   216ns ± 2%   -6.15%  (p=0.000 n=8+8)
VFS2TmpfsStat/8-112          390ns ± 2%   358ns ± 4%   -8.33%  (p=0.000 n=9+8)
VFS2TmpfsStat/64-112        2.20µs ± 3%  2.01µs ± 3%   -8.48%  (p=0.000 n=10+8)
VFS2TmpfsStat/100-112       3.42µs ± 9%  3.08µs ± 2%   -9.82%  (p=0.000 n=9+8)
VFS2TmpfsMountStat/1-112     278ns ± 1%   286ns ±15%     ~     (p=0.712 n=8+10)
VFS2TmpfsMountStat/2-112     311ns ± 4%   298ns ± 2%   -4.27%  (p=0.000 n=9+8)
VFS2TmpfsMountStat/3-112     339ns ± 3%   330ns ± 9%     ~     (p=0.070 n=8+9)
VFS2TmpfsMountStat/8-112     503ns ± 3%   466ns ± 3%   -7.38%  (p=0.000 n=8+8)
VFS2TmpfsMountStat/64-112   2.53µs ±16%  2.17µs ± 7%  -14.19%  (p=0.000 n=10+9)
VFS2TmpfsMountStat/100-112  3.60µs ± 4%  3.30µs ± 8%   -8.33%  (p=0.001 n=8+9)

Updates #1035

PiperOrigin-RevId: 307655892
2020-04-21 12:18:07 -07:00
Jamie Liu f03996c5e9 Implement pipe(2) and pipe2(2) for VFS2.
Updates #1035

PiperOrigin-RevId: 306968644
2020-04-16 19:27:03 -07:00
Jamie Liu 52b4b19249 Pass O_LARGEFILE in syscalls/linux/vfs2.openat.
Needed for PipeTest_Flags: files opened by open() and openat() get O_LARGEFILE
(on architectures with 64-bit off_t), but not FDs created by other syscalls
such as pipe().

Updates #1035

PiperOrigin-RevId: 306504788
2020-04-14 13:37:51 -07:00
Dean Deng 09ddb5a426 Port extended attributes to VFS2.
As in VFS1, we only support the user.* namespace. Plumbing is added to tmpfs
and goferfs.
Note that because of the slightly different order of checks between VFS2 and
Linux, one of the xattr tests needs to be relaxed slightly.

Fixes #2363.

PiperOrigin-RevId: 305985121
2020-04-10 19:02:55 -07:00
Fabricio Voznika 6dd5a1f3fe Clean up TODOs
PiperOrigin-RevId: 305592245
2020-04-08 17:58:13 -07:00
Nicolas Lacasse f332a864e8 Port timerfd to VFS2.
PiperOrigin-RevId: 305067208
2020-04-06 10:52:56 -07:00
Dean Deng 24bee1c181 Record VFS2 sockets in global socket map.
Updates #1476, #1478, #1484, #1485.

PiperOrigin-RevId: 304845354
2020-04-04 21:02:42 -07:00
Dean Deng 5818663ebe Add FileDescriptionImpl for Unix sockets.
This change involves several steps:
- Refactor the VFS1 unix socket implementation to share methods between VFS1
  and VFS2 where possible. Re-implement the rest.
- Override the default PRead, Read, PWrite, Write, Ioctl, Release methods in
  FileDescriptionDefaultImpl.
- Add functions to create and initialize a new Dentry/Inode and FileDescription
  for a Unix socket file.

Updates #1476

PiperOrigin-RevId: 304689796
2020-04-03 14:08:54 -07:00
Dean Deng 5b2396d244 Fix typo in TODO comments.
PiperOrigin-RevId: 304508083
2020-04-02 17:07:13 -07:00
Jamie Liu dbc507dc5c Add equivalents to FMODE_PREAD/PWRITE to VFS2.
This is mostly required for PipeTest_OffsetCalls.

The options are DenyPRead/PWrite rather than AllowPRead/PWrite since, in Linux
terms, fs/open.c:do_dentry_open sets FMODE_PREAD|FMODE_PWRITE unconditionally
(although it allows filesystem implementations of open to unset these flags),
so they're set for most FDs; it's usually FDs created outside of open(2) that
don't get them, e.g.:

- Syscall-created pipes (fs/pipe.c:create_pipe_files =>
  fs/file_table.c:alloc_file_pseudo)

- Epoll instances (fs/eventpoll.c:do_epoll_create =>
  fs/anon_inodes.c:anon_inode_getfile => alloc_file_pseudo)

- Sockets (net/socket.c:sock_alloc_file => alloc_file_pseudo)

This CL adds the flags to epoll instances; a subsequent CL reworks the VFS2
implementation of pipe FDs to be filesystem-independent and adds the flags
there, and sockets aren't implemented yet.
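A minimal sketch of the deny-by-exception design described above, under stated assumptions: `fdOptions` and `fileDescription` are simplified, hypothetical stand-ins for the VFS2 types, and returning ESPIPE mirrors what Linux does for pread/pwrite on a file without FMODE_PREAD/FMODE_PWRITE.

```go
package main

import (
	"errors"
	"fmt"
)

// fdOptions is a hypothetical stand-in for the per-FD options. The zero
// value allows offset-based I/O, matching the common case of FDs that
// come from open(2).
type fdOptions struct {
	DenyPRead  bool
	DenyPWrite bool
}

// errESPIPE stands in for the ESPIPE errno.
var errESPIPE = errors.New("ESPIPE: illegal seek")

// fileDescription is a simplified FD backed by an in-memory buffer.
type fileDescription struct {
	opts fdOptions
	data []byte
}

// PRead reads at an explicit offset unless the FD was created with
// DenyPRead set (e.g. a pipe or epoll instance).
func (fd *fileDescription) PRead(dst []byte, off int64) (int, error) {
	if fd.opts.DenyPRead {
		return 0, errESPIPE
	}
	if off < 0 || off >= int64(len(fd.data)) {
		return 0, nil
	}
	return copy(dst, fd.data[off:]), nil
}

func main() {
	regular := &fileDescription{data: []byte("hello")}
	pipeLike := &fileDescription{opts: fdOptions{DenyPRead: true}}

	buf := make([]byte, 3)
	n, err := regular.PRead(buf, 1)
	fmt.Println(n, string(buf[:n]), err)

	_, err = pipeLike.PRead(buf, 0)
	fmt.Println(err)
}
```

Because the zero value of the options permits pread/pwrite, only the few FD types created outside open(2) need to set anything.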

Updates #1035

PiperOrigin-RevId: 304506434
2020-04-02 16:58:24 -07:00
Nicolas Lacasse e1c8eaca8f Fix /proc/self/mounts and /proc/self/mountinfo in VFS2.
Some extra fields were added to the Mount type to expose necessary data to the
proc filesystem.

PiperOrigin-RevId: 304053361
2020-03-31 15:07:26 -07:00
Jamie Liu f6e4daa67a Add vfs.PathnameReachable().
/proc/[pid]/mount* omit mounts whose mount point is outside the chroot, which
is checked (indirectly) via __d_path().

PiperOrigin-RevId: 303434226
2020-03-27 16:57:14 -07:00
Nicolas Lacasse 10f2c8db91 Add FilesystemType.Name method, and FilesystemType field to Filesystem struct.
Both have analogues in Linux:
* struct file_system_type has a char *name field.
* struct super_block keeps a pointer to the file_system_type.

These fields are necessary to support the `filesystem type` field in
/proc/[pid]/mountinfo.

PiperOrigin-RevId: 303434063
2020-03-27 16:56:16 -07:00
Dean Deng 76a7ace751 Add BoundEndpointAt filesystem operation.
BoundEndpointAt() is needed to support Unix sockets bound at a
file path, corresponding to BoundEndpoint() in VFS1.

Updates #1476.

PiperOrigin-RevId: 303258251
2020-03-26 21:52:24 -07:00
Nicolas Lacasse e466ab04a2 Add unique ID to Mount type.
Analogous to Linux's mount.mnt_id. This ID is displayed in
/proc/[pid]/mountinfo.

PiperOrigin-RevId: 303185564
2020-03-26 13:49:59 -07:00
Fabricio Voznika de694e5484 Combine file mode and isDir arguments
Updates #1035

PiperOrigin-RevId: 303021328
2020-03-26 08:48:04 -07:00
Fabricio Voznika f2eba94015 Remove TODO to push down exec permission check
Pushing it down requires every implementation to check for
exec individually, which is not maintainable. Making it part
of GenericCheckPermissions adds extra cost for everyone that
calls it. So it's better to keep it in
VirtualFilesystem.OpenAt.

Updates #1193

PiperOrigin-RevId: 302982993
2020-03-25 15:57:37 -07:00
Fabricio Voznika e541ebec2f Misc fixes to make stat_test pass (almost)
The only test failing now requires sockets, which are not
available in VFS2 yet.

Updates #1198

PiperOrigin-RevId: 302976572
2020-03-25 14:59:15 -07:00
Fabricio Voznika 2a6c4369be Enforce file size rlimits in VFS2
Updates #1035

PiperOrigin-RevId: 301255357
2020-03-16 16:00:49 -07:00
Fabricio Voznika 0f60799a4f Add calls to vfs.CheckSetStat to fsimpls
Only the gofer filesystem was calling vfs.CheckSetStat for
vfs.FilesystemImpl.SetStatAt and vfs.FileDescriptionImpl.SetStat.

Updates #1193, #1672, #1197

PiperOrigin-RevId: 301226522
2020-03-16 13:29:12 -07:00
Dean Deng 2e38408f20 Implement access/faccessat for VFS2.
Note that the raw faccessat system call does not actually take a flags argument;
according to faccessat(2), the glibc wrapper implements the flags by using
fstatat(2). Remove the flag argument that we try to extract from vfs1, which
would just be a garbage value.

Updates #1965
Fixes #2101

PiperOrigin-RevId: 300796067
2020-03-13 11:41:08 -07:00
Fabricio Voznika 8f8f16efaf Add support for mount flags
Plumbs MS_NOEXEC and MS_RDONLY. Others are TODO.

Updates #1623, #1193

PiperOrigin-RevId: 300764669
2020-03-13 08:58:04 -07:00
Dean Deng 960f6a975b Add plumbing for importing fds in VFS2, along with non-socket, non-TTY impl.
In VFS2, imported file descriptors are stored in a kernfs-based filesystem.
Upon calling ImportFD, the host fd can be accessed in two ways:
1. a FileDescription that can be added to the FDTable, and
2. a Dentry in the host.filesystem mount, which we will want to access through
magic symlinks in /proc/[pid]/fd/.

An implementation of the kernfs.Inode interface stores a unique host fd. This
inode can be inserted into file descriptions as well as dentries.

This change also plumbs in three FileDescriptionImpls corresponding to fds for
sockets, TTYs, and other files (only the latter is implemented here).
These implementations will mostly make corresponding syscalls to the host.
Where possible, the logic is ported over from pkg/sentry/fs/host.

Updates #1672

PiperOrigin-RevId: 299417263
2020-03-06 12:59:49 -08:00
Jamie Liu a92087f0f8 Add VFS.NewDisconnectedMount().
Analogous to Linux's kern_mount().

PiperOrigin-RevId: 297259580
2020-02-25 19:13:30 -08:00
Jamie Liu 471b15b212 Port most syscalls to VFS2.
pipe and pipe2 aren't ported, pending a slight rework of pipe FDs for VFS2.
mount and umount2 aren't ported out of temporary laziness. access and faccessat
need additional FSImpl methods to implement properly, but are stubbed to
prevent googletest from CHECK-failing. Other syscalls require additional
plumbing.

Updates #1623

PiperOrigin-RevId: 297188448
2020-02-25 13:37:34 -08:00
gVisor bot a5069f820f Remove linux.EpollEvent.Fd.
glibc defines struct epoll_event in such a way that epoll_event.data.fd exists.
However, the kernel's definition of struct epoll_event makes epoll_event.data
an opaque uint64, so naming half of it "fd" just introduces confusion. Remove
the Fd field, and make Data a [2]int32 to compensate.
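To illustrate treating the data field as an opaque 64-bit value stored in two 32-bit halves, here is a small sketch. The `epollEvent` type and the helpers `packData` and `unpackData` are hypothetical, and the low-half-first ordering is an assumption (little-endian layout), not something taken from the change itself.

```go
package main

import "fmt"

// epollEvent is a simplified stand-in for the event struct: Data is an
// opaque 64-bit value stored as two 32-bit halves.
type epollEvent struct {
	Events uint32
	Data   [2]int32
}

// packData splits an opaque 64-bit value into [low, high] halves.
func packData(v uint64) [2]int32 {
	return [2]int32{int32(uint32(v)), int32(uint32(v >> 32))}
}

// unpackData reassembles the opaque 64-bit value from its halves.
func unpackData(d [2]int32) uint64 {
	return uint64(uint32(d[0])) | uint64(uint32(d[1]))<<32
}

func main() {
	ev := epollEvent{Events: 1, Data: packData(0xdeadbeefcafef00d)}
	fmt.Printf("%#x\n", unpackData(ev.Data))
}
```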

Also add required padding to linux.EpollEvent on ARM64.

PiperOrigin-RevId: 295250424
2020-02-14 16:19:48 -08:00
gVisor bot 3557b26651 Allow vfs.IterDirentsCallback.Handle() to return an error.
This is easier than storing errors from e.g. CopyOut in the callback.

PiperOrigin-RevId: 295230021
2020-02-14 14:40:35 -08:00
gVisor bot e4c7f3e6f6 Inline vfs.VirtualFilesystem in Kernel struct
This saves one pointer dereference per VFS access.

Updates #1623

PiperOrigin-RevId: 295216176
2020-02-14 13:40:39 -08:00
gVisor bot 4075de11be Plumb VFS2 inside the Sentry
- Added fsbridge package with interface that can be used to open
  and read from VFS1 and VFS2 files.
- Converted ELF loader to use fsbridge
- Added VFS2 types to FSContext
- Added vfs.MountNamespace to ThreadGroup

Updates #1623

PiperOrigin-RevId: 295183950
2020-02-14 11:12:47 -08:00
gVisor bot a6024f7f5f Add FileExec flag to OpenOptions
This allows callers to say whether the file is being
opened to be executed, so that the proper checks can
be done from FilesystemImpl.OpenAt()

Updates #1623

PiperOrigin-RevId: 295042595
2020-02-13 17:57:36 -08:00
Fabricio Voznika dcffddf0ca Remove argument from vfs.MountNamespace.DecRef()
Updates #1035

PiperOrigin-RevId: 293194631
2020-02-04 11:48:36 -08:00
Dean Deng 6c3072243d Implement file locks for regular tmpfs files in VFSv2.
Add a file lock implementation that can be embedded into various filesystem
implementations.

Updates #1480

PiperOrigin-RevId: 292614758
2020-01-31 14:15:41 -08:00
Dean Deng 148fda60e8 Add plumbing for file locks in VFS2.
Updates #1480

PiperOrigin-RevId: 292180192
2020-01-29 11:39:28 -08:00
Fabricio Voznika 396c574db2 Add support for WritableSource in DynamicBytesFileDescriptionImpl
WritableSource is a convenience interface used for files that can
be written to, e.g. /proc/net/ipv4/tcp_sack. It reads a maximum of 4KB
and only from offset 0, which should cover most cases. It can be
extended as needed.
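The write path described above might look roughly like this. It is a sketch under assumptions: `writableSource`, `writeToSource`, and `tcpSack` are hypothetical simplifications, and rejecting a nonzero offset with an error is an assumed behavior that the commit message does not spell out.

```go
package main

import (
	"errors"
	"fmt"
)

// maxWriteSize caps how much of a single write is consumed (4KB).
const maxWriteSize = 4096

// errInvalidOffset is a hypothetical error for writes not at offset 0.
var errInvalidOffset = errors.New("write offset must be 0")

// writableSource is a simplified stand-in for the convenience interface:
// the file's backing value is updated from the written bytes.
type writableSource interface {
	Write(data []byte) (int, error)
}

// tcpSack is a hypothetical boolean-valued file such as tcp_sack.
type tcpSack struct{ enabled bool }

func (t *tcpSack) Write(data []byte) (int, error) {
	if len(data) == 0 {
		return 0, nil
	}
	t.enabled = data[0] != '0'
	return len(data), nil
}

// writeToSource accepts writes only at offset 0 and passes at most
// maxWriteSize bytes to the source.
func writeToSource(src writableSource, buf []byte, offset int64) (int, error) {
	if offset != 0 {
		return 0, errInvalidOffset
	}
	if len(buf) > maxWriteSize {
		buf = buf[:maxWriteSize]
	}
	return src.Write(buf)
}

func main() {
	s := &tcpSack{}
	n, err := writeToSource(s, []byte("1\n"), 0)
	fmt.Println(n, err, s.enabled)
}
```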

Updates #1195

PiperOrigin-RevId: 292056924
2020-01-28 18:31:28 -08:00
Jamie Liu 34fbd8446c Add VFS2 support for epoll.
PiperOrigin-RevId: 291997879
2020-01-28 13:11:43 -08:00
Jamie Liu 1119644080 Implement an anon_inode equivalent for VFS2.
PiperOrigin-RevId: 291986033
2020-01-28 12:08:00 -08:00
Adin Scannell 0e2f1b7abd Update package locations.
Because the abi will depend on the core types for marshalling (usermem,
context, safemem, safecopy), these need to be flattened from the sentry
directory. These packages contain no sentry-specific details.

PiperOrigin-RevId: 291811289
2020-01-27 15:31:32 -08:00
Adin Scannell d29e59af9f Standardize on tools directory.
PiperOrigin-RevId: 291745021
2020-01-27 12:21:00 -08:00