Commit Graph

225 Commits

Author SHA1 Message Date
Brian Geffon 2b8dae0bc5 Open(2) isn't honoring O_NOFOLLOW
PiperOrigin-RevId: 211644897
Change-Id: I882ed827a477d6c03576463ca5bf2d6351892b90
2018-09-05 09:21:28 -07:00
Michael Pratt 3944cb41cb /proc/PID/mounts is not tab-delimited
PiperOrigin-RevId: 211513847
Change-Id: Ib484dd2d921c3e5d70d0e410cd973d3bff4f6b73
2018-09-04 13:29:49 -07:00
Adin Scannell c09f9acd7c Distinguish Element and Linker for ilist.
Furthermore, allow for the specification of an ElementMapper. This allows a
single "Element" type to exist on multiple inline lists, and work without
having to embed the entry type.

This is a requisite change for supporting a per-Inode list of Dirents.

PiperOrigin-RevId: 211467497
Change-Id: If2768999b43e03fdaecf8ed15f435fe37518d163
2018-09-04 09:19:11 -07:00
Jamie Liu f8ccfbbed4 Document more task-goroutine-owned fields in kernel.Task.
Task.creds can only be changed by the task's own set*id and execve
syscalls, and Task namespaces can only be changed by the task's own
unshare/setns syscalls.

PiperOrigin-RevId: 211156279
Change-Id: I94d57105d34e8739d964400995a8a5d76306b2a0
2018-08-31 15:44:40 -07:00
Jamie Liu b935311e23 Do not use fs.FileOwnerFromContext in fs/proc.file.UnstableAttr().
From //pkg/sentry/context/context.go:

// - It is *not safe* to retain a Context passed to a function beyond the scope
// of that function call.

Passing a stored kernel.Task as a context.Context to
fs.FileOwnerFromContext violates this requirement.

PiperOrigin-RevId: 211143021
Change-Id: I4c5b02bd941407be4c9cfdbcbdfe5a26acaec037
2018-08-31 14:17:56 -07:00
Jamie Liu 098046ba19 Disintegrate kernel.TaskResources.
This allows us to call kernel.FDMap.DecRef without holding mutexes
cleanly.

PiperOrigin-RevId: 211139657
Change-Id: Ie59d5210fb9282e1950e2e40323df7264a01bcec
2018-08-31 13:58:04 -07:00
Jamie Liu b1c1afa3cc Delete the long-obsolete kernel.TaskMaybe interface.
PiperOrigin-RevId: 211131855
Change-Id: Ia7799561ccd65d16269e0ae6f408ab53749bca37
2018-08-31 13:07:34 -07:00
Nicolas Lacasse 8bfb5fa919 fs: Add empty dir at /sys/class/power_supply.
PiperOrigin-RevId: 210953512
Change-Id: I07d2d7fb0d268aa8eca26d81ef28b5b5c42289ee
2018-08-30 12:01:27 -07:00
Nicolas Lacasse 956fe64ad6 fs: Fix renameMu lock recursion.
dirent.walk() takes renameMu, but is often called with renameMu already held,
which can lead to a deadlock.

Fix this by requiring renameMu to be held for reading when dirent.walk() is
called. This causes walks and existence checks to block while a rename
operation takes place, but that is what we were already trying to enforce by
taking renameMu in walk() anyways.

PiperOrigin-RevId: 210760780
Change-Id: Id61018e6e4adbeac53b9c1b3aa24ab77f75d8a54
2018-08-29 11:47:01 -07:00
Nicolas Lacasse 1893247616 fs: Drop reference to over-written file before renaming over it.
dirent.go:Rename() walks to the file being replaced and defers
replaced.DecRef(). After the rename, the reference is dropped, triggering a
writeout and SettAttr call to the gofer. Because of lazyOpenForWrite, the gofer
opens the replaced file BY ITS OLD NAME and calls ftruncate on it.

This CL changes Remove to drop the reference on replaced (and thus trigger
writeout) before the actual rename call.

PiperOrigin-RevId: 210756097
Change-Id: I01ea09a5ee6c2e2d464560362f09943641638e0f
2018-08-29 11:22:27 -07:00
Ian Gudger 52e6714146 fasync: don't keep mutex after return
PiperOrigin-RevId: 210637533
Change-Id: I3536c3f9efb54732a0d8ada8bc299142b2c1682f
2018-08-28 17:26:26 -07:00
Nicolas Lacasse 3b11769c77 fs: Don't bother saving negative dirents.
PiperOrigin-RevId: 210616454
Change-Id: I3f536e2b4d603e540cdd9a67c61b8ec3351f4ac3
2018-08-28 15:18:42 -07:00
Nicolas Lacasse 515d9bf43b fs: Add tests for dirent ref counting with an overlay.
PiperOrigin-RevId: 210614669
Change-Id: I408365ff6d6c7765ed7b789446d30e7079cbfc67
2018-08-28 15:09:17 -07:00
Zhaozhong Ni d724863a31 sentry: optimize dirent weakref map save / restore.
Weak references save / restore involves multiple interface indirection
and cause material latency overhead when there are lots of dirents, each
containing a weak reference map. The nil entries in the map should also
be purged.

PiperOrigin-RevId: 210593727
Change-Id: Ied6f4c3c0726fcc53a24b983d9b3a79121b6b758
2018-08-28 13:22:07 -07:00
Michael Pratt 25a8e13a78 Bump to Go 1.11
The procid offset is unchanged.

PiperOrigin-RevId: 210551969
Change-Id: I33ba1ce56c2f5631b712417d870aa65ef24e6022
2018-08-28 09:22:41 -07:00
Fabricio Voznika ae648bafda Add command-line parameter to trigger panic on signal
This is to troubleshoot problems with a hung process that is
not responding to 'runsc debug --stack' command.

PiperOrigin-RevId: 210483513
Change-Id: I4377b210b4e51bc8a281ad34fd94f3df13d9187d
2018-08-27 20:36:10 -07:00
Brian Geffon f0492d45aa Add /proc/sys/kernel/shm[all,max,mni].
PiperOrigin-RevId: 210459956
Change-Id: I51859b90fa967631e0a54a390abc3b5541fbee66
2018-08-27 17:21:37 -07:00
Nicolas Lacasse 0b3bfe2ea3 fs: Fix remote-revalidate cache policy.
When revalidating a Dirent, if the inode id is the same, then we don't need to
throw away the entire Dirent. We can just update the unstable attributes in
place.

If the inode id has changed, then the remote file has been deleted or moved,
and we have no choice but to throw away the dirent we have a look up another.
In this case, we may still end up losing a mounted dirent that is a child of
the revalidated dirent. However, that seems appropriate here because the entire
mount point has been pulled out from underneath us.

Because gVisor's overlay is at the Inode level rather than the Dirent level, we
must pass the parent Inode and name along with the Inode that is being
revalidated.

PiperOrigin-RevId: 210431270
Change-Id: I705caef9c68900234972d5aac4ae3a78c61c7d42
2018-08-27 14:26:29 -07:00
Zhaozhong Ni bd01816c87 sentry: mark fsutil.DirFileOperations as savable.
PiperOrigin-RevId: 210405166
Change-Id: I252766015885c418e914007baf2fc058fec39b3e
2018-08-27 11:55:32 -07:00
Kevin Krakauer 2524111fc6 runsc: Terminal resizing support.
Implements the TIOCGWINSZ and TIOCSWINSZ ioctls, which allow processes to resize
the terminal. This allows, for example, sshd to properly set the window size for
ssh sessions.

PiperOrigin-RevId: 210392504
Change-Id: I0d4789154d6d22f02509b31d71392e13ee4a50ba
2018-08-27 10:49:16 -07:00
Nicolas Lacasse 106de2182d runsc: Terminal support for "docker exec -ti".
This CL adds terminal support for "docker exec".  We previously only supported
consoles for the container process, but not exec processes.

The SYS_IOCTL syscall was added to the default seccomp filter list, but only
for ioctls that get/set winsize and termios structs. We need to allow these
ioctl for all containers because it's possible to run "exec -ti" on a
container that was started without an attached console, after the filters
have been installed.

Note that control-character signals are still not properly supported.

Tested with:
	$ docker run --runtime=runsc -it alpine
In another terminial:
	$ docker exec -it <containerid> /bin/sh

PiperOrigin-RevId: 210185456
Change-Id: I6d2401e53a7697bb988c120a8961505c335f96d9
2018-08-24 17:43:21 -07:00
Nicolas Lacasse c48708a041 fs: Drop unused WaitGroup in Dirent.destroy.
PiperOrigin-RevId: 210182476
Change-Id: I655a2a801e2069108d30323f7f5ae76deb3ea3ec
2018-08-24 17:15:42 -07:00
Jamie Liu 64403265a0 Implement POSIX per-process interval timers.
PiperOrigin-RevId: 210021612
Change-Id: If7c161e6fd08cf17942bfb6bc5a8d2c4e271c61e
2018-08-23 16:32:36 -07:00
Zhaozhong Ni ba8f6ba8c8 sentry: mark idMapSeqHandle as savable.
PiperOrigin-RevId: 209994384
Change-Id: I16186cf79cb4760a134f3968db30c168a5f4340e
2018-08-23 13:59:20 -07:00
Adin Scannell a7a8d07d7d Add separate Recycle method for allocator.
This improves debugging for pagetable-related issues.

PiperOrigin-RevId: 209827795
Change-Id: I4cfa11664b0b52f26f6bc90a14c5bb106f01e038
2018-08-22 14:16:04 -07:00
Zhaozhong Ni 6b9133ba96 sentry: mark S/R stating errors as save rejections / fs corruptions.
PiperOrigin-RevId: 209817767
Change-Id: Iddf2b8441bc44f31f9a8cf6f2bd8e7a5b824b487
2018-08-22 13:19:16 -07:00
Brian Geffon 545ea7ab3f Always add AT_BASE even if there is no interpreter.
Linux will ALWAYS add AT_BASE even for a static binary, expect it
will be set to 0 [1].

1. https://github.com/torvalds/linux/blob/master/fs/binfmt_elf.c#L253

PiperOrigin-RevId: 209811129
Change-Id: I92cc66532f23d40f24414a921c030bd3481e12a0
2018-08-22 12:37:09 -07:00
Nicolas Lacasse 8d318aac55 fs: Hold Dirent.mu when calling Dirent.flush().
As required by the contract in Dirent.flush().

Also inline Dirent.freeze() into Dirent.Freeze(), since it is only called from
there.

PiperOrigin-RevId: 209783626
Change-Id: Ie6de4533d93dd299ffa01dabfa257c9cc259b1f4
2018-08-22 10:07:01 -07:00
Zhaozhong Ni 8bb50dab79 sentry: do not release gofer inode file state loading lock upon error.
When an inode file state failed to load asynchronuously, we want to report
the error instead of potentially panicing in another async loading goroutine
incorrectly unblocked.

PiperOrigin-RevId: 209683977
Change-Id: I591cde97710bbe3cdc53717ee58f1d28bbda9261
2018-08-21 16:52:27 -07:00
Ian Gudger 9c407382b0 Fix races in kernel.(*Task).Value()
PiperOrigin-RevId: 209627180
Change-Id: Idc84afd38003427e411df6e75abfabd9174174e1
2018-08-21 11:16:17 -07:00
Ian Gudger 47d5a12ce5 Fix handling of abstract Unix socket addresses
* Don't truncate abstract addresses at second null.
* Properly handle abstract addresses with length < 108 bytes.

PiperOrigin-RevId: 209502703
Change-Id: I49053f2d18b5a78208c3f640c27dbbdaece4f1a9
2018-08-20 16:12:23 -07:00
Nicolas Lacasse 1501400d9c getdents should return type=DT_DIR for SpecialDirectories.
It was returning DT_UNKNOWN, and this was breaking numpy.

PiperOrigin-RevId: 209459351
Change-Id: Ic6f548e23aa9c551b2032b92636cb5f0df9ccbd4
2018-08-20 11:59:58 -07:00
Nicolas Lacasse 0050e3e71c sysfs: Add (empty) cpu directories for each cpu in /sys/devices/system/cpu.
Numpy needs these.

Also added the "present" directory, since the contents are the same as possible
and online.

PiperOrigin-RevId: 209451777
Change-Id: I2048de3f57bf1c57e9b5421d607ca89c2a173684
2018-08-20 11:19:15 -07:00
Chenggang Qin aeec7a4c00 fs: Support possible and online knobs for cpu
Some linux commands depend on /sys/devices/system/cpu/possible, such
as 'lscpu'.

Add 2 knobs for cpu:
/sys/devices/system/cpu/possible
/sys/devices/system/cpu/online
Both the values are '0 - Kernel.ApplicationCores()-1'.

Change-Id: Iabd8a4e559cbb630ed249686b92c22b4e7120663
PiperOrigin-RevId: 209070163
2018-08-16 16:28:14 -07:00
Kevin Krakauer 635b0c4593 runsc fsgofer: Support dynamic serving of filesystems.
When multiple containers run inside a sentry, each container has its own root
filesystem and set of mounts. Containers are also added after sentry boot rather
than all configured and known at boot time.

The fsgofer needs to be able to serve the root filesystem of each container.
Thus, it must be possible to add filesystems after the fsgofer has already
started.

This change:
* Creates a URPC endpoint within the gofer process that listens for requests to
  serve new content.
* Enables the sentry, when starting a new container, to add the new container's
  filesystem.
* Mounts those new filesystems at separate roots within the sentry.

PiperOrigin-RevId: 208903248
Change-Id: Ifa91ec9c8caf5f2f0a9eead83c4a57090ce92068
2018-08-15 16:25:22 -07:00
Ian Gudger a620bea045 Reduce map lookups in syserr
PiperOrigin-RevId: 208755352
Change-Id: Ia24630f452a4a42940ab73a8113a2fd5ea2cfca2
2018-08-14 19:03:38 -07:00
Nicolas Lacasse e8a4f2e133 runsc: Change cache policy for root fs and volume mounts.
Previously, gofer filesystems were configured with the default "fscache"
policy, which caches filesystem metadata and contents aggressively.  While this
setting is best for performance, it means that changes from inside the sandbox
may not be immediately propagated outside the sandbox, and vice-versa.

This CL changes volumes and the root fs configuration to use a new
"remote-revalidate" cache policy which tries to retain as much caching as
possible while still making fs changes visible across the sandbox boundary.

This cache policy is enabled by default for the root filesystem. The default
value for the "--file-access" flag is still "proxy", but the behavior is
changed to use the new cache policy.

A new value for the "--file-access" flag is added, called "proxy-exclusive",
which turns on the previous aggressive caching behavior. As the name implies,
this flag should be used when the sandbox has "exclusive" access to the
filesystem.

All volume mounts are configured to use the new cache policy, since it is
safest and most likely to be correct. There is not currently a way to change
this behavior, but it's possible to add such a mechanism in the future. The
configurability is a smaller issue for volumes, since most of the expensive
application fs operations (walking + stating files) will likely served by the
root fs.

PiperOrigin-RevId: 208735037
Change-Id: Ife048fab1948205f6665df8563434dbc6ca8cfc9
2018-08-14 16:25:58 -07:00
Kevin Krakauer d4939f6dc2 TTY: Fix data race where calls into tty.queue's waiter were not synchronized.
Now, there's a waiter for each end (master and slave) of the TTY, and each
waiter.Entry is only enqueued in one of the waiters.

PiperOrigin-RevId: 208734483
Change-Id: I06996148f123075f8dd48cde5a553e2be74c6dce
2018-08-14 16:22:56 -07:00
Kevin Krakauer 12a4912aed Fix `ls -laR | wc -l` hanging.
stat()-ing /proc/PID/fd/FD incremented but didn't decrement the refcount for
FD. This behavior wasn't usually noticeable, but in the above case:

- ls would never decrement the refcount of the write end of the pipe to 0.
- This caused the write end of the pipe never to close.
- wc would then hang read()-ing from the pipe.

PiperOrigin-RevId: 208728817
Change-Id: I4fca1ba5ca24e4108915a1d30b41dc63da40604d
2018-08-14 15:49:58 -07:00
Ian Gudger e97717e29a Enforce Unix socket address length limit
PiperOrigin-RevId: 208720936
Change-Id: Ic943a88b6efeff49574306d4d4e1f113116ae32e
2018-08-14 15:07:05 -07:00
Nicolas Lacasse 6cf2278167 Automated rollback of changelist 208284483
PiperOrigin-RevId: 208685417
Change-Id: Ie2849c4811e3a2d14a002f521cef018ded0c6c4a
2018-08-14 11:50:49 -07:00
Nicolas Lacasse 66b0f3e15a Fix bind() on overlays.
InodeOperations.Bind now returns a Dirent which will be cached in the Dirent
tree.

When an overlay is in-use, Bind cannot return the Dirent created by the upper
filesystem because the Dirent does not know about the overlay. Instead,
overlayBind must create a new overlay-aware Inode and Dirent and return that.
This is analagous to how Lookup and overlayLookup work.

PiperOrigin-RevId: 208670710
Change-Id: I6390affbcf94c38656b4b458e248739b4853da29
2018-08-14 10:34:56 -07:00
Adin Scannell dde836a918 Prevent renames across walk fast path.
PiperOrigin-RevId: 208533436
Change-Id: Ifc1a4e2d6438a424650bee831c301b1ac0d670a3
2018-08-13 13:31:18 -07:00
Nicolas Lacasse a2ec391dfb fs: Allow overlays to revalidate files from the upper fs.
Previously, an overlay would panic if either the upper or lower fs required
revalidation for a given Dirent. Now, we allow revalidation from the upper
file, but not the lower.

If a cached overlay inode does need revalidation (because the upper needs
revalidation), then the entire overlay Inode will be discarded and a new
overlay Inode will be built with a fresh copy of the upper file.

As a side effect of this change, Revalidate must take an Inode instead of a
Dirent, since an overlay needs to revalidate individual Inodes.

PiperOrigin-RevId: 208293638
Change-Id: Ic8f8d1ffdc09114721745661a09522b54420c5f1
2018-08-10 17:16:38 -07:00
Justine Olshan ae6f092fe1 Implemented the splice(2) syscall.
Currently the implementation matches the behavior of moving data
between two file descriptors. However, it does not implement this
through zero-copy movement. Thus, this code is a starting point
to build the more complex implementation.

PiperOrigin-RevId: 208284483
Change-Id: Ibde79520a3d50bc26aead7ad4f128d2be31db14e
2018-08-10 16:11:01 -07:00
Nicolas Lacasse 567c5eed11 cache policy: Check policy before returning a negative dirent.
The cache policy determines whether Lookup should return a negative dirent, or
just ENOENT. This CL fixes one spot where we returned a negative dirent without
first consulting the policy.

PiperOrigin-RevId: 208280230
Change-Id: I8f963bbdb45a95a74ad0ecc1eef47eff2092d3a4
2018-08-10 15:43:03 -07:00
Brielle Broder 4ececd8e8d Enable checkpoint/restore in cases of UDS use.
Previously, processes which used file-system Unix Domain Sockets could not be
checkpoint-ed in runsc because the sockets were saved with their inode
numbers which do not necessarily remain the same upon restore. Now,
the sockets are also saved with their paths so that the new inodes
can be determined for the sockets based on these paths after restoring.
Tests for cases with UDS use are included. Test cleanup to come.

PiperOrigin-RevId: 208268781
Change-Id: Ieaa5d5d9a64914ca105cae199fd8492710b1d7ec
2018-08-10 14:33:20 -07:00
Neel Natu d5b702b64f Validate FS.base before establishing it in the task's register set.
PiperOrigin-RevId: 208229341
Change-Id: I5d84bc52bbafa073446ef497e56958d0d7955aa8
2018-08-10 10:27:09 -07:00
Michael Pratt 2e06b23aa6 Fix missing O_LARGEFILE from O_CREAT files
Cleanup some more syscall.O_* references while we're here.

PiperOrigin-RevId: 208133460
Change-Id: I48db71a38f817e4f4673977eafcc0e3874eb9a25
2018-08-09 16:50:37 -07:00
Fabricio Voznika 4e171f7590 Basic support for ip link/addr and ifconfig
Closes #94

PiperOrigin-RevId: 207997580
Change-Id: I19b426f1586b5ec12f8b0cd5884d5b401d334924
2018-08-08 22:39:58 -07:00