Commit Graph

194 Commits

Author SHA1 Message Date
Jamie Liu 9471c01348 Export kernel.SignalInfoPriv.
Also add kernel.SignalInfoNoInfo, and use it in RLIMIT_FSIZE checks.

PiperOrigin-RevId: 242562428
Change-Id: I4887c0e1c8f5fddcabfe6d4281bf76d2f2eafe90
2019-04-08 16:32:11 -07:00
Nicolas Lacasse 70906f1d24 Intermediate ram fs dirs should be writable.
We construct a ramfs tree of "scaffolding" directories for all mount points, so
that a directory exists that each mount point can be mounted over.

We were creating these directories without write permissions, which meant that
they were not wribable even when underlayed under a writable filesystem. They
should be writable.

PiperOrigin-RevId: 242507789
Change-Id: I86645e35417560d862442ff5962da211dbe9b731
2019-04-08 11:56:38 -07:00
Nicolas Lacasse ee7e6d33b2 Use string type for extended attribute values, instead of []byte.
Strings are a better fit for this usage because they are immutable in Go, and
can contain arbitrary bytes. It also allows us to avoid casting bytes to string
(and the associated allocation) in the hot path when checking for overlay
whiteouts.

PiperOrigin-RevId: 242208856
Change-Id: I7699ae6302492eca71787dd0b72e0a5a217a3db2
2019-04-05 15:49:39 -07:00
Andrei Vagin 88409e983c gvisor: Add support for the MS_NOEXEC mount option
https://github.com/google/gvisor/issues/145

PiperOrigin-RevId: 242044115
Change-Id: I8f140fe05e32ecd438b6be218e224e4b7fe05878
2019-04-04 17:43:53 -07:00
Nicolas Lacasse 61d8c361c6 Don't release d.mu in checks for child-existence.
Dirent.exists() is called in Create to check whether a child with the given
name already exists.

Dirent.exists() calls walk(), and before this CL allowed walk() to drop d.mu
while calling d.Inode.Lookup. During this existence check, a racing Rename()
can acquire d.mu and create a new child of the dirent with the same name.
(Note that the source and destination of the rename must be in the same
directory, otherwise renameMu will be taken preventing the race.) In this
case, d.exists() can return false, even though a child with the same name
actually does exist.

This CL changes d.exists() so that it does not release d.mu while walking, thus
preventing the race with Rename.

It also adds comments noting that lockForRename may not take renameMu if the
source and destination are in the same directory, as this is a bit surprising
(at least it was to me).

PiperOrigin-RevId: 241842579
Change-Id: I56524870e39dfcd18cab82054eb3088846c34813
2019-04-03 17:53:56 -07:00
Kevin Krakauer 82529becae Fix index out of bounds in tty implementation.
The previous implementation revolved around runes instead of bytes, which caused
weird behavior when converting between the two. For example, peekRune would read
the byte 0xff from a buffer, convert it to a rune, then return it. As rune is an
alias of int32, 0xff was 0-padded to int32(255), which is the hex code point for
?. However, peekRune also returned the length of the byte (1). When calling
utf8.EncodeRune, we only allocated 1 byte, but tried the write the 2-byte
character ?.

tl;dr: I apparently didn't understand runes when I wrote this.

PiperOrigin-RevId: 241789081
Change-Id: I14c788af4d9754973137801500ef6af7ab8a8727
2019-04-03 13:00:34 -07:00
Kevin Krakauer c79e81bd27 Addresses data race in tty implementation.
Also makes the safemem reading and writing inline, as it makes it easier to see
what locks are held.

PiperOrigin-RevId: 241775201
Change-Id: Ib1072f246773ef2d08b5b9a042eb7e9e0284175c
2019-04-03 11:49:55 -07:00
Nicolas Lacasse 1776ab28f0 Add test that symlinking over a directory returns EEXIST.
Also remove comments in InodeOperations that required that implementation of
some Create* operations ensure that the name does not already exist, since
these checks are all centralized in the Dirent.

PiperOrigin-RevId: 241637335
Change-Id: Id098dc6063ff7c38347af29d1369075ad1e89a58
2019-04-02 17:28:36 -07:00
Wei Zhang 1fcd40719d device: fix device major/minor
Current gvisor doesn't give devices a right major and minor number.

When testing golang supporting of gvisor, I run the test case below:

```
$ docker run -ti --runtime runsc golang:1.12.1 bash -c "cd /usr/local/go/src && ./run.bash "
```

And it reports some errors, one of them is:

"--- FAIL: TestDevices (0.00s)
    --- FAIL: TestDevices//dev/null_1:3 (0.00s)
        dev_linux_test.go:45: for /dev/null Major(0x0) == 0, want 1
        dev_linux_test.go:48: for /dev/null Minor(0x0) == 0, want 3
        dev_linux_test.go:51: for /dev/null Mkdev(1, 3) == 0x103, want 0x0
    --- FAIL: TestDevices//dev/zero_1:5 (0.00s)
        dev_linux_test.go:45: for /dev/zero Major(0x0) == 0, want 1
        dev_linux_test.go:48: for /dev/zero Minor(0x0) == 0, want 5
        dev_linux_test.go:51: for /dev/zero Mkdev(1, 5) == 0x105, want 0x0
    --- FAIL: TestDevices//dev/random_1:8 (0.00s)
        dev_linux_test.go:45: for /dev/random Major(0x0) == 0, want 1
        dev_linux_test.go:48: for /dev/random Minor(0x0) == 0, want 8
        dev_linux_test.go:51: for /dev/random Mkdev(1, 8) == 0x108, want 0x0
    --- FAIL: TestDevices//dev/full_1:7 (0.00s)
        dev_linux_test.go:45: for /dev/full Major(0x0) == 0, want 1
        dev_linux_test.go:48: for /dev/full Minor(0x0) == 0, want 7
        dev_linux_test.go:51: for /dev/full Mkdev(1, 7) == 0x107, want 0x0
    --- FAIL: TestDevices//dev/urandom_1:9 (0.00s)
        dev_linux_test.go:45: for /dev/urandom Major(0x0) == 0, want 1
        dev_linux_test.go:48: for /dev/urandom Minor(0x0) == 0, want 9
        dev_linux_test.go:51: for /dev/urandom Mkdev(1, 9) == 0x109, want 0x0
"

So I think we'd better assign to them correct major/minor numbers following linux spec.

Signed-off-by: Wei Zhang <zhangwei198900@gmail.com>
Change-Id: I4521ee7884b4e214fd3a261929e3b6dac537ada9
PiperOrigin-RevId: 241609021
2019-04-02 14:51:07 -07:00
Andrei Vagin a4b34e2637 gvisor: convert ilist to ilist:generic_list
ilist:generic_list works faster (cl/240185278) and
the code looks cleaner without type casting.
PiperOrigin-RevId: 241381175
Change-Id: I8487ab1d73637b3e9733c253c56dce9e79f0d35f
2019-04-01 12:53:27 -07:00
Jamie Liu 69afd0438e Return srclen in proc.idMapFileOperations.Write.
PiperOrigin-RevId: 241037926
Change-Id: I4b0381ac1c7575e8b861291b068d3da22bc03850
2019-03-29 13:16:46 -07:00
Googler e373d3642e Internal change.
PiperOrigin-RevId: 240842801
Change-Id: Ibbd6f849f9613edc1b1dd7a99a97d1ecdb6e9188
2019-03-28 13:43:47 -07:00
Jamie Liu f005350c93 Clean up gofer handle caching.
- Document fsutil.CachedFileObject.FD() requirements on access
permissions, and change gofer.inodeFileState.FD() to honor them.
Fixes #147.

- Combine gofer.inodeFileState.readonly and
gofer.inodeFileState.readthrough, and simplify handle caching logic.

- Inline gofer.cachePolicy.cacheHandles into
gofer.inodeFileState.setSharedHandles, because users with access to
gofer.inodeFileState don't necessarily have access to the fs.Inode
(predictably, this is a save/restore problem).

Before this CL:

$ docker run --runtime=runsc-d -v $(pwd)/gvisor/repro:/root/repro -it ubuntu bash
root@34d51017ed67:/# /root/repro/runsc-b147
mmap: 0x7f3c01e45000
Segmentation fault

After this CL:

$ docker run --runtime=runsc-d -v $(pwd)/gvisor/repro:/root/repro -it ubuntu bash
root@d3c3cb56bbf9:/# /root/repro/runsc-b147
mmap: 0x7f78987ec000
o
PiperOrigin-RevId: 240818413
Change-Id: I49e1d4a81a0cb9177832b0a9f31a10da722a896b
2019-03-28 11:43:51 -07:00
Nicolas Lacasse 9c18897887 Add rsslim field in /proc/pid/stat.
PiperOrigin-RevId: 240681675
Change-Id: Ib214106e303669fca2d5c744ed5c18e835775161
2019-03-27 17:44:38 -07:00
Nicolas Lacasse 2d355f0e8f Add start time to /proc/<pid>/stat.
The start time is the number of clock ticks between the boot time and
application start time.

PiperOrigin-RevId: 240619475
Change-Id: Ic8bd7a73e36627ed563988864b0c551c052492a5
2019-03-27 12:41:27 -07:00
Nicolas Lacasse 645af7cdd8 Dev device methods should take pointer receiver.
PiperOrigin-RevId: 240600504
Change-Id: I7dd5f27c8da31f24b68b48acdf8f1c19dbd0c32d
2019-03-27 11:08:50 -07:00
Rahat Mahmood 06ec97a3f8 Implement memfd_create.
Memfds are simply anonymous tmpfs files with no associated
mounts. Also implementing file seals, which Linux only implements for
memfds at the moment.

PiperOrigin-RevId: 240450031
Change-Id: I31de78b950101ae8d7a13d0e93fe52d98ea06f2f
2019-03-26 16:16:57 -07:00
Jamie Liu f3723f8059 Call memmap.Mappable.Translate with more conservative usermem.AccessType.
MM.insertPMAsLocked() passes vma.maxPerms to memmap.Mappable.Translate
(although it unsets AccessType.Write if the vma is private). This
somewhat simplifies handling of pmas, since it means only COW-break
needs to replace existing pmas. However, it also means that a MAP_SHARED
mapping of a file opened O_RDWR dirties the file, regardless of the
mapping's permissions and whether or not the mapping is ever actually
written to with I/O that ignores permissions (e.g.
ptrace(PTRACE_POKEDATA)).

To fix this:

- Change the pma-getting path to request only the permissions that are
required for the calling access.

- Change memmap.Mappable.Translate to take requested permissions, and
return allowed permissions. This preserves the existing behavior in the
common cases where the memmap.Mappable isn't
fsutil.CachingInodeOperations and doesn't care if the translated
platform.File pages are written to.

- Change the MM.getPMAsLocked path to support permission upgrading of
pmas outside of copy-on-write.

PiperOrigin-RevId: 240196979
Change-Id: Ie0147c62c1fbc409467a6fa16269a413f3d7d571
2019-03-25 12:42:43 -07:00
Kevin Krakauer 0cd5f20044 Replace manual pty copies to/from userspace with safemem operations.
Also, changing queue.writeBuf from a buffer.Bytes to a [][]byte should reduce
copying and reallocating of slices.

PiperOrigin-RevId: 239713547
Change-Id: I6ee5ff19c3ee2662f1af5749cae7b73db0569e96
2019-03-21 18:05:07 -07:00
Andrei Vagin 87cce0ec08 netstack: reduce MSS from SYN to account tcp options
See: https://tools.ietf.org/html/rfc6691#section-2
PiperOrigin-RevId: 239305632
Change-Id: Ie8eb912a43332e6490045dc95570709c5b81855e
2019-03-19 17:33:20 -07:00
Michael Pratt 8a499ae65f Remove references to replaced child in Rename in ramfs/agentfs
In the case of a rename replacing an existing destination inode, ramfs
Rename failed to first remove the replaced inode. This caused:

1. A leak of a reference to the inode (making it live indefinitely).
2. For directories, a leak of the replaced directory's .. link to the
   parent. This would cause the parent's link count to incorrectly
   increase.

(2) is much simpler to test than (1), so that's what I've done.

agentfs has a similar bug with link count only, so the Dirent layer
informs the Inode if this is a replacing rename.

Fixes #133

PiperOrigin-RevId: 239105698
Change-Id: I4450af2462d8ae3339def812287213d2cbeebde0
2019-03-18 18:40:06 -07:00
Jamie Liu 8f4634997b Decouple filemem from platform and move it to pgalloc.MemoryFile.
This is in preparation for improved page cache reclaim, which requires
greater integration between the page cache and page allocator.

PiperOrigin-RevId: 238444706
Change-Id: Id24141b3678d96c7d7dc24baddd9be555bffafe4
2019-03-14 08:12:48 -07:00
Jamie Liu fb9919881c Use WalkGetAttr in gofer.inodeOperations.Create.
p9.Twalk.handle() with a non-empty path also stats the walked-to path
anyway, so the preceding GetAttr is completely wasted.

PiperOrigin-RevId: 238440645
Change-Id: I7fbc7536f46b8157639d0d1f491e6aaa9ab688a3
2019-03-14 07:43:15 -07:00
Nicolas Lacasse 2512cc5617 Allow filesystem.Mount to take an optional interface argument.
PiperOrigin-RevId: 238360231
Change-Id: I5eaf8d26f8892f77d71c7fbd6c5225ef471cedf1
2019-03-13 19:24:03 -07:00
Jamie Liu 8930e79ebf Clarify the platform.File interface.
- Redefine some memmap.Mappable, platform.File, and platform.Memory
semantics in terms of File reference counts (no functional change).

- Make AddressSpace.MapFile take a platform.File instead of a raw FD,
and replace platform.File.MapInto with platform.File.FD. This allows
kvm.AddressSpace.MapFile to always use platform.File.MapInternal instead
of maintaining its own (redundant) cache of file mappings in the sentry
address space.

PiperOrigin-RevId: 238044504
Change-Id: Ib73a11e4275c0da0126d0194aa6c6017a9cef64f
2019-03-12 10:29:16 -07:00
Nicolas Lacasse fbacb35039 No need to check for negative uintptr.
Fixes #134

PiperOrigin-RevId: 237128306
Change-Id: I396e808484c18931fc5775970ec1f5ae231e1cb9
2019-03-06 15:06:46 -08:00
Nicolas Lacasse 0d683c9961 Make tmpfs respect MountNoATime now that fs.Handle is gone.
PiperOrigin-RevId: 236752802
Change-Id: I9e50600b2ae25d5f2ac632c4405a7a185bdc3c92
2019-03-04 16:57:14 -08:00
Nicolas Lacasse 9177bcd0ba DecRef replaced dirent in inode_overlay.
PiperOrigin-RevId: 236352158
Change-Id: Ide5104620999eaef6820917505e7299c7b0c5a03
2019-03-01 11:58:59 -08:00
Ruidong Cao 3851705a73 Fix procfs bugs
Current procfs has some bugs. After executing ls twice, many dirs come
out with same name like "1" or ".". Files like "cpuinfo" disappear.
Here variable names is a slice with cap() > len(). Sort after appending
to it will not alloc a new space and impact orignal slice. Same to m.

Signed-off-by: Ruidong Cao <crdfrank@gmail.com>
Change-Id: I83e5cd1c7968c6fe28c35ea4fee497488d4f9eef
PiperOrigin-RevId: 236222270
2019-02-28 16:44:54 -08:00
Jamie Liu 05d721f9ee Hold dataMu for writing in CachingInodeOperations.WriteOut.
fsutil.SyncDirtyAll mutates the DirtySet.

PiperOrigin-RevId: 236183349
Change-Id: I7e809d5b406ac843407e61eff17d81259a819b4f
2019-02-28 13:14:43 -08:00
Nicolas Lacasse d516ee3312 Allow overlay to merge Directories and SepcialDirectories.
Needed to mount inside /proc or /sys.

PiperOrigin-RevId: 235936529
Change-Id: Iee6f2671721b1b9b58a3989705ea901322ec9206
2019-02-27 09:45:45 -08:00
Fabricio Voznika 23fe059761 Lazily allocate inotify map on inode
PiperOrigin-RevId: 235735865
Change-Id: I84223eb18eb51da1fa9768feaae80387ff6bfed0
2019-02-26 09:33:44 -08:00
Googler 532f4b2fba Internal change.
PiperOrigin-RevId: 235053594
Change-Id: Ie3d7b11843d0710184a2463886c7034e8f5305d1
2019-02-21 13:08:34 -08:00
Jamie Liu 22d8b6eba1 Break /proc/[pid]/{uid,gid}_map's dependence on seqfile.
In addition to simplifying the implementation, this fixes two bugs:

- seqfile.NewSeqFile unconditionally creates an inode with mode 0444,
  but {uid,gid}_map have mode 0644.

- idMapSeqFile.Write implements fs.FileOperations.Write ... but it
  doesn't implement any other fs.FileOperations methods and is never
  used as fs.FileOperations. idMapSeqFile.GetFile() =>
  seqfile.SeqFile.GetFile() uses seqfile.seqFileOperations instead,
  which rejects all writes.

PiperOrigin-RevId: 234638212
Change-Id: I4568f741ab07929273a009d7e468c8205a8541bc
2019-02-19 11:21:46 -08:00
Nicolas Lacasse 0a41ea72c1 Don't allow writing or reading to TTY unless process group is in foreground.
If a background process tries to read from a TTY, linux sends it a SIGTTIN
unless the signal is blocked or ignored, or the process group is an orphan, in
which case the syscall returns EIO.

See drivers/tty/n_tty.c:n_tty_read()=>job_control().

If a background process tries to write a TTY, set the termios, or set the
foreground process group, linux then sends a SIGTTOU. If the signal is ignored
or blocked, linux allows the write. If the process group is an orphan, the
syscall returns EIO.

See drivers/tty/tty_io.c:tty_check_change().

PiperOrigin-RevId: 234044367
Change-Id: I009461352ac4f3f11c5d42c43ac36bb0caa580f9
2019-02-14 15:47:31 -08:00
Googler 7aaa6cf225 Internal change.
PiperOrigin-RevId: 233802562
Change-Id: I40e1b13fd571daaf241b00f8df4bcedd034dc3f1
2019-02-13 12:07:34 -08:00
Nicolas Lacasse f17692d807 Add fs.AsyncWithContext and call it in fs/gofer/inodeOperations.Release.
fs/gofer/inodeOperations.Release does some asynchronous work.  Previously it
was calling fs.Async with an anonymous function, which caused the function to
be allocated on the heap.  Because Release is relatively hot, this results in a
lot of small allocations and increased GC pressure, noticeable in perf profiles.

This CL adds a new function, AsyncWithContext, which is just like Async, but
passes a context to the async function.  It avoids the need for an extra
anonymous function in fs/gofer/inodeOperations.Release.  The Async function
itself still requires a single anonymous function.

PiperOrigin-RevId: 233141763
Change-Id: I1dce4a883a7be9a8a5b884db01e654655f16d19c
2019-02-08 15:54:15 -08:00
Rahat Mahmood 2ba74f84be Implement /proc/net/unix.
PiperOrigin-RevId: 232948478
Change-Id: Ib830121e5e79afaf5d38d17aeef5a1ef97913d23
2019-02-07 14:44:21 -08:00
Zach Koopmans 0cf7fc4e11 Change /proc/PID/cmdline to read environment vector.
- Change proc to return envp on overwrite of argv with limitations from
upstream.
- Add unit tests
- Change layout of argv/envp on the stack so that end of argv is contiguous with
beginning of envp.

PiperOrigin-RevId: 232506107
Change-Id: I993880499ab2c1220f6dc456a922235c49304dec
2019-02-05 10:02:06 -08:00
Fabricio Voznika 2d20b121d7 CachingInodeOperations was over-dirtying cached attributes
Dirty should be set only when the attribute is changed in the cache
only. Instances where the change was also sent to the backing file
doesn't need to dirty the attribute.

Also remove size update during WriteOut as writing dirty page would
naturaly grow the file if needed.

RELNOTES: relnotes is needed for the parent CL.
PiperOrigin-RevId: 232068978
Change-Id: I00ba54693a2c7adc06efa9e030faf8f2e8e7f188
2019-02-01 17:51:48 -08:00
Nicolas Lacasse 92e85623a0 Factor the subtargets method into a helper method with tests.
PiperOrigin-RevId: 232047515
Change-Id: I00f036816e320356219be7b2f2e6d5fe57583a60
2019-02-01 15:23:43 -08:00
Michael Pratt 88b4ce8cac Fix comment
PiperOrigin-RevId: 231861005
Change-Id: I134d4e20cc898d44844219db0a8aacda87e11ef0
2019-01-31 15:03:12 -08:00
Fabricio Voznika a497f5ed5f Invalidate COW mappings when file is truncated
This changed required making fsutil.HostMappable use
a backing file to ensure the correct FD would be used
for read/write operations.

RELNOTES: relnotes is needed for the parent CL.
PiperOrigin-RevId: 231836164
Change-Id: I8ae9639715529874ea7d80a65e2c711a5b4ce254
2019-01-31 12:54:00 -08:00
Michael Pratt 2a0c69b19f Remove license comments
Nothing reads them and they can simply get stale.

Generated with:
$ sed -i "s/licenses(\(.*\)).*/licenses(\1)/" **/BUILD

PiperOrigin-RevId: 231818945
Change-Id: Ibc3f9838546b7e94f13f217060d31f4ada9d4bf0
2019-01-31 11:12:53 -08:00
Zhaozhong Ni ae6e37df2a Convert TODO into FIXME.
PiperOrigin-RevId: 231301228
Change-Id: I3e18f3a12a35fb89a22a8c981188268d5887dc61
2019-01-28 15:34:18 -08:00
Nicolas Lacasse 09cf3b40a8 Fix data race in InodeSimpleAttributes.Unstable.
We were modifying InodeSimpleAttributes.Unstable.AccessTime without holding
the necessary lock.  Luckily for us, InodeSimpleAttributes already has a
NotifyAccess method that will do the update while holding the lock.

In addition, we were holding dfo.dir.mu.Lock while setting AccessTime, which
is unnecessary, so that lock has been removed.

PiperOrigin-RevId: 231278447
Change-Id: I81ed6d3dbc0b18e3f90c1df5e5a9c06132761769
2019-01-28 13:26:28 -08:00
Jamie Liu 1cedccf8e9 Drop the one-page limit for /proc/[pid]/{cmdline,environ}.
It never actually should have applied to environ (the relevant change in
Linux 4.2 is c2c0bb44620d "proc: fix PAGE_SIZE limit of
/proc/$PID/cmdline"), and we claim to be Linux 4.4 now anyway.

PiperOrigin-RevId: 231250661
Change-Id: I37f9c4280a533d1bcb3eebb7803373ac3c7b9f15
2019-01-28 11:00:23 -08:00
Fabricio Voznika 55e8eb775b Make cacheRemoteRevalidating detect changes to file size
When file size changes outside the sandbox, page cache was not
refreshing file size which is required for cacheRemoteRevalidating.
In fact, cacheRemoteRevalidating should be skipping the cache
completely since it's not really benefiting from it. The cache is
cache is already bypassed for unstable attributes (see
cachePolicy.cacheUAttrs). And althought the cache is called to
map pages, they will always miss the cache and map directly from
the host.

Created a HostMappable struct that maps directly to the host and
use it for files with cacheRemoteRevalidating.

Closes #124

PiperOrigin-RevId: 230998440
Change-Id: Ic5f632eabe33b47241e05e98c95e9b2090ae08fc
2019-01-25 17:23:07 -08:00
Adin Scannell b5088ba59c cleanup: extract the kernel from context
Change-Id: I94704a90beebb53164325e0cce1fcb9a0b97d65c
PiperOrigin-RevId: 230817308
2019-01-24 17:02:52 -08:00
Rahat Mahmood 8d7c10e908 Display /proc/net entries for all network configurations.
Most of the entries are stubbed out at the moment, but even those were
only displayed if IPv6 support was enabled. The entries should be
displayed with IPv4-support only, and with only loopback devices.

PiperOrigin-RevId: 229946441
Change-Id: I18afaa3af386322787f91bf9d168ab66c01d5a4c
2019-01-18 10:02:12 -08:00