Commit Graph

102 Commits

Author SHA1 Message Date
Ian Gudger 81ecd8b6ea Implement the MSG_CTRUNC msghdr flag for Unix sockets.
Updates google/gvisor#206

PiperOrigin-RevId: 245880573
Change-Id: Ifa715e98d47f64b8a32b04ae9378d6cd6bd4025e
2019-04-29 21:21:08 -07:00
Michael Pratt 4d52a55201 Change copyright notice to "The gVisor Authors"
Based on the guidelines at
https://opensource.google.com/docs/releasing/authors/.

1. $ rg -l "Google LLC" | xargs sed -i 's/Google LLC.*/The gVisor Authors./'
2. Manual fixup of "Google Inc" references.
3. Add AUTHORS file. Authors may request to be added to this file.
4. Point netstack AUTHORS to gVisor AUTHORS. Drop CONTRIBUTORS.

Fixes #209

PiperOrigin-RevId: 245823212
Change-Id: I64530b24ad021a7d683137459cafc510f5ee1de9
2019-04-29 14:26:23 -07:00
Nicolas Lacasse f4ce43e1f4 Allow and document bug ids in gVisor codebase.
PiperOrigin-RevId: 245818639
Change-Id: I03703ef0fb9b6675955637b9fe2776204c545789
2019-04-29 14:04:14 -07:00
Nicolas Lacasse 2df64cd6d2 createAt should return all errors from FindInode except ENOENT.
Previously, createAt was eating all errors from FindInode except for EACCES and
proceeding with the creation. This is incorrect, as FindInode can return many
other errors (like ENAMETOOLONG) that should stop creation.

This CL changes createAt to return all errors encountered except for ENOENT,
which we can ignore because we are about to create the thing.

PiperOrigin-RevId: 245773222
Change-Id: I1b317021de70f0550fb865506f6d8147d4aebc56
2019-04-29 10:30:24 -07:00
Wei Zhang 17ff6063a3 Bugfix: fix fstatat symbol link to dir
For a symbol link to some directory, eg.

`/tmp/symlink -> /tmp/dir`

`fstatat("/tmp/symlink")` should return symbol link data, but
`fstatat("/tmp/symlink/")` (symlink with trailing slash) should return
directory data it points following linux behaviour.

Currently fstatat() a symlink with trailing slash will get "not a
directory" error which is wrong.

Signed-off-by: Wei Zhang <zhangwei198900@gmail.com>
Change-Id: I63469b1fb89d083d1c1255d32d52864606fbd7e2
PiperOrigin-RevId: 244783916
2019-04-22 20:07:06 -07:00
Ian Gudger 358eb52a76 Add support for the MSG_TRUNC msghdr flag.
The MSG_TRUNC flag is set in the msghdr when a message is truncated.

Fixes google/gvisor#200

PiperOrigin-RevId: 244440486
Change-Id: I03c7d5e7f5935c0c6b8d69b012db1780ac5b8456
2019-04-19 16:17:01 -07:00
Michael Pratt c931c8e082 Format struct pollfd in poll(2)/ppoll(2)
I0410 15:40:38.854295    3776 x:0] [   1] poll_test E poll(0x2b00bfb5c020 [{FD: 0x3 anon_inode:[eventfd], Events: POLLOUT, REvents: ...}], 0x1, 0x1)
I0410 15:40:38.854348    3776 x:0] [   1] poll_test X poll(0x2b00bfb5c020 [{FD: 0x3 anon_inode:[eventfd], Events: POLLOUT|POLLERR|POLLHUP, REvents: POLLOUT}], 0x1, 0x1) = 0x1 (10.765?s)

PiperOrigin-RevId: 244269879
Change-Id: If07ba54a486fdeaaedfc0123769b78d1da862307
2019-04-18 15:24:07 -07:00
Michael Pratt 08d99c5fbe Convert poll/select to operate more directly on linux.PollFD
Current, doPoll copies the user struct pollfd array into a
[]syscalls.PollFD, which contains internal kdefs.FD and
waiter.EventMask types. While these are currently binary-compatible with
the Linux versions, we generally discourage copying directly to internal
types (someone may inadvertantly change kdefs.FD to uint64).

Instead, copy directly to a []linux.PollFD, which will certainly be
binary compatible. Most of syscalls/polling.go is included directly into
syscalls/linux/sys_poll.go, as it can then operate directly on
linux.PollFD. The additional syscalls.PollFD type is providing little
value.

I've also added explicit conversion functions for waiter.EventMask,
which creates the possibility of a different binary format.

PiperOrigin-RevId: 244042947
Change-Id: I24e5b642002a32b3afb95a9dcb80d4acd1288abf
2019-04-17 12:15:01 -07:00
Jamie Liu 4209edafb6 Use open fids when fstat()ing gofer files.
PiperOrigin-RevId: 243018347
Change-Id: I1e5b80607c1df0747482abea61db7fcf24536d37
2019-04-11 00:43:04 -07:00
Michael Pratt cc48969bb7 Internal change
PiperOrigin-RevId: 242978508
Change-Id: I0ea59ac5ba1dd499e87c53f2e24709371048679b
2019-04-10 18:00:18 -07:00
Kevin Krakauer f7aff0aaa4 Allow threads with CAP_SYS_RESOURCE to raise hard rlimits.
PiperOrigin-RevId: 242919489
Change-Id: Ie3267b3bcd8a54b54bc16a6556369a19e843376f
2019-04-10 12:36:45 -07:00
Li Qiang b3b140ea4f syscalls: sendfile: limit the count to MAX_RW_COUNT
From sendfile spec and also the linux kernel code, we should
limit the count arg to 'MAX_RW_COUNT'. This patch export
'MAX_RW_COUNT' in kernel pkg and use it in the implementation
of sendfile syscall.

Signed-off-by: Li Qiang <pangpei.lq@antfin.com>
Change-Id: I1086fec0685587116984555abd22b07ac233fbd2
PiperOrigin-RevId: 242745831
2019-04-09 14:57:05 -07:00
Jamie Liu 9471c01348 Export kernel.SignalInfoPriv.
Also add kernel.SignalInfoNoInfo, and use it in RLIMIT_FSIZE checks.

PiperOrigin-RevId: 242562428
Change-Id: I4887c0e1c8f5fddcabfe6d4281bf76d2f2eafe90
2019-04-08 16:32:11 -07:00
Andrei Vagin 88409e983c gvisor: Add support for the MS_NOEXEC mount option
https://github.com/google/gvisor/issues/145

PiperOrigin-RevId: 242044115
Change-Id: I8f140fe05e32ecd438b6be218e224e4b7fe05878
2019-04-04 17:43:53 -07:00
Adin Scannell 75c8ac38e0 BUILD: Add useful go_path target
Change-Id: Ibd6d8a1a63826af6e62a0f0669f8f0866c8091b4
PiperOrigin-RevId: 242037969
2019-04-04 17:05:38 -07:00
Ian Lewis 77f01ee3c7 Add syscall annotations for unimplemented syscalls
Added syscall annotations for unimplemented syscalls for later generation into
reference docs. Annotations are of the form:
@Syscall(<name>, <key:value>, ...)

Supported args and values are:

- arg: A syscall option. This entry only applies to the syscall when given this
       option.
- support: Indicates support level
  - UNIMPLEMENTED: Unimplemented (implies returns:ENOSYS)
  - PARTIAL: Partial support. Details should be provided in note.
  - FULL: Full support
- returns: Indicates a known return value. Values are
           syscall errors. This is treated as a string so you can use something
           like "returns:EPERM or ENOSYS".
- issue: A Github issue number.
- note: A note

Example:
// @Syscall(mmap, arg:MAP_PRIVATE, support:FULL, note:Private memory fully supported)
// @Syscall(mmap, arg:MAP_SHARED, support:UNIMPLEMENTED, issue:123, note:Shared memory not supported)
// @Syscall(setxattr, returns:ENOTSUP, note:Requires file system support)

Annotations should be placed as close to their implementation as possible
(preferrably as part of a supporting function's Godoc) and should be updated as
syscall support changes.

PiperOrigin-RevId: 241697482
Change-Id: I7a846135db124e1271dc5057d788cba82ca312d4
2019-04-03 03:10:23 -07:00
Jamie Liu 26e8d9981f Use kernel.Task.CopyScratchBuffer in syscalls/linux where possible.
PiperOrigin-RevId: 241072126
Change-Id: Ib4d9f58f550732ac4c5153d3cf159a5b1a9749da
2019-03-29 16:25:33 -07:00
Nicolas Lacasse 99195b0e16 Setting timestamps should trigger an inotify event.
PiperOrigin-RevId: 240850187
Change-Id: I1458581b771a1031e47bba439e480829794927b8
2019-03-28 14:15:23 -07:00
Rahat Mahmood 06ec97a3f8 Implement memfd_create.
Memfds are simply anonymous tmpfs files with no associated
mounts. Also implementing file seals, which Linux only implements for
memfds at the moment.

PiperOrigin-RevId: 240450031
Change-Id: I31de78b950101ae8d7a13d0e93fe52d98ea06f2f
2019-03-26 16:16:57 -07:00
Nicolas Lacasse b81bfd6013 lstat should resolve the final path component if it ends in a slash.
PiperOrigin-RevId: 239896221
Change-Id: I0949981fe50c57131c5631cdeb10b225648575c0
2019-03-22 17:38:13 -07:00
Ian Gudger ba828233b9 Clear msghdr flags on successful recvmsg.
.net sets these flags to -1 and then uses their result, especting it to be
zero.

Does not set actual flags (e.g. MSG_TRUNC), but setting to zero is more correct
than what we did before.

PiperOrigin-RevId: 239657951
Change-Id: I89c5f84bc9b94a2cd8ff84e8ecfea09e01142030
2019-03-21 13:19:11 -07:00
Jamie Liu 8f4634997b Decouple filemem from platform and move it to pgalloc.MemoryFile.
This is in preparation for improved page cache reclaim, which requires
greater integration between the page cache and page allocator.

PiperOrigin-RevId: 238444706
Change-Id: Id24141b3678d96c7d7dc24baddd9be555bffafe4
2019-03-14 08:12:48 -07:00
Nicolas Lacasse 2512cc5617 Allow filesystem.Mount to take an optional interface argument.
PiperOrigin-RevId: 238360231
Change-Id: I5eaf8d26f8892f77d71c7fbd6c5225ef471cedf1
2019-03-13 19:24:03 -07:00
Fabricio Voznika 0b76887147 Priority-inheritance futex implementation
It is Implemented without the priority inheritance part given
that gVisor defers scheduling decisions to Go runtime and doesn't
have control over it.

PiperOrigin-RevId: 236989545
Change-Id: I714c8ca0798743ecf3167b14ffeb5cd834302560
2019-03-05 23:40:18 -08:00
Fabricio Voznika 3dbd4a16f8 Add semctl(GETPID) syscall
Also added unimplemented notification for semctl(2)
commands.

PiperOrigin-RevId: 236340672
Change-Id: I0795e3bd2e6d41d7936fabb731884df426a42478
2019-03-01 10:57:02 -08:00
Fabricio Voznika 10426e0f31 Handle invalid offset in sendfile(2)
PiperOrigin-RevId: 235578698
Change-Id: I608ff5e25eac97f6e1bda058511c1f82b0e3b736
2019-02-25 12:17:46 -08:00
Nicolas Lacasse e884168e1e Encode stat to bytes manually, instead of calling CopyObjectOut.
CopyObjectOut grows its destination byte slice incrementally, causing
many small slice allocations on the heap. This leads to increased GC and
noticeably slower stat calls.

PiperOrigin-RevId: 233140904
Change-Id: Ieb90295dd8dd45b3e56506fef9d7f86c92e97d97
2019-02-08 15:48:23 -08:00
Fabricio Voznika 9ef3427ac1 Implement semctl(2) SETALL and GETALL
PiperOrigin-RevId: 232914984
Change-Id: Id2643d7ad8e986ca9be76d860788a71db2674cda
2019-02-07 11:41:44 -08:00
Michael Pratt 2a0c69b19f Remove license comments
Nothing reads them and they can simply get stale.

Generated with:
$ sed -i "s/licenses(\(.*\)).*/licenses(\1)/" **/BUILD

PiperOrigin-RevId: 231818945
Change-Id: Ibc3f9838546b7e94f13f217060d31f4ada9d4bf0
2019-01-31 11:12:53 -08:00
Zach Koopmans 7f8de3bf92 Fixing select call to not enforce RLIMIT_NOFILE.
Removing check to RLIMIT_NOFILE in select call.
Adding unit test to select suite to document behavior.
Moving setrlimit class from mlock to a util file for reuse.
Fixing flaky test based on comments from Jamie.

PiperOrigin-RevId: 228726131
Change-Id: Ie9dbe970bbf835ba2cca6e17eec7c2ee6fadf459
2019-01-10 09:44:45 -08:00
Brian Geffon dd761c170c Allow MSG_OOB and MSG_DONTROUTE to be no-ops on recvmsg(2).
PiperOrigin-RevId: 228428223
Change-Id: I433ba5ffc15ea4c2706ec944901b8269b1f364f8
2019-01-08 17:13:17 -08:00
Brian Geffon 3676b7ff1c Improve loader related error messages returned to users.
PiperOrigin-RevId: 228382827
Change-Id: Ica1d30e0df826bdd77f180a5092b2b735ea5c804
2019-01-08 12:58:08 -08:00
Jamie Liu 9a442fa4b5 Automated rollback of changelist 226224230
PiperOrigin-RevId: 226493053
Change-Id: Ia98d1cb6dd0682049e4d907ef69619831de5c34a
2018-12-21 08:23:34 -08:00
Ian Gudger f6274804e1 Make read and write respect SO_RCVTIMEO and SO_SNDTIMEO
PiperOrigin-RevId: 226387521
Change-Id: I0579ab262320fde6c72d2994dd38437f01a99ea5
2018-12-20 13:48:52 -08:00
Googler 86c9bd2547 Automated rollback of changelist 225861605
PiperOrigin-RevId: 226224230
Change-Id: Id24c7d3733722fd41d5fe74ef64e0ce8c68f0b12
2018-12-19 13:30:08 -08:00
Zach Koopmans ff7178a4d1 Implement pwritev2.
Implement pwritev2 and associated unit tests.
Clean up preadv2 unit tests.
Tag RWF_ flags in both preadv2 and pwritev2 with associated bug tickets.

PiperOrigin-RevId: 226222119
Change-Id: Ieb22672418812894ba114bbc88e67f1dd50de620
2018-12-19 13:16:06 -08:00
Jamie Liu 2421006426 Implement mlock(), kind of.
Currently mlock() and friends do nothing whatsoever. However, mlocking
is directly application-visible in a number of ways; for example,
madvise(MADV_DONTNEED) and msync(MS_INVALIDATE) both fail on mlocked
regions. We handle this inconsistently: MADV_DONTNEED is too important
to not work, but MS_INVALIDATE is rejected.

Change MM to track mlocked regions in a manner consistent with Linux.
It still will not actually pin pages into host physical memory, but:

- mlock() will now cause sentry memory management to precommit mlocked
pages.

- MADV_DONTNEED and MS_INVALIDATE will interact with mlocked pages as
described above.

PiperOrigin-RevId: 225861605
Change-Id: Iee187204979ac9a4d15d0e037c152c0902c8d0ee
2018-12-17 11:38:59 -08:00
Ian Gudger e1dcf92ec5 Implement SO_SNDTIMEO
PiperOrigin-RevId: 225620490
Change-Id: Ia726107b3f58093a5f881634f90b071b32d2c269
2018-12-14 16:15:06 -08:00
Ian Gudger 5d87d8865f Implement MSG_WAITALL
MSG_WAITALL requests that recv family calls do not perform short reads. It only
has an effect for SOCK_STREAM sockets, other types ignore it.

PiperOrigin-RevId: 224918540
Change-Id: Id97fbf972f1f7cbd4e08eec0138f8cbdf1c94fe7
2018-12-10 17:56:34 -08:00
Rahat Mahmood fc29770251 Add type safety to shm ids and keys.
PiperOrigin-RevId: 224864380
Change-Id: I49542279ad56bf15ba462d3de1ef2b157b31830a
2018-12-10 12:48:02 -08:00
Michael Pratt 99d5958693 Validate FS_BASE in Task.Clone
arch_prctl already verified that the new FS_BASE was canonical, but
Task.Clone did not. Centralize these checks in the arch packages.

Failure to validate could cause an error in PTRACE_SET_REGS when we try
to switch to the app.

PiperOrigin-RevId: 224862398
Change-Id: Iefe63b3f9aa6c4810326b8936e501be3ec407f14
2018-12-10 12:37:16 -08:00
Zach Koopmans 4d8c7ae869 Fixing O_TRUNC behavior to match Linux.
PiperOrigin-RevId: 224351139
Change-Id: I9453bd75e5a8d38db406bb47fdc01038ac60922e
2018-12-06 09:26:49 -08:00
Zach Koopmans 06131fe749 Check for CAP_SYS_RESOURCE in prctl(PR_SET_MM, ...)
If sys_prctl is called with PR_SET_MM without CAP_SYS_RESOURCE,
the syscall should return failure with errno set to EPERM.
See: http://man7.org/linux/man-pages/man2/prctl.2.html
PiperOrigin-RevId: 224182874
Change-Id: I630d1dd44af8b444dd16e8e58a0764a0cf1ad9a3
2018-12-05 10:53:51 -08:00
Brian Geffon 2cab0e82ad Linkat(2) should sanity check flags.
PiperOrigin-RevId: 224047765
Change-Id: I6f3c75b33c32bf8f8910ea3fab35406d7d672d87
2018-12-04 14:34:19 -08:00
Brian Geffon 82719be42e Max link traversals should be for an entire path.
The number of symbolic links that are allowed to be followed
are for a full path and not just a chain of symbolic links.

PiperOrigin-RevId: 224047321
Change-Id: I5e3c4caf66a93c17eeddcc7f046d1e8bb9434a40
2018-12-04 14:32:03 -08:00
Rahat Mahmood 806e346491 Fix mempolicy_test on bazel.
Bazel runs multiple test cases on the same thread. Some of the test
cases rely on the test thread starting with the default memory policy,
while other tests modify the test thread's memory policy. This
obviously breaks when the test framework doesn't run each test case on
a new thread.

Also fixing an incompatibility where set_mempolicy(2) was prevented
from specifying an empty nodemask, which is allowed for some modes.

PiperOrigin-RevId: 224038957
Change-Id: Ibf780766f2706ebc9b129dbc8cf1b85c2a275074
2018-12-04 13:45:58 -08:00
Zach Koopmans b3b60ea29a Implementation of preadv2 for Linux 4.4 support
Implement RWF_HIPRI (4.6) silently passes the read call.
Implement -1 offset calls readv.

PiperOrigin-RevId: 222840324
Change-Id: If9ddc1e8d086e1a632bdf5e00bae08205f95b6b0
2018-11-26 09:50:47 -08:00
Fabricio Voznika 8b314b0bf4 Fix recursive read lock taken on TaskSet
SyncSyscallFiltersToThreadGroup and Task.TheadID() both acquired TaskSet RWLock
in R mode and could deadlock if a writer comes in between.

PiperOrigin-RevId: 222313551
Change-Id: I4221057d8d46fec544cbfa55765c9a284fe7ebfa
2018-11-20 15:07:56 -08:00
Adin Scannell bb9a2bb62e Update futex to use usermem abstractions.
This eliminates the indirection that existed in task_futex.

PiperOrigin-RevId: 221832498
Change-Id: Ifb4c926d493913aa6694e193deae91616a29f042
2018-11-20 14:02:07 -08:00
Andrei Vagin 2ef122da35 Implement sync_file_range()
sync_file_range - sync a file segment with disk

In Linux, sync_file_range() accepts three flags:

       SYNC_FILE_RANGE_WAIT_BEFORE
              Wait  upon  write-out  of  all pages in the specified range that
              have already been submitted to the device driver  for  write-out
              before performing any write.

       SYNC_FILE_RANGE_WRITE
              Initiate  write-out  of  all  dirty pages in the specified range
              which are not presently submitted  write-out.   Note  that  even
              this  may  block if you attempt to write more than request queue
              size.

       SYNC_FILE_RANGE_WAIT_AFTER
              Wait upon write-out of all pages in the range  after  performing
              any write.

In this implementation:

SYNC_FILE_RANGE_WAIT_BEFORE without SYNC_FILE_RANGE_WAIT_AFTER isn't
supported right now.

SYNC_FILE_RANGE_WRITE is skipped. It should initiate write-out of  all
dirty pages, but it doesn't wait, so it should be safe to do nothing
while nobody uses SYNC_FILE_RANGE_WAIT_BEFORE.

SYNC_FILE_RANGE_WAIT_AFTER is equal to fdatasync(). In Linux,
sync_file_range() doesn't writes out the  file's  meta-data, but
fdatasync() does if a file size is changed.

PiperOrigin-RevId: 220730840
Change-Id: Iae5dfb23c2c916967d67cf1a1ad32f25eb3f6286
2018-11-08 17:39:51 -08:00