Commit Graph

825 Commits

Author SHA1 Message Date
Andrei Vagin f7dbddaf77 platform/kvm: calll sigtimedwait with zero timeout
sigtimedwait is used to check pending signals and
it should not block.

PiperOrigin-RevId: 277777269
2019-10-31 12:29:04 -07:00
lubinszARM ca933329fa support using KVM_MEM_READONLY for arm64 regions
On Arm platform, "setMemoryRegion" has extra permission checks.
In virt/kvm/arm/mmu.c: kvm_arch_prepare_memory_region()
      ....
      if (writable && !(vma->vm_flags & VM_WRITE)) {
             ret = -EPERM;
             break;
       }
        ....
So, for Arm platform, the "flags" for kvm_memory_region is required.
And on x86 platform, the "flags" can be always set as '0'.

Signed-off-by: Bin Lu <bin.lu@arm.com>
COPYBARA_INTEGRATE_REVIEW=https://github.com/google/gvisor/pull/810 from lubinszARM:pr_setregion 8c99b19cfb0c859c6630a1cfff951db65fcf87ac
PiperOrigin-RevId: 277602603
2019-10-30 15:53:31 -07:00
Ian Gudger a2c51efe36 Add endpoint tracking to the stack.
In the future this will replace DanglingEndpoints. DanglingEndpoints must be
kept for now due to issues with save/restore.

This is arguably a cleaner design and allows the stack to know which transport
endpoints might still be using its link endpoints.

Updates #837

PiperOrigin-RevId: 277386633
2019-10-29 16:14:51 -07:00
Dean Deng d7f5e823e2 Fix grammar in comment.
Missing "for".

PiperOrigin-RevId: 277358513
2019-10-29 14:05:04 -07:00
Dean Deng 38330e9377 Update symlink traversal limit when resolving interpreter path.
When execveat is called on an interpreter script, the symlink count for
resolving the script path should be separate from the count for resolving the
the corresponding interpreter. An ELOOP error should not occur if we do not hit
the symlink limit along any individual path, even if the total number of
symlinks encountered exceeds the limit.

Closes #574

PiperOrigin-RevId: 277358474
2019-10-29 13:59:28 -07:00
Michael Pratt c0b8fd4b6a Update build tags to allow Go 1.14
Currently there are no ABI changes. We should check again closer to release.

PiperOrigin-RevId: 277349744
2019-10-29 13:18:16 -07:00
Dean Deng 2e00771d5a Refactor logic for loadExecutable.
Separate the handling of filenames and *fs.File objects in a more explicit way
for the sake of clarity.

PiperOrigin-RevId: 277344203
2019-10-29 12:51:29 -07:00
Dean Deng 29273b0384 Disallow execveat on interpreter scripts with fd opened with O_CLOEXEC.
When an interpreter script is opened with O_CLOEXEC and the resulting fd is
passed into execveat, an ENOENT error should occur (the script would otherwise
be inaccessible to the interpreter). This matches the actual behavior of
Linux's execveat.

PiperOrigin-RevId: 277306680
2019-10-29 10:04:39 -07:00
Michael Pratt 198f1cddb8 Update comment
FDTable.GetFile doesn't exist.

PiperOrigin-RevId: 277089842
2019-10-28 10:20:23 -07:00
Dean Deng 1c480abc39 Aggregate arguments for loading executables into a single struct.
This change simplifies the function signatures of functions related to loading
executables, such as LoadTaskImage, Load, loadBinary.

PiperOrigin-RevId: 276821187
2019-10-25 22:44:19 -07:00
Ian Gudger 8f029b3f82 Convert DelayOption to the newer/faster SockOpt int type.
DelayOption is set on all new endpoints in gVisor.

PiperOrigin-RevId: 276746791
2019-10-25 13:15:34 -07:00
Andrei Vagin fd598912be platform/ptrace: use tgkill instead of kill
The syscall filters don't allow kill, just tgkill.

PiperOrigin-RevId: 276718421
2019-10-25 11:19:20 -07:00
Dean Deng d9fd536340 Handle AT_SYMLINK_NOFOLLOW flag for execveat.
PiperOrigin-RevId: 276441249
2019-10-24 01:45:25 -07:00
Dean Deng 7ca50236c4 Handle AT_EMPTY_PATH flag in execveat.
PiperOrigin-RevId: 276419967
2019-10-23 22:23:05 -07:00
gVisor bot 6d4d9564e3 Merge pull request #641 from tanjianfeng:master
PiperOrigin-RevId: 276380008
2019-10-23 16:55:15 -07:00
DarcySail fbe6b50d56 Keep minimal available fd to accelerate fd allocation
Use fd.next to store the iteration start position, which can be used to accelerate allocating new FDs.
And adding the corresponding gtest benchmark to measure performance.
@tanjianfeng

COPYBARA_INTEGRATE_REVIEW=https://github.com/google/gvisor/pull/758 from DarcySail:master 96685ec7886dfe1a64988406831d3bc002b438cc
PiperOrigin-RevId: 276351250
2019-10-23 14:27:53 -07:00
Ian Lewis ebe8001724 Update const names to be Go style.
PiperOrigin-RevId: 276165962
2019-10-22 16:16:41 -07:00
Andrei Vagin e63ff6d923 platform/ptrace: exit without panic if a stub process has been killed by SIGKILL
SIGKILL can be sent only by an user or OOM-killer. In both cases, we don't
need to panic.

PiperOrigin-RevId: 276150120
2019-10-22 14:57:23 -07:00
Nicolas Lacasse 070a8c2d4c Remove old TODO.
PiperOrigin-RevId: 275956240
2019-10-21 17:04:32 -07:00
Dean Deng 0b569b7cae Add basic implementation of execveat syscall and associated tests.
Allow file descriptors of directories as well as AT_FDCWD.

PiperOrigin-RevId: 275929668
2019-10-21 14:55:18 -07:00
Kevin Krakauer 12235d533a AF_PACKET support for netstack (aka epsocket).
Like (AF_INET, SOCK_RAW) sockets, AF_PACKET sockets require CAP_NET_RAW. With
runsc, you'll need to pass `--net-raw=true` to enable them.

Binding isn't supported yet.

PiperOrigin-RevId: 275909366
2019-10-21 13:23:18 -07:00
Kevin Krakauer 652f7b1d0f Add support for pipes in VFS2.
PiperOrigin-RevId: 275650307
2019-10-19 11:49:38 -07:00
Kevin Krakauer dfdbdf14fa Refactor pipe to support VFS2.
* Pulls common functionality (IO and locking on open) into pipe_util.go.
* Adds pipe/vfs.go, which implements a subset of vfs.FileDescriptionImpl.

A subsequent change will add support for pipes in memfs.

PiperOrigin-RevId: 275322385
2019-10-17 13:11:07 -07:00
Kevin Krakauer 2a82d5ad68 Reorder BUILD license and load functions in gvisor.
PiperOrigin-RevId: 275139066
2019-10-16 16:40:30 -07:00
Michael Pratt 8fe48dcb1e Add sublevel to kernel version
Standard Linux kernel versions are VERSION.PATCHLEVEL.SUBLEVEL. e.g., 4.4.0,
even when the sublevel is 0. Match this standard.

PiperOrigin-RevId: 275125715
2019-10-16 15:22:42 -07:00
Fabricio Voznika 9fb562234e Fix problem with open FD when copy up is triggered in overlayfs
Linux kernel before 4.19 doesn't implement a feature that updates
open FD after a file is open for write (and is copied to the upper
layer). Already open FD will continue to read the old file content
until they are reopened. This is especially problematic for gVisor
because it caches open files.

Flag was added to force readonly files to be reopenned when the
same file is open for write. This is only needed if using kernels
prior to 4.19.

Closes #1006

It's difficult to really test this because we never run on tests
on older kernels. I'm adding a test in GKE which uses kernels
with the overlayfs problem for 1.14 and lower.

PiperOrigin-RevId: 275115289
2019-10-16 15:06:24 -07:00
Nicolas Lacasse fd4e436002 Support O_SYNC and O_DSYNC flags.
When any of these flags are set, all writes will trigger a subsequent fsync
call. This behavior already existed for "write-through" mounts.

O_DIRECT is treated as an alias for O_SYNC. Better support coming soon.

PiperOrigin-RevId: 275114392
2019-10-16 15:01:23 -07:00
Michael Pratt bbdcf44ebb Fix syscall changes lost in rebase
These syscalls were changed in the amd64 file around the time the arm64 PR was
sent out, so their changes got lost.

Updates #63

PiperOrigin-RevId: 275114194
2019-10-16 14:56:29 -07:00
gVisor bot d22f0534c0 Merge pull request #736 from tanjianfeng:fix-unix
PiperOrigin-RevId: 275114157
2019-10-16 14:41:43 -07:00
Jamie Liu 0457a4c4cb Minor vfs.FileDescriptionImpl fixes.
- Pass context.Context to OnClose().

- Pass memmap.MMapOpts to ConfigureMMap() by pointer so that implementations
  can actually mutate it as required.

PiperOrigin-RevId: 274934967
2019-10-15 18:40:45 -07:00
Jianfeng Tan d277bfba27 epsocket: support /proc/net/snmp
Netstack has its own stats, we use this to fill /proc/net/snmp.

Note that some metrics are not recorded in Netstack, which will be shown
as 0 in the proc file.

Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com>
Change-Id: Ie0089184507d16f49bc0057b4b0482094417ebe1
2019-10-15 16:38:41 +00:00
Jianfeng Tan aee2c93366 netstack: add counters for tcp CurrEstab and EstabResets
Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com>
2019-10-15 16:38:40 +00:00
Jianfeng Tan dd7d1f825d hostinet: support /proc/net/snmp and /proc/net/dev
For hostinet, we inherit the data from host procfs. To to that, we
cache the fds for these files for later reads.

Fixes #506

Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com>
Change-Id: I2f81215477455b9c59acf67e33f5b9af28ee0165
2019-10-15 16:38:40 +00:00
Jianfeng Tan b94505ecc0 support /proc/net/route
This proc file reports routing information to applications inside the
container.

Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com>
Change-Id: I498e47f8c4c185419befbb42d849d0b099ec71f3
2019-10-15 16:38:40 +00:00
Jianfeng Tan e3d4a67739 support /proc/net/snmp
This proc file contains statistics according to [1].

[1] https://tools.ietf.org/html/rfc2013

Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com>
Change-Id: I9662132085edd8a7783d356ce4237d7ac0800d94
2019-10-15 16:38:40 +00:00
gVisor bot bfa0bb24dd Internal change.
PiperOrigin-RevId: 274700093
2019-10-14 17:46:52 -07:00
Ian Lewis 470997ca99 Allow for zero byte iovec with MSG_PEEK | MSG_TRUNC in recvmsg.
This allows for peeking at the length of the next message on a netlink socket
without pulling it off the socket's buffer/queue, allowing tools like 'ip' to
work.

This CL also fixes an issue where dump_done_errno was not included in the
NLMSG_DONE messages payload.

Issue #769

PiperOrigin-RevId: 274068637
2019-10-10 16:55:48 -07:00
Bhasker Hariharan c7e901f47a Fix bugs in fragment handling.
Strengthen the header.IPv4.IsValid check to correctly check
for IHL/TotalLength fields. Also add a check to make sure
fragmentOffsets + size of the fragment do not cause a wrap
around for the end of the fragment.

PiperOrigin-RevId: 274049313
2019-10-10 15:14:55 -07:00
Adin Scannell f8b1859319 Fix signalfd polling.
The signalfd descriptors otherwise always show as available. This can lead
programs to spin, assuming they are looking to see what signals are pending.

Updates #139

PiperOrigin-RevId: 274017890
2019-10-10 12:51:22 -07:00
gVisor bot bf870c1a42 Internal change.
PiperOrigin-RevId: 273861936
2019-10-09 17:56:05 -07:00
gVisor bot 7a2d5b2fa7 Merge pull request #811 from lubinszARM:pr_testutil
PiperOrigin-RevId: 273781641
2019-10-09 12:00:53 -07:00
Ian Gudger 7c1587e340 Implement IP_TTL.
Also change the default TTL to 64 to match Linux.

PiperOrigin-RevId: 273430341
2019-10-07 19:29:51 -07:00
Kevin Krakauer 1de0cf3563 Remove unnecessary context parameter for new pipes.
PiperOrigin-RevId: 273421634
2019-10-07 18:16:14 -07:00
Kevin Krakauer 6a98237949 Rename epsocket to netstack.
PiperOrigin-RevId: 273365058
2019-10-07 13:57:59 -07:00
gVisor bot 8fce24d33a Merge pull request #753 from lubinszARM:pr_syscall_linux
PiperOrigin-RevId: 273364848
2019-10-07 13:52:19 -07:00
Nicolas Lacasse f24c3188b5 Add sanity check that overlayCreate is called with an overlay parent inode.
PiperOrigin-RevId: 272987037
2019-10-04 17:03:50 -07:00
Kevin Krakauer 7ef1c44a7f Change linux.FileMode from uint to uint16, and update VFS to use FileMode.
In Linux (include/linux/types.h), mode_t is an unsigned short.

PiperOrigin-RevId: 272956350
2019-10-04 14:20:32 -07:00
Andrei Vagin db218fdfcf Don't report partialResult errors from sendfile
The input file descriptor is always a regular file, so sendfile can't lose any
data if it will not be able to write them to the output file descriptor.

Reported-by: syzbot+22d22330a35fa1c02155@syzkaller.appspotmail.com
PiperOrigin-RevId: 272730357
2019-10-03 13:38:30 -07:00
gVisor bot cde7711837 Merge pull request #865 from tanjianfeng:fix-829
PiperOrigin-RevId: 272522508
2019-10-02 14:51:04 -07:00
Andrei Vagin 2016cc283c fs/proc: report PID-s from a pid namespace of the proc mount
Right now, we can find more than one process with the 1 PID in /proc.

$ for i in `seq 10`; do
> unshare -fp sleep 1000 &
> done

$ ls /proc
1  1  1  1  12  18  24  29  6            loadavg  net   sys          version
1  1  1  1  16  20  26  32  cpuinfo      meminfo  self  thread-self
1  1  1  1  17  21  28  36  filesystems  mounts   stat  uptime

PiperOrigin-RevId: 272506593
2019-10-02 13:29:42 -07:00