Commit Graph

355 Commits

Author SHA1 Message Date
Nicolas Lacasse bb00438f36 Make masterInodeOperations.Truncate take a pointer receiver.
Otherwise a copy happens, which triggers a data race when reading
masterInodeOperations.SimpleFileOperations.uattr, which must be accessed with a
lock held.

PiperOrigin-RevId: 286464473
2019-12-19 14:34:53 -08:00
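
The receiver issue fixed in bb00438f36 can be illustrated with a minimal sketch. The type `fileAttrs` and its methods are hypothetical stand-ins, not gVisor's actual types. They only show that a value receiver copies the struct (and reads its guarded fields) without the lock, while a pointer receiver does not.

```go
package main

import (
	"fmt"
	"sync"
)

// fileAttrs is a hypothetical stand-in for a struct whose fields must only
// be touched with mu held.
type fileAttrs struct {
	mu   sync.Mutex
	size int64 // protected by mu
}

// truncateByValue has a value receiver, so every call copies the whole
// struct (mutex and size included) without taking the lock. The write below
// lands on the copy and is lost.
func (a fileAttrs) truncateByValue(n int64) {
	a.size = n
}

// Truncate has a pointer receiver: no copy is made and the caller's struct
// is updated under the lock.
func (a *fileAttrs) Truncate(n int64) {
	a.mu.Lock()
	defer a.mu.Unlock()
	a.size = n
}

// Size returns the current size under the lock.
func (a *fileAttrs) Size() int64 {
	a.mu.Lock()
	defer a.mu.Unlock()
	return a.size
}

func main() {
	a := &fileAttrs{size: 4096}
	a.truncateByValue(0)
	fmt.Println("after value receiver:", a.Size())
	a.Truncate(0)
	fmt.Println("after pointer receiver:", a.Size())
}
```

In this sketch the implicit copy shows up as a lost update. In concurrent code the same copy is an unsynchronized read of the guarded fields, which is what the race detector reports.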
Michael Pratt 334a513f11 Add Mems_allowed to /proc/PID/status
PiperOrigin-RevId: 286248378
2019-12-18 13:16:28 -08:00
gVisor bot 3ab90ecf25 Merge pull request #1394 from zhuangel:bindlock
PiperOrigin-RevId: 286051631
2019-12-17 13:53:16 -08:00
gVisor bot 2e2545b458 Merge pull request #1392 from zhuangel:bindleak
PiperOrigin-RevId: 285874181
2019-12-16 16:21:17 -08:00
Dean Deng e6f4124afd Implement checks for get/setxattr at the syscall layer.
Add checks for input arguments, file type, permissions, etc. that match
the Linux implementation. A call to get/setxattr that passes all the
checks will still currently return EOPNOTSUPP. Actual support will be
added in following commits.

Only allow user.* extended attributes for the time being.

PiperOrigin-RevId: 285835159
2019-12-16 13:20:07 -08:00
Yong He bd5c7bf58d Fix deadlock in overlay bind
Copying up the parent when binding a UDS on overlayfs has been supported since
commit 02ab1f187c.
However, the use of copyUp in overlayBind can cause the sentry to get stuck
because of a deadlock on renameMu.

1 [Process A] Invokes a Unix socket bind operation
  renameMu is held in fs.(*Dirent).genericCreate by process A
2 [Process B] Invokes a read syscall on /proc/task/mounts
  waiting on Lock of renameMu in fs.(*MountNamespace).FindMount
3 [Process A] Continues the Unix socket bind operation
  waiting on RLock of renameMu in fs.copyUp

The root cause is a recursive read lock of renameMu in the bind call trace:
if a write lock is requested between the two read locks, a deadlock
occurs.

Fixes #1397
2019-12-16 18:37:35 +08:00
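
The lock ordering described in bd5c7bf58d can be reproduced with a plain `sync.RWMutex`. The sketch below is not gVisor code. It uses `TryRLock` (available since Go 1.18) to show, without actually hanging, that a second read lock cannot be acquired once a writer is waiting. With a blocking `RLock` in its place, the goroutine would deadlock against the writer.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// nestedReadWouldBlock reports whether a nested read lock is refused while
// a writer is queued behind the first read lock.
func nestedReadWouldBlock() bool {
	var renameMu sync.RWMutex

	// "Process A" takes the first read lock.
	renameMu.RLock()

	// "Process B" asks for the write lock and blocks behind A.
	writerDone := make(chan struct{})
	go func() {
		renameMu.Lock()
		renameMu.Unlock()
		close(writerDone)
	}()

	// "Process A" attempts the nested read lock. Once the writer is
	// queued, TryRLock fails; a blocking RLock would wait forever here.
	blocked := false
	for i := 0; i < 1000 && !blocked; i++ {
		if renameMu.TryRLock() {
			// The writer has not queued yet; undo and retry.
			renameMu.RUnlock()
			time.Sleep(time.Millisecond)
		} else {
			blocked = true
		}
	}

	// Release the first read lock so the writer can finish.
	renameMu.RUnlock()
	<-writerDone
	return blocked
}

func main() {
	fmt.Println("nested RLock would block:", nestedReadWouldBlock())
}
```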
Yong He 8a46e83111 Fix UDS bind cause fd leak in gofer
After the finalizer optimization in commit 76039f8959,
clientFile needs to be closed before the finalizer releases it.
The clientFile is not closed if it is created via
gofer.(*inodeOperations).Bind, which leaks an fd that is held
by the gofer process.

Fixes #1396

Signed-off-by: Yong He <chenglang.hy@antfin.com>
Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com>
2019-12-16 18:28:10 +08:00
Fabricio Voznika 898dcc2f83 Redirect TODOs to gvisor.dev
PiperOrigin-RevId: 284606233
2019-12-09 12:11:28 -08:00
Nicolas Lacasse 663fe840f7 Implement TTY field in control.Processes().
Threadgroups already know their TTY (if they have one), which now contains the
TTY Index, and is returned in the Processes() call.

PiperOrigin-RevId: 284263850
2019-12-06 14:34:13 -08:00
Zach Koopmans 0a32c02357 Create correct file for /proc/[pid]/task/[tid]/io
PiperOrigin-RevId: 284038840
2019-12-05 13:24:05 -08:00
Zach Koopmans 0354071539 Fix printing /proc/[pid]/io for /proc/[pid]/task/[tid]/io.
PiperOrigin-RevId: 283630669
2019-12-03 15:07:49 -08:00
Ian Lewis 20279c305e Allow open(O_TRUNC) and (f)truncate for proc files.
This allows writable proc and device files to be opened with O_CREAT|O_TRUNC.
This is encountered most frequently when interacting with proc or device files
via the command line.
e.g. $ echo 8192 1048576 4194304 > /proc/sys/net/ipv4/tcp_rmem

Also adds a test to test the behavior of open(O_TRUNC), truncate, and ftruncate
on named pipes.

Fixes #1116

PiperOrigin-RevId: 282677425
2019-11-26 18:21:09 -08:00
gVisor bot 0416c247ec Merge pull request #1176 from xiaobo55x:runsc_boot
PiperOrigin-RevId: 282382564
2019-11-25 11:01:22 -08:00
Adin Scannell c0f89eba6e Import and structure cleanup.
PiperOrigin-RevId: 281795269
2019-11-21 11:41:30 -08:00
Nicolas Lacasse 012102eefd Pass OpenTruncate to gofer in Open call when opening file with O_TRUNC.
Note that the Sentry still calls Truncate() on the file before calling Open.

A new p9 version check was added to ensure that the p9 server can handle the
OpenTruncate flag. If not, then the flag is stripped before sending.

PiperOrigin-RevId: 281609112
2019-11-20 15:07:16 -08:00
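
The version gate described in 012102eefd might look roughly like the sketch below. The flag bit, the minimum version number, and the function name are hypothetical values chosen for illustration and are not the real p9 constants.

```go
package main

import "fmt"

const (
	// openTruncate is a hypothetical bit standing in for the p9
	// OpenTruncate flag.
	openTruncate uint32 = 1 << 9
	// minVersionOpenTruncate is a hypothetical protocol version from
	// which the server is assumed to understand the flag.
	minVersionOpenTruncate uint32 = 7
)

// flagsForServer strips openTruncate when the negotiated server version is
// too old to understand it, and leaves all other bits untouched.
func flagsForServer(flags, serverVersion uint32) uint32 {
	if serverVersion < minVersionOpenTruncate {
		flags &^= openTruncate
	}
	return flags
}

func main() {
	flags := openTruncate | 1
	fmt.Printf("old server: %#x\n", flagsForServer(flags, 6))
	fmt.Printf("new server: %#x\n", flagsForServer(flags, 7))
}
```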
gVisor bot 235a96cab1 Merge pull request #1177 from xiaobo55x:fs_host
PiperOrigin-RevId: 281112758
2019-11-18 11:50:44 -08:00
Kevin Krakauer 339536de5e Check that a file is a regular file with open(O_TRUNC).
It was possible to panic the sentry by opening a cache revalidating folder with
O_TRUNC|O_CREAT.

Avoids breaking php tests.

PiperOrigin-RevId: 280533213
2019-11-14 16:08:34 -08:00
Nicolas Lacasse c2d3dc0c13 Use overlay MountSource when binding socket in overlay.
PiperOrigin-RevId: 280131840
2019-11-12 23:01:47 -08:00
Haibo Xu c5d9b5b881 Enable sentry/fs/host support on arm64.
The newfstatat() syscall is not supported on arm64, so we resort
to using the fstatat() syscall.

Signed-off-by: Haibo Xu <haibo.xu@arm.com>
Change-Id: Iea95550ea53bcf85c01f7b3b95da70ad0952177d
2019-11-13 06:46:02 +00:00
Haibo Xu 05871a1cdc Enable runsc/boot support on arm64.
This patch also includes a minor change to replace syscall.Dup2
with syscall.Dup3, which was missed in a previous commit (ref a25a976).

Signed-off-by: Haibo Xu <haibo.xu@arm.com>
Change-Id: I00beb9cc492e44c762ebaa3750201c63c1f7c2f3
2019-11-13 06:39:11 +00:00
Kevin Krakauer af58a4e3bb Automated rollback of changelist 278417533
PiperOrigin-RevId: 279365629
2019-11-08 12:20:11 -08:00
Kevin Krakauer 4fdd69d681 Check that a file is a regular file with open(O_TRUNC).
It was possible to panic the sentry by opening a cache revalidating folder with
O_TRUNC|O_CREAT.

PiperOrigin-RevId: 278417533
2019-11-04 10:58:29 -08:00
gVisor bot 6d4d9564e3 Merge pull request #641 from tanjianfeng:master
PiperOrigin-RevId: 276380008
2019-10-23 16:55:15 -07:00
Kevin Krakauer 2a82d5ad68 Reorder BUILD license and load functions in gvisor.
PiperOrigin-RevId: 275139066
2019-10-16 16:40:30 -07:00
Fabricio Voznika 9fb562234e Fix problem with open FD when copy up is triggered in overlayfs
Linux kernels before 4.19 don't implement a feature that updates
open FDs after a file is opened for write (and is copied to the upper
layer). Already open FDs will continue to read the old file content
until they are reopened. This is especially problematic for gVisor
because it caches open files.

A flag was added to force read-only files to be reopened when the
same file is opened for write. This is only needed if using kernels
prior to 4.19.

Closes #1006

It's difficult to really test this because we never run tests
on older kernels. I'm adding a test in GKE which uses kernels
with the overlayfs problem for 1.14 and lower.

PiperOrigin-RevId: 275115289
2019-10-16 15:06:24 -07:00
Nicolas Lacasse fd4e436002 Support O_SYNC and O_DSYNC flags.
When either of these flags is set, all writes will trigger a subsequent fsync
call. This behavior already existed for "write-through" mounts.

O_DIRECT is treated as an alias for O_SYNC. Better support coming soon.

PiperOrigin-RevId: 275114392
2019-10-16 15:01:23 -07:00
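
The write-then-fsync behavior described in fd4e436002 can be sketched with an ordinary `os.File`. The wrapper type, the flag constants, and the helper `countSyncs` are hypothetical. The constants merely stand in for O_SYNC and O_DSYNC, and none of this reflects the sentry's actual code.

```go
package main

import (
	"fmt"
	"os"
)

// Hypothetical flag bits standing in for O_SYNC and O_DSYNC.
const (
	flagSync = 1 << iota
	flagDsync
)

// syncWriter follows every successful write with Sync when either flag is
// set, and counts how many syncs it issued.
type syncWriter struct {
	f     *os.File
	flags int
	syncs int
}

func (w *syncWriter) Write(p []byte) (int, error) {
	n, err := w.f.Write(p)
	if err != nil {
		return n, err
	}
	if w.flags&(flagSync|flagDsync) != 0 {
		if err := w.f.Sync(); err != nil {
			return n, err
		}
		w.syncs++
	}
	return n, nil
}

// countSyncs performs the given number of writes to a temporary file and
// returns how many fsync calls were triggered.
func countSyncs(flags, writes int) (int, error) {
	f, err := os.CreateTemp("", "osync")
	if err != nil {
		return 0, err
	}
	defer os.Remove(f.Name())
	defer f.Close()

	w := &syncWriter{f: f, flags: flags}
	for i := 0; i < writes; i++ {
		if _, err := w.Write([]byte("data\n")); err != nil {
			return w.syncs, err
		}
	}
	return w.syncs, nil
}

func main() {
	withSync, _ := countSyncs(flagSync, 3)
	without, _ := countSyncs(0, 3)
	fmt.Println("syncs with flag:", withSync, "without:", without)
}
```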
gVisor bot d22f0534c0 Merge pull request #736 from tanjianfeng:fix-unix
PiperOrigin-RevId: 275114157
2019-10-16 14:41:43 -07:00
Jianfeng Tan b94505ecc0 support /proc/net/route
This proc file reports routing information to applications inside the
container.

Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com>
Change-Id: I498e47f8c4c185419befbb42d849d0b099ec71f3
2019-10-15 16:38:40 +00:00
Jianfeng Tan e3d4a67739 support /proc/net/snmp
This proc file contains statistics according to [1].

[1] https://tools.ietf.org/html/rfc2013

Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com>
Change-Id: I9662132085edd8a7783d356ce4237d7ac0800d94
2019-10-15 16:38:40 +00:00
Kevin Krakauer 1de0cf3563 Remove unnecessary context parameter for new pipes.
PiperOrigin-RevId: 273421634
2019-10-07 18:16:14 -07:00
Nicolas Lacasse f24c3188b5 Add sanity check that overlayCreate is called with an overlay parent inode.
PiperOrigin-RevId: 272987037
2019-10-04 17:03:50 -07:00
gVisor bot cde7711837 Merge pull request #865 from tanjianfeng:fix-829
PiperOrigin-RevId: 272522508
2019-10-02 14:51:04 -07:00
Andrei Vagin 2016cc283c fs/proc: report PID-s from a pid namespace of the proc mount
Right now, we can find more than one process with PID 1 in /proc.

$ for i in `seq 10`; do
> unshare -fp sleep 1000 &
> done

$ ls /proc
1  1  1  1  12  18  24  29  6            loadavg  net   sys          version
1  1  1  1  16  20  26  32  cpuinfo      meminfo  self  thread-self
1  1  1  1  17  21  28  36  filesystems  mounts   stat  uptime

PiperOrigin-RevId: 272506593
2019-10-02 13:29:42 -07:00
Michael Pratt dd69b49ed1 Disable cpuClockTicker when app is idle
Kernel.cpuClockTicker increments kernel.cpuClock, which tasks use as a clock to
track their CPU usage. This improves latency in the syscall path by avoiding
expensive monotonic clock calls on every syscall entry/exit.

However, this timer fires every 10ms. Thus, when all tasks are idle (i.e.,
blocked or stopped), this forces a sentry wakeup every 10ms, when we may
otherwise be able to sleep until the next app-relevant event. These wakeups
cause the sentry to utilize approximately 2% CPU when the application is
otherwise idle.

Updates to clock are not strictly necessary when the app is idle, as there are
no readers of cpuClock. This commit reduces idle CPU by disabling the timer
when tasks are completely idle, and computing its effects at the next wakeup.

Rather than disabling the timer as soon as the app goes idle, we wait until the
next tick, which provides a window for short sleeps to sleep and wakeup without
doing the (relatively) expensive work of disabling and enabling the timer.

PiperOrigin-RevId: 272265822
2019-10-01 12:21:01 -07:00
Andrei Vagin 7a234f736f splice: try another fallback option only if the previous one isn't supported
Reported-by: syzbot+bb5ed342be51d39b0cbb@syzkaller.appspotmail.com
PiperOrigin-RevId: 272110815
2019-09-30 18:23:42 -07:00
Nicolas Lacasse 3ad17ff597 Force timestamps to update when set via InodeOperations.SetTimestamps.
The gofer's CachingInodeOperations implementation contains an optimization for
the common open-read-close pattern when we have a host FD.  In this case, the
host kernel will update the timestamp for us to a reasonably close time, so we
don't need an extra RPC to the gofer.

However, when the app explicitly sets the timestamps (via futimes or similar)
then we actually DO need to update the timestamps, because the host kernel
won't do it for us.

To fix this, a new boolean `forceSetTimestamps` was added to
CachingInodeOperations.SetMaskedAttributes. It is only set by
gofer.InodeOperations.SetTimestamps.

PiperOrigin-RevId: 272048146
2019-09-30 13:08:45 -07:00
henry.tjf bc9de939fd tty: fix sending SIGTTOU on tty write
How to reproduce:
  $ echo "timeout 10 ls" > foo.sh
  $ chmod +x foo.sh
  $ ./foo.sh
  (will hang here for 10 secs, and the output of ls does not show)

When the "ls" process writes to stdout, it receives a SIGTTOU signal and
hangs there until the "timeout" process times out and kills the "ls" process.

The expected result is: "ls" writes its output to the tty and terminates
immediately, then the "timeout" process receives SIGCHLD and terminates.

The reason for this failure is that we missed the check for TOSTOP (if
set, background processes will receive the SIGTTOU signal when they do
write).

We use drivers/tty/n_tty.c:n_tty_write() as a reference.

Fixes: #862

Reported-by: chris.zn <chris.zn@antfin.com>
Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com>
Signed-off-by: chenglang.hy <chenglang.hy@antfin.com>
2019-09-24 14:18:22 +00:00
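
The missing TOSTOP check described in bc9de939fd boils down to a predicate like the one below. This is a simplified sketch with hypothetical names. It ignores further conditions that Linux applies, such as SIGTTOU being blocked or ignored, or the caller belonging to an orphaned process group.

```go
package main

import "fmt"

// termios is a hypothetical, minimal stand-in holding only the TOSTOP bit.
type termios struct {
	tostop bool
}

// writeNeedsSIGTTOU reports whether a write to the tty should raise SIGTTOU:
// only when TOSTOP is set and the caller is not in the foreground group.
func writeNeedsSIGTTOU(t termios, callerPgid, foregroundPgid int) bool {
	if !t.tostop {
		return false
	}
	return callerPgid != foregroundPgid
}

func main() {
	const fg, bg = 100, 200
	fmt.Println("background, TOSTOP clear:", writeNeedsSIGTTOU(termios{tostop: false}, bg, fg))
	fmt.Println("background, TOSTOP set:  ", writeNeedsSIGTTOU(termios{tostop: true}, bg, fg))
	fmt.Println("foreground, TOSTOP set:  ", writeNeedsSIGTTOU(termios{tostop: true}, fg, fg))
}
```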
Jianfeng Tan 329b6653ff Implement /proc/net/tcp6
Fixes: #829

Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com>
Signed-off-by: Jielong Zhou <jielong.zjl@antfin.com>
2019-09-20 17:20:08 +00:00
Kevin Krakauer 0a8a75f3da Job control: controlling TTYs and foreground process groups.
Addresses a deadlock with the rolled back change:
b6a5b950d2
Creating a session from an orphaned process group was causing a lock to be
acquired twice by a single goroutine. This behavior is addressed, and a test
(OrphanRegression) has been added to pty.cc.

Implemented the following ioctls:
- TIOCSCTTY - set controlling TTY
- TIOCNOTTY - remove controlling tty, maybe signal some other processes
- TIOCGPGRP - get foreground process group. Also enables tcgetpgrp().
- TIOCSPGRP - set foreground process group. Also enables tcsetpgrp().

Next steps are to actually turn terminal-generated control characters (e.g. ^C)
into signals to the proper process groups, and to send SIGTTOU and SIGTTIN when
appropriate.

PiperOrigin-RevId: 270088599
2019-09-19 11:36:47 -07:00
Andrei Vagin 239a07aabf gvisor: return ENOTDIR from the unlink syscall
ENOTDIR has to be returned when a component used as a directory in
pathname is not, in fact, a directory.

PiperOrigin-RevId: 269037893
2019-09-13 21:44:57 -07:00
Adin Scannell 7c6ab6a219 Implement splice methods for pipes and sockets.
This also allows the tee(2) implementation to be enabled, since dup can now be
properly supported via WriteTo.

Note that this change necessitated some minor restructuring with the
fs.FileOperations splice methods. If the *fs.File is passed through directly,
then only public API methods are accessible, which will deadlock immediately
since the locking is already done by fs.Splice. Instead, we pass through an
abstract io.Reader or io.Writer, which elide locks and use the underlying
fs.FileOperations directly.

PiperOrigin-RevId: 268805207
2019-09-12 17:43:27 -07:00
Michael Pratt df5d377521 Remove go_test from go_stateify and go_marshal
They are no-ops, so the standard rule works fine.

PiperOrigin-RevId: 268776264
2019-09-12 15:10:17 -07:00
Jamie Liu 0352cf5866 Remove support for non-incremental mapped accounting.
PiperOrigin-RevId: 266496644
2019-08-30 19:06:55 -07:00
Bhasker Hariharan 54bf2e8eff Automated rollback of changelist 261387276
PiperOrigin-RevId: 266491264
2019-08-30 18:15:32 -07:00
Rahat Mahmood 863e11ac4d Implement /proc/net/udp.
PiperOrigin-RevId: 266229756
2019-08-29 14:30:41 -07:00
Jamie Liu 36a8949b2a Add limit_host_fd_translation Gofer mount option.
PiperOrigin-RevId: 266177409
2019-08-29 14:01:03 -07:00
Fabricio Voznika c39564332b Mount volumes as super user
This used to be the case, but regressed after a recent change.
Also made a few fixes around it and clean up the code a bit.

Closes #720

PiperOrigin-RevId: 265717496
2019-08-27 10:47:16 -07:00
Jianfeng Tan 2c3e2ed2bf unix: return ECONNRESET if peer closed with data not read
For a SOCK_STREAM unix socket, we shall return ECONNRESET if the peer is
closed with data not read.

We explicitly set a flag when closing one end, to differentiate this from
a plain shutdown (where zero shall be returned).

Fixes: #735

Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com>
2019-08-22 15:25:38 +00:00
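
The close-versus-shutdown distinction described in 2c3e2ed2bf can be modeled in a few lines. The `endpoint` type, its fields, and the error values are hypothetical. They sketch the flag-based approach in general terms and are not the actual unix socket implementation.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical error values standing in for the corresponding errnos.
var (
	errConnReset  = errors.New("ECONNRESET")
	errWouldBlock = errors.New("EWOULDBLOCK")
)

// endpoint is one side of a connected stream pair.
type endpoint struct {
	peer       *endpoint
	recvQueue  []byte
	peerClosed bool // the peer closed or shut down its write side
	peerReset  bool // the peer closed while it still had unread data
}

func newPair() (*endpoint, *endpoint) {
	a, b := &endpoint{}, &endpoint{}
	a.peer, b.peer = b, a
	return a, b
}

// Write queues data on the peer.
func (e *endpoint) Write(p []byte) {
	e.peer.recvQueue = append(e.peer.recvQueue, p...)
}

// Shutdown ends the write side without discarding anything.
func (e *endpoint) Shutdown() { e.peer.peerClosed = true }

// Close marks the peer as reset if unread data is being thrown away.
func (e *endpoint) Close() {
	e.peer.peerClosed = true
	if len(e.recvQueue) > 0 {
		e.peer.peerReset = true
	}
}

// Read returns queued bytes, ECONNRESET after a lossy close, zero after a
// clean shutdown or close, and EWOULDBLOCK otherwise.
func (e *endpoint) Read() (int, error) {
	switch {
	case len(e.recvQueue) > 0:
		n := len(e.recvQueue)
		e.recvQueue = nil
		return n, nil
	case e.peerReset:
		return 0, errConnReset
	case e.peerClosed:
		return 0, nil
	default:
		return 0, errWouldBlock
	}
}

func main() {
	a, b := newPair()
	a.Write([]byte("unread"))
	b.Close() // b never read the data
	_, err := a.Read()
	fmt.Println("read after lossy close:", err)
}
```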
Tamir Duberstein d81d94ac4c Replace uintptr with int64 when returning lengths
This is in accordance with newer parts of the standard library.

PiperOrigin-RevId: 263449916
2019-08-14 16:05:56 -07:00
Fabricio Voznika 0e907c4298 Fix file mode check in pipeOperations
PiperOrigin-RevId: 263203441
2019-08-13 13:33:33 -07:00