Commit Graph

161 Commits

Author SHA1 Message Date
Kevin Krakauer bb88c187c5 runsc: Enable waiting on exited processes.
This makes `runsc wait` behave more like waitpid()/wait4() in that:
- Once a process has run to completion, you can wait on it and get its exit
  code.
- Processes not waited on will consume memory (like a zombie process)

PiperOrigin-RevId: 213358916
Change-Id: I5b5eca41ce71eea68e447380df8c38361a4d1558
2018-09-17 16:25:24 -07:00
Ian Gudger ab6fa44588 Allow kernel.(*Task).Block to accept an extract only channel
PiperOrigin-RevId: 213328293
Change-Id: I4164133e6f709ecdb89ffbb5f7df3324c273860a
2018-09-17 13:35:54 -07:00
Jamie Liu 0380bcb3a4 Fix interaction between rt_sigtimedwait and ignored signals.
PiperOrigin-RevId: 213011782
Change-Id: I716c6ea3c586b0c6c5a892b6390d2d11478bc5af
2018-09-14 11:10:50 -07:00
Ian Gudger 29a7271f5d Plumb monotonic time to netstack
Netstack needs to be portable, so this seems to be preferable to using raw
system calls.

PiperOrigin-RevId: 212917409
Change-Id: I7b2073e7db4b4bf75300717ca23aea4c15be944c
2018-09-13 19:12:15 -07:00
Rahat Mahmood adf8f33970 Extend memory usage events to report mapped memory usage.
PiperOrigin-RevId: 212887555
Change-Id: I3545383ce903cbe9f00d9b5288d9ef9a049b9f4f
2018-09-13 15:16:47 -07:00
Fabricio Voznika 172860a059 Add 'Starting gVisor...' message to syslog
This allows applications to verify they are running with gVisor. It
also helps debugging when running with a mix of container runtimes.

Closes #54

PiperOrigin-RevId: 212059457
Change-Id: I51d9595ee742b58c1f83f3902ab2e2ecbd5cedec
2018-09-07 16:59:27 -07:00
Fabricio Voznika f895cb4d8b Use root abstract socket namespace for exec
PiperOrigin-RevId: 211999211
Change-Id: I5968dd1a8313d3e49bb6e6614e130107495de41d
2018-09-07 10:45:55 -07:00
Nicolas Lacasse 6516b5648b createProcessArgs.RootFromContext should return process Root if it exists.
It was always returning the MountNamespace root, which may be different from
the process Root if the process is in a chroot environment.

PiperOrigin-RevId: 211862181
Change-Id: I63bfeb610e2b0affa9fdbdd8147eba3c39014480
2018-09-06 13:47:49 -07:00
Adin Scannell c09f9acd7c Distinguish Element and Linker for ilist.
Furthermore, allow for the specification of an ElementMapper. This allows a
single "Element" type to exist on multiple inline lists, and work without
having to embed the entry type.

This is a requisite change for supporting a per-Inode list of Dirents.

PiperOrigin-RevId: 211467497
Change-Id: If2768999b43e03fdaecf8ed15f435fe37518d163
2018-09-04 09:19:11 -07:00
Jamie Liu f8ccfbbed4 Document more task-goroutine-owned fields in kernel.Task.
Task.creds can only be changed by the task's own set*id and execve
syscalls, and Task namespaces can only be changed by the task's own
unshare/setns syscalls.

PiperOrigin-RevId: 211156279
Change-Id: I94d57105d34e8739d964400995a8a5d76306b2a0
2018-08-31 15:44:40 -07:00
Jamie Liu 098046ba19 Disintegrate kernel.TaskResources.
This allows us to call kernel.FDMap.DecRef without holding mutexes
cleanly.

PiperOrigin-RevId: 211139657
Change-Id: Ie59d5210fb9282e1950e2e40323df7264a01bcec
2018-08-31 13:58:04 -07:00
Jamie Liu b1c1afa3cc Delete the long-obsolete kernel.TaskMaybe interface.
PiperOrigin-RevId: 211131855
Change-Id: Ia7799561ccd65d16269e0ae6f408ab53749bca37
2018-08-31 13:07:34 -07:00
Ian Gudger 52e6714146 fasync: don't keep mutex after return
PiperOrigin-RevId: 210637533
Change-Id: I3536c3f9efb54732a0d8ada8bc299142b2c1682f
2018-08-28 17:26:26 -07:00
Brian Geffon f0492d45aa Add /proc/sys/kernel/shm[all,max,mni].
PiperOrigin-RevId: 210459956
Change-Id: I51859b90fa967631e0a54a390abc3b5541fbee66
2018-08-27 17:21:37 -07:00
Jamie Liu 64403265a0 Implement POSIX per-process interval timers.
PiperOrigin-RevId: 210021612
Change-Id: If7c161e6fd08cf17942bfb6bc5a8d2c4e271c61e
2018-08-23 16:32:36 -07:00
Ian Gudger 9c407382b0 Fix races in kernel.(*Task).Value()
PiperOrigin-RevId: 209627180
Change-Id: Idc84afd38003427e411df6e75abfabd9174174e1
2018-08-21 11:16:17 -07:00
Kevin Krakauer 635b0c4593 runsc fsgofer: Support dynamic serving of filesystems.
When multiple containers run inside a sentry, each container has its own root
filesystem and set of mounts. Containers are also added after sentry boot rather
than all configured and known at boot time.

The fsgofer needs to be able to serve the root filesystem of each container.
Thus, it must be possible to add filesystems after the fsgofer has already
started.

This change:
* Creates a URPC endpoint within the gofer process that listens for requests to
  serve new content.
* Enables the sentry, when starting a new container, to add the new container's
  filesystem.
* Mounts those new filesystems at separate roots within the sentry.

PiperOrigin-RevId: 208903248
Change-Id: Ifa91ec9c8caf5f2f0a9eead83c4a57090ce92068
2018-08-15 16:25:22 -07:00
Michael Pratt 2e06b23aa6 Fix missing O_LARGEFILE from O_CREAT files
Cleanup some more syscall.O_* references while we're here.

PiperOrigin-RevId: 208133460
Change-Id: I48db71a38f817e4f4673977eafcc0e3874eb9a25
2018-08-09 16:50:37 -07:00
Jamie Liu c036da5dff Hold TaskSet.mu in Task.Parent.
PiperOrigin-RevId: 207766238
Change-Id: Id3b66d8fe1f44c3570f67fa5ae7ba16021e35be1
2018-08-07 13:09:42 -07:00
Zhaozhong Ni c348d07863 sentry: make epoll.pollEntry wait for the file operation in restore.
PiperOrigin-RevId: 207737935
Change-Id: I3a301ece1f1d30909715f36562474e3248b6a0d5
2018-08-07 10:27:37 -07:00
Zhaozhong Ni 57d0fcbdbf Automated rollback of changelist 207037226
PiperOrigin-RevId: 207125440
Change-Id: I6c572afb4d693ee72a0c458a988b0e96d191cd49
2018-08-02 10:42:48 -07:00
Brian Geffon cf44aff6e0 Add seccomp(2) support.
Add support for the seccomp syscall and the flag SECCOMP_FILTER_FLAG_TSYNC.

PiperOrigin-RevId: 207101507
Change-Id: I5eb8ba9d5ef71b0e683930a6429182726dc23175
2018-08-02 08:10:30 -07:00
Michael Pratt 60add78980 Automated rollback of changelist 207007153
PiperOrigin-RevId: 207037226
Change-Id: I8b5f1a056d4f3eab17846f2e0193bb737ecb5428
2018-08-01 19:57:32 -07:00
Zhaozhong Ni b9e1cf8404 stateify: convert all packages to use explicit mode.
PiperOrigin-RevId: 207007153
Change-Id: Ifedf1cc3758dc18be16647a4ece9c840c1c636c9
2018-08-01 15:43:24 -07:00
Andrei Vagin a7a0167716 proc: show file flags in fdinfo
Currently, there is an attempt to print FD flags, but
they are not decoded into a number, so we see something like this:

/criu # cat /proc/self/fdinfo/0
flags: {%!o(bool=000false)}

Actually, fdinfo has to contain file flags.

Change-Id: Idcbb7db908067447eb9ae6f2c3cfb861f2be1a97
PiperOrigin-RevId: 206794498
2018-07-31 11:19:15 -07:00
Zhaozhong Ni be7fcbc558 stateify: support explicit annotation mode; convert refs and stack packages.
We have been unnecessarily creating too many savable types implicitly.

PiperOrigin-RevId: 206334201
Change-Id: Idc5a3a14bfb7ee125c4f2bb2b1c53164e46f29a8
2018-07-27 10:17:21 -07:00
Adin Scannell 8b8aad91d5 kernel: mutations on creds now require a copy.
PiperOrigin-RevId: 205315612
Change-Id: I9a0a1e32c8abfb7467a38743b82449cc92830316
2018-07-19 15:48:56 -07:00
Adin Scannell 29e00c943a Add CPUID faulting for ptrace and KVM.
PiperOrigin-RevId: 204858314
Change-Id: I8252bf8de3232a7a27af51076139b585e73276d4
2018-07-16 22:02:58 -07:00
Neel Natu 8f21c0bb28 Add EventOperations.HostFD()
This method allows an eventfd inside the Sentry to be registered with with
the host kernel.

Update comment about memory mapping host fds via CachingInodeOperations.

PiperOrigin-RevId: 204784859
Change-Id: I55823321e2d84c17ae0f7efaabc6b55b852ae257
2018-07-16 12:20:05 -07:00
Zhaozhong Ni 1cd46c8dd1 sentry: wait for restore clock instead of panicing in Timekeeper.
PiperOrigin-RevId: 204372296
Change-Id: If1ed9843b93039806e0c65521f30177dc8036979
2018-07-12 15:09:02 -07:00
Michael Pratt 41e0b977e5 Format documentation
PiperOrigin-RevId: 204323728
Change-Id: I1ff9aa062ffa12583b2e38ec94c87db7a3711971
2018-07-12 10:37:21 -07:00
Jamie Liu b9c469f372 Move ptrace constants to abi/linux.
PiperOrigin-RevId: 204188763
Change-Id: I5596ab7abb3ec9e210a7f57b3fc420e836fa43f3
2018-07-11 14:24:19 -07:00
Zhaozhong Ni b1683df90b netstack: tcp socket connected state S/R support.
PiperOrigin-RevId: 203958972
Change-Id: Ia6fe16547539296d48e2c6731edacdd96bd6e93c
2018-07-10 09:23:35 -07:00
Jamie Liu 41aeb680b1 Inherit parent in clone(CLONE_THREAD) under TaskSet.mu.
PiperOrigin-RevId: 203849534
Change-Id: I4d81513bfd32e0b7fc40c8a4c194eba7abc35a83
2018-07-09 16:16:19 -07:00
Nicolas Lacasse f93bd2cbe6 Hold t.mu while calling t.FSContext().
PiperOrigin-RevId: 202562686
Change-Id: I0f5be7cc9098e86fa31d016251c127cb91084b05
2018-06-28 16:11:19 -07:00
Fabricio Voznika 6b6852bceb Fix semaphore data races
PiperOrigin-RevId: 202371908
Change-Id: I72603b1d321878cae6404987c49e64732b676331
2018-06-27 14:41:32 -07:00
Nicolas Lacasse 99afc982f1 Call mm.CheckIORange() when copying in IOVecs.
CheckIORange is analagous to Linux's access_ok() method, which is checked when
copying in IOVecs in both lib/iov_iter.c:import_single_range() and
lib/iov_iter.c:import_iovec() => fs/read_write.c:rw_copy_check_uvector().

gVisor copies in IOVecs via Task.SingleIOSequence() and Task.CopyInIovecs().
We were checking the address range bounds, but not whether the address is
valid. To conform with linux, we should also check that the address is valid.

For usual preadv/pwritev syscalls, the effect of this change is not noticeable,
since we find out that the address is invalid before the syscall completes.

For vectorized async-IO operations, however, this change is necessary because
Linux returns EFAULT when the operation is submitted, but before it executes.
Thus, we must validate the iovecs when copying them in.

PiperOrigin-RevId: 202370092
Change-Id: I8759a63ccf7e6b90d90d30f78ab8935a0fcf4936
2018-06-27 14:31:35 -07:00
Brian Geffon 51c1e510ab Automated rollback of changelist 201596247
PiperOrigin-RevId: 202151720
Change-Id: I0491172c436bbb32b977f557953ba0bc41cfe299
2018-06-26 10:33:24 -07:00
Michael Pratt 4ac79312b0 Don't read cwd or root without holding mu
PiperOrigin-RevId: 202043090
Change-Id: I3c47fb3413ca8615d50d8a0503d72fcce9b09421
2018-06-25 16:46:29 -07:00
Michael Pratt 478f0ac003 Don't read FSContext.root without holding FSContext.mu
IsChrooted still has the opportunity to race with another thread
entering the FSContext into a chroot, but that is unchanged (and
fine, AFAIK).

PiperOrigin-RevId: 202029117
Change-Id: I38bce763b3a7715fa6ae98aa200a19d51a0235f1
2018-06-25 15:22:56 -07:00
Zhaozhong Ni 0e434b66a6 netstack: tcp socket connected state S/R support.
PiperOrigin-RevId: 201596247
Change-Id: Id22f47b2cdcbe14aa0d930f7807ba75f91a56724
2018-06-21 15:19:45 -07:00
Michael Pratt 2dedbc7211 Drop return from SendExternalSignal
SendExternalSignal is no longer called before CreateProcess, so it can
enforce this simplified precondition.

StartForwarding, and after Kernel.Start.

PiperOrigin-RevId: 201591170
Change-Id: Ib7022ef7895612d7d82a00942ab59fa433c4d6e9
2018-06-21 14:53:55 -07:00
Ian Gudger d571a4359c Implement ioctl(FIOASYNC)
FIOASYNC and friends are used to send signals when a file is ready for IO.

This may or may not be needed by Nginx. While Nginx does use it, it is unclear
if the code that uses it has any effect.

PiperOrigin-RevId: 201550828
Change-Id: I7ba05a7db4eb2dfffde11e9bd9a35b65b98d7f50
2018-06-21 10:53:21 -07:00
Zhaozhong Ni 4e9f0e91d7 sentry: pending signals S/R optimization.
Almost all of the hundreds of pending signal queues are empty upon save.

PiperOrigin-RevId: 201380318
Change-Id: I40747072435299de681d646e0862efac0637e172
2018-06-20 11:02:41 -07:00
Zhaozhong Ni aa14a2c1be sentry: futex S/R optimization.
No need to save thousands of zerovalue buckets.

PiperOrigin-RevId: 201258598
Change-Id: I5d3ea7b6a5345117ab4f610332d5288ca550be33
2018-06-19 16:08:00 -07:00
Brian Geffon fa6db05e0c FIFOs should support O_TRUNC as a no-op.
PiperOrigin-RevId: 200759323
Change-Id: I683b2edcc2188304c4ca563e46af457e23625905
2018-06-15 12:55:29 -07:00
Jamie Liu 657db692b2 Ignore expiration count in kernelCPUClockListener.Notify.
PiperOrigin-RevId: 200590832
Change-Id: I35b817ecccc9414a742dee4815dfc67d0c7d0496
2018-06-14 11:35:11 -07:00
Adin Scannell 6728f09910 Fix sigaltstack semantics.
Walking off the bottom of the sigaltstack, for example with recursive faults,
results in forced signal delivery, not resetting the stack or pushing signal
stack to whatever happens to lie below the signal stack.

PiperOrigin-RevId: 199856085
Change-Id: I0004d2523f0df35d18714de2685b3eaa147837e0
2018-06-08 15:01:21 -07:00
Rahat Mahmood b904250b86 Fix capability check for sysv semaphores.
Capabilities for sysv sem operations were being checked against the
current task's user namespace. They should be checked against the user
namespace owning the ipc namespace for the sems instead, per
ipc/util.c:ipcperms().

PiperOrigin-RevId: 197063111
Change-Id: Iba29486b316f2e01ee331dda4e48a6ab7960d589
2018-05-17 15:38:11 -07:00
Rahat Mahmood 8878a66a56 Implement sysv shm.
PiperOrigin-RevId: 197058289
Change-Id: I3946c25028b7e032be4894d61acb48ac0c24d574
2018-05-17 15:06:19 -07:00
Kevin Krakauer 96c28a4368 sentry: Replaces saving of inet.Stack with retrieval via context.
Previously, inet.Stack was referenced in 2 structs in sentry/socket that can be
saved/restored.  If an app is saved and restored on another machine, it may try
to use the old stack, which will have been replaced by a new stack on the new
machine.

PiperOrigin-RevId: 196733985
Change-Id: I6a8cfe73b5d7a90749734677dada635ab3389cb9
2018-05-15 14:56:18 -07:00
Michael Pratt 8deabbaae1 Remove error return from AddressSpace.Release()
PiperOrigin-RevId: 196291289
Change-Id: Ie3487be029850b0b410b82416750853a6c4a2b00
2018-05-11 12:24:15 -07:00
Zhaozhong Ni 174161013d Capture restore file system corruption errors in exit error.
PiperOrigin-RevId: 195850822
Change-Id: I4d7bdd8fe129c5ed461b73e1d7458be2cf5680c2
2018-05-08 11:36:59 -07:00
Ian Gudger b4765f782d Fix warning: redundant if ...; err != nil check, just return error instead.
This warning is produced by golint.

PiperOrigin-RevId: 195833381
Change-Id: Idd6a7e57e3cfdf00819f2374b19fc113585dc1e1
2018-05-08 09:51:56 -07:00
Ian Gudger d0d01a1896 Fix format string type in test
PiperOrigin-RevId: 195831778
Change-Id: I413dc909cedc18fbf5320a4f75d876f1be133c6c
2018-05-08 09:39:42 -07:00
Ian Gudger 7c8c3705ea Fix misspellings
PiperOrigin-RevId: 195742598
Change-Id: Ibd4a8e4394e268c87700b6d1e50b4b37dfce5182
2018-05-07 16:38:01 -07:00
Ian Gudger f47174f06b Run gofmt -s on everything
PiperOrigin-RevId: 195469901
Change-Id: I66d5c7a334bbb8b47e40d266a2661291c2d91c7f
2018-05-04 14:16:11 -07:00
Cyrille Hemidy 04b79137ba Fix misspellings.
PiperOrigin-RevId: 195307689
Change-Id: I499f19af49875a43214797d63376f20ae788d2f4
2018-05-03 14:06:13 -07:00
Zhengyu He 2264ce7f62 Use png for the run states diagram
PiperOrigin-RevId: 195071508
Change-Id: I63314bf7529560e4c779ef07cc9399ad8d53f0a2
2018-05-02 03:43:41 -07:00
Ian Gudger 3d3deef573 Implement SO_TIMESTAMP
PiperOrigin-RevId: 195047018
Change-Id: I6d99528a00a2125f414e1e51e067205289ec9d3d
2018-05-01 22:11:49 -07:00
Googler d02b74a5dc Check in gVisor.
PiperOrigin-RevId: 194583126
Change-Id: Ica1d8821a90f74e7e745962d71801c598c652463
2018-04-28 01:44:26 -04:00