Commit Graph

164 Commits

Author SHA1 Message Date
Fabricio Voznika 0b76887147 Priority-inheritance futex implementation
It is Implemented without the priority inheritance part given
that gVisor defers scheduling decisions to Go runtime and doesn't
have control over it.

PiperOrigin-RevId: 236989545
Change-Id: I714c8ca0798743ecf3167b14ffeb5cd834302560
2019-03-05 23:40:18 -08:00
Fabricio Voznika fcba4e8f04 Add uncaught signal message to the user log
This help troubleshoot cases where the container is killed and the
app logs don't show the reason.

PiperOrigin-RevId: 236982883
Change-Id: I361892856a146cea5b04abaa3aedbf805e123724
2019-03-05 22:20:17 -08:00
Fabricio Voznika 3dbd4a16f8 Add semctl(GETPID) syscall
Also added unimplemented notification for semctl(2)
commands.

PiperOrigin-RevId: 236340672
Change-Id: I0795e3bd2e6d41d7936fabb731884df426a42478
2019-03-01 10:57:02 -08:00
Kevin Krakauer b75aa51504 Rename ping endpoints to icmp endpoints.
PiperOrigin-RevId: 235248572
Change-Id: I5b0538b6feb365a98712c2a2d56d856fe80a8a09
2019-02-22 13:34:47 -08:00
Nicolas Lacasse 0a41ea72c1 Don't allow writing or reading to TTY unless process group is in foreground.
If a background process tries to read from a TTY, linux sends it a SIGTTIN
unless the signal is blocked or ignored, or the process group is an orphan, in
which case the syscall returns EIO.

See drivers/tty/n_tty.c:n_tty_read()=>job_control().

If a background process tries to write a TTY, set the termios, or set the
foreground process group, linux then sends a SIGTTOU. If the signal is ignored
or blocked, linux allows the write. If the process group is an orphan, the
syscall returns EIO.

See drivers/tty/tty_io.c:tty_check_change().

PiperOrigin-RevId: 234044367
Change-Id: I009461352ac4f3f11c5d42c43ac36bb0caa580f9
2019-02-14 15:47:31 -08:00
Bhasker Hariharan e0b3d3323f Add support for using PACKET_RX_RING to receive packets.
PACKET_RX_RING allows the use of an mmapped buffer to receive packets from the
kernel. This should cut down the number of host syscalls that need to be made
to receive packets when the underlying fd is a socket of the AF_PACKET type.

PiperOrigin-RevId: 233834998
Change-Id: I8060025c6ced206986e94cc46b8f382b81bfa47f
2019-02-13 14:53:03 -08:00
Nicolas Lacasse 92e85623a0 Factor the subtargets method into a helper method with tests.
PiperOrigin-RevId: 232047515
Change-Id: I00f036816e320356219be7b2f2e6d5fe57583a60
2019-02-01 15:23:43 -08:00
Michael Pratt 2a0c69b19f Remove license comments
Nothing reads them and they can simply get stale.

Generated with:
$ sed -i "s/licenses(\(.*\)).*/licenses(\1)/" **/BUILD

PiperOrigin-RevId: 231818945
Change-Id: Ibc3f9838546b7e94f13f217060d31f4ada9d4bf0
2019-01-31 11:12:53 -08:00
Andrei Vagin 7e8a56087b runsc: check whether a container is deleted or not before setupContainerFS
PiperOrigin-RevId: 231811387
Change-Id: Ib143fb9a4d0fa1f105d1a3a3bd533dfc44e792af
2019-01-31 10:34:15 -08:00
Andrei Vagin dd577f5410 runsc: reap a sandbox process only in sandbox.Wait()
PiperOrigin-RevId: 231504064
Change-Id: I585b769aef04a3ad7e7936027958910a6eed9c8d
2019-01-29 17:15:56 -08:00
Bhasker Hariharan 24cb2c0a72 Use recvmmsg() instead of readv() to read packets from NIC.
This should reduce the number of syscalls required to process packets
significantly and improve throughputs.

PiperOrigin-RevId: 231366886
Change-Id: I8b38077262bf9c53176bc4a94b530188d3d7c0ca
2019-01-29 01:39:01 -08:00
Fabricio Voznika c1be25b78d Scrub runsc error messages
Removed "error" and "failed to" prefix that don't add value
from messages. Adjusted a few other messages.  In particular,
when the container fail to start, the message returned is easier
for humans to read:

$ docker run --rm --runtime=runsc alpine foobar
docker: Error response from daemon: OCI runtime start failed: <path> did not terminate sucessfully: starting container: starting root container [foobar]: starting sandbox: searching for executable "foobar", cwd: "/", $PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin": no such file or directory

Closes #77

PiperOrigin-RevId: 230022798
Change-Id: I83339017c70dae09e4f9f8e0ea2e554c4d5d5cd1
2019-01-18 17:36:02 -08:00
Fabricio Voznika e4d3ca7263 Prevent internal tmpfs mount to override files in /tmp
Runsc wants to mount /tmp using internal tmpfs implementation for
performance. However, it risks hiding files that may exist under
/tmp in case it's present in the container. Now, it only mounts
over /tmp iff:
  - /tmp was not explicitly asked to be mounted
  - /tmp is empty

If any of this is not true, then /tmp maps to the container's
image /tmp.

Note: checkpoint doesn't have sentry FS mounted to check if /tmp
is empty. It simply looks for explicit mounts right now.
PiperOrigin-RevId: 229607856
Change-Id: I10b6dae7ac157ef578efc4dfceb089f3b94cde06
2019-01-16 12:48:32 -08:00
Nicolas Lacasse dc8450b567 Remove fs.Handle, ramfs.Entry, and all the DeprecatedFileOperations.
More helper structs have been added to the fsutil package to make it easier to
implement fs.InodeOperations and fs.FileOperations.

PiperOrigin-RevId: 229305982
Change-Id: Ib6f8d3862f4216745116857913dbfa351530223b
2019-01-14 20:34:28 -08:00
Michael Pratt 33191e1cc4 Automated rollback of changelist 225089593
PiperOrigin-RevId: 227595007
Change-Id: If14cc5aab869c5fd7a4ebd95929c887ab690e94c
2019-01-02 15:48:00 -08:00
Fabricio Voznika a891afad6d Simplify synchronization between runsc and sandbox process
Make 'runsc create' join cgroup before creating sandbox process.
This removes the need to synchronize platform creation and ensure
that sandbox process is charged to the right cgroup from the start.

PiperOrigin-RevId: 227166451
Change-Id: Ieb4b18e6ca0daf7b331dc897699ca419bc5ee3a2
2018-12-28 13:48:24 -08:00
Jamie Liu 194ef586fc Rename limits.MemoryPagesLocked to limits.MemoryLocked.
"RLIMIT_MEMLOCK: This is the maximum number of bytes of memory that may
be locked into RAM." - getrlimit(2)

PiperOrigin-RevId: 226384346
Change-Id: Iefac4a1bb69f7714dc813b5b871226a8344dc800
2018-12-20 13:28:46 -08:00
Googler 86c9bd2547 Automated rollback of changelist 225861605
PiperOrigin-RevId: 226224230
Change-Id: Id24c7d3733722fd41d5fe74ef64e0ce8c68f0b12
2018-12-19 13:30:08 -08:00
Michael Pratt b62591e6a8 Expose internal testing flag
Never to used outside of runsc tests!

PiperOrigin-RevId: 225919013
Change-Id: Ib3b14aa2a2564b5246fb3f8933d95e01027ed186
2018-12-17 17:35:06 -08:00
Jamie Liu 2421006426 Implement mlock(), kind of.
Currently mlock() and friends do nothing whatsoever. However, mlocking
is directly application-visible in a number of ways; for example,
madvise(MADV_DONTNEED) and msync(MS_INVALIDATE) both fail on mlocked
regions. We handle this inconsistently: MADV_DONTNEED is too important
to not work, but MS_INVALIDATE is rejected.

Change MM to track mlocked regions in a manner consistent with Linux.
It still will not actually pin pages into host physical memory, but:

- mlock() will now cause sentry memory management to precommit mlocked
pages.

- MADV_DONTNEED and MS_INVALIDATE will interact with mlocked pages as
described above.

PiperOrigin-RevId: 225861605
Change-Id: Iee187204979ac9a4d15d0e037c152c0902c8d0ee
2018-12-17 11:38:59 -08:00
Michael Pratt 24c1158b9c Add "trace signal" option
This option is effectively equivalent to -panic-signal, except that the
sandbox does not die after logging the traceback.

PiperOrigin-RevId: 225089593
Change-Id: Ifb1c411210110b6104613f404334bd02175e484e
2018-12-11 16:12:41 -08:00
Brian Geffon d3bc79bc84 Open source system call tests.
PiperOrigin-RevId: 224886231
Change-Id: I0fccb4d994601739d8b16b1d4e6b31f40297fb22
2018-12-10 14:42:34 -08:00
Zhaozhong Ni 9984138abe sentry: turn "dynamically-created" procfs files into static creation.
PiperOrigin-RevId: 224600982
Change-Id: I547253528e24fb0bb318fc9d2632cb80504acb34
2018-12-07 17:03:54 -08:00
Brian Geffon 82719be42e Max link traversals should be for an entire path.
The number of symbolic links that are allowed to be followed
are for a full path and not just a chain of symbolic links.

PiperOrigin-RevId: 224047321
Change-Id: I5e3c4caf66a93c17eeddcc7f046d1e8bb9434a40
2018-12-04 14:32:03 -08:00
Fabricio Voznika eaac94d91c Use RET_KILL_PROCESS if available in kernel
RET_KILL_THREAD doesn't work well for Go because it will
kill only the offending thread and leave the process hanging.
RET_TRAP can be masked out and it's not guaranteed to kill
the process. RET_KILL_PROCESS is available since 4.14.

For older kernel, continue to use RET_TRAP as this is the
best option (likely to kill process, easy to debug).

PiperOrigin-RevId: 222357867
Change-Id: Icc1d7d731274b16c2125b7a1ba4f7883fbdb2cbd
2018-11-20 22:56:51 -08:00
Fabricio Voznika fadffa2ff8 Add unsupported syscall events for get/setsockopt
PiperOrigin-RevId: 222148953
Change-Id: I21500a9f08939c45314a6414e0824490a973e5aa
2018-11-20 14:04:12 -08:00
Fabricio Voznika 237f9c7a5e Don't fail when destroyContainerFS is called more than once
This can happen when destroy is called multiple times or when destroy
failed previously and is being called again.

PiperOrigin-RevId: 221882034
Change-Id: I8d069af19cf66c4e2419bdf0d4b789c5def8d19e
2018-11-20 14:03:42 -08:00
Nicolas Lacasse 845836c578 Internal change.
PiperOrigin-RevId: 221848471
Change-Id: I882fbe5ce7737048b2e1f668848e9c14ed355665
2018-11-20 14:03:11 -08:00
Nicolas Lacasse 40f843fc78 Internal change.
PiperOrigin-RevId: 221343626
Change-Id: I03d57293a555cf4da9952a81803b9f8463173c89
2018-11-13 15:18:17 -08:00
Nicolas Lacasse 6c2d320138 Internal change.
PiperOrigin-RevId: 221299066
Change-Id: I8ae352458f9976c329c6946b1efa843a3de0eaa4
2018-11-13 11:14:28 -08:00
Fabricio Voznika d97ccfa346 Close donated files if containerManager.Start() fails
PiperOrigin-RevId: 220869535
Change-Id: I9917e5daf02499f7aab6e2aa4051c54ff4461b9a
2018-11-09 14:54:34 -08:00
Nicolas Lacasse 13b48f2e6a AsyncBarrier should be run after all defers in destroyContainerFS.
destroyContainerFS must wait for all async operations to finish before
returning. In an attempt to do this, we call fs.AsyncBarrier() at the end of
the function. However, there are many defer'd DecRefs which end up running
AFTER the AsyncBarrier() call.

This CL fixes this by calling fs.AsyncBarrier() in the first defer statement,
thus ensuring that it runs at the end of the function, after all other defers.

PiperOrigin-RevId: 220523545
Change-Id: I5e96ee9ea6d86eeab788ff964484c50ef7f64a2f
2018-11-07 13:55:36 -08:00
Fabricio Voznika c92b9b7086 Add more logging to controller.go
PiperOrigin-RevId: 220519632
Change-Id: Iaeec007fc1aa3f0b72569b288826d45f2534c4bf
2018-11-07 13:33:19 -08:00
Fabricio Voznika 86b3f0cd24 Fix race between start and destroy
Before this change, a container starting up could race with
destroy (aka delete) and leave processes behind.

Now, whenever a container is created, Loader.processes gets
a new entry. Start now expects the entry to be there, and if
it's not it means that the container was deleted.

I've also fixed Loader.waitPID to search for the process using
the init process's PID namespace.

We could use a few more tests for signal and wait. I'll send
them in another cl.

PiperOrigin-RevId: 220224290
Change-Id: I15146079f69904dc07d43c3b66cc343a2dab4cc4
2018-11-05 21:29:37 -08:00
Fabricio Voznika a467f09261 Log when external signal is received
PiperOrigin-RevId: 220204591
Change-Id: I21a9c6f5c12a376d18da5d10c1871837c4f49ad2
2018-11-05 17:42:24 -08:00
Fabricio Voznika 5cd55cd90f Use spec with clean paths for gofer
Otherwise the gofer's attach point may be different from sandbox when there
symlinks in the path.

PiperOrigin-RevId: 219730492
Change-Id: Ia9c4c2d16228c6a1a9e790e0cb673fd881003fe1
2018-11-01 17:52:11 -07:00
Fabricio Voznika b6b81fd04b Add new log format that is compatible with Kubernetes
Fluentd configuration uses 'log' for the log message
while containerd uses 'msg'. Since we can't have a single
JSON format for both, add another log format and make
debug log configurable.

PiperOrigin-RevId: 219729658
Change-Id: I2a6afc4034d893ab90bafc63b394c4fb62b2a7a0
2018-11-01 17:44:58 -07:00
Ian Lewis 9d69d85bc1 Make error messages a bit more user friendly.
Updated error messages so that it doesn't print full Go struct representations
when running a new container in a sandbox. For example, this occurs frequently
when commands are not found when doing a 'kubectl exec'.

PiperOrigin-RevId: 219729141
Change-Id: Ic3a7bc84cd7b2167f495d48a1da241d621d3ca09
2018-11-01 17:40:09 -07:00
Adin Scannell 0091db9cbd kvm: use private futexes.
Use private futexes for performance and to align with other runtime uses.

PiperOrigin-RevId: 219422634
Change-Id: Ief2af5e8302847ea6dc246e8d1ee4d64684ca9dd
2018-10-30 22:46:42 -07:00
Kevin Krakauer b42a2a3203 Removes outdated TODO.
PiperOrigin-RevId: 219151173
Change-Id: I73014ea648ae485692ea0d44860c87f4365055cb
2018-10-29 10:31:56 -07:00
Adin Scannell 75cd70ecc9 Track paths and provide a rename hook.
This change also adds extensive testing to the p9 package via mocks. The sanity
checks and type checks are moved from the gofer into the core package, where
they can be more easily validated.

PiperOrigin-RevId: 218296768
Change-Id: I4fc3c326e7bf1e0e140a454cbacbcc6fd617ab55
2018-10-23 00:20:15 -07:00
Fabricio Voznika b2068cf5a5 Add more unimplemented syscall events
Added events for *ctl syscalls that may have multiple different commands.
For runsc, each syscall event is only logged once. For *ctl syscalls, use
the cmd as identifier, not only the syscall number.

PiperOrigin-RevId: 218015941
Change-Id: Ie3c19131ae36124861e9b492a7dbe1765d9e5e59
2018-10-20 11:14:23 -07:00
Ian Gudger 8fce67af24 Use correct company name in copyright header
PiperOrigin-RevId: 217951017
Change-Id: Ie08bf6987f98467d07457bcf35b5f1ff6e43c035
2018-10-19 16:35:11 -07:00
Fabricio Voznika f3ffa4db52 Resolve mount paths while setting up root fs mount
It's hard to resolve symlinks inside the sandbox because rootfs and mounts
may be read-only, forcing us to create mount points inside lower layer of an
overlay, **before** the volumes are mounted.

Since the destination must already be resolved outside the sandbox when creating
mounts, take this opportunity to rewrite the spec with paths resolved.
"runsc boot" will use the "resolved" spec to load mounts. In addition, symlink
traversals were disabled while mounting containers inside the sandbox.

It haven't been able to write a good test for it. So I'm relying on manual tests
for now.

PiperOrigin-RevId: 217749904
Change-Id: I7ac434d5befd230db1488446cda03300cc0751a9
2018-10-18 12:42:24 -07:00
Nicolas Lacasse e0bb94201f Close the gofer socket gracefully in boot:boot_test.
We were closing the FD directly. If the test then created a new socket pair
with the same FD, in-flight RPCs would get directed to the new socket and break
the test.

Instead, we should use unet.Socket.Close(), which allows any in-flight RPCs to
finish.

PiperOrigin-RevId: 217608491
Change-Id: I8c5a76638899ba30f33ca976e6fac967fa0aadbf
2018-10-17 16:18:39 -07:00
Nicolas Lacasse 4e6f0892c9 runsc: Support job control signals for the root container.
Now containers run with "docker run -it" support control characters like ^C and
^Z.

This required refactoring our signal handling a bit. Signals delivered to the
"runsc boot" process are turned into loader.Signal calls with the appropriate
delivery mode. Previously they were always sent directly to PID 1.

PiperOrigin-RevId: 217566770
Change-Id: I5b7220d9a0f2b591a56335479454a200c6de8732
2018-10-17 12:29:05 -07:00
Kevin Krakauer 8cbca46b6d Remove incorrect TODO.
PiperOrigin-RevId: 217548429
Change-Id: Ie640c881fdc4fc70af58c8ca834df1ac531e519a
2018-10-17 10:55:34 -07:00
Kevin Krakauer 9b3550f70b runsc: Add --pid flag to runsc kill.
--pid allows specific processes to be signalled rather than the container root
process or all processes in the container. containerd needs to SIGKILL exec'd
processes that timeout and check whether processes are still alive.

PiperOrigin-RevId: 217547636
Change-Id: I2058ebb548b51c8eb748f5884fb88bad0b532e45
2018-10-17 10:51:39 -07:00
Nicolas Lacasse 3bc5e6482b Fix reference leak in tests.
PiperOrigin-RevId: 216780438
Change-Id: Ide637fe36f8d2a61fea9e5b16d1b3401f2540416
2018-10-11 16:23:54 -07:00
Fabricio Voznika e68d86e1bd Make debug log file name configurable
This is a breaking change if you're using --debug-log-dir.
The fix is to replace it with --debug-log and add a '/' at
the end:
  --debug-log-dir=/tmp/runsc ==> --debug-log=/tmp/runsc/

PiperOrigin-RevId: 216761212
Change-Id: I244270a0a522298c48115719fa08dad55e34ade1
2018-10-11 14:29:37 -07:00