Removed "error" and "failed to" prefix that don't add value
from messages. Adjusted a few other messages. In particular,
when the container fail to start, the message returned is easier
for humans to read:
$ docker run --rm --runtime=runsc alpine foobar
docker: Error response from daemon: OCI runtime start failed: <path> did not terminate sucessfully: starting container: starting root container [foobar]: starting sandbox: searching for executable "foobar", cwd: "/", $PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin": no such file or directory
Closes#77
PiperOrigin-RevId: 230022798
Change-Id: I83339017c70dae09e4f9f8e0ea2e554c4d5d5cd1
In addition, it fixes a race condition in TestMultiContainerGoferStop.
There are two scripts copy the same set of files into the same directory
and sometime one of this command fails with EXIST.
PiperOrigin-RevId: 230011247
Change-Id: I9289f72e65dc407cdcd0e6cd632a509e01f43e9c
Runsc wants to mount /tmp using internal tmpfs implementation for
performance. However, it risks hiding files that may exist under
/tmp in case it's present in the container. Now, it only mounts
over /tmp iff:
- /tmp was not explicitly asked to be mounted
- /tmp is empty
If any of this is not true, then /tmp maps to the container's
image /tmp.
Note: checkpoint doesn't have sentry FS mounted to check if /tmp
is empty. It simply looks for explicit mounts right now.
PiperOrigin-RevId: 229607856
Change-Id: I10b6dae7ac157ef578efc4dfceb089f3b94cde06
More helper structs have been added to the fsutil package to make it easier to
implement fs.InodeOperations and fs.FileOperations.
PiperOrigin-RevId: 229305982
Change-Id: Ib6f8d3862f4216745116857913dbfa351530223b
In this case, new mounts are not created in the host mount namspaces, so
tearDownChroot isn't needed, because chroot will be destroyed with a
sandbox mount namespace.
In additional, pivot_root can't be called instead of chroot.
PiperOrigin-RevId: 229250871
Change-Id: I765bdb587d0b8287a6a8efda8747639d37c7e7b6
And we need to wait a gofer process before cgroup.Uninstall,
because it is running in the sandbox cgroups.
PiperOrigin-RevId: 228904020
Change-Id: Iaf8826d5b9626db32d4057a1c505a8d7daaeb8f9
The original code assumed that it was safe to join and not restore cgroup,
but Container.Run will not exit after calling start, making cgroup cleanup
fail because there were still processes inside the cgroup.
PiperOrigin-RevId: 228529199
Change-Id: I12a48d9adab4bbb02f20d71ec99598c336cbfe51
File/dir/symlink creation is multi-step and may leave state behind in
case of failure in one of the steps. Added best effort attempt to
clean up.
PiperOrigin-RevId: 228286612
Change-Id: Ib03c27cd3d3e4f44d0352edc6ee212a53412d7f1
Make 'runsc create' join cgroup before creating sandbox process.
This removes the need to synchronize platform creation and ensure
that sandbox process is charged to the right cgroup from the start.
PiperOrigin-RevId: 227166451
Change-Id: Ieb4b18e6ca0daf7b331dc897699ca419bc5ee3a2
"RLIMIT_MEMLOCK: This is the maximum number of bytes of memory that may
be locked into RAM." - getrlimit(2)
PiperOrigin-RevId: 226384346
Change-Id: Iefac4a1bb69f7714dc813b5b871226a8344dc800
Currently mlock() and friends do nothing whatsoever. However, mlocking
is directly application-visible in a number of ways; for example,
madvise(MADV_DONTNEED) and msync(MS_INVALIDATE) both fail on mlocked
regions. We handle this inconsistently: MADV_DONTNEED is too important
to not work, but MS_INVALIDATE is rejected.
Change MM to track mlocked regions in a manner consistent with Linux.
It still will not actually pin pages into host physical memory, but:
- mlock() will now cause sentry memory management to precommit mlocked
pages.
- MADV_DONTNEED and MS_INVALIDATE will interact with mlocked pages as
described above.
PiperOrigin-RevId: 225861605
Change-Id: Iee187204979ac9a4d15d0e037c152c0902c8d0ee
If the sandbox process is dead (because of a panic or some other problem),
container.Destroy will never remove the container metadata file, since it will
always fail when calling container.stop().
This CL changes container.Destroy() to always perform the three necessary
cleanup operations:
* Stop the sandbox and gofer processes.
* Remove the container fs on the host.
* Delete the container metadata directory.
Errors from these three operations will be concatenated and returned from
Destroy().
PiperOrigin-RevId: 225448164
Change-Id: I99c6311b2e4fe5f6e2ca991424edf1ebeae9df32
This option is effectively equivalent to -panic-signal, except that the
sandbox does not die after logging the traceback.
PiperOrigin-RevId: 225089593
Change-Id: Ifb1c411210110b6104613f404334bd02175e484e
The number of symbolic links that are allowed to be followed
are for a full path and not just a chain of symbolic links.
PiperOrigin-RevId: 224047321
Change-Id: I5e3c4caf66a93c17eeddcc7f046d1e8bb9434a40
gvisor-containerd-shim moved. It now has a stable URL that run_tests.sh always
uses.
PiperOrigin-RevId: 223188822
Change-Id: I5687c78289404da27becd8d5949371e580fdb360
RET_KILL_THREAD doesn't work well for Go because it will
kill only the offending thread and leave the process hanging.
RET_TRAP can be masked out and it's not guaranteed to kill
the process. RET_KILL_PROCESS is available since 4.14.
For older kernel, continue to use RET_TRAP as this is the
best option (likely to kill process, easy to debug).
PiperOrigin-RevId: 222357867
Change-Id: Icc1d7d731274b16c2125b7a1ba4f7883fbdb2cbd
We were relying on time.UnixNano, but that was causing collisions.
Now we generate 20 bytes of entropy from rand.Read, and base32-encode it to get
a valid container id.
PiperOrigin-RevId: 222313867
Change-Id: Iaeea9b9582d36de55f9f02f55de6a5de3f739371
This can happen when destroy is called multiple times or when destroy
failed previously and is being called again.
PiperOrigin-RevId: 221882034
Change-Id: I8d069af19cf66c4e2419bdf0d4b789c5def8d19e
sandbox.Wait is racey, as the sandbox may have exited before it is called, or
even during.
We already had code to handle the case that the sandbox exits during the Wait
call, but we were not properly handling the case where the sandbox has exited
before the call.
The best we can do in such cases is return the sandbox exit code as the
application exit code.
PiperOrigin-RevId: 221702517
Change-Id: I290d0333cc094c7c1c3b4ce0f17f61a3e908d787
Each container has its respective gofer. Test that
gofer can be shutdown when a container stops and that
it doesn't affect other containers.
PiperOrigin-RevId: 220829898
Change-Id: I2a44a3cf2a88577e6ad1133afc622bbf4a5f6591
SetupContainerInRoot was setting Config.RootDir unnecessarily
and causing a --race violation in TestMultiContainerDestroyStarting.
PiperOrigin-RevId: 220580073
Change-Id: Ie0b28c19846106c7458a92681b708ae70f87d25a
destroyContainerFS must wait for all async operations to finish before
returning. In an attempt to do this, we call fs.AsyncBarrier() at the end of
the function. However, there are many defer'd DecRefs which end up running
AFTER the AsyncBarrier() call.
This CL fixes this by calling fs.AsyncBarrier() in the first defer statement,
thus ensuring that it runs at the end of the function, after all other defers.
PiperOrigin-RevId: 220523545
Change-Id: I5e96ee9ea6d86eeab788ff964484c50ef7f64a2f
Before this change, a container starting up could race with
destroy (aka delete) and leave processes behind.
Now, whenever a container is created, Loader.processes gets
a new entry. Start now expects the entry to be there, and if
it's not it means that the container was deleted.
I've also fixed Loader.waitPID to search for the process using
the init process's PID namespace.
We could use a few more tests for signal and wait. I'll send
them in another cl.
PiperOrigin-RevId: 220224290
Change-Id: I15146079f69904dc07d43c3b66cc343a2dab4cc4
More tests will come, but it's worth getting what's done so far reviewed.
PiperOrigin-RevId: 219734531
Change-Id: If15ca6e6855e3d1cc28c83b5f9c3a72cb65b2e59
Otherwise the gofer's attach point may be different from sandbox when there
symlinks in the path.
PiperOrigin-RevId: 219730492
Change-Id: Ia9c4c2d16228c6a1a9e790e0cb673fd881003fe1
Fluentd configuration uses 'log' for the log message
while containerd uses 'msg'. Since we can't have a single
JSON format for both, add another log format and make
debug log configurable.
PiperOrigin-RevId: 219729658
Change-Id: I2a6afc4034d893ab90bafc63b394c4fb62b2a7a0
Updated error messages so that it doesn't print full Go struct representations
when running a new container in a sandbox. For example, this occurs frequently
when commands are not found when doing a 'kubectl exec'.
PiperOrigin-RevId: 219729141
Change-Id: Ic3a7bc84cd7b2167f495d48a1da241d621d3ca09
From https://golang.org/pkg/net/http/#Get:
"When err is nil, resp always contains a non-nil resp.Body. Caller should close
resp.Body when done reading from it."
PiperOrigin-RevId: 219658052
Change-Id: I556e88ac4f2c90cd36ab16cd3163d1a52afc32b7
With recent changes to 9P server, path walks are now safe inside
open, create, rename and setattr calls. To simplify the code, remove
the lazyopen=false mode that was used for bind mounts, and converge
all mounts to using lazy open.
PiperOrigin-RevId: 219508628
Change-Id: I073e7e1e2e9a9972d150eaf4cb29e553997a9b76
Use private futexes for performance and to align with other runtime uses.
PiperOrigin-RevId: 219422634
Change-Id: Ief2af5e8302847ea6dc246e8d1ee4d64684ca9dd
It can be occurred if two controllers are mounted together or if Uninstall() is called on a error path.
PiperOrigin-RevId: 218723886
Change-Id: I69d7a3c0685a7da38527ea8b7b301dbe96268285
This change also adds extensive testing to the p9 package via mocks. The sanity
checks and type checks are moved from the gofer into the core package, where
they can be more easily validated.
PiperOrigin-RevId: 218296768
Change-Id: I4fc3c326e7bf1e0e140a454cbacbcc6fd617ab55
Errors are shown as being ignored by assigning to the blank identifier.
PiperOrigin-RevId: 218103819
Change-Id: I7cc7b9d8ac503a03de5504ebdeb99ed30a531cf2
Added events for *ctl syscalls that may have multiple different commands.
For runsc, each syscall event is only logged once. For *ctl syscalls, use
the cmd as identifier, not only the syscall number.
PiperOrigin-RevId: 218015941
Change-Id: Ie3c19131ae36124861e9b492a7dbe1765d9e5e59
It's hard to resolve symlinks inside the sandbox because rootfs and mounts
may be read-only, forcing us to create mount points inside lower layer of an
overlay, **before** the volumes are mounted.
Since the destination must already be resolved outside the sandbox when creating
mounts, take this opportunity to rewrite the spec with paths resolved.
"runsc boot" will use the "resolved" spec to load mounts. In addition, symlink
traversals were disabled while mounting containers inside the sandbox.
It haven't been able to write a good test for it. So I'm relying on manual tests
for now.
PiperOrigin-RevId: 217749904
Change-Id: I7ac434d5befd230db1488446cda03300cc0751a9
We were closing the FD directly. If the test then created a new socket pair
with the same FD, in-flight RPCs would get directed to the new socket and break
the test.
Instead, we should use unet.Socket.Close(), which allows any in-flight RPCs to
finish.
PiperOrigin-RevId: 217608491
Change-Id: I8c5a76638899ba30f33ca976e6fac967fa0aadbf
Now containers run with "docker run -it" support control characters like ^C and
^Z.
This required refactoring our signal handling a bit. Signals delivered to the
"runsc boot" process are turned into loader.Signal calls with the appropriate
delivery mode. Previously they were always sent directly to PID 1.
PiperOrigin-RevId: 217566770
Change-Id: I5b7220d9a0f2b591a56335479454a200c6de8732
--pid allows specific processes to be signalled rather than the container root
process or all processes in the container. containerd needs to SIGKILL exec'd
processes that timeout and check whether processes are still alive.
PiperOrigin-RevId: 217547636
Change-Id: I2058ebb548b51c8eb748f5884fb88bad0b532e45
It has timed out running with kokoro a few times. I passes
consistently on my machine (200+ runsc). Increase the timeout
to see if it helps.
Failure: image_test.go:212: WaitForHTTP() timeout: Get http://localhost:32785/: dial tcp [::1]:32785: connect: connection refused
PiperOrigin-RevId: 217532428
Change-Id: Ibf860aecf537830bef832e436f2e804b3fc12f2d
This is one of the many tests that fails periodically, making Kokoro unstable.
PiperOrigin-RevId: 217528257
Change-Id: I2508ecf4d74d71b91feff1183544d61d7bd16995
Sometimes if we try to remove the cgroup directory too soon after killing the
sandbox we EBUSY. This CL adds a retry (up to 5 seconds) for removing.
Deflakes ChrootTest.
PiperOrigin-RevId: 217526909
Change-Id: I749bb172117e2298c9888ecad094072393b94810
We treat handle the boot process stdio separately from the application stdio
(which gets passed via flags), but we were still sending both to same place. As
a result, some logs that are written directly to os.Stderr by the boot process
were ending up in the application logs.
This CL starts sendind boot process stdio to the null device (since we don't
have any better options). The boot process is already configured to send all
logs (and panics) to the log file, so we won't miss anything important.
PiperOrigin-RevId: 217173020
Change-Id: I5ab980da037f34620e7861a3736ba09c18d73794
The spec command is analygous to the 'runc spec' command and allows for
the convenient creation of a config.json file for users that don't have
runc handy.
Change-Id: Ifdfec37e023048ea461c32da1a9042a45b37d856
PiperOrigin-RevId: 216907826
It's possible for Start() and Wait() calls to race, if the sandboxed
application is short-lived. If the application finishes before (or during) the
Wait RPC, then Wait will fail. In practice this looks like "connection
refused" or "EOF" errors when waiting for an RPC response.
This race is especially bad in tests, where we often run "true" inside a
sandbox.
This CL does a best-effort fix, by returning the sandbox exit status as the
container exit status. In most cases, these are the same.
This fixes the remaining flakes in runsc/container:container_test.
PiperOrigin-RevId: 216777793
Change-Id: I9dfc6e6ec885b106a736055bc7a75b2008dfff7a
This is a breaking change if you're using --debug-log-dir.
The fix is to replace it with --debug-log and add a '/' at
the end:
--debug-log-dir=/tmp/runsc ==> --debug-log=/tmp/runsc/
PiperOrigin-RevId: 216761212
Change-Id: I244270a0a522298c48115719fa08dad55e34ade1
This change introduces a new flags to create/run called
--user-log. Logs to this files are visible to users and
are meant to help debugging problems with their images
and containers.
For now only unsupported syscalls are sent to this log,
and only minimum support was added. We can build more
infrastructure around it as needed.
PiperOrigin-RevId: 216735977
Change-Id: I54427ca194604991c407d49943ab3680470de2d0
When setting Cmd.SysProcAttr.Ctty, the FD must be the FD of the controlling TTY
in the new process, not the current process. The ioctl call is made after
duping all FDs in Cmd.ExtraFiles, which may stomp on the old TTY FD.
This fixes the "bad address" flakes in runsc/container:container_test, although
some other flakes remain.
PiperOrigin-RevId: 216594394
Change-Id: Idfd1677abb866aa82ad7e8be776f0c9087256862
Currently, in the face of FileMem fragmentation and a large sendmsg or
recvmsg call, host sockets may pass > 1024 iovecs to the host, which
will immediately cause the host to return EMSGSIZE.
When we detect this case, use a single intermediate buffer to pass to
the kernel, copying to/from the src/dst buffer.
To avoid creating unbounded intermediate buffers, enforce message size
checks and truncation w.r.t. the send buffer size. The same
functionality is added to netstack unix sockets for feature parity.
PiperOrigin-RevId: 216590198
Change-Id: I719a32e71c7b1098d5097f35e6daf7dd5190eff7
Sandbox creation uses the limits and reservations configured in the
OCI spec and set cgroup options accordinly. Then it puts both the
sandbox and gofer processes inside the cgroup.
It also allows the cgroup to be pre-configured by the caller. If the
cgroup already exists, sandbox and gofer processes will join the
cgroup but it will not modify the cgroup with spec limits.
PiperOrigin-RevId: 216538209
Change-Id: If2c65ffedf55820baab743a0edcfb091b89c1019
We were previously only sending to the originator of the process group.
Integration test was changed to test this behavior. It fails without the
corresponding code change.
PiperOrigin-RevId: 216297263
Change-Id: I7e41cfd6bdd067f4b9dc215e28f555fb5088916f
Docker and Containerd both eat the boot processes stderr, making it difficult
to track down panics (which are always written to stderr).
This CL makes the boot process dup its debug log FD to stderr, so that panics
will be captured in the debug log, which is better than nothing.
This is the 3rd try at this CL. Previous attempts were foiled because Docker
expects the 'create' command to pass its stdio directly to the container, so
duping stderr in 'create' caused the applications stderr to go to the log file,
which breaks many applications (including our mysql test).
I added a new image_test that makes sure stdout and stderr are handled
correctly.
PiperOrigin-RevId: 215767328
Change-Id: Icebac5a5dcf39b623b79d7a0e2f968e059130059
Sandbox was setting chroot, but was not chaging the working
dir. Added test to ensure this doesn't happen in the future.
PiperOrigin-RevId: 215676270
Change-Id: I14352d3de64a4dcb90e50948119dc8328c9c15e1
This can happen if an error is encountered during Create() which causes the
container to be destroyed and set to state Stopped.
Without this transition, errors during Create get hidden by the later panic.
PiperOrigin-RevId: 215599193
Change-Id: Icd3f42e12c685cbf042f46b3929bccdf30ad55b0
We add an additional (2^3)-1=7 processes, but the code was only waiting for 3.
I switched back to Math.Pow format to make the arithmetic easier to inspect.
PiperOrigin-RevId: 215588140
Change-Id: Iccad4d6f977c1bfc5c4b08d3493afe553fe25733
Docker and containerd do not expose runsc's stderr, so tracking down sentry
panics can be painful.
If we have a debug log file, we should send panics (and all stderr data) to the
log file.
PiperOrigin-RevId: 215585559
Change-Id: I3844259ed0cd26e26422bcdb40dded302740b8b6
We were previously using the sandbox process's stdio as the root container's
stdio. This makes it difficult/impossible to distinguish output application
output from sandbox output, such as panics, which are always written to stderr.
Also close the console socket when we are done with it.
PiperOrigin-RevId: 215585180
Change-Id: I980b8c69bd61a8b8e0a496fd7bc90a06446764e0