Commit Graph

653 Commits

Author SHA1 Message Date
Ian Gudger 6922eee649 Merge queue into Unix transport
This queue only has a single user, so there is no need for it to use an
interface. Merging it into the same package as its sole user allows us to avoid
a circular dependency.

This simplifies the code and should slightly improve performance.

PiperOrigin-RevId: 217595889
Change-Id: Iabbd5164240b935f79933618c61581bc8dcd2822
2018-10-17 15:10:20 -07:00
Nicolas Lacasse e4277cb6ff Relativize all socket paths in tests.
Otherwise they may exceed the maximum.

PiperOrigin-RevId: 217584658
Change-Id: I869e400d3409599c0d3b85c6590702c052f49550
2018-10-17 14:11:30 -07:00
Ian Gudger 8c85f5e9ce Fix typos in socket_test
PiperOrigin-RevId: 217576188
Change-Id: I82e45c306c5c9161e207311c7dbb8a983820c1df
2018-10-17 13:25:45 -07:00
Michael Pratt 8fa6f6fe76 Reflow comment to 80 columns
PiperOrigin-RevId: 217573168
Change-Id: Ic1914d0ef71bab020e3ee11cf9c4a50a702bd8dd
2018-10-17 13:06:16 -07:00
Nicolas Lacasse 4e6f0892c9 runsc: Support job control signals for the root container.
Now containers run with "docker run -it" support control characters like ^C and
^Z.

This required refactoring our signal handling a bit. Signals delivered to the
"runsc boot" process are turned into loader.Signal calls with the appropriate
delivery mode. Previously they were always sent directly to PID 1.

PiperOrigin-RevId: 217566770
Change-Id: I5b7220d9a0f2b591a56335479454a200c6de8732
2018-10-17 12:29:05 -07:00
Michael Pratt 578fe5a50d Fix PTRACE_GETREGSET write size
The existing logic is backwards and writes iov_len == 0 for a full write.

PiperOrigin-RevId: 217560377
Change-Id: I5a39c31bf0ba9063a8495993bfef58dc8ab7c5fa
2018-10-17 11:53:04 -07:00
Ian Gudger 6cba410df0 Move Unix transport out of netstack
PiperOrigin-RevId: 217557656
Change-Id: I63d27635b1a6c12877279995d2d9847b6a19da9b
2018-10-17 11:37:51 -07:00
Kevin Krakauer 8cbca46b6d Remove incorrect TODO.
PiperOrigin-RevId: 217548429
Change-Id: Ie640c881fdc4fc70af58c8ca834df1ac531e519a
2018-10-17 10:55:34 -07:00
Kevin Krakauer 9b3550f70b runsc: Add --pid flag to runsc kill.
--pid allows specific processes to be signalled rather than the container root
process or all processes in the container. containerd needs to SIGKILL exec'd
processes that timeout and check whether processes are still alive.

PiperOrigin-RevId: 217547636
Change-Id: I2058ebb548b51c8eb748f5884fb88bad0b532e45
2018-10-17 10:51:39 -07:00
Zhaozhong Ni 9d17eba121 compressio: do not schedule new I/Os when there is no worker (stream closed).
PiperOrigin-RevId: 217536677
Change-Id: Ib9a5a2542df12d0bc5592b91463ffd646e2ec295
2018-10-17 09:57:57 -07:00
Fabricio Voznika ba33a70e47 Attempt to deflake TestPythonHello
It has timed out running with kokoro a few times. I passes
consistently on my machine (200+ runsc). Increase the timeout
to see if it helps.

Failure: image_test.go:212: WaitForHTTP() timeout: Get http://localhost:32785/: dial tcp [::1]:32785: connect: connection refused
PiperOrigin-RevId: 217532428
Change-Id: Ibf860aecf537830bef832e436f2e804b3fc12f2d
2018-10-17 09:31:00 -07:00
Nicolas Lacasse bdcf8d143e Bump Pause/Resume integration test timeout in attempt to deflake Kokoro.
This is one of the many tests that fails periodically, making Kokoro unstable.

PiperOrigin-RevId: 217528257
Change-Id: I2508ecf4d74d71b91feff1183544d61d7bd16995
2018-10-17 09:09:29 -07:00
Nicolas Lacasse 4fae756645 Make removing cgroups retry up to 5 seconds.
Sometimes if we try to remove the cgroup directory too soon after killing the
sandbox we EBUSY. This CL adds a retry (up to 5 seconds) for removing.

Deflakes ChrootTest.

PiperOrigin-RevId: 217526909
Change-Id: I749bb172117e2298c9888ecad094072393b94810
2018-10-17 09:03:01 -07:00
Nicolas Lacasse fb46292778 Bump rules_go and gazelle versions.
PiperOrigin-RevId: 217526027
Change-Id: I21261f5172d8eb50820f1e9f1624d24603089f12
2018-10-17 08:55:41 -07:00
Nicolas Lacasse cea51641d4 Bump sandbox start and stop timeouts.
PiperOrigin-RevId: 217433699
Change-Id: Icef08285728c23ee7dd650706aaf18da51c25dff
2018-10-16 20:34:10 -07:00
Ian Gudger 324ad3564b Refactor host.ConnectedEndpoint
* Integrate recvMsg and sendMsg functions into Recv and Send respectively as
  they are no longer shared.
* Clean up partial read/write error handling code.
* Re-order code to make sense given that there is no longer a host.endpoint
  type.

PiperOrigin-RevId: 217255072
Change-Id: Ib43fe9286452f813b8309d969be11f5fa40694cd
2018-10-15 20:23:18 -07:00
Ian Gudger 167f2401c4 Merge host.endpoint into host.ConnectedEndpoint
host.endpoint contained duplicated logic from the sockerpair implementation and
host.ConnectedEndpoint. Remove host.endpoint in favor of a
host.ConnectedEndpoint wrapped in a socketpair end.

PiperOrigin-RevId: 217240096
Change-Id: I4a3d51e3fe82bdf30e2d0152458b8499ab4c987c
2018-10-15 17:48:11 -07:00
Nicolas Lacasse ecd94ea7a6 Clean up Rename and Unlink checks for EBUSY.
- Change Dirent.Busy => Dirent.isMountPoint. The function body is unchanged,
  and it is no longer exported.

- fs.MayDelete now checks that the victim is not the process root. This aligns
  with Linux's namei.c:may_delete().

- Fix "is-ancestor" checks to actually compare all ancestors, not just the
  parents.

- Fix handling of paths that end in dots, which are handled differently in
  Rename vs. Unlink.

PiperOrigin-RevId: 217239274
Change-Id: I7a0eb768e70a1b2915017ce54f7f95cbf8edf1fb
2018-10-15 17:42:30 -07:00
Nicolas Lacasse 3f05325956 Never send boot process stdio to application stdio.
We treat handle the boot process stdio separately from the application stdio
(which gets passed via flags), but we were still sending both to same place. As
a result, some logs that are written directly to os.Stderr by the boot process
were ending up in the application logs.

This CL starts sendind boot process stdio to the null device (since we don't
have any better options). The boot process is already configured to send all
logs (and panics) to the log file, so we won't miss anything important.

PiperOrigin-RevId: 217173020
Change-Id: I5ab980da037f34620e7861a3736ba09c18d73794
2018-10-15 11:08:49 -07:00
Zhaozhong Ni 4ea69fce8d sentry: save fs.Dirent deleted info.
PiperOrigin-RevId: 217155458
Change-Id: Id3265b1ec784787039e2131c80254ac4937330c7
2018-10-15 09:31:32 -07:00
Kevin Krakauer 47d3862c33 runsc: Support retrieving MTU via netdevice ioctl.
This enables ifconfig to display MTU.

PiperOrigin-RevId: 216917021
Change-Id: Id513b23d9d76899bcb71b0b6a25036f41629a923
2018-10-12 13:58:32 -07:00
Ian Lewis a771775f3a Added spec command to create OCI spec config.json
The spec command is analygous to the 'runc spec' command and allows for
the convenient creation of a config.json file for users that don't have
runc handy.

Change-Id: Ifdfec37e023048ea461c32da1a9042a45b37d856
PiperOrigin-RevId: 216907826
2018-10-12 12:59:49 -07:00
Fabricio Voznika f074f0c2c7 Make the gofer process enter namespaces
This is done to further isolate the gofer from the host.

PiperOrigin-RevId: 216790991
Change-Id: Ia265b77e4e50f815d08f743a05669f9d75ad7a6f
2018-10-11 17:45:51 -07:00
Nicolas Lacasse 3bc5e6482b Fix reference leak in tests.
PiperOrigin-RevId: 216780438
Change-Id: Ide637fe36f8d2a61fea9e5b16d1b3401f2540416
2018-10-11 16:23:54 -07:00
Nicolas Lacasse ea5f6ed6ec Make Wait() return the sandbox exit status if the sandbox has exited.
It's possible for Start() and Wait() calls to race, if the sandboxed
application is short-lived. If the application finishes before (or during) the
Wait RPC, then Wait will fail.  In practice this looks like "connection
refused" or "EOF" errors when waiting for an RPC response.

This race is especially bad in tests, where we often run "true" inside a
sandbox.

This CL does a best-effort fix, by returning the sandbox exit status as the
container exit status.  In most cases, these are the same.

This fixes the remaining flakes in runsc/container:container_test.

PiperOrigin-RevId: 216777793
Change-Id: I9dfc6e6ec885b106a736055bc7a75b2008dfff7a
2018-10-11 16:07:05 -07:00
Fabricio Voznika 86680fa002 Add String() method to AddressMask
PiperOrigin-RevId: 216770391
Change-Id: Idcdc28b2fe9e1b0b63b8119d445f05a8bcbce81e
2018-10-11 15:22:02 -07:00
Fabricio Voznika e68d86e1bd Make debug log file name configurable
This is a breaking change if you're using --debug-log-dir.
The fix is to replace it with --debug-log and add a '/' at
the end:
  --debug-log-dir=/tmp/runsc ==> --debug-log=/tmp/runsc/

PiperOrigin-RevId: 216761212
Change-Id: I244270a0a522298c48115719fa08dad55e34ade1
2018-10-11 14:29:37 -07:00
Adin Scannell 96c68b36f6 Add client sanity checking for P9.
This should reduce use-after-free errors and accidental close via create or
remove. This change includes one functional fix as well: when closing via
remove, the closed field was not set and the finalizer was not freed, so the
file would have been clunked at some random point in the future.

PiperOrigin-RevId: 216750000
Change-Id: Ice3292c6feb953ae97abac308afbafd2d9410402
2018-10-11 13:23:59 -07:00
Fabricio Voznika d40d801069 Sandbox cgroup tests
Verify that cgroup is being properly set.

PiperOrigin-RevId: 216736137
Change-Id: I0e27fd604eca67e7dd2e3548dc372ca9cc416309
2018-10-11 11:58:15 -07:00
Fabricio Voznika f413e4b117 Add bare bones unsupported syscall logging
This change introduces a new flags to create/run called
--user-log. Logs to this files are visible to users and
are meant to help debugging problems with their images
and containers.

For now only unsupported syscalls are sent to this log,
and only minimum support was added. We can build more
infrastructure around it as needed.

PiperOrigin-RevId: 216735977
Change-Id: I54427ca194604991c407d49943ab3680470de2d0
2018-10-11 11:56:54 -07:00
Zhaozhong Ni 0bfa03d61c sentry: allow saving of unlinked files with open fds on virtual fs.
PiperOrigin-RevId: 216733414
Change-Id: I33cd3eb818f0c39717d6656fcdfff6050b37ebb0
2018-10-11 11:41:44 -07:00
Adin Scannell 463e73d46d Add seccomp filter configuration to ptrace stubs.
This is a defense-in-depth measure. If the sentry is compromised, this prevents
system call injection to the stubs. There is some complexity with respect to
ptrace and seccomp interactions, so this protection is not really available
for kernel versions < 4.8; this is detected dynamically.

Note that this also solves the vsyscall emulation issue by adding in
appropriate trapping for those system calls. It does mean that a compromised
sentry could theoretically inject these into the stub (ignoring the trap and
resume, thereby allowing execution), but they are harmless.

PiperOrigin-RevId: 216647581
Change-Id: Id06c232cbac1f9489b1803ec97f83097fcba8eb8
2018-10-10 22:40:28 -07:00
Kevin Krakauer e21ba16d9c Removes irrelevant TODO.
PiperOrigin-RevId: 216616873
Change-Id: I4d974ab968058eadd01542081e18a987ef08f50a
2018-10-10 16:50:59 -07:00
Nicolas Lacasse 1939cd020f runsc: Pass controlling TTY by FD in the *new* process, not current process.
When setting Cmd.SysProcAttr.Ctty, the FD must be the FD of the controlling TTY
in the new process, not the current process. The ioctl call is made after
duping all FDs in Cmd.ExtraFiles, which may stomp on the old TTY FD.

This fixes the "bad address" flakes in runsc/container:container_test, although
some other flakes remain.

PiperOrigin-RevId: 216594394
Change-Id: Idfd1677abb866aa82ad7e8be776f0c9087256862
2018-10-10 14:35:03 -07:00
Jonathan Giannuzzi 8388a505e7 Support for older Linux kernels without getrandom
Change-Id: I1fb9f5b47a264a7617912f6f56f995f3c4c5e578
PiperOrigin-RevId: 216591484
2018-10-10 14:18:47 -07:00
Michael Pratt ddb34b3690 Enforce message size limits and avoid host calls with too many iovecs
Currently, in the face of FileMem fragmentation and a large sendmsg or
recvmsg call, host sockets may pass > 1024 iovecs to the host, which
will immediately cause the host to return EMSGSIZE.

When we detect this case, use a single intermediate buffer to pass to
the kernel, copying to/from the src/dst buffer.

To avoid creating unbounded intermediate buffers, enforce message size
checks and truncation w.r.t. the send buffer size. The same
functionality is added to netstack unix sockets for feature parity.

PiperOrigin-RevId: 216590198
Change-Id: I719a32e71c7b1098d5097f35e6daf7dd5190eff7
2018-10-10 14:10:17 -07:00
Nicolas Lacasse b78552d30e When creating a new process group, add it to the session.
PiperOrigin-RevId: 216554791
Change-Id: Ia6b7a2e6eaad80a81b2a8f2e3241e93ebc2bda35
2018-10-10 10:42:11 -07:00
Fabricio Voznika 29cd05a7c6 Add sandbox to cgroup
Sandbox creation uses the limits and reservations configured in the
OCI spec and set cgroup options accordinly. Then it puts both the
sandbox and gofer processes inside the cgroup.

It also allows the cgroup to be pre-configured by the caller. If the
cgroup already exists, sandbox and gofer processes will join the
cgroup but it will not modify the cgroup with spec limits.

PiperOrigin-RevId: 216538209
Change-Id: If2c65ffedf55820baab743a0edcfb091b89c1019
2018-10-10 09:00:42 -07:00
Fabricio Voznika 20508bafb8 Add tests to verify gofer is chroot'ed
PiperOrigin-RevId: 216472439
Change-Id: Ic4cb86c8e0a9cb022d3ceed9dc5615266c307cf9
2018-10-09 21:07:14 -07:00
Ian Gudger c36d2ef373 Add new netstack metrics to the sentry
PiperOrigin-RevId: 216431260
Change-Id: Ia6e5c8d506940148d10ff2884cf4440f470e5820
2018-10-09 15:12:44 -07:00
Brian Geffon acf7a95189 Add memunit to sysinfo(2).
Also properly add padding after Procs in the linux.Sysinfo
structure. This will be implicitly padded to 64bits so we
need to do the same.

PiperOrigin-RevId: 216372907
Change-Id: I6eb6a27800da61d8f7b7b6e87bf0391a48fdb475
2018-10-09 09:52:14 -07:00
Nicolas Lacasse ae5122eb87 Job control signals must be sent to all processes in the FG process group.
We were previously only sending to the originator of the process group.

Integration test was changed to test this behavior. It fails without the
corresponding code change.

PiperOrigin-RevId: 216297263
Change-Id: I7e41cfd6bdd067f4b9dc215e28f555fb5088916f
2018-10-08 20:48:54 -07:00
Michael Pratt b8048f75da Uncapitalize error
PiperOrigin-RevId: 216281263
Change-Id: Ie0c189e7f5934b77c6302336723bc1181fd2866c
2018-10-08 17:44:39 -07:00
Michael Pratt 569c2b06c4 Statfs Namelen should be NAME_MAX not PATH_MAX
We accidentally set the wrong maximum. I've also added PATH_MAX and
NAME_MAX to the linux abi package.

PiperOrigin-RevId: 216221311
Change-Id: I44805fcf21508831809692184a0eba4cee469633
2018-10-08 11:39:54 -07:00
Jamie Liu e9e8be6613 Implement shared futexes.
- Shared futex objects on shared mappings are represented by Mappable +
  offset, analogous to Linux's use of inode + offset. Add type
  futex.Key, and change the futex.Manager bucket API to use futex.Keys
  instead of addresses.

- Extend the futex.Checker interface to be able to return Keys for
  memory mappings. It returns Keys rather than just mappings because
  whether the address or the target of the mapping is used in the Key
  depends on whether the mapping is MAP_SHARED or MAP_PRIVATE; this
  matters because using mapping target for a futex on a MAP_PRIVATE
  mapping causes it to stop working across COW-breaking.

- futex.Manager.WaitComplete depends on atomic updates to
  futex.Waiter.addr to determine when it has locked the right bucket,
  which is much less straightforward for struct futex.Waiter.key. Switch
  to an atomically-accessed futex.Waiter.bucket pointer.

- futex.Manager.Wake now needs to take a futex.Checker to resolve
  addresses for shared futexes. CLONE_CHILD_CLEARTID requires the exit
  path to perform a shared futex wakeup (Linux:
  kernel/fork.c:mm_release() => sys_futex(tsk->clear_child_tid,
  FUTEX_WAKE, ...)). This is a problem because futexChecker is in the
  syscalls/linux package. Move it to kernel.

PiperOrigin-RevId: 216207039
Change-Id: I708d68e2d1f47e526d9afd95e7fed410c84afccf
2018-10-08 10:20:38 -07:00
Nicolas Lacasse 4a00ea557c Capture boot panics in debug log.
Docker and Containerd both eat the boot processes stderr, making it difficult
to track down panics (which are always written to stderr).

This CL makes the boot process dup its debug log FD to stderr, so that panics
will be captured in the debug log, which is better than nothing.

This is the 3rd try at this CL.  Previous attempts were foiled because Docker
expects the 'create' command to pass its stdio directly to the container, so
duping stderr in 'create' caused the applications stderr to go to the log file,
which breaks many applications (including our mysql test).

I added a new image_test that makes sure stdout and stderr are handled
correctly.

PiperOrigin-RevId: 215767328
Change-Id: Icebac5a5dcf39b623b79d7a0e2f968e059130059
2018-10-04 11:01:44 -07:00
Fabricio Voznika 3f46f2e501 Fix sandbox chroot
Sandbox was setting chroot, but was not chaging the working
dir. Added test to ensure this doesn't happen in the future.

PiperOrigin-RevId: 215676270
Change-Id: I14352d3de64a4dcb90e50948119dc8328c9c15e1
2018-10-03 20:44:20 -07:00
Ian Gudger beac59b37a Fix panic if FIOASYNC callback is registered and triggered without target
PiperOrigin-RevId: 215674589
Change-Id: I4f8871b64c570dc6da448d2fe351cec8a406efeb
2018-10-03 20:22:31 -07:00
Nicolas Lacasse e98b14b4aa Bump rules_go to v0.15.4 and go toolchain to v1.11.1.
PiperOrigin-RevId: 215664253
Change-Id: Ice2500e669194630c9d03903c35622afb92dcba5
2018-10-03 18:16:43 -07:00
Nicolas Lacasse 213f6688a5 Implement TIOCSCTTY ioctl as a noop.
PiperOrigin-RevId: 215658757
Change-Id: If63b33293f3e53a7f607ae72daa79e2b7ef6fcfd
2018-10-03 17:29:56 -07:00