Commit Graph

663 Commits

Author SHA1 Message Date
Brian Geffon acf7a95189 Add memunit to sysinfo(2).
Also properly add padding after Procs in the linux.Sysinfo
structure. This will be implicitly padded to 64bits so we
need to do the same.

PiperOrigin-RevId: 216372907
Change-Id: I6eb6a27800da61d8f7b7b6e87bf0391a48fdb475
2018-10-09 09:52:14 -07:00
Nicolas Lacasse ae5122eb87 Job control signals must be sent to all processes in the FG process group.
We were previously only sending to the originator of the process group.

Integration test was changed to test this behavior. It fails without the
corresponding code change.

PiperOrigin-RevId: 216297263
Change-Id: I7e41cfd6bdd067f4b9dc215e28f555fb5088916f
2018-10-08 20:48:54 -07:00
Michael Pratt b8048f75da Uncapitalize error
PiperOrigin-RevId: 216281263
Change-Id: Ie0c189e7f5934b77c6302336723bc1181fd2866c
2018-10-08 17:44:39 -07:00
Michael Pratt 569c2b06c4 Statfs Namelen should be NAME_MAX not PATH_MAX
We accidentally set the wrong maximum. I've also added PATH_MAX and
NAME_MAX to the linux abi package.

PiperOrigin-RevId: 216221311
Change-Id: I44805fcf21508831809692184a0eba4cee469633
2018-10-08 11:39:54 -07:00
Jamie Liu e9e8be6613 Implement shared futexes.
- Shared futex objects on shared mappings are represented by Mappable +
  offset, analogous to Linux's use of inode + offset. Add type
  futex.Key, and change the futex.Manager bucket API to use futex.Keys
  instead of addresses.

- Extend the futex.Checker interface to be able to return Keys for
  memory mappings. It returns Keys rather than just mappings because
  whether the address or the target of the mapping is used in the Key
  depends on whether the mapping is MAP_SHARED or MAP_PRIVATE; this
  matters because using mapping target for a futex on a MAP_PRIVATE
  mapping causes it to stop working across COW-breaking.

- futex.Manager.WaitComplete depends on atomic updates to
  futex.Waiter.addr to determine when it has locked the right bucket,
  which is much less straightforward for struct futex.Waiter.key. Switch
  to an atomically-accessed futex.Waiter.bucket pointer.

- futex.Manager.Wake now needs to take a futex.Checker to resolve
  addresses for shared futexes. CLONE_CHILD_CLEARTID requires the exit
  path to perform a shared futex wakeup (Linux:
  kernel/fork.c:mm_release() => sys_futex(tsk->clear_child_tid,
  FUTEX_WAKE, ...)). This is a problem because futexChecker is in the
  syscalls/linux package. Move it to kernel.

PiperOrigin-RevId: 216207039
Change-Id: I708d68e2d1f47e526d9afd95e7fed410c84afccf
2018-10-08 10:20:38 -07:00
Nicolas Lacasse 4a00ea557c Capture boot panics in debug log.
Docker and Containerd both eat the boot processes stderr, making it difficult
to track down panics (which are always written to stderr).

This CL makes the boot process dup its debug log FD to stderr, so that panics
will be captured in the debug log, which is better than nothing.

This is the 3rd try at this CL.  Previous attempts were foiled because Docker
expects the 'create' command to pass its stdio directly to the container, so
duping stderr in 'create' caused the applications stderr to go to the log file,
which breaks many applications (including our mysql test).

I added a new image_test that makes sure stdout and stderr are handled
correctly.

PiperOrigin-RevId: 215767328
Change-Id: Icebac5a5dcf39b623b79d7a0e2f968e059130059
2018-10-04 11:01:44 -07:00
Fabricio Voznika 3f46f2e501 Fix sandbox chroot
Sandbox was setting chroot, but was not chaging the working
dir. Added test to ensure this doesn't happen in the future.

PiperOrigin-RevId: 215676270
Change-Id: I14352d3de64a4dcb90e50948119dc8328c9c15e1
2018-10-03 20:44:20 -07:00
Ian Gudger beac59b37a Fix panic if FIOASYNC callback is registered and triggered without target
PiperOrigin-RevId: 215674589
Change-Id: I4f8871b64c570dc6da448d2fe351cec8a406efeb
2018-10-03 20:22:31 -07:00
Nicolas Lacasse e98b14b4aa Bump rules_go to v0.15.4 and go toolchain to v1.11.1.
PiperOrigin-RevId: 215664253
Change-Id: Ice2500e669194630c9d03903c35622afb92dcba5
2018-10-03 18:16:43 -07:00
Nicolas Lacasse 213f6688a5 Implement TIOCSCTTY ioctl as a noop.
PiperOrigin-RevId: 215658757
Change-Id: If63b33293f3e53a7f607ae72daa79e2b7ef6fcfd
2018-10-03 17:29:56 -07:00
Ian Gudger 4fef31f96c Add S/R support for FIOASYNC
PiperOrigin-RevId: 215655197
Change-Id: I668b1bc7c29daaf2999f8f759138bcbb09c4de6f
2018-10-03 17:03:09 -07:00
Nicolas Lacasse 9f2ba6ac3e Automated rollback of changelist 215585559
PiperOrigin-RevId: 215633475
Change-Id: I7bc471e3b9a2c725fb5e15b3bbcba2ee1ea574b1
2018-10-03 14:54:21 -07:00
Jamie Liu 8e729e0e1f Add //pkg/sync:generic_atomicptr.
PiperOrigin-RevId: 215620949
Change-Id: I519da4b44386d950443e5784fb8c48ff9a36c5d3
2018-10-03 13:52:15 -07:00
Nicolas Lacasse 7a6412cb0b runsc: Allow state transition from Creating to Stopped.
This can happen if an error is encountered during Create() which causes the
container to be destroyed and set to state Stopped.

Without this transition, errors during Create get hidden by the later panic.

PiperOrigin-RevId: 215599193
Change-Id: Icd3f42e12c685cbf042f46b3929bccdf30ad55b0
2018-10-03 11:49:40 -07:00
Nicolas Lacasse 37e57a903c Fix arithmetic error in multi_container_test.
We add an additional (2^3)-1=7 processes, but the code was only waiting for 3.

I switched back to Math.Pow format to make the arithmetic easier to inspect.

PiperOrigin-RevId: 215588140
Change-Id: Iccad4d6f977c1bfc5c4b08d3493afe553fe25733
2018-10-03 10:47:52 -07:00
Nicolas Lacasse 55d28fb124 runsc: Dup debug log file to stderr, so sentry panics don't get lost.
Docker and containerd do not expose runsc's stderr, so tracking down sentry
panics can be painful.

If we have a debug log file, we should send panics (and all stderr data) to the
log file.

PiperOrigin-RevId: 215585559
Change-Id: I3844259ed0cd26e26422bcdb40dded302740b8b6
2018-10-03 10:33:56 -07:00
Nicolas Lacasse e215b9970a runsc: Pass root container's stdio via FD.
We were previously using the sandbox process's stdio as the root container's
stdio. This makes it difficult/impossible to distinguish output application
output from sandbox output, such as panics, which are always written to stderr.

Also close the console socket when we are done with it.

PiperOrigin-RevId: 215585180
Change-Id: I980b8c69bd61a8b8e0a496fd7bc90a06446764e0
2018-10-03 10:32:03 -07:00
Fabricio Voznika 77e43adeab Add TIOCINQ to allowed seccomp when hostinet is used
PiperOrigin-RevId: 215574070
Change-Id: Ib36e804adebaf756adb9cbc2752be9789691530b
2018-10-03 09:32:54 -07:00
Nicolas Lacasse 0a13042d48 Bump some timeouts in the image tests.
PiperOrigin-RevId: 215489101
Change-Id: Iaf96aa8edb1101b70548030c62995841215237d9
2018-10-02 17:28:09 -07:00
Nicolas Lacasse cf3dc2f8a5 Fix compilation bug.
Docker.Run only returns a single argument.

PiperOrigin-RevId: 215427309
Change-Id: I1eebbc628853ca57f79d25e18d4f04dfa5a2a003
2018-10-02 11:36:50 -07:00
Nicolas Lacasse f1c01ed886 runsc: Support job control signals in "exec -it".
Terminal support in runsc relies on host tty file descriptors that are imported
into the sandbox. Application tty ioctls are sent directly to the host fd.

However, those host tty ioctls are associated in the host kernel with a host
process (in this case runsc), and the host kernel intercepts job control
characters like ^C and send signals to the host process. Thus, typing ^C into a
"runsc exec" shell will send a SIGINT to the runsc process.

This change makes "runsc exec" handle all signals, and forward them into the
sandbox via the "ContainerSignal" urpc method. Since the "runsc exec" is
associated with a particular container process in the sandbox, the signal must
be associated with the same container process.

One big difficulty is that the signal should not necessarily be sent to the
sandbox process started by "exec", but instead must be sent to the foreground
process group for the tty. For example, we may exec "bash", and from bash call
"sleep 100". A ^C at this point should SIGINT sleep, not bash.

To handle this, tty files inside the sandbox must keep track of their
foreground process group, which is set/get via ioctls. When an incoming
ContainerSignal urpc comes in, we look up the foreground process group via the
tty file. Unfortunately, this means we have to expose and cache the tty file in
the Loader.

Note that "runsc exec" now handles signals properly, but "runs run" does not.
That will come in a later CL, as this one is complex enough already.

Example:
	root@:/usr/local/apache2# sleep 100
	^C

	root@:/usr/local/apache2# sleep 100
	^Z
	[1]+  Stopped                 sleep 100

	root@:/usr/local/apache2# fg
	sleep 100
	^C

	root@:/usr/local/apache2#

PiperOrigin-RevId: 215334554
Change-Id: I53cdce39653027908510a5ba8d08c49f9cf24f39
2018-10-01 22:06:56 -07:00
Michael Pratt 0400e54592 Add itimer types to linux package, strace
PiperOrigin-RevId: 215278262
Change-Id: Icd10384c99802be6097be938196044386441e282
2018-10-01 14:16:53 -07:00
Nicolas Lacasse d185552e79 Fix ruby image tests.
PiperOrigin-RevId: 215274663
Change-Id: I051721f459084db3aa608432831170cd47ae7df0
2018-10-01 13:57:36 -07:00
Nicolas Lacasse 07aa040842 Fix possible panic in control.Processes.
There was a race where we checked task.Parent() != nil, and then later called
task.Parent() again, assuming that it is not nil.  If the task is exiting, the
parent may have been set to nil in between the two calls, causing a panic.

This CL changes the code to only call task.Parent() once.

PiperOrigin-RevId: 215274456
Change-Id: Ib5a537312c917773265ec72016014f7bc59a5f59
2018-10-01 13:56:07 -07:00
Fabricio Voznika a2ad8fef13 Make multi-container the default mode for runsc
And remove multicontainer option.

PiperOrigin-RevId: 215236981
Change-Id: I9fd1d963d987e421e63d5817f91a25c819ced6cb
2018-10-01 10:31:17 -07:00
Fabricio Voznika 43e6aff50e Don't fail if Root is readonly and is not a mount point
This makes runsc more friendly to run without docker or K8s.

PiperOrigin-RevId: 215165586
Change-Id: Id45a9fc24a3c09b1645f60dbaf70e64711a7a4cd
2018-09-30 23:23:03 -07:00
Fabricio Voznika 9c7eb13079 Removed duplicate/stale TODOs
PiperOrigin-RevId: 215162121
Change-Id: I35f06ac3235cf31c9e8a158dcf6261a7ded6c4c4
2018-09-30 22:22:18 -07:00
Fabricio Voznika 50c283b9f5 Add test for 'signall --all' with stopped container
PiperOrigin-RevId: 215025517
Change-Id: I04b9d8022b3d9dfe279e466ddb91310b9860b9af
2018-09-28 18:16:10 -07:00
Fabricio Voznika cfdd418fe2 Made a few changes to make testutil.Docker easier to use
PiperOrigin-RevId: 215023376
Change-Id: I139569bd15c013e5dd0f60d0c98a64eaa0ba9e8e
2018-09-28 17:48:14 -07:00
Lantao Liu f21dde5666 runsc: allow `kill --all` when container is in stopped state.
PiperOrigin-RevId: 215009105
Change-Id: I1ab12eddf7694c4db98f6dafca9dae352a33f7c4
2018-09-28 15:53:25 -07:00
Fabricio Voznika 49ff81a42b Add ruby image tests
PiperOrigin-RevId: 215009066
Change-Id: I54ab920fa649cf4d0817f7cb8ea76f9126523330
2018-09-28 15:52:33 -07:00
Fabricio Voznika 2496d9b4b6 Make runsc kill and delete more conformant to the "spec"
PiperOrigin-RevId: 214976251
Change-Id: I631348c3886f41f63d0e77e7c4f21b3ede2ab521
2018-09-28 12:22:21 -07:00
Googler fb65b0b471 Change tcpip.Route.Mask to tcpip.AddressMask.
PiperOrigin-RevId: 214975659
Change-Id: I7bd31a2c54f03ff52203109da312e4206701c44c
2018-09-28 12:18:15 -07:00
Michael Pratt e22c4cba47 Clarify CLA requirements and Gerrit error
Call out the error that Gerrit returns if there is no CLA on file.

PiperOrigin-RevId: 214964718
Change-Id: I3d92e3eb73f178e8c4c52b5defbe8d21db536215
2018-09-28 11:12:30 -07:00
Michael Pratt 3ff24b4f2c Require AF_UNIX sockets from the gofer
host.endpoint already has the check, but it is missing from
host.ConnectedEndpoint.

PiperOrigin-RevId: 214962762
Change-Id: I88bb13a5c5871775e4e7bf2608433df8a3d348e6
2018-09-28 11:03:11 -07:00
Sepehr Raissian c17ea8c6e2 Block for link address resolution
Previously, if address resolution for UDP or Ping sockets required sending
packets using Write in Transport layer, Resolve would return ErrWouldBlock
and Write would return ErrNoLinkAddress. Meanwhile startAddressResolution
would run in background. Further calls to Write using same address would also
return ErrNoLinkAddress until resolution has been completed successfully.

Since Write is not allowed to block and System Calls need to be
interruptible in System Call layer, the caller to Write is responsible for
blocking upon return of ErrWouldBlock.

Now, when startAddressResolution is called a notification channel for
the completion of the address resolution is returned.
The channel will traverse up to the calling function of Write as well as
ErrNoLinkAddress. Once address resolution is complete (success or not) the
channel is closed. The caller would call Write again to send packets and
check if address resolution was compeleted successfully or not.

Fixes google/gvisor#5

Change-Id: Idafaf31982bee1915ca084da39ae7bd468cebd93
PiperOrigin-RevId: 214962200
2018-09-28 11:00:16 -07:00
Fabricio Voznika cf226d48ce Switch to root in userns when CAP_SYS_CHROOT is also missing
Some tests check current capabilities and re-run the tests as root inside
userns if required capabibilities are missing. It was checking for
CAP_SYS_ADMIN only, CAP_SYS_CHROOT is also required now.

PiperOrigin-RevId: 214949226
Change-Id: Ic81363969fa76c04da408fae8ea7520653266312
2018-09-28 09:44:13 -07:00
Fabricio Voznika 6779bd1187 Merge Loader.containerRootTGs and execProcess into a single map
It's easier to manage a single map with processes that we're interested
to track. This will make the next change to clean up the map on destroy
easier.

PiperOrigin-RevId: 214894210
Change-Id: I099247323a0487cd0767120df47ba786fac0926d
2018-09-27 23:55:05 -07:00
Fabricio Voznika 1166c088fc Move common test code to function
PiperOrigin-RevId: 214890335
Change-Id: I42743f0ce46a5a42834133bce2f32d187194fc87
2018-09-27 22:53:18 -07:00
Nicolas Lacasse b709d23987 Forward ioctl(TCSETSF) calls on host ttys to the host kernel.
We already forward TCSETS and TCSETSW.  TCSETSF is roughly equivalent but
discards pending input.

The filters were relaxed to allow host ioctls with TCSETSF argument.

This fixes programs like "passwd" that prevent user input from being displayed
on the terminal.

Before:
	root@b8a0240fc836:/# passwd
	Enter new UNIX password: 123
	Retype new UNIX password: 123
	passwd: password updated successfully

After:
	root@ae6f5dabe402:/# passwd
	Enter new UNIX password:
	Retype new UNIX password:
	passwd: password updated successfully
PiperOrigin-RevId: 214869788
Change-Id: I31b4d1373c1388f7b51d0f2f45ce40aa8e8b0b58
2018-09-27 18:17:38 -07:00
Fabricio Voznika 491faac03b Implement 'runsc kill --all'
In order to implement kill --all correctly, the Sentry needs
to track all tasks that belong to a given container. This change
introduces ContainerID to the task, that gets inherited by all
children. 'kill --all' then iterates over all tasks comparing the
ContainerID field to find all processes that need to be signalled.

PiperOrigin-RevId: 214841768
Change-Id: I693b2374be8692d88cc441ef13a0ae34abf73ac6
2018-09-27 15:00:58 -07:00
Anton Gyllenberg 68ac2ad1e1 netstack: make go:linkname work for all architectures
The //go:linkname directive requires the presence of
assembly files in the package. Even an empty file will do.
There was an empty assembly file commit_arm64.s, but
that is limited to GOARCH=arm64. Renaming to empty.s will
remove the unnecessary build constraint and allow building
netstack for other architectures than amd64 and arm64.

Without this, building directly with go (not bazel)
for e.g., GOARCH=arm gives:

sleep/sleep_unsafe.go:88:6: missing function body
sleep/sleep_unsafe.go:91:6: missing function body

Change-Id: I29d1d13e1ff31506a174d4595b8cd57fa58bf52b
PiperOrigin-RevId: 214820299
2018-09-27 12:53:10 -07:00
Zhaozhong Ni 234f36b6f2 sentry: export cpuTime function.
PiperOrigin-RevId: 214798278
Change-Id: Id59d1ceb35037cda0689d3a1c4844e96c6957615
2018-09-27 12:52:25 -07:00
Fabricio Voznika b514ab0589 Refactor 'runsc boot' to take container ID as argument
This makes the flow slightly simpler (no need to call
Loader.SetRootContainer). And this is required change to tag
tasks with container ID inside the Sentry.

PiperOrigin-RevId: 214795210
Change-Id: I6ff4af12e73bb07157f7058bb15fd5bb88760884
2018-09-27 10:26:34 -07:00
Fabricio Voznika 6910ff3643 Move uds_test_app to common test_app
This was done so it's easier to add more functionality
to this file for other tests.

PiperOrigin-RevId: 214782043
Change-Id: I1f38b9ee1219b3ce7b789044ada8e52bdc1e6279
2018-09-27 08:58:23 -07:00
Fabricio Voznika fca9a390db Return correct parent PID
Old code was returning ID of the thread that created
the child process. It should be returning the ID of
the parent process instead.

PiperOrigin-RevId: 214720910
Change-Id: I95715c535bcf468ecf1ae771cccd04a4cd345b36
2018-09-26 22:00:04 -07:00
Lantao Liu a003e041c8 runsc: fix pid file race condition in exec detach mode.
PiperOrigin-RevId: 214700295
Change-Id: I73d8490572eebe5da584af91914650d1953aeb91
2018-09-26 17:41:20 -07:00
Tamir Duberstein 539df2940d Use the ICMP target address in responses
There is a subtle bug that is the result of two changes made when upstreaming
ICMPv6 support from Fuchsia:
1) ipv6.endpoint.WritePacket writes the local address it was initialized with,
rather than the provided route's local address
2) ipv6.endpoint.handleICMP doesn't set its route's local address to the ICMP
target address before writing the response

The result is that the ICMP response erroneously uses the target ipv6 address
(rather than icmp) as its source address in the response. When trying to debug
this by fixing (2), we ran into problems with bad ipv6 checksums because (1)
didn't respect the local address of the route being passed to it.

This fixes both problems.

PiperOrigin-RevId: 214650822
Change-Id: Ib6148bf432e6428d760ef9da35faef8e4b610d69
2018-09-26 12:41:04 -07:00
Tamir Duberstein bee264f0c5 Export ipv6 address helpers
This is useful for Fuchsia.

PiperOrigin-RevId: 214619681
Change-Id: If5a60dd82365c2eae51a12bbc819e5aae8c76ee9
2018-09-26 09:49:52 -07:00
Nicolas Lacasse d489336784 runsc: All non-root bind mounts should be shared.
This CL changes the semantics of the "--file-access" flag so that it only
affects the root filesystem.  The default remains "exclusive" which is the
common use case, as neither Docker nor K8s supports sharing the root.

Keeping the root fs as "exclusive" means that the fs-intensive work done during
application startup will mostly be cacheable, and thus faster.

Non-root bind mounts will always be shared.

This CL also removes some redundant FSAccessType validations.  We validate this
flag in main(), so we can assume it is valid afterwards.

PiperOrigin-RevId: 214359936
Change-Id: I7e75d7bf52dbd7fa834d0aacd4034868314f3b51
2018-09-24 17:22:15 -07:00