Commit Graph

576 Commits

Author SHA1 Message Date
Fabricio Voznika 6779bd1187 Merge Loader.containerRootTGs and execProcess into a single map
It's easier to manage a single map of the processes that we're interested
in tracking. This will make the next change, which cleans up the map on
destroy, easier.

PiperOrigin-RevId: 214894210
Change-Id: I099247323a0487cd0767120df47ba786fac0926d
2018-09-27 23:55:05 -07:00
Fabricio Voznika 1166c088fc Move common test code to function
PiperOrigin-RevId: 214890335
Change-Id: I42743f0ce46a5a42834133bce2f32d187194fc87
2018-09-27 22:53:18 -07:00
Nicolas Lacasse b709d23987 Forward ioctl(TCSETSF) calls on host ttys to the host kernel.
We already forward TCSETS and TCSETSW.  TCSETSF is roughly equivalent but
discards pending input.

The filters were relaxed to allow host ioctls with TCSETSF argument.

This fixes programs like "passwd" that prevent user input from being displayed
on the terminal.

Before:
	root@b8a0240fc836:/# passwd
	Enter new UNIX password: 123
	Retype new UNIX password: 123
	passwd: password updated successfully

After:
	root@ae6f5dabe402:/# passwd
	Enter new UNIX password:
	Retype new UNIX password:
	passwd: password updated successfully
PiperOrigin-RevId: 214869788
Change-Id: I31b4d1373c1388f7b51d0f2f45ce40aa8e8b0b58
2018-09-27 18:17:38 -07:00
Fabricio Voznika 491faac03b Implement 'runsc kill --all'
In order to implement kill --all correctly, the Sentry needs
to track all tasks that belong to a given container. This change
introduces a ContainerID on the task, which is inherited by all
children. 'kill --all' then iterates over all tasks comparing the
ContainerID field to find all processes that need to be signalled.

PiperOrigin-RevId: 214841768
Change-Id: I693b2374be8692d88cc441ef13a0ae34abf73ac6
2018-09-27 15:00:58 -07:00
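
The container-tagging approach described in the 'runsc kill --all' commit above
can be sketched roughly as follows. This is a minimal illustration, not the
Sentry's code: the task type, its fields, newChild and signalAll are all
hypothetical names invented for the example.

```go
package main

import "fmt"

// task is a hypothetical stand-in for a Sentry task. Only the fields needed
// for this illustration are modeled.
type task struct {
	pid         int
	containerID string
	signalled   bool
}

// newChild creates a child task that inherits the parent's container ID.
func (t *task) newChild(pid int) *task {
	return &task{pid: pid, containerID: t.containerID}
}

// signalAll marks every task whose container ID equals cid as signalled and
// returns how many tasks matched.
func signalAll(tasks []*task, cid string) int {
	n := 0
	for _, t := range tasks {
		if t.containerID == cid {
			t.signalled = true
			n++
		}
	}
	return n
}

func main() {
	rootA := &task{pid: 1, containerID: "a"}
	rootB := &task{pid: 2, containerID: "b"}
	tasks := []*task{rootA, rootB, rootA.newChild(3), rootA.newChild(4)}
	fmt.Println(signalAll(tasks, "a")) // prints 3
}
```

Because the ID is copied at creation time, no per-container process list has to
be maintained; a single pass over all tasks is enough.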
Anton Gyllenberg 68ac2ad1e1 netstack: make go:linkname work for all architectures
The //go:linkname directive requires the presence of
assembly files in the package. Even an empty file will do.
There was an empty assembly file commit_arm64.s, but
that is limited to GOARCH=arm64. Renaming to empty.s will
remove the unnecessary build constraint and allow building
netstack for other architectures than amd64 and arm64.

Without this, building directly with go (not bazel)
for e.g., GOARCH=arm gives:

sleep/sleep_unsafe.go:88:6: missing function body
sleep/sleep_unsafe.go:91:6: missing function body

Change-Id: I29d1d13e1ff31506a174d4595b8cd57fa58bf52b
PiperOrigin-RevId: 214820299
2018-09-27 12:53:10 -07:00
Zhaozhong Ni 234f36b6f2 sentry: export cpuTime function.
PiperOrigin-RevId: 214798278
Change-Id: Id59d1ceb35037cda0689d3a1c4844e96c6957615
2018-09-27 12:52:25 -07:00
Fabricio Voznika b514ab0589 Refactor 'runsc boot' to take container ID as argument
This makes the flow slightly simpler (no need to call
Loader.SetRootContainer). It is also a required change for tagging
tasks with the container ID inside the Sentry.

PiperOrigin-RevId: 214795210
Change-Id: I6ff4af12e73bb07157f7058bb15fd5bb88760884
2018-09-27 10:26:34 -07:00
Fabricio Voznika 6910ff3643 Move uds_test_app to common test_app
This was done so it's easier to add more functionality
to this file for other tests.

PiperOrigin-RevId: 214782043
Change-Id: I1f38b9ee1219b3ce7b789044ada8e52bdc1e6279
2018-09-27 08:58:23 -07:00
Fabricio Voznika fca9a390db Return correct parent PID
Old code was returning ID of the thread that created
the child process. It should be returning the ID of
the parent process instead.

PiperOrigin-RevId: 214720910
Change-Id: I95715c535bcf468ecf1ae771cccd04a4cd345b36
2018-09-26 22:00:04 -07:00
Lantao Liu a003e041c8 runsc: fix pid file race condition in exec detach mode.
PiperOrigin-RevId: 214700295
Change-Id: I73d8490572eebe5da584af91914650d1953aeb91
2018-09-26 17:41:20 -07:00
Tamir Duberstein 539df2940d Use the ICMP target address in responses
There is a subtle bug that is the result of two changes made when upstreaming
ICMPv6 support from Fuchsia:
1) ipv6.endpoint.WritePacket writes the local address it was initialized with,
rather than the provided route's local address
2) ipv6.endpoint.handleICMP doesn't set its route's local address to the ICMP
target address before writing the response

The result is that the ICMP response erroneously uses the target ipv6 address
(rather than icmp) as its source address in the response. When trying to debug
this by fixing (2), we ran into problems with bad ipv6 checksums because (1)
didn't respect the local address of the route being passed to it.

This fixes both problems.

PiperOrigin-RevId: 214650822
Change-Id: Ib6148bf432e6428d760ef9da35faef8e4b610d69
2018-09-26 12:41:04 -07:00
Tamir Duberstein bee264f0c5 Export ipv6 address helpers
This is useful for Fuchsia.

PiperOrigin-RevId: 214619681
Change-Id: If5a60dd82365c2eae51a12bbc819e5aae8c76ee9
2018-09-26 09:49:52 -07:00
Nicolas Lacasse d489336784 runsc: All non-root bind mounts should be shared.
This CL changes the semantics of the "--file-access" flag so that it only
affects the root filesystem.  The default remains "exclusive" which is the
common use case, as neither Docker nor K8s supports sharing the root.

Keeping the root fs as "exclusive" means that the fs-intensive work done during
application startup will mostly be cacheable, and thus faster.

Non-root bind mounts will always be shared.

This CL also removes some redundant FSAccessType validations.  We validate this
flag in main(), so we can assume it is valid afterwards.

PiperOrigin-RevId: 214359936
Change-Id: I7e75d7bf52dbd7fa834d0aacd4034868314f3b51
2018-09-24 17:22:15 -07:00
Ian Gudger 4094480b28 Remove unnecessary defer
PiperOrigin-RevId: 214073949
Change-Id: I8fab916cd77362c13dac2c9dcf2ecc1710d87a5e
2018-09-21 18:14:38 -07:00
Ian Gudger 7ce13ebcad Run gofmt -s on everything
PiperOrigin-RevId: 214040901
Change-Id: I74d79497a053da3624921ad2b7c5193ca4a87942
2018-09-21 14:06:59 -07:00
Tamir Duberstein 4634cd66ad Extend tcpip.Address.String to ipv6 addresses
PiperOrigin-RevId: 214039349
Change-Id: Ia7d09c5f85eddd1e5634f3c21b0bd60b10be6bd2
2018-09-21 13:58:31 -07:00
Nicolas Lacasse d260e808f4 The "action" in container.Signal should be "signal".
PiperOrigin-RevId: 214038776
Change-Id: I4ad212540ec4ef4fb5ab5fdcb7f0865c4f746895
2018-09-21 13:54:35 -07:00
Tamir Duberstein 95f30ef67b Deflake TestSimpleReceive
...by increasing the allotted timeout and using direct comparison rather than
reflect.DeepEqual (which should be faster).

PiperOrigin-RevId: 214027024
Change-Id: I0a2690e65c7e14b4cc118c7312dbbf5267dc78bc
2018-09-21 12:33:21 -07:00
Tamir Duberstein 7fa57ee579 Export read-only tcpip.Subnet.Mask
PiperOrigin-RevId: 214023383
Change-Id: I5a7572f949840fb68a3ffb7342e6a3524bd00864
2018-09-21 12:07:29 -07:00
Nicolas Lacasse b4321f4447 runsc: Synchronize container metadata changes with a file lock.
Each container has associated metadata (particularly the container status) that
is manipulated by various runsc commands. This metadata is stored in a file
identified by the container id.

Different runsc processes may manipulate the same container metadata, and each
will read/write to the metadata file.

This CL adds a file lock per container which must be held when reading the
container metadata file, and when modifying and writing the container metadata.

PiperOrigin-RevId: 214019179
Change-Id: Ice4390ad233bc7f216c9a9a6cf05fb456c9ec0ad
2018-09-21 11:42:06 -07:00
Fabricio Voznika b63c4bfe02 Set Sandbox.Chroot so it gets cleaned up upon destruction
I've made several attempts to create a test, but the lack of
permission from the test user makes it nearly impossible to
test anything useful.

PiperOrigin-RevId: 213922174
Change-Id: I5b502ca70cb7a6645f8836f028fb203354b4c625
2018-09-20 18:54:09 -07:00
Lantao Liu 8a938a3f9d runsc: allow `runsc wait` on a container multiple times.
PiperOrigin-RevId: 213908919
Change-Id: I74eff99a5360bb03511b946f4cb5658bb5fc40c7
2018-09-20 16:59:42 -07:00
Nicolas Lacasse cbaec4d614 Wait for all async fs operations to complete before returning from Destroy.
Destroy flushes dirent references, which triggers many async close operations.
We must wait for those to finish before returning from Destroy, otherwise we
may kill the gofer, causing a cascade of failing RPCs and leading to an
inconsistent FS state.

PiperOrigin-RevId: 213884637
Change-Id: Id054b47fc0f97adc5e596d747c08d3b97a1d1f71
2018-09-20 14:37:53 -07:00
Lantao Liu 9464b82a06 runsc: Fix a bug where `runsc wait` doesn't work after the container exits.
PiperOrigin-RevId: 213849165
Change-Id: I5120b2f568850c0c42a08e8706e7f8653ef1bd94
2018-09-20 11:23:26 -07:00
Kevin Krakauer ffb5fdd690 runsc: Fix stdin/stdout/stderr in multi-container mode.
The issue with the previous change was that the stdin/stdout/stderr passed to
the sentry were dup'd by host.ImportFile. This left a dangling FD that was never
closed, causing containerd to time out waiting for the container to stop.

PiperOrigin-RevId: 213753032
Change-Id: Ia5e4c0565c42c8610d3b59f65599a5643b0901e4
2018-09-19 22:20:41 -07:00
Nicolas Lacasse 915d76aa92 Add container.Destroy urpc method.
This method will:
1. Stop the container process if it is still running.
2. Unmount all sandbox-internal mounts for the container.
3. Delete the container root directory inside the sandbox.

Destroy is idempotent, and safe to call concurrently.

This fixes a bug where after stopping a container, we cannot unmount the
container root directory on the host. This bug occurred because the sandbox
dirent cache was holding a dirent with a host fd corresponding to a file inside
the container root on the host. The dirent cache did not know that the
container had exited, and kept the FD open, preventing us from unmounting on
the host.

Now that we unmount (and flush) all container mounts inside the sandbox, any
host FDs donated by the gofer will be closed, and we can unmount the container
root on the host.

PiperOrigin-RevId: 213737693
Change-Id: I28c0ff4cd19a08014cdd72fec5154497e92aacc9
2018-09-19 18:54:14 -07:00
Fabricio Voznika b873e388f3 Update gocapability commit to get bug fix
PiperOrigin-RevId: 213734203
Change-Id: I9cf5d3885fb88b41444c686168d4cab00f09988a
2018-09-19 18:17:14 -07:00
Kevin Krakauer 639226c3d9 runsc: Mark container_test flaky.
PiperOrigin-RevId: 213732520
Change-Id: Ife292987ec8b1de4c2e7e3b7d4452b00c1582e91
2018-09-19 18:03:35 -07:00
Ian Gudger 117ac8bc5b Fix data race on tcp.endpoint.hardError in tcp.(*endpoint).Read
tcp.endpoint.hardError is protected by tcp.endpoint.mu.

PiperOrigin-RevId: 213730698
Change-Id: I4e4f322ac272b145b500b1a652fbee0c7b985be2
2018-09-19 17:49:18 -07:00
Fabricio Voznika e395273301 Fix sandbox and gofer capabilities
Capabilities.Set() adds capabilities,
but doesn't remove existing ones that might have been loaded. Fixed
the code and added tests.

PiperOrigin-RevId: 213726369
Change-Id: Id7fa6fce53abf26c29b13b9157bb4c6616986fba
2018-09-19 17:15:14 -07:00
Nicolas Lacasse 2ad3228cd0 runsc: Don't create __runsc_containers__ unless we are in multi-container mode.
PiperOrigin-RevId: 213715511
Change-Id: I3e41b583c6138edbdeba036dfb9df4864134fc12
2018-09-19 16:10:47 -07:00
Bert Muthalaly 2e497de2d9 Pass local link address to DeliverNetworkPacket
This allows a NetworkDispatcher to implement transparent bridging,
assuming all implementations of LinkEndpoint.WritePacket call eth.Encode
with header.EthernetFields.SrcAddr set to the passed
Route.LocalLinkAddress, if it is provided.

PiperOrigin-RevId: 213686651
Change-Id: I446a4ac070970202f0724ef796ff1056ae4dd72a
2018-09-19 13:43:58 -07:00
Lingfu f0a92b6b67 Add docker command line args support for --cpuset-cpus and --cpus
`docker run --cpuset-cpus=` or `--cpus=` writes CPU resource info into
config.json (the runtime spec file). When nginx worker_connections is configured
as auto, workers are created according to the number of CPUs. If the cgroup
limit is already set on the host but is not reflected correctly in the sandbox,
performance may be degraded.

This patch reads the CPU info from the spec file and applies it to the sentry on
bootup, so that /proc/cpuinfo shows the correct number of CPUs. `lscpu` and
other commands that rely on `/sys/devices/system/cpu/online` are also affected
by this patch.

e.g.

--cpuset-cpus=2,3   ->  cpu number:2
--cpuset-cpus=4-7   ->  cpu number:4
--cpus=2.8          ->  cpu number:3
--cpus=0.5          ->  cpu number:1
Change-Id: Ideb22e125758d4322a12be7c51795f8018e3d316
PiperOrigin-RevId: 213685199
2018-09-19 13:35:42 -07:00
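
The four mappings listed in the --cpuset-cpus/--cpus commit above can be
reproduced with a short sketch. It assumes that --cpus reaches the runtime as
a CFS quota and period, so that --cpus=2.8 becomes quota 280000 with period
100000. The helpers cpusetCount and cpusFromQuota are hypothetical names and
are not taken from the patch.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// cpusetCount returns the number of CPUs named by a cpuset list such as
// "2,3" or "4-7". Overlapping entries are not de-duplicated in this sketch.
func cpusetCount(set string) (int, error) {
	n := 0
	for _, part := range strings.Split(set, ",") {
		bounds := strings.SplitN(strings.TrimSpace(part), "-", 2)
		first, err := strconv.Atoi(bounds[0])
		if err != nil {
			return 0, fmt.Errorf("invalid cpuset %q: %v", set, err)
		}
		last := first
		if len(bounds) == 2 {
			if last, err = strconv.Atoi(bounds[1]); err != nil {
				return 0, fmt.Errorf("invalid cpuset %q: %v", set, err)
			}
		}
		if first < 0 || last < first {
			return 0, fmt.Errorf("invalid cpuset range %q", part)
		}
		n += last - first + 1
	}
	return n, nil
}

// cpusFromQuota rounds quota/period up to a whole number of CPUs.
// It returns 0 when no limit is set.
func cpusFromQuota(quota, period int64) int {
	if quota <= 0 || period <= 0 {
		return 0
	}
	return int((quota + period - 1) / period)
}

func main() {
	for _, set := range []string{"2,3", "4-7"} {
		n, err := cpusetCount(set)
		if err != nil {
			panic(err)
		}
		fmt.Printf("--cpuset-cpus=%s -> cpu number:%d\n", set, n)
	}
	fmt.Printf("--cpus=2.8 -> cpu number:%d\n", cpusFromQuota(280000, 100000))
	fmt.Printf("--cpus=0.5 -> cpu number:%d\n", cpusFromQuota(50000, 100000))
}
```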
Bhasker Hariharan bd12e95247 Fix RTT estimation when timestamp option is enabled.
From RFC7323#Section-4

The [RFC6298] RTT estimator has weighting factors, alpha and beta, based on an
implicit assumption that at most one RTTM will be sampled per RTT.  When
multiple RTTMs per RTT are available to update the RTT estimator, an
implementation SHOULD try to adhere to the spirit of the history specified in
[RFC6298].  An implementation suggestion is detailed in Appendix G.

From RFC7323#appendix-G
Appendix G.  RTO Calculation Modification

   Taking multiple RTT samples per window would shorten the history calculated
   by the RTO mechanism in [RFC6298], and the below algorithm aims to maintain a
   similar history as originally intended by [RFC6298].

   It is roughly known how many samples a congestion window worth of data will
   yield, not accounting for ACK compression, and ACK losses.  Such events will
   result in more history of the path being reflected in the final value for
   RTO, and are uncritical.  This modification will ensure that a similar amount
   of time is taken into account for the RTO estimation, regardless of how many
   samples are taken per window:

      ExpectedSamples = ceiling(FlightSize / (SMSS * 2))

      alpha' = alpha / ExpectedSamples

      beta' = beta / ExpectedSamples

   Note that the factor 2 in ExpectedSamples is due to "Delayed ACKs".

   Instead of using alpha and beta in the algorithm of [RFC6298], use alpha' and
   beta' instead:

      RTTVAR <- (1 - beta') * RTTVAR + beta' * |SRTT - R'|

      SRTT <- (1 - alpha') * SRTT + alpha' * R'

      (for each sample R')

PiperOrigin-RevId: 213644795
Change-Id: I52278b703540408938a8edb8c38be97b37f4a10e
2018-09-19 09:59:12 -07:00
Fabricio Voznika 8aec7473a1 Added state machine checks for Container.Status
For my own sanity when thinking about possible transitions and state.

PiperOrigin-RevId: 213559482
Change-Id: I25588c86cf6098be4eda01f4e7321c102ceef33c
2018-09-18 19:12:54 -07:00
Nicolas Lacasse fd222d62ed Short-circuit Readdir calls on overlay files when the dirent is frozen.
If we have an overlay file whose corresponding Dirent is frozen, then we should
not bother calling Readdir on the upper or lower files, since DirentReaddir
will calculate children based on the frozen Dirent tree.

A test was added that fails without this change.

PiperOrigin-RevId: 213531215
Change-Id: I4d6c98f1416541a476a34418f664ba58f936a81d
2018-09-18 15:42:22 -07:00
Fabricio Voznika 7967d8ecd5 Handle child processes better in tests
Reap children more systematically in container tests. Previously,
container_test was taking ~5 mins to run because container.Destroy()
would time out waiting for the sandbox process to exit. Now the test
runs in less than a minute.

Also made the contract around Container and Sandbox destroy clearer.

PiperOrigin-RevId: 213527471
Change-Id: Icca84ee1212bbdcb62bdfc9cc7b71b12c6d1688d
2018-09-18 15:21:28 -07:00
Michael Pratt dd05c96d99 Increase state test timeout
PiperOrigin-RevId: 213519378
Change-Id: Iffdb987da3a7209a297ea2df171d2ae5fa9b2b34
2018-09-18 14:38:42 -07:00
Kevin Krakauer 7e00f37054 Automated rollback of changelist 213307171
PiperOrigin-RevId: 213504354
Change-Id: Iadd42f0ca4b7e7a9eae780bee9900c7233fb4f3f
2018-09-18 13:22:26 -07:00
Brian Geffon ed08597d12 Allow for MSG_CTRUNC in input flags for recv.
PiperOrigin-RevId: 213481363
Change-Id: I8150ea20cebeb207afe031ed146244de9209e745
2018-09-18 11:14:37 -07:00
Fabricio Voznika da20559137 Provide better message when memfd_create fails with ENOSYS
Updates #100

PiperOrigin-RevId: 213414821
Change-Id: I90c2e6c18c54a6afcd7ad6f409f670aa31577d37
2018-09-18 02:09:28 -07:00
Fabricio Voznika 5d9816be41 Remove memory usage static init
panic() during init() can be hard to debug.

Updates #100

PiperOrigin-RevId: 213391932
Change-Id: Ic103f1981c5b48f1e12da3b42e696e84ffac02a9
2018-09-17 21:34:37 -07:00
Fabricio Voznika 26b08e182c Rename container in test
's' used to stand for sandbox, before container existed.

PiperOrigin-RevId: 213390641
Change-Id: I7bda94a50398c46721baa92227e32a7a1d817412
2018-09-17 21:18:27 -07:00
Tamir Duberstein d6409b6564 Prevent TCP connect from picking bound ports
PiperOrigin-RevId: 213387851
Change-Id: Icc6850761bc11afd0525f34863acd77584155140
2018-09-17 20:44:04 -07:00
Kevin Krakauer bb88c187c5 runsc: Enable waiting on exited processes.
This makes `runsc wait` behave more like waitpid()/wait4() in that:
- Once a process has run to completion, you can wait on it and get its exit
  code.
- Processes not waited on will consume memory (like a zombie process)

PiperOrigin-RevId: 213358916
Change-Id: I5b5eca41ce71eea68e447380df8c38361a4d1558
2018-09-17 16:25:24 -07:00
Ian Gudger ab6fa44588 Allow kernel.(*Task).Block to accept an extract only channel
PiperOrigin-RevId: 213328293
Change-Id: I4164133e6f709ecdb89ffbb5f7df3324c273860a
2018-09-17 13:35:54 -07:00
Tamir Duberstein a452971630 Add empty .s file to allow `//go:linkname`
This was previously broken in 212917409, resulting in "missing function body"
compilation errors.

PiperOrigin-RevId: 213323695
Change-Id: I32a95b76a1c73fd731f223062ec022318b979bd4
2018-09-17 13:06:55 -07:00
Tamir Duberstein 23258ca284 Implement packet forwarding to enable NAT
PiperOrigin-RevId: 213323501
Change-Id: I0996ddbdcf097588745efe35481085d42dbaf446
2018-09-17 13:05:36 -07:00
Michael Pratt d639c3d61b Allow NULL data in mount(2)
PiperOrigin-RevId: 213315267
Change-Id: I7562bcd81fb22e90aa9c7dd9eeb94803fcb8c5af
2018-09-17 12:16:29 -07:00
Kevin Krakauer 25add7b22b runsc: Fix stdin/out/err in multi-container mode.
Stdin/out/err weren't being sent to the sentry.

PiperOrigin-RevId: 213307171
Change-Id: Ie4b634a58b1b69aa934ce8597e5cc7a47a2bcda2
2018-09-17 11:31:28 -07:00