gvisor

Commit Graph

Author	SHA1	Message	Date
Tamir Duberstein	539df2940d	Use the ICMP target address in responses There is a subtle bug that is the result of two changes made when upstreaming ICMPv6 support from Fuchsia: 1) ipv6.endpoint.WritePacket writes the local address it was initialized with, rather than the provided route's local address 2) ipv6.endpoint.handleICMP doesn't set its route's local address to the ICMP target address before writing the response The result is that the ICMP response erroneously uses the target ipv6 address (rather than icmp) as its source address in the response. When trying to debug this by fixing (2), we ran into problems with bad ipv6 checksums because (1) didn't respect the local address of the route being passed to it. This fixes both problems. PiperOrigin-RevId: 214650822 Change-Id: Ib6148bf432e6428d760ef9da35faef8e4b610d69	2018-09-26 12:41:04 -07:00
Tamir Duberstein	bee264f0c5	Export ipv6 address helpers This is useful for Fuchsia. PiperOrigin-RevId: 214619681 Change-Id: If5a60dd82365c2eae51a12bbc819e5aae8c76ee9	2018-09-26 09:49:52 -07:00
Nicolas Lacasse	d489336784	runsc: All non-root bind mounts should be shared. This CL changes the semantics of the "--file-access" flag so that it only affects the root filesystem. The default remains "exclusive" which is the common use case, as neither Docker nor K8s supports sharing the root. Keeping the root fs as "exclusive" means that the fs-intensive work done during application startup will mostly be cacheable, and thus faster. Non-root bind mounts will always be shared. This CL also removes some redundant FSAccessType validations. We validate this flag in main(), so we can assume it is valid afterwards. PiperOrigin-RevId: 214359936 Change-Id: I7e75d7bf52dbd7fa834d0aacd4034868314f3b51	2018-09-24 17:22:15 -07:00
Ian Gudger	4094480b28	Remove unnecessary defer PiperOrigin-RevId: 214073949 Change-Id: I8fab916cd77362c13dac2c9dcf2ecc1710d87a5e	2018-09-21 18:14:38 -07:00
Ian Gudger	7ce13ebcad	Run gofmt -s on everything PiperOrigin-RevId: 214040901 Change-Id: I74d79497a053da3624921ad2b7c5193ca4a87942	2018-09-21 14:06:59 -07:00
Tamir Duberstein	4634cd66ad	Extend tcpip.Address.String to ipv6 addresses PiperOrigin-RevId: 214039349 Change-Id: Ia7d09c5f85eddd1e5634f3c21b0bd60b10be6bd2	2018-09-21 13:58:31 -07:00
Nicolas Lacasse	d260e808f4	The "action" in container.Signal should be "signal". PiperOrigin-RevId: 214038776 Change-Id: I4ad212540ec4ef4fb5ab5fdcb7f0865c4f746895	2018-09-21 13:54:35 -07:00
Tamir Duberstein	95f30ef67b	Deflake TestSimpleReceive ...by increasing the allotted timeout and using direct comparison rather than reflect.DeepEqual (which should be faster). PiperOrigin-RevId: 214027024 Change-Id: I0a2690e65c7e14b4cc118c7312dbbf5267dc78bc	2018-09-21 12:33:21 -07:00
Tamir Duberstein	7fa57ee579	Export read-only tcpip.Subnet.Mask PiperOrigin-RevId: 214023383 Change-Id: I5a7572f949840fb68a3ffb7342e6a3524bd00864	2018-09-21 12:07:29 -07:00
Nicolas Lacasse	b4321f4447	runsc: Synchronize container metadata changes with a file lock. Each container has associated metadata (particularly the container status) that is manipulated by various runsc commands. This metadata is stored in a file identified by the container id. Different runsc processes may manipulate the same container metadata, and each will read/write to the metadata file. This CL adds a file lock per container which must be held when reading the container metadata file, and when modifying and writing the container metadata. PiperOrigin-RevId: 214019179 Change-Id: Ice4390ad233bc7f216c9a9a6cf05fb456c9ec0ad	2018-09-21 11:42:06 -07:00
Fabricio Voznika	b63c4bfe02	Set Sandbox.Chroot so it gets cleaned up upon destruction I've made several attempts to create a test, but the lack of permission from the test user makes it nearly impossible to test anything useful. PiperOrigin-RevId: 213922174 Change-Id: I5b502ca70cb7a6645f8836f028fb203354b4c625	2018-09-20 18:54:09 -07:00
Lantao Liu	8a938a3f9d	runsc: allow `runsc wait` on a container for multiple times. PiperOrigin-RevId: 213908919 Change-Id: I74eff99a5360bb03511b946f4cb5658bb5fc40c7	2018-09-20 16:59:42 -07:00
Nicolas Lacasse	cbaec4d614	Wait for all async fs operations to complete before returning from Destroy. Destroy flushes dirent references, which triggers many async close operations. We must wait for those to finish before returning from Destroy, otherwise we may kill the gofer, causing a cascade of failing RPCs and leading to an inconsistent FS state. PiperOrigin-RevId: 213884637 Change-Id: Id054b47fc0f97adc5e596d747c08d3b97a1d1f71	2018-09-20 14:37:53 -07:00
Lantao Liu	9464b82a06	runsc: Fix a bug that `runsc wait` doesn't work after container exits. PiperOrigin-RevId: 213849165 Change-Id: I5120b2f568850c0c42a08e8706e7f8653ef1bd94	2018-09-20 11:23:26 -07:00
Kevin Krakauer	ffb5fdd690	runsc: Fix stdin/stdout/stderr in multi-container mode. The issue with the previous change was that the stdin/stdout/stderr passed to the sentry were dup'd by host.ImportFile. This left a dangling FD that was never closed, causing containerd to time out waiting on container stop. PiperOrigin-RevId: 213753032 Change-Id: Ia5e4c0565c42c8610d3b59f65599a5643b0901e4	2018-09-19 22:20:41 -07:00
Nicolas Lacasse	915d76aa92	Add container.Destroy urpc method. This method will: 1. Stop the container process if it is still running. 2. Unmount all sandbox-internal mounts for the container. 3. Delete the container root directory inside the sandbox. Destroy is idempotent, and safe to call concurrently. This fixes a bug where after stopping a container, we cannot unmount the container root directory on the host. This bug occurred because the sandbox dirent cache was holding a dirent with a host fd corresponding to a file inside the container root on the host. The dirent cache did not know that the container had exited, and kept the FD open, preventing us from unmounting on the host. Now that we unmount (and flush) all container mounts inside the sandbox, any host FDs donated by the gofer will be closed, and we can unmount the container root on the host. PiperOrigin-RevId: 213737693 Change-Id: I28c0ff4cd19a08014cdd72fec5154497e92aacc9	2018-09-19 18:54:14 -07:00
Fabricio Voznika	b873e388f3	Update gocapability commit to get bug fix PiperOrigin-RevId: 213734203 Change-Id: I9cf5d3885fb88b41444c686168d4cab00f09988a	2018-09-19 18:17:14 -07:00
Kevin Krakauer	639226c3d9	runsc: Mark container_test flaky. PiperOrigin-RevId: 213732520 Change-Id: Ife292987ec8b1de4c2e7e3b7d4452b00c1582e91	2018-09-19 18:03:35 -07:00
Ian Gudger	117ac8bc5b	Fix data race on tcp.endpoint.hardError in tcp.(*endpoint).Read tcp.endpoint.hardError is protected by tcp.endpoint.mu. PiperOrigin-RevId: 213730698 Change-Id: I4e4f322ac272b145b500b1a652fbee0c7b985be2	2018-09-19 17:49:18 -07:00
Fabricio Voznika	e395273301	Fix sandbox and gofer capabilities Capabilities.Set() adds capabilities, but doesn't remove existing ones that might have been loaded. Fixed the code and added tests. PiperOrigin-RevId: 213726369 Change-Id: Id7fa6fce53abf26c29b13b9157bb4c6616986fba	2018-09-19 17:15:14 -07:00
Nicolas Lacasse	2ad3228cd0	runsc: Don't create __runsc_containers__ unless we are in multi-container mode. PiperOrigin-RevId: 213715511 Change-Id: I3e41b583c6138edbdeba036dfb9df4864134fc12	2018-09-19 16:10:47 -07:00
Bert Muthalaly	2e497de2d9	Pass local link address to DeliverNetworkPacket This allows a NetworkDispatcher to implement transparent bridging, assuming all implementations of LinkEndpoint.WritePacket call eth.Encode with header.EthernetFields.SrcAddr set to the passed Route.LocalLinkAddress, if it is provided. PiperOrigin-RevId: 213686651 Change-Id: I446a4ac070970202f0724ef796ff1056ae4dd72a	2018-09-19 13:43:58 -07:00
Lingfu	f0a92b6b67	Add docker command line args support for --cpuset-cpus and --cpus `docker run --cpuset-cpus=/--cpus=` will generate cpu resource info in config.json (runtime spec file). When nginx worker_connections is configured as auto, the worker is generated according to the number of CPUs. If the cgroup is already set on the host, but it is not displayed correctly in the sandbox, performance may be degraded. This patch can get cpus info from spec file and apply to sentry on bootup, so the /proc/cpuinfo can show the correct cpu numbers. `lscpu` and other commands that rely on `/sys/devices/system/cpu/online` are also affected by this patch. e.g. --cpuset-cpus=2,3 -> cpu number:2 --cpuset-cpus=4-7 -> cpu number:4 --cpus=2.8 -> cpu number:3 --cpus=0.5 -> cpu number:1 Change-Id: Ideb22e125758d4322a12be7c51795f8018e3d316 PiperOrigin-RevId: 213685199	2018-09-19 13:35:42 -07:00
Bhasker Hariharan	bd12e95247	Fix RTT estimation when timestamp option is enabled. From RFC7323#Section-4 The [RFC6298] RTT estimator has weighting factors, alpha and beta, based on an implicit assumption that at most one RTTM will be sampled per RTT. When multiple RTTMs per RTT are available to update the RTT estimator, an implementation SHOULD try to adhere to the spirit of the history specified in [RFC6298]. An implementation suggestion is detailed in Appendix G. From RFC7323#appendix-G Appendix G. RTO Calculation Modification Taking multiple RTT samples per window would shorten the history calculated by the RTO mechanism in [RFC6298], and the below algorithm aims to maintain a similar history as originally intended by [RFC6298]. It is roughly known how many samples a congestion window worth of data will yield, not accounting for ACK compression, and ACK losses. Such events will result in more history of the path being reflected in the final value for RTO, and are uncritical. This modification will ensure that a similar amount of time is taken into account for the RTO estimation, regardless of how many samples are taken per window: ExpectedSamples = ceiling(FlightSize / (SMSS * 2)) alpha' = alpha / ExpectedSamples beta' = beta / ExpectedSamples Note that the factor 2 in ExpectedSamples is due to "Delayed ACKs". Instead of using alpha and beta in the algorithm of [RFC6298], use alpha' and beta' instead: RTTVAR <- (1 - beta') * RTTVAR + beta' * |SRTT - R'| SRTT <- (1 - alpha') * SRTT + alpha' * R' (for each sample R') PiperOrigin-RevId: 213644795 Change-Id: I52278b703540408938a8edb8c38be97b37f4a10e	2018-09-19 09:59:12 -07:00
Fabricio Voznika	8aec7473a1	Added state machine checks for Container.Status For my own sanity when thinking about possible transitions and state. PiperOrigin-RevId: 213559482 Change-Id: I25588c86cf6098be4eda01f4e7321c102ceef33c	2018-09-18 19:12:54 -07:00
Nicolas Lacasse	fd222d62ed	Short-circuit Readdir calls on overlay files when the dirent is frozen. If we have an overlay file whose corresponding Dirent is frozen, then we should not bother calling Readdir on the upper or lower files, since DirentReaddir will calculate children based on the frozen Dirent tree. A test was added that fails without this change. PiperOrigin-RevId: 213531215 Change-Id: I4d6c98f1416541a476a34418f664ba58f936a81d	2018-09-18 15:42:22 -07:00
Fabricio Voznika	7967d8ecd5	Handle children processes better in tests Reap children more systematically in container tests. Previously, container_test was taking ~5 mins to run because container.Destroy() would time out waiting for the sandbox process to exit. Now the test runs in less than a minute. Also made the contract around Container and Sandbox destroy clearer. PiperOrigin-RevId: 213527471 Change-Id: Icca84ee1212bbdcb62bdfc9cc7b71b12c6d1688d	2018-09-18 15:21:28 -07:00
Michael Pratt	dd05c96d99	Increase state test timeout PiperOrigin-RevId: 213519378 Change-Id: Iffdb987da3a7209a297ea2df171d2ae5fa9b2b34	2018-09-18 14:38:42 -07:00
Kevin Krakauer	7e00f37054	Automated rollback of changelist 213307171 PiperOrigin-RevId: 213504354 Change-Id: Iadd42f0ca4b7e7a9eae780bee9900c7233fb4f3f	2018-09-18 13:22:26 -07:00
Brian Geffon	ed08597d12	Allow for MSG_CTRUNC in input flags for recv. PiperOrigin-RevId: 213481363 Change-Id: I8150ea20cebeb207afe031ed146244de9209e745	2018-09-18 11:14:37 -07:00
Fabricio Voznika	da20559137	Provide better message when memfd_create fails with ENOSYS Updates #100 PiperOrigin-RevId: 213414821 Change-Id: I90c2e6c18c54a6afcd7ad6f409f670aa31577d37	2018-09-18 02:09:28 -07:00
Fabricio Voznika	5d9816be41	Remove memory usage static init panic() during init() can be hard to debug. Updates #100 PiperOrigin-RevId: 213391932 Change-Id: Ic103f1981c5b48f1e12da3b42e696e84ffac02a9	2018-09-17 21:34:37 -07:00
Fabricio Voznika	26b08e182c	Rename container in test 's' used to stand for sandbox, before container exited. PiperOrigin-RevId: 213390641 Change-Id: I7bda94a50398c46721baa92227e32a7a1d817412	2018-09-17 21:18:27 -07:00
Tamir Duberstein	d6409b6564	Prevent TCP connect from picking bound ports PiperOrigin-RevId: 213387851 Change-Id: Icc6850761bc11afd0525f34863acd77584155140	2018-09-17 20:44:04 -07:00
Kevin Krakauer	bb88c187c5	runsc: Enable waiting on exited processes. This makes `runsc wait` behave more like waitpid()/wait4() in that: - Once a process has run to completion, you can wait on it and get its exit code. - Processes not waited on will consume memory (like a zombie process) PiperOrigin-RevId: 213358916 Change-Id: I5b5eca41ce71eea68e447380df8c38361a4d1558	2018-09-17 16:25:24 -07:00
Ian Gudger	ab6fa44588	Allow kernel.(*Task).Block to accept an extract only channel PiperOrigin-RevId: 213328293 Change-Id: I4164133e6f709ecdb89ffbb5f7df3324c273860a	2018-09-17 13:35:54 -07:00
Tamir Duberstein	a452971630	Add empty .s file to allow `//go:linkname` This was previously broken in 212917409, resulting in "missing function body" compilation errors. PiperOrigin-RevId: 213323695 Change-Id: I32a95b76a1c73fd731f223062ec022318b979bd4	2018-09-17 13:06:55 -07:00
Tamir Duberstein	23258ca284	Implement packet forwarding to enable NAT PiperOrigin-RevId: 213323501 Change-Id: I0996ddbdcf097588745efe35481085d42dbaf446	2018-09-17 13:05:36 -07:00
Michael Pratt	d639c3d61b	Allow NULL data in mount(2) PiperOrigin-RevId: 213315267 Change-Id: I7562bcd81fb22e90aa9c7dd9eeb94803fcb8c5af	2018-09-17 12:16:29 -07:00
Kevin Krakauer	25add7b22b	runsc: Fix stdin/out/err in multi-container mode. Stdin/out/err weren't being sent to the sentry. PiperOrigin-RevId: 213307171 Change-Id: Ie4b634a58b1b69aa934ce8597e5cc7a47a2bcda2	2018-09-17 11:31:28 -07:00
newmanwang	de5a590ee2	Avoid reuse of pending SignalInfo objects runApp.execute -> Task.SendSignal -> sendSignalLocked -> sendSignalTimerLocked -> pendingSignals.enqueue assumes that it owns the arch.SignalInfo returned from platform.Context.Switch. On the other hand, ptrace.context.Switch assumes that it owns the returned SignalInfo and can safely reuse it on the next call to Switch. The KVM platform always returns a unique SignalInfo. This becomes a problem when the returned signal is not immediately delivered, allowing a future signal in Switch to change the previous pending SignalInfo. This is noticeable in #38 when external SIGINTs are delivered from the PTY slave FD. Note that the ptrace stubs are in the same process group as the sentry, so they are eligible to receive the PTY signals. This should probably change, but is not the only possible cause of this bug. Updates #38 Original change by newmanwang <wcs1011@gmail.com>, updated by Michael Pratt <mpratt@google.com>. Change-Id: I5383840272309df70a29f67b25e8221f933622cd PiperOrigin-RevId: 213071072	2018-09-14 17:39:25 -07:00
Tamir Duberstein	75c66f871b	Remove buffer.Prependable.UsedBytes It is the same as buffer.Prependable.View. PiperOrigin-RevId: 213064166 Change-Id: Ib33b8a2c4da864209d9a0be0a1c113be10b520d3	2018-09-14 16:39:56 -07:00
Michael Pratt	3aa50f18a4	Reuse readlink parameter, add sockaddr max. PiperOrigin-RevId: 213058623 Change-Id: I522598c655d633b9330990951ff1c54d1023ec29	2018-09-14 16:00:02 -07:00
Tamir Duberstein	d7a05b4e63	Pass buffer.Prependable by value PiperOrigin-RevId: 213053370 Change-Id: I60ea89572b4fca53fd126c870fcbde74fcf52562	2018-09-14 15:23:58 -07:00
Nicolas Lacasse	b84bfa570d	Make gVisor hard link check match Linux's. Linux permits hard-linking if the target is owned by the user OR the target has Read+Write permission. PiperOrigin-RevId: 213024613 Change-Id: If642066317b568b99084edd33ee4e8822ec9cbb3	2018-09-14 12:29:46 -07:00
Jamie Liu	0380bcb3a4	Fix interaction between rt_sigtimedwait and ignored signals. PiperOrigin-RevId: 213011782 Change-Id: I716c6ea3c586b0c6c5a892b6390d2d11478bc5af	2018-09-14 11:10:50 -07:00
Chenggang	faa34a0738	platform/kvm: Get max vcpu number dynamically by ioctl Older kernel versions, such as 4.4, only support 255 vcpus. When gvisor is run on these kernels, it could panic because the vcpu id and vcpu number are beyond max_vcpus. Use ioctl(vmfd, _KVM_CHECK_EXTENSION, _KVM_CAP_MAX_VCPUS) to get max vcpus number dynamically. Change-Id: I50dd859a11b1c2cea854a8e27d4bf11a411aa45c PiperOrigin-RevId: 212929704	2018-09-13 21:47:11 -07:00
Ian Gudger	29a7271f5d	Plumb monotonic time to netstack Netstack needs to be portable, so this seems to be preferable to using raw system calls. PiperOrigin-RevId: 212917409 Change-Id: I7b2073e7db4b4bf75300717ca23aea4c15be944c	2018-09-13 19:12:15 -07:00
Lantao Liu	bde2a91433	runsc: Support container signal/wait. This CL: 1) Fix `runsc wait`, it now also works after the container exits; 2) Generate correct container state in Load; 3) Make sure `Destroy` cleans up everything before successfully returning. PiperOrigin-RevId: 212900107 Change-Id: Ie129cbb9d74f8151a18364f1fc0b2603eac4109a	2018-09-13 16:38:03 -07:00
Rahat Mahmood	adf8f33970	Extend memory usage events to report mapped memory usage. PiperOrigin-RevId: 212887555 Change-Id: I3545383ce903cbe9f00d9b5288d9ef9a049b9f4f	2018-09-13 15:16:47 -07:00
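
The CPU counts quoted in commit f0a92b6b67 (`--cpuset-cpus=2,3` giving 2, `--cpuset-cpus=4-7` giving 4, `--cpus=2.8` giving 3, `--cpus=0.5` giving 1) are consistent with counting the entries of the cpuset list and rounding a fractional quota up. The following is a minimal sketch under that assumption, not the runsc implementation; `cpusetCount` and `cpusFromQuota` are hypothetical names, and overlapping ranges are not deduplicated.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// cpusetCount counts the CPUs named by a list such as "2,3" or "4-7".
// Hypothetical helper: overlapping ranges are counted twice.
func cpusetCount(set string) (int, error) {
	n := 0
	for _, part := range strings.Split(set, ",") {
		bounds := strings.SplitN(strings.TrimSpace(part), "-", 2)
		first, err := strconv.Atoi(bounds[0])
		if err != nil {
			return 0, fmt.Errorf("invalid cpuset entry %q: %v", part, err)
		}
		last := first
		if len(bounds) == 2 {
			if last, err = strconv.Atoi(bounds[1]); err != nil {
				return 0, fmt.Errorf("invalid cpuset entry %q: %v", part, err)
			}
		}
		if last < first {
			return 0, fmt.Errorf("invalid cpuset range %q", part)
		}
		n += last - first + 1
	}
	return n, nil
}

// cpusFromQuota rounds a fractional --cpus value up to whole CPUs
// (assumed rounding rule, inferred from the examples in the commit message).
func cpusFromQuota(cpus float64) int {
	return int(math.Ceil(cpus))
}

func main() {
	for _, s := range []string{"2,3", "4-7"} {
		n, err := cpusetCount(s)
		fmt.Println(s, n, err)
	}
	fmt.Println(cpusFromQuota(2.8), cpusFromQuota(0.5))
}
```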

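The RFC 7323 Appendix G adjustment quoted in commit bd12e95247 can be sketched as follows. This is an illustration of the quoted formulas only, not the netstack code: `rttEstimator` and `update` are hypothetical names, alpha = 1/8 and beta = 1/4 are the RFC 6298 values, and clamping ExpectedSamples to at least 1 is an added assumption to avoid dividing by zero when nothing is in flight.

```go
package main

import (
	"fmt"
	"math"
)

// Weighting factors from RFC 6298.
const (
	alpha = 1.0 / 8
	beta  = 1.0 / 4
)

// rttEstimator holds the smoothed RTT and its variance (any consistent time unit).
type rttEstimator struct {
	srtt, rttvar float64
}

// update folds one RTT sample into the estimator, scaling alpha and beta
// by the expected number of samples per window (RFC 7323, Appendix G).
func (e *rttEstimator) update(sample, flightSize, smss float64) {
	expected := math.Ceil(flightSize / (smss * 2))
	if expected < 1 {
		expected = 1 // assumption: guard against an empty flight
	}
	a := alpha / expected
	b := beta / expected
	// RTTVAR is updated first, using the SRTT from before this sample.
	e.rttvar = (1-b)*e.rttvar + b*math.Abs(e.srtt-sample)
	e.srtt = (1-a)*e.srtt + a*sample
}

func main() {
	e := rttEstimator{srtt: 100, rttvar: 50}
	e.update(200, 4*1460, 1460) // four segments in flight: ExpectedSamples = 2
	fmt.Println(e.srtt, e.rttvar) // 106.25 56.25
}
```

With four segments in flight the sample carries half the weight it would under plain RFC 6298, so the history window stays roughly one RTT long regardless of how many timestamps arrive per window.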

816 Commits (All Branches)