gvisor

Commit Graph

Author	SHA1	Message	Date
Ian Gudger	45566fa4e4	Add finalizer on AtomicRefCount to check for leaks. PiperOrigin-RevId: 255711454	2019-06-28 20:07:52 -07:00
Nicolas Lacasse	295078fa7a	Automated rollback of changelist 255263686 PiperOrigin-RevId: 255679453	2019-06-28 15:28:41 -07:00
Andrei Vagin	8a625ceeb1	runsc: allow openat for runsc-race I see that runsc-race is killed by SIGSYS, because openat isn't allowed by seccomp filters: 60052 openat(AT_FDCWD, "/proc/sys/vm/overcommit_memory", O_RDONLY\|O_CLOEXEC <unfinished ...> 60052 <... openat resumed> ) = 257 60052 --- SIGSYS {si_signo=SIGSYS, si_code=SYS_SECCOMP, si_call_addr=0xfaacf1, si_syscall=__NR_openat, si_arch=AUDIT_ARCH_X86_64} --- PiperOrigin-RevId: 255640808	2019-06-28 11:49:45 -07:00
Michael Pratt	5b41ba5d0e	Fix various spelling issues in the documentation Addresses obvious typos, in the documentation only. COPYBARA_INTEGRATE_REVIEW=https://github.com/google/gvisor/pull/443 from Pixep:fix/documentation-spelling 4d0688164eafaf0b3010e5f4824b35d1e7176d65 PiperOrigin-RevId: 255477779	2019-06-27 14:25:50 -07:00
Michael Pratt	085a907565	Cache directory entries in the overlay Currently, the overlay dirCache is only used for a single logical use of getdents. i.e., it is discard when the FD is closed or seeked back to the beginning. But the initial work of getting the directory contents can be quite expensive (particularly sorting large directories), so we should keep it as long as possible. This is very similar to the readdirCache in fs/gofer. Since the upper filesystem does not have to allow caching readdir entries, the new CacheReaddir MountSourceOperations method controls this behavior. This caching should be trivially movable to all Inodes if desired, though that adds an additional copy step for non-overlay Inodes. (Overlay Inodes already do the extra copy). PiperOrigin-RevId: 255477592	2019-06-27 14:24:03 -07:00
Fabricio Voznika	42e212f6b7	Preserve permissions when checking lower The code was wrongly assuming that only read access was required from the lower overlay when checking for permissions. This allowed non-writable files to be writable in the overlay. Fixes #316 PiperOrigin-RevId: 255263686	2019-06-26 14:24:44 -07:00
Andrei Vagin	fd16a329ce	fsgopher: reopen files via /proc/self/fd When we reopen file by path, we can't be sure that we will open exactly the same file. The file can be deleted and another one with the same name can be created. PiperOrigin-RevId: 254898594	2019-06-24 21:44:27 -07:00
Fabricio Voznika	b21b1db700	Allow to change logging options using 'runsc debug' New options are: runsc debug --strace=off\|all\|function1,function2 runsc debug --log-level=warning\|info\|debug runsc debug --log-packets=true\|false Updates #407 PiperOrigin-RevId: 254843128	2019-06-24 15:03:02 -07:00
Bhasker Hariharan	a8608c501b	Enable Receive Buffer Auto-Tuning for runsc. Updates #230 PiperOrigin-RevId: 253225078	2019-06-14 07:31:45 -07:00
Ian Gudger	3e9b8ecbfe	Plumb context through more layers of filesytem. All functions which allocate objects containing AtomicRefCounts will soon need a context. PiperOrigin-RevId: 253147709	2019-06-13 18:40:38 -07:00
Adin Scannell	add40fd6ad	Update canonical repository. This can be merged after: https://github.com/google/gvisor-website/pull/77 or https://github.com/google/gvisor-website/pull/78 PiperOrigin-RevId: 253132620	2019-06-13 16:50:15 -07:00
Ian Lewis	4fdd560b76	Set the HOME environment variable (fixes #293 ) runsc will now set the HOME environment variable as required by POSIX. The user's home directory is retrieved from the /etc/passwd file located on the container's file system during boot. PiperOrigin-RevId: 253120627	2019-06-13 15:45:25 -07:00
Josh Gao	b915a25597	Fix use of "2 ^ 30". 2 ^ 30 is 28, not 1073741824.	2019-06-13 14:26:26 -07:00
Andrei Vagin	bb849bad29	gvisor/runsc: apply seccomp filters before parsing a state file PiperOrigin-RevId: 252869983	2019-06-12 11:55:24 -07:00
Fabricio Voznika	356d1be140	Allow 'runsc do' to run without root '--rootless' flag lets a non-root user execute 'runsc do'. The drawback is that the sandbox and gofer processes will run as root inside a user namespace that is mapped to the caller's user, intead of nobody. And network is defaulted to '--network=host' inside the root network namespace. On the bright side, it's very convenient for testing: runsc --rootless do ls runsc --rootless do curl www.google.com PiperOrigin-RevId: 252840970	2019-06-12 09:41:50 -07:00
Fabricio Voznika	fc746efa9a	Add support to mount pod shared tmpfs mounts Parse annotations containing 'gvisor.dev/spec/mount' that gives hints about how mounts are shared between containers inside a pod. This information can be used to better inform how to mount these volumes inside gVisor. For example, a volume that is shared between containers inside a pod can be bind mounted inside the sandbox, instead of being two independent mounts. For now, this information is used to allow the same tmpfs mounts to be shared between containers which wasn't possible before. PiperOrigin-RevId: 252704037	2019-06-11 14:54:31 -07:00
Fabricio Voznika	847c4b9759	Use net.HardwareAddr for FDBasedLink.LinkAddress It prints formatted to the log. PiperOrigin-RevId: 252699551	2019-06-11 14:31:46 -07:00
Jamie Liu	48961d27a8	Move //pkg/sentry/memutil to //pkg/memutil. PiperOrigin-RevId: 252124156	2019-06-07 14:52:27 -07:00
Jamie Liu	a26043ee53	Implement reclaim-driven MemoryFile eviction. PiperOrigin-RevId: 251950660	2019-06-06 16:27:55 -07:00
Bhasker Hariharan	85be01b42d	Add multi-fd support to fdbased endpoint. This allows an fdbased endpoint to have multiple underlying fd's from which packets can be read and dispatched/written to. This should allow for higher throughput as well as better scalability of the network stack as number of connections increases. Updates #231 PiperOrigin-RevId: 251852825	2019-06-06 08:07:02 -07:00
Michael Pratt	57772db2e7	Shutdown host sockets on internal shutdown This is required to make the shutdown visible to peers outside the sandbox. The readClosed / writeClosed fields were dropped, as they were preventing a shutdown socket from reading the remainder of queued bytes. The host syscalls will return the appropriate errors for shutdown. The control message tests have been split out of socket_unix.cc to make the (few) remaining tests accessible to testing inherited host UDS, which don't support sending control messages. Updates #273 PiperOrigin-RevId: 251763060	2019-06-05 18:40:37 -07:00
Fabricio Voznika	f1aee6a7ad	Refactor container FS setup No change in functionaly. Added containerMounter object to keep state while the mounts are processed. This will help upcoming changes to share mounts per-pod. PiperOrigin-RevId: 251350096	2019-06-03 18:20:57 -07:00
Fabricio Voznika	d28f71adcf	Remove 'clearStatus' option from container.Wait*PID() clearStatus was added to allow detached execution to wait on the exec'd process and retrieve its exit status. However, it's not currently used. Both docker and gvisor-containerd-shim wait on the "shim" process and retrieve the exit status from there. We could change gvisor-containerd-shim to use waits, but it will end up also consuming a process for the wait, which is similar to having the shim process. Closes #234 PiperOrigin-RevId: 251349490	2019-06-03 18:16:09 -07:00
Michael Pratt	955685845e	Remove spurious period PiperOrigin-RevId: 251288885	2019-06-03 12:48:24 -07:00
Bhasker Hariharan	035a8fa38e	Add support for collecting execution trace to runsc. Updates #220 PiperOrigin-RevId: 250532302	2019-05-30 12:07:11 -07:00
Fabricio Voznika	c091e62369	Set sticky bit to /tmp This is generally done for '/tmp' to prevent accidental deletion of files. More details here: http://man7.org/linux/man-pages/man1/chmod.1.html#RESTRICTED_DELETION_FLAG_OR_STICKY_BIT PiperOrigin-RevId: 249633207 Change-Id: I444a5b406fdef664f5677b2f20f374972613a02b	2019-05-23 06:48:00 -07:00
Fabricio Voznika	9006304dfe	Initial support for bind mounts Separate MountSource from Mount. This is needed to allow mounts to be shared by multiple containers within the same pod. PiperOrigin-RevId: 249617810 Change-Id: Id2944feb7e4194951f355cbe6d4944ae3c02e468	2019-05-23 04:16:10 -07:00
Fabricio Voznika	ecb0f00e10	Cleanup around urpc file payload handling urpc always closes all files once the RPC function returns. PiperOrigin-RevId: 248406857 Change-Id: I400a8562452ec75c8e4bddc2154948567d572950	2019-05-15 14:36:28 -07:00
Andrei Vagin	85380ff03d	gvisor/runsc: use a veth link address instead of generating a new one PiperOrigin-RevId: 248367340 Change-Id: Id792afcfff9c9d2cfd62cae21048316267b4a924	2019-05-15 11:11:58 -07:00
Fabricio Voznika	1bee43be13	Implement fallocate(2) Closes #225 PiperOrigin-RevId: 247508791 Change-Id: I04f47cf2770b30043e5a272aba4ba6e11d0476cc	2019-05-09 15:35:49 -07:00
Andrei Vagin	bf0ac565d2	Fix runsc restore to be compatible with docker start --checkpoint ... Change-Id: I02b30de13f1393df66edf8829fedbf32405d18f8 PiperOrigin-RevId: 246621192	2019-05-03 21:41:45 -07:00
Jamie Liu	8bfb83d0ac	Implement async MemoryFile eviction, and use it in CachingInodeOperations. This feature allows MemoryFile to delay eviction of "optional" allocations, such as unused cached file pages. Note that this incidentally makes CachingInodeOperations writeback asynchronous, in the sense that it doesn't occur until eviction; this is necessary because between when a cached page becomes evictable and when it's evicted, file writes (via CachingInodeOperations.Write) may dirty the page. As currently implemented, this feature won't meaningfully impact steady-state memory usage or caching; the reclaimer goroutine will schedule eviction as soon as it runs out of other work to do. Future CLs increase caching by adding constraints on when eviction is scheduled. PiperOrigin-RevId: 246014822 Change-Id: Ia85feb25a2de92a48359eb84434b6ec6f9bea2cb	2019-04-30 13:56:41 -07:00
Michael Pratt	4d52a55201	Change copyright notice to "The gVisor Authors" Based on the guidelines at https://opensource.google.com/docs/releasing/authors/. 1. $ rg -l "Google LLC" \| xargs sed -i 's/Google LLC.*/The gVisor Authors./' 2. Manual fixup of "Google Inc" references. 3. Add AUTHORS file. Authors may request to be added to this file. 4. Point netstack AUTHORS to gVisor AUTHORS. Drop CONTRIBUTORS. Fixes #209 PiperOrigin-RevId: 245823212 Change-Id: I64530b24ad021a7d683137459cafc510f5ee1de9	2019-04-29 14:26:23 -07:00
Nicolas Lacasse	f4ce43e1f4	Allow and document bug ids in gVisor codebase. PiperOrigin-RevId: 245818639 Change-Id: I03703ef0fb9b6675955637b9fe2776204c545789	2019-04-29 14:04:14 -07:00
Kevin Krakauer	43dff57b87	Make raw sockets a toggleable feature disabled by default. PiperOrigin-RevId: 245511019 Change-Id: Ia9562a301b46458988a6a1f0bbd5f07cbfcb0615	2019-04-26 16:51:46 -07:00
Bhasker Hariharan	99b877fa1d	Revert runsc to use RecvMMsg packet dispatcher. PacketMMap mode has issues due to a kernel bug. This change reverts us to using recvmmsg instead of a shared ring buffer to dispatch inbound packets. This will reduce performance but should be more stable under heavy load till PacketMMap is updated to use TPacketv3. See #210 for details. Perf difference between recvmmsg vs packetmmap. RecvMMsg : iperf3 -c 172.17.0.2 Connecting to host 172.17.0.2, port 5201 [ 4] local 172.17.0.1 port 43478 connected to 172.17.0.2 port 5201 [ ID] Interval Transfer Bandwidth Retr Cwnd [ 4] 0.00-1.00 sec 778 MBytes 6.53 Gbits/sec 4349 188 KBytes [ 4] 1.00-2.00 sec 786 MBytes 6.59 Gbits/sec 4395 212 KBytes [ 4] 2.00-3.00 sec 756 MBytes 6.34 Gbits/sec 3655 161 KBytes [ 4] 3.00-4.00 sec 782 MBytes 6.56 Gbits/sec 4419 175 KBytes [ 4] 4.00-5.00 sec 755 MBytes 6.34 Gbits/sec 4317 187 KBytes [ 4] 5.00-6.00 sec 774 MBytes 6.49 Gbits/sec 4002 173 KBytes [ 4] 6.00-7.00 sec 737 MBytes 6.18 Gbits/sec 3904 191 KBytes [ 4] 7.00-8.00 sec 530 MBytes 4.44 Gbits/sec 3318 189 KBytes [ 4] 8.00-9.00 sec 487 MBytes 4.09 Gbits/sec 2627 188 KBytes [ 4] 9.00-10.00 sec 770 MBytes 6.46 Gbits/sec 4221 170 KBytes - - - - - - - - - - - - - - - - - - - - - - - - - [ ID] Interval Transfer Bandwidth Retr [ 4] 0.00-10.00 sec 6.99 GBytes 6.00 Gbits/sec 39207 sender [ 4] 0.00-10.00 sec 6.99 GBytes 6.00 Gbits/sec receiver iperf Done. PacketMMap: bhaskerh@gvisor-bench:~/tensorflow$ iperf3 -c 172.17.0.2 Connecting to host 172.17.0.2, port 5201 [ 4] local 172.17.0.1 port 43496 connected to 172.17.0.2 port 5201 [ ID] Interval Transfer Bandwidth Retr Cwnd [ 4] 0.00-1.00 sec 657 MBytes 5.51 Gbits/sec 0 1.01 MBytes [ 4] 1.00-2.00 sec 1021 MBytes 8.56 Gbits/sec 0 1.01 MBytes [ 4] 2.00-3.00 sec 1.21 GBytes 10.4 Gbits/sec 45 1.01 MBytes [ 4] 3.00-4.00 sec 1018 MBytes 8.54 Gbits/sec 15 1.01 MBytes [ 4] 4.00-5.00 sec 1.28 GBytes 11.0 Gbits/sec 45 1.01 MBytes [ 4] 5.00-6.00 sec 1.38 GBytes 11.9 Gbits/sec 0 1.01 MBytes [ 4] 6.00-7.00 sec 1.34 GBytes 11.5 Gbits/sec 45 856 KBytes [ 4] 7.00-8.00 sec 1.23 GBytes 10.5 Gbits/sec 0 901 KBytes [ 4] 8.00-9.00 sec 1010 MBytes 8.48 Gbits/sec 0 923 KBytes [ 4] 9.00-10.00 sec 1.39 GBytes 11.9 Gbits/sec 0 960 KBytes - - - - - - - - - - - - - - - - - - - - - - - - - [ ID] Interval Transfer Bandwidth Retr [ 4] 0.00-10.00 sec 11.4 GBytes 9.83 Gbits/sec 150 sender [ 4] 0.00-10.00 sec 11.4 GBytes 9.83 Gbits/sec receiver Updates #210 PiperOrigin-RevId: 244968438 Change-Id: Id461b5cbff2dea6fa55cfc108ea246d8f83da20b	2019-04-23 19:07:06 -07:00
Fabricio Voznika	c8cee7108f	Use FD limit and file size limit from host FD limit and file size limit is read from the host, instead of using hard-coded defaults, given that they effect the sandbox process. Also limit the direct cache to use no more than half if the available FDs. PiperOrigin-RevId: 244050323 Change-Id: I787ad0fdf07c49d589e51aebfeae477324fe26e6	2019-04-17 12:57:40 -07:00
Fabricio Voznika	9f8c89fc7f	Return error from fdbased.New RELNOTES: n/a PiperOrigin-RevId: 244031742 Change-Id: Id0cdb73194018fb5979e67b58510ead19b5a2b81	2019-04-17 11:16:35 -07:00
Bhasker Hariharan	eaac2806ff	Add TCP checksum verification. PiperOrigin-RevId: 242704699 Change-Id: I87db368ca343b3b4bf4f969b17d3aa4ce2f8bd4f	2019-04-09 11:23:47 -07:00
Andrei Vagin	88409e983c	gvisor: Add support for the MS_NOEXEC mount option https://github.com/google/gvisor/issues/145 PiperOrigin-RevId: 242044115 Change-Id: I8f140fe05e32ecd438b6be218e224e4b7fe05878	2019-04-04 17:43:53 -07:00
Kevin Krakauer	f9431fb20f	Remove obsolete TODO. PiperOrigin-RevId: 241637164 Change-Id: I65476a739cf38f1818dc47f6ce60638dec8b77a8	2019-04-02 17:27:05 -07:00
Kevin Krakauer	a40ee4f4b8	Change bug number for duplicate bug. PiperOrigin-RevId: 241567897 Change-Id: I580eac04f52bb15f4aab7df9822c4aa92e743021	2019-04-02 11:28:06 -07:00
Andrei Vagin	a046054ba3	gvisor/runsc: enable generic segmentation offload (GSO) The linux packet socket can handle GSO packets, so we can segment packets to 64K instead of the MTU which is usually 1500. Here are numbers for the nginx-1m test: runsc: 579330.01 [Kbytes/sec] received runsc-gso: 1794121.66 [Kbytes/sec] received runc: 2122139.06 [Kbytes/sec] received and for tcp_benchmark: $ tcp_benchmark --duration 15 --ideal [ 4] 0.0-15.0 sec 86647 MBytes 48456 Mbits/sec $ tcp_benchmark --client --duration 15 --ideal [ 4] 0.0-15.0 sec 2173 MBytes 1214 Mbits/sec $ tcp_benchmark --client --duration 15 --ideal --gso 65536 [ 4] 0.0-15.0 sec 19357 MBytes 10825 Mbits/sec PiperOrigin-RevId: 241072403 Change-Id: I20b03063a1a6649362b43609cbbc9b59be06e6d5	2019-03-29 16:27:38 -07:00
Jamie Liu	8f4634997b	Decouple filemem from platform and move it to pgalloc.MemoryFile. This is in preparation for improved page cache reclaim, which requires greater integration between the page cache and page allocator. PiperOrigin-RevId: 238444706 Change-Id: Id24141b3678d96c7d7dc24baddd9be555bffafe4	2019-03-14 08:12:48 -07:00
Nicolas Lacasse	2512cc5617	Allow filesystem.Mount to take an optional interface argument. PiperOrigin-RevId: 238360231 Change-Id: I5eaf8d26f8892f77d71c7fbd6c5225ef471cedf1	2019-03-13 19:24:03 -07:00
Ian Gudger	a16f6e50c5	Make HandleLocal apply to all non-loopback interfaces. HandleLocal is very similar conceptually to MULTICAST_LOOP, so we can unify the implementations. This has the benefit of making HandleLocal apply even when the fdbased link endpoint isn't in use. In addition, move looping logic to route creation so that it doesn't need to be run for each packet. This should improve performance. PiperOrigin-RevId: 238099480 Change-Id: I72839f16f25310471453bc9d3fb8544815b25c23	2019-03-12 14:37:56 -07:00
Fabricio Voznika	bc9b979b94	Add profiling commands to runsc Example: runsc debug --root=<dir> \ --profile-heap=/tmp/heap.prof \ --profile-cpu=/tmp/cpu.prod --profile-delay=30 \ <container ID> PiperOrigin-RevId: 237848456 Change-Id: Icff3f20c1b157a84d0922599eaea327320dad773	2019-03-11 11:47:30 -07:00
Ian Gudger	56a6128295	Implement IP_MULTICAST_LOOP. IP_MULTICAST_LOOP controls whether or not multicast packets sent on the default route are looped back. In order to implement this switch, support for sending and looping back multicast packets on the default route had to be implemented. For now we only support IPv4 multicast. PiperOrigin-RevId: 237534603 Change-Id: I490ac7ff8e8ebef417c7eb049a919c29d156ac1c	2019-03-08 15:49:17 -08:00
Fabricio Voznika	0b76887147	Priority-inheritance futex implementation It is Implemented without the priority inheritance part given that gVisor defers scheduling decisions to Go runtime and doesn't have control over it. PiperOrigin-RevId: 236989545 Change-Id: I714c8ca0798743ecf3167b14ffeb5cd834302560	2019-03-05 23:40:18 -08:00
Fabricio Voznika	fcba4e8f04	Add uncaught signal message to the user log This help troubleshoot cases where the container is killed and the app logs don't show the reason. PiperOrigin-RevId: 236982883 Change-Id: I361892856a146cea5b04abaa3aedbf805e123724	2019-03-05 22:20:17 -08:00
Fabricio Voznika	3dbd4a16f8	Add semctl(GETPID) syscall Also added unimplemented notification for semctl(2) commands. PiperOrigin-RevId: 236340672 Change-Id: I0795e3bd2e6d41d7936fabb731884df426a42478	2019-03-01 10:57:02 -08:00
Kevin Krakauer	b75aa51504	Rename ping endpoints to icmp endpoints. PiperOrigin-RevId: 235248572 Change-Id: I5b0538b6feb365a98712c2a2d56d856fe80a8a09	2019-02-22 13:34:47 -08:00
Nicolas Lacasse	0a41ea72c1	Don't allow writing or reading to TTY unless process group is in foreground. If a background process tries to read from a TTY, linux sends it a SIGTTIN unless the signal is blocked or ignored, or the process group is an orphan, in which case the syscall returns EIO. See drivers/tty/n_tty.c:n_tty_read()=>job_control(). If a background process tries to write a TTY, set the termios, or set the foreground process group, linux then sends a SIGTTOU. If the signal is ignored or blocked, linux allows the write. If the process group is an orphan, the syscall returns EIO. See drivers/tty/tty_io.c:tty_check_change(). PiperOrigin-RevId: 234044367 Change-Id: I009461352ac4f3f11c5d42c43ac36bb0caa580f9	2019-02-14 15:47:31 -08:00
Bhasker Hariharan	e0b3d3323f	Add support for using PACKET_RX_RING to receive packets. PACKET_RX_RING allows the use of an mmapped buffer to receive packets from the kernel. This should cut down the number of host syscalls that need to be made to receive packets when the underlying fd is a socket of the AF_PACKET type. PiperOrigin-RevId: 233834998 Change-Id: I8060025c6ced206986e94cc46b8f382b81bfa47f	2019-02-13 14:53:03 -08:00
Nicolas Lacasse	92e85623a0	Factor the subtargets method into a helper method with tests. PiperOrigin-RevId: 232047515 Change-Id: I00f036816e320356219be7b2f2e6d5fe57583a60	2019-02-01 15:23:43 -08:00
Michael Pratt	2a0c69b19f	Remove license comments Nothing reads them and they can simply get stale. Generated with: $ sed -i "s/licenses($.$)./licenses(\1)/" **/BUILD PiperOrigin-RevId: 231818945 Change-Id: Ibc3f9838546b7e94f13f217060d31f4ada9d4bf0	2019-01-31 11:12:53 -08:00
Andrei Vagin	7e8a56087b	runsc: check whether a container is deleted or not before setupContainerFS PiperOrigin-RevId: 231811387 Change-Id: Ib143fb9a4d0fa1f105d1a3a3bd533dfc44e792af	2019-01-31 10:34:15 -08:00
Andrei Vagin	dd577f5410	runsc: reap a sandbox process only in sandbox.Wait() PiperOrigin-RevId: 231504064 Change-Id: I585b769aef04a3ad7e7936027958910a6eed9c8d	2019-01-29 17:15:56 -08:00
Bhasker Hariharan	24cb2c0a72	Use recvmmsg() instead of readv() to read packets from NIC. This should reduce the number of syscalls required to process packets significantly and improve throughputs. PiperOrigin-RevId: 231366886 Change-Id: I8b38077262bf9c53176bc4a94b530188d3d7c0ca	2019-01-29 01:39:01 -08:00
Fabricio Voznika	c1be25b78d	Scrub runsc error messages Removed "error" and "failed to" prefix that don't add value from messages. Adjusted a few other messages. In particular, when the container fail to start, the message returned is easier for humans to read: $ docker run --rm --runtime=runsc alpine foobar docker: Error response from daemon: OCI runtime start failed: <path> did not terminate sucessfully: starting container: starting root container [foobar]: starting sandbox: searching for executable "foobar", cwd: "/", $PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin": no such file or directory Closes #77 PiperOrigin-RevId: 230022798 Change-Id: I83339017c70dae09e4f9f8e0ea2e554c4d5d5cd1	2019-01-18 17:36:02 -08:00
Fabricio Voznika	e4d3ca7263	Prevent internal tmpfs mount to override files in /tmp Runsc wants to mount /tmp using internal tmpfs implementation for performance. However, it risks hiding files that may exist under /tmp in case it's present in the container. Now, it only mounts over /tmp iff: - /tmp was not explicitly asked to be mounted - /tmp is empty If any of this is not true, then /tmp maps to the container's image /tmp. Note: checkpoint doesn't have sentry FS mounted to check if /tmp is empty. It simply looks for explicit mounts right now. PiperOrigin-RevId: 229607856 Change-Id: I10b6dae7ac157ef578efc4dfceb089f3b94cde06	2019-01-16 12:48:32 -08:00
Nicolas Lacasse	dc8450b567	Remove fs.Handle, ramfs.Entry, and all the DeprecatedFileOperations. More helper structs have been added to the fsutil package to make it easier to implement fs.InodeOperations and fs.FileOperations. PiperOrigin-RevId: 229305982 Change-Id: Ib6f8d3862f4216745116857913dbfa351530223b	2019-01-14 20:34:28 -08:00
Michael Pratt	33191e1cc4	Automated rollback of changelist 225089593 PiperOrigin-RevId: 227595007 Change-Id: If14cc5aab869c5fd7a4ebd95929c887ab690e94c	2019-01-02 15:48:00 -08:00
Fabricio Voznika	a891afad6d	Simplify synchronization between runsc and sandbox process Make 'runsc create' join cgroup before creating sandbox process. This removes the need to synchronize platform creation and ensure that sandbox process is charged to the right cgroup from the start. PiperOrigin-RevId: 227166451 Change-Id: Ieb4b18e6ca0daf7b331dc897699ca419bc5ee3a2	2018-12-28 13:48:24 -08:00
Jamie Liu	194ef586fc	Rename limits.MemoryPagesLocked to limits.MemoryLocked. "RLIMIT_MEMLOCK: This is the maximum number of bytes of memory that may be locked into RAM." - getrlimit(2) PiperOrigin-RevId: 226384346 Change-Id: Iefac4a1bb69f7714dc813b5b871226a8344dc800	2018-12-20 13:28:46 -08:00
Googler	86c9bd2547	Automated rollback of changelist 225861605 PiperOrigin-RevId: 226224230 Change-Id: Id24c7d3733722fd41d5fe74ef64e0ce8c68f0b12	2018-12-19 13:30:08 -08:00
Michael Pratt	b62591e6a8	Expose internal testing flag Never to used outside of runsc tests! PiperOrigin-RevId: 225919013 Change-Id: Ib3b14aa2a2564b5246fb3f8933d95e01027ed186	2018-12-17 17:35:06 -08:00
Jamie Liu	2421006426	Implement mlock(), kind of. Currently mlock() and friends do nothing whatsoever. However, mlocking is directly application-visible in a number of ways; for example, madvise(MADV_DONTNEED) and msync(MS_INVALIDATE) both fail on mlocked regions. We handle this inconsistently: MADV_DONTNEED is too important to not work, but MS_INVALIDATE is rejected. Change MM to track mlocked regions in a manner consistent with Linux. It still will not actually pin pages into host physical memory, but: - mlock() will now cause sentry memory management to precommit mlocked pages. - MADV_DONTNEED and MS_INVALIDATE will interact with mlocked pages as described above. PiperOrigin-RevId: 225861605 Change-Id: Iee187204979ac9a4d15d0e037c152c0902c8d0ee	2018-12-17 11:38:59 -08:00
Michael Pratt	24c1158b9c	Add "trace signal" option This option is effectively equivalent to -panic-signal, except that the sandbox does not die after logging the traceback. PiperOrigin-RevId: 225089593 Change-Id: Ifb1c411210110b6104613f404334bd02175e484e	2018-12-11 16:12:41 -08:00
Brian Geffon	d3bc79bc84	Open source system call tests. PiperOrigin-RevId: 224886231 Change-Id: I0fccb4d994601739d8b16b1d4e6b31f40297fb22	2018-12-10 14:42:34 -08:00
Zhaozhong Ni	9984138abe	sentry: turn "dynamically-created" procfs files into static creation. PiperOrigin-RevId: 224600982 Change-Id: I547253528e24fb0bb318fc9d2632cb80504acb34	2018-12-07 17:03:54 -08:00
Brian Geffon	82719be42e	Max link traversals should be for an entire path. The number of symbolic links that are allowed to be followed are for a full path and not just a chain of symbolic links. PiperOrigin-RevId: 224047321 Change-Id: I5e3c4caf66a93c17eeddcc7f046d1e8bb9434a40	2018-12-04 14:32:03 -08:00
Fabricio Voznika	eaac94d91c	Use RET_KILL_PROCESS if available in kernel RET_KILL_THREAD doesn't work well for Go because it will kill only the offending thread and leave the process hanging. RET_TRAP can be masked out and it's not guaranteed to kill the process. RET_KILL_PROCESS is available since 4.14. For older kernel, continue to use RET_TRAP as this is the best option (likely to kill process, easy to debug). PiperOrigin-RevId: 222357867 Change-Id: Icc1d7d731274b16c2125b7a1ba4f7883fbdb2cbd	2018-11-20 22:56:51 -08:00
Fabricio Voznika	fadffa2ff8	Add unsupported syscall events for get/setsockopt PiperOrigin-RevId: 222148953 Change-Id: I21500a9f08939c45314a6414e0824490a973e5aa	2018-11-20 14:04:12 -08:00
Fabricio Voznika	237f9c7a5e	Don't fail when destroyContainerFS is called more than once This can happen when destroy is called multiple times or when destroy failed previously and is being called again. PiperOrigin-RevId: 221882034 Change-Id: I8d069af19cf66c4e2419bdf0d4b789c5def8d19e	2018-11-20 14:03:42 -08:00
Nicolas Lacasse	845836c578	Internal change. PiperOrigin-RevId: 221848471 Change-Id: I882fbe5ce7737048b2e1f668848e9c14ed355665	2018-11-20 14:03:11 -08:00
Nicolas Lacasse	40f843fc78	Internal change. PiperOrigin-RevId: 221343626 Change-Id: I03d57293a555cf4da9952a81803b9f8463173c89	2018-11-13 15:18:17 -08:00
Nicolas Lacasse	6c2d320138	Internal change. PiperOrigin-RevId: 221299066 Change-Id: I8ae352458f9976c329c6946b1efa843a3de0eaa4	2018-11-13 11:14:28 -08:00
Fabricio Voznika	d97ccfa346	Close donated files if containerManager.Start() fails PiperOrigin-RevId: 220869535 Change-Id: I9917e5daf02499f7aab6e2aa4051c54ff4461b9a	2018-11-09 14:54:34 -08:00
Nicolas Lacasse	13b48f2e6a	AsyncBarrier should be run after all defers in destroyContainerFS. destroyContainerFS must wait for all async operations to finish before returning. In an attempt to do this, we call fs.AsyncBarrier() at the end of the function. However, there are many defer'd DecRefs which end up running AFTER the AsyncBarrier() call. This CL fixes this by calling fs.AsyncBarrier() in the first defer statement, thus ensuring that it runs at the end of the function, after all other defers. PiperOrigin-RevId: 220523545 Change-Id: I5e96ee9ea6d86eeab788ff964484c50ef7f64a2f	2018-11-07 13:55:36 -08:00
Fabricio Voznika	c92b9b7086	Add more logging to controller.go PiperOrigin-RevId: 220519632 Change-Id: Iaeec007fc1aa3f0b72569b288826d45f2534c4bf	2018-11-07 13:33:19 -08:00
Fabricio Voznika	86b3f0cd24	Fix race between start and destroy Before this change, a container starting up could race with destroy (aka delete) and leave processes behind. Now, whenever a container is created, Loader.processes gets a new entry. Start now expects the entry to be there, and if it's not it means that the container was deleted. I've also fixed Loader.waitPID to search for the process using the init process's PID namespace. We could use a few more tests for signal and wait. I'll send them in another cl. PiperOrigin-RevId: 220224290 Change-Id: I15146079f69904dc07d43c3b66cc343a2dab4cc4	2018-11-05 21:29:37 -08:00
Fabricio Voznika	a467f09261	Log when external signal is received PiperOrigin-RevId: 220204591 Change-Id: I21a9c6f5c12a376d18da5d10c1871837c4f49ad2	2018-11-05 17:42:24 -08:00
Fabricio Voznika	5cd55cd90f	Use spec with clean paths for gofer Otherwise the gofer's attach point may be different from sandbox when there symlinks in the path. PiperOrigin-RevId: 219730492 Change-Id: Ia9c4c2d16228c6a1a9e790e0cb673fd881003fe1	2018-11-01 17:52:11 -07:00
Fabricio Voznika	b6b81fd04b	Add new log format that is compatible with Kubernetes Fluentd configuration uses 'log' for the log message while containerd uses 'msg'. Since we can't have a single JSON format for both, add another log format and make debug log configurable. PiperOrigin-RevId: 219729658 Change-Id: I2a6afc4034d893ab90bafc63b394c4fb62b2a7a0	2018-11-01 17:44:58 -07:00
Ian Lewis	9d69d85bc1	Make error messages a bit more user friendly. Updated error messages so that it doesn't print full Go struct representations when running a new container in a sandbox. For example, this occurs frequently when commands are not found when doing a 'kubectl exec'. PiperOrigin-RevId: 219729141 Change-Id: Ic3a7bc84cd7b2167f495d48a1da241d621d3ca09	2018-11-01 17:40:09 -07:00
Adin Scannell	0091db9cbd	kvm: use private futexes. Use private futexes for performance and to align with other runtime uses. PiperOrigin-RevId: 219422634 Change-Id: Ief2af5e8302847ea6dc246e8d1ee4d64684ca9dd	2018-10-30 22:46:42 -07:00
Kevin Krakauer	b42a2a3203	Removes outdated TODO. PiperOrigin-RevId: 219151173 Change-Id: I73014ea648ae485692ea0d44860c87f4365055cb	2018-10-29 10:31:56 -07:00
Adin Scannell	75cd70ecc9	Track paths and provide a rename hook. This change also adds extensive testing to the p9 package via mocks. The sanity checks and type checks are moved from the gofer into the core package, where they can be more easily validated. PiperOrigin-RevId: 218296768 Change-Id: I4fc3c326e7bf1e0e140a454cbacbcc6fd617ab55	2018-10-23 00:20:15 -07:00
Fabricio Voznika	b2068cf5a5	Add more unimplemented syscall events Added events for ctl syscalls that may have multiple different commands. For runsc, each syscall event is only logged once. For ctl syscalls, use the cmd as identifier, not only the syscall number. PiperOrigin-RevId: 218015941 Change-Id: Ie3c19131ae36124861e9b492a7dbe1765d9e5e59	2018-10-20 11:14:23 -07:00
Ian Gudger	8fce67af24	Use correct company name in copyright header PiperOrigin-RevId: 217951017 Change-Id: Ie08bf6987f98467d07457bcf35b5f1ff6e43c035	2018-10-19 16:35:11 -07:00
Fabricio Voznika	f3ffa4db52	Resolve mount paths while setting up root fs mount It's hard to resolve symlinks inside the sandbox because rootfs and mounts may be read-only, forcing us to create mount points inside lower layer of an overlay, before the volumes are mounted. Since the destination must already be resolved outside the sandbox when creating mounts, take this opportunity to rewrite the spec with paths resolved. "runsc boot" will use the "resolved" spec to load mounts. In addition, symlink traversals were disabled while mounting containers inside the sandbox. It haven't been able to write a good test for it. So I'm relying on manual tests for now. PiperOrigin-RevId: 217749904 Change-Id: I7ac434d5befd230db1488446cda03300cc0751a9	2018-10-18 12:42:24 -07:00
Nicolas Lacasse	e0bb94201f	Close the gofer socket gracefully in boot:boot_test. We were closing the FD directly. If the test then created a new socket pair with the same FD, in-flight RPCs would get directed to the new socket and break the test. Instead, we should use unet.Socket.Close(), which allows any in-flight RPCs to finish. PiperOrigin-RevId: 217608491 Change-Id: I8c5a76638899ba30f33ca976e6fac967fa0aadbf	2018-10-17 16:18:39 -07:00
Nicolas Lacasse	4e6f0892c9	runsc: Support job control signals for the root container. Now containers run with "docker run -it" support control characters like ^C and ^Z. This required refactoring our signal handling a bit. Signals delivered to the "runsc boot" process are turned into loader.Signal calls with the appropriate delivery mode. Previously they were always sent directly to PID 1. PiperOrigin-RevId: 217566770 Change-Id: I5b7220d9a0f2b591a56335479454a200c6de8732	2018-10-17 12:29:05 -07:00
Kevin Krakauer	8cbca46b6d	Remove incorrect TODO. PiperOrigin-RevId: 217548429 Change-Id: Ie640c881fdc4fc70af58c8ca834df1ac531e519a	2018-10-17 10:55:34 -07:00
Kevin Krakauer	9b3550f70b	runsc: Add --pid flag to runsc kill. --pid allows specific processes to be signalled rather than the container root process or all processes in the container. containerd needs to SIGKILL exec'd processes that timeout and check whether processes are still alive. PiperOrigin-RevId: 217547636 Change-Id: I2058ebb548b51c8eb748f5884fb88bad0b532e45	2018-10-17 10:51:39 -07:00
Nicolas Lacasse	3bc5e6482b	Fix reference leak in tests. PiperOrigin-RevId: 216780438 Change-Id: Ide637fe36f8d2a61fea9e5b16d1b3401f2540416	2018-10-11 16:23:54 -07:00
Fabricio Voznika	e68d86e1bd	Make debug log file name configurable This is a breaking change if you're using --debug-log-dir. The fix is to replace it with --debug-log and add a '/' at the end: --debug-log-dir=/tmp/runsc ==> --debug-log=/tmp/runsc/ PiperOrigin-RevId: 216761212 Change-Id: I244270a0a522298c48115719fa08dad55e34ade1	2018-10-11 14:29:37 -07:00
Fabricio Voznika	f413e4b117	Add bare bones unsupported syscall logging This change introduces a new flags to create/run called --user-log. Logs to this files are visible to users and are meant to help debugging problems with their images and containers. For now only unsupported syscalls are sent to this log, and only minimum support was added. We can build more infrastructure around it as needed. PiperOrigin-RevId: 216735977 Change-Id: I54427ca194604991c407d49943ab3680470de2d0	2018-10-11 11:56:54 -07:00
Kevin Krakauer	e21ba16d9c	Removes irrelevant TODO. PiperOrigin-RevId: 216616873 Change-Id: I4d974ab968058eadd01542081e18a987ef08f50a	2018-10-10 16:50:59 -07:00

1 2 3 4 5 ...

262 Commits