gvisor

Commit Graph

Author	SHA1	Message	Date
Fabricio Voznika	cff2c57192	Fix bad merge PiperOrigin-RevId: 235818534 Change-Id: I99f7e3fd1dc808b35f7a08b96b7c3226603ab808	2019-02-26 16:42:06 -08:00
Ruidong Cao	a2b794b30d	FPE_INTOVF (integer overflow) should be 2 refer to Linux. Signed-off-by: Ruidong Cao <crdfrank@gmail.com> Change-Id: I03f8ab25cf29257b31f145cf43304525a93f3300 PiperOrigin-RevId: 235763203	2019-02-26 11:48:49 -08:00
Fabricio Voznika	23fe059761	Lazily allocate inotify map on inode PiperOrigin-RevId: 235735865 Change-Id: I84223eb18eb51da1fa9768feaae80387ff6bfed0	2019-02-26 09:33:44 -08:00
Fabricio Voznika	10426e0f31	Handle invalid offset in sendfile(2) PiperOrigin-RevId: 235578698 Change-Id: I608ff5e25eac97f6e1bda058511c1f82b0e3b736	2019-02-25 12:17:46 -08:00
Googler	532f4b2fba	Internal change. PiperOrigin-RevId: 235053594 Change-Id: Ie3d7b11843d0710184a2463886c7034e8f5305d1	2019-02-21 13:08:34 -08:00
Haibo Xu	15d3189884	Make some ptrace commands x86-only Signed-off-by: Haibo Xu <haibo.xu@arm.com> Change-Id: I9751f859332d433ca772d6b9733f5a5a64398ec7 PiperOrigin-RevId: 234877624	2019-02-20 15:10:59 -08:00
Amanda Tait	ea070b9d5f	Implement Broadcast support This change adds support for the SO_BROADCAST socket option in gVisor Netstack. This support includes getsockopt()/setsockopt() functionality for both UDP and TCP endpoints (the latter being a NOOP), dispatching broadcast messages up and down the stack, and route finding/creation for broadcast packets. Finally, a suite of tests have been implemented, exercising this functionality through the Linux syscall API. PiperOrigin-RevId: 234850781 Change-Id: If3e666666917d39f55083741c78314a06defb26c	2019-02-20 12:54:13 -08:00
Kevin Krakauer	ec2460b189	netstack: Add SIOCGSTAMP support. Ping sometimes uses this instead of SO_TIMESTAMP. PiperOrigin-RevId: 234699590 Change-Id: Ibec9c34fa0d443a931557a2b1b1ecd83effe7765	2019-02-19 16:41:32 -08:00
Jamie Liu	bed6f8534b	Set rax to syscall number on SECCOMP_RET_TRAP. PiperOrigin-RevId: 234690475 Change-Id: I1cbfb5aecd4697a4a26ec8524354aa8656cc3ba1	2019-02-19 15:49:37 -08:00
Jamie Liu	bb47d8a545	Fix clone(CLONE_NEWUSER). - Use new user namespace for namespace creation checks. - Ensure userns is never nil since it's used by other namespaces. PiperOrigin-RevId: 234673175 Change-Id: I4b9d9d1e63ce4e24362089793961a996f7540cd9	2019-02-19 14:20:05 -08:00
Jamie Liu	22d8b6eba1	Break /proc/[pid]/{uid,gid}_map's dependence on seqfile. In addition to simplifying the implementation, this fixes two bugs: - seqfile.NewSeqFile unconditionally creates an inode with mode 0444, but {uid,gid}_map have mode 0644. - idMapSeqFile.Write implements fs.FileOperations.Write ... but it doesn't implement any other fs.FileOperations methods and is never used as fs.FileOperations. idMapSeqFile.GetFile() => seqfile.SeqFile.GetFile() uses seqfile.seqFileOperations instead, which rejects all writes. PiperOrigin-RevId: 234638212 Change-Id: I4568f741ab07929273a009d7e468c8205a8541bc	2019-02-19 11:21:46 -08:00
Ian Gudger	c611dbc5a7	Implement IP_MULTICAST_IF. This allows setting a default send interface for IPv4 multicast. IPv6 support will come later. PiperOrigin-RevId: 234251379 Change-Id: I65922341cd8b8880f690fae3eeb7ddfa47c8c173	2019-02-15 18:40:15 -08:00
Kevin Krakauer	a9cb3dcd9d	Move SO_TIMESTAMP from different transport endpoints to epsocket. SO_TIMESTAMP is reimplemented in ping and UDP sockets (and needs to be added for TCP), but can just be implemented in epsocket for simplicity. This will also make SIOCGSTAMP easier to implement. PiperOrigin-RevId: 234179300 Change-Id: Ib5ea0b1261dc218c1a8b15a65775de0050fe3230	2019-02-15 11:18:44 -08:00
Fabricio Voznika	e34d27e8b6	Redirect FIXME to more appropriate bug PiperOrigin-RevId: 234147487 Change-Id: I779a6012832bb94a6b89f5bcc7d821b40ae969cc	2019-02-15 08:23:27 -08:00
Nicolas Lacasse	0a41ea72c1	Don't allow writing or reading to TTY unless process group is in foreground. If a background process tries to read from a TTY, linux sends it a SIGTTIN unless the signal is blocked or ignored, or the process group is an orphan, in which case the syscall returns EIO. See drivers/tty/n_tty.c:n_tty_read()=>job_control(). If a background process tries to write a TTY, set the termios, or set the foreground process group, linux then sends a SIGTTOU. If the signal is ignored or blocked, linux allows the write. If the process group is an orphan, the syscall returns EIO. See drivers/tty/tty_io.c:tty_check_change(). PiperOrigin-RevId: 234044367 Change-Id: I009461352ac4f3f11c5d42c43ac36bb0caa580f9	2019-02-14 15:47:31 -08:00
Jamie Liu	0e84ae72e0	Improve safecopy sanity checks. - Fix CopyIn/CopyOut/ZeroOut range checks. - Include the faulting signal number in the panic message. PiperOrigin-RevId: 233829501 Change-Id: I8959ead12d05dbd4cd63c2b908cddeb2a27eb513	2019-02-13 14:25:15 -08:00
Googler	7aaa6cf225	Internal change. PiperOrigin-RevId: 233802562 Change-Id: I40e1b13fd571daaf241b00f8df4bcedd034dc3f1	2019-02-13 12:07:34 -08:00
Nicolas Lacasse	f17692d807	Add fs.AsyncWithContext and call it in fs/gofer/inodeOperations.Release. fs/gofer/inodeOperations.Release does some asynchronous work. Previously it was calling fs.Async with an anonymous function, which caused the function to be allocated on the heap. Because Release is relatively hot, this results in a lot of small allocations and increased GC pressure, noticeable in perf profiles. This CL adds a new function, AsyncWithContext, which is just like Async, but passes a context to the async function. It avoids the need for an extra anonymous function in fs/gofer/inodeOperations.Release. The Async function itself still requires a single anonymous function. PiperOrigin-RevId: 233141763 Change-Id: I1dce4a883a7be9a8a5b884db01e654655f16d19c	2019-02-08 15:54:15 -08:00
Nicolas Lacasse	e884168e1e	Encode stat to bytes manually, instead of calling CopyObjectOut. CopyObjectOut grows its destination byte slice incrementally, causing many small slice allocations on the heap. This leads to increased GC and noticeably slower stat calls. PiperOrigin-RevId: 233140904 Change-Id: Ieb90295dd8dd45b3e56506fef9d7f86c92e97d97	2019-02-08 15:48:23 -08:00
Nicolas Lacasse	9c9386d2a8	CopyObjectOut should allocate a byte slice the size of the encoded object. This adds an extra Reflection call to CopyObjectOut, but avoids many small slice allocations if the object is large, since without this we grow the backing slice incrementally as we encode more data. PiperOrigin-RevId: 233110960 Change-Id: I93569af55912391e5471277f779139c23f040147	2019-02-08 13:00:00 -08:00
Ian Gudger	80f901b16b	Plumb IP_ADD_MEMBERSHIP and IP_DROP_MEMBERSHIP to netstack. Also includes a few fixes for IPv4 multicast support. IPv6 support is coming in a followup CL. PiperOrigin-RevId: 233008638 Change-Id: If7dae6222fef43fda48033f0292af77832d95e82	2019-02-07 23:15:23 -08:00
Rahat Mahmood	2ba74f84be	Implement /proc/net/unix. PiperOrigin-RevId: 232948478 Change-Id: Ib830121e5e79afaf5d38d17aeef5a1ef97913d23	2019-02-07 14:44:21 -08:00
Nicolas Lacasse	fcae058a14	Make context.Background return a global background context. It currently allocates a new context on the heap each time it is called. Some of these are in relatively hot paths like signal delivery and releasing gofer inodes. It is also called very commonly in afterLoad. All of these should benefit from fewer heap allocations. PiperOrigin-RevId: 232938873 Change-Id: I53cec0ca299f56dcd4866b0b4fd2ec4938526849	2019-02-07 13:55:23 -08:00
Fabricio Voznika	9ef3427ac1	Implement semctl(2) SETALL and GETALL PiperOrigin-RevId: 232914984 Change-Id: Id2643d7ad8e986ca9be76d860788a71db2674cda	2019-02-07 11:41:44 -08:00
Zach Koopmans	0cf7fc4e11	Change /proc/PID/cmdline to read environment vector. - Change proc to return envp on overwrite of argv with limitations from upstream. - Add unit tests - Change layout of argv/envp on the stack so that end of argv is contiguous with beginning of envp. PiperOrigin-RevId: 232506107 Change-Id: I993880499ab2c1220f6dc456a922235c49304dec	2019-02-05 10:02:06 -08:00
Fabricio Voznika	2d20b121d7	CachingInodeOperations was over-dirtying cached attributes Dirty should be set only when the attribute is changed in the cache only. Instances where the change was also sent to the backing file doesn't need to dirty the attribute. Also remove size update during WriteOut as writing dirty page would naturaly grow the file if needed. RELNOTES: relnotes is needed for the parent CL. PiperOrigin-RevId: 232068978 Change-Id: I00ba54693a2c7adc06efa9e030faf8f2e8e7f188	2019-02-01 17:51:48 -08:00
Nicolas Lacasse	92e85623a0	Factor the subtargets method into a helper method with tests. PiperOrigin-RevId: 232047515 Change-Id: I00f036816e320356219be7b2f2e6d5fe57583a60	2019-02-01 15:23:43 -08:00
Michael Pratt	fe1369ac98	Move package sync to third_party PiperOrigin-RevId: 231889261 Change-Id: I482f1df055bcedf4edb9fe3fe9b8e9c80085f1a0	2019-01-31 17:49:14 -08:00
Michael Pratt	88b4ce8cac	Fix comment PiperOrigin-RevId: 231861005 Change-Id: I134d4e20cc898d44844219db0a8aacda87e11ef0	2019-01-31 15:03:12 -08:00
Fabricio Voznika	a497f5ed5f	Invalidate COW mappings when file is truncated This changed required making fsutil.HostMappable use a backing file to ensure the correct FD would be used for read/write operations. RELNOTES: relnotes is needed for the parent CL. PiperOrigin-RevId: 231836164 Change-Id: I8ae9639715529874ea7d80a65e2c711a5b4ce254	2019-01-31 12:54:00 -08:00
Michael Pratt	2a0c69b19f	Remove license comments Nothing reads them and they can simply get stale. Generated with: $ sed -i "s/licenses($.$)./licenses(\1)/" **/BUILD PiperOrigin-RevId: 231818945 Change-Id: Ibc3f9838546b7e94f13f217060d31f4ada9d4bf0	2019-01-31 11:12:53 -08:00
Haibo Xu	cedff8d3ae	Add muldiv/rd_tsc support for arm64 platform. Signed-off-by: Haibo Xu <haibo.xu@arm.com> Change-Id: If35459be78e023346a140184401172f8e023c7f9 PiperOrigin-RevId: 231638020	2019-01-30 11:49:08 -08:00
Zhaozhong Ni	ae6e37df2a	Convert TODO into FIXME. PiperOrigin-RevId: 231301228 Change-Id: I3e18f3a12a35fb89a22a8c981188268d5887dc61	2019-01-28 15:34:18 -08:00
Nicolas Lacasse	09cf3b40a8	Fix data race in InodeSimpleAttributes.Unstable. We were modifying InodeSimpleAttributes.Unstable.AccessTime without holding the necessary lock. Luckily for us, InodeSimpleAttributes already has a NotifyAccess method that will do the update while holding the lock. In addition, we were holding dfo.dir.mu.Lock while setting AccessTime, which is unnecessary, so that lock has been removed. PiperOrigin-RevId: 231278447 Change-Id: I81ed6d3dbc0b18e3f90c1df5e5a9c06132761769	2019-01-28 13:26:28 -08:00
Jamie Liu	1cedccf8e9	Drop the one-page limit for /proc/[pid]/{cmdline,environ}. It never actually should have applied to environ (the relevant change in Linux 4.2 is c2c0bb44620d "proc: fix PAGE_SIZE limit of /proc/$PID/cmdline"), and we claim to be Linux 4.4 now anyway. PiperOrigin-RevId: 231250661 Change-Id: I37f9c4280a533d1bcb3eebb7803373ac3c7b9f15	2019-01-28 11:00:23 -08:00
Fabricio Voznika	55e8eb775b	Make cacheRemoteRevalidating detect changes to file size When file size changes outside the sandbox, page cache was not refreshing file size which is required for cacheRemoteRevalidating. In fact, cacheRemoteRevalidating should be skipping the cache completely since it's not really benefiting from it. The cache is cache is already bypassed for unstable attributes (see cachePolicy.cacheUAttrs). And althought the cache is called to map pages, they will always miss the cache and map directly from the host. Created a HostMappable struct that maps directly to the host and use it for files with cacheRemoteRevalidating. Closes #124 PiperOrigin-RevId: 230998440 Change-Id: Ic5f632eabe33b47241e05e98c95e9b2090ae08fc	2019-01-25 17:23:07 -08:00
Adin Scannell	b5088ba59c	cleanup: extract the kernel from context Change-Id: I94704a90beebb53164325e0cce1fcb9a0b97d65c PiperOrigin-RevId: 230817308	2019-01-24 17:02:52 -08:00
Rahat Mahmood	8d7c10e908	Display /proc/net entries for all network configurations. Most of the entries are stubbed out at the moment, but even those were only displayed if IPv6 support was enabled. The entries should be displayed with IPv4-support only, and with only loopback devices. PiperOrigin-RevId: 229946441 Change-Id: I18afaa3af386322787f91bf9d168ab66c01d5a4c	2019-01-18 10:02:12 -08:00
Nicolas Lacasse	12bc7834dc	Allow fsync on a directory. PiperOrigin-RevId: 229781337 Change-Id: I1f946cff2771714fb1abd83a83ed454e9febda0a	2019-01-17 11:06:59 -08:00
Nicolas Lacasse	dc8450b567	Remove fs.Handle, ramfs.Entry, and all the DeprecatedFileOperations. More helper structs have been added to the fsutil package to make it easier to implement fs.InodeOperations and fs.FileOperations. PiperOrigin-RevId: 229305982 Change-Id: Ib6f8d3862f4216745116857913dbfa351530223b	2019-01-14 20:34:28 -08:00
Zach Koopmans	7f8de3bf92	Fixing select call to not enforce RLIMIT_NOFILE. Removing check to RLIMIT_NOFILE in select call. Adding unit test to select suite to document behavior. Moving setrlimit class from mlock to a util file for reuse. Fixing flaky test based on comments from Jamie. PiperOrigin-RevId: 228726131 Change-Id: Ie9dbe970bbf835ba2cca6e17eec7c2ee6fadf459	2019-01-10 09:44:45 -08:00
Jamie Liu	9270d940eb	Minor memevent fixes. - Call MemoryEvents.done.Add(1) outside of MemoryEvents.run() so that if MemoryEvents.Stop() => MemoryEvents.done.Wait() is called before the goroutine starts running, it still waits for the goroutine to stop. - Use defer to call MemoryEvents.done.Done() in MemoryEvents.run() so that it's called even if the goroutine panics. PiperOrigin-RevId: 228623307 Change-Id: I1b0459e7999606c1a1a271b16092b1ca87005015	2019-01-09 17:54:40 -08:00
Nicolas Lacasse	d321f575e2	Fix lock order violation. overlayFileOperations.Readdir was holding overlay.copyMu while calling DirentReaddir, which then attempts to take take the corresponding Dirent.mu, causing a lock order violation. (See lock order documentation in fs/copy_up.go.) We only actually need to hold copyMu during readdirEntries(), so holding the lock is moved in there, thus avoiding the lock order violation. A new lock was added to protect overlayFileOperations.dirCache. We were inadvertently relying on copyMu to protect this. There is no reason it should not have its own lock. PiperOrigin-RevId: 228542473 Change-Id: I03c3a368c8cbc0b5a79d50cc486fc94adaddc1c2	2019-01-09 10:29:36 -08:00
Brian Geffon	dd761c170c	Allow MSG_OOB and MSG_DONTROUTE to be no-ops on recvmsg(2). PiperOrigin-RevId: 228428223 Change-Id: I433ba5ffc15ea4c2706ec944901b8269b1f364f8	2019-01-08 17:13:17 -08:00
Brian Geffon	3676b7ff1c	Improve loader related error messages returned to users. PiperOrigin-RevId: 228382827 Change-Id: Ica1d30e0df826bdd77f180a5092b2b735ea5c804	2019-01-08 12:58:08 -08:00
Jamie Liu	f95b94fbe3	Grant no initial capabilities to non-root UIDs. See modified comment in auth.NewUserCredentials(); compare to the behavior of setresuid(2) as implemented by //pkg/sentry/kernel/task_identity.go:kernel.Task.setKUIDsUncheckedLocked(). PiperOrigin-RevId: 228381765 Change-Id: I45238777c8f63fcf41b99fce3969caaf682fe408	2019-01-08 12:52:24 -08:00
Jamie Liu	dc4849e49c	Add usermem support for arm64 platform. Signed-off-by: Haibo Xu <haibo.xu@arm.com> PiperOrigin-RevId: 228249611 Change-Id: I1046e70bec4274f18b9948eefd6b0d546e4c48bb	2019-01-07 15:40:26 -08:00
Jamie Liu	901ed5da44	Implement /proc/[pid]/smaps. PiperOrigin-RevId: 228245523 Change-Id: I5a4d0a6570b93958e51437e917e5331d83e23a7e	2019-01-07 15:17:44 -08:00
Fabricio Voznika	8e586db162	Add /proc/net/psched content FIO reads this file and expects it to be well formed. PiperOrigin-RevId: 227554483 Change-Id: Ia48ae2377626dd6a2daf17b5b4f5119f90ece55b	2019-01-02 11:39:57 -08:00
Andrei Vagin	652d068119	Implement SO_REUSEPORT for TCP and UDP sockets This option allows multiple sockets to be bound to the same port. Incoming packets are distributed to sockets using a hash based on source and destination addresses. This means that all packets from one sender will be received by the same server socket. PiperOrigin-RevId: 227153413 Change-Id: I59b6edda9c2209d5b8968671e9129adb675920cf	2018-12-28 11:27:14 -08:00
Fabricio Voznika	46e6577014	Fix deadlock between epoll_wait and getdents epoll_wait acquires EventPoll.listsMu (in EventPoll.ReadEvents) and then calls Inotify.Readiness which tries to acquire Inotify.evMu. getdents acquires Inotify.evMu (in Inotify.queueEvent) and then calls readyCallback.Callback which tries to acquire EventPoll.listsMu. The fix is to release Inotify.evMu before calling Queue.Notify. Queue is thread-safe and doesn't require Inotify.evMu to be held. Closes #121 PiperOrigin-RevId: 227066695 Change-Id: Id29364bb940d1727f33a5dff9a3c52f390c15761	2018-12-27 14:59:50 -08:00
Ian Gudger	bce2f9751f	Plumb IP_MULTICAST_TTL to netstack. PiperOrigin-RevId: 226993086 Change-Id: I71757f231436538081d494da32ca69f709bc71c7	2018-12-26 23:52:12 -08:00
Brian Geffon	bfa2f314ca	Add EventChannel messages for uncaught signals. PiperOrigin-RevId: 226936778 Change-Id: I2a6dda157c55d39d81e1b543ab11a58a0bfe5c05	2018-12-26 11:26:28 -08:00
Ian Gudger	0df0df35fc	Stub out SO_OOBINLINE. We don't explicitly support out-of-band data and treat it like normal in-band data. This is equilivent to SO_OOBINLINE being enabled, so always report that it is enabled. PiperOrigin-RevId: 226572742 Change-Id: I4c30ccb83265e76c30dea631cbf86822e6ee1c1b	2018-12-21 19:46:55 -08:00
Ian Gudger	b515556519	Implement SO_KEEPALIVE, TCP_KEEPIDLE, and TCP_KEEPINTVL. Within gVisor, plumb new socket options to netstack. Within netstack, fix GetSockOpt and SetSockOpt return value logic. PiperOrigin-RevId: 226532229 Change-Id: If40734e119eed633335f40b4c26facbebc791c74	2018-12-21 13:13:45 -08:00
Fabricio Voznika	1679ef31ef	inotify notifies watchers when control events bit are set The code that matches the event being published with events watchers was wronly matching all watchers in case any of the control event bits were set. Issue #121 PiperOrigin-RevId: 226521230 Change-Id: Ie2c42bc4366faaf59fbf80a74e9297499bd93f9e	2018-12-21 11:54:02 -08:00
Jamie Liu	9a442fa4b5	Automated rollback of changelist 226224230 PiperOrigin-RevId: 226493053 Change-Id: Ia98d1cb6dd0682049e4d907ef69619831de5c34a	2018-12-21 08:23:34 -08:00
Nicolas Lacasse	8ba450363f	Deflake gofer_test. We must wait for all lazy resources to be released before closing the rootFile. PiperOrigin-RevId: 226419499 Change-Id: I1d4d961a92b3816e02690cf3eaf0a88944d730cc	2018-12-20 17:23:26 -08:00
Ian Gudger	f6274804e1	Make read and write respect SO_RCVTIMEO and SO_SNDTIMEO PiperOrigin-RevId: 226387521 Change-Id: I0579ab262320fde6c72d2994dd38437f01a99ea5	2018-12-20 13:48:52 -08:00
Jamie Liu	194ef586fc	Rename limits.MemoryPagesLocked to limits.MemoryLocked. "RLIMIT_MEMLOCK: This is the maximum number of bytes of memory that may be locked into RAM." - getrlimit(2) PiperOrigin-RevId: 226384346 Change-Id: Iefac4a1bb69f7714dc813b5b871226a8344dc800	2018-12-20 13:28:46 -08:00
Googler	86c9bd2547	Automated rollback of changelist 225861605 PiperOrigin-RevId: 226224230 Change-Id: Id24c7d3733722fd41d5fe74ef64e0ce8c68f0b12	2018-12-19 13:30:08 -08:00
Zach Koopmans	ff7178a4d1	Implement pwritev2. Implement pwritev2 and associated unit tests. Clean up preadv2 unit tests. Tag RWF_ flags in both preadv2 and pwritev2 with associated bug tickets. PiperOrigin-RevId: 226222119 Change-Id: Ieb22672418812894ba114bbc88e67f1dd50de620	2018-12-19 13:16:06 -08:00
Jamie Liu	898838e34d	Fix mremap expansion with mm.checkInvariants = true. Also remove useless RSS changes in mm.movePMAsLocked(). PiperOrigin-RevId: 226052996 Change-Id: If59fd259b93238fb2f15c1c8ebfeda14cb590a87	2018-12-18 13:50:33 -08:00
Jamie Liu	3b3f026278	Truncate ar before calling mm.breakCopyOnWriteLocked(). ... as required by the latter's precondition. PiperOrigin-RevId: 226033824 Change-Id: I6bc46d0e100c61cc58cb5fc69e70c4ca905cd92d	2018-12-18 11:52:31 -08:00
Fabricio Voznika	03226cd950	Add BPFAction type with Stringer PiperOrigin-RevId: 226018694 Change-Id: I98965e26fe565f37e98e5df5f997363ab273c91b	2018-12-18 10:28:28 -08:00
Ian Gudger	12c7430a01	Fix recv blocking for connectionless Unix sockets. Connectionless Unix sockets (DGRAM Unix sockets created with the socket system call) inherently only have a read queue. They do not establish bidirectional connections, instead, the connect system call only sets a default send location. Writes give the data to the other endpoint which has its own read queue. To simplify the code, connectionless Unix sockets still get read and write queues, but the write queue is a dummy and never waited on. The read queue is the connectionless endpoint's queue. This change fixes a bug where the dummy queue was incorrectly set as the read queue and the endpoint's queue was incorrectly set as the write queue. This meant that read notifications went to the dummy queue and were black holed. PiperOrigin-RevId: 225921042 Change-Id: I8d9059def787a2c3c305185b92d05093fbd2be2a	2018-12-17 17:53:22 -08:00
Nicolas Lacasse	d3ae74d2a5	overlayBoundEndpoint must be recursive if there is an overlay in the lower. The old overlayBoundEndpoint assumed that the lower is not an overlay. It should check if the lower is an overlay and handle that case. PiperOrigin-RevId: 225882303 Change-Id: I60660c587d91db2826e0719da0983ec8ad024cb8	2018-12-17 13:46:57 -08:00
Jamie Liu	2421006426	Implement mlock(), kind of. Currently mlock() and friends do nothing whatsoever. However, mlocking is directly application-visible in a number of ways; for example, madvise(MADV_DONTNEED) and msync(MS_INVALIDATE) both fail on mlocked regions. We handle this inconsistently: MADV_DONTNEED is too important to not work, but MS_INVALIDATE is rejected. Change MM to track mlocked regions in a manner consistent with Linux. It still will not actually pin pages into host physical memory, but: - mlock() will now cause sentry memory management to precommit mlocked pages. - MADV_DONTNEED and MS_INVALIDATE will interact with mlocked pages as described above. PiperOrigin-RevId: 225861605 Change-Id: Iee187204979ac9a4d15d0e037c152c0902c8d0ee	2018-12-17 11:38:59 -08:00
Adin Scannell	5d8cf31346	Move fdnotifier package to reduce internal confusion. PiperOrigin-RevId: 225632398 Change-Id: I909e7e2925aa369adc28e844c284d9a6108e85ce	2018-12-14 18:05:01 -08:00
Andrei Vagin	3cf84e3bef	Mark sync.Mutex in TTYFileOperations as nosave PiperOrigin-RevId: 225621767 Change-Id: Ie3a42cdf0b0de22a020ff43e307bf86409cff329	2018-12-14 16:24:21 -08:00
Ian Gudger	e1dcf92ec5	Implement SO_SNDTIMEO PiperOrigin-RevId: 225620490 Change-Id: Ia726107b3f58093a5f881634f90b071b32d2c269	2018-12-14 16:15:06 -08:00
Ian Gudger	4659f7ed1a	Fix WAITALL and RCVTIMEO interaction PiperOrigin-RevId: 225424296 Change-Id: I60fcc2b859339dca9963cb32227a287e719ab765	2018-12-13 13:20:46 -08:00
Rahat Mahmood	ccce1d4281	Filesystems shouldn't be saving references to Platform. Platform objects are not savable, storing references to them in filesystem datastructures would cause save to fail if someone actually passed in a Platform. Current implementations work because everywhere a Platform is expected, we currently pass in a Kernel object which embeds Platform and thus satisfies the interface. Eliminate this indirection and save pointers to Kernel directly. PiperOrigin-RevId: 225288336 Change-Id: Ica399ff43f425e15bc150a0d7102196c3d54a2ab	2018-12-12 17:47:55 -08:00
Rahat Mahmood	f93c288dd7	Fix a data race on Shm.key. PiperOrigin-RevId: 225240907 Change-Id: Ie568ce3cd643f3e4a0eaa0444f4ed589dcf6031f	2018-12-12 13:18:48 -08:00
Rahat Mahmood	75e39eaa74	Pass information about map writableness to filesystems. This is necessary to implement file seals for memfds. PiperOrigin-RevId: 225239394 Change-Id: Ib3f1ab31385afc4b24e96cd81a05ef1bebbcbb70	2018-12-12 13:09:59 -08:00
Michael Pratt	2b6df6a204	Format unshare flags unshare actually takes a subset of clone flags, but has no unique flags, so formatting as clone flags is close enough. PiperOrigin-RevId: 225082774 Change-Id: I5b580f18607c7785f323e37809094115520a17c0	2018-12-11 15:33:14 -08:00
Christopher Koch	5934fad1d7	Remove unused envv variable from two funcs. PiperOrigin-RevId: 225041520 Change-Id: Ib1afc693e592d308d60db82022c5b7743fd3c646	2018-12-11 11:40:16 -08:00
Haibo Xu	52fe3b87a4	Add safecopy support for arm64 platform. Signed-off-by: Haibo Xu <haibo.xu@arm.com> Change-Id: I565214581eeb44045169da7f44d45a489082ac3a PiperOrigin-RevId: 224938170	2018-12-10 21:35:02 -08:00
Ian Gudger	5d87d8865f	Implement MSG_WAITALL MSG_WAITALL requests that recv family calls do not perform short reads. It only has an effect for SOCK_STREAM sockets, other types ignore it. PiperOrigin-RevId: 224918540 Change-Id: Id97fbf972f1f7cbd4e08eec0138f8cbdf1c94fe7	2018-12-10 17:56:34 -08:00
Rahat Mahmood	fc29770251	Add type safety to shm ids and keys. PiperOrigin-RevId: 224864380 Change-Id: I49542279ad56bf15ba462d3de1ef2b157b31830a	2018-12-10 12:48:02 -08:00
Michael Pratt	99d5958693	Validate FS_BASE in Task.Clone arch_prctl already verified that the new FS_BASE was canonical, but Task.Clone did not. Centralize these checks in the arch packages. Failure to validate could cause an error in PTRACE_SET_REGS when we try to switch to the app. PiperOrigin-RevId: 224862398 Change-Id: Iefe63b3f9aa6c4810326b8936e501be3ec407f14	2018-12-10 12:37:16 -08:00
Ian Gudger	25b8424d75	Stub out TCP_QUICKACK PiperOrigin-RevId: 224696233 Change-Id: I45c425d9e32adee5dcce29ca7439a06567b26014	2018-12-09 00:50:33 -08:00
Zhaozhong Ni	9984138abe	sentry: turn "dynamically-created" procfs files into static creation. PiperOrigin-RevId: 224600982 Change-Id: I547253528e24fb0bb318fc9d2632cb80504acb34	2018-12-07 17:03:54 -08:00
Michael Pratt	42e2e5cae9	Format sigaction in strace Sample: I1206 14:24:56.768520 3700 x:0] [ 1] ioctl_test E rt_sigaction(SIGSEGV, 0x7ee6edb0c590 {Handler: 0x559c6d915cf0, Flags: SA_SIGINFO\|SA_RESTORER\|SA_ONSTACK\|SA_NODEFER, Restorer: 0x2a9901a259a0, Mask: []}, 0x7ee6edb0c630) I1206 14:24:56.768530 3700 x:0] [ 1] ioctl_test X rt_sigaction(SIGSEGV, 0x7ee6edb0c590 {Handler: 0x559c6d915cf0, Flags: SA_SIGINFO\|SA_RESTORER\|SA_ONSTACK\|SA_NODEFER, Restorer: 0x2a9901a259a0, Mask: []}, 0x7ee6edb0c630 {Handler: SIG_DFL, Flags: 0x0, Restorer: 0x0, Mask: []}) = 0x0 (2.701?s) PiperOrigin-RevId: 224596606 Change-Id: I3512493aed99d3d75600249263da46686b1dc0e7	2018-12-07 16:28:54 -08:00
Michael Pratt	673949048e	Add period to comment PiperOrigin-RevId: 224553291 Change-Id: I35d0772c215b71f4319c23f22df5c61c908f8590	2018-12-07 11:53:19 -08:00
Michael Pratt	51900fe3a4	Format signals, signal masks in strace Sample: I1205 16:51:49.869701 2492 x:0] [ 1] ioctl_test E rt_sigaction(SIGIO, 0x7e0e5b5e8500, 0x7e0e5b5e85a0) I1205 16:51:49.869766 2492 x:0] [ 1] ioctl_test X rt_sigaction(SIGIO, 0x7e0e5b5e8500, 0x7e0e5b5e85a0) = 0x0 (44.336?s) I1205 16:51:49.869831 2492 x:0] [ 1] ioctl_test E rt_sigprocmask(SIG_UNBLOCK, 0x7e0e5b5e8878 [SIGIO], 0x7e0e5b5e87c0, 0x8) I1205 16:51:49.869866 2492 x:0] [ 1] ioctl_test X rt_sigprocmask(SIG_UNBLOCK, 0x7e0e5b5e8878 [SIGIO], 0x7e0e5b5e87c0 [SIGIO 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64], 0x8) = 0x0 (2.575?s) PiperOrigin-RevId: 224422404 Change-Id: I3ed3f2ec6b1a639baa9cacd37ce7ee325c3703e4	2018-12-06 15:47:06 -08:00
Michael Pratt	666db00c26	Convert ValueSet to a map Unlike FlagSet, order doesn't matter here, so it can simply be a map. PiperOrigin-RevId: 224377910 Change-Id: I15810c698a7f02d8614bf09b59583ab73cba0514	2018-12-06 11:43:11 -08:00
Ian Gudger	000fa84a3b	Fix tcpip.Endpoint.Write contract regarding short writes * Clarify tcpip.Endpoint.Write contract regarding short writes. * Enforce tcpip.Endpoint.Write contract regarding short writes. * Update relevant users of tcpip.Endpoint.Write. PiperOrigin-RevId: 224377586 Change-Id: I24299ecce902eb11317ee13dae3b8d8a7c5b097d	2018-12-06 11:41:33 -08:00
Rahat Mahmood	685eaf119f	Add counters for memory events. Also ensure an event is emitted at startup. PiperOrigin-RevId: 224372065 Change-Id: I5f642b6d6b13c6468ee8f794effe285fcbbf29cf	2018-12-06 11:15:47 -08:00
Zach Koopmans	4d8c7ae869	Fixing O_TRUNC behavior to match Linux. PiperOrigin-RevId: 224351139 Change-Id: I9453bd75e5a8d38db406bb47fdc01038ac60922e	2018-12-06 09:26:49 -08:00
Michael Pratt	9f64e64a6e	Enforce directory accessibility before delete Walk By Walking before checking that the directory is writable and executable, MayDelete may return the Walk error (e.g., ENOENT) which would normally be masked by a permission error (EACCES). PiperOrigin-RevId: 224222453 Change-Id: I108a7f730e6bdaa7f277eaddb776267c00805475	2018-12-05 14:31:58 -08:00
Jamie Liu	23438b3632	Update MM.usageAS when mremap copies or moves a mapping. PiperOrigin-RevId: 224221509 Change-Id: I7aaea74629227d682786d3e435737364921249bf	2018-12-05 14:27:23 -08:00
Michael Pratt	592f5bdc67	Add context to mount errors This makes it more obvious why a mount failed. PiperOrigin-RevId: 224203880 Change-Id: I7961774a7b6fdbb5493a791f8b3815c49b8f7631	2018-12-05 12:46:30 -08:00
Zach Koopmans	06131fe749	Check for CAP_SYS_RESOURCE in prctl(PR_SET_MM, ...) If sys_prctl is called with PR_SET_MM without CAP_SYS_RESOURCE, the syscall should return failure with errno set to EPERM. See: http://man7.org/linux/man-pages/man2/prctl.2.html PiperOrigin-RevId: 224182874 Change-Id: I630d1dd44af8b444dd16e8e58a0764a0cf1ad9a3	2018-12-05 10:53:51 -08:00
Michael Pratt	076f107643	Remove initRegs arg from clone It is always the same as t.initRegs. PiperOrigin-RevId: 224085550 Change-Id: I5cc4ddc3b481d4748c3c43f6f4bb50da1dbac694	2018-12-04 18:53:43 -08:00
Brian Geffon	ffcbda0c8b	Partial writes should loop in rpcinet. FileOperations.Write should return ErrWouldBlock to allow the upper layer to loop and sendmsg should continue writing where it left off on a partial write. PiperOrigin-RevId: 224081631 Change-Id: Ic61f6943ea6b7abbd82e4279decea215347eac48	2018-12-04 18:15:10 -08:00
Brian Geffon	2cab0e82ad	Linkat(2) should sanity check flags. PiperOrigin-RevId: 224047765 Change-Id: I6f3c75b33c32bf8f8910ea3fab35406d7d672d87	2018-12-04 14:34:19 -08:00
Brian Geffon	82719be42e	Max link traversals should be for an entire path. The number of symbolic links that are allowed to be followed are for a full path and not just a chain of symbolic links. PiperOrigin-RevId: 224047321 Change-Id: I5e3c4caf66a93c17eeddcc7f046d1e8bb9434a40	2018-12-04 14:32:03 -08:00
Zhaozhong Ni	adafc08d7c	sentry: save / restore netstack procfs configuration. PiperOrigin-RevId: 224047120 Change-Id: Ia6cb17fa978595cd73857b6178c4bdba401e185e	2018-12-04 14:30:42 -08:00
Brian Geffon	5a6a1eb420	Enforce name length restriction on paths. NAME_LENGTH must be enforced per component. PiperOrigin-RevId: 224046749 Change-Id: Iba8105b00d951f2509dc768af58e4110dafbe1c9	2018-12-04 14:28:33 -08:00
Rahat Mahmood	806e346491	Fix mempolicy_test on bazel. Bazel runs multiple test cases on the same thread. Some of the test cases rely on the test thread starting with the default memory policy, while other tests modify the test thread's memory policy. This obviously breaks when the test framework doesn't run each test case on a new thread. Also fixing an incompatibility where set_mempolicy(2) was prevented from specifying an empty nodemask, which is allowed for some modes. PiperOrigin-RevId: 224038957 Change-Id: Ibf780766f2706ebc9b129dbc8cf1b85c2a275074	2018-12-04 13:45:58 -08:00
Nicolas Lacasse	54dd0d0dc5	Fix data race caused by unlocked call of Dirent.descendantOf. PiperOrigin-RevId: 224025363 Change-Id: I98864403c779832e9e1436f7d3c3f6fb2fba9904	2018-12-04 12:24:55 -08:00
Ian Gudger	5560615c53	Return an int32 for netlink SO_RCVBUF Untyped integer constants default to type int and the binary package will panic if one tries to encode an int. PiperOrigin-RevId: 223890001 Change-Id: Iccc3afd6d74bad24c35d764508e450fd317b76ec	2018-12-03 17:03:15 -08:00
Nicolas Lacasse	573622fdca	Fix data race in fs.Async. Replaces the WaitGroup with a RWMutex. Calls to Async hold the mutex for reading, while AsyncBarrier takes the lock for writing. This ensures that all executing Async work finishes before AsyncBarrier returns. Also pushes the Async() call from Inode.Release into gofer/InodeOperations.Release(). This removes a recursive Async call which should not have been allowed in the first place. The gofer Release call is the slow one (since it may make RPCs to the gofer), so putting the Async call there makes sense. PiperOrigin-RevId: 223093067 Change-Id: I116da7b20fce5ebab8d99c2ab0f27db7c89d890e	2018-11-27 18:17:09 -08:00
Brian Geffon	5bd02b224f	Save shutdown flags first. With rpcinet if shutdown flags are not saved before making the rpc a race is possible where blocked threads are woken up before the flags have been persisted. This would mean that threads can block indefinitely in a recvmsg after a shutdown(SHUT_RD) has happened. PiperOrigin-RevId: 223089783 Change-Id: If595e7add12aece54bcdf668ab64c570910d061a	2018-11-27 17:48:05 -08:00
Haibo Xu	9e0f132377	Add procid support for arm64 platform Change-Id: I7c3db8dfdf95a125d7384c1d67c3300dbb99a47e PiperOrigin-RevId: 223039923	2018-11-27 12:46:39 -08:00
Zach Koopmans	b3b60ea29a	Implementation of preadv2 for Linux 4.4 support Implement RWF_HIPRI (4.6) silently passes the read call. Implement -1 offset calls readv. PiperOrigin-RevId: 222840324 Change-Id: If9ddc1e8d086e1a632bdf5e00bae08205f95b6b0	2018-11-26 09:50:47 -08:00
Fabricio Voznika	eaac94d91c	Use RET_KILL_PROCESS if available in kernel RET_KILL_THREAD doesn't work well for Go because it will kill only the offending thread and leave the process hanging. RET_TRAP can be masked out and it's not guaranteed to kill the process. RET_KILL_PROCESS is available since 4.14. For older kernel, continue to use RET_TRAP as this is the best option (likely to kill process, easy to debug). PiperOrigin-RevId: 222357867 Change-Id: Icc1d7d731274b16c2125b7a1ba4f7883fbdb2cbd	2018-11-20 22:56:51 -08:00
Fabricio Voznika	5236b78242	Dumps stacks if watchdog thread is stuck PiperOrigin-RevId: 222332703 Change-Id: Id5c3cf79591c5d2949895b4e323e63c48c679820	2018-11-20 17:24:19 -08:00
Fabricio Voznika	8b314b0bf4	Fix recursive read lock taken on TaskSet SyncSyscallFiltersToThreadGroup and Task.TheadID() both acquired TaskSet RWLock in R mode and could deadlock if a writer comes in between. PiperOrigin-RevId: 222313551 Change-Id: I4221057d8d46fec544cbfa55765c9a284fe7ebfa	2018-11-20 15:07:56 -08:00
Michael Pratt	03c1eb78b5	Reference upstream licenses Include copyright notices and the referenced LICENSE file. PiperOrigin-RevId: 222171321 Change-Id: I0cc0b167ca51b536d1087bf1c4742fdf1430bc2a	2018-11-20 14:05:16 -08:00
Fabricio Voznika	fadffa2ff8	Add unsupported syscall events for get/setsockopt PiperOrigin-RevId: 222148953 Change-Id: I21500a9f08939c45314a6414e0824490a973e5aa	2018-11-20 14:04:12 -08:00
Nicolas Lacasse	8c84f9a3c1	Parse the tmpfs mode before validating. This gets rid of the problematic modeRegex. PiperOrigin-RevId: 221835959 Change-Id: I566b8d8a43579a4c30c0a08a620a964bbcd826dd	2018-11-20 14:02:39 -08:00
Adin Scannell	bb9a2bb62e	Update futex to use usermem abstractions. This eliminates the indirection that existed in task_futex. PiperOrigin-RevId: 221832498 Change-Id: Ifb4c926d493913aa6694e193deae91616a29f042	2018-11-20 14:02:07 -08:00
Rahat Mahmood	f7aa937124	Advertise vsyscall support via /proc/<pid>/maps. Also update test utilities for probing vsyscall support and add a metric to see if vsyscalls are actually used in sandboxes. PiperOrigin-RevId: 221698834 Change-Id: I57870ecc33ea8c864bd7437833f21aa1e8117477	2018-11-15 15:14:38 -08:00
Nicolas Lacasse	6ef08c2bc2	Allow setting sticky bit in tmpfs permissions. PiperOrigin-RevId: 221683127 Change-Id: Ide6a9f41d75aa19d0e2051a05a1e4a114a4fb93c	2018-11-15 13:48:59 -08:00
Ian Gudger	7f60294a73	Implement TCP_NODELAY and TCP_CORK Previously, TCP_NODELAY was always enabled and we would lie about it being configurable. TCP_NODELAY is now disabled by default (to match Linux) in the socket layer so that non-gVisor users don't automatically start using this questionable optimization. PiperOrigin-RevId: 221368472 Change-Id: Ib0240f66d94455081f4e0ca94f09d9338b2c1356	2018-11-13 18:02:43 -08:00
Googler	25d07fbbed	Internal change. PiperOrigin-RevId: 221189534 Change-Id: Id20d318bed97d5226b454c9351df396d11251e1f	2018-11-12 17:44:46 -08:00
Andrei Vagin	2ef122da35	Implement sync_file_range() sync_file_range - sync a file segment with disk In Linux, sync_file_range() accepts three flags: SYNC_FILE_RANGE_WAIT_BEFORE Wait upon write-out of all pages in the specified range that have already been submitted to the device driver for write-out before performing any write. SYNC_FILE_RANGE_WRITE Initiate write-out of all dirty pages in the specified range which are not presently submitted write-out. Note that even this may block if you attempt to write more than request queue size. SYNC_FILE_RANGE_WAIT_AFTER Wait upon write-out of all pages in the range after performing any write. In this implementation: SYNC_FILE_RANGE_WAIT_BEFORE without SYNC_FILE_RANGE_WAIT_AFTER isn't supported right now. SYNC_FILE_RANGE_WRITE is skipped. It should initiate write-out of all dirty pages, but it doesn't wait, so it should be safe to do nothing while nobody uses SYNC_FILE_RANGE_WAIT_BEFORE. SYNC_FILE_RANGE_WAIT_AFTER is equal to fdatasync(). In Linux, sync_file_range() doesn't writes out the file's meta-data, but fdatasync() does if a file size is changed. PiperOrigin-RevId: 220730840 Change-Id: Iae5dfb23c2c916967d67cf1a1ad32f25eb3f6286	2018-11-08 17:39:51 -08:00
Rahat Mahmood	5a0be6fa20	Create stubs for syscalls upto Linux 4.4. Create syscall stubs for missing syscalls upto Linux 4.4 and advertise a kernel version of 4.4. PiperOrigin-RevId: 220667680 Change-Id: Idbdccde538faabf16debc22f492dd053a8af0ba7	2018-11-08 11:09:46 -08:00
Ian Lewis	9d69d85bc1	Make error messages a bit more user friendly. Updated error messages so that it doesn't print full Go struct representations when running a new container in a sandbox. For example, this occurs frequently when commands are not found when doing a 'kubectl exec'. PiperOrigin-RevId: 219729141 Change-Id: Ic3a7bc84cd7b2167f495d48a1da241d621d3ca09	2018-11-01 17:40:09 -07:00
Rahat Mahmood	0e277a39c8	Prevent premature destruction of shm segments. Shm segments can be marked for lazy destruction via shmctl(IPC_RMID), which destroys a segment once it is no longer attached to any processes. We were unconditionally decrementing the segment refcount on shmctl(IPC_RMID) which allowed a user to force a segment to be destroyed by repeatedly calling shmctl(IPC_RMID), with outstanding memory maps to the segment. This is problematic because the memory released by a segment destroyed this way can be reused by a different process while remaining accessible by the process with outstanding maps to the segment. PiperOrigin-RevId: 219713660 Change-Id: I443ab838322b4fb418ed87b2722c3413ead21845	2018-11-01 15:54:14 -07:00
Juan	b23cd33682	modify modeRegexp to adapt the default spec of containerd https://github.com/containerd/containerd/blob/master/oci/spec.go#L206, the mode=755 didn't match the pattern modeRegexp = regexp.MustCompile("0[0-7][0-7][0-7]"). Closes #112 Signed-off-by: Juan <xionghuan.cn@gmail.com> Change-Id: I469e0a68160a1278e34c9e1dbe4b7784c6f97e5a PiperOrigin-RevId: 219672525	2018-11-01 11:57:54 -07:00
Adin Scannell	fb613020c7	kvm: simplify floating point logic. This reduces the number of floating point save/restore cycles required (since we don't need to restore immediately following the switch, this always happens in a known context) and allows the kernel hooks to capture state. This lets us remove calls like "Current()". PiperOrigin-RevId: 219552844 Change-Id: I7676fa2f6c18b9919718458aa888b832a7db8cab	2018-10-31 15:59:23 -07:00
Adin Scannell	c4bbb54168	kvm: add detailed traces on vCPU errors. This improves debuggability greatly. PiperOrigin-RevId: 219551560 Change-Id: I2ecaffdd1c17b0d9f25911538ea6f693e2bc699f	2018-10-31 15:50:10 -07:00
Adin Scannell	e9dbd5ab67	kvm: avoid siginfo allocations. PiperOrigin-RevId: 219492587 Change-Id: I47f6fc0b74a4907ab0aff03d5f26453bdb983bb5	2018-10-31 10:08:06 -07:00
Adin Scannell	0091db9cbd	kvm: use private futexes. Use private futexes for performance and to align with other runtime uses. PiperOrigin-RevId: 219422634 Change-Id: Ief2af5e8302847ea6dc246e8d1ee4d64684ca9dd	2018-10-30 22:46:42 -07:00
Adin Scannell	e7191f058f	Use TRAP to simplify vsyscall emulation. PiperOrigin-RevId: 218592058 Change-Id: I373a2d813aa6cc362500dd5a894c0b214a1959d7	2018-10-24 15:52:44 -07:00
Ian Gudger	425dccdd7e	Convert Unix transport to syserr Previously this code used the tcpip error space. Since it is no longer part of netstack, it can use the sentry's error space (except for a few cases where there is still some shared code. This reduces the number of error space conversions required for hot Unix socket operations. PiperOrigin-RevId: 218541611 Change-Id: I3d13047006a8245b5dfda73364d37b8a453784bb	2018-10-24 11:05:08 -07:00
Nicolas Lacasse	4a1a2dead9	Run ptrace stubs in their own session and process group. Pseudoterminal job control signals are meant to be received and handled by the sandbox process, but if the ptrace stubs are running in the same process group, they will receive the signals as well and inject then into the sentry kernel. This can result in duplicate signals being delivered (often to the wrong process), or a sentry panic if the ptrace stub is inactive. This CL makes the ptrace stub run in a new session. PiperOrigin-RevId: 218536851 Change-Id: Ie593c5687439bbfbf690ada3b2197ea71ed60a0e	2018-10-24 10:42:35 -07:00
Rahat Mahmood	46603b569c	Fix panic on creation of zero-len shm segments. Attempting to create a zero-len shm segment causes a panic since we try to allocate a zero-len filemem region. The existing code had a guard to disallow this, but the check didn't encode the fact that requesting a private segment implies a segment creation regardless of whether IPC_CREAT is explicitly specified. PiperOrigin-RevId: 218405743 Change-Id: I30aef1232b2125ebba50333a73352c2f907977da	2018-10-23 14:18:54 -07:00
Adin Scannell	75cd70ecc9	Track paths and provide a rename hook. This change also adds extensive testing to the p9 package via mocks. The sanity checks and type checks are moved from the gofer into the core package, where they can be more easily validated. PiperOrigin-RevId: 218296768 Change-Id: I4fc3c326e7bf1e0e140a454cbacbcc6fd617ab55	2018-10-23 00:20:15 -07:00
Ian Gudger	d7c11c7417	Refcount Unix transport queue This allows us to release messages in the queue when all users close. PiperOrigin-RevId: 218033550 Change-Id: I2f6e87650fced87a3977e3b74c64775c7b885c1b	2018-10-20 17:58:26 -07:00
Fabricio Voznika	b2068cf5a5	Add more unimplemented syscall events Added events for ctl syscalls that may have multiple different commands. For runsc, each syscall event is only logged once. For ctl syscalls, use the cmd as identifier, not only the syscall number. PiperOrigin-RevId: 218015941 Change-Id: Ie3c19131ae36124861e9b492a7dbe1765d9e5e59	2018-10-20 11:14:23 -07:00
Ian Gudger	8fce67af24	Use correct company name in copyright header PiperOrigin-RevId: 217951017 Change-Id: Ie08bf6987f98467d07457bcf35b5f1ff6e43c035	2018-10-19 16:35:11 -07:00
Ian Gudger	f7419fec26	Use generic ilist in Unix transport queue This should improve performance. PiperOrigin-RevId: 217610560 Change-Id: I370f196ea2396f1715a460b168ecbee197f94d6c	2018-10-17 16:31:15 -07:00
Jamie Liu	b2a88ff471	Check thread group CPU timers in the CPU clock ticker. This reduces the number of goroutines and runtime timers when ITIMER_VIRTUAL or ITIMER_PROF are enabled, or when RLIMIT_CPU is set. This also ensures that thread group CPU timers only advance if running tasks are observed at the time the CPU clock advances, mostly eliminating the possibility that a CPU timer expiration observes no running tasks and falls back to the group leader. PiperOrigin-RevId: 217603396 Change-Id: Ia24ce934d5574334857d9afb5ad8ca0b6a6e65f4	2018-10-17 15:50:02 -07:00
Ian Gudger	6922eee649	Merge queue into Unix transport This queue only has a single user, so there is no need for it to use an interface. Merging it into the same package as its sole user allows us to avoid a circular dependency. This simplifies the code and should slightly improve performance. PiperOrigin-RevId: 217595889 Change-Id: Iabbd5164240b935f79933618c61581bc8dcd2822	2018-10-17 15:10:20 -07:00
Ian Gudger	8c85f5e9ce	Fix typos in socket_test PiperOrigin-RevId: 217576188 Change-Id: I82e45c306c5c9161e207311c7dbb8a983820c1df	2018-10-17 13:25:45 -07:00
Michael Pratt	8fa6f6fe76	Reflow comment to 80 columns PiperOrigin-RevId: 217573168 Change-Id: Ic1914d0ef71bab020e3ee11cf9c4a50a702bd8dd	2018-10-17 13:06:16 -07:00
Nicolas Lacasse	4e6f0892c9	runsc: Support job control signals for the root container. Now containers run with "docker run -it" support control characters like ^C and ^Z. This required refactoring our signal handling a bit. Signals delivered to the "runsc boot" process are turned into loader.Signal calls with the appropriate delivery mode. Previously they were always sent directly to PID 1. PiperOrigin-RevId: 217566770 Change-Id: I5b7220d9a0f2b591a56335479454a200c6de8732	2018-10-17 12:29:05 -07:00
Michael Pratt	578fe5a50d	Fix PTRACE_GETREGSET write size The existing logic is backwards and writes iov_len == 0 for a full write. PiperOrigin-RevId: 217560377 Change-Id: I5a39c31bf0ba9063a8495993bfef58dc8ab7c5fa	2018-10-17 11:53:04 -07:00
Ian Gudger	6cba410df0	Move Unix transport out of netstack PiperOrigin-RevId: 217557656 Change-Id: I63d27635b1a6c12877279995d2d9847b6a19da9b	2018-10-17 11:37:51 -07:00
Ian Gudger	324ad3564b	Refactor host.ConnectedEndpoint * Integrate recvMsg and sendMsg functions into Recv and Send respectively as they are no longer shared. * Clean up partial read/write error handling code. * Re-order code to make sense given that there is no longer a host.endpoint type. PiperOrigin-RevId: 217255072 Change-Id: Ib43fe9286452f813b8309d969be11f5fa40694cd	2018-10-15 20:23:18 -07:00
Ian Gudger	167f2401c4	Merge host.endpoint into host.ConnectedEndpoint host.endpoint contained duplicated logic from the sockerpair implementation and host.ConnectedEndpoint. Remove host.endpoint in favor of a host.ConnectedEndpoint wrapped in a socketpair end. PiperOrigin-RevId: 217240096 Change-Id: I4a3d51e3fe82bdf30e2d0152458b8499ab4c987c	2018-10-15 17:48:11 -07:00
Nicolas Lacasse	ecd94ea7a6	Clean up Rename and Unlink checks for EBUSY. - Change Dirent.Busy => Dirent.isMountPoint. The function body is unchanged, and it is no longer exported. - fs.MayDelete now checks that the victim is not the process root. This aligns with Linux's namei.c:may_delete(). - Fix "is-ancestor" checks to actually compare all ancestors, not just the parents. - Fix handling of paths that end in dots, which are handled differently in Rename vs. Unlink. PiperOrigin-RevId: 217239274 Change-Id: I7a0eb768e70a1b2915017ce54f7f95cbf8edf1fb	2018-10-15 17:42:30 -07:00
Zhaozhong Ni	4ea69fce8d	sentry: save fs.Dirent deleted info. PiperOrigin-RevId: 217155458 Change-Id: Id3265b1ec784787039e2131c80254ac4937330c7	2018-10-15 09:31:32 -07:00
Kevin Krakauer	47d3862c33	runsc: Support retrieving MTU via netdevice ioctl. This enables ifconfig to display MTU. PiperOrigin-RevId: 216917021 Change-Id: Id513b23d9d76899bcb71b0b6a25036f41629a923	2018-10-12 13:58:32 -07:00
Zhaozhong Ni	0bfa03d61c	sentry: allow saving of unlinked files with open fds on virtual fs. PiperOrigin-RevId: 216733414 Change-Id: I33cd3eb818f0c39717d6656fcdfff6050b37ebb0	2018-10-11 11:41:44 -07:00
Adin Scannell	463e73d46d	Add seccomp filter configuration to ptrace stubs. This is a defense-in-depth measure. If the sentry is compromised, this prevents system call injection to the stubs. There is some complexity with respect to ptrace and seccomp interactions, so this protection is not really available for kernel versions < 4.8; this is detected dynamically. Note that this also solves the vsyscall emulation issue by adding in appropriate trapping for those system calls. It does mean that a compromised sentry could theoretically inject these into the stub (ignoring the trap and resume, thereby allowing execution), but they are harmless. PiperOrigin-RevId: 216647581 Change-Id: Id06c232cbac1f9489b1803ec97f83097fcba8eb8	2018-10-10 22:40:28 -07:00
Michael Pratt	ddb34b3690	Enforce message size limits and avoid host calls with too many iovecs Currently, in the face of FileMem fragmentation and a large sendmsg or recvmsg call, host sockets may pass > 1024 iovecs to the host, which will immediately cause the host to return EMSGSIZE. When we detect this case, use a single intermediate buffer to pass to the kernel, copying to/from the src/dst buffer. To avoid creating unbounded intermediate buffers, enforce message size checks and truncation w.r.t. the send buffer size. The same functionality is added to netstack unix sockets for feature parity. PiperOrigin-RevId: 216590198 Change-Id: I719a32e71c7b1098d5097f35e6daf7dd5190eff7	2018-10-10 14:10:17 -07:00
Nicolas Lacasse	b78552d30e	When creating a new process group, add it to the session. PiperOrigin-RevId: 216554791 Change-Id: Ia6b7a2e6eaad80a81b2a8f2e3241e93ebc2bda35	2018-10-10 10:42:11 -07:00
Ian Gudger	c36d2ef373	Add new netstack metrics to the sentry PiperOrigin-RevId: 216431260 Change-Id: Ia6e5c8d506940148d10ff2884cf4440f470e5820	2018-10-09 15:12:44 -07:00
Brian Geffon	acf7a95189	Add memunit to sysinfo(2). Also properly add padding after Procs in the linux.Sysinfo structure. This will be implicitly padded to 64bits so we need to do the same. PiperOrigin-RevId: 216372907 Change-Id: I6eb6a27800da61d8f7b7b6e87bf0391a48fdb475	2018-10-09 09:52:14 -07:00
Michael Pratt	569c2b06c4	Statfs Namelen should be NAME_MAX not PATH_MAX We accidentally set the wrong maximum. I've also added PATH_MAX and NAME_MAX to the linux abi package. PiperOrigin-RevId: 216221311 Change-Id: I44805fcf21508831809692184a0eba4cee469633	2018-10-08 11:39:54 -07:00
Jamie Liu	e9e8be6613	Implement shared futexes. - Shared futex objects on shared mappings are represented by Mappable + offset, analogous to Linux's use of inode + offset. Add type futex.Key, and change the futex.Manager bucket API to use futex.Keys instead of addresses. - Extend the futex.Checker interface to be able to return Keys for memory mappings. It returns Keys rather than just mappings because whether the address or the target of the mapping is used in the Key depends on whether the mapping is MAP_SHARED or MAP_PRIVATE; this matters because using mapping target for a futex on a MAP_PRIVATE mapping causes it to stop working across COW-breaking. - futex.Manager.WaitComplete depends on atomic updates to futex.Waiter.addr to determine when it has locked the right bucket, which is much less straightforward for struct futex.Waiter.key. Switch to an atomically-accessed futex.Waiter.bucket pointer. - futex.Manager.Wake now needs to take a futex.Checker to resolve addresses for shared futexes. CLONE_CHILD_CLEARTID requires the exit path to perform a shared futex wakeup (Linux: kernel/fork.c:mm_release() => sys_futex(tsk->clear_child_tid, FUTEX_WAKE, ...)). This is a problem because futexChecker is in the syscalls/linux package. Move it to kernel. PiperOrigin-RevId: 216207039 Change-Id: I708d68e2d1f47e526d9afd95e7fed410c84afccf	2018-10-08 10:20:38 -07:00
Ian Gudger	beac59b37a	Fix panic if FIOASYNC callback is registered and triggered without target PiperOrigin-RevId: 215674589 Change-Id: I4f8871b64c570dc6da448d2fe351cec8a406efeb	2018-10-03 20:22:31 -07:00
Nicolas Lacasse	213f6688a5	Implement TIOCSCTTY ioctl as a noop. PiperOrigin-RevId: 215658757 Change-Id: If63b33293f3e53a7f607ae72daa79e2b7ef6fcfd	2018-10-03 17:29:56 -07:00
Ian Gudger	4fef31f96c	Add S/R support for FIOASYNC PiperOrigin-RevId: 215655197 Change-Id: I668b1bc7c29daaf2999f8f759138bcbb09c4de6f	2018-10-03 17:03:09 -07:00
Nicolas Lacasse	f1c01ed886	runsc: Support job control signals in "exec -it". Terminal support in runsc relies on host tty file descriptors that are imported into the sandbox. Application tty ioctls are sent directly to the host fd. However, those host tty ioctls are associated in the host kernel with a host process (in this case runsc), and the host kernel intercepts job control characters like ^C and send signals to the host process. Thus, typing ^C into a "runsc exec" shell will send a SIGINT to the runsc process. This change makes "runsc exec" handle all signals, and forward them into the sandbox via the "ContainerSignal" urpc method. Since the "runsc exec" is associated with a particular container process in the sandbox, the signal must be associated with the same container process. One big difficulty is that the signal should not necessarily be sent to the sandbox process started by "exec", but instead must be sent to the foreground process group for the tty. For example, we may exec "bash", and from bash call "sleep 100". A ^C at this point should SIGINT sleep, not bash. To handle this, tty files inside the sandbox must keep track of their foreground process group, which is set/get via ioctls. When an incoming ContainerSignal urpc comes in, we look up the foreground process group via the tty file. Unfortunately, this means we have to expose and cache the tty file in the Loader. Note that "runsc exec" now handles signals properly, but "runs run" does not. That will come in a later CL, as this one is complex enough already. Example: root@:/usr/local/apache2# sleep 100 ^C root@:/usr/local/apache2# sleep 100 ^Z [1]+ Stopped sleep 100 root@:/usr/local/apache2# fg sleep 100 ^C root@:/usr/local/apache2# PiperOrigin-RevId: 215334554 Change-Id: I53cdce39653027908510a5ba8d08c49f9cf24f39	2018-10-01 22:06:56 -07:00
Michael Pratt	0400e54592	Add itimer types to linux package, strace PiperOrigin-RevId: 215278262 Change-Id: Icd10384c99802be6097be938196044386441e282	2018-10-01 14:16:53 -07:00
Nicolas Lacasse	07aa040842	Fix possible panic in control.Processes. There was a race where we checked task.Parent() != nil, and then later called task.Parent() again, assuming that it is not nil. If the task is exiting, the parent may have been set to nil in between the two calls, causing a panic. This CL changes the code to only call task.Parent() once. PiperOrigin-RevId: 215274456 Change-Id: Ib5a537312c917773265ec72016014f7bc59a5f59	2018-10-01 13:56:07 -07:00
Michael Pratt	3ff24b4f2c	Require AF_UNIX sockets from the gofer host.endpoint already has the check, but it is missing from host.ConnectedEndpoint. PiperOrigin-RevId: 214962762 Change-Id: I88bb13a5c5871775e4e7bf2608433df8a3d348e6	2018-09-28 11:03:11 -07:00
Sepehr Raissian	c17ea8c6e2	Block for link address resolution Previously, if address resolution for UDP or Ping sockets required sending packets using Write in Transport layer, Resolve would return ErrWouldBlock and Write would return ErrNoLinkAddress. Meanwhile startAddressResolution would run in background. Further calls to Write using same address would also return ErrNoLinkAddress until resolution has been completed successfully. Since Write is not allowed to block and System Calls need to be interruptible in System Call layer, the caller to Write is responsible for blocking upon return of ErrWouldBlock. Now, when startAddressResolution is called a notification channel for the completion of the address resolution is returned. The channel will traverse up to the calling function of Write as well as ErrNoLinkAddress. Once address resolution is complete (success or not) the channel is closed. The caller would call Write again to send packets and check if address resolution was compeleted successfully or not. Fixes google/gvisor#5 Change-Id: Idafaf31982bee1915ca084da39ae7bd468cebd93 PiperOrigin-RevId: 214962200	2018-09-28 11:00:16 -07:00
Nicolas Lacasse	b709d23987	Forward ioctl(TCSETSF) calls on host ttys to the host kernel. We already forward TCSETS and TCSETSW. TCSETSF is roughly equivalent but discards pending input. The filters were relaxed to allow host ioctls with TCSETSF argument. This fixes programs like "passwd" that prevent user input from being displayed on the terminal. Before: root@b8a0240fc836:/# passwd Enter new UNIX password: 123 Retype new UNIX password: 123 passwd: password updated successfully After: root@ae6f5dabe402:/# passwd Enter new UNIX password: Retype new UNIX password: passwd: password updated successfully PiperOrigin-RevId: 214869788 Change-Id: I31b4d1373c1388f7b51d0f2f45ce40aa8e8b0b58	2018-09-27 18:17:38 -07:00
Fabricio Voznika	491faac03b	Implement 'runsc kill --all' In order to implement kill --all correctly, the Sentry needs to track all tasks that belong to a given container. This change introduces ContainerID to the task, that gets inherited by all children. 'kill --all' then iterates over all tasks comparing the ContainerID field to find all processes that need to be signalled. PiperOrigin-RevId: 214841768 Change-Id: I693b2374be8692d88cc441ef13a0ae34abf73ac6	2018-09-27 15:00:58 -07:00
Zhaozhong Ni	234f36b6f2	sentry: export cpuTime function. PiperOrigin-RevId: 214798278 Change-Id: Id59d1ceb35037cda0689d3a1c4844e96c6957615	2018-09-27 12:52:25 -07:00
Fabricio Voznika	fca9a390db	Return correct parent PID Old code was returning ID of the thread that created the child process. It should be returning the ID of the parent process instead. PiperOrigin-RevId: 214720910 Change-Id: I95715c535bcf468ecf1ae771cccd04a4cd345b36	2018-09-26 22:00:04 -07:00
Nicolas Lacasse	fd222d62ed	Short-circuit Readdir calls on overlay files when the dirent is frozen. If we have an overlay file whose corresponding Dirent is frozen, then we should not bother calling Readdir on the upper or lower files, since DirentReaddir will calculate children based on the frozen Dirent tree. A test was added that fails without this change. PiperOrigin-RevId: 213531215 Change-Id: I4d6c98f1416541a476a34418f664ba58f936a81d	2018-09-18 15:42:22 -07:00
Brian Geffon	ed08597d12	Allow for MSG_CTRUNC in input flags for recv. PiperOrigin-RevId: 213481363 Change-Id: I8150ea20cebeb207afe031ed146244de9209e745	2018-09-18 11:14:37 -07:00
Fabricio Voznika	da20559137	Provide better message when memfd_create fails with ENOSYS Updates #100 PiperOrigin-RevId: 213414821 Change-Id: I90c2e6c18c54a6afcd7ad6f409f670aa31577d37	2018-09-18 02:09:28 -07:00
Fabricio Voznika	5d9816be41	Remove memory usage static init panic() during init() can be hard to debug. Updates #100 PiperOrigin-RevId: 213391932 Change-Id: Ic103f1981c5b48f1e12da3b42e696e84ffac02a9	2018-09-17 21:34:37 -07:00
Kevin Krakauer	bb88c187c5	runsc: Enable waiting on exited processes. This makes `runsc wait` behave more like waitpid()/wait4() in that: - Once a process has run to completion, you can wait on it and get its exit code. - Processes not waited on will consume memory (like a zombie process) PiperOrigin-RevId: 213358916 Change-Id: I5b5eca41ce71eea68e447380df8c38361a4d1558	2018-09-17 16:25:24 -07:00
Ian Gudger	ab6fa44588	Allow kernel.(*Task).Block to accept an extract only channel PiperOrigin-RevId: 213328293 Change-Id: I4164133e6f709ecdb89ffbb5f7df3324c273860a	2018-09-17 13:35:54 -07:00
Michael Pratt	d639c3d61b	Allow NULL data in mount(2) PiperOrigin-RevId: 213315267 Change-Id: I7562bcd81fb22e90aa9c7dd9eeb94803fcb8c5af	2018-09-17 12:16:29 -07:00
newmanwang	de5a590ee2	Avoid reuse of pending SignalInfo objects runApp.execute -> Task.SendSignal -> sendSignalLocked -> sendSignalTimerLocked -> pendingSignals.enqueue assumes that it owns the arch.SignalInfo returned from platform.Context.Switch. On the other hand, ptrace.context.Switch assumes that it owns the returned SignalInfo and can safely reuse it on the next call to Switch. The KVM platform always returns a unique SignalInfo. This becomes a problem when the returned signal is not immediately delivered, allowing a future signal in Switch to change the previous pending SignalInfo. This is noticeable in #38 when external SIGINTs are delivered from the PTY slave FD. Note that the ptrace stubs are in the same process group as the sentry, so they are eligible to receive the PTY signals. This should probably change, but is not the only possible cause of this bug. Updates #38 Original change by newmanwang <wcs1011@gmail.com>, updated by Michael Pratt <mpratt@google.com>. Change-Id: I5383840272309df70a29f67b25e8221f933622cd PiperOrigin-RevId: 213071072	2018-09-14 17:39:25 -07:00
Michael Pratt	3aa50f18a4	Reuse readlink parameter, add sockaddr max. PiperOrigin-RevId: 213058623 Change-Id: I522598c655d633b9330990951ff1c54d1023ec29	2018-09-14 16:00:02 -07:00
Nicolas Lacasse	b84bfa570d	Make gVisor hard link check match Linux's. Linux permits hard-linking if the target is owned by the user OR the target has Read+Write permission. PiperOrigin-RevId: 213024613 Change-Id: If642066317b568b99084edd33ee4e8822ec9cbb3	2018-09-14 12:29:46 -07:00
Jamie Liu	0380bcb3a4	Fix interaction between rt_sigtimedwait and ignored signals. PiperOrigin-RevId: 213011782 Change-Id: I716c6ea3c586b0c6c5a892b6390d2d11478bc5af	2018-09-14 11:10:50 -07:00
Chenggang	faa34a0738	platform/kvm: Get max vcpu number dynamically by ioctl The old kernel version, such as 4.4, only support 255 vcpus. While gvisor is ran on these kernels, it could panic because the vcpu id and vcpu number beyond max_vcpus. Use ioctl(vmfd, _KVM_CHECK_EXTENSION, _KVM_CAP_MAX_VCPUS) to get max vcpus number dynamically. Change-Id: I50dd859a11b1c2cea854a8e27d4bf11a411aa45c PiperOrigin-RevId: 212929704	2018-09-13 21:47:11 -07:00
Ian Gudger	29a7271f5d	Plumb monotonic time to netstack Netstack needs to be portable, so this seems to be preferable to using raw system calls. PiperOrigin-RevId: 212917409 Change-Id: I7b2073e7db4b4bf75300717ca23aea4c15be944c	2018-09-13 19:12:15 -07:00
Rahat Mahmood	adf8f33970	Extend memory usage events to report mapped memory usage. PiperOrigin-RevId: 212887555 Change-Id: I3545383ce903cbe9f00d9b5288d9ef9a049b9f4f	2018-09-13 15:16:47 -07:00
Michael Pratt	9c6b38e295	Format struct itimerspec PiperOrigin-RevId: 212874745 Change-Id: I0c3e8e6a9e8976631cee03bf0b8891b336ddb8c8	2018-09-13 14:07:47 -07:00
Nicolas Lacasse	e2d79480f5	initArgs must hold a reference on the Root if it is not nil. The contract in ExecArgs says that a reference on ExecArgs.Root must be held for the lifetime of the struct, but the caller is free to drop the ref after that. As a result, proc.Exec must take an additional ref on Root when it constructs the CreateProcessArgs, since that holds a pointer to Root as well. That ref is dropped in CreateProcess. PiperOrigin-RevId: 212828348 Change-Id: I7f44a612f337ff51a02b873b8a845d3119408707	2018-09-13 09:50:35 -07:00
Kevin Krakauer	2eff1fdd06	runsc: Add exec flag that specifies where to save the sandbox-internal pid. This is different from the existing -pid-file flag, which saves a host pid. PiperOrigin-RevId: 212713968 Change-Id: I2c486de8dd5cfd9b923fb0970165ef7c5fc597f0	2018-09-12 15:23:35 -07:00
Nicolas Lacasse	6cc9b311af	platform: Pass device fd into platform constructor. We were previously openining the platform device (i.e. /dev/kvm) inside the platfrom constructor (i.e. kvm.New). This requires that we have RW access to the platform device when constructing the platform. However, now that the runsc sandbox process runs as user "nobody", it is not able to open the platform device. This CL changes the kvm constructor to take the platform device FD, rather than opening the device file itself. The device file is opened outside of the sandbox and passed to the sandbox process. PiperOrigin-RevId: 212505804 Change-Id: I427e1d9de5eb84c84f19d513356e1bb148a52910	2018-09-11 13:09:46 -07:00
Jamie Liu	a29c39aa62	Map committed chunks concurrently in FileMem.LoadFrom. PiperOrigin-RevId: 212345401 Change-Id: Iac626ee87ba312df88ab1019ade6ecd62c04c75c	2018-09-10 15:23:44 -07:00
Fabricio Voznika	7e9e6745ca	Allow '/dev/zero' to be mapped with unaligned length PiperOrigin-RevId: 212321271 Change-Id: I79d71c2e6f4b8fcd3b9b923fe96c2256755f4c48	2018-09-10 13:24:55 -07:00
Michael Pratt	7045828a31	Update cleanup TODO PiperOrigin-RevId: 212068327 Change-Id: I3f360cdf7d6caa1c96fae68ae3a1caaf440f0cbe	2018-09-07 18:14:57 -07:00
Nicolas Lacasse	9751b800a6	runsc: Support multi-container exec. We must use a context.Context with a Root Dirent that corresponds to the container's chroot. Previously we were using the root context, which does not have a chroot. Getting the correct context required refactoring some of the path-lookup code. We can't lookup the path without a context.Context, which requires kernel.CreateProcArgs, which we only get inside control.Execute. So we have to do the path lookup much later than we previously were. PiperOrigin-RevId: 212064734 Change-Id: I84a5cfadacb21fd9c3ab9c393f7e308a40b9b537	2018-09-07 17:39:54 -07:00
Fabricio Voznika	172860a059	Add 'Starting gVisor...' message to syslog This allows applications to verify they are running with gVisor. It also helps debugging when running with a mix of container runtimes. Closes #54 PiperOrigin-RevId: 212059457 Change-Id: I51d9595ee742b58c1f83f3902ab2e2ecbd5cedec	2018-09-07 16:59:27 -07:00
Fabricio Voznika	f895cb4d8b	Use root abstract socket namespace for exec PiperOrigin-RevId: 211999211 Change-Id: I5968dd1a8313d3e49bb6e6614e130107495de41d	2018-09-07 10:45:55 -07:00
Michael Pratt	169e2efc5a	Continue handling signals after disabling forwarding Before destroying the Kernel, we disable signal forwarding, relinquishing control to the Go runtime. External signals that arrive after disabling forwarding but before the sandbox exits thus may use runtime.raise (i.e., tkill(2)) and violate the syscall filters. Adjust forwardSignals to handle signals received after disabling forwarding the same way they are handled before starting forwarding. i.e., by implementing the standard Go runtime behavior using tgkill(2) instead of tkill(2). This also makes the stop callback block until forwarding actually stops. This isn't required to avoid tkill(2) but is a saner interface. PiperOrigin-RevId: 211995946 Change-Id: I3585841644409260eec23435cf65681ad41f5f03	2018-09-07 10:28:25 -07:00
Nicolas Lacasse	6516b5648b	createProcessArgs.RootFromContext should return process Root if it exists. It was always returning the MountNamespace root, which may be different from the process Root if the process is in a chroot environment. PiperOrigin-RevId: 211862181 Change-Id: I63bfeb610e2b0affa9fdbdd8147eba3c39014480	2018-09-06 13:47:49 -07:00
Fabricio Voznika	41b56696c4	Imported FD in exec was leaking Imported file needs to be closed after it's been imported. PiperOrigin-RevId: 211732472 Change-Id: Ia9249210558b77be076bcce465b832a22eed301f	2018-09-05 18:07:11 -07:00
Brian Geffon	2b8dae0bc5	Open(2) isn't honoring O_NOFOLLOW PiperOrigin-RevId: 211644897 Change-Id: I882ed827a477d6c03576463ca5bf2d6351892b90	2018-09-05 09:21:28 -07:00
Michael Pratt	3944cb41cb	/proc/PID/mounts is not tab-delimited PiperOrigin-RevId: 211513847 Change-Id: Ib484dd2d921c3e5d70d0e410cd973d3bff4f6b73	2018-09-04 13:29:49 -07:00
Adin Scannell	c09f9acd7c	Distinguish Element and Linker for ilist. Furthermore, allow for the specification of an ElementMapper. This allows a single "Element" type to exist on multiple inline lists, and work without having to embed the entry type. This is a requisite change for supporting a per-Inode list of Dirents. PiperOrigin-RevId: 211467497 Change-Id: If2768999b43e03fdaecf8ed15f435fe37518d163	2018-09-04 09:19:11 -07:00
Jamie Liu	f8ccfbbed4	Document more task-goroutine-owned fields in kernel.Task. Task.creds can only be changed by the task's own set*id and execve syscalls, and Task namespaces can only be changed by the task's own unshare/setns syscalls. PiperOrigin-RevId: 211156279 Change-Id: I94d57105d34e8739d964400995a8a5d76306b2a0	2018-08-31 15:44:40 -07:00
Jamie Liu	b935311e23	Do not use fs.FileOwnerFromContext in fs/proc.file.UnstableAttr(). From //pkg/sentry/context/context.go: // - It is not safe to retain a Context passed to a function beyond the scope // of that function call. Passing a stored kernel.Task as a context.Context to fs.FileOwnerFromContext violates this requirement. PiperOrigin-RevId: 211143021 Change-Id: I4c5b02bd941407be4c9cfdbcbdfe5a26acaec037	2018-08-31 14:17:56 -07:00
Jamie Liu	098046ba19	Disintegrate kernel.TaskResources. This allows us to call kernel.FDMap.DecRef without holding mutexes cleanly. PiperOrigin-RevId: 211139657 Change-Id: Ie59d5210fb9282e1950e2e40323df7264a01bcec	2018-08-31 13:58:04 -07:00
Jamie Liu	b1c1afa3cc	Delete the long-obsolete kernel.TaskMaybe interface. PiperOrigin-RevId: 211131855 Change-Id: Ia7799561ccd65d16269e0ae6f408ab53749bca37	2018-08-31 13:07:34 -07:00
Nicolas Lacasse	8bfb5fa919	fs: Add empty dir at /sys/class/power_supply. PiperOrigin-RevId: 210953512 Change-Id: I07d2d7fb0d268aa8eca26d81ef28b5b5c42289ee	2018-08-30 12:01:27 -07:00
Nicolas Lacasse	956fe64ad6	fs: Fix renameMu lock recursion. dirent.walk() takes renameMu, but is often called with renameMu already held, which can lead to a deadlock. Fix this by requiring renameMu to be held for reading when dirent.walk() is called. This causes walks and existence checks to block while a rename operation takes place, but that is what we were already trying to enforce by taking renameMu in walk() anyways. PiperOrigin-RevId: 210760780 Change-Id: Id61018e6e4adbeac53b9c1b3aa24ab77f75d8a54	2018-08-29 11:47:01 -07:00
Nicolas Lacasse	1893247616	fs: Drop reference to over-written file before renaming over it. dirent.go:Rename() walks to the file being replaced and defers replaced.DecRef(). After the rename, the reference is dropped, triggering a writeout and SettAttr call to the gofer. Because of lazyOpenForWrite, the gofer opens the replaced file BY ITS OLD NAME and calls ftruncate on it. This CL changes Remove to drop the reference on replaced (and thus trigger writeout) before the actual rename call. PiperOrigin-RevId: 210756097 Change-Id: I01ea09a5ee6c2e2d464560362f09943641638e0f	2018-08-29 11:22:27 -07:00
Ian Gudger	52e6714146	fasync: don't keep mutex after return PiperOrigin-RevId: 210637533 Change-Id: I3536c3f9efb54732a0d8ada8bc299142b2c1682f	2018-08-28 17:26:26 -07:00
Nicolas Lacasse	3b11769c77	fs: Don't bother saving negative dirents. PiperOrigin-RevId: 210616454 Change-Id: I3f536e2b4d603e540cdd9a67c61b8ec3351f4ac3	2018-08-28 15:18:42 -07:00
Nicolas Lacasse	515d9bf43b	fs: Add tests for dirent ref counting with an overlay. PiperOrigin-RevId: 210614669 Change-Id: I408365ff6d6c7765ed7b789446d30e7079cbfc67	2018-08-28 15:09:17 -07:00
Zhaozhong Ni	d724863a31	sentry: optimize dirent weakref map save / restore. Weak references save / restore involves multiple interface indirection and cause material latency overhead when there are lots of dirents, each containing a weak reference map. The nil entries in the map should also be purged. PiperOrigin-RevId: 210593727 Change-Id: Ied6f4c3c0726fcc53a24b983d9b3a79121b6b758	2018-08-28 13:22:07 -07:00
Michael Pratt	25a8e13a78	Bump to Go 1.11 The procid offset is unchanged. PiperOrigin-RevId: 210551969 Change-Id: I33ba1ce56c2f5631b712417d870aa65ef24e6022	2018-08-28 09:22:41 -07:00
Fabricio Voznika	ae648bafda	Add command-line parameter to trigger panic on signal This is to troubleshoot problems with a hung process that is not responding to 'runsc debug --stack' command. PiperOrigin-RevId: 210483513 Change-Id: I4377b210b4e51bc8a281ad34fd94f3df13d9187d	2018-08-27 20:36:10 -07:00
Brian Geffon	f0492d45aa	Add /proc/sys/kernel/shm[all,max,mni]. PiperOrigin-RevId: 210459956 Change-Id: I51859b90fa967631e0a54a390abc3b5541fbee66	2018-08-27 17:21:37 -07:00
Nicolas Lacasse	0b3bfe2ea3	fs: Fix remote-revalidate cache policy. When revalidating a Dirent, if the inode id is the same, then we don't need to throw away the entire Dirent. We can just update the unstable attributes in place. If the inode id has changed, then the remote file has been deleted or moved, and we have no choice but to throw away the dirent we have a look up another. In this case, we may still end up losing a mounted dirent that is a child of the revalidated dirent. However, that seems appropriate here because the entire mount point has been pulled out from underneath us. Because gVisor's overlay is at the Inode level rather than the Dirent level, we must pass the parent Inode and name along with the Inode that is being revalidated. PiperOrigin-RevId: 210431270 Change-Id: I705caef9c68900234972d5aac4ae3a78c61c7d42	2018-08-27 14:26:29 -07:00
Zhaozhong Ni	bd01816c87	sentry: mark fsutil.DirFileOperations as savable. PiperOrigin-RevId: 210405166 Change-Id: I252766015885c418e914007baf2fc058fec39b3e	2018-08-27 11:55:32 -07:00
Kevin Krakauer	2524111fc6	runsc: Terminal resizing support. Implements the TIOCGWINSZ and TIOCSWINSZ ioctls, which allow processes to resize the terminal. This allows, for example, sshd to properly set the window size for ssh sessions. PiperOrigin-RevId: 210392504 Change-Id: I0d4789154d6d22f02509b31d71392e13ee4a50ba	2018-08-27 10:49:16 -07:00
Nicolas Lacasse	106de2182d	runsc: Terminal support for "docker exec -ti". This CL adds terminal support for "docker exec". We previously only supported consoles for the container process, but not exec processes. The SYS_IOCTL syscall was added to the default seccomp filter list, but only for ioctls that get/set winsize and termios structs. We need to allow these ioctl for all containers because it's possible to run "exec -ti" on a container that was started without an attached console, after the filters have been installed. Note that control-character signals are still not properly supported. Tested with: $ docker run --runtime=runsc -it alpine In another terminial: $ docker exec -it <containerid> /bin/sh PiperOrigin-RevId: 210185456 Change-Id: I6d2401e53a7697bb988c120a8961505c335f96d9	2018-08-24 17:43:21 -07:00
Nicolas Lacasse	c48708a041	fs: Drop unused WaitGroup in Dirent.destroy. PiperOrigin-RevId: 210182476 Change-Id: I655a2a801e2069108d30323f7f5ae76deb3ea3ec	2018-08-24 17:15:42 -07:00
Jamie Liu	64403265a0	Implement POSIX per-process interval timers. PiperOrigin-RevId: 210021612 Change-Id: If7c161e6fd08cf17942bfb6bc5a8d2c4e271c61e	2018-08-23 16:32:36 -07:00
Zhaozhong Ni	ba8f6ba8c8	sentry: mark idMapSeqHandle as savable. PiperOrigin-RevId: 209994384 Change-Id: I16186cf79cb4760a134f3968db30c168a5f4340e	2018-08-23 13:59:20 -07:00
Adin Scannell	a7a8d07d7d	Add separate Recycle method for allocator. This improves debugging for pagetable-related issues. PiperOrigin-RevId: 209827795 Change-Id: I4cfa11664b0b52f26f6bc90a14c5bb106f01e038	2018-08-22 14:16:04 -07:00
Zhaozhong Ni	6b9133ba96	sentry: mark S/R stating errors as save rejections / fs corruptions. PiperOrigin-RevId: 209817767 Change-Id: Iddf2b8441bc44f31f9a8cf6f2bd8e7a5b824b487	2018-08-22 13:19:16 -07:00
Brian Geffon	545ea7ab3f	Always add AT_BASE even if there is no interpreter. Linux will ALWAYS add AT_BASE even for a static binary, expect it will be set to 0 [1]. 1. https://github.com/torvalds/linux/blob/master/fs/binfmt_elf.c#L253 PiperOrigin-RevId: 209811129 Change-Id: I92cc66532f23d40f24414a921c030bd3481e12a0	2018-08-22 12:37:09 -07:00
Nicolas Lacasse	8d318aac55	fs: Hold Dirent.mu when calling Dirent.flush(). As required by the contract in Dirent.flush(). Also inline Dirent.freeze() into Dirent.Freeze(), since it is only called from there. PiperOrigin-RevId: 209783626 Change-Id: Ie6de4533d93dd299ffa01dabfa257c9cc259b1f4	2018-08-22 10:07:01 -07:00
Zhaozhong Ni	8bb50dab79	sentry: do not release gofer inode file state loading lock upon error. When an inode file state failed to load asynchronuously, we want to report the error instead of potentially panicing in another async loading goroutine incorrectly unblocked. PiperOrigin-RevId: 209683977 Change-Id: I591cde97710bbe3cdc53717ee58f1d28bbda9261	2018-08-21 16:52:27 -07:00
Ian Gudger	9c407382b0	Fix races in kernel.(*Task).Value() PiperOrigin-RevId: 209627180 Change-Id: Idc84afd38003427e411df6e75abfabd9174174e1	2018-08-21 11:16:17 -07:00
Ian Gudger	47d5a12ce5	Fix handling of abstract Unix socket addresses * Don't truncate abstract addresses at second null. * Properly handle abstract addresses with length < 108 bytes. PiperOrigin-RevId: 209502703 Change-Id: I49053f2d18b5a78208c3f640c27dbbdaece4f1a9	2018-08-20 16:12:23 -07:00
Nicolas Lacasse	1501400d9c	getdents should return type=DT_DIR for SpecialDirectories. It was returning DT_UNKNOWN, and this was breaking numpy. PiperOrigin-RevId: 209459351 Change-Id: Ic6f548e23aa9c551b2032b92636cb5f0df9ccbd4	2018-08-20 11:59:58 -07:00
Nicolas Lacasse	0050e3e71c	sysfs: Add (empty) cpu directories for each cpu in /sys/devices/system/cpu. Numpy needs these. Also added the "present" directory, since the contents are the same as possible and online. PiperOrigin-RevId: 209451777 Change-Id: I2048de3f57bf1c57e9b5421d607ca89c2a173684	2018-08-20 11:19:15 -07:00
Chenggang Qin	aeec7a4c00	fs: Support possible and online knobs for cpu Some linux commands depend on /sys/devices/system/cpu/possible, such as 'lscpu'. Add 2 knobs for cpu: /sys/devices/system/cpu/possible /sys/devices/system/cpu/online Both the values are '0 - Kernel.ApplicationCores()-1'. Change-Id: Iabd8a4e559cbb630ed249686b92c22b4e7120663 PiperOrigin-RevId: 209070163	2018-08-16 16:28:14 -07:00
Kevin Krakauer	635b0c4593	runsc fsgofer: Support dynamic serving of filesystems. When multiple containers run inside a sentry, each container has its own root filesystem and set of mounts. Containers are also added after sentry boot rather than all configured and known at boot time. The fsgofer needs to be able to serve the root filesystem of each container. Thus, it must be possible to add filesystems after the fsgofer has already started. This change: * Creates a URPC endpoint within the gofer process that listens for requests to serve new content. * Enables the sentry, when starting a new container, to add the new container's filesystem. * Mounts those new filesystems at separate roots within the sentry. PiperOrigin-RevId: 208903248 Change-Id: Ifa91ec9c8caf5f2f0a9eead83c4a57090ce92068	2018-08-15 16:25:22 -07:00
Ian Gudger	a620bea045	Reduce map lookups in syserr PiperOrigin-RevId: 208755352 Change-Id: Ia24630f452a4a42940ab73a8113a2fd5ea2cfca2	2018-08-14 19:03:38 -07:00
Nicolas Lacasse	e8a4f2e133	runsc: Change cache policy for root fs and volume mounts. Previously, gofer filesystems were configured with the default "fscache" policy, which caches filesystem metadata and contents aggressively. While this setting is best for performance, it means that changes from inside the sandbox may not be immediately propagated outside the sandbox, and vice-versa. This CL changes volumes and the root fs configuration to use a new "remote-revalidate" cache policy which tries to retain as much caching as possible while still making fs changes visible across the sandbox boundary. This cache policy is enabled by default for the root filesystem. The default value for the "--file-access" flag is still "proxy", but the behavior is changed to use the new cache policy. A new value for the "--file-access" flag is added, called "proxy-exclusive", which turns on the previous aggressive caching behavior. As the name implies, this flag should be used when the sandbox has "exclusive" access to the filesystem. All volume mounts are configured to use the new cache policy, since it is safest and most likely to be correct. There is not currently a way to change this behavior, but it's possible to add such a mechanism in the future. The configurability is a smaller issue for volumes, since most of the expensive application fs operations (walking + stating files) will likely served by the root fs. PiperOrigin-RevId: 208735037 Change-Id: Ife048fab1948205f6665df8563434dbc6ca8cfc9	2018-08-14 16:25:58 -07:00
Kevin Krakauer	d4939f6dc2	TTY: Fix data race where calls into tty.queue's waiter were not synchronized. Now, there's a waiter for each end (master and slave) of the TTY, and each waiter.Entry is only enqueued in one of the waiters. PiperOrigin-RevId: 208734483 Change-Id: I06996148f123075f8dd48cde5a553e2be74c6dce	2018-08-14 16:22:56 -07:00
Kevin Krakauer	12a4912aed	Fix `ls -laR \| wc -l` hanging. stat()-ing /proc/PID/fd/FD incremented but didn't decrement the refcount for FD. This behavior wasn't usually noticeable, but in the above case: - ls would never decrement the refcount of the write end of the pipe to 0. - This caused the write end of the pipe never to close. - wc would then hang read()-ing from the pipe. PiperOrigin-RevId: 208728817 Change-Id: I4fca1ba5ca24e4108915a1d30b41dc63da40604d	2018-08-14 15:49:58 -07:00
Ian Gudger	e97717e29a	Enforce Unix socket address length limit PiperOrigin-RevId: 208720936 Change-Id: Ic943a88b6efeff49574306d4d4e1f113116ae32e	2018-08-14 15:07:05 -07:00
Nicolas Lacasse	6cf2278167	Automated rollback of changelist 208284483 PiperOrigin-RevId: 208685417 Change-Id: Ie2849c4811e3a2d14a002f521cef018ded0c6c4a	2018-08-14 11:50:49 -07:00
Nicolas Lacasse	66b0f3e15a	Fix bind() on overlays. InodeOperations.Bind now returns a Dirent which will be cached in the Dirent tree. When an overlay is in-use, Bind cannot return the Dirent created by the upper filesystem because the Dirent does not know about the overlay. Instead, overlayBind must create a new overlay-aware Inode and Dirent and return that. This is analagous to how Lookup and overlayLookup work. PiperOrigin-RevId: 208670710 Change-Id: I6390affbcf94c38656b4b458e248739b4853da29	2018-08-14 10:34:56 -07:00
Adin Scannell	dde836a918	Prevent renames across walk fast path. PiperOrigin-RevId: 208533436 Change-Id: Ifc1a4e2d6438a424650bee831c301b1ac0d670a3	2018-08-13 13:31:18 -07:00
Nicolas Lacasse	a2ec391dfb	fs: Allow overlays to revalidate files from the upper fs. Previously, an overlay would panic if either the upper or lower fs required revalidation for a given Dirent. Now, we allow revalidation from the upper file, but not the lower. If a cached overlay inode does need revalidation (because the upper needs revalidation), then the entire overlay Inode will be discarded and a new overlay Inode will be built with a fresh copy of the upper file. As a side effect of this change, Revalidate must take an Inode instead of a Dirent, since an overlay needs to revalidate individual Inodes. PiperOrigin-RevId: 208293638 Change-Id: Ic8f8d1ffdc09114721745661a09522b54420c5f1	2018-08-10 17:16:38 -07:00
Justine Olshan	ae6f092fe1	Implemented the splice(2) syscall. Currently the implementation matches the behavior of moving data between two file descriptors. However, it does not implement this through zero-copy movement. Thus, this code is a starting point to build the more complex implementation. PiperOrigin-RevId: 208284483 Change-Id: Ibde79520a3d50bc26aead7ad4f128d2be31db14e	2018-08-10 16:11:01 -07:00
Nicolas Lacasse	567c5eed11	cache policy: Check policy before returning a negative dirent. The cache policy determines whether Lookup should return a negative dirent, or just ENOENT. This CL fixes one spot where we returned a negative dirent without first consulting the policy. PiperOrigin-RevId: 208280230 Change-Id: I8f963bbdb45a95a74ad0ecc1eef47eff2092d3a4	2018-08-10 15:43:03 -07:00
Brielle Broder	4ececd8e8d	Enable checkpoint/restore in cases of UDS use. Previously, processes which used file-system Unix Domain Sockets could not be checkpoint-ed in runsc because the sockets were saved with their inode numbers which do not necessarily remain the same upon restore. Now, the sockets are also saved with their paths so that the new inodes can be determined for the sockets based on these paths after restoring. Tests for cases with UDS use are included. Test cleanup to come. PiperOrigin-RevId: 208268781 Change-Id: Ieaa5d5d9a64914ca105cae199fd8492710b1d7ec	2018-08-10 14:33:20 -07:00
Neel Natu	d5b702b64f	Validate FS.base before establishing it in the task's register set. PiperOrigin-RevId: 208229341 Change-Id: I5d84bc52bbafa073446ef497e56958d0d7955aa8	2018-08-10 10:27:09 -07:00
Michael Pratt	2e06b23aa6	Fix missing O_LARGEFILE from O_CREAT files Cleanup some more syscall.O_* references while we're here. PiperOrigin-RevId: 208133460 Change-Id: I48db71a38f817e4f4673977eafcc0e3874eb9a25	2018-08-09 16:50:37 -07:00
Fabricio Voznika	4e171f7590	Basic support for ip link/addr and ifconfig Closes #94 PiperOrigin-RevId: 207997580 Change-Id: I19b426f1586b5ec12f8b0cd5884d5b401d334924	2018-08-08 22:39:58 -07:00
Adin Scannell	dbbe9ec915	Protect PCIDs with a mutex. Because the Drop method may be called across vCPUs, it is necessary to protect the PCID database with a mutex to prevent concurrent modification. The PCID is assigned prior to entersyscall, so it's safe to block. PiperOrigin-RevId: 207992864 Change-Id: I8b36d55106981f51e30dcf03e12886330bb79d67	2018-08-08 21:29:19 -07:00
Fabricio Voznika	0d350aac7f	Enable SACK in runsc SACK is disabled by default and needs to be manually enabled. It not only improves performance, but also fixes hangs downloading files from certain websites. PiperOrigin-RevId: 207906742 Change-Id: I4fb7277b67bfdf83ac8195f1b9c38265a0d51e8b	2018-08-08 10:26:18 -07:00
Jamie Liu	c036da5dff	Hold TaskSet.mu in Task.Parent. PiperOrigin-RevId: 207766238 Change-Id: Id3b66d8fe1f44c3570f67fa5ae7ba16021e35be1	2018-08-07 13:09:42 -07:00
Nicolas Lacasse	a38f41b464	fs: Add new cache policy "remote_revalidate". This CL adds a new cache-policy for gofer filesystems that uses the host page cache, but causes dirents to be reloaded on each Walk, and does not cache readdir results. This policy is useful when the remote filesystem may change out from underneath us, as any remote changes will be reflected on the next Walk. Importantly, this cache policy is only consistent if we do not use gVisor's internal page cache, since that page cache is tied to the Inode and may be thrown away upon Revalidation. This cache policy should only be used when the gofer supports donating host FDs, since then gVisor will make use of the host kernel page cache, which will be consistent for all open files in the gofer. In fact, a panic will be raised if a file is opened without a donated FD. PiperOrigin-RevId: 207752937 Change-Id: I233cb78b4695bbe00a4605ae64080a47629329b8	2018-08-07 11:43:41 -07:00
Zhaozhong Ni	c348d07863	sentry: make epoll.pollEntry wait for the file operation in restore. PiperOrigin-RevId: 207737935 Change-Id: I3a301ece1f1d30909715f36562474e3248b6a0d5	2018-08-07 10:27:37 -07:00
Michael Pratt	42086fe8e1	Make ramfs.File savable In other news, apparently proc.fdInfo is the last user of ramfs.File. PiperOrigin-RevId: 207564572 Change-Id: I5a92515698cc89652b80bea9a32d309e14059869	2018-08-06 10:15:56 -07:00
ShiruRen	3ec074897f	Fix a bug in PCIDs.Assign Store the new assigned pcid in p.cache[pt]. Signed-off-by: ShiruRen <renshiru2000@gmail.com> Change-Id: I4aee4e06559e429fb5e90cb9fe28b36139e3b4b6 PiperOrigin-RevId: 207563833	2018-08-06 10:11:56 -07:00
Zhaozhong Ni	25178ebdf5	stateify: make explicit mode no longer optional. PiperOrigin-RevId: 207303405 Change-Id: I17b6433963d78e3631a862b7ac80f566c8e7d106	2018-08-03 12:09:13 -07:00
Michael Pratt	a3927157c5	Copy creds in access PiperOrigin-RevId: 207181631 Change-Id: Ic6205278715a9260fb970efb414fc758ea72c4c6	2018-08-02 16:01:31 -07:00
Michael Pratt	b6a37ab9d9	Update comment reference PiperOrigin-RevId: 207180809 Change-Id: I08c264812919e81b2c56fdd4a9ef06924de8b52f	2018-08-02 15:56:40 -07:00
Zhaozhong Ni	57d0fcbdbf	Automated rollback of changelist 207037226 PiperOrigin-RevId: 207125440 Change-Id: I6c572afb4d693ee72a0c458a988b0e96d191cd49	2018-08-02 10:42:48 -07:00
Brian Geffon	cf44aff6e0	Add seccomp(2) support. Add support for the seccomp syscall and the flag SECCOMP_FILTER_FLAG_TSYNC. PiperOrigin-RevId: 207101507 Change-Id: I5eb8ba9d5ef71b0e683930a6429182726dc23175	2018-08-02 08:10:30 -07:00
Michael Pratt	60add78980	Automated rollback of changelist 207007153 PiperOrigin-RevId: 207037226 Change-Id: I8b5f1a056d4f3eab17846f2e0193bb737ecb5428	2018-08-01 19:57:32 -07:00
Zhaozhong Ni	b9e1cf8404	stateify: convert all packages to use explicit mode. PiperOrigin-RevId: 207007153 Change-Id: Ifedf1cc3758dc18be16647a4ece9c840c1c636c9	2018-08-01 15:43:24 -07:00
Brielle Broder	6b87378634	New conditional for adding key/value pairs to maps. When adding MultiDeviceKeys and their values into MultiDevice maps, make sure the keys and values have not already been added. This ensures that preexisting key/value pairs are not overridden. PiperOrigin-RevId: 206942766 Change-Id: I9d85f38eb59ba59f0305e6614a52690608944981	2018-08-01 09:44:57 -07:00
Andrei Vagin	a7a0167716	proc: show file flags in fdinfo Currently, there is an attempt to print FD flags, but they are not decoded into a number, so we see something like this: /criu # cat /proc/self/fdinfo/0 flags: {%!o(bool=000false)} Actually, fdinfo has to contain file flags. Change-Id: Idcbb7db908067447eb9ae6f2c3cfb861f2be1a97 PiperOrigin-RevId: 206794498	2018-07-31 11:19:15 -07:00
Justine Olshan	2793f7ac5f	Added the O_LARGEFILE flag. This flag will always be true for gVisor files. PiperOrigin-RevId: 206355963 Change-Id: I2f03d2412e2609042df43b06d1318cba674574d0	2018-07-27 12:27:46 -07:00
Zhaozhong Ni	be7fcbc558	stateify: support explicit annotation mode; convert refs and stack packages. We have been unnecessarily creating too many savable types implicitly. PiperOrigin-RevId: 206334201 Change-Id: Idc5a3a14bfb7ee125c4f2bb2b1c53164e46f29a8	2018-07-27 10:17:21 -07:00
Nicolas Lacasse	127c977ab0	Don't copy-up extended attributes that specifically configure a lower overlay. When copying-up files from a lower fs to an upper, we also copy the extended attributes on the file. If there is a (nested) overlay inside the lower, some of these extended attributes configure the lower overlay, and should not be copied-up to the upper. In particular, whiteout attributes in the lower fs overlay should not be copied-up, since the upper fs may actually contain the file. PiperOrigin-RevId: 206236010 Change-Id: Ia0454ac7b99d0e11383f732a529cb195ed364062	2018-07-26 15:55:50 -07:00
Michael Pratt	7cd9405b9c	Format openat flags PiperOrigin-RevId: 206021774 Change-Id: I447b6c751c28a8d8d4d78468b756b6ad8c61e169	2018-07-25 11:07:19 -07:00
Kevin Krakauer	32aa0f5465	Typo fix. PiperOrigin-RevId: 205880843 Change-Id: If2272b25f08a18ebe9b6309a1032dd5cdaa59866	2018-07-24 13:26:06 -07:00
Fabricio Voznika	d7a34790a0	Add KVM and overlay dimensions to container_test PiperOrigin-RevId: 205714667 Change-Id: I317a2ca98ac3bdad97c4790fcc61b004757d99ef	2018-07-23 13:31:42 -07:00
Michael Pratt	5f134b3c0a	Format getcwd path PiperOrigin-RevId: 205440332 Change-Id: I2a838f363e079164c83da88e1b0b8769844fe79b	2018-07-20 12:59:41 -07:00
Adin Scannell	8b8aad91d5	kernel: mutations on creds now require a copy. PiperOrigin-RevId: 205315612 Change-Id: I9a0a1e32c8abfb7467a38743b82449cc92830316	2018-07-19 15:48:56 -07:00
Nicolas Lacasse	be431d0934	fs: Pass context to Revalidate() function. The current revalidation logic is very simple and does not do much introspection of the dirent being revalidated (other than looking at the type of file). Fancier revalidation logic is coming soon, and we need to be able to look at the cached and uncached attributes of a given dirent, and we need a context to perform some of these operations. PiperOrigin-RevId: 205307351 Change-Id: If17ea1c631d8f9489c0e05a263e23d7a8a3bf159	2018-07-19 14:57:52 -07:00
Nicolas Lacasse	ea37103196	ConfigureMMap on an overlay file delegates to the upper if there is no lower. In the general case with an overlay, all mmap calls must go through the overlay, because in the event of a copy-up, the overlay needs to invalidate any previously-created mappings. If there if no lower file, however, there will never be a copy-up, so the overlay can delegate directly to the upper file in that case. This also allows us to correctly mmap /dev/zero when it is in an overlay. This file has special semantics which the overlay does not know about. In particular, it does not implement Mappable(), which (in the general case) the overlay uses to detect if a file is mappable or not. PiperOrigin-RevId: 205306743 Change-Id: I92331649aa648340ef6e65411c2b42c12fa69631	2018-07-19 14:53:38 -07:00
Brian Geffon	df5a5d388e	Add AT_UID, AT_EUID, AT_GID, AT_EGID to aux vector. With musl libc when these entries are missing from the aux vector it's forcing libc.secure (effectively AT_SECURE). This mode prevents RPATH and LD_LIBRARY_PATH from working. https://git.musl-libc.org/cgit/musl/tree/ldso/dynlink.c#n1488 As the first entry is a mask of all the aux fields set: https://git.musl-libc.org/cgit/musl/tree/ldso/dynlink.c#n187 PiperOrigin-RevId: 205284684 Change-Id: I04de7bab241043306b4f732306a81d74edfdff26	2018-07-19 12:42:05 -07:00
Zhaozhong Ni	a95640b1e9	sentry: save stack in proc net dev. PiperOrigin-RevId: 205253858 Change-Id: Iccdc493b66d1b4d39de44afb1184952183b1283f	2018-07-19 09:37:32 -07:00
Nicolas Lacasse	63e2820f7b	Fix lock-ordering violation in Create by logging BaseName instead of FullName. Dirent.FullName takes the global renameMu, but can be called during Create, which itself takes dirent.mu and dirent.dirMu, which is a lock-order violation: Dirent.Create d.dirMu.Lock d.mu.Lock Inode.Create gofer.inodeOperations.Create gofer.NewFile Dirent.FullName d.renameMu.RLock We only use the FullName here for logging, and in this case we can get by with logging only the BaseName. A `BaseName` method was added to Dirent, which simply returns the name, taking d.parent.mu as required. In the Create pathway, we can't call d.BaseName() because taking d.parent.mu after d.mu violates the lock order. But we already know the base name of the file we just created, so that's OK. In the Open/GetFile pathway, we are free to call d.BaseName() because the other dirent locks are not held. PiperOrigin-RevId: 205112278 Change-Id: Ib45c734081aecc9b225249a65fa8093eb4995f10	2018-07-18 11:49:50 -07:00
Michael Pratt	733ebe7c09	Merge FileMem.usage in IncRef Per the doc, usage must be kept maximally merged. Beyond that, it is simply a good idea to keep fragmentation in usage to a minimum. The glibc malloc allocator allocates one page at a time, potentially causing lots of fragmentation. However, those pages are likely to have the same number of references, often making it possible to merge ranges. PiperOrigin-RevId: 204960339 Change-Id: I03a050cf771c29a4f05b36eaf75b1a09c9465e14	2018-07-17 13:03:59 -07:00
Adin Scannell	29e00c943a	Add CPUID faulting for ptrace and KVM. PiperOrigin-RevId: 204858314 Change-Id: I8252bf8de3232a7a27af51076139b585e73276d4	2018-07-16 22:02:58 -07:00
Michael Pratt	14d06064d2	Start allocation and reclaim scans only where they may find a match If usageSet is heavily fragmented, findUnallocatedRange and findReclaimable can spend excessive cycles linearly scanning the set for unallocated/free pages. Improve common cases by beginning the scan only at the first page that could possibly contain an unallocated/free page. This metadata only guarantees that there is no lower unallocated/free page, but a scan may still be required (especially for multi-page allocations). That said, this heuristic can still provide significant performance improvements for certain applications. PiperOrigin-RevId: 204841833 Change-Id: Ic41ad33bf9537ecd673a6f5852ab353bf63ea1e6	2018-07-16 18:19:01 -07:00
Neel Natu	8f21c0bb28	Add EventOperations.HostFD() This method allows an eventfd inside the Sentry to be registered with with the host kernel. Update comment about memory mapping host fds via CachingInodeOperations. PiperOrigin-RevId: 204784859 Change-Id: I55823321e2d84c17ae0f7efaabc6b55b852ae257	2018-07-16 12:20:05 -07:00
Neel Natu	5b09ec3b89	Allow a filesystem to control its visibility in /proc/filesystems. PiperOrigin-RevId: 204508520 Change-Id: I09e5f8b6e69413370e1a0d39dbb7dc1ee0b6192d	2018-07-13 12:10:57 -07:00
Michael Pratt	f09ebd9c71	Note that Mount errors do not require translations PiperOrigin-RevId: 204490639 Change-Id: I0fe26306bae9320c6aa4f854fe0ef25eebd93233	2018-07-13 10:24:18 -07:00
Michael Pratt	a28b274abb	Fix aio eventfd lookup We're failing to set eventFile in the outer scope. PiperOrigin-RevId: 204392995 Change-Id: Ib9b04f839599ef552d7b5951d08223e2b1d5f6ad	2018-07-12 17:14:50 -07:00
Zhaozhong Ni	1cd46c8dd1	sentry: wait for restore clock instead of panicing in Timekeeper. PiperOrigin-RevId: 204372296 Change-Id: If1ed9843b93039806e0c65521f30177dc8036979	2018-07-12 15:09:02 -07:00
Zhaozhong Ni	bb41ad808a	sentry: save inet stacks in proc files. PiperOrigin-RevId: 204362791 Change-Id: If85ea7442741e299f0d7cddbc3d6b415e285da81	2018-07-12 14:19:04 -07:00
Michael Pratt	41e0b977e5	Format documentation PiperOrigin-RevId: 204323728 Change-Id: I1ff9aa062ffa12583b2e38ec94c87db7a3711971	2018-07-12 10:37:21 -07:00
Jamie Liu	b9c469f372	Move ptrace constants to abi/linux. PiperOrigin-RevId: 204188763 Change-Id: I5596ab7abb3ec9e210a7f57b3fc420e836fa43f3	2018-07-11 14:24:19 -07:00
Jamie Liu	ee0ef506d4	Add MemoryManager.Pin. PiperOrigin-RevId: 204162313 Change-Id: Ib0593dde88ac33e222c12d0dca6733ef1f1035dc	2018-07-11 11:52:09 -07:00
Jamie Liu	06920b3d1b	Exit tmpfs.fileInodeOperations.Translate early if required.Start >= EOF. Otherwise required and optional can be empty or have negative length. PiperOrigin-RevId: 204007079 Change-Id: I59e472a87a8caac11ffb9a914b8d79bf0cd70995	2018-07-10 13:58:54 -07:00
Zhaozhong Ni	b1683df90b	netstack: tcp socket connected state S/R support. PiperOrigin-RevId: 203958972 Change-Id: Ia6fe16547539296d48e2c6731edacdd96bd6e93c	2018-07-10 09:23:35 -07:00
Jamie Liu	41aeb680b1	Inherit parent in clone(CLONE_THREAD) under TaskSet.mu. PiperOrigin-RevId: 203849534 Change-Id: I4d81513bfd32e0b7fc40c8a4c194eba7abc35a83	2018-07-09 16:16:19 -07:00
Michael Pratt	0dedac637f	Trim all whitespace between interpreter and arg Multiple whitespace characters are allowed. This fixes Ubuntu's /usr/sbin/invoke-rc.d, which has trailing whitespace after the interpreter which we were treating as an arg. PiperOrigin-RevId: 203802278 Change-Id: I0a6cdb0af4b139cf8abb22fa70351fe3697a5c6b	2018-07-09 11:43:56 -07:00
Rahat Mahmood	34af9a6174	Fix data race on inotify.Watch.mask. PiperOrigin-RevId: 203180463 Change-Id: Ief50988c1c028f81ec07a26e704d893e86985bf0	2018-07-03 14:08:51 -07:00
Michael Pratt	660f1203ff	Fix runsc VDSO mapping `80bdf8a406` accidentally moved vdso into an inner scope, never assigning the vdso variable passed to the Kernel and thus skipping VDSO mappings. Fix this and remove the ability for loadVDSO to skip VDSO mappings, since tests that do so are gone. PiperOrigin-RevId: 203169135 Change-Id: Ifd8cadcbaf82f959223c501edcc4d83d05327eba	2018-07-03 12:53:39 -07:00
Michael Pratt	062a6f6ec5	Handle NUL-only paths in exec The path in execve(2), interpreter script, and ELF interpreter may all be no more than a NUL-byte. Handle each of those cases. PiperOrigin-RevId: 203155745 Change-Id: I1c8b1b387924b23b2cf942341dfc76c9003da959	2018-07-03 11:28:53 -07:00
Michael Pratt	2821dfe6ce	Hold d.parent.mu when reading d.name PiperOrigin-RevId: 203041657 Change-Id: I120783d91712818e600505454c9276f8d9877f37	2018-07-02 17:39:10 -07:00
Justine Olshan	80bdf8a406	Sets the restore environment for restoring a container. Updated how restoring occurs through boot.go with a separate Restore function. This prevents a new process and new mounts from being created. Added tests to ensure the container is restored. Registered checkpoint and restore commands so they can be used. Docker support for these commands is still limited. Working on #80. PiperOrigin-RevId: 202710950 Change-Id: I2b893ceaef6b9442b1ce3743bd112383cb92af0c	2018-06-29 14:47:40 -07:00
Nicolas Lacasse	1b5e09f968	aio: Return EINVAL if the number of events is negative. PiperOrigin-RevId: 202671065 Change-Id: I248b74544d47ddde9cd59d89aa6ccb7dad2b6f89	2018-06-29 10:47:38 -07:00
Nicolas Lacasse	f93bd2cbe6	Hold t.mu while calling t.FSContext(). PiperOrigin-RevId: 202562686 Change-Id: I0f5be7cc9098e86fa31d016251c127cb91084b05	2018-06-28 16:11:19 -07:00
Nicolas Lacasse	1ceed49ba9	Check for invalid offset when submitting an AIO read/write request. PiperOrigin-RevId: 202528335 Change-Id: Ic32312cf4337bcb40a7155cb2174e5cd89a280f7	2018-06-28 12:55:18 -07:00
Fabricio Voznika	6b6852bceb	Fix semaphore data races PiperOrigin-RevId: 202371908 Change-Id: I72603b1d321878cae6404987c49e64732b676331	2018-06-27 14:41:32 -07:00
Nicolas Lacasse	99afc982f1	Call mm.CheckIORange() when copying in IOVecs. CheckIORange is analagous to Linux's access_ok() method, which is checked when copying in IOVecs in both lib/iov_iter.c:import_single_range() and lib/iov_iter.c:import_iovec() => fs/read_write.c:rw_copy_check_uvector(). gVisor copies in IOVecs via Task.SingleIOSequence() and Task.CopyInIovecs(). We were checking the address range bounds, but not whether the address is valid. To conform with linux, we should also check that the address is valid. For usual preadv/pwritev syscalls, the effect of this change is not noticeable, since we find out that the address is invalid before the syscall completes. For vectorized async-IO operations, however, this change is necessary because Linux returns EFAULT when the operation is submitted, but before it executes. Thus, we must validate the iovecs when copying them in. PiperOrigin-RevId: 202370092 Change-Id: I8759a63ccf7e6b90d90d30f78ab8935a0fcf4936	2018-06-27 14:31:35 -07:00

... 4 5 6 7 8 ...

670 Commits