gvisor

Commit Graph

Author	SHA1	Message	Date
Michael Pratt	69eac1198f	Move wait constants to abi/linux package Updates #214 PiperOrigin-RevId: 249483756 Change-Id: I0d3cf4112bed75a863d5eb08c2063fbc506cd875	2019-05-22 11:15:33 -07:00
Adin Scannell	ae1bb08871	Clean up pipe internals and add fcntl support Pipe internals are made more efficient by avoiding garbage collection. A pool is now used that can be shared by all pipes, and buffers are chained via an intrusive list. The documentation for pipe structures and methods is also simplified and clarified. The pipe tests are now parameterized, so that they are run on all different variants (named pipes, small buffers, default buffers). The pipe buffer sizes are exposed by fcntl, which is now supported by this change. A size change test has been added to the suite. These new tests uncovered a bug regarding the semantics of open named pipes with O_NONBLOCK, which is also fixed by this CL. This fix also addresses the lack of the O_LARGEFILE flag for named pipes. PiperOrigin-RevId: 249375888 Change-Id: I48e61e9c868aedb0cadda2dff33f09a560dee773	2019-05-21 20:12:27 -07:00
Michael Pratt	c8857f7269	Fix inconsistencies in ELF anonymous mappings * A segment with filesz == 0, memsz > 0 should be an anonymous only mapping. We were failing to load such an ELF. * Anonymous pages are always mapped RW, regardless of the segment protections. PiperOrigin-RevId: 249355239 Change-Id: I251e5c0ce8848cf8420c3aadf337b0d77b1ad991	2019-05-21 17:06:05 -07:00
Bhasker Hariharan	2ac0aeeb42	Refactor fdbased endpoint dispatcher code. This is in preparation to support an fdbased endpoint that can read/dispatch packets from multiple underlying fds. Updates #231 PiperOrigin-RevId: 249337074 Change-Id: Id7d375186cffcf55ae5e38986e7d605a96916d35	2019-05-21 15:24:25 -07:00
Adin Scannell	9cdae51fec	Add basic plumbing for splice and stub implementation. This does not actually implement an efficient splice or sendfile. Rather, it adds a generic plumbing to the file internals so that this can be added. All file implementations use the stub fileutil.NoSplice implementation, which causes sendfile and splice to fall back to an internal copy. A basic splice system call interface is added, along with a test. PiperOrigin-RevId: 249335960 Change-Id: Ic5568be2af0a505c19e7aec66d5af2480ab0939b	2019-05-21 15:18:12 -07:00
Neel Natu	adeb99709b	Remove unused struct member. Remove unused struct member. PiperOrigin-RevId: 249300446 Change-Id: Ifb16538f684bc3200342462c3da927eb564bf52d	2019-05-21 12:20:19 -07:00
Michael Pratt	80cc2c78e5	Forward named pipe creation to the gofer The backing 9p server must allow named pipe creation, which the runsc fsgofer currently does not. There are small changes to the overlay here. GetFile may block when opening a named pipe, which can cause a deadlock: 1. open(O_RDONLY) -> copyMu.Lock() -> GetFile() 2. open(O_WRONLY) -> copyMu.Lock() -> Deadlock A named pipe usable for writing must already be on the upper filesystem, but we are still taking copyMu for write when checking for upper. That can be changed to a read lock to fix the common case. However, a named pipe on the lower filesystem would still deadlock in open(O_WRONLY) when it tries to actually perform copy up (which would simply return EINVAL). Move the copy up type check before taking copyMu for write to avoid this. p9 must be modified, as it was incorrectly removing the file mode when sending messages on the wire. PiperOrigin-RevId: 249154033 Change-Id: Id6637130e567b03758130eb6c7cdbc976384b7d6	2019-05-20 16:53:08 -07:00
Michael Pratt	6588427451	Fix incorrect tmpfs timestamp updates * Creation of files, directories (and other fs objects) in a directory should always update ctime. * Same for removal. * atime should not be updated on lookup, only readdir. I've also renamed some misleading functions that update mtime and ctime. PiperOrigin-RevId: 249115063 Change-Id: I30fa275fa7db96d01aa759ed64628c18bb3a7dc7	2019-05-20 13:35:17 -07:00
Michael Pratt	4a842836e5	Return EPERM for mknod This more directly matches what Linux does with unsupported nodes. PiperOrigin-RevId: 248780425 Change-Id: I17f3dd0b244f6dc4eb00e2e42344851b8367fbec	2019-05-17 13:47:40 -07:00
Michael Pratt	04105781ad	Fix gofer rename ctime and cleanup stat_times test There is a lot of redundancy that we can simplify in the stat_times test. This will make it easier to add new tests. However, the simplification reveals that cached uattrs on goferfs don't properly update ctime on rename. PiperOrigin-RevId: 248773425 Change-Id: I52662728e1e9920981555881f9a85f9ce04041cf	2019-05-17 13:05:47 -07:00
Andrei Vagin	2105158d4b	gofer: don't call hostfile.Close if hostFile is nil PiperOrigin-RevId: 248437159 Change-Id: Ife71f6ca032fca59ec97a82961000ed0af257101	2019-05-15 17:21:10 -07:00
Andrei Vagin	3abee2ecb9	Automated rollback of changelist 247964961 PiperOrigin-RevId: 248411456 Change-Id: I21c3767b0b7e5948536d4c0b78be46ba35cf76cb	2019-05-15 14:58:40 -07:00
Nicolas Lacasse	dd153c014d	Start of support for /proc/pid/cgroup file. PiperOrigin-RevId: 248263378 Change-Id: Ic057d2bb0b6212110f43ac4df3f0ac9bf931ab98	2019-05-14 20:34:50 -07:00
Michael Pratt	330a1bbd04	Remove false comment PiperOrigin-RevId: 248249285 Change-Id: I9b6d267baa666798b22def590ff20c9a118efd47	2019-05-14 18:06:14 -07:00
Andrei Vagin	ec248daf29	gvisor/hostnet: restart epoll_wait after epoll_ctl Otherwise changes made by epoll_ctl will not take effect. PiperOrigin-RevId: 247964961 Change-Id: I9fbb35c44766421af45d9ed53760e0c324d80d99	2019-05-13 10:38:27 -07:00
Jamie Liu	5ee8218483	Add pgalloc.DelayedEvictionManual. PiperOrigin-RevId: 247667272 Change-Id: I16b04e11bb93f50b7e05e888992303f730e4a877	2019-05-10 13:37:48 -07:00
Fabricio Voznika	1bee43be13	Implement fallocate(2) Closes #225 PiperOrigin-RevId: 247508791 Change-Id: I04f47cf2770b30043e5a272aba4ba6e11d0476cc	2019-05-09 15:35:49 -07:00
Tamir Duberstein	0f4be95a33	Remove dhcp client This was upstreamed from Fuchsia, but it is pretty buggy and doesn't rely on any private APIs. Thus it can be checked into the Fuchsia source tree without forking netstack, where we can more easily iterate on (and eventually remove) it. PiperOrigin-RevId: 247506582 Change-Id: Ifb1b60c6c4941c374a59c5570a6a9cacf2468981	2019-05-09 15:23:03 -07:00
Nicolas Lacasse	bfd9f75ba4	Set the FilesystemType in MountSource from the Filesystem. And stop storing the Filesystem in the MountSource. This allows us to decouple the MountSource filesystem type from the name of the filesystem. PiperOrigin-RevId: 247292982 Change-Id: I49cbcce3c17883b7aa918ba76203dfd6d1b03cc8	2019-05-08 14:35:06 -07:00
Googler	cbf6ab9697	Check GSO for nil in WritePacket Testing: Unit tests added PiperOrigin-RevId: 247096269 Change-Id: I849c010eadcb53caf45896a15ef38162d66a9568	2019-05-07 14:57:03 -07:00
Ian Gudger	20862f0db2	Add gonet.DialContextTCP. Allows cancellation and timeouts. PiperOrigin-RevId: 247090428 Change-Id: I91907f12e218677dcd0e0b6d72819deedbd9f20c	2019-05-07 14:27:36 -07:00
Fabricio Voznika	e5432fa1b3	Remove defers from gofer.contextFile Most are single line methods in hot paths. PiperOrigin-RevId: 247050267 Change-Id: I428d78723fe00b57483185899dc8fa9e1f01e2ea	2019-05-07 10:55:09 -07:00
Jamie Liu	14f0e7618e	Ensure all uses of MM.brk occur under MM.mappingMu in MM.Brk(). PiperOrigin-RevId: 246921386 Change-Id: I71d8908858f45a9a33a0483470d0240eaf0fd012	2019-05-06 16:39:43 -07:00
Kevin Krakauer	ff8ed5e6a5	Fix raw socket behavior and tests. Some behavior was broken due to the difficulty of running automated raw socket tests. Change-Id: I152ca53916bb24a0208f2dc1c4f5bc87f4724ff6 PiperOrigin-RevId: 246747067	2019-05-05 16:07:25 -07:00
Bin Lu	ebe2f78d9b	Add arm64 support to pkg/seccomp Signed-off-by: Bin Lu <bin.lu@arm.com> PiperOrigin-RevId: 246622505 Change-Id: I803639a0c5b0f75959c64fee5385314214834d10	2019-05-03 22:03:59 -07:00
Ian Gudger	b4a9f18687	Update tcpip Clock description. The tcpip.Clock comment stated that times provided by it should not be used for netstack internal timekeeping. This comment was from before the interface supported monotonic times. The monotonic times that it provides are now the preferred time source for netstack internal timekeeping. PiperOrigin-RevId: 246618772 Change-Id: I853b720e3d719b03fabd6156d2431da05d354bda	2019-05-03 21:01:42 -07:00
Andrei Vagin	24d8656585	gofer: don't leak file descriptors Fixes #219 PiperOrigin-RevId: 246568639 Change-Id: Ic7afd15dde922638d77f6429c508d1cbe2e4288a	2019-05-03 14:01:50 -07:00
Googler	f2699b76c8	Support IPv4 fragmentation in netstack Testing: Unit tests and also large ping in Fuchsia OS PiperOrigin-RevId: 246563592 Change-Id: Ia12ab619f64f4be2c8d346ce81341a91724aef95	2019-05-03 13:30:35 -07:00
Kevin Krakauer	264d012d81	Add netfilter ABI for iptables support. Change-Id: Ifbd2abf63ea8062a89b83e948d3e9735480d8216 PiperOrigin-RevId: 246559904	2019-05-03 13:06:09 -07:00
Tamir Duberstein	0e1cc476db	Fix transport/raw copybara export - include packet_list.go - exclude state.go (by renaming to include an underscore) Also rename raw.go to endpoint.go for consistency. PiperOrigin-RevId: 246547912 Change-Id: I19c8331c794ba683a940cc96a8be6497b53ff24d	2019-05-03 11:52:59 -07:00
Bhasker Hariharan	458fe955a7	Implement support for SACK based recovery(RFC 6675). PiperOrigin-RevId: 246536003 Change-Id: I118b745f45040be9c70cb6a1028acdb06c78d8c9	2019-05-03 10:51:18 -07:00
Chris Kuiper	2d8e90b311	Proper cleanup of sockets that used REUSEPORT Fixed a small logic error that broke proper accounting of MultiPortEndpoints. PiperOrigin-RevId: 246502126 Change-Id: I1a7d6ea134f811612e545676212899a3707bc2c2	2019-05-03 07:02:51 -07:00
Chris Kuiper	8972e47a2e	Support reception of multicast data on more than one socket This requires two changes: 1) Support for more than one socket to join a given multicast group. 2) Duplicate delivery of incoming multicast packets to all sockets listening for them. In addition, I tweaked the code (and added a test) to disallow duplicate IP_ADD_MEMBERSHIP calls for the same group and NIC. This is how Linux does it. PiperOrigin-RevId: 246437315 Change-Id: Icad8300b4a8c3f501d9b4cd283bd3beabef88b72	2019-05-02 19:41:00 -07:00
Michael Pratt	23ca9886c6	Update reference to old type PiperOrigin-RevId: 246036806 Change-Id: I5554a43a1f8146c927402db3bf98488a2da0fbe7	2019-04-30 15:42:39 -07:00
Jamie Liu	8bfb83d0ac	Implement async MemoryFile eviction, and use it in CachingInodeOperations. This feature allows MemoryFile to delay eviction of "optional" allocations, such as unused cached file pages. Note that this incidentally makes CachingInodeOperations writeback asynchronous, in the sense that it doesn't occur until eviction; this is necessary because between when a cached page becomes evictable and when it's evicted, file writes (via CachingInodeOperations.Write) may dirty the page. As currently implemented, this feature won't meaningfully impact steady-state memory usage or caching; the reclaimer goroutine will schedule eviction as soon as it runs out of other work to do. Future CLs increase caching by adding constraints on when eviction is scheduled. PiperOrigin-RevId: 246014822 Change-Id: Ia85feb25a2de92a48359eb84434b6ec6f9bea2cb	2019-04-30 13:56:41 -07:00
Ian Gudger	81ecd8b6ea	Implement the MSG_CTRUNC msghdr flag for Unix sockets. Updates google/gvisor#206 PiperOrigin-RevId: 245880573 Change-Id: Ifa715e98d47f64b8a32b04ae9378d6cd6bd4025e	2019-04-29 21:21:08 -07:00
Fabricio Voznika	ddab854b9a	Reduce memory allocations on serving path Cache last used messages and reuse them for subsequent requests. If more messages are needed, they are created outside the cache on demand. PiperOrigin-RevId: 245836910 Change-Id: Icf099ddff95df420db8e09f5cdd41dcdce406c61	2019-04-29 15:33:47 -07:00
Michael Pratt	4d52a55201	Change copyright notice to "The gVisor Authors" Based on the guidelines at https://opensource.google.com/docs/releasing/authors/. 1. $ rg -l "Google LLC" | xargs sed -i 's/Google LLC.*/The gVisor Authors./' 2. Manual fixup of "Google Inc" references. 3. Add AUTHORS file. Authors may request to be added to this file. 4. Point netstack AUTHORS to gVisor AUTHORS. Drop CONTRIBUTORS. Fixes #209 PiperOrigin-RevId: 245823212 Change-Id: I64530b24ad021a7d683137459cafc510f5ee1de9	2019-04-29 14:26:23 -07:00
Nicolas Lacasse	f4ce43e1f4	Allow and document bug ids in gVisor codebase. PiperOrigin-RevId: 245818639 Change-Id: I03703ef0fb9b6675955637b9fe2776204c545789	2019-04-29 14:04:14 -07:00
Nicolas Lacasse	2df64cd6d2	createAt should return all errors from FindInode except ENOENT. Previously, createAt was eating all errors from FindInode except for EACCES and proceeding with the creation. This is incorrect, as FindInode can return many other errors (like ENAMETOOLONG) that should stop creation. This CL changes createAt to return all errors encountered except for ENOENT, which we can ignore because we are about to create the thing. PiperOrigin-RevId: 245773222 Change-Id: I1b317021de70f0550fb865506f6d8147d4aebc56	2019-04-29 10:30:24 -07:00
Ben Burkert	66bca6fc22	tcpip/adapters/gonet: add CloseRead & CloseWrite methods to Conn Add the CloseRead & CloseWrite methods that perform shutdown on the corresponding Read & Write sides of a connection. Change-Id: I3996a2abdc7cd68a2becba44dc4bd9f0919d2ce1 PiperOrigin-RevId: 245537950	2019-04-26 22:46:45 -07:00
Kevin Krakauer	43dff57b87	Make raw sockets a toggleable feature disabled by default. PiperOrigin-RevId: 245511019 Change-Id: Ia9562a301b46458988a6a1f0bbd5f07cbfcb0615	2019-04-26 16:51:46 -07:00
Adin Scannell	5749f64314	kvm: remove non-sane sanity check Apparently some platforms don't have pSize < vSize. Fixes #208 PiperOrigin-RevId: 245480998 Change-Id: I2a98229912f4ccbfcd8e79dfa355104f14275a9c	2019-04-26 13:53:12 -07:00
Bhasker Hariharan	228dc15fd1	Bump the AF_PACKET socket rcv buf size to 4MB by default. Packet socket receive buffers default to the sysctl value of net.core.rmem_default and are capped by net.core.rmem_max, both of which are usually set to 208KB on most systems. Since we can't expect every gVisor user to bump these we use SO_RCVBUFFORCE to exceed the limit. This is possible as runsc runs with CAP_NET_ADMIN outside the sandbox and can do this before the FD is passed to the sentry inside the sandbox. Updates #211 iperf output w/ 4MB buffer. iperf3 -c 172.17.0.2 -t 100 Connecting to host 172.17.0.2, port 5201 [ 4] local 172.17.0.1 port 40378 connected to 172.17.0.2 port 5201 [ ID] Interval Transfer Bandwidth Retr Cwnd [ 4] 0.00-1.00 sec 1.15 GBytes 9.89 Gbits/sec 0 1.02 MBytes [ 4] 1.00-2.00 sec 1.18 GBytes 10.2 Gbits/sec 0 1.02 MBytes [ 4] 2.00-3.00 sec 965 MBytes 8.09 Gbits/sec 0 1.02 MBytes [ 4] 3.00-4.00 sec 942 MBytes 7.90 Gbits/sec 0 1.02 MBytes [ 4] 4.00-5.00 sec 952 MBytes 7.99 Gbits/sec 0 1.02 MBytes [ 4] 5.00-6.00 sec 1.14 GBytes 9.81 Gbits/sec 0 1.02 MBytes [ 4] 6.00-7.00 sec 1.13 GBytes 9.68 Gbits/sec 0 1.02 MBytes [ 4] 7.00-8.00 sec 930 MBytes 7.80 Gbits/sec 0 1.02 MBytes [ 4] 8.00-9.00 sec 1.15 GBytes 9.91 Gbits/sec 0 1.02 MBytes [ 4] 9.00-10.00 sec 938 MBytes 7.87 Gbits/sec 0 1.02 MBytes [ 4] 10.00-11.00 sec 737 MBytes 6.18 Gbits/sec 0 1.02 MBytes [ 4] 11.00-12.00 sec 1.16 GBytes 9.93 Gbits/sec 0 1.02 MBytes [ 4] 12.00-13.00 sec 917 MBytes 7.69 Gbits/sec 0 1.02 MBytes [ 4] 13.00-14.00 sec 1.19 GBytes 10.2 Gbits/sec 0 1.02 MBytes [ 4] 14.00-15.00 sec 1.01 GBytes 8.70 Gbits/sec 0 1.02 MBytes [ 4] 15.00-16.00 sec 1.20 GBytes 10.3 Gbits/sec 0 1.02 MBytes [ 4] 16.00-17.00 sec 1.14 GBytes 9.80 Gbits/sec 0 1.02 MBytes ^C[ 4] 17.00-17.60 sec 718 MBytes 10.1 Gbits/sec 0 1.02 MBytes - - - - - - - - - - - - - - - - - - - - - - - - - [ ID] Interval Transfer Bandwidth Retr [ 4] 0.00-17.60 sec 18.4 GBytes 8.98 Gbits/sec 0 sender [ 4] 0.00-17.60 sec 0.00 Bytes 0.00 bits/sec receiver PiperOrigin-RevId: 245470590 Change-Id: I1c08c5ee8345de6ac070513656a4703312dc3c00	2019-04-26 12:52:02 -07:00
Kevin Krakauer	5f13338d30	Fix reference counting bug in /proc/PID/fdinfo/. PiperOrigin-RevId: 245452217 Change-Id: I7164d8f57fe34c17e601079eb9410a6d95af1869	2019-04-26 11:09:55 -07:00
Michael Pratt	f17cfa4d53	Perform explicit CPUID and FP state compatibility checks on restore PiperOrigin-RevId: 245341004 Change-Id: Ic4d581039d034a8ae944b43e45e84eb2c3973657	2019-04-25 17:47:05 -07:00
Jamie Liu	6b76c172b4	Don't enforce NAME_MAX in fs.Dirent.walk(). Maximum filename length is filesystem-dependent, and obtained via statfs::f_namelen. This limit is usually 255 bytes (NAME_MAX), but not always. For example, VFAT supports filenames of up to 255... UCS-2 characters, which Linux conservatively takes to mean UTF-8-encoded bytes: fs/fat/inode.c:fat_statfs(), FAT_LFN_LEN * NLS_MAX_CHARSET_SIZE. As a result, Linux's VFS does not enforce NAME_MAX: $ rg --maxdepth=1 '\WNAME_MAX\W' fs/ include/linux/ fs/libfs.c 38: buf->f_namelen = NAME_MAX; 64: if (dentry->d_name.len > NAME_MAX) include/linux/relay.h 74: char base_filename[NAME_MAX]; /* saved base filename */ include/linux/fscrypt.h 149: filenames up to NAME_MAX bytes, since base64 encoding expands the length. include/linux/exportfs.h 176: * understanding that it is already pointing to a a %NAME_MAX+1 sized Remove this check from core VFS, and add it to ramfs (and by extension tmpfs), where it is actually applicable: mm/shmem.c:shmem_dir_inode_operations.lookup == simple_lookup does enforce NAME_MAX. PiperOrigin-RevId: 245324748 Change-Id: I17567c4324bfd60e31746a5270096e75db963fac	2019-04-25 16:05:13 -07:00
Bhasker Hariharan	56cadcac4e	Fixes to PacketMMap dispatcher. This CL fixes the following bugs: - Uses atomic to set/read status instead of binary.LittleEndian.PutUint32 etc which are not atomic. - Increments ringOffsets for frames that are truncated (i.e., status is tpStatusCopy) - Does not ignore frames with tpStatusLost bit set as they are valid frames and only indicate that some frames were lost before this one and metrics can be retrieved with a getsockopt call. - Adds checks to make sure blockSize is a multiple of page size. This is required as the kernel allocates in pages per block and rejects sizes that are not page aligned with an EINVAL. Updates #210 PiperOrigin-RevId: 244959464 Change-Id: I5d61337b7e4c0f8a3063dcfc07791d4c4521ba1f	2019-04-23 17:47:56 -07:00
Fabricio Voznika	db334f7154	Remove reflection from 9P serving path p9.messageByType was taking 7% of p9.recv before, spending time with reflection and map lookup. Now it's reduced to 1%. PiperOrigin-RevId: 244947313 Change-Id: I42813f920557b7656f8b29157eb32acd79e11fa5	2019-04-23 16:26:10 -07:00
Fabricio Voznika	908edee04f	Replace os.File with fd.FD in fsgofer os.NewFile() accounts for 38% of CPU time in localFile.Walk(). This change switches to fd.FD, which is much cheaper to create. Now, fd.New() in localFile.Walk() accounts for only 4%. PiperOrigin-RevId: 244944983 Change-Id: Ic892df96cf2633e78ad379227a213cb93ee0ca46	2019-04-23 16:10:54 -07:00
Wei Zhang	17ff6063a3	Bugfix: fix fstatat symbolic link to dir For a symbolic link to a directory, e.g. `/tmp/symlink -> /tmp/dir`, `fstatat("/tmp/symlink")` should return the symbolic link's data, but `fstatat("/tmp/symlink/")` (symlink with a trailing slash) should return the data of the directory it points to, following Linux behaviour. Currently, fstatat() on a symlink with a trailing slash fails with a "not a directory" error, which is wrong. Signed-off-by: Wei Zhang <zhangwei198900@gmail.com> Change-Id: I63469b1fb89d083d1c1255d32d52864606fbd7e2 PiperOrigin-RevId: 244783916	2019-04-22 20:07:06 -07:00
Michael Pratt	d6aac9387f	Fix doc typo PiperOrigin-RevId: 244773890 Change-Id: I2d0cd7789771276ba545b38efff6d3e24133baaa	2019-04-22 18:22:19 -07:00
Michael Pratt	f86c35a51f	Clean up state error handling PiperOrigin-RevId: 244773836 Change-Id: I32223f79d2314fe1ac4ddfc63004fc22ff634adf	2019-04-22 18:20:51 -07:00
Ben Burkert	56927e5317	tcpip/transport/tcp: read side only shutdown of an endpoint Support shutdown on only the read side of an endpoint. Reads performed after a call to Shutdown with only the ShutdownRead flag will return ErrClosedForReceive without data. Break out the shutdown(2) with SHUT_RD syscall test into two tests. The first tests that no packets are sent when shutting down the read side of a socket. The second tests that, after shutting down the read side of a socket, unread data can still be read, or an EOF if there is no more data to read. Change-Id: I9d7c0a06937909cbb466b7591544a4bcaebb11ce PiperOrigin-RevId: 244459430	2019-04-19 19:29:05 -07:00
Ian Gudger	358eb52a76	Add support for the MSG_TRUNC msghdr flag. The MSG_TRUNC flag is set in the msghdr when a message is truncated. Fixes google/gvisor#200 PiperOrigin-RevId: 244440486 Change-Id: I03c7d5e7f5935c0c6b8d69b012db1780ac5b8456	2019-04-19 16:17:01 -07:00
Ben Burkert	cec2cdc12f	tcpip/transport/udp: add Forwarder type Add a UDP forwarder for intercepting and forwarding UDP sessions. Change-Id: I2d83c900c1931adfc59a532dd4f6b33a0db406c9 PiperOrigin-RevId: 244293576	2019-04-18 17:49:57 -07:00
Michael Pratt	c931c8e082	Format struct pollfd in poll(2)/ppoll(2) I0410 15:40:38.854295 3776 x:0] [ 1] poll_test E poll(0x2b00bfb5c020 [{FD: 0x3 anon_inode:[eventfd], Events: POLLOUT, REvents: ...}], 0x1, 0x1) I0410 15:40:38.854348 3776 x:0] [ 1] poll_test X poll(0x2b00bfb5c020 [{FD: 0x3 anon_inode:[eventfd], Events: POLLOUT|POLLERR|POLLHUP, REvents: POLLOUT}], 0x1, 0x1) = 0x1 (10.765µs) PiperOrigin-RevId: 244269879 Change-Id: If07ba54a486fdeaaedfc0123769b78d1da862307	2019-04-18 15:24:07 -07:00
Ian Gudger	133700007a	Only emit unimplemented syscall events for unsupported values. Only emit unimplemented syscall events for setting SO_OOBINLINE and SO_LINGER when attempting to set unsupported values. PiperOrigin-RevId: 244229675 Change-Id: Icc4562af8f733dd75a90404621711f01a32a9fc1	2019-04-18 11:51:41 -07:00
Andrei Vagin	4524790ff6	netstack: use a proper network protocol to set gso.L3HdrLen It is possible to create a listening socket which will accept IPv4 and IPv6 connections. In this case, we set IPv6ProtocolNumber for all accepted endpoints, even if they handle IPv4 connections. This means that we can't use endpoint.netProto to set gso.L3HdrLen. PiperOrigin-RevId: 244227948 Change-Id: I5e1863596cb9f3d216febacdb7dc75651882eef1	2019-04-18 11:42:23 -07:00
Michael Pratt	b52cbd6028	Don't allow sigtimedwait to catch unblockable signals The existing logic attempting to do this is incorrect. Unary ^ has higher precedence than &^, so mask always has UnblockableSignals cleared, allowing dequeueSignalLocked to dequeue unblockable signals (which allows userspace to ignore them). Switch the logic so that unblockable signals are always masked. PiperOrigin-RevId: 244058487 Change-Id: Ib19630ac04068a1fbfb9dc4a8eab1ccbdb21edc3	2019-04-17 13:43:20 -07:00
Fabricio Voznika	c8cee7108f	Use FD limit and file size limit from host FD limit and file size limit are read from the host, instead of using hard-coded defaults, given that they affect the sandbox process. Also limit the direct cache to use no more than half of the available FDs. PiperOrigin-RevId: 244050323 Change-Id: I787ad0fdf07c49d589e51aebfeae477324fe26e6	2019-04-17 12:57:40 -07:00
Michael Pratt	08d99c5fbe	Convert poll/select to operate more directly on linux.PollFD Currently, doPoll copies the user struct pollfd array into a []syscalls.PollFD, which contains internal kdefs.FD and waiter.EventMask types. While these are currently binary-compatible with the Linux versions, we generally discourage copying directly to internal types (someone may inadvertently change kdefs.FD to uint64). Instead, copy directly to a []linux.PollFD, which will certainly be binary compatible. Most of syscalls/polling.go is included directly into syscalls/linux/sys_poll.go, as it can then operate directly on linux.PollFD. The additional syscalls.PollFD type is providing little value. I've also added explicit conversion functions for waiter.EventMask, which creates the possibility of a different binary format. PiperOrigin-RevId: 244042947 Change-Id: I24e5b642002a32b3afb95a9dcb80d4acd1288abf	2019-04-17 12:15:01 -07:00
Googler	e091b4e7c0	Internal change. PiperOrigin-RevId: 244036529 Change-Id: I280f9632a65d2e40d844e0d5ec3a101d808434ee	2019-04-17 11:40:11 -07:00
Fabricio Voznika	9f8c89fc7f	Return error from fdbased.New RELNOTES: n/a PiperOrigin-RevId: 244031742 Change-Id: Id0cdb73194018fb5979e67b58510ead19b5a2b81	2019-04-17 11:16:35 -07:00
Michael Pratt	6b24f7ab08	Format FDs in strace logs Normal files display their path in the current mount namespace: I0410 10:57:54.964196 216336 x:0] [ 1] ls X read(0x3 /proc/filesystems, 0x55cee3bdb2c0 "nodev\t9p\nnodev\tdevpts \nnodev\tdevtmpfs\nnodev\tproc\nnodev\tramdiskfs\nnodev\tsysfs\nnodev\ttmpfs\n", 0x1000) = 0x58 (24.462µs) AT_FDCWD includes the CWD: I0411 12:58:48.278427 1526 x:0] [ 1] stat_test E newfstatat(AT_FDCWD /home/prattmic, 0x55ea719b564e /proc/self, 0x7ef5cefc2be8, 0x0) Sockets (and other non-vfs files) display an inode number (like /proc/PID/fd): I0410 10:54:38.909123 207684 x:0] [ 1] nc E bind(0x3 socket:[1], 0x55b5a1652040 {Family: AF_INET, Addr: , Port: 8080}, 0x10) I also fixed a few syscall args that should be Path. PiperOrigin-RevId: 243169025 Change-Id: Ic7dda6a82ae27062fe2a4a371557acfd6a21fa2a	2019-04-11 16:48:39 -07:00
Jamie Liu	4209edafb6	Use open fids when fstat()ing gofer files. PiperOrigin-RevId: 243018347 Change-Id: I1e5b80607c1df0747482abea61db7fcf24536d37	2019-04-11 00:43:04 -07:00
Michael Pratt	cc48969bb7	Internal change PiperOrigin-RevId: 242978508 Change-Id: I0ea59ac5ba1dd499e87c53f2e24709371048679b	2019-04-10 18:00:18 -07:00
Nicolas Lacasse	d93d19fd4e	Fix uses of RootFromContext. RootFromContext can return a dirent with reference taken, or nil. We must call DecRef if (and only if) a real dirent is returned. PiperOrigin-RevId: 242965515 Change-Id: Ie2b7b4cb19ee09b6ccf788b71f3fd7efcdf35a11	2019-04-10 16:36:28 -07:00
Yong He	89cc8eef9b	DATA RACE in fs.(*Dirent).fullName add renameMu.Lock when oldParent == newParent in order to avoid data race in following report: WARNING: DATA RACE Read at 0x00c000ba2160 by goroutine 405: gvisor.googlesource.com/gvisor/pkg/sentry/fs.(*Dirent).fullName() pkg/sentry/fs/dirent.go:246 +0x6c gvisor.googlesource.com/gvisor/pkg/sentry/fs.(*Dirent).FullName() pkg/sentry/fs/dirent.go:356 +0x8b gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(*FDMap).String() pkg/sentry/kernel/fd_map.go:135 +0x1e0 fmt.(*pp).handleMethods() GOROOT/src/fmt/print.go:603 +0x404 fmt.(*pp).printArg() GOROOT/src/fmt/print.go:686 +0x255 fmt.(*pp).doPrintf() GOROOT/src/fmt/print.go:1003 +0x33f fmt.Fprintf() GOROOT/src/fmt/print.go:188 +0x7f gvisor.googlesource.com/gvisor/pkg/log.(*Writer).Emit() pkg/log/log.go:121 +0x89 gvisor.googlesource.com/gvisor/pkg/log.GoogleEmitter.Emit() pkg/log/glog.go:162 +0x1acc gvisor.googlesource.com/gvisor/pkg/log.(*GoogleEmitter).Emit() <autogenerated>:1 +0xe1 gvisor.googlesource.com/gvisor/pkg/log.(*BasicLogger).Debugf() pkg/log/log.go:177 +0x111 gvisor.googlesource.com/gvisor/pkg/log.Debugf() pkg/log/log.go:235 +0x66 gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(*Task).Debugf() pkg/sentry/kernel/task_log.go:48 +0xfe gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(*Task).DebugDumpState() pkg/sentry/kernel/task_log.go:66 +0x11f gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(*runApp).execute() pkg/sentry/kernel/task_run.go:272 +0xc80 gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(*Task).run() pkg/sentry/kernel/task_run.go:91 +0x24b Previous write at 0x00c000ba2160 by goroutine 423: gvisor.googlesource.com/gvisor/pkg/sentry/fs.Rename() pkg/sentry/fs/dirent.go:1628 +0x61f gvisor.googlesource.com/gvisor/pkg/sentry/syscalls/linux.renameAt.func1.1() pkg/sentry/syscalls/linux/sys_file.go:1864 +0x1f8 gvisor.googlesource.com/gvisor/pkg/sentry/syscalls/linux.fileOpAt( gvisor.googlesource.com/g/linux/sys_file.go:51 +0x20f gvisor.googlesource.com/gvisor/pkg/sentry/syscalls/linux.renameAt.func1() pkg/sentry/syscalls/linux/sys_file.go:1852 +0x218 gvisor.googlesource.com/gvisor/pkg/sentry/syscalls/linux.fileOpAt() pkg/sentry/syscalls/linux/sys_file.go:51 +0x20f gvisor.googlesource.com/gvisor/pkg/sentry/syscalls/linux.renameAt() pkg/sentry/syscalls/linux/sys_file.go:1840 +0x180 gvisor.googlesource.com/gvisor/pkg/sentry/syscalls/linux.Rename() pkg/sentry/syscalls/linux/sys_file.go:1873 +0x60 gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(*Task).executeSyscall() pkg/sentry/kernel/task_syscall.go:165 +0x17a gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(*Task).doSyscallInvoke() pkg/sentry/kernel/task_syscall.go:283 +0xb4 gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(*Task).doSyscallEnter() pkg/sentry/kernel/task_syscall.go:244 +0x10c gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(*Task).doSyscall() pkg/sentry/kernel/task_syscall.go:219 +0x1e3 gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(*runApp).execute() pkg/sentry/kernel/task_run.go:215 +0x15a9 gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(*Task).run() pkg/sentry/kernel/task_run.go:91 +0x24b Reported-by: syzbot+e1babbf756fab380dfff@syzkaller.appspotmail.com Change-Id: Icd2620bb3ea28b817bf0672d454a22b9d8ee189a PiperOrigin-RevId: 242938741	2019-04-10 14:17:33 -07:00
Kevin Krakauer	f7aff0aaa4	Allow threads with CAP_SYS_RESOURCE to raise hard rlimits. PiperOrigin-RevId: 242919489 Change-Id: Ie3267b3bcd8a54b54bc16a6556369a19e843376f	2019-04-10 12:36:45 -07:00
Nicolas Lacasse	0a0619216e	Start saving MountSource.DirentCache. DirentCache is already a savable type, and it ensures that it is empty at the point of Save. There is no reason not to save it along with the MountSource. This did uncover an issue where not all MountSources were properly flushed before Save. If a mount point has an open file and is then unmounted, we save the MountSource without flushing it first. This CL also fixes that by flushing all MountSources for all open FDs on Save. PiperOrigin-RevId: 242906637 Change-Id: I3acd9d52b6ce6b8c989f835a408016cb3e67018f	2019-04-10 11:27:16 -07:00
Shiva Prasanth	7140b1fdca	Fixed /proc/cpuinfo permissions This also applies these permissions to other static proc files. Change-Id: I4167e585fed49ad271aa4e1f1260babb3239a73d PiperOrigin-RevId: 242898575	2019-04-10 10:49:43 -07:00
Li Qiang	b3b140ea4f	syscalls: sendfile: limit the count to MAX_RW_COUNT According to the sendfile spec and the Linux kernel code, we should limit the count arg to 'MAX_RW_COUNT'. This patch exports 'MAX_RW_COUNT' in the kernel pkg and uses it in the implementation of the sendfile syscall. Signed-off-by: Li Qiang <pangpei.lq@antfin.com> Change-Id: I1086fec0685587116984555abd22b07ac233fbd2 PiperOrigin-RevId: 242745831	2019-04-09 14:57:05 -07:00
Bhasker Hariharan	eaac2806ff	Add TCP checksum verification. PiperOrigin-RevId: 242704699 Change-Id: I87db368ca343b3b4bf4f969b17d3aa4ce2f8bd4f	2019-04-09 11:23:47 -07:00
Tamir Duberstein	cf4ed408c3	Use (*testing.T).Helper to clean up test failures PiperOrigin-RevId: 242647530 Change-Id: I1bf9ac1d664f452dc47ca670d408a73538cb482f	2019-04-09 05:17:32 -07:00
Jamie Liu	9471c01348	Export kernel.SignalInfoPriv. Also add kernel.SignalInfoNoInfo, and use it in RLIMIT_FSIZE checks. PiperOrigin-RevId: 242562428 Change-Id: I4887c0e1c8f5fddcabfe6d4281bf76d2f2eafe90	2019-04-08 16:32:11 -07:00
Nicolas Lacasse	70906f1d24	Intermediate ram fs dirs should be writable. We construct a ramfs tree of "scaffolding" directories for all mount points, so that a directory exists that each mount point can be mounted over. We were creating these directories without write permissions, which meant that they were not writable even when underlaid under a writable filesystem. They should be writable. PiperOrigin-RevId: 242507789 Change-Id: I86645e35417560d862442ff5962da211dbe9b731	2019-04-08 11:56:38 -07:00
Nicolas Lacasse	ee7e6d33b2	Use string type for extended attribute values, instead of []byte. Strings are a better fit for this usage because they are immutable in Go, and can contain arbitrary bytes. It also allows us to avoid casting bytes to string (and the associated allocation) in the hot path when checking for overlay whiteouts. PiperOrigin-RevId: 242208856 Change-Id: I7699ae6302492eca71787dd0b72e0a5a217a3db2	2019-04-05 15:49:39 -07:00
Michael Pratt	252f877f3d	Set fixed field in CPUID function 2 From the SDM: "The least-significant byte in register EAX (register AL) will always return 01H. Software should ignore this value and not interpret it as an informational descriptor." Unfortunately, online docs [1] [2] (likely based on an old version of the SDM) say: "The least-significant byte in register EAX (register AL) indicates the number of times the CPUID instruction must be executed with an input value of 2 to get a complete description of the processor's caches and TLBs." dlang uses this second interpretation [3] and will loop 2^32 times if we return zero. Fix this by specifying the fixed value of one. We still don't support exposing the actual cache information, leaving all other bytes empty. A zero byte means: "Null descriptor, this byte contains no information." [1] http://www.sandpile.org/x86/cpuid.htm#level_0000_0002h [2] https://c9x.me/x86/html/file_module_x86_id_45.html [3] `424640864c/src/core/cpuid.d (L533-L534)` PiperOrigin-RevId: 242046629 Change-Id: Ic0f0a5f974b20f71391cb85645bdcd4003e5fe88	2019-04-04 18:01:56 -07:00
Andrei Vagin	88409e983c	gvisor: Add support for the MS_NOEXEC mount option https://github.com/google/gvisor/issues/145 PiperOrigin-RevId: 242044115 Change-Id: I8f140fe05e32ecd438b6be218e224e4b7fe05878	2019-04-04 17:43:53 -07:00
Michael Pratt	75a5ccf5d9	Remove defer from trivial ThreadID methods In particular, ns.IDOfTask and tg.ID are used for gettid and getpid, respectively, where removing defer saves ~100ns. This may be a small improvement to application logging, which may call gettid/getpid frequently. PiperOrigin-RevId: 242039616 Change-Id: I860beb62db3fe077519835e6bafa7c74cba6ca80	2019-04-04 17:14:27 -07:00
Adin Scannell	75c8ac38e0	BUILD: Add useful go_path target Change-Id: Ibd6d8a1a63826af6e62a0f0669f8f0866c8091b4 PiperOrigin-RevId: 242037969	2019-04-04 17:05:38 -07:00
Googler	efe4461d74	Internal change. PiperOrigin-RevId: 241867632 Change-Id: I29459f2758ac4835882b491ff25c6aca9a37d41d	2019-04-03 22:02:51 -07:00
Michael Pratt	9cf33960fc	Only CopyOut CPU when it changes This will save copies when preemption is not caused by a CPU migration. PiperOrigin-RevId: 241844399 Change-Id: I2ba3b64aa377846ab763425bd59b61158f576851	2019-04-03 18:06:36 -07:00
Nicolas Lacasse	61d8c361c6	Don't release d.mu in checks for child-existence. Dirent.exists() is called in Create to check whether a child with the given name already exists. Dirent.exists() calls walk(), and before this CL allowed walk() to drop d.mu while calling d.Inode.Lookup. During this existence check, a racing Rename() can acquire d.mu and create a new child of the dirent with the same name. (Note that the source and destination of the rename must be in the same directory, otherwise renameMu will be taken preventing the race.) In this case, d.exists() can return false, even though a child with the same name actually does exist. This CL changes d.exists() so that it does not release d.mu while walking, thus preventing the race with Rename. It also adds comments noting that lockForRename may not take renameMu if the source and destination are in the same directory, as this is a bit surprising (at least it was to me). PiperOrigin-RevId: 241842579 Change-Id: I56524870e39dfcd18cab82054eb3088846c34813	2019-04-03 17:53:56 -07:00
Michael Pratt	4968dd1341	Cache ThreadGroups in PIDNamespace If there are thousands of threads, ThreadGroupsAppend becomes very expensive as it must iterate over all Tasks to find the ThreadGroup leaders. Reduce the cost by maintaining a map of ThreadGroups which can be used to grab them all directly. The one somewhat visible change is to convert PID namespace init children zapping to a group-directed SIGKILL, as Linux did in 82058d668465 "signal: Use group_send_sig_info to kill all processes in a pid namespace". In a benchmark that creates N threads which sleep for two minutes, we see approximately this much CPU time in ThreadGroupsAppend: Before: 1 thread: 0ms 1024 threads: 30ms - 9130ms 4096 threads: 50ms - 2000ms 8192 threads: 18160ms 16384 threads: 17210ms After: 1 thread: 0ms 1024 threads: 0ms 4096 threads: 0ms 8192 threads: 0ms 16384 threads: 0ms The profiling is actually extremely noisy (likely due to cache effects), as some runs show almost no samples at 1024, 4096 threads, but obviously this does not scale to lots of threads. PiperOrigin-RevId: 241828039 Change-Id: I17827c90045df4b3c49b3174f3a05bca3026a72c	2019-04-03 16:22:43 -07:00
Kevin Krakauer	82529becae	Fix index out of bounds in tty implementation. The previous implementation revolved around runes instead of bytes, which caused weird behavior when converting between the two. For example, peekRune would read the byte 0xff from a buffer, convert it to a rune, then return it. As rune is an alias of int32, 0xff was 0-padded to int32(255), which is the hex code point for ÿ. However, peekRune also returned the length of the byte (1). When calling utf8.EncodeRune, we only allocated 1 byte, but tried to write the 2-byte character ÿ. tl;dr: I apparently didn't understand runes when I wrote this. PiperOrigin-RevId: 241789081 Change-Id: I14c788af4d9754973137801500ef6af7ab8a8727	2019-04-03 13:00:34 -07:00
Kevin Krakauer	c79e81bd27	Addresses data race in tty implementation. Also makes the safemem reading and writing inline, as it makes it easier to see what locks are held. PiperOrigin-RevId: 241775201 Change-Id: Ib1072f246773ef2d08b5b9a042eb7e9e0284175c	2019-04-03 11:49:55 -07:00
Ian Lewis	77f01ee3c7	Add syscall annotations for unimplemented syscalls Added syscall annotations for unimplemented syscalls for later generation into reference docs. Annotations are of the form: @Syscall(<name>, <key:value>, ...) Supported args and values are: - arg: A syscall option. This entry only applies to the syscall when given this option. - support: Indicates support level - UNIMPLEMENTED: Unimplemented (implies returns:ENOSYS) - PARTIAL: Partial support. Details should be provided in note. - FULL: Full support - returns: Indicates a known return value. Values are syscall errors. This is treated as a string so you can use something like "returns:EPERM or ENOSYS". - issue: A Github issue number. - note: A note Example: // @Syscall(mmap, arg:MAP_PRIVATE, support:FULL, note:Private memory fully supported) // @Syscall(mmap, arg:MAP_SHARED, support:UNIMPLEMENTED, issue:123, note:Shared memory not supported) // @Syscall(setxattr, returns:ENOTSUP, note:Requires file system support) Annotations should be placed as close to their implementation as possible (preferably as part of a supporting function's Godoc) and should be updated as syscall support changes. PiperOrigin-RevId: 241697482 Change-Id: I7a846135db124e1271dc5057d788cba82ca312d4	2019-04-03 03:10:23 -07:00
Jamie Liu	c4caccd540	Set options on the correct Task in PTRACE_SEIZE. $ docker run --rm --runtime=runsc -it --cap-add=SYS_PTRACE debian bash -c "apt-get update && apt-get install strace && strace ls" ... Setting up strace (4.15-2) ... execve("/bin/ls", ["ls"], [/* 6 vars */]) = 0 brk(NULL) = 0x5646d8c1e000 uname({sysname="Linux", nodename="114ef93d2db3", ...}) = 0 ... PiperOrigin-RevId: 241643321 Change-Id: Ie4bce27a7fb147eef07bbae5895c6ef3f529e177	2019-04-02 18:13:19 -07:00
Nicolas Lacasse	1776ab28f0	Add test that symlinking over a directory returns EEXIST. Also remove comments in InodeOperations that required that implementation of some Create* operations ensure that the name does not already exist, since these checks are all centralized in the Dirent. PiperOrigin-RevId: 241637335 Change-Id: Id098dc6063ff7c38347af29d1369075ad1e89a58	2019-04-02 17:28:36 -07:00
Rahat Mahmood	d14a7de658	Fix more data races in shm debug messages. PiperOrigin-RevId: 241630409 Change-Id: Ie0df5f5a2f20c2d32e615f16e2ba43c88f963181	2019-04-02 16:46:32 -07:00
Wei Zhang	1fcd40719d	device: fix device major/minor gVisor currently doesn't give devices the correct major and minor numbers. When testing Go support on gVisor, I ran the test case below: ``` $ docker run -ti --runtime runsc golang:1.12.1 bash -c "cd /usr/local/go/src && ./run.bash " ``` It reports some errors, one of which is: "--- FAIL: TestDevices (0.00s) --- FAIL: TestDevices//dev/null_1:3 (0.00s) dev_linux_test.go:45: for /dev/null Major(0x0) == 0, want 1 dev_linux_test.go:48: for /dev/null Minor(0x0) == 0, want 3 dev_linux_test.go:51: for /dev/null Mkdev(1, 3) == 0x103, want 0x0 --- FAIL: TestDevices//dev/zero_1:5 (0.00s) dev_linux_test.go:45: for /dev/zero Major(0x0) == 0, want 1 dev_linux_test.go:48: for /dev/zero Minor(0x0) == 0, want 5 dev_linux_test.go:51: for /dev/zero Mkdev(1, 5) == 0x105, want 0x0 --- FAIL: TestDevices//dev/random_1:8 (0.00s) dev_linux_test.go:45: for /dev/random Major(0x0) == 0, want 1 dev_linux_test.go:48: for /dev/random Minor(0x0) == 0, want 8 dev_linux_test.go:51: for /dev/random Mkdev(1, 8) == 0x108, want 0x0 --- FAIL: TestDevices//dev/full_1:7 (0.00s) dev_linux_test.go:45: for /dev/full Major(0x0) == 0, want 1 dev_linux_test.go:48: for /dev/full Minor(0x0) == 0, want 7 dev_linux_test.go:51: for /dev/full Mkdev(1, 7) == 0x107, want 0x0 --- FAIL: TestDevices//dev/urandom_1:9 (0.00s) dev_linux_test.go:45: for /dev/urandom Major(0x0) == 0, want 1 dev_linux_test.go:48: for /dev/urandom Minor(0x0) == 0, want 9 dev_linux_test.go:51: for /dev/urandom Mkdev(1, 9) == 0x109, want 0x0 " So I think we should assign them the correct major/minor numbers, following the Linux spec. Signed-off-by: Wei Zhang <zhangwei198900@gmail.com> Change-Id: I4521ee7884b4e214fd3a261929e3b6dac537ada9 PiperOrigin-RevId: 241609021	2019-04-02 14:51:07 -07:00
Kevin Krakauer	52a51a8e20	Add a raw socket transport endpoint and use it for raw ICMP sockets. Having raw socket code together will make it easier to add support for other raw network protocols. Currently, only ICMP uses the raw endpoint. However, adding support for other protocols such as UDP shouldn't be much more difficult than adding a few switch cases. PiperOrigin-RevId: 241564875 Change-Id: I77e03adafe4ce0fd29ba2d5dfdc547d2ae8f25bf	2019-04-02 11:13:49 -07:00
Rahat Mahmood	7cff746ef2	Save/restore simple devices. We weren't saving simple devices' last allocated inode numbers, which caused inode number reuse across S/R. PiperOrigin-RevId: 241414245 Change-Id: I964289978841ef0a57d2fa48daf8eab7633c1284	2019-04-01 15:39:16 -07:00
Jamie Liu	b4006686d2	Don't expand COW-break on executable VMAs. PiperOrigin-RevId: 241403847 Change-Id: I4631ca05734142da6e80cdfa1a1d63ed68aa05cc	2019-04-01 14:47:31 -07:00
Andrei Vagin	a4b34e2637	gvisor: convert ilist to ilist:generic_list ilist:generic_list works faster (cl/240185278) and the code looks cleaner without type casting. PiperOrigin-RevId: 241381175 Change-Id: I8487ab1d73637b3e9733c253c56dce9e79f0d35f	2019-04-01 12:53:27 -07:00
Jamie Liu	26e8d9981f	Use kernel.Task.CopyScratchBuffer in syscalls/linux where possible. PiperOrigin-RevId: 241072126 Change-Id: Ib4d9f58f550732ac4c5153d3cf159a5b1a9749da	2019-03-29 16:25:33 -07:00
Nicolas Lacasse	e8fef3d873	Treat fsync errors during save as SaveRejection errors. PiperOrigin-RevId: 241055485 Change-Id: I70259e9fef59bdf9733b35a2cd3319359449dd45	2019-03-29 14:48:16 -07:00
Michael Pratt	d11ef20a93	Drop reference on shared anon mappable We call NewSharedAnonMappable simply to use it for Mappable/MappingIdentity for shared anon mmap. From MMapOpts.MappingIdentity: "If MMapOpts is used to successfully create a memory mapping, a reference is taken on MappingIdentity." mm.createVMALocked (below) takes this additional reference, so we don't need the reference returned by NewSharedAnonMappable. Holding it leaks the mappable. PiperOrigin-RevId: 241038108 Change-Id: I78ee3af78e0cc7aac4063b274b30d0e41eb5677d	2019-03-29 13:17:56 -07:00
Jamie Liu	69afd0438e	Return srclen in proc.idMapFileOperations.Write. PiperOrigin-RevId: 241037926 Change-Id: I4b0381ac1c7575e8b861291b068d3da22bc03850	2019-03-29 13:16:46 -07:00
Nicolas Lacasse	ed23f54709	Treat ENOSPC as a state-file error during save. PiperOrigin-RevId: 241028806 Change-Id: I770bf751a2740869a93c3ab50370a727ae580470	2019-03-29 12:26:25 -07:00
Bhasker Hariharan	45c54b1f4e	Fix incorrect checksums in TCP and UDP tests. PiperOrigin-RevId: 241025361 Change-Id: I292e7aea9a4b294b11e4f736e107010d9524586b	2019-03-29 12:05:43 -07:00
Bhasker Hariharan	cc0e96a4bd	Fix Panic in SACKScoreboard.Delete. The panic was caused by modifying the tree while iterating which invalidated the iterator. Also fixes another bug in SACKScoreboard.Insert() which was causing blocks to be merged incorrectly. PiperOrigin-RevId: 240895053 Change-Id: Ia72b8244297962df5c04283346da5226434740af	2019-03-28 18:18:39 -07:00
chris.zn	31c2236e97	set task's name when fork When forking a child process, the name field of TaskContext is not set. As a result, when we cat /proc/{pid}/status, the name field is empty. Like this: Name: State: S (sleeping) Tgid: 28 Pid: 28 PPid: 26 TracerPid: 0 FDSize: 8 VmSize: 89712 kB VmRSS: 6648 kB Threads: 1 CapInh: 00000000a93d35fb CapPrm: 0000000000000000 CapEff: 0000000000000000 CapBnd: 00000000a93d35fb Seccomp: 0 Change-Id: I5d469098c37cedd19da16b7ffab2e546a28a321e PiperOrigin-RevId: 240893304	2019-03-28 18:05:42 -07:00
Nicolas Lacasse	99195b0e16	Setting timestamps should trigger an inotify event. PiperOrigin-RevId: 240850187 Change-Id: I1458581b771a1031e47bba439e480829794927b8	2019-03-28 14:15:23 -07:00
Bert Muthalaly	f2e5dcf21c	Add ICMP stats PiperOrigin-RevId: 240848882 Change-Id: I23dd4599f073263437aeab357c3f767e1a432b82	2019-03-28 14:09:20 -07:00
Googler	e373d3642e	Internal change. PiperOrigin-RevId: 240842801 Change-Id: Ibbd6f849f9613edc1b1dd7a99a97d1ecdb6e9188	2019-03-28 13:43:47 -07:00
Jamie Liu	f005350c93	Clean up gofer handle caching. - Document fsutil.CachedFileObject.FD() requirements on access permissions, and change gofer.inodeFileState.FD() to honor them. Fixes #147. - Combine gofer.inodeFileState.readonly and gofer.inodeFileState.readthrough, and simplify handle caching logic. - Inline gofer.cachePolicy.cacheHandles into gofer.inodeFileState.setSharedHandles, because users with access to gofer.inodeFileState don't necessarily have access to the fs.Inode (predictably, this is a save/restore problem). Before this CL: $ docker run --runtime=runsc-d -v $(pwd)/gvisor/repro:/root/repro -it ubuntu bash root@34d51017ed67:/# /root/repro/runsc-b147 mmap: 0x7f3c01e45000 Segmentation fault After this CL: $ docker run --runtime=runsc-d -v $(pwd)/gvisor/repro:/root/repro -it ubuntu bash root@d3c3cb56bbf9:/# /root/repro/runsc-b147 mmap: 0x7f78987ec000 o PiperOrigin-RevId: 240818413 Change-Id: I49e1d4a81a0cb9177832b0a9f31a10da722a896b	2019-03-28 11:43:51 -07:00
Andrei Vagin	f4105ac21a	netstack/fdbased: add generic segmentation offload (GSO) support The linux packet socket can handle GSO packets, so we can segment packets to 64K instead of the MTU which is usually 1500. Here are numbers for the nginx-1m test: runsc: 579330.01 [Kbytes/sec] received runsc-gso: 1794121.66 [Kbytes/sec] received runc: 2122139.06 [Kbytes/sec] received and for tcp_benchmark: $ tcp_benchmark --duration 15 --ideal [ 4] 0.0-15.0 sec 86647 MBytes 48456 Mbits/sec $ tcp_benchmark --client --duration 15 --ideal [ 4] 0.0-15.0 sec 2173 MBytes 1214 Mbits/sec $ tcp_benchmark --client --duration 15 --ideal --gso 65536 [ 4] 0.0-15.0 sec 19357 MBytes 10825 Mbits/sec PiperOrigin-RevId: 240809103 Change-Id: I2637f104db28b5d4c64e1e766c610162a195775a	2019-03-28 11:03:41 -07:00
Nicolas Lacasse	9c18897887	Add rsslim field in /proc/pid/stat. PiperOrigin-RevId: 240681675 Change-Id: Ib214106e303669fca2d5c744ed5c18e835775161	2019-03-27 17:44:38 -07:00
Tamir Duberstein	8406504817	Avoid mutating memory passed to DeliverTransportPacket PiperOrigin-RevId: 240642903 Change-Id: I16625015123a827d267d60b328a202057264bbd6	2019-03-27 14:36:57 -07:00
Nicolas Lacasse	2d355f0e8f	Add start time to /proc/<pid>/stat. The start time is the number of clock ticks between the boot time and application start time. PiperOrigin-RevId: 240619475 Change-Id: Ic8bd7a73e36627ed563988864b0c551c052492a5	2019-03-27 12:41:27 -07:00
Nicolas Lacasse	645af7cdd8	Dev device methods should take pointer receiver. PiperOrigin-RevId: 240600504 Change-Id: I7dd5f27c8da31f24b68b48acdf8f1c19dbd0c32d	2019-03-27 11:08:50 -07:00
Jamie Liu	26583e413e	Convert []byte to string without copying in usermem.CopyStringIn. This is the same technique used by Go's strings.Builder (https://golang.org/src/strings/builder.go#L45), and for the same reason. (We can't just use strings.Builder because there's no way to get the underlying []byte to pass to usermem.IO.CopyIn.) PiperOrigin-RevId: 240594892 Change-Id: Ic070e7e480aee53a71289c7c120850991358c52c	2019-03-27 10:46:28 -07:00
Tamir Duberstein	9c20a88bd7	Remove polling from ICMP test PiperOrigin-RevId: 240483396 Change-Id: Ie75d3ae38af83f1d92f167ff9ba58fa10f5b372b	2019-03-26 20:20:52 -07:00
Michael Pratt	e9152d4a62	Automated rollback of changelist 234892473 PiperOrigin-RevId: 240462667 Change-Id: I3d1c5c0d80a3badced963ae1d450c20ed8a767ed	2019-03-26 17:27:48 -07:00
Andrei Vagin	654e878abb	netstack: Don't exclude length when a pseudo-header checksum is calculated This is a preparation for GSO changes (cl/234508902). RELNOTES[gofers]: Refactor checksum code to include length, which it already did, but in a convoluted way. Should be a no-op. PiperOrigin-RevId: 240460794 Change-Id: I537381bc670b5a9f5d70a87aa3eb7252e8f5ace2	2019-03-26 17:15:13 -07:00
Rahat Mahmood	06ec97a3f8	Implement memfd_create. Memfds are simply anonymous tmpfs files with no associated mounts. Also implementing file seals, which Linux only implements for memfds at the moment. PiperOrigin-RevId: 240450031 Change-Id: I31de78b950101ae8d7a13d0e93fe52d98ea06f2f	2019-03-26 16:16:57 -07:00
Tamir Duberstein	9cd2b66f10	Remove echoReplier Mirror the ICMPv6 echo implementation in ICMPv4 echo. This removes unnecessary asynchrony, reduces copying, and reduces complexity. PiperOrigin-RevId: 240394525 Change-Id: If8f53254154f86772f5e51159765aa23b3b328b8	2019-03-26 11:45:01 -07:00
Tamir Duberstein	23a5306b5c	Resolve stringer TODO PiperOrigin-RevId: 240224782 Change-Id: Iab4e4e7047b2d022f15e807c2348685d8e972020	2019-03-25 14:59:58 -07:00
Jamie Liu	f3723f8059	Call memmap.Mappable.Translate with more conservative usermem.AccessType. MM.insertPMAsLocked() passes vma.maxPerms to memmap.Mappable.Translate (although it unsets AccessType.Write if the vma is private). This somewhat simplifies handling of pmas, since it means only COW-break needs to replace existing pmas. However, it also means that a MAP_SHARED mapping of a file opened O_RDWR dirties the file, regardless of the mapping's permissions and whether or not the mapping is ever actually written to with I/O that ignores permissions (e.g. ptrace(PTRACE_POKEDATA)). To fix this: - Change the pma-getting path to request only the permissions that are required for the calling access. - Change memmap.Mappable.Translate to take requested permissions, and return allowed permissions. This preserves the existing behavior in the common cases where the memmap.Mappable isn't fsutil.CachingInodeOperations and doesn't care if the translated platform.File pages are written to. - Change the MM.getPMAsLocked path to support permission upgrading of pmas outside of copy-on-write. PiperOrigin-RevId: 240196979 Change-Id: Ie0147c62c1fbc409467a6fa16269a413f3d7d571	2019-03-25 12:42:43 -07:00
Andrei Vagin	ddc05e3053	epoll: use ilist:generic_list instead of ilist:ilist ilist:generic_list works faster than ilist:ilist. Here is a benchmark test to measure performance of epoll_wait, when readyList isn't empty. It shows about 30% better performance with these changes. Benchmark Time(ns) CPU(ns) Iterations Before: BM_EpollAllEvents 46725 46899 14286 After: BM_EpollAllEvents 33167 33300 18919 PiperOrigin-RevId: 240185278 Change-Id: I3e33f9b214db13ab840b91613400525de5b58d18	2019-03-25 11:41:50 -07:00
Nicolas Lacasse	b81bfd6013	lstat should resolve the final path component if it ends in a slash. PiperOrigin-RevId: 239896221 Change-Id: I0949981fe50c57131c5631cdeb10b225648575c0	2019-03-22 17:38:13 -07:00
Jamie Liu	3d0b960112	Implement PTRACE_SEIZE, PTRACE_INTERRUPT, and PTRACE_LISTEN. PiperOrigin-RevId: 239803092 Change-Id: I42d612ed6a889e011e8474538958c6de90c6fcab	2019-03-22 08:55:44 -07:00
Yong He	45ba52f824	Allow BP and OF to be called from user space Change the DPL from 0 to 3 for Breakpoint and Overflow, so that user space can trigger Breakpoint and Overflow as expected. Change-Id: Ibead65fb8c98b32b7737f316db93b3a8d9dcd648 PiperOrigin-RevId: 239736648	2019-03-21 22:04:50 -07:00
Kevin Krakauer	0cd5f20044	Replace manual pty copies to/from userspace with safemem operations. Also, changing queue.writeBuf from a buffer.Bytes to a [][]byte should reduce copying and reallocating of slices. PiperOrigin-RevId: 239713547 Change-Id: I6ee5ff19c3ee2662f1af5749cae7b73db0569e96	2019-03-21 18:05:07 -07:00
Ian Gudger	ba828233b9	Clear msghdr flags on successful recvmsg. .net sets these flags to -1 and then uses their result, expecting it to be zero. Does not set actual flags (e.g. MSG_TRUNC), but setting to zero is more correct than what we did before. PiperOrigin-RevId: 239657951 Change-Id: I89c5f84bc9b94a2cd8ff84e8ecfea09e01142030	2019-03-21 13:19:11 -07:00
Andrei Vagin	064fda1a75	gvisor: don't allocate a new credential object on fork A credential object is immutable, so we don't need to copy it for a new task. PiperOrigin-RevId: 239519266 Change-Id: I0632f641fdea9554779ac25d84bee4231d0d18f2	2019-03-20 18:41:00 -07:00
Rahat Mahmood	81f4829d11	Record sockets created during accept(2) for all families. Track new sockets created during accept(2) in the socket table for all families. Previously we were only doing this for unix domain sockets. PiperOrigin-RevId: 239475550 Change-Id: I16f009f24a06245bfd1d72ffd2175200f837c6ac	2019-03-20 14:31:16 -07:00
Andrei Vagin	9f4e1cb797	netstack: adjust the sequence number after trimming the packet PiperOrigin-RevId: 239417224 Change-Id: I14a9adc31a6330a79a6156c105969cd5f1f63d20	2019-03-20 09:58:10 -07:00
Andrei Vagin	87cce0ec08	netstack: reduce MSS from SYN to account tcp options See: https://tools.ietf.org/html/rfc6691#section-2 PiperOrigin-RevId: 239305632 Change-Id: Ie8eb912a43332e6490045dc95570709c5b81855e	2019-03-19 17:33:20 -07:00
Fabricio Voznika	7b33df6845	Fix data race in netlink send buffer size PiperOrigin-RevId: 239221041 Change-Id: Icc19e32a00fa89167447ab2f45e90dcfd61bea04	2019-03-19 10:38:50 -07:00
Bert Muthalaly	928809fa7d	Add layer 2 stats (tx, rx) X (packets, bytes) to netstack PiperOrigin-RevId: 239194420 Change-Id: Ie193e8ac2b7a6db21195ac85824a335930483971	2019-03-19 08:30:43 -07:00
Michael Pratt	8a499ae65f	Remove references to replaced child in Rename in ramfs/agentfs In the case of a rename replacing an existing destination inode, ramfs Rename failed to first remove the replaced inode. This caused: 1. A leak of a reference to the inode (making it live indefinitely). 2. For directories, a leak of the replaced directory's .. link to the parent. This would cause the parent's link count to incorrectly increase. (2) is much simpler to test than (1), so that's what I've done. agentfs has a similar bug with link count only, so the Dirent layer informs the Inode if this is a replacing rename. Fixes #133 PiperOrigin-RevId: 239105698 Change-Id: I4450af2462d8ae3339def812287213d2cbeebde0	2019-03-18 18:40:06 -07:00
Rahat Mahmood	cea1dd7d21	Remove racy access to shm fields. PiperOrigin-RevId: 239016776 Change-Id: Ia7af4258e7c69b16a4630a6f3278aa8e6b627746	2019-03-18 10:49:03 -07:00
Tamir Duberstein	5496be7c5d	Remove duplicate TCP flag definitions PiperOrigin-RevId: 238467634 Change-Id: If4cd8efff7386fbee1195f051d15549b495910a9	2019-03-14 10:19:21 -07:00
Jamie Liu	8f4634997b	Decouple filemem from platform and move it to pgalloc.MemoryFile. This is in preparation for improved page cache reclaim, which requires greater integration between the page cache and page allocator. PiperOrigin-RevId: 238444706 Change-Id: Id24141b3678d96c7d7dc24baddd9be555bffafe4	2019-03-14 08:12:48 -07:00
Jamie Liu	fb9919881c	Use WalkGetAttr in gofer.inodeOperations.Create. p9.Twalk.handle() with a non-empty path also stats the walked-to path anyway, so the preceding GetAttr is completely wasted. PiperOrigin-RevId: 238440645 Change-Id: I7fbc7536f46b8157639d0d1f491e6aaa9ab688a3	2019-03-14 07:43:15 -07:00
Nicolas Lacasse	2512cc5617	Allow filesystem.Mount to take an optional interface argument. PiperOrigin-RevId: 238360231 Change-Id: I5eaf8d26f8892f77d71c7fbd6c5225ef471cedf1	2019-03-13 19:24:03 -07:00
Kevin Krakauer	f97c4f1b7a	Remove unused function. PiperOrigin-RevId: 238336475 Change-Id: I8131e04699028246ebc233953ebb3feca5673940	2019-03-13 16:40:10 -07:00
Fabricio Voznika	70d0613444	Reduce PACKET_RX_RING memory usage Previous memory allocation was excessive (80 MB). Changed it to use 2 MB instead. There is no drop in performance due to this change: ab -n 100 -c 10 http://server/latin10m.txt ==> 10 MB file 80 MB: 178 MB/s 2 MB: 181 MB/s PiperOrigin-RevId: 238321594 Change-Id: I1c8aed13cad5d75f4506d2b406b305117055fbe5	2019-03-13 15:25:13 -07:00
Noah Gold	8003bd6a5c	Make gonet.PacketConn implement net.Conn. gonet.PacketConn now implements net.Conn, allowing it to be returned from net.Dial.Dialer functions. PiperOrigin-RevId: 238111980 Change-Id: I174884385ff4d9b8e9918fac7bbb5b93ca366ba7	2019-03-12 15:36:33 -07:00
Ian Gudger	a16f6e50c5	Make HandleLocal apply to all non-loopback interfaces. HandleLocal is very similar conceptually to MULTICAST_LOOP, so we can unify the implementations. This has the benefit of making HandleLocal apply even when the fdbased link endpoint isn't in use. In addition, move looping logic to route creation so that it doesn't need to be run for each packet. This should improve performance. PiperOrigin-RevId: 238099480 Change-Id: I72839f16f25310471453bc9d3fb8544815b25c23	2019-03-12 14:37:56 -07:00
Jamie Liu	8930e79ebf	Clarify the platform.File interface. - Redefine some memmap.Mappable, platform.File, and platform.Memory semantics in terms of File reference counts (no functional change). - Make AddressSpace.MapFile take a platform.File instead of a raw FD, and replace platform.File.MapInto with platform.File.FD. This allows kvm.AddressSpace.MapFile to always use platform.File.MapInternal instead of maintaining its own (redundant) cache of file mappings in the sentry address space. PiperOrigin-RevId: 238044504 Change-Id: Ib73a11e4275c0da0126d0194aa6c6017a9cef64f	2019-03-12 10:29:16 -07:00
Adin Scannell	6e6dbf0e56	kvm: minimum guest/host timekeeping delta. PiperOrigin-RevId: 237927368 Change-Id: I359badd1967bb118fe74eab3282c946c18937edc	2019-03-11 18:19:45 -07:00
Fabricio Voznika	bc9b979b94	Add profiling commands to runsc Example: runsc debug --root=<dir> \ --profile-heap=/tmp/heap.prof \ --profile-cpu=/tmp/cpu.prod --profile-delay=30 \ <container ID> PiperOrigin-RevId: 237848456 Change-Id: Icff3f20c1b157a84d0922599eaea327320dad773	2019-03-11 11:47:30 -07:00
Ian Gudger	71d53382bf	Fix getsockopt(IP_MULTICAST_IF). getsockopt(IP_MULTICAST_IF) only supports struct in_addr. Also adds support for setsockopt(IP_MULTICAST_IF) with struct in_addr. PiperOrigin-RevId: 237620230 Change-Id: I75e7b5b3e08972164eb1906f43ddd67aedffc27c	2019-03-09 11:40:51 -08:00
Ian Gudger	281092e842	Make IP_MULTICAST_LOOP and IP_MULTICAST_TTL allow setting int or char. This is the correct Linux behavior, and at least PHP depends on it. PiperOrigin-RevId: 237565639 Change-Id: I931af09c8ed99a842cf70d22bfe0b65e330c4137	2019-03-08 20:27:58 -08:00
Ian Gudger	86036f979b	Validate multicast addresses in multicast group operations. PiperOrigin-RevId: 237559843 Change-Id: I93a9d83a08cd3d49d5fc7fcad5b0710d0aa04aaa	2019-03-08 19:05:26 -08:00
Ian Gudger	56a6128295	Implement IP_MULTICAST_LOOP. IP_MULTICAST_LOOP controls whether or not multicast packets sent on the default route are looped back. In order to implement this switch, support for sending and looping back multicast packets on the default route had to be implemented. For now we only support IPv4 multicast. PiperOrigin-RevId: 237534603 Change-Id: I490ac7ff8e8ebef417c7eb049a919c29d156ac1c	2019-03-08 15:49:17 -08:00
Nicolas Lacasse	fbacb35039	No need to check for negative uintptr. Fixes #134 PiperOrigin-RevId: 237128306 Change-Id: I396e808484c18931fc5775970ec1f5ae231e1cb9	2019-03-06 15:06:46 -08:00
Fabricio Voznika	0b76887147	Priority-inheritance futex implementation It is implemented without the priority inheritance part, given that gVisor defers scheduling decisions to the Go runtime and doesn't have control over it. PiperOrigin-RevId: 236989545 Change-Id: I714c8ca0798743ecf3167b14ffeb5cd834302560	2019-03-05 23:40:18 -08:00
Bhasker Hariharan	1718fdd1a8	Add new retransmissions and recovery related metrics. PiperOrigin-RevId: 236945145 Change-Id: I051760d95154ea5574c8bb6aea526f488af5e07b	2019-03-05 16:41:44 -08:00
Kevin Krakauer	23e66ee96d	Remove unused commit() function argument to Bind. PiperOrigin-RevId: 236926132 Change-Id: I5cf103f22766e6e65a581de780c7bb9ca0fa3181	2019-03-05 14:53:34 -08:00
Tamir Duberstein	dcb634ce73	Remove duplicate SetSockOpt call Clean up some error handling, and add TODO explaining incorrect behaviour with respect to broadcast on interfaces lacking an IP address. PiperOrigin-RevId: 236756233 Change-Id: I9662e7dc062c90565a32a3e153c4dbc98c55b522	2019-03-04 17:17:30 -08:00
Nicolas Lacasse	0d683c9961	Make tmpfs respect MountNoATime now that fs.Handle is gone. PiperOrigin-RevId: 236752802 Change-Id: I9e50600b2ae25d5f2ac632c4405a7a185bdc3c92	2019-03-04 16:57:14 -08:00
Tamir Duberstein	bc70897bb4	Reconcile DHCP with SO_BROADCAST Now that we have SO_BROADCAST, we don't need (some of) the hackery in the DHCP client. This also fixes a bizarre regression observed in Fuchsia where DHCP acquisition was taking over two minutes. PiperOrigin-RevId: 236661954 Change-Id: Ibcfe5d311fa5df8ff4ff2e40ccedffe91f92daa5	2019-03-04 09:01:03 -08:00
Adin Scannell	d811c1016d	ptrace: drop old FIXME The globalPool uses a sync.Once mechanism for initialization, and no cleanup is strictly required. It's not really feasible to have the platform implement a full creation -> destruction cycle (due to the way filters are assumed to be installed), so drop the FIXME. PiperOrigin-RevId: 236385278 Change-Id: I98ac660ed58cc688d8a07147d16074a3e8181314	2019-03-01 15:05:18 -08:00
Nicolas Lacasse	9177bcd0ba	DecRef replaced dirent in inode_overlay. PiperOrigin-RevId: 236352158 Change-Id: Ide5104620999eaef6820917505e7299c7b0c5a03	2019-03-01 11:58:59 -08:00
Fabricio Voznika	3dbd4a16f8	Add semctl(GETPID) syscall Also added unimplemented notification for semctl(2) commands. PiperOrigin-RevId: 236340672 Change-Id: I0795e3bd2e6d41d7936fabb731884df426a42478	2019-03-01 10:57:02 -08:00
Michael Pratt	7693b7469f	Format capget/capset arguments I0225 15:32:10.795034 4166 x:0] [ 6] E capget(0x7f477fdff8c8 {Version: 3, Pid: 0}, 0x7f477fdff8b0) I0225 15:32:10.795059 4166 x:0] [ 6] X capget(0x7f477fdff8c8 {Version: 3, Pid: 0}, 0x7f477fdff8b0 {Permitted: CAP_CHOWN\|CAP_DAC_OVERRIDE\|CAP_DAC_READ_SEARCH\|CAP_FOWNER\|CAP_FSETID\|CAP_KILL\|CAP_SETGID\|CAP_SETUID\|CAP_SETPCAP\|CAP_LINUX_IMMUTABLE\|CAP_NET_BIND_SERVICE\|CAP_NET_BROADCAST\|CAP_NET_ADMIN\|CAP_NET_RAW\|CAP_IPC_LOCK\|CAP_IPC_OWNER\|CAP_SYS_MODULE\|CAP_SYS_RAWIO\|CAP_SYS_CHROOT\|CAP_SYS_PTRACE\|CAP_SYS_PACCT\|CAP_SYS_ADMIN\|CAP_SYS_BOOT\|CAP_SYS_NICE\|CAP_SYS_RESOURCE\|CAP_SYS_TIME\|CAP_SYS_TTY_CONFIG\|CAP_MKNOD\|CAP_LEASE\|CAP_AUDIT_WRITE\|CAP_AUDIT_CONTROL\|CAP_SETFCAP\|CAP_MAC_OVERRIDE\|CAP_MAC_ADMIN\|CAP_SYSLOG\|CAP_WAKE_ALARM\|CAP_BLOCK_SUSPEND\|CAP_AUDIT_READ, Inheritable: CAP_CHOWN\|CAP_DAC_OVERRIDE\|CAP_DAC_READ_SEARCH\|CAP_FOWNER\|CAP_FSETID\|CAP_KILL\|CAP_SETGID\|CAP_SETUID\|CAP_SETPCAP\|CAP_LINUX_IMMUTABLE\|CAP_NET_BIND_SERVICE\|CAP_NET_BROADCAST\|CAP_NET_ADMIN\|CAP_NET_RAW\|CAP_IPC_LOCK\|CAP_IPC_OWNER\|CAP_SYS_MODULE\|CAP_SYS_RAWIO\|CAP_SYS_CHROOT\|CAP_SYS_PTRACE\|CAP_SYS_PACCT\|CAP_SYS_ADMIN\|CAP_SYS_BOOT\|CAP_SYS_NICE\|CAP_SYS_RESOURCE\|CAP_SYS_TIME\|CAP_SYS_TTY_CONFIG\|CAP_MKNOD\|CAP_LEASE\|CAP_AUDIT_WRITE\|CAP_AUDIT_CONTROL\|CAP_SETFCAP\|CAP_MAC_OVERRIDE\|CAP_MAC_ADMIN\|CAP_SYSLOG\|CAP_WAKE_ALARM\|CAP_BLOCK_SUSPEND\|CAP_AUDIT_READ, Effective: 0x0}) = 0x0 (3.399µs) I0225 15:32:10.795114 4166 x:0] [ 6] E capset(0x7f477fdff8c8 {Version: 3, Pid: 0}, 0x7f477fdff8b0 {Permitted: 
CAP_CHOWN\|CAP_DAC_OVERRIDE\|CAP_DAC_READ_SEARCH\|CAP_FOWNER\|CAP_FSETID\|CAP_KILL\|CAP_SETGID\|CAP_SETUID\|CAP_SETPCAP\|CAP_LINUX_IMMUTABLE\|CAP_NET_BIND_SERVICE\|CAP_NET_BROADCAST\|CAP_NET_ADMIN\|CAP_NET_RAW\|CAP_IPC_LOCK\|CAP_IPC_OWNER\|CAP_SYS_MODULE\|CAP_SYS_RAWIO\|CAP_SYS_CHROOT\|CAP_SYS_PTRACE\|CAP_SYS_PACCT\|CAP_SYS_ADMIN\|CAP_SYS_BOOT\|CAP_SYS_NICE\|CAP_SYS_RESOURCE\|CAP_SYS_TIME\|CAP_SYS_TTY_CONFIG\|CAP_MKNOD\|CAP_LEASE\|CAP_AUDIT_WRITE\|CAP_AUDIT_CONTROL\|CAP_SETFCAP\|CAP_MAC_OVERRIDE\|CAP_MAC_ADMIN\|CAP_SYSLOG\|CAP_WAKE_ALARM\|CAP_BLOCK_SUSPEND\|CAP_AUDIT_READ, Inheritable: CAP_CHOWN\|CAP_DAC_OVERRIDE\|CAP_DAC_READ_SEARCH\|CAP_FOWNER\|CAP_FSETID\|CAP_KILL\|CAP_SETGID\|CAP_SETUID\|CAP_SETPCAP\|CAP_LINUX_IMMUTABLE\|CAP_NET_BIND_SERVICE\|CAP_NET_BROADCAST\|CAP_NET_ADMIN\|CAP_NET_RAW\|CAP_IPC_LOCK\|CAP_IPC_OWNER\|CAP_SYS_MODULE\|CAP_SYS_RAWIO\|CAP_SYS_CHROOT\|CAP_SYS_PTRACE\|CAP_SYS_PACCT\|CAP_SYS_ADMIN\|CAP_SYS_BOOT\|CAP_SYS_NICE\|CAP_SYS_RESOURCE\|CAP_SYS_TIME\|CAP_SYS_TTY_CONFIG\|CAP_MKNOD\|CAP_LEASE\|CAP_AUDIT_WRITE\|CAP_AUDIT_CONTROL\|CAP_SETFCAP\|CAP_MAC_OVERRIDE\|CAP_MAC_ADMIN\|CAP_SYSLOG\|CAP_WAKE_ALARM\|CAP_BLOCK_SUSPEND\|CAP_AUDIT_READ, Effective: CAP_FOWNER}) I0225 15:32:10.795127 4166 x:0] [ 6] X capset(0x7f477fdff8c8 {Version: 3, Pid: 0}, 0x7f477fdff8b0 {Permitted: CAP_CHOWN\|CAP_DAC_OVERRIDE\|CAP_DAC_READ_SEARCH\|CAP_FOWNER\|CAP_FSETID\|CAP_KILL\|CAP_SETGID\|CAP_SETUID\|CAP_SETPCAP\|CAP_LINUX_IMMUTABLE\|CAP_NET_BIND_SERVICE\|CAP_NET_BROADCAST\|CAP_NET_ADMIN\|CAP_NET_RAW\|CAP_IPC_LOCK\|CAP_IPC_OWNER\|CAP_SYS_MODULE\|CAP_SYS_RAWIO\|CAP_SYS_CHROOT\|CAP_SYS_PTRACE\|CAP_SYS_PACCT\|CAP_SYS_ADMIN\|CAP_SYS_BOOT\|CAP_SYS_NICE\|CAP_SYS_RESOURCE\|CAP_SYS_TIME\|CAP_SYS_TTY_CONFIG\|CAP_MKNOD\|CAP_LEASE\|CAP_AUDIT_WRITE\|CAP_AUDIT_CONTROL\|CAP_SETFCAP\|CAP_MAC_OVERRIDE\|CAP_MAC_ADMIN\|CAP_SYSLOG\|CAP_WAKE_ALARM\|CAP_BLOCK_SUSPEND\|CAP_AUDIT_READ, Inheritable: 
CAP_CHOWN\|CAP_DAC_OVERRIDE\|CAP_DAC_READ_SEARCH\|CAP_FOWNER\|CAP_FSETID\|CAP_KILL\|CAP_SETGID\|CAP_SETUID\|CAP_SETPCAP\|CAP_LINUX_IMMUTABLE\|CAP_NET_BIND_SERVICE\|CAP_NET_BROADCAST\|CAP_NET_ADMIN\|CAP_NET_RAW\|CAP_IPC_LOCK\|CAP_IPC_OWNER\|CAP_SYS_MODULE\|CAP_SYS_RAWIO\|CAP_SYS_CHROOT\|CAP_SYS_PTRACE\|CAP_SYS_PACCT\|CAP_SYS_ADMIN\|CAP_SYS_BOOT\|CAP_SYS_NICE\|CAP_SYS_RESOURCE\|CAP_SYS_TIME\|CAP_SYS_TTY_CONFIG\|CAP_MKNOD\|CAP_LEASE\|CAP_AUDIT_WRITE\|CAP_AUDIT_CONTROL\|CAP_SETFCAP\|CAP_MAC_OVERRIDE\|CAP_MAC_ADMIN\|CAP_SYSLOG\|CAP_WAKE_ALARM\|CAP_BLOCK_SUSPEND\|CAP_AUDIT_READ, Effective: CAP_FOWNER}) = 0x0 (3.062µs) Not the most readable, but better than just a pointer. PiperOrigin-RevId: 236338875 Change-Id: I4b83f778122ab98de3874e16f4258dae18da916b	2019-03-01 10:46:36 -08:00
Michael Pratt	088c6522b2	Fix typo PiperOrigin-RevId: 236239090 Change-Id: I92e63d6f4b52b78852626c87743fdd86175e09d3	2019-02-28 18:47:13 -08:00
Fabricio Voznika	3b44377eda	Fix "-c dbg" build break Remove allocation from vCPU.die() to save stack space. Closes #131 PiperOrigin-RevId: 236238102 Change-Id: Iafca27a1a3a472d4cb11dcda9a2060e585139d11	2019-02-28 18:38:34 -08:00
Ruidong Cao	3851705a73	Fix procfs bugs Current procfs has some bugs. After executing ls twice, many dirs come out with same name like "1" or ".". Files like "cpuinfo" disappear. Here variable names is a slice with cap() > len(). Sort after appending to it will not alloc a new space and impact original slice. Same to m. Signed-off-by: Ruidong Cao <crdfrank@gmail.com> Change-Id: I83e5cd1c7968c6fe28c35ea4fee497488d4f9eef PiperOrigin-RevId: 236222270	2019-02-28 16:44:54 -08:00
Michael Pratt	f7df9d72cf	Upgrade to Go 1.12 PiperOrigin-RevId: 236218980 Change-Id: I82cb4aeb2a56524ee1324bfea2ad41dce26db354	2019-02-28 16:26:14 -08:00
Tamir Duberstein	3830786883	Map IPv{4,6} addresses to ethernet addresses ...in accordance with RFCs 1112 and 2464. Fixes IPv4 multicast when IP_MULTICAST_IF is specified. Don't return ErrNoRoute when no route is needed. Don't set Route.NextHop when no route is needed. PiperOrigin-RevId: 236199813 Change-Id: I48ed33e1b7f760deaa37e18ad7f1b8b62819ab43	2019-02-28 14:38:32 -08:00
Jamie Liu	05d721f9ee	Hold dataMu for writing in CachingInodeOperations.WriteOut. fsutil.SyncDirtyAll mutates the DirtySet. PiperOrigin-RevId: 236183349 Change-Id: I7e809d5b406ac843407e61eff17d81259a819b4f	2019-02-28 13:14:43 -08:00
Kevin Krakauer	121db29a93	Ping support via IPv4 raw sockets. Broadly, this change: * Enables sockets to be created via `socket(AF_INET, SOCK_RAW, IPPROTO_ICMP)`. * Passes the network-layer (IP) header up the stack to the transport endpoint, which can pass it up to the socket layer. This allows a raw socket to return the entire IP packet to users. * Adds functions to stack.TransportProtocol, stack.Stack, stack.transportDemuxer that enable incoming packets to be delivered to raw endpoints. New raw sockets of other protocols (not ICMP) just need to register with the stack. * Enables ping.endpoint to return IP headers when created via SOCK_RAW. PiperOrigin-RevId: 235993280 Change-Id: I60ed994f5ff18b2cbd79f063a7fdf15d093d845a	2019-02-27 14:31:21 -08:00
Nicolas Lacasse	d516ee3312	Allow overlay to merge Directories and SpecialDirectories. Needed to mount inside /proc or /sys. PiperOrigin-RevId: 235936529 Change-Id: Iee6f2671721b1b9b58a3989705ea901322ec9206	2019-02-27 09:45:45 -08:00
Fabricio Voznika	cff2c57192	Fix bad merge PiperOrigin-RevId: 235818534 Change-Id: I99f7e3fd1dc808b35f7a08b96b7c3226603ab808	2019-02-26 16:42:06 -08:00
Googler	12d9cf6fab	Adds a WriteRawPacket method to the InjectableLinkEndpoint interface. Also exposes ipv4.MaxTotalSize since it is a generally useful constant. PiperOrigin-RevId: 235799755 Change-Id: I1fa8d5294bf355acf5527cfdf274b3687d3c8b13	2019-02-26 14:58:37 -08:00
Ruidong Cao	a2b794b30d	FPE_INTOVF (integer overflow) should be 2 refer to Linux. Signed-off-by: Ruidong Cao <crdfrank@gmail.com> Change-Id: I03f8ab25cf29257b31f145cf43304525a93f3300 PiperOrigin-RevId: 235763203	2019-02-26 11:48:49 -08:00
Fabricio Voznika	23fe059761	Lazily allocate inotify map on inode PiperOrigin-RevId: 235735865 Change-Id: I84223eb18eb51da1fa9768feaae80387ff6bfed0	2019-02-26 09:33:44 -08:00
Amanda Tait	33d0e824c7	Use more conservative locking in NIC.DeliverNetworkPacket An earlier CL excessively minimizes the period in which it holds a lock on NIC. This earlier CL had done this out of the mistaken impression it fixed a broken test, when in fact it just reduced the rate of failure of a flaky test in tcp_test.go. This new change holds the lock on NIC for the duration of the loop over n.endpoints. PiperOrigin-RevId: 235732487 Change-Id: I53ee6df264f093ddc4d29e9acdcba6b4838cb112	2019-02-26 09:10:37 -08:00
Bhasker Hariharan	26be25e4ec	Add a SACK scoreboard to TCP endpoints. This change does not make use of SACK information but adds support to track SACK information and store it in the endpoint. The actual SACK based recovery will be in a separate CL. Part of commits to add RFC 6675 support to Netstack. PiperOrigin-RevId: 235612264 Change-Id: I261f94844d7bad5abda803152ce6cc6125a467ff	2019-02-25 15:20:04 -08:00
Jamie Liu	41167e6c50	Don't call WalkGetAttr for walk(names=[]). PiperOrigin-RevId: 235587729 Change-Id: I37074416b10a30ca3a00d11bcde338d8d979beaf	2019-02-25 13:03:56 -08:00
Fabricio Voznika	10426e0f31	Handle invalid offset in sendfile(2) PiperOrigin-RevId: 235578698 Change-Id: I608ff5e25eac97f6e1bda058511c1f82b0e3b736	2019-02-25 12:17:46 -08:00
Amanda Tait	c14a1a1618	Fix race condition in NIC.DeliverNetworkPacket cl/234850781 introduced a race condition in NIC.DeliverNetworkPacket by failing to hold a lock. This change fixes this regression by acquiring a read lock before iterating through n.endpoints, and then releasing the lock once iteration is complete. PiperOrigin-RevId: 235549770 Change-Id: Ib0133288be512d478cf759c3314dc95ec3205d4b	2019-02-25 10:02:29 -08:00
Googler	317c0324c9	Internal change. PiperOrigin-RevId: 235447861 Change-Id: Ic6ba5e0ed89f1b85651da084be70ef8d0ffc13cf	2019-02-24 17:31:59 -08:00
Kevin Krakauer	b75aa51504	Rename ping endpoints to icmp endpoints. PiperOrigin-RevId: 235248572 Change-Id: I5b0538b6feb365a98712c2a2d56d856fe80a8a09	2019-02-22 13:34:47 -08:00
Googler	532f4b2fba	Internal change. PiperOrigin-RevId: 235053594 Change-Id: Ie3d7b11843d0710184a2463886c7034e8f5305d1	2019-02-21 13:08:34 -08:00
Michael Pratt	b2a5ad047a	Automated rollback of changelist 234680481 PiperOrigin-RevId: 234892473 Change-Id: Ie568c67d299082a008a1cf9802942e5e03746501	2019-02-20 16:27:56 -08:00
Haibo Xu	15d3189884	Make some ptrace commands x86-only Signed-off-by: Haibo Xu <haibo.xu@arm.com> Change-Id: I9751f859332d433ca772d6b9733f5a5a64398ec7 PiperOrigin-RevId: 234877624	2019-02-20 15:10:59 -08:00
Amanda Tait	ea070b9d5f	Implement Broadcast support This change adds support for the SO_BROADCAST socket option in gVisor Netstack. This support includes getsockopt()/setsockopt() functionality for both UDP and TCP endpoints (the latter being a NOOP), dispatching broadcast messages up and down the stack, and route finding/creation for broadcast packets. Finally, a suite of tests have been implemented, exercising this functionality through the Linux syscall API. PiperOrigin-RevId: 234850781 Change-Id: If3e666666917d39f55083741c78314a06defb26c	2019-02-20 12:54:13 -08:00
Bhasker Hariharan	3e3a1ef9d6	Updates tcp_proxy to use an AF_PACKET and veth devices. tcp_proxy now uses an AF_PACKET socket as the FD for netstack link layer endpoint instead of a tap device. It also changes the link layer endpoint to use PacketMMap dispatch instead of Readv. This reduces overall cpu and reflects the current runsc setup which uses PacketMMap and also uses veth devices to receive packets. Also fixed a bug in gonet where Read() was not doing coalescing read and would read small amounts at a time. PiperOrigin-RevId: 234714768 Change-Id: Idabf8e600e4512489d3ba441c4096dc74deba5d7	2019-02-19 18:23:54 -08:00
Kevin Krakauer	ec2460b189	netstack: Add SIOCGSTAMP support. Ping sometimes uses this instead of SO_TIMESTAMP. PiperOrigin-RevId: 234699590 Change-Id: Ibec9c34fa0d443a931557a2b1b1ecd83effe7765	2019-02-19 16:41:32 -08:00
Michael Pratt	0b310ada5b	Rename "perfctr_l2" to "perfctr_llc" 910448bbed066ab1082b510eef1ae61bb792d854 ("perf/x86/amd/uncore: Rename cpufeatures macro for cache counters") in 4.14 changed the name. We change both the internal and cpuinfo name. As the upstream commit states, "In Family 17h, L3 is the last level cache as opposed to L2 in previous families. Avoid this name confusion ..." PiperOrigin-RevId: 234698034 Change-Id: Ibf2efd4c0b83c1a8b5bb123da65ea1d7c6acd778	2019-02-19 16:32:22 -08:00
Jamie Liu	2840f7c1b1	Add p9.Sticky. PiperOrigin-RevId: 234691125 Change-Id: I2a588153ded5a4fbed07bc2f0937a43ccfba791b	2019-02-19 15:53:46 -08:00
Jamie Liu	bed6f8534b	Set rax to syscall number on SECCOMP_RET_TRAP. PiperOrigin-RevId: 234690475 Change-Id: I1cbfb5aecd4697a4a26ec8524354aa8656cc3ba1	2019-02-19 15:49:37 -08:00
Michael Pratt	fd50504a3a	Rename "rdt" to "rdt_a" The final merged patch in Linux 4.10, 4ab1586488cb56ed8728e54c4157cc38646874d9 ("x86/cpufeature: Add RDT CPUID feature bits") named this feature "rdt_a". Earlier patch sets had named this "rdt". PiperOrigin-RevId: 234680481 Change-Id: I0cc968201ec9a2825701405e207994a7331322b7	2019-02-19 14:58:12 -08:00
Jamie Liu	bb47d8a545	Fix clone(CLONE_NEWUSER). - Use new user namespace for namespace creation checks. - Ensure userns is never nil since it's used by other namespaces. PiperOrigin-RevId: 234673175 Change-Id: I4b9d9d1e63ce4e24362089793961a996f7540cd9	2019-02-19 14:20:05 -08:00
Jamie Liu	22d8b6eba1	Break /proc/[pid]/{uid,gid}_map's dependence on seqfile. In addition to simplifying the implementation, this fixes two bugs: - seqfile.NewSeqFile unconditionally creates an inode with mode 0444, but {uid,gid}_map have mode 0644. - idMapSeqFile.Write implements fs.FileOperations.Write ... but it doesn't implement any other fs.FileOperations methods and is never used as fs.FileOperations. idMapSeqFile.GetFile() => seqfile.SeqFile.GetFile() uses seqfile.seqFileOperations instead, which rejects all writes. PiperOrigin-RevId: 234638212 Change-Id: I4568f741ab07929273a009d7e468c8205a8541bc	2019-02-19 11:21:46 -08:00
Ian Gudger	c611dbc5a7	Implement IP_MULTICAST_IF. This allows setting a default send interface for IPv4 multicast. IPv6 support will come later. PiperOrigin-RevId: 234251379 Change-Id: I65922341cd8b8880f690fae3eeb7ddfa47c8c173	2019-02-15 18:40:15 -08:00
Googler	e2dcce5442	Internal change. PiperOrigin-RevId: 234237297 Change-Id: Ic9b7a37db831556d2c2cf733a6e27fba27afee0b	2019-02-15 16:47:55 -08:00
Kevin Krakauer	a9cb3dcd9d	Move SO_TIMESTAMP from different transport endpoints to epsocket. SO_TIMESTAMP is reimplemented in ping and UDP sockets (and needs to be added for TCP), but can just be implemented in epsocket for simplicity. This will also make SIOCGSTAMP easier to implement. PiperOrigin-RevId: 234179300 Change-Id: Ib5ea0b1261dc218c1a8b15a65775de0050fe3230	2019-02-15 11:18:44 -08:00
Googler	c5f10af2c8	Internal change. PiperOrigin-RevId: 234169795 Change-Id: I3c576ae6ad460e2c0e3f142a2671dc18d34a07ef	2019-02-15 10:34:20 -08:00
Fabricio Voznika	e34d27e8b6	Redirect FIXME to more appropriate bug PiperOrigin-RevId: 234147487 Change-Id: I779a6012832bb94a6b89f5bcc7d821b40ae969cc	2019-02-15 08:23:27 -08:00
Nicolas Lacasse	0a41ea72c1	Don't allow writing or reading to TTY unless process group is in foreground. If a background process tries to read from a TTY, linux sends it a SIGTTIN unless the signal is blocked or ignored, or the process group is an orphan, in which case the syscall returns EIO. See drivers/tty/n_tty.c:n_tty_read()=>job_control(). If a background process tries to write a TTY, set the termios, or set the foreground process group, linux then sends a SIGTTOU. If the signal is ignored or blocked, linux allows the write. If the process group is an orphan, the syscall returns EIO. See drivers/tty/tty_io.c:tty_check_change(). PiperOrigin-RevId: 234044367 Change-Id: I009461352ac4f3f11c5d42c43ac36bb0caa580f9	2019-02-14 15:47:31 -08:00
Googler	d60ce17a21	Internal change. PiperOrigin-RevId: 234011346 Change-Id: Ic69375ddb3794dd0d3d6e62ee4dc60fdf4baf2c7	2019-02-14 12:54:27 -08:00


911 Commits