Commit Graph

1334 Commits

Author SHA1 Message Date
Michael Pratt c61a2e709a Modernize mknod test
PiperOrigin-RevId: 247704588
Change-Id: I1e63e2b310145695fbe38429b91e44d72473fcd6
2019-05-10 17:37:43 -07:00
Jamie Liu 5ee8218483 Add pgalloc.DelayedEvictionManual.
PiperOrigin-RevId: 247667272
Change-Id: I16b04e11bb93f50b7e05e888992303f730e4a877
2019-05-10 13:37:48 -07:00
Fabricio Voznika 1bee43be13 Implement fallocate(2)
Closes #225

PiperOrigin-RevId: 247508791
Change-Id: I04f47cf2770b30043e5a272aba4ba6e11d0476cc
2019-05-09 15:35:49 -07:00
Tamir Duberstein 0f4be95a33 Remove dhcp client
This was upstreamed from Fuchsia, but it is pretty buggy and doesn't
rely on any private APIs. Thus it can be checked into the Fuchsia source
tree without forking netstack, where we can more easily iterate on (and
eventually remove) it.

PiperOrigin-RevId: 247506582
Change-Id: Ifb1b60c6c4941c374a59c5570a6a9cacf2468981
2019-05-09 15:23:03 -07:00
Googler c3b6d4587e Fix types that are subtly incorrect.
PiperOrigin-RevId: 247294093
Change-Id: Iac8c76e50bbc15c240ae7da7f5786f9968e7057c
2019-05-08 14:40:09 -07:00
Nicolas Lacasse bfd9f75ba4 Set the FilesytemType in MountSource from the Filesystem.
And stop storing the Filesystem in the MountSource.

This allows us to decouple the MountSource filesystem type from the name of the
filesystem.

PiperOrigin-RevId: 247292982
Change-Id: I49cbcce3c17883b7aa918ba76203dfd6d1b03cc8
2019-05-08 14:35:06 -07:00
Googler cbf6ab9697 Check GSO for nil in WritePacket
Testing:
Unit tests added
PiperOrigin-RevId: 247096269
Change-Id: I849c010eadcb53caf45896a15ef38162d66a9568
2019-05-07 14:57:03 -07:00
Ian Gudger 20862f0db2 Add gonet.DialContextTCP.
Allows cancellation and timeouts.

PiperOrigin-RevId: 247090428
Change-Id: I91907f12e218677dcd0e0b6d72819deedbd9f20c
2019-05-07 14:27:36 -07:00
Fabricio Voznika e5432fa1b3 Remove defers from gofer.contextFile
Most are single line methods in hot paths.

PiperOrigin-RevId: 247050267
Change-Id: I428d78723fe00b57483185899dc8fa9e1f01e2ea
2019-05-07 10:55:09 -07:00
Jamie Liu 14f0e7618e Ensure all uses of MM.brk occur under MM.mappingMu in MM.Brk().
PiperOrigin-RevId: 246921386
Change-Id: I71d8908858f45a9a33a0483470d0240eaf0fd012
2019-05-06 16:39:43 -07:00
Kevin Krakauer ff8ed5e6a5 Fix raw socket behavior and tests.
Some behavior was broken due to the difficulty of running automated raw
socket tests.

Change-Id: I152ca53916bb24a0208f2dc1c4f5bc87f4724ff6
PiperOrigin-RevId: 246747067
2019-05-05 16:07:25 -07:00
Bin Lu ebe2f78d9b Add arm64 support to pkg/seccomp
Signed-off-by: Bin Lu <bin.lu@arm.com>
PiperOrigin-RevId: 246622505
Change-Id: I803639a0c5b0f75959c64fee5385314214834d10
2019-05-03 22:03:59 -07:00
Andrei Vagin bf0ac565d2 Fix runsc restore to be compatible with docker start --checkpoint ...
Change-Id: I02b30de13f1393df66edf8829fedbf32405d18f8
PiperOrigin-RevId: 246621192
2019-05-03 21:41:45 -07:00
Ian Gudger b4a9f18687 Update tcpip Clock description.
The tcpip.Clock comment stated that times provided by it should not be used for
netstack internal timekeeping. This comment was from before the interface
supported monotonic times. The monotonic times that it provides are now be the
preferred time source for netstack internal timekeeping.

PiperOrigin-RevId: 246618772
Change-Id: I853b720e3d719b03fabd6156d2431da05d354bda
2019-05-03 21:01:42 -07:00
Andrei Vagin 9e1c253fe8 gvisor: run bazel in a docker container
bazel has a lot of dependencies and users don't want to install them
just to build gvisor.

These changes allows to run bazel in a docker container.
A bazel cache is on the local file system (~/.cache/bazel), so
incremental builds should be fast event after recreating a bazel
container.

Here is an example how to build runsc:
make BAZEL_OPTIONS="build runsc:runsc" bazel

Change-Id: I8c0a6d0c30e835892377fb6dd5f4af7a0052d12a
PiperOrigin-RevId: 246570877
2019-05-03 14:13:08 -07:00
Andrei Vagin 24d8656585 gofer: don't leak file descriptors
Fixes #219

PiperOrigin-RevId: 246568639
Change-Id: Ic7afd15dde922638d77f6429c508d1cbe2e4288a
2019-05-03 14:01:50 -07:00
Googler f2699b76c8 Support IPv4 fragmentation in netstack
Testing:
Unit tests and also large ping in Fuchsia OS
PiperOrigin-RevId: 246563592
Change-Id: Ia12ab619f64f4be2c8d346ce81341a91724aef95
2019-05-03 13:30:35 -07:00
Kevin Krakauer 264d012d81 Add netfilter ABI for iptables support.
Change-Id: Ifbd2abf63ea8062a89b83e948d3e9735480d8216
PiperOrigin-RevId: 246559904
2019-05-03 13:06:09 -07:00
Tamir Duberstein 0e1cc476db Fix transport/raw copybara export
- include packet_list.go
- exclude state.go (by renaming to include an underscore)

Also rename raw.go to endpoint.go for consistency.

PiperOrigin-RevId: 246547912
Change-Id: I19c8331c794ba683a940cc96a8be6497b53ff24d
2019-05-03 11:52:59 -07:00
Andrei Vagin 4edd6f5ccf runsc: add a bazel target to build a debian package
$ dpkg -s runsc
Package: runsc
Status: install ok installed
Priority: optional
Section: contrib/devel
Maintainer: The gVisor Authors <gvisor-dev@googlegroups.com>
Architecture: amd64
Version: 20190304.1-123-g861434f612ce-dirty
Description: gVisor is a user-space kernel, written in Go, that
 implements a substantial portion of the Linux system surface. It
 includes an Open Container Initiative (OCI) runtime called runsc that
 provides an isolation boundary between the application and the host
 kernel. The runsc runtime integrates with Docker and Kubernetes,
 making it simple to run sandboxed containers.
Homepage: https://gvisor.dev/
Built-Using: Bazel
Change-Id: I6f161de8fba649f12272a87b99529ccfd22e499a
PiperOrigin-RevId: 246546294
2019-05-03 11:43:43 -07:00
Andrei Vagin 3f3e3a6303 gvisor/kokoro: save runsc logs
PiperOrigin-RevId: 246542315
Change-Id: Ia9ba2bc104e0af3277d3b6102122c13d320ea802
2019-05-03 11:21:22 -07:00
Bhasker Hariharan 458fe955a7 Implement support for SACK based recovery(RFC 6675).
PiperOrigin-RevId: 246536003
Change-Id: I118b745f45040be9c70cb6a1028acdb06c78d8c9
2019-05-03 10:51:18 -07:00
Fabricio Voznika 95614bbefa Increase timeout to wait for port to become available
TestHttpd fails sporadically waiting for the port on slow
machines.

PiperOrigin-RevId: 246525277
Change-Id: Ie0ea71e3c4664d24f580eabd8f7461e47079f734
2019-05-03 09:54:24 -07:00
Fabricio Voznika 6b9ab65163 Skip flaky ClockGettime.CputimeId take 2
The test also times out when GCE machine has 2 CPUs. I cannot
repro it locally with a 2 CPU cgroup though. Let's skip the
test when there are 2 CPUs to stop the flakiness and retest it
once the fix is available.

PiperOrigin-RevId: 246523363
Change-Id: I9d9d922a5be3aa7bc91dff5a1807ca99f3f4a4f9
2019-05-03 09:42:10 -07:00
Chris Kuiper 2d8e90b311 Proper cleanup of sockets that used REUSEPORT
Fixed a small logic error that broke proper accounting of MultiPortEndpoints.

PiperOrigin-RevId: 246502126
Change-Id: I1a7d6ea134f811612e545676212899a3707bc2c2
2019-05-03 07:02:51 -07:00
Chris Kuiper 8972e47a2e Support reception of multicast data on more than one socket
This requires two changes:
1) Support for more than one socket to join a given multicast group.

2) Duplicate delivery of incoming multicast packets to all sockets listening
for it.

In addition, I tweaked the code (and added a test) to disallow duplicates
IP_ADD_MEMBERSHIP calls for the same group and NIC. This is how Linux does
it.

PiperOrigin-RevId: 246437315
Change-Id: Icad8300b4a8c3f501d9b4cd283bd3beabef88b72
2019-05-02 19:41:00 -07:00
Andrei Vagin 5f8225c009 runsc: don't create an empty network namespace if NetworkHost is set
With this change, we will be able to run runsc do in a host network namespace.

PiperOrigin-RevId: 246436660
Change-Id: I8ea18b1053c88fe2feed74239b915fe7a151ce34
2019-05-02 19:34:36 -07:00
Andrei Vagin c967fbdaa2 runsc: move test_app in a separate directory
Opensource tools (e. g. https://github.com/fatih/vim-go) can't hanlde more than
one golang package in one directory.

PiperOrigin-RevId: 246435962
Change-Id: I67487915e3838762424b2d168efc54ae34fb801f
2019-05-02 19:27:27 -07:00
Kevin Krakauer bf40fa2129 Replace dynamic macros with constants in memfd test.
PiperOrigin-RevId: 246433167
Change-Id: Idb9b6c20ee1da193176288dfd2f9d85ec0e69c54
2019-05-02 18:57:58 -07:00
Fabricio Voznika bbb6539114 Add [simple] network support to 'runsc do'
Sandbox always runsc with IP 192.168.10.2 and the peer
network adds 1 to the address (192.168.10.3). Sandbox
IP can be changed using --ip flag.

Here a few examples:
  sudo runsc do curl www.google.com
  sudo runsc do --ip=10.10.10.2 bash -c "echo 123 | netcat -l -p 8080"

PiperOrigin-RevId: 246421277
Change-Id: I7b3dce4af46a57300350dab41cb27e04e4b6e9da
2019-05-02 17:17:39 -07:00
Adin Scannell 2c1c1c9917 CONTRIBUTING: fix broken repository link
PiperOrigin-RevId: 246079174
Change-Id: I423078a065e0cc5d258d674b4f2f0680a5db0aee
2019-04-30 21:53:59 -07:00
Michael Pratt 23ca9886c6 Update reference to old type
PiperOrigin-RevId: 246036806
Change-Id: I5554a43a1f8146c927402db3bf98488a2da0fbe7
2019-04-30 15:42:39 -07:00
Jamie Liu 8bfb83d0ac Implement async MemoryFile eviction, and use it in CachingInodeOperations.
This feature allows MemoryFile to delay eviction of "optional"
allocations, such as unused cached file pages.

Note that this incidentally makes CachingInodeOperations writeback
asynchronous, in the sense that it doesn't occur until eviction; this is
necessary because between when a cached page becomes evictable and when
it's evicted, file writes (via CachingInodeOperations.Write) may dirty
the page.

As currently implemented, this feature won't meaningfully impact
steady-state memory usage or caching; the reclaimer goroutine will
schedule eviction as soon as it runs out of other work to do. Future CLs
increase caching by adding constraints on when eviction is scheduled.

PiperOrigin-RevId: 246014822
Change-Id: Ia85feb25a2de92a48359eb84434b6ec6f9bea2cb
2019-04-30 13:56:41 -07:00
Ian Gudger 81ecd8b6ea Implement the MSG_CTRUNC msghdr flag for Unix sockets.
Updates google/gvisor#206

PiperOrigin-RevId: 245880573
Change-Id: Ifa715e98d47f64b8a32b04ae9378d6cd6bd4025e
2019-04-29 21:21:08 -07:00
Fabricio Voznika 2843f2a956 Skip flaky ClockGettime.CputimeId
Test times out when it runs on a single core. Skip until the
bug in the Go runtime is fixed.

PiperOrigin-RevId: 245866466
Change-Id: Ic3e72131c27136d58b71f6b11acc78abf55895d4
2019-04-29 18:41:54 -07:00
Fabricio Voznika ddab854b9a Reduce memory allocations on serving path
Cache last used messages and reuse them for subsequent requests.
If more messages are needed, they are created outside the cache
on demand.

PiperOrigin-RevId: 245836910
Change-Id: Icf099ddff95df420db8e09f5cdd41dcdce406c61
2019-04-29 15:33:47 -07:00
Michael Pratt 4d52a55201 Change copyright notice to "The gVisor Authors"
Based on the guidelines at
https://opensource.google.com/docs/releasing/authors/.

1. $ rg -l "Google LLC" | xargs sed -i 's/Google LLC.*/The gVisor Authors./'
2. Manual fixup of "Google Inc" references.
3. Add AUTHORS file. Authors may request to be added to this file.
4. Point netstack AUTHORS to gVisor AUTHORS. Drop CONTRIBUTORS.

Fixes #209

PiperOrigin-RevId: 245823212
Change-Id: I64530b24ad021a7d683137459cafc510f5ee1de9
2019-04-29 14:26:23 -07:00
Nicolas Lacasse f4ce43e1f4 Allow and document bug ids in gVisor codebase.
PiperOrigin-RevId: 245818639
Change-Id: I03703ef0fb9b6675955637b9fe2776204c545789
2019-04-29 14:04:14 -07:00
Googler 38e6276447 n/a
PiperOrigin-RevId: 245810347
Change-Id: Ia5f4bb268a8207bd2a7d4c77c83cdfbe1483c64f
2019-04-29 13:19:51 -07:00
Tamir Duberstein ac8fca1ef4 Appease googletest deprecation
PiperOrigin-RevId: 245788366
Change-Id: I17bbecf8493132dbe95564c34c45b838194bfabb
2019-04-29 11:34:16 -07:00
Nicolas Lacasse 2df64cd6d2 createAt should return all errors from FindInode except ENOENT.
Previously, createAt was eating all errors from FindInode except for EACCES and
proceeding with the creation. This is incorrect, as FindInode can return many
other errors (like ENAMETOOLONG) that should stop creation.

This CL changes createAt to return all errors encountered except for ENOENT,
which we can ignore because we are about to create the thing.

PiperOrigin-RevId: 245773222
Change-Id: I1b317021de70f0550fb865506f6d8147d4aebc56
2019-04-29 10:30:24 -07:00
Ben Burkert 66bca6fc22 tcpip/adapters/gonet: add CloseRead & CloseWrite methods to Conn
Add the CloseRead & CloseWrite methods that performs shutdown on the
corresponding Read & Write sides of a connection.

Change-Id: I3996a2abdc7cd68a2becba44dc4bd9f0919d2ce1
PiperOrigin-RevId: 245537950
2019-04-26 22:46:45 -07:00
Kevin Krakauer 43dff57b87 Make raw sockets a toggleable feature disabled by default.
PiperOrigin-RevId: 245511019
Change-Id: Ia9562a301b46458988a6a1f0bbd5f07cbfcb0615
2019-04-26 16:51:46 -07:00
Adin Scannell 5749f64314 kvm: remove non-sane sanity check
Apparently some platforms don't have pSize < vSize.

Fixes #208

PiperOrigin-RevId: 245480998
Change-Id: I2a98229912f4ccbfcd8e79dfa355104f14275a9c
2019-04-26 13:53:12 -07:00
Bhasker Hariharan 228dc15fd1 Bump the AF_PACKET socket rcv buf size to 4MB by default.
Packet socket receive buffers default to the sysctl value of
net.core.rmem_default and are capped by net.core.rmem_max both
which are usually set to 208KB on most systems.

Since we can't expect every gVisor user to bump these we use
SO_RCVBUFFORCE to exceed the limit. This is possible as runsc runs
with CAP_NET_ADMIN outside the sandbox and can do this before
the FD is passed to the sentry inside the sandbox.

Updates #211

iperf output w/ 4MB buffer.

 iperf3 -c 172.17.0.2 -t 100
 Connecting to host 172.17.0.2, port 5201
 [  4] local 172.17.0.1 port 40378 connected to 172.17.0.2 port 5201
 [ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
 [  4]   0.00-1.00   sec  1.15 GBytes  9.89 Gbits/sec    0   1.02 MBytes
 [  4]   1.00-2.00   sec  1.18 GBytes  10.2 Gbits/sec    0   1.02 MBytes
 [  4]   2.00-3.00   sec   965 MBytes  8.09 Gbits/sec    0   1.02 MBytes
 [  4]   3.00-4.00   sec   942 MBytes  7.90 Gbits/sec    0   1.02 MBytes
 [  4]   4.00-5.00   sec   952 MBytes  7.99 Gbits/sec    0   1.02 MBytes
 [  4]   5.00-6.00   sec  1.14 GBytes  9.81 Gbits/sec    0   1.02 MBytes
 [  4]   6.00-7.00   sec  1.13 GBytes  9.68 Gbits/sec    0   1.02 MBytes
 [  4]   7.00-8.00   sec   930 MBytes  7.80 Gbits/sec    0   1.02 MBytes
 [  4]   8.00-9.00   sec  1.15 GBytes  9.91 Gbits/sec    0   1.02 MBytes
 [  4]   9.00-10.00  sec   938 MBytes  7.87 Gbits/sec    0   1.02 MBytes
 [  4]  10.00-11.00  sec   737 MBytes  6.18 Gbits/sec    0   1.02 MBytes
 [  4]  11.00-12.00  sec  1.16 GBytes  9.93 Gbits/sec    0   1.02 MBytes
 [  4]  12.00-13.00  sec   917 MBytes  7.69 Gbits/sec    0   1.02 MBytes
 [  4]  13.00-14.00  sec  1.19 GBytes  10.2 Gbits/sec    0   1.02 MBytes
 [  4]  14.00-15.00  sec  1.01 GBytes  8.70 Gbits/sec    0   1.02 MBytes
 [  4]  15.00-16.00  sec  1.20 GBytes  10.3 Gbits/sec    0   1.02 MBytes
 [  4]  16.00-17.00  sec  1.14 GBytes  9.80 Gbits/sec    0   1.02 MBytes
 ^C[  4]  17.00-17.60  sec   718 MBytes  10.1 Gbits/sec    0   1.02 MBytes
 - - - - - - - - - - - - - - - - - - - - - - - - -
 [ ID] Interval           Transfer     Bandwidth       Retr
 [  4]   0.00-17.60  sec  18.4 GBytes  8.98 Gbits/sec    0             sender
 [  4]   0.00-17.60  sec  0.00 Bytes  0.00 bits/sec                  receiver

PiperOrigin-RevId: 245470590
Change-Id: I1c08c5ee8345de6ac070513656a4703312dc3c00
2019-04-26 12:52:02 -07:00
Tamir Duberstein 59442238d4 Remove syscall tests' dependency on glog
PiperOrigin-RevId: 245469859
Change-Id: I0610e477cc3a884275852e83028ecfb501f2c039
2019-04-26 12:47:46 -07:00
Kevin Krakauer 5f13338d30 Fix reference counting bug in /proc/PID/fdinfo/.
PiperOrigin-RevId: 245452217
Change-Id: I7164d8f57fe34c17e601079eb9410a6d95af1869
2019-04-26 11:09:55 -07:00
Kevin Krakauer f4d34b420b Change name of sticky test arg.
PiperOrigin-RevId: 245451875
Change-Id: Icee2c4ed74564e77454c60d60f456454443ccadf
2019-04-26 11:08:08 -07:00
Michael Pratt f17cfa4d53 Perform explicit CPUID and FP state compatibility checks on restore
PiperOrigin-RevId: 245341004
Change-Id: Ic4d581039d034a8ae944b43e45e84eb2c3973657
2019-04-25 17:47:05 -07:00
Jamie Liu 6b76c172b4 Don't enforce NAME_MAX in fs.Dirent.walk().
Maximum filename length is filesystem-dependent, and obtained via
statfs::f_namelen. This limit is usually 255 bytes (NAME_MAX), but not
always. For example, VFAT supports filenames of up to 255... UCS-2
characters, which Linux conservatively takes to mean UTF-8-encoded
bytes: fs/fat/inode.c:fat_statfs(), FAT_LFN_LEN * NLS_MAX_CHARSET_SIZE.
As a result, Linux's VFS does not enforce NAME_MAX:

$ rg --maxdepth=1 '\WNAME_MAX\W' fs/ include/linux/
fs/libfs.c
38:     buf->f_namelen = NAME_MAX;
64:     if (dentry->d_name.len > NAME_MAX)

include/linux/relay.h
74:     char base_filename[NAME_MAX];   /* saved base filename */

include/linux/fscrypt.h
149: * filenames up to NAME_MAX bytes, since base64 encoding expands the length.

include/linux/exportfs.h
176: *    understanding that it is already pointing to a a %NAME_MAX+1 sized

Remove this check from core VFS, and add it to ramfs (and by extension
tmpfs), where it is actually applicable:
mm/shmem.c:shmem_dir_inode_operations.lookup == simple_lookup *does*
enforce NAME_MAX.

PiperOrigin-RevId: 245324748
Change-Id: I17567c4324bfd60e31746a5270096e75db963fac
2019-04-25 16:05:13 -07:00