2377 Commits

Author SHA1 Message Date
Nayana Bidari 04c284f8c2 Fix panic when calling dup2().
PiperOrigin-RevId: 329572337
2020-09-01 13:41:01 -07:00
Ayush Ranjan 723fb5c116 [go-marshal] Enable auto-marshalling for fs/tty.
PiperOrigin-RevId: 329564614
2020-09-01 13:02:17 -07:00
Nayana Bidari 0eae08bc9e Automated rollback of changelist 328350576
PiperOrigin-RevId: 329526153
2020-09-01 09:54:55 -07:00
Jamie Liu 6cdfa4fee0 Don't use read-only host FD for writable gofer dentries in VFS2.
As documented for gofer.dentry.hostFD.

PiperOrigin-RevId: 329372319
2020-08-31 13:57:19 -07:00
gVisor bot 911cecaa34 Implement walk in gvisor verity fs
Implement directory walking in the gVisor verity file system. For each step,
the child dentry is verified against a verified parent root hash.

PiperOrigin-RevId: 329358747
2020-08-31 12:52:21 -07:00
Ting-Yu Wang ba25485d96 stateify: Bring back struct field and type names in pretty print
PiperOrigin-RevId: 329349158
2020-08-31 12:06:00 -07:00
Nicolas Lacasse f6ddcbefac Fix kernfs.Dentry reference leak.
PiperOrigin-RevId: 329036994
2020-08-28 17:20:17 -07:00
Ghanan Gowripalan d5787f628c Don't bind loopback to all IPs in an IPv6 subnet
An earlier change considered the loopback bound to all addresses in an
assigned subnet. This should have only been done for IPv4 to maintain
compatibility with Linux:

```
$ ip addr show dev lo
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group ...
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
$ ping 2001:db8::1
PING 2001:db8::1(2001:db8::1) 56 data bytes
^C
--- 2001:db8::1 ping statistics ---
4 packets transmitted, 0 received, 100% packet loss, time 3062ms

$ ping 2001:db8::2
PING 2001:db8::2(2001:db8::2) 56 data bytes
^C
--- 2001:db8::2 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2030ms

$ sudo ip addr add 2001:db8::1/64 dev lo
$ ping 2001:db8::1
PING 2001:db8::1(2001:db8::1) 56 data bytes
64 bytes from 2001:db8::1: icmp_seq=1 ttl=64 time=0.055 ms
64 bytes from 2001:db8::1: icmp_seq=2 ttl=64 time=0.074 ms
64 bytes from 2001:db8::1: icmp_seq=3 ttl=64 time=0.073 ms
64 bytes from 2001:db8::1: icmp_seq=4 ttl=64 time=0.071 ms
^C
--- 2001:db8::1 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3075ms
rtt min/avg/max/mdev = 0.055/0.068/0.074/0.007 ms
$ ping 2001:db8::2
PING 2001:db8::2(2001:db8::2) 56 data bytes
From 2001:db8::1 icmp_seq=1 Destination unreachable: No route
From 2001:db8::1 icmp_seq=2 Destination unreachable: No route
From 2001:db8::1 icmp_seq=3 Destination unreachable: No route
From 2001:db8::1 icmp_seq=4 Destination unreachable: No route
^C
--- 2001:db8::2 ping statistics ---
4 packets transmitted, 0 received, +4 errors, 100% packet loss, time 3070ms
```
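
The resulting ownership rule can be sketched in Go as follows. This is a minimal illustration, not netstack code: `loopbackOwns` is a hypothetical helper, and it assumes Go 1.18 or later for `net/netip`.

```go
package main

import (
	"fmt"
	"net/netip"
)

// loopbackOwns reports whether a loopback interface that was assigned the
// given address/prefix should treat addr as one of its own addresses.
// Hypothetical helper for illustration only; it is not part of netstack.
func loopbackOwns(assigned, addr string) bool {
	p, err := netip.ParsePrefix(assigned)
	if err != nil {
		return false
	}
	a, err := netip.ParseAddr(addr)
	if err != nil {
		return false
	}
	if p.Addr().Is4() {
		// IPv4: every address in the assigned subnet is considered local.
		return p.Contains(a)
	}
	// IPv6: only the exact assigned address is considered local.
	return p.Addr() == a
}

func main() {
	fmt.Println(loopbackOwns("192.0.2.1/24", "192.0.2.2"))     // true
	fmt.Println(loopbackOwns("2001:db8::1/64", "2001:db8::1")) // true
	fmt.Println(loopbackOwns("2001:db8::1/64", "2001:db8::2")) // false
}
```

The three calls mirror the ping results in the transcript: any address in the IPv4 subnet answers, while for IPv6 only the assigned address does.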

Test: integration_test.TestLoopbackAcceptAllInSubnet
PiperOrigin-RevId: 329011566
2020-08-28 14:39:30 -07:00
Rahat Mahmood b4820e5986 Implement StatFS for various VFS2 filesystems.
This mainly involved enabling kernfs' client filesystems to provide a
StatFS implementation.

Fixes #3411, #3515.

PiperOrigin-RevId: 329009864
2020-08-28 14:31:11 -07:00
Ghanan Gowripalan bdd5996a73 Improve type safety for network protocol options
The existing implementations of NetworkProtocol.{Set}Option take
arguments of an empty interface type which all types (implicitly)
implement; any type may be passed to the functions.

This change introduces marker interfaces for network protocol options
that may be set or queried which network protocol option types implement
to ensure that invalid types are caught at compile time. Different
interfaces are used to allow the compiler to enforce read-only or
set-only socket options.
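
A minimal Go sketch of the marker-interface pattern described above. All names here (`SettableOption`, `GettableOption`, `StatsOption`, `protocol`) are illustrative assumptions and not the identifiers used in the change itself.

```go
package main

import "fmt"

// SettableOption marks option types that may be passed to SetOption.
type SettableOption interface{ isSettableOption() }

// GettableOption marks option types that may be passed to Option.
type GettableOption interface{ isGettableOption() }

// DefaultTTLOption may be both set and queried.
type DefaultTTLOption uint8

func (*DefaultTTLOption) isSettableOption() {}
func (*DefaultTTLOption) isGettableOption() {}

// StatsOption is read-only: it implements only the gettable marker.
type StatsOption struct{ Packets uint64 }

func (*StatsOption) isGettableOption() {}

// protocol is a toy stand-in for a network protocol implementation.
type protocol struct {
	ttl     uint8
	packets uint64
}

// SetOption accepts only types that carry the settable marker.
func (p *protocol) SetOption(opt SettableOption) error {
	switch v := opt.(type) {
	case *DefaultTTLOption:
		p.ttl = uint8(*v)
		return nil
	default:
		return fmt.Errorf("unknown option %T", opt)
	}
}

// Option accepts only types that carry the gettable marker.
func (p *protocol) Option(opt GettableOption) error {
	switch v := opt.(type) {
	case *DefaultTTLOption:
		*v = DefaultTTLOption(p.ttl)
	case *StatsOption:
		v.Packets = p.packets
	default:
		return fmt.Errorf("unknown option %T", opt)
	}
	return nil
}

func main() {
	p := &protocol{packets: 7}
	ttl := DefaultTTLOption(64)
	fmt.Println(p.SetOption(&ttl))
	var stats StatsOption
	fmt.Println(p.Option(&stats), stats.Packets)
	// p.SetOption(&stats) would not compile: *StatsOption lacks the
	// settable marker method, so the mistake is caught at build time.
}
```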

PiperOrigin-RevId: 328980359
2020-08-28 11:50:17 -07:00
Dean Deng 8b9cb36d1c Fix EOF handling for splice.
Also, add corresponding EOF tests for splice/sendfile.

Discovered by syzkaller.

PiperOrigin-RevId: 328975990
2020-08-28 11:28:28 -07:00
Kevin Krakauer b3ff31d041 fix panic when calling SO_ORIGINAL_DST without initializing iptables
Reported-by: syzbot+074ec22c42305725b79f@syzkaller.appspotmail.com
PiperOrigin-RevId: 328963899
2020-08-28 10:35:18 -07:00
Ghanan Gowripalan 8ae0ab722c Use a single NetworkEndpoint per address
This change was already done as of
https://github.com/google/gvisor/commit/1736b2208f but
https://github.com/google/gvisor/commit/a174aa7597 conflicted with that
change and it was missed in reviews.

This change fixes the conflict.

PiperOrigin-RevId: 328920372
2020-08-28 05:09:15 -07:00
Ayush Ranjan 421e35020b [go-marshal] Enable auto-marshalling for tundev.
PiperOrigin-RevId: 328863725
2020-08-27 19:27:18 -07:00
Dean Deng 84f04909c2 Fix vfs2 pipe behavior when splicing to a non-pipe.
Fixes *.sh Java runtime tests, where splice()-ing from a pipe to /dev/zero
would not actually empty the pipe.

There was no guarantee that the data would actually be consumed on a splice
operation unless the output file's implementation of Write/PWrite actually
called VFSPipeFD.CopyIn. Now, whatever bytes are "written" are consumed
regardless of whether CopyIn is called or not.

Furthermore, the number of bytes in the IOSequence for reads is now capped at
the amount of data actually available. Before, splicing to /dev/zero would
always return the requested splice size without taking the actual available
data into account.
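
The two behaviours above (capping at the available data and always consuming what was written) can be sketched in Go like this. The `pipe` and `devZero` types and the `spliceTo` method are hypothetical simplifications, not the VFS2 pipe code.

```go
package main

import (
	"fmt"
	"io"
)

// pipe is a toy in-memory pipe; buf holds the unread data.
type pipe struct{ buf []byte }

// devZero mimics a sink that accepts every write without ever calling back
// into the pipe to copy the data out.
type devZero struct{}

func (devZero) Write(b []byte) (int, error) { return len(b), nil }

// spliceTo moves up to n bytes from the pipe to w. The request is capped at
// the data actually available, and the bytes w reports as written are
// consumed from the pipe whether or not w looked at them.
func (p *pipe) spliceTo(w io.Writer, n int) (int, error) {
	if n < 0 {
		n = 0
	}
	if n > len(p.buf) {
		n = len(p.buf)
	}
	written, err := w.Write(p.buf[:n])
	p.buf = p.buf[written:]
	return written, err
}

func main() {
	p := &pipe{buf: []byte("hello")}
	n, err := p.spliceTo(devZero{}, 4096)
	fmt.Println(n, err, len(p.buf)) // 5 <nil> 0
}
```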

This change also refactors the case where an input file is spliced into an
output pipe so that it follows a similar pattern, which is arguably cleaner
anyway.

Updates #3576.

PiperOrigin-RevId: 328843954
2020-08-27 16:57:40 -07:00
Andrei Vagin dc008fbbcc unix: return ECONNREFUSED if a socket file exists but a socket isn't bound to it
PiperOrigin-RevId: 328843560
2020-08-27 16:52:02 -07:00
Ayush Ranjan 57877b420c [go-marshal] Support for usermem.IOOpts.
PiperOrigin-RevId: 328839759
2020-08-27 16:30:05 -07:00
Ghanan Gowripalan 6f8fb7e0db Improve type safety for socket options
The existing implementations of {G,S}etSockOpt take arguments of an
empty interface type which all types (implicitly) implement; any
type may be passed to the functions.

This change introduces marker interfaces for socket options that may be
set or queried which socket option types implement to ensure that invalid
types are caught at compile time. Different interfaces are used to allow
the compiler to enforce read-only or set-only socket options.

Fixes #3714.

RELNOTES: n/a
PiperOrigin-RevId: 328832161
2020-08-27 15:46:44 -07:00
Ghanan Gowripalan dc81eb9c37 Add function to get error from a tcpip.Endpoint
In an upcoming CL, socket option types are made to implement a marker
interface with pointer receivers. Since this results in calling methods
of an interface with a pointer, we incur an allocation when attempting
to get an Endpoint's last error with the current implementation.

When calling the method of an interface, the compiler is unable to
determine what the interface implementation does with the pointer
(since calling a method on an interface uses virtual dispatch at runtime,
the compiler does not know what the interface method will do), so it
allocates on the heap to be safe in case an implementation continues to
hold the pointer after the function returns (the reference escapes the
scope of the object).

In the example below, the compiler does not know what b.foo does with
the reference to a, so it allocates a on the heap, as the reference to a
may escape the scope of a.
```
var a int
var b someInterface
b.foo(&a)
```

This change removes the opportunity for that allocation.

RELNOTES: n/a
PiperOrigin-RevId: 328796559
2020-08-27 12:50:19 -07:00
Kevin Krakauer 01a35a2f19 ip6tables: (de)serialize ip6tables structs
More implementation+testing to follow.

#3549.

PiperOrigin-RevId: 328770160
2020-08-27 10:53:49 -07:00
Fabricio Voznika 32e7a54f7f Make flag propagation automatic
Use reflection and tags to provide automatic conversion from
Config to flags. This makes adding new flags less error-prone,
skips flags using default values (easier to read), and makes
tests correctly use default flag values for test Configs.
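
A rough Go sketch of the idea, using `reflect` and struct tags. The `Config` fields, the `flag` tag key, and `toFlags` are assumptions made for illustration; the real Config and its tag format may differ.

```go
package main

import (
	"fmt"
	"reflect"
)

// Config is a hypothetical configuration struct. The `flag` tag names the
// command-line flag that each field maps to.
type Config struct {
	Debug   bool   `flag:"debug"`
	Network string `flag:"network"`
	NumCPU  int    `flag:"num-cpu"`
	Comment string // no tag: never emitted as a flag
}

// toFlags returns "--name=value" for every tagged field of c whose value
// differs from the same field in def, so defaults are skipped.
func toFlags(c, def Config) []string {
	var out []string
	cv, dv := reflect.ValueOf(c), reflect.ValueOf(def)
	t := cv.Type()
	for i := 0; i < t.NumField(); i++ {
		name, ok := t.Field(i).Tag.Lookup("flag")
		if !ok {
			continue // untagged fields are not flags
		}
		cur, dflt := cv.Field(i).Interface(), dv.Field(i).Interface()
		if reflect.DeepEqual(cur, dflt) {
			continue // default value: omit to keep the command line short
		}
		out = append(out, fmt.Sprintf("--%s=%v", name, cur))
	}
	return out
}

func main() {
	def := Config{Network: "sandbox", NumCPU: 1}
	c := def
	c.Debug = true
	c.NumCPU = 4
	fmt.Println(toFlags(c, def)) // [--debug=true --num-cpu=4]
}
```

Adding a new flag in this scheme means adding one tagged field; no separate conversion code has to be kept in sync.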

Updates #3494

PiperOrigin-RevId: 328662070
2020-08-26 20:24:41 -07:00
gVisor bot a4b1c6f5a4 Merge pull request #3742 from lubinszARM:pr_n1_1
PiperOrigin-RevId: 328639254
2020-08-26 17:10:16 -07:00
Adin Scannell 983a55aa06 Support stdlib analyzers with nogo.
This immediately revealed an escape analysis violation (!), where
the sync.Map was being used in a context where escapes were not
allowed. This is a relatively minor fix and is included.

PiperOrigin-RevId: 328611237
2020-08-26 14:42:35 -07:00
Nicolas Lacasse 366f1a8f16 Remove spurious fd.IncRef().
PiperOrigin-RevId: 328583461
2020-08-26 12:30:44 -07:00
Nicolas Lacasse 83a8b309e9 tmpfs: Allow xattrs in the trusted namespace if creds has CAP_SYS_ADMIN.
This is needed to support the overlay opaque attribute.

PiperOrigin-RevId: 328552985
2020-08-26 10:05:34 -07:00
Dean Deng df3c105f49 Use new reference count utility throughout gvisor.
This uses the refs_vfs2 template in vfs2 as well as objects common to vfs1 and
vfs2. Note that vfs1-only refcounts are not replaced, since vfs1 will be deleted
soon anyway.

The following structs now use the new tool, with leak check enabled:
devpts:rootInode
fuse:inode
kernfs:Dentry
kernfs:dir
kernfs:readonlyDir
kernfs:StaticDirectory
proc:fdDirInode
proc:fdInfoDirInode
proc:subtasksInode
proc:taskInode
proc:tasksInode
vfs:FileDescription
vfs:MountNamespace
vfs:Filesystem
sys:dir
kernel:FSContext
kernel:ProcessGroup
kernel:Session
shm:Shm
mm:aioMappable
mm:SpecialMappable
transport:queue

And the following use the template, but because they currently are not leak
checked, a TODO is left instead of enabling leak check in this patch:
kernel:FDTable
tun:tunEndpoint

Updates #1486.

PiperOrigin-RevId: 328460377
2020-08-25 21:04:04 -07:00
Jamie Liu 247dcd62d4 Return non-zero size for tmpfs statfs(2).
This does not implement accepting or enforcing any size limit, which will be
more complex and has performance implications; it just returns a fixed non-zero
size.

Updates #1936

PiperOrigin-RevId: 328428588
2020-08-25 16:40:02 -07:00
Dean Deng cb573c8e0b Expose basic coverage information to userspace through kcov interface.
In Linux, a kernel configuration is set that compiles the kernel with a
custom function that is called at the beginning of every basic block, which
updates the memory-mapped coverage information. The Go coverage tool does not
allow us to inject arbitrary instructions into basic blocks, but it does
provide data that we can convert to a kcov-like format and transfer them to
userspace through a memory mapping.

Note that this is not a strict implementation of kcov, which is especially
tricky to do because we do not have the same coverage tools available in Go
that are available for the actual Linux kernel. In Linux, a kernel
configuration is set that compiles the kernel with a custom function that is
called at the beginning of every basic block to write program counters to the
kcov memory mapping. In Go, however, coverage tools only give us a count of
basic blocks as they are executed. Every time we return to userspace, we
collect the coverage information and write out PCs for each block that was
executed, providing userspace with the illusion that the kcov data is always
up to date. For convenience, we also generate a unique synthetic PC for each
block instead of using actual PCs. Finally, we do not provide thread-specific
coverage data (in Linux, each kcov instance only contains PCs executed by the
thread owning it); instead, we will supply data for any file specified by
--instrumentation_filter.
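
The conversion from per-block counters to a kcov-style PC list can be sketched roughly as below. `syntheticPC`, `flush`, and the base address are hypothetical; only the layout (first word holds the number of recorded PCs) follows the kcov trace format.

```go
package main

import "fmt"

// syntheticPC derives a unique fake program counter from a block index.
// The base value is arbitrary and purely illustrative.
func syntheticPC(block int) uint64 {
	const base = 0xffffffff00000000
	return base + uint64(block)
}

// flush appends one synthetic PC to area for every block executed since the
// previous call, then clears that block's counter. area[0] holds the number
// of PCs recorded so far; area must have at least one element.
func flush(counters []uint32, area []uint64) {
	for i, n := range counters {
		if n == 0 {
			continue // block not executed since the last flush
		}
		pos := area[0] + 1
		if pos >= uint64(len(area)) {
			return // area is full: stop recording
		}
		area[pos] = syntheticPC(i)
		area[0] = pos
		counters[i] = 0
	}
}

func main() {
	counters := []uint32{0, 3, 0, 1} // blocks 1 and 3 were executed
	area := make([]uint64, 8)
	flush(counters, area)
	fmt.Println(area[0]) // 2
	fmt.Printf("0x%x 0x%x\n", area[1], area[2])
}
```

Calling something like `flush` on every return to userspace is what would give userspace the impression that the mapping is always up to date.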

Also, fix issue in nogo that was causing pkg/coverage:coverage_nogo
compilation to fail.

PiperOrigin-RevId: 328426526
2020-08-25 16:28:45 -07:00
Toshi Kikuchi 70a7a3ac70 Only send an ICMP error message if UDP checksum is valid.
Test:
 - TestV4UnknownDestination
 - TestV6UnknownDestination
PiperOrigin-RevId: 328424137
2020-08-25 16:15:29 -07:00
Ayush Ranjan 430487c9e7 [go-marshal] Enable auto-marshalling for host tty.
PiperOrigin-RevId: 328415633
2020-08-25 15:29:03 -07:00
Nicolas Lacasse c28bbee993 overlay: clonePrivateMount must pass a Dentry reference to MakeVirtualDentry.
PiperOrigin-RevId: 328410065
2020-08-25 15:00:31 -07:00
Bhasker Hariharan 1f0d23c7ad Clarify comment on NetworkProtocolNumber.
The values used for this field in Netstack are the EtherType values of the
protocol in an Ethernet frame. E.g., header.IPv4ProtocolNumber is 0x0800, not
the IP protocol number for IPv4, which is 4. Similarly,
header.IPv6ProtocolNumber is 0x86DD, whereas the IPv6 protocol number is 41.
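
To make the distinction concrete, here is a small Go sketch that keeps the two number spaces apart. The constant and function names are illustrative and are not the ones defined in the `header` package.

```go
package main

import "fmt"

// EtherType values, carried in the Ethernet header.
const (
	etherTypeIPv4 uint16 = 0x0800
	etherTypeIPv6 uint16 = 0x86DD
)

// IP protocol numbers, carried in the IP header.
const (
	ipProtoIPv4 uint8 = 4  // IPv4 encapsulated in IP
	ipProtoIPv6 uint8 = 41 // IPv6 encapsulated in IP
)

// etherTypeName maps an EtherType to a protocol name. Passing an IP protocol
// number such as 4 or 41 here yields "unknown", which is the confusion the
// clarified comment is meant to prevent.
func etherTypeName(t uint16) string {
	switch t {
	case etherTypeIPv4:
		return "IPv4"
	case etherTypeIPv6:
		return "IPv6"
	}
	return "unknown"
}

func main() {
	fmt.Printf("IPv4: EtherType 0x%04x, IP protocol %d\n", etherTypeIPv4, ipProtoIPv4)
	fmt.Printf("IPv6: EtherType 0x%04x, IP protocol %d\n", etherTypeIPv6, ipProtoIPv6)
	fmt.Println(etherTypeName(0x86DD), etherTypeName(41))
}
```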

See:
  - https://www.iana.org/assignments/ieee-802-numbers/ieee-802-numbers.xhtml (For EtherType)
  - https://www.iana.org/assignments/protocol-numbers/protocol-numbers.xhtml (For ProtocolNumbers)
PiperOrigin-RevId: 328407293
2020-08-25 14:47:08 -07:00
Kevin Krakauer 3cba0a41d9 remove iptables sockopt special cases
iptables sockopts were kludged into an unnecessary check; this properly
relegates them to the {get,set}SockOptIP functions.

PiperOrigin-RevId: 328395135
2020-08-25 13:43:26 -07:00
Adin Scannell b0c53f8475 Add nogo support to go_binary and go_test targets.
Updates #3374

PiperOrigin-RevId: 328378700
2020-08-25 12:18:25 -07:00
gVisor bot b83758cd87 Change "Fd" member to "FD" according to convention
PiperOrigin-RevId: 328374775
2020-08-25 11:59:33 -07:00
Sam Balana a174aa7597 Add option to replace linkAddrCache with neighborCache
This change adds an option to replace the current implementation of ARP through
linkAddrCache, with an implementation of NUD through neighborCache. Switching
to using NUD for both ARP and NDP is beneficial for the reasons described by
RFC 4861 Section 3.1:

  "[Using NUD] significantly improves the robustness of packet delivery in the
  presence of failing routers, partially failing or partitioned links, or nodes
  that change their link-layer addresses. For instance, mobile nodes can move
  off-link without losing any connectivity due to stale ARP caches."

  "Unlike ARP, Neighbor Unreachability Detection detects half-link failures and
  avoids sending traffic to neighbors with which two-way connectivity is
  absent."

This change also exposes the API for querying and operating on the
neighbor cache. Operations include:
  - Create a static entry
  - List all entries
  - Delete all entries
  - Remove an entry by address

This also exposes the API to change the NUD protocol constants on a per-NIC
basis to allow Neighbor Discovery to operate over links with widely varying
performance characteristics. See [RFC 4861 Section 10][1] for the list of
constants.

Finally, an API for subscribing to NUD state changes is exposed through
NUDDispatcher. See [RFC 4861 Appendix C][2] for the list of edges.

Tests:
 pkg/tcpip/network/arp:arp_test
 + TestDirectRequest

 pkg/tcpip/network/ipv6:ipv6_test
 + TestLinkResolution
 + TestNDPValidation
 + TestNeighorAdvertisementWithTargetLinkLayerOption
 + TestNeighorSolicitationResponse
 + TestNeighorSolicitationWithSourceLinkLayerOption
 + TestRouterAdvertValidation

 pkg/tcpip/stack:stack_test
 + TestCacheWaker
 + TestForwardingWithFakeResolver
 + TestForwardingWithFakeResolverManyPackets
 + TestForwardingWithFakeResolverManyResolutions
 + TestForwardingWithFakeResolverPartialTimeout
 + TestForwardingWithFakeResolverTwoPackets
 + TestIPv6SourceAddressSelectionScopeAndSameAddress

[1]: https://tools.ietf.org/html/rfc4861#section-10
[2]: https://tools.ietf.org/html/rfc4861#appendix-C

Fixes #1889
Fixes #1894
Fixes #1895
Fixes #1947
Fixes #1948
Fixes #1949
Fixes #1950

PiperOrigin-RevId: 328365034
2020-08-25 11:09:33 -07:00
Nayana Bidari b26f7503b5 Support SO_LINGER socket option.
When SO_LINGER option is enabled, the close will not return until all the
queued messages are sent and acknowledged for the socket or linger timeout is
reached. If the option is not set, close will return immediately. This option
is mainly supported for connection oriented protocols such as TCP.

PiperOrigin-RevId: 328350576
2020-08-25 10:04:07 -07:00
Bhasker Hariharan ae332d96e4 Fix TCP_LINGER2 behavior to match linux.
We still deviate a bit from linux in how long we will actually wait in
FIN-WAIT-2. Linux seems to cap it with TIME_WAIT_LEN and it's not completely
obvious as to why it's done that way. For now I think we can ignore that and
fix it if it really is an issue.

PiperOrigin-RevId: 328324922
2020-08-25 07:17:32 -07:00
Dean Deng c61f6fcf6a Fix deadlock in gofer direct IO.
Fixes several java runtime tests:
java/nio/channels/FileChannel/directio/ReadDirect.java
java/nio/channels/FileChannel/directio/PreadDirect.java

Updates #3576.

PiperOrigin-RevId: 328281849
2020-08-25 00:26:06 -07:00
Ghanan Gowripalan f1821fdb68 Automated rollback of changelist 327325153
PiperOrigin-RevId: 328259353
2020-08-24 20:41:09 -07:00
Jamie Liu 4ad858a586 Flush in fsimpl/gofer.regularFileFD.OnClose() if there are no dirty pages.
This is closer to indistinguishable from VFS1 behavior.

PiperOrigin-RevId: 328256068
2020-08-24 20:06:16 -07:00
Bin Lu 57bfbed1d6 Device major number greater than 2 digits in /proc/self/maps on arm64 N1 machine
Signed-off-by: Bin Lu <bin.lu@arm.com>
2020-08-24 22:41:01 -04:00
gVisor bot ee041b60bf Add check for same source in merkle tree lib
If the data is in the same Reader as the merkle tree, we should verify
from the first layer in the tree, instead of from the beginning.

PiperOrigin-RevId: 328230988
2020-08-24 16:34:15 -07:00
Zach Koopmans 2b0b5e2521 Remove go profiling flag from dockerutil.
Go profiling was removed from runsc debug in a previous change.

PiperOrigin-RevId: 328203826
2020-08-24 13:53:10 -07:00
Michael Pratt ab6c474210 Bump build constraints to 1.17
This enables pre-release testing with 1.16. The intention is to replace these
with a nogo check before the next release.

PiperOrigin-RevId: 328193911
2020-08-24 12:58:39 -07:00
Ghanan Gowripalan 339d266be4 Consider loopback bound to all addresses in subnet
When a loopback interface is configured with an address and associated
subnet, the loopback should treat all addresses in that subnet as an
address it owns.

This is mimicking linux behaviour as seen below:
```
$ ip addr show dev lo
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group ...
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
$ ping 192.0.2.1
PING 192.0.2.1 (192.0.2.1) 56(84) bytes of data.
^C
--- 192.0.2.1 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1018ms

$ ping 192.0.2.2
PING 192.0.2.2 (192.0.2.2) 56(84) bytes of data.
^C
--- 192.0.2.2 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2039ms

$ sudo ip addr add 192.0.2.1/24 dev lo
$ ip addr show dev lo
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group ...
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet 192.0.2.1/24 scope global lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
$ ping 192.0.2.1
PING 192.0.2.1 (192.0.2.1) 56(84) bytes of data.
64 bytes from 192.0.2.1: icmp_seq=1 ttl=64 time=0.131 ms
64 bytes from 192.0.2.1: icmp_seq=2 ttl=64 time=0.046 ms
64 bytes from 192.0.2.1: icmp_seq=3 ttl=64 time=0.048 ms
^C
--- 192.0.2.1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2042ms
rtt min/avg/max/mdev = 0.046/0.075/0.131/0.039 ms
$ ping 192.0.2.2
PING 192.0.2.2 (192.0.2.2) 56(84) bytes of data.
64 bytes from 192.0.2.2: icmp_seq=1 ttl=64 time=0.131 ms
64 bytes from 192.0.2.2: icmp_seq=2 ttl=64 time=0.069 ms
64 bytes from 192.0.2.2: icmp_seq=3 ttl=64 time=0.049 ms
64 bytes from 192.0.2.2: icmp_seq=4 ttl=64 time=0.035 ms
^C
--- 192.0.2.2 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3049ms
rtt min/avg/max/mdev = 0.035/0.071/0.131/0.036 ms
```

Test: integration_test.TestLoopbackAcceptAllInSubnet
PiperOrigin-RevId: 328188546
2020-08-24 12:28:35 -07:00
Dean Deng bae25d2a08 Update inotify documentation for gofer filesystem.
We now allow hard links to be created within gofer fs (see
github.com/google/gvisor/commit/f20e63e31b56784c596897e86f03441f9d05f567).
Update the inotify documentation accordingly.

PiperOrigin-RevId: 328177485
2020-08-24 11:30:44 -07:00
gVisor bot e7270096a7 Implement GetFilesystem for verity fs
verity GetFilesystem is implemented by mounting the underlying file
system, saving the mount, and storing both the underlying root dentry and
the root Merkle file dentry in verity's root dentry.

PiperOrigin-RevId: 327959334
2020-08-22 09:54:50 -07:00
Ayush Ranjan 17bc5c1b00 [vfs] Allow mountpoint to be an existing non-directory.
Unlike Linux mount(2), the OCI spec allows mounting on top of an existing
non-directory file.

PiperOrigin-RevId: 327914342
2020-08-21 20:06:01 -07:00
Ting-Yu Wang 9607515aed stateify: Fix pretty print not printing odd numbered fields.
PiperOrigin-RevId: 327902182
2020-08-21 17:34:26 -07:00