Commit Graph

1226 Commits

Author SHA1 Message Date
Dean Deng 38330e9377 Update symlink traversal limit when resolving interpreter path.
When execveat is called on an interpreter script, the symlink count for
resolving the script path should be separate from the count for resolving the
corresponding interpreter. An ELOOP error should not occur if we do not hit
the symlink limit along any individual path, even if the total number of
symlinks encountered exceeds the limit.

Closes #574

PiperOrigin-RevId: 277358474
2019-10-29 13:59:28 -07:00
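A minimal sketch of the per-path symlink budget described in 38330e9377. This is not the gVisor loader code: the `links` map, `resolve`, `resolveScriptAndInterpreter`, and the `limit` parameter are hypothetical names chosen for illustration. The point is only that the script path and the interpreter path each start with a fresh traversal count.

```go
package main

import (
	"errors"
	"fmt"
)

// errLoop stands in for ELOOP in this sketch.
var errLoop = errors.New("too many levels of symbolic links")

// resolve follows symlinks in links (a hypothetical path -> target map)
// starting at path, allowing at most limit traversals for this one path.
func resolve(links map[string]string, path string, limit int) (string, error) {
	remaining := limit
	for {
		target, isLink := links[path]
		if !isLink {
			return path, nil
		}
		if remaining == 0 {
			return "", errLoop
		}
		remaining--
		path = target
	}
}

// resolveScriptAndInterpreter gives each path its own budget, so the total
// number of symlinks followed may exceed limit without producing errLoop.
func resolveScriptAndInterpreter(links map[string]string, script, interp string, limit int) (string, string, error) {
	s, err := resolve(links, script, limit)
	if err != nil {
		return "", "", err
	}
	i, err := resolve(links, interp, limit)
	if err != nil {
		return "", "", err
	}
	return s, i, nil
}

func main() {
	links := map[string]string{
		"/s2": "/s1", "/s1": "/script.sh",
		"/i2": "/i1", "/i1": "/bin/sh",
	}
	// Four symlinks in total, but only two along each path.
	s, i, err := resolveScriptAndInterpreter(links, "/s2", "/i2", 2)
	fmt.Println(s, i, err) // /script.sh /bin/sh <nil>
}
```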
Michael Pratt c0b8fd4b6a Update build tags to allow Go 1.14
Currently there are no ABI changes. We should check again closer to release.

PiperOrigin-RevId: 277349744
2019-10-29 13:18:16 -07:00
Dean Deng 2e00771d5a Refactor logic for loadExecutable.
Separate the handling of filenames and *fs.File objects in a more explicit way
for the sake of clarity.

PiperOrigin-RevId: 277344203
2019-10-29 12:51:29 -07:00
Ian Gudger 7d80e85835 Allow waiting for Endpoint worker goroutines to finish.
Updates #837

PiperOrigin-RevId: 277325162
2019-10-29 11:32:48 -07:00
gVisor bot 8b04e2dd8b Merge pull request #1087 from xiaobo55x:fstat_Nlink
PiperOrigin-RevId: 277324979
2019-10-29 11:27:57 -07:00
Ghanan Gowripalan 41e2df1bde Support iterating an NDP options buffer.
This change helps support iterating over an NDP options buffer so that
implementations can handle all the NDP options present in an NDP packet.

Note, this change does not yet actually handle these options, it just provides
the tools to do so (in preparation for NDP's Prefix, Parameter, and a complete
implementation of Neighbor Discovery).

Tests: Unittests to make sure we can iterate over a valid NDP options buffer
that may contain multiple options. Also tests to check an iterator before
using it to see if the NDP options buffer is malformed.
PiperOrigin-RevId: 277312487
2019-10-29 10:30:21 -07:00
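To illustrate what iterating an NDP options buffer (41e2df1bde) involves, here is a small sketch. It assumes the RFC 4861 option layout: a one-byte type, a one-byte length counted in units of 8 octets (including the two header bytes), and a length of zero being invalid. The `option` type and `parseOptions` function are hypothetical and do not reflect the actual netstack API.

```go
package main

import (
	"errors"
	"fmt"
)

// option is a hypothetical view of one NDP option.
type option struct {
	Type byte
	Body []byte // bytes following the 2-byte type/length header
}

var errMalformed = errors.New("malformed NDP options buffer")

// parseOptions walks buf option by option and rejects buffers whose
// length fields are zero or run past the end of the buffer.
func parseOptions(buf []byte) ([]option, error) {
	var opts []option
	for len(buf) > 0 {
		if len(buf) < 2 {
			return nil, errMalformed
		}
		n := int(buf[1]) * 8 // length is in units of 8 octets
		if n == 0 || n > len(buf) {
			return nil, errMalformed
		}
		opts = append(opts, option{Type: buf[0], Body: buf[2:n]})
		buf = buf[n:]
	}
	return opts, nil
}

func main() {
	buf := []byte{
		1, 1, 1, 2, 3, 4, 5, 6, // type 1, length 1 (8 bytes)
		5, 1, 0, 0, 0, 0, 5, 220, // type 5, length 1 (8 bytes)
	}
	opts, err := parseOptions(buf)
	fmt.Println(len(opts), err) // 2 <nil>
}
```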
Dean Deng 29273b0384 Disallow execveat on interpreter scripts with fd opened with O_CLOEXEC.
When an interpreter script is opened with O_CLOEXEC and the resulting fd is
passed into execveat, an ENOENT error should occur (the script would otherwise
be inaccessible to the interpreter). This matches the actual behavior of
Linux's execveat.

PiperOrigin-RevId: 277306680
2019-10-29 10:04:39 -07:00
Ghanan Gowripalan 0864549ecc Use the user supplied TCP MSS when creating a new active socket
This change supports using a user supplied TCP MSS for new active TCP
connections. Note, the user supplied MSS must be less than or equal to the
maximum possible MSS for a TCP connection's route. If it is greater than the
maximum possible MSS, the maximum possible MSS will be used as the connection's
MSS instead.

This change does not use this user supplied MSS for connections accepted from
listening sockets - that will come in a later change.

Test: Test that outgoing TCP SYN segments contain a TCP MSS option with the user
supplied MSS if it is not greater than the maximum possible MSS for the route.
PiperOrigin-RevId: 277185125
2019-10-28 18:20:36 -07:00
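The clamping rule stated in 0864549ecc can be sketched as follows. The function name `effectiveMSS` is hypothetical, and treating a user value of 0 as "not set" is an assumption made for this example only.

```go
package main

import "fmt"

// effectiveMSS returns the MSS to use for a new active connection: the
// user-supplied value if it fits the route, otherwise the route's maximum.
// A userMSS of 0 means "not set" in this sketch (an assumption).
func effectiveMSS(userMSS, routeMaxMSS uint16) uint16 {
	if userMSS == 0 || userMSS > routeMaxMSS {
		return routeMaxMSS
	}
	return userMSS
}

func main() {
	fmt.Println(effectiveMSS(1000, 1460)) // 1000
	fmt.Println(effectiveMSS(9000, 1460)) // 1460
}
```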
Michael Pratt 198f1cddb8 Update comment
FDTable.GetFile doesn't exist.

PiperOrigin-RevId: 277089842
2019-10-28 10:20:23 -07:00
Haibo Xu dec831b493 Cast the Stat_t.Nlink to uint64 on arm64.
Since syscall.Stat_t.Nlink is defined with different types on
amd64 and arm64 (uint64 and uint32 respectively), we need to cast
it to a unified uint64 type in gVisor code.

Signed-off-by: Haibo Xu <haibo.xu@arm.com>
Change-Id: I7542b99b195c708f3fc49b1cbe6adebdd2f6e96b
2019-10-28 05:56:03 +00:00
Dean Deng 1c480abc39 Aggregate arguments for loading executables into a single struct.
This change simplifies the function signatures of functions related to loading
executables, such as LoadTaskImage, Load, loadBinary.

PiperOrigin-RevId: 276821187
2019-10-25 22:44:19 -07:00
Ghanan Gowripalan 5a421058a0 Validate the checksum for incoming ICMPv6 packets
This change validates the ICMPv6 checksum field before further processing an
ICMPv6 packet.

Tests: Unittests to make sure that only ICMPv6 packets with a valid checksum
are accepted/processed. Existing tests using checker.ICMPv6 now also check the
ICMPv6 checksum field.
PiperOrigin-RevId: 276779148
2019-10-25 16:06:55 -07:00
Ian Gudger 8f029b3f82 Convert DelayOption to the newer/faster SockOpt int type.
DelayOption is set on all new endpoints in gVisor.

PiperOrigin-RevId: 276746791
2019-10-25 13:15:34 -07:00
Andrei Vagin fd598912be platform/ptrace: use tgkill instead of kill
The syscall filters don't allow kill, just tgkill.

PiperOrigin-RevId: 276718421
2019-10-25 11:19:20 -07:00
gVisor bot 9a726745ee Merge pull request #1070 from lubinszARM:pr_abi
PiperOrigin-RevId: 276609608
2019-10-25 10:59:42 -07:00
Ghanan Gowripalan 27e896f290 Add a type to represent the NDP Prefix Information option.
This change is in preparation for NDP Prefix Discovery and SLAAC where the stack
will need to handle NDP Prefix Information options.

Tests: Test that given an NDP Prefix Information option buffer, correct values
are returned by the field getters.
PiperOrigin-RevId: 276594592
2019-10-24 16:53:08 -07:00
Ghanan Gowripalan e50a1f5739 Remove the amss field from tcpip.tcp.handshake as it was unused
The amss field in the tcpip.tcp.handshake was not used anywhere. Removed it to
not cause confusion with the amss field in the tcpip.tcp.endpoint struct, which
was documented to be used (and is actually being used) for the same purpose.

PiperOrigin-RevId: 276577088
2019-10-24 15:23:43 -07:00
Ghanan Gowripalan f034790ad8 Use interface-specific NDP configurations instead of the stack-wide default.
This change makes it so that NDP work is done using the per-interface NDP
configurations instead of the stack-wide default NDP configurations to correctly
implement RFC 4861 section 6.3.2 (note here, a host is a single NIC operating
as a host device), and RFC 4862 section 5.1.

Test: Test that we can set NDP configurations on a per-interface basis without
affecting the configurations of other interfaces or the stack-wide default. Also
make sure that after the configurations are updated, the updated configurations
are used for NDP processes (e.g. Duplicate Address Detection).
PiperOrigin-RevId: 276525661
2019-10-24 11:09:18 -07:00
Bin Lu 7f9c391cf1 slight changes to pkg/abi
In glibc, some structures are defined differently on different
platforms, such as C.struct_stat.

Signed-off-by: Bin Lu <bin.lu@arm.com>
2019-10-24 09:15:29 +00:00
Dean Deng d9fd536340 Handle AT_SYMLINK_NOFOLLOW flag for execveat.
PiperOrigin-RevId: 276441249
2019-10-24 01:45:25 -07:00
Dean Deng 7ca50236c4 Handle AT_EMPTY_PATH flag in execveat.
PiperOrigin-RevId: 276419967
2019-10-23 22:23:05 -07:00
gVisor bot 6d4d9564e3 Merge pull request #641 from tanjianfeng:master
PiperOrigin-RevId: 276380008
2019-10-23 16:55:15 -07:00
DarcySail fbe6b50d56 Keep minimal available fd to accelerate fd allocation
Use fd.next to store the iteration start position, which can be used to accelerate allocating new FDs.
Also add the corresponding gtest benchmark to measure performance.
@tanjianfeng

COPYBARA_INTEGRATE_REVIEW=https://github.com/google/gvisor/pull/758 from DarcySail:master 96685ec7886dfe1a64988406831d3bc002b438cc
PiperOrigin-RevId: 276351250
2019-10-23 14:27:53 -07:00
Ghanan Gowripalan de3dbf8a09 Inform netstack integrator when Duplicate Address Detection completes
This change introduces a new interface, stack.NDPDispatcher. It can be
implemented by the netstack integrator to receive NDP related events. As of this
change, only DAD related events are supported.

Tests: Existing tests were modified to use the NDPDispatcher's DAD events for
DAD tests that need to wait for DAD to complete (by failing or resolving).
PiperOrigin-RevId: 276338733
2019-10-23 13:26:35 -07:00
Ian Lewis ebe8001724 Update const names to be Go style.
PiperOrigin-RevId: 276165962
2019-10-22 16:16:41 -07:00
Andrei Vagin e63ff6d923 platform/ptrace: exit without panic if a stub process has been killed by SIGKILL
SIGKILL can be sent only by a user or the OOM killer. In both cases, we don't
need to panic.

PiperOrigin-RevId: 276150120
2019-10-22 14:57:23 -07:00
Ghanan Gowripalan 515e0558d4 Add a type to represent the NDP Router Advertisement message.
This change is in preparation for NDP Router Discovery where the stack will need
to handle NDP Router Advertisements.

Tests: Test that given an NDP Router Advertisement buffer (body of an ICMPv6
packet), correct values are returned by the field getters.
PiperOrigin-RevId: 276146817
2019-10-22 14:41:51 -07:00
Ghanan Gowripalan c356fe2ebb Respect new PrimaryEndpointBehavior when addresses get promoted to permanent
This change makes sure that when an address which is already known by a NIC and
has kind = permanentExpired gets promoted to permanent, the new
PrimaryEndpointBehavior is respected.

PiperOrigin-RevId: 276136317
2019-10-22 13:54:33 -07:00
Andrei Vagin 8720bd643e netstack/tcp: software segmentation offload
Right now, we send each TCP packet separately, making one system
call per packet. This patch allows generating multiple TCP packets
and sending them with sendmmsg.

The debatable part of this CL is how to handle multiple headers.
This CL adds a next field to the Prependable buffer.

Nginx test results:

Server Software:        nginx/1.15.9
Server Hostname:        10.138.0.2
Server Port:            8080

Document Path:          /10m.txt
Document Length:        10485760 bytes

w/o gso:
Concurrency Level:      5
Time taken for tests:   5.491 seconds
Complete requests:      100
Failed requests:        0
Total transferred:      1048600200 bytes
HTML transferred:       1048576000 bytes
Requests per second:    18.21 [#/sec] (mean)
Time per request:       274.525 [ms] (mean)
Time per request:       54.905 [ms] (mean, across all concurrent requests)
Transfer rate:          186508.03 [Kbytes/sec] received

sw-gso:

Concurrency Level:      5
Time taken for tests:   3.852 seconds
Complete requests:      100
Failed requests:        0
Total transferred:      1048600200 bytes
HTML transferred:       1048576000 bytes
Requests per second:    25.96 [#/sec] (mean)
Time per request:       192.576 [ms] (mean)
Time per request:       38.515 [ms] (mean, across all concurrent requests)
Transfer rate:          265874.92 [Kbytes/sec] received

w/o gso:
$ ./tcp_benchmark --client --duration 15  --ideal
[SUM]  0.0-15.1 sec  2.20 GBytes  1.25 Gbits/sec

software gso:
$ tcp_benchmark --client --duration 15  --ideal --gso $((1<<16)) --swgso
[SUM]  0.0-15.1 sec  3.99 GBytes  2.26 Gbits/sec

PiperOrigin-RevId: 276112677
2019-10-22 11:55:56 -07:00
Ghanan Gowripalan fb69de696b Auto-generate an IPv6 link-local address based on the NIC's MAC Address.
This change adds support for optionally auto-generating an IPv6 link-local
address based on the NIC's MAC Address on NIC enable.

Note, this change will not break existing uses of netstack as the default
configuration for the stack options is set in such a way that a link-local
address will not be auto-generated unless the stack is explicitly configured.
See `stack.Options` for more details. Specifically, see
`stack.Options.AutoGenIPv6LinkLocal`.

Tests: Tests to make sure that the IPv6 link-local address is only
auto-generated if the stack is specifically configured to do so. Also tests to
make sure that an auto-generated address goes through the DAD process.
PiperOrigin-RevId: 276059813
2019-10-22 07:26:54 -07:00
Bin Lu 2cee066929 enable ring0 to support arm64
This patch enables the basic framework for arm64 guests.

Several jobs were finished in this patch:
1, ring0.Vectors()
2, switchToUser()
3, basic framework for Arm64 guest.

Signed-off-by: Bin Lu <bin.lu@arm.com>
2019-10-22 08:33:39 +00:00
Nicolas Lacasse 070a8c2d4c Remove old TODO.
PiperOrigin-RevId: 275956240
2019-10-21 17:04:32 -07:00
Dean Deng 0b569b7cae Add basic implementation of execveat syscall and associated tests.
Allow file descriptors of directories as well as AT_FDCWD.

PiperOrigin-RevId: 275929668
2019-10-21 14:55:18 -07:00
Kevin Krakauer 12235d533a AF_PACKET support for netstack (aka epsocket).
Like (AF_INET, SOCK_RAW) sockets, AF_PACKET sockets require CAP_NET_RAW. With
runsc, you'll need to pass `--net-raw=true` to enable them.

Binding isn't supported yet.

PiperOrigin-RevId: 275909366
2019-10-21 13:23:18 -07:00
Kevin Krakauer 652f7b1d0f Add support for pipes in VFS2.
PiperOrigin-RevId: 275650307
2019-10-19 11:49:38 -07:00
Tamir Duberstein 51538c973e Store primary endpoints in a slice
There's no need for a linked list here.

PiperOrigin-RevId: 275565920
2019-10-18 16:14:09 -07:00
Mithun Iyer 487d3b2358 Fix typo while initializing protocol for UDP endpoints.
Fixes #763

PiperOrigin-RevId: 275563222
2019-10-18 16:00:11 -07:00
Michael Pratt 49b596b98d Cleanup host UDS support
This change fixes several issues with the fsgofer host UDS support. Notably, it
adds support for SOCK_SEQPACKET and SOCK_DGRAM sockets [1]. It also fixes
unsafe use of unet.Socket, which could cause a panic if Socket.FD is called
when err != nil, and calls to Socket.FD with nothing to prevent the garbage
collector from destroying and closing the socket.

A set of tests is added to exercise host UDS access. This required extracting
most of the syscall test runner into a library that can be used by custom
tests.

Updates #235
Updates #1003

[1] N.B. SOCK_DGRAM sockets are likely not particularly useful, as a server can
only reply to a client that binds first. We don't allow bind, so these are
unlikely to be used.

PiperOrigin-RevId: 275558502
2019-10-18 15:33:03 -07:00
Tamir Duberstein 4e6f3a0c71 Remove restrictions on the sending address
It is quite legal to send from the ANY address (it is required for
DHCP). I can't figure out why the broadcast address was included here,
so removing that as well.

PiperOrigin-RevId: 275541954
2019-10-18 14:10:30 -07:00
Kevin Krakauer dfdbdf14fa Refactor pipe to support VFS2.
* Pulls common functionality (IO and locking on open) into pipe_util.go.
* Adds pipe/vfs.go, which implements a subset of vfs.FileDescriptionImpl.

A subsequent change will add support for pipes in memfs.

PiperOrigin-RevId: 275322385
2019-10-17 13:11:07 -07:00
Ghanan Gowripalan 962aa235de NDP Neighbor Solicitations sent during DAD must have an IP hop limit of 255
NDP Neighbor Solicitations sent during Duplicate Address Detection must have an
IP hop limit of 255, as all NDP Neighbor Solicitations should have.

Test: Test that DAD messages have the IPv6 hop limit field set to 255.
PiperOrigin-RevId: 275321680
2019-10-17 13:06:15 -07:00
Ghanan Gowripalan 06ed9e329d Do Duplicate Address Detection on permanent IPv6 addresses.
This change adds support for Duplicate Address Detection on IPv6 addresses
as defined by RFC 4862 section 5.4.

Note, this change will not break existing uses of netstack as the default
configuration for the stack options is set in such a way that DAD will not be
performed. See `stack.Options` and `stack.NDPConfigurations` for more details.

Tests: Tests to make sure that the DAD process properly resolves or fails.
That is, tests make sure that DAD resolves only if:
  - No other node is performing DAD for the same address
  - No other node owns the same address
PiperOrigin-RevId: 275189471
2019-10-16 22:54:45 -07:00
Kevin Krakauer 2a82d5ad68 Reorder BUILD license and load functions in gvisor.
PiperOrigin-RevId: 275139066
2019-10-16 16:40:30 -07:00
Michael Pratt 8fe48dcb1e Add sublevel to kernel version
Standard Linux kernel versions are VERSION.PATCHLEVEL.SUBLEVEL. e.g., 4.4.0,
even when the sublevel is 0. Match this standard.

PiperOrigin-RevId: 275125715
2019-10-16 15:22:42 -07:00
Fabricio Voznika 9fb562234e Fix problem with open FD when copy up is triggered in overlayfs
Linux kernels before 4.19 don't implement a feature that updates
open FDs after a file is opened for write (and is copied to the upper
layer). Already open FDs will continue to read the old file content
until they are reopened. This is especially problematic for gVisor
because it caches open files.

A flag was added to force read-only files to be reopened when the
same file is opened for write. This is only needed if using kernels
prior to 4.19.

Closes #1006

It's difficult to really test this because we never run tests
on older kernels. I'm adding a test in GKE, which uses kernels
with the overlayfs problem for 1.14 and lower.

PiperOrigin-RevId: 275115289
2019-10-16 15:06:24 -07:00
Nicolas Lacasse fd4e436002 Support O_SYNC and O_DSYNC flags.
When any of these flags are set, all writes will trigger a subsequent fsync
call. This behavior already existed for "write-through" mounts.

O_DIRECT is treated as an alias for O_SYNC. Better support coming soon.

PiperOrigin-RevId: 275114392
2019-10-16 15:01:23 -07:00
Michael Pratt bbdcf44ebb Fix syscall changes lost in rebase
These syscalls were changed in the amd64 file around the time the arm64 PR was
sent out, so their changes got lost.

Updates #63

PiperOrigin-RevId: 275114194
2019-10-16 14:56:29 -07:00
gVisor bot d22f0534c0 Merge pull request #736 from tanjianfeng:fix-unix
PiperOrigin-RevId: 275114157
2019-10-16 14:41:43 -07:00
Jamie Liu 0457a4c4cb Minor vfs.FileDescriptionImpl fixes.
- Pass context.Context to OnClose().

- Pass memmap.MMapOpts to ConfigureMMap() by pointer so that implementations
  can actually mutate it as required.

PiperOrigin-RevId: 274934967
2019-10-15 18:40:45 -07:00
Bhasker Hariharan f98c3ee32c Remove panic when reassembly fails.
Reassembly can fail due to an invalid sequence of fragments
being received, e.g. multiple fragments with the same ID which
claim to be the last one by setting the more flag to 0.
It's safer to just drop the reassembler and increment a metric
than to panic when reassembly fails.

PiperOrigin-RevId: 274920901
2019-10-15 17:04:44 -07:00
Tamir Duberstein db1ca5c786 Set NDP hop limit in accordance with RFC 4861
...and do not populate link address cache at dispatch. This partially
reverts 313c767b00, which caused malformed
packets (e.g. NDP Neighbor Adverts with incorrect hop limit values) to
populate the address cache. In particular, this masked a bug that was
introduced to the Neighbor Advert generation code in
7c1587e340.

PiperOrigin-RevId: 274865182
2019-10-15 12:43:25 -07:00
Jianfeng Tan d277bfba27 epsocket: support /proc/net/snmp
Netstack has its own stats, we use this to fill /proc/net/snmp.

Note that some metrics are not recorded in Netstack, which will be shown
as 0 in the proc file.

Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com>
Change-Id: Ie0089184507d16f49bc0057b4b0482094417ebe1
2019-10-15 16:38:41 +00:00
Jianfeng Tan aee2c93366 netstack: add counters for tcp CurrEstab and EstabResets
Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com>
2019-10-15 16:38:40 +00:00
Jianfeng Tan dd7d1f825d hostinet: support /proc/net/snmp and /proc/net/dev
For hostinet, we inherit the data from host procfs. To do that, we
cache the fds for these files for later reads.

Fixes #506

Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com>
Change-Id: I2f81215477455b9c59acf67e33f5b9af28ee0165
2019-10-15 16:38:40 +00:00
Jianfeng Tan b94505ecc0 support /proc/net/route
This proc file reports routing information to applications inside the
container.

Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com>
Change-Id: I498e47f8c4c185419befbb42d849d0b099ec71f3
2019-10-15 16:38:40 +00:00
Jianfeng Tan e3d4a67739 support /proc/net/snmp
This proc file contains statistics according to [1].

[1] https://tools.ietf.org/html/rfc2013

Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com>
Change-Id: I9662132085edd8a7783d356ce4237d7ac0800d94
2019-10-15 16:38:40 +00:00
gVisor bot bfa0bb24dd Internal change.
PiperOrigin-RevId: 274700093
2019-10-14 17:46:52 -07:00
Kevin Krakauer 2302afb53d Reorder BUILD license and load functions in netstack.
PiperOrigin-RevId: 274672346
2019-10-14 15:21:59 -07:00
Bhasker Hariharan a296425970 Use a different fanoutID for each new fdbased endpoint.
PiperOrigin-RevId: 274638272
2019-10-14 13:10:16 -07:00
Ian Lewis 470997ca99 Allow for zero byte iovec with MSG_PEEK | MSG_TRUNC in recvmsg.
This allows for peeking at the length of the next message on a netlink socket
without pulling it off the socket's buffer/queue, allowing tools like 'ip' to
work.

This CL also fixes an issue where dump_done_errno was not included in the
NLMSG_DONE messages payload.

Issue #769

PiperOrigin-RevId: 274068637
2019-10-10 16:55:48 -07:00
Bhasker Hariharan c7e901f47a Fix bugs in fragment handling.
Strengthen the header.IPv4.IsValid check to correctly check
for IHL/TotalLength fields. Also add a check to make sure
fragmentOffsets + size of the fragment do not cause a wrap
around for the end of the fragment.

PiperOrigin-RevId: 274049313
2019-10-10 15:14:55 -07:00
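The wrap-around check mentioned in c7e901f47a can be illustrated with unsigned 16-bit arithmetic. This is a sketch of the general technique rather than the netstack code; `fragmentEnd` is a hypothetical helper. If the sum of the offset and size overflows, the result is smaller than the offset, which is how the wrap is detected.

```go
package main

import "fmt"

// fragmentEnd returns offset+size, or ok=false if the 16-bit sum wraps around.
func fragmentEnd(offset, size uint16) (end uint16, ok bool) {
	end = offset + size // uint16 arithmetic wraps modulo 65536
	if end < offset {
		return 0, false
	}
	return end, true
}

func main() {
	fmt.Println(fragmentEnd(1480, 1480)) // 2960 true
	fmt.Println(fragmentEnd(65528, 16))  // 0 false
}
```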
Adin Scannell f8b1859319 Fix signalfd polling.
The signalfd descriptors otherwise always show as available. This can lead
programs to spin, assuming they are looking to see what signals are pending.

Updates #139

PiperOrigin-RevId: 274017890
2019-10-10 12:51:22 -07:00
gVisor bot 14952d01fb Merge pull request #909 from xiaobo55x:atomic_bitsops
PiperOrigin-RevId: 274011064
2019-10-10 12:46:46 -07:00
gVisor bot bf870c1a42 Internal change.
PiperOrigin-RevId: 273861936
2019-10-09 17:56:05 -07:00
gVisor bot 7a2d5b2fa7 Merge pull request #811 from lubinszARM:pr_testutil
PiperOrigin-RevId: 273781641
2019-10-09 12:00:53 -07:00
gVisor bot 559aba7670 Merge pull request #813 from xiaobo55x:pkg_sleep
PiperOrigin-RevId: 273668431
2019-10-09 11:11:28 -07:00
Haibo Xu ebbf2b7fbd Enable pkg/atomicbitops support on arm64.
Signed-off-by: Haibo Xu <haibo.xu@arm.com>
Change-Id: I1646aaa6f07b5ec31c39c318b70f48693fe59a7c
2019-10-09 03:09:52 +00:00
Ian Gudger 7c1587e340 Implement IP_TTL.
Also change the default TTL to 64 to match Linux.

PiperOrigin-RevId: 273430341
2019-10-07 19:29:51 -07:00
Kevin Krakauer 1de0cf3563 Remove unnecessary context parameter for new pipes.
PiperOrigin-RevId: 273421634
2019-10-07 18:16:14 -07:00
Kevin Krakauer 6a98237949 Rename epsocket to netstack.
PiperOrigin-RevId: 273365058
2019-10-07 13:57:59 -07:00
gVisor bot 8fce24d33a Merge pull request #753 from lubinszARM:pr_syscall_linux
PiperOrigin-RevId: 273364848
2019-10-07 13:52:19 -07:00
Nicolas Lacasse f24c3188b5 Add sanity check that overlayCreate is called with an overlay parent inode.
PiperOrigin-RevId: 272987037
2019-10-04 17:03:50 -07:00
Jamie Liu b941e35761 Return EIO from p9 if flipcall.Endpoint.Connect() fails.
Also ensure that all flipcall transport errors not returned by p9 (converted to
EIO by the client, or dropped on the floor by channel server goroutines) are
logged.

PiperOrigin-RevId: 272963663
2019-10-04 14:56:53 -07:00
Kevin Krakauer 7ef1c44a7f Change linux.FileMode from uint to uint16, and update VFS to use FileMode.
In Linux (include/linux/types.h), mode_t is an unsigned short.

PiperOrigin-RevId: 272956350
2019-10-04 14:20:32 -07:00
Chris Kuiper 4874525161 Implement proper local broadcast behavior
The behavior for sending and receiving local broadcast (255.255.255.255)
traffic is as follows:

Outgoing
--------
* A broadcast packet sent on a socket that is bound to an interface goes out
  that interface
* A broadcast packet sent on an unbound socket follows the route table to
  select the outgoing interface
  + if an explicit route entry exists for 255.255.255.255/32, use that one
  + else use the default route
* Broadcast packets are looped back and delivered following the rules for
  incoming packets (see next). This is the same behavior as for multicast
  packets, except that it cannot be disabled via sockopt.

Incoming
--------
* Sockets wishing to receive broadcast packets must bind to either INADDR_ANY
  (0.0.0.0) or INADDR_BROADCAST (255.255.255.255). No other socket receives
  broadcast packets.
* Broadcast packets are multiplexed to all sockets matching it. This is the
  same behavior as for multicast packets.
* A socket can bind to 255.255.255.255:<port> and then receive its own
  broadcast packets sent to 255.255.255.255:<port>

In addition, this change implicitly fixes an issue with multicast reception. If
two sockets want to receive a given multicast stream and one is bound to ANY
while the other is bound to the multicast address, only one of them will
receive the traffic.

PiperOrigin-RevId: 272792377
2019-10-03 19:31:35 -07:00
gVisor bot 135aadb517 Merge pull request #757 from xiaobo55x:pkg_bits
PiperOrigin-RevId: 272760964
2019-10-03 16:13:34 -07:00
Andrei Vagin db218fdfcf Don't report partialResult errors from sendfile
The input file descriptor is always a regular file, so sendfile can't lose any
data if it is unable to write it to the output file descriptor.

Reported-by: syzbot+22d22330a35fa1c02155@syzkaller.appspotmail.com
PiperOrigin-RevId: 272730357
2019-10-03 13:38:30 -07:00
gVisor bot cde7711837 Merge pull request #865 from tanjianfeng:fix-829
PiperOrigin-RevId: 272522508
2019-10-02 14:51:04 -07:00
Andrei Vagin 2016cc283c fs/proc: report PID-s from a pid namespace of the proc mount
Right now, we can find more than one process with PID 1 in /proc.

$ for i in `seq 10`; do
> unshare -fp sleep 1000 &
> done

$ ls /proc
1  1  1  1  12  18  24  29  6            loadavg  net   sys          version
1  1  1  1  16  20  26  32  cpuinfo      meminfo  self  thread-self
1  1  1  1  17  21  28  36  filesystems  mounts   stat  uptime

PiperOrigin-RevId: 272506593
2019-10-02 13:29:42 -07:00
Andrei Vagin 9a875306db Merge branch 'master' into pr_syscall_linux
2019-10-02 13:00:07 -07:00
Michael Pratt 0d483985c5 Include AT_SECURE in the aux vector
gVisor does not currently implement the functionality that would result in
AT_SECURE = 1, but Linux includes AT_SECURE = 0 in the normal case, so we
should do the same.
PiperOrigin-RevId: 272311488
2019-10-01 15:43:14 -07:00
Michael Pratt dd69b49ed1 Disable cpuClockTicker when app is idle
Kernel.cpuClockTicker increments kernel.cpuClock, which tasks use as a clock to
track their CPU usage. This improves latency in the syscall path by avoiding
expensive monotonic clock calls on every syscall entry/exit.

However, this timer fires every 10ms. Thus, when all tasks are idle (i.e.,
blocked or stopped), this forces a sentry wakeup every 10ms, when we may
otherwise be able to sleep until the next app-relevant event. These wakeups
cause the sentry to utilize approximately 2% CPU when the application is
otherwise idle.

Updates to clock are not strictly necessary when the app is idle, as there are
no readers of cpuClock. This commit reduces idle CPU by disabling the timer
when tasks are completely idle, and computing its effects at the next wakeup.

Rather than disabling the timer as soon as the app goes idle, we wait until the
next tick, which provides a window for short sleeps to sleep and wakeup without
doing the (relatively) expensive work of disabling and enabling the timer.

PiperOrigin-RevId: 272265822
2019-10-01 12:21:01 -07:00
Michael Pratt 53cc72da90 Honor X bit on extra anon pages in PT_LOAD segments
Linux changed this behavior in 16e72e9b30986ee15f17fbb68189ca842c32af58
(v4.11). Previously, extra pages were always mapped RW. Now, those pages will
be executable if the segment specified PF_X. They still must be writeable.

PiperOrigin-RevId: 272256280
2019-10-01 11:30:36 -07:00
Andrei Vagin 7a234f736f splice: try another fallback option only if the previous one isn't supported
Reported-by: syzbot+bb5ed342be51d39b0cbb@syzkaller.appspotmail.com
PiperOrigin-RevId: 272110815
2019-09-30 18:23:42 -07:00
Andrei Vagin 29a1ba54ea splice: compare inode numbers only if both ends are pipes
It isn't allowed to splice data from and into the same pipe.

But right now this check is broken, because we don't check that both ends are
pipes.

PiperOrigin-RevId: 272107022
2019-09-30 17:57:14 -07:00
Adin Scannell 20841b98e1 Update FIXME bug with GitHub issue.
PiperOrigin-RevId: 272101930
2019-09-30 17:24:29 -07:00
Bhasker Hariharan bcbb3ef317 Add a Stringer implementation to PacketDispatchMode
PiperOrigin-RevId: 272083936
2019-09-30 15:52:55 -07:00
Bhasker Hariharan 61f6fbd0ce Fix bugs in PickEphemeralPort for TCP.
Netstack always picks a random start point every time PickEphemeralPort
is called. While this is required for UDP so that DNS requests go
out through a randomized set of ports, it is not required for TCP. In fact,
Linux explicitly hashes the (srcip, dstip, dstport) and a one-time secret
initialized at start of the application to get a random offset. But to
ensure it doesn't start from the same point on every scan, it uses a static
hint that is incremented by 2 in every call to pick ephemeral ports.

The reason for 2 is Linux seems to split the port ranges where active connects
seem to use even ones while odd ones are used by listening sockets.

This CL implements a similar strategy where we use a hash + hint to generate
the offset to start the search for a free Ephemeral port.

This ensures that we cycle through the available port space in order for
repeated connects to the same destination and significantly reduces the
chance of picking a recently released port.

PiperOrigin-RevId: 272058370
2019-09-30 13:55:22 -07:00
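The hash + hint strategy from 61f6fbd0ce can be sketched roughly as below. Everything here is illustrative: the `portPicker` type, the port range constants, and the use of FNV as the hash are assumptions, and the sketch is not safe for concurrent use. It only shows how a per-destination hash plus a hint that advances by 2 yields a search start that cycles through the port space.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

const (
	firstEphemeral = 16000                      // assumed start of the range
	numEphemeral   = 65535 - firstEphemeral + 1 // number of ports in the range
)

// portPicker holds the one-time secret and the hint shared across calls.
type portPicker struct {
	secret uint32
	hint   uint32
}

// pick hashes (src, dst, dstPort, secret), adds the hint, and scans the
// ephemeral range from that offset until isFree accepts a port.
func (p *portPicker) pick(src, dst string, dstPort uint16, isFree func(uint16) bool) (uint16, bool) {
	h := fnv.New32a()
	h.Write([]byte(src))
	h.Write([]byte(dst))
	var tail [6]byte
	binary.BigEndian.PutUint16(tail[:2], dstPort)
	binary.BigEndian.PutUint32(tail[2:], p.secret)
	h.Write(tail[:])

	p.hint += 2 // advance by 2 on every call, as described above
	offset := (h.Sum32()%numEphemeral + p.hint%numEphemeral) % numEphemeral
	for i := uint32(0); i < numEphemeral; i++ {
		port := uint16(firstEphemeral + (offset+i)%numEphemeral)
		if isFree(port) {
			return port, true
		}
	}
	return 0, false
}

func main() {
	p := &portPicker{secret: 42}
	allFree := func(uint16) bool { return true }
	a, _ := p.pick("10.0.0.1", "10.0.0.2", 80, allFree)
	b, _ := p.pick("10.0.0.1", "10.0.0.2", 80, allFree)
	// Repeated connects to the same destination start 2 ports apart.
	fmt.Println((int(b) - int(a) + numEphemeral) % numEphemeral) // 2
}
```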
Nicolas Lacasse 3ad17ff597 Force timestamps to update when set via InodeOperations.SetTimestamps.
The gofer's CachingInodeOperations implementation contains an optimization for
the common open-read-close pattern when we have a host FD.  In this case, the
host kernel will update the timestamp for us to a reasonably close time, so we
don't need an extra RPC to the gofer.

However, when the app explicitly sets the timestamps (via futimes or similar)
then we actually DO need to update the timestamps, because the host kernel
won't do it for us.

To fix this, a new boolean `forceSetTimestamps` was added to
CachingInodeOperations.SetMaskedAttributes. It is only set by
gofer.InodeOperations.SetTimestamps.

PiperOrigin-RevId: 272048146
2019-09-30 13:08:45 -07:00
Michael Pratt 981fc188f0 Only copy out remaining time on nanosleep success
It looks like the old code attempted to do this, but didn't realize that err !=
nil even in the happy case.

PiperOrigin-RevId: 272005887
2019-09-30 13:07:32 -07:00
gVisor bot eebc38be7a Merge pull request #882 from DarcySail:darcy_faster_CopyStringIn
PiperOrigin-RevId: 271675009
2019-09-27 17:27:13 -07:00
gVisor bot 8539abc0df Merge pull request #864 from tanjianfeng:fix-861
PiperOrigin-RevId: 271649711
2019-09-27 15:18:09 -07:00
gVisor bot abbee5615f Implement SO_BINDTODEVICE sockopt
PiperOrigin-RevId: 271644926
2019-09-27 14:14:04 -07:00
Kevin Krakauer 543492650d Make raw socket tests pass in environments with or without CAP_NET_RAW.
PiperOrigin-RevId: 271442321
2019-09-26 15:09:20 -07:00
gVisor bot dd0e5eedae Merge pull request #765 from trailofbits:uds_support
PiperOrigin-RevId: 271235134
2019-09-25 16:44:22 -07:00
Kevin Krakauer 59ccbb1044 Remove centralized registration of protocols.
Also removes the need for protocol names.

PiperOrigin-RevId: 271186030
2019-09-25 12:57:05 -07:00
gVisor bot 99c86b8dbd Merge pull request #863 from tanjianfeng:fix-862
PiperOrigin-RevId: 271168948
2019-09-25 11:36:06 -07:00
gVisor bot 76ff1947b6 gvisor: change syscall.RawSyscall to syscall.RawSyscall6 where required
Before https://golang.org/cl/173160 syscall.RawSyscall would zero out
the last three register arguments to the system call. That no longer happens.
For system calls that take more than three arguments, use RawSyscall6 to
ensure that we pass zero, not random data, for the additional arguments.

PiperOrigin-RevId: 271062527
2019-09-24 23:47:42 -07:00
Adin Scannell 502f8f238e Stub out readahead implementation.
Closes #261

PiperOrigin-RevId: 270973347
2019-09-24 13:29:46 -07:00
Chris Kuiper 6704d625ef Return only primary addresses in Stack.NICInfo()
Non-primary addresses are used for endpoints created to accept multicast and
broadcast packets, as well as "helper" endpoints (0.0.0.0) that allow sending
packets when no proper address has been assigned yet (e.g., for DHCP). These
addresses are not real addresses from a user point of view and should not be
part of the NICInfo() value. Also see b/127321246 for more info.

This switches NICInfo() to call a new NIC.PrimaryAddresses() function. To still
allow an option to get all addresses (mostly for testing) I added
Stack.GetAllAddresses() and NIC.AllAddresses().

In addition, the return value for GetMainNICAddress() was changed for the case
where the NIC has no primary address. Instead of returning an error here,
it now returns an empty AddressWithPrefix() value. The rationale for this
change is that it is a valid case for a NIC to have no primary addresses.

Lastly, I refactored the code based on the new additions.

PiperOrigin-RevId: 270971764
2019-09-24 13:21:20 -07:00