Commit Graph

1312 Commits

Author SHA1 Message Date
Andrei Vagin a12848ffeb netstack/tcp: fix calculating a number of outstanding packets
In case of GSO, a segment can container more than one packet
and we need to use the pCount() helper to get a number of packets.

PiperOrigin-RevId: 251743020
2019-06-05 16:30:45 -07:00
Chris Kuiper d18bb4f38a Adjust route when looping multicast packets
Multicast packets are special in that their destination address does not
identify a specific interface. When sending out such a packet the multicast
address is the remote address, but for incoming packets it is the local
address. Hence, when looping a multicast packet, the route needs to be
tweaked to reflect this.

PiperOrigin-RevId: 251739298
2019-06-05 16:08:29 -07:00
Ian Gudger c08fcaa364 Give test instantiations meaningful names.
PiperOrigin-RevId: 251737069
2019-06-05 15:57:27 -07:00
Nicolas Lacasse 3b950344b3 Bump googletest version.
PiperOrigin-RevId: 251716439
2019-06-05 14:17:22 -07:00
Michael Pratt d3ed9baac0 Implement dumpability tracking and checks
We don't actually support core dumps, but some applications want to
get/set dumpability, which still has an effect in procfs.

Lack of support for set-uid binaries or fs creds simplifies things a
bit.

As-is, processes started via CreateProcess (i.e., init and sentryctl
exec) have normal dumpability. I'm a bit torn on whether sentryctl exec
tasks should be dumpable, but at least since they have no parent normal
UID/GID checks should protect them.

PiperOrigin-RevId: 251712714
2019-06-05 14:00:13 -07:00
Adin Scannell cecb71dc37 Building containerd with go modules is broken, use GOPATH.
PiperOrigin-RevId: 251583707
2019-06-04 23:09:18 -07:00
Bhasker Hariharan e0fb921205 Fix data race in synRcvdState.
When checking the length of the acceptedChan we should hold the
endpoint mutex otherwise a syn received while the listening socket
is being closed can result in a data race where the cleanupLocked
routine sets acceptedChan to nil while a handshake goroutine
in progress could try and check it at the same time.

PiperOrigin-RevId: 251537697
2019-06-04 16:17:24 -07:00
Yong He 7398f013f0 Drop one dirent reference after referenced by file
When pipe is created, a dirent of pipe will be
created and its initial reference is set as 0.
Cause all dirent will only be destroyed when
the reference decreased to -1, so there is already
a 'initial reference' of dirent after it created.
For destroying dirent after all reference released,
the correct way is to drop the 'initial reference'
once someone hold a reference to the dirent, such
as fs.NewFile, otherwise the reference of dirent
will stay 0 all the time, and will cause memory
leak of dirent.
Except pipe, timerfd/eventfd/epoll has the same
problem

Here is a simple case to create memory leak of dirent
for pipe/timerfd/eventfd/epoll in C langange, after
run the case, pprof the runsc process, you will
find lots dirents of pipe/timerfd/eventfd/epoll not
freed:

int main(int argc, char *argv[])
{
	int i;
	int n;
	int pipefd[2];

	if (argc != 3) {
		printf("Usage: %s epoll|timerfd|eventfd|pipe <iterations>\n", argv[0]);
	}

	n = strtol(argv[2], NULL, 10);

	if (strcmp(argv[1], "epoll") == 0) {
		for (i = 0; i < n; ++i)
			close(epoll_create(1));
	} else if (strcmp(argv[1], "timerfd") == 0) {
		for (i = 0; i < n; ++i)
			close(timerfd_create(CLOCK_REALTIME, 0));
	} else if (strcmp(argv[1], "eventfd") == 0) {
		for (i = 0; i < n; ++i)
			close(eventfd(0, 0));
	} else if (strcmp(argv[1], "pipe") == 0) {
		for (i = 0; i < n; ++i)
			if (pipe(pipefd) == 0) {
				close(pipefd[0]);
				close(pipefd[1]);
			}
	}

	printf("%s %s test finished\r\n",argv[1],argv[2]);
	return 0;
}

Change-Id: Ia1b8a1fb9142edb00c040e44ec644d007f81f5d2
PiperOrigin-RevId: 251531096
2019-06-04 15:40:23 -07:00
Adin Scannell 6f92038ce0 Use github directory if it exists.
Unfortunately, kokoro names the top-level directory per the SCM type. This
means there's no way to make the job names match; we simply need to probe for
the existence of the correct directory.

PiperOrigin-RevId: 251519409
2019-06-04 14:43:24 -07:00
Nicolas Lacasse 0c292cdaab Remove the Dirent field from Pipe.
Dirents are ref-counted, but Pipes are not. Holding a Dirent inside of a Pipe
raises difficult questions about the lifecycle of the Pipe and Dirent.

Fortunately, we can side-step those questions by removing the Dirent field from
Pipe entirely. We only need the Dirent when constructing fs.Files (which are
ref-counted), and in GetFile (when a Dirent is passed to us anyways).

PiperOrigin-RevId: 251497628
2019-06-04 12:58:56 -07:00
Adin Scannell 7436ea247b Fix Kokoro revision and 'go get usage'
As a convenience for debugging, also factor the scripts such that
can be run without Kokoro. In the future, this may be used to add
additional presubmit hooks that run without Kokoro.

PiperOrigin-RevId: 251474868
2019-06-04 11:07:27 -07:00
Adin Scannell f520d0d585 Resolve impossible dependencies.
PiperOrigin-RevId: 251377523
2019-06-03 23:00:40 -07:00
Andrei Vagin 90a116890f gvisor/sock/unix: pass creds when a message is sent between unconnected sockets
and don't report a sender address if it doesn't have one

PiperOrigin-RevId: 251371284
2019-06-03 21:48:19 -07:00
Andrei Vagin 00f8663887 gvisor/fs: return a proper error from FileWriter.Write in case of a short-write
The io.Writer contract requires that Write writes all available
bytes and does not return short writes. This causes errors with
io.Copy, since our own Write interface does not have this same
contract.

PiperOrigin-RevId: 251368730
2019-06-03 21:26:01 -07:00
Fabricio Voznika f1aee6a7ad Refactor container FS setup
No change in functionaly. Added containerMounter object
to keep state while the mounts are processed. This will
help upcoming changes to share mounts per-pod.

PiperOrigin-RevId: 251350096
2019-06-03 18:20:57 -07:00
Fabricio Voznika d28f71adcf Remove 'clearStatus' option from container.Wait*PID()
clearStatus was added to allow detached execution to wait
on the exec'd process and retrieve its exit status. However,
it's not currently used. Both docker and gvisor-containerd-shim
wait on the "shim" process and retrieve the exit status from
there. We could change gvisor-containerd-shim to use waits, but
it will end up also consuming a process for the wait, which is
similar to having the shim process.

Closes #234

PiperOrigin-RevId: 251349490
2019-06-03 18:16:09 -07:00
Adin Scannell 18e6e63503 Allow specification of origin in cloudbuild.
PiperOrigin-RevId: 251347966
2019-06-03 18:05:59 -07:00
Bhasker Hariharan bfe3220992 Delete debug log lines left by mistake.
Updates #236

PiperOrigin-RevId: 251337915
2019-06-03 17:00:18 -07:00
Michael Pratt 6e1f51f3eb Remove duplicate socket tests
socket_unix_abstract.cc: Subset of socket_abstract.cc
socket_unix_filesystem.cc: Subset of socket_filesystem.cc

PiperOrigin-RevId: 251297117
2019-06-03 13:31:47 -07:00
Michael Pratt 4555f6adc8 Update straggling copyright holder
Updates #209

PiperOrigin-RevId: 251289513
2019-06-03 12:51:55 -07:00
Michael Pratt 955685845e Remove spurious period
PiperOrigin-RevId: 251288885
2019-06-03 12:48:24 -07:00
Andrei Vagin 8e926e3f74 gvisor: validate a new map region in the mremap syscall
Right now, mremap allows to remap a memory region over MaxUserAddress,
this means that we can change the stub region.

PiperOrigin-RevId: 251266886
2019-06-03 10:59:46 -07:00
Adin Scannell 216da0b733 Add tooling for Go-compatible branch.
The WORKSPACE go_repositories can be generated from a standard go.mod file. Add
the necessary gazelle hooks to do so, and include a test that sanity checks
there are no changes. This go.mod file will be used in a subsequent commit to
generate a go gettable branch of the repository.

This commit also adds a tools/go_branch.sh script, which given an existing go
branch in the repository, will add an additional synthetic change to the branch
bringing it up-to-date with HEAD.

As a final step, a cloudbuild script is included, which can be used to automate
the process for every change pushed to the repository. This may be used after
an initial go branch is pushed, but this is manual process.

PiperOrigin-RevId: 251095016
2019-06-01 23:10:43 -07:00
Bhasker Hariharan 3577a4f691 Disable certain tests that are flaky under race detector.
PiperOrigin-RevId: 250976665
2019-05-31 16:19:49 -07:00
Bhasker Hariharan 033f96cc93 Change segment queue limit to be of fixed size.
Netstack sets the unprocessed segment queue size to match the receive
buffer size. This is not required as this queue only needs to hold enough
for a short duration before the endpoint goroutine can process it.

Updates #230

PiperOrigin-RevId: 250976323
2019-05-31 16:17:33 -07:00
Adin Scannell 132bf68de4 Switch to new dedicated RBE project.
PiperOrigin-RevId: 250970783
2019-05-31 15:43:30 -07:00
Nicolas Lacasse 6f73d79c32 Simplify overlayBoundEndpoint.
There is no reason to do the recursion manually, since
Inode.BoundEndpoint will do it for us.

PiperOrigin-RevId: 250794903
2019-05-30 17:20:20 -07:00
Fabricio Voznika 38de91b028 Add build guard to files using go:linkname
Funcion signatures are not validated during compilation. Since
they are not exported, they can change at any time. The guard
ensures that they are verified at least on every version upgrade.

PiperOrigin-RevId: 250733742
2019-05-30 12:09:39 -07:00
Adin Scannell e3c5fa3345 Update CONTRIBUTING.md
PiperOrigin-RevId: 250730726
2019-05-30 12:09:10 -07:00
Bhasker Hariharan ae26b2c425 Fixes to TCP listen behavior.
Netstack listen loop can get stuck if cookies are in-use and the app is slow to
accept incoming connections. Further we continue to complete handshake for a
connection even if the backlog is full. This creates a problem when a lots of
connections come in rapidly and we end up with lots of completed connections
just hanging around to be delivered.

These fixes change netstack behaviour to mirror what linux does as described
here in the following article

http://veithen.io/2014/01/01/how-tcp-backlog-works-in-linux.html

Now when cookies are not in-use Netstack will silently drop the ACK to a SYN-ACK
and not complete the handshake if the backlog is full.  This will result in the
connection staying in a half-complete state. Eventually the sender will
retransmit the ACK and if backlog has space we will transition to a connected
state and deliver the endpoint.

Similarly when cookies are in use we do not try and create an endpoint unless
there is space in the accept queue to accept the newly created endpoint. If
there is no space then we again silently drop the ACK as we can just recreate it
when the ACK is retransmitted by the peer.

We also now use the backlog to cap the size of the SYN-RCVD queue for a given
endpoint. So at any time there can be N connections in the backlog and N in a
SYN-RCVD state if the application is not accepting connections. Any new SYNs
will be dropped.

This CL also fixes another small bug where we mark a new endpoint which has not
completed handshake as connected. We should wait till handshake successfully
completes before marking it connected.

Updates #236

PiperOrigin-RevId: 250717817
2019-05-30 12:08:41 -07:00
Michael Pratt 8d25cd0b40 Update procid for Go 1.13
Upstream Go has no changes here.

PiperOrigin-RevId: 250602731
2019-05-30 12:08:10 -07:00
chris.zn b18df9bed6 Add VmData field to /proc/{pid}/status
VmData is the size of private data segments.
It has the same meaning as in Linux.

Change-Id: Iebf1ae85940a810524a6cde9c2e767d4233ddb2a
PiperOrigin-RevId: 250593739
2019-05-30 12:07:40 -07:00
Bhasker Hariharan 035a8fa38e Add support for collecting execution trace to runsc.
Updates #220

PiperOrigin-RevId: 250532302
2019-05-30 12:07:11 -07:00
Andrei Vagin b52e571a61 runsc/do: don't specify the read-only flag for the root mount
The root mount is an overlay mount.

PiperOrigin-RevId: 250429317
2019-05-30 12:06:42 -07:00
Andrei Vagin 4b9cb38157 gvisor: socket() returns EPROTONOSUPPORT if protocol is not supported
PiperOrigin-RevId: 250426407
2019-05-30 12:06:15 -07:00
Michael Pratt 507a15dce9 Always wait on tracee children
After bf959931ddb88c4e4366e96dd22e68fa0db9527c ("wait/ptrace: assume
__WALL if the child is traced") (Linux 4.7), tracees are always eligible
for waiting, regardless of type.

PiperOrigin-RevId: 250399527
2019-05-30 12:05:46 -07:00
Andrei Vagin 673358c0d9 runsc/do: allow to run commands in a host network namespace
PiperOrigin-RevId: 250329795
2019-05-30 12:05:16 -07:00
Fabricio Voznika 1e42b4cfca Update internal flag name and documentation
Updates #234

PiperOrigin-RevId: 250323553
2019-05-30 12:04:49 -07:00
Adin Scannell f29ea87d2a Create annotated tags for release.
PiperOrigin-RevId: 249929942
2019-05-30 12:04:20 -07:00
Adin Scannell 2165b77774 Remove obsolete bug.
The original bug is no longer relevant, and the FIXME here
contains lots of obsolete information.

PiperOrigin-RevId: 249924036
2019-05-30 12:03:39 -07:00
Adin Scannell ed5793808e Remove obsolete TODO.
We don't need to model internal interfaces after the system
call interfaces (which are objectively worse and simply use a
flag to distinguish between two logically different operations).

PiperOrigin-RevId: 249916814
Change-Id: I45d02e0ec0be66b782a685b1f305ea027694cab9
2019-05-24 16:18:09 -07:00
Michael Pratt 6cdec6fadf Wrap comments and reword in common present tense
PiperOrigin-RevId: 249888234
Change-Id: Icfef32c3ed34809c34100c07e93e9581c786776e
2019-05-24 13:23:53 -07:00
Tamir Duberstein 9119478830 Extract SleepSafe from test_util
Allows socket tests that rely on test_util to compile on Fuchsia.

PiperOrigin-RevId: 249884084
Change-Id: I17617e3f1baaba4c85c689f40db4a42a8de1597e
2019-05-24 12:58:46 -07:00
Tamir Duberstein e4b395db49 Remove unused wakers
These wakers are uselessly allocated and passed around; nothing ever
listens for notifications on them. The code here appears to be
vestigial, so removing it and allowing a nil waker to be passed seems
appropriate.

PiperOrigin-RevId: 249879320
Change-Id: Icd209fb77cc0dd4e5c49d7a9f2adc32bf88b4b71
2019-05-24 12:29:14 -07:00
Andrei Vagin a949133c4b gvisor: interrupt the sendfile system call if a task has been interrupted
sendfile can be called for a big range and it can require significant
amount of time to process it, so we need to handle task interrupts in
this system call.

PiperOrigin-RevId: 249781023
Change-Id: Ifc2ec505d74c06f5ee76f93b8d30d518ec2d4015
2019-05-23 23:21:13 -07:00
Andrei Vagin 409e8eea60 runsc/do: do a proper cleanup if a command failed due to internal errors
Fatalf calls os.Exit and a process exits without calling defer callbacks.

Should we do this for other runsc commands?

PiperOrigin-RevId: 249776310
Change-Id: If9d8b54d0ae37db443895906eb33bd9e9b600cc9
2019-05-23 22:28:38 -07:00
Ayush Ranjan 6240abb205 Added boilerplate code for ext4 fs.
Initialized BUILD with license
Mount is still unimplemented and is not meant to be
part of this CL. Rest of the fs interface is implemented.
Referenced the Linux kernel appropriately when needed

PiperOrigin-RevId: 249741997
Change-Id: Id1e4c7c9e68b3f6946da39896fc6a0c3dcd7f98c
2019-05-23 16:55:42 -07:00
Fabricio Voznika c091e62369 Set sticky bit to /tmp
This is generally done for '/tmp' to prevent accidental
deletion of files. More details here:
http://man7.org/linux/man-pages/man1/chmod.1.html#RESTRICTED_DELETION_FLAG_OR_STICKY_BIT

PiperOrigin-RevId: 249633207
Change-Id: I444a5b406fdef664f5677b2f20f374972613a02b
2019-05-23 06:48:00 -07:00
Fabricio Voznika 9006304dfe Initial support for bind mounts
Separate MountSource from Mount. This is needed to allow
mounts to be shared by multiple containers within the same
pod.

PiperOrigin-RevId: 249617810
Change-Id: Id2944feb7e4194951f355cbe6d4944ae3c02e468
2019-05-23 04:16:10 -07:00
Bhasker Hariharan 022bd0fd10 Fix the signature for gopark.
gopark's signature was changed from having a string reason to a
uint8.

See: 4d7cf3fedb

This broke execution tracing of the sentry.

Switching to the right signature makes tracing work again.

Updates #220

PiperOrigin-RevId: 249565311
Change-Id: If77fd276cecb37d4003c8222f6de510b8031a074
2019-05-22 18:57:15 -07:00