Commit Graph

1534 Commits

Author SHA1 Message Date
Jamie Liu 48961d27a8 Move //pkg/sentry/memutil to //pkg/memutil.
PiperOrigin-RevId: 252124156
2019-06-07 14:52:27 -07:00
Kevin Krakauer 8afbd974da Address Ian's comments.
Change-Id: I7445033b1970cbba3f2ed0682fe520dce02d8fad
2019-06-07 12:54:53 -07:00
Adin Scannell e5fb3aab12 BUILD: Use runsc to generate version
This also ensures BUILD files are correctly formatted.

PiperOrigin-RevId: 251990267
2019-06-06 22:09:55 -07:00
Jamie Liu c933f3eede Change visibility of //pkg/sentry/time.
PiperOrigin-RevId: 251965598
2019-06-06 17:58:55 -07:00
Fabricio Voznika 2e43dcb26b Add alsologtostderr option
When set sends log messages to the error log:

sudo ./runsc --logtostderr do ls
I0531 17:59:58.105064  144564 x:0] ***************************
I0531 17:59:58.105087  144564 x:0] Args: [runsc --logtostderr do ls]
I0531 17:59:58.105112  144564 x:0] PID: 144564
I0531 17:59:58.105125  144564 x:0] UID: 0, GID: 0
[...]

PiperOrigin-RevId: 251964377
2019-06-06 17:50:07 -07:00
Jamie Liu 9ea248489b Cap initial usermem.CopyStringIn buffer size.
Almost (?) all uses of CopyStringIn are via linux.copyInPath(), which
passes maxlen = linux.PATH_MAX = 4096. Pre-allocating a buffer of this
size is measurably inefficient in most cases: most paths will not be
this long, 4 KB is a lot of bytes to zero, and as of this writing the Go
runtime allocator maps only two 4 KB objects to each 8 KB span,
necessitating a call to runtime.mcache.refill() on ~every other call.
Limit the initial buffer size to 256 B instead, and geometrically
reallocate if necessary.

PiperOrigin-RevId: 251960441
2019-06-06 17:22:00 -07:00
Rahat Mahmood 315cf9a523 Use common definition of SockType.
SockType isn't specific to unix domain sockets, and the current
definition basically mirrors the linux ABI's definition.

PiperOrigin-RevId: 251956740
2019-06-06 17:00:27 -07:00
Ian Lewis 6a4c006564 Add the gVisor gitter badge to the README
Moves the build badge to just below the logo and adds the gitter badge next to
it for consistency.

PiperOrigin-RevId: 251956383
2019-06-06 16:58:29 -07:00
Fabricio Voznika 02ab1f187c Copy up parent when binding UDS on overlayfs
Overlayfs was expecting the parent to exist when bind(2)
was called, which may not be the case. The fix is to copy
the parent directory to the upper layer before binding
the UDS.

There is not good place to add tests for it. Syscall tests
would be ideal, but it's hard to guarantee that the
directory where the socket is created hasn't been touched
before (and thus copied the parent to the upper layer).
Added it to runsc integration tests for now. If it turns
out we have lots of these kind of tests, we can consider
moving them somewhere more appropriate.

PiperOrigin-RevId: 251954156
2019-06-06 16:45:51 -07:00
Jamie Liu b3f104507d "Implement" mbind(2).
We still only advertise a single NUMA node, and ignore mempolicy
accordingly, but mbind() at least now succeeds and has effects reflected
by get_mempolicy().

Also fix handling of nodemasks: round sizes to unsigned long (as
documented and done by Linux), and zero trailing bits when copying them
out.

PiperOrigin-RevId: 251950859
2019-06-06 16:29:46 -07:00
Jamie Liu a26043ee53 Implement reclaim-driven MemoryFile eviction.
PiperOrigin-RevId: 251950660
2019-06-06 16:27:55 -07:00
Fabricio Voznika 93aa7d1167 Remove tmpfs restriction from test
runsc supports UDS over gofer mounts and tmpfs is
not needed for this test.

PiperOrigin-RevId: 251944870
2019-06-06 15:56:20 -07:00
Rahat Mahmood 2d2831e354 Track and export socket state.
This is necessary for implementing network diagnostic interfaces like
/proc/net/{tcp,udp,unix} and sock_diag(7).

For pass-through endpoints such as hostinet, we obtain the socket
state from the backend. For netstack, we add explicit tracking of TCP
states.

PiperOrigin-RevId: 251934850
2019-06-06 15:04:47 -07:00
Fabricio Voznika bf0b1b9d76 Add overlay dimension to FS related syscall tests
PiperOrigin-RevId: 251929314
2019-06-06 14:38:47 -07:00
Rahat Mahmood 8b8bd8d5b2 Try increase listen backlog.
PiperOrigin-RevId: 251928000
2019-06-06 14:32:04 -07:00
Googler 81eafb2c5e Internal change.
PiperOrigin-RevId: 251902567
2019-06-06 12:29:12 -07:00
Fabricio Voznika 720ec3590d Send error message to docker/kubectl exec on failure
Containerd uses the last error message sent to the log to
print as failure cause for create/exec. This required a
few changes in the logging logic for runsc:
  - cmd.Errorf/Fatalf: now writes a message with 'error'
    level to containerd log, in addition to stderr and
    debug logs, like before.
  - log.Infof/Warningf/Fatalf: are not sent to containerd
    log anymore. They are mostly used for debugging and not
    useful to containerd. In most cases, --debug-log is
    enabled and this avoids the logs messages from being
    duplicated.
  - stderr is not used as default log destination anymore.
    Some commands assume stdio is for the container/process
    running inside the sandbox and it's better to never use
    it for logging. By default, logs are supressed now.

PiperOrigin-RevId: 251881815
2019-06-06 10:49:43 -07:00
Bhasker Hariharan 85be01b42d Add multi-fd support to fdbased endpoint.
This allows an fdbased endpoint to have multiple underlying fd's from which
packets can be read and dispatched/written to.

This should allow for higher throughput as well as better scalability of the
network stack as number of connections increases.

Updates #231

PiperOrigin-RevId: 251852825
2019-06-06 08:07:02 -07:00
Andrei Vagin 79f7cb6c1c netstack/sniffer: log GSO attributes
PiperOrigin-RevId: 251788534
2019-06-05 22:51:53 -07:00
Michael Pratt 57772db2e7 Shutdown host sockets on internal shutdown
This is required to make the shutdown visible to peers outside the
sandbox.

The readClosed / writeClosed fields were dropped, as they were
preventing a shutdown socket from reading the remainder of queued bytes.
The host syscalls will return the appropriate errors for shutdown.

The control message tests have been split out of socket_unix.cc to make
the (few) remaining tests accessible to testing inherited host UDS,
which don't support sending control messages.

Updates #273

PiperOrigin-RevId: 251763060
2019-06-05 18:40:37 -07:00
Andrei Vagin a12848ffeb netstack/tcp: fix calculating a number of outstanding packets
In case of GSO, a segment can container more than one packet
and we need to use the pCount() helper to get a number of packets.

PiperOrigin-RevId: 251743020
2019-06-05 16:30:45 -07:00
Chris Kuiper d18bb4f38a Adjust route when looping multicast packets
Multicast packets are special in that their destination address does not
identify a specific interface. When sending out such a packet the multicast
address is the remote address, but for incoming packets it is the local
address. Hence, when looping a multicast packet, the route needs to be
tweaked to reflect this.

PiperOrigin-RevId: 251739298
2019-06-05 16:08:29 -07:00
Ian Gudger c08fcaa364 Give test instantiations meaningful names.
PiperOrigin-RevId: 251737069
2019-06-05 15:57:27 -07:00
Nicolas Lacasse 3b950344b3 Bump googletest version.
PiperOrigin-RevId: 251716439
2019-06-05 14:17:22 -07:00
Michael Pratt d3ed9baac0 Implement dumpability tracking and checks
We don't actually support core dumps, but some applications want to
get/set dumpability, which still has an effect in procfs.

Lack of support for set-uid binaries or fs creds simplifies things a
bit.

As-is, processes started via CreateProcess (i.e., init and sentryctl
exec) have normal dumpability. I'm a bit torn on whether sentryctl exec
tasks should be dumpable, but at least since they have no parent normal
UID/GID checks should protect them.

PiperOrigin-RevId: 251712714
2019-06-05 14:00:13 -07:00
Adin Scannell cecb71dc37 Building containerd with go modules is broken, use GOPATH.
PiperOrigin-RevId: 251583707
2019-06-04 23:09:18 -07:00
Bhasker Hariharan e0fb921205 Fix data race in synRcvdState.
When checking the length of the acceptedChan we should hold the
endpoint mutex otherwise a syn received while the listening socket
is being closed can result in a data race where the cleanupLocked
routine sets acceptedChan to nil while a handshake goroutine
in progress could try and check it at the same time.

PiperOrigin-RevId: 251537697
2019-06-04 16:17:24 -07:00
Yong He 7398f013f0 Drop one dirent reference after referenced by file
When pipe is created, a dirent of pipe will be
created and its initial reference is set as 0.
Cause all dirent will only be destroyed when
the reference decreased to -1, so there is already
a 'initial reference' of dirent after it created.
For destroying dirent after all reference released,
the correct way is to drop the 'initial reference'
once someone hold a reference to the dirent, such
as fs.NewFile, otherwise the reference of dirent
will stay 0 all the time, and will cause memory
leak of dirent.
Except pipe, timerfd/eventfd/epoll has the same
problem

Here is a simple case to create memory leak of dirent
for pipe/timerfd/eventfd/epoll in C langange, after
run the case, pprof the runsc process, you will
find lots dirents of pipe/timerfd/eventfd/epoll not
freed:

int main(int argc, char *argv[])
{
	int i;
	int n;
	int pipefd[2];

	if (argc != 3) {
		printf("Usage: %s epoll|timerfd|eventfd|pipe <iterations>\n", argv[0]);
	}

	n = strtol(argv[2], NULL, 10);

	if (strcmp(argv[1], "epoll") == 0) {
		for (i = 0; i < n; ++i)
			close(epoll_create(1));
	} else if (strcmp(argv[1], "timerfd") == 0) {
		for (i = 0; i < n; ++i)
			close(timerfd_create(CLOCK_REALTIME, 0));
	} else if (strcmp(argv[1], "eventfd") == 0) {
		for (i = 0; i < n; ++i)
			close(eventfd(0, 0));
	} else if (strcmp(argv[1], "pipe") == 0) {
		for (i = 0; i < n; ++i)
			if (pipe(pipefd) == 0) {
				close(pipefd[0]);
				close(pipefd[1]);
			}
	}

	printf("%s %s test finished\r\n",argv[1],argv[2]);
	return 0;
}

Change-Id: Ia1b8a1fb9142edb00c040e44ec644d007f81f5d2
PiperOrigin-RevId: 251531096
2019-06-04 15:40:23 -07:00
Yoshiaki Tamura 0fcfc54e65 Create README.
Create README with license information of the logos.

Change-Id: Ida8aaf99d815b295e5d5da226fdf2976bd507312
2019-06-04 15:31:31 -07:00
Adin Scannell 6f92038ce0 Use github directory if it exists.
Unfortunately, kokoro names the top-level directory per the SCM type. This
means there's no way to make the job names match; we simply need to probe for
the existence of the correct directory.

PiperOrigin-RevId: 251519409
2019-06-04 14:43:24 -07:00
Nicolas Lacasse 0c292cdaab Remove the Dirent field from Pipe.
Dirents are ref-counted, but Pipes are not. Holding a Dirent inside of a Pipe
raises difficult questions about the lifecycle of the Pipe and Dirent.

Fortunately, we can side-step those questions by removing the Dirent field from
Pipe entirely. We only need the Dirent when constructing fs.Files (which are
ref-counted), and in GetFile (when a Dirent is passed to us anyways).

PiperOrigin-RevId: 251497628
2019-06-04 12:58:56 -07:00
Adin Scannell 7436ea247b Fix Kokoro revision and 'go get usage'
As a convenience for debugging, also factor the scripts such that
can be run without Kokoro. In the future, this may be used to add
additional presubmit hooks that run without Kokoro.

PiperOrigin-RevId: 251474868
2019-06-04 11:07:27 -07:00
Adin Scannell f520d0d585 Resolve impossible dependencies.
PiperOrigin-RevId: 251377523
2019-06-03 23:00:40 -07:00
Andrei Vagin 90a116890f gvisor/sock/unix: pass creds when a message is sent between unconnected sockets
and don't report a sender address if it doesn't have one

PiperOrigin-RevId: 251371284
2019-06-03 21:48:19 -07:00
Andrei Vagin 00f8663887 gvisor/fs: return a proper error from FileWriter.Write in case of a short-write
The io.Writer contract requires that Write writes all available
bytes and does not return short writes. This causes errors with
io.Copy, since our own Write interface does not have this same
contract.

PiperOrigin-RevId: 251368730
2019-06-03 21:26:01 -07:00
Fabricio Voznika f1aee6a7ad Refactor container FS setup
No change in functionaly. Added containerMounter object
to keep state while the mounts are processed. This will
help upcoming changes to share mounts per-pod.

PiperOrigin-RevId: 251350096
2019-06-03 18:20:57 -07:00
Fabricio Voznika d28f71adcf Remove 'clearStatus' option from container.Wait*PID()
clearStatus was added to allow detached execution to wait
on the exec'd process and retrieve its exit status. However,
it's not currently used. Both docker and gvisor-containerd-shim
wait on the "shim" process and retrieve the exit status from
there. We could change gvisor-containerd-shim to use waits, but
it will end up also consuming a process for the wait, which is
similar to having the shim process.

Closes #234

PiperOrigin-RevId: 251349490
2019-06-03 18:16:09 -07:00
Adin Scannell 18e6e63503 Allow specification of origin in cloudbuild.
PiperOrigin-RevId: 251347966
2019-06-03 18:05:59 -07:00
Bhasker Hariharan bfe3220992 Delete debug log lines left by mistake.
Updates #236

PiperOrigin-RevId: 251337915
2019-06-03 17:00:18 -07:00
Michael Pratt 6e1f51f3eb Remove duplicate socket tests
socket_unix_abstract.cc: Subset of socket_abstract.cc
socket_unix_filesystem.cc: Subset of socket_filesystem.cc

PiperOrigin-RevId: 251297117
2019-06-03 13:31:47 -07:00
Michael Pratt 4555f6adc8 Update straggling copyright holder
Updates #209

PiperOrigin-RevId: 251289513
2019-06-03 12:51:55 -07:00
Michael Pratt 955685845e Remove spurious period
PiperOrigin-RevId: 251288885
2019-06-03 12:48:24 -07:00
Andrei Vagin 8e926e3f74 gvisor: validate a new map region in the mremap syscall
Right now, mremap allows to remap a memory region over MaxUserAddress,
this means that we can change the stub region.

PiperOrigin-RevId: 251266886
2019-06-03 10:59:46 -07:00
Adin Scannell 216da0b733 Add tooling for Go-compatible branch.
The WORKSPACE go_repositories can be generated from a standard go.mod file. Add
the necessary gazelle hooks to do so, and include a test that sanity checks
there are no changes. This go.mod file will be used in a subsequent commit to
generate a go gettable branch of the repository.

This commit also adds a tools/go_branch.sh script, which given an existing go
branch in the repository, will add an additional synthetic change to the branch
bringing it up-to-date with HEAD.

As a final step, a cloudbuild script is included, which can be used to automate
the process for every change pushed to the repository. This may be used after
an initial go branch is pushed, but this is manual process.

PiperOrigin-RevId: 251095016
2019-06-01 23:10:43 -07:00
Bhasker Hariharan 3577a4f691 Disable certain tests that are flaky under race detector.
PiperOrigin-RevId: 250976665
2019-05-31 16:19:49 -07:00
Bhasker Hariharan 033f96cc93 Change segment queue limit to be of fixed size.
Netstack sets the unprocessed segment queue size to match the receive
buffer size. This is not required as this queue only needs to hold enough
for a short duration before the endpoint goroutine can process it.

Updates #230

PiperOrigin-RevId: 250976323
2019-05-31 16:17:33 -07:00
Kevin Krakauer d58eb9ce82 Add basic iptables structures to netstack.
Change-Id: Ib589906175a59dae315405a28f2d7f525ff8877f
2019-05-31 16:14:04 -07:00
Adin Scannell 132bf68de4 Switch to new dedicated RBE project.
PiperOrigin-RevId: 250970783
2019-05-31 15:43:30 -07:00
Nicolas Lacasse 6f73d79c32 Simplify overlayBoundEndpoint.
There is no reason to do the recursion manually, since
Inode.BoundEndpoint will do it for us.

PiperOrigin-RevId: 250794903
2019-05-30 17:20:20 -07:00
Fabricio Voznika 38de91b028 Add build guard to files using go:linkname
Funcion signatures are not validated during compilation. Since
they are not exported, they can change at any time. The guard
ensures that they are verified at least on every version upgrade.

PiperOrigin-RevId: 250733742
2019-05-30 12:09:39 -07:00