More tests will come, but it's worth getting what's done so far reviewed.
PiperOrigin-RevId: 219734531
Change-Id: If15ca6e6855e3d1cc28c83b5f9c3a72cb65b2e59
Otherwise the gofer's attach point may be different from sandbox when there
symlinks in the path.
PiperOrigin-RevId: 219730492
Change-Id: Ia9c4c2d16228c6a1a9e790e0cb673fd881003fe1
Fluentd configuration uses 'log' for the log message
while containerd uses 'msg'. Since we can't have a single
JSON format for both, add another log format and make
debug log configurable.
PiperOrigin-RevId: 219729658
Change-Id: I2a6afc4034d893ab90bafc63b394c4fb62b2a7a0
Updated error messages so that it doesn't print full Go struct representations
when running a new container in a sandbox. For example, this occurs frequently
when commands are not found when doing a 'kubectl exec'.
PiperOrigin-RevId: 219729141
Change-Id: Ic3a7bc84cd7b2167f495d48a1da241d621d3ca09
From https://golang.org/pkg/net/http/#Get:
"When err is nil, resp always contains a non-nil resp.Body. Caller should close
resp.Body when done reading from it."
PiperOrigin-RevId: 219658052
Change-Id: I556e88ac4f2c90cd36ab16cd3163d1a52afc32b7
With recent changes to 9P server, path walks are now safe inside
open, create, rename and setattr calls. To simplify the code, remove
the lazyopen=false mode that was used for bind mounts, and converge
all mounts to using lazy open.
PiperOrigin-RevId: 219508628
Change-Id: I073e7e1e2e9a9972d150eaf4cb29e553997a9b76
Use private futexes for performance and to align with other runtime uses.
PiperOrigin-RevId: 219422634
Change-Id: Ief2af5e8302847ea6dc246e8d1ee4d64684ca9dd
It can be occurred if two controllers are mounted together or if Uninstall() is called on a error path.
PiperOrigin-RevId: 218723886
Change-Id: I69d7a3c0685a7da38527ea8b7b301dbe96268285
This change also adds extensive testing to the p9 package via mocks. The sanity
checks and type checks are moved from the gofer into the core package, where
they can be more easily validated.
PiperOrigin-RevId: 218296768
Change-Id: I4fc3c326e7bf1e0e140a454cbacbcc6fd617ab55
Errors are shown as being ignored by assigning to the blank identifier.
PiperOrigin-RevId: 218103819
Change-Id: I7cc7b9d8ac503a03de5504ebdeb99ed30a531cf2
Added events for *ctl syscalls that may have multiple different commands.
For runsc, each syscall event is only logged once. For *ctl syscalls, use
the cmd as identifier, not only the syscall number.
PiperOrigin-RevId: 218015941
Change-Id: Ie3c19131ae36124861e9b492a7dbe1765d9e5e59
It's hard to resolve symlinks inside the sandbox because rootfs and mounts
may be read-only, forcing us to create mount points inside lower layer of an
overlay, **before** the volumes are mounted.
Since the destination must already be resolved outside the sandbox when creating
mounts, take this opportunity to rewrite the spec with paths resolved.
"runsc boot" will use the "resolved" spec to load mounts. In addition, symlink
traversals were disabled while mounting containers inside the sandbox.
It haven't been able to write a good test for it. So I'm relying on manual tests
for now.
PiperOrigin-RevId: 217749904
Change-Id: I7ac434d5befd230db1488446cda03300cc0751a9
We were closing the FD directly. If the test then created a new socket pair
with the same FD, in-flight RPCs would get directed to the new socket and break
the test.
Instead, we should use unet.Socket.Close(), which allows any in-flight RPCs to
finish.
PiperOrigin-RevId: 217608491
Change-Id: I8c5a76638899ba30f33ca976e6fac967fa0aadbf
Now containers run with "docker run -it" support control characters like ^C and
^Z.
This required refactoring our signal handling a bit. Signals delivered to the
"runsc boot" process are turned into loader.Signal calls with the appropriate
delivery mode. Previously they were always sent directly to PID 1.
PiperOrigin-RevId: 217566770
Change-Id: I5b7220d9a0f2b591a56335479454a200c6de8732
--pid allows specific processes to be signalled rather than the container root
process or all processes in the container. containerd needs to SIGKILL exec'd
processes that timeout and check whether processes are still alive.
PiperOrigin-RevId: 217547636
Change-Id: I2058ebb548b51c8eb748f5884fb88bad0b532e45
It has timed out running with kokoro a few times. I passes
consistently on my machine (200+ runsc). Increase the timeout
to see if it helps.
Failure: image_test.go:212: WaitForHTTP() timeout: Get http://localhost:32785/: dial tcp [::1]:32785: connect: connection refused
PiperOrigin-RevId: 217532428
Change-Id: Ibf860aecf537830bef832e436f2e804b3fc12f2d
This is one of the many tests that fails periodically, making Kokoro unstable.
PiperOrigin-RevId: 217528257
Change-Id: I2508ecf4d74d71b91feff1183544d61d7bd16995
Sometimes if we try to remove the cgroup directory too soon after killing the
sandbox we EBUSY. This CL adds a retry (up to 5 seconds) for removing.
Deflakes ChrootTest.
PiperOrigin-RevId: 217526909
Change-Id: I749bb172117e2298c9888ecad094072393b94810
We treat handle the boot process stdio separately from the application stdio
(which gets passed via flags), but we were still sending both to same place. As
a result, some logs that are written directly to os.Stderr by the boot process
were ending up in the application logs.
This CL starts sendind boot process stdio to the null device (since we don't
have any better options). The boot process is already configured to send all
logs (and panics) to the log file, so we won't miss anything important.
PiperOrigin-RevId: 217173020
Change-Id: I5ab980da037f34620e7861a3736ba09c18d73794
The spec command is analygous to the 'runc spec' command and allows for
the convenient creation of a config.json file for users that don't have
runc handy.
Change-Id: Ifdfec37e023048ea461c32da1a9042a45b37d856
PiperOrigin-RevId: 216907826
It's possible for Start() and Wait() calls to race, if the sandboxed
application is short-lived. If the application finishes before (or during) the
Wait RPC, then Wait will fail. In practice this looks like "connection
refused" or "EOF" errors when waiting for an RPC response.
This race is especially bad in tests, where we often run "true" inside a
sandbox.
This CL does a best-effort fix, by returning the sandbox exit status as the
container exit status. In most cases, these are the same.
This fixes the remaining flakes in runsc/container:container_test.
PiperOrigin-RevId: 216777793
Change-Id: I9dfc6e6ec885b106a736055bc7a75b2008dfff7a
This is a breaking change if you're using --debug-log-dir.
The fix is to replace it with --debug-log and add a '/' at
the end:
--debug-log-dir=/tmp/runsc ==> --debug-log=/tmp/runsc/
PiperOrigin-RevId: 216761212
Change-Id: I244270a0a522298c48115719fa08dad55e34ade1
This change introduces a new flags to create/run called
--user-log. Logs to this files are visible to users and
are meant to help debugging problems with their images
and containers.
For now only unsupported syscalls are sent to this log,
and only minimum support was added. We can build more
infrastructure around it as needed.
PiperOrigin-RevId: 216735977
Change-Id: I54427ca194604991c407d49943ab3680470de2d0
When setting Cmd.SysProcAttr.Ctty, the FD must be the FD of the controlling TTY
in the new process, not the current process. The ioctl call is made after
duping all FDs in Cmd.ExtraFiles, which may stomp on the old TTY FD.
This fixes the "bad address" flakes in runsc/container:container_test, although
some other flakes remain.
PiperOrigin-RevId: 216594394
Change-Id: Idfd1677abb866aa82ad7e8be776f0c9087256862
Currently, in the face of FileMem fragmentation and a large sendmsg or
recvmsg call, host sockets may pass > 1024 iovecs to the host, which
will immediately cause the host to return EMSGSIZE.
When we detect this case, use a single intermediate buffer to pass to
the kernel, copying to/from the src/dst buffer.
To avoid creating unbounded intermediate buffers, enforce message size
checks and truncation w.r.t. the send buffer size. The same
functionality is added to netstack unix sockets for feature parity.
PiperOrigin-RevId: 216590198
Change-Id: I719a32e71c7b1098d5097f35e6daf7dd5190eff7
Sandbox creation uses the limits and reservations configured in the
OCI spec and set cgroup options accordinly. Then it puts both the
sandbox and gofer processes inside the cgroup.
It also allows the cgroup to be pre-configured by the caller. If the
cgroup already exists, sandbox and gofer processes will join the
cgroup but it will not modify the cgroup with spec limits.
PiperOrigin-RevId: 216538209
Change-Id: If2c65ffedf55820baab743a0edcfb091b89c1019
We were previously only sending to the originator of the process group.
Integration test was changed to test this behavior. It fails without the
corresponding code change.
PiperOrigin-RevId: 216297263
Change-Id: I7e41cfd6bdd067f4b9dc215e28f555fb5088916f
Docker and Containerd both eat the boot processes stderr, making it difficult
to track down panics (which are always written to stderr).
This CL makes the boot process dup its debug log FD to stderr, so that panics
will be captured in the debug log, which is better than nothing.
This is the 3rd try at this CL. Previous attempts were foiled because Docker
expects the 'create' command to pass its stdio directly to the container, so
duping stderr in 'create' caused the applications stderr to go to the log file,
which breaks many applications (including our mysql test).
I added a new image_test that makes sure stdout and stderr are handled
correctly.
PiperOrigin-RevId: 215767328
Change-Id: Icebac5a5dcf39b623b79d7a0e2f968e059130059
Sandbox was setting chroot, but was not chaging the working
dir. Added test to ensure this doesn't happen in the future.
PiperOrigin-RevId: 215676270
Change-Id: I14352d3de64a4dcb90e50948119dc8328c9c15e1
This can happen if an error is encountered during Create() which causes the
container to be destroyed and set to state Stopped.
Without this transition, errors during Create get hidden by the later panic.
PiperOrigin-RevId: 215599193
Change-Id: Icd3f42e12c685cbf042f46b3929bccdf30ad55b0
We add an additional (2^3)-1=7 processes, but the code was only waiting for 3.
I switched back to Math.Pow format to make the arithmetic easier to inspect.
PiperOrigin-RevId: 215588140
Change-Id: Iccad4d6f977c1bfc5c4b08d3493afe553fe25733
Docker and containerd do not expose runsc's stderr, so tracking down sentry
panics can be painful.
If we have a debug log file, we should send panics (and all stderr data) to the
log file.
PiperOrigin-RevId: 215585559
Change-Id: I3844259ed0cd26e26422bcdb40dded302740b8b6
We were previously using the sandbox process's stdio as the root container's
stdio. This makes it difficult/impossible to distinguish output application
output from sandbox output, such as panics, which are always written to stderr.
Also close the console socket when we are done with it.
PiperOrigin-RevId: 215585180
Change-Id: I980b8c69bd61a8b8e0a496fd7bc90a06446764e0
Terminal support in runsc relies on host tty file descriptors that are imported
into the sandbox. Application tty ioctls are sent directly to the host fd.
However, those host tty ioctls are associated in the host kernel with a host
process (in this case runsc), and the host kernel intercepts job control
characters like ^C and send signals to the host process. Thus, typing ^C into a
"runsc exec" shell will send a SIGINT to the runsc process.
This change makes "runsc exec" handle all signals, and forward them into the
sandbox via the "ContainerSignal" urpc method. Since the "runsc exec" is
associated with a particular container process in the sandbox, the signal must
be associated with the same container process.
One big difficulty is that the signal should not necessarily be sent to the
sandbox process started by "exec", but instead must be sent to the foreground
process group for the tty. For example, we may exec "bash", and from bash call
"sleep 100". A ^C at this point should SIGINT sleep, not bash.
To handle this, tty files inside the sandbox must keep track of their
foreground process group, which is set/get via ioctls. When an incoming
ContainerSignal urpc comes in, we look up the foreground process group via the
tty file. Unfortunately, this means we have to expose and cache the tty file in
the Loader.
Note that "runsc exec" now handles signals properly, but "runs run" does not.
That will come in a later CL, as this one is complex enough already.
Example:
root@:/usr/local/apache2# sleep 100
^C
root@:/usr/local/apache2# sleep 100
^Z
[1]+ Stopped sleep 100
root@:/usr/local/apache2# fg
sleep 100
^C
root@:/usr/local/apache2#
PiperOrigin-RevId: 215334554
Change-Id: I53cdce39653027908510a5ba8d08c49f9cf24f39
Some tests check current capabilities and re-run the tests as root inside
userns if required capabibilities are missing. It was checking for
CAP_SYS_ADMIN only, CAP_SYS_CHROOT is also required now.
PiperOrigin-RevId: 214949226
Change-Id: Ic81363969fa76c04da408fae8ea7520653266312
It's easier to manage a single map with processes that we're interested
to track. This will make the next change to clean up the map on destroy
easier.
PiperOrigin-RevId: 214894210
Change-Id: I099247323a0487cd0767120df47ba786fac0926d
We already forward TCSETS and TCSETSW. TCSETSF is roughly equivalent but
discards pending input.
The filters were relaxed to allow host ioctls with TCSETSF argument.
This fixes programs like "passwd" that prevent user input from being displayed
on the terminal.
Before:
root@b8a0240fc836:/# passwd
Enter new UNIX password: 123
Retype new UNIX password: 123
passwd: password updated successfully
After:
root@ae6f5dabe402:/# passwd
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
PiperOrigin-RevId: 214869788
Change-Id: I31b4d1373c1388f7b51d0f2f45ce40aa8e8b0b58
In order to implement kill --all correctly, the Sentry needs
to track all tasks that belong to a given container. This change
introduces ContainerID to the task, that gets inherited by all
children. 'kill --all' then iterates over all tasks comparing the
ContainerID field to find all processes that need to be signalled.
PiperOrigin-RevId: 214841768
Change-Id: I693b2374be8692d88cc441ef13a0ae34abf73ac6
This makes the flow slightly simpler (no need to call
Loader.SetRootContainer). And this is required change to tag
tasks with container ID inside the Sentry.
PiperOrigin-RevId: 214795210
Change-Id: I6ff4af12e73bb07157f7058bb15fd5bb88760884
This was done so it's easier to add more functionality
to this file for other tests.
PiperOrigin-RevId: 214782043
Change-Id: I1f38b9ee1219b3ce7b789044ada8e52bdc1e6279
This CL changes the semantics of the "--file-access" flag so that it only
affects the root filesystem. The default remains "exclusive" which is the
common use case, as neither Docker nor K8s supports sharing the root.
Keeping the root fs as "exclusive" means that the fs-intensive work done during
application startup will mostly be cacheable, and thus faster.
Non-root bind mounts will always be shared.
This CL also removes some redundant FSAccessType validations. We validate this
flag in main(), so we can assume it is valid afterwards.
PiperOrigin-RevId: 214359936
Change-Id: I7e75d7bf52dbd7fa834d0aacd4034868314f3b51
Each container has associated metadata (particularly the container status) that
is manipulated by various runsc commands. This metadata is stored in a file
identified by the container id.
Different runsc processes may manipulate the same container metadata, and each
will read/write to the metadata file.
This CL adds a file lock per container which must be held when reading the
container metadata file, and when modifying and writing the container metadata.
PiperOrigin-RevId: 214019179
Change-Id: Ice4390ad233bc7f216c9a9a6cf05fb456c9ec0ad
I've made several attempts to create a test, but the lack of
permission from the test user makes it nearly impossible to
test anything useful.
PiperOrigin-RevId: 213922174
Change-Id: I5b502ca70cb7a6645f8836f028fb203354b4c625
Destroy flushes dirent references, which triggers many async close operations.
We must wait for those to finish before returning from Destroy, otherwise we
may kill the gofer, causing a cascade of failing RPCs and leading to an
inconsistent FS state.
PiperOrigin-RevId: 213884637
Change-Id: Id054b47fc0f97adc5e596d747c08d3b97a1d1f71
The issue with the previous change was that the stdin/stdout/stderr passed to
the sentry were dup'd by host.ImportFile. This left a dangling FD that by never
closing caused containerd to timeout waiting on container stop.
PiperOrigin-RevId: 213753032
Change-Id: Ia5e4c0565c42c8610d3b59f65599a5643b0901e4
This method will:
1. Stop the container process if it is still running.
2. Unmount all sanadbox-internal mounts for the container.
3. Delete the contaner root directory inside the sandbox.
Destroy is idempotent, and safe to call concurrantly.
This fixes a bug where after stopping a container, we cannot unmount the
container root directory on the host. This bug occured because the sandbox
dirent cache was holding a dirent with a host fd corresponding to a file inside
the container root on the host. The dirent cache did not know that the
container had exited, and kept the FD open, preventing us from unmounting on
the host.
Now that we unmount (and flush) all container mounts inside the sandbox, any
host FDs donated by the gofer will be closed, and we can unmount the container
root on the host.
PiperOrigin-RevId: 213737693
Change-Id: I28c0ff4cd19a08014cdd72fec5154497e92aacc9
Capabilities.Set() adds capabilities,
but doesn't remove existing ones that might have been loaded. Fixed
the code and added tests.
PiperOrigin-RevId: 213726369
Change-Id: Id7fa6fce53abf26c29b13b9157bb4c6616986fba
`docker run --cpuset-cpus=/--cpus=` will generate cpu resource info in config.json
(runtime spec file). When nginx worker_connections is configured as auto, the worker is
generated according to the number of CPUs. If the cgroup is already set on the host, but
it is not displayed correctly in the sandbox, performance may be degraded.
This patch can get cpus info from spec file and apply to sentry on bootup, so the
/proc/cpuinfo can show the correct cpu numbers. `lscpu` and other commands rely on
`/sys/devices/system/cpu/online` are also affected by this patch.
e.g.
--cpuset-cpus=2,3 -> cpu number:2
--cpuset-cpus=4-7 -> cpu number:4
--cpus=2.8 -> cpu number:3
--cpus=0.5 -> cpu number:1
Change-Id: Ideb22e125758d4322a12be7c51795f8018e3d316
PiperOrigin-RevId: 213685199
For my own sanitity when thinking about possible transitions and state.
PiperOrigin-RevId: 213559482
Change-Id: I25588c86cf6098be4eda01f4e7321c102ceef33c
Reap children more systematically in container tests. Previously,
container_test was taking ~5 mins to run because constainer.Destroy()
would timeout waiting for the sandbox process to exit. Now the test
running in less than a minute.
Also made the contract around Container and Sandbox destroy clearer.
PiperOrigin-RevId: 213527471
Change-Id: Icca84ee1212bbdcb62bdfc9cc7b71b12c6d1688d
This makes `runsc wait` behave more like waitpid()/wait4() in that:
- Once a process has run to completion, you can wait on it and get its exit
code.
- Processes not waited on will consume memory (like a zombie process)
PiperOrigin-RevId: 213358916
Change-Id: I5b5eca41ce71eea68e447380df8c38361a4d1558
This CL:
1) Fix `runsc wait`, it now also works after the container exits;
2) Generate correct container state in Load;
2) Make sure `Destory` cleanup everything before successfully return.
PiperOrigin-RevId: 212900107
Change-Id: Ie129cbb9d74f8151a18364f1fc0b2603eac4109a
This is different from the existing -pid-file flag, which saves a host pid.
PiperOrigin-RevId: 212713968
Change-Id: I2c486de8dd5cfd9b923fb0970165ef7c5fc597f0
We were previously openining the platform device (i.e. /dev/kvm) inside the
platfrom constructor (i.e. kvm.New). This requires that we have RW access to
the platform device when constructing the platform.
However, now that the runsc sandbox process runs as user "nobody", it is not
able to open the platform device.
This CL changes the kvm constructor to take the platform device FD, rather than
opening the device file itself. The device file is opened outside of the
sandbox and passed to the sandbox process.
PiperOrigin-RevId: 212505804
Change-Id: I427e1d9de5eb84c84f19d513356e1bb148a52910
Inside the chroot, we run as user nobody, so all mounted files and directories
must be accessible to all users.
PiperOrigin-RevId: 212284805
Change-Id: I705e0dbbf15e01e04e0c7f378a99daffe6866807
We must use a context.Context with a Root Dirent that corresponds to the
container's chroot. Previously we were using the root context, which does not
have a chroot.
Getting the correct context required refactoring some of the path-lookup code.
We can't lookup the path without a context.Context, which requires
kernel.CreateProcArgs, which we only get inside control.Execute. So we have to
do the path lookup much later than we previously were.
PiperOrigin-RevId: 212064734
Change-Id: I84a5cfadacb21fd9c3ab9c393f7e308a40b9b537
It was used before gofer was implemented and it's not
supported anymore.
BREAKING CHANGE: proxy-shared and proxy-exclusive options
are now: shared and exclusive.
PiperOrigin-RevId: 212017643
Change-Id: If029d4073fe60583e5ca25f98abb2953de0d78fd
We construct a dir with the executable bind-mounted at /exe, and proc mounted
at /proc. Runsc now executes the sandbox process inside this chroot, thus
limiting access to the host filesystem. The mounts and chroot dir are removed
when the sandbox is destroyed.
Because this requires bind-mounts, we can only do the chroot if we have
CAP_SYS_ADMIN.
PiperOrigin-RevId: 211994001
Change-Id: Ia71c515e26085e0b69b833e71691830148bc70d1
Docker and containerd do not expose runsc's stderr, so tracking down sentry
panics can be painful.
If we have a debug log file, we should send panics (and all stderr data) to the
log file.
PiperOrigin-RevId: 211992321
Change-Id: I5f0d2f45f35c110a38dab86bafc695aaba42f7a3
Now, we can kill individual containers rather than the entire sandbox.
PiperOrigin-RevId: 211748106
Change-Id: Ic97e91db33d53782f838338c4a6d0aab7a313ead
When no capabilities are specified in exec, use the
container's capabilities to match runc's behavior.
PiperOrigin-RevId: 211735186
Change-Id: Icd372ed64410c81144eae94f432dffc9fe3a86ce
With multi-gofers, bind mounts in sub-containers should
just work. Removed restrictions and added test. There are
also a few cleanups along the way, e.g. retry unmounting
in case cleanup races with gofer teardown.
PiperOrigin-RevId: 211699569
Change-Id: Ic0a69c29d7c31cd7e038909cc686c6ac98703374
Remove GetExecutablePath (the non-internal version). This makes path handling
more consistent between exec, root, and child containers.
The new getExecutablePath now uses MountNamespace.FindInode, which is more
robust than Walking the Dirent tree ourselves.
This also removes the last use of lstat(2) in the sentry, so that can be
removed from the filters.
PiperOrigin-RevId: 211683110
Change-Id: Ic8ec960fc1c267aa7d310b8efe6e900c88a9207a
When starting a sandbox without direct file or network access, we create an
empty user namespace and run the sandbox in there. However, the root user in
that namespace is still mapped to the root user in the parent namespace.
This CL maps the "nobody" user from the parent namespace into the child
namespace, and runs the sandbox process as user "nobody" inside the new
namespace.
PiperOrigin-RevId: 211572223
Change-Id: I1b1f9b1a86c0b4e7e5ca7bc93be7d4887678bab6
This is a prereq for running the sandbox process as user "nobody", when it may
not have permissions to open these files.
Instead, we must open then before starting the sandbox process, and pass them
by FD.
The specutils.ReadSpecFromFile method was fixed to always seek to the beginning
of the file before reading. This allows Files from the same FD to be read
multiple times, as we do in the boot command when the apply-caps flag is set.
Tested with --network=host.
PiperOrigin-RevId: 211570647
Change-Id: I685be0a290aa7f70731ebdce82ebc0ebcc9d475c
GoCompile: missing strict dependencies:
/tmpfs/tmp/bazel/sandbox/linux-sandbox/1744/execroot/__main__/runsc/main.go:
import of "gvisor.googlesource.com/gvisor/runsc/specutils"
This was broken in 210995199.
PiperOrigin-RevId: 211086595
Change-Id: I166b9a2ed8e4d6e624def944b720190940d7537c
This is a prereq for running the sandbox process as user "nobody", when it may
not have permissions to open these files.
Instead, we must open then before starting the sandbox process, and pass them
by FD.
PiperOrigin-RevId: 210995199
Change-Id: I715875a9553290b4a49394a8fcd93be78b1933dd
This is to keep it consistent with other test, and
it's easier to maintain them in single file.
Also increase python test timeout to deflake it.
PiperOrigin-RevId: 210575042
Change-Id: I2ef5bcd5d97c08549f0c5f645c4b694253ef0b4d
This is to troubleshoot problems with a hung process that is
not responding to 'runsc debug --stack' command.
PiperOrigin-RevId: 210483513
Change-Id: I4377b210b4e51bc8a281ad34fd94f3df13d9187d
For readonly filesystems specified via relative path, we were forgetting to
mount relative to the container's bundle directory.
PiperOrigin-RevId: 210483388
Change-Id: I84809fce4b1f2056d0e225547cb611add5f74177
When revalidating a Dirent, if the inode id is the same, then we don't need to
throw away the entire Dirent. We can just update the unstable attributes in
place.
If the inode id has changed, then the remote file has been deleted or moved,
and we have no choice but to throw away the dirent we have a look up another.
In this case, we may still end up losing a mounted dirent that is a child of
the revalidated dirent. However, that seems appropriate here because the entire
mount point has been pulled out from underneath us.
Because gVisor's overlay is at the Inode level rather than the Dirent level, we
must pass the parent Inode and name along with the Inode that is being
revalidated.
PiperOrigin-RevId: 210431270
Change-Id: I705caef9c68900234972d5aac4ae3a78c61c7d42
Previously, we were only using the host inode id as the QID path. But the host
filesystem can have multiple devices with conflicting inode ids. This resulted
in duplicate inode ids in the sentry.
This CL generates a unique QID for each <host inode, host device> pair.
PiperOrigin-RevId: 210424813
Change-Id: I16d106f61c7c8f910c0da4ceec562a010ffca2fb
Now each container gets its own dedicated gofer that is chroot'd to the
rootfs path. This is done to add an extra layer of security in case the
gofer gets compromised.
PiperOrigin-RevId: 210396476
Change-Id: Iba21360a59dfe90875d61000db103f8609157ca0
This CL adds terminal support for "docker exec". We previously only supported
consoles for the container process, but not exec processes.
The SYS_IOCTL syscall was added to the default seccomp filter list, but only
for ioctls that get/set winsize and termios structs. We need to allow these
ioctl for all containers because it's possible to run "exec -ti" on a
container that was started without an attached console, after the filters
have been installed.
Note that control-character signals are still not properly supported.
Tested with:
$ docker run --runtime=runsc -it alpine
In another terminial:
$ docker exec -it <containerid> /bin/sh
PiperOrigin-RevId: 210185456
Change-Id: I6d2401e53a7697bb988c120a8961505c335f96d9
Previously, runsc improperly attempted to find an executable in the container's
PATH.
We now search the PATH via the container's fsgofer rather than the host FS,
eliminating the confusing differences between paths on the host and within a
container.
PiperOrigin-RevId: 210159488
Change-Id: I228174dbebc4c5356599036d6efaa59f28ff28d2
This is used when '--overlay=true' to guarantee writes are not sent to gofer.
PiperOrigin-RevId: 210116288
Change-Id: I7616008c4c0e8d3668e07a205207f46e2144bf30
Removed syscalls that are only used by whitelistfs
which has its own set of filters.
PiperOrigin-RevId: 209967259
Change-Id: Idb2e1b9d0201043d7cd25d96894f354729dbd089
The bug was caused by os.File's finalizer, which closes the file. Because
fsgofer.serve() was passed a file descriptor as an int rather than a os.File,
callers would pass os.File.Fd(), and the os.File would go out of scope. Thus,
the file would get GC'd and finalized nondeterministically, causing failures
when the file was used.
PiperOrigin-RevId: 209861834
Change-Id: Idf24d5c1f04c9b28659e62c97202ab3b4d72e994
UDS has a lower size limit than regular files. When running under bazel
this limit is exceeded. Test was changed to always mount /tmp and use
it for the test.
PiperOrigin-RevId: 209717830
Change-Id: I1dbe19fe2051ffdddbaa32b188a9167f446ed193