Currently, shm.Registry.FindByID will return Shm instances without taking an
additional reference on them, making it possible for them to disappear.
More explicitly handle references. All callers hold a reference for the
duration that they hold the instance. Registry.shms may transitively hold Shms
with no references, so it must TryIncRef to determine if they are still valid.
PiperOrigin-RevId: 288314529
Threadgroups already know their TTY (if they have one), which now contains the
TTY Index, and is returned in the Processes() call.
PiperOrigin-RevId: 284263850
Initialize the VDSO "os" and "arch" fields explicitly,
or the VDSO load process would failed on arm64 platform.
Signed-off-by: Haibo Xu <haibo.xu@arm.com>
Change-Id: Ic6768df88e43cd7c7956eb630511672ae11ac52f
When an interpreter script is opened with O_CLOEXEC and the resulting fd is
passed into execveat, an ENOENT error should occur (the script would otherwise
be inaccessible to the interpreter). This matches the actual behavior of
Linux's execveat.
PiperOrigin-RevId: 277306680
This change simplifies the function signatures of functions related to loading
executables, such as LoadTaskImage, Load, loadBinary.
PiperOrigin-RevId: 276821187
Use fd.next to store the iteration start position, which can be used to accelerate allocating new FDs.
And adding the corresponding gtest benchmark to measure performance.
@tanjianfeng
COPYBARA_INTEGRATE_REVIEW=https://github.com/google/gvisor/pull/758 from DarcySail:master 96685ec7886dfe1a64988406831d3bc002b438cc
PiperOrigin-RevId: 276351250
* Pulls common functionality (IO and locking on open) into pipe_util.go.
* Adds pipe/vfs.go, which implements a subset of vfs.FileDescriptionImpl.
A subsequent change will add support for pipes in memfs.
PiperOrigin-RevId: 275322385
The signalfd descriptors otherwise always show as available. This can lead
programs to spin, assuming they are looking to see what signals are pending.
Updates #139
PiperOrigin-RevId: 274017890
gVisor does not currently implement the functionality that would result in
AT_SECURE = 1, but Linux includes AT_SECURE = 0 in the normal case, so we
should do the same.
PiperOrigin-RevId: 272311488
Kernel.cpuClockTicker increments kernel.cpuClock, which tasks use as a clock to
track their CPU usage. This improves latency in the syscall path by avoid
expensive monotonic clock calls on every syscall entry/exit.
However, this timer fires every 10ms. Thus, when all tasks are idle (i.e.,
blocked or stopped), this forces a sentry wakeup every 10ms, when we may
otherwise be able to sleep until the next app-relevant event. These wakeups
cause the sentry to utilize approximately 2% CPU when the application is
otherwise idle.
Updates to clock are not strictly necessary when the app is idle, as there are
no readers of cpuClock. This commit reduces idle CPU by disabling the timer
when tasks are completely idle, and computing its effects at the next wakeup.
Rather than disabling the timer as soon as the app goes idle, we wait until the
next tick, which provides a window for short sleeps to sleep and wakeup without
doing the (relatively) expensive work of disabling and enabling the timer.
PiperOrigin-RevId: 272265822
Adresses a deadlock with the rolled back change:
b6a5b950d2
Creating a session from an orphaned process group was causing a lock to be
acquired twice by a single goroutine. This behavior is addressed, and a test
(OrphanRegression) has been added to pty.cc.
Implemented the following ioctls:
- TIOCSCTTY - set controlling TTY
- TIOCNOTTY - remove controlling tty, maybe signal some other processes
- TIOCGPGRP - get foreground process group. Also enables tcgetpgrp().
- TIOCSPGRP - set foreground process group. Also enabled tcsetpgrp().
Next steps are to actually turn terminal-generated control characters (e.g. C^c)
into signals to the proper process groups, and to send SIGTTOU and SIGTTIN when
appropriate.
PiperOrigin-RevId: 270088599
Note that the exact semantics for these signalfds are slightly different from
Linux. These signalfds are bound to the process at creation time. Reads, polls,
etc. are all associated with signals directed at that task. In Linux, all
signalfd operations are associated with current, regardless of where the
signalfd originated.
In practice, this should not be an issue given how signalfds are used. In order
to fix this however, we will need to plumb the context through all the event
APIs. This gets complicated really quickly, because the waiter APIs are all
netstack-specific, and not generally exposed to the context. Probably not
worthwhile fixing immediately.
PiperOrigin-RevId: 269901749
This also allows the tee(2) implementation to be enabled, since dup can now be
properly supported via WriteTo.
Note that this change necessitated some minor restructoring with the
fs.FileOperations splice methods. If the *fs.File is passed through directly,
then only public API methods are accessible, which will deadlock immediately
since the locking is already done by fs.Splice. Instead, we pass through an
abstract io.Reader or io.Writer, which elide locks and use the underlying
fs.FileOperations directly.
PiperOrigin-RevId: 268805207
Endpoint protocol goroutines were previously started as part of
loading the endpoint. This is potentially too soon, as resources used
by these goroutine may not have been loaded. Protocol goroutines may
perform meaningful work as soon as they're started (ex: incoming
connect) which can cause them to indirectly access resources that
haven't been loaded yet.
This CL defers resuming all protocol goroutines until the end of
restore.
PiperOrigin-RevId: 262409429
(Don't worry, this is mostly tests.)
Implemented the following ioctls:
- TIOCSCTTY - set controlling TTY
- TIOCNOTTY - remove controlling tty, maybe signal some other processes
- TIOCGPGRP - get foreground process group. Also enables tcgetpgrp().
- TIOCSPGRP - set foreground process group. Also enabled tcsetpgrp().
Next steps are to actually turn terminal-generated control characters (e.g. C^c)
into signals to the proper process groups, and to send SIGTTOU and SIGTTIN when
appropriate.
PiperOrigin-RevId: 261387276
We can get the mount namespace from the CreateProcessArgs in all cases where we
need it. This also gets rid of kernel.Destroy method, since the only thing it
was doing was DecRefing the mounts.
Removing the need to call kernel.SetRootMountNamespace also allowed for some
more simplifications in the container fs setup code.
PiperOrigin-RevId: 261357060
Adds feature to launch from an open host FD instead of a binary_path.
The FD should point to a valid executable and most likely be statically
compiled. If the executable is not statically compiled, the loader will
search along the interpreter paths, which must be able to be resolved in
the Sandbox's file system or start will fail.
PiperOrigin-RevId: 260756825
This introduces two new types of Emitters:
1. MultiEmitter, which will forward events to other registered Emitters, and
2. RateLimitedEmitter, which will forward events to a wrapped Emitter, subject
to given rate limits.
The methods in the eventchannel package itself act like a multiEmitter, but is
not actually an Emitter. Now we have a DefaultEmitter, and the methods in
eventchannel simply forward calls to the DefaultEmitter.
The unimplemented syscall handler now uses a RateLimetedEmitter that wraps the
DefaultEmitter.
PiperOrigin-RevId: 260612770
The different containers in a sandbox used only one pid
namespace before. This results in that a container can see
the processes in another container in the same sandbox.
This patch use different pid namespace for different containers.
Signed-off-by: chris.zn <chris.zn@antfin.com>
This renames FDMap to FDTable and drops the kernel.FD type, which had an entire
package to itself and didn't serve much use (it was freely cast between types,
and served as more of an annoyance than providing any protection.)
Based on BenchmarkFDLookupAndDecRef-12, we can expect 5-10 ns per lookup
operation, and 10-15 ns per concurrent lookup operation of savings.
This also fixes two tangential usage issues with the FDMap. Namely, non-atomic
use of NewFDFrom and associated calls to Remove (that are both racy and fail to
drop the reference on the underlying file.)
PiperOrigin-RevId: 256285890
Get/Set pipe size and ioctl support were missing from
overlayfs. It required moving the pipe.Sizer interface
to fs so that overlay could get access.
Fixes#318
PiperOrigin-RevId: 255511125
Addresses obvious typos, in the documentation only.
COPYBARA_INTEGRATE_REVIEW=https://github.com/google/gvisor/pull/443 from Pixep:fix/documentation-spelling 4d0688164eafaf0b3010e5f4824b35d1e7176d65
PiperOrigin-RevId: 255477779
Credentials are immutable and even before these changes we could read them
without locks, but we needed to take a task lock to get a credential object
from a task object.
It is possible to avoid this lock, if we will guarantee that a credential
object will not be changed after setting it on a task.
PiperOrigin-RevId: 254989492
We don't have the plumbing for btime yet, so that field is left off. The
returned mask indicates that btime is absent.
Fixes#343
PiperOrigin-RevId: 254575752