Add performance guide.

@@ -0,0 +1,245 @@

+++
title = "Performance Guide"
weight = 30
+++

gVisor is designed to provide a secure, virtualized environment while preserving
key benefits of containerization, such as small fixed overheads and a dynamic
resource footprint. For containerized infrastructure, this can provide an "easy
button" for sandboxing untrusted workloads: there are no changes to the
fundamental resource model.

gVisor imposes runtime costs over native containers. These costs come in two
forms: additional cycles and memory usage, which may manifest as increased
latency, reduced throughput or density, or not at all. In general, these costs
come from two different sources.

First, the existence of the [Sentry](../) means that additional memory will be
required, and application system calls must traverse additional layers of
software. The design emphasizes [security](../security/), and therefore we chose
to use a language for the Sentry that provides benefits in this domain but may
not yet offer the raw performance of other choices. Costs imposed by these
design choices are **structural costs**.

Second, as gVisor is an independent implementation of the system call surface,
many of the subsystems or specific calls are not as optimized as more mature
implementations. A good example here is the network stack, which is continuing
to evolve but does not support all the advanced recovery mechanisms offered by
other stacks and is less CPU efficient. These are **implementation costs**, and
they are distinct from **structural costs**. Improvements here are ongoing and
driven by the workloads that matter to gVisor users and contributors.

This page provides a guide for understanding baseline performance, and calls out
distinct **structural costs** and **implementation costs**, highlighting where
improvements are possible and where they are not.

While we include a variety of workloads here, it's worth emphasizing that gVisor
may not be an appropriate solution for every workload, for reasons other than
performance. For example, a sandbox is likely to provide minimal benefit for
your database, since *all your user data would already be inside the sandbox*
and there is no need for an attacker to break out in the first place.
## Methodology

All data below was generated using the [benchmark tools][benchmark-tools]
repository, and the machines under test are uniform [Google Compute Engine][gce]
Virtual Machines (VMs) with the following specifications:

```
Machine type: n1-standard-4 (broadwell)
Image: Debian GNU/Linux 9 (stretch) 4.19.0-0
BootDisk: 2048GB SSD persistent disk
```

Throughout this document, `runsc` is used to indicate the runtime provided by
gVisor. When relevant, we use the name `runsc-platform` to describe a specific
[platform choice](../overview/).

**Except where specified, all tests below are conducted with the `ptrace`
platform. The `ptrace` platform works everywhere and does not require hardware
virtualization or kernel modifications, but suffers from the highest structural
costs by far. This platform is used to provide a clear understanding of the
performance model, but in no way represents an ideal scenario. In the future,
this guide will be extended to bare metal environments and include additional
platforms.**

## Memory access

gVisor does not introduce any additional costs with respect to raw memory
accesses. Page faults and other Operating System (OS) mechanisms are translated
through the Sentry, but once mappings are installed and available to the
application, there is no additional overhead.
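
To make that concrete, the sketch below, our illustration rather than part of
the benchmark suite, contrasts a first pass over a freshly created anonymous
mapping, where every touch faults, with a second pass over the same,
now-populated pages:

```python
# Hypothetical illustration (not from benchmark-tools): compare first-touch
# accesses, which fault, with accesses to already-populated memory.
import mmap
import time

SIZE = 256 * 1024 * 1024  # 256 MiB
PAGE = 4096

m = mmap.mmap(-1, SIZE)  # anonymous mapping; pages are not yet populated

start = time.perf_counter()
for off in range(0, SIZE, PAGE):
    m[off] = 1  # first touch: the fault is handled by the kernel (or Sentry)
first_pass = time.perf_counter() - start

start = time.perf_counter()
for off in range(0, SIZE, PAGE):
    m[off] = 2  # mapping is installed: no fault, a native memory access
second_pass = time.perf_counter() - start

print(f"first pass:  {first_pass:.3f}s (includes fault handling)")
print(f"second pass: {second_pass:.3f}s (raw memory access)")
```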

{{< graph id="sysbench-memory" url="/performance/sysbench-memory.csv" title="perf.py sysbench.memory --runtime=runc --runtime=runsc" >}}

The above figure demonstrates the memory transfer rate as measured by
`sysbench`.

## Memory usage

The Sentry provides an additional layer of indirection, and it requires memory
in order to store state associated with the application. This memory generally
consists of a fixed component, plus an amount that varies with the usage of
operating system resources (e.g. how many sockets or files are opened).

For many use cases, fixed memory overheads are a primary concern. This may be
because sandboxed containers handle a low volume of requests, and it is
therefore important to achieve high densities for efficiency.

{{< graph id="density" url="/performance/density.csv" title="perf.py density --runtime=runc --runtime=runsc" >}}

The above figure demonstrates these costs based on three sample applications.
This test is the result of running many instances of a container (typically 50),
measuring the available memory on the host before and afterwards, and dividing
the difference by the number of containers.
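
A rough sketch of that calculation follows; it is ours for illustration (the
real harness is the `perf.py density` benchmark), and the image name and
container count are placeholders:

```python
# Hypothetical sketch of the density methodology; the actual measurement is
# implemented in benchmark-tools. Image name and count are placeholders.
import subprocess

N = 50  # number of container instances to start

def available_kb():
    """Return MemAvailable from /proc/meminfo, in kB."""
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                return int(line.split()[1])
    raise RuntimeError("MemAvailable not found")

before = available_kb()
ids = [
    subprocess.check_output(
        ["docker", "run", "-d", "--runtime=runsc", "alpine", "sleep", "1000"]
    ).decode().strip()
    for _ in range(N)
]
after = available_kb()

print(f"approximate overhead per container: {(before - after) / N:.0f} kB")
for cid in ids:
    subprocess.run(["docker", "rm", "-f", cid], capture_output=True)
```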

The first application is an instance of `sleep`: a trivial application that does
nothing. The second application is a synthetic `node` application which imports
a number of modules and listens for requests. The third application is a similar
synthetic `ruby` application which does the same.

## CPU performance

gVisor does not perform emulation or otherwise interfere with the raw execution
of CPU instructions by the application. Therefore, there is no runtime cost
imposed for CPU operations.

{{< graph id="sysbench-cpu" url="/performance/sysbench-cpu.csv" title="perf.py sysbench.cpu --runtime=runc --runtime=runsc" >}}

The above figure demonstrates the `sysbench` measurement of CPU events per
second. Events per second is based on a CPU-bound loop that calculates all prime
numbers in a specified range. We note that `runsc` does not impose substantial
degradation, as the code is executing natively in both cases.
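
The heart of this benchmark is a loop of the following shape (a simplified
stand-in for the `sysbench` workload, not its actual code); because it never
enters the kernel, it runs at native speed under `runsc`:

```python
# Simplified stand-in for the sysbench CPU workload: count primes in a range.
# The loop is entirely CPU-bound and makes no system calls, so gVisor adds no
# per-iteration overhead.
import time

def count_primes(limit):
    count = 0
    for n in range(2, limit + 1):
        i = 2
        while i * i <= n:
            if n % i == 0:
                break
            i += 1
        else:
            count += 1  # no divisor found: n is prime
    return count

start = time.perf_counter()
print(count_primes(100000), "primes in", time.perf_counter() - start, "seconds")
```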

This has important consequences for classes of workloads that are often
CPU-bound, such as data processing or machine learning. In these cases, `runsc`
will similarly impose minimal runtime overhead.

{{< graph id="tensorflow" url="/performance/tensorflow.csv" title="perf.py tensorflow --runtime=runc --runtime=runsc" >}}

For example, the above figure shows a sample TensorFlow workload, the
[convolutional neural network example][cnn]. The time indicated includes the
full start-up and run time for the workload, which trains a model.

## System calls

Some **structural costs** of gVisor are heavily influenced by the [platform
choice](../overview/), which implements system call interception. Today, gVisor
supports a variety of platforms. These platforms present distinct performance,
compatibility and security trade-offs. For example, the KVM platform has low
overhead system call interception but runs poorly with nested virtualization.

{{< graph id="syscall" url="/performance/syscall.csv" title="perf.py syscall --runtime=runc --runtime=runsc-ptrace --runtime=runsc-kvm" log="true" >}}

The above figure demonstrates the time required for a raw system call on various
platforms. The test is implemented by a custom binary which performs a large
number of system calls and calculates the average time required.
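
A sketch of that measurement is below, in Python for brevity; the real test
uses a compiled binary precisely because interpreter overhead would otherwise
dominate the result:

```python
# Hypothetical sketch of the raw system call measurement; the actual test is a
# compiled binary, since the interpreter adds per-call overhead of its own.
import os
import time

N = 1_000_000

start = time.perf_counter()
for _ in range(N):
    os.getpid()  # a cheap system call that the platform must intercept
elapsed = time.perf_counter() - start

print(f"average time per call: {elapsed / N * 1e9:.0f} ns")
```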

This cost will principally impact applications that are system call bound, which
tend to be high-performance data stores and static network services. In general,
the impact of system call interception will be lower the more work an
application does.

{{< graph id="redis" url="/performance/redis.csv" title="perf.py redis --runtime=runc --runtime=runsc" >}}

For example, `redis` is an application that performs relatively little work in
userspace: in general it reads from a connected socket, reads or modifies some
data, and writes a result back to the socket. The above figure shows the results
of running a [comprehensive set of benchmarks][redis-benchmark]. We can see that
small operations incur a large relative overhead, while larger operations, such
as `LRANGE`, where more work is done in the application, have a smaller relative
overhead.

Some of the costs above are **structural costs**, and `redis` is likely to
remain a challenging performance scenario. However, optimizing the
[platform](../overview) will also have a dramatic impact.

## Start-up time

For many use cases, the ability to spin up containers quickly and efficiently is
important. A sandbox may be short-lived and perform minimal user work (e.g. a
function invocation).

{{< graph id="startup" url="/performance/startup.csv" title="perf.py startup --runtime=runc --runtime=runsc" >}}

The above figure shows the total time required to start a container through
[Docker][docker]. This benchmark uses three different applications. First, an
Alpine Linux container that executes `true`. Second, a `node` application that
loads a number of modules and binds an HTTP server. The time is measured by a
successful request to the bound port. Finally, a `ruby` application that
similarly loads a number of modules and binds an HTTP server.
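
For the two server applications, the measurement amounts to something like the
following sketch (ours for illustration; the image name and port are
placeholders, and the real harness is the `perf.py startup` benchmark):

```python
# Hypothetical sketch of the start-up measurement: start a container, then
# poll the bound port until a request succeeds. The image name and port are
# placeholders, not the benchmark's actual configuration.
import subprocess
import time
import urllib.request

start = time.perf_counter()
cid = subprocess.check_output(
    ["docker", "run", "-d", "-p", "8080:8080", "--runtime=runsc", "my-node-app"]
).decode().strip()

while True:
    try:
        urllib.request.urlopen("http://localhost:8080/", timeout=1)
        break  # the first successful request marks the end of start-up
    except OSError:
        time.sleep(0.01)

print(f"start-up time: {time.perf_counter() - start:.3f}s")
subprocess.run(["docker", "rm", "-f", cid], capture_output=True)
```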

## Network

Networking is mostly bound by **implementation costs**, and gVisor's network stack
is improving quickly.

While raw network throughput is typically not an important metric in practice
for common sandbox use cases, `iperf` is nevertheless a common microbenchmark
used to measure it.

{{< graph id="iperf" url="/performance/iperf.csv" title="perf.py iperf --runtime=runc --runtime=runsc" >}}

The above figure shows the result of an `iperf` test between two instances. For
the upload case, the specified runtime is used for the `iperf` client, and in
the download case, the specified runtime is the server. A native runtime is
always used for the other endpoint in the test.

{{< graph id="applications" metric="requests_per_second" url="/performance/applications.csv" title="perf.py http.(node|ruby) --connections=25 --runtime=runc --runtime=runsc" >}}

The above figure shows the result of simple `node` and `ruby` web services that
render a template upon receiving a request. Because these synthetic benchmarks
do minimal work per request, much like the `redis` case, they suffer from high
overheads. In practice, the more work an application does, the smaller the
impact of **structural costs** becomes.

## File system

Some aspects of file system performance are also reflective of **implementation
costs**, and an area where gVisor's implementation is improving quickly.

In terms of raw disk I/O, gVisor does not introduce significant fundamental
overhead. For general file operations, gVisor introduces a small fixed overhead
for data that transitions across the sandbox boundary. This manifests as
**structural costs** in some cases, since these operations must be routed
through the [Gofer](../) as a result of our [security model](../security/), but
in most cases they are dominated by **implementation costs**, due to an internal
[Virtual File System][vfs] (VFS) implementation that needs improvement.
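
These per-operation costs are easiest to see with a loop of cheap metadata
operations; the following sketch (ours, not from the benchmark suite) times
repeated `stat(2)` calls, each of which must at least be intercepted by the
Sentry, and may be routed to the Gofer, under `runsc`:

```python
# Hypothetical microbenchmark for per-operation file system costs. Each stat()
# is intercepted by the Sentry and may involve the Gofer under runsc.
import os
import time

N = 100_000
PATH = "/etc/hostname"  # placeholder: any small, existing file

start = time.perf_counter()
for _ in range(N):
    os.stat(PATH)
elapsed = time.perf_counter() - start

print(f"average time per stat: {elapsed / N * 1e6:.2f} us")
```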

{{< graph id="fio-bw" url="/performance/fio.csv" title="perf.py fio --engine=sync --runtime=runc --runtime=runsc" >}}

The above figure demonstrates the results of `fio` for reads and writes to and
from the disk. In this case, the disk quickly becomes the bottleneck and
dominates other costs.

{{< graph id="fio-tmpfs-bw" url="/performance/fio-tmpfs.csv" title="perf.py fio --engine=sync --runtime=runc --tmpfs=True --runtime=runsc" >}}

The above figure shows the raw I/O performance of using a `tmpfs` mount, which
is sandbox-internal in the case of `runsc`. Generally these operations are
similarly bound by the cost of copying data around in memory, and we don't see
the cost of VFS operations.

{{< graph id="httpd100k" metric="transfer_rate" url="/performance/httpd100k.csv" title="perf.py http.httpd --connections=1 --connections=5 --connections=10 --connections=25 --runtime=runc --runtime=runsc" >}}

The high costs of VFS operations can manifest in benchmarks that execute many
such operations in the hot path for serving requests, for example. The above
figure shows the result of using gVisor to serve small pieces of static content,
with predictably poor results. This workload represents `apache` serving a
single file sized 100k to a client running [ApacheBench][ab] with varying levels
of concurrency. The high overhead comes principally from a VFS implementation
that needs improvement, with several internal serialization points (since all
requests are reading the same file). Note that some of the network stack
performance issues also impact this benchmark.
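
Conceptually, the client side of this test is just many concurrent fetches of
the same URL; a rough Python stand-in for the ApacheBench invocation (the URL
and counts are placeholders) might look like:

```python
# Rough stand-in for the ApacheBench client (ab -n ... -c ...); the URL and
# counts are placeholders. Every request drives the server through the same
# VFS and network stack paths discussed above.
import concurrent.futures
import urllib.request

URL = "http://server:80/file-100k"  # placeholder for the 100k static file
REQUESTS = 1000
CONCURRENCY = 25

def fetch(_):
    with urllib.request.urlopen(URL) as resp:
        return len(resp.read())

with concurrent.futures.ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
    total = sum(pool.map(fetch, range(REQUESTS)))

print(f"fetched {REQUESTS} responses, {total} bytes in total")
```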

{{< graph id="ffmpeg" url="/performance/ffmpeg.csv" title="perf.py media.ffmpeg --runtime=runc --runtime=runsc" >}}

For benchmarks that are bound by a mix of raw disk I/O and compute, file system
operations are less of an issue. The above figure shows the total time required
for an `ffmpeg` container to start, load and transcode an input video.

[ab]: https://en.wikipedia.org/wiki/ApacheBench
[benchmark-tools]: https://gvisor.googlesource.com/benchmark-tools
[gce]: https://cloud.google.com/compute/
[cnn]: https://github.com/aymericdamien/TensorFlow-Examples/blob/master/examples/3_NeuralNetworks/convolutional_network.py
[docker]: https://docker.io
[redis-benchmark]: https://redis.io/topics/benchmarks
[vfs]: https://en.wikipedia.org/wiki/Virtual_file_system

@@ -24,4 +24,8 @@

src="https://code.jquery.com/jquery-3.3.1.min.js"
integrity="sha256-FgpCb/KJQlLNfOu91ta32o/NMZxltwRo8QtmkMRdAu8="
crossorigin="anonymous"></script>
<script
src="https://d3js.org/d3.v4.min.js"
integrity="sha384-1EOYqz4UgZkewWm70NbT1JBUXSQpOIS2AaJy6/evZH+lXOrt9ITSJbFctNeyBoIJ"
crossorigin="anonymous"></script>
{{ partial "hooks/head-end.html" . }}
@@ -0,0 +1,199 @@

<svg id="{{ .Get "id" }}" width=500 height=200>
  <title>{{ .Get "title" }}</title>
</svg>

<script type="text/javascript">
d3.csv("{{ .Get "url" }}", function(d, i, columns) {
  return d; // Transformed below.
}, function(error, data) {
  if (error) throw(error);

  // Create a new dataset that pivots on runtime.
  //
  // To start, we have:
  //   runtime, ..., result
  //   runc,    ..., 1
  //   runsc,   ..., 2
  //
  // In the end we want:
  //   ..., runsc, runc
  //   ..., 1,     2

  // Filter by metric, if required.
  if ("{{ .Get "metric" }}" != "") {
    orig_columns = data.columns;
    data = data.filter(d => d.metric == "{{ .Get "metric" }}");
    data.columns = orig_columns;
  }

  // Filter by method, if required.
  if ("{{ .Get "method" }}" != "") {
    orig_columns = data.columns;
    data = data.filter(d => d.method == "{{ .Get "method" }}");
    data.columns = orig_columns.filter(key => key != "method");
  }

  // Enumerate runtimes.
  var runtimes = Array.from(new Set(data.map(d => d.runtime)));
  var metrics = Array.from(new Set(data.map(d => d.metric)));
  if (metrics.length < 1) {
    console.log(data);
    throw("need at least one metric");
  } else if (metrics.length == 1) {
    metric = metrics[0];
    data.columns = data.columns.filter(key => key != "metric");
  } else {
    metric = ""; // Used for grouping.
  }

  var isSubset = function(a, sup) {
    var ap = Object.getOwnPropertyNames(a);
    for (var i = 0; i < ap.length; i++) {
      if (a[ap[i]] !== sup[ap[i]]) {
        return false;
      }
    }
    return true;
  };

  // Execute a pivot to include runtimes as attributes.
  var new_data = data.map(function(data_item) {
    // Generate a prototype data item.
    var proto_item = Object.assign({}, data_item);
    delete proto_item.runtime;
    delete proto_item.result;
    var next_item = Object.assign({}, proto_item);

    // Find all matching runtime items.
    data.forEach(function(d) {
      if (isSubset(proto_item, d)) {
        // Add the result.
        next_item[d.runtime] = d.result;
      }
    });
    return next_item;
  });

  // Remove any duplication.
  new_data = Array.from(new Set(new_data));
  new_data.columns = data.columns;
  new_data.columns = new_data.columns.filter(key => key != "runtime" && key != "result");
  new_data.columns = new_data.columns.concat(runtimes);
  data = new_data;

  // Slice based on the first key.
  if (data.columns.length != runtimes.length) {
    x0_key = new_data.columns[0];
    var x1_domain = data.columns.slice(1);
  } else {
    x0_key = "runtime";
    var x1_domain = runtimes;
  }

  // Determine variable margins.
  var x0_domain = data.map(d => d[x0_key]);
  var margin_bottom_pad = 0;
  if (x0_domain.length > 8) {
    margin_bottom_pad = 50;
  }

  // Use log scale if required.
  var y_min = 0;
  if ({{ .Get "log" | default false }}) {
    // Need to cap the lower end of the domain at 1.
    y_min = 1;
  }

  var svg = d3.select("#{{ .Get "id" }}"),
      margin = {top: 20, right: 20, bottom: 30 + margin_bottom_pad, left: 50},
      width = +svg.attr("width") - margin.left - margin.right,
      height = +svg.attr("height") - margin.top - margin.bottom,
      g = svg.append("g").attr("transform", "translate(" + margin.left + "," + margin.top + ")");

  var x0 = d3.scaleBand()
      .rangeRound([margin.left / 2, width - (4 * margin.right)])
      .paddingInner(0.1);

  var x1 = d3.scaleBand()
      .padding(0.05);

  var y = d3.scaleLinear()
      .rangeRound([height, 0]);
  if ({{ .Get "log" | default false }}) {
    y = d3.scaleLog()
        .rangeRound([height, 0]);
  }

  var z = d3.scaleOrdinal()
      .range(["#262362", "#FBB03B", "#286FD7", "#6b486b"]);

  // Set all domains.
  x0.domain(x0_domain);
  x1.domain(x1_domain).rangeRound([0, x0.bandwidth()]);
  y.domain([y_min, d3.max(data, d => d3.max(x1_domain, key => parseFloat(d[key])))]).nice();

  // The data.
  g.append("g")
    .selectAll("g")
    .data(data)
    .enter().append("g")
      .attr("transform", function(d) { return "translate(" + x0(d[x0_key]) + ",0)"; })
    .selectAll("rect")
    .data(d => x1_domain.map(key => ({key, value: d[key]})))
    .enter().append("rect")
      .attr("x", d => x1(d.key))
      .attr("y", d => y(d.value))
      .attr("width", x1.bandwidth())
      .attr("height", d => y(y_min) - y(d.value))
      .attr("fill", d => z(d.key));

  // X0 ticks and labels.
  var x0_axis = g.append("g")
      .attr("class", "axis")
      .attr("transform", "translate(0," + height + ")")
      .call(d3.axisBottom(x0));
  if (x0_domain.length > 8) {
    x0_axis.selectAll("text")
      .style("text-anchor", "end")
      .attr("dx", "-.8em")
      .attr("dy", ".15em")
      .attr("transform", "rotate(-65)");
  }

  // Y ticks and top-label.
  if (metric == "default") {
    metric = ""; // Don't display.
  }
  g.append("g")
      .attr("class", "axis")
      .call(d3.axisLeft(y).ticks(null, "s"))
    .append("text")
      .attr("x", -30.0)
      .attr("y", y(y.ticks().pop()) - 10.0)
      .attr("dy", "0.32em")
      .attr("fill", "#000")
      .attr("font-weight", "bold")
      .attr("text-anchor", "start")
      .text(metric);

  // The legend.
  var legend = g.append("g")
      .attr("font-family", "sans-serif")
      .attr("font-size", 10)
      .attr("text-anchor", "end")
    .selectAll("g")
    .data(x1_domain.slice().reverse())
    .enter().append("g")
      .attr("transform", function(d, i) { return "translate(0," + i * 20 + ")"; });
  legend.append("rect")
      .attr("x", width - 19)
      .attr("width", 19)
      .attr("height", 19)
      .attr("fill", z);
  legend.append("text")
      .attr("x", width - 24)
      .attr("y", 9.5)
      .attr("dy", "0.32em")
      .text(function(d) { return d; });
});
</script>

@@ -0,0 +1,9 @@

# Performance data

This directory holds the CSVs generated by the
[benchmark-tools][benchmark-tools] repository.

In the future, these will be automatically posted to a cloud storage bucket and
loaded dynamically. At that point, this directory will be removed.

[benchmark-tools]: https://gvisor.googlesource.com/benchmark-tools