Since there is very little wasted work for Buildkite, increasing the
parallelsim will decrease throw-away work on cancelation or failure.
This aims to achieve ~3 minutes per individiaul test instance.
PiperOrigin-RevId: 424469351
Add logic to keep running on individual benchmark failures.
- Reconsider t.Fatalf in tensorflow benchmark so that the test completes all
workloads before failing.
- Add "-i" option to make file calls. -i will ignore errors in place and
attempt to keep running the benchmark run. Any results able to be parsed
will be parsed. The overall run will exit with an error, so buildkite will
register that error and show a run as "red".
PiperOrigin-RevId: 415341536
In some cases, it may be desirable to prebuild binaries and run all tests,
for example to run benchmarks with various experiments. Allow the top-level
Makefile to support this by checking for a STAGED_BINARIES variable.
PiperOrigin-RevId: 410673120
"cri.runtimeoptions.v1" moved to "runtimeoptions.v1" and containerd
configuration format version 2 is required.
Updates #6449
PiperOrigin-RevId: 405474653
Otherwise it can fail:
$ bazel cquery pkg/p9/... --output=starlark --starlark:file=tools/show_paths.bzl
...
ERROR: Starlark evaluation error for //pkg/p9/p9test:mockgen:
Traceback (most recent call last):
File "tools/show_paths.bzl", line 8, column 32, in format
Error: 'NoneType' value has no field or method 'get'
PiperOrigin-RevId: 396457764
#6322 tried to update Go to 1.16, but existing nodes fail to upgrade due to the
presence of old Go [1]. Specifically when trying to add Go to `/usr/bin`:
```
ln: failed to create symbolic link '/usr/bin/go': File exists
```
Also:
- Removing `golang-go` also removes apt installs of `gcc` and `pkg-config`, so
those are installed explicitly.
- Add `-c` to wget, which will prevent re-downloading Go for each run.
- Disable GO111MODULE when building cri-tools and containerd, since we're using
pre-module versions of each.
1 - https://buildkite.com/gvisor/pipeline/builds/7285#3593244c-e411-472d-804a-9c7fbbd24762
PiperOrigin-RevId: 386106881
Building nogo targets takes a very long time. This change extracts it into its
own BuildKite job.
This change also additionally speeds up other targets that were using the bazel
flag --test_tag_filters. Without --build_tag_filters, the filter is not
applied while building the specified targets and so we might end up building
targets that are not actually tested.
PiperOrigin-RevId: 368918211
The fio benchmark was changed to a fixed size read/write ammount
because the timed benchmark was overwhelming machine memory on
tmpfs mounts.
Now rand(read|write) operations are prohibitively long, leading to timeouts.
Split the benchmarks as they were in python bm-tools: the read/write as
fixed sized (1GB) and the rand(read|write) as timed operations (15s).
PiperOrigin-RevId: 364584436
The previous "bind" filesystem, already included in go/runsc-benchmarks
is a remote re-validate mount. However, the non-re-validate mount
was not present, and it has been added in the form of rootfs.
Also, fix the fio runs to reads/writes of 10GB as running
with --test.benchtime=Xs may scale beyond the memory available
to tmpfs mounts on buildkite VMs. Currently, our buildkite
pipelines are run on e2-standard-8 machines with 32GB of memory,
allowing tmpfs mounts to safely be at least 10GB.
PiperOrigin-RevId: 362143620
It's unclear why permissions wind up corrupted, but these can be cleared
on any failure, similar to the bazel cache itself:
https://buildkite.com/gvisor/pipeline/builds/2304#_
PiperOrigin-RevId: 355057421
Files removed from the working tree were not being properly removed from
the branch, leading to symbol conflicts while building. This requires the
change to 'git add --all' in the tools/go_branch.sh script.
But why was this not caught by CI? The "git clean -f" command by default
only cleans files in the current working directory. In order to clean the
whole tree recursively, we need to specify a pathspec, which is ".".
In addition to these fixes, re-add the "go tests" command to help prevent
this from happening again, since merges on the Go branch will happen in
GitHub actions for simplicity. The Go test is retained in BuildKite.
PiperOrigin-RevId: 351503804
This requires several changes:
* Templates must preserve relevant tags.
* Pagetables templates are split into two targets, each preserving tags.
* The binary VDSO is similarly split into two targets, with some juggling.
* The top level tools/go_branch.sh now does a crossbuild of ARM64 as well,
and checks and merges the results of the two branches together.
Fixes#5178
PiperOrigin-RevId: 351304330
This allows us to link directly to profiling results from
the build results. The code uses the standard pprof http
server, exported from the Cloud Run instance.
PiperOrigin-RevId: 350440910
This includes minor fix-ups:
* Handle SIGTERM in runsc debug, to exit gracefully.
* Fix cmd.debug.go opening all profiles as RDONLY.
* Fix the test name in fio_test.go, and encode the block size in the test.
PiperOrigin-RevId: 350205718
This change cleans up some minor Makefile issues, and adds support for
BuildKite annotations on failure and on profiles being generated. These
annotations will make failures very clear and link to the artifacts.
This change is a stepping stone for aggregating coverage data from all
individual test jobs, as this will also happen in .buildkite/annotate.sh.
PiperOrigin-RevId: 349606598
A few images were broken with respect to aarch64. We should now
be able to run push-all-images with ARCH=aarch64 as part of the
regular continuous integration builds, and add aarch64 smoke tests
(via user emulation for now) to the regular test suite (future).
PiperOrigin-RevId: 346685462
Recursive make is difficult to follow and debug. Drop this by using
internal functions, which, while difficult, are easier than trying to
following recursive invokations.
Further simplify the Makefile by collapsing the image bits and removing
the tools/vm directory, which is effectively unused.
Fixes#4952
PiperOrigin-RevId: 346569133