This change moves all of Servo's WPT Python support scripts into one
directory as they were previously scattered throughout the directory
structure. This should allow more code reuse and make it easier to
understand how everything fits together.
The changes:
- `tests/wpt/update` → `python/wpt/importer`
- `etc/ci/upstream-wpt-changes/wptupstreamer` → `python/wpt/exporter`
- `etc/ci/upstream-wpt-changes/test.py` → `python/wpt/test.py`
- `etc/ci/upstream-wpt-changes/tests` → `python/wpt/tests`
- `tests/wpt/servowpt.py` →
- `python/wpt/update.py`
- `python/wpt/run.py`
- `tests/wpt/manifestupdate.py` → `python/wpt/manifestupdate.py`
This change also removes
- The ability to run the `update-wpt` and `test-wpt` commands without
using `mach`. These didn't work very well, because it was difficult
to get all of the wptrunner and mach dependencies installed outside
of the Python virtualenv. It's simpler if they are always run through
`mach`.
- The old WPT change upstreaming script that was no longer used.
There are two kinds of flaky/intermittent tests in Servo. The
traditional kind is the test that fails on the CI, but has an associated
bug indicating that the test is an intermittent failure. Many of these
tests have completely unstable results, for instance those where an
unpredictable set of subtests fail. It's impossible to generate stable
results for these, so we have traditionally simply discard these
unexpected results.
Another kind of intermittent test is one that will produce an expected
result when rerun (ie will flake). Some of these are also labeled with
bugs, while some are not. In some cases, there is flakiness in some core
Servo functionality that can lead to *any* test flaking, such as a race
condition that can lead to an early screenshot for reftests. When these
kinds of tests do not have associated bugs, they cause the CI to fail.
In this case, it is impossible to label these tests as intermittent
because it can literally be any test.
This change, reruns failed tests in order to detect unlabeled tests in
the second category. Instead of blocking the CI when the second run
leads to expected results, the CI will now pass, but the flake will be
reported to the new flakiness dashboard. This prevents unrelated flakes
from slowing down the merge queue.
These tests test the behavior of many properties and due to issues in
Servo, the results are incredibly unstable. Since the tests use large
property lists this leads to hundreds of failed subtests every run. We
let these tests either pass or fail so that results in the CI are
stable. The ultimate goal here is to fix the instability in Servo so
that these tests pass or fail consistently.
This change also adds support for intermittent expectations to the
ServoHandler. Before these kind of test results were interpreted as
unexpected results.
Use the new intermittent dashboard to report intermittents and get
information about open bugs. This is now used to filter out
known-intermittents from results. In addition, this also allows the
scripts to report bug information to the GitHub. Display that in all
output.
This will allow results to be formatted by other parts of the code (such
as the intermittent filtering) code. Previously, formatting was handled
in ServoHandler, which was a bit strange as it's really only necessary
for GroupingFormatter and the intermittent filtering code. This also
allows the results to be properly typed by the Python typing system.
This change integrates the filter-intermittents command into test-wpt.
This is in preparation for future work on tracking intermittent
failures. This change also:
- Removes the SrvoJson logger and replaces it with a generic WPT log
handler which tracks unexpected results.
- The intermittent filter is now controlled via environment variables
and the GitHub version requires a token instead of credentials.
- Output is saved to a single file and is always text.
Use blessings when it is possible for the WPT UI and fall back to using
raw control characters. This also fixes a bug where the UI didn't work
correctly in iTerm.app. Remove the one line version of the UI, since it
is no longer used as the iTerm fallback.
When running in non-interactive mode, print test names as they are run.
This prevent timeouts on the bots, which are detected by watching for
output from the test runner.
When running in iTerm.app don't use the multi-line status line, as the
carriage return is not processed properly. Instead use a single-line
output with the last test run and then backspace characters to keep it
up to date.
Fixes#8368.
Fixes#8395.
Print process output for TIMEOUT status in WPT UI
Tests can time out due to script or runtime errors, which are typically
reported via the Servo process output. Including process output for tests
timing out makes it easier to understand these problems.
<!-- Reviewable:start -->
[<img src="https://reviewable.io/review_button.png" height=40 alt="Review on Reviewable"/>](https://reviewable.io/reviews/servo/servo/8323)
<!-- Reviewable:end -->
Tests can time out due to script or runtime errors, which are typically
reported via the Servo process output. Including process output for tests
timing out makes it easier to understand these problems.
The idea is that this UI will be installed adhoc by the Servo scripts
for now. Later, after baking for a while in the Servo source tree it
will be upstreamed to the mozlog project itself.