This change moves all of Servo's WPT Python support scripts into one
directory as they were previously scattered throughout the directory
structure. This should allow more code reuse and make it easier to
understand how everything fits together.
The changes:
- `tests/wpt/update` → `python/wpt/importer`
- `etc/ci/upstream-wpt-changes/wptupstreamer` → `python/wpt/exporter`
- `etc/ci/upstream-wpt-changes/test.py` → `python/wpt/test.py`
- `etc/ci/upstream-wpt-changes/tests` → `python/wpt/tests`
- `tests/wpt/servowpt.py` →
  - `python/wpt/update.py`
  - `python/wpt/run.py`
- `tests/wpt/manifestupdate.py` → `python/wpt/manifestupdate.py`
This change also removes
- The ability to run the `update-wpt` and `test-wpt` commands without
using `mach`. These didn't work very well, because it was difficult
to get all of the wptrunner and mach dependencies installed outside
of the Python virtualenv. It's simpler if they are always run through
`mach`.
- The old WPT change upstreaming script that was no longer used.
Add a `try` command to mach & partition the try builds
Adds a `./mach try` command that lets anybody easily test their changes without opening a PR and requesting a try run from bors-servo, by force-pushing HEAD to the appropriate branch. The command accepts branch names to select partial CI runs (just like the bors try command), so if you only want to test the mac build (which would be `@bors-servo try=mac`), you run `./mach try mac`. If no job is specified, the `try` branch is used.
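A rough sketch of what the command does under the hood (the job-to-branch mapping below is illustrative, not the actual set of CI try branches):

```python
import subprocess

# Illustrative job-to-branch mapping; not the actual list of CI try branches.
TRY_BRANCHES = {
    "mac": "try-mac",
    "linux": "try-linux",
    "wpt": "try-wpt",
}

def push_to_try(job=None):
    # Fall back to the generic "try" branch when no job is given.
    branch = TRY_BRANCHES.get(job, "try")
    # Force-push the current HEAD so CI runs against it.
    subprocess.check_call(["git", "push", "--force", "origin", f"HEAD:{branch}"])
```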
Since the partitioned CI jobs stopped working after the migration to GitHub Actions, I reimplemented them using `if` guards.
The WPT jobs were also failing due to an empty `INTERMITTENT_TRACKER_DASHBOARD_SECRET` on my fork, so I added an additional check to prevent the run from failing.
And that concludes my work on #29379 🎉
---
- [x] `./mach build -d` does not report any errors
- [x] `./mach test-tidy` does not report any errors
- [x] These changes fix #29379
- [ ] There are tests for these changes OR
- [x] These changes do not require tests because they only affect CI
There are two kinds of flaky/intermittent tests in Servo. The
traditional kind is the test that fails on the CI, but has an associated
bug indicating that the test is an intermittent failure. Many of these
tests have completely unstable results, for instance those where an
unpredictable set of subtests fail. It's impossible to generate stable
results for these, so we have traditionally simply discarded these
unexpected results.
Another kind of intermittent test is one that will produce an expected
result when rerun (i.e., it flakes). Some of these are also labeled with
bugs, while some are not. In some cases, there is flakiness in some core
Servo functionality that can lead to *any* test flaking, such as a race
condition that can lead to an early screenshot for reftests. When these
kinds of tests do not have associated bugs, they cause the CI to fail.
In this case, it is impossible to label these tests as intermittent,
because literally any test can be affected.
This change reruns failed tests in order to detect unlabeled tests in
the second category. Instead of blocking the CI when the second run
leads to expected results, the CI will now pass, but the flake will be
reported to the new flakiness dashboard. This prevents unrelated flakes
from slowing down the merge queue.
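Roughly, the rerun logic looks like the following sketch, where `run_tests` and `report_flakes` stand in for the real wptrunner invocation and dashboard client (both names are placeholders, not the actual functions):

```python
def run_with_rerun(run_tests, report_flakes):
    # First pass over the full suite.
    unexpected = run_tests()
    if not unexpected:
        return 0

    # Rerun only the tests that produced unexpected results.
    still_unexpected = run_tests(tests=[r.test for r in unexpected])
    stable_failures = {r.test for r in still_unexpected}
    flakes = [r for r in unexpected if r.test not in stable_failures]

    # Tests that pass on the second run are flakes: report them to the
    # dashboard, but don't fail CI because of them.
    if flakes:
        report_flakes(flakes)

    # Only results that are unexpected on both runs fail the build.
    return 1 if still_unexpected else 0
```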
Use the new intermittent dashboard to report intermittents and get
information about open bugs. The dashboard is now used to filter
known intermittents out of the results. In addition, it allows the
scripts to report bug information to GitHub. Display that information
in all output.
This makes it easier to run `update-wpt` based on results from the bots.
A future version of this could aggregate all unexpected results that
were not filtered as intermittents.
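As a sketch of the filtering step (the dashboard URL and response shape below are assumptions, not the real API):

```python
import requests

# Hypothetical endpoint; the real dashboard URL and response shape may differ.
DASHBOARD_URL = "https://intermittent-tracker.example/query"

def filter_intermittents(unexpected_results):
    known, unknown = [], []
    for result in unexpected_results:
        response = requests.get(DASHBOARD_URL, params={"test": result.test})
        bugs = response.json().get("bugs", [])
        if bugs:
            # Attach open bug numbers so they can be shown in all output.
            result.bugs = bugs
            known.append(result)
        else:
            unknown.append(result)
    return known, unknown
```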
After filtering intermittents, output the results as JSON. Update the
GitHub workflow to aggregate this JSON data into an artifact and use the
aggregated data to generate a GitHub comment with details about the try
run. The idea here is that this comment will make it easier to track
intermittent tests and notice when a change affects a test marked as
intermittent -- either causing it to permanently fail or fixing it.
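The JSON step might look roughly like this (the file name and field names are illustrative, not the exact schema consumed by the workflow):

```python
import json

def write_results_json(flakes, known_intermittents, path="wpt-results.json"):
    # Serialize just enough for the workflow to build the try-run comment.
    payload = {
        "flakes": [result.test for result in flakes],
        "known_intermittents": [
            {"test": result.test, "bugs": result.bugs}
            for result in known_intermittents
        ],
    }
    with open(path, "w") as f:
        json.dump(payload, f, indent=2)
```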
This will allow results to be formatted by other parts of the code (such
as the intermittent filtering code). Previously, formatting was handled
in ServoHandler, which was a bit strange as it's really only necessary
for GroupingFormatter and the intermittent filtering code. This also
allows the results to be properly typed by the Python typing system.
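For illustration, typed results could look something like the following dataclasses (field names are assumptions, not the exact definitions):

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class UnexpectedSubtestResult:
    name: str
    actual: str
    expected: str
    message: Optional[str] = None

@dataclass
class UnexpectedResult:
    test: str
    actual: str
    expected: str
    # Subtest results travel with the parent test so formatters and the
    # intermittent filter can consume the same structured data.
    unexpected_subtest_results: List[UnexpectedSubtestResult] = field(
        default_factory=list
    )
```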
This change integrates the filter-intermittents command into test-wpt.
This is in preparation for future work on tracking intermittent
failures. This change also:
- Removes the ServoJson logger and replaces it with a generic WPT log
  handler which tracks unexpected results.
- Controls the intermittent filter via environment variables (sketched
  below); the GitHub version requires a token instead of credentials.
- Saves output to a single file, always as text.
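A minimal sketch of the environment-variable plumbing (only `INTERMITTENT_TRACKER_DASHBOARD_SECRET` appears elsewhere in this text; the other variable names are assumptions):

```python
import os

def intermittent_filter_config():
    # Only INTERMITTENT_TRACKER_DASHBOARD_SECRET is mentioned elsewhere in
    # this text; the other variable names here are purely illustrative.
    return {
        "dashboard_secret": os.environ.get("INTERMITTENT_TRACKER_DASHBOARD_SECRET"),
        "github_token": os.environ.get("GITHUB_TOKEN"),
        "enabled": os.environ.get("FILTER_INTERMITTENTS", "") != "",
    }
```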
Combine `run.py` and `update.py` into `servowpt.py` in order to allow
them to share code. Import them directly into the mach script to avoid
having to call `compile` and `exec` on the code. This makes it clearer
how they are executed. In addition, move all of the setup into
`setupwpt.py` to avoid differences between tests executed via mach and
not. Finally, be more ambitious when detecting the build to use. If none
was specified, try to use the one that exists between "release" and
"debug."