Increase parallelism on Linux WPT testing
The time taken by each chunk is uneven, with WPT-1 the longest before this change at 30 ~ 45 minutes. This reduces it to 15 ~ 20 minutes.
Surprisingly, increasing the number of processes seems to also make `test_element_in_collection` in `/webdriver/tests/execute_script/cyclic.py` **unexpectedly pass**. This happened reliably in three different runs:
https://community-tc.services.mozilla.com/tasks/S9O27WJvSa6j2PSjcRcbBA/runs/2
Enable some mach commands to be run with python3
This change finally enables the following commands to be run with python3:
* `build`
* `test-unit`
* `package`
As previously explained, `test-tidy` will require more work in the wpt repository directly. Maybe `test-tidy --no-wpt` is achievable relatively quickly though.
For possible remaining bits that might need to be worked on, see https://github.com/servo/servo/issues/23607
---
<!-- Thank you for contributing to Servo! Please replace each `[ ]` by `[X]` when the step is complete, and replace `___` with appropriate data: -->
- [x] `./mach build -d` does not report any errors
- [x] `./mach test-tidy` does not report any errors
<!-- Either: -->
- [x] There are tests for these changes
<!-- Also, please make sure that "Allow edits from maintainers" checkbox is checked, so that we can help you if you get stuck somewhere along the way.-->
<!-- Pull requests that do not address these steps are welcome, but they will require additional verification as part of the review process. -->
GStreamer plugin should use GLMemory
<!-- Please describe your changes on the following line: -->
Get the gstreamer servosrc plugin to generate frames in GLMemory rather than main memory.
---
- [x] `./mach build -d` does not report any errors
- [x] `./mach test-tidy` does not report any errors
- [x] These changes fix #24831
- [x] These changes do not require tests because it's an embedding perf issue
… now that remaining dual-core workers have been upgraded to quad-cores.
Also reduces the number of WPT chunks, since per-task overhead becomes more significant as tasks become shorter.
Fix updating the GitHub Status as soon as any TC task fails
… rather than only when the entire task group is resolved. This allows Homu to more quickly be notified of a failure, and move on to the next PR in the queue sooner.
(Plus drive-by Brewfile fix.)
… since other time-sensitive tasks depend on them.
Note: we need to be careful with task priorities,
especially in worker pools with limited capacity,
since they are absolute and can cause starvation:
https://docs.taskcluster.net/docs/manual/tasks/priority
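To illustrate the starvation risk, here is a minimal sketch of a strict-priority queue served by a single worker. It is not Taskcluster's scheduler: the `simulate` function, the numeric priority levels and the task names are all hypothetical, chosen only to show that a low-priority task never runs while higher-priority tasks keep arriving.

```python
import heapq
from itertools import count


def simulate(arrivals_per_tick):
    """Single worker, one task per tick, strictly highest priority first.

    `arrivals_per_tick` is a list (one entry per tick) of lists of
    (priority, name) pairs; a lower number means a more urgent task.
    Returns (names executed in order, names still pending at the end).
    """
    pending = []
    order = count()  # tie-breaker so equal priorities stay first-in, first-out
    executed = []
    for arrivals in arrivals_per_tick:
        for priority, name in arrivals:
            heapq.heappush(pending, (priority, next(order), name))
        if pending:
            _, _, name = heapq.heappop(pending)
            executed.append(name)
    return executed, [name for _, _, name in pending]


HIGH, LOW = 0, 1  # hypothetical priority levels

# One low-priority task arrives first; a high-priority task arrives every tick.
ticks = [[(LOW, "try-wpt"), (HIGH, "build-0")]]
ticks += [[(HIGH, f"build-{i}")] for i in range(1, 5)]

executed, still_pending = simulate(ticks)
print("executed:", executed)
print("still pending:", still_pending)
```

With limited capacity, the low-priority task only gets a turn once the stream of higher-priority work stops.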
## Before this
Before this PR, we had roughly as many chunks as available workers.
Because the number of test files is a poor estimate for the time
needed to run them, we have significant variation in the completion time
between chunks when testing one given PR.
https://github.com/servo/taskcluster-config/pull/9 adds a tool to collect
this data. Here are two full runs of `test_wpt` before this PR:
https://community-tc.services.mozilla.com/tasks/groups/DBt9ki9gTdWmwAk-VDorzw
```
count 1, total 0:00:32, max: 0:00:32 docker 0:00:32
count 1, total 0:59:14, max: 0:59:14 macos-disabled-mac1 0:59:14
count 6, total 4:12:16, max: 1:01:14 macos-disabled-mac1 WPT 0:40:29 0:18:55 0:46:50 0:44:38 1:01:14 0:40:10
count 1, total 0:55:19, max: 0:55:19 macos-disabled-mac9 0:55:19
count 6, total 4:25:09, max: 1:01:40 macos-disabled-mac9 WPT 0:37:58 0:37:24 0:27:18 1:01:40 0:46:17 0:54:31
```
Times for a given chunk vary between 19 minutes and 61 minutes.
Assuming no `try` testing, with Homu’s serial scheduling of `r+` testing
this means that the worker running the shortest chunk sits idle for 42 minutes
and our limited CPU resources are under-utilized.
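The idle gap can be rechecked from the chunk durations in the output above. This is only a sketch: the `parse_hms` helper is mine, and pooling the two runs into one list is an assumption made to match the 19 to 61 minute range.

```python
from datetime import timedelta


def parse_hms(text: str) -> timedelta:
    """Parse an 'H:MM:SS' duration as printed by the data collection tool."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


# WPT chunk durations copied from the two runs above (mac1, then mac9).
before = [parse_hms(t) for t in (
    "0:40:29 0:18:55 0:46:50 0:44:38 1:01:14 0:40:10 "
    "0:37:58 0:37:24 0:27:18 1:01:40 0:46:17 0:54:31"
).split()]

shortest, longest = min(before), max(before)
# Time the worker with the shortest chunk waits for the longest chunk,
# assuming nothing else is queued for it.
idle = longest - shortest
print(f"shortest {shortest}, longest {longest}, idle gap {idle}")
```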
When there *are* `try` PRs being tested however, they compete with
each other and any `r+` PR for the same workers. If we get unlucky,
a 61 minute task could only *start* after some other tasks have finished,
increasing the overall time-to-merge a lot.
## This
This PR changes the number of chunks to be significantly more
than the number of available workers. When one of them finishes,
that worker can pick up another one instead of sitting idle.
Now the ratio of number of tasks to number of workers doesn’t matter:
the differences in run time between tasks become somewhat of an advantage
and the distribution to workers evens out on average.
The number 30 is a bit arbitrary. A higher number reduces resource
under-utilization, but increases the effect of per-task overhead.
The git cache added in https://github.com/servo/servo/pull/24753
reduced that overhead, though.
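The effect of having more chunks than workers can be sketched with a small greedy simulation, where each task goes to whichever worker becomes free first. The worker count of 6 is an assumption (this description only says "roughly as many chunks as available workers"), the `makespan` helper is hypothetical, and per-task overhead and competing `try` tasks are ignored. Durations are copied from the mac1 run in "Before this" and from the 31-chunk run reported in this description.

```python
import heapq


def parse_hms(text: str) -> int:
    """Parse 'H:MM:SS' into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def makespan(durations, workers):
    """Wall-clock time until the last task ends, with greedy assignment."""
    free_at = [0] * workers  # time at which each worker becomes free
    heapq.heapify(free_at)
    for duration in durations:
        start = heapq.heappop(free_at)  # earliest available worker
        heapq.heappush(free_at, start + duration)
    return max(free_at)


few_chunks = [parse_hms(t) for t in
              "0:40:29 0:18:55 0:46:50 0:44:38 1:01:14 0:40:10".split()]
many_chunks = [parse_hms(t) for t in (
    "0:07:26 0:08:39 0:04:21 0:07:13 0:12:47 0:10:11 0:04:01 0:03:36 "
    "0:10:43 0:12:57 0:04:47 0:04:06 0:10:09 0:12:00 0:12:42 0:04:40 "
    "0:04:24 0:12:20 0:12:15 0:03:03 0:07:35 0:11:35 0:07:01 0:04:16 "
    "0:09:40 0:05:08 0:05:01 0:06:29 0:15:29 0:02:28 0:06:27"
).split()]

WORKERS = 6  # assumption, not taken from the worker pool configuration

for label, chunks in (("6 chunks", few_chunks), ("31 chunks", many_chunks)):
    wall = makespan(chunks, WORKERS)
    ideal = sum(chunks) / WORKERS  # perfectly even split, as a lower bound
    print(f"{label}: wall {wall / 60:.1f} min, ideal {ideal / 60:.1f} min")
```

With as many chunks as workers, the wall-clock time equals the longest chunk. With many short chunks, greedy assignment stays within one chunk length of the ideal even split.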
Another worry I had was whether this would worsen the similar problem
of unequal scheduling between processes within a task,
where some CPU cores sit idle while the remaining processes finish their
assigned work.
This turned out not to be enough of a problem to negatively affect
the total machine time:
https://community-tc.services.mozilla.com/tasks/groups/VnDac92HQU6QmrpzWPCR2w
```
count 1, total 0:00:48, max: 0:00:48 docker 0:00:48
count 1, total 0:39:04, max: 0:39:04 macos-disabled-mac9 0:39:04
count 31, total 4:03:29, max: 0:15:29 macos-disabled-mac9 WPT
0:07:26 0:08:39 0:04:21 0:07:13 0:12:47 0:10:11 0:04:01 0:03:36
0:10:43 0:12:57 0:04:47 0:04:06 0:10:09 0:12:00 0:12:42 0:04:40
0:04:24 0:12:20 0:12:15 0:03:03 0:07:35 0:11:35 0:07:01 0:04:16
0:09:40 0:05:08 0:05:01 0:06:29 0:15:29 0:02:28 0:06:27
```
(4h03min is even lower than above, but seems within variation.)
## After this
https://github.com/servo/servo/issues/23655 proposes automatically
restarting failed WPT tasks, in case the failure is intermittent.
With the test suite split into more chunks we have fewer tests per chunk,
and therefore lower probability that a given one fails.
Restarting one of them also causes less repeated work.
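A rough way to see this, assuming each test fails intermittently and independently with the same small probability (a simplification), is sketched here. The test count and the failure rate are hypothetical placeholders, not measurements from Servo's CI.

```python
def chunk_failure_probability(tests_per_chunk: int, p_intermittent: float) -> float:
    """Probability that at least one test in the chunk fails intermittently."""
    return 1.0 - (1.0 - p_intermittent) ** tests_per_chunk


TOTAL_TESTS = 30_000    # hypothetical size of the test suite
P_INTERMITTENT = 1e-5   # hypothetical per-test intermittent failure rate

for chunks in (6, 30):
    per_chunk = TOTAL_TESTS // chunks
    p_fail = chunk_failure_probability(per_chunk, P_INTERMITTENT)
    # A restart re-runs one whole chunk, i.e. this share of the suite.
    rerun_share = per_chunk / TOTAL_TESTS
    print(f"{chunks} chunks: P(a given chunk fails) = {p_fail:.3f}, "
          f"restart repeats {rerun_share:.1%} of the suite")
```

The probability that at least one chunk fails somewhere in the run is unchanged under this model. What shrinks is the chance that any particular task needs a restart, and the amount of work each restart repeats.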