WebRender.
This happens asynchronously, just as it does in non-WebRender mode.
This functionality is a prerequisite for doing proper display-list-based
hit testing in WebRender, since it moves the scroll offsets into Servo
(and, specifically, into the script thread, enabling iframe event
forwarding) instead of keeping them private to WebRender.
Requires servo/webrender_traits#55 and servo/webrender#277.
Partially addresses #11108.
This commit adds the `--profiler-trace-path` flag. When combined with `-p` to
enable profiling, it dumps a profile as a self-contained HTML file to the given
path. The profile visualizes the traced operations as a gant-chart style
timeline.
* Sections like `[dependencies.foo]` can be entries in a `[dependencies]`
section with the `{key = value}` syntax.
* Per-target dependencies can be expressed with more general `cfg(…)`
conditions instead of exact target triples:
https://github.com/rust-lang/cargo/pull/2328
speculation code.
The old code tried to do the speculation as a single bottom-up pass
after intrinsic inline-size calculation, which was unable to handle
cases like this:
<div>
<div style="float: left">Foo</div>
</div>
<div>
<div style="overflow: hidden">Bar</div>
</div>
No single bottom-up pass could possibly handle this case, because the
inline-size of the float flowing out of the "Foo" block could never make
it down to the "Bar" block, where it is needed for speculation.
On the pages I tried, this regresses layout performance by 1%-2%.
I first noticed this breaking some pages, like the Google SERPs, several
months ago.
bottom-up pass.
Right now, the only reason that overflow calculation works is that we
rely on script inducing extra reflows that are sent for display. This
was preventing #10021 from landing.
This change regresses layout performance by about 1% in my tests.
Fixes#7797 properly.
Instead of producing a tree of stacking contexts, display list
generation now produces a flat list of display items and a tree of
stacking contexts. This will eventually allow display list construction
to produce and modify WebRender vertex buffers directly, removing the
overhead of display list conversion. This change also moves
layerization of the display list to the paint thread, since it isn't
currently useful for WebRender.
To accomplish this, display list generation now takes three passes of
the flow tree:
1. Calculation of absolute positions.
2. Collection of a tree of stacking contexts.
3. Creation of a list of display items.
After collection of display items, they are sorted based upon the index
of their parent stacking contexts and their position in CSS 2.1
Appendeix E stacking order.
This is a big change, but it actually simplifies display list generation.
There is no good reason to have the two types.
This also means that the result of LayoutTask::profiler_metadata no longer
borrows the LayoutTask, which I'll need later.
Integrate with simple Heartbeats
This PR adds Heartbeats capability to servo. Heartbeats are used for detailed performance and power/energy profiling. We will add the power/energy readings in the future.
New dependencies are introduced which need in-depth reviews. I'm the only one who has had eyes on any of this, and I have limited resources for testing cross-platform compatibility.
* https://github.com/libheartbeats/heartbeats-simple - provides native C libraries from a shared code base:
* hbs[-static] - performance monitoring
* hbs-acc[-static] - performance with accuracy monitoring
* hbs-pow[-static] - performance with power/energy monitoring (the one we're using)
* hbs-acc-pow[-static] - performance with accuracy and power/energy monitoring
* https://github.com/connorimes/heartbeats-simple-sys provides rust wrappers for the native C libraries above - one crate for each + a common crate. These link with the *-static versions of the heartbeats libraries.
* https://github.com/connorimes/heartbeats-simple-rust provides rust abstractions over the -sys crates above - one crate for each.
The new `heartbeats` module in the `profile` crate looks for environment variables telling it to use heartbeats for each ProfilerCategory and where to put log files. (Of course, if somebody knows how to iterate over the enum instead of hardcoding each one, that would be fantastic.) If the environment variables aren't set for particular categories, heartbeats aren't created or used.
An interface change is made in the `profile_traits` crate to pass both the start and end time in a `ProfilerMsg` instead of just the elapsed time. Later we will add energy readings as well.
<!-- Reviewable:start -->
[<img src="https://reviewable.io/review_button.png" height=40 alt="Review on Reviewable"/>](https://reviewable.io/reviews/servo/servo/7179)
<!-- Reviewable:end -->
This is used for two memory reporting improvements.
- It's used to distinguish "explicit" memory reports from others. This
mirrors the same categorization that is used in Firefox, and gives a single
tree that's the best place to look. It replaces the "pages" tree which
was always intended to be a temporary stand-in for "explicit".
- It's used to computed "heap-unclassified" values for both the jemalloc
and system heaps, both of which are placed into the "explicit" tree.
Example output:
```
| 114.99 MiB -- explicit
| 52.34 MiB -- jemalloc-heap-unclassified
| 46.14 MiB -- system-heap-unclassified
| 14.95 MiB -- url(file:///home/njn/moz/servo2/../servo-static-suite/wikipe
dia/Guardians%20of%20the%20Galaxy%20(film)%20-%20Wikipedia,%20the%20free%20encyc
lopedia.html)
| 7.32 MiB -- js
| 3.07 MiB -- malloc-heap
| 3.00 MiB -- gc-heap
| 2.49 MiB -- used
| 0.34 MiB -- decommitted
| 0.09 MiB -- unused
| 0.09 MiB -- admin
| 1.25 MiB -- non-heap
| 1.36 MiB -- layout-worker-3-local-context
| 1.34 MiB -- layout-worker-0-local-context
| 1.24 MiB -- layout-worker-1-local-context
| 1.24 MiB -- layout-worker-4-local-context
| 1.16 MiB -- layout-worker-2-local-context
| 0.89 MiB -- layout-worker-5-local-context
| 0.38 MiB -- layout-task
| 0.31 MiB -- display-list
| 0.07 MiB -- local-context
| 1.56 MiB -- compositor-task
| 0.78 MiB -- surface-map
| 0.78 MiB -- layer-tree
```
The heap-unclassified values dominate the "explicit" tree because reporter
coverage is still quite poor.