Integrate with simple Heartbeats
This PR adds Heartbeats capability to servo. Heartbeats are used for detailed performance and power/energy profiling. We will add the power/energy readings in the future.
New dependencies are introduced which need in-depth reviews. I'm the only one who has had eyes on any of this, and I have limited resources for testing cross-platform compatibility.
* https://github.com/libheartbeats/heartbeats-simple - provides native C libraries from a shared code base:
* hbs[-static] - performance monitoring
* hbs-acc[-static] - performance with accuracy monitoring
* hbs-pow[-static] - performance with power/energy monitoring (the one we're using)
* hbs-acc-pow[-static] - performance with accuracy and power/energy monitoring
* https://github.com/connorimes/heartbeats-simple-sys provides rust wrappers for the native C libraries above - one crate for each + a common crate. These link with the *-static versions of the heartbeats libraries.
* https://github.com/connorimes/heartbeats-simple-rust provides rust abstractions over the -sys crates above - one crate for each.
The new `heartbeats` module in the `profile` crate looks for environment variables telling it to use heartbeats for each ProfilerCategory and where to put log files. (Of course, if somebody knows how to iterate over the enum instead of hardcoding each one, that would be fantastic.) If the environment variables aren't set for particular categories, heartbeats aren't created or used.
An interface change is made in the `profile_traits` crate to pass both the start and end time in a `ProfilerMsg` instead of just the elapsed time. Later we will add energy readings as well.
<!-- Reviewable:start -->
[<img src="https://reviewable.io/review_button.png" height=40 alt="Review on Reviewable"/>](https://reviewable.io/reviews/servo/servo/7179)
<!-- Reviewable:end -->
This is used for two memory reporting improvements.
- It's used to distinguish "explicit" memory reports from others. This
mirrors the same categorization that is used in Firefox, and gives a single
tree that's the best place to look. It replaces the "pages" tree which
was always intended to be a temporary stand-in for "explicit".
- It's used to computed "heap-unclassified" values for both the jemalloc
and system heaps, both of which are placed into the "explicit" tree.
Example output:
```
| 114.99 MiB -- explicit
| 52.34 MiB -- jemalloc-heap-unclassified
| 46.14 MiB -- system-heap-unclassified
| 14.95 MiB -- url(file:///home/njn/moz/servo2/../servo-static-suite/wikipe
dia/Guardians%20of%20the%20Galaxy%20(film)%20-%20Wikipedia,%20the%20free%20encyc
lopedia.html)
| 7.32 MiB -- js
| 3.07 MiB -- malloc-heap
| 3.00 MiB -- gc-heap
| 2.49 MiB -- used
| 0.34 MiB -- decommitted
| 0.09 MiB -- unused
| 0.09 MiB -- admin
| 1.25 MiB -- non-heap
| 1.36 MiB -- layout-worker-3-local-context
| 1.34 MiB -- layout-worker-0-local-context
| 1.24 MiB -- layout-worker-1-local-context
| 1.24 MiB -- layout-worker-4-local-context
| 1.16 MiB -- layout-worker-2-local-context
| 0.89 MiB -- layout-worker-5-local-context
| 0.38 MiB -- layout-task
| 0.31 MiB -- display-list
| 0.07 MiB -- local-context
| 1.56 MiB -- compositor-task
| 0.78 MiB -- surface-map
| 0.78 MiB -- layer-tree
```
The heap-unclassified values dominate the "explicit" tree because reporter
coverage is still quite poor.
This puts the larger sub-trees first. E.g. this:
```
| 1.04 MiB -- url(http://en.wikipedia.org/wiki/Main_Page)
| 0.26 MiB -- display-list
| 0.78 MiB -- paint-task # new output line
| 0.78 MiB -- buffer-map # new output line
```
becomes this:
```
| 1.04 MiB -- url(http://en.wikipedia.org/wiki/Main_Page)
| 0.78 MiB -- paint-task # new output line
| 0.78 MiB -- buffer-map # new output line
| 0.26 MiB -- display-list
```
This matches how Firefox's about:memory works.
Now that this is done for all sub-trees, the ad hoc sorting done for
Linux segments is no longer necessary, and has been removed.
A rebuild after touching components/profile/mem.rs now takes 48 seconds (and
only rebuilds `profile` and `servo`) which is much lower than it used to be.
In comparison, a rebuild after touching components/profile_traits/mem.rs takes
294 seconds and rebuilds many more crates.
This change also removes some unnecessary crate dependencies in `net` and
`net_traits`.
- Most of util::memory has been moved into profile::mem, though the
`SizeOf` trait and related things remain in util::memory. The
`SystemMemoryReporter` code is now in a submodule
profile::mem::system_reporter.
- util::time has been moved entirely into profile::time.