Known issues:
* Display list optimization can sometimes optimize out elements that
should be shown. This affects the Enyo demo.
* The `overflow: scroll` container doesn't clip the inner layer properly
when borders, border radius, etc. are present.
* `overflow-x: scroll` and `overflow-y: scroll` don't work individually;
elements are scrolled all at once.
* Scrolling only works on absolutely-positioned elements.
This is used for two memory reporting improvements.
- It's used to distinguish "explicit" memory reports from others. This
mirrors the same categorization that is used in Firefox, and gives a single
tree that's the best place to look. It replaces the "pages" tree which
was always intended to be a temporary stand-in for "explicit".
- It's used to compute "heap-unclassified" values for both the jemalloc
and system heaps, both of which are placed into the "explicit" tree.
Example output:
```
| 114.99 MiB -- explicit
|    52.34 MiB -- jemalloc-heap-unclassified
|    46.14 MiB -- system-heap-unclassified
|    14.95 MiB -- url(file:///home/njn/moz/servo2/../servo-static-suite/wikipedia/Guardians%20of%20the%20Galaxy%20(film)%20-%20Wikipedia,%20the%20free%20encyclopedia.html)
|       7.32 MiB -- js
|          3.07 MiB -- malloc-heap
|          3.00 MiB -- gc-heap
|             2.49 MiB -- used
|             0.34 MiB -- decommitted
|             0.09 MiB -- unused
|             0.09 MiB -- admin
|          1.25 MiB -- non-heap
|       1.36 MiB -- layout-worker-3-local-context
|       1.34 MiB -- layout-worker-0-local-context
|       1.24 MiB -- layout-worker-1-local-context
|       1.24 MiB -- layout-worker-4-local-context
|       1.16 MiB -- layout-worker-2-local-context
|       0.89 MiB -- layout-worker-5-local-context
|       0.38 MiB -- layout-task
|          0.31 MiB -- display-list
|          0.07 MiB -- local-context
|    1.56 MiB -- compositor-task
|       0.78 MiB -- surface-map
|       0.78 MiB -- layer-tree
```
The heap-unclassified values dominate the "explicit" tree because reporter
coverage is still quite poor.
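For reviewers unfamiliar with the idea, the "heap-unclassified" computation can be sketched roughly as follows: take the allocator's own total for a heap and subtract everything that "explicit" reporters attributed to that heap. The names here (`HeapKind`, `Report`, `heap_unclassified`) are hypothetical and chosen for illustration only; they are not the profiler's actual types.

```rust
// Illustrative sketch only: these types are assumptions, not Servo's real API.

/// Which allocator a report's bytes came from.
#[derive(Clone, Copy, PartialEq, Debug)]
pub enum HeapKind {
    Jemalloc,
    System,
    /// Memory that lives on neither heap (for example, mapped directly).
    NonHeap,
}

/// One leaf of the memory report tree.
pub struct Report {
    /// Path from the root, e.g. ["explicit", "js", "gc-heap"].
    pub path: Vec<String>,
    pub kind: HeapKind,
    pub size: usize,
}

/// Bytes on the `kind` heap that no "explicit" reporter accounted for.
/// `heap_allocated` is the allocator's own total for that heap.
pub fn heap_unclassified(reports: &[Report], kind: HeapKind, heap_allocated: usize) -> usize {
    let classified: usize = reports
        .iter()
        .filter(|r| r.kind == kind && r.path.first().map(String::as_str) == Some("explicit"))
        .map(|r| r.size)
        .sum();
    // Saturate so that over-reporting yields 0 instead of underflowing.
    heap_allocated.saturating_sub(classified)
}
```

With poor reporter coverage, `classified` stays small relative to `heap_allocated`, which is why the unclassified entries dominate the tree.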
To actually make the multiprocess communication work, we'll need to
reroute the task creation to the pipeline or the compositor. But this
works as a first step.
profile: Make the time and memory profilers run over IPC.
Uses a couple of extra threads to work around the lack of cross-process
boxed trait objects.
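A minimal sketch of the proxy-thread workaround, under stated assumptions: a boxed trait object stays in the process that created it, and a dedicated thread answers plain-data requests on its behalf. `Reporter`, `ReporterRequest`, and `spawn_reporter_proxy` are hypothetical names, and `std::sync::mpsc` stands in for a real cross-process channel, so this shows only the shape of the idea, not the code in this PR.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread;

/// Hypothetical reporter interface. A `Box<dyn Reporter>` cannot be
/// serialized and sent to another process.
pub trait Reporter {
    fn collect(&self) -> u64;
}

/// Plain-data requests; only values like these travel over the channel.
/// (In real multiprocess code the reply sender would be an IPC sender.)
pub enum ReporterRequest {
    Collect(Sender<u64>),
    Exit,
}

/// Spawns a proxy thread that owns the trait object and services requests.
/// Callers hold only the returned sender, never the trait object itself.
pub fn spawn_reporter_proxy(reporter: Box<dyn Reporter + Send>) -> Sender<ReporterRequest> {
    let (tx, rx) = channel();
    thread::spawn(move || {
        // The loop ends on `Exit` or when every sender has been dropped.
        for request in rx {
            match request {
                ReporterRequest::Collect(reply) => {
                    // Ignore the error if the requester has gone away.
                    let _ = reply.send(reporter.collect());
                }
                ReporterRequest::Exit => break,
            }
        }
    });
    tx
}
```

The cost is one extra thread per proxied object, which matches the "couple of extra threads" trade-off mentioned above.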
r? @nnethercote
We currently store LayerBuffers, because previously NativeSurfaces did
not record their own size. Now we can store NativeSurfaces directly,
which saves a bit of space in the surface cache and allows us to create
LayerBuffers only in the PaintTask.
This also means that instead of sending cached LayerBuffers, the
compositor can just send cached NativeSurfaces to the PaintTask.
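To illustrate why a self-describing surface makes the wrapper unnecessary, here is a rough sketch of a size-keyed cache that holds surfaces directly. The `NativeSurface` fields and the `SurfaceCache` type below are simplified, hypothetical stand-ins and do not reflect the real definitions.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a native surface that records its own size.
pub struct NativeSurface {
    pub id: u32,
    /// Width and height in pixels.
    pub size: (u32, u32),
}

/// Cache of reusable surfaces, keyed by the size each surface reports.
#[derive(Default)]
pub struct SurfaceCache {
    by_size: HashMap<(u32, u32), Vec<NativeSurface>>,
}

impl SurfaceCache {
    /// Stores a surface under its own recorded size; no wrapper is needed.
    pub fn insert(&mut self, surface: NativeSurface) {
        self.by_size.entry(surface.size).or_default().push(surface);
    }

    /// Removes and returns a cached surface of exactly `size`, if any.
    pub fn take(&mut self, size: (u32, u32)) -> Option<NativeSurface> {
        self.by_size.get_mut(&size)?.pop()
    }
}
```

Because the size lives on the surface, the cache can hand surfaces to the consumer as-is and leave any buffer wrapping to the receiving side.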
Add memory profiling for the compositor task
Currently only the BufferMap is recorded, but a later change will also
measure the memory usage of the compositor tree.