The idea is to put the memmoving of the ThreadLocalContext out of line from the point of view of top_down_dom so we don't need to allocate a large chunk of stack space for it.
It turns out that it's problematic to embed ThreadLocalStyleContext within
LayoutContext, because parameterizing the former on TElement (which we do
in the next patch) infects all the traversal stuff with the trait parameters,
which we don't really want.
In general, it probably makes sense to use separate scoped TLS types for
the separate DOM and Flow tree passes, so we can add a different ScopedTLS
type for the Flow pass if we ever need it.
We also reorder the |scope| and |shared| parameters in parallel.rs, because
it aligns more with the order in style/parallel.rs. I did this when I was
adding a TLS parameter to all these functions, which I realized we don't need
for now.