assign-heights phases.
This uses the new work-stealing deque. By default, 3/4 of a thread per
logical CPU is used. This can be tuned with the `-y` flag.
I measured a 65% reflow speedup on `perf-rainbow.html` and a 247% reflow
speedup on `http://en.wikipedia.org/wiki/South_China_Sea` on a 4-core
HyperThreaded Core i7. However, numbers were fairly volatile, especially
for the latter.