Weak scaling
This branch is based on the gpu-hack-noallocs branch.
- Reduce the memory footprint of the tree so that it is no longer the bottleneck
- Remove unnecessary data (ro_0 and p_0)
- Remove the initFluidDensityAtRest call
- Fix work allocation and load balancing for both small and large numbers of ranks
- Fix a debug macro and add more debug macros
- Fix the Global Tree configuration for running at scale
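The load-balancing change itself is not shown here. As a minimal sketch, assuming particles are split into contiguous index ranges across ranks (an assumption, not necessarily what this branch does), a balanced split that stays correct both for few ranks and for more ranks than particles could look like this. `blockRange` is a hypothetical helper, not a function from the code base.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical helper (illustration only): returns the half-open range
// [first, last) of global particle indices owned by `rank` when `n`
// particles are divided among `numRanks` ranks.
std::pair<std::size_t, std::size_t> blockRange(std::size_t n, int numRanks, int rank)
{
    assert(numRanks > 0 && rank >= 0 && rank < numRanks);

    std::size_t p = static_cast<std::size_t>(numRanks);
    std::size_t r = static_cast<std::size_t>(rank);

    std::size_t base  = n / p; // every rank gets at least this many particles
    std::size_t extra = n % p; // the first `extra` ranks get one additional particle

    // Ranks before r contribute `base` each, plus one for each of them below `extra`.
    std::size_t first = r * base + std::min(r, extra);
    std::size_t count = base + (r < extra ? 1 : 0);

    return {first, first + count};
}
```

For example, 10 particles on 4 ranks yield the ranges [0,3), [3,6), [6,8) and [8,10), so per-rank counts differ by at most one. With 3 particles on 5 ranks, ranks 3 and 4 receive the empty range [3,3) rather than an out-of-bounds slice.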