Global tree creation
As a consequence of this update, the first iteration works with fewer halo particles. This reduces the memory needed in the first iteration and makes it much faster.
Using size_t everywhere we deal with large numbers of particles enables the mini-app to handle large test cases. It has been tested on Piz Daint with 32,000,000,000 particles on 1024 nodes (12 cores + GPU).
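
To illustrate why the index type matters at this scale, here is a minimal sketch, not code from the mini-app. The helper fitsIn and the constants are hypothetical names introduced for this example, and the sketch assumes a platform where size_t is 64 bits wide. It shows that the global particle count from the test above does not fit in a 32-bit integer, even though an even per-node share would.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical helper (not part of the mini-app): returns true if `count`
// can be represented by the integer type `Index`.
template <class Index>
constexpr bool fitsIn(std::uint64_t count)
{
    return count <= static_cast<std::uint64_t>(std::numeric_limits<Index>::max());
}

// Figures taken from the test run described above.
constexpr std::uint64_t totalParticles = 32000000000ULL;
constexpr std::uint64_t nodes          = 1024;

// Assumption for illustration only: particles are spread evenly over the nodes.
constexpr std::uint64_t particlesPerNode = totalParticles / nodes;

// The global count overflows both signed and unsigned 32-bit integers.
static_assert(!fitsIn<std::int32_t>(totalParticles), "global count exceeds int32");
static_assert(!fitsIn<std::uint32_t>(totalParticles), "global count exceeds uint32");

// An even per-node share would still fit in 32 bits, which is why an overflow
// of this kind only shows up in global counts and global indices.
static_assert(fitsIn<std::int32_t>(particlesPerNode), "per-node share fits in int32");
```

Under these assumptions, any global count, offset, or particle identifier has to be stored in size_t (or another 64-bit type), even when the node-local arrays are small.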