Find neighbors on GPU
New iterative findNeighbors implementation that performs well when offloaded to the GPU.
Inspired by: https://devblogs.nvidia.com/thinking-parallel-part-ii-tree-traversal-gpu/
The tree is transformed into a 1D array just for the GPU walk (LinearOctree.hpp). The code tries to minimize execution divergence by keeping CUDA threads in sync.
A CUDA version of the code is also included.
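For reviewers unfamiliar with the approach, here is a minimal host-side sketch of an iterative walk over a tree flattened into a 1D array. It is not the code in this PR: `FlatNode`, `Particles`, `boxDist2`, `findNeighborsSketch` and `kMaxStack` are hypothetical names, and the node layout is only an assumption about what a linearized octree could look like.

```cpp
#include <cassert>
#include <vector>

// Hypothetical flattened node; NOT the layout used in LinearOctree.hpp.
struct FlatNode
{
    float lo[3], hi[3]; // axis-aligned bounding box of the node
    int firstChild;     // index of the first child in the node array, -1 for a leaf
    int childCount;     // children are assumed to be stored contiguously
    int firstParticle;  // leaf only: offset into the particle arrays
    int particleCount;  // leaf only: number of particles in the leaf
};

// Hypothetical structure-of-arrays particle storage.
struct Particles
{
    std::vector<float> x, y, z;
};

// Squared distance from point p to the node's box (0 if p is inside the box).
inline float boxDist2(const FlatNode& n, const float p[3])
{
    float d2 = 0.0f;
    for (int k = 0; k < 3; ++k)
    {
        float d = 0.0f;
        if (p[k] < n.lo[k]) d = n.lo[k] - p[k];
        else if (p[k] > n.hi[k]) d = p[k] - n.hi[k];
        d2 += d * d;
    }
    return d2;
}

// Assumed upper bound on pending nodes; a hard-coded constant of this kind
// would have to cover the deepest path times the fan-out.
constexpr int kMaxStack = 64;

// Iterative walk with an explicit fixed-size stack instead of recursion.
// Returns the indices of all particles within radius r of p
// (including the query particle itself if it is part of the set).
std::vector<int> findNeighborsSketch(const std::vector<FlatNode>& nodes,
                                     const Particles& ps,
                                     const float p[3], float r)
{
    std::vector<int> result;
    if (nodes.empty()) return result;

    const float r2 = r * r;
    int stack[kMaxStack];
    int top = 0;
    stack[top++] = 0; // node 0 is assumed to be the root

    while (top > 0)
    {
        const FlatNode& n = nodes[stack[--top]];
        if (boxDist2(n, p) > r2) continue; // search sphere misses this box

        if (n.firstChild < 0)
        {
            // Leaf: test every particle it contains.
            for (int i = n.firstParticle; i < n.firstParticle + n.particleCount; ++i)
            {
                const float dx = ps.x[i] - p[0];
                const float dy = ps.y[i] - p[1];
                const float dz = ps.z[i] - p[2];
                if (dx * dx + dy * dy + dz * dz <= r2) result.push_back(i);
            }
        }
        else
        {
            // Internal node: push all children for later visits.
            assert(top + n.childCount <= kMaxStack);
            for (int c = 0; c < n.childCount; ++c)
                stack[top++] = n.firstChild + c;
        }
    }
    return result;
}
```

The explicit stack is what removes the recursion; in a device version the `std::vector` output would presumably be replaced by a preallocated per-particle buffer, and the order in which neighbors are reported depends on the push order of the children.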
Marking as draft for the following reasons:
- Hard-coded constants in findNeighbors.hpp should be moved elsewhere.
- Can these constants be computed instead of hard-coded?