Galaxy Merger using HPX-5
Simulations have long helped astrophysicists develop an understanding of physical phenomena in the Universe. One class of problems that has been solved with high performance computing is the N-body problem, which evolves the dynamics of a large number of point particles subject to their mutual interactions. With such simulations scientists can study galaxy formation and evolution, both in individual cases and in the context of the large-scale structure of the Universe.
The Barnes-Hut (BH) algorithm has been used for decades to reduce the arithmetic complexity of computing the interactions among large numbers of particles. The BH algorithm is an exemplar of dynamic adaptive computational methods; it is often applied to problems with a huge range of scales in both space and time. As the distribution of modeled particles changes, so does the amount of work each particle requires. The BH algorithm is thus an excellent target for HPX-5's dynamic adaptive features.
A BH code, Yggdrasil, is being developed that leverages HPX-5 to create scalable, efficient parallel codes. The current implementation uses many features of HPX-5: the central data structure in the BH scheme, a tree, is placed in HPX-5's global address space; execution proceeds in a message-driven fashion, with events firing when the needed data becomes available, or with the work moving to the data it needs; Local Control Objects perform lightweight synchronization to ensure correct computation; and lightweight ephemeral threads are spawned for each computational task, allowing the fine-grained parallelism inherent in BH to be expressed in HPX-5.
This movie shows the positions of the disk particles in a merger of two Milky Way-like disk galaxies. This simulation has a total of 2 million particles across the disk and dark matter components (the dark matter is not shown). The movie spans 2 billion years. The simulation includes no gas dynamics and uses a relatively low force resolution. The time to compute this solution was lower than GADGET-3 required for the same initial conditions with the same parameters.
HPX-5 Running on Cori@NERSC (Cray XC-40) with dynamic scheduling to stay under soft power cap
We show the performance scalability of HPX-5 integrated with APEX. The demo shows the LULESH application running on NERSC Edison, using the integrated Photon communication library. APEX Introspection observes the application, runtime, OS, and hardware to maintain the APEX state, while its Policy Engine enforces policy rules to adapt, constrain, or otherwise modify application behavior. This demo shows APEX adapting the runtime by scaling back hyper-threading. Photon supports a tight coupling of the runtime system with the underlying network fabric that scales and remains performant in exascale environments.
Dynamic, Adaptive Execution of LULESH in HPX-5 demo
One goal of asynchronous multithreaded runtimes is to tolerate latency with concurrency. This visualization depicts that capability in HPX-5 through two different instantiations of the LULESH mini-app. The first is a traditional one-domain-per-core decomposition that clearly exhibits the underlying synchronous nature of the application. The second is an over-decomposition using eight smaller domains per core. The HPX-5 scheduler is aware of dependencies and schedules computation as it becomes enabled. This naturally overlaps communication with computation, makes more effective use of the cache, and outperforms the MPI reference implementation by 15% at this scale.
Automatic, Dynamic Load Balancing of FMM in HPX-5 demo
This demo shows the effectiveness of dynamic load balancing through global data relocation using the active global address space (AGAS) in HPX-5. With the Fast Multipole Method (FMM) application as an example, the visualization shows the reduction in communication due to dynamic global data rebalancing. The source and target spatial decomposition trees in FMM are laid out next to each other, with tree nodes colored by their physical location. As the FMM application runs, HPX-5 performs online profiling of accesses to the global data. During the load-balancing phase, the optimal data distribution is determined by edge-cut recursive partitioning of the aggregated communication graph. The visualization then resumes after global data rebalancing to show the visible decrease in remote communication activity.