The Center for Research in Extreme Scale Technologies (CREST) at Indiana University is pleased to announce the release of version 4.0 of HPX-5, a state-of-the-art runtime system for extreme-scale computing. Version 4.0 of the HPX-5 runtime system represents a significant maturation of the sequence of HPX-5 releases to date. It incorporates new performance optimizations, features associated with the ParalleX execution model, and programmer services, including C++ bindings and collectives.
HPX-5 is a realization of the ParalleX execution model, which establishes the runtime’s roles and responsibilities with respect to other interoperating system layers, and explicitly includes a performance model that provides an analytic framework for performance and optimization. As an Asynchronous Multi-Tasking (AMT) software system, HPX-5 is event-driven, enabling the migration of continuations and the movement of work to data, when appropriate, based on sophisticated local control synchronization objects (e.g., futures, dataflow) and active messages. ParalleX compute complexes, embodied as lightweight, first-class threads, can block, perform global mutable side-effects, employ non-strict firing rules, and serve as continuations. HPX-5 employs an active global address space in which virtually addressed objects can migrate across the physical system without changing address. First-class named processes can span and share nodes.
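The idea that an object keeps its virtual address while it moves between nodes can be illustrated with a small directory that maps stable addresses to their current owner. The following is a minimal single-process sketch in C, not the HPX-5 implementation or API: the names gva_t, directory_t, dir_alloc, dir_owner, and dir_migrate are hypothetical, and a real runtime would distribute the directory and move the data itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_BLOCKS 16

/* Hypothetical global virtual address; not an HPX-5 type. */
typedef uint64_t gva_t;

typedef struct {
    gva_t addr;   /* stable name of the block, never changes */
    int   owner;  /* rank of the node that currently holds the block */
} dir_entry_t;

typedef struct {
    dir_entry_t entries[MAX_BLOCKS];
    size_t      count;
} directory_t;

void dir_init(directory_t *d) {
    assert(d != NULL);
    memset(d, 0, sizeof *d);
}

/* Allocate a block on `owner`; returns its address, or 0 if the table is full. */
gva_t dir_alloc(directory_t *d, int owner) {
    if (d->count == MAX_BLOCKS) return 0;
    gva_t addr = (gva_t)(d->count + 1) << 12;  /* nonzero, unique per block */
    d->entries[d->count].addr  = addr;
    d->entries[d->count].owner = owner;
    d->count++;
    return addr;
}

/* Linear lookup is enough for a sketch. */
static dir_entry_t *dir_find(directory_t *d, gva_t addr) {
    for (size_t i = 0; i < d->count; i++) {
        if (d->entries[i].addr == addr) return &d->entries[i];
    }
    return NULL;
}

/* Returns the current owner rank, or -1 if the address is unknown. */
int dir_owner(directory_t *d, gva_t addr) {
    dir_entry_t *e = dir_find(d, addr);
    return e ? e->owner : -1;
}

/* Move the block to `new_owner`. Only the mapping changes; the address
   that other tasks hold stays valid. Returns 0 on success, -1 otherwise. */
int dir_migrate(directory_t *d, gva_t addr, int new_owner) {
    dir_entry_t *e = dir_find(d, addr);
    if (e == NULL) return -1;
    e->owner = new_owner;
    return 0;
}
```

A caller would allocate a block with dir_alloc, hand the returned address to other tasks, and later call dir_migrate; every holder of the address then resolves to the new owner through dir_owner without being updated.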
HPX-5 is an evolving runtime system used both to enable dynamic adaptive parallel applications and to conduct path-finding experimentation to quantify effects of latency, overhead, contention, and parallelism of its integral mechanisms. These performance parameters determine a tradeoff space within which dynamic control is performed for best performance. This tradeoff space is an area of active research driven by complex applications and advances in HPC architecture. HPX-5 employs dynamic and adaptive resource management and task scheduling to achieve the significant improvements in efficiency and scalability necessary to deploy many classes of parallel applications on the largest (current and future) supercomputers in the nation and world. Although still under development, HPX-5 is portable to a diverse set of systems, is reliable and programmable, scales across multi-core and multi-node systems, and delivers efficiency improvements for irregular, time-varying problems.
HPX-5 is written primarily in portable C99 and is released under an open source BSD license. Future major releases will be delivered semi-annually, and correctness and performance bug fixes will be made available as required. To support active engagement with the larger developer community, active development branches are available. HPX-5 will also be disseminated through the OpenHPC consortium led by the Linux Foundation.
HPX-5 research is an ongoing program sponsored by the National Science Foundation and the Department of Energy, with strong support from IU. Visit http://crest.sice.indiana.edu/hpx-5 to learn more and to download the software.
A revolution in supercomputing is underway to overcome the barriers to achieving practical exascale computing and beyond for the general class of applications. Complementing methods that make incremental changes to conventional practices, a pathfinding paradigm shift is being explored and implemented by multiple institutions that exploit dynamic adaptive execution through the use of introspective runtime systems. HPX-5 is a state-of-the-art runtime software system developed to enable highly scalable and efficient computation by providing application programmers with a set of powerful system software functions for resource management, task scheduling, and event-driven control in asynchronous execution environments. These capabilities are particularly important for dynamic irregular problems, where they expose and exploit far greater parallelism than conventional means while mitigating latency sources and effects and reducing overheads such as global barriers.
The HPX-5 package provides the programmer with a set of important runtime constructs to benefit from the efficiency and scalability opportunities of dynamic adaptive execution. It creates an application context of a global name space with a dynamic hierarchy of P-processes in a virtual address space, each of which may span multiple nodes. It exposes and exploits a unified class of tasks (e.g., threads) for coarse-, medium-, and fine-grain application parallelism to improve scalability and efficiency. It supports event-driven computation to move work to data (when appropriate) to reduce latency and hide latency effects. HPX-5 employs a set of semantically rich synchronization and lightweight control objects to create a dynamically distributed layer of control state (in addition to the conventional static program counters) comprising a graph of migrating continuations. This addresses the uncertainties of asynchrony, avoids contention due to global barriers, balances load, and adapts to unpredictable data distribution. It supports introspection and scheduling priority policies for a diversity of application requirements. Together, HPX-5 functions deliver an innovative and powerful environment for dynamic control of resources and adaptive scheduling of application tasks.
Runtime Software Architecture
The addition of a runtime system layer to the system software stack between the application programming interface and the operating system offers a unique opportunity to exploit continuous information about the evolving state of the system and its progress towards application goals. The architecture of the HPX-5 runtime software system comprises its structure of separate but interoperable functional units and the semantics of the interface it exposes to the programmer, compiler, and high-level application libraries.
The core of the HPX-5 runtime system is the compute-complex supervisor (or thread supervisor on conventional platforms), which controls local tasks, creates and terminates tasks, efficiently performs context switching for preemption, multi-tasking, load balancing, and blocking avoidance, and supports dynamic resource allocation. A second major function drives thread instantiation: it creates new threads either in response to direct commands from local parent threads or through message-driven computation initiated by remote threads.
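Message-driven thread creation can be pictured as a queue of small messages, each naming an action and the data it should run on, that a supervisor drains by starting one task per message. The sketch below is an illustration in C under simplifying assumptions, not the HPX-5 supervisor or its API: parcel_t, scheduler_t, sched_register, sched_spawn, and sched_run are hypothetical names, everything runs in one process, and tasks run to completion without context switching.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ACTIONS 8
#define QUEUE_CAP   32

/* An action is the code a new task runs on its target data. */
typedef void (*action_fn)(int arg, void *target);

/* Hypothetical message: which action to run, on what data, with what argument. */
typedef struct {
    int   action;  /* index into the registered action table */
    void *target;  /* data the task operates on */
    int   arg;
} parcel_t;

typedef struct {
    action_fn actions[MAX_ACTIONS];
    size_t    num_actions;
    parcel_t  queue[QUEUE_CAP];  /* ring buffer of pending tasks */
    size_t    head;
    size_t    count;
} scheduler_t;

void sched_init(scheduler_t *s) {
    assert(s != NULL);
    s->num_actions = 0;
    s->head = 0;
    s->count = 0;
}

/* Register an action; returns its id, or -1 if the table is full. */
int sched_register(scheduler_t *s, action_fn fn) {
    if (s->num_actions == MAX_ACTIONS) return -1;
    s->actions[s->num_actions] = fn;
    return (int)s->num_actions++;
}

/* Queue a task. A local parent would call this directly; a message
   arriving from a remote node would be handed to the same entry point.
   Returns 0 on success, -1 for an unknown action or a full queue. */
int sched_spawn(scheduler_t *s, parcel_t p) {
    if (p.action < 0 || (size_t)p.action >= s->num_actions) return -1;
    if (s->count == QUEUE_CAP) return -1;
    s->queue[(s->head + s->count) % QUEUE_CAP] = p;
    s->count++;
    return 0;
}

/* Run queued tasks in arrival order; returns how many were executed. */
int sched_run(scheduler_t *s) {
    int executed = 0;
    while (s->count > 0) {
        parcel_t p = s->queue[s->head];
        s->head = (s->head + 1) % QUEUE_CAP;
        s->count--;
        s->actions[p.action](p.arg, p.target);
        executed++;
    }
    return executed;
}

/* Example action: add the argument to the integer the parcel targets. */
void add_to_counter(int arg, void *target) {
    *(int *)target += arg;
}
```

Sending the action to the node that owns the target, rather than fetching the target, is what "moving work to data" means in this picture; the sketch keeps both on one node for brevity.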
A third major function creates and sustains a global name space and corresponding global address space that gives direct access to any remote global resource, such as data, synchronization objects, or executing threads and processes. It supports ParalleX processes (P-processes), which are unique in that they can span or share multiple nodes. The global address space infrastructure permits virtual objects to migrate in physical space without modifying their virtual addresses. A set of optimized library functions delivers efficient synchronization primitives, such as dataflow and futures, that also support continuations, their migration, and adaptation to asynchrony. Finally, an internal procedure is provided for introspection which, although invisible to the programmer, determines priorities of scheduling and resource management, in part based on system status and progress towards program goals. Back-ends are optimized for system platform architectures, networks, and operating systems.
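The way a future can carry continuations is easiest to see in a small example: work that depends on a value is attached to the future and fires when the value arrives, so no task has to wait at a barrier. This is a minimal single-threaded sketch in C, not the HPX-5 futures library: future_t, future_then, future_set, and accumulate are hypothetical names, and a real runtime would add locking and schedule continuations as tasks instead of calling them inline.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_WAITERS 8

/* A continuation receives the future's value plus caller-supplied state. */
typedef void (*continuation_fn)(int value, void *env);

typedef struct {
    int             ready;                /* 1 once the value has been set */
    int             value;
    continuation_fn fns[MAX_WAITERS];     /* continuations waiting on the value */
    void           *envs[MAX_WAITERS];
    size_t          num_waiters;
} future_t;

void future_init(future_t *f) {
    assert(f != NULL);
    f->ready = 0;
    f->value = 0;
    f->num_waiters = 0;
}

/* Attach a continuation. If the value is already available it runs at once;
   otherwise it is stored until future_set. Returns 0, or -1 if no slot is free. */
int future_then(future_t *f, continuation_fn fn, void *env) {
    assert(f != NULL && fn != NULL);
    if (f->ready) {
        fn(f->value, env);
        return 0;
    }
    if (f->num_waiters == MAX_WAITERS) return -1;
    f->fns[f->num_waiters]  = fn;
    f->envs[f->num_waiters] = env;
    f->num_waiters++;
    return 0;
}

/* Supply the value and fire every waiting continuation in attach order.
   A future is single-assignment: a second set is rejected with -1. */
int future_set(future_t *f, int value) {
    assert(f != NULL);
    if (f->ready) return -1;
    f->value = value;
    f->ready = 1;
    for (size_t i = 0; i < f->num_waiters; i++) {
        f->fns[i](value, f->envs[i]);
    }
    f->num_waiters = 0;
    return 0;
}

/* Example continuation: add the value to the integer passed as env. */
void accumulate(int value, void *env) {
    *(int *)env += value;
}
```

Because the dependent work is stored with the future rather than with a waiting thread, the producer of the value decides when it runs, which is the property that lets control state be distributed instead of tied to program counters.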