Performance Analytics for Computational Experiments (PACE)
About
Understanding the computational performance of a complex coupled model like the Energy Exascale Earth System Model (E3SM) poses a singular challenge to domain and computational scientists. To address this challenge, we developed PACE (Performance Analytics for Computational Experiments), a web-enabled framework that summarizes performance data collected from E3SM experiments, derives insights from it, and presents them through a web portal.
The primary goal of PACE is to serve as a central hub of performance data to provide an executive summary of E3SM experiment performance.
Additionally, PACE is designed to enable the following capabilities:
- Running interactive analyses and deep dives into experiments and application sub-regions, as desired,
- Tracking performance benchmarks and simulation campaigns of interest,
- Facilitating performance research on load balancing and process layouts,
- Identifying bottlenecks to inform targeted optimization efforts.
Presently, it contains data for 199,703 experiments.
Architecture
Performance data from simulations executed on supported supercomputers is automatically uploaded to PACE.
PACE processes and ingests that information into the backend database.
Finally, a user can interactively explore the performance data through the web portal.
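The three stages above (upload, ingest, explore) can be pictured with a minimal Python sketch. It is illustrative only and does not reflect PACE's actual code or schema: the `experiment` table, its `machine`, `case_name`, and `throughput` columns, the function names, and the sample records are all hypothetical, and an in-memory SQLite database stands in for the backend database.

```python
import sqlite3


def ingest(conn, experiments):
    """Store uploaded experiment summaries in a (hypothetical) backend table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS experiment ("
        "id INTEGER PRIMARY KEY, machine TEXT, case_name TEXT, throughput REAL)"
    )
    conn.executemany(
        "INSERT INTO experiment (machine, case_name, throughput) VALUES (?, ?, ?)",
        [(e["machine"], e["case_name"], e["throughput"]) for e in experiments],
    )
    conn.commit()


def summarize_by_machine(conn):
    """Return {machine: (experiment_count, mean_throughput)} for exploration."""
    rows = conn.execute(
        "SELECT machine, COUNT(*), AVG(throughput) "
        "FROM experiment GROUP BY machine"
    ).fetchall()
    return {machine: (count, mean) for machine, count, mean in rows}


# Made-up records standing in for data uploaded from supercomputers.
uploads = [
    {"machine": "machine_a", "case_name": "case_1", "throughput": 2.0},
    {"machine": "machine_a", "case_name": "case_2", "throughput": 4.0},
    {"machine": "machine_b", "case_name": "case_3", "throughput": 1.5},
]

conn = sqlite3.connect(":memory:")  # stand-in for the backend database
ingest(conn, uploads)
summary = summarize_by_machine(conn)
print(summary)
```

A web portal would sit on top of queries like `summarize_by_machine`, turning the aggregated rows into interactive views rather than printing them.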
Development Team
Sarat Sreepathi, Oak Ridge National Laboratory.
Gaurab KC, Oak Ridge National Laboratory.
In collaboration with:
Youngsung Kim, Oak Ridge National Laboratory.
Past Students:
Zachary Mitchell, Pellissippi State Community College.
Gaurab KC, University of Tennessee, Knoxville.
Acknowledgments
Special thanks to Patrick H. Worley for incorporating the timing infrastructure and performance archiving capabilities in E3SM which paved the way for PACE.
This research was supported as part of the E3SM project, funded by the U.S. Department of Energy, Office of Science, Office of Biological and Environmental Research. The students' work was partially supported by an appointment to the Science Education and Workforce Development Programs at Oak Ridge National Laboratory, administered by ORISE through the U.S. Department of Energy Oak Ridge Institute for Science and Education. This research used resources of the Compute and Data Environment for Science (CADES) at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.
Thanks to Aaron Donahue and Peter Caldwell for sharing their process layout and atmosphere sub-component timing scripts.