Integrating Low-Latency Analysis Into HPC System Monitoring
Keywords
Application; Low-latency analysis; Performance data processing; System monitoring
Abstract
The growth of High Performance Computer (HPC) systems increases the complexity with respect to understanding resource utilization, system management, and performance issues. While raw performance data is increasingly exposed at the component level, the usefulness of the data is dependent on the ability to do meaningful analysis on actionable timescales. However, current system monitoring infrastructures largely focus on data collection, with analysis performed off-system in post-processing mode. This increases the time required to provide analysis and feedback to a variety of consumers. In this work, we enhance the architecture of a monitoring system used on large-scale computational platforms, to integrate streaming analysis capabilities at arbitrary locations within its data collection, transport, and aggregation facilities. We leverage the flexible communication topology of the monitoring system to enable placement of transformations based on overhead concerns, while still enabling low-latency exposure on node. Our design internally supports and exposes the raw and transformed data uniformly for both node level and off-system consumers. We show the viability of our implementation for a case with production relevance: run-time determination of the relative per-node file system demands.
Publication Date
8-13-2018
Publication Title
ACM International Conference Proceeding Series
Document Type
Article; Proceedings Paper
Personal Identifier
scopus
DOI Link
https://doi.org/10.1145/3225058.3225086
Copyright Status
Unknown
Scopus ID
85054838222 (Scopus)
Source API URL
https://api.elsevier.com/content/abstract/scopus_id/85054838222
STARS Citation
Izadpanah, Ramin; Naksinehaboon, Nichamon; Brandt, Jim; Gentile, Ann; and Dechev, Damian, "Integrating Low-Latency Analysis Into HPC System Monitoring" (2018). Scopus Export 2015-2019. 7670.
https://stars.library.ucf.edu/scopus2015/7670