Numerical methods and programming


 Num. Meth. Prog., 2019, Volume 20, Issue 3, Pages 182–191 (Mi vmp958)

A comprehensive analysis of performance quality of large supercomputer complexes

Lomonosov Moscow State University, Research Computing Center

Abstract: The low performance of supercomputer complexes is currently due in large part to the fact that their administrators cannot always detect and eliminate the root causes of reduced efficiency in a timely manner. This concerns not so much equipment failures (which monitoring systems can usually detect) as an implicit performance decrease in certain supercomputer components that otherwise seem to continue working correctly. Such a situation arises because there are currently no sufficiently flexible and convenient software tools for prompt and comprehensive analysis of all the performance quality characteristics of computer systems. Existing solutions either analyze only a small part of these characteristics or are non-universal, satisfying only a small set of specific needs of the administrators of a particular system. This paper describes a systematic approach to this problem that allows a comprehensive analysis of various aspects of supercomputer functioning, primarily those related to the execution of supercomputer applications. A software tool developed on the basis of this approach will collect, within a single model, the most important data on the properties and quality of jobs running on the supercomputer: their execution performance, size and duration, the presence of specific or abnormal behavior scenarios, the usage of application packages and libraries, etc. Flexible aggregation capabilities will make it possible to specify the required level of detail: individual users, projects, application packages, subject areas, supercomputer partitions, time ranges, etc. This will allow one to create hundreds or thousands of different views for analyzing the state of the supercomputer, from which administrators can choose the ones most suitable for them.
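
The aggregation idea in the abstract can be illustrated with a minimal Python sketch. It is not the tool described in the paper: the `Job` record, its field names, the `aggregate` helper, and the sample data are all hypothetical and were chosen only to show how a single job model can be grouped along different dimensions (user, project, partition, package) to obtain different views.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Job:
    """Hypothetical per-job record; the field names are assumptions."""
    user: str
    project: str
    partition: str
    package: str
    node_hours: float
    cpu_load: float  # average CPU load over the job, in [0, 1]


def aggregate(jobs, key):
    """Group jobs by one attribute and summarize each group."""
    groups = defaultdict(list)
    for job in jobs:
        groups[getattr(job, key)].append(job)
    return {
        value: {
            "jobs": len(members),
            "node_hours": sum(j.node_hours for j in members),
            "mean_cpu_load": mean(j.cpu_load for j in members),
        }
        for value, members in groups.items()
    }


# Made-up sample data, for illustration only.
JOBS = [
    Job("alice", "cfd", "compute", "pkg_a", 120.0, 0.80),
    Job("alice", "cfd", "compute", "pkg_a", 80.0, 0.60),
    Job("bob", "md", "gpu", "pkg_b", 50.0, 0.25),
]

# The same data seen at two different levels of detail.
by_user = aggregate(JOBS, "user")
by_partition = aggregate(JOBS, "partition")
```

Because the grouping key is a parameter, each additional dimension (subject area, time range, and so on) yields a new view without changing the underlying job model, which is the property the abstract emphasizes.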

Keywords: supercomputer, parallel computing, supercomputer applications, performance, efficiency analysis, monitoring data.

DOI: https://doi.org/10.26089/NumMet.v20r317

Full text: PDF file (278 kB)

UDC: 519.68

Citation: Vad. V. Voevodin, “A comprehensive analysis of performance quality of large supercomputer complexes”, Num. Meth. Prog., 20:3 (2019), 182–191

Citation in format AMSBIB
\Bibitem{Voe19}
\by Vad.~V.~Voevodin
\paper A comprehensive analysis of performance quality of large supercomputer complexes
\jour Num. Meth. Prog.
\yr 2019
\vol 20
\issue 3
\pages 182--191
\mathnet{http://mi.mathnet.ru/vmp958}
\crossref{https://doi.org/10.26089/NumMet.v20r317}
\elib{https://elibrary.ru/item.asp?id=39540771}