A New Version of the Mathpar-DAP Runtime Environment

Date
2023
Authors
Сідько, Алла
Abstract
This article recalls the main features of DAP (Drop-Pine-Amine), a runtime environment for decentralized management of distributed computations, which were published in [4]. The main purpose of the article is to describe the new functionality that appeared in the latest release. As an example of a block recursive algorithm, the Cholesky factorization of a symmetric positive definite matrix is described in the form of a block dichotomous algorithm. The experimental results demonstrate good scalability of the proposed solution. We propose developing cooperation in this research area. The software package is open for joint development and can be freely used for scientific and educational purposes.
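The block dichotomous Cholesky factorization mentioned above splits a symmetric positive definite matrix `A` into 2x2 blocks and uses the identities `L11 = chol(A11)`, `L21 = A21 L11^{-T}`, `L22 = chol(A22 - L21 L21^T)`, so that `A = L L^T` with `L = [[L11, 0], [L21, L22]]`. The following is a minimal sequential sketch in pure Python, assuming a dense matrix stored as a list of rows and splitting all the way down to 1x1 blocks. It is not the Mathpar-DAP code; the names `block_cholesky`, `solve_xlt`, `matmul` and `transpose` are hypothetical. In a distributed runtime, the two recursive calls and the block products are the natural candidates for separate tasks; here everything runs on one processor.

```python
import math
from typing import List

Matrix = List[List[float]]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve_xlt(l: Matrix, a: Matrix) -> Matrix:
    """Solve X @ l^T = a for X, where l is lower triangular (forward substitution per row)."""
    n = len(l)
    out = []
    for row in a:
        x = [0.0] * n
        for j in range(n):
            x[j] = (row[j] - sum(l[j][k] * x[k] for k in range(j))) / l[j][j]
        out.append(x)
    return out


def block_cholesky(a: Matrix) -> Matrix:
    """Return lower-triangular L with L @ L^T == a, splitting a in half recursively.

    Assumes a is symmetric; raises ValueError if it is not positive definite.
    """
    n = len(a)
    if n == 1:  # leaf of the dichotomy: scalar square root
        if a[0][0] <= 0:
            raise ValueError("matrix is not positive definite")
        return [[math.sqrt(a[0][0])]]
    h = n // 2  # dichotomous split point
    a11 = [row[:h] for row in a[:h]]
    a21 = [row[:h] for row in a[h:]]
    a22 = [row[h:] for row in a[h:]]
    l11 = block_cholesky(a11)                 # L11 = chol(A11)
    l21 = solve_xlt(l11, a21)                 # L21 = A21 L11^{-T}
    update = matmul(l21, transpose(l21))      # L21 L21^T
    schur = [[a22[i][j] - update[i][j] for j in range(n - h)] for i in range(n - h)]
    l22 = block_cholesky(schur)               # L22 = chol(A22 - L21 L21^T)
    top = [row + [0.0] * (n - h) for row in l11]        # [L11  0 ]
    bottom = [r21 + r22 for r21, r22 in zip(l21, l22)]  # [L21 L22]
    return top + bottom
```

For example, `block_cholesky([[4, 12, -16], [12, 37, -43], [-16, -43, 98]])` returns `[[2.0, 0.0, 0.0], [6.0, 1.0, 0.0], [-8.0, 5.0, 3.0]]`. Splitting at `n // 2` keeps the recursion tree balanced, which is what makes the dichotomous form convenient to distribute.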
Description
In this paper, we recall the main features of the DAP runtime environment, which was published in [4]. The main purpose of the paper, however, is to describe the new functionality that appeared in our latest release. As an example of a block recursive algorithm, we describe the Cholesky factorization of a symmetric positive definite matrix in the form of a block dichotomous algorithm. The experimental results demonstrate good scalability of the proposed solution.
Modern supercomputer systems containing hundreds of thousands of cores face difficulties in organizing parallel computations (see, e.g., [1]). The three main difficulties are the uneven workload of the hardware, the accumulation of errors in computations with large matrices, and possible failures of cores during computation. Recently, a universal Dynamic Task Discovery (DTD) scheme has been developed for the PaRSEC runtime environment [2], [3]. This environment supports systems with both shared and distributed memory, and the new paradigm demonstrated better performance than the parameterized task scheduling used earlier.
In [1] we described a new runtime environment for supercomputers with distributed memory, designed for solving matrix problems with block recursive algorithms. Its main advantage is an efficient computational process and good scalability of programs for both sparse and dense matrices on a distributed-memory cluster. Another advantage is the ability to reorganize the computational process when individual nodes fail during computation. A key feature of DAP is that it unrolls functions sequentially in depth, maintaining all states at every nesting level until all computations in the current computational subtree are complete. This design allows any processor to switch freely between subtasks without waiting for the current subtask to finish.
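The idea of unrolling functions in depth while keeping every suspended level's state can be illustrated with a small single-worker sketch. It is not DAP's implementation (which distributes work across cluster nodes); it assumes that a task is a Python generator that yields a list of subtasks and is resumed with their results. A suspended task keeps its local variables inside the generator object, so the worker never blocks on it and simply continues with whatever subtask is ready. The names `Frame`, `run` and `range_sum` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Generator, List, Optional, Tuple


@dataclass
class Frame:
    """A suspended task: its local state lives inside the generator object."""
    gen: Generator
    parent: Optional["Frame"]
    slot: int  # position of this task's result in the parent's result list
    results: List[Any] = field(default_factory=list)
    pending: int = 0


def run(root: Generator) -> Any:
    """Execute a tree of generator tasks depth-first on a single worker."""
    ready: List[Tuple[Frame, Any]] = [(Frame(root, None, 0), None)]  # LIFO => depth-first
    final: Any = None
    while ready:
        frame, value = ready.pop()
        try:
            children = frame.gen.send(value)  # run until the task needs subtasks
        except StopIteration as done:
            if frame.parent is None:
                final = done.value
                continue
            parent = frame.parent
            parent.results[frame.slot] = done.value
            parent.pending -= 1
            if parent.pending == 0:  # whole subtree finished: resume the parent
                ready.append((parent, parent.results))
            continue
        # The task is suspended with its state intact; the worker moves on.
        frame.results = [None] * len(children)
        frame.pending = len(children)
        if not children:
            ready.append((frame, []))
        for i, child in enumerate(children):
            ready.append((Frame(child, frame, i), None))
    return final


def range_sum(lo: int, hi: int):
    """Hypothetical dichotomous task: sum of the integers in [lo, hi)."""
    if hi - lo <= 2:
        return sum(range(lo, hi))
    mid = (lo + hi) // 2
    left, right = yield [range_sum(lo, mid), range_sum(mid, hi)]
    return left + right
```

For example, `run(range_sum(0, 100))` evaluates the dichotomous tree without Python recursion and returns `4950`. Using an explicit stack of frames, rather than the call stack, is what lets a parent wait for its subtree without occupying the worker.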
An important feature of this runtime environment is its protection against the failure of individual nodes during computation. A parent node that has sent a drop to a child node expects a result, but it may instead receive a message about the status of the child node. In that case, the drop task is redirected to an alternate node, and no changes to the other nodes are required. As a result, only the subtree corresponding to this drop is lost and then recomputed. We would like to develop cooperation in this research area: the software package we have developed is open for joint development and can be freely used for scientific and educational purposes.
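The redirection rule described above can be sketched as a toy simulation, assuming that a node failure surfaces as an exception in place of a result and that the drop's whole subtree is a single callable. The classes `Node` and `NodeFailure` and the function `send_drop` are hypothetical stand-ins, not the DAP or OpenMPI API.

```python
from typing import Any, Callable, List, Tuple


class NodeFailure(Exception):
    """Stands in for the status message a parent receives instead of a result."""


class Node:
    """A hypothetical compute node that either runs a drop or reports that it is down."""

    def __init__(self, name: str, alive: bool = True) -> None:
        self.name = name
        self.alive = alive
        self.executed: List[str] = []

    def run_drop(self, drop_id: str, subtree: Callable[[], Any]) -> Any:
        if not self.alive:
            raise NodeFailure(f"{self.name} is down")
        self.executed.append(drop_id)
        return subtree()  # the whole subtree of this drop is computed here


def send_drop(drop_id: str, subtree: Callable[[], Any], candidates: List[Node]) -> Tuple[Any, str]:
    """Send a drop to the first candidate; on a failure status, redirect it to the next one."""
    for node in candidates:
        try:
            return node.run_drop(drop_id, subtree), node.name
        except NodeFailure:
            continue  # only this drop is resent; no other node is touched
    raise RuntimeError(f"no alternate node left for drop {drop_id}")
```

For example, with `nodes = [Node("child-1", alive=False), Node("child-2")]`, the call `send_drop("drop-7", lambda: sum(range(10)), nodes)` returns `(45, "child-2")`: the failed node is skipped and only this drop is recomputed on the alternate node.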
Keywords
distributed computing, parallel programming, runtime environment, runtime, OpenMPI, article
Citation
Сідько А. А. A New Version of the Mathpar-DAP Runtime Environment / Сідько А. А. // NaUKMA Research Papers. Computer Science. - 2023. - Vol. 6. - pp. 76-80. - https://doi.org/10.18523/2617-3808.2023.6.76-80