Метод кластеризації з використанням багатовимірного адресного сортування
Loading...
Date
2020
Authors
Крещенко, Тарас
Ющенко, Юрій
Journal Title
Journal ISSN
Volume Title
Publisher
Abstract
У роботі розглянуто багатовимірне адресне сортування, зокрема декілька методів його реалізації. Описано декілька структур даних для збереження та використання результатів багатовимірного адресного сортування. На прикладі реалізованого програмного проекту продемонстровано
корисність і доцільність використання багатовимірного адресного сортування для розв’язання задач класифікації сукупностей згрупованих даних. Визначено переваги використання багатовимірного адресного сортування при розв’язанні задач кластеризації порівняно з методами, які нині
набули широкого використання.
The paper examines multidimensional address sorting as a way to sort datasets by multiple columns (features of the objects) and to avoid copying data at the same time. Several data structures are proposed for storing and using the resulting data of multidimensional sorting. That includes the usage of indexing arrays, doubly linked lists, and foreign keys in a relational database. Each variant is analyzed in terms of time complexity of performing various tasks. The paper illuminates advantages and disadvantages of using each proposed data structure. When using indexing arrays, by which arrays that store indices of the elements in the desired order are meant, it becomes impossible to access the next or previous element in any other sort in O(1) time. To resolve this problem the paper proposes to use inverted index arrays, which map data points to their indices in each sort. The implementation that uses doubly linked lists shows promising time complexity results, but the one that uses foreign keys has proven to be better, because the presence of primary keys allows to get elements by indices in logarithmic time, or O(log(n)). It also reduces the risk of losing data if a critical error occurs in the main program. One of the main goals of cluster analysis is to make it easier to analyze similar data within a dataset. Multidimentional sorting adds to this goal by simplifying the process of displaying the clustered data and making it possible to compare data points that are close by a chosen feature in any cluster. The implemented software project is used to demonstrate expediency and convenience of using multidimensional address sorting to display and visualize the results of data clustering. The paper identifies advantages of using multidimensional address sorting in solving clustering problems over methods that are currently widely used.
The paper examines multidimensional address sorting as a way to sort datasets by multiple columns (features of the objects) and to avoid copying data at the same time. Several data structures are proposed for storing and using the resulting data of multidimensional sorting. That includes the usage of indexing arrays, doubly linked lists, and foreign keys in a relational database. Each variant is analyzed in terms of time complexity of performing various tasks. The paper illuminates advantages and disadvantages of using each proposed data structure. When using indexing arrays, by which arrays that store indices of the elements in the desired order are meant, it becomes impossible to access the next or previous element in any other sort in O(1) time. To resolve this problem the paper proposes to use inverted index arrays, which map data points to their indices in each sort. The implementation that uses doubly linked lists shows promising time complexity results, but the one that uses foreign keys has proven to be better, because the presence of primary keys allows to get elements by indices in logarithmic time, or O(log(n)). It also reduces the risk of losing data if a critical error occurs in the main program. One of the main goals of cluster analysis is to make it easier to analyze similar data within a dataset. Multidimentional sorting adds to this goal by simplifying the process of displaying the clustered data and making it possible to compare data points that are close by a chosen feature in any cluster. The implemented software project is used to demonstrate expediency and convenience of using multidimensional address sorting to display and visualize the results of data clustering. The paper identifies advantages of using multidimensional address sorting in solving clustering problems over methods that are currently widely used.
Description
Keywords
адресне сортування, списки, двозв’язні списки, дерева, індексація, класифікація, кластерування, кластеризація, кластерний аналіз, машинне навчання, стаття, address sorting, lists, doubly linked lists, trees, index, classification, clustering, clusterization, cluster analysis, machine learning
Citation
Крещенко Т. О. Метод кластеризації з використанням багатовимірного адресного сортування / Крещенко Т. О., Ющенко Ю. О. // Наукові записки НаУКМА. Комп'ютерні науки. - 2020. - Т. 3. - С. 83-87.