Using the LSA Algorithm for Clustering Geometry Problems

dc.contributor.author: Жежерун, Олександр
dc.contributor.author: Борозенний, Сергій
dc.contributor.author: Ніверовський, Микита
dc.date.accessioned: 2021-01-08T22:02:18Z
dc.date.available: 2021-01-08T22:02:18Z
dc.date.issued: 2020
dc.description.abstract [uk_UA]: The paper examines the LSA (latent semantic analysis) method, in particular its most widely used variant, which is based on the singular value decomposition (SVD) of a matrix. Building on it, an algorithm for clustering problems is implemented and applied to clustering geometry problems.
dc.description.abstract [en_US]: There are currently a great many clustering algorithms. Most of them rest on the same basic idea: grouping similar sequences into one class, or cluster, according to a similarity measure. The choice of algorithm is usually determined by the task. For textual data, the components being compared are sequences of words and their attributes (for example, the weight of a word in the text, the type of a named entity, or sentiment). Texts are therefore first transformed into vectors, which are then manipulated in various ways. This typically raises a number of problems: selecting the initial clusters, the dependence of clustering quality on text length, determining the total number of clusters, and so on. The hardest problem, however, is the lack of connection between similar texts that use different vocabulary. In such cases, texts should be grouped not only by surface similarity but also by semantic proximity, or associativity. One method that addresses such problems is latent semantic analysis (LSA). LSA is an information-processing method that analyzes a collection of documents and the terms occurring in them and, on that basis, identifies the characteristic factors (topics) that describe each document's content. It captures three types of relationship: "word-word", "word-paragraph", and "paragraph-paragraph". These mirror the three kinds of comparison a person makes when relating parts of a text to its content. LSA takes into account not only word frequencies but also latent (deep) connections between words. The first article on the subject, "Automatic Document Classification" [4], was published in the Journal of the ACM in early 1963 and was the first to describe factor analysis as a tool for information retrieval. Factor analysis is a method for determining the relationships between the values of variables.
In this paper, we investigate the possibility of using latent semantic analysis to cluster texts (geometry problems) and develop an algorithm and the necessary software for this purpose.
dc.identifier.citation [uk_UA]: Жежерун О. П. Using the LSA Algorithm for Clustering Geometry Problems / Жежерун О. П., Борозенний С. О., Ніверовський М. М. // Наукові записки НаУКМА. Комп'ютерні науки. 2020. Vol. 3. Pp. 107-113.
dc.identifier.issn: 2617-3808
dc.identifier.uri: https://doi.org/10.18523/2617-3808.2020.3.107-113
dc.identifier.uri: https://ekmair.ukma.edu.ua/handle/123456789/19174
dc.language.iso [uk_UA]: uk
dc.relation.source [uk_UA]: Наукові записки НаУКМА. Комп'ютерні науки.
dc.status [uk_UA]: first published
dc.subject [uk_UA]: LSA
dc.subject [uk_UA]: LSI
dc.subject [uk_UA]: SVD
dc.subject [uk_UA]: clustering
dc.subject [uk_UA]: article
dc.subject [en_US]: LSA
dc.subject [en_US]: LSI
dc.subject [en_US]: SVD
dc.subject [en_US]: clustering
dc.title [uk_UA]: Using the LSA Algorithm for Clustering Geometry Problems
dc.title.alternative [en_US]: Using the LSA Algorithm for Clustering Geometry Problems
dc.type [uk_UA]: Article
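The abstracts outline the core LSA pipeline: turn texts into a term-document matrix, reduce it with a truncated SVD, and group documents in the resulting latent space. The sketch below illustrates that pipeline under stated assumptions; it is not the authors' implementation. The five problem statements, the stop list, the binary term weights, the choice of k = 2 latent dimensions and the use of k-means as the clustering step are all hypothetical choices made for illustration, since this record does not say which weighting or clustering procedure the paper uses.

```python
import numpy as np

# Hypothetical mini-corpus of geometry-problem statements (illustration only, not the paper's data).
DOCS = [
    "area of a triangle given base and height",  # d0
    "angle sum of a triangle",                    # d1
    "angle bisector and angle sum",               # d2
    "circle radius and chord",                    # d3
    "circle radius and diameter",                 # d4
]
# Assumption: a tiny hand-picked stop list.
STOPWORDS = {"a", "and", "of", "given"}


def build_term_doc_matrix(docs, stopwords):
    """Binary term-document matrix A (rows = terms, columns = documents)."""
    token_sets = [{w for w in d.lower().split() if w not in stopwords} for d in docs]
    terms = sorted(set().union(*token_sets))
    index = {t: i for i, t in enumerate(terms)}
    A = np.zeros((len(terms), len(docs)))
    for j, tokens in enumerate(token_sets):
        for t in tokens:
            A[index[t], j] = 1.0  # assumption: presence/absence weights
    return terms, A


def lsa_doc_vectors(A, k):
    """Project documents into a k-dimensional latent space via truncated SVD (rows of V_k S_k)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (np.diag(s[:k]) @ Vt[:k]).T


def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def normalize_rows(X):
    return X / np.linalg.norm(X, axis=1, keepdims=True)


def kmeans(X, k, iters=50):
    """Plain k-means with deterministic farthest-point initialisation.

    Stand-in clustering step; the paper's own clustering procedure may differ.
    """
    centroids = [X[0]]
    for _ in range(1, k):
        # Distance from every point to its nearest already-chosen centroid.
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[np.argmax(d)])
    C = np.array(centroids)
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_C = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else C[j]
                          for j in range(k)])
        if np.allclose(new_C, C):
            break
        C = new_C
    return labels


if __name__ == "__main__":
    terms, A = build_term_doc_matrix(DOCS, STOPWORDS)
    X = lsa_doc_vectors(A, k=2)
    print("raw cosine(d0, d2):", cosine(A[:, 0], A[:, 2]))
    print("LSA cosine(d0, d2):", round(cosine(X[0], X[2]), 3))
    print("cluster labels:", kmeans(normalize_rows(X), k=2))
```

Problems d0 and d2 share no terms, so their raw cosine similarity is 0, yet in the two-dimensional latent space they point in almost the same direction because d1 shares words with both, and k-means places d0, d1 and d2 in one cluster and d3, d4 in the other. This is the "different vocabulary" case the English abstract singles out.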
Files: Zhezherun_Vykorystannia_alhorytmu_LSA.pdf (398.9 KB, Adobe Portable Document Format)
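The English abstract lists three kinds of relationship that LSA captures: word-word, word-paragraph and paragraph-paragraph. A common way to read all three off a rank-k SVD A_k = U_k S_k V_k^T is: word-word similarities from U_k S_k^2 U_k^T, word-document associations from A_k itself, and document-document similarities from V_k S_k^2 V_k^T. The sketch below computes these three matrices, assuming NumPy, binary term weights and five hypothetical problems given directly as term sets; the function names are illustrative only and do not come from the paper.

```python
import numpy as np

# Hypothetical term sets for five geometry problems (illustration only, not the paper's data).
DOC_TERMS = [
    {"area", "triangle", "base", "height"},
    {"angle", "sum", "triangle"},
    {"angle", "bisector", "sum"},
    {"circle", "radius", "chord"},
    {"circle", "radius", "diameter"},
]


def term_doc_matrix(doc_terms):
    """Binary term-document matrix A (rows = terms, columns = documents)."""
    terms = sorted(set().union(*doc_terms))
    index = {t: i for i, t in enumerate(terms)}
    A = np.zeros((len(terms), len(doc_terms)))
    for j, ts in enumerate(doc_terms):
        for t in ts:
            A[index[t], j] = 1.0
    return terms, A


def lsa_relations(A, k):
    """Return (word_word, word_doc, doc_doc) computed from the rank-k SVD of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    Uk, sk, Vk = U[:, :k], s[:k], Vt[:k].T
    T = Uk * sk            # term coordinates: rows of U_k S_k
    D = Vk * sk            # document coordinates: rows of V_k S_k
    word_word = T @ T.T    # U_k S_k^2 U_k^T
    word_doc = T @ Vk.T    # A_k = U_k S_k V_k^T
    doc_doc = D @ D.T      # V_k S_k^2 V_k^T
    return word_word, word_doc, doc_doc


if __name__ == "__main__":
    terms, A = term_doc_matrix(DOC_TERMS)
    ww, wd, dd = lsa_relations(A, k=2)
    a, b = terms.index("area"), terms.index("bisector")
    print("raw co-occurrence(area, bisector):", (A @ A.T)[a, b])
    print("LSA word-word(area, bisector):", round(ww[a, b], 3))
    print("raw A[bisector, d0]:", A[b, 0], " LSA A_k[bisector, d0]:", round(wd[b, 0], 3))
    print("LSA doc-doc(d0, d2):", round(dd[0, 2], 3))
```

With k = 2, "area" and "bisector" never occur in the same problem, yet their word-word score is positive, and A_k gives "bisector" a nonzero weight in the first problem, where the raw matrix has a zero. Computing the word-word matrix as (U_k S_k)(U_k S_k)^T rather than as A_k A_k^T gives the same result, since the columns of V_k are orthonormal, and avoids forming the full reconstructed matrix.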