Method for forming training samples for data arrays based on machine learning

Ліп’яніна-Гочаренко, Христина

dc.contributor.author	Ліп’яніна-Гочаренко, Христина
dc.date.accessioned	2024-04-26T05:45:02Z
dc.date.available	2024-04-26T05:45:02Z
dc.date.issued	2023
dc.description	The study proposes a method for forming training samples that combines RFM (Recency, Frequency, Monetary) analysis with cluster analysis. The method is demonstrated on a dataset of tender agreements concluded by participants in Ukraine, sourced from the ProZorro Sales platform. The dataset contains 92,638 auction entries, covering 29,164 unique auctions and 39,747 unique organizers. RFM analysis divides the organizers into groups with distinct attributes: "The Best Organizers of Tenders," "Loyal Organizers of Tenders," "Large Consumers," "Tenders Held Infrequently but with Substantial Sums," and "Weak Tender Organizers." K-means clustering is then applied, dividing the data into five clusters that differentiate organizer profiles in more detail. Comparing the RFM groups with the K-means clusters shows agreement for two kinds of clusters: organizers who run many tenders with high monetary value, and organizers with little tender activity and low monetary value. The method is validated with Logistic Regression and Naive Bayes classifiers, both of which achieve high accuracy.
Future research could develop an automated system, based on machine learning, for selecting tender organizers. Such a system could help optimize participation strategies in tender procedures and support more efficient and accurate decision-making.	en_US
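The RFM step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the `Auction` record layout, the snapshot date used to measure recency, the rank-based 1-5 scoring in `rank_score`, and the thresholds in `segment` are all hypothetical. Only the five group names come from the description.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Auction:
    # Hypothetical record layout; the real ProZorro Sales fields are not listed in this record.
    organizer_id: str
    auction_date: date
    amount: float


def rfm_table(auctions, snapshot):
    """Aggregate auction rows into (recency_days, frequency, monetary) per organizer."""
    last_seen, counts, totals = {}, {}, {}
    for a in auctions:
        oid = a.organizer_id
        if oid not in last_seen or a.auction_date > last_seen[oid]:
            last_seen[oid] = a.auction_date
        counts[oid] = counts.get(oid, 0) + 1
        totals[oid] = totals.get(oid, 0.0) + a.amount
    # Recency = days between the organizer's latest auction and the snapshot date.
    return {oid: ((snapshot - last_seen[oid]).days, counts[oid], totals[oid])
            for oid in last_seen}


def rank_score(values, value, bins=5, higher_is_better=True):
    """Map a value to a 1..bins score by its rank among all values (assumed binning rule)."""
    below = sum(1 for v in values if v < value)
    score = 1 + below * (bins - 1) // max(len(values) - 1, 1)
    return score if higher_is_better else bins + 1 - score


def segment(r, f, m):
    # Hypothetical thresholds: the group names come from the article, the rules do not.
    if r >= 4 and f >= 4 and m >= 4:
        return "The Best Organizers of Tenders"
    if f >= 4:
        return "Loyal Organizers of Tenders"
    if m >= 4 and f <= 2:
        return "Tenders Held Infrequently but with Substantial Sums"
    if m >= 4:
        return "Large Consumers"
    return "Weak Tender Organizers"


def rfm_segments(auctions, snapshot, bins=5):
    """Score every organizer on R, F, M and attach a (hypothetical) segment name."""
    table = rfm_table(auctions, snapshot)
    rec, freq, mon = (list(col) for col in zip(*table.values()))
    result = {}
    for oid, (days, count, total) in table.items():
        scores = (rank_score(rec, days, bins, higher_is_better=False),  # fewer days = better
                  rank_score(freq, count, bins),
                  rank_score(mon, total, bins))
        result[oid] = (scores, segment(*scores))
    return result


# Illustrative, made-up rows (not data from the article).
rows = [
    Auction("A", date(2023, 5, 1), 300_000.0),
    Auction("A", date(2023, 5, 20), 250_000.0),
    Auction("A", date(2023, 6, 1), 200_000.0),
    Auction("B", date(2022, 1, 10), 15_000.0),
    Auction("C", date(2023, 4, 2), 2_000_000.0),
]
for oid, (scores, label) in sorted(rfm_segments(rows, date(2023, 7, 1)).items()):
    print(oid, scores, label)
```

With these made-up rows, organizer A (three recent auctions, mid-range total) falls into the loyal group, B into the weak group, and C (one large auction) into the infrequent-but-large group. With real data, the thresholds in `segment` would have to be set to match the article's own criteria, which this record does not state.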
dc.description.abstract	This paper proposes a new machine-learning-based method for forming a training sample that combines data from RFM analysis and cluster analysis. The method is applied to auction data from the Ukrainian website ProZorro Продажі (ProZorro Sales). The proposed sample covers 92,638 auctions, 29,164 unique auctions, and 39,747 unique organizers. RFM analysis splits the data into groups such as "Best tender organizers" and "Loyal tender organizers". The K-means method then divides the data into clusters, which makes it possible to separate different categories of organizers. Testing with Logistic Regression and Naive Bayes showed high accuracy for both methods. The sampling and grouping produced by the proposed method are shown to help distinguish tender organizers by their characteristics and outcomes. Further research should focus on developing an automated machine-learning system for selecting tender organizers, which would help optimize participation in tender procedures.	uk_UA
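The clustering step can be illustrated with a plain Lloyd's K-means written with only the standard library, together with a cross-tabulation for comparing RFM groups with clusters. This is a hedged sketch: the record does not say which features, scaling, or initialization the author used, so clustering RFM score vectors, the seeded random initialization, and the helper names `kmeans` and `compare` are assumptions. The default `k=5` mirrors the five clusters reported in the description.

```python
import math
import random
from collections import Counter


def kmeans(points, k=5, max_iter=100, seed=0):
    """Plain Lloyd's K-means on equal-length numeric tuples; returns (labels, centroids)."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)  # fixed seed so the random initialization is reproducible
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def compare(segments, labels):
    """Cross-tabulate RFM segment names against cluster ids (organizer counts per pair)."""
    return Counter(zip(segments, labels))


# Hypothetical (R, F, M) score vectors and shortened segment names, for illustration only.
scores = [(5, 5, 3), (1, 1, 1), (3, 1, 5), (5, 4, 4), (1, 2, 1), (2, 1, 5)]
names = ["Loyal", "Weak", "Infrequent, large sums", "Loyal", "Weak", "Infrequent, large sums"]
labels, _ = kmeans(scores, k=3, seed=1)
print(compare(names, labels))
```

In practice a library implementation such as scikit-learn's `KMeans` (which adds k-means++ initialization and multiple restarts) would usually replace this loop; the hand-written version only keeps the example free of dependencies. A segment whose organizers all land in one cluster is the kind of agreement between RFM groups and K-means groups that the description reports.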
dc.identifier.citation	Ліп'яніна-Гончаренко Х. В. Method for forming training samples for data arrays based on machine learning / Ліп'яніна-Гончаренко Х. В. // Наукові записки НаУКМА. Комп'ютерні науки (NaUKMA Research Papers. Computer Science). - 2023. - Vol. 6. - pp. 30-35. - https://doi.org/10.18523/2617-3808.2023.6.30-35	uk_UA
dc.identifier.uri	https://ekmair.ukma.edu.ua/handle/123456789/29243
dc.identifier.uri	https://doi.org/10.18523/2617-3808.2023.6.30-35
dc.language.iso	uk	uk_UA
dc.relation.source	Наукові записки НаУКМА. Комп'ютерні науки (NaUKMA Research Papers. Computer Science). Vol. 6	uk_UA
dc.status	first published	uk_UA
dc.subject	training sample	uk_UA
dc.subject	machine learning	uk_UA
dc.subject	RFM analysis	uk_UA
dc.subject	cluster analysis	uk_UA
dc.subject	tenders	uk_UA
dc.subject	article	uk_UA
dc.subject	training sample	en_US
dc.subject	machine learning	en_US
dc.subject	cluster analysis	en_US
dc.subject	tenders	en_US
dc.subject	RFM analysis	en_US
dc.title	Method for forming training samples for data arrays based on machine learning	uk_UA
dc.title.alternative	Method for forming training samples for data arrays based on machine learning	en_US
dc.type	Article	uk_UA

Files

Original bundle

Name: Lipianina-Hocharenko_Metod_formuvannia_navchalnoi_vybirky_dlia_masyviv_danykh_na_osnovi_mashynnoho_navchannia.pdf
Size: 1.59 MB
Format: Adobe Portable Document Format

Collections

Vol. 6