Multilingual Corpora of the Institute of Slavic Studies of the Polish Academy of Sciences – CLARIN-PL. The Polish-Lithuanian Parallel Corpus "2" and the Polish-Ukrainian Parallel Corpus
Date
2020
Authors
Левчук, Павло
Рошко, Данута
Рошко, Роман
Publisher
Дух і Літера
Abstract
This article describes the Clarin-PL group, the Polish branch of the European research infrastructure CLARIN ERIC. It presents the tasks and aims of the CLARIN ERIC infrastructure and of the Clarin-PL group. Selected language tools and resources developed by Clarin-PL are given as examples. Particular attention is paid to the multilingual resources in whose construction the team of the Institute of Slavic Studies of the Polish Academy of Sciences (IS PAN) played the main role, in particular two expanded multilingual corpora of contemporary texts: the Polish-Lithuanian Parallel Corpus "2" and the Polish-Ukrainian Parallel Corpus. The leading role of IS PAN in Clarin-PL's construction of multilingual corpora is characterized. New tasks connected with building Clarin-PL's multilingual resources, both already under way and planned, are outlined.
Background. This article describes the Clarin-PL consortium, which represents the Polish contribution to the CLARIN ERIC European research infrastructure. The aims and tasks of both CLARIN ERIC and Clarin-PL are presented. Purpose. To present the achievements of researchers from the Institute of Slavic Studies of the Polish Academy of Sciences (IS PAN) in creating and developing multilingual corpora, including tagging and parallelizing texts. Methods. The IS PAN team adopted common assumptions for the construction of multilingual corpora of the Slavic and Baltic languages. Namely, the corpora contain selected modern texts that represent all functional styles as fully as possible. Mutual translations are preferred. Results. The article describes selected multilingual resources created by Clarin-PL and made available online via the Clarin-PL website, in whose creation a team from IS PAN played a key role. These resources are two expanded multilingual corpora of parallel contemporary texts: the Polish-Lithuanian Parallel Corpus 2 and the Polish-Ukrainian Parallel Corpus. Because IS PAN played a leading role in the development of the multilingual corpora in the Clarin-PL consortium, the article also outlines the development of corpus linguistics at IS PAN. Discussion. The European CLARIN ERIC infrastructure is steadily developing. Scattered resources (previously created and newly emerging) are combined into a coherent whole. The Polish consortium Clarin-PL primarily creates and develops resources and tools for the Polish language. The aim of this work is to provide users with corpora of the highest possible quality, compatible with constantly changing standards and allowing versatile use of tools.
Keywords
CLARIN ERIC, Clarin-PL, parallel corpora, parallel corpus, Polish-Ukrainian Parallel Corpus, Polish-Lithuanian Parallel Corpus, article
Citation
Левчук П. Multilingual Corpora of the Institute of Slavic Studies of the Polish Academy of Sciences – CLARIN-PL. The Polish-Lithuanian Parallel Corpus "2" and the Polish-Ukrainian Parallel Corpus / Павло Левчук, Данута Рошко, Роман Рошко // Мова: класичне - модерне - постмодерне. - 2020. - Issue 6. - pp. 146-170. - https://doi.org/10.18523/lcmp2522-9281.2020.6.146-170