Development and digitalization: the Crimean Tatar linguistic corpus was created

The Ministry of Reintegration, together with the public organization QIRI’M Young, has launched an electronic database of texts in Crimean Tatar: the National Corpus of the Crimean Tatar Language (NKKM). The collection includes texts of various genres and historical eras, which will be used for educational projects, scientific research and software development.

This was reported by the press service of the Ministry of Reintegration.

The corpus will be used for language research, as well as for adding Crimean Tatar support to operating systems, online translators, dictionaries and spell-checking programs.

UP Culture spoke with Abibullah Seit-Jelil, a philologist, an assistant at the Department of Turkology at KNU and a direct participant in the NKKM project. He said that the question of financial support arose in the academic community.

“The idea of creating the National Corpus of the Crimean Tatar Language arose because, ever since the specialty [note: Crimean Tatar language and literature] opened at Taras Shevchenko Kyiv National University, students and teachers have directly faced a lack of sources in the Crimean Tatar language and about the Crimean Tatars. Libraries, unfortunately, do not have the kind of source base that exists in Crimea, and it was almost impossible to get anything from Crimea or to find this material on the Internet,” the philologist said.

He added that all available texts written in Cyrillic are being transliterated into Latin script, since the official transition of the Crimean Tatar language to the Latin alphabet began in September 2021.

The electronic database is aimed primarily at researchers who develop printed and electronic dictionaries, so the NKKM makes it possible to trace the grammatical structure, etymology and historical changes of words. The corpus will also be useful to literary scholars, because it collects materials about Crimean Tatar writers.

According to the National Corpus of the Crimean Tatar Language, the database can be used:

  • To find the most illustrative examples of the use of words when creating explanatory dictionaries.
  • To analyze the usage of words in different eras when creating historical dictionaries.
  • For automatic language detection and processing by machine translation and spell-checking systems (for example, Google Translate, LanguageTool, etc.), using N-gram and keyword tools.
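
As a rough illustration of the N-gram idea mentioned in the last point, the sketch below counts overlapping character N-grams across a handful of texts. It is not the NKKM's own tooling: the function names `char_ngrams` and `ngram_profile` and the two sample strings are hypothetical, chosen only for this example. Frequency profiles of this kind are one common input to language detection and spell-checking systems.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> list[str]:
    """Return all overlapping character n-grams of a text.

    The text is lowercased and runs of whitespace are collapsed first.
    Note: str.lower() is not Turkic-aware (it maps "I" to "i", not "ı"),
    so a real pipeline would need locale-specific case folding.
    """
    normalized = " ".join(text.lower().split())
    return [normalized[i:i + n] for i in range(len(normalized) - n + 1)]


def ngram_profile(texts: list[str], n: int = 3) -> Counter:
    """Count character n-grams across a collection of texts."""
    counts: Counter = Counter()
    for text in texts:
        counts.update(char_ngrams(text, n))
    return counts


if __name__ == "__main__":
    # Illustrative sample only; a real profile would be built from corpus texts.
    sample = ["Qırımtatar tili", "Qırım"]
    for gram, count in ngram_profile(sample, n=3).most_common(5):
        print(repr(gram), count)
```

A detector built on this would compare the profile of an unknown text against profiles built from corpora in each candidate language, which is why a large and varied corpus matters.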

During the project, 30 experts analyzed more than 900 materials, including fiction, scientific literature, periodicals and other texts. The NKKM also preserves the original texts together with their errors, including the distinctive features of each author’s language.

The creators also emphasize that the project is an important step toward preserving and popularizing the Crimean Tatar language.

“New electronic dictionaries, as well as programs for proofreading and machine translation of texts in the Crimean Tatar language, can be created with the help of the NKKM database. Such developments will contribute to the popularization of the language both in everyday life and in the scientific and literary spheres. In addition, the linguistic base of the NKKM will expand the possibilities of the Crimean Tatar language on international technical and educational platforms,” the linguistic center wrote.

The project was implemented within the framework of the Strategy for the Development of the Crimean Tatar Language for 2022-2032. It was created with the support of the Eastern Europe Foundation, the Mission of the President of Ukraine in the Autonomous Republic of Crimea, the Embassy of Switzerland in Ukraine, the Ministry of Reintegration and Taras Shevchenko Kyiv National University.
