Mathematical Physics and Computer Simulation
Mathematical Physics and Computer Simulation, 2019, Volume 22, Issue 4, Pages 53–63
DOI: https://doi.org/10.15688/mpcm.jvolsu.2019.4.4
(Mi vvgum267)
 

Modeling, informatics and management

Automation of morphological tagging of archival documents

A. S. Komendantov, A. G. Matveev, A. V. Svetlov

Volgograd State University
Abstract: The paper describes an add-on to the MyStem stemming tool by I. Segalovich. We designed the application to give MyStem a graphical interface that is easy to learn and intuitive for users who do not specialize in information technology. We found that MyStem processes outdated vocabulary correctly when the text is passed to it in modern Cyrillic. In addition to the interface, our program can therefore work with the outdated Cyrillic alphabet: letters such as zelo and omega are first replaced by “ks” and “o” respectively, the text is then passed to MyStem for analysis, and the original characters are restored in the processed document. To do this, the add-on intercepts the output of MyStem, then reformats and analyzes it. The application also lets the user resolve homonymy manually when the automatic tagging of a word's morphological characteristics is incorrect. The main purpose of the application is to prepare the morphological tagging of documents from the archival fund “Mikhailovsky Stanichny Ataman” in order to build a linguistic corpus. In the course of this work, we solved the problem of correctly processing texts that contain outdated Cyrillic characters. The graphical interface is implemented on the JavaFX platform (OpenJFX).
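The replace-and-restore step described in the abstract can be illustrated with a minimal Java sketch. It is not the authors' implementation: the class name, the record types, and the mapping table are hypothetical, and the table lists only a few well-known correspondences (omega, ksi, yat, fita). The sketch records where each substitution lands so that the original characters can be put back. It restores them in the unchanged normalized string only. How the actual add-on maps positions into MyStem's reformatted output is not stated in the abstract.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class OldCyrillicRoundTrip {

    // Illustrative mapping only (an assumption, not the authors' table).
    static final Map<Character, String> MODERN = Map.of(
        '\u0461', "\u043E",        // omega -> modern "o"
        '\u046F', "\u043A\u0441",  // ksi   -> modern "ks"
        '\u0463', "\u0435",        // yat   -> modern "e"
        '\u0473', "\u0444"         // fita  -> modern "f"
    );

    // One substitution: offset in the normalized text, the original
    // character, and the length of the modern replacement.
    record Swap(int start, char original, int length) {}

    // Normalized text plus the substitutions needed to undo it.
    record Normalized(String text, List<Swap> swaps) {}

    // Replace outdated letters with modern ones, remembering each swap.
    static Normalized normalize(String source) {
        StringBuilder out = new StringBuilder();
        List<Swap> swaps = new ArrayList<>();
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            String modern = MODERN.get(c);
            if (modern == null) {
                out.append(c);
            } else {
                swaps.add(new Swap(out.length(), c, modern.length()));
                out.append(modern);
            }
        }
        return new Normalized(out.toString(), swaps);
    }

    // Put the original letters back. Walking from the last swap to the
    // first keeps the earlier offsets valid while the text shrinks.
    static String restore(Normalized n) {
        StringBuilder out = new StringBuilder(n.text());
        for (int i = n.swaps().size() - 1; i >= 0; i--) {
            Swap s = n.swaps().get(i);
            out.replace(s.start(), s.start() + s.length(),
                        String.valueOf(s.original()));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "aleksey" spelled with ksi in place of "ks"
        String word = "\u0430\u043B\u0435\u046F\u0435\u0439";
        Normalized n = normalize(word);
        System.out.println(n.text());                    // modern spelling
        System.out.println(restore(n).equals(word));     // round trip check
    }
}
```

In a full pipeline, the string returned by `normalize` would be the text handed to MyStem, and the recorded swaps would be applied to the tagged result. Because one outdated letter may expand to two modern ones, the sketch stores the replacement length rather than assuming a one-to-one mapping.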
Keywords: automation of linguistic analysis, automation of morphological analysis, MyStem tool, graphical interface, software shell, corpus-based linguistics.
Funding agency: Russian Foundation for Basic Research
Grant number: 19-012-00246
Received: 02.07.2019
Document Type: Article
UDC: 004.91, 81’33, 004.42
BBK: 32.973, 81.1
Language: Russian
Citation: A. S. Komendantov, A. G. Matveev, A. V. Svetlov, “Automation of morphological tagging of archival documents”, Mathematical Physics and Computer Simulation, 22:4 (2019), 53–63
Citation in format AMSBIB
\Bibitem{KomMatSve19}
\by A.~S.~Komendantov, A.~G.~Matveev, A.~V.~Svetlov
\paper Automation of morphological tagging of archival documents
\jour Mathematical Physics and Computer Simulation
\yr 2019
\vol 22
\issue 4
\pages 53--63
\mathnet{http://mi.mathnet.ru/vvgum267}
\crossref{https://doi.org/10.15688/mpcm.jvolsu.2019.4.4}
Linking options:
  • https://www.mathnet.ru/eng/vvgum267
  • https://www.mathnet.ru/eng/vvgum/v22/i4/p53