Mathematical Physics and Computer Simulation
Mathematical Physics and Computer Simulation, 2022, Volume 25, Issue 1, Pages 34–48
DOI: https://doi.org/10.15688/mpcm.jvolsu.2022.1.3
(Mi vvgum324)
 

Modeling, informatics and management

On development of web application for corpus of archival documents

A. V. Pavlov (a), Yu. D. Sapich (a), A. V. Svetlov (a), A. S. Komendantov (b)

(a) Volgograd State University
(b) OOO “SET”
Abstract: This work is part of a project to create a linguistic corpus of documents from the “Mikhailovsky stanitsa ataman” fund. The fund contains historically valuable administrative documents of the Don Cossack Army from the 18th and 19th centuries, stored in the state archives of the Volgograd Region. To bring the fund to the attention of the scientific community, a group of researchers from Volgograd State University, headed by Professor O. A. Gorban, carried out extensive preliminary work to digitize the documents. In their current form, the documents are suitable for computer processing. The only significant problem is their outdated vocabulary and orthography, which was largely solved in our previous work. At the current stage, the main task is to develop the technical and software components of the corpus. In practice, this means creating an “engine” for the document corpus: software that stores a database of marked-up texts, executes queries against this database, and provides a user-friendly interface that requires no special IT skills. While working on the earlier tasks, we also decided to integrate the document markup tool into the general corpus software. The present work is therefore devoted to developing a REST service that performs automated morphological analysis of texts, saves the processed documents in a special form in a database, and searches the database by queries that specify morphological features of text elements. The software also provides a function for manually correcting errors that arise in the automated analysis of Old Slavonic texts with obsolete characters.
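The three functions named in the abstract (storing annotated texts, searching by morphological features, and manually correcting the automated analysis) can be illustrated with a small sketch. It is not the authors' implementation: the actual system is a REST service backed by a database, whereas this sketch keeps everything in memory. The names `Token` and `Corpus`, the sample document, and the tag labels such as `"S"`, `"gen"`, and `"sg"` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    """One word form with a hypothetical morphological annotation."""
    form: str                 # word as written in the document
    lemma: str                # dictionary form
    tags: frozenset           # illustrative feature tags, e.g. {"S", "gen", "sg"}
    corrected: bool = False   # True once a human has overridden the analysis


@dataclass
class Corpus:
    """In-memory stand-in for the database of marked-up texts (assumption)."""
    docs: dict = field(default_factory=dict)

    def add(self, doc_id, tokens):
        """Store an already analyzed document under the given identifier."""
        self.docs[doc_id] = list(tokens)

    def search(self, lemma=None, tags=()):
        """Return (doc_id, position, token) for every token that matches
        the lemma (if given) and carries all of the requested tags."""
        wanted = frozenset(tags)
        hits = []
        for doc_id, tokens in self.docs.items():
            for pos, tok in enumerate(tokens):
                if lemma is not None and tok.lemma != lemma:
                    continue
                if wanted <= tok.tags:
                    hits.append((doc_id, pos, tok))
        return hits

    def correct(self, doc_id, pos, lemma, tags):
        """Manually replace the automated analysis of a single token."""
        old = self.docs[doc_id][pos]
        self.docs[doc_id][pos] = Token(old.form, lemma, frozenset(tags), corrected=True)


# Hypothetical sample data; the annotations are hand-written, not tool output.
corpus = Corpus()
corpus.add("doc-1", [
    Token("атаману", "атаман", frozenset({"S", "dat", "sg"})),
    Token("станицы", "станица", frozenset({"S", "gen", "sg"})),
])

# Query: all nouns in the genitive case.
hits = corpus.search(tags={"S", "gen"})

# Manual correction of the first token, then a lemma-based query.
corpus.correct("doc-1", 0, "атаман", {"S", "gen", "sg"})
by_lemma = corpus.search(lemma="атаман")
```

In a service of the kind the abstract describes, `add`, `search`, and `correct` would plausibly correspond to separate HTTP endpoints, and the token list would come from an automated analyzer rather than being written by hand.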
Keywords: linguistic corpus of documents, web service, automation of morphological analysis, MyStem tool, corpus-based linguistics.
Funding agency: Russian Foundation for Basic Research
Grant number: 19-012-00246
Received: 29.12.2021
Document Type: Article
UDC: 004.91, 81’33, 004.42
BBC: 32.973, 81.1
Language: Russian
Citation: A. V. Pavlov, Yu. D. Sapich, A. V. Svetlov, A. S. Komendantov, “On development of web application for corpus of archival documents”, Mathematical Physics and Computer Simulation, 25:1 (2022), 34–48
Citation in format AMSBIB
\Bibitem{PavSapSve22}
\by A.~V.~Pavlov, Yu.~D.~Sapich, A.~V.~Svetlov, A.~S.~Komendantov
\paper On development of web application for corpus of archival documents
\jour Mathematical Physics and Computer Simulation
\yr 2022
\vol 25
\issue 1
\pages 34--48
\mathnet{http://mi.mathnet.ru/vvgum324}
\crossref{https://doi.org/10.15688/mpcm.jvolsu.2022.1.3}
Linking options:
  • https://www.mathnet.ru/eng/vvgum324
  • https://www.mathnet.ru/eng/vvgum/v25/i1/p34