Modelirovanie i Analiz Informatsionnykh Sistem

Modelirovanie i Analiz Informatsionnykh Sistem, 2024, Volume 31, Number 2, Pages 206–220
DOI: https://doi.org/10.18255/1818-1015-2024-2-206-220
(Mi mais825)
 

This article is cited in 2 scientific papers (total in 2 papers)

Artificial intelligence

Keywords, morpheme parsing and syntactic trees: features for text complexity assessment

D. A. Morozov (a), I. A. Smal (a), T. A. Garipov (a), A. V. Glazkova (b)

a Novosibirsk National Research State University, Novosibirsk, Russia
b University of Tyumen, Tyumen, Russia
Abstract: Text complexity assessment is an applied problem of current interest, with potential applications in drafting legal documents, editing textbooks, and selecting books for extracurricular reading. Methods for building a feature vector for automatic text complexity assessment are quite diverse. Early approaches relied on easily computed quantities, such as the average sentence length or the average number of syllables per word. As natural language processing algorithms have developed, the space of features in use has expanded. In this work, we examine three groups of features: 1) automatically generated keywords, 2) features derived from the morphemic parsing of words, and 3) features describing the diversity, branching, and depth of syntactic trees. Keywords were generated with the RuTermExtract algorithm, morphemic parses with a convolutional neural network model, and syntactic trees with the Stanza model trained on the SynTagRus corpus. We compared the feature groups using four machine learning algorithms and four annotated Russian-language text corpora. The corpora differ in both domain and annotation paradigm, so the results reflect the actual relationship between the features and text complexity more objectively. On average, keywords performed worse than topic markers obtained with latent Dirichlet allocation. In most cases, morphemic features proved more effective than previously described measures of the lexical complexity of a text: word frequency and the occurrence of word-formation patterns. In most cases, the extended set of syntactic features improved the quality of neural network models compared with the previously described set.
Keywords: text complexity, keyword generation, morpheme parsing generation, syntax trees.
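The third feature group (depth and branching of syntactic trees) can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `tree_features`, the choice of statistics, and the input format (a list of 1-based head indices with 0 for the root, as in CoNLL-U) are assumptions made here for illustration. The head indices could come from whatever dependency parser is in use.

```python
from typing import Dict, List


def tree_features(heads: List[int]) -> Dict[str, float]:
    """Depth and branching statistics for one dependency tree.

    heads[i] is the 1-based index of the head of token i + 1; 0 marks the
    root. This input convention is an assumption of the sketch.
    """
    # Map each head to the list of its dependents.
    children: Dict[int, List[int]] = {}
    for token, head in enumerate(heads, start=1):
        children.setdefault(head, []).append(token)

    # Iterative depth-first traversal starting from the root token(s).
    max_depth = 0
    stack = [(child, 1) for child in children.get(0, [])]
    while stack:
        node, depth = stack.pop()
        max_depth = max(max_depth, depth)
        stack.extend((c, depth + 1) for c in children.get(node, []))

    # Branching: number of dependents of each real token that has any.
    fanouts = [len(kids) for node, kids in children.items() if node != 0]
    return {
        "max_depth": max_depth,
        "max_branching": max(fanouts) if fanouts else 0,
        "mean_branching": sum(fanouts) / len(fanouts) if fanouts else 0.0,
    }


# Hypothetical parse of "The cat sat on the mat": "sat" is the root.
example = tree_features([2, 3, 0, 6, 6, 3])
```

Per-sentence values like these would then typically be aggregated over a text (for example, averaged) before being passed to a classifier; the aggregation used in the paper is not stated in the abstract.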
Received: 27.02.2024
Revised: 29.03.2024
Accepted: 08.05.2024
Document Type: Article
UDC: 004.912
MSC: 68T50
Language: Russian
Citation: D. A. Morozov, I. A. Smal, T. A. Garipov, A. V. Glazkova, “Keywords, morpheme parsing and syntactic trees: features for text complexity assessment”, Model. Anal. Inform. Sist., 31:2 (2024), 206–220
Citation in AMSBIB format
\Bibitem{MorSmaGar24}
\by D.~A.~Morozov, I.~A.~Smal, T.~A.~Garipov, A.~V.~Glazkova
\paper Keywords, morpheme parsing and syntactic trees: features for text complexity assessment
\jour Model. Anal. Inform. Sist.
\yr 2024
\vol 31
\issue 2
\pages 206--220
\mathnet{http://mi.mathnet.ru/mais825}
\crossref{https://doi.org/10.18255/1818-1015-2024-2-206-220}
Linking options:
  • https://www.mathnet.ru/eng/mais825
  • https://www.mathnet.ru/eng/mais/v31/i2/p206