Izv. Saratov Univ. (N.S.), Ser. Math. Mech. Inform., 2020, Volume 20, Issue 1, Pages 116–126 (Mi isu833)  

Scientific Part
Computer Sciences

The study of the statistical characteristics of the text based on the graph model of the linguistic corpus

E. G. Grigorieva (a), V. A. Klyachin (b, a)

a Volgograd State University, 100 Universitetskii Prosp., Volgograd 400062, Russia
b Kalmyk State University named after B. B. Gorodovikov, 11 Pushkin St., Elista 358000, Republic of Kalmykia, Russia

Abstract: The article studies statistical characteristics of a text that are computed from a graph model of the text drawn from a linguistic corpus. The introduction describes the relevance of statistical text analysis and some of the problems solved with it. The proposed graph model places the words of the text at the vertices, and an edge joins two words when both occur in the same part of the text, for example in the same sentence. For the vertices and edges of the graph, the article introduces the concept of a weight as a value from some additive semigroup. Formulas for computing the graph and its weights under text concatenation are proved. Based on the proposed model, the calculations are implemented in the Python programming language. For an experimental study of statistical characteristics, 24 quantities are singled out, expressed in terms of the weights of the vertices and edges of the graph as well as other characteristics of the graph, for example the degrees of its vertices. The purpose of the numerical experiments is to find characteristics of the text that make it possible to determine whether the text was written by a human or randomly generated. The article proposes one possible algorithm that generates random text using another, human-written text as a template; the sequence of parts of speech of the template text is preserved in the random text. It turns out that the required conditions are satisfied by the median value of the ratio of the text graph edge weight to the number of sentences in the text.
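The graph model described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: weights are plain occurrence counts (natural numbers under addition, one possible additive semigroup), the text part is a sentence found by a naive split on `.`, `!` and `?`, and the function names `build_graph`, `concat_graphs` and `median_edge_ratio` are hypothetical. The last function reflects one reading of "the median value of the ratio of the edge weight to the number of sentences"; the article may define it differently.

```python
import re
from collections import Counter
from itertools import combinations
from statistics import median


def split_sentences(text):
    """Naive sentence splitter on ., ! and ? (an assumption of this sketch)."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def build_graph(text):
    """Return (vertex_weights, edge_weights, sentence_count) for a text.

    Assumed weights: a vertex weight is the number of occurrences of the
    word; an edge weight is the number of sentences containing both words.
    """
    vertex_w = Counter()
    edge_w = Counter()
    sentences = split_sentences(text)
    for sentence in sentences:
        words = re.findall(r"\w+", sentence.lower())
        vertex_w.update(words)
        # Count each unordered pair of distinct words once per sentence.
        for u, v in combinations(sorted(set(words)), 2):
            edge_w[(u, v)] += 1
    return vertex_w, edge_w, len(sentences)


def concat_graphs(g1, g2):
    """Graph of a concatenated text: vertices and edges are united and
    their weights are added (Counter addition does exactly this)."""
    return g1[0] + g2[0], g1[1] + g2[1], g1[2] + g2[2]


def median_edge_ratio(graph):
    """Median over all edges of (edge weight / number of sentences)."""
    _, edge_w, n_sentences = graph
    return median(w / n_sentences for w in edge_w.values())


t1 = "The cat sat. The dog sat."
t2 = "The cat ran."
whole = build_graph(t1 + " " + t2)
merged = concat_graphs(build_graph(t1), build_graph(t2))
print(whole == merged)
print(median_edge_ratio(whole))
```

Under these assumptions, the graph of a concatenation equals the merged graphs of its parts whenever the first text ends on a sentence boundary, so a corpus can be processed text by text and the partial graphs combined afterwards.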

Key words: text, graph, linguistic corpus, automatic text processing.

Funding agency: Russian Foundation for Basic Research
Grant number: 18-412-340007
This work was supported by the Russian Foundation for Basic Research and the Administration of the Volgograd Region (project No. 18-412-340007).


DOI: https://doi.org/10.18500/1816-9791-2020-20-1-116-126


UDC: 519.688+004.942
Received: 28.02.2019
Accepted: 19.05.2019

Citation: E. G. Grigorieva, V. A. Klyachin, “The study of the statistical characteristics of the text based on the graph model of the linguistic corpus”, Izv. Saratov Univ. (N.S.), Ser. Math. Mech. Inform., 20:1 (2020), 116–126

Citation in format AMSBIB
\Bibitem{GriKly20}
\by E.~G.~Grigorieva, V.~A.~Klyachin
\paper The study of the statistical characteristics of the text based on the graph model of the linguistic corpus
\jour Izv. Saratov Univ. (N.S.), Ser. Math. Mech. Inform.
\yr 2020
\vol 20
\issue 1
\pages 116--126
\mathnet{http://mi.mathnet.ru/isu833}
\crossref{https://doi.org/10.18500/1816-9791-2020-20-1-116-126}


Linking options:
  • http://mi.mathnet.ru/eng/isu833
  • http://mi.mathnet.ru/eng/isu/v20/i1/p116
