Modelirovanie i Analiz Informatsionnykh Sistem
Modelirovanie i Analiz Informatsionnykh Sistem, 2017, Volume 24, Number 6, Pages 772–787
DOI: https://doi.org/10.18255/1818-1015-2017-6-772-787
(Mi mais600)
 

This article is cited in 6 scientific papers (total in 6 papers)

Analysis of influence of different relations types on the quality of thesaurus application to text classification problems

N. S. Lagutina, K. V. Lagutina, I. A. Shchitov, I. V. Paramonov

P.G. Demidov Yaroslavl State University, 14 Sovetskaya str., Yaroslavl, 150003 Russia
Abstract: The main purpose of the article is to analyze how effectively different types of thesaurus relations can be used to solve text classification tasks. The study is based on an automatically generated thesaurus of a subject area that contains three types of relations: synonymous, hierarchical, and associative. To generate the thesaurus, the authors use a hybrid method based on several linguistic and statistical algorithms for extracting semantic relations. The method makes it possible to create a thesaurus with a sufficiently large number of terms and relations among them. The authors consider two problems: topical text classification and sentiment classification of large newspaper articles. To solve them, the authors developed two approaches that complement standard algorithms with a procedure that takes thesaurus relations into account to determine semantic features of texts. The approach to topical classification combines the standard unsupervised BM25 algorithm with a procedure that takes into account the synonymous and hierarchical relations of the subject-area thesaurus. The approach to sentiment classification consists of two steps. In the first step, a thesaurus is created in which the polarity weights of terms are calculated from the term occurrences in the training set or from the weights of related thesaurus terms. In the second step, the thesaurus is used to compute the features of words from texts and to classify the texts with the SVM or Naive Bayes algorithm. In experiments with the text corpora BBCSport, Reuters, PubMed, and a corpus of articles about American immigrants, the authors varied the types of thesaurus relations involved in the classification and the degree of their use. The results of the experiments make it possible to evaluate how effectively thesaurus relations can be applied to the classification of raw texts and to determine under what conditions particular relations have a greater or lesser effect. In particular, the most useful thesaurus relations are synonymous and hierarchical, as they provide better classification quality.
Keywords: thesaurus, semantic relations, thesaurus relations, topical classification, sentiment classification.
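The abstract does not spell out how the thesaurus procedure is combined with BM25, so the following is only a minimal sketch under stated assumptions: each topic is described by a few seed terms, those terms are expanded with synonyms and hypernyms from a toy thesaurus, and a document is assigned to the topic whose expanded term set gives the highest BM25 score. The names THESAURUS, expand, bm25_score, and classify, the relation weight of 0.5, and the sample data are all hypothetical and are not taken from the article.

```python
import math
from collections import Counter

# Hypothetical toy thesaurus: term -> related terms grouped by relation type.
THESAURUS = {
    "football": {"synonym": ["soccer"], "hypernym": ["sport"]},
    "tennis": {"synonym": [], "hypernym": ["sport"]},
}


def expand(terms, relations=("synonym", "hypernym"), weight=0.5):
    """Map each seed term to weight 1.0 and each related term to `weight`.

    The 0.5 default is an assumed value, not one reported in the article.
    """
    weights = {t: 1.0 for t in terms}
    for t in terms:
        for rel in relations:
            for related in THESAURUS.get(t, {}).get(rel, []):
                weights[related] = max(weights.get(related, 0.0), weight)
    return weights


def bm25_score(term_weights, doc, corpus, k1=1.5, b=0.75):
    """Weighted BM25 score of a tokenized document for a set of weighted terms."""
    n = len(corpus)
    avgdl = sum(len(d) for d in corpus) / n
    tf = Counter(doc)
    score = 0.0
    for term, w in term_weights.items():
        df = sum(1 for d in corpus if term in d)
        if df == 0 or tf[term] == 0:
            continue
        # Non-negative IDF variant (the "+ 1" inside the logarithm).
        idf = math.log((n - df + 0.5) / (df + 0.5) + 1.0)
        norm = tf[term] * (k1 + 1) / (
            tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
        )
        score += w * idf * norm
    return score


def classify(doc, topic_terms, corpus, relations=("synonym", "hypernym")):
    """Return the topic whose expanded seed terms score highest for `doc`."""
    scores = {
        topic: bm25_score(expand(terms, relations), doc, corpus)
        for topic, terms in topic_terms.items()
    }
    return max(scores, key=scores.get)


docs = [
    ["soccer", "match", "goal"],
    ["tennis", "serve", "ace"],
    ["sport", "news"],
]
topics = {"football": ["football"], "tennis": ["tennis"]}
print(classify(docs[0], topics, docs))
print(classify(docs[1], topics, docs))
```

Passing a different `relations` tuple (for example only `("synonym",)`, or an empty tuple) is one way to vary which relation types take part in scoring, in the spirit of the experiments the abstract describes.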
Funding agency: Ministry of Education and Science of the Russian Federation
Grant number: MK-5456.2016.9
This work was supported by a grant of the President of the Russian Federation for state support of young Russian scientists (project MK-5456.2016.9).
Received: 16.10.2017
Document Type: Article
UDC: 004.912
Language: Russian
Citation: N. S. Lagutina, K. V. Lagutina, I. A. Shchitov, I. V. Paramonov, “Analysis of influence of different relations types on the quality of thesaurus application to text classification problems”, Model. Anal. Inform. Sist., 24:6 (2017), 772–787
Citation in format AMSBIB
\Bibitem{LagLagShc17}
\by N.~S.~Lagutina, K.~V.~Lagutina, I.~A.~Shchitov, I.~V.~Paramonov
\paper Analysis of influence of different relations types on the quality of thesaurus application to text classification problems
\jour Model. Anal. Inform. Sist.
\yr 2017
\vol 24
\issue 6
\pages 772--787
\mathnet{http://mi.mathnet.ru/mais600}
\crossref{https://doi.org/10.18255/1818-1015-2017-6-772-787}
\elib{https://elibrary.ru/item.asp?id=30730616}
Linking options:
  • https://www.mathnet.ru/eng/mais600
  • https://www.mathnet.ru/eng/mais/v24/i6/p772