Informatsionnye Tekhnologii i Vychslitel'nye Sistemy
Informatsionnye Tekhnologii i Vychslitel'nye Sistemy, 2019, Issue 4, Pages 60–69
DOI: https://doi.org/10.14357/20718632190406
(Mi itvs363)
 

DATA PROCESSING AND ANALYSIS

Effective clustering of a text sample depending on the different parameterization of this sample

E. A. Golovastova (a), D. N. Krasotin (b)

(a) Lomonosov Moscow State University, Moscow, Russia
(b) CJSC “MNITI”, Moscow, Russia
Abstract: The Internet is becoming the primary means of receiving text news, which creates a need for automated processing of large amounts of data. One of the most important tasks is the automated organization of text information. In this paper we consider the problem of effectively clustering the objects of a text sample. The most common representation of a text set is a matrix whose elements are values of a statistical measure calculated from word frequencies. In contrast, we propose parameterizing the texts by their keywords. We use two clustering methods: K-means and DBSCAN. The paper analyzes these methods and compares the clustering quality obtained with different text parameterizations and algorithms.
Keywords: clustering, text set, sample parameterization, tf-idf measure, keywords, effective method.
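The two parameterizations contrasted in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the whitespace tokenizer, the particular tf-idf variant (relative term frequency times log of inverse document frequency), the binary keyword encoding, and the function names `tfidf_matrix` and `keyword_matrix` are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_matrix(docs):
    """Return (vocab, matrix); matrix[i][j] is the tf-idf of term j in doc i.

    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({word for tokens in tokenized for word in tokens})
    n_docs = len(docs)
    # Document frequency: in how many documents each word occurs.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        length = max(len(tokens), 1)  # guard against empty documents
        matrix.append(
            [(counts[w] / length) * math.log(n_docs / df[w]) for w in vocab]
        )
    return vocab, matrix


def keyword_matrix(docs, keywords):
    """Return a binary matrix; entry is 1 if the keyword occurs in the doc.

    The keyword list is supplied by the caller (hypothetical: how keywords
    are chosen is not specified on this page).
    """
    token_sets = [set(doc.lower().split()) for doc in docs]
    return [[1 if k in tokens else 0 for k in keywords] for tokens in token_sets]
```

Either matrix could then be passed row by row to a clustering routine such as K-means or DBSCAN. The keyword matrix has one column per chosen keyword, so it is typically much narrower than the full-vocabulary tf-idf matrix.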
Document Type: Article
Language: Russian
Citation: E. A. Golovastova, D. N. Krasotin, “Effective clustering of a text sample depending on the different parameterization of this sample”, Informatsionnye Tekhnologii i Vychslitel'nye Sistemy, 2019, no. 4, 60–69
Citation in format AMSBIB
\Bibitem{GolKra19}
\by E.~A.~Golovastova, D.~N.~Krasotin
\paper Effective clustering of a text sample depending on the different parameterization of this sample
\jour Informatsionnye Tekhnologii i Vychslitel'nye Sistemy
\yr 2019
\issue 4
\pages 60--69
\mathnet{http://mi.mathnet.ru/itvs363}
\crossref{https://doi.org/10.14357/20718632190406}
Linking options:
  • https://www.mathnet.ru/eng/itvs363
  • https://www.mathnet.ru/eng/itvs/y2019/i4/p60