Computer Optics, 2016, Volume 40, Issue 4, Pages 572–582 (Mi co252)  

DATA ANALYSIS

Extraction the knowledge and relevant linguistic means with efficiency estimation for formation of subject-oriented text sets

D. V. Mikhaylov, A. P. Kozlov, G. M. Emelyanov

Yaroslav-the-Wise Novgorod State University, Velikii Novgorod, Russia

Abstract: This paper addresses two interrelated problems: extracting knowledge units from a set of subject-oriented texts (a corpus), and selecting texts for the corpus by analyzing their relevance to an initial phrase. The main practical goal is to find the most rational way to express a knowledge fragment in a given natural language so that it can then be reflected in the thesaurus and ontology of a subject area. These problems matter when building systems for processing, analyzing, estimating and understanding information. Here, the relevance of a text to the initial phrase, in terms of the described fragment of actual knowledge (including the forms of its expression in a given natural language), is defined by the total numerical estimate of the coupling strength of words from the initial phrase that occur jointly in phrases of the text under analysis. The paper considers known variants of such estimates and their application to the search for distinct components, namely words and their combinations, that reflect the initial phrase in the texts selected for the topical corpus. Compared with searching for such components in a syntactically annotated text corpus, the proposed text selection method yields, on average, a 15-fold reduction in the output of phrases that are irrelevant to the initial one in terms of either the described knowledge fragment or the forms of its expression in a given natural language.
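
As an illustration of the relevance estimate described in the abstract, the following minimal sketch sums a pairwise coupling strength over the words of the initial phrase, counting joint occurrences within phrases of a text. The abstract does not say which coupling measure the authors use, so the Dice coefficient here is an assumption chosen only for simplicity. The function names (tokenize, dice_coupling, text_relevance) and the sample phrases are hypothetical.

```python
import re
from itertools import combinations


def tokenize(phrase):
    """Lowercase a phrase and return its set of word tokens."""
    return set(re.findall(r"\w+", phrase.lower()))


def dice_coupling(a, b, phrases):
    """Dice coefficient of words a and b over tokenized phrases.

    Assumption: Dice stands in for whichever coupling-strength
    estimate the paper actually applies.
    """
    n_a = sum(1 for p in phrases if a in p)
    n_b = sum(1 for p in phrases if b in p)
    n_ab = sum(1 for p in phrases if a in p and b in p)
    return 2 * n_ab / (n_a + n_b) if (n_a + n_b) else 0.0


def text_relevance(initial_phrase, text_phrases):
    """Total coupling strength of all word pairs from the initial phrase,
    estimated from their joint occurrence in the phrases of one text."""
    query = sorted(tokenize(initial_phrase))
    phrases = [tokenize(p) for p in text_phrases]
    return sum(dice_coupling(a, b, phrases) for a, b in combinations(query, 2))


# Hypothetical texts, each already split into phrases.
on_topic = [
    "Training a neural network requires data",
    "The network converges slowly",
    "Data is noisy",
]
off_topic = ["The weather was fine", "We walked along the river"]

score_on = text_relevance("neural network training", on_topic)
score_off = text_relevance("neural network training", off_topic)
```

Under these assumptions a text whose phrases contain several words of the initial phrase together receives a higher score than a text in which those words never co-occur, so candidate texts can be ranked by the score before being admitted to the corpus.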

Keywords: pattern recognition, intelligent data analysis, information theory, open-form test assignment, natural-language expression of expert knowledge, contextual annotation, document ranking in information retrieval.

Funding:
Ministry of Education and Science of the Russian Federation (no grant number given)
Russian Foundation for Basic Research, grant 16-01-00004
This work was supported by the Ministry of Education and Science of the Russian Federation (the basic part of the state assignment) and by RFBR grant 16-01-00004.


DOI: https://doi.org/10.18287/2412-6179-2016-40-4-572-582

Full text: http://www.computeroptics.smr.ru/.../400417.html

Received: 14.04.2016
Accepted: 01.07.2016

Citation: D. V. Mikhaylov, A. P. Kozlov, G. M. Emelyanov, “Extraction the knowledge and relevant linguistic means with efficiency estimation for formation of subject-oriented text sets”, Computer Optics, 40:4 (2016), 572–582

Citation in format AMSBIB
\Bibitem{MikKozEme16}
\by D.~V.~Mikhaylov, A.~P.~Kozlov, G.~M.~Emelyanov
\paper Extraction the knowledge and relevant linguistic means with efficiency estimation for formation of subject-oriented text sets
\jour Computer Optics
\yr 2016
\vol 40
\issue 4
\pages 572--582
\mathnet{http://mi.mathnet.ru/co252}
\crossref{https://doi.org/10.18287/2412-6179-2016-40-4-572-582}


Linking options:
  • http://mi.mathnet.ru/eng/co252
  • http://mi.mathnet.ru/eng/co/v40/i4/p572
