Extraction the knowledge and relevant linguistic means with efficiency estimation for formation of subject-oriented text sets
D. V. Mikhaylov, A. P. Kozlov, G. M. Emelyanov
Yaroslav-the-Wise Novgorod State University, Velikii Novgorod, Russia
This paper addresses two interrelated problems: extracting knowledge units from a set of subject-oriented texts (a so-called corpus) and selecting texts for the corpus by analyzing their relevance to an initial phrase. The main practical goal is to find the most rational way to express a knowledge fragment in a given natural language, so that it can then be reflected in the thesaurus and ontology of a subject area. These problems are important when building systems for processing, analyzing, estimating and understanding information. Here, the relevance of a text to the initial phrase is measured with respect to the fragment of actual knowledge that the phrase describes, including the forms in which that knowledge is expressed in a given natural language. It is defined as the total numerical estimate of the coupling strength of the words from the initial phrase that occur jointly in phrases of the text under analysis. The paper considers known variants of such estimation procedures and their application to searching for the distinct components that reflect the initial phrase in the texts selected for the topical corpus. These components correspond to words and word combinations. Compared with searching for such components in a syntactically annotated corpus, the proposed text-selection method reduces the output of phrases irrelevant to the initial phrase (in terms of either the described knowledge fragment or the forms of its expression in a given natural language) by a factor of 15 on average.
Keywords: pattern recognition, intelligent data analysis, information theory, open-form test assignment, natural-language expression of expert knowledge, contextual annotation, document ranking in information retrieval.
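The abstract defines text relevance as a total estimate of the coupling strength of initial-phrase words that co-occur in phrases of the analyzed text, but it does not state which of the known estimation procedures is meant. The Python sketch below is only an illustration under stated assumptions: it uses the Dice coefficient as the coupling measure, counts co-occurrence at the phrase level within the analyzed text, and tokenizes naively by whitespace. The function names `tokenize`, `relevance` and `rank_texts` are hypothetical and do not come from the paper.

```python
from collections import Counter
from itertools import combinations


def tokenize(phrase):
    # Assumption: naive lowercase whitespace tokenization with punctuation
    # stripped; a real system would use morphological analysis.
    cleaned = (w.strip(".,;:!?").lower() for w in phrase.split())
    return [w for w in cleaned if w]


def pair_counts(phrases):
    # Count, for each word and each unordered word pair, how many phrases contain it.
    word_freq = Counter()
    pair_freq = Counter()
    for p in phrases:
        words = set(tokenize(p))
        word_freq.update(words)
        pair_freq.update(frozenset(pair) for pair in combinations(sorted(words), 2))
    return word_freq, pair_freq


def dice(a, b, word_freq, pair_freq):
    # Assumed coupling measure: Dice coefficient 2*f(a,b) / (f(a) + f(b)).
    joint = pair_freq[frozenset((a, b))]
    if joint == 0:
        return 0.0
    return 2 * joint / (word_freq[a] + word_freq[b])


def relevance(initial_phrase, text_phrases):
    # Total coupling strength over all pairs of initial-phrase words that
    # occur together in phrases of the analyzed text.
    query = sorted(set(tokenize(initial_phrase)))
    word_freq, pair_freq = pair_counts(text_phrases)
    return sum(dice(a, b, word_freq, pair_freq) for a, b in combinations(query, 2))


def rank_texts(initial_phrase, texts):
    # Each text is a list of phrases; return (index, score) pairs, best first.
    scores = [(i, relevance(initial_phrase, t)) for i, t in enumerate(texts)]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

For example, with the initial phrase "neural network training", a text whose phrases repeatedly combine "network" with "training" scores higher than one where those words never appear in the same phrase. Texts scoring above some chosen threshold could then be admitted to the corpus; the threshold, like the measure itself, is an assumption of this sketch.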
D. V. Mikhaylov, A. P. Kozlov, G. M. Emelyanov, “Extraction the knowledge and relevant linguistic means with efficiency estimation for formation of subject-oriented text sets”, Computer Optics, 40:4 (2016), 572–582