Program Systems: Theory and Applications, 2018, Volume 9, Issue 4, Pages 579–596 (Mi ps329)  

Mathematical Foundations of Programming

Stable assessment of the quality of similarity algorithms of character strings and their normalizations

S. V. Znamenskij

Ailamazyan Program Systems Institute of Russian Academy of Sciences

Abstract: Choosing tools that search for hidden commonality in data of a new kind requires stable and reproducible comparative assessments of the quality of abstract string similarity algorithms. Conventional estimates based on artificially generated or manually labeled tests vary significantly and tend to evaluate the generation method relative to the similarity algorithms rather than the algorithms themselves, while estimates based on user data cannot be accurately reproduced.
A simple, transparent, objective and reproducible numerical quality assessment of a string metric is proposed. Parallel texts of book translations in different languages are used. The quality of a measure is estimated by the percentage of errors over the possible distinct trials of identifying the translation of a given paragraph among two paragraphs of the book in another language, exactly one of which is the actual translation. The stability of the assessments is verified by their independence from the choice of book and language pair.
The numerical experiment produced a stable quality ranking of algorithms for abstract character string comparison and showed a strong dependence on the choice of normalization.
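
The error-percentage estimate described in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation: the function names `error_rate` and `ratio_similarity`, the use of `difflib.SequenceMatcher` as a stand-in similarity, and the rule that a tie counts as half an error are all assumptions made for illustration. The sketch assumes two lists of paragraphs that are already aligned, so that `target[i]` is the translation of `source[i]`.

```python
from difflib import SequenceMatcher
from itertools import permutations


def ratio_similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1]; not one of the paper's algorithms."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def error_rate(source, target, similarity):
    """Share of trials in which the true translation is not ranked first.

    source[i] and target[i] are assumed to be aligned paragraphs.
    A trial is a triple (source[i], target[i], target[j]) with j != i.
    Ties count as half an error (an assumption of this sketch).
    """
    if len(source) != len(target) or len(source) < 2:
        raise ValueError("need two aligned lists of at least two paragraphs")
    errors = 0.0
    trials = 0
    for i, j in permutations(range(len(source)), 2):
        true_score = similarity(source[i], target[i])
        wrong_score = similarity(source[i], target[j])
        if wrong_score > true_score:
            errors += 1.0  # the wrong paragraph looked more similar
        elif wrong_score == true_score:
            errors += 0.5  # undecidable trial, counted as a coin flip
        trials += 1
    return errors / trials
```

Calling `error_rate(paragraphs_a, paragraphs_b, ratio_similarity)` on two aligned paragraph lists returns a value between 0 and 1, and multiplying by 100 gives a percentage. A similarity that carries no information scores about 0.5 under this scheme, so lower values indicate a better measure. Passing different similarity functions or differently normalized variants of the same function to `error_rate` allows them to be compared on the same text pair.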

Key words and phrases: string similarity, data analysis, similarity metric, distance metric, numeric evaluation, quality assessment.

DOI: https://doi.org/10.25209/2079-3316-2018-9-4-579-596

Document Type: Article
UDC: 519.652.3
Received: 17.04.2018
03.12.2018
Accepted: 28.12.2018

Citation: S. V. Znamenskij, “Stable assessment of the quality of similarity algorithms of character strings and their normalizations”, Program Systems: Theory and Applications, 9:4 (2018), 579–596

Citation in AMSBIB format
\Bibitem{Zna18}
\by S.~V.~Znamenskij
\paper Stable assessment of the quality of similarity algorithms of character strings and their normalizations
\jour Program Systems: Theory and Applications
\yr 2018
\vol 9
\issue 4
\pages 579--596
\mathnet{http://mi.mathnet.ru/ps329}
\crossref{https://doi.org/10.25209/2079-3316-2018-9-4-579-596}


Linking options:
  • http://mi.mathnet.ru/eng/ps329
  • http://mi.mathnet.ru/eng/ps/v9/i4/p579
