RUS  ENG JOURNALS   PEOPLE   ORGANISATIONS   CONFERENCES   SEMINARS   VIDEO LIBRARY   PACKAGE AMSBIB
 General information Latest issue Archive Guidelines for authors Submit a manuscript Search papers Search references RSS Latest issue Current issues Archive issues What is RSS

 Program Systems: Theory and Applications: Year: Volume: Issue: Page: Find

 Program Systems: Theory and Applications, 2016, Volume 7, Issue 4, Pages 347–358 (Mi ps229)

Mathematical Foundations of Programming

Approximation of the longest common subsequence length for two long random strings

S. V. Znamenskij

Ailamazyan Program Systems Institute of RAS

Abstract: The expected value $E$ of the longest common subsequence of letters in two random words is considered as a function of the $\alpha=|A|$ of alphabet and of words lengths $m$ and $n$. It is assumed that each letter independently appears at any position with equal probability.
An approximate analitic expression for $E(\alpha, m, n)$ calculation is presented that allow to calculate the $E (m, n, \alpha)$ with an accuracy of $0.3$ percent for $64 \leqslant m + n \leqslant 65,536$ and $1 <\alpha < 129$. The coefficients are selected by hand and can be refined. It is expected that the formula holds for each grater values of the argument with the same relative error.

Key words and phrases: similarity of strings, sequence alignment, edit distance, LCS, Levenshtein metric.

Full text: PDF file (1547 kB)
References: PDF file   HTML file
UDC: 004.416
MSC: 68T37; 68P10, 68W32
Accepted: 28.12.2016

Citation: S. V. Znamenskij, “Approximation of the longest common subsequence length for two long random strings”, Program Systems: Theory and Applications, 7:4 (2016), 347–358

Citation in format AMSBIB
\Bibitem{Zna16} \by S.~V.~Znamenskij \paper Approximation of the longest common subsequence length for two long random strings \jour Program Systems: Theory and Applications \yr 2016 \vol 7 \issue 4 \pages 347--358 \mathnet{http://mi.mathnet.ru/ps229}