
Program Systems: Theory and Applications, 2016, Volume 7, Issue 4, Pages 347–358
(Mi ps229)




Mathematical Foundations of Programming
Approximation of the longest common subsequence length for two long random strings
S. V. Znamenskij^{} ^{} Ailamazyan Program Systems Institute of RAS
Abstract:
The expected value $E$ of the longest common subsequence of letters in two random words is considered as a function of the $\alpha=A$ of alphabet and of words lengths $m$ and $n$.
It is assumed that each letter independently appears at any position with equal probability.
An approximate analitic expression for $E(\alpha, m, n)$ calculation is presented that allow to calculate the $E (m, n, \alpha) $ with an accuracy of $0.3$ percent for $ 64 \leqslant m + n \leqslant 65,536 $ and $ 1 <\alpha < 129$.
The coefficients are selected by hand and can be refined.
It is expected that the formula holds for each grater values of the argument with the same relative error.
Key words and phrases:
similarity of strings, sequence alignment, edit distance, LCS, Levenshtein metric.
Full text:
PDF file (1547 kB)
References:
PDF file
HTML file
UDC:
004.416
MSC: 68T37; 68P10, 68W32 Received: 24.11.2016 Accepted: 28.12.2016
Citation:
S. V. Znamenskij, “Approximation of the longest common subsequence length for two long random strings”, Program Systems: Theory and Applications, 7:4 (2016), 347–358
Citation in format AMSBIB
\Bibitem{Zna16}
\by S.~V.~Znamenskij
\paper Approximation of the longest common subsequence length for two long random strings
\jour Program Systems: Theory and Applications
\yr 2016
\vol 7
\issue 4
\pages 347358
\mathnet{http://mi.mathnet.ru/ps229}
Linking options:
http://mi.mathnet.ru/eng/ps229 http://mi.mathnet.ru/eng/ps/v7/i4/p347
Citing articles on Google Scholar:
Russian citations,
English citations
Related articles on Google Scholar:
Russian articles,
English articles

Number of views: 
This page:  108  Full text:  30  References:  13 
