Program Systems: Theory and Applications, 2016, Volume 7, Issue 1, Pages 201–208
(Mi ps207)
This article is cited in 2 scientific papers (total in 2 papers)
Mathematical Foundations of Programming
A picture of common subsequence length for two random strings over an alphabet of 4 symbols
S. V. Znamenskij, Ailamazyan Program System Institute of RAS
Abstract:
The length of the longest common subsequence (LCS) of a pair of random finite sequences over an alphabet of 4 characters is considered as a random function of the sequence lengths $m$ and $n$.
Tables of exact probability distributions are presented for all pairs of lengths in the range $2<m+n<19$.
Graphs of the expected value and standard deviation as functions of the lengths are shown in linear perspective, which displays the behaviour for large lengths at the horizon.
To illustrate the behaviour for large lengths, results of numerical simulation for $m+n=32$, 512, 8192 and 131072 are also shown on the same graphs.
The graph of the expected value as a function of $m$ and $n$ appears to be asymptotic to a right circular cone.
The variance appears to grow as $(n+m)^{\frac34}$.
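The two kinds of data mentioned in the abstract, exact distributions for short lengths and simulated values for long ones, can be illustrated with a minimal Python sketch. It is not the author's code: the function names `lcs_length`, `exact_distribution` and `simulated_mean`, the symbol labels `ACGT`, and the assumption that symbols are independent and uniformly distributed are all chosen here for illustration. Brute-force enumeration visits all $4^{m+n}$ pairs, so it is only practical for very small $m+n$.

```python
import itertools
import random
from collections import Counter
from fractions import Fraction

ALPHABET = "ACGT"  # assumed labels for the 4 symbols; any 4 distinct symbols work


def lcs_length(a, b):
    """Length of the longest common subsequence, by the standard row-by-row DP."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def exact_distribution(m, n, alphabet=ALPHABET):
    """Exact distribution of the LCS length over all pairs of strings of
    lengths m and n, assuming independent uniform symbols (brute force)."""
    counts = Counter()
    for a in itertools.product(alphabet, repeat=m):
        for b in itertools.product(alphabet, repeat=n):
            counts[lcs_length(a, b)] += 1
    total = len(alphabet) ** (m + n)
    return {k: Fraction(c, total) for k, c in sorted(counts.items())}


def simulated_mean(m, n, trials, seed=0, alphabet=ALPHABET):
    """Monte Carlo estimate of the expected LCS length (hypothetical helper)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        a = rng.choices(alphabet, k=m)
        b = rng.choices(alphabet, k=n)
        total += lcs_length(a, b)
    return total / trials
```

For example, `exact_distribution(1, 2)` gives the LCS length 1 with probability $1-(3/4)^2 = 7/16$, since a single symbol fails to match each of two independent symbols with probability $3/4$. The quadratic-time dynamic programme is adequate for a sketch but would be slow in pure Python for lengths such as $m+n=131072$.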
Key words and phrases:
similarity of strings, sequence alignment, edit distance, LCS, Levenshtein metric.
Received: 25.12.2015 Accepted: 28.03.2016
Citation:
S. V. Znamenskij, “A picture of common subsequence length for two random strings over an alphabet of 4 symbols”, Program Systems: Theory and Applications, 7:1 (2016), 201–208
Linking options:
https://www.mathnet.ru/eng/ps207 https://www.mathnet.ru/eng/ps/v7/i1/p201