Fast trie-based method for multiple pairwise sequence alignment
- Authors: Yakovlev P.A.
- Affiliations: Closed Joint Stock Company “Biocad”
- Issue: Vol 484, No 4 (2019)
- Pages: 401-404
- Section: Mathematics
- URL: https://journals.eco-vector.com/0869-5652/article/view/12545
- DOI: https://doi.org/10.31857/S0869-56524844401-404
- ID: 12545
Abstract
A method is presented for efficiently comparing a symbol sequence with every string in a set; it runs considerably faster than naively comparing the sequence with each string in turn. The speedup comes from an original algorithm that combines a prefix tree with the standard dynamic programming algorithm for computing the edit distance (Levenshtein distance) between strings. The efficiency of the method is confirmed by numerical experiments on data sets of tens of millions of biological sequences of monoclonal antibody variable domains.
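The general idea of combining a prefix tree with the edit-distance dynamic program can be illustrated with a minimal sketch. It is not the algorithm from the paper, whose details are not given in the abstract; it shows only the well-known technique of computing one row of the Levenshtein table per trie node, so that strings sharing a prefix share the rows for that prefix. The names `TrieNode`, `build_trie`, `search`, and the threshold parameter `max_dist` are assumptions made for this illustration.

```python
from typing import Dict, List, Optional, Tuple


class TrieNode:
    """One prefix-tree node; `word` is set on nodes where a stored string ends."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.word: Optional[str] = None


def build_trie(strings: List[str]) -> TrieNode:
    """Insert every string into a prefix tree and return its root."""
    root = TrieNode()
    for s in strings:
        node = root
        for ch in s:
            node = node.children.setdefault(ch, TrieNode())
        node.word = s
    return root


def search(root: TrieNode, query: str, max_dist: int) -> List[Tuple[str, int]]:
    """Return (string, distance) for each stored string within max_dist of query."""
    results: List[Tuple[str, int]] = []
    # Row for the empty prefix: turning "" into query[:j] costs j insertions.
    first_row = list(range(len(query) + 1))
    if root.word is not None and first_row[-1] <= max_dist:
        results.append((root.word, first_row[-1]))
    # Explicit stack instead of recursion, so long sequences cannot
    # exceed Python's recursion limit.
    stack = [(child, ch, first_row) for ch, child in root.children.items()]
    while stack:
        node, ch, prev = stack.pop()
        # Extend the table by one row for the prefix that ends at this node.
        row = [prev[0] + 1]
        for j in range(1, len(query) + 1):
            row.append(min(
                row[j - 1] + 1,                      # insertion
                prev[j] + 1,                         # deletion
                prev[j - 1] + (query[j - 1] != ch),  # match or substitution
            ))
        if node.word is not None and row[-1] <= max_dist:
            results.append((node.word, row[-1]))
        # The row minimum never decreases with depth, so a subtree whose
        # row minimum already exceeds max_dist cannot contain a match.
        if min(row) <= max_dist:
            stack.extend((c, k, row) for k, c in node.children.items())
    return results
```

For example, with the stored strings `"kitten"`, `"sitting"`, `"mitten"`, and `"kit"`, calling `search(build_trie([...]), "kitten", 2)` returns `"kitten"` at distance 0 and `"mitten"` at distance 1, in an unspecified order. The saving over the naive approach comes from two sources: each shared prefix is processed once rather than once per string, and whole subtrees are skipped when the threshold rules them out.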
About the authors
P. A. Yakovlev
Closed Joint Stock Company “Biocad”
Author for correspondence.
Email: yakovlev@biocad.ru
Russian Federation, Saint Petersburg