Fast trie-based method for multiple pairwise sequence alignment

Cover Page

Abstract


A method for efficient comparison of a symbol sequence with all strings of a set is presented, which performs considerably faster than the naive enumeration of comparisons with all strings in succession. The procedure is accelerated by applying an original algorithm combining a prefix tree and a standard dynamic programming algorithm searching for the edit distance (Levenshtein distance) between strings. The efficiency of the method is confirmed by numerical experiments with arrays consisting of tens of millions of biological sequences of variable domains of monoclonal antibodies.


About the authors

P. A. Yakovlev

Closed Joint Stock Company “Biocad”

Author for correspondence.
Email: yakovlev@biocad.ru

Russian Federation, Saint Petersburg

References

  1. Левенштейн В.И. // ДАН. 1965. Т. 163. № 4. С. 845–848.
  2. Needleman S.B., Wunsch C.D. // J. Mol. Biol. 1970. V. 48. № 3. P. 443–453.
  3. Damerau F.J. // Commun. ACM. 1964. V. 7. № 3. P. 171–176.
  4. Smith T.F., Waterman M.S. // J. Mol. Biol. 1981. V. 147. № 1. P. 195–197.
  5. Brudno M., et al. // Bioinformatics. 2003. V. 19. P. 54–62.
  6. Wagner R.A., Fischer M.J. // J. ACM. 1974. V. 21. № 1. P. 168–173.
  7. Aho A.V., Corasick M.J. // Commun. ACM. 1975. V. 18. № 6. P. 333–340.
  8. Cohen J.D. // ACM Trans. Inf. Sys. 1997. V. 15. № 3. P. 291–320.
  9. Market E., Papavasiliou F.N. // PLoS Biol. 2003. V. 1. № 1. P. 24–27.

Statistics

Views

Abstract - 266

PDF (Russian) - 401

PlumX


Copyright (c) 2019 Russian academy of sciences

This website uses cookies

You consent to our cookies if you continue to use our website.

About Cookies