PhaseAll: a simple tool for read-based allele phasing
- Authors: Zhurbenko P.M.1,2, Klimenko F.N.3
- Affiliations:
- Saint Petersburg State University
- Komarov Botanical Institute of the Russian Academy of Sciences
- Federal Scientific Center of Rehabilitation of the Disabled named after G.A. Albrecht
- Issue: Vol. 20 (2022): Special Issue
- Pages: 32-32
- Section: "GMO: History, Achievements, Social and Environmental Risks"
- Received: 03.11.2022
- Accepted: 04.11.2022
- Published: 08.12.2022
- URL: https://journals.eco-vector.com/ecolgenet/article/view/112363
- DOI: https://doi.org/10.17816/ecogen112363
- ID: 112363
Abstract
Currently used genome assembly algorithms do not perform allele phasing, which can lead to the loss of important information about the genotype of diploid and polyploid individuals. Here we introduce PhaseAll, a simple tool for allele phasing based on short reads obtained by second-generation sequencing. As input, the tool takes paired reads in SAM format. PhaseAll iterates sequentially through each alignment position. When a polymorphic position (SNP, insertion, or deletion) is first encountered, a distinct variant is written to each allele. For each subsequent polymorphic position, the tool checks whether it is covered by the same pair of reads (one DNA fragment) as the previous one. If two mutations are located on the same fragment, they are considered to belong to the same allele. If no fragment is found that connects a pair of neighboring polymorphic positions, an «X» is written in the allele sequences, meaning that the alleles can swap at this position.
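The linking step described above can be illustrated with a minimal sketch. It is not taken from the PhaseAll source: the function name `phase_two_alleles`, the `sites` and `fragments` data structures, and the majority-vote tie handling are all assumptions made for illustration, and the sketch works on already extracted per-fragment observations rather than on a SAM file.

```python
from collections import Counter


def phase_two_alleles(sites, fragments):
    """Phase polymorphic sites into two alleles using shared fragments.

    sites: list of (position, (variant_a, variant_b)), sorted by position
        (hypothetical input format).
    fragments: list of dicts, one per read pair (one DNA fragment),
        mapping position -> variant observed on that fragment.
    Returns two lists of variants; "X" marks a point where phasing is lost.
    """
    allele_1, allele_2 = [], []
    prev_pos = prev_1 = prev_2 = None
    for pos, (var_a, var_b) in sites:
        if prev_pos is None:
            # First polymorphic site: give each allele its own variant.
            cur_1, cur_2 = var_a, var_b
        else:
            # Count fragments that cover both the previous and current site.
            votes = Counter()  # variant proposed for allele 1
            for frag in fragments:
                seen_prev, seen_cur = frag.get(prev_pos), frag.get(pos)
                if seen_cur not in (var_a, var_b):
                    continue
                other = var_b if seen_cur == var_a else var_a
                if seen_prev == prev_1:
                    votes[seen_cur] += 1
                elif seen_prev == prev_2:
                    votes[other] += 1
            if votes:
                # Assumption: majority vote; ties keep the first variant seen.
                cur_1 = votes.most_common(1)[0][0]
                cur_2 = var_b if cur_1 == var_a else var_a
            else:
                # No linking fragment: the alleles may swap from here on.
                allele_1.append("X")
                allele_2.append("X")
                cur_1, cur_2 = var_a, var_b
        allele_1.append(cur_1)
        allele_2.append(cur_2)
        prev_pos, prev_1, prev_2 = pos, cur_1, cur_2
    return allele_1, allele_2


# Toy data: sites 10 and 25 share fragments, site 90 is unlinked.
sites = [(10, ("A", "G")), (25, ("C", "T")), (90, ("G", "-"))]
fragments = [{10: "A", 25: "T"}, {10: "G", 25: "C"}, {25: "T"}, {90: "G"}]
print(phase_two_alleles(sites, fragments))
```

In this toy input the first two sites are joined by two fragments, so their variants end up on the same alleles, while the third site has no linking fragment and is therefore preceded by «X» in both allele sequences.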
PhaseAll is written in Python 3, and SAM files are processed using the pysam library. PhaseAll is designed to separate only two alleles. To reduce the impact of sequencing errors, the user can set a read depth threshold below which a polymorphic position is skipped. Some indels can cause errors in allele phasing, so PhaseAll has an option to skip indels for more accurate SNP reconstruction. The tool was tested on sequences of agrobacterial origin in the genome of Camellia L. in more than 100 samples. PhaseAll is available for download on GitHub: https://github.com/pzhurbenko/PhaseAll
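As a rough illustration of the two filtering options, the sketch below decides whether a single alignment column should be used for phasing. Everything here is hypothetical: the function `usable_variants`, the parameters `min_depth` and `skip_indels`, the encoding of a deletion as "-" and an insertion as a multi-character string, and the choice to apply the threshold to total read depth at the position are assumptions, not PhaseAll's actual options or behavior.

```python
from collections import Counter


def usable_variants(column, min_depth=10, skip_indels=False):
    """Return the two most frequent variants in a column, or None to skip.

    column: list of variants observed at one alignment position,
        one entry per read (hypothetical input format).
    """
    counts = Counter(column)
    if sum(counts.values()) < min_depth:
        return None  # too few reads to trust this position
    top = counts.most_common(2)
    if len(top) < 2:
        return None  # only one variant observed: not polymorphic
    var_a, var_b = top[0][0], top[1][0]
    # Assumed encoding: "-" is a deletion, longer strings are insertions.
    if skip_indels and any(v == "-" or len(v) != 1 for v in (var_a, var_b)):
        return None
    return var_a, var_b


print(usable_variants(["A"] * 6 + ["G"] * 5))
print(usable_variants(["A", "G"]))
print(usable_variants(["A"] * 6 + ["-"] * 6, skip_indels=True))
```

Positions for which the function returns None would simply be left out of the list of polymorphic sites passed to the phasing step.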
The research was supported by RSF (project No. 21-14-00050).
About the authors
Peter Zhurbenko
Saint Petersburg State University; Komarov Botanical Institute of the Russian Academy of Sciences
Email: pj_28@mail.ru
Researcher, Laboratory of Biosystematics and Cytology, Junior Researcher, Department of Genetics and Biotechnology
Russia, Saint Petersburg; Saint Petersburg
Fedor Klimenko
Federal Scientific Center of Rehabilitation of the Disabled named after G.A. Albrecht
Corresponding author.
Email: tniknko@gmail.com
Junior Researcher, Department of Innovative Technologies and Technical Means of Rehabilitation, Institute of Prosthetics and Orthotics
Russia, Saint Petersburg