PhaseAll: a simple tool for read-based allele phasing
- Authors: Zhurbenko P.M. (1, 2), Klimenko F.N. (3)
- Affiliations:
  - (1) Saint Petersburg State University
  - (2) Komarov Botanical Institute of the Russian Academy of Sciences
  - (3) Federal Scientific Center of Rehabilitation of the Disabled named after G.A. Albrecht
- Issue: Vol 20 (2022): Supplement
- Pages: 32-32
- Section: Genetically modified organism. The History, Achievements, Social and Environmental Risks
- Submitted: 03.11.2022
- Accepted: 04.11.2022
- Published: 08.12.2022
- URL: https://journals.eco-vector.com/ecolgenet/article/view/112363
- DOI: https://doi.org/10.17816/ecogen112363
- ID: 112363
Abstract
Currently used genome assembly algorithms do not provide allele phasing, which can lead to the loss of important information about the genotype of diploid and polyploid individuals. Here we introduce PhaseAll, a simple tool for allele phasing based on short reads obtained by second-generation sequencing. As input, the tool takes paired reads in SAM format. PhaseAll iterates sequentially through each alignment position. When a polymorphic position (SNP, insertion, or deletion) is first encountered, a distinct variant is written to each allele. For each subsequent polymorphic position, the tool checks whether it is located on the same pair of reads (one DNA fragment) as the previous one. If two mutations are located on the same fragment, they are considered to belong to the same allele. If no fragment is found that connects a pair of neighboring polymorphic positions, an «X» is written in the allele sequences, meaning that the alleles may be swapped at this position.
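The linking procedure described above can be illustrated with a minimal sketch. It is not the PhaseAll source code: the function name `phase_two_alleles`, the representation of a fragment as a dictionary mapping position to observed base, the majority vote between conflicting fragments, and the treatment of a tied vote as "no link" are all assumptions made for illustration. The sketch returns only the bases at polymorphic positions rather than full allele sequences.

```python
def phase_two_alleles(fragments, variants):
    """Greedy two-allele phasing sketch (hypothetical helper, not PhaseAll's API).

    fragments: list of dicts {position: observed base}, one dict per read pair.
    variants:  list of (position, (variant_1, variant_2)), sorted by position.
    Returns the two phased allele strings; "X" marks a possible allele swap.
    """
    allele_a, allele_b = [], []
    prev = None  # (position, base on allele A, base on allele B)
    for pos, (v1, v2) in variants:
        if prev is None:
            # First polymorphic position: one variant goes to each allele.
            a, b = v1, v2
        else:
            prev_pos, prev_a, prev_b = prev
            same = swapped = 0
            for frag in fragments:
                # Only fragments covering both neighboring positions can link them.
                if prev_pos not in frag or pos not in frag:
                    continue
                link = (frag[prev_pos], frag[pos])
                if link in ((prev_a, v1), (prev_b, v2)):
                    same += 1
                elif link in ((prev_a, v2), (prev_b, v1)):
                    swapped += 1
            if same == swapped:
                # No linking fragment (or a tied vote, an assumption of this
                # sketch): record that the alleles may swap here.
                allele_a.append("X")
                allele_b.append("X")
                a, b = v1, v2
            elif same > swapped:
                a, b = v1, v2
            else:
                a, b = v2, v1
        allele_a.append(a)
        allele_b.append(b)
        prev = (pos, a, b)
    return "".join(allele_a), "".join(allele_b)


# Toy data: positions 10-25 and 40-55 are linked, but nothing links 25 to 40.
fragments = [
    {10: "A", 25: "C"},
    {10: "G", 25: "T"},
    {40: "T", 55: "A"},
    {40: "G", 55: "C"},
]
variants = [(10, ("A", "G")), (25, ("C", "T")), (40, ("G", "T")), (55, ("C", "A"))]
print(phase_two_alleles(fragments, variants))  # ('ACXGC', 'GTXTA')
```

In the toy data, the «X» between the second and third variants shows that the phase established on either side of the gap is reliable, while the relative orientation of the two blocks is unknown.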
PhaseAll is written in Python 3. SAM files are processed using the pysam library. PhaseAll is designed to separate only two alleles. To reduce the impact of sequencing errors, the user can set a read depth threshold below which a polymorphic position is skipped. Some indels can cause errors in allele phasing, so PhaseAll has an option to skip indels for more accurate SNP reconstruction. The tool was tested on sequences of agrobacterial origin in the Camellia L. genome in more than 100 samples. PhaseAll is available for download on GitHub: https://github.com/pzhurbenko/PhaseAll
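The depth threshold and the indel-skipping option can be sketched as a simple filter over per-position observations. This is an illustrative assumption rather than the tool's actual interface: the function `polymorphic_positions`, its parameters `min_depth` and `skip_indels`, and the encoding of indels as multi-character strings or "-" are hypothetical, and plain Python dictionaries stand in for the pysam-based SAM parsing that PhaseAll uses.

```python
from collections import Counter


def polymorphic_positions(pileup, min_depth=10, skip_indels=False):
    """Select positions with two variants and enough coverage (hypothetical helper).

    pileup: dict {position: list of observed alleles, one per read}.
            In this sketch an indel is any observation that is "-" or
            longer than one character (an assumed encoding).
    Returns {position: (most common variant, second most common variant)}.
    """
    selected = {}
    for pos in sorted(pileup):
        observed = pileup[pos]
        if skip_indels:
            # Keep single-base observations only; depth is then counted
            # on what remains (a design choice of this sketch).
            observed = [o for o in observed if len(o) == 1 and o != "-"]
        if len(observed) < min_depth:
            continue  # too few reads to trust this position
        counts = Counter(observed)
        if len(counts) >= 2:
            # Sort by descending count, then alphabetically for stable ties.
            ranked = sorted(counts, key=lambda o: (-counts[o], o))
            selected[pos] = (ranked[0], ranked[1])
    return selected


pileup = {
    5: ["A"] * 6 + ["G"] * 5,     # SNP with depth 11
    9: ["C", "T"],                # SNP with depth 2, below the threshold
    12: ["T"] * 12,               # monomorphic position
    20: ["A"] * 6 + ["AT"] * 6,   # insertion seen in half of the reads
}
print(polymorphic_positions(pileup, min_depth=10))
print(polymorphic_positions(pileup, min_depth=10, skip_indels=True))
```

The sketch applies no minor-variant frequency filter, so a single erroneous read at a well-covered position would still be reported; a real implementation may handle this differently.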
The research was supported by RSF (project No. 21-14-00050).
About the authors
Peter M. Zhurbenko
Saint Petersburg State University; Komarov Botanical Institute of the Russian Academy of Sciences
Email: pj_28@mail.ru
Researcher, Laboratory of Biosystematics and Cytology, Junior Researcher, Department of Genetics and Biotechnology
Russian Federation, Saint Petersburg; Saint Petersburg
Fedor N. Klimenko
Federal Scientific Center of Rehabilitation of the Disabled named after G.A. Albrecht
Author for correspondence.
Email: tniknko@gmail.com
Junior Researcher, Department of Innovative Technologies and Technical Means of Rehabilitation, Institute of Prosthetics and Orthotics
Russian Federation, Saint Petersburg