PhaseAll: a simple tool for read-based allele phasing

Abstract

Currently used genome assembly algorithms do not perform allele phasing. This can lead to the loss of important information about the genotype of diploid and polyploid individuals. Here we introduce PhaseAll, a simple tool for allele phasing based on short reads obtained by second-generation sequencing. As input, the tool takes paired reads in SAM format. PhaseAll iterates sequentially through each alignment position. When a polymorphic position (SNP, insertion or deletion) is first encountered, a distinct variant is written to each allele. For each subsequent polymorphic position, the tool checks whether it is located on the same pair of reads (one DNA fragment) as the previous one. If two mutations are located on the same fragment, they are considered to belong to the same allele. If no fragment is found that connects a pair of neighboring polymorphic positions, an «X» is written in the allele sequences. This means that the alleles may be swapped at this position.
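The linking step described above can be illustrated with a minimal sketch. It is not PhaseAll's actual code: the function name `phase_two_alleles`, the toy `fragments` dictionary (fragment name mapped to the variants it carries at each position), and the majority-vote rule for conflicting fragments are all assumptions made for illustration. Only the variant positions are written out, not the full allele sequences.

```python
from collections import Counter


def phase_two_alleles(fragments):
    """Phase two alleles from {fragment_name: {position: variant}} (toy input)."""
    # Count the variants observed at every position.
    observed = {}
    for calls in fragments.values():
        for pos, variant in calls.items():
            observed.setdefault(pos, Counter())[variant] += 1

    # A position is treated as polymorphic if exactly two variants are seen.
    polymorphic = sorted(p for p, c in observed.items() if len(c) == 2)

    allele_a, allele_b = [], []
    prev_pos = prev_a = None
    for pos in polymorphic:
        first, second = sorted(observed[pos])
        if prev_pos is not None:
            same = swapped = 0
            for calls in fragments.values():
                # Only fragments covering both positions can link them.
                if prev_pos in calls and pos in calls:
                    on_a = calls[prev_pos] == prev_a
                    if (calls[pos] == first) == on_a:
                        same += 1
                    else:
                        swapped += 1
            if same + swapped == 0:
                # No linking fragment: mark a possible allele swap.
                allele_a.append("X")
                allele_b.append("X")
            elif swapped > same:
                # Assumed tie-break: follow the majority of linking fragments.
                first, second = second, first
        allele_a.append(first)
        allele_b.append(second)
        prev_pos, prev_a = pos, first
    return "".join(allele_a), "".join(allele_b)


# Hypothetical fragments: positions 10-20-30 are linked, position 50 is not.
fragments = {
    "f1": {10: "A", 20: "C"},
    "f2": {10: "G", 20: "T"},
    "f3": {20: "C", 30: "G"},
    "f4": {20: "T", 30: "A"},
    "f5": {50: "T"},
    "f6": {50: "C"},
}
alleles = phase_two_alleles(fragments)
print(alleles)
```

In this toy input the first three positions are chained by shared fragments, while nothing connects position 30 to position 50, so an «X» appears before the last variant in both alleles.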

PhaseAll is written in Python 3. SAM files are processed using the pysam library. PhaseAll is designed to separate only two alleles. To reduce the impact of sequencing errors, the user can set a read depth threshold below which a polymorphic position is skipped. Some indels can cause errors in allele phasing, so PhaseAll has an option to skip indels for more accurate SNP reconstruction. The tool was tested on sequences of agrobacterial origin in the Camellia L. genome in more than 100 samples. PhaseAll is available for download on GitHub: https://github.com/pzhurbenko/PhaseAll
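The two filters mentioned above (a read depth threshold and optional indel skipping) might look roughly like the following sketch. The names `select_polymorphic`, `min_depth` and `skip_indels` are hypothetical and are not taken from PhaseAll's command-line interface. The encoding of a deletion as "-" and of an insertion as a multi-base string is likewise an assumption, and the per-position counts are supplied by hand rather than read from a SAM file with pysam.

```python
from collections import Counter


def is_indel(variant):
    # Assumed encoding: "-" is a deletion, a multi-base string is an insertion.
    return variant == "-" or len(variant) > 1


def select_polymorphic(observed, min_depth=0, skip_indels=False):
    """Return positions that stay polymorphic after the two filters."""
    selected = []
    for pos in sorted(observed):
        counts = observed[pos]
        # Skip positions whose total read depth is below the threshold.
        if sum(counts.values()) < min_depth:
            continue
        # Optionally ignore indel variants before testing for polymorphism.
        variants = [v for v in counts if not (skip_indels and is_indel(v))]
        if len(variants) == 2:
            selected.append(pos)
    return selected


# Hypothetical per-position variant counts.
observed = {
    5: Counter({"A": 6, "G": 5}),    # well-covered SNP
    9: Counter({"C": 1, "T": 1}),    # low-depth SNP
    14: Counter({"T": 7, "-": 6}),   # deletion
    21: Counter({"G": 10}),          # not polymorphic
}
print(select_polymorphic(observed))
print(select_polymorphic(observed, min_depth=4))
print(select_polymorphic(observed, min_depth=4, skip_indels=True))
```

Raising the threshold drops the low-depth position, and enabling indel skipping additionally drops the deletion, leaving only the well-covered SNP.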

The research was supported by RSF (project No. 21-14-00050).


About the authors

Peter M. Zhurbenko

Saint Petersburg State University; Komarov Botanical Institute of the Russian Academy of Sciences

Email: pj_28@mail.ru

Researcher, Laboratory of Biosystematics and Cytology, Junior Researcher, Department of Genetics and Biotechnology

Russian Federation, Saint Petersburg; Saint Petersburg

Fedor N. Klimenko

Federal Scientific Center of Rehabilitation of the Disabled named after G.A. Albrecht

Author for correspondence.
Email: tniknko@gmail.com

Junior Researcher, Department of Innovative Technologies and Technical Means of Rehabilitation, Institute of Prosthetics and Orthotics

Russian Federation, Saint Petersburg


Copyright (c) 2022 Eco-Vector



The mass media outlet is registered by the Federal Service for Supervision of Communications, Information Technology and Mass Media (Roskomnadzor).
Registration number and date of the registration decision: series PI No. FS 77 - 65617 of 04.05.2016.

