Review of heterozygosity visualization approaches in the context of conservation research

Abstract

Heterozygosity is one of the key metrics in conservation biology because it informs the design of conservation programs for endangered species. Whole-genome sequencing technologies now make it possible to estimate heterozygosity more accurately, not only for individual organisms but also at the population and species levels. Contemporary conservation studies process large volumes of whole-genome data, which complicates interpretation and calls for modern visualization methods that present results clearly and correctly. In this review, we examine the main types of visualization used for heterozygosity estimates obtained with various approaches. We outline the theory underlying each visualization method and discuss its characteristics using examples from studies of non-model species with different conservation statuses. The review also surveys current tools for heterozygosity assessment and subsequent visualization, as well as current trends in this field.

About the authors

Andrey A. Tomarovsky

Institute of Molecular and Cellular Biology, Siberian Branch, Russian Academy of Sciences; Novosibirsk State University

Author for correspondence.
Email: andrey.tomarovsky@gmail.com
ORCID iD: 0000-0002-6414-704X
SPIN-code: 6727-8664
Scopus Author ID: 57264872500
Russian Federation, Novosibirsk; Novosibirsk

Azamat A. Totikov

Institute of Molecular and Cellular Biology, Siberian Branch, Russian Academy of Sciences; Novosibirsk State University

Email: a.totickov1@gmail.com
ORCID iD: 0000-0003-1236-631X
SPIN-code: 9767-3971
Scopus Author ID: 57265434800
Russian Federation, Novosibirsk; Novosibirsk

Aliya R. Yakupova

Email: aliyah.yakupova@gmail.com
ORCID iD: 0000-0003-1486-0864
SPIN-code: 4292-0609
Scopus Author ID: 57264122200

independent researcher

Germany

Alexander S. Graphodatsky

Institute of Molecular and Cellular Biology, Siberian Branch, Russian Academy of Sciences

Email: graf@mcb.nsc.ru
ORCID iD: 0000-0002-8282-1085
SPIN-code: 4436-9033
Scopus Author ID: 7003878913

Dr. Sci. (Biology)

Russian Federation, Novosibirsk

Sergei F. Kliver

Email: mahajrod@gmail.com
ORCID iD: 0000-0002-2965-3617
SPIN-code: 8635-4259
Scopus Author ID: 56449314300

independent researcher

Denmark

References

  1. Soulé ME. What is conservation biology? A new synthetic discipline addresses the dynamics and problems of perturbed species, communities, and ecosystems. BioSci. 1985;35(11):727–734. doi: 10.2307/1310054
  2. Fuentes-Pardo AP, Ruzzante DE. Whole-genome sequencing approaches for conservation biology: Advantages, limitations and practical recommendations. Mol Ecol. 2017;26(20):5369–5406. doi: 10.1111/mec.14264
  3. Hoban S, Kelley JL, Lotterhos KE, et al. Finding the genomic basis of local adaptation: Pitfalls, practical solutions, and future directions. Am Nat. 2016;188(4):379–397. doi: 10.1086/688018
  4. Hoban S, da Silva JM, Mastretta-Yanes A, et al. Monitoring status and trends in genetic diversity for the Convention on Biological Diversity: An ongoing assessment of genetic indicators in nine countries. Conserv Lett. 2023;16(3):e12953. doi: 10.1111/conl.12953
  5. Ng PC, Kirkness EF. Whole genome sequencing. In: Barnes MR, Breen G, editors. Genetic variation: methods and protocols. Totowa, NJ: Humana Press, 2010. P. 215–226. doi: 10.1007/978-1-60327-367-1_12
  6. Breed MF, Harrison PA, Blyth C, et al. The potential of genomics for restoring ecosystems and biodiversity. Nat Rev Genet. 2019;20(10):615–628. doi: 10.1038/s41576-019-0152-0
  7. Kliver SF. Whole genome approach in conservation biology and its perspectives. Ecological genetics. 2021;19(3):281–298. doi: 10.17816/ecogen65152
  8. Joop Ouborg N, Angeloni F, Vergeer P. An essay on the necessity and feasibility of conservation genomics. Conserv Genet. 2010;11(2):643–653. doi: 10.1007/s10592-009-0016-9
  9. Dudchenko O, Shamim MS, Batra SS, et al. The Juicebox assembly tools module facilitates de novo assembly of mammalian genomes with chromosome-length scaffolds for under $1000. bioRxiv. 2018;254797. doi: 10.1101/254797
  10. Durand NC, Robinson JT, Shamim MS, et al. Juicebox provides a visualization system for Hi-C contact maps with unlimited zoom. Cell Syst. 2016;3(1):99–101. doi: 10.1016/j.cels.2015.07.012
  11. Luikart G, England PR, Tallmon D, et al. The power and promise of population genomics: from genotyping to genome typing. Nat Rev Genet. 2003;4(12):981–994. doi: 10.1038/nrg1226
  12. Campbell MR, Vu NV, LaGrange AP, et al. Development and application of single-nucleotide polymorphism (SNP) genetic markers for conservation monitoring of burbot populations. Trans Am Fish Soc. 2019;148(3):661–670. doi: 10.1002/tafs.10157
  13. Bijlsma R, Loeschcke V. Genetic erosion impedes adaptive responses to stressful environments. Evol Appl. 2012;5(2):117–129. doi: 10.1111/j.1752-4571.2011.00214.x
  14. Leroy G, Carroll EL, Bruford MW, et al. Next-generation metrics for monitoring genetic erosion within populations of conservation concern. Evol Appl. 2018;11(7):1066–1083. doi: 10.1111/eva.12564
  15. Frankham R, Ballou JD, Eldridge MD, et al. Predicting the probability of outbreeding depression. Conserv Biol. 2011;25(3):465–475. doi: 10.1111/j.1523-1739.2011.01662.x
  16. Charlesworth D, Willis JH. The genetics of inbreeding depression. Nat Rev Genet. 2009;10(11):783–796. doi: 10.1038/nrg2664
  17. Mayr E. Populations, species and evolution. Belknap Press. 453 p.
  18. Tomimatsu H, Ohara M. Genetic diversity and local population structure of fragmented populations of Trillium camschatcense (Trilliaceae). Biol Conserv. 2003;109(2):249–258. doi: 10.1016/S0006-3207(02)00153-2
  19. Hanski I. The shrinking world: Ecological consequences of habitat loss. Excell Ecol. 2005;14.
  20. Lande R, Barrowclough G. Effective population size, genetic variation, and their use in population management. In: Soulé M, editor. Viable populations for conservation. Cambridge: Cambridge University Press, 1987. P. 87–124. doi: 10.1017/CBO9780511623400.007
  21. Wright S. Random drift and the shifting balance theory of evolution. In: Kojima K, editor. Mathematical topics in population genetics. Berlin, Heidelberg: Springer, 1970. P. 1–31. doi: 10.1007/978-3-642-46244-3_1
  22. Nevo E. Genetic variation in natural populations: Patterns and theory. Theor Popul Biol. 1978;13(1):121–177. doi: 10.1016/0040-5809(78)90039-4
  23. Lewontin R. The genetic basis of evolutionary change. Columbia University Press, 1974. 346 p.
  24. Steiner CC, Putnam AS, Hoeck PEA, Ryder OA. Conservation genomics of threatened animal species. Annu Rev Anim Biosci. 2013;1(1):261–281. doi: 10.1146/annurev-animal-031412-103636
  25. Weir BS. Genetic data analysis II: Methods for discrete population genetic data. Oxford, New York: Oxford University Press, 1996. 445 p.
  26. Ritland K. Estimators for pairwise relatedness and individual inbreeding coefficients. Genet Res. 1996;67(2):175–185. doi: 10.1017/S0016672300033620
  27. Nei M, Li WH. Mathematical model for studying genetic variation in terms of restriction endonucleases. PNAS. 1979;76(10):5269–5273. doi: 10.1073/pnas.76.10.5269
  28. Wright S. The interpretation of population structure by F-statistics with special regard to systems of mating. Evolution. 1965;19(3):395–420. doi: 10.2307/2406450
  29. Shafer ABA, Wolf JBW, Alves PC, et al. Genomics and the challenging translation into conservation practice. Trends Ecol Evol. 2015;30(2):78–87. doi: 10.1016/j.tree.2014.11.009
  30. Hoffmann A, Griffin P, Dillon S, et al. A framework for incorporating evolutionary genomics into biodiversity conservation and management. Clim Change Responses. 2015;2(1):1. doi: 10.1186/s40665-014-0009-x
  31. Benestan LM, Ferchaud A-L, Hohenlohe PA, et al. Conservation genomics of natural and managed populations: building a conceptual and practical framework. Mol Ecol. 2016;25(13):2967–2977. doi: 10.1111/mec.13647
  32. Hoban S, Gaggiotti O, ConGRESS Consortium, Bertorelle G. Sample planning optimization tool for conservation and population genetics (SPOTG): a software for choosing the appropriate number of markers and samples. Methods Ecol Evol. 2013;4(3):299–303. doi: 10.1111/2041-210x.12025
  33. Nazareno AG, Bemmels JB, Dick CW, Lohmann LG. Minimum sample sizes for population genomics: an empirical study from an Amazonian plant species. Mol Ecol Resour. 2017;17(6):1136–1147. doi: 10.1111/1755-0998.12654
  34. Gibson J, Morton NE, Collins A. Extended tracts of homozygosity in outbred human populations. Hum Mol Genet. 2006;15(5):789–795. doi: 10.1093/hmg/ddi493
  35. McQuillan R, Leutenegger A-L, Abdel-Rahman R, et al. Runs of homozygosity in European populations. Am J Hum Genet. 2008;83(3):359–372. doi: 10.1016/j.ajhg.2008.08.007
  36. Darwin C. The effects of cross and self fertilisation in the vegetable kingdom. AMS Press Inc, 1877. doi: 10.5962/bhl.title.104481
  37. Ceballos FC, Joshi PK, Clark DW, et al. Runs of homozygosity: windows into population history and trait architecture. Nat Rev Genet. 2018;19(4):220–234. doi: 10.1038/nrg.2017.109
  38. Hoffman JI, Simpson F, David P, et al. High-throughput sequencing reveals inbreeding depression in a natural population. PNAS. 2014;111(10):3775–3780. doi: 10.1073/pnas.1318945111
  39. Muir WM, Wong GK-S, Zhang Y, et al. Genome-wide assessment of worldwide chicken SNP genetic diversity indicates significant absence of rare alleles in commercial breeds. PNAS. 2008;105(45):17312–17317. doi: 10.1073/pnas.0806569105
  40. Urbinati I, Stafuzza NB, Oliveira MT, et al. Selection signatures in Canchim beef cattle. J Anim Sci Biotechnol. 2016;7(1):29. doi: 10.1186/s40104-016-0089-5
  41. Samuels DC, Wang J, Ye K, et al. Heterozygosity ratio, a robust global genomic measure of autozygosity and its association with height and disease risk. Genetics. 2016;204(3):893–904. doi: 10.1534/genetics.116.189936
  42. Li N, Stephens M. Modeling linkage disequilibrium and identifying recombination hotspots using single-nucleotide polymorphism data. Genetics. 2003;165(4):2213–2233. doi: 10.1093/genetics/165.4.2213
  43. Rife DC. Populations of hybrid origin as source material for the detection of linkage. Am J Hum Genet. 1954;6(1):26–33.
  44. Robinson JA, Räikkönen J, Vucetich LM, et al. Genomic signatures of extensive inbreeding in Isle Royale wolves, a population on the threshold of extinction. Sci Adv. 2019;5(5):eaau0757. doi: 10.1126/sciadv.aau0757
  45. Koepfli K-P, Tamazian G, Wildt D, et al. Whole genome sequencing and re-sequencing of the sable antelope (Hippotragus niger): A resource for monitoring diversity in ex situ and in situ populations. G3 Genes Genomes Genetics. 2019;9(6):1785–1793. doi: 10.1534/g3.119.400084
  46. Big Soviet Encyclopedia. Vol. 20. 3rd ed. 1974. P. 25. (In Russ.)
  47. Zhu L, Deng C, Zhao X, et al. Endangered Père David’s deer genome provides insights into population recovering. Evol Appl. 2018;11(10):2040–2053. doi: 10.1111/eva.12705
  48. Beichman AC, Koepfli K-P, Li G, et al. Aquatic adaptation and depleted diversity: A deep dive into the genomes of the sea otter and giant otter. Mol Biol Evol. 2019;36(12):2631–2655. doi: 10.1093/molbev/msz101
  49. Abascal F, Corvelo A, Cruz F, et al. Extreme genomic erosion after recurrent demographic bottlenecks in the highly endangered Iberian lynx. Genome Biol. 2016;17(1):251. doi: 10.1186/s13059-016-1090-1
  50. Cho YS, Hu L, Hou H, et al. The tiger genome and comparative analysis with lion and snow leopard genomes. Nat Commun. 2013;4(1):2433. doi: 10.1038/ncomms3433
  51. Miller W, Schuster SC, Welch AJ, et al. Polar and brown bear genomes reveal ancient admixture and demographic footprints of past climate change. PNAS. 2012;109(36):E2382–E2390. doi: 10.1073/pnas.1210506109
  52. Venn JI. On the diagrammatic and mechanical representation of propositions and reasonings. Lond Edinb Dublin Philos Mag J Sci. 1880;10(59):1–18. doi: 10.1080/14786448008626877
  53. Miller W, Hayes VM, Ratan A, et al. Genetic diversity and population structure of the endangered marsupial Sarcophilus harrisii (Tasmanian devil). PNAS. 2011;108(30):12348–12353. doi: 10.1073/pnas.1102838108
  54. Humble E, Dobrynin P, Senn H, et al. Chromosomal-level genome assembly of the scimitar-horned oryx: Insights into diversity and demography of a species extinct in the wild. Mol Ecol Resour. 2020;20(6):1668–1681. doi: 10.1111/1755-0998.13181
  55. Yakupova A, Tomarovsky A, Totikov A, et al. Chromosome-length assembly of the Baikal seal (Pusa sibirica) genome reveals a historically large population prior to isolation in Lake Baikal. Genes. 2023;14(3):619. doi: 10.3390/genes14030619
  56. Kliver S, Houk ML, Perelman PL, et al. Chromosome-length genome assembly and karyotype of the endangered black-footed ferret (Mustela nigripes). J Hered. 2023;114(5):539–548. doi: 10.1093/jhered/esad035
  57. Li R, Fan W, Tian G, et al. The sequence and de novo assembly of the giant panda genome. Nature. 2010;463(7279):311–317. doi: 10.1038/nature08696
  58. Dobrynin P, Liu S, Tamazian G, et al. Genomic legacy of the African cheetah, Acinonyx jubatus. Genome Biol. 2015;16(1):277. doi: 10.1186/s13059-015-0837-4
  59. Lindblad-Toh K, Wade CM, Mikkelsen TS, et al. Genome sequence, comparative analysis and haplotype structure of the domestic dog. Nature. 2005;438(7069):803–819. doi: 10.1038/nature04338
  60. Benjamini Y. Opening the box of a boxplot. Am Stat. 1988;42(4):257–262. doi: 10.2307/2685133
  61. Totikov A, Tomarovsky A, Prokopov D, et al. Chromosome-level genome assemblies expand capabilities of genomics for conservation biology. Genes. 2021;12(9):1336. doi: 10.3390/genes12091336
  62. Hintze JL, Nelson RD. Violin plots: A box plot-density trace synergism. Am Stat. 1998;52(2):181–184. doi: 10.1080/00031305.1998.10480559
  63. de Manuel M, Barnett R, Sandoval-Velasco M, et al. The evolutionary history of extinct and living lions. PNAS. 2020;117(20):10927–10934. doi: 10.1073/pnas.1919423117
  64. Burton JN, Adey A, Patwardhan RP, et al. Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions. Nat Biotechnol. 2013;31(12):1119–1125. doi: 10.1038/nbt.2727
  65. Lewin HA, Graves JAM, Ryder OA, et al. Precision nomenclature for the new genomics. GigaScience. 2019;8(8):giz086. doi: 10.1093/gigascience/giz086
  66. Wilkinson L, Friendly M. The history of the cluster heat map. Am Stat. 2009;63(2):179–184. doi: 10.1198/tas.2009.0033
  67. de Ferran V, Figueiro HV, de Jesus Trindade F, et al. Phylogenomics of the world’s otters. Curr Biol. 2022;32(16):3650–3658.e4. doi: 10.1016/j.cub.2022.06.036
  68. Needleman SB, Wunsch CD. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol. 1970;48(3):443–453. doi: 10.1016/0022-2836(70)90057-4
  69. Smith TF, Waterman MS. Identification of common molecular subsequences. J Mol Biol. 1981;147(1):195–197. doi: 10.1016/0022-2836(81)90087-5
  70. Katoh K, Standley DM. MAFFT Multiple sequence alignment software version 7: Improvements in performance and usability. Mol Biol Evol. 2013;30(4):772–780. doi: 10.1093/molbev/mst010
  71. Magis C, Taly J-F, Bussotti G, et al. T-Coffee: tree-based consistency objective function for alignment evaluation. In: Russell DJ, editor. Multiple sequence alignment methods. Totowa, NJ: Humana Press, 2014. P. 117–129. doi: 10.1007/978-1-62703-646-7_7
  72. Edgar RC. MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics. 2004;5(1):113. doi: 10.1186/1471-2105-5-113
  73. Altschul SF, Gish W, Miller W, et al. Basic local alignment search tool. J Mol Biol. 1990;215(3):403–410. doi: 10.1016/S0022-2836(05)80360-2
  74. Thompson JD, Gibson TJ, Higgins DG. Multiple sequence alignment using ClustalW and ClustalX. Curr Protoc Bioinforma. 2003;1:2–3. doi: 10.1002/0471250953.bi0203s00
  75. Altshuler D, Donnelly P; The International HapMap Consortium. A haplotype map of the human genome. Nature. 2005;437(7063):1299–1320. doi: 10.1038/nature04226
  76. Durbin RM; The 1000 Genomes Project Consortium, et al. A map of human genome variation from population-scale sequencing. Nature. 2010;467(7319):1061–1073. doi: 10.1038/nature09534
  77. Nusrat S, Harbig T, Gehlenborg N. Tasks, techniques, and tools for genomic data visualization. Comput Graph Forum. 2019;38(3):781–805. doi: 10.1111/cgf.13727
  78. Karolchik D, Baertsch R, Diekhans M, et al. The UCSC genome browser database. Nucleic Acids Res. 2003;31(1):51–54. doi: 10.1093/nar/gkg129
  79. Thorvaldsdóttir H, Robinson JT, Mesirov JP. Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Brief Bioinform. 2013;14(2):178–192. doi: 10.1093/bib/bbs017
  80. Yates AD, Achuthan P, Akanni W, et al. Ensembl 2020. Nucleic Acids Res. 2020;48(D1):D682–D688. doi: 10.1093/nar/gkz966
  81. Okonechnikov K, Golosova O, Fursov M, et al. Unipro UGENE: a unified bioinformatics toolkit. Bioinformatics. 2012;28(8):1166–1167. doi: 10.1093/bioinformatics/bts091
  82. Danecek P, Auton A, Abecasis G, et al. The variant call format and VCFtools. Bioinformatics. 2011;27(15):2156–2158. doi: 10.1093/bioinformatics/btr330
  83. Narasimhan V, Danecek P, Scally A, et al. BCFtools/RoH: a hidden Markov model approach for detecting autozygosity from next-generation sequencing data. Bioinformatics. 2016;32(11):1749–1751. doi: 10.1093/bioinformatics/btw044
  84. Ihaka R, Gentleman R. R: a language for data analysis and graphics. J Comput Graph Stat. 1996;5(3):299–314. doi: 10.1080/10618600.1996.10474713
  85. van Rossum G. Python reference manual. Dep Comput Sci. 1995; R9525.
  86. Hunter JD. Matplotlib: A 2D graphics environment. Comput Sci Eng. 2007;9(3):90–95. doi: 10.1109/MCSE.2007.55
  87. Schiavinato M, del Olmo V, Muya VN, Gabaldon T. JLOH: Inferring loss of heterozygosity blocks from sequencing data. bioRxiv. 2023;2023.05.04.539368. doi: 10.1101/2023.05.04.539368
  88. Gel B, Serra E. KaryoploteR: an R/Bioconductor package to plot customizable genomes displaying arbitrary data. Bioinformatics. 2017;33(19):3088–3090. doi: 10.1093/bioinformatics/btx346
  89. Bertrand AR, Kadri NK, Flori L, et al. RZooRoH: An R package to characterize individual genomic autozygosity and identify homozygous-by-descent segments. Methods Ecol Evol. 2019;10(6):860–866. doi: 10.1111/2041-210X.13167
  90. Zhou J, Liu L, Lopdell TJ, et al. HandyCNV: Standardized summary, annotation, comparison, and visualization of copy number variant, copy number variation region, and runs of homozygosity. Front Genet. 2021;12:731355. doi: 10.3389/fgene.2021.731355
  91. Biscarini F, Cozzi P, Gaspa G, Marras G. detectRUNS: Detect runs of homozygosity and runs of heterozygosity in diploid genomes. CRAN (The Comprehensive R Archive Network), 2018. Available at: https://cran.r-project.org/web/packages/detectRUNS/vignettes/detectRUNS.vignette.html
  92. Allaire J. RStudio: integrated development environment for R. Boston MA. 2012;770(394):165–171.
  93. Kluyver T, Ragan-Kelley B, Pérez F, et al. Jupyter Notebooks – a publishing format for reproducible computational workflows. Elpub. 2016;2016:87–90. doi: 10.3233/978-1-61499-649-1-87

Supplementary files

1. JATS XML
2. Fig. 1. Examples of heterozygosity visualization using line diagrams: (a) mean heterozygosity (left) and total ROH length (right) in the genomes of wolves from the Isle Royale population; (b) number of heterozygous and homozygous variants in two subspecies of sable antelope. SB2027* and HN216* are individuals of the southern subspecies; the others belong to the Zambian subspecies. Original images from [44, 45]

3. Fig. 2. Example of visualizing unique and shared SNPs in polar, brown, and black bears using a Venn diagram. Original image from [51]

4. Fig. 3. Examples of heterozygosity visualization using distribution diagrams: (a) histograms of the genome-wide heterozygosity distribution for five pinniped samples. Heterozygous SNPs were counted in 1 Mbp windows and scaled to SNPs per kbp; (b) boxplots of heterozygosity estimates for multiple scimitar-horned oryx individuals from different populations; (c) violin plots of the heterozygosity distribution for two Baikal seals, two spotted seals, and one gray seal. Original images from [54, 55]

5. Fig. 4. Cumulative distribution plot of ROH at different length cutoff thresholds for lions from different populations. Original image from [63]

6. Fig. 5. Examples of visualizing mean heterozygosity in windows using line plots and heat maps: (a) mean heterozygosity in 500 kbp sliding windows in a Tanzanian lion (orange) and an Indian lion (blue), based on the African lion genome assembly. Chromosomes are concatenated along the X axis; (b) heat map of heterozygous SNP density based on the chromosome-level genome assembly of a male Eurasian river otter. Heterozygous SNPs were counted in 1 Mbp windows and scaled to SNPs per kbp. Original images from [61, 63]
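The window-based density mentioned in the captions for Fig. 3 and Fig. 5 (heterozygous SNPs counted in 1 Mbp windows and scaled to SNPs per kbp) can be illustrated with a minimal Python sketch. It is not the pipeline used in the cited studies. The function name `het_density_per_kbp`, the use of non-overlapping windows, the 1-based positions, and the scaling of a shorter final window by its actual length are all assumptions made for illustration, and the positions are toy values.

```python
from collections import Counter


def het_density_per_kbp(positions, chrom_length, window=1_000_000):
    """Count heterozygous SNPs per non-overlapping window, scaled to SNPs per kbp.

    positions    -- 1-based coordinates of heterozygous SNPs on one chromosome
    chrom_length -- chromosome (or scaffold) length in bp
    window       -- window size in bp (1 Mbp, as in the figure captions)
    """
    n_windows = -(-chrom_length // window)  # ceiling division
    # Assign each SNP to a window index; (pos - 1) converts to 0-based.
    counts = Counter((pos - 1) // window for pos in positions)
    densities = []
    for i in range(n_windows):
        start = i * window
        end = min(start + window, chrom_length)
        size_kbp = (end - start) / 1000  # last window may be shorter (assumption)
        densities.append(counts.get(i, 0) / size_kbp)
    return densities


# Toy input: hypothetical SNP positions on a 2.5 Mbp scaffold.
positions = [10, 500_000, 999_999, 1_000_001, 2_400_000]
densities = het_density_per_kbp(positions, chrom_length=2_500_000)
```

The resulting list has one value per window and could feed a histogram, violin plot, or heat map of the kind shown in the figures. Real analyses would typically read positions from a VCF file and may use overlapping (sliding) windows instead.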


Copyright (c) 2023 Eco-Vector



This mass media outlet is registered with the Federal Service for Supervision of Communications, Information Technology and Mass Media (Roskomnadzor).
Registration number and date of the registration decision: series PI No. FS 77 - 65617, dated 04.05.2016.

