RNA-Seq contamination as metatranscriptomic data for screening of plant pests and symbionts



Abstract

Background: Transcriptome sequencing data can contain up to 30% contaminating reads. These reads may originate from laboratory contamination or from biologically relevant sources, and the latter are amenable to metatranscriptomic analysis. Aim: To evaluate the utility of contaminating reads for large-scale screening of plant pests and symbionts.

Materials and methods: We analyzed RNA-seq data from rye (Secale cereale L.), comprising five in-house accessions and 50 public datasets from the NCBI SRA. Reads that mapped well to the rye genome were filtered out, and the remaining putative contaminants were retained for downstream analysis.
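
The filtering step described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it assumes the alignments are available as SAM text, and the function name `split_reads` and the `min_mapq` threshold of 20 are hypothetical choices made only for illustration. The sketch relies only on the standard SAM layout, in which the second column is the bitwise FLAG (bit 0x4 marks an unmapped read) and the fifth column is the mapping quality.

```python
from typing import Iterable, List, Tuple

UNMAPPED_FLAG = 0x4  # standard SAM FLAG bit: read is unmapped


def split_reads(
    sam_lines: Iterable[str], min_mapq: int = 20
) -> Tuple[List[str], List[str]]:
    """Split read names into host reads and putative contaminants.

    A read counts as a putative contaminant if it is unmapped or if its
    mapping quality is below `min_mapq` (a hypothetical cutoff).
    """
    host: List[str] = []
    putative: List[str] = []
    for line in sam_lines:
        # Skip empty lines and SAM header lines, which start with "@".
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag, mapq = fields[0], int(fields[1]), int(fields[4])
        if flag & UNMAPPED_FLAG or mapq < min_mapq:
            putative.append(name)
        else:
            host.append(name)
    return host, putative


# Toy input: one confident host read, one unmapped read, one low-MAPQ read.
# The reference names chr1R and chr2R are made up for this example.
example = [
    "@HD\tVN:1.6",
    "read1\t0\tchr1R\t100\t60\t50M\t*\t0\t0\t*\t*",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    "read3\t16\tchr2R\t500\t3\t50M\t*\t0\t0\t*\t*",
]
host, putative = split_reads(example)
print(host)      # ['read1']
print(putative)  # ['read2', 'read3']
```

In this sketch, low-quality alignments are kept together with unmapped reads, on the assumption that only confidently placed reads should be attributed to the host. A stricter or looser cutoff would change how many reads reach the downstream taxonomic screening.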

Results: After removing laboratory contaminants, we compared reads from aphids, symbiotic fungi, bacteria, and viruses across accessions. Symbiome-derived reads were reproducible across biological replicates and varied by location, condition, and plant species, enabling post-hoc metatranscriptomic analysis.

Conclusion: Contaminating reads correlated with field-observed species or expected symbionts. Distribution patterns across accessions support repurposing existing and future sequencing data to screen for plant pests, monitor symbiotic organisms, and plan eradication strategies amid global climate change.

About the authors

Pavel A. Zykin

Saint Petersburg State University

Email: pavel.zykin@spbu.ru
ORCID iD: 0000-0003-1624-6163
SPIN-code: 2730-5890
Russian Federation, 7/9 University Emb., Saint Petersburg, 199034

Elena A. Andreeva

Saint Petersburg State University; Vavilov Institute of General Genetics

Author for correspondence.
Email: e.a.andreeva@spbu.ru
ORCID iD: 0000-0002-9326-3170
SPIN-code: 7269-8240

Cand. Sci. (Biology)

Russian Federation, Saint Petersburg; Moscow

Natalia V. Tsvetkova

Saint Petersburg State University

Email: n.tswetkowa@spbu.ru
ORCID iD: 0000-0002-7353-1107
SPIN-code: 1687-5757

Cand. Sci. (Biology)

Russian Federation, Saint Petersburg

Andrey N. Bulanov

Vavilov Institute of General Genetics

Email: an.bulanov20002014@gmail.com
ORCID iD: 0009-0003-8092-9978
SPIN-code: 3791-9700
Russian Federation, Moscow

Anatoly Vasilievich Voylokov

St Petersburg Branch Russian Academy of Sciences, Vavilov Institute of General Genetics

Email: av_voylokov@mail.ru

Dr. Sci. (Biology), Head of the Laboratory of Genetics and Plant Biotechnology


Copyright (c) Eco-Vector


