RNA-Seq contamination as a metatranscriptomic data for screening of plant pests and symbionts
- Authors: Zykin P.A. (1), Andreeva E.A. (2, 3), Tsvetkova N.V. (2), Bulanov A.N. (3), Voylokov A.V. (4)
- Affiliations:
  1. Saint-Petersburg State University
  2. Saint Petersburg State University
  3. Vavilov Institute of General Genetics
  4. St Petersburg Branch Russian Academy of Sciences, Vavilov Institute of General Genetics
- Section: Ecosystems metagenomics
- Submitted: 02.12.2024
- Accepted: 18.09.2025
- Published: 30.09.2025
- URL: https://journals.eco-vector.com/ecolgenet/article/view/642484
- DOI: https://doi.org/10.17816/ecogen642484
- ID: 642484
Abstract
Background: Transcriptome sequencing data can contain up to 30% contaminating reads. These reads may originate from laboratory contamination or from biologically relevant sources, and the latter are amenable to metatranscriptomic analysis. Aim: To evaluate the utility of contaminating reads for large-scale screening of plant pests and symbionts.
Materials and methods: We analyzed RNA-seq data from rye (Secale cereale L.), comprising five in-house accessions and 50 public datasets from the NCBI SRA. Reads that mapped well to the rye genome were filtered out, and the remaining putative contaminants were retained for downstream analysis.
Results: After removing laboratory contaminants, we compared reads from aphids, symbiotic fungi, bacteria, and viruses across accessions. Symbiome-derived reads were reproducible in biological replicates and varied by location, condition, and plant species, enabling post-hoc metatranscriptomic analysis.
Conclusion: Contaminating reads correlated with field-observed species or expected symbionts. Distribution patterns across accessions support repurposing existing and future sequencing data to screen for plant pests, monitor symbiotic organisms, and plan eradication strategies amid global climate change.
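The screening idea summarized in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `Read` record, the `HOST_IDENTITY_CUTOFF` threshold, the `LAB_CONTAMINANTS` set, and the taxon labels in the usage example are all hypothetical and chosen only to show the logic. Reads that align well to the host genome are discarded, assumed laboratory contaminants are dropped, and the remaining reads are tallied per taxon as reads per million of all reads in the sample so that samples of different depth can be compared.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional

# Assumption: a read counts as "well mapped" to the host at >= 95% identity.
HOST_IDENTITY_CUTOFF = 0.95
# Assumption: an illustrative list of taxa treated as laboratory contaminants.
LAB_CONTAMINANTS = {"Homo sapiens", "Cutibacterium acnes"}


@dataclass(frozen=True)
class Read:
    """Hypothetical per-read summary produced by upstream tools."""
    read_id: str
    host_identity: Optional[float]  # best identity to host genome; None if unmapped
    taxon: Optional[str]            # label assigned to a non-host read; None if unknown


def putative_contaminants(reads: Iterable[Read]) -> List[Read]:
    """Keep reads that are unmapped or map poorly to the host genome."""
    return [
        r for r in reads
        if r.host_identity is None or r.host_identity < HOST_IDENTITY_CUTOFF
    ]


def taxon_rpm(reads: Iterable[Read]) -> Dict[str, float]:
    """Reads per million (of all reads in the sample) for each non-host taxon."""
    reads = list(reads)
    total = len(reads)
    if total == 0:
        return {}
    counts = Counter(
        r.taxon
        for r in putative_contaminants(reads)
        if r.taxon is not None and r.taxon not in LAB_CONTAMINANTS
    )
    return {taxon: n * 1_000_000 / total for taxon, n in counts.items()}


# Toy sample: three host reads, five putative contaminants.
sample = [
    Read("r1", 0.99, None),
    Read("r2", 0.99, None),
    Read("r3", None, "Schizaphis graminum"),
    Read("r4", 0.60, "Schizaphis graminum"),
    Read("r5", None, "Homo sapiens"),
    Read("r6", None, None),
    Read("r7", 0.97, None),
    Read("r8", None, "Rhopalosiphum padi virus"),
]
profile = taxon_rpm(sample)
```

Running `taxon_rpm` on each accession and comparing the resulting dictionaries across replicates, locations, or conditions corresponds to the kind of comparison the abstract describes. In practice the cutoff and the contaminant list would have to be chosen and validated for the data at hand.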

About the authors
Pavel A. Zykin
Saint-Petersburg State University
Email: pavel.zykin@spbu.ru
ORCID iD: 0000-0003-1624-6163
SPIN-code: 2730-5890
Russian Federation, University emb., 7/9, Saint-Petersburg, Russia, 199034
Elena A. Andreeva
Saint Petersburg State University; Vavilov Institute of General Genetics
Author for correspondence.
Email: e.a.andreeva@spbu.ru
ORCID iD: 0000-0002-9326-3170
SPIN-code: 7269-8240
Cand. Sci. (Biology)
Russian Federation, Saint Petersburg; Moscow
Natalia V. Tsvetkova
Saint Petersburg State University
Email: n.tswetkowa@spbu.ru
ORCID iD: 0000-0002-7353-1107
SPIN-code: 1687-5757
Cand. Sci. (Biology)
Russian Federation, Saint Petersburg
Andrey N. Bulanov
Vavilov Institute of General Genetics
Email: an.bulanov20002014@gmail.com
ORCID iD: 0009-0003-8092-9978
SPIN-code: 3791-9700
Russian Federation, Moscow, Russia
Anatoly Vasilievich Voylokov
St Petersburg Branch Russian Academy of Sciences, Vavilov Institute of General Genetics
Email: av_voylokov@mail.ru
Head of the Laboratory, Dr.Biol.Sci., Laboratory of Genetics and Plant Biotechnology