CTGA: a web-based functional genomic resource for Cyamopsis tetragonoloba (L.) Taub.
- Authors: Zorin E.A.1,2, Vishnyakova M.A.3, Zhukov V.A.1,2
- Affiliations:
  - N.I. Vavilov All-Russian Institute of Plant Genetic Resources
  - All-Russia Research Institute for Agricultural Microbiology
  - N.I. Vavilov All-Russian Institute of Plant Genetic Resources
- Section: Methodology in ecological genetics
- Submitted: 11.11.2025
- Accepted: 11.12.2025
- Published: 30.12.2025
- URL: https://journals.eco-vector.com/ecolgenet/article/view/696083
- DOI: https://doi.org/10.17816/ecogen696083
- ID: 696083
Abstract
BACKGROUND: Guar (Cyamopsis tetragonoloba), an industrially important crop, is valued for the galactomannan gum derived from its seeds. Recent advances in genomic and transcriptomic research have provided valuable resources such as the reference genome and several sets of gene expression profiles. However, these data are currently fragmented and therefore require bioinformatics expertise to access and analyze them. Additionally, several genomic assemblies have been recently published, but there are currently no bioinformatics platforms specifically dedicated to guar genomics and transcriptomics.
AIM: To address this challenge, we have developed “CTGA”, a comprehensive functional genomic web portal for guar.
METHODS: Using Flask, as well as popular Python, CSS, and HTML libraries, we have developed a backend and frontend for the genomic platform.
RESULTS: We have performed a de novo structural and functional annotation of the guar genome predicting 57,019 protein-coding genes with UTRs. Besides, expression data from 85 public RNA-seq libraries representing various tissues and conditions were collected to create a normalized gene expression atlas. “CTGA” features an intuitive web interface to provide interactive tools, including a genome browser (IGV), BLAST for homology searching, tools for the Gene Ontology enrichment analysis, for working with guar genomic sequences, as well as a tool for generating heatmaps for more convenient analysis of guar gene expression in various tissues and experimental conditions. It also includes detailed functional annotations from various sources (eggNOG, Mercator4, GO, and KEGG) and instant visualization of gene expression profiles.
CONCLUSION: “CTGA” is available at: https://guar.arriam.ru/.
Full Text
Background
Guar (Cyamopsis tetragonoloba (L.) Taub.) is an important technical, feed, and food crop globally, primarily valued for its seed endosperm gum – a storage polysaccharide with extensive applications in food, oil, textile, pharmaceutical, and cosmetic industries [1].
While traditional guar breeding was based on phenotypic selection, with the advent of next-generation sequencing (NGS), genomic and transcriptomic approaches have emerged. Recent research has focused on elucidating the molecular mechanisms of galactomannan biosynthesis. One of the first studies, the work of Naoumkina and co-authors [2], made it possible to identify key candidate genes using cDNA libraries from developing seeds.
Subsequent RNA-seq studies comparing guar varieties with varying gum yields revealed that expression peaks for mannan synthase and sucrose synthase occur during the mid-stage of seed development, corresponding with gum accumulation [3]. Hu et al. (2019), using quantitative RNA-Seq, highlighted the role of cellulose synthase-like A (CsLA) gene family, including mannan synthase [4]. These findings were further supported by Sharma and coauthors, who provided spatio-temporal insights into galactomannan regulation [5].
A significant advancement was made with the first genome assembly by Gaikwad and coauthors [6]. This enabled the precise mapping of genes involved in galactomannan biosynthesis and their regulatory elements. In parallel, efforts have expanded genomic resources, including the development of transcriptome-derived single nucleotide polymorphism (SNP) markers [7, 8]. Grigoreva and colleagues created an SNP panel for use in marker-assisted selection, utilizing a draft genome sequence [9].
Research has expanded to include traits other than gum production. Integrated transcriptome and metabolome analyses have identified genes and metabolites associated with flowering time [10, 11]. Furthermore, the complete chloroplast genome has been sequenced, facilitating phylogenetic studies [12].
Collectively, these advances have transformed guar from an understudied crop into a molecularly characterized one, with foundational resources such as a reference genome, expression profiles, and molecular markers. However, accessing and analyzing these data requires specialized skills. To streamline guar research, we have developed a user-friendly web-based platform that includes an interactive genome browser, a BLAST service [13], functional gene annotations, expression profiles from all publicly available RNA-Seq data, and other useful tools for working with the genomic sequence. “CTGA” is available at https://guar.arriam.ru/.
Methods
Genomic sequence obtaining and structural reannotation
To reannotate the genes in the C. tetragonoloba genome [14], the reference assembly in FASTA format was downloaded from the National Center for Biotechnology Information (NCBI) database (available under BioProject ID: PRJNA1055737 or GenBank ID: GCA_037177725.1). De novo gene annotation was performed using the BRAKER2 tool (version 2.1.6) [15]. BRAKER2 predicted gene structures automatically by combining ab initio evidence from GeneMark-EP+ [16] and AUGUSTUS [17] with RNA sequencing alignment data (85 samples in total; Supp. Table 1). The default parameters were used. To increase the completeness of the annotation, untranslated regions (UTRs) were additionally predicted using the capabilities built into the BRAKER2/AUGUSTUS pipeline. As a result, a GFF3 file was generated containing the coordinates of the predicted genes, mRNAs, exons, and UTRs, together with the corresponding protein sequences. Aberrant CDSs and UTRs were fixed or removed from the annotation using a custom Python script.
Protein functional annotation and quality assessment
An integrated approach was applied to assign a functional annotation to the predicted protein sequences. The primary functional annotation, including the prediction of Gene Ontology (GO) [18, 19], metabolic pathways (KEGG) [20], and domain architecture, was performed using eggNOG-mapper (version 2.1.9) [21] against the eggNOG database (v5.0) using homology search mode (diamond). Additionally, for the categorization of genes in the context of biological pathways and comparison with other plant species, annotation was performed using Mercator4 [22, 23]. This tool assigned each protein to one of 70 hierarchical MapMan BIN categories based on hidden Markov models.
The quality control of predicted genes was conducted using BUSCO [24] with “embryophyta_odb10” database.
Raw RNA-seq reads processing
All publicly available RNA-seq datasets for C. tetragonoloba were obtained from the NCBI Sequence Read Archive (SRA) using the SRA Toolkit version 3.0.0 [25]. The search and selection were based on species-specific keywords, resulting in the loading of 89 libraries representing various tissue types and experimental conditions.
Initial processing of raw reads was carried out to ensure high-quality data for subsequent analysis. Adapter sequences, technical artifacts, and low-quality reads were filtered using BBDuk version 38.96 [26]. Parameters used included ktrim=r, k=23, mink=11, hdist=1, tbo, tpe, qtrim=rl, trimq=20, minlen=50.
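The trimming settings listed above can be assembled into a BBDuk call as in the following minimal sketch. It assumes paired-end input (suggested by the tbo and tpe options) and a user-supplied adapter FASTA; the function name and all file names are hypothetical, and only the trimming parameters come from the text.

```python
from typing import List


def bbduk_command(in1: str, in2: str, out1: str, out2: str, adapters: str) -> List[str]:
    """Build a BBDuk argument list using the trimming parameters reported in the text."""
    return [
        "bbduk.sh",
        f"in1={in1}", f"in2={in2}",      # assumption: paired-end reads
        f"out1={out1}", f"out2={out2}",
        f"ref={adapters}",               # assumption: adapter FASTA chosen by the user
        "ktrim=r", "k=23", "mink=11", "hdist=1",
        "tbo", "tpe",
        "qtrim=rl", "trimq=20", "minlen=50",
    ]


# Hypothetical file names, shown only to illustrate the resulting command line.
cmd = bbduk_command("lib_1.fastq.gz", "lib_2.fastq.gz",
                    "lib_1.clean.fastq.gz", "lib_2.clean.fastq.gz",
                    "adapters.fa")
print(" ".join(cmd))
```

The list form can be passed directly to subprocess.run(cmd, check=True) on a machine where BBTools is installed.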
Reads mapping and count matrix construction
The high-quality reads were mapped to the guar reference genome using the STAR tool version 2.7.10a [27] in two stages. At the first stage, splice events were detected, which were then used to improve genome annotation in the second stage. This improved the accuracy of the mapping.
Based on the BAM-formatted mapping results obtained using STAR, a count matrix was created using the featureCounts program [28]. This utility calculated the number of reads uniquely mapped to each BRAKER2 annotated gene for each library.
Data normalization and expression atlas construction
To compare the expression levels between the samples, which differ significantly in the total number of sequenced reads (library size), the counts matrix was normalized using the Counts Per Million (CPM) method. The resulting normalized CPM matrix served as the basis for constructing the guar expression atlas.
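As an illustration of the normalization step, CPM divides each gene count by the total count of its library and multiplies by one million. The sketch below uses a toy count matrix with hypothetical gene identifiers; it is not the script used to build the atlas.

```python
from typing import Dict, List


def cpm(counts: Dict[str, List[int]]) -> Dict[str, List[float]]:
    """Counts Per Million: rows are genes, columns are libraries."""
    n_samples = len(next(iter(counts.values())))
    # Library size = sum of counts over all genes in a given library.
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return {
        gene: [row[j] * 1e6 / lib_sizes[j] if lib_sizes[j] else 0.0
               for j in range(n_samples)]
        for gene, row in counts.items()
    }


# Toy matrix with hypothetical gene IDs: two libraries of different depth.
raw = {"g1": [10, 100], "g2": [30, 300], "g3": [60, 600]}
normalized = cpm(raw)
print(normalized["g1"])  # identical CPM in both libraries despite a 10x depth difference
```

Because every column is scaled by its own library size, the CPM values of each library sum to one million, which makes libraries of different depth directly comparable.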
Implementation of a web-based functional genomics resource for guar
A specialized web service dedicated to Cyamopsis tetragonoloba has been developed to provide convenient and interactive access to the obtained genomic, transcriptomic and functional data.
The server part of the application is implemented in Python (version 3.11) using the Flask framework (version 2.3.2) [29]. The application provides routing, query processing, and programmatic access to data (annotated genome, pre-built BLAST database, expression matrix, and functional annotation) that are stored in a structured form on the server.
The client side is built using standard web technologies: HTML5, CSS3 and JavaScript. Bootstrap and Chart.js (version 4.3.0) libraries are used to create interactive and dynamic user interface elements.
The web service includes four main functional modules.
For visual analysis of the annotated genome, the IGV.js component was integrated (Integrative Genomics Viewer [30], version 2.13.2). Pre-generated reference genome (FASTA) and annotation (GFF3) files are loaded on the client side, allowing users to navigate through chromosomes, zoom in on loci of interest, and visualize predicted gene structures, including exons, introns, and UTRs.
The gene expression analysis module allows the user to enter a gene identifier, after which the Flask server application extracts the corresponding normalized expression values (CPM) for all samples from the prepared matrix. The data are transmitted to the client side, where an interactive boxplot is automatically generated using the Chart.js library to display the expression profile of the requested gene in various tissues and conditions.
Searching by gene identifier also allows the user to obtain comprehensive functional information. Data obtained from the eggNOG and Mercator4 tools are displayed on a separate page or in a pop-up window, including protein function prediction, Gene Ontology (GO) terms, KEGG pathways, and the detailed Mercator4 functional description.
To enable the search for homologous genes in the guar genome by nucleotide or amino acid sequence, BLAST+ was integrated into the web service. A local BLAST database containing annotated coding sequences (CDS) and protein sequences was created on the server. The user interface includes a form for entering an ID or uploading a sequence in FASTA format and for configuring basic parameters. After the request is sent, the Flask server application runs the BLAST+ utility, processes the results, and returns an interactive HTML page with alignments, E-values, and percentage of identity, providing direct links to the homologous genes in other modules of the service (genome browser, gene expression, annotation).
The Gene Ontology enrichment analysis was implemented using custom R and Python scripts and the topGO R package [31]. The calculation was carried out using Fisher’s exact test and the weight01 algorithm.
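The BLAST step described above can be sketched as follows. The example builds a BLAST+ command line and parses tabular output into hit records sorted by E-value. The use of tabular output format 6, the database name, and the gene identifiers are assumptions made for illustration; the text does not state which output format the service uses.

```python
from typing import Dict, List

# Default column order of BLAST+ tabular output ("-outfmt 6").
OUTFMT6_FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                  "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def blast_command(program: str, query: str, db: str, evalue: float = 1e-5) -> List[str]:
    """Argument list for a BLAST+ search (program is e.g. 'blastn' or 'blastp')."""
    return [program, "-query", query, "-db", db,
            "-evalue", str(evalue), "-outfmt", "6"]


def parse_outfmt6(text: str) -> List[Dict[str, object]]:
    """Parse tabular BLAST output and sort hits by E-value, then by bit score."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        rec: Dict[str, object] = dict(zip(OUTFMT6_FIELDS, line.split("\t")))
        for key in ("pident", "evalue", "bitscore"):
            rec[key] = float(rec[key])
        for key in ("length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send"):
            rec[key] = int(rec[key])
        hits.append(rec)
    return sorted(hits, key=lambda h: (h["evalue"], -h["bitscore"]))


# Hypothetical output for one query against a hypothetical CDS database.
example = ("query1\tg2.t1\t88.00\t100\t12\t0\t1\t100\t5\t104\t1e-30\t120\n"
           "query1\tg1.t1\t99.50\t200\t1\t0\t1\t200\t1\t200\t2e-100\t365\n")
for hit in parse_outfmt6(example):
    print(hit["sseqid"], hit["pident"], hit["length"], hit["mismatch"], hit["evalue"])
```

The parsed fields correspond to the quantities shown on a results page (percentage of identity, alignment length, number of mismatches, and E-value), and the subject identifier can serve as the link target for the other modules.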
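To make the statistical test concrete, the sketch below computes a one-sided Fisher's exact test p-value for a single GO term from the hypergeometric distribution. It is written in plain Python for illustration only: the resource itself uses the topGO R package, and the weight01 algorithm, which additionally accounts for the GO graph structure, is not reproduced here. The numbers in the example are invented.

```python
from math import comb


def fisher_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided Fisher's exact test (over-representation).

    N: genes in the background, K: background genes annotated with the term,
    n: genes in the query set,  k: query genes annotated with the term.
    Returns P(X >= k) under the hypergeometric distribution.
    """
    upper = min(n, K)
    total = comb(N, n)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible table configurations contribute nothing to the sum.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Invented example: 8 of 50 query genes carry a term found in 200 of 20,000 genes.
print(fisher_enrichment_p(k=8, n=50, K=200, N=20000))
```

A small p-value indicates that the term occurs in the query set more often than expected by chance; in a full analysis the test is repeated for every term and the results are interpreted together.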
The web service is available at https://guar.arriam.ru/ and it can be used by the scientific community for in-depth analysis of the guar genome.
Results
The guar reference genome has been assembled at the chromosomal level and annotated in the year 2024, but the annotation is not publicly available at the moment. To obtain a high-quality set of predicted genes for future work, we performed a de novo genome annotation.
Re-annotation and Characterization of the Guar Genome
In this study, we performed a comprehensive de novo annotation of the published guar (Cyamopsis tetragonoloba) genome using the BRAKER2 pipeline. This approach resulted in the prediction of 57,019 protein-coding genes encoding 82,042 proteins. A key advancement of our annotation over the existing one was the precise prediction of untranslated regions (UTRs) for the gene models. UTRs are crucial for the regulation of gene expression, and their annotation is especially important when the analysis relies on technologies that capture RNA by the polyA tail and therefore sequence only the 3' end of transcripts (for example, the 3' MACE technology [32]).
To assess the quality of the structural annotation, BUSCO was run on the predicted proteins using the embryophyta_odb10 database. Most of the BUSCO genes were recovered as complete, and the percentage of missing or fragmented genes was low, indicating the high quality of the annotation (Table 1).
The functional annotation of the predicted genes using EggNOG-mapper successfully assigned putative functions to 36,998 genes (Table 2), providing Gene Ontology (GO) terms, KEGG pathways, and domain architectures.
Complementary analysis with Mercator4 enabled the categorization of 28,443 genes into the hierarchical MapMan BIN system (Table 2), facilitating the functional exploration of biological pathways in guar. All the major biological processes and metabolic pathways encoded in MapMan bins were covered by the predicted genes (Fig. 1, a). In addition, galactomannan biosynthesis genes were also identified among the annotated ones (Fig. 1, b), which indicates the high quality of the annotation.
Construction of a Comprehensive Gene Expression Atlas
To capture the transcriptomic landscape of guar, we collected all publicly available RNA-seq datasets from NCBI SRA, comprising 96 libraries derived from a wide range of tissues and developmental stages (Table S1). However, not all the samples collected could be analyzed, as various technical errors were found that did not allow proper processing of the reads. Eventually, 85 samples were left for further analysis (Table 3).
After rigorous quality control and adapter trimming using BBDuk, high-quality reads were aligned to the annotated genome using the STAR aligner.
The resulting expression matrix was normalized using Counts Per Million (CPM) to enable cross-sample comparison. This comprehensive expression atlas reveals the transcript abundance of all predicted genes across the studied conditions. Principal Component Analysis (PCA) of the CPM matrix showed clear separation of samples by tissue type (Fig. 2, b) but not by sequencing run or experiment (Fig. 2, a), demonstrating the biological consistency of the dataset and the quality of normalization.
Of the 57,019 genes annotated in the guar genome, 27,823 (48.79%) were expressed at a summed level of ≥10 CPM (Counts Per Million) across all samples.
Development of an Interactive Guar Genomic Resource
To make annotated genome, gene expression, functional data and several research tools freely accessible and user-friendly, we developed a comprehensive web resource using the Flask framework. The platform integrates several key modules (Fig. 3).
First of all, the user can retrieve information either by the identifier of a specific gene or by a nucleotide or amino acid sequence from guar or a closely related organism, by pasting it into the appropriate window and running a BLAST search (Fig. 4, a). The user can fine-tune a filter for the BLAST search by changing the E-value (Fig. 4, c). On the results page, the user can select the best hits based on a number of parameters, such as the percentage of identity, alignment length, number of substitutions, and E-value (Fig. 4, b).
By clicking on the selected gene, the user is taken to the next page with detailed information about the gene. The platform provides instant access to comprehensive functional annotation (via EggNOG and Mercator4 databases) for each gene. In addition, KEGG and GO terms are assigned to each gene, along with a brief description, which facilitates subsequent analysis (Fig. 4, c).
An embedded IGV.js instance allows for intuitive visualization of the genomic context of any gene, including exon-intron structures and predicted UTRs (Fig. 5, a). The genomic browser is available for all genes, regardless of whether the user accesses it through the BLAST service or the ID search. In the genomic browser section, the user can also extract the complete sequence of the gene of interest, or only the CDS or protein sequence, with one click (Fig. 5, a).
For any gene of interest, users can generate an interactive barplot and boxplot displaying its normalized expression level (CPM) across all integrated RNA-seq samples, facilitating quick assessment of its expression pattern. Each box reflects the median, Q1, and Q3 values of the normalized expression counts for all available replicates, except in cases where the data are publicly available only as a single replicate (Fig. 5, b, c).
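The values drawn in each box can be computed as in the following sketch, which summarizes the CPM values of one sample group by its median and quartiles and falls back to a single value when only one replicate exists. The quartile interpolation method ("inclusive") and the function name are assumptions; the plotting library used by the resource may compute quartiles differently.

```python
from statistics import median, quantiles
from typing import Dict, Sequence


def box_summary(values: Sequence[float]) -> Dict[str, float]:
    """Median and quartiles of the replicate CPM values of one sample group."""
    if not values:
        raise ValueError("at least one replicate is required")
    if len(values) == 1:
        # Single replicate: the box collapses to one value.
        v = float(values[0])
        return {"q1": v, "median": v, "q3": v}
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {"q1": q1, "median": median(values), "q3": q3}


# Invented CPM values for one gene in five replicates of one tissue.
print(box_summary([1.0, 2.0, 3.0, 4.0, 5.0]))
print(box_summary([7.5]))
```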
Integration of a BLAST service allows users to search for homologous sequences within the guar genome using nucleotide or protein queries, directly linking results to the genome browser and expression modules.
To facilitate the work of researchers in the fields of genomics and transcriptomics of guar, we have added the ability to download all files necessary for analysis. These include the structural genome annotation, the functional annotation from eggNOG with GO/KEGG identifiers, the functional annotation from Mercator4, and the gene expression data. All the listed datasets are available in the “Downloads” section.
By conducting transcriptomic and genomic analyses using the available guar reference genome assembly and the datasets obtained during this study, researchers can use a special tool to perform the Gene Ontology enrichment analysis on their own set of genes. Based on user-entered gene identifiers, the R script performs Gene Ontology enrichment analysis for one of three categories: biological process, molecular function, or cellular component. As an output, the user receives a barplot and a table with statistically significant gene ontology terms (Fig. 6). This service is available in the “Tools” section.
Having a set of genes of interest or regions of the genome, users can extract sequences directly from the genomic assembly by coordinates using the "Sequence Extractor" tool implemented in our resource, which is also available on the “Tools” page.
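Extraction by coordinates can be illustrated with the short sketch below. It assumes 1-based inclusive coordinates (as in GFF3) and an optional strand argument that returns the reverse complement; the function names and the toy sequence are hypothetical, and the actual tool may handle input differently.

```python
from typing import Dict

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {sequence_id: sequence}."""
    records: Dict[str, list] = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]   # ID is the first word of the header
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def extract(genome: Dict[str, str], seqid: str, start: int, end: int,
            strand: str = "+") -> str:
    """Return genome[seqid][start..end], 1-based inclusive; '-' gives the reverse complement."""
    seq = genome[seqid]
    if not (1 <= start <= end <= len(seq)):
        raise ValueError("coordinates outside the sequence")
    fragment = seq[start - 1:end]
    if strand == "-":
        fragment = fragment.translate(_COMPLEMENT)[::-1]
    return fragment


# Toy two-line record standing in for a chromosome.
genome = read_fasta(">chr1 toy example\nACGTAC\nGTTT\n")
print(extract(genome, "chr1", 2, 5))        # forward strand
print(extract(genome, "chr1", 2, 5, "-"))   # reverse strand
```

For a full chromosome-scale assembly, an indexed FASTA reader would avoid loading the whole genome into memory; the dictionary here keeps the example self-contained.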
When working with sequences of many genes, scientists often need to estimate their expression levels across different organs and tissues of the organism. For these cases, we developed the “Heatmap Generator” tool, which creates a heatmap based on all currently available RNA-sequencing libraries using gene identifiers as a query (Fig. 7). This tool is available online in the “Tools” section.
This tool allows users to quickly create a heatmap using flexible parameters for z-score standardization and gene or sample clustering, and provides a choice of color palettes.
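The standardization option can be illustrated by a per-gene z-score, in which each gene's expression values are centered on their mean and divided by their standard deviation so that genes with very different absolute expression become comparable in one heatmap. The sketch below assumes row-wise scaling with the population standard deviation; the tool's exact settings are not stated in the text, and the gene identifiers are invented.

```python
from statistics import mean, pstdev
from typing import Dict, List


def zscore_rows(matrix: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Standardize each gene (row) to mean 0 and unit standard deviation."""
    scaled = {}
    for gene, values in matrix.items():
        mu = mean(values)
        sd = pstdev(values, mu)
        # Genes with constant expression have sd == 0; return zeros instead of dividing.
        scaled[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return scaled


# Invented CPM values for two genes across three tissues.
cpm_values = {"g1": [1.0, 2.0, 3.0], "g2": [1000.0, 2000.0, 3000.0]}
for gene, row in zscore_rows(cpm_values).items():
    print(gene, [round(z, 3) for z in row])
```

After scaling, both genes in the example have the same profile even though their absolute expression differs by three orders of magnitude, which is the behavior wanted when comparing expression patterns rather than levels.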
Discussion
In recent decades, due to the development of next-generation sequencing methods, guar has transformed from a poorly studied agricultural crop into a genetically well-studied species. Initial studies have successfully identified key genes associated with galactomannan biosynthesis [3, 5, 6], developed a comprehensive collection of molecular markers [9, 11], and resulted in the chromosome-level genome assembly [6, 14]. However, the full potential of these diverse genomic resources has yet to be realized, as accessing and integrating them requires significant bioinformatics expertise, posing a barrier for numerous researchers and breeders. Our research was aimed at solving this problem.
We present a comprehensive genomic resource for C. tetragonoloba, which includes a structural and functional annotation with an extensive expression atlas and is available through a user-friendly web interface. High-quality de novo gene prediction performed using BRAKER2 and confirmed by BUSCO's high completeness score (97.5%) provides an accurate and reliable set of genes (Table 1). Accurate prediction of untranslated regions (UTRs) is essential for studying post-transcriptional regulation and for research based on the 3' MACE sequencing technology [32]. The predicted genes ultimately covered all major functional categories of Mercator4, indicating both successful gene prediction and high-quality annotation.
The value of genome annotation increases significantly when understanding the conditions and tissues in which genes are expressed.
The expression atlas we have built, based on 85 publicly available RNA-seq libraries covering various tissues and developmental conditions, provides an unprecedented overview of the guar transcriptome landscape. The clear separation of samples by experimental conditions, genotypes, and tissues in Principal Component Analysis (PCA) highlights the high quality of this integrated dataset (Fig. 2) and the ability to compare gene expression regardless of factors such as the sequencing run and the origin of the data.
The service is designed to simplify the work of researchers and provides them with the opportunity to work effectively with data. By integrating IGV's interactive genomic browser, BLAST server, and instant visualization of expression profiles and functional annotations, the service allows researchers to opt out of using command-line tools and local data processing.
In our view, the tools provided in this resource have the potential to significantly enhance research opportunities in the field of guar biology. By facilitating accurate genomic analysis without requiring programming skills, the sequence extractor enables researchers to rapidly obtain specific sequences of exons, introns, and intergenic regions, allowing them to design experiments and characterize genes. The Gene Ontology enrichment analysis tool assists in identifying key biological processes and functions associated with, for instance, stress tolerance or seed maturation in guar. Heatmaps make it possible to detect trends in gene expression patterns, and clustering genes by these patterns helps to identify their common regulatory pathways and associations with physiological processes. Together, these instruments make complex genomic and transcriptomic data more approachable and interpretable to plant biologists, facilitating a better understanding of plant physiological mechanisms and developmental processes.
Thus, while previous works have provided important data for genomic and transcriptomic studies of guar, our study integrates previous experience and serves as a centralized database. We have combined a variety of data into a single powerful platform. By reducing the technical barrier, this resource will allow a wider range of scientists and breeders to contribute to improving the quality of guar research, which will eventually lead, hopefully, to the creation of high-yielding, disease-resistant varieties that meet global agricultural and industrial requirements. We are aware of the limitations of the current platform and intend to address them in the future by incorporating new data, functional modules, and features that will further enhance the user experience for researchers.
Conclusion
This study presents CTGA, a comprehensive functional genomics resource that significantly advances research on C. tetragonoloba. By integrating high-quality de novo genome annotation and extensive expression data from 85 RNA-sequencing libraries, we have created a unified platform that addresses the fragmentation of existing genomic resources. The resource offers accurate gene models and functional annotations from the EggNOG and Mercator4 databases, detailed expression profiles across various tissues and stages of development as well as several analytical tools.
About the authors
Evgeny A. Zorin
N.I. Vavilov All-Russian Institute of Plant Genetic Resources;All-Russia Research Institute for Agricultural Microbiology
Author for correspondence.
Email: ezorin@arriam.ru
ORCID iD: 0000-0001-5666-3020
SPIN-code: 5048-0203
Cand. Sci. (Biology)
Russian Federation, Saint Petersburg
Margarita A. Vishnyakova
N.I. Vavilov All-Russian Institute of Plant Genetic Resources
Email: m.vishnyakova@vir.nw.ru
ORCID iD: 0000-0003-2808-7745
SPIN-code: 2802-9614
Scopus Author ID: 6603209207
Professor, Chief Researcher, Head of the Department of Grain Legumes Genetic Resources
Russian Federation, St. Petersburg
Vladimir A. Zhukov
N.I. Vavilov All-Russian Institute of Plant Genetic Resources;All-Russia Research Institute for Agricultural Microbiology
Email: vzhukov@arriam.ru
ORCID iD: 0000-0002-2411-9191
SPIN-code: 2610-3670
Cand. Sci. (Biology)
Russian Federation, Saint Petersburg; Saint Petersburg
References
- Thombare N, Jha U, Mishra S, Siddiqui MZ. Guar gum as a promising starting material for diverse applications: A review. Int J Biol Macromol. 2016;88:361-372. doi: 10.1016/j.ijbiomac.2016.04.001
- Naoumkina M, Torres-Jerez I, Allen S, et al. Analysis of cDNA libraries from developing seeds of guar (Cyamopsis tetragonoloba (L.) Taub). BMC Plant Biol. 2007;7:62. doi: 10.1186/1471-2229-7-62
- Chaudhury A, Kaila T, Gaikwad K. Elucidation of Galactomannan Biosynthesis Pathway Genes through Transcriptome Sequencing of Seeds Collected at Different Developmental Stages of Commercially Important Indian Varieties of Cluster Bean (Cyamopsis tetragonoloba L.). Sci Rep. 2019;9(1):11539. doi: 10.1038/s41598-019-48072-w
- Hu H, Wang H, Zhang Y, Kan B, Ding Y, Huang J. Characterization of genes in guar gum biosynthesis based on quantitative RNA-sequencing in guar bean (Cyamopsis tetragonoloba). Sci Rep. 2019;9(1):10991. doi: 10.1038/s41598-019-47518-5
- Sharma S, Tyagi A, Srivastava H, et al. Exploring the edible gum (galactomannan) biosynthesis and its regulation during pod developmental stages in clusterbean using comparative transcriptomic approach. Sci Rep. 2021;11(1):4000. doi: 10.1038/s41598-021-83507-3
- Gaikwad K, Ramakrishna G, Srivastava H, et al. The chromosome-scale genome assembly of cluster bean provides molecular insight into edible gum (galactomannan) biosynthesis family genes. Sci Rep. 2023;13(1):9941. doi: 10.1038/s41598-023-33762-3
- Rawal HC, Kumar S, Mithra S V A, et al. High Quality Unigenes and Microsatellite Markers from Tissue Specific Transcriptome and Development of a Database in Clusterbean (Cyamopsis tetragonoloba, L. Taub). Genes. 2017;8(11):313. doi: 10.3390/genes8110313
- Thakur O, Randhawa GS. Identification and characterization of SSR, SNP and InDel molecular markers from RNA-Seq data of guar (Cyamopsis tetragonoloba, L. Taub.) roots. BMC Genomics. 2018;19(1):951. doi: 10.1186/s12864-018-5205-9
- Grigoreva E, Barbitoff Y, Changalidi A, et al. Development of SNP Set for the Marker-Assisted Selection of Guar (Cyamopsis tetragonoloba (L.) Taub.) Based on a Custom Reference Genome Assembly. Plants Basel Switz. 2021;10(10):2063. doi: 10.3390/plants10102063
- Arkhimandritova S, Shavarda A, Potokina E. Key metabolites associated with the onset of flowering of guar genotypes (Cyamopsis tetragonoloba (L.) Taub). BMC Plant Biol. 2020;20(Suppl 1):291. doi: 10.1186/s12870-020-02498-x
- Grigoreva E, Tkachenko A, Arkhimandritova S, et al. Identification of Key Metabolic Pathways and Biomarkers Underlying Flowering Time of Guar (Cyamopsis tetragonoloba (L.) Taub.) via Integrated Transcriptome-Metabolome Analysis. Genes. 2021;12(7):952. doi: 10.3390/genes12070952
- Kaila T, Chaduvla PK, Rawal HC, et al. Chloroplast Genome Sequence of Clusterbean (Cyamopsis tetragonoloba L.): Genome Structure and Comparative Analysis. Genes. 2017;8(9):212. doi: 10.3390/genes8090212
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990;215(3):403-410. doi: 10.1016/S0022-2836(05)80360-2
- Li JH, Li MJ, Li WL, et al. Leguminous industrial crop guar (Cyamopsis tetragonoloba): The chromosome-level reference genome de novo assembly. Ind Crops Prod. 2024;216:118748. doi: 10.1016/j.indcrop.2024.118748
- Brůna T, Hoff KJ, Lomsadze A, Stanke M, Borodovsky M. BRAKER2: automatic eukaryotic genome annotation with GeneMark-EP+ and AUGUSTUS supported by a protein database. NAR Genomics Bioinforma. 2021;3(1):lqaa108. doi: 10.1093/nargab/lqaa108
- Brůna T, Lomsadze A, Borodovsky M. GeneMark-EP+: eukaryotic gene prediction with self-training in the space of genes and proteins. NAR Genomics Bioinforma. 2020;2(2):lqaa026. doi: 10.1093/nargab/lqaa026
- Stanke M, Steinkamp R, Waack S, Morgenstern B. AUGUSTUS: a web server for gene finding in eukaryotes. Nucleic Acids Res. 2004;32(Web Server issue):W309-312. doi: 10.1093/nar/gkh379
- The Gene Ontology Consortium, Aleksander SA, Balhoff J, et al. The Gene Ontology knowledgebase in 2023. Baryshnikova A, ed. GENETICS. 2023;224(1):iyad031. doi: 10.1093/genetics/iyad031
- Ashburner M, Ball CA, Blake JA, et al. Gene Ontology: tool for the unification of biology. Nat Genet. 2000;25(1):25-29. doi: 10.1038/75556
- Kanehisa M, Goto S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 2000;28(1):27-30. doi: 10.1093/nar/28.1.27
- Cantalapiedra CP, Hernández-Plaza A, Letunic I, Bork P, Huerta-Cepas J. eggNOG-mapper v2: Functional Annotation, Orthology Assignments, and Domain Prediction at the Metagenomic Scale. Tamura K, ed. Mol Biol Evol. 2021;38(12):5825-5829. doi: 10.1093/molbev/msab293
- Bolger M, Schwacke R, Usadel B. MapMan Visualization of RNA-Seq Data Using Mercator4 Functional Annotations. In: Dobnik D, Gruden K, Ramšak Ž, Coll A, eds. Solanum Tuberosum. Vol 2354. Methods in Molecular Biology. Springer US; 2021:195-212. doi: 10.1007/978-1-0716-1609-3_9
- Schwacke R, Ponce-Soto GY, Krause K, et al. MapMan4: A Refined Protein Classification and Annotation Framework Applicable to Multi-Omics Data Analysis. Mol Plant. 2019;12(6):879-892. doi: 10.1016/j.molp.2019.01.003
- Manni M, Berkeley MR, Seppey M, Simão FA, Zdobnov EM. BUSCO Update: Novel and Streamlined Workflows along with Broader and Deeper Phylogenetic Coverage for Scoring of Eukaryotic, Prokaryotic, and Viral Genomes. Kelley J, ed. Mol Biol Evol. 2021;38(10):4647-4654. doi: 10.1093/molbev/msab199
- SRA-toolkit. Available online: https://github.com/ncbi/sra-tools/wiki/01.-Downloading-SRA-Toolkit (accessed on 02.10.2025).
- BBMap. Available online: https://sourceforge.net/projects/bbmap/ (accessed on 02.10.2025).
- Dobin A, Davis CA, Schlesinger F, et al. STAR: ultrafast universal RNA-seq aligner. Bioinforma Oxf Engl. 2013;29(1):15-21. doi: 10.1093/bioinformatics/bts635
- Liao Y, Smyth GK, Shi W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics. 2014;30(7):923-930. doi: 10.1093/bioinformatics/btt656
- FLASK. Available online: https://flask.palletsprojects.com/en/stable/ (accessed on 02.10.2025).
- Thorvaldsdottir H, Robinson JT, Mesirov JP. Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Brief Bioinform. 2013;14(2):178-192. doi: 10.1093/bib/bbs01
- Alexa A, Rahnenführer J. topGO. Bioconductor: Buffalo, NY, USA; 2017.
- Boneva S, Schlecht A, Böhringer D, et al. 3′ MACE RNA-sequencing allows for transcriptome profiling in human tissue samples after long-term storage. Lab Invest. 2020;100(10):1345-1355. doi: 10.1038/s41374-020-0446-z