<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article>
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ali="http://www.niso.org/schemas/ali/1.0/" article-type="research-article" dtd-version="1.2" xml:lang="en"><front><journal-meta><journal-id journal-id-type="publisher-id">Current Bioinformatics</journal-id><journal-title-group><journal-title xml:lang="en">Current Bioinformatics</journal-title><trans-title-group xml:lang="ru"><trans-title>Current Bioinformatics</trans-title></trans-title-group></journal-title-group><issn publication-format="print">1574-8936</issn><issn publication-format="electronic">2212-392X</issn><publisher><publisher-name xml:lang="en">Bentham Science</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="publisher-id">643880</article-id><article-id pub-id-type="doi">10.2174/0115748936264122231016094702</article-id><article-categories><subj-group subj-group-type="toc-heading"><subject>Life Sciences</subject></subj-group><subj-group subj-group-type="article-type"><subject>Research Article</subject></subj-group></article-categories><title-group><article-title xml:lang="en">Prediction of DNA-binding Sites in Transcriptions Factor in Fur-like Proteins Using Machine Learning and Molecular Descriptors</article-title></title-group><contrib-group><contrib contrib-type="author"><name><surname>Muñoz</surname><given-names>Jessica</given-names></name><email>info@benthamscience.net</email><xref ref-type="aff" rid="aff1"/></contrib><contrib contrib-type="author"><name><surname>Reyes-Suárez</surname><given-names>José</given-names></name><email>info@benthamscience.net</email><xref ref-type="aff" rid="aff1"/></contrib><contrib contrib-type="author"><name><surname>Besoain</surname><given-names>Felipe</given-names></name><email>info@benthamscience.net</email><xref ref-type="aff" rid="aff2"/></contrib><contrib 
contrib-type="author"><name><surname>Arenas-Salinas</surname><given-names>Mauricio</given-names></name><email>info@benthamscience.net</email><xref ref-type="aff" rid="aff1"/></contrib></contrib-group><aff id="aff1"><institution>Centro de Bioinformática, Simulación y Modelado (CBSM), Facultad de Ingeniería, Universidad de Talca</institution></aff><aff id="aff2"><institution>Faculty of Engineering, Campus Talca, Universidad de Talca</institution></aff><pub-date date-type="pub" iso-8601-date="2024-04-01" publication-format="electronic"><day>01</day><month>04</month><year>2024</year></pub-date><volume>19</volume><issue>4</issue><issue-title xml:lang="ru"/><fpage>398</fpage><lpage>407</lpage><history><date date-type="received" iso-8601-date="2025-01-07"><day>07</day><month>01</month><year>2025</year></date></history><permissions><copyright-statement xml:lang="en">Copyright © 2024, Bentham Science Publishers</copyright-statement><copyright-year>2024</copyright-year><copyright-holder xml:lang="en">Bentham Science Publishers</copyright-holder><ali:free_to_read xmlns:ali="http://www.niso.org/schemas/ali/1.0/"/></permissions><self-uri xlink:href="https://journals.eco-vector.com/1574-8936/article/view/643880">https://journals.eco-vector.com/1574-8936/article/view/643880</self-uri><abstract xml:lang="en"><p id="idm46041443823232">Introduction: Transcription factors are of great interest in biotechnology due to their key role in the regulation of gene expression. One of the most important transcription factors in Gram-negative bacteria is Fur, a global regulator studied as a therapeutic target for the design of antibacterial agents. 
Its DNA-binding domain, which contains a helix-turn-helix motif, is one of its most relevant features.</p><p id="idm46041443827232">Methods: In this study, we evaluated several machine learning algorithms, including Support-Vector Machines (SVM), Random Forest (RF), Decision Trees (DT), and Naive Bayes (NB), for the prediction of DNA-binding sites based on proteins from the Fur superfamily and other helix-turn-helix transcription factors. We also tested the efficacy of several molecular descriptors derived from the amino acid sequence and the structure of the protein fragments that bind DNA. A feature selection procedure was employed to reduce the number of descriptors in each case while maintaining good classification performance.</p><p id="idm46041443831200">Results: The best results were obtained with the SVM model using twelve sequence-derived attributes and the DT model using nine structure-derived features, achieving 82% and 76% accuracy, respectively.</p><p id="idm46041443836256">Conclusion: The performance obtained indicates that the descriptors we used are relevant for predicting DNA-binding sites, since they can discriminate between binding and non-binding regions of a protein.</p></abstract><kwd-group xml:lang="en"><kwd>Fur</kwd><kwd>transcription factor</kwd><kwd>antibacterial</kwd><kwd>machine learning</kwd><kwd>protein-DNA</kwd><kwd>biotechnology</kwd></kwd-group></article-meta></front><body></body><back><ref-list><ref id="B1"><label>1.</label><mixed-citation>Deng C, Wu Y, Lv X, et al. Refactoring transcription factors for metabolic engineering. Biotechnol Adv 2022; 57: 107935. doi: 10.1016/j.biotechadv.2022.107935</mixed-citation></ref><ref id="B2"><label>2.</label><mixed-citation>Neph S, Vierstra J, Stergachis AB, et al. An expansive human regulatory lexicon encoded in transcription factor footprints. Nature 2012; 489(7414): 83-90. 
doi: 10.1038/nature11212 PMID: 22955618</mixed-citation></ref><ref id="B3"><label>3.</label><mixed-citation>Yu H, Gerstein M. Genomic analysis of the hierarchical structure of regulatory networks. Proc Natl Acad Sci USA 2006; 103(40): 14724-31. doi: 10.1073/pnas.0508637103 PMID: 17003135</mixed-citation></ref><ref id="B4"><label>4.</label><mixed-citation>Geng H, Jiang R. cAMP receptor protein (CRP)-mediated resistance/tolerance in bacteria: Mechanism and utilization in biotechnology. Appl Microbiol Biotechnol 2015; 99(11): 4533-43. doi: 10.1007/s00253-015-6587-0 PMID: 25913005</mixed-citation></ref><ref id="B5"><label>5.</label><mixed-citation>Lin Z, Zhang Y, Wang J. Engineering of transcriptional regulators enhances microbial stress tolerance. Biotechnol Adv 2013; 31(6): 986-91. doi: 10.1016/j.biotechadv.2013.02.010 PMID: 23473970</mixed-citation></ref><ref id="B6"><label>6.</label><mixed-citation>Papavassiliou KA, Papavassiliou AG. Transcription factor drug targets. J Cell Biochem 2016; 117(12): 2693-6. doi: 10.1002/jcb.25605 PMID: 27191703</mixed-citation></ref><ref id="B7"><label>7.</label><mixed-citation>Seo SW, Kim D, Latif H, O'Brien EJ, Szubin R, Palsson BO. Deciphering Fur transcriptional regulatory network highlights its complex role beyond iron metabolism in Escherichia coli. Nat Commun 2014; 5(1): 4910. doi: 10.1038/ncomms5910 PMID: 25222563</mixed-citation></ref><ref id="B8"><label>8.</label><mixed-citation>Hantke K. Iron and metal regulation in bacteria. Curr Opin Microbiol 2001; 4(2): 172-7. doi: 10.1016/S1369-5274(00)00184-3 PMID: 11282473</mixed-citation></ref><ref id="B9"><label>9.</label><mixed-citation>Pich OQ, Merrell DS. The ferric uptake regulator of Helicobacter pylori: A critical player in the battle for iron and colonization of the stomach. Future Microbiol 2013; 8(6): 725-38. 
doi: 10.2217/fmb.13.43 PMID: 23701330</mixed-citation></ref><ref id="B10"><label>10.</label><mixed-citation>Pohl E, Haller JC, Mijovilovich A, Meyer-Klaucke W, Garman E, Vasil ML. Architecture of a protein central to iron homeostasis: Crystal structure and spectroscopic analysis of the ferric uptake regulator. Mol Microbiol 2003; 47(4): 903-15. doi: 10.1046/j.1365-2958.2003.03337.x PMID: 12581348</mixed-citation></ref><ref id="B11"><label>11.</label><mixed-citation>Sritharan M. Iron and bacterial virulence. Indian J Med Microbiol 2006; 24(3): 163-4. doi: 10.1016/S0255-0857(21)02343-4 PMID: 16912433</mixed-citation></ref><ref id="B12"><label>12.</label><mixed-citation>Cissé C, Mathieu SV, Abeih MBO, et al. Inhibition of the ferric uptake regulator by peptides derived from anti-FUR peptide aptamers: Coupled theoretical and experimental approaches. ACS Chem Biol 2014; 9(12): 2779-86. doi: 10.1021/cb5005977 PMID: 25238402</mixed-citation></ref><ref id="B13"><label>13.</label><mixed-citation>Mathieu S, Cissé C, Vitale S, et al. From peptide aptamers to inhibitors of FUR, bacterial transcriptional regulator of iron homeostasis and virulence. ACS Chem Biol 2016; 11(9): 2519-28. doi: 10.1021/acschembio.6b00360 PMID: 27409249</mixed-citation></ref><ref id="B14"><label>14.</label><mixed-citation>He X, Liao X, Li H, Xia W, Sun H. Bismuth-induced inactivation of ferric uptake regulator from Helicobacter pylori. Inorg Chem 2017; 56(24): 15041-8. doi: 10.1021/acs.inorgchem.7b02380 PMID: 29200284</mixed-citation></ref><ref id="B15"><label>15.</label><mixed-citation>Zhang Y, Ni J, Gao Y. RF‐SVM: Identification of DNA‐binding proteins based on comprehensive feature representation methods and support vector machine. Proteins 2022; 90(2): 395-404. doi: 10.1002/prot.26229 PMID: 34455627</mixed-citation></ref><ref id="B16"><label>16.</label><mixed-citation>Hendrix SG, Chang KY, Ryu Z, Xie ZR. DeepDISE: DNA binding site prediction using a deep learning method. 
Int J Mol Sci 2021; 22(11): 5510. doi: 10.3390/ijms22115510 PMID: 34073705</mixed-citation></ref><ref id="B17"><label>17.</label><mixed-citation>Liu B, Xu J, Lan X, et al. iDNA-Prot⋅dis: Identifying DNA-binding proteins by incorporating amino acid distance-pairs and reduced alphabet profile into the general pseudo amino acid composition. PLoS One 2014; 9(9): e106691. doi: 10.1371/journal.pone.0106691 PMID: 25184541</mixed-citation></ref><ref id="B18"><label>18.</label><mixed-citation>Sang X, Xiao W, Zheng H, Yang Y, Liu T. HMMPred: Accurate prediction of DNA-binding proteins based on HMM profiles and XGBoost feature selection. Comput Math Methods Med 2020; 2020: 1-10. doi: 10.1155/2020/1384749 PMID: 32300371</mixed-citation></ref><ref id="B19"><label>19.</label><mixed-citation>Zhang S, Zhao L, Zheng CH, Xia J. A feature-based approach to predict hot spots in protein-DNA binding interfaces. Brief Bioinform 2020; 21(3): 1038-46. doi: 10.1093/bib/bbz037 PMID: 30957840</mixed-citation></ref><ref id="B20"><label>20.</label><mixed-citation>Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H. The Protein Data Bank. Nucleic Acids Res 2000; 28(1): 235-42.</mixed-citation></ref><ref id="B21"><label>21.</label><mixed-citation>Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar S. MEGA5: Molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol Biol Evol 2011; 28(10): 2731-9. doi: 10.1093/molbev/msr121 PMID: 21546353</mixed-citation></ref><ref id="B22"><label>22.</label><mixed-citation>Humphrey W. VMD: Visual molecular dynamics. J Mol Graph 1996; 14(1): 33-8.</mixed-citation></ref><ref id="B23"><label>23.</label><mixed-citation>Eargle J, Wright D, Luthey-Schulten Z. Multiple alignment of protein structures and sequences for VMD. Bioinformatics 2006; 22(4): 504-6. 
doi: 10.1093/bioinformatics/bti825 PMID: 16339280</mixed-citation></ref><ref id="B24"><label>24.</label><mixed-citation>Osorio D, Rondón-Villarreal P, Torres R. Peptides: A package for data mining of antimicrobial peptides. R J 2015; 7(1): 4-14. doi: 10.32614/RJ-2015-001</mixed-citation></ref><ref id="B25"><label>25.</label><mixed-citation>Meher PK, Sahu TK, Saini V, Rao AR. Predicting antimicrobial peptides with improved accuracy by incorporating the compositional, physico-chemical and structural features into Chou's general PseAAC. Sci Rep 2017; 7(1): 42362. doi: 10.1038/srep42362 PMID: 28205576</mixed-citation></ref><ref id="B26"><label>26.</label><mixed-citation>Schymkowitz J, Borg J, Stricher F, Nys R, Rousseau F, Serrano L. The FoldX web server: An online force field. Nucleic Acids Res 2005; 33(Web Server) (Suppl. 2): W382-8. doi: 10.1093/nar/gki387 PMID: 15980494</mixed-citation></ref><ref id="B27"><label>27.</label><mixed-citation>Chandrashekar G, Sahin F. A survey on feature selection methods. Comput Electr Eng 2014; 40(1): 16-28. doi: 10.1016/j.compeleceng.2013.11.024</mixed-citation></ref><ref id="B28"><label>28.</label><mixed-citation>Berisha V, Krantsevich C, Hahn PR, et al. Digital medicine and the curse of dimensionality. NPJ Digit Med 2021; 4(1): 153. doi: 10.1038/s41746-021-00521-5 PMID: 34711924</mixed-citation></ref><ref id="B29"><label>29.</label><mixed-citation>Chowdhury SY, Shatabda S, Dehzangi A. iDNAProt-ES: Identification of DNA-binding proteins using evolutionary and structural features. Sci Rep 2017; 7(1): 14938. doi: 10.1038/s41598-017-14945-1 PMID: 29097781</mixed-citation></ref><ref id="B30"><label>30.</label><mixed-citation>Lou W, Wang X, Chen F, Chen Y, Jiang B, Zhang H. Sequence based prediction of DNA-binding proteins based on hybrid feature selection using random forest and Gaussian naïve Bayes. PLoS One 2014; 9(1): e86703. doi: 10.1371/journal.pone.0086703 PMID: 24475169</mixed-citation></ref></ref-list></back></article>
