Please use this identifier to cite or link to this resource: doi:10.22028/D291-44695
Title: Identifying optimal substrate classes of membrane transporters
Author(s): Denger, Andreas
Helms, Volkhard
Language: English
Journal title: PLOS ONE
Volume: 19
Issue: 12
Publisher/Platform: PLOS
Year of publication: 2024
DDC subject group: 500 Natural sciences
Document type: Journal article
Abstract: Membrane transporters are responsible for moving a wide variety of molecules across biological membranes, making them integral to key biological pathways in all organisms. Identifying all membrane transporters within a (meta-)proteome, along with their specific substrates, provides important information for various research fields, including biotechnology, pharmacology, and metabolomics. Protein datasets are frequently annotated with thousands of molecular functions that form complex networks, often with partial or full redundancy and hierarchical relationships. This complexity, along with the low sample count for more specific functions, makes them unsuitable as classes for supervised learning methods, meaning that the creation of an optimal subset of annotations is required. However, selection of this subset requires extensive manual effort, along with knowledge about the biology behind the respective functions. Here, we present an automated pipeline to address this problem. Unlike previous approaches for reducing redundancy in GO datasets, we employ machine learning to identify a subset of functional annotations in a training dataset. Classes in the resulting predictive model meet four essential criteria: sufficient sample size for training predictive models, minimal redundancy, strong class separability, and relevance to substrate transport. Furthermore, we implemented a pipeline for creating training datasets of transmembrane transporters that cover a wide range of organisms, including plants, bacteria, mammals, and single-cell eukaryotes. For a dataset containing 98.1% of transporters from S. cerevisiae, the pipeline automatically reduced the number of functional annotations from 287 to 11 GO terms that could be classified with a median pairwise F1 score of 0.87±0.16. For a meta-organism dataset containing 96% of all transport proteins from S. cerevisiae, A. thaliana, E. coli and human, the number of classes was reduced from 695 to 49, with a median F1 score of 0.92±0.10 between pairs of GO terms. When lowering the percentage of covered proteins down to 67%, the pipeline found a subset of 30 GO terms with a median F1 score of 0.95±0.06.
DOI of the original publication: 10.1371/journal.pone.0315330
URL of the original publication: https://doi.org/10.1371/journal.pone.0315330
Link to this record: urn:nbn:de:bsz:291--ds-446958
hdl:20.500.11880/39811
http://dx.doi.org/10.22028/D291-44695
ISSN: 1932-6203
Date of entry: 18-Mar-2025
Name of the related object: Supporting information
Related object: https://doi.org/10.1371/journal.pone.0315330.s001
https://doi.org/10.1371/journal.pone.0315330.s002
https://doi.org/10.1371/journal.pone.0315330.s003
https://doi.org/10.1371/journal.pone.0315330.s004
https://doi.org/10.1371/journal.pone.0315330.s005
https://doi.org/10.1371/journal.pone.0315330.s006
https://doi.org/10.1371/journal.pone.0315330.s007
https://doi.org/10.1371/journal.pone.0315330.s008
https://doi.org/10.1371/journal.pone.0315330.s009
https://doi.org/10.1371/journal.pone.0315330.s010
https://doi.org/10.1371/journal.pone.0315330.s011
https://doi.org/10.1371/journal.pone.0315330.s012
https://doi.org/10.1371/journal.pone.0315330.s013
https://doi.org/10.1371/journal.pone.0315330.s014
Faculty: NT - Naturwissenschaftlich-Technische Fakultät
Department: NT - Biowissenschaften
Professorship: NT - Prof. Dr. Volkhard Helms
Collection: SciDok - Der Wissenschaftsserver der Universität des Saarlandes

Files in this record:
File: journal.pone.0315330.pdf
Size: 2.06 MB
Format: Adobe PDF


This resource is published under the following copyright terms: Creative Commons license