Please use this identifier to cite or link to this item: doi:10.22028/D291-44695
Title: Identifying optimal substrate classes of membrane transporters
Author(s): Denger, Andreas; Helms, Volkhard
Language: English
Journal: PLOS ONE
Volume: 19
Issue: 12
Publisher/Platform: PLOS
Year of Publication: 2024
DDC notations: 500 Science
Publication type: Journal Article
Abstract: Membrane transporters are responsible for moving a wide variety of molecules across biological membranes, making them integral to key biological pathways in all organisms. Identifying all membrane transporters within a (meta-)proteome, along with their specific substrates, provides important information for various research fields, including biotechnology, pharmacology, and metabolomics. Protein datasets are frequently annotated with thousands of molecular functions that form complex networks, often with partial or full redundancy and hierarchical relationships. This complexity, along with the low sample count for more specific functions, makes them unsuitable as classes for supervised learning methods, meaning that the creation of an optimal subset of annotations is required. However, selection of this subset requires extensive manual effort, along with knowledge about the biology behind the respective functions. Here, we present an automated pipeline to address this problem. Unlike previous approaches for reducing redundancy in GO datasets, we employ machine learning to identify a subset of functional annotations in a training dataset. Classes in the resulting predictive model meet four essential criteria: sufficient sample size for training predictive models, minimal redundancy, strong class separability, and relevance to substrate transport. Furthermore, we implemented a pipeline for creating training datasets of transmembrane transporters that cover a wide range of organisms, including plants, bacteria, mammals, and single-cell eukaryotes. For a dataset containing 98.1% of transporters from S. cerevisiae, the pipeline automatically reduced the number of functional annotations from 287 to 11 GO terms that could be classified with a median pairwise F1 score of 0.87±0.16. For a meta-organism dataset containing 96% of all transport proteins from S. cerevisiae, A. thaliana, E. coli and human, the number of classes was reduced from 695 to 49, with a median F1 score of 0.92±0.10 between pairs of GO terms. When lowering the percentage of covered proteins down to 67%, the pipeline found a subset of 30 GO terms with a median F1 score of 0.95±0.06.
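The abstract names the class-selection criteria but not the algorithm that enforces them. As a minimal illustration only (not the authors' implementation), the Python sketch below covers two of the four criteria. It applies a pre-filter for sufficient sample size and low redundancy, where redundancy is assumed to be measured as the Jaccard overlap of the protein sets annotated with two GO terms. It also computes a median pairwise F1 score for class separability, here estimated with a simple leave-one-out nearest-centroid classifier on assumed per-protein feature vectors. The function names, thresholds (`min_samples`, `max_jaccard`), the classifier and the toy data are all hypothetical; relevance to substrate transport is not modeled.

```python
# Hypothetical sketch, NOT the pipeline from the paper: input formats, thresholds,
# the Jaccard redundancy measure and the nearest-centroid classifier are assumptions.
from itertools import combinations
from math import dist
from statistics import median


def filter_terms(annotations, min_samples=20, max_jaccard=0.5):
    """Keep GO terms with enough proteins that are not redundant with a kept term.

    annotations: dict mapping a GO term to the set of protein IDs annotated with it.
    """
    kept = []
    # Visit well-populated terms first so they win over smaller, overlapping ones.
    for term in sorted(annotations, key=lambda t: (-len(annotations[t]), t)):
        proteins = annotations[term]
        if len(proteins) < min_samples:
            continue  # too few samples to train a classifier on this term
        if any(
            len(proteins & annotations[k]) / len(proteins | annotations[k]) > max_jaccard
            for k in kept
        ):
            continue  # protein set overlaps too much with an already kept term
        kept.append(term)
    return kept


def f1(y_true, y_pred, positive):
    """F1 score of one class, computed as 2*TP / (2*TP + FP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def pairwise_f1(term_a, term_b, annotations, features):
    """Leave-one-out nearest-centroid F1 for separating two terms (None if too few samples).

    features: dict mapping a protein ID to a numeric feature vector (assumed given).
    """
    # Only proteins annotated with exactly one of the two terms are used.
    only_a = sorted(annotations[term_a] - annotations[term_b])
    only_b = sorted(annotations[term_b] - annotations[term_a])
    if len(only_a) < 2 or len(only_b) < 2:
        return None  # each class needs >= 2 proteins for leave-one-out
    samples = [(p, term_a) for p in only_a] + [(p, term_b) for p in only_b]
    y_true, y_pred = [], []
    for i, (protein, label) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]  # hold out the current protein
        centroids = {
            term: centroid([features[p] for p, lab in train if lab == term])
            for term in (term_a, term_b)
        }
        y_true.append(label)
        y_pred.append(min(centroids, key=lambda t: dist(features[protein], centroids[t])))
    # Macro-average both classes so the score does not depend on argument order.
    return (f1(y_true, y_pred, term_a) + f1(y_true, y_pred, term_b)) / 2


def median_pairwise_f1(terms, annotations, features):
    """Median of the pairwise scores over all term pairs that have enough samples."""
    scores = [pairwise_f1(a, b, annotations, features) for a, b in combinations(terms, 2)]
    scores = [s for s in scores if s is not None]
    return median(scores) if scores else None


# Toy data with made-up term names and 2-D feature vectors.
toy_annotations = {
    "term_A": {"p1", "p2", "p3"},
    "term_B": {"p4", "p5", "p6"},
    "term_C": {"p1", "p2", "p3", "p7"},
}
toy_features = {
    "p1": (0.0, 0.0), "p2": (0.1, 0.0), "p3": (0.0, 0.1), "p7": (0.05, 0.05),
    "p4": (5.0, 5.0), "p5": (5.1, 5.0), "p6": (5.0, 5.1),
}

if __name__ == "__main__":
    kept = filter_terms(toy_annotations, min_samples=3, max_jaccard=0.5)
    print(kept)  # ['term_C', 'term_B']
    print(median_pairwise_f1(kept, toy_annotations, toy_features))  # 1.0
```

On the toy data, `term_A` is dropped because its proteins form a large share of those annotated with `term_C`, and the two remaining terms are perfectly separable. A practical pipeline would likely replace the nearest-centroid step with a trained classifier and proper cross-validation.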
DOI of the first publication: 10.1371/journal.pone.0315330
URL of the first publication: https://doi.org/10.1371/journal.pone.0315330
Link to this record: urn:nbn:de:bsz:291--ds-446958
hdl:20.500.11880/39811
http://dx.doi.org/10.22028/D291-44695
ISSN: 1932-6203
Date of registration: 18-Mar-2025
Description of the related object: Supporting information
Related object: https://doi.org/10.1371/journal.pone.0315330.s001
https://doi.org/10.1371/journal.pone.0315330.s002
https://doi.org/10.1371/journal.pone.0315330.s003
https://doi.org/10.1371/journal.pone.0315330.s004
https://doi.org/10.1371/journal.pone.0315330.s005
https://doi.org/10.1371/journal.pone.0315330.s006
https://doi.org/10.1371/journal.pone.0315330.s007
https://doi.org/10.1371/journal.pone.0315330.s008
https://doi.org/10.1371/journal.pone.0315330.s009
https://doi.org/10.1371/journal.pone.0315330.s010
https://doi.org/10.1371/journal.pone.0315330.s011
https://doi.org/10.1371/journal.pone.0315330.s012
https://doi.org/10.1371/journal.pone.0315330.s013
https://doi.org/10.1371/journal.pone.0315330.s014
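Faculty: NT - Naturwissenschaftlich-Technische Fakultät (Faculty of Natural Sciences and Technology)
Department: NT - Biowissenschaften (Biosciences)
Professorship: NT - Prof. Dr. Volkhard Helms
Collections: SciDok - Der Wissenschaftsserver der Universität des Saarlandes (SciDok, the scientific publication server of Saarland University)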

Files for this record:
journal.pone.0315330.pdf (2.06 MB, Adobe PDF)


This item is licensed under a Creative Commons License.