Identifying optimal substrate classes of membrane transporters

Denger, Andreas; Helms, Volkhard

Please use this identifier to cite or link to this item: doi:10.22028/D291-44695

Title:	Identifying optimal substrate classes of membrane transporters
Author(s):	Denger, Andreas; Helms, Volkhard
Language:	English
Journal:	PLOS ONE
Volume:	19
Issue:	12
Publisher/Platform:	PLOS
Year of Publication:	2024
DDC notations:	500 Science
Publication type:	Journal Article
Abstract:	Membrane transporters are responsible for moving a wide variety of molecules across biological membranes, making them integral to key biological pathways in all organisms. Identifying all membrane transporters within a (meta-)proteome, along with their specific substrates, provides important information for various research fields, including biotechnology, pharmacology, and metabolomics. Protein datasets are frequently annotated with thousands of molecular functions that form complex networks, often with partial or full redundancy and hierarchical relationships. This complexity, along with the low sample count for more specific functions, makes them unsuitable as classes for supervised learning methods, meaning that the creation of an optimal subset of annotations is required. However, selection of this subset requires extensive manual effort, along with knowledge about the biology behind the respective functions. Here, we present an automated pipeline to address this problem. Unlike previous approaches for reducing redundancy in GO datasets, we employ machine learning to identify a subset of functional annotations in a training dataset. Classes in the resulting predictive model meet four essential criteria: sufficient sample size for training predictive models, minimal redundancy, strong class separability, and relevance to substrate transport. Furthermore, we implemented a pipeline for creating training datasets of transmembrane transporters that cover a wide range of organisms, including plants, bacteria, mammals, and single-cell eukaryotes. For a dataset containing 98.1% of transporters from S. cerevisiae, the pipeline automatically reduced the number of functional annotations from 287 to 11 GO terms that could be classified with a median pairwise F1 score of 0.87±0.16. For a meta-organism dataset containing 96% of all transport proteins from S. cerevisiae, A. thaliana, E. coli and human, the number of classes was reduced from 695 to 49, with a median F1 score of 0.92±0.10 between pairs of GO terms. When lowering the percentage of covered proteins down to 67%, the pipeline found a subset of 30 GO terms with a median F1 score of 0.95±0.06.
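The quantities named in the abstract (minimum sample size per class, fraction of proteins covered, and median pairwise F1 score) can be illustrated with a small sketch. This is not the authors' pipeline: the function names, the toy term labels, and the protein identifiers below are hypothetical and serve only to show how such numbers might be computed.

```python
from statistics import median


def f1_score(y_true, y_pred, positive):
    """F1 score for one class, treating `positive` as the positive label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def filter_terms(annotations, min_samples):
    """Keep only terms annotated to at least `min_samples` proteins."""
    return {term: prots for term, prots in annotations.items()
            if len(prots) >= min_samples}


def coverage(annotations, all_proteins):
    """Fraction of `all_proteins` annotated with at least one retained term."""
    covered = set()
    for prots in annotations.values():
        covered |= prots
    return len(covered & all_proteins) / len(all_proteins)


def median_pairwise_score(pair_scores):
    """Median over a mapping {(term_a, term_b): F1 score}."""
    return median(pair_scores.values())


# Toy data (hypothetical labels, not real GO terms or proteins).
annotations = {
    "term_A": {"p1", "p2", "p3"},
    "term_B": {"p3", "p4", "p5"},
    "term_C": {"p6"},  # too few samples, dropped below
}
all_proteins = {"p1", "p2", "p3", "p4", "p5", "p6"}

kept = filter_terms(annotations, min_samples=2)
cov = coverage(kept, all_proteins)

# F1 for a single pair of classes from made-up predictions.
y_true = ["term_A", "term_A", "term_B", "term_B"]
y_pred = ["term_A", "term_B", "term_B", "term_B"]
f1_a = f1_score(y_true, y_pred, positive="term_A")
f1_b = f1_score(y_true, y_pred, positive="term_B")

# Median over made-up pairwise scores.
med = median_pairwise_score({("a", "b"): 0.8, ("a", "c"): 0.9, ("b", "c"): 1.0})
```

In this toy setting, dropping the single-protein term lowers coverage, which mirrors the trade-off the abstract reports between the percentage of covered proteins and class separability.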
DOI of the first publication:	10.1371/journal.pone.0315330
URL of the first publication:	https://doi.org/10.1371/journal.pone.0315330
Link to this record:	urn:nbn:de:bsz:291--ds-446958 hdl:20.500.11880/39811 http://dx.doi.org/10.22028/D291-44695
ISSN:	1932-6203
Date of registration:	18-Mar-2025
Description of the related object:	Supporting information
Related object:	https://doi.org/10.1371/journal.pone.0315330.s001 https://doi.org/10.1371/journal.pone.0315330.s002 https://doi.org/10.1371/journal.pone.0315330.s003 https://doi.org/10.1371/journal.pone.0315330.s004 https://doi.org/10.1371/journal.pone.0315330.s005 https://doi.org/10.1371/journal.pone.0315330.s006 https://doi.org/10.1371/journal.pone.0315330.s007 https://doi.org/10.1371/journal.pone.0315330.s008 https://doi.org/10.1371/journal.pone.0315330.s009 https://doi.org/10.1371/journal.pone.0315330.s010 https://doi.org/10.1371/journal.pone.0315330.s011 https://doi.org/10.1371/journal.pone.0315330.s012 https://doi.org/10.1371/journal.pone.0315330.s013 https://doi.org/10.1371/journal.pone.0315330.s014
Faculty:	NT - Naturwissenschaftlich-Technische Fakultät
Department:	NT - Biowissenschaften
Professorship:	NT - Prof. Dr. Volkhard Helms
Collections:	SciDok - Der Wissenschaftsserver der Universität des Saarlandes

Files for this record:

File	Description	Size	Format
journal.pone.0315330.pdf		2.06 MB	Adobe PDF

This item is licensed under a Creative Commons License