Please use this identifier to cite or link to this item: doi:10.22028/D291-41693
Title: MetaProFi: an ultrafast chunked Bloom filter for storing and querying protein and nucleotide sequence data for accurate identification of functionally relevant genetic variants
Author(s): Srikakulam, Sanjay K.
Keller, Sebastian
Dabbaghie, Fawaz
Bals, Robert
Kalinina, Olga V.
Language: English
Journal: Bioinformatics
Volume: 39
Issue: 3
Publisher/Platform: Oxford University Press
Year of Publication: 2023
DDC notations: 610 Medicine and health
Publication type: Journal Article
Abstract: Motivation: Bloom filters are a popular data structure that allows rapid searches in large sequence datasets. So far, all tools work with nucleotide sequences; however, protein sequences are conserved over longer evolutionary distances, and only mutations on the protein level may have any functional significance. Results: We present MetaProFi, a Bloom filter-based tool that, for the first time, offers the functionality to build indexes of amino acid sequences and query them with both amino acid and nucleotide sequences, thus bringing sequence comparison to the biologically relevant protein level. MetaProFi implements additional efficient engineering solutions, such as a shared memory system, chunked data storage and efficient compression. In addition to its conceptual novelty, MetaProFi demonstrates state-of-the-art performance and excellent memory consumption-to-speed ratio when applied to various large datasets. Availability and implementation: Source code in Python is available at Contact:
DOI of the first publication: 10.1093/bioinformatics/btad101
URL of the first publication:
Link to this record: urn:nbn:de:bsz:291--ds-416933
ISSN: 1367-4811
Date of registration: 1-Mar-2024
Faculty: M - Medizinische Fakultät
Department: M - Innere Medizin
M - Medizinische Biometrie, Epidemiologie und medizinische Informatik
Professorship: M - Prof. Dr. Robert Bals
M - Prof. Dr. Olga Kalinina
Collections: SciDok - Der Wissenschaftsserver der Universität des Saarlandes

Files for this record:
File         Description  Size     Format
btad101.pdf               1.42 MB  Adobe PDF

This item is licensed under a Creative Commons License.