Please use this identifier to cite or link to this resource: doi:10.22028/D291-41694
Title: PanPA: generation and alignment of panproteome graphs
Author(s): Dabbaghie, Fawaz
Srikakulam, Sanjay K.
Marschall, Tobias
Kalinina, Olga V.
Language: English
Journal: Bioinformatics Advances
Volume: 3
Issue: 1
Publisher/Platform: Oxford University Press
Year of publication: 2023
DDC subject group: 610 Medicine and health
Document type: Journal article
Abstract: Motivation: Compared to eukaryotes, prokaryote genomes are more diverse through different mechanisms, including a higher mutation rate and horizontal gene transfer. Therefore, using a linear representative reference can cause a reference bias. Graph-based pangenome methods have been developed to tackle this problem. However, comparisons in DNA space are still challenging due to this high diversity. In contrast, amino acid sequences have higher similarity due to evolutionary constraints, whereby a single amino acid may be encoded by several synonymous codons. Coding regions cover the majority of the genome in prokaryotes. Thus, panproteomes present an attractive alternative leveraging the higher sequence similarity while not losing much of the genome in non-coding regions. Results: We present PanPA, a method that takes a set of multiple sequence alignments of protein sequences, indexes them, and builds a graph for each multiple sequence alignment. In the querying step, it can align DNA or amino acid sequences back to these graphs. We first showcase that PanPA generates correct alignments on a panproteome from 1350 Escherichia coli. To demonstrate that panproteomes allow comparisons at longer phylogenetic distances, we compare DNA and protein alignments from 1073 Salmonella enterica assemblies against the E. coli reference genome, pangenome, and panproteome using BWA, GraphAligner, and PanPA, respectively, with PanPA aligning around 22% more sequences. We also aligned a DNA short-read whole genome sequencing (WGS) sample from S. enterica against the E. coli reference with BWA and against the panproteome with PanPA, where PanPA was able to find alignments for 68% of the reads compared to 5% with BWA.
DOI of first publication: 10.1093/bioadv/vbad167
URL of first publication: https://doi.org/10.1093/bioadv/vbad167
Link to this record: urn:nbn:de:bsz:291--ds-416940
hdl:20.500.11880/37322
http://dx.doi.org/10.22028/D291-41694
ISSN: 2635-0041
Date of entry: 1-Mar-2024
Description of the related object: Supplementary data
Related object: https://oup.silverchair-cdn.com/oup/backfile/Content_public/Journal/bioinformaticsadvances/3/1/10.1093_bioadv_vbad167/1/vbad167_supplementary_data.pdf?Expires=1711486788&Signature=EQqeQ4OC~4oG13EX6lCpRjKPHpwSfmXbs8vVT-RSgxj4GSMBiyKsGHM4TD0Ur5Zd0csBfrPMTNI9J2fwxJN31dOE5g~ZSwf6jSYlQs0RPcNJ8RRgJMxsN4IaCuKq~4sAo4jYIqvIRFUhlwx1WNw4r7iU5dh4gvsIfZNasKXS711reJaF9zZlMMOSLiWvD9ilU5UiTUR7Ie25eZeYeBiAC1svXxq5WNwTWBdSzmjXzjCdZ5yYtaGia1Eb4C18l3SYv-vJ9wwzWzR~vfLC3HIjD0D4P4TayHKgFYWjsSZpu8lhBllfNwzu3E9bVFVGMzB2SclbsYBqspJmhKNgSXEFHw__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA
Faculty: M - Faculty of Medicine
MI - Faculty of Mathematics and Computer Science
Department: M - Medical Biometry, Epidemiology and Medical Informatics
MI - Computer Science
Professorship: M - Prof. Dr. Olga Kalinina
MI - Prof. Dr. Tobias Marschall
Collection: SciDok - The Science Server of Saarland University

Files in this record:
File          Description   Size      Format
vbad167.pdf                 1.65 MB   Adobe PDF


This resource is published under the following copyright terms: Creative Commons license