Understanding fundamental database operations on modern hardware

Schuh, Stefan

Please use this identifier to cite or link to this item: doi:10.22028/D291-26638

Title:	Understanding fundamental database operations on modern hardware
Other Titles:	Verstehen fundamentaler Datenbank Operationen auf moderner Hardware
Author(s):	Schuh, Stefan
Language:	English
Year of Publication:	2015
SWD key words:	Datenbank Hardware Abfrageverarbeitung
Free key words:	database, query processing, joins, hardware
DDC notations:	004 Computer science, internet
Publication type:	Dissertation
Abstract:	We live in an interconnected digital society, where many companies such as Google, Facebook, and Twitter gather and manage tremendous amounts of data every day. The ongoing rise of mobile computing and the availability of more and more sensor data, e.g. from smart meters, increase the amount of data that is produced every day. Several different architectures have evolved over the years to cope with these vast amounts of data. Traditionally, mainframes were used to handle huge amounts of data. However, the mainframe has to renew itself to allow modern data analytics to be efficient and affordable. Advances in main memory capacity led to the development of in-memory database architectures, which run on many-core non-uniform memory access (NUMA) machines that can handle terabytes of data on a single machine. As another architecture, Google developed MapReduce, a distributed framework for data processing on hundreds or even thousands of commodity machines, to handle data that cannot be stored or processed by a single machine, even if it has a capacity in the range of terabytes. This thesis consists of three independent parts: in three independent projects, we investigate different fundamental database operations on the three hardware environments mentioned above. In the first project we look at recently published relational equi-join algorithms on modern many-core NUMA servers with large main memory capacities and introduce our own variants of those algorithms. Afterwards, in a second project we investigate how to introduce efficient static and adaptive indexing into the open source Hadoop MapReduce framework, which runs on a cluster of commodity machines. In that project we also introduce and investigate the Adaptive Index Replacement problem, a variant of the online Index Selection problem.
Finally, in the third project we investigate how to bring analytical workloads to the IBM System Z mainframe and introduce a new hardware component that allows us to accelerate filtering and aggregation on large in-memory column stores.
Abstract (translated from German):	We live in a networked digital society in which many companies, for example Google, Facebook, and Twitter, collect and manage enormous amounts of data. The continuing growth of mobile computing and the availability of more and more sensor data, for example from smart electricity meters, increase the amount of data generated every day. Various architectures have evolved to cope with this ever-growing flood of data. The oldest architecture for handling large data is the mainframe computer. However, the mainframe has to reinvent itself in order not only to store but also to analyze today's data volumes efficiently and, above all, at an affordable price. Greatly increased main memory capacities have led to the development of a new architecture of in-memory databases, which typically run on modern many-core servers. To offer several terabytes of main memory in a single server, the main memory is attached to multiple CPUs, which leads to non-uniform memory access (NUMA) times, since access times differ between local and remote memory. As a further architecture, Google developed MapReduce, a distributed system for data processing on hundreds or even thousands of servers. This thesis consists of three independent parts, in which we investigate different fundamental database operations in three different hardware environments. In the first part we examine recently published relational equi-joins on modern many-core systems with NUMA and large main memory capacities. In addition, we present our own join algorithms. In the second part we investigate how to integrate efficient static and adaptive indexing into the open source framework Hadoop MapReduce. In this part we also investigate the Adaptive Index Replacement problem, a variant of the Online Index Selection problem. In the third and final part we investigate ways to make the IBM System Z mainframe attractive for data analytics and develop a new hardware component to accelerate filter and aggregation queries on large in-memory column stores.
Link to this record:	urn:nbn:de:bsz:291-scidok-63326 hdl:20.500.11880/26694 http://dx.doi.org/10.22028/D291-26638
Advisor:	Dittrich, Jens
Date of oral examination:	18-Dec-2015
Date of registration:	23-Dec-2015
Faculty:	MI - Fakultät für Mathematik und Informatik
Department:	MI - Informatik
Collections:	SciDok - Der Wissenschaftsserver der Universität des Saarlandes

Files for this record:

File	Description	Size	Format
ThesisMain.pdf		3.32 MB	Adobe PDF
