Understanding and managing the performance variation and data growth in cloud computing

Schad, Jörg

Please use this identifier to cite or link to this item: doi:10.22028/D291-26707

Title:	Understanding and managing the performance variation and data growth in cloud computing
Other Titles:	Verstehen und Verwalten der Performance Variation und Datenwachstum in Cloud Computing
Author(s):	Schad, Jörg
Language:	English
Year of Publication:	2015
SWD key words:	Cloud Computing Mehrrechnersystem Rechenkapazität Variation
Free key words:	cloud computing MapReduce data processing
DDC notations:	004 Computer science, internet
Publication type:	Dissertation
Abstract:	The topics of Cloud Computing and Big Data Analytics dominate today's IT landscape. This dissertation considers the combination of both and the resulting challenges. In particular, it addresses executing data-intensive jobs efficiently on public cloud infrastructure, with respect to response time, cost, and reproducibility. We present an extensive study of performance variance in public cloud infrastructures covering various dimensions, including micro-benchmarks and application benchmarks, different points in time, and different cloud vendors. The study shows that performance variance is a problem for cloud users even at the application level. The dissertation then provides guidelines and tools for countering the effects of performance variance. Next, it addresses the challenge of efficiently processing dynamic datasets. Dynamic datasets, i.e., datasets which change over time, are a challenge for standard MapReduce Big Data Analytics, as they require the entire dataset to be reprocessed after every change. We present a framework to deal efficiently with dynamic datasets inside MapReduce using different techniques depending on the characteristics of the dataset. The results show that we can significantly reduce reprocessing time for most use cases. This dissertation concludes with a discussion of how new technologies such as container virtualization will affect the challenges presented here.
Abstract (translated from German):	Cloud computing and the processing of large volumes of data are ubiquitous topics in today's IT landscape. This dissertation deals with the combination of these two technologies. In particular, it considers the problems of processing large volumes of data efficiently and reproducibly within public cloud offerings. The problem of variable compute performance in public cloud offerings is examined in an extensive study. The study shows that the variance in compute performance has an effect at various levels, up to the application. We discuss approaches to reducing this variance and present several algorithms for achieving homogeneous compute performance in a cluster. Processing large, dynamic datasets with today's MapReduce-based systems is relatively costly, because changes to the data require the entire dataset to be recomputed. Such recomputations can quickly lead to high costs, especially in cloud environments. This dissertation presents several algorithms for efficiently processing such dynamic datasets, and a system that automatically selects the optimal algorithm for the dataset at hand. We conclude with a discussion of the influence that new technologies such as Docker have on the problems presented here.
Link to this record:	urn:nbn:de:bsz:291-scidok-68235 hdl:20.500.11880/26763 http://dx.doi.org/10.22028/D291-26707
Advisor:	Dittrich, Jens
Date of oral examination:	15-Apr-2016
Date of registration:	12-Apr-2017
Faculty:	MI - Fakultät für Mathematik und Informatik
Department:	MI - Informatik
Collections:	SciDok - Der Wissenschaftsserver der Universität des Saarlandes

Files for this record:

File	Description	Size	Format
Final_Version_Schad.pdf		4.12 MB	Adobe PDF
