Toward better computation models for modern machines

Jurkiewicz, Tomasz

Please use this identifier to cite or link to this item: doi:10.22028/D291-26543

Title:	Toward better computation models for modern machines
Other Titles:	Zu besseren Berechnungsmodellen für moderne Rechner
Author(s):	Jurkiewicz, Tomasz
Language:	English
Year of Publication:	2013
SWD key words:	Informatik Modell Konvexe Hülle Virtueller Speicher Peripherer Speicher Algorithmus
Free key words:	mathematics computer science convex hull multicore algorithms parallel algorithms external memory CUDA GPGPU
DDC notations:	004 Computer science, internet
Publikation type:	Dissertation
Abstract:	Modern computers are not random access machines (RAMs). They have a memory hierarchy, multiple cores, and a virtual memory. We address the computational cost of the address translation in the virtual memory and difficulties in design of parallel algorithms on modern many-core machines. Starting point for our work on virtual memory is the observation that the analysis of some simple algorithms (random scan of an array, binary search, heapsort) in either the RAM model or the EM model (external memory model) does not correctly predict growth rates of actual running times. We propose the VAT model (virtual address translation) to account for the cost of address translations and analyze the algorithms mentioned above and others in the model. The predictions agree with the measurements. We also analyze the VAT-cost of cache-oblivious algorithms. In the second part of the paper we present a case study of the design of an efficient 2D convex hull algorithm for GPUs. The algorithm is based on the ultimate planar convex hull algorithm of Kirkpatrick and Seidel, and it has been referred to as the first successful implementation of the QuickHull algorithm on the GPU by Gao et al. in their 2012 paper on the 3D convex hull. Our motivation for work on modern many-core machines is the general belief of the engineering community that the theory does not produce applicable results, and that the theoretical researchers are not aware of the difficulties that arise while adapting algorithms for practical use. We concentrate on showing how the high degree of parallelism available on GPUs can be applied to problems that do not readily decompose into many independent tasks. Moderne Computer sind keine Random Access Machines (RAMs), da ihr Speicher hierarchisch ist und sie sowohl mehrere Rechenkerne als auch virtuellen Speicher benutzen. Wir betrachten die Kosten von Adressübersetzungen in virtuellem Speicher und die Schwierigkeiten beim Entwurf nebenläufiger Algorithmen für moderne Mehrkernprozessoren. Am Anfang unserer Arbeit über virtuellen Speicher steht die Beobachtung, dass die Analyse einiger einfacher Algorithmen (zufällige Zugriffe in einem Array, Binärsuche, Heapsort) sowohl im RAM Modell als auch im EM (Modell für externen Speicher) die tatsächlichen asymptotischen Laufzeiten nicht korrekt wiedergibt. Um auch die Kosten der Adressübersetzung mit in die Analyse aufzunehmen, definieren wir das sogenannte VAT Modell (virtual address translation) und benutzen es, um die oben genannten Algorithmen zu analysieren. In diesem Modell stimmen die theoretischen Laufzeiten mit den Messungen aus der Praxis überein. Zudem werden die Kosten von Cache-oblivious Algorithmen im VAT Modell untersucht. Der zweite Teil der Arbeit behandelt eine Fallstudie zur Implementierung eines effizienten Algorithmus zur Berechnung von konvexen Hüllen in 2D auf GPUs (Graphics Processing Units). Der Algorithmus basiert auf dem ultimate planar convex hull algorithm von Kirkpatrick und Seidel und wurde 2012 von Gao et al. in ihrer Veröffentlichung über konvexe Hüllen in 3D als die erste erfolgreiche Implementierung des QuickHull-Algorithmus auf GPUs bezeichnet. Motiviert wird diese Arbeit durch den generellen Glauben der IT-Welt, dass Resultate aus der theoretischen Informatik nicht immer auf Probleme in der Praxis anwendbar sind und dass oft nicht auf die speziellen Anforderungen und Probleme eingegangen wird, die mit einer Implementierung eines Algorithmus einhergehen. Wir zeigen, wie der hohe Grad an Parallelität, der auf GPUs verfügbar ist, für Probleme nutzbar gemacht werden kann, für die eine Zerlegung in unabhängige Teilprobleme nicht offensichtlich ist.
Link to this record:	urn:nbn:de:bsz:291-scidok-55407 hdl:20.500.11880/26599 http://dx.doi.org/10.22028/D291-26543
Advisor:	Mehlhorn, Kurt
Date of oral examination:	30-Oct-2013
Date of registration:	7-Nov-2013
Faculty:	MI - Fakultät für Mathematik und Informatik
Department:	MI - Informatik
Collections:	SciDok - Der Wissenschaftsserver der Universität des Saarlandes

Files for this record:

File	Description	Size	Format
final_dissertation.pdf		1,24 MB	Adobe PDF	View/Open

Export: BibTex