Clustering with neighborhood graphs

Maier, Markus Martin

Please use this identifier to cite or link to this item: doi:10.22028/D291-25967

Title:	Clustering with neighborhood graphs
Author(s):	Maier, Markus Martin
Language:	English
Year of Publication:	2009
SWD key words:	Graph Cluster-Analyse
Free key words:	Graphclustering Nachbarschaftsgraph Clusteridentifizierung graph clustering neighborhood graph cluster identification
DDC notations:	004 Computer science, internet
Publikation type:	Dissertation
Abstract:	Graph clustering methods are defined for general weighted graphs. If data is given in the form of points and distances between them, a neighborhood graph, such as the r-graph or kNN-graphs, is constructed and graph clustering is applied to this graph. We investigate the influence of the type and parameter of the neighborhood graph on the clustering results, when n sample points are drawn independently from a density in Euclidean space. In Chapter 2 we study "cluster identification';: the true clusters are the connected components of density level sets and a cluster is identified if its points are a connected component of the graph. We compare (modifications of) the mutual and the symmetric kNN-graph. They behave differently if the goal is to identify the "most significant'; clusters, whereas there is no difference if the goal is to identify all clusters. We give the range of k for which the clusters are identified in the graphs and derive the optimal choice of k, which, surprisingly, is linear in n. In Chapter 3 we study the convergence of the normalized cut (Ncut) and the ratio cut as n -> for cuts in the kNN- and the r-graph induced by a hyperplane. The limits differ; consequently Ncut on a kNN-graph does something systematically different than Ncut on an r-graph! This can be experimentally observed on toy and real data sets. Therefore, graph clustering criteria cannot be studied independently of the type of graph to which they are applied. Graphclustering ist für gewichtete Graphen definiert. Liegen Daten jedoch in Form von Punkten und Abständen zwischen ihnen vor, wird zuerst ein Nachbarschaftsgraph wie der r-Graph oder kNN-Graphen konstruiert, auf den dann Graphclustering angewandt wird. In dieser Arbeit wird der Einfluss des Nachbarschaftsgraphen auf die Clusteringergebnisse untersucht, wenn n Punkte unabhängig voneinander von einer Dichte im euklidischen Raum gezogen werden. In Kapitel 2 wird das Problem der "Clusteridentifizierung'; betrachtet: die Cluster sind die Zusammenhangskomponenten einer Dichteniveaumenge. Ein Cluster wird identifiziert, wenn seine Punkte eine Zusammenhangskomponente des Graphen bilden. Modifikationen verschiedener kNN-Graphen werden verglichen. Sollen nur die "signifikantesten'; Cluster gefunden werden, unterscheidet sich ihr erhalten, nicht jedoch für die Identifizierung aller Cluster. Es wird gezeigt, für welche k die Cluster identifiziert werden und dass die optimale Wahl von k linear in n ist. In Kapitel 3 wird die Konvergenz der Kriterien "normalized cut'; (Ncut) und "ratio cut'; für Schnitte im kNN- und r-Graphen, die von einer Hyperebene induziert werden, gezeigt. Die Grenzwerte unterscheiden sich. Folglich bewirkt Ncut auf einem kNNGraphen etwas anderes als Ncut auf einem r-Graphen. Dieser Effekt kann experimentell beobachtet werden. Daraus folgt, dass Graphclusteringkriterien nicht getrennt vom Graphtyp betrachtet werden können.
Link to this record:	urn:nbn:de:bsz:291-scidok-29661 hdl:20.500.11880/26023 http://dx.doi.org/10.22028/D291-25967
Advisor:	Hein, Matthias
Date of oral examination:	22-Feb-2010
Date of registration:	6-May-2010
Faculty:	MI - Fakultät für Mathematik und Informatik
Department:	MI - Informatik
Collections:	SciDok - Der Wissenschaftsserver der Universität des Saarlandes

Files for this record:

File	Description	Size	Format
Dissertation_8319_Maie_Mark_2010.pdf		905,98 kB	Adobe PDF	View/Open

Export: BibTex