U-AIDA : a customizable system for named entity recognition, classification, and disambiguation

Yosef, Mohamed Amir

Please use this identifier to cite or link to this item: doi:10.22028/D291-25426

Title:	U-AIDA : a customizable system for named entity recognition, classification, and disambiguation
Other Titles:	U-AIDA : ein anpassbares System zur Erkennung, Klassifikation und Disambiguierung benannter Entitäten
Author(s):	Yosef, Mohamed Amir
Language:	English
Year of Publication:	2015
SWD key words:	Multi-Lingual Scholar; Automatische Klassifikation; Information Retrieval
Free key words:	multi-lingual program; automatic classification; information retrieval
DDC notations:	004 Computer science, internet
Publication type:	Dissertation
Abstract:	Recognizing and disambiguating entities such as people, organizations, events or places in natural language text are essential steps for many linguistic tasks such as information extraction and text categorization. A variety of named entity disambiguation methods have been proposed, but most of them focus on Wikipedia as the sole knowledge resource. This focus does not fit all application scenarios, and customization to the respective application domain is crucial. This dissertation addresses the problem of building an easily customizable system for named entity disambiguation. The first contribution is the development of a universal and flexible architecture that supports plugging in different knowledge resources. The second contribution is utilizing the flexible architecture to develop two domain-specific disambiguation systems. The third contribution is the design of a complete pipeline for building disambiguation systems for languages other than English that have few annotated resources, such as Arabic. The fourth contribution is a novel approach that performs fine-grained type classification of names in natural language text.
Das Erkennen und die Disambiguierung von Entitäten wie etwa Personen, Organisationen oder Orte in natürlichsprachigem Text sind wertvolle Hilfsmittel für zahlreiche linguistische Aufgaben. Beispielanwendungen sind Informationsextraktion oder die Kategorisierung von Texten. In diesem Kontext sind eine Vielzahl von Verfahren zur Disambiguierung erforscht worden. Allerdings basieren die meisten dieser Verfahren lediglich auf dem aus Wikipedia extrahierbaren “Wissen”. Diese Fokussierung eignet sich jedoch keineswegs für alle Anwendungsszenarien, weshalb eine Anpassung an die jeweils vorliegende Anwendungsdomäne besonders wichtig ist. Diese Dissertation befasst sich daher mit dem Entwurf eines universell einsetzbaren und individuell konfigurierbaren Systems zur Disambiguierung von Entitätsnamen.
Der erste Beitrag dieser Arbeit ist die Entwicklung einer universell einsatzfähigen und anpassbaren Architektur, die das Einbinden unterschiedlicher Wissensquellen ermöglicht. Darauf aufbauend wird die Flexibilität der vorgestellten Architektur mittels zweier domänenspezifischer Anwendungen belegt. Darüber hinaus wird die Vielseitigkeit des Verfahrens durch den Entwurf eines kompletten Verarbeitungsprozesses für ressourcenarme Sprachen am Beispiel der arabischen Sprache gezeigt. Abschließend wird ein neuartiger Ansatz zur feingranularen Typisierung von benannten Entitäten in natürlichsprachigem Text vorgestellt.
Link to this record:	urn:nbn:de:bsz:291-scidok-63703 hdl:20.500.11880/25482 http://dx.doi.org/10.22028/D291-25426
Advisor:	Weikum, Gerhard
Date of oral examination:	11-Dec-2015
Date of registration:	19-Feb-2016
Faculty:	SE - Sonstige Einrichtungen
Department:	SE - Max-Planck-Institut für Informatik
Collections:	SciDok - Der Wissenschaftsserver der Universität des Saarlandes

Files for this record:

File	Description	Size	Format
mamir_dissertation_with_reviewers.pdf		4.2 MB	Adobe PDF
