Human computing and crowdsourcing methods for knowledge acquisition

Kondreddi, Sarath Kumar

Please use this identifier to cite or link to this item: doi:10.22028/D291-26564

Title:	Human computing and crowdsourcing methods for knowledge acquisition
Other Titles:	Human Computing und Crowdsourcing-Methoden zur Informationsextraktion
Author(s):	Kondreddi, Sarath Kumar
Language:	English
Year of Publication:	2014
SWD key words:	Information Extraction KADS Wissensbanksystem Wissenserwerb Open Innovation
Free key words:	Human Computing Crowdsourcing Information Extraction Knowledge Acquisition KBMS
DDC notations:	004 Computer science, internet
Publication type:	Dissertation
Abstract:	Ambiguity, complexity, and diversity in natural language textual expressions are major hindrances to automated knowledge extraction. As a result, state-of-the-art methods for extracting entities and relationships from unstructured data make incorrect extractions or produce noise. With the advent of human computing, computationally hard tasks have been addressed through human inputs. While text-based knowledge acquisition can benefit from this approach, humans alone cannot bear the burden of extracting knowledge from the vast textual resources that exist today. Even making payments for crowdsourced acquisition can quickly become prohibitively expensive. In this thesis we present principled methods that effectively garner human computing inputs for improving the extraction of knowledge-base facts from natural language texts. Our methods complement automatic extraction techniques with human computing to reap benefits of both while overcoming each other’s limitations. We present the architecture and implementation of Higgins, a system that combines an information extraction (IE) engine with a human computing (HC) engine to produce high-quality facts. The IE engine combines statistics derived from large Web corpora with semantic resources like WordNet and ConceptNet to construct a large dictionary of entity and relational phrases. It employs specifically designed statistical language models for phrase relatedness to come up with questions and relevant candidate answers that are presented to human workers. Through extensive experiments we establish the superiority of this approach in extracting relation-centric facts from text. In our experiments we extract facts about fictitious characters in narrative text, where the issues of diversity and complexity in expressing relations are far more pronounced. Finally, we also demonstrate how interesting human computing games can be designed for knowledge acquisition tasks.
Ambiguity, complexity, and diversity of expression pose major challenges for the automatic extraction of knowledge from natural language texts. As a result, current methods for information extraction (IE) of entities and their mutual relations from unstructured data are often error-prone. Human computing (HC) makes it possible to address a wide range of difficult problems with the help of human input. Although text-based knowledge acquisition tasks can also be supported by HC, knowledge extraction from very large text collections cannot be solved by this manual approach alone. Moreover, under a payment model, the cost of remunerating the micro-tasks carried out by human workers becomes prohibitive. In this thesis we present methods that combine algorithms for automatic extraction with the information obtainable through human computing. We present the architecture and implementation of the Higgins system, which combines IE and HC synergistically with the goal of high-quality and comprehensive knowledge acquisition from texts. The IE component of Higgins first constructs extensive collections of entity names and relational paraphrases. In addition, statistical information obtained from Web corpora is combined with semantic resources such as WordNet and ConceptNet to expand the extracted relational phrases. Specifically defined statistical models are used to determine the semantic similarity of phrases. In this way the IE component generates both questions for HC and relevant candidate answers. The HC component turns these into small tasks for crowdsourcing or online games and collects the resulting user feedback.
A comprehensive experimental evaluation demonstrates the practicality and benefits of this combined IE/HC methodology.
Link to this record:	urn:nbn:de:bsz:291-scidok-57948 hdl:20.500.11880/26620 http://dx.doi.org/10.22028/D291-26564
Advisor:	Weikum, Gerhard
Date of oral examination:	6-May-2014
Date of registration:	12-May-2014
Faculty:	MI - Fakultät für Mathematik und Informatik
Department:	MI - Informatik
Collections:	SciDok - Der Wissenschaftsserver der Universität des Saarlandes

Files for this record:

File	Description	Size	Format
Dissertation_Kondreddi.pdf		2,58 MB	Adobe PDF