Hybrid approaches for sentiment analysis

Wiegand, Michael

Please use this identifier to cite or link to this item: doi:10.22028/D291-22705

Title:	Hybrid approaches for sentiment analysis
Other Titles:	Hybridansätze für die Sentimentanalyse
Author(s):	Wiegand, Michael
Language:	English
Year of Publication:	2011
SWD key words:	Computerlinguistik (computational linguistics); Maschinelles Lernen (machine learning)
Free key words:	sentiment analysis; computational linguistics; text classification; information extraction; machine learning
DDC notations:	400 Language, linguistics
Publication type:	Dissertation
Abstract:	Sentiment analysis is the task of extracting and classifying opinionated content in natural language texts. Common subtasks are distinguishing between opinionated and factual texts, classifying the polarity of opinionated texts, and extracting the entities participating in an opinion (or opinion event), i.e. the source from which an opinion emanates and the target towards which it is directed. With the emergence of Web 2.0, which describes the shift towards a highly user-interactive communication medium, the amount of subjective content on the World Wide Web is steadily increasing. Thus, there is a growing need to process this type of content automatically, which is what sentiment analysis provides. Both natural language processing, which provides computational methods for the analysis and representation of natural language, and machine learning, which builds task-specific classification models from empirical data, may be instrumental in mastering the challenges of automatic sentiment analysis of written text. It has been proposed to solve many problems in sentiment analysis exclusively with machine learning methods that use a fairly low-level feature design, such as bag of words, which contains little linguistic information. In this thesis, we examine the effectiveness of linguistic features in various subtasks of sentiment analysis, drawing heavily on insights from natural language processing. Linguistic features can be used in various classification methods: in rule-based classification, where the linguistic features are directly encoded as a classifier; in supervised machine learning, where these features complement basic low-level features; or in bootstrapping, where these features form a rule-based classifier that generates a labeled training set on which a supervised classifier can then be trained.
In particular, this thesis focuses on scenarios in which combining linguistic features with machine learning methods is effective. We look at common text classification tasks, both coarse-grained and fine-grained, as well as extraction tasks.
Sentiment analysis describes the task of extracting opinions from natural-language text and classifying their content. Common subtasks are distinguishing between factual text and opinion, classifying the polarity of an opinion, and extracting the entities involved in an opinion, i.e. the source from which an opinion emanates and the target towards which it is directed. With the advent of Web 2.0, which describes the transition of the Internet into a highly interactive communication medium, the share of subjective content on the Web has grown accordingly. Consequently, the need for automatic methods that support the tasks of sentiment analysis is growing as well. In tackling the automatic sentiment analysis of written language, both natural language processing, which provides computational models for the analysis and representation of natural language, and machine learning methods, which deliver task-specific classification models on the basis of empirical data, are helpful. Many problems in sentiment analysis can be solved with standard machine learning methods that rely mainly on elementary feature design (such as bag of words, which encodes only very limited linguistic information). This dissertation investigates the usefulness of linguistic features in different subtasks of sentiment analysis. In doing so, we rely primarily on insights from natural language processing.
Linguistic features can be used in a wide range of classification methods: in purely rule-based classification, where the features are encoded directly as a classifier; in supervised learning, where these features complement standard features (e.g. bag of words); or in bootstrapping, where the features can be part of a rule-based classifier that generates an annotated training set on which simple supervised classifiers can in turn be trained. In this dissertation, we mainly restrict ourselves to scenarios in which a combination of linguistic features and machine learning is advantageous. We consider text classification tasks (both coarse-grained and fine-grained) and extraction tasks.
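The abstract names three ways of combining linguistic features with classifiers: rule-based classification, supervised learning over bag of words plus linguistic features, and bootstrapping. The following is a minimal sketch of all three, not the thesis's implementation. Everything in it is an assumption for illustration: the tiny polarity lexicon, the one-token negation rule standing in for "linguistic features", and multinomial Naive Bayes standing in for the supervised learner. The names `POLARITY_LEXICON`, `lexicon_score`, `rule_based_label`, `features`, `NaiveBayes` and `bootstrap` are hypothetical.

```python
import math
from collections import Counter

# HYPOTHETICAL toy resources for illustration only; not the lexicon used in the thesis.
POLARITY_LEXICON = {"good": 1, "great": 1, "excellent": 1, "love": 1,
                    "bad": -1, "poor": -1, "terrible": -1, "hate": -1}
NEGATORS = {"not", "never", "no"}


def tokenize(text):
    """Lowercase whitespace tokenization (deliberately naive)."""
    return text.lower().split()


def lexicon_score(tokens):
    """Linguistic feature: sum of prior polarities, flipping a word that directly follows a negator."""
    score = 0
    for i, tok in enumerate(tokens):
        polarity = POLARITY_LEXICON.get(tok, 0)
        if i > 0 and tokens[i - 1] in NEGATORS:
            polarity = -polarity
        score += polarity
    return score


def rule_based_label(tokens):
    """Rule-based classification: the linguistic feature used directly as the classifier.
    Returns None (abstains) when the score is zero."""
    score = lexicon_score(tokens)
    if score > 0:
        return "pos"
    if score < 0:
        return "neg"
    return None


def features(tokens):
    """Supervised setting: bag of words complemented by one linguistic feature."""
    feats = Counter(tokens)
    score = lexicon_score(tokens)
    feats["LEX=" + ("pos" if score > 0 else "neg" if score < 0 else "neu")] += 1
    return feats


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (a stand-in for any supervised learner)."""

    def fit(self, feature_dicts, labels):
        self.classes = sorted(set(labels))
        self.log_prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for feats, label in zip(feature_dicts, labels):
            self.counts[label].update(feats)
        self.vocab = set().union(*(self.counts[c].keys() for c in self.classes))
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, feats):
        def log_posterior(c):
            denom = self.totals[c] + len(self.vocab)
            return self.log_prior[c] + sum(
                n * math.log((self.counts[c][f] + 1) / denom)
                for f, n in feats.items() if f in self.vocab)  # unseen features are ignored
        return max(self.classes, key=log_posterior)


def bootstrap(unlabeled_texts):
    """Bootstrapping: the rule-based classifier labels the texts it does not abstain on,
    and a supervised classifier is trained on that automatically labeled set."""
    labeled = []
    for text in unlabeled_texts:
        tokens = tokenize(text)
        label = rule_based_label(tokens)
        if label is not None:
            labeled.append((features(tokens), label))
    return NaiveBayes().fit([f for f, _ in labeled], [l for _, l in labeled])


# Usage on a hypothetical unlabeled corpus.
corpus = ["great plot and excellent acting", "terrible script with poor pacing",
          "not good at all", "i love the soundtrack", "boring and bad"]
model = bootstrap(corpus)
print(model.predict(features(tokenize("the acting was excellent"))))  # → pos
print(model.predict(features(tokenize("the script was not good"))))   # → neg
```

The point of letting the rule-based labeler abstain is that it only contributes texts it has lexical evidence for, while the supervised model trained on them can then also learn from words outside the lexicon: "boring and bad" is labeled negative only because of "bad", yet "boring" becomes a negative cue for the trained classifier.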
Link to this record:	urn:nbn:de:bsz:291-scidok-38820; hdl:20.500.11880/22761; http://dx.doi.org/10.22028/D291-22705
Advisor:	Klakow, Dietrich
Date of oral examination:	21-Jan-2011
Date of registration:	19-Apr-2011
Faculty:	NT - Naturwissenschaftlich-Technische Fakultät (Faculty of Natural Sciences and Technology)
Department:	NT - Systems Engineering
Former Department:	until summer semester 2016: Fachrichtung 7.4 - Mechatronik (Department 7.4, Mechatronics)
Collections:	SciDok - Der Wissenschaftsserver der Universität des Saarlandes (the scientific publication server of Saarland University)

Files for this record:

File	Description	Size	Format
thesis_finalNoCV.pdf		1.06 MB	Adobe PDF
