Connectionist language production : distributed representations and the uniform information density hypothesis

Calvillo, Jesús

Please use this identifier to cite or link to this item: doi:10.22028/D291-27934

Title:	Connectionist language production : distributed representations and the uniform information density hypothesis
Author(s):	Calvillo, Jesús
Language:	English
Year of Publication:	2019
Free key words:	neural networks; language production; semantics; syntax
DDC notations:	400 Language, linguistics; 620 Engineering and machine engineering
Publication type:	Dissertation
Abstract:	This dissertation approaches the task of modeling human sentence production from a connectionist point of view, using distributed semantic representations. The main questions it addresses are: (i) whether the distributed semantic representations defined by Frank et al. (2009) are suitable for modeling sentence production with artificial neural networks, (ii) the behavior and internal mechanism of a model that uses these representations and recurrent neural networks, and (iii) a mechanistic account of the Uniform Information Density Hypothesis (UID; Jaeger, 2006; Levy and Jaeger, 2007). Regarding the first point, the semantic representations of Frank et al. (2009), called situation vectors, are points in a vector space where each vector contains information about the observations in which an event and a corresponding sentence are true. These representations have been successfully used to model language comprehension (e.g., Frank et al., 2009; Venhuizen et al., 2018). During the construction of these vectors, however, a dimensionality reduction process introduces some loss of information, which causes some aspects to be no longer recognizable, reducing the performance of a model that uses them. To address this issue, belief vectors are introduced, which can be regarded as an alternative way to obtain semantic representations of manageable dimensionality. These two types of representations (situation and belief vectors) are evaluated by using them as input for a sentence production model that implements an extension of a Simple Recurrent Neural Network (Elman, 1990). This model was tested under different conditions corresponding to different levels of systematicity, which is the ability of a model to generalize from a set of known items to a set of novel ones.
Systematicity is an essential attribute of any model of sentence processing: the number of sentences that can be generated for a given language is infinite, so it is not feasible to memorize all possible message-sentence pairs. The results showed that the model was able to generalize with very high performance in all test conditions, demonstrating systematic behavior. Furthermore, the errors that it produced involved very similar semantic representations, in line with the speech error literature, which states that speech errors involve elements with semantic or phonological similarity. This result further demonstrates the systematic behavior of the model, as it processes similar semantic representations in a similar way, even if they are new to the model. Regarding the second point, the sentence production model was analyzed in two different ways. First, the sentences it produces, including its errors, were examined, highlighting difficulties and preferences of the model. The results revealed that the model learns the syntactic patterns of the language, reflecting its statistical nature, and that its main difficulty is related to very similar semantic representations: it sometimes produces unintended sentences that are nevertheless semantically very close to the intended ones. Second, the connection weights and activation patterns of the model were analyzed, yielding an algorithmic account of the internal processing of the model. According to this account, the input semantic representation activates the words that are related to its content and gives an indication of their order by providing relatively more activation to words that are likely to appear early in the sentence.
Then, at each time step, the previously produced word activates syntactic and semantic constraints on the next word to be produced, while the context units of the recurrence preserve information through time, allowing the model to enforce long-distance dependencies. We propose that these results can inform our understanding of the internal processing of models with similar architecture. Regarding the third point, an extension of the model is proposed with the goal of modeling UID. According to UID, language production is an efficient process affected by a tendency to produce linguistic units that distribute information as uniformly as possible and close to the capacity of the communication channel, given the encoding possibilities of the language, thus optimizing the amount of information transmitted per time unit. This extension of the model approaches UID by balancing two different production strategies: one where the model produces the word with the highest probability given the semantics and the previously produced words, and another where the model produces the word that would minimize the sentence length given the semantic representation and the previously produced words. By combining these two strategies, the model was able to produce sentences with different levels of information density and uniformity, providing a first step toward modeling UID at the algorithmic level of analysis. In sum, the results show that the distributed semantic representations of Frank et al. (2009) can be used to model sentence production, exhibiting systematicity. Moreover, an algorithmic account of the internal behavior of the model was reached, with the potential to generalize to other models with similar architecture. Finally, a model of UID is presented, highlighting some important aspects of UID that need to be addressed in order to go from the formulation of UID at the computational level of analysis to a mechanistic account at the algorithmic level.
This dissertation is devoted to the task of modeling human sentence production from a connectionist perspective, using distributed semantic representations. The focus is on: (i) the question of whether the distributed semantic representations defined by Frank et al. (2009) are suitable for modeling sentence production with artificial neural networks; (ii) the behavior and internal mechanism of a model that uses these representations and recurrent neural networks; (iii) a mechanistic account of the Uniform Information Density Hypothesis (UID; Jaeger, 2006; Levy and Jaeger, 2007). First, the representations of Frank et al. (2009), called situation vectors, are taken to be points in a vector space in which each vector contains information about the observations in which an event and a corresponding sentence are true. These representations have been used successfully to model language comprehension (e.g., Frank et al., 2009; Venhuizen et al., 2018). During the construction of these vectors, however, a dimensionality reduction process leads to a certain loss of information, so that some aspects are lost. To solve this problem, belief vectors are introduced as an alternative. These two types of representation are evaluated by using them as input to a sentence production model implemented as an extension of a Simple Recurrent Neural Network (SRN; Elman, 1990). This model is tested under different conditions corresponding to different levels of systematicity, i.e., the ability of a model to generalize from a set of known items to a set of novel items.
Systematicity is an essential attribute that a model of sentence processing must possess, given that the number of sentences that can be generated in a given language is infinite and it is therefore not possible to memorize all possible message-sentence pairs. The results show that the model is able to generalize successfully under all test conditions, exhibiting systematic behavior. In addition, the remaining errors closely resemble other semantic representations. This is reflected in the speech error literature, which states that errors involve elements with semantic or phonological similarity. This result demonstrates the systematic behavior of the model, since it processes similar semantic representations in a similar way even when they are unknown to the model. Second, the sentence production model was analyzed in two different ways: (i) by examining the sentences it produces, including the errors that occurred, and highlighting the difficulties and preferences of the model. The results show that the model learns the syntactic patterns of the language. They also show that the remaining problems are essentially related to very similar semantic representations, which sometimes yield unintended sentences that are nevertheless semantically close to the intended ones. (ii) By analyzing the connection weights and activation patterns of the model, yielding an algorithmic account of its internal processing. According to this account, the semantic input representation activates the words related to its content. In doing so, a ranking is produced, because words that are likely to appear early in the sentence receive stronger activation.
In the next step, the previously produced word activates syntactic and semantic constraints on the next word productions. Meanwhile, context units store information over a longer period, enabling the model to realize longer dependencies. In our view, these findings can serve as a basis for explaining other, related models. Third, an extension of the model is proposed in order to model UID. According to UID, language production is an efficient process shaped by the tendency to produce linguistic units that distribute information as uniformly as possible while exploiting the capacity of the communication channel, given the encoding possibilities of the language, thereby maximizing the amount of information transmitted per unit of time. The extension implements this by playing two different word production strategies off against each other: choose the word (i) with the highest probability given the previously produced words; or (ii) that minimizes the sentence length. By combining these two strategies, the model is able to produce sentences with a specified information density and distribution, which amounts to a first model of UID at the algorithmic level. In summary, the results show that the distributed semantic representations of Frank et al. (2009) can be used for sentence production and that systematicity can be observed. In addition, an algorithmic explanation of the internal mechanisms of the model is provided. Finally, a model of UID is presented that constitutes a first step toward a mechanistic account at the algorithmic level of analysis.
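The abstract describes UID in terms of information density and uniformity, and a model extension that balances a highest-probability strategy against a shortest-sentence strategy. The following is a minimal Python sketch of those two ideas, not the dissertation's implementation. It assumes that information is measured as surprisal in bits and that the two strategies are combined by a linear weight `lam`; the function names, the weight, and all numbers are hypothetical and chosen only for illustration.

```python
import math


def surprisal(p):
    """Information, in bits, carried by a word produced with probability p."""
    return -math.log2(p)


def density_profile(word_probs):
    """Return per-word surprisal, its mean (density) and its variance (non-uniformity)."""
    s = [surprisal(p) for p in word_probs]
    mean = sum(s) / len(s)
    variance = sum((x - mean) ** 2 for x in s) / len(s)
    return s, mean, variance


def choose_word(candidates, lam):
    """Pick the next word by mixing two strategies (assumed linear mix).

    candidates maps word -> (probability, expected_remaining_length).
    lam = 0 selects the most probable word; lam = 1 selects the word
    expected to lead to the shortest sentence.
    """
    def score(item):
        _, (p, remaining) = item
        return (1 - lam) * math.log2(p) - lam * remaining
    return max(candidates.items(), key=score)[0]


# Toy numbers, invented for illustration only.
candidates = {"that": (0.6, 4), "she": (0.4, 3)}
print(choose_word(candidates, 0.0))  # that
print(choose_word(candidates, 1.0))  # she

# Two word sequences with the same mean surprisal but different uniformity.
_, mean_a, var_a = density_profile([0.5, 0.5])
_, mean_b, var_b = density_profile([0.25, 1.0])
print(mean_a, var_a, mean_b, var_b)  # 1.0 0.0 1.0 1.0
```

In this sketch, both toy sequences carry one bit per word on average, but only the first spreads it evenly, which is the kind of contrast UID is concerned with. Mixing log-probability with sentence length in a single score is a simplification; how the two strategies are actually weighted in the model is not stated in the abstract.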
Link to this record:	urn:nbn:de:bsz:291--ds-279340 hdl:20.500.11880/27442 http://dx.doi.org/10.22028/D291-27934
Advisor:	Crocker, Matthew W.
Date of oral examination:	20-May-2019
Date of registration:	27-May-2019
Faculty:	P - Philosophische Fakultät
Department:	P - Sprachwissenschaft und Sprachtechnologie
Collections:	SciDok - Der Wissenschaftsserver der Universität des Saarlandes

Files for this record:

File	Description	Size	Format
calvillo-dissertation.pdf	file of the dissertation	4.43 MB	Adobe PDF

This item is licensed under a Creative Commons License