Please use this identifier to cite or link to this item: doi:10.22028/D291-30973
Title: POS tag perplexity as a measure of syntactic complexity
Author(s): von Prince, Kilu
Demberg, Vera
Editor(s): Berdicevskis, Aleksandrs
Bentz, Christian
Language: English
Title: Proceedings of the First Shared Task on Measuring Language Complexity
Startpage: 20
Endpage: 25
Year of Publication: 2018
Title of the Conference: EvoLang 2018
Place of the conference: Toruń, Poland
Publication type: Conference Paper
Abstract: Comparing languages of the world with respect to their complexity is a long-standing open question in linguistics. We here focus on syntactic complexity, a concept that has been particularly hard to address due to the lack of readily available syntactically annotated corpora and the intricacies of syntactic theories. We propose to use a simple information-theoretic measure, perplexity, on the POS tag sequence of texts. Perplexity captures how predictable POS tags are on average given their recent co-texts. Calculating perplexity based on POS tag sequences helps us to abstract away from morphological or lexical features of the language, in order to get at the predictability of word order. In this paper, we compare POS tag perplexity to other recently proposed measures of syntactic complexity, and evaluate measures by correlating them with expert-proposed scores of syntactic flexibility (Bakker 1998).
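Illustration (not part of the original record): the abstract does not state the n-gram order, smoothing method, or tag set used in the paper, so the following is only a minimal sketch of how perplexity over POS tag sequences could be computed, assuming a bigram model with add-alpha smoothing. The function name `bigram_perplexity`, the `alpha` parameter, and the toy tag sequences are hypothetical.

```python
import math
from collections import Counter


def bigram_perplexity(train, test, alpha=1.0):
    """Perplexity of a POS-tag bigram model with add-alpha smoothing.

    train, test: lists of sentences, each a list of POS-tag strings.
    Assumption: bigram order and add-alpha smoothing are illustrative
    choices, not taken from the paper.
    """
    bos = "<s>"  # sentence-start marker used as the first context
    # The tag vocabulary covers both sets so every test tag has a probability.
    vocab = {tag for sent in train for tag in sent}
    vocab.update(tag for sent in test for tag in sent)

    context_counts = Counter()
    bigram_counts = Counter()
    for sent in train:
        prev = bos
        for tag in sent:
            context_counts[prev] += 1
            bigram_counts[(prev, tag)] += 1
            prev = tag

    log_prob = 0.0
    n_tags = 0
    for sent in test:
        prev = bos
        for tag in sent:
            p = (bigram_counts[(prev, tag)] + alpha) / (
                context_counts[prev] + alpha * len(vocab)
            )
            log_prob += math.log2(p)
            n_tags += 1
            prev = tag

    if n_tags == 0:
        raise ValueError("test data contains no tags")
    # Perplexity is 2 raised to the average negative log2 probability per tag.
    return 2 ** (-log_prob / n_tags)


# Toy data: a fixed tag order versus all orderings of the same three tags.
rigid = [["DET", "NOUN", "VERB"]] * 60
flexible = [
    ["DET", "NOUN", "VERB"], ["DET", "VERB", "NOUN"],
    ["NOUN", "DET", "VERB"], ["NOUN", "VERB", "DET"],
    ["VERB", "DET", "NOUN"], ["VERB", "NOUN", "DET"],
] * 10

print("rigid order:   ", bigram_perplexity(rigid, rigid))
print("flexible order:", bigram_perplexity(flexible, flexible))
```

In this toy setup the fixed-order sequences yield a lower perplexity than the freely ordered ones, which matches the intuition in the abstract that less predictable tag order gives higher perplexity. Evaluating on the training data, as done here for brevity, would normally be replaced by held-out text.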
URL of the first publication: http://www.christianbentz.de/MLC2018/Prince_Demberg.pdf
Link to this record: hdl:20.500.11880/29760
http://dx.doi.org/10.22028/D291-30973
ISBN: 978-91-639-7435-9
Date of registration: 28-Sep-2020
Notes: Contribution to the workshop "Measuring Language Complexity (MLC)"
Faculty: MI - Faculty of Mathematics and Computer Science
Department: MI - Computer Science
Professorship: MI - Prof. Dr. Vera Demberg
Collections: Die Universitätsbibliographie

Files for this record:
There are no files associated with this item.


Items in SciDok are protected by copyright, with all rights reserved, unless otherwise indicated.