Please use this identifier to cite or link to this item: doi:10.22028/D291-37888
Title: Tracing Syntactic Change in the Scientific Genre: Two Universal Dependency-parsed Diachronic Corpora of Scientific English and German
Author(s): Krielke, Marie-Pauline
Talamo, Luigi
Fawzi, Mahmoud
Knappen, Jörg
Language: English
Title: Proceedings of the Thirteenth Language Resources and Evaluation Conference
Pages: 4808-4816
Publisher/Platform: European Language Resources Association
Year of Publication: 2022
Place of publication: Marseille, France
Place of the conference: Marseille, France
Free key words: universal dependencies
evaluation
English-German contrastive
diachronic linguistics
scientific language
DDC notations: 420 English
430 German
Publication type: Conference Paper
Abstract: We present two comparable diachronic corpora of scientific English and German from the Late Modern Period (17th c.–19th c.) annotated with Universal Dependencies. We describe several steps of data pre-processing and evaluate the resulting parsing accuracy, showing how our pre-processing steps significantly improve output quality. As a sanity check for the representativity of our data, we conduct a case study comparing previously gained insights on grammatical change in the scientific genre with our data. Our results reflect the often reported trend of English scientific discourse towards heavy noun phrases and a simplification of the sentence structure (Halliday, 1988; Halliday and Martin, 1993; Biber and Gray, 2011; Biber and Gray, 2016). We also show that this trend applies to German scientific discourse as well. The presented corpora are valuable resources suitable for the contrastive analysis of syntactic diachronic change in the scientific genre between 1650 and 1900. The presented pre-processing procedures and their evaluations are applicable to other languages and can be useful for a variety of Natural Language Processing tasks such as syntactic parsing.
URL of the first publication: https://aclanthology.org/2022.lrec-1.514
Link to this record: urn:nbn:de:bsz:291--ds-378883
hdl:20.500.11880/34331
http://dx.doi.org/10.22028/D291-37888
ISBN: 979-10-95546-72-6
Date of registration: 14-Nov-2022
Third-party funds sponsorship: This work is supported by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – Project-ID 232722074 – SFB 1102.
Faculty: P - Philosophische Fakultät
Department: P - Sprachwissenschaft und Sprachtechnologie
Professorship: P - Prof. Dr. Elke Teich
Collections: SciDok - Der Wissenschaftsserver der Universität des Saarlandes

Files for this record:
File: 2022.lrec-1.514.pdf
Size: 350,18 kB
Format: Adobe PDF


This item is licensed under a Creative Commons License.