Please use this identifier to cite or link to this item: doi:10.22028/D291-38086
Title: Generating linguistically relevant metadata for the Royal Society Corpus
Author(s): Menzel, Katrin
Knappen, Jörg
Teich, Elke
Language: English
Journal title: Research in Corpus Linguistics
Volume: 9
Issue: 1
Pages: 1-18
Publisher/Platform: Asociación Española de Lingüística de Corpus
Year of Publication: 2021
Free key words: corpus building and extension
specialized diachronic corpora
written scientific English discourse
Royal Society Corpus
register-based metadata
DDC notations: 400 Language, linguistics
Publication type: Journal Article
Abstract: This paper provides an overview of metadata generation and management for the Royal Society Corpus (RSC), aiming to encourage discussion about the specific challenges in building substantial diachronic corpora intended to be used for linguistic and humanistic analysis. We discuss the motivations and goals of building the corpus, describe its composition and present the types of metadata it contains. Specifically, we tackle two challenges: first, integration of original metadata from the data providers (JSTOR and the Royal Society); second, derivation of additional linguistically relevant metadata regarding text structure and situational context (register).
DOI of the first publication: 10.32714/ricl.09.01.02
Link to this record: urn:nbn:de:bsz:291--ds-380860
hdl:20.500.11880/34433
http://dx.doi.org/10.22028/D291-38086
ISSN: 2243-4712
Date of registration: 22-Nov-2022
Faculty: P - Philosophische Fakultät
Department: P - Sprachwissenschaft und Sprachtechnologie
Professorship: P - Prof. Dr. Elke Teich
Collections: SciDok - Der Wissenschaftsserver der Universität des Saarlandes

Files for this record:
File: 158-Article Text-969-2-10-20211002.pdf
Size: 389,55 kB
Format: Adobe PDF


This item is licensed under a Creative Commons License.