Please use this identifier to cite or link to this item: doi:10.22028/D291-45921
Title: Robustness of large language models in moral judgements
Author(s): Oh, Soyoung
Demberg, Vera
Language: English
Journal: Royal Society Open Science
Volume: 12
Issue: 4
Publisher/Platform: The Royal Society
Year of Publication: 2025
Free key words: large language model
moral reasoning
robustness
DDC notations: 400 Language, linguistics
Publication type: Journal Article
Abstract: With the advent of large language models (LLMs), there has been a growing interest in analysing the preferences encoded in LLMs in the context of morality. Recent work has tested LLMs on various moral judgement tasks and drawn conclusions regarding the alignment between LLMs and humans. The present contribution critically assesses the validity of the method and results employed in previous work for eliciting moral judgements from LLMs. We find that previous results are confounded by biases in the presentation of the options in moral judgement tasks and that LLM responses are highly sensitive to prompt formulation variants as simple as changing ‘Case 1’ and ‘Case 2’ to ‘(A)’ and ‘(B)’. Our results hence indicate that previous conclusions on moral judgements of LLMs cannot be upheld. We make recommendations for more sound methodological setups for future studies.
DOI of the first publication: 10.1098/rsos.241229
URL of the first publication: https://doi.org/10.1098/rsos.241229
Link to this record: urn:nbn:de:bsz:291--ds-459215
hdl:20.500.11880/40296
http://dx.doi.org/10.22028/D291-45921
ISSN: 2054-5703
Date of registration: 28-Jul-2025
Faculty: P - Philosophische Fakultät
Department: P - Sprachwissenschaft und Sprachtechnologie
Professorship: P - Keiner Professur zugeordnet
Collections: SciDok - Der Wissenschaftsserver der Universität des Saarlandes

Files for this record:
File: oh-demberg-robustness-of-large-language-models-in-moral-judgements.pdf
Size: 2,21 MB
Format: Adobe PDF


This item is licensed under a Creative Commons License.