
Large language models (LLMs) as jurors: Assessing the potential of LLMs in legal contexts.

Sun, Yongjie; Zappalà, Angelo; Di Maso, Eleonora; Pompedda, Francesco; Nyman, Thomas J.; Santtila, Pekka

View/Open
(accepted with APA wording) LLMs as Jurors Assessing the Potential of Large Language Models in Legal Contexts. Law and Human Behaviour.docx (821.7Kb)

American Psychological Association (APA)
doi:10.1037/lhb0000620
URI
https://doi.org/10.1037/lhb0000620
The permanent address of this publication is:
https://urn.fi/URN:NBN:fi-fe202601217326
Abstract

Objective

We explored the potential of large language models (LLMs) in legal decision making by replicating Fraser et al.'s (2023) mock jury experiment with LLMs (GPT-4o, Claude 3.5 Sonnet, and GPT-o1) as decision makers. We investigated how LLMs responded to factors that influenced human jurors in sexual assault cases: defendant race, social status, number of allegations, and reporting delay.

Hypotheses

We hypothesized that LLMs would show higher consistency than humans and would exhibit no explicit biases but potentially implicit ones. We also examined potential mediating factors (race-crime congruence, credibility, black sheep effect) and moderating effects (beliefs about traumatic memory, ease of reporting) that might explain LLM decision making.

Method

Using a 2 × 2 × 2 × 3 factorial design, we manipulated defendant race (Black/White), social status (low/high), number of allegations (one/five), and reporting delay (5/20/35 years), collecting 2,304 responses across conditions. LLMs were prompted to act as jurors, providing probability of guilt assessments (0–100), dichotomous verdicts, and responses to mediator and moderator variables.
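
As an illustration only, the factorial design described above can be enumerated in a few lines of Python. The variable names are hypothetical, and the even split of responses across cells is an assumption: the abstract reports the total of 2,304 responses but not how they were allocated to conditions or models.

```python
from itertools import product

# Factor levels as listed in the abstract; the dictionary keys are
# hypothetical names chosen for this sketch.
factors = {
    "defendant_race": ["Black", "White"],
    "social_status": ["low", "high"],
    "num_allegations": [1, 5],
    "reporting_delay_years": [5, 20, 35],
}

# Every combination of levels is one cell of the 2 x 2 x 2 x 3 design.
conditions = [
    dict(zip(factors, levels)) for levels in product(*factors.values())
]

# ASSUMPTION: responses are spread evenly over the cells. The abstract
# gives only the total, not the per-cell allocation.
total_responses = 2304
responses_per_cell, remainder = divmod(total_responses, len(conditions))

print(len(conditions), responses_per_cell, remainder)
```

Under that even-split assumption the total divides cleanly across the cells, which is consistent with a balanced design but does not confirm one.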

Results

LLMs showed higher average probability of guilt assessments compared with humans (63.56 vs. 58.82) but were more conservative in rendering guilty verdicts (21% vs. 49%). Similar to humans, LLMs demonstrated bias against White defendants and increased guilt attributions with multiple allegations. Unlike humans, who showed minimal effects of reporting delay, LLMs assigned higher guilt probabilities to cases with shorter reporting delays. Mediation analyses revealed that race-crime stereotype congruency and the black sheep effect partially mediated the racial bias effect, whereas perceived memory strength mediated the reporting delay effect.

Conclusions

Although LLMs may offer more consistent decision making, they are not immune to biases and may interpret certain case factors differently from human jurors.


Turku University Library | University of Turku
julkaisut@utu.fi