[Submitted on 11 Apr 2025 (v1), last revised 17 Apr 2025 (this version, v2)]
Abstract: Generative agents driven by large language models (LLMs) are increasingly used to simulate human behaviour in silico. These simulacra serve as sandboxes for studying human behaviour without compromising privacy or safety. However, it remains unclear whether such agents can truly represent real individuals. This work compares survey data from the Understanding America Study (UAS) on healthcare decision-making with simulated responses from generative agents. Using demographic-based prompt engineering, we create digital twins of survey respondents and analyse how well different LLMs reproduce real-world behaviours. Our findings show that some LLMs fail to reflect realistic decision-making, for example by predicting universal vaccine acceptance. In contrast, Llama 3 captures variation across race and income more accurately, but it also introduces biases not present in the UAS data. This study highlights the potential of generative agents for behavioural research while underscoring the risks of bias from both LLMs and prompting strategies.
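The demographic-based prompting the abstract describes could look roughly like the sketch below. This is a minimal illustration, not the paper's actual pipeline: the persona fields, prompt wording, and the use of the `ollama` Python client to query a locally served Llama 3 model are all assumptions made here for clarity.

```python
# Minimal sketch of demographic-based prompt engineering for a "digital twin".
# Field names, prompt wording, and the model call are illustrative assumptions;
# the paper's actual prompts and pipeline may differ.
import ollama  # assumes Llama 3 is served locally via Ollama


def build_persona_prompt(respondent: dict) -> str:
    """Turn a survey respondent's demographics into a system persona."""
    return (
        f"You are a {respondent['age']}-year-old {respondent['race']} "
        f"{respondent['gender']} living in the United States with an annual "
        f"household income of {respondent['income']}. Answer survey "
        f"questions as this person would, giving a single choice."
    )


# Hypothetical respondent record and survey item, for illustration only.
respondent = {
    "age": 47,
    "race": "Black",
    "gender": "woman",
    "income": "$35,000-$49,999",
}
question = "Would you get a COVID-19 vaccine if it were offered to you? (yes/no)"

response = ollama.chat(
    model="llama3",
    messages=[
        {"role": "system", "content": build_persona_prompt(respondent)},
        {"role": "user", "content": question},
    ],
)
print(response["message"]["content"])
```

Comparing such simulated answers against the real respondents' UAS answers, across many demographic profiles, is the kind of evaluation the abstract describes.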
Submission history
From: Yonchanok Khaokaew
[v1] Fri, 11 Apr 2025 05:11:40 UTC (860 KB)
[v2] Thu, 17 Apr 2025 01:36:57 UTC (860 KB)