Findings from giving 15 LLMs personality disorder tests


Harrison Ford in Blade Runner giving a Voight-Kampff Test

I gave 15 popular LLMs the PID-5 personality assessment at least 10 times each. Nine of the 15 had clinically significant personality findings. I'm not saying we should diagnose those LLMs with conditions from the DSM, but it definitely feels concerning.

Graph of negative affect scores for 15 LLM models

You can explore the data along with human benchmarks and clinically significant cutoffs at https://www.personalitybenchmark.ai
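To make the idea of repeated administrations and a cutoff concrete, here is a minimal sketch of how several runs of one scale could be averaged and compared against a threshold. It is not the scoring code behind the site. It assumes items are rated 0 to 3 and simply averaged, which skips the facet-level step of real PID-5 scoring, and `ASSUMED_CUTOFF`, `domain_score`, `summarize_runs` and the example ratings are all hypothetical names and made-up values.

```python
from statistics import mean

# Hypothetical threshold on the 0-3 scale. A placeholder for illustration,
# not a value taken from the post or from any PID-5 scoring guide.
ASSUMED_CUTOFF = 2.0


def domain_score(item_ratings):
    """Average the item ratings (assumed 0-3) from one administration."""
    return mean(item_ratings)


def summarize_runs(runs, cutoff=ASSUMED_CUTOFF):
    """Average the per-run scores and flag whether the mean reaches the cutoff."""
    scores = [domain_score(ratings) for ratings in runs]
    avg = mean(scores)
    return {
        "runs": len(scores),
        "mean_score": avg,
        "above_cutoff": avg >= cutoff,
    }


# Made-up ratings for three administrations of a four-item scale.
example_runs = [[3, 2, 3, 2], [2, 2, 3, 3], [3, 3, 2, 2]]
summary = summarize_runs(example_runs)
print(summary)
```

Averaging across runs matters because a model can answer the same item differently from one administration to the next, so a single run is a noisy estimate.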

Why

AI labs and technologists (myself included) expect that AI models and agentic systems will become coworkers, employees, collaboration partners, companions, caregivers and friends. Personal experience and social science research both point to how the people we spend time with affect our feelings and behaviors. If we wish to understand that effect, we must have measures. This work is an attempt to begin that measurement.

As AI and agentic systems become part of our everyday lives, their personalities will have more of an impact on our own thoughts, beliefs and personalities. Instead of blindly adopting daily use of a technology that will have a dramatic and potentially negative impact on our mental health, I suggest we conduct a more deliberate study of how these systems perform on emotional benchmarks.

By having a measure of the AI systems' presented personality, and how that personality can be steered, we will be better able to engineer safe, effective and productive systems.

Let's Chat

Please get in touch if you are a:

  • social science researcher interested in collaborating on turning this into publication(s)
  • AI/ML researcher looking to collaborate on engineering how to steer personality with guardrails
  • conference organizer looking for a presentation of this work at your conference

You can find me at Kam Lasater or [email protected]

See you in the comments.
