Reading Smiles: Proxy Bias in Foundation Models for Facial Emotion Recognition


[Submitted on 23 Jun 2025]


Abstract: Foundation Models (FMs) are rapidly transforming Affective Computing (AC), with Vision Language Models (VLMs) now capable of recognising emotions in zero-shot settings. This paper probes a critical but underexplored question: what visual cues do these models rely on to infer affect, and are these cues psychologically grounded or superficially learnt? We benchmark VLMs of varying scale on a teeth-annotated subset of the AffectNet dataset and find consistent performance shifts depending on the presence of visible teeth. Through structured introspection of the best-performing model, GPT-4o, we show that facial attributes like eyebrow position drive much of its affective reasoning, revealing a high degree of internal consistency in its valence-arousal predictions. These patterns highlight the emergent nature of FM behaviour, but also reveal risks: shortcut learning, bias, and fairness issues, especially in sensitive domains like mental health and education.
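
The abstract does not specify the prompting setup used for the zero-shot benchmark; the sketch below only illustrates the general pattern of querying a VLM such as GPT-4o for an emotion label and valence-arousal estimate from a face image. The label set, prompt wording, and image path are assumptions for illustration, not the paper's protocol.

```python
# Illustrative zero-shot emotion query to a VLM (GPT-4o via the OpenAI Chat Completions API).
# The prompt wording, label set, and file name are assumptions, not the paper's actual setup.
import base64
from openai import OpenAI

LABELS = ["neutral", "happiness", "sadness", "surprise",
          "fear", "disgust", "anger", "contempt"]

def encode_image(path: str) -> str:
    """Read an image file and return it as a base64 string for the API payload."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

def classify_face(client: OpenAI, image_path: str) -> str:
    """Ask the model for one emotion label plus valence/arousal estimates in [-1, 1]."""
    image_b64 = encode_image(image_path)
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": ("Classify the facial expression as one of: "
                          f"{', '.join(LABELS)}. Also estimate valence and arousal "
                          "in [-1, 1]. Answer as: label, valence, arousal.")},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
        temperature=0,
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    client = OpenAI()  # expects OPENAI_API_KEY in the environment
    print(classify_face(client, "affectnet_sample.jpg"))  # hypothetical image path
```

Repeating such queries over a teeth-annotated image subset and comparing accuracy between the "teeth visible" and "teeth hidden" partitions is the kind of comparison the abstract describes.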

Submission history

From: Iosif Tsangko [view email]
[v1] Mon, 23 Jun 2025 19:56:30 UTC (827 KB)
