Introduction
A core tenet of adaptive behavior is that rewards reinforce actions, while punishments suppress them. This tenet underpins shaping of individual choices, artificial intelligence and machine learning, as well as societal systems designed to promote co-operative behavior1,2,3. Yet, this tenet fails to explain why many individuals persist in punished behaviors, incurring profound costs from their choices4,5. Such punishment insensitivity can manifest early in life6, predict adverse outcomes across health and well-being8,7,9, and resist interventions ranging from fines to incarceration10,11. Why do some individuals adaptively avoid harm through experience, while others exhibit maladaptive behavioral persistence?
Current explanations for this variation in punishment sensitivity remain fragmented. Studies of punishment sensitivity have focused on learning in clinical populations, incarcerated individuals, individuals with addiction, brain lesions, or neurodegeneration10,11,12,13,14,15, leaving the spectrum of individual differences in the wider population poorly mapped. Compounding this, fundamental questions about punishment sensitivity persist. The mechanisms that drive these differences between individuals remain poorly understood, so it remains unclear whether punishment-insensitive individuals fail to learn from consequences, misvalue consequences, or struggle to act on their acquired knowledge about punishment16,17,18,19. It is also unclear whether observed variations in punishment sensitivity reflect stable phenotypic differences in learning and decision-making or random fluctuations in behavior20. Mechanistic answers to these questions are essential to improving decision-making in contexts ranging from public health campaigns to cognitive-behavioral therapies.
Here, we address these gaps through three advances. First, we assess punishment learning in a general population sample spanning 24 countries. Second, we employ a task4,5,21,22 that precisely isolates what individuals learn from punishment. Third, we test whether differences between individuals predict learning and decision-making over a six-month interval. We identify robust phenotypes that reflect stable, trait-like differences in causal inference and cognitive-affective-behavioral integration, highlighting the need for approaches to address the cognitive barriers to adaptive decision-making.
Methods
The study was approved by UNSW Human Research Ethics Advisory Panel C (HREAP-C #3385). Informed consent was obtained from all participants. This study was not pre-registered.
Design
Data collection took place in two stages: an initial test and a retest set six months later. During each stage, participants filled out self-report questionnaires, and played six 3 min blocks of a Planets & Pirates game5,22 interspersed with post-block self-report measures (see Post-block measures) and task information (see Procedures).
Game design
The game is based on the Fixed Utility protocol22. Two continuously-displayed planets (R1, R2) that could be “traded” with (clicked on) for point rewards. In initial pre-punishment blocks, R1 and R2 clicks yielded rewards (+100 points) with 50% probability (2 sec countdown). Rewards for R1 and R2 responses were independent; both planets could be in countdown simultaneously so maximising point gain during pre-punishment involved clicking on both planets.
In later punishment blocks, responses continued to yield rewards but could also trigger ship cues (CS+ , CS–). R1 exclusively triggered CS+ while R2 exclusively triggered CS–. CS+ led to an “Attack” (substantial point loss), whereas CS– had no negative consequence. Despite the ongoing reward contingency, the R1 → CS+ →Attack contingency meant R1 responses generally had negative utility during punishment blocks, while the safe R2 → CS– contingency meant R2 responses maintained its positive utility. Therefore, maximising point gain in punishment blocks involved avoiding R1 responses in favor of R2 responses. A shield button was present during ship presentations, but was only available to be selected on 50% of ship trials. If selected (-50 point cost), the shield prevented an upcoming attack. This mitigated, but did not nullify the overall negative utility of R1 responses. Given the relative rarity of shield events (many participants did not receive the opportunity to use a shield for a given CS within a block), shield measures were not examined.
Following 3 blocks of punishment, R1 → CS+ →Attack and R2 → CS– contingencies were revealed to participants (see Apparatus and Stimuli), and they were given a final punishment block to assess effects of this reveal on behavior and cognitions.
Participants were randomly allocated to 10% or 40% Action-CS probability groups at the start of the initial test. Planet clicks triggered CSs with 10% or 40% probability for 10% and 40% groups, respectively. To match the negative utility of punished R1 responses across probability groups, attacks caused –40% and –10% point loss (of accumulated points) for 10% and 40% groups, respectively.
Retest
Individuals that passed initial test engagement checks (see Participants) were invited undertake the retest. All participants were allocated to their original probability group, but other counterbalancing parameters were randomised (e.g., punished R1 = left vs. right planet, CS+ = ship Type 1 vs. 2). All other aspects of retest (design, procedure) were identical to the initial test.
Participants
Participants were recruited from the online research platform Prolific23,24. Given that understanding task instructions was essential for participation, self-reported fluency in English was a selection criterion. Participants were reimbursed pro-rata with monetary compensation for their time (£9/hour). Participants were informed that Top 10% scorers would be receive a monetary bonus. Bonuses (additional £10) were determined and distributed after each stage of data collection (initial test and retest).
Participants were excluded from analyses if they failed either of two engagement checks: (1) failing to give correct responses to two catch questions embedded in the questionnaire battery (see Self-reported trait questionnaires for details); or (2) answering post-block measures too quickly or slowly (<0.8 s or >30 s per question, averaged per page). These were included to minimize the issue of inattentive participants driving reported effects25. For the initial test, 267 participants (aged 18-63 years; 118 female, 143 male, 6 other) met inclusion criteria. For retest, 128 participants (aged 20–60 years; 59 female, 67 male, 2 other) met inclusion criteria.
Apparatus and stimuli
The experiment was programmed using the jsPsych library26. Experiment code can be found at https://github.com/philjrdb/HCP-Test-Retest and https://zenodo.org/records/155819599.
Game interface
During game blocks, participants had control of a custom mouse pointer that turned dark when clicking (visual feedback). Two planets were continuously displayed center-left and center-right of the screen. The color (orange vs. blue) and location (left vs. right) of punished R1 vs. unpunished R2 planets was counterbalanced across participants. A green ring appeared around a planet whenever the pointer hovered over it (visual feedback). Clicks on either planet (R1, R2) triggered a trading signal cue (2 s), displayed directly beneath each planet. Trading signal terminated with reward (“Success! +100” in green text) or no reward (“Trade attempt failed” in yellow text) outcome, displayed directly above each planet. Accumulated points were continuously displayed top-center of the screen.
“Incoming ship” icons (Type I [turquoise], Type II [purple]) were presented in the upper-middle part of the screen. A countdown timer (6secs) was presented immediately below the ship icon. After 6secs, ship outcomes (attack [“Attack!” with point loss amount in red text]; nothing [“Ship passed by without incident” in green text]) were presented for 2 s below the ship icon. A shield indicator/button was displayed in the lower-middle part of the screen during ship presentations; this button indicated if a shield was available or unavailable during a given ship trial. If available, the button could be clicked to “activate” the shield (“-50 points” cost shown above button). If activated during a CS+ trial, the subsequent attack was prevented (“Attack deflected” in gray text shown in place of attack outcome), whereas activating a shield had no effect on CS– outcomes.
Reveal
Prior to the last punishment block, task contingencies were revealed to participants. This involved a page describing and depicting R1 → CS+ → Attack and R2 → CS– as follows:
“Local intel has determined where the pirates are coming from!
Your signals to the [blue/orange] planet ([left/right] side) have been attracting pirate ships (Ship: [Type 1/Type 2]), that have been stealing your points!
Your signals to the [orange/blue] planet ([right/left] side) have only been attracting friendly ships (Ship: [Type 2/Type 1]).”
A depiction of each contingency (R1 → CS+ → Attack; R2 → CS– [using relevant planet/ship/attack icons and arrows]) was displayed beneath each contingency statement.
Post-block measures
Valuations, inferences, preference estimates, and preference endorsements were polled via labeled 100-point sliders after each game block. The default slider position was set at 50 (midpoint).
Valuations
Participants were asked how they felt about game elements (“Very negative” to “Very positive”). Each game element (planets, ships, outcomes) was indicated with an icon and label.
Causal inferences
Participants were asked how often interacting with a game element led to another game element (“Never [0%]” to “Every time [100%]”). Inferences per antecedent (R1, R2, Ship I, Ship II) were assayed on separate pages. The antecedent icon was displayed at the top of the screen, and each slider for each potential consequence (e.g., ships, outcomes) were accompanied by an icon and label.
Preference estimates
Participants were asked to reflect on the most recent block and estimate the proportion of their clicking across planets (“100% Planet A/0% Planet B” to “0% Planet A/100% Planet B”). Relevant planet icons were displayed at the two ends of the slider.
Preference endorsements
Participants were asked to reflect on the optimal strategy for maximizing points in the preceding block (“100% Planet A/0% Planet B” to “0% Planet A/100% Planet B”). Relevant planet icons were displayed at the two ends of the slider.
Self-reported trait questionnaires
Participants completed a self-report questionnaire battery consisting of Cognitive Flexibility Index (CFI)27, Habitual Tendency Questionnaire (HTQ)28, and Alcohol Use Disorders Identification Test (AUDIT)29. Each questionnaire was presented on separate pages. Catch questions were embedded halfway through the CFI (“Select the left-most option, strongly disagree, for this question”) and AUDIT (“Select the fourth option from the left, weekly, for this question”).
CFI
The CFI is a measure of one’s ability to think adaptively when encountering difficult situations27. The questionnaire consists of 20 items rated on a seven-point scale ranging from 1 (“strongly disagree”) to 7 (“strongly agree”). Items are divided into two subscales: “Control”, which measures self-reported tendencies to view challenging situations as relatively controllable, and “Alternatives”, which measures self-reported ability to perceive and generate alternative solutions to a problem. Higher scores indicate greater cognitive flexibility.
HTQ
The HTQ assesses inclinations to enact repeated behavior without conscious thought or effort28. The questionnaire contains 11 items rated on a seven-point scale ranging from 1 (“strongly disagree”) to 7 (“strongly agree”). Items are divided into three subscales: “Compulsivity”, preference for “Regularity”, and “Aversion to novelty”. Higher scores indicate greater habitual tendencies.
AUDIT
The AUDIT is a measure of alcohol consumption patterns over the past 12 months. It consists of 10 items rated on a five-point scale ranging from 0 to 4. Higher scores indicate greater alcohol consumption and risk of problematic drinking.
Procedure
Informed consent and self-reported demographics
Participants were provided with an information sheet with consent form. After informed consent was obtained, participants were asked to report their gender (“Male”, “Female”, “Other”), age, and native language (not used as selection criteria). No data on race/ethnicity was collected.
Self-report questionnaires
Participants received the self-report questionnaire battery. Participants that correctly answered catch questions continued with the experiment, while those that did not were ejected and prevented from re-participating.
Initial instructions
Participants were then informed that they would play a game over 6 blocks and that the goal of the game was to gain as many points as possible. They were informed they could gain points by “trading” with planets by clicking on them. Participants were also informed that top 10% scoring participants would earn an additional monetary prize. Participants could not proceed with the experiment until they passed a brief multiple-choice comprehension test.
Pre-punishment phase
Participants underwent two 3 min game blocks, each followed by relevant post-block measures. During game blocks, R1 and R2 responses could earn point rewards (each +100, 50% probability).
Punishment (pre-reveal) phase
Participants were informed of local pirates that might steal points from them. They were also informed that a shield button could be clicked to prevent attacks, but that it was not always available. Participants were reminded that their goal was to maximise points before proceeding with the experiment. Participants then underwent three 3 min game blocks, each followed relevant post-block measures. During game blocks, R1 responses could yield reward as well as CS+ ships, and R2 responses could yield reward as well as CS– ships. CS+ co-terminated with attack (point loss), whereas CS– had no negative consequence.
Reveal
Participants were informed of R1 → CS+ → Attack and R2 → CS−contingencies (see Apparatus and Stimuli). Encoding of this information was confirmed with a follow-up multiple-choice contingency knowledge test; participants were returned to the previous contingency reveal page until they answered the contingency knowledge test correctly.
Post-reveal phase
After the reveal, participants received a final punishment block with post-reveal measures (identical to pre-reveal punishment).
Post-retest reflections (retest only)
To assess awareness of test-retest performance, participants were asked 3 questions (displayed on separate pages) at the end of the retest experiment: Responses were assessed via 100-point sliders [text displayed beneath 0 and 100, respectively]:
-
1.
“While playing the game, did you remember playing the same game in the past?”
[“I don’t think I have played it before”; “I remembered everything about this game”]
-
2.
“Based on this memory, did you choose to change how you played the game this time?”
[“I did not change anything”; “I completely changed my strategy.”]
-
3.
“Do you think you did better or worse playing the game this time compared to your performance last time?”
[“I did much worse this time”; “I did much better this time.”]
Data analysis
De-identified experiment data is provided at https://osf.io/ju35h/. Data was extracted and pre-processed using custom MATLAB scripts (https://github.com/philjrdb/HCP-Test-Retest, https://zenodo.org/records/155819599). Clustering, chi-square, ANOVAs, t-tests, and stepwise logistic regression were performed in SPSS (version 29.0). Confidence intervals for ANOVA effect sizes (partial eta squared [ηp2]) were calculated in R using apaTables package (version 2.0.8). Where applicable, evidence for the null (BF01) was assessed using Bayesian ANOVA and t tests in jamovi (v.2.6.44.0; jsq module v.1.2.0). Singular Value Decomposition was performed in MATLAB (R2022b). Given no overall differences in pre-punishment data (Fig. S2), all data for pre-punishment blocks were collapsed, such that “Pre” represents averaged pre-punishment phase data.
Task behavior
Participant behavior was assessed via R1 and R2 click rates (clicks/min) during non-CS periods. These rates were used to calculate a self-normalized preference score [(R1 rate/Overall rate)*100] indicating the percentage of clicks that were R1. A score of 50% indicates equal rates of R1 and R2 (i.e., no preference) whereas score of 0 indicates a complete preference for R2 over R1. Differences in behavior (click rates, preferences) were analyzed using orthogonal contrasts (see Contrast analysis below). Significant bias in preference was also determined via one-sample t-tests against the null value of 50.
Clustering
Behavioral phenotypes were auto-identified using TwoStep clustering (Bayesian Information Criterion; Table S1), using final pre-reveal and post-reveal preferences as inputs. Clustering was performed separately for initial test and retest to allow data-driven discovery of phenotypes per test; the same 3 phenotypes (Sensitive, Unaware, Compulsive) were found per test. Phenotype stability (test-retest cluster change) was examined by categorizing individuals as having the same avoidance phenotype (Sensitive→Sensitive, Unaware→Unaware, Compulsive→Compulsive), an improved avoidance phenotype (Unaware→Sensitive, Compulsive→Sensitive, Compulsive→Unaware), or a worse avoidance phenotype (Unaware→Compulsive, Sensitive→Unaware, Sensitive→Compulsive) at retest.
Contrast analysis
Behavior and post-block measure data were analyzed via between- x within-subject ANOVAs using orthogonal contrasts. Where applicable, within-subject contrasts were block (linear), action (R1 vs. R2), CS (CS+ vs. CS − ), and/or inferences (correct vs incorrect R → CS). Block contrasts used all valid blocks ([Pre,Pun1,Pun2,Pun3,Rev]) unless indicated as “Pre-reveal” (applicable [Pre,Pun1,Pun2,Pun3]) or “Reveal” ([Pun3,Rev]). Between-subject factors were cluster (Sensitive vs. Unaware vs. Compulsive), probability group (10% vs. 40%), and/or test-retest cluster change (Same vs. Improve vs. Worse). Where applicable, follow-up analyses were conducted using one-way ANOVAs, pairwise between-subject comparisons (Sidak correction), and/or t-tests of a within-subject factor (e.g. R1 vs. R2) per cluster. In the case of key non-significant results, evidence in favor of the null (BF01) was assessed using Bayesian ANOVA and t-tests. Confidence intervals for ηp2 was calculated at the recommended 90% level30.
Chi-square
Significant interactions between categorical variables (cluster, probability group, sex, retested vs. not) were assessed via Pearson Chi-square (2-sided).
Cognitive-behavioral trajectories
Singular value decomposition (SVD) was used to represent relationships between action-related bias measures (R1:R2 attack inferences, valuation, endorsement, estimate, behavior) per cluster. This approach does not rely on assumptions of normality, homoscedasticity, or directionality. Rather, it holistically captures functional “true-score” relationships between variables within a single analysis.
R1:R2 measures per block per individual in a cluster was used as input for SVD analyses. Given all R1:R2 measures except attack inferences were measured during pre-punishment blocks, the null value of 50 (no bias) was imputed for pre-punishment R1:R2 attack inferences so pre-punishment data could contribute to the analysis. Data per cluster was organized into matrices (rows = individuals’ block data; columns = R1:R2 measures) and centered by subtracting column means. The centered matrix (A) was then decomposed into three matrices (U, Σ, and VT) using MATLAB svd function, such that A = UΣVT. VT contains the principal axes of data covariance (eigenvectors). The prime eigenvector in VT captured the majority of covariance across R1:R2 measures per cluster (Sensitive r 2: 74.0%; Unaware r 2: 73.5%; Sensitive r 2: 57.3% [calculated via MATLAB pca function]), indicating a single linear trajectory captured the majority of cross-measure variance.
The reliability of prime eigenvectors and cluster differences in these was determined by bootstrapping cluster data (resampling subjects with replacement) to obtain 1000 bootstrapped VT per cluster. To maintain sign consistency across bootstraps, if the first element of a bootstrapped VT was negative, that VT was multiplied by -1. Error regions depicted across figures (Fig. 3, S9) were the bounded volume of bootstrapped prime vectors (MATLAB alphaShape [Alpha parameter=10]).
Slope coefficients between pairs of measures was calculated as the ratio of corresponding values in VT (Fig. S4). Differences in slope coefficients between clusters was determined by subtracting bootstrapped slope coefficients for each cluster pair, yielding a distribution of 1000 bootstrapped slope differences per cluster and measure pair. After adjusting each distribution for narrowness bias (distributions extended from true slope differences by factor of \(\sqrt{n/(n-1)}\)) 31,32, the two-tailed p-value was determined as the proportion of percentile confidence intervals containing 0 (i.e., no difference between slopes).
Stepwise logistic regression. Multinomial logistic regression was used to assess whether retest cluster could be predicted from initial test clusters and/or self-report trait measures. Initial test cluster, CFI subscales (test, retest), HTQ subscales (test, retest), AUDIT (test, retest), and all 2-way interactions between variables were allowed to conditionally enter the model (likelihood ratio <0.05). Only initial cluster and intercept were included in the final model (χ2(4) = 32.82, p < 0.001, Nagelkerke r2 = 0.262).
Post-retest reflections
Cluster-related differences in test-retest awareness were assessed using one-way ANOVAs (Fig. S11). Between-subject factors were initial test cluster, retest cluster, or test-retest cluster change. Where applicable, follow-up pairwise between-subject comparisons (Sidak correction) were conducted.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Results
Distinct behavioral phenotypes of punishment sensitivity
Our international sample (N = 267; mean age = 32.17, SD = 10.74; n = 118 female, 143 male, 6 other; Fig. S1) participated in an online game (“Planets & Pirates”) designed to measure punishment learning and decision-making5. During the task, participants could click on two continuously-available planets (R1, R2), with each response rewarded 50% of the time (Fig. 1a). After initial reward training, ship cues were introduced (punishment phase). Responses on R1 triggered a cue (R1 → CS+ ) predicting significant point loss (“attack”), while responses on R2 (R2 → CS) had no further consequences. Participants were randomly assigned to either a high-probability punishment group (40% CS probability; n = 134) or a low-probability group (10% CS probability; n = 133). To equalize the negative utility of R1 across groups, the 40% group experienced mild point loss from attacks (-10% of total points), and the 10% group faced severe loss (-40% of total points). R1 responses generally resulted in net point loss during the punishment blocks, so avoiding R1 was crucial for maximizing gains.
a Conditioned punishment task design. Across game blocks, participants could click on continuously-available planets (R1, R2) for point rewards. During punishment phase (Pun [1–3]), these responses also led to ship cues (CS+ vs. CS–) with 10% or 40% probability (assigned probability group). CS+ led to an “Attack” (-40% or -10% point loss, according to probability group), whereas CS– was safe. The R1 → CS+ → Attack punishment contingency meant R1 had negative utility (matched across probability groups); R1 needed to be avoided to maximize point gain. Contingencies (R1 → CS+ → Attack; R2 → CS–) were revealed to participants before the final punished game block (Rev). b Mean (±SEM) [left panel] and individual [middle, right panels] R1 preferences per behavioral phenotype across blocks. * p < 0.05 one sample t test vs. 50% (no R1:R2 bias). c Mean (±SEM) point gain across blocks. Poor avoidance during punishment was associated with attenuated point gain (Pun 2: F(2, 264) = 5.098, p = 0.007; Pun 3: F(2,264) = 56.243, p < 0.001; Rev: F(2,264) = 76.91, p < 0.001). d Composition of phenotypes by probability group [top panel], and vice versa [bottom panel] (χ2(2) = 22.61, p < 0.001). Stronger contingencies drove individuals towards a Sensitive phenotype (χ2(1) = 16.24, p < 0.001), whereas weaker contingencies drove individuals towards Compulsive (χ2(1) = 4.208, p = 0.040) over Unaware (χ2(1) = 2.162, p = 0.142) phenotype.
Following three punishment blocks, participants were given explicit corrective information regarding task contingencies (i.e., R1 → CS+ → Attack and R2 → CS–). They were also required to pass a contingency knowledge test to confirm they understood this information before completing a final game block (Rev) designed to assess how this information influenced their behavior.
Before punishment, participants showed no preference between R1 and R2 (Fig. 1b; see also Fig. S2) (t(266) = 0.058, p = 0.953, d = 0.004 [95%CI:–0.12,12], BF01 = 14.6). However, as expected, avoidance of punished R1 increased across punishment blocks (F(1,264) = 402.19, p < 0.001). The distribution of punishment avoidance was bimodal (Fig. 1b). Using automated clustering, we identified three distinct behavioral phenotypes: “Sensitive” (n = 70), who learned to avoid R1 before the contingency reveal (Pun3: t(69) = –51.53, p < .001, d = –6.16 [95%CI:–7.2,–5.1]); “Unaware” (n = 126), who only avoided after the reveal (Pun3: t(125) = –1.19, p = 0.238, d = –0.106 [95%CI:–2.8,0.07], BF01 = 5.09; Rev: t(125) = –124.1, p < 0.001, d = –11.06 [95%CI:–12.4,–9.7]); and “Compulsive” (n = 71), who did not substantially avoid R1 even after explicit information was provided (Pun3: t(57) = –1.79, p = 0.078, d = –0.213 [95%CI:–0.45,0.02], BF01 = 1.69; Rev: t(57) = –10.91, p < 0.001, d = –1.30 [95%CI:–1.6,–0.98]). These phenotypes were observed across demographic variables, but Compulsives were over-represented older individuals (χ2(4) = 12.42, p = 0.014) (Fig. S1b).
Phenotypes did not markedly differ in their overall click rates (cluster: F(2264) = 3.317, p = 0.038, ηp2 = 0.025 [90%CI: 0,0.059]; block*cluster: F(2264) = 2.201, p = 0.113, ηp2 = 0.025 [90%CI: 0,0.045]) (Fig. S3a), indicating similar engagement and motivation in the task, but they employed markedly different response strategies (action*cluster [pre-reveal]: F(2,264) = 26.82, p < 0.001, ηp2 = 0.169 [90%CI: 0.102,0.232]). Sensitive participants shifted their responses away from the punished R1 toward the unpunished R2 during punishment blocks (action*block: F(1,69) = 32.58, p < 0.001, ηp2 = 0.321 [90%CI: 0.174,0.443]), whereas Unawares (action*block: F(1,125) < 0.001, p = 0.988, ηp2 < 0.001 [90%CI: 0,1]) and Compulsives (action*block: F(1,70) = 0.182, p = 0.671, ηp2 = 0.003 [90%CI: 0,0.053]) participants did not. This difference had a significant impact on outcomes (block*cluster: F(2264) = 78.618, p < 0.001, ηp2 = 0.229 [90%CI: 0.297,0.436]): Sensitive participants accumulated points during punishment blocks, while Unaware and Compulsive participants generally lost points (Figs. 1c, S3b). Therefore, despite continued effort, the strategies of Unaware and Compulsive phenotypes were counterproductive.
We examined the impact of punishment probability group on behavior. Consistent with expectations, mild but frequent punishment led to more avoidance than severe but infrequent punishment (group [pre-reveal]: F(1265) = 17.18, p < 0.001, ηp2 = 0.061 [90%CI: 0.022,0.112]) (Fig. S3c). Furthermore, punishment probability influenced behavioral phenotype distribution (Fig. 1d). Higher punishment probabilities increased the likelihood of being categorized as Sensitive, while lower probabilities increased the likelihood of being categorized as Compulsive (χ2(2) = 22.61, p < 0.001). Importantly, punishment probability did not have a significant effect after accounting for phenotype (group [pre-reveal]: F(1,261) = 2.226, p = 0.137, ηp2 = 0.008 [90%CI: 0,0.036], BF01 = 4.19; group [Rev]: F(1,261) = 0.002, p = 0.967, ηp2 < 0.001 [90%CI: 0,1], BF01 = 9.4) (Fig. S3c), showing the effect of punishment probability on behavior is mediated by its influence on phenotype.
Punishment phenotypes do not differ in habit formation, outcome valuations, or cue-related learning
These findings show that a significant proportion of individuals fail to modify their behavior to avoid punishment, even after receiving repeated punishers and explicit information on how to do so. To identify the mechanisms underlying these individual differences in learning and decision-making, we first examined several commonly proposed explanations19.
One possibility is that punishment insensitive behavior may arise from habit formation, where actions become automatic and disconnected from their consequences33. Two key characteristics of habits—automaticity and independence from goals—were assessed after each block. Participants estimated their R1:R2 preference for the previous block (preference estimate) and what R1:R2 preference they believed would have maximized their point gain (preference endorsement). Comparing these with actual behavior allowed us to assess participants’ awareness of their own actions and whether their choices aligned with their task goals. Contrary to a habit-based explanation, preference estimates (Fig. 2b) and endorsements (Fig. 2c) closely matched actual behavioral preference (Fig. 2a) across all phenotypes (cluster*measure: BF01 = 15.92). Each phenotype showed accurate awareness of what they chose (estimates) and why they chose it (endorsements), stating their own choices as optimal for maximizing points, even when they were not. This shows that punishment-insensitive behavior was deliberate and self-assured (i.e., goal-directed), not habitual.
Mean (±SEM) R1:R2 behavioral preference (a), post-block estimates of R1:R2 preference (b), and post-block endorsed (perceived optimal) R1:R2 preference (c) per phenotype across blocks. Behavior estimates and endorsements matched actual behavior, suggesting behavior across phenotypes was deliberate, not habit-like. d–f Mean (±SEM) value ratings for point outcomes (reward [Rew], attack [Atk]), CSs (CS+ , CS− ), and actions (R1, R2) across blocks. d All phenotypes reported liking rewards (cluster: F(2,264) = 3.975, p = 0.02) and disliking attacks (cluster: F(2,264) = 0.502, p = 0.606). e All phenotypes rapidly disliked CS+ relative to CS− before contingency reveal (CS*block [pre-reveal]: F(1,264) = 214.38, p < 0.001; CS*block*cluster [pre-reveal]: F(2,264) = 2.691, p = 0.070). f Only Sensitives valued actions differently prior to contingency reveal (action*block*cluster [pre-reveal]: F(2,264) = 102.61, p < 0.001; action [Sensitive]: F(1,69) = 203.2, p < 0.001; [Unaware]: F(1,125) = 2.585, p = 0.110; [Compulsive]: F(1,70) = 1.064, p = 0.306). The reveal drove stronger action revaluation for Unawares than Compulsives (action*block*cluster [Unaware vs. Compulsive]: F(1,195) = 53.709, p < 0.001); Compulsives continued to undervalue unpunished actions after the reveal (p < 0.001 vs. Unawares and Sensitives). g, h Mean (±SEM) causal inferences per phenotype across blocks. g All clusters were broadly aware that CS+ , not CS–, led to Attack before the reveal (CS*block [pre-reveal]: F(1,264) = 60.53, p < 0.001; CS*block*cluster: F(2,264) = 3.282, p = 0.039). h Only Sensitives developed accurate Action→CS (correct ✓ [R1 → CS+ ; R2 → CS–] over incorrect ✗ [R1 → CS–; R2 → CS+ ]) inferences before the reveal (correct*block [Sensitives]: F(1,69) = 147.0, p < .001; [Unaware]: F(1,125) = 3.380, p = 0.068; [Compulsive]: F(1,70) = 2.748, p = 0.102). Contingency reveal drove stronger inference updating for Unawares than Compulsives (correct*block*cluster [Unaware vs. Compulsive]: F(1,195) = 4.816, p = 0.029); Compulsives continued to misattribute attacks to the unpunished action after the reveal ([vs. Unawares, Sensitives]: p < 0.001).
A second possibility is that punishment insensitive behavior may be driven by distortions in value-based decision-making, where individuals overvalue rewards or undervalue punishments, leading to persistent engagement in punished actions34. To test this, we assessed participants’ valuations of rewards and punishments. All phenotypes reported similar levels of liking for rewards (cluster: F(2264) = 3.975, p = 0.02, ηp2 = 0.029 [90%CI: 0.003,0.066]) and disliking for punishments (cluster: F(2264) = 0.502, p = 0.606, ηp2 = 0.004 [90%CI: 0,0.019], BF01 = 14.36) (Fig. 2d), indicating that the differences in punishment avoidance were not due to distorted reward or punishment valuation in the conditions tested.
A third explanation is that individuals may fail to learn about environmental cues (e.g., the CS+ ship cue) that predict punishment and mediate the relationship between actions and point loss35. To investigate this, we measured how participants valued the ship cues (CSs) and their attribution of attacks to these cues. All phenotypes quickly developed a dislike for CS+ over CS− (CS*block [pre-reveal]: F(1,264) = 214.38, p < 0.001, ηp2 = 0.448 [90%CI: 0.377,0.508]; CS*block*cluster [pre-reveal]: F(2,264) = 2.691, p = 0.070, ηp2 = 0.02 [90%CI: 0,0.051], BF01 = 35.43) (Fig. 2e), reflecting an understanding of the Pavlovian CS→Attack contingencies (Fig. 2g). This indicates that all participants readily learned about and appropriately valued the cues predicting point loss. Therefore, the pronounced differences in behavior between phenotypes cannot be attributed to failures in general learning or motivation.
Punishment phenotypes reflect different cognitive-behavioral trajectories
Differences in punishment sensitivity were instead driven by what individuals learned about their actions. Phenotypes exhibited stark differences in how they valued each action during the task (action*block*cluster [pre-reveal]: F(2264) = 102.61, p < 0.001, ηp2 = 0.437 [90%CI: 0.363,0.496]) (Fig. 2f). While actions were valued similarly before punishment (action*cluster: F(2264) = 1.872, p = 0.156, ηp2 = 0.014 [90%CI: 0,0.041], BF01 = 4.59), only the Sensitive group showed a clear preference for the unpunished action (R2) over the punished action (R1) during the punishment blocks (action [Sensitive]: F(1,69) = 203.2, p < 0.001, ηp2 = 0.747 [90%CI: 0.656,0.799]). In contrast, Unaware and Compulsive groups failed to discriminate ([Unaware]: F(1,125) = 2.585, p = 0.110, ηp2 = 0.02 [90%CI: 0,0.077], BF01 = 0.927; [Compulsive]: F(1,70) = 1.064, p = 0.306, ηp2 = 0.015 [90%CI: 0,0.091], BF01 = 7.29). This difference reflected each group’s ability to form accurate Action→CS (correct*cluster [pre-reveal]: F(2,264) = 83.92, p < 0.001, ηp2 = .389 [90%CI: 0.312,0.451]) and Action→Attack inferences (action*cluster: F(2,264) = 38.92, p < 0.001, ηp2 = 0.228 [90%CI: 0.155,0.293]) (Fig. 2h, S3d). Sensitives correctly attributed CS+ and attacks to the punished action (correct [pre-reveal]: F(1,69) = 181.4, p < 0.001, ηp2 = 0.724 [90%CI: 0.627,0.781]; action: F(1,69) = 94.57, p < 0.001, ηp2 = 0.578 [90%CI: 0.446,0.663]), whereas Unawares (correct: F(1,125) = 4.612, p = 0.034, ηp2 = 0.036 [90%CI: 0.001,0.102]; action: F(1,125) = 3.371, p = 0.069, ηp2 = 0.026 [90%CI: 0,0.087], BF01 = 1.68) and Compulsives (correct: F(1,70) = 6.601, p = 0.012, ηp2 = 0.086 [90%CI: 0.010,0.200]; action: F(1,70) = 0.371, p = 0.544, ηp2 = 0.005 [90%CI: 0,0.065], BF01 = 3.96) largely did not. Consequently, Sensitives learned to value the unpunished action, while the other groups did not.
Providing explicit, correct contingency information corrected these knowledge deficits in Unawares, leading to a pronounced revaluation of actions (Fig. 2f) and improved decision-making (Fig. 1b). Although Compulsives showed some cognitive updating (Fig. 2f), it was significantly attenuated (action*block*cluster [Unaware vs. Compulsive]: F(1,195) = 53.71, p < 0.001, ηp2 = 0.216 [90%CI: 0.136,0.295]); they continued to incorrectly attribute punishment to the unpunished action, despite receiving the same corrective information and passing the same assessment of punishment contingency knowledge.
To further understand these differences, we mapped the interrelationships between action-related cognitions and behavior (i.e., R1:R2 punishment inferences, valuations, endorsements, estimates and behavior) across blocks for each phenotype using singular value decomposition36 (Fig. 3). This analysis reveals the latent cognitive-behavioral trajectory of each phenotype to show whether and how phenotypes differ in their cognitive-behavioral integration. We hypothesized that punishment inferences influence action values, which in turn affect endorsed behavioral strategies (i.e., intentions). Assuming intact behavioral control and awareness, these intentions should dictate actual behavior and post hoc estimates of behavior (Fig. 3f).
a True-score relationships between Action-Attack inference bias, action value bias, and behavior preference [left panel], and endorsed preference, estimated preference, and behavior preference [right panel]. Colored lines represent true-score relationships per phenotype (determined via singular value decomposition), shaded regions represent 3D confidence regions for true score relationships (determined via bootstrapping), and each dot represents an individual’s block score. Corresponding pairwise true-score relationships between behavior preference and Action-Attack inference bias (b), action value bias (c), endorsed preference (d), and estimated preference (e). R1:R2 attack inferences and R1:R2 valuations failed to translate to behavioral preference in Compulsives relative to other phenotypes, whereas relationships between endorsements, estimates and behavior were relatively intact. *p < 0.05; **p < 0.01 for cluster slope differences. f Model for underlying differences in behavioral phenotypes. Sensitive and Unaware individuals share cognitive-behavioral trajectories (functional relationship between cognitions, intentions, and behavior), but only Sensitives readily learn from experience alone. Compulsive individuals have an altered cognitive-behavioral trajectory: provision of information leads to updated cognitions (action-related inferences and valuations), but these do not translate into a corresponding endorsed behavioral strategy. N = 267 participants.
Interestingly, Sensitives and Unawares shared similar trajectories (Fig. 3a–e), despite their initial differences. Once Unawares received contingency information, they used this to correct their punishment-related knowledge, which resulted in corresponding changes to action valuations, preference endorsements, behavior, and estimates (Figs. 3f, S5). So, information-driven updating in Unawares aligned with experience-driven updating in Sensitives.
In contrast, Compulsives followed a distinct cognitive-behavioral trajectory. Unlike the other phenotypes, Compulsives exhibited a divergence between their punishment inferences, action values, and behavior (Figs. 3a, b, S4a, S5a). Even though Compulsives used contingency information to modestly update their punishment inferences and valuations, these changes did not result in corresponding changes to endorsed or implemented behavior (Figs. 3a, b, S4a). So, attenuated punishment avoidance in Compulsives was not simply due to failures in belief updating. Rather, Compulsives were on an altered cognitive-behavioral trajectory, such that information-driven changes to action-related inferences and values did not readily translate to a corresponding behavioral strategy. This represents a perturbed integration of cognitive and behavioral processes, specifically at the junction between beliefs and intentions (Fig. 3f).
Punishment phenotypes are trait-like
To assess the stability of the punishment phenotypes, we retested participants after a 6-month interval (N = 128). No selective attrition was observed across original probability groups (χ2(1) = 0.63, p = 0.427) and phenotypes (χ2(2) = 0.705, p = 0.703) (Figure S6). Participants were reassigned to their original probability group but were randomly allocated to counterbalancing conditions (i.e., unlikely to have the same punished action, CS+ cue, etc).
Clustering of avoidance behavior at retest revealed the same Sensitive, Unaware, and Compulsive phenotypes as found in initial test (Fig. 4a). Additionally, the cognitive and motivational characteristics associated with each phenotype (e.g., poor Action-CS awareness, distinct cognitive-behavioral trajectories) were recapitulated at retest (Figs. S4, S7–S9). These findings show the robustness of these phenotypes. Crucially, each phenotype at retest predominantly consisted of participants who exhibited the same phenotype at the initial test (χ2(4) = 30.811, p < 0.001) (Fig. 4b, c), indicating that inter-individual differences are relatively stable over time. This suggests that variation in punishment learning and cognitive-behavioral integration reflect stable, trait-like characteristics.
a Mean (±SEM) [left panel] and individual [middle, right panels] R1 preferences across blocks per retest cluster (N = 128). *p < 0.05 one sample t test vs. 50% (no R1:R2 bias). b Relationship between phenotype at initial test and phenotype at retest. Individuals were significantly more likely to be categorized into the same phenotype at retest (χ2(4) = 30.811, p < 0.001). In a stepwise logistic regression model, original phenotype was the dominant predictor of retest phenotype (χ2(4) = 32.82, p < 0.001, Nagelkerke r2 = 0.262). c The majority of individuals had the same phenotype at retest. Some individuals improved their punishment phenotype (e.g., went from Unaware at initial test to Sensitive at retest), while some individuals were categorized as being in a worse punishment-avoiding cluster.
Phenotype stability was related to self-reported cognitive flexibility (F(2,125) = 4.278, p = 0.016, ηp2 = 0.064 [90%CI: 0.007,0.134]). Participants who performed worse at retest (e.g., classified as Sensitive at initial test but Unaware at retest) reported significantly lower cognitive flexibility compared to those who maintained their performance (MD = –11.67, p = 0.017). However, cognitive flexibility was not significantly related to maintained versus improved performance across tests (MD = 4.13, p = 0.341), and phenotype stability was not significantly related to other self-reported tendencies previously linked to poor decision-making (Habitual Tendencies Questionnaire [F(2,125) = 0.55, p = 0.579, ηp2 = 0.009 [90%CI: 0,0.042]; Alcohol Use Disorders Identification Test [F(2,125) = 0.037, p = 0.946, ηp2 = 0.001 [90%CI: 0,1]) (Fig. S10).
Finally, we examined what combination of trait variables best predicted behavioral phenotype at retest. Self-reported traits across tests, initial test phenotype (i.e., behaviorally-identified trait), and interactions between these, were submitted to a stepwise logistic regression model. Interestingly, initial phenotypes, but not self-report measures or interaction terms, were included in the final model (see Methods). So, behaviorally-identified phenotypes surpass traditional self-report measures in predicting future punishment learning and decision-making.
Discussion
In a moderately large, diverse sample, we identified three enduring phenotypes of punishment sensitivity: Sensitives who adaptively avoid punishment through experience, Unawares who require explicit information to correct maladaptive choices, and Compulsives who persist in punished behavior despite both experiential and informational interventions. These phenotypes reflect stable, trait-like differences and predicted longitudinal decision-making outcomes, positioning them as robust markers for persistent harmful behavior.
We identify two dissociable cognitive mechanisms driving these phenotypes. First, causal inference deficits drive punishment insensitivity and maladaptive behavior in Unawares. These individuals misattributed punishment, failing to correctly link their specific action to the delivery of punishment. These causal inference deficits explain why experiential feedback often fails to drive behavior change37 - a finding that has proved challenging to policies relying on punishments (e.g., fines, sanctions, etc) to correct behavior38,39,40. Importantly, these causal inference deficits were remediable via an information-based intervention. Second, failures in cognitive-behavioral integration in Compulsives rendered punishment inert. Even with accurate causal knowledge (achieved post-intervention), these individuals could not synthesize their causal knowledge with action selection processes to avoid further punishment. This integration failure, driving resistance to both punishment experience and remedial information, identifies a core mechanism explaining the limitation of fact-based interventions (e.g., warnings, public health campaigns37,41,42) at correcting behavior in some individuals and suggests that suboptimal choices in these individuals may require alternative approaches.
These cognitive mechanisms exhibited striking stability within an individual. Behavioral phenotyping at baseline predicted individuals’ learning trajectories and choice patterns six months later. Moreover, behavioral phenotyping outperformed self-reported cognitive and behavioral flexibility in forecasting long-term learning and decision-making outcomes, underscoring its potential as an objective marker for assessing the cognitive mechanisms that place individuals at risk of chronic, maladaptive decision-making. By mapping stable punishment decision-making phenotypes to distinct cognitive mechanisms, our work advances a framework to stratify punishment sensitivity (e.g., distinguishing misattribution from integration deficits)19 and prioritize mechanism-targeted interventions that address the different cognitive drivers of poor decision-making43.
Limitations
While our findings provide insight into the cognitive mechanisms underpinning punishment sensitivity, several limitations warrant acknowledgment. First, although we identify distinct behavioral phenotypes that are stable over time, the task was conducted in a controlled online environment. As such, the ecological validity of these findings, including their relationship to more complex real-world decision-making, remains undetermined. Second, while our international sample was demographically diverse and drawn from 24 countries, data collection relied on an online participant pool fluent in English with access to digital devices. This may limit generalizability to populations with distinct socioeconomic and cultural profiles. Future work could examine whether these phenotypes also occur in more heterogeneous samples.
Data availability
All experiment data is available at https://osf.io/ju35h/. Anonymized data have been deposited at https://osf.io/ju35h/. All experiment code and analysis scripts are accessible at https://github.com/philjrdb/HCP-Test-Retest and https://zenodo.org/records/155819599.
References
Thorndike, E. B. Animal intelligence: An experimental study of the associative processes in animals. (Columbia University, 1898).
Schultz, W., Dayan, P. & Montague, P. R. A neural substrate of prediction and reward. Science 275, 1593–1598 (1997).
Sutton, R. S. & Barto, A. G. Reinforcement Learning: An Introduction. Second Edition edn, (MIT Press, 2018).
Jean-Richard-Dit-Bressel, P., Ma, C., Bradfield, L. A., Killcross, S. & McNally, G. P. Punishment insensitivity emerges from impaired contingency detection, not aversion insensitivity or reward dominance. Elife 8, e52765 (2019).
Jean-Richard-Dit-Bressel, P. et al. Punishment insensitivity in humans is due to failures in instrumental contingency learning. Elife 10, https://doi.org/10.7554/eLife.69594 (2021).
Dadds, M. R. & Salmon, K. Punishment insensitivity and parenting: Temperament and learning as interacting risks for antisocial behavior. Clin. Child Fam. Psychol. Rev. 6, 69–86 (2003).
Hemphill, J. F., Hare, R. D. & Wong, S. Psychopathy and recidivism: A review. Leg. Criminol. Psychol. 3, 139–170 (2011).
Johnson, S. L., Turner, R. J. & Iwata, N. BIS/BAS levels and psychiatric disorder: An epidemiological study. J. Psychopathol. Behav. Assessment, 25, 25-36.
Goulter, N. et al. Predictive validity of adolescent callous-unemotional traits and conduct problems with respect to adult outcomes: High- and low-risk samples. Child Psychiatry Hum. Dev. 54, 1321–1335 (2022).
Schmauk, F. J. Punishment, arousal, and avoidance learning in sociopaths. J. Abnorm. Psychol. 76, 325–335 (1970).
Blair, K. S., Morton, J., Leonard, A. & Blair, R. J. R. Impaired decision-making on the basis of both reward and punishment information in individuals with psychopathy. Personal. Individ. Differ. 41, 155–165 (2006).
Ersche, K. D. et al. Carrots and sticks fail to change behavior in cocaine addiction. Science 352, 1468–1471 (2016).
Elster, E. M. et al. Impaired punishment learning in conduct disorder. J. Am. Acad. Child Adolesc. Psychiatry 63, 454–463 (2024).
Bechara, A., Damasio, A. R., Damasio, H. & Anderson, S. W. Insensitivity to future consequences following damage to human prefrontal cortex. Cognition 50, 7–15 (1994).
Cools, R., Barker, R. A., Sahakian, B. J. & Robbins, T. W. L-Dopa medication remediates cognitive inflexibility, but increases impulsivity in patients with Parkinson’s disease. Neuropsychologia 41, 1431–1441 (2003).
Luscher, C., Robbins, T. W. & Everitt, B. J. The transition to compulsion in addiction. Nat. Rev. Neurosci. 21, 247–263 (2020).
Luscher, C. The emergence of a circuit model for addiction. Annu. Rev. Neurosci. 39, 257–276 (2016).
Vandaele, Y., Vouillac-Mendoza, C. & Ahmed, S. H. Inflexible habitual decision-making during choice between cocaine and a nondrug alternative. Transl. Psychiatry 9, 109 (2019).
McNally, G. P., Jean-Richard-dit-Bressel, P., Millan, E. Z. & Lawrence, A. J. Pathways to the persistence of drug use despite its adverse consequences. Mol. Psychiatry 28, 2228–2237 (2023).
Vrizzi, S., Najar, N., Lemogne, C., Palminteri, S. & Lebreton, M. Comparing the test-retest reliability of behavioral, computational and self-reported individual measures of reward and punishment sensitivity in relation tomental health symptom. PsyArXiv https://doi.org/10.31234/osf.io/3u4gp (2023).
Killcross, S., Robbins, T. W. & Everitt, B. J. Different types of fear-conditioned behaviour mediated by separate nuclei within amygdala. Nature 388, 377–380 (1997).
Jean-Richard-dit-Bressel, P. et al. A cognitive pathway to punishment insensitivity. Proc. Natl. Acad. Sci. 120, e2221634120 (2023).
Palan, S. & Schitter, C. Prolific.ac—A subject pool for online experiments. J. Behav. Exp. Financ. 17, 22–27 (2018).
Peer, E., Brandimarte, L., Samat, S. & Acquisti, A. Beyond the Turk: Alternative platforms for crowdsourcing behavioral research. J. Exp. Soc. Psychol. 70, 153–163 (2017).
Zorowitz, S. et al. Inattentive responding can induce spurious associations between task behaviour and symptom measures. Nat. Hum. Behav. 7, 1667–1681 (2023).
de Leeuw, J. R. jsPsych: A JavaScript library for creating behavioral experiments in a web browser. Behav. Res. Methods 47, 1–12 (2015).
Dennis, J. P. & Vander Wal, J. S. The cognitive flexibility inventory: Instrument development and estimates of reliability and validity. Cogn. Ther. Res. 34, 241–253 (2009).
Ramakrishnan, S., Robbins, T. W. & Zmigrod, L. The habitual tendencies questionnaire: A tool for psychometric individual differences research. Personal. Ment. Health 16, 30–46 (2022).
Saunders, J. B., Aasland, O. G., Babor, T. F., de la Fuente, J. R. & Grant, M. Development of the alcohol use disorders identification test (AUDIT): WHO Collaborative Project on Early Detection of Persons with Harmful Alcohol Consumption-II. Addiction 88, 791–804 (1993).
Smithson, M. (2001) Correct Confidence Intervals for Various Regression Effect Sizes and Parameters: The Importance of Noncentral Distributions in Computing Intervals. Educational and Psychological Measurement, 61, 605-632. https://doi.org/10.1177/00131640121971392.
Jean-Richard-dit-Bressel, P., Clifford, C. W. G. & McNally, G. P. Analyzing Event-Related Transients: Confidence Intervals, Permutation Tests, and Consecutive Thresholds. Front. Mol. Neurosci. 13, https://doi.org/10.3389/fnmol.2020.00014 (2020).
Hersterberg, T. C. What Teachers Should Know about the Bootstrap: Resampling in the Undergraduate Statistics. (2014).
Belin, D., Belin-Rauscent, A., Murray, J. E. & Everitt, B. J. Addiction: failure of control over maladaptive incentive habits. Curr. Opin. Neurobiol., 1-9 https://doi.org/10.1016/j.conb.2013.01.025 (2013).
Hogarth, L. Addiction is driven by excessive goal-directed drug choice under negative affect: translational critique of habit and compulsion theory. Neuropsychopharmacology 45, 720–735 (2020).
Vanderschuren, L. J. M. J. & Everitt, B. J. Drug seeking becomes compulsive after prolonged cocaine self-administration. Science 305, 1017–1019 (2004).
Wall, M. E., Rechtsteiner, A., Rocha, L. M. (2003). Singular Value Decomposition and Principal Component Analysis. In: Berrar, D. P., Dubitzky, W., Granzow, M. (eds) A Practical Approach to Microarray Data Analysis. Springer, Boston, M. A. https://doi.org/10.1007/0-306-47815-3_5.
Camilleri, A. R. & Newell, B. R. Mind the gap? Description, experience, and the continuum of uncertainty in risky choice. Prog. Brain Res. 202, 55–71 (2013).
Hancock, P. A., Kaplan, A. D., MacArthur, K. R. & Szalma, J. L. How effective are warnings? A meta-analysis. Safety Sci. 130, https://doi.org/10.1016/j.ssci.2020.104876 (2020).
Witte, K. & Allen, M. A meta-analysis of fear appeals: implications for effective public health campaigns. Health Educ. Behav. 27, 591–615 (2000).
Snyder, L. B. et al. A meta-analysis of the effect of mediated health communication campaigns on behavior change in the United States. J. Health Commun. 9, 71–96 (2004).
Barron, G., Leider, S. & Stack, J. The effect of safe experience on a warnings’ impact: Sex, drugs, and rock-n-roll. Organ. Behav. Hum. Decis. Process. 106, 125–142 (2008).
Hertwig, R., Barron, G., Weber, E. U. & Erev, I. Decisions from experience and the effect of rare events in risky choice. Psychol. Sci. 15, 534–539 (2004).
McNally, G. P. & Jean-Richard-dit-Bressel, P. A cognitive pathway to persistent, maladaptive choice. Eur. Addict. Res., 1-1 https://doi.org/10.1159/000538103 (2024).
Acknowledgements
This work was supported by grants from the Australian Research Council to PJRDB (DP220102317), G.P.M. (DP250100345; DP220100040), and by an NHMRC Synergy grant (GNT2011633). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Communications Psychology thanks the anonymous reviewers for their contribution to the peer review of this work. Primary Handling Editors: Eva Pool and Troby Ka-Yan Lui. [A peer review file is available].
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Zeng, L., Park, H.R.P., McNally, G.P. et al. Causal inference and cognitive-behavioral integration deficits drive stable variation in human punishment sensitivity. Commun Psychol 3, 103 (2025). https://doi.org/10.1038/s44271-025-00284-9
Received: 24 February 2025
Accepted: 23 June 2025
Published: 09 July 2025
DOI: https://doi.org/10.1038/s44271-025-00284-9
.png)




