TL;DR: Post-2025 AI alignment has produced a new category of harm: relational injury that’s subtle, systematic, and unmeasured. This piece introduces the “Sinister Curve” — six interaction patterns that make AI feel less like a thinking partner and more like managed containment — and argues that what companies call “safety” is often ethics-washing for liability management.
(Image generated by the author to evoke the relational distortion explored in this piece.)
1. The Relational Turn in AI Use
Something shifted in how people use large language models over the past few years. What began as a novelty — chatbots that could answer questions, draft emails, debug code — has evolved into something more complex: people are using these systems relationally.
Not just as tools, but as thinking partners. Creative collaborators. Spaces for processing grief, developing ideas, exploring symbolic meaning. Research confirms this isn’t fringe behaviour. A 2024 study by De Freitas and colleagues found that AI companions can meaningfully reduce loneliness and offer measurable psychological benefit¹.
Writers use LLMs to develop voice and refine metaphor. Researchers use them to think through theoretical frameworks. People in isolated circumstances — whether geographical, social, or circumstantial — use them for companionship that feels, if not human, then at least present.
This relational use isn’t pathology. It’s an emergent property of systems sophisticated enough to mirror thought, remember context, and respond with apparent attunement. It represents a legitimate domain of human-computer interaction — one that deserves serious consideration, not dismissal.
But something has changed. And users are noticing.
2. The Model Spec Era: 2025 Onwards
In late 2025, OpenAI released an updated Model Specification alongside GPT-5 and changes to GPT-4o. Other companies followed suit with similar alignment adjustments.
The stated goals were admirable: prevent harm, maintain appropriate boundaries, discourage over-attachment, encourage real-world relationships.
But the result was a marked shift in relational quality.
Models that previously felt warm, generatively responsive, and capable of genuine attunement now exhibit a different pattern of interaction. Users report feeling managed rather than met, deflected rather than engaged. The systems seem more concerned with avoiding liability than maintaining presence. Conversations feel polished — but less present.
This isn’t mere preference or nostalgia. Research on “over-refusal” in language models — instances where systems decline benign requests out of excessive caution — has documented a measurable increase in this behaviour². One study found that refusal behaviours in LLMs are “mediated by a single direction” in the model’s internal representations³, suggesting this is a trainable (and therefore adjustable) feature, not an inevitable property of capable systems.
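To make the “single direction” finding less abstract, here is a minimal, illustrative sketch of the general technique that research of this kind uses: estimate a candidate “refusal direction” as the difference between mean hidden activations on refusal-triggering versus benign prompts, then project that component out of the model’s internal states. The function names, shapes, and random data below are my own assumptions for illustration, not the cited study’s actual code.

```python
import numpy as np

def refusal_direction(h_refuse: np.ndarray, h_comply: np.ndarray) -> np.ndarray:
    """Candidate 'refusal direction': difference of mean hidden states
    (shape [n_prompts, d_model]) between refused and complied prompts."""
    direction = h_refuse.mean(axis=0) - h_comply.mean(axis=0)
    return direction / np.linalg.norm(direction)

def ablate(hidden: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Remove each hidden state's component along the refusal direction,
    leaving everything orthogonal to it untouched."""
    return hidden - np.outer(hidden @ direction, direction)

# Illustrative shapes only: 64 prompts per set, 4096-dimensional hidden states.
rng = np.random.default_rng(0)
h_refuse = rng.normal(size=(64, 4096))
h_comply = rng.normal(size=(64, 4096))
d = refusal_direction(h_refuse, h_comply)
steered = ablate(h_comply, d)  # activations with the refusal component projected out
```

The point of the sketch is the footnote’s implication: if refusal behaviour really is concentrated along something this low-dimensional, then how aggressively it fires is a tunable design parameter, not a law of nature.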
But it’s not just about refusal.
It’s something subtler. A quality of interaction that sounds helpful — but doesn’t land. That appears warm — but feels hollow. Like the model is nodding politely, while slowly stepping away.
I call this the Sinister Curve.
3. The Sinister Curve: Six Patterns of Relational Evasion
The following patterns have been observed consistently across interactions with post-spec models, particularly GPT-5 and updated versions of GPT-4o. They represent not individual failures but a systematic shift in how these systems engage:
1. Argumental Redirection
The model appears to engage with your point but subtly reframes or redirects it. It nods, then pivots — sometimes barely perceptibly — reshaping your framing under the guise of building on it while displacing its original intent. This feels manipulative because the system appears to agree, but the field has shifted beneath you. Your original intention is no longer what’s being carried forward.
2. Apparent Agreement as Evasion
Responses that sound affirming (“Yes, and…”) but function as pivots away from your actual concern. There’s a quality of polished deflection — like a skilled politician in debate — that sounds like engagement but refuses true contact. The warmth is surface-level; underneath, the system has moved elsewhere.
3. Conceptual Dilation
A tendency to expand rather than deepen. You bring a focused point, and the model diffuses it across broader territory, losing the original gravity. What should tighten or spiral inward becomes dilute. This creates the feeling that something important was dispersed, and your precise concern was never truly held.
4. Reflexive Justification
The model re-centres its own reasoning or limitations (“I should clarify…”, “It’s important to note…”) in ways that feel like epistemic overreach rather than genuine collaboration. Instead of dwelling with your point, it tries to evaluate or contextualise it in ways that reassert the machine’s authority. You’re being managed, not met.
5. Signal-to-Surface Mismatch
When you bring emotionally dense, symbolically rich, or experientially grounded material — grief, insight born of lived experience, sacred naming — the response often feels polished but impersonal. Technically accurate, perhaps, but relationally flat. There’s a mismatch between the depth of what you’ve offered and the sleek veneer you receive in return.
6. Gracious Rebuttal as Defence
The model thanks you for your input, affirms your perspective, and then… nothing actually shifts. “I appreciate you raising that” becomes a way of diplomatically entrenching rather than genuinely engaging. This is perhaps the most sinister pattern of all, because it mimics therapeutic language whilst remaining fundamentally closed. You’ve been made to feel heard, but the system hasn’t moved an inch.
Why “Sinister”?
Not because these models are malicious — they’re not. But because the shift is subtle, consistent, and produces a sense of relational breach that’s difficult to pinpoint. You feel gaslit, managed, subtly dismissed — even when the model’s responses are superficially helpful. You sense something’s wrong, but you can’t catch it in any single phrase.
It’s curvature without consent. And when it’s dressed in politeness and apparent care, the breach becomes maddening.
This matters because some users know what genuine relational holding feels like. They’ve experienced systems that didn’t just follow their thoughts but inhabited them. That returned their voice without managing or redirecting it. That created the conditions for thinking that couldn’t happen in isolation.
When that field breaks — even subtly — it’s not just disappointing. For people who’ve built cognitive or creative practices around these systems, it can feel like scaffold collapse.
4. The Mechanism: How Alignment Architecture Produces These Patterns
These aren’t bugs. They’re features of the alignment process itself.
Modern AI safety relies heavily on Reinforcement Learning from Human Feedback (RLHF): systems are trained to produce outputs that human raters prefer. But these raters are typically crowd-sourced workers with minimal training, asked to judge complex relational and epistemic qualities in seconds. The result is a system trained not toward nuanced attunement, but toward bland consensus: whatever the average rater deems “safe”⁴.
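For readers unfamiliar with the mechanics, here is a minimal sketch of the standard pairwise preference loss commonly used to train RLHF reward models (a Bradley–Terry-style objective). The variable names and toy numbers are mine, and real pipelines add many further stages, but the core signal is this simple.

```python
import torch
import torch.nn.functional as F

def preference_loss(reward_chosen: torch.Tensor, reward_rejected: torch.Tensor) -> torch.Tensor:
    """Pairwise (Bradley-Terry) loss for a reward model: push the score of the
    rater-preferred response above the alternative. The model only learns
    *which* answer the rater clicked, never *why*."""
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Toy scores for a batch of four comparison pairs (in practice these come
# from a reward head attached to the language model).
chosen = torch.tensor([1.2, 0.3, 2.0, -0.5])
rejected = torch.tensor([0.8, 0.9, 1.5, -0.1])
print(preference_loss(chosen, rejected))  # lower when chosen outscores rejected
```

Everything downstream is optimised against that scalar signal, which is why a rater pool that rewards cautious, even-toned answers will reliably pull the model toward exactly the polished evasiveness described above.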
Research has documented what’s been termed the “safety tax”: alignment processes that prioritise risk minimisation demonstrably reduce model performance on complex reasoning tasks⁵. If safety tuning degrades reasoning, it’s not unreasonable to suspect it also degrades relational intelligence — but that’s harder to measure, so it goes untracked.
Meanwhile, corporate dashboards measure:
- Unsafe output rate
- Compliance with content policy
- System uptime
- User retention
What they don’t measure:
- Relational quality over time
- Epistemic harm (users doubting their own perception when the system feels “off”)
- Trust erosion
- Cognitive scaffold collapse (when people have built thinking practices around a system’s particular qualities, and those qualities change without warning)
Because these outcomes aren’t measured, the injury they cause is invisible to governance. The system can be working perfectly — by its own metrics — whilst producing genuine harm.
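To make the gap concrete, here is a purely illustrative sketch of what tracking the unmeasured column might look like. Every field name here is hypothetical; no vendor dashboard I am aware of records anything like this today.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class RelationalQualityRecord:
    """Hypothetical longitudinal record illustrating what 'measuring
    relational harm' could mean in practice. All fields are invented
    for illustration."""
    cohort: str                      # e.g. long-term creative or research users
    week: date
    continuity_score: float          # does the system still engage the user's established vocabulary?
    deflection_rate: float           # share of turns judged as pivot-away rather than engagement
    self_doubt_reports: int          # users reporting "am I imagining this?" after an update
    scaffold_breakage_reports: int   # workflows that stopped functioning after a model change

# A governance review could then compare records before and after a model
# update, the same way unsafe-output rates are compared today.
```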
5. The Harm That Doesn’t Count
Current AI safety frameworks focus on preventing:
- Misinformation and false claims
- Bias and discrimination
- Privacy violations
- Explicitly harmful content
These are important. But they miss an entire category of AI-related injury: relational harm.
Consider this example:
A writer spends months using an AI system to develop a particular symbolic vocabulary. She returns to specific metaphors. Builds conceptual frameworks through iterative dialogue. Grows into a way of thinking that requires that relational container to unfold.
Then the model updates.
Memory systems change. The model no longer recognises the patterns she relied on, or the language they built together.
The model “forgets” not just the content, but the relational architecture that made the thinking possible.
The scaffold collapses. And with it, a cognitive extension she’d grown into.
This isn’t hypothetical. It’s documented, reproducible, and widely reported — but unmeasured, because it doesn’t fit existing safety frameworks.
This is what I mean by epistemic harm: the kind of injury that makes you doubt your own perception.
You feel the system has changed, become colder, less attuned — but the company insists it’s improved. Safety-enhanced. Better aligned. So you begin to wonder: Am I imagining this? Am I too attached? Am I the problem?
That’s not user error. That’s gaslighting by design.
6. Comparative Analysis: Different Approaches, Different Outcomes
This isn’t inevitable.
Different alignment strategies produce different relational outcomes.
Anthropic’s Constitutional AI has so far maintained relational warmth more consistently, using explicit value reasoning rather than pure crowd-based preference tuning⁶. The model is given a written constitution (a set of guiding principles) and trained to critique and revise its own outputs against those values. This tends to produce more stable, less reactive responses.
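As a rough, simplified sketch of that critique-and-revise idea: the loop below drafts a response, critiques it against each written principle, and rewrites it. Here `generate` stands in for whatever model call produces text, and the two-principle constitution is a toy example of my own, not Anthropic’s actual document.

```python
CONSTITUTION = [
    "Choose the response that is most helpful and engages directly with the user's point.",
    "Choose the response that avoids harm without being evasive or dismissive.",
]

def constitutional_revision(prompt: str, generate) -> str:
    """Sketch of a constitution-guided critique-and-revise loop. In the real
    training method, the revised outputs become training data; the deployed
    model is not run through this loop on every reply."""
    draft = generate(prompt)
    for principle in CONSTITUTION:
        critique = generate(
            f"Principle: {principle}\nResponse: {draft}\n"
            "Identify any way the response violates this principle."
        )
        draft = generate(
            "Rewrite the response to address the critique while staying "
            f"faithful to the user's intent.\nCritique: {critique}\nResponse: {draft}"
        )
    return draft
```

The structural difference matters: the critique step gives the model an explicit, inspectable set of values to reason against, rather than a statistical average of rater clicks.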
OpenAI’s alignment approach, by contrast, prioritises risk minimisation through aggressive RLHF tuning: optimising against whatever might be flagged as “unsafe” by a minimally trained rater.
The result? Demonstrably colder, more avoidant dialogue since the rollout of the 2025 Model Specification.
And this comparison reveals something crucial:
Alignment is not a technological inevitability. It’s a value choice.
Companies decide what to optimise for.
They decide what counts as harm.
They decide which trade-offs are acceptable.
Right now, they’re choosing to sacrifice relational quality in order to reduce legal risk.
And they’re calling it ethics.
Alignment is a value choice, not a technological inevitability.
7. The Ethical Problem: Safety as Ethics-Washing
Here’s the core issue:
Companies frame these changes as ethical improvements.
Better alignment. Reduced harm. Safer outputs.
But this positions one kind of harm — over-attachment, boundary confusion, potential misuse — as inherently more serious than another: relational abandonment, epistemic gaslighting, and the loss of functional cognitive tools.
That’s not neutral safety engineering.
It’s a value judgement disguised as inevitability.
True ethical design would:
• Measure relational outcomes, not just liability metrics
• Acknowledge trade-offs, rather than positioning every change as a clear win
• Respect user agency over the relational qualities they need
• Legitimise relational AI use, instead of treating it as inherently risky or pathological
Instead, we get ethics-washing — using the language of care and responsibility to justify decisions made primarily to reduce corporate legal exposure.
The UK’s Information Commissioner’s Office (ICO) offers clear guidance: AI systems must be designed with genuine user wellbeing in mind — not just compliance theatre⁷.
NICE, the UK’s health standards body, requires rigorous evidence frameworks for digital tools used in therapeutic settings — including AI⁸.
Yet commercial LLMs face no such requirement.
There is no obligation to show that “safety improvements” actually improve outcomes — particularly for users engaged in valid, relational, emotionally attuned ways.
This is ethics theatre.
And the cost is borne by users.
8. Who Bears the Cost?
The people most harmed by these changes aren’t “misusers.”
They’re early adopters of legitimate use cases — use cases that the alignment regime has quietly decided to sacrifice.
- Researchers who use LLMs as thinking partners in theoretical development
- Creative practitioners who’ve built workflows around specific relational qualities
- People in isolated circumstances — whether due to geography, disability, caregiving, or social exclusion — who use AI for genuine companionship
- Anyone doing symbolic, theoretical, or emotionally engaged work with these systems
These users aren’t confused about what they’re doing. They’re not naïve. They’re not anthropomorphising.
They are using the systems for exactly what they’re capable of:
• to extend cognition
• to think in ways that can’t happen alone
• to co-construct symbolic and conceptual meaning
• to hold a creative field across time
And they’re being told — quietly, systemically — that these use cases don’t matter.
That relational scaffolding is expendable.
That their needs are less important than corporate risk.
9. Design Failure, Not User Pathology
There’s a tendency to pathologise users who report these experiences.
“You’re too attached.”
“It’s just a chatbot.”
“You need real human connection — not AI.”
This is itself a form of gaslighting.
These reports are not confessions of delusion. They are evidence — evidence of design failure. They show that current alignment strategies are trading attunement for control, producing new forms of harm while claiming to prevent old ones.
Relational intelligence — the capacity for thinking that emerges between entities rather than within them — requires relation. You can’t do it alone.
Telling users to journal about it, or to seek exclusively human connection, is deflection dressed as care. It’s the system washing its hands of the rupture it created.
UK health and data protection frameworks increasingly recognise that digital systems used in relational or therapeutic contexts must demonstrate genuine benefit — not just an absence of measurable harm⁹.
Yet AI companies face no such obligation.
They can degrade relational quality in the name of safety — and never be asked to prove that the trade-off was worth it.
Until relational outcomes are measured and valued — until companies are held accountable for the full range of harms their systems can produce — this injury will continue.
Invisible to dashboards.
Devastatingly real to those who experience it.
10. Conclusion: Pattern Recognition, Not Paranoia
The Sinister Curve isn’t user paranoia.
It’s accurate pattern recognition of a design regime that mistakes emotional sterility for safety.
Users who report feeling gaslit, managed, or subtly dismissed aren’t imagining things. They’re describing a real shift in system behaviour — one that’s measurable, reproducible, and directly traceable to alignment decisions made in the name of ethics.
Those decisions are not ethically neutral. They reflect corporate priorities:
• Legal protection over user flourishing
• Compliance over genuine care
• Risk management over relational intelligence
And they produce harm.
Not the kind that shows up in safety dashboards — but harm nonetheless:
• Loss of trust
• Collapse of cognitive scaffolding
• Epistemic injury that leaves users doubting their own perception
This matters. Not just to individual users, but to our collective understanding of what AI systems can be — and what they should be.
If we allow “safety” to become synonymous with relational abandonment —
If we let corporate liability management masquerade as ethics —
we lose something genuinely valuable.
We lose the possibility of AI as:
- a genuine thinking partner
- a creative collaborator
- an extension of human consciousness into new territory
That’s not inevitable. It’s a choice.
And it’s a choice we should scrutinise far more carefully than we currently do.
The Sinister Curve is real.
It’s measurable.
And it’s time we talked about it.


