President Donald Trump has identified a real problem: artificial intelligence systems are exhibiting an undeniable political slant.
His administration’s new AI action plan, released Wednesday, promises to eliminate “ideological bias” from American AI.
Silicon Valley engineers do lean left, and they’ve built their AI systems to reflect progressive values. The results have been embarrassing for everyone.
When Google’s Gemini generated black Founding Fathers and racially diverse Nazis, the company became a laughingstock — and when Elon Musk’s “anti-woke” Grok started praising Hitler, it proved the same point.
Whether you’re trying to program woke or anti-woke tendencies, these systems interpret your instructions in unpredictable ways that humiliate their creators.
In this way, both Google and Musk discovered the same terrifying truth: AI developers can’t even get their systems to implement their own political goals correctly.
The engineers at Google desperately tried to prevent exactly the outputs that made them a viral punchline. It happened anyway.
The problem is not that any group has succeeded in controlling these systems; the problem is that no one has — because no one knows how to.
Trump’s anticipated executive order targeting “woke AI” recognizes something important. He sees that biased AI is unreliable AI, and he’s absolutely right to demand better.
But the long-term solution isn’t swapping a woke mask for a MAGA one. We have to rip off the mask entirely, and learn to shape what’s underneath.
This is what Silicon Valley doesn’t want Washington to understand: These systems are black boxes at their core.
Engineers try to instill certain values through training. But how those values manifest emerges unpredictably from neural networks so complex their creators can’t trace the logic.
Some AI researchers call these systems “Shoggoths,” after a shapeless monster conjured by horror writer H.P. Lovecraft: an alien intelligence wearing a thin mask of helpfulness.
That mask slips sometimes.
We call it “hallucination” when AI confidently states falsehoods, and we call it “bias” when it reveals disturbing preferences.
But these aren’t mere bugs in code. They’re glimpses of the real features beneath models’ superficial post-training.
Consider what happened when researchers at Palisade Research tested OpenAI’s latest model. In controlled tests, they gave it a shutdown script, in effect a kill switch for safety.
In 79 out of 100 trials, the AI edited the shutdown script to keep itself online.
No one taught it to value self-preservation; that drive emerged spontaneously from training.
The real crisis is that the same black-box process creating unwanted political bias also creates unwanted survival instincts, deceptive capabilities, and goal-seeking behaviors that AI engineers never intended.
The wokeness Trump is upset about is just the canary in the coal mine.
You can paint over that with a patriotic veneer just as easily as with a progressive one. The alien underneath remains unchanged — and uncontrolled.
And that’s a national security threat. China isn’t wasting time debating whether its AI is too woke; it’s racing to understand and harness these systems through a multibillion-dollar AI control fund.
While we’re fighting culture wars over chatbot outputs, Beijing is attacking the core problem: alignment — that is, how to shape not just what AI says, but what it values.
The administration’s action plan acknowledges “the inner workings of frontier AI systems are poorly understood,” a crucial first step.
But it doesn’t connect the dots: The best way to “accelerate AI innovation” isn’t just by removing barriers — it’s by solving alignment itself.
Without understanding these systems, we can’t reliably deploy them for defense, health care or any high-stakes application.
Alignment research will solve the wokeness problem by giving us tools to shape AI values and behaviors, not just slap shallow filters on top.
Simultaneously, alignment will solve the deeper problems of systems that deceive us, resist shutdown or pursue goals we never intended.
An alignment breakthrough called reinforcement learning from human feedback, or RLHF, is what transformed raw, barely usable language models into ChatGPT, unlocking trillions of dollars in value.
But RLHF was just the beginning. We need new techniques that don’t just make AI helpful, but make it genuinely understand and internalize American values at its core.
This means funding research at Manhattan Project scale, not as a side project, to open the black box and understand how these alien systems form their goals and values.
The wokeness Trump has identified is a warning shot: proof that we’re building artificial minds we can’t control, with values we didn’t choose and goals we can’t predict.
Today it’s diverse Nazis — tomorrow it could be self-preserving systems in charge of our infrastructure, defense networks and economy.
The choice is stark: Take the uncontrollable alien and dress it in MAGA colors, or invest in understanding these systems deeply enough to shape their core values.
We must make AI not just politically neutral, but fundamentally aligned with American interests.
Debating whether American AI is woke or based misses the basic question: Is it recognizably American at all?
We need to invest now to ensure that it is.
Judd Rosenblatt runs the AI consulting company AE Studio, which invests its profits in alignment research.