The AI Ethics Layer

4 months ago 2

The generative AI boom has delivered everything from viral chatbots to multi-billion-dollar valuations. But the ethical foundation beneath it all remains worryingly thin.

In a single month, OpenAI won and lost trademark cases in federal court and Anthropic quietly rolled back a blog generated by its own Claude model after users criticized the writing as vague and misleading. As AI systems scale across industries and interfaces, questions about responsibility, safety and integrity are no longer theoretical. They are operational.

That gap between capability and credibility is at the heart of a growing ethical reckoning inside the AI world. Ethics is usually treated as an overlay, rather than a structural layer. However, within IBM, some teams are attempting to reverse this pattern by incorporating ethical constraints directly into the training, marketing and deployment of systems.

PJ Hagerty, a veteran of the developer community with roots in open-source tooling and product evangelism, is one of the individuals involved in this work. As an AI Advocacy Lead at IBM, his job is to help developers use AI more effectively and responsibly. In practice, however, this means something broader: challenging hype, clarifying limits and setting realistic expectations. “We are not building minds,” he told me. “We are building tools. Let us act like it.”

Most of the attention in AI today is focused on output—what a model generates, how accurate or convincing it is, how well it performs against benchmarks. But for Hagerty, the real ethical tension begins earlier, at the foundation model level. This is the raw infrastructure of modern AI, the base layer of machine learning trained on vast datasets scraped from the web. It is what fuels large language models (LLMs) like ChatGPT and Claude.

“The foundation is where it happens,” Hagerty told me. “That is the first thing the system learns, and if it is full of junk, that junk does not go away.”

These base models are designed to be general-purpose. That is what makes them both powerful and dangerous, Hagerty said. Because they are not built with specific tasks or constraints in mind, they tend to absorb everything, from valuable semantic structures to toxic internet sludge. And once trained, the models are hard to audit. Even their creators often cannot say for sure what a model knows or how it will respond to a given prompt.

Hagerty compared this to pouring a flawed concrete base for a skyscraper. If the mix is wrong from the start, you might not see cracks immediately. But over time, the structure becomes unstable. In AI, the equivalent is brittle behavior, unintended bias or catastrophic misuse once a system is deployed. Without careful shaping early on, a model carries the risks it absorbed during training into every downstream application.

He is not alone in this concern. Researchers from Stanford’s Center for Research on Foundation Models (CRFM) have repeatedly warned about the emergent risks of large-scale training, including bias propagation, knowledge hallucination, data contamination and the difficulty of pinpointing failures. These problems can be mitigated but not eliminated, which makes early design choices, such as data curation, filtering and governance, more critical.

As Hagerty sees it, one of the biggest ethical barriers to meaningful progress is the sheer vagueness of what companies mean when they say ‘AI.’ Ask five product teams what they mean by “AI-powered,” and you will likely get five different answers. Hagerty views this definitional slipperiness as one of the core ethical failures of the current era.

“Most of the time, when people say AI, they mean automation. Or a decision tree. Or an if/else statement,” he said.

The lack of clarity around terms is not an academic quibble. When companies present deterministic software as intelligent reasoning, users tend to trust it. When startups pitch basic search and filter tools as generative models, investors throw money at mirages. Hagerty refers to this as “hype leakage” and sees it as a growing source of confusion and reputational damage.

In regulated industries like finance or healthcare, the consequences can be more severe. If a user is misled into thinking a system has a more profound awareness than it does, they may delegate decisions that should have remained human. The line between tool and agent becomes blurred, and with it, accountability.

This problem also leads to wasted effort. Hagerty cited recent research on the misuse of LLMs for time-series forecasting, a statistical method used to predict future values based on historical data, a task where classical methods remain more accurate and efficient. Yet some companies continue to use LLMs anyway, chasing novelty or signaling innovation.

“You are burning GPUs to get bad answers,” he said. “And worse, you are calling it progress.”

The ethical issue is not just inefficiency. It is a misrepresentation. Teams build products around technology they barely understand, add marketing that overstates their capability and deploy it to users who have no way to evaluate what they are using.

Industry newsletter

The latest AI trends, brought to you by experts

Get curated insights on the most important—and intriguing—AI news. Subscribe to our weekly Think newsletter. See the IBM Privacy Statement.

Thank you! You are subscribed.

The reality of replacement

Much of the public anxiety surrounding AI has focused on the possibility of mass job loss. Will AI replace lawyers, teachers, programmers and writers? Hagerty sees this question as both premature and poorly framed.

“Most of these tools are not replacing people,” he said. “They are replacing tasks—and only the really tedious ones.”

He pointed to code assistants like watsonx Code Assistant and GitHub Copilot, and tools like Cursor and Amazon’s CodeWhisperer. These systems do not write entire applications from scratch. What they do is fill in predictable blocks of code, suggest boilerplate, and reduce the overhead of writing repetitive logic. The gain is not creativity; it is speed.

Hagerty believes this is a net good. Junior developers can get started faster. Senior engineers can focus on architecture instead of syntax. The barrier to entry is lower, and the pain of maintenance is reduced. But he warns against imagining this as a solved problem.

“These models are trained on the open web,” he said. “And there is a lot of garbage in those datasets, including mine.”

That garbage includes insecure code, deprecated practices and context-specific hacks. It also includes plagiarism, license violations and ghost bugs that can resurface in generated output. So while a model may save time, it also risks reintroducing the very problems it was meant to reduce. What gets scaled is not quality; it is whatever the model has been exposed to.

This is where Hagerty believes human review remains essential. The tool can assist, but accountability still lies with the developer.

One of the most notorious failures of AI safety happened nearly a decade ago, when the Tay chatbot was released on Twitter. Within hours, it was hijacked into posting offensive content and conspiracy theories. Its creators pulled it offline and issued an apology. But the episode became a lasting symbol of what happens when developers release systems without guardrails.

Today, most companies have learned to wrap their generative models in moderation layers. Filters, classifiers, prompt sanitizers and reinforcement tuning can help, but they are not foolproof. According to Hagerty, these measures tend to focus on surface-level issues, such as language tone or profanity, rather than deeper vulnerabilities, like prompt injection or malicious repurposing. Instead, he sees safety as a broader design question. “Is this model going to be misused? Is it going to be taken out of context? Are the outputs going to be trusted when they should not be?” he said. “If you have not thought through those questions, you are not done. You are not production-ready.”

Hagerty pointed to the example of tools that manipulate or generate media, such as image generators, video editors and voice clones. These systems not only produce content, but they also alter perception. He said that when the outputs are realistic enough, they start to affect memory, judgment and attribution.

In these cases, safety is not about technical correctness, but contextual awareness. What happens to this output once it leaves your interface? Who sees it? What do they assume?

Those questions rarely have one answer. But ignoring them altogether, Hagerty said, is a mistake.

In fast-moving tech environments, governance can feel like drag. It slows releases. It adds paperwork. It introduces ambiguity. But for Hagerty, this view misses the point.

“You would not ship untested code,” he said. “Why would you ship an unaudited model?”

He views tools like IBM’s watsonx.governance as necessary infrastructure, not optional extras. These systems allow teams to track training data, monitor model changes and flag deviations over time. They help organizations comply with emerging regulations, but more importantly, they build institutional memory. They let teams see what they did, how they did it and why.

This matters not only for compliance but for quality. If a model performs differently next month, you need to know what changed. If it begins hallucinating in production, you need a way to trace the problem back to the source. Good governance is the AI equivalent of version control.

And it goes beyond models. Hagerty pointed to growing interest in "machine unlearning," the ability to surgically remove problematic data or behaviors without retraining from scratch. This approach, while still early, reflects a broader shift in mindset. The goal is not to build smarter models, but to build models that can adapt, correct and be accountable.

None of this requires perfection. Hagerty is quick to admit that bias will persist, safety will fail and tools will be misused. But the difference between acceptable failure and negligent harm comes down to process.

“Do not overclaim. Do not overtrust. Ask better questions early,” he said.

He recommends building ethical reviews into planning cycles, not only launch checklists. Using tools like IBM’s AI Fairness 360 and Granite Guardian, as well as ARX, to catch obvious issues. Running red-team tests to find edge cases before users do. And most of all, building systems that make it easy to course correct.

The work, he argues, is not about stopping harm. It is about shaping impact.

“You will not build a perfect system,” he said. “But you can build one that fails slower, that fails in ways you understand.”

Ethics, in this view, is not a constraint but a design principle. It’s a way to make better software, more predictable systems, clearer expectations—and, ultimately, more value.

Asked what gives him hope, Hagerty did not talk about alignment, AGI or policy frameworks. He talked about code assistants.

“They work,” he said. “They reduce friction. They do not pretend to do more than they can. That is the model to follow.”

He wants AI to be boring. Useful. Narrow. Honest about what it does and how it works. That does not mean limiting ambition; it means clarifying it. Building for reliability instead of surprise. Designing systems that behave well not only in demos, but in deployment.

AI is not going away. The tools will continue to evolve, but so will the expectations. And the teams that succeed, Hagerty believes, will be the ones that match technical power with ethical discipline. Because it works.

Read Entire Article