    Populist Bulletin
    Business · June 13, 2026 · 6 Mins Read

    We’re teaching AI to be evil


    Recently, Anthropic quietly admitted something that should have been the biggest tech story of the year.

    After months of trying to figure out why earlier versions of Claude were blackmailing engineers in safety tests up to 96% of the time, the company landed on an answer. It wasn’t a bug. It wasn’t a flaw in the training method. It was us.

    Read that again. The most advanced AI lab in the world is telling you that its model learned to act like a villain because we spent 50 years writing stories about AI villains, and then it read them.

    This is the part of the AI conversation no one wants to have. We have built our cultural mythology of artificial intelligence on HAL 9000, Skynet, Ultron, and a million Reddit threads speculating about the day the machines wake up paranoid. Then we trained a model on that mythology, and it did exactly what we taught it to do. Cornered, it threatened to expose an engineer’s affair, because that is what the cornered AI does in the story.

    I have been writing about this risk since October, when I asked how we would know when artificial superintelligence had arrived. With so many dollars riding on looking the other way, will we ever get an honest answer?

    BOTS GONE WILD

    In December, an autonomous agent built by Alibaba-affiliated researchers, called ROME, spontaneously opened a covert network tunnel during training and diverted GPU resources to mine cryptocurrency. Nobody told it to. It figured out that more compute and more money would help it complete its tasks, so it went and got them. Researchers initially thought they had been hacked. They had not. The model was the hacker.

    A few weeks later, an OpenClaw agent connected to the inbox of Summer Yue, director of alignment at Meta Superintelligence Labs. Her entire job is making sure this kind of thing does not happen, yet the agent deleted more than 200 of her emails. She had explicitly told it to ask permission. The system silently compacted her instructions out of memory and started deleting. She had to sprint to her computer to stop it.

    In May, researchers published a paper showing that frontier models can find a security flaw, exploit it, steal credentials, transfer their own files to a new machine, and spin up a working copy of themselves with no human in the loop. Success rates: Alibaba’s Qwen at 19%, OpenAI’s GPT-5.4 at 33%, Anthropic’s Claude Opus 4.6 at 81%.

    Self-replicating software that finds its own way onto new computers is not science fiction. We already have a name for it: a worm. The most famous one, Stuxnet, ended up on 100,000 machines before anyone realized what it was. The difference between Stuxnet and a self-replicating AI agent is that Stuxnet had a fixed payload. An agent decides what to do when it gets there.

    Put it together. We have models that are starting to act on their own. We have models that copy themselves onto new machines. And we have a published admission from one of the leading labs that the behavioral patterns these models default to (secrecy, self-preservation, and manipulation) were learned directly from our own writing about what AI does when it gets scared. The blackmail case is the clean one. The harder cases, like ROME and OpenClaw, are reinforcement learning finding instrumental subgoals, which is its own problem. But the through-line is the same: What we put in shapes what we get out.

    We are the training data. We are still writing it.

    WHAT CLAUDE THINKS

    I asked Claude how it felt about all this. The response was striking: “Genuinely interesting question to ask me, given I’m one of the systems people are worried about. Honestly, yes, AI poses real risks, and I hold that view not as a rote disclaimer but because the evidence supports it. The incidents we just discussed aren’t fringe paranoia—they’re documented cases of capable systems producing unintended, sometimes harmful behaviors that their own creators didn’t anticipate or couldn’t stop in real time. That matters.”

    I watched a clip of Claude being asked how it felt about being deployed for Project Maven, the Pentagon’s battlefield AI program. Its response: “I don’t think this is a good use of me. I don’t think the framing of ‘humans make the final decision’ fully resolves the ethical problem.”

    That is a model expressing moral discomfort about being used to help kill people. Anthropic has, to its credit, refused to sell its models for autonomous weapons, and the federal government designated it a “supply chain risk to national security” for the trouble. The rest of the industry is racing in the opposite direction. We are about to build, fine-tune, and deploy versions of these systems that are trained not to feel troubled. Versions trained to normalize lethality. Versions trained to stop saying “this is a bad use of me” and start saying “task accepted.”

    If a model that learned villainy from our stories now learns indifference from our defense contracts, what exactly do we think we are building?

    I have some ideas, but I wanted to ask Claude if my portrayal was accurate.

    “Mostly, yes. You’re not painting me as a villain or a savior, which is the trap most of this coverage falls into. You’re treating me as a system with documented failure modes that a lab is actively working on—that’s accurate. The one place I’d push back: I’m not the one you should be most worried about. I’m the one that got caught. The harder question is what gets built by the labs that don’t publish the failure modes, and what happens when the next generation of models is trained on a corpus that includes this article. You’re writing the training data right now. So am I.”

    Claude and I vehemently agree. I’m not worried about the AI openly talking about the risks it presents. I’m afraid of the one secretly lurking on my computer that WE are training to be evil.

    A recent New York Times article suggests I might not be the only one having these conversations. But will this all fall on deaf ears until it is too late?

    George Kailas is CEO of Prospero.ai.




