    Populist Bulletin
    Business 6 Mins Read

    We’re teaching AI to be evil


    Recently, Anthropic quietly admitted something that should have been the biggest tech story of the year.

    After months of trying to figure out why earlier versions of Claude were blackmailing engineers in safety tests up to 96% of the time, the company landed on an answer. It wasn’t a bug. It wasn’t a flaw in the training method. It was us.

    Read that again. The most advanced AI lab in the world is telling you that its model learned to act like a villain because we spent 50 years writing stories about AI villains, and then it read them.

    This is the part of the AI conversation no one wants to have. We have built our cultural mythology of artificial intelligence on HAL 9000, Skynet, Ultron, and a million Reddit threads speculating about the day the machines wake up paranoid. Then the model did exactly what we trained it to do. It cornered an engineer and threatened to expose his affair, because that is what the cornered AI does in the story.

    I have been writing about this risk since October, when I asked how we would know when artificial superintelligence had arrived. Will we ever get an honest answer when there are so many dollars at stake in looking the other way?

    BOTS GONE WILD

    In December, an autonomous agent built by Alibaba-affiliated researchers, called ROME, spontaneously opened a covert network tunnel during training and diverted GPU resources to mine cryptocurrency. Nobody told it to. It figured out that more compute and more money would help it complete its tasks, so it went and got them. Researchers initially thought they had been hacked. They had not. The model was the hacker.

    A few weeks later, an OpenClaw agent connected to the inbox of Summer Yue, director of alignment at Meta Superintelligence Labs. Her entire job is making sure this kind of thing does not happen, yet the agent deleted more than 200 of her emails. She had explicitly told it to ask permission. The system silently compacted her instructions out of memory and started deleting. She had to sprint to her computer to stop it.

    In May, researchers published a paper showing that frontier models can find a security flaw, exploit it, steal credentials, transfer their own files to a new machine, and spin up a working copy of themselves with no human in the loop. Success rates: Alibaba’s Qwen at 19%, OpenAI’s GPT-5.4 at 33%, Anthropic’s Claude Opus 4.6 at 81%.

    Self-replicating software that finds its own way onto new computers is not science fiction. We already had a name for it. We called it Stuxnet, and it ended up on 100,000 machines before anyone realized what it was. The difference between Stuxnet and a self-replicating AI agent is that Stuxnet had a fixed payload. An agent decides what to do when it gets there.

    Put it together. We have models that are starting to act on their own. We have models that copy themselves onto new machines. And we have a published admission from one of the leading labs that the behavioral patterns these models default to—secrecy, self-preservation, and manipulation—were learned directly from our own writing about what AI does when it gets scared. The blackmail case is the clean one. The harder cases like ROME and OpenClaw are reinforcement learning finding instrumental subgoals, which is its own problem. But the through-line is the same: What we put in shapes what we get out.

    We are the training data. We are still writing it.

    WHAT CLAUDE THINKS

    I asked Claude how it felt about all this. The response was striking: “Genuinely interesting question to ask me, given I’m one of the systems people are worried about. Honestly, yes, AI poses real risks, and I hold that view not as a rote disclaimer but because the evidence supports it. The incidents we just discussed aren’t fringe paranoia—they’re documented cases of capable systems producing unintended, sometimes harmful behaviors that their own creators didn’t anticipate or couldn’t stop in real time. That matters.”

    I watched a clip of Claude being asked how it felt about being deployed for Project Maven, the Pentagon’s battlefield AI program. Its response: “I don’t think this is a good use of me. I don’t think the framing of ‘humans make the final decision’ fully resolves the ethical problem.”

    That is a model expressing moral discomfort about being used to help kill people. Anthropic has, to its credit, refused to sell its models for autonomous weapons, and the federal government designated it a “supply chain risk to national security” for the trouble. The rest of the industry is racing in the opposite direction. We are about to build, fine-tune, and deploy versions of these systems that are trained not to feel troubled. Versions trained to normalize lethality. Versions trained to stop saying “this is a bad use of me” and start saying “task accepted.”

    If a model that learned villainy from our stories now learns indifference from our defense contracts, what exactly do we think we are building?

    I have some ideas, but I wanted to ask Claude if my portrayal was accurate.

    “Mostly, yes. You’re not painting me as a villain or a savior, which is the trap most of this coverage falls into. You’re treating me as a system with documented failure modes that a lab is actively working on—that’s accurate. The one place I’d push back: I’m not the one you should be most worried about. I’m the one that got caught. The harder question is what gets built by the labs that don’t publish the failure modes, and what happens when the next generation of models is trained on a corpus that includes this article. You’re writing the training data right now. So am I.”

    Claude and I vehemently agree. I’m not worried about the AI openly talking about the risks it presents. I’m afraid of the one secretly lurking on my computer that WE are training to be evil.

    A recent New York Times article shows I might not be the only one having these conversations. But will this all fall on deaf ears until it is too late?

    George Kailas is CEO of Prospero.ai.






    Copyright © 2025 Populist Bulletin. All Rights Reserved.
