    Anthropic’s Claude Fable 5 plays it too safe on safety, developers say

    Business 3 Mins Read

    Anthropic on Tuesday launched Claude Fable 5, its most capable public model. But within two days, users began reporting that its safety system was blocking benign or legitimate prompts.

    Fable 5 is the first public model derived from Anthropic’s Mythos family, whose original iteration showed unusual skill during training at finding software bugs and exploiting them to disrupt or take control of systems. That raised enough concern inside Anthropic that the company grouped cybersecurity with other high-risk domains, including biology and chemistry, when setting limits on Mythos-derived public models.

    For Fable 5, that means prompts flagged as sensitive in those areas are routed to Claude Opus 4.8, a less capable model with its own guardrails. Anthropic says the fallback affects about 0.05% of queries and notifies users when it happens.

    But reports of false positives quickly mounted. That’s because Anthropic erred on the side of caution when it designed the classifiers used to detect and downgrade potentially dangerous uses of its model. The company also struggled to balance accuracy with transparency.

    Try telling that to developers. Across social media, people have complained about Claude Fable 5 rejecting queries about everything from RNA sequencing data for sheep to résumé editing to shopping lists.

    “The word ‘cancer’ is flagged as a biosecurity risk by Claude Fable 5!” said scientist Derya Unutmaz on X. “Our Anthropic overlords deciding which prompts the peasants are allowed to use,” added founder and developer Bojan Tunguz on X.

    Anthropic now says it’s working on the problem. “A hidden safeguard is harder to probe and work around,” Anthropic says in a statement emailed to Fast Company. “This means the safeguards can be targeted much more narrowly. A visible safeguard needs to cast a wider net to be more robust, resulting in more requests being incorrectly flagged.”

    “We made the wrong tradeoff and we apologize for not getting the balance right,” the company adds. 

    The company says it is refining the classifiers so that fewer queries trigger false positives. For Claude subscribers, query downgrades to Opus 4.8 will be more obvious. Developers accessing Fable 5 via the Claude API will see a reason when the model refuses a prompt, the company says.

    Meanwhile, at least one AI researcher appears to have coerced Fable 5 into responding to a banned prompt. Pliny the Liberator claimed on X to have bypassed Fable 5’s filters roughly 24 to 48 hours after launch. Pliny described using a multi-agent approach involving a previously jailbroken Claude Opus 4.8, along with techniques including query decomposition, long-context framing, fiction and narrative structures, and academic taxonomies.

    Before launch, Anthropic said more than 1,000 hours of internal and external red-teaming, including bug bounty efforts, had identified no universal jailbreaks. The company has acknowledged that preventing all sophisticated, multi-turn, or agentic attacks is likely not possible and says it continues to refine its classifiers.



    June 11, 2026

    Copyright © 2025 Populist Bulletin. All Rights Reserved.
