
    AI scraping has become its own media business


    There are several dimensions to the ongoing legal war between the media industry and AI companies over copyright, and one of the major ones is the question of outputs. Which is to say: Scraping content without permission may be detestable, but if the party doing the scraping isn’t doing anything with it that would compete with the content creator, it’s difficult to prove harm. And many legal proceedings, especially civil claims, depend on showing the actions were harmful.

    One of the earlier rulings in this area exemplifies the point. A group of authors, including comedienne Sarah Silverman, sued OpenAI way back in 2023 for appropriating their books without compensation. A judge later dismissed several of the authors’ claims because the lawsuit didn’t identify specific outputs that were direct copies. It turns out just pointing out that a large language model (LLM) was trained on your material isn’t enough—you have to show it’s creating outputs that take business away from you.

    The output problem

    Copyright lawsuits like the Silverman case often depend on showing specific instances of scraping and reproduction. The problem is, much of this activity is in the realm of bots: scraping done quickly, silently, and at scale. And while the outputs of big, public-facing AI services like ChatGPT, Gemini, and Perplexity are there for everyone to see, there’s a whole shadow industry of mass AI scraping that isn’t.

    Subscribe to The Media Copilot: Want more about how AI is changing media? Never miss an update from Pete Pachal by signing up for The Media Copilot. To learn more, visit https://mediacopilot.substack.com/

    It’s been an open secret that AI companies sometimes obtain data from third-party brokers, and media industry analyst Matthew Scott Goldstein recently published an extensive report on them. The conclusions, as reported in Digiday, are eye-opening: At least 21 companies, several funded to the tune of hundreds of millions of dollars, routinely scrape publisher content without paying for it, and sell their “data services” to customers that include OpenAI, Amazon, and even other publishers like The Telegraph.

    The report shows what “outputs” are when scraping is allowed at scale: multimillion-dollar companies built around parsing internet data for bots and agents, indexing that content, and selling it. These aren’t famous companies; they have names like Parallel AI, Exa, and Bright Data. Goldstein points out that they aren’t shy about what they’re doing: While a recent Wall Street Journal profile describes Parallel AI as a platform “dedicated to servicing AI agents,” he characterizes it as a “scraper company with better branding.”

    As the saying goes, show me the incentives, and I’ll show you the outcome. Given the setbacks in copyright cases before the courts, not to mention the current administration’s dismissal of copyright concerns, the message is clear: There are few to no consequences for unauthorized scraping, and the legal and technical mechanisms governing it generally default to greater access for AI systems.

    Block the bots, or build for them?

    This reality creates an existential dilemma for media companies. Do you aggressively block bots from accessing your content, or do you let them in? The latter means essentially conceding the fight (or at least letting others fight it for you), but it also gets you out of the game of whack-a-mole with AI scrapers. More importantly, it frees you up to build a business around the idea that AI ingests and repurposes your content.

    I actually don’t believe these two perspectives are as contradictory as they may seem. Yes, copyright holders should assert their intellectual property rights, but they also need to contend with a future where AI engines are an essential part of content strategy. AI is a distribution channel, an intermediary, and an audience, all at the same time.

    What does a considered approach to the scraping ecosystem look like? I see five components, not all of which will be available to every media company:

    1. Get better at blocking bots: Protecting your IP requires both technical and legal components. Most major publishers are blocking bots, at least on paper, though being aggressive about it means going beyond adjustments to the robots exclusion protocol (the robots.txt instructions a site publishes for bots that crawl it, which are often ignored). For instance, People Inc. CEO Neil Vogel has said his company has needed to become highly sophisticated at blocking unauthorized bots.
      Most publishers don’t have the same resources. However, there are technical partners that can help, and infrastructure companies like Cloudflare have moved toward copyright-protecting defaults. Even if sophisticated blocking tech isn’t an option, you can still gather intel. Don’t just look at the bot traffic to your site; you should regularly audit AI systems to find where your content has been appropriated and misused.
    2. Practice good GEO: It might seem counterintuitive, but whether or not your site is being scraped, you should make your content as friendly to AI scrapers as possible. The question of access is a binary: either they should be scraping or not. The problem with ignoring generative engine optimization (GEO) is that content that is hard for bots to interpret is hard for authorized and unauthorized bots alike.
      There are several advantages to practicing good GEO. For starters, there’s the reality that scraping is happening, so you should compete in summaries, even if you don’t like being there without getting compensated. You may as well get the visibility and the (small) qualified traffic that results. Also, it creates a paper trail for your proactive auditing, and potentially helps prove your value in any legal proceedings. Finally, it will be essential if you build an in-house agent or MCP server for your content.
    3. Shift your business model: I’ve written about this extensively, but the reality is the media model of the Google era is rapidly diminishing. That means any business that’s primarily based on monetizing anonymous traffic is shrinking. New revenue streams need to be nurtured, including events, subscriptions, data and more. I know—easier said than done, but diversifying revenue needs to become religion among ad-dependent publishers.
    4. Sue: This is not an option for everyone, obviously. Very few media companies have the resources to take on an OpenAI or a Perplexity in court. But the report on the shadow market of industrial-scale scraping opens up a group of companies that have been largely invisible up until now. Given what they’re openly doing, how much money is involved, and the stakes for publishers, it would be surprising if more legal action didn’t result.
    5. Lobby for regulation: While regulation at the federal level seems unlikely in the current environment, many states are attempting to regulate AI, including through training-data transparency and disclosure rules. And it may not even require a wholesale updating of copyright law. The mere requirement for bots to properly identify themselves would ensure some bots couldn’t effectively impersonate humans, allowing for much more robust governance mechanisms.
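
    For publishers without a dedicated bot-management team, the "gather intel" part of the first component can start small. The sketch below is one possible starting point, not a complete defense: it uses Python's standard urllib.robotparser to check which AI crawlers a robots.txt file tells to stay out, and it tallies how often those crawlers show up in access-log lines. The list of user-agent tokens, the function names, and the sample log format are all assumptions made for illustration; check each vendor's documentation for its current crawler names. Keep in mind that robots.txt is advisory and user-agent strings can be spoofed, so a clean tally does not prove that no scraping is happening.

```python
from collections import Counter
from urllib.robotparser import RobotFileParser

# Assumed user-agent tokens for a few AI crawlers (illustrative, not exhaustive).
# Verify these against each vendor's published documentation.
AI_BOT_TOKENS = ["GPTBot", "ClaudeBot", "PerplexityBot", "CCBot"]


def blocked_bots(robots_txt: str, path: str = "/") -> dict[str, bool]:
    """Return {bot: True} if robots.txt asks that bot not to fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: not parser.can_fetch(bot, path) for bot in AI_BOT_TOKENS}


def tally_ai_hits(log_lines) -> Counter:
    """Count log lines whose text contains a known AI crawler token.

    This is a plain substring match on the whole line, so it only catches
    bots that identify themselves honestly in the user-agent field.
    """
    hits = Counter()
    for line in log_lines:
        lowered = line.lower()
        for bot in AI_BOT_TOKENS:
            if bot.lower() in lowered:
                hits[bot] += 1
    return hits


if __name__ == "__main__":
    # Hypothetical robots.txt: block GPTBot everywhere, allow everyone else.
    sample_robots = "User-agent: GPTBot\nDisallow: /\n\nUser-agent: *\nAllow: /\n"
    # Hypothetical access-log lines in a common combined-log style.
    sample_log = [
        '1.2.3.4 - - "GET /story HTTP/1.1" 200 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
        '5.6.7.8 - - "GET /story HTTP/1.1" 200 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
        '9.9.9.9 - - "GET /story HTTP/1.1" 200 "-" "Mozilla/5.0 (Windows NT 10.0)"',
    ]
    print(blocked_bots(sample_robots))
    print(tally_ai_hits(sample_log))
```

    Comparing the two results is the useful part: a bot that robots.txt blocks but that still appears in the logs is ignoring the instructions, which is the kind of record that supports both stronger technical blocking and any later legal claim.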

    Reasserting agency

    As AI bots continue to “eat the internet,” publishers may feel a sense of helplessness, as if scraping is just another brutal inevitability to be endured. There’s some truth to that. But inevitability shouldn’t become an excuse for paralysis. In a world increasingly dominated by agents, publishers need to reassert their own agency: protecting what they can, adapting where they must, and refusing to let the future of their work be decided entirely by the same companies that scraped it.
