The AI Content Crisis: How LLMs Are Draining Media Revenue – and the Technologies Fighting Back



The internet's economic engine is breaking down. For two decades, a simple contract held the web together: search engines crawl your content, index it, and send humans back to your site where you earn revenue through advertising and subscriptions. Large Language Models have shattered that loop entirely – and with it, the financial foundation of journalism, publishing, and digital media.

This is not a hypothetical future threat. It is happening right now, at scale, with measurable consequences. And the solutions emerging to address it are reshaping how we think about content ownership, AI governance, and the economics of the open web.

If you're unfamiliar with how AI search engines actually process and retrieve content, my technical guide to RAG architecture explains the retrieval mechanics that make this entire crisis possible.

The Broken Value Exchange: What the Data Shows

To understand the depth of this crisis, you need to look at one metric: the scrape-to-referral ratio. Traditional search engines like Google operate at roughly a 10:1 ratio – they crawl about 10 pages for every human visitor they send back to a publisher's site. That is the value exchange that funded digital media for a generation.

AI companies have obliterated that ratio.

According to TollBit's Q1 2025 State of the Bots report, OpenAI's scrape-to-referral ratio sits at 179:1. Perplexity's is 369:1. Anthropic's ratio is a staggering 8,692:1. That means for every 8,692 times Anthropic's crawlers scrape a publisher's content, they send back exactly one human visitor. The CEO of Conductor quantified it even more bluntly: ChatGPT crawls websites roughly 60,000 times and sends back just one visitor.

One publisher, Digital Trends, documented 4.1 million bot scrapes in a single week that generated only 4,200 human referrals – a 966:1 extraction ratio. The content is being consumed at industrial scale. The compensation is essentially zero.

Stanford Graduate School of Business research confirms the pattern from the user side. Click-through rates from AI chatbots land at just 0.33%, and AI search engines at 0.74%, compared to 8.6% for traditional Google Search. When Google's AI Overviews appear in results, only 8% of users click any link compared to 15% without AI summaries – a 46.7% drop in click-through rates. Zero-click searches surged from 56% to 69% between 2024 and 2025.

The traffic losses are massive and accelerating. Organic traffic to U.S. websites declined from 2.3 billion visits to under 1.7 billion over that same 2024–2025 window. The IAB Tech Lab's consultation data shows publishers receiving 20–60% less traffic from search as bots have overtaken human-based web traffic. Traffic to the world's 500 most-visited publishers dropped 27% year-over-year – an average of 64 million fewer visits every month.

I covered the broader data around AI referral traffic growth and quality in my analysis of which AI engines B2B companies should actually optimize for, which includes data showing AI-referred visitors convert at significantly higher rates despite lower volume.

The Bot Tsunami: AI Scraping at Scale

The volume of AI bot traffic hitting publisher sites is growing exponentially. From May 2024 to May 2025, overall crawler traffic rose 18%, with GPTBot (OpenAI's crawler) growing 305% and Googlebot growing 96%. Arc XP, the Washington Post's publishing platform that powers over 1,000 media websites, observed a 300% year-over-year jump in AI-driven bot traffic. Media and publishing sites are 7x more likely to see AI bot traffic than the average website.

TollBit detected 436 million AI bot scrapes in Q1 2025, up 46% from Q4 2024. By Q4 2025, there was one AI bot visit for every 31 human visits – up from 1:50 earlier that year. Bots now make up approximately 80% of all web visits, with AI bots specifically representing about 13% of total traffic.

The critical distinction that publishers must understand is the difference between two types of scraping. Training scrapes are "one-and-done" – they feed a model's general knowledge. RAG (Retrieval Augmented Generation) scrapes are continuous and growing far faster, having risen 49% in Q1 2025 alone – nearly 2.5x the rate of training scrapes. RAG scrapes power real-time responses in AI chatbots and search engines, meaning AI companies need fresh publisher content every single day to remain competitive.

This distinction matters enormously for monetization strategy. Most publisher content has already been ingested for training purposes – that horse has bolted. But RAG creates an ongoing dependency that gives publishers actual leverage.

The Revenue Collapse: Ads, Subscriptions, and the Attention Economy

Fewer pageviews mean fewer ad impressions, lower CPM revenue, reduced subscription conversions, and lost affiliate income. For ad-supported outlets, the math is brutal: a 27% traffic decline translates almost directly to a 27% advertising revenue decline. For subscription-driven publishers, fewer site visits mean fewer opportunities to convert casual readers into paying customers.

The damage extends beyond direct revenue. Publishers have invested years building first-party data assets – audience login strategies, loyalty programs, newsletter portfolios, identity frameworks, and e-commerce data capture. These assets depend on human visitors actually arriving at the site. When AI intermediates the content consumption experience entirely, publishers lose not just the immediate ad impression but the ability to build the audience relationships that drive long-term monetization.

As one senior publishing executive at The Economist put it, organic traffic has declined sharply as generative AI tools increasingly intercept the path to publisher content. While consumer adoption of AI search has risen, there has been zero value exchange for content creators. The promised upside – traffic from LLMs and AI-powered search – has not materialized.

The Solution Landscape: Seven Approaches Reshaping the Economics

1. Cloudflare: Infrastructure-Level Control and Pay-Per-Crawl

Cloudflare manages approximately 20% of global web traffic, making it the single most impactful player in this fight. On July 1, 2025 – what the company called "Content Independence Day" – Cloudflare became the first major internet infrastructure provider to block AI crawlers by default for all new users. This was a seismic shift from the opt-out world (where bots had free access unless explicitly blocked) to an opt-in model.

Cloudflare's approach operates at three levels:

AI Crawl Control (formerly AI Audit) provides granular visibility into which AI services are crawling a site, what content they access, how frequently, and whether they comply with robots.txt directives. This is available on all Cloudflare plans at no extra cost, giving even small publishers the same visibility that only the largest media companies previously had through direct relationships with AI providers.

Pay-Per-Crawl gives publishers three options per crawler: Allow (free access), Charge (pay per request at a domain-wide price), or Block (deny entirely). The system uses cryptographic HTTP Message Signatures to prevent crawler spoofing – solving the critical technical challenge of ensuring that only the authentic crawler is charged, not an impersonator. Cloudflare aggregates billing events, charges the crawler, and distributes earnings to the publisher.

Default Blocking ensures that AI crawlers cannot access Cloudflare-protected sites unless publishers explicitly opt them in. Even if a crawler lacks a billing relationship with Cloudflare, a publisher choosing to "charge" them effectively blocks access while signaling that a commercial relationship could exist in the future.

This is significant because, as Cloudflare CEO Matthew Prince stated, the internet cannot survive the age of AI without giving publishers the control they deserve and building a new economic model that works for everyone. Cloudflare is building that economic infrastructure at the network layer, which is the only layer where enforcement actually works – unlike robots.txt, which AI bots routinely ignore.
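Here is a minimal origin-side sketch of the pay-per-crawl idea, assuming a deliberately simplified flow: requests from known AI crawlers that present no proof of a billing relationship receive HTTP 402 Payment Required with an advertised price, while everyone else is served normally. The X-Crawl-Token and X-Crawl-Price-USD headers, the price, the token check, and the crawler markers are all invented for illustration; Cloudflare's actual implementation relies on cryptographically signed requests enforced at its network edge, not on an origin checking a header.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

AI_CRAWLER_MARKERS = ("GPTBot", "ClaudeBot", "PerplexityBot")  # illustrative list
PRICE_PER_REQUEST_USD = "0.01"            # hypothetical domain-wide price
VALID_TOKENS = {"demo-prepaid-token"}     # stand-in for a real billing relationship check

class PayPerCrawlHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        ua = self.headers.get("User-Agent", "")
        is_ai_crawler = any(marker in ua for marker in AI_CRAWLER_MARKERS)
        token = self.headers.get("X-Crawl-Token")  # hypothetical header name

        if is_ai_crawler and token not in VALID_TOKENS:
            # Unpaid crawler: signal that access is chargeable rather than free.
            self.send_response(402)
            self.send_header("X-Crawl-Price-USD", PRICE_PER_REQUEST_USD)  # hypothetical header
            self.send_header("Content-Length", "0")
            self.end_headers()
            return

        body = b"<html><body>Article body for humans and paying crawlers.</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8080), PayPerCrawlHandler).serve_forever()
```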

2. TollBit: The Two-Sided Content Licensing Marketplace

TollBit operates as a marketplace where AI companies pay publishers to access content. Founded by Toast alumni Olivia Joslin and Toshit Panigrahi, the company has grown to serve nearly 7,000 publisher sites. Nearly 20% of those publishers have generated revenue through TollBit's AI bot paywall, ranging from hundreds to tens of thousands of dollars per month.

TollBit's strategic insight is that publishers need to think of AI bots as a new category of "visitor" – one that will far exceed human visitors in volume and requires its own monetization infrastructure. The platform provides analytics showing which bots access content, scraping frequency, specific pages targeted, and human referral ratios. Publishers pay nothing until they activate monetization – TollBit takes a transaction fee only from AI companies paying for access.
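Even before joining a marketplace, a publisher can approximate this kind of bot analytics from ordinary server logs. The sketch below is a rough, self-hosted stand-in, not TollBit's product: it tallies requests per known AI crawler from a combined-format access log. The user-agent substrings and log path are assumptions, and real classification should also verify published IP ranges, since user agents are trivially spoofed.

```python
import re
from collections import Counter
from pathlib import Path

# Substrings commonly seen in AI crawler user agents (illustrative, not exhaustive).
AI_CRAWLERS = ["GPTBot", "ClaudeBot", "PerplexityBot", "CCBot", "Bytespider", "Google-Extended"]

# Combined log format: ip - - [time] "METHOD /path HTTP/1.1" status size "referer" "user-agent"
LOG_LINE = re.compile(r'"(?:GET|POST|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"')

def crawl_report(log_path: str) -> Counter:
    """Count requests per AI crawler so you can see who is scraping and how often."""
    hits = Counter()
    for line in Path(log_path).read_text(errors="replace").splitlines():
        match = LOG_LINE.search(line)
        if not match:
            continue
        ua = match.group("ua")
        for bot in AI_CRAWLERS:
            if bot in ua:
                hits[bot] += 1
                break
    return hits

if __name__ == "__main__":
    # Hypothetical log location; adjust for your server.
    for bot, count in crawl_report("/var/log/nginx/access.log").most_common():
        print(f"{bot:>16}: {count} requests")
```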

A major development in April 2026: Arc XP integrated TollBit to help mid-size publishers who lack the negotiating power to cut direct licensing deals with AI companies. The Philadelphia Inquirer, for example, plans to adopt TollBit through this integration to charge AI bots for content access.

TollBit also enables publishers to create separate content versions specifically for AI agents – controlling exactly what machines can access without impacting the human browsing experience. This "dual-layer" content strategy (human-optimized site plus machine-optimized feed) is becoming a key architectural pattern.

3. ProRata: Attribution-Based Revenue Sharing

ProRata takes a fundamentally different approach. Instead of charging AI companies for scraping access, ProRata shares ad revenue generated from AI-powered answers that cite publisher content. Through its Gist.ai product, ProRata offers a 50/50 revenue split backed by attribution algorithms that track how much each article contributed to an AI-generated response.

Over 500 publishers have joined ProRata, including The Atlantic, TIME, Fortune, ADWEEK, BuzzFeed, and Lee Enterprises. The model avoids TollBit's chicken-and-egg problem (needing both publishers and AI companies on the platform) because revenue comes from ads served alongside AI answers on the publisher's own site, not from AI companies agreeing to pay licensing fees.

For publishers wanting faster time-to-revenue, ProRata's model is compelling. You implement on-site AI search, your audience uses it, ads generate revenue, and you keep 50%. The dependency is on your own audience behavior, not on industry-wide marketplace maturation.
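To make the mechanics concrete, here is a toy sketch of an attribution-based split. It assumes each AI answer carries the ad revenue it generated plus per-source attribution weights (ProRata's actual attribution algorithm is proprietary; the weights and domains below are placeholders), then distributes half of each answer's revenue to the cited publishers in proportion to those weights.

```python
from collections import defaultdict

# Each AI-generated answer: the ad revenue it produced and the attribution
# weight assigned to each cited publisher (all figures are illustrative).
answers = [
    {"ad_revenue": 0.40, "attribution": {"publisher-a.com": 0.7, "publisher-b.com": 0.3}},
    {"ad_revenue": 0.25, "attribution": {"publisher-c.com": 1.0}},
    {"ad_revenue": 0.60, "attribution": {"publisher-b.com": 0.5, "publisher-c.com": 0.5}},
]

PUBLISHER_SHARE = 0.5  # the advertised 50/50 split

def payouts(answers):
    """Distribute half of each answer's ad revenue across cited publishers by attribution weight."""
    owed = defaultdict(float)
    for answer in answers:
        pool = answer["ad_revenue"] * PUBLISHER_SHARE
        total_weight = sum(answer["attribution"].values())
        for publisher, weight in answer["attribution"].items():
            owed[publisher] += pool * weight / total_weight
    return dict(owed)

if __name__ == "__main__":
    for publisher, amount in sorted(payouts(answers).items()):
        print(f"{publisher:>16}: ${amount:.3f}")
```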

4. Perplexity's Comet Plus: Subscription Revenue Sharing

Perplexity launched its Comet Plus program in 2025, bundled with its Comet browser. The company shares subscription revenue with publishers – participants include TIME, Fortune, Los Angeles Times, Adweek, and Blavity. Publishers get paid when their articles appear in Comet browser results, when they drive traffic through the browser, and when AI agents use their content.

This model ties compensation directly to content usage in the AI experience, but the pools remain small compared to traditional search revenue. Scaling depends on converting free users to paid subscribers of the Comet browser – a significant distribution challenge.

5. Direct Licensing Deals: The Big Publisher Strategy

The largest publishers have pursued high-value direct licensing agreements. News Corp secured a multi-year deal with OpenAI reportedly worth hundreds of millions. The Washington Post, New York Times, Condé Nast, and Hearst have signed agreements with various AI companies. Meta confirmed multi-year licensing deals with seven major publishers including CNN, People Inc., and USA Today Co. Amazon entered the licensing market through agreements with The New York Times, Condé Nast, and Hearst.

The trend is shifting from lump-sum training deals toward pay-per-usage models as publishers recognize how dependent LLMs are on fresh, real-time content for RAG. News Corp's strategy of pursuing deals with multiple LLMs simultaneously reflects the understanding that exclusivity would limit revenue streams – and that AI companies themselves need access to premium journalism to differentiate their models in an increasingly crowded market.

However, this path is only available to the largest media conglomerates. As IAB Tech Lab and multiple industry observers have noted, smaller publishers lack the resources and leverage to negotiate equitable deals, making collective solutions essential.

6. IAB Tech Lab: The Industry Standard (LLM Content Ingest API)

The IAB Tech Lab is developing a standardized framework called the LLM Content Ingest API Initiative (now being reorganized under the Content Monetization Protocols or CoMP working group). The framework has four components:

  • Access Controls: Determining which AI systems can access publisher content
  • Access Terms: Licensing models and content tiers defining the commercial relationship
  • Content Logging: Tracking and reporting how ingested content is actually used
  • Tokenization: Uniquely identifying publisher content within LLM systems so publishers can see exactly how their scraped content appears in AI outputs

The tokenization component is particularly significant. It would allow publishers to trace their content through the AI pipeline – seeing when their reporting is used in a ChatGPT response or a Perplexity answer. Brands could similarly track what is being said about their products within AI systems.
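The concrete format is still being defined inside the working group, so any schema shown here is speculative. The sketch below only illustrates the shape of the idea: derive a stable token for each publisher content chunk (here, a hash over a publisher ID and the canonical text) and emit a usage record whenever that token surfaces in an AI output. The field names are assumptions, not the IAB Tech Lab specification.

```python
import hashlib
import json
from datetime import datetime, timezone

def content_token(publisher_id: str, canonical_text: str) -> str:
    """Derive a stable identifier for a chunk of publisher content (illustrative scheme)."""
    digest = hashlib.sha256(f"{publisher_id}\n{canonical_text}".encode("utf-8")).hexdigest()
    return f"{publisher_id}:{digest[:16]}"

def usage_record(token: str, model: str, surface: str) -> str:
    """A hypothetical content-logging event an LLM operator could report back to the publisher."""
    return json.dumps({
        "token": token,
        "model": model,
        "surface": surface,  # e.g. "chat_answer" or "ai_overview"
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })

if __name__ == "__main__":
    tok = content_token("example-news.com", "Opening paragraph of an investigative story...")
    print(tok)
    print(usage_record(tok, model="some-llm-v1", surface="chat_answer"))
```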

The challenge is adoption. This standard requires buy-in from AI companies, and the history of tech platforms voluntarily adopting frameworks that constrain their behavior is not encouraging. The weekly CoMP working group meetings signal momentum, but standardization in this space will likely require regulatory backing or sufficient market pressure.

7. Creative Commons: The Ethical Framework

In a significant pivot from its historic mission of free sharing, Creative Commons announced "cautious" support for pay-to-crawl technologies in late 2025. The organization recognized that the old social contract of "content in exchange for traffic" is broken, and that automated machine access to data must, in many cases, be financially compensated.

Creative Commons warned, however, that pay-to-crawl must not become the only mode of access. The risk is a two-tier web: data-rich for well-funded AI companies, data-poor for researchers, non-profits, and archivists. This tension between compensating creators and maintaining open access will define the policy debates of the next several years.

Emerging Techniques for the AI Age

Beyond these established players, several strategic approaches are developing that publishers and content-dependent businesses should understand:

Dual-Layer Content Architecture

The most forward-thinking publishers are building separate content layers for human and machine consumption. The human experience remains rich with multimedia, interactive elements, and conversion funnels. The machine layer provides structured, markdown-formatted content with clear metadata, entity definitions, and semantic markup – optimized for how LLMs actually ingest and process information.

TollBit enables this by letting publishers create agent-specific content versions. This approach ensures that AI systems get clean, structured data (which improves citation quality) while publishers maintain full control over the human experience and its monetization.
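A minimal sketch of that pattern, assuming requester classification by user-agent substring: return the full human-facing HTML for browsers and a stripped, metadata-rich markdown representation for AI agents. The crawler markers, article fields, and markdown layout below are illustrative, not TollBit's actual feed format.

```python
AI_AGENT_MARKERS = ("GPTBot", "ClaudeBot", "PerplexityBot")  # illustrative list

ARTICLE = {
    "title": "Example headline",
    "author": "Staff Writer",
    "published": "2026-04-16",
    "body_html": "<article><h1>Example headline</h1><p>Full interactive page...</p></article>",
    "body_text": "Plain, clean article text for machine consumption.",
}

def select_representation(user_agent: str, article: dict) -> tuple[str, str]:
    """Return (content_type, payload) depending on whether the requester looks like an AI agent."""
    if any(marker in user_agent for marker in AI_AGENT_MARKERS):
        # Machine layer: structured markdown with explicit metadata, no ads or interactive chrome.
        md = (
            f"# {article['title']}\n\n"
            f"- author: {article['author']}\n"
            f"- published: {article['published']}\n\n"
            f"{article['body_text']}\n"
        )
        return "text/markdown", md
    # Human layer: the full HTML experience with monetization intact.
    return "text/html", article["body_html"]

if __name__ == "__main__":
    print(select_representation("Mozilla/5.0 (compatible; GPTBot/1.0)", ARTICLE)[0])  # text/markdown
    print(select_representation("Mozilla/5.0 (Windows NT 10.0)", ARTICLE)[0])         # text/html
```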

Generative Engine Optimization (GEO)

As AI systems become primary information intermediaries, the publishers and brands that appear in AI-generated answers will capture disproportionate visibility and trust. GEO – optimizing content for AI citation and recommendation – is becoming as critical as SEO was for the search era.

This involves schema markup (the structured metadata that LLMs actively scrape), entity saturation (becoming the authoritative source for specific topics), semantic clarity (writing content that survives being chunked, embedded, and semantically retrieved by RAG systems), and FAQ structures that directly answer the queries users pose to AI systems.
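Schema markup is the most mechanical of these tactics to implement. As a sketch (the question and answer are placeholders), the snippet below builds schema.org FAQPage JSON-LD that can be embedded in a page as an application/ld+json script tag, so crawlers receive explicit question-and-answer structure instead of inferring it from layout.

```python
import json

def faq_jsonld(qa_pairs: list[tuple[str, str]]) -> str:
    """Build schema.org FAQPage JSON-LD from (question, answer) pairs."""
    doc = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in qa_pairs
        ],
    }
    return json.dumps(doc, indent=2)

if __name__ == "__main__":
    markup = faq_jsonld([
        ("What is the scrape-to-referral ratio?",
         "The number of pages a bot crawls for every human visitor it refers back to the site."),
    ])
    # Embed in the page as: <script type="application/ld+json"> ... </script>
    print(markup)
```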

The companies getting ahead are those with clean, declarative content that does not require context to understand. Legacy keyword-stuffed content is increasingly invisible in AI systems that prioritize semantic meaning over keyword density. I've written extensively about why product content earns 46–70% of AI citations while blog posts capture under 6% – the implications for content strategy are enormous.

Quality-Based Traffic Arbitrage

An unexpected finding is reshaping how smart publishers think about AI-referred traffic. Visitors arriving through AI systems actually spend more time on sites and are more likely to complete purchases than casual Google searchers. AI-referred traffic converts at 30–40% in some enterprise contexts. The volume is lower, but the intent is dramatically higher.

This suggests that publishers should optimize for conversion rather than raw traffic volume. The audience arriving from AI recommendations has already been pre-qualified by the AI system's assessment of the publisher's authority and relevance. Investing in conversion rate optimization, email capture, and subscription offers targeted at AI-referred visitors could offset some of the traffic decline.
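The arithmetic behind that shift is simple expected value. With placeholder numbers (none of these rates are measured figures from the studies cited above), the sketch compares a high-volume, low-conversion search segment against a small, high-intent AI-referred segment on both total and per-visit revenue.

```python
def expected_revenue(visits: int, conversion_rate: float, value_per_conversion: float) -> float:
    """Expected revenue = visits x conversion rate x value per converting visitor."""
    return visits * conversion_rate * value_per_conversion

if __name__ == "__main__":
    # Placeholder figures for illustration only.
    search = expected_revenue(visits=100_000, conversion_rate=0.01, value_per_conversion=5.0)
    ai_referred = expected_revenue(visits=5_000, conversion_rate=0.12, value_per_conversion=5.0)

    print(f"Search segment:      total ${search:,.0f}, per visit ${search / 100_000:.2f}")
    print(f"AI-referred segment: total ${ai_referred:,.0f}, per visit ${ai_referred / 5_000:.2f}")
```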

Collective Bargaining Infrastructure

For the vast majority of publishers who lack News Corp's negotiating power, collective solutions are the only viable path. TollBit and ProRata function as aggregators – combining thousands of small and medium publishers to negotiate collectively with AI giants. The IAB Tech Lab's CoMP working group serves a similar coordinating function.

This aggregation model is critical because individual small publishers have zero leverage against companies spending billions on compute infrastructure. Collectively, however, they represent the content diversity and freshness that LLMs need to remain competitive. The aggregator platforms that successfully organize this long tail will capture enormous value.

Bot-Specific Paywall Architecture

Rather than extending the traditional human paywall model to AI bots (which does not work well technically or commercially), publishers are building separate authentication and payment layers specifically for non-human visitors. This is a fundamentally new monetization surface – a "bot paywall" that operates in parallel to the human subscription model.

The architecture involves bot detection and classification at the network edge (Cloudflare, Akamai), authentication via cryptographic signatures (Web Bot Auth proposals), metered access with per-request billing (TollBit, Cloudflare Pay-Per-Crawl), and content formatting specifically for machine consumption (markdown, structured data).
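The authentication layer is the least familiar piece for most publishers. The sketch below, using the third-party cryptography package, shows only the core primitive: verifying that a request was signed by a key the publisher trusts. It is loosely inspired by the Web Bot Auth / HTTP Message Signatures direction, but the canonical string and the overall flow here are simplifications for illustration, not the actual wire format.

```python
# pip install cryptography
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

def canonical_string(method: str, path: str, host: str, crawler_id: str) -> bytes:
    """Simplified stand-in for a signature base; real specs define this precisely."""
    return f"{method}\n{path}\n{host}\n{crawler_id}".encode("utf-8")

def verify_crawler_request(public_key, signature: bytes, method: str, path: str,
                           host: str, crawler_id: str) -> bool:
    """Return True only if the signature was produced by the crawler key the publisher trusts."""
    try:
        public_key.verify(signature, canonical_string(method, path, host, crawler_id))
        return True
    except InvalidSignature:
        return False

if __name__ == "__main__":
    # In practice the crawler operator publishes its public key; here both sides are local for the demo.
    crawler_key = Ed25519PrivateKey.generate()
    publisher_known_key = crawler_key.public_key()

    good = crawler_key.sign(canonical_string("GET", "/news/article-123", "example-news.com", "examplebot"))
    forged = crawler_key.sign(b"a different message entirely")

    print(verify_crawler_request(publisher_known_key, good, "GET", "/news/article-123",
                                 "example-news.com", "examplebot"))    # True
    print(verify_crawler_request(publisher_known_key, forged, "GET", "/news/article-123",
                                 "example-news.com", "examplebot"))    # False
```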

Litigation and Legal Leverage

Litigation continues to play a significant role. The New York Times' lawsuit against OpenAI is the highest-profile case, aiming to establish clear legal boundaries around content scraping and AI training. News Corp sued Perplexity in October 2024, and a judge rejected Perplexity's attempt to dismiss the case.

These lawsuits serve a dual purpose: establishing legal precedent and creating negotiating leverage. The "woo and sue" strategy – pursuing licensing deals with cooperative AI companies while litigating against those who refuse fair terms – has become the standard playbook for major publishers. The legal outcomes will significantly influence whether the emerging technical and commercial solutions gain sufficient adoption.

What This Means for the Future of the Open Web

The convergence of these forces points toward a web that operates very differently from the one we have known. The default is shifting from open access to permission-based access. Content that was freely available to any crawler will increasingly sit behind commercial gates – Cloudflare's default blocking, TollBit's bot paywalls, and publisher-controlled access policies.

This is not necessarily a negative outcome, but it requires careful navigation. The risk Creative Commons identified – a two-tier web where only well-funded AI companies can access quality content – is real. If pay-to-crawl becomes the exclusive model, research institutions, non-profits, and smaller AI startups could be priced out, concentrating power further among the largest technology companies.

The ideal outcome is a layered system: free access for research and archival purposes, commercial licensing for AI training and RAG at scale, attribution-based compensation when content appears in AI outputs, and revenue sharing when AI-driven experiences generate advertising or subscription income. Building that layered system requires coordination among publishers, AI companies, infrastructure providers, standards bodies, and regulators.

For publishers, the strategic imperative is clear: implement bot monitoring immediately (TollBit's free tier, Cloudflare's AI Crawl Control), establish access policies before the market matures, invest in content that AI systems need but cannot easily replace (original reporting, expert analysis, proprietary data), and optimize for AI visibility through GEO so that when compensation models mature, your content is already embedded in the AI information ecosystem.

For AI companies, the message is equally direct: the era of free content extraction is ending. The companies that build sustainable content partnerships now will have access to the high-quality, fresh information that differentiates their models. Those that continue scraping without compensation face litigation, network-level blocking, and reputational damage that will ultimately constrain their product quality.

The internet's economic model is being rebuilt in real time. The publishers who understand this moment and act on it will survive. Those who wait for the old model to return will not.


Deepak Gupta is the Founder and CEO of GrackerAI, a Generative Engine Optimization (GEO) platform for B2B cybersecurity companies. He is a 5x patent holder, author of 6 books on cybersecurity and data privacy, and writes about AI visibility, content economics, and the future of search. Explore more research at the Research Hub or browse tool comparisons and reviews.

*** This is a Security Bloggers Network syndicated blog from Deepak Gupta | AI & Cybersecurity Innovation Leader | Founder's Journey from Code to Scale authored by Deepak Gupta - Tech Entrepreneur, Cybersecurity Author. Read the original post at: https://guptadeepak.com/the-ai-content-crisis-how-llms-are-draining-media-revenue-and-the-technologies-fighting-back/

