Akamai Raises Alarms: AI Bot Traffic Surges 300% and Threatens the Web Business Model

In one year, three times more AI bots. That’s the snapshot from the new State of the Internet (SOTI) report by Akamai: AI-driven bot traffic has surged 300% year-over-year and now accounts for billions of requests against websites across all industries. The phenomenon, the company says, is not marginal: these bots are distorting operations and analytics, eroding advertising revenue, and testing security in digital publishing, e-commerce, healthcare, and financial services.

Akamai sums it up clearly: “The rise of AI bots has shifted from a security team concern to a business imperative for the board”, says Rupesh Chokshi, Senior Vice President and General Manager of Application Security.

The warning comes with numbers and context. The company — which processes more than a third of global web traffic — has observed a surge in massive, automated scraping, as well as fraud supported by generative models: impersonations, more convincing social engineering, more effective phishing campaigns, and professional-quality fake documents or images. All of this occurs alongside an ecosystem where “good” bots (indexing, accessibility) coexist with “bad” bots (FraudGPT, WormGPT, ad fraud, refund scams…).


What’s happening (and where it hits hardest)

The Digital Fraud and Abuse 2025 report identifies a common pattern: AI reduces costs and speeds up both scraping and abuse automation. Anyone with basic skills can assemble, from open libraries and cloud services, a bot capable of bypassing first-generation defenses, rotating identities, and executing millions of requests.

Key findings by industry:

  • Media and Publishing: the hardest-hit sector, accounting for 63% of AI bot attacks. Newsrooms face wholesale copying of headlines, images, and articles; analytics dashboards are corrupted, and ad inventory loses value when traffic is not human.
  • E-commerce: leads in bot activity with over 25 billion requests in two months. Beyond price and stock scraping, there is also cart stuffing, coupon abuse, return fraud, and launch-day scalping.
  • Healthcare: over 90% of health-related attacks stem from scraping, mostly from search and training bots. The exposure here is not only economic: the risk includes accessing sensitive data or leaving traces that facilitate future attacks.
  • Across sectors: “useful” bots coexist with malicious campaigns that degrade performance, inflate infrastructure costs, and skew key metrics (sessions, conversions, funnels). Drawing the line between them is increasingly difficult without inventory, telemetry, and bot governance.

Surprisingly, despite their 300% growth, AI bots still represent “almost 1%” of the total bot traffic Akamai observes. The data suggests that quality, rather than volume, is today’s biggest threat: a few well-orchestrated bots can cause outsized impact.


Why AI bots are different (and more challenging)

1) Evasion by design. The new wave mimics human gestures: mouse movements and timing, context switches, random scrolling, plausible reading sequences. They also rotate identities (browsers, fingerprints, IPs, ASN) and mix routes (web, API, mobile) to reduce traceability.

2) Site “comprehension”. Using models that “read” the DOM or images, bots can interpret interfaces, solve weak CAPTCHAs, understand checkout flows or forms, and discover undocumented shortcuts.

3) Frictionless automation. The entire chain (discovery, instruction, execution, refinement) can be automated with AI, multiplying malicious actors’ efficiency and shrinking trial-and-error cycles.

4) Low marginal cost. With serverless infrastructure, commercial proxy networks, and third-party hosted models, launching an attack costs little and scales easily. Sometimes, defense costs surpass attack costs.


Business implications: from contaminated metrics to missed revenue

  • Broken analytics: funnels and KPIs lose reliability; product and marketing decisions are made on contaminated data.
  • Eroded advertising: the share of invalid traffic rises; non-human impressions and clicks reduce eCPM and ROI.
  • Performance and costs: sudden bot spikes drive up CPU, bandwidth, and storage usage; you pay for compute instances and CDN capacity to serve non-human traffic.
  • Security and fraud: more believable impersonations, convincing fake documents, highly personalized phishing campaigns, and identity fraud that bypasses weak controls.
  • Brand & compliance: indiscriminate scraping breaches terms of use; poorly handled responses may conflict with privacy or competition laws.

Akamai’s proposals: three OWASP frameworks and intelligent prioritization

The report recommends aligning capabilities with the current OWASP Top 10:

  1. Web applications: risks like Broken Access Control, Injection, Sensitive Data Exposure.
  2. APIs: risks like Broken Object Level Authorization, Security Misconfiguration, and Excessive Data Exposure.
  3. LLMs: a new framework for AI-specific abuses: prompt injection, data exfiltration via outputs, model denial of service, and overreliance on LLMs.

The aim is not just to “install a tool” but to map known vulnerabilities to each business’s risk appetite and prioritize defenses sensibly: which assets (pages, APIs, AI endpoints) deliver value, which attacks are observed, and which controls reduce risk most cost-effectively.


Concrete measures that work (and why)

1) Dynamic allowlist and client signals for bot management

  • Classify “good” bots (indexers, accessibility) and gate unknowns.
  • Verify clients using low-friction signals (passive tests, browser integrity, JA3/JA4, TLS fingerprinting, device attestation when possible).
  • Serve canary content (invisible watermarks) to detect re-publication and unauthorized training.
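The gating logic above can be sketched in a few lines. This is a minimal illustration, not Akamai's implementation: the bot names are real crawler user-agent strings, but the fingerprint values are placeholders for the JA3/JA4-style TLS fingerprints a real system would collect and verify.

```python
# Minimal sketch of an allowlist gate for bot management.
# Fingerprint values are hypothetical placeholders; a real deployment
# would maintain verified JA3/JA4 hashes per known crawler.

KNOWN_GOOD_BOTS = {
    # claimed user-agent substring -> TLS fingerprints verified for it
    "Googlebot": {"ja3-hash-google-1"},
    "Bingbot": {"ja3-hash-bing-1"},
}

def classify_client(user_agent: str, ja3: str) -> str:
    """Return 'allow', 'challenge', or 'deny' for an incoming client."""
    for bot, fingerprints in KNOWN_GOOD_BOTS.items():
        if bot.lower() in user_agent.lower():
            # A claimed identity must match a fingerprint we have verified;
            # a mismatch means the client is impersonating a good bot.
            return "allow" if ja3 in fingerprints else "deny"
    # Unknown automation-looking clients get a low-friction challenge.
    if "bot" in user_agent.lower() or "python" in user_agent.lower():
        return "challenge"
    return "allow"

classify_client("Mozilla/5.0 (compatible; Googlebot/2.1)", "unknown-hash")  # 'deny'
```

The key design choice is that identity claims are never trusted on their own: a client saying "Googlebot" with the wrong fingerprint is treated as more suspicious than an anonymous one.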

2) Protect APIs like frontends (because they are)

  • Enforce object-level authorization (against BOLA) and prevent excessive data exposure.
  • Adaptive quotas and rate-limiting by identity, ASN, risk score.
  • Honeypots for APIs (trapped endpoints) to signal abusive clients.

3) Close the AI vulnerabilities: guardrails for prompts, models, and outputs

  • Filter and classify sensitive data before it enters prompts; prevent leaks in responses (dual-purpose DLP).
  • Isolate agents and tools (principle of least privilege); limit actions and require strong confirmations for critical operations.
  • Monitor: log prompts, context, tools, and outputs with metadata for audit and forensics.
  • Evaluate: run automated red teaming on prompts, RAGs, and MCP servers before deploying to production.
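The dual-purpose DLP idea in the first bullet can be sketched as a filter applied symmetrically to prompts and outputs. The patterns below are simplistic placeholders; a production system would use a proper classification service rather than two regexes:

```python
import re

# Hypothetical detection patterns for illustration only. The same redact()
# pass runs on text entering prompts and on model responses (dual-purpose DLP).
PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def redact(text: str) -> tuple[str, list[str]]:
    """Redact sensitive spans and report which categories were found."""
    found = []
    for label, pattern in PATTERNS.items():
        if pattern.search(text):
            found.append(label)
            text = pattern.sub(f"[{label.upper()} REDACTED]", text)
    return text, found

clean, hits = redact("Contact jane@example.com with card 4111 1111 1111 1111")
# hits -> ['email', 'credit_card']; both values are masked in `clean`
```

Returning the list of detected categories matters as much as the redaction itself: it feeds the logging and audit trail the "Monitor" bullet calls for.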

4) Enhance resilience against scraping (beyond robots.txt)

  • Legal + technical: terms of use and headers like “noai”/“noscrape” combined with real controls.
  • Content dynamics: views that force clients to execute code (without harming accessibility or SEO).
  • Pricing and gating: tiered data, pay-per-API, and clear licenses for training use.
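As a concrete example of the legal-plus-technical combination, a site can publish crawler directives in robots.txt and advertise "noai"-style signals in response headers. Note these directives are emerging conventions, not enforced standards; honoring them is voluntary, which is exactly why the article pairs them with real controls:

```python
# Declarative signals only: compliant crawlers may honor them,
# non-compliant ones will not, so technical controls must back them up.
# GPTBot is OpenAI's published crawler user-agent; "noai"/"noimageai"
# are non-standard directives adopted by some platforms.

ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

def content_headers() -> dict[str, str]:
    """HTTP headers signaling that content is not licensed for AI training."""
    return {"X-Robots-Tag": "noai, noimageai"}
```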

5) Data-driven operation

  • Unified telemetry (web, API, mobile, AI) and reconciled dashboards to distinguish human from automated traffic.
  • Unified teams: security, product, growth, and ad ops working with the same metrics, with runbooks for bot peaks.

Signs your site has an AI bot problem

  • Unusual CTRs combined with very low bounce rates or implausible time on site.
  • Nighttime spikes from regions outside your core markets, with rotating IPs and suspicious ASNs.
  • APIs seeing bulk reads of list or search endpoints with no corresponding web traffic.
  • Repeated prompts or calls to MCP exhausting quotas without value creation.
  • Re-publication of content with invisible tags detected.
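The third sign (API reads with no web counterpart) lends itself to a simple telemetry check. The thresholds below are illustrative assumptions, not figures from the report, and a real pipeline would work from reconciled web/API logs:

```python
from collections import Counter

def flag_suspects(records, api_web_ratio=10.0, min_requests=100):
    """Flag client ASNs whose API reads vastly outpace their web traffic.

    `records` is an iterable of (asn, channel) tuples, where channel is
    'api' or 'web'. Thresholds are illustrative defaults, not report values.
    """
    api, web = Counter(), Counter()
    for asn, channel in records:
        (api if channel == "api" else web)[asn] += 1
    suspects = []
    for asn, n in api.items():
        # High-volume API readers with almost no web sessions behind them.
        if n >= min_requests and n / max(web[asn], 1) >= api_web_ratio:
            suspects.append(asn)
    return suspects
```

Grouping by ASN rather than IP is deliberate: rotating-IP scrapers often stay within a handful of hosting networks even as individual addresses change.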

What each industry can do today

Media: activate canary content, close public APIs for content, and license training use when strategic. Use separate dashboards for human vs. non-human in ad ops.

Commerce: protect price and stock with quotas and risk scores; maintain dynamic lists of permitted bots; strengthen controls against return fraud and coupon abuse.

Healthcare: minimize scraping surfaces (catalogs, medical content); log all automated access; ensure models/AI do not expose PHI in responses.

Finance: apply device binding and strong passive tests; monitor bots that fill forms to open avenues for subsequent phishing.

Conclusion: from “blocking bots” to governing AI

The web was born open; AI has made it more valuable and more attackable. The message from SOTI is clear: blacklists and captchas aren’t enough. It’s time to govern AI — what enters models and prompts, what exits, who accesses, and why — and manage bots as a product: inventory, metrics, SLOs, and playbooks.

Doing so early will protect revenue, clean analytics, and prevent your business’s future from being written by bots that don’t pay the bills.


Frequently Asked Questions

What is an AI scraper bot, and how does it differ from a traditional indexer?
An AI scraper bot uses models to interpret pages and automate large-scale data extraction, evading basic controls (identity rotation, false human-like timing, weak captcha resolution). Unlike a legitimate indexer (Google, Bing), it doesn’t always respect robots.txt, terms of use, or provide reciprocal value (quality traffic), and typically targets full content, catalogs, or large datasets for training.

How do I apply the OWASP Top 10 for LLMs in my company if I already have WAF and bot management?
The OWASP Top 10 LLM complements (not replaces) web/API defenses. Add specific controls: filtering and marking sensitive data in prompts and outputs; guardrails for agents and MCP; logging prompts, context, tools; automated red teaming during CI/CD (injection, jailbreak, exfiltration). Prioritize based on risk — e.g., internal data RAG vs. public assistant.

How do I allow “good” bots and stop “bad” ones without harming SEO or accessibility?
Implement allowlists with client verification (agent signatures, JA3/JA4, browser integrity, ASN), quotas, and dedicated routes (web vs. API). Maintain a living catalog of accepted bots (indexers, accessibility) and apply passive challenges and limits to the rest. A properly managed allowlist preserves SEO benefits; blocking everything indiscriminately does not.

Can I legally block AI training bots?
Consult your legal counsel: robots.txt is not binding on its own. Combine terms of use, “noai”/“noscrape” metadata, licenses, and technical controls (blocking, rate limiting, canary content). Document and notify access conditions; many AI providers respect clear signals when your stance is evident, and licensing options exist.
