Cloudflare has launched a new free tool designed to protect its customers from web scraping by Artificial Intelligence (AI) bots. This measure aims to prevent such bots from collecting data from websites and, in turn, contributes to a safer Internet for content creators.
A Response to the Demand for Generative AI Data
The rise of generative AI has sharply increased the demand for data to train models and run inference. To meet this demand, many bots resort to web scraping, a process that extracts the HTML content of websites and stores it for use in training AI models.
Although web scraping is generally legal, its excessive and opaque use by some bots has raised concerns among website owners. Cloudflare has responded by adding a feature to its Internet security service that automatically blocks AI bots engaged in web scraping.
New Security Feature to Block AI Bots
Cloudflare’s tool, available to all customers, including those on its free tier, aims to give site owners additional control over access to their data. To activate it, users go to the ‘Security’ menu in their dashboard, select ‘Bots,’ and enable the ‘AI Scrapers and Crawlers’ option. Once activated, the feature starts blocking scraping attempts made by these bots.
Cloudflare has designed the tool to update automatically as new fingerprints of offending bots are identified, so that protection keeps pace with emerging data collection methods.
Revealing Data on AI Bot Usage
According to data provided by Cloudflare, AI bots are extremely active on the web. Bytespider from ByteDance, Amazonbot from Amazon, ClaudeBot from Anthropic, and GPTBot from OpenAI are among the most active on its platform. Bytespider tops the list, with attempts to access 40.40% of customer websites, followed by GPTBot with 35.46%. ClaudeBot, by comparison, has tried to access 11.17% of sites.
These numbers underline the prevalence and impact of AI bots in information gathering, highlighting the importance of effective measures to protect website data.
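As a rough illustration of how a site owner might spot the crawlers named above in their own logs, the sketch below matches the User-Agent header against those four names. It is not Cloudflare's method, which relies on bot fingerprints rather than self-reported headers; the function name and token list are hypothetical, and a crawler that disguises its User-Agent would slip past a check like this.

```python
from typing import Optional

# Assumption: tokens are taken from the crawler names mentioned in the article.
AI_CRAWLER_TOKENS = ("Bytespider", "Amazonbot", "ClaudeBot", "GPTBot")


def match_ai_crawler(user_agent: str) -> Optional[str]:
    """Return the first known AI crawler token found in the header, else None."""
    lowered = user_agent.lower()
    for token in AI_CRAWLER_TOKENS:
        # Case-insensitive substring match on the self-reported User-Agent.
        if token.lower() in lowered:
            return token
    return None
```

A server could call `match_ai_crawler(request_user_agent)` and refuse the request when the result is not `None`, though header matching alone only stops bots that identify themselves honestly.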
An Expanding Issue
In June of this year, AI bots accessed 39% of the top one million websites protected by Cloudflare. However, only 2.98% of these sites had taken steps to block such requests at that time.
Cloudflare remains committed to its mission of maintaining Internet security and ensuring that content creators have control over how their material is used to train AI models. The company also notes that other platforms, such as Reddit, are taking similar measures, including updating their robots.txt files (the Robots Exclusion Protocol) to limit automated access to public data.
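To make the Robots Exclusion Protocol concrete, here is a minimal sketch using Python's standard `urllib.robotparser` module. The robots.txt content, the `is_allowed` helper, and the example.com URL are illustrative assumptions, not rules published by Reddit or Cloudflare.

```python
from urllib.robotparser import RobotFileParser

# Assumption: a sample robots.txt that bars GPTBot and allows everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check whether the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With these rules, `is_allowed(ROBOTS_TXT, "GPTBot", "https://example.com/articles/1")` evaluates to `False`, while other user agents are permitted. Compliance with robots.txt is voluntary on the crawler's side, which is why a network-level block of the kind Cloudflare offers is a stronger control.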
In summary, with the implementation of this new feature, Cloudflare reinforces its role in protecting privacy and data security on the web. By providing customers with an effective tool to block AI bots, the company not only helps preserve the integrity of online content but also contributes to a safer Internet that respects the rights of creators.