AI bots are feeding on websites and blogs. Here is how you can stop them

A new tool from Cloudflare lets website owners stop AI bots from scraping their content to train AI models.

The rise of generative AI has boosted productivity and saved a great deal of time, but it has also worried bloggers, researchers and content creators. Because their work is freely available on the open web, big tech companies such as Google and Microsoft use it to train their AI models.

Artificial intelligence has made content publishers anxious about their original work being turned into fodder for AI models. That is why Cloudflare, a content delivery network and cloud security company, has launched this new tool.

Here is what you need to know, and why it matters if you run a business online or are thinking of starting one.

The problem with AI

Much of the factual information that AI tools give as answers is drawn from existing websites and blogs. That is unwelcome for content publishers, because large AI companies use their data without permission and profit from it without proper compensation.

While some tech companies, such as Google, Apple and OpenAI, identify their bots and respect established transparency protocols like the Robots Exclusion Protocol, which lets websites opt out of being crawled, others may try to avoid clear identification. Perplexity AI recently came under scrutiny for ‘plagiarizing’ news content, with reports saying it tried to disguise its AI bot as a legitimate visitor while quietly scraping data.

As a result, content publishers are reluctant to let their work become fodder for AI models.
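The Robots Exclusion Protocol mentioned above works through a plain-text robots.txt file at the root of a site. As a minimal sketch, assuming a site that wants to opt out of a few AI crawlers while staying open to everyone else, the Python below feeds a hypothetical robots.txt to the standard library's urllib.robotparser and asks which user agents may fetch a page. The crawler tokens (GPTBot, Google-Extended, Applebot-Extended), the helper name build_parser and the example.com URL are assumptions for illustration; check each company's own crawler documentation for the current names. Keep in mind that robots.txt is only a request: it keeps out bots that choose to honour it, not those that disguise themselves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that opts out of several AI crawlers while
# leaving the site open to everyone else. The user-agent tokens are
# assumptions; verify them against each vendor's published documentation.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(text: str) -> RobotFileParser:
    """Hypothetical helper: parse robots.txt content supplied as a string."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    url = "https://example.com/blog/my-post"  # placeholder URL
    for agent in ("GPTBot", "Google-Extended", "Applebot-Extended", "Mozilla/5.0"):
        # can_fetch() applies the first matching User-agent group,
        # falling back to the catch-all "*" group.
        print(agent, "allowed" if rp.can_fetch(agent, url) else "blocked")
```

Run as a script, it reports the three AI crawler tokens as blocked and the ordinary browser user agent as allowed, because that agent matches no named group and falls through to the catch-all rule.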

What does the new tool do?

Cloudflare's new tool is a one-click "easy button" that blocks all AI bots, and the company is fine-tuning its machine learning models to identify and block even those that try to pass themselves off as real people. AI bots are automated programmes that browse the internet and “scrape”, or collect, vast amounts of data to train large language models.

“Customers don’t want AI bots visiting their websites, and especially those that do so dishonestly,” Cloudflare wrote in a blog post. “We fear that some AI companies intent on circumventing rules to access content will persistently adapt to evade bot detection.” The feature is available to all customers, including those on the free tier, and can be enabled from the Cloudflare dashboard.

To enable it, go to Security > Bots in the Cloudflare dashboard and turn on the toggle labelled AI Scrapers and Crawlers.

Top AI bots that steal your website data

Cloudflare's own survey of AI bot traffic, based on the number of requests made to sites on its network, found that Bytespider, operated by TikTok's Chinese parent company ByteDance, was the AI bot with the widest presence, appearing on 40.4% of the websites examined. ByteDance is building Doubao, a rival to ChatGPT. Bytespider was followed by Amazonbot, GPTBot and ClaudeBot.

How likely a website is to be crawled by AI bots depends on how popular it is. Among the top 10 internet properties that use Cloudflare, 80% were accessed by AI bots and 40% blocked them.
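Bots such as Bytespider, Amazonbot, GPTBot and ClaudeBot normally announce themselves in the User-Agent header of each request, which is what makes the honest ones easy to block. The Python below is a hypothetical sketch of the simplest possible server-side filter, not Cloudflare's method: it matches the bot names listed above against the User-Agent string and returns HTTP 403 (Forbidden) on a match. The token list, the function names and the sample User-Agent strings are all made up for illustration. A scraper that sends an ordinary browser User-Agent passes straight through, which is the gap Cloudflare says its machine learning models are being tuned to close.

```python
# Hypothetical list of AI-crawler name tokens, taken from the bots named
# in this article. A real deployment would need a maintained, longer list.
AI_CRAWLER_TOKENS = ("bytespider", "amazonbot", "gptbot", "claudebot")


def is_declared_ai_crawler(user_agent: str) -> bool:
    """Return True if the User-Agent contains a known AI-crawler token.

    Case-insensitive substring match. It only catches bots that identify
    themselves; a scraper using a normal browser User-Agent is not caught.
    """
    ua = user_agent.lower()
    return any(token in ua for token in AI_CRAWLER_TOKENS)


def status_for(user_agent: str) -> int:
    """Hypothetical handler: 403 Forbidden for declared AI crawlers, else 200 OK."""
    return 403 if is_declared_ai_crawler(user_agent) else 200


if __name__ == "__main__":
    # Made-up sample User-Agent strings, for illustration only.
    samples = [
        "Mozilla/5.0 (compatible; GPTBot/1.0)",
        "Mozilla/5.0 (compatible; Bytespider)",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/126.0",
    ]
    for ua in samples:
        print(status_for(ua), ua)
```

Running it prints 403 for the two self-declared crawlers and 200 for the browser-style string, even though nothing stops a scraper from sending that same browser string.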

The tool could help bloggers and content publishers keep their data out of the hands of AI bots.

References: The Economic Times, Cloudflare.

