To Block or Not to Block GPTBot: Here’s What to Consider First
Learn what GPTBot is and the pros and cons of blocking ChatGPT’s AI crawler
Wondering whether you should block GPTBot, whether GPTBot is harmful, or whether you can see if ChatGPT uses your content?
We’ve got answers. Learn what GPTBot is and read up on the pros and cons of letting OpenAI’s GPTBot crawl your site.
ChatGPT is an online AI chatbot made by OpenAI. It answers users’ questions using data and content aggregated from the web. Because ChatGPT was trained on web information only up until September 2021, its answers about more recent topics can be outdated or incorrect.
GPTBot, OpenAI’s web crawler, was released in early August 2023. It crawls websites to collect new information and data that OpenAI says may be used to improve its future models, such as an eventual GPT-5.
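GPTBot may be the most discussed AI crawler, but it’s not the only source of data for AI tools. Common Crawl’s crawler, CCBot, builds a public web archive that is widely used to train AI models, and datasets such as WebText2 have also been used to train OpenAI’s models.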
Blocking GPTBot stops OpenAI’s crawler from collecting your content going forward, but it won’t stop other crawlers that may also be informing the tool. If you don’t want GPTBot to access your content, you can add rules to your robots.txt file that block it from your entire website or from specific parts of it. But should you?
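For reference, the robots.txt rules involved are short: a “User-agent: GPTBot” group followed by “Disallow: /” blocks the whole site, while a Disallow line with a specific path blocks only that section. The Python sketch below uses the standard library’s urllib.robotparser module to check how two example rule sets would apply. The “/private/” directory and the example.com URLs are hypothetical placeholders, and the check passes the bare “GPTBot” token rather than the crawler’s full user-agent string.

```python
from urllib.robotparser import RobotFileParser

# Example rules: block GPTBot from the entire site, allow every other crawler.
BLOCK_WHOLE_SITE = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

# Example rules: block GPTBot from one directory only.
# "/private/" is a hypothetical path used for illustration.
BLOCK_ONE_DIRECTORY = """\
User-agent: GPTBot
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules let user_agent fetch url."""
    return build_parser(robots_txt).can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com URLs are placeholders, not real site pages.
    checks = [
        (BLOCK_WHOLE_SITE, "GPTBot", "https://example.com/blog/post"),
        (BLOCK_WHOLE_SITE, "Googlebot", "https://example.com/blog/post"),
        (BLOCK_ONE_DIRECTORY, "GPTBot", "https://example.com/blog/post"),
        (BLOCK_ONE_DIRECTORY, "GPTBot", "https://example.com/private/page"),
    ]
    for rules, agent, url in checks:
        print(agent, url, "allowed" if is_allowed(rules, agent, url) else "blocked")
```

Parsing the rules as text, rather than calling read() on a live robots.txt URL, keeps the check offline, so you can test a rule set before publishing it.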
Is GPTBot harmful?
GPTBot crawling your site isn’t harmful in itself.
Bots crawl websites every day, and most are relatively harmless. Google’s crawler, for example, might visit large, active sites several times a day but smaller or less active sites only once every few weeks.
An aggressive bot can request pages often enough to cause a spike in traffic, though analytics tools typically flag this traffic as an anomaly or filter it out of reports.
OpenAI’s GPTBot crawls sites and stores information that may be used for AI training and to provide answers to users. Currently, ChatGPT does not cite or link to the original source material it uses to generate answers, which could include your website.
At present, there is no way to track whether your site’s content has been used in a ChatGPT answer to a question or prompt. If ChatGPT begins citing sources, you may be able to track incoming traffic from users who click through to your site from a citation.
Should you block GPTBot?
AI bots are becoming more capable and more popular. With GPTBot crawling the web, it’s safe to assume your content will be crawled and collected for the AI program to use. The question is: should you block GPTBot in your robots.txt file to keep your content from being crawled and potentially used to inform answers? Here are some facts to consider.
Analytics tools can’t show whether your content appears in AI results.
Analytics tools such as Google Analytics, Google Search Console, SEMrush, and Adobe Analytics do not currently have a way to show whether your site was used in AI results. Allowing GPTBot to crawl your site means ChatGPT may use that content when answering questions, but there is currently no way of knowing whether it does.
ChatGPT does not cite sources… Yet.
ChatGPT currently does not cite sources in its answers, so if it uses your content, your site receives no attribution, and the user won’t know which source informed ChatGPT’s response.
However, it is reasonable to assume that ChatGPT will follow Bard’s lead in terms of attribution and cite sources in future updates. For that reason, it may be wise to allow GPTBot to crawl your site now in preparation for what is to come.
“OpenAI says it will cite sources when plugins pull data from third-party websites. This means there will definitely be potential to get clicks from ChatGPT if a user pulls in your content,” – Search Engine Land.
ChatGPT will not replace Google Search in the short term, but it is a useful complementary tool.
Though AI programs like ChatGPT have gained popularity, they are unlikely to replace Google Search as the place users go for answers to their questions. Why? In its current state, ChatGPT is neither timely nor informed about current events, people, and places.
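On average, there are around 8.5 billion Google searches per day, while ChatGPT averaged around 55 million visitors per day in recent months. Visits and searches aren’t identical measures, but ChatGPT’s daily visitor count is still less than 1% (about 0.65%) of Google’s daily search volume.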
That said, other generative AI tools, such as Google’s Search Generative Experience, are already integrated into search engine results. Searchers could discover your site in these generative AI results if your site allows AI bots to crawl its content.
“There’s a decent chance that if somebody uses the tool to pull content from your website, they might link to you wherever they post the output. You’ll be passing up this chance if you block it.” – Search Engine Land.
Letting GPTBot crawl your site may lead to more mentions and more control over your brand voice.
Even if you aren’t yet receiving direct attribution in ChatGPT results, your brand name may still appear in ChatGPT’s answers. Allowing GPTBot to crawl your site could bring benefits akin to earned media or word of mouth, though they can’t currently be measured. You don’t have to do any additional work beyond publishing content on your digital properties, yet ChatGPT may be answering questions about or related to your brand and, in turn, influencing users and driving interest. Allowing GPTBot to crawl your site helps ensure that ChatGPT is informed by your own content and describes your brand through your lens.
AI is still emerging and could shape the SERP.
The rapid emergence of consumer-facing AI technology will likely shape how search evolves, and we can expect “traditional” search and AI to become increasingly integrated. Rather than resisting by blocking GPTBot, early adopters who allow their sites to be crawled now may gain an advantage over the competition.
At a minimum, letting AI bots crawl your site could allow your content to be used by a unique new tool that sends users back to search engines to verify that the AI’s information is accurate.
“We need to start thinking of AI as a new acquisition channel – just like we do with search, social and retail platforms or app stores.” – Search Engine Land.
What’s GPO’s takeaway?
Currently, AI tools can and do use a site’s content without proper attribution. If ChatGPT begins crediting sources in the future, you’ll want your quality content to be the source it cites when answering a user’s question.
At GPO, we believe AI is a highly relevant, evolving tool that companies should embrace as consumers gravitate toward AI-based tools such as ChatGPT. Weigh the pros and cons for your brand, and keep in mind the vast potential for your content to influence AI and earn attribution in front of a larger audience as AI capabilities grow and AI integrations become more commonplace.