Monday, 24 April 2023

Google Updates Documentation for Googlebot Verification

 

Google updated its Search Central documentation on verifying Googlebot, adding information about user-triggered bot visits that earlier Googlebot documentation had left out. The gap had caused confusion for a long time, and some publishers had blocked the IP ranges of these legitimate visits.

Modernised Bot Documentation

Google updated its documentation to include a classification of the three major bot types that publishers should be prepared for.

The three types of Google bots are as follows:

1. Googlebot, the search crawler
2. Special-case crawlers
3. User-triggered fetchers (GoogleUserContent)

Because Google had no documentation on the last of these, GoogleUserContent, it has long baffled publishers.

Google describes GoogleUserContent as follows:

User-triggered fetchers

Tools and product features that let the user initiate a fetch.

For instance, Google Site Verifier responds to a user's request.

These fetchers disregard robots.txt restrictions because the fetch was requested by a user.

According to the documentation, the following domain will appear in the reverse DNS mask:

“***-***-***-***.gae.googleusercontent.com”

Bot activity from IP addresses linked to GoogleUserContent.com was said to be triggered when a user viewed a website through the translate feature that used to appear in the search results, a feature that is no longer part of Google's SERPs.

I'm not sure whether that is accurate. It was enough to know that the visit came from Google and was triggered by a user.

According to Google's updated documentation, bot activity from IP addresses connected to GoogleUserContent.com can be triggered by the Google Site Verifier tool.

However, Google doesn't specify what else could cause a bot to originate from the IP addresses of GoogleUserContent.com.

Another change is a new reference to googleusercontent.com in the part of the documentation that covers IP addresses assigned to the GoogleUserContent.com domain name.

Here is the updated text:

Verify that the domain name is either googlebot.com, google.com, or googleusercontent.com.
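The domain check described in the updated text can be sketched in Python. This is a minimal sketch under stated assumptions, not Google's implementation: the function names are hypothetical, and the forward-lookup confirmation in the second function is an added assumption based on common reverse DNS verification practice rather than on the text quoted here.

```python
import ipaddress
import socket

# Domains named in the updated documentation text.
GOOGLE_DOMAINS = ("googlebot.com", "google.com", "googleusercontent.com")


def is_google_hostname(hostname: str) -> bool:
    """Return True if hostname is one of the listed domains or a subdomain of one."""
    host = hostname.rstrip(".").lower()
    # Require an exact match or a "." boundary so "fakegooglebot.com" is rejected.
    return any(host == d or host.endswith("." + d) for d in GOOGLE_DOMAINS)


def verify_by_reverse_dns(ip: str) -> bool:
    """Reverse-resolve ip, check the domain, then confirm with a forward lookup.

    Requires network access. The forward confirmation is an assumption
    added for safety; it is not part of the quoted sentence.
    """
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
    except OSError:
        return False  # no reverse DNS record, or the lookup failed
    if not is_google_hostname(hostname):
        return False
    try:
        infos = socket.getaddrinfo(hostname, None)
    except OSError:
        return False
    resolved = {ipaddress.ip_address(info[4][0]) for info in infos}
    return ipaddress.ip_address(ip) in resolved
```

The suffix check alone needs no network access, so it can be run against hostnames already recorded in server logs. A hostname that merely contains a Google domain, such as googlebot.com.evil.example, does not pass.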

The following sentence, expanded from the previous version of the page, is another new addition:

By comparing the crawler's IP address to the lists of Google crawlers' and fetchers' IP ranges, you can also identify Googlebot by IP address:

Googlebot

Special crawlers, such as AdsBot

User-triggered fetches

Documentation for Google Bot Identification

The updated documentation now mentions bots that use IP addresses connected to GoogleUserContent.

These IP addresses perplexed search marketers, who assumed that the bots were spam.

A 2020 discussion in the Google Search Console Help forum shows how confused people were about activity related to GoogleUserContent.

Many participants in the conversation correctly deduced that it wasn't Googlebot but then came to the incorrect conclusion that it was a fake bot impersonating Google.

One user wrote:

"The behaviour I observe coming from these addresses is hitting many of our sites and is very similar to, if not the same as, real Googlebot behaviour.

If it isn't, it appears to point to broad harmful bot activity on our websites by someone making a valiant effort to resemble Google, which is worrying."

The individual who started the conversation concludes that the GoogleUserContent activity was spam after receiving a number of responses.

They wrote:

"...While the Googlebots in question do replicate the legitimate User-Agents, the evidence now seems to indicate that they are fraudulent.

I'll temporarily block them."

We now know that the activity from IPs connected to GoogleUserContent does not come from spam or hacker bots.

It really does come from Google. Publishers who blocked IP addresses connected to GoogleUserContent in the past should probably unblock them now.

The most recent IP address list for user-triggered fetchers is available from Google.
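Once an IP range list has been downloaded, a visitor's IP address can be matched against it, as the documentation suggests. The Python sketch below assumes the list is JSON with a "prefixes" array whose entries carry either an "ipv4Prefix" or an "ipv6Prefix" key. That layout is an assumption, so check it against the real file. The sample data and function names are hypothetical, and the sample prefixes are reserved documentation ranges, not Google's.

```python
import ipaddress
import json

# Hypothetical sample shaped like a range list; these are reserved
# documentation prefixes, NOT real Google ranges.
SAMPLE_RANGES_JSON = """
{
  "prefixes": [
    {"ipv4Prefix": "192.0.2.0/24"},
    {"ipv6Prefix": "2001:db8::/32"}
  ]
}
"""


def load_networks(raw_json: str) -> list:
    """Parse a range list into ip_network objects (layout is an assumption)."""
    data = json.loads(raw_json)
    networks = []
    for entry in data.get("prefixes", []):
        prefix = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if prefix:
            networks.append(ipaddress.ip_network(prefix))
    return networks


def ip_in_ranges(ip: str, networks: list) -> bool:
    """Return True if ip falls inside any of the given networks."""
    address = ipaddress.ip_address(ip)
    # Compare only against networks of the same IP version.
    return any(
        net.version == address.version and address in net for net in networks
    )


networks = load_networks(SAMPLE_RANGES_JSON)
print(ip_in_ranges("192.0.2.55", networks))    # True
print(ip_in_ranges("198.51.100.7", networks))  # False
```

An address that appears in the real list can be treated as a genuine user-triggered fetch and left unblocked. Because the published ranges can change, a real check would need to reload the list periodically.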

Read the most recent Google documentation:

Verifying Googlebot and other Google crawlers