Search Engine Robots
Google Bots
Google sends out the following bots:
- APIs-Google
  - For API developers.
- AdsBot-Google
  - Checks desktop web page ad quality.
- AdsBot-Google-Mobile
  - Checks Android and iPhone web page ad quality.
- AdsBot-Google-Mobile-Apps
  - Checks Android app page ad quality. Obeys the AdsBot-Google robots rules.
- DuplexWeb-Google
  - Supports Duplex on the web; may ignore the * wildcard.
- FeedFetcher-Google
  - Feedfetcher doesn't respect robots.txt rules.
- Google-Read-Aloud
  - Google Read Aloud doesn't respect robots.txt rules.
- Googlebot
  - Google's main crawler for web pages.
- Googlebot-Image
  - Crawls images.
- Googlebot-News
- Googlebot-Video
- googleweblight
  - Web Light doesn't respect robots.txt rules.
- Mediapartners-Google
  - The AdSense crawler.
- Storebot-Google
You can follow this link for more information on the full list of Google bots.
The bots follow links from other websites and crawl the sitemap files that are submitted through Google Search Console.
Follow this link for more information on crawling and Google bots.
Bing Bots
Bing sends out bots called Bingbot, AdIdxBot, and BingPreview to follow links from other websites and crawl the sitemap files that are submitted through Bing Webmaster Tools.
Follow this link for more information on how Bing crawls and the bots that they use.
Robots.txt File
If you decide that you do not want search engines to crawl some or all of your website, then you will have to add a robots.txt file to your website. For example:
User-agent: Googlebot
Disallow: /example-subfolder/

User-agent: Bingbot
Disallow: /example-subfolder/

Sitemap: https://yourwebsite.com/sitemap.xml
The above example is a robots.txt file that tells Googlebot and Bingbot not to crawl anything in the /example-subfolder/ directory.
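You can check rules like these programmatically. Python's standard-library urllib.robotparser module parses robots.txt rules; here is a minimal sketch using the Googlebot rule from the example (the paths checked are just placeholders):

```python
from urllib.robotparser import RobotFileParser

# The example rule, fed to the parser directly instead of being
# fetched from a live site.
rules = """\
User-agent: Googlebot
Disallow: /example-subfolder/
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# Googlebot may not crawl the blocked folder, but other paths are allowed.
print(parser.can_fetch("Googlebot", "/example-subfolder/page.html"))  # False
print(parser.can_fetch("Googlebot", "/about.html"))                   # True
```

In a real crawler you would call set_url() and read() to fetch the live file instead of feeding in the lines by hand.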
The robots.txt file should be placed in the top-level (root) directory of your website, like so. The filename is case sensitive, so it has to be robots.txt, not Robots.txt.
https://yourwebsite.com/robots.txt
If you have a subdomain like blog.yourwebsite.com, then another robots.txt file specifically for that subdomain has to be added there:
https://blog.yourwebsite.com/robots.txt
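A crawler derives the robots.txt location from a page URL's scheme and host, which is why each subdomain needs its own file. This can be sketched with Python's standard urllib.parse (the URLs below are placeholders):

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url):
    # Keep only the scheme and host, then point at /robots.txt in the
    # root directory of that exact host.
    scheme, netloc, _, _, _ = urlsplit(page_url)
    return urlunsplit((scheme, netloc, "/robots.txt", "", ""))

print(robots_url("https://yourwebsite.com/subfolder/page.html"))
# https://yourwebsite.com/robots.txt
print(robots_url("https://blog.yourwebsite.com/post/"))
# https://blog.yourwebsite.com/robots.txt
```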
Since the robots.txt file is easy to find and anyone can read it, you will not want to put sensitive information in it.
Here are some more examples of what you can add to your robots.txt file.
Block all crawlers from all content:
User-agent: *
Disallow: /
Allow all crawlers access to all content:
User-agent: *
Disallow:
Block Googlebot from accessing a specific folder on your website:
User-agent: Googlebot
Disallow: /subfolder/
Block Googlebot from a specific page on your website:
User-agent: Googlebot
Disallow: /subfolder/block-page.html
Block specific file types, such as PDFs, from being accessed:
User-agent: *
Disallow: /*.pdf$
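Note that * and $ are wildcard extensions to the original robots.txt rules (Python's standard urllib.robotparser, for instance, matches by simple prefix and does not handle them). As a rough sketch, a wildcard rule can be translated into a regular expression; this is a simplified illustration, not a full implementation of the matching spec:

```python
import re

def rule_to_regex(rule):
    # '*' matches any run of characters; a trailing '$' anchors the
    # match to the end of the URL path.
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    pattern = "".join(".*" if ch == "*" else re.escape(ch) for ch in body)
    return re.compile(pattern + ("$" if anchored else ""))

pdf_rule = rule_to_regex("/*.pdf$")
print(bool(pdf_rule.match("/files/report.pdf")))  # True
print(bool(pdf_rule.match("/files/report.txt")))  # False
```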
Link your sitemap in your robots.txt file, usually at the bottom:
User-agent: *
Sitemap: https://pagedart.com/sitemap.xml
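Parsers can surface this sitemap line too; Python's urllib.robotparser exposes it via site_maps() (available since Python 3.8). A minimal sketch using the lines above:

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Sitemap: https://pagedart.com/sitemap.xml
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# The Sitemap directive is independent of any User-agent group.
print(parser.site_maps())  # ['https://pagedart.com/sitemap.xml']
```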
Delay the crawling of your website. The example below delays it by 10 seconds; you can enter a value from 1 to 30 seconds. Google does not follow this rule.
User-agent: *
Crawl-delay: 10
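Well-behaved crawlers read this value and pause between requests. Python's urllib.robotparser exposes it via crawl_delay(); a minimal sketch using the rules above:

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Crawl-delay: 10
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# The delay applies to any agent matched by the * group.
delay = parser.crawl_delay("Bingbot")
print(delay)  # 10
# A polite crawler would then wait between requests, e.g. time.sleep(delay).
```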
Back to Digital Marketing Course
Related Content:
How do search engines crawl your website?
How do search engines index your website?
What is a sitemap?
How do search engines rank your website?
How to get Indexed and Ranked Faster?