How do search engines crawl your website?

A basic understanding of how search engines crawl, index, and rank websites will help you develop a strategy for gaining Search Engine Visibility, which in turn helps your website gain exposure.

Search engines send out bots to crawl the internet and, believe it or not, only a very small percentage of the websites out there are indexed by Google, the number one search engine.

I tried to find the exact number on Google’s own website, but the only reference I could come up with was a figure of 4% from the Tennessean. Another website argued that this figure is wrong because it is impossible to tell the exact percentage. That is true, but I believe it to be a fair estimate. Also, about 30% of the websites crawled and indexed by Google are spam websites.

According to a report from September 2021, Google handles about 78% of all searches. That is why it is the most referenced search engine out there.

Bing’s search engine holds about 11% of the market, according to the same report.

DuckDuckGo is a more recent, privacy-focused search engine that people should keep their eyes on, because I see it gaining momentum.

What happened to Yahoo?

In October 2015, Yahoo reached an agreement under which Google would provide services to Yahoo Search through the end of 2018, including advertising, search, and image search services. As of October 2019, Yahoo! Search is once again powered by Bing.

Search Engine Robots

Google Bots

Google sends out the following bots:

  • APIs-Google (For API developers.)
  • AdsBot-Google
    • Checks desktop web page ad quality.
  • AdsBot-Google-Mobile
    • Checks Android web page ad quality.
    • Checks iPhone web page ad quality.
  • AdsBot-Google-Mobile-Apps
    • Checks Android app page ad quality.
    • Obeys AdsBot-Google robots rules.
  • DuplexWeb-Google
    • Duplex on the web may ignore the * wildcard.
  • FeedFetcher-Google
    • FeedFetcher doesn’t respect robots.txt rules.
  • Google-Read-Aloud
    • Google Read Aloud doesn’t respect robots.txt rules.
  • Googlebot
    • The Google Favicon crawler also uses this token. For user-initiated requests, Google Favicon ignores robots.txt rules.
  • Googlebot-Image
    • The Google Favicon crawler also uses this token. For user-initiated requests, Google Favicon ignores robots.txt rules.
  • Googlebot-News
  • Googlebot-Video
  • googleweblight
    • Web Light doesn’t respect robots.txt rules.
  • Mediapartners-Google
  • Storebot-Google

You can follow this link for more information on the list of Google bots.

The bots follow links from other websites and website sitemap files that are submitted through Google Search Console.

Follow this link for more information on crawling and Google bots.
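
To make the sitemap part of crawling more concrete, here is a minimal Python sketch of how a bot might read the page URLs out of a sitemap file. It is only an illustration, not how Google's bots actually work. The sample sitemap, the example.com URLs, and the function name sitemap_urls are assumptions made up for this example. Only the XML namespace comes from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap for example.com, used only for illustration.
SAMPLE_SITEMAP = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/blog/</loc></url>
</urlset>"""


def sitemap_urls(xml_text):
    """Return the page URLs listed in a sitemap document."""
    # Parse from bytes so the XML declaration's encoding is honored.
    root = ET.fromstring(xml_text.encode("utf-8"))
    # Each <url> entry holds one <loc> element with the page address.
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]


if __name__ == "__main__":
    for url in sitemap_urls(SAMPLE_SITEMAP):
        print(url)
```

A real crawler would go on to fetch each of these URLs and follow the links it finds on those pages. This sketch stops at listing them.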

Bing Bots

Bing sends out bots called Bingbot, AdIdxBot, and BingPreview to follow links from other websites and website sitemap files that are submitted through Bing Webmaster Tools.

Follow this link for more information on how Bing crawls and the bots that they use.
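
If you want to see which of these bots visit your website, one rough approach is to look for their names in the User-Agent strings recorded in your server logs. The Python sketch below assumes you already have those strings. The function name identify_bot and the choice of tokens are assumptions for this example, with the tokens taken from the bot lists above. Keep in mind that a User-Agent string can be faked, so a match is a hint rather than proof.

```python
# Tokens taken from the bot lists above. Longer names come first so that
# "Googlebot-Image" is matched before the shorter "Googlebot".
KNOWN_BOT_TOKENS = [
    "AdsBot-Google-Mobile-Apps",
    "AdsBot-Google-Mobile",
    "AdsBot-Google",
    "Googlebot-Image",
    "Googlebot-News",
    "Googlebot-Video",
    "Googlebot",
    "Mediapartners-Google",
    "Storebot-Google",
    "Bingbot",
    "AdIdxBot",
    "BingPreview",
]


def identify_bot(user_agent):
    """Return the first known bot token found in a User-Agent string, or None."""
    ua = user_agent.lower()
    for token in KNOWN_BOT_TOKENS:
        # Compare case-insensitively, since bots do not always use the same casing.
        if token.lower() in ua:
            return token
    return None


if __name__ == "__main__":
    samples = [
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
        "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    ]
    for sample in samples:
        print(identify_bot(sample), "<-", sample)
```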

Robots.txt File

A robots.txt file is a plain-text file placed in the root directory of your website (for example, https://example.com/robots.txt). Search engine bots look for it there and follow its instructions on which parts of your website they may crawl. Tools like Bing Webmaster Tools and Google Search Console let you check how the search engines read your robots.txt file.
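
As a rough illustration of how a bot might apply robots.txt rules, the Python sketch below uses the standard library's urllib.robotparser module. The sample rules, the example.com URLs, and the helper name can_crawl are assumptions made up for this example. Real search engine bots use their own parsers and may interpret some rules differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used only for illustration.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: Googlebot-Image
Disallow: /
"""


def can_crawl(robots_txt, user_agent, url):
    """Return True if the given rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(can_crawl(SAMPLE_ROBOTS_TXT, "Googlebot", "https://example.com/blog/"))
    print(can_crawl(SAMPLE_ROBOTS_TXT, "Googlebot", "https://example.com/private/page.html"))
    print(can_crawl(SAMPLE_ROBOTS_TXT, "Googlebot-Image", "https://example.com/blog/"))
```

With these sample rules, Googlebot may fetch the blog page but not anything under /private/, and Googlebot-Image is blocked from the whole website.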

You can read more about robots.txt files and how to create them using the link below.

Related Content:

What is a robots.txt file?
How do search engines index your website?
What is a sitemap?
How do search engines rank your website?
How to get Indexed and Ranked Faster?