
Why Google Bots Refuse to Crawl Your Site

Kevin Reis
business.com Member
Nov 20, 2019

If your site isn't generating traffic, Google bots may not be crawling your webpages.

Over 5 billion Google searches are made each day! Everyone's on Google to find answers to their queries and discover new things. In fact, Google is rated as the most popular website in both the global and U.S. markets. If your business isn't featured on Google's search engine results page (SERP), you are doomed!

Why are Google bots important?

Crawling. Indexing. Ranking. These are the three basic steps Google's automated search robots (also called crawlers or spiders) work through to generate the results you see on the SERP. If your website is unfriendly to these crawlers, you stand no chance of attracting organic traffic to your site.

So, how can you make Google bots find and crawl your site? First, know where you stand: conduct a thorough SEO audit of your site to gauge its onsite, offsite and technical SEO performance. Second, determine how many pages are indexed. Simply type "site:yoursite.com" into the Google search bar. If the number of results is drastically lower than the actual number of pages on your site, Google is not crawling all of your pages and you need to do something about it.

Six reasons why Google bots aren't crawling your site

Without further ado, let's understand what makes a website crawler-unfriendly and what webmasters can do about it.

1. You have blocked Google bots.

Is Google not indexing your entire website? In this case, the first thing you need to check is your robots.txt file. Look for code snippets that disallow the bots from crawling any page on your site and simply remove such code.
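
For example, a robots.txt file that contains the following two lines tells every compliant crawler, Googlebot included, to stay away from the entire site:

  User-agent: *
  Disallow: /

Deleting the "Disallow: /" line, or narrowing it to a specific folder (for instance, "Disallow: /admin/"), lifts the sitewide block.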

Further, check individual pages for a crawl block using the URL inspection tool in Google Search Console. If the tool reports that the URL is blocked by robots.txt, remove the blocking rule so Google bots can crawl and index the page.

At times, it takes more than a week for Google to crawl a new website. In such cases, it is wise to open a Google Search Console account and point Google to your sitemap URL. If your site doesn't have a sitemap, create one now.
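
A sitemap is simply an XML file that lists the URLs you want Google to discover. A minimal version looks something like the sketch below, where the URL and date are placeholders for your own pages:

  <?xml version="1.0" encoding="UTF-8"?>
  <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <url>
      <loc>https://yoursite.com/example-page/</loc>
      <lastmod>2019-11-20</lastmod>
    </url>
  </urlset>

Submit the sitemap's URL in Search Console, and consider referencing it from robots.txt with a line such as "Sitemap: https://yoursite.com/sitemap.xml" so crawlers can find it on their own.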

Another way your site may be barring search indexing is with a "noindex" robots meta tag. If you see the following tag in a page's <head>, remove it to allow Google to index that page.

  • <meta name="robots" content="noindex, nofollow">

2. You haven't created a Google Search Console or Analytics account yet.

Google Analytics is a free web analytics tool that collects and organizes traffic data into customizable reports, while Google Search Console offers webmasters in-depth information on how Google sees a website.

Manually activating these Google services will send a signal to the Google bots that you are seriously working towards building your web presence. In fact, Search Console can help you gauge the health of your website and fix issues that are stopping your pages from getting indexed.

For instance, if you have a new page on your site, it's quite possible that Google hasn't had a chance to crawl it yet. The URL inspection tool in Search Console can help you find out whether or not the page is indexed and offer you a complete report. So, say hello to Google by setting up a Search Console account and visit it regularly to see how your site performs in the SERP.
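
When you set up Search Console, one common verification option is the HTML-tag method: Google generates a meta tag for you to paste into the <head> of your homepage. It looks something like the line below, with the content value standing in for the unique token Google gives you.

  • <meta name="google-site-verification" content="your-unique-verification-token">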

Another point to bear in mind is that the old Google Search Console allowed webmasters to have Google test, render, crawl and index any URL using the "Fetch as Google" tool. Though this feature doesn't exist in the new version, you can still ask Google to index your webpages by requesting indexing through the URL inspection tool.

3. Your website has a poor internal linking profile.

Internal links are key to helping Google find, understand and index your webpages. They enable users to easily navigate a site, establish information hierarchy and spread link equity through the site. For instance, according to Moz, the optimal link structure for a website should look like a pyramid, with your homepage at the top of the structure.

Most e-commerce sites, including Amazon, use this structure and add internal links from their most authoritative pages. Google recrawls such powerful pages often, which helps it discover the internal links and index the pages they point to. You can find the most authoritative pages on your website using tools like Google Analytics and Ahrefs Site Explorer.

Finally, Google bots do not crawl links that carry the rel="nofollow" attribute, so nofollowed internal links are effectively invisible to the crawler. Remove the nofollow attribute from internal links unless they point to an unimportant page that you deliberately want to keep out of the search engine's index.
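
As a quick illustration, the first link below is ignored by the crawler, while the second passes link equity normally (the URL and anchor text are placeholders):

  • <a href="/category/widgets/" rel="nofollow">Widgets</a>
  • <a href="/category/widgets/">Widgets</a>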

4. Google doesn't like your URL structure.

Google advises webmasters to keep URL structures simple and readable, so avoid long, complex URLs stuffed with parameters and IDs that can cause problems for crawlers. According to Google, such complex URLs create unnecessarily high numbers of addresses that all point to identical content on your site, which forces Google bots to consume more bandwidth than necessary to crawl your pages, or to skip some of them altogether.

Wherever possible, have a clean URL taxonomy that bots can understand. Further, use the robots.txt file to block the bot's access to problematic URLs if there are any.
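
For example, a parameter-heavy URL such as https://yoursite.com/products?category=12&sessionid=84ab&sort=price and a clean equivalent such as https://yoursite.com/products/blue-widgets/ may serve the same content, but the second is far easier for crawlers (and people) to interpret. If session or tracking parameters are spawning endless duplicate URLs, a robots.txt rule along these lines keeps bots out of them (the parameter name here is hypothetical):

  User-agent: *
  Disallow: /*?sessionid=
  Disallow: /*&sessionid=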

Permalinks are the permanent URLs of your pages and posts, and they help Google find each piece of content with ease. Google likes short URLs that clearly state the title or important keywords.

WordPress, by default, creates unwieldy permalinks or URL structures that may contain the post ID or the day, month and year of publication. These aren't preferred by Google. If your site runs on WordPress, use the "Post name" structure in the Permalink Settings on the WordPress dashboard.
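
To illustrate the difference this setting makes (the slug is a placeholder):

  • Default or date-based permalink: https://yoursite.com/?p=123 or https://yoursite.com/2019/11/20/sample-post/
  • "Post name" permalink: https://yoursite.com/sample-post/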

5. Google has temporarily removed your site from its index.

If your website failed to meet Google's quality guidelines or has a shady history, the search engine may deindex, penalize or remove your site from the search results.

  • Deindexed or banned: If a website is completely removed from Google's index, it has been deindexed.
  • Penalized: At times, a penalty may prevent your site from getting indexed. If your website or a page still exists but cannot be found in the search results, Google has penalized your site. A penalty can be applied automatically by Google's algorithm or issued manually by a human reviewer at Google.
  • Sandboxed: Google Sandbox is an alleged filter that prevents new websites from ranking high. If the traffic for your new site or page dropped suddenly and it wasn't deindexed or penalized, your site may have been sandboxed.

Generally, Google alerts webmasters when their websites violate quality guidelines. In such cases, it is advisable to fix the issues and then submit a reconsideration request asking Google to review the site.

6. You haven't optimized for Google bots.

Optimizing your website for Google bots isn't the same as search engine optimization. Once you submit your website to the search engine, Google bots crawl the pages for content. These spiders scan your site for meta content, keyword saturation, relevant content and certain other factors. Therefore, it is important to optimize your site for such scans.

Build a site that's indexable and offers relevant information to Google bots. Pay attention to the technical ranking factors to improve your site's crawler experience. Here are a few parameters that you shouldn't ignore.

  • Good-quality content: Create relevant and high-quality content for your audience. Google's algorithm rewards sites that offer original, relevant content with higher rankings than those that use fillers or share duplicate content. Though canonicalizing pages makes sense, do so wisely: canonicalization, when not done carefully, can confuse Google's spiders, making it tough for them to crawl and index your site.
  • Easy navigation: Make sure your website has a navigation bar that links to all major pages on your website.
  • Text-to-HTML ratio: Since Google's bots read text, make sure your website has a high text-to-HTML ratio (ideally, between 25% and 70%) in favor of text. Further, minimize JavaScript or make sure it loads after HTML, as bots get signals from the text in the HTML code. 
  • Site speed: Your site's loading time is an important ranking factor that Google bots consider when indexing your site. Make sure you test your site's speed and take the necessary measures to improve its loading time. 
  • Structured data: Schema markup, or structured data, gives context to your website, allowing Google's spiders to make sense of the content and index the pages with ease. Boost your site's SEO by using schema markup; see the sketch below for what it can look like.
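
To give a sense of what schema markup looks like, here is a minimal JSON-LD sketch for an article page, using this post's own title, author and date as sample values; adjust the type and properties to match your content.

  <script type="application/ld+json">
  {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Why Google Bots Refuse to Crawl Your Site",
    "author": {
      "@type": "Person",
      "name": "Kevin Reis"
    },
    "datePublished": "2019-11-20"
  }
  </script>

You can paste markup like this into the page's HTML and validate it with Google's structured data testing tools before publishing.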

Regardless of how many backlinks they have or what high-quality content they share, crawler-unfriendly sites do not exist in the eyes of Google. If your website or webpages have crawlability issues, Google bots will not be able to discover or index them, causing you to lose your online ranking. The information shared in this post will help you identify why Google bots aren't crawling your site, allowing you to take the necessary corrective measures.

Image Credit: fizkes / Getty Images
Kevin Reis
business.com Member
His most recent accomplishment was initiating and scaling Shopify’s very first international SEO effort. He has been successful in helping Shopify achieve organic growth in over 25 international markets. Having graduated from the John Molson School of Business at Concordia University, he developed a keen interest in digital marketing. Kevin gained most of his experience in some of the most competitive fields in digital marketing, such as ecommerce and video streaming. He enjoys writing on a variety of topics, from technical and international SEO to content marketing.