Alright, let's get the obvious puns out of the way: Googlebot needs some natural spider enhancement, Googlebot needs a bigger band-width for your site, Googlebot needs to last longer crawling your site. Those are pretty bad… but if you came here for the pen15 jokes, thanks! Now, move on, perv.
What I'm actually going to talk about is site architecture and how you might be wasting Googlebot's crawl bandwidth on your site.
Site Architecture for SEO
How your site is structured is very important to SEO on many, many levels. For example, a proper site architecture allows SEOs to target keyphrases properly, develop authoritative pages and appease the mighty search spiders.
An ad-hoc site architecture can be difficult to work with, and it makes it even more difficult to achieve the results many sites are looking for from their SEO campaigns. So, before you jump into your site – plan ahead.
Let's say you are completely redoing your site. During the wireframing phase of your site's build-out, think SEO, or better yet, bring an SEO into the discussion so you know your major keyphrase targets are covered. This will make the optimization go smoothly down the road. An SEO will also be able to point out what is missing, so you can accommodate expansions now or make the site flexible enough to accommodate new content later.
Evaluate Your Platform
Take the time to really test out the platform you are considering for your site. Focus on how the platform builds URLs as you click around. What you are keeping an eye out for is URL bloat caused by improper canonicalization of URLs, or a general issue with how the platform actually works. This is a major consideration for larger e-commerce and news sites, because you never want to see the "Googlebot found an extremely high number of URLs" error.
Googlebot Found an Extremely High Number of URLs
The above error message means you have a serious issue on your hands. If Googlebot is having trouble with the number of URLs your site is offering up for crawling, you can bet that Bingbot and every other bot on the planet is probably having the same problem.
Your Whole Site isn’t being Crawled
The error basically tells you Google has used too much bandwidth crawling your site and isn't going to waste any more. This usually boils down to repetitive URLs displaying the same content (a canonicalization issue). Large sites that have tons of pages, each with unique content, won't run into this error; it's large sites with a dysfunctional CMS that will always get this message.
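To see how fast this gets out of hand, picture a single product page whose URL can carry an optional sort order, a session ID and a tracking tag (the parameter names below are made up for illustration). A quick back-of-the-envelope sketch:

```python
from itertools import product

# Hypothetical optional query parameters a sloppy CMS might append
# to one and the same product page.
sorts = [None, "sort=price", "sort=name", "sort=rating"]
sessions = [None, "sid=abc123"]
tags = [None, "ref=email", "ref=homepage"]

# Every combination is a distinct URL to Googlebot,
# even though they all render identical content.
urls = set()
for combo in product(sorts, sessions, tags):
    params = [p for p in combo if p]
    urls.add("/product/42" + ("?" + "&".join(params) if params else ""))

print(len(urls))  # 4 * 2 * 3 = 24 crawlable URLs for ONE page of content
```

Multiply that by a 10,000-product catalog and you can see why Googlebot throws up its hands.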
See Step 2
You may have missed something while evaluating your platform, so circle back on how your CMS is building your URLs. Googlebot has clearly found itself in an area of the site that is continuously spitting out URL after unnecessary URL.
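One way to audit this yourself is to normalize every URL your crawl (or your server logs) turns up and count how many collapse into the same page. A rough sketch, assuming the duplicate-makers are session IDs and tracking parameters (the parameter names in JUNK_PARAMS are hypothetical examples, not a definitive list):

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical parameters that change the URL but not the content.
JUNK_PARAMS = {"sid", "sessionid", "ref", "utm_source", "utm_medium"}

def canonicalize(url: str) -> str:
    """Collapse duplicate-producing URL variants into one canonical form."""
    scheme, netloc, path, query, _fragment = urlsplit(url)
    # Drop junk parameters and sort the rest so parameter order doesn't matter.
    params = sorted((k, v) for k, v in parse_qsl(query) if k not in JUNK_PARAMS)
    return urlunsplit((scheme, netloc.lower(), path, urlencode(params), ""))

variants = [
    "http://Example.com/product/42?sid=abc&sort=price",
    "http://example.com/product/42?sort=price&ref=email",
    "http://example.com/product/42?sort=price#reviews",
]
print({canonicalize(u) for u in variants})
# All three variants collapse to a single canonical URL.
```

If a handful of canonical URLs accounts for thousands of raw URLs in your logs, you've found the leak your CMS is springing.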
Google even offers up a bit of advice on this too: http://www.google.com/support/webmasters/bin/answer.py?answer=76401
Googlebot can’t afford Enzyte
Google is crawling the entire Internet. Everything! They just can't afford to spend forever crawling your dysfunctional site in the hope of eventually finding the content they were meant to find. A proper architecture, and a platform that respects that architecture, will help keep you out of trouble.