A fundamental part of ensuring that your web site appears in the search engine listings is making sure that search engine spiders can successfully crawl it. After all, if the spider can't reach your pages, then the search engine can't include them in its search listings.

Unfortunately many web sites use technologies or architectures that make them hostile to search engine spiders. A search engine spider is really just an automated web browser that must interpret your page's underlying HTML code, just like a regular browser does.

But search engine spiders are surprisingly unsophisticated web browsers. The most advanced spiders are arguably the equivalent of a version 2.0 web browser. That means the spider can't understand many web technologies and can't read parts of your web page. That's especially damaging if those parts include some or all of your page's links. If a spider can't read your links, then it won't crawl your site.

As a search engine marketing consultant, I'm often asked to evaluate new web sites soon after their launch. Search engine optimization is often neglected during the design process, when designers are focused on navigation, usability, and branding. As a result, many sites launch with built-in problems, and it's much harder to correct these issues once the site is complete.

Yet often it's only when their site fails to appear in the search engine listings that many companies call in an SEO.

That's a shame, because for small businesses, search engines are by far the most important source of traffic. Fully 85% of Internet users find sites through search engines. If a web site isn't friendly to search engines, its value is greatly reduced.

In this article, I'll give an overview of the issues that can keep a search engine spider from indexing your site. This list is by no means exhaustive, but it will highlight the most common issues that keep spiders from crawling your pages.

JavaScript Links
JavaScript is a wonderful technology, but it's invisible to all of the search engines. If you use JavaScript to control your site's navigation, spiders may have serious problems crawling your site.

Links contained in your JavaScript code are likely to be ignored by search engine spiders. That's especially true if your script builds links by combining several script variables into a fully formed URL.

For example, suppose you have the following script that sends the browser to a specific page in your site:
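
A script of this kind might look like the sketch below. It is an illustration only: the trackingCode value, the buildUrl() helper, and the www.mysite.com address are assumptions, and only the goToPage() name and the page and trackingCode variables come from the discussion that follows.

```javascript
// Hypothetical tracking value; a real site would set its own.
var trackingCode = "ref123";

// Hypothetical helper: joins several script variables into a full URL.
function buildUrl(page) {
  return "http://www.mysite.com/" + page + "?track=" + trackingCode;
}

// Adds the tracking code to the URL, then sends the browser there.
// Runs only in a browser, where window.location exists.
// A link would call it from a click handler, for example
// onclick="goToPage('catalog.html')".
function goToPage(page) {
  window.location.href = buildUrl(page);
}
```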



This script uses a function called goToPage() to add a tracking code onto the end of the URL before sending visitors to the page.

I've seen sites where every link on every page ran through a script such as this. In some cases the JavaScript is used to include a tracking code; in other cases it's used to send users to different domains based on the page. But in all of these cases the site's home page is the only one listed in the search engines.

None of the spiders include a JavaScript parsing engine that would allow them to interpret this type of link. Even if a spider could interpret this script, it would have trouble simulating the different mouse clicks that trigger execution of the goToPage() function with different values of the page variable, and it would have no idea what value should be used for trackingCode.

Spiders will either ignore the contents of your SCRIPT tag, or else will read the script content as if it were visible text.
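
To make the first behavior concrete, here is a rough sketch of a link extractor that skips SCRIPT blocks and collects only plain A HREF links. It is a simplified model, not the code of any real spider: the extractLinks() name and the sample page are assumptions, and the pattern handles only double-quoted href values.

```javascript
// Simplified model of a spider that does not run JavaScript.
function extractLinks(html) {
  // Drop SCRIPT blocks entirely, as a spider that ignores scripts would.
  var withoutScripts = html.replace(/<script\b[^>]*>[\s\S]*?<\/script>/gi, "");
  var links = [];
  // Collect the href value of each plain A tag (double-quoted values only).
  var pattern = /<a\b[^>]*\bhref="([^"]*)"/gi;
  var match;
  while ((match = pattern.exec(withoutScripts)) !== null) {
    links.push(match[1]);
  }
  return links;
}

// Hypothetical page: one plain link, one link driven by goToPage().
var samplePage =
  '<a href="/about.html">About</a>' +
  '<a href="#" onclick="goToPage(\'catalog.html\')">Catalog</a>' +
  '<script>function goToPage(page) { /* builds the URL */ }</script>';

console.log(extractLinks(samplePage)); // [ '/about.html', '#' ]
```

The plain link is found, but the catalog page never appears in the list, because its address exists only as an argument to the script.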

As a rule of thumb, it's best to avoid JavaScript navigation.

DHTML menus
DHTML drop-down menus are extremely popular for site navigation. Unfortunately, they're also hostile to search engine spiders, since the spiders again have problems finding links in the JavaScript code used to create these menus.

DHTML menus have the added problem that their code is often placed in external JavaScript files. While there are good reasons to put your script code into external files, some spiders won't fetch these pure JavaScript files.

If you use DHTML menus on your site and want to see what effect they have on search engines, try turning JavaScript off in your browser. The drop-down part of your menus will disappear, and there's a chance the top-level menus will disappear too. Yikes! Suddenly most of the pages in your site are unreachable. And that's the way they are to the search engines.

Query Strings
If you have a database-driven site that uses server-side technologies such as ASP, PHP, ColdFusion, or JSP, there's a good chance your URLs include a query string. For example, you might have a URL like this one:

http://www.mysite.com/catalog.asp?item=320&category=23
That's a problem, because many search engine spiders won't follow links that include a query string. This is true even if the page that the link points to contains nothing but standard HTML. The URL itself is a barrier to the spider.

Why? Most search engines have made a conscious design decision not to follow query string links because they require additional record keeping by the spider. Spiders keep a list of all the pages they've crawled, and try to avoid indexing the same page twice in a single crawl. They do this by comparing all new URLs to the list of URLs they've already seen.

Now, suppose the spider sees a URL like this one in your site:

http://www.mysite.com/catalog.asp?category=23&item=320
This URL leads to the same page as our first query string URL, even though the URLs are not identical. (Notice that the name/value pairs in the query string are in a different order.)

To recognize that this URL leads to the same page, the spider would have to decompose the query string and store each name/value pair. Then, each time it sees a URL with the same root page, it would need to compare the name/value pairs of its query string to all the previous ones it has on file.
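
The extra bookkeeping can be sketched as follows. This is only one way a crawler might do it, not a description of any real spider: the canonicalUrl() and shouldCrawl() names are hypothetical, and the sketch relies on the standard URL and URLSearchParams objects to sort the name/value pairs before comparing.

```javascript
// Returns a canonical form of a URL: same page, name/value pairs sorted by name.
function canonicalUrl(rawUrl) {
  var url = new URL(rawUrl);
  url.searchParams.sort();
  return url.toString();
}

// The spider's list of pages it has already seen in this crawl.
var seen = new Set();

// Returns true the first time a page is seen, false for any reordering of it.
function shouldCrawl(rawUrl) {
  var key = canonicalUrl(rawUrl);
  if (seen.has(key)) {
    return false;
  }
  seen.add(key);
  return true;
}

console.log(shouldCrawl("http://www.mysite.com/catalog.asp?item=320&category=23")); // true
console.log(shouldCrawl("http://www.mysite.com/catalog.asp?category=23&item=320")); // false
```

Every URL with a query string has to be parsed, sorted, and rebuilt before it can be compared, which is the additional work that plain URLs do not require.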

Keep in mind that our example query string is fairly short. I've seen query strings that are 200 characters long, and reference a dozen different name/value pairs.

So indexing query string pages can mean a good deal of extra work for the spider.

Some spiders, such as Googlebot, will handle URLs with a limited number of name/value pairs in the query string. Other spiders will ignore all URLs containing query strings.

Flash
Flash is cool; in fact, it's much cooler than HTML. It's dynamic and cutting edge. Unfortunately, search engine spiders use trailing-edge technology. Remember: a search engine spider is roughly equivalent to a version 2.0 web browser. Spiders simply can't interpret newer technologies, such as Flash.

So even though that Flash animation may amaze your visitors, it's invisible to the search engines. If you're using Flash to add a bit of spice to your site, but most of your pages are written in standard HTML, this shouldn't be a problem.

But if you've created your entire site using Flash, you'll have a serious problem getting your site into the search engines.

Frames
Did I mention that search engine spiders are low-tech? That's right, they're so low-tech they don't understand frames either. If you use frames, a search engine will be able to crawl your home page, which contains the FRAMESET tags, but it won't be able to find the individual FRAME tags that make up the rest of your site.

In this case, at least, you can work around the problem by including a NOFRAMES section on your home page. This section of your page will be invisible to anyone using a frames-capable browser, but it allows you to place content that is visible to the search engines and other frame-blind browsers.

If you do include a NOFRAMES section, be sure to put real content in there. At a minimum you should include standard hypertext links (A HREF) pointing to your individual frame pages.

It's surprising how often people include a NOFRAMES section that simply says "This site requires frames. Please upgrade your browser." If you'd like to try an experiment, run a query on Google for the phrase "requires frames." You'll see somewhere in the neighborhood of 160,000 pages returned, all of which include the text "this site requires frames." Each of these sites has limited search engine visibility.
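
If you want a quick check of your own home page, a small script can report which links, if any, sit inside its NOFRAMES section. The sketch below is a rough illustration under stated assumptions: the noframesLinks() name and both sample pages are hypothetical, and the pattern recognizes only double-quoted href values.

```javascript
// Returns the hrefs of plain links found inside the NOFRAMES section, if any.
function noframesLinks(html) {
  var section = /<noframes\b[^>]*>([\s\S]*?)<\/noframes>/i.exec(html);
  if (section === null) {
    return []; // no NOFRAMES section at all
  }
  var links = [];
  var pattern = /<a\b[^>]*\bhref="([^"]*)"/gi;
  var match;
  while ((match = pattern.exec(section[1])) !== null) {
    links.push(match[1]);
  }
  return links;
}

// Hypothetical home page with real links in its NOFRAMES section.
var friendlyPage =
  '<frameset cols="20%,80%"><frame src="menu.html"><frame src="main.html">' +
  '<noframes><a href="menu.html">Menu</a> <a href="main.html">Home</a></noframes>' +
  '</frameset>';

// Hypothetical home page that only tells visitors to upgrade.
var unfriendlyPage =
  '<frameset cols="20%,80%"><frame src="menu.html"><frame src="main.html">' +
  '<noframes>This site requires frames. Please upgrade your browser.</noframes>' +
  '</frameset>';

console.log(noframesLinks(friendlyPage));   // [ 'menu.html', 'main.html' ]
console.log(noframesLinks(unfriendlyPage)); // []
```

An empty result means a frame-blind spider has nothing to follow beyond the home page.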

With www or without www?
The address of this web site is www.keyrelevance.com, but can people reach it if they leave off the initial "www"? For most server configurations, the answer is "yes," but for some the answer is "no." Make sure your site is reachable with and without the www.

This list presents some of the most common reasons why a search engine may not be indexing your site. Other factors, such as the way you structure the hierarchy of your web pages, will also affect how much of your site a spider will crawl.

Each of these problems has a solution, and in future articles I'll expand on each one to help you get more of your pages indexed.

If you're currently redesigning your web site, I'd encourage you to consider these issues before the site goes live. While each of these search engine barriers can be removed, it's better to start with a search engine friendly design than to fix hundreds of pages after launch.


This article is free for republishing
Source: http://www.articlealley.com/article_1134_6.html