Search engines are limited in how they crawl the web and interpret content. A webpage doesn't always look the same to you and me as it looks to a search engine. In this section, we'll focus on specific technical aspects of building (or modifying) web pages so they are structured for both search engines and human visitors alike. This is an excellent part of the guide to share with your programmers, information architects, and designers, so that all parties involved in a site's construction can plan and develop a search-engine friendly site.

 

To be listed in the search engines, your most important content should be in HTML text format. Images, Flash files, Java applets, and other non-text content are often ignored or devalued by search engine spiders, despite advances in crawling technology. The easiest way to ensure that the words and phrases you display to your visitors are visible to search engines is to place them in the HTML text on the page. However, more advanced methods are available for those who demand greater formatting or visual display styles:

  1. Images in gif, jpg, or png format can be assigned “alt attributes” in HTML, providing search engines a text description of the visual content.
  2. Search boxes can be supplemented with navigation and crawlable links.
  3. Content contained in Flash or Java plug-ins can be supplemented with text on the page.
  4. Video and audio content should have an accompanying transcript if the words and phrases used are meant to be indexed by the engines.
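As a rough illustration of the first point, the sketch below scans a page's HTML for images that have no usable alt attribute. It is a minimal example built on Python's standard `html.parser` module; the class name `AltAuditor`, the function `images_missing_alt`, and the sample markup are all hypothetical and only serve the illustration.

```python
from html.parser import HTMLParser


class AltAuditor(HTMLParser):
    """Collect the src of every <img> that lacks a non-empty alt attribute."""

    def __init__(self):
        super().__init__()
        self.missing_alt = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr_map = dict(attrs)
        # A valueless or absent alt comes back as None; treat it as empty.
        alt = attr_map.get("alt") or ""
        if not alt.strip():
            self.missing_alt.append(attr_map.get("src", "(no src)"))


def images_missing_alt(html):
    """Return the src values of images a text-only reader cannot describe."""
    auditor = AltAuditor()
    auditor.feed(html)
    auditor.close()
    return auditor.missing_alt


# Hypothetical page fragment, for illustration only.
SAMPLE = """
<img src="panda.png" alt="A panda juggling three balls">
<img src="banner.jpg">
<img src="spacer.gif" alt="">
"""

print(images_missing_alt(SAMPLE))  # ['banner.jpg', 'spacer.gif']
```

An empty alt attribute can be a deliberate choice for purely decorative images, so a report like this is best read as a list of images to review rather than a list of errors.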

 

Seeing Like a Search Engine

Many websites have significant problems with indexable content, so double-checking is worthwhile. By using tools like Google's cache, SEO-browser.com, or the MozBar, you can see what elements of your content are visible and indexable to the engines. Take a look at Google's text cache of this page you are reading now. See how different it looks?
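If none of those tools is at hand, a very rough approximation of a text-only view can be sketched in a few lines of Python. This is not how any search engine actually processes a page; it simply keeps the HTML text and image alt text and drops everything else, which is enough to show why a Flash-only page looks empty. The names `TextOnlyView` and `text_only_view` and both sample pages are hypothetical.

```python
from html.parser import HTMLParser


class TextOnlyView(HTMLParser):
    """Keep only HTML text and image alt text, skipping scripts and styles."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag == "img":
            alt = dict(attrs).get("alt")
            if alt and alt.strip():
                self.chunks.append(alt.strip())

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def text_only_view(html):
    """Return the text fragments that survive in a text-only reading."""
    view = TextOnlyView()
    view.feed(html)
    view.close()
    return view.chunks


# Hypothetical pages, for illustration only.
FLASH_PAGE = (
    "<html><head><title>Juggling Pandas</title></head>"
    '<body><object data="pandas.swf"></object></body></html>'
)
HTML_PAGE = (
    "<html><head><title>Juggling Pandas</title>"
    "<style>h1 { color: red; }</style></head>"
    "<body><h1>Pandas that juggle</h1>"
    '<img src="panda.png" alt="A panda juggling three balls">'
    "<p>Watch our pandas perform daily.</p></body></html>"
)

print(text_only_view(FLASH_PAGE))  # ['Juggling Pandas']
print(text_only_view(HTML_PAGE))
```

The Flash-only page yields nothing but its title, while the HTML version exposes its heading, image description, and body copy.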

 

I think I have a problem with getting found. I built this huge Flash site for juggling pandas and I’m showing up nowhere on Google. What’s up?

 

[Image: Juggling Pandas comparison]

Whoa! That's what we look like?

Using the Google cache feature, we're able to see that to a search engine, JugglingPandas.com's homepage doesn't contain all the rich information that we see. This makes it difficult for search engines to interpret relevancy.

I'm totally going to check out my Axe Battling Monkeys blog!

[Image: Axe Battling Monkeys]

That’s a lot of monkeys, and just headline text?

Hey, where did the fun go?

Uh oh... via Google cache, we can see that the page is a barren wasteland. There's not even text telling us that the page contains the Axe Battling Monkeys. The site is entirely built in Flash, but sadly, this means that search engines cannot index any of the text content, or even the links to the individual games. Without any HTML text, this page would have a very hard time ranking in search results.

It's wise to not only check for text content but to also use SEO tools to double-check that the pages you're building are visible to the engines. This applies to your images, and as we see below, your links as well.

 

Just as search engines need to see content in order to list pages in their massive keyword-based indices, they also need to see links in order to find the content. A crawlable link structure - one that lets their spiders browse the pathways of a website - is vital in order to find all of the pages on a website. Hundreds of thousands of sites make the critical mistake of structuring their navigation in ways that search engines cannot access, thus impacting their ability to get pages listed in the search engines' indices.

Below, we've illustrated how this problem can happen:

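[Diagram: Google's spider reaches page A, which links to pages B and E; no crawlable links point to pages C and D]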

In the example above, Google's spider has reached page "A" and sees links to pages "B" and "E". However, even though C and D might be important pages on the site, the spider has no way to reach them (or even to know they exist). This is because no direct, crawlable links point to those pages. As far as Google is concerned, they might as well not exist - great content, good keyword targeting, and smart marketing won't make any difference at all if the spiders can't reach those pages in the first place.
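The same situation can be reproduced with a small reachability check. The sketch below walks a link graph breadth-first from a starting page and reports which pages are never reached. The page names follow the example, but the link map itself is an assumption made for illustration (only the links from A to B and E come from the text), and `crawlable_pages` is a hypothetical helper, not a description of how any real crawler works.

```python
from collections import deque


def crawlable_pages(links, start):
    """Return every page reachable from `start` by following links."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Assumed link map: each page maps to the pages it links to.
# C and D link out, but nothing crawlable links to them.
SITE_LINKS = {
    "A": ["B", "E"],
    "B": ["A"],
    "C": ["A"],
    "D": ["A"],
    "E": [],
}

reachable = crawlable_pages(SITE_LINKS, "A")
orphaned = sorted(set(SITE_LINKS) - reachable)

print(sorted(reachable))  # ['A', 'B', 'E']
print(orphaned)           # ['C', 'D']
```

Adding a single crawlable link to C or D from any reachable page (for example, from B) is enough to move it out of the orphaned list.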
