Using Web Fetching to Display Interesting Data

Author: Jonathan

Collecting information from a website and presenting it to your visitors in a different, more useful or interesting format can be a valuable service. It can help attract new visitors and keep existing visitors coming back for more.

The best way to collect data from a website is to use its web service, if it provides one. Google, Yahoo, MSN, Amazon, eBay, Technorati and numerous others all offer web services. These are well-documented interfaces through which you can easily collect data. The problems come when you want to collect data from a site that doesn't provide a web service.
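When a web service is available, collecting its data can take only a few lines. The sketch below assumes a hypothetical service that returns UTF-8 encoded JSON; the `fetch_json` name is illustrative, and each real service documents its own endpoints, parameters and response formats.

```python
import json
import urllib.request


def fetch_json(url: str, timeout: float = 10.0):
    """Fetch a URL and decode its body as JSON.

    Assumes a hypothetical web service that returns UTF-8 encoded JSON;
    check the real service's documentation for its actual format.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Because the service returns structured data, there are no patterns to maintain, which is exactly why it beats scraping.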

If a site doesn't offer a web service or some other well-structured version of its data, the only option remaining is web scraping. This can be messy, often requires extensive testing and debugging, and will break whenever the target website changes its design. With all these problems, it should only ever be a last resort.

The reason it is so difficult is that you are extracting the data you need from the HTML used to display the website to each visitor. Every site's HTML is unique, so every web scraper is unique too.

If the site we are interested in doesn't offer a web service or an RSS feed, and we can't get the data we need from anywhere else, how do we go about building a web scraper?

The process can be broken down into three parts: get the HTML of the page, extract the data we need, and then do something with that data.

Depending on how the site you are fetching information from is set up, fetching the HTML could be very easy or exceedingly complicated. At its simplest, all you need to do is open the page just as you would open a file on your local server. If you don't need to log in to view the data you want, it may well be this easy. For your sake, I hope it is.
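When no login is needed, fetching a page can be a single call. A minimal sketch using Python's standard `urllib.request`; the `fetch_html` name, the placeholder User-Agent string and the fallback to UTF-8 when the server declares no charset are illustrative assumptions.

```python
import urllib.request


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Return the HTML of a page that does not require a login."""
    # Some sites reject requests without a User-Agent; this value is a placeholder.
    request = urllib.request.Request(url, headers={"User-Agent": "example-scraper/0.1"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Use the charset the server declares, falling back to UTF-8 (an assumption).
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

For example, `fetch_html("https://example.com/")` returns the page source as a string, ready for the extraction step.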

If you need to log in to the site to access the data, things can get complicated... really complicated. I'm currently trying to collect or develop scripts to fetch the contact lists from webmail services such as Hotmail. Here the obstacles you face include cookies and variables in the URL. These problems can be overcome, but you will need plenty of time.
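The cookie side of a login can be handled by storing each response's cookies in a cookie jar and sending them back on later requests. A minimal sketch with Python's standard `http.cookiejar`; the login URL and the `username`/`password` form field names are hypothetical, since every site names its form fields differently, and real webmail logins usually involve more steps (hidden form tokens, redirects) than shown here.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session():
    """Return an opener that stores cookies and sends them back automatically."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(login_url: str, username: str, password: str) -> urllib.request.Request:
    """Build a POST request for a hypothetical login form.

    The field names "username" and "password" are assumptions; inspect the
    target site's form to find the real ones.
    """
    form = urllib.parse.urlencode({"username": username, "password": password})
    return urllib.request.Request(login_url, data=form.encode("ascii"), method="POST")


# Usage (not executed here; the URLs are placeholders):
# opener, jar = make_session()
# opener.open(build_login_request("https://example.com/login", "me", "secret"))
# html = opener.open("https://example.com/contacts").read()
```

Reusing the same opener for every request is the key design choice: the session cookie set by the login response is sent back automatically on the requests that follow.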

Once you have the HTML for the page, you need to clean it up so that you are left with just the data you need and no extra text. The best way to do this is with regular expressions. By defining patterns that match the data you want, you can cut out the chaff and keep only what you need. Unless you are extremely comfortable with regular expressions, you may find it easier to apply several simpler patterns one after another. Your script may take a little longer to run, but it will be far easier to develop.
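To illustrate the several-simple-patterns approach, this sketch narrows the page in stages: one pattern isolates a block, a second pulls out the table cells, and a third strips any leftover tags. The `<table id="results">` markup and the `extract_cells` name are hypothetical; patterns like these are tied to one site's HTML and will need rewriting when that HTML changes.

```python
import html
import re

# Stage 1: isolate the part of the page that holds the data (hypothetical markup).
BLOCK_RE = re.compile(r'<table id="results">(.*?)</table>', re.DOTALL | re.IGNORECASE)
# Stage 2: pull out the contents of each table cell.
CELL_RE = re.compile(r"<td[^>]*>(.*?)</td>", re.DOTALL | re.IGNORECASE)
# Stage 3: strip any tags left inside a cell.
TAG_RE = re.compile(r"<[^>]+>")


def extract_cells(page: str) -> list[str]:
    """Return the cleaned text of every cell in the results table."""
    block = BLOCK_RE.search(page)
    if block is None:
        return []  # The layout may have changed; nothing matched.
    cells = CELL_RE.findall(block.group(1))
    # Remove nested tags, decode entities such as &amp; and trim whitespace.
    return [html.unescape(TAG_RE.sub("", cell)).strip() for cell in cells]
```

Given `<table id="results"><tr><td><b>Alice</b></td><td>alice@example.com</td></tr></table>`, `extract_cells` returns `['Alice', 'alice@example.com']`. Each pattern is simple enough to test and debug on its own.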

Finally, you want to do something with the data you gather. This may be as simple as storing it in a text file or displaying it immediately to a visitor to your site, or you may decide to perform more complex tasks with it. The important thing is that once you have the data, what you do with it is up to you.
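Storing the results can be as simple as appending them to a text file. A minimal sketch using Python's standard `csv` module; the comma-separated layout, the `save_records` name and the file path are assumptions chosen for illustration.

```python
import csv
from pathlib import Path


def save_records(records: list[list[str]], path: Path) -> int:
    """Append scraped records to a CSV file and return how many were written."""
    # newline="" lets the csv module control line endings itself.
    with path.open("a", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        # Fields containing commas or quotes are quoted automatically.
        writer.writerows(records)
    return len(records)
```

Opening the file in append mode means each run of the scraper adds to the file rather than overwriting earlier results.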

Web scraping isn't easy, but the benefits can be considerable. If a web service is available, though, save yourself some time and use that instead.

Article Source: http://www.articlesbase.com/programming-articles/using-web-fetching-to-display-interesting-data-97215.html

About the Author:

TorrentialWebDev focuses on developing the tools to build and promote Web Applications.
