Klixxx Magazine Archive - Tech "Know-How"
Using LWP Modules with Perl *
by Tim Schaefer
The Problem
Have you ever wanted to pull something off one Web site and use it in a program on your own Web site? The only apparent way is a macro-type program that repeats keystrokes in a web browser. But if you had a program that acted like a web browser, you could do much more than replay keystrokes. Macro programs have their uses, but they often lead to frustration and extra work.
The Solution
The magic of the Perl programming language can once again help you here, letting you create web-browser behavior right inside your Perl programs. In this article we'll look at LWP (libwww-perl), a set of modules that turn your Perl program into a web browser. LWP creates a web browser object for you, goes to the Web site of your choice, and pulls back the HTML just as if you were sitting at Netscape, Internet Explorer, or another web browser. Don't worry, you don't have to be a rocket scientist to use LWP. With a good example in hand, you'll soon be using it for all kinds of projects.
An Example
As always, a good example is necessary. Recently I needed to create a United States zip code database table, but I could not find the data anywhere in a form I could load into a database. The solution? Create my own with LWP and Perl. I found the USPS Web site very useful, but I could not see myself typing 100000 zip codes into this form:

Instead, I thought of using a program to gather this information automatically with Perl and LWP.
The Perl program I created runs a loop of roughly 100000 requests to the USPS Web site. Each request sends a zip code and reads back the matching city and state. The USPS does not actually use all 100000 possible five-digit codes; it uses more like half that, give or take. However, I don't know in advance which codes are valid, so I have to try all 100000. This may seem silly, and it may look like bad behavior to hit their site so many times. The USPS, though, already handles millions of hits a day from people using web browsers in the normal way. Because LWP behaves like a web browser, we are doing the same thing a person at a browser would do. The computer simply takes over the repetitive task of visiting the USPS Web site 100000 times.
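The loop itself is easy to sketch. The following is a minimal illustration and not the author's program. `candidate_zips` zero-pads every number in a range, so 501 becomes 00501. `crawl_zips` hands each candidate to a lookup routine that is passed in as a code reference. Both names are hypothetical. The optional pause between requests is an assumption added here to keep the load on the remote site modest.

```perl
use strict;
use warnings;

# Return every candidate from $first to $last as a zero-padded,
# five-digit string, so 501 becomes "00501" as a zip code form expects.
sub candidate_zips {
    my ($first, $last) = @_;
    return map { sprintf('%05d', $_) } $first .. $last;
}

# Hand each candidate to $lookup (a code reference) and collect every
# defined result. $pause is an optional delay in seconds between
# requests; it is an added assumption for politeness, not from the article.
sub crawl_zips {
    my ($lookup, $first, $last, $pause) = @_;
    my @found;
    for my $zip (candidate_zips($first, $last)) {
        my $result = $lookup->($zip);
        push @found, $result if defined $result;
        sleep $pause if $pause;
    }
    return @found;
}
```

A full run would look something like `crawl_zips(\&lookup_zip, 0, 99999, 1)`. Here `lookup_zip` is a hypothetical routine that fetches one zip code and returns a formatted line, or `undef` when the code is not valid.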

The diagram above shows roughly what happens when you use an LWP-enabled Perl program. In the diagram, Server A hits Server B. You can also get creative and have the two servers talk to each other, posting back and forth with LWP requests to retrieve data from each other across the web. In my zip code program, Server A is my little Linux box at home and Server B is the USPS Web site.
The Perl Program
On to the code. If you are not an experienced Perl programmer, fear not: the program is written simply to show you what to do. Read the comments throughout the program and follow it to the end.
You will need the LWP:: modules installed on your system to use them. They are available from CPAN at www.cpan.org.
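One quick way to confirm the modules are present is to try loading them at run time. This small sketch uses a hypothetical helper, `module_available`, which wraps `require` in `eval`. If a module shows up as missing, the CPAN shell can install it (for example, `cpan LWP::UserAgent`).

```perl
use strict;
use warnings;

# Return 1 if the named module can be loaded, 0 otherwise.
sub module_available {
    my ($module) = @_;
    # require with a string needs a file path: LWP::UserAgent -> LWP/UserAgent.pm
    (my $file = "$module.pm") =~ s{::}{/}g;
    return eval { require $file; 1 } ? 1 : 0;
}

# Report on the two modules a fetch routine would typically need.
for my $module (qw(LWP::UserAgent HTTP::Request)) {
    printf "%-16s %s\n", $module,
        module_available($module) ? 'installed' : 'MISSING';
}
```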
The most important function is the post_to_server routine. Inside it you set two variables that tell LWP where to surf and what to send.
The first variable, $post_path, holds the base URL of the Web site and the CGI program you wish to run, without any of the form values.
The second variable, $request_str, holds the name/value pairs the CGI program expects. For the USPS program, I went to the site, viewed the HTML source of its form, and noted the names of the form's input fields. You need to do the same for your own target site. Also make sure to set the request method to POST or GET, depending on what the receiving program expects.
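Here is a minimal sketch of how a routine like `post_to_server` might look, using `LWP::UserAgent` and `HTTP::Request`. It is an illustration, not the article's original code. The helpers `build_request_str` and `url_escape` are hypothetical, as are the example URL and the field name `zip` in the note below; take the real ones from the form's HTML source. The modules are loaded with `require` inside the routine, so the encoding helpers still work on a machine where LWP is not yet installed.

```perl
use strict;
use warnings;

# Percent-encode everything except unreserved URL characters.
sub url_escape {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $text;
}

# Hypothetical helper: turn name/value pairs into the
# "name1=value1&name2=value2" string a CGI program expects.
sub build_request_str {
    my (%fields) = @_;
    my @pairs;
    for my $name (sort keys %fields) {
        push @pairs, join '=', map { url_escape($_) } $name, $fields{$name};
    }
    return join '&', @pairs;
}

# $post_path:   URL of the CGI program, without any values
# $request_str: the encoded values
# $method:      'POST' (default) or 'GET', whichever the form uses
sub post_to_server {
    my ($post_path, $request_str, $method) = @_;
    $method //= 'POST';
    require LWP::UserAgent;
    require HTTP::Request;

    my $ua = LWP::UserAgent->new(timeout => 30);
    $ua->agent('ZipLookup/0.1');    # hypothetical User-Agent string

    my $req;
    if ($method eq 'GET') {
        # GET: the values travel in the query string.
        $req = HTTP::Request->new(GET => "$post_path?$request_str");
    } else {
        # POST: the values travel in the request body, form-encoded.
        $req = HTTP::Request->new(POST => $post_path);
        $req->content_type('application/x-www-form-urlencoded');
        $req->content($request_str);
    }

    my $res = $ua->request($req);
    # Hand the HTML back to the caller, or undef if the request failed.
    return $res->is_success ? $res->decoded_content : undef;
}
```

For example, `post_to_server('http://www.example.com/cgi-bin/lookup.cgi', build_request_str(zip => '43431'), 'POST')` would submit one zip code to a hypothetical form. Because the routine returns `undef` on any HTTP error, a calling loop can simply skip that zip code.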
At the end of the post_to_server routine, the return statement sends the content back to the calling function.
Once we get all that HTML back from the USPS Web site, we ignore most of it. The rest of the program scans through the HTML, skipping ahead until it finds the city and state. It then prints the city, state, and zip code for each zip code it checks.
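Scanning the HTML can be done with a couple of regular expressions. This is a rough sketch under stated assumptions, not the original program. `html_to_lines` (a hypothetical name) turns tags into line breaks and trims the visible text. `first_match` then skips through those lines until one matches a pattern. A regex approach like this is only adequate for a simple page whose layout you already know. For anything more complicated, an HTML parser such as `HTML::TokeParser` from CPAN is a sturdier choice.

```perl
use strict;
use warnings;

# Reduce an HTML page to its visible text, one trimmed line per chunk.
sub html_to_lines {
    my ($html) = @_;
    $html =~ s/<[^>]*>/\n/g;    # every tag becomes a line break
    $html =~ s/&nbsp;/ /gi;     # decode the two most common entities
    $html =~ s/&amp;/&/gi;
    my @lines;
    for my $line (split /\n/, $html) {
        $line =~ s/^\s+|\s+$//g;    # trim leading/trailing whitespace
        push @lines, $line if length $line;
    }
    return @lines;
}

# Skip through the lines until one matches $pattern; return it,
# or undef if nothing matches.
sub first_match {
    my ($pattern, @lines) = @_;
    for my $line (@lines) {
        return $line if $line =~ $pattern;
    }
    return;
}
```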
Sample Output
Here is a sample of the output from the program:
GIBSONBURG |OH|43431|
GRAYTOWN |OH|43432|
GYPSUM |OH|43433|
HARBOR VIEW |OH|43434|
HELENA |OH|43435|
ISLE SAINT GEORGE |OH|43436|
JERRY CITY |OH|43437|
KELLEYS ISLAND |OH|43438|
LACARNE |OH|43439|
LAKESIDE MARBLEHEAD |OH|43440|
The data is pipe-delimited, meaning a pipe character (|) separates the fields, so it can be loaded straight into a database table.
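Reading the file back is a matter of splitting each line on the pipe. The sketch below assumes the DBI module and an existing table. The table name `zipcodes`, its column names, and the helper names `parse_zip_line` and `load_file` are all hypothetical. The `?` placeholders let DBI quote each value, so city names containing apostrophes or other special characters load safely.

```perl
use strict;
use warnings;

# Split one output line such as "GIBSONBURG |OH|43431|" into a hash.
# split drops the empty field after the trailing pipe; padding spaces
# around each value are trimmed.
sub parse_zip_line {
    my ($line) = @_;
    chomp $line;
    my @fields = split /\|/, $line;
    s/^\s+|\s+$//g for @fields;
    my ($city, $state, $zip) = @fields;
    return { city => $city, state => $state, zip => $zip };
}

# Insert each well-formed line of the file through one prepared
# statement and return how many rows were sent. $dbh is a DBI handle.
sub load_file {
    my ($dbh, $path) = @_;
    my $sth = $dbh->prepare(
        'INSERT INTO zipcodes (city, state, zip) VALUES (?, ?, ?)');
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my $count = 0;
    while (my $line = <$fh>) {
        my $row = parse_zip_line($line);
        next unless defined $row->{zip} && $row->{zip} =~ /\A[0-9]{5}\z/;
        $sth->execute(@{$row}{qw(city state zip)});
        $count++;
    }
    close $fh;
    return $count;
}
```

You would pass in a handle from something like `DBI->connect('dbi:mysql:database=geo', $user, $password, { RaiseError => 1 })`, where the database name is again only an example. Storing the zip column as text (for instance CHAR(5)) rather than as a number keeps leading zeros intact.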
Notice that the Perl program does a good job of separating cities from states, even for city names with more than one word. Notice also that it takes only the first ACCEPTABLE town marked (DEFAULT). Other towns may share the same zip code, but in most situations you can use the DEFAULT city as the town name for a given zip code. The Post Office goes by the DEFAULT name no matter what, so this gives us zip code data well suited to our purposes.
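Picking the DEFAULT town and splitting off the state can be sketched with one regular expression. The line layout assumed here, `CITY NAME ST (DEFAULT)`, is hypothetical because the real page's markup is not shown, so the sketch demonstrates the technique rather than the exact USPS format. The regex first matches the city non-greedily. It then takes a two-letter state code as the last word before the marker. Together these let multi-word names like LAKESIDE MARBLEHEAD come through whole.

```perl
use strict;
use warnings;

# Return (city, state) from the first line marked "(DEFAULT)", or an
# empty list if no such line exists. Assumed layout: "CITY NAME ST (DEFAULT)".
sub default_city {
    my (@lines) = @_;
    for my $line (@lines) {
        next unless $line =~ /\(DEFAULT\)/;
        # Non-greedy city, then the two-letter state just before the marker.
        if ($line =~ /^\s*(.+?)\s+([A-Z]{2})\s+\(DEFAULT\)/) {
            return ($1, $2);
        }
    }
    return;
}
```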
A classic use of a zip code database is to let a Web site that takes memberships validate a customer's zip code when the customer lives in the US. You could also spare US customers from entering their city and state and fill them in automatically after they sign up. I have found that some other countries, such as Australia, offer their own postcode databases for download, which saves you from writing a program like this one.
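Once the pipe-delimited file exists, the lookup itself is small. This sketch, with hypothetical helper names, builds a hash keyed by zip code and then checks a customer's entry. Anything that is not exactly five digits, or not in the table, is rejected. A valid code returns its city and state for filling in the form.

```perl
use strict;
use warnings;

# Build a zip => [city, state] table from lines like "GIBSONBURG |OH|43431|".
sub build_lookup {
    my (@lines) = @_;
    my %by_zip;
    for my $line (@lines) {
        my ($city, $state, $zip) = split /\|/, $line;
        next unless defined $zip;
        s/^\s+|\s+$//g for $city, $state, $zip;    # trim padding
        $by_zip{$zip} = [$city, $state];
    }
    return \%by_zip;
}

# Return (city, state) for a five-digit zip code, or an empty list
# when the input is malformed or the code is not in the table.
sub city_state_for {
    my ($lookup, $zip) = @_;
    return unless defined $zip && $zip =~ /\A[0-9]{5}\z/;
    my $entry = $lookup->{$zip} or return;
    return @$entry;
}
```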
For more on using LWP, head over to the popular Perl Web sites and dig in. For now, though, you should be able to use this program as a guide and get started right away.
Another fine use of LWP would be to set up a MySQL server behind a firewall and have a server outside the firewall send data to it and retrieve data from it. Many ISPs do not offer MySQL for your Web site. However, if you know the IP address of your own MySQL server, say at home or elsewhere, you can set up your web server to use LWP to exchange information with that box. Because LWP speaks HTTP rather than the MySQL protocol, the MySQL box needs a small CGI program in front of the database to receive those requests. LWP!!
If anyone is interested in Perl code for this, let me know and I'll write about it in the next issue.

* Due to some elements of the Perl code contained in this article,
this article is not W3C standards compliant.