A common misconception among Web users is that their computer stays connected to a Web site the entire time they are reading a page. In fact, the Web is transactional: a browser connects to the server, makes a request, receives a response, and the connection may then be closed.
HTTP is the protocol that determines how these client-server messages are formed. The details of an HTTP message are usually abstracted by Web programming languages as a convenience for developers, but there are tools available for viewing the exact message text. I recommend the Live HTTP Headers extension for Mozilla’s Firebird browser.
Let’s examine some of the more important pieces of information included in a request, and how they can be manipulated.
The first line of the request specifies an HTTP operation, a path, and the version of HTTP being used. It will look something like this: GET / HTTP/1.1
The GET operation asks the server for a resource. The path specifies where the resource is located on the server. In the above example, the path is specified as “/”, meaning the root resource will be returned (typically the homepage).
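To make the three parts of the request line concrete, here is a minimal Python sketch that splits one apart. The function name parse_request_line is my own invention for illustration; it is not part of any library, and a real server does considerably more validation than this.

```python
def parse_request_line(line):
    """Split an HTTP request line into (method, path, version).

    Illustrative helper only; the name and the error handling are assumptions.
    """
    parts = line.strip().split(" ")
    if len(parts) != 3:
        raise ValueError(f"malformed request line: {line!r}")
    method, path, version = parts
    if not version.startswith("HTTP/"):
        raise ValueError(f"unexpected protocol version: {version!r}")
    return method, path, version


# The request line from the example above.
print(parse_request_line("GET / HTTP/1.1"))  # ('GET', '/', 'HTTP/1.1')
```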
The next line of the request specifies the host. The host is the domain name of the server the resource is on, so a request that coupled the above GET line with a Host header naming the IT World Canada server would retrieve, for example, the IT World Canada homepage.
Here are some other important request headers:
User-Agent: The User-Agent header specifies the type of browser being used. This is important because the server may provide customized page views that take advantage of the strengths of different browsers. By editing this line of a request, you can “trick” the server into thinking you’re using Internet Explorer when you’re really running Mozilla.
Accept-Language: Specifies the natural language (English or French, for example) the client would prefer the content to be returned in, if the server supports internationalized content.
Referer: This is how your browser tells the server what page linked you to the resource you’re requesting. (The header name really is spelled “Referer”; the misspelling is part of the HTTP specification.) Some Web sites use this header to track the performance of banner ads on other sites. Others will use the Referer header to try to force a page to be viewed within a frameset – if the URL of the frameset isn’t the referring page, the content doesn’t get displayed.
Connection: In HTTP 1.0, specifying “Connection: keep-alive” allows multiple requests to occur over the same connection – that is, rather than disconnect from the server after the first request, the connection is kept active for a period of time so that additional requests for images, stylesheets, etc. will execute more quickly. Other headers could be used to specify options such as timeout, but keep-alive support was not very consistent under HTTP 1.0. In HTTP 1.1, connections are kept alive by default until either side sends the header “Connection: close”, which signals that the connection will be terminated after the current response.
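The headers described above can be assembled into a complete request message with a few lines of Python. This is only a sketch: the helper names build_request and is_persistent, the host www.example.com, and the browser string ExampleBrowser/1.0 are all placeholders of mine, and the keep-alive check implements just the default behaviour described for HTTP 1.0 and 1.1, not every rule in the specification. Note that the referring-page header is spelled “Referer” on the wire.

```python
def build_request(method, path, host, headers=None, version="HTTP/1.1"):
    """Return the text of an HTTP request with no body (illustrative helper)."""
    lines = [f"{method} {path} {version}", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # Lines end with CRLF, and an empty line marks the end of the headers.
    return "\r\n".join(lines) + "\r\n\r\n"


def is_persistent(version, headers):
    """Decide whether the connection stays open after this exchange."""
    token = ""
    for name, value in headers.items():
        if name.lower() == "connection":  # header names are case-insensitive
            token = value.strip().lower()
    if version == "HTTP/1.1":
        return token != "close"       # persistent unless told otherwise
    return token == "keep-alive"      # HTTP 1.0 must ask for keep-alive


# Placeholder values; editing User-Agent here is the "trick" described above.
headers = {
    "User-Agent": "ExampleBrowser/1.0",
    "Accept-Language": "en-CA",
    "Referer": "http://www.example.com/links.html",
    "Connection": "close",
}
print(build_request("GET", "/", "www.example.com", headers))
print(is_persistent("HTTP/1.1", headers))  # False
```

Because the request is just text, changing what the server believes about your browser is a matter of editing one string before the message is sent.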
The first line of the server’s response will also indicate the version of HTTP, as well as a response code. A response code of 200 means the request was fulfilled. Most people have seen code 404, which is a “page not found” error. There are many other response codes, including 500 (internal server error) and 403 (forbidden).
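The status line can be pulled apart in much the same way as the request line. The sketch below uses the HTTPStatus enumeration from Python’s standard library to look up the standard phrase for a code; the helper names parse_status_line and describe are my own and purely illustrative.

```python
from http import HTTPStatus


def parse_status_line(line):
    """Split a response status line into (version, code, reason)."""
    parts = line.strip().split(" ", 2)  # the reason phrase may contain spaces
    if len(parts) < 2:
        raise ValueError(f"malformed status line: {line!r}")
    version = parts[0]
    code = int(parts[1])
    reason = parts[2] if len(parts) == 3 else ""
    return version, code, reason


def describe(code):
    """Return the standard phrase for a status code, or 'Unknown'."""
    try:
        return HTTPStatus(code).phrase
    except ValueError:
        return "Unknown"


print(parse_status_line("HTTP/1.1 404 Not Found"))  # ('HTTP/1.1', 404, 'Not Found')
for code in (200, 403, 404, 500):
    print(code, describe(code))
```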
Useful headers that may be found in the response include the date/time the request was fulfilled, the content type of the resource, as well as the last modified and expiry dates of the resource. Executing a HEAD operation instead of GET will return only the response headers and not the resource itself. This allows an HTTP client to acquire information about a resource before requesting a copy of it.
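The difference between HEAD and GET is easy to see with a small experiment. The following sketch uses only Python’s standard library: it starts a throwaway server on the local machine, then issues both operations against it. The handler class DemoHandler, the fetch helper, and the page content are all made up for the demonstration, and the Last-Modified value is an arbitrary fixed date.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

BODY = b"<html><body>Hello</body></html>"  # placeholder page content


class DemoHandler(BaseHTTPRequestHandler):
    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(BODY)))
        self.send_header("Last-Modified", self.date_time_string(0))
        self.end_headers()

    def do_HEAD(self):
        self._send_headers()  # headers only, no body

    def do_GET(self):
        self._send_headers()
        self.wfile.write(BODY)

    def log_message(self, format, *args):
        pass  # keep the demonstration quiet


def fetch(method):
    """Run one request against a temporary local server."""
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        conn.request(method, "/")
        response = conn.getresponse()
        headers = dict(response.getheaders())
        body = response.read()
        conn.close()
        return response.status, headers, body
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    for method in ("HEAD", "GET"):
        status, headers, body = fetch(method)
        print(method, status, headers.get("Content-Type"), len(body), "bytes of body")
```

Both responses carry the same headers, including the Content-Length of the page, but only the GET response includes the page itself.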
On an unrelated note, I’d like to thank everyone who responded to my last column’s request for native XML database success stories. Once again, I’m impressed with the creativity of ComputerWorld Canada readers.
Cooney works as a programmer/analyst for a major Canadian book publisher. He can be reached at [email protected].