A little taste of cookie cookery


Let’s look at an old Internet favourite – a technology that is pervasive on the Web, old as the hills (at least by Internet standards) and fairly misunderstood. The technology in question is cookies, those little nuggets of data that are ignored by most and considered to be “the end of civilization as we know it” by some.

Cookies are tiny chunks of data that Web sites hand to, and receive back from, your Web browser. Sites use them to track your travels, to tag your site hopping so that you become statistically significant, or to make your preferences available on subsequent visits.

The way cookies are created is simple: When your browser makes a request to a Web server, the server replies, and a special field in the response header instructs your browser to store the cookie data supplied by the server. Here’s what the header of a server response looks like when it includes a cookie-setting request:

HTTP/1.1 200 OK
Date: Wed, 04 Sep 2002 20:20:13 GMT
Server: Apache/1.3.12 (Unix) mod_ssl/2.6.6 OpenSSL/0.9.6 mod_fastcgi/2.2.10
Set-Cookie: Apache=; path=/; expires=Fri, 03-Sep-04 20:20:13 GMT
Connection: close
Content-Type: text/html

This header is from a request for the root page of Network World (U.S.)’s Web server, www.nwfusion.com, and the tool we used was Ipswitch’s WS_Ping ProPak (US$37.50; www.ipswitch.com/Products/WS_Ping/), which has a feature that lets you retrieve Web pages as plain text (among other things).

By now you’ve probably figured out that the header line that is relevant to us is the one starting “Set-Cookie:.” This is a request that tells a cookie-compliant browser to take the data following the request and create a file to store it in.
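To make that concrete, here is a minimal Python sketch that picks the Set-Cookie line out of the response header shown above. The helper name find_set_cookie is our own invention, and the header text is pasted in as a string rather than fetched from a live server.

```python
# Header text copied from the example response above.
RAW_HEADER = """\
HTTP/1.1 200 OK
Date: Wed, 04 Sep 2002 20:20:13 GMT
Server: Apache/1.3.12 (Unix) mod_ssl/2.6.6 OpenSSL/0.9.6 mod_fastcgi/2.2.10
Set-Cookie: Apache=; path=/; expires=Fri, 03-Sep-04 20:20:13 GMT
Connection: close
Content-Type: text/html
"""


def find_set_cookie(raw_header):
    """Return the data after each 'Set-Cookie:' field (hypothetical helper)."""
    cookies = []
    for line in raw_header.splitlines()[1:]:  # skip the status line
        name, sep, value = line.partition(":")
        # Header field names are case-insensitive.
        if sep and name.strip().lower() == "set-cookie":
            cookies.append(value.strip())
    return cookies


print(find_set_cookie(RAW_HEADER))
```

A server may send more than one Set-Cookie field in a single response, which is why the sketch returns a list.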

The name of the cookie file is up to the browser implementer – Microsoft Corp.’s Internet Explorer under Windows names cookie files by appending the second-level domain name from the server’s URL to the current user’s name. Thus, our cookie for Network World Fusion is named gearhead@www.nwfusion.txt and is stored in the folder “C:\Documents and Settings\gearhead\Cookies.” Under the Netscape browser, cookies are stored in a file named “cookies” that can be found in “c:\netscape\users\default.”

The cookie data is defined by six parameters. These are the cookie name, its value, the expiration date, the path for which the cookie is valid, the domain the cookie is valid for and whether a secure connection must be available when the browser returns cookie data to a server.
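As a rough illustration of those six parameters, the following Python sketch splits the data from a Set-Cookie field into a dictionary. The function name parse_set_cookie and the dictionary layout are our own assumptions, and the parsing is deliberately simplified (it does not handle quoted values, for instance).

```python
def parse_set_cookie(data):
    """Split Set-Cookie data into the six parameters (simplified sketch)."""
    parts = [p.strip() for p in data.split(";") if p.strip()]
    name, _, value = parts[0].partition("=")
    cookie = {
        "name": name.strip(),
        "value": value.strip(),
        "expires": None,   # None means no expiration date was supplied
        "path": None,
        "domain": None,
        "secure": False,   # True only if the "secure" flag is present
    }
    for part in parts[1:]:
        key, _, val = part.partition("=")
        key = key.strip().lower()
        if key == "secure":
            cookie["secure"] = True
        elif key in ("expires", "path", "domain"):
            cookie[key] = val.strip()
    return cookie


example = parse_set_cookie("Apache=; path=/; expires=Fri, 03-Sep-04 20:20:13 GMT")
print(example)
```

Run against the data from our example header, this reports a path and an expiration date but no domain and no secure flag.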

The name in our example is “Apache,” and the value is “”. The name is significant only to the server that sets the cookie, and you’ll often see default names such as “Apache” and “SITESERVER” where a coding library has been used to handle cookies.

The domain is a critical part of this system because it defines the domain or subdomain to which the cookie data will be sent with each browser request. The path, in turn, defines the start of a subtree under the domain’s Web root to which the cookie applies; thus /info and /users under myserver.com could have different cookies. If a path is not set, it defaults to the path of the document creating the cookie.
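The domain and path test can be sketched in a few lines of Python. This is a simplified illustration under our own assumptions, not what any particular browser does: the function name cookie_applies is hypothetical, and real browsers apply further rules that are left out here.

```python
def cookie_applies(cookie_domain, cookie_path, host, request_path):
    """Decide whether a cookie is returned with a request (simplified)."""
    host = host.lower()
    domain = cookie_domain.lower().lstrip(".")
    # The host must be the domain itself or a subdomain of it.
    domain_ok = host == domain or host.endswith("." + domain)
    # The request path must fall inside the subtree the cookie names.
    prefix = cookie_path if cookie_path.endswith("/") else cookie_path + "/"
    path_ok = request_path == cookie_path or request_path.startswith(prefix)
    return domain_ok and path_ok


# A cookie set for the info subtree is sent for pages under it, and not for users.
print(cookie_applies("myserver.com", "/info", "www.myserver.com", "/info/index.html"))
print(cookie_applies("myserver.com", "/info", "www.myserver.com", "/users/index.html"))
```

The first call prints True and the second prints False, which is how two subtrees on one server can carry different cookies.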

Of course, you could just as easily skip the subtrees, set the path to “/” and have your cookies returned with every request. We’ve looked at quite a few cookies, and we suspect that per-subtree paths are rarely, if ever, used. The reason is obvious: the overhead of sending extra cookies is not significant, and a single path means less work when the Web site changes.

The expiration date is what you might guess: the date after which the cookie data is no longer valid. If a value isn’t set, then the cookie – called a “session cookie” – is stored in memory only and deleted when the browser exits.
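Here is a small Python sketch of that logic. The helper name cookie_status is our own, and the date layout is taken from the example header earlier in this column; other servers may format the date differently.

```python
from datetime import datetime, timezone

# Date layout used in the example header, e.g. "Fri, 03-Sep-04 20:20:13 GMT".
# Assumes English day and month names (the default "C" locale).
EXPIRES_FORMAT = "%a, %d-%b-%y %H:%M:%S GMT"


def cookie_status(expires, now):
    """Classify a cookie as 'session', 'valid' or 'expired' (hypothetical helper)."""
    if expires is None:
        return "session"  # no date: kept in memory until the browser exits
    when = datetime.strptime(expires, EXPIRES_FORMAT).replace(tzinfo=timezone.utc)
    return "expired" if now > when else "valid"


now = datetime.now(timezone.utc)
print(cookie_status(None, now))
print(cookie_status("Fri, 03-Sep-04 20:20:13 GMT", now))
```

The current time is passed in as an argument so that the check can be tried against any date you like.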

The designers of the cookie system never considered that someone might not want cookies to expire, so you’ll often see cookies with expiration dates such as some date and time in 2038.

That year is often the maximum year used for a really dumb reason: Active Server Pages in Microsoft Internet Information Server 3.0 and 4.0 and Microsoft Internet Information Services 5.0 have a small bug that causes an error – we quote from Microsoft Knowledge Base article Q247348: “This is caused by an overflow of the time_t variable in the C/C++ programming language. This variable is a 32-bit integer value used as an offset in seconds from January 1, 1970. This variable has a maximum value of 2147483647, which only allows dates through 3:14:07 GMT on January 19, 2038.”

An “overflow”! What can we say but “duh.”
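The arithmetic behind that limit is easy to check. The Python sketch below adds the largest signed 32-bit value, in seconds, to January 1, 1970; the variable names are ours.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
MAX_TIME_T = 2**31 - 1  # largest value a signed 32-bit integer can hold

last_moment = EPOCH + timedelta(seconds=MAX_TIME_T)
print(MAX_TIME_T)               # 2147483647
print(last_moment.isoformat())  # 2038-01-19T03:14:07+00:00
```

One second later, a signed 32-bit counter has nowhere left to go.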

Send any comments to gearhead@gibbs.com.