Gaga over Google

InfoWorld (U.S.)

With the word “Google” regularly used as both a noun and a verb around most offices these days, it’s a safe bet that employees in your enterprise already appreciate the simple interface, quality search results, and overall speed of the search engine. With the Google Search Appliance GB-1001, IT managers can now bring the power of Google into the enterprise.

I took a look at Google’s newest offering, and my test results demonstrated why Google continues to dominate the relatively small search-appliance market over competitors such as the Thunderstone Search Appliance. The box boasts most of the features of Google’s public search site but puts them under the full control of your IT staff. End-user training is not an issue with such a familiar front end, making the GB-1001 a must-have for enterprises that want to buy a search solution today and have it running in production tomorrow.

After a cursory scan of the instructions and a quick, painless setup of the bright yellow appliance, I was ready to go. Throughout my test, I kept the small documentation booklet by my side but rarely needed it because the online help was sufficiently comprehensive.

For my first test, I pointed the GB-1001 crawler at a server mirroring a public content Web site. Configuring the search crawler demonstrated just how far beyond simple keyword searching the Google engine goes. In a corporate intranet environment, I might want all searches for “emergency contacts” to return a URL with our company’s emergency contact list. Using Google’s KeyMatch feature, I was able to set up the engine to do just that.

In a similar vein, the Google Search Appliance offers a synonyms feature, enabling an administrator to easily set up suggested synonymous terms for users’ search terms. In my test, I configured searches for “William Gates” to suggest “Bill Gates,” and sure enough, all searches for “William Gates” returned a message that read: “You could also try: Bill Gates.” I had similar success when I set up “OpenBSD” as a synonym for “FreeBSD.”
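
To make the two features concrete, here is a minimal Python sketch of the behavior described above: one table promotes a URL for an exact query, and another attaches a “You could also try” suggestion. This is only an illustration of the idea, not the appliance’s configuration format; the table names, the function `decorate_query`, and the intranet URL are hypothetical.

```python
# Hypothetical illustration only; this is not how the GB-1001 is configured.

# KeyMatch-style table: an exact query promotes a specific URL.
KEYMATCHES = {
    "emergency contacts": "http://intranet.example.com/emergency-contacts.html",
}

# Synonym-style table: a query triggers a suggested alternative.
SYNONYMS = {
    "william gates": "Bill Gates",
}


def decorate_query(query):
    """Return the promoted URL and suggestion (if any) for a query."""
    # Normalize case and whitespace so "Emergency  Contacts" still matches.
    key = " ".join(query.lower().split())
    suggestion = SYNONYMS.get(key)
    return {
        "promoted_url": KEYMATCHES.get(key),
        "message": f"You could also try: {suggestion}" if suggestion else None,
    }
```

In this sketch the lookups are exact matches on the normalized query; how the appliance itself matches queries against its tables is not something the review covers.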

In the familiar search interface, you’ll find the same features you find on Google’s public search site, though you can change the look and feel via simple Web forms or XSLT (Extensible Stylesheet Language Transformations) style sheets.

The GB-1001 comes with the same automatic spell-checker found on Google’s public search site, and I was pleased to find that it worked with information specific to my own company. When I deliberately misspelled my own name in a search as “Chad Dikerson,” the search returned no results but asked, “Did you mean Chad Dickerson?” The built-in spell-checker is self-learning and does not have to be configured in any way. Very nice.
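
One way to picture a spell-checker that needs no configuration is to let the indexed documents supply the vocabulary. The Python sketch below does that with the standard library’s `difflib`; it is an assumption-laden illustration, not a description of Google’s algorithm, and the function name `did_you_mean`, the sample vocabulary, and the 0.8 similarity cutoff are all hypothetical choices. It returns the suggestion in lowercase.

```python
import difflib


def did_you_mean(query, vocabulary):
    """Suggest a corrected query built from words seen in the index.

    Returns None when no word needs correcting or no close match exists.
    """
    known = sorted(vocabulary)  # stable ordering for the matcher
    corrected = []
    changed = False
    for word in query.lower().split():
        if word in vocabulary:
            corrected.append(word)
            continue
        # Closest indexed word, if it is similar enough (cutoff is arbitrary).
        close = difflib.get_close_matches(word, known, n=1, cutoff=0.8)
        if close:
            corrected.append(close[0])
            changed = True
        else:
            corrected.append(word)
    return " ".join(corrected) if changed else None


# Hypothetical vocabulary harvested from indexed company documents.
vocabulary = {"chad", "dickerson", "peoplesoft", "content", "management"}
suggestion = did_you_mean("Chad Dikerson", vocabulary)
```

Because the vocabulary comes from the crawl, company-specific names become correctable as soon as they appear in an indexed document, which is the effect the test above showed.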

In my second test, I indexed content from InfoWorld’s production intranet, which includes a diverse mix of HTML, PDF, and Microsoft Word documents. The GB-1001 does an excellent job of unlocking the information inside the non-HTML documents that reside on most enterprise networks. My searches for “Peoplesoft set up” and “content management” returned a relevance-sorted list of links to HTML, Word, and PDF documents, and I found what I was looking for. The GB-1001 can also deliver results chronologically, and users can search within particular document types. The appliance caches all documents it indexes, so critical information is available when other network resources are down, though links to cached documents are easily disabled in the search results if you choose.

The GB-1001 can also be configured to send automatic status updates to administrators via e-mail, a feature that I found useful in keeping track of the daily crawl of the site.

The only downside to the Google Search Appliance GB-1001 is that it lacks an API, which is surprising because Google has been a pioneer in the Web services arena with its SOAP-based Google Web API. Many enterprises will neither need nor want API access to the appliance, but for those wanting to integrate search results into enterprise applications, this feature is notably missing. In a similar vein, API access to manipulate the search index on a per-URL basis would be useful.

Despite this shortcoming, the Google Search Appliance GB-1001 promised power, simplicity, and the quality search results you expect from Google, and it more than delivered. With an API for the system, the Google Search Appliance would be close to perfection.

How I tested

The Google Search Appliance was set up with a static IP address on the production network at the University of Hawaii’s Advanced Network Computing Lab. After initial setup, the testing was done remotely from the InfoWorld Test Center in San Francisco.

To test the Google Search Appliance, I chose two common enterprise environments: a publicly available content Web site consisting primarily of HTML documents and an intranet site with a mixture of file types, including Word, Excel, PDF, and HTML documents.

For the Internet site, I created a collection (Google’s term for a selection of indexed documents) that began indexing at the home page, and I created a subcollection that consisted only of article pages, which were specified by a regular expression that limited searches to those pages.
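
The subcollection idea can be sketched in a few lines of Python: a regular expression decides which crawled URLs count as article pages. The pattern, the `example.com` URLs, and the function name `subcollection` are hypothetical; the actual expression used in the test is not given here, and the appliance’s own pattern syntax may differ from Python’s `re` module.

```python
import re

# Hypothetical rule: article pages live under /articles/ and end in .html.
ARTICLE_PATTERN = re.compile(r"^http://www\.example\.com/articles/.+\.html$")


def subcollection(urls, pattern=ARTICLE_PATTERN):
    """Keep only the URLs that the pattern identifies as article pages."""
    return [url for url in urls if pattern.match(url)]


# A made-up crawl of the full collection, starting from the home page.
crawled = [
    "http://www.example.com/",
    "http://www.example.com/articles/2002/search-appliance.html",
    "http://www.example.com/about/contact.html",
    "http://www.example.com/articles/2002/web-services.html",
]
articles = subcollection(crawled)
```

Searches restricted to the subcollection would then consider only the URLs in `articles`, while the parent collection still covers everything the crawler reached.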

For the intranet site, I made a copy of the production InfoWorld intranet and used it as a collection. The Google Search Appliance crawler was scheduled to index the content at 10 p.m. nightly during the tests.