Alexa Internet Inc. is offering online computing capacity for US$1 an hour, and throwing in access to the database of billions of Web pages that lurk behind its Alexa toolbar search service.
Programmers who register for the beta version of Alexa Web Search Platform, released Tuesday, can use it to create specialized search engines for vertical markets, drawing results from the database of 4 billion Web pages crawled by Alexa, the company said. Alexa is a subsidiary of Amazon.com Inc.
Following in the footsteps of Google Inc., Alexa is opening up the API (application programming interface) to parts of its search engine, but going one better by offering to host applications that build on its database — for a fee. Programmers remixing Google’s search utilities must organize their own application hosting.
Alexa Web Search Platform gives programmers a way to specify a subset of documents from the archive, develop an application to search those documents, and publish the results as an XML (Extensible Markup Language) feed or a specialized search engine.
The results returned can include simple text or HTML (Hypertext Markup Language) documents, or graphics, audio or video files.
As an example of how to use the service, Alexa has built a photo search engine at http://photos.alexa.com/ that allows visitors to refine their search for photographs according to technical details such as the size of the image, the make and model of camera it was taken with, and even the aperture setting used.
While the photo search engine shows how the platform can be used to build a live service, a one-off search of the database content can also be used to seed another service. That’s how Rainer Typke, a researcher at the University of Utrecht in the Netherlands, used the platform to expand his searchable melody directory, http://www.musipedia.org.
Typke used the platform to extract around 1,000 MIDI files from Alexa’s database, converted them to a monophonic form and stored them on his own server to make them easier to search. Musipedia doesn’t use Alexa for its live search service, Typke said in an e-mail response to questions.
Using the Alexa computer cluster, Typke plans to identify hundreds of thousands of MIDI files in the database and process them using an algorithm that extracts their characteristic melody. Those melody files will be used to expand the Musipedia directory. Later, he hopes to be able to process files containing audio recordings in the same way.
“For the more computationally expensive preprocessing that would be required, especially by audio, Alexa’s fast and large computers will come in handy,” he said.
Alexa will charge for hosting applications that use the platform. The charges include $1 per processor per hour for computing capacity, $1 per year for each gigabyte of storage, $1 per 50 gigabytes of data processed by the system, $1 per gigabyte of data transferred into or out of the system, and $1 for every 4,000 search requests answered by published search engines using the service.
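To get a feel for how those five charges add up, the following is a minimal sketch of a cost estimator. The rates are the ones quoted above; everything else is an assumption made for illustration, including the `Usage` and `estimate_cost` names, the sample workload, and the idea that each charge is prorated linearly. Alexa has not said how it rounds or bundles partial units.

```python
from dataclasses import dataclass

# Rates as quoted in the article, in US dollars.
DOLLARS_PER_PROCESSOR_HOUR = 1.0
DOLLARS_PER_GB_YEAR_STORED = 1.0
GB_PROCESSED_PER_DOLLAR = 50.0
DOLLARS_PER_GB_TRANSFERRED = 1.0
REQUESTS_PER_DOLLAR = 4000.0


@dataclass
class Usage:
    """Hypothetical description of one job's resource use."""
    processor_hours: float = 0.0
    storage_gb_years: float = 0.0   # gigabytes stored, times years kept
    processed_gb: float = 0.0       # data read and processed by the system
    transferred_gb: float = 0.0     # data moved into or out of the system
    search_requests: int = 0        # requests served by a published engine


def estimate_cost(usage: Usage) -> float:
    """Sum the five charges, assuming simple linear proration (an assumption)."""
    return (
        usage.processor_hours * DOLLARS_PER_PROCESSOR_HOUR
        + usage.storage_gb_years * DOLLARS_PER_GB_YEAR_STORED
        + usage.processed_gb / GB_PROCESSED_PER_DOLLAR
        + usage.transferred_gb * DOLLARS_PER_GB_TRANSFERRED
        + usage.search_requests / REQUESTS_PER_DOLLAR
    )


if __name__ == "__main__":
    # Made-up workload: 2 processor-hours, 10 GB kept for a year,
    # 100 GB processed, 5 GB transferred, 8,000 search requests.
    job = Usage(
        processor_hours=2,
        storage_gb_years=10,
        processed_gb=100,
        transferred_gb=5,
        search_requests=8000,
    )
    print(f"Estimated cost: ${estimate_cost(job):.2f}")  # Estimated cost: $21.00
```

In this made-up workload, storage and transfer account for most of the bill rather than processor time, which suggests why limiting the data kept and moved can matter as much as limiting computing hours.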
Typke expects the pricing will “be okay for people like me.” He’s identified a number of ways to control the cost of his melody search, including updating the core data less frequently, or restricting the search to a smaller subset of Alexa’s total data.
“I still need to get a feeling for how much I can do with one hour of computing power,” he said. “Getting the 1,000 files for the prototype took just minutes.”
The API is designed for the C programming language. It can be used to build “Web services” that can be integrated into other systems or published through Amazon.com’s Web services platform, Alexa said.