A non-profit organization that archives the Internet by taking snapshots of Web sites recently moved its three petabytes of data into a 20-foot-long shipping container, a modular data centre built on open storage technology from Sun Microsystems Inc.
The library, built by San Francisco-based non-profit organization Internet Archive, holds text, audio, moving images (videos and films), rare and hard-to-find software, and archived versions of Web pages. It receives about 100,000 visitors per day and is undergoing rapid expansion. Founder Brewster Kahle expects the three petabytes of data to grow at approximately 100 terabytes per month.
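To put those figures in perspective, the growth Kahle describes can be worked through as simple arithmetic. The sketch below is illustrative only: it assumes growth stays constant at 100 terabytes per month (a linear model) and uses decimal units (1 petabyte = 1,000 terabytes). The function names are hypothetical and do not come from Internet Archive or Sun.

```python
import math

# Assumption: decimal storage units, 1 PB = 1,000 TB.
TB_PER_PB = 1000


def projected_size_tb(start_tb: float, monthly_growth_tb: float, months: int) -> float:
    """Linear projection: starting size plus a constant monthly increment."""
    return start_tb + monthly_growth_tb * months


def months_to_reach(start_tb: float, monthly_growth_tb: float, target_tb: float) -> int:
    """Whole months needed to reach target_tb at a constant growth rate."""
    if target_tb <= start_tb:
        return 0
    return math.ceil((target_tb - start_tb) / monthly_growth_tb)


if __name__ == "__main__":
    start_tb = 3 * TB_PER_PB   # three petabytes today, per the article
    growth_tb = 100            # approximately 100 terabytes per month

    after_one_year = projected_size_tb(start_tb, growth_tb, 12)
    print(f"After 12 months: {after_one_year / TB_PER_PB:.1f} PB")

    doubling = months_to_reach(start_tb, growth_tb, 2 * start_tb)
    print(f"Months to double: {doubling}")
```

Under these assumptions the archive would hold about 4.2 petabytes after one year and would double in size in roughly 30 months. Real growth is unlikely to be perfectly linear, so the numbers are a rough guide rather than a forecast.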
“A lot of the Web is gone, in fact most of it. We’re trying to build an analogy of a library for this new publishing world,” said Kahle. “(The Web) is currently growing at about one petabyte a year and it’s continuing to grow as videos and photographs start to come on the net. The net is effectively infinite.”
To ensure the storage and preservation of such vast volumes of content, Internet Archive must have a way to parse, index and physically encode the data, while avoiding degradation and ensuring accessibility in unknown future formats, said Kahle.
But the organization had already suffered through four data centre migrations over a span of 12 years. “We wanted a data centre well-tuned to the machines we are using, and retro-fitting a building takes years and you either buy too much or too little,” said Kahle. “It’s very expensive, time-consuming and a pain in the butt.”
The original setup, a customized storage architecture built in-house, was replaced earlier this year with 63 Sun Fire x4500 servers running Solaris 10 and Solaris ZFS. The data centre, which Kahle fondly refers to as “a very big machine,” now resides in a shipping container that sits outdoors. “The next box that we get, we’ll fit on top of it or beside it,” he said.
The open software and hardware platform from the Santa Clara, Calif.-based technology vendor “allows (a customer) like the Internet Archive to take the development in a direction that is more in tune to what they do,” said Sun Microsystems’ Jud Cooley, senior director of engineering for the Sun modular data centre.
The open storage platform also comes with analytics capabilities designed to monitor system performance and identify bottlenecks. Kahle finds this particularly useful for lowering maintenance costs, one of Internet Archive’s challenges. “The system admin lives 50 miles away, and if there is something going on, they can just log in and be able to power cycle a machine, take something offline, reconfigure a RAID,” he said.
The modular design of the data centre, dubbed Blackbox by Sun, offered Internet Archive and its growing volume of data much-needed scalability, said Cooley. “That means the architecture they use for the way they capture and reference data has to be able to scale, and the way they deploy the physical structure has to be able to scale,” he said.
Sun Microsystems introduced the modular data centre with the expectation that one of the main applications would be customers like Internet Archive performing the same task repeatedly, said Cooley. “So after they do the first deployment, it’s almost a cookbook to do the next one,” he said. “You configure it once, debug it once, get it operational once, and then just reheat as many times as you want.”
And, as Internet Archive expands operations around the world, Cooley said deployment will be easy because you just “put another one of these boxes together, put the disks in it, lay the data on it and simply ship it into place.”
While Cooley acknowledged that modular data centres are probably better suited to some businesses than others, he said most data centre managers are currently thinking in terms of modularity, “whether it’s a Blackbox modular container, or modular pods inside a data centre, or just a populated rack as a deployable module.”
“The cost of the modular data centre is driving people to think about how they do scale up,” said Cooley.
And, while hardware security is one of the common objections to the modular concept, Cooley said securing a modular data centre is no different from securing a more traditional setup. “Wherever you put your assets, you’re going to secure them. You can secure a container too. There are locks on the doors, they’re bolted to the ground, they go behind security fences, they’re not visible from the street.”
Sun is hosting the data centre on its Santa Clara campus and is providing power, cooling and networking capabilities.