Imagine a storage array with the capacity of a stack of iPods three times the height of the Empire State Building, yet one that can be managed with common Ethernet networking tools. That is what a group of MIT scientists and four storage vendors are building.
The storage array will support the Human Speechome Project, an MIT Media Lab effort to study how babies develop the ability to talk. The project started three months ago, when MIT associate professor Deb Roy began recording his baby’s everyday life with 14 fish-eye lens cameras set up throughout his house, giving researchers a bird’s-eye view of every room.
To store and then process the video and audio data, the project needs a massive storage area network (SAN) that can archive and search what is expected to be 1.4 petabytes (1,400TB) of data over the span of the three-year project.
The SAN is being built from commodity hardware and uses a 10GigE IP network for data transfer between the back-end SAN and hundreds of servers.
“I think what we’re seeing here is what the future of storage is going to be like. This is a great marriage between industry and the academic world,” said Frank Moss, director of the Media Lab and a former CEO of Tivoli Systems Inc., a maker of storage management software now owned by IBM.
The Human Speechome Project computing infrastructure is expected to comprise more than 300 Hammer Z-Rack storage enclosures from Bell Microproducts, about 3,000 SATA (Serial Advanced Technology Attachment) hard disk drives from Seagate Technology LLC, and more than 100 10GigE switches and 400 blade processors from Marvell Technology Group Ltd.
The high-throughput switches are needed to handle the storage I/O that researchers anticipate: they expect to process 700TB of data during every 12-hour analytical run.
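To put that figure in perspective, the sustained rate it implies can be worked out with a few lines of arithmetic. The sketch below is a rough estimate only: it assumes decimal terabytes, a perfectly even load over the 12 hours, and 10GigE links running at full line rate with no protocol overhead, none of which the project has stated. The function names are illustrative.

```python
import math

TB = 10**12  # decimal terabyte (assumption; the unit convention is not stated)


def sustained_rate_bytes_per_sec(total_bytes: float, hours: float) -> float:
    """Average rate needed to move total_bytes in the given number of hours."""
    return total_bytes / (hours * 3600)


def links_needed(rate_bytes_per_sec: float, link_gbps: float = 10.0) -> int:
    """Minimum number of fully saturated links of link_gbps to carry the rate.

    Ignores protocol overhead, so a real deployment would need more headroom.
    """
    link_bytes_per_sec = link_gbps * 1e9 / 8  # bits per second to bytes per second
    return math.ceil(rate_bytes_per_sec / link_bytes_per_sec)


# Figures quoted in the article: 700TB per 12-hour analytical run.
rate = sustained_rate_bytes_per_sec(700 * TB, 12)
print(f"Sustained rate: {rate / 1e9:.1f} GB/s")
print(f"10GigE links at line rate: {links_needed(rate)}")
```

Under those assumptions the run works out to roughly 16 GB per second on average, or about 13 saturated 10GigE links, which helps explain why the design spreads traffic across so many switches.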
To meet the performance requirements, 150-drive stripes (aggregated virtual volumes) will be created using the native virtualization capabilities of Bell’s Z-SAN.
RAID 10 mirrors (duplicate copies) of the raw video data, transform data and metadata files will protect against data loss.
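For readers unfamiliar with how striping and mirroring combine, the following is a minimal, generic sketch of RAID 10-style block placement. It is not Z-SAN's actual layout, which the vendors have not described here. It assumes a hypothetical arrangement in which each of the 150 stripe members has one mirror partner, numbered 150 positions higher; the names `Placement` and `place_block` are invented for illustration.

```python
from typing import NamedTuple


class Placement(NamedTuple):
    primary: int  # index of the drive holding the block
    mirror: int   # index of the drive holding the duplicate copy
    offset: int   # block position within each of those drives


def place_block(logical_block: int, stripe_width: int = 150) -> Placement:
    """Map a logical block number to a mirrored pair of drives.

    Hypothetical layout: consecutive blocks rotate across stripe_width
    drives (striping), and each drive i is duplicated on drive
    i + stripe_width (mirroring).
    """
    if logical_block < 0:
        raise ValueError("logical_block must be non-negative")
    if stripe_width < 1:
        raise ValueError("stripe_width must be at least 1")
    column = logical_block % stripe_width   # which stripe member
    offset = logical_block // stripe_width  # how far down that member
    return Placement(primary=column, mirror=column + stripe_width, offset=offset)


# Consecutive blocks land on different drives, so large reads and writes
# are spread across all 150 stripe members at once.
for block in (0, 1, 149, 150, 151):
    print(block, place_block(block))
```

Striping is what delivers the throughput, since many drives work in parallel, while the mirror copy means a single failed drive does not lose data. The cost is that a mirrored layout needs twice the raw disk capacity of the data it holds.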
“Our approach allows us to eliminate a lot of cost by using high-volume, commonly available systems,” said Jeff Greenberg, senior director of product marketing at Zetera, the vendor designing the SAN.