The goal of converting millions of historical documents is a complicated task. Read why.
LAS VEGAS – The CIO of the Vatican Library is developing a digital archiving strategy of Biblical proportions.
Speaking at EMC World this week, Luciano Ammenti described the Vatican Library, or the Biblioteca Apostolica Vaticana, as a 500-year-old story captured in more than 2,000 manuscripts. His aim is to digitally preserve the entire collection, which also includes maps, with an initial target of 20 per cent scanned. The challenge, however, is getting a clearer picture of what exactly is in the collection.
In 2010, Vatican City created a commission to look at the possibility of digitizing the whole collection as a way to conserve the manuscripts. Back then, developing a policy for this process and finding the right hardware for the job were high priorities, Ammenti said.
Through a channel partner called Terra Group, Ammenti was introduced to EMC. The storage giant outfitted the Vatican Library with Isilon arrays, VNX arrays, VPlex, and SRA products.
The entire Vatican Library has 42 km of shelving and some of the documents are from the first century. It even has the first-ever printed document.
Ammenti described digitizing the collection as a “day-in, day-out work in progress.”
“I have to thank EMC for their patience with us. We changed our philosophy day after day. One day we need one petabyte of storage and the next it’s two petabytes,” Ammenti said.
The Vatican Library decided on a policy of digitizing any manuscript that provides a benefit for humanity. The documents must also be accessible for people to read anytime, anywhere. Approximately 40 million pages have already gone through the digitization process using the Common Internet File System (CIFS) format, he said.
CIFS is one proposed standard protocol among many. Originally developed by Microsoft as a dialect of its Server Message Block (SMB) protocol, CIFS lets programs request files from remote computers over a network or the Internet. Ammenti pointed out that the market has yet to standardize on one file format, which made the decision to support CIFS a tough one.
The next step for the Vatican Library is to implement a ViPR solution, which was released by EMC approximately a year ago.
ViPR is EMC’s software-defined storage solution, which automates the management and delivery of storage as well as access to it. ViPR also provides an open, extensible architecture for data centre integration with services that can scale to the cloud.
About two years ago, Ammenti’s team struggled with its data centre because the digitization project was on a network-attached storage system. ViPR will provide the Vatican with analytics for each document.
Ammenti said that is important for searching specific aspects of a document, such as the calligraphy or a picture inside the manuscript. “Each manuscript has several objects to digitize that cannot be touched,” he said.
A side project for the Coordinatore dei Sistemi Informatici is to implement data system recovery, not just for the Vatican Library but for the whole state itself. That can get complicated, Ammenti admitted.
“It’s easy in the U.S.A., but it’s complicated in Italy. The U.S.A. has an infrastructure and the cloud. All my data is at hand. We have files the size of 45 petabytes and when we try to get that to the cloud, it crashes. We have an internal cloud, but right now this is not possible. It’s a dream. One day it will be possible,” Ammenti said.