Taming the databeast

When David MacDonald became information systems manager at Visible Genetics Inc. two and a half years ago, the Toronto biotechnology firm had about 70 GB of data, mainly generated by software the firm developed to analyze DNA sequencing information, which helps predict the resistance of diseases to new drugs.

Today, Visible Genetics has about 300 GB of data at its Toronto location alone, with similar amounts at its U.S. and U.K. locations. MacDonald expects requirements to grow another 170 GB in the next twelve months.
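To put those figures in perspective, the growth can be expressed as a compound annual rate. The sketch below is only back-of-the-envelope arithmetic on the numbers quoted above. The function name is illustrative, and it assumes smooth year-over-year compounding, which real data growth rarely follows.

```python
def compound_annual_growth(start_gb: float, end_gb: float, years: float) -> float:
    """Return the constant yearly growth rate that turns start_gb into end_gb."""
    if start_gb <= 0 or end_gb <= 0 or years <= 0:
        raise ValueError("sizes and elapsed time must be positive")
    return (end_gb / start_gb) ** (1.0 / years) - 1.0


# Figures quoted for Visible Genetics' Toronto site.
past_rate = compound_annual_growth(70, 300, 2.5)       # last two and a half years
next_rate = compound_annual_growth(300, 300 + 170, 1)  # expected next twelve months

print(f"Past growth: about {past_rate:.0%} per year")
print(f"Expected growth: about {next_rate:.0%} over the next year")
```

On these numbers, the Toronto site has grown by roughly 79 per cent a year and expects to grow by roughly 57 per cent in the coming year.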

IT budgets may be shrinking, but storage budgets in most organizations are staying roughly constant, says Paul Ross, director of networked storage marketing at EMC Corp. of Canada in Toronto. Research firm International Data Corp. Canada Ltd. in Toronto estimates storage accounted for about 30 per cent of the average system sale in 1998, amounts to 50 per cent now and will eat up 70 per cent of the total by 2004.

“We’re seeing anywhere from 50 to 60 per cent growth in storage requirements,” says Bruce Jolliffe, director of enterprise solutions management at Hitachi Data Systems Canada Inc. in Montreal.

Crunch time

So why, all of a sudden, is storage such a hot property? Customer relationship management (CRM) software bears much of the blame for growing storage needs. With businesses gathering as much data as they can about their customers, the sheer volume is overwhelming existing storage capabilities.

Growing demand for electronic commerce and multimedia applications such as audio and video streaming is also boosting storage requirements, says Alan Freedman, IDC Canada’s research manager for servers and storage. The growing popularity of these disk-hungry applications, the declining cost of storage hardware and a drive to centralize storage systems are helping keep the market going even in tough economic times, Freedman says.

Those storage budgets are buying more and more capacity for the buck. “There’s a definite price-per-megabyte drop,” says Paul Patterson, enterprise storage marketing manager at Hewlett-Packard (Canada) Ltd. in Mississauga, Ont., “and it’s been pretty drastic in the last little while.” Ross says disk-drive capacity is doubling every couple of years – “sometimes faster than that” – while drive prices remain more or less the same. The problem, he adds, is that the amount of data the average organization has to store is growing as fast as the disks, if not faster.

“I talk to customers who say their data needs are doubling every eight months,” says Kyle Foster, general manager of storage sales at IBM Canada Ltd. in Markham, Ont. In fact, Freedman says, end-user capacity requirements are doubling as often as every five to 10 months. However, businesses may keep their overall needs from growing that fast by learning to use storage more efficiently.
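Doubling times and annual percentages are two ways of stating the same growth, and converting between them makes the vendors' figures easier to compare. The following is a minimal sketch, assuming steady exponential growth; the function name is hypothetical.

```python
def annual_growth_from_doubling(months_to_double: float) -> float:
    """Convert a doubling time in months into a yearly growth rate."""
    if months_to_double <= 0:
        raise ValueError("doubling time must be positive")
    return 2.0 ** (12.0 / months_to_double) - 1.0


# Doubling times quoted by Foster (8 months) and Freedman (5 to 10 months).
for months in (5, 8, 10):
    rate = annual_growth_from_doubling(months)
    print(f"Doubling every {months} months = about {rate:.0%} growth per year")
```

Doubling every eight months works out to roughly 183 per cent growth a year, far above the 50 to 60 per cent figure quoted earlier.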

Coping with demand

To do that, many are turning from direct-attached storage – where each processor has one or more disk drives of its own – to networked storage. From now until 2005, IDC predicts, sales of direct-attached storage will decline six per cent per year, while sales of storage-area networks (SANs) will increase 26 per cent and those of network-attached storage (NAS) will rise 56 per cent annually.

In fact, no business today can create a storage strategy without understanding the various ways storage can be shared among processors through networks.

Networked storage aims to use storage more efficiently and reduce complexity. “I think customers have started to realize that managing all these silos of data within all those servers distributed everywhere is becoming unmanageable,” says Jeff Goldstein, general manager of networked storage vendor Network Appliance Canada Ltd. “Your accounts receivable database is out of space while your Web servers have lots of space, and yet you can’t share the space with the old model.” With networked storage, you can, he adds.

The Export Development Corp., an Ottawa-based bank that supports Canadian exporters, is moving to networked storage so data can be shared among clustered servers and easily replicated to a disaster-recovery site, explains John Purdie, director of infrastructure services. A storage-area network from EDS will eventually replace all EDC’s disk storage, he says.

“Our data storage requirements are growing by leaps and bounds,” Purdie says. “There are some things that we know we have to do in the future and we just know that we aren’t going to be able to use the data storage methods we’ve been using to support the growth of the business.”

The best-known forms of networked storage are SANs and NAS. Despite their similar names, these are neither the same nor competing technologies. They work differently, but are often used together. And they are not the only forms of networked storage.

Ross at EMC says two main factors differentiate types of networked storage: the type of network and the form of access to the data. In each case there are two choices, and combining these yields four possible technologies.

The two kinds of network are the Internet Protocol (IP) network, of which the Internet and most corporate networks are examples, and the channel network, which is designed for high performance and used mainly for specialized purposes. An example of a channel network is Fibre Channel technology.

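Data access can be at the file level or the block level. With file-level access, the storage system handles whole files; with block-level access, the server reads and writes raw blocks of disk space that sit below the file system. Most desktop applications use file-level access, but large database systems need block-level access.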

NAS uses IP or some form of packet networking, in most cases over Ethernet, and file-level access. NAS doesn’t require a dedicated network, so it’s easy to install and manage, and it looks to users like another hard disk on the PC, which makes it simple to use. Also, Ross says, multiple users can share access to files – good for data that end-users need to get at directly, like spreadsheets, documents and presentations.

View from the ground

Visible Genetics uses filers, the network-attached storage devices made by Network Appliance, to cope with its ballooning storage needs. MacDonald says the company chose this approach because it promised the fastest and most convenient access to DNA data stored in many small files – the 300 GB of data stored at the Toronto location is divided into about eight million files, he says.

A SAN is a dedicated channel network linking servers to multiple storage devices, and it uses block-level data access. Thus it provides high-speed access to data and it supports large database applications well. But a SAN is more complex to install and run – and more expensive. Still, for some needs, a SAN is the best solution. Keep in mind that SANs are not for tiny businesses, Foster says, “but you can start quite small.” Hitachi’s Jolliffe says a SAN starts making sense at about 300 or 400 GB of data.

Export Development Corp. chose a SAN because of the need to support five different operating-system platforms, says Rob Pettifer, a storage architecture project specialist at EDC.

Rather than transfer all its data to networked storage right away, Purdie says, Export Development Corp. is taking a phased approach. The priority is to move to the SAN the data that would be most urgently needed at the disaster recovery site if the primary system were lost and that would be hardest to restore from tape. Other systems will be dealt with later, he explains.

In a third form of networked storage, multi-path file serving, a separate channel network links storage devices to processors, as in a SAN, but access is at the file level as in NAS. This provides the performance of a SAN, Ross says, but the file-sharing capabilities of NAS.

The fourth possible combination is block-level access using an IP network. Ross calls this Block Storage Over IP. It uses the IP networks that almost everyone already has, but supports block-level access for database applications. IBM and Cisco Systems Inc. both offer a version of this called Internet Small Computer System Interface (iSCSI). “I’ll call it a poor man’s storage area network,” Foster says. “Not to suggest that it’s not just as robust.” Foster says iSCSI is more suited to a small office or a department within a large company.
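The four combinations Ross describes amount to a two-by-two grid: type of network on one axis, form of data access on the other. As a rough illustration only, the grid can be written out as a lookup table. The class and function names below are made up for this sketch; the technology names are the ones used in the article.

```python
from enum import Enum


class Network(Enum):
    IP = "IP network"
    CHANNEL = "channel network (e.g. Fibre Channel)"


class Access(Enum):
    FILE = "file-level"
    BLOCK = "block-level"


# One entry per combination of network type and access level.
STORAGE_TYPES = {
    (Network.IP, Access.FILE): "network-attached storage (NAS)",
    (Network.CHANNEL, Access.BLOCK): "storage-area network (SAN)",
    (Network.CHANNEL, Access.FILE): "multi-path file serving",
    (Network.IP, Access.BLOCK): "block storage over IP (e.g. iSCSI)",
}


def storage_type(network: Network, access: Access) -> str:
    """Look up the networked-storage technology for a given combination."""
    return STORAGE_TYPES[(network, access)]


for (network, access), name in STORAGE_TYPES.items():
    print(f"{network.value:38} + {access.value:11} -> {name}")
```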

Whatever approach you take to networked storage, Freedman notes, some of the pros and cons are the same. Networked storage uses capacity more efficiently than direct-attached storage, improves security and makes the storage easier to manage. On the other hand, it calls for more technical expertise, which is often hard to find, so hiring or training the necessary IT staff can be a challenge, especially for small to medium-sized businesses.

Beware the pitfalls

Foster says small- to medium-size organizations encounter four major storage problems. The first is the need to consolidate storage so as to use it more efficiently. The second is data protection. “Almost all customers … don’t know how many servers they have,” he says, “and if you don’t know how many servers you have, how can you possibly answer my next question, which is: Are you backing it up?” The third issue is disaster tolerance, and the fourth is ensuring that data is readily available to everyone who needs it.

Understanding which of these problems are most urgent in your business is a prerequisite for planning your storage needs.

The next battleground, Patterson says, is storage management. Storage management tools help you manage a variety of storage devices, track how heavily existing storage is used, and thus predict future needs. They can also help optimize the use of existing storage.

And in the meantime, how can you assess your storage needs? Experts admit it’s a bit of a black art. Essential things to look at include current usage patterns – how much of existing capacity is in use and how rapidly stores of data are growing – and how business plans may affect storage requirements.
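One simple way to turn those two measurements, capacity in use and rate of growth, into a planning number is to estimate how long the remaining headroom will last. The sketch below assumes a constant annual growth rate, which is a simplification, and every figure in the example call is hypothetical rather than taken from any company mentioned here.

```python
import math


def months_until_full(used_gb: float, capacity_gb: float, annual_growth: float) -> float:
    """Estimate months until used_gb reaches capacity_gb at a constant yearly growth rate."""
    if used_gb <= 0 or capacity_gb <= 0:
        raise ValueError("used and capacity must be positive")
    if annual_growth <= 0:
        raise ValueError("annual growth must be positive")
    if used_gb >= capacity_gb:
        return 0.0  # already at or over capacity
    years = math.log(capacity_gb / used_gb) / math.log(1.0 + annual_growth)
    return 12.0 * years


# Hypothetical example: 300 GB in use on 500 GB of installed capacity,
# with data growing 55 per cent a year.
print(f"{months_until_full(300, 500, 0.55):.1f} months of headroom")
```

An estimate like this is only a starting point; business plans that change how much data is collected will shift the growth rate itself.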

MacDonald at Visible Genetics says he monitors the use of existing storage and also keeps tabs on business plans that might affect storage needs. He says Visible Genetics’ storage budget has remained roughly constant at about $30,000 for the last couple of years, but “that buys us a whole lot more space than it used to.”

The best single piece of advice may be to prepare a long-term strategy based on business needs, rather than just responding to short-term problems. “Every time they go out and look at adding storage,” Patterson says, “they need to look at long-term plans.”

Grant Buckler is a freelance writer and editor who has written on information technology topics for more than 20 years. He lives in Kingston, Ont.

Storage: Why you should care

Storage hardware is about as exciting to most IT and business managers as watching pet rocks sunbathe. But it’s rapidly becoming the single most important element of e-business innovation. Just look at your company’s e-business infrastructure.

There’s only one proprietary component: the enterprise customer data. Almost every other platform component is now a commodity; a company can substitute one excellent vendor’s products for another’s – low-end and midrange servers, PCs and Internet hosting services, for example.

Here’s the problem: For decades, storage has been handled as just an add-on to IT strategy and as JBOD – storage professionals’ acronym for “just a bunch of disks.” One colleague calls this the “aspirin” approach. Your doctor tells you, “You’ve got a fever? Take two aspirin and call me in the morning.” Whoever handles JBOD purchases says, “Your data warehouse is exploding again? Buy two clusters and call back next month.”

Try asking your best telecommunications experts about Fibre Channel or backup and archiving. Then talk to the storage people about IP-based SANs. In most instances, you’ll see blank stares. Look at the network architecture plans. See if you can find the storage architecture plans. Good luck. Then look at your company’s many CRM activities and see if there’s any discussion of their implications for storage beyond JBOD and “aspirin.” Again, good luck.

In the JBOD world, vendors are box salespeople, and organizations are box buyers. Both are in a commodity transaction, not partners in enterprise storage strategy. The JBOD suppliers come in with feature lists, prices and service promises. That’s fine for semi-commodities such as low-end servers, PCs and Internet hosting. But it’s inappropriate when the discussion is about storing the firm’s customer data resources or its e-business strategy and platform architecture – and recognizing the importance of never putting either at risk.

As the storage issue rises above JBOD, IT must redefine the vendor dialogue, and vice versa.

Peter Keen is chairman of Keen Education and an author and consultant. His Web site is www.peterkeen.com and he can be reached at peter@peterkeen.com.