The data dilemma

When it comes to storage infrastructure, few topics have been talked about more in 2011 than ‘Big Data’: the exponential growth of data, largely off the back of the huge transactional systems that underpin global commerce, all of which must be stored, backed up and archived.

For IBRS analyst Kevin McIsaac, part of the reason Big Data has become such an issue is that, while transactional data has been piling up, unstructured data such as Word and Excel files, videos, photographs and emails has grown exponentially.

“The amount of data we have which are files, images and graphics, is now between 65 to 80 per cent of our data so it’s by far the largest part,” McIsaac says. “Transactional data has been growing at around 35 per cent per annum whereas unstructured data is growing anywhere from 60 to 100 per cent per annum.”
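To put those growth rates in perspective, the compounding can be worked through in a few lines of Python. This is only an illustrative sketch: the 30/70 starting split (in arbitrary units) and the five-year horizon are assumptions chosen from within the ranges McIsaac quotes, and the function names are hypothetical.

```python
def project_volumes(transactional, unstructured, t_rate, u_rate, years):
    """Return (year, transactional, unstructured) rows under compound annual growth."""
    rows = []
    for year in range(years + 1):
        rows.append((
            year,
            transactional * (1 + t_rate) ** year,
            unstructured * (1 + u_rate) ** year,
        ))
    return rows


def unstructured_share(row):
    """Fraction of the total volume in a row that is unstructured data."""
    _, transactional, unstructured = row
    return unstructured / (transactional + unstructured)


# Assumed starting point: 30 units transactional, 70 units unstructured,
# growing at 35 per cent and 60 per cent per annum respectively.
rows = project_volumes(30.0, 70.0, 0.35, 0.60, 5)
for row in rows:
    total = row[1] + row[2]
    print(f"year {row[0]}: total {total:.1f} units, "
          f"unstructured share {unstructured_share(row):.0%}")
```

Even at the low end of the quoted range for unstructured growth, its share of the total keeps rising every year, which is why it dominates backup and archive volumes.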

Add to these two issues the increasing number of devices each employee now uses within the work IT environment, and the data generated from new communication media such as Twitter, Facebook and LinkedIn, and you’d think that making sure organisational backup and archiving hardware, software and strategies were optimised would be a major priority. But you’d be wrong.

If the storage experts are to be believed, plenty of organisations are confusing two key functions of storage management, namely backup and archiving, resulting in increased costs, higher power consumption, widening backup windows, recovery processes with mixed success, and unnecessary data sets.

So just what is backup and what is archiving and how did the two come to be confused?

Backup vs. archiving

On paper, the distinction between backup and archiving is black and white. As a data storage process, backup is more immediate and aims to enable the business to recover data quickly and from a recent period of time, typically between one and six months, and is often carried out using disk as the storage medium. Archiving is the storage of data for many years, such as for future financial and compliance audits. Given the span of time, and the cost to spin disk for years at a time on the off-chance data is needed, the task of storing this data often falls to tape-based platforms.

Where organisations often go wrong in their storage strategies is in keeping their backup data on disk for years despite the fact that the IT department is unlikely to be called on to recover data that is more than a month or two old. “Archive is different from backup, its purpose is long term retention, while backup is to restore a few hours, a day or a few weeks later,” IBRS’ McIsaac says.

In addition to confusing archiving with backup, Telsyte senior analyst, Rodney Gedda, says some IT departments do not apply a ‘use-by date’ to their data to allow it to move from backup to archiving.

“I think there tends to be a bit of a mish-mash out there, you’ve got organisations like the ABC which previously put everything on archive tapes and have now decided they need to access it, or it might be a case of thinking you need to infinitely store everything and have access to it,” he says. “However, that’s not a prudent decision as you end up with expensive disk systems running and archiving and backup programs that aren’t necessary.”

When it comes to the success of backup and archiving, Gartner analyst Phil Sargeant says a contributing factor has been the tendency for vendors themselves to focus more heavily on the backup and recovery space, with archiving platforms only becoming more prominent in the last few years.

Despite some confusion between the two functions, he says larger corporates generally get it right more often than SMEs. “There is no doubt that the lines have blurred, I’d say a lot of the larger companies I speak to are fairly sophisticated so they don’t confuse that… they know the distinction between backup and recovery and archiving, I’d say small to medium businesses are the ones who get a little bit confused at times,” he says.

“When there wasn’t a major distinction between the two a lot of people got confused because they could almost use backup and recovery and archiving interchangeably because there wasn’t the big data sets, but that’s certainly not the case today.”

One organisation having to come to grips with backup and archiving is Australian financial services firm Teacher’s Credit Union (TCU). Its CIO, Colin Thomas, says that the additional impetus of regulations to retain health and financial data for certain lengths of time has prompted it to take a much closer look at its storage strategy.

“It’s much easier to back things up than to take the position of classifying data and actually start to expire data you no longer need,” Thomas says. “[Keeping everything] was a previous approach we took.”

The firm is now undergoing a data classification process to determine specific retention periods for different types of data.

“We’re implementing a document records management solution as well so we have to determine retention periods and whether or not we need to keep the data on WORM (Write Once Read Many) storage or whether it needs to be retained on normal disk type storage.”

“We have a preference to keep all our long-time member information that’s required for contractual reasons on WORM storage, it can’t be changed once you store it and the regulators and the courts tend to require evidence that you have a document in its original state.

“In the past we would have done optical storage but now we’re able to keep that online using EMC technologies to ensure the data we hold on disk cannot be changed from its original state.”

The firm uses a CommVault backup toolkit to move data from its primary storage onto EMC’s Data Domain which it then uses to replicate across the Wide Area Network (WAN).

Boasting 155,000 members, the firm has also experienced exponential data growth rates as it develops different business products requiring the investment of time and funds in a range of new data types.

“We’re addressing some substantial growth requirements, add in a lot of legislative requirements such as mortgage documents, which have to be kept for extended periods, bank statements and all manner of different regulatory reporting and it’s quite difficult to predict as well as hard to manage,” Thomas says.

On the other hand, Brisbane-based Anglican Church Grammar School (ACGS) uses the same system for both its backup and archiving, making no distinction between the two functions.

The school’s 10-strong IT team have just transitioned the school, which has about 1700 students and 250 staff, to a virtualised environment with the deployment of 30 virtualised servers leaving just a couple of physical servers remaining.

According to its network systems administrator, Gavin Rees, a nightly backup across the school’s servers also acts as the basis for its archiving.

“The StorageCraft ShadowProtect platform operates as a straight backup tool, we can at any point in time roll back a server for whatever reason, such as a critical failure or a virus and we also use it to recover user data,” Rees says. “To a lesser extent we use it sort of as an archiving solution, mind you we do lose data, if data changes we can go back at any time and do comparisons on data but it’s not a true archiving solution where we capture every single change that happens across the board.”

This is a concern, Rees says, as financial data must be kept for seven years, but with archiving solutions remaining too costly, a SQL database is used to secure sensitive data.

“Because it’s written into a SQL database it is tracked and logged, SQL is more secure and because it’s a financial system, everything that happens in that database is logged providing a record of who has logged in and what they’ve done.”

“We’ve looked specifically at archiving a few times but so far the solutions have just been way too expensive and that’s been the main issue when talking to executives.”

Both Thomas and Rees express dissatisfaction with tape-based technologies and a desire to take tape out of the equation altogether, because it is difficult to go back to tape at random to find specific data.

“It’s really tedious, having done it any number of times the issues are whether the tapes are on site, whether it’s the tape in the machine or up in the safe, then whether the tapes’ catalogues have been deleted, it’s a hugely involved, horrible process to restore stuff from tape,” Rees says. “We’re now buying more disk arrays and basically our theory is we’ll be able to keep up to seven years backup on disk in a DR site.”

Thomas says one of the main reasons he has moved his data online with EMC is the painstaking nature of tape, but he cannot get rid of the technology altogether until the data currently stored on tape, which has not been moved to Data Domain, has expired.

Rees also notes the use of the Cloud to store data remains out of the question for the school, mainly due to the cost fuelled by the sheer volume of data the school has accumulated.

“Even if we’re just moving the data backwards and forwards we have an issue with the amount of data we have, to fix this we’re doing little bits with internal Clouds where we’ll have data storage at a different DR site across the road,” he says. “So we’re happy to say we have an internal Cloud but it’s really just data across the road.”

Getting your storage in order

With data growth only accelerating, IT departments must move quickly to resolve any confusion between backup and archiving functions and focus on cleaning up their organisation’s storage.

When it comes to getting on top of data it all comes down to one thing: strategy. IBRS’s McIsaac says that any good strategy must recognise different types of data at their sources, be they email, file shares or document storage repositories, before approaching vendors and considering the various storage, deduplication, compression and other related technologies.

“The problem is vendors are all about ‘I’ve got this product you should use’, whereas really this is more about policy and approach,” McIsaac explains. “If I take a backup, how long is it worth keeping, and if I need to recover data from an archive I need to have an archiving approach and how long should that data be archived and how do I get it back. Then you start looking at pieces of technology.”

Policies around data retention need to be nutted out to categorise data and ensure it is kept in accordance with regulations, as opposed to being unsure and keeping everything for seven years just in case or turning a blind eye. “Unfortunately businesses do turn a blind eye to [backup and archiving] which is why it is out of control, so what IT does is just keep everything and back everything up and then we end up with the very large backup and recovery problems with very large data sets that frankly, are unnecessary,” he says.
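As a rough illustration of what such a policy might look like once written down, the Python sketch below maps data classes to retention rules and decides whether an item belongs in backup, in the archive, or should be deleted. The class names, the `POLICY` table and the `disposition` function are all hypothetical; only the seven-year figure for financial data and the one-to-six-month backup window come from the article, and real retention periods would need to be set by the business and its regulators.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RetentionRule:
    backup_days: int   # how long data stays in the fast-restore backup tier
    archive_days: int  # total retention before deletion (0 means never archived)


# Hypothetical policy table: every value here is an assumption for illustration.
POLICY = {
    "financial": RetentionRule(backup_days=90, archive_days=7 * 365),
    "email": RetentionRule(backup_days=30, archive_days=365),
    "scratch": RetentionRule(backup_days=30, archive_days=0),
}


def disposition(data_class, created, today):
    """Return 'backup', 'archive' or 'delete' for an item of the given class and age."""
    rule = POLICY[data_class]
    age_days = (today - created).days
    if age_days <= rule.backup_days:
        return "backup"
    if age_days <= rule.archive_days:
        return "archive"
    return "delete"


today = date(2011, 6, 1)
print(disposition("financial", date(2011, 5, 1), today))  # recent: stays in backup
print(disposition("financial", date(2009, 1, 1), today))  # older: moves to archive
print(disposition("scratch", date(2011, 1, 1), today))    # past its use-by date
```

The point of the sketch is that the rule table is a business decision rather than a technology one: once it exists, any backup or archiving product can be configured to follow it.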

Businesses should assess their data requirements with an open mind and refrain from restricting certain types of data to certain systems in order to get the most out of the platforms they already have, Gartner’s Sargeant advises.

“The first thing they need to do is forget about the technologies at this point in time… there’s obviously some information that they’ll want to keep around but they need to think about their data and construct policies and once they’ve got those in place they can then decide on what is the most appropriate technology to use,” he says.

Even with stringent policies around data, organisations will be reluctant to let go and officially press delete, he says. It is a tough decision but a necessary one, as keeping old and stale data in the mix will only cost more and be detrimental in the long run.

“IT needs to sit down with the business and talk about the costs, both the cost of actually taking the backups and holding the data, and also the liability of keeping it with senior executives,” McIsaac says. “Unfortunately most executives don’t want to talk about it, the legal department doesn’t want to get involved, but IT needs to engage the data owner to come up with a policy about what can be deleted and when.”

For TCU’s Thomas, policy is fundamental and goes hand in hand with classifying data and implementing deletion dates on data.

“[Keeping everything] would have been a previous approach but now we’re starting work around best practice models such as the COBIT framework.

“Policy work is difficult but you need to know the framework you’re intending to operate on… while it’s difficult, it’s actually the processing, procedures, behavioural and expectation change you have to go through that’s most challenging.”

ACGS’s Rees concedes that putting policies in place is a must but that it is an incredibly difficult feat once staff are set in their ways about what they store on the network.

“It’s really difficult to do once the horse is out of the gate, because you’re fighting end users, if you limit someone who comes into the organisation they accept it but if you try and take back something they’ve already got it’s really hard,” he says.

“Storage is really painful now, it never used to be an issue and in the last few years where everyone now has a digital camera, the storage needs have just exploded,” he says. “That’s probably the thing we find the most difficult or struggle the most with is getting people to limit what they’re storing on servers.”

“If you don’t have a storage policy in place, put one in before it’s too late, we’ve tried to do that and we still have issues all the time, putting policy in place that says this is what’s acceptable to keep on the network rather than trying to fight that battle later.”
