Suppose for a minute that you are tasked with constructing or upgrading a corporate data centre. Being a responsible IT pro, you decide to seek out best practices that will guide you to a solid computing model absolutely guaranteed to operate efficiently and please your employer.
Good intentions, but forget it.
Even if you manage to come up with a well-reasoned plan, based on objective design considerations, it may be overturned six months after implementation when the business decision makers get turned on by a new computing trend. Looking over the last 20 years, it’s clear that fashion dictates infrastructure no less than it does skirt length, as enterprise computing resources moved from the glass house to multiple server rooms to the desktop to the network and now to…well, take your pick.
But you need to build a data centre and objective design considerations do exist, so, undaunted, you begin with the two most important factors: the type of computing you need to do and the amount of money you can spend.
Strength in small numbers
One best practice currently much in vogue is the resurrection of the traditional data centre, the bastion of computing power in which big servers handle the corporate heavy lifting. These brawny, expensive boxes drive business processes but they exist in an IT preserve, walled off from other computer users.
Many firms looking to recentralize data are migrating individual servers to one central machine or even consolidating multiple data centres into one large facility. Compaq Canada Inc. has a formal program in place to help customers do just this. One Canadian customer, for example, is replacing 17 nationally-distributed Alpha servers with five new boxes.
“The ongoing costs of managing and supporting 17 systems is so much higher than it will be for five,” said Ira Weiss, business manager for high performance servers at Compaq Canada in Richmond Hill, Ont.
Telus Corp. is making a similar migration. The company currently serves Alberta through three data centres in Edmonton. But in the near future, those three centres will become one.
The driving motivation is dollar savings. “It costs a lot to maintain a room to house the servers, plus the real-estate costs and everything that goes along with that,” said Clinton Wasylishen, a server support specialist with Telus Enterprise Solutions in Edmonton.
“Consolidating also saves time from an organizational point of view. I’m on call right now and we have about 300 servers, so if one goes down it takes time to track down that box, to find out which building it’s in. It will be easier to [find servers] when we’re operating a single data centre.”
Wasylishen said Telus’ experience is a classic example of the paradigm swings that hold sway over many computing decisions. “We’re all going back in time. The original plan was everything was central, then everything was distributed, now everything is moving back towards a centralized model. That’s evidenced by us having three data centres and moving to one.”
Wasylishen believes, however, that a centralized data centre makes sense for Telus because it has both the in-house resources to manage complex systems and the type of heavy-duty processing that requires powerful machines.
Many hands make light work
Other types of compute jobs, however, are best served by large numbers of widely-distributed boxes. These are typically algorithm-based search functions performed in environments where cost is the primary consideration.
Cash-strapped research facilities, for example, often employ data centres which link large numbers of inexpensive PCs to create a system with a lot of processing power but little fault tolerance, mirroring or redundancy, according to Christopher Hogue.
“Scientific/technical computing always uses the cheapest, fastest boxes. We’re not looking for a durable, 24/7 box. In technical computing we just want the answer, we don’t care if the job dies in the middle because we can just restart it. So we go for a massive number of cheap processors.”
Hogue is currently heading up blueprint (http://184.108.40.206), a non-profit organization which is building a research database for the study of proteomics, the systematic analysis of protein sequences and protein expression patterns in tissues. The database will be freely available to the world’s scientific community. Hogue is also the founder and CIO of MDS Proteomics in Toronto and a scientist in bioinformatics at the Samuel Lunenfeld Research Institute at Toronto’s Mount Sinai Hospital.
The blueprint database is temporarily housed in Vancouver and at Hogue’s Mount Sinai lab, which utilizes a 216-processor Beowulf cluster, two four-processor machines from SGI and Sun, and an eight-way HP box.
“But that is not the data centre we want to build blueprint on,” Hogue said. “We’ll need more robust database and Web servers, and more powerful data processing and analysis [equipment].”
The search is currently on for a permanent home for blueprint. For the new centre Hogue plans to link standard boxes to create a do-it-yourself storage area network. “You can build a generic storage area network with a four terabyte capacity for less than $200,000. That saves us an incredible amount of money, compared to traditional SCSI, fibre-channel type configurations. Four terabytes on fibre channel would cost you a million dollars plus.”
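Hogue's storage comparison can be worked through as simple arithmetic. The sketch below uses only the two figures he quotes, both of which are bounds rather than exact prices, so the result is a rough floor on the savings and not a vendor quote. The function and constant names are illustrative.

```python
# Figures quoted by Hogue. Both are bounds, not exact prices:
# the generic SAN costs "less than $200,000" and fibre channel
# costs "a million dollars plus" for the same four terabytes.
CAPACITY_TB = 4
GENERIC_SAN_MAX_COST = 200_000
FIBRE_CHANNEL_MIN_COST = 1_000_000


def cost_per_tb(total_cost, capacity_tb):
    """Return the cost of one terabyte given a total cost and capacity."""
    return total_cost / capacity_tb


generic_per_tb = cost_per_tb(GENERIC_SAN_MAX_COST, CAPACITY_TB)
fibre_per_tb = cost_per_tb(FIBRE_CHANNEL_MIN_COST, CAPACITY_TB)
minimum_ratio = fibre_per_tb / generic_per_tb

print(f"Generic SAN: at most ${generic_per_tb:,.0f} per terabyte")
print(f"Fibre channel: at least ${fibre_per_tb:,.0f} per terabyte")
print(f"Fibre channel costs at least {minimum_ratio:.0f} times as much")
```

Because one figure is a ceiling and the other a floor, the real gap could be wider than the ratio printed here.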
But this type of compute infrastructure won’t work in all environments. “There is only a small subset of problems that work well on distributed computing. They are what I call search algorithms – such as cryptography-cracking algorithms.”
Most business tasks, such as manufacturing or financial systems, are poorly suited to these architectures, Hogue said. But where this data centre model works, it can work very well. Just ask the people who run the Google search engine. A Google search encompasses about 1.4 billion Web pages and returns results in less than half a second. And that happens more than 120 million times a day.
Those impressive numbers are generated by more than 10,000 cheap, off-the-shelf PCs distributed amongst four data centres, two on each of the West and East coasts.
And only a handful of people watch over this massive system. “A team of less than 20 people manages all of our infrastructure – that includes bandwidth, servers, DNS, routing, load balancing, everything,” said Google spokesman David Krane in Mountain View, Calif.
Small commodity servers are simple and easy to manage, Krane said. “Larger boxes with multiple processors from commercial vendors are very expensive, and often they require the support of someone inside a services organization.”
Krane also pointed out that if Google moved its processing to eight big machines, for example, the failure of one would be a significant problem, but with the current system “if 500 machines die for some reason we don’t even notice it.” In fact, maintenance and upgrades are performed by taking an entire data centre down during off-peak hours, and “users don’t notice a slowdown in performance.”
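Krane's comparison can be put in numbers. The sketch below assumes, purely for illustration, that every machine carries an equal share of the load and that the farm holds exactly 10,000 PCs (the article says "more than 10,000"). The function name is hypothetical.

```python
def capacity_lost(failed, total):
    """Return the fraction of capacity lost when `failed` of `total`
    equally loaded machines go down (an illustrative assumption)."""
    if total <= 0 or not 0 <= failed <= total:
        raise ValueError("failed must be between 0 and total, and total > 0")
    return failed / total


# One of eight big machines failing, the scenario Krane raises.
big_box_loss = capacity_lost(1, 8)

# 500 of an assumed 10,000 commodity PCs failing.
commodity_loss = capacity_lost(500, 10_000)

print(f"One of eight big machines down: {big_box_loss:.1%} of capacity lost")
print(f"500 of 10,000 PCs down: {commodity_loss:.1%} of capacity lost")
```

Under these assumptions a single big-box failure removes more capacity than 500 simultaneous PC failures, which is consistent with Krane's remark that such losses go unnoticed.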
But while Google splits its computing between two ends of a country, some organizations neither know nor care where their servers are. Initiatives such as the Search for Extraterrestrial Intelligence (www.seti.org) or Intel’s Philanthropic Peer-to-Peer Program (www.intel.com/cure) harvest the unused processing power of dormant home and office PCs. When the machines are not in use, software installed on them kicks in and requests a processing job from a central server. Once a job is completed the result is sent off and another is requested.
These small blocks of processing combine to form huge computing structures. Intel estimates its cancer research initiative will create the world’s largest and most powerful computing resource by linking millions of PCs. This virtual supercomputer should be capable of more than 50 teraflops of processing, about 10 times more powerful than today’s highest performing supercomputers.
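The request, process and return cycle these programs follow can be sketched in a few lines. This is a minimal illustration, not the software either project uses: the CentralServer class, the run_idle_client function and the sum-of-squares job are all hypothetical stand-ins, and the "server" is an in-memory queue rather than a machine reached over the Internet.

```python
from collections import deque


class CentralServer:
    """Hypothetical in-memory stand-in for a project's central server."""

    def __init__(self, work_units):
        self.pending = deque(work_units)
        self.results = {}

    def request_job(self):
        # Hand out the next (job_id, payload) pair, or None when no work is left.
        return self.pending.popleft() if self.pending else None

    def submit_result(self, job_id, result):
        self.results[job_id] = result


def process(payload):
    # Placeholder computation: sum of squares over a range of integers.
    start, stop = payload
    return sum(n * n for n in range(start, stop))


def run_idle_client(server, is_idle):
    """Pull and process jobs, one at a time, for as long as the PC is idle."""
    completed = 0
    while is_idle():
        job = server.request_job()
        if job is None:
            break
        job_id, payload = job
        server.submit_result(job_id, process(payload))
        completed += 1
    return completed


# Five small work units that together cover the integers 0 to 499.
work_units = [(i, (i * 100, (i + 1) * 100)) for i in range(5)]
server = CentralServer(work_units)
done = run_idle_client(server, is_idle=lambda: True)
print(f"Completed {done} work units")
```

Because each work unit is independent, a client that drops out mid-job costs the project only that one unit, which the server could hand to another machine.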
New music, same steps
Now, let’s get back to the opening scenario: you need to design a data centre. You’ve examined the business goals of the company and matched those to available computing resources, and then selected an appropriate data centre model.
And if you’re pleased with the fruits of your labour, well, here’s something else you should know. It’s something that seasoned tech pros have already learned: as important as business and technical requirements are, the fashion of the day is often as potent a force.
Remember that old joke about the CEO on the plane who flips through the in-flight magazine and returns to the office with a new operational game plan for the IT department? Well…
“That’s not really a joke, that’s the reality of how these things happen sometimes,” Wasylishen said. “If the business says ‘This is what we need’ then it has to be done that way – and if three months later they tell us to move it all back then that’s what we have to do.
“That’s the reality.”
Wolchak is a freelance journalist in Toronto.
Collocation: someone else’s headache
Beyond basic data centre design there is one other decision to make: do you want to host the hardware yourself or pay someone to manage the physical infrastructure for your servers?
In this latter plan, called collocation, a service provider houses your servers and delivers physical security, 24/7 monitoring, Internet connectivity, emergency parts replacement, and firewall, UPS and backup services.
“Farming out this work is easier and cheaper than doing it yourself, and you off-load accountability from your over-stressed IT people to someone for whom it’s the core responsibility,” said Ovum Ltd. senior analyst Christina Kasica in Wakefield, Mass.
Kasica also said those who outsource typically cut hosting costs by 50 per cent, compared to doing it themselves, and achieve improved uptime and reliability.
Osama Arafat, CEO of Q9 Networks in Toronto, said adding up hardware and maintenance costs alone can justify the decision to outsource data centre management.
“With back-up systems, for example, there are two kinds: there are basic $3,000 ones and $30,000 ones, without much in between. Then there is load balancing and having the right hardware. For example, a lot of machines have dual power supplies, which is great, but the big problem is that when one power supply goes out, the other one kicks in, and the machine continues to work (and the failure goes unnoticed). Unless you go in to check daily to see what’s going on, the next time you find out there’s a problem is when the second supply dies.
“If you add up the costs to do all that, it’s ridiculous. Q9 Networks was created to offer everything I’ve just described at a fraction of the cost of doing it yourself.”
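The dual power supply problem Arafat describes is a case of lost redundancy going unreported: the machine keeps running, so nobody looks. A monitoring check for it might resemble the sketch below. The host names and status data are invented for illustration, and how the status would actually be collected from the hardware is left out.

```python
def degraded_machines(status_by_host):
    """Return hosts that are still running but have lost redundancy.

    `status_by_host` maps a host name to a list of booleans, one per
    power supply, where True means the supply is working.
    """
    alerts = []
    for host, supplies in sorted(status_by_host.items()):
        working = sum(1 for ok in supplies if ok)
        # Still powered, but at least one supply has failed.
        if 0 < working < len(supplies):
            alerts.append(host)
    return alerts


# Hypothetical snapshot of three dual-supply machines.
status = {
    "web1": [True, True],     # fully redundant
    "db1": [True, False],     # running on one supply: the silent failure
    "mail1": [False, False],  # already down, so other alarms would fire
}

alerts = degraded_machines(status)
print("Running without redundancy:", alerts)
```

Run on a schedule, a check like this replaces the daily walk-through Arafat mentions.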
Beyond dollar savings, Arafat said, is peace of mind, and he points to Q9 customer Noranda Inc. as an example. “They pay for managed firewalls, managed servers, managed tape back-up, everything is managed. We configure the operating system, we spare the servers so if something blows up we have a replacement, we monitor it so anything that goes wrong raises an alarm in the Network Operations Centre, and if there is any other problem we will service the server on a [24/7] basis. From a customer perspective, this is a platform to put their application on and all they have to know is it will never go down.”
But for all the advantages of collocation, Ovum’s Kasica said buyers need to beware: many firms have failed to deliver satisfactory levels of service, and an industry shakeout may be coming.