The grid scalability struggle

HALIFAX – Grid computing might be one of the biggest buzzwords in the industry today, but those in the know say the technology has some distance to go before reaching maturity.

During a press conference at Dalhousie University in Halifax, Dr. Wolfgang Gentzsch, Sun Microsystems Inc.’s director of grid computing, described some of the hurdles the technology faces. Gentzsch said small-scale department and enterprise-sized grid computing projects, wherein users share microprocessing power across the local area network, are relatively simple to do. But when it comes to super-wide-area-network projects, the grid loses its edge.

“The next step is the global grid,” Gentzsch said. But he pointed out that on a larger scale, it becomes more difficult to add processors to the distributed engine. He explained that the technology to connect multiple small grids – essentially building a distributed supercomputer – does not exist.

Gentzsch said Sun has part of the answer in its “N1” infrastructure. N1 is supposed to add a virtualization layer to the network such that sourcing processors across global grids becomes less arduous. However, Sun’s network-centric initiative is nebulous for the time being. The company described some of its functions during a users’ conference held last month in San Francisco, but provided few details about the technology.

Consider as well standard network uptime, which might not suffice for the big processing projects grid systems are obliged to handle, Gentzsch said. Whereas “three nines” (99.9 per cent) uptime is sufficient for the average data network, it’s not enough for global grids, which would have to keep CPUs connected for longer periods of time to facilitate massive computations.
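To put the “three nines” figure in perspective, the arithmetic can be sketched in a few lines of Python. This is an illustration only, not anything Gentzsch or Sun presented: the function names are hypothetical, and the second function assumes every site's link fails independently of the others, which real networks do not guarantee.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def downtime_hours_per_year(availability: float) -> float:
    """Hours per year a link may be down at the given availability (0 to 1)."""
    return (1.0 - availability) * HOURS_PER_YEAR


def chance_all_links_up(availability: float, sites: int) -> float:
    """Chance that every one of `sites` links is up at a given moment.

    Assumption (not from the article): link failures are independent.
    """
    return availability ** sites


if __name__ == "__main__":
    # Three nines allows roughly 8.76 hours of downtime per year on one link.
    print(downtime_hours_per_year(0.999))
    # With 100 sites at three nines each, the chance that all are
    # reachable at once drops to roughly 90 per cent.
    print(chance_all_links_up(0.999, 100))
```

Under those assumptions, a single link at 99.9 per cent may be down for nearly nine hours a year, and the odds of every link in a many-site grid being up at once fall as sites are added, which is one way to read the concern about long-running computations.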

“We built the infrastructure gradually,” Gentzsch said, pointing out how unwieldy the grid can be outside the controlled environment of the enterprise. “[Networks] have grown over time, unstructured.”

Beyond the bits and bytes, sometimes users can pose problems for grid projects, said Terry Dalton, head of strategic IT with the National Research Council’s (NRC) Institute for Marine Biosciences at Dalhousie University, where Gentzsch spoke to the press.

“It’s more human than anything,” Dalton said. “The technology is there, it’s just getting over the idea of sharing. If I’m going to share my computer with you, are you going to hog it? Will it hinder my own work?”

As well, consider the grid manager’s role. How is the network’s overseer supposed to decide which users have access to what processors?

“You need policies to enforce resource utilization,” said Gentzsch, but Dalton said that can be difficult in large systems.

“That’s why you’ll find most grids in confined, controllable situations.”

Mind you, this isn’t always the case. Consider the NRC in Halifax, which connects servers and CPUs for hefty processing. The institution relies on Canada’s Advanced Internet Development Organization’s (CANARIE Inc.) CA*net 4 national optical network to support a grid environment for bioinformatics research.

Researchers connect through NRC’s Canadian Bioinformatics Resource (CBR) to access 10 large computers across the country, said Simon Mercer, manager of the NRC-CBR, which recently became a Sun Center of Excellence for its grid know-how. He said the system comprises 20 processors for the moment but could grow to more than 1,000 in the future.

The NRC system won kudos from one user. Sean Hemmingsen, visiting Halifax from the Plant Biotechnology Institute in Saskatoon, said the CBR grid goes a long way to helping him with “multiple sequence alignment,” “phylogenetic analysis” and other work that “you can’t do by hand.”

“Now we have the capacity required that I never dreamed we’d have,” Hemmingsen said.

Ken Edgecombe, executive director of the High-Performance Computing Virtual Laboratory (HPCVL) at Queen’s University in Kingston, Ont., said grid computing is a boon for researchers seeking computer power. Still, he pointed out a problem that might have more to do with a lack of science funding than a lack of technology, but that frustrates users nonetheless.

“For most researchers, the bottleneck is between the desktop and the university backbone,” Edgecombe said, suggesting that users need quicker links to the local network if they’re to benefit from grids.