Checklist helps smooth data centre shift changes

When your data centre runs 24/7, the operation uses three shift supervisors. Keeping them on the same page is critical to running smoothly and preventing surprises. What’s the best way to hand over the wheel? A little bit of MBWA – management by walking around.

“You can’t manage a data centre from a cubicle,” says Darin Stahl, research lead with Info-Tech Research Group’s data centre practice. A former data centre supervisor himself, Stahl says looking at monitoring tools isn’t enough. “I want to walk the floor,” he says.

Stahl and the data centre group recently published the Data Centre Shift Turnover Checklist, a template companies can customize for their own operations to make the handover smoother. “It’s meant to facilitate an orderly turnover from shift to shift,” he says.

The checklist covers three broad areas. First, “we wanted to hit the perimeter and the facilities,” says Stahl, checking the status of HVAC, fire protection and power and UPS systems. The second section deals with pending and completed changes for the shift. “I want to know what was completed on the last shift, so if 45 minutes in something goes awry, I know where to start,” he says.

“I’d rather be on top of the situation from the get-go.”

The third section deals with about a dozen operational activities – backups, tape deliveries, server and network status, etc. And the checklist uses a much-overlooked technology – pen and paper. While the group discussed doing the checklist online, the list requires a lot of visual inspection inside and outside the building, which makes an online version less than ideal, he says.
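For teams building their own version, the three-part structure described above could be kept as a small script that prints a blank sheet for each shift, so the walkaround itself stays on pen and paper. What follows is only a rough sketch, not the Info-Tech template: the names `Section`, `SECTIONS` and `render_checklist` are hypothetical, the layout is an assumption, and only the items named in this article are listed (the actual template covers about a dozen operational activities).

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    """One of the three broad areas of the turnover checklist."""
    title: str
    items: list[str] = field(default_factory=list)


# Assumption: only items mentioned in the article appear here;
# a site would add its own (tape rooms, print rooms, outdoor checks).
SECTIONS = [
    Section("Perimeter and facilities",
            ["HVAC", "Fire protection", "Power and UPS"]),
    Section("Changes",
            ["Changes completed on the last shift",
             "Changes still pending"]),
    Section("Operational activities",
            ["Backups", "Tape deliveries",
             "Server status", "Network status"]),
]


def render_checklist(sections: list[Section]) -> str:
    """Return a plain-text sheet with a tick box and notes line per item."""
    lines = [
        "DATA CENTRE SHIFT TURNOVER CHECKLIST",
        "Date: ________  Outgoing: ________  Incoming: ________",
        "",
    ]
    for number, section in enumerate(sections, start=1):
        lines.append(f"{number}. {section.title}")
        for item in section.items:
            lines.append(f"   [ ] {item}  Notes: ____________________")
        lines.append("")  # blank line between sections
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_checklist(SECTIONS))
```

Printing a fresh sheet for each handover keeps the list easy to edit while leaving a handwritten record of every shift.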

Keeping on top of ops

Stahl advocates a formal shift turnover meeting between the supervisors involved – their hours should overlap to allow this. Depending on variables like whether there are tape or print rooms in the facility and how much outdoor inspection is required, the checklist should take less than 45 minutes to go through, he says.

The walkaround ensures that the oncoming supervisor knows, for example, whether there are vendors in-house working on equipment, whether there are any potential health and safety issues, and whether someone has left a door unsecured while sneaking out for a cigarette. “It’s amazing what some people will do,” he says.

It’s also a good practice from an audit perspective, Stahl says. There’s a written record of the state of the operation every eight hours. Not that data centres need hang onto the checklists for seven years like financial data, but “it’s demonstrating that you do what you say you do.”

Gary Baker, the partner leading Deloitte’s enterprise risk practice in Toronto, agrees that a “structured conversation tool” can help with continuity.

“I’m not a big fan of checklists generally,” Baker says. “As an auditor, with a checklist, you forget to think.

“They are not a substitute for critical thinking.”

But from the perspective of a data centre supervisor, he says, it’s a very useful tool.

“One of the key elements of IT is integrity of operations,” he says. Once the status of facilities and operations is documented, it becomes something auditors can use as evidence of operational integrity. But the primary advantage is having the exchange itself.

“Just the simple act of writing it down seems to crystallize what it is and what was said,” Baker says.

In terms of retention, Baker says, companies should hold onto the documents as long as there’s business value, not just to satisfy auditors. That would be a window of “months, rather than years.”

While companies can make their own checklists from scratch, “this kind of gives you a jumpstart,” Stahl says. In the shoes of a data centre supervisor, Stahl says, “I’d rather edit than create.”
