How to minimize your vulnerability to your service provider

With 23 years of IT experience – including 10 related to financial services – Ed Bilewicz, currently vice-president of application hosting for 724 Solutions Inc., has been at both the giving and receiving end of outsourcing. He shares with IT Focus the lessons he has learned along the way.

IT Focus: What are the data centre management outsourcing choices facing financial services companies today?

Ed Bilewicz: There are basically three kinds of models in a hosting scenario. One is where we basically say, “You run this for me. Everything. I just want to pay you a cheque every month, and I want you to tell me how you’re doing on your service levels.” That’s a fully managed service, and it’s one end of the spectrum. The other end of the spectrum is the model I currently favour and use, which is called co-location (co-lo). Because I don’t have enough money to build a data centre, I go to these data hotels – web hosting co-lo providers – and say: “You give me some physical security around my equipment, you give me physical access controls that I own, and only I can tell you who can get near my equipment. You give me the power and the fire suppression. You give me the bandwidth into the Internet. But I will manage everything else.” In that case, the co-lo provider basically doesn’t know what I do. My interaction with the co-lo provider is mainly to grant physical access, or, if I get a shipment of equipment, to get it ‘racked and stacked’: mounted in cabinets, with power run to it and connected to the network.

IT Focus: So you would be selecting, purchasing, implementing, installing the hardware and software?

Bilewicz: Right, and then I just plug into their router to get out onto the general Internet. Those are the two extremes. In the middle is a hybrid. It’s kind of a menu approach: I have staff who can manage this, this and this, but I don’t have staff who can manage this, this and this, so I want to buy that service.

Let’s say security is really important to me; then I would bring that function in-house and manage it myself. But let’s say I didn’t want to worry about backups because they’re kind of a commodity. Then I’d let the provider manage the backups. I just tell them when I want them done and when I need them.

There are a lot of products out there that monitor infrastructure health. Those products, and implementing them, are fairly expensive, so you may (use the provider’s) monitoring option. The thing you have to be conscious of is that you’re still relying on the provider, so you’re not quite as nimble. A lot of these services are non-intrusive. You’ve also got some flexibility in this model to go in either direction (to managed service or co-lo).

When I started out here, I didn’t have any staff. I had to get up and running quickly, and the only way to do that is to find somebody who can provide that service to you. So we started with almost a fully managed service. That was fine for getting us up and going, but as we started looking at the services provided, we found they were good but not flexible. If I needed to go outside that service for a bit, or tweak it a little, it would have been a custom solution and would have cost lots of money.

IT Focus: Can you give me an example?

Bilewicz: We had a fully managed firewall service here. In the wireless space, you have to be able to deal with the new gateways that wireless carriers bring up all the time, and they don’t tell anybody when they’re doing it. So if someone is driving through Arkansas trying to do banking on their cell phone and gets passed over to a new gateway that our system doesn’t allow in, we need to make a firewall change right away. With normal Internet traffic, you just open your outer secure zone to the whole world. Banks can’t do that because they have restrictions on where transactions can be done and where they originate from. So we have to lock it down and only allow in specific carrier gateways. In that case, you have to make changes quickly. With a fully managed service, you can get a 24-hour turnaround. Well, we needed it done immediately, so we needed an immediate service level. They couldn’t guarantee it, so one of the first things we brought back in-house was firewall management.

Over time we migrated here to co-lo, where we run everything ourselves and they just give us space. We run backups and restores, do the troubleshooting, install the applications, install the operating systems – we do all that stuff from up here.

So it depends on where you are and what you’re trying to do. If you’re trying to get up and going quickly, then you have to go with a fully managed service. You don’t have time to recruit and develop that skill set. In a fully managed service, you’ve got the one-throat-to-choke scenario: you call the one number and they’ll look after everything for you.
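The carrier-gateway restriction described above amounts to a default-deny allowlist on the outer zone. The following Python sketch is an editorial illustration under stated assumptions, not 724 Solutions’ actual setup: the class name `CarrierAllowlist` is hypothetical, the address ranges are RFC 5737 documentation ranges standing in for real carrier gateways, and a real deployment would express this as firewall rules rather than application code. It shows why direct control matters: admitting a newly discovered gateway is a one-line data change that can be made in minutes rather than on a 24-hour turnaround.

```python
import ipaddress


class CarrierAllowlist:
    """Hypothetical default-deny source filter for the outer zone.

    Only traffic whose source address falls inside a listed carrier
    gateway range is admitted; everything else is rejected.
    """

    def __init__(self, cidrs):
        # Parse each CIDR string into a network object once, up front.
        self._networks = {ipaddress.ip_network(c) for c in cidrs}

    def add_gateway(self, cidr):
        # Admit a newly discovered carrier gateway range.
        self._networks.add(ipaddress.ip_network(cidr))

    def remove_gateway(self, cidr):
        # Stop admitting a gateway range (no error if it was never listed).
        self._networks.discard(ipaddress.ip_network(cidr))

    def allows(self, source_ip):
        addr = ipaddress.ip_address(source_ip)
        # Default deny: True only if some listed range contains the address.
        return any(addr in net for net in self._networks)


if __name__ == "__main__":
    # Documentation address ranges (RFC 5737) stand in for real gateways.
    fw = CarrierAllowlist(["198.51.100.0/24"])
    print(fw.allows("198.51.100.20"))  # True: known gateway
    print(fw.allows("203.0.113.7"))    # False: carrier brought up a new gateway
    fw.add_gateway("203.0.113.0/24")
    print(fw.allows("203.0.113.7"))    # True once the rule is updated
```

Keeping the allowlist as data, separate from the code that enforces it, is what makes the "change it right away" requirement cheap to meet.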

IT Focus: So when is going with managed services advisable?

Bilewicz: When you’re starting up and you don’t have any staff. Or when you don’t want to be focused on infrastructure-type stuff; you want to focus on your core business. On the fully managed service side, if you go with the big guys, they’re going to be around forever, and they’ve come a long way since we first started kicking the tires on them. They better understand web hosting and the wireless side of things. Plus, a lot of enterprise customers are comfortable with the big-name companies.

IT Focus: When would one steer away from fully managed outsourcing?

Bilewicz: Once you do have staff, you can probably do things faster in co-lo. You have more flexibility in how you set up, and if you have to react quickly, co-lo’s the better way to go. In a fully managed service, you usually fall into a more traditional project implementation. They’ve got to gather the right resources at the right time, whereas with co-lo you can take your SWAT team approach: we need this environment built here, so go out and do it, then come back and build the next one.

Moving away from a fully managed service provider is a much more considered exercise. You think about it really, really hard before you try it, whereas in co-lo, you’ve got lots of flexibility. The other thing co-lo lets me do is bring stuff up quicker. I can go anywhere in the world, and when I negotiate with (a) vendor, it’s all commodity stuff; it’s a checklist. I want this, this, this and this. (Fully managed, by contrast, is) a much more complex contract negotiation, because they have service levels around various components – bandwidth, the NOC (network operations centre) where they’ll answer a phone and flip a switch, that kind of thing.

IT Focus: But what if your provider closes?

Bilewicz: If you’re pure co-lo, it’s easier to go somewhere else because you’re not locked into integrated provider services. If I put all my eggs in one basket and this guy runs into trouble – like they have to close their doors or something – then I’m pretty well toast, because now I’ve got to scramble and figure out where else I’m going to run it. If I leased equipment from them, there’s an issue there. If I use any of their managed services, I have to get the same thing somewhere else – and nobody does everything exactly the same way. So the way I would set up an environment is to take provider A and provider B and go co-lo with both. I would not use any of their managed services as long as I have the mandate to hire staff. That mandate lets me put a management or operations layer (above it all). I can set up one consistent way of doing everything, which means I’m running all my management services in-house, so that if (one of my providers) goes away, I can still offer a consistent service, because I’m doing everything from (the management/operations layer). If I bring in provider C, I’m still running it the same way because I have all my management tools and all my operational processes. In a scenario where you’ve got to move quickly from place to place, this is probably the best way to go.

IT Focus: How do you know that you have to move quickly?

Bilewicz: You don’t. You just assume you have to move quickly. If you’re relying on only one provider and that provider gets into trouble, then you’re kind of hosed. You’ve got to scramble and find another provider quickly. If you’re also using managed services, leased equipment and so on, you’ve got a whole lot of work to do, and you’ve got to do it fast! (If you can afford it,) you should go with two providers – different companies – and start distributing your customers across both.

When I have two data centres, I have to build infrastructure in both places, so that’s where it gets expensive.

We alternate; we crisscross the two. We buy a production environment for (customer) X (at provider A), and then I’ll have (customer Y’s) quality assurance (QA) environment (at the same provider). Then I’ll have (customer Y’s) production environment (at provider B), and (customer X’s) QA environment (at provider B). The QA environment is essentially a close approximation of the production environment. Then we cross-populate the environments. When I back up a production environment (at one provider), that backup gets stored (at the other provider), so that if (one) provider takes a hit, I invoke disaster recovery and just bring up the affected customer’s QA environment as its production environment until I can scramble and get another QA environment for them somewhere (else). So the customer has no loss and minimal downtime or interruption of service. What that allows me to do is distribute my risk.
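The crisscross arrangement can be modelled in a few lines. This Python sketch is an editorial illustration, not Bilewicz’s tooling: `Placement`, `crisscross` and `fail_over` are hypothetical names, and alternating customers round-robin between two providers is an assumption that generalizes his two-customer example. Each customer’s QA environment and production backups sit at the provider that does not host its production, so losing either provider still leaves every customer with an environment to run on.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Placement:
    production: Optional[str]  # provider hosting the production environment
    qa: Optional[str]          # provider hosting QA (None = must be rebuilt)
    backups: Optional[str]     # provider storing production backups (None = lost)


def crisscross(customers, providers=("A", "B")):
    """Alternate customers between two providers. Each customer's QA and
    backups go to the provider that does NOT host its production."""
    first, second = providers
    plan = {}
    for i, name in enumerate(customers):
        prod = first if i % 2 == 0 else second
        other = second if prod == first else first
        plan[name] = Placement(production=prod, qa=other, backups=other)
    return plan


def fail_over(plan, failed):
    """Simulate losing one provider. A customer whose production ran there
    is promoted onto its QA environment at the surviving provider; any QA
    environment or backup copy at the failed provider is marked as gone."""
    new_plan = {}
    for name, p in plan.items():
        production = p.qa if p.production == failed else p.production
        qa = None if failed in (p.production, p.qa) else p.qa
        backups = None if p.backups == failed else p.backups
        new_plan[name] = Placement(production, qa, backups)
    return new_plan


if __name__ == "__main__":
    plan = crisscross(["X", "Y"])
    # X: production at A, QA and backups at B. Y: production at B, QA and backups at A.
    for name, p in fail_over(plan, "A").items():
        print(name, p)
```

After provider A fails, X runs production on its former QA environment at B, and both customers need a new QA environment (and Y a new backup location) built somewhere else, which corresponds to the "scramble and get another QA environment" step in the answer.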

IT Focus: What other lessons have you learned?

Bilewicz: You want to avoid the pure co-lo providers. The providers you look for are ones affiliated with a company with deep pockets, like a telco.

(Another) lesson we learned is that if you need flexibility in the service, you have to bring it in-house. Otherwise, you just can’t get the reaction or the speed you need.

IT Focus: What is important to look for in a service provider?

Bilewicz: You want the actual operations to have some kind of certification. There are a few out there. One is SAS 70; the Canadian equivalent is CICA 5900. There’s also a newer one called SysTrust, which is kind of like SAS 70. Then there’s the ISO series. What you want is consistent service from these guys. So if I call the NOC and say, “I need someone to work on this server,” and their service level says they’ll do that within fifteen minutes, then if they don’t, there’s a process to remediate that.

If you’re going to go with the co-lo model, (certification) would be one of the things I’d suggest. It actually gives you a pretty good roadmap of how you should do things, and of where you’re weak and where you’re strong.

I would always check out the facility firsthand. I would physically go down there. You want to make sure they have multiple power suppliers, if that’s allowed in the region. If they can’t have multiple suppliers, you want to make sure they’re fed from multiple power grids or substations. You want to make sure they have generators (that) are fired up and tested regularly, and whether they have multiple fuel suppliers in case one of them can’t make it. You want to make sure their UPS (uninterruptible power supply) is there and tested. You want to make sure their airflow is good.

You want to check their physical security. With a fully managed service, you don’t have to do as much, because they generally do not allow other customers on the floor – they control that. You want to make sure that there’s good security and cameras around the outside and that nobody can get in unless they’re authorized and authenticated through biometrics or something like that. Once people are out on the floor, you don’t worry that much, because in managed services they’re all employees of the provider. You want to make sure their security guys are background-checked.

In a pure co-lo, because you’re sharing floor space with other customers, you have to make sure that your stuff is enclosed and that they have the right physical access controls in there. How do they handle key control? Make sure that nobody gets the wrong key to a cage. You want to make sure that there are mantraps (one door shuts before the next opens, so no one can follow undetected). You want to make sure there are multiple levels (of security). You want to make sure there are cameras down every row, every aisle, and that the security guards are watching the cameras. You want to make sure you control the access list, so they won’t let anybody in unless you give authorization. Usually you want to make sure visitors have to deposit some sort of ID at the front, so they have to come back to pick it up.

You have to monitor what’s going on in the industry on an ongoing basis. Usually if one of these guys runs into trouble, a few others get into trouble too. You want to look at their client base. You want to see more traditional companies, what they call enterprise companies, well-known names. There’s a certain comfort in knowing that those companies are running with a provider.

IT Focus: When you’re on the giving end of this service, what do you want your clients to know? What might they overlook?

Bilewicz: (As) a host, you need to provide an SLA (service level agreement), because you don’t want any ambiguity or problems with the customer’s expectations. You need to set the price, then set the expectation. So SLAs are important.

If I’m going to make changes (as a service provider), I should have a regularly scheduled maintenance window. You don’t want any variability in how you act or behave, because then you’re inconsistent and customers don’t know what kind of service to expect.

But there should be expectations about how customers interact with you as well. (As a provider,) I need to know when you’re doing maintenance on your system so that I don’t get a bunch of false positives and start invoking my escalation process and my triage process.
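The maintenance-notification expectation described above can be sketched as a provider-side check that keeps announced work out of the escalation path. This Python example is an editorial illustration under assumptions: `MaintenanceCalendar` and `should_escalate` are hypothetical names, alerts are assumed to carry a system name and a timestamp, and real monitoring products have their own downtime-scheduling features.

```python
from datetime import datetime, timedelta


class MaintenanceCalendar:
    """Hypothetical provider-side record of customer-announced maintenance."""

    def __init__(self):
        self._windows = {}  # system name -> list of (start, end) tuples

    def announce(self, system, start, duration):
        # Record a window the customer has told the provider about in advance.
        self._windows.setdefault(system, []).append((start, start + duration))

    def in_maintenance(self, system, when):
        # Half-open interval: a window covers start <= when < end.
        return any(start <= when < end for start, end in self._windows.get(system, []))


def should_escalate(calendar, system, alert_time):
    """Send an alert into escalation and triage only if no announced
    maintenance window covers it; otherwise treat it as expected noise."""
    return not calendar.in_maintenance(system, alert_time)


if __name__ == "__main__":
    cal = MaintenanceCalendar()
    cal.announce("web01", datetime(2024, 1, 6, 2, 0), timedelta(hours=2))
    print(should_escalate(cal, "web01", datetime(2024, 1, 6, 3, 0)))  # False: announced window
    print(should_escalate(cal, "web01", datetime(2024, 1, 6, 5, 0)))  # True: outside the window
    print(should_escalate(cal, "db01", datetime(2024, 1, 6, 3, 0)))   # True: nothing announced
```

The check only works if customers actually announce their windows, which is why the expectation has to run in both directions.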

I don’t want a hundred people calling me with the same problem. It should always come through a single authorized contact. You should have a defined protocol on both sides for how they interact when there’s an issue.

You need to have everything documented, and you need to be rigorous about it, because as soon as your documentation goes out of date, moving or changing anything becomes more difficult. Especially in co-lo, you need rack diagrams and an understanding of how everything is configured, so that if you have to duplicate it somewhere else, you can do it quickly and don’t have to sit there and figure it all out all over again.

I haven’t run into this situation, but in a fully managed service, it’s kind of a black box. I would think most customers don’t know how everything is set up. If you’re going from one managed service provider to another, there’s a certain degree of co-operation that would need to happen between those two guys, who are competitors.

We always make sure our DR (disaster recovery) plans are current and tested in case we have to go to another provider.