The ransomware attack that choked the government of Nunavut’s entire IT environment started on a Saturday morning, the day after Halloween.
It was a horror.
Eight hundred physical and virtual on-premises servers, encrypted. Almost all of the 5,500 workstations, encrypted.
“Everything was a net loss,” recalled Martin Joy, the government’s director of information and communications technology.
Because the government runs a centralized IT environment serving 25 communities, the entire government was frozen – including hospitals, schools and family services.
Worse, the communities were linked by a less than zippy satellite network.
After the government of the far-north territory decided not to negotiate with the attacker, it took what some would call a heroic effort by the 50 IT staff and contractors, plus the help of Microsoft’s Detection and Response Team (DART), to rebuild the network and re-image servers, desktops and laptops. Email was restored in the capital, Iqaluit, within five days. Applications were moved to the cloud, a feat thought to be almost impossible at the time because of that high-latency, low-throughput satellite network.
“We had some significant hurdles that were almost impossible to overcome,” Joy recalled in an interview.
Here’s how his team did it.
Phone call at home
Looking back, Joy said, he knew something was wrong when he got the phone call at his home in Iqaluit. It was 8:30 a.m. No one calls with good news that early on a Saturday.
At the other end of the line was Nathaniel Alexander, his manager of network operations. Alexander had been called earlier by the IT department’s network on-call team when users couldn’t log in to their applications. The VPN was offline. The firewalls were OK, Alexander was told, so something else was wrong.
It was not a good morning for Alexander. The night before had been Halloween, so his family, like many others, was still “all hopped up on sugar” when he awoke.
Alexander and the team soon discovered the Active Directory authentication server wasn’t responding to queries. Nor was the secondary authentication server. That led to an investigation of the network, where they noticed high throughput traffic leaving the central data centre and going to their data centres in the 25 communities.
Alexander called Joy, who rushed to his office.
The pair opened their computers. “My computer went dark,” Joy recalled. “His computer was half infected but he could still access some of his network monitoring tools. We were afraid to turn his laptop off.
“There’s a sinking feeling in your stomach when you’re sitting there with a ‘Rome is burning’ type of feeling and you can’t see anything,” Joy said. They hoped only a small community was being attacked, but “we found out it was all 25 communities and it all happened within less than an hour for the full compromise to take effect…. It’s a shock.”
Both agreed it was the worst cyber incident of their careers.
A root cause analysis later determined they’d been hit by a new variant of the DoppelPaymer ransomware, one that few anti-virus software providers were looking for. Joy assumes the attackers went through the publicly posted list of territory employees and their email addresses looking for anyone who worked in the finance department, and sent them emails with an infected invoice. One person fell for it.
One problem Joy and his team faced was the nature of their infrastructure across the 25 communities. “We were independent islands that had all targeted goals that were similar,” he said. “However, because of the segregation it caused some isolated monitoring that was probably not good. Looking back it was a challenge to try to provide a holistic view of the impact and threat landscape that existed within our organization.”
While the satellite provider had recently upgraded its network, the territory hadn’t completely upgraded its network gear when the ransomware hit. There was only 1.5 Mbps of bandwidth on the existing network between communities. “That’s similar to dial-up speeds,” said Joy, “so there wasn’t really an ability to have all of the monitoring tools in place that came back to a central point. So for some incidents in remote communities it could take 12 hours before you would get an alert. There wasn’t a capacity to move high volumes of alerting back and forth. So there was always a delay.”
The network monitoring tools were best of breed, “but all of the specifications were made for southern Canada terrestrial fibre.” A proof of concept showed monitoring would work, but in reality “the alerting and promised enhancements sometimes never showed up.”
On top of that was the logistical problem of getting hardware and software to communities spread out over 2 million sq. km. The only way to get staff and equipment there is by air.
“We were a victim of legacy infrastructure that we inherited when Nunavut was created in 1999,” said Joy. “We had just started a modernization plan, from a security and an infrastructure standpoint,” when the ransomware hit.
“We contained [the attack] by creating a box around the whole environment and segregating the community LAN infrastructures,” Joy said. “We left the environment on but disconnected because the goal was to find the root cause – where did the compromise come from? How did it get in? Until we could find that out, there was not much we could do to have a recovery plan.”
The good news was that the territory’s backup data wasn’t affected, thanks to previously implemented security layers. That allowed the territorial government to decide the next day (Sunday) not to negotiate with the attackers.
“We knew we were going to be able to build back,” Joy said, although it would be “daunting.”
His team split into three units: one to figure out how to restore at least some services quickly (there were just nine days until the next payroll for all territorial employees, and another 18 days until the next payday for people receiving monthly income assistance); a second to do root cause analysis; and a third to handle logistics – getting 2,500 workstations in Iqaluit to a staging environment for re-imaging with Windows 10, returning them to offices, and turning them on floor by floor, building by building. Planes had to be chartered to do the same in each of the 25 communities.
Meanwhile vendors were contacted. One was Microsoft Canada. Under a support agreement it offered the services of its DART team.
By the time that team landed on Wednesday, the IT team had created a recovery process, and the network team was building a new network. The government had set one priority: restoring communications so residents could be told what was going on. Working with the DART team, the government decided to switch to the cloud-based Office 365 for all 6,500 users. Typically, Joy said, that could take months. Instead, it was done in about five days in Iqaluit, and nine days across the territory.
That was just the start of an overhaul to switch from a siloed strategy to one built around Microsoft’s cloud-based tools. That included the installation of Azure Sentinel, a security information and event management (SIEM) solution, to give visibility across the IT environment; Cloud App Security, for visibility into apps and resources; and Microsoft Defender for Endpoint.
To analyze telemetry streams, IT now uses Azure Data Explorer. It automates business-critical workflows with Azure Logic Apps, and uses Microsoft Information Protection to classify and protect documents by applying labels to content.
Six weeks – and according to the Globe and Mail, $5 million later – the system was fully back up and operational. It was, Joy said, “an unplanned modernization of the infrastructure.”
“If we had the ability to look at lateral attack patterns to see lateral movement, a SIEM could have alerted us,” Joy said. “We had modern security tools but unfortunately they didn’t work … it just didn’t detect” the strain of ransomware.
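The kind of lateral-movement alerting Joy describes can be illustrated with a simplified correlation rule. This is a hypothetical sketch, not Azure Sentinel’s actual rule logic: it flags any internal host that opens admin-protocol connections (SMB, RDP, WinRM) to an unusually large number of peers in one window, a common lateral-movement signature. The host names and threshold are invented for illustration.

```python
from collections import defaultdict

ADMIN_PORTS = {445, 3389, 5985}   # SMB, RDP, WinRM
FANOUT_THRESHOLD = 10             # distinct peers before alerting (assumed value)

def lateral_movement_alerts(events):
    """events: iterable of (src_host, dst_host, dst_port) seen in one time window."""
    fanout = defaultdict(set)
    for src, dst, port in events:
        if port in ADMIN_PORTS and src != dst:
            fanout[src].add(dst)
    # Alert on any source host fanning out to too many peers over admin ports.
    return sorted(src for src, peers in fanout.items()
                  if len(peers) >= FANOUT_THRESHOLD)

# Simulated window: one workstation fanning out over SMB to 12 servers,
# plus one host doing ordinary HTTPS traffic that should be ignored.
events = [("ws-042", f"srv-{i:03d}", 445) for i in range(12)]
events += [("ws-007", "srv-001", 443)]
print(lateral_movement_alerts(events))   # -> ['ws-042']
```

A real SIEM rule would of course correlate far richer telemetry, but the principle is the same: the signal is only visible when events from many hosts land in one central view.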
And while the end-point protection tool had a central console that provided alerting to the system team on all endpoints and servers, it would have only detected the initial infection. It wouldn’t have seen encryption of files as unusual behaviour.
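The behavioural gap Joy points to – an endpoint tool that catches the initial infection but not mass encryption – comes down to rate-based detection. A minimal sketch, with an assumed threshold and invented process names, of flagging a process that rewrites or renames files in an abnormal burst:

```python
from collections import Counter

def encryption_suspects(file_events, threshold=50):
    """file_events: (process, action) pairs from one telemetry window.

    Mass rename/overwrite activity in a short window is a classic
    ransomware-encryption signature; the threshold here is illustrative.
    """
    bursts = Counter(proc for proc, action in file_events
                     if action in ("rename", "overwrite"))
    return sorted(p for p, n in bursts.items() if n >= threshold)

# Simulated minute of telemetry: one process touching 60 files,
# a word processor legitimately saving a few.
events = [("evil.exe", "rename")] * 60 + [("winword.exe", "overwrite")] * 3
print(encryption_suspects(events))   # -> ['evil.exe']
```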
The automation of certain functions has been an advantage, giving more time to analysts to spend on detecting sophisticated attacks. “If something goes wrong, the workstation is automatically isolated from the network and scanned with a full AV scan before an analyst can look at it,” said Joy. “You have to meet threat actors with a similar level of sophistication.
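The isolate-then-scan workflow Joy describes can be sketched as a simple pipeline. The function names here are illustrative, not a real EDR API: on an alert, the endpoint is cut off from the network first, then given a full AV scan before any analyst touches it.

```python
def handle_alert(host, isolate, full_scan):
    """Automated response to an endpoint alert (illustrative sketch).

    isolate:   callable that cuts the host off the network
    full_scan: callable that runs a full AV scan and returns a verdict
    """
    isolate(host)               # quarantine before anything else
    verdict = full_scan(host)   # scan while isolated
    return {"host": host, "isolated": True, "scan": verdict}

# Stubbed-out example run with hypothetical host and handlers.
quarantined = []
result = handle_alert(
    "ws-042",
    isolate=lambda h: quarantined.append(h),
    full_scan=lambda h: "clean",
)
print(result)   # -> {'host': 'ws-042', 'isolated': True, 'scan': 'clean'}
```

The point of the design is ordering: isolation happens unconditionally and immediately, so even a slow scan can’t give the malware time to spread.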
“One of our saving graces,” Joy added, was the ability of the IT team to work together. “You can’t have a siloed team,” he said. “They need to be interoperable and understand the whole business of the enterprise.”
Joy listed a number of lessons learned from the attack:
–sometimes continuing to run legacy infrastructure poses a significant threat. You need a robust infrastructure with a holistic view of it;
–don’t operate a network on satellite infrastructure;
–your IT team has to be able to handle multiple tasks in an emergency. “One thing I can’t state enough is you’re truly as capable and competent as your team. Without a good team to support the infrastructure I don’t think you can be successful;”
–if they’re not doing it today, CISOs need to do a better job of making the business case for investing in cybersecurity management;
–“Security is not ‘put it in place and you’re done,’” Joy concludes. It’s an exercise in constant improvement.