Last year, I was engaged to assist with a severe ransomware event at a third-party organization. Unfortunately, by the time I arrived, most of the network had been impacted and the data was encrypted.
At the onset of the infection, the organization decided to shut down all servers on the network. Shutting down was a good decision to limit further damage, but by that point the hypervisor hosts had already been hit. Without a functional failover cluster or an operational Nutanix storage fabric, the total impact on the virtual machines in the server environment was unclear, because we could not access the datastores where the VHDs were stored. After some initial analysis, I became intrigued by the possibility that the hypervisor datastores might not have been encrypted.
Nutanix CVMs use pass-through disks for both the metadata disks and the data disks. My theory was that because those disks were not mounted directly to the hypervisor host's operating system, the ransomware would not have been able to encrypt them, meaning all of the raw data should still be intact.
I booted up one of the hosts to evaluate the damage and discovered that the CVM boot image (ISO) had been encrypted, so the CVMs were unable to start. This wasn't a large concern, because I knew the hypervisor hosts had to be rebuilt anyway. At this point, I engaged the Nutanix support team to start building a plan.
The plan:
- Boot Phoenix and wipe the SATADOM partitions, eliminating the ransomware and hypervisor OS.
- Re-install Windows.
- Boot to Phoenix and perform firstboot to re-create the CVMs.
- Re-configure each CVM's IP address using the previous configuration.
- Run cluster start to bring the cluster back online.
It took some time to execute the plan and rebuild the 10-node cluster. Once we completed the steps above, we were able to log in to the Prism Element interface and verify that the CVMs had been rebuilt correctly and that the storage containers still existed.
The next step was to rebuild the Windows failover cluster and add it to Prism. After two long days and a lot of anticipation, we found the containers were not encrypted. Great news!
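For the failover cluster rebuild mentioned above, a minimal sketch using the FailoverClusters PowerShell module looks roughly like this. The node names and cluster IP are placeholders rather than values from this environment, and registering the rebuilt cluster with Prism is a separate step not shown here.

```powershell
# Validate the rebuilt hosts before forming the cluster (node names are placeholders)
Test-Cluster -Node 'HV-NODE01','HV-NODE02','HV-NODE03'

# Form the failover cluster with a static management IP (placeholder address)
New-Cluster -Name 'HV-CLUSTER' -Node 'HV-NODE01','HV-NODE02','HV-NODE03' -StaticAddress '10.0.0.50'
```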
We now needed to determine which VMs had been infected. The solution I devised was fairly simple: build a VM, then use the PowerShell Hyper-V module to mount each vdisk as read-only, look for specific files indicating compromise, dismount, add a row to a CSV file, and move on to the next; a sketch of that loop follows below.
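The snippet below is a minimal sketch of that triage loop, assuming the utility VM has the Hyper-V module available, the storage container is reachable over a UNC path, and mounted volumes receive drive letters automatically. The share path, indicator pattern, and output path are hypothetical placeholders, not the values from this engagement.

```powershell
# Hypothetical ransom-note pattern and paths -- substitute what was observed in the environment
$indicator = '*README_FOR_DECRYPT*'
$vhdFiles  = Get-ChildItem '\\cluster-storage\container1' -Recurse -Include '*.vhdx','*.vhd'
$outputCsv = 'C:\triage\vdisk-scan.csv'

$results = foreach ($vhd in $vhdFiles) {
    # Attach the virtual disk read-only so nothing inside it is modified
    $disk = Mount-VHD -Path $vhd.FullName -ReadOnly -Passthru | Get-Disk

    # Search every volume that received a drive letter for the indicator file
    $hits = $disk | Get-Partition | Where-Object DriveLetter | ForEach-Object {
        Get-ChildItem "$($_.DriveLetter):\" -Recurse -Filter $indicator -ErrorAction SilentlyContinue
    }

    # Emit one row per vdisk for the CSV report
    [PSCustomObject]@{
        VHD      = $vhd.FullName
        Infected = [bool]$hits
    }

    Dismount-VHD -Path $vhd.FullName
}

$results | Export-Csv $outputCsv -NoTypeInformation
```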
After a few hours, we were able to determine with a fair amount of confidence which VMs were infected and encrypted, and which likely were not. Because of the extent of the damage, the discussion focused on rebuilding rather than containment or recovery. It was decided to scrap the old systems and build new servers in the environment.
As IT professionals, we spend most of our time thinking about how we are protected from disasters and what we have done to prevent them. We don't spend enough time thinking about what recovery from the unexpected looks like. We can do a lot to be proactive and deter large-scale disasters, but sometimes they can't be avoided.