Got a two node Storage Spaces Direct Cluster? Then Windows Server 2019 can bring you more resiliency with the same hardware, but how does it perform…
Two nodes are popular
The adoption of Storage Space Direct and Azure Stack HCI clusters is growing month by month. Over the years the two node clusters became very popular. But with only two nodes you are a bit more at risk. Your precious data is only available on two nodes. And when disaster strikes and you lose 1 disk in both nodes, or a node and a disk….volumes go offline ☹.
Windows Server 2019 brings new forms of volume resiliency to the table in the form of nested resiliency. The new nested resilicency comes in two new flavors of resilience “Nested Mirrored Volumes” and “Mirror Accelerated Parity Volumes”.
This blog is not to explain all the details about nested resiliency. Microsoft already did a great job for us here. Feel free to check it out. But to make this blog more clear to our readers, we provide a little background information.
In Windows Server 2016 Storage Spaces Direct we have these 4 resiliency options available to create volumes.
Based on the amount of nodes you have in your cluster, you can choose a form of resiliency.
With two nodes you can have two copies of the data. With three nodes you can have three copies of the data. Which means you can either lose one node or a disk in a node in a two-way mirror. Or lose two nodes or a disk in two of the three nodes in a three-way mirror without volumes going offline. With three nodes and above you can stick with the three-way mirror.
In the case of single parity it’s just like in the traditional RAID5 configurations. Blocks are written and parity data is calculated and written on another node. With dual parity there is more parity data to sustain more failures. Single parity requires three nodes and can sustain a node failure or disks in one node. With dual parity you need a minimum of four nodes and you can of course lose two nodes or disks in two nodes.
Volumes with parity are great for efficiency in disk space but they have a terrible performance and are only considered valid in backup or archival environments.
New Resiliency options
Based on the amount of nodes, some options are not available. In the case of a two node cluster there is only one option left! And that is the two-way mirror. Because of that, Microsoft added two additional resiliency options specifically and exclusively for the two node cluster configurations.
With a nested mirror you basically create a local and a remote mirror in one volume. So your volumes stripes across two nodes like a regular mirrored volume. But the block your write is not only available on another node, it is also copied on the same node and twice on the remote node. The picture below gives a good understanding:
In this case you can not only lose a node, but also a drive on the remaining node . With the nested mirror volumes you are much more resilient and can sustain more drive losses then in a two-way mirror. Unfortunately it is not efficient and you have only 25% of capacity available, but if availability is critical this is your way forward.
Nested Mirror Accelerated Parity
The second new flavor there is, is nested mirror accelerated parity…. Let’s explain that first by showing it in a picture.
When you create a nested mirror accelerated parity volume the storage does a little trick. Because part of that volume contains mirrored storage, let’s say 20%. The other 80% of the volume is created as a Parity. And then both parts are also copied to the second server. This way, the storage consumptions is much more effective because 80% is parity and 20% nested mirror. The storage uses the mirrored part of the volume as cache to improve performance and moves it to the parity part of the volume after it has been written. Pretty cool! But how does it perform? That is a question we have received a lot, so we tested it!
We use VMFleet and DiskSPD to test the performance of the different volumes. With these tools, we can quickly create a large amount of VMs and disks that we use for testing. When the VMs are deployed you can start the load tests on all the VMs simultaneous with a single command. During our tests we used the following test parameters:
Outstanding IO: 8
Block Size : 4k / 8k / 64k
Threads : 10
Write: 0% / 30% / 100%
VMs: 14 VMs per node
We then run 3 series of tests per type of volume. So the first test is based on 4k blocks. Thereafter we run a 100% read, 70% read and 0% read test. We repeated this process based on 8K blocks and so on, until the 64K blocks size. That results in a total of 9 tests.
Mirrored Volume Test
In this first test we created 2 mirrored volumes of 600GB and deployed 14 VMs to each volume. After that we started the series of tests with the parameters above. See the diagrams below for details.
Nested Mirrored Volume
For our second test, we now create 2 nested mirrored volumes of 600GB each and deployed the 14 VMs to each volume. We then run the same tests as before which gives us the following results.
Nested Mirror Accelerated Parity
In the last test we again create 2 volumes, but this time nested mirror accelerated parity volumes. The mirror part is 100GB and the parity part is 500GB. We then deployed the 14 VMs to each volume. After that we started the series of tests with the parameters from before.
As we can see in the tests displayed above we have little to no performance loss when we only read data from a Mirrored Volume, compared to a Nested Mirror volume. Because not all the data is in the mirrored part of the Nested Mirror Accelerated Parity volume, we see a bigger performance drop when we read data. When the data set you read is smaller and/or your mirrored part of the volume is bigger, there should be very little performance difference.
When we start writing data you see differences in performance which is very logical. In regards to the nested mirror volume we lost half of our physical disks that we can write to because of extra mirroring. In addition to that, creating more copies takes more time. The nested mirror accelerated parity volume is of course slower because a lot of parity calculations have to take place which gives a big performance hit, especially with write operations.
If you need the added resiliency of Windows Server 2019 with nested resiliency and a good amount of disk capacity, it is better to invest in additional storage and use nested mirror volumes. Going forward with nested mirror accelerated parity is not advisable for VM workloads.
Want to know more or do you have questions about Nested Resiliency then drop us an e-mail.