
Azure Stack HCI Mirror vs Nested Volume Performance

Got a two-node Storage Spaces Direct cluster? Then Windows Server 2019 can bring you more resiliency with the same hardware. But how does it perform…

Two nodes are popular

The adoption of Storage Spaces Direct and Azure Stack HCI clusters is growing month by month, and over the years two-node clusters have become very popular. But with only two nodes you are a bit more at risk: your precious data is available on only two nodes, and when disaster strikes and you lose a disk in both nodes, or a node and a disk… volumes go offline ☹.

Windows Server 2019 brings a new form of volume resiliency to the table: nested resiliency. Nested resiliency comes in two new flavors, “Nested Mirror” volumes and “Nested Mirror-Accelerated Parity” volumes.

Resiliency

This blog is not meant to explain all the details of nested resiliency; Microsoft already did a great job of that for us here, so feel free to check it out. But to make this blog clearer for our readers, we provide a little background information.

In Windows Server 2016 Storage Spaces Direct we have four resiliency options available when creating volumes: two-way mirror, three-way mirror, single parity, and dual parity.

Based on the number of nodes you have in your cluster, you can choose a form of resiliency.

Mirror

With two nodes you can keep two copies of the data; with three nodes you can keep three copies. This means a two-way mirror can lose one node, or a disk in one node, and a three-way mirror can lose two nodes, or a disk in two of the three nodes, without volumes going offline. With three nodes and above you can stick with the three-way mirror.

Parity

Single parity works just like a traditional RAID 5 configuration: blocks are written and parity data is calculated and written on another node. With dual parity there is more parity data, so it can sustain more failures. Single parity requires three nodes and can sustain the failure of one node or of disks in one node. Dual parity requires a minimum of four nodes and can sustain the loss of two nodes or of disks in two nodes.

Parity volumes are great for disk-space efficiency, but their performance is poor, so they are really only suitable for backup or archival environments.

New Resiliency options

Depending on the number of nodes, some options are not available. In the case of a two-node cluster there is only one option left, and that is the two-way mirror. Because of that, Microsoft added two additional resiliency options specifically and exclusively for two-node cluster configurations.

Nested Mirror

With a nested mirror you basically create a local and a remote mirror in one volume. Your volume still stripes across two nodes like a regular mirrored volume, but each block you write is not only stored on the other node: it is also copied on the same node and kept twice on the remote node. The picture below illustrates this:

In this case you can not only lose a node, but also a drive on the remaining node. With nested mirror volumes you are much more resilient and can sustain more drive losses than with a two-way mirror. Unfortunately it is not efficient: you have only 25% of the capacity available. But if availability is critical, this is your way forward.

Nested Mirror Accelerated Parity

The second new flavor is nested mirror-accelerated parity. Let’s start by explaining it with a picture.

When you create a nested mirror-accelerated parity volume, the storage does a little trick. Part of the volume, let’s say 20%, is mirrored storage, and the other 80% is created as parity; both parts are also copied to the second server. This way the storage consumption is much more efficient, because 80% is parity and only 20% is nested mirror. The storage uses the mirrored part of the volume as a cache to improve performance and moves the data to the parity part after it has been written. Pretty cool! But how does it perform? That is a question we have received a lot, so we tested it!

Testing Parameters

We use VMFleet and DiskSpd to test the performance of the different volumes. With these tools we can quickly create a large number of VMs and disks to use for testing. Once the VMs are deployed, you can start the load tests on all VMs simultaneously with a single command. During our tests we used the following test parameters:

  • Outstanding IO: 8

  • Block size: 4k / 8k / 64k

  • Threads: 10

  • Write: 0% / 30% / 100%

  • VMs: 14 VMs per node

We then run three series of tests per type of volume. The first series is based on 4k blocks, for which we run a 100% read, a 70% read, and a 0% read test. We repeat this process for the 8k and finally the 64k block size, which results in a total of nine tests per volume type.
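
To give an idea of what such a sweep looks like under the hood, the sketch below loops over the block sizes and write ratios with DiskSpd, which is roughly what VMFleet dispatches inside every test VM. The tool path, target file, file size, and duration are example values, not the exact commands we used:

    # Sketch: sweep block sizes and write ratios with DiskSpd (paths, file size and duration are examples)
    foreach ($b in '4K', '8K', '64K') {
        foreach ($w in 0, 30, 100) {
            # -t threads, -o outstanding I/O, -w write %, -r random, -Sh no caching, -L latency stats, -c create test file
            C:\Tools\diskspd.exe -b$b -t10 -o8 -w$w -d300 -r -Sh -L -c10G D:\vmfleet-test.dat
        }
    }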

Mirrored Volume Test

In this first test we created two mirrored volumes of 600 GB each and deployed 14 VMs to each volume. After that we started the series of tests with the parameters above. See the diagrams below for details.
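
For reference, a regular two-way mirror volume like the ones used in this test can be created with a single New-Volume call. A minimal sketch, with an example pool wildcard and volume name:

    # Sketch: create a 600 GB two-way mirror CSV volume on the S2D pool (names are examples)
    New-Volume -StoragePoolFriendlyName "S2D*" -FriendlyName "Mirror01" `
        -FileSystem CSVFS_ReFS -ResiliencySettingName Mirror -Size 600GB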

Nested Mirrored Volume

For our second test we create two nested mirror volumes of 600 GB each and deploy 14 VMs to each volume. We then run the same tests as before, which gives us the following results.
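
Nested volumes are created from storage tier templates rather than a plain resiliency setting name: you first define a tier with four data copies and then build the volume from that tier. A sketch, assuming an all-flash pool (use -MediaType HDD if that matches your capacity drives); the tier and volume names are examples:

    # Sketch: define a nested two-way mirror tier (four data copies), then create a 600 GB volume from it
    New-StorageTier -StoragePoolFriendlyName "S2D*" -FriendlyName NestedMirrorTier `
        -ResiliencySettingName Mirror -MediaType SSD -NumberOfDataCopies 4

    New-Volume -StoragePoolFriendlyName "S2D*" -FriendlyName "NestedMirror01" `
        -StorageTierFriendlyNames NestedMirrorTier -StorageTierSizes 600GB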

Nested Mirror Accelerated Parity

In the last test we again create two volumes, but this time nested mirror-accelerated parity volumes. The mirror part is 100 GB and the parity part is 500 GB. We then deployed 14 VMs to each volume and started the series of tests with the same parameters as before.
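
Creating this type of volume follows the same tier-based pattern: a nested mirror tier plus a nested parity tier, combined into a single volume with the 100 GB / 500 GB split used here. Again a sketch with example names; check Microsoft’s nested resiliency documentation for the exact tier parameters that fit your hardware:

    # Sketch: nested parity tier next to the nested mirror tier defined earlier
    New-StorageTier -StoragePoolFriendlyName "S2D*" -FriendlyName NestedParityTier `
        -ResiliencySettingName Parity -MediaType SSD -NumberOfDataCopies 2 `
        -PhysicalDiskRedundancy 1 -NumberOfGroups 1 `
        -FaultDomainAwareness StorageScaleUnit -ColumnIsolation PhysicalDisk

    # Combine both tiers into one volume: 100 GB mirror part, 500 GB parity part
    New-Volume -StoragePoolFriendlyName "S2D*" -FriendlyName "NestedMAP01" `
        -StorageTierFriendlyNames NestedMirrorTier, NestedParityTier -StorageTierSizes 100GB, 500GB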

Conclusion

As we can see in the tests displayed above, there is little to no performance loss when we only read data from a nested mirror volume compared to a regular mirrored volume. Because not all the data sits in the mirrored part of a nested mirror-accelerated parity volume, we see a bigger performance drop when reading from that volume type. When the data set you read is smaller and/or the mirrored part of the volume is bigger, there should be very little performance difference.

When we start writing data, differences in performance appear, which is only logical. For the nested mirror volume, the extra mirroring means each write is stored four times instead of twice, so effectively half of the write capacity of the physical disks is lost, and creating more copies simply takes more time. The nested mirror-accelerated parity volume is of course slower still, because a lot of parity calculations have to take place, which causes a big performance hit, especially for write operations.

 

If you need the added resiliency that nested resiliency in Windows Server 2019 brings and still want a good amount of disk capacity, it is better to invest in additional storage and use nested mirror volumes. Going forward with nested mirror-accelerated parity is not advisable for VM workloads.

Want to know more, or do you have questions about nested resiliency? Drop us an e-mail.

    Azure Stack HCI local vs stretched volume performance

    Azure Stack HCI OS Stretched Clustering

    One of the great new features in Azure Stack HCI OS is stretched clustering support for Storage Spaces Direct (S2D). With stretched clustering you can place S2D nodes in different sites for high availability and disaster recovery.

    While Windows Server 2019 already gave you the ability to use stretched clustering, it was not yet possible with S2D-enabled hosts. With the arrival of Azure Stack HCI OS there is no holding back, and we can now benefit from stretched clustering for hyper-converged systems!

    As you might have heard or read about here, Azure Stack HCI will move forward as a new operating system with integration into Azure. The new Azure Stack HCI OS contains lots of new features that we tried before and during public preview. It is important to understand that we did this testing with a preview version, released as Azure Stack HCI 20H1; performance on the GA version can be different.

    Stretched clustering for HCI is a very welcome feature that a lot of customers have been requesting for a long time. But are there performance differences compared to single-site clusters? We were curious about that too, so we did some testing.

    Stretched Volumes

    Before we start testing, first a little background information. When hyper-converged nodes are stretched across two sites, you have the ability to stretch volumes across the sites. While it seems like there is only one volume, if you dive below the surface you will see multiple volumes. Only one of them, the primary volume, is accessible, and it is accessible from both sites at the same time. The secondary volume in the other site is on standby and only receives changes from the primary volume, just like in any other stretched or metro cluster solution. When disaster strikes and the primary volume goes offline, the replica is brought online in the other site; the VMs fail over and start, so the applications are accessible again.

    When you have created a stretched cluster and are ready to deploy volumes, you have two options: you can create either a synchronous or an asynchronous volume. More information on which option to choose follows in the next sections.

    Asynchronous

    With an “Asynchronous” volume the system accepts a write on the primary volume and responds to the application with an acknowledgement as soon as it is written there. The system then tries to replicate the change to the replica volume as fast as possible; it could be milliseconds or seconds later before the replication is finished. Depending on the amount of changes and the intervals of the system, a failure of the primary site could therefore mean losing changes that were already written to the primary volume but not yet to the replica volume.

    Synchronous

    A volume that is set up as a “Synchronous” volume only responds to the application with an acknowledgement after the write has been committed in both sites: the write is accepted by a node and copied to the other site, and once both copies have been written the application receives the acknowledgement from the storage. When the primary site fails there is no data loss, since it is in sync with the secondary site.
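
    Under the hood this replication is handled by Storage Replica, and the synchronous/asynchronous choice corresponds to its replication mode. Windows Admin Center can set this up for you, but a minimal PowerShell sketch looks roughly like the following; every computer, replication-group, and volume name here is a made-up example:

        # Sketch: pair a data volume and its log volume between the two sites (all names are examples).
        # Use -ReplicationMode Asynchronous instead for an asynchronous stretched volume.
        New-SRPartnership -ReplicationMode Synchronous `
            -SourceComputerName "AMS-NODE01" -SourceRGName "rg-ams" `
            -SourceVolumeName "C:\ClusterStorage\Volume1" -SourceLogVolumeName "C:\ClusterStorage\Log1" `
            -DestinationComputerName "UTR-NODE01" -DestinationRGName "rg-utr" `
            -DestinationVolumeName "C:\ClusterStorage\Volume2" -DestinationLogVolumeName "C:\ClusterStorage\Log2"

        # Check the replication state once the partnership exists
        (Get-SRGroup).Replicas | Select-Object DataVolume, ReplicationStatus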

    Topology

    To give a better understanding of what our test setup looks like we provide some extra information.

    In this case we have four servers that contain only flash drives. The servers are physically in the same rack, but we simulated two sites based on two subnets. The primary site is called Amsterdam, the secondary site is called Utrecht.

    In this setup the servers from both sites are in the same rack and the cable distance is only a few meters instead of several kilometers or miles, so there is no additional latency caused by the distance between the sites. That is important to keep in mind.

    Both sites contain two servers, and each server has:

    – One volume that is not replicated to the other site but only between the nodes in the same site.
    – One stretched synchronous volume
    – One stretched asynchronous volume

    Per server we have a total of three volumes, and on each volume we deployed 10 VMs for testing.

    Testing the setup

    We use VMFleet and DiskSpd to test the performance of the volumes. With these tools we can quickly create a large number of VMs and disks to use for testing. Once the VMs are deployed, you can start the load tests on all VMs simultaneously with a single command. During our tests we used the following test parameters (a sketch of the resulting sweep follows the list):

    • Outstanding IO: 16

    • Block size: 4k

    • Threads: 8

    • Write: 0% / 30% / 100%
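
    For these runs the sweep is simpler than in our nested resiliency tests: one block size and three write ratios. Assuming the classic VMFleet start-sweep.ps1 script, it could look roughly like this (the duration is an example value):

        # Sketch: 4k blocks, 8 threads, 16 outstanding I/Os, swept over 0%, 30% and 100% writes
        foreach ($w in 0, 30, 100) {
            .\start-sweep.ps1 -b 4 -t 8 -o 16 -w $w -d 300
        }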

    Local Volumes Tests

    First we start testing with the local volumes and boot the 40 VMs (10 VMs per volume) on the local volumes. Then we conduct the three tests based on zero writes, 30% writes and 100% writes. The results can be seen below.

    Synchronous Stretched Volumes Test

    Next we test the VMs that are deployed on the stretched volumes with synchronous replication. As before, we start only the 40 VMs deployed on these stretched volumes and run the same tests.

    Asynchronous Stretched Volume Tests

    For our last test we also use stretched volumes, but this time the asynchronous ones. Again we only start the 40 VMs that are located on these volumes and run the same tests.

    Conclusion

    To wrap things up, we have put all our results from the tables above in a diagram. Now we can visualize the difference between the various types of volumes. As you can see in the diagram, there is almost no difference between the types of volumes when we only read data.

    The differences start to show when we write data: for both the synchronous and the asynchronous volumes they are huge compared to the local volumes. Considering these systems are right next to each other, it will only get worse when there is, for example, 50 km of fiber connection between the sites.

    Note: the tests above were conducted with a 4k block size, which is the most intensive size for the replication logs to keep up with. With an 8k or 16k block size, which are considered more regular workloads, there will be less difference between the local and replicated volumes.

    Stretched clustering is a great way to improve availability for hyper-converged clusters, although the performance results of this preview build are not yet satisfying. It is good that we test this in the early preview stages of the Azure Stack HCI OS, so the product can get the improvements it needs before it reaches GA.

    If you have any questions or want more information about Azure Stack HCI OS or stretched clustering, let us know! We are happy to assist!

      Automatically Update Storage Spaces Direct (S2D) Clusters

      Windows Updates may seem like ordinary business, something you deal with when the time comes, but bear with us for a moment while we explain why automatic updates on Storage Spaces Direct clusters are different.

      We have all known for a long time that it is important to update our servers regularly with the latest Windows Updates, for several reasons:

      • It improves security: all software contains security flaws that can be exploited by the wrong people for the wrong reasons, and updates fix the known security issues.

      • In some cases it may improve performance, because data from the field may show that some bits or bytes were not working as efficiently as planned.

      • The stability of your environment may also increase, since reported bugs get fixed and the fixes are released through Windows Updates.

      Not your regular set of servers

      There are lots of ways to update your servers. You could do nothing, and Windows Update will at some point install the updates and eventually reboot your server. You could use Group Policies to download updates from Microsoft and schedule installation and reboot times to fit the company update policy. Other tools like Windows Server Update Services (WSUS) with GPOs, System Center Configuration Manager (SCCM), Azure Update Management, or other third-party tools can also help to update your servers in a more controlled, centralized, and efficient way. But when we look at clusters, in this case specifically hyper-converged infrastructure clusters, most of these tools are not sufficient and you should avoid using them. These HCI servers are not your regular set of servers; they require special attention and procedures to update them.

      Manually

      It is not very time-efficient, but you can do it manually. Before you start, first validate that the cluster is healthy, then put a node in maintenance mode, install the updates, and restart the node. When it is back online, monitor and wait for the storage to synchronize; when that is done, you can resume the node in the cluster. Then you continue with the second node and repeat the process for every node in the cluster. Updating one node and waiting for the storage synchronization can take anywhere between 10 minutes and several hours, depending on the change rate and the performance of the nodes. You can imagine that this can eat up several nights or weekends of IT personnel time that could be spent otherwise. A sketch of the per-node steps is shown below.
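
      In PowerShell the per-node routine boils down to something like the following sketch; the node name is an example, and the actual update installation happens with whichever mechanism you normally use:

          # Sketch of the manual per-node update routine (node name is an example)
          Get-VirtualDisk | Select-Object FriendlyName, HealthStatus      # verify all volumes are healthy first
          Suspend-ClusterNode -Name "NODE01" -Drain -Wait                 # drain roles and pause the node
          # ...install Windows Updates on NODE01 and reboot it here...
          while (Get-StorageJob | Where-Object JobState -eq 'Running') {  # wait for the storage repair jobs to finish
              Start-Sleep -Seconds 60
          }
          Resume-ClusterNode -Name "NODE01" -Failback Immediate           # resume the node, then repeat for the next one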

      Virtual Machine Manager

      System Center Virtual Machine Manager (SCVMM) can help by automating the update procedure for your S2D clusters. This way IT personnel can spend their time on other matters and human error is kept to a minimum. Virtual Machine Manager has specific support for Storage Spaces Direct and Azure Stack HCI clusters and takes care of updating, restarting, and monitoring the storage repair jobs for you. You only need to start it, sit back, and let Virtual Machine Manager take care of the rest.

      Cluster Aware Updating

      Where SCVMM is additional software that you need to purchase or may already have purchased, Cluster-Aware Updating (CAU) is a free tool embedded in every Windows Server system as a feature. CAU is also capable of dealing with S2D and Azure Stack HCI clusters. Just like VMM, CAU automates the update procedure and is aware of storage synchronization jobs.
      Three benefits of using Cluster-Aware Updating (see the sketch after this list):

      1. CAU allows update scheduling, so updates are installed on a specific day and time.

      2. The ability to use pre/post scripts to perform custom (PowerShell) actions before or after the update of a node.

      3. CAU is able to install drivers and firmware in the process.
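
      A minimal sketch of what that can look like; the cluster name, schedule, and script paths are made-up examples:

          # Sketch: run one cluster-aware updating pass on demand, with optional pre/post scripts per node
          Invoke-CauRun -ClusterName "S2D-CLU01" -MaxFailedNodes 0 -MaxRetriesPerNode 3 `
              -PreUpdateScript "C:\Scripts\Pre.ps1" -PostUpdateScript "C:\Scripts\Post.ps1" `
              -RequireAllNodesOnline -Force

          # Or enable self-updating on a schedule, here every third Sunday of the month
          Add-CauClusterRole -ClusterName "S2D-CLU01" -DaysOfWeek Sunday -WeeksOfMonth 3 -Force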

      Azure Automation

      Azure Update Management is a newer way of automating Windows Updates on your servers; these servers can run in Azure or in your own datacenter. As it is a cloud offering on Azure, Microsoft is investing heavily in it. But as of today you should still avoid Azure Automation Update Management for patching cluster nodes: as described earlier, this tool is not aware of clustering or storage jobs and will treat your nodes as single instances, and things can go miserably wrong fast.

      VMM or CAU?

      That leaves us with two choices. VMM and CAU both have their pros and cons, but they have one thing in common: they both save you time.
      If you want to learn more about updating your Storage Spaces Direct or Azure Stack HCI cluster and the different tools that are available, you can watch the “Automatically Update S2D Cluster” video (in Dutch for now). In about 20 minutes we talk in depth about the different tools to update Storage Spaces Direct or Azure Stack HCI clusters and go through their pros and cons. We demonstrate both update processes and tell you all you need to know! Access the video here!

      Free 20-minute video on automatically updating Storage Spaces Direct clusters (Dutch)