Windows Update causes BSOD on Failover with Azure Site Recovery

Are you using Azure Site Recovery (ASR) with Windows Server 2019 VMs? There is a big chance that you are unprotected right now and that a failover will end in a BSOD.

Over the last couple of months we have been involved in multiple projects to modernize customer datacenters with Azure Stack HCI, on top of that enabling Azure features like Azure Backup and Azure Site Recovery (ASR). During failover tests we ran into a problem with Windows Server 2019 VMs.

Case 1: A regular ASR Test failover

In the beginning of February 2021 we completed a migration from a Hyper-V environment to a new Azure Stack HCI solution with Azure Site Recovery as the disaster recovery solution. All VMs (most of them running Windows Server 2019) were migrated to Azure Stack HCI and the business-critical VMs were protected with ASR. During the test failover all VMs booted successfully in Azure and were then tested by the customer for availability and functionality.

The failover test was a success, but due to a migration of a core application, the customer scheduled an additional failover test to include the new application once their migration was finished. But more on that later.

Case 2: Test failover ends in an error

The second case was very similar to the first: a migration to an Azure Stack HCI environment with all business-critical VMs protected by ASR. Here only one VM (a domain controller) ran Windows Server 2019. The other VMs were running a variety of 2016, 2012 R2 and 2012.

Around the beginning of April 2021 we scheduled and executed a test failover to validate ASR. Almost all VMs booted successfully in Azure after the test failover was completed, except one: the Windows Server 2019 VM kept booting into the Windows Boot Manager with an error.

After some research and several reboots, the VM seemed to boot into Windows but was struck by a BSOD on Wdf01000.sys. And we all know, that’s not good…

After the BSOD the VM boots into the Windows Boot Manager and is stuck there. In this case the customer offered to investigate the issue themselves and we left it at that, waiting to proceed with the failover test once the customer had finished their investigation.

Back to Case 1

Meanwhile the application was migrated to new VMs and the customer requested an additional Azure Site Recovery failover test. This time it did not go that well. After the test failover, all Windows Server 2019 VMs booted into the Windows Boot Manager with the same error code as we had seen in our second case.

We noticed the same BSOD on Wdf01000.sys. This time we could do the investigation ourselves and started downloading the VHD to retrieve the memory.dmp file. The memory dump pointed to vmstorfl.sys, the Virtual Storage Filter Driver used on Windows VMs…

This was working three months earlier in the beginning of February; something had broken it for almost all Windows Server 2019 VMs.

What has happened?

When looking at the environment we noticed that after the successful failover test the February cumulative update (KB4601345) was installed. With the February update the OS version is set to 10.0.17763.1757. That update does not touch vmstorfl.sys, which stays at version 10.0.17763.771. The May update brings vmstorfl.sys to version 10.0.17763.1911, so the file has changed. Unfortunately there is no documentation available and no mention of this issue in the February update. The same applies to the March and April updates, and even the May update that fixed the issue does not mention it.

We did some testing of our own in a lab environment to rule out third-party antivirus, backup agents or any other software. We set up several VMs with a clean installation of Windows Server 2019 (1809). As soon as the February, March or April update is installed, the VMs experience the BSOD.
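To check whether a VM is exposed before you rely on a failover, you can inspect the file version of vmstorfl.sys inside the guest. A minimal sketch in PowerShell, using the version numbers mentioned above:

```powershell
# Check the Virtual Storage Filter Driver version inside the guest OS
$driver  = Get-Item "$env:SystemRoot\System32\drivers\vmstorfl.sys"
$version = $driver.VersionInfo.ProductVersion
"vmstorfl.sys version: $version"

# 10.0.17763.1911 is the version shipped with the May 2021 update (KB5003171)
if ([version]$version -lt [version]'10.0.17763.1911') {
    Write-Warning 'vmstorfl.sys predates the May 2021 update - failover to Azure may BSOD.'
}
```

Run this in each protected Windows Server 2019 VM; anything older than 10.0.17763.1911 warrants patching before the next failover test.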

Update as soon as possible and test your DR solution

Based on these two cases and our own lab testing, we found that something in the February update causes Windows Server 2019 VMs to bluescreen on failover to Azure.

With the May update (KB5003171) installed, this issue is fixed. So to all companies out there depending on Azure Site Recovery as your datacenter disaster recovery solution: if you are using Windows Server 2019 VMs, make sure you install the May update as soon as possible. Otherwise your disaster recovery solution does not work at all!
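A quick way to verify the fix is present is to query the installed updates on each Windows Server 2019 VM. A minimal sketch, assuming KB5003171 itself is listed (a later cumulative update that supersedes it would also contain the fix):

```powershell
# Check whether the May 2021 cumulative update is installed
$fix = Get-HotFix -Id KB5003171 -ErrorAction SilentlyContinue
if ($fix) {
    "KB5003171 installed on $($fix.InstalledOn)."
} else {
    # A newer cumulative update may supersede KB5003171,
    # so also compare the OS build against 17763.1911
    Write-Warning 'KB5003171 not found - verify the OS build is 17763.1911 or later.'
}
```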

In addition, these kinds of issues prove once again that it is vital to test your DR solution from time to time. You probably didn’t know that for the past three months your DR solution with Azure Site Recovery was useless for your Windows Server 2019 VMs with the February update applied. And it remains useless until the May update is installed and synced.

If you need any help or got questions please get in touch!

Azure Stack HCI Mirror vs Nested Volume Performance

Got a two node Storage Spaces Direct Cluster? Then Windows Server 2019 can bring you more resiliency with the same hardware, but how does it perform…

Two nodes are popular

The adoption of Storage Spaces Direct and Azure Stack HCI clusters is growing month by month. Over the years, two-node clusters have become very popular. But with only two nodes you are a bit more at risk: your precious data is only available on two nodes. And when disaster strikes and you lose a disk in both nodes, or a node and a disk… volumes go offline ☹.

Windows Server 2019 brings a new form of volume resiliency to the table: nested resiliency. Nested resiliency comes in two flavors, “Nested Mirror” volumes and “Nested Mirror-Accelerated Parity” volumes.


This blog is not meant to explain all the details about nested resiliency; Microsoft already did a great job of that here. Feel free to check it out. But to make this blog clearer to our readers, here is a little background information.

In Windows Server 2016 Storage Spaces Direct we have these four resiliency options available to create volumes.

Based on the number of nodes you have in your cluster, you can choose a form of resiliency.


With two nodes you can have two copies of the data; with three nodes you can have three copies. In a two-way mirror you can lose one node, or a disk in one node, without volumes going offline. In a three-way mirror you can lose two nodes, or a disk in two of the three nodes. With three nodes or more you can stick with the three-way mirror.


Single parity works just like a traditional RAID 5 configuration: blocks are written and parity data is calculated and written on another node. With dual parity there is more parity data, to sustain more failures. Single parity requires three nodes and can sustain the failure of one node, or of disks in one node. Dual parity requires a minimum of four nodes and can sustain the loss of two nodes, or disks in two nodes.

Parity volumes are great for disk-space efficiency, but their performance is poor and they are only considered valid for backup or archival workloads.
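For reference, these classic resiliency types are what you pick when creating a volume with `New-Volume`. A sketch, assuming an S2D cluster whose pool name starts with the default "S2D"; the friendly names and sizes are examples:

```powershell
# Two-way mirror (two nodes): two copies of the data
New-Volume -FriendlyName 'Mirror2Way' -FileSystem CSVFS_ReFS `
    -StoragePoolFriendlyName 'S2D*' -Size 1TB `
    -ResiliencySettingName Mirror -PhysicalDiskRedundancy 1

# Three-way mirror (three or more nodes): three copies of the data
New-Volume -FriendlyName 'Mirror3Way' -FileSystem CSVFS_ReFS `
    -StoragePoolFriendlyName 'S2D*' -Size 1TB `
    -ResiliencySettingName Mirror -PhysicalDiskRedundancy 2

# Dual parity (four or more nodes): space efficient, slow writes
New-Volume -FriendlyName 'DualParity' -FileSystem CSVFS_ReFS `
    -StoragePoolFriendlyName 'S2D*' -Size 1TB `
    -ResiliencySettingName Parity -PhysicalDiskRedundancy 2
```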

New Resiliency options

Depending on the number of nodes, some options are not available. In the case of a two-node cluster there is only one option left: the two-way mirror. Because of that, Microsoft added two additional resiliency options specifically and exclusively for two-node cluster configurations.

Nested Mirror

With a nested mirror you basically create a local and a remote mirror in one volume. The volume spans two nodes like a regular mirrored volume, but each block you write is not only copied to the other node: it is also duplicated locally, so each node holds two copies. The picture below gives a good understanding:

In this case you can not only lose a node, but also a drive on the remaining node. With nested mirror volumes you are much more resilient and can sustain more drive losses than in a two-way mirror. Unfortunately it is not efficient: you have only 25% of the capacity available. But if availability is critical, this is your way forward.
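Creating a nested mirror volume is done by first defining a storage tier with four data copies and then creating the volume on that tier. A sketch, assuming an all-SSD two-node pool (adjust -MediaType to your hardware; names and sizes are examples):

```powershell
# Define a nested two-way mirror tier: 4 data copies, 2 per node
New-StorageTier -StoragePoolFriendlyName 'S2D*' `
    -FriendlyName 'NestedMirror' -ResiliencySettingName Mirror `
    -MediaType SSD -NumberOfDataCopies 4

# Create a volume on the nested mirror tier
New-Volume -FriendlyName 'NestedMirrorVol' -FileSystem CSVFS_ReFS `
    -StoragePoolFriendlyName 'S2D*' `
    -StorageTierFriendlyNames 'NestedMirror' -StorageTierSizes 600GB
```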

Nested Mirror Accelerated Parity

The second new flavor is nested mirror-accelerated parity… Let’s explain it first by showing a picture.

When you create a nested mirror-accelerated parity volume, the storage does a little trick. Part of the volume, let’s say 20%, is mirrored storage; the other 80% is created as parity. Both parts are also copied to the second server. This makes storage consumption much more efficient, because 80% is parity and only 20% nested mirror. The storage uses the mirrored part of the volume as a cache to improve performance and moves data to the parity part after it has been written. Pretty cool! But how does it perform? That is a question we have received a lot, so we tested it!
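Such a volume is built from two storage tiers, a nested mirror tier and a nested parity tier, combined into one volume. A sketch, assuming an all-SSD two-node pool; the tier names and the 20/80 split are examples:

```powershell
# Mirror tier: 4 copies (the ~20% write-cache part)
New-StorageTier -StoragePoolFriendlyName 'S2D*' `
    -FriendlyName 'NestedMirrorTier' -ResiliencySettingName Mirror `
    -MediaType SSD -NumberOfDataCopies 4

# Parity tier: parity locally, mirrored to the other node (the ~80% part)
New-StorageTier -StoragePoolFriendlyName 'S2D*' `
    -FriendlyName 'NestedParityTier' -ResiliencySettingName Parity `
    -MediaType SSD -NumberOfDataCopies 2 `
    -PhysicalDiskRedundancy 1 -NumberOfGroups 1 `
    -FaultDomainAwareness StorageScaleUnit -ColumnIsolation PhysicalDisk

# One volume combining both tiers (20% mirror / 80% parity)
New-Volume -FriendlyName 'NestedMAPVol' -FileSystem CSVFS_ReFS `
    -StoragePoolFriendlyName 'S2D*' `
    -StorageTierFriendlyNames 'NestedMirrorTier','NestedParityTier' `
    -StorageTierSizes 120GB, 480GB
```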

Testing Parameters

We used VMFleet and DiskSpd to test the performance of the different volumes. With these tools we can quickly create a large number of VMs and disks for testing. Once the VMs are deployed, you can start the load tests on all VMs simultaneously with a single command. During our tests we used the following parameters:

  • Outstanding IO: 8

  • Block Size : 4k / 8k / 64k

  • Threads : 10

  • Write: 0% / 30% / 100%

  • VMs: 14 VMs per node

We then ran three series of tests per type of volume. The first series is based on 4K blocks: a 100% read, a 70% read and a 0% read test. We repeated this process for the 8K and the 64K block sizes, resulting in a total of nine tests per volume type.
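Translated to DiskSpd switches, one run from this matrix looks roughly like the following; the target path and duration are examples, not the exact values we used:

```powershell
# 4K blocks, 30% writes, 10 threads, 8 outstanding IOs per thread
# -b4k = block size, -t10 = threads, -o8 = outstanding IO, -w30 = write %
# -d60 = 60 s duration, -r = random IO, -Sh = disable caching, -L = latency stats
.\diskspd.exe -b4k -t10 -o8 -w30 -d60 -r -Sh -L C:\ClusterStorage\Volume1\test.dat
```

VMFleet wraps this so the same DiskSpd run starts inside every VM at once.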

Mirrored Volume Test

In this first test we created two mirrored volumes of 600 GB and deployed 14 VMs to each volume. After that we started the series of tests with the parameters above. See the diagrams below for details.

Nested Mirrored Volume

For our second test we created two nested mirrored volumes of 600 GB each and deployed 14 VMs to each volume. We then ran the same tests as before, which gave us the following results.

Nested Mirror Accelerated Parity

In the last test we again created two volumes, but this time nested mirror-accelerated parity volumes. The mirror part is 100 GB and the parity part 500 GB. We then deployed 14 VMs to each volume and started the same series of tests.


As the tests above show, there is little to no performance loss when we only read data from a nested mirror volume compared to a regular mirrored volume. Because not all data sits in the mirrored part of a nested mirror-accelerated parity volume, we see a bigger performance drop when reading from it. When the data set you read is smaller and/or the mirrored part of the volume is bigger, there should be very little performance difference.

When we start writing data, differences in performance appear, which is very logical. With the nested mirror volume we lost half of the physical disk capacity we can write to because of the extra mirroring, and creating more copies takes more time. The nested mirror-accelerated parity volume is of course slower, because a lot of parity calculations have to take place, which causes a big performance hit, especially for write operations.


If you need the added resiliency of Windows Server 2019 nested resiliency and a good amount of disk capacity, it is better to invest in additional storage and use nested mirror volumes. Going forward with nested mirror-accelerated parity is not advisable for VM workloads.

Want to know more, or do you have questions about nested resiliency? Drop us an e-mail.

    Azure Stack HCI local vs stretched volume performance

    Azure Stack HCI OS Stretched Clustering

    One of the great new features in Azure Stack HCI OS is stretched clustering support for Storage Spaces Direct (S2D). With stretched clustering you can place S2D nodes in different sites for high availability and disaster recovery.

    While in Windows Server 2019 you already have the ability to use stretched clustering, it was not yet possible with S2D enabled hosts. With the arrival of Azure Stack HCI OS, there is no holding back and we can now benefit from stretched clustering for hyper-converged systems!

    As you might have heard or read about here, Azure Stack HCI will move forward as a new operating system with integration to Azure. The new Azure Stack HCI OS contains lots of new features that we tried before and during public preview. It is important to understand that we did this testing with a preview version, released as Azure Stack HCI 20H1. Performance on the GA version can be different.

    Stretched clustering for HCI is a very welcome feature, requested by a lot of customers for a long time. But are there performance differences compared to single-site clusters? We too were curious and did some testing.

    Stretched Volumes

    Before we start testing, first a little bit of background information. When hyper-converged nodes are stretched across two sites, you have the ability to stretch volumes across the sites. While it seems like there is only one volume, if you dive below the surface you will see multiple volumes. Only one of them, the primary volume, is accessible. The secondary volume in the other site is on standby and only receives changes from the primary volume, just like in any other stretched or metro cluster solution. When disaster strikes, the primary volume goes offline and the replica is brought online in the other site. The VMs fail over and start, so the applications are accessible again.

    When you create a stretched cluster and are ready to deploy volumes, you have two options: you can create either an asynchronous or a synchronous volume. More info on which option to choose is described below.


    With an “asynchronous” volume, the system accepts a write on the primary volume and responds to the application with an acknowledgement as soon as it is written locally. The system then synchronizes the change to the replica volume as fast as possible; replication may finish milliseconds or seconds later. Depending on the amount of changes and the intervals of the system, in case of a failure of the primary site we could lose changes that were already written to the primary volume but not yet to the replica.


    A volume that is set up as a “synchronous” volume responds to the application with an acknowledgement only after the write has been committed in both sites. The write is accepted by a node and copied to the other site; when both blocks have been written, the application receives the acknowledgement from the storage. When the primary site fails there is no data loss, since it is in sync with the secondary site.
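Under the hood a stretched volume is a Storage Replica partnership, and the replication mode is the setting that distinguishes the two flavors. A sketch using the Storage Replica cmdlets; the node and replication-group names are hypothetical:

```powershell
# Inspect the existing stretched volume partnerships and their mode
Get-SRPartnership |
    Select-Object SourceRGName, DestinationRGName, ReplicationMode

# Switch a partnership between Synchronous and Asynchronous replication
Set-SRPartnership -SourceComputerName 'AMS-NODE1' -SourceRGName 'rg-volume01' `
    -DestinationComputerName 'UTR-NODE1' -DestinationRGName 'rg-volume01-replica' `
    -ReplicationMode Asynchronous
```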


    To give a better understanding of what our test setup looks like we provide some extra information.

    In this case we have four servers that only contain flash drives. The servers are physically in the same rack, but we simulated two sites based on two subnets. The primary site is called Amsterdam, the secondary site Utrecht.

    In this setup the servers of both sites are in the same rack and the cable distance is only meters instead of several kilometers or miles, so there is no additional latency caused by distance between the sites. That is important to keep in mind.

    Both sites contain 2 servers and each server has:

    – One volume that is not replicated to the other site but only between the nodes in the same site.
    – One stretched synchronous volume
    – One stretched asynchronous volume

    Per server we have a total of 3 volumes and on each volume we deployed 10 VMs for testing.

    Testing the setup

    We again used VMFleet and DiskSpd to test the performance of the volumes. With these tools we can quickly create a large number of VMs and disks for testing. Once the VMs are deployed, you can start the load tests on all VMs simultaneously with a single command. During our tests we used the following parameters:

    • Outstanding IO: 16

    • Block Size : 4k

    • Threads : 8

    • Write: 0% / 30% / 100%

    Local Volumes Tests

    First we started testing with the local volumes and booted the 40 VMs (10 VMs per volume) on those volumes. Then we conducted the three tests based on 0% writes, 30% writes and 100% writes. The results can be seen below.

    Synchronous Stretched Volumes Test

    Next, we tested the VMs deployed on the stretched volumes with synchronous replication. Like before, we only started the 40 VMs deployed on these volumes and ran the same tests.

    Asynchronous Stretched Volume Tests

    For our last test we again used a stretched volume, but this time an asynchronous one. Again we only used the 40 VMs located on these volumes and ran the same tests.


    To wrap things up, we have put all our results from the tables above in a diagram. Now we can visualize the difference between the various types of volumes. As you can see in the diagram, there is almost no difference between the types of volumes when we only read data.

    The difference starts to show when we write data. The differences between the synchronous and asynchronous volumes and the local volumes are huge. Considering these systems are right next to each other, it will be worse when there is, for example, 50 km of fiber between the sites.

    Note: the tests above were conducted with a 4K block size, which is considered the most intensive size for the replication logs to keep up with. With an 8K or 16K block size, which are considered more regular workloads, there will be less difference between the local and replicated volumes.

    Stretched clustering is a great way to improve availability for hyper-converged clusters, although the performance results of the preview build are not yet satisfying. It’s good that we test this in the early preview stages of Azure Stack HCI OS, so the product gets the improvements it needs before it reaches GA.

    If you have any questions or want more information about Azure Stack HCI OS, or stretched clustering let us know! We are happy to assist! 

      Azure Stack HCI with Kubernetes – part 2

      Introduction to Virtual machines and Containers 

      Back in 2016 Microsoft released a new type of OS called Nano Server, together with the Windows Containers feature. Kubernetes had just been released and Docker had already been working on containers for some time. Back then it was all about the jokes with shipping containers and garbage containers, but since then container usage has grown and been adopted by all big vendors on a large scale. It has become yet another game changer in today’s IT infrastructures and application development.

      Today all the big cloud providers like Microsoft with Azure, Amazon with AWS and Google with GCP offer containers based on Docker and Kubernetes. If you want to run containers yourself in your own datacenter you can use Docker, Kubernetes, Windows Containers with Docker on Windows or Linux.  

      On the other hand, virtual machines are a common good these days and will stay with us for a long time, because not everything can be containerized or is relevant in a containerized form. Wouldn’t it be great, then, if you could share your infrastructure to run Windows and Linux VMs alongside Windows and Linux containers?

      Microsoft released Azure Stack HCI and AKS on Azure Stack HCI; these products give you the ability to run containers and VMs on your own datacenter hardware, managed and deployed through the Azure portal and Windows Admin Center.

      In this blog we’ll talk a little bit about Kubernetes and how it works, but also about the possibilities we have with Azure, Azure Arc and Azure Stack HCI as a virtualization and storage platform to run VMs, and containers managed by Kubernetes.

      Virtual Machines

      With a virtual machine the hardware is virtualized, and the operating system is running on top of virtual hardware instead of the physical hardware. Inside the OS you can practically do everything as on a physical computer. The VM is running on top of a virtualization host along with multiple other VMs.

      On a decent virtualization platform we want to make sure that VMs are highly available. In the case of a host failure the VM is quickly moved to another system and booted; in a matter of seconds the VM is back, with access and functionality restored. For this to work we need shared storage. This can be provided in various ways, such as a traditional SAN with Fibre Channel or iSCSI access, or hyper-converged, like Storage Spaces Direct. In addition, we need a cluster service to make sure that when a node fails, the other systems detect it and take action. Within Windows, the Failover Clustering feature takes care of this.


      When we look at a container there is some overlap. A container is an isolated, lightweight instance to run an application on the host operating system. This host can be a physical machine or a virtual machine. Containers are built on top of the host operating system’s kernel and contain only apps and some lightweight operating system APIs and services that run in user mode. If you have a Windows VM with Docker you can deploy Windows containers; on a Linux VM you can deploy Linux containers. Because a container shares the kernel, you cannot mix Windows and Linux containers on the same underlying OS.
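You can see the kernel-sharing constraint for yourself on a Windows Server container host; a sketch, where the image tags are examples:

```powershell
# A Windows container runs fine on a Windows container host
docker run --rm mcr.microsoft.com/windows/nanoserver:1809 cmd /c echo hello from Windows

# A Linux image on the same host needs a Linux kernel (WSL 2 or
# Hyper-V isolation); without one, Docker refuses to run it because
# the image OS does not match the host kernel
docker run --rm alpine echo 'needs a Linux kernel'
```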

      For containers and VMs the same applies: we want the application running inside to be highly available in case something fails. This is where VMs and containers differ. With VMs we have the failover cluster manager to detect failures and take action accordingly. With containers we don’t use the failover cluster manager, because deploying, rebuilding and so on is handled by another management layer. This is where container orchestrators such as Kubernetes come into play.

      Kubernetes and Fail-over Clusters

      With VMs and containers the same rule applies: treat them as cattle, not as pets, meaning that you don’t want too much dependency on any individual one.

      VMs are bigger in size and contain persistent data. If we destroyed one or spun up a new one it would take more time, and you could potentially lose data. That’s why they are stored on shared storage: in case of a failure the failover cluster manager boots the VM on another host, which can also access that shared storage, and it’s up and running again.

      Containers are very small and, in most cases, don’t contain any data. It is easier and faster to just deploy new ones. Container orchestrator platforms like Kubernetes take care of this: they detect when containers are down, spin up new ones on other hosts and make sure they are accessible.


      Kubernetes manages the deployment of resources (not only containers). Kubernetes has several objects and building blocks it uses to deploy, manage and publish resources, which we will deep dive into in another blog. For now, it is important to know that Kubernetes consists of a management cluster (the control plane) with master nodes, plus additional worker nodes to run workloads.

      Master Nodes

      A production Kubernetes cluster requires a minimum of three master nodes. The master nodes manage the deployment of the various components required to deploy containers and communicate with them. They also provide an API layer for the workers to communicate with the masters; the same API is used to deploy workloads. Master nodes can run on physical or virtual machines, but only on a Linux-based OS.

      Worker Nodes

      The worker nodes are used to run the container workloads. Worker nodes are also known as Minions….. 

      Let’s hope these minions behave better than the yellow dudes and don’t turn it all into chaos…

      The worker nodes can be either Linux or Windows. The Windows option gives us a lot of flexibility with Azure Stack HCI, but before we go down that path, we dive a little deeper into the Kubernetes-on-Windows requirements first.

      Worker Nodes on Windows

      To be able to add Windows workers to a Kubernetes cluster, the worker must run at minimum Windows Server 2019 or Azure Stack HCI OS, with Kubernetes version 1.17 or above. In addition, the Windows Containers feature and Docker are required. There are other container engines available, but Docker is widely used and has the best support for Windows, so we recommend using Docker. Besides these requirements we also need some additional things, like networking and storage on the worker nodes, which we will discuss in the next parts of this blog series. Once the requirements are set up, we have a working Windows worker capable of running containers deployed and managed by Kubernetes.
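The prerequisites on the Windows side can be sketched as follows; the kubeadm join line is an example of one way to join a worker, with placeholder values you would replace with your own cluster's endpoint and token:

```powershell
# Verify the OS build (Windows Server 2019 = build 17763 or later)
[System.Environment]::OSVersion.Version

# Install the Windows Containers feature (a reboot is required)
Install-WindowsFeature -Name Containers -Restart

# After installing Docker, join the node to the cluster, for example:
kubeadm join <control-plane-endpoint>:6443 --token <token> `
    --discovery-token-ca-cert-hash sha256:<hash>
```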

      Windows and Linux Containers 

      As described earlier in this blog, you cannot mix container OSes on one host, but that is only fully true for Linux workers. A Linux worker node cannot run Windows containers, but a Windows worker can run both Windows and Linux containers thanks to WSL (Windows Subsystem for Linux). With a Kubernetes cluster and Windows worker nodes, or let’s say mixed worker nodes, you can run both Linux and Windows containers, and that is a great opportunity!

      Azure Stack HCI & Azure Kubernetes Service (AKS)

      Azure Stack HCI is the Microsoft Hyper-converged infrastructure offering which is the basis for a software-defined datacenter. HCI brings together highly virtualized compute, storage, and networking on industry-standard x86 servers and components.

      With Azure Stack HCI we are able to create a robust platform to host virtual machines, and simultaneously these virtual machines are the foundation for a robust container platform. Because Azure Stack HCI makes use of clustering, it’s also suitable to host the Kubernetes cluster itself, making sure that the VMs hosting the Kubernetes cluster are spread among physical machines to reduce downtime.

      Microsoft has released Azure Kubernetes Service on Azure Stack HCI to save you the hassle of setting up Kubernetes yourself. Just as in Microsoft Azure, with AKS you get your own Kubernetes clusters deployed and managed for you, but in your own datacenter. This brings a lot of advantages to the table, such as lower latency and data locality.

      Getting started with AKS on Azure Stack HCI

      Read more about AKS on Azure Stack HCI on the Microsoft Docs page here.
      To get started and download you can head over to the preview registration page here.

      Microsoft released a great blog post on how Kubernetes is intertwined with Azure Stack HCI and the storage components. It explains the basics and how to get started using Windows Admin Center.
      Do you want to discuss how AKS on HCI matches your challenges? Reach out!

        Azure Stack HCI with Kubernetes

        The game of abstracting infrastructure is moving fast. If you don’t keep up, you could end up in a world where people point their finger at you and whisper “legacy”.

        Looking back a decade, hardware evolved quickly and virtualization technologies came to the rescue, allowing higher densities of workloads on one or multiple physical servers in the form of virtual machines. Applications running in those VMs benefit from high availability: if a physical server fails, another server takes over the virtual machine. The hypervisor technology creates hardware abstraction. However, the virtual machine is still bound to the underlying hypervisor and most probably to the hardware the hypervisor is using. This means you can move virtual machines between the same type of hypervisor and hardware, but moving, for example, a VMware VM to Hyper-V is not possible without conversion. The same goes for moving to or between public cloud providers: the portability is simply not good enough. In other words, moving from one platform to another is not a one-click action.


        Being tied to a specific platform of choice is not very convenient but was accepted for many years. Applications would run in virtual machines and those virtual machines would be on a hypervisor platform. 

        Containers form the new wave of innovation and modernization of applications. Containers run in virtual machines called ‘container hosts’. While running in virtual machines, the platform abstracts away the underlying infrastructure (the hypervisor).

        This means you can run one container host on Hyper-V and another on VMware and deploy the same container to both. Using containers, organizations are no longer tied to specific platforms but can be platform agnostic.

        Management of containers is a different ball game compared to management of virtual machines. A virtual machine typically runs one application, and the VM exists as long as the application does. In the container landscape, an application can consist of multiple containers that are created when needed and destroyed when not used. This requires a different type of toolset, and Kubernetes is the Swiss army knife that has all the tools built in.


        Kubernetes is a container orchestrator platform, but it has a lot more capabilities. Seeking agnostic infrastructure, you can use Kubernetes to abstract the infrastructure away from your applications. The container hosts mentioned above are included within Kubernetes and become ‘worker nodes’ where containers are deployed. Kubernetes now orchestrates your container landscape, it notices if more containers are needed or when containers can be removed because of inactivity. Because the Kubernetes nodes can run anywhere you’d like, and Kubernetes manages where containers are deployed, your application is now highly portable and abstracted from any platform.  
        Kubernetes itself also needs to run somewhere and is likewise distributed across multiple virtual machines, referred to as the ‘Kubernetes management cluster’.

        In part 2 of this blog series we’ll go in full detail how Kubernetes works. 

        Kubernetes cluster in the cloud

        The major cloud providers were not ignoring the container era and thus provide customers Kubernetes clusters as a service: Amazon’s Elastic Kubernetes Service (EKS), Microsoft’s Azure Kubernetes Service (AKS) and Google Kubernetes Engine (GKE). The Kubernetes cluster is abstracted as well, into a PaaS service by the cloud providers, but you could run it anywhere you’d like. The same goes for the worker nodes: you could make use of AKS and run your Kubernetes worker nodes in AWS, Google Cloud and Azure Stack HCI simultaneously. Now… that’s a true hybrid cloud.
        In this blog series we’re exploring the relationship between ‘traditional’ infrastructure, modern hyper-converged infrastructure and Kubernetes, from an IT pro point of view.

        Read Azure Stack HCI with Kubernetes - part 2 here!

        Azure Stack HCI details

        Microsoft has announced a successor to the current Azure Stack HCI. The current solution is based on Windows Server 2019, using Hyper-V and Storage Spaces Direct. The new Azure Stack HCI solution is based on a new operating system originating from Windows Server 2019, called Azure Stack HCI OS.

        On our dedicated Azure Stack HCI page we have explained what the solution is all about. In this blog, we’re diving a little deeper into the details.

        Azure Stack HCI Operating System 

        Azure Stack HCI is not only the name of the solution, but also the name of the operating system. That means that Azure Stack HCI OS is breaking loose from Windows Server and its (slow-paced) release cadence. Azure Stack HCI OS will be updated much more frequently, like the SAC releases, providing new features and improvements at a faster rate.

        As Azure Stack HCI is released before the upcoming version of Windows Server, we also get the announced enhancements sooner than expected, such as: 

        Full stack automatic updates

        Firmware and drivers are updated through integration with Windows Admin Center. Automatic, with no manual intervention needed.
        See this screenshot from Dell EMC for the visuals, or take a look at their 7-minute video here.

        Storage rebuilds 75% faster

        Azure Stack HCI includes a completely renewed Storage Spaces Direct repair mechanism! The cluster now tracks the changes in the data at a much finer granularity. This improves rebuild times up to 75%, narrowing maintenance windows further. 

        Stretched clusters

        Azure Stack HCI also provides us with the Stretched Clustering feature, built on top of Storage Replica. Using this new feature, we can span an Azure Stack HCI cluster over multiple sites, providing business continuity and disaster recovery (BCDR) capabilities. 
        Azure Stack HCI supports synchronous and asynchronous replication. 

        Affinity and Anti-Affinity

        With the release of Azure Stack HCI there is a new feature included called ‘VM Affinity and Anti Affinity’.  


        With Affinity rules you can bind two or more resources together. For example, you may want your front-end web servers and back-end database servers in the same physical location to avoid latency and increase performance. 


        With Anti-Affinity, we achieve the exact opposite. 
        If we want to distribute front-end webservers over multiple physical locations (fault domains) we can use Anti-Affinity rules. 
        When one physical location is offline due to maintenance or unexpected failure, you make sure your application stays online. 

        Windows Admin Center

        With the release of Azure Stack HCI, Microsoft also heavily invested again in Windows Admin Center. Windows Admin Center now includes cluster creation options, and with that several workflows to create different types of clusters, like HCI, HCI+SDN, and more.  
        With these workflows we can set up the cluster completely using Windows Admin Center. Automation in the background makes sure the requested components are installed according to best practices. 

        Stripped down OS

        Because Azure Stack HCI is intended for HCI clusters only, the OS is stripped of unnecessary features. Meaning, many features that are currently part of the Windows Server OS will not be available in the Azure Stack HCI OS. 
        Current features and roles in Windows Server 2019: 268 
        Current features and roles in Azure Stack HCI: 193 
        For example, Active Directory and related roles such as DNS, Certificate Services, Federation Services, DHCP, and Print Services will not be included, and more features might follow.  
        These features will still be available in the regular Windows Server releases, just not in Azure Stack HCI.  

        Azure Stack HCI Billing

        Since Azure Stack HCI is a cloud solution, the billing model will change to a cloud billing model.

        Traditional Windows Server licensing

        With Windows Server there always has been a licensing model calculated per physical processor core. Depending on the number of physical processor cores in your server, a number of core-packs must be purchased.

        Azure Stack HCI licensing

        With Azure Stack HCI you are also licensed per physical processor core. The difference with Windows Server licensing is that there is no concept of core-packs: you pay for the number of physical processor cores in your cluster.

        With this model the licensing costs switch from a CAPEX to an OPEX model.
        When the Azure Stack HCI cluster is scaled down or up, the day-to-day expenses change accordingly.
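        The CAPEX-versus-OPEX difference can be made concrete with a small calculation. The sketch below compares the two models; note that all prices, the 2-core pack size, and the 16-core per-host minimum are illustrative placeholders, not actual Microsoft pricing.

```python
# Hypothetical cost sketch: traditional core-pack licensing (one-time CAPEX)
# versus Azure Stack HCI per-core monthly billing (recurring OPEX).
# All prices below are made-up placeholders, not actual Microsoft pricing.

def corepack_capex(total_cores: int, price_per_2core_pack: float,
                   min_cores_per_host: int, hosts: int) -> float:
    """One-time cost: core-packs cover 2 cores each, with a per-host minimum."""
    licensed_cores = max(total_cores, min_cores_per_host * hosts)
    packs = -(-licensed_cores // 2)  # ceiling division: whole packs of 2 cores
    return packs * price_per_2core_pack

def hci_opex(total_cores: int, price_per_core_month: float, months: int) -> float:
    """Recurring cost: Azure Stack HCI bills per physical core per month."""
    return total_cores * price_per_core_month * months

# Example: a 4-node cluster with 16 physical cores per node (64 cores total).
capex = corepack_capex(64, price_per_2core_pack=200.0, min_cores_per_host=16, hosts=4)
opex_3yr = hci_opex(64, price_per_core_month=10.0, months=36)
print(f"Core-pack CAPEX (one-time): ${capex:,.2f}")
print(f"Azure Stack HCI OPEX (3 years): ${opex_3yr:,.2f}")
```

        The OPEX figure scales linearly with the cluster size month by month, which is exactly why scaling the cluster down or up immediately changes the day-to-day expenses.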

        Because the billing is managed through Microsoft Azure, we can leverage the tools available to get more insight into costs. For example, with Azure Cost Management’s cost analysis we can query the information and provide forecasts. In addition, the Azure APIs can be used with third-party tooling for cost management.

        Guest operating systems not included

        One important aspect to note is that guest operating systems are not included in the license, unlike with Windows Server 2019 Datacenter edition.
        This means that you will need to separately license the VMs running on the Azure Stack HCI solution.

        Azure Connection required once per month

        Because the billing runs through Microsoft Azure, the cluster must be registered to Microsoft Azure within 30 days after deployment. After registration, the cluster needs to connect to Microsoft Azure once every 30 days to report cluster status. If the cluster is unable to report, it will be out of policy.


        Support via Azure support tickets

        As a cloud solution, support for Azure Stack HCI falls under the umbrella of Microsoft Azure support. That means you can go to the Azure Portal and file a support request there for your Azure Stack HCI solution.


        Azure Stack HCI resource provider

        Microsoft has created a dedicated resource type in Azure Resource Manager for Azure Stack HCI clusters.

        By registering Azure Stack HCI clusters to the resource provider in Microsoft Azure an Azure Resource is created that represents the cluster.


        Self-service VMs through Azure Portal

        Want to offer your users a consistent experience with Azure? You now can.
        Azure Stack HCI makes use of the same toolset as Microsoft Azure, including the portal and ARM templates. Using Azure Resource Manager (ARM) you can also delegate access to users in your Azure AD.

        Contact Splitbrain for more information

        Unsure how the new Azure Stack HCI fits in your organization? Or what is going to happen to your existing Azure Stack HCI clusters based on Windows Server 2019?

        Contact us, we’re happy to help you.

          Automatically Update Storage Spaces Direct (S2D) Clusters

          Windows Updates may seem like ordinary business, or something you will deal with when the time comes, but bear with us for a moment while we explain why automatic updates on Storage Spaces Direct are different.

          For a long time now, we have all known that it’s important to update our servers regularly with the latest Windows Updates, for several reasons: 

          • It improves security because all software contains security flaws. Those flaws can be exploited for the wrong reasons by the wrong people. The updates fix the known security issues.

          • In some cases, it may improve performance, because data from the field may give insights that some bits or bytes were not working as efficiently as planned.

          • The stability of your environment may also increase, since bugs are reported, fixed, and released through Windows Updates.

          Not your regular set of servers

          There are lots of ways to update your servers. You could do nothing, and Windows Update will at some point install the updates and eventually reboot your server. You could use Group Policies to download updates from Microsoft and schedule installation and reboot times to fit the company update policy. Other tools like Windows Server Update Services (WSUS) with GPOs, System Center Configuration Manager (SCCM), Azure Update Management, or third-party tools can also help to update your servers in a more controlled, centralized, and efficient way. But when we look at clusters, in this case specifically Hyperconverged Infrastructure clusters, most of these tools are not sufficient and you should avoid using them. These HCI servers are not your regular set of servers; they require special attention and procedures to update them.


          You can do it manually, though it is not very time-efficient. Before you start, first validate that the cluster status is healthy, then put a node in maintenance mode, install updates, and restart the node. When it’s back online, monitor and wait for the storage to synchronize. When that’s done, you can resume the node in the cluster. Then you can continue with the second node and repeat the process for every node in the cluster. Updating one node and waiting for the storage synchronization can take anywhere between 10 minutes and several hours, depending on the change rate and performance of the nodes. You can imagine that this can take up several nights or weekends of IT personnel time that could be spent otherwise.
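          If you were to script this manual procedure, the control flow would look something like the Python sketch below. The cluster operations (`is_healthy`, `drain`, `install_updates`, and so on) are hypothetical stubs standing in for the real failover-cluster and storage operations; the point is the order of the steps, not the API.

```python
# Sketch of the per-node update sequence described above. The `cluster`
# object is a hypothetical wrapper around the real cluster operations.
import time

def update_cluster(nodes, cluster):
    for node in nodes:
        # Never touch a node while the cluster is degraded.
        assert cluster.is_healthy(), "cluster must be healthy before each node"
        cluster.drain(node)            # put the node in maintenance mode
        cluster.install_updates(node)  # install Windows Updates on the node
        cluster.reboot(node)
        # Wait for Storage Spaces Direct to finish resynchronizing before
        # resuming the node; this is the step generic patch tools skip.
        while cluster.storage_repair_running():
            time.sleep(60)
        cluster.resume(node)           # only then move on to the next node
```

          The key detail is the wait on the storage repair jobs: resuming or draining the next node before the resync finishes is exactly how generic update tooling gets an S2D cluster into trouble.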

          Virtual Machine Manager

          System Center Virtual Machine Manager (SCVMM) can help by automating the update procedure for your S2D clusters. This way IT personnel can spend their time on other matters and human error is kept to a minimum. Virtual Machine Manager has specific support for Storage Spaces Direct and Azure Stack HCI clusters and takes care of updating, restarting, and monitoring the storage repair jobs for you. You only need to start it, sit back, and let Virtual Machine Manager take care of the rest.

          Cluster Aware Updating

          Where SCVMM is additional software you need to purchase (or may already have purchased), Cluster Aware Updating (CAU) is a free tool embedded in every Windows Server as a feature. CAU is also capable of dealing with S2D and Azure Stack HCI clusters. Just like VMM, CAU automates the update procedures and is aware of storage synchronization jobs.
          Three benefits of using Cluster Aware Updating:

          1. CAU allows update scheduling to install updates on a specific day and time

          2. Ability to use pre/post scripts to perform custom (PowerShell) actions before or after an update of a node.

          3. CAU is able to install drivers and firmware in the process.

          Azure Automation

          Azure Update Management is a new way of automating Windows Updates on your servers. These servers can run in Azure or in your own datacenter. As it is a cloud offering on Azure, Microsoft is heavily investing in it. But still, today you should avoid using Azure Automation Update Management to patch cluster nodes. As described earlier, this tool is not aware of clustering or storage jobs and will treat your nodes as single instances, and things can go miserably wrong fast.

          VMM or CAU?

          That leaves us with two choices. VMM and CAU both have their pros and cons, but they have one thing in common: they both save you time.
          If you want to learn more about updating your Storage Spaces Direct or Azure Stack HCI cluster and the different tools that are available, you can watch the “Automatically Update S2D Cluster” video (in Dutch for now). In about 20 minutes we talk in depth about the different tools to update Storage Spaces Direct or Azure Stack HCI clusters and go through the pros and cons. We demonstrate both update processes and tell you all you need to know! Access the video here!

          Free 20-minutes video on Automatically update Storage Spaces Direct Clusters (Dutch)

          How to get your file server in Azure

          In today’s cloud-oriented world, lots of file servers lose the battle to modern solutions like Teams and SharePoint. But what if these solutions don’t work for your company? For example, the files are not supported on those platforms, or the applications working with the files don’t support accessing them from any other type of share than CIFS or SMB. Or the files are too big and accessed too regularly, causing latency or too big a demand on the internet connection.

          Are you out of luck, unable to take advantage of cloud services for this? Maybe not. Typically, for most organizations, 80% of the data is rarely or never accessed. What if the clients and applications require that the hot data is available with low latency? We can use Azure File Sync to move the 80% of cold data to Azure Files. This way we save disk space on local storage. If all the files are stored in Azure, it is also possible to protect them with Azure Backup, which is most likely cheaper and automatically offsite.

          Let’s look at Azure Files and Azure File Sync!

          What is Azure Files?

          With Azure Files you have the ability to store files in Azure storage accounts, which can be presented as file shares that you can access over SMB. Microsoft has put in a lot of effort the past year to enable NTFS permissions on these shares, and they have done a pretty good job in making them more usable now. They enabled AD integration so you can use NTFS permissions and groups. You can access the shares from remote offices and your datacenters with Azure Private Link. And last but not least, you can also use DFS-N (DFS Namespaces) now.

          At Splitbrain we always recommend the use of DFS-N in file servers. DFS-N gives you great flexibility in case of migrations or when you need to move files and folders around to other disks or servers.

          What resources do we need?

          You can use different storage accounts based on your needs. For example, from an availability point of view you can use Locally Redundant Storage (LRS), Zone-Redundant Storage (ZRS), or go global with Geo-Redundant Storage (GRS). And from a performance point of view there is a choice between standard and premium storage.


          In Azure you pay for the resources you use. For Azure Files, you can take a look at the Azure Files pricing overview to calculate how much it will cost to migrate your files to Azure Files. Keep in mind that not only storing the files in the share costs money: accessing, listing, and changing the files is also charged, per x amount of operations. In addition, downloading the files from Azure Files to the client is charged as described here.
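          To get a feel for how these components add up, here is a rough monthly cost sketch. The rates used are hypothetical placeholders; use the actual Azure Files pricing page for real numbers.

```python
# Rough monthly cost sketch for an Azure Files share: stored capacity,
# transactions (list/read/write operations), and download (egress) all
# contribute. All rates below are illustrative placeholders.
def azure_files_monthly_cost(stored_gib, transactions, egress_gib,
                             rate_per_gib=0.06,
                             rate_per_10k_transactions=0.015,
                             rate_egress_per_gib=0.08):
    storage = stored_gib * rate_per_gib
    ops = (transactions / 10_000) * rate_per_10k_transactions
    egress = egress_gib * rate_egress_per_gib
    return storage + ops + egress

# Example: 2 TiB stored, 5 million transactions, 100 GiB downloaded per month.
print(f"${azure_files_monthly_cost(2048, 5_000_000, 100):.2f}")  # → $138.38
```

          Note that even with placeholder rates, the transaction and egress terms are what surprise people: a share that is read heavily can cost noticeably more than its raw storage price suggests.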

          What is Azure File Sync?

          With Azure File Sync you can extend your current file server to Azure on a tiered storage principle. You use your current file server as the endpoint for your clients, with all the features there are today, and offload the bulk of your data to Azure. You can configure the Azure File Sync agent to keep files newer than x amount of days on the file server, per share, or choose not to tier at all. This way most of the hot data stays local and only changes are synced from and to Azure.
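          The date-based tiering policy boils down to a simple rule: files untouched for longer than the configured number of days are tiered to Azure Files, leaving only a pointer on the local volume. The sketch below illustrates that decision; it is purely illustrative (the real agent also tiers based on a free-space percentage on the volume, which is not modeled here).

```python
# Illustrative sketch of the date-based cloud-tiering decision: files whose
# last access is older than `keep_days` are candidates for tiering to Azure
# Files. Not the actual agent logic, just the policy it implements.
from datetime import datetime, timedelta

def files_to_tier(files, keep_days, now=None):
    """files: iterable of (path, last_access: datetime). Returns paths to tier."""
    now = now or datetime.now()
    cutoff = now - timedelta(days=keep_days)
    return [path for path, last_access in files if last_access < cutoff]

now = datetime(2021, 6, 1)
files = [("hot.docx", datetime(2021, 5, 30)),   # accessed 2 days ago: stays local
         ("cold.pdf", datetime(2020, 1, 15))]   # untouched for over a year: tiered
print(files_to_tier(files, keep_days=30, now=now))  # → ['cold.pdf']
```

          Tuning `keep_days` per share is how you trade local disk usage against the chance that a user hits a tiered file and has to wait for a recall from Azure.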

          With Azure Files, Azure File Sync and your file server you can have best of both worlds!

          Note that you cannot use DFS-R and AFS Tiering on the same volume, they will bite each other.

          What resources do we need?

          With Azure File Sync we also require storage accounts to store the data. The same rules apply for the different types of storage accounts as described in the chapter on Azure Files. On top of that, we need a Storage Sync Service that takes care of the synchronization and access of the data from the file server.


          For the Azure Files costs, you can view the pricing chapter for Azure Files above. On top of that we also need an Azure File Sync Service (or Storage Sync Service), which is free if you only use one file server. When you have more, you will be charged for the additional servers. Keep in mind that most of your changing data lives on your file server, so your operational costs will be lower with Azure File Sync compared to Azure Files alone.

          Now let’s dive into that!

          Azure Files and Azure File Sync use cases

          To give you a better understanding of the possibilities with Azure Files and Azure File Sync, we describe some example scenarios below.

          Azure Files

          In this use case the company moves all the files to Azure Files. Most companies keep the SMB port (445) closed to the internet. In addition, there are several ISPs that block SMB on their networks. To be able to move and access the files from the datacenter and/or office locations, we first need to set up a Site-to-Site VPN or an ExpressRoute. We also need our ADDS (Active Directory Domain Services) synced to AAD (Azure Active Directory) and ADDS authentication enabled on the storage account for the file shares. The setup requires an Azure Private Link and Azure Private DNS to be able to resolve and access the Azure file shares over the ExpressRoute or VPN.

          If your company is already using several resources in Azure and you already have an ExpressRoute or VPN connection you can leverage this.

          We also need a domain controller and DNS server in our datacenter/office, and a server to host our DFS namespace. When we put all that in the mix and configure it the right way, your users are able to access the file shares running in Azure.

          User A accesses the DFS share and browses the Marketing folder. When opening a file, it is downloaded from the storage account to his device and opened.

          This scenario is great for small deployments, but it can get problematic for a number of reasons. For example, you may have lots of users accessing the files, or the files might be too large. Both would impact the available internet bandwidth and could lead to other issues in the organization. The added latency could become a problem for some applications, making the application experience slow. And if the internet connection is gone, so are the files. There are more of these technical challenges to tackle as your environment gets bigger. For these reasons, Azure Files alone might not be the solution for your organization.

          Single file server with Azure File Sync

          We have a single file server running in your datacenter with, for example, 10 terabytes of storage. That takes up a lot of space on the underlying storage system and the backup.

          For this setup we need very little configuration in Azure. An Azure subscription with a storage account and an Azure File Sync Service is all that we need. In the scenario described above, all clients connect directly over SMB to the storage account. In this setup, all users use SMB to connect to the file server, and if a file is not local, the agent pulls the parts of the file that are needed over SSL to the server. If DFS-N is in the mix, it is also easy to migrate to new, smaller disks while uploading the data to Azure to save space on your file server and storage. You can enable deduplication to further lower the local storage footprint. Beware that files are stored without deduplication when they land in the storage account, so we cannot save space there.

          In this scenario, User A accesses files from the IT folder in the DFS share that are stored on the file server and delivered from local cache. The other file is from the Marketing folder and not stored locally. The agent pulls the first bits to open the file, improving load time for the user, while it continues to download the remaining bits. This way, Azure File Sync gives a performance advantage over the first scenario with Azure Files when working with big PDFs, PowerPoints, or Word documents that need to be downloaded from Azure Files when tiered.

          If your company is already using several resources in Azure and has an ExpressRoute connection, you can take advantage of syncing over ExpressRoute instead of over the internet, and use private endpoints to further lock down access.

          File server cluster with Azure File Sync

          In some cases, a company is more dependent on the files being available at all times and runs a highly available (HA) file server. Do you require an HA file server and currently have a cluster? No problem! The Azure File Sync agent is cluster aware. All we need is a second agent added to the single-server scenario above. This way we can offload the data in the file server cluster to Azure file shares.

          Multiple sites and regions use cases

          Most Azure resources, and that includes Azure Files, are bound to their region for various reasons, for example underlying infrastructure reasons like network latency between resources. If your company operates in more geographical locations, it might not work to centralize files to a single place or a single region for Azure Files. US users accessing their files in Europe is less efficient because of the large distance. In order to use multiple regions, we need additional resources in the second region. We can use Azure Files and Azure File Sync to offload data to Azure and overcome these challenges.

          Multi region with Azure Files

          In the example below we have offices in Europe and the US. The users can access their files through their locally available DFS namespace and can even access files from other locations. All files are pulled from Azure Files to the user’s device for accessing and editing.

          Based on the setup, cross-region traffic flows might differ. In the above example, User C accesses a file in the Marketing US folder that is close to his office. User D accesses a file from the IT EU folder; the file from the Europe region is downloaded to his device and opened. Like in the first scenario, all files come directly from the internet.

          Multi region with Azure File Sync

          As described in the scenario above with Azure Files, the setup with multiple locations is also possible with Azure File Sync. Users in both locations can access the files from their local file server, and if the files are not available locally, the Azure File Sync agent pulls the bits from Azure Files transparently. When accessing files in the other region, like User C does, the same rules apply. If the file is locally available on the Europe file server, it’s presented directly to the user. If not, the agent of the file server in that region downloads the bits and provides them to the user so the file can be accessed.

          Move to Azure IaaS with Azure File Sync

          In some cases, companies move their file server to Azure IaaS. While the concept of “lift and shift” looks plain and simple for file servers, it might not always be the answer because of disk efficiency, size, and layout. Moving your file server 1-on-1 without optimizing it with cloud services can become a very cost-inefficient solution. We have worked on projects with file servers containing 15 TB of storage costing well over $2000 per month. When we optimized the file server with Azure File Sync, the costs dropped by more than 50%. The more storage, the bigger the savings.

          Pro-tip: Azure Backup

          Azure Backup is a hybrid backup solution on Microsoft Azure. With Azure Backup you are able to back up your VMs, SQL workloads, and files from your datacenter or hybrid datacenter. When using Azure Files and Azure File Sync, you can leverage Azure Backup to further reduce datacenter storage costs and take advantage of the native integration of Azure Backup with Azure Files.

          In the case of Azure Files, you may no longer have the option to back up the files in your datacenter. Well, technically you could, but it is probably not very efficient. With Azure File Sync you could do a file-level backup of your file server, but that would initiate a download of all the files, and we don’t want that to happen.

          When going forward with a hybrid datacenter and moving files to the cloud, it’s important to also include the backup strategy in your design. When using Azure Files, it could be more cost-efficient to use Azure Backup instead of your current backup solution.


          To wrap things up, we go back to our initial question: can you still move files to the cloud if you can’t use Teams or SharePoint? Yes, you can!

          If you are looking to lower your storage footprint or postpone a storage investment, the scenarios described above might benefit you greatly. Start thinking about what a hybrid datacenter could do for you.

          Although it might seem straightforward, setups like this can be quite complicated: specific features might not work as you expect, backup is not as simple anymore, and DR brings new options to the table. Depending on the design, there could also be hidden costs that are not as obvious as plain storage and/or license costs. At Splitbrain we’re happy to help you out if you are looking at a hybrid datacenter with Azure Files and/or Azure File Sync, just drop us an e-mail!
