Azure Storage Soon to Support Customer Initiated Failover

If your business applications use Azure Storage as part of their infrastructure, you might already know about its options for redundancy and disaster recovery. Those wanting peace of mind usually pick one of the geo-redundancy options, so that their data is replicated to another region and survives a regional disaster. These options include:

  • Geo-redundant storage (GRS): This option replicates your data in a second region. In the event of an outage, the secondary region then serves as a redundant source of truth for your data.
  • Read-access geo-redundant storage (RA-GRS): This is Microsoft’s recommended option. Not only does it provide geo-redundant storage, but you also gain the added benefit of read access to the secondary endpoint. What this means is, in the event of an outage, applications configured for RA-GRS and designed for high availability can continue to read from the secondary endpoint.
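As a rough illustration of what "reading from the secondary endpoint" means in practice: for RA-GRS accounts, the secondary endpoint uses the account name with a "-secondary" suffix. The sketch below only builds the URLs. The helper names are hypothetical, and it assumes the default public-cloud domain (core.windows.net) rather than a custom domain or sovereign cloud.

```python
from typing import List


def primary_endpoint(account_name: str, service: str = "blob") -> str:
    """Primary (read/write) endpoint for a storage service."""
    return f"https://{account_name}.{service}.core.windows.net"


def secondary_endpoint(account_name: str, service: str = "blob") -> str:
    """Read-only secondary endpoint exposed by RA-GRS accounts."""
    return f"https://{account_name}-secondary.{service}.core.windows.net"


def read_endpoints(account_name: str, service: str = "blob") -> List[str]:
    """Endpoints in the order a highly available reader might try them."""
    return [
        primary_endpoint(account_name, service),
        secondary_endpoint(account_name, service),
    ]
```

An application designed for high availability would try the first URL and fall back to the second for reads only. Writes to the secondary endpoint are not possible.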

One drawback of the GRS options is that Microsoft must initiate the failover to the secondary region. Documentation is not exactly clear on when Microsoft will choose to do this in the event of a regional failure. What this means is that it could end up being hours or more before your storage account and data become writable again. For many organizations, this is unacceptable.

As a workaround, we've had to engineer either dual write solutions within the software we build to simultaneously write the data to two storage containers, or engineer processes to replicate the files in near real time from one storage account to another. Doing these things gives organizations flexibility, knowing they can fail over to another storage account in a different region when they want. However, the solutions to do this are often complex, require their own monitoring, and require someone with technical skills to perform the failovers and fail-backs.
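To give a sense of why these workarounds get complex, here is a minimal sketch of the dual-write idea. It is not the code from any real project: the store interface, the in-memory stand-in, and the retry queue are all hypothetical, and a real version would wrap two storage accounts in different regions.

```python
from typing import Dict, List


class InMemoryStore:
    """Stand-in for a storage container; real code would wrap a storage SDK."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}
        self.available = True

    def put(self, name: str, data: bytes) -> None:
        if not self.available:
            raise ConnectionError("store unavailable")
        self.blobs[name] = data


def dual_write(name: str, data: bytes, primary: InMemoryStore,
               secondary: InMemoryStore, retry_queue: List[str]) -> bool:
    """Write to both stores.

    A primary failure propagates to the caller. A secondary failure is
    recorded in retry_queue so a separate process can replay it later.
    Returns True only when both copies were written.
    """
    primary.put(name, data)
    try:
        secondary.put(name, data)
    except ConnectionError:
        retry_queue.append(name)
        return False
    return True
```

Even in this toy form, the retry queue needs its own monitoring and replay process, which is exactly the overhead described above.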

Customer Initiated Failover Support

After a long wait, Microsoft now has customer-initiated failover for Azure storage accounts in preview. We expect it to become generally available this year.

This solution makes it possible for customers to manually initiate a failover of their storage account to the secondary backup region whenever they want, and no complicated engineering is needed for your applications to utilize it. Microsoft manages it all through DNS updates for your storage account endpoints.
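For teams that want to script the failover rather than click through the portal, a sketch along these lines may help. It assumes a client object shaped like StorageManagementClient from the azure-mgmt-storage Python package, whose storage_accounts operations expose begin_failover. The wrapper function name is hypothetical, and the exact SDK surface may differ while the feature is in preview, so check the package version you have installed.

```python
def initiate_failover(storage_client, resource_group: str, account_name: str) -> None:
    """Start a customer-initiated failover and wait for it to finish.

    storage_client is assumed to look like azure-mgmt-storage's
    StorageManagementClient: storage_accounts.begin_failover(...) returns
    a poller whose result() blocks until the operation completes.
    """
    poller = storage_client.storage_accounts.begin_failover(
        resource_group, account_name
    )
    poller.result()  # long-running operation; blocks until the failover is done


# Possible usage (requires azure-identity and azure-mgmt-storage;
# the subscription ID, resource group, and account name are placeholders):
#
#   from azure.identity import DefaultAzureCredential
#   from azure.mgmt.storage import StorageManagementClient
#
#   client = StorageManagementClient(DefaultAzureCredential(), "<subscription-id>")
#   initiate_failover(client, "my-resource-group", "mystorageaccount")
```

Because the endpoints are repointed through DNS, the application itself needs no change. Only the operator-side trigger is scripted here.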

The following image from Microsoft shows how this works:

[Image: FailoverGraphic]

Source: https://docs.microsoft.com/en-us/azure/storage/common/storage-disaster-recovery-guidance?toc=%2fazure%2fstorage%2fblobs%2ftoc.json

Important Things to Consider When Using Account Failover in Azure Storage

Some additional considerations and cautions from Microsoft on using the failover process include:

It May Take an Hour for Failover to Complete

After you fail over from the primary to the secondary region, your storage account is configured to be locally redundant in the new primary region. You can configure the account for geo-redundancy again by updating it to use GRS or RA-GRS. When the account is configured for geo-redundancy again after a failover, the new primary region immediately begins replicating data to the new secondary region, which was the primary before the original failover. However, it may take a period of time before existing data in the primary is fully replicated to the new secondary. Which brings me to my second point…

Avoid Data Loss

To avoid major data loss, check the value of the Last Sync Time property before failing back. Data written before the last sync time has been replicated to the secondary; anything written after it may be lost. Compare the last sync time to the last times that data was written to the new primary to evaluate the expected data loss.
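The comparison itself is simple arithmetic on timestamps. The sketch below assumes you have already obtained the Last Sync Time and a list of your application's recent write times by some other means. Both function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Iterable, List


def writes_at_risk(last_sync_time: datetime,
                   write_times: Iterable[datetime]) -> List[datetime]:
    """Writes made after the last sync may not have reached the secondary."""
    return sorted(t for t in write_times if t > last_sync_time)


def exposure_window(last_sync_time: datetime,
                    write_times: Iterable[datetime]) -> timedelta:
    """Time between the last sync and the most recent write at risk."""
    at_risk = writes_at_risk(last_sync_time, write_times)
    if not at_risk:
        return timedelta(0)
    return at_risk[-1] - last_sync_time
```

If the exposure window is zero, every known write predates the last sync and failing back should not lose data. Otherwise, the listed writes are the ones to protect or replay first.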

Azure Virtual Machines Do Not Fail Over

Azure virtual machines (VMs) do not fail over as part of an account failover. Therefore, in the event of an outage in the primary region, you will need to recreate any VMs after the failover is complete. If you need to fail over an account that contains unmanaged disks attached to Azure VMs, you’ll want to be sure to shut down the VM before initiating the failover.

Azure File Sync and Some Other Features Do Not Support Storage Account Failover

  • Storage accounts containing Azure file shares being used as cloud endpoints in Azure File Sync should not be failed over. Doing so will cause sync to stop working and may also cause unexpected data loss in the case of newly tiered files.
  • Storage accounts using Azure Data Lake Storage Gen2 hierarchical namespace cannot be failed over.
  • A storage account containing archived blobs cannot be failed over. Maintain archived blobs in a separate storage account that you do not plan to fail over.
  • A storage account containing premium block blobs cannot be failed over. Storage accounts that support premium block blobs do not currently support geo-redundancy.
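The restrictions above lend themselves to a simple pre-flight check before anyone triggers a failover. The sketch below is only an illustration: the AccountProfile fields are hypothetical flags you would populate from your own inventory, not properties returned by any Azure API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AccountProfile:
    """Hypothetical description of how a storage account is used."""
    uses_file_sync_cloud_endpoint: bool = False
    hierarchical_namespace_enabled: bool = False
    has_archived_blobs: bool = False
    has_premium_block_blobs: bool = False


def failover_blockers(profile: AccountProfile) -> List[str]:
    """Return the reasons this account should not be failed over, if any."""
    blockers = []
    if profile.uses_file_sync_cloud_endpoint:
        blockers.append("Azure File Sync cloud endpoint: sync will stop working")
    if profile.hierarchical_namespace_enabled:
        blockers.append("Data Lake Storage Gen2 hierarchical namespace is enabled")
    if profile.has_archived_blobs:
        blockers.append("account contains archived blobs")
    if profile.has_premium_block_blobs:
        blockers.append("account contains premium block blobs")
    return blockers
```

An empty list means none of the restrictions listed above apply. It does not replace checking Microsoft's current documentation, since the list of unsupported features may change while the feature is in preview.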

For more information on disaster recovery and account failover in Azure storage, you can read the complete Microsoft article here.
