
Cloud Disaster Recovery



Matt Larder

Head of Cloud, Softcat

Using public cloud for disaster recovery (DR) from on-premises or from another platform is a longstanding cloud use case. The appeal is easy to understand: it lets organisations move away from legacy approaches, such as running costly, unsustainable DR systems or relying on cold DR systems that take a long time to recover.

However, the execution of Cloud-based DR is often misunderstood. Many organisations approach it as if they were still operating on-premises in a datacentre, which introduces operational complexity and unnecessary cost.

In recent years, a new use case has become much more prominent. It stems from a question at the back of many minds: can the Cloud really deliver on its promise of resilience? This has led many to wonder whether putting all their IT eggs in one Cloud basket is a risk they should avoid. So, should you consider multi-cloud disaster recovery instead?

In this guide, we’ll explore these scenarios in a little more detail and offer advice and guidance for each of them.

Cloud DR – the basics

Firstly, the use and extent of DR should be guided by the business criticality of the workload.

Crucially, the basics differ by scenario – DR in Cloud (for workloads already running there) or DR to Cloud (for on-premises workloads that need a new approach).

With that context, I’ll outline some basic principles:

  1. Understand the differences – On-premises, we generally need multiple datacentres to achieve even the most basic (high RTO and RPO) disaster recovery for IT systems. In Cloud, by contrast, a single location/region is made up of multiple physical datacentres, known as Availability Zones, which makes the region much more reliable than a single on-premises datacentre.

  2. Know your Clouds – Building on the previous point, Cloud does things differently from on-premises. While both on-premises and the Cloud often rely on third-party software or hardware solutions for backup, high availability, replication, and failover, the Cloud also offers built-in mechanisms for reliability (availability and recovery) in many of its services and features. This applies equally to DR in Cloud and DR to Cloud. For the latter, many Cloud providers offer built-in (and chargeable!) options, such as Microsoft Azure Site Recovery and AWS Elastic Disaster Recovery, as replacements for the third-party solutions used in on-premises setups.

  3. Understand your workload and its weak points – The type of cloud workload typically dictates the effort needed to keep it reliable, for example:
    1. Monolithic or static applications built on IaaS – these typically require a backup and replication mechanism to protect systems, data, and state. For example, Cloud provider IaaS snapshots plus application/database backup and replication.
    2. Cloud-native scalable applications or serverless applications – these typically use components that offer native backup and replication capabilities, such as geo-redundant point-in-time restore, AWS RDS backup and point-in-time restore, or geo-replicated object storage. In many cases, enabling the DR option can be as simple as checking a box.
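
To put point 1 in rough numbers, here is a minimal Python sketch of why spreading a workload across several Availability Zones helps. It assumes zone failures are independent and that the workload stays up as long as at least one zone is up. Both are simplifications, and the 99% per-zone figure is purely illustrative, not any provider's published number.

```python
def combined_availability(zone_availability: float, zones: int) -> float:
    """Availability of a workload that survives while at least one zone is up.

    Assumes zone failures are independent, which real outages do not
    always respect (for example, a region-wide control plane issue).
    """
    if not 0.0 <= zone_availability <= 1.0:
        raise ValueError("zone_availability must be between 0 and 1")
    if zones < 1:
        raise ValueError("zones must be at least 1")
    # The workload is down only when every zone is down at the same time.
    return 1.0 - (1.0 - zone_availability) ** zones


if __name__ == "__main__":
    # 0.99 is an illustrative per-zone availability, not a real SLA.
    for zones in (1, 2, 3):
        print(zones, f"{combined_availability(0.99, zones):.6f}")
```

With these assumed figures, two zones give roughly 99.99% and three roughly 99.9999%, which is the intuition behind a multi-zone region being more reliable than one datacentre.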

Cloud DR – the options and considerations

Building on these basics and the native controls described above, the following disaster recovery options are typical:

 

 

| | Option 1: Cold Standby | Option 2: Pilot Light | Option 3: Hot Standby |
| --- | --- | --- | --- |
| Description | The first Cloud region or on-premises datacentre is used as the primary site. A second Cloud region is used as a cold standby site for backup replication from the first region. Crucially, the second region is only minimally built out to keep costs down. | The first Cloud region or on-premises datacentre is used as the primary site. A second Cloud region is used as a standby site with more advanced replication (backup or workload) from the first region. Crucially, the second region is pre-built with a minimally performant base infrastructure and accepts replication from the first region or uses technologies such as Azure Site Recovery. | The first Cloud region or on-premises datacentre is used as the primary site. A second Cloud region is used as a hot standby site with advanced replication (backup or workload) from the first region. Crucially, the second region is pre-built with infrastructure that mirrors the primary region and accepts replication from the first region or uses technologies such as Azure Site Recovery. |
| Disaster Recovery Invocation | Complete recovery and deployment of infrastructure and application services carried out using backups, followed by failover. | Scale up of base infrastructure for production load, recovery and deployment of remaining infrastructure and application services using backups, then failover (could be automated). | Recovery of data and application services and failover (could be automated). |
| Typical SLA | 99.9% | 99.95% | 99.95% |
| Typical RTO | ~1 week | Days | Hours |
| Typical RPO | Hours | Hours to minutes | Hours to minutes |
| Typical Cloud Costs | Included, no cost increase | +25% | +50% |
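
As a rough illustration of how the options above might feed a decision, the following Python sketch encodes the typical SLA and cost figures and picks the cheapest option that meets a required RTO. The SLA and cost uplift values come from the comparison above. The RTO hours are my own stand-ins for "~1 week", "days" and "hours", and the names `DrOption`, `allowed_downtime_hours` and `cheapest_option` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

HOURS_PER_YEAR = 365 * 24


@dataclass(frozen=True)
class DrOption:
    name: str
    sla_percent: float        # typical SLA from the comparison
    rto_hours: float          # assumed upper bound for the RTO band
    cost_uplift_percent: int  # typical extra cloud cost from the comparison


# RTO hours are illustrative assumptions, not figures from the comparison.
OPTIONS = [
    DrOption("Cold Standby", 99.9, 168.0, 0),
    DrOption("Pilot Light", 99.95, 72.0, 25),
    DrOption("Hot Standby", 99.95, 8.0, 50),
]


def allowed_downtime_hours(sla_percent: float) -> float:
    """Hours per year a service can be down while still meeting the SLA."""
    return (1.0 - sla_percent / 100.0) * HOURS_PER_YEAR


def cheapest_option(required_rto_hours: float) -> Optional[DrOption]:
    """Return the lowest-cost option whose RTO fits, or None if none does."""
    candidates = [o for o in OPTIONS if o.rto_hours <= required_rto_hours]
    return min(candidates, key=lambda o: o.cost_uplift_percent, default=None)
```

For example, `cheapest_option(24)` returns the Hot Standby entry under these assumed RTO bounds, and `allowed_downtime_hours(99.9)` works out to about 8.76 hours a year, against about 4.38 hours for 99.95%.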

 

Multi-cloud DR – separating fact from fiction

Many organisations are questioning their selection of a Cloud provider and whether using just one is a risk.

My initial and continuing reaction to this question is: why use more than one? But let’s explore some reasons:

  • Workload suitability – this is my go-to answer. If one Cloud provider's features are a better fit for a particular workload, then branch out. However, for the majority of organisations, one Cloud provider should suffice.
  • Provider resilience – many organisations do question whether building all their IT on one provider's platform is a risk. I would consider this in two parts:
    1. Part 1 – Stability – AWS, Microsoft and Google are among the largest organisations, and probably among those you would consider the most ‘financially stable’. However, recent years have been challenging for many behemoth organisations, and these challenges could translate into risks for you. For instance, we have seen price increases and periods of service unavailability from these big-3 providers over the years.

    2. Part 2 – Effort/Cost vs Risk – in my experience, this is the more practical part of the question. Perhaps you’re already using multiple clouds, or the risk posed is too great for you to ignore. Either way, the practicality of achieving multi-Cloud DR will come down to how deeply you are integrated into the Cloud:

      - If most of your workloads are static and IaaS-based, then you could follow the guidance in this blog and use tools (cloud-native or third-party) to back up or replicate your systems to another platform. While the process itself isn't too demanding, keep in mind that there might be compatibility issues when you fail over. Therefore, thorough testing is key.

      - If most or some of your workloads are Cloud-native or serverless, then I would argue the effort and time required to unpick what you have isn't worth it, given how improbable the risk is. That’s not to say it isn’t achievable. Depending on your deployment, for example if it’s Kubernetes-based, there are lots of great third-party software platforms that can sit a layer above the Cloud, making this task more accessible. However, it's crucial to consider whether you implemented such technology from the start or are now considering retrofitting it to manage the risk.
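
The effort/cost versus risk trade-off above can be roughed out as an expected-loss comparison. This is a minimal Python sketch, not a formal risk model: the function names are hypothetical, and the probabilities and costs you feed in are your own estimates.

```python
def expected_annual_loss(annual_probability: float, impact_cost: float) -> float:
    """Expected yearly loss: chance of the event times what it would cost."""
    if not 0.0 <= annual_probability <= 1.0:
        raise ValueError("annual_probability must be between 0 and 1")
    return annual_probability * impact_cost


def mitigation_is_worthwhile(
    annual_probability: float,
    impact_cost: float,
    annual_mitigation_cost: float,
    residual_probability: float = 0.0,
) -> bool:
    """True when the loss avoided per year exceeds the yearly mitigation cost.

    residual_probability is the chance the event still hurts you after
    mitigation (for example, a failover that does not work as planned).
    """
    avoided = expected_annual_loss(annual_probability, impact_cost) - (
        expected_annual_loss(residual_probability, impact_cost)
    )
    return avoided > annual_mitigation_cost
```

With made-up figures, a 1% yearly chance of a 2,000,000 outage gives an expected loss of about 20,000 a year, so a multi-cloud DR setup costing 150,000 a year would not pay for itself on this measure alone. At a 20% yearly chance it would.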

Conclusion

To summarise, for production workloads, you always need a DR plan – don’t rely on a single Cloud region.

Take your time to understand the differences between on-premises and Cloud, and the cloud-native features available, to make your DR plan as simple as it can be.

Additionally, consider the probability of the risk you’re trying to mitigate as the primary factor in determining whether you should mitigate it.

Ultimately, remember that solutions are not one-size-fits-all. For instance, organisations with large-scale hybrid cloud deployments may still require third-party solutions to meet their specific requirements. So, consider your unique needs and find a solution that suits you.

Cloud services from Softcat

Work with Softcat for all your cloud technology needs. Our specialists offer an independent view of your cloud journey and practical advice tailored to your needs.

For more information, get in touch with our team today.