New strategies to help your cloud deployment survive an outage

Many view the cloud as a fail-safe, always-available environment. In reality, though, almost all the large cloud providers have suffered one or more major outages in the last 12 to 24 months.

As magical as the cloud can seem, its services run on computing resources housed in a data center, and when that data center has an issue, services can and will be affected. Recently, for example, an issue in a major Microsoft data center in San Antonio, TX, shut down servers supporting Azure and disrupted service.

Only those systems and applications designed to take advantage of Azure’s geo-dispersed availability and recoverability features, or that employ some other way to bring a replicated or synchronized environment online, can typically survive a data center outage.

While the leading cloud vendors have made significant investments to reduce the likelihood of a large-scale outage, there will always be some risk for consumers of these services. In most cases, however, this risk is significantly lower than the risk you would carry if you managed your own data center, which makes the cloud an extremely attractive option.

The key to minimizing the impact of a potential cloud outage is understanding the availability and recovery services your provider offers.

Taking advantage of availability zones and paired regions

Both Azure and AWS, for example, provide Availability Zones and regions as a means to segment and isolate data and workloads from the impact of failures. Availability Zones are independent data centers connected at high speeds within a given region. They are designed to provide an additional layer of protection, so that an event such as a fire or a power or cooling failure in a single data center does not affect the region as a whole.

Regions are geographically dispersed to limit exposure to events that might span a wide metropolitan or geographic area, such as tornadoes, storms, flooding, or even large-scale explosions. A given region may contain three availability zones. Azure has 54 regions across the globe, with eight in the United States alone.
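To make the isolation idea concrete, here is a minimal sketch of spreading instances across the zones of one region and checking what survives if a single zone goes offline. It does not call any cloud API; the zone names, instance names and both helper functions are hypothetical and exist only for illustration.

```python
from collections import defaultdict
from itertools import cycle


def spread_across_zones(instances, zones):
    """Assign instances to zones round-robin so no single zone holds them all."""
    if not zones:
        raise ValueError("at least one zone is required")
    placement = defaultdict(list)
    for instance, zone in zip(instances, cycle(zones)):
        placement[zone].append(instance)
    return dict(placement)


def survivors_after_zone_loss(placement, failed_zone):
    """Return the instances still running if one zone goes offline."""
    return [
        instance
        for zone, members in placement.items()
        if zone != failed_zone
        for instance in members
    ]


# Hypothetical zone and instance names, for illustration only.
zones = ["zone-1", "zone-2", "zone-3"]
instances = ["web-a", "web-b", "web-c", "web-d", "web-e", "web-f"]

placement = spread_across_zones(instances, zones)
remaining = survivors_after_zone_loss(placement, "zone-2")
print(placement)
print(f"{len(remaining)} of {len(instances)} instances survive a zone-2 outage")
```

With three zones, losing any one of them leaves roughly two thirds of the capacity running, which is the point of placing workloads in more than one zone.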

For true high availability across multiple cloud applications, especially legacy apps in IaaS, you need multi-site replication and failover, with a process or service to automate the recovery and cutover. For Azure, two common strategies are:

  • The use of Azure-to-Azure ASR (Azure Site Recovery), which replicates an environment between two paired Azure regions and provides orchestrated failover during an outage.
  • The use of application high-availability features that take advantage of distributed systems, such as SQL Server Always On clustering and load-balanced application or web services that span Availability Zones or regions.
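Both strategies come down to the same decision: serve from the primary location while it is healthy, and cut over to a replica when it is not. The sketch below illustrates only that decision logic under simple assumptions. The region names and the `pick_active_region` function are hypothetical, and the health values would come from whatever monitoring you already run; this is not how ASR or any load balancer is implemented.

```python
def pick_active_region(health, priority):
    """Return the first healthy region in priority order, or None if all are down.

    health:   dict mapping region name -> True (healthy) / False (impaired)
    priority: list of region names, primary first
    """
    for region in priority:
        # A region missing from the health report is treated as unhealthy.
        if health.get(region, False):
            return region
    return None


# Hypothetical region names; a real deployment would use its own paired regions.
priority = ["primary-region", "secondary-region"]

normal = {"primary-region": True, "secondary-region": True}
outage = {"primary-region": False, "secondary-region": True}

print(pick_active_region(normal, priority))  # primary-region
print(pick_active_region(outage, priority))  # secondary-region
```

Keeping the choice in one small, testable function makes it easier to rehearse the cutover before a real outage forces it.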

For PaaS, born-in-the-cloud applications, the use of availability zones and paired regions is designed into your cloud services from the beginning, and it should be part of the architecture and design of any cloud-native app.

IaaS environments, as opposed to PaaS, often house legacy applications that may not take advantage of native cloud scalability features and instead rely on High Availability (HA) architectures built into the server OS and the applications themselves. This can be complex and challenging to maintain, and dependability varies by OS and application. It can also require significant ongoing management and monitoring to ensure it works as expected when needed.

Azure-to-Azure ASR helps companies address these IaaS limitations by continuously replicating a complete environment across availability zones or regions. It allows VMs, networks and supporting services to be brought up on demand within a different region. It also simplifies disaster recovery testing through its “run book”-like pre-checks and tests. You may still have a small window of downtime if an Azure data center or service in a region is impacted, but you now have a reliable way to fail over the environment and bring it back up in another availability zone or region.
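The run-book idea can be sketched in a few lines: verify a set of pre-checks, and only if they all pass, execute the recovery steps in order. The example below is a generic illustration, not ASR's actual mechanism or API. The check names, the 300-second replication lag threshold, the `state` dictionary and both functions are assumptions made for the sake of the example.

```python
def run_prechecks(checks):
    """Run each named check and collect the names of those that fail.

    checks: list of (name, callable) pairs; each callable returns True on success.
    """
    failures = []
    for name, check in checks:
        try:
            ok = bool(check())
        except Exception:
            ok = False  # a check that raises is treated as a failure
        if not ok:
            failures.append(name)
    return failures


def failover_drill(checks, steps):
    """Run pre-checks, then the ordered recovery steps only if every check passed."""
    failures = run_prechecks(checks)
    if failures:
        return {"status": "aborted", "failed_checks": failures, "completed": []}
    completed = []
    for name, step in steps:
        step()
        completed.append(name)
    return {"status": "ok", "failed_checks": [], "completed": completed}


# Hypothetical state and steps standing in for real replication and network calls.
state = {"replication_lag_seconds": 30, "target_network_ready": True}
log = []

checks = [
    ("replication is recent", lambda: state["replication_lag_seconds"] <= 300),
    ("target network exists", lambda: state["target_network_ready"]),
]
steps = [
    ("start VMs in target region", lambda: log.append("vms")),
    ("redirect traffic", lambda: log.append("traffic")),
]

result = failover_drill(checks, steps)
print(result["status"], result["completed"])
```

Running a drill like this on a schedule is one way to confirm that the recovery path works before it is needed, rather than discovering a failed pre-check in the middle of an outage.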

Reach out to DXC Concerto for help to survive a regional outage.

Chris Lavelle is Vice President, Client Services at DXC Concerto Services. An expert in cloud service delivery and support, Chris has more than 20 years of experience in IT consulting and leadership of highly technical and service-related teams. Chris manages cloud advisory, migration and strategic delivery services for Concerto customers and partners, with an unyielding focus on driving efficiency, smooth integrations and exceptional service.
