Commit 3f17e68

Merge pull request #291930 from anaharris-ms/rh-shared-responsibility
Reliability Hub - Add shared responsibility article
2 parents f567753 + 0f26f00 commit 3f17e68

8 files changed (+95 additions, -12 deletions)

articles/reliability/TOC.yml

Lines changed: 6 additions & 2 deletions
@@ -2,6 +2,12 @@
 href: index.yml
 - name: What is reliability?
 href: overview.md
+- name: Reliability fundamentals
+items:
+- name: Shared responsibility for resiliency
+href: concept-shared-responsibility.md
+- name: Azure service incident response
+href: incident-response.md
 - name: Availability zone support
 items:
 - name: What are Azure availability zones?
@@ -444,8 +450,6 @@
 href: /azure/well-architected/resiliency/chaos-engineering
 - name: Reliability in Microsoft Azure Well-Architected Framework
 href: /azure/well-architected/reliability
-- name: Azure service incident response
-href: ./incident-response.md
 - name: Azure Service Manager retirement
 items:
 - name: Overview

articles/reliability/availability-zones-overview.md

Lines changed: 1 addition & 1 deletion
@@ -74,7 +74,7 @@ Many regions also have a [*paired region*](./cross-region-replication-azure.md#a

 ## Shared responsibility model

-The [shared responsibility model](/azure/security/fundamentals/shared-responsibility) describes how responsibilities are divided between the cloud provider (Microsoft) and you. Depending on the type of services you use, you might take on more or less responsibility for operating the service.
+The [shared responsibility model](./concept-shared-responsibility.md) describes how responsibilities are divided between the cloud provider (Microsoft) and you. Depending on the type of services you use, you might take on more or less responsibility for operating the service.

 Microsoft provides availability zones and regions to give you flexibility in how you design your solution to meet your requirements. When you use managed services, Microsoft takes on more of the management responsibilities for your resources, which might even include data replication, failover, failback, and other tasks related to operating a distributed system.

articles/reliability/business-continuity-management-program.md

Lines changed: 1 addition & 1 deletion
@@ -39,7 +39,7 @@ A good example of the shared responsibility model is the deployment of virtual m

 Customer-enabled disaster recovery services all have public-facing documentation to guide you. For an example of public-facing documentation for customer-enabled disaster recovery, see [Azure Data Lake Analytics](../data-lake-analytics/data-lake-analytics-disaster-recovery.md).

-For more information on the shared responsibility model, see [Microsoft Trust Center](../security/fundamentals/shared-responsibility.md).
+For more information, see [Shared responsibility for resiliency](./concept-shared-responsibility.md).

 ## Business continuity compliance: Service-level responsibility

articles/reliability/concept-shared-responsibility.md

Lines changed: 78 additions & 0 deletions
@@ -0,0 +1,78 @@
+---
+title: Shared responsibility for resiliency
+description: Learn about the shared responsibility model for resiliency in the Azure cloud platform.
+ms.service: azure
+ms.subservice: azure-availability-zones
+ms.topic: conceptual
+ms.date: 12/14/2024
+ms.author: anaharris
+author: anaharris-ms
+ms.custom: subject-reliability
+---
+
+# Shared responsibility for resiliency
+
+In the Azure public cloud platform, resiliency is a shared responsibility between Microsoft and you. Because there are different levels of resiliency in each workload that you design and deploy, it's important that you understand who has primary responsibility for each one of those levels from a resiliency perspective.
+
+To help you better understand how shared responsibility works, especially when confronting an outage or disaster, this article describes the shared responsibility *model* for resiliency. For more information on how to actually use this model to plan for disaster recovery, see [Recommendations for designing a disaster recovery strategy](/azure/well-architected/reliability/disaster-recovery).
+
+## Shared responsibility model for resiliency
+
+The shared responsibility model for resiliency consists of three levels:
+
+- [Core platform reliability](#core-platform-reliability). The Azure platform provides a base level of reliability for all customers and all services through the underlying infrastructure, services, and processes.
+- [Resilience-enhancing capabilities](#resilience-enhancing-capabilities). Azure offers a suite of built-in features and services that enhance resiliency, such as using availability zones, deploying across multiple regions, and implementing backup strategies. While Azure provides these capabilities, it's your responsibility to evaluate and configure them to align with your specific requirements. Requirements can include reliability, cost, performance, and compliance with regulatory standards.
+- [Applications](#applications). To make effective use of the other levels, your application and workload must be designed for resiliency.
+
+:::image type="content" source="media/shared-responsibility/shared-responsibility-model.jpg" alt-text="Diagram showing shared responsibility model for resiliency: Core platform reliability, resilience-enhancing capabilities, and applications." border="false":::
+
+Microsoft is solely responsible for core platform reliability. Microsoft is also responsible for providing resilience-enhancing capabilities that you can use. You're responsible for selecting and using the appropriate components.
+
+Whether you choose SaaS, PaaS, or IaaS service categories determines what kind of decisions you make. For example, if you use a SaaS service, you typically don't need to opt into using availability zones. If you use PaaS services for your data tier, you might have automated capabilities for backup available to you. If you use IaaS services, you typically need to plan and implement many resiliency capabilities yourself.
+
+> [!NOTE]
+> Service categories (SaaS, PaaS, and IaaS) are useful as a broad grouping of services, but it's important to understand your responsibilities for each individual service you use.
+>
+> The [reliability guides](./overview-reliability-guidance.md) provide an overview of how each service works from a resiliency perspective, and help you to make informed decisions about how to configure your services to meet your needs.
+
+You're also responsible for your application and workload design, and for defining your reliability requirements, which helps you to decide how to design and configure your solution.
+
+### Core platform reliability
+
+The Microsoft cloud platform consists of a large amount of infrastructure, hardware, software, and processes to support service deployment and management. Each component is designed to be highly resilient, with multiple redundancies for hardware and with research-based software processes. Together, these components comprise the core platform reliability level. Some examples of how Microsoft provides a reliable platform include the following:
+
+- Networks have redundant links and can dynamically bypass faulty segments.
+- Within each region, datacenters are connected through a low-latency network, which enables a variety of data replication approaches.
+- Datacenter facilities have redundant power, cooling, and network connections. They're operated by onsite teams who secure, monitor, and manage them.
+- Hardware, including clusters and racks, has redundancy at multiple layers.
+- Updates to compute clusters, racks, and hosts follow a controlled process. We use techniques like hotpatching to reduce or eliminate impact to hosts.
+- Software platform updates and configuration changes are applied by following our safe deployment practices.
+- Microsoft audits critical external suppliers to ensure that a third-party outage doesn't disrupt Azure services.
+- Each Azure service must have a detailed disaster recovery plan. We conduct full-region down drills in regions that match production environments.
+
+All Azure services benefit from these core platform reliability capabilities, and from the ongoing improvements Microsoft makes.
+
+### Resilience-enhancing capabilities
+
+Azure provides many different resilience-enhancing capabilities. Although Microsoft is responsible for providing these capabilities, you are entirely responsible for selecting and using the appropriate ones for your needs. Some examples of these capabilities include:
+
+- **Regions.** Azure has over 60 regions, and you can use multiple regions in a single solution to achieve geo-redundancy, meet your data residency needs, and enable low-latency communication to users globally.
+
+- **Availability zones.** Many Azure regions support availability zones, which enable you to distribute your workloads across multiple independent sets of datacenters. Azure services support availability zones in a way that suits their intended purpose, usually by supporting zonal deployments (pinned to a single zone) and/or zone-redundant deployments (spread across multiple zones). To learn more about availability zones, see [What are availability zones?](./availability-zones-overview.md).
+
+- **Service tiers.** Services provide a range of offerings and tiers that suit different requirements. For example, when you create a virtual machine, you can choose between a standard disk, which provides a low-cost option, or a premium disk to achieve a higher level of availability.
+
+- **Backups.** Many Azure services that store data support backups, which might be automatic, manual, or both. With backups, you can protect your workload against outages as well as data corruption and other data loss events.
+
+- **Governance.** Platform capabilities like Azure Policy, role-based access control, and Microsoft Entra ID identity protection capabilities can be configured to enforce your organization's requirements consistently. With these approaches you can protect your workloads against security incidents and accidental changes that might cause downtime or other problems with your workload.
+
+> [!IMPORTANT]
+> It's important to understand the *service level agreements* (SLAs) for each Azure service. SLAs provide important information on the expected uptime of the service, and any conditions you need to meet to be eligible for the SLA. For SLAs for each service, see [Service Level Agreements (SLA) for Online Services](https://www.microsoft.com/licensing/docs/view/Service-Level-Agreements-SLA-for-Online-Services).
+
+### Applications
+
+It's your responsibility to make sure that your applications are designed to be resilient. Use the [Azure Well-Architected Framework](/azure/well-architected) pillars to drive architectural excellence at the fundamental level of a workload. The [reliability pillar](/azure/well-architected/reliability/) focuses on how you can make your workload and applications resilient to different types of failures, and how to enable recovery when failures occur.
+
+## Next steps
+
+The shared responsibility model applies to other parts of your solution beyond resiliency. For more information on the shared responsibility model for security, see [Microsoft Trust Center](../security/fundamentals/shared-responsibility.md).
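The three levels introduced by the new article lend themselves to a small data model. The following is a minimal Python sketch that only restates the article's text; `Party`, `Level`, `SHARED_RESPONSIBILITY_MODEL`, `TYPICAL_DECISIONS_BY_CATEGORY`, and `customer_levels` are hypothetical names chosen for illustration and aren't part of any Azure SDK or tool.

```python
from dataclasses import dataclass
from enum import Enum


class Party(Enum):
    """Who holds a given responsibility (hypothetical encoding)."""
    MICROSOFT = "Microsoft"
    CUSTOMER = "Customer"


@dataclass(frozen=True)
class Level:
    """One level of the shared responsibility model for resiliency."""
    name: str
    provided_by: Party    # who builds or supplies this level
    configured_by: Party  # who selects, configures, or designs it


# The three levels described in the article, from the platform upward.
SHARED_RESPONSIBILITY_MODEL = (
    # Microsoft is solely responsible for core platform reliability.
    Level("Core platform reliability", Party.MICROSOFT, Party.MICROSOFT),
    # Microsoft provides the capabilities; you select and configure them.
    Level("Resilience-enhancing capabilities", Party.MICROSOFT, Party.CUSTOMER),
    # You design your application and workload for resiliency.
    Level("Applications", Party.CUSTOMER, Party.CUSTOMER),
)

# Typical effect of the service category on your decisions, paraphrased from
# the article. The article stresses checking each individual service as well.
TYPICAL_DECISIONS_BY_CATEGORY = {
    "SaaS": "You typically don't need to opt into availability zones.",
    "PaaS": "Data-tier services might offer automated backup capabilities.",
    "IaaS": "You typically plan and implement many resiliency capabilities yourself.",
}


def customer_levels(model=SHARED_RESPONSIBILITY_MODEL):
    """Return the names of levels where the customer holds some responsibility."""
    return [
        level.name
        for level in model
        if Party.CUSTOMER in (level.provided_by, level.configured_by)
    ]


if __name__ == "__main__":
    for name in customer_levels():
        print(name)
```

Running the script prints the two levels where you hold responsibility: Resilience-enhancing capabilities and Applications. Splitting each level into `provided_by` and `configured_by` is an assumption made here to capture the article's distinction between Microsoft providing a capability and you selecting and using it.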

articles/reliability/cross-region-replication-azure.md

Lines changed: 1 addition & 1 deletion
@@ -22,7 +22,7 @@ Some Azure services support cross-region replication to ensure business continui

 ## Shared responsibility

-Not all Azure services automatically replicate data or automatically fall back from a failed region to cross-replicate to another enabled region. In these scenarios, you are responsible for recovery and replication. These examples are illustrations of the *shared responsibility model*. It's a fundamental pillar in your disaster recovery strategy. For more information about the shared responsibility model and to learn about business continuity and disaster recovery in Azure, see [Business continuity management in Azure](business-continuity-management-program.md).
+Not all Azure services automatically replicate data or automatically fall back from a failed region to cross-replicate to another enabled region. In these scenarios, you are responsible for recovery and replication. These examples are illustrations of the *shared responsibility model*. It's a fundamental pillar in your disaster recovery strategy. For more information, see [Shared responsibility for resiliency](./concept-shared-responsibility.md).

 Shared responsibility becomes the crux of your strategic decision-making when it comes to disaster recovery. Azure doesn't require you to use cross-region replication, and you can use services to build resiliency without cross-replicating to another enabled region. But we strongly recommend that you configure your essential services across regions to benefit from [isolation](../security/fundamentals/isolation-choices.md) and improve [availability](availability-zones-overview.md).

articles/reliability/disaster-recovery-overview.md

Lines changed: 4 additions & 3 deletions
@@ -4,7 +4,7 @@ description: Disaster recovery overview for Microsoft Azure products and service
 author: anaharris-ms
 ms.service: azure
 ms.topic: conceptual
-ms.date: 08/25/2023
+ms.date: 12/06/2024
 ms.author: anaharris
 ms.custom: subject-reliability, subject-reliability
 ms.subservice: azure-reliability
@@ -30,8 +30,7 @@ Each major process or workload that an application implements should have separa

 ## Design for disaster recovery

-Disaster recovery isn't an automatic feature, but must be designed, built, and tested. To support a solid DR strategy, you must build an application with DR in mind from the ground up. Azure offers services, features, and guidance to help you support DR when you create apps.
-
+Disaster recovery isn't an automatic feature, but must be designed, built, and tested. To support a solid DR strategy, you must build an application with DR in mind from the ground up. Azure offers services, features, and guidance to help you support DR when you create apps. To understand what you need to do to support DR, you must first understand the shared responsibility model for resiliency. For more information, see [Shared responsibility for resiliency](./concept-shared-responsibility.md).
@@ -70,6 +69,8 @@ Most services that run on Azure platform as a service (PaaS) offerings like [Azu

 ## Next steps

+- [Shared responsibility for resiliency](./concept-shared-responsibility.md).
+
 - [Disaster recovery guidance by service](./disaster-recovery-guidance-overview.md)

 - [Cloud Adaption Framework for Azure - Business continuity and disaster recovery](/azure/cloud-adoption-framework/ready/landing-zone/design-area/management-business-continuity-disaster-recovery)

articles/reliability/includes/reliability-disaster-recovery-description-include.md

Lines changed: 4 additions & 4 deletions
@@ -1,6 +1,6 @@
 ---
-title: include file
-description: include file
+title: Description of disaster recovery
+description: Description of disaster recovery
 author: anaharris-ms
 ms.service: azure
 ms.topic: include
@@ -10,10 +10,10 @@
 ---


-Disaster recovery (DR) is about recovering from high-impact events, such as natural disasters or failed deployments that result in downtime and data loss. Regardless of the cause, the best remedy for a disaster is a well-defined and tested DR plan and an application design that actively supports DR. Before you begin to think about creating your disaster recovery plan, see [Recommendations for designing a disaster recovery strategy](/azure/well-architected/reliability/disaster-recovery).
+Disaster recovery (DR) is about recovering from high-impact events, such as natural disasters or failed deployments that result in downtime and data loss. Regardless of the cause, the best remedy for a disaster is a well-defined and tested DR plan and an application design that actively supports DR. Before you begin to think about creating your disaster recovery plan, see [Recommendations for designing a disaster recovery strategy](/azure/well-architected/reliability/disaster-recovery).


-When it comes to DR, Microsoft uses the [shared responsibility model](../business-continuity-management-program.md#shared-responsibility-model). In a shared responsibility model, Microsoft ensures that the baseline infrastructure and platform services are available. At the same time, many Azure services don't automatically replicate data or fall back from a failed region to cross-replicate to another enabled region. For those services, you are responsible for setting up a disaster recovery plan that works for your workload. Most services that run on Azure platform as a service (PaaS) offerings provide features and guidance to support DR and you can use [service-specific features to support fast recovery](../reliability-guidance-overview.md) to help develop your DR plan.
+When it comes to DR, Microsoft uses the [shared responsibility model](../concept-shared-responsibility.md). In a shared responsibility model, Microsoft ensures that the baseline infrastructure and platform services are available. At the same time, many Azure services don't automatically replicate data or fall back from a failed region to cross-replicate to another enabled region. For those services, you're responsible for setting up a disaster recovery plan that works for your workload. Most services that run on Azure platform as a service (PaaS) offerings provide features and guidance to support DR and you can use [service-specific features to support fast recovery](../reliability-guidance-overview.md) to help develop your DR plan.

articles/reliability/media/shared-responsibility/shared-responsibility-model.jpg

Binary file (377 KB)
