Commit ba38d26

authored
Merge pull request #105410 from Juliako/disaster_recovery
started bcdr
2 parents 35cb806 + 2209e48 commit ba38d26

File tree

2 files changed

+69
-0
lines changed

articles/media-services/latest/TOC.yml

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -183,6 +183,9 @@
183183
href: live-event-error-codes.md
184184
- name: Job error codes
185185
href: job-error-codes.md
186+
- name: High availability guidance
187+
displayName: failover, bcdr
188+
href: media-services-high-availability-guidance.md
186189
- name: Migration guidance from v2 to v3
187190
href: migrate-from-v2-to-v3.md
188191
displayName: naming
Lines changed: 66 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,66 @@
1+
---
2+
title: Azure Media Services high availability
3+
description: Learn how to fail over to a secondary Media Services account if a regional datacenter outage or failure occurs.
4+
services: media-services
5+
documentationcenter: ''
6+
author: juliako
7+
manager: femila
8+
editor: ''
9+
10+
ms.service: media-services
11+
ms.subservice:
12+
ms.workload:
13+
ms.topic: article
14+
ms.custom:
15+
ms.date: 02/24/2020
16+
ms.author: juliako
17+
---
18+
19+
# Azure Media Services high availability guidance
20+
21+
The Azure Media Services encoding service is a regional batch processing platform and is not currently designed for high availability within a single region. The encoding service does not currently provide instant failover if there is a regional datacenter outage or a failure of an underlying component or dependent service (such as storage or SQL). This article explains how to deploy Media Services to maintain a high-availability architecture with failover and ensure optimal availability for your applications.
22+
23+
By following the guidelines and best practices described in this article, you lower the risk of encoding failures and delays, and minimize recovery time if an outage occurs in a single region.
24+
25+
## How to build a cross-regional encoding system
26+
27+
* [Create](create-account-cli-how-to.md) two (or more) Azure Media Services accounts.
28+
* Subscribe to **JobStateChange** messages in each account.
29+
30+
* In Media Services v3, this is done via Azure Event Grid. For more information, see:
31+
32+
* [Event Grid examples](../../event-grid/receive-events.md)
33+
* [Azure Event Grid schemas for Media Services events](media-services-event-schemas.md)
34+
* [Register for events via the Azure portal or the CLI](reacting-to-media-services-events.md) (you can also register with the EventGrid Management SDK)
35+
* [Microsoft.Azure.EventGrid SDK](https://www.nuget.org/packages/Microsoft.Azure.EventGrid/) (which supports Media Services events natively).
36+
37+
You can also consume Event Grid events via Azure Functions.
38+
* In Media Services v2, this is done via [NotificationEndpoints](../previous/media-services-dotnet-check-job-progress-with-webhooks.md).
39+
* When you [create a job](transforms-jobs-concept.md):
40+
41+
* Randomly select an account from the list of currently used accounts. This list normally contains both accounts, but if issues are detected it may contain only one. If the list is empty, raise an alert so an operator can investigate.
42+
* As general guidance, you need one [media reserved unit](media-reserved-units-cli-how-to.md) per task or [JobOutput](https://docs.microsoft.com/rest/api/media/jobs/create#joboutputasset) (unless you are using [VideoAnalyzerPreset](analyzing-video-audio-files-concept.md) in v3).
43+
* Get the count of [media reserved units](media-reserved-units-cli-how-to.md) (MRUs) for the chosen account. If the current MRU count isn’t already at the maximum value, add the number of MRUs needed by the job and update the service. If your job submission rate is high and your queries frequently show that you are already at the maximum, keep the value in a distributed cache with a reasonable timeout.
44+
* Keep a count of the number of inflight jobs.
45+
* When your JobStateChange handler gets a notification that a job has reached the scheduled state, record the time it entered the scheduled state and the region/account used.
46+
* When your JobStateChange handler gets a notification that a job has reached the processing state, mark the record for the job as processing.
47+
* When your JobStateChange handler gets a notification that a job has reached the Finished/Errored/Canceled state, mark the record for the job as final and decrement the inflight job count. Then get the number of media reserved units for the account that ran the job and compare it against your inflight job count. If your inflight count is less than the MRU count, decrease the MRU count and update the service.
48+
* Have a separate process that periodically looks at your job records. If jobs in the scheduled state haven’t advanced to the processing state in a reasonable amount of time for a given region, remove that region's account from your list of currently used accounts.
49+
50+
* Depending on your business requirements, you could decide to cancel those jobs right away and resubmit them to the other account. Or, you could give them some more time to move to the next state.
51+
* After a period of time, add the account back to the currently used list (with the assumption that the region has recovered).
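
The account selection and health-check steps above can be sketched roughly as follows. This is a minimal, in-memory illustration and not a Media Services API: `AccountPool`, `JobRecord`, the account names, and the `stall_timeout` and `cooldown` values are all hypothetical, and a real system would keep the records in durable storage and feed them from the JobStateChange handler.

```python
import random
from dataclasses import dataclass, field


@dataclass
class JobRecord:
    account: str         # account/region the job was submitted to
    state: str           # "Scheduled", "Processing", or "Final"
    scheduled_at: float  # time (seconds) the job entered the scheduled state


@dataclass
class AccountPool:
    all_accounts: list
    stall_timeout: float = 600.0   # assumption: 10 minutes stuck in Scheduled is unhealthy
    cooldown: float = 1800.0       # assumption: retry a removed account after 30 minutes
    removed_at: dict = field(default_factory=dict)  # account -> time it was removed

    def active(self):
        """Accounts currently eligible for new jobs."""
        return [a for a in self.all_accounts if a not in self.removed_at]

    def pick(self, rng=random):
        """Randomly select an account; fail loudly if none is left."""
        candidates = self.active()
        if not candidates:
            # In production this is where an operator alert would be raised.
            raise RuntimeError("no usable Media Services account; alert an operator")
        return rng.choice(candidates)

    def sweep(self, records, now):
        """Periodic task: re-admit cooled-down accounts, remove stalled ones."""
        for account, removed in list(self.removed_at.items()):
            if now - removed >= self.cooldown:
                del self.removed_at[account]
        for rec in records:
            stalled = rec.state == "Scheduled" and now - rec.scheduled_at > self.stall_timeout
            if stalled:
                self.removed_at.setdefault(rec.account, now)
```

Whether stalled jobs are canceled and resubmitted right away or given more time is a business decision that this sketch leaves to the caller of `sweep`.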
52+
53+
If you find that the MRU count is thrashing up and down, move the decrement logic to the periodic task. Have the pre-job submit logic compare the inflight count to the current MRU count to see if it needs to update the MRUs.
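
One possible shape for this variant is sketched below, with scale-up at submit time and scale-down only in the periodic task. The `get_mru_count` and `set_mru_count` callables are hypothetical stand-ins for whatever call you use to read and update the account's media reserved units, and trimming down to exactly the inflight count is an assumption; the guidance above does not say how far to decrement.

```python
class MruTracker:
    """Tracks inflight jobs and adjusts media reserved units (MRUs) for one account."""

    def __init__(self, get_mru_count, set_mru_count, max_mrus, min_mrus=0):
        self.get_mru_count = get_mru_count  # hypothetical: reads the current MRU count
        self.set_mru_count = set_mru_count  # hypothetical: updates the service
        self.max_mrus = max_mrus
        self.min_mrus = min_mrus
        self.inflight = 0

    def before_submit(self, needed=1):
        """Pre-submit: grow the MRU count only if inflight work exceeds it."""
        self.inflight += needed
        current = self.get_mru_count()
        target = min(self.max_mrus, max(current, self.inflight))
        if target != current:
            self.set_mru_count(target)

    def on_job_final(self, released=1):
        """Finished/Errored/Canceled: only update the counter, not the service."""
        self.inflight = max(0, self.inflight - released)

    def periodic_trim(self):
        """Periodic task: shrink the MRU count down to what inflight jobs need."""
        current = self.get_mru_count()
        target = max(self.inflight, self.min_mrus)
        if target < current:
            self.set_mru_count(target)
```

Because the service is only scaled down from `periodic_trim`, short bursts of job completions and submissions no longer cause an update per job.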
54+
55+
## How to build video-on-demand cross-region streaming
56+
57+
* Video-on-demand cross-region streaming involves duplicating [Assets](assets-concept.md), [Content Key Policies](content-key-policy-concept.md) (if used), [Streaming Policies](streaming-policy-concept.md), and [Streaming Locators](streaming-locators-concept.md).
58+
* You will have to create the policies in both regions and keep them up to date.
59+
* When you create the streaming locators, use the same LocatorId value, ContentKey ID value, and ContentKey value in both regions.
60+
* If you are encoding the content, encode it in region A and publish it, then copy the encoded content to region B and publish it there using the same values as in region A.
61+
* You can use Azure Traffic Manager on the host names for the origin and the key delivery service (in the Media Services configuration, this will look like a custom key server URL).
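
As a rough illustration of reusing the same locator values in both regions, the sketch below generates the values once and passes them unchanged to every region. `LocatorSpec`, `new_locator_spec`, and the `create_locator` callable are hypothetical placeholders, not Media Services SDK calls, and the 16-byte base64 key is only an assumed format for the example.

```python
import base64
import secrets
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class LocatorSpec:
    locator_id: str       # reused as the LocatorId in every region
    content_key_id: str   # reused as the ContentKey ID in every region
    content_key: str      # reused as the ContentKey value in every region


def new_locator_spec():
    """Generate the shared values once, before publishing anywhere."""
    key = base64.b64encode(secrets.token_bytes(16)).decode("ascii")  # assumed key format
    return LocatorSpec(str(uuid.uuid4()), str(uuid.uuid4()), key)


def publish_everywhere(spec, regions, create_locator):
    """Call the (hypothetical) per-region publish step with identical values."""
    return {region: create_locator(region, spec) for region in regions}
```

Freezing the dataclass makes it harder to accidentally vary a value between regions after the first publish.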
62+
63+
## Next steps
64+
65+
* [Create an account](create-account-cli-how-to.md)
66+
* Check out [code samples](https://docs.microsoft.com/samples/browse/?products=azure-media-services)
