Commit a8eaa5a

Merge pull request #282 from mkistler/lro
Integrate LRO Guidance to make Guidelines self-contained
2 parents 5dd9272 + fc61de1 commit a8eaa5a

6 files changed

+172
-16
lines changed

azure/ConsiderationsForServiceDesign.md

Lines changed: 105 additions & 9 deletions
@@ -1,11 +1,13 @@
1-
## Considerations for Service Design
2-
### History
1+
# Considerations for Service Design
2+
3+
## History
34

45
| Date | Notes |
56
| ----------- | -------------------------------------------------------------- |
7+
| 2021-Sep-11 | Add long-running operations guidance |
68
| 2021-Aug-06 | Updated Azure REST Guidelines per Azure API Stewardship Board. |
79

8-
### Introduction
10+
## Introduction
911

1012
Great APIs make your service usable to customers. They are intuitive, naturally reflecting and communicating the underlying model and its behavior. They lend themselves easily to client library implementations in multiple programming languages. And they don't "get in the way" of the developer, by remaining stable and predictable, _especially over time_.
1113

@@ -22,15 +24,15 @@ It is critically important to design your service to avoid disrupting users as t
2224

2325
:white_check_mark: **DO** ensure that customers are able to adopt a new version of service or SDK client library **without requiring code changes**
2426

25-
### Azure Management Plane vs Data Plane
27+
## Azure Management Plane vs Data Plane
2628
*Note: Developing a new service requires the development of at least one (management plane) API and potentially one or more additional (data plane) APIs. When reviewing v1 service APIs, we see common advice provided during the review.*
2729

2830
A **management plane** API is implemented through the Azure Resource Manager (ARM) and is used to provision and control the operational state of resources.
2931
A **data plane** API is used by developers to implement applications. Occasionally, some operations are useful for provisioning/control and applications. In this case, the operation can appear in both APIs.
3032
Although the best practices and patterns described in this document apply to all HTTP/REST APIs, they are especially important for **data plane** services because these are the primary interface for developers using your service. The **management plane** APIs may have other preferred practices based on [the conventions of the Azure ARM](https://github.com/Azure/azure-resource-manager-rpc).
3133

3234

33-
### Start with the Developer Experience
35+
## Start with the Developer Experience
3436
A great API starts with a well thought out and designed service. Your service should define simple/understandable abstractions with each given a clear name that you use consistently throughout your API and documentation. There must also be an unambiguous relationship between these abstractions.
3537

3638
Follow these practices to create clear names for your abstractions:
@@ -49,7 +51,7 @@ It is extremely difficult to create an elegant API that works well on top of a p
4951

5052
The whole purpose of a preview is to address feedback by improving abstractions, naming, relationships, API operations, and so on. It is OK to make breaking changes during a preview to improve the experience now so that it is sustainable long term.
5153

52-
### Focus on Hero Scenarios
54+
## Focus on Hero Scenarios
5355
It is important to realize that writing an API is, in many cases, the easiest part of providing a delightful developer experience. There are a large number of downstream activities for each API, e.g. testing, documentation, client libraries, examples, blog posts, videos, and supporting customers in perpetuity. In fact, implementing an API is of minuscule cost compared to all the other downstream activities.
5456

5557
*For this reason, it is **much better** to ship with fewer features and only add new features over time as required by customers.*
@@ -71,7 +73,7 @@ Understanding how your service is used and defining its model and interaction pa
7173

7274
:white_check_mark: **DO** create an [OpenAPI Definition](https://github.com/OAI/OpenAPI-Specification/blob/main/versions/2.0.md) (with [autorest extensions](https://github.com/Azure/autorest/blob/master/docs/extensions/readme.md)) describing the service. The OpenAPI definition is a key element of the Azure SDK plan and is essential for documentation, usability and discoverability of services.
7375

74-
### Use Previews to Iterate
76+
## Use Previews to Iterate
7577
Before releasing your API, plan to invest significant design effort, get customer feedback, and iterate through multiple preview releases. This is especially important for V1 as it establishes the abstractions and patterns that developers will use to interact with your service.
7678

7779
:ballot_box_with_check: **YOU SHOULD** write and test hypotheses about how your customers will use the API.
@@ -84,7 +86,7 @@ Before releasing your API plan to invest significant design effort, get customer
8486

8587
:ballot_box_with_check: **YOU SHOULD** capture what you have learned during the preview stage and share these findings with your team and with the API Stewardship Board.
8688

87-
### Avoid Surprises
89+
## Avoid Surprises
8890
A major inhibitor to adoption and usage is when an API behaves in an unexpected way. Often, these are subtle design decisions that seem benign at the time, but end up introducing significant downstream friction for developers.
8991

9092
One common area of friction for developers is _polymorphism_ -- where a value may have any of several types or structures.
@@ -110,13 +112,107 @@ Another important design pattern for avoiding surprises is idempotency. An opera
110112
HTTP requires certain operations like GET, PUT, and DELETE to be idempotent, but for cloud services it is important to make _all_ operations idempotent so that clients can use retry in failure scenarios without risk of unintended consequences.
111113
See the [HTTP Request / Response Pattern section of the Guidelines](./Guidelines.md#http-request--response-pattern) for detailed guidance on making operations idempotent.
112114

113-
### Design for Change Resiliency
115+
## Design for Change Resiliency
114116
As you build out your service and API, there are a number of decisions that can be made up front that add resiliency to client implementations. Addressing these as early as possible will help you iterate faster and avoid breaking changes.
115117

116118
:ballot_box_with_check: **YOU SHOULD** use extensible enumerations. Extensible enumerations are modeled as strings - expanding an extensible enumeration is not a breaking change.
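
As a rough illustration of why string-modeled enumerations help clients, the following Python sketch handles a value it does not recognize instead of failing. The SKU values and the `describe_sku` helper are hypothetical and are not part of any Azure API.

```python
# Hypothetical values known to this client version; the service may add more later.
KNOWN_SKUS = {
    "Basic": "Basic tier",
    "Standard": "Standard tier",
}


def describe_sku(sku: str) -> str:
    """Return a description, tolerating values added after this client shipped."""
    # Because the enumeration is modeled as a string, an unknown value is
    # still valid input; the client falls back instead of raising.
    return KNOWN_SKUS.get(sku, f"Unrecognized tier ({sku})")
```

A client written this way keeps working if the service later introduces a new value such as a hypothetical `Premium`.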
117119

118120
:ballot_box_with_check: **YOU SHOULD** implement [conditional requests](https://tools.ietf.org/html/rfc7232) early. This allows you to support concurrency, which tends to be a concern later on.
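
To make the idea concrete, here is a minimal in-memory sketch of an `If-Match` check on update, assuming a simple version counter as the source of the entity tag. The `ResourceStore` class is hypothetical; a real service would apply the same check in its request handler, and handling of `If-Match: *` is omitted.

```python
from typing import Dict, Optional, Tuple


class ResourceStore:
    """In-memory stand-in for a service that supports If-Match on updates."""

    def __init__(self, value: Dict) -> None:
        self._value = dict(value)
        self._version = 1

    def _etag(self) -> str:
        # Entity tags are quoted strings; a version counter is one simple source.
        return f'"{self._version}"'

    def get(self) -> Tuple[Dict, str]:
        """Return the current representation and its ETag."""
        return dict(self._value), self._etag()

    def put(self, value: Dict, if_match: Optional[str] = None) -> Tuple[int, str]:
        """Replace the resource; return (HTTP status code, current ETag)."""
        # 412 Precondition Failed when the caller's ETag is stale (RFC 7232).
        if if_match is not None and if_match != self._etag():
            return 412, self._etag()
        self._value = dict(value)
        self._version += 1
        return 200, self._etag()
```

A client that read the resource before a concurrent update would receive `412` on its own update and could re-read before retrying.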
119121

122+
## Long-Running Operations
123+
124+
Long-running operations are an API design pattern that should be used when the processing of
125+
an operation may take a significant amount of time -- longer than a client will want to block
126+
waiting for the result.
127+
Azure allows for two forms of this design pattern: resource-based long-running operations (RELO),
128+
which is the preferred pattern, and long-running operations with a status monitor.
129+
130+
In both patterns, the processing of the operation is initiated by one API call and the client
131+
obtains the results of the operation from a subsequent API call.
132+
Here we illustrate the sequence of API calls involved in each of these patterns.
133+
134+
### Resource-based long-running operations
135+
136+
In the RELO pattern, the resource that is the target of the operation contains a `status` field
137+
that holds the status of an outstanding or last completed operation.
138+
This means that the client can use a standard "get" operation on the resource to determine the
139+
status of an operation it initiated. The flow looks like this:
140+
141+
<!-- markdownlint-disable MD033 -->
142+
<p align="center">
143+
<img src="./relo.jpg" alt="The RELO flow"/>
144+
</p>
145+
<!-- markdownlint-enable MD033 -->
146+
147+
1. The client sends the initial request to the resource to initiate the long-running operation.
148+
This initial request could be a PUT, PATCH, POST, or DELETE method.
149+
150+
2. The resource validates the request and initiates the operation processing.
151+
It sends a response to the client with a `200-OK` HTTP status code (or `201-Created` if the operation
152+
is a create operation) and a representation of the resource where the `status` field is set
153+
to a value indicating that the operation processing has been started.
154+
155+
3. The client then issues a GET request to the resource to determine if the operation processing
156+
has completed.
157+
158+
4. The resource responds with a representation of the resource. While the operation is still being
159+
processed, the status field will contain a "non-terminal" value, like `Processing`.
160+
161+
5. After the operation processing has completed, a GET request from the client will receive a response
162+
where the status field contains a "terminal" value -- `Succeeded`, `Failed`, or `Canceled` --
163+
that indicates the result of the operation.
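
From the client's side, steps 3 through 5 above amount to a polling loop. The following Python sketch assumes the caller supplies a `get_resource` function that performs the GET and returns the parsed resource as a dictionary; the function name, the fixed poll interval, and the poll limit are illustrative assumptions, not prescribed by these guidelines.

```python
import time
from typing import Callable, Dict

TERMINAL_STATUSES = {"Succeeded", "Failed", "Canceled"}


def wait_for_resource(
    get_resource: Callable[[], Dict],
    poll_interval: float = 1.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Poll the resource with GET until its status field is terminal."""
    for _ in range(max_polls):
        resource = get_resource()  # steps 3 and 4: GET the resource
        if resource.get("status") in TERMINAL_STATUSES:
            return resource  # step 5: terminal value reached
        sleep(poll_interval)
    raise TimeoutError("operation did not reach a terminal status")
```

The caller inspects the returned `status` to learn whether the operation succeeded, failed, or was canceled.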
164+
165+
A resource may support multiple outstanding RELO operations, where the status field of the resource
166+
indicates the combined status of the outstanding operations.
167+
If a new operation request is received when there is already a long-running operation in progress for a resource,
168+
the service should reject the operation if it is inconsistent with one already in progress.
169+
However, if the new operation is redundant or otherwise consistent with the one in progress,
170+
for example a "reboot" operation on a VM that is in the process of rebooting, then the service should
171+
accept the request. The status field of the resource should then report the completion status of _both_
172+
operations.
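
One way a service might express this accept-or-reject decision is a lookup of compatible operation pairs. This is only a sketch: the operation names, the `COMPATIBLE_PAIRS` table, and the `should_accept` helper are hypothetical, and which operations are actually compatible depends entirely on the service.

```python
from typing import Optional

# Hypothetical table of operation pairs that may overlap on one resource.
COMPATIBLE_PAIRS = {
    frozenset({"reboot"}),                 # a second reboot is redundant
    frozenset({"reboot", "update-tags"}),  # assumed compatible, for illustration
}


def should_accept(in_progress: Optional[str], requested: str) -> bool:
    """Decide whether a new operation may start on the resource."""
    if in_progress is None:
        return True  # nothing outstanding, so there is nothing to conflict with
    # Accept only if the pair is listed as redundant or consistent.
    return frozenset({in_progress, requested}) in COMPATIBLE_PAIRS
```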
173+
174+
Note: The RELO pattern should not be used in cases where the completion status of individual operations
175+
may be important to users, as opposed to simply learning that an operation of the type they requested
176+
(e.g. create a resource with a specific name) has successfully completed.
177+
178+
### Long-running operations with status monitor
179+
180+
In the LRO with status monitor pattern, the status and results of the operation are encapsulated into
181+
a status monitor resource that is distinct from the target resource and specific to the individual
182+
operation request. Here's what the status monitor LRO pattern looks like:
183+
184+
<!-- markdownlint-disable MD033 -->
185+
<p align="center">
186+
<img src="./statmon.jpg" alt="The status monitor LRO flow"/>
187+
</p>
188+
<!-- markdownlint-enable MD033 -->
189+
190+
1. The client sends the request to initiate the long-running operation.
191+
As in the RELO pattern, the initial request could be a PUT, PATCH, POST, or DELETE method.
192+
193+
2. The resource validates the request and initiates the operation processing.
194+
It sends a response to the client with a `202-Accepted` HTTP status code.
195+
Included in this response is an `Operation-location` response header with the absolute URL of
196+
the status monitor for this specific operation.
197+
The response also includes a `Retry-after` header telling the client a minimum time to wait (in seconds)
198+
before sending a request to the status monitor URL.
199+
200+
3. After waiting at least the amount of time specified by the previous response's `Retry-after` header,
201+
the client issues a GET request to the status monitor URL.
202+
203+
4. The status monitor URL responds with information about the operation including its current status,
204+
which should be represented as one of a fixed set of string values in a field named `status`.
205+
If the operation is still being processed, the status field will contain a "non-terminal" value, like `Processing`.
206+
207+
5. After the operation processing completes, a GET request to the status monitor URL returns a response with a status field containing a terminal value -- `Succeeded`, `Failed`, or `Canceled` -- that indicates the result of the operation.
208+
If the status is `Failed`, the status monitor resource must contain an `error` field with a `code` and `message` that describes the failure.
209+
If the status is `Succeeded`, the response may contain additional fields as appropriate, such as results
210+
of the operation processing.
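
The five steps above can be sketched as a small client helper. This Python example assumes a caller-supplied `send(method, url, body)` function that returns a `(status_code, headers, json_body)` tuple; that signature, the `run_lro` name, the one-second default delay, and the poll limit are assumptions made for illustration. It reuses the initial `Retry-after` value for every poll, which is a simplification.

```python
import time
from typing import Callable, Dict, Optional, Tuple

Response = Tuple[int, Dict[str, str], Dict]
TERMINAL_STATUSES = {"Succeeded", "Failed", "Canceled"}


def run_lro(
    send: Callable[[str, str, Optional[Dict]], Response],
    method: str,
    url: str,
    body: Optional[Dict] = None,
    sleep: Callable[[float], None] = time.sleep,
    max_polls: int = 60,
) -> Dict:
    """Start a long-running operation and poll its status monitor."""
    status_code, headers, _ = send(method, url, body)  # step 1
    if status_code != 202:
        raise RuntimeError(f"expected 202 Accepted, got {status_code}")
    # HTTP header names are case-insensitive, so normalize before lookup.
    headers = {name.lower(): value for name, value in headers.items()}
    monitor_url = headers["operation-location"]  # step 2
    delay = float(headers.get("retry-after", 1))  # seconds; default is assumed
    for _ in range(max_polls):
        sleep(delay)  # step 3: wait at least the advertised time
        _, _, monitor = send("GET", monitor_url, None)
        status = monitor["status"]  # step 4
        if status in TERMINAL_STATUSES:  # step 5
            if status == "Failed":
                error = monitor["error"]
                raise RuntimeError(f"{error['code']}: {error['message']}")
            return monitor
    raise TimeoutError("operation did not reach a terminal status")
```

On success the returned dictionary is the status monitor resource, including any result fields the service chose to add.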
211+
212+
An important distinction between RELO and status monitor LROs is that there is a unique status monitor for each
213+
status monitor LRO, whereas the status of all RELO operations is combined into the status of the resource.
214+
So status monitor LROs are "one-to-one" with their operation status, whereas RELO-style LROs are "many-to-one".
215+
120216
## Getting Help: The Azure REST API Stewardship Board
121217
The Azure REST API Stewardship Board is a collection of dedicated architects who are passionate about helping Azure service teams build interfaces that are intuitive, maintainable, consistent, and, most importantly, delightful to our customers. Because APIs affect nearly all downstream decisions, you are encouraged to reach out to the Stewardship Board early in the development process. These architects will work with you to apply these guidelines and identify any hidden pitfalls in your design.
122218
