Commit 3431e24

Add guidelines on returning string offsets & lengths
1 parent 8c40492 commit 3431e24

2 files changed

+80
-16
lines changed

azure/ConsiderationsForServiceDesign.md

Lines changed: 69 additions & 14 deletions
@@ -293,20 +293,20 @@ The operation is initiated with a POST operation and the operation path ends in
293293

294294
```text
295295
POST /<service-or-resource-url>:<action>?api-version=2022-05-01
296-
Operation-Id: 22
297-
298-
{
299-
"arg1": 123
300-
"arg2": "abc"
301-
}
296+
Operation-Id: 22
297+
298+
{
299+
"arg1": 123
300+
"arg2": "abc"
301+
}
302302
```
303303

304304
The response is a `202 Accepted` as described above.
305305

306306
```text
307307
HTTP/1.1 202 Accepted
308308
Operation-Location: https://<status-monitor-endpoint>/22
309-
309+
310310
{
311311
"id": "22",
312312
"status": "NotStarted"
@@ -323,7 +323,7 @@ When the operation completes successfully, the result (if there is one) will be
323323

324324
```text
325325
HTTP/1.1 200 OK
326-
326+
327327
{
328328
"id": "22",
329329
"status": "Succeeded",
@@ -344,7 +344,7 @@ PUT /items/FooBar&api-version=2022-05-01
344344
Operation-Id: 22
345345
346346
{
347-
"prop1": 555,
347+
"prop1": 555,
348348
"prop2": "something"
349349
}
350350
```
@@ -358,13 +358,13 @@ The response may also include an `Operation-Location` header for backward compat
358358
If the resource supports ETags, the response may contain an `etag` header and possibly an `etag` property in the resource.
359359

360360
```text
361-
HTTP/1.1 201 Created
361+
HTTP/1.1 201 Created
362362
Operation-Id: 22
363363
Operation-Location: https://items/operations/22
364364
etag: "123abc"
365365
366366
{
367-
"id": "FooBar",
367+
"id": "FooBar",
368368
"etag": "123abc",
369369
"prop1": 555,
370370
"prop2": "something"
@@ -381,7 +381,7 @@ When the additional processing completes, the status monitor will indicate if it
381381

382382
```text
383383
HTTP/1.1 200 OK
384-
384+
385385
{
386386
"id": "22",
387387
"status": "Succeeded"
@@ -412,8 +412,8 @@ POST /<status-monitor-url>:cancel?api-version=2022-05-01
412412
A successful response to a control operation should be a `200 OK` with a representation of the status monitor.
413413

414414
```text
415-
HTTP/1.1 200 OK
416-
415+
HTTP/1.1 200 OK
416+
417417
{
418418
"id": "22",
419419
"status": "Canceled"
@@ -515,6 +515,61 @@ For example, the client can specify an `If-Match` header with the last ETag valu
515515
The service processes the update only if the ETag value in the header matches the ETag of the current resource on the server.
516516
By computing and returning ETags for your resources, you enable clients to avoid using a strategy where the "last write always wins."
517517

518+
## Returning String Offsets & Lengths (Substrings)
519+
520+
Some Azure services return the offset & length of a substring within a string, for example, the offset & length of a name, email address, or phone number within a larger string.
521+
When a service response includes a string, the client's programming language deserializes that string into that language's internal string encoding. Below are the possible encodings and examples of languages that use each encoding:
522+
523+
| Encoding | Example languages |
524+
| -------- | ------- |
525+
| UTF-8 | Go, Rust, Ruby, PHP |
526+
| UTF-16 | JavaScript, Java, C# |
527+
| CodePoint (UTF-32) | Python |
528+
529+
Because the service doesn't know what language a client is written in or which string encoding that language uses, the service can't return a single encoding-agnostic offset and length that every client can use to index into the string. To address this, the service response must include offset & length values for all 3 possible encodings, and the client code must then select the values that match its language's internal string encoding.
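
To make the difference between the three encodings concrete, here is a minimal Go sketch of how a service might compute the offset & length of one substring in all 3 encodings. The `Units` type, the `measure` and `locate` helpers, and the sample string are hypothetical names chosen for illustration; they are not part of any Azure SDK or service.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf16"
	"unicode/utf8"
)

// Units holds one measurement expressed in each of the three encodings.
// (Hypothetical type; the JSON names follow the example response.)
type Units struct {
	UTF8      int `json:"utf8"`
	UTF16     int `json:"utf16"`
	CodePoint int `json:"codePoint"`
}

// measure counts the size of s in UTF-8 bytes, UTF-16 code units,
// and Unicode code points.
func measure(s string) Units {
	return Units{
		UTF8:      len(s),
		UTF16:     len(utf16.Encode([]rune(s))),
		CodePoint: utf8.RuneCountInString(s),
	}
}

// locate returns the offset and length of the first occurrence of sub
// in full, in all three encodings. ok is false when sub does not occur.
func locate(full, sub string) (offset, length Units, ok bool) {
	i := strings.Index(full, sub) // byte index into full
	if i < 0 {
		return Units{}, Units{}, false
	}
	// The offset is the size of everything that precedes the match.
	return measure(full[:i]), measure(sub), true
}

func main() {
	// "Hi 😀 José, welcome" written with escapes so the code points are unambiguous.
	full := "Hi \U0001F600 Jos\u00e9, welcome"
	offset, length, _ := locate(full, "Jos\u00e9")
	fmt.Printf("offset=%+v length=%+v\n", offset, length)
	// Prints: offset={UTF8:8 UTF16:6 CodePoint:5} length={UTF8:5 UTF16:4 CodePoint:4}
}
```

The emoji takes 4 bytes in UTF-8, 2 code units in UTF-16, and 1 code point, which is why the three offsets for the same substring differ.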
530+
531+
For example, if a service response needed to identify offset & length values for "name" and "email" substrings, the JSON response would look like this:
532+
533+
```
534+
{
535+
  (... other properties not shown...)
536+
  "fullString": "(...some string containing a name and an email address...)",
537+
  "name": {
538+
    "offset": {
539+
      "utf8": 12,
540+
      "utf16": 10,
541+
      "codePoint": 4
542+
    },
543+
    "length": {
544+
      "utf8": 10,
545+
      "utf16": 8,
546+
      "codePoint": 2
547+
    }
548+
  },
549+
  "email": {
550+
    "offset": {
551+
      "utf8": 12,
552+
      "utf16": 10,
553+
      "codePoint": 4
554+
    },
555+
    "length": {
556+
      "utf8": 10,
557+
      "utf16": 8,
558+
      "codePoint": 4
559+
    }
560+
  }
561+
}
562+
```
563+
564+
Then, the Go developer, for example, would get the substring containing the name using code like this:
565+
566+
```
567+
response := client.SomeMethodReturningJSONShownAbove(...)
568+
name := response.fullString[response.name.offset.utf8 : response.name.offset.utf8+response.name.length.utf8]
569+
```
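
For readers who want something they can run, the client-side selection can be sketched more completely as follows. This assumes the response is deserialized with `encoding/json` into hypothetical `Response`, `Span`, and `Units` types that mirror the JSON shape above; the `sample` payload and its values are made up for illustration and the `email` property is omitted for brevity.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Units mirrors the per-encoding values in the example response.
type Units struct {
	UTF8      int `json:"utf8"`
	UTF16     int `json:"utf16"`
	CodePoint int `json:"codePoint"`
}

// Span is an offset/length pair that locates a substring of FullString.
type Span struct {
	Offset Units `json:"offset"`
	Length Units `json:"length"`
}

// Response is a hypothetical model of the example payload.
type Response struct {
	FullString string `json:"fullString"`
	Name       Span   `json:"name"`
}

// substring uses the UTF-8 values because Go strings are indexed by byte.
// It validates the range instead of letting a bad span panic.
func substring(full string, s Span) (string, error) {
	start, end := s.Offset.UTF8, s.Offset.UTF8+s.Length.UTF8
	if start < 0 || end < start || end > len(full) {
		return "", fmt.Errorf("span [%d:%d] out of range for %d-byte string", start, end, len(full))
	}
	return full[start:end], nil
}

// sample is an illustrative payload; the emoji is written as a JSON surrogate pair.
const sample = `{
  "fullString": "Hi \ud83d\ude00 Jos\u00e9, welcome",
  "name": {
    "offset": { "utf8": 8, "utf16": 6, "codePoint": 5 },
    "length": { "utf8": 5, "utf16": 4, "codePoint": 4 }
  }
}`

// parseName decodes a payload and extracts the name substring.
func parseName(payload []byte) (string, error) {
	var r Response
	if err := json.Unmarshal(payload, &r); err != nil {
		return "", err
	}
	return substring(r.FullString, r.Name)
}

func main() {
	name, err := parseName([]byte(sample))
	if err != nil {
		panic(err)
	}
	fmt.Println(name) // prints: José
}
```

A JavaScript, Java, or C# client would read the `utf16` values instead, and a Python client the `codePoint` values, without doing any conversion.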
570+
571+
The service must calculate the offset & length for all 3 encodings and return them because client developers often find it difficult to work with Unicode encodings and to convert offsets from one encoding to another. In other words, we do this to simplify client development and ensure customer success when isolating a substring.
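
As an illustration of the conversion work a client would otherwise have to do, the sketch below shows what a Go client needs if a service returned only UTF-16 offsets. The helper name `byteIndexFromUTF16` and the sample string are hypothetical; this is one possible approach, not a prescribed one.

```go
package main

import "fmt"

// byteIndexFromUTF16 converts an offset counted in UTF-16 code units into
// a byte offset into the (UTF-8) Go string s. It returns -1 when the offset
// is negative, past the end of s, or falls in the middle of a surrogate pair.
func byteIndexFromUTF16(s string, units int) int {
	seen := 0 // UTF-16 code units consumed so far
	for i, r := range s {
		if seen == units {
			return i
		}
		if r > 0xFFFF {
			seen += 2 // code points outside the BMP need a surrogate pair
		} else {
			seen++
		}
		if seen > units {
			return -1 // offset splits a surrogate pair or is negative
		}
	}
	if seen == units {
		return len(s)
	}
	return -1
}

func main() {
	full := "Hi \U0001F600 Jos\u00e9, welcome"
	// Assume the service reported a UTF-16 offset of 6 and length of 4.
	start := byteIndexFromUTF16(full, 6)
	end := byteIndexFromUTF16(full, 6+4)
	fmt.Println(start, end, full[start:end]) // prints: 8 13 José
}
```

Every client in every language would need an equivalent of this walk over the string, with the same edge cases, which is the burden that returning all 3 encodings removes.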
572+
518573
## Getting Help: The Azure REST API Stewardship Board
519574
The Azure REST API Stewardship board is a collection of dedicated architects that are passionate about helping Azure service teams build interfaces that are intuitive, maintainable, consistent, and most importantly, delight our customers. Because APIs affect nearly all downstream decisions, you are encouraged to reach out to the Stewardship board early in the development process. These architects will work with you to apply these guidelines and identify any hidden pitfalls in your design.
520575

azure/Guidelines.md

Lines changed: 11 additions & 2 deletions
@@ -16,6 +16,7 @@ Please ensure that you add an anchor tag to any new guidelines that you add and
1616

1717
| Date | Notes |
1818
| ----------- | -------------------------------------------------------------- |
19+
| 2024-Jan-17 | Added guidelines on returning string offsets & lengths |
1920
| 2023-May-12 | Explain service response for missing/unsupported `api-version` |
2021
| 2023-Apr-21 | Update/clarify guidelines on POST method repeatability |
2122
| 2023-Apr-07 | Update/clarify guidelines on polymorphism |
@@ -438,7 +439,7 @@ This indicates to client libraries and customers that values of the enumeration
438439

439440
Polymorphism types in REST APIs refers to the possibility to use the same property of a request or response to have similar but different shapes. This is commonly expressed as a `oneOf` in JsonSchema or OpenAPI. In order to simplify how to determine which specific type a given request or response payload corresponds to, Azure requires the use of an explicit discriminator field.
440441

441-
Note: Polymorphic types can make your service more difficult for nominally typed languages to consume. See the corresponding section in the [Considerations for service design](./ConsiderationsForServiceDesign.md#avoid-surprises) for more information.
442+
Note: Polymorphic types can make your service more difficult for nominally typed languages to consume. See the corresponding section in the [Considerations for service design](./ConsiderationsForServiceDesign.md#avoid-surprises) for more information.
442443

443444
<a href="#json-use-discriminator-for-polymorphism" name="json-use-discriminator-for-polymorphism">:white_check_mark:</a> **DO** define a discriminator field indicating the kind of the resource and include any kind-specific fields in the body.
444445

@@ -838,7 +839,7 @@ For example:
838839
### Repeatability of requests
839840

840841
Fault tolerant applications require that clients retry requests for which they never got a response, and services must handle these retried requests idempotently. In Azure, all HTTP operations are naturally idempotent except for POST used to create a resource and [POST when used to invoke an action](
841-
https://github.com/microsoft/api-guidelines/blob/vNext/azure/Guidelines.md#performing-an-action).
842+
https://github.com/microsoft/api-guidelines/blob/vNext/azure/Guidelines.md#performing-an-action).
842843

843844
<a href="#repeatability-headers" name="repeatability-headers">:ballot_box_with_check:</a> **YOU SHOULD** support repeatable requests as defined in [OASIS Repeatable Requests Version 1.0](https://docs.oasis-open.org/odata/repeatable-requests/v1.0/repeatable-requests-v1.0.html) for POST operations to make them retriable.
844845
- The tracked time window (difference between the `Repeatability-First-Sent` value and the current time) **MUST** be at least 5 minutes.
@@ -1098,6 +1099,14 @@ While it may be tempting to use a revision/version number for the resource as th
10981099

10991100
<a href="#condreq-etag-depends-on-encoding" name="condreq-etag-depends-on-encoding">:white_check_mark:</a> **DO**, when supporting multiple representations (e.g. Content-Encodings) for the same resource, generate different ETag values for the different representations.
11001101

1102+
<a href="#substrings" name="substrings"></a>
1103+
### Returning String Offsets & Lengths (Substrings)
1104+
1105+
All string values in JSON are inherently Unicode and UTF-8 encoded, but clients written in a high-level programming language must work with strings in that language's string encoding, which may be UTF-8, UTF-16, or CodePoints (UTF-32).
1106+
When a service response includes a string offset or length value, it should specify these values in all 3 encodings to simplify client development and ensure customer success when isolating a substring.
1107+
1108+
<a href="#substrings-return-value-for-each-encoding" name="substrings-return-value-for-each-encoding">:white_check_mark:</a> **DO** include all 3 encodings (UTF-8, UTF-16, and CodePoint) for every string offset or length value in a service response.
1109+
11011110
<a href="#telemetry" name="telemetry"></a>
11021111
### Distributed Tracing & Telemetry
11031112
Azure SDK client guidelines specify that client libraries must send telemetry data through the `User-Agent` header, `X-MS-UserAgent` header, and Open Telemetry.
