Commit 4c3c8b1

Guidance on searching and evaluating schemas
Some OAS features casually state that they depend on the type of data being examined, or implicitly carry ambiguity about how to determine how to parse the data. This section attempts to provide some guidance and limits, requiring only that implementations follow the unambiguous, statically deterministic keywords `$ref` and `allOf`. It also provides for just validating the data (when possible) and using the actual in-memory type when a schema is too complex to analyze statically. One use of this is breaking apart schemas to use them with mixed binary and JSON-compatible data, and a new section has been added to address that. Finally, a typo in a related section was fixed.
1 parent 22fbdc9 commit 4c3c8b1

1 file changed

+48
-1
lines changed

src/oas.md

Lines changed: 48 additions & 1 deletion
@@ -288,6 +288,40 @@ The formats defined by the OAS are:
288288

289289
As noted under [Data Type](#data-types), both `type: number` and `type: integer` are considered to be numbers in the data model.
290290

291+
#### Determining Type and Structure
292+
293+
Several features of the OpenAPI Specification depend on detecting data characteristics such as type, format, media type, and object property or array item structure.
294+
295+
If the data is in a form that can be validated by the relevant Schema Object and is determined to be valid, implementations MUST support detecting characteristics such as JSON type or property or item structure from the data, whether it can be gleaned from the schema(s) or not.
296+
If `format` or the `content*` keywords are involved in further characterizing the data, these can be obtained as [annotation results](#extended-validation-with-annotations).
297+
298+
##### Locating Schemas and Keywords
299+
300+
When the data is in a non-JSON format, particularly one such as XML or various form media types where data is stored as strings without type information, it can be necessary to find this information through the relevant Schema Object to determine how to parse the format into a structure that can be validated by the schema.
301+
As schema organization can become very complex, implementations are not expected to handle every possible schema layout.
302+
However, given a known starting point schema (usually the value of the nearest `schema` field), implementations MUST search the following for the relevant keywords (e.g. `type`, `format`, `contentMediaType`, etc.):
303+
304+
* The starting point schema itself
305+
* Any schema reachable from there solely through `$ref` and/or `allOf`
306+
307+
These schemas are guaranteed to be applied to any instance.
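
The required search can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the function names `resolve_local_ref` and `find_keyword_values` are hypothetical, only same-document `$ref` values (JSON Pointer fragments) are handled, and percent-decoding of the fragment is omitted. Keywords such as `oneOf` are deliberately not followed, since only `$ref` and `allOf` are required.

```python
def resolve_local_ref(root, ref):
    """Resolve a same-document reference such as '#/components/schemas/Pet'.

    Assumption: external and relative references are out of scope here.
    """
    if not ref.startswith("#"):
        raise ValueError("this sketch only handles same-document references")
    node = root
    pointer = ref[1:]
    if pointer:
        for token in pointer.split("/")[1:]:
            # Undo JSON Pointer escaping (RFC 6901): '~1' is '/', '~0' is '~'.
            token = token.replace("~1", "/").replace("~0", "~")
            node = node[int(token)] if isinstance(node, list) else node[token]
    return node


def find_keyword_values(root, start, keyword):
    """Collect every value of `keyword` in the starting point schema and in
    any schema reachable from it solely through `$ref` and/or `allOf`."""
    found = []
    seen = set()  # identities of visited schemas, guards against reference cycles
    stack = [start]
    while stack:
        schema = stack.pop()
        # Boolean schemas carry no keywords, so they are skipped.
        if not isinstance(schema, dict) or id(schema) in seen:
            continue
        seen.add(id(schema))
        if keyword in schema:
            found.append(schema[keyword])
        if "$ref" in schema:
            stack.append(resolve_local_ref(root, schema["$ref"]))
        stack.extend(schema.get("allOf", []))
    return found
```

A caller would pass the whole OpenAPI document as `root` and the value of the nearest `schema` field as `start`. If several values are returned, all of them apply to the instance, so how to reconcile them is left to the implementation.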
308+
309+
In some cases, such as correlating [Encoding Objects](#encoding-object) with Schema Objects using fields in a [Media Type Object](#media-type-object), it is necessary to first find a keyword such as `properties`, and then treat its subschema(s) as starting point schemas for further searches.
310+
311+
Implementations MAY analyze subschemas of other keywords such as `oneOf` or `dependentSchemas`, or possible `$dynamicRef` targets, and MUST document the extent and nature of such support.
312+
313+
##### Handling Multiple Types
314+
315+
When a `type` keyword with multiple values (e.g. `type: ["number", "null"]`) is found, implementations MUST attempt to use the types as follows, ignoring any types not present in the `type` list:
316+
317+
1. Determine if the data can be parsed as whichever of `null`, `number`, `object`, or `array` are present in the `type` list, treating `integer` as `number` for this step.
318+
2. If the data can be parsed as a number, and `integer` is in the `type` list, check to see if the value is a mathematical integer, regardless of its textual representation.
319+
3. If the data has not been parsed successfully and `string` is in the type list, parse it as a string.
320+
321+
This process is sufficient to produce data that can be validated by JSON Schema.
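
The three steps above might look like this for a value that arrives as a plain string, such as a form field. This is a sketch under stated assumptions: the name `parse_with_types` is hypothetical, JSON syntax is assumed as the textual representation of `null`, numbers, objects, and arrays (other formats such as XML would need their own parsing rules), and a number that fails the integer check falls through to the string step, which the text above does not spell out.

```python
import json


def _reject_constant(name):
    # Python's json module accepts NaN and Infinity by default; reject them here.
    raise ValueError(f"{name} is not valid JSON")


def parse_with_types(text, types):
    """Parse `text` according to a multi-valued `type` list."""
    allowed = set(types)
    try:
        value = json.loads(text, parse_constant=_reject_constant)
        parsed = True
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        value, parsed = None, False

    if parsed:
        # Step 1: null, number, object, or array, whichever are in the list.
        if value is None and "null" in allowed:
            return None
        if isinstance(value, dict) and "object" in allowed:
            return value
        if isinstance(value, list) and "array" in allowed:
            return value
        # bool is a subclass of int in Python, so it is excluded explicitly.
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            if "number" in allowed:
                return value
            # Step 2: a mathematical integer, regardless of how it was written.
            if "integer" in allowed and (isinstance(value, int) or value.is_integer()):
                return int(value)

    # Step 3: nothing else matched, so keep the raw text as a string.
    if "string" in allowed:
        return text
    raise ValueError(f"{text!r} does not match any of {sorted(allowed)}")
```

With `type: ["integer", "string"]`, this sketch turns the text `1.0` into the integer `1` but keeps `1.5` as the string `"1.5"`.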
322+
If `format` or `content*` are needed for further parsing, they can be checked in the same way as `type`, or as annotations from the schema evaluation process.
323+
Parsing string contents based on `contentMediaType` carries the same security risks as parsing HTTP message bodies based on `Content-Type`, as noted under [Handling External Resources](#handling-external-resources).
324+
291325
#### Working with Binary Data
292326

293327
The OAS can describe either _raw_ or _encoded_ binary data.
@@ -309,7 +343,7 @@ Using a `contentEncoding` of `base64url` ensures that URL encoding (as required
309343

310344
The `contentMediaType` keyword is redundant if the media type is already set:
311345

312-
* as the key for a [MediaType Object](#media-type-object)
346+
* as the key for a [Media Type Object](#media-type-object)
313347
* in the `contentType` field of an [Encoding Object](#encoding-object)
314348

315349
If the [Schema Object](#schema-object) will be processed by a non-OAS-aware JSON Schema implementation, it may be useful to include `contentMediaType` even if it is redundant. However, if `contentMediaType` contradicts a relevant Media Type Object or Encoding Object, then `contentMediaType` SHALL be ignored.
@@ -325,6 +359,19 @@ The following table shows how to migrate from OAS 3.0 binary data descriptions,
325359
| <code style="white-space:nowrap">type: string</code><br /><code style="white-space:nowrap">format: binary</code> | <code style="white-space:nowrap">contentMediaType: image/png</code> | if redundant, can be omitted, often resulting in an empty [Schema Object](#schema-object) |
326360
| <code style="white-space:nowrap">type: string</code><br /><code style="white-space:nowrap">format: byte</code> | <code style="white-space:nowrap">type: string</code><br /><code style="white-space:nowrap">contentMediaType: image/png</code><br /><code style="white-space:nowrap">contentEncoding: base64</code> | note that `base64url` can be used to avoid re-encoding the base64 string to be URL-safe |
327361

362+
##### Schema Evaluation and Binary Data
363+
364+
Evaluating a binary media type with a single Schema Object is straightforward: it is usually a simple check for [annotations](#extended-validation-with-annotations), as most assertions are not relevant, and `const` and `enum` cannot be used because they cannot hold binary data.
365+
However, `multipart` media types can mix binary and text-based data, leaving implementations with two options for performing schema validation.
366+
367+
The simplest is to use a placeholder value, as schemas for binary data are generally written in a way that prevents any possible validation failure.
368+
However, it is possible that a complex schema might produce unexpected results if a particular value is allowed to be either binary or some other data type that happens to match the chosen placeholder.
369+
This risk could be reduced by trying multiple placeholders of different types.
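
One way to picture the placeholder technique is the sketch below. Everything in it is an assumption made for illustration: the function name `evaluate_with_placeholders`, the default placeholder values, and the `is_valid` callable, which stands in for whatever JSON Schema validator the implementation uses. The specification does not say how results from different placeholders should be combined, so this sketch only reports them.

```python
def evaluate_with_placeholders(parts, binary_names, is_valid, placeholders=("", 0, None)):
    """Validate multipart data once per placeholder.

    parts:        mapping of part name to its parsed value
    binary_names: names of the parts that hold binary data
    is_valid:     callable taking an instance and returning True or False
                  (a stand-in for a real schema validator)
    Returns a mapping from each placeholder's repr() to the validation outcome.
    """
    outcomes = {}
    for placeholder in placeholders:
        # Swap every binary part for the placeholder; keep other parts as-is.
        instance = {
            name: placeholder if name in binary_names else value
            for name, value in parts.items()
        }
        outcomes[repr(placeholder)] = bool(is_valid(instance))
    return outcomes
```

If the outcomes differ between placeholders, the schema is sensitive to the placeholder's type, which is the situation described above where a single placeholder could give a misleading result.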
370+
371+
Alternatively, implementations can use the procedures outlined under [Determining Type and Structure](#determining-type-and-structure) to find the property or item schemas to apply individually to the non-binary data, and handle the binary data separately as it would be handled if it were a separate document.
372+
373+
Implementations MUST document how such evaluations are handled, along with any expected limitations of the chosen technique(s).
374+
328375
### Rich Text Formatting
329376

330377
Throughout the specification `description` fields are noted as supporting CommonMark markdown formatting.
