Commit 79ec626

Merge pull request #4793 from handrews/schema-inspect
v3.2: Provide parsing and serialization guidance
2 parents c097759 + 2359b8d commit 79ec626

1 file changed

+153
-5
lines changed

src/oas.md

Lines changed: 153 additions & 5 deletions
@@ -322,7 +322,9 @@ The behavior for Discriminator Object non-URI mappings and for the Operation Obj
322322

323323
Note that no aspect of implicit connection resolution changes how [URIs are resolved](#relative-references-in-api-description-uris), or restricts their possible targets.
324324

325-
### Data Types
325+
### Working with Data
326+
327+
#### Data Types
326328

327329
Data types in the OAS are based on the types defined by the [JSON Schema Validation Specification Draft 2020-12](https://www.ietf.org/archive/id/draft-bhutton-json-schema-validation-01.html#section-6.1.1):
328330
"null", "boolean", "object", "array", "number", "string", or "integer".
@@ -332,7 +334,7 @@ JSON Schema keywords and `format` values operate on JSON "instances" which may b
332334

333335
Note that the `type` keyword allows `"integer"` as a value for convenience, but keyword and format applicability does not recognize integers as being of a distinct JSON type from other numbers because [[RFC8259|JSON]] itself does not make that distinction. Since there is no distinct JSON integer type, JSON Schema defines integers mathematically. This means that both `1` and `1.0` are [equivalent](https://www.ietf.org/archive/id/draft-bhutton-json-schema-01.html#section-4.2.2), and are both considered to be integers.
334336

335-
#### Data Type Format
337+
##### Data Type Format
336338

337339
As defined by the [JSON Schema Validation specification](https://www.ietf.org/archive/id/draft-bhutton-json-schema-validation-01.html#section-7.3), data types can have an optional modifier keyword: `format`. As described in that specification, `format` is treated as a non-validating annotation by default; the ability to validate `format` varies across implementations.
338340

@@ -353,7 +355,141 @@ The formats defined by the OAS are:
353355

354356
As noted under [Data Type](#data-types), both `type: number` and `type: integer` are considered to be numbers in the data model.
355357

356-
#### Working with Binary Data
358+
#### Parsing and Serializing
359+
360+
API data has several forms:
361+
362+
1. The serialized form, which is either a document of a particular media type, an HTTP header value, or part of a URI.
363+
2. The data form, intended for use with a [Schema Object](#schema-object).
364+
3. The application form, which incorporates any additional information conveyed by JSON Schema keywords such as `format` and `contentType`, and possibly additional information such as class hierarchies that are beyond the scope of this specification, although they MAY be based on specification elements such as the [Discriminator Object](#discriminator-object) or guidance regarding [Data Modeling Techniques](#data-modeling-techniques).
365+
366+
##### JSON Data
367+
368+
JSON-serialized data is nearly equivalent to the data form because the [JSON Schema data model](https://www.ietf.org/archive/id/draft-bhutton-json-schema-01.html#section-4.2.1) is nearly equivalent to the JSON representation.
369+
The serialized UTF-8 JSON string `{"when": "1985-04-12T23:20:50.52"}` represents an object with one data field, named `when`, with a string value, `1985-04-12T23:20:50.52`.
370+
371+
The exact application form is beyond the scope of this specification, as can be shown with the following schema for our JSON instance:
372+
373+
```yaml
374+
type: object
375+
properties:
376+
  when:
377+
    type: string
378+
    format: date-time
379+
```
380+
381+
Some applications might leave the string as a string regardless of programming language, while others might notice the `format` and use it as a `datetime.datetime` instance in Python, or a `java.time.ZonedDateTime` in Java.
382+
This specification only requires that the data is valid according to the schema, and that [annotations](#extended-validation-with-annotations) such as `format` are available in accordance with the JSON Schema specification.
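
The distinction between the data form and one possible application form can be sketched in Python as follows. This is a minimal illustration, not part of the specification: the variable names are hypothetical, and converting to `datetime` is only one choice an application might make when it notices `format: date-time`.

```python
import json
from datetime import datetime

# Serialized form: a UTF-8 JSON document.
raw = '{"when": "1985-04-12T23:20:50.52"}'

# Data form: the JSON text parsed into the JSON Schema data model.
# The value of "when" is still a plain string at this point.
data = json.loads(raw)

# One possible application form (an assumption, not a requirement):
# an application that honors `format: date-time` converts the string.
# The example value carries no UTC offset, so the result is a naive datetime.
when = datetime.strptime(data["when"], "%Y-%m-%dT%H:%M:%S.%f")
```

An application that ignores `format` would simply stop after `json.loads` and keep the string.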
383+
384+
##### Non-JSON Data
385+
386+
Non-JSON serializations can be substantially different from their corresponding data form, and might require several steps to parse.
387+
388+
To continue our "when" example, if we serialized the object as `application/x-www-form-urlencoded`, it would appear as the ASCII string `when=1985-04-12T23%3A20%3A50.52`.
389+
This example is still straightforward to use as it is all string data, and the only differences from JSON are the URI percent-encoding and the delimiter syntax (`=` instead of JSON punctuation and quoting).
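
As a rough sketch of this simple case, Python's standard `urllib.parse` module can undo both the delimiter syntax and the percent-encoding. The variable names are illustrative only.

```python
from urllib.parse import parse_qs

# Serialized form: an application/x-www-form-urlencoded string.
serialized = "when=1985-04-12T23%3A20%3A50.52"

# parse_qs splits on "&" and "=", percent-decodes each part, and returns
# a list of values per name because a name may be repeated.
pairs = parse_qs(serialized, strict_parsing=True)

# Each name appears once here, so take the single value for each.
data = {name: values[0] for name, values in pairs.items()}
```

The result is the same one-field object that the JSON example produced, with `when` still a string.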
390+
391+
However, many non-JSON text-based formats can be complex, requiring examination of the appropriate schema(s) in order to correctly parse the text into a schema-ready data structure.
392+
Serializing data into such formats requires either examining the schema-validated data or performing the same schema inspections.
393+
394+
When inspecting schemas, given a starting point schema, implementations MUST examine that schema and all schemas that can be reached from it by following only `$ref` and `allOf` keywords.
395+
These schemas are guaranteed to apply to any instance.
396+
When searching schemas for `type`, if the `type` keyword's value is a list of types and the serialized value can be successfully parsed as more than one of the types in the list, and no other findable `type` keyword disambiguates the actual required type, the behavior is implementation-defined.
397+
Schema Objects that do not contain `type` MUST be considered to allow all types, regardless of which other keywords are present (e.g. `maximum` applies to numbers, but _does not_ require the instance to be a number).
398+
399+
Implementations MAY inspect subschemas or possible reference targets of other keywords such as `oneOf` or `$dynamicRef`, but MUST NOT attempt to resolve ambiguities.
400+
For example, if an implementation opts to inspect `anyOf`, the schema:
401+
402+
```yaml
403+
anyOf:
404+
  - type: number
405+
    minimum: 0
406+
  - type: number
407+
    maximum: 100
408+
```
409+
410+
unambiguously indicates a numeric type, but the schema:
411+
412+
```yaml
413+
anyOf:
414+
  - type: number
415+
  - maximum: 100
416+
```
417+
418+
does not, because the second subschema allows all types.
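
For an implementation that opts to inspect `anyOf`, the check described above might look like the following sketch. The function name `anyof_types` is hypothetical, and schemas are assumed to be already loaded as Python dictionaries.

```python
def anyof_types(schema):
    """Return the union of types declared by the `anyOf` branches,
    or None if any branch omits `type` (and so allows all types)."""
    allowed = set()
    for branch in schema.get("anyOf", []):
        declared = branch.get("type")
        if declared is None:
            return None  # this branch places no restriction on type
        # `type` may be a single string or a list of strings.
        allowed.update([declared] if isinstance(declared, str) else declared)
    return allowed or None


first = {"anyOf": [{"type": "number", "minimum": 0},
                   {"type": "number", "maximum": 100}]}
second = {"anyOf": [{"type": "number"}, {"maximum": 100}]}

first_result = anyof_types(first)
second_result = anyof_types(second)
```

Here `first_result` contains only `number`, while `second_result` is `None` because the branch with only `maximum` does not constrain the type.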
419+
420+
Due to these limited requirements for searching schemas, serializers that have access to validated data MUST inspect the data if possible; implementations that either do not work with runtime data (such as code generators) or cannot access validated data for some reason MUST fall back to schema inspection.
421+
422+
Recall also that in JSON Schema, keywords that apply to a specific type (e.g. `pattern` applies to strings, `minimum` applies to numbers) _do not_ require or imply that the data will actually be of that type.
423+
424+
As an example of these processes, given these OpenAPI components:
425+
426+
```yaml
427+
components:
428+
  requestBodies:
429+
    Form:
430+
      content:
431+
        application/x-www-form-urlencoded:
432+
          schema:
433+
            $ref: "#/components/schemas/FormData"
434+
          encoding:
435+
            extra:
436+
              contentType: application/xml
437+
  schemas:
438+
    FormData:
439+
      type: object
440+
      properties:
441+
        code:
442+
          allOf:
443+
            - type: [string, number]
444+
              pattern: "1"
445+
              minimum: 0
446+
            - type: string
447+
              pattern: "2"
448+
        count:
449+
          type: integer
450+
        extra:
451+
          type: object
452+
```
453+
454+
And this request body to parse into its data form:
455+
456+
```uri
457+
code=1234&count=42&extra=%3Cinfo%3Eabc%3C/info%3E
458+
```
459+
460+
We must first search the schema for `properties` or other property-defining keywords, and then use each property schema as a starting point for a search for that property's `type` keyword, as follows (the exact order is implementation-defined):
461+
462+
* `#/components/requestBodies/Form/content/application~1x-www-form-urlencoded/schema` (initial starting point schema, only `$ref`)
463+
* `#/components/schemas/FormData` (follow `$ref`, found `properties`)
464+
* `#/components/schemas/FormData/properties/code` (starting point schema for `code` property)
465+
* `#/components/schemas/FormData/properties/code/allOf/0` (follow `allOf`, found `type: [string, number]`)
466+
* `#/components/schemas/FormData/properties/code/allOf/1` (follow `allOf`, found `type: string`)
467+
* `#/components/schemas/FormData/properties/count` (starting point schema for `count` property, found `type: integer`)
468+
* `#/components/schemas/FormData/properties/extra` (starting point schema for `extra` property, found `type: object`)
469+
470+
Note that for `code` we first found an ambiguous `type`, but then found another `type` keyword that ensures only one of the two possibilities is valid.
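
The search walked through above can be sketched as a small Python function. This is an illustrative simplification under stated assumptions: `resolve_pointer` and `findable_types` are hypothetical names, only same-document `$ref` values are handled, and type names are intersected literally (so the fact that every `integer` is also a `number` is not modeled).

```python
def resolve_pointer(document, ref):
    """Resolve a same-document reference such as '#/components/schemas/X'."""
    if not ref.startswith("#/"):
        raise ValueError("only same-document JSON Pointer references are handled")
    node = document
    for token in ref[2:].split("/"):
        # Undo JSON Pointer escaping: ~1 is "/", ~0 is "~".
        node = node[token.replace("~1", "/").replace("~0", "~")]
    return node


def findable_types(schema, document):
    """Intersect every `type` reachable by following only `$ref` and `allOf`."""
    found = None      # None means no `type` seen yet, so all types are allowed
    seen = set()      # guards against reference cycles
    pending = [schema]
    while pending:
        current = pending.pop()
        if id(current) in seen:
            continue
        seen.add(id(current))
        if "type" in current:
            declared = current["type"]
            declared = {declared} if isinstance(declared, str) else set(declared)
            found = declared if found is None else found & declared
        if "$ref" in current:
            pending.append(resolve_pointer(document, current["$ref"]))
        pending.extend(current.get("allOf", []))
    return found


document = {"components": {"schemas": {"FormData": {
    "type": "object",
    "properties": {
        "code": {"allOf": [
            {"type": ["string", "number"], "pattern": "1", "minimum": 0},
            {"type": "string", "pattern": "2"},
        ]},
        "count": {"type": "integer"},
        "extra": {"type": "object"},
    },
}}}}

start = {"$ref": "#/components/schemas/FormData"}
form = resolve_pointer(document, start["$ref"])
types = {name: findable_types(sub, document)
         for name, sub in form["properties"].items()}
```

For `code`, the two `allOf` branches intersect to the single type `string`, which is the disambiguation described in the note above.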
471+
472+
From this inspection, we determine that `code` is a string that happens to look like a number, while `count` needs to be parsed into a number _prior_ to schema validation.
473+
Furthermore, the `extra` string is in fact an XML serialization of an object containing an `info` property.
474+
This means that the data form of this serialization is equivalent to the following JSON object:
475+
476+
```json
477+
{
478+
  "code": "1234",
479+
  "count": 42,
480+
  "extra": {
481+
    "info": "abc"
482+
  }
483+
}
484+
```
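
Once the types are known from schema inspection, the parsing steps might be sketched in Python as shown here. The conversions are hard-coded from the inspection results above rather than driven by the schema, and mapping the `<info>` element to an `info` property is a simplifying assumption made for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qsl

body = "code=1234&count=42&extra=%3Cinfo%3Eabc%3C/info%3E"

# Step 1: split the form body and percent-decode names and values.
fields = dict(parse_qsl(body, strict_parsing=True))

# Step 2: the Encoding Object says `extra` is application/xml, so parse it.
root = ET.fromstring(fields["extra"])

# Step 3: apply the types found by schema inspection.
data = {
    "code": fields["code"],          # type: string, left as-is
    "count": int(fields["count"]),   # type: integer, parsed before validation
    "extra": {root.tag: root.text},  # type: object, built from the XML element
}
```

The resulting `data` dictionary corresponds to the JSON object shown above and is ready for schema validation.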
485+
486+
Serializing this object also requires correlating properties with [Encoding Objects](#encoding-object), and may require inspection to determine a default value of the `contentType` field.
487+
If validated data is not available, the schema inspection process is identical to that shown for parsing.
488+
489+
In this example, both `code` and `count` are of primitive type and do not appear in the `encoding` field, and are therefore serialized as plain text.
490+
However, the `extra` field is an object, which would by default be serialized as JSON, but the `extra` entry in the `encoding` field tells us to serialize it as XML instead.
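
The serialization direction could be sketched as follows, again in Python and again only as an illustration. The XML construction is hand-written for this one property; a real implementation would derive it from the schema and the Encoding Object.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qsl, urlencode

data = {"code": "1234", "count": 42, "extra": {"info": "abc"}}

# `extra` has an encoding entry with contentType application/xml,
# so serialize the object as an XML element instead of JSON.
info = ET.Element("info")
info.text = data["extra"]["info"]
extra_xml = ET.tostring(info, encoding="unicode")

# `code` and `count` are primitives without encoding entries, so they are
# written as plain text; urlencode applies the percent-encoding.
body = urlencode({
    "code": data["code"],
    "count": data["count"],
    "extra": extra_xml,
})

# Decoding the result recovers the same field strings that were encoded.
round_trip = dict(parse_qsl(body))
```

Note that `urlencode` also percent-encodes the `/` in the closing tag, so `body` is not byte-identical to the request body shown earlier, but it decodes to the same fields.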
491+
492+
##### Working with Binary Data
357493

358494
The OAS can describe either _raw_ or _encoded_ binary data.
359495

@@ -381,7 +517,19 @@ If the [Schema Object](#schema-object) will be processed by a non-OAS-aware JSON
381517

382518
See [Complete vs Streaming Content](#complete-vs-streaming-content) for guidance on streaming binary payloads.
383519

384-
##### Migrating binary descriptions from OAS 3.0
520+
###### Schema Evaluation and Binary Data
521+
522+
Few JSON Schema implementations directly support working with binary data, as doing so is not a mandatory part of that specification.
523+
524+
OAS Implementations that do not have access to a binary-instance-supporting JSON Schema implementation MUST examine schemas and apply them in accordance with [Working with Binary Data](#working-with-binary-data).
525+
When the entire instance is binary, this is straightforward as few keywords are relevant.
526+
527+
However, `multipart` media types can mix binary and text-based data, leaving implementations with two options for schema evaluations:
528+
529+
1. Use a placeholder value, on the assumption that no assertions will apply to the binary data and no conditional schema keywords will cause the schema to treat the placeholder value differently (e.g. a part that could be either plain text or binary might behave unexpectedly if a string is used as a binary placeholder, as it would likely be treated as plain text and subject to different subschemas and keywords).
530+
2. Inspect the schema(s) to find the appropriate keywords (`properties`, `prefixItems`, etc.) in order to break up the subschemas and apply them separately to binary and JSON-compatible data.
531+
532+
###### Migrating binary descriptions from OAS 3.0
385533

386534
The following table shows how to migrate from OAS 3.0 binary data descriptions, continuing to use `image/png` as the example binary media type:
387535

@@ -1689,7 +1837,7 @@ These fields MAY be used either with or without the RFC6570-style serialization
16891837

16901838
| Field Name | Type | Description |
16911839
| ---- | :----: | ---- |
1692-
| <a name="encoding-content-type"></a>contentType | `string` | The `Content-Type` for encoding a specific property. The value is a comma-separated list, each element of which is either a specific media type (e.g. `image/png`) or a wildcard media type (e.g. `image/*`). Default value depends on the property type as shown in the table below. |
1840+
| <a name="encoding-content-type"></a>contentType | `string` | The `Content-Type` for encoding a specific property. The value is a comma-separated list, each element of which is either a specific media type (e.g. `image/png`) or a wildcard media type (e.g. `image/*`). The default value depends on the type as shown in the table below. |
16931841
| <a name="encoding-headers"></a>headers | Map[`string`, [Header Object](#header-object) \| [Reference Object](#reference-object)] | A map allowing additional information to be provided as headers. `Content-Type` is described separately and SHALL be ignored in this section. This field SHALL be ignored if the media type is not a `multipart`. |
16941842
| <a name="encoding-encoding"></a>encoding | Map[`string`, [Encoding Object](#encoding-object)] | Applies nested Encoding Objects in the same manner as the [Media Type Object](#media-type-object)'s `encoding` field. |
16951843
| <a name="encoding-prefix-encoding"></a>prefixEncoding | [[Encoding Object](#encoding-object)] | Applies nested Encoding Objects in the same manner as the [Media Type Object](#media-type-object)'s `prefixEncoding` field. |
