Commit 62d67f3

Provide parsing and serialization guidance
This creates a "Working with Data" section that incorporates the existing "Data Types" section (with some section level adjustments) along with new guidance on mapping different kinds of data between serialized, data, and application forms. This terminology matches the terminology currently being considered for examples.

The application form is largely out of scope for the OAS, and is mainly included to clarify this scope while acknowledging that the OAS may influence such things.

Most of the new material is on parsing and serializing, briefly addressing JSON as the common case before going into detail on non-JSON data, with examples. This is where the requirements for schema and/or instance inspection/searching are listed.

The only additional change is no longer mentioning the property schema in the Encoding Object, in part because with the new `multipart/mixed` support Encoding Objects can be used with arrays as well as objects.
1 parent b19e612 commit 62d67f3

1 file changed

+127
-5
lines changed

src/oas.md

Lines changed: 127 additions & 5 deletions
@@ -257,7 +257,9 @@ The behavior for Discriminator Object non-URI mappings and for the Operation Obj
257257

258258
Note that no aspect of implicit connection resolution changes how [URIs are resolved](#relative-references-in-api-description-uris), or restricts their possible targets.
259259

260-
### Data Types
260+
### Working with Data
261+
262+
#### Data Types
261263

262264
Data types in the OAS are based on the types defined by the [JSON Schema Validation Specification Draft 2020-12](https://www.ietf.org/archive/id/draft-bhutton-json-schema-validation-01.html#section-6.1.1):
263265
"null", "boolean", "object", "array", "number", "string", or "integer".
@@ -267,7 +269,7 @@ JSON Schema keywords and `format` values operate on JSON "instances" which may b
267269

268270
Note that the `type` keyword allows `"integer"` as a value for convenience, but keyword and format applicability does not recognize integers as being of a distinct JSON type from other numbers because [[RFC8259|JSON]] itself does not make that distinction. Since there is no distinct JSON integer type, JSON Schema defines integers mathematically. This means that both `1` and `1.0` are [equivalent](https://www.ietf.org/archive/id/draft-bhutton-json-schema-01.html#section-4.2.2), and are both considered to be integers.
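The mathematical definition of an integer can be illustrated with a short Python sketch. The function name `is_json_schema_integer` is hypothetical and is not part of the OAS or of any JSON Schema library; it only shows how `1` and `1.0` end up being treated alike.

```python
def is_json_schema_integer(value):
    """Return True if value is an integer in the mathematical sense."""
    # bool is a subclass of int in Python, but it maps to the JSON boolean type.
    if isinstance(value, bool):
        return False
    if isinstance(value, int):
        return True
    if isinstance(value, float):
        # 1.0 has a zero fractional part, so it counts as an integer.
        return value.is_integer()
    return False
```

Under this sketch, `is_json_schema_integer(1)` and `is_json_schema_integer(1.0)` both return `True`, while `1.5`, `True`, and `"1"` are rejected.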
269271

270-
#### Data Type Format
272+
##### Data Type Format
271273

272274
As defined by the [JSON Schema Validation specification](https://www.ietf.org/archive/id/draft-bhutton-json-schema-validation-01.html#section-7.3), data types can have an optional modifier keyword: `format`. As described in that specification, `format` is treated as a non-validating annotation by default; the ability to validate `format` varies across implementations.
273275

@@ -288,7 +290,115 @@ The formats defined by the OAS are:
288290

289291
As noted under [Data Type](#data-types), both `type: number` and `type: integer` are considered to be numbers in the data model.
290292

291-
#### Working with Binary Data
293+
#### Parsing and Serializing
294+
295+
API data has three forms:
296+
297+
1. The serialized form, which is either a document of a particular media type, part of an HTTP header value, or part of a URI.
298+
2. The data form, intended for use with a [Schema Object](#schema-object).
299+
3. The application form, which incorporates any additional information conveyed by JSON Schema keywords such as `format` and `contentType`, and possibly additional information such as class hierarchies that are beyond the scope of this specification, although they MAY be based on specification elements such as the [Discriminator Object](#discriminator-object) or guidance regarding [Data Modeling Techniques](#data-modeling-techniques).
300+
301+
##### JSON Data
302+
303+
JSON-serialized data is nearly equivalent to the data form because the [JSON Schema data model](https://www.ietf.org/archive/id/draft-bhutton-json-schema-01.html#section-4.2.1) is nearly equivalent to the JSON representation.
304+
The serialized UTF-8 JSON string `{"when": "1985-04-12T23:20:50.52"}` represents an object with one data field, named `when`, with a string value, `1985-04-12T23:20:50.52`.
305+
306+
The exact application form is beyond the scope of this specification, as can be shown with the following schema for our JSON instance:
307+
308+
```yaml
type: object
properties:
  when:
    type: string
    format: date-time
```
315+
316+
Some applications might leave the string as a string regardless of programming language, while others might notice the `format` and use it as a `datetime.datetime` instance in Python, or a `java.time.ZonedDateTime` in Java.
317+
This specification only requires that the data is valid according to the schema, and that [annotations](#extended-validation-with-annotations) such as `format` are available in accordance with the JSON Schema specification.
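As a rough illustration of the difference between the data form and one possible application form, the Python sketch below parses a JSON document and then converts the `when` string into a `datetime.datetime`. The helper `to_application_form` is hypothetical, and the instance used here is an assumption: a timestamp written with literal colons and no UTC offset.

```python
import json
from datetime import datetime

# Assumed serialized form: a JSON document with a single "when" field.
serialized = '{"when": "1985-04-12T23:20:50.52"}'

# Data form: plain JSON-compatible values, ready for schema validation.
data = json.loads(serialized)

def to_application_form(instance):
    """Hypothetical conversion that acts on `format: date-time`."""
    converted = dict(instance)
    # %f accepts one to six fractional digits, so ".52" is 520000 microseconds.
    converted["when"] = datetime.strptime(instance["when"], "%Y-%m-%dT%H:%M:%S.%f")
    return converted

app = to_application_form(data)
```

An application that ignores `format` would simply keep using `data`, which is equally acceptable as far as this specification is concerned.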
318+
319+
##### Non-JSON Data
320+
321+
Non-JSON serializations can be substantially different from their corresponding data form, and might require several steps to parse.
322+
323+
To continue our "when" example, if we serialized the object as `application/x-www-form-urlencoded`, it would appear as the ASCII string `when=1985-04-12T23%3A20%3A50.52`.
324+
This example is still straightforward to use as it is all string data, and the only differences from JSON are the URI percent-encoding and the delimiter syntax (`=` instead of JSON punctuation and quoting).
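A minimal sketch of this simple case, using only Python's standard `urllib.parse` module, shows the percent-decoding on parse and the re-encoding on serialization. The variable names are illustrative.

```python
from urllib.parse import parse_qsl, urlencode

body = "when=1985-04-12T23%3A20%3A50.52"

# Parsing splits on "&" and "=" and percent-decodes names and values.
data = dict(parse_qsl(body))

# Serializing percent-encodes the colons again.
round_tripped = urlencode(data)
```

Here `data` holds the single string property `when` with its colons restored, and `round_tripped` is identical to `body`.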
325+
326+
However, many non-JSON text-based formats can be complex, requiring examination of the appropriate schema(s) in order to correctly parse the text into a schema-ready data structure.
327+
Serializing data into such formats requires either examining the schema-validated data or performing the same schema inspections.
328+
329+
When inspecting schemas, given a starting point schema, implementations MUST examine that schema and all schemas that can be reached from it by following only `$ref` and `allOf` keywords.
330+
These schemas are guaranteed to apply to any instance.
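The search described above might be sketched in Python as follows. This is an assumption-laden illustration rather than a reference implementation: `resolve_pointer`, `reachable_schemas`, and `find_keyword` are hypothetical names, only same-document `$ref` values are handled, and percent-decoding of URI fragments is omitted.

```python
def resolve_pointer(document, ref):
    """Resolve a same-document reference such as '#/components/schemas/X'."""
    if not ref.startswith("#/"):
        raise ValueError("only same-document references are handled in this sketch")
    node = document
    for token in ref[2:].split("/"):
        # Undo JSON Pointer escaping: "~1" is "/", "~0" is "~".
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node

def reachable_schemas(document, start):
    """Yield start and every schema reachable via $ref and allOf only."""
    stack, seen = [start], []
    while stack:
        schema = stack.pop()
        # Skip boolean schemas and anything already visited (guards against cycles).
        if not isinstance(schema, dict) or any(schema is s for s in seen):
            continue
        seen.append(schema)
        yield schema
        if "$ref" in schema:
            stack.append(resolve_pointer(document, schema["$ref"]))
        stack.extend(schema.get("allOf", []))

def find_keyword(document, start, keyword):
    """Return the first value of keyword found in the reachable schemas, or None."""
    for schema in reachable_schemas(document, start):
        if keyword in schema:
            return schema[keyword]
    return None
```

Keywords such as `anyOf`, `oneOf`, or `if` are deliberately not followed, because their subschemas are not guaranteed to apply to every instance.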
331+
332+
Due to this limited requirement for searching schemas, serializers that have access to validated data MUST inspect the data if possible; implementations that either do not work with runtime data (such as code generators) or cannot access validated data for some reason MUST fall back to schema inspection.
333+
334+
When searching schemas for `type`, if the `type` keyword's value is a list of types and the serialized value can be successfully parsed as more than one of the types in the list, the behavior is implementation-defined.
335+
336+
As an example of these processes, given these OpenAPI components:
337+
338+
```yaml
components:
  requestBodies:
    Form:
      content:
        application/x-www-form-urlencoded:
          schema:
            $ref: "#/components/schemas/FormData"
          encoding:
            extra:
              contentType: application/xml
  schemas:
    FormData:
      type: object
      properties:
        code:
          allOf:
          - type: string
            pattern: "1"
          - type: string
            pattern: "2"
        count:
          type: integer
        extra:
          type: object
```
364+
365+
And this request body to parse into its data form:
366+
367+
```uri
code=1234&count=42&extra=%3Cinfo%3Eabc%3C/info%3E
```
370+
371+
We must first search the schema for `properties` or other property-defining keywords, and then use each property schema as a starting point for a search for that property's `type` keyword, as follows (the exact order is implementation-defined):
372+
373+
* `#/components/requestBodies/Form/content/application~1x-www-form-urlencoded/schema` (initial starting point schema, only `$ref`)
374+
* `#/components/schemas/FormData` (follow `$ref`, found `properties`)
375+
* `#/components/schemas/FormData/properties/code` (starting point schema for `code` property)
376+
* `#/components/schemas/FormData/properties/code/allOf/0` (follow `allOf`, found `type: string`)
377+
* `#/components/schemas/FormData/properties/code/allOf/1` (follow `allOf`, found `type: string`)
378+
* `#/components/schemas/FormData/properties/count` (starting point schema for `count` property, found `type: integer`)
379+
* `#/components/schemas/FormData/properties/extra` (starting point schema for `extra` property, found `type: object`)
380+
381+
From this, we determine that `code` is a string that happens to look like a number, while `count` needs to be parsed into a number _prior_ to schema validation.
382+
Furthermore, the `extra` string is in fact an XML serialization of an object containing an `info` property.
383+
This means that the data form of this serialization is equivalent to the following JSON object:
384+
385+
```json
{
  "code": "1234",
  "count": 42,
  "extra": {
    "info": "abc"
  }
}
```
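The parsing walk-through above can be approximated with the following Python sketch. Everything in it is illustrative: the dictionaries mirror the example components, `reachable`, `find`, `parse_xml_object`, and `parse_form` are hypothetical helpers, `$ref` resolution only handles `#/components/schemas/<name>`, and the XML mapping assumes a single element that becomes a one-property object.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qsl

SCHEMAS = {
    "FormData": {
        "type": "object",
        "properties": {
            "code": {"allOf": [{"type": "string", "pattern": "1"},
                               {"type": "string", "pattern": "2"}]},
            "count": {"type": "integer"},
            "extra": {"type": "object"},
        },
    }
}
MEDIA_TYPE = {
    "schema": {"$ref": "#/components/schemas/FormData"},
    "encoding": {"extra": {"contentType": "application/xml"}},
}

def reachable(schema):
    """Yield schema and everything reachable through $ref and allOf only."""
    yield schema
    if "$ref" in schema:
        # Sketch-only resolution: take the last segment as a schema name.
        yield from reachable(SCHEMAS[schema["$ref"].rsplit("/", 1)[-1]])
    for sub in schema.get("allOf", []):
        yield from reachable(sub)

def find(schema, keyword):
    """Return the first value of keyword among the reachable schemas."""
    return next((s[keyword] for s in reachable(schema) if keyword in s), None)

def parse_xml_object(text):
    """Assumed mapping: one XML element becomes a one-property object."""
    element = ET.fromstring(text)
    return {element.tag: element.text}

def parse_form(body, media_type):
    """Parse a form body into its data form, guided by schema inspection."""
    properties = find(media_type["schema"], "properties") or {}
    encoding = media_type.get("encoding", {})
    data = {}
    for name, raw in parse_qsl(body):
        type_ = find(properties.get(name, {}), "type")
        content_type = encoding.get(name, {}).get("contentType")
        if type_ == "integer":
            data[name] = int(raw)  # converted before schema validation
        elif type_ == "object" and content_type == "application/xml":
            data[name] = parse_xml_object(raw)
        else:
            data[name] = raw  # strings stay as they are
    return data

result = parse_form("code=1234&count=42&extra=%3Cinfo%3Eabc%3C/info%3E", MEDIA_TYPE)
```

With these assumptions, `result` keeps `code` as the string `"1234"`, turns `count` into the number `42`, and turns `extra` into an object with an `info` property.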
394+
395+
Serializing this object also requires correlating properties with [Encoding Objects](#encoding-object), and may require inspection to determine a default value of the `contentType` field.
396+
If validated data is not available, the schema inspection process is identical to that shown for parsing.
397+
398+
In this example, both `code` and `count` are of primitive type and do not appear in the `encoding` field, and are therefore serialized as plain text.
399+
However, the `extra` field is an object, which would by default be serialized as JSON, but the `extra` entry in the `encoding` field tells us to serialize it as XML instead.
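The serialization direction might look like the Python sketch below when validated data is available, so that the data itself can be inspected instead of the schema. The helpers `to_xml`, `serialize_value`, and `serialize_form` are hypothetical, the XML mapping (one element per property) is an assumption, and only strings, integers, and objects are handled.

```python
import json
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

DATA = {"code": "1234", "count": 42, "extra": {"info": "abc"}}
ENCODING = {"extra": {"contentType": "application/xml"}}

def to_xml(obj):
    """Assumed mapping: each property becomes one element with text content."""
    parts = []
    for name, value in obj.items():
        element = ET.Element(name)
        element.text = str(value)
        parts.append(ET.tostring(element, encoding="unicode"))
    return "".join(parts)

def serialize_value(name, value, encoding):
    """Serialize one property, consulting its Encoding Object if present."""
    if isinstance(value, dict):
        # Objects default to JSON unless the Encoding Object says otherwise.
        content_type = encoding.get(name, {}).get("contentType", "application/json")
        if content_type == "application/xml":
            return to_xml(value)
        return json.dumps(value)
    # Sketch only: strings and integers are written as plain text.
    return str(value)

def serialize_form(data, encoding):
    """Serialize the data form as application/x-www-form-urlencoded."""
    pairs = {name: serialize_value(name, value, encoding) for name, value in data.items()}
    # safe="/" leaves the slash unencoded, matching the request body shown earlier.
    return urlencode(pairs, safe="/")

serialized = serialize_form(DATA, ENCODING)
```

Calling `serialize_form(DATA, {})` instead, with no Encoding Object for `extra`, falls back to percent-encoded JSON for that property.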
400+
401+
##### Working with Binary Data
292402

293403
The OAS can describe either _raw_ or _encoded_ binary data.
294404

@@ -316,7 +426,19 @@ If the [Schema Object](#schema-object) will be processed by a non-OAS-aware JSON
316426

317427
See [Complete vs Streaming Content](#complete-vs-streaming-content) for guidance on streaming binary payloads.
318428

319-
##### Migrating binary descriptions from OAS 3.0
429+
###### Schema Evaluation and Binary Data
430+
431+
Few JSON Schema implementations directly support working with binary data, as doing so is not a mandatory part of that specification.
432+
433+
OAS Implementations that do not have access to a binary-instance-supporting JSON Schema implementation MUST examine schemas and apply them in accordance with [Working with Binary Data](#working-with-binary-data).
434+
When the entire instance is binary, this is straightforward as few keywords are relevant.
435+
436+
However, `multipart` media types can mix binary and text-based data, leaving implementations with two options for schema evaluations:
437+
438+
1. Use a placeholder value, on the assumption that no assertions will apply to the binary data and no conditional schema keywords will cause the schema to treat the placeholder value differently (e.g. a part that could be either plain text or binary might behave unexpectedly if a string is used as a binary placeholder, as it would likely be treated as plain text and subject to different subschemas and keywords).
439+
2. Inspect the schema(s) to find the appropriate keywords (`properties`, `prefixItems`, etc.) in order to break up the subschemas and apply them separately to binary and JSON-compatible data.
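As a very rough illustration of the second option, the Python sketch below separates binary values from JSON-compatible ones and trims the schema so that a validator without binary support only sees what it can handle. The function `split_binary_parts` is hypothetical, it assumes binary parts are represented as `bytes` values in an object instance, and it only considers `properties` and `required`; a real implementation would need to handle more keywords.

```python
def split_binary_parts(instance, schema):
    """Split an object instance into JSON-compatible and binary parts.

    Returns (json_compatible, binary, trimmed_schema), where trimmed_schema
    no longer mentions the binary properties.
    """
    binary = {k: v for k, v in instance.items() if isinstance(v, (bytes, bytearray))}
    json_compatible = {k: v for k, v in instance.items() if k not in binary}

    trimmed = dict(schema)  # shallow copy; the original schema is left untouched
    trimmed["properties"] = {
        k: s for k, s in schema.get("properties", {}).items() if k not in binary
    }
    trimmed["required"] = [k for k in schema.get("required", []) if k not in binary]
    return json_compatible, binary, trimmed
```

The subschemas removed from `trimmed` would then be applied to the binary values separately, following the guidance for binary data.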
440+
441+
###### Migrating binary descriptions from OAS 3.0
320442

321443
The following table shows how to migrate from OAS 3.0 binary data descriptions, continuing to use `image/png` as the example binary media type:
322444

@@ -1639,7 +1761,7 @@ These fields MAY be used either with or without the RFC6570-style serialization
16391761

16401762
| Field Name | Type | Description |
16411763
| ---- | :----: | ---- |
1642-
| <a name="encoding-content-type"></a>contentType | `string` | The `Content-Type` for encoding a specific property. The value is a comma-separated list, each element of which is either a specific media type (e.g. `image/png`) or a wildcard media type (e.g. `image/*`). Default value depends on the property type as shown in the table below. |
1764+
| <a name="encoding-content-type"></a>contentType | `string` | The `Content-Type` for encoding a specific property. The value is a comma-separated list, each element of which is either a specific media type (e.g. `image/png`) or a wildcard media type (e.g. `image/*`). The default value depends on the type as shown in the table below. |
16431765
| <a name="encoding-headers"></a>headers | Map[`string`, [Header Object](#header-object) \| [Reference Object](#reference-object)] | A map allowing additional information to be provided as headers. `Content-Type` is described separately and SHALL be ignored in this section. This field SHALL be ignored if the media type is not a `multipart`. |
16441766

16451767
This object MAY be extended with [Specification Extensions](#specification-extensions).
