From 62d67f385acffa40c98a54b178c9ded55012c9c9 Mon Sep 17 00:00:00 2001 From: "Henry H. Andrews" Date: Thu, 17 Jul 2025 16:50:23 -0700 Subject: [PATCH 1/5] Provide parsing and serialization guidance This creates a "Working with Data" section that incorporates the existing "Data Types" section (with some section level adjustments) along with new guidance on mapping different kinds of data between serialized, data, and application forms. This terminology matches the terminology currently being considered for examples. The application form is largely out of scope for the OAS, and is mainly included to clarify this scope while acknowledging that the OAS may influence such things. Most of the new material is on parsing and serializing, briefly addressing JSON as the common case before going into detail on non-JSON data, with examples. This is where the requirements for schema and/or instance inspection/searching are listed. The only additional change is no longer mentioning the property schema in the Encoding Object, in part because with the new `multipart/mixed` support Encoding Objects can be used with arrays as well as objects. --- src/oas.md | 132 +++++++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 127 insertions(+), 5 deletions(-) diff --git a/src/oas.md b/src/oas.md index c3b6b33141..ee58932668 100644 --- a/src/oas.md +++ b/src/oas.md @@ -257,7 +257,9 @@ The behavior for Discriminator Object non-URI mappings and for the Operation Obj Note that no aspect of implicit connection resolution changes how [URIs are resolved](#relative-references-in-api-description-uris), or restricts their possible targets. -### Data Types +### Working with Data + +#### Data Types Data types in the OAS are based on the types defined by the [JSON Schema Validation Specification Draft 2020-12](https://www.ietf.org/archive/id/draft-bhutton-json-schema-validation-01.html#section-6.1.1): "null", "boolean", "object", "array", "number", "string", or "integer". @@ -267,7 +269,7 @@ JSON Schema keywords and `format` values operate on JSON "instances" which may b Note that the `type` keyword allows `"integer"` as a value for convenience, but keyword and format applicability does not recognize integers as being of a distinct JSON type from other numbers because [[RFC8259|JSON]] itself does not make that distinction. Since there is no distinct JSON integer type, JSON Schema defines integers mathematically. This means that both `1` and `1.0` are [equivalent](https://www.ietf.org/archive/id/draft-bhutton-json-schema-01.html#section-4.2.2), and are both considered to be integers. -#### Data Type Format +##### Data Type Format As defined by the [JSON Schema Validation specification](https://www.ietf.org/archive/id/draft-bhutton-json-schema-validation-01.html#section-7.3), data types can have an optional modifier keyword: `format`. As described in that specification, `format` is treated as a non-validating annotation by default; the ability to validate `format` varies across implementations. @@ -288,7 +290,115 @@ The formats defined by the OAS are: As noted under [Data Type](#data-types), both `type: number` and `type: integer` are considered to be numbers in the data model. -#### Working with Binary Data +#### Parsing and Serializing + +API data has three forms: + +1. The serialized form, which is either a document of a particular media type, part of an HTTP header value, or part of a URI. +2. The data form, intended for use with a [Schema Object](#schema-object). +3. The application form, which incorporates any additional information conveyed by JSON Schema keywords such as `format` and `contentType`, and possibly additional information such as class hierarchies that are beyond the scope of this specification, although they MAY be based on specification elements such as the [Discriminator Object](#discriminator-object) or guidance regarding [Data Modeling Techniques](#data-modeling-techniques). + +##### JSON Data + +JSON-serialized data is nearly equivalent to the data form because the [JSON Schema data model](https://www.ietf.org/archive/id/draft-bhutton-json-schema-01.html#section-4.2.1) is nearly equivalent to the JSON representation. +The serialized UTF-8 JSON string `{"when": "1985-04-12T23%3A20%3A50.52"}` represents an object with one data field, named `when`, with a string value, `1985-04-12T23%3A20%3A50.52`. + +The exact application form is beyond the scope of this specification, as can be shown with the following schema for our JSON instance: + +```yaml +type: object +properties: + when: + type: string + format: date-time +``` + +Some applications might leave the string as a string regardless of programming language, while others might notice the `format` and use it as a `datetime.datetime` instance in Python, or a `java.time.ZonedDateTime` in Java. +This specification only requires that the data is valid according to the schema, and that [annotations](#extended-validation-with-annotations) such as `format` are available in accordance with the JSON Schema specification. + +##### Non-JSON Data + +Non-JSON serializetions can be substantially different from their corresponding data form, and might require several steps to parse. + +To continue our "when" example, if we serialized the object as `application/x-www-form-urlencoded`, it would appear as the ASCII string `when=1985-04-12T23%3A20%3A50.52`. +This example is still straightforward to use as it is all string data, and the only differences from JSON are the URI percent-encoding and the delimiter syntax (`=` instead of JSON punctuation and quoting). + +However, many non-JSON text-based formats can be complex, requiring examination of the appropriate schema(s) in order to correctly parse the text into a schema-ready data structure. +Serializing data into such formats requires either examing the schema-validated data or performing the same schema inspections. + +When inspecting schemas, given a starting point schema, implementations MUST examine that schema and all schemas that can be reached from it by following only `$ref` and `allOf` keywords. +These schemas are guaranteed to apply to any instance. + +Due to this limited requirement for searching schemas, serializers that have access to validated data MUST inspect the data if possible; implementations that either do not work with runtime data (such as code generators) or cannot access validated data for some reason MUST fall back to schema inspection. + +When searching schemas for `type`, if the `type` keyword's value is a list of types and the serialized value can be successfully parsed as more than one of the types in the list, the behavior is implementation-defined. + +As an example of these processes, given these OpenAPI components: + +```yaml +components: + requestBodies: + Form: + content: + application/x-www-form-urlencoded: + schema: + $ref: "#/components/schemas/FormData" + encoding: + extra: + contentType: application/xml + schemas: + FormData: + type: object + properties: + code: + allOf: + - type: string + pattern: "1" + - type: string + pattern: "2" + count: + type: integer + extra: + type: object +``` + +And this request body to parse into its data form: + +```uri +code=1234&count=42&extra=%3Cinfo%3Eabc%3C/info%3E +``` + +We must first search the schema for `properties` or other property-defining keywords, and then use each property schema as a starting point for a search for that property's `type` keyword, as follows (the exact order is implementation-defined): + +* `#/components/requestBodies/Form/content/application~1x-www-form-urlencoded/schema` (initial starting point schema, only `$ref`) +* `#/components/schemas/FormData` (follow `$ref`, found `properties`) +* `#/components/schemas/FormData/properties/code` (starting point schema for `code` property) +* `#/components/schemas/FormData/properties/code/allOf/0` (follow `allOf`, but no `type`) +* `#/components/schemas/FormData/properties/code/allOf/1` (follow `allOf`, found `type: string`) +* `#/components/schemas/FormData/properties/count` (starting point schema for `count` property, found `type: integer`) +* `#/components/schemas/FormData/properties/extra` (starting point schema for `count` property, found `type: object`) + +From this, we determine that `code` is a string that happens to look like a number, while `count` needs to be parsed into a number _prior_ to schema validation. +Furthermore, the `extra` string is in fact an XML serialization of an object containing an `info` property. +This means that the data form of this serialization is equivalent to the following JSON object: + +```json +{ + "code": "1234", + "count": 42 + "extra": { + "info": "abc" + } +} +``` + +Serializing this object also requires correlating properties with [Encoding Objects](#encoding-object), and may require inspection to determine a default value of the `contentType` field. +If validated data is not available, the schema inspection process is identical to that shown for parsing. + +In this example, both `code` and `count` are of primitive type and do not appear in the `encoding` field, and are therefore serialized as plain text. +However, the `extra` field is an object, which would by default be serialized as JSON, but the `extra` entry in the `encoding` field tells use to serialize it as XML instead. + +##### Working with Binary Data The OAS can describe either _raw_ or _encoded_ binary data. @@ -316,7 +426,19 @@ If the [Schema Object](#schema-object) will be processed by a non-OAS-aware JSON See [Complete vs Streaming Content](#complete-vs-streaming-content) for guidance on streaming binary payloads. -##### Migrating binary descriptions from OAS 3.0 +###### Schema Evaluation and Binary Data + +Few JSON Schema implementations directly support working with binary data, as doing so is not a mandatory part of that specification. + +OAS Implementations that do not have access to a binary-instance-supporting JSON Schema implementation MUST examine schemas and apply them in accordance with [Working with Binary Data](#working-with-binary-data). +When the entire instance is binary, this is straightforward as few keywords are relevant. + +However, `multipart` media types can mix binary and text-based data, leaving implementations with two options for schema evaluations: + +1. Use a placeholder value, on the assumption that no assertions will apply to the binary data and no conditional schema keywords will cause the schema to treat the placeholder value differently (e.g. a part that could be either plain text or binary might behave unexpectedly if a string is used as a binary placeholder, as it would likely be treated as plain text and subject to different subschemas and keywords). +2. Inspect the schema(s) to find the appropriate keywords (`properties`, `prefixItems`, etc.) in order to break up the subschemas and apply them separately to binary and JSON-compatible data. + +###### Migrating binary descriptions from OAS 3.0 The following table shows how to migrate from OAS 3.0 binary data descriptions, continuing to use `image/png` as the example binary media type: @@ -1639,7 +1761,7 @@ These fields MAY be used either with or without the RFC6570-style serialization | Field Name | Type | Description | | ---- | :----: | ---- | -| contentType | `string` | The `Content-Type` for encoding a specific property. The value is a comma-separated list, each element of which is either a specific media type (e.g. `image/png`) or a wildcard media type (e.g. `image/*`). Default value depends on the property type as shown in the table below. | +| contentType | `string` | The `Content-Type` for encoding a specific property. The value is a comma-separated list, each element of which is either a specific media type (e.g. `image/png`) or a wildcard media type (e.g. `image/*`). The default value depends on the type as shown in the table below. | | headers | Map[`string`, [Header Object](#header-object) \| [Reference Object](#reference-object)] | A map allowing additional information to be provided as headers. `Content-Type` is described separately and SHALL be ignored in this section. This field SHALL be ignored if the media type is not a `multipart`. | This object MAY be extended with [Specification Extensions](#specification-extensions). From 9fb9893fb35b6c3959c29909e09e9d099fa820a3 Mon Sep 17 00:00:00 2001 From: Henry Andrews Date: Fri, 18 Jul 2025 10:16:27 -0700 Subject: [PATCH 2/5] Apply suggestions from code review Co-authored-by: Ralf Handl --- src/oas.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/src/oas.md b/src/oas.md index ee58932668..1488d65946 100644 --- a/src/oas.md +++ b/src/oas.md @@ -318,13 +318,13 @@ This specification only requires that the data is valid according to the schema, ##### Non-JSON Data -Non-JSON serializetions can be substantially different from their corresponding data form, and might require several steps to parse. +Non-JSON serializations can be substantially different from their corresponding data form, and might require several steps to parse. To continue our "when" example, if we serialized the object as `application/x-www-form-urlencoded`, it would appear as the ASCII string `when=1985-04-12T23%3A20%3A50.52`. This example is still straightforward to use as it is all string data, and the only differences from JSON are the URI percent-encoding and the delimiter syntax (`=` instead of JSON punctuation and quoting). However, many non-JSON text-based formats can be complex, requiring examination of the appropriate schema(s) in order to correctly parse the text into a schema-ready data structure. -Serializing data into such formats requires either examing the schema-validated data or performing the same schema inspections. +Serializing data into such formats requires either examining the schema-validated data or performing the same schema inspections. When inspecting schemas, given a starting point schema, implementations MUST examine that schema and all schemas that can be reached from it by following only `$ref` and `allOf` keywords. These schemas are guaranteed to apply to any instance. @@ -376,7 +376,7 @@ We must first search the schema for `properties` or other property-defining keyw * `#/components/schemas/FormData/properties/code/allOf/0` (follow `allOf`, but no `type`) * `#/components/schemas/FormData/properties/code/allOf/1` (follow `allOf`, found `type: string`) * `#/components/schemas/FormData/properties/count` (starting point schema for `count` property, found `type: integer`) -* `#/components/schemas/FormData/properties/extra` (starting point schema for `count` property, found `type: object`) +* `#/components/schemas/FormData/properties/extra` (starting point schema for `extra` property, found `type: object`) From this, we determine that `code` is a string that happens to look like a number, while `count` needs to be parsed into a number _prior_ to schema validation. Furthermore, the `extra` string is in fact an XML serialization of an object containing an `info` property. From 5661d71da3190390f0574344407131d624c8a18a Mon Sep 17 00:00:00 2001 From: "Henry H. Andrews" Date: Fri, 18 Jul 2025 11:00:06 -0700 Subject: [PATCH 3/5] Expand type guidance and optional further inspection Provide examples of narrowing multiple types, and make it clear that every schema without a `type` keyword allows all types. Also note that implementations MAY go beyond these requirements, but set boundaries on what they can do. --- src/oas.md | 38 ++++++++++++++++++++++++++++++++------ 1 file changed, 32 insertions(+), 6 deletions(-) diff --git a/src/oas.md b/src/oas.md index 1488d65946..933658604a 100644 --- a/src/oas.md +++ b/src/oas.md @@ -301,7 +301,7 @@ API data has three forms: ##### JSON Data JSON-serialized data is nearly equivalent to the data form because the [JSON Schema data model](https://www.ietf.org/archive/id/draft-bhutton-json-schema-01.html#section-4.2.1) is nearly equivalent to the JSON representation. -The serialized UTF-8 JSON string `{"when": "1985-04-12T23%3A20%3A50.52"}` represents an object with one data field, named `when`, with a string value, `1985-04-12T23%3A20%3A50.52`. +The serialized UTF-8 JSON string `{"when": "1985-04-12T23:20:50.52"}` represents an object with one data field, named `when`, with a string value, `1985-04-12T23:20:50.52`. The exact application form is beyond the scope of this specification, as can be shown with the following schema for our JSON instance: @@ -328,10 +328,33 @@ Serializing data into such formats requires either examining the schema-validate When inspecting schemas, given a starting point schema, implementations MUST examine that schema and all schemas that can be reached from it by following only `$ref` and `allOf` keywords. These schemas are guaranteed to apply to any instance. +When searching schemas for `type`, if the `type` keyword's value is a list of types and the serialized value can be successfully parsed as more than one of the types in the list, and no other findable `type` keyword disambiguates the actual required type, the behavior is implementation-defined. +Schema Objects that do not contain `type` MUST be considered to allow all types, regardless of which other keywords are present (e.g. `maximum` applies to numbers, but _does not_ require the instance to be a number). -Due to this limited requirement for searching schemas, serializers that have access to validated data MUST inspect the data if possible; implementations that either do not work with runtime data (such as code generators) or cannot access validated data for some reason MUST fall back to schema inspection. +Implementations MAY inspect subschemas or possible reference targets of other keywords such as `oneOf` or `$dynamicRef`, but MUST NOT attempt to resolve ambiguities. +For example, if an implementation opts to inspect `anyOf`, the schema: -When searching schemas for `type`, if the `type` keyword's value is a list of types and the serialized value can be successfully parsed as more than one of the types in the list, the behavior is implementation-defined. +```yaml +anyOf: +- type: number + minimum: 0 +- type: number + maximum: 100 +``` + +unambiguously indicates a numeric type, but the schema: + +```yaml +anyOf: +- type: number +- maximum: 100 +``` + +does not, because the second subschema allows all types. + +Due to these limited requirements for searching schemas, serializers that have access to validated data MUST inspect the data if possible; implementations that either do not work with runtime data (such as code generators) or cannot access validated data for some reason MUST fall back to schema inspection. + +Recall also that in JSON Schema, keywords that apply to a specific type (e.g. `pattern` applies to strings, `minimum` applies to numbers) _do not_ require or imply that the data will actually be of that type. As an example of these processes, given these OpenAPI components: @@ -352,8 +375,9 @@ components: properties: code: allOf: - - type: string + - type: [string, number] pattern: "1" + minimum: 0 - type: string pattern: "2" count: @@ -373,12 +397,14 @@ We must first search the schema for `properties` or other property-defining keyw * `#/components/requestBodies/Form/content/application~1x-www-form-urlencoded/schema` (initial starting point schema, only `$ref`) * `#/components/schemas/FormData` (follow `$ref`, found `properties`) * `#/components/schemas/FormData/properties/code` (starting point schema for `code` property) -* `#/components/schemas/FormData/properties/code/allOf/0` (follow `allOf`, but no `type`) +* `#/components/schemas/FormData/properties/code/allOf/0` (follow `allOf`, found `type: [string, number]`) * `#/components/schemas/FormData/properties/code/allOf/1` (follow `allOf`, found `type: string`) * `#/components/schemas/FormData/properties/count` (starting point schema for `count` property, found `type: integer`) * `#/components/schemas/FormData/properties/extra` (starting point schema for `extra` property, found `type: object`) -From this, we determine that `code` is a string that happens to look like a number, while `count` needs to be parsed into a number _prior_ to schema validation. +Note that for `code` we first found an ambiguous `type`, but then found another `type` keyword that ensures only one of the two possibilities is valid. + +From this inspection, we determine that `code` is a string that happens to look like a number, while `count` needs to be parsed into a number _prior_ to schema validation. Furthermore, the `extra` string is in fact an XML serialization of an object containing an `info` property. This means that the data form of this serialization is equivalent to the following JSON object: From 2600500b6a095c842977278eab74bb62d8fb999c Mon Sep 17 00:00:00 2001 From: Henry Andrews Date: Thu, 24 Jul 2025 09:41:55 -0700 Subject: [PATCH 4/5] maintainable --- src/oas.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/oas.md b/src/oas.md index 933658604a..02f24b1b6a 100644 --- a/src/oas.md +++ b/src/oas.md @@ -292,7 +292,7 @@ As noted under [Data Type](#data-types), both `type: number` and `type: integer` #### Parsing and Serializing -API data has three forms: +API data has several forms: 1. The serialized form, which is either a document of a particular media type, part of an HTTP header value, or part of a URI. 2. The data form, intended for use with a [Schema Object](#schema-object). From 2359b8d2dc0d0df63e27feb4490502f3cb42120b Mon Sep 17 00:00:00 2001 From: Henry Andrews Date: Thu, 24 Jul 2025 09:42:09 -0700 Subject: [PATCH 5/5] More correct from review feedback --- src/oas.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/oas.md b/src/oas.md index 02f24b1b6a..f686bdf596 100644 --- a/src/oas.md +++ b/src/oas.md @@ -294,7 +294,7 @@ As noted under [Data Type](#data-types), both `type: number` and `type: integer` API data has several forms: -1. The serialized form, which is either a document of a particular media type, part of an HTTP header value, or part of a URI. +1. The serialized form, which is either a document of a particular media type, an HTTP header value, or part of a URI. 2. The data form, intended for use with a [Schema Object](#schema-object). 3. The application form, which incorporates any additional information conveyed by JSON Schema keywords such as `format` and `contentType`, and possibly additional information such as class hierarchies that are beyond the scope of this specification, although they MAY be based on specification elements such as the [Discriminator Object](#discriminator-object) or guidance regarding [Data Modeling Techniques](#data-modeling-techniques).