v3.1: application/x-www-form-urlencoded potential inconsistency between examples and referenced RFC #4813

@mnahkies

Description

tldr;
I think that for application/x-www-form-urlencoded content types, all special characters, including :, should always be percent-encoded, with the exception of spaces, which should always be encoded as +

Though I do appreciate it is probably easy to find examples of languages/libraries where this isn't followed 😅 and so it may be difficult to stipulate this as a MUST...

Describe the error in the specification
With respect to the examples given here:

RFC 1866 is referenced, and from skimming it, the only section that seems relevant in this context is 8.2 Form Submission, which defines the serialization of application/x-www-form-urlencoded, specifically:

The form field names and values are escaped: space characters are replaced by +, and then reserved characters are escaped as per [URL]; that is, non-alphanumeric characters are replaced by %HH, a percent sign and two hexadecimal digits representing the ASCII code of the character. Line breaks, as in multi-line text field values, are represented as CR LF pairs, i.e. %0D%0A.
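To make the quoted rule concrete, here is a minimal sketch of a strictly literal reading of it. The function name rfc1866Escape is hypothetical, and treating non-ASCII input as UTF-8 bytes is an assumption, since the quoted text only talks about ASCII codes.

```javascript
// Strict reading of the quoted RFC 1866 section 8.2 rule: keep ASCII
// alphanumerics, turn a space into "+", replace every other byte with %HH.
// Assumption: non-ASCII text is converted to UTF-8 bytes first.
function rfc1866Escape(text) {
  let out = "";
  for (const byte of new TextEncoder().encode(text)) {
    const isAlphanumeric =
      (byte >= 0x30 && byte <= 0x39) || // 0-9
      (byte >= 0x41 && byte <= 0x5a) || // A-Z
      (byte >= 0x61 && byte <= 0x7a);   // a-z
    if (isAlphanumeric) out += String.fromCharCode(byte);
    else if (byte === 0x20) out += "+";
    else out += "%" + byte.toString(16).toUpperCase().padStart(2, "0");
  }
  return out;
}

console.log(rfc1866Escape('{"zip":"99999+1234"}'));
// %7B%22zip%22%3A%2299999%2B1234%22%7D
```

Note that this literal reading would also escape characters such as - and ., which URLSearchParams leaves alone, so real implementations are somewhat more lenient than the sketch.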

The example serialization given in the OAI specification under 4.8.15.2.1 is then:

id=f81d4fae-7dec-11d0-a765-00a0c91e6bf6&address=%7B%22streetAddress%22:%22123+Example+Dr.%22,%22city%22:%22Somewhere%22,%22state%22:%22CA%22,%22zip%22:%2299999%2B1234%22%7D

Given that in this example we have a field named address whose value happens to be a JSON-serialized string (i.e. form data isn't aware of this; it's just an opaque string AFAIK), I'd expect all special characters to be percent-encoded, giving a result of:

id=f81d4fae-7dec-11d0-a765-00a0c91e6bf6&address=%7B%22streetAddress%22%3A%22123+Example+Dr.%22%2C%22city%22%3A%22Somewhere%22%2C%22state%22%3A%22CA%22%2C%22zip%22%3A%2299999%2B1234%22%7D

This matches the behavior of URLSearchParams on web/nodejs, eg:

const params = new URLSearchParams();
params.append("id", "f81d4fae-7dec-11d0-a765-00a0c91e6bf6");
params.append("address", JSON.stringify({streetAddress: "123 Example Dr.", city: "Somewhere", state: "CA", zip: "99999+1234"}));
params.toString();

This yields the above, and it also round-trips, e.g. using JSON.parse(new URLSearchParams(params.toString()).get("address"))
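As a further sense-check, the sketch below parses the specification's example string (with the unescaped : and ,) using URLSearchParams. It suggests that the lenient form still decodes to the same JSON, but re-serializing it produces the fully percent-encoded form, so the two are not byte-for-byte interchangeable. The variable names are only for illustration.

```javascript
// The example string from the OAI specification, with ":" and "," left unescaped.
const specExample =
  "id=f81d4fae-7dec-11d0-a765-00a0c91e6bf6&address=%7B%22streetAddress%22:%22123+Example+Dr.%22,%22city%22:%22Somewhere%22,%22state%22:%22CA%22,%22zip%22:%2299999%2B1234%22%7D";

// Parsing is lenient: literal ":" and "," pass through unchanged.
const parsed = new URLSearchParams(specExample);
const address = JSON.parse(parsed.get("address"));
console.log(address.zip); // 99999+1234

// Serializing again escapes ":" and ",", so the output differs from the input.
const reserialized = parsed.toString();
console.log(reserialized === specExample); // false
console.log(reserialized.includes("%3A") && reserialized.includes("%2C")); // true
```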

The behavior here is very likely to differ across platforms / implementations, but as a quick sense-check I observe the same behavior in Java/Kotlin using both java.net.URLEncoder (playground example) and org.apache.http.client.utils.URLEncodedUtils, which increases my confidence in my interpretation of the RFC.

org.apache.httpcomponents:httpclient:4.5.13

// depends on `implementation("org.apache.httpcomponents:httpclient:4.5.13")`
import org.apache.http.client.utils.URLEncodedUtils
import org.apache.http.message.BasicNameValuePair
import java.nio.charset.StandardCharsets

fun main() {
    val params = listOf(
        BasicNameValuePair("id", "f81d4fae-7dec-11d0-a765-00a0c91e6bf6"),
        BasicNameValuePair("address", """{"streetAddress":"123 Example Dr.","city":"Somewhere","state":"CA","zip":"99999+1234"}""")
    )
    val encoded = URLEncodedUtils.format(params, StandardCharsets.UTF_8)
    println(encoded)
}

There are several other examples under section 4.8.12.4 that exhibit a similar discrepancy:

  • style: form, explode: false: the , delimiter is not escaped for array and object
  • style: spaceDelimited, explode: false: the space delimiter is replaced with %20 instead of with +
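For comparison, this is roughly what URLSearchParams produces when the delimiter is treated as part of an opaque form value. The parameter name color and the array contents are hypothetical, chosen only to illustrate the two styles listed above.

```javascript
// Hypothetical array value, joined with each style's delimiter before encoding.
const colors = ["blue", "black", "brown"];

// style: form, explode: false joins with ",", which the form encoder escapes.
const formStyle = new URLSearchParams({ color: colors.join(",") }).toString();
console.log(formStyle); // color=blue%2Cblack%2Cbrown

// style: spaceDelimited, explode: false joins with " ", which becomes "+", not "%20".
const spaceDelimited = new URLSearchParams({ color: colors.join(" ") }).toString();
console.log(spaceDelimited); // color=blue+black+brown
```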

Later, in appendix E.5, reference is made to RFC 3986.

I haven't fully read and understood that RFC, but my current interpretation of statements such as:

URI producing applications should percent-encode data octets that correspond to characters in the reserved set unless these characters are specifically allowed by the URI scheme to represent data in that component.

is that gen-delims such as : should still be percent-encoded when they fall within the query, because they are not "representing data" in the context of the URI scheme or the semantic meaning of the query component. This is despite the query grammar permitting a literal colon:

query       = *( pchar / "/" / "?" )

(pchar includes :)

That said, I can't find where "representing data" is actually explicitly defined, so my reading of this could be wrong (the language used is a bit ambiguous IMO)
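The ambiguity shows up in practice too. The following sketch compares three built-in JavaScript encoders on the same made-up value; they disagree on both the : and , handling and on how a space is written, which may be part of why the examples are inconsistent.

```javascript
// A made-up value containing ":", "," and a space.
const value = "b:c,d e";

// Generic URI component encoding: escapes ":" and ",", writes space as %20.
console.log(encodeURIComponent(value)); // b%3Ac%2Cd%20e

// Form encoding: escapes ":" and ",", writes space as "+".
console.log(new URLSearchParams({ a: value }).toString()); // a=b%3Ac%2Cd+e

// URL parsing: leaves ":" and "," literal in the query, writes space as %20.
console.log(new URL("https://example.com/?a=" + value).search); // ?a=b:c,d%20e
```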

Additional context
Slack conversation: https://open-api.slack.com/archives/C1137F8HF/p1753276004758329

Labels

media and encoding: Issues regarding media type support and how to encode data (outside of query/path params)
param serialization: Issues related to parameter and/or header serialization