Description
tl;dr
I think that for `application/x-www-form-urlencoded` content types, all special characters, including `:`, should always be percent-encoded, with spaces as the one special case: they should always be encoded as `+`.
Though I do appreciate it is probably easy to find examples of languages/libraries where this isn't followed 😅, so it may be difficult to stipulate this as a MUST.
...
Describe the error in the specification
With respect to the examples given here:
RFC 1866 is referenced, and from skimming the RFC, the only section that seems relevant in this context is 8.2 Form Submission, which defines serialization of `application/x-www-form-urlencoded`; specifically:

> The form field names and values are escaped: space characters are replaced by `+`, and then reserved characters are escaped as per [URL]; that is, non-alphanumeric characters are replaced by `%HH`, a percent sign and two hexadecimal digits representing the ASCII code of the character. Line breaks, as in multi-line text field values, are represented as CR LF pairs, i.e. `%0D%0A`.
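As a rough sketch of the quoted rule (not taken from any particular library), the escaping can be written out by hand. The function names `encodeFormComponent` and `encodeForm` are hypothetical. Two assumptions are made here: non-ASCII text is first encoded as UTF-8 bytes (RFC 1866 only talks about ASCII), and, like `URLSearchParams` and `java.net.URLEncoder`, the characters `*`, `-`, `.` and `_` are left unescaped rather than treated as "non-alphanumeric".

```javascript
// Hypothetical helper sketching the RFC 1866 section 8.2 escaping rule.
// Assumption: text is encoded as UTF-8 bytes before escaping.
// Assumption: "*", "-", "." and "_" stay literal, as URLSearchParams does.
function encodeFormComponent(value) {
  // Line breaks are represented as CR LF pairs.
  const normalized = value.replace(/\r\n|\r|\n/g, "\r\n");
  const bytes = new TextEncoder().encode(normalized);
  let out = "";
  for (const byte of bytes) {
    const ch = String.fromCharCode(byte);
    if (ch === " ") {
      out += "+"; // spaces are the special case
    } else if (/^[A-Za-z0-9*\-._]$/.test(ch)) {
      out += ch; // alphanumerics (plus the assumed safe set) pass through
    } else {
      // Everything else, including ":" and ",", becomes %HH.
      out += "%" + byte.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  return out;
}

// Hypothetical helper: join name/value pairs into a form body.
function encodeForm(pairs) {
  return pairs
    .map(([name, value]) => `${encodeFormComponent(name)}=${encodeFormComponent(value)}`)
    .join("&");
}

console.log(encodeFormComponent("a b\nc")); // a+b%0D%0Ac
```

Under these assumptions, a `:` inside a field value always comes out as `%3A` and a `,` as `%2C`, regardless of what the value happens to contain.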
The example serialization given in the OAI specification under 4.8.15.2.1 is then:

```
id=f81d4fae-7dec-11d0-a765-00a0c91e6bf6&address=%7B%22streetAddress%22:%22123+Example+Dr.%22,%22city%22:%22Somewhere%22,%22state%22:%22CA%22,%22zip%22:%2299999%2B1234%22%7D
```
In this example we have a field named `address` whose value happens to be a JSON-serialized string (i.e. form data isn't aware of this; it's just an opaque string AFAIK), so I'd expect all special characters to be percent-encoded, giving:

```
id=f81d4fae-7dec-11d0-a765-00a0c91e6bf6&address=%7B%22streetAddress%22%3A%22123+Example+Dr.%22%2C%22city%22%3A%22Somewhere%22%2C%22state%22%3A%22CA%22%2C%22zip%22%3A%2299999%2B1234%22%7D
```
This matches the behavior of `URLSearchParams` on the web/Node.js, e.g.:

```javascript
const params = new URLSearchParams();
params.append("id", "f81d4fae-7dec-11d0-a765-00a0c91e6bf6");
params.append("address", JSON.stringify({streetAddress: "123 Example Dr.", city: "Somewhere", state: "CA", zip: "99999+1234"}));
params.toString();
```

This yields the above, and it also round-trips back, e.g. using `JSON.parse(new URLSearchParams(params.toString()).get("address"))`.
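To make the round-trip claim concrete, here is a small Node.js sketch that starts from the fully percent-encoded body, parses it with `URLSearchParams`, recovers the JSON value, and re-serializes it. It only relies on the standard `URLSearchParams` global; the variable names are illustrative.

```javascript
// The fully percent-encoded form body (":" as %3A, "," as %2C, "+" as %2B).
const body =
  "id=f81d4fae-7dec-11d0-a765-00a0c91e6bf6&address=%7B%22streetAddress%22%3A%22123+Example+Dr.%22%2C%22city%22%3A%22Somewhere%22%2C%22state%22%3A%22CA%22%2C%22zip%22%3A%2299999%2B1234%22%7D";

// Decode: "+" becomes a space, then %HH sequences are decoded.
const parsed = new URLSearchParams(body);
const address = JSON.parse(parsed.get("address"));

console.log(parsed.get("id"));
console.log(address.zip); // the literal "+" survives because it was sent as %2B

// Re-encoding the parsed parameters reproduces the original string.
console.log(parsed.toString() === body);
```

The distinction between `+` (an encoded space) and `%2B` (a literal plus sign) is what lets the `zip` value survive the trip intact.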
The behavior here is very likely to differ across platforms/implementations, but as a quick sense check, I observe the same behavior in Java/Kotlin using both `java.net.URLEncoder` (playground example) and `org.apache.http.client.utils.URLEncodedUtils`, which increases my confidence in my interpretation of the RFC.
Example using `org.apache.httpcomponents:httpclient:4.5.13`:

```kotlin
// depends on `implementation("org.apache.httpcomponents:httpclient:4.5.13")`
import org.apache.http.client.utils.URLEncodedUtils
import org.apache.http.message.BasicNameValuePair
import java.nio.charset.StandardCharsets

fun main() {
    val params = listOf(
        BasicNameValuePair("id", "f81d4fae-7dec-11d0-a765-00a0c91e6bf6"),
        BasicNameValuePair("address", """{"streetAddress":"123 Example Dr.","city":"Somewhere","state":"CA","zip":"99999+1234"}""")
    )
    // Serialize as application/x-www-form-urlencoded (spaces become "+")
    val encoded = URLEncodedUtils.format(params, StandardCharsets.UTF_8)
    println(encoded)
}
```
There are several other examples under section 4.8.12.4 that exhibit a similar discrepancy:
- `style: form`, `explode: false`: `,` is not escaped for `array` and `object`
- `style: spaceDelimited`, `explode: false`: `%20` is used instead of `+`
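As a rough illustration, the sketch below joins values the way `style: form` and `style: spaceDelimited` (both with `explode: false`) do, then hands the joined string to `URLSearchParams` as a form-urlencoded serializer would. The parameter name `color` and its values are illustrative assumptions, not quoted from the specification.

```javascript
// Illustrative values (assumed for this sketch).
const colors = ["blue", "black", "brown"];
const rgb = { R: 100, G: 200, B: 150 };

// style: form, explode: false -> values joined with ","
const formArray = new URLSearchParams([["color", colors.join(",")]]).toString();
const formObject = new URLSearchParams([
  ["color", Object.entries(rgb).flat().join(",")],
]).toString();

// style: spaceDelimited, explode: false -> values joined with " "
const spaceArray = new URLSearchParams([["color", colors.join(" ")]]).toString();

console.log(formArray); // color=blue%2Cblack%2Cbrown
console.log(formObject); // color=R%2C100%2CG%2C200%2CB%2C150
console.log(spaceArray); // color=blue+black+brown
```

Here the `,` delimiter is escaped as `%2C` and the space delimiter becomes `+`, which is the behavior I'd expect the form-urlencoded examples to show.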
Later, in appendix E.5, reference is made to RFC 3986. I haven't fully read and understood that RFC, but my current interpretation of statements such as:

> URI producing applications should percent-encode data octets that correspond to characters in the reserved set unless these characters are specifically allowed by the URI scheme to represent data in that component.

is that, because `:` is one of the `gen-delims`, a `:` falling within the query should still be percent-encoded, even though the query grammar `query = *( pchar / "/" / "?" )` allows it literally (`pchar` includes `:`), as such characters are not "representing data" in the context of the URI scheme / semantic meaning of the `query` portion. That said, I can't find where "representing data" is actually explicitly defined, so my reading of this could be wrong (the language used is a bit ambiguous IMO).
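To illustrate the tension, here is a small Node.js sketch. It uses the WHATWG `URL` parser, which is not RFC 3986 itself but likewise accepts a literal `:` and `,` in the query, alongside form serialization via `URLSearchParams` and the generic `encodeURIComponent`. The URL and values are made up for illustration.

```javascript
// A generic URL parser leaves ":" and "," in the query untouched.
const url = new URL("https://example.com/search?q=a:b,c");
console.log(url.search); // ?q=a:b,c

// application/x-www-form-urlencoded serialization percent-encodes them.
const form = new URLSearchParams([["q", "a:b,c"]]).toString();
console.log(form); // q=a%3Ab%2Cc

// encodeURIComponent also encodes ":" but uses %20 for a space, not "+".
console.log(encodeURIComponent("a:b c")); // a%3Ab%20c
```

So both readings can be observed in practice: the query grammar tolerates a literal `:`, while form serializers choose to escape it.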
Additional context
Slack conversation: https://open-api.slack.com/archives/C1137F8HF/p1753276004758329