Copying a link to a filename that contains + does not encode it properly

(1) By js on 2025-06-04 00:37:44

When going to https://objfw.nil.im/file?udc=1&ln=on&ci=trunk&name=src%2Fplatform%2FAmigaOS%2FOFString%2BPathAdditions.m, clicking on a line and selecting copy, the link is copied as https://objfw.nil.im/file?ci=trunk&name=src/platform/AmigaOS/OFString+PathAdditions.m&ln=11. Fossil then converts the unescaped + in that link to a space for some reason (I guess because that was a common convention at some point, though never part of any URL spec). So this seems to be a case of the JavaScript and Fossil disagreeing which characters need to be escaped.

(2.1) By Stephan Beal (stephan) on 2025-06-04 08:17:38 edited from 2.0 in reply to 1

So this seems to be a case of the JavaScript and Fossil disagreeing which characters need to be escaped.

The JS in question simply reads the current URL, snips out certain parts of it, and re-uses the rest. Unfortunately, the file in question has a + sign in its name, and a + sign is a URL encoding for spaces (going way back to before spaces were commonly handled more transparently), which apparently confuses the JS pieces into thinking that's supposed to be a space.

The translation looks something like:

Inbound: ?ci=trunk&name=src/platform/AmigaOS/OFString%2bPathAdditions.m

Decoded to: ?ci=trunk&name=src/platform/AmigaOS/OFString+PathAdditions.m

which is then used as-is, with no further encoding/decoding.
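As a rough illustration of that translation (not the actual code from src/fossil.numbered-lines.js), the sketch below decodes the inbound value and reuses it without re-encoding. URLSearchParams is used only as a stand-in for a parser that follows the form-encoding convention; how Fossil itself parses the query is an assumption here.

```javascript
// Encoded query value, as in the inbound URL quoted above.
const inbound = "src/platform/AmigaOS/OFString%2bPathAdditions.m";

// Percent-decoding turns %2b into a literal "+".
const decoded = decodeURIComponent(inbound);

// Reusing the decoded text in a new query string, with no re-encoding.
const copiedQuery = "ci=trunk&name=" + decoded + "&ln=11";

// A form-style parser (URLSearchParams is a stand-in, not Fossil's parser)
// reads the bare "+" as a space.
const reparsed = new URLSearchParams(copiedQuery).get("name");

console.log(decoded);
console.log(reparsed);
```

The first line printed still contains the +, while the second has a space in its place, which is the mismatch described in the original report.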

My attempts to work around this, by replacing the + characters both before and after the decoding, have not produced working results, but suggestions are welcomed. The culprit is src/fossil.numbered-lines.js. Simply removing the URL decoding from that leaves us with broken links for everything (edit: not quite correct - skipping decoding seems to work (see the next response) but i was also doing an unnecessary re-encode in this particular test, which then broke it). Similarly, following MDN's advice¹ and replacing + with a space before decoding doesn't change the result.

¹ https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/decodeURIComponent

(3) By Stephan Beal (stephan) on 2025-06-04 08:15:26 in reply to 2.0

My attempts to work around this, by replacing the + characters both before and after the decoding, have not produced working results,

Until i came upon the idea of not decoding/recoding them at all. That seems to work and is now checked in.
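A minimal sketch of that pass-through idea follows. It is not the checked-in change: the helper name buildLineLink is hypothetical, and dropping the udc and ln parameters is an assumption based on the copied link shown in the first post. The point is only that the kept parameters are never decoded or re-encoded.

```javascript
// Hypothetical helper: rebuild a link for a given line number while
// leaving every kept query parameter byte-for-byte as it arrived.
function buildLineLink(href, lineSpec) {
  // Ignore any fragment, then split off the raw query string.
  const noFragment = href.split("#")[0];
  const qIndex = noFragment.indexOf("?");
  const base = qIndex < 0 ? noFragment : noFragment.slice(0, qIndex);
  const rawQuery = qIndex < 0 ? "" : noFragment.slice(qIndex + 1);

  const kept = rawQuery
    .split("&")
    // Assumption: "ln" is replaced and "udc" is omitted from copied links.
    .filter((pair) => pair !== "" && !/^(ln|udc)(=|$)/.test(pair));

  kept.push("ln=" + lineSpec);
  return base + "?" + kept.join("&");
}

const copied = buildLineLink(
  "https://objfw.nil.im/file?udc=1&ln=on&ci=trunk&name=src%2Fplatform%2FAmigaOS%2FOFString%2BPathAdditions.m",
  11
);
console.log(copied);
```

Because %2B is never turned into a bare +, whatever parses the copied link sees the same bytes as the original request.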

(4) By js on 2025-06-07 12:20:02 in reply to 3

Yes, not encoding/decoding is probably the best thing to do. The problem is that + is not a special character in a URI, and as per spec, just means +. However, in the 90s/early 2000s, a lot of things interpreted that as space, maybe because it's more readable than %20? In any case, I think in a lot of use cases for Fossil it makes sense to use + over %20 - it makes Wiki links much more readable, for example, so I think the solution you picked here is the best way to deal with it.

(5) By anonymous on 2025-06-07 23:25:40 in reply to 4

The problem is that + is not a special character in a URI, and as per spec, just means +.

RFC2396, section 3.4, from 1998, suggests that that assertion is incorrect.

""" Within a query component, the characters ";", "/", "?", ":", "@", "&", "=", "+", ",", and "$" are reserved. """

https://datatracker.ietf.org/doc/html/rfc2396#section-3.4

Note that RFC1738 and RFC1808 from 1994 and 1995 covered similar material and did not declare that + was special; that seems to have been added in this version.

RFC2396 was updated, and then obsoleted by RFC3986 in 2005, which still shows + as "reserved", but is not explicit that + is specially-special within the query component of a http URI (i.e. after the ?).

RFC3875 from 2004 describes "CGI", which was in practice the main user of the http query component. It includes:

""" form submission from an HTML document [18] uses application/x-www-form-urlencoded encoding, in which the characters "+", "&" and "=" are reserved """

which links to a description of HTML 4.01 (from 1999, up to 2018) and how to create that encoding, which includes "Space characters are replaced by `+', and then reserved characters are escaped" (by percent-encoding).

The current HTML 5 spec has a note that includes

""" The application/x-www-form-urlencoded format is in many ways an aberrant monstrosity, """

but does say to decode + to space ("Replace any 0x2B (+) in name and value with 0x20 (SP).") before doing percent-decoding.
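The decoding order quoted above can be sketched in a few lines. The function name formDecode is made up for this example, and it covers only a single name or value, not the full parsing algorithm from the spec.

```javascript
// Sketch of form-style decoding for one name or value:
// first replace every "+" with a space, then percent-decode.
function formDecode(component) {
  return decodeURIComponent(component.replace(/\+/g, " "));
}

// A bare "+" becomes a space...
const fromPlus = formDecode("OFString+PathAdditions.m");

// ...while an escaped %2B survives as a literal "+".
const fromEscaped = formDecode("OFString%2BPathAdditions.m");

console.log(fromPlus);
console.log(fromEscaped);
```

The order matters: if the percent-decoding ran first, %2B would become + and then be turned into a space as well.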

So overall: + in a http URI before the ? means +; while + after the ? (and before the #) means space.

If you want to move strings across the ?, then you need to handle that difference, or you need to accept that some clients will interpret things differently, and they will not be wrong.
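One way to see that difference, using the standard URL and URLSearchParams objects as an example client (example.org and the file name a+b.txt are placeholders, not taken from the thread):

```javascript
// The same text "a+b.txt" on both sides of the "?".
const url = new URL("https://example.org/dir/a+b.txt?name=a+b.txt");

const inPath = url.pathname;                  // "+" is kept as "+"
const inQuery = url.searchParams.get("name"); // "+" is read as a space

// Moving the path-side name into a query safely: let the API escape it.
const target = new URL("https://example.org/file");
target.searchParams.set("name", "a+b.txt");

const serialized = target.search;                 // "+" is written as %2B
const roundTrip = target.searchParams.get("name"); // and reads back as "+"

console.log(inPath, inQuery, serialized, roundTrip);
```

Escaping the + as %2B when it crosses into the query is unambiguous for both interpretations, which is why it round-trips here.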

Aside, and untested by me: what would happen if the filename was of the form start&name1=value1 (i.e. includes both an & and an =)? I suspect that the previous decode/re-encode would have broken that, and that the current pass-through does not break it. But I also suspect that anyone who uses filenames like that, deserves what they get...
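A small sketch of that suspicion, with the caveat that it has not been run against Fossil: URLSearchParams stands in for a query parser, and the "decode then reuse" line only approximates what the earlier JS may have done.

```javascript
// The awkward filename, percent-encoded as it would arrive in the query.
const encoded = "start%26name1%3Dvalue1";

// Pass-through: the encoded bytes are reused unchanged.
const passThrough = new URLSearchParams("ci=trunk&name=" + encoded);

// Decode then reuse without re-encoding (approximation of the old behaviour).
const decodedReuse = new URLSearchParams(
  "ci=trunk&name=" + decodeURIComponent(encoded)
);

console.log(passThrough.get("name"));   // the whole filename
console.log(decodedReuse.get("name"));  // only the part before "&"
console.log(decodedReuse.get("name1")); // the rest became its own parameter
```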

Cheers,