Change the underlying URI parser (from RFC2396 to RFC3986)#202
Change the underlying URI parser (from RFC2396 to RFC3986)#202Drowze wants to merge 2 commits intorails:mainfrom
Conversation
|
Thank you for the pull request. It is unfortunate that underscore were allowed for so long depending on the URI parser, but it was never the intention. Allowing changing the parser just to allow underscore isn't something I think the library should do. We might want to make a change to allow underscore in global id. I'd be ok with that. But I don't think it should be a configuration. Can you change the PR in that direction? |
|
Hey @rafaelfranca 👋
That sounds good to me 😃 |
| # RFC3986 allows ampersands in hostnames | ||
| # see: https://github.com/ruby/uri/issues/141#issuecomment-2541219003 | ||
| assert_equal 'foo&bar', URI::GID.validate_app('foo&bar') |
There was a problem hiding this comment.
This is one thing that got confused me on RFC3986 parser... Apparently it does allow hostnames with ampersands. That doesn't look right, but I added a test documenting the behaviour regardless.
# ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +PRISM [arm64-darwin23]
require "uri"
URI::VERSION
# => "1.0.4"
URI::RFC2396_Parser.new.parse("gid://foo&bar/bar/1")
#=> raises URI::InvalidURIError
URI::RFC3986_Parser.new.parse("gid://foo&bar/bar/1")
# => #<URI::Generic gid://foo&bar/bar/1>
What does this PR do?
Adds the possibility to configure the underlying uri parser used by global_id gem.
For context:
Since URI v1.0.0, the
URI::DEFAULT_PARSERis RFC3986_Parser (original PR, v1.0.0 release which included it). One notable change of the new default parser is that URIs with underscores in the host part are now considered valid (e.g.:foo://foo_bar)On the other hand, since v1.3.0 GlobalID started hardcoding its underlying URI parser to RFC2396_Parser (i.e.: how
URI::DEFAULT_PARSERwas before uri v1 release), in what I suppose to be an effort to keep good compatibility.Also relevant: URI v1 was released in Nov/2024, while GlobalID v1.3.0 was released almost an year later on Sep/2025.
Why?
Hardcoding the parser to RFC2396 on v1.3.0 was a breaking change for us: we started using GlobalID earlier this year and some of our URIs include underscore characters.
Unfortunately it's also not something we can easily change now: some of those now-invalid URIs were included in QR codes which were physically printed and distributed to some of our customers.
related issue(s): #201