fix userinfo in case of some special characters#144
fix userinfo in case of some special characters#144kastakhov wants to merge 4 commits intolibwww-perl:masterfrom
Conversation
add some UT for this case
lib/URI/_server.pm
Outdated
| } | ||
| if ($str =~ m,^((?:$URI::scheme_re:)?)//(.*:.*@)?([^/?\#]*)(.*)$,os) { | ||
| my $scheme = $1; | ||
| my $ui = $2 || ''; |
There was a problem hiding this comment.
Could we have a more descriptive name for this variable?
There was a problem hiding this comment.
I realize this variable was not your creation. :)
There was a problem hiding this comment.
Sure thing, I have renamed to $userinfo.
I decided to rename in whole file as it might be a little confusing when we're using $ui or $userinfo.
lib/URI.pm
Outdated
| } | ||
|
|
||
| if ($_[0] =~ m{^((?:$URI::scheme_re:)?)//([^/?\#]+)(.*)$}os) { | ||
| if ($_[0] =~ m{^((?:$URI::scheme_re:)?)//((.*:.*@)?[^/?\#]+)(.*)$}os) { |
There was a problem hiding this comment.
From a discussion with @genio on IRC:
It would be nice to avoid (.*@) or (.*:.*@) as it's greedy and could be prone to bugs.
We could take inspiration from https://metacpan.org/dist/Mojolicious/source/lib/Mojo/URL.pm#L52-64
After gathering the host and port part, ^([^@]+@) should be fine.
There was a problem hiding this comment.
Hm, but what if user wanna use '@' in password, using [^@]+@ will cause that the everything before first '@' symbol will be matched and password parsed incorrectly.
like the similar approach is already used in LWP/Protocol/http.pm.
And I already mentioned what issue it might cause in my PR.
For my standpoint (.*:.*@) is pretty lazy, but okay, I agree that it could be better.
@oalders, what you think about ([^:]+:.*@) for matching user info?
This will match everything except : as username cannot contain colon, then match : as separator, match everything after first colon as password or nothing as according to rfc3986#section-3.2.1 it might be empty, and finally match @ as second separator.
There was a problem hiding this comment.
By the way, the regex which Mojo/URL.pm#L52-64 is used also cannot parse username/password if it contains any of /#? symbols :)
I know, it's really rarely usecase, but unfortunately I faced with it already two times :d
For instance, this regex with username and password which contain all special characters in unescaped form.
There was a problem hiding this comment.
Hm, but what if user wanna use '@' in password, using [^@]+@ will cause that the everything before first '@' symbol will be matched and password parsed incorrectly.
Then it should be URL-encoded as %40. I'm not sure if there's any way around that. Something has to serve as a stopping point. If the RE is too greedy, it might end up stopping with a @ inside of the path, like http://host/path/directory@foobar/file.
There was a problem hiding this comment.
If the RE is too greedy, it might end up stopping with a
@inside of the path, likehttp://host/path/directory@foobar/file.
Hm.
Now I see, it'll match this path incorrectly.
There was a problem hiding this comment.
Well, actually I can revert this change as the most important part is in lib/URI/_server.pm#14.
There are another issue, if password contain multiple '@' symbols before any of '/#?' symbol URI parsing fails again :d
There was a problem hiding this comment.
There are another issue, if password contain multiple '@' symbols before any of '/#?' symbol URI parsing fails again :d
If the password was inserted directly into the URL that way without percent-encoding, that's not really the URI module's fault. The password has to be encoded properly first. That's the point of escaping characters.
|
Thanks very much for this @kastakhov. If we could tweak the regex in both places as suggested, that would be helpful. |
|
This is very adjacent to this issue, and feel free to tell me to open a different issue/PR, but... The user/password methods allow bad data to pollute the URL (lines 45 and the similar line 22), because it is only doing a very light set of manual URL-escaping. It should, at a minimum escape potential |
|
Is there a good reason not to just call |
|
I still don't think this is a good idea, as you're supposed to URI-escape things. Here's an example with both Mojo and URI: #!/usr/bin/env perl
use v5.36;
use strict;
use warnings;
use Mojo::URL ();
use Mojo::Util ();
use URI ();
use URI::Escape ();
my $unescaped_u = 'u1!"#$%&\'()*+,-./;<=>?@[\]^_`{|}~';
my $unescaped_p = 'p1!"#$%&\'()*+,-./;<=>?@[\]^_`{|}~';
{
# mojolicious way:
my $username = Mojo::Util::url_escape($unescaped_u);
my $password = Mojo::Util::url_escape($unescaped_p);
my $foo = "http://${username}:${password}\@example.com/pub/a/2001/08/27/bjornstad.html";
# say "Mojo: ", $foo;
my $url = Mojo::URL->new($foo);
# say "Mojo: ", $url->to_string();
say "Mojo: ", $url->userinfo();
}
{
# URI way
my $username = URI::Escape::uri_escape($unescaped_u);
my $password = URI::Escape::uri_escape($unescaped_p);
my $foo = "http://${username}:${password}\@example.com/pub/a/2001/08/27/bjornstad.html";
# say "URI: ", $foo;
my $url = URI->new($foo);
# say "URI: ", $url->as_string();
say "URI: ", URI::Escape::uri_unescape($url->userinfo());
}
Yielding: % perl url.pl
Mojo: u1!"#$%&'()*+,-./;<=>?@[\]^_`{|}~:p1!"#$%&'()*+,-./;<=>?@[\]^_`{|}~
URI: u1!"#$%&'()*+,-./;<=>?@[\]^_`{|}~:p1!"#$%&'()*+,-./;<=>?@[\]^_`{|}~
|
|
While I agree everybody should URI-escape the UN/PWs, it should still try to safely parse as much as it can. IMO, Also, I'll just create a different issue for the user/password set problem. |
|
Well, disregard my user/password set problem. I realized that |
Just in case, originally I faced with this issue here.. |
make _uric_escape regex more specific for userinfo part update UT
|
Now the only one issue still here.... when password contain '@' unescaped symbol before '/#?' symbols, user info part matches incorrectly. 🙃 |
|
The seemingly relevant RFC pieces are linked here: I think my cursory reading of these and understanding thus far leads me to believe we should leave things as-is and maybe provide documentation for how to provide complex authentication using |
|
I found Section 2.2 earlier, but didn't see where those declarations were used. Good find. So, this means we should stay away from unnecessarily parsing
Again, a delimiter has to exist unescaped. There's no such thing as a parser algorithm that can take in every ASCII or Unicode unescaped character as a sub-field of the whole string. At some point, we have to force the user to escape their data before it gets dropped into a URL. |
| if (_host_escape($host)) { | ||
| $str = "$scheme//$ui$host$port$rest"; | ||
| } | ||
| if ($str =~ m,^((?:$URI::scheme_re:)?)//([^:]+:[^@]*@)?([^/?\#]*)(.*)$,os) { |
There was a problem hiding this comment.
This will not match correctly valid string where the optional password is omitted (such as ftp://user@host) and will also match what I think is invalid string ftp://user:@host
Wouldn't it be better written as something like: m,^((?:$URI::scheme_re:)?)[/][/]([^:]+(?:[:][^@]+)?[@])?([^/?\#@]*)(.*)$,os
Fix for #143