Skip to content
Open
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion lib/URI.pm
Original file line number Diff line number Diff line change
Expand Up @@ -123,7 +123,7 @@ sub _fix_uric_escape_for_host_part {
return;
}

if ($_[0] =~ m{^((?:$URI::scheme_re:)?)//([^/?\#]+)(.*)$}os) {
if ($_[0] =~ m{^((?:$URI::scheme_re:)?)//((.*:.*@)?[^/?\#]+)(.*)$}os) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a discussion with @genio on IRC:

It would be nice to avoid (.*@) or (.*:.*@) as it's greedy and could be prone to bugs.

We could take inspiration from https://metacpan.org/dist/Mojolicious/source/lib/Mojo/URL.pm#L52-64

After gathering the host and port part, ^([^@]+@) should be fine.

Copy link
Author

@kastakhov kastakhov Aug 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, but what if user wanna use '@' in password, using [^@]+@ will cause that the everything before first '@' symbol will be matched and password parsed incorrectly.

like the similar approach is already used in LWP/Protocol/http.pm.
And I already mentioned what issue it might cause in my PR.

For my standpoint (.*:.*@) is pretty lazy, but okay, I agree that it could be better.

@oalders, what you think about ([^:]+:.*@) for matching user info?

This will match everything except : as username cannot contain colon, then match : as separator, match everything after first colon as password or nothing as according to rfc3986#section-3.2.1 it might be empty, and finally match @ as second separator.

Copy link
Author

@kastakhov kastakhov Aug 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, the regex which Mojo/URL.pm#L52-64 is used also cannot parse username/password if it contains any of /#? symbols :)
I know, it's really rarely usecase, but unfortunately I faced with it already two times :d

For instance, this regex with username and password which contain all special characters in unescaped form.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, but what if user wanna use '@' in password, using [^@]+@ will cause that the everything before first '@' symbol will be matched and password parsed incorrectly.

Then it should be URL-encoded as %40. I'm not sure if there's any way around that. Something has to serve as a stopping point. If the RE is too greedy, it might end up stopping with a @ inside of the path, like http://host/path/directory@foobar/file.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the RE is too greedy, it might end up stopping with a @ inside of the path, like http://host/path/directory@foobar/file.

Hm.
Now I see, it'll match this path incorrectly.

Copy link
Author

@kastakhov kastakhov Aug 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, actually I can revert this change as the most important part is in lib/URI/_server.pm#14.

There are another issue, if password contain multiple '@' symbols before any of '/#?' symbol URI parsing fails again :d

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are another issue, if password contain multiple '@' symbols before any of '/#?' symbol URI parsing fails again :d

If the password was inserted directly into the URL that way without percent-encoding, that's not really the URI module's fault. The password has to be encoded properly first. That's the point of escaping characters.

my $orig = $2;
my ($user, $host) = $orig =~ /^(.*@)?([^@]*)$/;
$user ||= '';
Expand Down
26 changes: 16 additions & 10 deletions lib/URI/_server.pm
Original file line number Diff line number Diff line change
Expand Up @@ -5,19 +5,25 @@ use warnings;

use parent 'URI::_generic';

use URI::Escape qw(uri_unescape);
use URI::Escape qw(uri_unescape uri_escape);

our $VERSION = '5.29';

sub _uric_escape {
my($class, $str) = @_;
if ($str =~ m,^((?:$URI::scheme_re:)?)//([^/?\#]*)(.*)$,os) {
my($scheme, $host, $rest) = ($1, $2, $3);
my $ui = $host =~ s/(.*@)// ? $1 : "";
my $port = $host =~ s/(:\d+)\z// ? $1 : "";
if (_host_escape($host)) {
$str = "$scheme//$ui$host$port$rest";
}
if ($str =~ m,^((?:$URI::scheme_re:)?)//(.*:.*@)?([^/?\#]*)(.*)$,os) {
my $scheme = $1;
my $ui = $2 || '';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we have a more descriptive name for this variable?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize this variable was not your creation. :)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure thing, I have renamed to $userinfo.
I decided to rename in whole file as it might be a little confusing when we're using $ui or $userinfo.

my $host = $3;
my $rest = $4;
my $port = $host =~ s/(:\d+)\z// ? $1 : "";
if ($ui) {
# escape /?# symbols as they are used
# in subsequent regex for path parsing
$ui = uri_escape($ui, '/?#');
}
_host_escape($host);
$str = "$scheme//$ui$host$port$rest";
}
return $class->SUPER::_uric_escape($str);
}
Expand All @@ -26,8 +32,8 @@ sub _host_escape {
return if URI::HAS_RESERVED_SQUARE_BRACKETS and $_[0] !~ /[^$URI::uric]/;
return if !URI::HAS_RESERVED_SQUARE_BRACKETS and $_[0] !~ /[^$URI::uric4host]/;
eval {
require URI::_idna;
$_[0] = URI::_idna::encode($_[0]);
require URI::_idna;
$_[0] = URI::_idna::encode($_[0]);
};
return 0 if $@;
return 1;
Expand Down
50 changes: 49 additions & 1 deletion t/http.t
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
use strict;
use warnings;

use Test::More tests => 16;
use Test::More tests => 76;

use URI ();

Expand Down Expand Up @@ -48,3 +48,51 @@ $u = URI->new("http://%65%78%61%6d%70%6c%65%2e%63%6f%6d/%70%75%62/%61/%32%30%30%
is($u->canonical, "http://example.com/pub/a/2001/08/27/bjornstad.html");

ok($u->has_recognized_scheme);

my $username = 'u1!"#$%&\'()*+,-./;<=>?@[\]^_`{|}~';
my $exp_username = 'u1!%22%23$%&\'()*+,-.%2F;%3C=%3E%3F@%5B%5C%5D%5E_%60%7B%7C%7D~';
my $password = 'p1!"#$%&\'()*+,-./;<=>?@[\]^_`{|}~';
my $exp_password = 'p1!%22%23$%&\'()*+,-.%2F;%3C=%3E%3F@%5B%5C%5D%5E_%60%7B%7C%7D~';
my $path = 'path/to/page';
my $query = 'a=b&c=d';
my %host = (
'[::1]' => {
host => '::1',
port => 80,
},
'[::1]:8080' => {
host => '::1',
port => 8080,
},
'127.0.0.1' => {
host => '127.0.0.1',
port => 80,
},
'127.0.0.1:8080' => {
host => '127.0.0.1',
port => 8080,
},
'localhost' => {
host => 'localhost',
port => 80,
},
'localhost:8080' => {
host => 'localhost',
port => 8080,
},
);

foreach my $host (keys %host) {
my $uri = URI->new("http://${username}:${password}\@${host}/${path}?${query}");
is($uri->scheme, 'http');
is($uri->userinfo, "${exp_username}:${exp_password}");
is($uri->host, $host{$host}->{host});
is($uri->port, $host{$host}->{port});
is($uri->path, "/${path}");
is($uri->query, $query);
is($uri->authority, "${exp_username}:${exp_password}\@${host}");
is($uri->as_string, "http://${exp_username}:${exp_password}\@${host}/${path}?${query}");
is($uri->as_iri, "http://${exp_username}:${exp_password}\@${host}/${path}?${query}");
is($uri->canonical, "http://${exp_username}:${exp_password}\@${host}/${path}?${query}");
}