gh-124130: Fix a bug in matching regular expression \B in empty string #127007
Changes from 1 commit
592a769
bbace33
4c79573
ef0a6fc
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -572,11 +572,8 @@ character ``'$'``. |
| | | Word boundaries are determined by the current locale |
| | | if the :py:const:`~re.LOCALE` flag is used. |
| | | |
| | | - .. note:: |
| | | - |
| | | - Note that ``\B`` does not match an empty string, which differs from |
| | | - RE implementations in other programming languages such as Perl. |
| | | - This behavior is kept for compatibility reasons. |
| | | + .. versionchanged:: 3.14 |
| | | + ``\B`` now matches the whole empty string. |
| | | |
|
The fix is LGTM. Some users may not be sure what "the whole empty string" is.

I know why you use "whole", the
You want to distinguish from "empty string" here.

It is still difficult to avoid contradiction with "\B Matches the empty string, but only when it ...".

Yeah, "the empty string" means two different things there. The docs you're removing don't do a very good job either. Should be disambiguated somehow, for example \B now matches if the input string is empty.

How about this: \B used to be unable to match 0-length string

@Alcaro uses the word "input", it looks fine. Candidate:
The previous test: #124130 (comment) Tested with: regex module, openjdk, .net, rust, ruby, php, pcre2.
Save it to a file. Perl script:
print "Perl version: $]\n\n";
# test \b in ASCII mode
my $n = () = ( "" =~ /\b/g );
print "\\b \"\" matches: $n\n";
my $n = () = ( "a" =~ /\b/g );
print "\\b \"a\" matches: $n\n";
my $n = () = ( "=" =~ /\b/g );
print "\\b \"=\" matches: $n\n";
print "~~~~~~~~~~~~\n";
# test \B in ASCII mode
my $n = () = ( "" =~ /\B/g );
print "\\B \"\" matches: $n\n";
my $n = () = ( "a" =~ /\B/g );
print "\\B \"a\" matches: $n\n";
my $n = () = ( "=" =~ /\B/g );
print "\\B \"=\" matches: $n\n";
print "\nxxxx ASCII mode above / Unicode mode below xxxx\n\n";
# test \b in Unicode mode
my $n = () = ( "" =~ /\b/gu );
print "\\b \"\" matches: $n\n";
my $n = () = ( "ю" =~ /\b/gu );
print "\\b \"ю\" matches: $n\n";
my $n = () = ( "=" =~ /\b/gu );
print "\\b \"=\" matches: $n\n";
print "~~~~~~~~~~~~\n";
# test \B in Unicode mode
my $n = () = ( "" =~ /\B/gu );
print "\\B \"\" matches: $n\n";
my $n = () = ( "ю" =~ /\B/gu );
print "\\B \"ю\" matches: $n <- other REs get 0, it seems a bug.\n";
my $n = () = ( "=" =~ /\B/gu );
print "\\B \"=\" matches: $n\n";
# explanation
print "\n(\\B + Unicode_mode) behaves inconsistently at the begin/end of a string:\n";
print "\n/^\\B/gu NOT IN \"ю\":\n";
my $n = () = ( "ю" =~ /^\B/gu );
print "matches: $n\n";
print "\n/\\B\$/gu IN \"ю\":\n";
my $n = () = ( "ю" =~ /\B$/gu );
print "matches: $n\n";
Script output:

I'm sorry, Perl doesn't have (\B+Unicode_mode) bug. |
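For comparison, the ASCII-mode part of the Perl check above can be reproduced from Python. This is a minimal sketch and not part of the PR; `count_matches` is a hypothetical helper introduced here for illustration. According to the `versionchanged` note in the diff, the `\B` result for the empty input depends on the Python version, so the script prints the counts instead of assuming them.

```python
import re
import sys


def count_matches(pattern: str, text: str) -> int:
    """Count all matches (including empty ones) of pattern in text."""
    return len(re.findall(pattern, text))


if __name__ == "__main__":
    print("Python version:", sys.version.split()[0])
    for pattern in (r"\b", r"\B"):
        for text in ("", "a", "="):
            # The count for r"\B" on "" is the case this PR changes.
            print(f"{pattern} {text!r} matches: {count_matches(pattern, text)}")
```

Only the `\B` row for the empty string should differ between a Python build with this change and one without it.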
| | | .. index:: single: \d; in regular expressions |
| | | |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -0,0 +1,2 @@ |
| | | + Fix a bug in matching regular expression ``\B`` in empty string. Now it |
| | | + is always the opposite of ``\b``. |

picnixz marked this conversation as resolved.
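The "always the opposite of ``\b``" wording can be checked position by position. The following is a minimal sketch using Python's `re` module, not code from the PR; `boundary_flags` is a hypothetical helper. For non-empty inputs the two flags should be complementary at every position; for the empty input they are complementary only on a build that includes this fix.

```python
import re

WORD_BOUNDARY = re.compile(r"\b")
NON_BOUNDARY = re.compile(r"\B")


def boundary_flags(text: str) -> list[tuple[int, bool, bool]]:
    r"""For each position 0..len(text), report whether \b and \B match there."""
    flags = []
    for pos in range(len(text) + 1):
        # Pattern.match(text, pos) anchors the attempt at pos while still
        # seeing the characters before pos, so boundary checks stay correct.
        at_b = WORD_BOUNDARY.match(text, pos) is not None
        at_B = NON_BOUNDARY.match(text, pos) is not None
        flags.append((pos, at_b, at_B))
    return flags


if __name__ == "__main__":
    for sample in ("", "a", "=", "ab cd"):
        print(repr(sample), boundary_flags(sample))
```

Printing the flags for `""` is a quick way to see whether the interpreter in use already has the new behavior.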