Commit af03cea

Update ChangeLog and NEWS for 10.45 (#643)
1 parent 78857e4 commit af03cea

4 files changed

+313
-135
lines changed

ChangeLog

Lines changed: 130 additions & 58 deletions
@@ -4,38 +4,45 @@ Change Log for PCRE2
 Before the move to GitHub, this was the only record of changes to PCRE2. Now
 there is also the log of commit messages.

+Internal changes which are not visible to clients of the library are mostly not
+listed here.
+
 Version 10.45 xx-xxx-2024
 -------------------------

-1. Change 6 of 10.44 broke 32-bit tests because pcre2test's reporting of
+1. (#418) Change 6 of 10.44 broke 32-bit tests because pcre2test's reporting of
 memory size was changed to the entire compiled data block, instead of just the
 pattern and tables data, so as to align with the new length restriction.
 Because the block's header contains pointers, this meant the pcre2test output
 was different in 32-bit mode. A patch by Carlo reverts to the previous state
 and makes sure that any limit set by pcre2_set_max_pattern_compiled_length()
 also avoids the internal struct overhead.

-2. Add --posix-pattern-file to pcre2grep to allow processing of empty patterns
-through the -f option, as well as patterns that end in space characters, for
-compatibility with other grep tools.
+2. (#416, #622) Updates to build.zig.
+
+3. (#427, et al.) Various fixes to pacify static analyzers.

-3. Fix a but in the fuzz support quantifier-limiting code. It ignores strings
-of more than 5 digits because they are necessarily numbers greater than 65535,
-the largest legal quantifier. However, it wasn't ignoring non-significant
+4. (#428) Add --posix-pattern-file to pcre2grep to allow processing of empty
+patterns through the -f option, as well as patterns that end in space
+characters, for compatibility with other grep tools.
+
+5. (4fa5b8bd) Fix a bug in the fuzz support quantifier-limiting code. It ignores
+strings of more than 5 digits because they are necessarily numbers greater than
+65535, the largest legal quantifier. However, it wasn't ignoring non-significant
 leading zeros.

-4. The case-independent processing of the letter-matching Unicode properties
-Ll, Lt, and Lu have been changed to match Perl (which changed a while ago).
-When caseless matching is in force, all three of these properties are now
+6. (6d82f0cd) The case-independent processing of the letter-matching Unicode
+properties Ll, Lt, and Lu have been changed to match Perl (which changed a while
+ago). When caseless matching is in force, all three of these properties are now
 treated as Lc (cased letter).

-5. The pcre2_jit_compile() function was updated by the addition of a new option
-PCRE2_JIT_TEST_ALLOC which, if called with a NULL first argument, tests not
-only the availability of JIT, but also its ability to allocate executable
+7. (#433) The pcre2_jit_compile() function was updated by the addition of a new
+option PCRE2_JIT_TEST_ALLOC which, if called with a NULL first argument, tests
+not only the availability of JIT, but also its ability to allocate executable
 memory. Update pcre2test to use this support to extend the -C option.

-6. The code for parsing Unicode property descriptions for \p and \P been
-changed as follows:
+8. (75b1025a) The code for parsing Unicode property descriptions for \p and \P
+been changed as follows:

 . White space etc. before ^ in a negated value such as \p{ ^L } was not being
 ignored.
@@ -47,77 +54,142 @@ changed as follows:
 . The documentation of the syntax of what can follow \p and \P has been
 updated.

-7. There was an error in the table of lengths for parsed items for the OPTIONS
-item, but fortuitously it could never have actually bitten. While fixing this,
-some other code that could never be obeyed was discovered and removed.
+9. (1c24ba01) There was an error in the table of lengths for parsed items for
+the OPTIONS item, but fortuitously it could never have actually bitten. While
+fixing this, some other code that could never be obeyed was discovered and
+removed.

-8. Removed some incoreect optimization code from DFA matching that has been
-there since PCRE1, but has just been found to cause a no match return instead
-of a partial match in some cases. It involves partial matching when (*F) is
-present so is unlikely to have actually affected anyone.
+10. (674b6640) Removed some incorect optimization code from DFA matching that
+has been there since PCRE1, but has just been found to cause a no match return
+instead of a partial match in some cases. It involves partial matching when (*F)
+is present so is unlikely to have actually affected anyone.

-9. Tidy the wording and formatting of some pcre2test error messages concerned
-with bad modifiers. Also restrict single-letter modifier sequences to the first
-item in a modifier list, as documented and always intended.
+11. (b0f4ac17) Tidy the wording and formatting of some pcre2test error messages
+concerned with bad modifiers. Also restrict single-letter modifier sequences to
+the first item in a modifier list, as documented and always intended.

-10. An iterator at the end of many assertions can always be auto-possessified,
-but not at the end of variable-length lookbehinds. THere was a bug in the code
-that checks for such a lookbehind; it was looking only at the first branch,
-which is wrong because some branches can be fixed length when others are not,
-for example (?<=AB|CD?). Now all branches are checked for variability.
+12. (1415565c) An iterator at the end of many assertions can always be
+auto-possessified, but not at the end of variable-length lookbehinds. There was
+a bug in the code that checks for such a lookbehind; it was looking only at the
+first branch, which is wrong because some branches can be fixed length when
+others are not, for example (?<=AB|CD?). Now all branches are checked for
+variability.

-11. Matching with pcre2_match() could give an incorrect result if a
+13. (ead08288) Matching with pcre2_match() could give an incorrect result if a
 variable-length lookbehind was used as the condition in a conditional group.
 The condition could erroneously be treated as true if a branch matched but
 overran the current position. This bug was in the interpreter only; matching
 with JIT was correct.

-12. Add a new error code (PCRE2_ERROR_JIT_UNSUPPORTED) which is yielded
+14. (#443) Split out the sljit sub-project into a "Git submodule". Git users
+must now run `git submodule init; git submodule update` after a Git checkout, or
+the build will fail due to missing files in deps/sljit.
+
+15. (#441) Add a new error code (PCRE2_ERROR_JIT_UNSUPPORTED) which is yielded
 for unsupported jit features.

-13. Add a new experimental feature called scan substring. This feature is a new
-type of assertion which matches the content of a capturing block to a sub
-pattern.
+16. (#444) Fix bug in 'first code unit' and 'last code unit' optimization
+combined with lookahead assertions.
+
+17. (#445, #447, #449, #451, #452, #459, #563) Add a new feature called scan
+substring. This feature is a new type of assertion which matches the content of
+a capturing block to a sub-pattern.

-14. Item 43 of 10.43 was incomplete because it addressed only \z and not \Z,
+18. (#450) Improvements to 'first code unit' / 'starting code units'
+optimisation.
+
+19. (#455) Many, many improvements to the JIT compiler.
+
+20. Item 43 of 10.43 was incomplete because it addressed only \z and not \Z,
 which was still misbehaving when matching fragments inside invalid UTF strings.

-15. Octal escapes of the form \045 or \111 were not being recognized in
-substitution strings, and if encountered gave an error, though the \o{...} form
-was recognized. This bug is now fixed.
+21. (d29e7290) Octal escapes of the form \045 or \111 were not being recognized
+in substitution strings, and if encountered gave an error, though the \o{...}
+form was recognized. This bug is now fixed.
+
+22. (#463, #487) Fix 1 byte out-of-bounds read when parsing malformed limits
+(e.g. LIMIT_HEAP)
+
+23. Many improvements to test infrastructure. Many more platforms and
+configurations are now run in Continuous Integration, and all the platforms now
+run the full test suite, rather than a partial subset.
+
+24. (#475) Implement title casing in substitution strings using Perl syntax.
+
+25. (#478, #504) Disallow \x if not followed by { or a hex digit.
+
+26. (#473) Implements Python-style backrefs in substitutions.

-16. Merged PR475, which implements title casing in substitution strings a la
-Perl.
+27. (#472) Fix error reporting for certain over-large octal escapes.

-17. Merged PR478, which disallows \x if not followed by { or a hex digit.
+28. (#482) Fix parsing of named captures in replacement strings, allowing
+non-ASCII capture names to be used.

-18. Merged PR473, which implements Python-style backrefs in substitutions.
+29. (#477, #474, #488, #494, #496, #506, #508, #511, #518, #524, #540) Many
+improvements to parsing and optimising of character classes.

-19. Merged PR483, which is adding \g<n> and $<name> to replacement strings.
+30. (#483, #498) Add support for \g<n> and $<name> to replacement strings.

-20. Merged PR470, which adds PCRE2_EXTRA_NO_BS0 and PCRE2_EXTRA_PYTHON_OCTAL.
+31. (#470) Add option flags PCRE2_EXTRA_NO_BS0 and PCRE2_EXTRA_PYTHON_OCTAL.

-21. Prevent 1 byte overread when parsing malformed patterns with early VERBs.
+32. (#471) Add new API function pcre2_set_optimize() for controlling which
+optimizations are enabled.

-22. Merged PR491 which adds $& $` $' and $_ to substitution replacements, as
-well as interpreting \b and \v as characters.
+33. (#491) Adds $& $` $' and $_ to substitution replacements, as well as
+interpreting \b and \v as characters.

-23. Updated perltest.sh to enable locale setting.
+34. (#499) Add option PCRE2_EXTRA_NEVER_CALLOUT to disable callouts.

-24. Fixed a bug in JIT affecting greedy bounded repeats. The upper limit of
-repeats inside a repeated bracket might be incorrectly checked.
+35. (#503, #513) Update Unicode support to UCD 16.

-25. Fixed a bug in JIT affecting caseful matching of backreferences. When
+36. (#512, #618, #638) Add new function pcre2_set_substitute_case_callout() to
+allow clients to provide a custom callback with locale-aware case
+transformation.
+
+37. (#516) Fix case-insensitive matching of backreferences when using the
+PCRE2_EXTRA_CASELESS_RESTRICT option.
+
+38. (#519) In pcre2grep, add $& as an alias for $0
+
+39. (c9bf8339, #534) Updated perltest.sh to enable locale setting.
+
+40. (#521) Add support for Turkish I casefolding, using new options
+PCRE2_EXTRA_TURKISH_CASING, and added pre-pattern flags (*TURKISH_CASING) and
+(*CASELESS_RESTRICT).
+
+41. (#523, #546, #547) Add support for UTS#18 compatible character classes,
+using the new option PCRE2_ALT_EXTENDED_CLASS. This adds '[' as a metacharacter
+within character classes and the operators '&&', '--' and '~~', allowing
+subtractions and intersections of character classes to be easily expressed.
+
+42. (#553, #586, #596, #597) Add support for Perl-style extended character
+classes, using the syntax (?[...]). This also allows expressing subtractions and
+intersections of character classes, but using a different syntax to UTS#18.
+
+43. (#554) Fixed a bug in JIT affecting greedy bounded repeats. The upper limit
+of repeats inside a repeated bracket might be incorrectly checked.
+
+44. (#556) Fixed a bug in JIT affecting caseful matching of backreferences. When
 utf is disabled, and dupnames is enabled, caseless matching was used even
 if caseful matching was needed.

-26. Fixed a bug in pcre2grep reported by Alejandro Colomar <alx@kernel.org>
-(GitHub issue #577). In certain cases, when lines of above and below context
-were contiguous, a separator line was incorrectly being inserted.
+45. (f34fc0a3) Fixed a bug in pcre2grep reported by Alejandro Colomar
+<alx@kernel.org> (GitHub issue #577). In certain cases, when lines of above and
+below context were contiguous, a separator line was incorrectly being inserted.

-27. Split out the sljit sub-project into a "git submodule". Git users must
-now run `git submodule init; git submodule update` after a Git checkout, or
-the build will fail due to missing files in deps/sljit.
+46. (#594) Fix a small (one/two byte) out-of-bounds read on invalid UTF-8 input
+in pcre2grep.
+
+47. (#370) Fix the INSTALL_MSVC_PDB CMake flag.
+
+48. (#366) Install cmake files in prefix/lib/cmake/pcre2 rather than
+prefix/cmake. The new CMake flag PCRE2_INSTALL_CMAKEDIR allows customising this
+location.
+
+49. (#624, #626, #628, #632, #639, #641) Reduce code size of generated JIT code
+for repeated character classes.
+
+50. (#623) Update the Bazel build files.


 Version 10.44 07-June-2024
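
ChangeLog item 12 in the diff above says the lookbehind check used to inspect only the first branch, which misses cases such as (?<=AB|CD?). The following toy Python sketch illustrates that difference. It is not PCRE2's code: the function names are hypothetical, and it assumes a deliberately tiny branch syntax (literal characters, each optionally followed by `?`).

```python
def branch_length_range(branch: str) -> tuple[int, int]:
    """Return (shortest, longest) match length for one branch.

    Assumption: a branch contains only literal characters, each of
    which may be followed by '?' to make it optional.
    """
    shortest = longest = 0
    i = 0
    while i < len(branch):
        optional = i + 1 < len(branch) and branch[i + 1] == "?"
        longest += 1
        if not optional:
            shortest += 1
        i += 2 if optional else 1
    return shortest, longest


def first_branch_is_variable(body: str) -> bool:
    """Model of the old check: look at the first branch only."""
    shortest, longest = branch_length_range(body.split("|")[0])
    return shortest != longest


def any_branch_is_variable(body: str) -> bool:
    """Model of the fixed check: look at every branch."""
    return any(
        shortest != longest
        for shortest, longest in map(branch_length_range, body.split("|"))
    )


body = "AB|CD?"  # the lookbehind body from the ChangeLog example
print(first_branch_is_variable(body))  # old check misses the variable branch
print(any_branch_is_variable(body))    # fixed check finds it
```

For "AB|CD?", the first branch always matches two characters while the second matches one or two, so only the all-branches check reports the lookbehind as variable-length.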

NEWS

Lines changed: 87 additions & 1 deletion
@@ -1,6 +1,92 @@
 News about PCRE2 releases
 -------------------------

+Version 10.45 xx-xxx-2024
+--------------------------
+
+This is a comparatively large release, incorporating new features, some
+bugfixes, and a few changes with slight backwards compatibility implications.
+Please see the ChangeLog and Git log for further details.
+
+Only changes to behaviour, changes to the API, and major changes to the pattern
+syntax are described here.
+
+This release is the first to be available as a (signed) Git tag, or
+alternatively as a (signed) tarball of the Git tag.
+
+This is also the first release to be made by the new maintainers of PCRE2, and
+we would like to thank Philip Hazel, creator and maintainer of PCRE and PCRE2.
+
+* (Git change) The sljit project has been split out into a separate Git
+repository. Git users must now run `git submodule init; git submodule update`
+after a Git checkout.
+
+* (Behaviour change) Update Unicode support to UCD 16.
+
+* (Match behaviour change) Case-insensitive matching of Unicode properties
+Ll, Lt, and Lu has been changed to match Perl. Previously, /\p{Ll}/i would
+match only lower-case characters (even though case-insensitive matching was
+specified). This also affects case-insensitive matching of POSIX classes such
+as [:lower:].
+
+* (Minor match behaviour change) Case-insensitive matching of backreferences now
+respects the PCRE2_EXTRA_CASELESS_RESTRICT option.
+
+* (Minor pattern syntax change) Parsing of the \x escape is stricter, and is
+no longer parsed as an escape for the NUL character if not followed by '{' or
+a hexadecimal digit. Use \x00 instead.
+
+* (Major new feature) Add a new feature called scan substring. This is a new
+type of assertion which matches the content of a capturing block to a
+sub-pattern.
+
+Example: to find a word that contains the rare (in English) sequence of
+letters "rh" not at the start:
+
+\b(\w++)(*scan_substring:(1).+rh)
+
+The first group captures a word which is then scanned by the
+(*scan_substring:(1) ... ) assertion, which tests whether the pattern ".+rh"
+matches the capture group "(1)".
+
+* (Major new feature) Add support for UTS#18 compatible character classes,
+using the new option PCRE2_ALT_EXTENDED_CLASS. This adds '[' as a
+metacharacter within character classes and the operators '&&', '--' and '~~',
+allowing subtractions and intersections of character classes to be easily
+expressed.
+
+Example: to match Thai or Greek letters (but not letters or other characters
+in those scripts), use [\p{L}&&[\p{Thai}||\p{Greek}]].
+
+* (Major new feature) Add support for Perl-style extended character classes,
+using the syntax (?[...]). This also allows expressing subtractions and
+intersections of character classes, but using a different syntax to UTS#18.
+
+Example: to match Thai or Greek letters (but not letters or other characters
+in those scripts), use (?[\p{L} & (\p{Thai} + \p{Greek})]).
+
+* (Minor feature) Significant improvements to the character class match engine.
+Compiled character classes are now more compact, and have faster matching
+for large or complex character sets, using binary search through the set.
+
+* JIT compilation now fails with the new error code PCRE2_ERROR_JIT_UNSUPPORTED
+for patterns which use features not supported by the JIT compiler.
+
+* (Minor feature) New options PCRE2_EXTRA_NO_BS0 (disallow \0 as an escape for
+the NUL character); PCRE2_EXTRA_PYTHON_OCTAL (use Python disambiguation rules
+for deciding whether \12 is a backreference or an octal escape);
+PCRE2_EXTRA_NEVER_CALLOUT (disable callout syntax entirely);
+PCRE2_EXTRA_TURKISH_CASING (use Turkish rules for case-insensitive matching).
+
+* (Minor feature) Add new API function pcre2_set_optimize() for controlling
+which optimizations are enabled.
+
+* (Minor new features) A variety of extensions have been made to
+pcre2_substitute() and its syntax for replacement strings. These now support:
+\123 octal escapes; titlecasing \u\L; \1 backreferences; \g<1> and $<NAME>
+backreferences; $& $` $' and $_; new function
+pcre2_set_substitute_case_callout() to allow locale-aware case transformation.
+

 Version 10.44 07-June-2024
 --------------------------
@@ -13,7 +99,7 @@ increased to 128. Some auxiliary files for building under VMS are added.
 Version 10.43 16-February-2024
 ------------------------------

-There are quite a lot of changes in this release (see ChangeLog and git log for
+There are quite a lot of changes in this release (see ChangeLog and Git log for
 a list). Those that are not bugfixes or code tidies are:

 * The JIT code no longer supports ARMv5 architecture.
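
The scan substring example in the 10.45 NEWS entry, \b(\w++)(*scan_substring:(1).+rh), can be read as a two-step test: capture a word, then match a second pattern against the captured text only. Python's `re` module has no such assertion, so the sketch below emulates the idea in two passes. The function name is hypothetical, and it assumes the inner pattern is tried from the start of the captured text; it illustrates the logic, not PCRE2's implementation.

```python
import re

# Step 1: capture each whole word (stands in for \b(\w++)).
WORD = re.compile(r"\b(\w+)")
# Step 2: the inner pattern from the NEWS example, applied to the capture only.
INNER = re.compile(r".+rh")


def words_with_inner_rh(text: str) -> list[str]:
    """Return words containing "rh" somewhere other than at the start."""
    return [
        m.group(1)
        for m in WORD.finditer(text)
        # re.match tries the pattern at the start of the captured word,
        # so ".+" must consume at least one character before "rh".
        if INNER.match(m.group(1))
    ]


print(words_with_inner_rh("rhyme myrrh rhetoric catarrh"))
```

Words that only begin with "rh" (such as "rhyme") are rejected because `.+` needs at least one character before the "rh".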
