@@ -4,38 +4,45 @@ Change Log for PCRE2
44Before the move to GitHub, this was the only record of changes to PCRE2. Now
55there is also the log of commit messages.
66
7+ Internal changes which are not visible to clients of the library are mostly not
8+ listed here.
9+
710Version 10.45 xx-xxx-2024
811-------------------------
912
10- 1. Change 6 of 10.44 broke 32-bit tests because pcre2test's reporting of
13+ 1. (#418) Change 6 of 10.44 broke 32-bit tests because pcre2test's reporting of
1114memory size was changed to the entire compiled data block, instead of just the
1215pattern and tables data, so as to align with the new length restriction.
1316Because the block's header contains pointers, this meant the pcre2test output
1417was different in 32-bit mode. A patch by Carlo reverts to the previous state
1518and makes sure that any limit set by pcre2_set_max_pattern_compiled_length()
1619also avoids the internal struct overhead.
1720
18- 2. Add --posix-pattern-file to pcre2grep to allow processing of empty patterns
19- through the -f option, as well as patterns that end in space characters, for
20- compatibility with other grep tools .
21+ 2. (#416, #622) Updates to build.zig.
22+
23+ 3. (#427, et al.) Various fixes to pacify static analyzers.
2124
22- 3. Fix a but in the fuzz support quantifier-limiting code. It ignores strings
23- of more than 5 digits because they are necessarily numbers greater than 65535,
24- the largest legal quantifier. However, it wasn't ignoring non-significant
25+ 4. (#428) Add --posix-pattern-file to pcre2grep to allow processing of empty
26+ patterns through the -f option, as well as patterns that end in space
27+ characters, for compatibility with other grep tools.
28+
29+ 5. (4fa5b8bd) Fix a bug in the fuzz support quantifier-limiting code. It ignores
30+ strings of more than 5 digits because they are necessarily numbers greater than
31+ 65535, the largest legal quantifier. However, it wasn't ignoring non-significant
2532leading zeros.
2633
27- 4. The case-independent processing of the letter-matching Unicode properties
28- Ll, Lt, and Lu have been changed to match Perl (which changed a while ago).
29- When caseless matching is in force, all three of these properties are now
34+ 6. (6d82f0cd) The case-independent processing of the letter-matching Unicode
35+ properties Ll, Lt, and Lu has been changed to match Perl (which changed a while
36+ ago). When caseless matching is in force, all three of these properties are now
3037treated as Lc (cased letter).
3138
32- 5. The pcre2_jit_compile() function was updated by the addition of a new option
33- PCRE2_JIT_TEST_ALLOC which, if called with a NULL first argument, tests not
34- only the availability of JIT, but also its ability to allocate executable
39+ 7. (#433) The pcre2_jit_compile() function was updated by the addition of a new
40+ option PCRE2_JIT_TEST_ALLOC which, if called with a NULL first argument, tests
41+ not only the availability of JIT, but also its ability to allocate executable
3542memory. Update pcre2test to use this support to extend the -C option.
3643
37- 6. The code for parsing Unicode property descriptions for \p and \P been
38- changed as follows:
44+ 8. (75b1025a) The code for parsing Unicode property descriptions for \p and \P
45+ has been changed as follows:
3946
4047 . White space etc. before ^ in a negated value such as \p{ ^L } was not being
4148 ignored.
@@ -47,77 +54,142 @@ changed as follows:
4754 . The documentation of the syntax of what can follow \p and \P has been
4855 updated.
4956
50- 7. There was an error in the table of lengths for parsed items for the OPTIONS
51- item, but fortuitously it could never have actually bitten. While fixing this,
52- some other code that could never be obeyed was discovered and removed.
57+ 9. (1c24ba01) There was an error in the table of lengths for parsed items for
58+ the OPTIONS item, but fortuitously it could never have actually bitten. While
59+ fixing this, some other code that could never be obeyed was discovered and
60+ removed.
5361
54- 8. Removed some incoreect optimization code from DFA matching that has been
55- there since PCRE1, but has just been found to cause a no match return instead
56- of a partial match in some cases. It involves partial matching when (*F) is
57- present so is unlikely to have actually affected anyone.
62+ 10. (674b6640) Removed some incorrect optimization code from DFA matching that
63+ has been there since PCRE1, but has just been found to cause a no match return
64+ instead of a partial match in some cases. It involves partial matching when (*F)
65+ is present so is unlikely to have actually affected anyone.
5866
59- 9. Tidy the wording and formatting of some pcre2test error messages concerned
60- with bad modifiers. Also restrict single-letter modifier sequences to the first
61- item in a modifier list, as documented and always intended.
67+ 11. (b0f4ac17) Tidy the wording and formatting of some pcre2test error messages
68+ concerned with bad modifiers. Also restrict single-letter modifier sequences to
69+ the first item in a modifier list, as documented and always intended.
6270
63- 10. An iterator at the end of many assertions can always be auto-possessified,
64- but not at the end of variable-length lookbehinds. THere was a bug in the code
65- that checks for such a lookbehind; it was looking only at the first branch,
66- which is wrong because some branches can be fixed length when others are not,
67- for example (?<=AB|CD?). Now all branches are checked for variability.
71+ 12. (1415565c) An iterator at the end of many assertions can always be
72+ auto-possessified, but not at the end of variable-length lookbehinds. There was
73+ a bug in the code that checks for such a lookbehind; it was looking only at the
74+ first branch, which is wrong because some branches can be fixed length when
75+ others are not, for example (?<=AB|CD?). Now all branches are checked for
76+ variability.
6877
69- 11. Matching with pcre2_match() could give an incorrect result if a
78+ 13. (ead08288) Matching with pcre2_match() could give an incorrect result if a
7079variable-length lookbehind was used as the condition in a conditional group.
7180The condition could erroneously be treated as true if a branch matched but
7281overran the current position. This bug was in the interpreter only; matching
7382with JIT was correct.
7483
75- 12. Add a new error code (PCRE2_ERROR_JIT_UNSUPPORTED) which is yielded
84+ 14. (#443) Split out the sljit sub-project into a "Git submodule". Git users
85+ must now run `git submodule init; git submodule update` after a Git checkout, or
86+ the build will fail due to missing files in deps/sljit.
87+
88+ 15. (#441) Add a new error code (PCRE2_ERROR_JIT_UNSUPPORTED) which is yielded
7689for unsupported jit features.
7790
78- 13. Add a new experimental feature called scan substring. This feature is a new
79- type of assertion which matches the content of a capturing block to a sub
80- pattern.
91+ 16. (#444) Fix bug in 'first code unit' and 'last code unit' optimization
92+ combined with lookahead assertions.
93+
94+ 17. (#445, #447, #449, #451, #452, #459, #563) Add a new feature called scan
95+ substring. This feature is a new type of assertion which matches the content of
96+ a capturing block to a sub-pattern.
8197
82- 14. Item 43 of 10.43 was incomplete because it addressed only \z and not \Z,
98+ 18. (#450) Improvements to 'first code unit' / 'starting code units'
99+ optimisation.
100+
101+ 19. (#455) Many, many improvements to the JIT compiler.
102+
103+ 20. Item 43 of 10.43 was incomplete because it addressed only \z and not \Z,
83104which was still misbehaving when matching fragments inside invalid UTF strings.
84105
85- 15. Octal escapes of the form \045 or \111 were not being recognized in
86- substitution strings, and if encountered gave an error, though the \o{...} form
87- was recognized. This bug is now fixed.
106+ 21. (d29e7290) Octal escapes of the form \045 or \111 were not being recognized
107+ in substitution strings, and if encountered gave an error, though the \o{...}
108+ form was recognized. This bug is now fixed.
109+
110+ 22. (#463, #487) Fix 1 byte out-of-bounds read when parsing malformed limits
111+ (e.g. LIMIT_HEAP).
112+
113+ 23. Many improvements to test infrastructure. Many more platforms and
114+ configurations are now run in Continuous Integration, and all the platforms now
115+ run the full test suite, rather than a partial subset.
116+
117+ 24. (#475) Implement title casing in substitution strings using Perl syntax.
118+
119+ 25. (#478, #504) Disallow \x if not followed by { or a hex digit.
120+
121+ 26. (#473) Implement Python-style backrefs in substitutions.
88122
89- 16. Merged PR475, which implements title casing in substitution strings a la
90- Perl.
123+ 27. (#472) Fix error reporting for certain over-large octal escapes.
91124
92- 17. Merged PR478, which disallows \x if not followed by { or a hex digit.
125+ 28. (#482) Fix parsing of named captures in replacement strings, allowing
126+ non-ASCII capture names to be used.
93127
94- 18. Merged PR473, which implements Python-style backrefs in substitutions.
128+ 29. (#477, #474, #488, #494, #496, #506, #508, #511, #518, #524, #540) Many
129+ improvements to parsing and optimising of character classes.
95130
96- 19. Merged PR483, which is adding \g<n> and $<name> to replacement strings.
131+ 30. (#483, #498) Add support for \g<n> and $<name> to replacement strings.
97132
98- 20. Merged PR470, which adds PCRE2_EXTRA_NO_BS0 and PCRE2_EXTRA_PYTHON_OCTAL.
133+ 31. (#470) Add option flags PCRE2_EXTRA_NO_BS0 and PCRE2_EXTRA_PYTHON_OCTAL.
99134
100- 21. Prevent 1 byte overread when parsing malformed patterns with early VERBs.
135+ 32. (#471) Add new API function pcre2_set_optimize() for controlling which
136+ optimizations are enabled.
101137
102- 22. Merged PR491 which adds $& $` $' and $_ to substitution replacements, as
103- well as interpreting \b and \v as characters.
138+ 33. (#491) Add $& $` $' and $_ to substitution replacements, as well as
139+ interpreting \b and \v as characters.
104140
105- 23. Updated perltest.sh to enable locale setting .
141+ 34. (#499) Add option PCRE2_EXTRA_NEVER_CALLOUT to disable callouts.
106142
107- 24. Fixed a bug in JIT affecting greedy bounded repeats. The upper limit of
108- repeats inside a repeated bracket might be incorrectly checked.
143+ 35. (#503, #513) Update Unicode support to UCD 16.
109144
110- 25. Fixed a bug in JIT affecting caseful matching of backreferences. When
145+ 36. (#512, #618, #638) Add new function pcre2_set_substitute_case_callout() to
146+ allow clients to provide a custom callback with locale-aware case
147+ transformation.
148+
149+ 37. (#516) Fix case-insensitive matching of backreferences when using the
150+ PCRE2_EXTRA_CASELESS_RESTRICT option.
151+
152+ 38. (#519) In pcre2grep, add $& as an alias for $0.
153+
154+ 39. (c9bf8339, #534) Updated perltest.sh to enable locale setting.
155+
156+ 40. (#521) Add support for Turkish I casefolding, using new options
157+ PCRE2_EXTRA_TURKISH_CASING, and added pre-pattern flags (*TURKISH_CASING) and
158+ (*CASELESS_RESTRICT).
159+
160+ 41. (#523, #546, #547) Add support for UTS#18 compatible character classes,
161+ using the new option PCRE2_ALT_EXTENDED_CLASS. This adds '[' as a metacharacter
162+ within character classes and the operators '&&', '--' and '~~', allowing
163+ subtractions and intersections of character classes to be easily expressed.
164+
165+ 42. (#553, #586, #596, #597) Add support for Perl-style extended character
166+ classes, using the syntax (?[...]). This also allows expressing subtractions and
167+ intersections of character classes, but using a different syntax to UTS#18.
168+
169+ 43. (#554) Fixed a bug in JIT affecting greedy bounded repeats. The upper limit
170+ of repeats inside a repeated bracket might be incorrectly checked.
171+
172+ 44. (#556) Fixed a bug in JIT affecting caseful matching of backreferences. When
111173utf is disabled, and dupnames is enabled, caseless matching was used even
112174if caseful matching was needed.
113175
114- 26. Fixed a bug in pcre2grep reported by Alejandro Colomar <alx@kernel.org>
115- (GitHub issue #577). In certain cases, when lines of above and below context
116- were contiguous, a separator line was incorrectly being inserted.
176+ 45. (f34fc0a3) Fixed a bug in pcre2grep reported by Alejandro Colomar
177+ <alx@kernel.org> (GitHub issue #577). In certain cases, when lines of above and
178+ below context were contiguous, a separator line was incorrectly being inserted.
117179
118- 27. Split out the sljit sub-project into a "git submodule". Git users must
119- now run `git submodule init; git submodule update` after a Git checkout, or
120- the build will fail due to missing files in deps/sljit.
180+ 46. (#594) Fix a small (one/two byte) out-of-bounds read on invalid UTF-8 input
181+ in pcre2grep.
182+
183+ 47. (#370) Fix the INSTALL_MSVC_PDB CMake flag.
184+
185+ 48. (#366) Install cmake files in prefix/lib/cmake/pcre2 rather than
186+ prefix/cmake. The new CMake flag PCRE2_INSTALL_CMAKEDIR allows customising this
187+ location.
188+
189+ 49. (#624, #626, #628, #632, #639, #641) Reduce code size of generated JIT code
190+ for repeated character classes.
191+
192+ 50. (#623) Update the Bazel build files.
121193
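The leading-zeros fix described in the fuzz support item above (4fa5b8bd) can be illustrated with a small sketch. This is not the actual PCRE2 fuzz-support code; the helper names significant_digits() and quantifier_too_long() are hypothetical and exist only for this example. It assumes the intent is that a digit string may be skipped without conversion only when it has more than 5 significant digits, since such a number must exceed 65535.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of PCRE2): count the significant decimal
   digits at the start of s, ignoring non-significant leading zeros.
   A run consisting only of zeros counts as a single digit (the value 0). */
static size_t significant_digits(const char *s)
{
    size_t total = 0;
    size_t zeros = 0;

    assert(s != NULL);

    /* Length of the leading run of decimal digits. */
    while (s[total] >= '0' && s[total] <= '9') total++;
    if (total == 0) return 0;

    /* Number of leading zeros within that run. */
    while (zeros < total && s[zeros] == '0') zeros++;
    if (zeros == total) return 1;

    return total - zeros;
}

/* Hypothetical helper: non-zero when the digit string is necessarily
   greater than 65535 because it has more than 5 significant digits.
   Strings with 5 or fewer significant digits still need a range check. */
static int quantifier_too_long(const char *s)
{
    return significant_digits(s) > 5;
}
```

With this approach a string such as 0000042 is treated as a two-digit number rather than being classified by its raw length, while 000123456 is still recognised as having six significant digits.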
122194
123195Version 10.44 07-June-2024