Commit 2a49db0

maint: introduce LintMan to aid on tracking/updating values
Allow tagging a value in the documentation with the name of the `#define` it mirrors, so that the value can then be updated programmatically.

Update the value for MAX_NAME_SIZE in pcre2limits.3, which has been out of date since ced3b0f (Increase name length to 128, 2024-03-11), and while at it, improve its description and add a tag for a related variable.

For completeness, also add a tag to the same value in pcre2pattern.3, update the configuration for VMS, which has been out of date since 6c670c7 (Update overlooked cmake update of name size to 128, 2024-03-11), and add LintMan to UpdateAlways.
1 parent 507bdd0 commit 2a49db0
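
The tagging mechanism described in the commit message can be sketched in a few lines. The block below is a minimal Python re-sketch of the idea, not the commit's implementation (that is the Perl script maint/LintMan shown further down). The function names `read_defines` and `sync_tagged_values` are hypothetical, and the sketch works on strings rather than on files.

```python
import re

# Patterns modelled on the ones in maint/LintMan (ASCII-only matching).
DEFINE_IN_HEADER = re.compile(r'^#define ([A-Z_0-9]+)\s+(\d+)', re.ASCII)
DEFINE_TAG = re.compile(r'^\.\\"\sDEFINE\s([A-Z_0-9]+)$', re.ASCII)


def read_defines(header_text):
    """Collect NAME -> numeric value for every '#define NAME <digits>' line."""
    defs = {}
    for line in header_text.splitlines():
        m = DEFINE_IN_HEADER.match(line)
        if m:
            defs[m.group(1)] = m.group(2)
    return defs


def sync_tagged_values(man_text, defs):
    """Rewrite the number that starts the line after each DEFINE tag.

    Returns the (possibly updated) text and the number of lines changed.
    Raises if a tag names an unknown define or has no following line.
    """
    lines = man_text.splitlines()
    updated = 0
    for i, line in enumerate(lines):
        m = DEFINE_TAG.match(line)
        if not m:
            continue
        key = m.group(1)
        if i + 1 >= len(lines):
            raise ValueError(f"DEFINE tag on line {i + 1} has no following line")
        if key not in defs:
            raise KeyError(f"unknown DEFINE key {key} on line {i + 1}")
        new_line, count = re.subn(r'^\d+', defs[key], lines[i + 1])
        if count and new_line != lines[i + 1]:
            lines[i + 1] = new_line
            updated += 1
    return "\n".join(lines) + "\n", updated
```

The tag is an ordinary roff comment, so it does not appear in the rendered page. The only constraint it places on the prose is that the tagged number must start the line that follows the tag, which is why the sentences in the man pages below are re-wrapped.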

10 files changed, +104 -17 lines changed

doc/html/pcre2limits.html

Lines changed: 6 additions & 3 deletions
@@ -64,8 +64,11 @@ <h2>
 a compile context.
 </p>
 <p>
-The maximum length of name for a named capture group is 32 code units, and the
-maximum number of such groups is 10000.
+The maximum length of the name for a named capture group as well as the number
+of such groups is configurable at build time. The maximum length for the name
+defaults to
+128 code units, and the maximum number of such groups to
+10000.
 </p>
 <p>
 The maximum length of a name in a (*MARK), (*PRUNE), (*SKIP), or (*THEN) verb
@@ -96,7 +99,7 @@ <h2>
 REVISION
 </h2>
 <p>
-Last updated: 16 August 2023
+Last updated: 17 August 2025
 <br>
 Copyright &copy; 1997-2023 University of Cambridge.
 <br>

doc/html/pcre2pattern.html

Lines changed: 3 additions & 3 deletions
@@ -2007,8 +2007,8 @@ <h2><a name="SEC18" href="#TOC1">NAMED CAPTURE GROUPS</a></h2>
 </p>
 <p>
 In PCRE2, a capture group can be named in one of three ways: (?&#60;name&#62;...) or
-(?'name'...) as in Perl, or (?P&#60;name&#62;...) as in Python. Names may be up to 128
-code units long. When PCRE2_UTF is not set, they may contain only ASCII
+(?'name'...) as in Perl, or (?P&#60;name&#62;...) as in Python. Names may be up to
+128 code units long. When PCRE2_UTF is not set, they may contain only ASCII
 alphanumeric characters and underscores, but must start with a non-digit. When
 PCRE2_UTF is set, the syntax of group names is extended to allow any Unicode
 letter or Unicode decimal digit. In other words, group names must match one of
@@ -4183,7 +4183,7 @@ <h2><a name="SEC33" href="#TOC1">AUTHOR</a></h2>
 </p>
 <h2><a name="SEC34" href="#TOC1">REVISION</a></h2>
 <p>
-Last updated: 28 March 2025
+Last updated: 17 August 2025
 <br>
 Copyright &copy; 1997-2024 University of Cambridge.
 <br>

doc/pcre2.txt

Lines changed: 6 additions & 4 deletions
@@ -6238,8 +6238,10 @@ SIZE AND OTHER LIMITATIONS
 is set to 250. An application can change this limit by calling
 pcre2_set_parens_nest_limit() to set the limit in a compile context.
 
-The maximum length of name for a named capture group is 32 code units,
-and the maximum number of such groups is 10000.
+The maximum length of the name for a named capture group as well as the
+number of such groups is configurable at build time. The maximum length
+for the name defaults to 128 code units, and the maximum number of such
+groups to 10000.
 
 The maximum length of a name in a (*MARK), (*PRUNE), (*SKIP), or
 (*THEN) verb is 255 code units for the 8-bit library and 65535 code
@@ -6262,7 +6264,7 @@ AUTHOR
 
 REVISION
 
-Last updated: 16 August 2023
+Last updated: 17 August 2025
 Copyright (c) 1997-2023 University of Cambridge.
 
 
@@ -10747,7 +10749,7 @@ AUTHOR
 
 REVISION
 
-Last updated: 28 March 2025
+Last updated: 17 August 2025
 Copyright (c) 1997-2024 University of Cambridge.
 
 

doc/pcre2limits.3

Lines changed: 8 additions & 3 deletions
@@ -47,8 +47,13 @@ when PCRE2 is built; if not, the default is set to 250. An application can
 change this limit by calling pcre2_set_parens_nest_limit() to set the limit in
 a compile context.
 .P
-The maximum length of name for a named capture group is 32 code units, and the
-maximum number of such groups is 10000.
+The maximum length of the name for a named capture group as well as the number
+of such groups is configurable at build time. The maximum length for the name
+defaults to
+.\" DEFINE MAX_NAME_SIZE
+128 code units, and the maximum number of such groups to
+.\" DEFINE MAX_NAME_COUNT
+10000.
 .P
 The maximum length of a name in a (*MARK), (*PRUNE), (*SKIP), or (*THEN) verb
 is 255 code units for the 8-bit library and 65535 code units for the 16-bit and
@@ -76,6 +81,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 16 August 2023
+Last updated: 17 August 2025
 Copyright (c) 1997-2023 University of Cambridge.
 .fi

doc/pcre2pattern.3

Lines changed: 4 additions & 3 deletions
@@ -2015,8 +2015,9 @@ the naming of capture groups. This feature was not added to Perl until release
 using the Python syntax. PCRE2 supports both the Perl and the Python syntax.
 .P
 In PCRE2, a capture group can be named in one of three ways: (?<name>...) or
-(?'name'...) as in Perl, or (?P<name>...) as in Python. Names may be up to 128
-code units long. When PCRE2_UTF is not set, they may contain only ASCII
+(?'name'...) as in Perl, or (?P<name>...) as in Python. Names may be up to
+.\" DEFINE MAX_NAME_SIZE
+128 code units long. When PCRE2_UTF is not set, they may contain only ASCII
 alphanumeric characters and underscores, but must start with a non-digit. When
 PCRE2_UTF is set, the syntax of group names is extended to allow any Unicode
 letter or Unicode decimal digit. In other words, group names must match one of
@@ -4229,6 +4230,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 28 March 2025
+Last updated: 17 August 2025
 Copyright (c) 1997-2024 University of Cambridge.
 .fi

maint/CheckMan

Lines changed: 1 addition & 0 deletions
@@ -39,6 +39,7 @@ while (scalar(@ARGV) > 0)
 ^\.P\s*$|
 ^\.PP\s*$|
 ^\.\\"(?:\ HREF)?\s*$|
+^\.\\"\sDEFINE\s\w+$|
 ^\.\\"\sHTML\s<a\shref="[^"]+?">\s*$|
 ^\.\\"\sHTML\s<a\sname="[^"]+?"><\/a>\s*$|
 ^\.\\"\s<\/a>\s*$|
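
The pattern added to CheckMan is looser than the one LintMan uses to find tags: CheckMan's alternative ends in `\w+`, while LintMan requires `[[:upper:]_\d]+`. The Python sketch below transliterates both patterns to show the difference. The helper names `checkman_accepts` and `lintman_acts_on` are hypothetical, and the sketch tests a single line in isolation rather than reproducing either script.

```python
import re

# Transliteration of the alternative added to maint/CheckMan.
CHECKMAN_DEFINE = re.compile(r'^\.\\"\sDEFINE\s\w+$')

# Transliteration of the tag pattern in maint/LintMan (ASCII-only, upper case).
LINTMAN_DEFINE = re.compile(r'^\.\\"\sDEFINE\s([A-Z_0-9]+)$', re.ASCII)


def checkman_accepts(line):
    """True if the line matches the DEFINE alternative added to CheckMan."""
    return CHECKMAN_DEFINE.match(line) is not None


def lintman_acts_on(line):
    """True if LintMan's pattern would treat the line as a DEFINE tag."""
    return LINTMAN_DEFINE.match(line) is not None
```

Under this reading, a tag such as `.\" DEFINE max_name_size` matches the CheckMan alternative but not the LintMan pattern, so LintMan would skip it without reporting an error. Keeping tag names in upper case, as the man pages in this commit do, avoids that gap.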

maint/LintMan

Lines changed: 66 additions & 0 deletions
@@ -0,0 +1,66 @@
+#!/usr/bin/perl
+
+use warnings;
+use strict;
+use Getopt::Long;
+use vars qw /$opt_verbose/;
+
+# A script to scan PCRE2's man pages to check for values that might need to
+# be updated to match the code.
+#
+# It updates numerical values after \" DEFINE <name> or errors if name is
+# not found.
+
+my $file;
+my %defs;
+
+foreach $file ("../src/config.h")
+  {
+  open (INCLUDE, $file) or die "Failed to open include $file\n";
+
+  while (<INCLUDE>)
+    {
+    next unless /^#define ([[:upper:]_\d]+)\s+(\d+)/a;
+    $defs{$1} = $2;
+    }
+
+  close(INCLUDE);
+  }
+
+GetOptions("verbose");
+while (scalar(@ARGV) > 0)
+  {
+  $file = shift @ARGV;
+
+  open my $fh, "+<", $file or die "Failed to open $file\n";
+
+  my @lines = <$fh>;
+  my $updated = 0;
+
+  foreach my $index (0 .. $#lines)
+    {
+    if ($lines[$index] =~ /^\.\\"\sDEFINE\s([[:upper:]_\d]+)$/a)
+      {
+      my $l = $index + 1;
+      die "Invalid DEFINE line $l of $file\n" unless defined $lines[$l];
+
+      my $key = $1;
+      die "Bad DEFINE key $key line $l of $file\n" unless exists $defs{$key};
+
+      my $value = $defs{$key};
+      if ($lines[$index + 1] !~ /^$value\b/)
+        {
+        $updated += $lines[$index + 1] =~ s/^\d+/$value/a;
+        print "Updated $key in $file to $value\n" if $opt_verbose;
+        }
+      }
+    }
+
+  if ($updated > 0)
+    {
+    seek($fh, 0, 0);
+    print $fh @lines;
+    truncate($fh, tell($fh));
+    }
+  close($fh);
+  }

maint/README

Lines changed: 4 additions & 0 deletions
@@ -60,6 +60,10 @@ GenerateUcpTables.py
 GenerateCommon.py and Unicode data files. The generated file contains tables
 for looking up Unicode property names.
 
+LintMan
+A Perl script to check and update magic numbers in the documentation that
+correspond to configurable settings in the codebase.
+
 manifest-*
 Data files used to verify the contents of the distribution tarball and
 `make install` file lists.

maint/UpdateAlways

Lines changed: 5 additions & 0 deletions
@@ -19,6 +19,8 @@
 
 # Detrail A Perl script that removes trailing spaces from files.
 
+# LintMan A Perl script that lints man pages looking for inconsistencies.
+
 # doc/index.html.src
 # A file that is copied as index.html into the doc/html directory
 # when the HTML documentation is built. It works like this so that
@@ -54,6 +56,9 @@ echo Processing documentation
 perl ../maint/CheckMan *.1 *.3
 if [ $? != 0 ] ; then exit 1; fi
 
+perl ../maint/LintMan -v *.3
+if [ $? != 0 ] ; then exit 1; fi
+
 # Verify the version number in the man pages
 
 for file in *.1 *.3 ; do

vms/configure.com

Lines changed: 1 addition & 1 deletion
@@ -905,7 +905,7 @@ sure both macros are undefined; an emulation function will then be used. */
 #define PCRE2_EXPORT
 #define LINK_SIZE 2
 #define MAX_NAME_COUNT 10000
-#define MAX_NAME_SIZE 32
+#define MAX_NAME_SIZE 128
 #define MATCH_LIMIT 10000000
 #define HEAP_LIMIT 20000000
 #define NEWLINE_DEFAULT 2
