
Commit 38aa121

Functions fn:regex and ?matching-substrings
1 parent 6808359 commit 38aa121

2 files changed (+182, -0 lines)

specifications/xpath-functions-40/src/function-catalog.xml

Lines changed: 173 additions & 0 deletions
@@ -206,6 +206,98 @@
206 206
</fos:field>-->
207 207
</fos:record-type>
208 208

209+
<fos:record-type id="compiled-regex-record" extensible="false">
210+
<fos:summary>
211+
<p>This record type represents the results of processing a regular expression,
212+
together with a set of flags. It contains function items that can be used
213+
to evaluate the regular expression in various ways.</p>
214+
</fos:summary>
215+
<fos:field name="regex" type="xs:string" required="true">
216+
<fos:meaning>
217+
<p>The supplied regular expression, as a string.</p>
218+
</fos:meaning>
219+
</fos:field>
220+
<fos:field name="flags" type="xs:string" required="true">
221+
<fos:meaning>
222+
<p>The supplied flags, as a string; or the zero-length string if no flags were supplied.</p>
223+
</fos:meaning>
224+
</fos:field>
225+
<fos:field name="matches" type="fn($s as xs:string) as xs:boolean" required="true">
226+
<fos:meaning>
227+
<p>An arity-one function that can be used to test whether the supplied string <var>$s</var> matches
228+
the regular expression. If <var>R</var> is a <code>compiled-regex-record</code>
229+
produced by the function <function>fn:regex</function>, then the effect
230+
of the expression <code>R?matches($s)</code> is defined
231+
to be the same as the result of <code>fn:matches($s, R?regex, R?flags)</code>.</p>
232+
</fos:meaning>
233+
</fos:field>
234+
<fos:field name="tokenize" type="fn($s as xs:string) as xs:string*" required="true">
235+
<fos:meaning>
236+
<p>An arity-one function that can be used to split the supplied string <var>$s</var> on
237+
separators that match
238+
the regular expression. If <var>R</var> is a <code>compiled-regex-record</code>
239+
produced by the function <function>fn:regex</function>, then the effect
240+
of the expression <code>R?tokenize($s)</code> is defined
241+
to be the same as the result of <code>fn:tokenize($s, R?regex, R?flags)</code>.</p>
242+
</fos:meaning>
243+
</fos:field>
244+
<fos:field name="replace" type="fn($s as xs:string, $replacement as (xs:string | fn(xs:untypedAtomic, xs:untypedAtomic*) as item()?)?) as xs:string" required="true">
245+
<fos:meaning>
246+
<p>An arity-two function that can be used to replace parts of the supplied string <var>$s</var> that match
247+
the regular expression. If <var>R</var> is a <code>compiled-regex-record</code>
248+
produced by the function <function>fn:regex</function>, then the effect
249+
of the expression <code>R?replace($s, $rep)</code> is defined
250+
to be the same as the result of <code>fn:replace($s, R?regex, $rep, R?flags)</code>.</p>
251+
</fos:meaning>
252+
</fos:field>
253+
<fos:field name="analyze-string" type="fn($s as xs:string) as element(fn:analyze-string-result)" required="true">
254+
<fos:meaning>
255+
<p>An arity-one function that can be used to process a string against the regular expression
256+
and return all the matching and non-matching substrings, plus matching groups, in an XML structure.
257+
If <var>R</var> is a <code>compiled-regex-record</code>
258+
produced by the function <function>fn:regex</function>, then the effect
259+
of the expression <code>R?analyze-string($s)</code> is defined
260+
to be the same as the result of <code>fn:analyze-string($s, R?regex, R?flags)</code>.</p>
261+
</fos:meaning>
262+
</fos:field>
263+
<fos:field name="matching-segments" type="fn($s as xs:string) as fn:matching-segment-record*" required="true">
264+
<fos:meaning>
265+
<p>An arity-one function that can be used to process a string against the regular expression
266+
and return details of all the matching substrings, together with their captured groups.
267+
The result is returned as a sequence of records of type <code>fn:matching-segment-record</code>.
268+
</p>
269+
</fos:meaning>
270+
</fos:field>
271+
</fos:record-type>
272+
273+
<fos:record-type id="matching-segment-record" extensible="false">
274+
<fos:summary>
275+
<p>This record type represents a segment of an input string that matched a regular expression,
276+
together with any captured groups found during the matching process.</p>
277+
</fos:summary>
278+
<fos:field name="substring" type="xs:string" required="true">
279+
<fos:meaning>
280+
<p>The substring of the original input that matched the regular expression. This may be a zero-length string.</p>
281+
</fos:meaning>
282+
</fos:field>
283+
<fos:field name="position" type="xs:integer" required="true">
284+
<fos:meaning>
285+
<p>The 1-based position within the original input string at which the matching segment starts.</p>
286+
</fos:meaning>
287+
</fos:field>
288+
<fos:field name="groups" type="map(xs:integer, record(group as xs:string, position as xs:integer))" required="true">
289+
<fos:meaning>
290+
<p>The captured groups for this matching segment. This is represented as a map from the group number
291+
(corresponding to the sequence of left parentheses introducing capturing subexpressions of the
292+
regular expression) to details of the captured group: specifically, the string content of the capture
293+
(<code>group</code>) and its 1-based start position within the original input string (<code>position</code>). Note that
294+
where groups are captured within a lookahead, the captured group is not necessarily a substring
295+
of the matching segment.</p>
296+
</fos:meaning>
297+
</fos:field>
298+
</fos:record-type>
299+
300+
209 301
<fos:record-type id="load-xquery-module-record" extensible="false">
210 302
<fos:summary>
211 303
<p>This record type is used to hold the result of the <function>fn:load-xquery-module</function> function.</p>
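To make the intended contents of fn:matching-segment-record concrete, here is a rough Python sketch that builds equivalent values with the standard re module. It is an illustration only, not a conforming implementation: the names CapturedGroup, MatchingSegment and matching_segments are hypothetical, Python's regular-expression dialect is not the XPath/XSD dialect, and leaving non-participating groups out of the groups map is an assumption that the record definition above does not settle. Python string indices count code points, so adding 1 to a match offset gives the 1-based position used by the record.

```python
import re
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class CapturedGroup:
    group: str     # string content of the capture
    position: int  # 1-based start position within the original input


@dataclass(frozen=True)
class MatchingSegment:
    substring: str                    # matched text; may be zero-length
    position: int                     # 1-based start position of the match
    groups: Dict[int, CapturedGroup]  # keyed by capturing-group number


def matching_segments(pattern: "re.Pattern[str]", s: str) -> List[MatchingSegment]:
    """Hypothetical helper: one MatchingSegment per non-overlapping match of pattern in s."""
    segments = []
    for m in pattern.finditer(s):
        groups = {}
        for i in range(1, pattern.groups + 1):
            # Assumption: a group that did not take part in the match is omitted.
            if m.start(i) != -1:
                groups[i] = CapturedGroup(m.group(i), m.start(i) + 1)
        # Python offsets are 0-based code-point indices; the record uses 1-based positions.
        segments.append(MatchingSegment(m.group(0), m.start() + 1, groups))
    return segments


if __name__ == "__main__":
    for seg in matching_segments(re.compile(r"([A-Z])([0-9]+)"), "A1,C15,,D24, X50,"):
        print(seg)
```

Applied to the input of the matching-segments example for fn:regex in this commit, the sketch reports segments at positions 1, 4, 9 and 14, with group positions agreeing with the expected result given there.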
@@ -7902,6 +7994,86 @@ Tak, tak, tak! - da kommen sie.
7902 7994
</fos:change>
7903 7995
</fos:changes>
7904 7996
</fos:function>
7997+
7998+
<fos:function name="regex" prefix="fn">
7999+
<fos:signatures>
8000+
<fos:proto name="regex" return-type="fn:compiled-regex-record">
8001+
<fos:arg name="pattern" type="xs:string"/>
8002+
<fos:arg name="flags" type="xs:string?" default='""'/>
8003+
</fos:proto>
8004+
</fos:signatures>
8005+
<fos:properties>
8006+
<fos:property>deterministic</fos:property>
8007+
<fos:property>context-independent</fos:property>
8008+
<fos:property>focus-independent</fos:property>
8009+
</fos:properties>
8010+
<fos:summary>
8011+
<p>Processes a regular expression, plus flags, into a form suitable for repeated use.</p>
8012+
</fos:summary>
8013+
<fos:rules>
8014+
<p>The value of <code>$pattern</code> must be a regular expression as defined
8015+
in <specref ref="regex-syntax"/>.</p>
8016+
<p>The value of <code>$flags</code> (if supplied) must be a string representing
8017+
a set of flags as defined in <specref ref="flags"/>.</p>
8018+
<p>The function returns a record whose fields include function items that can
8019+
be used to evaluate the regular expression in various ways. The structure
8020+
of this record is described in <specref ref="compiled-regex-record"/>.</p>
8021+
</fos:rules>
8022+
<fos:errors>
8023+
<p>A dynamic error is raised <errorref class="RX" code="0002"
8024+
/> if the value of
8025+
<code>$pattern</code> is invalid according to the rules described in section <specref
8026+
ref="regex-syntax"/>.</p>
8027+
<p>A dynamic error is raised <errorref class="RX" code="0001"
8028+
/> if the value of
8029+
<code>$flags</code> is invalid according to the rules described in section <specref
8030+
ref="flags"/>.</p>
8031+
</fos:errors>
8032+
<fos:notes>
8033+
<p>The function is designed to allow an implementation to process the regular expression
8034+
into an internal compiled form, which may yield performance benefits if it is used repeatedly.</p>
8035+
</fos:notes>
8036+
<fos:examples>
8037+
<fos:example>
8038+
<fos:test>
8039+
<fos:expression><eg>let $regex := regex('^[a-z]+$')
8040+
return tokenize("A group of 10 students")[$regex ? matches(.)]</eg></fos:expression>
8041+
<fos:result><eg>"group", "of", "students"</eg></fos:result>
8042+
</fos:test>
8043+
<fos:test>
8044+
<fos:expression><eg>let $regex := regex('^[a-z]')
8045+
return tokenize("A group of 10 students") ! $regex ? replace(., fn($match, $groups) { upper-case($match) })</eg></fos:expression>
8046+
<fos:result><eg>"A", "Group", "Of", "10", "Students"</eg></fos:result>
8047+
</fos:test>
8048+
<fos:test>
8049+
<fos:expression><eg>regex("([A-Z])([0-9]+)") ? matching-segments("A1,C15,,D24, X50,")</eg></fos:expression>
8050+
<fos:result><eg>
8051+
{ "substring": "A1",
8052+
"position": 1,
8053+
"groups": {1: { "group": "A", "position": 1 },
8054+
2: { "group": "1", "position": 2 }}
8055+
},
8056+
{ "substring": "C15",
8057+
"position": 4,
8058+
"groups": {1: { "group": "C", "position": 4 },
8059+
2: { "group": "15", "position": 5 }}
8060+
},
8061+
{ "substring": "D24",
8062+
"position": 9,
8063+
"groups": {1: { "group": "D", "position": 9 },
8064+
2: { "group": "24", "position": 10 }}
8065+
},
8066+
{ "substring": "X50",
8067+
"position": 14,
8068+
"groups": {1: { "group": "X", "position": 14 },
8069+
2: { "group": "50", "position": 15 }}
8070+
}</eg>
8071+
</fos:result>
8072+
</fos:test>
8073+
</fos:example>
8074+
</fos:examples>
8075+
</fos:function>
8076+
7905 8077
<fos:function name="analyze-string" prefix="fn">
7906 8078
<fos:signatures>
7907 8079
<fos:proto name="analyze-string" return-type="element(fn:analyze-string-result)">
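The compile-once idea behind fn:regex can be sketched in Python: the pattern and flags are checked and compiled a single time, and the returned record holds closures over the compiled pattern. This is an analogue built on the standard re module, not a conforming implementation: CompiledRegex, regex and the flag table are hypothetical, only the matches field and the function form of replace are modelled, the x flag is left out because re.VERBOSE behaves differently, and the re dialect differs from XPath's in places (for example, Python's $ also matches just before a final newline).

```python
import re
from typing import Callable, List, NamedTuple, Optional

# Replacement callback: receives the matched substring and the captured groups.
Action = Callable[[str, List[str]], Optional[str]]


class CompiledRegex(NamedTuple):
    """Hypothetical analogue of a subset of fn:compiled-regex-record."""
    regex: str
    flags: str
    matches: Callable[[str], bool]
    replace: Callable[[str, Action], str]


# Assumed mapping from XPath flag letters to Python re flags ('x' omitted).
_FLAGS = {"s": re.DOTALL, "m": re.MULTILINE, "i": re.IGNORECASE}


def regex(pattern: str, flags: str = "") -> CompiledRegex:
    re_flags = 0
    source = pattern
    for f in flags:
        if f == "q":
            source = re.escape(pattern)  # 'q': treat the pattern as a literal string
        elif f in _FLAGS:
            re_flags |= _FLAGS[f]
        else:
            raise ValueError(f"unsupported flag {f!r}")  # stand-in for err:FORX0001
    # Compiled once; an invalid pattern raises re.error (stand-in for err:FORX0002).
    compiled = re.compile(source, re_flags)

    def matches(s: str) -> bool:
        # As with fn:matches, any matching substring is enough.
        return compiled.search(s) is not None

    def replace(s: str, action: Action) -> str:
        def substitute(m: "re.Match[str]") -> str:
            # Groups that did not participate are passed as "" (an assumption).
            result = action(m.group(0), [g or "" for g in m.groups()])
            return "" if result is None else str(result)
        return compiled.sub(substitute, s)

    return CompiledRegex(pattern, flags, matches, replace)


words = "A group of 10 students".split()
lower = regex("^[a-z]+$")
print([w for w in words if lower.matches(w)])  # ['group', 'of', 'students']
initial = regex("^[a-z]")
print([initial.replace(w, lambda match, groups: match.upper()) for w in words])
# ['A', 'Group', 'Of', '10', 'Students']
```

The two print calls at the end mirror the intent of the first two fn:regex examples in this commit, with str.split() standing in for single-argument fn:tokenize.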
@@ -7922,6 +8094,7 @@ Tak, tak, tak! - da kommen sie.
7922 8094
regular expression.</p>
7923 8095
</fos:summary>
7924 8096
<fos:rules>
8097+
7925 8098
<p>If the <code>$flags</code> argument is omitted or if it is the empty sequence,
7926 8099
the effect is the same as setting <code>$flags</code> to a zero-length string.
7927 8100
Flags are defined in <specref ref="flags"/>.</p>

specifications/xpath-functions-40/src/xpath-functions.xml

Lines changed: 9 additions & 0 deletions
@@ -4087,6 +4087,15 @@ WildcardEsc ::= '.'
4087 4087
<div3 id="func-analyze-string">
4088 4088
<head><?function fn:analyze-string?></head>
4089 4089
</div3>
4090+
<div3 id="func-regex">
4091+
<head><?function fn:regex?></head>
4092+
</div3>
4093+
<div3 id="compiled-regex-record">
4094+
<head><?record-description compiled-regex-record?></head>
4095+
</div3>
4096+
<div3 id="matching-segment-record">
4097+
<head><?record-description matching-segment-record?></head>
4098+
</div3>
4090 4099

4091 4100

4092 4101
</div2>
