Commit 2ebd55e

Explain (?<name>...) and \k<name> in vignette (#588)
Fixes #415
1 parent fc4e494 commit 2ebd55e

1 file changed: +20 -1 lines changed

vignettes/regular-expressions.Rmd

Lines changed: 20 additions & 1 deletion
@@ -249,7 +249,7 @@ str_extract(c("grey", "gray"), "gre|ay")
 str_extract(c("grey", "gray"), "gr(e|a)y")
 ```
 
-Parenthesis also define "groups" that you can refer to with __backreferences__, like `\1`, `\2` etc, and can be extracted with `str_match()`. For example, the following regular expression finds all fruits that have a repeated pair of letters:
+Parentheses also define "groups" that you can refer to with __backreferences__, like `\1`, `\2` etc, and can be extracted with `str_match()`. For example, the following regular expression finds all fruits that have a repeated pair of letters:
 
 ```{r}
 pattern <- "(..)\\1"
@@ -270,6 +270,25 @@ str_match(c("grey", "gray"), "gr(?:e|a)y")
 
 This is most useful for more complex cases where you need to capture matches and control precedence independently.
 
+You can use `(?<name>...)`, the named capture group, to provide a reference to the matched text. This is more readable and maintainable, especially with complex regular expressions, because you can reference the matched text by name instead of a potentially confusing numerical index.
+
+*Note: `<name>` should not include an underscore, because underscores are not supported in group names.*
+
+```{r}
+date_string <- "Today's date is 2025-09-19."
+pattern <- "(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})"
+str_match(date_string, pattern)
+```
+
+You can then use `\k<name>` to backreference the previously captured named group. It is an alternative to the standard numbered backreferences like `\1` or `\2`.
+
+```{r}
+text <- "This is is a test test with duplicates duplicates"
+pattern <- "(?<word>\\b\\w+\\b)\\s+\\k<word>"
+str_subset(text, pattern)
+str_match_all(text, pattern)
+```
+
 ## Anchors
 
 By default, regular expressions will match any part of a string. It's often useful to __anchor__ the regular expression so that it matches from the start or end of the string:
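
Outside the diff, the two constructs this commit documents, `(?<name>...)` and `\k<name>`, can be tried together. The following is a minimal sketch, assuming the stringr package is installed; the sample strings and the variable names (`dates`, `date_pattern`, `doubled`, `years`, `has_double`) are illustrative and do not come from the vignette.

```r
library(stringr)

# Illustrative input: one string with an ISO-style date, one without.
dates <- c("released 2025-09-19", "no date here")

# Named capture groups; the names contain letters only, since
# underscores are not supported in group names.
date_pattern <- "(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})"

m <- str_match(dates, date_pattern)

# Column 1 holds the full match and columns 2 to 4 hold the groups in
# order, so positional indexing works whether or not the installed
# version keeps the group names as column names (that is an assumption
# about the version in use).
years <- m[, 2]
years
# "2025" NA

# Named backreference: \k<word> must repeat the text captured by
# the group called "word".
doubled <- "\\b(?<word>\\w+)\\s+\\k<word>\\b"
has_double <- str_detect(c("a test test", "a test"), doubled)
has_double
# TRUE FALSE
```

Strings that do not match yield `NA` in every column of the `str_match()` result, which is why the second element of `years` is missing.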
