Commit 233e864

Merge pull request #4734 from linuxfoundation/unicron-support-empty-patterns-in-skip-cla
Add support for patterns matching empty or missing properties: login, email, name
2 parents 370ffc0 + c07826e commit 233e864

5 files changed: 43 additions and 30 deletions

BOT_ALLOWLIST.md

Lines changed: 15 additions & 14 deletions
@@ -6,26 +6,27 @@ This can be done on the GitHub organization level by setting the `skip_cla` prop

 Replace `{stage}` with either `dev` or `prod`.

-This property is a Map attribute that contains mapping from repository pattern to bot GitHub login, email and name pattern.
+This property is a map attribute that contains mapping from repository pattern to bot GitHub login, email and name pattern.

 Example `login` is `lukaszgryglicki` (like any `login` that can be accessed via `https://github.com/login`).

 This is sometimes called `username` but we use `login` to avoid confusion with the `name` attribute.

 Example name is `"Lukasz Gryglicki"`.

-Email pattern and name pattern are optional and `*` is assumed for them if not specified.
+Email pattern and name pattern are optional and `""` (empty) is assumed for them if not specified.

 Each pattern is a string and can be one of three possible types (and are checked tin this order):
 - `"name"` - exact match for repository name, GitHub login, email address, GitHub name.
+- `""` - (empty string) pattern is special and it matches missing property, property with null value or property with empty string value.
 - `"re:regexp"` - regular expression match for repository name, GitHub login, name, or email address.
 - `"*"` - matches all.

 So the format is like `"repository_pattern": "login_pattern;email_pattern;name_pattern"`. `;` is used as a separator.

 You can also specify multiple patterns so different set is used for multiple users - in such case configuration must start with `[`, end with `]` and be `||` separated.

-For example: `"[copilot-swe-agent[bot];*;*||re:(?i)^l(ukasz)?gryglicki$;*;re:Gryglicki]"`.
+For example: `"[;*;copilot-swe-agent[bot];||re:(?i)^l(ukasz)?gryglicki$;*;re:Gryglicki]"`.

 Full format is like `"repository_pattern": "[login_pattern;email_pattern;name_pattern||..]"`.

@@ -48,7 +49,7 @@ Example:
 "skip_cla": {
 "M": {
 "*": {
-"S": "*;re:^\\d+\\+Copilot@users\\.noreply\\.github\\.com$;*"
+"S": ";re:^\\d+\\+Copilot@users\\.noreply\\.github\\.com$;copilot-swe-agent[bot]"
 },
 "re:(?i)^repo[0-9]+$": {
 "S": "re:vee?rendra;*;*"
@@ -65,19 +66,19 @@ Algorithm to match pattern is as follows:
 - First we check repository name for exact match. Repository name is without the organization name, so for `https://github.com/linuxfoundation/easycla` it is just `easycla`. If we find an entry in `skip_cla` for `easycla` that entry is used and we stop searching.
 - If no exact match is found, we check for regular expression match. Only keys starting with `re:` are considered. If we find a match, we use that entry and stop searching.
 - If no match is found, we check for `*` entry. If it exists, we use that entry and stop searching.
-- If no match is found, we don't skip CLA check.
+- If no match is found, we don't skip CLA check for any author.
 - Now when we have the entry, it is in the following format: `login_pattern;email_pattern;name_pattern` or `"[login_pattern;email_pattern;name_pattern||...]" (array)`.
-- We check GitHub login, email address and name against the patterns. Algorithm is the same - login, email and name patterns can be either direct match or `re:regexp` or `*`.
+- We check GitHub login, email address and name against the patterns. Algorithm is the same - login, email and name patterns can be either direct match ("" is a special case that also matches missing or null) or `re:regexp` or `*`.
 - If login, email and name match the patterns, we skip CLA check. If login, email or name is not set but the pattern is `*` it means hit.
-- So setting pattern to `login_pattern;*;*` or `login_pattern` (which is equivalent) means that we only check for login match and assume all emails and names are valid.
+- So setting pattern to `login_pattern;*;*` means that we only check for login match and assume all emails and names are valid.
 - Any actor that matches any of the entries in the array will be skipped (logical OR).
 - If we set `repo_pattern` to `*` it means that this configuration applies to all repositories in the organization.
 - If there are also specific repository patterns, they will be used instead of `*` (fallback for all).


 There is a script that allows you to update the `skip_cla` property in the DynamoDB table. It is located in `utils/skip_cla_entry.sh`. You can run it like this:
 - `` MODE=mode ./utils/skip_cla_entry.sh 'org-name' 'repo-pattern' 'login-pattern;email-pattern;name_pattern' ``.
-- `` MODE=add-key ./utils/skip_cla_entry.sh 'sun-test-org' '*' 'copilot-swe-agent[bot];*;*' ``.
+- `` MODE=add-key ./utils/skip_cla_entry.sh 'sun-test-org' '*' ';*;copilot-swe-agent[bot]' ``.
 - Complex example: `` MODE=add-key ./utils/skip_cla_entry.sh 'sun-test-org' 're:(?i)^repo[0-9]+$' '[re:(?i)^l(ukasz)?gryglicki$;re:(?i)^l(ukasz)?gryglicki@;*||copilot-swe-agent[bot]]' ``.

 `MODE` can be one of:
@@ -107,7 +108,7 @@ aws --profile "lfproduct-prod" --region "us-east-1" dynamodb update-item \
 --key '{"organization_name": {"S": "linuxfoundation"}}' \
 --update-expression "SET skip_cla.#repo = :val" \
 --expression-attribute-names '{"#repo": "re:^easycla"}' \
---expression-attribute-values '{":val": {"S": "some-github-login"}}'
+--expression-attribute-values '{":val": {"S": "some-github-login;*;*"}}'
 ```

 To delete a key from an existing `skip_cla` entry:
@@ -143,14 +144,14 @@ To check for log entries related to skipping CLA check, you can use the followin

 To add first `skip_cla` value for an organization:
 ```
-aws --profile lfproduct-prod --region us-east-1 dynamodb update-item --table-name "cla-prod-github-orgs" --key '{"organization_name": {"S": "open-telemetry"}}' --update-expression 'SET skip_cla = :val' --expression-attribute-values '{":val": {"M": {"otel-arrow":{"S":"*;re:^\\d+\\+Copilot@users\\.noreply\\.github\\.com$;*"}}}}'
-aws --profile lfproduct-prod --region us-east-1 dynamodb update-item --table-name "cla-prod-github-orgs" --key '{"organization_name": {"S": "openfga"}}' --update-expression 'SET skip_cla = :val' --expression-attribute-values '{":val": {"M": {"vscode-ext":{"S":"*;re:^\\d+\\+Copilot@users\\.noreply\\.github\\.com$;*"}}}}'
+aws --profile lfproduct-prod --region us-east-1 dynamodb update-item --table-name "cla-prod-github-orgs" --key '{"organization_name": {"S": "open-telemetry"}}' --update-expression 'SET skip_cla = :val' --expression-attribute-values '{":val": {"M": {"otel-arrow":{"S":";re:^\\d+\\+Copilot@users\\.noreply\\.github\\.com$;copilot-swe-agent[bot]"}}}}'
+aws --profile lfproduct-prod --region us-east-1 dynamodb update-item --table-name "cla-prod-github-orgs" --key '{"organization_name": {"S": "openfga"}}' --update-expression 'SET skip_cla = :val' --expression-attribute-values '{":val": {"M": {"vscode-ext":{"S":";re:^\\d+\\+Copilot@users\\.noreply\\.github\\.com$;copilot-swe-agent[bot]"}}}}'
 ```

 To add additional repositories entries without overwriting the existing `skip_cla` value:
 ```
-aws --profile lfproduct-prod --region us-east-1 dynamodb update-item --table-name "cla-prod-github-orgs" --key '{"organization_name": {"S": "open-telemetry"}}' --update-expression 'SET skip_cla.#repo = :val' --expression-attribute-names '{"#repo": "*"}' --expression-attribute-values '{":val": {"S": "*;re:^\\d+\\+Copilot@users\\.noreply\\.github\\.com$"}}'
-aws --profile lfproduct-prod --region us-east-1 dynamodb update-item --table-name "cla-prod-github-orgs" --key '{"organization_name": {"S": "openfga"}}' --update-expression 'SET skip_cla.#repo = :val' --expression-attribute-names '{"#repo": "*"}' --expression-attribute-values '{":val": {"S": "*;re:^\\d+\\+Copilot@users\\.noreply\\.github\\.com$"}}'
+aws --profile lfproduct-prod --region us-east-1 dynamodb update-item --table-name "cla-prod-github-orgs" --key '{"organization_name": {"S": "open-telemetry"}}' --update-expression 'SET skip_cla.#repo = :val' --expression-attribute-names '{"#repo": "*"}' --expression-attribute-values '{":val": {"S": ";re:^\\d+\\+Copilot@users\\.noreply\\.github\\.com$;copilot-swe-agent[bot]"}}'
+aws --profile lfproduct-prod --region us-east-1 dynamodb update-item --table-name "cla-prod-github-orgs" --key '{"organization_name": {"S": "openfga"}}' --update-expression 'SET skip_cla.#repo = :val' --expression-attribute-names '{"#repo": "*"}' --expression-attribute-values '{":val": {"S": ";re:^\\d+\\+Copilot@users\\.noreply\\.github\\.com$;copilot-swe-agent[bot]"}}'
 ```

 To delete a specific repo entry from `skip_cla`:
@@ -175,6 +176,6 @@ aws --profile "lfproduct-prod" dynamodb scan --table-name "cla-prod-github-orgs"

 Typical adding a new entry for an organization:
 ```
-STAGE=prod MODE=add-key DEBUG=1 ./utils/skip_cla_entry.sh 'open-telemetry' 'opentelemetry-rust' '*;re:^\d+\+Copilot@users\.noreply\.github\.com$;copilot-swe-agent[bot]'
+STAGE=prod MODE=add-key DEBUG=1 ./utils/skip_cla_entry.sh 'open-telemetry' 'opentelemetry-rust' ';re:^\d+\+Copilot@users\.noreply\.github\.com$;copilot-swe-agent[bot]'
 ```
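
The repository lookup order described in the BOT_ALLOWLIST.md changes above (exact key first, then `re:` keys, then the `*` fallback) can be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the project's code: the function name `select_repo_entry`, the plain-dict shape of `skip_cla`, the use of `re.search`, and the choice to try regex keys in dictionary order are all hypothetical.

```python
import re


def select_repo_entry(skip_cla, repo_full):
    """Return the skip_cla value that applies to a repository, or None.

    Order (as documented): exact key, then keys starting with 're:',
    then the '*' fallback. Names and data shape are assumptions.
    """
    # Keys never include the organization, so drop an "org/" prefix if present.
    repo = repo_full.split("/")[-1]

    # 1. Exact repository name.
    if repo in skip_cla:
        return skip_cla[repo]

    # 2. Regular-expression keys (assumed: first matching key wins).
    for key, value in skip_cla.items():
        if key.startswith("re:") and re.search(key[3:], repo):
            return value

    # 3. Wildcard fallback; None means the CLA check is not skipped for anyone.
    return skip_cla.get("*")


if __name__ == "__main__":
    config = {
        "*": "some-bot-login;*;*",
        "repo1": "re:vee?rendra;*;*",
        "re:(?i)^repo[0-9]+$": "[a;*;*||b;*;*]",
    }
    print(select_repo_entry(config, "repo1"))                # exact key wins
    print(select_repo_entry(config, "sun-test-org/Repo22"))  # regex key
    print(select_repo_entry(config, "easycla"))              # '*' fallback
```

Note that `repo1` also matches the regex key in this example, but the exact key takes precedence because it is checked first.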

cla-backend-go/github/bots.go

Lines changed: 10 additions & 5 deletions
@@ -16,6 +16,7 @@ import (

 // propertyMatches returns true if value matches the pattern.
 // - "*" matches anything
+// - "" matches empty value
 // - "re:..." matches regex (value must be non-empty)
 // - otherwise, exact match
 func propertyMatches(pattern, value string) bool {
@@ -27,6 +28,9 @@ func propertyMatches(pattern, value string) bool {
 	if pattern == "*" {
 		return true
 	}
+	if pattern == "" && value == "" {
+		return true
+	}
 	if value == "" {
 		return false
 	}
@@ -54,12 +58,12 @@ func stripOrg(repoFull string) string {

 // isActorSkipped returns true if the actor should be skipped according to ANY pattern in config.
 // Each config entry is "<login_pattern>;<email_pattern>;<name_pattern>"
-// Any missing pattern defaults to "*"
+// Any missing pattern defaults to "" which is special and matches missing property, null property value or empty string property value
 func isActorSkipped(actor *UserCommitSummary, config []string) bool {
 	for _, pattern := range config {
 		parts := strings.Split(pattern, ";")
 		for len(parts) < 3 {
-			parts = append(parts, "*")
+			parts = append(parts, "")
 		}
 		loginPattern, emailPattern, namePattern := parts[0], parts[1], parts[2]

@@ -143,9 +147,10 @@ func parseConfigPatterns(config string) []string {
 // - repo-name is the exact repository name under given org (e.g., "my-repo" not "my-org/my-repo")
 // - re:repo-regexp is a regex pattern to match repository names
 // - * is a wildcard that applies to all repositories
-// - <login_pattern> is a GitHub login pattern (exact match or regex prefixed by re: or match all '*')
-// - <email_pattern> is a GitHub email pattern (exact match or regex prefixed by re: or match all '*') if not specified defaults to '*'
-// - <name_pattern> is a GitHub name pattern (exact match or regex prefixed by re: or match all '*') if not specified defaults to '*'
+// - <login_pattern> is a GitHub login pattern (exact match or regex prefixed by re: or match all '*') if not specified defaults to ""
+// - <email_pattern> is a GitHub email pattern (exact match or regex prefixed by re: or match all '*') if not specified defaults to ""
+// - <name_pattern> is a GitHub name pattern (exact match or regex prefixed by re: or match all '*') if not specified defaults to ""
+// "" matches empty value, null value or missing property
 // The login, email and name patterns are separated by a semicolon (;). Email and name parts are optional.
 // There can be an array of patterns for a single repository, separated by ||. It must start with a '[' and end with a ']': "[...||...||...]"
 // If the skip_cla is not set, it will skip the allowlisted bots check.
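
The value format described in the comments above (a single `login;email;name` entry, or a `[...||...]` array of such entries, with missing parts now defaulting to `""`) could be parsed along these lines. This is a hedged Python sketch rather than a port of `parseConfigPatterns`: the function name `parse_config_patterns` and the tuple return type are hypothetical, and it assumes patterns never contain `;` or `||` themselves.

```python
def parse_config_patterns(config):
    """Split a skip_cla value into (login, email, name) pattern triples.

    Assumptions (not confirmed by the source): no escaping of ';' or '||',
    and parts beyond the third are ignored.
    """
    # Only the outermost brackets mark an array; names such as
    # "copilot-swe-agent[bot]" end with ']' but do not start with '['.
    if config.startswith("[") and config.endswith("]"):
        entries = config[1:-1].split("||")
    else:
        entries = [config]

    triples = []
    for entry in entries:
        parts = entry.split(";")
        # After this commit, missing patterns default to "" (matches only
        # a missing, null, or empty property) instead of "*".
        while len(parts) < 3:
            parts.append("")
        triples.append(tuple(parts[:3]))
    return triples


if __name__ == "__main__":
    print(parse_config_patterns("some-github-login"))
    print(parse_config_patterns(
        "[re:(?i)^l(ukasz)?gryglicki$;re:(?i)^l(ukasz)?gryglicki@;*||copilot-swe-agent[bot]]"
    ))
```

Under these assumptions a bare `some-github-login` expands to `some-github-login;;`, which is why the diff rewrites such entries as `some-github-login;*;*` to keep the previous behavior.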

cla-backend-go/swagger/common/github-organization.yaml

Lines changed: 8 additions & 6 deletions
@@ -77,24 +77,26 @@ properties:
 Map of repository name or pattern (e.g. 'repo1', '*', 're:pattern') to a string or array-string of pattern entries for skipping CLA checks for certain bots.

 Each value can be either:
-- A string in the form '<login_pattern>;<email_pattern>;<name_pattern>' (email and name patterns are optional, default to '*').
+- A string in the form '<login_pattern>;<email_pattern>;<name_pattern>' (email and name patterns are optional, default to '').
 - Or an OR-array in the form '[<entry1>||<entry2>||...]', where each entry uses the same pattern format above.

 Patterns can be:
 - An exact match (e.g. 'repo1', 'login', 'Name Surname', 'email@domain').
+- A special case of exact match is '' pattern - it matches empty string, null property value or missing property.
 - A regular expression prefixed with 're:' (e.g. 're:(?i)^bot.*$').
 - A wildcard '*' to match all.

 Example formats:
-- "copilot-swe-agent[bot];*;*"
+- ";*;copilot-swe-agent[bot]"
 - "re:vee?rendra;*;*"
 - "[re:(?i)^l(ukasz)?gryglicki$;re:(?i)^l(ukasz)?gryglicki@;*||copilot-swe-agent[bot]]"
-- "login;*"
+- "login;*;*"
 - "login;[email protected];Real Name"
 example:
-"*": "copilot-swe-agent[bot];*;*"
-"repo1": "re:vee?rendra;*;*"
-"re:(?i)^repo[0-9]+$": "[re:(?i)^l(ukasz)?gryglicki$;re:(?i)^l(ukasz)?gryglicki@;*||copilot-swe-agent[bot]]"
+'*': 'some-bot-login;*;*'
+'repo1': 're:vee?rendra;*;*'
+'re:(?i)^repo[0-9]+$': '[re:(?i)^l(ukasz)?gryglicki$;re:(?i)^l(ukasz)?gryglicki@;*||;re:^\d+\+Copilot@users\.noreply\.github\.com$;copilot-swe-agent[bot]]'
+
 repositories:
 type: object
 properties:

cla-backend/cla/models/github_models.py

Lines changed: 9 additions & 5 deletions
@@ -932,12 +932,15 @@ def property_matches(self, pattern, value):
         """
         Returns True if value matches the pattern.
         - '*' matches anything
+        - '' matches None or empty string
         - 're:...' matches regex - value must be set
         - otherwise, exact match
         """
         try:
             if pattern == '*':
                 return True
+            if pattern == '' and (value is None or value == ''):
+                return True
             if value is None or value == '':
                 return False
             if pattern.startswith('re:'):
@@ -952,7 +955,7 @@ def is_actor_skipped(self, actor, config):
         """
         Returns True if the actor should be skipped (allowlisted) based on config pattern.
         config: '<login_pattern>;<email_pattern>;<name_pattern>'
-        If any pattern is missing, it defaults to '*'
+        If any pattern is missing, it defaults to '' which is special and matches None or empty string.
         It returns true if ANY config entry matches or false if there is no match in any config entry.
         """
         try:
@@ -965,7 +968,7 @@ def is_actor_skipped(self, actor, config):
             # Otherwise, treat as string pattern
             parts = config.split(';')
             while len(parts) < 3:
-                parts.append('*')
+                parts.append('')
             login_pattern, email_pattern, name_pattern = parts[:3]
             login = getattr(actor, "author_login", None)
             email = getattr(actor, "author_email", None)
@@ -1028,9 +1031,10 @@ def skip_allowlisted_bots(self, org_model, org_repo, actors_missing_cla) -> Tupl
         - repo-name is the exact repository name under given org (e.g., "my-repo" not "my-org/my-repo")
         - re:repo-regexp is a regex pattern to match repository names
         - * is a wildcard that applies to all repositories
-        - <login_pattern> is a GitHub login pattern (exact match or regex prefixed by re: or match all '*')
-        - <email_pattern> is a GitHub email pattern (exact match or regex prefixed by re: or match all '*') - defaults to '*' if not set
-        - <name_pattern> is a GitHub name pattern (exact match or regex prefixed by re: or match all '*') - defaults to '*' if not set
+        - <login_pattern> is a GitHub login pattern (exact match or regex prefixed by re: or match all '*') - defaults to '' if not set
+        - <email_pattern> is a GitHub email pattern (exact match or regex prefixed by re: or match all '*') - defaults to '' if not set
+        - <name_pattern> is a GitHub name pattern (exact match or regex prefixed by re: or match all '*') - defaults to '' if not set
+        :note: '' is a special pattern that matches None or empty string.
         :note: The login (sometimes called username it's the same), email and name patterns are separated by a semicolon (;).
         :note: There can be an array of patterns - it must start with [ and with ] and be || separated.
         :note: If the skip_cla is not set, it will skip the allowlisted bots check.
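
To make the behavior change in this file easier to see, here is a small standalone Python sketch of the matching rules as the docstrings above describe them. It is an approximation, not the method bodies from `github_models.py`: it takes login, email and name as plain arguments instead of an actor object, and it assumes `re.search` for `re:` patterns and case-sensitive comparison for exact matches, neither of which is shown in the diff.

```python
import re


def property_matches(pattern, value):
    """'*' matches anything, '' matches None/empty, 're:' is a regex, else exact."""
    if pattern == "*":
        return True
    if pattern == "" and (value is None or value == ""):
        return True
    if value is None or value == "":
        return False
    if pattern.startswith("re:"):
        # Assumption: unanchored search; anchors come from the pattern itself.
        return re.search(pattern[3:], value) is not None
    return value == pattern


def is_actor_skipped(login, email, name, entry):
    """Check one 'login;email;name' entry; missing parts default to ''."""
    parts = entry.split(";")
    while len(parts) < 3:
        parts.append("")
    login_p, email_p, name_p = parts[:3]
    return (
        property_matches(login_p, login)
        and property_matches(email_p, email)
        and property_matches(name_p, name)
    )


if __name__ == "__main__":
    copilot = r";re:^\d+\+Copilot@users\.noreply\.github\.com$;copilot-swe-agent[bot]"
    mail = "198982749+Copilot@users.noreply.github.com"
    # The bot commit has no login, so the empty login pattern matches it.
    print(is_actor_skipped(None, mail, "copilot-swe-agent[bot]", copilot))       # True
    # A real user with a login no longer slips through the empty pattern.
    print(is_actor_skipped("someone", mail, "copilot-swe-agent[bot]", copilot))  # False
    # A bare login now means "email and name must be empty".
    print(is_actor_skipped("bot", "bot@example.com", "Bot", "bot"))              # False
    print(is_actor_skipped("bot", "bot@example.com", "Bot", "bot;*;*"))          # True
```

The last two calls show why existing entries that relied on the old `*` default need to be rewritten with explicit `;*;*` suffixes after this change.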

utils/skip_cla_entry.sh

Lines changed: 1 addition & 0 deletions
@@ -11,6 +11,7 @@
 # ./utils/scan.sh github-orgs organization_name sun-test-org
 # STAGE=dev DTFROM='1 hour ago' DTTO='1 second ago' ./utils/search_aws_log_group.sh 'cla-backend-dev-githubactivity' 'skip_cla'
 # MODE=delete-key ./utils/skip_cla_entry.sh 'sun-test-org' 're:(?i)^repo[0-9]+$'
+# STAGE=dev MODE=add-key DEBUG=1 ./utils/skip_cla_entry.sh 'sun-test-org' 'repo1' 'thakurveerendras;;*'
 # STAGE=prod MODE=add-key DEBUG=1 ./utils/skip_cla_entry.sh 'open-telemetry' 'opentelemetry-rust' '*;re:^\d+\+Copilot@users\.noreply\.github\.com$;copilot-swe-agent[bot]'

 if [ -z "$MODE" ]