Commit e4a0d08

Merge pull request #4730 from linuxfoundation/unicron-update-skiping-bots-cla-documentation
Clarify what login is used for vs username, account for GitHub bots (like Copilot) having no login but only email
2 parents 1c5dbae + c8eceff commit e4a0d08

4 files changed, +62 -46 lines changed

WHITELISTING_BOTS.md

Lines changed: 27 additions & 22 deletions
Original file line number | Diff line number | Diff line change
@@ -6,33 +6,35 @@ This can be done on the GitHub organization level by setting the `skip_cla` prop
66

77
Replace `{stage}` with either `dev` or `prod`.
88

9-
This property is a Map attribute that contains mapping from repository pattern to bot username (GitHub login), email and name pattern.
9+
This property is a Map attribute that contains mapping from repository pattern to bot GitHub login, email and name pattern.
1010

11-
Example `username/login` is `lukaszgryglicki` (like any `username/login` that can be accessed via `https://github.com/username`).
11+
Example `login` is `lukaszgryglicki` (like any `login` that can be accessed via `https://github.com/login`).
12+
13+
This is sometimes called `username` but we use `login` to avoid confusion with the `name` attribute.
1214

1315
Example name is `"Lukasz Gryglicki"`.
1416

1517
Email pattern and name pattern are optional and `*` is assumed for them if not specified.
1618

1719
Each pattern is a string and can be one of three possible types (and are checked tin this order):
18-
- `"name"` - exact match for repository name, GitHub login/username, email address, GitHub name.
19-
- `"re:regexp"` - regular expression match for repository name, GitHub username, or email address.
20+
- `"name"` - exact match for repository name, GitHub login, email address, GitHub name.
21+
- `"re:regexp"` - regular expression match for repository name, GitHub login, name, or email address.
2022
- `"*"` - matches all.
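The three pattern types above can be sketched as a single predicate. This is a minimal illustration, not the repository's implementation: the name `pattern_matches` is hypothetical, and the use of `re.search` (rather than a full match) is an assumption based on unanchored examples such as `re:Gryglicki`.

```python
import re
from typing import Optional


def pattern_matches(pattern: str, value: Optional[str]) -> bool:
    """Hypothetical helper: test one value against one skip_cla pattern."""
    # 1. Exact match (only possible when the value is set).
    if value is not None and pattern == value:
        return True
    # 2. Regular expression, marked by the "re:" prefix.
    #    Assumption: search semantics, so anchors must be written explicitly.
    if value is not None and pattern.startswith("re:"):
        return re.search(pattern[3:], value) is not None
    # 3. Wildcard: matches everything, including an unset (None) value.
    return pattern == "*"
```

Under these assumptions, an unset value can only be accepted by `*`, which is why the wildcard matters for actors that have no login or name.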
2123

22-
So the format is like `"repository_pattern": "github_username_pattern;email_pattern;name_pattern"`. `;` is used as a separator.
24+
So the format is like `"repository_pattern": "login_pattern;email_pattern;name_pattern"`. `;` is used as a separator.
2325

2426
You can also specify multiple patterns so different set is used for multiple users - in such case configuration must start with `[`, end with `]` and be `||` separated.
2527

2628
For example: `"[copilot-swe-agent[bot];*;*||re:(?i)^l(ukasz)?gryglicki$;*;re:Gryglicki]"`.
2729

28-
Full format is like `"repository_pattern": "[github_username_pattern;email_pattern;name_pattern||..]"`.
30+
Full format is like `"repository_pattern": "[login_pattern;email_pattern;name_pattern||..]"`.
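A rough sketch of how such a value could be split into `(login, email, name)` pattern triples. The function name `parse_entry` is hypothetical, and trimming whitespace around each part is an assumption; the real parser may differ, for instance in how it treats `;` or `||` inside a regular expression.

```python
from typing import List, Tuple


def parse_entry(config: str) -> List[Tuple[str, str, str]]:
    """Hypothetical parser for one skip_cla value."""
    config = config.strip()
    # Array form: "[entry1||entry2||...]".
    # A value such as "copilot-swe-agent[bot]" ends with "]" but does not
    # start with "[", so it is treated as a single entry.
    if config.startswith("[") and config.endswith("]"):
        raw_entries = config[1:-1].split("||")
    else:
        raw_entries = [config]

    triples = []
    for raw in raw_entries:
        parts = [p.strip() for p in raw.split(";")]
        # Email and name patterns are optional and default to "*".
        parts += ["*"] * (3 - len(parts))
        triples.append((parts[0], parts[1], parts[2]))
    return triples
```

For example, `parse_entry("copilot-swe-agent[bot]")` yields a single triple whose email and name patterns are both `*`, which is the documented equivalence between `login_pattern` and `login_pattern;*;*`.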
2931

3032
Other complex example: `"re:(?i)^repo\d*$": "[veerendra||re:(?i)^l(ukasz)?gryglicki$;[email protected]||*;*;Lukasz Gryglicki]"`.
3133

3234
This matches one of:
33-
- GitHub username/login `veerendra` no matter the email and name.
34-
- GitHub username/login like lgryglicki, LukaszGryglicki and similar with email [email protected], name doesn't matter.
35-
- GitHub name "Lukasz Gryglicki" email and username/login doesn't matter.
35+
- GitHub login `veerendra` no matter the email and name.
36+
- GitHub login like lgryglicki, LukaszGryglicki and similar with email [email protected], name doesn't matter.
37+
- GitHub name "Lukasz Gryglicki" email and login doesn't matter.
3638

3739
There can be multiple entries under one Github Organization DynamoDB entry.
3840

@@ -46,7 +48,7 @@ Example:
4648
"skip_cla": {
4749
"M": {
4850
"*": {
49-
"S": "copilot-swe-agent[bot];re:^\\d+\\+Copilot@users\\.noreply\\.github\\.com$;*"
51+
"S": "*;re:^\\d+\\+Copilot@users\\.noreply\\.github\\.com$;*"
5052
},
5153
"re:(?i)^repo[0-9]+$": {
5254
"S": "re:vee?rendra;*;*"
@@ -57,21 +59,24 @@ Example:
5759
}
5860
```
5961

62+
For example for `copilot-swe-agent[bot]` GitHub bot the exact values returned by GitHub are: id, login, name are all nulls, email is like this `[email protected]`.
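This is why the example entry changed from a login pattern to `*`: with a null login, an exact login pattern can never match, while the email regex still can. The sketch below illustrates that under stated assumptions; `matches` and `actor_skipped` are hypothetical helpers, the numeric prefix in the email is made up, and regex patterns are assumed to use search semantics.

```python
import re

# Entry before and after this commit (written as Python raw strings).
OLD_ENTRY = r"copilot-swe-agent[bot];re:^\d+\+Copilot@users\.noreply\.github\.com$;*"
NEW_ENTRY = r"*;re:^\d+\+Copilot@users\.noreply\.github\.com$;*"


def matches(pattern, value):
    """Hypothetical: exact match, 're:' regex, or '*' (which also accepts None)."""
    if pattern == "*":
        return True
    if value is None:
        return False
    if pattern.startswith("re:"):
        return re.search(pattern[3:], value) is not None
    return pattern == value


def actor_skipped(entry, login, email, name):
    """All three patterns must match for the actor to be skipped."""
    login_p, email_p, name_p = entry.split(";")
    return matches(login_p, login) and matches(email_p, email) and matches(name_p, name)


# Illustrative actor: login and name are null, only the email is set.
bot_email = "123456+Copilot@users.noreply.github.com"  # made-up numeric prefix
old_result = actor_skipped(OLD_ENTRY, None, bot_email, None)
new_result = actor_skipped(NEW_ENTRY, None, bot_email, None)
```

Here `old_result` is `False` because the null login fails the exact login pattern, and `new_result` is `True` because the wildcard accepts the missing login.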
63+
6064
Algorithm to match pattern is as follows:
6165
- First we check repository name for exact match. Repository name is without the organization name, so for `https://github.com/linuxfoundation/easycla` it is just `easycla`. If we find an entry in `skip_cla` for `easycla` that entry is used and we stop searching.
6266
- If no exact match is found, we check for regular expression match. Only keys starting with `re:` are considered. If we find a match, we use that entry and stop searching.
6367
- If no match is found, we check for `*` entry. If it exists, we use that entry and stop searching.
6468
- If no match is found, we don't skip CLA check.
65-
- Now when we have the entry, it is in the following format: `github_username_pattern;email_pattern;name_pattern` or `"[github_username_pattern;email_pattern;name_pattern||...]" (array)`.
66-
- We check GitHub username/login, email address and name against the patterns. Algorithm is the same - username, email and name patterns can be either direct match or `re:regexp` or `*`.
67-
- If username, email and name match the patterns, we skip CLA check. If username or email or name is not set but the pattern is `*` it means hit.
68-
- So setting pattern to `username_pattern;*;*` or `username_pattern` (which is equivalent) means that we only check for username match and assume all emails and names are valid.
69+
- Now when we have the entry, it is in the following format: `login_pattern;email_pattern;name_pattern` or `"[login_pattern;email_pattern;name_pattern||...]" (array)`.
70+
- We check GitHub login, email address and name against the patterns. Algorithm is the same - login, email and name patterns can be either direct match or `re:regexp` or `*`.
71+
- If login, email and name match the patterns, we skip CLA check. If login, email or name is not set but the pattern is `*` it means hit.
72+
- So setting pattern to `login_pattern;*;*` or `login_pattern` (which is equivalent) means that we only check for login match and assume all emails and names are valid.
6973
- Any actor that matches any of the entries in the array will be skipped (logical OR).
70-
- If we set `repo_pattern` to `*` it means that this configuration applies to all repositories in the organization. If there are also specific repository patterns, they will be used instead of `*` (fallback for all).
74+
- If we set `repo_pattern` to `*` it means that this configuration applies to all repositories in the organization.
75+
- If there are also specific repository patterns, they will be used instead of `*` (fallback for all).
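The repository lookup order described in this list (exact key, then `re:` keys, then `*`) can be sketched as follows. The function name `find_repo_entry` is hypothetical, and which regex key wins when several match is an assumption here (first in iteration order); the source does not specify it.

```python
import re
from typing import Dict, Optional


def find_repo_entry(skip_cla: Dict[str, str], repo: str) -> Optional[str]:
    """Hypothetical lookup of the skip_cla entry that applies to a repository.

    `repo` is the bare repository name, e.g. "easycla", without the org.
    """
    # 1. Exact repository name.
    if repo in skip_cla:
        return skip_cla[repo]
    # 2. Keys starting with "re:" are treated as regular expressions.
    for key, entry in skip_cla.items():
        if key.startswith("re:") and re.search(key[3:], repo):
            return entry
    # 3. Fallback for all repositories; None means the CLA check is not skipped.
    return skip_cla.get("*")
```

Because the wildcard is consulted last, a more specific key replaces the `*` entry for its repositories rather than adding to it.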
7176

7277

7378
There is a script that allows you to update the `skip_cla` property in the DynamoDB table. It is located in `utils/skip_cla_entry.sh`. You can run it like this:
74-
- `` MODE=mode ./utils/skip_cla_entry.sh 'org-name' 'repo-pattern' 'github-username-pattern;email-pattern;name_pattern' ``.
79+
- `` MODE=mode ./utils/skip_cla_entry.sh 'org-name' 'repo-pattern' 'login-pattern;email-pattern;name_pattern' ``.
7580
- `` MODE=add-key ./utils/skip_cla_entry.sh 'sun-test-org' '*' 'copilot-swe-agent[bot];*;*' ``.
7681
- Complex example: `` MODE=add-key ./utils/skip_cla_entry.sh 'sun-test-org' 're:(?i)^repo[0-9]+$' '[re:(?i)^l(ukasz)?gryglicki$;re:(?i)^l(ukasz)?gryglicki@;*||copilot-swe-agent[bot]]' ``.
7782

@@ -91,7 +96,7 @@ aws --profile "lfproduct-prod" --region "us-east-1" dynamodb update-item \
9196
--table-name "cla-prod-github-orgs" \
9297
--key '{"organization_name": {"S": "linuxfoundation"}}' \
9398
--update-expression 'SET skip_cla = :val' \
94-
--expression-attribute-values '{":val": {"M": {"re:^easycla":{"S":"copilot-swe-agent[bot];*;*"}}}}'
99+
--expression-attribute-values '{":val": {"M": {"re:^easycla":{"S":"some-github-login;*;*"}}}}'
95100
```
96101

97102
To add a new key to an existing `skip_cla` entry (or replace the existing key):
@@ -102,7 +107,7 @@ aws --profile "lfproduct-prod" --region "us-east-1" dynamodb update-item \
102107
--key '{"organization_name": {"S": "linuxfoundation"}}' \
103108
--update-expression "SET skip_cla.#repo = :val" \
104109
--expression-attribute-names '{"#repo": "re:^easycla"}' \
105-
--expression-attribute-values '{":val": {"S": "copilot-swe-agent[bot]"}}'
110+
--expression-attribute-values '{":val": {"S": "some-github-login"}}'
106111
```
107112

108113
To delete a key from an existing `skip_cla` entry:
@@ -138,14 +143,14 @@ To check for log entries related to skipping CLA check, you can use the followin
138143

139144
To add first `skip_cla` value for an organization:
140145
```
141-
aws --profile lfproduct-prod --region us-east-1 dynamodb update-item --table-name "cla-prod-github-orgs" --key '{"organization_name": {"S": "open-telemetry"}}' --update-expression 'SET skip_cla = :val' --expression-attribute-values '{":val": {"M": {"otel-arrow":{"S":"copilot-swe-agent[bot];re:^\\d+\\+Copilot@users\\.noreply\\.github\\.com$;*"}}}}'
142-
aws --profile lfproduct-prod --region us-east-1 dynamodb update-item --table-name "cla-prod-github-orgs" --key '{"organization_name": {"S": "openfga"}}' --update-expression 'SET skip_cla = :val' --expression-attribute-values '{":val": {"M": {"vscode-ext":{"S":"copilot-swe-agent[bot];re:^\\d+\\+Copilot@users\\.noreply\\.github\\.com$;*"}}}}'
146+
aws --profile lfproduct-prod --region us-east-1 dynamodb update-item --table-name "cla-prod-github-orgs" --key '{"organization_name": {"S": "open-telemetry"}}' --update-expression 'SET skip_cla = :val' --expression-attribute-values '{":val": {"M": {"otel-arrow":{"S":"*;re:^\\d+\\+Copilot@users\\.noreply\\.github\\.com$;*"}}}}'
147+
aws --profile lfproduct-prod --region us-east-1 dynamodb update-item --table-name "cla-prod-github-orgs" --key '{"organization_name": {"S": "openfga"}}' --update-expression 'SET skip_cla = :val' --expression-attribute-values '{":val": {"M": {"vscode-ext":{"S":"*;re:^\\d+\\+Copilot@users\\.noreply\\.github\\.com$;*"}}}}'
143148
```
144149

145150
To add additional repositories entries without overwriting the existing `skip_cla` value:
146151
```
147-
aws --profile lfproduct-prod --region us-east-1 dynamodb update-item --table-name "cla-prod-github-orgs" --key '{"organization_name": {"S": "open-telemetry"}}' --update-expression 'SET skip_cla.#repo = :val' --expression-attribute-names '{"#repo": "*"}' --expression-attribute-values '{":val": {"S": "copilot-swe-agent[bot];re:^\\d+\\+Copilot@users\\.noreply\\.github\\.com$;*"}}'
148-
aws --profile lfproduct-prod --region us-east-1 dynamodb update-item --table-name "cla-prod-github-orgs" --key '{"organization_name": {"S": "openfga"}}' --update-expression 'SET skip_cla.#repo = :val' --expression-attribute-names '{"#repo": "*"}' --expression-attribute-values '{":val": {"S": "copilot-swe-agent[bot];re:^\\d+\\+Copilot@users\\.noreply\\.github\\.com$;*"}}'
152+
aws --profile lfproduct-prod --region us-east-1 dynamodb update-item --table-name "cla-prod-github-orgs" --key '{"organization_name": {"S": "open-telemetry"}}' --update-expression 'SET skip_cla.#repo = :val' --expression-attribute-names '{"#repo": "*"}' --expression-attribute-values '{":val": {"S": "*;re:^\\d+\\+Copilot@users\\.noreply\\.github\\.com$"}}'
153+
aws --profile lfproduct-prod --region us-east-1 dynamodb update-item --table-name "cla-prod-github-orgs" --key '{"organization_name": {"S": "openfga"}}' --update-expression 'SET skip_cla.#repo = :val' --expression-attribute-names '{"#repo": "*"}' --expression-attribute-values '{":val": {"S": "*;re:^\\d+\\+Copilot@users\\.noreply\\.github\\.com$"}}'
149154
```
150155

151156
To delete a specific repo entry from `skip_cla`:

cla-backend-go/github/bots.go

Lines changed: 4 additions & 4 deletions
@@ -134,19 +134,19 @@ func parseConfigPatterns(config string) []string {
134134
// : in cla-{stage}-github-orgs table there can be a skip_cla field which is a dict with the following structure:
135135
//
136136
// {
137-
// "repo-name": "<username_pattern>;<email_pattern>;<name_pattern>",
138-
// "re:repo-regexp": "[<username_pattern>;<email_pattern>;<name_pattern>||...]",
137+
// "repo-name": "<login_pattern>;<email_pattern>;<name_pattern>",
138+
// "re:repo-regexp": "[<login_pattern>;<email_pattern>;<name_pattern>||...]",
139139
// "*": "<login_pattern>"
140140
// }
141141
//
142142
// where:
143143
// - repo-name is the exact repository name under given org (e.g., "my-repo" not "my-org/my-repo")
144144
// - re:repo-regexp is a regex pattern to match repository names
145145
// - * is a wildcard that applies to all repositories
146-
// - <username_pattern> is a GitHub username pattern (exact match or regex prefixed by re: or match all '*')
146+
// - <login_pattern> is a GitHub login pattern (exact match or regex prefixed by re: or match all '*')
147147
// - <email_pattern> is a GitHub email pattern (exact match or regex prefixed by re: or match all '*') if not specified defaults to '*'
148148
// - <name_pattern> is a GitHub name pattern (exact match or regex prefixed by re: or match all '*') if not specified defaults to '*'
149-
// The username/login, email and name patterns are separated by a semicolon (;). Email and name parts are optional.
149+
// The login, email and name patterns are separated by a semicolon (;). Email and name parts are optional.
150150
// There can be an array of patterns for a single repository, separated by ||. It must start with a '[' and end with a ']': "[...||...||...]"
151151
// If the skip_cla is not set, it will skip the whitelisted bots check.
152152
func SkipWhitelistedBots(ev events.Service, orgModel *models.GithubOrganization, orgRepo, projectID string, actorsMissingCLA []*UserCommitSummary) ([]*UserCommitSummary, []*UserCommitSummary) {

cla-backend-go/swagger/common/github-organization.yaml

Lines changed: 4 additions & 4 deletions
@@ -81,16 +81,16 @@ properties:
8181
- Or an OR-array in the form '[<entry1>||<entry2>||...]', where each entry uses the same pattern format above.
8282
8383
Patterns can be:
84-
- An exact match (e.g. 'repo1', 'username', 'email@domain').
84+
- An exact match (e.g. 'repo1', 'login', 'Name Surname', 'email@domain').
85+
- A regular expression prefixed with 're:' (e.g. 're:(?i)^bot.*$').
8586
- A wildcard '*' to match all.
86-
- A regular expression prefixed with 're:' (e.g. 're:(?i)^bot.*').
8787
8888
Example formats:
8989
- "copilot-swe-agent[bot];*;*"
9090
- "re:vee?rendra;*;*"
9191
- "[re:(?i)^l(ukasz)?gryglicki$;re:(?i)^l(ukasz)?gryglicki@;*||copilot-swe-agent[bot]]"
92-
- "username;*"
93-
- "username;[email protected];Real Name"
92+
- "login;*"
93+
- "login;[email protected];Real Name"
9494
example:
9595
"*": "copilot-swe-agent[bot];*;*"
9696
"repo1": "re:vee?rendra;*;*"

cla-backend/cla/models/github_models.py

Lines changed: 27 additions & 16 deletions
@@ -969,7 +969,7 @@ def is_actor_skipped(self, actor, config):
969969
login_pattern, email_pattern, name_pattern = parts[:3]
970970
login = getattr(actor, "author_login", None)
971971
email = getattr(actor, "author_email", None)
972-
name = getattr(actor, "author_username", None)
972+
name = getattr(actor, "author_name", None)
973973
return (
974974
self.property_matches(login_pattern, login) and
975975
self.property_matches(email_pattern, email) and
@@ -1001,6 +1001,12 @@ def parse_config_patterns(self, config):
10011001
else:
10021002
return [config]
10031003

1004+
def safe_getattr(obj, attr, default='(null)'):
1005+
"""Returns obj.attr or default if attr is missing or None."""
1006+
val = getattr(obj, attr, default)
1007+
if val is None:
1008+
return default
1009+
return val
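A small illustration of why this helper differs from a plain `getattr` with a default: `getattr` only falls back when the attribute is missing, not when it exists and is `None`. The `SimpleNamespace` actor below is a stand-in for illustration, not the real `UserCommitSummary` class, and the helper is restated here only so the snippet runs on its own.

```python
from types import SimpleNamespace


def safe_getattr(obj, attr, default="(null)"):
    """Returns obj.attr, or default if attr is missing or None."""
    val = getattr(obj, attr, default)
    return default if val is None else val


# Stand-in actor: author_id exists but is None, author_login is absent.
bot = SimpleNamespace(author_id=None, author_email="bot@example.com")

plain = getattr(bot, "author_id", "(null)")   # attribute exists, so None comes back
safe = safe_getattr(bot, "author_id")         # None is replaced by the default
missing = safe_getattr(bot, "author_login")   # missing attribute also gives the default
```

With the plain call, a log line formatted from `plain` would show `None`, whereas the helper produces the same `(null)` placeholder for both null and missing attributes.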
10041010

10051011
def skip_whitelisted_bots(self, org_model, org_repo, actors_missing_cla) -> Tuple[List[UserCommitSummary], List[UserCommitSummary]]:
10061012
"""
@@ -1014,18 +1020,18 @@ def skip_whitelisted_bots(self, org_model, org_repo, actors_missing_cla) -> Tupl
10141020
:return: Tuple of (actors_missing_cla, whitelisted_actors)
10151021
: in cla-{stage}-github-orgs table there can be a skip_cla field which is a dict with the following structure:
10161022
{
1017-
"repo-name": "<username_pattern>;<email_pattern>;<name_pattern>",
1018-
"re:repo-regexp": "[<username_pattern>;<email_pattern>;<name_pattern>||...]",
1023+
"repo-name": "<login_pattern>;<email_pattern>;<name_pattern>",
1024+
"re:repo-regexp": "[<login_pattern>;<email_pattern>;<name_pattern>||...]",
10191025
"*": "<login_pattern>"
10201026
}
10211027
where:
10221028
- repo-name is the exact repository name under given org (e.g., "my-repo" not "my-org/my-repo")
10231029
- re:repo-regexp is a regex pattern to match repository names
10241030
- * is a wildcard that applies to all repositories
1025-
- <username_pattern> is a GitHub username pattern (exact match or regex prefixed by re: or match all '*')
1031+
- <login_pattern> is a GitHub login pattern (exact match or regex prefixed by re: or match all '*')
10261032
- <email_pattern> is a GitHub email pattern (exact match or regex prefixed by re: or match all '*') - defaults to '*' if not set
10271033
- <name_pattern> is a GitHub name pattern (exact match or regex prefixed by re: or match all '*') - defaults to '*' if not set
1028-
:note: The username/login, email and name patterns are separated by a semicolon (;).
1034+
:note: The login (sometimes called username it's the same), email and name patterns are separated by a semicolon (;).
10291035
:note: There can be an array of patterns - it must start with [ and with ] and be || separated.
10301036
:note: If the skip_cla is not set, it will skip the whitelisted bots check.
10311037
"""
@@ -1071,10 +1077,10 @@ def skip_whitelisted_bots(self, org_model, org_repo, actors_missing_cla) -> Tupl
10711077
return actors_missing_cla, []
10721078

10731079
actor_debug_data = [
1074-
f"id='{getattr(a, 'author_id', '(null)')}',"
1075-
f"login='{getattr(a, 'author_login', '(null)')}',"
1076-
f"username='{getattr(a, 'author_username', '(null)')}',"
1077-
f"email='{getattr(a, 'author_email', '(null)')}'"
1080+
f"id='{safe_getattr(a, 'author_id')}',"
1081+
f"login='{safe_getattr(a, 'author_login')}',"
1082+
f"name='{safe_getattr(a, 'author_name')}',"
1083+
f"email='{safe_getattr(a, 'author_email')}'"
10781084
for a in actors_missing_cla
10791085
]
10801086
config = self.parse_config_patterns(config)
@@ -1085,11 +1091,11 @@ def skip_whitelisted_bots(self, org_model, org_repo, actors_missing_cla) -> Tupl
10851091
if actor is None:
10861092
continue
10871093
try:
1088-
actor_data = "id='{}',login='{}',username='{}',email='{}'".format(
1089-
getattr(actor, "author_id", "(null)"),
1090-
getattr(actor, "author_login", "(null)"),
1091-
getattr(actor, "author_username", "(null)"),
1092-
getattr(actor, "author_email", "(null)"),
1094+
actor_data = "id='{}',login='{}',name='{}',email='{}'".format(
1095+
safe_getattr(actor, "author_id"),
1096+
safe_getattr(actor, "author_login"),
1097+
safe_getattr(actor, "author_name"),
1098+
safe_getattr(actor, "author_email"),
10931099
)
10941100
cla.log.debug("Checking actor: %s for skip_cla config: %s", actor_data, config)
10951101
if self.is_actor_skipped(actor, config):
@@ -1111,8 +1117,13 @@ def skip_whitelisted_bots(self, org_model, org_repo, actors_missing_cla) -> Tupl
11111117
continue
11121118
except Exception as e:
11131119
cla.log.warning(
1114-
"Error checking skip_cla for actor '%s' (login='%s', email='%s'): %s",
1115-
actor, getattr(actor, "author_login", None), getattr(actor, "author_email", None), e,
1120+
"Error checking skip_cla for actor '%s' (id='%s', login='%s', name='%s', email='%s'): %s",
1121+
actor,
1122+
safe_getattr(actor, "author_id"),
1123+
safe_getattr(actor, "author_login"),
1124+
safe_getattr(actor, "author_name"),
1125+
safe_getattr(actor, "author_email"),
1126+
e,
11161127
)
11171128
out_actors_missing_cla.append(actor)
11181129
