Commit 24d3547

Merge pull request #4728 from linuxfoundation/unicron-4701-allow-bots-to-skip-cla-support-arrays-and-names
Refactor to support array of configs, additional name pattern and make email & name patterns optional + update util scripts and docs
2 parents 65a102b + f021e0f commit 24d3547

5 files changed

+171
-89
lines changed

WHITELISTING_BOTS.md

Lines changed: 34 additions & 19 deletions
@@ -4,14 +4,26 @@ You can allow specific bot users to automatically pass the CLA check.
44

55
This can be done on the GitHub organization level by setting the `skip_cla` property on `cla-{stage}-github-orgs` DynamoDB table.
66

7-
This property is a Map attribute that contains mapping from repository pattern to bot username and email pattern.
7+
Replace `{stage}` with either `dev` or `prod`.
88

9-
Each pattern is a string and can be one of three possible types:
10-
- `"name"` - exact match for repository name, GitHub username, or email address.
9+
This property is a Map attribute that contains mapping from repository pattern to bot username (GitHub login), email and name pattern.
10+
11+
Example username/login is lukaszgryglicki (like any username/login that can be accessed via `https://github.com/username`).
12+
13+
Example name is "Lukasz Gryglicki".
14+
15+
Email pattern and name pattern are optional and `*` is assumed for them if not specified.
16+
17+
Each pattern is a string and can be one of three possible types (checked in this order):
18+
- `"name"` - exact match for repository name, GitHub login/username, email address, GitHub name.
1119
- `"re:regexp"` - regular expression match for repository name, GitHub username, or email address.
1220
- `"*"` - matches all.
1321

14-
So the format is like `"repository_pattern": "github_username_pattern;email_pattern"`.
22+
So the format is like `"repository_pattern": "github_username_pattern;email_pattern;name_pattern"`. `;` is used as a separator.
23+
24+
You can also specify multiple patterns so that different pattern sets apply to different users. In that case the configuration must start with `[`, end with `]`, and be `||`-separated.
25+
26+
For example: `"[copilot-swe-agent[bot];*;*||re:(?i)^l(ukasz)?gryglicki$;*;re:Gryglicki]"`.
1527

1628
There can be multiple entries under one Github Organization DynamoDB entry.
1729

@@ -25,10 +37,10 @@ Example:
2537
"skip_cla": {
2638
"M": {
2739
"*": {
28-
"S": "copilot-swe-agent[bot];*"
40+
"S": "copilot-swe-agent[bot];*;*"
2941
},
30-
"repo1": {
31-
"S": "re:vee?rendra;*"
42+
"re:(?i)^repo[0-9]+$": {
43+
"S": "re:vee?rendra;*;*"
3244
}
3345
}
3446
},
@@ -41,22 +53,24 @@ Algorithm to match pattern is as follows:
4153
- If no exact match is found, we check for regular expression match. Only keys starting with `re:` are considered. If we find a match, we use that entry and stop searching.
4254
- If no match is found, we check for `*` entry. If it exists, we use that entry and stop searching.
4355
- If no match is found, we don't skip CLA check.
44-
- Now when we have the entry, it is in the following format: `github_username_pattern;email_pattern`.
45-
- We check both GitHub username and email address against the patterns. Algorith is the same - username and email patterns can be either direct match or `re:regexp` or `*`.
46-
- If both username and email match the patterns, we skip CLA check. If username or email is not set but the pattern is `*` it means hit.
47-
- So setting pattern to `username_pattern;*` means that we only check for username match and assume all emails are valid.
48-
- If we set `repo_pattern` to `*` it means that this configuration applies to all repositories in the organization. If there are also specific repository patterns, they will be checked first.
56+
- Once we have the entry, it is in one of the following formats: `github_username_pattern;email_pattern;name_pattern` or `"[github_username_pattern;email_pattern;name_pattern||...]"` (array).
57+
- We check GitHub username/login, email address and name against the patterns. Algorithm is the same - username, email and name patterns can be either direct match or `re:regexp` or `*`.
58+
- If username, email and name match the patterns, we skip CLA check. If username or email or name is not set but the pattern is `*` it means hit.
59+
- So setting pattern to `username_pattern;*;*` or `username_pattern` (which is equivalent) means that we only check for username match and assume all emails and names are valid.
60+
- Any actor that matches any of the entries in the array will be skipped (logical OR).
61+
- If we set `repo_pattern` to `*` it means that this configuration applies to all repositories in the organization. If there are also specific repository patterns, they will be used instead of `*` (fallback for all).
4962

5063

5164
There is a script that allows you to update the `skip_cla` property in the DynamoDB table. It is located in `utils/skip_cla_entry.sh`. You can run it like this:
52-
- `` MODE=mode ./utils/skip_cla_entry.sh 'org-name' 'repo-pattern' 'github-username-pattern' 'email-pattern' ``.
53-
- `` MODE=add-key ./utils/skip_cla_entry.sh 'sun-test-org' '*' 'copilot-swe-agent[bot]' '*' ``.
65+
- `` MODE=mode ./utils/skip_cla_entry.sh 'org-name' 'repo-pattern' 'github-username-pattern;email-pattern;name_pattern' ``.
66+
- `` MODE=add-key ./utils/skip_cla_entry.sh 'sun-test-org' '*' 'copilot-swe-agent[bot];*;*' ``.
67+
- Complex example: `` MODE=add-key ./utils/skip_cla_entry.sh 'sun-test-org' 're:(?i)^repo[0-9]+$' '[re:(?i)^l(ukasz)?gryglicki$;re:(?i)^l(ukasz)?gryglicki@;*||copilot-swe-agent[bot]]' ``.
5468

5569
`MODE` can be one of:
56-
- `put-item`: Overwrites/adds the entire `skip_cla` property. Needs all 4 arguments org, repo, username and email.
57-
- `add-key`: Adds or updates a key/value inside the `skip_cla` map (preserves other keys). Needs all 4 args.
70+
- `put-item`: Overwrites/adds the entire `skip_cla` property. Needs all 3 arguments org, repo, and pattern.
71+
- `add-key`: Adds or updates a key/value inside the `skip_cla` map (preserves other keys). Needs all 3 args.
5872
- `delete-key`: Removes a key from the `skip_cla` map. Needs 2 arguments: org and repo.
59-
- `delete-item`: Deletes the entire `skip_cla` item. Needs 1 argument: org.
73+
- `delete-item`: Deletes the entire `skip_cla` property from the item. Needs 1 argument: org.
6074

6175

6276
You can also use AWS CLI to update the `skip_cla` property. Here is an example command:
@@ -68,7 +82,7 @@ aws --profile "lfproduct-prod" --region "us-east-1" dynamodb update-item \
6882
--table-name "cla-prod-github-orgs" \
6983
--key '{"organization_name": {"S": "linuxfoundation"}}' \
7084
--update-expression 'SET skip_cla = :val' \
71-
--expression-attribute-values '{":val": {"M": {"re:^easycla":{"S":"copilot-swe-agent[bot];*"}}}}'
85+
--expression-attribute-values '{":val": {"M": {"re:^easycla":{"S":"copilot-swe-agent[bot];*;*"}}}}'
7286
```
7387

7488
To add a new key to an existing `skip_cla` entry (or replace the existing key):
@@ -79,7 +93,7 @@ aws --profile "lfproduct-prod" --region "us-east-1" dynamodb update-item \
7993
--key '{"organization_name": {"S": "linuxfoundation"}}' \
8094
--update-expression "SET skip_cla.#repo = :val" \
8195
--expression-attribute-names '{"#repo": "re:^easycla"}' \
82-
--expression-attribute-values '{":val": {"S": "copilot-swe-agent[bot];*"}}'
96+
--expression-attribute-values '{":val": {"S": "copilot-swe-agent[bot]"}}'
8397
```
8498

8599
To delete a key from an existing `skip_cla` entry:
@@ -110,3 +124,4 @@ aws --profile "lfproduct-prod" dynamodb scan --table-name "cla-prod-github-orgs"
110124
```
111125

112126
To check for log entries related to skipping CLA check, you can use the following command: `` STAGE=dev DTFROM='1 hour ago' DTTO='1 second ago' ./utils/search_aws_log_group.sh 'cla-backend-dev-githubactivity' 'skip_cla' ``.
127+
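
The repository-key lookup described in the matching algorithm of `WHITELISTING_BOTS.md` (exact key first, then `re:` keys, then `*`) can be sketched as follows. This is a minimal illustration, not the project's implementation: `resolveRepoConfig` is a hypothetical name, and sorting the `re:` keys and skipping keys whose regexp fails to compile are assumptions made only to keep the sketch deterministic.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// resolveRepoConfig is a hypothetical helper that picks the skip_cla value
// for a repository name (without the org prefix).
func resolveRepoConfig(skipCLA map[string]string, repo string) (string, bool) {
	// 1. Exact repository name wins.
	if v, ok := skipCLA[repo]; ok {
		return v, true
	}
	// 2. Only keys prefixed with "re:" are tried as regular expressions.
	//    Assumption: keys are sorted so the result does not depend on map order.
	keys := make([]string, 0, len(skipCLA))
	for k := range skipCLA {
		if strings.HasPrefix(k, "re:") {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)
	for _, k := range keys {
		re, err := regexp.Compile(strings.TrimPrefix(k, "re:"))
		if err != nil {
			continue // assumption: a key with an invalid regexp is ignored
		}
		if re.MatchString(repo) {
			return skipCLA[k], true
		}
	}
	// 3. "*" is the fallback for every other repository.
	if v, ok := skipCLA["*"]; ok {
		return v, true
	}
	// 4. Nothing matched: the CLA check is not skipped.
	return "", false
}

func main() {
	skipCLA := map[string]string{
		"*":                   "copilot-swe-agent[bot];*;*",
		"re:(?i)^repo[0-9]+$": "re:vee?rendra;*;*",
	}
	for _, repo := range []string{"Repo42", "docs"} {
		cfg, ok := resolveRepoConfig(skipCLA, repo)
		fmt.Printf("%s -> %q (found=%v)\n", repo, cfg, ok)
	}
}
```

With the example map, `Repo42` resolves to the `re:` entry and `docs` falls back to `*`, which matches the "specific patterns are used instead of `*`" rule stated in the document.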

cla-backend-go/github/bots.go

Lines changed: 61 additions & 40 deletions
Original file line numberDiff line numberDiff line change
@@ -52,39 +52,40 @@ func stripOrg(repoFull string) string {
5252
return repoFull
5353
}
5454

55-
// isActorSkipped returns true if the given actor should be skipped according to the skip_cla config pattern.
56-
// config format: "<username_pattern>;<email_pattern>"
57-
// Actor.CommitAuthor.Login and Actor.CommitAuthor.Email should be *string, can be nil.
58-
func isActorSkipped(actor *UserCommitSummary, config string) bool {
59-
f := logrus.Fields{
60-
"functionName": "github.isActorSkipped",
61-
"config": config,
62-
}
63-
// Defensive: must have exactly one ';'
64-
if !strings.Contains(config, ";") {
65-
log.WithFields(f).Debugf("Invalid skip_cla config format: %s, expected '<username_pattern>;<email_pattern>'", config)
66-
return false
67-
}
68-
parts := strings.SplitN(config, ";", 2)
69-
if len(parts) != 2 {
70-
return false
71-
}
72-
usernamePattern := parts[0]
73-
emailPattern := parts[1]
74-
var (
75-
username string
76-
email string
77-
)
78-
if actor != nil && actor.CommitAuthor != nil && actor.CommitAuthor.Login != nil {
79-
username = *actor.CommitAuthor.Login
80-
}
81-
if actor != nil && actor.CommitAuthor != nil && actor.CommitAuthor.Email != nil {
82-
email = *actor.CommitAuthor.Email
83-
}
55+
// isActorSkipped returns true if the actor should be skipped according to ANY pattern in config.
56+
// Each config entry is "<login_pattern>;<email_pattern>;<name_pattern>"
57+
// Any missing pattern defaults to "*"
58+
func isActorSkipped(actor *UserCommitSummary, config []string) bool {
59+
for _, pattern := range config {
60+
parts := strings.Split(pattern, ";")
61+
for len(parts) < 3 {
62+
parts = append(parts, "*")
63+
}
64+
loginPattern, emailPattern, namePattern := parts[0], parts[1], parts[2]
65+
66+
var login, email, name string
67+
if actor != nil && actor.CommitAuthor != nil {
68+
if actor.CommitAuthor.Login != nil {
69+
login = *actor.CommitAuthor.Login
70+
}
71+
if actor.CommitAuthor.Email != nil {
72+
email = *actor.CommitAuthor.Email
73+
}
74+
if actor.CommitAuthor.Name != nil {
75+
name = *actor.CommitAuthor.Name
76+
}
77+
}
8478

85-
return propertyMatches(usernamePattern, username) && propertyMatches(emailPattern, email)
79+
if propertyMatches(loginPattern, login) &&
80+
propertyMatches(emailPattern, email) &&
81+
propertyMatches(namePattern, name) {
82+
return true
83+
}
84+
}
85+
return false
8686
}
8787

88+
// actorToString converts a UserCommitSummary actor to a string representation.
8889
func actorToString(actor *UserCommitSummary) string {
8990
const nullStr = "(null)"
9091
if actor == nil {
@@ -106,6 +107,22 @@ func actorToString(actor *UserCommitSummary) string {
106107
return fmt.Sprintf("id='%v',login='%v',username='%v',email='%v'", id, login, username, email)
107108
}
108109

110+
// parseConfigPatterns takes a config string and returns a slice of pattern strings.
111+
// If the config starts with '[' and ends with ']', splits by '||' inside; else returns []string{config}.
112+
// Trims whitespace from each pattern.
113+
func parseConfigPatterns(config string) []string {
114+
config = strings.TrimSpace(config)
115+
if len(config) >= 2 && strings.HasPrefix(config, "[") && strings.HasSuffix(config, "]") {
116+
inner := config[1 : len(config)-1]
117+
parts := strings.Split(inner, "||")
118+
for i, p := range parts {
119+
parts[i] = strings.TrimSpace(p)
120+
}
121+
return parts
122+
}
123+
return []string{config}
124+
}
125+
109126
// SkipWhitelistedBots- check if the actors are whitelisted based on the skip_cla configuration.
110127
// Returns two lists:
111128
// - actors still missing cla: actors who still need to sign the CLA after checking skip_cla
@@ -117,18 +134,20 @@ func actorToString(actor *UserCommitSummary) string {
117134
// : in cla-{stage}-github-orgs table there can be a skip_cla field which is a dict with the following structure:
118135
//
119136
// {
120-
// "repo-name": "<username_pattern>;<email_pattern>",
121-
// "re:repo-regexp": "<username_pattern>;<email_pattern>",
122-
// "*": "<username_pattern>;<email_pattern>"
137+
// "repo-name": "<username_pattern>;<email_pattern>;<name_pattern>",
138+
// "re:repo-regexp": "[<username_pattern>;<email_pattern>;<name_pattern>||...]",
139+
// "*": "<login_pattern>"
123140
// }
124141
//
125142
// where:
126143
// - repo-name is the exact repository name under given org (e.g., "my-repo" not "my-org/my-repo")
127144
// - re:repo-regexp is a regex pattern to match repository names
128145
// - * is a wildcard that applies to all repositories
129146
// - <username_pattern> is a GitHub username pattern (exact match or regex prefixed by re: or match all '*')
130-
// - <email_pattern> is a GitHub email pattern (exact match or regex prefixed by re: or match all '*')
131-
// The username and email patterns are separated by a semicolon (;).
147+
// - <email_pattern> is a GitHub email pattern (exact match, regex prefixed by re:, or match-all '*'); defaults to '*' if not specified
148+
// - <name_pattern> is a GitHub name pattern (exact match, regex prefixed by re:, or match-all '*'); defaults to '*' if not specified
149+
// The username/login, email and name patterns are separated by a semicolon (;). Email and name parts are optional.
150+
// There can be an array of patterns for a single repository, separated by ||. It must start with a '[' and end with a ']': "[...||...||...]"
132151
// If the skip_cla is not set, it will skip the whitelisted bots check.
133152
func SkipWhitelistedBots(ev events.Service, orgModel *models.GithubOrganization, orgRepo, projectID string, actorsMissingCLA []*UserCommitSummary) ([]*UserCommitSummary, []*UserCommitSummary) {
134153
repo := stripOrg(orgRepo)
@@ -189,23 +208,25 @@ func SkipWhitelistedBots(ev events.Service, orgModel *models.GithubOrganization,
189208
return actorsMissingCLA, []*UserCommitSummary{}
190209
}
191210

211+
configArray := parseConfigPatterns(config)
212+
192213
// Log full configuration
193214
actorDebugData := make([]string, 0, len(actorsMissingCLA))
194215
for _, a := range actorsMissingCLA {
195216
actorDebugData = append(actorDebugData, actorToString(a))
196217
}
197-
log.WithFields(f).Debugf("final skip_cla config for repo %s is %s; actorsMissingCLA: [%s]", orgRepo, config, strings.Join(actorDebugData, ", "))
218+
log.WithFields(f).Debugf("final skip_cla config for repo %s is %+v; actorsMissingCLA: [%s]", orgRepo, configArray, strings.Join(actorDebugData, ", "))
198219

199220
for _, actor := range actorsMissingCLA {
200221
if actor == nil {
201222
continue
202223
}
203224
actorData := actorToString(actor)
204-
log.WithFields(f).Debugf("Checking actor: %s for skip_cla config: %s", actorData, config)
205-
if isActorSkipped(actor, config) {
225+
log.WithFields(f).Debugf("Checking actor: %s for skip_cla config: %+v", actorData, configArray)
226+
if isActorSkipped(actor, configArray) {
206227
msg := fmt.Sprintf(
207-
"Skipping CLA check for repo='%s', actor: %s due to skip_cla config: '%s'",
208-
orgRepo, actorData, config,
228+
"Skipping CLA check for repo='%s', actor: %s due to skip_cla config: %+v",
229+
orgRepo, actorData, configArray,
209230
)
210231
log.WithFields(f).Info(msg)
211232
eventData := events.BypassCLAEventData{
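
To see how a `skip_cla` value is broken into entries and fields, the standalone sketch below repeats the bracket and `||` handling shown in `parseConfigPatterns` and the `"*"` padding shown in `isActorSkipped`. The names `splitEntries` and `entryFields` are hypothetical and exist only for this illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// splitEntries mirrors the bracket/"||" handling shown in parseConfigPatterns.
func splitEntries(config string) []string {
	config = strings.TrimSpace(config)
	if len(config) >= 2 && strings.HasPrefix(config, "[") && strings.HasSuffix(config, "]") {
		parts := strings.Split(config[1:len(config)-1], "||")
		for i, p := range parts {
			parts[i] = strings.TrimSpace(p)
		}
		return parts
	}
	return []string{config}
}

// entryFields is a hypothetical helper: it returns the login, email and name
// patterns of one entry, padding missing fields with "*".
func entryFields(entry string) (login, email, name string) {
	parts := strings.Split(entry, ";")
	for len(parts) < 3 {
		parts = append(parts, "*")
	}
	return parts[0], parts[1], parts[2]
}

func main() {
	values := []string{
		"copilot-swe-agent[bot]",
		"[re:(?i)^l(ukasz)?gryglicki$;re:(?i)^l(ukasz)?gryglicki@;*||copilot-swe-agent[bot]]",
		"re:^(dependabot|renovate)\\[bot\\]$",
	}
	for _, v := range values {
		for _, e := range splitEntries(v) {
			login, email, name := entryFields(e)
			fmt.Printf("login=%q email=%q name=%q\n", login, email, name)
		}
	}
}
```

Because the splitting is purely textual, a regular expression that itself contains `;` or `||` would be cut at those characters, and a single entry that both starts with `[` and ends with `]` would be read as an array. A single `|` alternation, as in the third value, is unaffected.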

cla-backend-go/swagger/common/github-organization.yaml

Lines changed: 20 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -74,11 +74,27 @@ properties:
7474
additionalProperties:
7575
type: string
7676
description: |
77-
Map of repository name or pattern (e.g. 'repo1', '*', 're:pattern') to a string in the form '<username_pattern>;<email_pattern>' for skipping CLA checks for certain bots. Patterns can be exact, wildcard '*', or regexp prefixed with 're:'.
78-
example:
79-
"*": "copilot-swe-agent[bot];*"
80-
"repo1": "re:vee?rendra;*"
77+
Map of repository name or pattern (e.g. 'repo1', '*', 're:pattern') to a string or array-string of pattern entries for skipping CLA checks for certain bots.
78+
79+
Each value can be either:
80+
- A string in the form '<login_pattern>;<email_pattern>;<name_pattern>' (email and name patterns are optional, default to '*').
81+
- Or an OR-array in the form '[<entry1>||<entry2>||...]', where each entry uses the same pattern format above.
8182
83+
Patterns can be:
84+
- An exact match (e.g. 'repo1', 'username', 'email@domain').
85+
- A wildcard '*' to match all.
86+
- A regular expression prefixed with 're:' (e.g. 're:(?i)^bot.*').
87+
88+
Example formats:
89+
- "copilot-swe-agent[bot];*;*"
90+
- "re:vee?rendra;*;*"
91+
- "[re:(?i)^l(ukasz)?gryglicki$;re:(?i)^l(ukasz)?gryglicki@;*||copilot-swe-agent[bot]]"
92+
- "username;*"
93+
- "username;[email protected];Real Name"
94+
example:
95+
"*": "copilot-swe-agent[bot];*;*"
96+
"repo1": "re:vee?rendra;*;*"
97+
"re:(?i)^repo[0-9]+$": "[re:(?i)^l(ukasz)?gryglicki$;re:(?i)^l(ukasz)?gryglicki@;*||copilot-swe-agent[bot]]"
8298
repositories:
8399
type: object
84100
properties:
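
Since the value format in the swagger description is a plain string, a malformed entry is only discovered at match time. A pre-flight check before writing a value to DynamoDB could look like the sketch below. `validateValue` is a hypothetical helper, not part of the commit, and rejecting entries with more than three fields is an assumption of this sketch (the matching code shown in `bots.go` reads only the first three fields).

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// validateValue is a hypothetical pre-flight check for one skip_cla value.
// It verifies the field count of every entry and that each "re:" pattern compiles.
func validateValue(value string) error {
	value = strings.TrimSpace(value)
	entries := []string{value}
	if len(value) >= 2 && strings.HasPrefix(value, "[") && strings.HasSuffix(value, "]") {
		entries = strings.Split(value[1:len(value)-1], "||")
	}
	for _, entry := range entries {
		entry = strings.TrimSpace(entry)
		fields := strings.Split(entry, ";")
		// Assumption: more than login;email;name is treated as a mistake.
		if len(fields) > 3 {
			return fmt.Errorf("entry %q has %d fields, expected at most 3", entry, len(fields))
		}
		for _, f := range fields {
			if !strings.HasPrefix(f, "re:") {
				continue // exact patterns and "*" need no compilation
			}
			if _, err := regexp.Compile(strings.TrimPrefix(f, "re:")); err != nil {
				return fmt.Errorf("entry %q: %w", entry, err)
			}
		}
	}
	return nil
}

func main() {
	for _, v := range []string{
		"copilot-swe-agent[bot];*;*",
		"[re:(?i)^l(ukasz)?gryglicki$;re:(?i)^l(ukasz)?gryglicki@;*||copilot-swe-agent[bot]]",
		"re:(unclosed;*",
	} {
		fmt.Printf("%q -> %v\n", v, validateValue(v))
	}
}
```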
