Commit 5a9689c

Merge pull request #4725 from linuxfoundation/unicron-4701-allow-bots-to-skip-cla
Add support for whitelisting bots
2 parents 054a8c8 + 8cc692e commit 5a9689c

18 files changed

+699
-22
lines changed

README.md

Lines changed: 4 additions & 0 deletions
@@ -61,6 +61,10 @@ The following diagram explains the EasyCLA architecture.
 
 ![CLA Architecture](.gitbook/assets/easycla-architecture-overview.png)
 
+## Bot Whitelisting
+
+For whitelisting bots please see the [Whitelisting Bots](WHITELISTING_BOTS.md) documentation.
+
 ## EasyCLA Release Process
 
 The following diagram illustrates the EasyCLA release process:

WHITELISTING_BOTS.md

Lines changed: 111 additions & 0 deletions
@@ -0,0 +1,111 @@
## Whitelisting Bots

You can allow specific bot users to automatically pass the CLA check.

This is configured at the GitHub organization level by setting the `skip_cla` property on the `cla-{stage}-github-orgs` DynamoDB table.

This property is a Map attribute that maps a repository pattern to a bot username pattern and an email pattern.

Each pattern is a string and can be one of three possible types:
- `"name"` - exact match for repository name, GitHub username, or email address.
- `"re:regexp"` - regular expression match for repository name, GitHub username, or email address.
- `"*"` - matches all.

Each entry has the form `"repository_pattern": "github_username_pattern;email_pattern"`.

There can be multiple entries under one GitHub organization's DynamoDB item.

Example:
```
{
  (...)
  "organization_name": {
    "S": "linuxfoundation"
  },
  "skip_cla": {
    "M": {
      "*": {
        "S": "copilot-swe-agent[bot];*"
      },
      "repo1": {
        "S": "re:vee?rendra;*"
      }
    }
  },
  (...)
}
```

The matching algorithm is as follows:
- First we check the repository name for an exact match. The repository name excludes the organization name, so for `https://github.com/linuxfoundation/easycla` it is just `easycla`. If we find an entry in `skip_cla` for `easycla`, that entry is used and we stop searching.
- If no exact match is found, we check for a regular expression match. Only keys starting with `re:` are considered. If we find a match, we use that entry and stop searching. Expressions are not anchored, so add `^` and `$` when the whole name must match.
- If no match is found, we check for a `*` entry. If it exists, we use that entry and stop searching.
- If no match is found, we don't skip the CLA check.
- Once we have the entry, it is in the following format: `github_username_pattern;email_pattern`.
- We check both the GitHub username and the email address against the patterns. The algorithm is the same: username and email patterns can be an exact match, `re:regexp`, or `*`.
- If both username and email match their patterns, we skip the CLA check. If the username or email is not set but its pattern is `*`, it still counts as a match.
- So setting the pattern to `username_pattern;*` means that we only check for a username match and accept any email.
- If we set `repo_pattern` to `*`, this configuration applies to all repositories in the organization. If there are also specific repository patterns, they are checked first.
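
The lookup order described in the list above (exact name, then `re:` keys, then `*`) can be sketched as follows. This is an illustration under stated assumptions rather than the project's implementation: the helper name `lookupConfig` is made up for the example, the `some-bot[bot]` value is hypothetical, and the `re:` keys are sorted before matching so that the result is reproducible when several expressions match. Sorting is an addition of this sketch.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// lookupConfig is a hypothetical helper: it returns the skip_cla value that
// applies to repo, and whether any entry was found.
func lookupConfig(skipCLA map[string]string, repo string) (string, bool) {
	// 1. Exact repository name.
	if v, ok := skipCLA[repo]; ok {
		return v, true
	}
	// 2. Keys prefixed with "re:". Sorting is an addition of this sketch so
	//    that the outcome does not depend on map iteration order.
	keys := make([]string, 0, len(skipCLA))
	for k := range skipCLA {
		if strings.HasPrefix(k, "re:") {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)
	for _, k := range keys {
		re, err := regexp.Compile(k[3:])
		if err != nil {
			continue // ignore invalid expressions
		}
		if re.MatchString(repo) {
			return skipCLA[k], true
		}
	}
	// 3. Wildcard fallback.
	v, ok := skipCLA["*"]
	return v, ok
}

func main() {
	skipCLA := map[string]string{
		"*":           "copilot-swe-agent[bot];*",
		"repo1":       "re:vee?rendra;*",
		"re:^easycla": "some-bot[bot];*", // hypothetical entry for illustration
	}
	for _, repo := range []string{"repo1", "easycla", "other"} {
		cfg, ok := lookupConfig(skipCLA, repo)
		fmt.Printf("%-8s -> %q (found=%v)\n", repo, cfg, ok)
	}
}
```

With this map, `repo1` resolves through the exact key, `easycla` through the `re:^easycla` key, and any other repository falls back to the `*` entry.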


There is a script that allows you to update the `skip_cla` property in the DynamoDB table. It is located in `utils/skip_cla_entry.sh`. You can run it like this:
- `` MODE=mode ./utils/skip_cla_entry.sh 'org-name' 'repo-pattern' 'github-username-pattern' 'email-pattern' ``.
- `` MODE=add-key ./utils/skip_cla_entry.sh 'sun-test-org' '*' 'copilot-swe-agent[bot]' '*' ``.

`MODE` can be one of:
- `put-item`: Overwrites/adds the entire `skip_cla` property. Needs all 4 arguments: org, repo, username and email.
- `add-key`: Adds or updates a key/value inside the `skip_cla` map (preserves other keys). Needs all 4 arguments.
- `delete-key`: Removes a key from the `skip_cla` map. Needs 2 arguments: org and repo.
- `delete-item`: Deletes the entire `skip_cla` item. Needs 1 argument: org.


You can also use the AWS CLI to update the `skip_cla` property. Example commands follow.

To add a new `skip_cla` entry (this replaces the whole `skip_cla` map if one already exists):

```
aws --profile "lfproduct-prod" --region "us-east-1" dynamodb update-item \
  --table-name "cla-prod-github-orgs" \
  --key '{"organization_name": {"S": "linuxfoundation"}}' \
  --update-expression 'SET skip_cla = :val' \
  --expression-attribute-values '{":val": {"M": {"re:^easycla":{"S":"copilot-swe-agent[bot];*"}}}}'
```

To add a new key to an existing `skip_cla` entry (or replace the existing key):

```
aws --profile "lfproduct-prod" --region "us-east-1" dynamodb update-item \
  --table-name "cla-prod-github-orgs" \
  --key '{"organization_name": {"S": "linuxfoundation"}}' \
  --update-expression "SET skip_cla.#repo = :val" \
  --expression-attribute-names '{"#repo": "re:^easycla"}' \
  --expression-attribute-values '{":val": {"S": "copilot-swe-agent[bot];*"}}'
```

To delete a key from an existing `skip_cla` entry:

```
aws --profile "lfproduct-prod" --region "us-east-1" dynamodb update-item \
  --table-name "cla-prod-github-orgs" \
  --key '{"organization_name": {"S": "linuxfoundation"}}' \
  --update-expression "REMOVE skip_cla.#repo" \
  --expression-attribute-names '{"#repo": "re:^easycla"}'
```

To delete the entire `skip_cla` entry:

```
aws --profile "lfproduct-prod" --region "us-east-1" dynamodb update-item \
  --table-name "cla-prod-github-orgs" \
  --key '{"organization_name": {"S": "linuxfoundation"}}' \
  --update-expression "REMOVE skip_cla"
```

To see a given organization's entry: `./utils/scan.sh github-orgs organization_name sun-test-org`.

Or using the AWS CLI:

```
aws --profile "lfproduct-prod" dynamodb scan --table-name "cla-prod-github-orgs" --filter-expression "contains(organization_name,:v)" --expression-attribute-values "{\":v\":{\"S\":\"linuxfoundation\"}}" --max-items 100 | jq -r '.Items'
```

cla-backend-go/events/event_data.go

Lines changed: 17 additions & 0 deletions
@@ -457,6 +457,23 @@ type CorporateSignatureSignedEventData struct {
 	SignatoryName string
 }
 
+// BypassCLAEventData event data model
+type BypassCLAEventData struct {
+	Repo   string
+	Config string
+	Actor  string
+}
+
+func (ed *BypassCLAEventData) GetEventDetailsString(args *LogEventArgs) (string, bool) {
+	data := fmt.Sprintf("repo='%s', config='%s', actor='%s'", ed.Repo, ed.Config, ed.Actor)
+	return data, true
+}
+
+func (ed *BypassCLAEventData) GetEventSummaryString(args *LogEventArgs) (string, bool) {
+	data := fmt.Sprintf("repo='%s', config='%s', actor='%s'", ed.Repo, ed.Config, ed.Actor)
+	return data, true
+}
+
 func (ed *CorporateSignatureSignedEventData) GetEventDetailsString(args *LogEventArgs) (string, bool) {
 	data := fmt.Sprintf("The signature was signed for the project %s and company %s by %s", args.ProjectName, ed.CompanyName, ed.SignatoryName)
 	if args.UserName != "" {

cla-backend-go/events/event_types.go

Lines changed: 2 additions & 0 deletions
@@ -99,4 +99,6 @@ const (
 
 	IndividualSignatureSigned = "individual.signature.signed"
 	CorporateSignatureSigned  = "corporate.signature.signed"
+
+	BypassCLA = "Bypass CLA"
 )

cla-backend-go/github/bots.go

Lines changed: 218 additions & 0 deletions
@@ -0,0 +1,218 @@
// Copyright The Linux Foundation and each contributor to CommunityBridge.
// SPDX-License-Identifier: MIT

package github

import (
	"fmt"
	"regexp"
	"strings"

	"github.com/linuxfoundation/easycla/cla-backend-go/events"
	"github.com/linuxfoundation/easycla/cla-backend-go/gen/v1/models"
	log "github.com/linuxfoundation/easycla/cla-backend-go/logging"
	"github.com/sirupsen/logrus"
)

// propertyMatches returns true if value matches the pattern.
// - "*" matches anything
// - "re:..." matches regex (value must be non-empty)
// - otherwise, exact match
func propertyMatches(pattern, value string) bool {
	f := logrus.Fields{
		"functionName": "github.propertyMatches",
		"pattern":      pattern,
		"value":        value,
	}
	if pattern == "*" {
		return true
	}
	if value == "" {
		return false
	}
	if strings.HasPrefix(pattern, "re:") {
		regex := pattern[3:]
		re, err := regexp.Compile(regex)
		if err != nil {
			log.WithFields(f).Debugf("Error in propertyMatches: bad regexp: %s, error: %v", regex, err)
			return false
		}
		return re.MatchString(value)
	}
	return value == pattern
}

// stripOrg removes the organization part from the repository name.
// If input is "org/repo", returns "repo". If no "/", returns input unchanged.
func stripOrg(repoFull string) string {
	idx := strings.Index(repoFull, "/")
	if idx >= 0 && idx+1 < len(repoFull) {
		return repoFull[idx+1:]
	}
	return repoFull
}

// isActorSkipped returns true if the given actor should be skipped according to the skip_cla config pattern.
// config format: "<username_pattern>;<email_pattern>"
// Actor.CommitAuthor.Login and Actor.CommitAuthor.Email should be *string, can be nil.
func isActorSkipped(actor *UserCommitSummary, config string) bool {
	f := logrus.Fields{
		"functionName": "github.isActorSkipped",
		"config":       config,
	}
	// Defensive: must contain at least one ';' (the split below happens at the first one)
	if !strings.Contains(config, ";") {
		log.WithFields(f).Debugf("Invalid skip_cla config format: %s, expected '<username_pattern>;<email_pattern>'", config)
		return false
	}
	parts := strings.SplitN(config, ";", 2)
	if len(parts) != 2 {
		return false
	}
	usernamePattern := parts[0]
	emailPattern := parts[1]
	var (
		username string
		email    string
	)
	if actor != nil && actor.CommitAuthor != nil && actor.CommitAuthor.Login != nil {
		username = *actor.CommitAuthor.Login
	}
	if actor != nil && actor.CommitAuthor != nil && actor.CommitAuthor.Email != nil {
		email = *actor.CommitAuthor.Email
	}

	return propertyMatches(usernamePattern, username) && propertyMatches(emailPattern, email)
}

// SkipWhitelistedBots checks whether the actors are whitelisted based on the skip_cla configuration.
// Returns two lists:
//   - actors still missing cla: actors who still need to sign the CLA after checking skip_cla
//   - whitelisted actors: actors who are skipped due to skip_cla configuration
// :param orgModel: The GitHub organization model instance.
// :param orgRepo: The repository name in the format 'org/repo'.
// :param actorsMissingCLA: List of UserCommitSummary objects representing actors who are missing CLA.
// :return: two arrays (actors still missing CLA, whitelisted actors)
// : in cla-{stage}-github-orgs table there can be a skip_cla field which is a dict with the following structure:
//
//	{
//	    "repo-name": "<username_pattern>;<email_pattern>",
//	    "re:repo-regexp": "<username_pattern>;<email_pattern>",
//	    "*": "<username_pattern>;<email_pattern>"
//	}
//
// where:
//   - repo-name is the exact repository name under given org (e.g., "my-repo" not "my-org/my-repo")
//   - re:repo-regexp is a regex pattern to match repository names
//   - * is a wildcard that applies to all repositories
//   - <username_pattern> is a GitHub username pattern (exact match or regex prefixed by re: or match all '*')
//   - <email_pattern> is a GitHub email pattern (exact match or regex prefixed by re: or match all '*')
// The username and email patterns are separated by a semicolon (;).
// If the skip_cla is not set, it will skip the whitelisted bots check.
func SkipWhitelistedBots(ev events.Service, orgModel *models.GithubOrganization, orgRepo, projectID string, actorsMissingCLA []*UserCommitSummary) ([]*UserCommitSummary, []*UserCommitSummary) {
	repo := stripOrg(orgRepo)
	f := logrus.Fields{
		"functionName": "github.SkipWhitelistedBots",
		"orgRepo":      orgRepo,
		"repo":         repo,
		"projectID":    projectID,
	}
	outActorsMissingCLA := []*UserCommitSummary{}
	whitelistedActors := []*UserCommitSummary{}

	skipCLA := orgModel.SkipCla
	if skipCLA == nil {
		log.WithFields(f).Debug("skip_cla is not set, skipping whitelisted bots check")
		return actorsMissingCLA, []*UserCommitSummary{}
	}

	var config string

	// 1. Exact match
	if val, ok := skipCLA[repo]; ok {
		config = val
		log.WithFields(f).Debugf("skip_cla config found for repo (exact hit): '%s'", config)
	}

	// 2. Regex match (if no exact hit); map iteration order is unspecified, so if several keys match, any one of them may be used
	if config == "" {
		log.WithFields(f).Debug("No skip_cla config found for repo, checking regex patterns")
		for k, v := range skipCLA {
			if !strings.HasPrefix(k, "re:") {
				continue
			}
			pattern := k[3:]
			re, err := regexp.Compile(pattern)
			if err != nil {
				log.WithFields(f).Warnf("Invalid regex in skip_cla: '%s': %+v", pattern, err)
				continue
			}
			if re.MatchString(repo) {
				config = v
				log.WithFields(f).Debugf("Found skip_cla config for repo via regex pattern: '%s'", config)
				break
			}
		}
	}

	// 3. Wildcard fallback
	if config == "" {
		if val, ok := skipCLA["*"]; ok {
			config = val
			log.WithFields(f).Debugf("No skip_cla config found for repo, using wildcard config: '%s'", config)
		}
	}

	// 4. No match
	if config == "" {
		log.WithFields(f).Debug("No skip_cla config found for repo, skipping whitelisted bots check")
		return actorsMissingCLA, []*UserCommitSummary{}
	}
	const nullStr = "(null)"

	for _, actor := range actorsMissingCLA {
		if isActorSkipped(actor, config) {
			if actor == nil {
				continue
			}
			id, login, username, email := nullStr, nullStr, nullStr, nullStr
			if actor.CommitAuthor != nil && actor.CommitAuthor.ID != nil {
				id = fmt.Sprintf("%v", *actor.CommitAuthor.ID)
			}
			if actor.CommitAuthor != nil && actor.CommitAuthor.Login != nil {
				login = *actor.CommitAuthor.Login
			}
			if actor.CommitAuthor != nil && actor.CommitAuthor.Name != nil {
				username = *actor.CommitAuthor.Name
			}
			if actor.CommitAuthor != nil && actor.CommitAuthor.Email != nil {
				email = *actor.CommitAuthor.Email
			}
			actorData := fmt.Sprintf("id='%v',login='%v',username='%v',email='%v'", id, login, username, email)
			msg := fmt.Sprintf(
				"Skipping CLA check for repo='%s', actor: %s due to skip_cla config: '%s'",
				orgRepo, actorData, config,
			)
			log.WithFields(f).Info(msg)
			eventData := events.BypassCLAEventData{
				Repo:   orgRepo,
				Config: config,
				Actor:  actorData,
			}
			ev.LogEvent(&events.LogEventArgs{
				EventType: events.BypassCLA,
				EventData: &eventData,
				UserID:    id,
				UserName:  login,
				ProjectID: projectID,
			})
			log.WithFields(f).Debugf("event logged")
			actor.Authorized = true
			whitelistedActors = append(whitelistedActors, actor)
		} else {
			outActorsMissingCLA = append(outActorsMissingCLA, actor)
		}
	}

	return outActorsMissingCLA, whitelistedActors
}
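
A few edge cases of the helpers in `bots.go` are easier to see in a standalone program. The sketch below copies `stripOrg` from the diff and isolates the config split of `isActorSkipped` in a helper named `splitConfig`. That helper name is an assumption made for this example and does not exist in the commit.

```go
package main

import (
	"fmt"
	"strings"
)

// stripOrg mirrors the helper from the diff: it drops everything up to and
// including the first "/", unless nothing would be left after it.
func stripOrg(repoFull string) string {
	idx := strings.Index(repoFull, "/")
	if idx >= 0 && idx+1 < len(repoFull) {
		return repoFull[idx+1:]
	}
	return repoFull
}

// splitConfig is a hypothetical helper that isolates the split step of
// isActorSkipped: the config is cut at the first ';' only.
func splitConfig(config string) (userPattern, emailPattern string, ok bool) {
	parts := strings.SplitN(config, ";", 2)
	if len(parts) != 2 {
		return "", "", false
	}
	return parts[0], parts[1], true
}

func main() {
	for _, in := range []string{"linuxfoundation/easycla", "easycla", "org/", "a/b/c"} {
		fmt.Printf("stripOrg(%q) = %q\n", in, stripOrg(in))
	}
	for _, in := range []string{"bot;*", "bot", "a;b;c", ";"} {
		u, e, ok := splitConfig(in)
		fmt.Printf("splitConfig(%q) = %q, %q, %v\n", in, u, e, ok)
	}
}
```

Two details stand out. A value with more than one semicolon, such as `a;b;c`, is accepted, and everything after the first semicolon becomes the email pattern. A value of just `;` yields two empty patterns, which `propertyMatches` never matches, so such an entry skips nobody.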
