Commit 4df87e1

Sync with main - minimal changes

Signed-off-by: Lukasz Gryglicki <[email protected]>
Assisted by [OpenAI](https://platform.openai.com/)
Assisted by [GitHub Copilot](https://github.com/features/copilot)
1 parent fd95923 commit 4df87e1

3 files changed: 47 additions, 2 deletions


COMMIT_AUTHORS_CACHING.md

Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
# EasyCLA: Author and Co-author Caching + Large-PR Support

- **Two-level caching** for authors and co-authors: a general identity cache, plus a per-project cache that also holds signature decisions.
- **GraphQL-based commit ingestion** that handles PRs with **250+ commits**.

---

## Why it matters
- Faster PR checks and `/easycla` re-runs.
- Lower DB/API load via memoized decisions.
- Stable, deterministic output and accurate status posting on the PR **head SHA**.

---

## Caching
- **General cache key**: `(author_id, lower(login), lower(email)) → (user | None)`
- **Per-project cache key**: `(project_id, author_id, lower(login), lower(email)) → (user | None, authorized, affiliated)`
- **TTL policy**: positive results are cached for **~24h**; negative or uncertain results use a **Quick TTL of 5m**.
- **Flow**: per-project cache → general cache → cold DB path. Results are stored back with the appropriate TTL.
- Thread-safe, with expired entries cleaned up periodically (once per hour).
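
The lookup flow above can be sketched roughly as follows. This is a minimal illustration, not the EasyCLA implementation: `TTLCache`, `cache_key`, `lookup`, `find_user`, and `check_signature` are hypothetical names, and treating "authorized" as the positive case for the per-project TTL is an assumption.

```python
import threading
import time

POSITIVE_TTL = 24 * 60 * 60  # ~24h for positive results
QUICK_CACHE_TTL = 300        # 5 minutes for negative/uncertain results


class TTLCache:
    """Thread-safe in-memory cache where every entry has its own expiry."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._lock = threading.Lock()
        self._clock = clock

    def get(self, key):
        """Return (hit, value); expired entries count as misses."""
        with self._lock:
            entry = self._data.get(key)
            if entry is None:
                return False, None
            value, expires_at = entry
            if expires_at <= self._clock():
                del self._data[key]
                return False, None
            return True, value

    def set(self, key, value, ttl):
        with self._lock:
            self._data[key] = (value, self._clock() + ttl)

    def cleanup(self):
        """Drop expired entries; intended to run periodically (e.g. hourly)."""
        now = self._clock()
        with self._lock:
            expired = [k for k, (_, exp) in self._data.items() if exp <= now]
            for k in expired:
                del self._data[k]
            return len(expired)


def cache_key(author_id, login, email, project_id=None):
    """Build the general key, or the per-project key when project_id is given."""
    base = (author_id, (login or "").lower(), (email or "").lower())
    return base if project_id is None else (project_id,) + base


def lookup(project_cache, general_cache, project_id, author, find_user, check_signature):
    """Per-project cache -> general cache -> cold path (find_user / check_signature)."""
    author_id, login, email = author
    pkey = cache_key(author_id, login, email, project_id)
    hit, decision = project_cache.get(pkey)
    if hit:
        return decision

    gkey = cache_key(author_id, login, email)
    hit, user = general_cache.get(gkey)
    if not hit:
        user = find_user(author_id, login, email)  # cold DB path (hypothetical callable)
        general_cache.set(gkey, user, POSITIVE_TTL if user is not None else QUICK_CACHE_TTL)

    if user is None:
        authorized, affiliated = False, False
    else:
        authorized, affiliated = check_signature(project_id, user)
    decision = (user, authorized, affiliated)
    # Assumption: only an authorized result is "positive" enough for the long TTL.
    project_cache.set(pkey, decision, POSITIVE_TTL if authorized else QUICK_CACHE_TTL)
    return decision
```

Caching `None` under the short TTL is what lets a user who has just signed be picked up within about five minutes instead of waiting out the 24-hour window.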

---

## Large PR (250+) support
- Switch to **GitHub GraphQL** for commits (`pageSize=100`) with cursor paging.
- Parallel processing via thread pool; co-authors parsed from **commit messages** (`Co-authored-by:`).
- Final actor lists are **de-duplicated** and **sorted** (login, name, email, sha) for stable comments.
- PR comments are **edited only when the normalized body changes** (prevents churn & size bloat).
- Commit statuses are always posted to the **true PR head SHA**.
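
As an illustration of the co-author parsing, stable ordering, and comment-churn points above, here is a small sketch. All function names are hypothetical, and both the de-duplication key (login, name, email, compared case-insensitively) and the whitespace-only comment normalization are assumptions rather than the backend's actual rules.

```python
import re

# One "Co-authored-by: Name <email>" trailer per line, matched case-insensitively.
CO_AUTHOR_RE = re.compile(
    r"^co-authored-by:[ \t]*(?P<name>[^<\n]*?)[ \t]*<(?P<email>[^>\n]+)>[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def parse_co_authors(message):
    """Return (name, lowercased email) pairs found in a commit message."""
    return [
        (m.group("name"), m.group("email").strip().lower())
        for m in CO_AUTHOR_RE.finditer(message)
    ]


def stable_actors(actors):
    """De-duplicate and sort actor dicts so repeated runs give identical output.

    Assumption: entries are duplicates when login, name and email match
    case-insensitively; the entry with the smallest sha is kept.
    """
    def sort_key(actor):
        return tuple((actor.get(f) or "").lower() for f in ("login", "name", "email", "sha"))

    seen, result = set(), []
    for actor in sorted(actors, key=sort_key):
        identity = sort_key(actor)[:3]
        if identity not in seen:
            seen.add(identity)
            result.append(actor)
    return result


def normalize_body(body):
    """Hypothetical normalization: drop trailing whitespace and outer blank lines."""
    return "\n".join(line.rstrip() for line in body.strip().splitlines())


def comment_needs_edit(existing_body, new_body):
    """Edit the PR comment only when the normalized text actually differs."""
    return normalize_body(existing_body) != normalize_body(new_body)
```

Because the actor list is sorted before rendering, two runs over the same commits produce the same comment body, so `comment_needs_edit` returns `False` and no edit is issued.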

---

## Operational notes
- Expect noticeable **latency reduction** on large PRs and repeated checks.
- Fallbacks remain safe; unknown users land in an “Unknown” bucket with guidance.
- No behavior change to the core signing rules, only faster execution.

---

## Quick constants
- `QUICK_CACHE_TTL = 300` seconds (negative/uncertain states).
- Default positive cache TTL ≈ **24 hours**.
- GraphQL: `pageSize=100`, parallel workers tuned for throughput.

cla-backend-go/github/github_repository.go

Lines changed: 2 additions & 2 deletions
@@ -1776,7 +1776,7 @@ func getCommentBadge(allSigned bool, signURL string, missingUserId, managerAppro
         badgeURL = fmt.Sprintf("%s/cla-signed.svg%s", CLALogoURL, svgVersion)
         badgeHyperLink = fmt.Sprintf("%s/#/?version=2", CLALandingPage)
         alt = "CLA Signed"
-        return fmt.Sprintf(`<a href="%s"><img src="%s" alt="%s" align="left" height="28" width="328" >`, badgeHyperLink, badgeURL, alt)
+        return fmt.Sprintf(`<a href="%s"><img src="%s" alt="%s" align="left" height="28" width="328" ></a>`, badgeHyperLink, badgeURL, alt)
     }
     badgeHyperLink = signURL
     if missingUserId {
@@ -1790,7 +1790,7 @@ func getCommentBadge(allSigned bool, signURL string, missingUserId, managerAppro
         alt = "CLA Not Signed"
     }
 
-    text = fmt.Sprintf(`<a href="%s"><img src="%s" alt="%s" align="left" height="28" width="328" >`, badgeHyperLink, badgeURL, alt)
+    text = fmt.Sprintf(`<a href="%s"><img src="%s" alt="%s" align="left" height="28" width="328" ></a>`, badgeHyperLink, badgeURL, alt)
     return fmt.Sprintf("%s<br/>", text)
 }
 

cla-backend/cla/models/github_models.py

Lines changed: 2 additions & 0 deletions
@@ -2067,6 +2067,8 @@ def pygithub_graphql(g, query: str, variables: dict | None = None):
     Works on older PyGithub versions lacking Github.graphql().
     """
     try:
+        # LG: note that this uses internal PyGithub API - may break in future versions:
+        # g._Github__requester.requestJsonAndCheck
         headers, data = g._Github__requester.requestJsonAndCheck(
             "POST",
             "/graphql",

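
The `pygithub_graphql` helper touched in the last hunk sends GraphQL requests; the cursor paging described in COMMIT_AUTHORS_CACHING.md (`pageSize=100`) could be layered on top of such a helper roughly as below. This is a sketch under assumptions: `iter_pr_commits` and the `run_query` callable are hypothetical, `run_query(query, variables)` is assumed to return the decoded JSON response with a top-level `data` key, and the selected fields are illustrative rather than the query the backend actually sends.

```python
COMMITS_QUERY = """
query($owner: String!, $name: String!, $number: Int!, $pageSize: Int!, $cursor: String) {
  repository(owner: $owner, name: $name) {
    pullRequest(number: $number) {
      commits(first: $pageSize, after: $cursor) {
        pageInfo { hasNextPage endCursor }
        nodes {
          commit {
            oid
            message
            author { name email user { login databaseId } }
          }
        }
      }
    }
  }
}
"""


def iter_pr_commits(run_query, owner, name, number, page_size=100):
    """Yield every commit of a PR, following endCursor until hasNextPage is false."""
    cursor = None  # None requests the first page
    while True:
        variables = {
            "owner": owner,
            "name": name,
            "number": number,
            "pageSize": page_size,
            "cursor": cursor,
        }
        response = run_query(COMMITS_QUERY, variables)
        commits = response["data"]["repository"]["pullRequest"]["commits"]
        for node in commits["nodes"]:
            yield node["commit"]
        page_info = commits["pageInfo"]
        if not page_info["hasNextPage"]:
            return
        cursor = page_info["endCursor"]
```

With 100 commits per page, a 250-commit PR takes three requests; the generator keeps memory flat and lets the caller hand commits to a worker pool as they arrive.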