|
294 | 294 | <parsing> |
295 | 295 | <regex>Detect PR references as: /#(\d{1,7})/g</regex> |
296 | 296 | <heuristics> |
297 | | - - If a bullet has no explicit PR number, attempt fuzzy matching to PR titles |
298 | | - - If ambiguous, mark as "unlinked" and exclude by default pending user choice |
299 | | - </heuristics> |
| 297 | + Purpose: Count changelog bullets without explicit PR numbers as "referenced" when they confidently map to a PR. |
| 298 | + Matching algorithm: |
| 299 | + - Normalize both bullet text and PR titles: |
| 300 | + - lowercase; remove punctuation; strip prefixes like "fix:", "feat:", "add:", "improve:", "chore:", "refactor:" |
| 301 | + - remove parentheticals such as "(thanks @user!)", "(PR by @user)", "(#1234 ...)" |
| 302 | + - collapse whitespace |
| 303 | + - Tokenize and compute token-overlap score = |intersection(tokens)| / |union(tokens)| |
| 304 | + - Author signal: if bullet contains "thanks @user", "by @user", or "PR by @user" and that user equals the PR author or credited issue reporter, add +0.20 to score |
| 305 | + - Keyword boost: +0.05 when provider/model/domain keywords (e.g., OpenAI, Claude, Grok, Chutes, Qwen, LongCat, etc.) appear in both |
| 306 | + Confidence thresholds: |
| 307 | + - score ≥ 0.65 (after boosts) → linked (confident). Treat as changelog-referenced. |
| 308 | + - 0.45 ≤ score < 0.65 OR multiple candidates within 0.05 → ambiguous (needs review) |
| 309 | + - score < 0.45 → unlinked |
| 310 | + Tie-breakers: higher score; if within 0.02 then same author; then closer merge date to release date; then lowest PR number |
| 311 | + Edge case: If bullet credits exactly one username and exactly one PR in the window has that author, accept with score ≥ 0.50 (confidence="author-boost") |
| 312 | + Implementation notes: |
| 313 | + - Match only against PRs fetched for the version's date window |
| 314 | + - Persist mapping bullet_text → { prNumber, confidenceScore, rationaleSignals[] } and use it to compute linked/ambiguous/unlinked counts |
| 315 | +</heuristics> |
300 | 316 | </parsing> |
301 | 317 | </step> |
302 | 318 |
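Note on the hunk above: the normalization and token-overlap steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation the prompt mandates. The names `PREFIXES`, `normalize`, and `token_overlap` are hypothetical. The sketch removes every parenthetical (the spec only lists examples), and it strips prefixes before punctuation so the trailing colon is still there to match on.

```python
import re
import string

# Prefixes listed in the heuristics; treating this list as complete is an assumption.
PREFIXES = ("fix:", "feat:", "add:", "improve:", "chore:", "refactor:")

# Translation table that turns every ASCII punctuation character into a space.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def normalize(text: str) -> str:
    """Lowercase, drop parentheticals and one leading prefix, remove punctuation."""
    text = text.lower()
    # Remove parentheticals such as "(thanks @user!)" or "(#1234 ...)".
    text = re.sub(r"\([^)]*\)", " ", text)
    # Drop a leading bullet marker, if any.
    text = text.lstrip(" \t-*")
    for prefix in PREFIXES:
        if text.startswith(prefix):
            text = text[len(prefix):]
            break
    text = text.translate(_PUNCT_TO_SPACE)
    # Collapse whitespace.
    return " ".join(text.split())


def token_overlap(bullet: str, pr_title: str) -> float:
    """|intersection(tokens)| / |union(tokens)| over the normalized strings."""
    a = set(normalize(bullet).split())
    b = set(normalize(pr_title).split())
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)
```

For example, `token_overlap("- Fix: Add Grok support (thanks @alice!)", "feat: add support for Grok models")` compares {add, grok, support} with {add, support, for, grok, models}, giving 3/5 = 0.6 before any boosts.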
|
|
649 | 665 | <parsing> |
650 | 666 | <regex>Detect PR references as: /#(\d{1,7})/g</regex> |
651 | 667 | <heuristics> |
652 | | - - If a bullet has no explicit PR number, attempt fuzzy matching to PR titles |
653 | | - - If ambiguous, mark as "unlinked" and exclude by default pending user choice |
654 | | - </heuristics> |
| 668 | + Purpose: Count changelog bullets without explicit PR numbers as "referenced" when they confidently map to a PR. |
| 669 | + Matching algorithm: |
| 670 | + - Normalize both bullet text and PR titles: |
| 671 | + - lowercase; remove punctuation; strip prefixes like "fix:", "feat:", "add:", "improve:", "chore:", "refactor:" |
| 672 | + - remove parentheticals such as "(thanks @user!)", "(PR by @user)", "(#1234 ...)" |
| 673 | + - collapse whitespace |
| 674 | + - Tokenize and compute token-overlap score = |intersection(tokens)| / |union(tokens)| |
| 675 | + - Author signal: if bullet contains "thanks @user", "by @user", or "PR by @user" and that user equals the PR author or credited issue reporter, add +0.20 to score |
| 676 | + - Keyword boost: +0.05 when provider/model/domain keywords (e.g., OpenAI, Claude, Grok, Chutes, Qwen, LongCat, etc.) appear in both |
| 677 | + Confidence thresholds: |
| 678 | + - score ≥ 0.65 (after boosts) → linked (confident). Treat as changelog-referenced. |
| 679 | + - 0.45 ≤ score < 0.65 OR multiple candidates within 0.05 → ambiguous (needs review) |
| 680 | + - score < 0.45 → unlinked |
| 681 | + Tie-breakers: higher score; if within 0.02 then same author; then closer merge date to release date; then lowest PR number |
| 682 | + Edge case: If bullet credits exactly one username and exactly one PR in the window has that author, accept with score ≥ 0.50 (confidence="author-boost") |
| 683 | + Implementation notes: |
| 684 | + - Match only against PRs fetched for the version's date window |
| 685 | + - Persist mapping bullet_text → { prNumber, confidenceScore, rationaleSignals[] } and use it to compute linked/ambiguous/unlinked counts |
| 686 | +</heuristics> |
655 | 687 | </parsing> |
656 | 688 | </step> |
657 | 689 |
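Note on the hunk above: one way to read the boosts and confidence thresholds is sketched below. The function names are hypothetical. The sketch also assumes that "multiple candidates within 0.05" means the runner-up scores within 0.05 of the best candidate, which the spec does not state explicitly.

```python
from typing import List

LINKED_MIN = 0.65      # score >= 0.65 after boosts -> linked
AMBIGUOUS_MIN = 0.45   # 0.45 <= score < 0.65 -> ambiguous
NEAR_TIE = 0.05        # assumed meaning of "multiple candidates within 0.05"


def boosted_score(overlap: float, author_match: bool, shared_keyword: bool) -> float:
    """Apply the +0.20 author signal and +0.05 keyword boost to a token-overlap score."""
    score = overlap
    if author_match:
        score += 0.20
    if shared_keyword:
        score += 0.05
    return score


def classify(scores: List[float]) -> str:
    """Classify one bullet from the boosted scores of all of its candidate PRs."""
    if not scores:
        return "unlinked"
    ranked = sorted(scores, reverse=True)
    best = ranked[0]
    if best < AMBIGUOUS_MIN:
        return "unlinked"
    if best < LINKED_MIN:
        return "ambiguous"
    # A confident best score is still ambiguous when a runner-up is too close.
    if len(ranked) > 1 and best - ranked[1] <= NEAR_TIE:
        return "ambiguous"
    return "linked"
```

Scores that land exactly on a threshold are sensitive to floating-point rounding once boosts are added, so an implementation may want to round before comparing.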
|
|
993 | 1025 | <parsing> |
994 | 1026 | <regex>Detect PR references as: /#(\d{1,7})/g</regex> |
995 | 1027 | <heuristics> |
996 | | - - If a bullet has no explicit PR number, attempt fuzzy matching to PR titles |
997 | | - - If ambiguous, mark as "unlinked" and exclude by default pending user choice |
998 | | - </heuristics> |
| 1028 | + Purpose: Count changelog bullets without explicit PR numbers as "referenced" when they confidently map to a PR. |
| 1029 | + Matching algorithm: |
| 1030 | + - Normalize both bullet text and PR titles: |
| 1031 | + - lowercase; remove punctuation; strip prefixes like "fix:", "feat:", "add:", "improve:", "chore:", "refactor:" |
| 1032 | + - remove parentheticals such as "(thanks @user!)", "(PR by @user)", "(#1234 ...)" |
| 1033 | + - collapse whitespace |
| 1034 | + - Tokenize and compute token-overlap score = |intersection(tokens)| / |union(tokens)| |
| 1035 | + - Author signal: if bullet contains "thanks @user", "by @user", or "PR by @user" and that user equals the PR author or credited issue reporter, add +0.20 to score |
| 1036 | + - Keyword boost: +0.05 when provider/model/domain keywords (e.g., OpenAI, Claude, Grok, Chutes, Qwen, LongCat, etc.) appear in both |
| 1037 | + Confidence thresholds: |
| 1038 | + - score ≥ 0.65 (after boosts) → linked (confident). Treat as changelog-referenced. |
| 1039 | + - 0.45 ≤ score < 0.65 OR multiple candidates within 0.05 → ambiguous (needs review) |
| 1040 | + - score < 0.45 → unlinked |
| 1041 | + Tie-breakers: higher score; if within 0.02 then same author; then closer merge date to release date; then lowest PR number |
| 1042 | + Edge case: If bullet credits exactly one username and exactly one PR in the window has that author, accept with score ≥ 0.50 (confidence="author-boost") |
| 1043 | + Implementation notes: |
| 1044 | + - Match only against PRs fetched for the version's date window |
| 1045 | + - Persist mapping bullet_text → { prNumber, confidenceScore, rationaleSignals[] } and use it to compute linked/ambiguous/unlinked counts |
| 1046 | +</heuristics> |
999 | 1047 | </parsing> |
1000 | 1048 | </step> |
1001 | 1049 |
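Note on the hunk above: the tie-breaker chain (higher score; if within 0.02 then same author; then closer merge date; then lowest PR number) maps naturally onto a composite sort key. The `Candidate` type and `pick_best` function below are hypothetical, and the sketch assumes "within 0.02" is measured against the top score.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Candidate:
    pr_number: int
    score: float          # score after boosts
    author_matches: bool  # bullet credits this PR's author
    merged_on: date


def pick_best(cands: List[Candidate], release_date: date) -> Optional[Candidate]:
    """Return the preferred candidate, applying the tie-breakers in order."""
    if not cands:
        return None
    top = max(c.score for c in cands)
    # Only candidates within 0.02 of the top score compete on the later tie-breakers.
    near = [c for c in cands if top - c.score <= 0.02]
    return min(
        near,
        key=lambda c: (
            not c.author_matches,                     # same author first
            abs((release_date - c.merged_on).days),   # then closest merge date
            c.pr_number,                              # then lowest PR number
        ),
    )
```

Using a tuple key keeps the precedence explicit: a later criterion is consulted only when all earlier ones compare equal.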
|
@@ -1335,9 +1383,25 @@ fi |
1335 | 1383 | <parsing> |
1336 | 1384 | <regex>Detect PR references as: /#(\d{1,7})/g</regex> |
1337 | 1385 | <heuristics> |
1338 | | - - If a bullet has no explicit PR number, attempt fuzzy matching to PR titles |
1339 | | - - If ambiguous, mark as "unlinked" and exclude by default pending user choice |
1340 | | - </heuristics> |
| 1386 | + Purpose: Count changelog bullets without explicit PR numbers as "referenced" when they confidently map to a PR. |
| 1387 | + Matching algorithm: |
| 1388 | + - Normalize both bullet text and PR titles: |
| 1389 | + - lowercase; remove punctuation; strip prefixes like "fix:", "feat:", "add:", "improve:", "chore:", "refactor:" |
| 1390 | + - remove parentheticals such as "(thanks @user!)", "(PR by @user)", "(#1234 ...)" |
| 1391 | + - collapse whitespace |
| 1392 | + - Tokenize and compute token-overlap score = |intersection(tokens)| / |union(tokens)| |
| 1393 | + - Author signal: if bullet contains "thanks @user", "by @user", or "PR by @user" and that user equals the PR author or credited issue reporter, add +0.20 to score |
| 1394 | + - Keyword boost: +0.05 when provider/model/domain keywords (e.g., OpenAI, Claude, Grok, Chutes, Qwen, LongCat, etc.) appear in both |
| 1395 | + Confidence thresholds: |
| 1396 | + - score ≥ 0.65 (after boosts) → linked (confident). Treat as changelog-referenced. |
| 1397 | + - 0.45 ≤ score < 0.65 OR multiple candidates within 0.05 → ambiguous (needs review) |
| 1398 | + - score < 0.45 → unlinked |
| 1399 | + Tie-breakers: higher score; if within 0.02 then same author; then closer merge date to release date; then lowest PR number |
| 1400 | + Edge case: If bullet credits exactly one username and exactly one PR in the window has that author, accept with score ≥ 0.50 (confidence="author-boost") |
| 1401 | + Implementation notes: |
| 1402 | + - Match only against PRs fetched for the version's date window |
| 1403 | + - Persist mapping bullet_text → { prNumber, confidenceScore, rationaleSignals[] } and use it to compute linked/ambiguous/unlinked counts |
| 1404 | +</heuristics> |
1341 | 1405 | </parsing> |
1342 | 1406 | </step> |
1343 | 1407 |
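Note on the hunk above: the explicit-reference regex and the single-author edge case might look like the sketch below. The JavaScript pattern `/#(\d{1,7})/g` is ported to Python's `re` module. The `CREDIT` pattern, the function names, and the dictionary shapes are assumptions made for illustration, as is the reading that "score ≥ 0.50" refers to the candidate's score after boosts.

```python
import re
from typing import Dict, List, Optional

PR_REF = re.compile(r"#(\d{1,7})")
# Covers "thanks @user", "by @user", and "PR by @user"; the username charset is an assumption.
CREDIT = re.compile(r"\b(?:thanks|by)\s+@([A-Za-z0-9-]+)", re.IGNORECASE)


def explicit_pr_numbers(bullet: str) -> List[int]:
    """All explicit PR references in a changelog bullet, in order of appearance."""
    return [int(n) for n in PR_REF.findall(bullet)]


def author_boost_match(
    bullet: str,
    author_by_pr: Dict[int, str],   # PR number -> author login, for PRs in the date window
    score_by_pr: Dict[int, float],  # PR number -> score after boosts
) -> Optional[int]:
    """Return the PR accepted by the edge case, or None when it does not apply."""
    users = {u.lower() for u in CREDIT.findall(bullet)}
    if len(users) != 1:
        return None
    (user,) = users
    authored = [n for n, author in author_by_pr.items() if author.lower() == user]
    if len(authored) != 1:
        return None
    pr = authored[0]
    return pr if score_by_pr.get(pr, 0.0) >= 0.50 else None
```

One quirk of the pattern worth knowing: on a reference with more than seven digits, `\d{1,7}` matches only the first seven, so a word-boundary check may be worth adding if such input is possible.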
|
@@ -1687,9 +1751,25 @@ fi |
1687 | 1751 | <parsing> |
1688 | 1752 | <regex>Detect PR references as: /#(\d{1,7})/g</regex> |
1689 | 1753 | <heuristics> |
1690 | | - - If a bullet has no explicit PR number, attempt fuzzy matching to PR titles |
1691 | | - - If ambiguous, mark as "unlinked" and exclude by default pending user choice |
1692 | | - </heuristics> |
| 1754 | + Purpose: Count changelog bullets without explicit PR numbers as "referenced" when they confidently map to a PR. |
| 1755 | + Matching algorithm: |
| 1756 | + - Normalize both bullet text and PR titles: |
| 1757 | + - lowercase; remove punctuation; strip prefixes like "fix:", "feat:", "add:", "improve:", "chore:", "refactor:" |
| 1758 | + - remove parentheticals such as "(thanks @user!)", "(PR by @user)", "(#1234 ...)" |
| 1759 | + - collapse whitespace |
| 1760 | + - Tokenize and compute token-overlap score = |intersection(tokens)| / |union(tokens)| |
| 1761 | + - Author signal: if bullet contains "thanks @user", "by @user", or "PR by @user" and that user equals the PR author or credited issue reporter, add +0.20 to score |
| 1762 | + - Keyword boost: +0.05 when provider/model/domain keywords (e.g., OpenAI, Claude, Grok, Chutes, Qwen, LongCat, etc.) appear in both |
| 1763 | + Confidence thresholds: |
| 1764 | + - score ≥ 0.65 (after boosts) → linked (confident). Treat as changelog-referenced. |
| 1765 | + - 0.45 ≤ score < 0.65 OR multiple candidates within 0.05 → ambiguous (needs review) |
| 1766 | + - score < 0.45 → unlinked |
| 1767 | + Tie-breakers: higher score; if within 0.02 then same author; then closer merge date to release date; then lowest PR number |
| 1768 | + Edge case: If bullet credits exactly one username and exactly one PR in the window has that author, accept with score ≥ 0.50 (confidence="author-boost") |
| 1769 | + Implementation notes: |
| 1770 | + - Match only against PRs fetched for the version's date window |
| 1771 | + - Persist mapping bullet_text → { prNumber, confidenceScore, rationaleSignals[] } and use it to compute linked/ambiguous/unlinked counts |
| 1772 | +</heuristics> |
1693 | 1773 | </parsing> |
1694 | 1774 | </step> |
1695 | 1775 |
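Note on the hunk above: the persisted mapping and the derived counts could be sketched as below. The field names `prNumber`, `confidenceScore`, and `rationaleSignals` come from the spec. The signal strings `"near-tie"` and `"author-boost"`, and the `status`/`summarize` helpers, are hypothetical conventions introduced here so that ambiguity and the edge case survive serialization.

```python
import json
from collections import Counter
from typing import Any, Dict

Entry = Dict[str, Any]  # { "prNumber", "confidenceScore", "rationaleSignals" }


def status(entry: Entry) -> str:
    """Derive linked / ambiguous / unlinked from one persisted entry."""
    score = entry["confidenceScore"]
    signals = entry["rationaleSignals"]
    if entry["prNumber"] is None or score < 0.45:
        return "unlinked"
    # Edge case from the spec: a unique credited author is accepted at score >= 0.50.
    if "author-boost" in signals and score >= 0.50:
        return "linked"
    if score < 0.65 or "near-tie" in signals:
        return "ambiguous"
    return "linked"


def summarize(mapping: Dict[str, Entry]) -> Dict[str, int]:
    """Count bullets per status, always reporting all three keys."""
    tally = Counter(status(e) for e in mapping.values())
    return {k: tally.get(k, 0) for k in ("linked", "ambiguous", "unlinked")}


mapping = {
    "Add Grok support": {"prNumber": 10, "confidenceScore": 0.80, "rationaleSignals": ["token-overlap"]},
    "Improve retries": {"prNumber": 11, "confidenceScore": 0.55, "rationaleSignals": ["author-boost"]},
    "Fix flaky test": {"prNumber": 12, "confidenceScore": 0.70, "rationaleSignals": ["near-tie"]},
    "Misc cleanup": {"prNumber": None, "confidenceScore": 0.10, "rationaleSignals": []},
}

# Round-trip through JSON to show the mapping persists without loss.
restored = json.loads(json.dumps(mapping, ensure_ascii=False))
report = summarize(restored)
```

Keeping the rationale signals alongside the score lets a reviewer see why a bullet was linked without recomputing the match.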
|
|