Commit fc9261e

Merge remote-tracking branch 'upstream/main' into feature/engine_extensions
2 parents b7fca22 + 8a4fd93 commit fc9261e

85 files changed, +5398 -2774 lines changed

.claude/commands/dedupe.md

Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
+---
+allowed-tools: Bash(gh:*), Bash(./scripts/comment-on-duplicates.sh:*)
+description: Find duplicate GitHub issues
+---
+
+Find up to 3 likely duplicate issues for a given GitHub issue.
+
+Follow these steps precisely:
+
+1. Use `gh issue view <number>` to read the issue. If the issue is closed, or is broad product feedback without a specific bug/feature request, or already has a duplicate detection comment (containing `<!-- duplicate-detection -->`), stop and report why you are not proceeding.
+
+2. Summarize the issue's core problem in 2-3 sentences. Identify the key terms, error messages, and affected components.
+
+3. Search for potential duplicates using **at least 3 different search strategies**. Run these searches in parallel. **Only consider issues with a lower issue number** (older issues) as potential originals — skip any result with a number >= the current issue. Also skip issues already labeled `duplicate`.
+   - `gh search issues "<exact error message or key phrase>" --repo $GITHUB_REPOSITORY --state open -- -label:duplicate --limit 15 --json number,title | jq '[.[] | select(.number < <current-issue-number>)]'`
+   - `gh search issues "<component or feature keywords>" --repo $GITHUB_REPOSITORY --state open -- -label:duplicate --limit 15 --json number,title | jq '[.[] | select(.number < <current-issue-number>)]'`
+   - `gh search issues "<alternate description of the problem>" --repo $GITHUB_REPOSITORY --state open -- -label:duplicate --limit 15 --json number,title | jq '[.[] | select(.number < <current-issue-number>)]'`
+   - `gh search issues "<key terms>" --repo $GITHUB_REPOSITORY --state all -- -label:duplicate --limit 10 --json number,title | jq '[.[] | select(.number < <current-issue-number>)]'` (include closed issues for reference)
+
+4. For each candidate issue that looks like a potential match, read it with `gh issue view <number>` to verify it is truly about the same problem. Filter out false positives — issues that merely share keywords but describe different problems.
+
+5. If you find 1-3 genuine duplicates, post the result using the comment script:
+   ```
+   ./scripts/comment-on-duplicates.sh --base-issue <issue-number> --potential-duplicates <dup1> [dup2] [dup3]
+   ```
+
+6. If no genuine duplicates are found, report that no duplicates were detected and take no further action.
+
+Important notes:
+- Only flag issues as duplicates when you are confident they describe the **same underlying problem**
+- Prefer open issues as duplicates, but closed issues can be referenced too
+- Do not flag the issue as a duplicate of itself
+- The base issue number is the last part of the issue reference (e.g., for `owner/repo/issues/42`, the number is `42`)
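
Editorial note (not part of the commit): step 3 of the command runs several searches and keeps only issues older than the current one. The sketch below shows one way that merge-and-filter rule could be expressed in JavaScript. The function name `filterCandidates`, its parameters, and the sample data are hypothetical; only the `number` and `title` fields come from the `--json number,title` output named in the command.

```javascript
// Hypothetical helper, not part of the commit: merges the parsed JSON output of
// several `gh search issues --json number,title` runs and keeps older issues only.
function filterCandidates(resultSets, currentIssueNumber, limit = 15) {
  const seen = new Map();
  for (const results of resultSets) {
    for (const issue of results) {
      // Step 3 rule: only a lower-numbered (older) issue can be the original.
      // This also excludes the current issue itself.
      if (issue.number >= currentIssueNumber) continue;
      // Several search strategies may return the same issue; keep it once.
      if (!seen.has(issue.number)) seen.set(issue.number, issue);
    }
  }
  // Oldest first, capped so the manual verification step stays small.
  return [...seen.values()].sort((a, b) => a.number - b.number).slice(0, limit);
}

// Made-up sample data for illustration.
const candidates = filterCandidates(
  [
    [{ number: 12, title: "Crash on empty query" }, { number: 57, title: "Newer report" }],
    [{ number: 12, title: "Crash on empty query" }, { number: 30, title: "Parser NPE" }],
  ],
  42
);
console.log(candidates.map((c) => c.number)); // [ 12, 30 ]
```

Each surviving candidate would still need the manual check from step 4, since a keyword match alone does not establish a duplicate.
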

Lines changed: 132 additions & 0 deletions
@@ -0,0 +1,132 @@
+name: Auto-close duplicate issues
+
+on:
+  schedule:
+    - cron: "0 9 * * *"
+  workflow_dispatch:
+
+permissions:
+  issues: write
+
+jobs:
+  auto-close-duplicates:
+    runs-on: ubuntu-latest
+    timeout-minutes: 10
+    steps:
+      - name: Close stale duplicate issues
+        uses: actions/github-script@v7
+        env:
+          GRACE_DAYS: ${{ vars.DUPLICATE_GRACE_DAYS || '7' }}
+        with:
+          script: |
+            const { owner, repo } = context.repo;
+            const graceDays = parseInt(process.env.GRACE_DAYS, 10) || 7;
+            const GRACE_PERIOD_MS = graceDays * 24 * 60 * 60 * 1000;
+            const now = Date.now();
+
+            // Find all open issues with the duplicate label
+            const issues = await github.paginate(github.rest.issues.listForRepo, {
+              owner,
+              repo,
+              state: 'open',
+              labels: 'duplicate',
+              per_page: 100,
+            });
+
+            console.log(`Found ${issues.length} open issues with duplicate label`);
+
+            let closedCount = 0;
+
+            for (const issue of issues) {
+              console.log(`Processing issue #${issue.number}: ${issue.title}`);
+
+              // Get comments to find the duplicate detection comment
+              const comments = await github.rest.issues.listComments({
+                owner,
+                repo,
+                issue_number: issue.number,
+                per_page: 100,
+              });
+
+              // Find the duplicate detection comment (posted by our script)
+              const dupeComments = comments.data.filter(c =>
+                c.body.includes('<!-- duplicate-detection -->')
+              );
+
+              if (dupeComments.length === 0) {
+                console.log(` No duplicate detection comment found, skipping`);
+                continue;
+              }
+
+              const lastDupeComment = dupeComments[dupeComments.length - 1];
+              const dupeCommentAge = now - new Date(lastDupeComment.created_at).getTime();
+
+              if (dupeCommentAge < GRACE_PERIOD_MS) {
+                const daysLeft = ((GRACE_PERIOD_MS - dupeCommentAge) / (24 * 60 * 60 * 1000)).toFixed(1);
+                console.log(` Duplicate comment is too recent (${daysLeft} days remaining), skipping`);
+                continue;
+              }
+
+              // Check for human comments after the duplicate detection comment
+              const humanCommentsAfter = comments.data.filter(c =>
+                new Date(c.created_at) > new Date(lastDupeComment.created_at) &&
+                c.user.type !== 'Bot' &&
+                !c.body.includes('<!-- duplicate-detection -->') &&
+                !c.body.includes('automatically closed as a duplicate')
+              );
+
+              if (humanCommentsAfter.length > 0) {
+                console.log(` Has ${humanCommentsAfter.length} human comment(s) after detection, skipping`);
+                continue;
+              }
+
+              // Check for thumbs-down reaction from the issue author
+              const reactions = await github.rest.reactions.listForIssueComment({
+                owner,
+                repo,
+                comment_id: lastDupeComment.id,
+                per_page: 100,
+              });
+
+              const authorThumbsDown = reactions.data.some(r =>
+                r.user.id === issue.user.id && r.content === '-1'
+              );
+
+              if (authorThumbsDown) {
+                console.log(` Issue author gave thumbs-down on duplicate comment, skipping`);
+                continue;
+              }
+
+              // Extract the primary duplicate issue number from the comment
+              const dupeMatch = lastDupeComment.body.match(/#(\d+)/);
+              const dupeNumber = dupeMatch ? dupeMatch[1] : 'unknown';
+
+              // Close the issue
+              console.log(` Closing as duplicate of #${dupeNumber}`);
+
+              await github.rest.issues.update({
+                owner,
+                repo,
+                issue_number: issue.number,
+                state: 'closed',
+                state_reason: 'duplicate',
+              });
+
+              await github.rest.issues.addLabels({
+                owner,
+                repo,
+                issue_number: issue.number,
+                labels: ['autoclose'],
+              });
+
+              await github.rest.issues.createComment({
+                owner,
+                repo,
+                issue_number: issue.number,
+                body: `This issue has been automatically closed as a duplicate of #${dupeNumber}.\n\nIf this is incorrect, please reopen this issue or create a new one.\n\n🤖 Generated with [Claude Code](https://claude.ai/code)`,
+              });
+
+              closedCount++;
+            }
+
+            console.log(`Done. Closed ${closedCount} duplicate issue(s).`);
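
Editorial note (not part of the commit): the workflow above interleaves API calls with four skip checks (no detection comment, grace period, later human comment, author thumbs-down). As a rough sketch, the same decision can be written as a pure function that is testable without the GitHub API. The function name `shouldAutoClose`, the `reason` strings, and the sample data are assumptions made for illustration; the field names (`body`, `created_at`, `user.type`, `user.id`, `content`) are the ones the script itself reads.

```javascript
const MARKER = "<!-- duplicate-detection -->";
const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical pure function mirroring the workflow's checks, in the same order.
function shouldAutoClose({ comments, reactions, issueAuthorId, now, graceDays = 7 }) {
  const dupeComments = comments.filter((c) => c.body.includes(MARKER));
  if (dupeComments.length === 0) return { close: false, reason: "no-detection-comment" };

  // Only the most recent detection comment counts.
  const last = dupeComments[dupeComments.length - 1];
  const lastTime = new Date(last.created_at).getTime();
  if (now - lastTime < graceDays * DAY_MS) return { close: false, reason: "grace-period" };

  // Any later non-bot comment keeps the issue open.
  const humanReplied = comments.some(
    (c) =>
      new Date(c.created_at).getTime() > lastTime &&
      c.user.type !== "Bot" &&
      !c.body.includes(MARKER) &&
      !c.body.includes("automatically closed as a duplicate")
  );
  if (humanReplied) return { close: false, reason: "human-comment" };

  // A thumbs-down from the issue author on the detection comment is a veto.
  const vetoed = reactions.some((r) => r.user.id === issueAuthorId && r.content === "-1");
  if (vetoed) return { close: false, reason: "author-thumbs-down" };

  // The first "#<digits>" in the comment is treated as the primary duplicate.
  const match = last.body.match(/#(\d+)/);
  return { close: true, reason: "stale-duplicate", duplicateOf: match ? match[1] : "unknown" };
}

// Made-up sample: a detection comment posted 19 days before `now`.
const now = Date.parse("2025-01-20T00:00:00Z");
const comments = [
  { body: `${MARKER}\nPossible duplicate of #17`, created_at: "2025-01-01T00:00:00Z", user: { type: "Bot" } },
];
console.log(shouldAutoClose({ comments, reactions: [], issueAuthorId: 1, now }));
```

With this sample the result has `close: true` and `duplicateOf: "17"`. Separating the decision from the API calls is a design choice of this sketch, not something the commit does.
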

Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
+name: Claude Issue Dedupe
+
+on:
+  issues:
+    types: [opened]
+  workflow_dispatch:
+    inputs:
+      issue_number:
+        description: 'Issue number to check for duplicates'
+        required: true
+        type: string
+
+permissions:
+  contents: read
+  issues: write
+  id-token: write
+
+jobs:
+  dedupe:
+    runs-on: ubuntu-latest
+    timeout-minutes: 10
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v4
+
+      - name: Configure AWS Credentials (OIDC)
+        uses: aws-actions/configure-aws-credentials@v4
+        with:
+          role-to-assume: ${{ secrets.BEDROCK_ACCESS_ROLE }}
+          aws-region: us-east-1
+
+      - name: Run duplicate detection
+        uses: anthropics/claude-code-action@v1
+        env:
+          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          GITHUB_REPOSITORY: ${{ github.repository }}
+          DUPLICATE_GRACE_DAYS: ${{ vars.DUPLICATE_GRACE_DAYS }}
+        with:
+          use_bedrock: "true"
+          github_token: ${{ secrets.GITHUB_TOKEN }}
+          allowed_bots: "github-actions[bot]"
+          prompt: "/dedupe ${{ github.repository }}/issues/${{ github.event.issue.number || inputs.issue_number }}"
+          claude_args: "--model us.anthropic.claude-sonnet-4-5-20250929-v1:0"
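
Editorial note (not part of the commit): the `prompt` input above builds an `<owner>/<repo>/issues/<number>` reference, and the `/dedupe` command states that the base issue number is the last part of that reference. The sketch below illustrates both directions in JavaScript. `buildPrompt` and `parseIssueRef` are hypothetical names, and the strict three-part pattern is an assumption about the reference shape, based only on the `owner/repo/issues/42` example in the command file.

```javascript
// Mirrors `github.event.issue.number || inputs.issue_number` from the workflow:
// the event's issue number wins; the manual dispatch input is the fallback.
function buildPrompt(repository, eventIssueNumber, inputIssueNumber) {
  const number = eventIssueNumber || inputIssueNumber;
  return `/dedupe ${repository}/issues/${number}`;
}

// Hypothetical parser for the "<owner>/<repo>/issues/<number>" argument.
function parseIssueRef(ref) {
  const match = /^([^/]+)\/([^/]+)\/issues\/(\d+)$/.exec(ref.trim());
  if (!match) throw new Error(`Unrecognized issue reference: ${ref}`);
  const [, owner, repo, number] = match;
  return { owner, repo, number: Number(number) };
}

// Manual dispatch case: no issue event, so the input value is used.
const prompt = buildPrompt("owner/repo", undefined, "42");
console.log(prompt); // /dedupe owner/repo/issues/42
console.log(parseIssueRef(prompt.replace("/dedupe ", "")));
```
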

.github/workflows/dependabot.yml

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+version: 2
+updates:
+  - package-ecosystem: "gradle"
+    directory: "/"
+    schedule:
+      interval: "weekly"
+      day: "monday"
+      time: "08:00"
+      timezone: "America/Los_Angeles"
+    labels:
+      - "skip-changelog"
+    group:
+      all-dependencies:
+        patterns:
+          - "*"

Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
+name: Remove duplicate label on activity
+
+on:
+  issue_comment:
+    types: [created]
+
+permissions:
+  issues: write
+
+jobs:
+  remove-duplicate:
+    if: |
+      github.event.issue.state == 'open' &&
+      contains(github.event.issue.labels.*.name, 'duplicate') &&
+      github.event.comment.user.type != 'Bot'
+    runs-on: ubuntu-latest
+    steps:
+      - name: Remove duplicate label
+        uses: actions/github-script@v7
+        with:
+          script: |
+            const { owner, repo } = context.repo;
+            const issueNumber = context.issue.number;
+            const commenter = context.payload.comment.user.login;
+
+            console.log(`Removing duplicate label from issue #${issueNumber} due to comment from ${commenter}`);
+
+            try {
+              await github.rest.issues.removeLabel({
+                owner,
+                repo,
+                issue_number: issueNumber,
+                name: 'duplicate',
+              });
+              console.log(`Successfully removed duplicate label from issue #${issueNumber}`);
+            } catch (error) {
+              if (error.status === 404) {
+                console.log(`duplicate label was already removed from issue #${issueNumber}`);
+              } else {
+                throw error;
+              }
+            }

.whitesource

Lines changed: 7 additions & 1 deletion
@@ -11,5 +11,11 @@
   },
   "issueSettings": {
     "minSeverityLevel": "LOW"
+  },
+  "remediateSettings": {
+    "addLabels": ["skip-changelog"],
+    "workflowRules": {
+      "enabled": true
+    }
   }
-}
+}

api/README.md

Lines changed: 69 additions & 1 deletion
@@ -8,7 +8,8 @@ This module provides components organized into two main areas aligned with the [
 
 ### Unified Language Specification
 
-- **`UnifiedQueryPlanner`**: Accepts PPL (Piped Processing Language) or SQL queries and returns Calcite `RelNode` logical plans as intermediate representation.
+- **`UnifiedQueryParser`**: Parses PPL (Piped Processing Language) or SQL queries and returns the native parse result (`UnresolvedPlan` for PPL, `SqlNode` for Calcite SQL).
+- **`UnifiedQueryPlanner`**: Accepts PPL or SQL queries and returns Calcite `RelNode` logical plans as intermediate representation.
 - **`UnifiedQueryTranspiler`**: Converts Calcite logical plans (`RelNode`) into SQL strings for various target databases using different SQL dialects.
 
 ### Unified Execution Runtime
@@ -42,6 +43,20 @@ UnifiedQueryContext context = UnifiedQueryContext.builder()
     .build();
 ```
 
+### UnifiedQueryParser
+
+Use `UnifiedQueryParser` to parse queries into their native parse tree. The parser is owned by `UnifiedQueryContext` and returns the native parse result for each language.
+
+```java
+// PPL parsing
+UnresolvedPlan ast = (UnresolvedPlan) context.getParser().parse("source = logs | where status = 200");
+
+// SQL parsing (with QueryType.SQL context)
+SqlNode sqlNode = (SqlNode) sqlContext.getParser().parse("SELECT * FROM logs WHERE status = 200");
+```
+
+Callers can then use each language's native visitor infrastructure (`AbstractNodeVisitor` for PPL, `SqlBasicVisitor` for Calcite SQL) on the typed result for further analysis.
+
 ### UnifiedQueryPlanner
 
 Use `UnifiedQueryPlanner` to parse and analyze PPL or SQL queries into Calcite logical plans. The planner accepts a `UnifiedQueryContext` and can be reused for multiple queries.
@@ -179,6 +194,59 @@ try (UnifiedQueryContext context = UnifiedQueryContext.builder()
 }
 ```
 
+## Profiling
+
+The unified query API supports the same [profiling capability](../docs/user/ppl/interfaces/endpoint.md#profile-experimental) as the PPL REST endpoint. When enabled, each unified query component automatically collects per-phase timing metrics. For code outside unified query components (e.g., `PreparedStatement.executeQuery()` or response formatting), `context.measure()` records custom phases into the same profile.
+
+```java
+try (UnifiedQueryContext context = UnifiedQueryContext.builder()
+    .language(QueryType.PPL)
+    .catalog("catalog", schema)
+    .defaultNamespace("catalog")
+    .profiling(true)
+    .build()) {
+
+  // Auto-profiled: ANALYZE
+  RelNode plan = new UnifiedQueryPlanner(context).plan(query);
+
+  // Auto-profiled: OPTIMIZE
+  PreparedStatement stmt = new UnifiedQueryCompiler(context).compile(plan);
+
+  // User-profiled via measure()
+  ResultSet rs = context.measure(MetricName.EXECUTE, stmt::executeQuery);
+  String json = context.measure(MetricName.FORMAT, () -> formatter.format(result));
+
+  // Retrieve profile snapshot
+  QueryProfile profile = context.getProfile();
+}
+```
+
+The returned `QueryProfile` follows the same JSON structure as the REST API:
+
+```json
+{
+  "summary": {
+    "total_time_ms": 33.34
+  },
+  "phases": {
+    "analyze": { "time_ms": 8.68 },
+    "optimize": { "time_ms": 18.2 },
+    "execute": { "time_ms": 4.87 },
+    "format": { "time_ms": 0.05 }
+  },
+  "plan": {
+    "node": "EnumerableCalc",
+    "time_ms": 4.82,
+    "rows": 2,
+    "children": [
+      { "node": "CalciteEnumerableIndexScan", "time_ms": 4.12, "rows": 2 }
+    ]
+  }
+}
+```
+
+When profiling is disabled (the default), all components execute with zero overhead.
+
 ## Development & Testing
 
 A set of unit tests is provided to validate planner behavior.
