feat(ai): claude code agent by bartolomej · Pull Request #44 · lightning-rod-labs/lightningrod-python-sdk

bartolomej · 2026-03-20T18:58:44Z

Lightningrod Assistant Agent

This PR introduces a Claude Code subagent (lightningrod-assistant) that helps users build forecasting datasets and fine-tune models using the Lightningrod SDK.

How it works

The agent is defined in .claude/agents/lightningrod-assistant.md and runs as a Claude Code subagent with access to file tools, Bash, and a docs MCP server. It is guided by 8 domain-specific skills covering training pattern selection, data source choices, and example walkthroughs.

Try it out

First clone this SDK repo branch:

git clone https://github.com/lightning-rod-labs/lightningrod-python-sdk.git && cd lightningrod-python-sdk # if not there already
git switch bart/sdk-agent

Invoke directly from any terminal with Claude Code installed:

claude --dangerously-skip-permissions --agent lightningrod-assistant "I want to build a forecasting dataset on tech company layoffs"

Demo

Screen.Recording.2026-04-17.at.16.11.46.mov

bartolomej · 2026-03-20T18:58:56Z

feat(ai): cloud code agent evals #62
feat(ai): claude code agent #44 👈 (View in Graphite)
feat(training): update fine tuning examples to our API, linting v1 #43
main

This stack of pull requests is managed by Graphite.

…progress plot script

- Add Data quality flags section: bias detection rules for news (outcome bias, survivorship), survey selection bias, temporal leakage, power-law targets, and when structured data (BigQuery) beats news
- Include clear rule for when news IS appropriate (sports, policy, elections) to avoid false alarms on positive test cases
- Add "Assess first, ask second" guideline to reduce tool calls before giving initial assessment; fixes max_turns errors and timeout failures

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Agent was burning all 10 allowed tool turns reading reference notebooks (~50k tokens each) before giving any response, causing max_turns errors on positive tests (golf, policy) and infrastructure timeouts on others.

Changes:
- Strengthen "first response is text" rule: explicitly forbid tool calls before the initial advisory response
- Change reference notebooks section from "Consult these" (proactive read) to "Read only when writing code" (conditional)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Judge scored prediction_date criterion 0.3 because agent mentioned the concept but didn't explicitly say 'set it to the event date, not the outcome date'. Also missing entity leakage warning for multi-company datasets.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
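To illustrate the entity leakage concern in a multi-company dataset, here is a minimal sketch in plain Python. It assumes each row carries a `company` field and keeps every company entirely in either the train or the test split. The helper `split_by_entity`, the field names, and the toy rows are hypothetical and are not part of the Lightningrod SDK.

```python
import random


def split_by_entity(rows, entity_key="company", test_fraction=0.25, seed=0):
    """Assign whole entities to train or test so no entity appears in both."""
    entities = sorted({row[entity_key] for row in rows})
    rng = random.Random(seed)  # seeded so the split is reproducible
    rng.shuffle(entities)
    n_test = max(1, round(len(entities) * test_fraction))
    test_entities = set(entities[:n_test])
    train = [r for r in rows if r[entity_key] not in test_entities]
    test = [r for r in rows if r[entity_key] in test_entities]
    return train, test


# Toy rows: several questions per (hypothetical) company.
rows = [
    {"company": name, "question": f"Will {name} announce layoffs in Q{q}?"}
    for name in ["Acme", "Globex", "Initech", "Umbrella"]
    for q in (1, 2, 3)
]
train, test = split_by_entity(rows)
```

A random row-level split would likely put questions about the same company on both sides, which lets a model score well by memorizing company-specific patterns rather than forecasting.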

1. Temporal splitting: generalize from 'financial/market data' to 'all forecasting datasets'. Golf missed this criterion (scored 0.0 on temporal split with weight 0.2) because the rule only mentioned finance
2. Binary label reuse: when data already has a binary outcome column (churned, funded, success), use it directly rather than predicting an intermediate score. Fixes bias-selection-survey criterion 3 (0.0 → 1.0)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
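A temporal split of the kind described above can be sketched as follows. This is an illustration in plain Python, not SDK code: the `temporal_split` helper, the `event_date` field, and the cutoff are assumptions made for the example. The `label` column stands in for an existing binary outcome that is reused directly.

```python
from datetime import date


def temporal_split(rows, cutoff, date_key="event_date"):
    """Train on rows strictly before the cutoff; evaluate on the rest."""
    train = [r for r in rows if r[date_key] < cutoff]
    test = [r for r in rows if r[date_key] >= cutoff]
    return train, test


# Toy rows with an existing binary outcome column reused as the label.
rows = [
    {"event_date": date(2025, 1, 10), "label": 1},
    {"event_date": date(2025, 3, 5), "label": 0},
    {"event_date": date(2025, 6, 20), "label": 1},
    {"event_date": date(2025, 9, 1), "label": 0},
]
train, test = temporal_split(rows, cutoff=date(2025, 6, 1))
```

Splitting on a date rather than at random keeps every evaluation example later than every training example, which mirrors how a forecasting model is used in practice.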

The "first response is text" rule had a conditional ("for any planning or advisory query") that the agent correctly didn't apply to build requests like "set this up" or "build the pipeline". Those tasks caused the agent to immediately read reference notebooks (tool calls), which regularly exceeded the 240s eval timeout.

Removing the qualifier makes the rule unconditional. This closes the loophole for golf, policy, stocks, and cost-awareness tasks, all of which are phrased as build/setup requests.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The agent correctly warns about cost and calls estimate_cost() before large runs, but never suggests validating at intermediate scale (e.g. 500-1K questions) before committing the full budget. Added explicit guidance: when scaling from a small test to production, recommend an intermediate run first to validate quality at scale.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
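The intermediate-scale idea can be sketched as a staged rollout with a quality gate between stages. Everything here is hypothetical: `staged_rollout`, the `run_stage` callback, the stage sizes, and the quality threshold are illustrative stand-ins and do not reflect how the SDK or the agent actually runs or scores a dataset.

```python
def staged_rollout(stage_sizes, run_stage, min_quality):
    """Run stages from smallest to largest; stop once quality falls below the bar."""
    completed = []
    for size in sorted(stage_sizes):
        quality = run_stage(size)  # hypothetical: returns a quality score in [0, 1]
        completed.append((size, quality))
        if quality < min_quality:
            break  # do not spend budget on larger stages
    return completed


# Stand-in for real runs: a lookup of pretend quality scores per stage size.
fake_quality = {50: 0.9, 1000: 0.6, 10000: 0.9}
results = staged_rollout([50, 1000, 10000], fake_quality.get, min_quality=0.8)
```

In this toy run the 1,000-question stage fails the quality bar, so the 10,000-question stage is never started and the full budget is not committed.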

Temporal splitting was only mentioned reactively (as a data quality flag), so the agent skipped it for clean forecasting tasks like golf where there are no data quality concerns. Added explicit guidance: mention temporal splitting proactively as a standard step in every forecasting proposal, not just when there are leakage issues.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

This reverts commit 8b6332e.

paulwilczewski

looks interesting, curious to try it out!

one thing that's a bit odd in the demo video is that the agent initially proposes using a data source that it doesn't have access to. maybe it needs context about which data is available?

bartolomej · 2026-04-20T18:41:05Z

> looks interesting, curious to try it out!
>
> one thing that's a bit odd in the demo video is that the agent initially proposes using a data source that it doesn't have access to. maybe it needs context about which data is available?

That is meant to be intentional - I've instructed the agent to try to find the best data source for the problem, even if it's not natively supported by our SDK (it should know how to manually fetch and transform that data into seeds). But whenever it can, it should use the natively supported datasets like BigQuery, which I think it should already be doing.

But it should also be easy to add more guidance around that in the existing skill set. I just haven't finished setting up the evals - it'd be good if we could also test that every change we make to the prompt/skills actually helps the agent in that specific scenario.

This PR introduces a Claude Code subagent (`lightningrod-assistant`) that helps users build forecasting datasets and fine-tune models using the Lightningrod SDK.

The agent is defined in `.claude/agents/lightningrod-assistant.md` and runs as a Claude Code subagent with access to file tools, Bash, and a docs MCP server. It is guided by 8 domain-specific skills covering training pattern selection, data source choices, and example walkthroughs.

First clone this SDK repo branch:

```bash
git clone https://github.com/lightning-rod-labs/lightningrod-python-sdk.git && cd lightningrod-python-sdk # if not there already
git switch bart/sdk-agent
```

Invoke directly from any terminal with Claude Code installed:

```bash
claude --dangerously-skip-permissions --agent lightningrod-assistant "I want to build a forecasting dataset on tech company layoffs"
```

https://github.com/user-attachments/assets/63fc4906-1f4f-4b83-a5c5-02ab7fce4025

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

bartolomej mentioned this pull request Mar 20, 2026

feat(training): update fine tuning examples to our API, linting v1 #43

Merged

bartolomej changed the title ~~initial draft from the old branch~~ feat(agent): initial claude based agent experiment Mar 20, 2026

bartolomej changed the base branch from bart/update-training-examples to graphite-base/44 April 1, 2026 12:23

initial draft from the old branch

4625f2e

bartolomej force-pushed the graphite-base/44 branch from 489add5 to c252e6f Compare April 6, 2026 13:28

bartolomej force-pushed the bart/sdk-agent branch from b3c7f32 to 4625f2e Compare April 6, 2026 13:28

bartolomej changed the base branch from graphite-base/44 to main April 6, 2026 13:28

bartolomej and others added 22 commits April 6, 2026 16:36

updated skills based on agent-docs

ec65416

temporary remove SFT implementation details

7eb9928

add a generic and simpler lightningrod assistant agent

74434d9

set default user code location

4ff2525

update assistant agent with extra frontmatter fields

846a59e

update the clarification/solution proposal flow

fd9b463

run notebooks step by step by default

aede6f8

guidance around field usages

cd3a641

initial autoagent / harbor setup

3ec6438

better align program.md with reference autoagent implementation, add …

15edec2

…progress plot script

add commands to makefile

27ef68b

Revert "Always mention temporal splitting in forecasting proposals"

62ebebd

This reverts commit 8b6332e.

force assistant to always use AskUserQuestion tool

bc99591

improve bigquery seed generator handling

fa6c58a

self improve based on session feedback

4f16ee2

bartolomej added 10 commits April 7, 2026 17:16

improve assistant agent plan mode

7739629

update agents readme

dc02c32

fix python version to work with harbor, configure type resolution

9846a70

make session param optional for improvement command

97d6d9f

temporal relevance task

0d34510

start migrating to trajectory based format

2019e4a

Merge branch 'main' into bart/sdk-agent

dfca680

update example files

9f0f255

remove orchestration based agents

83398f4

Merge branch 'main' into bart/sdk-agent

c1b136a

bartolomej changed the title ~~feat(agent): initial claude based agent experiment~~ feat(ai): claude code agent Apr 17, 2026

bartolomej changed the title ~~feat(ai): claude code agent~~ feat(ai): claude code agent (experiment) Apr 17, 2026

env notes, switch to opus

ceb1b6b

bartolomej marked this pull request as ready for review April 17, 2026 14:15

bartolomej requested review from koskotheim and paulwilczewski April 17, 2026 14:15

paulwilczewski approved these changes Apr 20, 2026

bartolomej added 5 commits April 22, 2026 18:44

add small scale test guidance

3ac8c96

Merge branch 'main' into bart/sdk-agent

64d7798

update filsets skill

13ad462

update docs

7aa1b1c

Remove agent evals (moved to bart/sdk-agent-evals)

33b7cb9

bartolomej mentioned this pull request Apr 24, 2026

feat(ai): cloud code agent evals #62

Draft

bartolomej changed the title ~~feat(ai): claude code agent (experiment)~~ feat(ai): claude code agent Apr 24, 2026

bartolomej added 2 commits May 1, 2026 15:11

update agent flow

fbff290

Merge remote-tracking branch 'origin/main' into bart/sdk-agent

71c3c93

bartolomej merged commit 6dc5795 into main May 5, 2026
3 checks passed

bartolomej mentioned this pull request May 8, 2026

feat(ai): expose reasoning evals + dataset linting, agent lib #67

Merged

bartolomej merged 42 commits into main from bart/sdk-agent
