feat: add last_scrape_summary to template env #1924

yashmehrotra wants to merge 2 commits into main from
Conversation
Walkthrough

This PR adds support for templating the last scrape summary data. It extends ScrapeContext to carry scrape summary information, implements caching for efficient retrieval from job history, and exposes the summary to the various templating engines.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Scraper as Scraper Job
    participant Cache as Summary Cache
    participant DB as Job History DB
    participant Context as ScrapeContext
    participant Template as Template Engine
    participant Processor as Result Processor

    Scraper->>Cache: Check for last scrape summary
    alt Summary in cache
        Cache-->>Scraper: Return cached summary
    else Cache miss
        Scraper->>DB: Query job_history for latest run
        DB-->>Scraper: Return historical summary
        Scraper->>Cache: Store in cache
    end
    Scraper->>Context: WithLastScrapeSummary(summary)
    Context-->>Scraper: Return updated context
    Scraper->>Processor: Run scraper with context
    Processor->>Template: Evaluate template with last_scrape_summary
    Template-->>Processor: Return evaluated result
    Processor-->>Scraper: Return processed results
    Scraper->>DB: Store scrape_summary in job history
    Scraper->>Cache: Update cached summary
```
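The cache-then-DB fallback shown in the diagram can be sketched in Go roughly as follows. This is an illustrative model, not the PR's actual code: `summaryCache`, `loadSummaryFromDB`, and `lastScrapeSummary` are hypothetical stand-ins for `scraperSummaryCache` and the `job_history` query.

```go
package main

import (
	"fmt"
	"sync"
)

// summaryCache mirrors the role of scraperSummaryCache: a concurrent map
// keyed by scraper ID, holding the last scrape summary.
var summaryCache sync.Map

// loadSummaryFromDB stands in for the job_history query; here it returns
// a canned summary so the example is self-contained.
func loadSummaryFromDB(scraperID string) map[string]any {
	return map[string]any{"success": 10, "failed": 1}
}

// lastScrapeSummary returns the cached summary, falling back to the
// "database" on a cache miss and populating the cache afterwards.
func lastScrapeSummary(scraperID string) map[string]any {
	if v, ok := summaryCache.Load(scraperID); ok {
		return v.(map[string]any)
	}
	summary := loadSummaryFromDB(scraperID)
	summaryCache.Store(scraperID, summary)
	return summary
}

func main() {
	s := lastScrapeSummary("scraper-1") // cache miss: hits the DB stand-in
	fmt.Println(s["success"])
	s = lastScrapeSummary("scraper-1") // cache hit: no DB round-trip
	fmt.Println(s["failed"])
}
```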
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 2
🧹 Nitpick comments (3)
scrapers/processors/json.go (1)
468-489: Build and reuse one env map per result in this loop.

Line 468 injects into one AsMap() call, while subsequent template/filter calls use fresh AsMap() calls. Reusing a single env here makes last_scrape_summary usage deterministic and reduces repeated map construction.

💡 Proposed refactor
```diff
 for _, result := range ongoingInput {
-	result.AsMap()["last_scrape_summary"] = ctx.LastScrapeSummary()
+	env := result.AsMap()
+	env["last_scrape_summary"] = ctx.LastScrapeSummary()
 	for i, configProperty := range result.BaseScraper.Properties {
 		if configProperty.Filter != "" {
-			if response, err := gomplate.RunTemplate(result.AsMap(), gomplate.Template{Expression: configProperty.Filter}); err != nil {
+			if response, err := gomplate.RunTemplate(env, gomplate.Template{Expression: configProperty.Filter}); err != nil {
 				result.Errorf("failed to parse filter: %v", err)
 				continue
 			} else if boolVal, err := strconv.ParseBool(response); err != nil {
@@
 			templater := gomplate.StructTemplater{
-				Values:         result.AsMap(),
+				Values:         env,
 				ValueFunctions: true,
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scrapers/processors/json.go` around lines 468-489: Build a single env map once per result and reuse it inside the properties loop instead of calling result.AsMap() repeatedly: create env := result.AsMap(); set env["last_scrape_summary"] = ctx.LastScrapeSummary() before iterating over result.BaseScraper.Properties, then pass that env into gomplate.RunTemplate (for filters) and gomplate.StructTemplater{Values: env, ...} so all templates and filters consistently see the same map; keep the existing copy of Links (configProperty.Links) and other logic unchanged, just replace every in-loop result.AsMap() usage with the env variable.

api/context_test.go (1)

12-63: Good coverage for LastScrapeSummary; add a WithValue propagation regression test.

The current tests validate summary propagation well, but there is no guard for context-state propagation when cloning via WithValue (especially in incremental mode).

🧪 Suggested test case
```diff
 func TestScrapeContext_LastScrapeSummary(t *testing.T) {
+	t.Run("preserves incremental mode through WithValue", func(t *testing.T) {
+		ctx := ScrapeContext{}.
+			AsIncrementalScrape().
+			WithValue("k", "v")
+
+		assert.True(t, ctx.IsIncrementalScrape())
+	})
+
 	t.Run("returns empty map when unset", func(t *testing.T) {
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/context_test.go` around lines 12-63: Add a regression test that verifies ScrapeContext.LastScrapeSummary is preserved when the context is cloned via WithValue, especially in incremental mode: create a ScrapeContext, set a summary with WithLastScrapeSummary, then call WithValue (and also call AsIncrementalScrape then WithValue in a sub-case) and assert that the returned context's LastScrapeSummary still contains the same entries (use ScrapeContext, WithLastScrapeSummary, WithValue, AsIncrementalScrape, and LastScrapeSummary to locate the code).

scrapers/cron.go (1)

36-39: Add a cache eviction path for scraperSummaryCache.

scraperSummaryCache is introduced at line 38, but entries are never removed when a scraper job is deleted, so stale entries accumulate over time.

💡 Proposed fix
```diff
 func DeleteScrapeJob(id string) {
 	logger.Debugf("deleting scraper job for %s", id)
+	scraperSummaryCache.Delete(id)
 	if j, ok := scrapeJobs.Load(id); ok {
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scrapers/cron.go` around lines 36 - 39, The scraperSummaryCache (sync.Map) never evicts entries when a scraper job is removed, causing stale growth; update the job-deletion flow (the function that removes entries from scrapeJobs / stops cron jobs — e.g., DeleteScrapeJob or the handler that calls scrapeJobs.Delete) to also remove related cache entries by calling scraperSummaryCache.Delete(jobID) for the job's key (or iterate scraperSummaryCache.Range and Delete keys that match the job id pattern if keys are composite), ensuring cache entries are removed when a scraper job is deleted.
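As a minimal sketch of the suggested eviction, assuming scraperSummaryCache and scrapeJobs are both sync.Map values keyed by job ID (the stand-in deleteScrapeJob below omits the real cron-stop logic):

```go
package main

import (
	"fmt"
	"sync"
)

var (
	scrapeJobs          sync.Map // job ID -> running job (stand-in)
	scraperSummaryCache sync.Map // job ID -> last scrape summary
)

// deleteScrapeJob removes both the job and its cached summary, mirroring
// the reviewer's suggestion: without the cache Delete, entries accumulate
// for scrapers that no longer exist.
func deleteScrapeJob(id string) {
	scraperSummaryCache.Delete(id)
	if j, ok := scrapeJobs.Load(id); ok {
		_ = j // the real code would stop the cron job here
		scrapeJobs.Delete(id)
	}
}

func main() {
	scrapeJobs.Store("a", struct{}{})
	scraperSummaryCache.Store("a", map[string]any{"success": 3})

	deleteScrapeJob("a")

	_, jobOK := scrapeJobs.Load("a")
	_, cacheOK := scraperSummaryCache.Load("a")
	fmt.Println(jobOK, cacheOK) // both false after deletion
}
```

Note that sync.Map.Delete is a no-op for missing keys, so the eviction call is safe even when no summary was ever cached for the job.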
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
api/context.go, api/context_test.go, scrapers/changes/rules.go, scrapers/cron.go, scrapers/processors/json.go, scrapers/processors/script.go
```diff
 func (ctx ScrapeContext) WithValue(key, val any) ScrapeContext {
 	return ScrapeContext{
 		Context:           ctx.Context.WithValue(key, val),
 		temp:              ctx.temp,
 		namespace:         ctx.namespace,
 		jobHistory:        ctx.jobHistory,
 		scrapeConfig:      ctx.scrapeConfig,
+		lastScrapeSummary: ctx.lastScrapeSummary,
 	}
```
Preserve incremental mode when cloning context with WithValue.
WithValue rebuilds ScrapeContext but currently drops isIncremental. If AsIncrementalScrape() is called before WithValue, incremental state is lost.
💡 Proposed fix
```diff
 func (ctx ScrapeContext) WithValue(key, val any) ScrapeContext {
 	return ScrapeContext{
 		Context:           ctx.Context.WithValue(key, val),
 		temp:              ctx.temp,
+		isIncremental:     ctx.isIncremental,
 		namespace:         ctx.namespace,
 		jobHistory:        ctx.jobHistory,
 		scrapeConfig:      ctx.scrapeConfig,
 		lastScrapeSummary: ctx.lastScrapeSummary,
 	}
 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@api/context.go` around lines 93 - 101, WithValue currently reconstructs a
ScrapeContext but omits the isIncremental field, causing incremental mode set by
AsIncrementalScrape() to be lost; update the ScrapeContext literal returned by
WithValue to copy the isIncremental boolean from the receiver
(ctx.isIncremental) so the incremental state is preserved when cloning the
context.
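The underlying bug class is easy to reproduce in isolation: when a `With…` helper rebuilds a struct literal instead of copying its receiver, any field missing from the literal silently resets to its zero value. A self-contained sketch with hypothetical types (not the project's actual ScrapeContext):

```go
package main

import "fmt"

type ctxData struct {
	isIncremental bool
	values        map[string]string
}

// withValueBuggy rebuilds the struct literal and forgets isIncremental,
// so the flag silently resets to false — the bug flagged in the review.
func (c ctxData) withValueBuggy(k, v string) ctxData {
	m := map[string]string{k: v}
	for k2, v2 := range c.values {
		m[k2] = v2
	}
	return ctxData{values: m}
}

// withValueFixed copies the receiver first, so every field — including
// any added later — is carried over automatically.
func (c ctxData) withValueFixed(k, v string) ctxData {
	clone := c
	m := map[string]string{k: v}
	for k2, v2 := range c.values {
		m[k2] = v2
	}
	clone.values = m
	return clone
}

func main() {
	base := ctxData{isIncremental: true}
	fmt.Println(base.withValueBuggy("k", "v").isIncremental) // false: flag lost
	fmt.Println(base.withValueFixed("k", "v").isIncremental) // true: flag kept
}
```

Copying the receiver first (`clone := c`) is the more future-proof pattern than a field-by-field literal, since new fields are carried over without touching every `With…` helper.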
```go
err := ctx.DB().
	Where("resource_id = ? AND resource_type = ?", scraperID, job.ResourceTypeScraper).
	Where("status IN ?", []string{models.StatusSuccess, models.StatusWarning, models.StatusFailed}).
	Order("time_end DESC").First(&history).Error
if err != nil {
	return nil
}

raw, ok := history.Details["scrape_summary"]
if !ok {
	return nil
}
```
The latest failed run can hide the previous valid summary.

At lines 361-362 the query includes models.StatusFailed, so at lines 367-370, if that latest failed row has no scrape_summary, this returns nil and drops the incremental context (notably after a process restart, when the cache is empty).
💡 Proposed fix

```diff
 err := ctx.DB().
 	Where("resource_id = ? AND resource_type = ?", scraperID, job.ResourceTypeScraper).
-	Where("status IN ?", []string{models.StatusSuccess, models.StatusWarning, models.StatusFailed}).
+	Where("status IN ?", []string{models.StatusSuccess, models.StatusWarning}).
 	Order("time_end DESC").First(&history).Error
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scrapers/cron.go` around lines 360 - 370, The current DB query fetches the
latest run including StatusFailed which can mask a previous successful summary;
change the logic in scrapers/cron.go so you first query only for
models.StatusSuccess and models.StatusWarning (using the same Where/Order/First
against history), and if that returns nil, run a second query that includes
models.StatusFailed as a fallback; update the code paths that read
history.Details["scrape_summary"] to use the history from the first
successful/warning query when available, and only use the failed-run history
when no prior success/warning exists (refer to the variables/history query
block, scraperID, job.ResourceTypeScraper, and history.Details access).
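The two-step lookup the prompt describes can be modeled without a database. The types below are illustrative stand-ins (historyRow for the job-history model, string literals for the status constants): prefer the newest success/warning run, and use a failed run only when no better one exists.

```go
package main

import "fmt"

type historyRow struct {
	status  string
	timeEnd int // stand-in for a timestamp; higher = newer
	summary map[string]any
}

// latestWithStatus returns the newest row whose status is in want.
func latestWithStatus(rows []historyRow, want map[string]bool) *historyRow {
	var best *historyRow
	for i := range rows {
		if !want[rows[i].status] {
			continue
		}
		if best == nil || rows[i].timeEnd > best.timeEnd {
			best = &rows[i]
		}
	}
	return best
}

// lastSummary implements the review's two-step lookup: success/warning
// first, failed runs only as a fallback.
func lastSummary(rows []historyRow) map[string]any {
	if row := latestWithStatus(rows, map[string]bool{"SUCCESS": true, "WARNING": true}); row != nil {
		return row.summary
	}
	if row := latestWithStatus(rows, map[string]bool{"FAILED": true}); row != nil {
		return row.summary
	}
	return nil
}

func main() {
	rows := []historyRow{
		{status: "SUCCESS", timeEnd: 1, summary: map[string]any{"success": 5}},
		{status: "FAILED", timeEnd: 2, summary: nil}, // newest run failed, no summary
	}
	fmt.Println(lastSummary(rows)) // the older successful summary wins
}
```

In the real code this would be two GORM queries rather than an in-memory scan, but the precedence logic is the same.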
Fixes: #1917
Summary by CodeRabbit
New Features
Tests