
feat: add last_scrape_summary to template env #1924

Open
yashmehrotra wants to merge 2 commits into main from tmpl-last-scrape-summary

Conversation


@yashmehrotra yashmehrotra commented Feb 28, 2026

Fixes: #1917

Summary by CodeRabbit

  • New Features

    • Scrapers can now access the previous scrape summary in rules and custom scripts for improved state tracking and change detection.
    • Added caching of historical scrape data to optimize performance.
  • Tests

    • Added comprehensive unit tests for last scrape summary functionality and context propagation.
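To make the feature summary concrete, here is a minimal sketch of what a change-detection rule over the previous scrape summary could look like. It uses Go's text/template as a stand-in for the scrapers' actual template engine, and the "count" key inside the summary is a hypothetical example, not a documented field:

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// evalRule renders a rule expression against an env shaped like the scrapers'
// template environment: current result fields alongside last_scrape_summary.
func evalRule(current, previous int) string {
	env := map[string]any{
		"count":               current,
		"last_scrape_summary": map[string]any{"count": previous},
	}
	tmpl := template.Must(template.New("rule").Parse(
		`{{ if gt .count (index .last_scrape_summary "count") }}grew{{ else }}stable{{ end }}`))
	var out bytes.Buffer
	if err := tmpl.Execute(&out, env); err != nil {
		panic(err)
	}
	return out.String()
}

func main() {
	fmt.Println(evalRule(15, 12)) // more items than the last scrape
	fmt.Println(evalRule(10, 12)) // fewer items than the last scrape
}
```

The point is only the env shape: rules can now compare the current run against `last_scrape_summary` instead of maintaining external state.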


coderabbitai bot commented Feb 28, 2026

Walkthrough

This PR adds support for templating the last scrape summary data. It extends ScrapeContext to carry scrape summary information, implements caching for efficient retrieval from job history, and exposes the summary to various templating engines.

Changes

Context Extension (api/context.go, api/context_test.go)
Adds a lastScrapeSummary field to ScrapeContext with accessor methods WithLastScrapeSummary() and LastScrapeSummary(), and extends WithValue() to propagate the field. Tests cover default behavior, retrieval, and preservation through context operations.

Caching and Retrieval (scrapers/cron.go)
Introduces an in-memory cache (scraperSummaryCache) of per-scraper last scrape summaries. Adds a getLastScrapeSummary() helper that checks the cache first, then queries the job_history table for the most recent run and deserializes its summary. Updates the scraper job runner to populate the context with the cached summary and to store the new summary after a successful scrape.

Template Integration (scrapers/changes/rules.go, scrapers/processors/json.go, scrapers/processors/script.go)
Makes last_scrape_summary available in the template environments for change rules, JSON processor results, and script processor templates, enabling access to historical scrape data during template evaluation.
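The context extension above can be sketched as follows. This is a simplified stand-in, not the actual api/context.go code: the real ScrapeContext carries many more fields, and only the accessor names mirror the PR.

```go
package main

import "fmt"

// ScrapeContext here is a hypothetical, trimmed-down mirror of the PR's type.
type ScrapeContext struct {
	lastScrapeSummary map[string]any
}

// WithLastScrapeSummary returns a copy of the context carrying the summary.
// The value receiver makes the clone-and-extend pattern cheap and explicit.
func (ctx ScrapeContext) WithLastScrapeSummary(s map[string]any) ScrapeContext {
	ctx.lastScrapeSummary = s
	return ctx
}

// LastScrapeSummary returns the stored summary, or an empty map when unset,
// so template lookups never read from a nil map.
func (ctx ScrapeContext) LastScrapeSummary() map[string]any {
	if ctx.lastScrapeSummary == nil {
		return map[string]any{}
	}
	return ctx.lastScrapeSummary
}

func main() {
	ctx := ScrapeContext{}.WithLastScrapeSummary(map[string]any{"added": 3})
	fmt.Println(ctx.LastScrapeSummary()["added"])
	fmt.Println(len(ScrapeContext{}.LastScrapeSummary()))
}
```

The empty-map default is what the review's test suggestions exercise, and the copy-on-clone behavior is exactly what the WithValue comment below is guarding.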

Sequence Diagram(s)

sequenceDiagram
    participant Scraper as Scraper Job
    participant Cache as Summary Cache
    participant DB as Job History DB
    participant Context as ScrapeContext
    participant Template as Template Engine
    participant Processor as Result Processor

    Scraper->>Cache: Check for last scrape summary
    alt Summary in cache
        Cache-->>Scraper: Return cached summary
    else Cache miss
        Scraper->>DB: Query job_history for latest run
        DB-->>Scraper: Return historical summary
        Scraper->>Cache: Store in cache
    end

    Scraper->>Context: WithLastScrapeSummary(summary)
    Context-->>Scraper: Return updated context

    Scraper->>Processor: Run scraper with context
    Processor->>Template: Evaluate template with last_scrape_summary
    Template-->>Processor: Return evaluated result
    Processor-->>Scraper: Return processed results

    Scraper->>DB: Store scrape_summary in job history
    Scraper->>Cache: Update cached summary
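The cache-then-DB flow in the diagram can be sketched as below. queryJobHistory is a stub for the real job_history query, and the cache key and value shapes are assumptions rather than the exact types in scrapers/cron.go:

```go
package main

import (
	"fmt"
	"sync"
)

// scraperSummaryCache holds the last known summary per scraper ID.
var scraperSummaryCache sync.Map

// queryJobHistory stands in for the GORM query that loads the most recent
// run's scrape_summary from the job_history table.
func queryJobHistory(scraperID string) map[string]any {
	return map[string]any{"config_count": 12}
}

func getLastScrapeSummary(scraperID string) map[string]any {
	if v, ok := scraperSummaryCache.Load(scraperID); ok {
		return v.(map[string]any) // cache hit: no DB round trip
	}
	summary := queryJobHistory(scraperID)
	scraperSummaryCache.Store(scraperID, summary) // warm the cache for the next run
	return summary
}

func main() {
	fmt.Println(getLastScrapeSummary("scraper-1")["config_count"]) // miss: falls through to the DB stub
	scraperSummaryCache.Store("scraper-1", map[string]any{"config_count": 99})
	fmt.Println(getLastScrapeSummary("scraper-1")["config_count"]) // hit: served from the cache
}
```

sync.Map suits this access pattern (written once per scrape, read by concurrent jobs), though as the review notes below, entries also need eviction when a scraper is deleted.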
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Docstring Coverage: ⚠️ Warning. Docstring coverage is 14.29%, below the required threshold of 80.00%. Write docstrings for the functions that are missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Description Check: ✅ Passed. Check skipped because CodeRabbit’s high-level summary is enabled.

Title Check: ✅ Passed. The title accurately describes the main change, adding last_scrape_summary to the template environment, which is the primary objective of this pull request.

Linked Issues Check: ✅ Passed. The PR implements the feature requested in issue #1917 by exposing last_scrape_summary in template contexts across multiple processors and the cron job handler.

Out of Scope Changes Check: ✅ Passed. All changes are directly related to adding last_scrape_summary to template environments; no unrelated modifications were detected.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (3)
scrapers/processors/json.go (1)

468-489: Build and reuse one env map per result in this loop.

Line 468 injects into one AsMap() call, while subsequent template/filter calls use fresh AsMap() calls. Reusing a single env here makes last_scrape_summary usage deterministic and reduces repeated map construction.

💡 Proposed refactor
 		for _, result := range ongoingInput {
-			result.AsMap()["last_scrape_summary"] = ctx.LastScrapeSummary()
+			env := result.AsMap()
+			env["last_scrape_summary"] = ctx.LastScrapeSummary()
 			for i, configProperty := range result.BaseScraper.Properties {
 				if configProperty.Filter != "" {
-					if response, err := gomplate.RunTemplate(result.AsMap(), gomplate.Template{Expression: configProperty.Filter}); err != nil {
+					if response, err := gomplate.RunTemplate(env, gomplate.Template{Expression: configProperty.Filter}); err != nil {
 						result.Errorf("failed to parse filter: %v", err)
 						continue
 					} else if boolVal, err := strconv.ParseBool(response); err != nil {
@@
 				templater := gomplate.StructTemplater{
-					Values:         result.AsMap(),
+					Values:         env,
 					ValueFunctions: true,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scrapers/processors/json.go` around lines 468 - 489, Build a single env map
once per result and reuse it inside the properties loop instead of calling
result.AsMap() repeatedly: create env := result.AsMap(); set
env["last_scrape_summary"] = ctx.LastScrapeSummary() before iterating over
result.BaseScraper.Properties, then pass that env into gomplate.RunTemplate (for
filters) and templater.StructTemplater{Values: env, ...} so all
templates/filters consistently see the same map; keep the existing copy of Links
(configProperty.Links) and other logic unchanged, just replace every in-loop
result.AsMap() usage with the env variable.
api/context_test.go (1)

12-63: Good coverage for LastScrapeSummary; add a WithValue propagation regression test.

Current tests validate summary propagation well, but there’s no guard for context-state propagation when cloning via WithValue (especially incremental mode).

🧪 Suggested test case
 func TestScrapeContext_LastScrapeSummary(t *testing.T) {
+	t.Run("preserves incremental mode through WithValue", func(t *testing.T) {
+		ctx := ScrapeContext{}.
+			AsIncrementalScrape().
+			WithValue("k", "v")
+
+		assert.True(t, ctx.IsIncrementalScrape())
+	})
+
 	t.Run("returns empty map when unset", func(t *testing.T) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/context_test.go` around lines 12 - 63, Add a regression test that
verifies ScrapeContext.LastScrapeSummary is preserved when the context is cloned
via WithValue, especially in incremental mode: create a ScrapeContext, set a
summary with WithLastScrapeSummary, then call WithValue (and also call
AsIncrementalScrape then WithValue in a sub-case) and assert the returned
context's LastScrapeSummary still contains the same entries (use ScrapeContext,
WithLastScrapeSummary, WithValue, AsIncrementalScrape, and LastScrapeSummary to
locate the code).
scrapers/cron.go (1)

36-39: Add cache eviction path for scraperSummaryCache.

scraperSummaryCache is introduced at Line 38, but entries are never removed when a scraper job is deleted. This can leave stale cache growth over time.

💡 Proposed fix
 func DeleteScrapeJob(id string) {
 	logger.Debugf("deleting scraper job for %s", id)
+	scraperSummaryCache.Delete(id)
 
 	if j, ok := scrapeJobs.Load(id); ok {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scrapers/cron.go` around lines 36 - 39, The scraperSummaryCache (sync.Map)
never evicts entries when a scraper job is removed, causing stale growth; update
the job-deletion flow (the function that removes entries from scrapeJobs / stops
cron jobs — e.g., DeleteScrapeJob or the handler that calls scrapeJobs.Delete)
to also remove related cache entries by calling
scraperSummaryCache.Delete(jobID) for the job's key (or iterate
scraperSummaryCache.Range and Delete keys that match the job id pattern if keys
are composite), ensuring cache entries are removed when a scraper job is
deleted.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 72b1a58 and 83b4a6c.

📒 Files selected for processing (6)
  • api/context.go
  • api/context_test.go
  • scrapers/changes/rules.go
  • scrapers/cron.go
  • scrapers/processors/json.go
  • scrapers/processors/script.go

Comment on lines 93 to 101

 func (ctx ScrapeContext) WithValue(key, val any) ScrapeContext {
 	return ScrapeContext{
-		Context:      ctx.Context.WithValue(key, val),
-		temp:         ctx.temp,
-		namespace:    ctx.namespace,
-		jobHistory:   ctx.jobHistory,
-		scrapeConfig: ctx.scrapeConfig,
+		Context:           ctx.Context.WithValue(key, val),
+		temp:              ctx.temp,
+		namespace:         ctx.namespace,
+		jobHistory:        ctx.jobHistory,
+		scrapeConfig:      ctx.scrapeConfig,
+		lastScrapeSummary: ctx.lastScrapeSummary,
 	}

⚠️ Potential issue | 🟡 Minor

Preserve incremental mode when cloning context with WithValue.

WithValue rebuilds ScrapeContext but currently drops isIncremental. If AsIncrementalScrape() is called before WithValue, incremental state is lost.

💡 Proposed fix
 func (ctx ScrapeContext) WithValue(key, val any) ScrapeContext {
 	return ScrapeContext{
 		Context:           ctx.Context.WithValue(key, val),
 		temp:              ctx.temp,
+		isIncremental:     ctx.isIncremental,
 		namespace:         ctx.namespace,
 		jobHistory:        ctx.jobHistory,
 		scrapeConfig:      ctx.scrapeConfig,
 		lastScrapeSummary: ctx.lastScrapeSummary,
 	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

func (ctx ScrapeContext) WithValue(key, val any) ScrapeContext {
	return ScrapeContext{
		Context:           ctx.Context.WithValue(key, val),
		temp:              ctx.temp,
		isIncremental:     ctx.isIncremental,
		namespace:         ctx.namespace,
		jobHistory:        ctx.jobHistory,
		scrapeConfig:      ctx.scrapeConfig,
		lastScrapeSummary: ctx.lastScrapeSummary,
	}
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/context.go` around lines 93 - 101, WithValue currently reconstructs a
ScrapeContext but omits the isIncremental field, causing incremental mode set by
AsIncrementalScrape() to be lost; update the ScrapeContext literal returned by
WithValue to copy the isIncremental boolean from the receiver
(ctx.isIncremental) so the incremental state is preserved when cloning the
context.

Comment on lines +360 to +370
Where("resource_id = ? AND resource_type = ?", scraperID, job.ResourceTypeScraper).
Where("status IN ?", []string{models.StatusSuccess, models.StatusWarning, models.StatusFailed}).
Order("time_end DESC").First(&history).Error
if err != nil {
return nil
}

raw, ok := history.Details["scrape_summary"]
if !ok {
return nil
}

⚠️ Potential issue | 🟠 Major

Latest failed run can hide the previous valid summary.

At Line 361-362, the query includes models.StatusFailed. At Line 367-370, if that failed row has no scrape_summary, this returns nil and drops incremental context (notably after process restart when cache is empty).

💡 Proposed fix
 	err := ctx.DB().
 		Where("resource_id = ? AND resource_type = ?", scraperID, job.ResourceTypeScraper).
-		Where("status IN ?", []string{models.StatusSuccess, models.StatusWarning, models.StatusFailed}).
+		Where("status IN ?", []string{models.StatusSuccess, models.StatusWarning}).
 		Order("time_end DESC").First(&history).Error
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

		Where("resource_id = ? AND resource_type = ?", scraperID, job.ResourceTypeScraper).
		Where("status IN ?", []string{models.StatusSuccess, models.StatusWarning}).
		Order("time_end DESC").First(&history).Error
	if err != nil {
		return nil
	}

	raw, ok := history.Details["scrape_summary"]
	if !ok {
		return nil
	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scrapers/cron.go` around lines 360 - 370, The current DB query fetches the
latest run including StatusFailed which can mask a previous successful summary;
change the logic in scrapers/cron.go so you first query only for
models.StatusSuccess and models.StatusWarning (using the same Where/Order/First
against history), and if that returns nil, run a second query that includes
models.StatusFailed as a fallback; update the code paths that read
history.Details["scrape_summary"] to use the history from the first
successful/warning query when available, and only use the failed-run history
when no prior success/warning exists (refer to the variables/history query
block, scraperID, job.ResourceTypeScraper, and history.Details access).


Development

Successfully merging this pull request may close these issues.

Template last scrape summary
