test: replace JSON fixtures with YAML fixture-folder for extraction tests by moshloop · Pull Request #1948 · flanksource/config-db

moshloop · 2026-03-05T13:47:30Z

Summary by CodeRabbit

Release Notes

New Features
- Added full-mode config extraction pipeline with comprehensive schema validation and normalization.
- Enhanced configuration reference handling with automatic default propagation and alias resolution.
- Improved external entity (users, groups, roles) resolution with persistent caching.
- Added detailed access control and audit log processing with summary tracking and warning generation.
Bug Fixes
- Corrected entity deletion behavior to respect empty input collections.
Tests
- Expanded test suite with extensive fixtures for extraction pipeline validation.

coderabbitai · 2026-03-05T13:47:48Z

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 75b852b0-ef20-4511-92ec-a6136e44254c

📥 Commits

Reviewing files that changed from the base of the PR and between ed3f349 and 7348cd4.

⛔ Files ignored due to path filters (1)

go.sum is excluded by !**/*.sum

📒 Files selected for processing (69)

Makefile
api/v1/interface.go
api/v1/types.go
cmd/run.go
db/external_cache.go
db/external_entities.go
db/external_resolver.go
db/permission_changes.go
db/update.go
go.mod
jobs/jobs_test.go
scrapers/azure/active_directory.go
scrapers/exec/exec_backstage_integration_test.go
scrapers/external_entities_test.go
scrapers/extract/README.md
scrapers/extract/analysis.go
scrapers/extract/changes.go
scrapers/extract/extract.go
scrapers/extract/extract_suite_test.go
scrapers/extract/extract_test.go
scrapers/extract/fullmode.go
scrapers/extract/mock_resolver_test.go
scrapers/extract/schema.yaml
scrapers/extract/testdata/access_invalid_ref.yaml
scrapers/extract/testdata/access_logs_basic.yaml
scrapers/extract/testdata/access_logs_missing_user.yaml
scrapers/extract/testdata/access_logs_resolve_user_alias.yaml
scrapers/extract/testdata/access_logs_shorthand_external.yaml
scrapers/extract/testdata/access_logs_shorthand_uuid.yaml
scrapers/extract/testdata/access_logs_user_shorthand.yaml
scrapers/extract/testdata/access_skipped_config_not_found.yaml
scrapers/extract/testdata/analysis_basic.yaml
scrapers/extract/testdata/analysis_default_from_parent.yaml
scrapers/extract/testdata/analysis_invalid_ref.yaml
scrapers/extract/testdata/analysis_no_parent.yaml
scrapers/extract/testdata/changes_and_access.yaml
scrapers/extract/testdata/changes_basic.yaml
scrapers/extract/testdata/changes_default_from_parent.yaml
scrapers/extract/testdata/changes_invalid_ref.yaml
scrapers/extract/testdata/changes_missing_change_id.yaml
scrapers/extract/testdata/changes_missing_change_type.yaml
scrapers/extract/testdata/changes_no_parent.yaml
scrapers/extract/testdata/changes_with_details.yaml
scrapers/extract/testdata/combined_aliases.yaml
scrapers/extract/testdata/config_ref_defaults.yaml
scrapers/extract/testdata/config_ref_shorthand.yaml
scrapers/extract/testdata/config_ref_uuid.yaml
scrapers/extract/testdata/drop_nil_user_groups.yaml
scrapers/extract/testdata/empty_config_access.yaml
scrapers/extract/testdata/external_config_id.yaml
scrapers/extract/testdata/group_aliases.yaml
scrapers/extract/testdata/json_roundtrip.yaml
scrapers/extract/testdata/missing_config_access.yaml
scrapers/extract/testdata/remap_user_groups.yaml
scrapers/extract/testdata/resolve_after_sync.yaml
scrapers/extract/testdata/resolve_aliases.yaml
scrapers/extract/testdata/reuse_existing_ids.yaml
scrapers/extract/testdata/role_aliases.yaml
scrapers/extract/testdata/stub_entity_creation.yaml
scrapers/extract/testdata/unmarshal_all_aliases.yaml
scrapers/extract/testdata/user_aliases.yaml
scrapers/extraction_unit_test.go
scrapers/gcp/audit_logs.go
scrapers/gcp/security_center.go
scrapers/kubernetes/informers_test.go
scrapers/kubernetes/rbac.go
scrapers/processors/json.go
scrapers/run.go
scrapers/trivy/trivy.go

Walkthrough

This PR introduces a comprehensive configuration extraction and normalization pipeline for scrapers. It restructures public API types (AnalysisResult, ExternalConfigAccess, ExternalID) to support JSON serialization, adds a DB-backed resolver for entity synchronization, implements an end-to-end extraction workflow with full-mode support, and updates scrapers and orchestration to use the new extract package.

Changes

Cohort / File(s)	Summary
API Type Updates `api/v1/interface.go`, `api/v1/types.go`	Restructured AnalysisResult with JSON-tagged fields (ExternalID, ConfigType, Summary, Analysis, etc.), added Pretty() methods to EntitySummary and ExternalID, updated ExternalConfigAccessLog/ExternalConfigAccess with new utility methods and fields (Columns(), Row(), HasPrincipal(), ToConfigAccess(), UnmarshalJSON()).
Extraction Pipeline Core `scrapers/extract/extract.go`, `scrapers/extract/fullmode.go`, `scrapers/extract/analysis.go`, `scrapers/extract/changes.go`	Introduced new extract package with Resolver interface, ExtractedConfig/ExtractionSummary types, ExtractConfigChangesFromConfig() orchestration, SyncEntities() and ResolveAccess() workflows, ApplyAnalysisRules() enrichment, SummarizeChanges() summarization, and ExtractFullMode() processor for full-mode scraping.
Extraction Testing `scrapers/extract/extract_suite_test.go`, `scrapers/extract/extract_test.go`, `scrapers/extract/mock_resolver_test.go`, `scrapers/extract/schema.yaml`, `scrapers/extract/testdata/*`	Added comprehensive test suite with MockResolver test helper, fixture-driven extraction validation, and detailed JSON schema documenting full-mode config-map format, change/analysis/access-log structures, and validation rules.
Database Resolution Layer `db/external_resolver.go`, `db/update.go`, `db/external_entities.go`	Added DBResolver type implementing Resolver interface for entity/config synchronization, updated upsert logic to lookup ConfigItem via ExternalID first, removed conditional bulk-deletion fallbacks for empty entity lists.
Scraper Orchestration & Updates `scrapers/run.go`	Added public RunScraper() function wrapping end-to-end execution; delegated analysis rules and change summarization to extract package; removed inline extraction logic and ExtractedConfig type (moved to extract).
Individual Scraper Adaptations `scrapers/gcp/security_center.go`, `scrapers/gcp/audit_logs.go`, `scrapers/azure/active_directory.go`, `scrapers/kubernetes/rbac.go`, `scrapers/trivy/trivy.go`, `scrapers/processors/json.go`	Updated to use new API: direct ExternalID/ConfigType field assignments instead of ExternalConfigs wrapping, flattened ExternalConfigAccess struct usage, expanded entity-detection keys (users, roles, groups, access_logs, config_access).
Database Permission & Entity Management `db/permission_changes.go`	Updated to call ToConfigAccess() when computing deterministic access IDs and inserting into temporary tables for consistent internal model conversion.
Test Configuration & Cleanup `jobs/jobs_test.go`, `scrapers/exec/exec_backstage_integration_test.go`, `scrapers/external_entities_test.go`, `Makefile`	Added ginkgo slow labels to test suites, added explicit ConfigItem cleanup in test teardown, updated test-fast target to use label-filter instead of --procs parallelism flag.
Build Dependencies `go.mod`, `db/external_cache.go`	Bumped go.opentelemetry.io/otel and various indirect dependencies (charmbracelet, clipperhouse, playwright); reordered imports in external_cache.go.
Removed Tests `scrapers/extraction_unit_test.go`	Deleted inline extraction unit tests (replaced by fixture-driven extract_test.go).
Documentation `scrapers/extract/README.md`	Added comprehensive flow diagram, return format specification, config field format explanation, full-mode config-map schema, validation rules, and example YAML for the extraction pipeline.

Sequence Diagram(s)

sequenceDiagram
    actor Client
    participant RunScraper as RunScraper<br/>(Main)
    participant Scraper as Scraper<br/>(e.g., GCP)
    participant Extract as extract<br/>Package
    participant Resolver as Resolver<br/>(DB-backed)
    participant DB as Database

    Client->>RunScraper: RunScraper(ctx)
    RunScraper->>Scraper: Execute scraper logic
    Scraper->>RunScraper: Return ScrapeResults
    
    loop For each result
        RunScraper->>Extract: ExtractConfigChangesFromConfig(config)
        Extract->>Extract: Parse & validate config<br/>structure
        Extract->>RunScraper: Return ExtractedConfig
        
        RunScraper->>Extract: SyncEntities(result)
        Extract->>Resolver: SyncExternalUsers/Groups/Roles
        Resolver->>DB: Lookup/create external entities
        DB->>Resolver: Return IDs & mappings
        Resolver->>Extract: Return synchronized entities
        
        RunScraper->>Extract: ResolveAccess(result)
        Extract->>Resolver: FindUserIDByAliases(aliases)
        Resolver->>DB: Query entity by alias
        DB->>Resolver: Return entity ID
        Resolver->>Extract: Return resolved IDs
        Extract->>Extract: Update access configs<br/>with resolved IDs
        
        RunScraper->>Extract: ApplyAnalysisRules(result)
        Extract->>Extract: Enrich with rule-based<br/>Category & Severity
        
        RunScraper->>Extract: SummarizeChanges(changes)
        Extract->>Extract: Extract JSON patch paths<br/>into summary text
    end
    
    RunScraper->>DB: Save aggregated results
    RunScraper->>Client: Return ScrapeOutput

Possibly related PRs

PR #1859: Modifies same API types (ExternalConfigAccess, ExternalID) in api/v1, changing JSON-serializable fields and config-access behavior.
PR #1862: Updates same ExternalConfigAccess type in api/v1/interface.go with external role/group alias fields and handling.
PR #1938: Modifies same public API types (AnalysisResult, ExternalConfigAccess, ExternalConfigAccessLog) and extraction/permission-access DB flows.

Suggested reviewers

adityathebe

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 34.62% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title accurately describes the main changes: replacing JSON fixtures with YAML fixtures in the extraction test suite, which is substantiated by the numerous testdata files added.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings (stacked PR)
📝 Generate docstrings (commit on current branch)

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch pr/extract-refactor

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 15

🧹 Nitpick comments (5)

scrapers/extract/testdata/changes_with_details.yaml (1)

12-18: Consider asserting summary.changes.skipped == 0 for stronger guardrails.

Current checks are good; adding skipped-count validation would make regressions in filtering logic easier to catch.

✅ Suggested assertion addition

 assertions:
   - size(changes) == 1
   - changes[0].config_type == 'VM'
   - changes[0].details.old_size == '100GB'
   - changes[0].details.new_size == '200GB'
   - changes[0].source == 'azure'
   - summary.changes.scraped == 1
+  - summary.changes.skipped == 0

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@scrapers/extract/testdata/changes_with_details.yaml` around lines 12 - 18,
Add an assertion to the existing test assertions block to validate that no
changes were skipped by asserting summary.changes.skipped == 0; update the
assertions list (the same block containing size(changes),
changes[0].config_type, etc.) to include this new check so the test fails if any
items are filtered out unexpectedly.

scrapers/extract/testdata/config_ref_shorthand.yaml (1)

24-38: Use one canonical key for config ref fields in this fixture.

This file mixes config_external_id (expected block) and external_config_id (assertions). Please standardize on one key name here to avoid ambiguous fixture semantics.

Proposed cleanup

-  - config_access[0].external_config_id.config_type == 'Organization'
-  - config_access[1].external_config_id.config_type == 'Database'
+  - config_access[0].config_external_id.config_type == 'Organization'
+  - config_access[1].config_external_id.config_type == 'Database'

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@scrapers/extract/testdata/config_ref_shorthand.yaml` around lines 24 - 38,
The fixture mixes two keys for the same concept—config_external_id and
external_config_id—causing assertion mismatch; pick one canonical key (e.g.,
external_config_id) and update all occurrences so the data and the assertions
refer to the same field: change the config blocks under config_access (the
entries currently using config_external_id) to the chosen key name and/or update
the assertion lines that reference config_access[...] .external_config_id to
match the chosen key, ensuring both the config objects and the assertions use
the identical identifier.

scrapers/extract/analysis.go (1)

10-12: Add a nil guard for result to harden this helper.

This is a small defensive improvement to prevent accidental panics if a nil *v1.ScrapeResult is ever passed.

Suggested fix

 func ApplyAnalysisRules(result *v1.ScrapeResult) {
-	if result.AnalysisResult == nil {
+	if result == nil || result.AnalysisResult == nil {
 		return
 	}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@scrapers/extract/analysis.go` around lines 10 - 12, Add a nil guard at the
start of ApplyAnalysisRules to return early if the passed-in pointer is nil:
check if result == nil (in addition to the existing result.AnalysisResult check)
and return when nil to avoid panics when a nil *v1.ScrapeResult is supplied;
update the function ApplyAnalysisRules (and its existing guard for
result.AnalysisResult) accordingly.

scrapers/extract/extract_test.go (1)

16-19: Don’t ignore env conversion errors in test helpers.

At Line 16 and Line 18, JSON errors are discarded, which can mask root causes and surface as misleading assertion/CEL failures later. Return an error from resultToEnv and assert it before evaluating expressions.

♻️ Proposed patch

-func resultToEnv(result ExtractedConfig) map[string]any {
-	raw, _ := json.Marshal(result)
-	var env map[string]any
-	_ = json.Unmarshal(raw, &env)
-	return env
+func resultToEnv(result ExtractedConfig) (map[string]any, error) {
+	raw, err := json.Marshal(result)
+	if err != nil {
+		return nil, err
+	}
+	var env map[string]any
+	if err := json.Unmarshal(raw, &env); err != nil {
+		return nil, err
+	}
+	return env, nil
 }
@@
-			env := resultToEnv(result)
+			env, err := resultToEnv(result)
+			Expect(err).ToNot(HaveOccurred())

Also applies to: 73-73

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@scrapers/extract/extract_test.go` around lines 16 - 19, The helper
resultToEnv currently swallows JSON errors (json.Marshal and json.Unmarshal),
which can hide root causes; change resultToEnv to return (map[string]any,
error), propagate and return any errors from json.Marshal and json.Unmarshal
instead of discarding them, update callers (tests referencing resultToEnv) to
check and assert the returned error before using the map, and apply the same fix
for the other occurrence noted around line 73; use the function name resultToEnv
to locate and modify the helper and all its call sites.

scrapers/extract/schema.yaml (1)

315-323: Enforce non-empty external_configs to match the documented rule.

Description says this must be a non-empty array, but the schema currently allows [].

Suggested fix

       external_configs:
         type: array
+        minItems: 1
         description: >
           Related config references. Alternative to single external_id.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@scrapers/extract/schema.yaml` around lines 315 - 323, The schema allows
external_configs to be an empty array but the description requires a non-empty
array; update the external_configs definition in schema.yaml (the
external_configs property that references `#/`$defs/ExternalID) to enforce at
least one element by adding "minItems: 1" (and optionally "type: array" is
already present) so the array cannot be empty and remains mutually exclusive
with external_id as documented.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@api/v1/interface.go`:
- Around line 216-223: The Pretty rendering unconditionally prepends a .Space()
before saved/skipped/deleted segments, causing a leading space when those are
the first segment; in EntitySummary.Pretty() (the code that builds t using
.Appendf, .AddText and .Space), change the logic to only call .Space() when
there's prior content (e.g., check t is non-empty or maintain a local "first"
flag) before appending each of the Saved, Skipped, and Deleted segments so the
first appended segment is not preceded by whitespace.

In `@db/update.go`:
- Around line 405-409: The direct cache lookup currently runs when only
result.AnalysisResult.ExternalID is set, which can misassociate analyses; change
the condition so ctx.TempCache().Find(...) is called only if both
result.AnalysisResult.ExternalID and result.AnalysisResult.ConfigType are
non-empty (i.e., guard with a check that ConfigType != "" && ExternalID != ""),
and apply the same guard to the second similar block (the code handling lines
415-424) so the safer fallback loop is used whenever ConfigType is missing.

In `@fixtures/data/extraction/alias_keys.yaml`:
- Around line 1-35: This YAML fixture file alias_keys.yaml is an orphan not used
by the extraction test harness and duplicates a working fixture; delete this
duplicate fixture to avoid confusion. If you must keep it instead, move it into
the test harness's testdata directory and populate it with the required
non-empty assertions and any pre-population data so it mirrors the canonical
alias_keys.yaml used by the tests. Ensure the fixture's top-level keys (input,
expected) remain correct and that assertions are present so
scrapers/extract/extract_test.go will load and execute it.

In `@fixtures/data/extraction/canonical_priority.yaml`:
- Around line 1-17: The YAML fixture 'canonical_priority.yaml' in
fixtures/data/extraction is orphaned (a duplicate exists in the active test
fixtures) — remove this unused file from the repository or move it into the
active test fixtures directory where scrapers/extract/testdata fixtures live so
there is only one canonical copy; ensure you keep the existing tested version
(the duplicate under scrapers/extract/testdata) and update/remove any references
to the orphaned fixture to keep the extraction tests unchanged.

In `@scrapers/extract/changes.go`:
- Line 31: The append to changes[i].Summary concatenates strings.Join(paths, ",
") without a separator; update the logic around changes[i].Summary so if
changes[i].Summary is non-empty you first append a separator (e.g., a single
space or " ") before adding strings.Join(paths, ", "), otherwise just set it to
the joined paths; reference changes[i].Summary and the paths variable when
making the conditional append to ensure output is not merged into the previous
text.

In `@scrapers/extract/extract.go`:
- Around line 214-221: The sync is happening before alias resolution so
resolveConfigAccessAliases (which can create users/roles/groups) appends new
entities after SyncEntities, allowing duplicate identities; change the flow so
resolver-created entities are synced: either call ResolveAccess(scraperID,
&result) before SyncEntities(resolver, scraperID, &result) or invoke
SyncEntities again immediately after ResolveAccess (ensure you update the code
paths in the extract flow that call SyncEntities and ResolveAccess so that any
new entities from resolveConfigAccessAliases are included in the resolver sync).
Target symbols: SyncEntities, ResolveAccess, resolveConfigAccessAliases; ensure
only one final sync runs after alias resolution to prevent duplicates.
- Around line 600-614: The validation loops over result.ConfigAccess (type
v1.ExternalConfigAccess) and result.AccessLogs (type v1.ExternalConfigAccessLog)
currently accept entries when ConfigID is nil but only check ExternalID != "";
update both checks to use the full external-ref emptiness helper (e.g. call
!ConfigExternalID.IsEmpty() on the respective struct) instead of comparing
ExternalID to an empty string so entries missing ConfigType are rejected; modify
the conditions in the loops that build validAccess and validLogs (the checks
around ca and al) to use !<struct>.ConfigExternalID.IsEmpty().

In `@scrapers/extract/fullmode.go`:
- Around line 12-18: The loop re-uses originalConfig for every scraped item
causing duplicate extraction; change the call to ExtractConfigChangesFromConfig
to use each item’s own config (scraped[i].Config) with a fallback to
originalConfig when nil. Concretely, inside the for i := range scraped loop
compute cfg := scraped[i].Config; if cfg == nil { cfg = originalConfig } then
call ExtractConfigChangesFromConfig(resolver, scraperID, cfg), keep assigning
scraped[i].Error on error and continue, and then merge the returned extracted
into all as before (all.Merge).

In `@scrapers/extract/README.md`:
- Around line 384-386: The markdown fenced code blocks in
scrapers/extract/README.md (the blocks containing "ConfigID == uuid.Nil AND
(ConfigType == "" OR ExternalID == "")" and the block starting with
"uuid.Parse(value) → ...") are missing language identifiers; update both fenced
blocks to include a language tag such as text (e.g., change ``` to ```text) so
markdownlint MD040 warnings are resolved, and repeat the same change for the
other occurrence at lines ~411-414.

In `@scrapers/extract/schema.yaml`:
- Around line 159-163: The schema's ChangeResult.required currently forces
external_id/external_change_id even when configID lookup should be allowed;
replace the flat required array with a oneOf (or anyOf) that enforces either {
required: [ "configID" ] } OR { required: [ "external_id", "external_change_id"
] } (include config_type in the external lookup branch if that field is needed),
and apply the same oneOf change to the second similar block (the section
referenced around the other occurrence). Update the ChangeResult schema
definition (and the matching schema at the other occurrence) so the validator
accepts configID-only payloads while still requiring
external_id+external_change_id for the external-lookup variant.
- Around line 47-53: The schema for the config property currently restricts it
to object (the "config" key in schema.yaml), which contradicts the extractor
contract that allows arrays, scalars and null; update the "config" schema to
accept all valid JSON/YAML value types (e.g., allow multiple types such as
object, array, string, number, boolean, null or remove the specific type
constraint) so the extractor can persist non-object payloads; ensure the change
is made on the "config" field definition so validation no longer rejects valid
non-object configs.
- Around line 183-188: The schema mistakenly applies the ChangeAction enum to
external_change_id (external_change_id currently references $defs/ChangeAction),
which constrains IDs to values like "delete|ignore|move"; change
external_change_id to be a plain string (remove the $ref to $defs/ChangeAction)
and add a separate field (e.g., action) that references $defs/ChangeAction for
the intended enum; update the object definition near external_change_id and
change_type to remove the enum constraint from external_change_id and introduce
action: $ref: "#/$defs/ChangeAction".

In `@scrapers/extract/testdata/changes_and_access.yaml`:
- Around line 31-35: The test summary is missing an assertion for
changes.skipped; update the YAML assertions block to include a line asserting
summary.changes.skipped == 0 alongside summary.changes.scraped == 1 so the
summary validation covers skipped counts for changes as it does for
config_access and access_logs.

In `@scrapers/extract/testdata/config_ref_defaults.yaml`:
- Around line 24-36: The fixture mixes two key names; use a single canonical key
(config_external_id) everywhere: update the entries that currently use
external_config_id (e.g., the records with ids a2 and a3 and the duplicate
occurrence around lines 40-43) to use config_external_id so the fixture keys
match the expected assertions and avoid ambiguity.

In `@scrapers/gcp/security_center.go`:
- Line 115: Set analysis.ConfigType whenever setting analysis.ExternalID so
upsertAnalysis can resolve by both keys; update the code around the assignment
to analysis.ExternalID (the block that sets analysis.Resource.Name) to also set
analysis.ConfigType to the appropriate constant/enum/value used for Security
Center resources (the same ConfigType expected by upsertAnalysis). Ensure the
value matches the key used in upsertAnalysis lookup so findings are not dropped.

---

Nitpick comments:
In `@scrapers/extract/analysis.go`:
- Around line 10-12: Add a nil guard at the start of ApplyAnalysisRules to
return early if the passed-in pointer is nil: check if result == nil (in
addition to the existing result.AnalysisResult check) and return when nil to
avoid panics when a nil *v1.ScrapeResult is supplied; update the function
ApplyAnalysisRules (and its existing guard for result.AnalysisResult)
accordingly.

In `@scrapers/extract/extract_test.go`:
- Around line 16-19: The helper resultToEnv currently swallows JSON errors
(json.Marshal and json.Unmarshal), which can hide root causes; change
resultToEnv to return (map[string]any, error), propagate and return any errors
from json.Marshal and json.Unmarshal instead of discarding them, update callers
(tests referencing resultToEnv) to check and assert the returned error before
using the map, and apply the same fix for the other occurrence noted around line
73; use the function name resultToEnv to locate and modify the helper and all
its call sites.

In `@scrapers/extract/schema.yaml`:
- Around line 315-323: The schema allows external_configs to be an empty array
but the description requires a non-empty array; update the external_configs
definition in schema.yaml (the external_configs property that references
`#/`$defs/ExternalID) to enforce at least one element by adding "minItems: 1" (and
optionally "type: array" is already present) so the array cannot be empty and
remains mutually exclusive with external_id as documented.

In `@scrapers/extract/testdata/changes_with_details.yaml`:
- Around line 12-18: Add an assertion to the existing test assertions block to
validate that no changes were skipped by asserting summary.changes.skipped == 0;
update the assertions list (the same block containing size(changes),
changes[0].config_type, etc.) to include this new check so the test fails if any
items are filtered out unexpectedly.

In `@scrapers/extract/testdata/config_ref_shorthand.yaml`:
- Around line 24-38: The fixture mixes two keys for the same
concept—config_external_id and external_config_id—causing assertion mismatch;
pick one canonical key (e.g., external_config_id) and update all occurrences so
the data and the assertions refer to the same field: change the config blocks
under config_access (the entries currently using config_external_id) to the
chosen key name and/or update the assertion lines that reference
config_access[...] .external_config_id to match the chosen key, ensuring both
the config objects and the assertions use the identical identifier.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a491afca-53a2-48ed-8caf-487fad7c2dc1

📥 Commits

Reviewing files that changed from the base of the PR and between 2a0727c and 97e014c.

⛔ Files ignored due to path filters (1)

go.sum is excluded by !**/*.sum

📒 Files selected for processing (61)

api/v1/interface.go
api/v1/types.go
db/external_cache.go
db/external_resolver.go
db/update.go
fixtures/data/extraction/alias_keys.yaml
fixtures/data/extraction/canonical_keys.yaml
fixtures/data/extraction/canonical_priority.yaml
fixtures/data/extraction/shorthand_fields.yaml
go.mod
scrapers/extract/README.md
scrapers/extract/analysis.go
scrapers/extract/changes.go
scrapers/extract/extract.go
scrapers/extract/extract_suite_test.go
scrapers/extract/extract_test.go
scrapers/extract/fullmode.go
scrapers/extract/mock_resolver_test.go
scrapers/extract/schema.yaml
scrapers/extract/testdata/access_invalid_ref.yaml
scrapers/extract/testdata/access_logs_basic.yaml
scrapers/extract/testdata/access_logs_shorthand_external.yaml
scrapers/extract/testdata/access_logs_shorthand_uuid.yaml
scrapers/extract/testdata/access_skipped_config_not_found.yaml
scrapers/extract/testdata/alias_keys.yaml
scrapers/extract/testdata/analysis_basic.yaml
scrapers/extract/testdata/analysis_default_from_parent.yaml
scrapers/extract/testdata/analysis_invalid_ref.yaml
scrapers/extract/testdata/analysis_no_parent.yaml
scrapers/extract/testdata/canonical_keys.yaml
scrapers/extract/testdata/canonical_priority.yaml
scrapers/extract/testdata/changes_and_access.yaml
scrapers/extract/testdata/changes_basic.yaml
scrapers/extract/testdata/changes_default_from_parent.yaml
scrapers/extract/testdata/changes_invalid_ref.yaml
scrapers/extract/testdata/changes_no_parent.yaml
scrapers/extract/testdata/changes_with_details.yaml
scrapers/extract/testdata/combined_aliases.yaml
scrapers/extract/testdata/config_ref_defaults.yaml
scrapers/extract/testdata/config_ref_shorthand.yaml
scrapers/extract/testdata/config_ref_uuid.yaml
scrapers/extract/testdata/drop_nil_user_groups.yaml
scrapers/extract/testdata/empty_config_access.yaml
scrapers/extract/testdata/external_config_id.yaml
scrapers/extract/testdata/group_aliases.yaml
scrapers/extract/testdata/json_roundtrip.yaml
scrapers/extract/testdata/missing_config_access.yaml
scrapers/extract/testdata/remap_user_groups.yaml
scrapers/extract/testdata/resolve_after_sync.yaml
scrapers/extract/testdata/resolve_aliases.yaml
scrapers/extract/testdata/reuse_existing_ids.yaml
scrapers/extract/testdata/role_aliases.yaml
scrapers/extract/testdata/shorthand_fields.yaml
scrapers/extract/testdata/stub_entity_creation.yaml
scrapers/extract/testdata/unmarshal_all_aliases.yaml
scrapers/extract/testdata/user_aliases.yaml
scrapers/extraction_unit_test.go
scrapers/gcp/security_center.go
scrapers/kubernetes/informers_test.go
scrapers/run.go
scrapers/trivy/trivy.go

💤 Files with no reviewable changes (1)

scrapers/extraction_unit_test.go

coderabbitai · 2026-03-05T14:07:22Z

api/v1/interface.go

+		t = t.Space().Appendf("%d", e.Saved).AddText(" saved", "success")
 	}
 	if e.Skipped > 0 {
-		parts = append(parts, fmt.Sprintf("skipped=%d", e.Skipped))
+		t = t.Space().Appendf("%d", e.Skipped).AddText(" skipped", "warning")
 	}
 	if e.Deleted > 0 {
-		parts = append(parts, fmt.Sprintf("deleted=%d", e.Deleted))
+		t = t.Space().Appendf("%d", e.Deleted).AddText(" deleted", "error")
 	}


⚠️ Potential issue | 🟡 Minor

Prevent leading whitespace in EntitySummary.Pretty().

At Line 216/219/222, .Space() is unconditional for saved/skipped/deleted segments. If the first rendered segment is not scraped, output starts with a leading space.

♻️ Proposed patch

func (e EntitySummary) Pretty() api.Text { t := clicky.Text("") + first := true + appendPart := func(count int, label, style string) { + if count == 0 { + return + } + if !first { + t = t.Space() + } + t = t.Appendf("%d", count).AddText(" "+label, style) + first = false + } - if e.Scraped > 0 { - t = t.Appendf("%d", e.Scraped).AddText(" scraped", "muted") - } - if e.Saved > 0 { - t = t.Space().Appendf("%d", e.Saved).AddText(" saved", "success") - } - if e.Skipped > 0 { - t = t.Space().Appendf("%d", e.Skipped).AddText(" skipped", "warning") - } - if e.Deleted > 0 { - t = t.Space().Appendf("%d", e.Deleted).AddText(" deleted", "error") - } + appendPart(e.Scraped, "scraped", "muted") + appendPart(e.Saved, "saved", "success") + appendPart(e.Skipped, "skipped", "warning") + appendPart(e.Deleted, "deleted", "error") return t }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@api/v1/interface.go` around lines 216 - 223, The Pretty rendering unconditionally prepends a .Space() before saved/skipped/deleted segments, causing a leading space when those are the first segment; in EntitySummary.Pretty() (the code that builds t using .Appendf, .AddText and .Space), change the logic to only call .Space() when there's prior content (e.g., check t is non-empty or maintain a local "first" flag) before appending each of the Saved, Skipped, and Deleted segments so the first appended segment is not preceded by whitespace.

db/update.go

coderabbitai · 2026-03-05T14:07:22Z

fixtures/data/extraction/alias_keys.yaml

+input:
+  config:
+    name: Test Org
+  access:
+    - id: a1
+      external_config_id:
+        config_type: Org
+        external_id: org-1
+      user: alice
+  users:
+    - name: Alice
+      aliases: [alice]
+  roles:
+    - name: Admin
+      aliases: [admin]
+  groups:
+    - name: Devs
+      aliases: [devs]
+  user_groups:
+    - external_user_id: "00000000-0000-0000-0000-000000000001"
+      external_group_id: "00000000-0000-0000-0000-000000000002"
+
+expected:
+  config_access:
+    - id: a1
+      external_user_aliases: [alice]
+  external_users:
+    - name: Alice
+  external_roles:
+    - name: Admin
+  external_groups:
+    - name: Devs
+  external_user_groups:
+    - external_user_id: "00000000-0000-0000-0000-000000000001"
+      external_group_id: "00000000-0000-0000-0000-000000000002"


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, find the fixture file and check its location find . -name "alias_keys.yaml" -type f # Search for references to this fixture in the codebase rg -n "alias_keys" --type go --type yaml # Search for test harnesses that load fixtures from fixtures/data/extraction rg -n "fixtures/data/extraction" --type go # Search for test patterns related to fixture loading rg -n "assertions" --type go | head -20

Repository: flanksource/config-db

Length of output: 532

🏁 Script executed:

# Check if there are any tests or code that load from fixtures/data/extraction rg -n "fixtures/data/extraction" --type go --type yaml # Check what's actually in the scrapers/extract/testdata version head -40 "./scrapers/extract/testdata/alias_keys.yaml" # See if fixtures/data/extraction is used by any other test suite fd -t f -path "*/extraction/*" "*.yaml" | head -20 # Check if there's any test loading from fixtures/data/extraction rg -n "fixtures" scrapers/extract/extract_test.go | head -20

Repository: flanksource/config-db

Length of output: 1292

🏁 Script executed:

# Check if there's any other test that loads from fixtures/data/extraction rg -rn "fixtures/data/extraction" --type go # Check if there's any test in other directories that might use fixtures/data/extraction find . -name "*_test.go" -type f -exec grep -l "fixtures/data/extraction" {} \; # Check the test file to see the fixture loading pattern more clearly sed -n '30,60p' scrapers/extract/extract_test.go

Repository: flanksource/config-db

Length of output: 1054

🏁 Script executed:

# Check if there are any other fixtures test harnesses or loaders find . -name "*_test.go" -type f | xargs grep -l "fixtures/data" # Check what test files exist in scrapers/extract ls -la scrapers/extract/*_test.go # Double-check: are there any references to fixtures/data/extraction in Go code rg "fixtures/data/extraction" --type go

Repository: flanksource/config-db

Length of output: 322

This fixture is not executed by the extraction test harness and should be removed or moved.

The test harness in scrapers/extract/extract_test.go loads fixtures only from testdata/*.yaml using filepath.Glob("testdata/*.yaml"). This fixture lives in fixtures/data/extraction/ and is never referenced. Additionally, the test explicitly requires assertions to be non-empty; this fixture has none. An identical, properly configured version exists in scrapers/extract/testdata/alias_keys.yaml with assertions and pre-population data. Remove the duplicate orphaned fixture to avoid future maintenance confusion.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@fixtures/data/extraction/alias_keys.yaml` around lines 1 - 35, This YAML fixture file alias_keys.yaml is an orphan not used by the extraction test harness and duplicates a working fixture; delete this duplicate fixture to avoid confusion. If you must keep it instead, move it into the test harness's testdata directory and populate it with the required non-empty assertions and any pre-population data so it mirrors the canonical alias_keys.yaml used by the tests. Ensure the fixture's top-level keys (input, expected) remain correct and that assertions are present so scrapers/extract/extract_test.go will load and execute it.

fixtures/data/extraction/canonical_priority.yaml

coderabbitai · 2026-03-05T14:07:22Z

scrapers/extract/changes.go

+			continue
+		}
+
+		changes[i].Summary += strings.Join(paths, ", ")


⚠️ Potential issue | 🟡 Minor

Missing separator when appending to non-empty Summary.

If change.Summary already contains text, the joined paths are concatenated directly without any separator, resulting in malformed output like "existing texta, b" instead of "existing text a, b".

Proposed fix

- changes[i].Summary += strings.Join(paths, ", ") + pathStr := strings.Join(paths, ", ") + if changes[i].Summary != "" { + changes[i].Summary += " " + } + changes[i].Summary += pathStr

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

changes[i].Summary += strings.Join(paths, ", ")

pathStr := strings.Join(paths, ", ")

if changes[i].Summary != "" {

changes[i].Summary += " "

}

changes[i].Summary += pathStr

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@scrapers/extract/changes.go` at line 31, The append to changes[i].Summary concatenates strings.Join(paths, ", ") without a separator; update the logic around changes[i].Summary so if changes[i].Summary is non-empty you first append a separator (e.g., a single space or " ") before adding strings.Join(paths, ", "), otherwise just set it to the joined paths; reference changes[i].Summary and the paths variable when making the conditional append to ensure output is not merged into the previous text.

coderabbitai · 2026-03-05T14:07:22Z

scrapers/extract/schema.yaml

+      Mutual exclusivity: configID (direct UUID lookup) vs external_id + config_type
+      (external lookup). If configID is set, external_id/config_type are ignored for lookup.
+    required:
+      - external_id
+      - external_change_id


⚠️ Potential issue | 🟠 Major

ChangeResult.required conflicts with the documented configID lookup path.

The schema says configID can bypass external_id + config_type, but Line [162] always requires external_id. That invalidates legitimate configID-only payloads.

Suggested fix

- required: - - external_id - - external_change_id - - change_type + required: + - external_change_id + - change_type + oneOf: + - required: [configID] + - required: [external_id, config_type]

Also applies to: 225-231

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@scrapers/extract/schema.yaml` around lines 159 - 163, The schema's ChangeResult.required currently forces external_id/external_change_id even when configID lookup should be allowed; replace the flat required array with a oneOf (or anyOf) that enforces either { required: [ "configID" ] } OR { required: [ "external_id", "external_change_id" ] } (include config_type in the external lookup branch if that field is needed), and apply the same oneOf change to the second similar block (the section referenced around the other occurrence). Update the ChangeResult schema definition (and the matching schema at the other occurrence) so the validator accepts configID-only payloads while still requiring external_id+external_change_id for the external-lookup variant.

scrapers/extract/schema.yaml

coderabbitai · 2026-03-05T14:07:23Z

scrapers/extract/testdata/changes_and_access.yaml

+  - summary.config_access.scraped == 1
+  - summary.config_access.skipped == 0
+  - summary.changes.scraped == 1
+  - summary.access_logs.scraped == 1
+  - summary.access_logs.skipped == 0


⚠️ Potential issue | 🟡 Minor

Add the missing changes.skipped summary assertion.

You assert skipped == 0 for config_access and access_logs, but not for changes (Lines 31–35). This leaves a small regression gap in summary validation.

✅ Suggested assertion update

assertions: - summary.config_access.scraped == 1 - summary.config_access.skipped == 0 - summary.changes.scraped == 1 + - summary.changes.skipped == 0 - summary.access_logs.scraped == 1 - summary.access_logs.skipped == 0

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@scrapers/extract/testdata/changes_and_access.yaml` around lines 31 - 35, The test summary is missing an assertion for changes.skipped; update the YAML assertions block to include a line asserting summary.changes.skipped == 0 alongside summary.changes.scraped == 1 so the summary validation covers skipped counts for changes as it does for config_access and access_logs.

coderabbitai · 2026-03-05T14:07:23Z

scrapers/extract/testdata/config_ref_defaults.yaml

+      config_external_id:
+        config_type: Organization
+        external_id: org-1
+      external_user_aliases: [alice]
+    - id: a2
+      config_external_id:
+        config_type: Organization
+        external_id: org-1
+      external_user_aliases: [bob]
+    - id: a3
+      config_external_id:
+        config_type: Database
+        external_id: db-1


⚠️ Potential issue | 🟡 Minor

Use one canonical key name for config refs in this fixture.

expected uses config_external_id, while assertions use external_config_id. Keep these aligned to avoid ambiguity in future fixture consumers.

Suggested fix

- config_external_id: + external_config_id: config_type: Organization external_id: org-1 @@ - config_external_id: + external_config_id: config_type: Organization external_id: org-1 @@ - config_external_id: + external_config_id: config_type: Database external_id: db-1

Also applies to: 40-43

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@scrapers/extract/testdata/config_ref_defaults.yaml` around lines 24 - 36, The fixture mixes two key names; use a single canonical key (config_external_id) everywhere: update the entries that currently use external_config_id (e.g., the records with ids a2 and a3 and the duplicate occurrence around lines 40-43) to use config_external_id so the fixture keys match the expected assertions and avoid ambiguity.

coderabbitai · 2026-03-05T14:07:23Z

scrapers/gcp/security_center.go

 	}

-	analysis.ExternalConfigs = append(analysis.ExternalConfigs, v1.ExternalID{ExternalID: finding.Resource.Name})
+	analysis.ExternalID = finding.Resource.Name


⚠️ Potential issue | 🟠 Major

Set ConfigType alongside ExternalID to avoid dropping findings.

upsertAnalysis resolves by {ConfigType, ExternalID}. With only ExternalID set here, many findings won’t match a config item and can be silently skipped.

🛠️ Proposed fix

- analysis.ExternalID = finding.Resource.Name + analysis.ExternalID = finding.Resource.Name + analysis.ConfigType = finding.Resource.Type

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

analysis.ExternalID = finding.Resource.Name

analysis.ExternalID = finding.Resource.Name

analysis.ConfigType = finding.Resource.Type

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@scrapers/gcp/security_center.go` at line 115, Set analysis.ConfigType whenever setting analysis.ExternalID so upsertAnalysis can resolve by both keys; update the code around the assignment to analysis.ExternalID (the block that sets analysis.Resource.Name) to also set analysis.ConfigType to the appropriate constant/enum/value used for Security Center resources (the same ConfigType expected by upsertAnalysis). Ensure the value matches the key used in upsertAnalysis lookup so findings are not dropped.

coderabbitai

Actionable comments posted: 5

♻️ Duplicate comments (5)

scrapers/extract/README.md (1)

390-392: ⚠️ Potential issue | 🟡 Minor

Add language identifiers to fenced code blocks (MD040).

Both blocks are missing a fence language and will keep triggering markdownlint warnings.
🧹 Suggested fix
-```
+```text
 ConfigID == uuid.Nil AND (ConfigType == "" OR ExternalID == "")
@@

- +text
uuid.Parse(value) → if err == nil: treat as ConfigID (UUID)
→ if err != nil: treat as ExternalID (string)
Also applies to: 417-420
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scrapers/extract/README.md` around lines 390 - 392, The markdown code fences
in the README are missing language identifiers (MD040); update the two fenced
blocks containing the expressions "ConfigID == uuid.Nil AND (ConfigType == "" OR
ExternalID == "")" and the uuid.Parse example so they start with a language tag
(e.g., ```text) instead of just ``` — locate the blocks with those exact
contents (the ConfigID expression and the uuid.Parse(value) → ... example) and
change their opening fences to ```text to satisfy markdownlint.

scrapers/extract/schema.yaml (1)

158-163: ⚠️ Potential issue | 🟠 Major

ChangeResult is still internally contradictory on configID vs external_id.

The description says configID can bypass external lookup, but Line [161] still requires external_id. This blocks or contradicts the configID-only path.

🧭 Suggested schema correction

   ChangeResult:
     type: object
@@
-    required:
-      - external_id
-      - external_change_id
-      - change_type
+    required:
+      - external_change_id
+      - change_type
+    oneOf:
+      - required: [configID]
+      - required: [external_id, config_type]

Also applies to: 225-231

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@scrapers/extract/schema.yaml` around lines 158 - 163, The ChangeResult schema
is contradictory: it documents mutual exclusivity between configID and
external_id+config_type but still lists external_id in required; update the
schema to express the mutual-exclusion properly (e.g., replace the flat required
list with a oneOf/anyOf that defines two valid shapes: one object requiring
configID only, and another requiring external_id and config_type (and
external_change_id/change_type as needed)), and apply the same fix to the
duplicate occurrence referenced around the other block (lines ~225-231). Ensure
the schema names (ChangeResult, configID, external_id, config_type,
external_change_id, change_type) are used so the validator accepts either the
configID-only path or the external lookup path but not both.

api/v1/interface.go (1)

210-223: ⚠️ Potential issue | 🟡 Minor

EntitySummary.Pretty() can still emit leading whitespace.

When Scraped == 0 and Saved/Skipped/Deleted > 0, Line [216]/[219]/[222] prepend .Space() before the first token.

♻️ Suggested fix

func (e EntitySummary) Pretty() api.Text {
 	t := clicky.Text("")
+	first := true
+	appendPart := func(count int, label, style string) {
+		if count == 0 {
+			return
+		}
+		if !first {
+			t = t.Space()
+		}
+		t = t.Appendf("%d", count).AddText(" "+label, style)
+		first = false
+	}
-	if e.Scraped > 0 {
-		t = t.Appendf("%d", e.Scraped).AddText(" scraped", "muted")
-	}
-	if e.Saved > 0 {
-		t = t.Space().Appendf("%d", e.Saved).AddText(" saved", "success")
-	}
-	if e.Skipped > 0 {
-		t = t.Space().Appendf("%d", e.Skipped).AddText(" skipped", "warning")
-	}
-	if e.Deleted > 0 {
-		t = t.Space().Appendf("%d", e.Deleted).AddText(" deleted", "error")
-	}
+	appendPart(e.Scraped, "scraped", "muted")
+	appendPart(e.Saved, "saved", "success")
+	appendPart(e.Skipped, "skipped", "warning")
+	appendPart(e.Deleted, "deleted", "error")
 	return t
}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@api/v1/interface.go` around lines 210 - 223, EntitySummary.Pretty() can emit
a leading space because Saved/Skipped/Deleted branches always call .Space()
before the first token; change each branch (the code inside EntitySummary.Pretty
for Saved, Skipped, Deleted) to only prepend a space when the existing text is
non-empty—i.e. detect whether t is empty (or maintain a firstToken boolean) and
if empty call Appendf(...).AddText(...) directly, otherwise call t =
t.Space().Appendf(...).AddText(...); apply this change to the Saved, Skipped and
Deleted branches so no leading whitespace is produced.

scrapers/extract/extract.go (2)

612-624: ⚠️ Potential issue | 🟠 Major

Validate config references with IsEmpty() instead of ExternalID only.

Line [612] and Line [623] accept/reject records based only on ConfigExternalID.ExternalID, which allows incomplete refs and ignores other valid external-ref forms.

Suggested patch

-		if ca.ConfigID != uuid.Nil || ca.ConfigExternalID.ExternalID != "" {
+		if ca.ConfigID != uuid.Nil || !ca.ConfigExternalID.IsEmpty() {
 			validAccess = append(validAccess, ca)
 			continue
 		}
@@
-		if al.ConfigID != uuid.Nil || al.ConfigExternalID.ExternalID != "" {
+		if al.ConfigID != uuid.Nil || !al.ConfigExternalID.IsEmpty() {
 			validLogs = append(validLogs, al)
 			continue
 		}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@scrapers/extract/extract.go` around lines 612 - 624, The validation currently
checks only ConfigID != uuid.Nil or ConfigExternalID.ExternalID != "" (in the ca
and al loops), which misses other non-empty external ref forms; update both
checks to use the ConfigExternalID.IsEmpty() helper instead (i.e., treat a
non-empty ref as !ConfigExternalID.IsEmpty()). Concretely, in the first loop
guarding variable ca (type with ConfigID and ConfigExternalID) and in the second
loop for al (v1.ExternalConfigAccessLog), replace the
"ConfigExternalID.ExternalID != \"\"" condition with
"!ConfigExternalID.IsEmpty()" so all valid external-ref shapes are recognized.
Ensure you update both occurrences so result.Summary.ConfigAccess.Skipped and
result.AddWarning semantics remain unchanged.

214-221: ⚠️ Potential issue | 🟠 Major

Run resolver sync after alias materialization to avoid unsynced/duplicate entities.

At Line [215], SyncEntities runs before Line [218] ResolveAccess, but alias misses in resolveConfigAccessAliases (Lines [334], [351], [368]) append new users/roles/groups afterward. Those additions are not synced in this flow and can duplicate identities within the batch.

Also applies to: 325-376

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@scrapers/extract/extract.go` around lines 214 - 221, The current flow calls
SyncEntities(resolver, scraperID, &result) before ResolveAccess, but
resolveConfigAccessAliases (used by ResolveAccess) appends new
users/roles/groups later, causing unsynced/duplicate identities; fix by ensuring
entity sync happens after alias materialization: move the SyncEntities call to
run after ResolveAccess has completed (or invoke a second SyncEntities pass
immediately after ResolveAccess/resolveConfigAccessAliases), updating calls
around SyncEntities and ResolveAccess so SyncEntities runs once aliases are
materialized (references: SyncEntities, ResolveAccess,
resolveConfigAccessAliases).

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@api/v1/interface.go`:
- Around line 54-68: AnalysisResult has new fields ExternalID, ConfigType, and
ExternalConfigs that are not persisted because persistence uses
ToConfigAnalysis(); update the conversion/persistence flow so these fields are
carried through: modify ToConfigAnalysis() (and any callers in db/update.go) to
include ExternalID, ConfigType, and ExternalConfigs in the produced config
analysis struct or ensure an alternative conversion function is used that
preserves them; adjust db/update.go persistence logic to consume the extended
conversion output so the new targeting fields are stored.

In `@api/v1/types.go`:
- Line 191: ExternalID's emptiness check includes ConfigID (so IsEmpty() returns
false for ConfigID-only IDs) but ExternalID.Find() still doesn't use ConfigID in
its lookup, causing configID-only references to fail resolution; update the
lookup logic in ExternalID.Find (and any helper it calls) to include ConfigID as
a searchable key when ConfigID is non-empty (i.e., add a filter/branch that
queries by ConfigID or combines ConfigID with existing criteria), or
alternatively change ExternalID.IsEmpty to ignore ConfigID—prefer updating
ExternalID.Find to honor ConfigID so that ExternalID.IsEmpty, ExternalID.Find,
and any call paths using them are consistent.

In `@scrapers/extract/extract.go`:
- Around line 545-553: The loop over result.AccessLogs currently only checks
result.AccessLogs[i].ConfigID before applying top-level defaults, which can
overwrite an existing per-entry ConfigExternalID; update the condition so you
skip setting defaults if either ConfigID is set (ConfigID != uuid.Nil) OR
ConfigExternalID is set (!ConfigExternalID.IsEmpty()), and when assigning
defaults only set ConfigID if defaultConfigID != uuid.Nil and only set
ConfigExternalID if defaultExternalID is non-empty, using the existing symbols
result.AccessLogs, ConfigID, ConfigExternalID, defaultConfigID, and
defaultExternalID.

In `@scrapers/extract/README.md`:
- Line 229: The link fragment in the table row targeting
`#changeaction-allowed-values` is invalid because the actual heading is `###
ChangeAction Values`; fix by making the fragment match the heading (change the
link to `#changeaction-values`) or rename the heading to `ChangeAction Allowed
Values` so the fragment `#changeaction-allowed-values` is valid; update the
table entry referencing `ChangeAction`/`ChangeAction Values` accordingly to keep
the anchor consistent.
- Around line 381-385: The README docs for ExternalID.ConfigID are out of sync:
update the field table to match the code where ConfigID is defined as type
string (not uuid.UUID) and note emptiness is checked with ConfigID == "" rather
than uuid.Nil; ensure the same correction is applied to the other occurrence of
the ExternalID docs (the second table block mentioned) and keep the JSON tag
`config_id,omitempty` and the description meaning (direct config identifier)
consistent with the api/v1/types.go definition of ConfigID and its emptiness
rule.

---

Duplicate comments:
In `@api/v1/interface.go`:
- Around line 210-223: EntitySummary.Pretty() can emit a leading space because
Saved/Skipped/Deleted branches always call .Space() before the first token;
change each branch (the code inside EntitySummary.Pretty for Saved, Skipped,
Deleted) to only prepend a space when the existing text is non-empty—i.e. detect
whether t is empty (or maintain a firstToken boolean) and if empty call
Appendf(...).AddText(...) directly, otherwise call t =
t.Space().Appendf(...).AddText(...); apply this change to the Saved, Skipped and
Deleted branches so no leading whitespace is produced.

In `@scrapers/extract/extract.go`:
- Around line 612-624: The validation currently checks only ConfigID != uuid.Nil
or ConfigExternalID.ExternalID != "" (in the ca and al loops), which misses
other non-empty external ref forms; update both checks to use the
ConfigExternalID.IsEmpty() helper instead (i.e., treat a non-empty ref as
!ConfigExternalID.IsEmpty()). Concretely, in the first loop guarding variable ca
(type with ConfigID and ConfigExternalID) and in the second loop for al
(v1.ExternalConfigAccessLog), replace the "ConfigExternalID.ExternalID != \"\""
condition with "!ConfigExternalID.IsEmpty()" so all valid external-ref shapes
are recognized. Ensure you update both occurrences so
result.Summary.ConfigAccess.Skipped and result.AddWarning semantics remain
unchanged.
- Around line 214-221: The current flow calls SyncEntities(resolver, scraperID,
&result) before ResolveAccess, but resolveConfigAccessAliases (used by
ResolveAccess) appends new users/roles/groups later, causing unsynced/duplicate
identities; fix by ensuring entity sync happens after alias materialization:
move the SyncEntities call to run after ResolveAccess has completed (or invoke a
second SyncEntities pass immediately after
ResolveAccess/resolveConfigAccessAliases), updating calls around SyncEntities
and ResolveAccess so SyncEntities runs once aliases are materialized
(references: SyncEntities, ResolveAccess, resolveConfigAccessAliases).

In `@scrapers/extract/README.md`:
- Around line 390-392: The markdown code fences in the README are missing
language identifiers (MD040); update the two fenced blocks containing the
expressions "ConfigID == uuid.Nil AND (ConfigType == "" OR ExternalID == "")"
and the uuid.Parse example so they start with a language tag (e.g., ```text)
instead of just ``` — locate the blocks with those exact contents (the ConfigID
expression and the uuid.Parse(value) → ... example) and change their opening
fences to ```text to satisfy markdownlint.

In `@scrapers/extract/schema.yaml`:
- Around line 158-163: The ChangeResult schema is contradictory: it documents
mutual exclusivity between configID and external_id+config_type but still lists
external_id in required; update the schema to express the mutual-exclusion
properly (e.g., replace the flat required list with a oneOf/anyOf that defines
two valid shapes: one object requiring configID only, and another requiring
external_id and config_type (and external_change_id/change_type as needed)), and
apply the same fix to the duplicate occurrence referenced around the other block
(lines ~225-231). Ensure the schema names (ChangeResult, configID, external_id,
config_type, external_change_id, change_type) are used so the validator accepts
either the configID-only path or the external lookup path but not both.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 97141654-741d-4a4d-9ec8-55aae9469402

📥 Commits

Reviewing files that changed from the base of the PR and between 97e014c and 54070f2.

📒 Files selected for processing (14)

api/v1/interface.go
api/v1/types.go
cmd/run.go
scrapers/extract/README.md
scrapers/extract/extract.go
scrapers/extract/schema.yaml
scrapers/extract/testdata/changes_and_access.yaml
scrapers/extract/testdata/changes_basic.yaml
scrapers/extract/testdata/changes_default_from_parent.yaml
scrapers/extract/testdata/changes_invalid_ref.yaml
scrapers/extract/testdata/changes_missing_change_id.yaml
scrapers/extract/testdata/changes_missing_change_type.yaml
scrapers/extract/testdata/changes_no_parent.yaml
scrapers/extract/testdata/changes_with_details.yaml

✅ Files skipped from review due to trivial changes (2)

scrapers/extract/testdata/changes_missing_change_type.yaml
cmd/run.go

🚧 Files skipped from review as they are similar to previous changes (3)

scrapers/extract/testdata/changes_with_details.yaml
scrapers/extract/testdata/changes_and_access.yaml
scrapers/extract/testdata/changes_no_parent.yaml

api/v1/interface.go

coderabbitai · 2026-03-05T15:24:56Z

api/v1/types.go

 }

 type ExternalID struct {
+	ConfigID   string `json:"config_id,omitempty"`


⚠️ Potential issue | 🟠 Major

ConfigID is now part of emptiness semantics, but lookup still ignores it.

After Line [244], ExternalID{ConfigID: ...} is considered non-empty, so call paths that branch on IsEmpty() can invoke external-ID lookup. But ExternalID.Find() still does not query by ConfigID, so configID-only references can fail resolution.

🔧 Proposed follow-up

func (e ExternalID) Find(db *gorm.DB) *gorm.DB { - query := db.Limit(1).Order("updated_at DESC").Where("deleted_at IS NULL").Where("? = ANY(external_id)", strings.ToLower(e.ExternalID)) + query := db.Limit(1).Order("updated_at DESC").Where("deleted_at IS NULL") + if e.ConfigID != "" { + return query.Where("id = ?", e.ConfigID) + } + query = query.Where("? = ANY(external_id)", strings.ToLower(e.ExternalID)) if e.ConfigType != "" { query = query.Where("type = ?", e.ConfigType) } ... } func (e ExternalID) Key() string { - return strings.ToLower(fmt.Sprintf("%s%s%s", e.ConfigType, e.ExternalID, e.ScraperID)) + if e.ConfigID != "" { + return strings.ToLower("config_id:" + e.ConfigID) + } + return strings.ToLower(fmt.Sprintf("%s%s%s", e.ConfigType, e.ExternalID, e.ScraperID)) }

Also applies to: 243-245

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@api/v1/types.go` at line 191, ExternalID's emptiness check includes ConfigID (so IsEmpty() returns false for ConfigID-only IDs) but ExternalID.Find() still doesn't use ConfigID in its lookup, causing configID-only references to fail resolution; update the lookup logic in ExternalID.Find (and any helper it calls) to include ConfigID as a searchable key when ConfigID is non-empty (i.e., add a filter/branch that queries by ConfigID or combines ConfigID with existing criteria), or alternatively change ExternalID.IsEmpty to ignore ConfigID—prefer updating ExternalID.Find to honor ConfigID so that ExternalID.IsEmpty, ExternalID.Find, and any call paths using them are consistent.

scrapers/extract/extract.go

scrapers/extract/README.md

coderabbitai

🧹 Nitpick comments (1)

scrapers/extract/fullmode.go (1)

57-61: Add a defensive guard before indexing scraped[0].

Line 59 assumes scraped is non-empty. It’s safe with current invariants, but a small guard makes this panic-proof against future refactors.

Suggested patch

+	if len(scraped) == 0 {
+		return scraped
+	}
+
 	for _, a := range appenders {
 		if a.hasItems {
 			result := v1.NewScrapeResult(scraped[0].BaseScraper)
 			a.apply(result)
 			scraped = append(scraped, *result)
 		}
 	}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@scrapers/extract/fullmode.go` around lines 57 - 61, Add a defensive guard
before indexing scraped[0] in the loop over appenders: check len(scraped) > 0
(or otherwise ensure scraped is non-empty) before calling
v1.NewScrapeResult(scraped[0].BaseScraper); if scraped is empty, skip the
appender (or handle the error) so a.apply(result) and scraped = append(scraped,
*result) never cause a panic. Locate this in the loop iterating over appenders
and protect the v1.NewScrapeResult(...) call and subsequent a.apply(...) usage.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@scrapers/extract/fullmode.go`:
- Around line 57-61: Add a defensive guard before indexing scraped[0] in the
loop over appenders: check len(scraped) > 0 (or otherwise ensure scraped is
non-empty) before calling v1.NewScrapeResult(scraped[0].BaseScraper); if scraped
is empty, skip the appender (or handle the error) so a.apply(result) and scraped
= append(scraped, *result) never cause a panic. Locate this in the loop
iterating over appenders and protect the v1.NewScrapeResult(...) call and
subsequent a.apply(...) usage.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d417b15c-7e16-4066-8fda-dbf3ff7ad02b

📥 Commits

Reviewing files that changed from the base of the PR and between 54070f2 and ed3f349.

📒 Files selected for processing (7)

Makefile
db/external_entities.go
jobs/jobs_test.go
scrapers/exec/exec_backstage_integration_test.go
scrapers/external_entities_test.go
scrapers/extract/fullmode.go
scrapers/run.go

💤 Files with no reviewable changes (1)

db/external_entities.go

…ests Replace separate JSON test fixtures with YAML files in fixtures/data/extraction/ that contain both input and expected output.

When extracted config has no "config" key but has entities/access, clear Config and ID so the item is skipped as a config item.

- Add ExternalUserAliases field to ExternalConfigAccessLog struct - Expand "user" shorthand in expandAccessLogShorthand - Resolve user aliases to ExternalUserID in ResolveAccess - Validate user reference in validateConfigRefs

- Fix ExternalConfigAccess struct: remove incorrect embedding claim, add missing fields (id, audit timestamps), update file reference - Fix ExternalConfigAccessLog: add missing ExternalUserAliases field and user shorthand documentation - Add access_logs shorthand expansion table (user, config ref fields) - Add missing validations to summary: HasPrincipal() for config_access, user reference for access_logs - Add user shorthand to access_logs example in section 10 - Update schema.yaml: add id/external_config_id/audit fields to ConfigAccess, add user shorthand/external_user_aliases/external_config_id to ConfigAccessLog, update descriptions with validation warnings

coderabbitai bot reviewed Mar 5, 2026

View reviewed changes

moshloop force-pushed the pr/extract-refactor branch from ed3f349 to 1549ee5 Compare March 5, 2026 21:00

moshloop added 5 commits March 5, 2026 23:08

test: replace JSON fixtures with YAML fixture-folder for extraction t…

4665ccc

…ests Replace separate JSON test fixtures with YAML files in fixtures/data/extraction/ that contain both input and expected output.

chore: add changes tests

5237d3e

fix: entity-only items should not produce config items

632efe6

When extracted config has no "config" key but has entities/access, clear Config and ID so the item is skipped as a config item.

feat: add user alias support to ExternalConfigAccessLog

e203214

- Add ExternalUserAliases field to ExternalConfigAccessLog struct - Expand "user" shorthand in expandAccessLogShorthand - Resolve user aliases to ExternalUserID in ResolveAccess - Validate user reference in validateConfigRefs

moshloop force-pushed the pr/extract-refactor branch from 1549ee5 to db35b35 Compare March 5, 2026 21:15

moshloop enabled auto-merge (rebase) March 5, 2026 21:18

chore: move fixtures to testdata

7348cd4

moshloop force-pushed the pr/extract-refactor branch from db35b35 to 7348cd4 Compare March 5, 2026 21:34

moshloop merged commit e358b38 into main Mar 5, 2026
12 of 15 checks passed

moshloop deleted the pr/extract-refactor branch March 5, 2026 21:46

-		changes[i].Summary += strings.Join(paths, ", ")
+		pathStr := strings.Join(paths, ", ")
+		if changes[i].Summary != "" {
+			changes[i].Summary += " "
+		}
+		changes[i].Summary += pathStr

	analysis.ExternalID = finding.Resource.Name
	analysis.ExternalID = finding.Resource.Name
	analysis.ConfigType = finding.Resource.Type

Conversation

moshloop commented Mar 5, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Release Notes

Uh oh!

coderabbitai bot commented Mar 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review failed

Walkthrough

Changes

Sequence Diagram(s)

Possibly related PRs

Suggested reviewers

❌ Failed checks (1 warning)

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 5, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot Mar 5, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot Mar 5, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 5, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot Mar 5, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 5, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 5, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot Mar 5, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

moshloop commented Mar 5, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Mar 5, 2026 •

edited

Loading