-
Notifications
You must be signed in to change notification settings - Fork 123
Fix GCP provider to iterate over only provided project_ids in config instead of looping through All projects #719
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev
Are you sure you want to change the base?
Conversation
Added provider_ids for GCP
WalkthroughAdds optional GCP Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Config as YAML Config
participant Provider as OrganizationProvider
participant CRM as Cloud Resource Manager
participant AssetAPI as Cloud Asset API
participant Runner as Runner/Processor
participant Output as Output
Note over Config,Provider: Init with optional `project_ids`
Config->>Provider: provide provider options (project_ids)
Provider->>CRM: enrichWithProjectNumbers(project_ids)
alt enrichment succeeds
CRM-->>Provider: project numbers
Provider->>Provider: build projectScope (IDs + numbers)
else enrichment fails / none provided
Provider->>CRM: list projects (fallback)
CRM-->>Provider: discovered projects
Provider->>Provider: build projectScope or leave nil
end
Provider->>AssetAPI: enumerate assets (per-project if scope present, org-wide otherwise)
AssetAPI-->>Provider: stream assets
Provider->>Runner: forward assets (client-side filter by projectScope)
Runner->>Runner: sanitizePrivateIPs(resource, exclude=true)
Runner->>Output: write sanitized asset record
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
pkg/schema/schema.go (1)
243-254: Error message is now misleading for the expanded case.The error message on line 253 references only
account_ids, but this case now handlesaccount_ids,urls,services, andproject_ids. Consider updating the error message to be dynamic or generic.- return fmt.Errorf("unsupported type %T in account_ids", v) + return fmt.Errorf("unsupported type %T in %s", v, key)pkg/providers/gcp/project_scope_test.go (1)
36-47: Consider adding a negative test case for numeric project filtering.This test only verifies the positive case. Adding a test where the asset belongs to a different project number would improve coverage.
func TestProjectScopeAllowsAssetByNumber(t *testing.T) { scope := newProjectScope([]string{"123456789"}) require.NotNil(t, scope) asset := &assetpb.Asset{ Resource: &assetpb.Resource{ Parent: "//cloudresourcemanager.googleapis.com/projects/123456789", }, } assert.True(t, scope.allowsAsset(asset)) + + blockedAsset := &assetpb.Asset{ + Resource: &assetpb.Resource{ + Parent: "//cloudresourcemanager.googleapis.com/projects/987654321", + }, + } + assert.False(t, scope.allowsAsset(blockedAsset)) }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
PROVIDERS.md(2 hunks)pkg/providers/gcp/gcp.go(7 hunks)pkg/providers/gcp/project_scope_test.go(1 hunks)pkg/schema/schema.go(1 hunks)pkg/schema/schema_test.go(1 hunks)
🧰 Additional context used
📓 Path-based instructions (6)
PROVIDERS.md
📄 CodeRabbit inference engine (CLAUDE.md)
Document configuration for any new provider in PROVIDERS.md
Files:
PROVIDERS.md
pkg/providers/**/*.go
📄 CodeRabbit inference engine (CLAUDE.md)
pkg/providers/**/*.go: All providers must implement the schema.Provider interface with methods: Name(), ID(), Resources(ctx), Services()
For complex providers, keep service-specific logic in separate files (e.g., instances.go, dns.go, route53.go, s3.go, eks.go) and return *schema.Resources merged by the main Resources()
Parse provider configuration via block.GetMetadata(key); support environment variables using $VAR syntax; return &schema.ErrNoSuchKey{Name: key} for missing required config
Respect service filtering: Services() must list supported services, and providers should honor -s filters when gathering resources
Always pass and honor context.Context in Resources() and long-running API calls to support cancellation
Do not manually deduplicate resources; use Resources.Append() to rely on schema’s ResourceDeduplicator
Each emitted resource must have at least one of IP or DNS populated; empty resources are invalid
Treat provider config id as a user-defined identifier for filtering, not a cloud resource ID
Files:
pkg/providers/gcp/project_scope_test.gopkg/providers/gcp/gcp.go
pkg/providers/*/*.go
📄 CodeRabbit inference engine (CLAUDE.md)
Each provider directory must include a main file named .go defining New(block schema.OptionBlock) and Resources(ctx) orchestration
Files:
pkg/providers/gcp/project_scope_test.gopkg/providers/gcp/gcp.go
pkg/providers/gcp/**/*.go
📄 CodeRabbit inference engine (CLAUDE.md)
GCP provider: switch discovery mode based on presence of organization_id; use cloud.google.com/go/asset for Asset API mode, else service-specific clients
Files:
pkg/providers/gcp/project_scope_test.gopkg/providers/gcp/gcp.go
**/*_test.go
📄 CodeRabbit inference engine (CLAUDE.md)
Prefer unit tests with mocked cloud provider clients; use table-driven tests for resource parsing; include tests for error handling on missing/invalid credentials
Files:
pkg/providers/gcp/project_scope_test.gopkg/schema/schema_test.go
pkg/providers/**/*_test.go
📄 CodeRabbit inference engine (CLAUDE.md)
Skip or isolate integration tests that require real cloud credentials; keep unit tests deterministic for CI
Files:
pkg/providers/gcp/project_scope_test.go
🧬 Code graph analysis (1)
pkg/schema/schema_test.go (1)
pkg/schema/schema.go (1)
Options(211-211)
🔇 Additional comments (19)
pkg/providers/gcp/project_scope_test.go (2)
11-18: LGTM!Good coverage for
splitAndCleanProjectList- tests whitespace trimming, empty element removal, and deduplication in a single case.
20-34: LGTM!Solid test coverage for project ID-based asset filtering with both positive and negative cases.
pkg/schema/schema_test.go (1)
10-26: LGTM!Good test validating that
project_idsarrays are correctly parsed into comma-separated strings while preserving order and duplicates at the schema level. The deduplication happens downstream in the GCP provider.PROVIDERS.md (2)
155-156: LGTM!Clear documentation of the new
project_idsparameter with accurate description of its effect on discovery scope.
195-201: LGTM!Good example showing
project_idsin an organization-level configuration with a helpful explanation of use cases.pkg/providers/gcp/gcp.go (14)
45-62: LGTM!Clean addition of
projectScopefield to support project-based filtering in organization-level discovery.
247-248: LGTM!Clean extraction of configured project IDs from options at the start of provider initialization.
344-368: LGTM!Good logic flow: prioritizes configured projects, falls back to listing all accessible projects, and properly handles the case when no projects are available. The logging at line 365 provides useful visibility into which mode is being used.
485-488: LGTM!Correct asset filtering placement. The nil-safe
allowsAssetmethod ensures backward compatibility when no project scope is configured.
549-550: LGTM!Consistent extraction of configured projects at the start of organization provider initialization.
686-713: LGTM!Well-structured project scope initialization with appropriate error handling. The approach of logging warnings for enrichment failures while continuing is correct, as partial resolution is acceptable.
932-961: LGTM!Clean helper functions with good normalization (handling various separators) and deduplication logic. Returning
nilinstead of empty slice enables clean nil checks in callers.
963-998: LGTM!Well-designed
projectScopetype with clean initialization. The separation ofallowedIDsandallowedNumbersenables matching assets by either identifier format.
1007-1030: LGTM!Good resilient design - continues processing remaining projects even if some fail to resolve, while still surfacing the first error. Passing context to API calls supports cancellation as required by coding guidelines.
1032-1059: LGTM!Clean asset allowance logic with proper nil-safety for backward compatibility. Short-circuit evaluation on first match is efficient.
1061-1083: LGTM!Comprehensive project ID extraction from multiple possible locations in the asset structure, with proper nil checks and filtering of placeholder values.
1085-1107: LGTM!Thorough project number extraction with multiple fallback locations, including ancestor traversal for hierarchical resources.
1120-1136: LGTM!Clean token extraction logic handling various GCP resource path formats including the
//prefix used in asset names.
1138-1146: LGTM!Simple and effective numeric validation using
strconv.ParseInt.
|
Hey Folks, Let me know if this will be taken for consideration. This is my first ever Opensource contribution and I hope I will make more.. |
|
@srivathsan-srinivasan this looks good, will review / test and merge asap! |
ShubhamRasal
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, thanks for this contribution! 🎉 Really appreciate you taking the time to add project scoping for GCP — this is a useful feature for large organizations.
I had a look at the implementation and wanted to share some thoughts:
Current approach:
The PR fetches all assets at the organization level (organizations/{org-id}) and then filters them client-side using projectScope.allowsAsset(). While this works correctly, it can be inefficient for large organizations since we're:
- Fetching all assets across the entire org (potentially thousands)
- Using API quota for assets we'll discard
- Adding unnecessary network overhead
Suggested approach:
The GCP Asset API's ListAssetsRequest actually supports scoping directly via the Parent field:
Format: "organizations/[organization-number]", "projects/[project-id]",
"projects/[project-number]", or "folders/[folder-number]"
So when project_ids is configured, we could make per-project API calls instead:
for _, projectID := range configuredProjectIDs {
parent := "projects/" + projectID
assets, err := p.getAssetsForTypes(ctx, parent, assetTypes)
// ...merge results
}This way, the API returns only what we need — no client-side filtering required, less quota usage, and faster for targeted discovery.
Happy to discuss further or help if you'd like to iterate on this! Let me know what you think.
…dded per project calls instead of fetching entire org to reduce overhead
feat: add sanitization for private IPs and implement related tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (1)
internal/runner/runner.go (1)
309-321: Consider logging write errors for better observability.The helper functions silently ignore write errors (lines 313, 320). While this may be intentional for resilience, it could hide issues like full disks or unmounted file systems.
🔎 Optional enhancement: Log write errors
func writeOutputString(output *os.File, data string) { if output == nil || data == "" { return } - _, _ = output.WriteString(data) + if _, err := output.WriteString(data); err != nil { + gologger.Warning().Msgf("Failed to write output: %s", err) + } } func writeOutputBytes(output *os.File, data []byte) { if output == nil || len(data) == 0 { return } - _, _ = output.Write(data) + if _, err := output.Write(data); err != nil { + gologger.Warning().Msgf("Failed to write output: %s", err) + } }
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
internal/runner/runner.gointernal/runner/runner_test.gopkg/providers/gcp/gcp.gopkg/schema/validate/validate.go
🧰 Additional context used
📓 Path-based instructions (4)
pkg/providers/**/*.go
📄 CodeRabbit inference engine (CLAUDE.md)
pkg/providers/**/*.go: All providers must implement the schema.Provider interface with methods: Name(), ID(), Resources(ctx), Services()
For complex providers, keep service-specific logic in separate files (e.g., instances.go, dns.go, route53.go, s3.go, eks.go) and return *schema.Resources merged by the main Resources()
Parse provider configuration via block.GetMetadata(key); support environment variables using $VAR syntax; return &schema.ErrNoSuchKey{Name: key} for missing required config
Respect service filtering: Services() must list supported services, and providers should honor -s filters when gathering resources
Always pass and honor context.Context in Resources() and long-running API calls to support cancellation
Do not manually deduplicate resources; use Resources.Append() to rely on schema’s ResourceDeduplicator
Each emitted resource must have at least one of IP or DNS populated; empty resources are invalid
Treat provider config id as a user-defined identifier for filtering, not a cloud resource ID
Files:
pkg/providers/gcp/gcp.go
pkg/providers/*/*.go
📄 CodeRabbit inference engine (CLAUDE.md)
Each provider directory must include a main file named .go defining New(block schema.OptionBlock) and Resources(ctx) orchestration
Files:
pkg/providers/gcp/gcp.go
pkg/providers/gcp/**/*.go
📄 CodeRabbit inference engine (CLAUDE.md)
GCP provider: switch discovery mode based on presence of organization_id; use cloud.google.com/go/asset for Asset API mode, else service-specific clients
Files:
pkg/providers/gcp/gcp.go
**/*_test.go
📄 CodeRabbit inference engine (CLAUDE.md)
Prefer unit tests with mocked cloud provider clients; use table-driven tests for resource parsing; include tests for error handling on missing/invalid credentials
Files:
internal/runner/runner_test.go
🧬 Code graph analysis (2)
internal/runner/runner.go (2)
pkg/schema/validate/validate.go (7)
PublicIPv4(74-74)PublicIPv6(75-75)DNSName(73-73)Validator(45-49)NewValidator(52-66)PrivateIPv4(76-76)PrivateIPv6(77-77)pkg/schema/schema.go (1)
Resource(177-198)
internal/runner/runner_test.go (2)
pkg/schema/schema.go (1)
Resource(177-198)pkg/schema/validate/validate.go (3)
PublicIPv4(74-74)PublicIPv6(75-75)DNSName(73-73)
🔇 Additional comments (16)
pkg/providers/gcp/gcp.go (7)
52-52: LGTM! Clean addition of project scope support.The
projectScopefield is appropriately typed as a pointer to support the nil case when no specific projects are configured.
247-368: LGTM! Project scoping logic is well-implemented.The implementation correctly:
- Prefers configured projects from
project_idsconfig- Falls back to listing all accessible projects when none are configured
- Provides clear logging to inform users which mode is active
- Maintains backward compatibility for existing configurations
386-450: LGTM! Resources() method correctly implements project-scoped discovery.The implementation properly branches between project-scoped and organization-level discovery:
- When
projectScopeis configured, it makes per-project API calls (lines 393-422)- Falls back to organization-level API calls when no scope is configured (lines 424-447)
- Provides clear logging to indicate which discovery mode is active
- Handles errors gracefully with warnings and continues processing
This design allows users to limit discovery to specific projects as intended by the PR objectives.
518-524: LGTM! Client-side filtering logic is correctly implemented.The conditional filtering is appropriate:
- Filtering is only applied when using organization-level API calls (
organizations/parent) with a configuredprojectScope- When using project-scoped API calls (
projects/parent), the API itself handles filtering- The explanatory comment helps future maintainers understand when filtering is necessary
586-750: LGTM! Organization provider creation correctly handles project scoping.The implementation:
- Reads configured
project_idsand creates aprojectScopewhen present- Enriches project IDs with numeric project numbers for robust asset matching (line 728)
- Falls back to listing all organization projects when no specific projects are configured
- Provides clear logging to indicate which discovery mode will be used
- Handles errors gracefully with appropriate warnings
The enrichment step is particularly valuable as it allows matching assets that reference projects by number rather than ID.
969-998: LGTM! Project ID parsing is robust and user-friendly.The helper functions handle various input formats well:
getProjectIDsFromOptionsprovides a clean interface to read the configurationsplitAndCleanProjectListhandles multiple delimiters (newlines, semicolons, commas) making it flexible for different YAML formatting styles- Whitespace trimming prevents user input errors
- Deduplication prevents redundant API calls
- Returns
nilfor empty/invalid input, following Go conventions
1000-1183: LGTM! Excellent design of projectScope type and supporting functions.The implementation demonstrates strong software engineering practices:
Type design:
- Separate maps for project IDs and numbers enable efficient O(1) lookup
orderedIDspreserves user-specified order for iterationRobust parsing:
extractProjectIDFromAssetandextractProjectNumberFromAssetcheck multiple locations in the asset structure- Proper nil checks throughout prevent panics
- Consistent handling of empty/invalid values (empty strings and "_")
Error handling:
enrichWithProjectNumberscontinues enrichment even if some projects fail, while tracking the first error- This graceful degradation ensures partial success rather than complete failure
Performance:
- Map-based lookups in
containsIDandcontainsNumberare optimal for filtering large asset setsextractProjectTokenefficiently parses GCP resource names using string operationspkg/schema/validate/validate.go (1)
13-13: LGTM: Correct CGNAT range addition.The addition of
100.64.0.0/10for Carrier-Grade NAT (RFC 6598) is correct and appropriate for private IP detection.internal/runner/runner_test.go (2)
9-37: LGTM: Comprehensive test coverage.The test effectively validates:
- Private IP fields are cleared when
excludePrivate=true- Public fields (PublicIPv4, PublicIPv6, DNSName) remain intact
- Private IP fields are preserved when
excludePrivate=false- Nil resource handling (line 36) prevents panics
39-58: LGTM: Good defensive test for misclassified IPs.This test validates an important edge case where private IPs might be incorrectly placed in public fields. The sanitizer correctly identifies and clears them regardless of field name, which is the right defensive behavior.
internal/runner/runner.go (6)
120-120: LGTM: Correct placement of sanitization.The
sanitizePrivateIPscall is correctly positioned before any output processing, ensuring consistent behavior across all output modes (JSON, hosts, IPs, etc.).
131-131: LGTM: Good refactor to centralize output writing.Replacing direct
output.Write/WriteStringcalls with helper functions improves maintainability and ensures consistent error handling across all output paths.Also applies to: 159-159, 170-170, 178-178, 186-186, 194-194, 205-205, 213-213, 221-221, 229-229, 237-237
271-284: LGTM: Correct sanitization logic with defensive checks.The function correctly:
- Returns early when exclusion is disabled or resource is nil
- Clears private IP fields unconditionally when exclusion is enabled
- Defensively checks
PublicIPv4andPublicIPv6fields for misclassified private IPs
286-289: LGTM: Proper singleton pattern for concurrency safety.Using
sync.Onceensures thread-safe lazy initialization of the validator, which is appropriate for a package-level singleton that may be accessed concurrently.
291-307: LGTM: Robust IP detection with graceful error handling.The function handles edge cases well:
- Empty string check (line 292)
- Graceful degradation if validator initialization fails (lines 302-304)
- Logging of initialization errors without failing the entire operation
182-182: Defensive checks are acceptable but redundant.The
!r.options.ExcludePrivatechecks before outputting private IPs are technically redundant sincesanitizePrivateIPs(line 120) already clears these fields when exclusion is enabled. However, this defensive approach improves code clarity and serves as documentation of intent.Also applies to: 190-190, 225-225, 233-233
Removing Public bogon Range. ++ plans to add sanitizer for bogon ranges
|
Hey @ShubhamRasal Thanks for reviewing and suggesting the changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
pkg/providers/gcp/gcp.go (1)
402-402: Fix typo in comment (duplicate).The comment contains "thne" which should be "then".
This was previously flagged in past reviews.
🧹 Nitpick comments (2)
pkg/providers/gcp/gcp.go (2)
344-368: Verify configured project IDs exist and are accessible.The code uses configured project IDs without validation. If a user misconfigures a project ID (typo, wrong project, insufficient permissions), API calls will fail later during resource discovery. Consider adding early validation by attempting a lightweight API call (e.g.,
manager.Projects.Get(projectID)) for each configured project and logging warnings for inaccessible ones.This is a UX improvement to provide clearer feedback when projects are misconfigured, though the current fail-later approach is also acceptable.
723-750: Consider more verbose logging for project enrichment failures.When
enrichWithProjectNumbersfails (line 728-730), only a generic warning is logged. If a user misconfigures project IDs (typos, wrong org, insufficient permissions), they won't know which specific projects failed. Consider logging the specific project ID and error for each failure withinenrichWithProjectNumbersto improve debuggability.Example improvement in
enrichWithProjectNumbers:if err != nil { gologger.Warning().Msgf("Could not resolve project '%s': %s", id, err) if firstErr == nil { firstErr = err } continue }This would make troubleshooting configuration issues much easier for users.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
pkg/providers/gcp/gcp.go
🧰 Additional context used
📓 Path-based instructions (3)
pkg/providers/**/*.go
📄 CodeRabbit inference engine (CLAUDE.md)
pkg/providers/**/*.go: All providers must implement the schema.Provider interface with methods: Name(), ID(), Resources(ctx), Services()
For complex providers, keep service-specific logic in separate files (e.g., instances.go, dns.go, route53.go, s3.go, eks.go) and return *schema.Resources merged by the main Resources()
Parse provider configuration via block.GetMetadata(key); support environment variables using $VAR syntax; return &schema.ErrNoSuchKey{Name: key} for missing required config
Respect service filtering: Services() must list supported services, and providers should honor -s filters when gathering resources
Always pass and honor context.Context in Resources() and long-running API calls to support cancellation
Do not manually deduplicate resources; use Resources.Append() to rely on schema’s ResourceDeduplicator
Each emitted resource must have at least one of IP or DNS populated; empty resources are invalid
Treat provider config id as a user-defined identifier for filtering, not a cloud resource ID
Files:
pkg/providers/gcp/gcp.go
pkg/providers/*/*.go
📄 CodeRabbit inference engine (CLAUDE.md)
Each provider directory must include a main file named .go defining New(block schema.OptionBlock) and Resources(ctx) orchestration
Files:
pkg/providers/gcp/gcp.go
pkg/providers/gcp/**/*.go
📄 CodeRabbit inference engine (CLAUDE.md)
GCP provider: switch discovery mode based on presence of organization_id; use cloud.google.com/go/asset for Asset API mode, else service-specific clients
Files:
pkg/providers/gcp/gcp.go
🔇 Additional comments (3)
pkg/providers/gcp/gcp.go (3)
518-524: LGTM: Filtering logic is correct and well-documented.The conditional filtering correctly distinguishes between scenarios:
- Project-scoped API calls filter inherently (no client-side filtering needed)
- Organization-level calls without
project_idsreturn all assets (no filtering needed)- Organization-level calls with
project_idsrequire client-side filtering (implemented here)The comment clearly explains the rationale.
969-998: LGTM: Project ID parsing is robust.The utility functions correctly handle multiple delimiter formats (commas, newlines, semicolons), trim whitespace, deduplicate entries, and handle empty inputs gracefully. This provides a good user experience for configuration.
1000-1183: LGTM: Project scope implementation is comprehensive and defensive.The
projectScopeimplementation correctly:
- Deduplicates and sanitizes project IDs during construction
- Enriches IDs with numeric project numbers via API calls
- Filters assets by both project ID and project number for robustness
- Uses multiple fallback extraction strategies to handle various asset formats
- Handles nil cases and edge conditions safely
The code is well-structured and follows defensive programming practices.
What changed
The GCP provider now respects the
project_idsfield in the configuration. ( provider-config.yaml ) as given in this documentation link for Cloudlist Instead of enumerating all accessible GCP projects, discovery is limited strictly to the projects explicitly listed by the user.