[1 of 2] ENG-3157: Platform identity resolution — OSS type changes for PBAC#7807

Open
galvana wants to merge 13 commits into main from platform-identity-resolution

@galvana galvana commented Apr 1, 2026

Ticket ENG-3157

Description Of Changes

OSS-side changes to support platform identity resolution in Fidesplus PBAC. These changes make the type system cross-platform ready and allow fidesplus to inject platform-specific identity resolvers.

Breaking changes (PBAC is in active development, no backward compat needed):

  • RawQueryLogEntry.identity is now a plain str (was user_email: str + principal_subject: str | None)
  • TableRef fields renamed: project → catalog, dataset → schema (standard SQL catalog terminology)
  • IdentityResolver Protocol signature changed from (user_email, principal_subject) to (identity: str)
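
For orientation, the new single-argument resolver contract might look roughly like the sketch below. ConsumerSketch, StaticResolver and lookup are hypothetical names used only for illustration, and the return type is an assumption; only the `(identity: str)` signature comes from this description.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Protocol


@dataclass
class ConsumerSketch:
    """Hypothetical stand-in for the consumer entity; fields are assumed."""
    name: str
    scope: Dict[str, str] = field(default_factory=dict)


class IdentityResolver(Protocol):
    """New-style contract: one opaque identity string per query."""

    def resolve(self, identity: str) -> Optional[ConsumerSketch]: ...


class StaticResolver:
    """Toy resolver that satisfies the protocol structurally (no inheritance)."""

    def __init__(self, consumers: List[ConsumerSketch]) -> None:
        # Index only consumers that carry an "email" scope entry.
        self._by_email = {
            c.scope["email"]: c for c in consumers if "email" in c.scope
        }

    def resolve(self, identity: str) -> Optional[ConsumerSketch]:
        return self._by_email.get(identity)


def lookup(resolver: IdentityResolver, identity: str) -> Optional[ConsumerSketch]:
    # Callers depend only on the protocol, so any platform resolver can be injected.
    return resolver.resolve(identity)
```

Because the protocol is structural, a platform-specific resolver only needs a matching `resolve` method to be accepted wherever an IdentityResolver is expected.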

Companion PR: Requires fidesplus#3338

Code Changes

  • types.py — Remove QueryIdentity dataclass, change RawQueryLogEntry.identity to str, rename TableRef fields
  • identity/interface.py — Update IdentityResolver Protocol to accept str
  • identity/basic.py — Simplify BasicIdentityResolver.resolve() to work with plain string
  • identity/resolver.py — Simplify RedisIdentityResolver.resolve(), update DatasetResolver for schema field
  • service.py — Add optional identity_resolver param to InProcessPBACEvaluationService.__init__
  • sql_parser.py — Pass identity as plain string
  • consumers/entities.py — Add connection_config_key field (shared Redis storage with fidesplus)

Steps to Confirm

  1. Run all PBAC tests in fidesplus container: docker exec fidesplus-slim bash -c "pytest --no-cov tests/ops/service/pbac/ -v" — should be 139 passing
  2. Run the PBAC demo: python demo/pbac_demo.py — should complete with 3 violations detected
  3. Verify BasicIdentityResolver still resolves by email and external_id

Pre-Merge Checklist

  • Issue requirements met
  • All CI pipelines succeeded
  • CHANGELOG.md updated
    • Updates unreleased work already in Changelog, no new entry necessary
  • UX feedback:
    • No UX review needed
  • Followup issues:
    • No followup issues
  • Database migrations:
    • No migrations
  • Documentation:

🤖 Generated with Claude Code

…jectable resolver

- Replace QueryIdentity dataclass with plain str for RawQueryLogEntry.identity
- Rename TableRef fields: project→catalog, dataset→schema (standard SQL terminology)
- Make InProcessPBACEvaluationService accept injectable identity_resolver parameter
- Update IdentityResolver Protocol signature to accept str
- Update BasicIdentityResolver, RedisIdentityResolver, DatasetResolver for new field names
- Add connection_config_key to fides OSS DataConsumerEntity (shared Redis storage)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@galvana galvana changed the title from "Platform identity resolution: OSS type changes for PBAC" to "ENG-3157: Platform identity resolution — OSS type changes for PBAC" Apr 1, 2026

codecov bot commented Apr 1, 2026

Codecov Report

❌ Patch coverage is 0% with 87 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.04%. Comparing base (7192e68) to head (64cb289).
⚠️ Report is 8 commits behind head on main.

Files with missing lines Patch % Lines
src/fides/service/pbac/evaluate.py 0.00% 19 Missing ⚠️
src/fides/service/pbac/types.py 0.00% 18 Missing ⚠️
src/fides/service/pbac/dataset/resolver.py 0.00% 16 Missing ⚠️
src/fides/service/pbac/identity/basic.py 0.00% 13 Missing ⚠️
src/fides/service/pbac/service.py 0.00% 11 Missing ⚠️
src/fides/service/pbac/identity/resolver.py 0.00% 3 Missing ⚠️
src/fides/service/pbac/consumers/entities.py 0.00% 2 Missing ⚠️
src/fides/service/pbac/consumers/repository.py 0.00% 2 Missing ⚠️
src/fides/service/pbac/dataset/__init__.py 0.00% 2 Missing ⚠️
src/fides/service/pbac/identity/interface.py 0.00% 1 Missing ⚠️

❌ Your patch check has failed because the patch coverage (0.00%) is below the target coverage (100.00%). You can increase the patch coverage or adjust the target coverage.
❌ Your project check has failed because the head coverage (83.04%) is below the target coverage (85.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7807      +/-   ##
==========================================
- Coverage   85.07%   83.04%   -2.04%     
==========================================
  Files         627      627              
  Lines       40780    40763      -17     
  Branches     4742     4736       -6     
==========================================
- Hits        34694    33851     -843     
- Misses       5017     5823     +806     
- Partials     1069     1089      +20     

Adrian Galvan and others added 4 commits April 1, 2026 12:51
- Add EvaluationGap type to types.py (gap_type, identifier, dataset_key, reason)
- Add gaps field to EvaluationResult
- Update evaluate_access to return EvaluationOutput (result + gaps)
- Unresolved identity → gap (was: violation)
- Unconfigured dataset → gap (was: silently passing)
- Violations are now strictly purpose-mismatch issues

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Delete ResolvedConsumer — resolvers return DataConsumerEntity directly
- EvaluationResult.consumer is now DataConsumerEntity | None
- Add EvaluationResult.identity field for the unresolved case
- Add GapType and ConsumerType enums (replace magic strings)
- Consolidate AccessGap into EvaluationGap (remove duplicate type)
- evaluate_access uses EvaluationGap with GapType enum directly
- BasicIdentityResolver works with DataConsumerEntity
- Remove _build_resolved_consumer from service (no conversion needed)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Move DatasetResolver from identity/resolver.py to dataset/resolver.py
- Make DatasetResolver injectable via InProcessPBACEvaluationService constructor
- Remove build_identity_resolver factory (unused)
- Clean up identity/resolver.py to only contain RedisIdentityResolver

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Covers evaluation flow, three outcomes (compliant/violation/gap),
extension points with defaults, type inventory, identity resolution,
and package structure. No fidesplus concepts or file listings.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@galvana galvana force-pushed the platform-identity-resolution branch from 87f0cd0 to 6ffe02f Compare April 2, 2026 04:56
Adrian Galvan and others added 2 commits April 1, 2026 22:00
…nsumerEntity

Consumers are now identified by `type` + `scope` (a dict of platform-specific
identifiers) instead of `external_id` + `connection_config_key`. This enables
cross-platform identity resolution where the same consumer (e.g., a Google Group)
works across multiple data platform connections without duplication.

The scope dict always includes the full namespace chain (e.g., domain + group_email
for Google Groups, domain + project_id + role for GCP IAM roles). Each scope key
is individually indexed in Redis for efficient filtering.

Both BasicIdentityResolver and RedisIdentityResolver updated to match on
scope email instead of external_id.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@galvana galvana changed the title from "ENG-3157: Platform identity resolution — OSS type changes for PBAC" to "[1 of 2] ENG-3157: Platform identity resolution — OSS type changes for PBAC" Apr 6, 2026
@galvana galvana requested a review from thabofletcher April 9, 2026 18:40
@galvana galvana marked this pull request as ready for review April 9, 2026 18:40
@galvana galvana requested a review from a team as a code owner April 9, 2026 18:40
@claude claude bot left a comment

Code Review — ENG-3157: Platform identity resolution (OSS type changes)

The overall direction is solid: collapsing user_email/principal_subject into a plain identity: str, introducing the scope dict for platform-specific identifiers, and the gaps/violations split are all good design moves that will make the fidesplus extension points cleaner. The TableRef rename to standard SQL catalog terminology (catalog/schema) is also an improvement.

That said, there are a few issues worth addressing before merge:


Issues requiring attention

1. is_compliant=True for unresolved identities (evaluate.py:69)
The most significant behavioral change: an unresolved consumer now returns is_compliant=True with gaps, rather than is_compliant=False with violations. Downstream code checking is_compliant to gate access or fire alerts will now silently pass unresolved users. This needs either a clear contract change (documented and agreed upon) or is_compliant=False should be returned when gaps exist.

2. DatasetResolver silent fallback to table_ref.schema (dataset/resolver.py:35)
The resolver always returns a non-None value, so the if fides_key: guard in service.py is always truthy. Every table gets "resolved" — via a guess — with no logging. This masks configuration errors and makes the ds_purposes is None branch in _check_access unreachable (see also inline comment on service.py:174).

3. Scope index key collision (consumers/repository.py:45)
Using f"{key}={value}" as a Redis index value is ambiguous when values contain = (AWS ARNs, URLs, encoded tokens). This will silently break lookups for affected consumers.

4. Silent loss of external_id (consumers/entities.py:80)
from_consumer() sets scope={}, discarding any existing external_id value on the ORM model. Consumers previously identified by external_id will silently become unresolvable. If the ORM column still exists, bridge it into scope as part of this change.

5. BasicIdentityResolver only matches scope["email"] (identity/basic.py:45)
The scope dict is designed to hold arbitrary platform identifiers, but both the BasicIdentityResolver and RedisIdentityResolver only ever look up scope["email"]. Consumers with any other scope key (group name, IAM role, service account, etc.) cannot be resolved via OSS code, which contradicts the extension-point design in the README.


Minor issues

  • ConsumerType enum unused (types.py:142-153): defined with platform-specific values that are never set on any entity. Either enforce it on DataConsumerEntity.type or remove it until it is used.
  • Two distinct dataset gap cases share one GapType (evaluate.py:128 and 141): "not registered" vs "registered but no purposes" are currently distinguishable only by parsing the reason string, which is fragile for callers.
  • Dead code in _build_dataset_purposes (service.py:174): always produces a DatasetPurposes (with empty keys) for every resolved dataset, making the ds_purposes is None branch in _check_access unreachable from the in-process service path.

return EvaluationOutput(
    result=ValidationResult(
        violations=[],
        is_compliant=True,

Compliance semantics change: unresolved consumer now returns is_compliant=True

When a consumer has no declared purposes (i.e., an unresolved identity), this path returns is_compliant=True with a list of gaps. Previously this resulted in violations.

This is a meaningful behavioral shift: any downstream code that checks result.is_compliant to gate access or trigger alerts will now treat unresolved identities as passing, not failing. If gaps are not separately monitored and alarmed on, this could silently mask unauthorized access by unknown users.

Consider either:

  1. Returning is_compliant=False when there are identity gaps, or
  2. Adding a has_gaps field to EvaluationResult and documenting clearly that is_compliant=True does not mean "access is safe" when gaps exist.
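
Option 2 could look roughly like the sketch below. ResultSketch and GapSketch are hypothetical stand-ins rather than the actual EvaluationResult and EvaluationGap definitions, and should_alert is an illustrative helper for downstream callers, not existing code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GapSketch:
    """Hypothetical stand-in for EvaluationGap."""
    gap_type: str
    identifier: str
    reason: str = ""


@dataclass
class ResultSketch:
    """Hypothetical stand-in for EvaluationResult."""
    is_compliant: bool
    violations: List[str] = field(default_factory=list)
    gaps: List[GapSketch] = field(default_factory=list)

    @property
    def has_gaps(self) -> bool:
        # True when the evaluation could not be completed for some input.
        return bool(self.gaps)


def should_alert(result: ResultSketch) -> bool:
    """Fail closed: alert on violations or on anything left unevaluated."""
    return (not result.is_compliant) or result.has_gaps
```

With this shape, a caller that gates on `should_alert` keeps flagging unresolved identities even though `is_compliant` stays True for them.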


# Dataset not registered in Fides — no purpose metadata available
# Dataset not registered or has no purposes — record as gap
if ds_purposes is None:

Two distinct cases mapped to the same UNCONFIGURED_DATASET gap type

Lines 128–136 (dataset not in registry) and lines 141–150 (dataset registered but no purposes) both emit GapType.UNCONFIGURED_DATASET. Anything consuming the gaps must inspect the reason string to tell the two cases apart, which is brittle.

Consider splitting into UNREGISTERED_DATASET vs UNCONFIGURED_DATASET (or adding a dedicated GapType value), so callers can handle each case programmatically without parsing the reason string.
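
A minimal sketch of the split, assuming string-valued enum members. The member values and the classify_dataset_gap helper are hypothetical and only illustrate how a caller could branch without touching the reason string.

```python
from enum import Enum
from typing import FrozenSet, Optional


class GapType(str, Enum):
    # Values are assumed for illustration; only the member names matter here.
    UNRESOLVED_IDENTITY = "unresolved_identity"
    UNREGISTERED_DATASET = "unregistered_dataset"  # dataset not in the registry
    UNCONFIGURED_DATASET = "unconfigured_dataset"  # registered, no purposes


def classify_dataset_gap(
    registered: bool, purpose_keys: FrozenSet[str]
) -> Optional[GapType]:
    """Return the gap type for a dataset, or None when it can be evaluated."""
    if not registered:
        return GapType.UNREGISTERED_DATASET
    if not purpose_keys:
        return GapType.UNCONFIGURED_DATASET
    return None
```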

if table_ref.schema in self._mappings:
    return self._mappings[table_ref.schema]

return table_ref.schema

Silent fallback to table_ref.schema means the resolver never returns None

The return type is str | None, but this function always returns a non-None value: when no explicit mapping is found it silently falls back to the raw schema name as a fides key. This means:

  1. Callers in service.py (if fides_key:) will always treat every table as resolved, never producing an UNCONFIGURED_DATASET gap from the resolver stage.
  2. There is no indication when a guess is being used vs. an intentional mapping.

Consider returning None when no mapping is found and letting the caller decide whether to fall back to table_ref.schema. At minimum, add a logger.debug or logger.warning here to make the implicit fallback visible.
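
One possible shape for the stricter behavior is sketched below. TableRefSketch and StrictDatasetResolver are hypothetical names, and the `table` field is an assumption; only `catalog`, `schema` and the `_mappings` lookup appear in this PR.

```python
import logging
from dataclasses import dataclass
from typing import Dict, Optional

logger = logging.getLogger(__name__)


@dataclass(frozen=True)
class TableRefSketch:
    """Hypothetical stand-in for TableRef; the `table` field is assumed."""
    catalog: str
    schema: str
    table: str


class StrictDatasetResolver:
    """Returns None for unmapped schemas instead of guessing a fides key."""

    def __init__(self, mappings: Dict[str, str]) -> None:
        self._mappings = mappings

    def resolve(self, table_ref: TableRefSketch) -> Optional[str]:
        fides_key = self._mappings.get(table_ref.schema)
        if fides_key is None:
            # Make the miss visible; the caller decides whether to fall back.
            logger.warning("No dataset mapping for schema %r", table_ref.schema)
        return fides_key
```

A caller that still wants the old behavior can write `resolver.resolve(ref) or ref.schema` explicitly, which keeps the guess visible at the call site.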

entries.append(("external_id", entity.external_id))
# Index each scope key individually for filtering
for key, value in sorted(entity.scope.items()):
    entries.append(("scope", f"{key}={value}"))

Index key collision when scope values contain =

The index entry is formatted as f"{key}={value}", so a scope dict like {"role": "arn:aws:iam::123456789:role/admin=superuser"} produces the index value "role=arn:aws:iam::123456789:role/admin=superuser". Any code that recovers the key and value from this entry must split on the first = only; splitting on every = would truncate the value.

The RedisIdentityResolver currently does get_by_index("scope", f"email={identity}") which is a prefix-aware lookup — but this pattern will silently break for scope values that happen to contain = characters (e.g. URLs, ARNs, JWT subjects).

Consider using a delimiter that cannot appear in realistic values (e.g., \x00 or |), or URL-encoding the value portion before storing.
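
The URL-encoding variant could be sketched as follows. encode_scope_entry and decode_scope_entry are hypothetical helpers, not functions in the repository; they rely only on the standard library's urllib.parse.quote and unquote.

```python
from typing import Tuple
from urllib.parse import quote, unquote


def encode_scope_entry(key: str, value: str) -> str:
    """Build an index entry whose only literal '=' is the separator."""
    # safe="" forces '=', '/', ':' and friends to be percent-encoded.
    return f"{quote(key, safe='')}={quote(value, safe='')}"


def decode_scope_entry(entry: str) -> Tuple[str, str]:
    """Invert encode_scope_entry."""
    key, _, value = entry.partition("=")
    return unquote(key), unquote(value)
```

The write path and the lookup path would both need to go through the same encoder, for example by passing `encode_scope_entry("email", identity)` to the index lookup instead of formatting the string inline.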

description=obj.description,
type=obj.type,
external_id=obj.external_id,
scope={},

external_id is silently dropped when building from a DataConsumer ORM object

from_consumer() sets scope={} unconditionally. If the underlying DataConsumer model still has an external_id column, its value is discarded here with no migration or fallback. Any existing consumers identified by external_id (e.g., group names, role IDs) will silently lose their identity mapping after this change, causing future queries from those consumers to produce UNRESOLVED_IDENTITY gaps instead of resolving correctly.

If DataConsumer.external_id still exists on the ORM model, consider seeding scope={"external_id": obj.external_id} (or a more domain-appropriate key) as a migration bridge, at least until the companion database migration removes the column.
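
A bridge along these lines might be enough, assuming the ORM object still exposes external_id. OrmConsumerSketch and bridge_scope are hypothetical, and the "external_id" scope key is a placeholder for whatever key fits the domain.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class OrmConsumerSketch:
    """Hypothetical stand-in for the DataConsumer ORM model."""
    name: str
    type: str
    external_id: Optional[str] = None


def bridge_scope(obj: OrmConsumerSketch) -> Dict[str, str]:
    """Carry a legacy external_id into scope instead of dropping it."""
    if obj.external_id:
        return {"external_id": obj.external_id}
    return {}
```

from_consumer() could then pass `scope=bridge_scope(obj)` in place of `scope={}` until the column is removed.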

SNOWFLAKE_DATABASE_ROLE = "snowflake_database_role"
SNOWFLAKE_SERVICE_USER = "snowflake_service_user"
SYSTEM = "system"
UNRESOLVED = "unresolved"

ConsumerType enum is defined but never used by any entity construction code

from_consumer() sets type=obj.type (a raw string from the ORM), and from_system() sets type="system" (a string literal). Neither uses ConsumerType.GROUP, ConsumerType.IAM_ROLE, etc. As a result:

  • The enum values are never validated against
  • ConsumerType.GOOGLE_GROUP, ConsumerType.IAM_ROLE, ConsumerType.SNOWFLAKE_ROLE, etc. are defined but unreachable from existing construction paths
  • Type checkers won't catch type="iam_role" vs ConsumerType.IAM_ROLE mismatches

Either make DataConsumerEntity.type a ConsumerType field (with appropriate coercion in from_consumer/from_dict), or remove the enum until it is actually enforced. An unused enum in a shared types module creates misleading signals about what values are valid.

self._by_email[c.contact_email] = c
scope_email = c.scope.get("email")
if scope_email:
    self._by_scope_email[scope_email] = c

BasicIdentityResolver only indexes scope["email"] — all other scope keys are ignored

The new scope dict is intended to be a generic map of platform identifiers (group_email, role, project_id, etc.), but the index built here only extracts scope.get("email"). A consumer with scope={"group_email": "analytics@company.com", "domain": "company.com"} will never be resolved by this resolver unless one of the keys happens to be "email".

The same limitation applies to RedisIdentityResolver, which queries get_by_index("scope", f"email={identity}") — it will only match consumers whose scope contains the key "email".

The README's extension points table promises that IdentityResolver implementations resolve any platform identity. The current OSS implementation does not deliver on this for non-email scope keys. At minimum, document the constraint; ideally, iterate over all scope values (or allow configuring which scope key to match on).
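
As a rough illustration of the configurable option, the resolver could take the scope keys to match on as a constructor argument. ConsumerSketch, ScopeKeyResolver and match_keys are hypothetical names; the default of ("email",) mirrors the current behavior.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional, Sequence


@dataclass
class ConsumerSketch:
    """Hypothetical stand-in for DataConsumerEntity."""
    name: str
    scope: Dict[str, str] = field(default_factory=dict)


class ScopeKeyResolver:
    """Resolves an identity string against a configurable set of scope keys."""

    def __init__(
        self,
        consumers: Iterable[ConsumerSketch],
        match_keys: Sequence[str] = ("email",),
    ) -> None:
        self._index: Dict[str, ConsumerSketch] = {}
        for consumer in consumers:
            for key in match_keys:
                value = consumer.scope.get(key)
                if value:
                    # First registration wins if two consumers share a value.
                    self._index.setdefault(value, consumer)

    def resolve(self, identity: str) -> Optional[ConsumerSketch]:
        return self._index.get(identity)
```

With `match_keys=("email", "group_email")`, the Google Group consumer from the example above resolves, while the default configuration keeps today's email-only matching.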

Adrian Galvan and others added 4 commits April 9, 2026 12:25
… no purposes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Consumer types are handled dynamically by fidesplus's
ConsumerTypeDescriptor provider pattern, making this enum dead code.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>