[1 of 2] ENG-3157: Platform identity resolution — OSS type changes for PBAC#7807
[1 of 2] ENG-3157: Platform identity resolution — OSS type changes for PBAC#7807
Conversation
…jectable resolver - Replace QueryIdentity dataclass with plain str for RawQueryLogEntry.identity - Rename TableRef fields: project→catalog, dataset→schema (standard SQL terminology) - Make InProcessPBACEvaluationService accept injectable identity_resolver parameter - Update IdentityResolver Protocol signature to accept str - Update BasicIdentityResolver, RedisIdentityResolver, DatasetResolver for new field names - Add connection_config_key to fides OSS DataConsumerEntity (shared Redis storage) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
The latest updates on your projects. Learn more about Vercel for GitHub. 2 Skipped Deployments
|
Codecov Report❌ Patch coverage is ❌ Your patch check has failed because the patch coverage (0.00%) is below the target coverage (100.00%). You can increase the patch coverage or adjust the target coverage. Additional details and impacted files@@ Coverage Diff @@
## main #7807 +/- ##
==========================================
- Coverage 85.07% 83.04% -2.04%
==========================================
Files 627 627
Lines 40780 40763 -17
Branches 4742 4736 -6
==========================================
- Hits 34694 33851 -843
- Misses 5017 5823 +806
- Partials 1069 1089 +20 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
- Add EvaluationGap type to types.py (gap_type, identifier, dataset_key, reason) - Add gaps field to EvaluationResult - Update evaluate_access to return EvaluationOutput (result + gaps) - Unresolved identity → gap (was: violation) - Unconfigured dataset → gap (was: silently passing) - Violations are now strictly purpose-mismatch issues Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Delete ResolvedConsumer — resolvers return DataConsumerEntity directly - EvaluationResult.consumer is now DataConsumerEntity | None - Add EvaluationResult.identity field for the unresolved case - Add GapType and ConsumerType enums (replace magic strings) - Consolidate AccessGap into EvaluationGap (remove duplicate type) - evaluate_access uses EvaluationGap with GapType enum directly - BasicIdentityResolver works with DataConsumerEntity - Remove _build_resolved_consumer from service (no conversion needed) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Move DatasetResolver from identity/resolver.py to dataset/resolver.py - Make DatasetResolver injectable via InProcessPBACEvaluationService constructor - Remove build_identity_resolver factory (unused) - Clean up identity/resolver.py to only contain RedisIdentityResolver Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Covers evaluation flow, three outcomes (compliant/violation/gap), extension points with defaults, type inventory, identity resolution, and package structure. No fidesplus concepts or file listings. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
87f0cd0 to
6ffe02f
Compare
…nsumerEntity Consumers are now identified by `type` + `scope` (a dict of platform-specific identifiers) instead of `external_id` + `connection_config_key`. This enables cross-platform identity resolution where the same consumer (e.g., a Google Group) works across multiple data platform connections without duplication. The scope dict always includes the full namespace chain (e.g., domain + group_email for Google Groups, domain + project_id + role for GCP IAM roles). Each scope key is individually indexed in Redis for efficient filtering. Both BasicIdentityResolver and RedisIdentityResolver updated to match on scope email instead of external_id. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Code Review — ENG-3157: Platform identity resolution (OSS type changes)
The overall direction is solid: collapsing user_email/principal_subject into a plain identity: str, introducing the scope dict for platform-specific identifiers, and the gaps/violations split are all good design moves that will make the fidesplus extension points cleaner. The TableRef rename to standard SQL catalog terminology (catalog/schema) is also an improvement.
That said, there are a few issues worth addressing before merge:
Issues requiring attention
1. is_compliant=True for unresolved identities (evaluate.py:69)
The most significant behavioral change: an unresolved consumer now returns is_compliant=True with gaps, rather than is_compliant=False with violations. Downstream code checking is_compliant to gate access or fire alerts will now silently pass unresolved users. This needs either a clear contract change (documented and agreed upon) or is_compliant=False should be returned when gaps exist.
2. DatasetResolver silent fallback to table_ref.schema (dataset/resolver.py:35)
The resolver always returns a non-None value, so the if fides_key: guard in service.py is always truthy. Every table gets "resolved" — via a guess — with no logging. This masks configuration errors and makes the ds_purposes is None branch in _check_access unreachable (see also inline comment on service.py:174).
3. Scope index key collision (consumers/repository.py:45)
Using f"{key}={value}" as a Redis index value is ambiguous when values contain = (AWS ARNs, URLs, encoded tokens). This will silently break lookups for affected consumers.
4. Silent loss of external_id (consumers/entities.py:80)
from_consumer() sets scope={}, discarding any existing external_id value on the ORM model. Consumers previously identified by external_id will silently become unresolvable. If the ORM column still exists, bridge it into scope as part of this change.
5. BasicIdentityResolver only matches scope["email"] (identity/basic.py:45)
The scope dict is designed to hold arbitrary platform identifiers, but both the BasicIdentityResolver and RedisIdentityResolver only ever look up scope["email"]. Consumers with any other scope key (group name, IAM role, service account, etc.) cannot be resolved via OSS code, which contradicts the extension-point design in the README.
Minor issues
ConsumerTypeenum unused (types.py:142-153): defined with platform-specific values that are never set on any entity. Either enforce it onDataConsumerEntity.typeor remove it until it is used.- Two distinct dataset gap cases share one
GapType(evaluate.py:128and141): "not registered" vs "registered but no purposes" are currently distinguishable only by parsing thereasonstring, which is fragile for callers. - Dead code in
_build_dataset_purposes(service.py:174): always produces aDatasetPurposes(with empty keys) for every resolved dataset, making theds_purposes is Nonebranch in_check_accessunreachable from the in-process service path.
| return EvaluationOutput( | ||
| result=ValidationResult( | ||
| violations=[], | ||
| is_compliant=True, |
There was a problem hiding this comment.
Compliance semantics change: unresolved consumer now returns is_compliant=True
When a consumer has no declared purposes (i.e., an unresolved identity), this path returns is_compliant=True with a list of gaps. Previously this resulted in violations.
This is a meaningful behavioral shift: any downstream code that checks result.is_compliant to gate access or trigger alerts will now treat unresolved identities as passing, not failing. If gaps are not separately monitored and alarmed on, this could silently mask unauthorized access by unknown users.
Consider either:
- Returning
is_compliant=Falsewhen there are identity gaps, or - Adding a
has_gapsfield toEvaluationResultand documenting clearly thatis_compliant=Truedoes not mean "access is safe" when gaps exist.
|
|
||
| # Dataset not registered in Fides — no purpose metadata available | ||
| # Dataset not registered or has no purposes — record as gap | ||
| if ds_purposes is None: |
There was a problem hiding this comment.
Two distinct cases mapped to the same UNCONFIGURED_DATASET gap type
Lines 128–136 (dataset not in registry) and lines 141–150 (dataset registered but no purposes) both emit GapType.UNCONFIGURED_DATASET. The gap consumer must diff on reason string to distinguish them, which is brittle.
Consider splitting into UNREGISTERED_DATASET vs UNCONFIGURED_DATASET (or adding a dedicated GapType value), so callers can handle each case programmatically without parsing the reason string.
| if table_ref.schema in self._mappings: | ||
| return self._mappings[table_ref.schema] | ||
|
|
||
| return table_ref.schema |
There was a problem hiding this comment.
Silent fallback to table_ref.schema will never return None
The return type is str | None, but this function always returns a non-None value: when no explicit mapping is found it silently falls back to the raw schema name as a fides key. This means:
- Callers in
service.py(if fides_key:) will always treat every table as resolved, never producing anUNCONFIGURED_DATASETgap from the resolver stage. - There is no indication when a guess is being used vs. an intentional mapping.
Consider returning None when no mapping is found and letting the caller decide whether to fall back to table_ref.schema. At minimum, add a logger.debug or logger.warning here to make the implicit fallback visible.
| entries.append(("external_id", entity.external_id)) | ||
| # Index each scope key individually for filtering | ||
| for key, value in sorted(entity.scope.items()): | ||
| entries.append(("scope", f"{key}={value}")) |
There was a problem hiding this comment.
Index key collision when scope values contain =
The index entry is formatted as f"{key}={value}", so a scope dict like {"role": "arn:aws:iam::123456789:role/admin=superuser"} would produce the index value "role=arn:aws:iam::123456789:role/admin=superuser". A lookup for scope["role"] == "arn:aws:iam::..." needs to parse this correctly, and anything splitting on the first = would work, but splitting on any = would not.
The RedisIdentityResolver currently does get_by_index("scope", f"email={identity}") which is a prefix-aware lookup — but this pattern will silently break for scope values that happen to contain = characters (e.g. URLs, ARNs, JWT subjects).
Consider using a delimiter that cannot appear in realistic values (e.g., \x00 or |), or URL-encoding the value portion before storing.
| description=obj.description, | ||
| type=obj.type, | ||
| external_id=obj.external_id, | ||
| scope={}, |
There was a problem hiding this comment.
external_id is silently dropped when building from a DataConsumer ORM object
from_consumer() sets scope={} unconditionally. If the underlying DataConsumer model still has an external_id column, its value is discarded here with no migration or fallback. Any existing consumers identified by external_id (e.g., group names, role IDs) will silently lose their identity mapping after this change, causing future queries from those consumers to produce UNRESOLVED_IDENTITY gaps instead of resolving correctly.
If DataConsumer.external_id still exists on the ORM model, consider seeding scope={"external_id": obj.external_id} (or a more domain-appropriate key) as a migration bridge, at least until the companion database migration removes the column.
src/fides/service/pbac/types.py
Outdated
| SNOWFLAKE_DATABASE_ROLE = "snowflake_database_role" | ||
| SNOWFLAKE_SERVICE_USER = "snowflake_service_user" | ||
| SYSTEM = "system" | ||
| UNRESOLVED = "unresolved" |
There was a problem hiding this comment.
ConsumerType enum is defined but never used by any entity construction code
from_consumer() sets type=obj.type (a raw string from the ORM), and from_system() sets type="system" (a string literal). Neither uses ConsumerType.GROUP, ConsumerType.IAM_ROLE, etc. As a result:
- The enum values are never validated against
ConsumerType.GOOGLE_GROUP,ConsumerType.IAM_ROLE,ConsumerType.SNOWFLAKE_ROLE, etc. are defined but unreachable from existing construction paths- Type checkers won't catch
type="iam_role"vsConsumerType.IAM_ROLEmismatches
Either make DataConsumerEntity.type a ConsumerType field (with appropriate coercion in from_consumer/from_dict), or remove the enum until it is actually enforced. An unused enum in a shared types module creates misleading signals about what values are valid.
| self._by_email[c.contact_email] = c | ||
| scope_email = c.scope.get("email") | ||
| if scope_email: | ||
| self._by_scope_email[scope_email] = c |
There was a problem hiding this comment.
BasicIdentityResolver only indexes scope["email"] — all other scope keys are ignored
The new scope dict is intended to be a generic map of platform identifiers (group_email, role, project_id, etc.), but the index built here only extracts scope.get("email"). A consumer with scope={"group_email": "analytics@company.com", "domain": "company.com"} will never be resolved by this resolver unless one of the keys happens to be "email".
The same limitation applies to RedisIdentityResolver, which queries get_by_index("scope", f"email={identity}") — it will only match consumers whose scope contains the key "email".
The README's extension points table promises that IdentityResolver implementations resolve any platform identity. The current OSS implementation does not deliver on this for non-email scope keys. At minimum, document the constraint; ideally, iterate over all scope values (or allow configuring which scope key to match on).
… no purposes Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Consumer types are handled dynamically by fidesplus's ConsumerTypeDescriptor provider pattern, making this enum dead code. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Ticket ENG-3157
Description Of Changes
OSS-side changes to support platform identity resolution in Fidesplus PBAC. These changes make the type system cross-platform ready and allow fidesplus to inject platform-specific identity resolvers.
Breaking changes (PBAC is in active development, no backward compat needed):
RawQueryLogEntry.identityis now a plainstr(wasuser_email: str+principal_subject: str | None)TableReffields renamed:project→catalog,dataset→schema(standard SQL catalog terminology)IdentityResolverProtocol signature changed from(user_email, principal_subject)to(identity: str)Code Changes
types.py— RemoveQueryIdentitydataclass, changeRawQueryLogEntry.identitytostr, renameTableReffieldsidentity/interface.py— UpdateIdentityResolverProtocol to acceptstridentity/basic.py— SimplifyBasicIdentityResolver.resolve()to work with plain stringidentity/resolver.py— SimplifyRedisIdentityResolver.resolve(), updateDatasetResolverforschemafieldservice.py— Add optionalidentity_resolverparam toInProcessPBACEvaluationService.__init__sql_parser.py— Pass identity as plain stringconsumers/entities.py— Addconnection_config_keyfield (shared Redis storage with fidesplus)Steps to Confirm
docker exec fidesplus-slim bash -c "pytest --no-cov tests/ops/service/pbac/ -v"— should be 139 passingpython demo/pbac_demo.py— should complete with 3 violations detectedBasicIdentityResolverstill resolves by email and external_idPre-Merge Checklist
CHANGELOG.mdupdated🤖 Generated with Claude Code