Add ACL ownership augmentation and AND-slice semantics to C++ runtime#704
Add ACL ownership augmentation and AND-slice semantics to C++ runtime#704simonhollis wants to merge 2 commits intofacebookincubator:mainfrom
Conversation
Summary: # What has changed Thrift interfaces gain new fields for ACL support across three areas: 1. Server config gets a new ACLPolicy struct and optional acl_policy field 2. Write path: Batch and SendJsonBatch structs gain acl_config maps (path → group name lists) 3. Query path: UserQueryClientInfo gains acl_group_names field for string-based group names with server-side resolution 4. New ACLConfigConflict exception for detecting conflicting ACL mappings during writes # Why the change? ACL enforcement requires passing ACL configuration data between indexers, the write server, and the query server via thrift. Indexers must attach per-path ACL group assignments to batches during writes, and clients must pass their ACL group memberships at query time so the server can filter results. Group names are resolved server-side to numeric IDs using the DB's name-to-ID mapping. # Most important changes - `server_config.thrift` - New ACLPolicy struct with enabled_repos list and acl_policy field on Config - `glean.thrift` - acl_config field on Batch struct (map<string, list<string>>) - `glean.thrift` - New ACLConfigConflict exception with conflicting_key/value diagnostics - `glean.thrift` - acl_group_names field on UserQueryClientInfo for server-side name resolution - `glean.thrift` - acl_config field on SendJsonBatch struct Differential Revision: D97111677
Summary: # What has changed Core C++ runtime changes to support ACL enforcement via the ownership infrastructure: 1. New `acl.h/acl.cpp` with `augmentOwnershipWithACL()` — walks computed ownership facts and ANDs each fact's existing owner Uset with a CNF Uset built from the unit's ACL group assignments (OR within levels, AND across levels). 2. `slice.h` visibility semantics changed from first-match-returns to AND-all: when multiple slices cover the same UsetId, ALL must agree the fact is visible. 3. `setu32.h` moves `append(const_iterator, const_iterator)` from private to public, adds `from(std::set<uint32_t>)` factory, and changes `foreach` iterator to const. 4. `serialize.h` adds StringList, StringMap variant types to support the new thrift acl_config map<string, list<string>> serialization. # Why the change? The ownership system already tracks which units own which facts via Usets. ACL enforcement piggybacks on this: each unit gets an ACL CNF constraint (AND of OR-groups), which is ANDed with the unit's existing ownership. At query time, the user's groups form a slice, and the AND-slice semantics ensure a fact is only visible if BOTH the ownership slice AND the ACL slice agree. This avoids a separate ACL-filtering pass and reuses the existing ownership visibility infrastructure. # Most important changes - `acl.cpp:56-187` - `augmentOwnershipWithACL()` implementation: builds UnitId→CNF map, resolves promoted Usets to leaf UnitIds, walks facts with memoization cache - `acl.cpp:24-47` - `makeOrSet`/`makeAndSet` helpers for CNF construction - `acl.h:23-51` - New ACLConfigEntry, UnitACLAssignment structs - `slice.h:83-97` - **CRITICAL**: Changed from first-match return to AND-all semantics. Old code returned immediately on first slice match; new code requires ALL covering slices to return visible=true. Non-overlapping slices behave identically to before. - `setu32.h:474` - `append(const_iterator, const_iterator)` moved to public interface - `serialize.h:213-214` - New StringList/StringMap types added to Field variant - `serialize.h:307-315` - String and StringMap serialization branches added Differential Revision: D97111684
✅ Deploy Preview for fb-oss-glean canceled.
|
|
@simonhollis has exported this pull request. If you are a Meta employee, you can view the originating Diff in D97111684. |
|
Putting this PR up to solicit feedback on a new ACL functionality for glean that records directories files are in when indexed and guards visibility of facts on query based on ACL group identities. The purpose of this PR is not to ask to merge code, but to expose current ideas in |
There was a problem hiding this comment.
On the slice changes
OK - but see comment
On the overall approach of augmentOwnershipWithACL
What you have looks basically sound (modulo a few comments I made inline).
If it were me, I would try to move as much of this out of the server and into the indexer though - ideally Glean should know nothing about ACLs, and it should provide generic functionality that could be used to support multiple different ACL policies. Here we're baking in a policy that is file/directory prefix-based, which seems too specific. However, this is not a hill I will die on, just a matter of trying to design something as generic as possible.
| 41: optional string db_create_storage; | ||
|
|
||
| // ACL filtering configuration. When enabled, query results are filtered | ||
| // based on user's group membership. Requires both JustKnobs |
There was a problem hiding this comment.
note that JustKnobs is not open-source
There was a problem hiding this comment.
Thanks for pointing out. The plan here is to use #ifdef to have different paths for OSS (no JustKnobs check) and internal (with JustKnobs check)
There was a problem hiding this comment.
As a general point, we try to avoid #ifdef in the codebase as far as possible because it's a maintenance headache. There are a few alternatives, e.g. make an API that has a stub implementation in the OSS build.
There was a problem hiding this comment.
Will use the facebook/ vs github/ directory pattern with facebook/Glean/Query/ACL.hs having the internal implementation and github/Glean/Query/ACL.hs being a stub
| 1: DatabaseStatus status; | ||
| } | ||
|
|
||
| // Conflict when merging ACL configurations |
There was a problem hiding this comment.
I'm pretty suspicious of this, as we discussed: the ACL config should be fixed for a given revision of the repo. Also I don't think the server has any business enforcing this.
There was a problem hiding this comment.
Yes. We can remove this conflict resolution logic.
| /// The outer vector represents levels that are ANDed together. | ||
| struct UnitACLAssignment { | ||
| UnitId unitId; | ||
| std::vector<std::vector<UsetId>> levels; // outer=AND, inner=OR |
There was a problem hiding this comment.
as above, why not construct the UsetId for the expression and put it here instead of the vector-of-vectors?
There was a problem hiding this comment.
Here, I think your suggestion makes sense. There's no inheritance needed per-file.
There was a problem hiding this comment.
Maybe doing something else is more complex. Here's an attempt: #705
| /// All groups at the same prefix level are ORed together. | ||
| struct ACLConfigEntry { | ||
| std::string prefix; | ||
| std::vector<UsetId> groupUsetIds; |
There was a problem hiding this comment.
why not construct the UsetId for A || B || ... and put it here instead of the vector? That would save repeatedly constructing it, no?
There was a problem hiding this comment.
This allows caching of the recursive construction. example:
dir: foo/ -> Groups [a,b]
dir: foo/bar/ -> Groups [c, d, e]
foo/bar/ also needs to inherit [a, b]
Both directories need assigned UsetIds. So, the Group->UsetID map is:
[a,b] -> UsetID(1)
[c, d, e] -> UsetID(2)
and dir to usetIDs:
foo/ -> (1)
foo/bar/ -> (1, 2), which we construct as UsetID('foo/') Union UsetID("'foo/bar").
Not too bad in this example, but as directories get deeper, the efficiency of creation is more significant.
| // and promoted. The old code checked (ownerUsetId < firstSetId) which | ||
| // was NEVER true, so no facts were ever augmented. We must resolve | ||
| // promoted sets to their leaf UnitIds to find ACL matches. | ||
| folly::F14FastMap<UsetId, Uset*> idToUset; |
There was a problem hiding this comment.
you only need this because ComputedOwnership has an Id to UsetId mapping and you need to turn the UsetId into a Uset*, right?
Isn't there a problem with stacked DBs here? The UsetId might be in the base DB, so we don't have a Uset* for it. But in that case we should have already added ACLs to it, so we should be OK? Need to handle this case anyway.
Building this mapping seems pretty ugly and potentially expensive. You could imagine either storing the mapping in the ComputedOwnership, or keeping it in the Usets itself (depending on how expensive that is).
There was a problem hiding this comment.
For the incremental case, we ensure that stacked UsetIds are > any used ID in the base, so they don't conflict. Then, we reverse-walk the stack to lookup USets, so we eventually cover all the mappings from the stack.
Specifically here, a UsetId could refer to a Uset from the base DB that isn't in this ComputedOwnership container. However, since the stacked DB only has stacked DB facts in it, we should be OK.
I'm thinking over the performance question.
There was a problem hiding this comment.
We can have a fact with an owner UsetId that refers to a set in the base DB, so you won't find it in the mapping, I think this ends up working out OK with the code as it is but I'm not sure - at the least I would suggest testing it and adding a comment.
There was a problem hiding this comment.
OK. Adding testcases and some logging to expose when a computed ownership UsetId belongs to a base DB's Usets container not the stacked one.
| UsetId originalOwner = ownerUsetId; | ||
|
|
||
| // Collect unique ACL CNF IDs for leaf UnitIds in this owner's set. | ||
| std::set<uint32_t> aclCnfIds; |
There was a problem hiding this comment.
why is this unit32_t? Shouldn't it be UsetId?
There was a problem hiding this comment.
They're type aliases, but agree UsetID is more descriptive; will change.
| std::set<uint32_t> aclCnfIds; | ||
|
|
||
| if (ownerUsetId < firstSetId) { | ||
| // Direct UnitId (rare after computeOwnership but handle it) |
There was a problem hiding this comment.
I would expect this to be quite common because most facts are owned by a single unit, so I'm surprised. Why do you say it's rare?
There was a problem hiding this comment.
Actually, it should be an error case. Since we should by now have created an OR set, there shouldn't be any bare UnitIds here. However, you're right that in many cases, the size of the OR set will be length=1
There was a problem hiding this comment.
if ownerUsetId < firstSetId and this is a stacked DB, then the owner could be a set in the base DB.
There was a problem hiding this comment.
Adding a test case, logging and assertion for this too.
| // Name of program making the query. | ||
| 4: optional list<string> acl_group_names; | ||
| // ACL group names for query-time filtering. | ||
| // Names are resolved to IDs server-side using the DB's name-to-ID mapping. |
There was a problem hiding this comment.
I think this line is redundant, IDs are not a concept the user of this API needs to care about.
There was a problem hiding this comment.
This is needed since remote Glean clients need a way of passing this information to queries.
There was a problem hiding this comment.
sorry for not being clear, I was referring to just the part of the comment beginning "Names are resolved..", it's redundant detail from the user's point of view
| if (usetid == INVALID_USET) { | ||
| return false; | ||
| } | ||
| // AND semantics: when multiple slices cover the same UsetId (e.g., |
There was a problem hiding this comment.
OK - but you could optimise this. If the slices are sorted, then we can stop as soon as we find an out of range slice. Might be worth it since this is in the inner loop for queries.
There was a problem hiding this comment.
Good idea. I'll have a look at that.
There was a problem hiding this comment.
I did some analysis here, and I think this is cheap enough already, since the loop is already small and tight. Maintaining the sort may outweigh the lookup improvement in the average case. Performance may degrade once we get to higher stacks, and could make this worthwhile, but we don't currently construct those. Putting this in the won't fix bucket.
I understand the idea here to abstract out of the core and treat ACLs as just more generic ownership units supplied by indexers. It could be done, but would still need server changes, as well as some implicit contracts between indexers and write server; and query server and query agent. I dislike implicit contracts from a safety point of view. More fundamentally, server changes wouldn't be eliminated entirely due to the fundamental mismatch of how we currently use all-OR semantics for ownership propagation and merging, compared with the OR-AND semantics needed for ACLs with hierarchy. The expression computation is just as important when we think about derived facts, and relying on the indexer to perform derivation propagation of ACL names just adds a place where things can go wrong. There's an additional wrinkle when dealing with both ACL'd and stacked DBs where a unit has changed ACL permissions across the base + stacked DB. I think this could be worked around, but to me just highlights the number of corner cases that are much more easily addressed by riding on the existing server ownership logic, rather than putting the code indexer-side. So, I think the right thing to do here is to keep the code in the server, but as modular as possible. If we really want a version where the core is kept clean, it could probably be conditionally compiled in, but I'd start by including it. In terms of policy question, the code being added here is in new files, which makes it modular enough that a new policy could add a new calculation module or override the default functionality. This is already a long response, but I may as well list some other minor advantages of having ACL-specific code in the server:
|
|
Hi @simonhollis - I do think it's possible to achieve a design here that embeds none of the ACL policy knowledge in the server and has no implicit contracts. It's easy to see that the current design is not quite right, because it breaks abstraction boundaries: (1) it adds meaning to units (they should be arbitrary strings from the server's point of view) and (2) if the client wants to use a different ACL policy they have to change Glean itself. Using a different ACL policy should be a matter of changing the indexer and query-time ACLs, both things that the client controls. But anyway, we do need to resolve some technical things:
Could you give an example here? I thought we had established that since the ownership system supports arbitrary expressions involving AND/OR that it could express whatever ACL logic you wanted. The clue is that ownership propagation and derivation are policy-free: they do only what is required to ensure the integrity of the DB. The ownership of a fact A that is referenced by B and C must be (as I mentioned earlier, whether to put the ACL logic in the server is not a hill I'm going to die on. But to that I will add that DB integrity is a hill I will die on!) |
Summary:
What has changed
Core C++ runtime changes to support ACL enforcement via the ownership infrastructure:
acl.h/acl.cppwithaugmentOwnershipWithACL()— walks computed ownershipfacts and ANDs each fact's existing owner Uset with a CNF Uset built from the
unit's ACL group assignments (OR within levels, AND across levels).
slice.hvisibility semantics changed from first-match-returns to AND-all:when multiple slices cover the same UsetId, ALL must agree the fact is visible.
setu32.hmovesappend(const_iterator, const_iterator)from private to public,adds
from(std::set<uint32_t>)factory, and changesforeachiterator to const.serialize.hadds StringList, StringMap variant types to support the new thriftacl_config map<string, list> serialization.
Why the change?
The ownership system already tracks which units own which facts via Usets. ACL
enforcement piggybacks on this: each unit gets an ACL CNF constraint (AND of OR-groups),
which is ANDed with the unit's existing ownership. At query time, the user's groups
form a slice, and the AND-slice semantics ensure a fact is only visible if BOTH
the ownership slice AND the ACL slice agree. This avoids a separate ACL-filtering
pass and reuses the existing ownership visibility infrastructure.
Most important changes
acl.cpp:56-187-augmentOwnershipWithACL()implementation: builds UnitId→CNF map,resolves promoted Usets to leaf UnitIds, walks facts with memoization cache
acl.cpp:24-47-makeOrSet/makeAndSethelpers for CNF constructionacl.h:23-51- New ACLConfigEntry, UnitACLAssignment structsslice.h:83-97- CRITICAL: Changed from first-match return to AND-all semantics.Old code returned immediately on first slice match; new code requires ALL covering
slices to return visible=true. Non-overlapping slices behave identically to before.
setu32.h:474-append(const_iterator, const_iterator)moved to public interfaceserialize.h:213-214- New StringList/StringMap types added to Field variantserialize.h:307-315- String and StringMap serialization branches addedDifferential Revision: D97111684