Skip to content

Conversation

@alxndrsn
Copy link
Contributor

@alxndrsn alxndrsn commented Aug 14, 2025

Ref getodk/central#1217

What has been done to verify that this works as intended?

Tests etc.

Why is this the best possible solution? Were any other approaches considered?

There are other suggestions for extending/changing this including:

How does this change affect users? Describe intentional changes to behavior and behavior that could have accidentally been affected by code changes. In other words, what are the regression risks?

  • maybe a slow migration
  • maybe cases where non-UUIDs have been used; hopefully caught in QA

Does this change require updates to the API documentation? If so, please update docs/api.yaml as part of this PR.

I don't think so.

Before submitting this PR, please make sure you have:

  • run make test and confirmed all checks still pass OR confirm CircleCI build passes
  • verified that any code from external sources are properly credited in comments or that everything is internally sourced

@brontolosone
Copy link
Contributor

What would be nice is to have DB functions species_from_uuid(input uuid) -> text and uuid_from_species(input text) -> uuid to use for DB queries and interactive DB work.

@brontolosone
Copy link
Contributor

Again I've run into a situation where I have to refer to actees and find it wasteful to have to 2x-bloat a table and 2x-bloat an index because of the stringery...
I think this PR has value!

@matthew-white
Copy link
Member

matthew-white commented Nov 2, 2025

I think this PR has value!

I agree that it's reasonable to use the uuid type for actee IDs, I support that. 👍 It's hard for me to know how much improved performance to expect. But we know there should be some, and the changes in this PR aren't too big. There's a migration and a change to queries that reference actees that are species or *.

I also wanted to link this PR to the discussion at getodk/central#1217. In particular, I still think that we could take this opportunity to do away with species-level actees entirely. Instead of reifying them, we could get rid of them. We assign actors access to specific actees (like a specific project), and we assign actors access to *, but I don't know of any cases where we assign actors access to species-level actees. Actors end up with access to species-level actees only insomuch as they are given access to *.

In other words, I think we could make do with just one special UUID (for *) instead of one for each species.

Concretely, I'm proposing that we change lines like

auth.canOrReject('user.create', User.species)

to:

auth.canOrReject('user.create', '*')

I really don't think this would be a huge code change. I'd be happy to create a draft PR along those lines if there's interest.

@matthew-white
Copy link
Member

And just to be clear, I'm not suggesting that we get rid of the concept of species entirely. For example, the actees.species column can stay put. I just think that we don't need each species to correspond to a row of the actees table.

@matthew-white
Copy link
Member

Now that @brontolosone has added this PR to the project board, I'll go ahead and assign @alxndrsn just so it's clear from a glance who's working on it.

@alxndrsn
Copy link
Contributor Author

alxndrsn commented Nov 2, 2025

I don't know of any cases where we assign actors access to species-level actees.

Are there any immediate plans to use more fine-grained assignment?

It seems reasonable that there might be different permissions for e.g. creating single-use links vs system settings.

Now the system's in place, maintaining it doesn't seem like a big cost, whereas re-adding it later might be a lot of work.

If it's costing effort to maintain, and isn't going to get used, let's get rid.

alxndrsn pushed a commit to alxndrsn/odk-central-backend that referenced this pull request Nov 2, 2025
By changing `false` to `true`, a list of database foreign keys and their corresponding reverse indexes will be printed.

Implemented while try to understand why getodk#1589 required _some_ new "acteeId" reverse-indexes, but not others.

Example output:

```sh
  database indexes

┌── Foreign Key Reverse Indexes ───────┐
┌─────────┬──────────────────────────────────────┬─────────────────────────────────────────────────────────────┬────────────────────────────────────────────────────┐
│ (index) │ FK table                             │ foreign key                                                 │ database index                                     │
├─────────┼──────────────────────────────────────┼─────────────────────────────────────────────────────────────┼────────────────────────────────────────────────────┤
│ 0       │ 'actors'                             │ 'actors_acteeid_foreign'                                    │ 'idx_fk_actors_acteeid'                            │
│ 1       │ 'assignments'                        │ 'assignments_acteeid_foreign'                               │ 'idx_fk_assignments_acteeid'                       │
│ 2       │ 'assignments'                        │ 'assignments_actorid_foreign'                               │ 'assignments_pkey'                                 │
│ 3       │ 'assignments'                        │ 'assignments_roleid_foreign'                                │ 'idx_fk_assignments_roleid'                        │
│ 4       │ 'audits'                             │ 'audits_acteeid_foreign'                                    │ 'audits_acteeid_loggedat_index'                    │
│ 5       │ 'audits'                             │ 'audits_actorid_foreign'                                    │ 'audits_actorid_loggedat_index'                    │
│ 6       │ 'client_audits'                      │ 'client_audits_blobid_foreign'                              │ 'idx_fk_client_audits_blobid'                      │
│ 7       │ 'comments'                           │ 'comments_actorid_foreign'                                  │ 'idx_fk_comments_actorid'                          │
│ 8       │ 'comments'                           │ 'comments_submissionid_foreign'                             │ 'comments_submissionid_index'                      │
│ 9       │ 'dataset_form_defs'                  │ 'dataset_form_defs_datasetid_foreign'                       │ 'dataset_form_defs_datasetid_formdefid_unique'     │
│ 10      │ 'dataset_form_defs'                  │ 'dataset_form_defs_formdefid_foreign'                       │ 'idx_fk_dataset_form_defs_formdefid'               │
│ 11      │ 'datasets'                           │ 'datasets_projectid_foreign'                                │ 'idx_fk_datasets_projectid'                        │
│ 12      │ 'ds_properties'                      │ 'ds_properties_datasetid_foreign'                           │ 'idx_fk_ds_properties_datasetid'                   │
│ 13      │ 'ds_property_fields'                 │ 'ds_property_fields_dspropertyid_foreign'                   │ 'ds_property_fields_dspropertyid_formdefid_unique' │
│ 14      │ 'entities'                           │ 'entities_createdby_foreign'                                │ 'idx_fk_entities_creatorid'                        │
│ 15      │ 'entities'                           │ 'entities_datasetid_foreign'                                │ 'entities_datasetid_createdat_id_index'            │
│ 16      │ 'entity_def_sources'                 │ 'entity_def_sources_auditid_foreign'                        │ 'idx_fk_entity_def_sources_auditid'                │
│ 17      │ 'entity_def_sources'                 │ 'entity_def_sources_submissiondefid_foreign'                │ 'idx_fk_entity_def_sources_submissiondefid'        │
│ 18      │ 'entity_defs'                        │ 'entity_defs_entityid_foreign'                              │ 'entity_defs_entityid_current_index'               │
│ 19      │ 'entity_defs'                        │ 'entity_defs_sourceid_foreign'                              │ 'entity_defs_sourceid_index'                       │
│ 20      │ 'entity_submission_backlog'          │ 'fk_audit_id'                                               │ 'idx_fk_entity_submission_backlog_auditid'         │
│ 21      │ 'entity_submission_backlog'          │ 'fk_submission_defs'                                        │ 'idx_fk_entity_submission_backlog_submissiondefid' │
│ 22      │ 'entity_submission_backlog'          │ 'fk_submissions'                                            │ 'idx_fk_entity_submission_backlog_submissionid'    │
│ 23      │ 'field_keys'                         │ 'field_keys_actorid_foreign'                                │ 'field_keys_pkey'                                  │
│ 24      │ 'field_keys'                         │ 'field_keys_createdby_foreign'                              │ 'idx_fk_field_keys_createdby'                      │
│ 25      │ 'field_keys'                         │ 'field_keys_projectid_foreign'                              │ 'idx_fk_field_keys_projectid'                      │
│ 26      │ 'form_attachments'                   │ 'form_attachments_blobid_foreign'                           │ 'idx_fk_form_attachments_blobid'                   │
│ 27      │ 'form_attachments'                   │ 'form_attachments_datasetid_foreign'                        │ 'idx_fk_form_attachments_datasetid'                │
│ 28      │ 'form_attachments'                   │ 'form_attachments_formdefid_foreign'                        │ 'form_attachments_pkey'                            │
│ 29      │ 'form_attachments'                   │ 'form_attachments_formid_foreign'                           │ 'form_attachments_formid_index'                    │
│ 30      │ 'form_defs'                          │ 'form_defs_formid_foreign'                                  │ 'form_defs_formid_publishedat_index'               │
│ 31      │ 'form_defs'                          │ 'form_defs_keyid_foreign'                                   │ 'idx_fk_form_defs_keyid'                           │
│ 32      │ 'form_defs'                          │ 'form_defs_schemaid_foreign'                                │ 'idx_fk_form_defs_schemaid'                        │
│ 33      │ 'form_defs'                          │ 'form_defs_xlsblobid_foreign'                               │ 'idx_fk_form_defs_xlsblobid'                       │
│ 34      │ 'form_field_values'                  │ 'form_field_values_formid_foreign'                          │ 'form_field_values_formid_index'                   │
│ 35      │ 'form_field_values'                  │ 'form_field_values_submissiondefid_foreign'                 │ 'form_field_values_submissiondefid_index'          │
│ 36      │ 'form_fields'                        │ 'form_fields_formid_foreign'                                │ 'form_fields_formid_path_type_index'               │
│ 37      │ 'form_fields'                        │ 'form_fields_schemaid_foreign'                              │ 'form_fields_pkey'                                 │
│ 38      │ 'forms'                              │ 'forms_acteeid_foreign'                                     │ 'idx_fk_forms_acteeid'                             │
│ 39      │ 'forms'                              │ 'forms_currentdefid_foreign'                                │ 'idx_fk_forms_currentdefid'                        │
│ 40      │ 'forms'                              │ 'forms_draftdefid_foreign'                                  │ 'idx_fk_forms_draftdefid'                          │
│ 41      │ 'forms'                              │ 'forms_projectid_foreign'                                   │ 'forms_projectid_xmlformid_deletedat_unique'       │
│ 42      │ 'projects'                           │ 'projects_keyid_foreign'                                    │ 'idx_fk_projects_keyid'                            │
│ 43      │ 'public_links'                       │ 'public_links_actorid_foreign'                              │ 'public_links_pkey'                                │
│ 44      │ 'public_links'                       │ 'public_links_createdby_foreign'                            │ 'idx_fk_public_links_createdby'                    │
│ 45      │ 'public_links'                       │ 'public_links_formid_foreign'                               │ 'idx_fk_public_links_formid'                       │
│ 46      │ 'sessions'                           │ 'sessions_actorid_foreign'                                  │ 'sessions_actorid_expires_index'                   │
│ 47      │ 'submission_attachments'             │ 'attachments_blobid_foreign'                                │ 'idx_fk_submission_attachments_blobid'             │
│ 48      │ 'submission_attachments'             │ 'submission_attachments_submissiondefid_foreign'            │ 'submission_attachments_pkey'                      │
│ 49      │ 'submission_defs'                    │ 'submission_defs_actorid_foreign'                           │ 'idx_fk_submission_defs_submitterid'               │
│ 50      │ 'submission_defs'                    │ 'submission_defs_formdefid_foreign'                         │ 'idx_fk_submission_defs_formdefid'                 │
│ 51      │ 'submission_defs'                    │ 'submission_defs_submissionid_foreign'                      │ 'submission_defs_submissionid_current_index'       │
│ 52      │ 'submission_field_extract_geo_cache' │ 'submission_field_extract_geo_cache_submission_def_id_fkey' │ 'submission_field_extract_geo_cache_pkey'          │
│ 53      │ 'submissions'                        │ 'submissions_formid_foreign'                                │ 'submissions_formid_instanceid_draft_unique'       │
│ 54      │ 'submissions'                        │ 'submissions_submitter_foreign'                             │ 'idx_fk_submissions_submitterid'                   │
│ 55      │ 'user_project_preferences'           │ 'user_project_preferences_projectId_fkey'                   │ 'idx_fk_user_project_preferences_projectid'        │
│ 56      │ 'user_project_preferences'           │ 'user_project_preferences_userId_fkey'                      │ 'user_project_preferences_primary_key'             │
│ 57      │ 'user_site_preferences'              │ 'user_site_preferences_userId_fkey'                         │ 'user_site_preferences_primary_key'                │
│ 58      │ 'users'                              │ 'users_actorid_foreign'                                     │ 'users_pkey'                                       │
└─────────┴──────────────────────────────────────┴─────────────────────────────────────────────────────────────┴────────────────────────────────────────────────────┘
```
@brontolosone
Copy link
Contributor

It's hard for me to know how much improved performance to expect.

Yes. It'd be great to be able to measure it, but the real-world benefits won't let themselves be approximated well through an easy pgbench run I think.

It's more a case of compounding positive effects, as it does cut the size of foreign keys into this table, and indices thereon, in half. Following from that there will be setups and circumstances in which this means that some pages of some table/index will now not be pushed out of memory caches when pages of these tables/indices need to be read, and vice versa. Conversely it's reasonable to expect that there's a performance cliff we fall off once hot path tables/indices don't all fit into memory anymore, especially on DB setups like bottom tier Amazon RDS which has the I/O bandwidth of a 1994 PC. It would be nice to see that dropoff point move in a benchmark, and if we get a benchmark setup going we might backtest this change, but I think we can follow an instinct of "compact dedicated fixed-width datatypes† are probably better" :-)

†) IIRC varchar(36) is not fixed-width, it's variable-width, just with a limit

@brontolosone
Copy link
Contributor

I've read @matthew-white 's point above, and the earlier considerations. There is redundancy (along multiple axes! eg, various belongs-to relationships are also expressed in the orthodox FK ways, for example, forms rows have an FK to their projects AND they have that same parentage expressed through the actees table †) in this system, that's also what I concluded when I was exploring a backwards-compatible alternative.

And just to be clear, I'm not suggesting that we get rid of the concept of species entirely. For example, the actees.species column can stay put. I just think that we don't need each species to correspond to a row of the actees table.

Can't we drop that column? :machete: :rambo:
@matthew-white you're proposing keeping around just * (or its UUID representation), would that purely be to express "admins can do anything"? Because while I understand the philosophical-ontological attraction of separating "is-an-admin?" from "can-an-admin?", the reality is that of course, an admin can perform any exposed action, and the "admin-ness" of a user is already expressed through assignment of the admin role to a user.
IOW,

auth.canOrReject('user.create', User.species) 

could conceivably just shortcut for admins, the answer is "Yes They Can". No need to plunge into the actees recursion pool. Or am I missing some detail?

†) And nothing guarantees consistency between those two. "If you have one watch you know what time it is, if you have two watches, you don't." (A saying from the era of analog spring-powered watches; it's not going to work for a generation growing up with GPS-timesynced smartwatches).

@matthew-white
Copy link
Member

Are there any immediate plans to use more fine-grained assignment?

It seems reasonable that there might be different permissions for e.g. creating single-use links vs system settings.

Different permissions are driven by (1) verbs on the role and (2) the actee.

  • Creating single-use links: actor must have the public_link.create verb on an assignment to the form actee, the project actee, the form species ("all forms"), the project species ("all projects"), or *.
  • System settings: actor must have the config.set verb on an assignment to the config species or *.

In both of these cases, I don't think the species actees are needed. We could remove the parts above about the form species, the project species, and the config species without changing any user-facing behavior. That's because there's no way to use the API to create an assignment to a species actee. The ways to create an assignment are for a form, for a project, or for everything (*), where the assignment is limited to a specific set of verbs.

We've definitely discussed creating more roles: getodk/central#736, getodk/central#737. However, I see those issues as being about changing the roles/verbs that can be used for assignment, not changing which actees are assignable. I don't expect species actees to be relevant to that work.

Similarly, for API keys (getodk/central#1335), I expect API keys to provide access to specific projects or to a select list of verbs on *, but I don't expect API keys to need to do anything with species actees.

Now the system's in place, maintaining it doesn't seem like a big cost, whereas re-adding it later might be a lot of work.

If it's costing effort to maintain, and isn't going to get used, let's get rid.

I think the main reason to remove it is code clarity and making it easier for developers on the project to understand the auth system. We rarely touch this area of the code, so I don't see it as being much cost to maintain. It just needs like a good opportunity to simplify things if we're going to be refactoring this code anyway.

I also don't think there's much risk of removing it and needing to add it back later. I can't think of a time when we've ever needed these species actees, and I don't know of any upcoming work that will require them.

I don't want to block progress on this PR though. The main goal is to make acteeIds of type UUID. So if this conversation feels like a tangent/separate, I'm happy to come back to it later. There's no urgency in removing species actees now. I could file a separate issue about that, or we could just remember the idea for later.

@matthew-white
Copy link
Member

you're proposing keeping around just * (or its UUID representation), would that purely be to express "admins can do anything"? Because while I understand the philosophical-ontological attraction of separating "is-an-admin?" from "can-an-admin?", the reality is that of course, an admin can perform any exposed action, and the "admin-ness" of a user is already expressed through assignment of the admin role to a user.

I think it's possible that in the future, there could be additional top-level roles or assignments. For example, an API key that has the submission.read verb on *, giving it read access to submissions from all projects.

I think you're probably right though that there's a way we could remove * and represent that on the roles table instead. Like a column that indicates whether a role is a top-level role or a project-level role. Right now, Frontend just hard-codes which roles are top-level (admin) and which are project-level (manager, viewer, formfill).

Under the current setup, users can technically do things via the assignments API that aren't possible in Frontend. For example, creating a top-level project viewer who can view all projects. I don't know of any such use cases (they would be unexpected), just saying that our API allows such things right now.

I'd be open to getting rid of the * actee. I think I'm more excited about removing species actees, because we just never create assignments to those actees. In contrast, we do create assignments to * and allow users to do so. If we removed species actees, this PR wouldn't have to introduce new code to maintain them.

And just to be clear, I'm not suggesting that we get rid of the concept of species entirely. For example, the actees.species column can stay put. I just think that we don't need each species to correspond to a row of the actees table.

Can't we drop that column? :machete: :rambo:

Haha machete most welcome. Maybe we can drop that column, that would be exciting. I'm not sure I see any uses of it from a quick search of the codebase. What do you think about doing that in a separate PR? I could review it. For this PR, I think we need to figure out the near-term destiny of species actees (rows), but I think we could remove the species column separately.

@alxndrsn
Copy link
Contributor Author

alxndrsn commented Nov 4, 2025

Similarly, for API keys (getodk/central#1335), I expect API keys to provide access to specific projects or to a select list of verbs on *, but I don't expect API keys to need to do anything with species actees.

Would this be an example where an API key would want species-level actee permissions: a system to sync odk-central users with a 3rd-party IAM system? E.g. if an organisation wants to manage their user accounts for ODK and other systems centrally.

@alxndrsn
Copy link
Contributor Author

Opening for review, as this PR seems to be generally positively received, but also have a number of suggestions for different directions it could be taken in next.

@alxndrsn alxndrsn marked this pull request as ready for review November 19, 2025 13:01
@alxndrsn alxndrsn moved this from 🕒 backlog to ✏️ in progress in ODK Central Nov 20, 2025
Copy link
Contributor

@brontolosone brontolosone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By and large 👍, and also 🎊 , but I think there are a few non-radical improvements to make, please see inline comments!

Comment on lines +41 to +44
if (Object.keys(specialActees).length !==
new Set(Object.values(specialActees)).size) {
throw new Error('Check specialActees values are unique');
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't feel we really need this little inline test-in-a-migration.

throw new Error('Check specialActees values are unique');
}

const tableName = t => t.padStart(16, ' ');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this for aesthetics?

`);

const down = (db) => db.raw(`
-- TODO reverse all statements from up()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice. Ktuite likes to use down-migrations.

Comment on lines +76 to +77
CREATE INDEX idx_fk_actees_species ON "actees" ("species");
CREATE INDEX idx_fk_datasets_acteeId ON "datasets" ("acteeId");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This (from 83ebf49) is bycatch, right? Bycatch as in an improvement (or bugfix) that you stumble upon while fullfilling the primary mission? Personally I'm not against bycatch especially if it's denoted as such in a commitmessage (the one of 83ebf49 suffices, but it could be made more distinct with a bycatch: prefix convention) (and then the commit should IMO ideally not be squash-merged).
I have the impression that a bycatch-positive disposition is not present across the whole team and perhaps another reviewer would ask you to put it in a separate micro-PR but hey, I'm the reviewer here so let's smuggle that contraband commit in 😆

`);

const down = (db) => db.raw(`
-- TODO reverse all statements from up()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That'd be nice. I know ktuite likes to use down-migrations.

Comment on lines +33 to +38
const uuidFor = acteeId => {
if (acteeId == null) return null;
else if (Object.prototype.hasOwnProperty.call(sqlSpecial, acteeId)) return sqlSpecial[acteeId];
else if (acteeId.length === 36) return acteeId;
else throw new Error(`Unexpected acteeId: '${acteeId}'`);
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would not store these constants as sql-templates. That's a boxing (of sorts) for a particular use case for a particular DB driver. So I'd leave the base definition assumption-free which in this case would mean storing the UUIDs as string values as there is no UUID type in the stdlib.

And then of course you can construct this sqlSpecial special purpose mapping automatically from that base assumption-free definition. Alternatively the ensql-ing it could take place inside uuidFor() though of course from a performance-aesthetics POV it's nicer to not repeat that work, which speaks for creating a mapping once on initialization. And then that sql-specific stuff should then probably go into query/actees.js, and read the table of constants from a new lib/constants/actees.js which contains just the mapping constants and no presumptions about the usage. (actees.js is imo preferable to actz.js).

describe('database migrations: intermediate form schema', function() {
// skipped: test setup relies on populateUsers(), which relies on application
// code which has changed since the test was written.
// REVIEW: this test will never work again, and should probably be deleted.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

review? is this addressed to me? 🤔

throw new Error('Check specialActees values are unique');
}

const tableName = t => t.padStart(16, ' ');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this purely for aesthetics? We don't look at the migration output that much, especially not on an ongoing basis going forward, so... leave it out, less is more? If I want to look at formatted SQL code I'll just pipe it through my SQL formatter on an ad-hoc basis.

sql`${_unjoiner.fields} FROM forms JOIN form_defs on ${versionJoinCondition(defVersion)}`;
const conditions = sql`
WHERE "acteeId"=${acteeId}
WHERE "acteeId"=${acteeId}::UUID
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the typename is actually in lowercase, it just so happens that things get converted to lowercase unless quoted, so this happens to work, but I don't like things that happen to work if there's a more to-the-point way. Witness:

odkcentral> select 'd7bb7054-1f61-4cd3-9a90-ac4f18b8a060'::"uuid"
+--------------------------------------+
| uuid                                 |
|--------------------------------------|
| d7bb7054-1f61-4cd3-9a90-ac4f18b8a060 |
+--------------------------------------+

Versus:

odkcentral> select 'd7bb7054-1f61-4cd3-9a90-ac4f18b8a060'::"UUID"
type "UUID" does not exist              

So the type is called uuid, not UUID.

alxndrsn added a commit that referenced this pull request Dec 3, 2025
Implemented while try to understand why #1589 required some new "acteeId" reverse-indexes, but not others.

When indexes are missing, the table is printed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: ✏️ in progress

Development

Successfully merging this pull request may close these issues.

3 participants