-
Notifications
You must be signed in to change notification settings - Fork 85
Use postgres UUID types for actee ids #1589
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Use postgres UUID types for actee ids #1589
Conversation
|
What would be nice is to have DB functions |
|
Again I've run into a situation where I have to refer to actees and find it wasteful to have to 2x-bloat a table and 2x-bloat an index because of the stringery... |
I agree that it's reasonable to use the I also wanted to link this PR to the discussion at getodk/central#1217. In particular, I still think that we could take this opportunity to do away with species-level actees entirely. Instead of reifying them, we could get rid of them. We assign actors access to specific actees (like a specific project), and we assign actors access to In other words, I think we could make do with just one special UUID (for Concretely, I'm proposing that we change lines like central-backend/lib/resources/users.js Line 37 in 7739129
to: auth.canOrReject('user.create', '*')I really don't think this would be a huge code change. I'd be happy to create a draft PR along those lines if there's interest. |
|
And just to be clear, I'm not suggesting that we get rid of the concept of species entirely. For example, the |
|
Now that @brontolosone has added this PR to the project board, I'll go ahead and assign @alxndrsn just so it's clear from a glance who's working on it. |
Are there any immediate plans to use more fine-grained assignment? It seems reasonable that there might be different permissions for e.g. creating single-use links vs system settings. Now the system's in place, maintaining it doesn't seem like a big cost, whereas re-adding it later might be a lot of work. If it's costing effort to maintain, and isn't going to get used, let's get rid. |
By changing `false` to `true`, a list of database foreign keys and their corresponding reverse indexes will be printed. Implemented while try to understand why getodk#1589 required _some_ new "acteeId" reverse-indexes, but not others. Example output: ```sh database indexes ┌── Foreign Key Reverse Indexes ───────┐ ┌─────────┬──────────────────────────────────────┬─────────────────────────────────────────────────────────────┬────────────────────────────────────────────────────┐ │ (index) │ FK table │ foreign key │ database index │ ├─────────┼──────────────────────────────────────┼─────────────────────────────────────────────────────────────┼────────────────────────────────────────────────────┤ │ 0 │ 'actors' │ 'actors_acteeid_foreign' │ 'idx_fk_actors_acteeid' │ │ 1 │ 'assignments' │ 'assignments_acteeid_foreign' │ 'idx_fk_assignments_acteeid' │ │ 2 │ 'assignments' │ 'assignments_actorid_foreign' │ 'assignments_pkey' │ │ 3 │ 'assignments' │ 'assignments_roleid_foreign' │ 'idx_fk_assignments_roleid' │ │ 4 │ 'audits' │ 'audits_acteeid_foreign' │ 'audits_acteeid_loggedat_index' │ │ 5 │ 'audits' │ 'audits_actorid_foreign' │ 'audits_actorid_loggedat_index' │ │ 6 │ 'client_audits' │ 'client_audits_blobid_foreign' │ 'idx_fk_client_audits_blobid' │ │ 7 │ 'comments' │ 'comments_actorid_foreign' │ 'idx_fk_comments_actorid' │ │ 8 │ 'comments' │ 'comments_submissionid_foreign' │ 'comments_submissionid_index' │ │ 9 │ 'dataset_form_defs' │ 'dataset_form_defs_datasetid_foreign' │ 'dataset_form_defs_datasetid_formdefid_unique' │ │ 10 │ 'dataset_form_defs' │ 'dataset_form_defs_formdefid_foreign' │ 'idx_fk_dataset_form_defs_formdefid' │ │ 11 │ 'datasets' │ 'datasets_projectid_foreign' │ 'idx_fk_datasets_projectid' │ │ 12 │ 'ds_properties' │ 'ds_properties_datasetid_foreign' │ 'idx_fk_ds_properties_datasetid' │ │ 13 │ 'ds_property_fields' │ 'ds_property_fields_dspropertyid_foreign' │ 'ds_property_fields_dspropertyid_formdefid_unique' │ │ 14 │ 'entities' │ 'entities_createdby_foreign' │ 'idx_fk_entities_creatorid' │ │ 15 │ 'entities' │ 'entities_datasetid_foreign' │ 'entities_datasetid_createdat_id_index' │ │ 16 │ 'entity_def_sources' │ 'entity_def_sources_auditid_foreign' │ 'idx_fk_entity_def_sources_auditid' │ │ 17 │ 'entity_def_sources' │ 'entity_def_sources_submissiondefid_foreign' │ 'idx_fk_entity_def_sources_submissiondefid' │ │ 18 │ 'entity_defs' │ 'entity_defs_entityid_foreign' │ 'entity_defs_entityid_current_index' │ │ 19 │ 'entity_defs' │ 'entity_defs_sourceid_foreign' │ 'entity_defs_sourceid_index' │ │ 20 │ 'entity_submission_backlog' │ 'fk_audit_id' │ 'idx_fk_entity_submission_backlog_auditid' │ │ 21 │ 'entity_submission_backlog' │ 'fk_submission_defs' │ 'idx_fk_entity_submission_backlog_submissiondefid' │ │ 22 │ 'entity_submission_backlog' │ 'fk_submissions' │ 'idx_fk_entity_submission_backlog_submissionid' │ │ 23 │ 'field_keys' │ 'field_keys_actorid_foreign' │ 'field_keys_pkey' │ │ 24 │ 'field_keys' │ 'field_keys_createdby_foreign' │ 'idx_fk_field_keys_createdby' │ │ 25 │ 'field_keys' │ 'field_keys_projectid_foreign' │ 'idx_fk_field_keys_projectid' │ │ 26 │ 'form_attachments' │ 'form_attachments_blobid_foreign' │ 'idx_fk_form_attachments_blobid' │ │ 27 │ 'form_attachments' │ 'form_attachments_datasetid_foreign' │ 'idx_fk_form_attachments_datasetid' │ │ 28 │ 'form_attachments' │ 'form_attachments_formdefid_foreign' │ 'form_attachments_pkey' │ │ 29 │ 'form_attachments' │ 'form_attachments_formid_foreign' │ 'form_attachments_formid_index' │ │ 30 │ 'form_defs' │ 'form_defs_formid_foreign' │ 'form_defs_formid_publishedat_index' │ │ 31 │ 'form_defs' │ 'form_defs_keyid_foreign' │ 'idx_fk_form_defs_keyid' │ │ 32 │ 'form_defs' │ 'form_defs_schemaid_foreign' │ 'idx_fk_form_defs_schemaid' │ │ 33 │ 'form_defs' │ 'form_defs_xlsblobid_foreign' │ 'idx_fk_form_defs_xlsblobid' │ │ 34 │ 'form_field_values' │ 'form_field_values_formid_foreign' │ 'form_field_values_formid_index' │ │ 35 │ 'form_field_values' │ 'form_field_values_submissiondefid_foreign' │ 'form_field_values_submissiondefid_index' │ │ 36 │ 'form_fields' │ 'form_fields_formid_foreign' │ 'form_fields_formid_path_type_index' │ │ 37 │ 'form_fields' │ 'form_fields_schemaid_foreign' │ 'form_fields_pkey' │ │ 38 │ 'forms' │ 'forms_acteeid_foreign' │ 'idx_fk_forms_acteeid' │ │ 39 │ 'forms' │ 'forms_currentdefid_foreign' │ 'idx_fk_forms_currentdefid' │ │ 40 │ 'forms' │ 'forms_draftdefid_foreign' │ 'idx_fk_forms_draftdefid' │ │ 41 │ 'forms' │ 'forms_projectid_foreign' │ 'forms_projectid_xmlformid_deletedat_unique' │ │ 42 │ 'projects' │ 'projects_keyid_foreign' │ 'idx_fk_projects_keyid' │ │ 43 │ 'public_links' │ 'public_links_actorid_foreign' │ 'public_links_pkey' │ │ 44 │ 'public_links' │ 'public_links_createdby_foreign' │ 'idx_fk_public_links_createdby' │ │ 45 │ 'public_links' │ 'public_links_formid_foreign' │ 'idx_fk_public_links_formid' │ │ 46 │ 'sessions' │ 'sessions_actorid_foreign' │ 'sessions_actorid_expires_index' │ │ 47 │ 'submission_attachments' │ 'attachments_blobid_foreign' │ 'idx_fk_submission_attachments_blobid' │ │ 48 │ 'submission_attachments' │ 'submission_attachments_submissiondefid_foreign' │ 'submission_attachments_pkey' │ │ 49 │ 'submission_defs' │ 'submission_defs_actorid_foreign' │ 'idx_fk_submission_defs_submitterid' │ │ 50 │ 'submission_defs' │ 'submission_defs_formdefid_foreign' │ 'idx_fk_submission_defs_formdefid' │ │ 51 │ 'submission_defs' │ 'submission_defs_submissionid_foreign' │ 'submission_defs_submissionid_current_index' │ │ 52 │ 'submission_field_extract_geo_cache' │ 'submission_field_extract_geo_cache_submission_def_id_fkey' │ 'submission_field_extract_geo_cache_pkey' │ │ 53 │ 'submissions' │ 'submissions_formid_foreign' │ 'submissions_formid_instanceid_draft_unique' │ │ 54 │ 'submissions' │ 'submissions_submitter_foreign' │ 'idx_fk_submissions_submitterid' │ │ 55 │ 'user_project_preferences' │ 'user_project_preferences_projectId_fkey' │ 'idx_fk_user_project_preferences_projectid' │ │ 56 │ 'user_project_preferences' │ 'user_project_preferences_userId_fkey' │ 'user_project_preferences_primary_key' │ │ 57 │ 'user_site_preferences' │ 'user_site_preferences_userId_fkey' │ 'user_site_preferences_primary_key' │ │ 58 │ 'users' │ 'users_actorid_foreign' │ 'users_pkey' │ └─────────┴──────────────────────────────────────┴─────────────────────────────────────────────────────────────┴────────────────────────────────────────────────────┘ ```
Yes. It'd be great to be able to measure it, but the real-world benefits won't let themselves be approximated well through an easy It's more a case of compounding positive effects, as it does cut the size of foreign keys into this table, and indices thereon, in half. Following from that there will be setups and circumstances in which this means that some pages of some table/index will now not be pushed out of memory caches when pages of these tables/indices need to be read, and vice versa. Conversely it's reasonable to expect that there's a performance cliff we fall off once hot path tables/indices don't all fit into memory anymore, especially on DB setups like bottom tier Amazon RDS which has the I/O bandwidth of a 1994 PC. It would be nice to see that dropoff point move in a benchmark, and if we get a benchmark setup going we might backtest this change, but I think we can follow an instinct of "compact dedicated fixed-width datatypes† are probably better" :-) †) IIRC varchar(36) is not fixed-width, it's variable-width, just with a limit |
|
I've read @matthew-white 's point above, and the earlier considerations. There is redundancy (along multiple axes! eg, various belongs-to relationships are also expressed in the orthodox FK ways, for example,
Can't we drop that column? :machete: :rambo: auth.canOrReject('user.create', User.species) could conceivably just shortcut for admins, the answer is "Yes They Can". No need to plunge into the actees recursion pool. Or am I missing some detail? †) And nothing guarantees consistency between those two. "If you have one watch you know what time it is, if you have two watches, you don't." (A saying from the era of analog spring-powered watches; it's not going to work for a generation growing up with GPS-timesynced smartwatches). |
Different permissions are driven by (1) verbs on the role and (2) the actee.
In both of these cases, I don't think the species actees are needed. We could remove the parts above about the form species, the project species, and the config species without changing any user-facing behavior. That's because there's no way to use the API to create an assignment to a species actee. The ways to create an assignment are for a form, for a project, or for everything ( We've definitely discussed creating more roles: getodk/central#736, getodk/central#737. However, I see those issues as being about changing the roles/verbs that can be used for assignment, not changing which actees are assignable. I don't expect species actees to be relevant to that work. Similarly, for API keys (getodk/central#1335), I expect API keys to provide access to specific projects or to a select list of verbs on
I think the main reason to remove it is code clarity and making it easier for developers on the project to understand the auth system. We rarely touch this area of the code, so I don't see it as being much cost to maintain. It just needs like a good opportunity to simplify things if we're going to be refactoring this code anyway. I also don't think there's much risk of removing it and needing to add it back later. I can't think of a time when we've ever needed these species actees, and I don't know of any upcoming work that will require them. I don't want to block progress on this PR though. The main goal is to make |
I think it's possible that in the future, there could be additional top-level roles or assignments. For example, an API key that has the I think you're probably right though that there's a way we could remove Under the current setup, users can technically do things via the assignments API that aren't possible in Frontend. For example, creating a top-level project viewer who can view all projects. I don't know of any such use cases (they would be unexpected), just saying that our API allows such things right now. I'd be open to getting rid of the
Haha machete most welcome. Maybe we can drop that column, that would be exciting. I'm not sure I see any uses of it from a quick search of the codebase. What do you think about doing that in a separate PR? I could review it. For this PR, I think we need to figure out the near-term destiny of species actees (rows), but I think we could remove the |
Would this be an example where an API key would want species-level actee permissions: a system to sync odk-central users with a 3rd-party IAM system? E.g. if an organisation wants to manage their user accounts for ODK and other systems centrally. |
|
Opening for review, as this PR seems to be generally positively received, but also have a number of suggestions for different directions it could be taken in next. |
brontolosone
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By and large 👍, and also 🎊 , but I think there are a few non-radical improvements to make, please see inline comments!
| if (Object.keys(specialActees).length !== | ||
| new Set(Object.values(specialActees)).size) { | ||
| throw new Error('Check specialActees values are unique'); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't feel we really need this little inline test-in-a-migration.
| throw new Error('Check specialActees values are unique'); | ||
| } | ||
|
|
||
| const tableName = t => t.padStart(16, ' '); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this for aesthetics?
| `); | ||
|
|
||
| const down = (db) => db.raw(` | ||
| -- TODO reverse all statements from up() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be nice. Ktuite likes to use down-migrations.
| CREATE INDEX idx_fk_actees_species ON "actees" ("species"); | ||
| CREATE INDEX idx_fk_datasets_acteeId ON "datasets" ("acteeId"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This (from 83ebf49) is bycatch, right? Bycatch as in an improvement (or bugfix) that you stumble upon while fullfilling the primary mission? Personally I'm not against bycatch especially if it's denoted as such in a commitmessage (the one of 83ebf49 suffices, but it could be made more distinct with a bycatch: prefix convention) (and then the commit should IMO ideally not be squash-merged).
I have the impression that a bycatch-positive disposition is not present across the whole team and perhaps another reviewer would ask you to put it in a separate micro-PR but hey, I'm the reviewer here so let's smuggle that contraband commit in 😆
| `); | ||
|
|
||
| const down = (db) => db.raw(` | ||
| -- TODO reverse all statements from up() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That'd be nice. I know ktuite likes to use down-migrations.
| const uuidFor = acteeId => { | ||
| if (acteeId == null) return null; | ||
| else if (Object.prototype.hasOwnProperty.call(sqlSpecial, acteeId)) return sqlSpecial[acteeId]; | ||
| else if (acteeId.length === 36) return acteeId; | ||
| else throw new Error(`Unexpected acteeId: '${acteeId}'`); | ||
| }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would not store these constants as sql-templates. That's a boxing (of sorts) for a particular use case for a particular DB driver. So I'd leave the base definition assumption-free which in this case would mean storing the UUIDs as string values as there is no UUID type in the stdlib.
And then of course you can construct this sqlSpecial special purpose mapping automatically from that base assumption-free definition. Alternatively the ensql-ing it could take place inside uuidFor() though of course from a performance-aesthetics POV it's nicer to not repeat that work, which speaks for creating a mapping once on initialization. And then that sql-specific stuff should then probably go into query/actees.js, and read the table of constants from a new lib/constants/actees.js which contains just the mapping constants and no presumptions about the usage. (actees.js is imo preferable to actz.js).
| describe('database migrations: intermediate form schema', function() { | ||
| // skipped: test setup relies on populateUsers(), which relies on application | ||
| // code which has changed since the test was written. | ||
| // REVIEW: this test will never work again, and should probably be deleted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
review? is this addressed to me? 🤔
| throw new Error('Check specialActees values are unique'); | ||
| } | ||
|
|
||
| const tableName = t => t.padStart(16, ' '); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this purely for aesthetics? We don't look at the migration output that much, especially not on an ongoing basis going forward, so... leave it out, less is more? If I want to look at formatted SQL code I'll just pipe it through my SQL formatter on an ad-hoc basis.
| sql`${_unjoiner.fields} FROM forms JOIN form_defs on ${versionJoinCondition(defVersion)}`; | ||
| const conditions = sql` | ||
| WHERE "acteeId"=${acteeId} | ||
| WHERE "acteeId"=${acteeId}::UUID |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the typename is actually in lowercase, it just so happens that things get converted to lowercase unless quoted, so this happens to work, but I don't like things that happen to work if there's a more to-the-point way. Witness:
odkcentral> select 'd7bb7054-1f61-4cd3-9a90-ac4f18b8a060'::"uuid"
+--------------------------------------+
| uuid |
|--------------------------------------|
| d7bb7054-1f61-4cd3-9a90-ac4f18b8a060 |
+--------------------------------------+
Versus:
odkcentral> select 'd7bb7054-1f61-4cd3-9a90-ac4f18b8a060'::"UUID"
type "UUID" does not exist
So the type is called uuid, not UUID.
Implemented while try to understand why #1589 required some new "acteeId" reverse-indexes, but not others. When indexes are missing, the table is printed.
Ref getodk/central#1217
What has been done to verify that this works as intended?
Tests etc.
Why is this the best possible solution? Were any other approaches considered?
There are other suggestions for extending/changing this including:
uuidFor()in postgres (TODO can't find the link for this)*(Use postgres UUID types for actee ids #1589 (comment))How does this change affect users? Describe intentional changes to behavior and behavior that could have accidentally been affected by code changes. In other words, what are the regression risks?
Does this change require updates to the API documentation? If so, please update docs/api.yaml as part of this PR.
I don't think so.
Before submitting this PR, please make sure you have:
make testand confirmed all checks still pass OR confirm CircleCI build passes