This document summarizes the design and purpose of the CFDE data submission and user-preferences registry.
The CFDE registry is a bookkeeping system for:
- The C2M2 datapackage submission pipeline.
- Metadata related to DCC federation.
- User-managed preferences/profile data.
The registry data model is illustrated in the registry model diagram.
These registry tables are populated by CFDE-CC staff activity:
- dcc: each row represents an onboarded DCC
- group: each row represents a known group in the authentication system
- dcc_group_role: each row binds a known group to a permission class for a DCC
- id_namespace: each row represents a CFDE federated identifier namespace
Other aspects of DCC federation are under continued development.
These registry tables are populated by a mixture of CFDE-CC staff and submission system activity, representing known C2M2 vocabulary concepts:
- anatomy
- assay_type
- data_type
- disease
- file_format
- mime_type (C2M2 raw string column)
- ncbi_taxonomy
- subject_granularity (C2M2 enum column)
- subject_role (C2M2 enum column)
Generally, these share structure with C2M2 vocabulary tables present in submissions, review catalogs, and release catalogs. However, we may introduce additional columns relevant to registry functionality. We also promote several non-vocabulary columns into quasi-vocabulary tables for the purpose of tracking usage of distinct values for those columns.
It is expected that CFDE-CC and/or DCC staff may pre-populate these tables with terms intended for common use. Additionally, the tables may indicate automatically-detected terms found in C2M2 submissions.
The usage of these tables is under continued development.
These registry tables are populated based on DCC submission system activity:
- datapackage: each row represents one C2M2 submission
- datapackage_table: each row represents one TSV file of one submission
- datapackage_measurement: each row represents a measurement of one submission
- datapackage_metric: each row represents a measurement concept known by the system
These tables are populated by release-planning and preparation:
- release: each row represents a CFDE inventory release in some stage of planning
- dcc_release_datapackage: each row binds a constituent submission to a release
Other aspects of submission tracking are under continued development.
These registry tables are populated by user activity and represent their saved preferences or other state values relevant to personalization of CFDE service features:
- user_profile: each row represents scalar settings for one user
- saved_query: each row represents one saved query for a user
- favorite_anatomy: each row represents one favorited vocabulary term for a user
- favorite_assay_type: each row represents one favorited vocabulary term for a user
- favorite_data_type: each row represents one favorited vocabulary term for a user
- favorite_disease: each row represents one favorited vocabulary term for a user
- favorite_file_format: each row represents one favorited vocabulary term for a user
- favorite_ncbi_taxonomy: each row represents one favorited vocabulary term for a user
The user_profile table is keyed by authenticated user ID, which is
also a foreign key to the built-in ERMrest_Client table. Each user
can have at most one profile record associated with their
identity. The ERMrest_Client table is automatically populated by the
DERIVA system, while user_profile is populated by a user or user-agent
action to enable per-user profile features. The profile record
stores any scalar settings for the user, as a single column for each
named setting.
The saved_query table is keyed by a composite key consisting of
several parts:
- user_id: the user saving the query, also a foreign key to the profile
- schema_name: will always be CFDE, needed by DERIVA mechanism
- table_name: the C2M2 table being searched by the query
- query_id: a client-generated key for the query, unique among all queries saved by the user
Each saved_query has these additional columns which
are not part of the composite key:
- RID: system-assigned record ID, useful for API access
- RCT: system-maintained record creation time, when the query was saved
- name: a human-readable name for the saved query
- description: a longer, human-readable summary of the query
- last_executon_time: a client-maintained timestamp, updated when the query is performed
Each user can have zero or more saved query records associated with their identity. Each record will store necessary information to reconstitute a query in Chaise, to name/describe the query in a query listing UI, and other system metadata TBD.
The Chaise user agent (client) should produce per-user query_id
values to detect/prevent duplicate saved queries for the user:
- Generate the stable form of the facet config document
- Generate an MD5 hexadecimal format hash of the config document
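
The two steps above could be sketched as follows. This is a minimal illustration and not the Chaise implementation: it assumes that the "stable form" means JSON serialized with sorted keys and compact separators, and the function name `make_query_id` and the sample facet documents are hypothetical.

```python
import hashlib
import json


def make_query_id(facets: dict) -> str:
    """Return an MD5 hex digest of a stable serialization of a facet config.

    Assumption: the "stable form" is JSON with sorted keys and compact
    separators, so that equivalent configs always hash to the same value.
    """
    stable = json.dumps(facets, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(stable.encode("utf-8")).hexdigest()


# Hypothetical facet configs: same content, different key order.
config_a = {"and": [{"source": "id_namespace", "choices": ["example"]}]}
config_b = {"and": [{"choices": ["example"], "source": "id_namespace"}]}

# Both serialize to the same stable form, so the IDs match and a
# duplicate save for the same user can be detected.
print(make_query_id(config_a) == make_query_id(config_b))
```

Sorting keys only normalizes object members; list order is preserved. If the order of facet choices is not significant, the client would also need to normalize it before hashing.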
Generally, the various favorite_* tables form binary associations to
link a subset of vocabulary concepts from a given vocabulary table to
a given user's profile.
There are several possible phases to the profile content (and related UX capabilities) for a user.
The following descriptions are based on the assumption that a Globus Group will be used to represent the community of CFDE profile users. To enroll, a user must join this new group (group name TBD).
1. An anonymous (non-logged-in) user
    - has no user_id
    - no profile storage possible
    - prompt to login and join community?
2. A logged-in user (not yet onboarded to community...)
    - not yet enrolled in the profile users' group
    - has a user_id
    - no profile storage possible
    - prompt to join community?
3. A logged-in user (transitional phase...)
    - enrolled in the profile users' group
    - has a user_id
    - user_profile can be created on-demand, transitioning to next phase
4. A logged-in user (full profile functionality...)
    - continues to belong to profile users' group
    - has a user_id
    - has a user_profile record
    - has permission to read/update profile
    - has permission to read/add/delete sub-records (saved queries, favorite terms)
    - has permission to delete profile (transition back to phase 3)?
5. A logged-in user (degraded function...)
    - has been removed from the profile users' group!?
    - has a user_id
    - has a user_profile record
    - no longer permitted to add sub-records to profile
    - has permission to read/update profile
    - has permission to read/delete sub-records (saved queries, favorite terms)
    - has permission to delete profile (transition back to phase 2)?
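
The phases listed above depend on three facts about a client: whether it has a user_id, whether it belongs to the profile users' group, and whether a user_profile record exists. A minimal sketch of that classification, assuming the phases are numbered 1 to 5 in the order listed (the function name `profile_phase` and its parameters are hypothetical):

```python
from typing import Optional


def profile_phase(user_id: Optional[str], in_profile_group: bool, has_profile: bool) -> int:
    """Classify a client into one of the five phases, numbered in list order."""
    if user_id is None:
        # Anonymous: no user_id, so no profile storage is possible.
        return 1
    if has_profile:
        # Full functionality while enrolled; degraded once removed from the group.
        return 4 if in_profile_group else 5
    # Logged in without a profile record: transitional if enrolled,
    # otherwise not yet onboarded to the community.
    return 3 if in_profile_group else 2
```

A user agent could use such a result to decide whether to prompt for login, prompt to join the community, or create the profile on demand.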
The registry supports several classes of client/user. Generally, one group corresponds directly to a role, and other groups for higher-privilege roles also enjoy the same privileges:
- Submission ingest pipeline automation (CFDE-hosted machine identity)
    - CFDE Submission Pipeline
- DCC staff who review content of datapackages (read-only)
    - CFDE DCC Reviewer
    - also CFDE DCC Approver
    - also CFDE DCC Submitter
    - also CFDE DCC Admin (not in current practice?)
- DCC staff who submit datapackages
    - CFDE DCC Submitter
    - also CFDE DCC Admin (not in current practice?)
- DCC staff who approve datapackages for release
    - CFDE DCC Approver
    - also CFDE DCC Admin (not in current practice?)
- DCC staff who administer their DCC's submission content
    - CFDE DCC Admin (not in current practice?)
- CFDE-CC staff who review submitted datapackages (read-only)
    - CFDE Portal Reviewer
    - also CFDE Portal Curator
    - also CFDE Portal Admin
    - also CFDE Infrastructure Operations
- CFDE-CC staff who review and approve datapackages for release (CFDE-CC decision data-entry)
    - CFDE Portal Curator
    - also CFDE Portal Admin
    - also CFDE Infrastructure Operations
- CFDE-CC staff who administer the pipeline, onboard DCCs, can also submit:
    - CFDE Portal Admin
    - also CFDE Infrastructure Operations
- CFDE-CC staff with highest permissions on infrastructure
    - CFDE Infrastructure Operations
- General users who have personal preferences/profile data
    - a member of a general CFDE Portal group?
    - during dev cycle: CFDE Portal Curator, CFDE Portal Reviewer, CFDE Portal Writer, CFDE Portal Reader
- The owner of a particular profile
    - where the profile id or related content user_id matches the authenticated client
Other roles TBD.
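
The group lists above can be read as a mapping from each client class to the set of groups that satisfy it. The sketch below transcribes that mapping for the staff roles; the dictionary keys and the `has_role` helper are hypothetical labels for illustration, and it ignores the fact that DCC groups are bound to roles per DCC through the dcc_group_role table.

```python
# Groups accepted for each client class, transcribed from the list above.
# The keys are hypothetical labels, not identifiers used by the registry.
ROLE_GROUPS = {
    "pipeline": {"CFDE Submission Pipeline"},
    "dcc_reviewer": {
        "CFDE DCC Reviewer", "CFDE DCC Approver",
        "CFDE DCC Submitter", "CFDE DCC Admin",
    },
    "dcc_submitter": {"CFDE DCC Submitter", "CFDE DCC Admin"},
    "dcc_approver": {"CFDE DCC Approver", "CFDE DCC Admin"},
    "dcc_admin": {"CFDE DCC Admin"},
    "cfde_reviewer": {
        "CFDE Portal Reviewer", "CFDE Portal Curator",
        "CFDE Portal Admin", "CFDE Infrastructure Operations",
    },
    "cfde_curator": {
        "CFDE Portal Curator", "CFDE Portal Admin",
        "CFDE Infrastructure Operations",
    },
    "cfde_admin": {"CFDE Portal Admin", "CFDE Infrastructure Operations"},
    "cfde_ops": {"CFDE Infrastructure Operations"},
}


def has_role(role: str, client_groups: set) -> bool:
    """True if the client belongs to at least one group accepted for the role."""
    return not ROLE_GROUPS[role].isdisjoint(client_groups)
```

Under this reading, membership in a higher-privilege group such as CFDE Portal Admin also satisfies the lower-privilege reviewer and curator roles.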
The fine-grained policy for the submission system is implemented in a number of policy elements:
- For simplicity, the bulk of the registry's CFDE schema is made visible to the public, not requiring detailed reconfiguration. This includes the general informational/vocabulary tables of the registry. Only portal administrators can write to these tables.
- The core datapackage and datapackage_table tables are configured with more-specific policies which override the schema-wide defaults to make these tables more restrictive for read access and to allow the automated submission system to perform certain updates.
- The special built-in ERMrest client table is useful for converting low-level authentication IDs into human-readable display values. However, we conservatively restrict access to certain columns which may be considered more sensitive.
This table summarizes these in more detail:
| resource | rights | roles | conditions | impl. | notes |
|---|---|---|---|---|---|
| registry catalog | enumerate | public | N/A | catalog ACL | tables detected (chaise avoids table-not-found) |
| registry CFDE.* | select | public | N/A | schema ACL | basic vocabs/config can be public? |
| registry CFDE.* | insert, update, delete | CFDE admin | N/A | schema ACL | admin can modify all vocabs/config |
| registry CFDE.*vocab* | select | public | N/A | table ACL | everyone can view vocabulary term sets |
| registry CFDE.*vocab* | insert, update | CFDE admin/curator/pipeline | N/A | table ACL | staff can curate vocabulary terms, pipeline can add newly encountered terms |
| registry CFDE.*vocab* | delete | CFDE admin/curator | N/A | table ACL | staff can curate vocabulary terms |
| registry CFDE.datapackage | select | CFDE admin/curator/pipeline/reviewer | N/A | table ACL | CFDE-CC roles can read all submission records |
| registry CFDE.datapackage | insert | CFDE pipeline | N/A | table ACL | CFDE-CC pipeline can record new submissions |
| registry CFDE.datapackage | update | CFDE admin/curator/pipeline | N/A | table ACL | some CFDE-CC roles can edit all submission records |
| registry CFDE.datapackage | delete | none | N/A | table ACL | no CFDE-CC role can delete submissions (except ops/infrastructure) |
| registry CFDE.datapackage | select | DCC group | client belongs to any group role for submitting_dcc | table ACL-binding | DCC members can see DCC's submissions |
| registry CFDE.datapackage | update | DCC decider, admin | client belongs to decider group role for submitting_dcc | table ACL-binding | DCC admin or decider can edit DCC's submissions |
| registry CFDE.datapackage.id | update | none | N/A | column ACL | Set once during row insertion, then immutable |
| registry CFDE.datapackage.submitting_dcc | update | none | N/A | column ACL | Set once during row insertion, then immutable |
| registry CFDE.datapackage.submitting_user | update | none | N/A | column ACL | Set once during row insertion, then immutable |
| registry CFDE.datapackage.submission_time | update | none | N/A | column ACL | Set once during row insertion, then immutable |
| registry CFDE.datapackage.datapackage_url | update | none | N/A | column ACL | Set once during row insertion, then immutable |
| registry CFDE.datapackage.status | update | CFDE-CC admin/pipeline | N/A | column ACL | Can be edited by CFDE-CC admin or pipeline |
| registry CFDE.datapackage.decision_time | update | none | N/A | column ACL | Can be edited by CFDE-CC admin or pipeline |
| registry CFDE.datapackage.review_ermrest_url | update | none | N/A | column ACL | Can be edited by CFDE-CC admin or pipeline |
| registry CFDE.datapackage.review_browse_url | update | none | N/A | column ACL | Can be edited by CFDE-CC admin or pipeline |
| registry CFDE.datapackage.review_summary_url | update | none | N/A | column ACL | Can be edited by CFDE-CC admin or pipeline |
| registry CFDE.datapackage.diagnostics | update | none | N/A | column ACL | Can be edited by CFDE-CC admin or pipeline |
| registry CFDE.datapackage.dcc_approval_status | update | CFDE-CC admin | N/A | column ACL | Can be edited by CFDE-CC admin |
| registry CFDE.datapackage.cfde_approval_status | update | CFDE-CC admin/curator | N/A | column ACL | Can be edited by CFDE-CC admin or curator |
| registry CFDE.datapackage.description | update | DCC admin/decider | client belongs to decider or admin group role with submitting_dcc | inherited table ACL-binding | DCC admin or decider can edit DCC's submission description |
| registry CFDE.datapackage.dcc_approval_status | update | DCC admin/decider | client belongs to decider or admin group role with submitting_dcc | inherited table ACL-binding | DCC admin or decider can edit DCC's submission approval |
| registry CFDE.datapackage.* | update | DCC admin/decider | client belongs to decider or admin group role with submitting_dcc | masked table ACL-binding | DCC-derived rights are suppressed for all other columns not mentioned previously |
| registry CFDE.datapackage_table | select | CFDE admin/curator/pipeline/reviewer | N/A | table ACL | CFDE-CC roles can read all submission records |
| registry CFDE.datapackage_table | insert, update | CFDE admin/pipeline | N/A | table ACL | CFDE-CC admin or pipeline can record or edit submissions |
| registry CFDE.datapackage_table | delete | CFDE admin | N/A | table ACL | CFDE-CC admin can delete submissions |
| registry CFDE.datapackage_table.datapackage | update | none | N/A | column ACL | Set once during row insertion, then immutable |
| registry CFDE.datapackage_table.position | update | none | N/A | column ACL | Set once during row insertion, then immutable |
| registry CFDE.datapackage_table.table_name | update | none | N/A | column ACL | Set once during row insertion, then immutable |
| registry CFDE.datapackage_table.status | update | none | N/A | column ACL | Can be edited by CFDE-CC admin or pipeline |
| registry CFDE.datapackage_table.num_rows | update | none | N/A | column ACL | Can be edited by CFDE-CC admin or pipeline |
| registry CFDE.datapackage_table.diagnostics | update | none | N/A | column ACL | Can be edited by CFDE-CC admin or pipeline |
| registry CFDE.datapackage_table | select | DCC group | client belongs to any group role with submitting_dcc | table ACL-binding | DCC members can see DCC's submissions |
| registry CFDE.datapackage_*vocab* | select | CFDE admin/curator/pipeline/reviewer | N/A | table ACL | CFDE-CC roles can read all submission records |
| registry CFDE.datapackage_*vocab* | insert | CFDE admin/pipeline | N/A | table ACL | CFDE-CC admin or pipeline can record new submissions |
| registry CFDE.datapackage_*vocab* | update, delete | CFDE admin | N/A | table ACL | CFDE-CC admin can modify all submission records |
| registry CFDE.user_profile | insert | CFDE community | N/A | table ACL | Community members can create a profile |
| registry CFDE.user_profile | select, update, delete | CFDE admin | N/A | table ACL | CFDE-CC admin can read all profiles |
| registry CFDE.user_profile | update, delete | CFDE admin | user id matches client | table ACL binding | user can edit their own profile |
| registry CFDE.user_profile.id | update | none | N/A | fkey ACL | user id is immutable |
| registry CFDE.user_profile.id fkey | insert | CFDE admin | N/A | fkey ACL | only CFDE-CC admin can set other profile users |
| registry CFDE.user_profile.id fkey | insert | user-self | N/A | fkey ACL binding | user can only set profile user id to self |
| registry CFDE.user_profile.dashboard_state | select, update, delete | owner | N/A | table ACL-binding | profile owner can read/write dashboard state |
| registry CFDE.saved_query | insert | CFDE community | N/A | table ACL | Community members can create |
| registry CFDE.saved_query | select, update, delete | CFDE admin | N/A | table ACL | CFDE-CC admin can read and modify all |
| registry CFDE.saved_query | select, update, delete | CFDE admin | user user_id matches client | table ACL binding | user can view and edit their own |
| registry CFDE.saved_query.user_id | update | none | N/A | | user_id is immutable |
| registry CFDE.saved_query.schema_name | update | none | N/A | column ACL | schema name is immutable |
| registry CFDE.saved_query.table_name | update | none | N/A | column ACL | table name is immutable |
| registry CFDE.saved_query.query_id | update | none | N/A | column ACL | query id (hash) is immutable |
| registry CFDE.saved_query.facets | update | none | N/A | column ACL | facets blob is immutable |
| registry CFDE.saved_query.user_id fkey | insert | profile owner | user_id matches client | | user can set own user ID in profile related records |
| registry CFDE.favorite_* | insert | CFDE community | N/A | table ACL | Community members can create |
| registry CFDE.favorite_* | select, update, delete | CFDE admin | N/A | table ACL | CFDE-CC admin can read and modify all |
| registry CFDE.favorite_* | select, update, delete | CFDE admin | user user_id matches client | table ACL binding | user can view and edit their own |
| registry CFDE.favorite_*.user_id | update | none | N/A | | user_id is immutable |
| registry CFDE.favorite_*.user_id fkey | insert | profile owner | user_id matches client | | user can set own user ID in profile related records |
| registry public.ERMrest_Client | select | users | client matches record ID | table ACL-binding | User can see their own full ERMrest_Client record |
| registry public.ERMrest_Client | insert | CFDE-CC admin + pipeline | N/A | table ACL | Submission can discover new submitting users before they visit registry themselves |
| registry public.ERMrest_Client.Email | select | CFDE admin/curator | N/A | column ACL | Not everyone needs to know a submitting user's email |
| registry public.ERMrest_Client.Client_Object | enumerate | CFDE-CC pipeline | N/A | column ACL | Column detectable for pipeline deriva-py operations |
| registry public.ERMrest_Client.Client_Object | select | none | N/A | column ACL | No part of the registry or submission needs this at present |
| review catalog | enumerate | public | N/A | catalog ACL | tables detected (chaise avoids table-not-found) |
| review catalog | owner | CFDE-CC ops/pipeline | N/A | catalog ACL | Pipeline needs to own since it creates catalog, ops should own everything |
| review CFDE.* | select | CFDE-CC admin/curator + DCC admin/reviewer/decider/submitter | N/A | schema ACL | CFDE-CC roles and DCC groups with role for submitting DCC can view content |
| review public.* | select | none | N/A | schema ACL | non-CFDE tables hidden from users |
| release catalog | enumerate | public | N/A | catalog ACL | tables detected (chaise avoids table-not-found) |
| release catalog | owner | CFDE-CC ops/pipeline | N/A | catalog ACL | Pipeline needs to own since it creates catalog, ops should own everything |
| release CFDE.* | select | public | N/A | schema ACL | CFDE releases are visible to public |
| release public.* | select | none | N/A | schema ACL | non-CFDE tables hidden from users |
The table-level ACLs override the schema-wide defaults and make certain tables more restrictive. Likewise, the column-level ACLs override the table-wide or schema-wide defaults and make specific columns more restrictive.
In DERIVA, the schema, table, and column ACLs are considered data-independent policies. They grant a given access right for all rows of a table.
The table and column ACL bindings are considered data-dependent and they grant a given access for rows of a table only when certain conditions are met by the content of that row.
The table-level ACL-bindings are inherited by all columns of the table and so pass the data-dependent access privilege through to each column. Columns which should not get these rights are masked off with overriding statements that disable the inherited privileges.
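
The interaction between data-independent ACLs, data-dependent ACL-bindings, and column masking can be modeled in a few lines. The sketch below is a simplified illustration of the rules described in this section, not the ERMrest implementation or its configuration syntax; the function, the group names, and the DCC-to-decider mapping are all hypothetical.

```python
from typing import Callable, Dict, Set

Row = Dict[str, str]


def can_update_column(
    column: str,
    row: Row,
    client_groups: Set[str],
    table_acl: Set[str],
    column_acls: Dict[str, Set[str]],
    table_binding: Callable[[Row, Set[str]], bool],
    unmasked_columns: Set[str],
) -> bool:
    """Decide whether a client may update one column of one row."""
    # Data-independent check: a column ACL, when present, replaces the table ACL.
    acl = column_acls.get(column, table_acl)
    if acl & client_groups:
        return True
    # Data-dependent check: the table binding is inherited by the column
    # unless the column masks it off.
    if column not in unmasked_columns:
        return False
    return table_binding(row, client_groups)


# Hypothetical mapping from a DCC to its decider group, for illustration only.
DECIDER_GROUP = {"dcc-a": "DCC A Decider", "dcc-b": "DCC B Decider"}


def dcc_decider_binding(row: Row, client_groups: Set[str]) -> bool:
    """Grant access only when the client is a decider for the row's submitting DCC."""
    return DECIDER_GROUP.get(row["submitting_dcc"]) in client_groups


TABLE_ACL = {"CFDE admin", "CFDE curator", "CFDE pipeline"}
COLUMN_ACLS = {"id": set(), "status": {"CFDE admin", "CFDE pipeline"}}
UNMASKED = {"description", "dcc_approval_status"}
row = {"id": "1", "submitting_dcc": "dcc-a"}


def check(column: str, groups: Set[str]) -> bool:
    return can_update_column(
        column, row, groups, TABLE_ACL, COLUMN_ACLS, dcc_decider_binding, UNMASKED
    )
```

In this model a decider for the submitting DCC can update the description of that DCC's row, but not its status column, and not rows belonging to another DCC, while the empty ACL on id makes that column immutable even for administrators.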
Notes:
- The schema-level CFDE.* ACLs provide a default policy for all tables in the schema which lack their own custom ACLs. These ACLs capture a baseline policy for the registry, which is to be read-only accessible to everyone and writable by the CFDE-CC admins. The majority of the registry tables are vocabulary and configuration which should not be edited by untrusted parties, but which is safe to show in a read-only fashion.
- The built-in ERMrest clients table is useful for showing system provenance for which authenticated client changed what row. It is also used to represent the "submitting user" concept of the registry. Showing this table to all users ensures that reasonable display names can be presented for this information.
    - However, the email address column might be considered sensitive and will be restricted to CFDE-CC admin and curator roles who might have a need to contact a responsible party.
    - The Client_Object may contain additional account information and does not need to be seen by any of the normal user roles. Only the infrastructure operator will have access to this column.
- The core authorization decision for whether a user can make a submission is enforced by the submission ingest pipeline logic. The server-side automation runs trusted code to do this, and hence there is no specific ERMrest policy to reflect per-DCC submission policies. The pipeline identity is the one recording the submission in the registry.
- The review catalog ACLs and CFDE schema ACLs are configured for each submission, based on the submission's submitting_dcc and the groups known by the registry (with appropriate DCC group roles) at the time of submission ingest. This is primarily to adjust read privileges on review content. The infrastructure ownership aspects do not vary per DCC.
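
As an illustration of the last note, the read ACL for a review catalog could be derived at ingest time from the submitting DCC and the group-role bindings known to the registry. This is a hedged sketch only: the function name, the tuple layout, and the role labels are assumptions loosely modeled on the dcc_group_role table, not the pipeline's actual code.

```python
from typing import Iterable, List, Tuple

# (dcc, group, role) rows, loosely modeled on the dcc_group_role table.
GroupRole = Tuple[str, str, str]

# Hypothetical role labels for DCC roles allowed to read review content.
DCC_REVIEW_ROLES = {"admin", "reviewer", "decider", "submitter"}


def review_select_acl(
    submitting_dcc: str,
    dcc_group_roles: Iterable[GroupRole],
    cfde_cc_groups: Iterable[str],
) -> List[str]:
    """Build the select ACL for one submission's review catalog."""
    # CFDE-CC staff groups can always read review content.
    acl = list(cfde_cc_groups)
    for dcc, group, role in dcc_group_roles:
        # Add only groups bound to a qualifying role for the submitting DCC.
        if dcc == submitting_dcc and role in DCC_REVIEW_ROLES and group not in acl:
            acl.append(group)
    return acl
```

Because the ACL is computed once at ingest, later changes to group-role bindings would not affect an existing review catalog unless its policy is recomputed.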