Track created_by and updated_by in Node and Revisions#1255

Open
jmaruland wants to merge 23 commits into bluesky:main from
jmaruland:track-created-by-and-update-by-in-node-and-revisions

Conversation

Collaborator

@jmaruland jmaruland commented Dec 12, 2025

Checklist

  • Add a Changelog entry
  • Add the ticket number which this PR closes to the comment section

This PR introduces created_by and updated_by columns in the database for forensic purposes.

jmaruland force-pushed the track-created-by-and-update-by-in-node-and-revisions branch from 0f4e558 to 1ce3bdc on January 22, 2026 21:15
jmaruland force-pushed the track-created-by-and-update-by-in-node-and-revisions branch from ee06f6d to 505796b on February 5, 2026 23:16
@jmaruland
Collaborator Author

Closes #1076

Member

@danielballan danielballan left a comment

Nice to see this coming together! I have a couple of questions inline.

Also, this needs a database migration to add the new columns to existing databases.

data_sources=body.data_sources,
access_blob=access_blob,
created_by=(
principal.identities[0].id if len(principal.identities) > 0 else ""
Member

This design consideration deserves some thought:

  1. What should we put here when the principal has no identities? Service Principals have no identities. Should we put service:{principal.uuid}? Or nothing?
  2. Should the fallback value be "" or None / NULL?
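
To make the second question concrete, here is a minimal sketch using Python's built-in sqlite3 module. The nodes table and its rows are a hypothetical stand-in, not Tiled's actual schema. It shows that NULL and "" are distinct values in SQL, so the choice of fallback determines how rows with an unknown creator can be found later.

```python
import sqlite3

# Hypothetical stand-in table for illustration; not Tiled's real schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (key TEXT PRIMARY KEY, created_by TEXT)")
conn.executemany(
    "INSERT INTO nodes (key, created_by) VALUES (?, ?)",
    [
        ("a", "dallan"),  # a known creator
        ("b", ""),        # fallback stored as an empty string
        ("c", None),      # fallback stored as NULL (None in Python)
    ],
)


def keys_where(condition):
    """Return the sorted keys of rows matching a fixed SQL condition."""
    rows = conn.execute(f"SELECT key FROM nodes WHERE {condition}")
    return sorted(key for (key,) in rows)


unknown_as_null = keys_where("created_by IS NULL")
unknown_as_empty = keys_where("created_by = ''")
not_dallan = keys_where("created_by != 'dallan'")
```

In this sketch the IS NULL query matches only row c and the = '' query matches only row b. The != comparison returns only row b as well, because a comparison against NULL is never true in SQL.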

Collaborator Author

@jmaruland jmaruland Feb 13, 2026

I actually had to add this conditional because test_access_control.py::test_service_principal_access_control was failing here. It seems like, at some point during the handshake with the server, principal.identities is passed as an empty list.

@danielballan
Member

Suggestions for various scenarios:

| scenario | principal | len(identities) | created_by |
| --- | --- | --- | --- |
| single-user server | None | N/A | NULL in SQL, None in Python (not "") |
| user in a multi-user server, one ID provider | str | 1 | principal.identities[0] |
| user in a multi-user server, N ID providers | str | N | principal.identities[0] |
| service principal in a multi-user server, one ID provider | str | 0 | f"service:{principal.uuid}" |

To test a service principal:

# Log in as admin. Then...

# Create a new Service Principal. It is identified by its UUID. It has no "identities"; it cannot log in with a password.
sp = admin_client.context.admin.create_service_principal()
# Create an API key for this Service Principal.
key_info = admin_client.context.admin.create_api_key(sp["uuid"])

# Then to log in as the principal, you can do this...
admin_client.logout()
admin_client.context.api_key = key_info["secret"]
# For clarity use a new variable name.
sp_client = admin_client
del admin_client

@danielballan
Member

Run a command like this to generate a template for a database migration. See other similar files in this directory for examples.

❯ python -m tiled.catalog revision -m "Add created_by and updated_by"

Finally, add the unique ID for this migration (the first part of the filename, and it also appears in the file) to this list:

# This is list of all valid revisions (from current to oldest).
ALL_REVISIONS = [
    "4cf6011a8db5",
    "a86a48befff6",
    "dfbb7478c6bd",
    "a963a6c32a0c",
    "e05e918092c3",
    "7809873ea2c7",
    "9331ed94d6ac",
    "45a702586b2a",
    "ed3a4223a600",
    "e756b9381c14",
    "2ca16566d692",
    "1cd99c02d0c7",
    "a66028395cab",
    "3db11ff95b6c",
    "0b033e7fbe30",
    "83889e049ddc",
    "6825c778aa3c",
]
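
At the SQL level, the upgrade step of a migration like this amounts to adding two nullable columns. The sketch below illustrates the effect with the standard-library sqlite3 module rather than Alembic, and the table layout is a hypothetical stand-in, not Tiled's schema. Rows that existed before the change end up with NULL in both new columns, since nothing was recorded about who created them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical pre-migration table holding one existing row.
conn.execute("CREATE TABLE nodes (key TEXT PRIMARY KEY)")
conn.execute("INSERT INTO nodes (key) VALUES ('old')")

# "Upgrade": add the new columns as nullable text.
for column in ("created_by", "updated_by"):
    conn.execute(f"ALTER TABLE nodes ADD COLUMN {column} TEXT")

# A row written after the migration can record its author.
conn.execute(
    "INSERT INTO nodes (key, created_by, updated_by) VALUES (?, ?, ?)",
    ("new", "dallan", "dallan"),
)

rows = conn.execute(
    "SELECT key, created_by, updated_by FROM nodes ORDER BY key"
).fetchall()
```

The pre-existing row comes back with None for both new columns, while the new row carries the recorded author.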

@jmaruland
Collaborator Author

@danielballan I ran the command for the database migration, but pre-commit is a bit mad because of flake8:

flake8...................................................................Failed
- hook id: flake8
- exit code: 1

tiled/catalog/migrations/versions/8fd6ac88f2ec_add_created_by_and_updated_by.py:8:1: F401 'sqlalchemy as sa' imported but unused
tiled/catalog/migrations/versions/8fd6ac88f2ec_add_created_by_and_updated_by.py:9:1: F401 'alembic.op' imported but unused

This is an auto-generated file. Should I ignore these messages?

@jmaruland
Collaborator Author

Is it ok if I add an inline comment with # noqa: F401 to ignore this and avoid deleting any auto-generated code?

@jmaruland
Collaborator Author

False alarm. I understand what the next step is with the upgrade and downgrade methods that were just generated.
Thanks @genematx for the tips.

Comment on lines +88 to +95
def get_current_user(principal: Principal) -> Optional[str]:
    username = None
    if principal is not None:
        if len(principal.identities) > 0:
            username = principal.identities[0].id
        else:
            username = f"service:{principal.uuid}"
    return username
Member

I'd like broader input on whether this passes the smell test as a pragmatic approach.

This PR is about adding created_by and updated_by columns to some tables in the catalog database. This will have no bearing on access policy (that uses access_blob exclusively) but could be useful for forensics. Once a node is "tagged", we don't currently retain information about who created it. That seems useful to retain, internally.

Background:

  • Tiled supports multiple identity providers (akin to Globus' "linked identities" or "Log in with GitHub OR Google OR ....")
  • For a while, I've wondered if we should simplify this to just one. I am not aware of any deployments today that use more than one IdP (identity provider).
  • But the recent "American Science Cloud" goal of providing an IdP that works across Labs means we might soon have a reason to support both a BNL OIDC and an "American Science Cloud" OIDC. It would be a shame to rip this out and find out 3 months later that we want it back.
  • It would be clean/pure to store Principal UUIDs in access tags, like {"user": "66f1e964-5ea4-4cde-b304-1c93231e45dc"}, but {"user": "dallan"} is a lot easier to read, and I went with "easy to read" since I foresaw multiple identity providers as rare or maybe soon-to-be-dropped.
  • But the above feels a bit messy... What if identities change, etc.?

Should we switch to storing principal UUIDs here? Or is "just use the first identity in the list" acceptable? There are obvious problems if identities change...
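
To illustrate the trade-off, here is a small sketch comparing the two labeling strategies. The Identity and Principal dataclasses are simplified stand-ins rather than Tiled's actual classes, and the provider names and the second identity are invented for the example. The first function mirrors the get_current_user logic under review; the second uses the principal UUID.

```python
from dataclasses import dataclass, field
from typing import List, Optional
from uuid import UUID


# Simplified stand-ins for illustration; not Tiled's actual classes.
@dataclass
class Identity:
    provider: str
    id: str


@dataclass
class Principal:
    uuid: UUID
    identities: List[Identity] = field(default_factory=list)


def label_by_first_identity(principal: Optional[Principal]) -> Optional[str]:
    """Readable, but depends on the order and content of identities."""
    if principal is None:
        return None
    if principal.identities:
        return principal.identities[0].id
    return f"service:{principal.uuid}"


def label_by_uuid(principal: Optional[Principal]) -> Optional[str]:
    """Opaque, but unaffected by changes to the identities list."""
    return None if principal is None else str(principal.uuid)


user = Principal(
    uuid=UUID("66f1e964-5ea4-4cde-b304-1c93231e45dc"),
    identities=[Identity(provider="bnl", id="dallan")],
)
before = (label_by_first_identity(user), label_by_uuid(user))

# Suppose an identity from a second (hypothetical) provider ends up first.
user.identities.insert(0, Identity(provider="asc", id="daniel.allan"))
after = (label_by_first_identity(user), label_by_uuid(user))
```

In this sketch the first-identity label changes from "dallan" to "daniel.allan" for the same principal, while the UUID label is identical before and after.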

attn @gwbischof @nmaytan @genematx
