miketheman
Member

The values in the API and in the claims generated by GitLab are
lowercased (case-desensitized) when GitLab generates JWTs.

Instead of updating the regular expression and using the same generic
error message, create a custom validator with a custom message.

Resolves #18797

Signed-off-by: Mike Fiedler [email protected]
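
For readers unfamiliar with the approach, here is a minimal sketch of a dedicated validator with its own error message, as opposed to a tightened regular expression with a generic one. It is not the code from this PR: `ValidationError` and `validate_lowercase` are hypothetical names, and the error class is a stand-in for whatever the form library provides.

```python
class ValidationError(ValueError):
    """Hypothetical stand-in for a form library's validation error."""


def validate_lowercase(field_name: str, value: str) -> str:
    """Reject values containing uppercase characters, with a targeted message.

    Returns the value unchanged when it is already lowercase, so the
    stored value is exactly what the user typed.
    """
    lowered = value.lower()
    if value != lowered:
        # A specific message tells the user what to enter instead of
        # a generic "invalid value" error.
        raise ValidationError(
            f"{field_name} must be lowercase; did you mean {lowered!r}?"
        )
    return value
```

With this sketch, `validate_lowercase("project", "cobib")` passes the value through, while `validate_lowercase("namespace", "coBib")` raises an error that suggests `'cobib'`.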

@miketheman miketheman requested a review from a team as a code owner October 6, 2025 17:53
@miketheman miketheman added UX/UI design, user experience, user interface trusted-publishing labels Oct 6, 2025
@miketheman
Member Author

I elected to update the form to reject and thus guide the user to input what becomes the ground truth value, rather than lower-case it during query time, so that we are not manipulating the ground truth, and reduce the time-to-discover a problem with the configuration.

@miketheman
Member Author

Looks like:
[image: screenshot not preserved]

@di
Member

di commented Oct 6, 2025

I think if GitLab is case-insensitive, we should just be case-insensitive as well?

@miketheman
Member Author

miketheman commented Oct 6, 2025

> I think if GitLab is case-insensitive, we should just be case-insensitive as well?

I previously wrote:

> I elected to update the form to reject and thus guide the user to input what becomes the ground truth value, rather than lower-case it during query time, so that we are not manipulating the ground truth, and reduce the time-to-discover a problem with the configuration.

Expanding on this: if we lowercase at query time, our machinery would still give the end user a "publisher not found" message when the configuration is incorrect, which then kicks off the questions of "what's the ground truth configured, what did we get in the claims, etc". Prompting them at the outset to input the value that we will expect to find later removes a set of concerns, especially as I look at other behaviors related to validation.

The way I see it, if there's data stored in the DB, that's the ground truth. We should store what folks give us, instead of lowercasing at insert-time, since then we modify the value they gave us, creating a mismatch of expectations.

If we store the user-supplied, case-sensitive value (like we do today), then we must remember that every time we access that value, we have to lowercase it, even during manual debugging in a SQL prompt. My memory isn't that good, so I'm preferring to have the user control the input and provide valid values and reduce potential confusion down the line.

Do you see an inherent problem with this approach?

@di
Member

di commented Oct 6, 2025

Yeah, I think the issue is that if the user puts in caseSensitiveProject (because that's what they see displayed on GitLab) and we reject it because we're being case-sensitive and overly restrictive, that's somewhat confusing:

  • If the user knows and expects GitLab to be case-insensitive, it's confusing because it's the opposite of what they know.
  • If the user doesn't know that GitLab is case-insensitive, it's confusing because it seems like we're telling them to put in something other than what their repo is named.

> The way I see it, if there's data stored in the DB, that's the ground truth. We should store what folks give us, instead of lowercasing at insert-time, since then we modify the value they gave us, creating a mismatch of expectations.

I agree, we should not mutate what we've been originally given by the user before storing it.

> If we store the user-supplied, case-sensitive value (like we do today), then we must remember that every time we access that value, we have to lowercase it, even during manual debugging in a SQL prompt. My memory isn't that good, so I'm preferring to have the user control the input and provide valid values and reduce potential confusion down the line.

I disagree. I don't think it's worth making users jump through extra hoops here just to do normalization for us.

I think either way, when doing a manual query, we'll need to remember that there might be a difference between the repo name a user provides us and what's in the database (e.g. if they come to us with a repo named https://gitlab.com/coBib/cobib, we'll have to remember that this will be stored as cobib and not coBib in the database).

Additionally, this doesn't help any of the users who currently have a misconfigured publisher, whereas updating the query to do normalization would make their publishers work immediately:

```
warehouse=> SELECT count(*) FROM gitlab_oidc_publishers WHERE namespace ~ '[A-Z]' OR project ~ '[A-Z]';
 count
-------
    66
(1 row)
```
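
To make the query-time alternative concrete, here is a rough sketch of a lookup that leaves the stored value untouched and normalizes only in the comparison. It uses an in-memory SQLite database so it runs standalone. The table and column names come from the query above, but `find_publisher` is a hypothetical helper, not Warehouse's actual lookup code, and SQLite's `lower()` only folds ASCII characters.

```python
import sqlite3


def find_publisher(conn, namespace, project):
    """Return the stored (namespace, project) row matching the claim
    values case-insensitively, or None if there is no match."""
    return conn.execute(
        "SELECT namespace, project FROM gitlab_oidc_publishers "
        "WHERE lower(namespace) = lower(?) AND lower(project) = lower(?)",
        (namespace, project),
    ).fetchone()


# Illustrative data: the publisher was saved exactly as the user typed it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gitlab_oidc_publishers (namespace TEXT, project TEXT)")
conn.execute("INSERT INTO gitlab_oidc_publishers VALUES ('coBib', 'cobib')")

# The claim arrives lowercased, yet still matches the mixed-case row.
row = find_publisher(conn, "cobib", "cobib")
```

Under these assumptions, a publisher already stored with uppercase characters would match without the user re-entering anything. The trade-off raised earlier in the thread still applies: every query against the table has to remember to apply the same normalization.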

Labels
trusted-publishing UX/UI design, user experience, user interface
Development

Successfully merging this pull request may close these issues.

GitLab Trusted Publisher case sensitivity
2 participants