Member

@erikjohnston erikjohnston commented Jun 16, 2025

Two commits:

  1. Simple logging for which error case we're hitting
  2. Refactor of password verification result so that we can differentiate between "password mismatch" vs. other errors.

The second has ended up being a fairly large change, as I opted to add a new struct PasswordVerificationResult for the return type. I did look into making the returned error structured instead, but that proved to be a bit of a PITA, and in general I think we want to handle the error and password mismatch cases differently anyway.

We currently alert if we see too many errors, so we should write to the
logs what is going on.

cloudflare-workers-and-pages bot commented Jun 16, 2025

Deploying matrix-authentication-service-docs with Cloudflare Pages

Latest commit: 35a8d6c
Status: ✅  Deploy successful!
Preview URL: https://babb9a92.matrix-authentication-service-docs.pages.dev
Branch Preview URL: https://erikj-user-password-logging.matrix-authentication-service-docs.pages.dev

@erikjohnston erikjohnston force-pushed the erikj/user_password_logging branch from 3b8d1f9 to b6595ae Compare June 16, 2025 12:53
This gets a bit involved, but should help us separate "expected" errors
(password mismatch) vs "unexpected" errors (wrong hash algorithm, etc).
@erikjohnston erikjohnston force-pushed the erikj/user_password_logging branch from b6595ae to 485a577 Compare June 16, 2025 13:04
@erikjohnston erikjohnston marked this pull request as ready for review June 16, 2025 13:12
@erikjohnston
Member Author

Oh, also bonus third commit: differentiating between errors and password mismatches in the metrics, as I think we care differently about them.

@erikjohnston erikjohnston requested a review from sandhose June 16, 2025 14:30
  pepper: Option<&[u8]>,
- ) -> Result<(), anyhow::Error> {
- match self {
+ ) -> Result<PasswordVerificationResult, anyhow::Error> {
Member

This should be simplified with just a boolean

Member Author

The reason for using PasswordVerificationResult is so that we can use #[must_use], which a) helped a lot in the refactor to ensure we changed all call sites, and b) ensures future uses don't forget to actually check the result (this is a common issue with verify functions returning bools).
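To illustrate the #[must_use] point, here is a minimal sketch, not the code from this PR. The enum shape, the VerifyError type and the toy verify function (which compares plain strings rather than hashes) are all assumptions made for the example.

```rust
/// Hypothetical stand-in for the PR's result type; the real definition may differ.
#[must_use = "check whether the password actually matched"]
#[derive(Debug, PartialEq, Eq)]
pub enum PasswordVerificationResult<T = ()> {
    /// The password matched.
    Success(T),
    /// The password did not match (an expected outcome).
    Failure,
}

/// Hypothetical error for unexpected problems, e.g. a malformed stored hash.
#[derive(Debug, PartialEq, Eq)]
pub struct VerifyError(pub String);

/// Toy verifier: compares plain strings instead of real password hashes.
pub fn verify(
    stored: &str,
    candidate: &str,
) -> Result<PasswordVerificationResult, VerifyError> {
    if stored.is_empty() {
        return Err(VerifyError("stored hash is empty".to_owned()));
    }
    if stored == candidate {
        Ok(PasswordVerificationResult::Success(()))
    } else {
        Ok(PasswordVerificationResult::Failure)
    }
}

/// Call site that handles all three outcomes separately.
pub fn login_outcome(stored: &str, candidate: &str) -> &'static str {
    match verify(stored, candidate) {
        Ok(PasswordVerificationResult::Success(())) => "success",
        Ok(PasswordVerificationResult::Failure) => "mismatch",
        Err(_) => "error",
    }
}
```

With a plain Result<bool, _>, a statement like `verify(a, b)?;` compiles without complaint even though the bool is thrown away. With the #[must_use] type, the same statement triggers the unused_must_use lint (a warning by default).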

Member Author

We could potentially change it to be type PasswordVerificationResult<T> = Result<T, ()> too, but that just might be even more confusing to use.

Comment on lines +169 to 170
+ tracing::warn!("Invalid login form: {form_state:?}");
  PASSWORD_LOGIN_COUNTER.add(1, &[KeyValue::new(RESULT, "error")]);
Member

I think it makes sense to add an additional label on the metric to attach the reason for each failure, which will be easier to look at in graphs than in logs

Member Author

I'm not sure. It'd only be useful if we can capture enough information in the metrics to be able to debug it, but we also can't add much cardinality to the metrics without causing issues for Prometheus.

TBH, so long as we actually have what is going wrong in the logs then it's probably easy enough to just check in there (or in tracing, etc) 🤷
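One way to picture the trade-off above is a sketch in which the metric label comes from a small fixed enum, so the number of time series stays bounded, while the free-form detail goes to the logs. LabelledCounter is a std-only stand-in written for this example, not the OpenTelemetry API, and all names here are hypothetical.

```rust
use std::collections::HashMap;

/// Fixed set of outcomes, so the `result` label can only take three values.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum LoginResult {
    Success,
    Mismatch,
    Error,
}

impl LoginResult {
    /// Label value attached to the metric.
    pub fn as_label(self) -> &'static str {
        match self {
            LoginResult::Success => "success",
            LoginResult::Mismatch => "mismatch",
            LoginResult::Error => "error",
        }
    }
}

/// Std-only stand-in for a labelled counter (not a real metrics library).
#[derive(Default)]
pub struct LabelledCounter {
    counts: HashMap<&'static str, u64>,
}

impl LabelledCounter {
    /// Add `value` to the series identified by the result label.
    pub fn add(&mut self, value: u64, result: LoginResult) {
        *self.counts.entry(result.as_label()).or_insert(0) += value;
    }

    /// Current value of one series (0 if it was never incremented).
    pub fn get(&self, result: LoginResult) -> u64 {
        self.counts.get(result.as_label()).copied().unwrap_or(0)
    }

    /// Number of distinct series created so far; at most three here.
    pub fn series(&self) -> usize {
        self.counts.len()
    }
}

/// Count the outcome under a bounded label and log the unbounded detail.
pub fn record_login(counter: &mut LabelledCounter, result: LoginResult, detail: &str) {
    counter.add(1, result);
    // High-cardinality information stays out of the metric labels.
    eprintln!("login result={} detail={detail}", result.as_label());
}
```

A per-failure reason label would fit this shape only if the reasons were themselves a small closed set; anything user- or input-derived would have to stay in the log line.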

}
}

impl Display for User {
Member

I don't like the idea of adding a display implementation for things in datamodel, unless the whole thing has an obvious string representation

Member Author

Mmm, have removed for now. Although I do suspect we'll run into this later where we want to ergonomically log the user, so not sure what the best plan there is


  #[error("password verification failed")]
- PasswordVerificationFailed(#[source] anyhow::Error),
+ PasswordVerificationFailed,
Member

To be consistent with the rest, split this into PasswordMismatch and PasswordVerificationFailed(#[source] anyhow::Error), and mark the second as an internal server error like we do for ProvisionDeviceFailed

Member Author

Have renamed. Errors raised during verification are currently going to InternalError. Is it helpful to split that out into a separate variant? I wouldn't have thought so, as they're unexpected internal errors at that point.
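For reference, the split being discussed could look roughly like the sketch below. RouteError, its variants and the status codes are hypothetical and use only the standard library; the actual route error type in this PR uses thiserror and may map errors differently.

```rust
use std::fmt;

/// Hypothetical route error, separating the expected case from unexpected ones.
#[derive(Debug)]
pub enum RouteError {
    /// Expected: the user supplied the wrong password.
    PasswordMismatch,
    /// Unexpected: something went wrong while verifying (bad hash, etc.).
    Internal(String),
}

impl fmt::Display for RouteError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            RouteError::PasswordMismatch => write!(f, "password mismatch"),
            RouteError::Internal(reason) => write!(f, "internal error: {reason}"),
        }
    }
}

impl std::error::Error for RouteError {}

impl RouteError {
    /// Assumed mapping: mismatches are client errors, the rest are server errors.
    pub fn status_code(&self) -> u16 {
        match self {
            RouteError::PasswordMismatch => 403,
            RouteError::Internal(_) => 500,
        }
    }

    /// Only unexpected failures should count towards error alerting.
    pub fn is_internal(&self) -> bool {
        matches!(self, RouteError::Internal(_))
    }
}
```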

  password: Zeroizing<String>,
  hashed_password: String,
- ) -> Result<Option<(SchemeVersion, String)>, anyhow::Error> {
+ ) -> Result<PasswordVerificationResult<Option<(SchemeVersion, String)>>, anyhow::Error> {
Member

Not the prettiest, but this could return Result<bool, Option<(SchemeVersion, String)>>. I don't think adding this new result type makes sense
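To make the four states in the nested return type from the diff above easier to compare, here is a small sketch. The SchemeVersion alias, the String error type and the describe helper are assumptions for illustration, and the enum is a stand-in for the PR's type.

```rust
/// Assumption: the scheme version is a small integer; the real type may differ.
pub type SchemeVersion = u16;

/// Stand-in for the PR's result type.
#[must_use]
#[derive(Debug, PartialEq, Eq)]
pub enum PasswordVerificationResult<T = ()> {
    Success(T),
    Failure,
}

/// Outcome of verifying a password and optionally rehashing it.
/// A plain String stands in for anyhow::Error to keep the sketch std-only.
pub type VerifyAndUpgrade =
    Result<PasswordVerificationResult<Option<(SchemeVersion, String)>>, String>;

/// Spell out what each of the four possible states means.
pub fn describe(outcome: VerifyAndUpgrade) -> String {
    match outcome {
        // Unexpected failure: unknown scheme, corrupted hash, and so on.
        Err(e) => format!("unexpected error: {e}"),
        // Expected failure: the password was simply wrong.
        Ok(PasswordVerificationResult::Failure) => "password mismatch".to_owned(),
        // Matched, and the stored hash needs no upgrade.
        Ok(PasswordVerificationResult::Success(None)) => {
            "matched, stored hash is current".to_owned()
        }
        // Matched, and a new hash was produced with a newer scheme.
        Ok(PasswordVerificationResult::Success(Some((version, _new_hash)))) => {
            format!("matched, rehashed with scheme {version}")
        }
    }
}
```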

@erikjohnston erikjohnston requested a review from sandhose June 20, 2025 09:12
@erikjohnston erikjohnston changed the title Add logging to login on failure Add logging to /login failure Jun 23, 2025
@sandhose
Member

sandhose commented Jul 8, 2025

Sorry that I've missed the re-review request

@sandhose sandhose merged commit 38088e5 into main Jul 8, 2025
20 checks passed
@sandhose sandhose deleted the erikj/user_password_logging branch July 8, 2025 16:30
@sandhose sandhose added A-Logging Related to logs output by the service T-Task Refactoring, removal, replacement, enabling or disabling functionality, other engineering tasks. labels Jul 22, 2025