Conversation
Codecov Report ❌ Patch coverage is
Additional details and impacted files

```
@@            Coverage Diff             @@
##           master    #4236      +/-   ##
==========================================
+ Coverage   88.23%   88.30%   +0.07%
==========================================
  Files        1765     1765
  Lines       44344    44361      +17
  Branches      686      686
==========================================
+ Hits        39126    39175      +49
+ Misses       5202     5170      -32
  Partials       16       16
```
Force-pushed from b2037c0 to 9d65d4f
This is not true, those lines are covered by https://github.com/3scale/porta/blob/513293b4e74515fced4aaef41a07e5e442fb6720/features/provider/admin/user/access_tokens.feature, but that is somehow ignored by Codecov. Probably a misconfiguration.
app/models/access_token.rb
Outdated
```diff
-def self.find_from_value(value)
-  find_by(value: value.to_s.scrub)
+def self.find_from_value(plaintext_value)
+  return nil if plaintext_value.blank?
+
+  scrubbed = plaintext_value.to_s.scrub
+  digest = compute_digest(scrubbed)
+
+  # Fast path: find by digest (new/migrated tokens)
+  token = find_by(value: digest)
+  return token if token
+
+  # Legacy tokens can't be exactly 96 chars (that's our hash length)
+  # This prevents using a leaked hash directly as a token
+  return nil if scrubbed.length == HASHED_TOKEN_LENGTH
+
+  # Slow path: find by plaintext (legacy tokens)
+  token = find_by(value: scrubbed)
+  return nil unless token
+
+  # Migrate on use: replace plaintext with hash
+  token.migrate_to_hashed!(scrubbed)
+  token
```
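To make the migrate-on-use flow concrete, here is a toy, self-contained sketch with an in-memory hash standing in for the access_tokens table. `TokenStore` and its methods are illustrative assumptions, not the real model; only the overall flow mirrors the diff above.

```ruby
require 'digest'

HASHED_TOKEN_LENGTH = 96 # length of a hex-encoded SHA-384 digest

# Toy in-memory stand-in for the access_tokens table: value column -> id.
class TokenStore
  def initialize
    @rows = {}
  end

  def add_legacy(plaintext, id)
    @rows[plaintext] = id
  end

  def find_from_value(plaintext_value)
    return nil if plaintext_value.nil? || plaintext_value.empty?

    scrubbed = plaintext_value.to_s.scrub
    digest = Digest::SHA384.hexdigest(scrubbed)

    # Fast path: look up by digest (new/migrated tokens)
    return @rows[digest] if @rows.key?(digest)

    # A leaked 96-char hash must not be usable as a token itself
    return nil if scrubbed.length == HASHED_TOKEN_LENGTH

    # Slow path: look up by plaintext (legacy tokens), then migrate on use
    id = @rows.delete(scrubbed)
    return nil if id.nil?

    @rows[digest] = id
    id
  end
end

store = TokenStore.new
store.add_legacy('legacy-token', 1)
store.find_from_value('legacy-token') # => 1 (slow path; the row is migrated)
store.find_from_value('legacy-token') # => 1 (fast path from now on)
```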
Why not make a database migration instead of having another piece of crazy logic to handle legacy and new data, and never be sure whether the data was eventually fully converted, and have to notify customers when we finally decide to remove the legacy support, wait for them, check they actually stopped using this, wait more, etc.?
I agree on adding a DB migration or a migration script, that was my plan. But I don't think this is crazy logic or that it can be replaced by a migration. Backwards compatibility is mandatory because otherwise we couldn't deploy without downtime, and the logic is really simple, just a few additional lines in one single method.
The safe path to proceed is to deploy code that is compatible with pre and post migration state, then run the migration at any moment, then in the next deploy, remove the code to support legacy tokens.
It's as simple as creating a jira issue to remove the legacy support and schedule it to the release after the one including this PR. I don't see any drama, just regular work.
> Backwards compatibility is mandatory because otherwise we couldn't deploy without downtime
if we start with this, then pure database migration will not be really feasible as far as I can tell.
I'm not talking about ruby code migration but pure query migration that can execute very fast on the DB server side.
A short downtime is always acceptable if it is short. For example a couple of minutes is totally fine.
While a pure ruby migration might be super slow. On the other hand it can be lazily executed.
But this lazy execution can only reliably be done on clusters we maintain; I'm not sure we can rely on customers running this. That's why I think for on-prem a one-step migration without gimmicks would be much better.
> A short downtime is always acceptable if it is short. For example a couple of minutes is totally fine.
Well, I think no down time is even better.
> I'm not talking about ruby code migration but pure query migration that can execute very fast on the DB server side.
Like an SQL query that hashes the values? Is there a way to do it reliably on all the DBs, including Internet Explor... I mean Oracle? Is it really so fast?
Everything is a trade-off. I did it this way because, for me, providing an easy and less risky way to migrate is worth the tiny extra complexity. Forcing the client to have downtime and execute a migration written in SQL that must work on all DBs and versions we support is more risky and complicated for them, I think. And the only advantage would be that it's allegedly less work for us, and maybe not even that.
@mayorova @Madnialihussain Please break the tie.
From our discussion, I suggest this approach:
- New tokens are created hashed
- Hashed tokens are identified by the prefix with the algorithm
- When receiving a request
- Find by hashed
- If it fails, find by clear
- If it fails, reject
- Clear tokens are not migrated on the fly
Deploy paths:
- SaaS:
- Deploy code
- Leave it running to ensure we don't rollback
- Run the migration
- Two queries will only happen in the lapse between deploying code and running the migration
- On premises:
- Have downtime
- Update the code
- Run the migration
- Start server
- They will never run two queries per hash
Result:
| One time deploy | Backwards compatibility | No downtime | Rollback | Leak-safe | No DDL Migration | No Corner cases |
|---|---|---|---|---|---|---|
| ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
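The lookup flow proposed above (find by hashed, fall back to clear, reject; no on-the-fly migration) could be sketched roughly like this. The prefix and every name here are illustrative assumptions, not the actual implementation:

```ruby
require 'digest'

PREFIX = 'SHA384$' # assumed algorithm prefix identifying hashed values

def digest_for(plaintext)
  PREFIX + Digest::SHA384.hexdigest(plaintext)
end

# rows: value column -> record; returns the record or nil (reject)
def lookup(rows, presented)
  rows[digest_for(presented)] || # 1. find by hashed
    rows[presented]              # 2. fall back to clear; nil => reject
end

rows = { digest_for('new-token') => :hashed_record, 'old-token' => :clear_record }
lookup(rows, 'new-token') # => :hashed_record (one query)
lookup(rows, 'old-token') # => :clear_record (two queries, until the migration runs)
lookup(rows, 'bogus')     # => nil (rejected)
```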
WDYT?
In fact, it's almost a one-time deploy, because the second phase is only a cleanup of dead code.
I'm fine with whatever you find yourself comfortable with. I'm only not sure about the 2 queries, but provided that on the first request the key will be migrated, and 2 queries will consistently run only for missing keys, it should perhaps be fine.
In my last suggestion, I added:
> Clear tokens are not migrated on the fly
This is to provide a way back. But two queries will only happen until they run the migration.
On the other hand, I think we can actually deploy in two phases for SaaS and a single phase for on-premises, since we can release the two phases altogether in one release if they are in fact having downtime anyway.
It is still not ultimately downgradable, because new tokens may be created while running the new version. Still, downgradability is much improved.
```diff
 def create
   @presenter = AccessTokensNewPresenter.new(current_account)
-  create! do |success, _failure|
+  create! do |success, failure|
```
This syntax belongs to inherited_resources 😅 As we're trying to get rid of it, I'd suggest using a standard Rails approach to avoid conflicts.
Exactly, this is how I learnt about that gem.
> As we're trying to get rid of it, I'd suggest using a standard Rails approach to avoid conflicts.
Sure!
```diff
 def self.random_id
-  SecureRandom.hex(32)
+  SecureRandom.hex(48)
```
Hm, the existing 64-character tokens are represented as a 96-character hash, it seems. Why is it necessary to also increase the generated plain value size? 🤔
While we can do it, I think it's better if we can avoid it: maybe some customers also have assumptions based on the token size (even though it might not be wise of them).
Because it's more resistant to quantum computers, if those actually work some day.
+1 to generating tokens that contain at least as many random bits as the hash used to store them. This is not expensive, so it shouldn't be noticeable in performance.
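For illustration, the sizes line up as follows: `SecureRandom.hex(n)` yields 2n hex characters, and a hex-encoded SHA-384 digest is always 96 characters, so `hex(48)` gives new plaintext tokens exactly as many characters (and 384 random bits, matching the digest width) as the stored hash:

```ruby
require 'securerandom'
require 'digest'

old_plain = SecureRandom.hex(32)               # 64 hex chars (~256 random bits)
new_plain = SecureRandom.hex(48)               # 96 hex chars (~384 random bits)
digest    = Digest::SHA384.hexdigest(new_plain)

old_plain.length # => 64
new_plain.length # => 96
digest.length    # => 96, same as the plaintext length with hex(48)
```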
Force-pushed from 9d65d4f to bae2dcc
* New tokens are generated already hashed
* Old tokens are hashed the first time they are used
Co-Authored-By: Claude <noreply@anthropic.com>
* Remove dead code
* Render on create instead of redirecting, so the user can see the token while still in memory
* New template to show the generated token. It was embedded in the index template before
Co-Authored-By: Claude <noreply@anthropic.com>
Force-pushed from c3be649 to e58951c
❌ 55 blocking issues (55 total)
I made a few changes, my last proposal:

Changes:
Deploy paths:
Support table:
Performance tests:
Results:
akostadinov left a comment
Awesome work!
I have a few nitpick comments but overall very solid, thanks!
```diff
-def self.find_from_value(value)
-  find_by(value: value.to_s.scrub)
+  scrubbed = plaintext_value.to_s.scrub
```
Do we need scrub? Just a nitpick about wasting compute resources.
Yeah, we need it because we re-use it for the legacy comparison. We can remove it in the follow-up PR.
```slim
section id="access-tokens"
  h2 Access Tokens
  p
    ' Access tokens are personal tokens that let you authenticate against the Account Management API, the Analytics API and the Billing API through HTTP Basic Auth. You can create multiple access tokens with custom scopes and permissions. We suggest you create tokens with the minimal scopes & permissions needed for the task at hand. Use Access Tokens from within the
```
Can we use i18n for all strings? 👼 Not blocking merge, it can be done later.
```ruby
def new
  @presenter = AccessTokensNewPresenter.new(current_account)
  @access_token = access_tokens.build
```
I assume we can't easily avoid instantiating an object here and generating a key, but at the same time generating random data should be fast enough nowadays...
```ruby
  "WHERE value NOT LIKE '#{DIGEST_PREFIX}%' LIMIT #{BATCH_SIZE}"
elsif System::Database.postgres?
  "UPDATE access_tokens SET value = '#{DIGEST_PREFIX}' || encode(sha384(value::bytea), 'hex') " \
  "WHERE id IN (SELECT id FROM access_tokens WHERE value NOT LIKE '#{DIGEST_PREFIX}%' LIMIT #{BATCH_SIZE})"
```
Now I'm thinking, postgres may potentially not have the crypto extension enabled. Maybe we should have some failsafe solution for this case 😬
But this can be discussed after merge, to see if we need one and what exactly.
We don't use the extension; that's for old postgres versions. In PG 14 there's a builtin alternative, the sha384() function I'm using here.
https://www.postgresql.org/docs/14/functions-binarystring.html
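As a sanity sketch, the Ruby-side digest should produce the same string the SQL UPDATE writes. The sample value below is made up, and the prefix follows the model diff in this PR; `sha384()` has been a PostgreSQL built-in since version 11:

```ruby
require 'digest'

DIGEST_PREFIX = 'SHA384|' # per the model diff in this PR (later changed to '$')

legacy_value = 'some-legacy-token'
migrated = DIGEST_PREFIX + Digest::SHA384.hexdigest(legacy_value)

# PostgreSQL equivalent of the expression above:
#   'SHA384|' || encode(sha384('some-legacy-token'::bytea), 'hex')

migrated.length # => 103 (7-char prefix + 96-char hex digest)
migrated.start_with?(DIGEST_PREFIX) # => true
```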
```ruby
  assert_equal legacy_value, @token.reload.read_attribute(:value)
end

def test_find_from_value_rejects_leaked_hash_as_token
```
This one seems to be a dup of "authentication with leaked database hash fails".
Not a dup: one is a unit test and the other is an integration test. They test different layers.
```ruby
  assert @token.reload.read_attribute(:value).start_with?(AccessToken::DIGEST_PREFIX)
end

def test_find_from_value_finds_legacy_token
```
This seems to be a dup of "authentication with legacy unmigrated token succeeds".
Same. Different level testing.
app/models/access_token.rb
Outdated
```diff
@@ -1,10 +1,14 @@
 class AccessToken < ApplicationRecord
+  DIGEST_PREFIX = 'SHA384|'.freeze
```
Overnight I had an insight that this prefix is not ideal. It will be a constant nuisance to use on the command line:

```
$ echo SHA384|asdasd
bash: asdasd: command not found...
```

The | character always needs to be quoted or escaped. It would be much more user-friendly to have a shorter prefix without special characters, for example s2_

```
$ echo s2_asdasd
s2_asdasd
```
The _ isn't good either, because it's a wildcard in mysql:
https://dev.mysql.com/doc/refman/8.4/en/string-comparison-functions.html#operator_like
Let's use $, which is what bcrypt uses and which we already use in our DB for passwords. 65a75ea
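The MySQL concern can be shown with a tiny pure-Ruby model of LIKE semantics (`like_match?` is an illustrative helper, not a real API): `_` matches any single character, so a `s2_` prefix check would also match rows it shouldn't, while `$` is literal in a LIKE pattern:

```ruby
# Minimal pure-Ruby model of SQL LIKE: '%' => any run of chars, '_' => any single char.
def like_match?(pattern, str)
  body = pattern.chars.map do |c|
    case c
    when '%' then '.*'
    when '_' then '.'
    else Regexp.escape(c)
    end
  end.join
  /\A#{body}\z/.match?(str)
end

like_match?('s2_%', 's2_abc') # => true, as intended
like_match?('s2_%', 's29abc') # => true! '_' matched '9': a false positive
like_match?('s2$%', 's2$abc') # => true
like_match?('s2$%', 's29abc') # => false: '$' is literal in LIKE
```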
I hope you can sleep now.
What this PR does / why we need it:
This adds some protection to access tokens in our DB, so tokens are not visible in the case the DB is leaked.
This implements two measures to increase protection:
Since we want to make this backwards compatible, the current code still allows plain text tokens to exist and be used to authenticate.
The code is designed to be super simple. The day we want to remove the migration code and the support for plain text tokens, we'll only have to remove a few lines in the `find_from_value` method.

A brief explanation of the new design:

- The `value` attribute will always unconditionally hold the value in the DB column. That is: plain text if unmigrated, hash if migrated.
- `plaintext_value` holds the plain text value of a hashed token. It's not persisted, so it's lost when the instance is removed. This attribute is designed to provide a way to return the original value once in the request response, be it UI or API. The user will have that only opportunity to store it somewhere else. After that, we won't hold the original value anywhere.

Which issue(s) this PR fixes
https://issues.redhat.com/browse/THREESCALE-12408
Verification steps
Special notes for your reviewer:
How to review:
There are 90+ files edited in this PR but most of them are just the result of replacing `access_token.value` by `access_token.plaintext_value` in tests. This PR is better reviewed commit by commit: