Skip to content

Conversation

@eth3lbert
Copy link
Contributor

@eth3lbert eth3lbert commented Feb 6, 2025

The total number of versions is one of the blockers for avoiding loading full versions within the application. To address this, we can either send a separate request to retrieve the number, or, as suggested by @Turbo87, store the value in db and include it in the crate response. The latter approach seems more promising, as it eliminates the need for an extra request solely to obtain the total.

To ensure the value is up-to-date and maintain deployment convenience, the implementation will be separated into multiple PRs:

@eth3lbert eth3lbert added C-internal 🔧 Category: Nonessential work that would make the codebase more consistent or clear A-backend ⚙️ labels Feb 6, 2025
@eth3lbert
Copy link
Contributor Author

eth3lbert commented Feb 6, 2025

Admin could leverage the following SQL to set the num_versions values:

WITH filtered AS (
	SELECT crate_id
	FROM default_versions
	WHERE num_versions is NULL
	ORDER BY crate_id
	-- the limit can be tuned or removed as needed.
	LIMIT 10000
), to_update AS (
	SELECT crate_id, count(*) AS num_versions
	FROM versions
	JOIN filtered USING (crate_id)
	GROUP BY crate_id
)
UPDATE default_versions
SET num_versions = to_update.num_versions
FROM to_update
WHERE default_versions.crate_id = to_update.crate_id;

@eth3lbert eth3lbert force-pushed the total-crate-versions branch 2 times, most recently from 5cd638f to ffde83a Compare February 6, 2025 08:56
@eth3lbert eth3lbert force-pushed the total-crate-versions branch from ffde83a to 739b0d3 Compare February 6, 2025 10:50
@eth3lbert eth3lbert changed the title Add default_versions.total column Add default_versions.num_versions column Feb 6, 2025
@eth3lbert eth3lbert force-pushed the total-crate-versions branch from 739b0d3 to 46f6a3d Compare February 6, 2025 13:03
Copy link
Member

@Turbo87 Turbo87 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm good with this, but would appreciate a quick glance from @LawnGnome before we move forward :)

@Turbo87 Turbo87 requested a review from LawnGnome February 6, 2025 16:17
Copy link
Contributor

@LawnGnome LawnGnome left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None of this is blocking, but here are my thoughts:

Firstly, default_versions is now not as accurate a name for the table. It's probably not worth renaming right now, but computed_versions might be closer to what we're doing.

Secondly, I don't love the extra complexity in the publish endpoint, honestly. I know some of that is transitional, but I get slightly twitchy any time denormalised table updates happen in one place outside of the model layer, because that makes it easy to not do it in unusual circumstances.

For default version calculation, we don't really have a choice (since it's — at best — challenging to do in SQL), but it feels like we could calculate num_versions with a trigger on the versions table, which would get rid of that potential issue.

I'm not saying we must use a trigger, and I wouldn't block the merging of this PR as-is, but @eth3lbert, I'm interested in whether you considered that here?

@Turbo87
Copy link
Member

Turbo87 commented Feb 7, 2025

but it feels like we could calculate num_versions with a trigger on the versions table, which would get rid of that potential issue.

that's a good point. if we can do this within the database then we could avoid a lot of the application code complexity.

I assume a "insert version => increment num_versions, delete version => decrement num_versions" might be sufficient with a one-time query to set the initial counts.

@eth3lbert
Copy link
Contributor Author

I'm not saying we must use a trigger, and I wouldn't block the merging of this PR as-is, but @eth3lbert, I'm interested in whether you considered that here?

Good suggestion! I totally forgot about the trigger function here. Triggers can be a bit tricky sometimes, but we already have similar one for categories, so I'm happy to use a trigger. Thanks again!

@eth3lbert
Copy link
Contributor Author

I assume a "insert version => increment num_versions, delete version => decrement num_versions" might be sufficient with a one-time query to set the initial counts.

Yeah, this is basically what I initially thought. With this, a default value of 0 seems more suitable!

@eth3lbert eth3lbert force-pushed the total-crate-versions branch 2 times, most recently from 536b190 to b8a25fb Compare February 7, 2025 14:11
@Turbo87
Copy link
Member

Turbo87 commented Feb 7, 2025

do we still need the second commit?

@eth3lbert
Copy link
Contributor Author

do we still need the second commit?

I see this as a last resort, but it should be okay to remove.

@Turbo87
Copy link
Member

Turbo87 commented Feb 7, 2025

probably fine to drop it. but I guess we should adjust the publish endpoint to insert num_versions: 1 for the first record or change the default to 1.

@eth3lbert eth3lbert force-pushed the total-crate-versions branch from b8a25fb to f614247 Compare February 7, 2025 15:08
@eth3lbert
Copy link
Contributor Author

but I guess we should adjust the publish endpoint to insert num_versions: 1 for the first record or change the default to 1.

I added a test for this to determine if a default value of 0 is sufficient.

@eth3lbert
Copy link
Contributor Author

but I guess we should adjust the publish endpoint to insert num_versions: 1 for the first record or change the default to 1.

Yeah, it looks like we must default to 1 or set it to 1 in endpoint for first version. I actually lean towards setting it in the publish endpoint and making the default value null. This has the benefit that existing crates would remain null since null + 1 = null, making it easier to determine if this has been initialized by an admin yet. What do you think, @Turbo87 @LawnGnome?

@eth3lbert eth3lbert force-pushed the total-crate-versions branch 4 times, most recently from 323304f to 384d9c3 Compare February 7, 2025 15:51
@LawnGnome
Copy link
Contributor

Yeah, it looks like we must default to 1 or set it to 1 in endpoint for first version. I actually lean towards setting it in the publish endpoint and making the default value null. This has the benefit that existing crates would remain null since null + 1 = null, making it easier to determine if this has been initialized by an admin yet. What do you think, @Turbo87 @LawnGnome?

The only reason num_versions is nullable now is for the initial post-migration case where we haven't updated the records for existing crates, right?

If that's the case, then I'd rather set a default, make it NOT NULL, and just accept that we need to run the post-migration queries to set num_versions immediately after deployment. Since we're merging this PR before the PR that actually uses this value, that seems safe enough as long as we:

  1. Deploy this PR.
  2. Set all the num_versions values.
  3. Deploy the PR that actually uses this in the API.

@eth3lbert
Copy link
Contributor Author

just accept that we need to run the post-migration queries to set num_versions immediately after deployment. Since we're merging this PR before the PR that actually uses this value, that seems safe enough as long as we:

1. Deploy this PR.

2. Set all the `num_versions` values.

3. Deploy the PR that actually uses this in the API.

You're absolutely right. This is exactly what I expect for the deployment process. The main reason for exposing APIs in separate PRs is to ensure that the post-migration process is completed before exposing them, as you mentioned.

The only reason num_versions is nullable now is for the initial post-migration case where we haven't updated the records for existing crates, right?

Yeah, basically! And it also allows the update script to determine the initialization status by null.

On the other hand, with a default non-null value (like 1) for existing crates and a new version released before initialization, it would increment the value to 2. This means the update script loses the ability to determine the initialization status using a nullish value.

However, this isn't a major issue. It only requires adjusting the update script to update all records in a range instead of relying on the initialization state.

I'm fine with either approach. I can make the change if needed!

@Turbo87 Do you also suggest using a non-null default value?

Copy link
Member

@Turbo87 Turbo87 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you also suggest using a non-null default value?

I think I prefer nullable for now

@Turbo87 Turbo87 merged commit af35097 into rust-lang:main Feb 10, 2025
10 checks passed
@eth3lbert eth3lbert deleted the total-crate-versions branch February 11, 2025 06:22
@Turbo87
Copy link
Member

Turbo87 commented Feb 12, 2025

WITH filtered AS (
	SELECT crate_id
	FROM default_versions
	WHERE num_versions is NULL
	ORDER BY crate_id
	-- the limit can be tuned or removed as needed.
	LIMIT 10000
), to_update AS (
	SELECT crate_id, count(*) AS num_versions
	FROM versions
	JOIN filtered USING (crate_id)
	GROUP BY crate_id
)
UPDATE default_versions
SET num_versions = to_update.num_versions
FROM to_update
WHERE default_versions.crate_id = to_update.crate_id;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-backend ⚙️ C-internal 🔧 Category: Nonessential work that would make the codebase more consistent or clear

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants