-
-
Notifications
You must be signed in to change notification settings - Fork 3.7k
Build: Add BuildConfig model for deduplicated config storage #12646
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Co-authored-by: humitos <[email protected]>
Co-authored-by: humitos <[email protected]>
_config attribute data modelling
humitos
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good first step towards the end goal. Removing the _config and using this new field will reduce the complexity a lot and will give us a lot more power to perform simple queries over config files 🎉
@copilot please, take a look at the feedback I gave you and fix it.
| null=True, | ||
| blank=True, | ||
| ) | ||
| readthedocs_yaml_data = models.ForeignKey( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we OK with this name? Other alternatives are:
readthedocs_yaml(no_dataat the end)yaml_configconfig_yamlconfig_file
cc @ericholscher what do you think?
… basic Django tests Co-authored-by: humitos <[email protected]>
|
This is a good starting point to create the field and start populating it. We can move forward here and then work on the part two of this work: use the new field everywhere, and then the part three: remove the old field. @ericholscher let me know if this plan sounds good to you. |
The
Build._configfield currently stores configuration data in two ways: the full YAML config, or a reference to another build's config ({"__config": <build_id>}). This dual-purpose design complicates querying (e.g., "which projects use ubuntu-22.04?") and only deduplicates within a project's consecutive builds.Changes
New
BuildConfigmodel withdataJSONField andunique=TrueconstraintTimeStampedModelfor audit trailAdded
Build.readthedocs_yaml_dataForeignKeyBuildConfigwithSET_NULLon deleteBuildConfig.objects.filter(data__build__os="ubuntu-22.04")_configuses reference styleUpdated
Build.save()to populate both fieldsget_or_create()to leverage unique constraintreadthedocs_yaml_databy following reference chains when neededMigration
0066_add_buildconfig_model.pymarkedSafe.before_deploy()Focused integration tests validating Build and BuildConfig interaction
Example
Rollout
This implements Step 1 of the multi-step migration. The
_configfield remains functional for backward compatibility. Future steps will:_configdata toBuildConfiginstancesreadthedocs_yaml_data_configfield and helper methodsOriginal prompt
This section details on the original issue you should resolve
<issue_title>Build: simplify
_configattribute data modelling</issue_title><issue_description>We are currently using a
models.JSONFieldfor theBuild._configattribute. This field can be used in two different ways:This is basically to avoid duplicating the data on our database. However, it only avoids duplicated config on the same projects and only for consecutive builds.
I found this approach pretty confusing when exploring
Buildobjects because we have to do extra operations to find out the real YAML used for that build (see for example theconfig @propertyatreadthedocs.org/readthedocs/builds/models.py
Lines 313 to 331 in 12cbb3d
Also, it makes hard to answer questions like "What are the projects using
build.os: ubuntu-22.04?" in an easy way and many other similar questions that require querying the config file.New approach proposal
Due to these limitations and complications, I want to propose creating a new model called
BuildConfigthat's used as a foreign key from theBuildmodel:Then the new model would be:
Benefits
BuildConfig.objects.get_or_create(data=...)).select_related("readthedocs_yaml_data")BuildConfig.objects.filter(data__build__os="ubuntu-22.04").projects.count()to answer my previous example question)@propertyto get the YAML for a particular build@config.setterhelperBuild.saveself._config_changedSummarizing, this approach keeps the same features but simplifies the application and modelling and gives us more features to analyze in an easier way platform usage and make better decisions based on them.
Rollout proposal
Build.readthedocs_yaml_datafield and create the newBuildConfigmodelBuild.readthedocs_yaml_dataandBuild._configwhile doing the migrationBuild._configintoBuildConfigand linkBuild.readthedocs_yaml_datato themBuild.readthedocs_yaml_datafieldBuild._configfield</issue_description>Comments on the Issue (you are @copilot in this section)
@humitos Another important benefit that I'm seeing here about making querying our database easier: while working on the deprecation of config file and `build.image` I had to make multiple queries and iterate over all the builds returned to get the _real_ value of `Build._config` https://github.com/readthedocs/readthedocs.org/blob/1b74999add86d6a45f4c6f6149dcb21d5ef2322b/readthedocs/projects/tasks/utils.py#L235-L251Instead, we would be able to do all that in just one query:
That query will give me "projects building with
build.image: readthedocs/build:lateston their default version with a successful build in the last year", which is a lot easier to read, maintain and communicate to the rest of the team.Note that with the current structure, we won't get the accurate data because those builds where the config file is identical to their previous build will have
_configdefined as{"__config": 12345678}"</comment_new><comment_new>@ericholscher
Yeah, it seems like our implementation is just a hacky way of doing a m2m, without any of the nice benefits. This seems like an obvious thing to move forward...
_configattribute data modelling #10586💬 We'd love your input! Share your thoughts on Copilot coding agent in our 2 minute survey.