Skip to content

Build: simplify _config attribute data modelling #10586

@humitos

Description

@humitos

We are currently using a models.JSONField for the Build._config attribute. This field can be used in two different ways:

  1. store the whole YAML configuration file
  2. save a build ID where the full/whole YAML configuration file is stored

This is basically to avoid duplicating the data on our database. However, it only avoids duplicated config on the same projects and only for consecutive builds.

I found this approach pretty confusing when exploring Build objects because we have to do extra operations to find out the real YAML used for that build (see for example the config @property at

@property
def config(self):
"""
Proxy to the configuration of the build.
:returns: The configuration used in the last successful build.
:rtype: dict
"""
last_build = (
self.builds(manager=INTERNAL).filter(
state=BUILD_STATE_FINISHED,
success=True,
).order_by('-date')
.only('_config')
.first()
)
if last_build:
return last_build.config
return None
)

Also, it makes hard to answer questions like "What are the projects using build.os: ubuntu-22.04?" in an easy way and many other similar questions that require querying the config file.

New approach proposal

Due to these limitations and complications, I want to propose creating a new model called BuildConfig that's used as a foreign key from the Build model:

class Build(models.Model):
    # ...
    readthedocs_yaml_data = models.ForeignKey(
        "BuildConfig"
        null=True,
        blank=True,
    )
    # ...

Then the new model would be:

class BuildConfig(TimeStampedModel):
    # ...
    data = models.JSONField(unique=True)
    # ...

Benefits

  • the "avoid duplicated" data is solved automatically by the database at a table level among all the projects, instead of per-project (e.g. BuildConfig.objects.get_or_create(data=...))
  • the previous point will reduce the size of the table considerably, since there won't be any duplicated config
  • getting the YAML for a particular build is going to be pretty fast since we have direct access to it
  • we can reduce the number of queries performed by using .select_related("readthedocs_yaml_data")
  • answer interesting/complex questions pretty easily using Django ORM (e.g. BuildConfig.objects.filter(data__build__os="ubuntu-22.04").projects.count() to answer my previous example question)
  • remove the requirement of a helper @property to get the YAML for a particular build
  • remove the requirement for a @config.setter helper
  • remove the requirement of having extra logic at Build.save
  • remove the need of using self._config_changed
  • remove having two different meanings for the same field
  • allows us to quickly show badges on "Build details page" as Anthony suggested for ext-theme

Summarizing, this approach keeps the same features but simplifies the application and modelling and gives us more features to analyze in an easier way platform usage and make better decisions based on them.

Rollout proposal

  1. add a Build.readthedocs_yaml_data field and create the new BuildConfig model
  2. start saving both fields Build.readthedocs_yaml_data and Build._config while doing the migration
  3. make a data migration to convert Build._config into BuildConfig and link Build.readthedocs_yaml_data to them
  4. make the application to use the new Build.readthedocs_yaml_data field
  5. remove the old code that uses Build._config field

Metadata

Metadata

Assignees

Labels

AcceptedAccepted issue on our roadmapFeatureNew feature

Type

No type

Projects

Status

Planned

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions