Changelist generation doesn't scale #18

@giorgiobasile

In the ChangeListExecutor class, the changelist_generator loads every resource from a previously generated resourcelist through the update_previous_state() method. This was reasonable for the filesystem-centric approach of rspub-core, but it doesn't scale in general (I'm working with ~70 million resources).
I guess the only reason for doing so is to be able to perform this check, which is again reasonable when the resources live on a file system. What should happen instead is that the resource generator itself lists the changes and labels them as C/U/D, without relying on py-resourcesync to work them out. That calls for a dedicated generator for "change" resources (or a generator that can issue either resources or changes, depending on the strategy).
What I mean is something like:

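# Ask the generator for changes directly; each item already carries its change type.
resource_generator = self.resource_generator()
changes = {change for _count, change in resource_generator(resource_metadata)}
# Split by change type, with no lookup against a previous resourcelist.
created = [r for r in changes if r.change == "created"]
updated = [r for r in changes if r.change == "updated"]
deleted = [r for r in changes if r.change == "deleted"]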
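
To make the idea a bit more concrete, here is a minimal standalone sketch. It is not py-resourcesync code: the `Change` class, `change_generator`, `partition_changes` and the example URIs are all hypothetical names made up for illustration. It only assumes that the backend can already report what changed, and that the generator yields `(count, change)` pairs like in the snippet above.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List, Tuple


@dataclass(frozen=True)
class Change:
    """Hypothetical stand-in for a resource that already carries its change type."""
    uri: str
    change: str  # "created", "updated" or "deleted"


def change_generator(records: Iterable[Tuple[str, str]]) -> Iterator[Tuple[int, Change]]:
    """Hypothetical generator yielding (count, change) pairs.

    `records` stands in for whatever backend already knows what changed
    (a database change log, a message queue, ...). This is an assumption.
    """
    for count, (uri, change) in enumerate(records, start=1):
        yield count, Change(uri=uri, change=change)


def partition_changes(pairs: Iterable[Tuple[int, Change]]) -> Dict[str, List[Change]]:
    """Group changes by type in a single pass, without a previous resourcelist."""
    grouped: Dict[str, List[Change]] = defaultdict(list)
    for _count, change in pairs:
        grouped[change.change].append(change)
    return grouped


# Made-up example data: (uri, change type) as reported by the backend.
records = [
    ("http://example.org/res/1", "created"),
    ("http://example.org/res/2", "updated"),
    ("http://example.org/res/3", "deleted"),
    ("http://example.org/res/4", "updated"),
]
grouped = partition_changes(change_generator(records))
print({kind: len(items) for kind, items in grouped.items()})
```

This way memory use grows with the number of changes rather than with the total number of resources, and the grouping needs one pass instead of three.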

What do you think? Does it sound reasonable?
