This repository was archived by the owner on Apr 8, 2025. It is now read-only.

It is difficult to optimize an in-memory cache without duckdb/pandas #35

@C-Loftus

Description


Currently the EDR operations work by fetching the entire content of locations/ and then slowly filtering it step by step until you get the data the user wants. If you fetch this data from redis, you have to reconstruct the pydantic model each time, which is slow. It would be ideal to validate it once, since it rarely changes.
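A minimal sketch of the reconstruction cost described above, with stdlib stand-ins: a plain dict plays the role of redis, and `json.loads` plays the role of pydantic validation/reconstruction (all names here are hypothetical, not from the codebase):

```python
import json

# Hypothetical stand-ins: a dict for redis, json parsing for pydantic
# model validation/reconstruction.
fake_redis: dict[str, str] = {}

def fetch_locations_naive(key: str) -> dict:
    """Re-parse the cached payload on every call (slow path)."""
    # Every call repeats the parse, just as every redis fetch
    # repeats pydantic validation.
    return json.loads(fake_redis[key])

_validated: dict[str, dict] = {}

def fetch_locations_validated_once(key: str) -> dict:
    """Parse once, then reuse the already-validated object."""
    if key not in _validated:
        _validated[key] = json.loads(fake_redis[key])
    return _validated[key]

fake_redis["locations"] = json.dumps({"features": [{"id": "a"}, {"id": "b"}]})

assert fetch_locations_naive("locations") == fetch_locations_validated_once("locations")
# Repeated calls return the same in-memory object; no re-parse.
assert fetch_locations_validated_once("locations") is fetch_locations_validated_once("locations")
```

Validating once only works, of course, if the cached object is never mutated afterwards, which is exactly the tension described next.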

The easiest way to perform filtering on one big locations response is to drop items in place. The issue is that if you drop in place, you can't cache the Python object itself in memory: you have to do a copy.deepcopy() at the start of every API call (to make sure mutations don't affect other references), which effectively defeats the purpose of an in-memory cache. If you try to get around this by generating a new Python object after every filter, you add extra allocations, risk invalidating your pydantic models, and again largely defeat the purpose of in-memory caching.

The easiest way to circumvent this issue is to cache all of locations/ in a table in something like duckdb/pandas/sqlite and then perform joins directly on that. You pay an up-front cost to populate the cache, but then you have a data structure that is optimized for joins and can return filtered results more easily, without needing to mutate state in the pydantic model or any other Python object.
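A minimal sketch of that idea using stdlib sqlite3 (one of the options named above; duckdb's Python API is analogous, e.g. `duckdb.connect(":memory:")`). The table schema and column names here are hypothetical:

```python
import sqlite3

# Populate an in-memory table once, paying the up-front caching cost.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE locations (id TEXT, datetime TEXT, parameter TEXT)")
con.executemany(
    "INSERT INTO locations VALUES (?, ?, ?)",
    [
        ("a", "2024-01-01", "temp"),
        ("b", "2024-01-02", "temp"),
        ("c", "2024-01-02", "wind"),
    ],
)

# Filtering is a query, not a mutation: the cached table is never
# modified, so no deepcopy is needed per API call.
rows = con.execute(
    "SELECT id FROM locations WHERE datetime = ? AND parameter = ?",
    ("2024-01-02", "temp"),
).fetchall()
assert rows == [("b",)]
```

Because every EDR query is expressed as a SELECT, concurrent API calls can share the one cached table without any of the copy-or-mutate problems above.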

This isn't a big deal at the moment, and redis is sufficient, but it is worth calling out for tracking as we iterate.
