This repository was archived by the owner on Apr 8, 2025. It is now read-only.

It is difficult to optimize an in-memory cache without duckdb/pandas #35

@C-Loftus

Description


Currently the EDR operations work by fetching the entire content of locations/ and then slowly filtering it step by step until you get the data the user wants. If you fetch this data from redis, you have to reconstruct the pydantic model each time, which is slow. It would be ideal to validate it once, since it rarely changes.
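A minimal sketch of the reconstruction cost described above, with stdlib stand-ins: a plain dict plays the role of redis, and `json.loads` plays the role of pydantic validation/reconstruction (all names here are hypothetical, not from the codebase):

```python
import json

# Hypothetical stand-ins: a dict for redis, json parsing for pydantic
# model validation/reconstruction.
fake_redis: dict[str, str] = {}

def fetch_locations_naive(key: str) -> dict:
    """Re-parse the cached payload on every call (slow path)."""
    # Every call repeats the parse, just as every redis fetch
    # repeats pydantic validation.
    return json.loads(fake_redis[key])

_validated: dict[str, dict] = {}

def fetch_locations_validated_once(key: str) -> dict:
    """Parse once, then reuse the already-validated object."""
    if key not in _validated:
        _validated[key] = json.loads(fake_redis[key])
    return _validated[key]

fake_redis["locations"] = json.dumps({"features": [{"id": "a"}, {"id": "b"}]})

assert fetch_locations_naive("locations") == fetch_locations_validated_once("locations")
# Repeated calls return the same in-memory object; no re-parse.
assert fetch_locations_validated_once("locations") is fetch_locations_validated_once("locations")
```

Validating once only works, of course, if the cached object is never mutated afterwards, which is exactly the tension described next.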

The easiest way to perform filtering on one big locations response is to drop items in place. The issue is that if you drop in place, you can't cache the Python object itself in memory: you have to do a copy.deepcopy() at the start of every API call (to make sure mutations don't affect other references), which effectively defeats the purpose of an in-memory cache. If you try to get around this by generating a new Python object after every filter, you add extra allocations, risk invalidating your pydantic models, and again largely defeat the purpose of in-memory caching.

The easiest way to circumvent this issue is to cache all of locations/ in a table in something like duckdb/pandas/sqlite and then perform joins directly on that. You pay an up-front cost to populate the cache, but then you have a data structure that is optimized for joins and can return filtered results more easily, without needing to mutate state in the pydantic model or any other Python object.
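A minimal sketch of that idea using stdlib sqlite3 (one of the options named above; duckdb's Python API is analogous, e.g. `duckdb.connect(":memory:")`). The table schema and column names here are hypothetical:

```python
import sqlite3

# Populate an in-memory table once, paying the up-front caching cost.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE locations (id TEXT, datetime TEXT, parameter TEXT)")
con.executemany(
    "INSERT INTO locations VALUES (?, ?, ?)",
    [
        ("a", "2024-01-01", "temp"),
        ("b", "2024-01-02", "temp"),
        ("c", "2024-01-02", "wind"),
    ],
)

# Filtering is a query, not a mutation: the cached table is never
# modified, so no deepcopy is needed per API call.
rows = con.execute(
    "SELECT id FROM locations WHERE datetime = ? AND parameter = ?",
    ("2024-01-02", "temp"),
).fetchall()
assert rows == [("b",)]
```

Because every EDR query is expressed as a SELECT, concurrent API calls can share the one cached table without any of the copy-or-mutate problems above.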

This isn't a big deal at the moment, and redis is sufficient, but it is worth calling out for tracking as we iterate.
