Implement TF-IDF cache loading #549

Merged
dakinggg merged 1 commit into allenai:main from cthoyt:tfidf-ann-caching
Sep 11, 2025

@cthoyt
Contributor

@cthoyt cthoyt commented Aug 30, 2025

Previously, create_tfidf_ann_index only wrote cache files to a given directory; it did not load them.

This PR adds loading of cache files when they already exist. It also includes a minor refactor to reduce duplicated code for loading these cache objects, which was also implemented in CandidateGenerator.__init__().

I've found this to be very useful for #542, where I don't want to rerun a time- and compute-intensive process to rebuild the index each time, but I also want my code to be fully reproducible and not require manually running scripts ahead of time to create cache files.
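
As a rough illustration of the load-if-present behavior described above, here is a minimal sketch. It is not the code from this PR: `load_or_build_index`, the `index.pkl` file name, and the use of pickle are all hypothetical stand-ins chosen to keep the example self-contained.

```python
import pickle
from pathlib import Path
from typing import Callable, List


def load_or_build_index(cache_dir: Path, build: Callable[[], List[str]]) -> List[str]:
    """Return the cached object if present; otherwise build it and write the cache.

    Hypothetical helper for illustration only; the names and on-disk format
    are assumptions, not the library's API.
    """
    cache_file = cache_dir / "index.pkl"  # hypothetical file name
    if cache_file.is_file():
        # Cache hit: skip the expensive rebuild and load from disk.
        with cache_file.open("rb") as f:
            return pickle.load(f)
    # Cache miss: build once, then persist the result for later runs.
    index = build()
    cache_dir.mkdir(parents=True, exist_ok=True)
    with cache_file.open("wb") as f:
        pickle.dump(index, f)
    return index
```

With this shape, the first call pays the build cost and every later call with the same directory reads the file instead, so no separate script has to be run ahead of time.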

Collaborator

@dakinggg dakinggg left a comment

Thanks, looks good to me!

@dakinggg
Collaborator

Looks like there are a couple of lint errors to resolve.

@cthoyt
Contributor Author

cthoyt commented Sep 11, 2025

@dakinggg thanks for the review (and sorry I missed the styling). I have updated it and squashed my commits together. This should be ready for merge now.

@dakinggg dakinggg merged commit ad9da74 into allenai:main Sep 11, 2025
11 checks passed
@cthoyt
Contributor Author

cthoyt commented Sep 11, 2025

@dakinggg do you think you could make a new release following this PR? thanks!

@cthoyt cthoyt deleted the tfidf-ann-caching branch September 11, 2025 19:10
@dakinggg
Collaborator

Sure, I can probably do one next week.
