Description
I think it would be nice to have a small utility data structure to fetch pretrained embeddings. I don't think this needs to be part of the finalfusion crate, since it is not really core functionality. The basic idea is:
- We'd have a repository `finalfusion-fetcher` with some metadata file (probably JSON), mapping embedding file identifiers to URLs. E.g. `fasttext.wiki.nl.fifu` could map to http://www.sfs.uni-tuebingen.de/a3-public-data/finalfusion-fasttext/wiki/wiki.nl.fifu
- A small crate (possibly in the same repo) would provide a data structure `Fetcher`, with a constructor that retrieves the metadata and gives a fetcher: `let fetcher = Fetcher::fetch_metadata().unwrap();`
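To make the identifier-to-URL mapping concrete, here is a minimal sketch of what the parsed metadata could look like in memory. The `Metadata` type, `from_pairs`, and `url_for` are hypothetical names for illustration only; the actual JSON layout and the parsing step are not decided here and are left out.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory form of the metadata file:
/// embedding file identifier -> download URL.
struct Metadata {
    urls: HashMap<String, String>,
}

impl Metadata {
    /// Build the mapping from (identifier, URL) pairs.
    /// A real implementation would parse these from the JSON file instead.
    fn from_pairs(pairs: &[(&str, &str)]) -> Self {
        let urls = pairs
            .iter()
            .map(|(id, url)| (id.to_string(), url.to_string()))
            .collect();
        Metadata { urls }
    }

    /// Look up the download URL for an identifier, if it is known.
    fn url_for(&self, identifier: &str) -> Option<&str> {
        self.urls.get(identifier).map(String::as_str)
    }
}

/// The single example entry mentioned in this issue.
fn example_metadata() -> Metadata {
    Metadata::from_pairs(&[(
        "fasttext.wiki.nl.fifu",
        "http://www.sfs.uni-tuebingen.de/a3-public-data/finalfusion-fasttext/wiki/wiki.nl.fifu",
    )])
}
```

Returning `Option` from the lookup lets the fetcher report an unknown identifier as an error instead of panicking.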
A user could then open embeddings:
let dutch_embeddings = fetcher.open("fasttext.wiki.nl.fifu").unwrap();
This method would check whether the embeddings are already available locally. If not, it would fetch them and store them in a standard XDG location. It would then open the embeddings stored in this location.
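The check-then-download step could look roughly like the sketch below. It rests on several assumptions: the XDG cache directory (`$XDG_CACHE_HOME`, falling back to `$HOME/.cache`) is used rather than the data directory, the `finalfusion` subdirectory name is made up, and the download itself is passed in as a closure so the sketch stays independent of any HTTP library. All function names are hypothetical.

```rust
use std::env;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Pick the XDG cache base directory. Per the XDG Base Directory
/// specification, an unset, empty, or relative `XDG_CACHE_HOME`
/// falls back to `$HOME/.cache`.
fn xdg_cache_dir(xdg_cache_home: Option<&str>, home: &str) -> PathBuf {
    match xdg_cache_home {
        Some(dir) if Path::new(dir).is_absolute() => PathBuf::from(dir),
        _ => Path::new(home).join(".cache"),
    }
}

/// Cache directory from the process environment. The `finalfusion`
/// subdirectory name is an assumption, not something fixed by this issue.
fn default_cache_dir() -> Option<PathBuf> {
    let home = env::var("HOME").ok()?;
    let xdg = env::var("XDG_CACHE_HOME").ok();
    Some(xdg_cache_dir(xdg.as_deref(), &home).join("finalfusion"))
}

/// Return the local path for `identifier`, calling `download` only
/// when the file is not cached yet.
fn ensure_cached<F>(cache_dir: &Path, identifier: &str, download: F) -> io::Result<PathBuf>
where
    F: FnOnce(&Path) -> io::Result<()>,
{
    let path = cache_dir.join(identifier);
    if !path.exists() {
        fs::create_dir_all(cache_dir)?;
        // Write to a temporary name first and rename afterwards, so an
        // interrupted download is never mistaken for a complete file.
        let partial = cache_dir.join(format!("{}.part", identifier));
        download(&partial)?;
        fs::rename(&partial, &path)?;
    }
    Ok(path)
}
```

With this split, `open` would only need to call `ensure_cached` and then read the embeddings from the returned path.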
Similarly, `Fetcher::mmap` could be used to memory-map an embedding after downloading.
After this is implemented, the functionality could also be exposed in finalfusion-python.