Description
❓ Questions & Help
Details
I am following the tutorial here to include pre-computed embeddings when I train a Two Tower Retrieval model. Specifically, I am using the following approach so that the embedding table is not included as part of the model:
```python
loader = mm.Loader(
    train,
    batch_size=1024,
    transforms=[
        EmbeddingOperator(
            pretrained_movie_embs,
            lookup_key="movieId",
            embedding_name="pretrained_movie_embeddings",
        ),
    ],
)
```
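For context, here is a minimal sketch of the shape I assume `pretrained_movie_embs` has and of what the lookup does conceptually. This is NumPy-only illustration, not Merlin code; the sizes and the example ids are assumptions:

```python
import numpy as np

# Assumed layout: a 2-D array where row i holds the pre-computed vector
# for the item whose encoded id is i (sizes below are made up).
num_movies, embedding_dim = 1000, 64
rng = np.random.default_rng(42)
pretrained_movie_embs = rng.normal(size=(num_movies, embedding_dim)).astype("float32")

# Per batch, the operator conceptually does a row lookup by the
# `lookup_key` column (here, a hypothetical batch of encoded movieIds):
movie_ids = np.array([3, 17, 42])
batch_embeddings = pretrained_movie_embs[movie_ids]
print(batch_embeddings.shape)  # (3, 64)
```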
I am trying to combine this setup with the Retrieval Model tutorial here:
```python
# Top-K evaluation
candidate_features = unique_rows_by_features(train, Tags.ITEM, Tags.ITEM_ID)
candidate_features.head()

topk = 20
topk_model = model.to_top_k_encoder(candidate_features, k=topk, batch_size=128)

# we can set the `metrics` param in `compile()`, if we want
topk_model.compile(run_eagerly=False)
```
The problem is that `loader.output_schema` differs from `loader.dataset.schema`. The utility function `unique_rows_by_features` requires a dataset as its first argument, but passing `loader.dataset` doesn't work, because that dataset does not yet contain the embedding vectors.
My question is: when including pre-trained embeddings with the method described above, how should one obtain the `candidate_features` required by the candidate tower from the loader?
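To make the mismatch concrete, here is a conceptual sketch (pandas/NumPy only, not the Merlin API; all names and sizes are assumptions) of what a candidate table would need to contain: one row per unique `movieId` with its pre-computed vector attached, mirroring what `EmbeddingOperator` does per batch in the loader:

```python
import numpy as np
import pandas as pd

# Hypothetical pre-computed vectors: row i is the embedding of item id i.
rng = np.random.default_rng(0)
pretrained_movie_embs = rng.normal(size=(5, 4)).astype("float32")

# Hypothetical interactions; deduplicate to get one row per candidate item,
# which is what unique_rows_by_features produces from the raw dataset.
interactions = pd.DataFrame({"movieId": [0, 1, 1, 3, 4, 4]})
candidates = interactions.drop_duplicates("movieId").reset_index(drop=True)

# Attach the embedding for each unique item; this column is what the raw
# loader.dataset is missing.
candidates["pretrained_movie_embeddings"] = list(
    pretrained_movie_embs[candidates["movieId"].to_numpy()]
)
print(candidates.shape)  # (4, 2)
```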
Thank you in advance for taking the time to answer!