Skip to content

[QST] How to get candidate features after using pre-trained embedding tables? #1239

@hkristof03

Description

@hkristof03

❓ Questions & Help

Details

I am following the tutorial hereto include pre-computed embeddings when I train a Two Tower Retrieval model. Specifically, I am using this method to not to include the Embedding Table as part of the model:

loader = mm.Loader(
    train,
    batch_size=1024,
    transforms=[
        EmbeddingOperator(
            pretrained_movie_embs,
            lookup_key="movieId",
            embedding_name="pretrained_movie_embeddings",
        ),
    ],
)

I am trying to match this solution with the Retrieval Model tutorial here.

# Top-K evaluation
candidate_features = unique_rows_by_features(train, Tags.ITEM, Tags.ITEM_ID)
candidate_features.head()

topk = 20
topk_model = model.to_top_k_encoder(candidate_features, k=topk, batch_size=128)

# we can set `metrics` param in the `compile(), if we want
topk_model.compile(run_eagerly=False)

The problem is that loader.output_schema is different from loader.dataset.schema. The utility function unique_rows_by_features requires a dataset as the first argument, but passing loader.dataset doesn't work as this dataset doesn't contain the embedding vectors yet.

My question is, using the method to include pre-trained embeddings described above, how should one get the candidate_features, required by the Candidate Tower from the loader?

Thank you in advance if you take your time to answer!

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions