[QST] Categorifying nested lists in NVTabular and transformers4rec #792

@maciekrtb

❓ Questions & Help

Details

Hello everyone! In my sequential recommendation dataset, every item is annotated with a list of categories (which may contain repeated values). Here is a representative example:

data = [
    {"session_id": 1, "item_id-list": [101, 102, 103], "categories-list": [["A", "B"], ["C", "D"], ["E"]]},
    {"session_id": 2, "item_id-list": [201, 202], "categories-list": [["A"], ["F", "F"]]},
]

Is it possible to categorify the categories present above in a nested way so that:

  • the lists [[A,B], [C,D], ..], .. do not become separate tokens but remain lists of categorified elements (e.g. [[1,2], [3,4], [6]] and [[1], [5,5]])
  • we can then feed those into EmbeddingBag downstream?
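For context, here is a minimal sketch of the input layout that torch.nn.EmbeddingBag accepts for variable-length bags: a flat 1D sequence of indices plus an offsets sequence marking where each bag starts. The helper name `to_bag_inputs` is hypothetical, and the code is plain Python rather than anything provided by NVTabular or transformers4rec.

```python
def to_bag_inputs(item_categories):
    """Turn one session's per-item category ids into (flat, offsets).

    Each inner list is one "bag" (the categories of a single item).
    """
    flat, offsets = [], []
    for bag in item_categories:
        offsets.append(len(flat))  # index in `flat` where this bag starts
        flat.extend(bag)
    return flat, offsets


# Already-categorified categories for the three items of session 1.
flat, offsets = to_bag_inputs([[1, 2], [3, 4], [6]])
print(flat)     # [1, 2, 3, 4, 6]
print(offsets)  # [0, 2, 4]

# Not executed here: in PyTorch these would be passed as
#   bag = torch.nn.EmbeddingBag(num_embeddings=7, embedding_dim=8, mode="sum")
#   out = bag(torch.tensor(flat), torch.tensor(offsets))  # one row per item
```

With this layout each item yields a single pooled vector, so the session stays a sequence of three items rather than five separate category tokens.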

I've tried supplying the Dataset constructor with an appropriate schema, but unfortunately that failed. I could also try flattening the lists, categorifying the flat values, and fusing them back into nested lists, but this looks like an inefficient and bad idea.
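To make the flatten, categorify, and fuse-back workaround concrete, here is a minimal sketch in plain Python. It does not use the NVTabular Categorify operator; the helper name `categorify_nested` and the choice to start ids at 1 (leaving 0 free for padding) are assumptions made for illustration.

```python
from itertools import chain


def categorify_nested(sessions, start_index=1):
    """Encode nested category lists while preserving their nesting."""
    # Step 1: flatten sessions -> items -> categories into one stream.
    flat = chain.from_iterable(chain.from_iterable(sessions))

    # Step 2: build a vocabulary in first-seen order.
    vocab = {}
    for value in flat:
        if value not in vocab:
            vocab[value] = len(vocab) + start_index

    # Step 3: rebuild the original nesting using the encoded ids.
    encoded = [
        [[vocab[c] for c in item] for item in session]
        for session in sessions
    ]
    return encoded, vocab


categories = [
    [["A", "B"], ["C", "D"], ["E"]],  # session 1
    [["A"], ["F", "F"]],              # session 2
]
encoded, vocab = categorify_nested(categories)
print(encoded)  # [[[1, 2], [3, 4], [5]], [[1], [6, 6]]]
```

The exact ids depend on the order in which values are first seen, so they can differ from the ids in the example above; any consistent mapping preserves the structure.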
