❓ Questions & Help
Details
Hello everyone! In my sequential recommendation dataset, every item is annotated with a list of categories (possibly with repeated values). Here is a representative example:

    data = [
        {"session_id": 1, "item_id-list": [101, 102, 103], "categories-list": [["A", "B"], ["C", "D"], ["E"]]},
        {"session_id": 2, "item_id-list": [201, 202], "categories-list": [["A"], ["F", "F"]]},
    ]

Is it possible to categorify the categories above in a nested way, so that:
- the lists `[[A, B], [C, D], ..], ..` do not become separate tokens but remain lists of categorified elements (e.g. `[[1, 2], [3, 4], [6]]` and `[[1], [5, 5]]`), and
- we can then feed those into `EmbeddingBag` downstream?
I've tried supplying the `Dataset` constructor with an appropriate schema, but that failed. I could also flatten the lists, categorify them, and then restore the nesting, but this looks inefficient and like a bad idea.
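
For reference, the flatten / categorify / restore fallback mentioned above could look roughly like the sketch below, in plain Python. It only illustrates the transformation I'm after and is not a library API: the helper names `build_vocab`, `encode_nested`, and `to_bags` are hypothetical, and ids are assigned in first-seen order starting at 1 (0 left free for padding, by assumption), so they differ from the arbitrary ids in the example above.

```python
from itertools import chain

data = [
    {"session_id": 1, "item_id-list": [101, 102, 103],
     "categories-list": [["A", "B"], ["C", "D"], ["E"]]},
    {"session_id": 2, "item_id-list": [201, 202],
     "categories-list": [["A"], ["F", "F"]]},
]


def build_vocab(rows, column):
    """Map each distinct category to an integer id (hypothetical helper).

    Ids start at 1 in first-seen order; 0 is left unused for padding.
    """
    vocab = {}
    for row in rows:
        # Flatten the per-item category lists of one session.
        for value in chain.from_iterable(row[column]):
            vocab.setdefault(value, len(vocab) + 1)
    return vocab


def encode_nested(nested, vocab):
    """Replace every category with its id while keeping the nesting."""
    return [[vocab[value] for value in inner] for inner in nested]


def to_bags(encoded):
    """Flatten one session into (flat_ids, offsets).

    offsets[i] is the position in flat_ids where the bag of item i starts.
    """
    flat_ids, offsets = [], []
    for inner in encoded:
        offsets.append(len(flat_ids))
        flat_ids.extend(inner)
    return flat_ids, offsets


vocab = build_vocab(data, "categories-list")
encoded = [encode_nested(row["categories-list"], vocab) for row in data]
flat_ids, offsets = to_bags(encoded[0])

print(encoded)
print(flat_ids, offsets)
```

The `(flat_ids, offsets)` pair has the same layout as the 1-D `input` and `offsets` arguments that `torch.nn.EmbeddingBag` accepts, with one bag per item in the session. What I'd like to know is whether the library can produce this directly instead of my doing it by hand.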