Dataset tool #325

Open
mayabar wants to merge 5 commits into llm-d:main from mayabar:dataset-tool

Conversation

@mayabar
Collaborator

@mayabar mayabar commented Feb 5, 2026

Tool for creating an input-output dataset for the simulator, based on Hugging Face GPT datasets.

Signed-off-by: Maya Barnea <mayab@il.ibm.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>
…le, dataset tool readme added

Signed-off-by: Maya Barnea <mayab@il.ibm.com>
@mayabar mayabar requested a review from irar2 February 5, 2026 06:41
Signed-off-by: Maya Barnea <mayab@il.ibm.com>
}

rec := outputRecord{
	PromptHash: getTextHash(inputText),
Collaborator

PromptHash should be a hash of the tokenized prompt (to support the tokens-in scenario in the simulator)
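
To illustrate the suggestion, here is a minimal sketch of hashing token IDs instead of raw text. It is not taken from the PR: `hashTokenIDs` is a hypothetical helper, the choice of SHA-256 and `uint32` token IDs is an assumption, and the tokenizer call that would produce the IDs is left out because the PR's tokenizer is not shown here.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"encoding/hex"
	"fmt"
)

// hashTokenIDs is a hypothetical helper (not part of the PR). It hashes the
// token IDs of a prompt rather than its raw text, so a request that arrives
// already tokenized can be matched against the dataset.
func hashTokenIDs(tokens []uint32) string {
	h := sha256.New()
	buf := make([]byte, 4)
	for _, id := range tokens {
		// Fixed-width big-endian encoding keeps token boundaries unambiguous.
		binary.BigEndian.PutUint32(buf, id)
		h.Write(buf)
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	// These token IDs are made up for illustration; a real run would take
	// them from whichever tokenizer the simulator is configured with.
	fmt.Println(hashTokenIDs([]uint32{15496, 995}))
}
```

With something along these lines, the record could be built from the tokenizer's output for `inputText` rather than from `inputText` itself, provided the dataset tool and the simulator use the same tokenizer.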

…erated dataset card + other PR comments

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

2 participants