@alimaredia
Contributor

Enable users to bring their own chunks as contexts for question and answer generation.

Add quick review of seed examples after questions
and answers are generated.

" chunks_jsonl_path = contribution[\"dir\"] / CHUNKING_DIR / \"chunks.jsonl\"\n",
" authoring_path = contribution[\"dir\"] / AUTHORING_DIR\n",
"\n",
" selected_chunks_path = get_random_chunks(chunks_jsonl_path,\n",
Contributor

This was a bit misleading to me because I would expect a function called get_random_chunks to return chunks in some sort of data structure, but it actually saves to a nonobvious path and returns the path.

Contributor Author

Thanks for catching this. I changed the name of the function to create_random_chunks_jsonl().

Contributor

I still think it's misleading. It's not creating chunks, is it? How about save_chunk_selection?

Contributor Author

@alimaredia alimaredia Jun 26, 2025

I like this suggestion a lot. I ended up with save_random_chunk_selection. I like having random in the function name so users understand how the chunks are being selected.
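
To illustrate the naming point in this thread, here is a minimal sketch of a helper that matches the final name: it picks chunks at random, writes them to a JSONL file, and returns that file's path rather than the chunks themselves. The signature, the `num_chunks` and `seed` parameters, and the output filename `selected_chunks.jsonl` are assumptions for illustration, not the code merged in this PR.

```python
import json
import random
from pathlib import Path


def save_random_chunk_selection(chunks_jsonl_path, output_dir, num_chunks, seed=None):
    """Randomly pick chunks from a JSONL file, save them, and return the saved path.

    Hypothetical sketch: parameter names and the output filename are assumptions.
    """
    chunks_jsonl_path = Path(chunks_jsonl_path)
    output_dir = Path(output_dir)

    # Each non-empty line of a JSONL file is one JSON object (here, one chunk).
    with chunks_jsonl_path.open(encoding="utf-8") as f:
        chunks = [json.loads(line) for line in f if line.strip()]

    # Sample without replacement; cap at the number of chunks available.
    rng = random.Random(seed)
    selected = rng.sample(chunks, min(num_chunks, len(chunks)))

    output_dir.mkdir(parents=True, exist_ok=True)
    selected_path = output_dir / "selected_chunks.jsonl"  # assumed filename
    with selected_path.open("w", encoding="utf-8") as f:
        for chunk in selected:
            f.write(json.dumps(chunk) + "\n")

    # The return value is a path, not the chunks, which is what the name signals.
    return selected_path
```

With a name that starts with `save_`, a caller reading `selected_chunks_path = save_random_chunk_selection(...)` can tell that a file is written as a side effect and that the result is where it was written.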

"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.13"
"version": "3.12.10"
Contributor

We keep taking turns overwriting this in various PRs. I think we still have the requirement for <= 3.11 for some pieces, right?
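
One way to catch this kind of churn before review is to check the interpreter version a notebook records in its metadata. The sketch below assumes the standard Jupyter notebook JSON layout (`metadata.language_info.version`). The function names are hypothetical, and the 3.11 ceiling in the usage note comes from the question above, not from a confirmed project requirement.

```python
import json


def recorded_python_version(notebook_json):
    """Return the (major, minor) Python version recorded in a notebook's metadata."""
    nb = json.loads(notebook_json)
    version = nb["metadata"]["language_info"]["version"]  # e.g. "3.12.10"
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def exceeds_max_version(notebook_json, max_version):
    """Return True if the recorded version is newer than max_version, e.g. (3, 11)."""
    return recorded_python_version(notebook_json) > max_version
```

A pre-commit hook or CI step could read the `.ipynb` file as text and call `exceeds_max_version(text, (3, 11))` to flag a notebook saved from a newer kernel.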

@iamemilio
Contributor

iamemilio commented Jun 26, 2025 via email

@alimaredia alimaredia force-pushed the chunk-context-mgmt branch 5 times, most recently from 1e9e231 to b14e1a5 Compare June 26, 2025 18:27
Enable users to bring their own chunks as contexts
for question and answer generation.

Add quick review of seed examples after questions
and answers are generated.

Signed-off-by: Ali Maredia <[email protected]>
@alimaredia alimaredia force-pushed the chunk-context-mgmt branch from b14e1a5 to 176c4b0 Compare June 27, 2025 12:53
@khaledsulayman khaledsulayman merged commit ac219dd into instructlab:main Jun 27, 2025
1 check passed

4 participants