I see. The reason is that the tokenizers in the repo exist only for counting tokens, so that content can be chunked correctly and RAG prompts can be kept to an optimal size. CalculateSHA256 is just an internal utility that we need for some internal uniqueness checks. Neither Encode nor CalculateSHA256 is part of the public API and, like many other internal things, they are subject to refactoring and renaming, because that logic is there only to support the solution as a whole.

For tokenizers I would suggest taking a dependency on Microsoft.ML.Tokenizers if you need something stable and supported, while CalculateSHA256 is just two lines of code that you can copy:

static string CalculateSHA256
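
The snippet above is cut off in this copy of the discussion. As a minimal sketch of what such a helper might look like (an assumption, not the repo's actual internal code), the version below hashes the UTF-8 bytes of a string and returns lowercase hex. The `HashHelper` class name, the `value` parameter, the UTF-8 encoding, and the lowercase hex output are all illustrative choices. It uses `SHA256.HashData` and `Convert.ToHexString`, which require .NET 5 or later.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical container class; the name is an assumption for illustration.
public static class HashHelper
{
    // Sketch only: returns the SHA-256 digest of the UTF-8 bytes of `value`
    // as a lowercase hexadecimal string. The original helper may differ in
    // encoding or output format.
    public static string CalculateSHA256(string value)
    {
        byte[] hash = SHA256.HashData(Encoding.UTF8.GetBytes(value));
        return Convert.ToHexString(hash).ToLowerInvariant();
    }
}
```

Copying a small helper like this into your own project avoids depending on an internal method that may be renamed or removed.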

Answer selected by dluc

This discussion was converted from issue #507 on June 04, 2024 19:40.