-
Context / Scenario: I previously referenced some static methods of KM, but after upgrading I found that they had all become internal.
Question: Why have so many recent versions changed public members to internal?
Replies: 4 comments
-
Hi @xuzeyu91, which methods in particular?
-
For example, CalculateSHA256 and GPT3Tokenizer.Encode.
-
I see. The tokenizers are in the repo only for counting tokens, so that content can be chunked correctly and RAG prompts can be kept to an optimal size. CalculateSHA256 is just an internal utility that we use for some internal uniqueness checks.

Neither Encode nor CalculateSHA256 is part of the public API. Like many other internal members, they are subject to refactoring and renaming, because that logic exists only to support the solution as a whole.

For tokenizers, I would suggest taking a dependency on Microsoft.ML.Tokenizers if you need something stable and supported. CalculateSHA256 is just two lines of code that you can copy:

    static string CalculateSHA256(this BinaryData binaryData)
    {
        // Hash the raw bytes, then format the digest as lowercase hex.
        byte[] byteArray = SHA256.HashData(binaryData.ToMemory().Span);
        return Convert.ToHexString(byteArray).ToLowerInvariant();
    }
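For anyone copying the helper into their own project, here is a minimal self-contained sketch. It assumes .NET 5 or later (for SHA256.HashData and Convert.ToHexString). The class name HashUtil is hypothetical, and the method takes a ReadOnlySpan<byte> rather than BinaryData so that it compiles without the System.Memory.Data package; with a BinaryData value you would pass binaryData.ToMemory().Span.

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical holder class; the name is not taken from Kernel Memory.
public static class HashUtil
{
    // Returns the SHA-256 digest of the input as a lowercase hex string.
    public static string CalculateSHA256(ReadOnlySpan<byte> data)
    {
        byte[] hash = SHA256.HashData(data);
        // Convert.ToHexString emits uppercase, so normalize to lowercase.
        return Convert.ToHexString(hash).ToLowerInvariant();
    }
}
```

For example, HashUtil.CalculateSHA256(System.Text.Encoding.UTF8.GetBytes("abc")) returns ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad, the standard SHA-256 test vector for "abc".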
-
Okay, that's what I'm doing now: I copied out CalculateSHA256. I also have a custom ITextEmbeddingGenerator whose token counter was using GPT3Tokenizer.Encode, and I have now replaced that with DefaultGPTTokenizer.StaticCountTokens(queryStr).
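If you would rather not depend on anything inside KM for token counting, the Microsoft.ML.Tokenizers route suggested above might look roughly like the sketch below. It assumes version 1.0 or later of the Microsoft.ML.Tokenizers package plus the matching encoding data package (Microsoft.ML.Tokenizers.Data.Cl100kBase for this model). The class name TokenCounter and the model name "gpt-4" are illustrative choices, not something from this thread.

```csharp
// Assumes the NuGet packages Microsoft.ML.Tokenizers (1.0+) and
// Microsoft.ML.Tokenizers.Data.Cl100kBase are referenced by the project.
using Microsoft.ML.Tokenizers;

// Hypothetical wrapper class; the name is not taken from Kernel Memory.
public static class TokenCounter
{
    // Create the tokenizer once and reuse it; construction loads the vocabulary.
    // "gpt-4" is an assumed model name; use the one that matches your embeddings.
    private static readonly Tokenizer s_tokenizer =
        TiktokenTokenizer.CreateForModel("gpt-4");

    // Returns the number of tokens the tokenizer produces for the text.
    public static int CountTokens(string text) => s_tokenizer.CountTokens(text);
}
```

A custom ITextEmbeddingGenerator could then delegate its token counting to TokenCounter.CountTokens(text), which keeps it independent of internal KM types that may be renamed.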