How to use the tokenizer on Android/Kotlin #980
-
You can export the tokenizer itself as an ONNX model and then run it. There's an example of using an ONNX-based tokenizer here: https://github.com/oracle/sd4j/blob/main/src/main/java/com/oracle/labs/mlrg/sd4j/TextEmbedder.java#L186, though I don't have any public code for exporting them in Java. This Python code shows how to stitch the tokenizer onto the front of the embedding model: https://github.com/microsoft/onnxruntime-extensions/blob/main/onnxruntime_extensions/tools/add_pre_post_processing_to_model.py#L328.
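As a rough Kotlin sketch of the "run it" half, assuming the tokenizer has already been exported to its own ONNX file: the tokenizer ops come from onnxruntime-extensions, so the extensions library is registered on the session options before the session is created. The class name `OnnxTokenizer` is hypothetical, and the assumption that the model takes a single 1-D string tensor should be checked against `session.inputInfo` for your export. Output names and shapes depend on how the tokenizer was exported, so the sketch simply returns every output by name.

```kotlin
import ai.onnxruntime.OnnxTensor
import ai.onnxruntime.OrtEnvironment
import ai.onnxruntime.OrtSession
import ai.onnxruntime.extensions.OrtxPackage

// Hypothetical wrapper around a tokenizer that was exported as its own ONNX model.
class OnnxTokenizer(
    private val env: OrtEnvironment,
    tokenizerBytes: ByteArray
) : AutoCloseable {
    // Register the onnxruntime-extensions custom ops; without this the
    // tokenizer ops in the model cannot be resolved when the session is created.
    private val options = OrtSession.SessionOptions().apply {
        registerCustomOpLibrary(OrtxPackage.getLibraryPath())
    }
    private val session: OrtSession = env.createSession(tokenizerBytes, options)

    // Runs the tokenizer on one sentence and returns each model output by name.
    // Assumption: the model has exactly one input, a string tensor of shape [1].
    fun run(sentence: String): Map<String, Any> {
        val inputName = session.inputNames.first()
        return OnnxTensor.createTensor(env, arrayOf(sentence), longArrayOf(1)).use { text ->
            session.run(mapOf(inputName to text)).use { result ->
                // Copy the values out so they remain usable after the result is closed.
                result.associate { entry -> entry.key to entry.value.value }
            }
        }
    }

    override fun close() {
        session.close()
        options.close()
    }
}
```

Usage might look like `OnnxTokenizer(ortEnv, context.assets.open("tokenizer.onnx").readBytes()).use { it.run(sentence) }`, where `tokenizer.onnx` is a placeholder asset name. Inspecting `session.outputInfo` on the exported model shows which output holds the token IDs.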
-
Hi, thanks to the links you provided, the steps for using a tokenizer that has been converted to an ONNX model seem very clear.
However, I do not know how to create an ONNX model from those files. Any help would be much appreciated: tokenization is the last piece of the puzzle I need to run my FAQ application on Android.
-
Hi all,
This is the first time I have used the ONNX framework.
I wrote Python code that exports (and quantizes) the model "intfloat/multilingual-e5-small" to an ONNX model.
My export directory contains the model and all the tokenizer configuration files:
│ quantized_model.onnx
│ sentencepiece.bpe.model
│ special_tokens_map.json
│ tokenizer.json
│ tokenizer_config.json
In my Kotlin project in Android Studio, I added the following dependencies:
implementation("com.microsoft.onnxruntime:onnxruntime-android:latest.release")
implementation("com.microsoft.onnxruntime:onnxruntime-extensions-android:latest.release")
In my Kotlin class, I initialize the ORT session as follows:
val modelBytes = context.assets.open("quantized_model.onnx").readBytes()
ortSession = ortEnv.createSession(modelBytes, OrtSession.SessionOptions())
Then I would like to tokenize the input sentence with code similar to:
val tokenizationResult = ortSession.tokenize(sentence)
val inputIds = tokenizationResult.inputIds
val attentionMask = tokenizationResult.attentionMask
I know this cannot compile because there is no "tokenize" method on the session.
But I have not been able to find the right way to use the ONNX libraries to tokenize.
Could you please help?
Do you have any information or code sample that could help me for getting started with the tokenization using ONNX?
Thank you so much for your help.