How to use tiktoken as a tokenizer? #131

@AlanLu0808

Hi,

I am using your project and noticed that the current tokenizer only works well with English text. When I try to use it with Chinese (or other non-English languages), the results are not satisfactory.

I would like to know:

  1. Is there a way to use tiktoken as the tokenizer in this project?
  2. If not, are there plans to support tiktoken or improve non-English language support in the tokenizer?

My use case involves a lot of multilingual text, so having a better tokenizer (like tiktoken, which handles multilingual text well) would be very helpful.
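To make the first question concrete, here is a rough sketch of the kind of adapter I have in mind. I do not know how the project's tokenizer is plugged in, so the `Tokenizer` interface, the `TiktokenTokenizer` class name, and the `count` helper are all hypothetical. Only `tiktoken.get_encoding`, `Encoding.encode`, and `Encoding.decode` are real tiktoken calls, and `cl100k_base` is just one of its built-in encodings, picked as an example.

```python
from typing import List, Protocol


class Tokenizer(Protocol):
    """Hypothetical interface; the project's real tokenizer API may differ."""

    def encode(self, text: str) -> List[int]: ...

    def decode(self, tokens: List[int]) -> str: ...


class TiktokenTokenizer:
    """Adapter exposing a tiktoken encoding through the hypothetical interface."""

    def __init__(self, encoding=None, encoding_name: str = "cl100k_base"):
        if encoding is None:
            # Imported lazily so tiktoken is only required when it is used.
            import tiktoken

            encoding = tiktoken.get_encoding(encoding_name)
        self._encoding = encoding

    def encode(self, text: str) -> List[int]:
        # Convert text into a list of integer token ids.
        return self._encoding.encode(text)

    def decode(self, tokens: List[int]) -> str:
        # Convert token ids back into text.
        return self._encoding.decode(tokens)

    def count(self, text: str) -> int:
        # Number of tokens, e.g. for length limits or chunking.
        return len(self.encode(text))
```

With tiktoken installed, `TiktokenTokenizer().count("你好，世界")` would give the token count for a Chinese string, and decoding the encoded ids should return the original text. Whether something like this can be passed into the project in place of the built-in tokenizer is exactly what I am asking.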

Thank you for your work!
