In #137, an external tokenizer service based on UDS (Unix Domain Socket) was proposed.
This service aims to improve compatibility with vLLM tokenization by using the Python transformers library. However, because it runs out of process, each tokenization call pays for inter-process communication and serialization, which may hurt tokenization performance. We therefore need a benchmark that quantifies this overhead, so that we can make an informed trade-off between external and internal tokenization.
Additionally, the service itself can still be optimized, for example by using gRPC instead of HTTP+JSON. The same benchmark can be used to measure the improvement each optimization approach delivers.
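
As a starting point for discussion, here is a minimal sketch of what such a benchmark harness could look like. It times any tokenization callable and reports mean, p50, and p99 latency, so the same function can be pointed at the internal tokenizer, the UDS service, or an optimized variant. Everything named here is an assumption for illustration: `benchmark`, its `warmup` and `rounds` parameters, and `fake_internal_tokenize` are hypothetical, and the stand-in tokenizer just splits on whitespace rather than calling a real tokenizer or the service from #137.

```python
import statistics
import time
from typing import Callable, Dict, List


def benchmark(
    tokenize: Callable[[str], List[int]],
    prompts: List[str],
    warmup: int = 10,
    rounds: int = 100,
) -> Dict[str, float]:
    """Time `tokenize` on every prompt and return latency stats in microseconds."""
    # Warm-up passes are not measured, so caches and connections can settle first.
    for _ in range(warmup):
        for prompt in prompts:
            tokenize(prompt)

    samples: List[float] = []
    for _ in range(rounds):
        for prompt in prompts:
            start = time.perf_counter_ns()
            tokenize(prompt)
            samples.append((time.perf_counter_ns() - start) / 1_000)

    samples.sort()
    return {
        "calls": len(samples),
        "mean_us": statistics.fmean(samples),
        "p50_us": samples[len(samples) // 2],
        "p99_us": samples[min(len(samples) - 1, int(len(samples) * 0.99))],
    }


def fake_internal_tokenize(text: str) -> List[int]:
    # Hypothetical stand-in for a real tokenizer: one "token id" per
    # whitespace-separated word (here simply the word length).
    return [len(word) for word in text.split()]


if __name__ == "__main__":
    prompts = ["hello world", "the quick brown fox jumps over the lazy dog"]
    print(benchmark(fake_internal_tokenize, prompts))
```

To compare approaches, one would pass in a second callable that wraps the external service client (and later a gRPC variant) and run it over the same prompt set. Reporting tail latency alongside the mean seems useful here, since per-call transport overhead tends to show up most clearly on short prompts and in the p99.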