
Commit c6c1cdf

support LLava-Next (Stronger) model (#933)
1 parent 9cff868 commit c6c1cdf

File tree: 8 files changed (+119, -16 lines)

README.md

Lines changed: 2 additions & 0 deletions
@@ -41,6 +41,7 @@ Additionally, we are expanding capabilities for other modalities. Currently, we
 SWIFT has rich documentation for users; please check [here](https://github.com/modelscope/swift/tree/main/docs/source_en/LLM).

 ## 🎉 News
+- 2024.05.16: Supports Llava-Next (Stronger) series models. For best practice, you can refer to [here](https://github.com/modelscope/swift/tree/main/docs/source_en/Multi-Modal/llava-best-practice.md).
 - 🔥2024.05.13: Support Yi-1.5 series models, use `--model_type yi-1_5-9b-chat` to begin!
 - 2024.05.11: Support for qlora training and quantized inference using [hqq](https://github.com/mobiusml/hqq) and [eetq](https://github.com/NetEase-FuXi/EETQ). For more information, see the [LLM Quantization Documentation](https://github.com/modelscope/swift/tree/main/docs/source_en/LLM/LLM-quantization.md).
 - 2024.05.10: Support splitting a sequence across multiple GPUs to reduce memory usage. Use this feature with `pip install .[seq_parallel]`, then add `--sequence_parallel_size n` to your DDP script to begin!
@@ -514,6 +515,7 @@ The complete list of supported models and datasets can be found at [Supported Mo
 | MiniCPM-V | [OpenBmB MiniCPM vision model](https://github.com/OpenBMB/MiniCPM) | Chinese<br>English | 3B | chat model |
 | CogVLM<br>CogAgent | [Zhipu ChatGLM visual QA and Agent model](https://github.com/THUDM/) | English | 17B-18B | chat model |
 | Llava | [Llava series models](https://github.com/haotian-liu/LLaVA) | English | 7B-34B | chat model |
+| Llava-Next | [Llava-Next series models](https://github.com/LLaVA-VL/LLaVA-NeXT) | Chinese<br>English | 8B-110B | chat model |
 | mPLUG-Owl | [mPLUG-Owl series models](https://github.com/X-PLUG/mPLUG-Owl) | English | 11B | chat model |
 | InternVL | [InternVL](https://github.com/OpenGVLab/InternVL) | Chinese<br>English | 25.5B<br>including quantized version | chat model |
 | Llava-llama3 | [xtuner](https://huggingface.co/xtuner/llava-llama-3-8b-v1_1-transformers) | English | 8B | chat model |
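For orientation, the new entries plug into SWIFT's regular entry points. A minimal sketch of trying one of the newly supported Llava-Next checkpoints (assuming the `infer_main`/`InferArguments` quick-start API from `swift.llm` and a single visible GPU; not part of this commit) might look like:

```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

# Hedged sketch: interactive inference with the `llama3-llava-next-8b`
# model type registered by this change. `infer_main` and `InferArguments`
# are assumed from SWIFT's quick-start API.
from swift.llm import InferArguments, infer_main

if __name__ == '__main__':
    infer_args = InferArguments(model_type='llama3-llava-next-8b')
    infer_main(infer_args)
```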

README_CN.md

Lines changed: 2 additions & 0 deletions
@@ -42,6 +42,7 @@ SWIFT supports training, inference, evaluation and deployment of nearly **200 LLMs and MLLMs** (multi-modal large models)
 SWIFT has rich documentation; if you run into problems, please check [here](https://github.com/modelscope/swift/tree/main/docs/source/LLM).

 ## 🎉 News
+- 2024.05.16: Support for the Llava-Next (Stronger) series models; the best practice can be found [here](https://github.com/modelscope/swift/tree/main/docs/source/Multi-Modal/llava最佳实践.md).
 - 🔥2024.05.13: Support for the Yi-1.5 series models; use `--model_type yi-1_5-9b-chat` and similar to get started.
 - 2024.05.11: Support qlora training and quantized inference with [hqq](https://github.com/mobiusml/hqq) and [eetq](https://github.com/NetEase-FuXi/EETQ); see the [LLM Quantization Documentation](https://github.com/modelscope/swift/tree/main/docs/source/LLM/LLM量化文档.md).
 - 2024.05.10: Support sequence parallelism. First install with `pip install .[seq_parallel]`, then add `--sequence_parallel_size n` in your DDP setup to use it!
@@ -514,6 +515,7 @@ CUDA_VISIBLE_DEVICES=0 swift deploy \
 | MiniCPM-V | [OpenBmB MiniCPM vision model](https://github.com/OpenBMB/MiniCPM) | Chinese<br>English | 3B | chat model |
 | CogVLM<br>CogAgent | [Zhipu ChatGLM visual QA and Agent model](https://github.com/THUDM/) | English | 17B-18B | chat model |
 | Llava | [Llava series models](https://github.com/haotian-liu/LLaVA) | English | 7B-34B | chat model |
+| Llava-Next | [Llava-Next series models](https://github.com/LLaVA-VL/LLaVA-NeXT) | Chinese<br>English | 8B-110B | chat model |
 | mPLUG-Owl | [mPLUG-Owl series models](https://github.com/X-PLUG/mPLUG-Owl) | English | 11B | chat model |
 | InternVL | [InternVL](https://github.com/OpenGVLab/InternVL) | Chinese<br>English | 25.5B<br>including quantized version | chat model |
 | Llava-llama3 | [xtuner](https://huggingface.co/xtuner/llava-llama-3-8b-v1_1-transformers) | English | 8B | chat model |

docs/source/LLM/支持的模型和数据集.md

Lines changed: 3 additions & 0 deletions
@@ -123,6 +123,9 @@
 |atom-7b-chat|[FlagAlpha/Atom-7B-Chat](https://modelscope.cn/models/FlagAlpha/Atom-7B-Chat/summary)|q_proj, k_proj, v_proj|atom|&#x2714;|&#x2714;||-|[FlagAlpha/Atom-7B-Chat](https://huggingface.co/FlagAlpha/Atom-7B-Chat)|
 |llava1d6-mistral-7b-instruct|[AI-ModelScope/llava-v1.6-mistral-7b](https://modelscope.cn/models/AI-ModelScope/llava-v1.6-mistral-7b/summary)|q_proj, k_proj, v_proj|llava-mistral-instruct|&#x2714;|&#x2718;|transformers>=4.34|multi-modal, vision|[liuhaotian/llava-v1.6-mistral-7b](https://huggingface.co/liuhaotian/llava-v1.6-mistral-7b)|
 |llava1d6-yi-34b-instruct|[AI-ModelScope/llava-v1.6-34b](https://modelscope.cn/models/AI-ModelScope/llava-v1.6-34b/summary)|q_proj, k_proj, v_proj|llava-yi-instruct|&#x2714;|&#x2718;||multi-modal, vision|[liuhaotian/llava-v1.6-34b](https://huggingface.co/liuhaotian/llava-v1.6-34b)|
+|llama3-llava-next-8b|[AI-Modelscope/llama3-llava-next-8b](https://modelscope.cn/models/AI-Modelscope/llama3-llava-next-8b/summary)|q_proj, k_proj, v_proj|llama-llava-next|&#x2714;|&#x2718;||multi-modal, vision|[lmms-lab/llama3-llava-next-8b](https://huggingface.co/lmms-lab/llama3-llava-next-8b)|
+|llava-next-72b|[AI-Modelscope/llava-next-72b](https://modelscope.cn/models/AI-Modelscope/llava-next-72b/summary)|q_proj, k_proj, v_proj|llava-qwen-instruct|&#x2714;|&#x2718;||multi-modal, vision|[lmms-lab/llava-next-72b](https://huggingface.co/lmms-lab/llava-next-72b)|
+|llava-next-110b|[AI-Modelscope/llava-next-110b](https://modelscope.cn/models/AI-Modelscope/llava-next-110b/summary)|q_proj, k_proj, v_proj|llava-qwen-instruct|&#x2714;|&#x2718;||multi-modal, vision|[lmms-lab/llava-next-110b](https://huggingface.co/lmms-lab/llava-next-110b)|
 |yi-6b|[01ai/Yi-6B](https://modelscope.cn/models/01ai/Yi-6B/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;||-|[01-ai/Yi-6B](https://huggingface.co/01-ai/Yi-6B)|
 |yi-6b-200k|[01ai/Yi-6B-200K](https://modelscope.cn/models/01ai/Yi-6B-200K/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;||-|[01-ai/Yi-6B-200K](https://huggingface.co/01-ai/Yi-6B-200K)|
 |yi-6b-chat|[01ai/Yi-6B-Chat](https://modelscope.cn/models/01ai/Yi-6B-Chat/summary)|q_proj, k_proj, v_proj|yi|&#x2714;|&#x2714;||-|[01-ai/Yi-6B-Chat](https://huggingface.co/01-ai/Yi-6B-Chat)|

docs/source/Multi-Modal/llava最佳实践.md

Lines changed: 15 additions & 6 deletions
@@ -1,5 +1,16 @@
 
 # Llava Best Practices
+The models covered by this document:
+
+| model | model_type |
+|-------|------------|
+| [llava-v1.6-mistral-7b](https://modelscope.cn/models/AI-ModelScope/llava-v1.6-mistral-7b/summary) | llava1d6-mistral-7b-instruct |
+| [llava-v1.6-34b](https://www.modelscope.cn/models/AI-ModelScope/llava-v1.6-34b/summary) | llava1d6-yi-34b-instruct |
+| [llama3-llava-next-8b](https://modelscope.cn/models/AI-ModelScope/llama3-llava-next-8b/summary) | llama3-llava-next-8b |
+| [llava-next-72b](https://modelscope.cn/models/AI-ModelScope/llava-next-72b/summary) | llava-next-72b |
+| [llava-next-110b](https://modelscope.cn/models/AI-ModelScope/llava-next-110b/summary) | llava-next-110b |
+
+The following practice takes `llava-v1.6-mistral-7b` as an example; you can also switch to other models by specifying `--model_type`.
 
 ## Table of Contents
 - [Environment Setup](#环境准备)
@@ -16,10 +27,8 @@ pip install -e '.[llm]'
 ```
 
 ## Inference
-
-Inference with [llava1d6-mistral-7b-instruct](https://modelscope.cn/models/AI-ModelScope/llava-v1.6-mistral-7b/summary) and [llava1d6-yi-34b-instruct](https://www.modelscope.cn/models/AI-ModelScope/llava-v1.6-34b/summary):
 ```shell
-# Experimental environment: A10, 3090, V100...
+# Experimental environment: A100
 # 20GB GPU memory
 CUDA_VISIBLE_DEVICES=0 swift infer --model_type llava1d6-mistral-7b-instruct
 
@@ -110,7 +119,7 @@ from swift.llm import (
 from swift.utils import seed_everything
 import torch
 
-model_type = ModelType.llava1d6_mistral_7b_instruct # ModelType.llava1d6_yi_34b_instruct
+model_type = 'llava1d6-mistral-7b-instruct'
 template_type = get_default_template_type(model_type)
 print(f'template_type: {template_type}')
 
@@ -208,7 +217,7 @@ CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 swift sft \
 ## Inference after Fine-tuning
 Direct inference:
 ```shell
-model_type="llava1d6-mistral-7b-instruct" # "llava1d6-yi-34b-instruct"
+model_type="llava1d6-mistral-7b-instruct"
 
 CUDA_VISIBLE_DEVICES=0 swift infer \
     --ckpt_dir output/${model_type}/vx-xxx/checkpoint-xxx \
@@ -217,7 +226,7 @@ CUDA_VISIBLE_DEVICES=0 swift infer \
 
 **merge-lora** and inference:
 ```shell
-model_type="llava1d6-mistral-7b-instruct" # "llava1d6-yi-34b-instruct"
+model_type="llava1d6-mistral-7b-instruct"
 CUDA_VISIBLE_DEVICES=0 swift export \
     --ckpt_dir "output/${model_type}/vx-xxx/checkpoint-xxx" \
     --merge_lora true

docs/source_en/LLM/Supported-models-datasets.md

Lines changed: 3 additions & 0 deletions
@@ -123,6 +123,9 @@ The table below introduces all models supported by SWIFT:
 |atom-7b-chat|[FlagAlpha/Atom-7B-Chat](https://modelscope.cn/models/FlagAlpha/Atom-7B-Chat/summary)|q_proj, k_proj, v_proj|atom|&#x2714;|&#x2714;||-|[FlagAlpha/Atom-7B-Chat](https://huggingface.co/FlagAlpha/Atom-7B-Chat)|
 |llava1d6-mistral-7b-instruct|[AI-ModelScope/llava-v1.6-mistral-7b](https://modelscope.cn/models/AI-ModelScope/llava-v1.6-mistral-7b/summary)|q_proj, k_proj, v_proj|llava-mistral-instruct|&#x2714;|&#x2718;|transformers>=4.34|multi-modal, vision|[liuhaotian/llava-v1.6-mistral-7b](https://huggingface.co/liuhaotian/llava-v1.6-mistral-7b)|
 |llava1d6-yi-34b-instruct|[AI-ModelScope/llava-v1.6-34b](https://modelscope.cn/models/AI-ModelScope/llava-v1.6-34b/summary)|q_proj, k_proj, v_proj|llava-yi-instruct|&#x2714;|&#x2718;||multi-modal, vision|[liuhaotian/llava-v1.6-34b](https://huggingface.co/liuhaotian/llava-v1.6-34b)|
+|llama3-llava-next-8b|[AI-Modelscope/llama3-llava-next-8b](https://modelscope.cn/models/AI-Modelscope/llama3-llava-next-8b/summary)|q_proj, k_proj, v_proj|llama-llava-next|&#x2714;|&#x2718;||multi-modal, vision|[lmms-lab/llama3-llava-next-8b](https://huggingface.co/lmms-lab/llama3-llava-next-8b)|
+|llava-next-72b|[AI-Modelscope/llava-next-72b](https://modelscope.cn/models/AI-Modelscope/llava-next-72b/summary)|q_proj, k_proj, v_proj|llava-qwen-instruct|&#x2714;|&#x2718;||multi-modal, vision|[lmms-lab/llava-next-72b](https://huggingface.co/lmms-lab/llava-next-72b)|
+|llava-next-110b|[AI-Modelscope/llava-next-110b](https://modelscope.cn/models/AI-Modelscope/llava-next-110b/summary)|q_proj, k_proj, v_proj|llava-qwen-instruct|&#x2714;|&#x2718;||multi-modal, vision|[lmms-lab/llava-next-110b](https://huggingface.co/lmms-lab/llava-next-110b)|
 |yi-6b|[01ai/Yi-6B](https://modelscope.cn/models/01ai/Yi-6B/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;||-|[01-ai/Yi-6B](https://huggingface.co/01-ai/Yi-6B)|
 |yi-6b-200k|[01ai/Yi-6B-200K](https://modelscope.cn/models/01ai/Yi-6B-200K/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;||-|[01-ai/Yi-6B-200K](https://huggingface.co/01-ai/Yi-6B-200K)|
 |yi-6b-chat|[01ai/Yi-6B-Chat](https://modelscope.cn/models/01ai/Yi-6B-Chat/summary)|q_proj, k_proj, v_proj|yi|&#x2714;|&#x2714;||-|[01-ai/Yi-6B-Chat](https://huggingface.co/01-ai/Yi-6B-Chat)|

docs/source_en/Multi-Modal/llava-best-practice.md

Lines changed: 16 additions & 6 deletions
@@ -1,4 +1,16 @@
 # Llava Best Practices
+The document corresponds to the following models
+
+| model | model_type |
+|-------|------------|
+| [llava-v1.6-mistral-7b](https://modelscope.cn/models/AI-ModelScope/llava-v1.6-mistral-7b/summary) | llava1d6-mistral-7b-instruct |
+| [llava-v1.6-34b](https://www.modelscope.cn/models/AI-ModelScope/llava-v1.6-34b/summary) | llava1d6-yi-34b-instruct |
+| [llama3-llava-next-8b](https://modelscope.cn/models/AI-ModelScope/llama3-llava-next-8b/summary) | llama3-llava-next-8b |
+| [llava-next-72b](https://modelscope.cn/models/AI-ModelScope/llava-next-72b/summary) | llava-next-72b |
+| [llava-next-110b](https://modelscope.cn/models/AI-ModelScope/llava-next-110b/summary) | llava-next-110b |
+
+The following practices take `llava-v1.6-mistral-7b` as an example. You can also switch to other models by specifying the `--model_type`.
+
 
 ## Table of Contents
 - [Environment Setup](#environment-setup)
@@ -14,10 +26,8 @@ pip install -e '.[llm]'
 ```
 
 ## Inference
-
-Inference for [llava1d6-mistral-7b-instruct](https://modelscope.cn/models/AI-ModelScope/llava-v1.6-mistral-7b/summary) and [llava1d6-yi-34b-instruct](https://www.modelscope.cn/models/AI-ModelScope/llava-v1.6-34b/summary):
 ```shell
-# Experimental environment: A10, 3090, V100...
+# Experimental environment: A100
 # 20GB GPU memory
 CUDA_VISIBLE_DEVICES=0 swift infer --model_type llava1d6-mistral-7b-instruct
 
@@ -108,7 +118,7 @@ from swift.llm import (
 from swift.utils import seed_everything
 import torch
 
-model_type = ModelType.llava1d6_mistral_7b_instruct # ModelType.llava1d6_yi_34b_instruct
+model_type = 'llava1d6-mistral-7b-instruct'
 template_type = get_default_template_type(model_type)
 print(f'template_type: {template_type}')
 
@@ -205,15 +215,15 @@ CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 swift sft \
 ## Inference after Fine-tuning
 Direct inference:
 ```shell
-model_type="llava1d6-mistral-7b-instruct" # "llava1d6-yi-34b-instruct"
+model_type="llava1d6-mistral-7b-instruct"
 CUDA_VISIBLE_DEVICES=0 swift infer \
     --ckpt_dir output/${model_type}/vx-xxx/checkpoint-xxx \
     --load_dataset_config true
 ```
 
 **merge-lora** and inference:
 ```shell
-model_type="llava1d6-mistral-7b-instruct" # "llava1d6-yi-34b-instruct"
+model_type="llava1d6-mistral-7b-instruct"
 CUDA_VISIBLE_DEVICES=0 swift export \
     --ckpt_dir "output/${model_type}/vx-xxx/checkpoint-xxx" \
     --merge_lora true
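Since the best-practice doc now switches models purely through `model_type`, a small sketch (using the `get_default_template_type` helper already shown in the doc's own Python snippet) can confirm which chat template each new identifier resolves to; the expected mapping follows the supported-models tables in this commit:

```python
# Hedged sketch: look up the default template for each Llava-Next model type
# added in this commit. `get_default_template_type` is the helper used in the
# best-practice snippet above.
from swift.llm import get_default_template_type

for model_type in ['llama3-llava-next-8b', 'llava-next-72b', 'llava-next-110b']:
    print(f'{model_type} -> {get_default_template_type(model_type)}')

# Per the supported-models tables, this should print 'llama-llava-next' for
# the 8B model and 'llava-qwen-instruct' for the 72B/110B models.
```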

swift/llm/utils/model.py

Lines changed: 42 additions & 4 deletions
@@ -160,6 +160,9 @@ class ModelType:
     # llava
     llava1d6_mistral_7b_instruct = 'llava1d6-mistral-7b-instruct'
     llava1d6_yi_34b_instruct = 'llava1d6-yi-34b-instruct'
+    llama3_llava_next_8b = 'llama3-llava-next-8b'
+    llava_next_72b = 'llava-next-72b'
+    llava_next_110b = 'llava-next-110b'
     # yi
     yi_6b = 'yi-6b'
     yi_6b_200k = 'yi-6b-200k'
@@ -3910,23 +3913,53 @@ def _new_generate(inputs=None, *args, **kwargs):
     function_kwargs={'llm_model_type': 'mistral'},
     tags=['multi-modal', 'vision'],
     hf_model_id='liuhaotian/llava-v1.6-mistral-7b')
+@register_model(
+    ModelType.llama3_llava_next_8b,
+    'AI-Modelscope/llama3-llava-next-8b',
+    LoRATM.llama2,
+    TemplateType.llama_llava_next,
+    support_flash_attn=True,
+    tags=['multi-modal', 'vision'],
+    function_kwargs={'llm_model_type': 'next_llama'},
+    hf_model_id='lmms-lab/llama3-llava-next-8b')
+@register_model(
+    ModelType.llava_next_72b,
+    'AI-Modelscope/llava-next-72b',
+    LoRATM.llama2,
+    TemplateType.llava_qwen_instruct,
+    support_flash_attn=True,
+    tags=['multi-modal', 'vision'],
+    function_kwargs={'llm_model_type': 'next_qwen'},
+    hf_model_id='lmms-lab/llava-next-72b')
+@register_model(
+    ModelType.llava_next_110b,
+    'AI-Modelscope/llava-next-110b',
+    LoRATM.llama2,
+    TemplateType.llava_qwen_instruct,
+    support_flash_attn=True,
+    tags=['multi-modal', 'vision'],
+    function_kwargs={'llm_model_type': 'next_qwen'},
+    hf_model_id='lmms-lab/llava-next-110b')
 def get_model_tokenizer_llava(model_dir: str,
                               torch_dtype: Dtype,
                               model_kwargs: Dict[str, Any],
                               load_model: bool = True,
                               **kwargs):
+    llm_model_type = kwargs.pop('llm_model_type')
     if 'local_repo_path' in kwargs:
-        local_repo_path = kwargs['local_repo_path']
+        repo_path = kwargs['local_repo_path']
+    elif 'next' in llm_model_type:
+        repo_path = 'https://github.com/LLaVA-VL/LLaVA-NeXT.git'
     else:
-        local_repo_path = _git_clone_github('https://github.com/haotian-liu/LLaVA.git')
+        repo_path = 'https://github.com/haotian-liu/LLaVA.git'
+    local_repo_path = _git_clone_github(repo_path)
     sys.path.append(os.path.join(local_repo_path))
 
-    llm_model_type = kwargs.pop('llm_model_type')
     if llm_model_type == 'mistral':
         from llava.model import LlavaMistralForCausalLM, LlavaMistralConfig
         model_config = LlavaMistralConfig.from_pretrained(model_dir)
         automodel_class = LlavaMistralForCausalLM
-    else:  # llama
+    elif 'llama' in llm_model_type:  # llama
         from llava.model import LlavaLlamaForCausalLM, LlavaConfig
         if not hasattr(LlavaLlamaForCausalLM, '__old_forward'):  # Avoid double patching
             forward = LlavaLlamaForCausalLM.forward
@@ -3940,6 +3973,11 @@ def _new_forward(*args, **kwargs):
             LlavaLlamaForCausalLM.forward = _new_forward
         model_config = LlavaConfig.from_pretrained(model_dir)
         automodel_class = LlavaLlamaForCausalLM
+    else:  # qwen
+        from llava.model import LlavaQwenForCausalLM
+        automodel_class = LlavaQwenForCausalLM
+        model_config = AutoConfig.from_pretrained(model_dir)
+
     model_config.mm_vision_tower = snapshot_download('AI-ModelScope/clip-vit-large-patch14-336')
     model, tokenizer = get_model_tokenizer_with_flash_attn(
         model_dir,
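To make the new routing easier to follow, here is a standalone restatement (plain Python, not part of the patch) of how the added `llm_model_type` values drive both the repo that gets cloned and the LLaVA model class that gets loaded:

```python
# Simplified restatement of the dispatch in get_model_tokenizer_llava above.
# Model classes are given as strings here; the real code imports them from the
# cloned llava package.
def pick_repo_and_class(llm_model_type: str) -> tuple:
    if 'next' in llm_model_type:            # 'next_llama', 'next_qwen' -> LLaVA-NeXT repo
        repo = 'https://github.com/LLaVA-VL/LLaVA-NeXT.git'
    else:                                    # 'mistral', 'llama' -> original LLaVA repo
        repo = 'https://github.com/haotian-liu/LLaVA.git'

    if llm_model_type == 'mistral':
        cls = 'LlavaMistralForCausalLM'
    elif 'llama' in llm_model_type:          # 'llama' and 'next_llama'
        cls = 'LlavaLlamaForCausalLM'
    else:                                    # 'next_qwen'
        cls = 'LlavaQwenForCausalLM'
    return repo, cls


assert pick_repo_and_class('next_qwen') == (
    'https://github.com/LLaVA-VL/LLaVA-NeXT.git', 'LlavaQwenForCausalLM')
assert pick_repo_and_class('mistral') == (
    'https://github.com/haotian-liu/LLaVA.git', 'LlavaMistralForCausalLM')
```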

swift/llm/utils/template.py

Lines changed: 36 additions & 0 deletions
@@ -38,6 +38,8 @@ class TemplateType:
     llava_mistral_instruct = 'llava-mistral-instruct'
     llava_yi_instruct = 'llava-yi-instruct'
     llava_llama_instruct = 'llava-llama-instruct'
+    llava_qwen_instruct = 'llava-qwen-instruct'
+    llama_llava_next = 'llama-llava-next'
     openbuddy = 'openbuddy'
     openbuddy2 = 'openbuddy2'
     internlm = 'internlm'
@@ -1060,6 +1062,40 @@ def data_collator(self, batch: List[Dict[str, Any]], padding_to: Optional[int] =
     lazy_tokenize=True)
 
 
+class LlamaLlavaNextTemplate(LLavaTemplate):
+    default_system = 'You are a helpful language and vision assistant. ' \
+                     'You are able to understand the visual content that the user provides, ' \
+                     'and assist the user with a variety of tasks using natural language.'
+
+    def __init__(self):
+        Template.__init__(self, [], [
+            '<|start_header_id|>user<|end_header_id|>\n\n', [-200],
+            '\n{{QUERY}}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n'
+        ], ['<|eot_id|>'], ['<|eot_id|>'], self.default_system,
+                          ['<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n{{SYSTEM}}'])
+
+
+register_template(
+    TemplateType.llama_llava_next,
+    LlamaLlavaNextTemplate(),
+    use_model=True,
+    infer_media_type='round',
+    lazy_tokenize=True)
+
+
+class LLavaQwenTemplate(LLavaTemplate):
+    llavayi_query_template = 'You are a helpful assistant'
+
+    def __init__(self):
+        Template.__init__(self, [], ['<|im_start|>user\n', [-200], '{{QUERY}}<|im_end|>\n<|im_start|>assistant\n'],
+                          ['<|im_end|>\n'], ['<|im_end|>'], self.llavayi_query_template,
+                          ['<|im_start|>system\n{{SYSTEM}}<|im_end|>\n'])
+
+
+register_template(
+    TemplateType.llava_qwen_instruct, LLavaQwenTemplate(), use_model=True, infer_media_type='round', lazy_tokenize=True)
+
+
 def _findall(token_list: List[int], token: int) -> List[int]:
     """Find the index of a token in the token_list."""
     res = []
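As an illustration of what the new `llava-qwen-instruct` template produces, the pieces above concatenate into a ChatML-style prompt. A rough text-level rendering (the real Template class operates on token ids and inserts the image placeholder token -200 where `<image>` appears below) could look like:

```python
# Illustration only: approximate text form of the new llava-qwen-instruct
# template for a single-turn query. The [-200] entry in the template is the
# image placeholder token; '<image>' stands in for it here.
IMAGE_PLACEHOLDER = '<image>'


def render_llava_qwen(query: str, system: str = 'You are a helpful assistant') -> str:
    return (f'<|im_start|>system\n{system}<|im_end|>\n'
            f'<|im_start|>user\n{IMAGE_PLACEHOLDER}{query}<|im_end|>\n'
            f'<|im_start|>assistant\n')


print(render_llava_qwen('How many sheep are in the picture?'))
```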

0 commit comments
