@@ -48,36 +48,34 @@ You can run the SNF evaluation using various backends.
 ### OpenAI Compatible Servers
 
 ```bash
-repoqa.search_needle_function --model "gpt4-turbo" --caching --backend openai
+repoqa.search_needle_function --model "gpt4-turbo" --backend openai
 # 💡 If you use a customized server such as vLLM:
 # repoqa.search_needle_function --base-url "http://url.to.vllm.server/v1" \
-#   --model "gpt4-turbo" --caching --backend openai
+#   --model "gpt4-turbo" --backend openai
 ```
 
 ### Anthropic Compatible Servers
 
 ```bash
-repoqa.search_needle_function --model "claude-3-haiku-20240307" --caching --backend anthropic
+repoqa.search_needle_function --model "claude-3-haiku-20240307" --backend anthropic
 ```
 
 ### vLLM
 
 ```bash
-repoqa.search_needle_function --model "Qwen/CodeQwen1.5-7B-Chat" \
-  --caching --backend vllm
+repoqa.search_needle_function --model "Qwen/CodeQwen1.5-7B-Chat" --backend vllm
 ```
 
 ### HuggingFace transformers
 
 ```bash
-repoqa.search_needle_function --model "Qwen/CodeQwen1.5-7B-Chat" \
-  --caching --backend hf --trust-remote-code
+repoqa.search_needle_function --model "Qwen/CodeQwen1.5-7B-Chat" --backend hf --trust-remote-code
 ```
 
 ### Google Generative AI API (Gemini)
 
 ```bash
-repoqa.search_needle_function --model "gemini-1.5-pro-latest" --caching --backend google
+repoqa.search_needle_function --model "gemini-1.5-pro-latest" --backend google
 ```
 
 > [!Tip]
@@ -93,7 +91,7 @@ repoqa.search_needle_function --model "gemini-1.5-pro-latest" --caching --backen
 > - `--backend`: `vllm` (default) or `openai`
 > - `--base-url`: OpenAI API base URL
 > - `--code-context-size` (default: 16384): number of tokens of repository context (counted by the DeepSeekCoder tokenizer)
-> - `--caching` (default: True): accelerate subsequent runs by caching tokenization and chuncking results
+> - `--caching` (default: True): accelerate subsequent runs by caching preprocessing; pass `--nocaching` to disable
 > - `--max-new-tokens` (default: 1024): maximum number of new tokens to generate
 > - `--system-message` (default: None): system message (note that some models do not support it)
 > - `--tensor-parallel-size`: number of GPUs for tensor parallelism (vLLM only)
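
For quick reference, these flags compose into a single invocation. A sketch with illustrative values (the model, GPU count, and context size here are examples, not prescribed settings):

```bash
# Hypothetical combined run: backend, context size, generation budget,
# tensor parallelism, and caching disabled, using flags documented above.
repoqa.search_needle_function --model "Qwen/CodeQwen1.5-7B-Chat" \
  --backend vllm \
  --code-context-size 16384 \
  --max-new-tokens 1024 \
  --tensor-parallel-size 2 \
  --nocaching
```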