Commit 600bdc1

Authored by HaiHui886 and haihwang
Add Qwen1.5-72B-chat (#44)
* Push image to OpenCSG registry
* Fix pad token bug
* Remove cpu_nums and add time log
* Add deep code model
* Update OpenCSG model name
* Update DeepSeek parameters
* Add Qwen1.5
* Add Qwen 1.5 72B

Co-authored-by: haihwang <[email protected]>
1 parent ada574d commit 600bdc1

File tree: 2 files changed (+51, -1)


llmserve/backend/server/config.py (2 additions, 1 deletion)

@@ -131,7 +131,8 @@
     "opencsg/opencsg-starcoder-v0.1": "./models/text-generation--opencsg--opencsg-starcoder-15B-v0.1-pipeline.yaml",
     "OpenCSG/opencsg-starcoder-v0.1": "./models/text-generation--opencsg--opencsg-starcoder-15B-v0.1-pipeline.yaml",
     "opencsg/opencsg-deepseek-coder-1.3b-v0.1": "./models/text-generation--opencsg--opencsg-deepseek-coder-1.3b-v0.1.yaml",
-    "OpenCSG/opencsg-deepseek-coder-1.3b-v0.1": "./models/text-generation--opencsg--opencsg-deepseek-coder-1.3b-v0.1.yaml"
+    "OpenCSG/opencsg-deepseek-coder-1.3b-v0.1": "./models/text-generation--opencsg--opencsg-deepseek-coder-1.3b-v0.1.yaml",
+    "Qwen/Qwen1.5-72B-Chat": "./models/text-generation--Qwen--Qwen1.5-72B-Chat.yaml"
 }

 SERVE_RUN_HOST = "0.0.0.0"
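The change above extends a model-id-to-YAML-path registry. As a minimal sketch of how such a mapping can be resolved, the snippet below copies two entries from the diff; `resolve_model_config` is a hypothetical helper for illustration, not part of the llmserve codebase.

```python
# Sketch of resolving a model id against a registry like the one in config.py.
# The dict mirrors two entries from this commit's diff; resolve_model_config
# is a hypothetical helper, not llmserve API.
MODEL_REGISTRY = {
    "OpenCSG/opencsg-deepseek-coder-1.3b-v0.1":
        "./models/text-generation--opencsg--opencsg-deepseek-coder-1.3b-v0.1.yaml",
    "Qwen/Qwen1.5-72B-Chat":
        "./models/text-generation--Qwen--Qwen1.5-72B-Chat.yaml",
}

def resolve_model_config(model_id: str) -> str:
    """Return the pipeline YAML path registered for model_id."""
    try:
        return MODEL_REGISTRY[model_id]
    except KeyError:
        raise ValueError(f"unknown model id: {model_id}") from None

print(resolve_model_config("Qwen/Qwen1.5-72B-Chat"))  # prints the registered YAML path
```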
models/text-generation--Qwen--Qwen1.5-72B-Chat.yaml (new file, 49 additions, 0 deletions)

deployment_config:
  autoscaling_config:
    min_replicas: 1
    initial_replicas: 1
    max_replicas: 1
    target_num_ongoing_requests_per_replica: 1.0
    metrics_interval_s: 10.0
    look_back_period_s: 30.0
    smoothing_factor: 1.0
    downscale_delay_s: 300.0
    upscale_delay_s: 90.0
  ray_actor_options:
    num_cpus: 2  # a model deployment creates 3 actors; the first two cost 0.1 CPU each, and model inference uses the num_cpus_per_worker set at the end of this file
model_config:
  warmup: False
  model_task: text-generation
  model_id: Qwen/Qwen1.5-72B-Chat
  max_input_words: 800
  initialization:
    s3_mirror_config:
      bucket_uri: /data/models/Qwen1.5-72B-Chat/
    initializer:
      type: DeviceMap
      dtype: float16
      from_pretrained_kwargs:
        use_cache: true
        trust_remote_code: true
      # use_kernel: true   # for the DeepSpeed initializer type only
      # max_tokens: 1536   # for the DeepSpeed initializer type only
    pipeline: defaulttransformers
    # pipeline: default
  generation:
    max_batch_size: 1
    generate_kwargs:
      bos_token_id: 151643
      # pad_token_id: 151643
      # eos_token_id: [151645, 151643]
      do_sample: false
      max_new_tokens: 512
      repetition_penalty: 1.05
      temperature: 0.7
      top_p: 0.8
      top_k: 20
    prompt_format: "'role': 'user', 'content': {instruction}"
    # stopping_sequences: ["### Response:", "### End"]
scaling_config:
  num_workers: 1
  num_gpus_per_worker: 7
  num_cpus_per_worker: 32  # for inference
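The `generate_kwargs` block above is ultimately passed to the model's generation call, so malformed values (for example a YAML trailing comma turning `151643,` into a string) fail only at inference time. The sketch below shows a hypothetical up-front sanity check; `validate_kwargs` and the value ranges are illustrative assumptions, not llmserve API.

```python
# Hypothetical sanity check for a generate_kwargs block like the one above.
# validate_kwargs is an illustrative helper, not part of llmserve.
GENERATE_KWARGS = {
    "bos_token_id": 151643,
    "do_sample": False,
    "max_new_tokens": 512,
    "repetition_penalty": 1.05,
    "temperature": 0.7,
    "top_p": 0.8,
    "top_k": 20,
}

def validate_kwargs(kw: dict) -> list:
    """Return a list of problems found in a generation-kwargs dict."""
    problems = []
    if not 0.0 < kw.get("temperature", 1.0) <= 2.0:
        problems.append("temperature out of (0, 2]")
    if not 0.0 < kw.get("top_p", 1.0) <= 1.0:
        problems.append("top_p out of (0, 1]")
    if kw.get("max_new_tokens", 1) < 1:
        problems.append("max_new_tokens must be >= 1")
    if not isinstance(kw.get("bos_token_id", 0), int):
        # a stray trailing comma in YAML makes this a string, not an int
        problems.append("bos_token_id must be an int")
    return problems

assert validate_kwargs(GENERATE_KWARGS) == []  # the config above passes
```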
