
Commit 67278d7

update VLM model docs

1 parent 9da13f3 commit 67278d7

File tree

17 files changed: +415 −344 lines changed

samples/genie/c++/API.md

Lines changed: 0 additions & 26 deletions
````diff
@@ -1,31 +1,5 @@
 # GenieAPIService API <br>
 
-## Note: <br>
-Using '127.0.0.1:8910' to access a local service is about 2 seconds faster in establishing a connection compared to using 'localhost:8910'. It seems like the domain name resolution is taking extra time.
-
-## Parameters: <br>
--c, --config_file: Path to the config file.<br>
--m, --model_name: Name of the model to use.<br>
---adapter: if using lora model, set the adapter of lora.<br>
---lora_aplha: if using lora model, set lora value weight.<br>
--l, --load_model: if load model.<br>
--a, --all_text: Output all text includes tool calls text. Disabled by default.<br>
--t, --enable_thinking: Enable thinking model. Disabled by default.<br>
--v, --version: Print version info and exit.<br>
--n, --num_response: The number of dialogue turns saved in the history record. If you do not need to enable the historical context feature, please set this value to 0.<br>
--o, --min_output_num: The number of tokens reserved for output.<br>
--d, --loglevel: log level setting for record.<br>
--f, --logfile: log file path, it's a option.<br>
--p, --port: Port used for running.<br>
-
-Note: Please note that the input length must not exceed the maximum number of tokens reserved for the input, which means it cannot exceed the model's maximum context length minus the value set for '--min_output_num'. You can invoke the Text Splitter to send the input text to the server for segmentation, and then sequentially pass the split segments to the LLM to complete the question-answering process.<br>
-Note: It is recommended to disable thinking mode when using the tools call function.<br>
-Note: You can refer to [GenieAPIClientTools.py](../python/GenieAPIClientTools.py) on how to use tools call.<br>
-
-```
-GenieAPIService\GenieAPIService.exe -c "genie\python\models\Qwen3.0-8B-v31\config.json" -l --all_text --enable_thinking --num_response 10 --min_output_num 1024 -p 8096
-```
-
 ## Text Splitter
 This function can divide a long text into multiple paragraphs according to the priority order of the specified delimiter and the maximum length of each paragraph. Length is counted by token number instead of text length. You can also use this function to calculate the token number of text. <br>
 You can get the sample code on how to use Text Splitter
````
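The Text Splitter behavior described in the context lines above can be sketched in Python. This is an illustrative sketch, not the service's implementation: `split_text` and `count_tokens` are hypothetical names, and a whitespace word count stands in for the service's real tokenizer-based token count.

```python
# Hypothetical sketch of the Text Splitter: divide long text into chunks by
# trying delimiters in priority order, keeping each chunk within a token budget.

def count_tokens(text: str) -> int:
    # Stand-in for the service's tokenizer-based token count.
    return len(text.split())

def split_text(text: str, max_tokens: int,
               delimiters=("\n\n", "\n", ". ", " ")) -> list[str]:
    if count_tokens(text) <= max_tokens:
        return [text] if text else []
    for delim in delimiters:  # try delimiters in priority order
        parts = [p for p in text.split(delim) if p]
        if len(parts) > 1:
            chunks, current = [], ""
            for part in parts:
                candidate = current + delim + part if current else part
                if count_tokens(candidate) <= max_tokens:
                    current = candidate
                else:
                    if current:
                        chunks.append(current)
                    if count_tokens(part) <= max_tokens:
                        current = part
                    else:
                        # A single part is still too long: recurse with the
                        # remaining, lower-priority delimiters.
                        chunks.extend(split_text(part, max_tokens, delimiters))
                        current = ""
            if current:
                chunks.append(current)
            return chunks
    return [text]  # no delimiter produced a split; return as-is
```

Because `count_tokens` is exposed separately, the same sketch mirrors the documented side use of the splitter for counting tokens in a text.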

samples/genie/c++/README.md

Lines changed: 14 additions & 22 deletions
````diff
@@ -10,31 +10,31 @@
 This software is provided “as is,” without any express or implied warranties. The authors and contributors shall not be held liable for any damages arising from its use. The code may be incomplete or insufficiently tested. Users are solely responsible for evaluating its suitability and assume all associated risks. <br>
 Note: Contributions are welcome. Please ensure thorough testing before deploying in critical systems.
 
-## Introduction
+## Introduction
 This sample helps developers use C++ to build Genie based Open AI compatibility API service on Windows on Snapdragon (WoS), Mobile and Linux platforms.
 
 ## Features
 • Support LLM on both CPU & NPU [*NEW!*] <br>
-• Support both stream and none stream mode <br>
+• Support both stream and none stream mode <br>
 • Support switching between models <br>
 • Support customization model <br>
-• Support text splitter feature <br>
-• Support tools call <br>
-• Support enable/disable thinking mode <br>
-• Support lora <br>
-• Support history feature <br>
+• Support text splitter feature <br>
+• Support tools call <br>
+• Support enable/disable thinking mode <br>
+• Support lora <br>
+• Support history feature <br>
 
 ## GenieAPIService
 Genie OpenAI Compatible API Service.
 
 This is an OpenAI compatible API service that can be used to access the Genie AI model.
 This service can be used on multiple platforms such as Android, Windows, Linux, etc.
 
-### Run the service on WoS:
+### Run the service on WoS:
 You can also run the batch file from [QAI AppBuilder Launcher](../../../tools/launcher/) to setup the environment automatically. <br>
 1. [Setup LLM models](https://github.com/quic/ai-engine-direct-helper/tree/main/samples/genie/python#step-3-download-models-and-tokenizer-files) first before running this service. <br>
 2. Download [GenieAPIService](https://github.com/quic/ai-engine-direct-helper/releases/download/v2.38.0/GenieAPIService_v2.1.0_QAIRT_v2.38.0_v73.zip) and copy the subdirectory "GenieAPIService" to path "ai-engine-direct-helper\samples".<br>
-3. Run the following commands to launch the Service (Do *not* close this terminal window while service is running).
+3. Run the following commands to launch the Service (Do *not* close this terminal window while service is running).
 
 ```
 cd ai-engine-direct-helper\samples
@@ -67,19 +67,11 @@ INFO: Service Is Ready Now!
 ## GenieAPIService API:
 Refere to [API](./API.md) for detailed information.
 
-## Client Usage:
-The service can be access through the ip address '127.0.0.1:8910', it's compatible with OpenAI API.
-
-### C++ Client Sample Code:
-Here is a C++ client sample: [GenieAPIClient.cpp](Service/GenieAPIClient.cpp). You can get the compiled 'GenieAPIClient.exe' from [GenieAPIService](https://github.com/quic/ai-engine-direct-helper/releases/download/v2.38.0/GenieAPIService_v2.1.0_QAIRT_v2.38.0_v73.zip). The sample app depends on the dynamical library 'libcurl.dll' which also included in the 'GenieAPIService' package.
-
-We can run it with the command below in a new terminal window:
-```
-GenieAPIService\GenieAPIClient.exe --prompt "How to fish?" --stream --model "IBM-Granite-v3.1-8B"
-```
+## GenieService & C++ Client usage:
+It is located at [Docs](Service/docs/USAGE.MD)
 
 ### Python Client Sample Code:
-Here is a Python client sample (You can save it to 'GenieAPIClient.py'):
+Here is a Python client sample (You can save it to 'GenieAPIClient.py'):
 
 ```
 import argparse
@@ -114,10 +106,10 @@ else:
 print(response.choices[0].message.content)
 ```
 
-We can run it with the command below in a new terminal window:
+We can run it with the command below in a new terminal window:
 ```
 python GenieAPIClient.py --prompt "How to fish?" --stream
 ```
 
 ### More Sample Code
-You can find more client sample code [here](../python/README.md#sample-list).
+You can find more client sample code [here](../python/README.md#sample-list).
````
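The Python client sample in the diff above is only partially visible (the hunk elides its middle). A minimal sketch of such a client is shown below, assuming the `openai` package and the OpenAI-compatible service at '127.0.0.1:8910' mentioned in the removed README text; the `/v1` base path, function names, and default model name are our assumptions, not confirmed by the source.

```python
# Minimal, illustrative client for the OpenAI-compatible GenieAPIService.
# Assumptions: `pip install openai`; service reachable at http://127.0.0.1:8910;
# "/v1" endpoint path and the default model name are hypothetical.
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="GenieAPIService sample client")
    parser.add_argument("--prompt", required=True, help="Question to send")
    parser.add_argument("--model", default="IBM-Granite-v3.1-8B", help="Model name")
    parser.add_argument("--stream", action="store_true", help="Stream the reply")
    return parser

def ask(prompt: str, model: str, stream: bool) -> None:
    from openai import OpenAI
    client = OpenAI(base_url="http://127.0.0.1:8910/v1", api_key="not-needed")
    response = client.chat.completions.create(
        model=model, stream=stream,
        messages=[{"role": "user", "content": prompt}],
    )
    if stream:
        for chunk in response:  # print tokens as they arrive
            print(chunk.choices[0].delta.content or "", end="", flush=True)
        print()
    else:
        print(response.choices[0].message.content)

if __name__ == "__main__":
    args = build_parser().parse_args()
    ask(args.prompt, args.model, args.stream)
```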
Lines changed: 2 additions & 1 deletion
```diff
@@ -1,2 +1,3 @@
 ai-engine-direct-helper/
-build/
+build/
+GenieService_v*
```
Lines changed: 52 additions & 0 deletions
````diff
@@ -0,0 +1,52 @@
+# GenieService Usage
+
+GenieService is a client/server pair of HTTP processes.
+
+Launch the server first, then use the client to ask questions.
+
+## Service start:
+
+The service starts by loading the model config file. If you need to load VLM models, please follow
+the [VLM model layout](VLM_Deployment.MD).
+
+`GenieService.exe -c models/Qwen2.0-7B-SSD/config.json -l`
+
+or, for a VLM example:
+
+`GenieService.exe -c models/qwen2.5vl3b/config.json -l`
+
+There are several options. You can also type `GenieService.exe -h` to look up more usage information.
+
+```
+Options:
+  -h,--help                Print this help message and exit
+  -c,--config_file TEXT    Path to the config file.
+  --adapter TEXT           the adapter of lora
+  -l,--load_model          Load the model.
+  -a,--all_text            Output all text includes tool calls text.
+  -t,--enable_thinking     Enable thinking mode.
+  -v,--version             Print version info and exit.
+  -n,--num_response INT    The number of rounds saved in the historical record
+  -o,--min_output_num INT  The minimum number of tokens output
+  -d,--loglevel INT        log level setting for record. 1: Error, 2: Warning, 3: Info, 4: Debug, 5: Verbose
+  -f,--logfile TEXT        log file path, it's optional
+  --lora_alpha FLOAT       lora Alpha Value
+  -p,--port INT            Port used for running
+```
+
+Note: Please note that the input length must not exceed the maximum number of tokens reserved for the input, which means it cannot exceed the model's maximum context length minus the value set for '--min_output_num'. You can invoke the Text Splitter to send the input text to the server for segmentation, and then sequentially pass the split segments to the LLM to complete the question-answering process.<br>
+Note: It is recommended to disable thinking mode when using the tools call function.<br>
+Note: You can refer to [GenieAPIClientTools.py](../../python/GenieAPIClientTools.py) on how to use tools call.<br>
+
+## Client Start
+
+### Text models
+
+Ask your question directly:
+
+`GenieClient.exe --prompt "how to fish?" --stream`
+
+### Image models
+
+Provide both your question and the image to the client:
+
+`GenieClient.exe --prompt "what does the image depict?" --img test.png --stream`
````
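The input-budget rule from the Note in the added USAGE.MD text (prompt tokens ≤ context length − '--min_output_num') can be sketched as follows. Function names are illustrative, `send_to_llm` is a stub, and a word count stands in for real token counting.

```python
# Sketch of the documented input budget: the prompt may occupy at most the
# model's context length minus the tokens reserved for output.

def max_input_tokens(context_length: int, min_output_num: int) -> int:
    # Tokens left for the prompt after reserving output space.
    return context_length - min_output_num

def ask_in_segments(segments, budget, send_to_llm):
    # Feed pre-split segments (e.g. produced by the Text Splitter) to the LLM
    # in order, rejecting any segment that would overflow the input budget.
    answers = []
    for seg in segments:
        if len(seg.split()) > budget:  # word count stands in for token count
            raise ValueError("segment exceeds the input budget; split it further")
        answers.append(send_to_llm(seg))
    return answers
```

With the service example above (`--min_output_num 1024` on a 4096-token context), the prompt budget would be 3072 tokens.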
Lines changed: 22 additions & 0 deletions
````diff
@@ -0,0 +1,22 @@
+# Deployment
+
+- [qwen2.5vl3b](#qwen25vl3b)
+
+## qwen2.5vl3b
+
+```
+models/qwen2.5vl3b
+│   config.json
+│   embedding_weights.raw
+│   htp_backend_ext_config.json
+│   llm_model-0.bin
+│   llm_model-1.bin
+│   prompt.json
+│   tokenizer.json
+│   veg.serialized.bin
+└───raw
+        full_attention_mask.raw
+        position_ids_cos.raw
+        position_ids_sin.raw
+        window_attention_mask.raw
+```
````

samples/genie/c++/Service/src/GenieAPIService/CMakeLists.txt

Lines changed: 1 addition & 0 deletions
```diff
@@ -58,6 +58,7 @@ add_executable(${CMAKE_PROJECT_NAME} ${SERVICE_SOURCES})
 target_include_directories(${CMAKE_PROJECT_NAME} PRIVATE ${INCLUDE_PATH})
 target_compile_definitions(${CMAKE_PROJECT_NAME} PRIVATE -DNOMINMAX -DBUILDING_LIBCURL -D_HAS_STD_BYTE=0)
 target_link_directories(${CMAKE_PROJECT_NAME} PRIVATE ${EXTERNAL_LIB_PATH})
+target_link_options(${CMAKE_PROJECT_NAME} PRIVATE /DEBUG)
 target_link_libraries(${CMAKE_PROJECT_NAME} PRIVATE
 libappbuilder
 Crypt32
```

samples/genie/c++/Service/src/GenieAPIService/platform.cmake

Lines changed: 1 addition & 1 deletion
```diff
@@ -57,7 +57,7 @@ if (USE_MNN)
 -DCMAKE_CXX_COMPILER=${MSVC_CLANG_COMPILER}
 -DCMAKE_LINKER=${MSVC_CLANG_LINKER}
 -DLLM_SUPPORT_VISION=ON
--DMNN_BUILD_OPENCV=ON
+-DMNN_BUILD_OPENCV=ON
 -DMNN_IMGCODECS=ON
 -DLLM_SUPPORT_AUDIO=ON
 -DMNN_BUILD_AUDIO=ON
```

0 commit comments
