
Commit 67278d7

update VLM model docs

1 parent 9da13f3 commit 67278d7

File tree

17 files changed: +415 −344 lines changed

samples/genie/c++/API.md

Lines changed: 0 additions & 26 deletions
````diff
@@ -1,31 +1,5 @@
 # GenieAPIService API <br>
 
-## Note: <br>
-Using '127.0.0.1:8910' to access a local service is about 2 seconds faster in establishing a connection compared to using 'localhost:8910'. It seems like the domain name resolution is taking extra time.
-
-## Parameters: <br>
--c, --config_file: Path to the config file.<br>
--m, --model_name: Name of the model to use.<br>
---adapter: if using lora model, set the adapter of lora.<br>
---lora_aplha: if using lora model, set lora value weight.<br>
--l, --load_model: if load model.<br>
--a, --all_text: Output all text includes tool calls text. Disabled by default.<br>
--t, --enable_thinking: Enable thinking model. Disabled by default.<br>
--v, --version: Print version info and exit.<br>
--n, --num_response: The number of dialogue turns saved in the history record. If you do not need to enable the historical context feature, please set this value to 0.<br>
--o, --min_output_num: The number of tokens reserved for output.<br>
--d, --loglevel: log level setting for record.<br>
--f, --logfile: log file path, it's a option.<br>
--p, --port: Port used for running.<br>
-
-Note: Please note that the input length must not exceed the maximum number of tokens reserved for the input, which means it cannot exceed the model's maximum context length minus the value set for '--min_output_num'. You can invoke the Text Splitter to send the input text to the server for segmentation, and then sequentially pass the split segments to the LLM to complete the question-answering process.<br>
-Note: It is recommended to disable thinking mode when using the tools call function.<br>
-Note: You can refer to [GenieAPIClientTools.py](../python/GenieAPIClientTools.py) on how to use tools call.<br>
-
-```
-GenieAPIService\GenieAPIService.exe -c "genie\python\models\Qwen3.0-8B-v31\config.json" -l --all_text --enable_thinking --num_response 10 --min_output_num 1024 -p 8096
-```
-
 ## Text Splitter
 This function can divide a long text into multiple paragraphs according to the priority order of the specified delimiter and the maximum length of each paragraph. Length is counted by token number instead of text length. You can also use this function to calculate the token number of text. <br>
 You can get the sample code on how to use Text Splitter
````
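The Text Splitter behavior described in the context lines above can be sketched in Python. This is an illustrative sketch, not the service's implementation: `split_text` and `count_tokens` are hypothetical names, and a whitespace word count stands in for the service's real tokenizer-based token count.

```python
# Hypothetical sketch of the Text Splitter: divide long text into chunks by
# trying delimiters in priority order, keeping each chunk within a token budget.

def count_tokens(text: str) -> int:
    # Stand-in for the service's tokenizer-based token count.
    return len(text.split())

def split_text(text: str, max_tokens: int,
               delimiters=("\n\n", "\n", ". ", " ")) -> list[str]:
    if count_tokens(text) <= max_tokens:
        return [text] if text else []
    for delim in delimiters:  # try delimiters in priority order
        parts = [p for p in text.split(delim) if p]
        if len(parts) > 1:
            chunks, current = [], ""
            for part in parts:
                candidate = current + delim + part if current else part
                if count_tokens(candidate) <= max_tokens:
                    current = candidate
                else:
                    if current:
                        chunks.append(current)
                    if count_tokens(part) <= max_tokens:
                        current = part
                    else:
                        # A single part is still too long: recurse with the
                        # remaining, lower-priority delimiters.
                        chunks.extend(split_text(part, max_tokens, delimiters))
                        current = ""
            if current:
                chunks.append(current)
            return chunks
    return [text]  # no delimiter produced a split; return as-is
```

Because `count_tokens` is exposed separately, the same sketch mirrors the documented side use of the splitter for counting tokens in a text.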

samples/genie/c++/README.md

Lines changed: 14 additions & 22 deletions
````diff
@@ -10,31 +10,31 @@
 This software is provided “as is,” without any express or implied warranties. The authors and contributors shall not be held liable for any damages arising from its use. The code may be incomplete or insufficiently tested. Users are solely responsible for evaluating its suitability and assume all associated risks. <br>
 Note: Contributions are welcome. Please ensure thorough testing before deploying in critical systems.
 
-## Introduction
+## Introduction
 This sample helps developers use C++ to build Genie based Open AI compatibility API service on Windows on Snapdragon (WoS), Mobile and Linux platforms.
 
 ## Features
 • Support LLM on both CPU & NPU [*NEW!*] <br>
-• Support both stream and none stream mode <br>
+• Support both stream and none stream mode <br>
 • Support switching between models <br>
 • Support customization model <br>
-• Support text splitter feature <br>
-• Support tools call <br>
-• Support enable/disable thinking mode <br>
-• Support lora <br>
-• Support history feature <br>
+• Support text splitter feature <br>
+• Support tools call <br>
+• Support enable/disable thinking mode <br>
+• Support lora <br>
+• Support history feature <br>
 
 ## GenieAPIService
 Genie OpenAI Compatible API Service.
 
 This is an OpenAI compatible API service that can be used to access the Genie AI model.
 This service can be used on multiple platforms such as Android, Windows, Linux, etc.
 
-### Run the service on WoS:
+### Run the service on WoS:
 You can also run the batch file from [QAI AppBuilder Launcher](../../../tools/launcher/) to setup the environment automatically. <br>
 1. [Setup LLM models](https://github.com/quic/ai-engine-direct-helper/tree/main/samples/genie/python#step-3-download-models-and-tokenizer-files) first before running this service. <br>
 2. Download [GenieAPIService](https://github.com/quic/ai-engine-direct-helper/releases/download/v2.38.0/GenieAPIService_v2.1.0_QAIRT_v2.38.0_v73.zip) and copy the subdirectory "GenieAPIService" to path "ai-engine-direct-helper\samples".<br>
-3. Run the following commands to launch the Service (Do *not* close this terminal window while service is running).
+3. Run the following commands to launch the Service (Do *not* close this terminal window while service is running).
 
 ```
 cd ai-engine-direct-helper\samples
@@ -67,19 +67,11 @@ INFO: Service Is Ready Now!
 ## GenieAPIService API:
 Refere to [API](./API.md) for detailed information.
 
-## Client Usage:
-The service can be access through the ip address '127.0.0.1:8910', it's compatible with OpenAI API.
-
-### C++ Client Sample Code:
-Here is a C++ client sample: [GenieAPIClient.cpp](Service/GenieAPIClient.cpp). You can get the compiled 'GenieAPIClient.exe' from [GenieAPIService](https://github.com/quic/ai-engine-direct-helper/releases/download/v2.38.0/GenieAPIService_v2.1.0_QAIRT_v2.38.0_v73.zip). The sample app depends on the dynamical library 'libcurl.dll' which also included in the 'GenieAPIService' package.
-
-We can run it with the command below in a new terminal window:
-```
-GenieAPIService\GenieAPIClient.exe --prompt "How to fish?" --stream --model "IBM-Granite-v3.1-8B"
-```
+## GenieService & C++ Client usage:
+It is located at [Docs](Service/docs/USAGE.MD)
 
 ### Python Client Sample Code:
-Here is a Python client sample (You can save it to 'GenieAPIClient.py'):
+Here is a Python client sample (You can save it to 'GenieAPIClient.py'):
 
 ```
 import argparse
@@ -114,10 +106,10 @@ else:
 print(response.choices[0].message.content)
 ```
 
-We can run it with the command below in a new terminal window:
+We can run it with the command below in a new terminal window:
 ```
 python GenieAPIClient.py --prompt "How to fish?" --stream
 ```
 
 ### More Sample Code
-You can find more client sample code [here](../python/README.md#sample-list).
+You can find more client sample code [here](../python/README.md#sample-list).
````
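The Python client sample in the diff above is only partially visible (the hunk elides its middle). A minimal sketch of such a client is shown below, assuming the `openai` package and the OpenAI-compatible service at '127.0.0.1:8910' mentioned in the removed README text; the `/v1` base path, function names, and default model name are our assumptions, not confirmed by the source.

```python
# Minimal, illustrative client for the OpenAI-compatible GenieAPIService.
# Assumptions: `pip install openai`; service reachable at http://127.0.0.1:8910;
# "/v1" endpoint path and the default model name are hypothetical.
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="GenieAPIService sample client")
    parser.add_argument("--prompt", required=True, help="Question to send")
    parser.add_argument("--model", default="IBM-Granite-v3.1-8B", help="Model name")
    parser.add_argument("--stream", action="store_true", help="Stream the reply")
    return parser

def ask(prompt: str, model: str, stream: bool) -> None:
    from openai import OpenAI
    client = OpenAI(base_url="http://127.0.0.1:8910/v1", api_key="not-needed")
    response = client.chat.completions.create(
        model=model, stream=stream,
        messages=[{"role": "user", "content": prompt}],
    )
    if stream:
        for chunk in response:  # print tokens as they arrive
            print(chunk.choices[0].delta.content or "", end="", flush=True)
        print()
    else:
        print(response.choices[0].message.content)

if __name__ == "__main__":
    args = build_parser().parse_args()
    ask(args.prompt, args.model, args.stream)
```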
Lines changed: 2 additions & 1 deletion
```diff
@@ -1,2 +1,3 @@
 ai-engine-direct-helper/
-build/
+build/
+GenieService_v*
```
Lines changed: 52 additions & 0 deletions
````diff
@@ -0,0 +1,52 @@
+# GenieService Usage
+
+GenieService is a client/server pair of HTTP processes.
+
+Launch the server first, then use the client to ask questions.
+
+## Service start:
+
+The service starts by loading the model config file. If you need to load VLM models, please follow
+the [VLM model layout](VLM_Deployment.MD).
+
+`GenieService.exe -c models/Qwen2.0-7B-SSD/config.json -l`
+
+or, for a VLM example:
+
+`GenieService.exe -c models/qwen2.5vl3b/config.json -l`
+
+There are several options. You can also type `GenieService.exe -h` to look up more usage information.
+
+```
+Options:
+  -h,--help                Print this help message and exit
+  -c,--config_file TEXT    Path to the config file.
+  --adapter TEXT           the adapter of lora
+  -l,--load_model          Load the model.
+  -a,--all_text            Output all text includes tool calls text.
+  -t,--enable_thinking     Enable thinking mode.
+  -v,--version             Print version info and exit.
+  -n,--num_response INT    The number of rounds saved in the historical record
+  -o,--min_output_num INT  The minimum number of tokens output
+  -d,--loglevel INT        log level setting for record. 1: Error, 2: Warning, 3: Info, 4: Debug, 5: Verbose
+  -f,--logfile TEXT        log file path, it's optional
+  --lora_alpha FLOAT       lora Alpha Value
+  -p,--port INT            Port used for running
+```
+
+Note: Please note that the input length must not exceed the maximum number of tokens reserved for the input, which means it cannot exceed the model's maximum context length minus the value set for '--min_output_num'. You can invoke the Text Splitter to send the input text to the server for segmentation, and then sequentially pass the split segments to the LLM to complete the question-answering process.<br>
+Note: It is recommended to disable thinking mode when using the tools call function.<br>
+Note: You can refer to [GenieAPIClientTools.py](../../python/GenieAPIClientTools.py) on how to use tools call.<br>
+
+## Client Start
+
+### Text models
+
+Ask your question directly:
+
+`GenieClient.exe --prompt "how to fish?" --stream`
+
+### Image models
+
+Provide both your question and the image to the client:
+
+`GenieClient.exe --prompt "what does the image depict?" --img test.png --stream`
````
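The input-budget rule from the Note in the added USAGE.MD text (prompt tokens ≤ context length − '--min_output_num') can be sketched as follows. Function names are illustrative, `send_to_llm` is a stub, and a word count stands in for real token counting.

```python
# Sketch of the documented input budget: the prompt may occupy at most the
# model's context length minus the tokens reserved for output.

def max_input_tokens(context_length: int, min_output_num: int) -> int:
    # Tokens left for the prompt after reserving output space.
    return context_length - min_output_num

def ask_in_segments(segments, budget, send_to_llm):
    # Feed pre-split segments (e.g. produced by the Text Splitter) to the LLM
    # in order, rejecting any segment that would overflow the input budget.
    answers = []
    for seg in segments:
        if len(seg.split()) > budget:  # word count stands in for token count
            raise ValueError("segment exceeds the input budget; split it further")
        answers.append(send_to_llm(seg))
    return answers
```

With the service example above (`--min_output_num 1024` on a 4096-token context), the prompt budget would be 3072 tokens.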
Lines changed: 22 additions & 0 deletions
````diff
@@ -0,0 +1,22 @@
+# Deployment
+
+- [qwen2.5vl3b](#qwen25vl3b)
+
+## qwen2.5vl3b
+
+```
+models/qwen2.5vl3b
+│   config.json
+│   embedding_weights.raw
+│   htp_backend_ext_config.json
+│   llm_model-0.bin
+│   llm_model-1.bin
+│   prompt.json
+│   tokenizer.json
+│   veg.serialized.bin
+└───raw
+        full_attention_mask.raw
+        position_ids_cos.raw
+        position_ids_sin.raw
+        window_attention_mask.raw
+```
````

samples/genie/c++/Service/src/GenieAPIService/CMakeLists.txt

Lines changed: 1 addition & 0 deletions
```diff
@@ -58,6 +58,7 @@ add_executable(${CMAKE_PROJECT_NAME} ${SERVICE_SOURCES})
 target_include_directories(${CMAKE_PROJECT_NAME} PRIVATE ${INCLUDE_PATH})
 target_compile_definitions(${CMAKE_PROJECT_NAME} PRIVATE -DNOMINMAX -DBUILDING_LIBCURL -D_HAS_STD_BYTE=0)
 target_link_directories(${CMAKE_PROJECT_NAME} PRIVATE ${EXTERNAL_LIB_PATH})
+target_link_options(${CMAKE_PROJECT_NAME} PRIVATE /DEBUG)
 target_link_libraries(${CMAKE_PROJECT_NAME} PRIVATE
 libappbuilder
 Crypt32
```

samples/genie/c++/Service/src/GenieAPIService/platform.cmake

Lines changed: 1 addition & 1 deletion
```diff
@@ -57,7 +57,7 @@ if (USE_MNN)
 -DCMAKE_CXX_COMPILER=${MSVC_CLANG_COMPILER}
 -DCMAKE_LINKER=${MSVC_CLANG_LINKER}
 -DLLM_SUPPORT_VISION=ON
--DMNN_BUILD_OPENCV=ON
+-DMNN_BUILD_OPENCV=ON
 -DMNN_IMGCODECS=ON
 -DLLM_SUPPORT_AUDIO=ON
 -DMNN_BUILD_AUDIO=ON
```

0 commit comments
