Distributed Split Inference

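A distributed model inference system that splits a large language model (such as Qwen3-32B) between a client and a server, with the aim of using resources more effectively and improving inference efficiency.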

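Project Overview

This project implements model split inference: a large language model is divided by layer into two parts:

Client: runs the first few layers of the model, handling the input embeddings and the computation of the initial layers
Server: runs the remaining layers and the normalization layer, completing the computation and producing the output

This architecture lets a resource-constrained device take part in running a large language model, while the server's greater compute capacity handles the rest of the work.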
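
The layer split described above can be sketched with a toy model. This is a minimal illustration under stated assumptions, not the project's implementation: the "layers" are plain Python functions standing in for transformer blocks, and the names `make_toy_layers`, `final_norm`, `client_forward`, and `server_forward` are hypothetical. It shows only that running the first `client_layers` layers on one side and the rest (plus normalization) on the other gives the same result as running the whole stack in one place.

```python
from typing import Callable, List

Layer = Callable[[List[float]], List[float]]


def make_toy_layers(num_layers: int) -> List[Layer]:
    """Build stand-in layers; each scales and shifts its input (not a real transformer block)."""
    def make(i: int) -> Layer:
        return lambda hidden: [h * 0.5 + i for h in hidden]
    return [make(i) for i in range(num_layers)]


def final_norm(hidden: List[float]) -> List[float]:
    """Stand-in for the final normalization layer: subtract the mean."""
    mean = sum(hidden) / len(hidden)
    return [h - mean for h in hidden]


def run_layers(layers: List[Layer], hidden: List[float]) -> List[float]:
    """Apply the given layers in order."""
    for layer in layers:
        hidden = layer(hidden)
    return hidden


def client_forward(layers: List[Layer], client_layers: int, hidden: List[float]) -> List[float]:
    """Client side: run only the first `client_layers` layers."""
    return run_layers(layers[:client_layers], hidden)


def server_forward(layers: List[Layer], client_layers: int, hidden: List[float]) -> List[float]:
    """Server side: run the remaining layers, then the normalization layer."""
    return final_norm(run_layers(layers[client_layers:], hidden))


layers = make_toy_layers(8)
inputs = [1.0, 2.0, 3.0]
client_layers = 3  # illustrative value

# In a real deployment this intermediate state would be sent over the network.
intermediate = client_forward(layers, client_layers, inputs)
split_output = server_forward(layers, client_layers, intermediate)

# Reference: the whole stack run in one place.
full_output = final_norm(run_layers(layers, inputs))
print(split_output == full_output)
```

Only the intermediate hidden state crosses the client/server boundary, so the choice of `client_layers` decides how much computation stays on the device.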

Starting the Server

python run_server.py

Running the Client

python run_client.py

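Configuration Parameters

model_name: path to the model (e.g. /opt/models/Qwen3-32B/layers_safetensors)
client_layers: number of layers run on the client
max_new_tokens: maximum number of tokens to generate
addr: communication address (default: tcp://0.0.0.0:5558)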
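
One way to keep these four parameters together is a small validated container. The sketch below is an assumption for illustration only: the `SplitConfig` class and its checks are hypothetical and are not taken from the project's code, and the `client_layers` and `max_new_tokens` values are arbitrary examples. Only the field names, the example model path, and the default `addr` come from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SplitConfig:
    """Hypothetical container for the parameters listed above."""
    model_name: str                   # path to the model
    client_layers: int                # number of layers run on the client
    max_new_tokens: int               # maximum number of tokens to generate
    addr: str = "tcp://0.0.0.0:5558"  # default taken from the README

    def __post_init__(self) -> None:
        # Basic sanity checks; the real scripts may validate differently.
        if self.client_layers < 1:
            raise ValueError("client_layers must be at least 1")
        if self.max_new_tokens < 1:
            raise ValueError("max_new_tokens must be at least 1")
        if "://" not in self.addr:
            raise ValueError("addr should look like scheme://host:port")


config = SplitConfig(
    model_name="/opt/models/Qwen3-32B/layers_safetensors",
    client_layers=4,     # illustrative value
    max_new_tokens=64,   # illustrative value
)
print(config)
```

Note that 0.0.0.0 is a bind-all address, which makes sense on the listening side; a client on another machine would typically need the server's actual host name or IP in `addr`.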
