Commit ca03421

Author: Chen Fei

Remove etcd dependency in Orion Controller
Parent: aa2d0fe

5 files changed: +9 / -75 lines

doc/Orion-User-Guide.md

Lines changed: 0 additions & 3 deletions
@@ -92,7 +92,6 @@ Orion Client can be deployed on a local physical machine, in a container, or in a VM,
 ## Deploying Orion Controller
 ### Environment requirements
 * Linux CentOS 7.x or Ubuntu 14.04 and later
-* etcd service
 
 ### Starting the Orion Controller service
 
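@@ -106,7 +105,6 @@ Orion Controller supports different configuration options through startup arguments. orion-controller
 
 -i Bind the service to the specified IP address. Default: "0.0.0.0"
 -p Port the service listens on. Default: 9123
---kv Address of the etcd key-value storage backend. Default: "http://127.0.0.1:2379"
 --log-level Log level. Four levels are supported: [debug|info|warn|error]. Default: "info"
 --log Path of the log output file. Logs go to standard output by default
 --vgpu Number of vGPUs each physical GPU is split into. Default: 4. Valid values are natural numbers greater than or equal to 1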
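@@ -373,7 +371,6 @@ The Orion Client configuration file is named client.conf and can be placed in one of two locations. One
 * Check whether resources are indeed exhausted because too many Orion Clients are running.
 * Check whether Orion Client applications were run serially several times in quick succession, in which case the system needs some time to finish releasing global resources.
 * Check whether the Orion Server service failed to start.
-* The Linux etcd service is not installed and running in the Orion Controller environment.
 * If the remaining vGPU resources are not 0:
 * Confirm that the resources requested through ORION_VGPU and ORION_GMEM in the Orion Client environment really exceed what can actually be allocated. The following resource requests cannot be satisfied:
 * The GPU memory configured by ORION_GMEM exceeds the memory of a single physical GPU.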

doc/quick-start/README.md

Lines changed: 8 additions & 26 deletions
@@ -84,29 +84,6 @@ EXAMPLES:
 
 ## <a id="controller"></a> Step 1: Deploy Orion Controller
 
-### Preparing the installation environment
-Environment requirements
-
-* etcd
-
-If `etcd` is not yet installed on the system, it can be installed and started with the following commands:
-
-```bash
-# Ubuntu 16.04
-sudo apt install -y etcd
-
-# CentOS 7.x
-sudo yum install -y etcd.x86_64
-sudo systemctl start etcd
-```
-
-You can use `systemctl` to confirm that the `etcd` service is running properly:
-
-```bash
-systemctl status etcd
-```
-
-When it is running properly, the colored text `active (running)` appears on the screen.
 ### Start Orion Controller
 
 When deploying the Orion vGPU software on a cluster, Orion Controller can run on any node. For convenience, this document deploys Orion Controller and Orion Server on the same node, which has a GPU.
@@ -117,13 +94,18 @@ systemctl status etcd
 ./orion-controller start
 ```
 
-Normally, the screen prints logs like the following, showing that Orion Controller has connected to the `etcd` service and is listening for Orion vGPU resource requests on all network addresses.
+Normally, the screen prints logs like the following, showing that Orion Controller is listening for Orion vGPU resource requests on all network addresses.
 
 ```bash
-INFO[0000] Creating database connection to http://127.0.0.1:2379
-INFO[0000] Database connection is created.
+ERRO[0000] Config File "controller" Not Found in "[/etc/orion]"
+ERRO[0000] read config file error, enable default config
+WARN[0000] use database default config
+INFO[0000] Etcd Server is ready!
+INFO[0000] Creating database connection to http://127.0.0.1:23790
+INFO[0000] Database connection is created.
 INFO[0000] Controller is launching, listening on 0.0.0.0:9123
 ```
+The first two ERRO messages in the log above only mean that no configuration file was found on the system, in which case Orion Controller falls back to its default settings. These error messages can be safely ignored.
 
 ### (Optional) Run Orion Controller in the background and write logs to a file
 The following command runs the Orion Controller process in the background and writes its logs to the `controller.log` file in the working directory. You can view the log with commands such as `cat`.
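The README change says the first two ERRO lines only mean that no configuration file was found. When logs are written to `controller.log` as in the optional background step, a small filter can surface any other errors. This is a minimal sketch under stated assumptions: `unexpected_errors` is a hypothetical helper, not part of Orion, and it assumes error lines either start with `ERRO` (as in the sample log) or contain `level=error`; adjust the patterns to the actual log format.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Orion): print error lines from an Orion
# Controller log, skipping the two messages that only mean "no config file
# found, using default settings".
# Assumes error lines either start with "ERRO" (as in the sample log) or
# contain "level=error"; adjust the patterns to the real log format.
# Exit status follows grep: 0 if any unexpected error line is printed, 1 if none.
unexpected_errors() {
    local logfile="$1"
    grep -E '^ERRO|level=error' "$logfile" \
        | grep -v \
            -e 'Config File .*controller.* Not Found' \
            -e 'read config file error, enable default config'
}
```

Usage: `unexpected_errors controller.log` prints nothing when the only error lines are the two benign configuration messages.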

doc/quick-start/appendix.md

Lines changed: 1 addition & 2 deletions
@@ -65,7 +65,7 @@ systemctl start firewalld
 
 ./orion-check install all
 
-For the local Orion vGPU software to work properly, at least the basic items `OS`, `CUDA`, `CUDNN`, `NVIDIA GPU`, and `etcd service` must have the status `Yes`.
+For the local Orion vGPU software to work properly, at least the basic items `OS`, `CUDA`, `CUDNN`, and `NVIDIA GPU` must have the status `Yes`.
 
 For GPUs other than those with the Volta and Turing architectures (for example, V100 and 2080Ti), make sure that `CUDA MPS` is turned off.
 
@@ -81,7 +81,6 @@ CUDA : [Yes]
 CUDNN : [Yes]
 NVIDIA GPU : [Yes]
 NVIDIA CUDA MPS : [OFF]
-etcd service : [Yes]
 QEMU-KVM environment : [No]
 Docker container environment : [Yes]
 Orion Server binary: [Yes]
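With `etcd service` dropped from the list, the base items that must read `Yes` are `OS`, `CUDA`, `CUDNN`, and `NVIDIA GPU`. A minimal sketch of checking this automatically, assuming the summary lines keep the `printf "%-40s [%s]\n"` layout used in `orion-check`; `check_required_items` is a hypothetical helper, not part of `orion-check`:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of orion-check): verify that the base items
# reported by `orion-check install all` are all "[Yes]".
# Assumes summary lines are printed like:
#   printf "%-40s [%s]\n" "OS :" "$summary_os_support"
required_items=("OS" "CUDA" "CUDNN" "NVIDIA GPU")

check_required_items() {
    local report="$1" item status=0
    for item in "${required_items[@]}"; do
        # Item name at line start, then " :", optional padding, then [Yes].
        if ! grep -Eq "^${item} :[[:space:]]*\[Yes\]" <<< "$report"; then
            echo "Not satisfied: ${item}"
            status=1
        fi
    done
    return "$status"
}
```

Usage: `check_required_items "$(./orion-check install all)"` prints each unsatisfied item and returns a non-zero status if any of them is missing.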

orion-check

Lines changed: 0 additions & 44 deletions
@@ -278,42 +278,6 @@ function find_mps_support {
 }
 
 
-etcd_running=0
-etcd_version=
-etcd_v2=0
-etcd_v3=0
-summary_etcd_support="No"
-function find_etcd {
-    echo -e "\nSeaching etcd service ..."
-    bin=
-    running_bin=$(ps -aux | grep -v grep | grep -w etcd | awk '{print $11}')
-    if [ -z "$running_bin" ]; then
-        if which etcd > /dev/null 2>&1; then
-            bin=$(which etcd)
-        fi
-    else
-        bin=$running_bin
-        etcd_running=1
-        summary_etcd_support="Yes"
-    fi
-
-    if [ -z "$bin" ]; then
-        echo "No etcd is running or installed in the system."
-        return 1
-    fi
-
-    etcd_version=$($bin --version | grep "etcd Version" | awk '{print $3}')
-    echo "etcd (version $etcd_version) is installed in $bin"
-
-    major=${etcd_version:0:1}
-    if [ $major -eq 2 ]; then
-        etcd_v2=1
-    elif [ $major -eq 3 ]; then
-        etcd_v3=1
-    fi
-}
-
-
 qemu_api_version=
 qemu_version=
 qemu_version_major=
@@ -775,7 +739,6 @@ if [ "$1" == "install" ]; then
         find_cudnn
         find_nvidia_gpu
         find_mps_support
-        find_etcd
         find_qemu_kvm
         find_docker
         check_server_install
@@ -788,7 +751,6 @@ if [ "$1" == "install" ]; then
         printf "%-40s [%s]\n" "CUDNN :" "$summary_cudnn_support"
         printf "%-40s [%s]\n" "NVIDIA GPU :" "$summary_nvidia_gpu_support"
         printf "%-40s [%s]\n" "NVIDIA CUDA MPS :" "$summary_nvidia_mps"
-        printf "%-40s [%s]\n" "etcd service :" "$summary_etcd_support"
         printf "%-40s [%s]\n" "QEMU-KVM environment :" "$summary_qemu_kvm_support"
         printf "%-40s [%s]\n" "Docker container environment :" "$summary_docker_support"
         printf "%-40s [%s]\n" "Orion Server binary:" "$summary_server_support"
@@ -828,16 +790,10 @@ if [ "$1" == "install" ]; then
         printf "%-40s [%s]\n" "Docker container environment :" "$summary_docker_support"
     elif [ "$2" == "controller" ]; then
         check_os
-        find_etcd
 
         echo -e "\n==============================================="
         echo -e "Installation summaries :\n"
         printf "%-40s [%s]\n" "OS :" "$summary_os_support"
-        printf "%-40s [%s]\n" "etcd service :" "$summary_etcd_support"
-
-        if [ $summary_etcd_support == "No" ]; then
-            echo -e "\n\033[31mOrion Controller can not be installed in this environment.\033[0m"
-        fi
     else
         echo "Invalid parameters."
         print_help

orion-controller

15.3 MB
Binary file not shown.
