Commit 175b762

Merge pull request #82 from intel/update-branch

245 user story add digital avatar use case (#246)

2 parents: 2af1f5c + 8a468f3

200 files changed: +33361 additions, -11 deletions
Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
__pycache__
.env

ffmpeg*/
checkpoints
cache/
backend/musetalk/models
backend/musetalk/data/avatars
backend/wav2lip/wav2lip/results
backend/wav2lip/wav2lip/temp
weights/*
backend/liveportrait/templates
Lines changed: 118 additions & 0 deletions
@@ -0,0 +1,118 @@
# Digital Avatar

A digital avatar that utilizes Image-to-Video, Text-to-Speech, Speech-to-Text, and LLM technologies to create an interactive avatar.

![Demo](./docs/demo.gif)


## Table of Contents
- [Architecture Diagram](#architecture-diagram)
- [Requirements](#requirements)
  - [Minimum](#minimum)
- [Application Ports](#application-ports)
- [Setup](#setup)
  - [Prerequisite](#prerequisite)
  - [Setup ENV](#setup-env)
  - [Build Docker Container](#build-docker-container)
  - [Start Docker Container](#start-docker-container)
  - [Access the App](#access-the-app)
- [Notes](#notes)
- [FAQ](#faq)

## Architecture Diagram
![Architecture Diagram](./docs/architecture.png)

## Requirements

### Minimum
- CPU: 13th generation Intel® Core™ i5 or above
- GPU: Intel® Arc™ A770 graphics (16GB)
- RAM: 32GB
- DISK: 128GB

## Application Ports
Please ensure that these ports are available before running the applications; a quick way to check is shown below the table.

| Apps         | Port |
|--------------|------|
| Lipsync      | 8011 |
| LivePortrait | 8012 |
| TTS          | 8013 |
| STT          | 8014 |
| OLLAMA       | 8015 |
| Frontend     | 80   |

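A hedged sketch of that check, looping over the default ports and reporting any that already have a listener (uses `ss` from `iproute2`, present on standard Ubuntu installs; adjust the list if you remap ports):

```bash
# Report any default Digital Avatar port that already has a listener;
# a "port ... is in use" line means you must free or remap that port.
for port in 80 8011 8012 8013 8014 8015; do
  ss -ltn "( sport = :$port )" | grep -q ":$port" && echo "port $port is in use"
done
```
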
## Setup

### Prerequisite
1. **OS**: Ubuntu (validated on 22.04).

   ***Note***: If you are using a different Ubuntu version, please [update the RENDER_GROUP_ID](#1-update-render-group-id).

1. **Docker and Docker Compose**: Ensure Docker and Docker Compose are installed. Refer to the [Docker installation guide](https://docs.docker.com/engine/install/).
1. **Intel GPU Drivers**: Refer to [here](../../../README.md#gpu) to install the Intel GPU drivers.
1. **Download Wav2Lip Model**: Download the [Wav2Lip model](https://iiitaphyd-my.sharepoint.com/:u:/g/personal/radrabha_m_research_iiit_ac_in/EdjI7bZlgApMqsVoEUUXpLsBxqXbn5z8VTmoxp55YNDcIA?e=n9ljGW) and place the file in the `weights` folder.
1. **Create Avatar** (a sketch of the resulting layout follows this list):
   1. Place an `image.png` file containing an image of a person (preferably showing at least the upper half of the body) in the `assets` folder.
   2. Place an `idle.mp4` file of a person with some movement, such as eye blinking (to be used as a reference), in the `assets` folder.

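A minimal sketch of the expected layout after these steps (the checkpoint filename under `weights/` depends on which Wav2Lip file you downloaded; `wav2lip_gan.pth` here is illustrative):

```bash
ls assets weights
# assets:
# idle.mp4  image.png
# weights:
# wav2lip_gan.pth   # illustrative; use the file you actually downloaded
```
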
### Setup ENV
1. Create a `.env` file and copy the contents from `.env.template`:
   ```bash
   cp .env.template .env
   ```
2. Modify `LLM_MODEL` in the `.env` file. Refer to the [Ollama library](https://ollama.com/library) for available models (the default is Qwen2.5).

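A minimal sketch of the resulting `.env` entry, assuming the Ollama-style lowercase model tag (keep any other entries from `.env.template` unchanged):

```bash
# .env -- LLM served through the Ollama backend
LLM_MODEL=qwen2.5   # any tag from https://ollama.com/library
```
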
### Build Docker Container
```bash
docker compose build
```

### Start Docker Container
```bash
docker compose up -d
```

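Once started, the containers can be verified with standard Docker Compose commands:

```bash
# Each service should report "running" (or "healthy" once healthchecks pass).
docker compose ps
# Follow the logs while the models download and warm up (Ctrl+C to detach).
docker compose logs -f
```
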
### Access the App
- Navigate to http://localhost

## Notes
### Device Workload Configurations
You can offload model inference to a specific device by modifying the environment variables in the docker-compose.yml file.

| Workload             | Environment Variable | Supported Devices |
|----------------------|----------------------|-------------------|
| LLM                  | -                    | GPU               |
| STT - Encoder Device | STT_ENCODED_DEVICE   | CPU, GPU, NPU     |
| STT - Decoder Device | STT_DECODED_DEVICE   | CPU, GPU          |
| TTS                  | TTS_DEVICE           | CPU               |
| Lipsync (Wav2Lip)    | DEVICE               | CPU, GPU          |

Example Configuration:

* To run the Wav2Lip lipsync workload on the `CPU`, set `DEVICE` in the `wav2lip` service:

```yaml
wav2lip:
  ...
  environment:
    ...
    DEVICE: CPU
    ...
```

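The same pattern applies to the other workloads in the table. For instance, to offload the STT encoder to the `NPU` (the `stt` service name below is an assumption; match whatever your docker-compose.yml calls that service):

```yaml
stt:                          # hypothetical service name
  ...
  environment:
    ...
    STT_ENCODED_DEVICE: NPU   # see the workload table above
    ...
```
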
## FAQ
### 1. Update Render Group ID
1. Ensure the [Intel GPU driver](#prerequisite) is installed.
2. Check the group ID in `/etc/group`:
   ```bash
   grep render /etc/group
   ```
3. The output will look something like:
   ```
   render:x:110:user
   ```
4. The group ID is the number in the third field (e.g., `110` in the example above).
5. Ensure that `RENDER_GROUP_ID` in the [docker-compose.yml](./docker-compose.yml) file matches the render group ID; a one-step sketch follows.
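On most systems the render group ID can also be read in one step. A sketch, assuming your setup passes `RENDER_GROUP_ID` to Compose via the environment (otherwise edit docker-compose.yml by hand):

```bash
# Extract the numeric GID of the "render" group (third colon-separated field).
export RENDER_GROUP_ID=$(getent group render | cut -d: -f3)
echo "render group ID: ${RENDER_GROUP_ID}"
```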
Lines changed: 77 additions & 0 deletions
@@ -0,0 +1,77 @@
FROM debian:12-slim

ARG DEBIAN_FRONTEND=noninteractive
ARG RENDER_GROUP_ID

# Base packages, Python build dependencies, and a non-root "intel" user
# that belongs to the render group for GPU access.
RUN apt-get update \
    && apt-get upgrade -y \
    && apt-get install --no-install-recommends -y \
        sudo \
        wget \
        ca-certificates \
        ffmpeg \
        libsm6 \
        libxext6 \
        curl \
        git \
        build-essential \
        libssl-dev \
        zlib1g-dev \
        libbz2-dev \
        libreadline-dev \
        libsqlite3-dev \
        llvm \
        libncursesw5-dev \
        xz-utils \
        tk-dev \
        libxml2-dev \
        libxmlsec1-dev \
        libffi-dev \
        liblzma-dev \
    && addgroup --system intel --gid 1000 \
    && adduser --system --ingroup intel --uid 1000 --home /home/intel intel \
    && echo "intel ALL=(ALL:ALL) NOPASSWD:ALL" > /etc/sudoers.d/intel \
    && groupadd -g ${RENDER_GROUP_ID} render \
    && usermod -aG render intel \
    && rm -rf /var/lib/apt/lists/* \
    && mkdir -p /usr/src \
    && chown -R intel:intel /usr/src

# Intel GPU Driver
RUN apt-get update && apt-get install -y gnupg

RUN wget -qO - https://repositories.intel.com/gpu/intel-graphics.key | \
    gpg --yes --dearmor --output /usr/share/keyrings/intel-graphics.gpg && \
    echo "deb [arch=amd64,i386 signed-by=/usr/share/keyrings/intel-graphics.gpg] https://repositories.intel.com/gpu/ubuntu jammy client" | \
    tee /etc/apt/sources.list.d/intel-gpu-jammy.list && \
    apt-get update && \
    apt-get install -y --no-install-recommends libze1 intel-level-zero-gpu intel-opencl-icd clinfo libze-dev intel-ocloc

USER intel
WORKDIR /usr/src/app

# Set environment variables for pyenv
ENV PYENV_ROOT="/usr/src/app/.pyenv"
ENV PATH="$PYENV_ROOT/bin:$PYENV_ROOT/shims:$PATH"

# Install pyenv and use it to build Python 3.10.15
RUN curl https://pyenv.run | bash \
    && echo 'export PYENV_ROOT="$PYENV_ROOT"' >> ~/.bashrc \
    && echo 'export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.bashrc \
    && echo 'eval "$(pyenv init --path)"' >> ~/.bashrc \
    && echo 'eval "$(pyenv init -)"' >> ~/.bashrc \
    && . ~/.bashrc \
    && pyenv install 3.10.15 \
    && pyenv global 3.10.15

RUN python3 -m pip install --upgrade pip \
    && python3 -m pip install virtualenv

# Isolated virtualenv for the application
RUN python3 -m venv /usr/src/.venv
ENV PATH="/usr/src/.venv/bin:$PATH"

# Copy the LivePortrait backend and fetch its pretrained weights
COPY --chown=intel ./backend/liveportrait .
RUN python3 -m pip install -r requirements.txt \
    && huggingface-cli download KwaiVGI/LivePortrait --local-dir liveportrait/pretrained_weights --exclude "*.git*" "README.md" "docs"

HEALTHCHECK --interval=30s --timeout=180s --start-period=60s --retries=3 \
    CMD sh -c 'PORT=${SERVER_PORT:-8012} && wget --no-verbose -O /dev/null --tries=1 http://localhost:$PORT/healthcheck || exit 1'
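For a manual spot-check of the same endpoint the `HEALTHCHECK` polls, assuming the default `SERVER_PORT` of 8012 and that the port is published to the host:

```bash
# Exits non-zero (and prints nothing) until the service is ready.
curl -fsS http://localhost:8012/healthcheck && echo "LivePortrait is healthy"
```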
Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
# Byte-compiled / optimized / DLL files
__pycache__/
**/__pycache__/
*.py[cod]
**/*.py[cod]
*$py.class

# Model weights
**/*.pth
**/*.onnx

pretrained_weights/*.md
pretrained_weights/docs
pretrained_weights/liveportrait
pretrained_weights/liveportrait_animals

# Ipython notebook
*.ipynb

# Temporary files or benchmark resources
animations/*
tmp/*
.vscode/launch.json
**/*.DS_Store
gradio_temp/**

# Windows dependencies
ffmpeg/
LivePortrait_env/

# XPose build files
src/utils/dependencies/XPose/models/UniPose/ops/build
src/utils/dependencies/XPose/models/UniPose/ops/dist
src/utils/dependencies/XPose/models/UniPose/ops/MultiScaleDeformableAttention.egg-info
Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
MIT License

Copyright (c) 2024 Kuaishou Visual Generation and Interaction Center

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

---

The code of InsightFace is released under the MIT License.
The models of InsightFace are for non-commercial research purposes only.

If you want to use the LivePortrait project for commercial purposes, you
should remove and replace InsightFace's detection models to fully comply with
the MIT license.
