Merge branch 'main' of github.com:Winfredy/SadTalker

vinthony · vinthony · commit 35dd94f443e5 · 2023-04-08T18:42:50.000+08:00
diff --git a/README.md b/README.md
@@ -36,6 +36,11 @@
 </div>
 
 ## 🔥 Highlight
+
+- 🔥 The extension for [stable-diffusion-webui](https://github.com/AUTOMATIC1111/stable-diffusion-webui) is online. Just install it via `extensions -> install from URL -> https://github.com/Winfredy/SadTalker`; check out more details [here](#sd-webui-extension).
+
+https://user-images.githubusercontent.com/4397546/222513483-89161f58-83d0-40e4-8e41-96c32b47bd4e.mp4
+
 - 🔥 Beta version of the `full image mode` is online! checkout [here](https://github.com/Winfredy/SadTalker#beta-full-bodyimage-generation) for more details.
 
 | still                 | still + enhancer          |   [input image @bagbag1815](https://twitter.com/bagbag1815/status/1642754319094108161) |
@@ -49,6 +54,10 @@
 
 ## 📋 Changelog (Previous changelog can be founded [here](docs/changlelog.md))
 
+- __[2023.04.06]__: The stable-diffusion webui extension is released.
+
+- __[2023.04.03]__: Enable TTS in huggingface and gradio local demo.
+
 - __[2023.03.30]__: Launch beta version of the full body mode.
 
 - __[2023.03.30]__: Launch new feature: through using reference videos, our algorithm can generate videos with more natural eye blinking and some eyebrow movement.
@@ -82,16 +91,14 @@ the 3D-aware face render for final video generation.
 - [ ] training code of each componments.
 - [ ] Audio-driven Anime Avatar.
 - [ ] interpolate ChatGPT for a conversation demo 🤔
-- [ ] integrade with stable-diffusion-web-ui. (stay tunning!)
+- [x] integrate with stable-diffusion-web-ui.
 
-https://user-images.githubusercontent.com/4397546/222513483-89161f58-83d0-40e4-8e41-96c32b47bd4e.mp4
 
 
-## ⚙️ Installation
 
-#### Dependence Installation
+## ⚙️ Installation
 
-<details><summary>CLICK ME For Mannual Installation </summary>
+#### Installing SadTalker on Linux:
 
 ```bash
 git clone https://github.com/Winfredy/SadTalker.git
@@ -108,25 +115,39 @@ conda install ffmpeg
 
 pip install -r requirements.txt
 
+### TTS is optional for the gradio demo.
+### pip install TTS
+
 ```  
 
-</details>
+More tips about installation on Windows and with Docker can be found [here](docs/install.md).
+
+#### Sd-Webui-Extension:
+<details><summary>CLICK ME</summary>
 
-<details><summary>CLICK For Docker Installation </summary>
+Install the latest version of [stable-diffusion-webui](https://github.com/AUTOMATIC1111/stable-diffusion-webui) and install SadTalker via `extension`.
+<img width="726" alt="image" src="https://user-images.githubusercontent.com/4397546/230698519-267d1d1f-6e99-4dd4-81e1-7b889259efbd.png">
 
-A dockerfile are also provided by [@thegenerativegeneration](https://github.com/thegenerativegeneration) in [docker hub](https://hub.docker.com/repository/docker/wawa9000/sadtalker), which can be used directly as:
+Then restart stable-diffusion-webui with the command-line arguments shown below. The models will be downloaded automatically to the right place. Alternatively, you can point `SADTALKER_CHECKPOINTS` at pre-downloaded SadTalker checkpoints in `webui_user.sh` (Linux) or `webui_user.bat` (Windows):
 
 ```bash
-docker run --gpus "all" --rm -v $(pwd):/host_dir wawa9000/sadtalker \
-    --driven_audio /host_dir/deyu.wav \
-    --source_image /host_dir/image.jpg \
-    --expression_scale 1.0 \
-    --still \
-    --result_dir /host_dir
+# windows (webui_user.bat)
+set COMMANDLINE_ARGS=--no-gradio-queue  --disable-safe-unpickle
+set SADTALKER_CHECKPOINTS=D:\SadTalker\checkpoints
+
+# linux (webui_user.sh)
+export COMMANDLINE_ARGS="--no-gradio-queue --disable-safe-unpickle"
+export SADTALKER_CHECKPOINTS=/path/to/SadTalker/checkpoints
 ```
+
+After installation, SadTalker can be used directly in stable-diffusion-webui.
+
+<img width="726" alt="image" src="https://user-images.githubusercontent.com/4397546/230698614-58015182-2916-4240-b324-e69022ef75b3.png">
+
 </details>
 
 
+
 #### Download Trained Models
 <details><summary>CLICK ME</summary>
 
@@ -161,9 +182,12 @@ python inference.py --driven_audio <audio.wav> --source_image <video.mp4 or pict
 ```
 The results will be saved in `results/$SOME_TIMESTAMP/*.mp4`.
 
-Or a local gradio demo can be run by:
+Or a local gradio demo similar to our [hugging-face demo](https://huggingface.co/spaces/vinthony/SadTalker) can be run by:
 
 ```bash
+
+## You need to manually install TTS (https://github.com/coqui-ai/TTS) via `pip install TTS` in advance.
+
 python app.py
 ```
 
diff --git a/docs/install.md b/docs/install.md
@@ -0,0 +1,25 @@
+
+
+
+### Windows Native
+
+- Make sure `ffmpeg` is in your `%PATH%`, as suggested in [#54](https://github.com/Winfredy/SadTalker/issues/54). Follow [this guide](https://www.geeksforgeeks.org/how-to-install-ffmpeg-on-windows/) to install `ffmpeg`.
+
+
+### Windows WSL
+- Make sure the environment variable is set: `export LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH`
+
+
+### Docker installation
+
+A Docker image is also provided by [@thegenerativegeneration](https://github.com/thegenerativegeneration) on [Docker Hub](https://hub.docker.com/repository/docker/wawa9000/sadtalker), which can be used directly as:
+
+```bash
+docker run --gpus "all" --rm -v $(pwd):/host_dir wawa9000/sadtalker \
+    --driven_audio /host_dir/deyu.wav \
+    --source_image /host_dir/image.jpg \
+    --expression_scale 1.0 \
+    --still \
+    --result_dir /host_dir
+```
+
diff --git a/scripts/download_models.sh b/scripts/download_models.sh
@@ -1,12 +1,13 @@
 mkdir ./checkpoints  
-wget https://github.com/Winfredy/SadTalker/releases/download/v0.0.1/auido2exp_00300-model.pth -O ./checkpoints/auido2exp_00300-model.pth
-wget https://github.com/Winfredy/SadTalker/releases/download/v0.0.1/auido2pose_00140-model.pth -O ./checkpoints/auido2pose_00140-model.pth
-wget https://github.com/Winfredy/SadTalker/releases/download/v0.0.1/epoch_20.pth -O ./checkpoints/epoch_20.pth
-wget https://github.com/Winfredy/SadTalker/releases/download/v0.0.1/facevid2vid_00189-model.pth.tar -O ./checkpoints/facevid2vid_00189-model.pth.tar
-wget https://github.com/Winfredy/SadTalker/releases/download/v0.0.1/shape_predictor_68_face_landmarks.dat -O ./checkpoints/shape_predictor_68_face_landmarks.dat
-wget https://github.com/Winfredy/SadTalker/releases/download/v0.0.1/wav2lip.pth -O ./checkpoints/wav2lip.pth
-wget https://github.com/Winfredy/SadTalker/releases/download/v0.0.1/mapping_00229-model.pth.tar -O ./checkpoints/mapping_00229-model.pth.tar
-wget https://github.com/Winfredy/SadTalker/releases/download/v0.0.1/BFM_Fitting.zip -O ./checkpoints/BFM_Fitting.zip
-wget https://github.com/Winfredy/SadTalker/releases/download/v0.0.1/hub.zip -O ./checkpoints/hub.zip
-unzip ./checkpoints/hub.zip -d ./checkpoints/
-unzip ./checkpoints/BFM_Fitting.zip -d ./checkpoints/
+wget -nc https://github.com/Winfredy/SadTalker/releases/download/v0.0.1/auido2exp_00300-model.pth -O ./checkpoints/auido2exp_00300-model.pth
+wget -nc https://github.com/Winfredy/SadTalker/releases/download/v0.0.1/auido2pose_00140-model.pth -O ./checkpoints/auido2pose_00140-model.pth
+wget -nc https://github.com/Winfredy/SadTalker/releases/download/v0.0.1/epoch_20.pth -O ./checkpoints/epoch_20.pth
+wget -nc https://github.com/Winfredy/SadTalker/releases/download/v0.0.1/facevid2vid_00189-model.pth.tar -O ./checkpoints/facevid2vid_00189-model.pth.tar
+wget -nc https://github.com/Winfredy/SadTalker/releases/download/v0.0.1/shape_predictor_68_face_landmarks.dat -O ./checkpoints/shape_predictor_68_face_landmarks.dat
+wget -nc https://github.com/Winfredy/SadTalker/releases/download/v0.0.1/wav2lip.pth -O ./checkpoints/wav2lip.pth
+wget -nc https://github.com/Winfredy/SadTalker/releases/download/v0.0.1/mapping_00229-model.pth.tar -O ./checkpoints/mapping_00229-model.pth.tar
+wget -nc https://github.com/Winfredy/SadTalker/releases/download/v0.0.1/BFM_Fitting.zip -O ./checkpoints/BFM_Fitting.zip
+wget -nc https://github.com/Winfredy/SadTalker/releases/download/v0.0.1/hub.zip -O ./checkpoints/hub.zip
+
+unzip -n ./checkpoints/hub.zip -d ./checkpoints/
+unzip -n ./checkpoints/BFM_Fitting.zip -d ./checkpoints/
diff --git a/scripts/extension.py b/scripts/extension.py
@@ -0,0 +1,133 @@
+import os, sys
+from pathlib import Path
+import tempfile
+import gradio as gr
+from modules.call_queue import wrap_gradio_gpu_call, wrap_queued_call
+from modules.shared import opts, OptionInfo
+from modules import shared, paths, script_callbacks
+import launch
+import glob
+
+def get_source_image(image):
+    return image
+
+def get_img_from_txt2img(x):
+    talker_path = Path(paths.script_path) / "outputs"
+    imgs_from_txt_dir = str(talker_path / "txt2img-images/")
+    imgs = glob.glob(imgs_from_txt_dir+'/*/*.png')
+    imgs.sort(key=os.path.getmtime)  # glob already returns full paths; sorts oldest first
+    img_from_txt_path = imgs[-1] if imgs else None  # newest image, or None if none exist yet
+    return img_from_txt_path, img_from_txt_path
+
+def get_img_from_img2img(x):
+    talker_path = Path(paths.script_path) / "outputs"
+    imgs_from_img_dir = str(talker_path / "img2img-images/")
+    imgs = glob.glob(imgs_from_img_dir+'/*/*.png')
+    imgs.sort(key=os.path.getmtime)  # glob already returns full paths; sorts oldest first
+    img_from_img_path = imgs[-1] if imgs else None  # newest image, or None if none exist yet
+    return img_from_img_path, img_from_img_path
+ 
+def install():
+
+    kv = {
+        "face-alignment": "face-alignment==1.3.5",
+        "imageio": "imageio==2.19.3",
+        "imageio-ffmpeg": "imageio-ffmpeg==0.4.7",
+        "librosa":"librosa==0.8.0",
+        "pydub":"pydub==0.25.1",
+        "scipy":"scipy==1.8.1",
+        "tqdm": "tqdm",
+        "yacs":"yacs==0.1.8",
+        "pyyaml": "pyyaml", 
+        "dlib": "dlib-bin",
+        "gfpgan": "gfpgan",
+    }
+
+    for k,v in kv.items():
+        print(k, launch.is_installed(k))
+        if not launch.is_installed(k):
+            launch.run_pip("install "+ v, "requirements for SadTalker")
+
+
+    if os.getenv('SADTALKER_CHECKPOINTS'):
+        print('load SadTalker checkpoints from '+ os.getenv('SADTALKER_CHECKPOINTS'))
+    else:
+        ### run the script to download the models to the correct location.
+        print('download models for SadTalker')
+        launch.run("cd " + paths.script_path+"/extensions/SadTalker && bash ./scripts/download_models.sh", live=True)
+        print('SadTalker is successfully installed!')
+    
+ 
+def on_ui_tabs():
+    install()
+
+    sys.path.extend([paths.script_path+'/extensions/SadTalker']) 
+    
+    repo_dir = paths.script_path+'/extensions/SadTalker/'
+
+    result_dir = opts.sadtalker_result_dir
+    os.makedirs(result_dir, exist_ok=True)
+
+    from src.gradio_demo import SadTalker  
+
+    if  os.getenv('SADTALKER_CHECKPOINTS'):
+        checkpoint_path = os.getenv('SADTALKER_CHECKPOINTS')
+    else:
+        checkpoint_path = repo_dir+'checkpoints/'
+
+    sad_talker = SadTalker(checkpoint_path=checkpoint_path, config_path=repo_dir+'src/config', lazy_load=True)
+    
+    with gr.Blocks(analytics_enabled=False) as audio_to_video:
+        with gr.Row().style(equal_height=False):
+            with gr.Column(variant='panel'):
+                with gr.Tabs(elem_id="sadtalker_source_image"):
+                    with gr.TabItem('Upload image'):
+                        with gr.Row():
+                            input_image = gr.Image(label="Source image", source="upload", type="filepath").style(height=512,width=512)
+                        
+                        with gr.Row():
+                            submit_image2 = gr.Button('load from txt2img', variant='primary')
+                            submit_image2.click(fn=get_img_from_txt2img, inputs=input_image, outputs=[input_image, input_image])
+                            
+                            submit_image3 = gr.Button('load from img2img', variant='primary')
+                            submit_image3.click(fn=get_img_from_img2img, inputs=input_image, outputs=[input_image, input_image])
+
+                with gr.Tabs(elem_id="sadtalker_driven_audio"):
+                    with gr.TabItem('Upload'):
+                        with gr.Column(variant='panel'):
+
+                            with gr.Row():
+                                driven_audio = gr.Audio(label="Input audio", source="upload", type="filepath")
+                                    
+                            
+            with gr.Column(variant='panel'): 
+                with gr.Tabs(elem_id="sadtalker_checkbox"):
+                    with gr.TabItem('Settings'):
+                        with gr.Column(variant='panel'):
+                            is_still_mode = gr.Checkbox(label="Still Mode (less head motion)").style(container=True)
+                            is_enhance_mode = gr.Checkbox(label="Enhance Mode (better face quality)").style(container=True)
+                            submit = gr.Button('Generate', elem_id="sadtalker_generate", variant='primary')
+
+                with gr.Tabs(elem_id="sadtalker_genearted"):
+                        gen_video = gr.Video(label="Generated video", format="mp4").style(width=256)
+
+
+        ### gradio gpu call will always return the html, 
+        submit.click(
+                    fn=wrap_queued_call(sad_talker.test), 
+                    inputs=[input_image,
+                            driven_audio,
+                            is_still_mode,
+                            is_enhance_mode], 
+                    outputs=[gen_video, ]
+                    )
+
+    return [(audio_to_video, "SadTalker", "extension")]
+
+def on_ui_settings():
+    talker_path = Path(paths.script_path) / "outputs"
+    section = ('extension', "SadTalker") 
+    opts.add_option("sadtalker_result_dir", OptionInfo(str(talker_path / "SadTalker/"), "Path to save results of sadtalker", section=section)) 
+
+script_callbacks.on_ui_settings(on_ui_settings)
+script_callbacks.on_ui_tabs(on_ui_tabs)
diff --git a/src/gradio_demo.py b/src/gradio_demo.py
diff --git a/src/utils/preprocess.py b/src/utils/preprocess.py