Soroush Omranpour edited this page Apr 18, 2025 · 6 revisions

Folly


Folly is an AI-powered pipeline for generating therapeutic content. It combines techniques from affective computing, audio understanding, and image generation, and it consists of two main components:

  1. Audio Understanding Modules:

    • Music Analysis System: Processes the input music and extracts information such as segments, genre, instrumentation, and emotion.
    • Voice-Over Analysis System: Processes the voice-over (speech) file that accompanies the music, extracting its transcription, voice activity, and emotion.
  2. Video Generation Module: Using the music and voice-over files from the previous stage, this component generates video that is synchronized with the audio. It takes inputs such as prompts, timing, and transitions to produce coherent, audio-synchronized videos, and it uses an image generation model to render the video frame by frame.
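To make the link between the analysis output and frame-by-frame generation more concrete, the sketch below shows one possible way to turn music segment timings into a per-frame prompt schedule with cross-fade transitions. It is a hypothetical illustration, not Folly's code: `Segment`, `frame_schedule`, `fps`, and `transition` are invented names, and the linear cross-fade is an assumed transition scheme, since this page does not describe how Folly actually handles transitions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    """Hypothetical music segment; not Folly's actual data structure."""
    start: float  # segment start time in seconds
    end: float    # segment end time in seconds
    prompt: str   # image prompt chosen for this segment


def frame_schedule(
    segments: List[Segment], fps: int = 12, transition: float = 0.5
) -> List[Tuple[str, str, float]]:
    """Return one (current_prompt, next_prompt, alpha) tuple per video frame.

    Assumes segments are sorted, contiguous, and start at 0 s. alpha is the
    blend weight toward next_prompt: it ramps linearly from 0 toward 1 over
    the last `transition` seconds of every segment except the final one.
    """
    if not segments:
        return []
    n_frames = int(round(segments[-1].end * fps))
    schedule = []
    idx = 0
    for f in range(n_frames):
        t = f / fps  # timestamp of frame f in seconds
        # Move forward to the segment that contains t.
        while idx < len(segments) - 1 and t >= segments[idx].end:
            idx += 1
        seg = segments[idx]
        nxt = segments[idx + 1] if idx + 1 < len(segments) else None
        alpha = 0.0
        if nxt is not None:
            # The cross-fade never starts before the segment itself begins.
            width = min(transition, seg.end - seg.start)
            ramp_start = seg.end - width
            if width > 0 and t > ramp_start:
                alpha = min(1.0, (t - ramp_start) / width)
        schedule.append((seg.prompt, nxt.prompt if nxt else seg.prompt, alpha))
    return schedule


if __name__ == "__main__":
    segs = [Segment(0.0, 2.0, "calm lake at dawn"), Segment(2.0, 4.0, "sunrise over hills")]
    for frame, entry in enumerate(frame_schedule(segs, fps=2, transition=1.0)):
        print(frame, entry)
```

Each tuple could then be passed to an image generation model, for example by interpolating between the two prompts' embeddings with weight `alpha`; how Folly conditions its generator on prompts and timing is not specified here.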

Project Goal:

The primary goal of Folly is to support therapeutic content creation by integrating affective computing techniques with state-of-the-art audio understanding and video generation frameworks. Combining these components makes it possible to create emotionally resonant, synchronized media with potential applications in therapeutic settings.

Check out the generated samples in this folder.

Running the demo

Using the source code (with Python <= 3.10):

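# System libraries: libsndfile (audio file I/O), FFmpeg (audio/video encoding),
# PortAudio headers (audio device I/O). Run as root or prefix with sudo.
apt-get update && apt-get install -y libsndfile1 ffmpeg portaudio19-dev

# Create and activate a Python 3.10 virtual environment
python3.10 -m venv lucid_env
source lucid_env/bin/activate

# Fetch the source code and install the Python dependencies
git clone https://github.com/thelucidproject/Folly
cd Folly/
pip install -r requirements.txt

# Launch the demo
python demo.py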
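Because the source-code setup calls for Python 3.10 or earlier, a small pre-flight check can confirm which interpreter is active inside `lucid_env` before running `demo.py`. This is a hypothetical helper, not part of the Folly repository, and it assumes "python <= 3.10" means any Python 3 release up to and including 3.10.

```python
import sys

# Hypothetical pre-flight check, not part of the Folly repository.
# Assumption: "python <= 3.10" means any Python 3 release up to and including 3.10.
MAX_SUPPORTED = (3, 10)


def is_supported(version=None) -> bool:
    """Return True if `version` (default: the running interpreter) is <= 3.10."""
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) <= MAX_SUPPORTED


if __name__ == "__main__":
    major, minor = sys.version_info[:2]
    if is_supported():
        print(f"Python {major}.{minor}: OK for the source-code setup")
    else:
        print(f"Python {major}.{minor}: newer than 3.10; recreate lucid_env with python3.10")
```

Saved under a name of your choice (for example `check_python.py`), it can be run with `python check_python.py` after `source lucid_env/bin/activate`.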

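Or, using the Docker image (build it from the cloned Folly/ directory):

docker build -t folly-image .
# --gpus all exposes the host GPUs to the container; it requires the NVIDIA Container Toolkit
docker run --gpus all -it --rm folly-image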
