
Commit 12b3b0e

Simplify README (#91)

Authored by daavoo and Kostis-S-Z.

* enh(docs): Simplify README
* Add google Colab
* Update README. Move troubleshooting to docs
* Add more instructions for quick start (#81)
* Add pre-commit note
* Move colab badge with the rest of the badges
* Update README.md
* Update width
* Drop table
* Update customization
* Update README
* Apply suggestions from code review

Co-authored-by: David de la Iglesia Castro <daviddelaiglesiacastro@gmail.com>
Co-authored-by: Kostis <Kostis-S-Z@users.noreply.github.com>

1 parent 82f7779 commit 12b3b0e

File tree

5 files changed: +76 / -117 lines changed

CONTRIBUTING.md

Lines changed: 1 addition & 0 deletions
@@ -23,6 +23,7 @@ We welcome all kinds of contributions, from improving customization, to extending
 
 ### **Submit Pull Requests** 💻
 - Fork the repository and create a new branch for your changes.
+- Install [pre-commit](https://pre-commit.com/) to ensure the code is formatted and standardized correctly, by running `pip install pre-commit` and then `pre-commit install`.
 - Ensure your branch is up-to-date with the main branch before submitting the PR
 - Please follow the PR template, adding as much detail as possible, including how to test the changes
 

README.md

Lines changed: 35 additions & 112 deletions
@@ -1,138 +1,61 @@
+<p align="center"><img src="./images/Blueprints-logo.png" width="35%" alt="Project logo"/></p>
+
+# Document-to-podcast: a Blueprint by Mozilla.ai for generating podcasts from documents using local AI
+
 [![](https://dcbadge.limes.pink/api/server/YuMNeuKStr?style=flat)](https://discord.gg/YuMNeuKStr)
 [![Docs](https://github.com/mozilla-ai/document-to-podcast/actions/workflows/docs.yaml/badge.svg)](https://github.com/mozilla-ai/document-to-podcast/actions/workflows/docs.yaml/)
 [![Tests](https://github.com/mozilla-ai/document-to-podcast/actions/workflows/tests.yaml/badge.svg)](https://github.com/mozilla-ai/document-to-podcast/actions/workflows/tests.yaml/)
 [![Ruff](https://github.com/mozilla-ai/document-to-podcast/actions/workflows/lint.yaml/badge.svg?label=Ruff)](https://github.com/mozilla-ai/document-to-podcast/actions/workflows/lint.yaml/)
 
-<p align="center"><img src="./images/Blueprints-logo.png" width="35%" alt="Project logo"/></p>
-
-# Document-to-podcast: a Blueprint by Mozilla.ai for generating podcasts from documents using local AI
-
 This blueprint demonstrates how you can use open-source models & tools to convert input documents into a podcast featuring two speakers.
-It is designed to work on most local setups or with [GitHub Codespaces](https://github.com/codespaces/new?hide_repo_select=true&ref=main&repo=888426876&skip_quickstart=true&machine=standardLinux32gb), meaning no external API calls or GPU access is required. This makes it more accessible and privacy-friendly by keeping everything local.
-
-<p align="center"><a href="https://colab.research.google.com/github/mozilla-ai/document-to-podcast/blob/main/demo/notebook.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" /></a></p>
+It is designed to work on most local setups, meaning no external API calls or GPU access is required.
+This makes it more accessible and privacy-friendly by keeping everything local.
 
-### 👉 📖 For more detailed guidance on using this project, please visit our [Docs here](https://mozilla-ai.github.io/document-to-podcast/).
+<img src="./images/document-to-podcast-diagram.png" width="1200" alt="document-to-podcast Diagram" />
 
-### Built with
+### 👉 📖 For more detailed guidance on using this project, please visit our [Docs](https://mozilla-ai.github.io/document-to-podcast/).
+### 👉 🔨 Built with
 - Python 3.10+ (use Python 3.12 for Apple M1/2/3 chips)
-- [Llama-cpp](https://github.com/abetlen/llama-cpp-python) (text-to-text, i.e. script generation)
-- [OuteAI](https://github.com/edwko/OuteTTS) (text-to-speech, i.e. audio generation)
+- [Llama-cpp](https://github.com/abetlen/llama-cpp-python)
 - [Streamlit](https://streamlit.io/) (UI demo)
 
+### 👉 🧠 Check the [Supported Models](https://mozilla-ai.github.io/document-to-podcast/customization/#supported-models).
 
 ## Quick-start
 
-Get started with Document-to-Podcast using one of the two options below: **GitHub Codespaces** for a hassle-free setup or **Local Installation** for running on your own machine.
-
----
-
-### **Option 1: GitHub Codespaces**
-
-The fastest way to get started. Click the button below to launch the project directly in GitHub Codespaces:
-
-[![Open in GitHub Codespaces](https://github.com/codespaces/badge.svg)](https://github.com/codespaces/new?hide_repo_select=true&ref=main&repo=888426876&skip_quickstart=true&machine=standardLinux32gb)
-
-Once the Codespaces environment launches, inside the terminal, start the Streamlit demo by running:
-```bash
-python -m streamlit run demo/app.py
-```
-
-### **Option 2: Local Installation**
-
-1. **Clone the Repository**
-Inside the Codespaces terminal, run:
-```bash
-git clone https://github.com/mozilla-ai/document-to-podcast.git
-cd document-to-podcast
-```
-
-2. **Install Dependencies**
-Inside the terminal, run:
-```bash
-pip install -e .
-3. **Run the Demo**
-Inside the terminal, start the Streamlit demo by running:
-```bash
-python -m streamlit run demo/app.py
-```
-
-***NOTE***: The first time you run the demo app it might take a while to generate the script or the audio because it will download the models to the machine which are a few GBs in size.
-
-
-## How it Works
-
-<img src="./images/document-to-podcast-diagram.png" width="1200" />
-
-
-1. **Document Input**
-Start by either:
-- Uploading a document in a supported format (e.g., PDF, .txt, or .docx)
-- Entering a website URL to fetch content directly
-
-2. **Document Pre-Processing**
-The input is processed to extract and clean the text. This involves:
-- Extracting readable text from the document or webpage
-- Removing noise such as URLs, email addresses, and special characters to ensure the text is clean and structured
-
-3. **Script Generation**
-The cleaned text is passed to a language model to generate a podcast transcript in the form of a conversation between two speakers.
-- **Model Loading**: The system selects and loads a pre-trained LLM optimized for running locally, using the llama_cpp library. This enables the model to run efficiently on CPUs, making them more accessible and suitable for local setups.
-- **Customizable Prompt**: A user-defined "system prompt" guides the LLM in shaping the conversation, specifying tone, content, speaker interaction, and format.
-- **Output Transcript**: The model generates a podcast script in structured format, with each speaker's dialogue clearly labeled.
-Example output:
-```json
-{
-"Speaker 1": "Welcome to the podcast on AI advancements.",
-"Speaker 2": "Thank you! So what's new this week for the latest AI trends?",
-"Speaker 1": "Where should I start.. Lots has been happening!",
-...
-}
-```
-This step ensures that the podcast script is engaging, relevant, and ready for audio conversion.
-
-4. **Audio Generation**
-- The generated transcript is converted into audio using a Text-to-Speech (TTS) model.
-- Each speaker is assigned a distinct voice.
-- The final output is saved as an audio file in formats like MP3 or WAV.
-
-## Models
-
-The architecture of this codebase focuses on modularity and adaptability, meaning it shouldn't be too difficult to swap frameworks to use your own suite of models. We have selected fully open source models that are very memory efficient and can run on a laptop CPU with less than 10GB RAM requirements.
-
-### text-to-text
-
-We are using the [llama.cpp](https://github.com/ggerganov/llama.cpp) library, which supports open source models optimized for local inference and minimal hardware requirements. The default text-to-text model in this repo is the open source [Qwen2.5-3B-Instruct](https://huggingface.co/bartowski/Qwen2.5-3B-Instruct-GGUF).
-
-For the complete list of models supported out-of-the-box, visit this [link](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#text-only).
-
-### text-to-speech
-
-We support models from the [OuteAI](https://github.com/edwko/OuteTTS) package. The default text-to-speech model in this repo is [OuteTTS-0.2-500M](https://huggingface.co/OuteAI/OuteTTS-0.2-500M). Note that the `0.1-350M` version has a `CC-By-4.0` (permissive) license, whereas the newer / better `0.2-500M` version has a `CC-By-NC-4.0` (non-commercial) license.
-For a complete list of models visit [Oute HF](https://huggingface.co/collections/OuteAI) (only the GGUF versions).
+Get started right away using one of the options below:
 
-In this [repo](https://github.com/Kostis-S-Z/document-to-podcast) you can see examples of using different TTS models with minimal code changes.
+| Google Colab | HuggingFace Spaces | GitHub Codespaces |
+| -------------| ------------------- | ----------------- |
+| [![Try on Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/mozilla-ai/document-to-podcast/blob/main/demo/notebook.ipynb) | [![Try on Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Try%20on-Spaces-blue)](https://huggingface.co/spaces/mozilla-ai/document-to-podcast) | [![Try on Codespaces](https://github.com/codespaces/badge.svg)](https://github.com/codespaces/new?hide_repo_select=true&ref=main&repo=888426876&skip_quickstart=true&machine=standardLinux32gb) |
 
-## Pre-requisites
+You can also install and use the blueprint locally:
 
-- **System requirements**:
-- OS: Windows, macOS, or Linux
-- Python 3.10>, <3.12
-- Minimum RAM: 10 GB
-- Disk space: 32 GB minimum
+### Command Line Interface
 
-- **Dependencies**:
-- Dependencies listed in `pyproject.toml`
+```bash
+pip install document-to-podcast
+```
 
-## Troubleshooting
+```bash
+document-to-podcast \
+--input_file "example_data/Mozilla-Trustworthy_AI.pdf" \
+--output_folder "example_data" \
+--text_to_text_model "Qwen/Qwen2.5-1.5B-Instruct-GGUF/qwen2.5-1.5b-instruct-q8_0.gguf"
+```
 
-> When starting up the codespace, I get the message `Oh no, it looks like you are offline!`
+### Graphical Interface App
 
-If you are on Firefox and have Enhanced Tracking Protection `On`, try turning it `Off` for the codespace webpage.
+```bash
+git clone https://github.com/mozilla-ai/document-to-podcast.git
+cd document-to-podcast
+pip install -e .
+```
 
-> During the installation of the package, it fails with `ERROR: Failed building wheel for llama-cpp-python`
+```bash
+python -m streamlit run demo/app.py
+```
 
-You are probably missing the `GNU Make` package. A quick way to solve it is run on your terminal `sudo apt install build-essential`
 
 ## License
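The document pre-processing step described in the README's removed "How it Works" section (strip URLs, email addresses, and special characters from the extracted text) can be sketched with a simple regex pass. This is a minimal illustration of the technique, not the package's actual cleaning code:

```python
import re

def clean_text(raw: str) -> str:
    """Minimal cleaning pass: drop URLs, emails, and stray special characters."""
    text = re.sub(r"https?://\S+", " ", raw)    # remove URLs
    text = re.sub(r"\S+@\S+\.\S+", " ", text)   # remove email addresses
    text = re.sub(r"[^\w\s.,!?'-]", " ", text)  # remove other special characters
    return re.sub(r"\s+", " ", text).strip()    # collapse whitespace

print(clean_text("Contact us at hi@example.com or see https://example.com ©2024!"))
# → "Contact us at or see 2024!"
```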

docs/api.md

Lines changed: 4 additions & 0 deletions
@@ -4,6 +4,10 @@
 
 ::: document_to_podcast.inference.model_loaders
 
+::: document_to_podcast.inference.model_loaders.TTS_LOADERS
+
 ::: document_to_podcast.inference.text_to_text
 
 ::: document_to_podcast.inference.text_to_speech
+
+::: document_to_podcast.inference.text_to_speech.TTS_INFERENCE

docs/customization.md

Lines changed: 25 additions & 5 deletions
@@ -3,15 +3,35 @@
 The Document-to-Podcast Blueprint is designed to be flexible and adaptable to your specific needs.
 This guide outlines the key parameters you can customize and explains how to make these changes depending on whether you’re running the application via app.py or the CLI pipeline.
 
-## 🖋️ **Key Parameters for Customization**
+## 🧠 **Supported models**
 
-- **`input_file`**: The input file specifies the document to be processed. Supports the following formats: `pdf`, `html`, `txt`, `docx`, `md`.
+There are two different parameters to customize the models being used:
 
-- **`text_to_text_model`**: The language model used to generate the podcast script. Note: The model parameter must be in GGUF format, for example: `Qwen/Qwen2.5-1.5B-Instruct-GGUF/qwen2.5-1.5b-instruct-q8_0.gguf`.
+### **`text_to_text_model`**
 
-- **`text_to_text_prompt`**: Defines the tone, structure, and instructions for generating the podcast script. This prompt is crucial for tailoring the conversation style to your project.
+The language model used to generate the podcast script.
+
+Any model that can be loaded by [`Llama.from_pretrained`](https://llama-cpp-python.readthedocs.io/en/latest/#pulling-models-from-hugging-face-hub) can be used here.
+
+Format is expected to be `{org}/{repo}/{filename}`.
+For example: `Qwen/Qwen2.5-1.5B-Instruct-GGUF/qwen2.5-1.5b-instruct-q8_0.gguf`.
+
+
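The `{org}/{repo}/{filename}` convention above maps directly onto llama-cpp-python's `Llama.from_pretrained(repo_id=..., filename=...)`. A minimal sketch; the `split_model_id` helper is illustrative and not part of this package:

```python
def split_model_id(model_id: str) -> tuple[str, str]:
    """Split "{org}/{repo}/{filename}" into (repo_id, filename)."""
    org, repo, filename = model_id.split("/")
    return f"{org}/{repo}", filename

repo_id, filename = split_model_id(
    "Qwen/Qwen2.5-1.5B-Instruct-GGUF/qwen2.5-1.5b-instruct-q8_0.gguf"
)

# Actual loading (commented out here because it downloads a multi-GB
# GGUF file on first run):
# from llama_cpp import Llama
# llm = Llama.from_pretrained(repo_id=repo_id, filename=filename)
```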
+### **`text_to_speech_model`**
+
+The model used to generate the audio from the podcast script.
 
-- **`text_to_speech_model`**: Specifies the model used for text-to-speech conversion. You can change this to achieve the desired voice style or improve performance. Check `config.py` to choose from supported models.
+You can use any of the models listed in [`TTS_LOADERS`](api.md/#document_to_podcast.inference.model_loaders.TTS_LOADERS) out of the box.
+We currently support [OuteTTS](https://github.com/edwko/OuteTTS).
+
+If you want to use a different model, you can integrate it by implementing the `_load` and `_text_to_speech` functions and registering them in [`TTS_LOADERS`](api.md/#document_to_podcast.inference.model_loaders.TTS_LOADERS) and [`TTS_INFERENCE`](api.md/#document_to_podcast.inference.text_to_speech.TTS_INFERENCE).
+You can check [this repo](https://github.com/Kostis-S-Z/document-to-podcast/) where different text-to-speech models are integrated.
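The registration flow described above can be sketched as a pair of registries. The names `TTS_LOADERS` and `TTS_INFERENCE` come from the docs, but the dicts and signatures below are stand-ins assumed for illustration, not the package's real API:

```python
from typing import Any, Callable, Dict

# Hypothetical stand-ins for the registries in
# document_to_podcast.inference (signatures assumed for illustration).
TTS_LOADERS: Dict[str, Callable[[str], Any]] = {}
TTS_INFERENCE: Dict[str, Callable[[str, Any, str], bytes]] = {}

def _load_my_tts(model_id: str) -> Any:
    # Load and return your TTS model object here.
    return {"model_id": model_id}

def _text_to_speech_my_tts(text: str, model: Any, voice_profile: str) -> bytes:
    # Run your model and return raw audio bytes here.
    return b"\x00" * len(text)  # placeholder "audio"

# Registering both callables under the same id is what would make the
# model selectable through the `text_to_speech_model` parameter.
TTS_LOADERS["my-org/my-tts-model"] = _load_my_tts
TTS_INFERENCE["my-org/my-tts-model"] = _text_to_speech_my_tts
```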
+
+## 🖋️ **Other Customizable Parameters**
+
+- **`input_file`**: The input file specifies the document to be processed. Supports the following formats: `pdf`, `html`, `txt`, `docx`, `md`.
+
+- **`text_to_text_prompt`**: Defines the tone, structure, and instructions for generating the podcast script. This prompt is crucial for tailoring the conversation style to your project.
 
 - **`speakers`**: Defines the podcast participants, including their names, roles, descriptions, and voice profiles. Customize this to create engaging personas and voices for your podcast.

docs/getting-started.md

Lines changed: 11 additions & 0 deletions
@@ -55,3 +55,14 @@ Get started with Document-to-Podcast using one of the options below:
 ```bash
 python -m streamlit run demo/app.py
 ```
+
+
+## Troubleshooting
+
+> When starting up the codespace, I get the message `Oh no, it looks like you are offline!`
+
+If you are on Firefox and have Enhanced Tracking Protection `On`, try turning it `Off` for the codespace webpage.
+
+> During the installation of the package, it fails with `ERROR: Failed building wheel for llama-cpp-python`
+
+You are probably missing the `GNU Make` package. A quick way to solve it is to run `sudo apt install build-essential` in your terminal.
