This project demonstrates how to transcribe meeting audio into text and generate structured meeting minutes (summary, key points, takeaways, and action items) using OpenAI Whisper, Hugging Face Transformers, and quantization techniques for efficient model loading.
- Transcribe audio files into text using:
  - OpenAI Whisper
  - Open-source Hugging Face models
- Generate detailed meeting minutes in Markdown format:
  - Summary with attendees, date, and location
  - Key discussion points
  - Takeaways
  - Action items with owners
- Quantization with BitsAndBytes: efficient 4-bit model loading using the Hugging Face `BitsAndBytesConfig`
- Tokenizer integration with the Hugging Face API: apply chat templates and manage inputs for LLaMA models
- Supports GPU acceleration on Google Colab

Run the following inside Colab to install dependencies:
```python
!pip install -q --upgrade torch==2.5.1+cu124 torchvision==0.20.1+cu124 torchaudio==2.5.1+cu124 --index-url https://download.pytorch.org/whl/cu124
!pip install -q requests bitsandbytes==0.46.0 transformers==4.48.3 accelerate==1.3.0 openai
```
Mount Google Drive (to load your audio file):

```python
from google.colab import drive

drive.mount("/content/drive")
audio_filename = "/content/drive/MyDrive/folder/audio_extract.mp3"
```
Add API keys:

- Hugging Face: `HF_TOKEN`
- OpenAI: `OPENAI_API_KEY`

Store them in Colab Secrets for security:

```python
from google.colab import userdata

hf_token = userdata.get("HF_TOKEN")
openai_api_key = userdata.get("OPENAI_API_KEY")
```
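Downloading gated models such as Meta-LLaMA 3.1 also requires authenticating with the Hugging Face token; a one-line sketch using `huggingface_hub` (installed alongside `transformers`):

```python
from huggingface_hub import login

# Authenticate this Colab session so gated models (e.g. Meta-LLaMA 3.1) can be downloaded
login(hf_token)
```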
Models used:
- Whisper (OpenAI) for audio-to-text transcription
- Meta-LLaMA 3.1 (8B-Instruct) for meeting minutes generation
- Quantization with BitsAndBytes to run large models within the limited memory of Colab's free-tier T4 GPU, as sketched below:
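A minimal loading sketch; the model ID matches the Meta-LLaMA 3.1 8B-Instruct named above, while the NF4 quantization type, double quantization, and float16 compute dtype are illustrative choices (float16 because the free T4 lacks native bfloat16 support):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

LLAMA = "meta-llama/Meta-Llama-3.1-8B-Instruct"

# 4-bit quantization so the 8B model fits in T4 memory
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,      # nested quantization saves a little more memory
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
)

tokenizer = AutoTokenizer.from_pretrained(LLAMA)
model = AutoModelForCausalLM.from_pretrained(
    LLAMA,
    device_map="auto",                   # place layers on the available GPU
    quantization_config=quant_config,
)
```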
- Upload or mount your audio file.
- Run the transcription cell to convert the audio to text (see the transcription sketch after this list).
- Generate meeting minutes with LLaMA by adjusting the system and user prompts (see the minutes-generation sketch below).
- View results in Markdown format inside Colab.
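The transcription step can go through the OpenAI API, since the `openai` package and `OPENAI_API_KEY` are already set up above. A minimal sketch, assuming the hosted `whisper-1` model (the notebook may equally use an open-source Hugging Face Whisper pipeline); `audio_filename` and `openai_api_key` come from the earlier cells:

```python
from openai import OpenAI

client = OpenAI(api_key=openai_api_key)

# Send the audio file to Whisper; response_format="text" returns a plain string
with open(audio_filename, "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="text",
    )
```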
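For the minutes, a sketch of applying the LLaMA chat template and generating with the quantized model loaded in the quantization sketch above; the prompt wording and `max_new_tokens` value are illustrative, not the repo's exact settings:

```python
from IPython.display import Markdown, display

system_prompt = (
    "You are an assistant that writes meeting minutes in Markdown: a summary "
    "with attendees, date, and location; key discussion points; takeaways; "
    "and action items with owners."
)
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": f"Write minutes for this transcript:\n{transcription}"},
]

# Apply the LLaMA chat template and generate
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=2000)

# Decode only the newly generated tokens and render them as Markdown in Colab
minutes = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
display(Markdown(minutes))
```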
Feel free to fork the repo, open issues, and submit PRs.