A tool to extract clean conversation transcripts from TLDV HTML, available as both a web app and a Python script.
Try the web version: TLDV Conversation Extractor
This tool extracts clean, readable conversation transcripts from TLDV ("too long; didn't view") meeting recordings. It takes the HTML of a TLDV transcript and converts it into plain text, with one line per turn in the form "speaker name: dialogue".
Speaker 1: Not at the moment. No. Yeah, that's
Speaker 2: It. Listen, listen. Now and Another another project too. Pretty cool. Eh
Speaker 1: Um, what are you talking about this page?
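Because every output line follows the same "Speaker: utterance" pattern, the text is easy to post-process. The following is a minimal sketch, assuming each line starts with the speaker name followed by a colon; the helper name `parse_transcript` is illustrative and not part of this repository.

```python
from typing import List, Tuple


def parse_transcript(text: str) -> List[Tuple[str, str]]:
    """Split 'Speaker: utterance' lines into (speaker, utterance) pairs."""
    turns = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first colon only, so colons inside the speech survive.
        speaker, sep, utterance = line.partition(":")
        if not sep:
            continue  # not a 'Speaker: utterance' line; ignore it
        turns.append((speaker.strip(), utterance.strip()))
    return turns


sample = """Speaker 1: Not at the moment. No. Yeah, that's
Speaker 2: It. Listen, listen. Now and Another another project too. Pretty cool. Eh
Speaker 1: Um, what are you talking about this page?"""

for speaker, utterance in parse_transcript(sample):
    print(f"{speaker} said {len(utterance.split())} words")
```

Splitting on the first colon only is a deliberate choice: a speaker name that itself contains a colon would be misparsed, but timestamps or ratios inside the dialogue are left alone.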
- Works entirely in your browser - no data is sent to any server
- Privacy-focused: all processing happens locally
- Download the extracted conversation as a text file
- Copy the extracted conversation to clipboard
- No installation or dependencies required
- Visit TLDV Conversation Extractor
- Go to your TLDV transcript page
- Right-click on the transcript container and select "Inspect Element"
- Find the <div id="transcript-container"> element
- Right-click on it and select "Copy" → "Copy outerHTML"
- Paste the copied HTML into the tool
- Click "Extract Conversation" to process the transcript
- Use the "Download" or "Copy to Clipboard" buttons as needed
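The copy step is the easiest one to get wrong, since it is simple to copy a child or parent element by mistake. As a rough sanity check, the sketch below tests whether a copied snippet starts with the expected `<div id="transcript-container">` element. It uses only Python's standard library; the function name `looks_like_transcript_html` is hypothetical and not part of this repository.

```python
from html.parser import HTMLParser


class _FirstTagFinder(HTMLParser):
    """Records the first opening tag and its attributes."""

    def __init__(self):
        super().__init__()
        self.first_tag = None
        self.first_attrs = {}

    def handle_starttag(self, tag, attrs):
        # Only the first opening tag matters; later ones are ignored.
        if self.first_tag is None:
            self.first_tag = tag
            self.first_attrs = dict(attrs)


def looks_like_transcript_html(html: str) -> bool:
    """True if the first element is <div id="transcript-container">."""
    finder = _FirstTagFinder()
    finder.feed(html)
    finder.close()
    return (
        finder.first_tag == "div"
        and finder.first_attrs.get("id") == "transcript-container"
    )


print(looks_like_transcript_html('<div id="transcript-container"><p>Hi</p></div>'))
print(looks_like_transcript_html("<p>Hi</p>"))
```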
We've included a sample-input.html file in this repository that shows the expected HTML structure from TLDV. The sample input would produce this output:
Speaker 1: Hello and welcome to our meeting.
Speaker 2: Thanks for having me here today.
Speaker 1: Let's discuss the project timeline.
This sample demonstrates the HTML structure that the tool is designed to parse. You can use it to test the tool or understand what kind of HTML to copy from TLDV.
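To give a feel for the parsing step, here is a minimal sketch that turns transcript HTML into the output shown above. Several things in it are assumptions: the class names `speaker-name` and `message-text` and the `turn` wrapper are hypothetical placeholders (check sample-input.html for the real markup), and the sketch uses the standard-library `html.parser` instead of BeautifulSoup4 so that it runs without dependencies. It is not the code in extract_conversation.py.

```python
from html.parser import HTMLParser

# Hypothetical class names; the real TLDV markup may differ.
SPEAKER_CLASS = "speaker-name"
TEXT_CLASS = "message-text"
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}  # no closing tag


class TranscriptParser(HTMLParser):
    """Collects (speaker, utterance) pairs from the assumed markup."""

    def __init__(self):
        super().__init__()
        self.turns = []     # finished (speaker, utterance) pairs
        self._stack = []    # which field each open element writes to
        self._speaker = []  # text chunks of the current speaker name
        self._text = []     # text chunks of the current utterance

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if SPEAKER_CLASS in classes:
            self._flush()  # a new speaker element starts a new turn
            field = "speaker"
        elif TEXT_CLASS in classes:
            field = "text"
        else:
            # Nested tags such as <b> inherit the field of their parent.
            field = self._stack[-1] if self._stack else None
        self._stack.append(field)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        field = self._stack[-1] if self._stack else None
        if field == "speaker":
            self._speaker.append(data)
        elif field == "text":
            self._text.append(data)

    def _flush(self):
        # Join the chunks and collapse runs of whitespace.
        speaker = " ".join("".join(self._speaker).split())
        text = " ".join("".join(self._text).split())
        if speaker and text:
            self.turns.append((speaker, text))
        self._speaker, self._text = [], []

    def close(self):
        super().close()
        self._flush()  # emit the last turn


def extract_conversation(html: str) -> str:
    parser = TranscriptParser()
    parser.feed(html)
    parser.close()
    return "\n".join(f"{s}: {t}" for s, t in parser.turns)


sample_html = """
<div id="transcript-container">
  <div class="turn">
    <span class="speaker-name">Speaker 1</span>
    <p class="message-text">Hello and welcome to our meeting.</p>
  </div>
  <div class="turn">
    <span class="speaker-name">Speaker 2</span>
    <p class="message-text">Thanks for having me here today.</p>
  </div>
  <div class="turn">
    <span class="speaker-name">Speaker 1</span>
    <p class="message-text">Let's discuss the project <b>timeline</b>.</p>
  </div>
</div>
"""

print(extract_conversation(sample_html))
```

With this made-up input, the sketch prints the same three lines as the sample output above. If the real markup uses different class names, only the two constants at the top should need to change.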
Want to host your own version? You have several options:
Netlify:
- Fork this repository
- Sign up for Netlify
- Click "New site from Git" and select your forked repository
- Configure build settings (leave defaults)
- Click "Deploy site"
Cloudflare Pages:
- Fork this repository
- Sign up for Cloudflare Pages
- Create a new project and connect your GitHub account
- Select your forked repository
- Configure with these settings:
  - Build command: (leave empty)
  - Build output directory: /
- Click "Save and Deploy"
Vercel:
- Fork this repository
- Sign up for Vercel
- Create a new project and import your forked repository
- Configure with default settings
- Click "Deploy"
For those who prefer a command-line tool or want to process files in batch, we provide a Python script.
- Python 3.6+
- BeautifulSoup4 (pip install beautifulsoup4)
# Clone the repository
git clone https://github.com/barshy/tldv-transcript-extractor-.git
# Navigate to the directory
cd tldv-transcript-extractor-
# Install dependencies
pip install -r requirements.txt
# Process a single file
python extract_conversation.py input_file.txt [output_file.txt]
# If output file is not specified, result is printed to console
python extract_conversation.py conversation1.txt
- Process TLDV transcript HTML files from the command line
- Save output to a file or print to console
- Can be integrated into other Python projects
- Suitable for batch processing multiple files
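For batch processing, one option is a small wrapper that calls the script once per file, using the command-line form shown above. This is a sketch under a few assumptions: inputs are `.txt` files in a single folder, the `_clean` output suffix and the helper names are hypothetical, and the wrapper is run from the repository directory so that extract_conversation.py can be found.

```python
import subprocess
import sys
from pathlib import Path


def build_commands(input_dir, output_dir, script="extract_conversation.py"):
    """Return one command line per input file, without running anything."""
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    commands = []
    for src in sorted(input_dir.glob("*.txt")):
        # Hypothetical naming scheme: meeting.txt -> meeting_clean.txt
        dst = output_dir / f"{src.stem}_clean.txt"
        commands.append([sys.executable, script, str(src), str(dst)])
    return commands


def run_batch(input_dir, output_dir):
    """Run the extractor once per file; stop at the first failure."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    for cmd in build_commands(input_dir, output_dir):
        subprocess.run(cmd, check=True)


# Example call (folder names are placeholders):
# run_batch("transcripts", "extracted")
```

Going through the command line avoids depending on the script's internal function names, which are not documented here. Building the commands separately from running them also makes a dry run easy: print the result of `build_commands` first.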
Both versions (web and Python) process all data locally. Your transcript data never leaves your computer.
- Web version: Built with vanilla HTML, CSS, and JavaScript
- Python version: Uses BeautifulSoup4 for HTML parsing
- No external API calls or data collection
Contributions are welcome! Feel free to open issues or submit pull requests.
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add some amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.