Improve AI Transcripts Accuracy with Automated Post-Processing #122

Our transcriber tool processes source material to produce AI-generated transcripts using various transcription services (mainly Deepgram). These AI transcripts reach around 90% accuracy, but they still require human review to get close to perfect, especially given the technical nature of Bitcoin-related content.

Through our review process, we have observed recurring AI transcription errors. As a first step to address this, we created a style guide in https://github.com/bitcointranscripts/bitcointranscripts/pull/489. The next step is to capture these common mistakes in a machine-readable JSON format so that they can be corrected automatically during post-processing.

**Steps to Implement:**

1. **Create JSON Format**: Develop a JSON format to list common AI transcription errors and their corrections.
2. **Post-Processing Logic**: Implement logic to use this JSON file during post-processing to automatically fix known errors.
3. **Autogenerate Error List**: After the initial implementation, enhance the system to autogenerate this list based on previous corrections. This can be achieved by comparing AI-generated transcripts with the final reviewed versions stored in source control.
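
To make steps 1 and 2 concrete, here is a minimal sketch of what the JSON format and the post-processing pass could look like. The schema (`corrections`, `correct`, `mistakes`), the sample entries, and the function names `load_corrections` and `apply_corrections` are all assumptions for illustration; none of them come from the transcriber codebase or the style guide.

```python
import json
import re

# Hypothetical schema: each entry maps one or more observed mistakes
# to the term that should replace them. The entries are made-up examples.
CORRECTIONS_JSON = """
{
  "corrections": [
    {"correct": "Bitcoin", "mistakes": ["bit coin"]},
    {"correct": "Taproot", "mistakes": ["tap root"]},
    {"correct": "mempool", "mistakes": ["mem pool", "men pool"]}
  ]
}
"""


def load_corrections(raw):
    """Flatten the JSON into a list of (compiled pattern, replacement) pairs."""
    rules = []
    for entry in json.loads(raw)["corrections"]:
        for mistake in entry["mistakes"]:
            # Word boundaries keep the rule from rewriting parts of longer words.
            pattern = re.compile(r"\b" + re.escape(mistake) + r"\b", re.IGNORECASE)
            rules.append((pattern, entry["correct"]))
    return rules


def apply_corrections(text, rules):
    """Apply every rule in order and return the corrected text."""
    for pattern, replacement in rules:
        # A function replacement inserts the string literally,
        # so backslashes in a correction are not treated as escapes.
        text = pattern.sub(lambda match, r=replacement: r, text)
    return text


rules = load_corrections(CORRECTIONS_JSON)
print(apply_corrections("The men pool holds bit coin transactions after tap root.", rules))
# prints: The mempool holds Bitcoin transactions after Taproot.
```

Matching is case-insensitive here, which keeps the list short but always emits the casing stored in `correct`. A real implementation might need per-entry control over casing and context, since some phrases are only wrong in certain sentences.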

This approach will help us improve the accuracy of AI transcripts and reduce the workload for human reviewers.
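
For step 3, one possible way to mine candidate corrections is a word-level diff between an AI transcript and its reviewed version. The sketch below uses Python's standard `difflib`; the function name `candidate_corrections` and the sample sentences are hypothetical, and loading the two versions from source control is left out.

```python
import difflib
from collections import Counter


def candidate_corrections(ai_text, reviewed_text):
    """Count the word-level replacements a reviewer made to an AI transcript."""
    ai_words = ai_text.split()
    reviewed_words = reviewed_text.split()
    matcher = difflib.SequenceMatcher(a=ai_words, b=reviewed_words, autojunk=False)
    counts = Counter()
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # "replace" means a run of AI words was swapped for a run of reviewed words.
        if tag == "replace":
            mistake = " ".join(ai_words[i1:i2])
            correct = " ".join(reviewed_words[j1:j2])
            counts[(mistake, correct)] += 1
    return counts


ai = "the men pool holds bit coin transactions and the men pool is full"
reviewed = "the mempool holds Bitcoin transactions and the mempool is full"
for (mistake, correct), n in candidate_corrections(ai, reviewed).most_common():
    print(f"{mistake!r} -> {correct!r} seen {n} time(s)")
```

Pairs that recur across many transcripts would be the natural candidates for the JSON list. Splitting on whitespace leaves punctuation attached to words, and reviewers also make stylistic edits, so the output is best treated as suggestions for a human to approve rather than as rules to apply directly.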
