Improve AI Transcripts Accuracy with Automated Post-Processing #122

@kouloumos

Our transcriber tool processes source material to produce AI-generated transcripts using various transcription services (mainly Deepgram). These AI transcripts reach around 90% accuracy, but they still require human review to get close to perfect, especially given the technical nature of Bitcoin-related content.

Through our review process, we have observed recurring AI transcription errors. We took a first step to address this by creating the style guide with bitcointranscripts/bitcointranscripts#489. The next step is to create a machine-readable JSON file that lists these common mistakes alongside their corrections, so that we can fix them automatically during post-processing.
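As a rough illustration of the idea, here is a minimal Python sketch of what such a file and the post-processing step might look like. The field names `correct` and `mistakes`, the function names, and the sample entries are all assumptions made up for this example; they are not taken from the style guide or from the transcriber codebase.

```python
import json
import re

# Hypothetical format: each entry maps a list of known wrong forms
# to the single correct spelling. Sample entries are illustrative only.
CORRECTIONS_JSON = """
[
  {"correct": "Taproot", "mistakes": ["tap root", "tap route"]},
  {"correct": "Schnorr", "mistakes": ["schnor", "shnorr"]}
]
"""


def build_rules(entries):
    """Compile one case-insensitive, whole-word regex per known mistake."""
    rules = []
    for entry in entries:
        for mistake in entry["mistakes"]:
            pattern = re.compile(r"\b" + re.escape(mistake) + r"\b", re.IGNORECASE)
            rules.append((pattern, entry["correct"]))
    # Apply longer mistakes first so a short one cannot clobber part of a longer one.
    rules.sort(key=lambda rule: len(rule[0].pattern), reverse=True)
    return rules


def apply_corrections(text, rules):
    """Replace every known mistake in `text` with its correction."""
    for pattern, replacement in rules:
        # A function replacement avoids backslash interpretation in `replacement`.
        text = pattern.sub(lambda _match, r=replacement: r, text)
    return text


rules = build_rules(json.loads(CORRECTIONS_JSON))
fixed = apply_corrections("We discussed tap root and schnor signatures.", rules)
print(fixed)
```

Matching on word boundaries keeps the replacement from touching text that is already correct (for example, `schnor` does not match inside `Schnorr`). Whether case-insensitive matching is desirable for every entry is an open design question.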

Steps to Implement:

  1. Create JSON Format: Develop a JSON format to list common AI transcription errors and their corrections.
  2. Post-Processing Logic: Implement logic to use this JSON file during post-processing to automatically fix known errors.
  3. Autogenerate Error List: After the initial implementation, enhance the system to autogenerate this list based on previous corrections. This can be achieved by comparing AI-generated transcripts with the final reviewed versions stored in source control.
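Step 3 could be approached with a word-level diff between the AI transcript and its reviewed version. The sketch below uses Python's standard `difflib.SequenceMatcher`; the function names, the `min_count` threshold, and the `correct`/`mistakes` output shape are hypothetical choices for illustration, not an existing part of the tool.

```python
from collections import Counter
from difflib import SequenceMatcher


def extract_corrections(ai_text, reviewed_text):
    """Count (wrong, right) phrase pairs where the reviewer replaced AI output."""
    # Naive whitespace tokenization: punctuation stays attached to words.
    ai_words = ai_text.split()
    reviewed_words = reviewed_text.split()
    matcher = SequenceMatcher(a=ai_words, b=reviewed_words, autojunk=False)
    pairs = Counter()
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Only "replace" spans are candidate corrections; pure insertions
        # and deletions are ignored in this sketch.
        if tag == "replace":
            wrong = " ".join(ai_words[i1:i2])
            right = " ".join(reviewed_words[j1:j2])
            pairs[(wrong, right)] += 1
    return pairs


def to_entries(pairs, min_count=1):
    """Group pairs seen at least `min_count` times into a JSON-ready list."""
    grouped = {}
    for (wrong, right), count in pairs.items():
        if count >= min_count:
            grouped.setdefault(right, []).append(wrong)
    return [
        {"correct": right, "mistakes": sorted(wrongs)}
        for right, wrongs in sorted(grouped.items())
    ]


ai = "the tap root upgrade uses schnor signatures"
reviewed = "the Taproot upgrade uses Schnorr signatures"
pairs = extract_corrections(ai, reviewed)
entries = to_entries(pairs)
print(entries)
```

In practice the counts would be accumulated across many transcript pairs, and a higher `min_count` would help filter out one-off editorial rewrites that are not genuine transcription errors. A human pass over the generated list is probably still needed before it is used for automatic fixes.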

This approach will help us improve the accuracy of AI transcripts and reduce the workload for human reviewers.
