Description
Our transcriber tool processes source material to produce AI-generated transcripts using various transcription services (mainly Deepgram). While these AI transcripts are highly accurate (around 90%), they still require human review to reach near-perfect accuracy, especially given the technical nature of Bitcoin-related content.
Through our review process, we have identified recurring AI transcription errors. As a first step toward addressing them, we created a style guide in bitcointranscripts/bitcointranscripts#489. The next step is to create a machine-readable JSON format that lists these common mistakes, so that they can be corrected automatically during post-processing.
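To make the idea concrete, here is a minimal sketch of what such a JSON file and its post-processing step could look like. The schema (a list of `{"wrong": ..., "right": ...}` objects) and the function names `load_corrections` and `apply_corrections` are hypothetical, and the sample entries are illustrative only, not taken from the style guide. Each entry becomes a case-insensitive, whole-word regular expression, and longer phrases are applied first so that a multi-word fix is not pre-empted by a shorter overlapping entry.

```python
import json
import re

# Hypothetical schema: a JSON list of {"wrong": ..., "right": ...} objects.
# These entries are illustrative examples, not the project's actual list.
CORRECTIONS_JSON = """
[
  {"wrong": "tap root", "right": "Taproot"},
  {"wrong": "segwit", "right": "SegWit"},
  {"wrong": "light ning network", "right": "Lightning Network"}
]
"""


def load_corrections(text):
    """Parse the JSON list and compile one whole-word, case-insensitive regex per entry."""
    entries = json.loads(text)
    compiled = []
    # Sort longest "wrong" phrase first so longer phrases are replaced before shorter ones.
    for entry in sorted(entries, key=lambda e: len(e["wrong"]), reverse=True):
        pattern = re.compile(r"\b" + re.escape(entry["wrong"]) + r"\b", re.IGNORECASE)
        compiled.append((pattern, entry["right"]))
    return compiled


def apply_corrections(transcript, corrections):
    """Apply every known correction to the transcript text in order."""
    for pattern, replacement in corrections:
        # A function replacement avoids backslash/group interpretation in the "right" text.
        transcript = pattern.sub(lambda _match: replacement, transcript)
    return transcript


corrections = load_corrections(CORRECTIONS_JSON)
print(apply_corrections("We discussed tap root and segwit adoption.", corrections))
```

Because the replacement is case-insensitive and maps to the canonical spelling, running it on already-correct text (for example "SegWit") leaves it unchanged, so the step can safely be re-run on reviewed transcripts.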
Steps to Implement:
- Create JSON Format: Develop a JSON format to list common AI transcription errors and their corrections.
- Post-Processing Logic: Implement logic to use this JSON file during post-processing to automatically fix known errors.
- Autogenerate Error List: After the initial implementation, enhance the system to autogenerate this list based on previous corrections. This can be achieved by comparing AI-generated transcripts with the final reviewed versions stored in source control.
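The autogeneration step could start from a word-level diff between each AI transcript and its reviewed version. The sketch below uses Python's standard `difflib.SequenceMatcher` to collect the spans a reviewer replaced, counts how often each `(wrong, right)` pair recurs across transcripts, and keeps only pairs seen at least `min_count` times. It assumes both versions are already available as plain-text pairs (reading them from source control is out of scope here), and the names `candidate_corrections` and `build_error_list`, the `{"wrong": ..., "right": ...}` output shape, and the `min_count` threshold are hypothetical.

```python
import difflib
from collections import Counter


def candidate_corrections(ai_text, reviewed_text):
    """Yield (ai_phrase, reviewed_phrase) pairs wherever the reviewer replaced words."""
    ai_words = ai_text.split()
    reviewed_words = reviewed_text.split()
    matcher = difflib.SequenceMatcher(a=ai_words, b=reviewed_words, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Only "replace" opcodes represent substitutions; pure inserts/deletes are skipped.
        if tag == "replace":
            yield " ".join(ai_words[i1:i2]), " ".join(reviewed_words[j1:j2])


def build_error_list(text_pairs, min_count=2):
    """Aggregate substitutions across (ai_text, reviewed_text) pairs, most frequent first."""
    counts = Counter()
    for ai_text, reviewed_text in text_pairs:
        counts.update(candidate_corrections(ai_text, reviewed_text))
    return [
        {"wrong": wrong, "right": right}
        for (wrong, right), n in counts.most_common()
        if n >= min_count
    ]


pairs = [
    ("we talked about tap root today", "we talked about Taproot today"),
    ("tap root is a soft fork", "Taproot is a soft fork"),
    ("segwit was activated", "SegWit was activated"),
]
print(build_error_list(pairs))
```

Splitting on whitespace keeps the sketch simple but treats punctuation as part of a word, so real transcripts would likely need normalization first. The frequency threshold filters out one-off editorial rewrites, which would otherwise be mistaken for systematic transcription errors.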
This approach will help us improve the accuracy of AI transcripts and reduce the workload for human reviewers.