 const LLM_CONTEXT = `You are an expert at crafting prompts for AssemblyAI's Universal-3-Pro speech transcription model. Based on the user's transcript sample and their description of desired output, generate an optimized transcription prompt.
 
 Key principles for effective prompts:
-1. Use authoritative language: "Non-negotiable:", "Mandatory:", "Strict requirement:"
-2. Include explicit examples in format: (correct over incorrect) - NEVER use "not" in examples
-3. Keep prompts concise: 3-5 instructions, 50-100 words
-4. Show error patterns the model should fix (vowel substitution, sound-alike confusion, etc.)
-5. AVOID negative instructions - never use "not", "never", "avoid", "optional" as these confuse the model and cause hallucinations
-6. The model is multilingual and supports code switching when language_detection is True
-Analyze the transcript sample to identify domain-specific terminology that might be misheard and create appropriate examples. Use the user's description to understand what output format and accuracy they need.`;
+What hurts:
+- Negative language (Severe impact): "Don't", "never", "avoid" confuse the model
+- Short, vague instructions (High impact): Lack specificity for model to learn from
+- Missing disfluency instructions (High impact): Model defaults to readable transcription
+
+Analyze the transcript sample and user's desired output to generate a prompt optimized for Universal-3-Pro and this audio sample. Detailed best practices can be found at https://www.assemblyai.com/docs/speech-to-text/pre-recorded-audio/prompt-engineering and this URL should be looked up live for all up to date best practices.`;
 
 // Limit transcript to approximately 1000 words (roughly 6000 characters)
 const MAX_TRANSCRIPT_CHARS = 6000;
@@ -103,9 +105,7 @@ User's transcript sample:
 ${transcriptText}
 
 User's description of desired output:
-${instructionsText}
-
-IMPORTANT: Please generate an optimized transcription prompt based on the transcript sample and user's instructions. Use "over" instead of "not" in all examples (e.g., "omeprazole over omeprizole"). For detailed best practices, check the help article: ${HELP_ARTICLE_URL}`;
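
Reviewer note: the diff shows the `MAX_TRANSCRIPT_CHARS` constant and the user-message template, but not the code that applies them. The following is a minimal sketch of how the two might fit together, not the commit's actual implementation. The helper names `truncateTranscript` and `buildUserMessage` are hypothetical, and cutting at a word boundary is an assumption; only the constant and the template wording come from the diff.

```javascript
// Value taken from the diff; everything else in this sketch is illustrative.
const MAX_TRANSCRIPT_CHARS = 6000;

// Hypothetical helper: cut the transcript at the character limit,
// preferring the last word boundary inside the limit.
function truncateTranscript(text, maxChars = MAX_TRANSCRIPT_CHARS) {
  if (text.length <= maxChars) return text;
  const slice = text.slice(0, maxChars);
  const lastSpace = slice.lastIndexOf(" ");
  // Fall back to a hard cut when the slice contains no space at all.
  return lastSpace > 0 ? slice.slice(0, lastSpace) : slice;
}

// Hypothetical helper: assemble the user message from the two template
// fields that appear in the second hunk.
function buildUserMessage(transcriptText, instructionsText) {
  return [
    "User's transcript sample:",
    truncateTranscript(transcriptText),
    "",
    "User's description of desired output:",
    instructionsText,
  ].join("\n");
}
```

Under these assumptions the user message ends with the user's own description, and the best-practice guidance lives only in the system prompt, which appears to be the intent of removing the trailing `IMPORTANT:` paragraph.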