How to have speaker wise transcript #129
-
Hello all,

I am calling the Deepgram API with dg.transcription.sync_prerecorded(source, options). The whole transcription comes back in the JSON element results.channels.alternatives.transcript, while results.channels.alternatives.words has a speaker ID associated with each word. Is there any way to get a speaker-wise transcription directly? Otherwise, I need to write a custom script that walks through each word and splits the transcript by speaker. I was looking for something in the following format:

speaker 0: "xxxxxxxxx"

Thanks
Replies: 3 comments 3 replies
-
Hi @amitkayal,

I am not on the Deepgram team, but I have been playing around with it for a while. I recommend the Smart Format feature, which is explained here: https://developers.deepgram.com/documentation/features/smart-format/

It should give you what you need. I am using the following set of features, shown in the request URL below:

"https://api.deepgram.com/v1/listen?language=en&model=nova&diarize=true&smart_format=true"

You can see highlights of the output I get in this article I recently put together on LinkedIn: https://www.linkedin.com/pulse/audio-transcription-made-super-easy-richard-hall

I hope it helps!
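As a rough illustration of how a request like the one above could yield per-speaker text without walking individual words: the sketch below assumes the response carries a paragraphs object under alternatives[0], shaped as a list of entries with a speaker number and a list of sentences. That shape, the helper name speaker_paragraphs, and the sample data are all assumptions made for illustration, so check them against your own response before relying on this.

```python
def speaker_paragraphs(response):
    """Return [(speaker, text), ...] from a paragraphs block, if one is present."""
    turns = []
    for channel in response.get("results", {}).get("channels", []):
        alternatives = channel.get("alternatives") or [{}]
        # Assumed shape (verify against a real response):
        # alternatives[0]["paragraphs"]["paragraphs"] is a list of
        # {"speaker": int, "sentences": [{"text": str}, ...]} entries.
        block = alternatives[0].get("paragraphs", {})
        for paragraph in block.get("paragraphs", []):
            text = " ".join(s["text"] for s in paragraph.get("sentences", []))
            turns.append((paragraph.get("speaker"), text))
    return turns


# Hand-written sample in the assumed shape (not real API output).
sample = {
    "results": {
        "channels": [
            {
                "alternatives": [
                    {
                        "paragraphs": {
                            "paragraphs": [
                                {
                                    "speaker": 0,
                                    "sentences": [
                                        {"text": "Hello there."},
                                        {"text": "How are you?"},
                                    ],
                                },
                                {
                                    "speaker": 1,
                                    "sentences": [{"text": "Fine, thanks."}],
                                },
                            ]
                        }
                    }
                ]
            }
        ]
    }
}

for speaker, text in speaker_paragraphs(sample):
    print(f'speaker {speaker}: "{text}"')
```

If the paragraphs block is missing, the function simply returns an empty list, in which case grouping the words array by speaker is the fallback.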
-
Hi @amitkayal,

You will need to write a script to do this. I would suggest building it off the Here's some information from a previous post that might help you: https://github.com/orgs/deepgram/discussions/106#discussioncomment-5445821

Hope this is helpful!

Sandra
-
Does this Python script help you out?

import requests


def get_speaker_wise_transcript(response):
    # Map each speaker ID to every word attributed to that speaker
    speaker_transcripts = {}
    for channel in response['results']['channels']:
        for word in channel['alternatives'][0]['words']:
            speaker_id = word['speaker']
            if speaker_id not in speaker_transcripts:
                speaker_transcripts[speaker_id] = word['word']
            else:
                speaker_transcripts[speaker_id] += ' ' + word['word']
    return speaker_transcripts


# Replace this with your Deepgram API key
API_KEY = "your-deepgram-api-key"
# Replace this with the path to your audio file
AUDIO_FILE_PATH = "path-to-your-audio-file.wav"

# Define the headers for the request
headers = {
    "Authorization": f"Token {API_KEY}",
}

# Define the options for the request
options = {
    "punctuate": True,
    "model": 'general',
    "tier": 'enhanced',
    "diarize": True,
    "endpointing": 'true'
}

# Open the audio file
with open(AUDIO_FILE_PATH, 'rb') as audio_file:
    # Make the request to the Deepgram API
    response = requests.post(
        "https://api.deepgram.com/v1/listen",
        headers=headers,
        params=options,
        data=audio_file
    )

# Stop here with a clear error if the request failed
response.raise_for_status()
# Parse the JSON response
response_json = response.json()
# Get the speaker-wise transcripts
speaker_transcripts = get_speaker_wise_transcript(response_json)
# Print out the transcripts
for speaker, transcript in speaker_transcripts.items():
    print(f'speaker {speaker}: "{transcript}"')

The output has one line per speaker, with all of that speaker's words joined together.

If you want to capture the sequential nature of speakers switching back and forth, you can do this instead:

import requests


def get_speaker_wise_transcript(response):
    # Build an ordered list of (speaker_id, text) turns
    speaker_transcripts = []
    last_speaker_id = None
    for channel in response['results']['channels']:
        for word in channel['alternatives'][0]['words']:
            speaker_id = word['speaker']
            if speaker_id != last_speaker_id:
                # The speaker changed, so start a new turn
                speaker_transcripts.append((speaker_id, word['word']))
            else:
                # Same speaker, so extend the current turn
                speaker_transcripts[-1] = (speaker_id, speaker_transcripts[-1][1] + ' ' + word['word'])
            last_speaker_id = speaker_id
    return speaker_transcripts


# Replace this with your Deepgram API key
API_KEY = "your-deepgram-api-key"
# Replace this with the path to your audio file
AUDIO_FILE_PATH = "path-to-your-audio-file.wav"

# Define the headers for the request
headers = {
    "Authorization": f"Token {API_KEY}",
}

# Define the options for the request
options = {
    "punctuate": True,
    "model": 'general',
    "tier": 'enhanced',
    "diarize": True,
    "endpointing": 'true'
}

# Open the audio file
with open(AUDIO_FILE_PATH, 'rb') as audio_file:
    # Make the request to the Deepgram API
    response = requests.post(
        "https://api.deepgram.com/v1/listen",
        headers=headers,
        params=options,
        data=audio_file
    )

# Stop here with a clear error if the request failed
response.raise_for_status()
# Parse the JSON response
response_json = response.json()
# Get the speaker-wise transcripts
speaker_transcripts = get_speaker_wise_transcript(response_json)
# Print out the transcripts
for speaker, transcript in speaker_transcripts:
    print(f'speaker {speaker}: "{transcript}"')

The output has one line per speaker turn, in the order the turns occur.

I hope that helps!
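A possible variation on the turn-by-turn grouping, in case timestamps are useful: the sketch below merges consecutive words from the same speaker and records when each turn starts and ends. It assumes each word entry has start and end times and may carry a punctuated_word field when punctuation is enabled; the helper name speaker_turns and the sample words are made up for illustration, and it runs on a hand-written list rather than a live response.

```python
def speaker_turns(words):
    """Merge consecutive words from the same speaker into timed turns."""
    turns = []
    for word in words:
        # Prefer the punctuated form when the entry provides one (assumed field).
        text = word.get("punctuated_word", word["word"])
        speaker = word.get("speaker")
        if turns and turns[-1]["speaker"] == speaker:
            # Same speaker as the previous word: extend the current turn.
            turns[-1]["text"] += " " + text
            turns[-1]["end"] = word["end"]
        else:
            # Speaker changed (or first word): start a new turn.
            turns.append(
                {
                    "speaker": speaker,
                    "start": word["start"],
                    "end": word["end"],
                    "text": text,
                }
            )
    return turns


# Hand-written sample words (not real API output).
sample_words = [
    {"word": "hello", "punctuated_word": "Hello,", "start": 0.0, "end": 0.4, "speaker": 0},
    {"word": "there", "punctuated_word": "there.", "start": 0.5, "end": 0.9, "speaker": 0},
    {"word": "hi", "punctuated_word": "Hi!", "start": 1.2, "end": 1.5, "speaker": 1},
    {"word": "bye", "start": 2.0, "end": 2.3, "speaker": 0},
]

for turn in speaker_turns(sample_words):
    print(f'[{turn["start"]:.1f}-{turn["end"]:.1f}] speaker {turn["speaker"]}: "{turn["text"]}"')
```

To use it with a real response, pass in response_json['results']['channels'][0]['alternatives'][0]['words'], the same list the scripts above iterate over.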