Mixing in microphone data as information source #1877
Unanswered · ChrisSpraaklab asked this question in Q&A
Replies: 0 comments
I am trying to improve diarization performance by using not only the audio through the standard pipeline, but also data about the location of the audio source relative to the microphone. For this, I have recordings made with a Shure MXA910 microphone array, which delivers 6 distinct directional channels (lobes), each optimized for a speaker within a specified region. It also delivers an optimized, combined channel.
I run pyannote normally on the combined channel and use its local segmentations and embeddings. Alongside the local embeddings, I compute "energy vectors" containing information about the amount of energy in each lobe during the corresponding local segments. This yields a matrix of shape (num_chunks, local_num_speakers, dimension) = (n, 3, 6), mirroring the dimensions of the local embedding matrix, (n, 3, 256). What I am stuck on is the following:
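For concreteness, the per-segment energy vectors could be computed along these lines. This is a minimal sketch, not the exact code used: it assumes the 6 lobe channels are available as a NumPy array of shape (num_lobes, num_samples) and that each local segment is given by start/end times in seconds; the function name `lobe_energy_vector` is mine.

```python
import numpy as np

def lobe_energy_vector(lobes, start, end, sample_rate=16000):
    """Mean per-lobe energy of one local segment.

    lobes: (num_lobes, num_samples) array holding the 6 directional channels.
    start, end: segment boundaries in seconds.
    Returns an L2-normalised (num_lobes,) energy vector, so that segments
    of different lengths and loudness remain comparable.
    """
    s, e = int(start * sample_rate), int(end * sample_rate)
    segment = lobes[:, s:e]
    energy = np.mean(segment ** 2, axis=1)  # mean power per lobe
    norm = np.linalg.norm(energy)
    return energy / norm if norm > 0 else energy
```

Stacking these vectors over all chunks and local speakers gives the (n, 3, 6) matrix described above.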
I want to merge these two information streams before global clustering by computing the distance matrices of both embedding matrices and combining them with a weighted sum. However, I am not sure how to intervene in the pipeline so that clustering receives my own distance matrix instead of the embeddings. scipy.cluster.hierarchy.linkage, as used in clustering.py, accepts a precomputed distance matrix in place of embeddings, but I would need a way to first get the segmentations and embeddings out of the pipeline, perform my own manipulations on that data, and feed the resulting distance matrix back into the pipeline, where it would replace the embeddings passed to clustering.
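To illustrate the weighted-sum step itself (independent of where it gets hooked into the pipeline), here is a sketch of the manipulation I have in mind. It assumes the (n, 3, dim) matrices have already been flattened to 2-D after dropping inactive local speakers; the function name, the `alpha` weight, and the `threshold` value are placeholders, not pipeline parameters.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

def cluster_with_combined_distance(embeddings, energies, alpha=0.7, threshold=0.5):
    """Cluster on a weighted sum of two cosine distance matrices.

    embeddings: (num_segments, 256) speaker embeddings.
    energies:   (num_segments, 6) lobe energy vectors for the same segments.
    alpha:      weight of the embedding distances (1 - alpha for energies).
    Returns integer cluster labels per segment.
    """
    # pdist returns condensed distance matrices, which linkage accepts directly
    d_emb = pdist(embeddings, metric="cosine")
    d_nrg = pdist(energies, metric="cosine")
    combined = alpha * d_emb + (1 - alpha) * d_nrg
    Z = linkage(combined, method="average")
    return fcluster(Z, t=threshold, criterion="distance")
```

The open question remains where in the pyannote pipeline this combined matrix can be substituted for the embeddings.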