Mixing in microphone data as information source #1877
Unanswered · ChrisSpraaklab asked this question in Q&A
Replies: 0 comments
I am trying to improve diarization performance by using not only the audio through the standard pipeline, but also data about the location of the audio source relative to the microphone. For this, I have recordings made with a Shure MXA910 microphone array, which delivers 6 distinct directional channels (lobes), each optimized for a speaker within a specified region. It also delivers an optimized, combined channel.
I run pyannote normally on the combined channel and use its local segmentations and embeddings. Alongside the local embeddings, I compute "energy vectors" containing information about the amount of energy in each lobe during the corresponding local segments. This yields a matrix of shape (num_chunks, local_num_speakers, dimension) = (n, 3, 6), mirroring the dimensions of the local embedding matrix, (n, 3, 256). What I am stuck on is the following:
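For concreteness, the per-segment energy vectors could be computed along these lines. This is a minimal sketch, not the exact code used: it assumes the 6 lobe channels are available as a NumPy array of shape (num_lobes, num_samples) and that each local segment is given by start/end times in seconds; the function name `lobe_energy_vector` is mine.

```python
import numpy as np

def lobe_energy_vector(lobes, start, end, sample_rate=16000):
    """Mean per-lobe energy of one local segment.

    lobes: (num_lobes, num_samples) array holding the 6 directional channels.
    start, end: segment boundaries in seconds.
    Returns an L2-normalised (num_lobes,) energy vector, so that segments
    of different lengths and loudness remain comparable.
    """
    s, e = int(start * sample_rate), int(end * sample_rate)
    segment = lobes[:, s:e]
    energy = np.mean(segment ** 2, axis=1)  # mean power per lobe
    norm = np.linalg.norm(energy)
    return energy / norm if norm > 0 else energy
```

Stacking these vectors over all chunks and local speakers gives the (n, 3, 6) matrix described above.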
I want to merge these two information streams before global clustering by computing the distance matrices of both embedding matrices and combining them with a weighted sum. However, I am not sure how to intervene in the pipeline so that clustering receives my own distance matrix instead of the embeddings. scipy.cluster.hierarchy.linkage, as used in clustering.py, accepts a precomputed distance matrix in place of embeddings, but I would need a way to first get the segmentations and embeddings out of the pipeline, perform my own manipulations on that data, and feed the resulting distance matrix back into the pipeline, where it would replace the embeddings passed to clustering.
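To illustrate the weighted-sum step itself (independent of where it gets hooked into the pipeline), here is a sketch of the manipulation I have in mind. It assumes the (n, 3, dim) matrices have already been flattened to 2-D after dropping inactive local speakers; the function name, the `alpha` weight, and the `threshold` value are placeholders, not pipeline parameters.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

def cluster_with_combined_distance(embeddings, energies, alpha=0.7, threshold=0.5):
    """Cluster on a weighted sum of two cosine distance matrices.

    embeddings: (num_segments, 256) speaker embeddings.
    energies:   (num_segments, 6) lobe energy vectors for the same segments.
    alpha:      weight of the embedding distances (1 - alpha for energies).
    Returns integer cluster labels per segment.
    """
    # pdist returns condensed distance matrices, which linkage accepts directly
    d_emb = pdist(embeddings, metric="cosine")
    d_nrg = pdist(energies, metric="cosine")
    combined = alpha * d_emb + (1 - alpha) * d_nrg
    Z = linkage(combined, method="average")
    return fcluster(Z, t=threshold, criterion="distance")
```

The open question remains where in the pyannote pipeline this combined matrix can be substituted for the embeddings.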