Linkage function to aggregate topics is different between `visualize_hierarchy` and `_reduce_to_n_topics` #2453

AXLMRIN · 2025-11-02T16:05:15Z

Hello,

Reading the source code (bertopic version 0.17.3), I realised that:

- To compute the dendrogram in `visualize_hierarchy`, the default behaviour is to use the c-TF-IDF embeddings and the Ward linkage function (cf. `_bertopic.py` L3064 and L3099).
- To reduce the number of topics in `_reduce_to_n_topics`, the default behaviour is to use the topic embeddings and the average linkage function (cf. `_bertopic.py` L4442 and L4464).
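
To illustrate why this matters, here is a minimal sketch using only SciPy and random data. It is not BERTopic's code: the two matrices are hypothetical stand-ins for the c-TF-IDF representation and the topic embeddings, and the cosine distance and the cut at 4 clusters are assumptions I made for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
n_topics = 12

# Hypothetical stand-ins (random data, not real BERTopic outputs):
# a non-negative "c-TF-IDF-like" matrix and a dense "embedding-like" matrix.
ctfidf_like = np.abs(rng.normal(size=(n_topics, 50)))
embedding_like = rng.normal(size=(n_topics, 16))

# Ward linkage on one representation (the visualisation-style setup).
# Cosine distance is an assumption made for this sketch.
Z_ward = linkage(pdist(ctfidf_like, metric="cosine"), method="ward")

# Average linkage on the other representation (the reduction-style setup).
Z_avg = linkage(pdist(embedding_like, metric="cosine"), method="average")

# Cut both trees into at most 4 flat clusters and compare the assignments.
labels_ward = fcluster(Z_ward, t=4, criterion="maxclust")
labels_avg = fcluster(Z_avg, t=4, criterion="maxclust")

print("ward / c-TF-IDF-like  :", labels_ward)
print("average / embeddings  :", labels_avg)
```

Nothing forces the two label vectors to agree, so the merges shown in the dendrogram are not necessarily the merges applied when reducing topics.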

This confused me, as I'd expect the visualisation method and the topic reduction method to follow the same logic.

My question is: why not use the same linkage function in both places? And why use the c-TF-IDF embeddings in `visualize_hierarchy`?

Thank you for your time.