Implement Community Detection#11

Merged: code-shoily merged 2 commits into main from community-experiment on Mar 19, 2026
code-shoily (Owner) commented Mar 19, 2026

PR: Community Detection Implementation (v5.0.0)

Summary

This PR introduces a community detection suite to Yog, enabling users to identify and analyze densely connected groups of nodes in graphs. The implementation includes seven community detection algorithms, a metrics module for evaluation, and random walk primitives for building custom algorithms (eight modules under yog/community/ in total, plus shared internals).

What's New

Core Module (yog/community.gleam)

Core types and utilities for community detection:

  • Communities - Maps nodes to community IDs with community count
  • Dendrogram - Hierarchical community structure with merge history
  • CommunityId - Type alias for community identifiers
  • Utility functions: communities_to_dict/1, largest_community/1, community_sizes/1, merge_communities/3

Community Detection Algorithms

| Algorithm | Module | Best For | Complexity |
|---|---|---|---|
| Louvain Method | yog/community/louvain | Large graphs, speed/quality balance | O(E log V) |
| Leiden Method | yog/community/leiden | Quality guarantee, well-connected communities | O(E log V) |
| Label Propagation | yog/community/label_propagation | Very large graphs, speed | O(E × iters) |
| Girvan-Newman | yog/community/girvan_newman | Hierarchical structure, small graphs | O(E² × V) |
| Walktrap | yog/community/walktrap | Random walk-based communities | O(V² log V) |
| Infomap | yog/community/infomap | Flow-based, information-theoretic | O(E × iters) |
| Clique Percolation | yog/community/clique_percolation | Overlapping communities | O(3^(V/3)) |
| Random Walk | yog/community/random_walk | Primitives for custom algorithms | O(steps × k) |

Metrics Module (yog/community/metrics.gleam)

Evaluation metrics for community quality:

  • modularity/2 - Newman's modularity Q (O(E))
  • count_triangles/1 - Global triangle count (O(V × k²))
  • triangles_per_node/1 - Local triangle counts
  • clustering_coefficient/2 - Local clustering coefficient
  • average_clustering_coefficient/1 - Global clustering
  • density/1 - Graph density
  • community_density/2 - Internal community density
  • average_community_density/2 - Average across all communities
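
To make the modularity metric concrete, here is a minimal sketch of Newman's Q for an undirected, unweighted graph. It is written in Python as a language-neutral illustration and is not Yog's implementation: the function name, the edge-list input, and the `community_of` dictionary are assumptions made for this example only.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman's modularity Q for an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community_of: dict mapping node -> community id (hypothetical input shape).
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal = defaultdict(int)    # edges with both endpoints in community c
    degree_sum = defaultdict(int)  # total degree of the nodes in community c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    # Q = sum over communities of (internal edge fraction - expected fraction)
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)
```

A single pass over the edges is enough, which is where the O(E) cost quoted above comes from. Putting every node in one community always yields Q = 0 under this definition.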

Internal Utilities (yog/community/internal.gleam)

Shared infrastructure for modularity-based algorithms:

  • CommunityState type for tracking assignments and weights
  • Deterministic shuffle using LCG
  • Modularity gain calculations
  • Graph aggregation for hierarchical methods
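
As an illustration of what a deterministic, LCG-driven shuffle can look like, the sketch below combines a linear congruential generator with a Fisher-Yates shuffle. It is Python rather than Gleam, and the multiplier, increment, and modulus are the widely published Numerical Recipes constants, chosen here as an assumption; the constants and function name used in yog/community/internal.gleam are not shown in this PR description.

```python
def lcg_shuffle(items, seed):
    """Return a shuffled copy of items; the same seed always gives the same order."""
    # Numerical Recipes LCG constants (an assumption, not taken from Yog).
    a, c, m = 1664525, 1013904223, 2 ** 32
    state = seed % m
    out = list(items)  # copy so the caller's list is left untouched
    # Fisher-Yates: walk from the end, swapping each slot with a random earlier one.
    for i in range(len(out) - 1, 0, -1):
        state = (a * state + c) % m
        j = (state >> 16) % (i + 1)  # use high bits; the low bits of an LCG are weak
        out[i], out[j] = out[j], out[i]
    return out
```

Because the generator state depends only on the seed, repeated runs visit nodes in the same order, which keeps algorithm output reproducible across runs.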

API Design

All algorithms follow a consistent pattern:

// Basic usage (assumes `graph` is an existing Yog graph value)
import gleam/io
import yog/community/louvain

let communities = louvain.detect(graph)
io.debug(communities.num_communities)  // => 4 (depends on the graph)

// With options
let options = louvain.LouvainOptions(
  min_modularity_gain: 0.0001,
  max_iterations: 100,
  seed: 42,
)
let communities = louvain.detect_with_options(graph, options)

// Hierarchical detection
let dendrogram = louvain.detect_hierarchical(graph)
// dendrogram.levels contains community partitions at each level

Documentation

Every module includes:

  • Algorithm overview with steps
  • "When to Use" comparison tables
  • Time/space complexity information
  • Code examples with imports
  • Reference links (papers, Wikipedia)

File Structure

src/yog/
├── community.gleam                      (~200 lines)  # Core types
└── community/
    ├── internal.gleam                   (~300 lines)  # Shared utilities
    ├── metrics.gleam                    (~240 lines)  # Evaluation
    ├── label_propagation.gleam          (~195 lines)  # Fast algorithm
    ├── girvan_newman.gleam              (~350 lines)  # Hierarchical
    ├── louvain.gleam                    (~400 lines)  # Most popular
    ├── leiden.gleam                     (~670 lines)  # Improved Louvain
    ├── walktrap.gleam                   (~300 lines)  # Random walk
    ├── infomap.gleam                    (~275 lines)  # Info-theoretic
    ├── clique_percolation.gleam         (~270 lines)  # Overlapping
    └── random_walk.gleam                (~180 lines)  # Primitives

Total: ~3,380 lines

Implementation Highlights

Louvain Method

  • Two-phase approach: local optimization + aggregation
  • Fast modularity gain calculation using community statistics
  • Hierarchical output via detect_hierarchical/1
  • Stats tracking for debugging via detect_with_stats/2

Leiden Method

  • Builds on Louvain with refinement step
  • Guarantees well-connected communities (no community is internally disconnected)
  • Better hierarchical structure than Louvain

Label Propagation

  • Near-linear time complexity
  • Stochastic with seed control
  • Fastest option for very large graphs
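
The points above can be sketched as follows. This is a generic asynchronous label propagation loop in Python, not the code in yog/community/label_propagation; the adjacency-dict input, the parameter names, and the tie-breaking rule (keep the current label if it is among the most frequent, otherwise pick a seeded random one) are assumptions for illustration.

```python
import random
from collections import Counter


def label_propagation(adjacency, seed=42, max_iterations=100):
    """Assign each node the most frequent label among its neighbours.

    adjacency: dict mapping node -> set of neighbours (undirected).
    Node ids must be sortable so tie-breaking is reproducible for a given seed.
    """
    rng = random.Random(seed)
    labels = {node: node for node in adjacency}  # every node starts alone
    nodes = list(adjacency)
    for _ in range(max_iterations):
        rng.shuffle(nodes)  # visit order is random but fixed by the seed
        changed = False
        for node in nodes:
            neighbours = adjacency[node]
            if not neighbours:
                continue  # isolated nodes keep their own label
            counts = Counter(labels[n] for n in neighbours)
            best = max(counts.values())
            candidates = sorted(l for l, c in counts.items() if c == best)
            if labels[node] in candidates:
                continue  # current label is already among the most frequent
            labels[node] = rng.choice(candidates)
            changed = True
        if not changed:
            break  # converged: a full pass made no updates
    return labels
```

Each pass touches every edge a constant number of times, so the cost is proportional to E per iteration, which matches the O(E × iters) figure in the table above.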

Girvan-Newman

  • Leverages existing betweenness centrality implementation
  • Builds full dendrogram via edge removal
  • Warning: O(E² × V) complexity - best for small graphs

Walktrap

  • Random walk distance matrix computation
  • Hierarchical agglomerative clustering
  • Configurable walk length

Infomap

  • PageRank-based flow calculation
  • Map Equation optimization
  • Best for directed, flow-based communities

Clique Percolation

  • Bron-Kerbosch algorithm for maximal cliques
  • Supports overlapping communities
  • Convertible to hard assignments
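
For readers unfamiliar with the clique enumeration step, here is a small Python sketch of Bron-Kerbosch with pivoting. It only lists maximal cliques; the percolation step that joins adjacent cliques into communities is omitted. The function name and adjacency-dict input are hypothetical and do not reflect the yog/community/clique_percolation API.

```python
def maximal_cliques(adjacency):
    """Return all maximal cliques as frozensets (Bron-Kerbosch with pivoting).

    adjacency: dict mapping node -> set of neighbours (undirected, no self-loops).
    """
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates that extend it, x: already explored
        if not p and not x:
            if r:
                cliques.append(frozenset(r))  # nothing can extend r: it is maximal
            return
        # Pivot on the vertex with the most neighbours in p to prune branches.
        pivot = max(p | x, key=lambda v: len(adjacency[v] & p))
        for v in list(p - adjacency[pivot]):
            expand(r | {v}, p & adjacency[v], x & adjacency[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adjacency), set())
    return cliques
```

A node that belongs to more than one returned clique is what allows overlapping communities: in two triangles that share a single node, that node appears in both cliques.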

Metrics

  • Efficient triangle counting using node iterator method
  • Modularity optimized with set operations
  • Clustering coefficient for local and global measures
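
A rough Python sketch of the node iterator idea: for each node, count the pairs of its neighbours that are themselves connected. The function names echo the metrics listed in this PR for readability only; the signatures and the adjacency-dict input are assumptions, not Yog's API.

```python
from itertools import combinations


def triangles_at(adjacency, node):
    """Number of triangles that include `node` (node iterator method)."""
    return sum(
        1 for a, b in combinations(adjacency[node], 2) if b in adjacency[a]
    )


def count_triangles(adjacency):
    """Global triangle count; each triangle is seen once from each of its 3 nodes."""
    return sum(triangles_at(adjacency, n) for n in adjacency) // 3


def clustering_coefficient(adjacency, node):
    """Fraction of neighbour pairs of `node` that are connected to each other."""
    k = len(adjacency[node])
    if k < 2:
        return 0.0  # fewer than two neighbours: no pairs to close
    return 2 * triangles_at(adjacency, node) / (k * (k - 1))
```

Checking every neighbour pair costs on the order of k² per node, which gives the O(V × k²) bound quoted for count_triangles/1.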

Testing

All algorithms have been tested with:

  • Empty graphs
  • Single-node graphs
  • Disconnected graphs
  • Complete graphs
  • Path/star graphs
  • Zachary's Karate Club (where applicable)

Test count: All 1053 existing tests pass.

Performance Considerations

| Algorithm | Max Recommended Size | Notes |
|---|---|---|
| Label Propagation | Millions of nodes | Near-linear time |
| Louvain | Millions of nodes | Fast in practice |
| Leiden | Millions of nodes | Slightly slower than Louvain |
| Infomap | 100K+ nodes | Depends on convergence |
| Walktrap | 10K nodes | O(V²) space requirement |
| Girvan-Newman | 1K nodes | Expensive betweenness recalculation |
| Clique Percolation | 5K nodes | Clique enumeration is costly |

Algorithm Selection Guide

Speed Priority: Label Propagation > Louvain > Leiden

Quality Priority: Leiden > Louvain > Infomap > Walktrap

Hierarchical Structure: Girvan-Newman, Louvain, Leiden, Walktrap

Overlapping Communities: Clique Percolation (only option)

Flow-Based: Infomap

Small Graphs (<1000 nodes): Any algorithm works well

Breaking Changes

None. This is a pure addition to the API.

Migration Guide

N/A - New feature, no migration needed.

Future Enhancements (v6.0.0+)

  • Spectral clustering
  • Stochastic Block Models (SBM)
  • Consensus clustering
  • Dynamic community detection
  • GPU acceleration for large graphs
  • Parallel implementations

Checklist

  • All algorithms implemented
  • Core types and utilities
  • Metrics module
  • Comprehensive documentation
  • Examples for all modules
  • Consistent API pattern
  • Build passes
  • All tests pass (1053)
  • No breaking changes

Target Version: 5.0.0
Estimated Review Time: 30-45 minutes
Suggested Reviewers: Anyone familiar with graph algorithms

code-shoily self-assigned this Mar 19, 2026
code-shoily merged commit 8b8eabf into main Mar 19, 2026
1 check passed
code-shoily deleted the community-experiment branch March 19, 2026 23:43