Semnet efficiently constructs graph structures from embeddings, enabling graph-based analysis and operations over large collections of embedded texts, images, and more.

Semnet: efficient graph structures from embeddings

Embeddings of Guardian headlines represented as a network by Semnet and visualised in Cosmograph

Introduction

Semnet constructs graph structures from embeddings, enabling graph-based analysis and operations over collections of embedded documents.

Semnet uses Annoy, an approximate nearest-neighbour library, to find similar pairs efficiently without exhaustive pairwise distance calculations, allowing for million-embedding network construction in under ten minutes on consumer hardware.
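
To make the cost being avoided concrete, here is a minimal brute-force sketch in plain Python. It is not Semnet's implementation: the function names and the `max_distance` parameter are hypothetical, and it may not match how Semnet's own `thresh` is interpreted. It compares every pair of vectors, which scales quadratically and is the work an approximate nearest-neighbour index such as Annoy is designed to sidestep.

```python
import math
from itertools import combinations


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def brute_force_edges(vectors, max_distance):
    """Compare every pair of vectors and keep pairs within max_distance.

    Hypothetical helper for illustration only. The loop is O(n^2) in the
    number of vectors, so it becomes impractical for large collections.
    """
    edges = []
    for (i, a), (j, b) in combinations(enumerate(vectors), 2):
        distance = cosine_distance(a, b)
        if distance <= max_distance:
            edges.append((i, j, distance))
    return edges


# Toy 2-D "embeddings": the first two point in nearly the same direction.
toy_vectors = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0)]
toy_edges = brute_force_edges(toy_vectors, max_distance=0.1)
print(toy_edges)  # a single edge linking items 0 and 1
```

With a million embeddings this loop would need roughly half a trillion comparisons, which is why an index that only inspects likely neighbours matters at that scale.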

Graphs are returned as NetworkX objects, opening up a wide range of algorithms for downstream use.

The name "Semnet" derives from semantic network [1], as it was initially designed for an NLP use-case, but the tool will work well with any form of embedded document (e.g., images, audio, or even graphs).

Semnet may be used for:

  • Graph algorithms: enrich your data with communities, centrality and much more for downstream use in search, RAG and context engineering
  • Deduplication: remove duplicate records (e.g., "Donald Trump", "Donald J. Trump") from datasets
  • Exploratory data analysis and visualisation: Cosmograph works brilliantly for large corpora
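
As a rough illustration of the first two uses, the sketch below works on a small hand-built NetworkX graph standing in for Semnet output. The `name` attribute, the edge weights and the "keep the lowest node id" rule are assumptions made for this example, not Semnet's actual node schema or a built-in deduplication routine.

```python
import networkx as nx

# Hand-built stand-in for a graph returned by Semnet. The "name" attribute
# and the edge weights are invented for this example.
G = nx.Graph()
G.add_nodes_from([
    (0, {"name": "Donald Trump"}),
    (1, {"name": "Donald J. Trump"}),
    (2, {"name": "Joe Biden"}),
    (3, {"name": "Joseph Biden"}),
    (4, {"name": "Angela Merkel"}),
])
G.add_edge(0, 1, weight=0.95)
G.add_edge(2, 3, weight=0.90)

# Deduplication: treat each connected component as one entity and keep
# the lowest node id as its representative.
representative = {}
for component in nx.connected_components(G):
    keep = min(component)
    for node in component:
        representative[node] = keep

deduplicated = sorted({G.nodes[keep]["name"] for keep in representative.values()})
print(deduplicated)

# Enrichment: attach a centrality score to every node for downstream use.
nx.set_node_attributes(G, nx.degree_centrality(G), "degree_centrality")
```

Connected components are the simplest grouping rule; on denser graphs a community detection algorithm from NetworkX may give more useful clusters.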

Exposing the full NetworkX and Annoy APIs, Semnet offers plenty of opportunity for experimentation depending on your use-case.

Check out the launch blog for more about Semnet and the examples for inspiration.

Installation

pip install semnet

Quick Start

from semnet import SemanticNetwork
from sentence_transformers import SentenceTransformer

# Your documents
docs = [
    "The cat sat on the mat",
    "A cat was sitting on a mat",
    "The dog ran in the park",
    "I love Python",
    "Python is a great programming language",
]

# Generate embeddings (use any embedding provider)
embedding_model = SentenceTransformer("BAAI/bge-base-en-v1.5")
embeddings = embedding_model.encode(docs)

# Create and configure semantic network
sem = SemanticNetwork(thresh=0.3, verbose=True)  # Larger values give sparser networks

# Build a NetworkX graph object from your embeddings
G = sem.fit_transform(embeddings, labels=docs)

# Export to pandas using the standalone function
from semnet import to_pandas
nodes, edges = to_pandas(G)

Requirements

  • Python 3.8+
  • networkx
  • annoy
  • numpy
  • pandas
  • tqdm

Recommended for examples:

  • sentence-transformers
  • cosmograph

Project origin

I love network analysis, and have explored embedding-derived semantic networks in the past as an alternative approach to representing, clustering and querying news data.

Semnet started life as a few functions I'd been using for deduplication and disambiguation of structured output from LLMs. I could see a number of potential uses for my code, so I decided to package it up for others to use.

Statement on the use of AI

I kicked off the project by hand-refactoring my initial code into the class-based structure that forms the core functionality of the current module.

I then used GitHub Copilot in VS Code to:

  • Bootstrap scaffolding, tests, documentation, examples and typing
  • Refactor the core methods in the style of the scikit-learn API
  • Add additional functionality, e.g., the ability to pass custom data to nodes
  • Walk me through deployment to Read the Docs and PyPI

Roadmap

Semnet is a relatively simple project focused on core graph construction functionality. I don't have many immediate plans to expand it, but I can see the potential for a few future additions:

  • Performance optimizations for very large datasets
  • Utilities for deduplication, as that's my main use case
  • Integration with graph visualization tools

License

MIT License

Citation

If you use Semnet in academic work, please cite:

@software{semnet,
  title={Semnet: Semantic Networks from Embeddings},
  author={Ian Goodrich},
  year={2025},
  url={https://github.com/specialprocedures/semnet}
}

Footnotes

  1. Technically speaking, a Semantic Similarity Network (SSN).
