Semnet: efficient graph structures from embeddings

Embeddings of Guardian headlines represented as a network by Semnet and visualised in Cosmograph

Introduction

Semnet constructs graph structures from embeddings, enabling graph-based analysis and operations over collections of embedded documents.

Semnet uses Annoy, an approximate nearest-neighbour library, to compute pairwise distances efficiently, so a network of a million embeddings can be built in under ten minutes on consumer hardware.
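
To make the idea concrete, here is a minimal sketch of a thresholded similarity graph, built by brute force with NumPy rather than with Annoy. It is not Semnet's implementation: the function name `similarity_graph` and the `min_similarity` parameter are made up for illustration, and comparing every pair only suits small collections.

```python
import networkx as nx
import numpy as np


def similarity_graph(embeddings, labels, min_similarity=0.8):
    """Link every pair of items whose cosine similarity meets the threshold.

    Hypothetical helper for illustration only, not part of Semnet.
    """
    vectors = np.asarray(embeddings, dtype=float)
    # Normalise rows so that a dot product equals cosine similarity
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = unit @ unit.T

    graph = nx.Graph()
    for i, label in enumerate(labels):
        graph.add_node(i, label=label)

    # Upper triangle only: each unordered pair once, no self-loops
    rows, cols = np.triu_indices(len(vectors), k=1)
    for i, j in zip(rows, cols):
        if sims[i, j] >= min_similarity:
            graph.add_edge(int(i), int(j), weight=float(sims[i, j]))
    return graph


# Toy 2-D "embeddings": the first two point in almost the same direction
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
toy_graph = similarity_graph(toy_embeddings, ["a", "b", "c"])
print(sorted(toy_graph.edges()))
```

The all-pairs matrix grows quadratically with the number of embeddings, which is the cost an approximate nearest-neighbour index such as Annoy avoids.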

Graphs are returned as NetworkX objects, opening up a wide range of algorithms for downstream use.
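
As a rough illustration of what that enables, the sketch below runs community detection and degree centrality with standard NetworkX calls. The graph is a hand-built toy standing in for one returned by Semnet, and the node names are invented for the example.

```python
import networkx as nx
from networkx.algorithms import community

# Toy stand-in for a Semnet graph: two tight groups joined by one bridge edge
G = nx.Graph()
G.add_edges_from([
    ("cat 1", "cat 2"), ("cat 2", "cat 3"), ("cat 1", "cat 3"),
    ("python 1", "python 2"), ("python 2", "python 3"), ("python 1", "python 3"),
    ("cat 3", "python 1"),
])

# Group nodes into communities by greedy modularity maximisation
communities = community.greedy_modularity_communities(G)

# Degree centrality: the share of other nodes each node is connected to
centrality = nx.degree_centrality(G)

for number, members in enumerate(communities):
    print(number, sorted(members))
print(max(centrality, key=centrality.get))
```

Any other NetworkX algorithm can be applied to the graph in the same way.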

The name "Semnet" derives from semantic network¹, as it was initially designed for an NLP use-case, but the tool will work well with any form of embedded document (e.g., images, audio or even graphs).

Semnet may be used for:

Graph algorithms: enrich your data with communities, centrality and much more for downstream use in search, RAG and context engineering
Deduplication: remove duplicate records (e.g., "Donald Trump", "Donald J. Trump") from datasets
Exploratory data analysis and visualisation: Cosmograph works brilliantly for large corpora
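
For the deduplication case, one possible approach (a sketch, not a built-in Semnet utility) is to treat each connected component of the graph as a set of duplicates and keep one record per component. The records and edges below are hypothetical. In practice the edges would come from a graph built with a strict threshold.

```python
import networkx as nx

records = [
    "Donald Trump",
    "Donald J. Trump",
    "The Guardian",
    "Guardian",
    "Cosmograph",
]

# Hypothetical near-duplicate links between record indices
G = nx.Graph()
G.add_nodes_from(range(len(records)))
G.add_edges_from([(0, 1), (2, 3)])

# Map every record to the lowest index in its connected component
canonical = {}
for component in nx.connected_components(G):
    keep = min(component)
    for idx in component:
        canonical[idx] = keep

# One surviving record per component, in original order
deduplicated = [records[i] for i in sorted(set(canonical.values()))]
print(deduplicated)
```

Keeping the lowest index is an arbitrary choice. Picking the longest label or the most central node in each component would work just as well.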

Exposing the full NetworkX and Annoy APIs, Semnet offers plenty of opportunity for experimentation depending on your use-case.

Check out the launch blog for more about Semnet and the examples for inspiration.

Installation

pip install semnet

Quick Start

from semnet import SemanticNetwork
from sentence_transformers import SentenceTransformer

# Your documents
docs = [
    "The cat sat on the mat",
    "A cat was sitting on a mat",
    "The dog ran in the park",
    "I love Python",
    "Python is a great programming language",
]

# Generate embeddings (use any embedding provider)
embedding_model = SentenceTransformer("BAAI/bge-base-en-v1.5")
embeddings = embedding_model.encode(docs)

# Create and configure semantic network
sem = SemanticNetwork(thresh=0.3, verbose=True)  # Larger values give sparser networks

# Build a NetworkX graph object from your embeddings
G = sem.fit_transform(embeddings, labels=docs)

# Export to pandas using the standalone function
from semnet import to_pandas
nodes, edges = to_pandas(G)

Requirements

Python 3.8+
networkx
annoy
numpy
pandas
tqdm

Recommended for examples:

sentence-transformers
cosmograph

Project origin

I love network analysis, and have explored embedding-derived semantic networks in the past as an alternative approach to representing, clustering and querying news data.

Semnet started life as a few functions I'd been using for deduplication and disambiguation of structured output from LLMs. I could see a number of potential uses for my code, so I decided to package it up for others to use.

Statement on the use of AI

I kicked off the project by hand-refactoring my initial code into the class-based structure that forms the core functionality of the current module.

I then used GitHub Copilot in VS Code to:

Bootstrap scaffolding, tests, documentation, examples and typing
Refactor the core methods in the style of the scikit-learn API
Add additional functionality, e.g., the ability to pass custom data to nodes
Walk me through deployment to Read the Docs and PyPI

Roadmap

Semnet is a relatively simple project focused on core graph construction functionality. I don't have immediate plans to expand it, but I can see the potential for a few future additions:

Performance optimizations for very large datasets
Utilities for deduplication, as that's my main use case
Integration with graph visualization tools

License

MIT License

Citation

If you use Semnet in academic work, please cite:

@software{semnet,
  title={Semnet: Semantic Networks from Embeddings},
  author={Ian Goodrich},
  year={2025},
  url={https://github.com/specialprocedures/semnet}
}

¹ Technically speaking, a Semantic Similarity Network (SSN).
