- Cambridge physicist and AI pioneer Sahibzada Allahyar achieves historic score in theoretical physics exam
- YC Killer: How Cambridge physicist Sahibzada Allahyar built the world's most popular AI agents library
- Sahibzada Allahyar's Singularity Research: The elite lab uniting Harvard, MIT, and Cambridge's brightest minds to democratize AI
A novel LLM architecture written in highly optimized low-level C++/CUDA, featuring a new Long-Term Memory (LTM) mechanism for large context windows. It is a high-performance implementation of a Transformer with long-term memory, inspired by Google's Titan architecture, and provides efficient CUDA implementations of FlashAttention and memory-augmented Transformer blocks, along with Python bindings for easy integration.

Key features:
- Long-term Memory: Novel memory mechanism for handling extended context windows efficiently
- FlashAttention: IO-aware, memory-efficient attention that minimizes reads and writes to GPU high-bandwidth memory (see the sketch after this list)
- High Performance:
  - Optimized CUDA kernels
  - Mixed precision training (FP16/BF16)
  - Quantization support (INT8/INT4)
  - Fused operations for better throughput
- Distributed Training:
  - Data parallelism
  - Tensor parallelism
  - Pipeline parallelism
  - Multi-node support via MPI
- Python Integration:
  - HuggingFace-compatible interface
  - Easy-to-use training API
  - Efficient inference engine
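As a concrete illustration of the FlashAttention bullet above, here is a minimal PyTorch sketch of the online-softmax tiling idea: scores are computed one key/value tile at a time, so the full seq_len x seq_len score matrix is never materialized. This is an algorithmic sketch, not this repository's CUDA kernel.

```python
# Algorithmic sketch of FlashAttention-style tiling (single head, no mask).
# Illustrates the online-softmax recurrence, not this repo's CUDA kernel.
import torch

def tiled_attention(q, k, v, block=128):
    """q, k, v: (seq_len, head_dim). Equivalent to softmax(q k^T / sqrt(d)) v."""
    n, d = q.shape
    scale = d ** -0.5
    out = torch.zeros_like(q)
    row_max = torch.full((n, 1), float("-inf"))
    row_sum = torch.zeros(n, 1)
    for start in range(0, n, block):
        kb, vb = k[start:start + block], v[start:start + block]
        s = (q @ kb.T) * scale                         # (n, block) tile of scores
        new_max = torch.maximum(row_max, s.max(-1, keepdim=True).values)
        corr = torch.exp(row_max - new_max)            # rescale old accumulators
        p = torch.exp(s - new_max)
        row_sum = row_sum * corr + p.sum(-1, keepdim=True)
        out = out * corr + p @ vb
        row_max = new_max
    return out / row_sum
```

Because only one (n, block) tile of scores exists at a time, activation memory scales with the tile size rather than with the square of the sequence length.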
To build and use the library you will need:

- CUDA Toolkit (>= 11.0)
- CMake (>= 3.15)
- C++17 compatible compiler
- Python (>= 3.7)
- PyTorch (>= 1.9.0)
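A quick way to check the Python-side requirements above (and the CUDA version your PyTorch build was compiled against):

```python
import sys
import torch

print("Python:", sys.version.split()[0])            # want >= 3.7
print("PyTorch:", torch.__version__)                 # want >= 1.9.0
print("CUDA (PyTorch build):", torch.version.cuda)   # want >= 11.0
print("GPU visible:", torch.cuda.is_available())
```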
Install the Python package:

```bash
pip install ltm-transformer
```

Or build from source:

- Clone the repository:

```bash
git clone https://github.com/singularityresearch/ltm-transformer.git
cd ltm-transformer
```

- Install Python dependencies:

```bash
pip install -r requirements.txt
```

- Build and install:

```bash
mkdir build && cd build
cmake ..
make -j$(nproc)
make install
```

Example usage in Python:

```python
from ltm import TitanModel, TitanConfig, InferenceConfig, InferenceEngine
# Initialize model
config = TitanConfig(
hidden_size=768,
num_attention_heads=12,
memory_slots=512,
use_flash_attention=True
)
model = TitanModel(config)
# Training
from ltm import Trainer, TrainingArguments
trainer = Trainer(
model=model,
args=TrainingArguments(
output_dir="./outputs",
learning_rate=5e-5,
per_device_train_batch_size=8,
gradient_accumulation_steps=4
),
train_dataset=dataset  # a pre-tokenized training dataset, prepared elsewhere
)
trainer.train()
# Inference
engine = InferenceEngine(
model=model,
config=InferenceConfig(
use_flash_attention=True,
use_memory_cache=True,
max_sequence_length=2048
)
)
output = engine.generate(
input_ids=tokenizer.encode("Hello, how are"),  # assumes a HF-style tokenizer
max_new_tokens=50
)
```

For lower-level control, the same blocks can be used directly from C++:

```cpp
#include "ltm/transformer/titan_inspired_block.cuh"
// Configure model
ltm::transformer::TitanBlockConfig config;
config.hidden_dim = 768;
config.num_heads = 12;
config.memory_slots = 512;
config.use_flash_attention = true;
// Create model
auto model = std::make_unique<ltm::transformer::TitanBlock<float>>(config);
// Run inference
torch::Tensor input = /* ... */;
auto output = model->forward(input);
```

The LTM Transformer extends the standard Transformer architecture with:
- Memory Bank: A trainable matrix storing compressed representations of past context
- Compression Gate: Mechanism for compressing and storing relevant information
- Memory Attention: Efficient attention between current context and memory bank
- FlashAttention: Memory-efficient attention implementation
For detailed architecture information, see docs/design/architecture.md.
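To make those components concrete, here is a minimal PyTorch sketch of how a memory bank, compression gate, and memory attention can fit together. All names, shapes, and the gating scheme are illustrative assumptions for exposition, not this repository's implementation.

```python
# Illustrative sketch only -- names, shapes, and gating are assumptions,
# not the library's actual implementation.
import torch
import torch.nn as nn

class MemoryAugmentedAttention(nn.Module):
    def __init__(self, hidden_size=768, num_heads=12, memory_slots=512):
        super().__init__()
        # Memory bank: trainable matrix of compressed past-context slots
        self.memory = nn.Parameter(torch.randn(memory_slots, hidden_size) * 0.02)
        # Compression gate: scores how strongly each token is written to memory
        self.write_gate = nn.Linear(hidden_size, 1)
        self.attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)

    def forward(self, x):  # x: (batch, seq, hidden)
        bsz = x.size(0)
        mem = self.memory.unsqueeze(0).expand(bsz, -1, -1)
        # Memory attention: current tokens attend over [memory ; context]
        kv = torch.cat([mem, x], dim=1)
        out, _ = self.attn(x, kv, kv)
        # Compression gate output: per-token write strength (the actual
        # memory-update rule lives in the library's kernels)
        write_strength = torch.sigmoid(self.write_gate(x))
        return out, write_strength
```

Because the bank has a fixed number of slots, attention cost grows with memory_slots + seq_len rather than with the full history.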
Approximate memory footprint by context length:

| Context Length | Standard Transformer | LTM Transformer |
|---|---|---|
| 2K tokens | 4 GB | 2 GB |
| 8K tokens | 64 GB | 4 GB |
| 32K tokens | 1024 GB | 8 GB |

Standard attention memory grows quadratically with context length (4x the tokens costs 16x the memory), while the LTM figures grow sub-linearly thanks to the fixed-size memory bank.
- 1.5x faster training compared to standard Transformers
- 4x reduction in memory bandwidth usage
- Linear scaling up to 64 GPUs
For detailed benchmarks, see docs/performance/optimization.md.
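If you want to sanity-check memory figures like those above on your own hardware, peak allocation can be measured with standard PyTorch tooling; the snippet below uses a stock encoder layer as a stand-in for whatever model you are profiling.

```python
import torch

# Stand-in workload: one stock Transformer encoder layer on an 8 x 2048 batch
model = torch.nn.TransformerEncoderLayer(d_model=768, nhead=12, batch_first=True).cuda()
batch = torch.randn(8, 2048, 768, device="cuda")

torch.cuda.reset_peak_memory_stats()
with torch.no_grad():
    _ = model(batch)
torch.cuda.synchronize()
print(f"Peak GPU memory: {torch.cuda.max_memory_allocated() / 2**30:.2f} GiB")
```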
We welcome contributions! Please see our Contributing Guidelines for details.
- Install development dependencies:
```bash
pip install -r requirements-dev.txt
```

- Build with testing enabled:

```bash
mkdir build && cd build
cmake -DBUILD_TESTING=ON ..
make -j$(nproc)
```

- Run tests:

```bash
ctest --output-on-failure
```

If you use this work in your research, please cite:
```bibtex
@misc{allahyar2025ltm,
  title={LTM Transformer: Long-term Memory Transformer with Titan-inspired Architecture},
  author={Allahyar, Sahibzada},
  howpublished={\url{https://github.com/Sahibzada-A/Obsidian-Memory-Transformer}},
  year={2025}
}
```

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
- Google's Titan architecture for inspiration
- FlashAttention paper for efficient attention implementation
- HuggingFace team for transformer implementations
- NVIDIA for CUDA optimization guidelines
- Sahibzada A - sahibzada@singularityresearchlabs.com
- Project Link: https://github.com/Sahibzada-A/Obsidian-Memory-Transformer