BrainAPI uses a modular adapter architecture that allows you to plug in different drivers for each component:
| Adapter | File | Purpose |
|---|---|---|
| DataORMAdapter | /adapters/data.py | Textual information storage |
| EmbeddingsAdapter | /adapters/vectors.py | Vector embeddings storage |
| GraphDBAdapter | /adapters/graph.py | Graph database operations |
| CacheAdapter | /adapters/cache.py | High-speed caching |
| LLMProviderAdapter | /adapters/llm.py | Large Language Model integration |
| PromptsAdapter | /adapters/prompts.py | Prompt injection and management |
The adapter pattern enables:
- Easy Driver Swapping: Switch between different databases, vector stores, or LLM providers
- Testing: Mock adapters for unit testing
- Flexibility: Mix and match different technologies based on your needs
- Maintainability: Clean separation of concerns
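As a minimal sketch of why this helps (the class and function names here are illustrative, not BrainAPI's actual interfaces), calling code that depends only on an abstract adapter can receive a real driver in production and an in-memory mock in tests without any changes:

```python
# Illustrative sketch of the adapter pattern -- names are hypothetical,
# not BrainAPI's actual interfaces.
from abc import ABC, abstractmethod


class AbstractCacheAdapter(ABC):
    @abstractmethod
    def get(self, key: str): ...

    @abstractmethod
    def set(self, key: str, value) -> None: ...


class InMemoryCacheAdapter(AbstractCacheAdapter):
    """Mock driver for unit testing -- no Redis required."""

    def __init__(self):
        self._store = {}

    def get(self, key: str):
        return self._store.get(key)

    def set(self, key: str, value) -> None:
        self._store[key] = value


def warm_up(cache: AbstractCacheAdapter) -> None:
    # Calling code depends only on the abstract interface,
    # so any conforming driver can be plugged in.
    cache.set("greeting", "hello")
```

In production the same `warm_up` call would receive a Redis-backed adapter instead of the in-memory mock.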
The data is processed and added to the brain with the following steps:
- Chunking - Break down input text into manageable pieces
- Ensure Memory Existence - Verify memory storage is available
- Save Chunks (can be concurrent) - Store text chunks in database
- Extract Facts & Observations (can be concurrent) - Identify key information
- Embed Facts (can be concurrent) - Generate vector embeddings for facts
- Extract Language - Detect the language of the content
- Save Vector - Store vector embeddings in vector database
- Get Relationship Extractor - Select appropriate extractor based on language
- Content Type Extraction - Determine the type of content being processed
- Retrieve Relevant Memories - Find related information from existing memories
- Resolve Coreferences - Link pronouns and references to their entities
- Extract Relationships with LLM - Use language model to identify entity relationships
- Wikification - Link entities to Wikipedia or knowledge base entries
- Save Triplets - Store subject-predicate-object relationships in graph database
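The "can be concurrent" steps above fan out under semaphore limits. A simplified sketch of that flow (all function names here are placeholders, not BrainAPI's actual pipeline):

```python
# Simplified sketch of the ingestion flow -- all names are placeholders.
import asyncio


def chunk(text: str, size: int = 20) -> list[str]:
    """Chunking: break input text into manageable pieces."""
    return [text[i:i + size] for i in range(0, len(text), size)]


async def embed(chunk_text: str, sem: asyncio.Semaphore) -> list[float]:
    """Stand-in for the 'Embed Facts' step; the semaphore caps concurrency."""
    async with sem:
        await asyncio.sleep(0)  # placeholder for a real embedding call
        return [float(len(chunk_text))]


async def ingest(text: str) -> list[list[float]]:
    sem = asyncio.Semaphore(4)  # bounded, like the limits in brainapi/config.py
    chunks = chunk(text)
    # the concurrent steps run together under the semaphore
    return await asyncio.gather(*(embed(c, sem) for c in chunks))


vectors = asyncio.run(ingest("some input text to ingest into the brain"))
```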
BrainAPI uses Celery with Redis to manage the tasks and queues during injection and writing. Make sure you have a Redis instance running and accessible before starting the project with the `make dev-custom` command.
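A minimal sketch of such a Celery-with-Redis setup (the app name and URLs are assumptions for illustration, not BrainAPI's actual wiring):

```python
# Hypothetical configuration fragment -- adapt the URLs to your Redis instance.
from celery import Celery

celery_app = Celery(
    "brainapi",
    broker="redis://localhost:6379/0",   # task queue lives in Redis
    backend="redis://localhost:6379/1",  # task results stored in Redis too
)
```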
Concurrent operations are throttled with semaphores. You can configure the limits in the `brainapi/config.py` file; the default settings are:

```python
class Config:
    class ConcurrencyConfig:
        def __init__(self):
            self.fastcoref_semaphore = 100
            self.azure_llm_large_semaphore = 1000
            self.triplet_extraction_llm_semaphore = 1000
            self.embedding_llm_semaphore = 1000
            self.coref_model_semaphore = 100
            self.ce_tokenizer_semaphore = 30
```

Below you'll find everything you need to have your BrainAPI instance running and working with your own databases, locally or hosted in the cloud.
Before starting the project, make sure you've installed the Python packages (managed with Poetry: `poetry install`) and the spaCy dependencies with the `make download-spacy` command; these are used for named entity recognition and coreference resolution.
Before starting the project you'll also need to add your own implementations of the adapters; the following instructions will help you do that.
This adapter is responsible for the storage of textual information. Inside the `brainapi/server/adapters/data.py` file, you'll need to implement an instance of a class that inherits from the `AbstractDataORMAdapter` class and implements the methods defined in the interface.
You'll need to return the driver in place of the commented line and delete the `raise NotImplementedError(...)` line:
```python
@classmethod
def get_async_driver(cls):
    """Get a fresh async driver instance"""
    # return your implementation of the async driver instance here
    raise NotImplementedError("You need to implement the get_async_driver method")
```

This adapter is responsible for the creation, storage, and retrieval of vector embeddings. BrainAPI uses two adapters to handle two different vector dimensions; it's your choice to implement a single vector store and embeddings generator or two separate ones.
You'll need to implement two instances of two different classes: one for the embeddings generator (inheriting from `EmbeddingEncoderProvider`) and one for the vector store (inheriting from `EmbeddingDBProvider`).
```python
# Example
class EmbeddingEncoderDriver(EmbeddingEncoderProvider):
    def encode(self, text: str) -> list[float]:
        # return your implementation of the encode method here
        raise NotImplementedError("You need to implement the encode method")

embeddings_encoder_driver = EmbeddingEncoderDriver()

class EmbeddingDBDriver(EmbeddingDBProvider):
    def upsert(self, vectors: list[Vector], namespace: str) -> None:
        # return your implementation of the upsert method here
        raise NotImplementedError("You need to implement the upsert method")
    ...

embeddings_db_driver = EmbeddingDBDriver()
```

And implement them in the `brainapi/server/adapters/vectors.py` file like this:
```python
embeddings_adapter = EmbeddingsAdapter(
    embeddings_encoder_driver,
    embeddings_db_driver
)

embeddings_adapter_nodes = EmbeddingsAdapter(
    embeddings_encoder_driver,  # or another node-specific instance of the EmbeddingEncoderDriver class
    embeddings_db_driver  # or another node-specific instance of the EmbeddingDBDriver class
)
```

This adapter is responsible for graph management; write, edit, and retrieve operations on the knowledge graph are done through this class. You'll need to choose a graph database and create a class that implements the operations of the abstract `GraphDB` class inside `brainapi/server/adapters/graph.py`.
```python
def get_graph_adapter():
    # from brainapi.server.lib.your_graph_db_driver_class import graphdb_client
    # return GraphAdapter(graphdb_client)
    raise NotImplementedError("You need to implement a graphdb client/driver")
```

This adapter is responsible for managing the cache operations. You'll need to implement a class that inherits from the `CacheDriver` class inside `brainapi/server/adapters/cache.py` and implements the methods defined in the interface.
```python
def _get_client(self):
    """Get cache client for current event loop"""
    # return your cache client instance here
    raise NotImplementedError("You need to implement the _get_client method")
```

This is the simplest adapter to implement: you just need to create a class that inherits from the `LLM` class inside `brainapi/server/interfaces.py` and implements the methods defined in the interface.
```python
class LLMAdapter:
    def __init__(self, llm: LLM):
        self.llm = llm  # your LLM instance here

    async def generate_text(self, prompt: str, max_new_tokens: int = None) -> str:
        return await self.llm.generate_text(prompt, max_new_tokens)

llm_adapter = LLMAdapter(_llm_large)
```

This adapter is responsible for managing the prompts used to interact with the LLM. You'll need to implement a class that inherits from the `PromptsAdapter` class inside `brainapi/server/adapters/prompts.py` and implements the methods defined in the interface.
Create a class that is responsible for retrieving and parsing the prompts, and make sure that the LLM responses return the correct type of data based on the registered result types.
```python
prompts_adapter = PromptsAdapter(your_prompts_provider_class)

prompts_adapter.register_type(
    "relationship_extractor",  # the prompt with this key
    RelationshipExtractedResult  # will return this data type
)
```

You can use the `brainapi/config.py` file to set the configuration settings for the project. The file is already populated with the default settings for the development environment; change them to fit your needs.
The convention in the project is to instantiate sub-classes inside the `Config` class, one for each group of configuration settings. You can add your own settings by creating a new sub-class and assigning the settings in its `__init__` method, raising an error if any required setting is not set.
```python
class Config:
    def __init__(self):
        self.redis = self.RedisConfig()

    class RedisConfig:
        def __init__(self):
            self.host = os.getenv("REDIS_HOST")
            self.port = os.getenv("REDIS_PORT")
            self.db = os.getenv("REDIS_DB")
            if self.host is None or self.port is None or self.db is None:
                raise ValueError(
                    "[Config:RedisConfig] REDIS_HOST, REDIS_PORT, and REDIS_DB must be set"
                )

    ...
```
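Following the same convention, a hypothetical sub-config for a new setting might look like this (the class and environment variable names are illustrative, not part of BrainAPI):

```python
# Illustrative sketch of adding a custom sub-config -- names are hypothetical.
import os


class Config:
    def __init__(self):
        self.my_service = self.MyServiceConfig()

    class MyServiceConfig:
        """Follows the same pattern as RedisConfig above."""

        def __init__(self):
            self.api_key = os.getenv("MY_SERVICE_API_KEY")
            if self.api_key is None:
                raise ValueError(
                    "[Config:MyServiceConfig] MY_SERVICE_API_KEY must be set"
                )
```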