OpenBioCure Ingestion Core is the foundational library for data ingestion and processing in the OpenBioCure platform. It provides enterprise-grade infrastructure components, configuration management, logging utilities, database session handling, and the repository pattern for building robust data ingestion workflows.
- Dependency Injection - Service registration and resolution
- Repository Pattern - Type-safe entity operations with SQLAlchemy
- Specification Pattern - Fluent query filtering and composition
- Async Support - Full async/await patterns throughout
- Type Safety - Generic interfaces with Python typing
- Configuration Management - YAML with dataclass validation
- Auto-discovery Startup System - Ordered initialization with configuration
- Structured Logging - Consistent format across components
- Database Integration - SQLAlchemy async support with schema management
# Install from source
git clone https://github.com/openbiocure/obc-ingestion-core.git
cd obc-ingestion-core
pip install -e .
# Or install directly from GitHub with pip
pip install git+https://github.com/openbiocure/obc-ingestion-core.git
# Or install from source with development dependencies
git clone https://github.com/openbiocure/obc-ingestion-core.git
cd obc-ingestion-core
pip install -e ".[dev]"
import asyncio
from obc_ingestion_core import engine, IRepository, Repository, BaseEntity

async def main() -> None:
    # Initialize and start the engine
    engine.initialize()
    await engine.start()
    # Resolve services (MyEntity and MySpecification stand for your own classes)
    my_repo = engine.resolve(IRepository[MyEntity])
    # Use the repository
    entity = await my_repo.create(title="My Entity")
    entities = await my_repo.find(MySpecification())

asyncio.run(main())
from obc_ingestion_core import engine, YamlConfig, AppConfig
# Access YAML configuration
config = engine.resolve(YamlConfig)
db_host = config.get('database.host')
# Access typed configuration
app_config = engine.resolve(AppConfig)
model_provider = app_config.default_model_provider
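To make the dotted access in `config.get('database.host')` concrete, here is a minimal standalone sketch of how a dot-separated key can be resolved against nested YAML data. The `get_dotted` helper and the `settings` dictionary are hypothetical names used only for illustration; the library's own `YamlConfig` may resolve keys differently.

```python
from collections.abc import Mapping
from typing import Any


def get_dotted(data: Mapping, key: str, default: Any = None) -> Any:
    """Walk a nested mapping by following a dot-separated key (illustrative helper)."""
    current: Any = data
    for part in key.split("."):
        # Stop as soon as the path leaves the nested mappings.
        if not isinstance(current, Mapping) or part not in current:
            return default
        current = current[part]
    return current


# Hypothetical data, shaped like a parsed YAML document.
settings = {"database": {"host": "localhost", "port": 5432}}

db_host = get_dotted(settings, "database.host")
missing = get_dotted(settings, "database.user", "anonymous")
```

Returning a default instead of raising keeps lookups for optional keys short; a stricter variant could raise `KeyError` for required settings.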
The central orchestrator managing application lifecycle and services:
from obc_ingestion_core import engine
# Initialize the engine
engine.initialize()
# Register services
engine.register(IMyService, MyService)
# Resolve services
my_service = engine.resolve(IMyService)
# Start the application (await needs an async context, e.g. a coroutine run with asyncio.run)
await engine.start()
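The register/resolve flow above can be sketched with a small standalone container. This is not the engine's implementation: `ServiceContainer`, `IGreeter`, and `Greeter` are hypothetical names, and the singleton lifetime (one instance per interface, created on first resolve) is an assumption made for the example.

```python
from typing import Any, Callable, Dict, Protocol


class ServiceContainer:
    """Illustrative container mapping an interface type to a lazily built singleton."""

    def __init__(self) -> None:
        self._factories: Dict[type, Callable[[], Any]] = {}
        self._instances: Dict[type, Any] = {}

    def register(self, interface: type, implementation: Callable[[], Any]) -> None:
        self._factories[interface] = implementation
        # Drop any cached instance so a re-registration takes effect.
        self._instances.pop(interface, None)

    def resolve(self, interface: type) -> Any:
        if interface not in self._instances:
            if interface not in self._factories:
                raise KeyError(f"No service registered for {interface.__name__}")
            self._instances[interface] = self._factories[interface]()
        return self._instances[interface]


class IGreeter(Protocol):
    def greet(self, name: str) -> str: ...


class Greeter:
    def greet(self, name: str) -> str:
        return f"Hello, {name}!"


container = ServiceContainer()
container.register(IGreeter, Greeter)
greeter = container.resolve(IGreeter)
message = greeter.greet("OpenBioCure")
```

Keying the registry on the interface type lets calling code depend only on the `Protocol`, which is the same separation that `engine.register(IMyService, MyService)` expresses.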
Type-safe data access with SQLAlchemy integration:
from typing import Protocol

from sqlalchemy.orm import Mapped, mapped_column

from obc_ingestion_core import engine, IRepository, Repository, BaseEntity

class Todo(BaseEntity):
    __tablename__ = "todos"
    title: Mapped[str] = mapped_column(nullable=False)
    completed: Mapped[bool] = mapped_column(default=False)

class ITodoRepository(IRepository[Todo], Protocol):
    pass

class TodoRepository(Repository[Todo]):
    pass

# Auto-registered by engine; run the awaited call inside an async function
todo_repo = engine.resolve(ITodoRepository)
todo = await todo_repo.create(title="Learn CoreLib", completed=False)
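For readers who want to see the shape of a generic repository without a database, the following in-memory sketch mirrors the `create` and `find` calls used above. It is an assumption-laden stand-in: `InMemoryRepository` and `TodoRecord` are hypothetical, and filtering uses a plain Python predicate rather than SQLAlchemy.

```python
import asyncio
from dataclasses import dataclass
from typing import Callable, Dict, Generic, List, Optional, Type, TypeVar

T = TypeVar("T")


@dataclass
class TodoRecord:
    """Hypothetical plain-Python stand-in for a mapped entity."""
    title: str
    completed: bool = False
    id: Optional[int] = None


class InMemoryRepository(Generic[T]):
    """Illustrative repository that stores entities in a dict keyed by id."""

    def __init__(self, entity_type: Type[T]) -> None:
        self._entity_type = entity_type
        self._rows: Dict[int, T] = {}
        self._next_id = 1

    async def create(self, **fields) -> T:
        entity = self._entity_type(**fields)
        entity.id = self._next_id  # assign a surrogate key
        self._rows[self._next_id] = entity
        self._next_id += 1
        return entity

    async def find(self, predicate: Callable[[T], bool]) -> List[T]:
        return [row for row in self._rows.values() if predicate(row)]


async def demo() -> List[str]:
    repo = InMemoryRepository(TodoRecord)
    await repo.create(title="Learn CoreLib", completed=False)
    await repo.create(title="Write tests", completed=True)
    done = await repo.find(lambda todo: todo.completed)
    return [todo.title for todo in done]


completed_titles = asyncio.run(demo())
```

A stand-in like this can also be handy in unit tests, where code written against a repository interface should not need a real database.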
Encapsulate query logic in reusable objects:
from obc_ingestion_core import Specification

class CompletedTodoSpecification(Specification[Todo]):
    def to_expression(self):
        # Builds a SQL expression, so "== True" is intentional here
        return Todo.completed == True  # noqa: E712

class TitleContainsSpecification(Specification[Todo]):
    def __init__(self, text: str):
        self.text = text

    def to_expression(self):
        return Todo.title.contains(self.text)
# Usage
completed_todos = await todo_repo.find(CompletedTodoSpecification())
learn_todos = await todo_repo.find(TitleContainsSpecification("Learn"))
# Compose specifications
combined = CompletedTodoSpecification() & TitleContainsSpecification("Learn")
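The `&` composition above can be illustrated without SQLAlchemy by overloading the operator on a small predicate-based class. This is a sketch under assumptions: `PredicateSpec`, `is_satisfied_by`, and `TodoItem` are hypothetical names, and the library's `Specification` produces SQL expressions via `to_expression` rather than evaluating Python predicates.

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


class PredicateSpec(Generic[T]):
    """Illustrative specification that wraps a Python predicate."""

    def __init__(self, predicate: Callable[[T], bool]) -> None:
        self._predicate = predicate

    def is_satisfied_by(self, candidate: T) -> bool:
        return self._predicate(candidate)

    def __and__(self, other: "PredicateSpec[T]") -> "PredicateSpec[T]":
        # Both specifications must hold.
        return PredicateSpec(lambda c: self.is_satisfied_by(c) and other.is_satisfied_by(c))

    def __or__(self, other: "PredicateSpec[T]") -> "PredicateSpec[T]":
        # Either specification may hold.
        return PredicateSpec(lambda c: self.is_satisfied_by(c) or other.is_satisfied_by(c))

    def __invert__(self) -> "PredicateSpec[T]":
        return PredicateSpec(lambda c: not self.is_satisfied_by(c))


@dataclass
class TodoItem:
    title: str
    completed: bool


completed = PredicateSpec(lambda t: t.completed)
mentions_learn = PredicateSpec(lambda t: "Learn" in t.title)
combined = completed & mentions_learn

todos = [
    TodoItem("Learn CoreLib", True),
    TodoItem("Learn SQL", False),
    TodoItem("Ship release", True),
]
matches = [t.title for t in todos if combined.is_satisfied_by(t)]
pending = [t.title for t in todos if (~completed).is_satisfied_by(t)]
```

Because each operator returns a new specification, small rules stay reusable and can be combined at the call site instead of being duplicated in repository methods.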
Ordered initialization with auto-discovery:
from typing import Any, Dict

from obc_ingestion_core import StartupTask

class DatabaseInitializationTask(StartupTask):
    order = 30  # Lower numbers run first

    async def execute(self) -> None:
        # Initialize database
        pass

    def configure(self, config: Dict[str, Any]) -> None:
        # Configure from YAML
        pass
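The ordering rule ("lower numbers run first") combined with auto-discovery can be sketched as follows. This is a simplified illustration, not the library's mechanism: `SketchTask` and its subclasses are hypothetical, discovery here relies on Python's built-in `__subclasses__()`, and the real type finder may work differently.

```python
import asyncio
from typing import List


class SketchTask:
    """Hypothetical base class; lower `order` values run first."""
    order: int = 100

    async def execute(self, log: List[str]) -> None:
        raise NotImplementedError


# Defined out of order on purpose to show that `order` drives execution.
class WarmCacheTask(SketchTask):
    order = 40

    async def execute(self, log: List[str]) -> None:
        log.append("cache")


class LoadConfigTask(SketchTask):
    order = 10

    async def execute(self, log: List[str]) -> None:
        log.append("config")


class InitDatabaseTask(SketchTask):
    order = 30

    async def execute(self, log: List[str]) -> None:
        log.append("database")


def discover_tasks() -> List[SketchTask]:
    # Instantiate every direct subclass of the base class.
    return [cls() for cls in SketchTask.__subclasses__()]


async def run_startup(log: List[str]) -> None:
    for task in sorted(discover_tasks(), key=lambda t: t.order):
        await task.execute(log)


execution_log: List[str] = []
asyncio.run(run_startup(execution_log))
```

Leaving gaps between order values (10, 30, 40) makes it easy to slot a new task between existing ones without renumbering.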
All examples are fully functional and demonstrate real-world usage:
| Example | Description | Status |
|---|---|---|
| 01_basic_todo.py | Basic repository pattern with Todo entity | Working |
| 02_yaml_config.py | YAML configuration with dotted access | Working |
| 03_app_config.py | Strongly-typed dataclass configuration | Working |
| 04_custom_startup.py | Custom startup tasks with ordering | Working |
| 05_database_operations.py | Advanced database operations | Working |
| 06_autodiscovery.py | Auto-discovery of components | Working |
| 07_multi_config.py | Multiple configuration sources | Working |
# Run a specific example
python examples/01_basic_todo.py
# Run all examples
for example in examples/*.py; do
    echo "=== Running $example ==="
    python "$example"
    echo
done
# config.yaml
database:
  dialect: "sqlite"
  driver: "aiosqlite"
  database: "./db/openbiocure-catalog.db"
  is_memory_db: false
app:
  default_model_provider: "claude"
  agents:
    research_agent:
      model: "claude-3-sonnet"
      temperature: 0.7
      max_tokens: 2000
logging:
  level: INFO
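As an illustration of validating a configuration section with a dataclass, the sketch below loads the `database` section into a typed object and rejects unknown keys. `DatabaseSettings` and `load_settings` are hypothetical names, the dictionary stands in for parsed YAML, and the library's own `AppConfig` handling may differ.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict


@dataclass
class DatabaseSettings:
    """Hypothetical typed view of the `database` section."""
    dialect: str
    driver: str
    database: str
    is_memory_db: bool = False


def load_settings(raw: Dict[str, Any]) -> DatabaseSettings:
    # Reject misspelled or unsupported keys before building the dataclass.
    known = {f.name for f in fields(DatabaseSettings)}
    unknown = set(raw) - known
    if unknown:
        raise ValueError(f"Unknown database settings: {sorted(unknown)}")
    settings = DatabaseSettings(**raw)  # raises TypeError if a required key is missing
    if not isinstance(settings.is_memory_db, bool):
        raise TypeError("is_memory_db must be a boolean")
    return settings


# Stand-in for the parsed `database` section of config.yaml.
raw_database = {
    "dialect": "sqlite",
    "driver": "aiosqlite",
    "database": "./db/openbiocure-catalog.db",
    "is_memory_db": False,
}

settings = load_settings(raw_database)
# SQLAlchemy-style URL assembled from the typed settings.
url = f"{settings.dialect}+{settings.driver}:///{settings.database}"
```

Failing fast on unknown keys catches typos in the YAML file at startup rather than at the first database call.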
from obc_ingestion_core import Environment
db_host = Environment.get('HERPAI_DB_HOST', 'localhost')
debug_mode = Environment.get_bool('HERPAI_DEBUG', False)
port = Environment.get_int('HERPAI_PORT', 5432)
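Typed environment getters such as `get_bool` and `get_int` can be approximated with the standard library. The helpers below are a hedged sketch: the function names, the set of accepted truthy strings, the fallback to the default on unparsable integers, and the `DEMO_` variable names are all assumptions for illustration, not the behavior of the library's `Environment` class.

```python
import os
from typing import Optional

# Assumed set of strings treated as true; the library may accept others.
_TRUE_VALUES = {"1", "true", "yes", "on"}


def env_get(name: str, default: Optional[str] = None) -> Optional[str]:
    return os.environ.get(name, default)


def env_get_bool(name: str, default: bool = False) -> bool:
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in _TRUE_VALUES


def env_get_int(name: str, default: int = 0) -> int:
    raw = os.environ.get(name)
    if raw is None:
        return default
    try:
        return int(raw)
    except ValueError:
        # Fall back to the default when the value is not a valid integer.
        return default


# Hypothetical variables set here only to demonstrate the helpers.
os.environ["DEMO_DEBUG"] = "true"
os.environ["DEMO_PORT"] = "not-a-number"

debug_mode = env_get_bool("DEMO_DEBUG")
port = env_get_int("DEMO_PORT", 5432)
host = env_get("DEMO_HOST_UNSET", "localhost")
```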
from typing import Optional, Protocol

from sqlalchemy.orm import Mapped, mapped_column

from obc_ingestion_core import IRepository, Repository, BaseEntity

class User(BaseEntity):
    __tablename__ = "users"
    username: Mapped[str] = mapped_column(unique=True)
    email: Mapped[str] = mapped_column(unique=True)

class IUserRepository(IRepository[User], Protocol):
    async def find_by_username(self, username: str) -> Optional[User]: ...

class UserRepository(Repository[User]):
    async def find_by_username(self, username: str) -> Optional[User]:
        # UserByUsernameSpecification is a Specification[User] subclass defined in your own code
        return await self.find_one(UserByUsernameSpecification(username))
from typing import Any, Dict

from obc_ingestion_core import StartupTask

class ModelInitializationTask(StartupTask):
    order = 40

    async def execute(self) -> None:
        # Initialize AI models
        pass

    def configure(self, config: Dict[str, Any]) -> None:
        self.model_path = config.get('model_path', '/models')
from typing import Protocol

from obc_ingestion_core import engine

class IEmailService(Protocol):
    async def send_email(self, to: str, subject: str, body: str) -> bool: ...

class EmailService:
    async def send_email(self, to: str, subject: str, body: str) -> bool:
        # Implementation
        return True

# Register the service
engine.register(IEmailService, EmailService)
- Getting Started Guide - Comprehensive guide for new users
- Bug Tracker - Known issues and their status
- Examples - Working examples demonstrating all features
This project serves as an excellent reference for implementing:
- Dependency Injection patterns with service lifetime management
- Repository Pattern with SQLAlchemy and async support
- Configuration Management with YAML and dataclasses
- Async Database operations with proper session management
- Startup Task orchestration and auto-discovery
- Type-Safe generic implementations
Study the codebase to understand these patterns and adapt them to your own projects.
obc_ingestion_core/
├── config/           # Configuration management
│   ├── app_config.py
│   ├── environment.py
│   └── yaml_config.py
├── core/             # Core framework
│   ├── engine.py
│   ├── service_collection.py
│   ├── startup_task.py
│   └── type_finder.py
├── data/             # Data access layer
│   ├── db_context.py
│   ├── entity.py
│   ├── repository.py
│   └── specification.py
└── infrastructure/   # Cross-cutting concerns
    ├── caching/
    ├── events/
    └── logging/
- Python: 3.9+
- SQLAlchemy: 2.0+
- PyYAML: For configuration
- aiosqlite: For async SQLite support
- dataclasses: Part of the standard library (no separate install needed on Python 3.9+)
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests
- Submit a pull request
This library is released under the MIT License as part of the OpenBioCure initiative.
- Discord: HerpAI Discord Server
- GitHub: OpenBioCure/HerpAI-Lib
- Core symbols exposed directly from root package
- find_one method added to the repository pattern
- Enhanced type safety throughout the codebase
- Improved service collection to handle interfaces naturally
- All examples now working correctly
- Import path issues resolved
- Configuration registration issues fixed
- Database unique constraint handling improved
- Cleaner, more organized documentation
- Simplified import statements
- Better error handling and logging
- Renamed library to obc_ingestion_core
- Updated project metadata and package name