🐍 OpenBioCure Ingestion Core Library

OpenBioCure Ingestion Core is the foundational library for data ingestion and processing in the OpenBioCure platform. It provides enterprise-grade infrastructure components, configuration management, logging utilities, database session handling, and the repository pattern for building robust data ingestion workflows.

🛠️ Tech Stack

SQLAlchemy PyYAML aiosqlite asyncio

🚀 Features

  • 🧠 Dependency Injection - Service registration and resolution
  • 🔄 Repository Pattern - Type-safe entity operations with SQLAlchemy
  • 🔍 Specification Pattern - Fluent query filtering and composition
  • 🧵 Async Support - Full async/await patterns throughout
  • 📝 Type Safety - Generic interfaces with Python typing
  • ⚙️ Configuration Management - YAML with dataclass validation
  • 🚀 Auto-discovery Startup System - Ordered initialization with configuration
  • 🪵 Structured Logging - Consistent format across components
  • 🔧 Database Integration - SQLAlchemy async support with schema management

🛠️ Installation

From Source (Recommended)

git clone https://github.com/openbiocure/obc-ingestion-core.git
cd obc-ingestion-core
pip install -e .

From GitHub (Direct)

pip install git+https://github.com/openbiocure/obc-ingestion-core.git

For Development

git clone https://github.com/openbiocure/obc-ingestion-core.git
cd obc-ingestion-core
pip install -e ".[dev]"

⚑ Quick Start

Basic Usage

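import asyncio
from obc_ingestion_core import engine, IRepository, Repository, BaseEntity

# MyEntity and MySpecification are placeholders for your own
# entity and specification classes.

async def main() -> None:
    # Initialize and start the engine
    engine.initialize()
    await engine.start()

    # Resolve services
    my_repo = engine.resolve(IRepository[MyEntity])

    # Use the repository
    entity = await my_repo.create(title="My Entity")
    entities = await my_repo.find(MySpecification())

# await is only valid inside a coroutine, so run everything through asyncio
asyncio.run(main())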

Configuration

from obc_ingestion_core import engine, YamlConfig, AppConfig

# Access YAML configuration
config = engine.resolve(YamlConfig)
db_host = config.get('database.host')

# Access typed configuration
app_config = engine.resolve(AppConfig)
model_provider = app_config.default_model_provider
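
The dotted access used by config.get('database.host') can be pictured as a walk through nested mappings. The following is a minimal sketch, not the library's implementation: get_dotted and the settings dictionary are hypothetical names chosen for illustration, and the real YamlConfig may treat missing keys differently.

```python
from typing import Any, Mapping


def get_dotted(config: Mapping[str, Any], key: str, default: Any = None) -> Any:
    """Walk a nested mapping using a dotted key such as 'database.host'."""
    current: Any = config
    for part in key.split("."):
        # Stop as soon as the path leaves the mapping structure.
        if not isinstance(current, Mapping) or part not in current:
            return default
        current = current[part]
    return current


# Hypothetical data shaped like a parsed YAML file.
settings = {"database": {"host": "localhost", "port": 5432}}

host = get_dotted(settings, "database.host")
missing = get_dotted(settings, "database.user", default="n/a")
```

Returning a default instead of raising keeps optional settings easy to read, at the cost of hiding typos in key names.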

📋 Core Concepts

Dependency Injection Engine

The central orchestrator managing application lifecycle and services:

from obc_ingestion_core import engine

# Initialize the engine
engine.initialize()

# Register services (IMyService and MyService are your own interface and implementation)
engine.register(IMyService, MyService)

# Resolve services
my_service = engine.resolve(IMyService)

# Start the application (call this from inside an async function)
await engine.start()
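
To make the register/resolve flow concrete, here is a minimal sketch of a service registry. It is not the engine's actual code: MiniContainer, IGreeter, and Greeter are hypothetical names, and caching one instance per interface (a singleton lifetime) is an assumption made for the illustration.

```python
from typing import Any, Callable, Dict, Type, TypeVar

T = TypeVar("T")


class MiniContainer:
    """Hypothetical stand-in for a service registry; not the library's engine."""

    def __init__(self) -> None:
        self._factories: Dict[type, Callable[[], Any]] = {}
        self._instances: Dict[type, Any] = {}

    def register(self, interface: Type[T], implementation: Callable[[], T]) -> None:
        # Remember how to build the implementation for this interface.
        self._factories[interface] = implementation

    def resolve(self, interface: Type[T]) -> T:
        # Build the service on first use, then reuse it (assumed singleton lifetime).
        if interface not in self._instances:
            if interface not in self._factories:
                raise KeyError(f"No service registered for {interface.__name__}")
            self._instances[interface] = self._factories[interface]()
        return self._instances[interface]


class IGreeter:
    def greet(self, name: str) -> str:
        raise NotImplementedError


class Greeter(IGreeter):
    def greet(self, name: str) -> str:
        return f"Hello, {name}"


container = MiniContainer()
container.register(IGreeter, Greeter)
greeter = container.resolve(IGreeter)
```

Callers depend only on IGreeter, so a different implementation can be registered in tests without touching the calling code.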

Repository Pattern

Type-safe data access with SQLAlchemy integration:

from typing import Protocol

from sqlalchemy.orm import Mapped, mapped_column

from obc_ingestion_core import engine, IRepository, Repository, BaseEntity

class Todo(BaseEntity):
    __tablename__ = "todos"
    title: Mapped[str] = mapped_column(nullable=False)
    completed: Mapped[bool] = mapped_column(default=False)

class ITodoRepository(IRepository[Todo], Protocol):
    pass

class TodoRepository(Repository[Todo]):
    pass

# Auto-registered by the engine (run the await below inside an async function)
todo_repo = engine.resolve(ITodoRepository)
todo = await todo_repo.create(title="Learn CoreLib", completed=False)
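
The shape of a generic repository can be illustrated without a database. The sketch below is an in-memory stand-in, not the library's Repository: InMemoryRepository and TodoRecord are hypothetical, and find takes a plain predicate here rather than a Specification.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Callable, Dict, Generic, List, Optional, Type, TypeVar

T = TypeVar("T")


@dataclass
class TodoRecord:
    id: int
    title: str
    completed: bool = False


class InMemoryRepository(Generic[T]):
    """Hypothetical repository that stores entities in a dict keyed by id."""

    def __init__(self, entity_type: Type[T]) -> None:
        self._entity_type = entity_type
        self._items: Dict[int, T] = {}
        self._next_id = 1

    async def create(self, **fields: Any) -> T:
        # Assign a sequential id, as an auto-increment column would.
        entity = self._entity_type(id=self._next_id, **fields)
        self._items[self._next_id] = entity
        self._next_id += 1
        return entity

    async def find(self, predicate: Callable[[T], bool]) -> List[T]:
        return [item for item in self._items.values() if predicate(item)]

    async def find_one(self, predicate: Callable[[T], bool]) -> Optional[T]:
        matches = await self.find(predicate)
        return matches[0] if matches else None


async def demo() -> List[str]:
    repo = InMemoryRepository(TodoRecord)
    await repo.create(title="Learn CoreLib")
    await repo.create(title="Write tests", completed=True)
    done = await repo.find(lambda todo: todo.completed)
    return [todo.title for todo in done]


completed_titles = asyncio.run(demo())
```

A stand-in like this can also be useful in unit tests, where code written against the repository interface runs without a real session.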

Specification Pattern

Encapsulate query logic in reusable objects:

from obc_ingestion_core import Specification

class CompletedTodoSpecification(Specification[Todo]):
    def to_expression(self):
        return Todo.completed == True

class TitleContainsSpecification(Specification[Todo]):
    def __init__(self, text: str):
        self.text = text
    
    def to_expression(self):
        return Todo.title.contains(self.text)

# Usage (inside an async function; Todo and todo_repo come from the Repository Pattern example)
completed_todos = await todo_repo.find(CompletedTodoSpecification())
learn_todos = await todo_repo.find(TitleContainsSpecification("Learn"))

# Compose specifications
combined = CompletedTodoSpecification() & TitleContainsSpecification("Learn")
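
Composition with & can be understood through a small in-memory analogue. This sketch is an assumption-laden illustration: PredicateSpec, TodoItem, and title_contains are hypothetical, and the library's Specification builds SQLAlchemy expressions through to_expression rather than evaluating Python predicates.

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


class PredicateSpec(Generic[T]):
    """Hypothetical in-memory specification; not the library's Specification."""

    def __init__(self, predicate: Callable[[T], bool]) -> None:
        self._predicate = predicate

    def is_satisfied_by(self, candidate: T) -> bool:
        return self._predicate(candidate)

    def __and__(self, other: "PredicateSpec[T]") -> "PredicateSpec[T]":
        # Both sides must accept the candidate.
        return PredicateSpec(lambda c: self.is_satisfied_by(c) and other.is_satisfied_by(c))

    def __or__(self, other: "PredicateSpec[T]") -> "PredicateSpec[T]":
        # Either side may accept the candidate.
        return PredicateSpec(lambda c: self.is_satisfied_by(c) or other.is_satisfied_by(c))

    def __invert__(self) -> "PredicateSpec[T]":
        return PredicateSpec(lambda c: not self.is_satisfied_by(c))


@dataclass
class TodoItem:
    title: str
    completed: bool


def title_contains(text: str) -> PredicateSpec[TodoItem]:
    return PredicateSpec(lambda todo: text in todo.title)


completed = PredicateSpec(lambda todo: todo.completed)

items = [
    TodoItem("Learn specs", True),
    TodoItem("Learn DI", False),
    TodoItem("Ship release", True),
]

combined = completed & title_contains("Learn")
matching = [todo.title for todo in items if combined.is_satisfied_by(todo)]
```

Because each operator returns another specification, composed rules can be named, stored, and reused like any single rule.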

Startup Tasks

Ordered initialization with auto-discovery:

from typing import Any, Dict

from obc_ingestion_core import StartupTask

class DatabaseInitializationTask(StartupTask):
    order = 30  # Lower numbers run first
    
    async def execute(self) -> None:
        # Initialize database
        pass
    
    def configure(self, config: Dict[str, Any]) -> None:
        # Configure from YAML
        pass
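
The rule that lower order values run first can be sketched with a sort followed by sequential awaits. The code below is only an illustration: SketchTask, its subclasses, and run_startup are hypothetical, auto-discovery is not modeled (the task list is passed in explicitly), and how the engine handles ties or failures is not covered.

```python
import asyncio
from typing import List


class SketchTask:
    """Hypothetical base class standing in for StartupTask."""

    order = 100
    name = "task"

    async def execute(self, log: List[str]) -> None:
        # Record the run so the execution order can be inspected afterwards.
        log.append(self.name)


class ModelTask(SketchTask):
    order = 40
    name = "models"


class ConfigTask(SketchTask):
    order = 10
    name = "config"


class DatabaseTask(SketchTask):
    order = 30
    name = "database"


async def run_startup(tasks: List[SketchTask]) -> List[str]:
    log: List[str] = []
    # Sort ascending so that lower numbers run first, one task at a time.
    for task in sorted(tasks, key=lambda t: t.order):
        await task.execute(log)
    return log


# The tasks are deliberately listed out of order.
execution_log = asyncio.run(run_startup([ModelTask(), ConfigTask(), DatabaseTask()]))
```

Running tasks one after another, rather than concurrently, lets a later task rely on the side effects of an earlier one, such as a database that has already been initialized.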

📝 Examples

All examples are fully functional and demonstrate real-world usage:

Example                      Description                                  Status
01_basic_todo.py             Basic repository pattern with Todo entity    ✅ Working
02_yaml_config.py            YAML configuration with dotted access        ✅ Working
03_app_config.py             Strongly-typed dataclass configuration       ✅ Working
04_custom_startup.py         Custom startup tasks with ordering           ✅ Working
05_database_operations.py    Advanced database operations                 ✅ Working
06_autodiscovery.py          Auto-discovery of components                 ✅ Working
07_multi_config.py           Multiple configuration sources               ✅ Working

Running Examples

# Run a specific example
python examples/01_basic_todo.py

# Run all examples
for example in examples/*.py; do
    echo "=== Running $example ==="
    python "$example"
    echo
done

⚙️ Configuration

YAML Configuration

# config.yaml
database:
  dialect: "sqlite"
  driver: "aiosqlite"
  database: "./db/openbiocure-catalog.db"
  is_memory_db: false

app:
  default_model_provider: "claude"
  agents:
    research_agent:
      model: "claude-3-sonnet"
      temperature: 0.7
      max_tokens: 2000

logging:
  level: INFO
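
The database section above maps naturally onto a dataclass. The following sketch assumes SQLite and shows one way the fields could combine into a SQLAlchemy URL of the form dialect+driver:///path. DatabaseSettings and its url method are hypothetical, and the library's own AppConfig classes may be structured differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatabaseSettings:
    """Hypothetical typed view of the `database` section of config.yaml."""

    dialect: str
    driver: str
    database: str
    is_memory_db: bool = False

    def url(self) -> str:
        # SQLite URLs use three slashes before a relative path or ":memory:".
        target = ":memory:" if self.is_memory_db else self.database
        return f"{self.dialect}+{self.driver}:///{target}"


# Values copied from the YAML example, as a parser would return them.
raw_database_section = {
    "dialect": "sqlite",
    "driver": "aiosqlite",
    "database": "./db/openbiocure-catalog.db",
    "is_memory_db": False,
}

db_settings = DatabaseSettings(**raw_database_section)
connection_url = db_settings.url()
```

Constructing the dataclass with ** rejects unknown keys with a TypeError, which catches misspelled settings early. Dataclasses do not check field types by themselves, so any stricter validation would have to be added separately.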

Environment Variables

from obc_ingestion_core import Environment

db_host = Environment.get('HERPAI_DB_HOST', 'localhost')
debug_mode = Environment.get_bool('HERPAI_DEBUG', False)
port = Environment.get_int('HERPAI_PORT', 5432)
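
Typed environment lookups come down to parsing strings with a fallback. The sketch below uses only os.environ-style mappings. env_bool and env_int are hypothetical helpers, and both the set of accepted truthy strings and the fallback on invalid integers are assumptions, not documented behavior of Environment.

```python
import os
from typing import Mapping

# Assumed set of strings treated as "true"; the library may accept others.
_TRUE_VALUES = {"1", "true", "yes", "on"}


def env_bool(name: str, default: bool = False, environ: Mapping[str, str] = os.environ) -> bool:
    raw = environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in _TRUE_VALUES


def env_int(name: str, default: int, environ: Mapping[str, str] = os.environ) -> int:
    raw = environ.get(name)
    if raw is None:
        return default
    try:
        return int(raw)
    except ValueError:
        # Fall back instead of crashing on a malformed value (assumed policy).
        return default


# A fake environment keeps the example independent of the real process state.
fake_env = {"HERPAI_DEBUG": "Yes", "HERPAI_PORT": "not-a-number"}

debug_mode = env_bool("HERPAI_DEBUG", False, fake_env)
port = env_int("HERPAI_PORT", 5432, fake_env)
```

Passing the mapping explicitly makes the helpers straightforward to test. Production code would normally rely on the os.environ default.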

🔧 Extending the Library

Custom Repositories

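from typing import Optional, Protocol
from sqlalchemy.orm import Mapped, mapped_column
from obc_ingestion_core import IRepository, Repository, BaseEntity, Specification

class User(BaseEntity):
    __tablename__ = "users"
    username: Mapped[str] = mapped_column(unique=True)
    email: Mapped[str] = mapped_column(unique=True)

# Specification used by find_by_username below
class UserByUsernameSpecification(Specification[User]):
    def __init__(self, username: str):
        self.username = username
    def to_expression(self):
        return User.username == self.username

class IUserRepository(IRepository[User], Protocol):
    async def find_by_username(self, username: str) -> Optional[User]: ...

class UserRepository(Repository[User]):
    async def find_by_username(self, username: str) -> Optional[User]:
        return await self.find_one(UserByUsernameSpecification(username))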

Custom Startup Tasks

from typing import Any, Dict

from obc_ingestion_core import StartupTask

class ModelInitializationTask(StartupTask):
    order = 40
    
    async def execute(self) -> None:
        # Initialize AI models
        pass
    
    def configure(self, config: Dict[str, Any]) -> None:
        self.model_path = config.get('model_path', '/models')

Custom Services

from typing import Protocol

from obc_ingestion_core import engine

class IEmailService(Protocol):
    async def send_email(self, to: str, subject: str, body: str) -> bool: ...

class EmailService:
    async def send_email(self, to: str, subject: str, body: str) -> bool:
        # Implementation
        return True

# Register the service
engine.register(IEmailService, EmailService)
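
EmailService above never inherits from IEmailService, because Protocol classes are matched structurally. The sketch below demonstrates this with typing.runtime_checkable from the standard library. BrokenService is a hypothetical counterexample, and whether the engine performs such a check at registration time is not something this README states.

```python
import asyncio
from typing import Protocol, runtime_checkable


@runtime_checkable
class IEmailService(Protocol):
    async def send_email(self, to: str, subject: str, body: str) -> bool: ...


class EmailService:
    # No inheritance from IEmailService: having the method is enough.
    async def send_email(self, to: str, subject: str, body: str) -> bool:
        return True


class BrokenService:
    """Hypothetical class that lacks send_email."""


# isinstance on a runtime-checkable Protocol only checks that the method exists,
# not that its signature matches.
matches = isinstance(EmailService(), IEmailService)
rejected = isinstance(BrokenService(), IEmailService)

sent = asyncio.run(EmailService().send_email("user@example.org", "Hello", "Body"))
```

A static type checker such as mypy also compares the signatures, which the runtime check does not do.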

📚 Documentation

Using as a Reference

This project can serve as a reference for implementing:

  • Dependency Injection patterns with service lifetime management
  • Repository Pattern with SQLAlchemy and async support
  • Configuration Management with YAML and dataclasses
  • Async Database operations with proper session management
  • Startup Task orchestration and auto-discovery
  • Type-Safe generic implementations

Study the codebase to understand these patterns and adapt them to your own projects.

📁 Library Structure

obc_ingestion_core/
├── config/           # Configuration management
│   ├── app_config.py
│   ├── environment.py
│   └── yaml_config.py
├── core/             # Core framework
│   ├── engine.py
│   ├── service_collection.py
│   ├── startup_task.py
│   └── type_finder.py
├── data/             # Data access layer
│   ├── db_context.py
│   ├── entity.py
│   ├── repository.py
│   └── specification.py
└── infrastructure/   # Cross-cutting concerns
    ├── caching/
    ├── events/
    └── logging/

🧪 Requirements

  • Python: 3.9+
  • SQLAlchemy: 2.0+
  • PyYAML: For configuration
  • aiosqlite: For async SQLite support
  • dataclasses: Part of the standard library (no separate install needed)

🀝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests
  5. Submit a pull request

📝 License

This library is released under the MIT License as part of the OpenBioCure initiative.

📋 Changelog

[3.1.0] - 2025-01-26

Added

  • Core symbols exposed directly from root package
  • find_one method to repository pattern
  • Enhanced type safety throughout the codebase
  • Improved service collection to handle interfaces naturally

Fixed

  • All examples now working correctly
  • Import path issues resolved
  • Configuration registration issues fixed
  • Database unique constraint handling improved

Changed

  • Cleaner, more organized documentation
  • Simplified import statements
  • Better error handling and logging

[0.2.1] - 2025-04-05

Changed

  • Renamed library to obc_ingestion_core
  • Updated project metadata and package name

About

Core library for the OpenBioCure HerpAI Project
