Releases: Red-Hat-AI-Innovation-Team/sdg_hub
v0.8.3 - SDG Hub UI is here!
We’ve just merged a major addition to SDG Hub: a full-featured local web UI for building, running, and monitoring synthetic data pipelines.
✨ What You Can Do
🏠 Dashboard with flow catalog & quick-start actions
🧩 Visual Flow Builder — node-based editor (LLM, Parser, Transform, Eval blocks)
🪄 Step-by-step flow testing before saving
📄 PDF → Markdown → Dataset pipeline with ICL setup
📊 Live monitoring with token stats & block-level metrics
🔁 Checkpoint & Resume for long-running jobs
🕓 Run history & logs with downloadable outputs
⚙️ Config management — save, clone, import flows
🚀 Try it out
> git clone https://github.com/Red-Hat-AI-Innovation-Team/sdg_hub
> cd sdg_hub/ui
> ./start.sh
Opens automatically 👉 http://localhost:3000/
📘 Full docs live in ui/docs/
What's Changed
- chore(deps): bump rhysd/actionlint from 1.7.10 to 1.7.11 in /.github/workflows by @dependabot[bot] in #587
- feat(ui): Add SDG Hub UI - MVP web interface for local development by @ashtarkb in #515
- fix(ui): disable experimental webstorage to prevent Node.js 25.2.0 error by @ashtarkb in #589
Full Changelog: v0.8.2...v0.8.3
v0.8.2 - Knowledge Tuning via CPT, dedicated parsers and DeepWiki
This release strengthens SDG Hub’s knowledge tuning capabilities with first-class support for Continued Pre-Training (CPT) data generation, introduces modular parsing blocks for cleaner output extraction, and expands documentation with a new DeepWiki reference page!
What's Changed
- feat: Add MCPAgentBlock for LLM agents with remote MCP tools by @shivchander in #567
- docs: update CLAUDE.md with latest repo structure and pre-commit setup by @shivchander in #583
- refactor(blocks): split parsers into focused blocks and add response extractors by @shivchander in #579
- chore: deprecate InstructLab Multi-Summary QA flow by @shivchander in #570
- Feat: CPT data generation for knowledge tuning by @abhi1092 in #572
- fix: restore prompt config YAMLs removed during InstructLab deprecation by @eshwarprasadS in #585
- chore: Fix badge link in README.md for build status by @shivchander in #577
- docs: add DeepWiki badge and update documentation link by @shivchander in #586
Full Changelog: v0.8.1...v0.8.2
v0.8.1 - uv Migration, Type Safety Improvements & Flow Architecture Cleanup
What's Changed
- chore(deps): bump aws-actions/configure-aws-credentials from 5.1.1 to 6.0.0 by @dependabot[bot] in #556
- feat: add MultiplierBlock for row duplication by @mihirathale98 in #553
- feat: Add SamplerBlock for random sampling from list columns by @shivchander in #555
- ci: add conventional commits enforcement by @xukai92 in #562
- chore: remove deprecated LLMParserBlock by @shivchander in #568
- chore: add uv.lock for reproducible builds by @shivchander in #565
- Fix/convert types hashable for checkpoints by @eshwarprasadS in #557
- feat: add Claude skill for synthetic data generation by @RohanAwhad in #566
- ci: migrate CI workflows to use uv for package management by @shivchander in #569
- chore(deps): bump actions/setup-node from 4.4.0 to 6.2.0 by @dependabot[bot] in #576
- refactor(flow): split base.py into focused submodules by @shivchander in #571
- style(types): improve type annotation coverage to pass mypy by @shivchander in #578
- chore(deps): bump actions/cache from 4.3.0 to 5.0.3 by @dependabot[bot] in #582
Full Changelog: v0.8.0...v0.8.1
v0.8.0 - Introducing Connectors: A New Architecture for External Integrations
🚀 Introducing Connectors: A New Architecture for External Integrations
This release introduces Connectors - a major new architecture for integrating external tools and frameworks into SDG Hub pipelines. This extensible system provides a standardized way to communicate with third-party services while maintaining the composable block-based design you know.
Architecture Overview
BaseConnector (universal interface)
│
├── BaseAgentConnector (messages → response)
│   └── LangflowConnector ✅
└── BaseSandboxConnector (planned)
Design Principles:
- Connectors handle external service communication
- Blocks handle DataFrame integration
- Registry provides discovery and instantiation
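The division of labor in these principles can be sketched in plain Python. This is a minimal illustration, not SDG Hub's actual implementation: `ConnectorRegistry`, `EchoConnector`, `send`, and `run_block` are hypothetical names, and a list of dicts stands in for the DataFrame so the sketch needs only the standard library.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Type


class BaseConnector(ABC):
    """Hypothetical universal interface: one method that talks to an external service."""

    @abstractmethod
    def send(self, payload: dict) -> dict:
        ...


class ConnectorRegistry:
    """Hypothetical registry: maps a framework name to a connector class."""

    _connectors: Dict[str, Type[BaseConnector]] = {}

    @classmethod
    def register(cls, name: str):
        # Used as a class decorator so connectors self-register at import time.
        def decorator(connector_cls: Type[BaseConnector]) -> Type[BaseConnector]:
            cls._connectors[name] = connector_cls
            return connector_cls

        return decorator

    @classmethod
    def create(cls, name: str, **kwargs) -> BaseConnector:
        # Discovery and instantiation by name.
        if name not in cls._connectors:
            raise KeyError(f"unknown connector {name!r}; known: {sorted(cls._connectors)}")
        return cls._connectors[name](**kwargs)


@ConnectorRegistry.register("echo")
class EchoConnector(BaseConnector):
    """Stand-in connector: echoes the question instead of calling a remote service."""

    def __init__(self, prefix: str = "echo: "):
        self.prefix = prefix

    def send(self, payload: dict) -> dict:
        return {"response": self.prefix + payload["question"]}


def run_block(rows: List[dict], connector: BaseConnector,
              input_col: str, output_col: str) -> List[dict]:
    """Block-side logic: read one column per row, write the connector's reply to another."""
    return [
        {**row, output_col: connector.send({"question": row[input_col]})["response"]}
        for row in rows
    ]
```

With this layout, `ConnectorRegistry.create("echo")` returns a connector that `run_block` can apply to every row. The connector knows nothing about rows or columns, and the block knows nothing about the remote service.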
Langflow Integration
The first supported framework is Langflow - a visual framework for building LLM-powered applications. Use AgentBlock to integrate Langflow flows into your data generation pipelines:
from sdg_hub.core.blocks.agent import AgentBlock

block = AgentBlock(
    block_name="my_agent",
    agent_framework="langflow",
    agent_url="http://localhost:7860/api/v1/run/my-flow",
    agent_api_key="your-api-key",
    input_cols=["question"],
    output_cols=["response"],
    extract_response=True,
    async_mode=True,
    max_concurrency=10,
)
result = block.generate(dataset)
What's Changed
- fix: Add model_post_init to RenameColumnsBlock by @RohanAwhad in #544
- chore(deps): bump actions/cache from 5.0.1 to 5.0.2 by @dependabot[bot] in #546
- chore(deps): bump actions/cache from 5.0.2 to 5.0.3 by @dependabot[bot] in #550
- fix: use isna() for null checks in melt_columns tests by @shivchander in #552
- feat: Add connectors architecture for external service integrations by @shivchander and @mihirathale98 in #551
- docs: add connector system documentation to CLAUDE.md by @shivchander in #554
Full Changelog: v0.7.3...v0.8.0
v0.7.3 - Block Identity Refactor & LiteLLM Compatibility Update
What's Changed
- chore(deps): bump rhysd/actionlint from 1.7.9 to 1.7.10 in /.github/workflows by @dependabot[bot] in #535
- chore: raise litellm version cap to <2.0.0 by @eshwarprasadS in #540
- feat: Add block_type attribute to BaseBlock by @RohanAwhad in #537
- refactor: Replace __class__.__name__ usage with block_type attribute by @RohanAwhad in #539
Full Changelog: v0.7.2...v0.7.3
v0.7.2 - Logging bug fix and Progress Bars
What's Changed
- Updating the Knowledge Tuning Example with Results by @abhi1092 in #525
- chore(deps): bump DavidAnson/markdownlint-cli2-action from 21.0.0 to 22.0.0 by @dependabot[bot] in #524
- chore(deps): bump sigstore/gh-action-sigstore-python from 3.1.0 to 3.2.0 by @dependabot[bot] in #520
- pass extra parameters to knowledge_utils.py for flexibility by @mtake in #528
- chore(deps): bump actions/cache from 4.3.0 to 5.0.1 by @dependabot[bot] in #533
- chore(deps): bump actions/download-artifact from 6.0.0 to 7.0.0 by @dependabot[bot] in #531
- Add secrets and params redaction logic by @eshwarprasadS in #521
- chore(deps): bump actions/upload-artifact from 5 to 6 by @dependabot[bot] in #534
- feat: Add progress bar for async LLM generation by @ashtarkb in #522
Full Changelog: v0.7.1...v0.7.2
v0.7.1 - Testing & Dependency Cleanup
What's Changed
- fix: lighten integration test dependencies by @eshwarprasadS in #518
Full Changelog: v0.7.0...v0.7.1
v0.7.0 - New Flow for RAG Benchmarking & Evaluation
This release introduces a brand-new RAG Dataset Flow that allows users to generate high-quality evaluation data for testing Retrieval-Augmented Generation (RAG) systems.
You can now create structured query, context, and answer triples designed specifically to benchmark and validate RAG pipelines.
Try it out here: https://github.com/Red-Hat-AI-Innovation-Team/sdg_hub/blob/main/examples/rag_evaluation/rag_evaluation_dataset_generation.ipynb
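To make the idea of benchmarking with query, context, and answer triples concrete, here is a small sketch. The `RagTriple` field names and the `retrieval_hit_rate` metric are assumptions for illustration only. They are not the flow's actual output schema or evaluation method, which the linked notebook documents.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class RagTriple:
    """Hypothetical record: one evaluation example for a RAG system."""
    query: str    # question posed to the RAG system
    context: str  # passage the retriever is expected to surface
    answer: str   # reference answer grounded in that passage


def retrieval_hit_rate(triples: List[RagTriple],
                       retrieve: Callable[[str], List[str]]) -> float:
    """Fraction of queries whose reference context appears among the retrieved passages."""
    if not triples:
        return 0.0
    hits = sum(1 for t in triples if t.context in retrieve(t.query))
    return hits / len(triples)
```

Any retriever that maps a query string to a list of passages can be passed as `retrieve`. The `answer` field would feed a separate generation-quality check, which this sketch leaves out.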
What's Changed
- docs: replace flow structure TODO with reference link by @RohanAwhad in #512
- docs: replace block example TODO with reference link by @RohanAwhad in #511
- feat: RAG dataset flow by @Ygnas in #507
- chore(deps): bump aws-actions/configure-aws-credentials from 5.1.0 to 5.1.1 by @dependabot[bot] in #514
New Contributors
Full Changelog: v0.6.1...v0.7.0
v0.6.1 - Documentation Fixes
What's Changed
- Add a Japanese knowledge SDG example by @mtake in #457
- feat(snyk): adding script for Jupyter Notebook Snyk Scan by @deekay2310 in #451
- Fixing the "missing column" bug in knowledge mixing notebook by @abhi1092 in #485
- chore(deps): bump aws-actions/configure-aws-credentials from ff717079ee2060e4bcee96c4779b553acc87447c to 7474bc4690e29a8392af63c5b98e7449536d5c3a by @dependabot[bot] in #471
- chore(deps): bump hynek/build-and-inspect-python-package from 2.13.0 to 2.14.0 by @dependabot[bot] in #469
- chore(deps): bump rhysd/actionlint from 1.7.7 to 1.7.8 in /.github/workflows by @dependabot[bot] in #470
- migrate knowledge_utils to pandas by @mtake in #489
- Updating Documentation: Tagging and Available flows by @abhi1092 in #488
- chore(deps): bump aws-actions/configure-aws-credentials from 4.3.1 to 5.1.0 by @dependabot[bot] in #491
- docs: add comprehensive boilerplate code for custom blocks by @RohanAwhad in #486
- docs: add comprehensive PromptBuilderBlock example to LLM blocks docu… by @RohanAwhad in #487
- Readability improvements for instructlab document pre-processing notebook by @mtake in #493
- docs: fix sidebar links, add dark/light theme, and update styling by @shivchander in #494
- chore(deps): bump actions/download-artifact from 5.0.0 to 6.0.0 by @dependabot[bot] in #492
- chore(deps): bump actions/upload-artifact from 4 to 5 by @dependabot[bot] in #490
- Fix documentation errors for Japanese by @mtake in #497
- chore(deps): bump sigstore/gh-action-sigstore-python from 3.0.1 to 3.1.0 by @dependabot[bot] in #495
- docs: revert PR #494 styling changes and add cover image to README by @eshwarprasadS in #500
- Store json files in the native format by @mtake in #502
- chore(deps): bump rhysd/actionlint from 1.7.8 to 1.7.9 in /.github/workflows by @dependabot[bot] in #508
- chore(deps): bump DavidAnson/markdownlint-cli2-action from 20.0.0 to 21.0.0 by @dependabot[bot] in #505
- Fix broken links in web docs with docsify, fix stale sections by @eshwarprasadS in #503
New Contributors
- @deekay2310 made their first contribution in #451
Full Changelog: v0.6.0...v0.6.1
v0.6.0 - Migration from HuggingFace Dataset to pandas DataFrame
This release represents a comprehensive architectural change in SDG Hub, replacing HuggingFace Dataset with native pandas DataFrames throughout the entire codebase. This migration delivers significant performance improvements and simplifies the data processing pipeline.
What's Changed
- Complete pandas migration from HuggingFace Dataset by @shivchander in #479
Full Changelog: v0.5.1...v0.6.0