This library brings a disciplined structure to data reconciliation, providing deterministic conflict resolution, data validation and quality checks, clean merging primitives, and a trustworthy source of truth—so engineers can focus on building systems, not untangling inconsistent or unreliable records. It exposes the properties and attributes of datasets to make differences, inconsistencies, and risks explicit.
In its initial versions, the library focuses on reconciliation, producing deterministic insights and reports that explain how disparate data silos relate to one another and how they can be merged into a single, unified source of truth. This enables data store migrations and integrations to surface gaps when discovered and resolve them efficiently, before trust is lost downstream.
In Mesopotamian (Sumerian) mythology, Nisaba is the goddess of writing, accounting, and the orderly keeping of records, entrusted with maintaining clarity across ledgers and knowledge archives.
- Reconciliation-first architecture: Establishes dataset equivalence across systems as the strongest guarantee of correctness, using LanceDB for vector persistence and similarity search and FastEmbed for embedding generation.
- Deterministic reconciliation engine: Produces order-independent, repeatable results suitable for CI and automated workflows.
- Cross-store data support: Unified handling of tabular data across SQL (MySQL, PostgreSQL, SQLite), NoSQL (MongoDB), and file formats (CSV, Excel, Parquet).
- Store-agnostic internal data model: Logical representation of data decoupled from physical storage or format.
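The determinism guarantee above can be illustrated with a small, library-independent sketch (the `Column` type and function names here are illustrative stand-ins, not part of the nisaba API): canonicalizing column sets before comparison makes the equivalence check independent of the order in which columns were discovered.

```rust
use std::collections::BTreeMap;

// Illustrative only: a logical column description decoupled from any store.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Column {
    name: String,
    ty: String,
}

// Canonicalize a schema into an ordered map so the comparison does not
// depend on discovery order.
fn canonicalize(cols: &[Column]) -> BTreeMap<String, String> {
    cols.iter().map(|c| (c.name.clone(), c.ty.clone())).collect()
}

fn schemas_equivalent(a: &[Column], b: &[Column]) -> bool {
    canonicalize(a) == canonicalize(b)
}

fn main() {
    let csv = vec![
        Column { name: "id".into(), ty: "INTEGER".into() },
        Column { name: "email".into(), ty: "TEXT".into() },
    ];
    // Same columns, discovered in a different order.
    let parquet = vec![
        Column { name: "email".into(), ty: "TEXT".into() },
        Column { name: "id".into(), ty: "INTEGER".into() },
    ];
    assert!(schemas_equivalent(&csv, &parquet));
    println!("equivalent");
}
```

This is why the results are suitable for CI: the same inputs always produce the same verdict, regardless of traversal order.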
To get started, add the crate to your `Cargo.toml`:

```toml
[dependencies]
nisaba = { version = "0.2.0" }
```

Prefer starting from the example and the generated docs, or:
```rust
use nisaba::{
    AnalyzerConfig, DistanceType, EmbeddingModel, FileStoreType, SchemaAnalyzer, ScoringConfig,
    SimilarityConfig, Source,
};

#[tokio::main]
async fn main() {
    let config = AnalyzerConfig::builder()
        .sample_size(10)
        .scoring(ScoringConfig {
            type_weight: 0.65,
            structure_weight: 0.35,
        })
        .similarity(SimilarityConfig {
            threshold: 0.59,
            top_k: Some(7),
            algorithm: DistanceType::Cosine,
        })
        .build();

    // Build the analyzer over a primary CSV source and an additional Parquet source.
    let analyzer = SchemaAnalyzer::builder()
        .name("nisaba")
        .config(config)
        .embedding_model(EmbeddingModel::MultilingualE5Small)
        .source(
            Source::files(FileStoreType::Csv)
                .path("./assets/csv")
                .num_rows(10)
                .has_header(true)
                .build()
                .unwrap(),
        )
        .sources(vec![
            Source::files(FileStoreType::Parquet)
                .path("./assets/parquet")
                .num_rows(10)
                .build()
                .unwrap(),
        ])
        .build()
        .await
        .unwrap();

    let _result = analyzer.analyze().await.unwrap();
}
```

Assume a data engineer discovers multiple schemas/sources with several tables that have long been ignored, and wants to deduce how they are connected and related, both to one another and to the contemporary data store. The engineer would:
- Map out the sources and relevant credentials
- Set up Nisaba StorageConfigs
- Set up the SchemaAnalyzer
- Run the analyzer with the storage configs
- Review the results/report for reconciliation hints
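The final review step can be sketched independently of the library: a minimal, self-contained example of filtering and ranking candidate column matches by the similarity threshold configured above (0.59 in the example). The `MatchHint` type and `hints_above` function are hypothetical illustrations, not nisaba types.

```rust
// Illustrative only: how a reviewer might keep just the hints worth acting on.
#[derive(Debug)]
struct MatchHint {
    left: &'static str,
    right: &'static str,
    score: f32,
}

// Keep candidates at or above the configured similarity threshold,
// sorted so the highest-confidence hints come first.
fn hints_above(candidates: Vec<MatchHint>, threshold: f32) -> Vec<MatchHint> {
    let mut kept: Vec<MatchHint> = candidates
        .into_iter()
        .filter(|m| m.score >= threshold)
        .collect();
    kept.sort_by(|a, b| b.score.partial_cmp(&a.score).unwrap());
    kept
}

fn main() {
    let candidates = vec![
        MatchHint { left: "users.email", right: "contacts.mail", score: 0.82 },
        MatchHint { left: "users.id", right: "contacts.zip", score: 0.31 },
    ];
    let kept = hints_above(candidates, 0.59);
    assert_eq!(kept.len(), 1);
    assert_eq!(kept[0].left, "users.email");
}
```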
Successive releases will add further data quality and validation features, as documented in the roadmap.
As with most Rust crates, this library is versioned according to Semantic Versioning. Breaking changes will only be made with good reason, and as infrequently as is feasible. Such changes will generally be made in releases where the major version number is increased (note Cargo's caveat for pre-1.x versions), although limited exceptions may occur. Increases in the minimum supported Rust version (MSRV) are not considered breaking, but will result in a minor version bump.
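Concretely, Cargo's pre-1.0 caveat means the leftmost nonzero version component acts as the compatibility boundary:

```toml
[dependencies]
# "0.2.0" is shorthand for "^0.2.0": Cargo will accept 0.2.x updates
# (e.g. 0.2.1) but will NOT silently upgrade to 0.3.0, because before
# 1.0 the leftmost nonzero component is treated as the major version.
nisaba = { version = "0.2.0" }
```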
See also the changelog for details about changes in recent versions.
Licensed under either of
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
at your option.
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.