Proposal: Unify REST API and GraphQL read / write access to database #1727

@wslulciuc

Since the introduction of GraphQL (currently in beta) for exploring dataset and job metadata collected by Marquez, the REST API spec and the GraphQL schema have drifted considerably. Keeping the two specs aligned will be addressed in a separate proposal. For now, we'd like to propose a simple solution for writing metadata to, and reading metadata from, the Marquez database.

What to consider in our design:

  1. Dataset and job metadata collected via either the Dataset and Job APIs or the Lineage API (used to collect OpenLineage events) should be stored through a common interface to avoid drift in logic
  2. Dataset and job metadata read via either the REST API or GraphQL should go through a common interface to avoid both drift in logic and code duplication

A common read / write interface for accessing metadata also improves maintainability and testability, since the access logic lives in one place.
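As a rough illustration of the second consideration, the sketch below has a REST-style resource and a GraphQL-style resolver read through the same interface. Everything in it is hypothetical: `MetadataReader`, `RestDatasetResource`, `GraphqlDatasetResolver`, and the string-based return values are stand-ins invented for this example and are not existing Marquez classes.

```java
import java.util.Map;
import java.util.Optional;

final class SharedReadSketch {
  /** Hypothetical read-side interface (an assumption, not part of the proposal). */
  interface MetadataReader {
    Optional<String> findDatasetDescription(String namespaceName, String datasetName);
  }

  /** Stand-in for a REST resource: maps the lookup result to a status line. */
  static final class RestDatasetResource {
    private final MetadataReader reader;

    RestDatasetResource(MetadataReader reader) {
      this.reader = reader;
    }

    String get(String namespaceName, String datasetName) {
      return reader
          .findDatasetDescription(namespaceName, datasetName)
          .map(description -> "200 " + description)
          .orElse("404");
    }
  }

  /** Stand-in for a GraphQL field resolver: returns null when nothing is found. */
  static final class GraphqlDatasetResolver {
    private final MetadataReader reader;

    GraphqlDatasetResolver(MetadataReader reader) {
      this.reader = reader;
    }

    String resolve(String namespaceName, String datasetName) {
      return reader.findDatasetDescription(namespaceName, datasetName).orElse(null);
    }
  }

  public static void main(String[] args) {
    // An in-memory map stands in for the Marquez database.
    Map<String, String> rows = Map.of("food_delivery/public.orders", "All orders");
    MetadataReader reader =
        (namespace, dataset) -> Optional.ofNullable(rows.get(namespace + "/" + dataset));

    // Both entry points share one lookup, so the read logic cannot drift.
    System.out.println(new RestDatasetResource(reader).get("food_delivery", "public.orders"));
    System.out.println(new GraphqlDatasetResolver(reader).resolve("food_delivery", "public.orders"));
  }
}
```

Only the response shaping differs between the two callers; the lookup itself is written once.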

How will dataset and job metadata be written / read?

We propose a common DAO interface, MetadataDao (defined below), that would delegate writes to tables using the specific underlying DAOs and also encapsulate any pre-processing steps:

public interface MetadataDao {
  BagOfJobInfo upsertLineageEvent(LineageEvent event);
  Namespace upsertNamespaceMeta(NamespaceName namespaceName, NamespaceMeta namespaceMeta);
  Dataset upsertDatasetMeta(NamespaceName namespaceName, DatasetName datasetName, DatasetMeta datasetMeta);
  Job upsertJob(NamespaceName namespaceName, JobName jobName, JobMeta jobMeta);
  // ...
}
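To make the delegation idea concrete, here is a minimal sketch of how one such write method might forward to table-specific DAOs. It is deliberately simplified and rests on assumptions: `NamespaceDao`, `DatasetDao`, their in-memory implementations, and `DelegatingMetadataDao` are hypothetical names, plain strings replace the `NamespaceName` / `DatasetName` / `DatasetMeta` types, and trimming the names is only an assumed example of a pre-processing step.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class MetadataDaoSketch {
  /** Hypothetical table-specific DAO for namespace rows. */
  interface NamespaceDao {
    void upsert(String namespaceName);
  }

  /** Hypothetical table-specific DAO for dataset rows. */
  interface DatasetDao {
    void upsert(String namespaceName, String datasetName, String description);
  }

  /** In-memory stand-in so the sketch runs without a database. */
  static final class InMemoryNamespaceDao implements NamespaceDao {
    final Map<String, Integer> upsertCounts = new LinkedHashMap<>();

    @Override
    public void upsert(String namespaceName) {
      upsertCounts.merge(namespaceName, 1, Integer::sum);
    }
  }

  /** In-memory stand-in keyed by "namespace/dataset". */
  static final class InMemoryDatasetDao implements DatasetDao {
    final Map<String, String> rows = new LinkedHashMap<>();

    @Override
    public void upsert(String namespaceName, String datasetName, String description) {
      rows.put(namespaceName + "/" + datasetName, description);
    }
  }

  /** Simplified facade: a single write path shared by every caller. */
  static final class DelegatingMetadataDao {
    private final NamespaceDao namespaceDao;
    private final DatasetDao datasetDao;

    DelegatingMetadataDao(NamespaceDao namespaceDao, DatasetDao datasetDao) {
      this.namespaceDao = namespaceDao;
      this.datasetDao = datasetDao;
    }

    String upsertDatasetMeta(String namespaceName, String datasetName, String description) {
      // Pre-processing happens once, here (trimming is an assumed example).
      String namespace = namespaceName.trim();
      String dataset = datasetName.trim();
      namespaceDao.upsert(namespace); // ensure the parent namespace row exists
      datasetDao.upsert(namespace, dataset, description);
      return namespace + "/" + dataset;
    }
  }

  public static void main(String[] args) {
    InMemoryNamespaceDao namespaces = new InMemoryNamespaceDao();
    InMemoryDatasetDao datasets = new InMemoryDatasetDao();
    DelegatingMetadataDao dao = new DelegatingMetadataDao(namespaces, datasets);

    // The Dataset API and the Lineage API would both call the same method.
    dao.upsertDatasetMeta(" food_delivery ", "public.orders", "from Dataset API");
    dao.upsertDatasetMeta("food_delivery", "public.orders", "from Lineage API");

    System.out.println(datasets.rows);
  }
}
```

Because the facade depends only on the DAO interfaces, a unit test can swap in in-memory implementations like the ones above, which is one way the testability benefit could show up in practice.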

What are the benefits?

  1. Clear interface for inserting rows into the database
  2. Maintainability and testability
  3. Avoids duplicating row insertion logic
