With the introduction of GraphQL (currently in beta) to easily explore dataset and job metadata collected by Marquez, there has been a considerable drift in the REST API spec and the schema for GraphQL. Keeping both specs aligned will be addressed in a separate proposal, but for now, we'd like to propose a simple solution for reading / writing metadata to / from the Marquez database.
What to consider in our design:
- Dataset and job metadata collected via either the Dataset and Job APIs or the Lineage API (used to collect OpenLineage events) should be stored using a common interface to avoid drift in logic
- When reading collected dataset and job metadata using either the REST API or GraphQL, a common interface should be used to avoid both drift in logic and code duplication
With a common read / write interface to access metadata, there's also the added benefit of maintainability and testability.
How will dataset and job metadata be written / read?
We propose a common DAO class MetadataDao (defined below) that would delegate writes to tables using specific underlying DAOs, but also encapsulate any pre-processing steps:
public interface MetadataDao {
  BagOfJobInfo upsertLineageEvent(LineageEvent event);
  Namespace upsertNamespaceMeta(NamespaceName namespaceName, NamespaceMeta namespaceMeta);
  Dataset upsertDatasetMeta(NamespaceName namespaceName, DatasetName datasetName, DatasetMeta datasetMeta);
  Job upsertJob(NamespaceName namespaceName, JobName jobName, JobMeta jobMeta);
  // ...
}

What are the benefits?
- Clear interface for inserting rows into the database
- Improved maintainability and testability
- Avoids duplicating row-insertion logic
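To illustrate the delegation pattern, here is a minimal sketch of one write path through such a DAO. This is not actual Marquez code: the `NamespaceDao` interface and the simplified model types below are assumed stand-ins, used only to show how pre-processing happens once in the common layer before delegating the row insertion to an underlying DAO, regardless of whether the call originated from the REST API or the Lineage API.

```java
public class MetadataDaoSketch {
  // Simplified stand-ins for Marquez model types (assumptions, for illustration only).
  record NamespaceName(String value) {}
  record NamespaceMeta(String ownerName, String description) {}
  record Namespace(String name, String ownerName) {}

  // Hypothetical underlying DAO that performs the actual row upsert.
  interface NamespaceDao {
    Namespace upsert(String name, String ownerName);
  }

  // The common write path: validate and pre-process once, then delegate.
  static Namespace upsertNamespaceMeta(
      NamespaceDao namespaceDao, NamespaceName name, NamespaceMeta meta) {
    if (name.value().isBlank()) {
      throw new IllegalArgumentException("namespace name must not be blank");
    }
    // Example pre-processing step (normalization) shared by all callers.
    String normalized = name.value().trim().toLowerCase();
    return namespaceDao.upsert(normalized, meta.ownerName());
  }

  public static void main(String[] args) {
    NamespaceDao dao = (n, o) -> new Namespace(n, o);
    Namespace ns = upsertNamespaceMeta(
        dao, new NamespaceName("  MyTeam "), new NamespaceMeta("data-eng", "team namespace"));
    System.out.println(ns.name() + " " + ns.ownerName());
  }
}
```

Because the underlying DAO is an interface, tests can substitute an in-memory implementation, which is one way the common layer improves testability.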