Commit 7a406a8

fix: align GraphQL queries with upstream DataHub schema (#121)
Validated all 59 GraphQL query/mutation constants against the upstream DataHub schema files (datahub-project/datahub v1.5.0.1) and fixed field paths that did not match the actual API:

- documents.go: relatedAssets/relatedDocuments/parentDocument use wrapper objects (asset/document) rather than direct urn fields
- structured_properties.go: fragment target type is StructuredProperties
- data_contracts.go: DataContract uses properties/status, not result()
- semantic_search.go: use searchAcrossEntities with SearchAcrossEntitiesInput

Adds schema validation infrastructure to prevent future drift:

- testdata/datahub-schema/: 31 .graphql files from upstream v1.5.0.1
- testdata/datahub-schema/sync.sh: downloads schema for any tagged version
- schema_validation_test.go: validates all queries against schema files
- make schema-sync / make schema-check (included in make verify)

Updates CLAUDE.md with version compatibility matrix and schema workflow.
1 parent fef02a4 commit 7a406a8

46 files changed: +25426 -318 lines changed

CLAUDE.md

Lines changed: 25 additions & 7 deletions
@@ -236,21 +236,38 @@ All tools accept an optional `connection` parameter to target a specific server.
 
 ## DataHub API Compatibility
 
-**Minimum version: DataHub 1.3.x. Full feature set: DataHub 1.4.x.**
+**Schema source: `testdata/datahub-schema/` (synced from [datahub-project/datahub](https://github.com/datahub-project/datahub))**
 
-| DataHub Version | Features Available |
-|---|---|
-| 1.3.x+ (minimum) | All read tools, all write operations except documents (tags, domains, glossary, data products, queries, owners, links, descriptions, incidents, applications, structured properties incl. delete, data contracts) |
-| 1.4.x+ (full) | + Documents (create/update/delete) |
+**Current schema version: v1.5.0.1. Minimum supported: v1.3.x. Full feature set: v1.4.x+.**
+
+### Version Compatibility Matrix
+
+| DataHub Version | Features Available | Schema Validated |
+|---|---|---|
+| 1.3.x+ (minimum) | All read tools, all write operations except documents (tags, domains, glossary, data products, queries, owners, links, descriptions, incidents, applications, structured properties incl. delete, data contracts) | No (pre-dates schema sync) |
+| 1.4.x+ (full) | + Documents (create/update/delete), semantic search | Yes (v1.4.0.3) |
+| 1.5.x+ (current) | + Batch data product operations | Yes (v1.5.0.1) |
+
+### Schema Validation
+
+All GraphQL queries are validated against the upstream DataHub schema files:
+
+```bash
+make schema-sync # Download schema for pinned version
+make schema-check # Validate all queries against schema
+DATAHUB_VERSION=v1.5.0.1 make schema-sync # Target a specific version
+```
+
+Schema files are checked into `testdata/datahub-schema/` and `make schema-check` runs as part of `make verify`. When adding or modifying GraphQL queries, **always check the upstream `.graphql` files first** — never guess field names or type structures.
+
+### Graceful Degradation
 
 The client handles variations across DataHub versions gracefully:
 - Uses search fallback when `listDataProducts` query unavailable
 - Returns empty results (not errors) when usage stats not configured
 - Returns empty results (not errors) when incidents or structured properties are unavailable
 - Parses properties from different response structures
 
-When adding new queries, test against actual DataHub instances as GraphQL schemas vary between versions.
-
 ## Verification (AI-Verified Development)
 
 Run the full verification suite before every commit:
@@ -265,6 +282,7 @@ make test # go test -race -shuffle=on ./...
 make coverage # Coverage report (threshold: 80%)
 make patch-coverage # Coverage of changed lines only (threshold: 80%)
 make security # gosec + govulncheck
+make schema-check # Validate GraphQL queries against upstream schema
 make mutation # gremlins (threshold: 60%)
 make deadcode # deadcode (unreachable functions)
 make build-check # go build + go mod verify
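
The "Graceful Degradation" bullets in the CLAUDE.md diff describe two behaviours: fall back to another query when one is unavailable, and return an empty result instead of an error when a feature is missing. The Go sketch below shows one way such behaviour could be structured. Everything in it (`errQueryUnsupported`, `listFunc`, `withFallback`, `emptyIfUnsupported`, the sample URN) is hypothetical and chosen for illustration; the commit does not show how the client actually detects an unsupported query.

```go
package main

import (
	"errors"
	"fmt"
)

// errQueryUnsupported is a hypothetical sentinel meaning "the connected
// DataHub version does not offer this query or feature".
var errQueryUnsupported = errors.New("query not supported by this DataHub version")

// listFunc is a hypothetical signature for any call that lists URNs.
type listFunc func() ([]string, error)

// withFallback runs primary and only switches to fallback when primary
// reports errQueryUnsupported. Any other error is returned unchanged.
func withFallback(primary, fallback listFunc) ([]string, error) {
	items, err := primary()
	if errors.Is(err, errQueryUnsupported) {
		return fallback()
	}
	return items, err
}

// emptyIfUnsupported turns "feature unavailable" into an empty, non-nil
// result so callers do not have to treat it as a failure.
func emptyIfUnsupported(fetch listFunc) ([]string, error) {
	items, err := fetch()
	if errors.Is(err, errQueryUnsupported) {
		return []string{}, nil
	}
	return items, err
}

func main() {
	// Simulates a server without the listDataProducts query.
	unsupported := func() ([]string, error) {
		return nil, fmt.Errorf("listDataProducts: %w", errQueryUnsupported)
	}
	// Simulates the search-based fallback.
	viaSearch := func() ([]string, error) {
		return []string{"urn:li:dataProduct:example"}, nil
	}

	products, err := withFallback(unsupported, viaSearch)
	fmt.Println(products, err)

	incidents, err := emptyIfUnsupported(unsupported)
	fmt.Println(len(incidents), err)
}
```

Matching on a sentinel with `errors.Is` keeps genuine failures (network errors, permission errors) visible, so only the "not available on this version" case is softened.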

Makefile

Lines changed: 13 additions & 2 deletions
@@ -1,5 +1,5 @@
 .PHONY: all build test lint clean coverage security help tidy verify fmt lint-fix test-integration \
-	patch-coverage mutation deadcode bench profile build-check
+	patch-coverage mutation deadcode bench profile build-check schema-sync schema-check
 
 GOCMD=go
 GOBUILD=$(GOCMD) build
@@ -116,6 +116,15 @@ build-check:
 	$(GOCMD) build ./...
 	$(GOCMD) mod verify
 
+## Schema validation against upstream DataHub GraphQL schema
+DATAHUB_VERSION ?= $(shell cat testdata/datahub-schema/VERSION 2>/dev/null || echo "v1.5.0.1")
+
+schema-sync:
+	@./testdata/datahub-schema/sync.sh $(DATAHUB_VERSION)
+
+schema-check:
+	$(GOTEST) -race -run TestGraphQLQueriesMatchSchema ./pkg/client/...
+
 tidy:
 	$(GOCMD) mod tidy
 	$(GOCMD) mod verify
@@ -124,7 +133,7 @@ clean:
 	rm -f $(BINARY_NAME) $(COVERAGE_FILE) coverage.html bench.txt cpu.prof mem.prof
 	$(GOCMD) clean -cache -testcache
 
-verify: tidy lint test coverage patch-coverage security deadcode build-check
+verify: tidy lint test coverage patch-coverage security schema-check deadcode build-check
 	@echo "All verification checks passed."
 
 help:
@@ -147,5 +156,7 @@ help:
 	@echo " build-check - Verify build and modules"
 	@echo " tidy - Tidy and verify modules"
 	@echo " clean - Remove build artifacts"
+	@echo " schema-sync - Download DataHub GraphQL schema files"
+	@echo " schema-check - Validate queries against schema"
 	@echo " verify - Run full verification suite"
 	@echo " help - Show this help"
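
The `DATAHUB_VERSION ?=` line in the Makefile diff resolves the schema version in three steps: a value supplied by the caller wins, otherwise the contents of `testdata/datahub-schema/VERSION` are used, otherwise the literal `v1.5.0.1`. The Go sketch below approximates that precedence for readers less familiar with make. `resolveVersion` and `defaultVersion` are names invented for this example, and the sketch treats an empty override as "not set", which is a simplification of how `?=` behaves.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// defaultVersion matches the fallback literal in the Makefile diff.
const defaultVersion = "v1.5.0.1"

// resolveVersion approximates the Makefile precedence:
//  1. an explicit override (assumption: empty means "not set"),
//  2. the trimmed contents of the VERSION file,
//  3. the hard-coded default when the file cannot be read.
func resolveVersion(override, versionFile string) string {
	if override != "" {
		return override
	}
	data, err := os.ReadFile(versionFile)
	if err != nil {
		return defaultVersion
	}
	return strings.TrimSpace(string(data))
}

func main() {
	version := resolveVersion(os.Getenv("DATAHUB_VERSION"), "testdata/datahub-schema/VERSION")
	fmt.Println("schema version:", version)
}
```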

pkg/client/data_contracts.go

Lines changed: 64 additions & 48 deletions
@@ -8,27 +8,35 @@ import (
 
 // GraphQL query for data contracts (DataHub 1.3.x+).
 const (
-	// GetDataContractQuery retrieves the data contract status for a dataset.
+	// GetDataContractQuery retrieves the data contract for a dataset.
+	// DataContract has properties (assertion references by category) and
+	// status (overall state). Each category contains assertions with URNs.
 	GetDataContractQuery = `
 		query getDataContract($urn: String!) {
 			dataset(urn: $urn) {
 				contract {
-					result(refresh: false) {
-						type
-						assertionResults {
+					urn
+					properties {
+						entityUrn
+						freshness {
 							assertion {
 								urn
 							}
-							type
-							result {
-								type
-								nativeResults {
-									key
-									value
-								}
+						}
+						schema {
+							assertion {
+								urn
+							}
+						}
+						dataQuality {
+							assertion {
+								urn
 							}
 						}
 					}
+					status {
+						state
+					}
 				}
 			}
 		}
@@ -44,9 +52,7 @@ func (c *Client) GetDataContract(ctx context.Context, datasetURN string) (*types
 
 	var response struct {
 		Dataset struct {
-			Contract *struct {
-				Result *contractResultEntry `json:"result"`
-			} `json:"contract"`
+			Contract *contractGQLResponse `json:"contract"`
 		} `json:"dataset"`
 	}
 
@@ -56,54 +62,64 @@ func (c *Client) GetDataContract(ctx context.Context, datasetURN string) (*types
 		return nil, nil
 	}
 
-	if response.Dataset.Contract == nil || response.Dataset.Contract.Result == nil {
+	if response.Dataset.Contract == nil {
 		return nil, nil
 	}
 
-	return response.Dataset.Contract.Result.toContract(), nil
+	return response.Dataset.Contract.toContract(), nil
 }
 
-// contractResultEntry maps the GraphQL contract result response.
-type contractResultEntry struct {
-	Type string `json:"type"`
-	AssertionResults []assertionResultGQLEntry `json:"assertionResults"`
+// contractGQLResponse maps the GraphQL DataContract response.
+type contractGQLResponse struct {
+	URN string `json:"urn"`
+	Properties *struct {
+		EntityURN string `json:"entityUrn"`
+		Freshness []contractAssertionRef `json:"freshness"`
+		Schema []contractAssertionRef `json:"schema"`
+		DataQuality []contractAssertionRef `json:"dataQuality"`
+	} `json:"properties"`
+	Status *struct {
+		State string `json:"state"`
+	} `json:"status"`
 }
 
-// assertionResultGQLEntry maps a single assertion result from GraphQL.
-type assertionResultGQLEntry struct {
+// contractAssertionRef maps an assertion reference within a contract category.
+type contractAssertionRef struct {
 	Assertion struct {
 		URN string `json:"urn"`
 	} `json:"assertion"`
-	Type string `json:"type"`
-	Result struct {
-		Type string `json:"type"`
-		NativeResults []struct {
-			Key string `json:"key"`
-			Value string `json:"value"`
-		} `json:"nativeResults"`
-	} `json:"result"`
 }
 
-func (r *contractResultEntry) toContract() *types.DataContract {
-	contract := &types.DataContract{
-		Status: r.Type,
+func (r *contractGQLResponse) toContract() *types.DataContract {
+	contract := &types.DataContract{}
+
+	if r.Status != nil {
+		contract.Status = r.Status.State
+	}
+
+	if r.Properties == nil {
+		return contract
+	}
+
+	for _, a := range r.Properties.Freshness {
+		contract.AssertionResults = append(contract.AssertionResults, types.AssertionResult{
+			AssertionURN: a.Assertion.URN,
+			Type: "FRESHNESS",
+		})
+	}
+
+	for _, a := range r.Properties.Schema {
+		contract.AssertionResults = append(contract.AssertionResults, types.AssertionResult{
+			AssertionURN: a.Assertion.URN,
+			Type: "SCHEMA",
+		})
 	}
 
-	for _, ar := range r.AssertionResults {
-		result := types.AssertionResult{
-			AssertionURN: ar.Assertion.URN,
-			Type: ar.Type,
-			ResultType: ar.Result.Type,
-		}
-
-		if len(ar.Result.NativeResults) > 0 {
-			result.NativeResults = make(map[string]string, len(ar.Result.NativeResults))
-			for _, nr := range ar.Result.NativeResults {
-				result.NativeResults[nr.Key] = nr.Value
-			}
-		}
-
-		contract.AssertionResults = append(contract.AssertionResults, result)
+	for _, a := range r.Properties.DataQuality {
+		contract.AssertionResults = append(contract.AssertionResults, types.AssertionResult{
+			AssertionURN: a.Assertion.URN,
+			Type: "DATA_QUALITY",
+		})
 	}
 
 	return contract
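
To make the new response shape concrete, the standalone sketch below decodes a sample payload and flattens the three assertion categories in the same order as `toContract` in the diff above. The types `assertionRef`, `contractResponse`, and `flatAssertion`, the helper `decodeAndFlatten`, the sample URNs, and the `ACTIVE` state value are stand-ins made up for this example; they are not the package's real types or confirmed API values.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// assertionRef is an illustrative stand-in for contractAssertionRef.
type assertionRef struct {
	Assertion struct {
		URN string `json:"urn"`
	} `json:"assertion"`
}

// contractResponse is an illustrative stand-in for contractGQLResponse.
// Pointer fields stay nil when the server omits them or sends null.
type contractResponse struct {
	URN        string `json:"urn"`
	Properties *struct {
		Freshness   []assertionRef `json:"freshness"`
		Schema      []assertionRef `json:"schema"`
		DataQuality []assertionRef `json:"dataQuality"`
	} `json:"properties"`
	Status *struct {
		State string `json:"state"`
	} `json:"status"`
}

// flatAssertion is a hypothetical, reduced form of types.AssertionResult.
type flatAssertion struct {
	URN  string
	Type string
}

// decodeAndFlatten parses a contract payload and returns its state plus one
// entry per referenced assertion, tagged with the category it came from.
func decodeAndFlatten(payload string) (string, []flatAssertion, error) {
	var r contractResponse
	if err := json.Unmarshal([]byte(payload), &r); err != nil {
		return "", nil, err
	}

	state := ""
	if r.Status != nil {
		state = r.Status.State
	}
	if r.Properties == nil {
		return state, nil, nil
	}

	var out []flatAssertion
	add := func(refs []assertionRef, category string) {
		for _, ref := range refs {
			out = append(out, flatAssertion{URN: ref.Assertion.URN, Type: category})
		}
	}
	add(r.Properties.Freshness, "FRESHNESS")
	add(r.Properties.Schema, "SCHEMA")
	add(r.Properties.DataQuality, "DATA_QUALITY")
	return state, out, nil
}

func main() {
	payload := `{
		"urn": "urn:li:dataContract:example",
		"properties": {
			"freshness": [{"assertion": {"urn": "urn:li:assertion:f1"}}],
			"schema": [],
			"dataQuality": [{"assertion": {"urn": "urn:li:assertion:q1"}}]
		},
		"status": {"state": "ACTIVE"}
	}`

	state, assertions, err := decodeAndFlatten(payload)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println("state:", state)
	for _, a := range assertions {
		fmt.Println(a.Type, a.URN)
	}
}
```

Because `properties` and `status` are pointers, a contract that lacks either one still decodes cleanly, which is why the mapping checks both for nil before reading them.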
