This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
DataHub GMA (General Metadata Architecture) is the backend framework for LinkedIn's DataHub metadata search & discovery platform. It provides type-safe, schema-driven metadata management with event-driven data consistency across multiple storage backends (SQL via Ebean, Elasticsearch for search). Built on LinkedIn's Pegasus data framework and Rest.li for REST APIs.
Java version: 1.8 (Java 8) — required, newer versions will cause build failures.
./gradlew build # Full build + all tests
./gradlew :module-name:build # Build a specific module (e.g., :dao-api, :dao-impl:ebean-dao)
./gradlew :module-name:test # Run tests for a specific module
./gradlew spotlessCheck # Check code formatting (CI enforced)
./gradlew spotlessApply # Auto-fix formatting
./gradlew checkstyleMain # Run checkstyle on main sources
./gradlew idea # Generate IntelliJ project filesTests use TestNG (not JUnit) as the default test framework across all subprojects. The elasticsearch integration tests require Docker.
Apple Silicon (M-series Mac): Requires brew install mariadb and uncommenting three lines in
EmbeddedMariaInstance.java (see docs/developers.md).
core-models/ → Pegasus PDL schemas: Urn, AuditStamp, Url, Time (no Java logic)
core-models-utils/ → URN utility helpers
dao-api/ → DAO abstractions (BaseLocalDAO, BaseSearchDAO, BaseBrowseDAO),
event producers, query utilities, retention policies
dao-impl/
ebean-dao/ → SQL storage via Ebean ORM (EbeanLocalDAO, relationship queries)
elasticsearch-dao/ → ES 5.x/6.x search implementation
elasticsearch-dao-7/→ ES 7.x search implementation
restli-resources/ → Rest.li resource base classes (BaseEntityResource,
BaseSearchableEntityResource) mapping DAOs to REST endpoints
validators/ → Schema validators ensuring PDL models conform to GMA conventions
(AspectValidator, EntityValidator, SnapshotValidator, etc.)
gradle-plugins/ → Annotation parsing (@gma) and code generation for metadata events
testing/ → Test infrastructure, ES integration test harness, test models
- Urn (Universal Resource Name):
urn:li:entityType:entityKey— the universal identifier for all entities. Typed URN subclasses provide entity-specific keys. - Aspect Union Pattern: Each entity type defines a Pegasus union of its supported aspects. Validators enforce that union members are record types only.
- Aspect Versioning: Version 0 = latest. Each aspect write creates a new immutable version. Retention policies (indefinite, time-based, version-based) control history.
- Layered Storage: BaseLocalDAO (SQL, source of truth) → BaseSearchDAO (Elasticsearch, derived index) → BaseBrowseDAO (hierarchical navigation). BaseRemoteDAO proxies to other GMS instances.
- Event Sourcing: Writes to LocalDAO trigger MCE/MAE event emission via BaseMetadataEventProducer. The
gradle-pluginsauto-generate event PDL schemas from@gmaannotations on aspect PDL files. - Generic Type Binding: DAOs are heavily parameterized with generics (
<ASPECT_UNION, URN>) and validate type constraints at construction time using reflection viaModelUtils.
PDL (Pegasus Data Language) schemas live in src/main/pegasus/ directories and compile to Java RecordTemplate
classes. Key namespaces:
com.linkedin.common.*— Core types (Urn, AuditStamp)com.linkedin.metadata.aspect.*— Aspect wrapperscom.linkedin.metadata.query.*— Search/filter structurescom.linkedin.metadata.events.*— Change tracking typescom.linkedin.metadata.snapshot.*— Entity snapshots (versioned aspect collections)
When modifying PDL schemas, the Pegasus gradle plugin regenerates Java bindings automatically during build.
Follow Conventional Commits: <type>(scope): description
Types: feat, fix, refactor, docs, test, perf, style, build, ci
Max line length: 88 characters. Use imperative present tense, no capitalized first letter, no trailing dot.