This file provides guidance to Claude Code (claude.ai/claude-code) when working with this repository.
GoodData Export is a Python library for exporting GoodData workspace metadata to SQLite databases and CSV files. It fetches metrics, dashboards, visualizations, and LDM (Logical Data Model) information from the GoodData API and stores them locally for analysis.
The library supports two modes:
- API mode (default): fetches data from the GoodData API
- Local mode: processes local `layout.json` files without API calls (useful for tagging workflows on feature branches)
This is a public package. When making changes:
- Bump the version in `pyproject.toml` (single source of truth): `version = "1.0.0"  # Increment appropriately`
  Note: `__version__` in `__init__.py` is derived automatically via `importlib.metadata`
- Update `CHANGELOG.md` with the changes (follow the Keep a Changelog format)
- A git tag `vX.Y.Z` is auto-created when the PR merges to `main` (via the `create_tag.yml` workflow). To preview locally: `python scripts/create_tag.py --dry-run`
This is a public package. Before committing:
- Never commit `.env*` files (already in `.gitignore`)
- Never include real API tokens, workspace IDs, or customer data in code/tests
- Use mock data or placeholders in examples and tests
- Review diffs for accidentally exposed credentials or PII
```bash
# Full workflow: export + enrichment
make run        # or: make export-enrich

# Export only (skip post-processing)
make export

# Enrichment only (on existing database)
make enrich
make enrich DB=output/db/custom.db

# Run with CLI directly
gooddata-export export
gooddata-export enrich --db-path output/db/gooddata_export.db

# Run tests
pytest

# Format code
make ruff-format
```

```
gooddata_export/
├── __init__.py              # Public API exports
├── config.py                # ExportConfig class, environment loading
├── constants.py             # Shared constants (DEFAULT_DB_NAME, worker limits)
├── common.py                # API client utilities (get_api_client, create_api_session)
├── db.py                    # SQLite database utilities
├── post_export.py           # Post-processing orchestration, topological sort
├── export/                  # Export module (orchestration, fetching, writing)
│   ├── __init__.py          # Main orchestration (export_all_metadata)
│   ├── fetch.py             # Data fetching functions (API calls)
│   ├── writers.py           # Database/CSV writer functions (export_*)
│   └── utils.py             # Export utilities (write_to_csv, execute_with_retry)
├── process/                 # Data processing modules
│   ├── __init__.py          # Exports all process functions
│   ├── entities.py          # Entity processing (metrics, dashboards, visualizations) + fetch_child_workspaces
│   ├── layout.py            # Layout API fetching (fetch_ldm, fetch_analytics_model, fetch_users_and_user_groups)
│   ├── dashboard_traversal.py  # Dashboard widget/visualization extraction
│   ├── rich_text.py         # Rich text extraction from dashboards
│   └── common.py            # Shared utilities (sort_tags)
└── sql/                     # SQL scripts for post-export processing
    ├── post_export_config.yaml  # YAML configuration for all SQL operations
    ├── tables/              # Table creation scripts (metrics_references, etc.)
    ├── views/               # Analytical views (v_metrics_*, v_*_tags, etc.)
    ├── updates/             # Table modification scripts (duplicate detection)
    └── procedures/          # Parameterized views for API automation

scripts/
└── create_tag.py            # Auto-create git tag from pyproject.toml version (CI + manual)
```
- Export Phase (`export/` → `process/`)
  - API mode: fetches from the `analyticsModel` endpoint (parent and children)
  - Local mode: uses the provided `layout_json` directly (no API calls)
  - All data is processed in layout format (flat structure with `obj["title"]`)
  - Stores in SQLite tables: metrics, visualizations, dashboards, ldm_*, etc.
- Post-Export Phase (`post_export.py`)
  - Loads `sql/post_export_config.yaml`
  - Topologically sorts operations by dependencies
  - Executes tables → views → procedures → updates in order
  - Python populate functions run for tables needing regex (e.g., `metrics_references`)
| Table | Description |
|---|---|
| `metrics` | Metric definitions with MAQL formulas |
| `visualizations` | Visualization configurations |
| `dashboards` | Dashboard definitions |
| `metrics_references` | All metric references from MAQL - metrics, attributes, labels, facts (Python populates) |
| `metrics_ancestry` | Transitive metric-to-metric ancestry (recursive CTE) |
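Since `metrics_references` is populated by Python regex over MAQL formulas, the extraction could look roughly like this (the `{type/id}` pattern reflects MAQL reference syntax; the exact regex and function name are assumptions, not the library's actual code):

```python
# Hedged sketch of regex-based extraction of typed references from a
# MAQL formula, as the metrics_references populate step might do.
import re

# MAQL references take the form {metric/...}, {attribute/...}, {label/...}, {fact/...}
MAQL_REF = re.compile(r"\{(metric|attribute|label|fact)/([^}]+)\}")

def extract_references(maql: str) -> list[tuple[str, str]]:
    """Return (reference_type, reference_id) pairs found in a MAQL string."""
    return MAQL_REF.findall(maql)

refs = extract_references('SELECT SUM({fact/amount}) WHERE {label/region.name} = "EU"')
# refs == [('fact', 'amount'), ('label', 'region.name')]
```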
| View | Purpose |
|---|---|
| `v_metrics_relationships` | Direct metric references with titles |
| `v_metrics_relationships_ancestry` | Full ancestry with titles/tags |
| `v_metrics_relationships_root` | Root metrics (no outgoing dependencies) |
| `v_*_tags` | Unnested tags for each entity type |
| `v_*_usage` | Usage tracking views |
Both metrics and visualizations validate label references against both `ldm_labels` and `ldm_columns` (type='attribute').

Data model:

```
Attribute: id="region"          <- in ldm_columns (type='attribute')
├── Label: id="region.name"     <- in ldm_labels only
└── Label: id="region.code"     <- in ldm_labels only

Attribute: id="date.month"      <- in ldm_columns (type='attribute')
└── Label: id="date.month"      <- in ldm_labels (shares attribute ID)
```
Label IDs can be:
- Specific label IDs like `region.name` → only in `ldm_labels`
- Attribute IDs like `date.month`, where the default label shares the ID → in `ldm_columns`
- Date granularities like `process_date.day` → only in `ldm_columns`
Validation logic (same for both):

```sql
LEFT JOIN ldm_labels ll ON referenced_id = ll.id
LEFT JOIN ldm_columns lc ON referenced_id = lc.id AND lc.type = 'attribute'
WHERE ll.id IS NULL AND lc.id IS NULL  -- Invalid only if not in EITHER
```

This ensures any valid label reference is accepted regardless of whether it's a specific label ID or an attribute ID used as a default label.
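The validation can be demonstrated end to end with an in-memory SQLite database seeded with the illustrative IDs from the data model above (a self-contained sketch, not the library's code):

```python
# Demonstrates the two-LEFT-JOIN validation: a reference is invalid only
# if it appears in NEITHER ldm_labels nor ldm_columns (type='attribute').
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ldm_labels (id TEXT PRIMARY KEY);
CREATE TABLE ldm_columns (id TEXT, type TEXT);
INSERT INTO ldm_labels VALUES ('region.name'), ('region.code'), ('date.month');
INSERT INTO ldm_columns VALUES
    ('region', 'attribute'),
    ('date.month', 'attribute'),
    ('process_date.day', 'attribute');
""")

def is_invalid_reference(referenced_id: str) -> bool:
    row = conn.execute("""
        SELECT ll.id, lc.id
        FROM (SELECT ? AS referenced_id)
        LEFT JOIN ldm_labels ll ON referenced_id = ll.id
        LEFT JOIN ldm_columns lc ON referenced_id = lc.id AND lc.type = 'attribute'
    """, (referenced_id,)).fetchone()
    return row[0] is None and row[1] is None  # in neither table -> invalid

print(is_invalid_reference("region.name"))       # specific label ID -> valid (False)
print(is_invalid_reference("process_date.day"))  # date granularity -> valid (False)
print(is_invalid_reference("nonexistent"))       # -> invalid (True)
```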
Create `.env.gdcloud`:

```bash
BASE_URL=https://your-instance.gooddata.com
WORKSPACE_ID=your_workspace_id
BEARER_TOKEN=your_api_token
```

`sql/post_export_config.yaml` defines:
- `tables`: created tables (some with `python_populate` for Python processing)
- `views`: read-only analytical views
- `procedures`: parameterized views (`base_url`/`workspace_id` come from the `dictionary_metadata` CTE; only `bearer_token` is substituted)
- `updates`: table modifications with `required_columns`
Each entry has:
- `sql_file`: path to the SQL file
- `dependencies`: list of items that must run first
- `category`: grouping (tagging/usage/deduplication/procedures)
To add a new view:
- Create the SQL file in `sql/views/v_your_view.sql`
- Add it to `sql/post_export_config.yaml`:

```yaml
views:
  v_your_view:
    sql_file: views/v_your_view.sql
    description: What this view does
    category: usage
    dependencies: []  # or list dependencies
```

To add a Python-populated table:
- Create the SQL file in `sql/tables/your_table.sql` (structure only)
- Add a Python function in `post_export.py`
- Register it in the `PYTHON_POPULATE_FUNCTIONS` dict
- Add it to the YAML with `python_populate: your_function_name`
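The registration steps above might look roughly like this (the populate-function signature and dict shape are assumptions based on this document, not the actual `post_export.py` code):

```python
# Hypothetical sketch: a populate function doing work plain SQL can't
# express (e.g. regex), registered under the key used in the YAML.
import sqlite3

def populate_your_table(conn: sqlite3.Connection) -> None:
    """Populate your_table with derived rows (illustrative body)."""
    conn.execute("INSERT INTO your_table (id) VALUES ('example')")

PYTHON_POPULATE_FUNCTIONS = {
    "your_function_name": populate_your_table,
}

# The orchestrator can then look the function up by its YAML key:
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (id TEXT)")
PYTHON_POPULATE_FUNCTIONS["your_function_name"](conn)
```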
Updates modify existing tables during post-export processing.
- Create the SQL file in `sql/updates/your_update.sql`
- Add it to `sql/post_export_config.yaml`:

```yaml
updates:
  your_update:
    sql_file: updates/your_update.sql
    description: What this update does
    category: usage
    table: target_table_name
    dependencies: []
    required_columns:
      new_column: INTEGER DEFAULT 0  # Columns to add if missing
```

- Important: include the `{parent_workspace_filter}` placeholder in WHERE clauses:
```sql
-- Pattern 1: When you have no other conditions
UPDATE metrics
SET some_column = value
WHERE 1=1 {parent_workspace_filter};

-- Pattern 2: When you have existing conditions
UPDATE metrics
SET some_column = value
WHERE is_valid IS NULL {parent_workspace_filter};
```

This placeholder is replaced at runtime:
- Multi-workspace exports: `AND workspace_id = 'parent_ws_id'` (only updates the parent workspace)
- Single-workspace exports: empty string (updates all rows)
- Dependencies are resolved via topological sort (Kahn's algorithm)
- Circular dependencies raise `ValueError`
- Items without dependencies execute in alphabetical order
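An illustrative Kahn's algorithm matching the behavior listed above (ready items resolve alphabetically, a cycle raises `ValueError`); this is a sketch, not the library's actual implementation:

```python
# Kahn's algorithm: repeatedly emit nodes with no remaining
# dependencies, keeping the ready set sorted for deterministic order.
def topo_sort(deps: dict[str, list[str]]) -> list[str]:
    in_degree = {name: len(d) for name, d in deps.items()}
    dependents: dict[str, list[str]] = {name: [] for name in deps}
    for name, d in deps.items():
        for dep in d:
            dependents[dep].append(name)
    ready = sorted(n for n, deg in in_degree.items() if deg == 0)
    order: list[str] = []
    while ready:
        node = ready.pop(0)
        order.append(node)
        for dependent in dependents[node]:
            in_degree[dependent] -= 1
            if in_degree[dependent] == 0:
                ready.append(dependent)
        ready.sort()  # alphabetical order among ready items
    if len(order) != len(deps):
        raise ValueError("Circular dependency detected")
    return order

print(topo_sort({"v_b": ["t_a"], "t_a": [], "v_c": ["t_a", "v_b"]}))
# -> ['t_a', 'v_b', 'v_c']
```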
All `export_*` functions in `export/writers.py` share the same signature for uniform orchestration:

```python
def export_something(all_workspace_data, export_dir, config, db_name): ...
```

This allows `export/__init__.py` to call them in a loop:

```python
for export_func in export_functions:
    export_func(all_workspace_data, export_dir, config, db_path)
```

Important: Some functions don't use all parameters (e.g., `export_dashboards_permissions` doesn't use `config`). Use an underscore prefix (`_config`) for unused parameters and document in the docstring why it's kept. Don't remove unused parameters - it would break the uniform interface.
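A hypothetical writer following this convention (the body and data shapes are illustrative; only the four-parameter signature and the `_config` convention come from this document):

```python
# Example of the uniform export_* signature with an unused parameter.
def export_dashboards_permissions(all_workspace_data, export_dir, _config, db_name):
    """Collect dashboard permissions per workspace.

    `_config` is unused here but kept so the orchestration loop can call
    every export_* function with the same four arguments.
    """
    rows = []
    for workspace_id, data in all_workspace_data.items():
        for perm in data.get("permissions", []):
            rows.append((workspace_id, perm))
    return rows  # a real writer would persist to SQLite/CSV under export_dir

rows = export_dashboards_permissions(
    {"ws1": {"permissions": ["VIEW"]}}, "output", None, "gooddata_export.db"
)
```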
```bash
# Test imports work
python3 -c "from gooddata_export.post_export import load_post_export_config; print(load_post_export_config())"

# Run enrichment on existing DB to test SQL changes
make enrich

# Full export + enrich
make export-enrich
```

Python files must be formatted and linted with Ruff after changes:

```bash
make ruff-format
# or directly: ruff check --fix . && ruff format .
```

This project targets Python 3.14+. Use built-in generics and `|` union syntax - no `typing` imports needed:

```python
def process(items: list[str], config: dict[str, int] | None = None) -> set[str]: ...
def get_class() -> type[MyClass]: ...
def fetch(id: str | int) -> tuple[str, bool]: ...
```

Only import from `typing`: `Any`, `Never`, `TypeVar`, `TYPE_CHECKING`, `Protocol`, `Literal`, `TypedDict`.
Exception syntax: Python 3.14 allows `except A, B:` without parentheses (equivalent to `except (A, B):`). Both forms are valid.
Don't consolidate every repeated pattern. Small, simple duplications (2-3 lines appearing a few times) are often clearer than adding another abstraction layer. Consolidate when the pattern is complex (5+ lines), appears in many places (5+), or requires consistent behavior that might need updating.
Use Python's `logging` module instead of `print()` for all output:

```python
import logging

logger = logging.getLogger(__name__)

# Use logger methods instead of print()
logger.info("Processing workspace %s", workspace_id)  # Not: print(f"Processing workspace {workspace_id}")
logger.warning("Could not fetch data: %s", error)     # Not: print(f"Warning: Could not fetch data: {error}")
logger.debug("Debug info: %s", details)               # For debug-only output
```

Benefits:
- Consistent output format across the codebase
- Log levels allow filtering (INFO, WARNING, DEBUG, ERROR)
- Easier to redirect output to files or external logging systems
- `%s` formatting (not f-strings) defers string interpolation until the message is actually emitted
- SQL files use `DROP ... IF EXISTS` then `CREATE`
- SQL comments explain the file's purpose at the top
- Table naming convention: use the plural form for grouping
  - Main tables: `dashboards`, `metrics`, `visualizations`
  - Junction tables: `dashboards_visualizations`, `dashboards_metrics`, `dashboards_permissions`
- View naming convention: `v_{table_plural}_{suffix}` - views are grouped by table name
  - `v_dashboards_tags` (dashboards group)
  - `v_metrics_tags`, `v_metrics_usage`, `v_metrics_relationships` (metrics group)
  - `v_visualizations_tags`, `v_visualizations_usage` (visualizations group)