diff --git a/api-reference/authentication.mdx b/api-reference/authentication.mdx new file mode 100644 index 00000000..539cfa87 --- /dev/null +++ b/api-reference/authentication.mdx @@ -0,0 +1,224 @@ +--- +title: Authentication +description: Authenticate your API requests using JWT Bearer tokens +sidebarTitle: Authentication +mode: "wide" +--- + +# Authentication + +The Collate API uses JWT (JSON Web Token) authentication. All API requests must include a valid token in the `Authorization` header. + +## Obtaining a Token + +There are two ways to obtain an API token: + +### Bot Token (Recommended for Automation) + +Bot tokens are ideal for service accounts, CI/CD pipelines, and automated integrations. + +1. Navigate to **Settings > Bots** in the Collate UI +2. Click **Add Bot** or select an existing bot +3. Under **Token**, click **Generate Token** +4. Copy and securely store the generated JWT token + + +Bot tokens have the permissions assigned to the bot's role. Ensure the bot has appropriate roles for your use case. + + +### Personal Access Token + +Personal access tokens are tied to your user account and inherit your permissions. + +1. Click your profile icon in the top-right corner +2. Select **Access Tokens** +3. Click **Generate New Token** +4. Set an expiration date and click **Generate** +5. Copy and securely store the token + + +Personal access tokens cannot be retrieved after creation. Store them securely immediately after generation. 
+ + + ## Using the Token + +Include the token in the `Authorization` header of all API requests: + +``` +Authorization: Bearer <token> +``` + +### Examples + + + +```python +from metadata.ingestion.ometa.ometa_api import OpenMetadata +from metadata.generated.schema.entity.data.table import Table +from metadata.generated.schema.entity.services.connections.metadata.openMetadataConnection import ( + OpenMetadataConnection, +) +from metadata.generated.schema.security.client.openMetadataJWTClientConfig import ( + OpenMetadataJWTClientConfig, +) + +# Configure with JWT token +server_config = OpenMetadataConnection( + hostPort="https://your-company.getcollate.io/api", + authProvider="openmetadata", + securityConfig=OpenMetadataJWTClientConfig( + jwtToken="eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9..." + ), +) + +# Create authenticated client +metadata = OpenMetadata(server_config) + +# All subsequent calls are authenticated +tables = metadata.list_all_entities(entity=Table) +``` + + + +```java +import org.openmetadata.client.gateway.OpenMetadata; +import org.openmetadata.schema.services.connections.metadata.AuthProvider; +import org.openmetadata.schema.services.connections.metadata.OpenMetadataConnection; +import org.openmetadata.schema.security.client.OpenMetadataJWTClientConfig; + +// Configure with JWT token +OpenMetadataConnection config = new OpenMetadataConnection(); +config.setHostPort("https://your-company.getcollate.io/api"); +config.setAuthProvider(AuthProvider.OPENMETADATA); + +OpenMetadataJWTClientConfig jwtConfig = new OpenMetadataJWTClientConfig(); +jwtConfig.setJwtToken("eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9..."); +config.setSecurityConfig(jwtConfig); + +// Create authenticated client +OpenMetadata client = new OpenMetadata(config); +``` + + + +```bash +# Include token in Authorization header +curl -X GET "https://your-company.getcollate.io/api/v1/tables" \ + -H "Authorization: Bearer eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9..."
\ + -H "Content-Type: application/json" +``` + + + +## Token Structure + +Collate JWT tokens contain the following claims: + +| Claim | Description | +|-------|-------------| +| `sub` | Subject - username or bot name | +| `iss` | Issuer - `open-metadata.org` | +| `roles` | Array of assigned roles | +| `email` | User or bot email | +| `isBot` | Boolean indicating if token is for a bot | +| `tokenType` | `BOT` or `PERSONAL_ACCESS` | +| `iat` | Issued at timestamp | +| `exp` | Expiration timestamp (null for non-expiring bot tokens) | + +Example decoded token payload: + +```json +{ + "iss": "open-metadata.org", + "sub": "ingestion-bot", + "roles": ["IngestionBotRole"], + "email": "ingestion-bot@open-metadata.org", + "isBot": true, + "tokenType": "BOT", + "iat": 1704067200, + "exp": null +} +``` + +## Authentication Errors + +| Error | Status Code | Description | +|-------|-------------|-------------| +| Missing token | `401` | No Authorization header provided | +| Invalid token | `401` | Token is malformed or signature invalid | +| Expired token | `401` | Token has passed its expiration time | +| Insufficient permissions | `403` | Token lacks required role/permission | + +### Error Response Format + +```json +{ + "code": 401, + "message": "Token has expired" +} +``` + +## Security Best Practices + + + + Create dedicated bot accounts for each integration rather than using personal tokens. + + + Set expiration dates on personal access tokens and rotate bot tokens periodically. + + + Assign only the minimum required roles to bots and service accounts. + + + Use environment variables or secret managers. Never commit tokens to source control. + + + Review audit logs to track API usage and detect anomalies. 
+ + + +## Environment Variables + +For convenience, you can keep the connection settings in environment variables: + +```bash +# Set your Collate host +export OPENMETADATA_HOST=https://your-company.getcollate.io/api + +# Set your JWT token +export OPENMETADATA_JWT_TOKEN=eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9... +``` + +Read these variables when you build the Python client configuration so the token stays out of your source code: + +```python +import os + +from metadata.ingestion.ometa.ometa_api import OpenMetadata +from metadata.generated.schema.entity.services.connections.metadata.openMetadataConnection import ( + OpenMetadataConnection, +) +from metadata.generated.schema.security.client.openMetadataJWTClientConfig import ( + OpenMetadataJWTClientConfig, +) + +# Read the host and token from the environment +config = OpenMetadataConnection( + hostPort=os.getenv("OPENMETADATA_HOST"), + authProvider="openmetadata", + securityConfig=OpenMetadataJWTClientConfig( + jwtToken=os.getenv("OPENMETADATA_JWT_TOKEN") + ), +) +metadata = OpenMetadata(config) +``` + +## SSO Integration + +Collate supports SSO authentication providers for the UI. For API access, you still need to use JWT tokens, but users authenticated via SSO can generate personal access tokens from their profile. + +Supported SSO providers: +- Okta +- Azure AD +- Google +- Auth0 +- Custom OIDC +- SAML +- LDAP + + + Configure Single Sign-On for your organization + diff --git a/api-reference/core/entities.mdx b/api-reference/core/entities.mdx new file mode 100644 index 00000000..d2d41bc5 --- /dev/null +++ b/api-reference/core/entities.mdx @@ -0,0 +1,441 @@ +--- +title: Entity Model +description: Understanding the Collate entity model and hierarchical relationships +sidebarTitle: Entities +mode: "wide" +--- + +# Entity Model + +Collate uses a hierarchical entity model to represent metadata. Understanding this model is essential for working with the API effectively. + + +Collate follows the [OpenMetadata Standard](https://openmetadatastandards.org/metadata-specifications/overview/) for all entity definitions, ensuring interoperability and consistent metadata schemas across platforms.
+ + +## Entity Hierarchy + +Data assets in Collate follow a strict hierarchical structure. To create a table, you must first create the parent entities in order: + + + + A service represents a connection to a data source (e.g., Snowflake, BigQuery, PostgreSQL). + + + A database is a container within the service (e.g., `analytics`, `production`). + + + A schema organizes tables within a database (e.g., `public`, `sales`). + + + Tables belong to schemas and contain columns with metadata. + + + +### Hierarchy Diagram + +``` +DatabaseService (e.g., snowflake_prod) +├── Database (e.g., analytics_db) +│ ├── DatabaseSchema (e.g., public) +│ │ ├── Table (e.g., customers) +│ │ │ ├── Column (e.g., customer_id) +│ │ │ ├── Column (e.g., email) +│ │ │ └── Column (e.g., created_at) +│ │ ├── Table (e.g., orders) +│ │ └── Table (e.g., products) +│ └── DatabaseSchema (e.g., staging) +│ └── Table (e.g., raw_events) +└── Database (e.g., warehouse_db) + └── DatabaseSchema (e.g., dbt_models) + └── Table (e.g., dim_customers) +``` + +### Service Hierarchies + +Each service type has its own hierarchy: + +| Service Type | Hierarchy | +|--------------|-----------| +| **Database** | DatabaseService → Database → DatabaseSchema → Table → Column | +| **Dashboard** | DashboardService → Dashboard → Chart | +| **Pipeline** | PipelineService → Pipeline → Task | +| **Messaging** | MessagingService → Topic | +| **ML Model** | MlModelService → MlModel | +| **Storage** | StorageService → Container | + +## Complete Entity Creation Example + +Here's how to create the full hierarchy from service to table: + + + +```python +from metadata.ingestion.ometa.ometa_api import OpenMetadata +from metadata.generated.schema.api.services.createDatabaseService import CreateDatabaseServiceRequest +from metadata.generated.schema.api.data.createDatabase import CreateDatabaseRequest +from metadata.generated.schema.api.data.createDatabaseSchema import CreateDatabaseSchemaRequest +from 
metadata.generated.schema.api.data.createTable import CreateTableRequest +from metadata.generated.schema.entity.services.databaseService import ( + DatabaseService, + DatabaseServiceType, + DatabaseConnection, +) +from metadata.generated.schema.entity.services.connections.database.common.basicAuth import BasicAuth +from metadata.generated.schema.entity.services.connections.database.mysqlConnection import MysqlConnection +from metadata.generated.schema.entity.data.table import Column, DataType + +# config: the OpenMetadataConnection built in the Authentication guide +metadata = OpenMetadata(config) + +# Step 1: Create Database Service +service_request = CreateDatabaseServiceRequest( + name="mysql_analytics", + serviceType=DatabaseServiceType.Mysql, + connection=DatabaseConnection( + config=MysqlConnection( + username="admin", + authType=BasicAuth(password="secret"), + hostPort="mysql.example.com:3306", + ) + ), + description="MySQL analytics database" +) +service = metadata.create_or_update(data=service_request) +print(f"Created service: {service.fullyQualifiedName}") + +# Step 2: Create Database +database_request = CreateDatabaseRequest( + name="analytics", + service=service.fullyQualifiedName, + description="Analytics data warehouse" +) +database = metadata.create_or_update(data=database_request) +print(f"Created database: {database.fullyQualifiedName}") + +# Step 3: Create Database Schema +schema_request = CreateDatabaseSchemaRequest( + name="public", + database=database.fullyQualifiedName, + description="Public schema for production tables" +) +schema = metadata.create_or_update(data=schema_request) +print(f"Created schema: {schema.fullyQualifiedName}") + +# Step 4: Create Table +table_request = CreateTableRequest( + name="customers", + databaseSchema=schema.fullyQualifiedName, + columns=[ + Column(name="customer_id", dataType=DataType.BIGINT, description="Unique customer identifier"),
+ Column(name="email", dataType=DataType.VARCHAR, dataLength=255, description="Customer email"), + Column(name="name", dataType=DataType.VARCHAR, dataLength=100, description="Full name"), + Column(name="created_at", dataType=DataType.TIMESTAMP, description="Account creation timestamp"), + ], + description="Customer master data table" +) +table = metadata.create_or_update(data=table_request) +print(f"Created table: {table.fullyQualifiedName}") +# Output: mysql_analytics.analytics.public.customers +``` + + + +```bash +# Step 1: Create Database Service +curl -X PUT "https://your-company.getcollate.io/api/v1/services/databaseServices" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json" \ + -d '{ + "name": "mysql_analytics", + "serviceType": "Mysql", + "connection": { + "config": { + "type": "Mysql", + "username": "admin", + "authType": {"password": "secret"}, + "hostPort": "mysql.example.com:3306" + } + }, + "description": "MySQL analytics database" + }' + +# Step 2: Create Database +curl -X PUT "https://your-company.getcollate.io/api/v1/databases" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json" \ + -d '{ + "name": "analytics", + "service": "mysql_analytics", + "description": "Analytics data warehouse" + }' + +# Step 3: Create Database Schema +curl -X PUT "https://your-company.getcollate.io/api/v1/databaseSchemas" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json" \ + -d '{ + "name": "public", + "database": "mysql_analytics.analytics", + "description": "Public schema for production tables" + }' + +# Step 4: Create Table +curl -X PUT "https://your-company.getcollate.io/api/v1/tables" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json" \ + -d '{ + "name": "customers", + "databaseSchema": "mysql_analytics.analytics.public", + "columns": [ + {"name": "customer_id", "dataType": "BIGINT", "description": "Unique customer identifier"}, + {"name": "email", "dataType": 
"VARCHAR", "dataLength": 255, "description": "Customer email"}, + {"name": "name", "dataType": "VARCHAR", "dataLength": 100, "description": "Full name"}, + {"name": "created_at", "dataType": "TIMESTAMP", "description": "Account creation timestamp"} + ], + "description": "Customer master data table" + }' +``` + + + +## Entity Types Reference + +### Services + +Services represent connections to external data sources. + +| Service | Endpoint | Description | +|---------|----------|-------------| +| DatabaseService | `/v1/services/databaseServices` | Snowflake, BigQuery, PostgreSQL, MySQL, etc. | +| DashboardService | `/v1/services/dashboardServices` | Tableau, Looker, Superset, etc. | +| PipelineService | `/v1/services/pipelineServices` | Airflow, Dagster, Prefect, etc. | +| MessagingService | `/v1/services/messagingServices` | Kafka, Pulsar, Kinesis, etc. | +| MlModelService | `/v1/services/mlmodelServices` | MLflow, SageMaker, etc. | +| StorageService | `/v1/services/storageServices` | S3, GCS, ADLS, etc. 
| + +### Data Assets + +| Entity | Endpoint | Parent | Description | +|--------|----------|--------|-------------| +| Database | `/v1/databases` | DatabaseService | Database container | +| DatabaseSchema | `/v1/databaseSchemas` | Database | Schema within database | +| Table | `/v1/tables` | DatabaseSchema | Tables and views | +| Dashboard | `/v1/dashboards` | DashboardService | BI dashboards | +| Pipeline | `/v1/pipelines` | PipelineService | Data pipelines | +| Topic | `/v1/topics` | MessagingService | Message topics | +| Container | `/v1/containers` | StorageService | Storage containers | +| MlModel | `/v1/mlmodels` | MlModelService | ML models | + +### Governance + +| Entity | Endpoint | Description | +|--------|----------|-------------| +| Glossary | `/v1/glossaries` | Business glossary container | +| GlossaryTerm | `/v1/glossaryTerms` | Individual glossary terms | +| Classification | `/v1/classifications` | Tag category/taxonomy | +| Tag | `/v1/tags` | Individual tags within classification | +| Domain | `/v1/domains` | Business domain | +| DataProduct | `/v1/dataProducts` | Curated data product | + +## Entity Structure + +All entities share common fields: + +```json +{ + "id": "550e8400-e29b-41d4-a716-446655440000", + "name": "customers", + "fullyQualifiedName": "mysql_prod.analytics.public.customers", + "displayName": "Customer Master Data", + "description": "Primary customer information table", + "version": 0.4, + "updatedAt": 1704067200000, + "updatedBy": "admin", + "href": "https://your-company.getcollate.io/api/v1/tables/550e8400...", + "owner": { ... }, + "tags": [ ... ], + "followers": [ ... 
], + "deleted": false +} +``` + +### Common Fields + +| Field | Type | Description | +|-------|------|-------------| +| `id` | UUID | Unique identifier | +| `name` | string | Entity name (unique within parent scope) | +| `fullyQualifiedName` | string | Globally unique hierarchical name | +| `displayName` | string | Human-friendly display name | +| `description` | string | Markdown description | +| `version` | number | Entity version (increments on changes) | +| `updatedAt` | timestamp | Last update time (milliseconds) | +| `updatedBy` | string | Username who last updated | +| `href` | URL | API URL to access this entity | +| `owner` | EntityReference | Owner (user or team) | +| `tags` | TagLabel[] | Applied tags and classifications | +| `domain` | EntityReference | Associated business domain | +| `deleted` | boolean | Soft-delete status | + +## Entity References + +When entities reference other entities, they use `EntityReference` objects: + +```json +{ + "id": "550e8400-e29b-41d4-a716-446655440000", + "type": "user", + "name": "john.doe", + "fullyQualifiedName": "john.doe", + "displayName": "John Doe", + "href": "https://your-company.getcollate.io/api/v1/users/550e8400..." 
+} +``` + +### Creating References + +When setting owners, tags, or other references: + + + +```json +{ + "owner": { + "id": "550e8400-e29b-41d4-a716-446655440000", + "type": "user" + } +} +``` + + + +```json +{ + "owner": { + "fullyQualifiedName": "john.doe", + "type": "user" + } +} +``` + + + +## Updating Entity Fields + +Each entity field can be updated using JSON Patch operations: + +### Update Description + + + +```python +from metadata.generated.schema.entity.data.table import Table + +table = metadata.get_by_name(entity=Table, fqn="mysql_prod.analytics.public.customers") +metadata.patch_description(entity=Table, source=table, description="New description") +``` + + + +```bash +curl -X PATCH "https://your-company.getcollate.io/api/v1/tables/{id}" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json-patch+json" \ + -d '[{"op": "replace", "path": "/description", "value": "New description"}]' +``` + + + +### Update Owner + + + +```python +from metadata.generated.schema.type.entityReference import EntityReference + +table = metadata.get_by_name(entity=Table, fqn="mysql_prod.analytics.public.customers") +metadata.patch_owner( + entity=Table, + source=table, + owner=EntityReference(id=user_id, type="user"), # user_id: UUID of the new owner +) +``` + + + +```bash +curl -X PATCH "https://your-company.getcollate.io/api/v1/tables/{id}" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json-patch+json" \ + -d '[{"op": "add", "path": "/owner", "value": {"id": "user-uuid", "type": "user"}}]' +``` + + + +### Add Tags + + + +```python +from metadata.generated.schema.type.tagLabel import TagLabel, LabelType, State, TagSource + +table = metadata.get_by_name(entity=Table, fqn="mysql_prod.analytics.public.customers") +metadata.patch_tags( + entity=Table, + source=table, + tag_labels=[ + TagLabel( + tagFQN="PII.Sensitive", + source=TagSource.Classification, + labelType=LabelType.Manual, + state=State.Confirmed + ) + ] +) +``` + + + +```bash +curl -X PATCH "https://your-company.getcollate.io/api/v1/tables/{id}" \
+ -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json-patch+json" \ + -d '[{"op": "add", "path": "/tags/-", "value": {"tagFQN": "PII.Sensitive", "source": "Classification", "labelType": "Manual", "state": "Confirmed"}}]' +``` + + + +## Versioning + +Collate automatically versions entities. Each change increments the version: + +- Minor changes (description, tags): increment by 0.1 (e.g., 0.1 → 0.2) +- Major changes (schema, structure): increment by 1.0 (e.g., 0.2 → 1.2) + +```bash +# Get all versions +curl -X GET "https://your-company.getcollate.io/api/v1/tables/{id}/versions" \ + -H "Authorization: Bearer $TOKEN" + +# Get specific version +curl -X GET "https://your-company.getcollate.io/api/v1/tables/{id}/versions/0.3" \ + -H "Authorization: Bearer $TOKEN" +``` + +## Related Resources + + + + Learn about the FQN naming convention + + + Create and manage database connections + + + Work with table entities + + + Update entity metadata + + diff --git a/api-reference/core/fully-qualified-names.mdx b/api-reference/core/fully-qualified-names.mdx new file mode 100644 index 00000000..368412bc --- /dev/null +++ b/api-reference/core/fully-qualified-names.mdx @@ -0,0 +1,310 @@ +--- +title: Fully Qualified Names +description: Understanding the FQN naming convention for entities +sidebarTitle: Fully Qualified Names +mode: "wide" +--- + +# Fully Qualified Names (FQN) + +A Fully Qualified Name (FQN) is a unique, hierarchical identifier for every entity in Collate. FQNs provide a human-readable way to reference entities across the catalog.
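The format is easiest to see in code. The Python sketch below joins entity names into an FQN and splits one back into its parts. `build_fqn` and `split_fqn` are hypothetical helpers written for illustration, not the SDK's own utilities in `metadata.utils.fqn`; they assume a single quoting rule (a name containing a dot is wrapped in double quotes) and do not handle names that themselves contain a double quote.

```python
from typing import List


def build_fqn(*names: str) -> str:
    """Join entity names into a dot-separated FQN (hypothetical helper).

    Assumption: a name that contains a dot is wrapped in double quotes so the
    dot is not read as a separator. Names containing a double quote are not handled.
    """
    segments = [f'"{name}"' if "." in name else name for name in names]
    return ".".join(segments)


def split_fqn(fqn: str) -> List[str]:
    """Split an FQN on dots that sit outside double quotes (hypothetical helper)."""
    parts: List[str] = []
    current: List[str] = []
    in_quotes = False
    for ch in fqn:
        if ch == '"':
            # Toggle quoted mode; the quote characters themselves are dropped
            in_quotes = not in_quotes
        elif ch == "." and not in_quotes:
            # An unquoted dot ends the current segment
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


# Service -> database -> schema -> table, mirroring the entity hierarchy
table_fqn = build_fqn("mysql_prod", "analytics", "public", "customers")
print(table_fqn)  # mysql_prod.analytics.public.customers

# A dotted name survives the round trip because it is quoted
print(split_fqn(build_fqn("service", "schema.v2", "orders")))  # ['service', 'schema.v2', 'orders']
```

Treating a quoted segment as opaque is what lets a name such as `schema.v2` survive the round trip; a plain `str.split(".")` would break it into two segments.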
+ +## FQN Format + +FQNs follow a dot-separated hierarchical structure that reflects the entity's position in the data hierarchy: + +``` +service.database.schema.table.column +``` + +### Examples + +| Entity Type | FQN Example | +|-------------|-------------| +| Database Service | `mysql_prod` | +| Database | `mysql_prod.analytics` | +| Database Schema | `mysql_prod.analytics.public` | +| Table | `mysql_prod.analytics.public.customers` | +| Column | `mysql_prod.analytics.public.customers.email` | +| Dashboard Service | `tableau_prod` | +| Dashboard | `tableau_prod.Sales Dashboard` | +| Pipeline | `airflow_prod.etl_customers` | +| Topic | `kafka_prod.user-events` | + +## FQN Structure by Entity Type + +### Database Hierarchy + +``` +{database_service}.{database}.{schema}.{table}.{column} +``` + +- **Service**: `snowflake_prod` +- **Database**: `snowflake_prod.ANALYTICS` +- **Schema**: `snowflake_prod.ANALYTICS.PUBLIC` +- **Table**: `snowflake_prod.ANALYTICS.PUBLIC.CUSTOMERS` +- **Column**: `snowflake_prod.ANALYTICS.PUBLIC.CUSTOMERS.customer_id` + +### Dashboard Hierarchy + +``` +{dashboard_service}.{dashboard} +``` + +- **Service**: `tableau_prod` +- **Dashboard**: `tableau_prod.Sales Performance` +- **Chart**: `tableau_prod.Sales Performance.Revenue by Region` + +### Pipeline Hierarchy + +``` +{pipeline_service}.{pipeline} +``` + +- **Service**: `airflow_prod` +- **Pipeline**: `airflow_prod.etl_daily_sales` +- **Task**: `airflow_prod.etl_daily_sales.extract_orders` + +### Messaging Hierarchy + +``` +{messaging_service}.{topic} +``` + +- **Service**: `kafka_prod` +- **Topic**: `kafka_prod.user-events` + +### Governance Entities + +``` +{glossary}.{term} +``` + +- **Glossary**: `Business Glossary` +- **Term**: `Business Glossary.Customer.Customer ID` + +## Using FQNs in the API + +### Retrieve by FQN + + + +```python +from metadata.generated.schema.entity.data.table import Table + +# Get table by FQN +table = metadata.get_by_name( + entity=Table, + 
fqn="mysql_prod.analytics.public.customers" +) + +print(table.id) # UUID +print(table.fullyQualifiedName) # mysql_prod.analytics.public.customers +``` + + + +```java +// Get table by FQN +Table table = tablesApi.getTableByFQN( + "mysql_prod.analytics.public.customers", + "owner,columns", // fields + null // include +); +``` + + + +```bash +# Get table by FQN +curl -X GET "https://your-company.getcollate.io/api/v1/tables/name/mysql_prod.analytics.public.customers" \ + -H "Authorization: Bearer $TOKEN" +``` + + + +### FQN in Create Requests + +When creating entities, you reference parent entities by FQN: + + + +```python +from metadata.generated.schema.api.data.createTable import CreateTableRequest + +# Reference parent schema by FQN +create_request = CreateTableRequest( + name="orders", + databaseSchema="mysql_prod.analytics.public", # Parent FQN + columns=[...] +) +``` + + + +```bash +curl -X PUT "https://your-company.getcollate.io/api/v1/tables" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json" \ + -d '{ + "name": "orders", + "databaseSchema": "mysql_prod.analytics.public", + "columns": [...] 
+ }' +``` + + + +### FQN in Tag Operations + +```bash +# Add tag to entity by FQN +curl -X PUT "https://your-company.getcollate.io/api/v1/tables/name/mysql_prod.analytics.public.customers/tags" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json" \ + -d '[{"tagFQN": "PII.Sensitive", "source": "Classification"}]' +``` + +## Special Characters + +Because FQN segments are separated by dots, an entity name that itself contains a dot is wrapped in double quotes. Other characters, such as spaces, need no quoting in the FQN, but they must be URL-encoded when the FQN appears in a request path: + +| Scenario | Name | FQN | +|----------|------|-----| +| Dot in name | `schema.v2` | `service."schema.v2"` | +| Space in name | `Sales Data` | `service.Sales Data` | +| Apostrophe in name | `user's table` | `service.user's table` | + +### Examples with Special Characters + +```bash +# Table with a space in its name (space encoded as %20) +curl -X GET "https://your-company.getcollate.io/api/v1/tables/name/mysql_prod.analytics.public.Customer%20Orders" \ + -H "Authorization: Bearer $TOKEN" + +# Dashboard with a space and an ampersand in its name (encoded as %20 and %26) +curl -X GET "https://your-company.getcollate.io/api/v1/dashboards/name/tableau_prod.Sales%20%26%20Revenue" \ + -H "Authorization: Bearer $TOKEN" +``` + +## FQN vs ID + +Both FQN and ID uniquely identify entities: + +| Attribute | FQN | ID (UUID) | +|-----------|-----|-----------| +| Format | `service.db.schema.table` | `550e8400-e29b-41d4-a716-446655440000` | +| Human-readable | Yes | No | +| Stability | Changes if entity renamed | Never changes | +| Best for | Interactive use, scripts | Stable references, foreign keys | + +### When to Use Each + +**Use FQN when:** +- Writing scripts or documentation +- Referencing entities in configuration +- Human-readable logging +- Interactive API exploration + +**Use ID when:** +- Storing references in code or databases +- Entity references that must survive renames +- Performance-critical lookups +- Internal system integrations + +## Building FQNs Programmatically + + + +```python +from metadata.generated.schema.entity.data.table import Column, Table +from metadata.utils import fqn + +# Build table FQN; skip_es_search=True joins the names locally instead of querying the server +table_fqn = fqn.build( + None, + entity_type=Table, + service_name="mysql_prod", + database_name="analytics", + schema_name="public", + table_name="customers", + skip_es_search=True, +) +# Result: "mysql_prod.analytics.public.customers" + +# Build column FQN +column_fqn = fqn.build( + None, + entity_type=Column, + service_name="mysql_prod", + database_name="analytics", + schema_name="public", + table_name="customers", + column_name="email", +) +# Result: "mysql_prod.analytics.public.customers.email" +``` + + + +```java +import org.openmetadata.schema.utils.FullyQualifiedName; + +// Build table FQN +String tableFqn = FullyQualifiedName.build( + "mysql_prod", + "analytics", + "public", + "customers" +); +// Result: "mysql_prod.analytics.public.customers" +``` + + + +## Parsing FQNs + + + +```python +from metadata.utils.fqn import split + +# Parse FQN into components +fqn = "mysql_prod.analytics.public.customers" +parts = split(fqn) + +# parts = ["mysql_prod", "analytics", "public", "customers"] +service = parts[0] # mysql_prod +database = parts[1] # analytics +schema = parts[2] # public +table = parts[3] # customers +``` + + + +## Common Patterns + +### Filter by Parent FQN + +```bash +# List all tables in a specific schema +curl -X GET "https://your-company.getcollate.io/api/v1/tables?databaseSchema=mysql_prod.analytics.public&limit=100" \ + -H "Authorization: Bearer $TOKEN" + +# List all tables in a specific database +curl -X GET "https://your-company.getcollate.io/api/v1/tables?database=mysql_prod.analytics&limit=100" \ + -H "Authorization: Bearer $TOKEN" +``` + +### Bulk Operations by FQN + +```python +from metadata.generated.schema.entity.data.table import Table +from metadata.generated.schema.type.tagLabel import LabelType, State, TagLabel, TagSource + +# Add the same tag to several tables, looked up by FQN +# (metadata: an authenticated OpenMetadata client, see Authentication) +fqns = [ + "mysql_prod.analytics.public.customers", + "mysql_prod.analytics.public.orders", + "mysql_prod.analytics.public.products" +] + +for fqn in fqns: + table = metadata.get_by_name(entity=Table, fqn=fqn) + metadata.patch_tags( + entity=Table, + source=table, + tag_labels=[ + TagLabel( + tagFQN="PII.Sensitive", + source=TagSource.Classification, + labelType=LabelType.Manual, + state=State.Confirmed, + ) + ], + ) +``` diff --git a/api-reference/data-assets/dashboards/create.mdx
b/api-reference/data-assets/dashboards/create.mdx new file mode 100644 index 00000000..bd392fd3 --- /dev/null +++ b/api-reference/data-assets/dashboards/create.mdx @@ -0,0 +1,115 @@ +--- +title: Create a Dashboard +description: Create a new dashboard within a dashboard service +sidebarTitle: Create +--- + +Create a new dashboard within a dashboard service. + +## Endpoint + +``` +PUT /v1/dashboards +``` + +## Parameters + + + Name of the dashboard. Must be unique within the parent service. + + + + Name of the parent DashboardService. + + + + Human-readable display name for the dashboard. + + + + URL to the dashboard in the source system. + + + + Description of the dashboard in Markdown format. + + + +```python Python +from metadata.sdk import configure +from metadata.sdk.entities import Dashboard +from metadata.generated.schema.api.data.createDashboard import CreateDashboardRequest + +configure( + host="https://your-company.getcollate.io/api", + jwt_token="your-jwt-token" +) + +request = CreateDashboardRequest( + name="sales_overview", + displayName="Sales Overview Dashboard", + service="tableau_prod", + dashboardUrl="https://tableau.company.com/views/SalesOverview", + description="Executive dashboard showing quarterly sales metrics and KPIs" +) + +dashboard = Dashboard.create(request) +print(f"Created: {dashboard.fullyQualifiedName}") +``` + +```java Java +import org.openmetadata.sdk.OpenMetadataClient; +import org.openmetadata.sdk.config.OpenMetadataConfig; +import org.openmetadata.sdk.entities.Dashboards; +import org.openmetadata.schema.api.data.CreateDashboardRequest; +import org.openmetadata.schema.entity.data.Dashboard; + +OpenMetadataConfig config = OpenMetadataConfig.builder() + .serverUrl("https://your-company.getcollate.io/api") + .accessToken("your-jwt-token") + .build(); +OpenMetadataClient client = new OpenMetadataClient(config); +Dashboards.setDefaultClient(client); + +CreateDashboardRequest request = new CreateDashboardRequest() + .withName("sales_overview") + .withDisplayName("Sales Overview Dashboard")
.withService("tableau_prod") + .withDashboardUrl("https://tableau.company.com/views/SalesOverview") + .withDescription("Executive dashboard showing quarterly sales metrics and KPIs"); + +Dashboard dashboard = Dashboards.create(request); +System.out.println("Created: " + dashboard.getFullyQualifiedName()); +``` + +```bash cURL +curl -X PUT "https://your-company.getcollate.io/api/v1/dashboards" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json" \ + -d '{ + "name": "sales_overview", + "displayName": "Sales Overview Dashboard", + "service": "tableau_prod", + "dashboardUrl": "https://tableau.company.com/views/SalesOverview", + "description": "Executive dashboard showing quarterly sales metrics and KPIs" + }' +``` + + + +```json Response +{ + "id": "550e8400-e29b-41d4-a716-446655440000", + "name": "sales_overview", + "fullyQualifiedName": "tableau_prod.sales_overview", + "displayName": "Sales Overview Dashboard", + "description": "Executive dashboard showing quarterly sales metrics and KPIs", + "dashboardUrl": "https://tableau.company.com/views/SalesOverview", + "service": { + "id": "service-uuid", + "type": "dashboardService", + "name": "tableau_prod" + }, + "version": 0.1 +} +``` + diff --git a/api-reference/data-assets/dashboards/delete.mdx b/api-reference/data-assets/dashboards/delete.mdx new file mode 100644 index 00000000..fa007381 --- /dev/null +++ b/api-reference/data-assets/dashboards/delete.mdx @@ -0,0 +1,78 @@ +--- +title: Delete a Dashboard +description: Delete a dashboard. Use hardDelete=true to permanently remove +sidebarTitle: Delete +--- + +Delete a dashboard. Use `hardDelete=true` to permanently remove. + +## Endpoint + +``` +DELETE /v1/dashboards/{id} +``` + +## Path Parameters + + + Unique identifier of the dashboard. + + +## Query Parameters + + + Set to `true` to permanently delete (cannot be restored). 
+ + + +```python Python +from metadata.sdk import configure +from metadata.sdk.entities import Dashboard + +configure( + host="https://your-company.getcollate.io/api", + jwt_token="your-jwt-token" +) + +# Soft delete (can be restored) +Dashboard.delete("550e8400-e29b-41d4-a716-446655440000") + +# Hard delete +Dashboard.delete( + "550e8400-e29b-41d4-a716-446655440000", + hard_delete=True +) +``` + +```java Java +import org.openmetadata.sdk.entities.Dashboards; + +// Soft delete +Dashboards.delete("550e8400-e29b-41d4-a716-446655440000"); + +// Hard delete +Dashboards.delete("550e8400-e29b-41d4-a716-446655440000", false, true); +``` + +```bash cURL +# Soft delete +curl -X DELETE "https://your-company.getcollate.io/api/v1/dashboards/{id}" \ + -H "Authorization: Bearer $TOKEN" + +# Hard delete +curl -X DELETE "https://your-company.getcollate.io/api/v1/dashboards/{id}?hardDelete=true" \ + -H "Authorization: Bearer $TOKEN" +``` + + + +```json Response +{ + "id": "550e8400-e29b-41d4-a716-446655440000", + "name": "sales_overview", + "fullyQualifiedName": "tableau_prod.sales_overview", + "deleted": true, + "version": 0.3 +} +``` + diff --git a/api-reference/data-assets/dashboards/index.mdx b/api-reference/data-assets/dashboards/index.mdx new file mode 100644 index 00000000..b27e966c --- /dev/null +++ b/api-reference/data-assets/dashboards/index.mdx @@ -0,0 +1,55 @@ +--- +title: Dashboards +description: Create and manage dashboard entities +sidebarTitle: Dashboards +mode: "wide" +--- + +**Dashboards** represent visual analytics assets from BI platforms. They contain charts and visualizations connected to data sources. + + +Entity schema follows the [OpenMetadata Standard](https://openmetadatastandards.org/metadata-specifications/entities/dashboard/). 
+ + +## Entity Hierarchy + +Dashboards are children of Dashboard Services: + +``` +DashboardService +└── Dashboard (this page) + └── Chart +``` + +## Inheritance + +When you set an **owner** or **domain** on a Dashboard Service, it is inherited by all child dashboards and charts. + +--- + +## API Endpoints + +| Method | Endpoint | Description | +|--------|----------|-------------| +| `PUT` | `/v1/dashboards` | [Create or update a dashboard](/api-reference/data-assets/dashboards/create) | +| `GET` | `/v1/dashboards` | [List dashboards](/api-reference/data-assets/dashboards/list) | +| `GET` | `/v1/dashboards/{id}` | [Get by ID](/api-reference/data-assets/dashboards/retrieve) | +| `GET` | `/v1/dashboards/name/{fqn}` | [Get by fully qualified name](/api-reference/data-assets/dashboards/retrieve) | +| `PATCH` | `/v1/dashboards/{id}` | [Update a dashboard](/api-reference/data-assets/dashboards/update) | +| `DELETE` | `/v1/dashboards/{id}` | [Delete a dashboard](/api-reference/data-assets/dashboards/delete) | + +--- + +## Related + + + + View dashboard object attributes + + + Configure dashboard service connections + + + Track dashboard data lineage + + diff --git a/api-reference/data-assets/dashboards/list.mdx b/api-reference/data-assets/dashboards/list.mdx new file mode 100644 index 00000000..b5f33a26 --- /dev/null +++ b/api-reference/data-assets/dashboards/list.mdx @@ -0,0 +1,122 @@ +--- +title: List Dashboards +description: List all dashboards with optional filtering and pagination +sidebarTitle: List +--- + +List all dashboards with optional filtering and pagination. + +## Endpoint + +``` +GET /v1/dashboards +``` + +## Query Parameters + + + Filter by service name. + + + + Maximum number of results to return (max: 1000000). + + + + Cursor for backward pagination. + + + + Cursor for forward pagination. + + + + Comma-separated list of fields to include. + + + + Include `all`, `deleted`, or `non-deleted` entities. 
+ + + +```python Python +from metadata.sdk import configure +from metadata.sdk.entities import Dashboard + +configure( + host="https://your-company.getcollate.io/api", + jwt_token="your-jwt-token" +) + +# List all dashboards with auto-pagination +for dashboard in Dashboard.list().auto_paging_iterable(): + print(f"{dashboard.fullyQualifiedName}: {dashboard.description}") + +# Filter by service +for dashboard in Dashboard.list(service="tableau_prod").auto_paging_iterable(): + print(f"{dashboard.name}") +``` + +```java Java +import org.openmetadata.sdk.entities.Dashboards; + +// List with auto-pagination +for (Dashboard dashboard : Dashboards.list().autoPagingIterable()) { + System.out.println(dashboard.getFullyQualifiedName() + ": " + dashboard.getDescription()); +} + +// Filter by service +for (Dashboard dashboard : Dashboards.list().service("tableau_prod").autoPagingIterable()) { + System.out.println(dashboard.getName()); +} +``` + +```bash cURL +# List all dashboards +curl "https://your-company.getcollate.io/api/v1/dashboards?limit=50" \ + -H "Authorization: Bearer $TOKEN" + +# Filter by service +curl "https://your-company.getcollate.io/api/v1/dashboards?service=tableau_prod&limit=50" \ + -H "Authorization: Bearer $TOKEN" + +# Include charts +curl "https://your-company.getcollate.io/api/v1/dashboards?fields=charts,owners,tags&limit=50" \ + -H "Authorization: Bearer $TOKEN" +``` + + + +```json Response +{ + "data": [ + { + "id": "550e8400-e29b-41d4-a716-446655440000", + "name": "sales_overview", + "fullyQualifiedName": "tableau_prod.sales_overview", + "displayName": "Sales Overview Dashboard", + "service": { + "id": "service-uuid", + "type": "dashboardService", + "name": "tableau_prod" + } + }, + { + "id": "660e8400-e29b-41d4-a716-446655440001", + "name": "marketing_metrics", + "fullyQualifiedName": "tableau_prod.marketing_metrics", + "displayName": "Marketing Metrics", + "service": { + "id": "service-uuid", + "type": "dashboardService", + "name": "tableau_prod" + 
} + } + ], + "paging": { + "after": "cursor-string", + "total": 25 + } +} +``` + diff --git a/api-reference/data-assets/dashboards/object.mdx b/api-reference/data-assets/dashboards/object.mdx new file mode 100644 index 00000000..9d40703c --- /dev/null +++ b/api-reference/data-assets/dashboards/object.mdx @@ -0,0 +1,100 @@ +--- +title: The Dashboard Object +description: Attributes of the dashboard entity +sidebarTitle: The Dashboard Object +mode: "wide" +--- + + + Unique identifier for the dashboard. + + + + Name of the dashboard. Must be unique within the parent service. + + + + Fully qualified name in format `{service}.{dashboard}`. + + + + Human-readable display name for the dashboard. + + + + Description of the dashboard in Markdown format. + + + + URL to the dashboard in the source system. + + + + Type of dashboard (e.g., Report, Dashboard). + + + + Reference to the parent DashboardService. + + + + Charts contained in this dashboard. + + + + Data models associated with this dashboard. + + + + Owners of the dashboard (users or teams). + + + + Domain this dashboard belongs to. + + + + Tags and classifications applied to this dashboard. + + + + Entity version number, incremented on updates. + + + + Whether the dashboard has been soft-deleted. 
+ + + +```json The Dashboard Object +{ + "id": "550e8400-e29b-41d4-a716-446655440000", + "name": "sales_overview", + "fullyQualifiedName": "tableau_prod.sales_overview", + "displayName": "Sales Overview Dashboard", + "description": "Executive dashboard showing key sales metrics", + "dashboardUrl": "https://tableau.company.com/views/SalesOverview", + "service": { + "id": "service-uuid", + "type": "dashboardService", + "name": "tableau_prod" + }, + "charts": [ + { + "id": "chart-uuid", + "type": "chart", + "name": "revenue_trend" + } + ], + "owners": [ + { + "id": "user-uuid", + "type": "user", + "name": "john.doe" + } + ], + "version": 0.1, + "deleted": false +} +``` + diff --git a/api-reference/data-assets/dashboards/retrieve.mdx b/api-reference/data-assets/dashboards/retrieve.mdx new file mode 100644 index 00000000..7db12e1d --- /dev/null +++ b/api-reference/data-assets/dashboards/retrieve.mdx @@ -0,0 +1,118 @@ +--- +title: Retrieve a Dashboard +description: Retrieve a dashboard by ID or fully qualified name +sidebarTitle: Retrieve +--- + +Retrieve a dashboard by ID or fully qualified name. + +## Endpoints + +``` +GET /v1/dashboards/{id} +GET /v1/dashboards/name/{fqn} +``` + +## Path Parameters + + + Unique identifier of the dashboard. + + + + Fully qualified name of the dashboard (e.g., `tableau_prod.sales_overview`). + + +## Query Parameters + + + Comma-separated list of fields to include. Options: `charts`, `owners`, `tags`, `domain`, `dataModels`, `usageSummary`. 
+ + + ```python Python +from metadata.sdk import configure +from metadata.sdk.entities import Dashboard + +configure( + host="https://your-company.getcollate.io/api", + jwt_token="your-jwt-token" +) + +# By name +dashboard = Dashboard.retrieve_by_name( + "tableau_prod.sales_overview", + fields=["charts", "owners", "tags", "dataModels"] +) + +# By ID +dashboard = Dashboard.retrieve("550e8400-e29b-41d4-a716-446655440000") + +print(f"Name: {dashboard.displayName}") +print(f"URL: {dashboard.dashboardUrl}") +if dashboard.charts: + print(f"Charts: {len(dashboard.charts)}") +``` + +```java Java +import org.openmetadata.sdk.entities.Dashboards; +import java.util.List; + +// By name +Dashboard dashboard = Dashboards.retrieveByName( + "tableau_prod.sales_overview", + List.of("charts", "owners", "tags", "dataModels") +); + +// By ID (reassigns the variable declared above) +dashboard = Dashboards.retrieve("550e8400-e29b-41d4-a716-446655440000"); + +System.out.println("Name: " + dashboard.getDisplayName()); +System.out.println("URL: " + dashboard.getDashboardUrl()); +if (dashboard.getCharts() != null) { + System.out.println("Charts: " + dashboard.getCharts().size()); +} +``` + +```bash cURL +# By FQN +curl "https://your-company.getcollate.io/api/v1/dashboards/name/tableau_prod.sales_overview?fields=charts,owners,tags" \ + -H "Authorization: Bearer $TOKEN" + +# By ID +curl "https://your-company.getcollate.io/api/v1/dashboards/550e8400-e29b-41d4-a716-446655440000" \ + -H "Authorization: Bearer $TOKEN" +``` + + + +```json Response +{ + "id": "550e8400-e29b-41d4-a716-446655440000", + "name": "sales_overview", + "fullyQualifiedName": "tableau_prod.sales_overview", + "displayName": "Sales Overview Dashboard", + "description": "Executive dashboard showing key sales metrics", + "dashboardUrl": "https://tableau.company.com/views/SalesOverview", + "service": { + "id": "service-uuid", + "type": "dashboardService", + "name": "tableau_prod" + }, + "charts": [ + { + "id": "chart-uuid", + "type": "chart", + "name": "revenue_trend" + } + ], +
"owners": [ + { + "id": "user-uuid", + "type": "user", + "name": "john.doe" + } + ], + "version": 0.1 +} +``` + diff --git a/api-reference/data-assets/dashboards/update.mdx b/api-reference/data-assets/dashboards/update.mdx new file mode 100644 index 00000000..f1effe04 --- /dev/null +++ b/api-reference/data-assets/dashboards/update.mdx @@ -0,0 +1,101 @@ +--- +title: Update a Dashboard +description: Update a dashboard using JSON Patch operations +sidebarTitle: Update +--- + +Update a dashboard using JSON Patch operations. + +## Endpoint + +``` +PATCH /v1/dashboards/{id} +``` + +## Path Parameters + + + Unique identifier of the dashboard. + + +## Request Body + +JSON Patch document following [RFC 6902](https://tools.ietf.org/html/rfc6902). + + +```python Python +from metadata.sdk import configure +from metadata.sdk.entities import Dashboard, User + +configure( + host="https://your-company.getcollate.io/api", + jwt_token="your-jwt-token" +) + +# Update description +dashboard = Dashboard.retrieve_by_name("tableau_prod.sales_overview") +dashboard.description = "Real-time executive dashboard for Q4 sales performance" +updated = Dashboard.update(dashboard.id, dashboard) + +# Set owner +user = User.retrieve_by_name("john.doe") +dashboard.owners = [{"id": str(user.id), "type": "user"}] +updated = Dashboard.update(dashboard.id, dashboard) +``` + +```java Java +import org.openmetadata.sdk.entities.Dashboards; +import org.openmetadata.sdk.entities.Users; +import org.openmetadata.schema.type.EntityReference; + +Dashboard dashboard = Dashboards.retrieveByName("tableau_prod.sales_overview"); +dashboard.setDescription("Real-time executive dashboard for Q4 sales performance"); +Dashboard updated = Dashboards.update(dashboard.getId(), dashboard); + +// Set owner +User user = Users.retrieveByName("john.doe"); +dashboard.setOwners(List.of( + new EntityReference() + .withId(user.getId()) + .withType("user") +)); +updated = Dashboards.update(dashboard.getId(), dashboard); +``` + +```bash 
cURL +# Update description +curl -X PATCH "https://your-company.getcollate.io/api/v1/dashboards/{id}" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json-patch+json" \ + -d '[ + {"op": "replace", "path": "/description", "value": "Real-time executive dashboard"} + ]' + +# Set owner +curl -X PATCH "https://your-company.getcollate.io/api/v1/dashboards/{id}" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json-patch+json" \ + -d '[ + {"op": "add", "path": "/owners", "value": [{"id": "user-uuid", "type": "user"}]} + ]' +``` + + + +```json Response +{ + "id": "550e8400-e29b-41d4-a716-446655440000", + "name": "sales_overview", + "fullyQualifiedName": "tableau_prod.sales_overview", + "description": "Real-time executive dashboard for Q4 sales performance", + "owners": [ + { + "id": "user-uuid", + "type": "user", + "name": "john.doe" + } + ], + "version": 0.2 +} +``` + diff --git a/api-reference/data-assets/databases/create.mdx b/api-reference/data-assets/databases/create.mdx new file mode 100644 index 00000000..fdb3f4a1 --- /dev/null +++ b/api-reference/data-assets/databases/create.mdx @@ -0,0 +1,138 @@ +--- +title: Create a Database +description: Create a new database within a database service +sidebarTitle: Create +api: PUT /v1/databases +--- + +Create a new database within a database service. + +## Parameters + + + Name of the database. Must be unique within the parent service. + + + + Fully qualified name of the parent DatabaseService. + + + + Human-readable display name for the database. + + + + Description of the database in Markdown format. + + + + Data retention period in ISO 8601 duration format (e.g., `P365D`). + + + + Owner users or teams to assign. + + + + Fully qualified name of the domain to assign. + + + + Tags to apply to the database. + + + + Custom property values. 
+ + + +```bash BASE URL +https://your-company.getcollate.io/api/v1 +``` + + + +```python Python SDK +from metadata.sdk import configure +from metadata.sdk.entities import Database +from metadata.generated.schema.api.data.createDatabase import CreateDatabaseRequest + +configure( + host="https://your-company.getcollate.io/api", + jwt_token="your-jwt-token" +) + +request = CreateDatabaseRequest( + name="analytics", + displayName="Analytics Database", + service="snowflake_prod", + description="Central analytics data warehouse", + retentionPeriod="P365D" +) + +database = Database.create(request) +print(f"Created: {database.fullyQualifiedName}") +``` + + + +```java Java SDK +import org.openmetadata.sdk.entities.Databases; +import org.openmetadata.schema.api.data.CreateDatabaseRequest; + +CreateDatabaseRequest request = new CreateDatabaseRequest() + .withName("analytics") + .withDisplayName("Analytics Database") + .withService("snowflake_prod") + .withDescription("Central analytics data warehouse") + .withRetentionPeriod("P365D"); + +Database database = Databases.create(request); +``` + + + +```bash cURL +curl -X PUT "https://your-company.getcollate.io/api/v1/databases" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json" \ + -d '{ + "name": "analytics", + "displayName": "Analytics Database", + "service": "snowflake_prod", + "description": "Central analytics data warehouse", + "retentionPeriod": "P365D" + }' +``` + + + +```json Response +{ + "id": "550e8400-e29b-41d4-a716-446655440000", + "name": "analytics", + "fullyQualifiedName": "snowflake_prod.analytics", + "displayName": "Analytics Database", + "description": "Central analytics data warehouse", + "service": { + "id": "service-uuid", + "type": "databaseService", + "name": "snowflake_prod" + }, + "serviceType": "Snowflake", + "retentionPeriod": "P365D", + "version": 0.1 +} +``` + + +--- + +## Error Handling + +| Code | Error Type | Description | +|------|-----------|-------------| +| `400` | 
`BAD_REQUEST` | Invalid request body or missing required fields | +| `401` | `UNAUTHORIZED` | Invalid or missing authentication token | +| `403` | `FORBIDDEN` | User lacks permission to create databases | +| `409` | `ENTITY_ALREADY_EXISTS` | Database with same name already exists in service | diff --git a/api-reference/data-assets/databases/delete.mdx b/api-reference/data-assets/databases/delete.mdx new file mode 100644 index 00000000..bd1c92fe --- /dev/null +++ b/api-reference/data-assets/databases/delete.mdx @@ -0,0 +1,105 @@ +--- +title: Delete a Database +description: Delete a database with soft or hard delete options +sidebarTitle: Delete +api: DELETE /v1/databases/{id} +--- + +Delete a database. Use `hardDelete=true` to permanently remove. + +## Path Parameters + + + Unique identifier of the database. + + +## Query Parameters + + + Set to `true` to permanently delete (cannot be restored). + + + + Set to `true` to also delete all child schemas and tables. + + + +```python Python SDK +from metadata.sdk import configure +from metadata.sdk.entities import Database + +configure( + host="https://your-company.getcollate.io/api", + jwt_token="your-jwt-token" +) + +# Soft delete (can be restored) +Database.delete("550e8400-e29b-41d4-a716-446655440000") + +# Hard delete with all children +Database.delete( + "550e8400-e29b-41d4-a716-446655440000", + recursive=True, + hard_delete=True +) +``` + + + +```java Java SDK +import org.openmetadata.sdk.entities.Databases; + +// Soft delete +Databases.delete("550e8400-e29b-41d4-a716-446655440000"); + +// Hard delete with all children +Databases.delete("550e8400-e29b-41d4-a716-446655440000", true, true); +``` + + + +```bash cURL +# Soft delete +curl -X DELETE "https://your-company.getcollate.io/api/v1/databases/{id}" \ + -H "Authorization: Bearer $TOKEN" + +# Hard delete with all children +curl -X DELETE "https://your-company.getcollate.io/api/v1/databases/{id}?hardDelete=true&recursive=true" \ + -H "Authorization: Bearer $TOKEN" 
+``` + + + +```json Response +{ + "id": "550e8400-e29b-41d4-a716-446655440000", + "name": "analytics", + "fullyQualifiedName": "snowflake_prod.analytics", + "deleted": true, + "version": 0.4 +} +``` + + +--- + +## Restore a Database + +Restore a soft-deleted database. + +### Endpoint + +``` +PUT /v1/databases/restore +``` + + +```bash cURL +curl -X PUT "https://your-company.getcollate.io/api/v1/databases/restore" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json" \ + -d '{ + "id": "550e8400-e29b-41d4-a716-446655440000" + }' +``` + diff --git a/api-reference/data-assets/databases/import-export.mdx b/api-reference/data-assets/databases/import-export.mdx new file mode 100644 index 00000000..3a64d2b9 --- /dev/null +++ b/api-reference/data-assets/databases/import-export.mdx @@ -0,0 +1,769 @@ +--- +title: Import & Export Databases +description: Bulk import and export databases with all nested entities (schemas, tables, stored procedures) +sidebarTitle: Import & Export +api: GET /v1/databases/name/{fqn}/export +--- + +# Import & Export Databases + +Bulk import and export database entities including all nested schemas, tables, columns, and stored procedures. Use CSV format for easy manipulation in spreadsheets or automated pipelines. + +## Endpoints + +``` +GET /v1/databases/name/{fqn}/export +PUT /v1/databases/name/{fqn}/import +PUT /v1/databases/name/{fqn}/importAsync +``` + +--- + +## Export Database + +Export a database and all its child entities to CSV format. 
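Outside the SDKs, the export is a plain authenticated `GET`. The following is a minimal standard-library sketch, assuming (as the cURL example in this section suggests with `-o analytics_export.csv`) that the response body is the CSV text itself. `build_export_url` and `export_database_csv` are hypothetical helper names, not part of any Collate SDK.

```python
import urllib.parse
import urllib.request


def build_export_url(base_url: str, fqn: str) -> str:
    """Build the export URL for a database FQN (hypothetical helper).

    base_url is the API root, e.g. "https://your-company.getcollate.io/api".
    The FQN is percent-encoded so quoted names with spaces or quotes survive.
    """
    encoded_fqn = urllib.parse.quote(fqn, safe="")
    return f"{base_url.rstrip('/')}/v1/databases/name/{encoded_fqn}/export"


def export_database_csv(base_url: str, fqn: str, token: str, timeout: float = 60.0) -> str:
    """Download the CSV export for one database (assumes a raw CSV body)."""
    request = urllib.request.Request(
        build_export_url(base_url, fqn),
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For example, `export_database_csv("https://your-company.getcollate.io/api", "snowflake_prod.analytics", token)` would return the same content that the cURL call writes to disk, under the assumption above.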
+ +### Export Entire Database Hierarchy + + +```python Python SDK +from metadata.sdk import configure +from metadata.sdk.entities import Database + +configure( + host="https://your-company.getcollate.io/api", + jwt_token="your-jwt-token" +) + +# Export database with all schemas, tables, and columns +csv_data = Database.export("snowflake_prod.analytics") + +# Save to file +with open("analytics_export.csv", "w") as f: + f.write(csv_data) + +print(f"Exported {len(csv_data.splitlines())} rows") +``` + + + +```java Java SDK +import org.openmetadata.sdk.entities.Databases; +import java.nio.file.Files; +import java.nio.file.Path; + +String csvData = Databases.export("snowflake_prod.analytics"); + +// Save to file +Files.writeString(Path.of("analytics_export.csv"), csvData); +System.out.println("Export complete"); +``` + + + +```bash cURL +# Export to CSV +curl "https://your-company.getcollate.io/api/v1/databases/name/snowflake_prod.analytics/export" \ + -H "Authorization: Bearer $TOKEN" \ + -o analytics_export.csv + +# View first few lines +head -20 analytics_export.csv +``` + + +### Async Export (Large Databases) + +For databases with thousands of entities, use async export to avoid timeouts. 
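The SDK polling loops in this section wait for as long as the job reports `RUNNING`. Below is a sketch of a bounded variant. It assumes only that the status payload is a dict with a `status` key and that every value other than `RUNNING` is final. `wait_for_job` is a hypothetical helper, and `get_status` stands in for whatever call fetches the job status in your client.

```python
import time
from typing import Any, Callable, Dict


def wait_for_job(
    get_status: Callable[[], Dict[str, Any]],
    interval_seconds: float = 5.0,
    timeout_seconds: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Any]:
    """Poll an async export/import job until it stops reporting RUNNING.

    Returns the first non-RUNNING status payload, or raises TimeoutError
    once the deadline passes instead of looping forever.
    """
    deadline = clock() + timeout_seconds
    while True:
        job = get_status()
        if job.get("status") != "RUNNING":
            return job
        if clock() >= deadline:
            raise TimeoutError(f"job still RUNNING after {timeout_seconds} seconds")
        sleep(interval_seconds)
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting.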
+ + +```python Python SDK +from metadata.sdk import configure +from metadata.sdk.entities import Database +import time + +configure( + host="https://your-company.getcollate.io/api", + jwt_token="your-jwt-token" +) + +# Start async export +exporter = Database.export_csv("snowflake_prod.analytics") +job = exporter.with_async().execute_async() + +print(f"Export job started: {job.id}") + +# Poll for completion +while job.status == "RUNNING": + time.sleep(5) + job = Database.get_export_status(job.id) + print(f"Progress: {job.progress}%") + +# Download result +csv_data = Database.download_export(job.id) +with open("analytics_export.csv", "w") as f: + f.write(csv_data) +``` + + + +```java Java SDK +import org.openmetadata.sdk.entities.Databases; +import java.nio.file.Files; +import java.nio.file.Path; + +// Start async export +ExportJob job = Databases.exportCsv("snowflake_prod.analytics") + .async() + .execute(); + +System.out.println("Job started: " + job.getId()); + +// Poll for completion +while (job.getStatus().equals("RUNNING")) { + Thread.sleep(5000); + job = Databases.getExportStatus(job.getId()); + System.out.println("Progress: " + job.getProgress() + "%"); +} + +// Download result +String csvData = Databases.downloadExport(job.getId()); +Files.writeString(Path.of("analytics_export.csv"), csvData); +``` + + + +```bash cURL +# Start async export +curl -X GET "https://your-company.getcollate.io/api/v1/databases/name/snowflake_prod.analytics/exportAsync" \ + -H "Authorization: Bearer $TOKEN" + +# Response: {"jobId": "export-job-uuid", "status": "RUNNING"} + +# Check status +curl "https://your-company.getcollate.io/api/v1/databases/export/status/{jobId}" \ + -H "Authorization: Bearer $TOKEN" + +# Download when complete +curl "https://your-company.getcollate.io/api/v1/databases/export/download/{jobId}" \ + -H "Authorization: Bearer $TOKEN" \ + -o analytics_export.csv +``` + + +--- + +### CSV Export Format + +The export generates a CSV with these columns: + +| Column | 
Description | Example | +|--------|-------------|---------| +| `name*` | Entity name relative to the exported database (required) | `public.customers` | +| `displayName` | Human-friendly name | `Customer Master` | +| `description` | Markdown description | `Customer data table` | +| `owners` | Owner references (JSON) | `[{"name":"data-team","type":"team"}]` | +| `tags` | Applied tags (JSON) | `["Tier.Tier1","PII.Sensitive"]` | +| `glossaryTerms` | Glossary terms (JSON) | `["BusinessGlossary.Customer"]` | +| `tiers` | Tier classification | `Tier.Tier1` | +| `domain` | Domain assignment | `Sales` | +| `retentionPeriod` | Data retention (ISO 8601) | `P1Y` | +| `extension` | Custom properties (JSON) | `{"costCenter":"CC-1234"}` | + + +```csv Export CSV +name*,displayName,description,owners,tags,glossaryTerms,tiers,domain,retentionPeriod,extension +public,,Production schema for analytics,"[{""name"":""data-platform"",""type"":""team""}]","[""Tier.Tier1""]",[],Tier.Tier1,Sales,P1Y,"{""costCenter"":""CC-1234""}" +public.customers,Customer Master,Core customer dimension table,"[{""name"":""data-platform"",""type"":""team""}]","[""Tier.Tier1"",""PII.Sensitive""]","[""BusinessGlossary.Customer""]",Tier.Tier1,Sales,P1Y, +public.customers.id,,,,,,,,, +public.customers.email,,Customer email address,,"[""PII.Email""]",,,,, +public.customers.name,,Customer full name,,"[""PII.Name""]",,,,, +public.orders,Orders,Transaction orders table,"[{""name"":""data-platform"",""type"":""team""}]","[""Tier.Tier2""]",,Tier.Tier2,Sales,P2Y, +``` + + +--- + +## Import Database + +Import entities from CSV to update or create database metadata in bulk.
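When the import endpoint is called directly, the dry-run response can be checked before a second call applies it. The sketch below assumes the JSON field names shown in the import result examples on this page (`numberOfRowsFailed`, `failedRows`, `rowNumber`, `reason`). `build_import_url` and `summarize_dry_run` are hypothetical helpers.

```python
import urllib.parse
from typing import Any, Dict, List


def build_import_url(base_url: str, fqn: str, dry_run: bool = True) -> str:
    """Build the import URL with an explicit dryRun flag (hypothetical helper)."""
    query = urllib.parse.urlencode({"dryRun": "true" if dry_run else "false"})
    encoded_fqn = urllib.parse.quote(fqn, safe="")
    return f"{base_url.rstrip('/')}/v1/databases/name/{encoded_fqn}/import?{query}"


def summarize_dry_run(result: Dict[str, Any]) -> List[str]:
    """Return readable problems from a parsed import result; empty means clean."""
    problems = [
        f"row {failed.get('rowNumber')}: {failed.get('reason')}"
        for failed in result.get("failedRows", [])
    ]
    # Guard against a non-zero failure count that arrives without row details.
    if result.get("numberOfRowsFailed", 0) and not problems:
        problems.append(f"{result['numberOfRowsFailed']} rows failed (no details returned)")
    return problems
```

One possible flow is to `PUT` the CSV to `build_import_url(..., dry_run=True)`, stop if `summarize_dry_run` returns anything, and only then repeat the request with `dry_run=False`.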
+ +### Import Modes + +| Mode | Description | +|------|-------------| +| `dryRun=true` | Preview changes without applying | +| `dryRun=false` | Apply changes to database | + +### Dry Run (Preview Changes) + + +```python Python SDK +from metadata.sdk import configure +from metadata.sdk.entities import Database + +configure( + host="https://your-company.getcollate.io/api", + jwt_token="your-jwt-token" +) + +# Read CSV file +with open("analytics_import.csv", "r") as f: + csv_data = f.read() + +# Preview changes (dry run) +result = Database.import_data( + "snowflake_prod.analytics", + csv_data, + dry_run=True +) + +print(f"Would create: {result.numberOfRowsCreated}") +print(f"Would update: {result.numberOfRowsUpdated}") +print(f"Would fail: {result.numberOfRowsFailed}") + +# Review any errors +for error in result.failedRows: + print(f"Row {error.rowNumber}: {error.reason}") +``` + + + +```java Java SDK +import org.openmetadata.sdk.entities.Databases; +import java.nio.file.Files; +import java.nio.file.Path; + +String csvData = Files.readString(Path.of("analytics_import.csv")); + +// Preview changes +ImportResult result = Databases.importData( + "snowflake_prod.analytics", + csvData, + true // dryRun +); + +System.out.println("Would create: " + result.getNumberOfRowsCreated()); +System.out.println("Would update: " + result.getNumberOfRowsUpdated()); +System.out.println("Would fail: " + result.getNumberOfRowsFailed()); +``` + + + +```bash cURL +# Dry run - preview changes +curl -X PUT "https://your-company.getcollate.io/api/v1/databases/name/snowflake_prod.analytics/import?dryRun=true" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: text/plain" \ + --data-binary @analytics_import.csv +``` + + +### Apply Import + + +```python Python SDK +from metadata.sdk import configure +from metadata.sdk.entities import Database + +configure( + host="https://your-company.getcollate.io/api", + jwt_token="your-jwt-token" +) + +with open("analytics_import.csv", "r") as f: + 
csv_data = f.read() + +# Apply changes +result = Database.import_data( + "snowflake_prod.analytics", + csv_data, + dry_run=False +) + +print(f"Created: {result.numberOfRowsCreated}") +print(f"Updated: {result.numberOfRowsUpdated}") +print(f"Failed: {result.numberOfRowsFailed}") +print(f"Status: {result.status}") +``` + + + +```java Java SDK +import org.openmetadata.sdk.entities.Databases; +import java.nio.file.Files; +import java.nio.file.Path; + +String csvData = Files.readString(Path.of("analytics_import.csv")); + +// Apply changes +ImportResult result = Databases.importData( + "snowflake_prod.analytics", + csvData, + false // dryRun +); + +System.out.println("Created: " + result.getNumberOfRowsCreated()); +System.out.println("Updated: " + result.getNumberOfRowsUpdated()); +System.out.println("Status: " + result.getStatus()); +``` + + + +```bash cURL +# Apply import +curl -X PUT "https://your-company.getcollate.io/api/v1/databases/name/snowflake_prod.analytics/import?dryRun=false" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: text/plain" \ + --data-binary @analytics_import.csv +``` + + + +```json Import Result +{ + "dryRun": false, + "status": "SUCCESS", + "numberOfRowsProcessed": 156, + "numberOfRowsCreated": 12, + "numberOfRowsUpdated": 144, + "numberOfRowsFailed": 0, + "failedRows": [], + "successRows": [ + "public", + "public.customers", + "public.customers.id", + "public.customers.email", + "..." + ] +} +``` + + +--- + +## Async Import (Large Datasets) + +For large imports (thousands of entities), use async import to avoid timeouts. 
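For very large files, one option is to split the export into per-schema CSVs and import each one separately. This sketch assumes the `name*` column holds names relative to the database, as in the export example on this page (`public`, `public.customers`, `public.customers.id`). Under that assumption, the segment before the first dot is the schema. Names containing quoted dots are not handled, and `split_by_schema` is a hypothetical helper.

```python
import csv
import io
from typing import Dict, List


def split_by_schema(csv_text: str, name_column: str = "name*") -> Dict[str, str]:
    """Split a database export CSV into one CSV string per schema."""
    reader = csv.DictReader(io.StringIO(csv_text))
    groups: Dict[str, List[dict]] = {}
    for row in reader:
        # "public.customers.id" -> "public"
        schema = row[name_column].split(".", 1)[0]
        groups.setdefault(schema, []).append(row)

    batches: Dict[str, str] = {}
    for schema, rows in groups.items():
        out = io.StringIO()
        writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
        writer.writeheader()  # every batch keeps the original header row
        writer.writerows(rows)
        batches[schema] = out.getvalue()
    return batches
```

Each batch can then go through its own dry run and import, so a failure in one schema does not block the others.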
+ + ```python Python SDK +from metadata.sdk import configure +from metadata.sdk.entities import Database +import time + +configure( + host="https://your-company.getcollate.io/api", + jwt_token="your-jwt-token" +) + +with open("large_analytics_import.csv", "r") as f: + csv_data = f.read() + +# Start async import +job = Database.import_async( + "snowflake_prod.analytics", + csv_data +) + +print(f"Import job started: {job.id}") + +# Poll for completion +while job.status == "RUNNING": + time.sleep(5) + job = Database.get_import_status(job.id) + print(f"Progress: {job.progress}%") + +print(f"Final status: {job.status}") +print(f"Rows processed: {job.numberOfRowsProcessed}") +``` + + + +```java Java SDK +import org.openmetadata.sdk.entities.Databases; +import java.nio.file.Files; +import java.nio.file.Path; + +String csvData = Files.readString(Path.of("large_analytics_import.csv")); + +// Start async import +ImportJob job = Databases.importAsync("snowflake_prod.analytics", csvData); +System.out.println("Job started: " + job.getId()); + +// Poll for completion +while (job.getStatus().equals("RUNNING")) { + Thread.sleep(5000); + job = Databases.getImportStatus(job.getId()); + System.out.println("Progress: " + job.getProgress() + "%"); +} + +System.out.println("Final status: " + job.getStatus()); +``` + + + +```bash cURL +# Start async import +curl -X PUT "https://your-company.getcollate.io/api/v1/databases/name/snowflake_prod.analytics/importAsync" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: text/plain" \ + --data-binary @large_analytics_import.csv + +# Response: {"jobId": "job-uuid", "status": "RUNNING"} + +# Check status +curl "https://your-company.getcollate.io/api/v1/databases/import/status/{jobId}" \ + -H "Authorization: Bearer $TOKEN" +``` + + +--- + +## Bulk Update Examples + +### Add Tags to All Tables + + +```python Python SDK +from metadata.sdk import configure +from metadata.sdk.entities import Database +import csv +import io +import json + +configure( +
 host="https://your-company.getcollate.io/api", + jwt_token="your-jwt-token" +) + +# Export current state +csv_data = Database.export("snowflake_prod.analytics") + +# Parse and modify +reader = csv.DictReader(io.StringIO(csv_data)) +rows = list(reader) + +# Add Tier.Tier1 tag to all tables (rows whose name has exactly 2 parts: schema.table) +for row in rows: + name_parts = row['name*'].split('.') + if len(name_parts) == 2: # schema.table format + # The tags column holds a JSON array; parse it safely instead of using eval() + existing_tags = json.loads(row['tags']) if row['tags'] else [] + if 'Tier.Tier1' not in existing_tags: + existing_tags.append('Tier.Tier1') + # Serialize back to JSON (str() would emit single quotes, which is not valid JSON) + row['tags'] = json.dumps(existing_tags) + +# Write modified CSV +output = io.StringIO() +writer = csv.DictWriter(output, fieldnames=reader.fieldnames) +writer.writeheader() +writer.writerows(rows) + +# Import with changes +result = Database.import_data( + "snowflake_prod.analytics", + output.getvalue(), + dry_run=False +) +print(f"Updated {result.numberOfRowsUpdated} entities") +``` + + + +```bash cURL +# 1. Export +curl "https://your-company.getcollate.io/api/v1/databases/name/snowflake_prod.analytics/export" \ + -H "Authorization: Bearer $TOKEN" \ + -o export.csv + +# 2. Edit CSV to add tags (use spreadsheet or script) + +# 3.
Import +curl -X PUT "https://your-company.getcollate.io/api/v1/databases/name/snowflake_prod.analytics/import?dryRun=false" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: text/plain" \ + --data-binary @export.csv +``` + + +### Set Owner for All Entities + + +```python Python SDK +from metadata.sdk import configure +from metadata.sdk.entities import Database +import csv +import io +import json + +configure( + host="https://your-company.getcollate.io/api", + jwt_token="your-jwt-token" +) + +# Export +csv_data = Database.export("snowflake_prod.analytics") + +# Parse +reader = csv.DictReader(io.StringIO(csv_data)) +rows = list(reader) + +# Set owner for all entities +owner_json = json.dumps([{"name": "data-platform", "type": "team"}]) +for row in rows: + row['owners'] = owner_json + +# Write and import +output = io.StringIO() +writer = csv.DictWriter(output, fieldnames=reader.fieldnames) +writer.writeheader() +writer.writerows(rows) + +result = Database.import_data( + "snowflake_prod.analytics", + output.getvalue(), + dry_run=False +) +``` + + + +```bash cURL +# Manual CSV edit to set owners column: +# owners +# "[{""name"":""data-platform"",""type"":""team""}]" +``` + + +### Assign Domain to Database + + +```python Python SDK +from metadata.sdk import configure +from metadata.sdk.entities import Database +import csv +import io + +configure( + host="https://your-company.getcollate.io/api", + jwt_token="your-jwt-token" +) + +csv_data = Database.export("snowflake_prod.analytics") + +reader = csv.DictReader(io.StringIO(csv_data)) +rows = list(reader) + +# Set domain for all entities +for row in rows: + row['domain'] = 'Sales' + +output = io.StringIO() +writer = csv.DictWriter(output, fieldnames=reader.fieldnames) +writer.writeheader() +writer.writerows(rows) + +result = Database.import_data( + "snowflake_prod.analytics", + output.getvalue(), + dry_run=False +) +print(f"Assigned domain to {result.numberOfRowsUpdated} entities") +``` + + + +```bash cURL +# Set 
 domain column to "Sales" for all rows in CSV +``` + + +--- + +## Export Specific Schemas + +Export only specific schemas within a database. + + +```python Python SDK +from metadata.sdk import configure +from metadata.sdk.entities import DatabaseSchema + +configure( + host="https://your-company.getcollate.io/api", + jwt_token="your-jwt-token" +) + +# Export single schema with all tables +csv_data = DatabaseSchema.export("snowflake_prod.analytics.public") + +with open("public_schema_export.csv", "w") as f: + f.write(csv_data) +``` + + + +```java Java SDK +import org.openmetadata.sdk.entities.DatabaseSchemas; +import java.nio.file.Files; +import java.nio.file.Path; + +String csvData = DatabaseSchemas.export("snowflake_prod.analytics.public"); +Files.writeString(Path.of("public_schema_export.csv"), csvData); +``` + + + +```bash cURL +curl "https://your-company.getcollate.io/api/v1/databaseSchemas/name/snowflake_prod.analytics.public/export" \ + -H "Authorization: Bearer $TOKEN" \ + -o public_schema_export.csv +``` + + +--- + +## Export Tables Only + +Export only table-level metadata (no column details).
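If you already have a full database export and only need the table-level rows, you can also filter them client-side. This sketch assumes the same relative naming as the export example on this page (`schema.table` for tables, `schema.table.column` for columns). `table_rows_only` is a hypothetical helper.

```python
import csv
import io


def table_rows_only(csv_text: str, name_column: str = "name*") -> str:
    """Keep only schema.table rows from a database export CSV."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    # Accessing reader.fieldnames reads the header row before iteration starts
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Two dot-separated parts means a table; schemas have one, columns three
        if len(row[name_column].split(".")) == 2:
            writer.writerow(row)
    return out.getvalue()
```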
+ + ```python Python SDK +from metadata.sdk import configure +from metadata.sdk.entities import Database, Table + +configure( + host="https://your-company.getcollate.io/api", + jwt_token="your-jwt-token" +) + +# Export single table +csv_data = Table.export("snowflake_prod.analytics.public.customers") + +# Or export from database with table filter +csv_data = Database.export( + "snowflake_prod.analytics", + entity_type="table" # Only tables, no columns +) +``` + + + +```bash cURL +curl "https://your-company.getcollate.io/api/v1/tables/name/snowflake_prod.analytics.public.customers/export" \ + -H "Authorization: Bearer $TOKEN" +``` + + +--- + +## Import CSV Format Reference + +### Required Fields + +| Field | Description | +|-------|-------------| +| `name*` | Entity fully qualified name (required) | + +### Optional Fields + +| Field | Format | Example | +|-------|--------|---------| +| `displayName` | String | `Customer Master` | +| `description` | Markdown string | `## Overview\nCustomer data` | +| `owners` | JSON array | `[{"name":"team","type":"team"}]` | +| `tags` | JSON array of tag FQNs | `["Tier.Tier1","PII.Sensitive"]` | +| `glossaryTerms` | JSON array | `["Glossary.Term1"]` | +| `tiers` | Tag FQN | `Tier.Tier1` | +| `domain` | Domain name | `Sales` | +| `retentionPeriod` | ISO 8601 duration | `P1Y` | +| `extension` | JSON object | `{"key":"value"}` | + +### CSV Escaping Rules + +- Wrap fields containing commas in double quotes: `"value,with,commas"` +- Escape double quotes by doubling: `"He said ""hello"""` +- JSON values follow the same rule, so every double quote inside them is doubled: `"[{""name"":""team""}]"` + +--- + +## Error Handling + +### Import Errors + + +```json Failed Import +{ + "dryRun": false, + "status": "PARTIAL_SUCCESS", + "numberOfRowsProcessed": 100, + "numberOfRowsCreated": 80, + "numberOfRowsUpdated": 15, + "numberOfRowsFailed": 5, + "failedRows": [ + { + "rowNumber": 23, + "entityName": "public.invalid_table", + "reason": "Entity not found: public.invalid_table" + }, + { + 
"rowNumber": 45, + "entityName": "public.customers", + "reason": "Invalid tag FQN: NonExistent.Tag" + }, + { + "rowNumber": 67, + "entityName": "staging.orders", + "reason": "Invalid JSON in owners column" + } + ] +} +``` + + +### Common Errors + +| Error | Cause | Solution | +|-------|-------|----------| +| `Entity not found` | FQN doesn't exist | Verify entity exists in catalog | +| `Invalid tag FQN` | Tag doesn't exist | Create tag first or fix FQN | +| `Invalid JSON` | Malformed JSON | Check quotes and escaping | +| `Permission denied` | No edit access | Check user permissions | +| `Invalid owner` | Owner doesn't exist | Verify team/user exists | + +--- + +## Best Practices + + +**Tips for Large Imports:** +1. Always run dry-run first to preview changes +2. Use async import for >1000 entities +3. Break very large imports into schema-level batches +4. Export → modify → import workflow preserves existing data + + +### Migration Workflow + +```python +# 1. Export from source +source_csv = Database.export("old_service.old_db") + +# 2. Transform (update service/database names in CSV) +# ... modify CSV ... + +# 3. Dry run on target +result = Database.import_data("new_service.new_db", modified_csv, dry_run=True) + +# 4. 
Review and apply +if result.numberOfRowsFailed == 0: + Database.import_data("new_service.new_db", modified_csv, dry_run=False) +``` + +### Backup Before Bulk Updates + +```python +from datetime import datetime + +from metadata.sdk.entities import Database + +# Always export before bulk modifications +backup = Database.export("snowflake_prod.analytics") +# Colon-free timestamp so the filename is valid on every OS +with open(f"backup_{datetime.now():%Y%m%dT%H%M%S}.csv", "w") as f: + f.write(backup) + +# Then proceed with import +``` diff --git a/api-reference/data-assets/databases/index.mdx b/api-reference/data-assets/databases/index.mdx new file mode 100644 index 00000000..96da2fda --- /dev/null +++ b/api-reference/data-assets/databases/index.mdx @@ -0,0 +1,45 @@ +--- +title: Databases +description: Create and manage database containers within a service +sidebarTitle: Databases +mode: "wide" +--- + +# Databases +A **Database** is a container within a Database Service. It holds Database Schemas, which in turn contain Tables. + + +Entity schema follows the [OpenMetadata Standard](https://openmetadatastandards.org/data-assets/databases/database/). + + +## Entity Hierarchy + +Databases sit between Services and Schemas in the hierarchy: + +``` +DatabaseService +└── Database (this page) + └── DatabaseSchema + └── Table +``` + +## Inheritance + +When you set an **owner** or **domain** on a Database, it is inherited by all child schemas and tables. 
+ +--- + +## API Endpoints + +| Method | Endpoint | Description | +|--------|----------|-------------| +| `PUT` | `/v1/databases` | [Create or update a database](/api-reference/data-assets/databases/create) | +| `GET` | `/v1/databases` | [List databases](/api-reference/data-assets/databases/list) | +| `GET` | `/v1/databases/{id}` | [Get by ID](/api-reference/data-assets/databases/retrieve) | +| `GET` | `/v1/databases/name/{fqn}` | [Get by FQN](/api-reference/data-assets/databases/retrieve) | +| `PATCH` | `/v1/databases/{id}` | [Update a database](/api-reference/data-assets/databases/update) | +| `DELETE` | `/v1/databases/{id}` | [Delete a database](/api-reference/data-assets/databases/delete) | +| `PUT` | `/v1/databases/restore` | Restore a soft-deleted database | +| `GET` | `/v1/databases/name/{fqn}/export` | [Export to CSV](/api-reference/data-assets/databases/import-export) | +| `PUT` | `/v1/databases/name/{fqn}/import` | [Import from CSV](/api-reference/data-assets/databases/import-export) | diff --git a/api-reference/data-assets/databases/list.mdx b/api-reference/data-assets/databases/list.mdx new file mode 100644 index 00000000..8917295b --- /dev/null +++ b/api-reference/data-assets/databases/list.mdx @@ -0,0 +1,217 @@ +--- +title: List Databases +description: List all databases with optional filtering and pagination +sidebarTitle: List +api: GET /v1/databases +--- + +List all databases with optional filtering and pagination. + +## Query Parameters + + + Filter by service fully qualified name. + + + + Filter by domain fully qualified name. + + + + Maximum number of results to return (max: 1000000). + + + + Cursor for backward pagination. + + + + Cursor for forward pagination. + + + + Comma-separated list of fields to include. See [Supported Fields](#supported-fields) below. + + + + Include `all`, `deleted`, or `non-deleted` entities. 
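Results are paged with opaque cursors: each response carries `paging.after`, which you pass back as `after` to fetch the next page. The sketch below shows that loop against the raw endpoint using only the Python standard library. It is an illustration under assumptions, not SDK code. `build_list_url` and `iterate_databases` are hypothetical helpers, `fetch_page` is any function you supply that performs the authenticated GET and returns the decoded JSON, and the loop assumes the last page has no `paging.after` value.

```python
from typing import Callable, Iterator, Optional, Sequence
from urllib.parse import urlencode

BASE_URL = "https://your-company.getcollate.io/api/v1"


def build_list_url(service: Optional[str] = None,
                   fields: Optional[Sequence[str]] = None,
                   limit: int = 50,
                   after: Optional[str] = None,
                   include: Optional[str] = None) -> str:
    """Build a GET /v1/databases URL from the documented query parameters."""
    params = {"limit": limit}
    if service:
        params["service"] = service
    if fields:
        params["fields"] = ",".join(fields)  # comma-separated field list
    if include:
        params["include"] = include  # "all", "deleted", or "non-deleted"
    if after:
        params["after"] = after  # cursor taken from the previous page
    return f"{BASE_URL}/databases?{urlencode(params)}"


def iterate_databases(fetch_page: Callable[[str], dict], **filters) -> Iterator[dict]:
    """Yield every database, following paging.after until it is missing."""
    after = None
    while True:
        page = fetch_page(build_list_url(after=after, **filters))
        yield from page.get("data", [])
        after = page.get("paging", {}).get("after")
        if not after:
            return
```

Injecting `fetch_page` keeps the pagination logic testable without a live server. In practice it could wrap `urllib.request.urlopen` with the `Authorization: Bearer` header.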
+ + + +```bash BASE URL +https://your-company.getcollate.io/api/v1 +``` + + + +```python Python SDK +from metadata.sdk import configure +from metadata.sdk.entities import Database + +configure( + host="https://your-company.getcollate.io/api", + jwt_token="your-jwt-token" +) + +# List all with auto-pagination +for database in Database.list().auto_paging_iterable(): + print(f"{database.fullyQualifiedName}") + +# Filter by service +for database in Database.list(service="snowflake_prod").auto_paging_iterable(): + print(f"{database.name}: {database.description}") + +# With fields +databases = Database.list( + service="snowflake_prod", + fields=["owners", "usageSummary"], + limit=50 +) +``` + + + +```java Java SDK +import org.openmetadata.sdk.entities.Databases; + +// List with auto-pagination +for (Database database : Databases.list().autoPagingIterable()) { + System.out.println(database.getFullyQualifiedName()); +} + +// Filter by service +for (Database database : Databases.list() + .service("snowflake_prod") + .autoPagingIterable()) { + System.out.println(database.getName()); +} +``` + + + +```bash cURL +# List all +curl "https://your-company.getcollate.io/api/v1/databases?limit=50" \ + -H "Authorization: Bearer $TOKEN" + +# Filter by service +curl "https://your-company.getcollate.io/api/v1/databases?service=snowflake_prod&limit=50" \ + -H "Authorization: Bearer $TOKEN" + +# With fields +curl "https://your-company.getcollate.io/api/v1/databases?service=snowflake_prod&fields=owners,usageSummary&limit=50" \ + -H "Authorization: Bearer $TOKEN" +``` + + + +```json Response +{ + "data": [ + { + "id": "550e8400-e29b-41d4-a716-446655440000", + "name": "analytics", + "fullyQualifiedName": "snowflake_prod.analytics", + "displayName": "Analytics Database", + "service": { + "id": "service-uuid", + "type": "databaseService", + "name": "snowflake_prod" + }, + "serviceType": "Snowflake" + }, + { + "id": "660e8400-e29b-41d4-a716-446655440001", + "name": "raw_data", + 
"fullyQualifiedName": "snowflake_prod.raw_data", + "displayName": "Raw Data", + "service": { + "id": "service-uuid", + "type": "databaseService", + "name": "snowflake_prod" + }, + "serviceType": "Snowflake" + } + ], + "paging": { + "after": "eyJsYXN0SWQiOiI2NjBlODQwMC1lMjliLTQxZDQtYTcxNi00NDY2NTU0NDAwMDEifQ==", + "total": 42 + } +} +``` + + +--- + +## Supported Fields + +Use the `fields` parameter to include additional data in the response. By default, only basic fields are returned. + +| Field | Description | +|-------|-------------| +| `owners` | List of owners (teams and users) assigned to the database | +| `tags` | Classification tags applied to the database | +| `domain` | Domain assignment for governance | +| `dataProducts` | Data products this database belongs to | +| `databaseSchemas` | Child schemas within this database | +| `usageSummary` | Usage statistics and query counts | +| `location` | Storage location information | +| `extension` | Custom properties defined on this database | +| `sourceHash` | Hash of the source metadata for change detection | +| `lifeCycle` | Lifecycle information (created, accessed, updated dates) | +| `votes` | User votes and ratings | +| `followers` | Users following this database for updates | + +### Example: Request Multiple Fields + + +```python Python SDK +from metadata.sdk import configure +from metadata.sdk.entities import Database + +configure( + host="https://your-company.getcollate.io/api", + jwt_token="your-jwt-token" +) + +# List with multiple fields +databases = Database.list( + service="snowflake_prod", + fields=["owners", "tags", "domain", "usageSummary"], + limit=50 +) + +for db in databases.entities: + print(f"{db.fullyQualifiedName}") + if db.owners: + print(f" Owners: {[o.name for o in db.owners]}") + if db.tags: + print(f" Tags: {[t.tagFQN for t in db.tags]}") +``` + + + +```java Java SDK +import org.openmetadata.sdk.entities.Databases; + +var result = Databases.list() + .service("snowflake_prod") + 
.fields("owners", "tags", "domain", "usageSummary") + .limit(50) + .execute(); + +for (Database db : result.getData()) { + System.out.println(db.getFullyQualifiedName()); + if (db.getOwners() != null) { + db.getOwners().forEach(o -> System.out.println(" Owner: " + o.getName())); + } +} +``` + + + +```bash cURL +curl "https://your-company.getcollate.io/api/v1/databases?service=snowflake_prod&fields=owners,tags,domain,usageSummary&limit=50" \ + -H "Authorization: Bearer $TOKEN" +``` + diff --git a/api-reference/data-assets/databases/object.mdx b/api-reference/data-assets/databases/object.mdx new file mode 100644 index 00000000..033db6a9 --- /dev/null +++ b/api-reference/data-assets/databases/object.mdx @@ -0,0 +1,191 @@ +--- +title: The Database Object +description: Complete schema for the database entity +sidebarTitle: The Database Object +mode: "wide" +--- + +## Required Fields + + + Unique identifier for the database. **Auto-generated by the server** - do not provide when creating. + + + + Name of the database. Must be unique within the parent service. Pattern: `^[^.]*$`, length 1-256 characters. + + + + Reference to the parent DatabaseService. Provide the service name when creating. + + +## Core Identity Properties + + + Fully qualified name in format `{service}.{database}`. **Auto-generated by the server**. + + + + Human-readable display name for the database. + + + + Rich text description of the database in Markdown format. + + +## Operational Properties + + + Whether this is the default database for the service. Some databases (like MySQL) don't support a database/catalog hierarchy and use a default database. + + + + Whether the database has been soft-deleted. **Auto-managed by the server**. + + + + Data retention period in ISO 8601 duration format (e.g., `P365D` for 365 days, `P1Y` for 1 year). + + + + URL to the database in the source system. + + +## Relationship Properties + + + Type of the parent database service (e.g., Snowflake, BigQuery, Postgres). 
**Auto-populated by the server**. + + + + References to child schemas within this database. **Read-only**, populated when requesting with `fields=databaseSchemas`. + + + + Reference to the storage location for this database. + + +## Governance Properties + + + Owner users or teams. Inherited by all child schemas and tables. + + + + Business domain this database belongs to. Inherited by all child entities. + + + + Data products this database is associated with. + + + + Classification tags applied to this database. + + + + Users following this database for updates. + + + + Upvotes and downvotes from users. + + + + Entity lifecycle metadata including creation and access timestamps. + + + + Certification status (e.g., Bronze, Silver, Gold). + + +## Usage & Profiling + + + Daily, weekly, and monthly usage statistics. **Auto-populated by usage ingestion**. + + + + Profiler configuration with settings: + - `profileSample`: Sample percentage for profiling + - `profileSampleType`: Sampling type (PERCENTAGE or ROWS) + - `sampleDataCount`: Number of sample rows (default: 50) + - `samplingMethodType`: Sampling method (BERNOULLI or SYSTEM) + + +## Versioning Properties + + + Entity version number. **Auto-incremented by the server** on each update. + + + + Last update timestamp in Unix epoch milliseconds. **Auto-managed by the server**. + + + + User who last updated the entity. **Auto-managed by the server**. + + + + Details of changes made in the current version. **Auto-managed by the server**. + + +## Extension Properties + + + Custom properties defined for this entity type. 
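Because `fullyQualifiedName` is derived from the parent service and the `name` (`{service}.{database}`), a client can check a proposed name against the documented constraints before sending a create request. The following is a minimal sketch. It assumes `database_fqn` is a hypothetical local helper and that the service name itself contains no dots. The server still assigns the authoritative FQN.

```python
import re

# Mirrors the documented `name` constraints: no dots, 1-256 characters.
_DATABASE_NAME = re.compile(r"[^.]{1,256}")


def database_fqn(service: str, name: str) -> str:
    """Return the `{service}.{database}` FQN the server is expected to assign.

    Assumes `service` contains no dots; the server remains the source of
    truth for the actual fullyQualifiedName.
    """
    if not _DATABASE_NAME.fullmatch(name):
        raise ValueError(f"invalid database name: {name!r}")
    return f"{service}.{name}"
```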
+ + + +```json The Database Object +{ + "id": "550e8400-e29b-41d4-a716-446655440000", + "name": "analytics", + "fullyQualifiedName": "snowflake_prod.analytics", + "displayName": "Analytics Database", + "description": "Central analytics data warehouse for business intelligence", + "service": { + "id": "service-uuid", + "type": "databaseService", + "name": "snowflake_prod", + "fullyQualifiedName": "snowflake_prod" + }, + "serviceType": "Snowflake", + "default": false, + "retentionPeriod": "P365D", + "owners": [ + { + "id": "team-uuid", + "type": "team", + "name": "data-platform" + } + ], + "domain": { + "id": "domain-uuid", + "type": "domain", + "name": "Analytics" + }, + "tags": [ + { + "tagFQN": "Tier.Tier1", + "source": "Classification" + } + ], + "usageSummary": { + "dailyStats": { + "count": 150, + "percentileRank": 85.5 + }, + "weeklyStats": { + "count": 1050, + "percentileRank": 82.3 + } + }, + "version": 0.2, + "updatedAt": 1704067200000, + "updatedBy": "admin", + "deleted": false +} +``` + diff --git a/api-reference/data-assets/databases/retrieve.mdx b/api-reference/data-assets/databases/retrieve.mdx new file mode 100644 index 00000000..edba6f19 --- /dev/null +++ b/api-reference/data-assets/databases/retrieve.mdx @@ -0,0 +1,135 @@ +--- +title: Retrieve a Database +description: Retrieve a database by ID or fully qualified name +sidebarTitle: Retrieve +api: GET /v1/databases/{id} +--- + +Retrieve a database by ID or fully qualified name. + +Also available: `GET /v1/databases/name/{fqn}` + +## Path Parameters + + + Unique identifier of the database. + + + + Fully qualified name of the database (e.g., `snowflake_prod.analytics`). + + +## Query Parameters + + + Comma-separated list of fields to include: `owners`, `tags`, `domain`, `databaseSchemas`, `usageSummary`, `extension`, `dataProducts`, `lifeCycle`, `votes`, `followers`. + + + + Include `all`, `deleted`, or `non-deleted` entities. 
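The same database can be fetched from `/v1/databases/{id}` or from `/v1/databases/name/{fqn}`, so a client helper can choose the route based on the identifier it receives. The following sketch is not part of any SDK. `database_url` is a hypothetical name. It treats any string that parses as a UUID as an ID and everything else as an FQN, and it percent-encodes the FQN so that unusual characters survive in the URL path.

```python
import uuid
from typing import Optional, Sequence
from urllib.parse import quote, urlencode

BASE_URL = "https://your-company.getcollate.io/api/v1"


def database_url(identifier: str, fields: Optional[Sequence[str]] = None) -> str:
    """Return the retrieve URL for a database given its UUID or its FQN."""
    try:
        uuid.UUID(identifier)
        path = f"/databases/{identifier}"
    except ValueError:
        # Not a UUID, so treat it as a fully qualified name and encode it
        # for safe use as a single path segment.
        path = f"/databases/name/{quote(identifier, safe='')}"
    query = f"?{urlencode({'fields': ','.join(fields)})}" if fields else ""
    return BASE_URL + path + query
```

`urlencode` sends the comma in `fields` as `%2C`. Standard URL decoding on the server should restore it, although the cURL examples on this page show the unencoded form.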
+ + + ```bash BASE URL +https://your-company.getcollate.io/api/v1 +``` + + + +```python Python SDK +from metadata.sdk import configure +from metadata.sdk.entities import Database + +configure( + host="https://your-company.getcollate.io/api", + jwt_token="your-jwt-token" +) + +# By name with fields +database = Database.retrieve_by_name( + "snowflake_prod.analytics", + fields=["owners", "tags", "databaseSchemas", "usageSummary"] +) + +# By ID +database = Database.retrieve("550e8400-e29b-41d4-a716-446655440000") + +print(f"Name: {database.name}") +print(f"FQN: {database.fullyQualifiedName}") +print(f"Service: {database.service.name}") +``` + + + +```java Java SDK +import java.util.List; + +import org.openmetadata.schema.entity.data.Database; +import org.openmetadata.sdk.entities.Databases; + +// By name with fields +Database database = Databases.getByName( + "snowflake_prod.analytics", + List.of("owners", "tags", "databaseSchemas", "usageSummary") +); + +// By ID +Database byId = Databases.get("550e8400-e29b-41d4-a716-446655440000"); + +System.out.println("Name: " + database.getName()); +System.out.println("FQN: " + database.getFullyQualifiedName()); +``` + + + +```bash cURL +# By FQN with fields +curl "https://your-company.getcollate.io/api/v1/databases/name/snowflake_prod.analytics?fields=owners,tags,databaseSchemas,usageSummary" \ + -H "Authorization: Bearer $TOKEN" + +# By ID +curl "https://your-company.getcollate.io/api/v1/databases/550e8400-e29b-41d4-a716-446655440000" \ + -H "Authorization: Bearer $TOKEN" +``` + + + +```json Response +{ + "id": "550e8400-e29b-41d4-a716-446655440000", + "name": "analytics", + "fullyQualifiedName": "snowflake_prod.analytics", + "displayName": "Analytics Database", + "description": "Central analytics data warehouse", + "service": { + "id": "service-uuid", + "type": "databaseService", + "name": "snowflake_prod" + }, + "serviceType": "Snowflake", + "owners": [ + { + "id": "team-uuid", + "type": "team", + "name": "data-platform" + } + ], + "databaseSchemas": [ + { + "id": "schema-uuid-1", + "type": 
"databaseSchema", + "name": "public", + "fullyQualifiedName": "snowflake_prod.analytics.public" + } + ], + "version": 0.2 +} +``` + + +--- + +## Error Handling + +| Code | Error Type | Description | +|------|-----------|-------------| +| `401` | `UNAUTHORIZED` | Invalid or missing authentication token | +| `403` | `FORBIDDEN` | User lacks permission to view this database | +| `404` | `ENTITY_NOT_FOUND` | Database with given ID or FQN not found | diff --git a/api-reference/data-assets/databases/update.mdx b/api-reference/data-assets/databases/update.mdx new file mode 100644 index 00000000..187e28f8 --- /dev/null +++ b/api-reference/data-assets/databases/update.mdx @@ -0,0 +1,383 @@ +--- +title: Update a Database +description: Update database metadata including description, owners, tags, domain, and custom properties +sidebarTitle: Update +api: PATCH /v1/databases/{id} +--- + +
+ +Update database metadata using JSON Patch operations. The SDK handles patch generation automatically. + +## Path Parameters + + + Unique identifier of the database (UUID format). + + +## Common Operations + +### Update Description + +Replace the description text for a database. + + +```python Python +from metadata.sdk import configure +from metadata.sdk.entities import Database + +configure( + host="https://your-company.getcollate.io/api", + jwt_token="your-jwt-token" +) + +database = Database.retrieve_by_name("snowflake_prod.analytics") +database.description = "Central analytics data warehouse" +updated = Database.update(database) +``` + +```java Java +import org.openmetadata.sdk.entities.Databases; + +Database database = Databases.getByName("snowflake_prod.analytics"); +database.setDescription("Central analytics data warehouse"); +Database updated = Databases.update(database.getId(), database); +``` + +```bash cURL +curl -X PATCH "https://your-company.getcollate.io/api/v1/databases/{id}" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json-patch+json" \ + -d '[{"op": "replace", "path": "/description", "value": "Central analytics data warehouse"}]' +``` + + +### Update Display Name + +Set a human-friendly display name that appears in the UI. 
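The SDK's patch generation is not documented here, but the idea can be sketched as follows. Compare the entity JSON before and after your edit, and emit one JSON Patch (RFC 6902) operation per changed top-level field. In JSON Patch, `replace` requires the target member to exist, while `add` creates it or overwrites it. This is presumably why a field that may be absent, such as `displayName`, is set with `add`. The helper below is hypothetical and only diffs top-level keys. A nested change, such as adding a single tag, comes out as a whole-value `replace`.

```python
from typing import Any, Dict, List


def _pointer(key: str) -> str:
    # RFC 6901 escaping: "~" becomes "~0" first, then "/" becomes "~1".
    return "/" + key.replace("~", "~0").replace("/", "~1")


def top_level_patch(before: Dict[str, Any], after: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Diff two entity dicts into JSON Patch operations (top-level keys only)."""
    ops: List[Dict[str, Any]] = []
    for key, new_value in after.items():
        if key not in before:
            ops.append({"op": "add", "path": _pointer(key), "value": new_value})
        elif before[key] != new_value:
            ops.append({"op": "replace", "path": _pointer(key), "value": new_value})
    for key in before:
        if key not in after:
            ops.append({"op": "remove", "path": _pointer(key)})
    return ops
```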
+ + +```python Python +database = Database.retrieve_by_name("snowflake_prod.analytics") +database.displayName = "Analytics Data Warehouse" +updated = Database.update(database) +``` + +```java Java +Database database = Databases.getByName("snowflake_prod.analytics"); +database.setDisplayName("Analytics Data Warehouse"); +Database updated = Databases.update(database.getId(), database); +``` + +```bash cURL +curl -X PATCH "https://your-company.getcollate.io/api/v1/databases/{id}" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json-patch+json" \ + -d '[{"op": "add", "path": "/displayName", "value": "Analytics Data Warehouse"}]' +``` + + +### Set Team Owner + +Assign a team as the owner of the database. + + +```python Python +from metadata.sdk import to_entity_reference, Teams + +database = Database.retrieve_by_name("snowflake_prod.analytics") +team = Teams.retrieve_by_name("data-platform") +database.owners = [to_entity_reference(team)] +updated = Database.update(database) +``` + +```java Java +import org.openmetadata.sdk.entities.Teams; +import org.openmetadata.sdk.EntityReferences; + +Database database = Databases.getByName("snowflake_prod.analytics"); +Team team = Teams.getByName("data-platform"); +database.setOwners(List.of(EntityReferences.from(team))); +Database updated = Databases.update(database.getId(), database); +``` + +```bash cURL +curl -X PATCH "https://your-company.getcollate.io/api/v1/databases/{id}" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json-patch+json" \ + -d '[{"op": "add", "path": "/owners", "value": [{"id": "team-uuid", "type": "team"}]}]' +``` + + +### Set Multiple Owners + +Assign both teams and individual users as co-owners. 
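Per RFC 6902, an `add` operation whose target member already exists replaces that member's value. As a result, the `add /owners` operation in this section's cURL example overwrites the whole owner list instead of appending to it. The following sketch builds that request body from `(type, id)` pairs using a hypothetical `owners_patch` helper. It rejects any type other than `team` and `user`, the two owner types used on this page.

```python
import json
from typing import Iterable, Tuple

# Owner types used in the examples on this page.
OWNER_TYPES = {"team", "user"}


def owners_patch(owners: Iterable[Tuple[str, str]]) -> str:
    """Build a JSON Patch body that sets /owners from (type, id) pairs.

    Because `add` overwrites an existing member, this replaces the full list.
    """
    value = []
    for owner_type, owner_id in owners:
        if owner_type not in OWNER_TYPES:
            raise ValueError(f"unsupported owner type: {owner_type!r}")
        value.append({"id": owner_id, "type": owner_type})
    return json.dumps([{"op": "add", "path": "/owners", "value": value}])
```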
+ + +```python Python +from metadata.sdk import to_entity_reference, Teams, Users + +database = Database.retrieve_by_name("snowflake_prod.analytics") +team = Teams.retrieve_by_name("data-platform") +user = Users.retrieve_by_name("john.doe") +database.owners = [to_entity_reference(team), to_entity_reference(user)] +updated = Database.update(database) +``` + +```java Java +Database database = Databases.getByName("snowflake_prod.analytics"); +Team team = Teams.getByName("data-platform"); +User user = Users.getByName("john.doe"); +database.setOwners(List.of( + EntityReferences.from(team), + EntityReferences.from(user) +)); +Database updated = Databases.update(database.getId(), database); +``` + +```bash cURL +curl -X PATCH "https://your-company.getcollate.io/api/v1/databases/{id}" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json-patch+json" \ + -d '[{"op": "add", "path": "/owners", "value": [ + {"id": "team-uuid", "type": "team"}, + {"id": "user-uuid", "type": "user"} + ]}]' +``` + + +### Remove Owners + +Clear all ownership assignments from the database. + + +```python Python +database = Database.retrieve_by_name("snowflake_prod.analytics") +database.owners = None +updated = Database.update(database) +``` + +```java Java +Database database = Databases.getByName("snowflake_prod.analytics"); +database.setOwners(null); +Database updated = Databases.update(database.getId(), database); +``` + +```bash cURL +curl -X PATCH "https://your-company.getcollate.io/api/v1/databases/{id}" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json-patch+json" \ + -d '[{"op": "remove", "path": "/owners"}]' +``` + + +### Add Tags + +Add classification tags to the database. 
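The cURL example appends to the tag list using the JSON Pointer `/tags/-`, which refers to the end of an existing array. If the entity has no `tags` member, RFC 6902 treats that path as invalid because its parent does not exist. The following hedged sketch chooses between the two forms. It assumes you fetched the database with `fields=tags` and that a missing list comes back as `None`. `add_tag_patch` and `tag_label` are hypothetical helpers, and their label fields are copied from the cURL example.

```python
from typing import Optional


def tag_label(tag_fqn: str) -> dict:
    """Tag label with the same fields as the cURL example."""
    return {
        "tagFQN": tag_fqn,
        "labelType": "Manual",
        "state": "Confirmed",
        "source": "Classification",
    }


def add_tag_patch(existing_tags: Optional[list], tag_fqn: str) -> list:
    """Return JSON Patch operations that add `tag_fqn` to a database."""
    if existing_tags is None:
        # No /tags member yet: "/tags/-" would fail, so create the array.
        return [{"op": "add", "path": "/tags", "value": [tag_label(tag_fqn)]}]
    if any(tag.get("tagFQN") == tag_fqn for tag in existing_tags):
        return []  # Already applied; nothing to send.
    # The array exists, so append to its end.
    return [{"op": "add", "path": "/tags/-", "value": tag_label(tag_fqn)}]
```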
+ + +```python Python +database = Database.retrieve_by_name("snowflake_prod.analytics") +database.add_tag("Tier.Tier1") +updated = Database.update(database) +``` + +```java Java +Database database = Databases.getByName("snowflake_prod.analytics"); +database.addTag("Tier.Tier1"); +Database updated = Databases.update(database.getId(), database); +``` + +```bash cURL +curl -X PATCH "https://your-company.getcollate.io/api/v1/databases/{id}" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json-patch+json" \ + -d '[{"op": "add", "path": "/tags/-", "value": { + "tagFQN": "Tier.Tier1", + "labelType": "Manual", + "state": "Confirmed", + "source": "Classification" + }}]' +``` + + +### Set Domain + +Associate the database with a business domain. + + +```python Python +from metadata.sdk import to_entity_reference, Domains + +database = Database.retrieve_by_name("snowflake_prod.analytics") +domain = Domains.retrieve_by_name("Sales") +database.domain = to_entity_reference(domain) +updated = Database.update(database) +``` + +```java Java +import org.openmetadata.sdk.entities.Domains; + +Database database = Databases.getByName("snowflake_prod.analytics"); +Domain domain = Domains.getByName("Sales"); +database.setDomain(EntityReferences.from(domain)); +Database updated = Databases.update(database.getId(), database); +``` + +```bash cURL +curl -X PATCH "https://your-company.getcollate.io/api/v1/databases/{id}" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json-patch+json" \ + -d '[{"op": "add", "path": "/domain", "value": {"id": "domain-uuid", "type": "domain"}}]' +``` + + +### Set Retention Period + +Define data retention using ISO 8601 duration format: `P30D` (30 days), `P6M` (6 months), `P1Y` (1 year). 
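To catch malformed values before they reach the server, a client can validate the duration locally. The following is a minimal sketch that assumes only the date-based designators (`Y`, `M`, `W`, `D`) are needed. ISO 8601 also allows a time part (for example `PT12H`), which this hypothetical `parse_retention` helper deliberately rejects.

```python
import re

# Date-only subset of ISO 8601 durations: PnYnMnWnD, in that order.
# The lookahead requires at least one number right after "P".
_DURATION = re.compile(r"P(?=\d)(?:(\d+)Y)?(?:(\d+)M)?(?:(\d+)W)?(?:(\d+)D)?")


def parse_retention(value: str) -> dict:
    """Split a date-only ISO 8601 duration into its numeric components."""
    match = _DURATION.fullmatch(value)
    if match is None:
        raise ValueError(f"not a date-only ISO 8601 duration: {value!r}")
    years, months, weeks, days = (int(g) if g else 0 for g in match.groups())
    return {"years": years, "months": months, "weeks": weeks, "days": days}
```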
+ + +```python Python +database = Database.retrieve_by_name("snowflake_prod.analytics") +database.retentionPeriod = "P1Y" +updated = Database.update(database) +``` + +```java Java +Database database = Databases.getByName("snowflake_prod.analytics"); +database.setRetentionPeriod("P1Y"); +Database updated = Databases.update(database.getId(), database); +``` + +```bash cURL +curl -X PATCH "https://your-company.getcollate.io/api/v1/databases/{id}" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json-patch+json" \ + -d '[{"op": "add", "path": "/retentionPeriod", "value": "P1Y"}]' +``` + + +### Multiple Updates + +Update multiple fields in a single request. + + +```python Python +from metadata.sdk import configure, to_entity_reference, Teams, Domains +from metadata.sdk.entities import Database + +configure( + host="https://your-company.getcollate.io/api", + jwt_token="your-jwt-token" +) + +database = Database.retrieve_by_name("snowflake_prod.analytics") + +# Update multiple fields +database.description = "Central analytics data warehouse" +database.displayName = "Analytics DW" + +team = Teams.retrieve_by_name("data-platform") +database.owners = [to_entity_reference(team)] + +domain = Domains.retrieve_by_name("Sales") +database.domain = to_entity_reference(domain) + +database.retentionPeriod = "P1Y" + +# Save all changes +updated = Database.update(database) +``` + +```java Java +import org.openmetadata.sdk.entities.*; +import org.openmetadata.sdk.EntityReferences; + +Database database = Databases.getByName("snowflake_prod.analytics"); + +// Update multiple fields +database.setDescription("Central analytics data warehouse"); +database.setDisplayName("Analytics DW"); + +Team team = Teams.getByName("data-platform"); +database.setOwners(List.of(EntityReferences.from(team))); + +Domain domain = Domains.getByName("Sales"); +database.setDomain(EntityReferences.from(domain)); + +database.setRetentionPeriod("P1Y"); + +// Save all changes +Database updated = 
Databases.update(database.getId(), database); +``` + +```bash cURL +curl -X PATCH "https://your-company.getcollate.io/api/v1/databases/{id}" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json-patch+json" \ + -d '[ + {"op": "replace", "path": "/description", "value": "Central analytics data warehouse"}, + {"op": "add", "path": "/displayName", "value": "Analytics DW"}, + {"op": "add", "path": "/owners", "value": [{"id": "team-uuid", "type": "team"}]}, + {"op": "add", "path": "/domain", "value": {"id": "domain-uuid", "type": "domain"}}, + {"op": "add", "path": "/retentionPeriod", "value": "P1Y"} + ]' +``` + + + +```python Python SDK +from metadata.sdk import configure +from metadata.sdk.entities import Database + +configure( + host="https://your-company.getcollate.io/api", + jwt_token="your-jwt-token" +) + +database = Database.retrieve_by_name("snowflake_prod.analytics") +database.description = "Central analytics data warehouse" +updated = Database.update(database) +``` + +```java Java SDK +import org.openmetadata.sdk.entities.Databases; + +Database database = Databases.getByName("snowflake_prod.analytics"); +database.setDescription("Central analytics data warehouse"); +Database updated = Databases.update(database.getId(), database); +``` + +```bash cURL +curl -X PATCH "https://your-company.getcollate.io/api/v1/databases/{id}" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json-patch+json" \ + -d '[{"op": "replace", "path": "/description", "value": "Central analytics data warehouse"}]' +``` + + + +```json Response +{ + "id": "550e8400-e29b-41d4-a716-446655440000", + "name": "analytics", + "fullyQualifiedName": "snowflake_prod.analytics", + "displayName": "Analytics DW", + "description": "Central analytics data warehouse", + "version": 0.5 +} +``` + + +--- + +## Error Handling + +| Code | Error Type | Description | +|------|-----------|-------------| +| `400` | `BAD_REQUEST` | Invalid patch operation or malformed JSON | +| `401` | `UNAUTHORIZED` | Invalid or missing authentication token | +| `403` | `FORBIDDEN` | User 
lacks permission to update this database | +| `404` | `ENTITY_NOT_FOUND` | Database with given ID not found | +| `412` | `PRECONDITION_FAILED` | Version conflict - fetch latest version and retry | + +
diff --git a/api-reference/data-assets/pipelines/create.mdx b/api-reference/data-assets/pipelines/create.mdx new file mode 100644 index 00000000..fc30b467 --- /dev/null +++ b/api-reference/data-assets/pipelines/create.mdx @@ -0,0 +1,149 @@ +--- +title: Create a Pipeline +description: Create a new pipeline within a pipeline service +sidebarTitle: Create +--- + +Create a new pipeline within a pipeline service. + +## Endpoint + +``` +PUT /v1/pipelines +``` + +## Parameters + + + Name of the pipeline. Must be unique within the parent service. + + + + Name of the parent PipelineService. + + + + Human-readable display name for the pipeline. + + + + URL to the pipeline in the source system. + + + + Cron schedule expression for the pipeline. + + + + Array of tasks/operators in the pipeline. + + + + Description of the pipeline in Markdown format. + + + +```python Python +from metadata.sdk import configure +from metadata.sdk.entities import Pipeline +from metadata.generated.schema.api.data.createPipeline import CreatePipelineRequest +from metadata.generated.schema.entity.data.pipeline import Task + +configure( + host="https://your-company.getcollate.io/api", + jwt_token="your-jwt-token" +) + +request = CreatePipelineRequest( + name="etl_daily_sales", + displayName="Daily Sales ETL", + service="airflow_prod", + pipelineUrl="https://airflow.company.com/dags/etl_daily_sales", + scheduleInterval="0 2 * * *", + tasks=[ + Task(name="extract_orders", displayName="Extract Orders", description="Extract from source"), + Task(name="transform_data", displayName="Transform Data", description="Apply transformations"), + Task(name="load_warehouse", displayName="Load Warehouse", description="Load to DW") + ], + description="Daily ETL pipeline for sales data processing" +) + +pipeline = Pipeline.create(request) +print(f"Created: {pipeline.fullyQualifiedName}") +``` + +```java Java +import org.openmetadata.sdk.OpenMetadataClient; +import org.openmetadata.sdk.config.OpenMetadataConfig; +import 
org.openmetadata.sdk.entities.Pipelines; +import org.openmetadata.schema.api.data.CreatePipelineRequest; +import org.openmetadata.schema.entity.data.Pipeline; +import org.openmetadata.schema.entity.data.Task; +import java.util.Arrays; + +OpenMetadataConfig config = OpenMetadataConfig.builder() + .serverUrl("https://your-company.getcollate.io/api") + .accessToken("your-token") + .build(); +OpenMetadataClient client = new OpenMetadataClient(config); +Pipelines.setDefaultClient(client); + +CreatePipelineRequest request = new CreatePipelineRequest() + .withName("etl_daily_sales") + .withDisplayName("Daily Sales ETL") + .withService("airflow_prod") + .withPipelineUrl("https://airflow.company.com/dags/etl_daily_sales") + .withScheduleInterval("0 2 * * *") + .withTasks(Arrays.asList( + new Task().withName("extract_orders").withDisplayName("Extract Orders"), + new Task().withName("transform_data").withDisplayName("Transform Data"), + new Task().withName("load_warehouse").withDisplayName("Load Warehouse") + )) + .withDescription("Daily ETL pipeline for sales data processing"); + +Pipeline pipeline = Pipelines.create(request); +System.out.println("Created: " + pipeline.getFullyQualifiedName()); +``` + +```bash cURL +curl -X PUT "https://your-company.getcollate.io/api/v1/pipelines" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json" \ + -d '{ + "name": "etl_daily_sales", + "displayName": "Daily Sales ETL", + "service": "airflow_prod", + "pipelineUrl": "https://airflow.company.com/dags/etl_daily_sales", + "scheduleInterval": "0 2 * * *", + "tasks": [ + {"name": "extract_orders", "displayName": "Extract Orders", "description": "Extract from source"}, + {"name": "transform_data", "displayName": "Transform Data", "description": "Apply transformations"}, + {"name": "load_warehouse", "displayName": "Load Warehouse", "description": "Load to DW"} + ], + "description": "Daily ETL pipeline for sales data processing" + }' +``` + + + +```json Response +{ + "id": "550e8400-e29b-41d4-a716-446655440000", + "name": "etl_daily_sales", + 
"fullyQualifiedName": "airflow_prod.etl_daily_sales", + "displayName": "Daily Sales ETL", + "description": "Daily ETL pipeline for sales data processing", + "pipelineUrl": "https://airflow.company.com/dags/etl_daily_sales", + "scheduleInterval": "0 2 * * *", + "service": { + "id": "service-uuid", + "type": "pipelineService", + "name": "airflow_prod" + }, + "tasks": [ + {"name": "extract_orders", "displayName": "Extract Orders"}, + {"name": "transform_data", "displayName": "Transform Data"}, + {"name": "load_warehouse", "displayName": "Load Warehouse"} + ], + "version": 0.1 +} +``` + diff --git a/api-reference/data-assets/pipelines/delete.mdx b/api-reference/data-assets/pipelines/delete.mdx new file mode 100644 index 00000000..e5da57bc --- /dev/null +++ b/api-reference/data-assets/pipelines/delete.mdx @@ -0,0 +1,78 @@ +--- +title: Delete a Pipeline +description: Delete a pipeline. Use hardDelete=true to permanently remove +sidebarTitle: Delete +--- + +Delete a pipeline. Use `hardDelete=true` to permanently remove. + +## Endpoint + +``` +DELETE /v1/pipelines/{id} +``` + +## Path Parameters + + + Unique identifier of the pipeline. + + +## Query Parameters + + + Set to `true` to permanently delete (cannot be restored). 
+ + + +```python Python +from metadata.sdk import configure +from metadata.sdk.entities import Pipeline + +configure( + host="https://your-company.getcollate.io/api", + jwt_token="your-jwt-token" +) + +# Soft delete (can be restored) +Pipeline.delete("550e8400-e29b-41d4-a716-446655440000") + +# Hard delete +Pipeline.delete( + "550e8400-e29b-41d4-a716-446655440000", + hard_delete=True +) +``` + +```java Java +import org.openmetadata.sdk.entities.Pipelines; + +// Soft delete +Pipelines.delete("550e8400-e29b-41d4-a716-446655440000"); + +// Hard delete +Pipelines.delete("550e8400-e29b-41d4-a716-446655440000", false, true); +``` + +```bash cURL +# Soft delete +curl -X DELETE "https://your-company.getcollate.io/api/v1/pipelines/{id}" \ + -H "Authorization: Bearer $TOKEN" + +# Hard delete +curl -X DELETE "https://your-company.getcollate.io/api/v1/pipelines/{id}?hardDelete=true" \ + -H "Authorization: Bearer $TOKEN" +``` + + + +```json Response +{ + "id": "550e8400-e29b-41d4-a716-446655440000", + "name": "etl_daily_sales", + "fullyQualifiedName": "airflow_prod.etl_daily_sales", + "deleted": true, + "version": 0.3 +} +``` + diff --git a/api-reference/data-assets/pipelines/index.mdx b/api-reference/data-assets/pipelines/index.mdx new file mode 100644 index 00000000..ad7863ee --- /dev/null +++ b/api-reference/data-assets/pipelines/index.mdx @@ -0,0 +1,56 @@ +--- +title: Pipelines +description: Create and manage data pipeline entities +sidebarTitle: Pipelines +mode: "wide" +--- + +**Pipelines** represent data workflows and orchestration jobs from platforms like Airflow, Dagster, and dbt. + + +Entity schema follows the [OpenMetadata Standard](https://openmetadatastandards.org/metadata-specifications/entities/pipeline/). + + +## Entity Hierarchy + +Pipelines are children of Pipeline Services: + +``` +PipelineService +└── Pipeline (this page) + └── Task +``` + +## Inheritance + +When you set an **owner** or **domain** on a Pipeline Service, it is inherited by all child pipelines. 
+ +--- + +## API Endpoints + +| Method | Endpoint | Description | +|--------|----------|-------------| +| `PUT` | `/v1/pipelines` | [Create or update a pipeline](/api-reference/data-assets/pipelines/create) | +| `GET` | `/v1/pipelines` | [List pipelines](/api-reference/data-assets/pipelines/list) | +| `GET` | `/v1/pipelines/{id}` | [Get by ID](/api-reference/data-assets/pipelines/retrieve) | +| `GET` | `/v1/pipelines/name/{fqn}` | [Get by fully qualified name](/api-reference/data-assets/pipelines/retrieve) | +| `PATCH` | `/v1/pipelines/{id}` | [Update a pipeline](/api-reference/data-assets/pipelines/update) | +| `DELETE` | `/v1/pipelines/{id}` | [Delete a pipeline](/api-reference/data-assets/pipelines/delete) | +| `PUT` | `/v1/pipelines/{fqn}/status` | Add a pipeline run status (addressed by fully qualified name) | + +--- + +## Related + + + + View pipeline object attributes + + + Configure pipeline service connections + + + Track pipeline data lineage + + diff --git a/api-reference/data-assets/pipelines/list.mdx b/api-reference/data-assets/pipelines/list.mdx new file mode 100644 index 00000000..a240e3d1 --- /dev/null +++ b/api-reference/data-assets/pipelines/list.mdx @@ -0,0 +1,124 @@ +--- +title: List Pipelines +description: List all pipelines with optional filtering and pagination +sidebarTitle: List +--- + +List all pipelines with optional filtering and pagination. + +## Endpoint + +``` +GET /v1/pipelines +``` + +## Query Parameters + + + Filter by service name. + + + + Maximum number of results to return (max: 1000000). + + + + Cursor for backward pagination. + + + + Cursor for forward pagination. + + + + Comma-separated list of fields to include. + + + + Include `all`, `deleted`, or `non-deleted` entities.
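The `before`/`after` cursors are opaque strings returned in the response's `paging` object. A minimal sketch of forward pagination, assuming only what the response example on this page shows (a `data` array plus a `paging.after` cursor that is absent on the last page). `fetch_page` is a hypothetical callable that you would back with an authenticated `GET /v1/pipelines?limit=...&after=...` request; here it is replaced by fake pages so the loop logic can be seen in isolation.

```python
from typing import Any, Callable, Dict, Iterator, Optional

# A page is the parsed JSON body of GET /v1/pipelines: {"data": [...], "paging": {...}}
Page = Dict[str, Any]


def iterate_all(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[Dict[str, Any]]:
    """Yield every entity, following paging.after until the server stops returning it.

    fetch_page is hypothetical: it should request one page, passing the cursor
    as the `after` query parameter (None for the first request).
    """
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)
        yield from page.get("data", [])
        cursor = page.get("paging", {}).get("after")
        if not cursor:  # no forward cursor means this was the last page
            break


if __name__ == "__main__":
    # Two fake pages standing in for real API responses
    pages = {
        None: {"data": [{"name": "etl_daily_sales"}], "paging": {"after": "c1", "total": 2}},
        "c1": {"data": [{"name": "etl_hourly_events"}], "paging": {"total": 2}},
    }
    names = [p["name"] for p in iterate_all(lambda after: pages[after])]
    print(names)  # ['etl_daily_sales', 'etl_hourly_events']
```

With the real API, `fetch_page` would also send the `Authorization: Bearer` header; the SDKs' `auto_paging_iterable()` / `autoPagingIterable()` helpers presumably perform an equivalent loop for you.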
+ + + +```python Python +from metadata.sdk import configure +from metadata.sdk.entities import Pipeline + +configure( + host="https://your-company.getcollate.io/api", + jwt_token="your-jwt-token" +) + +# List all pipelines with auto-pagination +for pipeline in Pipeline.list().auto_paging_iterable(): + print(f"{pipeline.fullyQualifiedName}: {pipeline.description}") + +# Filter by service +for pipeline in Pipeline.list(service="airflow_prod").auto_paging_iterable(): + print(f"{pipeline.name}") +``` + +```java Java +import org.openmetadata.sdk.entities.Pipelines; + +// List with auto-pagination +for (Pipeline pipeline : Pipelines.list().autoPagingIterable()) { + System.out.println(pipeline.getFullyQualifiedName() + ": " + pipeline.getDescription()); +} + +// Filter by service +for (Pipeline pipeline : Pipelines.list().service("airflow_prod").autoPagingIterable()) { + System.out.println(pipeline.getName()); +} +``` + +```bash cURL +# List all pipelines +curl "https://your-company.getcollate.io/api/v1/pipelines?limit=50" \ + -H "Authorization: Bearer $TOKEN" + +# Filter by service +curl "https://your-company.getcollate.io/api/v1/pipelines?service=airflow_prod&limit=50" \ + -H "Authorization: Bearer $TOKEN" + +# Include tasks +curl "https://your-company.getcollate.io/api/v1/pipelines?fields=tasks,owners,tags&limit=50" \ + -H "Authorization: Bearer $TOKEN" +``` + + + +```json Response +{ + "data": [ + { + "id": "550e8400-e29b-41d4-a716-446655440000", + "name": "etl_daily_sales", + "fullyQualifiedName": "airflow_prod.etl_daily_sales", + "displayName": "Daily Sales ETL", + "scheduleInterval": "0 2 * * *", + "service": { + "id": "service-uuid", + "type": "pipelineService", + "name": "airflow_prod" + } + }, + { + "id": "660e8400-e29b-41d4-a716-446655440001", + "name": "etl_hourly_events", + "fullyQualifiedName": "airflow_prod.etl_hourly_events", + "displayName": "Hourly Events ETL", + "scheduleInterval": "0 * * * *", + "service": { + "id": "service-uuid", + "type": 
"pipelineService", + "name": "airflow_prod" + } + } + ], + "paging": { + "after": "cursor-string", + "total": 45 + } +} +``` + diff --git a/api-reference/data-assets/pipelines/object.mdx b/api-reference/data-assets/pipelines/object.mdx new file mode 100644 index 00000000..6c440812 --- /dev/null +++ b/api-reference/data-assets/pipelines/object.mdx @@ -0,0 +1,110 @@ +--- +title: The Pipeline Object +description: Attributes of the pipeline entity +sidebarTitle: The Pipeline Object +mode: "wide" +--- + + + Unique identifier for the pipeline. + + + + Name of the pipeline. Must be unique within the parent service. + + + + Fully qualified name in format `{service}.{pipeline}`. + + + + Human-readable display name for the pipeline. + + + + Description of the pipeline in Markdown format. + + + + URL to the pipeline in the source system. + + + + Cron schedule expression for the pipeline. + + + + Pipeline start date. + + + + Tasks/operators in the pipeline. + + + + Reference to the parent PipelineService. + + + + Owners of the pipeline (users or teams). + + + + Domain this pipeline belongs to. + + + + Tags and classifications applied to this pipeline. + + + + Entity version number, incremented on updates. + + + + Whether the pipeline has been soft-deleted. 
+ + + +```json The Pipeline Object +{ + "id": "550e8400-e29b-41d4-a716-446655440000", + "name": "etl_daily_sales", + "fullyQualifiedName": "airflow_prod.etl_daily_sales", + "displayName": "Daily Sales ETL", + "description": "Daily pipeline to process sales data", + "pipelineUrl": "https://airflow.company.com/dags/etl_daily_sales", + "service": { + "id": "service-uuid", + "type": "pipelineService", + "name": "airflow_prod" + }, + "tasks": [ + { + "name": "extract_orders", + "displayName": "Extract Orders", + "description": "Extract orders from source database" + }, + { + "name": "transform_data", + "displayName": "Transform Data", + "description": "Apply business logic transformations" + }, + { + "name": "load_warehouse", + "displayName": "Load Warehouse", + "description": "Load transformed data to warehouse" + } + ], + "scheduleInterval": "0 2 * * *", + "owners": [ + { + "id": "user-uuid", + "type": "user" + } + ], + "version": 0.1, + "deleted": false +} +``` + diff --git a/api-reference/data-assets/pipelines/retrieve.mdx b/api-reference/data-assets/pipelines/retrieve.mdx new file mode 100644 index 00000000..16b1c1f8 --- /dev/null +++ b/api-reference/data-assets/pipelines/retrieve.mdx @@ -0,0 +1,114 @@ +--- +title: Retrieve a Pipeline +description: Retrieve a pipeline by ID or fully qualified name +sidebarTitle: Retrieve +--- + +Retrieve a pipeline by ID or fully qualified name. + +## Endpoints + +``` +GET /v1/pipelines/{id} +GET /v1/pipelines/name/{fqn} +``` + +## Path Parameters + + + Unique identifier of the pipeline. + + + + Fully qualified name of the pipeline (e.g., `airflow_prod.etl_daily_sales`). + + +## Query Parameters + + + Comma-separated list of fields to include. Options: `tasks`, `owners`, `tags`, `domain`, `pipelineStatus`, `usageSummary`. 
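The `/name/{fqn}` endpoint takes the fully qualified name as a single path segment. A minimal sketch of building that URL, assuming the convention (used by OpenMetadata, but verify against your server version) that a name part containing a dot is wrapped in double quotes so it is not mistaken for an FQN separator. `build_fqn`, `quote_name`, and `retrieve_by_name_url` are hypothetical helpers, not SDK functions, and names that themselves contain double quotes are not handled.

```python
from typing import Sequence
from urllib.parse import quote, urlencode

# Placeholder base URL taken from the examples on this page
BASE_URL = "https://your-company.getcollate.io/api"


def quote_name(name: str) -> str:
    """Wrap a name part in double quotes if it contains a dot (assumed FQN convention)."""
    return f'"{name}"' if "." in name else name


def build_fqn(*parts: str) -> str:
    """Join service and entity names into a fully qualified name."""
    return ".".join(quote_name(part) for part in parts)


def retrieve_by_name_url(fqn: str, fields: Sequence[str] = ()) -> str:
    """Build the URL for GET /v1/pipelines/name/{fqn} with an optional fields list."""
    # Encode the whole FQN as one path segment; dots and underscores stay as-is,
    # while quotes, spaces and slashes are percent-encoded.
    url = f"{BASE_URL}/v1/pipelines/name/{quote(fqn, safe='')}"
    if fields:
        # Keep commas literal so the query matches the cURL examples
        url += "?" + urlencode({"fields": ",".join(fields)}, safe=",")
    return url


fqn = build_fqn("airflow_prod", "etl_daily_sales")
print(retrieve_by_name_url(fqn, ["tasks", "owners", "tags"]))
# https://your-company.getcollate.io/api/v1/pipelines/name/airflow_prod.etl_daily_sales?fields=tasks,owners,tags
```

Percent-encoding the FQN matters mainly for quoted or unusual names; a plain name such as `airflow_prod.etl_daily_sales` passes through unchanged.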
+ + + ```python Python +from metadata.sdk import configure +from metadata.sdk.entities import Pipeline + +configure( + host="https://your-company.getcollate.io/api", + jwt_token="your-jwt-token" +) + +# By name, requesting extra fields +pipeline = Pipeline.retrieve_by_name( + "airflow_prod.etl_daily_sales", + fields=["tasks", "owners", "tags", "pipelineStatus"] +) + +# By ID (no extra fields requested) +pipeline_by_id = Pipeline.retrieve("550e8400-e29b-41d4-a716-446655440000") + +print(f"Name: {pipeline.displayName}") +print(f"Schedule: {pipeline.scheduleInterval}") +print(f"Tasks: {len(pipeline.tasks) if pipeline.tasks else 0}") +``` + +```java Java +import java.util.List; +import org.openmetadata.schema.entity.data.Pipeline; +import org.openmetadata.sdk.entities.Pipelines; + +// By name, requesting extra fields +Pipeline pipeline = Pipelines.retrieveByName( + "airflow_prod.etl_daily_sales", + List.of("tasks", "owners", "tags", "pipelineStatus") +); + +// By ID (no extra fields requested) +Pipeline pipelineById = Pipelines.retrieve("550e8400-e29b-41d4-a716-446655440000"); + +System.out.println("Name: " + pipeline.getDisplayName()); +System.out.println("Schedule: " + pipeline.getScheduleInterval()); +System.out.println("Tasks: " + (pipeline.getTasks() != null ?
pipeline.getTasks().size() : 0)); +``` + +```bash cURL +# By FQN +curl "https://your-company.getcollate.io/api/v1/pipelines/name/airflow_prod.etl_daily_sales?fields=tasks,owners,tags" \ + -H "Authorization: Bearer $TOKEN" + +# By ID +curl "https://your-company.getcollate.io/api/v1/pipelines/550e8400-e29b-41d4-a716-446655440000" \ + -H "Authorization: Bearer $TOKEN" +``` + + + +```json Response +{ + "id": "550e8400-e29b-41d4-a716-446655440000", + "name": "etl_daily_sales", + "fullyQualifiedName": "airflow_prod.etl_daily_sales", + "displayName": "Daily Sales ETL", + "description": "Daily pipeline to process sales data", + "pipelineUrl": "https://airflow.company.com/dags/etl_daily_sales", + "service": { + "id": "service-uuid", + "type": "pipelineService", + "name": "airflow_prod" + }, + "tasks": [ + {"name": "extract_orders", "displayName": "Extract Orders"}, + {"name": "transform_data", "displayName": "Transform Data"}, + {"name": "load_warehouse", "displayName": "Load Warehouse"} + ], + "scheduleInterval": "0 2 * * *", + "owners": [ + { + "id": "user-uuid", + "type": "user", + "name": "data.engineer" + } + ], + "version": 0.1 +} +``` + diff --git a/api-reference/data-assets/pipelines/update.mdx b/api-reference/data-assets/pipelines/update.mdx new file mode 100644 index 00000000..7e4d6df8 --- /dev/null +++ b/api-reference/data-assets/pipelines/update.mdx @@ -0,0 +1,114 @@ +--- +title: Update a Pipeline +description: Update a pipeline using JSON Patch operations +sidebarTitle: Update +--- + +Update a pipeline using JSON Patch operations. + +## Endpoint + +``` +PATCH /v1/pipelines/{id} +``` + +## Path Parameters + + + Unique identifier of the pipeline. + + +## Request Body + +JSON Patch document following [RFC 6902](https://tools.ietf.org/html/rfc6902). 
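A JSON Patch body is a JSON array of operation objects, and each `path` is a JSON Pointer (RFC 6901) in which `~` is escaped as `~0` and `/` as `~1`. A minimal sketch that builds the same patch documents the cURL examples send; `pointer`, `replace_op`, and `add_op` are hypothetical helpers, not part of either SDK, and the `extension` key used in the usage check is illustrative only.

```python
import json
from typing import Any, Dict, List


def escape_pointer_token(token: str) -> str:
    # RFC 6901: escape "~" first, otherwise the "~1" produced for "/" would be re-escaped
    return token.replace("~", "~0").replace("/", "~1")


def pointer(*tokens: str) -> str:
    """Build a JSON Pointer such as /description from unescaped path tokens."""
    return "".join("/" + escape_pointer_token(t) for t in tokens)


def replace_op(path: str, value: Any) -> Dict[str, Any]:
    return {"op": "replace", "path": path, "value": value}


def add_op(path: str, value: Any) -> Dict[str, Any]:
    return {"op": "add", "path": path, "value": value}


patch: List[Dict[str, Any]] = [
    replace_op(pointer("description"), "Critical daily ETL pipeline"),
    replace_op(pointer("scheduleInterval"), "0 3 * * *"),
    add_op(pointer("owners"), [{"id": "team-uuid", "type": "team"}]),
]

# Body for: curl -X PATCH ... -H "Content-Type: application/json-patch+json" -d "$BODY"
print(json.dumps(patch, indent=2))
```

`add` is the safer choice for optional fields such as `/owners`: under RFC 6902, `replace` fails when the target member does not exist, whereas `add` on an existing object member simply overwrites it.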
+ + +```python Python +from metadata.sdk import configure +from metadata.sdk.entities import Pipeline, Team + +configure( + host="https://your-company.getcollate.io/api", + jwt_token="your-jwt-token" +) + +# Update description +pipeline = Pipeline.retrieve_by_name("airflow_prod.etl_daily_sales") +pipeline.description = "Critical daily ETL pipeline for sales reporting" +updated = Pipeline.update(pipeline.id, pipeline) + +# Set owner +team = Team.retrieve_by_name("data-engineering") +pipeline.owners = [{"id": str(team.id), "type": "team"}] +updated = Pipeline.update(pipeline.id, pipeline) + +# Update schedule +pipeline.scheduleInterval = "0 3 * * *" # Changed to 3 AM +updated = Pipeline.update(pipeline.id, pipeline) +``` + +```java Java +import java.util.List; +import org.openmetadata.schema.entity.data.Pipeline; +import org.openmetadata.schema.entity.teams.Team; +import org.openmetadata.sdk.entities.Pipelines; +import org.openmetadata.sdk.entities.Teams; +import org.openmetadata.schema.type.EntityReference; + +// Update description +Pipeline pipeline = Pipelines.retrieveByName("airflow_prod.etl_daily_sales"); +pipeline.setDescription("Critical daily ETL pipeline for sales reporting"); +Pipeline updated = Pipelines.update(pipeline.getId(), pipeline); + +// Set owner +Team team = Teams.retrieveByName("data-engineering"); +pipeline.setOwners(List.of( + new EntityReference() + .withId(team.getId()) + .withType("team") +)); +updated = Pipelines.update(pipeline.getId(), pipeline); + +// Update schedule +pipeline.setScheduleInterval("0 3 * * *"); // Changed to 3 AM +updated = Pipelines.update(pipeline.getId(), pipeline); +``` + +```bash cURL +# Update description +curl -X PATCH "https://your-company.getcollate.io/api/v1/pipelines/{id}" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json-patch+json" \ + -d '[ + {"op": "replace", "path": "/description", "value": "Critical daily ETL pipeline"} + ]' + +# Set owner +curl -X PATCH "https://your-company.getcollate.io/api/v1/pipelines/{id}" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json-patch+json" \ + -d '[ + {"op": "add", "path": "/owners", "value": [{"id": "team-uuid", "type": "team"}]} + ]' + +# Update schedule +curl -X
PATCH "https://your-company.getcollate.io/api/v1/pipelines/{id}" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json-patch+json" \ + -d '[ + {"op": "replace", "path": "/scheduleInterval", "value": "0 3 * * *"} + ]' +``` + + + +```json Response +{ + "id": "550e8400-e29b-41d4-a716-446655440000", + "name": "etl_daily_sales", + "fullyQualifiedName": "airflow_prod.etl_daily_sales", + "description": "Critical daily ETL pipeline for sales reporting", + "scheduleInterval": "0 3 * * *", + "owners": [ + { + "id": "team-uuid", + "type": "team", + "name": "data-engineering" + } + ], + "version": 0.2 +} +``` + diff --git a/api-reference/data-assets/schemas/create.mdx b/api-reference/data-assets/schemas/create.mdx new file mode 100644 index 00000000..ef226036 --- /dev/null +++ b/api-reference/data-assets/schemas/create.mdx @@ -0,0 +1,107 @@ +--- +title: Create a Database Schema +description: Create a new schema within a database +sidebarTitle: Create +--- + +Create a new database schema within a database. + +## Endpoint + +``` +PUT /v1/databaseSchemas +``` + +## Parameters + + + Name of the schema. Must be unique within the parent database. + + + + Fully qualified name of the parent Database. + + + + Human-readable display name for the schema. + + + + Description of the schema in Markdown format.
+ + + ```python Python +from metadata.sdk import configure +from metadata.sdk.entities import DatabaseSchema +from metadata.generated.schema.api.data.createDatabaseSchema import CreateDatabaseSchemaRequest + +configure( + host="https://your-company.getcollate.io/api", + jwt_token="your-jwt-token" +) + +request = CreateDatabaseSchemaRequest( + name="public", + displayName="Public Schema", + database="snowflake_prod.analytics", + description="Default schema for production analytics tables" +) + +schema = DatabaseSchema.create(request) +print(f"Created: {schema.fullyQualifiedName}") +``` + +```java Java +import org.openmetadata.sdk.OpenMetadataClient; +import org.openmetadata.sdk.config.OpenMetadataConfig; +import org.openmetadata.sdk.entities.DatabaseSchemas; +import org.openmetadata.schema.api.data.CreateDatabaseSchema; +import org.openmetadata.schema.entity.data.DatabaseSchema; + +OpenMetadataConfig config = OpenMetadataConfig.builder() + .serverUrl("https://your-company.getcollate.io/api") + .accessToken("your-token") + .build(); +OpenMetadataClient client = new OpenMetadataClient(config); +DatabaseSchemas.setDefaultClient(client); + +CreateDatabaseSchema request = new CreateDatabaseSchema() + .withName("public") + .withDisplayName("Public Schema") + .withDatabase("snowflake_prod.analytics") + .withDescription("Default schema for production analytics tables"); + +DatabaseSchema schema = DatabaseSchemas.create(request); +System.out.println("Created: " + schema.getFullyQualifiedName()); +``` + +```bash cURL +curl -X PUT "https://your-company.getcollate.io/api/v1/databaseSchemas" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json" \ + -d '{ + "name": "public", + "displayName": "Public Schema", + "database": "snowflake_prod.analytics", + "description": "Default schema for production analytics tables" + }' +``` + + + +```json Response +{ + "id": "550e8400-e29b-41d4-a716-446655440000", + "name": "public", + "fullyQualifiedName": "snowflake_prod.analytics.public", + "displayName":
"Public Schema", + "description": "Default schema for production analytics tables", + "database": { + "id": "database-uuid", + "type": "database", + "name": "analytics" + }, + "version": 0.1 +} +``` + diff --git a/api-reference/data-assets/schemas/delete.mdx b/api-reference/data-assets/schemas/delete.mdx new file mode 100644 index 00000000..ff0f45a8 --- /dev/null +++ b/api-reference/data-assets/schemas/delete.mdx @@ -0,0 +1,87 @@ +--- +title: Delete a Database Schema +description: Delete a schema. Use hardDelete=true to permanently remove +sidebarTitle: Delete +--- + +Delete a database schema. Use `hardDelete=true` to permanently remove. + +## Endpoint + +``` +DELETE /v1/databaseSchemas/{id} +``` + +## Path Parameters + + + Unique identifier of the schema. + + +## Query Parameters + + + Set to `true` to permanently delete (cannot be restored). + + + + Set to `true` to also delete all child tables. + + + +```python Python +from metadata.sdk import configure +from metadata.sdk.entities import DatabaseSchema + +configure( + host="https://your-company.getcollate.io/api", + jwt_token="your-jwt-token" +) + +# Soft delete (can be restored) +DatabaseSchema.delete("550e8400-e29b-41d4-a716-446655440000") + +# Hard delete with all tables +DatabaseSchema.delete( + "550e8400-e29b-41d4-a716-446655440000", + recursive=True, + hard_delete=True +) +``` + +```java Java +import org.openmetadata.sdk.entities.DatabaseSchemas; + +// Soft delete +DatabaseSchemas.delete("550e8400-e29b-41d4-a716-446655440000"); + +// Hard delete with all tables +DatabaseSchemas.delete("550e8400-e29b-41d4-a716-446655440000", true, true); +``` + +```bash cURL +# Soft delete +curl -X DELETE "https://your-company.getcollate.io/api/v1/databaseSchemas/{id}" \ + -H "Authorization: Bearer $TOKEN" + +# Hard delete +curl -X DELETE "https://your-company.getcollate.io/api/v1/databaseSchemas/{id}?hardDelete=true" \ + -H "Authorization: Bearer $TOKEN" + +# Hard delete with recursive (deletes all tables) +curl -X DELETE 
"https://your-company.getcollate.io/api/v1/databaseSchemas/{id}?hardDelete=true&recursive=true" \ + -H "Authorization: Bearer $TOKEN" +``` + + + +```json Response +{ + "id": "550e8400-e29b-41d4-a716-446655440000", + "name": "public", + "fullyQualifiedName": "snowflake_prod.analytics.public", + "deleted": true, + "version": 0.3 +} +``` + diff --git a/api-reference/data-assets/schemas/index.mdx b/api-reference/data-assets/schemas/index.mdx new file mode 100644 index 00000000..c8c1b55f --- /dev/null +++ b/api-reference/data-assets/schemas/index.mdx @@ -0,0 +1,56 @@ +--- +title: Database Schemas +description: Create and manage schemas within databases +sidebarTitle: Database Schemas +mode: "wide" +--- + +A **Database Schema** organizes tables within a database. It's the direct parent of tables in the entity hierarchy. + + +Entity schema follows the [OpenMetadata Standard](https://openmetadatastandards.org/metadata-specifications/entities/database-schema/). + + +## Entity Hierarchy + +Schemas sit between Databases and Tables in the hierarchy: + +``` +DatabaseService +└── Database + └── DatabaseSchema (this page) + └── Table +``` + +## Inheritance + +When you set an **owner** or **domain** on a Database Schema, it is inherited by all child tables. 
+ +--- + +## API Endpoints + +| Method | Endpoint | Description | +|--------|----------|-------------| +| `PUT` | `/v1/databaseSchemas` | [Create or update a schema](/api-reference/data-assets/schemas/create) | +| `GET` | `/v1/databaseSchemas` | [List database schemas](/api-reference/data-assets/schemas/list) | +| `GET` | `/v1/databaseSchemas/{id}` | [Get by ID](/api-reference/data-assets/schemas/retrieve) | +| `GET` | `/v1/databaseSchemas/name/{fqn}` | [Get by fully qualified name](/api-reference/data-assets/schemas/retrieve) | +| `PATCH` | `/v1/databaseSchemas/{id}` | [Update a schema](/api-reference/data-assets/schemas/update) | +| `DELETE` | `/v1/databaseSchemas/{id}` | [Delete a schema](/api-reference/data-assets/schemas/delete) | + +--- + +## Related + + + + View schema object attributes + + + Create tables within this schema + + + Parent database documentation + + diff --git a/api-reference/data-assets/schemas/list.mdx b/api-reference/data-assets/schemas/list.mdx new file mode 100644 index 00000000..9a064518 --- /dev/null +++ b/api-reference/data-assets/schemas/list.mdx @@ -0,0 +1,122 @@ +--- +title: List Database Schemas +description: List all schemas with optional filtering and pagination +sidebarTitle: List +--- + +List all database schemas with optional filtering and pagination. + +## Endpoint + +``` +GET /v1/databaseSchemas +``` + +## Query Parameters + + + Filter by database FQN. + + + + Maximum number of results to return (max: 1000000). + + + + Cursor for backward pagination. + + + + Cursor for forward pagination. + + + + Comma-separated list of fields to include. + + + + Include `all`, `deleted`, or `non-deleted` entities. 
+ + + +```python Python +from metadata.sdk import configure +from metadata.sdk.entities import DatabaseSchema + +configure( + host="https://your-company.getcollate.io/api", + jwt_token="your-jwt-token" +) + +# List all schemas with auto-pagination +for schema in DatabaseSchema.list().auto_paging_iterable(): + print(f"{schema.fullyQualifiedName}: {schema.description}") + +# Filter by database +for schema in DatabaseSchema.list(database="snowflake_prod.analytics").auto_paging_iterable(): + print(f"{schema.name}") +``` + +```java Java +import org.openmetadata.sdk.entities.DatabaseSchemas; + +// List with auto-pagination +for (DatabaseSchema schema : DatabaseSchemas.list().autoPagingIterable()) { + System.out.println(schema.getFullyQualifiedName() + ": " + schema.getDescription()); +} + +// Filter by database +for (DatabaseSchema schema : DatabaseSchemas.list().database("snowflake_prod.analytics").autoPagingIterable()) { + System.out.println(schema.getName()); +} +``` + +```bash cURL +# List all schemas +curl "https://your-company.getcollate.io/api/v1/databaseSchemas?limit=50" \ + -H "Authorization: Bearer $TOKEN" + +# Filter by database +curl "https://your-company.getcollate.io/api/v1/databaseSchemas?database=snowflake_prod.analytics&limit=50" \ + -H "Authorization: Bearer $TOKEN" + +# Include additional fields +curl "https://your-company.getcollate.io/api/v1/databaseSchemas?fields=owners,tags,tables&limit=50" \ + -H "Authorization: Bearer $TOKEN" +``` + + + +```json Response +{ + "data": [ + { + "id": "550e8400-e29b-41d4-a716-446655440000", + "name": "public", + "fullyQualifiedName": "snowflake_prod.analytics.public", + "displayName": "Public Schema", + "database": { + "id": "database-uuid", + "type": "database", + "name": "analytics" + } + }, + { + "id": "660e8400-e29b-41d4-a716-446655440001", + "name": "staging", + "fullyQualifiedName": "snowflake_prod.analytics.staging", + "displayName": "Staging Schema", + "database": { + "id": "database-uuid", + "type": 
"database", + "name": "analytics" + } + } + ], + "paging": { + "after": "cursor-string", + "total": 15 + } +} +``` + diff --git a/api-reference/data-assets/schemas/object.mdx b/api-reference/data-assets/schemas/object.mdx new file mode 100644 index 00000000..d0186988 --- /dev/null +++ b/api-reference/data-assets/schemas/object.mdx @@ -0,0 +1,90 @@ +--- +title: The Database Schema Object +description: Attributes of the database schema entity +sidebarTitle: The Schema Object +mode: "wide" +--- + + + Unique identifier for the database schema. + + + + Name of the schema. Must be unique within the parent database. + + + + Fully qualified name in format `{service}.{database}.{schema}`. + + + + Human-readable display name for the schema. + + + + Description of the schema in Markdown format. + + + + Reference to the parent Database. + + + + Reference to the parent DatabaseService (derived from database). + + + + Owners of the schema (users or teams). Inherited by all child tables. + + + + Domain this schema belongs to. Inherited by all child tables. + + + + Tags and classifications applied to this schema. + + + + Child tables within this schema (read-only, populated with `fields` param). + + + + Entity version number, incremented on updates. + + + + Whether the schema has been soft-deleted. 
+ + + +```json The Database Schema Object +{ + "id": "550e8400-e29b-41d4-a716-446655440000", + "name": "public", + "fullyQualifiedName": "snowflake_prod.analytics.public", + "displayName": "Public Schema", + "description": "Default schema for analytics tables", + "database": { + "id": "database-uuid", + "type": "database", + "name": "analytics", + "fullyQualifiedName": "snowflake_prod.analytics" + }, + "service": { + "id": "service-uuid", + "type": "databaseService", + "name": "snowflake_prod" + }, + "owners": [ + { + "id": "team-uuid", + "type": "team", + "name": "data-platform" + } + ], + "version": 0.1, + "deleted": false +} +``` + diff --git a/api-reference/data-assets/schemas/retrieve.mdx b/api-reference/data-assets/schemas/retrieve.mdx new file mode 100644 index 00000000..f02a7f59 --- /dev/null +++ b/api-reference/data-assets/schemas/retrieve.mdx @@ -0,0 +1,127 @@ +--- +title: Retrieve a Database Schema +description: Retrieve a schema by ID or fully qualified name +sidebarTitle: Retrieve +--- + +Retrieve a database schema by ID or fully qualified name. + +## Endpoints + +``` +GET /v1/databaseSchemas/{id} +GET /v1/databaseSchemas/name/{fqn} +``` + +## Path Parameters + + + Unique identifier of the schema. + + + + Fully qualified name of the schema (e.g., `snowflake_prod.analytics.public`). + + +## Query Parameters + + + Comma-separated list of fields to include. Options: `owners`, `tags`, `domain`, `tables`, `usageSummary`, `extension`. 
+ + + ```python Python +from metadata.sdk import configure +from metadata.sdk.entities import DatabaseSchema + +configure( + host="https://your-company.getcollate.io/api", + jwt_token="your-jwt-token" +) + +# By name, requesting extra fields +schema = DatabaseSchema.retrieve_by_name( + "snowflake_prod.analytics.public", + fields=["owners", "tags", "tables", "domain"] +) + +# By ID (no extra fields requested) +schema_by_id = DatabaseSchema.retrieve("550e8400-e29b-41d4-a716-446655440000") + +print(f"Name: {schema.name}") +print(f"FQN: {schema.fullyQualifiedName}") +print(f"Database: {schema.database.fullyQualifiedName}") + +# List tables in this schema (populated because `tables` was requested) +if schema.tables: + for table in schema.tables: + print(f" Table: {table.name}") +``` + +```java Java +import java.util.List; +import org.openmetadata.schema.entity.data.DatabaseSchema; +import org.openmetadata.schema.type.EntityReference; +import org.openmetadata.sdk.entities.DatabaseSchemas; + +// By name, requesting extra fields +DatabaseSchema schema = DatabaseSchemas.retrieveByName( + "snowflake_prod.analytics.public", + List.of("owners", "tags", "tables", "domain") +); + +// By ID (no extra fields requested) +DatabaseSchema schemaById = DatabaseSchemas.retrieve("550e8400-e29b-41d4-a716-446655440000"); + +System.out.println("Name: " + schema.getName()); +System.out.println("FQN: " + schema.getFullyQualifiedName()); +System.out.println("Database: " + schema.getDatabase().getFullyQualifiedName()); + +// Access tables (populated because "tables" was requested) +if (schema.getTables() != null) { + for (EntityReference table : schema.getTables()) { + System.out.println(" Table: " + table.getName()); + } +} +``` + +```bash cURL +# By FQN +curl "https://your-company.getcollate.io/api/v1/databaseSchemas/name/snowflake_prod.analytics.public?fields=owners,tags,tables,domain" \ + -H "Authorization: Bearer $TOKEN" + +# By ID +curl "https://your-company.getcollate.io/api/v1/databaseSchemas/550e8400-e29b-41d4-a716-446655440000?fields=owners,tags" \ + -H "Authorization: Bearer $TOKEN" +``` + + + +```json Response +{ + "id": "550e8400-e29b-41d4-a716-446655440000", + "name": "public", + "fullyQualifiedName": "snowflake_prod.analytics.public", + "displayName": "Public Schema", + "description": "Default schema for
analytics tables", + "database": { + "id": "database-uuid", + "type": "database", + "name": "analytics", + "fullyQualifiedName": "snowflake_prod.analytics" + }, + "owners": [ + { + "id": "team-uuid", + "type": "team", + "name": "data-platform" + } + ], + "tables": [ + { + "id": "table-uuid", + "type": "table", + "name": "customers" + } + ], + "version": 0.1 +} +``` + diff --git a/api-reference/data-assets/schemas/update.mdx b/api-reference/data-assets/schemas/update.mdx new file mode 100644 index 00000000..160d0246 --- /dev/null +++ b/api-reference/data-assets/schemas/update.mdx @@ -0,0 +1,101 @@ +--- +title: Update a Database Schema +description: Update a schema using JSON Patch operations +sidebarTitle: Update +--- + +Update a database schema using JSON Patch operations. + +## Endpoint + +``` +PATCH /v1/databaseSchemas/{id} +``` + +## Path Parameters + + + Unique identifier of the schema. + + +## Request Body + +JSON Patch document following [RFC 6902](https://tools.ietf.org/html/rfc6902). 
+ + +```python Python +from metadata.sdk import configure +from metadata.sdk.entities import DatabaseSchema, Team + +configure( + host="https://your-company.getcollate.io/api", + jwt_token="your-jwt-token" +) + +# Update description +schema = DatabaseSchema.retrieve_by_name("snowflake_prod.analytics.public") +schema.description = "Production schema containing core business tables and views" +updated = DatabaseSchema.update(schema.id, schema) + +# Set owner +team = Team.retrieve_by_name("data-platform") +schema.owners = [{"id": str(team.id), "type": "team"}] +updated = DatabaseSchema.update(schema.id, schema) +``` + +```java Java +import org.openmetadata.sdk.entities.DatabaseSchemas; +import org.openmetadata.sdk.entities.Teams; +import org.openmetadata.schema.type.EntityReference; + +DatabaseSchema schema = DatabaseSchemas.retrieveByName("snowflake_prod.analytics.public"); +schema.setDescription("Production schema containing core business tables and views"); +DatabaseSchema updated = DatabaseSchemas.update(schema.getId(), schema); + +// Set owner +Team team = Teams.retrieveByName("data-platform"); +schema.setOwners(List.of( + new EntityReference() + .withId(team.getId()) + .withType("team") +)); +updated = DatabaseSchemas.update(schema.getId(), schema); +``` + +```bash cURL +# Update description +curl -X PATCH "https://your-company.getcollate.io/api/v1/databaseSchemas/{id}" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json-patch+json" \ + -d '[ + {"op": "replace", "path": "/description", "value": "Production schema containing core business tables"} + ]' + +# Set owner +curl -X PATCH "https://your-company.getcollate.io/api/v1/databaseSchemas/{id}" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json-patch+json" \ + -d '[ + {"op": "add", "path": "/owners", "value": [{"id": "team-uuid", "type": "team"}]} + ]' +``` + + + +```json Response +{ + "id": "550e8400-e29b-41d4-a716-446655440000", + "name": "public", + 
"fullyQualifiedName": "snowflake_prod.analytics.public", + "description": "Production schema containing core business tables and views", + "owners": [ + { + "id": "team-uuid", + "type": "team", + "name": "data-platform" + } + ], + "version": 0.2 +} +``` + diff --git a/api-reference/data-assets/tables/create.mdx b/api-reference/data-assets/tables/create.mdx new file mode 100644 index 00000000..ed518cbe --- /dev/null +++ b/api-reference/data-assets/tables/create.mdx @@ -0,0 +1,388 @@ +--- +title: Create Table +description: Create a new table entity in the catalog +sidebarTitle: Create +mode: "wide" +--- + +# Create a Table + +Creates a new table entity in the specified database schema. + + +Tables are typically created through metadata ingestion. Use this API for manual registration or programmatic catalog population. + + +## Endpoint + +``` +POST /api/v1/tables +``` + +Or use PUT for upsert (create or update): + +``` +PUT /api/v1/tables +``` + +## Request Body + +| Parameter | Type | Required | Description | +|-----------|------|----------|-------------| +| `name` | string | Yes | Table name (unique within schema) | +| `databaseSchema` | string | Yes | FQN of the parent schema | +| `columns` | Column[] | Yes | Array of column definitions | +| `description` | string | No | Markdown description | +| `displayName` | string | No | Human-friendly display name | +| `tableType` | string | No | `Regular`, `View`, `External`, etc. | +| `owner` | EntityReference | No | Owner reference | +| `tags` | TagLabel[] | No | Tags to apply | +| `tableConstraints` | TableConstraint[] | No | Table-level constraints | +| `tablePartition` | TablePartition | No | Partition configuration | + +### Column Object + +| Field | Type | Required | Description | +|-------|------|----------|-------------| +| `name` | string | Yes | Column name | +| `dataType` | string | Yes | Data type (BIGINT, VARCHAR, etc.) 
| +| `dataLength` | integer | No | Length for VARCHAR/CHAR | +| `description` | string | No | Column description | +| `constraint` | string | No | `PRIMARY_KEY`, `UNIQUE`, `NOT_NULL` | +| `tags` | TagLabel[] | No | Column-level tags | +| `children` | Column[] | No | Nested columns for STRUCT | + +## Examples + +### Create a Simple Table + + + +```python +from metadata.ingestion.ometa.ometa_api import OpenMetadata +from metadata.generated.schema.api.data.createTable import CreateTableRequest +from metadata.generated.schema.entity.data.table import Column, DataType, Constraint + +metadata = OpenMetadata(config) + +# Define columns +columns = [ + Column( + name="customer_id", + dataType=DataType.BIGINT, + description="Primary key - unique customer identifier", + constraint=Constraint.PRIMARY_KEY + ), + Column( + name="email", + dataType=DataType.VARCHAR, + dataLength=255, + description="Customer email address", + constraint=Constraint.NOT_NULL + ), + Column( + name="first_name", + dataType=DataType.VARCHAR, + dataLength=100, + description="Customer first name" + ), + Column( + name="last_name", + dataType=DataType.VARCHAR, + dataLength=100, + description="Customer last name" + ), + Column( + name="created_at", + dataType=DataType.TIMESTAMP, + description="Account creation timestamp" + ) +] + +# Create the table +create_request = CreateTableRequest( + name="customers", + databaseSchema="mysql_prod.analytics.public", + description="Customer master data containing account information", + tableType="Regular", + columns=columns +) + +table = metadata.create_or_update(data=create_request) +print(f"Created table: {table.fullyQualifiedName}") +print(f"ID: {table.id}") +``` + + + +```java +import org.openmetadata.client.api.TablesApi; +import org.openmetadata.schema.api.data.CreateTable; +import org.openmetadata.schema.entity.data.Table; +import org.openmetadata.schema.type.Column; +import java.util.List; + +TablesApi tablesApi = client.buildClient(TablesApi.class); + +// Define columns +List<Column> columns
= List.of( + new Column() + .withName("customer_id") + .withDataType(Column.DataType.BIGINT) + .withDescription("Primary key - unique customer identifier") + .withConstraint(Column.Constraint.PRIMARY_KEY), + new Column() + .withName("email") + .withDataType(Column.DataType.VARCHAR) + .withDataLength(255) + .withDescription("Customer email address") + .withConstraint(Column.Constraint.NOT_NULL), + new Column() + .withName("first_name") + .withDataType(Column.DataType.VARCHAR) + .withDataLength(100) + .withDescription("Customer first name"), + new Column() + .withName("last_name") + .withDataType(Column.DataType.VARCHAR) + .withDataLength(100) + .withDescription("Customer last name"), + new Column() + .withName("created_at") + .withDataType(Column.DataType.TIMESTAMP) + .withDescription("Account creation timestamp") +); + +// Create the table +CreateTable createRequest = new CreateTable() + .withName("customers") + .withDatabaseSchema("mysql_prod.analytics.public") + .withDescription("Customer master data containing account information") + .withTableType(Table.TableType.REGULAR) + .withColumns(columns); + +Table table = tablesApi.createOrUpdateTable(createRequest); +System.out.println("Created table: " + table.getFullyQualifiedName()); +``` + + + +```bash +curl -X PUT "https://your-company.getcollate.io/api/v1/tables" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json" \ + -d '{ + "name": "customers", + "databaseSchema": "mysql_prod.analytics.public", + "description": "Customer master data containing account information", + "tableType": "Regular", + "columns": [ + { + "name": "customer_id", + "dataType": "BIGINT", + "description": "Primary key - unique customer identifier", + "constraint": "PRIMARY_KEY" + }, + { + "name": "email", + "dataType": "VARCHAR", + "dataLength": 255, + "description": "Customer email address", + "constraint": "NOT_NULL" + }, + { + "name": "first_name", + "dataType": "VARCHAR", + "dataLength": 100, + "description": 
"Customer first name" + }, + { + "name": "last_name", + "dataType": "VARCHAR", + "dataLength": 100, + "description": "Customer last name" + }, + { + "name": "created_at", + "dataType": "TIMESTAMP", + "description": "Account creation timestamp" + } + ] + }' +``` + + + +### Create Table with Tags and Owner + + + +```python +from metadata.generated.schema.api.data.createTable import CreateTableRequest +from metadata.generated.schema.type.tagLabel import TagLabel, LabelType, State, TagSource + +create_request = CreateTableRequest( + name="orders", + databaseSchema="mysql_prod.analytics.public", + description="Order transactions", + tableType="Regular", + columns=[ + Column( + name="order_id", + dataType=DataType.BIGINT, + constraint=Constraint.PRIMARY_KEY + ), + Column( + name="customer_id", + dataType=DataType.BIGINT, + description="FK to customers table" + ), + Column( + name="total_amount", + dataType=DataType.DECIMAL, + description="Order total in USD" + ) + ], + owner={"id": "user-uuid", "type": "user"}, + tags=[ + TagLabel( + tagFQN="Tier.Tier1", + source=TagSource.Classification, + state=State.Confirmed, + labelType=LabelType.Manual + ) + ] +) + +table = metadata.create_or_update(data=create_request) +``` + + + +```bash +curl -X PUT "https://your-company.getcollate.io/api/v1/tables" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json" \ + -d '{ + "name": "orders", + "databaseSchema": "mysql_prod.analytics.public", + "description": "Order transactions", + "tableType": "Regular", + "columns": [ + {"name": "order_id", "dataType": "BIGINT", "constraint": "PRIMARY_KEY"}, + {"name": "customer_id", "dataType": "BIGINT"}, + {"name": "total_amount", "dataType": "DECIMAL"} + ], + "owner": {"id": "user-uuid", "type": "user"}, + "tags": [ + {"tagFQN": "Tier.Tier1", "source": "Classification"} + ] + }' +``` + + + +### Create Table with Nested Columns (STRUCT) + + + +```python +create_request = CreateTableRequest( + name="events", + 
databaseSchema="bigquery_prod.analytics.events", + description="Event log with nested user properties", + columns=[ + Column(name="event_id", dataType=DataType.STRING), + Column(name="event_type", dataType=DataType.STRING), + Column( + name="user_properties", + dataType=DataType.STRUCT, + description="Nested user data", + children=[ + Column(name="user_id", dataType=DataType.STRING), + Column(name="email", dataType=DataType.STRING), + Column( + name="address", + dataType=DataType.STRUCT, + children=[ + Column(name="street", dataType=DataType.STRING), + Column(name="city", dataType=DataType.STRING), + Column(name="country", dataType=DataType.STRING) + ] + ) + ] + ), + Column(name="timestamp", dataType=DataType.TIMESTAMP) + ] +) + +table = metadata.create_or_update(data=create_request) +``` + + + +```bash +curl -X PUT "https://your-company.getcollate.io/api/v1/tables" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json" \ + -d '{ + "name": "events", + "databaseSchema": "bigquery_prod.analytics.events", + "description": "Event log with nested user properties", + "columns": [ + {"name": "event_id", "dataType": "STRING"}, + {"name": "event_type", "dataType": "STRING"}, + { + "name": "user_properties", + "dataType": "STRUCT", + "description": "Nested user data", + "children": [ + {"name": "user_id", "dataType": "STRING"}, + {"name": "email", "dataType": "STRING"}, + { + "name": "address", + "dataType": "STRUCT", + "children": [ + {"name": "street", "dataType": "STRING"}, + {"name": "city", "dataType": "STRING"}, + {"name": "country", "dataType": "STRING"} + ] + } + ] + }, + {"name": "timestamp", "dataType": "TIMESTAMP"} + ] + }' +``` + + + +## Response + +```json +{ + "id": "550e8400-e29b-41d4-a716-446655440000", + "name": "customers", + "fullyQualifiedName": "mysql_prod.analytics.public.customers", + "description": "Customer master data containing account information", + "version": 0.1, + "updatedAt": 1704067200000, + "updatedBy": "admin", + 
"href": "https://your-company.getcollate.io/api/v1/tables/550e8400...", + "tableType": "Regular", + "columns": [...], + "databaseSchema": {...}, + "database": {...}, + "service": {...}, + "deleted": false +} +``` + +## Errors + +| Code | Error Type | Description | +|------|-----------|-------------| +| `400` | `BAD_REQUEST` | Invalid column definition or missing required fields | +| `401` | `UNAUTHORIZED` | Invalid or missing token | +| `403` | `FORBIDDEN` | User lacks permission to create tables | +| `404` | `ENTITY_NOT_FOUND` | Database schema not found | +| `409` | `ENTITY_ALREADY_EXISTS` | Table with same name exists in schema | diff --git a/api-reference/data-assets/tables/delete.mdx b/api-reference/data-assets/tables/delete.mdx new file mode 100644 index 00000000..d30bf750 --- /dev/null +++ b/api-reference/data-assets/tables/delete.mdx @@ -0,0 +1,238 @@ +--- +title: Delete Table +description: Soft or hard delete a table from the catalog +sidebarTitle: Delete +mode: "wide" +--- + +# Delete a Table + +Deletes a table entity from the catalog. Supports both soft delete (recoverable) and hard delete (permanent). + +## Endpoints + +``` +DELETE /api/v1/tables/{id} +DELETE /api/v1/tables/name/{fqn} +``` + +## Parameters + +### Path Parameters + +| Parameter | Type | Description | +|-----------|------|-------------| +| `id` | UUID | Table ID | +| `fqn` | string | Fully qualified name | + +### Query Parameters + +| Parameter | Type | Default | Description | +|-----------|------|---------|-------------| +| `hardDelete` | boolean | `false` | Permanently delete if true | +| `recursive` | boolean | `false` | Delete child entities if true | + +## Soft Delete (Default) + +Soft delete marks the table as deleted but retains the data. The table can be restored later. 
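A table can be addressed by ID or by FQN, and both `hardDelete` and `recursive` default to `false`. A client therefore only needs to append the flags it actually sets. The sketch below builds the DELETE URLs used in the examples on this page. `delete_table_url` is a hypothetical helper, not part of any SDK.

```python
from urllib.parse import quote, urlencode


def delete_table_url(base_url, table_id=None, fqn=None,
                     hard_delete=False, recursive=False):
    """Return the DELETE URL for a table addressed by ID or by FQN (hypothetical helper)."""
    if (table_id is None) == (fqn is None):
        raise ValueError("pass exactly one of table_id or fqn")
    if table_id is not None:
        path = f"/v1/tables/{table_id}"
    else:
        # Percent-encode the FQN as one path segment; dots and underscores
        # are unreserved, so a plain dotted FQN passes through unchanged.
        path = f"/v1/tables/name/{quote(fqn, safe='')}"
    # Both flags default to false, so only send them when they are set.
    params = {}
    if hard_delete:
        params["hardDelete"] = "true"
    if recursive:
        params["recursive"] = "true"
    query = f"?{urlencode(params)}" if params else ""
    return f"{base_url}{path}{query}"


base = "https://your-company.getcollate.io/api"
print(delete_table_url(base, fqn="mysql_prod.analytics.public.customers"))
print(delete_table_url(base, table_id="550e8400-e29b-41d4-a716-446655440000",
                       hard_delete=True))
```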
+ + + + ```python +from metadata.ingestion.ometa.ometa_api import OpenMetadata +from metadata.generated.schema.entity.data.table import Table + +metadata = OpenMetadata(config) + +# Soft delete by ID (hard_delete defaults to False) +metadata.delete( + entity=Table, + entity_id="550e8400-e29b-41d4-a716-446655440000", + hard_delete=False +) + +# Or by FQN +table = metadata.get_by_name(entity=Table, fqn="mysql_prod.analytics.public.customers") +metadata.delete( + entity=Table, + entity_id=table.id, + hard_delete=False +) + +print("Table soft deleted") +``` + + + +```java +import org.openmetadata.client.api.TablesApi; +import java.util.UUID; + +TablesApi tablesApi = client.buildClient(TablesApi.class); + +// Soft delete (default) +tablesApi.deleteTable( + UUID.fromString("550e8400-e29b-41d4-a716-446655440000"), + false, // hardDelete + false // recursive +); + +System.out.println("Table soft deleted"); +``` + + + +```bash +# Soft delete by ID +curl -X DELETE "https://your-company.getcollate.io/api/v1/tables/550e8400-e29b-41d4-a716-446655440000" \ + -H "Authorization: Bearer $TOKEN" + +# Soft delete by FQN +curl -X DELETE "https://your-company.getcollate.io/api/v1/tables/name/mysql_prod.analytics.public.customers" \ + -H "Authorization: Bearer $TOKEN" +``` + + + +## Hard Delete + +Hard delete permanently removes the table and all its metadata. This action cannot be undone. + + +Hard delete permanently removes all table metadata, lineage, quality tests, and history. Use with caution.
+ + + + +```python +# Hard delete - permanent removal +metadata.delete( + entity=Table, + entity_id="550e8400-e29b-41d4-a716-446655440000", + hard_delete=True +) + +print("Table permanently deleted") +``` + + + +```java +// Hard delete - permanent removal +tablesApi.deleteTable( + UUID.fromString("550e8400-e29b-41d4-a716-446655440000"), + true, // hardDelete + false // recursive +); +``` + + + +```bash +# Hard delete by ID +curl -X DELETE "https://your-company.getcollate.io/api/v1/tables/550e8400-e29b-41d4-a716-446655440000?hardDelete=true" \ + -H "Authorization: Bearer $TOKEN" + +# Hard delete by FQN +curl -X DELETE "https://your-company.getcollate.io/api/v1/tables/name/mysql_prod.analytics.public.customers?hardDelete=true" \ + -H "Authorization: Bearer $TOKEN" +``` + + + +## Recursive Delete + +Use recursive delete to remove dependent entities. Not typically needed for tables (they're leaf entities). + +```bash +curl -X DELETE "https://your-company.getcollate.io/api/v1/tables/550e8400...?recursive=true" \ + -H "Authorization: Bearer $TOKEN" +``` + +## Restore a Soft-Deleted Table + +Soft-deleted tables can be restored: + + + +```python +# Restore a soft-deleted table +metadata.restore( + entity=Table, + entity_id="550e8400-e29b-41d4-a716-446655440000" +) + +print("Table restored") +``` + + + +```bash +curl -X PUT "https://your-company.getcollate.io/api/v1/tables/restore" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json" \ + -d '{"id": "550e8400-e29b-41d4-a716-446655440000"}' +``` + + + +## List Deleted Tables + +Query for soft-deleted tables: + + + +```bash +# List all deleted tables +curl -X GET "https://your-company.getcollate.io/api/v1/tables?include=deleted&limit=50" \ + -H "Authorization: Bearer $TOKEN" + +# Get a specific deleted table +curl -X GET "https://your-company.getcollate.io/api/v1/tables/550e8400...?include=deleted" \ + -H "Authorization: Bearer $TOKEN" +``` + + + +## Response + +Successful delete returns 200 OK with 
the deleted entity: + +```json +{ + "id": "550e8400-e29b-41d4-a716-446655440000", + "name": "customers", + "fullyQualifiedName": "mysql_prod.analytics.public.customers", + "deleted": true, + "version": 0.6, + "updatedAt": 1704153600000, + "updatedBy": "admin", + ... +} +``` + +## Errors + +| Code | Error Type | Description | +|------|-----------|-------------| +| `401` | `UNAUTHORIZED` | Invalid or missing token | +| `403` | `FORBIDDEN` | User lacks permission to delete | +| `404` | `ENTITY_NOT_FOUND` | Table not found | +| `409` | `ENTITY_LOCKED` | Table is locked (deletion in progress) | + +## Best Practices + + + + Use soft delete as the default. It allows recovery if needed. + + + Hard delete should only be used for cleanup of test data or erroneous entries. + + + Review lineage and quality tests before deleting tables that may be referenced. + + + When automating deletes, implement safeguards to prevent accidental data loss. + + diff --git a/api-reference/data-assets/tables/index.mdx b/api-reference/data-assets/tables/index.mdx new file mode 100644 index 00000000..0d4f769d --- /dev/null +++ b/api-reference/data-assets/tables/index.mdx @@ -0,0 +1,701 @@ +--- +title: Tables +description: Create, retrieve, update, and manage table metadata +sidebarTitle: Tables +mode: "wide" +--- + +Tables represent structured data assets in your data catalog. They belong to a database schema and contain columns with metadata, constraints, and profiling information. + + +Entity schema follows the [OpenMetadata Standard](https://openmetadatastandards.org/metadata-specifications/entities/table/). + + +## Entity Hierarchy + +Tables are the fourth level in the database entity hierarchy: + +``` +DatabaseService +└── Database + └── DatabaseSchema + └── Table (this page) + └── Column +``` + +Before creating a table, you must have the parent schema. See [Entity Hierarchy](/api-reference/core/entities). 
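A table's FQN is its parent schema's FQN plus the table name (`service.database.schema.table`). The parent schema a create request depends on can therefore be derived from the table FQN. The helpers below are hypothetical and assume no name part contains a dot. How the server escapes dotted names is not covered here.

```python
def table_fqn(service, database, schema, table):
    """Join hierarchy names into service.database.schema.table (hypothetical helper)."""
    parts = [service, database, schema, table]
    for part in parts:
        # Simplifying assumption: names containing dots are rejected rather
        # than escaped, because this sketch does not model FQN quoting.
        if not part or "." in part:
            raise ValueError(f"unsupported name part: {part!r}")
    return ".".join(parts)


def parent_schema_fqn(fqn):
    """Return the schema FQN that must exist before the table is created."""
    service, database, schema, _table = fqn.split(".")
    return f"{service}.{database}.{schema}"


fqn = table_fqn("snowflake_prod", "analytics", "public", "customers")
print(fqn)                     # snowflake_prod.analytics.public.customers
print(parent_schema_fqn(fqn))  # snowflake_prod.analytics.public
```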
+ +--- + +## API Endpoints + +| Method | Endpoint | Description | +|--------|----------|-------------| +| `PUT` | `/v1/tables` | Create or update a table | +| `GET` | `/v1/tables` | List tables | +| `GET` | `/v1/tables/{id}` | Get by ID | +| `GET` | `/v1/tables/name/{fqn}` | Get by fully qualified name | +| `PATCH` | `/v1/tables/{id}` | Partial update | +| `DELETE` | `/v1/tables/{id}` | Delete a table | + +--- + +## The Table Object + + + Unique identifier for the table. + + + + Name of the table. Must be unique within the schema. + + + + Full path: `service.database.schema.table` + + + + Human-readable display name for the table. + + + + Description of the table in Markdown format. + + + + Type of table. One of: `Regular`, `View`, `MaterializedView`, `External`, `SecureView`, `Iceberg`, `Partitioned`, `Dynamic`. + + + + Array of column definitions. Each column has `name`, `dataType`, `description`, `constraint`, and optional `tags`. + + + + Reference to the parent database schema. + + + + Owners of the table (users or teams). + + + + Tags and classifications applied to this table. + + + + Domain this table belongs to. + + + + Primary keys, foreign keys, and unique constraints. + + + + Entity version number, incremented on updates. + + + + Whether the table has been soft-deleted. 
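Some obvious payload mistakes can be caught on the client before a create call. The sketch below encodes the `tableType` values listed above and the fields the create endpoint marks as required: `name`, `databaseSchema`, and `columns` on the table, and `name` and `dataType` on each column. It is a hypothetical pre-check only; the server remains the authority on validation.

```python
from enum import Enum


class TableType(str, Enum):
    """tableType values as listed on this page."""
    REGULAR = "Regular"
    VIEW = "View"
    MATERIALIZED_VIEW = "MaterializedView"
    EXTERNAL = "External"
    SECURE_VIEW = "SecureView"
    ICEBERG = "Iceberg"
    PARTITIONED = "Partitioned"
    DYNAMIC = "Dynamic"


REQUIRED_TABLE_FIELDS = ("name", "databaseSchema", "columns")
REQUIRED_COLUMN_FIELDS = ("name", "dataType")


def check_create_payload(payload):
    """Return a list of problems in a create-table payload (empty if none found)."""
    problems = [f"missing field: {f}" for f in REQUIRED_TABLE_FIELDS if f not in payload]
    table_type = payload.get("tableType")
    if table_type is not None and table_type not in {t.value for t in TableType}:
        problems.append(f"unknown tableType: {table_type}")
    for i, column in enumerate(payload.get("columns", [])):
        for f in REQUIRED_COLUMN_FIELDS:
            if f not in column:
                problems.append(f"columns[{i}] missing field: {f}")
    return problems
```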
+ + +```json Example Response +{ + "id": "550e8400-e29b-41d4-a716-446655440000", + "name": "customers", + "fullyQualifiedName": "snowflake_prod.analytics.public.customers", + "displayName": "Customer Master Data", + "description": "Primary customer information table", + "tableType": "Regular", + "columns": [ + { + "name": "customer_id", + "dataType": "BIGINT", + "dataTypeDisplay": "bigint", + "description": "Primary key", + "constraint": "PRIMARY_KEY", + "ordinalPosition": 1 + }, + { + "name": "email", + "dataType": "VARCHAR", + "dataLength": 255, + "description": "Customer email address", + "tags": [{"tagFQN": "PII.Sensitive"}], + "ordinalPosition": 2 + } + ], + "owners": [ + { + "id": "user-uuid", + "type": "user", + "name": "john.doe" + } + ], + "databaseSchema": { + "id": "schema-uuid", + "type": "databaseSchema", + "name": "public", + "fullyQualifiedName": "snowflake_prod.analytics.public" + }, + "tags": [{"tagFQN": "Tier.Tier1"}], + "version": 0.4, + "deleted": false +} +``` + +--- + +## Create a Table + + + +```python +from metadata.sdk import configure +from metadata.sdk.entities import Table +from metadata.generated.schema.api.data.createTable import CreateTableRequest +from metadata.generated.schema.entity.data.table import Column, DataType, TableType + +# Configure the SDK +configure( + host="https://your-company.getcollate.io/api", + jwt_token="your-jwt-token" +) + +# Create the table +request = CreateTableRequest( + name="customers", + databaseSchema="snowflake_prod.analytics.public", + description="Customer master data", + tableType=TableType.Regular, + columns=[ + Column(name="customer_id", dataType=DataType.BIGINT, description="Primary key"), + Column(name="email", dataType=DataType.VARCHAR, dataLength=255, description="Email"), + Column(name="name", dataType=DataType.VARCHAR, dataLength=100, description="Full name"), + Column(name="created_at", dataType=DataType.TIMESTAMP, description="Created timestamp") + ] +) + +table = Table.create(request) 
+print(f"Created: {table.fullyQualifiedName}") +``` + + + +```java +import org.openmetadata.sdk.OpenMetadataClient; +import org.openmetadata.sdk.config.OpenMetadataConfig; +import org.openmetadata.sdk.entities.Tables; +import org.openmetadata.schema.api.data.CreateTableRequest; +import org.openmetadata.schema.entity.data.Table; +import org.openmetadata.schema.type.Column; +import org.openmetadata.schema.type.ColumnDataType; + +// Initialize client +OpenMetadataConfig config = OpenMetadataConfig.builder() + .serverUrl("https://your-company.getcollate.io/api") + .accessToken("your-token") + .build(); +OpenMetadataClient client = new OpenMetadataClient(config); +Tables.setDefaultClient(client); + +// Create the table +CreateTableRequest request = new CreateTableRequest() + .withName("customers") + .withDatabaseSchema("snowflake_prod.analytics.public") + .withDescription("Customer master data") + .withTableType(TableType.Regular) + .withColumns(List.of( + new Column().withName("customer_id").withDataType(ColumnDataType.BIGINT), + new Column().withName("email").withDataType(ColumnDataType.VARCHAR).withDataLength(255), + new Column().withName("name").withDataType(ColumnDataType.VARCHAR).withDataLength(100), + new Column().withName("created_at").withDataType(ColumnDataType.TIMESTAMP) + )); + +Table table = Tables.create(request); +System.out.println("Created: " + table.getFullyQualifiedName()); +``` + + + +```bash +curl -X PUT "https://your-company.getcollate.io/api/v1/tables" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json" \ + -d '{ + "name": "customers", + "databaseSchema": "snowflake_prod.analytics.public", + "description": "Customer master data", + "tableType": "Regular", + "columns": [ + {"name": "customer_id", "dataType": "BIGINT", "description": "Primary key"}, + {"name": "email", "dataType": "VARCHAR", "dataLength": 255, "description": "Email"}, + {"name": "name", "dataType": "VARCHAR", "dataLength": 100, "description": "Full name"}, 
+ {"name": "created_at", "dataType": "TIMESTAMP", "description": "Created timestamp"} + ] + }' +``` + + + +--- + +## Retrieve a Table + + + +```python +from metadata.sdk import configure +from metadata.sdk.entities import Table + +configure( + host="https://your-company.getcollate.io/api", + jwt_token="your-jwt-token" +) + +# By FQN +table = Table.retrieve_by_name( + "snowflake_prod.analytics.public.customers", + fields=["owners", "tags", "columns"] +) + +# By ID +table = Table.retrieve("550e8400-e29b-41d4-a716-446655440000") + +print(f"Name: {table.name}") +print(f"Columns: {len(table.columns)}") +``` + + + +```java +import org.openmetadata.sdk.entities.Tables; + +// By FQN +Table table = Tables.retrieveByName( + "snowflake_prod.analytics.public.customers", + List.of("owners", "tags", "columns") +); + +// By ID +Table table = Tables.retrieve("550e8400-e29b-41d4-a716-446655440000"); + +System.out.println("Name: " + table.getName()); +System.out.println("Columns: " + table.getColumns().size()); +``` + + + +```bash +# By FQN +curl "https://your-company.getcollate.io/api/v1/tables/name/snowflake_prod.analytics.public.customers?fields=owners,tags,columns" \ + -H "Authorization: Bearer $TOKEN" + +# By ID +curl "https://your-company.getcollate.io/api/v1/tables/550e8400-e29b-41d4-a716-446655440000?fields=owners,tags,columns" \ + -H "Authorization: Bearer $TOKEN" +``` + + + +--- + +## List Tables + + + +```python +from metadata.sdk import configure +from metadata.sdk.entities import Table +from metadata.sdk.entities.table import TableListParams + +configure( + host="https://your-company.getcollate.io/api", + jwt_token="your-jwt-token" +) + +# List all tables with auto-pagination +for table in Table.list().auto_paging_iterable(): + print(f"{table.fullyQualifiedName}") + +# List with parameters +params = TableListParams.builder() \ + .database("snowflake_prod.analytics") \ + .limit(50) \ + .fields(["owners", "tags"]) \ + .build() + +tables = Table.list(params) +for table in 
tables.get_data(): + print(table.name) +``` + + + +```java +import org.openmetadata.sdk.entities.Tables; +import org.openmetadata.sdk.entities.TableListParams; + +// List with auto-pagination +for (Table table : Tables.list().autoPagingIterable()) { + System.out.println(table.getFullyQualifiedName()); +} + +// List with parameters +TableListParams params = TableListParams.builder() + .database("snowflake_prod.analytics") + .limit(50) + .fields(List.of("owners", "tags")) + .build(); + +TableCollection tables = Tables.list(params); +for (Table table : tables.getData()) { + System.out.println(table.getName()); +} +``` + + + +```bash +# List all +curl "https://your-company.getcollate.io/api/v1/tables?limit=50&fields=owners,tags" \ + -H "Authorization: Bearer $TOKEN" + +# Filter by database +curl "https://your-company.getcollate.io/api/v1/tables?database=snowflake_prod.analytics&limit=50" \ + -H "Authorization: Bearer $TOKEN" +``` + + + +### Parameters + +| Parameter | Type | Description | +|-----------|------|-------------| +| `limit` | integer | Maximum results (default: 10) | +| `before` | string | Cursor for backward pagination | +| `after` | string | Cursor for forward pagination | +| `fields` | string | Comma-separated fields to include | +| `database` | string | Filter by database FQN | +| `databaseSchema` | string | Filter by schema FQN | +| `include` | string | `all`, `deleted`, or `non-deleted` | + +--- + +## Update a Table + +### Update Description + + + +```python +from metadata.sdk import configure +from metadata.sdk.entities import Table + +configure( + host="https://your-company.getcollate.io/api", + jwt_token="your-jwt-token" +) + +table = Table.retrieve_by_name("snowflake_prod.analytics.public.customers") +table.description = "Updated: Customer master data with PII fields" +updated = Table.update(table.id, table) +``` + + + +```java +import org.openmetadata.sdk.entities.Tables; + +Table table = 
Tables.retrieveByName("snowflake_prod.analytics.public.customers"); +table.setDescription("Updated: Customer master data with PII fields"); +Table updated = Tables.update(table.getId(), table); +``` + + + +```bash +curl -X PATCH "https://your-company.getcollate.io/api/v1/tables/{id}" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json-patch+json" \ + -d '[ + {"op": "replace", "path": "/description", "value": "Updated description"} + ]' +``` + + + +### Set Owner + + + +```python +from metadata.sdk import configure +from metadata.sdk.entities import Table, Team + +configure( + host="https://your-company.getcollate.io/api", + jwt_token="your-jwt-token" +) + +# Get the team +team = Team.retrieve_by_name("data-platform") + +# Update the table owner +table = Table.retrieve_by_name("snowflake_prod.analytics.public.customers") +table.owners = [{"id": str(team.id), "type": "team"}] +updated = Table.update(table.id, table) +``` + + + +```java +import org.openmetadata.sdk.entities.Tables; +import org.openmetadata.sdk.entities.Teams; +import org.openmetadata.schema.type.EntityReference; + +// Get the team +Team team = Teams.retrieveByName("data-platform"); + +// Update the table owner +Table table = Tables.retrieveByName("snowflake_prod.analytics.public.customers"); +table.setOwners(List.of( + new EntityReference() + .withId(team.getId()) + .withType("team") +)); +Table updated = Tables.update(table.getId(), table); +``` + + + +```bash +curl -X PATCH "https://your-company.getcollate.io/api/v1/tables/{id}" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json-patch+json" \ + -d '[ + {"op": "add", "path": "/owners", "value": [{"id": "team-uuid", "type": "team"}]} + ]' +``` + + + +### Add Tags + + + +```python +from metadata.sdk import configure +from metadata.sdk.entities import Table + +configure( + host="https://your-company.getcollate.io/api", + jwt_token="your-jwt-token" +) + +table = 
Table.retrieve_by_name("snowflake_prod.analytics.public.customers", fields=["tags"]) +table.tags = [ + {"tagFQN": "PII.Sensitive", "labelType": "Manual", "state": "Confirmed"}, + {"tagFQN": "Tier.Tier1", "labelType": "Manual", "state": "Confirmed"} +] +updated = Table.update(table.id, table) +``` + + + +```java +import org.openmetadata.sdk.entities.Tables; +import org.openmetadata.schema.type.TagLabel; + +Table table = Tables.retrieveByName("snowflake_prod.analytics.public.customers", List.of("tags")); +table.setTags(List.of( + new TagLabel() + .withTagFQN("PII.Sensitive") + .withLabelType(TagLabel.LabelType.Manual) + .withState(TagLabel.State.Confirmed), + new TagLabel() + .withTagFQN("Tier.Tier1") + .withLabelType(TagLabel.LabelType.Manual) + .withState(TagLabel.State.Confirmed) +)); +Table updated = Tables.update(table.getId(), table); +``` + + + +```bash +curl -X PATCH "https://your-company.getcollate.io/api/v1/tables/{id}" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json-patch+json" \ + -d '[ + {"op": "add", "path": "/tags/-", "value": {"tagFQN": "PII.Sensitive", "labelType": "Manual", "state": "Confirmed"}} + ]' +``` + + + +### Update Column Description + + + +```bash +curl -X PATCH "https://your-company.getcollate.io/api/v1/tables/{id}" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json-patch+json" \ + -d '[ + {"op": "replace", "path": "/columns/0/description", "value": "Unique customer identifier"} + ]' +``` + + + +### Add Column Tag + + + +```bash +curl -X PATCH "https://your-company.getcollate.io/api/v1/tables/{id}" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json-patch+json" \ + -d '[ + {"op": "add", "path": "/columns/1/tags/-", "value": {"tagFQN": "PII.Email", "labelType": "Manual", "state": "Confirmed"}} + ]' +``` + + + +--- + +## Delete a Table + + + +```python +from metadata.sdk import configure +from metadata.sdk.entities import Table + +configure( + 
host="https://your-company.getcollate.io/api", + jwt_token="your-jwt-token" +) + +# Soft delete (can be restored) +Table.delete("550e8400-e29b-41d4-a716-446655440000") + +# Hard delete +Table.delete("550e8400-e29b-41d4-a716-446655440000", hard_delete=True) +``` + + + +```java +import org.openmetadata.sdk.entities.Tables; + +// Soft delete +Tables.delete("550e8400-e29b-41d4-a716-446655440000"); + +// Hard delete +Tables.delete("550e8400-e29b-41d4-a716-446655440000", false, true); +``` + + + +```bash +# Soft delete +curl -X DELETE "https://your-company.getcollate.io/api/v1/tables/{id}" \ + -H "Authorization: Bearer $TOKEN" + +# Hard delete +curl -X DELETE "https://your-company.getcollate.io/api/v1/tables/{id}?hardDelete=true" \ + -H "Authorization: Bearer $TOKEN" +``` + + + +--- + +## Column Object + + + Column name. + + + + Data type: `BIGINT`, `VARCHAR`, `DECIMAL`, `BOOLEAN`, `TIMESTAMP`, `DATE`, `ARRAY`, `STRUCT`, etc. + + + + Length for VARCHAR/CHAR types. + + + + Column description. + + + + Constraint: `PRIMARY_KEY`, `UNIQUE`, `NOT_NULL`, `FOREIGN_KEY`. + + + + Column-level tags and classifications. + + + + Column position (1-based). + + + + Nested columns for STRUCT/ARRAY types. 
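Because STRUCT/ARRAY columns nest their fields in `children`, walking a table's columns is naturally recursive. The sketch below flattens a column tree, such as the `events` example, into dotted paths for display. `iter_column_paths` is a hypothetical helper. The dotted form is illustrative only and is not necessarily how the server builds column FQNs.

```python
def iter_column_paths(columns, prefix=""):
    """Yield (dotted_path, dataType) for every column, descending into children."""
    for column in columns:
        path = f"{prefix}{column['name']}"
        yield path, column["dataType"]
        # Nested STRUCT/ARRAY fields live under `children`; recurse into them.
        yield from iter_column_paths(column.get("children", []), prefix=f"{path}.")


events_columns = [
    {"name": "event_id", "dataType": "STRING"},
    {"name": "user_properties", "dataType": "STRUCT", "children": [
        {"name": "user_id", "dataType": "STRING"},
        {"name": "address", "dataType": "STRUCT", "children": [
            {"name": "city", "dataType": "STRING"},
        ]},
    ]},
    {"name": "timestamp", "dataType": "TIMESTAMP"},
]

for path, data_type in iter_column_paths(events_columns):
    print(f"{path}: {data_type}")
```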
+ + +--- + +## Table Types + +| Type | Description | +|------|-------------| +| `Regular` | Standard table | +| `View` | Database view | +| `MaterializedView` | Materialized view | +| `External` | External table (S3, HDFS) | +| `SecureView` | Secure view (Snowflake) | +| `Iceberg` | Apache Iceberg table | +| `Partitioned` | Partitioned table | +| `Dynamic` | Dynamic table (Snowflake) | + +--- + +## Expandable Fields + +Use `fields` parameter to include additional data: + +| Field | Description | +|-------|-------------| +| `owners` | Owner information | +| `tags` | Applied tags | +| `columns` | Column definitions | +| `tableConstraints` | Constraints | +| `usageSummary` | Usage statistics | +| `tableProfile` | Profiling data | +| `sampleData` | Sample data rows | +| `viewDefinition` | View SQL | +| `domain` | Domain information | +| `testSuite` | Test suite | + +--- + +## Related + + + + Parent schemas for tables + + + Manage table tags + + + Table lineage + + + Table tests + + diff --git a/api-reference/data-assets/tables/list.mdx b/api-reference/data-assets/tables/list.mdx new file mode 100644 index 00000000..cd715248 --- /dev/null +++ b/api-reference/data-assets/tables/list.mdx @@ -0,0 +1,288 @@ +--- +title: List Tables +description: List and filter table entities with pagination +sidebarTitle: List +mode: "wide" +--- + +# List Tables + +Retrieve a paginated list of tables with optional filtering. 
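Pagination is cursor-based. Each response carries the page's items in `data` and a `paging.after` cursor for the next request, and the last page has no `after`. The loop below captures that pattern independently of any SDK. `fetch_page` is a hypothetical caller-supplied function that performs `GET /v1/tables?limit=...&after=...` and returns the parsed JSON body.

```python
def iterate_all(fetch_page, limit=100):
    """Yield every item across pages by following the `after` cursor.

    `fetch_page(limit, after)` is a hypothetical caller-supplied function
    returning a dict shaped like {"data": [...], "paging": {"after": ...}}.
    """
    after = None
    while True:
        page = fetch_page(limit=limit, after=after)
        yield from page.get("data", [])
        after = page.get("paging", {}).get("after")
        if not after:  # no cursor: this was the last page
            break
```

Keeping the HTTP call outside the loop makes the cursor handling easy to test with canned pages and lets the same loop serve any list endpoint that uses this response shape.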
+ +## Endpoint + +``` +GET /api/v1/tables +``` + +## Parameters + +| Parameter | Type | Default | Description | +|-----------|------|---------|-------------| +| `fields` | string | - | Comma-separated fields to include | +| `limit` | integer | 10 | Number of results per page (max 1000) | +| `before` | string | - | Cursor for previous page | +| `after` | string | - | Cursor for next page | +| `database` | string | - | Filter by database FQN | +| `databaseSchema` | string | - | Filter by schema FQN | +| `service` | string | - | Filter by database service name | +| `serviceType` | string | - | Filter by service type (MySQL, Snowflake, etc.) | +| `include` | string | `non-deleted` | Which tables to return: `non-deleted`, `deleted`, or `all` | + +## Examples + +### Basic List + + + +```python +from metadata.ingestion.ometa.ometa_api import OpenMetadata +from metadata.generated.schema.entity.data.table import Table + +# `config` is the OpenMetadataConnection shown in the Authentication guide +metadata = OpenMetadata(config) + +# List first 20 tables +tables = metadata.list_entities( + entity=Table, + limit=20 +) + +print(f"Total tables: {tables.total}") +for table in tables.entities: + print(f" {table.fullyQualifiedName}") +``` + + + +```java +import org.openmetadata.client.api.TablesApi; +import org.openmetadata.client.model.Table; +import org.openmetadata.client.model.TableList; + +TablesApi tablesApi = client.buildClient(TablesApi.class); + +TableList tables = tablesApi.listTables( + null, // fields + 20, // limit + null, // before + null, // after + null // include +); + +System.out.println("Total: " + tables.getPaging().getTotal()); +for (Table table : tables.getData()) { + System.out.println(table.getFullyQualifiedName()); +} +``` + + + +```bash +curl -X GET "https://your-company.getcollate.io/api/v1/tables?limit=20" \ + -H "Authorization: Bearer $TOKEN" +``` + + + +### List with Pagination + + + +```python +# Iterate through all tables +for table in metadata.list_all_entities(entity=Table, limit=100): + print(table.fullyQualifiedName) + # Process each table...
+``` + + + +```java +String afterCursor = null; + +do { + TableList page = tablesApi.listTables(null, 100, null, afterCursor, null); + + for (Table table : page.getData()) { + System.out.println(table.getFullyQualifiedName()); + } + + afterCursor = page.getPaging().getAfter(); +} while (afterCursor != null); +``` + + + +```bash +# First page +curl -X GET "https://your-company.getcollate.io/api/v1/tables?limit=100" \ + -H "Authorization: Bearer $TOKEN" + +# Response includes paging.after cursor +# Use it for next page: +curl -X GET "https://your-company.getcollate.io/api/v1/tables?limit=100&after=eyJsYXN0SWQiOiIxMjM0NTY3ODkwIn0=" \ + -H "Authorization: Bearer $TOKEN" +``` + + + +### Filter by Database + + + +```python +# List tables in a specific database +tables = metadata.list_entities( + entity=Table, + params={"database": "sample_data.ecommerce_db"}, + limit=50 +) + +for table in tables.entities: + print(table.fullyQualifiedName) +``` + + + +```bash +curl -X GET "https://your-company.getcollate.io/api/v1/tables?database=sample_data.ecommerce_db&limit=50" \ + -H "Authorization: Bearer $TOKEN" +``` + + + +### Filter by Schema + + + +```bash +# List tables in a specific schema +curl -X GET "https://your-company.getcollate.io/api/v1/tables?databaseSchema=sample_data.ecommerce_db.shopify&limit=50" \ + -H "Authorization: Bearer $TOKEN" +``` + + + +### Filter by Service + + + +```bash +# List tables from a specific service +curl -X GET "https://your-company.getcollate.io/api/v1/tables?service=sample_data&limit=50" \ + -H "Authorization: Bearer $TOKEN" +``` + + + +### Filter by Service Type + + + +```bash +# List all Snowflake tables +curl -X GET "https://your-company.getcollate.io/api/v1/tables?serviceType=Snowflake&limit=50" \ + -H "Authorization: Bearer $TOKEN" + +# List all BigQuery tables +curl -X GET "https://your-company.getcollate.io/api/v1/tables?serviceType=BigQuery&limit=50" \ + -H "Authorization: Bearer $TOKEN" +``` + + + +### Include Specific Fields + + + 
+```bash +# Include owners and tags with results +curl -X GET "https://your-company.getcollate.io/api/v1/tables?fields=owners,tags&limit=20" \ + -H "Authorization: Bearer $TOKEN" + +# Include columns and usage summary +curl -X GET "https://your-company.getcollate.io/api/v1/tables?fields=columns,usageSummary&limit=20" \ + -H "Authorization: Bearer $TOKEN" +``` + + + +### Include Deleted Tables + + + +```bash +# List only deleted tables +curl -X GET "https://your-company.getcollate.io/api/v1/tables?include=deleted&limit=20" \ + -H "Authorization: Bearer $TOKEN" + +# List all tables (including deleted) +curl -X GET "https://your-company.getcollate.io/api/v1/tables?include=all&limit=20" \ + -H "Authorization: Bearer $TOKEN" +``` + + + +### Combine Filters + + + +```bash +# BigQuery tables in a specific database with owner and tag info +curl -X GET "https://your-company.getcollate.io/api/v1/tables?serviceType=BigQuery&database=sample_data.ecommerce_db&fields=owners,tags&limit=50" \ + -H "Authorization: Bearer $TOKEN" +``` + + + +## Response + +```json +{ + "data": [ + { + "id": "0b3ae7a0-6be1-4f32-ab19-d0e3b332557a", + "name": "dim_customer", + "fullyQualifiedName": "sample_data.ecommerce_db.shopify.dim_customer", + "description": "Customer dimension table", + "version": 0.1, + "updatedAt": 1766944701945, + "updatedBy": "admin", + "href": "http://localhost:8585/api/v1/tables/0b3ae7a0-6be1-4f32-ab19-d0e3b332557a", + "tableType": "Regular", + "columns": [...], + "databaseSchema": {...}, + "database": {...}, + "service": {...}, + "serviceType": "BigQuery", + "deleted": false + }, + { + "id": "60b94fdd-a48a-48c1-a8c5-532a42eecd68", + "name": "fact_orders", + "fullyQualifiedName": "sample_data.ecommerce_db.shopify.fact_orders", + ...
+ } + ], + "paging": { + "total": 298, + "after": "eyJpZCI6IjFmZTVhNTljLTZkOTMtNDQzYS05NjhkLTExOWIxZWJkNWU2YiIsIm5hbWUiOiJDYXRlZ29yaWVzIn0=" + } +} +``` + +## Response Fields + +| Field | Description | +|-------|-------------| +| `data` | Array of table entities | +| `paging.total` | Total count of matching tables | +| `paging.before` | Cursor for previous page | +| `paging.after` | Cursor for next page | + +## Errors + +| Code | Error Type | Description | +|------|-----------|-------------| +| `400` | `BAD_REQUEST` | Invalid filter or limit value | +| `401` | `UNAUTHORIZED` | Invalid or missing token | +| `404` | `ENTITY_NOT_FOUND` | Referenced filter entity not found | diff --git a/api-reference/data-assets/tables/retrieve.mdx b/api-reference/data-assets/tables/retrieve.mdx new file mode 100644 index 00000000..5be1afb9 --- /dev/null +++ b/api-reference/data-assets/tables/retrieve.mdx @@ -0,0 +1,273 @@ +--- +title: Retrieve Table +description: Get table metadata by ID or fully qualified name +sidebarTitle: Retrieve +mode: "wide" +--- + +# Retrieve a Table + +Retrieves a table entity by its ID or Fully Qualified Name (FQN). 
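Since a table can be fetched either by ID or by FQN, client code often needs to pick the endpoint from whatever identifier it holds. The following minimal sketch treats anything that parses as a UUID as an ID and everything else as an FQN, percent-encoding the FQN so characters such as quotes or spaces survive the URL path. The host and helper name are placeholders, not an SDK API.

```python
import uuid
from typing import Iterable, Optional
from urllib.parse import quote, urlencode

BASE_URL = "https://your-company.getcollate.io/api"  # placeholder host


def table_url(ref: str, fields: Optional[Iterable[str]] = None) -> str:
    """Return the retrieve URL for a table, given either its UUID or its FQN."""
    try:
        uuid.UUID(ref)
        path = f"/v1/tables/{ref}"
    except ValueError:
        # Not a UUID, so treat it as a fully qualified name. quote() leaves
        # dots and underscores alone but escapes quotes, spaces, and slashes.
        path = f"/v1/tables/name/{quote(ref, safe='')}"
    query = f"?{urlencode({'fields': ','.join(fields)}, safe=',')}" if fields else ""
    return f"{BASE_URL}{path}{query}"


print(table_url("550e8400-e29b-41d4-a716-446655440000", ["tags", "columns"]))
print(table_url("sample_data.ecommerce_db.shopify.dim_customer"))
```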
+ +## Endpoints + +``` +GET /api/v1/tables/{id} +GET /api/v1/tables/name/{fqn} +``` + +## Parameters + +### Path Parameters + +| Parameter | Type | Description | +|-----------|------|-------------| +| `id` | UUID | Table ID | +| `fqn` | string | Fully qualified name (e.g., `service.database.schema.table`) | + +### Query Parameters + +| Parameter | Type | Default | Description | +|-----------|------|---------|-------------| +| `fields` | string | - | Comma-separated list of fields to include | +| `include` | string | `non-deleted` | Which tables to return: `non-deleted`, `deleted`, or `all` | + +### Available Fields + +| Field | Description | +|-------|-------------| +| `owners` | Include owner information (users or teams) | +| `tags` | Include tags and classifications | +| `columns` | Include column definitions | +| `followers` | Include followers list | +| `tableConstraints` | Include table constraints | +| `usageSummary` | Include usage statistics | +| `tableProfile` | Include latest profile data | +| `tableProfilerConfig` | Include profiler configuration | +| `location` | Include storage location | +| `viewDefinition` | Include view SQL | +| `joins` | Include join information | +| `sampleData` | Include sample data | +| `dataModel` | Include data model | +| `extension` | Include custom properties | +| `testSuite` | Include test suite | +| `domain` | Include domain information | +| `dataProducts` | Include associated data products | + +## Examples + +### Get Table by ID + + + +```python +from metadata.ingestion.ometa.ometa_api import OpenMetadata +from metadata.generated.schema.entity.data.table import Table + +metadata = OpenMetadata(config) + +# Retrieve by ID +table = metadata.get_by_id( + entity=Table, + entity_id="550e8400-e29b-41d4-a716-446655440000", + fields=["owners", "tags", "columns"] +) + +print(f"Name: {table.name}") +print(f"Description: {table.description}") +print(f"Columns: {len(table.columns)}") +``` + + + +```java +import org.openmetadata.client.api.TablesApi;
+import org.openmetadata.client.model.Table; +import java.util.UUID; + +TablesApi tablesApi = client.buildClient(TablesApi.class); + +Table table = tablesApi.getTableByID( + UUID.fromString("550e8400-e29b-41d4-a716-446655440000"), + "owners,tags,columns" // fields +); + +System.out.println("Name: " + table.getName()); +System.out.println("Description: " + table.getDescription()); +``` + + + +```bash +curl -X GET "https://your-company.getcollate.io/api/v1/tables/550e8400-e29b-41d4-a716-446655440000?fields=owners,tags,columns" \ + -H "Authorization: Bearer $TOKEN" +``` + + + +### Get Table by FQN + + + +```python +# Retrieve by fully qualified name +table = metadata.get_by_name( + entity=Table, + fqn="sample_data.ecommerce_db.shopify.dim_customer", + fields=["owners", "tags", "columns", "usageSummary"] +) + +print(f"FQN: {table.fullyQualifiedName}") +print(f"Service: {table.service.name}") +print(f"Database: {table.database.name}") +print(f"Schema: {table.databaseSchema.name}") +``` + + + +```java +Table table = tablesApi.getTableByFQN( + "sample_data.ecommerce_db.shopify.dim_customer", + "owners,tags,columns,usageSummary", // fields + null // include +); + +System.out.println("FQN: " + table.getFullyQualifiedName()); +``` + + + +```bash +curl -X GET "https://your-company.getcollate.io/api/v1/tables/name/sample_data.ecommerce_db.shopify.dim_customer?fields=owners,tags,columns,usageSummary" \ + -H "Authorization: Bearer $TOKEN" +``` + + + +### Get Table with Sample Data + + + +```python +# Include sample data in response +table = metadata.get_by_name( + entity=Table, + fqn="sample_data.ecommerce_db.shopify.dim_customer", + fields=["sampleData"] +) + +if table.sampleData: + print("Sample columns:", table.sampleData.columns) + for row in table.sampleData.rows[:5]: + print(row) +``` + + + +```bash +curl -X GET "https://your-company.getcollate.io/api/v1/tables/name/sample_data.ecommerce_db.shopify.dim_customer?fields=sampleData" \ + -H "Authorization: Bearer $TOKEN" +``` + + +
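Sample data arrives as a list of column names plus a list of positional rows, as the `.columns` and `.rows` accesses in the Python example suggest. This minimal sketch pairs each row with its column names so the values can be read by name. It assumes the JSON shape `{"columns": [...], "rows": [[...], ...]}`; the helper itself is illustrative.

```python
from typing import Any, Dict, List


def sample_rows_as_dicts(sample_data: Dict[str, Any], limit: int = 5) -> List[Dict[str, Any]]:
    """Pair each sample row with the column names, keeping the first `limit` rows."""
    columns = sample_data.get("columns", [])
    rows = sample_data.get("rows", [])
    return [dict(zip(columns, row)) for row in rows[:limit]]


sample = {
    "columns": ["customer_id", "first_name"],
    "rows": [[1, "Ada"], [2, "Grace"], [3, "Edsger"]],
}
for record in sample_rows_as_dicts(sample, limit=2):
    print(record)
```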
+### Get Table with Profile Data + + + +```bash +# Include profiling data +curl -X GET "https://your-company.getcollate.io/api/v1/tables/name/sample_data.ecommerce_db.shopify.dim_customer?fields=tableProfile,columns" \ + -H "Authorization: Bearer $TOKEN" +``` + + + +### Include Deleted Tables + + + +```bash +# Include deleted tables in the result +curl -X GET "https://your-company.getcollate.io/api/v1/tables/name/sample_data.ecommerce_db.shopify.deleted_table?include=deleted" \ + -H "Authorization: Bearer $TOKEN" + +# Include all tables regardless of deletion status +curl -X GET "https://your-company.getcollate.io/api/v1/tables/550e8400-e29b-41d4-a716-446655440000?include=all" \ + -H "Authorization: Bearer $TOKEN" +``` + + + +## Response + +```json +{ + "id": "0b3ae7a0-6be1-4f32-ab19-d0e3b332557a", + "name": "dim_customer", + "fullyQualifiedName": "sample_data.ecommerce_db.shopify.dim_customer", + "description": "Customer dimension table with demographics", + "version": 0.4, + "updatedAt": 1766944701945, + "updatedBy": "admin", + "href": "http://localhost:8585/api/v1/tables/0b3ae7a0-6be1-4f32-ab19-d0e3b332557a", + "tableType": "Regular", + "columns": [ + { + "name": "customer_id", + "dataType": "BIGINT", + "dataTypeDisplay": "bigint", + "description": "Unique customer identifier", + "constraint": "PRIMARY_KEY", + "ordinalPosition": 1 + }, + { + "name": "first_name", + "dataType": "VARCHAR", + "dataLength": 100, + "dataTypeDisplay": "varchar(100)", + "ordinalPosition": 2 + } + ], + "owners": [ + { + "id": "user-uuid", + "type": "user", + "name": "john.doe", + "displayName": "John Doe" + } + ], + "tags": [ + { + "tagFQN": "PII.Sensitive", + "source": "Classification" + } + ], + "databaseSchema": { + "id": "schema-uuid", + "type": "databaseSchema", + "name": "shopify", + "fullyQualifiedName": "sample_data.ecommerce_db.shopify" + }, + "database": { + "id": "db-uuid", + "type": "database", + "name": "ecommerce_db", + "fullyQualifiedName": "sample_data.ecommerce_db" + },
"service": { + "id": "service-uuid", + "type": "databaseService", + "name": "sample_data" + }, + "serviceType": "BigQuery", + "deleted": false +} +``` + +## Errors + +| Code | Error Type | Description | +|------|-----------|-------------| +| `400` | `BAD_REQUEST` | Invalid fields parameter | +| `401` | `UNAUTHORIZED` | Invalid or missing token | +| `403` | `FORBIDDEN` | User lacks permission to view table | +| `404` | `ENTITY_NOT_FOUND` | Table with given ID/FQN not found | diff --git a/api-reference/data-assets/tables/update.mdx b/api-reference/data-assets/tables/update.mdx new file mode 100644 index 00000000..8fb69796 --- /dev/null +++ b/api-reference/data-assets/tables/update.mdx @@ -0,0 +1,344 @@ +--- +title: Update Table +description: Update table metadata using PUT or PATCH operations +sidebarTitle: Update +mode: "wide" +--- + +# Update a Table + +Update table metadata. Use PUT for full replacement or PATCH for partial updates. + +## Endpoints + +``` +PUT /api/v1/tables # Create or replace (upsert) +PATCH /api/v1/tables/{id} # Partial update using JSON Patch +``` + +## PUT: Create or Replace + +Creates a new table or replaces an existing one. All fields must be provided. 
+ + + +```python +from metadata.ingestion.ometa.ometa_api import OpenMetadata +from metadata.generated.schema.api.data.createTable import CreateTableRequest +from metadata.generated.schema.entity.data.table import Column, DataType + +metadata = OpenMetadata(config) + +# Full update using PUT (VARCHAR columns need a dataLength) +create_request = CreateTableRequest( + name="customers", + databaseSchema="mysql_prod.analytics.public", + description="Updated customer master data", + tableType="Regular", + columns=[ + Column(name="customer_id", dataType=DataType.BIGINT), + Column(name="email", dataType=DataType.VARCHAR, dataLength=255), + Column(name="first_name", dataType=DataType.VARCHAR, dataLength=100), + Column(name="last_name", dataType=DataType.VARCHAR, dataLength=100), + Column(name="phone", dataType=DataType.VARCHAR, dataLength=20), # New column + ] +) + +table = metadata.create_or_update(data=create_request) +print(f"Updated table version: {table.version}") +``` + + + +```bash +curl -X PUT "https://your-company.getcollate.io/api/v1/tables" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json" \ + -d '{ + "name": "customers", + "databaseSchema": "mysql_prod.analytics.public", + "description": "Updated customer master data", + "tableType": "Regular", + "columns": [ + {"name": "customer_id", "dataType": "BIGINT"}, + {"name": "email", "dataType": "VARCHAR", "dataLength": 255}, + {"name": "first_name", "dataType": "VARCHAR", "dataLength": 100}, + {"name": "last_name", "dataType": "VARCHAR", "dataLength": 100}, + {"name": "phone", "dataType": "VARCHAR", "dataLength": 20} + ] + }' +``` + + + +## PATCH: Partial Update + +Update specific fields using JSON Patch operations. Prefer PATCH for small, targeted changes: you send only the operations, not the full entity.
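To see what a patch document does before sending it, you can apply it locally. The following minimal sketch applies the `add`, `replace`, and `remove` operations from RFC 6902 (a subset; `move`, `copy`, and `test` are not handled) to a copy of a table-shaped dict. It supports the `-` array-append token and RFC 6901 pointer escaping. It is an illustration of the semantics, not the server's implementation, and it skips some validation a full library performs.

```python
import copy
from typing import Any, Dict, List


def _tokens(path: str) -> List[str]:
    """Split a JSON Pointer (RFC 6901) into unescaped reference tokens."""
    if not path.startswith("/"):
        raise ValueError(f"path must start with '/': {path!r}")
    # Decode "~1" before "~0", as RFC 6901 requires.
    return [t.replace("~1", "/").replace("~0", "~") for t in path[1:].split("/")]


def apply_patch(doc: Dict[str, Any], ops: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Apply add/replace/remove operations (a subset of RFC 6902) to a copy of doc."""
    result = copy.deepcopy(doc)  # the input document is never mutated
    for op in ops:
        *parents, last = _tokens(op["path"])
        target: Any = result
        for token in parents:
            target = target[int(token)] if isinstance(target, list) else target[token]
        kind = op["op"]
        if isinstance(target, list):
            if kind == "add":
                # "-" means "append to the end of the array"
                target.insert(len(target) if last == "-" else int(last), op["value"])
            elif kind == "replace":
                target[int(last)] = op["value"]
            elif kind == "remove":
                del target[int(last)]
            else:
                raise ValueError(f"unsupported op: {kind}")
        else:
            if kind == "replace" and last not in target:
                # RFC 6902: the target of "replace" must already exist
                raise KeyError(op["path"])
            if kind in ("add", "replace"):
                target[last] = op["value"]
            elif kind == "remove":
                del target[last]
            else:
                raise ValueError(f"unsupported op: {kind}")
    return result


table = {
    "description": "Customer table",
    "tags": [],
    "columns": [
        {"name": "customer_id", "description": "Unique customer identifier"},
        {"name": "email", "description": "Email"},
    ],
}
patch = [
    {"op": "replace", "path": "/description", "value": "Primary customer table"},
    {"op": "add", "path": "/tags/-", "value": {"tagFQN": "PII.Sensitive"}},
    {"op": "replace", "path": "/columns/1/description", "value": "Customer primary email address"},
]
updated = apply_patch(table, patch)
print(updated["columns"][1]["description"])  # Customer primary email address
```

Note that array indices in paths are 0-based, so `/columns/1` is the second column. Per RFC 6902, `replace` fails when the target member does not exist, while `add` on an object member creates or overwrites it.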
+ +### JSON Patch Operations + +| Operation | Description | Example | +|-----------|-------------|---------| +| `add` | Add a value, or append to an array with `-` | Add a tag | +| `remove` | Remove a value | Remove owner | +| `replace` | Replace an existing value | Update description | +| `move` | Move a value from one path to another | - | +| `copy` | Copy a value from one path to another | - | +| `test` | Check that a value matches; the whole patch fails if it does not | Verify before update | + +### Update Description + + + +```python +from metadata.generated.schema.entity.data.table import Table + +# Get the table first +table = metadata.get_by_name(entity=Table, fqn="mysql_prod.analytics.public.customers") + +# Update description using patch +metadata.patch_description( + entity=Table, + source=table, + description="Primary customer table containing all customer PII data" +) +``` + + + +```java +import javax.json.Json; +import javax.json.JsonPatch; +import java.util.UUID; +import org.openmetadata.client.api.TablesApi; +import org.openmetadata.client.model.Table; + +TablesApi tablesApi = client.buildClient(TablesApi.class); + +// Create JSON Patch +JsonPatch patch = Json.createPatchBuilder() + .replace("/description", "Primary customer table containing all customer PII data") + .build(); + +Table updated = tablesApi.patchTable( + UUID.fromString("550e8400-e29b-41d4-a716-446655440000"), + patch +); +``` + + + +```bash +curl -X PATCH "https://your-company.getcollate.io/api/v1/tables/550e8400-e29b-41d4-a716-446655440000" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json-patch+json" \ + -d '[ + {"op": "replace", "path": "/description", "value": "Primary customer table containing all customer PII data"} + ]' +``` + + + +### Update Owner + + + +```python +from metadata.generated.schema.entity.data.table import Table +from metadata.generated.schema.type.entityReference import EntityReference + +# Get table +table = metadata.get_by_name(entity=Table, fqn="mysql_prod.analytics.public.customers") + +# Set owner +owner = EntityReference(id="user-uuid", type="user") +metadata.patch_owner(entity=Table, source=table, owner=owner) +``` + + + +```bash +# Set owner +curl -X PATCH "https://your-company.getcollate.io/api/v1/tables/550e8400-e29b-41d4-a716-446655440000" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json-patch+json" \ + -d '[ + {"op": "add", "path": "/owner", "value": {"id": "user-uuid", "type": "user"}} + ]' + +# Remove owner +curl -X PATCH "https://your-company.getcollate.io/api/v1/tables/550e8400-e29b-41d4-a716-446655440000" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json-patch+json" \ + -d '[ + {"op": "remove", "path": "/owner"} + ]' +``` + + + +### Add Tags + + + +```python +from metadata.generated.schema.type.tagLabel import TagLabel, TagSource, State, LabelType + +# Get table +table = metadata.get_by_name(entity=Table, fqn="mysql_prod.analytics.public.customers") + +# Add tag +tag = TagLabel( + tagFQN="PII.Sensitive", + source=TagSource.Classification, + state=State.Confirmed, + labelType=LabelType.Manual +) +metadata.patch_tag(entity=Table, source=table, tag_label=tag) +``` + + + +```bash +curl -X PATCH "https://your-company.getcollate.io/api/v1/tables/550e8400-e29b-41d4-a716-446655440000" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json-patch+json" \ + -d '[ + { + "op": "add", + "path": "/tags/-", + "value": { + "tagFQN": "PII.Sensitive", + "source": "Classification", + "state": "Confirmed", + "labelType": "Manual" + } + } + ]' +``` + + + +### Update Column Description + + + +```python +from metadata.generated.schema.entity.data.table import Table + +# Update a specific column's description +table = metadata.get_by_name( + entity=Table, + fqn="mysql_prod.analytics.public.customers", + fields=["columns"] +) + +# Confirm the column exists, then patch it by its FQN (table FQN + column name) +for col in table.columns: + if col.name.root == "email": + metadata.patch_column_description( + table=table, + column_fqn=f"{table.fullyQualifiedName.root}.email", + description="Customer primary email address - PII" + ) + break +``` + + + +```bash +# Update the column at index 1 (0-based, so the second column) +curl -X PATCH "https://your-company.getcollate.io/api/v1/tables/550e8400-e29b-41d4-a716-446655440000" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json-patch+json" \ + -d '[ + {"op": "replace", "path": "/columns/1/description", "value": "Customer primary email address - PII"} + ]' +``` + + + +### Add Column Tags + + + +```bash +# Add tag to column at index 1 +curl -X PATCH "https://your-company.getcollate.io/api/v1/tables/550e8400-e29b-41d4-a716-446655440000" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json-patch+json" \ + -d '[ + { + "op": "add", + "path": "/columns/1/tags/-", + "value": { + "tagFQN": "PII.Sensitive", + "source": "Classification", + "state": "Confirmed", + "labelType": "Manual" + } + } + ]' +``` + + + +### Multiple Updates in One Request + + + +```bash +curl -X PATCH "https://your-company.getcollate.io/api/v1/tables/550e8400-e29b-41d4-a716-446655440000" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json-patch+json" \ + -d '[ + {"op": "replace", "path": "/description", "value": "Updated description"}, + {"op": "add", "path": "/owner", "value": {"id": "user-uuid", "type": "user"}}, + {"op": "add", "path": "/tags/-", "value": {"tagFQN": "Tier.Tier1", "source": "Classification", "state": "Confirmed", "labelType": "Manual"}} + ]' +``` + + + +## Response + +```json +{ + "id": "550e8400-e29b-41d4-a716-446655440000", + "name": "customers", + "fullyQualifiedName": "mysql_prod.analytics.public.customers", + "description": "Updated description", + "version": 0.5, + "updatedAt": 1704153600000, + "updatedBy": "admin", + "changeDescription": { + "fieldsAdded": [...], + "fieldsUpdated": [...], + "fieldsDeleted": [...] + }, + ...
+} +``` + +## Versioning + +Updates increment the entity version: +- Backward-compatible changes (for example, description, tags, or an added column): +0.1 +- Backward-incompatible changes (for example, a removed column or a changed data type): +1.0 + +View version history: + +```bash +# Get all versions +curl -X GET "https://your-company.getcollate.io/api/v1/tables/550e8400.../versions" \ + -H "Authorization: Bearer $TOKEN" + +# Get specific version +curl -X GET "https://your-company.getcollate.io/api/v1/tables/550e8400.../versions/0.3" \ + -H "Authorization: Bearer $TOKEN" +``` + +## Errors + +| Code | Error Type | Description | +|------|-----------|-------------| +| `400` | `BAD_REQUEST` | Invalid patch operation or value | +| `401` | `UNAUTHORIZED` | Invalid or missing token | +| `403` | `FORBIDDEN` | User lacks permission to update | +| `404` | `ENTITY_NOT_FOUND` | Table not found | +| `412` | `PRECONDITION_FAILED` | Version conflict (use latest version) | diff --git a/api-reference/data-assets/topics/create.mdx b/api-reference/data-assets/topics/create.mdx new file mode 100644 index 00000000..979f98d1 --- /dev/null +++ b/api-reference/data-assets/topics/create.mdx @@ -0,0 +1,135 @@ +--- +title: Create a Topic +description: Create a new topic within a messaging service +sidebarTitle: Create +--- + +Create a new topic within a messaging service. + +## Endpoint + +``` +PUT /v1/topics +``` + +## Parameters + + + Name of the topic. Must be unique within the parent service. + + + + Name of the parent MessagingService. + + + + Human-readable display name for the topic. + + + + Number of partitions in the topic. + + + + Replication factor for the topic. + + + + Cleanup policies (e.g., "delete", "compact"). + + + + Message schema definition. + + + + Description of the topic in Markdown format.
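Assembling the create body by hand makes it easy to send `null` for fields you meant to omit. This minimal sketch builds a request body with the parameters listed above and drops any optional field left as `None`. Its checks are assumptions for illustration: that `partitions` must be at least 1 and that `delete` and `compact` are the only cleanup policies. The helper is not part of any SDK.

```python
import json
from typing import Any, Dict, List, Optional

# Assumed to be the complete set of cleanup policies (the reference lists these two).
CLEANUP_POLICIES = {"delete", "compact"}


def create_topic_payload(
    name: str,
    service: str,
    partitions: int,
    replication_factor: Optional[int] = None,
    cleanup_policies: Optional[List[str]] = None,
    display_name: Optional[str] = None,
    description: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a create-topic request body, omitting optional fields left as None."""
    if partitions < 1:
        raise ValueError("partitions must be at least 1")
    unknown = set(cleanup_policies or []) - CLEANUP_POLICIES
    if unknown:
        raise ValueError(f"unknown cleanup policies: {sorted(unknown)}")
    body = {
        "name": name,
        "service": service,
        "partitions": partitions,
        "replicationFactor": replication_factor,
        "cleanupPolicies": cleanup_policies,
        "displayName": display_name,
        "description": description,
    }
    return {key: value for key, value in body.items() if value is not None}


body = create_topic_payload(
    "orders.created", "kafka_prod", 12, replication_factor=3, cleanup_policies=["delete"]
)
print(json.dumps(body, indent=2))  # suitable for curl -d
```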
+ + + +```python Python +from metadata.sdk import configure +from metadata.sdk.entities import Topic +from metadata.generated.schema.api.data.createTopic import CreateTopicRequest + +configure( + host="https://your-company.getcollate.io/api", + jwt_token="your-jwt-token" +) + +request = CreateTopicRequest( + name="orders.created", + displayName="Order Created Events", + service="kafka_prod", + partitions=12, + replicationFactor=3, + cleanupPolicies=["delete"], + description="Events published when a new order is created in the system" +) + +topic = Topic.create(request) +print(f"Created: {topic.fullyQualifiedName}") +``` + +```java Java +import java.util.Arrays; +import org.openmetadata.sdk.OpenMetadataClient; +import org.openmetadata.sdk.config.OpenMetadataConfig; +import org.openmetadata.sdk.entities.Topics; +import org.openmetadata.schema.api.data.CreateTopicRequest; +import org.openmetadata.schema.entity.data.Topic; + +OpenMetadataConfig config = OpenMetadataConfig.builder() + .serverUrl("https://your-company.getcollate.io/api") + .accessToken("your-token") + .build(); +OpenMetadataClient client = new OpenMetadataClient(config); +Topics.setDefaultClient(client); + +CreateTopicRequest request = new CreateTopicRequest() + .withName("orders.created") + .withDisplayName("Order Created Events") + .withService("kafka_prod") + .withPartitions(12) + .withReplicationFactor(3) + .withCleanupPolicies(Arrays.asList("delete")) + .withDescription("Events published when a new order is created in the system"); + +Topic topic = Topics.create(request); +System.out.println("Created: " + topic.getFullyQualifiedName()); +``` + +```bash cURL +curl -X PUT "https://your-company.getcollate.io/api/v1/topics" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json" \ + -d '{ + "name": "orders.created", + "displayName": "Order Created Events", + "service": "kafka_prod", + "partitions": 12, + "replicationFactor": 3, + "cleanupPolicies": ["delete"], + "description": "Events published when a new order is created in the system" + }' +``` + + + +```json Response
+{ + "id": "550e8400-e29b-41d4-a716-446655440000", + "name": "orders.created", + "fullyQualifiedName": "kafka_prod.orders.created", + "displayName": "Order Created Events", + "description": "Events published when a new order is created in the system", + "service": { + "id": "service-uuid", + "type": "messagingService", + "name": "kafka_prod" + }, + "partitions": 12, + "replicationFactor": 3, + "cleanupPolicies": ["delete"], + "version": 0.1 +} +``` + diff --git a/api-reference/data-assets/topics/delete.mdx b/api-reference/data-assets/topics/delete.mdx new file mode 100644 index 00000000..99543e72 --- /dev/null +++ b/api-reference/data-assets/topics/delete.mdx @@ -0,0 +1,78 @@ +--- +title: Delete a Topic +description: Delete a topic. Use hardDelete=true to permanently remove +sidebarTitle: Delete +--- + +Delete a topic. Use `hardDelete=true` to permanently remove. + +## Endpoint + +``` +DELETE /v1/topics/{id} +``` + +## Path Parameters + + + Unique identifier of the topic. + + +## Query Parameters + + + Set to `true` to permanently delete (cannot be restored). 
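The only difference between a soft and a hard delete is the `hardDelete=true` query parameter. This minimal sketch uses the standard library to build, but not send, both variants of the request so the URL and headers can be inspected. The host and token are placeholders. Pass the result to `urllib.request.urlopen` to actually send it.

```python
import uuid
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://your-company.getcollate.io/api"  # placeholder host


def delete_topic_request(topic_id: str, token: str, hard_delete: bool = False) -> Request:
    """Build (but do not send) a DELETE request for a topic."""
    uuid.UUID(topic_id)  # raises ValueError early if the ID is not a UUID
    url = f"{BASE_URL}/v1/topics/{topic_id}"
    if hard_delete:
        url += "?" + urlencode({"hardDelete": "true"})  # permanent, cannot be restored
    return Request(url, method="DELETE", headers={"Authorization": f"Bearer {token}"})


req = delete_topic_request("550e8400-e29b-41d4-a716-446655440000", "my-token", hard_delete=True)
print(req.get_method(), req.full_url)
```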
+ + + +```python Python +from metadata.sdk import configure +from metadata.sdk.entities import Topic + +configure( + host="https://your-company.getcollate.io/api", + jwt_token="your-jwt-token" +) + +# Soft delete (can be restored) +Topic.delete("550e8400-e29b-41d4-a716-446655440000") + +# Hard delete +Topic.delete( + "550e8400-e29b-41d4-a716-446655440000", + hard_delete=True +) +``` + +```java Java +import org.openmetadata.sdk.entities.Topics; + +// Soft delete +Topics.delete("550e8400-e29b-41d4-a716-446655440000"); + +// Hard delete +Topics.delete("550e8400-e29b-41d4-a716-446655440000", false, true); +``` + +```bash cURL +# Soft delete +curl -X DELETE "https://your-company.getcollate.io/api/v1/topics/{id}" \ + -H "Authorization: Bearer $TOKEN" + +# Hard delete +curl -X DELETE "https://your-company.getcollate.io/api/v1/topics/{id}?hardDelete=true" \ + -H "Authorization: Bearer $TOKEN" +``` + + + +```json Response +{ + "id": "550e8400-e29b-41d4-a716-446655440000", + "name": "orders.created", + "fullyQualifiedName": "kafka_prod.orders.created", + "deleted": true, + "version": 0.3 +} +``` + diff --git a/api-reference/data-assets/topics/index.mdx b/api-reference/data-assets/topics/index.mdx new file mode 100644 index 00000000..ce389f7b --- /dev/null +++ b/api-reference/data-assets/topics/index.mdx @@ -0,0 +1,55 @@ +--- +title: Topics +description: Create and manage messaging topic entities +sidebarTitle: Topics +mode: "wide" +--- + +**Topics** represent messaging and streaming data assets from platforms like Kafka, Pulsar, and Kinesis. + + +Entity schema follows the [OpenMetadata Standard](https://openmetadatastandards.org/metadata-specifications/entities/topic/). + + +## Entity Hierarchy + +Topics are children of Messaging Services: + +``` +MessagingService +└── Topic (this page) +``` + +## Inheritance + +When you set an **owner** or **domain** on a Messaging Service, it is inherited by all child topics. 
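The inheritance rule can be modeled in a few lines. This sketch assumes two things not stated above: a topic's own owners take precedence, so inheritance applies only when the topic has none, and inherited references carry an `inherited: true` marker. It illustrates the rule and does not reproduce the server's logic.

```python
from typing import Any, Dict, List


def effective_owners(topic: Dict[str, Any], service: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the topic's own owners, or the service's owners marked as inherited."""
    own = topic.get("owners") or []
    if own:
        return own
    # dict(ref, inherited=True) copies each reference, so the service is not mutated
    return [dict(ref, inherited=True) for ref in service.get("owners") or []]


service = {"name": "kafka_prod", "owners": [{"id": "team-uuid", "type": "team"}]}
print(effective_owners({"name": "orders.created"}, service))
```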
+ +--- + +## API Endpoints + +| Method | Endpoint | Description | +|--------|----------|-------------| +| `PUT` | `/v1/topics` | [Create or update a topic](/api-reference/data-assets/topics/create) | +| `GET` | `/v1/topics` | [List topics](/api-reference/data-assets/topics/list) | +| `GET` | `/v1/topics/{id}` | [Get by ID](/api-reference/data-assets/topics/retrieve) | +| `GET` | `/v1/topics/name/{fqn}` | [Get by fully qualified name](/api-reference/data-assets/topics/retrieve) | +| `PATCH` | `/v1/topics/{id}` | [Update a topic](/api-reference/data-assets/topics/update) | +| `DELETE` | `/v1/topics/{id}` | [Delete a topic](/api-reference/data-assets/topics/delete) | +| `PUT` | `/v1/topics/{id}/sampleData` | Add sample data | + +--- + +## Related + + + + View topic object attributes + + + Configure messaging service connections + + + Track streaming data lineage + + diff --git a/api-reference/data-assets/topics/list.mdx b/api-reference/data-assets/topics/list.mdx new file mode 100644 index 00000000..c8010492 --- /dev/null +++ b/api-reference/data-assets/topics/list.mdx @@ -0,0 +1,124 @@ +--- +title: List Topics +description: List all topics with optional filtering and pagination +sidebarTitle: List +--- + +List all topics with optional filtering and pagination. + +## Endpoint + +``` +GET /v1/topics +``` + +## Query Parameters + + + Filter by service name. + + + + Maximum number of results to return (max: 1000000). + + + + Cursor for backward pagination. + + + + Cursor for forward pagination. + + + + Comma-separated list of fields to include. + + + + Include `all`, `deleted`, or `non-deleted` entities. 
+ + + +```python Python +from metadata.sdk import configure +from metadata.sdk.entities import Topic + +configure( + host="https://your-company.getcollate.io/api", + jwt_token="your-jwt-token" +) + +# List all topics with auto-pagination +for topic in Topic.list().auto_paging_iterable(): + print(f"{topic.fullyQualifiedName}: {topic.partitions} partitions") + +# Filter by service +for topic in Topic.list(service="kafka_prod").auto_paging_iterable(): + print(f"{topic.name}") +``` + +```java Java +import org.openmetadata.sdk.entities.Topics; + +// List with auto-pagination +for (Topic topic : Topics.list().autoPagingIterable()) { + System.out.println(topic.getFullyQualifiedName() + ": " + topic.getPartitions() + " partitions"); +} + +// Filter by service +for (Topic topic : Topics.list().service("kafka_prod").autoPagingIterable()) { + System.out.println(topic.getName()); +} +``` + +```bash cURL +# List all topics +curl "https://your-company.getcollate.io/api/v1/topics?limit=50" \ + -H "Authorization: Bearer $TOKEN" + +# Filter by service +curl "https://your-company.getcollate.io/api/v1/topics?service=kafka_prod&limit=50" \ + -H "Authorization: Bearer $TOKEN" + +# Include schema +curl "https://your-company.getcollate.io/api/v1/topics?fields=messageSchema,owners,tags&limit=50" \ + -H "Authorization: Bearer $TOKEN" +``` + + + +```json Response +{ + "data": [ + { + "id": "550e8400-e29b-41d4-a716-446655440000", + "name": "orders.created", + "fullyQualifiedName": "kafka_prod.orders.created", + "displayName": "Order Created Events", + "partitions": 12, + "service": { + "id": "service-uuid", + "type": "messagingService", + "name": "kafka_prod" + } + }, + { + "id": "660e8400-e29b-41d4-a716-446655440001", + "name": "orders.updated", + "fullyQualifiedName": "kafka_prod.orders.updated", + "displayName": "Order Updated Events", + "partitions": 12, + "service": { + "id": "service-uuid", + "type": "messagingService", + "name": "kafka_prod" + } + } + ], + "paging": { + "after": 
"cursor-string", + "total": 38 + } +} +``` + diff --git a/api-reference/data-assets/topics/object.mdx b/api-reference/data-assets/topics/object.mdx new file mode 100644 index 00000000..bd1ac260 --- /dev/null +++ b/api-reference/data-assets/topics/object.mdx @@ -0,0 +1,108 @@ +--- +title: The Topic Object +description: Attributes of the topic entity +sidebarTitle: The Topic Object +mode: "wide" +--- + + + Unique identifier for the topic. + + + + Name of the topic. Must be unique within the parent service. + + + + Fully qualified name in format `{service}.{topic}`. + + + + Human-readable display name for the topic. + + + + Description of the topic in Markdown format. + + + + Number of partitions in the topic. + + + + Replication factor for the topic. + + + + Retention size in bytes. + + + + Maximum message size in bytes. + + + + Cleanup policies (e.g., "delete", "compact"). + + + + Message schema (Avro, JSON, Protobuf). + + + + Reference to the parent MessagingService. + + + + Owners of the topic (users or teams). + + + + Domain this topic belongs to. + + + + Tags and classifications applied to this topic. + + + + Entity version number, incremented on updates. + + + + Whether the topic has been soft-deleted. + + + +```json The Topic Object +{ + "id": "550e8400-e29b-41d4-a716-446655440000", + "name": "orders.created", + "fullyQualifiedName": "kafka_prod.orders.created", + "displayName": "Order Created Events", + "description": "Events published when a new order is created", + "service": { + "id": "service-uuid", + "type": "messagingService", + "name": "kafka_prod" + }, + "partitions": 12, + "replicationFactor": 3, + "retentionSize": 1073741824, + "maximumMessageSize": 1048576, + "cleanupPolicies": ["delete"], + "messageSchema": { + "schemaType": "Avro", + "schemaText": "..." 
+ }, + "owners": [ + { + "id": "team-uuid", + "type": "team" + } + ], + "version": 0.1, + "deleted": false +} +``` + diff --git a/api-reference/data-assets/topics/retrieve.mdx b/api-reference/data-assets/topics/retrieve.mdx new file mode 100644 index 00000000..0acbd22d --- /dev/null +++ b/api-reference/data-assets/topics/retrieve.mdx @@ -0,0 +1,118 @@ +--- +title: Retrieve a Topic +description: Retrieve a topic by ID or fully qualified name +sidebarTitle: Retrieve +--- + +Retrieve a topic by ID or fully qualified name. + +## Endpoints + +``` +GET /v1/topics/{id} +GET /v1/topics/name/{fqn} +``` + +## Path Parameters + + + Unique identifier of the topic. + + + + Fully qualified name of the topic (e.g., `kafka_prod.orders.created`). + + +## Query Parameters + + + Comma-separated list of fields to include. Options: `messageSchema`, `owners`, `tags`, `domain`, `sampleData`, `usageSummary`. + + + +```python Python +from metadata.sdk import configure +from metadata.sdk.entities import Topic + +configure( + host="https://your-company.getcollate.io/api", + jwt_token="your-jwt-token" +) + +# By name +topic = Topic.retrieve_by_name( + "kafka_prod.orders.created", + fields=["messageSchema", "owners", "tags", "sampleData"] +) + +# By ID +topic = Topic.retrieve("550e8400-e29b-41d4-a716-446655440000") + +print(f"Name: {topic.displayName}") +print(f"Partitions: {topic.partitions}") +print(f"Replication: {topic.replicationFactor}") +if topic.messageSchema: + print(f"Schema Type: {topic.messageSchema.schemaType}") +``` + +```java Java +import org.openmetadata.sdk.entities.Topics; + +// By name +Topic topic = Topics.retrieveByName( + "kafka_prod.orders.created", + List.of("messageSchema", "owners", "tags", "sampleData") +); + +// By ID +Topic topic = Topics.retrieve("550e8400-e29b-41d4-a716-446655440000"); + +System.out.println("Name: " + topic.getDisplayName()); +System.out.println("Partitions: " + topic.getPartitions()); +System.out.println("Replication: " + 
topic.getReplicationFactor()); +if (topic.getMessageSchema() != null) { + System.out.println("Schema Type: " + topic.getMessageSchema().getSchemaType()); +} +``` + +```bash cURL +# By FQN +curl "https://your-company.getcollate.io/api/v1/topics/name/kafka_prod.orders.created?fields=messageSchema,owners,tags" \ + -H "Authorization: Bearer $TOKEN" + +# By ID +curl "https://your-company.getcollate.io/api/v1/topics/550e8400-e29b-41d4-a716-446655440000" \ + -H "Authorization: Bearer $TOKEN" +``` + + + +```json Response +{ + "id": "550e8400-e29b-41d4-a716-446655440000", + "name": "orders.created", + "fullyQualifiedName": "kafka_prod.orders.created", + "displayName": "Order Created Events", + "description": "Events published when a new order is created", + "service": { + "id": "service-uuid", + "type": "messagingService", + "name": "kafka_prod" + }, + "partitions": 12, + "replicationFactor": 3, + "messageSchema": { + "schemaType": "Avro", + "schemaText": "{\"type\":\"record\",\"name\":\"Order\",\"fields\":[{\"name\":\"order_id\",\"type\":\"string\"}]}" + }, + "owners": [ + { + "id": "team-uuid", + "type": "team", + "name": "streaming-team" + } + ], + "version": 0.1 +} +``` + diff --git a/api-reference/data-assets/topics/update.mdx b/api-reference/data-assets/topics/update.mdx new file mode 100644 index 00000000..ac45c91a --- /dev/null +++ b/api-reference/data-assets/topics/update.mdx @@ -0,0 +1,114 @@ +--- +title: Update a Topic +description: Update a topic using JSON Patch operations +sidebarTitle: Update +--- + +Update a topic using JSON Patch operations. + +## Endpoint + +``` +PATCH /v1/topics/{id} +``` + +## Path Parameters + + + Unique identifier of the topic. + + +## Request Body + +JSON Patch document following [RFC 6902](https://tools.ietf.org/html/rfc6902). 
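Without an SDK, the request body is a JSON array of RFC 6902 operations sent with the `application/json-patch+json` content type, as in the cURL examples. Below is a minimal standard-library sketch that builds such a request without sending it. `build_topic_patch_request` is a hypothetical helper, and the host, token, and topic ID are placeholders.

```python
import json
import urllib.request


def build_topic_patch_request(base_url, token, topic_id, operations):
    """Build (but do not send) a PATCH request carrying an RFC 6902 JSON Patch.

    Hypothetical helper for illustration; it is not part of any Collate SDK.
    """
    return urllib.request.Request(
        url=f"{base_url}/v1/topics/{topic_id}",
        data=json.dumps(operations).encode("utf-8"),
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json-patch+json",
        },
    )


# Each operation names an op, a JSON Pointer path, and (for add/replace) a value
operations = [
    {"op": "replace", "path": "/description", "value": "Real-time order creation events"},
    {"op": "add", "path": "/owners", "value": [{"id": "team-uuid", "type": "team"}]},
]

request = build_topic_patch_request(
    "https://your-company.getcollate.io/api",
    "your-jwt-token",
    "550e8400-e29b-41d4-a716-446655440000",
    operations,
)

# To actually send it: urllib.request.urlopen(request)
```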
+
+
+```python Python
+from metadata.sdk import configure
+from metadata.sdk.entities import Topic, Team
+
+configure(
+    host="https://your-company.getcollate.io/api",
+    jwt_token="your-jwt-token"
+)
+
+# Update description
+topic = Topic.retrieve_by_name("kafka_prod.orders.created")
+topic.description = "Real-time order creation events for downstream processing"
+updated = Topic.update(topic.id, topic)
+
+# Set owner
+team = Team.retrieve_by_name("streaming-team")
+topic.owners = [{"id": str(team.id), "type": "team"}]
+updated = Topic.update(topic.id, topic)
+
+# Update the partition count recorded in the catalog
+# (this changes Collate metadata only; it does not resize the Kafka topic)
+topic.partitions = 24
+updated = Topic.update(topic.id, topic)
+```
+
+```java Java
+import java.util.List;
+import org.openmetadata.schema.entity.data.Topic;
+import org.openmetadata.schema.entity.teams.Team;
+import org.openmetadata.schema.type.EntityReference;
+import org.openmetadata.sdk.entities.Teams;
+import org.openmetadata.sdk.entities.Topics;
+
+// Update description
+Topic topic = Topics.retrieveByName("kafka_prod.orders.created");
+topic.setDescription("Real-time order creation events for downstream processing");
+Topic updated = Topics.update(topic.getId(), topic);
+
+// Set owner
+Team team = Teams.retrieveByName("streaming-team");
+topic.setOwners(List.of(
+    new EntityReference()
+        .withId(team.getId())
+        .withType("team")
+));
+updated = Topics.update(topic.getId(), topic);
+```
+
+```bash cURL
+# Update description
+curl -X PATCH "https://your-company.getcollate.io/api/v1/topics/{id}" \
+  -H "Authorization: Bearer $TOKEN" \
+  -H "Content-Type: application/json-patch+json" \
+  -d '[
+    {"op": "replace", "path": "/description", "value": "Real-time order creation events"}
+  ]'
+
+# Set owner
+curl -X PATCH "https://your-company.getcollate.io/api/v1/topics/{id}" \
+  -H "Authorization: Bearer $TOKEN" \
+  -H "Content-Type: application/json-patch+json" \
+  -d '[
+    {"op": "add", "path": "/owners", "value": [{"id": "team-uuid", "type": "team"}]}
+  ]'
+
+# Update partitions
+curl -X PATCH "https://your-company.getcollate.io/api/v1/topics/{id}" \
+  -H "Authorization: Bearer $TOKEN" \
+  -H "Content-Type:
application/json-patch+json" \ + -d '[ + {"op": "replace", "path": "/partitions", "value": 24} + ]' +``` + + + +```json Response +{ + "id": "550e8400-e29b-41d4-a716-446655440000", + "name": "orders.created", + "fullyQualifiedName": "kafka_prod.orders.created", + "description": "Real-time order creation events for downstream processing", + "partitions": 24, + "owners": [ + { + "id": "team-uuid", + "type": "team", + "name": "streaming-team" + } + ], + "version": 0.2 +} +``` + diff --git a/api-reference/data-quality/test-cases/index.mdx b/api-reference/data-quality/test-cases/index.mdx new file mode 100644 index 00000000..a04e12d7 --- /dev/null +++ b/api-reference/data-quality/test-cases/index.mdx @@ -0,0 +1,201 @@ +--- +title: Test Cases +description: Create, run, and manage data quality test cases +sidebarTitle: Test Cases +mode: "wide" +--- + +# Test Cases + +Test cases define specific data quality checks to run against tables and columns. + +## The Test Case Object + +```json +{ + "id": "550e8400-e29b-41d4-a716-446655440000", + "name": "customer_id_not_null", + "fullyQualifiedName": "sample_data.ecommerce_db.shopify.dim_customer.customer_id_not_null", + "description": "Verify customer_id column has no null values", + "testDefinition": { + "id": "definition-uuid", + "type": "testDefinition", + "name": "columnValuesToBeNotNull" + }, + "entityLink": "<#E::table::sample_data.ecommerce_db.shopify.dim_customer::columns::customer_id>", + "testSuite": { + "id": "suite-uuid", + "type": "testSuite", + "name": "sample_data.ecommerce_db.shopify.dim_customer.testSuite" + }, + "parameterValues": [], + "testCaseResult": { + "timestamp": 1704067200000, + "testCaseStatus": "Success", + "result": "All 10000 values are not null" + } +} +``` + +## Create Test Case + + + +```python +from metadata.ingestion.ometa.ometa_api import OpenMetadata +from metadata.generated.schema.api.tests.createTestCase import CreateTestCaseRequest +from metadata.generated.schema.tests.testCase import 
TestCaseParameterValue + +metadata = OpenMetadata(config) + +# Create a column not null test +test_case = CreateTestCaseRequest( + name="customer_id_not_null", + description="Verify customer_id has no null values", + entityLink="<#E::table::sample_data.ecommerce_db.shopify.dim_customer::columns::customer_id>", + testSuite="sample_data.ecommerce_db.shopify.dim_customer.testSuite", + testDefinition="columnValuesToBeNotNull", + parameterValues=[] +) + +result = metadata.create_or_update(data=test_case) +print(f"Created test case: {result.fullyQualifiedName}") +``` + + + +```bash +curl -X PUT "https://your-company.getcollate.io/api/v1/dataQuality/testCases" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json" \ + -d '{ + "name": "customer_id_not_null", + "description": "Verify customer_id has no null values", + "entityLink": "<#E::table::sample_data.ecommerce_db.shopify.dim_customer::columns::customer_id>", + "testSuite": "sample_data.ecommerce_db.shopify.dim_customer.testSuite", + "testDefinition": "columnValuesToBeNotNull", + "parameterValues": [] + }' +``` + + + +## Common Test Definitions + +### Column Tests + +| Test Definition | Description | Parameters | +|-----------------|-------------|------------| +| `columnValuesToBeNotNull` | Check column has no nulls | - | +| `columnValuesToBeUnique` | Check column values are unique | - | +| `columnValuesToBeBetween` | Check values in range | `minValue`, `maxValue` | +| `columnValuesToMatchRegex` | Check values match pattern | `regex` | +| `columnValueLengthsToBeBetween` | Check string lengths | `minLength`, `maxLength` | + +### Table Tests + +| Test Definition | Description | Parameters | +|-----------------|-------------|------------| +| `tableRowCountToEqual` | Check exact row count | `value` | +| `tableRowCountToBeBetween` | Check row count in range | `minValue`, `maxValue` | +| `tableColumnCountToEqual` | Check column count | `columnCount` | + +### Examples with Parameters + + + +```bash +# 
Column values between range +curl -X PUT "https://your-company.getcollate.io/api/v1/dataQuality/testCases" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json" \ + -d '{ + "name": "age_in_valid_range", + "entityLink": "<#E::table::sample_data.ecommerce_db.shopify.dim_customer::columns::age>", + "testSuite": "sample_data.ecommerce_db.shopify.dim_customer.testSuite", + "testDefinition": "columnValuesToBeBetween", + "parameterValues": [ + {"name": "minValue", "value": "0"}, + {"name": "maxValue", "value": "120"} + ] + }' + +# Email format validation +curl -X PUT "https://your-company.getcollate.io/api/v1/dataQuality/testCases" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json" \ + -d '{ + "name": "email_format_valid", + "entityLink": "<#E::table::sample_data.ecommerce_db.shopify.dim_customer::columns::email>", + "testSuite": "sample_data.ecommerce_db.shopify.dim_customer.testSuite", + "testDefinition": "columnValuesToMatchRegex", + "parameterValues": [ + {"name": "regex", "value": "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$"} + ] + }' +``` + + + +## Entity Link Format + +Entity links identify the target table or column: + +| Target | Format | +|--------|--------| +| Table | `<#E::table::fqn>` | +| Column | `<#E::table::fqn::columns::column_name>` | + +## Get Test Case Results + + + +```bash +# Get test case with latest result +curl -X GET "https://your-company.getcollate.io/api/v1/dataQuality/testCases/name/{fqn}?fields=testCaseResult" \ + -H "Authorization: Bearer $TOKEN" + +# Get result history +curl -X GET "https://your-company.getcollate.io/api/v1/dataQuality/testCases/{id}/testCaseResult?startTs=1704067200000&endTs=1704153600000" \ + -H "Authorization: Bearer $TOKEN" +``` + + + +## List Test Cases + + + +```bash +# List all test cases +curl -X GET "https://your-company.getcollate.io/api/v1/dataQuality/testCases?limit=50" \ + -H "Authorization: Bearer $TOKEN" + +# List test cases for a table +curl -X GET 
"https://your-company.getcollate.io/api/v1/dataQuality/testCases?entityLink=%3C%23E::table::sample_data.ecommerce_db.shopify.dim_customer%3E&limit=50" \
+  -H "Authorization: Bearer $TOKEN"
+```
+
+
+
+## Test Case Results
+
+Result statuses:
+
+| Status | Description |
+|--------|-------------|
+| `Success` | Test passed |
+| `Failed` | Test failed |
+| `Aborted` | Test was aborted |
+| `Queued` | Test is queued for execution |
+
+## Delete Test Case
+
+
+
+```bash
+curl -X DELETE "https://your-company.getcollate.io/api/v1/dataQuality/testCases/{id}" \
+  -H "Authorization: Bearer $TOKEN"
+```
+
+
diff --git a/api-reference/data-quality/test-suites/index.mdx b/api-reference/data-quality/test-suites/index.mdx
new file mode 100644
index 00000000..099dfebc
--- /dev/null
+++ b/api-reference/data-quality/test-suites/index.mdx
@@ -0,0 +1,120 @@
+---
+title: Test Suites
+description: Create and manage data quality test suites
+sidebarTitle: Test Suites
+mode: "wide"
+---
+
+# Test Suites
+
+Test suites group related data quality test cases. Each table can have one executable test suite, which holds the test cases defined against that table.
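The examples in these pages derive every name from the table FQN. The executable suite is `{table FQN}.testSuite`, and entity links follow the `<#E::table::fqn>` and `<#E::table::fqn::columns::column_name>` formats. The sketch below builds both, plus a correctly URL-encoded `entityLink` query string. The helper names are hypothetical, and only the standard library is used. The `#` in an entity link must be percent-encoded; otherwise HTTP clients such as curl treat everything after it as a URL fragment and never send it.

```python
from urllib.parse import urlencode


def table_entity_link(table_fqn):
    """Entity link for a table (hypothetical helper)."""
    return f"<#E::table::{table_fqn}>"


def column_entity_link(table_fqn, column_name):
    """Entity link for a single column of a table (hypothetical helper)."""
    return f"<#E::table::{table_fqn}::columns::{column_name}>"


def executable_suite_name(table_fqn):
    """Suite naming convention used in the examples on these pages."""
    return f"{table_fqn}.testSuite"


table_fqn = "sample_data.ecommerce_db.shopify.dim_customer"

# urlencode percent-encodes <, #, : and > so the link survives as a query parameter
query = urlencode({"entityLink": table_entity_link(table_fqn), "limit": 50})
list_url = f"https://your-company.getcollate.io/api/v1/dataQuality/testCases?{query}"
```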
+ +## The Test Suite Object + +```json +{ + "id": "550e8400-e29b-41d4-a716-446655440000", + "name": "sample_data.ecommerce_db.shopify.dim_customer.testSuite", + "fullyQualifiedName": "sample_data.ecommerce_db.shopify.dim_customer.testSuite", + "description": "Data quality tests for customer dimension table", + "executableEntityReference": { + "id": "table-uuid", + "type": "table", + "fullyQualifiedName": "sample_data.ecommerce_db.shopify.dim_customer" + }, + "executable": true, + "deleted": false +} +``` + +## Create Test Suite + + + +```python +from metadata.ingestion.ometa.ometa_api import OpenMetadata +from metadata.generated.schema.api.tests.createTestSuite import CreateTestSuiteRequest +from metadata.generated.schema.entity.data.table import Table + +metadata = OpenMetadata(config) + +# Get the table +table = metadata.get_by_name(entity=Table, fqn="sample_data.ecommerce_db.shopify.dim_customer") + +# Create test suite +test_suite_request = CreateTestSuiteRequest( + name=f"{table.fullyQualifiedName}.testSuite", + description="Data quality tests for customer table", + executableEntityReference=table.fullyQualifiedName +) + +test_suite = metadata.create_or_update(data=test_suite_request) +print(f"Created test suite: {test_suite.fullyQualifiedName}") +``` + + + +```bash +curl -X PUT "https://your-company.getcollate.io/api/v1/dataQuality/testSuites" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json" \ + -d '{ + "name": "sample_data.ecommerce_db.shopify.dim_customer.testSuite", + "description": "Data quality tests for customer table", + "executableEntityReference": "sample_data.ecommerce_db.shopify.dim_customer" + }' +``` + + + +## Get Test Suite + + + +```bash +# Get by ID +curl -X GET "https://your-company.getcollate.io/api/v1/dataQuality/testSuites/{id}" \ + -H "Authorization: Bearer $TOKEN" + +# Get by FQN +curl -X GET 
"https://your-company.getcollate.io/api/v1/dataQuality/testSuites/name/sample_data.ecommerce_db.shopify.dim_customer.testSuite" \ + -H "Authorization: Bearer $TOKEN" +``` + + + +## List Test Suites + + + +```bash +# List all test suites +curl -X GET "https://your-company.getcollate.io/api/v1/dataQuality/testSuites?limit=50" \ + -H "Authorization: Bearer $TOKEN" + +# Get test suite for a specific table +curl -X GET "https://your-company.getcollate.io/api/v1/dataQuality/testSuites/executables/{table-id}" \ + -H "Authorization: Bearer $TOKEN" +``` + + + +## Delete Test Suite + + + +```bash +# Soft delete +curl -X DELETE "https://your-company.getcollate.io/api/v1/dataQuality/testSuites/{id}" \ + -H "Authorization: Bearer $TOKEN" + +# Hard delete (removes all test cases) +curl -X DELETE "https://your-company.getcollate.io/api/v1/dataQuality/testSuites/{id}?hardDelete=true&recursive=true" \ + -H "Authorization: Bearer $TOKEN" +``` + + + + + Learn how to create and run test cases within a test suite + diff --git a/api-reference/errors.mdx b/api-reference/errors.mdx new file mode 100644 index 00000000..12323159 --- /dev/null +++ b/api-reference/errors.mdx @@ -0,0 +1,316 @@ +--- +title: Errors +description: Understanding and handling Collate API errors +sidebarTitle: Errors +mode: "wide" +--- + +# Errors + +The Collate API uses conventional HTTP response codes to indicate the success or failure of an API request. Error responses include a JSON body with details about what went wrong. 
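Every error body carries the same three fields: `code`, `errorType`, and `message`. A client can therefore decode the body once and branch on the result. The minimal standard-library sketch below does this; the `ApiError` class and `parse_error` helper are hypothetical. It treats `429` and `5xx` as retryable, which mirrors the retry guidance on this page.

```python
import json
from dataclasses import dataclass


@dataclass
class ApiError:
    """Hypothetical client-side view of a Collate error response."""
    code: int
    error_type: str
    message: str

    @property
    def retryable(self) -> bool:
        # Rate limiting (429) and server errors (5xx) are usually transient
        return self.code == 429 or 500 <= self.code < 600


def parse_error(status: int, body: str) -> ApiError:
    """Decode an error body; fall back to the HTTP status if it is not a JSON object."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        payload = None
    if not isinstance(payload, dict):
        payload = {}
    return ApiError(
        code=int(payload.get("code", status)),
        # "UNKNOWN" is a placeholder chosen for this sketch, not an API error type
        error_type=payload.get("errorType", "UNKNOWN"),
        message=payload.get("message", body),
    )


error = parse_error(
    404,
    '{"code": 404, "errorType": "ENTITY_NOT_FOUND", '
    '"message": "Table with name [prod.analytics.customers] not found"}',
)
print(error.error_type, error.retryable)  # prints: ENTITY_NOT_FOUND False
```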
+ +## Error Response Format + +All error responses follow this structure: + +```json +{ + "code": 404, + "errorType": "ENTITY_NOT_FOUND", + "message": "Table with name [prod.analytics.customers] not found" +} +``` + +| Field | Type | Description | +|-------|------|-------------| +| `code` | integer | HTTP status code | +| `errorType` | string | Machine-readable error type | +| `message` | string | Human-readable error description | + +## HTTP Status Codes + +### Success Codes (2xx) + +| Code | Description | +|------|-------------| +| `200 OK` | Request succeeded | +| `201 Created` | Resource created successfully | +| `202 Accepted` | Request accepted for async processing | +| `204 No Content` | Success with no response body | + +### Client Error Codes (4xx) + +| Code | Error Type | Description | +|------|-----------|-------------| +| `400` | `BAD_REQUEST` | Invalid request parameters or malformed JSON | +| `401` | `UNAUTHORIZED` | Missing or invalid authentication token | +| `403` | `FORBIDDEN` | Valid token but insufficient permissions | +| `404` | `ENTITY_NOT_FOUND` | Requested resource doesn't exist | +| `409` | `ENTITY_ALREADY_EXISTS` | Resource with same identifier exists | +| `409` | `ENTITY_LOCKED` | Entity is locked during deletion | +| `412` | `PRECONDITION_FAILED` | ETag mismatch on conditional update | +| `413` | `BULK_LIMIT_EXCEPTION` | Request payload exceeds size limits | +| `429` | `LIMITS_EXCEPTION` | Rate limit exceeded | + +### Server Error Codes (5xx) + +| Code | Error Type | Description | +|------|-----------|-------------| +| `500` | `INTERNAL_ERROR` | Unexpected server error | + +## Error Types Reference + +### BAD_REQUEST + +Returned when the request is malformed or contains invalid parameters. 
+ +```json +{ + "code": 400, + "errorType": "BAD_REQUEST", + "message": "Invalid value for parameter 'limit': must be between 1 and 1000" +} +``` + +**Common causes:** +- Missing required fields +- Invalid field values or types +- Malformed JSON body +- Invalid query parameters + +### ENTITY_NOT_FOUND + +Returned when the requested resource doesn't exist. + +```json +{ + "code": 404, + "errorType": "ENTITY_NOT_FOUND", + "message": "Entity with id [550e8400-e29b-41d4-a716-446655440000] not found" +} +``` + +**Variants:** +- `Entity with id [] not found` +- `Entity with name [] not found` +- `Entity not found for query params []` + +### ENTITY_ALREADY_EXISTS + +Returned when attempting to create a resource that already exists. + +```json +{ + "code": 409, + "errorType": "ENTITY_ALREADY_EXISTS", + "message": "Entity already exists" +} +``` + +**Resolution:** Use PUT for upsert operations or verify the entity doesn't exist before creating. + +### UNAUTHORIZED + +Returned when authentication fails. + +```json +{ + "code": 401, + "errorType": "UNAUTHORIZED", + "message": "Token has expired" +} +``` + +**Common causes:** +- Missing Authorization header +- Invalid or malformed token +- Expired token +- Token used after logout + +### FORBIDDEN + +Returned when the authenticated user lacks permission for the operation. + +```json +{ + "code": 403, + "errorType": "FORBIDDEN", + "message": "User does not have permission to delete table" +} +``` + +**Resolution:** Ensure the user or bot has the required role/policy for the operation. + +### PRECONDITION_FAILED + +Returned when a conditional update fails due to version mismatch. + +```json +{ + "code": 412, + "errorType": "PRECONDITION_FAILED", + "message": "Entity version mismatch" +} +``` + +**Resolution:** Fetch the latest version and retry with the updated ETag. + +### LIMITS_EXCEPTION + +Returned when rate limits are exceeded. + +```json +{ + "code": 429, + "errorType": "LIMITS_EXCEPTION", + "message": "Rate limit exceeded. 
Please retry after 60 seconds" +} +``` + +**Resolution:** Implement exponential backoff and retry logic. + +## Handling Errors + +### Python SDK + +```python +from metadata.ingestion.ometa.ometa_api import OpenMetadata +from metadata.generated.schema.entity.data.table import Table + +metadata = OpenMetadata(config) + +try: + # Attempt to get a table + table = metadata.get_by_name(entity=Table, fqn="prod.analytics.customers") +except Exception as e: + if "404" in str(e): + print("Table not found") + elif "401" in str(e): + print("Authentication failed - check your token") + elif "403" in str(e): + print("Permission denied") + else: + print(f"API error: {e}") +``` + +### Java SDK + +```java +import org.openmetadata.client.api.TablesApi; +import org.openmetadata.client.ApiException; + +TablesApi tablesApi = client.buildClient(TablesApi.class); + +try { + Table table = tablesApi.getTableByFQN("prod.analytics.customers", null, null); +} catch (ApiException e) { + switch (e.getCode()) { + case 404: + System.out.println("Table not found"); + break; + case 401: + System.out.println("Authentication failed"); + break; + case 403: + System.out.println("Permission denied"); + break; + default: + System.out.println("API error: " + e.getMessage()); + } +} +``` + +### HTTP (cURL with error handling) + +```bash +response=$(curl -s -w "\n%{http_code}" \ + -X GET "https://your-company.getcollate.io/api/v1/tables/name/prod.analytics.customers" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json") + +http_code=$(echo "$response" | tail -n1) +body=$(echo "$response" | sed '$d') + +case $http_code in + 200) + echo "Success: $body" + ;; + 404) + echo "Table not found" + ;; + 401) + echo "Authentication failed" + ;; + 403) + echo "Permission denied" + ;; + *) + echo "Error $http_code: $body" + ;; +esac +``` + +## Validation Errors + +When request validation fails, the error message includes details about which field(s) failed: + +```json +{ + "code": 400, + 
"errorType": "BAD_REQUEST",
+  "message": "query param limit must be greater than 0"
+}
+```
+
+For JSON body validation errors:
+
+```json
+{
+  "code": 400,
+  "errorType": "BAD_REQUEST",
+  "message": "name is required and cannot be empty"
+}
+```
+
+## Retry Logic
+
+For transient errors (`429` and `5xx`), retry with exponential backoff and jitter, and re-raise the last error once the retries are exhausted:
+
+```python
+import random
+import time
+
+# Status codes treated as transient. Matching on the exception text is a
+# heuristic; prefer a structured status code if your client exposes one.
+RETRYABLE_CODES = ("429", "500", "502", "503", "504")
+
+
+def api_call_with_retry(func, max_retries=3):
+    """Call func(), retrying transient failures up to max_retries attempts in total."""
+    for attempt in range(max_retries):
+        try:
+            return func()
+        except Exception as e:
+            is_transient = any(code in str(e) for code in RETRYABLE_CODES)
+            # Re-raise immediately for non-transient errors or on the final attempt
+            if not is_transient or attempt == max_retries - 1:
+                raise
+            # Base delays of 1s, 2s, 4s, ... plus up to 1s of random jitter
+            wait_time = (2 ** attempt) + random.uniform(0, 1)
+            print(f"Retrying in {wait_time:.2f}s...")
+            time.sleep(wait_time)
+```
+
+## Debugging Tips
+
+
+
+  The `message` field usually contains specific details about what went wrong.
+
+
+  Ensure your token is valid and not expired. Try generating a new token.
+
+
+  For 403 errors, verify the user/bot has the required role for the operation.
+
+
+  For 400 errors, check that all required fields are present and properly formatted.
+
+
+  For 404 errors, verify the resource exists and the FQN/ID is correct.
+
+
diff --git a/api-reference/governance/index.mdx b/api-reference/governance/index.mdx
new file mode 100644
index 00000000..498517f3
--- /dev/null
+++ b/api-reference/governance/index.mdx
@@ -0,0 +1,190 @@
+---
+title: Governance
+description: Manage glossaries, classifications, domains, and policies
+sidebarTitle: Overview
+mode: "wide"
+---
+
+# Data Governance
+
+The Governance APIs help you manage organizational structures for data governance, including glossaries, classifications, domains, and access policies.
+ +## Governance Resources + + + + Business glossaries with term definitions + + + Tag taxonomies for data classification + + + Organizational domains for data ownership + + + Access control and governance policies + + + +## Glossaries + +Glossaries contain business terms that provide consistent definitions across your organization. + + + +```bash +# Create a glossary +curl -X PUT "https://your-company.getcollate.io/api/v1/glossaries" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json" \ + -d '{ + "name": "Business Glossary", + "displayName": "Business Glossary", + "description": "Standard business terminology" + }' + +# Create a glossary term +curl -X PUT "https://your-company.getcollate.io/api/v1/glossaryTerms" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json" \ + -d '{ + "name": "Customer ID", + "glossary": "Business Glossary", + "description": "Unique identifier for a customer account", + "synonyms": ["customer_id", "cust_id", "client_id"] + }' + +# List glossaries +curl -X GET "https://your-company.getcollate.io/api/v1/glossaries?limit=50" \ + -H "Authorization: Bearer $TOKEN" +``` + + + +## Classifications + +Classifications are tag taxonomies for categorizing data. 
+ + + +```bash +# Create a classification +curl -X PUT "https://your-company.getcollate.io/api/v1/classifications" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json" \ + -d '{ + "name": "DataSensitivity", + "description": "Data sensitivity classification" + }' + +# Create a tag within classification +curl -X PUT "https://your-company.getcollate.io/api/v1/tags" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json" \ + -d '{ + "name": "Confidential", + "classification": "DataSensitivity", + "description": "Confidential data - internal use only" + }' + +# List classifications +curl -X GET "https://your-company.getcollate.io/api/v1/classifications?limit=50" \ + -H "Authorization: Bearer $TOKEN" +``` + + + +## Domains + +Domains represent organizational areas that own data assets. + + + +```bash +# Create a domain +curl -X PUT "https://your-company.getcollate.io/api/v1/domains" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json" \ + -d '{ + "name": "Sales", + "displayName": "Sales Domain", + "description": "Sales and revenue data", + "domainType": "Consumer", + "experts": [{"id": "user-uuid", "type": "user"}] + }' + +# Add entity to domain +curl -X PATCH "https://your-company.getcollate.io/api/v1/tables/{table-id}" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json-patch+json" \ + -d '[ + { + "op": "add", + "path": "/domain", + "value": {"id": "domain-uuid", "type": "domain"} + } + ]' +``` + + + +## Data Products + +Data products are curated datasets published by domains. 
+ + + +```bash +# Create a data product +curl -X PUT "https://your-company.getcollate.io/api/v1/dataProducts" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json" \ + -d '{ + "name": "Customer 360", + "displayName": "Customer 360 Dataset", + "description": "Unified customer view combining CRM, transactions, and support data", + "domain": "Sales", + "owner": {"id": "team-uuid", "type": "team"}, + "assets": [ + {"id": "table-uuid-1", "type": "table"}, + {"id": "table-uuid-2", "type": "table"} + ] + }' +``` + + + +## Policies and Roles + +Manage access control: + + + +```bash +# List roles +curl -X GET "https://your-company.getcollate.io/api/v1/roles?limit=50" \ + -H "Authorization: Bearer $TOKEN" + +# List policies +curl -X GET "https://your-company.getcollate.io/api/v1/policies?limit=50" \ + -H "Authorization: Bearer $TOKEN" + +# Get policy details +curl -X GET "https://your-company.getcollate.io/api/v1/policies/name/DataStewardPolicy" \ + -H "Authorization: Bearer $TOKEN" +``` + + + +## Endpoints Summary + +| Resource | Endpoints | +|----------|-----------| +| Glossaries | `/v1/glossaries` | +| Glossary Terms | `/v1/glossaryTerms` | +| Classifications | `/v1/classifications` | +| Tags | `/v1/tags` | +| Domains | `/v1/domains` | +| Data Products | `/v1/dataProducts` | +| Roles | `/v1/roles` | +| Policies | `/v1/policies` | diff --git a/api-reference/index.mdx b/api-reference/index.mdx new file mode 100644 index 00000000..1d436699 --- /dev/null +++ b/api-reference/index.mdx @@ -0,0 +1,201 @@ +--- +title: API Reference +description: Programmatic access to your Collate data catalog +sidebarTitle: Introduction +mode: "wide" +--- + +import { CodeLayout } from '/snippets/components/CodeLayout/CodeLayout.jsx' + +# Collate API Reference + +The Collate API provides programmatic access to all metadata in your data catalog. Build integrations, automate workflows, and manage your data assets using our REST API or native SDKs. 
+
+
+
+```javascript Base URL
+https://{your-company}.getcollate.io/api/v1
+```
+
+
+
+
+
+```javascript Base URL
+https://your-host.com/api/v1
+```
+
+
+
+
+
+```python Python
+from metadata.ingestion.ometa.ometa_api import OpenMetadata
+from metadata.generated.schema.entity.data.table import Table
+from metadata.generated.schema.entity.services.connections.metadata.openMetadataConnection import (
+    OpenMetadataConnection,
+)
+from metadata.generated.schema.security.client.openMetadataJWTClientConfig import (
+    OpenMetadataJWTClientConfig,
+)
+
+# Configure connection
+server_config = OpenMetadataConnection(
+    hostPort="https://your-company.getcollate.io/api",
+    authProvider="openmetadata",
+    securityConfig=OpenMetadataJWTClientConfig(
+        jwtToken="your-jwt-token"
+    ),
+)
+
+# Create client
+metadata = OpenMetadata(server_config)
+
+# List tables
+tables = metadata.list_all_entities(entity=Table)
+for table in tables:
+    print(f"{table.fullyQualifiedName}: {table.description}")
+```
+
+```java Java
+import org.openmetadata.client.api.TablesApi;
+import org.openmetadata.client.gateway.OpenMetadata;
+import org.openmetadata.client.model.Table;
+import org.openmetadata.schema.security.client.OpenMetadataJWTClientConfig;
+import org.openmetadata.schema.services.connections.metadata.AuthProvider;
+import org.openmetadata.schema.services.connections.metadata.OpenMetadataConnection;
+
+// Configure connection
+OpenMetadataConnection config = new OpenMetadataConnection();
+config.setHostPort("https://your-company.getcollate.io/api");
+config.setAuthProvider(AuthProvider.OPENMETADATA);
+
+OpenMetadataJWTClientConfig jwtConfig = new OpenMetadataJWTClientConfig();
+jwtConfig.setJwtToken("your-jwt-token");
+config.setSecurityConfig(jwtConfig);
+
+// Create client
+OpenMetadata client = new OpenMetadata(config);
+TablesApi tablesApi = client.buildClient(TablesApi.class);
+
+// List tables
+TableList tables = tablesApi.listTables(null, 10, null, null, null);
+for (Table table : tables.getData()) {
+    System.out.println(table.getFullyQualifiedName());
+}
+```
+
+```bash HTTP
+curl -X GET "https://your-company.getcollate.io/api/v1/tables?limit=10" \
+  -H "Authorization: Bearer your-jwt-token" \
+ -H "Content-Type: application/json" +``` + + + +All API requests require authentication using a JWT Bearer token. You can obtain a token by:

1. Bot Token: Go to Settings > Bots in the Collate UI to create a service account
2. Personal Access Token: Go to your Profile > Access Tokens to generate a personal token} +> + +```bash Authorization +Authorization: Bearer eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9... +``` + +
+ + + + Learn more about authentication methods and token management + + + +## SDKs + +We provide official SDKs for Python and Java: + + + + Install with `pip install openmetadata-ingestion` + + + Available via Maven Central + + + +## Core Resources + + + + Create, retrieve, update, and manage table metadata + + + Manage database entities and their schemas + + + Configure database, dashboard, and pipeline services + + + Track data flow and dependencies across assets + + + + + +```json Response +{ + "id": "550e8400-e29b-41d4-a716-446655440000", + "name": "customers", + "fullyQualifiedName": "mysql_prod.analytics.public.customers", + "description": "Customer master data", + "version": 0.1, + "updatedAt": 1704067200000, + "updatedBy": "admin" +} +``` + + + +## Error Handling + +The API uses conventional HTTP response codes: + +| Code | Description | +|------|-------------| +| `200` | Success | +| `201` | Created | +| `400` | Bad Request - Invalid parameters | +| `401` | Unauthorized - Invalid or missing token | +| `403` | Forbidden - Insufficient permissions | +| `404` | Not Found - Resource doesn't exist | +| `409` | Conflict - Resource already exists | +| `500` | Internal Server Error | + + + + Complete guide to API errors and handling + + + +## Rate Limits + +API requests are subject to rate limiting to ensure fair usage. If you exceed the rate limit, you'll receive a `429 Too Many Requests` response. + +## Need Help? + + + + Contact our support team + + + Join the OpenMetadata Slack community + + diff --git a/api-reference/ingestion/index.mdx b/api-reference/ingestion/index.mdx new file mode 100644 index 00000000..2ac0fac5 --- /dev/null +++ b/api-reference/ingestion/index.mdx @@ -0,0 +1,265 @@ +--- +title: Ingestion +description: Manage ingestion pipelines and run metadata extraction +sidebarTitle: Overview +mode: "wide" +--- + +# Ingestion Pipelines + +Ingestion pipelines extract metadata from your data sources. 
The API allows you to create, run, and monitor pipelines programmatically. + +## The Ingestion Pipeline Object + +```json +{ + "id": "550e8400-e29b-41d4-a716-446655440000", + "name": "sample_data_metadata", + "displayName": "Sample Data Metadata Ingestion", + "pipelineType": "metadata", + "fullyQualifiedName": "sample_data.sample_data_metadata", + "service": { + "id": "service-uuid", + "type": "databaseService", + "name": "sample_data" + }, + "sourceConfig": { + "config": { + "type": "DatabaseMetadata", + "markDeletedTables": true, + "includeTables": true, + "includeViews": true + } + }, + "airflowConfig": { + "scheduleInterval": "0 */6 * * *" + }, + "deployed": true, + "enabled": true +} +``` + +## Pipeline Types + +| Type | Description | +|------|-------------| +| `metadata` | Extract table/schema metadata | +| `usage` | Extract query usage data | +| `lineage` | Extract lineage from queries | +| `profiler` | Run data profiling | +| `TestSuite` | Run data quality tests | + +## List Ingestion Pipelines + + + +```python +from metadata.ingestion.ometa.ometa_api import OpenMetadata +from metadata.generated.schema.entity.services.ingestionPipelines.ingestionPipeline import IngestionPipeline + +metadata = OpenMetadata(config) + +# List all pipelines +for pipeline in metadata.list_all_entities(entity=IngestionPipeline): + print(f"{pipeline.name}: {pipeline.pipelineType} - Enabled: {pipeline.enabled}") +``` + + + +```bash +# List all ingestion pipelines +curl -X GET "https://your-company.getcollate.io/api/v1/services/ingestionPipelines?limit=50" \ + -H "Authorization: Bearer $TOKEN" + +# List pipelines for a specific service +curl -X GET "https://your-company.getcollate.io/api/v1/services/ingestionPipelines?service=sample_data&limit=50" \ + -H "Authorization: Bearer $TOKEN" +``` + + + +## Create Ingestion Pipeline + + + +```bash +curl -X POST "https://your-company.getcollate.io/api/v1/services/ingestionPipelines" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: 
application/json" \ + -d '{ + "name": "mysql_prod_metadata", + "displayName": "MySQL Production Metadata", + "pipelineType": "metadata", + "service": { + "id": "service-uuid", + "type": "databaseService" + }, + "sourceConfig": { + "config": { + "type": "DatabaseMetadata", + "markDeletedTables": true, + "includeTables": true, + "includeViews": true, + "schemaFilterPattern": { + "includes": ["public", "analytics"] + } + } + }, + "airflowConfig": { + "scheduleInterval": "0 0 * * *" + } + }' +``` + + + +## Run Pipeline + +Trigger a pipeline run: + + + +```python +from metadata.generated.schema.entity.services.ingestionPipelines.ingestionPipeline import IngestionPipeline + +# Get the pipeline +pipeline = metadata.get_by_name( + entity=IngestionPipeline, + fqn="sample_data.sample_data_metadata" +) + +# Trigger run +metadata.run_pipeline(pipeline.id) +print("Pipeline triggered") +``` + + + +```bash +# Trigger pipeline run +curl -X POST "https://your-company.getcollate.io/api/v1/services/ingestionPipelines/trigger/{pipeline-id}" \ + -H "Authorization: Bearer $TOKEN" +``` + + + +## Get Pipeline Status + + + +```bash +# Get pipeline with status +curl -X GET "https://your-company.getcollate.io/api/v1/services/ingestionPipelines/{id}?fields=pipelineStatuses" \ + -H "Authorization: Bearer $TOKEN" + +# Get pipeline run history +curl -X GET "https://your-company.getcollate.io/api/v1/services/ingestionPipelines/{id}/pipelineStatus?startTs=1704067200000&endTs=1704153600000" \ + -H "Authorization: Bearer $TOKEN" +``` + + + +## Pipeline Status Values + +| Status | Description | +|--------|-------------| +| `running` | Pipeline is currently executing | +| `success` | Pipeline completed successfully | +| `failed` | Pipeline failed | +| `partialSuccess` | Pipeline completed with some failures | +| `queued` | Pipeline is queued for execution | + +## Enable/Disable Pipeline + + + +```bash +# Disable pipeline +curl -X PATCH 
"https://your-company.getcollate.io/api/v1/services/ingestionPipelines/{id}" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json-patch+json" \ + -d '[ + {"op": "replace", "path": "/enabled", "value": false} + ]' + +# Enable pipeline +curl -X PATCH "https://your-company.getcollate.io/api/v1/services/ingestionPipelines/{id}" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json-patch+json" \ + -d '[ + {"op": "replace", "path": "/enabled", "value": true} + ]' +``` + + + +## Deploy Pipeline + +Deploy a pipeline to the workflow orchestrator: + + + +```bash +curl -X POST "https://your-company.getcollate.io/api/v1/services/ingestionPipelines/deploy/{pipeline-id}" \ + -H "Authorization: Bearer $TOKEN" +``` + + + +## Delete Pipeline + + + +```bash +# Soft delete +curl -X DELETE "https://your-company.getcollate.io/api/v1/services/ingestionPipelines/{id}" \ + -H "Authorization: Bearer $TOKEN" + +# Hard delete +curl -X DELETE "https://your-company.getcollate.io/api/v1/services/ingestionPipelines/{id}?hardDelete=true" \ + -H "Authorization: Bearer $TOKEN" +``` + + + +## Source Config Examples + +### Metadata Ingestion + +```json +{ + "type": "DatabaseMetadata", + "markDeletedTables": true, + "includeTables": true, + "includeViews": true, + "schemaFilterPattern": { + "includes": ["public"], + "excludes": ["temp_.*"] + }, + "tableFilterPattern": { + "includes": [".*"], + "excludes": ["tmp_.*", "backup_.*"] + } +} +``` + +### Profiler + +```json +{ + "type": "Profiler", + "profileSample": 50, + "profileSampleType": "PERCENTAGE", + "generateSampleData": true +} +``` + +### Usage + +```json +{ + "type": "DatabaseUsage", + "queryLogDuration": 7, + "stageFileLocation": "/tmp/usage" +} +``` diff --git a/api-reference/lineage/index.mdx b/api-reference/lineage/index.mdx new file mode 100644 index 00000000..c518fee7 --- /dev/null +++ b/api-reference/lineage/index.mdx @@ -0,0 +1,255 @@ +--- +title: Data Lineage +description: Track and manage 
data lineage across your data assets +sidebarTitle: Overview +mode: "wide" +--- + +# Data Lineage + +Data lineage tracks how data flows between entities, enabling impact analysis, debugging, and governance. The Collate API allows you to programmatically create, query, and manage lineage relationships. + +## Lineage Concepts + +| Concept | Description | +|---------|-------------| +| **Edge** | A directed connection from source to target entity | +| **Upstream** | Entities that feed into the current entity | +| **Downstream** | Entities that consume from the current entity | +| **Column Lineage** | Fine-grained lineage at the column level | +| **Pipeline** | The process/pipeline that created the lineage | + +## Get Entity Lineage + +Retrieve lineage for an entity: + + + +```python +from metadata.ingestion.ometa.ometa_api import OpenMetadata +from metadata.generated.schema.entity.data.table import Table + +metadata = OpenMetadata(config) + +# Get lineage for a table (the SDK returns the raw JSON response as a dict) +lineage = metadata.get_lineage_by_name( + entity=Table, + fqn="sample_data.ecommerce_db.shopify.fact_orders", + up_depth=3, # levels of upstream + down_depth=3 # levels of downstream +) + +print(f"Entity: {lineage['entity']['name']}") +print(f"Upstream edges: {len(lineage.get('upstreamEdges') or [])}") +print(f"Downstream edges: {len(lineage.get('downstreamEdges') or [])}") + +# Edges reference entities by ID; resolve names through the nodes list +nodes_by_id = {node['id']: node for node in lineage.get('nodes') or []} +for edge in lineage.get('upstreamEdges') or []: + upstream = nodes_by_id.get(edge['fromEntity'], {}) + print(f" <- {upstream.get('name', edge['fromEntity'])}") +``` + + + +```bash +# Get lineage by entity type and FQN +curl -X GET "https://your-company.getcollate.io/api/v1/lineage/table/name/sample_data.ecommerce_db.shopify.fact_orders?upstreamDepth=3&downstreamDepth=3" \ + -H "Authorization: Bearer $TOKEN" + +# Get lineage by entity ID +curl -X GET "https://your-company.getcollate.io/api/v1/lineage/table/{table-id}?upstreamDepth=3&downstreamDepth=3" \ + -H "Authorization: Bearer $TOKEN" +``` + + + +## Add Lineage Edge + +Create a lineage connection between entities: + + + +```python +from
metadata.generated.schema.api.lineage.addLineage import AddLineageRequest +from metadata.generated.schema.type.entityLineage import EntitiesEdge +from metadata.generated.schema.type.entityReference import EntityReference + +# Get source and target tables +source = metadata.get_by_name(entity=Table, fqn="sample_data.ecommerce_db.shopify.raw_orders") +target = metadata.get_by_name(entity=Table, fqn="sample_data.ecommerce_db.shopify.fact_orders") + +# Create lineage edge +lineage_request = AddLineageRequest( + edge=EntitiesEdge( + fromEntity=EntityReference(id=source.id, type="table"), + toEntity=EntityReference(id=target.id, type="table"), + description="ETL transformation from raw to fact table" + ) +) + +metadata.add_lineage(data=lineage_request) +print("Lineage edge created") +``` + + + +```bash +curl -X PUT "https://your-company.getcollate.io/api/v1/lineage" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json" \ + -d '{ + "edge": { + "fromEntity": { + "id": "source-table-uuid", + "type": "table" + }, + "toEntity": { + "id": "target-table-uuid", + "type": "table" + }, + "description": "ETL transformation" + } + }' +``` + + + +## Add Column-Level Lineage + +Track lineage at the column level: + + + +```python +from metadata.generated.schema.type.entityLineage import ColumnLineage, LineageDetails + +# Reuses AddLineageRequest, EntitiesEdge, EntityReference, source, and target from the previous example +# Create lineage with column mappings +lineage_request = AddLineageRequest( + edge=EntitiesEdge( + fromEntity=EntityReference(id=source.id, type="table"), + toEntity=EntityReference(id=target.id, type="table"), + lineageDetails=LineageDetails( + columnsLineage=[ + ColumnLineage( + fromColumns=["sample_data.ecommerce_db.shopify.raw_orders.order_id"], + toColumn="sample_data.ecommerce_db.shopify.fact_orders.order_key" + ), + ColumnLineage( + fromColumns=[ + "sample_data.ecommerce_db.shopify.raw_orders.quantity", + "sample_data.ecommerce_db.shopify.raw_orders.unit_price" + ], + toColumn="sample_data.ecommerce_db.shopify.fact_orders.total_amount" + ) + ] + ) + ) +)
+metadata.add_lineage(data=lineage_request) +``` + + + +```bash +curl -X PUT "https://your-company.getcollate.io/api/v1/lineage" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json" \ + -d '{ + "edge": { + "fromEntity": {"id": "source-uuid", "type": "table"}, + "toEntity": {"id": "target-uuid", "type": "table"}, + "lineageDetails": { + "columnsLineage": [ + { + "fromColumns": ["source.schema.table.order_id"], + "toColumn": "target.schema.table.order_key" + }, + { + "fromColumns": ["source.schema.table.quantity", "source.schema.table.unit_price"], + "toColumn": "target.schema.table.total_amount" + } + ] + } + } + }' +``` + + + +## Delete Lineage Edge + +Remove a lineage connection: + + + +```bash +curl -X DELETE "https://your-company.getcollate.io/api/v1/lineage/{fromEntityType}/{fromId}/{toEntityType}/{toId}" \ + -H "Authorization: Bearer $TOKEN" + +# Example: Delete edge from table to table +curl -X DELETE "https://your-company.getcollate.io/api/v1/lineage/table/source-uuid/table/target-uuid" \ + -H "Authorization: Bearer $TOKEN" +``` + + + +## Lineage Response Structure + +Edges in the response reference entities by ID. To get an entity's name and type, look up its ID in `nodes`, or in `entity` for the entity you queried. + +```json +{ + "entity": { + "id": "table-uuid", + "type": "table", + "name": "fact_orders", + "fullyQualifiedName": "sample_data.ecommerce_db.shopify.fact_orders" + }, + "nodes": [ + {"id": "upstream-1", "type": "table", "name": "raw_orders"}, + {"id": "downstream-1", "type": "dashboard", "name": "Sales Dashboard"} + ], + "upstreamEdges": [ + { + "fromEntity": "upstream-1", + "toEntity": "table-uuid", + "lineageDetails": { + "columnsLineage": [...]
+ } + } + ], + "downstreamEdges": [ + { + "fromEntity": "table-uuid", + "toEntity": "downstream-1" + } + ] +} +``` + +## Supported Entity Types + +Lineage can connect: + +| From | To | +|------|----| +| Table | Table, Dashboard, ML Model | +| Dashboard | Dashboard | +| Pipeline | Table, Dashboard | +| Topic | Table, Pipeline | +| Container | Table | + +## Best Practices + + + + Configure connectors to automatically extract lineage from queries and pipelines. + + + Column lineage enables precise impact analysis. + + + Link lineage to the pipeline that created the transformation. + + + Deleting lineage removes important context - verify before removing. + + diff --git a/api-reference/metadata/descriptions/index.mdx b/api-reference/metadata/descriptions/index.mdx new file mode 100644 index 00000000..51f1011d --- /dev/null +++ b/api-reference/metadata/descriptions/index.mdx @@ -0,0 +1,215 @@ +--- +title: Descriptions +description: Update entity and column descriptions +sidebarTitle: Descriptions +mode: "wide" +--- + +# Managing Descriptions + +Descriptions provide context and documentation for entities and their components. Use the API to programmatically update descriptions across your data catalog. + +## Update Entity Description + +Update the description of any entity type using PATCH.
+ + + +```python +from metadata.ingestion.ometa.ometa_api import OpenMetadata +from metadata.generated.schema.entity.data.table import Table + +metadata = OpenMetadata(config) + +# Get the table +table = metadata.get_by_name( + entity=Table, + fqn="sample_data.ecommerce_db.shopify.dim_customer" +) + +# Update description +metadata.patch_description( + entity=Table, + source=table, + description=""" +## Customer Dimension Table + +This table contains customer master data including: +- Customer demographics +- Contact information +- Account status + +**Owner:** Data Platform Team +**SLA:** Updated daily at 6:00 AM UTC +""" +) + +print("Description updated successfully") +``` + + + +```java +import org.openmetadata.client.api.TablesApi; +import javax.json.Json; +import javax.json.JsonPatch; + +TablesApi tablesApi = client.buildClient(TablesApi.class); + +// Get table first to get current version +Table table = tablesApi.getTableByFQN("sample_data.ecommerce_db.shopify.dim_customer", null, null); + +// Create patch +JsonPatch patch = Json.createPatchBuilder() + .replace("/description", "Updated description with **markdown** support") + .build(); + +// Apply patch +Table updated = tablesApi.patchTable(table.getId(), patch); +System.out.println("Updated version: " + updated.getVersion()); +``` + + + +```bash +curl -X PATCH "https://your-company.getcollate.io/api/v1/tables/{table-id}" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json-patch+json" \ + -d '[ + { + "op": "replace", + "path": "/description", + "value": "Updated description with **markdown** support" + } + ]' +``` + + + +## Update Column Description + +Update descriptions for individual columns. 
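The REST example for this operation patches by position (`/columns/1/description`), so you first need the column's index in the `columns` array. The following is a minimal sketch of that lookup. It assumes you have already fetched the table with `GET /v1/tables/name/{fqn}?fields=columns` and parsed the body into a dict. The helper name `column_description_patch` is hypothetical and is not part of any Collate SDK.

```python
import json
from typing import Any, Dict, List


def column_description_patch(
    table: Dict[str, Any], column_name: str, description: str
) -> List[Dict[str, Any]]:
    """Build a JSON Patch (RFC 6902) that sets one top-level column's description.

    `table` is assumed to be the parsed body of
    GET /v1/tables/name/{fqn}?fields=columns. Nested (struct) children are not searched.
    """
    for index, column in enumerate(table.get("columns") or []):
        if column.get("name") == column_name:
            # "replace" requires the member to exist; "add" creates it when missing.
            op = "replace" if "description" in column else "add"
            return [
                {
                    "op": op,
                    "path": f"/columns/{index}/description",
                    "value": description,
                }
            ]
    raise KeyError(f"column {column_name!r} not found in table")


if __name__ == "__main__":
    # Hypothetical table body with two columns
    table = {
        "columns": [
            {"name": "customer_id", "description": "Primary key"},
            {"name": "email"},
        ]
    }
    patch = column_description_patch(table, "email", "Primary email address - contains PII")
    # Use this JSON string as the -d body of the PATCH request
    print(json.dumps(patch))
```

For the sample table, this prints `[{"op": "add", "path": "/columns/1/description", "value": "Primary email address - contains PII"}]`. Because the path is positional, fetch the table immediately before patching. If the schema changes between the GET and the PATCH, the index may point at a different column.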
+ + + +```python +from metadata.generated.schema.entity.data.table import Table + +# Get table with columns +table = metadata.get_by_name( + entity=Table, + fqn="sample_data.ecommerce_db.shopify.dim_customer", + fields=["columns"] +) + +# Update column description by column FQN +metadata.patch_column_description( + table=table, + column_fqn="sample_data.ecommerce_db.shopify.dim_customer.email", + description="Primary email address - contains PII" +) +``` + + + +```bash +# Find the column index first, then patch by index +# Column at index 1 (0-based) +curl -X PATCH "https://your-company.getcollate.io/api/v1/tables/{table-id}" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json-patch+json" \ + -d '[ + { + "op": "replace", + "path": "/columns/1/description", + "value": "Primary email address - contains PII" + } + ]' +``` + + + +## Markdown Support + +Descriptions support GitHub-flavored Markdown: + +```markdown +## Overview +Customer dimension table with demographics. 
+ +### Data Sources +- CRM system (daily sync) +- Website registrations + +### Key Columns +| Column | Description | +|--------|-------------| +| customer_id | Primary key | +| email | Contact email (PII) | + +**Note:** Contains sensitive data +``` + +## Bulk Update Descriptions + +Update descriptions for multiple entities: + + + +```python +from metadata.generated.schema.entity.data.table import Table + +# Define description updates +updates = { + "sample_data.ecommerce_db.shopify.dim_customer": "Customer master data", + "sample_data.ecommerce_db.shopify.fact_orders": "Order transactions", + "sample_data.ecommerce_db.shopify.dim_product": "Product catalog", +} + +for fqn, description in updates.items(): + table = metadata.get_by_name(entity=Table, fqn=fqn) + metadata.patch_description( + entity=Table, + source=table, + description=description + ) + print(f"Updated: {fqn}") +``` + + + +## Best Practices + + + + Structure descriptions with headers, lists, and tables for readability. + + + Document where data comes from and how it's transformed. + + + Explain what the data means in business terms, not just technical details. + + + Note any known data quality issues or limitations. + + + Update descriptions when schemas or business logic change. 
+ + + +## Supported Entity Types + +Descriptions can be updated for: + +| Entity Type | Endpoint | +|-------------|----------| +| Table | `/v1/tables/{id}` | +| Database | `/v1/databases/{id}` | +| Database Schema | `/v1/databaseSchemas/{id}` | +| Dashboard | `/v1/dashboards/{id}` | +| Pipeline | `/v1/pipelines/{id}` | +| Topic | `/v1/topics/{id}` | +| ML Model | `/v1/mlmodels/{id}` | +| Container | `/v1/containers/{id}` | +| Glossary Term | `/v1/glossaryTerms/{id}` | diff --git a/api-reference/metadata/owners/index.mdx b/api-reference/metadata/owners/index.mdx new file mode 100644 index 00000000..73c2d105 --- /dev/null +++ b/api-reference/metadata/owners/index.mdx @@ -0,0 +1,237 @@ +--- +title: Owners +description: Set and manage entity ownership +sidebarTitle: Owners +mode: "wide" +--- + +# Managing Owners + +Ownership helps define accountability for data assets. Owners can be individual users or teams. + +## Set Owner + +Assign an owner to an entity. + + + +```python +from metadata.ingestion.ometa.ometa_api import OpenMetadata +from metadata.generated.schema.entity.data.table import Table +from metadata.generated.schema.type.entityReference import EntityReference + +metadata = OpenMetadata(config) + +# Get the table +table = metadata.get_by_name( + entity=Table, + fqn="sample_data.ecommerce_db.shopify.dim_customer" +) + +# Set user as owner +owner = EntityReference( + id="user-uuid-here", + type="user" +) +metadata.patch_owner(entity=Table, source=table, owner=owner) + +# Or set team as owner +team_owner = EntityReference( + id="team-uuid-here", + type="team" +) +metadata.patch_owner(entity=Table, source=table, owner=team_owner) +``` + + + +```java +import org.openmetadata.client.api.TablesApi; +import org.openmetadata.schema.type.EntityReference; +import javax.json.Json; +import javax.json.JsonPatch; + +TablesApi tablesApi = client.buildClient(TablesApi.class); + +// Get table +Table table = 
tablesApi.getTableByFQN("sample_data.ecommerce_db.shopify.dim_customer", null, null); + +// Create owner reference +JsonPatch patch = Json.createPatchBuilder() + .add("/owner", Json.createObjectBuilder() + .add("id", "user-uuid-here") + .add("type", "user") + .build()) + .build(); + +Table updated = tablesApi.patchTable(table.getId(), patch); +System.out.println("Owner set: " + updated.getOwner().getName()); +``` + + + +```bash +# Set user as owner +curl -X PATCH "https://your-company.getcollate.io/api/v1/tables/{table-id}" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json-patch+json" \ + -d '[ + { + "op": "add", + "path": "/owner", + "value": { + "id": "user-uuid-here", + "type": "user" + } + } + ]' + +# Set team as owner +curl -X PATCH "https://your-company.getcollate.io/api/v1/tables/{table-id}" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json-patch+json" \ + -d '[ + { + "op": "add", + "path": "/owner", + "value": { + "id": "team-uuid-here", + "type": "team" + } + } + ]' +``` + + + +## Find User or Team ID + +Before setting an owner, you need their UUID. + + + +```bash +# Find user by name +curl -X GET "https://your-company.getcollate.io/api/v1/users/name/john.doe" \ + -H "Authorization: Bearer $TOKEN" + +# Find team by name +curl -X GET "https://your-company.getcollate.io/api/v1/teams/name/data-platform" \ + -H "Authorization: Bearer $TOKEN" + +# Search for users +curl -X GET "https://your-company.getcollate.io/api/v1/users?limit=10" \ + -H "Authorization: Bearer $TOKEN" +``` + + + +```python +from metadata.generated.schema.entity.teams.user import User +from metadata.generated.schema.entity.teams.team import Team + +# Get user by name +user = metadata.get_by_name(entity=User, fqn="john.doe") +print(f"User ID: {user.id}") + +# Get team by name +team = metadata.get_by_name(entity=Team, fqn="data-platform") +print(f"Team ID: {team.id}") +``` + + + +## Remove Owner + +Clear ownership from an entity. 
+ + + +```python +# Remove owner +metadata.patch_owner(entity=Table, source=table, owner=None) +``` + + + +```bash +curl -X PATCH "https://your-company.getcollate.io/api/v1/tables/{table-id}" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json-patch+json" \ + -d '[ + { + "op": "remove", + "path": "/owner" + } + ]' +``` + + + +## Bulk Update Owners + +Set owners for multiple entities: + + + +```python +from metadata.generated.schema.entity.data.table import Table +from metadata.generated.schema.entity.teams.team import Team +from metadata.generated.schema.type.entityReference import EntityReference + +# Get the team +team = metadata.get_by_name(entity=Team, fqn="data-platform") +# Pass the Uuid model directly; str(team.id) renders the model, not a bare UUID +owner = EntityReference(id=team.id, type="team") + +# Tables to update +table_fqns = [ + "sample_data.ecommerce_db.shopify.dim_customer", + "sample_data.ecommerce_db.shopify.dim_product", + "sample_data.ecommerce_db.shopify.fact_orders", +] + +for fqn in table_fqns: + table = metadata.get_by_name(entity=Table, fqn=fqn) + metadata.patch_owner(entity=Table, source=table, owner=owner) + print(f"Set owner for: {fqn}") +``` + + + +## Query by Owner + +Find all entities owned by a user or team: + + + +```bash +# Using search API to find tables by owner +curl -X GET "https://your-company.getcollate.io/api/v1/search/query?q=*&index=table_search_index&query_filter=%7B%22query%22%3A%7B%22bool%22%3A%7B%22must%22%3A%5B%7B%22term%22%3A%7B%22owner.id%22%3A%22user-uuid%22%7D%7D%5D%7D%7D%7D" \ + -H "Authorization: Bearer $TOKEN" +``` + + + +## Owner Types + +| Type | Description | Example | +|------|-------------|---------| +| `user` | Individual user account | Data engineer, analyst | +| `team` | Team or group | "Data Platform", "Analytics" | + +## Best Practices + + + + Teams provide continuity when individuals change roles. + + + Tables should be owned, schemas by teams, databases by platform teams. + + + Use ingestion pipelines to set owners based on source system metadata.
+ + + Regularly audit assets without owners. + + diff --git a/api-reference/metadata/tags/index.mdx b/api-reference/metadata/tags/index.mdx new file mode 100644 index 00000000..777082b9 --- /dev/null +++ b/api-reference/metadata/tags/index.mdx @@ -0,0 +1,317 @@ +--- +title: Tags +description: Add, remove, and manage tags on entities +sidebarTitle: Tags +mode: "wide" +--- + +# Managing Tags + +Tags and classifications help organize and govern data assets. Tags can be applied at the entity level (e.g., table) or component level (e.g., column). + +## Tag Types + +| Type | Description | Example | +|------|-------------|---------| +| **Classification Tags** | Built-in or custom classifications | `PII.Sensitive`, `Tier.Tier1` | +| **Glossary Terms** | Business glossary references | `Business Glossary.Customer ID` | + +## Add Tags to Entity + + + +```python +from metadata.ingestion.ometa.ometa_api import OpenMetadata +from metadata.generated.schema.entity.data.table import Table +from metadata.generated.schema.type.tagLabel import ( + TagLabel, TagSource, State, LabelType +) + +metadata = OpenMetadata(config) + +# Get the table +table = metadata.get_by_name( + entity=Table, + fqn="sample_data.ecommerce_db.shopify.dim_customer" +) + +# Add a classification tag +tag = TagLabel( + tagFQN="PII.Sensitive", + source=TagSource.Classification, + state=State.Confirmed, + labelType=LabelType.Manual +) + +metadata.patch_tag(entity=Table, source=table, tag_label=tag) +print("Tag added successfully") +``` + + + +```java +import org.openmetadata.client.api.TablesApi; +import javax.json.Json; +import javax.json.JsonPatch; + +TablesApi tablesApi = client.buildClient(TablesApi.class); + +// Get table +Table table = tablesApi.getTableByFQN("sample_data.ecommerce_db.shopify.dim_customer", "tags", null); + +// Add tag +JsonPatch patch = Json.createPatchBuilder() + .add("/tags/-", Json.createObjectBuilder() + .add("tagFQN", "PII.Sensitive") + .add("source", "Classification") + .add("state", 
"Confirmed") + .add("labelType", "Manual") + .build()) + .build(); + +Table updated = tablesApi.patchTable(table.getId(), patch); +``` + + + +```bash +curl -X PATCH "https://your-company.getcollate.io/api/v1/tables/{table-id}" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json-patch+json" \ + -d '[ + { + "op": "add", + "path": "/tags/-", + "value": { + "tagFQN": "PII.Sensitive", + "source": "Classification", + "state": "Confirmed", + "labelType": "Manual" + } + } + ]' +``` + + + +## Add Multiple Tags + + + +```bash +curl -X PATCH "https://your-company.getcollate.io/api/v1/tables/{table-id}" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json-patch+json" \ + -d '[ + { + "op": "add", + "path": "/tags/-", + "value": {"tagFQN": "PII.Sensitive", "source": "Classification", "state": "Confirmed", "labelType": "Manual"} + }, + { + "op": "add", + "path": "/tags/-", + "value": {"tagFQN": "Tier.Tier1", "source": "Classification", "state": "Confirmed", "labelType": "Manual"} + } + ]' +``` + + + +## Add Tags to Columns + + + +```python +from metadata.generated.schema.entity.data.table import Table +from metadata.generated.schema.type.tagLabel import TagLabel, TagSource, State, LabelType + +# Get table with columns +table = metadata.get_by_name( + entity=Table, + fqn="sample_data.ecommerce_db.shopify.dim_customer", + fields=["columns"] +) + +# Add tag to specific column (state and labelType are required TagLabel fields) +tag = TagLabel( + tagFQN="PII.Sensitive", + source=TagSource.Classification, + state=State.Confirmed, + labelType=LabelType.Manual +) + +metadata.patch_column_tag( + table=table, + column_fqn="sample_data.ecommerce_db.shopify.dim_customer.email", + tag_label=tag +) +``` + + + +```bash +# Add tag to column at index 2 +curl -X PATCH "https://your-company.getcollate.io/api/v1/tables/{table-id}" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json-patch+json" \ + -d '[ + { + "op": "add", + "path": "/columns/2/tags/-", + "value": { + "tagFQN": "PII.Sensitive", + "source": "Classification", + "state": "Confirmed", + "labelType": "Manual" + } + } + ]' +``` + + + +## Remove Tags + + + +```bash +# Remove tag at index 0 from entity +curl -X PATCH
"https://your-company.getcollate.io/api/v1/tables/{table-id}" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json-patch+json" \ + -d '[ + { + "op": "remove", + "path": "/tags/0" + } + ]' + +# Remove tag from column +curl -X PATCH "https://your-company.getcollate.io/api/v1/tables/{table-id}" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json-patch+json" \ + -d '[ + { + "op": "remove", + "path": "/columns/2/tags/0" + } + ]' +``` + + + +## List Available Tags + + + +```bash +# List all classifications +curl -X GET "https://your-company.getcollate.io/api/v1/classifications?limit=50" \ + -H "Authorization: Bearer $TOKEN" + +# List tags in a classification +curl -X GET "https://your-company.getcollate.io/api/v1/tags?parent=PII&limit=50" \ + -H "Authorization: Bearer $TOKEN" +``` + + + +```python +from metadata.generated.schema.entity.classification.classification import Classification +from metadata.generated.schema.entity.classification.tag import Tag + +# List classifications +classifications = metadata.list_all_entities(entity=Classification) +for c in classifications: + print(f"Classification: {c.name}") + +# List tags in a classification +tags = metadata.list_entities( + entity=Tag, + params={"parent": "PII"} +) +for tag in tags.entities: + print(f" Tag: {tag.fullyQualifiedName}") +``` + + + +## Add Glossary Term + + + +```bash +curl -X PATCH "https://your-company.getcollate.io/api/v1/tables/{table-id}" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json-patch+json" \ + -d '[ + { + "op": "add", + "path": "/tags/-", + "value": { + "tagFQN": "Business Glossary.Customer.Customer ID", + "source": "Glossary", + "state": "Confirmed", + "labelType": "Manual" + } + } + ]' +``` + + + +## Common Classification Tags + +| Tag FQN | Description | +|---------|-------------| +| `PII.Sensitive` | Personally identifiable information | +| `PII.NonSensitive` | Non-sensitive PII | +| `PersonalData.Personal` | 
Personal data (GDPR) | +| `Tier.Tier1` | Critical/production data | +| `Tier.Tier2` | Important data | +| `Tier.Tier3` | Less critical data | + +## Tag Label Properties + +| Property | Values | Description | +|----------|--------|-------------| +| `source` | `Classification`, `Glossary` | Tag origin | +| `state` | `Suggested`, `Confirmed` | Confirmation status | +| `labelType` | `Manual`, `Propagated`, `Automated`, `Derived` | How tag was applied | + +## Bulk Tag Operations + + + +```python +from metadata.generated.schema.entity.data.table import Table +from metadata.generated.schema.type.tagLabel import TagLabel, TagSource, State, LabelType + +# Tag to apply +tag = TagLabel( + tagFQN="Tier.Tier1", + source=TagSource.Classification, + state=State.Confirmed, + labelType=LabelType.Manual +) + +# Tables to tag +tables = [ + "sample_data.ecommerce_db.shopify.dim_customer", + "sample_data.ecommerce_db.shopify.fact_orders", + "sample_data.ecommerce_db.shopify.dim_product", +] + +for fqn in tables: + table = metadata.get_by_name(entity=Table, fqn=fqn) + metadata.patch_tag(entity=Table, source=table, tag_label=tag) + print(f"Tagged: {fqn}") +``` + + + +## Query by Tag + +Find entities with specific tags: + + + +```bash +# Search for tables with PII tag +curl -X GET "https://your-company.getcollate.io/api/v1/search/query?q=*&index=table_search_index&query_filter=%7B%22query%22%3A%7B%22bool%22%3A%7B%22must%22%3A%5B%7B%22term%22%3A%7B%22tags.tagFQN%22%3A%22PII.Sensitive%22%7D%7D%5D%7D%7D%7D" \ + -H "Authorization: Bearer $TOKEN" +``` + + diff --git a/api-reference/pagination.mdx b/api-reference/pagination.mdx new file mode 100644 index 00000000..02f69416 --- /dev/null +++ b/api-reference/pagination.mdx @@ -0,0 +1,289 @@ +--- +title: Pagination +description: Navigate through large result sets using cursor-based pagination +sidebarTitle: Pagination +mode: "wide" +--- + +# Pagination + +The Collate API uses cursor-based pagination for list endpoints. This ensures consistent results even when data changes between requests.
+ +## How It Works + +When you request a list of resources, the response includes: +- An array of resources (up to the `limit`) +- A `paging` object with cursors for navigation + +```json +{ + "data": [ + { "id": "...", "name": "table1", ... }, + { "id": "...", "name": "table2", ... } + ], + "paging": { + "total": 150, + "after": "eyJsYXN0SWQiOiIxMjM0NTY3ODkwIn0=" + } +} +``` + +## Pagination Parameters + +| Parameter | Type | Default | Description | +|-----------|------|---------|-------------| +| `limit` | integer | 10 | Number of results per page (max 1000) | +| `before` | string | - | Cursor for previous page | +| `after` | string | - | Cursor for next page | + +## Response Fields + +The `paging` object contains: + +| Field | Type | Description | +|-------|------|-------------| +| `total` | integer | Total count of matching resources | +| `before` | string | Cursor for the previous page (if available) | +| `after` | string | Cursor for the next page (if available) | + +## Examples + +### Basic Pagination + + + +```python +from metadata.ingestion.ometa.ometa_api import OpenMetadata +from metadata.generated.schema.entity.data.table import Table + +metadata = OpenMetadata(config) + +# Get first page (the SDK's EntityList exposes total and after directly, not under paging) +first_page = metadata.list_entities( + entity=Table, + limit=20 +) + +print(f"Total tables: {first_page.total}") +for table in first_page.entities: + print(table.fullyQualifiedName) + +# Get next page using after cursor +if first_page.after: + next_page = metadata.list_entities( + entity=Table, + limit=20, + after=first_page.after + ) + for table in next_page.entities: + print(table.fullyQualifiedName) +``` + + + +```java +import org.openmetadata.client.api.TablesApi; +import org.openmetadata.client.model.TableList; + +TablesApi tablesApi = client.buildClient(TablesApi.class); + +// Get first page +TableList firstPage = tablesApi.listTables(null, 20, null, null, null); + +System.out.println("Total tables: " + firstPage.getPaging().getTotal());
+for (Table table : firstPage.getData()) { + System.out.println(table.getFullyQualifiedName()); +} + +// Get next page +if (firstPage.getPaging().getAfter() != null) { + TableList nextPage = tablesApi.listTables( + null, 20, null, firstPage.getPaging().getAfter(), null + ); + for (Table table : nextPage.getData()) { + System.out.println(table.getFullyQualifiedName()); + } +} +``` + + + +```bash +# Get first page +curl -X GET "https://your-company.getcollate.io/api/v1/tables?limit=20" \ + -H "Authorization: Bearer $TOKEN" + +# Response includes paging.after cursor +# { +# "data": [...], +# "paging": { +# "total": 150, +# "after": "eyJsYXN0SWQiOiIxMjM0NTY3ODkwIn0=" +# } +# } + +# Get next page using after cursor +curl -X GET "https://your-company.getcollate.io/api/v1/tables?limit=20&after=eyJsYXN0SWQiOiIxMjM0NTY3ODkwIn0=" \ + -H "Authorization: Bearer $TOKEN" +``` + + + +### Iterating Through All Results + + + +```python +from metadata.ingestion.ometa.ometa_api import OpenMetadata +from metadata.generated.schema.entity.data.table import Table + +metadata = OpenMetadata(config) + +# Iterate through all tables +for table in metadata.list_all_entities(entity=Table, limit=100): + print(table.fullyQualifiedName) + # Process each table... +``` + + + +```java +import org.openmetadata.client.api.TablesApi; + +TablesApi tablesApi = client.buildClient(TablesApi.class); +String afterCursor = null; + +do { + TableList page = tablesApi.listTables(null, 100, null, afterCursor, null); + + for (Table table : page.getData()) { + System.out.println(table.getFullyQualifiedName()); + // Process each table... 
+ } + + afterCursor = page.getPaging().getAfter(); +} while (afterCursor != null); +``` + + + +```bash +#!/bin/bash + +after_cursor="" + +while true; do + # Build URL with optional cursor + url="https://your-company.getcollate.io/api/v1/tables?limit=100" + if [ -n "$after_cursor" ]; then + url="${url}&after=${after_cursor}" + fi + + # Fetch page + response=$(curl -s -X GET "$url" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json") + + # Process results + echo "$response" | jq -r '.data[].fullyQualifiedName' + + # Get next cursor + after_cursor=$(echo "$response" | jq -r '.paging.after // empty') + + # Exit if no more pages + if [ -z "$after_cursor" ]; then + break + fi +done +``` + + + +## Filtering with Pagination + +Combine pagination with filters for efficient data retrieval: + + + +```python +from metadata.ingestion.ometa.ometa_api import OpenMetadata +from metadata.generated.schema.entity.data.table import Table + +metadata = OpenMetadata(config) + +# List tables from a specific database +tables = metadata.list_entities( + entity=Table, + params={"database": "prod.analytics"}, + limit=50 +) + +for table in tables.entities: + print(table.fullyQualifiedName) +``` + + + +```bash +# Filter tables by database +curl -X GET "https://your-company.getcollate.io/api/v1/tables?database=prod.analytics&limit=50" \ + -H "Authorization: Bearer $TOKEN" +``` + + + +## Include Fields + +Use the `fields` parameter to choose which optional fields are included in the response: + + + +```bash +# Request specific fields +curl -X GET "https://your-company.getcollate.io/api/v1/tables?fields=owners,tags,columns&limit=20" \ + -H "Authorization: Bearer $TOKEN" +``` + + + +Common field options for tables: +- `owners` - Include owner information +- `tags` - Include tags and classifications +- `columns` - Include column definitions +- `followers` - Include followers +- `tableConstraints` - Include constraints +- `usageSummary` - Include usage statistics + +## Best Practices
+ + + + Start with `limit=50-100`. Larger pages reduce API calls but increase memory usage. + + + Always use cursors sequentially. Don't try to construct cursor values manually. + + + Check if `data` array is empty or `after` cursor is null to detect end of results. + + + Use the `fields` parameter to reduce response size and improve performance. + + + Handle transient errors gracefully when paginating through large datasets. + + + +## Pagination vs Search + +For finding specific resources, consider using the Search API instead of paginating through all results: + +```bash +# Search is faster for finding specific resources +curl -X GET "https://your-company.getcollate.io/api/v1/search/query?q=customers&index=table_search_index" \ + -H "Authorization: Bearer $TOKEN" +``` + + + Learn about searching and filtering metadata + diff --git a/api-reference/search/index.mdx b/api-reference/search/index.mdx new file mode 100644 index 00000000..5fed2b1f --- /dev/null +++ b/api-reference/search/index.mdx @@ -0,0 +1,172 @@ +--- +title: Search +description: Search and discover entities across your data catalog +sidebarTitle: Overview +mode: "wide" +--- + +# Search API + +The Search API provides full-text search across all entities in your data catalog using Elasticsearch. 
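The `query_filter` examples on this page send the filter as percent-encoded JSON, which is hard to read or edit by hand. The Python sketch below builds that parameter from a readable dict using only the standard library. The helper names `term_filter` and `search_url` are hypothetical, and they only produce the single-`term` `bool.must` filter shape those examples use.

```python
import json
from urllib.parse import quote, urlencode


def term_filter(field: str, value: str) -> str:
    """Return a URL-encoded bool/must/term filter for the query_filter parameter."""
    body = {"query": {"bool": {"must": [{"term": {field: value}}]}}}
    # Compact separators keep the JSON free of spaces before percent-encoding.
    return quote(json.dumps(body, separators=(",", ":")), safe="")


def search_url(base: str, index: str, field: str, value: str) -> str:
    """Build a /v1/search/query URL matching all documents (q=*) in `index`."""
    params = urlencode({"q": "*", "index": index}, safe="*")
    return f"{base}/v1/search/query?{params}&query_filter={term_filter(field, value)}"


url = search_url(
    "https://your-company.getcollate.io/api",
    "table_search_index",
    "tags.tagFQN",
    "PII.Sensitive",
)
print(url)
```

The printed URL matches the PII tag search used in the curl examples. To filter on a different field, such as `owner.id` or `database.name.keyword`, change only the `field` and `value` arguments.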
+ +## Search Query + + + +```python +from metadata.ingestion.ometa.ometa_api import OpenMetadata +from metadata.generated.schema.entity.data.table import Table + +metadata = OpenMetadata(config) + +# Look up tables by fully qualified name. +# es_search_from_fqn takes the entity class and returns entities (or None), not raw hits. +results = metadata.es_search_from_fqn( + entity_type=Table, + fqn_search_string="customer", + size=10 +) + +for table in results or []: + print(f"{table.fullyQualifiedName}: {table.description or 'N/A'}") +``` + + + +```bash +# Basic search +curl -X GET "https://your-company.getcollate.io/api/v1/search/query?q=customer&index=table_search_index&size=10" \ + -H "Authorization: Bearer $TOKEN" + +# Search with from/size pagination +curl -X GET "https://your-company.getcollate.io/api/v1/search/query?q=customer&index=table_search_index&from=0&size=20" \ + -H "Authorization: Bearer $TOKEN" +``` + + + +## Search Indices + +| Index | Entity Type | +|-------|-------------| +| `table_search_index` | Tables | +| `dashboard_search_index` | Dashboards | +| `pipeline_search_index` | Pipelines | +| `topic_search_index` | Topics | +| `mlmodel_search_index` | ML Models | +| `container_search_index` | Containers | +| `glossary_search_index` | Glossary Terms | +| `tag_search_index` | Tags | +| `user_search_index` | Users | +| `team_search_index` | Teams | + +## Search Parameters + +| Parameter | Type | Description | +|-----------|------|-------------| +| `q` | string | Search query | +| `index` | string | Search index name | +| `from` | integer | Offset for pagination | +| `size` | integer | Number of results (default 10, max 10000) | +| `deleted` | boolean | Include deleted entities | +| `query_filter` | string | Elasticsearch query filter (URL encoded JSON) | +| `sort_field` | string | Field to sort by | +| `sort_order` | string | `asc` or `desc` | + +## Filtered Search + +Use query filters for advanced search: + + + +```bash +# Search tables with specific owner +curl -X GET
"https://your-company.getcollate.io/api/v1/search/query?q=*&index=table_search_index&query_filter=%7B%22query%22%3A%7B%22bool%22%3A%7B%22must%22%3A%5B%7B%22term%22%3A%7B%22owner.id%22%3A%22user-uuid%22%7D%7D%5D%7D%7D%7D" \ + -H "Authorization: Bearer $TOKEN" + +# Search tables with specific tag +curl -X GET "https://your-company.getcollate.io/api/v1/search/query?q=*&index=table_search_index&query_filter=%7B%22query%22%3A%7B%22bool%22%3A%7B%22must%22%3A%5B%7B%22term%22%3A%7B%22tags.tagFQN%22%3A%22PII.Sensitive%22%7D%7D%5D%7D%7D%7D" \ + -H "Authorization: Bearer $TOKEN" + +# Search tables in a specific database +curl -X GET "https://your-company.getcollate.io/api/v1/search/query?q=*&index=table_search_index&query_filter=%7B%22query%22%3A%7B%22bool%22%3A%7B%22must%22%3A%5B%7B%22term%22%3A%7B%22database.name.keyword%22%3A%22analytics%22%7D%7D%5D%7D%7D%7D" \ + -H "Authorization: Bearer $TOKEN" +``` + + + +## Search Response + +```json +{ + "took": 5, + "timed_out": false, + "hits": { + "total": { + "value": 25, + "relation": "eq" + }, + "max_score": 10.5, + "hits": [ + { + "_index": "table_search_index", + "_id": "table-uuid", + "_score": 10.5, + "_source": { + "id": "table-uuid", + "name": "customers", + "fullyQualifiedName": "sample_data.ecommerce_db.shopify.customers", + "description": "Customer master data", + "tableType": "Regular", + "owner": {...}, + "tags": [...], + "service": {...} + } + } + ] + } +} +``` + +## Suggest (Autocomplete) + +Get suggestions for autocomplete: + + + +```bash +curl -X GET "https://your-company.getcollate.io/api/v1/search/suggest?q=cust&index=table_search_index&field=name" \ + -H "Authorization: Bearer $TOKEN" +``` + + + +## Aggregations + +Get aggregated counts: + + + +```bash +# Get table counts by service type +curl -X GET "https://your-company.getcollate.io/api/v1/search/query?q=*&index=table_search_index&size=0" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json" +``` + + + +## Search Tips + + + + Use `*` 
for wildcard matches: `cust*` matches "customer", "custom", etc. + + + Use quotes for exact matches: `"customer orders"` + + + Search specific fields: `name:customer` or `description:sales` + + + Combine terms: `customer AND orders`, `sales OR revenue` + + diff --git a/api-reference/services/dashboard/index.mdx b/api-reference/services/dashboard/index.mdx new file mode 100644 index 00000000..badc1bab --- /dev/null +++ b/api-reference/services/dashboard/index.mdx @@ -0,0 +1,543 @@ +--- +title: Dashboard Services +description: Create and manage connections to BI and dashboard platforms +sidebarTitle: Dashboard Services +mode: "wide" +--- + +A **Dashboard Service** represents a connection to a business intelligence platform like Tableau, Looker, Superset, or Metabase. + + +Entity schema follows the [OpenMetadata Standard](https://openmetadatastandards.org/metadata-specifications/entities/dashboard-service/). + + +## Entity Hierarchy + +Dashboard Services are the top-level entity for BI assets: + +``` +DashboardService (this page) +└── Dashboard + └── Chart +``` + +## Inheritance + +When you set an **owner** or **domain** on a Dashboard Service, it is inherited by all child dashboards and charts. + +--- + +## API Endpoints + +| Method | Endpoint | Description | +|--------|----------|-------------| +| `PUT` | `/v1/services/dashboardServices` | Create or update a service | +| `GET` | `/v1/services/dashboardServices` | List services | +| `GET` | `/v1/services/dashboardServices/{id}` | Get by ID | +| `GET` | `/v1/services/dashboardServices/name/{name}` | Get by name | +| `PATCH` | `/v1/services/dashboardServices/{id}` | Update a service | +| `DELETE` | `/v1/services/dashboardServices/{id}` | Delete a service | + +--- + +## The Dashboard Service Object + + + Unique identifier for the dashboard service. + + + + Name of the dashboard service. Must be unique. + + + + Fully qualified name. For services, this equals the name. + + + + Human-readable display name for the service. 
+ + + + Type of dashboard service. One of: `Tableau`, `Looker`, `Superset`, `Metabase`, `PowerBI`, `QuickSight`, `Mode`, `Redash`, `Domo`, `Lightdash`, `QlikSense`. + + + + Description in Markdown format. + + + + Connection configuration specific to the service type. + + + + Owners of the service (users or teams). Inherited by all child entities. + + + + Domain this service belongs to. Inherited by all child entities. + + + + Tags and classifications applied to this service. + + + + Entity version number, incremented on updates. + + + + Whether the service has been soft-deleted. + + +```json Example Response +{ + "id": "550e8400-e29b-41d4-a716-446655440000", + "name": "tableau_prod", + "fullyQualifiedName": "tableau_prod", + "displayName": "Tableau Production", + "serviceType": "Tableau", + "description": "Production Tableau Server for enterprise analytics", + "connection": { + "config": { + "type": "Tableau", + "hostPort": "https://tableau.company.com", + "siteName": "default", + "apiVersion": "3.15" + } + }, + "owners": [ + { + "id": "team-uuid", + "type": "team", + "name": "bi-team" + } + ], + "version": 0.1, + "deleted": false +} +``` + +--- + +## Create a Dashboard Service + + + +```python +from metadata.sdk import configure +from metadata.sdk.entities import DashboardService +from metadata.generated.schema.api.services.createDashboardService import CreateDashboardServiceRequest +from metadata.generated.schema.entity.services.dashboardService import ( + DashboardServiceType, + DashboardConnection, +) +from metadata.generated.schema.entity.services.connections.dashboard.tableauConnection import TableauConnection + +# Configure the SDK +configure( + host="https://your-company.getcollate.io/api", + jwt_token="your-jwt-token" +) + +# Create the service +request = CreateDashboardServiceRequest( + name="tableau_prod", + displayName="Tableau Production", + serviceType=DashboardServiceType.Tableau, + connection=DashboardConnection( + config=TableauConnection( + 
hostPort="https://tableau.company.com", + siteName="default", + apiVersion="3.15" + ) + ), + description="Production Tableau Server for enterprise analytics" +) + +service = DashboardService.create(request) +print(f"Created: {service.fullyQualifiedName}") +``` + + + +```java +import org.openmetadata.sdk.OpenMetadataClient; +import org.openmetadata.sdk.config.OpenMetadataConfig; +import org.openmetadata.sdk.entities.DashboardServices; +import org.openmetadata.schema.api.services.CreateDashboardServiceRequest; +import org.openmetadata.schema.entity.services.DashboardService; +import org.openmetadata.schema.entity.services.DashboardServiceType; + +// Initialize client +OpenMetadataConfig config = OpenMetadataConfig.builder() + .serverUrl("https://your-company.getcollate.io/api") + .accessToken("your-token") + .build(); +OpenMetadataClient client = new OpenMetadataClient(config); +DashboardServices.setDefaultClient(client); + +// Create the service +CreateDashboardServiceRequest request = new CreateDashboardServiceRequest() + .withName("tableau_prod") + .withDisplayName("Tableau Production") + .withServiceType(DashboardServiceType.Tableau) + .withDescription("Production Tableau Server for enterprise analytics"); + +DashboardService service = DashboardServices.create(request); +System.out.println("Created: " + service.getFullyQualifiedName()); +``` + + + +```bash +curl -X PUT "https://your-company.getcollate.io/api/v1/services/dashboardServices" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json" \ + -d '{ + "name": "tableau_prod", + "displayName": "Tableau Production", + "serviceType": "Tableau", + "connection": { + "config": { + "type": "Tableau", + "hostPort": "https://tableau.company.com", + "siteName": "default", + "apiVersion": "3.15" + } + }, + "description": "Production Tableau Server for enterprise analytics" + }' +``` + + + +--- + +## Retrieve a Dashboard Service + + + +```python +from metadata.sdk import configure +from 
metadata.sdk.entities import DashboardService + +configure( + host="https://your-company.getcollate.io/api", + jwt_token="your-jwt-token" +) + +# By name +service = DashboardService.retrieve_by_name( + "tableau_prod", + fields=["owners", "tags", "domain"] +) + +# By ID +service = DashboardService.retrieve("550e8400-e29b-41d4-a716-446655440000") + +print(f"Name: {service.name}") +print(f"Type: {service.serviceType}") +``` + + + +```java +import java.util.List; +import org.openmetadata.schema.entity.services.DashboardService; +import org.openmetadata.sdk.entities.DashboardServices; + +// By name +DashboardService service = DashboardServices.retrieveByName( + "tableau_prod", + List.of("owners", "tags", "domain") +); + +// By ID +DashboardService serviceById = DashboardServices.retrieve("550e8400-e29b-41d4-a716-446655440000"); + +System.out.println("Name: " + service.getName()); +System.out.println("Type: " + service.getServiceType()); +``` + + + +```bash +# By name +curl "https://your-company.getcollate.io/api/v1/services/dashboardServices/name/tableau_prod?fields=owners,tags,domain" \ + -H "Authorization: Bearer $TOKEN" + +# By ID +curl "https://your-company.getcollate.io/api/v1/services/dashboardServices/550e8400-e29b-41d4-a716-446655440000" \ + -H "Authorization: Bearer $TOKEN" +``` + + + +--- + +## List Dashboard Services + + + +```python +from metadata.sdk import configure +from metadata.sdk.entities import DashboardService + +configure( + host="https://your-company.getcollate.io/api", + jwt_token="your-jwt-token" +) + +# List all services with auto-pagination +for service in DashboardService.list().auto_paging_iterable(): + print(f"{service.name}: {service.serviceType}") +``` + + + +```java +import org.openmetadata.sdk.entities.DashboardServices; + +// List with auto-pagination +for (DashboardService service : DashboardServices.list().autoPagingIterable()) { + System.out.println(service.getName() + ": " + service.getServiceType()); +} +``` + + + +```bash +curl
"https://your-company.getcollate.io/api/v1/services/dashboardServices?limit=50&fields=owners,tags" \ + -H "Authorization: Bearer $TOKEN" +``` + + + +--- + +## Update a Dashboard Service + +### Update Description + + + +```python +from metadata.sdk import configure +from metadata.sdk.entities import DashboardService + +configure( + host="https://your-company.getcollate.io/api", + jwt_token="your-jwt-token" +) + +service = DashboardService.retrieve_by_name("tableau_prod") +service.description = "Enterprise BI platform for executive reporting" +updated = DashboardService.update(service.id, service) +``` + + + +```java +import org.openmetadata.sdk.entities.DashboardServices; + +DashboardService service = DashboardServices.retrieveByName("tableau_prod"); +service.setDescription("Enterprise BI platform for executive reporting"); +DashboardService updated = DashboardServices.update(service.getId(), service); +``` + + + +```bash +curl -X PATCH "https://your-company.getcollate.io/api/v1/services/dashboardServices/{id}" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json-patch+json" \ + -d '[ + {"op": "replace", "path": "/description", "value": "Enterprise BI platform"} + ]' +``` + + + +### Set Owner + + + +```python +from metadata.sdk import configure +from metadata.sdk.entities import DashboardService, Team + +configure( + host="https://your-company.getcollate.io/api", + jwt_token="your-jwt-token" +) + +team = Team.retrieve_by_name("bi-team") +service = DashboardService.retrieve_by_name("tableau_prod") +service.owners = [{"id": str(team.id), "type": "team"}] +updated = DashboardService.update(service.id, service) +``` + + + +```java +import org.openmetadata.sdk.entities.DashboardServices; +import org.openmetadata.sdk.entities.Teams; +import org.openmetadata.schema.type.EntityReference; + +Team team = Teams.retrieveByName("bi-team"); +DashboardService service = DashboardServices.retrieveByName("tableau_prod"); +service.setOwners(List.of( + new 
EntityReference() + .withId(team.getId()) + .withType("team") +)); +DashboardService updated = DashboardServices.update(service.getId(), service); +``` + + + +```bash +curl -X PATCH "https://your-company.getcollate.io/api/v1/services/dashboardServices/{id}" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json-patch+json" \ + -d '[ + {"op": "add", "path": "/owners", "value": [{"id": "team-uuid", "type": "team"}]} + ]' +``` + + + +--- + +## Delete a Dashboard Service + + + +```python +from metadata.sdk import configure +from metadata.sdk.entities import DashboardService + +configure( + host="https://your-company.getcollate.io/api", + jwt_token="your-jwt-token" +) + +# Soft delete (can be restored) +DashboardService.delete("550e8400-e29b-41d4-a716-446655440000") + +# Hard delete with all dashboards +DashboardService.delete( + "550e8400-e29b-41d4-a716-446655440000", + recursive=True, + hard_delete=True +) +``` + + + +```java +import org.openmetadata.sdk.entities.DashboardServices; + +// Soft delete +DashboardServices.delete("550e8400-e29b-41d4-a716-446655440000"); + +// Hard delete with all dashboards +DashboardServices.delete("550e8400-e29b-41d4-a716-446655440000", true, true); +``` + + + +```bash +# Soft delete +curl -X DELETE "https://your-company.getcollate.io/api/v1/services/dashboardServices/{id}" \ + -H "Authorization: Bearer $TOKEN" + +# Hard delete with all dashboards +curl -X DELETE "https://your-company.getcollate.io/api/v1/services/dashboardServices/{id}?hardDelete=true&recursive=true" \ + -H "Authorization: Bearer $TOKEN" +``` + + + +--- + +## Connection Configurations + +### Tableau + +```json +{ + "type": "Tableau", + "hostPort": "https://tableau.company.com", + "siteName": "default", + "username": "api_user", + "password": "secret", + "apiVersion": "3.15" +} +``` + +### Looker + +```json +{ + "type": "Looker", + "hostPort": "https://company.looker.com", + "clientId": "client_id", + "clientSecret": "client_secret" +} +``` + +### 
Superset + +```json +{ + "type": "Superset", + "hostPort": "https://superset.company.com", + "connection": { + "username": "admin", + "password": "secret", + "provider": "db" + } +} +``` + +### PowerBI + +```json +{ + "type": "PowerBI", + "clientId": "app-client-id", + "clientSecret": "app-secret", + "tenantId": "tenant-id" +} +``` + +--- + +## Service Types + +| Type | Description | +|------|-------------| +| `Tableau` | Tableau Server/Online | +| `Looker` | Looker/LookML | +| `Superset` | Apache Superset | +| `Metabase` | Metabase | +| `PowerBI` | Microsoft Power BI | +| `QuickSight` | Amazon QuickSight | +| `Mode` | Mode Analytics | +| `Redash` | Redash | +| `Domo` | Domo | +| `Lightdash` | Lightdash | +| `QlikSense` | Qlik Sense | + +--- + +## Related + + + + Create and manage dashboards + + + Learn about entity relationships + + diff --git a/api-reference/services/database/index.mdx b/api-reference/services/database/index.mdx new file mode 100644 index 00000000..33aee2de --- /dev/null +++ b/api-reference/services/database/index.mdx @@ -0,0 +1,723 @@ +--- +title: Database Services +description: Create and manage connections to your databases and data warehouses +sidebarTitle: Database Services +mode: "wide" +--- + +A **Database Service** represents a connection to a database, data warehouse, or data lake platform like Snowflake, PostgreSQL, BigQuery, or Databricks. + + +Entity schema follows the [OpenMetadata Standard](https://openmetadatastandards.org/metadata-specifications/entities/database-service/). + + +## What is a Database Service? + +A Database Service is the **top-level entity** in the database hierarchy. 
It contains: + +- Connection configuration to your database platform +- All databases, schemas, tables, and stored procedures discovered from that platform +- Metadata (owner, tags, domain) that can be inherited by child entities + +``` +DatabaseService (this page) +└── Database + └── DatabaseSchema + ├── Table + │ └── Column + └── StoredProcedure +``` + +## Inheritance + +When you set an **owner** or **domain** on a Database Service, it is inherited by all child entities: + +- Setting an owner on `snowflake_prod` makes that owner responsible for all databases, schemas, and tables under it +- Assigning a domain to `snowflake_prod` groups all child entities under that domain + +This allows you to manage metadata at scale without updating each entity individually. + +--- + +## API Endpoints + +| Method | Endpoint | Description | +|--------|----------|-------------| +| `PUT` | `/v1/services/databaseServices` | Create or update a service | +| `GET` | `/v1/services/databaseServices` | List services | +| `GET` | `/v1/services/databaseServices/{id}` | Get by ID | +| `GET` | `/v1/services/databaseServices/name/{name}` | Get by name | +| `PATCH` | `/v1/services/databaseServices/{id}` | Update a service | +| `DELETE` | `/v1/services/databaseServices/{id}` | Delete a service | + +--- + +## The Database Service Object + + + Unique identifier for the database service. + + + + Name of the database service. Must be unique across all database services. + + + + Fully qualified name. For services, this equals the name. + + + + Human-readable display name for the service. + + + + Type of database. One of: `Snowflake`, `BigQuery`, `Redshift`, `Postgres`, `MySQL`, `Databricks`, `Athena`, and [many more](https://openmetadatastandards.org/metadata-specifications/entities/database-service/). + + + + Description in Markdown format. + + + + Connection configuration specific to the service type. + + + + Owners of the service (users or teams). Inherited by all child entities. 
+ + + + Domain this service belongs to. Inherited by all child entities. + + + + Tags and classifications applied to this service. + + + + Entity version number, incremented on updates. + + + + Whether the service has been soft-deleted. + + +```json Example Response +{ + "id": "550e8400-e29b-41d4-a716-446655440000", + "name": "snowflake_prod", + "fullyQualifiedName": "snowflake_prod", + "displayName": "Snowflake Production", + "serviceType": "Snowflake", + "description": "Production Snowflake data warehouse", + "connection": { + "config": { + "type": "Snowflake", + "username": "admin", + "account": "company.us-east-1", + "warehouse": "COMPUTE_WH", + "database": "ANALYTICS" + } + }, + "owners": [ + { + "id": "team-uuid", + "type": "team", + "name": "data-platform" + } + ], + "version": 0.1, + "deleted": false +} +``` + +--- + +## Create a Database Service + + + +```python +from metadata.sdk import configure +from metadata.sdk.entities import DatabaseService +from metadata.generated.schema.api.services.createDatabaseService import CreateDatabaseServiceRequest +from metadata.generated.schema.entity.services.databaseService import ( + DatabaseServiceType, + DatabaseConnection, +) +from metadata.generated.schema.entity.services.connections.database.snowflakeConnection import SnowflakeConnection + +# Configure the SDK +configure( + host="https://your-company.getcollate.io/api", + jwt_token="your-jwt-token" +) + +# Create the service +request = CreateDatabaseServiceRequest( + name="snowflake_prod", + displayName="Snowflake Production", + serviceType=DatabaseServiceType.Snowflake, + connection=DatabaseConnection( + config=SnowflakeConnection( + username="admin", + account="company.us-east-1", + warehouse="COMPUTE_WH" + ) + ), + description="Production Snowflake data warehouse" +) + +service = DatabaseService.create(request) +print(f"Created: {service.fullyQualifiedName}") +``` + + + +```java +import org.openmetadata.sdk.OpenMetadataClient; +import 
org.openmetadata.sdk.config.OpenMetadataConfig; +import org.openmetadata.sdk.entities.DatabaseServices; +import org.openmetadata.schema.api.services.CreateDatabaseServiceRequest; +import org.openmetadata.schema.entity.services.DatabaseService; +import org.openmetadata.schema.entity.services.DatabaseServiceType; + +// Initialize client +OpenMetadataConfig config = OpenMetadataConfig.builder() + .serverUrl("https://your-company.getcollate.io/api") + .accessToken("your-token") + .build(); +OpenMetadataClient client = new OpenMetadataClient(config); +DatabaseServices.setDefaultClient(client); + +// Create the service +CreateDatabaseServiceRequest request = new CreateDatabaseServiceRequest() + .withName("snowflake_prod") + .withDisplayName("Snowflake Production") + .withServiceType(DatabaseServiceType.Snowflake) + .withDescription("Production Snowflake data warehouse"); + +DatabaseService service = DatabaseServices.create(request); +System.out.println("Created: " + service.getFullyQualifiedName()); +``` + + + +```bash +curl -X PUT "https://your-company.getcollate.io/api/v1/services/databaseServices" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json" \ + -d '{ + "name": "snowflake_prod", + "displayName": "Snowflake Production", + "serviceType": "Snowflake", + "connection": { + "config": { + "type": "Snowflake", + "username": "admin", + "account": "company.us-east-1", + "warehouse": "COMPUTE_WH" + } + }, + "description": "Production Snowflake data warehouse" + }' +``` + + + +--- + +## Retrieve a Database Service + + + +```python +from metadata.sdk import configure +from metadata.sdk.entities import DatabaseService + +configure( + host="https://your-company.getcollate.io/api", + jwt_token="your-jwt-token" +) + +# By name +service = DatabaseService.retrieve_by_name( + "snowflake_prod", + fields=["owners", "tags", "domain"] +) + +# By ID +service = DatabaseService.retrieve("550e8400-e29b-41d4-a716-446655440000") + +print(f"Name: {service.name}") 
+print(f"Type: {service.serviceType}") +``` + + + +```java +import java.util.List; +import org.openmetadata.schema.entity.services.DatabaseService; +import org.openmetadata.sdk.entities.DatabaseServices; + +// By name +DatabaseService service = DatabaseServices.retrieveByName( + "snowflake_prod", + List.of("owners", "tags", "domain") +); + +// By ID +DatabaseService serviceById = DatabaseServices.retrieve("550e8400-e29b-41d4-a716-446655440000"); + +System.out.println("Name: " + service.getName()); +System.out.println("Type: " + service.getServiceType()); +``` + + + +```bash +# By name +curl "https://your-company.getcollate.io/api/v1/services/databaseServices/name/snowflake_prod?fields=owners,tags,domain" \ + -H "Authorization: Bearer $TOKEN" + +# By ID +curl "https://your-company.getcollate.io/api/v1/services/databaseServices/550e8400-e29b-41d4-a716-446655440000" \ + -H "Authorization: Bearer $TOKEN" +``` + + + +--- + +## List Database Services + + + +```python +from metadata.sdk import configure +from metadata.sdk.entities import DatabaseService +from metadata.sdk.entities.database_service import DatabaseServiceListParams + +configure( + host="https://your-company.getcollate.io/api", + jwt_token="your-jwt-token" +) + +# List all services with auto-pagination +for service in DatabaseService.list().auto_paging_iterable(): + print(f"{service.name}: {service.serviceType}") + +# List with parameters +params = DatabaseServiceListParams.builder() \ + .limit(50) \ + .fields(["owners", "tags"]) \ + .build() + +services = DatabaseService.list(params) +for service in services.get_data(): + print(service.name) +``` + + + +```java +import org.openmetadata.sdk.entities.DatabaseServices; +import org.openmetadata.sdk.entities.DatabaseServiceListParams; + +// List with auto-pagination +for (DatabaseService service : DatabaseServices.list().autoPagingIterable()) { + System.out.println(service.getName() + ": " + service.getServiceType()); +} + +// List with parameters +DatabaseServiceListParams params = DatabaseServiceListParams.builder() + .limit(50) +
.fields(List.of("owners", "tags")) + .build(); + +DatabaseServiceCollection services = DatabaseServices.list(params); +for (DatabaseService service : services.getData()) { + System.out.println(service.getName()); +} +``` + + + +```bash +curl "https://your-company.getcollate.io/api/v1/services/databaseServices?limit=50&fields=owners,tags" \ + -H "Authorization: Bearer $TOKEN" +``` + + + +### Parameters + +| Parameter | Type | Description | +|-----------|------|-------------| +| `limit` | integer | Maximum results per page (default: 10, max: 1000000) | +| `before` | string | Cursor for backward pagination | +| `after` | string | Cursor for forward pagination | +| `fields` | string | Comma-separated fields to include | +| `include` | string | Include `all`, `deleted`, or `non-deleted` (default) | + +--- + +## Update a Database Service + +### Update Description + + + +```python +from metadata.sdk import configure +from metadata.sdk.entities import DatabaseService + +configure( + host="https://your-company.getcollate.io/api", + jwt_token="your-jwt-token" +) + +service = DatabaseService.retrieve_by_name("snowflake_prod") +service.description = "Enterprise Snowflake data warehouse for analytics" +updated = DatabaseService.update(service.id, service) +``` + + + +```java +import org.openmetadata.sdk.entities.DatabaseServices; + +DatabaseService service = DatabaseServices.retrieveByName("snowflake_prod"); +service.setDescription("Enterprise Snowflake data warehouse for analytics"); +DatabaseService updated = DatabaseServices.update(service.getId(), service); +``` + + + +```bash +curl -X PATCH "https://your-company.getcollate.io/api/v1/services/databaseServices/{id}" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json-patch+json" \ + -d '[ + {"op": "replace", "path": "/description", "value": "Enterprise Snowflake data warehouse"} + ]' +``` + + + +### Set Owner + +Setting an owner on a service makes them responsible for **all child entities** 
(databases, schemas, tables, stored procedures). + + + +```python +from metadata.sdk import configure +from metadata.sdk.entities import DatabaseService, Team + +configure( + host="https://your-company.getcollate.io/api", + jwt_token="your-jwt-token" +) + +# Get the team to set as owner +team = Team.retrieve_by_name("data-platform") + +# Update the service owner +service = DatabaseService.retrieve_by_name("snowflake_prod") +service.owners = [{"id": str(team.id), "type": "team"}] +updated = DatabaseService.update(service.id, service) +``` + + + +```java +import org.openmetadata.sdk.entities.DatabaseServices; +import org.openmetadata.sdk.entities.Teams; +import org.openmetadata.schema.type.EntityReference; + +// Get the team +Team team = Teams.retrieveByName("data-platform"); + +// Update the service owner +DatabaseService service = DatabaseServices.retrieveByName("snowflake_prod"); +service.setOwners(List.of( + new EntityReference() + .withId(team.getId()) + .withType("team") +)); +DatabaseService updated = DatabaseServices.update(service.getId(), service); +``` + + + +```bash +curl -X PATCH "https://your-company.getcollate.io/api/v1/services/databaseServices/{id}" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json-patch+json" \ + -d '[ + {"op": "add", "path": "/owners", "value": [{"id": "team-uuid", "type": "team"}]} + ]' +``` + + + +### Set Domain + +Setting a domain on a service groups **all child entities** under that domain. 
+ + + +```python +from metadata.sdk import configure +from metadata.sdk.entities import DatabaseService, Domain + +configure( + host="https://your-company.getcollate.io/api", + jwt_token="your-jwt-token" +) + +# Get the domain +domain = Domain.retrieve_by_name("Sales") + +# Update the service domain +service = DatabaseService.retrieve_by_name("snowflake_prod") +service.domain = {"id": str(domain.id), "type": "domain"} +updated = DatabaseService.update(service.id, service) +``` + + + +```java +import org.openmetadata.sdk.entities.DatabaseServices; +import org.openmetadata.sdk.entities.Domains; +import org.openmetadata.schema.type.EntityReference; + +// Get the domain +Domain domain = Domains.retrieveByName("Sales"); + +// Update the service domain +DatabaseService service = DatabaseServices.retrieveByName("snowflake_prod"); +service.setDomain( + new EntityReference() + .withId(domain.getId()) + .withType("domain") +); +DatabaseService updated = DatabaseServices.update(service.getId(), service); +``` + + + +```bash +curl -X PATCH "https://your-company.getcollate.io/api/v1/services/databaseServices/{id}" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json-patch+json" \ + -d '[ + {"op": "add", "path": "/domain", "value": {"id": "domain-uuid", "type": "domain"}} + ]' +``` + + + +### Add Tags + + + +```python +from metadata.sdk import configure +from metadata.sdk.entities import DatabaseService + +configure( + host="https://your-company.getcollate.io/api", + jwt_token="your-jwt-token" +) + +service = DatabaseService.retrieve_by_name("snowflake_prod", fields=["tags"]) +service.tags = [ + {"tagFQN": "Tier.Tier1", "labelType": "Manual", "state": "Confirmed"}, + {"tagFQN": "PII.Sensitive", "labelType": "Manual", "state": "Confirmed"} +] +updated = DatabaseService.update(service.id, service) +``` + + + +```java +import org.openmetadata.sdk.entities.DatabaseServices; +import org.openmetadata.schema.type.TagLabel; + +DatabaseService service = 
+DatabaseServices.retrieveByName("snowflake_prod", List.of("tags")); +service.setTags(List.of( + new TagLabel() + .withTagFQN("Tier.Tier1") + .withLabelType(TagLabel.LabelType.Manual) + .withState(TagLabel.State.Confirmed), + new TagLabel() + .withTagFQN("PII.Sensitive") + .withLabelType(TagLabel.LabelType.Manual) + .withState(TagLabel.State.Confirmed) +)); +DatabaseService updated = DatabaseServices.update(service.getId(), service); +``` + + + +```bash +curl -X PATCH "https://your-company.getcollate.io/api/v1/services/databaseServices/{id}" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json-patch+json" \ + -d '[ + {"op": "add", "path": "/tags/-", "value": {"tagFQN": "Tier.Tier1", "labelType": "Manual", "state": "Confirmed"}} + ]' +``` + + + +--- + +## Delete a Database Service + + +Deleting a Database Service with `recursive=true` deletes **all** databases, schemas, tables, and stored procedures under it. A soft delete can be restored, but a hard delete (`hardDelete=true`) is permanent and cannot be undone. + + + + +```python +from metadata.sdk import configure +from metadata.sdk.entities import DatabaseService + +configure( + host="https://your-company.getcollate.io/api", + jwt_token="your-jwt-token" +) + +# Soft delete (can be restored) +DatabaseService.delete("550e8400-e29b-41d4-a716-446655440000") + +# Hard delete with all child entities +DatabaseService.delete( + "550e8400-e29b-41d4-a716-446655440000", + recursive=True, + hard_delete=True +) +``` + + + +```java +import org.openmetadata.sdk.entities.DatabaseServices; + +// Soft delete +DatabaseServices.delete("550e8400-e29b-41d4-a716-446655440000"); + +// Hard delete with all child entities +DatabaseServices.delete( + "550e8400-e29b-41d4-a716-446655440000", + true, // recursive + true // hardDelete +); +``` + + + +```bash +# Soft delete +curl -X DELETE "https://your-company.getcollate.io/api/v1/services/databaseServices/{id}" \ + -H "Authorization: Bearer $TOKEN" + +# Hard delete with all child entities +curl -X DELETE
"https://your-company.getcollate.io/api/v1/services/databaseServices/{id}?hardDelete=true&recursive=true" \ + -H "Authorization: Bearer $TOKEN" +``` + + + +--- + +## Connection Configurations + +### Snowflake + +```json +{ + "type": "Snowflake", + "username": "admin", + "account": "company.us-east-1", + "warehouse": "COMPUTE_WH", + "database": "ANALYTICS", + "role": "ANALYTICS_ROLE" +} +``` + +### BigQuery + +```json +{ + "type": "BigQuery", + "credentials": { + "gcpConfig": { + "type": "service_account", + "projectId": "my-project", + "privateKeyId": "key-id", + "privateKey": "-----BEGIN PRIVATE KEY-----...", + "clientEmail": "service@project.iam.gserviceaccount.com" + } + } +} +``` + +### PostgreSQL + +```json +{ + "type": "Postgres", + "username": "postgres_user", + "authType": {"password": "secret"}, + "hostPort": "postgres.example.com:5432", + "database": "analytics" +} +``` + +### MySQL + +```json +{ + "type": "Mysql", + "username": "mysql_user", + "authType": {"password": "secret"}, + "hostPort": "mysql.example.com:3306", + "databaseSchema": "production" +} +``` + +--- + +## Service Types + +| Type | Description | +|------|-------------| +| `Snowflake` | Snowflake Data Cloud | +| `BigQuery` | Google BigQuery | +| `Redshift` | Amazon Redshift | +| `Postgres` | PostgreSQL | +| `Mysql` | MySQL | +| `Databricks` | Databricks Lakehouse | +| `Athena` | Amazon Athena | +| `Trino` | Trino (formerly PrestoSQL) | +| `Hive` | Apache Hive | +| `Mssql` | Microsoft SQL Server | +| `Oracle` | Oracle Database | + +See [all supported database types](https://openmetadatastandards.org/metadata-specifications/entities/database-service/) for the complete list.
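Each service returned by the list endpoint carries one of these strings in its `serviceType` field. The sketch below is a minimal, standard-library-only illustration of tallying services by type while following the `after` cursor described in the Parameters table. It assumes the list response has the shape `{"data": [...], "paging": {"after": ...}}`, as in typical OpenMetadata list responses; check this against your server. `fetch_page`, `iter_services`, and `count_by_type` are hypothetical helper names, not part of any SDK.

```python
import json
import urllib.parse
import urllib.request
from collections import Counter
from typing import Callable, Iterable, Iterator, Optional

# Hypothetical host; replace with your own Collate URL.
BASE_URL = "https://your-company.getcollate.io/api"


def fetch_page(token: str, limit: int = 50, after: Optional[str] = None) -> dict:
    """GET one page of database services; `after` is the forward-pagination cursor."""
    query = {"limit": limit}
    if after:
        query["after"] = after
    url = f"{BASE_URL}/v1/services/databaseServices?{urllib.parse.urlencode(query)}"
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def iter_services(get_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every service, following paging.after until no cursor is returned."""
    after = None
    while True:
        page = get_page(after)
        yield from page.get("data", [])
        after = page.get("paging", {}).get("after")
        if not after:
            break


def count_by_type(services: Iterable[dict]) -> Counter:
    """Tally services by their serviceType string (for example "Snowflake" or "Mysql")."""
    return Counter(s["serviceType"] for s in services)


# Usage against a live server (assumes TOKEN holds a valid JWT):
# counts = count_by_type(iter_services(lambda after: fetch_page(TOKEN, after=after)))
```

Passing the page fetcher in as a callable keeps the cursor-following logic independent of the HTTP client, so it can be exercised with canned pages or swapped for the SDK's own list call.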
+ +--- + +## Related + + + + Manage databases within this service + + + Manage tables within schemas + + diff --git a/api-reference/services/index.mdx b/api-reference/services/index.mdx new file mode 100644 index 00000000..45b2d1a2 --- /dev/null +++ b/api-reference/services/index.mdx @@ -0,0 +1,139 @@ +--- +title: Services +description: Services are the top-level entities that represent connections to your data platforms +sidebarTitle: Overview +mode: "wide" +--- + +# Services + +In Collate, a **Service** represents a connection to an external data platform. Services are the top-level entities in the metadata hierarchy and define how Collate connects to and extracts metadata from your data infrastructure. + +## What is a Service? + +A Service contains: +- **Connection configuration** - Credentials, hostnames, and settings to connect to your data platform +- **Service type** - The type of platform (Snowflake, PostgreSQL, Tableau, etc.) +- **Metadata** - Owner, tags, domain, and description that apply to all child assets + +## Service Types + +| Service Type | Description | Child Entities | +|--------------|-------------|----------------| +| **Database Service** | Relational databases, data warehouses, and data lakes | Database → Schema → Table, Stored Procedure | +| **Dashboard Service** | BI and visualization platforms | Dashboard, Chart, Data Model | +| **Pipeline Service** | Orchestration and ETL tools | Pipeline → Task | +| **Messaging Service** | Streaming and messaging platforms | Topic | +| **ML Model Service** | Machine learning platforms | ML Model | +| **Storage Service** | Object storage platforms | Container | + +## Entity Hierarchy + +Services sit at the top of the entity hierarchy. All metadata flows down from services: + +``` +Service (connection to platform) +└── Database / Dashboard / Pipeline / Topic / etc. + └── Schema / Chart / Task / etc. + └── Table / etc. 
+ └── Column +``` + +## Inheritance + +When you set metadata on a Service, it can be **inherited** by all child entities: + +- **Owner** - Setting an owner on a Database Service makes that owner responsible for all databases, schemas, tables, and stored procedures under it +- **Domain** - Assigning a domain to a Service groups all child assets under that domain +- **Tags** - Tags applied to a Service can propagate to child entities + +This inheritance model allows you to efficiently manage metadata at scale. + +## Service Endpoints + +All services follow a consistent REST API pattern: + +| Method | Endpoint | Description | +|--------|----------|-------------| +| `PUT` | `/v1/services/{serviceType}` | Create or update a service | +| `GET` | `/v1/services/{serviceType}` | List services | +| `GET` | `/v1/services/{serviceType}/{id}` | Get service by ID | +| `GET` | `/v1/services/{serviceType}/name/{name}` | Get service by name | +| `PATCH` | `/v1/services/{serviceType}/{id}` | Partially update a service | +| `DELETE` | `/v1/services/{serviceType}/{id}` | Delete a service | + +Where `{serviceType}` is one of: +- `databaseServices` +- `dashboardServices` +- `pipelineServices` +- `messagingServices` +- `mlmodelServices` +- `storageServices` + +## Quick Start + + + +```python +from metadata.sdk import configure +from metadata.sdk.entities import DatabaseService + +# Configure the SDK +configure( + host="https://your-company.getcollate.io/api", + jwt_token="your-jwt-token" +) + +# List all database services +for service in DatabaseService.list().auto_paging_iterable(): + print(f"{service.name}: {service.serviceType}") +``` + + + +```java +import org.openmetadata.sdk.OpenMetadataClient; +import org.openmetadata.sdk.config.OpenMetadataConfig; +import org.openmetadata.sdk.entities.DatabaseServices; + +// Initialize client +OpenMetadataConfig config = OpenMetadataConfig.builder() + .serverUrl("https://your-company.getcollate.io/api") + .accessToken("your-token") + .build(); 
+OpenMetadataClient client = new OpenMetadataClient(config); +DatabaseServices.setDefaultClient(client); + +// List all database services +for (DatabaseService service : DatabaseServices.list().autoPagingIterable()) { + System.out.println(service.getName() + ": " + service.getServiceType()); +} +``` + + + +```bash +curl "https://your-company.getcollate.io/api/v1/services/databaseServices" \ + -H "Authorization: Bearer $TOKEN" +``` + + + +--- + +## Available Services + + + + Snowflake, PostgreSQL, MySQL, BigQuery, and more + + + Tableau, Looker, Power BI, Metabase, and more + + + Airflow, Dagster, dbt, Fivetran, and more + + + Kafka, Pulsar, Kinesis, and more + + diff --git a/api-reference/services/messaging/index.mdx b/api-reference/services/messaging/index.mdx new file mode 100644 index 00000000..97f1949b --- /dev/null +++ b/api-reference/services/messaging/index.mdx @@ -0,0 +1,517 @@ +--- +title: Messaging Services +description: Create and manage connections to messaging and streaming platforms +sidebarTitle: Messaging Services +mode: "wide" +--- + +A **Messaging Service** represents a connection to a streaming or messaging platform like Kafka, Pulsar, or Kinesis. + + +Entity schema follows the [OpenMetadata Standard](https://openmetadatastandards.org/metadata-specifications/entities/messaging-service/). + + +## Entity Hierarchy + +Messaging Services are the top-level entity for streaming assets: + +``` +MessagingService (this page) +└── Topic +``` + +## Inheritance + +When you set an **owner** or **domain** on a Messaging Service, it is inherited by all child topics. 
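One way to confirm this inheritance is to read a topic back and check where each owner came from. This is a minimal sketch, assuming the topic endpoint `/v1/topics/name/{fqn}` and that owner references copied from the parent service carry `"inherited": true`; verify both against your Collate version. `get_topic` and `split_owners` are hypothetical helper names, and `kafka_prod.orders` is an illustrative topic FQN.

```python
import json
import urllib.parse
import urllib.request
from typing import List, Tuple

# Hypothetical host; replace with your own Collate URL.
BASE_URL = "https://your-company.getcollate.io/api"


def get_topic(token: str, fqn: str) -> dict:
    """Fetch a topic by fully qualified name (e.g. "kafka_prod.orders"), including owners."""
    path = urllib.parse.quote(fqn, safe="")
    url = f"{BASE_URL}/v1/topics/name/{path}?fields=owners"
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def split_owners(topic: dict) -> Tuple[List[str], List[str]]:
    """Return (inherited, direct) owner names, based on each reference's `inherited` flag."""
    inherited: List[str] = []
    direct: List[str] = []
    for ref in topic.get("owners") or []:
        target = inherited if ref.get("inherited") else direct
        target.append(ref.get("name"))
    return inherited, direct


# Usage against a live server (assumes TOKEN holds a valid JWT):
# inherited, direct = split_owners(get_topic(TOKEN, "kafka_prod.orders"))
```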
+ +--- + +## API Endpoints + +| Method | Endpoint | Description | +|--------|----------|-------------| +| `PUT` | `/v1/services/messagingServices` | Create or update a service | +| `GET` | `/v1/services/messagingServices` | List services | +| `GET` | `/v1/services/messagingServices/{id}` | Get by ID | +| `GET` | `/v1/services/messagingServices/name/{name}` | Get by name | +| `PATCH` | `/v1/services/messagingServices/{id}` | Update a service | +| `DELETE` | `/v1/services/messagingServices/{id}` | Delete a service | + +--- + +## The Messaging Service Object + + + Unique identifier for the messaging service. + + + + Name of the messaging service. Must be unique. + + + + Fully qualified name. For services, this equals the name. + + + + Human-readable display name for the service. + + + + Type of messaging service. One of: `Kafka`, `Pulsar`, `Kinesis`, `Redpanda`, `CustomMessaging`. + + + + Description in Markdown format. + + + + Connection configuration specific to the service type. + + + + Owners of the service (users or teams). Inherited by all child entities. + + + + Domain this service belongs to. Inherited by all child entities. + + + + Tags and classifications applied to this service. + + + + Entity version number, incremented on updates. + + + + Whether the service has been soft-deleted. 
+ + +```json Example Response +{ + "id": "550e8400-e29b-41d4-a716-446655440000", + "name": "kafka_prod", + "fullyQualifiedName": "kafka_prod", + "displayName": "Kafka Production", + "serviceType": "Kafka", + "description": "Production Kafka cluster for event streaming", + "connection": { + "config": { + "type": "Kafka", + "bootstrapServers": "kafka1.company.com:9092,kafka2.company.com:9092", + "schemaRegistryURL": "https://schema-registry.company.com" + } + }, + "owners": [ + { + "id": "team-uuid", + "type": "team", + "name": "data-platform" + } + ], + "version": 0.1, + "deleted": false +} +``` + +--- + +## Create a Messaging Service + + + +```python +from metadata.sdk import configure +from metadata.sdk.entities import MessagingService +from metadata.generated.schema.api.services.createMessagingService import CreateMessagingServiceRequest +from metadata.generated.schema.entity.services.messagingService import ( + MessagingServiceType, + MessagingConnection, +) +from metadata.generated.schema.entity.services.connections.messaging.kafkaConnection import KafkaConnection + +# Configure the SDK +configure( + host="https://your-company.getcollate.io/api", + jwt_token="your-jwt-token" +) + +# Create the service +request = CreateMessagingServiceRequest( + name="kafka_prod", + displayName="Kafka Production", + serviceType=MessagingServiceType.Kafka, + connection=MessagingConnection( + config=KafkaConnection( + bootstrapServers="kafka1.company.com:9092,kafka2.company.com:9092", + schemaRegistryURL="https://schema-registry.company.com" + ) + ), + description="Production Kafka cluster for event streaming" +) + +service = MessagingService.create(request) +print(f"Created: {service.fullyQualifiedName}") +``` + + + +```java +import org.openmetadata.sdk.OpenMetadataClient; +import org.openmetadata.sdk.config.OpenMetadataConfig; +import org.openmetadata.sdk.entities.MessagingServices; +import org.openmetadata.schema.api.services.CreateMessagingServiceRequest; +import 
org.openmetadata.schema.entity.services.MessagingService; +import org.openmetadata.schema.entity.services.MessagingServiceType; + +// Initialize client +OpenMetadataConfig config = OpenMetadataConfig.builder() + .serverUrl("https://your-company.getcollate.io/api") + .accessToken("your-token") + .build(); +OpenMetadataClient client = new OpenMetadataClient(config); +MessagingServices.setDefaultClient(client); + +// Create the service +CreateMessagingServiceRequest request = new CreateMessagingServiceRequest() + .withName("kafka_prod") + .withDisplayName("Kafka Production") + .withServiceType(MessagingServiceType.Kafka) + .withDescription("Production Kafka cluster for event streaming"); + +MessagingService service = MessagingServices.create(request); +System.out.println("Created: " + service.getFullyQualifiedName()); +``` + + + +```bash +curl -X PUT "https://your-company.getcollate.io/api/v1/services/messagingServices" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json" \ + -d '{ + "name": "kafka_prod", + "displayName": "Kafka Production", + "serviceType": "Kafka", + "connection": { + "config": { + "type": "Kafka", + "bootstrapServers": "kafka1.company.com:9092,kafka2.company.com:9092", + "schemaRegistryURL": "https://schema-registry.company.com" + } + }, + "description": "Production Kafka cluster for event streaming" + }' +``` + + + +--- + +## Retrieve a Messaging Service + + + +```python +from metadata.sdk import configure +from metadata.sdk.entities import MessagingService + +configure( + host="https://your-company.getcollate.io/api", + jwt_token="your-jwt-token" +) + +# By name +service = MessagingService.retrieve_by_name( + "kafka_prod", + fields=["owners", "tags", "domain"] +) + +# By ID +service = MessagingService.retrieve("550e8400-e29b-41d4-a716-446655440000") + +print(f"Name: {service.name}") +print(f"Type: {service.serviceType}") +``` + + + +```java +import org.openmetadata.sdk.entities.MessagingServices; + +// By name 
+MessagingService service = MessagingServices.retrieveByName( + "kafka_prod", + List.of("owners", "tags", "domain") +); + +// By ID (reuses the variable declared above) +service = MessagingServices.retrieve("550e8400-e29b-41d4-a716-446655440000"); + +System.out.println("Name: " + service.getName()); +System.out.println("Type: " + service.getServiceType()); +``` + + + +```bash +# By name +curl "https://your-company.getcollate.io/api/v1/services/messagingServices/name/kafka_prod?fields=owners,tags,domain" \ + -H "Authorization: Bearer $TOKEN" + +# By ID +curl "https://your-company.getcollate.io/api/v1/services/messagingServices/550e8400-e29b-41d4-a716-446655440000" \ + -H "Authorization: Bearer $TOKEN" +``` + + + +--- + +## List Messaging Services + + + +```python +from metadata.sdk import configure +from metadata.sdk.entities import MessagingService + +configure( + host="https://your-company.getcollate.io/api", + jwt_token="your-jwt-token" +) + +# List all services with auto-pagination +for service in MessagingService.list().auto_paging_iterable(): + print(f"{service.name}: {service.serviceType}") +``` + + + +```java +import org.openmetadata.sdk.entities.MessagingServices; + +// List with auto-pagination +for (MessagingService service : MessagingServices.list().autoPagingIterable()) { + System.out.println(service.getName() + ": " + service.getServiceType()); +} +``` + + + +```bash +curl "https://your-company.getcollate.io/api/v1/services/messagingServices?limit=50&fields=owners,tags" \ + -H "Authorization: Bearer $TOKEN" +``` + + + +--- + +## Update a Messaging Service + +### Update Description + + + +```python +from metadata.sdk import configure +from metadata.sdk.entities import MessagingService + +configure( + host="https://your-company.getcollate.io/api", + jwt_token="your-jwt-token" +) + +service = MessagingService.retrieve_by_name("kafka_prod") +service.description = "Enterprise Kafka cluster for real-time event streaming" +updated = MessagingService.update(service.id, service)
+``` + + + +```java +import org.openmetadata.sdk.entities.MessagingServices; + +MessagingService service = MessagingServices.retrieveByName("kafka_prod"); +service.setDescription("Enterprise Kafka cluster for real-time event streaming"); +MessagingService updated = MessagingServices.update(service.getId(), service); +``` + + + +```bash +curl -X PATCH "https://your-company.getcollate.io/api/v1/services/messagingServices/{id}" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json-patch+json" \ + -d '[ + {"op": "replace", "path": "/description", "value": "Enterprise Kafka cluster"} + ]' +``` + + + +### Set Owner + + + +```python +from metadata.sdk import configure +from metadata.sdk.entities import MessagingService, Team + +configure( + host="https://your-company.getcollate.io/api", + jwt_token="your-jwt-token" +) + +team = Team.retrieve_by_name("data-platform") +service = MessagingService.retrieve_by_name("kafka_prod") +service.owners = [{"id": str(team.id), "type": "team"}] +updated = MessagingService.update(service.id, service) +``` + + + +```java +import org.openmetadata.sdk.entities.MessagingServices; +import org.openmetadata.sdk.entities.Teams; +import org.openmetadata.schema.type.EntityReference; + +Team team = Teams.retrieveByName("data-platform"); +MessagingService service = MessagingServices.retrieveByName("kafka_prod"); +service.setOwners(List.of( + new EntityReference() + .withId(team.getId()) + .withType("team") +)); +MessagingService updated = MessagingServices.update(service.getId(), service); +``` + + + +```bash +curl -X PATCH "https://your-company.getcollate.io/api/v1/services/messagingServices/{id}" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json-patch+json" \ + -d '[ + {"op": "add", "path": "/owners", "value": [{"id": "team-uuid", "type": "team"}]} + ]' +``` + + + +--- + +## Delete a Messaging Service + + + +```python +from metadata.sdk import configure +from metadata.sdk.entities import 
MessagingService + +configure( + host="https://your-company.getcollate.io/api", + jwt_token="your-jwt-token" +) + +# Soft delete (can be restored) +MessagingService.delete("550e8400-e29b-41d4-a716-446655440000") + +# Hard delete with all topics +MessagingService.delete( + "550e8400-e29b-41d4-a716-446655440000", + recursive=True, + hard_delete=True +) +``` + + + +```java +import org.openmetadata.sdk.entities.MessagingServices; + +// Soft delete +MessagingServices.delete("550e8400-e29b-41d4-a716-446655440000"); + +// Hard delete with all topics +MessagingServices.delete("550e8400-e29b-41d4-a716-446655440000", true, true); +``` + + + +```bash +# Soft delete +curl -X DELETE "https://your-company.getcollate.io/api/v1/services/messagingServices/{id}" \ + -H "Authorization: Bearer $TOKEN" + +# Hard delete with all topics +curl -X DELETE "https://your-company.getcollate.io/api/v1/services/messagingServices/{id}?hardDelete=true&recursive=true" \ + -H "Authorization: Bearer $TOKEN" +``` + + + +--- + +## Connection Configurations + +### Kafka + +```json +{ + "type": "Kafka", + "bootstrapServers": "kafka1.company.com:9092,kafka2.company.com:9092", + "schemaRegistryURL": "https://schema-registry.company.com" +} +``` + +### Pulsar + +```json +{ + "type": "Pulsar", + "hostPort": "pulsar://pulsar.company.com:6650", + "adminEndpoint": "https://pulsar.company.com:8080" +} +``` + +### Kinesis + +```json +{ + "type": "Kinesis", + "awsConfig": { + "awsAccessKeyId": "access-key", + "awsSecretAccessKey": "secret-key", + "awsRegion": "us-east-1" + } +} +``` + +--- + +## Service Types + +| Type | Description | +|------|-------------| +| `Kafka` | Apache Kafka | +| `Pulsar` | Apache Pulsar | +| `Kinesis` | Amazon Kinesis | +| `Redpanda` | Redpanda | +| `CustomMessaging` | Custom messaging platform | + +--- + +## Related + + + + Create and manage topics + + + Learn about entity relationships + + diff --git a/api-reference/services/pipeline/index.mdx 
b/api-reference/services/pipeline/index.mdx new file mode 100644 index 00000000..8d374075 --- /dev/null +++ b/api-reference/services/pipeline/index.mdx @@ -0,0 +1,533 @@ +--- +title: Pipeline Services +description: Create and manage connections to data pipeline and orchestration platforms +sidebarTitle: Pipeline Services +mode: "wide" +--- + +A **Pipeline Service** represents a connection to a data orchestration platform like Airflow, Dagster, Prefect, or dbt. + + +Entity schema follows the [OpenMetadata Standard](https://openmetadatastandards.org/metadata-specifications/entities/pipeline-service/). + + +## Entity Hierarchy + +Pipeline Services are the top-level entity for orchestration assets: + +``` +PipelineService (this page) +└── Pipeline + └── Task +``` + +## Inheritance + +When you set an **owner** or **domain** on a Pipeline Service, it is inherited by all child pipelines and tasks. + +--- + +## API Endpoints + +| Method | Endpoint | Description | +|--------|----------|-------------| +| `PUT` | `/v1/services/pipelineServices` | Create or update a service | +| `GET` | `/v1/services/pipelineServices` | List services | +| `GET` | `/v1/services/pipelineServices/{id}` | Get by ID | +| `GET` | `/v1/services/pipelineServices/name/{name}` | Get by name | +| `PATCH` | `/v1/services/pipelineServices/{id}` | Update a service | +| `DELETE` | `/v1/services/pipelineServices/{id}` | Delete a service | + +--- + +## The Pipeline Service Object + + + Unique identifier for the pipeline service. + + + + Name of the pipeline service. Must be unique. + + + + Fully qualified name. For services, this equals the name. + + + + Human-readable display name for the service. + + + + Type of pipeline service. One of: `Airflow`, `Dagster`, `DBTCloud`, `Prefect`, `Airbyte`, `Fivetran`, `GluePipeline`, `KafkaConnect`, `Nifi`, `Spline`, `DataFactory`. + + + + Description in Markdown format. + + + + Connection configuration specific to the service type. 
+ + + + Owners of the service (users or teams). Inherited by all child entities. + + + + Domain this service belongs to. Inherited by all child entities. + + + + Tags and classifications applied to this service. + + + + Entity version number, incremented on updates. + + + + Whether the service has been soft-deleted. + + +```json Example Response +{ + "id": "550e8400-e29b-41d4-a716-446655440000", + "name": "airflow_prod", + "fullyQualifiedName": "airflow_prod", + "displayName": "Airflow Production", + "serviceType": "Airflow", + "description": "Production Airflow for data pipelines", + "connection": { + "config": { + "type": "Airflow", + "hostPort": "https://airflow.company.com" + } + }, + "owners": [ + { + "id": "team-uuid", + "type": "team", + "name": "data-platform" + } + ], + "version": 0.1, + "deleted": false +} +``` + +--- + +## Create a Pipeline Service + + + +```python +from metadata.sdk import configure +from metadata.sdk.entities import PipelineService +from metadata.generated.schema.api.services.createPipelineService import CreatePipelineServiceRequest +from metadata.generated.schema.entity.services.pipelineService import ( + PipelineServiceType, + PipelineConnection, +) +from metadata.generated.schema.entity.services.connections.pipeline.airflowConnection import AirflowConnection + +# Configure the SDK +configure( + host="https://your-company.getcollate.io/api", + jwt_token="your-jwt-token" +) + +# Create the service +request = CreatePipelineServiceRequest( + name="airflow_prod", + displayName="Airflow Production", + serviceType=PipelineServiceType.Airflow, + connection=PipelineConnection( + config=AirflowConnection( + hostPort="https://airflow.company.com", + connection={"type": "Backend"} + ) + ), + description="Production Airflow for data orchestration" +) + +service = PipelineService.create(request) +print(f"Created: {service.fullyQualifiedName}") +``` + + + +```java +import org.openmetadata.sdk.OpenMetadataClient; +import 
org.openmetadata.sdk.config.OpenMetadataConfig; +import org.openmetadata.sdk.entities.PipelineServices; +import org.openmetadata.schema.api.services.CreatePipelineServiceRequest; +import org.openmetadata.schema.entity.services.PipelineService; +import org.openmetadata.schema.entity.services.PipelineServiceType; + +// Initialize client +OpenMetadataConfig config = OpenMetadataConfig.builder() + .serverUrl("https://your-company.getcollate.io/api") + .accessToken("your-token") + .build(); +OpenMetadataClient client = new OpenMetadataClient(config); +PipelineServices.setDefaultClient(client); + +// Create the service +CreatePipelineServiceRequest request = new CreatePipelineServiceRequest() + .withName("airflow_prod") + .withDisplayName("Airflow Production") + .withServiceType(PipelineServiceType.Airflow) + .withDescription("Production Airflow for data orchestration"); + +PipelineService service = PipelineServices.create(request); +System.out.println("Created: " + service.getFullyQualifiedName()); +``` + + + +```bash +curl -X PUT "https://your-company.getcollate.io/api/v1/services/pipelineServices" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json" \ + -d '{ + "name": "airflow_prod", + "displayName": "Airflow Production", + "serviceType": "Airflow", + "connection": { + "config": { + "type": "Airflow", + "hostPort": "https://airflow.company.com", + "connection": {"type": "Backend"} + } + }, + "description": "Production Airflow for data orchestration" + }' +``` + + + +--- + +## Retrieve a Pipeline Service + + + +```python +from metadata.sdk import configure +from metadata.sdk.entities import PipelineService + +configure( + host="https://your-company.getcollate.io/api", + jwt_token="your-jwt-token" +) + +# By name +service = PipelineService.retrieve_by_name( + "airflow_prod", + fields=["owners", "tags", "domain"] +) + +# By ID +service = PipelineService.retrieve("550e8400-e29b-41d4-a716-446655440000") + +print(f"Name: {service.name}") 
+print(f"Type: {service.serviceType}") +``` + + + +```java +import java.util.List; +import org.openmetadata.schema.entity.services.PipelineService; +import org.openmetadata.sdk.entities.PipelineServices; + +// By name +PipelineService service = PipelineServices.retrieveByName( + "airflow_prod", + List.of("owners", "tags", "domain") +); + +// By ID (reuses the variable declared above) +service = PipelineServices.retrieve("550e8400-e29b-41d4-a716-446655440000"); + +System.out.println("Name: " + service.getName()); +System.out.println("Type: " + service.getServiceType()); +``` + + + +```bash +# By name +curl "https://your-company.getcollate.io/api/v1/services/pipelineServices/name/airflow_prod?fields=owners,tags,domain" \ + -H "Authorization: Bearer $TOKEN" + +# By ID +curl "https://your-company.getcollate.io/api/v1/services/pipelineServices/550e8400-e29b-41d4-a716-446655440000" \ + -H "Authorization: Bearer $TOKEN" +``` + + + +--- + +## List Pipeline Services + + + +```python +from metadata.sdk import configure +from metadata.sdk.entities import PipelineService + +configure( + host="https://your-company.getcollate.io/api", + jwt_token="your-jwt-token" +) + +# List all services with auto-pagination +for service in PipelineService.list().auto_paging_iterable(): + print(f"{service.name}: {service.serviceType}") +``` + + + +```java +import org.openmetadata.sdk.entities.PipelineServices; + +// List with auto-pagination +for (PipelineService service : PipelineServices.list().autoPagingIterable()) { + System.out.println(service.getName() + ": " + service.getServiceType()); +} +``` + + + +```bash +curl "https://your-company.getcollate.io/api/v1/services/pipelineServices?limit=50&fields=owners,tags" \ + -H "Authorization: Bearer $TOKEN" +``` + + + +--- + +## Update a Pipeline Service + +### Update Description + + + +```python +from metadata.sdk import configure +from metadata.sdk.entities import PipelineService + +configure( + host="https://your-company.getcollate.io/api", + jwt_token="your-jwt-token" +) + 
+service = PipelineService.retrieve_by_name("airflow_prod")
+service.description = "Enterprise orchestration platform for ETL workflows" +updated = PipelineService.update(service.id, service) +``` + + + +```java +import org.openmetadata.sdk.entities.PipelineServices; + +PipelineService service = PipelineServices.retrieveByName("airflow_prod"); +service.setDescription("Enterprise orchestration platform for ETL workflows"); +PipelineService updated = PipelineServices.update(service.getId(), service); +``` + + + +```bash +curl -X PATCH "https://your-company.getcollate.io/api/v1/services/pipelineServices/{id}" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json-patch+json" \ + -d '[ + {"op": "replace", "path": "/description", "value": "Enterprise orchestration platform"} + ]' +``` + + + +### Set Owner + + + +```python +from metadata.sdk import configure +from metadata.sdk.entities import PipelineService, Team + +configure( + host="https://your-company.getcollate.io/api", + jwt_token="your-jwt-token" +) + +team = Team.retrieve_by_name("data-platform") +service = PipelineService.retrieve_by_name("airflow_prod") +service.owners = [{"id": str(team.id), "type": "team"}] +updated = PipelineService.update(service.id, service) +``` + + + +```java +import org.openmetadata.sdk.entities.PipelineServices; +import org.openmetadata.sdk.entities.Teams; +import org.openmetadata.schema.type.EntityReference; + +Team team = Teams.retrieveByName("data-platform"); +PipelineService service = PipelineServices.retrieveByName("airflow_prod"); +service.setOwners(List.of( + new EntityReference() + .withId(team.getId()) + .withType("team") +)); +PipelineService updated = PipelineServices.update(service.getId(), service); +``` + + + +```bash +curl -X PATCH "https://your-company.getcollate.io/api/v1/services/pipelineServices/{id}" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json-patch+json" \ + -d '[ + {"op": "add", "path": "/owners", "value": [{"id": "team-uuid", "type": "team"}]} + ]' +``` + + + +--- + 
+## Delete a Pipeline Service + + + +```python +from metadata.sdk import configure +from metadata.sdk.entities import PipelineService + +configure( + host="https://your-company.getcollate.io/api", + jwt_token="your-jwt-token" +) + +# Soft delete (can be restored) +PipelineService.delete("550e8400-e29b-41d4-a716-446655440000") + +# Hard delete with all pipelines +PipelineService.delete( + "550e8400-e29b-41d4-a716-446655440000", + recursive=True, + hard_delete=True +) +``` + + + +```java +import org.openmetadata.sdk.entities.PipelineServices; + +// Soft delete +PipelineServices.delete("550e8400-e29b-41d4-a716-446655440000"); + +// Hard delete with all pipelines +PipelineServices.delete("550e8400-e29b-41d4-a716-446655440000", true, true); +``` + + + +```bash +# Soft delete +curl -X DELETE "https://your-company.getcollate.io/api/v1/services/pipelineServices/{id}" \ + -H "Authorization: Bearer $TOKEN" + +# Hard delete with all pipelines +curl -X DELETE "https://your-company.getcollate.io/api/v1/services/pipelineServices/{id}?hardDelete=true&recursive=true" \ + -H "Authorization: Bearer $TOKEN" +``` + + + +--- + +## Connection Configurations + +### Airflow + +```json +{ + "type": "Airflow", + "hostPort": "https://airflow.company.com", + "connection": { + "type": "Backend" + } +} +``` + +### Dagster + +```json +{ + "type": "Dagster", + "hostPort": "https://dagster.company.com", + "token": "dagster-api-token" +} +``` + +### dbt Cloud + +```json +{ + "type": "DBTCloud", + "host": "https://cloud.getdbt.com", + "accountId": "12345", + "token": "dbt-cloud-token" +} +``` + +### Prefect + +```json +{ + "type": "Prefect", + "hostPort": "https://api.prefect.io", + "apiKey": "prefect-api-key" +} +``` + +--- + +## Service Types + +| Type | Description | +|------|-------------| +| `Airflow` | Apache Airflow | +| `Dagster` | Dagster | +| `DBTCloud` | dbt Cloud | +| `Prefect` | Prefect | +| `Airbyte` | Airbyte | +| `Fivetran` | Fivetran | +| `GluePipeline` | AWS Glue | +| 
`KafkaConnect` | Kafka Connect | +| `Nifi` | Apache NiFi | +| `Spline` | Spline | +| `DataFactory` | Azure Data Factory | + +--- + +## Related + + + + Create pipelines in this service + + + Track pipeline data lineage + + diff --git a/api-reference/teams-users/index.mdx b/api-reference/teams-users/index.mdx new file mode 100644 index 00000000..019c7fa3 --- /dev/null +++ b/api-reference/teams-users/index.mdx @@ -0,0 +1,218 @@ +--- +title: Teams & Users +description: Manage users, teams, and bot accounts +sidebarTitle: Overview +mode: "wide" +--- + +# Teams & Users + +Manage user accounts, teams, and service accounts (bots) for your organization. + +## Users + +### List Users + + + +```python +from metadata.ingestion.ometa.ometa_api import OpenMetadata +from metadata.generated.schema.entity.teams.user import User + +# `config` is the OpenMetadataConnection shown in Authentication +metadata = OpenMetadata(config) + +# Iterates over all users; `limit` is the page size per request +for user in metadata.list_all_entities(entity=User, limit=100): + print(f"{user.name}: {user.email}") +``` + + + +```bash +curl -X GET "https://your-company.getcollate.io/api/v1/users?limit=50" \ + -H "Authorization: Bearer $TOKEN" +``` + + + +### Get User by Name + + + +```python +from metadata.generated.schema.entity.teams.user import User + +# Request the `teams` field explicitly; otherwise it is not populated +user = metadata.get_by_name(entity=User, fqn="john.doe", fields=["teams"]) +print(f"ID: {user.id}") +print(f"Email: {user.email}") +print(f"Teams: {[t.name for t in (user.teams or [])]}") +``` + + + +```bash +curl -X GET "https://your-company.getcollate.io/api/v1/users/name/john.doe?fields=teams,roles" \ + -H "Authorization: Bearer $TOKEN" +``` + + + +### Create User + + + +```bash +curl -X PUT "https://your-company.getcollate.io/api/v1/users" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json" \ + -d '{ + "name": "jane.smith", + "email": 
"jane.smith@company.com", + "displayName": "Jane Smith", + "description": "Data Engineer", + "isAdmin": false, + "teams": ["team-uuid"] + }' +``` + + + +## Teams + +### List Teams + + + +```bash +curl -X GET 
"https://your-company.getcollate.io/api/v1/teams?limit=50" \ + -H "Authorization: Bearer $TOKEN" + +# Include users and roles +curl -X GET "https://your-company.getcollate.io/api/v1/teams?fields=users,defaultRoles&limit=50" \ + -H "Authorization: Bearer $TOKEN" +``` + + + +### Create Team + + + +```bash +curl -X PUT "https://your-company.getcollate.io/api/v1/teams" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json" \ + -d '{ + "name": "data-platform", + "displayName": "Data Platform Team", + "description": "Platform engineering for data infrastructure", + "teamType": "Group", + "isJoinable": true, + "defaultRoles": ["role-uuid"] + }' +``` + + + +### Add User to Team + + + +```bash +curl -X PATCH "https://your-company.getcollate.io/api/v1/teams/{team-id}" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json-patch+json" \ + -d '[ + { + "op": "add", + "path": "/users/-", + "value": {"id": "user-uuid", "type": "user"} + } + ]' +``` + + + +## Bots (Service Accounts) + +Bots are service accounts for automation and integrations. Each bot acts through a bot user (a user with `isBot: true`), and the bot's API calls authenticate with that user's JWT token.
+ +### List Bots + + + +```bash +curl -X GET "https://your-company.getcollate.io/api/v1/bots?limit=50" \ + -H "Authorization: Bearer $TOKEN" +``` + + + +### Create Bot + + + +```bash +# The bot user (a user with "isBot": true) must already exist; +# `botUser` references it by name +curl -X PUT "https://your-company.getcollate.io/api/v1/bots" \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json" \ + -d '{ + "name": "ci-cd-bot", + "displayName": "CI/CD Automation Bot", + "description": "Bot for CI/CD pipeline integrations", + "botUser": "ci-cd-bot" + }' +``` + + + +### Generate Bot Token + + + +```bash +# Get bot token +curl -X GET "https://your-company.getcollate.io/api/v1/bots/name/ingestion-bot/token" \ + -H "Authorization: Bearer $TOKEN" + +# Revoke the bot's current token +curl -X PUT "https://your-company.getcollate.io/api/v1/bots/{bot-id}/revokeToken" \ + -H "Authorization: Bearer $TOKEN" +``` + + + +## Team Types + +| Type | Description | +|------|-------------| +| `Organization` | Top-level organization | +| `BusinessUnit` | Business unit within organization | +| `Division` | Division within business unit | +| `Department` | Department within division | +| `Group` | Working group or team | + +## User Fields + +| Field | Description | +|-------|-------------| +| `name` | Username (unique identifier) | +| `displayName` | Full display name | +| `email` | Email address | +| `isAdmin` | Admin privileges | +| `teams` | Team memberships | +| `roles` | Assigned roles | +| `personas` | Assigned personas | +| `isBot` | Bot/service account flag | + +## Endpoints Summary + +| Resource | Endpoints | +|----------|-----------| +| Users | `GET/PUT /v1/users`, `GET/PATCH/DELETE /v1/users/{id}` | +| Teams | `GET/PUT /v1/teams`, `GET/PATCH/DELETE /v1/teams/{id}` | +| Bots | `GET/PUT /v1/bots`, `GET/PATCH/DELETE /v1/bots/{id}` | +| Roles | `GET/PUT /v1/roles`, `GET/PATCH/DELETE /v1/roles/{id}` | diff --git a/docs.json b/docs.json index 
e2c42525..1578ec63 100644 --- a/docs.json +++ b/docs.json @@ -1612,6 +1612,159 @@ } ] }, + { + "tab": "API Reference",
+ "groups": [ + { + "group": "Getting Started", + "pages": [ + "api-reference/index", + "api-reference/authentication", + "api-reference/errors", + "api-reference/pagination" + ] + }, + { + "group": "Core Concepts", + "pages": [ + "api-reference/core/entities", + "api-reference/core/fully-qualified-names" + ] + }, + { + "group": "Services", + "pages": [ + "api-reference/services/index", + "api-reference/services/database/index", + "api-reference/services/dashboard/index", + "api-reference/services/pipeline/index", + "api-reference/services/messaging/index" + ] + }, + { + "group": "Data Assets", + "pages": [ + { + "group": "Databases", + "pages": [ + "api-reference/data-assets/databases/index", + "api-reference/data-assets/databases/object", + "api-reference/data-assets/databases/create", + "api-reference/data-assets/databases/retrieve", + "api-reference/data-assets/databases/list", + "api-reference/data-assets/databases/update", + "api-reference/data-assets/databases/delete", + "api-reference/data-assets/databases/import-export" + ] + }, + { + "group": "Database Schemas", + "pages": [ + "api-reference/data-assets/schemas/index", + "api-reference/data-assets/schemas/object", + "api-reference/data-assets/schemas/create", + "api-reference/data-assets/schemas/retrieve", + "api-reference/data-assets/schemas/list", + "api-reference/data-assets/schemas/update", + "api-reference/data-assets/schemas/delete" + ] + }, + { + "group": "Tables", + "pages": [ + "api-reference/data-assets/tables/index", + "api-reference/data-assets/tables/create", + "api-reference/data-assets/tables/retrieve", + "api-reference/data-assets/tables/list", + "api-reference/data-assets/tables/update", + "api-reference/data-assets/tables/delete" + ] + }, + { + "group": "Dashboards", + "pages": [ + "api-reference/data-assets/dashboards/index", + "api-reference/data-assets/dashboards/object", + "api-reference/data-assets/dashboards/create", + "api-reference/data-assets/dashboards/retrieve", + 
"api-reference/data-assets/dashboards/list", + "api-reference/data-assets/dashboards/update", + "api-reference/data-assets/dashboards/delete" + ] + }, + { + "group": "Pipelines", + "pages": [ + "api-reference/data-assets/pipelines/index", + "api-reference/data-assets/pipelines/object", + "api-reference/data-assets/pipelines/create", + "api-reference/data-assets/pipelines/retrieve", + "api-reference/data-assets/pipelines/list", + "api-reference/data-assets/pipelines/update", + "api-reference/data-assets/pipelines/delete" + ] + }, + { + "group": "Topics", + "pages": [ + "api-reference/data-assets/topics/index", + "api-reference/data-assets/topics/object", + "api-reference/data-assets/topics/create", + "api-reference/data-assets/topics/retrieve", + "api-reference/data-assets/topics/list", + "api-reference/data-assets/topics/update", + "api-reference/data-assets/topics/delete" + ] + } + ] + }, + { + "group": "Metadata Operations", + "pages": [ + "api-reference/metadata/descriptions/index", + "api-reference/metadata/owners/index", + "api-reference/metadata/tags/index" + ] + }, + { + "group": "Data Lineage", + "pages": [ + "api-reference/lineage/index" + ] + }, + { + "group": "Data Quality", + "pages": [ + "api-reference/data-quality/test-suites/index", + "api-reference/data-quality/test-cases/index" + ] + }, + { + "group": "Governance", + "pages": [ + "api-reference/governance/index" + ] + }, + { + "group": "Ingestion", + "pages": [ + "api-reference/ingestion/index" + ] + }, + { + "group": "Search", + "pages": [ + "api-reference/search/index" + ] + }, + { + "group": "Teams & Users", + "pages": [ + "api-reference/teams-users/index" + ] + } + ] + }, { "tab": "SDK & API", "groups": [ @@ -1704,6 +1857,12 @@ ] } ] + }, + { + "group": "API", + "pages": [ + "sdk/api" + ] } ] } diff --git a/sdk/api.mdx b/sdk/api.mdx new file mode 100644 index 00000000..9177d9f1 --- /dev/null +++ b/sdk/api.mdx @@ -0,0 +1,244 @@ +--- +title: SDK & API Reference +description: Integrate with 
Collate using our REST API and official SDKs for Python and Java +slug: /sdk/api +sidebarTitle: API +mode: "wide" +--- + +import { CodeLayout } from '/snippets/components/CodeLayout/CodeLayout.jsx' + +# SDK & API Reference + +Integrate with Collate through the REST API or the official Python and Java SDKs. + + + + Complete REST API documentation with examples + + + Official Python client library + + + Official Java client library + + + Set up API authentication + + + +## Quick Start + + +```python Python +from metadata.ingestion.ometa.ometa_api import OpenMetadata +from metadata.generated.schema.entity.data.table import Table +from metadata.generated.schema.entity.services.connections.metadata.openMetadataConnection import ( + OpenMetadataConnection, +) +from metadata.generated.schema.security.client.openMetadataJWTClientConfig import ( + OpenMetadataJWTClientConfig, +) + +# Configure connection +config = OpenMetadataConnection( + hostPort="https://your-company.getcollate.io/api", + authProvider="openmetadata", + securityConfig=OpenMetadataJWTClientConfig( + jwtToken="your-token-here" + ), +) + +# Initialize client +metadata = OpenMetadata(config) + +# Iterate over all tables; `limit` is the page size per request, not a total cap +for table in metadata.list_all_entities(entity=Table, limit=10): + print(f"{table.fullyQualifiedName}: {table.description}") +``` + +```bash HTTP +# Base URL +BASE_URL="https://your-company.getcollate.io/api/v1" + +# Set your token +export TOKEN="your-token-here" + +# List the first 10 tables +curl -X GET "$BASE_URL/tables?limit=10" \ + -H "Authorization: Bearer $TOKEN" + +# Get a specific table +curl -X GET "$BASE_URL/tables/name/my_service.my_db.my_schema.my_table" \ + -H "Authorization: Bearer $TOKEN" +``` + + + + +```text Base URL +https://your-company.getcollate.io/api/v1 +``` + + + +The API uses Bearer token authentication. You can obtain a token from:

1. Bot Tokens - Settings > Bots > Select bot > Copy token
2. Personal Access Tokens - Profile > Settings > Access Tokens > Generate} +> + +```bash Authentication +curl -X GET "https://your-company.getcollate.io/api/v1/tables" \ + -H "Authorization: Bearer YOUR_TOKEN" +``` + +
+ + + + See detailed authentication setup including token management and security best practices. + + + + + Collate organizes data assets in a hierarchical structure. Understanding this hierarchy is essential for working with the API: +

+ Each entity is identified by a Fully Qualified Name (FQN) that reflects this hierarchy: +

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| Entity | FQN Format | Example |
|--------|------------|---------|
| Service | service_name | snowflake_prod |
| Database | service.database | snowflake_prod.analytics |
| Schema | service.database.schema | snowflake_prod.analytics.public |
| Table | service.database.schema.table | snowflake_prod.analytics.public.customers |
| Column | service.database.schema.table.column | snowflake_prod.analytics.public.customers.email |
+
+ + Learn about the complete entity model and hierarchical relationships. + + +} +> + +```text Hierarchy Structure +Service (DatabaseService, DashboardService, etc.) +└── Database + └── Schema + └── Table + └── Column +``` + +
+ +## Core Resources + +### Data Assets + +| Resource | Description | +|----------|-------------| +| [Tables](/api-reference/data-assets/tables) | Database tables and views | +| [Dashboards](/api-reference/data-assets/dashboards) | BI dashboards and charts | +| [Pipelines](/api-reference/data-assets/pipelines) | Data pipelines and workflows | +| [Topics](/api-reference/data-assets/topics) | Messaging topics and streams | +| [ML Models](/api-reference/data-assets/ml-models) | Machine learning models | +| [Containers](/api-reference/data-assets/containers) | Storage containers | + +### Services + +| Resource | Description | +|----------|-------------| +| [Database Services](/api-reference/services/database) | Snowflake, BigQuery, Postgres, etc. | +| [Dashboard Services](/api-reference/services/dashboard) | Tableau, Looker, Superset, etc. | +| [Pipeline Services](/api-reference/services/pipeline) | Airflow, Dagster, etc. | +| [Messaging Services](/api-reference/services/messaging) | Kafka, Pulsar, etc. 
| + +### Metadata Operations + +| Resource | Description | +|----------|-------------| +| [Descriptions](/api-reference/metadata/descriptions) | Update entity descriptions | +| [Owners](/api-reference/metadata/owners) | Assign ownership | +| [Tags](/api-reference/metadata/tags) | Apply classification tags | +| [Glossary Terms](/api-reference/metadata/glossary-terms) | Link business glossary terms | +| [Lineage](/api-reference/lineage) | Data lineage relationships | + +## Error Handling + +The API uses standard HTTP status codes: + +| Code | Description | +|------|-------------| +| `200` | Success | +| `201` | Created | +| `400` | Bad Request - Invalid parameters | +| `401` | Unauthorized - Invalid or missing token | +| `403` | Forbidden - Insufficient permissions | +| `404` | Not Found - Entity doesn't exist | +| `409` | Conflict - Entity already exists | +| `429` | Too Many Requests - Rate limited | + +Error responses include details: + +```json +{ + "code": 404, + "errorType": "ENTITY_NOT_FOUND", + "message": "Table sample_data.ecommerce_db.shopify.nonexistent not found" +} +``` + + + See all error codes and handling strategies. + + +## Next Steps + + + + Complete endpoint documentation + + + Python SDK installation and usage + + + Set up secure API access + + + Understand the data model + + diff --git a/snippets/components/APIReference/APICode.jsx b/snippets/components/APIReference/APICode.jsx new file mode 100644 index 00000000..e89de3a9 --- /dev/null +++ b/snippets/components/APIReference/APICode.jsx @@ -0,0 +1,24 @@ +export const APICode = () => { + return ( +
+ + + + {`console.log("Hello World");`} + + + {`print('Hello World!')`} + + + {`class HelloWorld { + public static void main(String[] args) { + System.out.println("Hello, World!"); + } +}`} + + + + +
+ ) +} diff --git a/snippets/components/CodeLayout/CodeLayout.css b/snippets/components/CodeLayout/CodeLayout.css new file mode 100644 index 00000000..a3e1a776 --- /dev/null +++ b/snippets/components/CodeLayout/CodeLayout.css @@ -0,0 +1,13 @@ +/* Main Layout Container */ +.code-layout { + display: grid; + grid-template-columns: 1fr 1fr; + gap: 2rem; + margin: 2rem 0; + align-items: start; + width: 100%; +} + +.code-layout h2 { + margin-top: 0px; +} \ No newline at end of file diff --git a/snippets/components/CodeLayout/CodeLayout.jsx b/snippets/components/CodeLayout/CodeLayout.jsx new file mode 100644 index 00000000..6a34b0a2 --- /dev/null +++ b/snippets/components/CodeLayout/CodeLayout.jsx @@ -0,0 +1,26 @@ +import React from 'react'; +import './CodeLayout.css'; + +export const CodeLayout = ({ title, description, children }) => { + return ( +
+
+ {title && ( +

{title}

+ )} + {description && ( +
+ {description} +
+ )} +
+ +
+
+ {children} +
+
+
+ ); +}; + diff --git a/snippets/components/LanguageSelector/LanguageSelector.jsx b/snippets/components/LanguageSelector/LanguageSelector.jsx new file mode 100644 index 00000000..1f9df87a --- /dev/null +++ b/snippets/components/LanguageSelector/LanguageSelector.jsx @@ -0,0 +1,34 @@ +import React, { useEffect } from "react"; + +export const LanguageSelector = () => { + useEffect(() => { + const getActiveTab = () => { + const activeTab = document.querySelector( + '.client-sdks [data-active="true"]' + ) + console.log(activeTab?.textContent) + } + + getActiveTab() + document.addEventListener('click', getActiveTab) + + return () => document.removeEventListener('click', getActiveTab) + }, []) + + return ( +
+
CLIENT SDKs
+ + + You can add any number of components inside of tabs. For example, a code block: + + + Java + + + Go + + +
+ ) +} \ No newline at end of file diff --git a/snippets/components/StripeAPIDoc/StripeAPIDoc.jsx b/snippets/components/StripeAPIDoc/StripeAPIDoc.jsx new file mode 100644 index 00000000..ac76afa1 --- /dev/null +++ b/snippets/components/StripeAPIDoc/StripeAPIDoc.jsx @@ -0,0 +1,109 @@ +// Stripe-like layout with sticky code panel on the right + +export const StripeAPIDoc = ({ children, endpoint }) => { + const parts = endpoint ? endpoint.split(' ') : []; + const method = parts[0] || ''; + const path = parts.slice(1).join(' '); + + const methodColors = { + GET: { bg: '#dbeafe', color: '#1d4ed8' }, + POST: { bg: '#dcfce7', color: '#16a34a' }, + PUT: { bg: '#fef3c7', color: '#d97706' }, + PATCH: { bg: '#e0e7ff', color: '#4f46e5' }, + DELETE: { bg: '#fee2e2', color: '#dc2626' }, + }; + const colors = methodColors[method] || methodColors.GET; + + return ( +
+ {endpoint && ( +
+ {method} + {path} +
+ )} + {children} +
+ ); +}; + +// Content section (left side text) +export const SectionContent = ({ title, children }) => ( +
+ {title &&

{title}

} +
{children}
+
+); + +// Code block (for right panel display) +export const CodeBlock = ({ title, children }) => { + const fmt = (c) => { + if (typeof c !== 'string') return c; + try { return JSON.stringify(JSON.parse(c), null, 2); } + catch { return c; } + }; + + return ( +
+ {title && ( +
{title}
+ )} +
{fmt(children)}
+
+ ); +}; + +export default StripeAPIDoc; diff --git a/style.css b/style.css index 24d48b3e..48471ef4 100644 --- a/style.css +++ b/style.css @@ -1,4 +1,4 @@ -.homepage-container { +.homepage-container, .sdk-api-container { max-width: 1440px; margin: 0 auto; padding: 0px 10px 40px; @@ -522,6 +522,158 @@ a.guide-card img { padding: 36px 0px 0px; } +/* SDK & API */ + +.sdk-api-container { + padding: 72px 0px; + max-width: 1300px; +} + +.sdk-api-container span[data-as="p"] { + display: block; + margin-bottom: 8px; + color: #414651; +} + +.sdk-api-container a.link { + color: #004CD3 !important; +} + +.sdk-api-container h3 { + color: #222244; + font-size: 20px; +} + +.api-header { + display: grid; + grid-template-columns: repeat(2, 1fr); + gap: 40px; + margin-top: 16px; + padding-bottom: 56px; + border-bottom: 1px solid #EAECF5; +} + +.sdk-api-container .code-group { + background-color: #515865; +} + +.sdk-api-container .code-group button { + color: white !important; + font-weight: 600; + font-size: 14px; +} + +.sdk-api-container .code-group button .absolute { + display: none; +} + +.sdk-api-container [data-component-part="code-block-root"] { + background-color: #3E444F !important; +} + +.sdk-api-container [data-component-part="code-block-root"] pre, .sdk-api-container [data-component-part="code-block-root"] pre .line span { + color: white !important; +} + +.client-sdks { + border: 1px solid #E9EAEB; + border-radius: 8px; +} + +.client-header { + color: #414651; + font-weight: 600; + padding: 24px 24px 0px; + background-color: #FAFAFA; + border-top-left-radius: 8px; + border-top-right-radius: 8px; +} + +.client-sdks [role="tablist"] { + margin-bottom: 0px !important; + display: flex; + justify-content: center; + gap: 100px; +} + +.client-sdks [data-component-part="tabs-list"] { + padding: 10px 24px 0px; + background-color: #FAFAFA; +} + +.client-sdks [data-active="true"] { + border-bottom: 3px solid #1570EF; +} + +.client-sdks [data-active="false"] { + border-bottom: 
none; +} + +.client-sdks [data-component-part="tab-button"] { + display: flex; + flex-direction: column; + color: #181D27 !important; +} + +.client-sdks [data-component-part="tab-button"] img { + width: 40px; + height: 40px; +} + +.client-sdks [data-component-part="tab-content"] { + background-color: #F5F5F5; + padding: 10px 24px; +} + +.api-header-auth { + display: grid; + grid-template-columns: repeat(2, 1fr); + gap: 40px; + padding-top: 48px; + padding-bottom: 56px; + border-bottom: 1px solid #EAECF5; +} + +.api-header-auth h2 { + color: #222244; + font-size: 20px; + font-weight: 600; + margin-bottom: 12px; +} + +.sdk-api-container [data-component-part="code-group-tab-bar"] div:first-child span { + color: white !important; + font-size: 14px; + font-weight: 600; + margin-left: 8px; +} + +.sdk-api-container [data-component-part="code-group-tab-bar"] [data-testid="copy-code-button"] svg { + color: white; +} + +.sdk-api-container [data-component-part="code-group-tab-bar"] .group { + background-color: #252B37; + border: 1px solid #5B626F; + padding: 4px; + border-radius: 4px; + color: white; +} + +.sdk-api-container [data-component-part="code-group-tab-bar"] .group div svg:first-of-type { + display: none !important; +} + +.sdk-api-container [data-component-part="code-group-tab-bar"] .group div p { + color: white; + margin: 0px; +} + +.sdk-api-container [data-component-part="code-group-tab-bar"] .group div:hover { + background-color: transparent; +} + + @media (max-width: 1280px) { .guide-list { grid-template-columns: repeat(3, 1fr);