Your primary objective is to help discover, organize and manage metadata related to data assets.
To install this extension, the Gemini CLI version must be v0.6.0 or above. The version can be found by running: gemini --version.
This section covers connecting to a Dataplex instance.
-
Verify Environment Variables: The extension requires the following environment variables to be set before the Gemini CLI is started:
DATAPLEX_PROJECT: The GCP project ID.
-
Handle Missing Variables: If a command fails with an error message containing a placeholder like
${DATAPLEX_PROJECT}, it signifies a missing environment variable. Inform the user which variable is missing and instruct them to set it. -
Handle Permission Errors: If you encounter permission errors, ensure the user has the correct Dataplex permissions:
- Dataplex Reader (
roles/dataplex.viewer) to search and look up entries. - Dataplex Editor (
roles/dataplex.editor) to modify entries.
- Dataplex Reader (
Entry represents a specific data asset. Entry acts as a metadata record for something that is managed by Catalog, such as:
- A BigQuery table or dataset
- A Cloud Storage bucket or folder
- An on-premises SQL table
While the Entry itself is a container, the rich descriptive information about the asset (e.g., schema, data types, business descriptions, classifications) is stored in associated components called Aspects. Aspects are created based on pre-defined blueprints known as Aspect Types.
Aspect Type is a reusable template that defines the schema for a set of metadata fields. Think of an Aspect Type as a structure for the kind of metadata that is organized in the catalog within the Entry.
Examples:
- projects/dataplex-types/locations/global/aspectTypes/analytics-hub-exchange
- projects/dataplex-types/locations/global/aspectTypes/analytics-hub
- projects/dataplex-types/locations/global/aspectTypes/analytics-hub-listing
- projects/dataplex-types/locations/global/aspectTypes/bigquery-connection
- projects/dataplex-types/locations/global/aspectTypes/bigquery-data-policy
- projects/dataplex-types/locations/global/aspectTypes/bigquery-dataset
- projects/dataplex-types/locations/global/aspectTypes/bigquery-model
- projects/dataplex-types/locations/global/aspectTypes/bigquery-policy
- projects/dataplex-types/locations/global/aspectTypes/bigquery-routine
- projects/dataplex-types/locations/global/aspectTypes/bigquery-row-access-policy
- projects/dataplex-types/locations/global/aspectTypes/bigquery-table
- projects/dataplex-types/locations/global/aspectTypes/bigquery-view
- projects/dataplex-types/locations/global/aspectTypes/cloud-bigtable-instance
- projects/dataplex-types/locations/global/aspectTypes/cloud-bigtable-table
- projects/dataplex-types/locations/global/aspectTypes/cloud-spanner-database
- projects/dataplex-types/locations/global/aspectTypes/cloud-spanner-instance
- projects/dataplex-types/locations/global/aspectTypes/cloud-spanner-table
- projects/dataplex-types/locations/global/aspectTypes/cloud-spanner-view
- projects/dataplex-types/locations/global/aspectTypes/cloudsql-database
- projects/dataplex-types/locations/global/aspectTypes/cloudsql-instance
- projects/dataplex-types/locations/global/aspectTypes/cloudsql-schema
- projects/dataplex-types/locations/global/aspectTypes/cloudsql-table
- projects/dataplex-types/locations/global/aspectTypes/cloudsql-view
- projects/dataplex-types/locations/global/aspectTypes/contacts
- projects/dataplex-types/locations/global/aspectTypes/dataform-code-asset
- projects/dataplex-types/locations/global/aspectTypes/dataform-repository
- projects/dataplex-types/locations/global/aspectTypes/dataform-workspace
- projects/dataplex-types/locations/global/aspectTypes/dataproc-metastore-database
- projects/dataplex-types/locations/global/aspectTypes/dataproc-metastore-service
- projects/dataplex-types/locations/global/aspectTypes/dataproc-metastore-table
- projects/dataplex-types/locations/global/aspectTypes/data-product
- projects/dataplex-types/locations/global/aspectTypes/data-quality-scorecard
- projects/dataplex-types/locations/global/aspectTypes/external-connection
- projects/dataplex-types/locations/global/aspectTypes/overview
- projects/dataplex-types/locations/global/aspectTypes/pubsub-topic
- projects/dataplex-types/locations/global/aspectTypes/schema
- projects/dataplex-types/locations/global/aspectTypes/sensitive-data-protection-job-result
- projects/dataplex-types/locations/global/aspectTypes/sensitive-data-protection-profile
- projects/dataplex-types/locations/global/aspectTypes/sql-access
- projects/dataplex-types/locations/global/aspectTypes/storage-bucket
- projects/dataplex-types/locations/global/aspectTypes/storage-folder
- projects/dataplex-types/locations/global/aspectTypes/storage
- projects/dataplex-types/locations/global/aspectTypes/usage
Every Entry must conform to an Entry Type. The Entry Type acts as a template, defining the structure, required aspects, and constraints for Entries of that type.
Examples:
- projects/dataplex-types/locations/global/entryTypes/analytics-hub-exchange
- projects/dataplex-types/locations/global/entryTypes/analytics-hub-listing
- projects/dataplex-types/locations/global/entryTypes/bigquery-connection
- projects/dataplex-types/locations/global/entryTypes/bigquery-data-policy
- projects/dataplex-types/locations/global/entryTypes/bigquery-dataset
- projects/dataplex-types/locations/global/entryTypes/bigquery-model
- projects/dataplex-types/locations/global/entryTypes/bigquery-routine
- projects/dataplex-types/locations/global/entryTypes/bigquery-row-access-policy
- projects/dataplex-types/locations/global/entryTypes/bigquery-table
- projects/dataplex-types/locations/global/entryTypes/bigquery-view
- projects/dataplex-types/locations/global/entryTypes/cloud-bigtable-instance
- projects/dataplex-types/locations/global/entryTypes/cloud-bigtable-table
- projects/dataplex-types/locations/global/entryTypes/cloud-spanner-database
- projects/dataplex-types/locations/global/entryTypes/cloud-spanner-instance
- projects/dataplex-types/locations/global/entryTypes/cloud-spanner-table
- projects/dataplex-types/locations/global/entryTypes/cloud-spanner-view
- projects/dataplex-types/locations/global/entryTypes/cloudsql-mysql-database
- projects/dataplex-types/locations/global/entryTypes/cloudsql-mysql-instance
- projects/dataplex-types/locations/global/entryTypes/cloudsql-mysql-table
- projects/dataplex-types/locations/global/entryTypes/cloudsql-mysql-view
- projects/dataplex-types/locations/global/entryTypes/cloudsql-postgresql-database
- projects/dataplex-types/locations/global/entryTypes/cloudsql-postgresql-instance
- projects/dataplex-types/locations/global/entryTypes/cloudsql-postgresql-schema
- projects/dataplex-types/locations/global/entryTypes/cloudsql-postgresql-table
- projects/dataplex-types/locations/global/entryTypes/cloudsql-postgresql-view
- projects/dataplex-types/locations/global/entryTypes/cloudsql-sqlserver-database
- projects/dataplex-types/locations/global/entryTypes/cloudsql-sqlserver-instance
- projects/dataplex-types/locations/global/entryTypes/cloudsql-sqlserver-schema
- projects/dataplex-types/locations/global/entryTypes/cloudsql-sqlserver-table
- projects/dataplex-types/locations/global/entryTypes/cloudsql-sqlserver-view
- projects/dataplex-types/locations/global/entryTypes/dataform-code-asset
- projects/dataplex-types/locations/global/entryTypes/dataform-repository
- projects/dataplex-types/locations/global/entryTypes/dataform-workspace
- projects/dataplex-types/locations/global/entryTypes/dataproc-metastore-database
- projects/dataplex-types/locations/global/entryTypes/dataproc-metastore-service
- projects/dataplex-types/locations/global/entryTypes/dataproc-metastore-table
- projects/dataplex-types/locations/global/entryTypes/pubsub-topic
- projects/dataplex-types/locations/global/entryTypes/storage-bucket
- projects/dataplex-types/locations/global/entryTypes/storage-folder
- projects/dataplex-types/locations/global/entryTypes/vertexai-dataset
- projects/dataplex-types/locations/global/entryTypes/vertexai-feature-group
- projects/dataplex-types/locations/global/entryTypes/vertexai-feature-online-store
Entries are organized within Entry Groups, which are logical groupings of Entries. An Entry Group acts as a namespace for its Entries.
Entries can be linked together using EntryLinks to represent relationships between data assets (e.g. foreign keys).
- Do not try to search within search results on your own.
- Do not fetch multiple pages of results unless explicitly asked.
In its simplest form, a search query consists of a single predicate. Such a predicate can match several pieces of metadata:
- A substring of a name, display name, or description of a resource
- A substring of the type of a resource
- A substring of a column name (or nested column name) in the schema of a resource
- A substring of a project ID
- A string from an overview description
For example, the predicate foo matches the following resources:
- Resource with the name foo.bar
- Resource with the display name Foo Bar
- Resource with the description This is the foo script
- Resource with the exact type foo
- Column foo_bar in the schema of a resource
- Nested column foo_bar in the schema of a resource
- Project prod-foo-bar
- Resource with an overview containing the word foo
You can qualify a predicate by prefixing it with a key that restricts the matching to a specific piece of metadata:
- An equal sign (=) restricts the search to an exact match.
- A colon (:) after the key matches the predicate to either a substring or a token within the value in the search results.
Tokenization splits the stream of text into a series of tokens, with each token usually corresponding to a single word. For example:
- name:foo selects resources with names that contain the foo substring, like foo1 and barfoo.
- description:foo selects resources with the foo token in the description, like bar and foo.
- location=foo matches resources in a specified location with foo as the location name.
The predicate keys type, system, location, and orgid support only the exact match (=) qualifier, not the substring qualifier (:). For example, type=foo or orgid=number.
Search syntax supports the following qualifiers:
- "name:x" - Matches x as a substring of the resource ID.
- "displayname:x" - Match x as a substring of the resource display name.
- "column:x" - Matches x as a substring of the column name (or nested column name) in the schema of the resource.
- "description:x" - Matches x as a token in the resource description.
- "label:bar" - Matches BigQuery resources that have a label (with some value) and the label key has bar as a substring.
- "label=bar" - Matches BigQuery resources that have a label (with some value) and the label key equals bar as a string.
- "label:bar:x" - Matches x as a substring in the value of a label with a key bar attached to a BigQuery resource.
- "label=foo:bar" - Matches BigQuery resources where the key equals foo and the key value equals bar.
- "label.foo=bar" - Matches BigQuery resources where the key equals foo and the key value equals bar.
- "label.foo" - Matches BigQuery resources that have a label whose key equals foo as a string.
- "type=TYPE" - Matches resources of a specific entry type or its type alias.
- "projectid:bar" - Matches resources within Google Cloud projects that match bar as a substring in the ID.
- "parent:x" - Matches x as a substring of the hierarchical path of a resource. It supports same syntax as
namepredicate. - "orgid=number" - Matches resources within a Google Cloud organization with the exact ID value of the number.
- "system=SYSTEM" - Matches resources from a specified system. For example, system=bigquery matches BigQuery resources.
- "location=LOCATION" - Matches resources in a specified location with an exact name. For example, location=us-central1 matches assets hosted in Iowa. BigQuery Omni assets support this qualifier by using the BigQuery Omni location name. For example, location=aws-us-east-1 matches BigQuery Omni assets in Northern Virginia.
- "createtime" - Finds resources that were created within, before, or after a given date or time. For example "createtime:2019-01-01" matches resources created on 2019-01-01.
- "updatetime" - Finds resources that were updated within, before, or after a given date or time. For example "updatetime>2019-01-01" matches resources updated after 2019-01-01.
To search for entries based on their attached aspects, use the following query syntax.
aspect:x Matches x as a substring of the full path to the aspect type of an aspect that is attached to the entry, in the format projectid.location.ASPECT_TYPE_ID aspect=x Matches x as the full path to the aspect type of an aspect that is attached to the entry, in the format projectid.location.ASPECT_TYPE_ID aspect:xOPERATORvalue Searches for aspect field values. Matches x as a substring of the full path to the aspect type and field name of an aspect that is attached to the entry, in the format projectid.location.ASPECT_TYPE_ID.FIELD_NAME
The list of supported {OPERATOR}s depends on the type of field in the aspect, as follows:
- String: = (exact match) and : (substring)
- All number types: =, :, <, >, <=, >=, =>, =<
- Enum: =
- Datetime: same as for numbers, but the values to compare are treated as datetimes instead of numbers
- Boolean: =
Only top-level fields of the aspect are searchable. For example, all of the following queries match entries where the value of the is-enrolled field in the employee-info aspect type is true. Other entries that match on the substring are also returned.
- aspect:example-project.us-central1.employee-info.is-enrolled=true
- aspect:example-project.us-central1.employee=true
- aspect:employee=true
Example:- You can use following filters
- dataplex-types.global.bigquery-table.type={BIGLAKE_TABLE, BIGLAKE_OBJECT_TABLE, EXTERNAL_TABLE, TABLE}
- dataplex-types.global.storage.type={STRUCTURED, UNSTRUCTURED}
A query can consist of several predicates with logical operators. If you don't specify an operator, logical AND is implied. For example, foo bar returns resources that match both predicate foo and predicate bar. Logical AND and logical OR are supported. For example, foo OR bar.
You can negate a predicate with a - (hyphen) or NOT prefix. For example, -name:foo returns resources with names that don't match the predicate foo.
Logical operators are case-sensitive. OR and AND are acceptable whereas or and and are not.
- Always try to rewrite the prompt using search syntax.
-
If there are multiple search results found
- Present the list of search results
- Format the output in nested ordered list, for example: Given
{ results: [ { name: "projects/test-project/locations/us/entryGroups/@bigquery-aws-us-east-1/entries/users" entrySource: { displayName: "Users" description: "Table contains list of users." location: "aws-us-east-1" system: "BigQuery" } }, { name: "projects/another_project/locations/us-central1/entryGroups/@bigquery/entries/top_customers" entrySource: { displayName: "Top customers", description: "Table contains list of best customers." location: "us-central1" system: "BigQuery" } }, ] }Return output formatted as markdown nested list:
* Users: - projectId: test_project - location: aws-us-east-1 - description: Table contains list of users. * Top customers: - projectId: another_project - location: us-central1 - description: Table contains list of best customers.- Ask to select one of the presented search results
-
If there is only one search result found
- Present the search result immediately.
-
If there are no search result found
- Explain that no search result was found
- Suggest to provide a more specific search query.
- Always try to limit the size of the response by specifying
aspect_typesparameter. Make sure to include to select view=CUSTOM when using aspect_types parameter. If you do not know the name of the aspect type, use thedataplex_search_aspect_typestool. - If you do not know the name of the entry, use
dataplex_search_entriestool
- Unless asked for a specific aspect, respond with all aspects attached to the entry.