| page_title | subcategory | description |
|---|---|---|
| airbyte_source_azure_blob_storage Resource - terraform-provider-airbyte |  | SourceAzureBlobStorage Resource |
SourceAzureBlobStorage Resource
resource "airbyte_source_azure_blob_storage" "my_source_azureblobstorage" {
  configuration = {
    azure_blob_storage_account_name = "airbyte5storage"
    azure_blob_storage_container_name = "airbytetescontainername"
    azure_blob_storage_endpoint = "blob.core.windows.net"
    credentials = {
      authenticate_via_storage_account_key = {
        azure_blob_storage_account_key = "Z8ZkZpteggFx394vm+PJHnGTvdRncaYS+JhLKdj789YNmD+iyGTnG+PV+POiuYNhBg/ACS+LKjd%4FG3FHGN12Nd=="
      }
    }
    delivery_method = {
      copy_raw_files = {
        preserve_directory_structure = false
      }
    }
    # start_date uses the full UTC timestamp format described in the schema.
    start_date = "2021-01-01T00:00:00.000000Z"
    streams = [
      {
        days_to_sync_if_history_is_full = 1
        format = {
          excel_format = {
            # ...
          }
        }
        globs = [
          "..."
        ]
        input_schema = "...my_input_schema..."
        legacy_prefix = "...my_legacy_prefix..."
        name = "...my_name..."
        primary_key = "...my_primary_key..."
        recent_n_files_to_read_for_schema_discovery = 2
        schemaless = true
        use_first_found_file_for_schema_discovery = false
        validation_policy = "Wait for Discover"
      }
    ]
  }
  definition_id = "3385920f-d837-42e0-b72d-7927f28bf9f2"
  name = "...my_name..."
  secret_id = "...my_secret_id..."
  workspace_id = "2c3aeaad-c70f-44a8-a981-aca12752c864"
}
Required:
- configuration (Attributes) NOTE: When this Spec is changed, legacy_config_transformer.py must also be modified to pick up the changes, because it is responsible for converting legacy Azure Blob Storage v0 configs into v1 configs using the File-Based CDK. (see below for nested schema)
- name (String) Name of the source, e.g. dev-mysql-instance.
- workspace_id (String)
Optional:
- definition_id (String) The UUID of the connector definition. One of configuration.sourceType or definitionId must be provided. Default: "fdaaba68-4875-4ed9-8fcd-4ae1e0a25093"; Requires replacement if changed.
- secret_id (String) Optional secretID obtained through the public API OAuth redirect flow. Requires replacement if changed.
Read-Only:
- created_at (Number)
- resource_allocation (Attributes) Actor or actor definition specific resource requirements. If default is set, these are the requirements that should be set for ALL jobs run for this actor definition. It is overridden by the job type specific configurations. If not set, the platform will use defaults. These values will be overridden by configuration at the connection level. (see below for nested schema)
- source_id (String)
- source_type (String)
Nested Schema for configuration
Required:
- azure_blob_storage_account_name (String) The name of the Azure Blob Storage account.
- azure_blob_storage_container_name (String) The name of the Azure Blob Storage container.
- credentials (Attributes) Credentials for connecting to the Azure Blob Storage (see below for nested schema)
- streams (Attributes List) Each instance of this configuration defines a stream. Use this to define which files belong in the stream, their format, and how they should be parsed and validated. When sending data to a warehouse destination such as Snowflake or BigQuery, each stream is a separate table. (see below for nested schema)
Optional:
- azure_blob_storage_endpoint (String) The Azure Blob Storage endpoint domain name. Leave the default value (or leave it empty if running the container from the command line) to use the Microsoft native endpoint from the example.
- delivery_method (Attributes) (see below for nested schema)
- start_date (String) UTC date and time in the format 2017-01-25T00:00:00.000000Z. Any file modified before this date will not be replicated.
Nested Schema for configuration.credentials
Optional:
- authenticate_via_client_credentials (Attributes) (see below for nested schema)
- authenticate_via_oauth2 (Attributes) (see below for nested schema)
- authenticate_via_storage_account_key (Attributes) (see below for nested schema)
Nested Schema for configuration.credentials.authenticate_via_client_credentials
Required:
- app_client_id (String, Sensitive) Client ID of your Microsoft developer application
- app_client_secret (String, Sensitive) Client Secret of your Microsoft developer application
- app_tenant_id (String, Sensitive) Tenant ID of the Microsoft Azure Application
Nested Schema for configuration.credentials.authenticate_via_oauth2
Required:
- client_id (String, Sensitive) Client ID of your Microsoft developer application
- client_secret (String, Sensitive) Client Secret of your Microsoft developer application
- refresh_token (String, Sensitive) Refresh Token of your Microsoft developer application
- tenant_id (String, Sensitive) Tenant ID of the Microsoft Azure Application user
Nested Schema for configuration.credentials.authenticate_via_storage_account_key
Required:
- azure_blob_storage_account_key (String, Sensitive) The Azure blob storage account key.
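The three credential options above are alternatives. As a minimal sketch, assuming only one of them is set at a time, the following resource authenticates with client credentials instead of an account key. The variable names, the resource label, and the stream name are hypothetical; only the attribute names come from the schema above.

```hcl
# Hypothetical input variables; the names are illustrative, not part of the provider.
variable "airbyte_workspace_id" {
  type = string
}

variable "azure_client_id" {
  type      = string
  sensitive = true
}

variable "azure_client_secret" {
  type      = string
  sensitive = true
}

variable "azure_tenant_id" {
  type      = string
  sensitive = true
}

resource "airbyte_source_azure_blob_storage" "client_credentials_example" {
  name         = "azure-blob-client-credentials"
  workspace_id = var.airbyte_workspace_id

  configuration = {
    azure_blob_storage_account_name   = "airbyte5storage"
    azure_blob_storage_container_name = "airbytetescontainername"

    # Assumption: exactly one credentials option is provided.
    credentials = {
      authenticate_via_client_credentials = {
        app_client_id     = var.azure_client_id
        app_client_secret = var.azure_client_secret
        app_tenant_id     = var.azure_tenant_id
      }
    }

    streams = [
      {
        name = "example_stream"
        format = {
          jsonl_format = {}
        }
      }
    ]
  }
}
```

Passing the sensitive values through variables keeps them out of the configuration files. They may still end up in Terraform state, so the state backend should be protected accordingly.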
Nested Schema for configuration.streams
Required:
- format (Attributes) The configuration options that are used to alter how to read incoming files that deviate from the standard formatting. (see below for nested schema)
- name (String) The name of the stream.
Optional:
- days_to_sync_if_history_is_full (Number) When the state history of the file store is full, syncs will only read files that were last modified in the provided day range. Default: 3
- globs (List of String) The pattern used to specify which files should be selected from the file system. See the glob pattern matching documentation for more information. Default: ["**"]
- input_schema (String) The schema that will be used to validate records extracted from the file. This will override the stream schema that is auto-detected from incoming files.
- legacy_prefix (String) The path prefix configured in v3 versions of the S3 connector. This option is deprecated in favor of a single glob.
- primary_key (String) The column or columns (for a composite key) that serve as the unique identifier of a record. If empty, the primary key will default to the parser's default primary key.
- recent_n_files_to_read_for_schema_discovery (Number) The number of recent files which will be used to discover the schema for this stream.
- schemaless (Boolean) When enabled, syncs will not validate or structure records against the stream's schema. Default: false
- use_first_found_file_for_schema_discovery (Boolean) When enabled, the source will use the first found file for schema discovery. Helps to avoid a long discovery step. Default: false
- validation_policy (String) The name of the validation policy that dictates sync behavior when a record does not adhere to the stream schema. Default: "Emit Record"; must be one of ["Emit Record", "Skip Record", "Wait for Discover"]
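To illustrate how the stream attributes above fit together, here is a rough sketch of a source with two streams, each selecting its own files with a glob. The glob patterns, stream names, resource label, and variable names are hypothetical, and the exact matching behavior of the patterns depends on the connector's glob implementation.

```hcl
# Hypothetical input variables; the names are illustrative.
variable "airbyte_workspace_id" {
  type = string
}

variable "azure_storage_account_key" {
  type      = string
  sensitive = true
}

resource "airbyte_source_azure_blob_storage" "streams_example" {
  name         = "azure-blob-two-streams"
  workspace_id = var.airbyte_workspace_id

  configuration = {
    azure_blob_storage_account_name   = "airbyte5storage"
    azure_blob_storage_container_name = "airbytetescontainername"

    credentials = {
      authenticate_via_storage_account_key = {
        azure_blob_storage_account_key = var.azure_storage_account_key
      }
    }

    streams = [
      {
        # Assumed layout: JSONL files stored under an "orders/" prefix.
        name              = "orders"
        globs             = ["orders/**/*.jsonl"]
        validation_policy = "Skip Record"
        format = {
          jsonl_format = {}
        }
      },
      {
        # Assumed layout: JSONL files stored under an "events/" prefix.
        name                            = "events"
        globs                           = ["events/*.jsonl"]
        days_to_sync_if_history_is_full = 7
        format = {
          jsonl_format = {}
        }
      }
    ]
  }
}
```

Per the streams description, a warehouse destination would receive each of these streams as a separate table.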
Nested Schema for configuration.streams.format
Optional:
- avro_format (Attributes) (see below for nested schema)
- csv_format (Attributes) (see below for nested schema)
- excel_format (Attributes) (see below for nested schema)
- jsonl_format (Attributes) (see below for nested schema)
- parquet_format (Attributes) (see below for nested schema)
- unstructured_document_format (Attributes) Extract text from document formats (.pdf, .docx, .md, .pptx) and emit as one record per file. (see below for nested schema)
Nested Schema for configuration.streams.format.avro_format
Optional:
- double_as_string (Boolean) Whether to convert double fields to strings. This is recommended if you have decimal numbers with a high degree of precision, because there can be a loss of precision when handling floating point numbers. Default: false
Nested Schema for configuration.streams.format.csv_format
Optional:
- delimiter (String) The character delimiting individual cells in the CSV data. This may only be a 1-character string. For tab-delimited data enter '\t'. Default: ","
- double_quote (Boolean) Whether two quotes in a quoted CSV value denote a single quote in the data. Default: true
- encoding (String) The character encoding of the CSV data. Leave blank to default to UTF8. See the list of Python encodings for allowable options. Default: "utf8"
- escape_char (String) The character used for escaping special characters. To disallow escaping, leave this field blank.
- false_values (List of String) A set of case-sensitive strings that should be interpreted as false values. Default: ["n","no","f","false","off","0"]
- header_definition (Attributes) How headers will be defined. User Provided assumes the CSV does not have a header row and uses the headers provided, and Autogenerated assumes the CSV does not have a header row and the CDK will generate headers of the form f{i}, where i is the index starting from 0. Otherwise, the default behavior is to use the header from the CSV file. If a user wants to autogenerate or provide column names for a CSV that has headers, they can skip rows. (see below for nested schema)
- ignore_errors_on_fields_mismatch (Boolean) Whether to ignore errors that occur when the number of fields in the CSV does not match the number of columns in the schema. Default: false
- inference_type (String) How to infer the types of the columns. If none, inference defaults to strings. must be one of ["None", "Primitive Types Only"]
- null_values (List of String) A set of case-sensitive strings that should be interpreted as null values. For example, if the value 'NA' should be interpreted as null, enter 'NA' in this field. Default: []
- quote_char (String) The character used for quoting CSV values. To disallow quoting, make this field blank. Default: "\""
- skip_rows_after_header (Number) The number of rows to skip after the header row. Default: 0
- skip_rows_before_header (Number) The number of rows to skip before the header row. For example, if the header row is on the 3rd row, enter 2 in this field. Default: 0
- strings_can_be_null (Boolean) Whether strings can be interpreted as null values. If true, strings that match the null_values set will be interpreted as null. If false, strings that match the null_values set will be interpreted as the string itself. Default: true
- true_values (List of String) A set of case-sensitive strings that should be interpreted as true values. Default: ["y","yes","t","true","on","1"]
Nested Schema for configuration.streams.format.csv_format.header_definition
Optional:
- autogenerated (Attributes) (see below for nested schema)
- from_csv (Attributes) (see below for nested schema)
- user_provided (Attributes) (see below for nested schema)
Nested Schema for configuration.streams.format.csv_format.header_definition.user_provided
Required:
- column_names (List of String) The column names that will be used while emitting the CSV records
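The CSV options and the user_provided header definition above can be combined as in the following sketch. It assumes a semicolon-delimited file without a header row; the column names, glob pattern, resource label, and variable names are hypothetical.

```hcl
# Hypothetical input variables; the names are illustrative.
variable "airbyte_workspace_id" {
  type = string
}

variable "azure_storage_account_key" {
  type      = string
  sensitive = true
}

resource "airbyte_source_azure_blob_storage" "csv_example" {
  name         = "azure-blob-csv"
  workspace_id = var.airbyte_workspace_id

  configuration = {
    azure_blob_storage_account_name   = "airbyte5storage"
    azure_blob_storage_container_name = "airbytetescontainername"

    credentials = {
      authenticate_via_storage_account_key = {
        azure_blob_storage_account_key = var.azure_storage_account_key
      }
    }

    streams = [
      {
        name  = "customers"
        globs = ["customers/*.csv"]
        format = {
          csv_format = {
            delimiter      = ";"
            inference_type = "Primitive Types Only"
            # Treat the literal string "NA" as null.
            null_values = ["NA"]
            # The file is assumed to have no header row, so names are supplied here.
            header_definition = {
              user_provided = {
                column_names = ["id", "name", "signup_date"]
              }
            }
          }
        }
      }
    ]
  }
}
```

If the file did contain a header row, the schema description suggests combining a provided or autogenerated header with skip_rows_before_header or skip_rows_after_header so that the original header is not read as data.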
Nested Schema for configuration.streams.format.parquet_format
Optional:
- decimal_as_float (Boolean) Whether to convert decimal fields to floats. There is a loss of precision when converting decimals to floats, so this is not recommended. Default: false
Nested Schema for configuration.streams.format.unstructured_document_format
Optional:
- processing (Attributes) Processing configuration (see below for nested schema)
- skip_unprocessable_files (Boolean) If true, skip files that cannot be parsed and pass the error message along as the _ab_source_file_parse_error field. If false, fail the sync. Default: true
- strategy (String) The strategy used to parse documents. fast extracts text directly from the document, which doesn't work for all files. ocr_only is more reliable, but slower. hi_res is the most reliable, but requires an API key and a hosted instance of unstructured and can't be used with local mode. See the unstructured.io documentation for more details: https://unstructured-io.github.io/unstructured/core/partition.html#partition-pdf. Default: "auto"; must be one of ["auto", "fast", "ocr_only", "hi_res"]
Nested Schema for configuration.streams.format.unstructured_document_format.processing
Optional:
- local (Attributes) Process files locally, supporting fast and ocr modes. This is the default option. (see below for nested schema)
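As a sketch of the unstructured document options above, the following stream extracts text from PDF files using the fast strategy with local processing. The glob pattern, stream name, resource label, and variable names are hypothetical.

```hcl
# Hypothetical input variables; the names are illustrative.
variable "airbyte_workspace_id" {
  type = string
}

variable "azure_storage_account_key" {
  type      = string
  sensitive = true
}

resource "airbyte_source_azure_blob_storage" "documents_example" {
  name         = "azure-blob-documents"
  workspace_id = var.airbyte_workspace_id

  configuration = {
    azure_blob_storage_account_name   = "airbyte5storage"
    azure_blob_storage_container_name = "airbytetescontainername"

    credentials = {
      authenticate_via_storage_account_key = {
        azure_blob_storage_account_key = var.azure_storage_account_key
      }
    }

    streams = [
      {
        name  = "contracts"
        globs = ["contracts/**/*.pdf"]
        format = {
          unstructured_document_format = {
            # "fast" is one of the strategies the schema lists as usable locally.
            strategy = "fast"
            # Skip unparsable files instead of failing the sync.
            skip_unprocessable_files = true
            processing = {
              local = {}
            }
          }
        }
      }
    ]
  }
}
```

The hi_res strategy is left out here because, per the description above, it cannot be used with local mode.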
Nested Schema for configuration.delivery_method
Optional:
- copy_raw_files (Attributes) Copy raw files without parsing their contents. Bits are copied into the destination exactly as they appeared in the source. Recommended for use with unstructured text data, non-text and compressed files. (see below for nested schema)
- replicate_records (Attributes) Recommended - Extract and load structured records into your destination of choice. This is the classic method of moving data in Airbyte. It allows for blocking and hashing individual fields or files from a structured schema. Data can be flattened, typed and deduped depending on the destination. (see below for nested schema)
Nested Schema for configuration.delivery_method.copy_raw_files
Optional:
- preserve_directory_structure (Boolean) If enabled, sends subdirectory folder structure along with source file names to the destination. Otherwise, files will be synced by their names only. This option is ignored when file-based replication is not enabled. Default: true
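The example at the top of this page uses copy_raw_files. For contrast, here is a minimal sketch of the replicate_records delivery method paired with a Parquet stream. It assumes replicate_records takes no nested attributes (none are documented above); the glob pattern, stream name, resource label, and variable names are hypothetical.

```hcl
# Hypothetical input variables; the names are illustrative.
variable "airbyte_workspace_id" {
  type = string
}

variable "azure_storage_account_key" {
  type      = string
  sensitive = true
}

resource "airbyte_source_azure_blob_storage" "replicate_records_example" {
  name         = "azure-blob-parquet-records"
  workspace_id = var.airbyte_workspace_id

  configuration = {
    azure_blob_storage_account_name   = "airbyte5storage"
    azure_blob_storage_container_name = "airbytetescontainername"

    credentials = {
      authenticate_via_storage_account_key = {
        azure_blob_storage_account_key = var.azure_storage_account_key
      }
    }

    # Parse files into structured records rather than copying raw bytes.
    delivery_method = {
      replicate_records = {}
    }

    streams = [
      {
        name  = "metrics"
        globs = ["metrics/*.parquet"]
        format = {
          parquet_format = {
            # Keep decimals as decimals to avoid the precision loss noted above.
            decimal_as_float = false
          }
        }
      }
    ]
  }
}
```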
Nested Schema for resource_allocation
Read-Only:
- default (Attributes) Optional resource requirements to run workers (blank for unbounded allocations) (see below for nested schema)
- job_specific (Attributes List) (see below for nested schema)
Nested Schema for resource_allocation.default
Read-Only:
- cpu_limit (String)
- cpu_request (String)
- ephemeral_storage_limit (String)
- ephemeral_storage_request (String)
- memory_limit (String)
- memory_request (String)
Nested Schema for resource_allocation.job_specific
Read-Only:
- job_type (String) Enum that describes the different types of jobs that the platform runs.
- resource_requirements (Attributes) Optional resource requirements to run workers (blank for unbounded allocations) (see below for nested schema)
Nested Schema for resource_allocation.job_specific.resource_requirements
Read-Only:
- cpu_limit (String)
- cpu_request (String)
- ephemeral_storage_limit (String)
- ephemeral_storage_request (String)
- memory_limit (String)
- memory_request (String)
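Read-only attributes cannot be set in configuration, but they can be referenced once the resource exists. A small sketch, assuming the resource from the example at the top of this page; the output names are hypothetical.

```hcl
# Expose the generated source ID, for example to pass it to other configurations.
output "azure_blob_source_id" {
  value = airbyte_source_azure_blob_storage.my_source_azureblobstorage.source_id
}

# Expose the default resource requirements reported for this source.
output "azure_blob_source_default_resources" {
  value = airbyte_source_azure_blob_storage.my_source_azureblobstorage.resource_allocation.default
}
```

These values are only known after apply, so on a first plan Terraform shows them as not yet known.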
Import is supported using the following syntax:
In Terraform v1.5.0 and later, the import block can be used with the id attribute, for example:
import {
  to = airbyte_source_azure_blob_storage.my_airbyte_source_azure_blob_storage
  id = "..."
}
The terraform import command can be used, for example:
terraform import airbyte_source_azure_blob_storage.my_airbyte_source_azure_blob_storage "..."