@@ -72,17 +72,40 @@ When `format = "csv"`, you can specify CSV-specific options:
7272[output.csv]
7373delimiter = "," # Field delimiter (default: ",")
7474include_header = true # Include column headers (default: true)
75+ ipv4_bucket_size = 16 # Bucket prefix length for IPv4 (default: 16)
76+ ipv6_bucket_size = 16 # Bucket prefix length for IPv6 (default: 16)
77+ ipv6_bucket_type = "string" # IPv6 bucket value type: "string" or "int" (default: "string")
7578```
7679
80+ | Option | Description | Default |
81+ | ------------------ | -------------------------------------------------------------------------- | -------- |
82+ | `delimiter` | Field delimiter character | "," |
83+ | `include_header` | Include column headers in output | true |
84+ | `ipv4_bucket_size` | Prefix length for IPv4 buckets (1-32; applies only when a `network_bucket` column is used) | 16 |
85+ | `ipv6_bucket_size` | Prefix length for IPv6 buckets (1-60; applies only when a `network_bucket` column is used) | 16 |
86+ | `ipv6_bucket_type` | IPv6 bucket value type: "string" (hex) or "int" (first 60 bits as integer) | "string" |
87+
7788#### Parquet Options
7889
8090When `format = "parquet"`, you can specify Parquet-specific options:
8091
8192```toml
8293[output.parquet]
83- compression = "snappy" # Compression: "none", "snappy", "gzip", "lz4", "zstd" (default: "snappy")
94+ compression = "snappy" # Compression: "none", "snappy", "gzip", "lz4", "zstd" (default: "snappy")
95+ row_group_size = 500000 # Rows per row group (default: 500000)
96+ ipv4_bucket_size = 16 # Bucket prefix length for IPv4 (default: 16)
97+ ipv6_bucket_size = 16 # Bucket prefix length for IPv6 (default: 16)
98+ ipv6_bucket_type = "string" # IPv6 bucket value type: "string" or "int" (default: "string")
8499```
85100
101+ | Option | Description | Default |
102+ | ------------------ | -------------------------------------------------------------------------- | -------- |
103+ | `compression` | Compression codec: "none", "snappy", "gzip", "lz4", "zstd" | "snappy" |
104+ | `row_group_size` | Number of rows per row group | 500000 |
105+ | `ipv4_bucket_size` | Prefix length for IPv4 buckets (1-32; applies only when a `network_bucket` column is used) | 16 |
106+ | `ipv6_bucket_size` | Prefix length for IPv6 buckets (1-60; applies only when a `network_bucket` column is used) | 16 |
107+ | `ipv6_bucket_type` | IPv6 bucket value type: "string" (hex) or "int" (first 60 bits as integer) | "string" |
108+
86109#### MMDB Options
87110
88111When `format = "mmdb"`, you can specify MMDB-specific options:
@@ -121,6 +144,33 @@ ipv6_file = "merged_ipv6.parquet"
121144
122145When splitting output, both `ipv4_file` and `ipv6_file` must be configured.
123146
147+ #### IPv6 Bucket Type Options
148+
149+ IPv6 buckets can be stored as either hex strings (default) or int64 values:
150+
151+ **String type (default):**
152+
153+ - Format: 32-character hex string (e.g., "20010db8000000000000000000000000")
154+ - Storage: 32 bytes per value
155+
156+ **Int type (`ipv6_bucket_type = "int"`):**
157+
158+ - Format: First 60 bits of the bucket address as int64
159+ - Storage: 8 bytes per value (4x smaller than string)
160+
161+ We use 60 bits (not 64) because 60-bit values always fit in a positive int64,
162+ which simplifies queries by avoiding two's complement handling.
163+
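The arithmetic behind this choice can be checked with a short Python sketch. The helper `top_bits` is a hypothetical name used only for illustration; it is not part of the tool.

```python
import ipaddress

# Largest value a signed 64-bit integer can hold.
INT64_MAX = 2**63 - 1


def top_bits(address: str, bits: int) -> int:
    """Return the first `bits` bits of an IPv6 address as an unsigned integer.

    Hypothetical helper for illustration only.
    """
    return int(ipaddress.IPv6Address(address)) >> (128 - bits)


# An address whose highest bit is set.
first_64 = top_bits("ff00::", 64)
first_60 = top_bits("ff00::", 60)

print(first_64 > INT64_MAX)   # True: 64 bits would overflow a signed int64
print(first_60 <= INT64_MAX)  # True: 60 bits always stay positive

# Even the largest possible 60-bit prefix fits with room to spare.
print(top_bits("ffff:ffff:ffff:ffff::", 60) == 2**60 - 1)  # True
```

Because the largest 60-bit value is `2**60 - 1`, range comparisons on the stored integer never have to account for negative numbers.
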
164+ **When to use each type:**
165+
166+ - Use **string** (default) for databases where hex string representations are
167+   simpler to work with.
168+ - Use **int** to reduce storage cost, at the price of more complicated queries.
169+
170+ We do not provide a `bytes` type for the IPv6 bucket, primarily because there
171+ has been no need for one so far. For example, BigQuery cannot cluster on
172+ `bytes`, so it would not be helpful there.
173+
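As a rough illustration of the two formats described in this section, the following Python sketch derives both values for one address. It assumes the bucket address is the input address with every bit after the bucket prefix set to zero. The function name `ipv6_bucket` and its parameters are hypothetical and are not the tool's API.

```python
import ipaddress


def ipv6_bucket(address: str, prefix_len: int = 16, bucket_type: str = "string"):
    """Return an illustrative IPv6 bucket value (hypothetical helper).

    Assumption: the bucket address is the address masked to `prefix_len` bits.
    """
    if not 1 <= prefix_len <= 60:
        raise ValueError("prefix_len must be between 1 and 60")

    addr_int = int(ipaddress.IPv6Address(address))
    # Keep the first prefix_len bits and zero the rest.
    mask = ((1 << prefix_len) - 1) << (128 - prefix_len)
    bucket_start = addr_int & mask

    if bucket_type == "string":
        # 128 bits rendered as 32 zero-padded hex characters.
        return format(bucket_start, "032x")
    if bucket_type == "int":
        # First 60 bits of the bucket address, which always fit in an int64.
        return bucket_start >> (128 - 60)
    raise ValueError('bucket_type must be "string" or "int"')


print(ipv6_bucket("2001:db8:abcd::1", 32, "string"))
# 20010db8000000000000000000000000
print(hex(ipv6_bucket("2001:db8:abcd::1", 32, "int")))
# 0x20010db80000000
```

With the default prefix length of 16, the same address would fall into the bucket whose hex string starts with `2001` followed by zeros.
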
124174### Network Columns
125175
126176Network columns define how IP network information is output. These columns
@@ -134,11 +184,14 @@ type = "cidr" # Output type
134184
135185**Available types:**
136186
137- - `cidr` - CIDR notation (e.g., "203.0.113.0/24")
138- - `start_ip` - Starting IP address (e.g., "203.0.113.0")
139- - `end_ip` - Ending IP address (e.g., "203.0.113.255")
140- - `start_int` - Starting IP as integer
141- - `end_int` - Ending IP as integer
187+ | Type | Description |
188+ | ---------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
189+ | `cidr` | CIDR notation (e.g., "203.0.113.0/24") |
190+ | `start_ip` | Starting IP address (e.g., "203.0.113.0") |
191+ | `end_ip` | Ending IP address (e.g., "203.0.113.255") |
192+ | `start_int` | Starting IP as integer |
193+ | `end_int` | Ending IP as integer |
194+ | `network_bucket` | Bucket for efficient lookups. IPv4: integer. IPv6: hex string (default) or integer (with `ipv6_bucket_type = "int"`). Requires split files (CSV and Parquet only). |
142195
143196**Default behavior:** If no `[[network.columns]]` sections are defined:
144197