@@ -345,8 +345,9 @@ ipv4_file = "geoip-v4.parquet"
 ipv6_file = "geoip-v6.parquet"

 [output.parquet]
-ipv4_bucket_size = 16 # Default: /16 prefix
-ipv6_bucket_size = 16 # Default: /16 prefix
+ipv4_bucket_size = 16 # Default: /16 prefix
+ipv6_bucket_size = 16 # Default: /16 prefix
+ipv6_bucket_type = "string" # Default: "string" (hex), or "int" (60-bit integer)

 [[network.columns]]
 name = "start_int"
@@ -386,7 +387,9 @@ AND NET.IPV4_TO_INT64(NET.IP_FROM_STRING('203.0.113.100')) BETWEEN start_int AND

 #### IPv6 Lookup

-For IPv6, the bucket is a hex string. Use `TO_HEX()` with `NET.IP_TRUNC()`:
+The query depends on your `ipv6_bucket_type` configuration.
+
+**Using default `ipv6_bucket_type = "string"` (hex string):**

 ```sql
 -- Using default ipv6_bucket_size = 16
@@ -396,6 +399,21 @@ WHERE network_bucket = TO_HEX(NET.IP_TRUNC(NET.IP_FROM_STRING('2001:db8::1'), 16
 AND NET.IP_FROM_STRING('2001:db8::1') BETWEEN start_int AND end_int;
 ```

+**Using `ipv6_bucket_type = "int"` (60-bit int64):**
+
+```sql
+-- Using default ipv6_bucket_size = 16
+SELECT *
+FROM `project.dataset.geoip_v6`
+WHERE network_bucket = CAST(CONCAT('0x', SUBSTR(
+  TO_HEX(NET.IP_TRUNC(NET.IP_FROM_STRING('2001:db8::1'), 16)), 1, 15
+)) AS INT64)
+AND NET.IP_FROM_STRING('2001:db8::1') BETWEEN start_int AND end_int;
+```
+
+The int type expression extracts the first 60 bits (15 hex chars) of the
+truncated IPv6 address as an integer.
+
 ### Why Bucketing Helps

 Without bucketing, BigQuery must scan every row to check the range condition.
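
The `"int"` bucket expression in the IPv6 lookup can be cross-checked outside BigQuery. The following is a minimal Python sketch, not the tool's actual implementation: the function name `ipv6_int_bucket` is hypothetical, and it assumes the bucket is the address truncated to `ipv6_bucket_size` bits with only the top 60 bits kept, as the README text describes.

```python
import ipaddress


def ipv6_int_bucket(address: str, bucket_size: int = 16) -> int:
    """Hypothetical helper: top 60 bits of the address truncated to bucket_size bits."""
    # strict=False zeroes the host bits, mirroring NET.IP_TRUNC(addr, bucket_size).
    network = ipaddress.ip_network(f"{address}/{bucket_size}", strict=False)
    truncated = int(network.network_address)  # 128-bit integer
    # Dropping the low 68 bits keeps the first 60 bits (the first 15 hex chars).
    return truncated >> (128 - 60)


bucket = ipv6_int_bucket("2001:db8::1")
print(hex(bucket))  # 0x200100000000000
```

Because the result never exceeds 2^60 - 1, it always fits in a signed 64-bit integer without becoming negative.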
@@ -422,13 +440,32 @@ Both rows have the same `start_int`/`end_int` (the full /15 range), but
 different `network_bucket` values (2.0.0.0 = 33554432, 2.1.0.0 = 33619968).
 Queries for IPs in either bucket will find the network.

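
The two bucket values quoted above can be reproduced with a short Python sketch. It assumes the network in question is `2.0.0.0/15` and that a network wider than the bucket size gets one row per bucket-sized subnet; the helper name `ipv4_buckets` is hypothetical and not part of the tool.

```python
import ipaddress


def ipv4_buckets(cidr: str, bucket_size: int = 16) -> list[int]:
    """Hypothetical helper: integer bucket values that a network would be stored under."""
    network = ipaddress.ip_network(cidr)
    if network.prefixlen >= bucket_size:
        # The network fits inside one bucket: truncate its start address.
        bucket = ipaddress.ip_network(
            f"{network.network_address}/{bucket_size}", strict=False
        )
        return [int(bucket.network_address)]
    # The network is wider than one bucket: one entry per bucket-sized subnet.
    return [int(sub.network_address) for sub in network.subnets(new_prefix=bucket_size)]


print(ipv4_buckets("2.0.0.0/15"))  # [33554432, 33619968]
```

A network narrower than the bucket size, such as a /24, yields a single bucket value.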
-### Why Bucket is a Hex String
+### IPv6 Bucket Type Options
+
+IPv6 buckets can be stored as either hex strings (default) or int64 values:
+
+**String type (default):**
+
+- Format: 32-character hex string (e.g., "20010db8000000000000000000000000")
+- Storage: 32 bytes per value
+
+**Int type (`ipv6_bucket_type = "int"`):**
+
+- Format: First 60 bits of the bucket address as int64
+- Storage: 8 bytes per value (4x smaller than string)
+
+We use 60 bits (not 64) because 60-bit values always fit in a positive int64,
+which simplifies BigQuery queries by avoiding two's complement handling.
+
+**When to use each type:**
+
+- Use **string** (default) for databases where hex string representations are
+  simpler to work with.
+- Use **int** for reduced storage cost at the price of more complicated queries.

-BigQuery cannot cluster on the `bytes` type, so we can't use the same type as we
-do for `start_int` and `end_int`. Using `int` to include the prefix or using
-`bignumeric` would be an option, but both are more complicated to query with.
-Another reason to use a hex string is Snowflake's `PARSE_IP()` function provides
-the address in this format.
+We do not provide a `bytes` type for the IPv6 bucket, primarily because there
+has not been a need for one so far. For example, BigQuery cannot cluster on
+`bytes`, so it would not help there.

 ## Common Query Patterns
