
Commit 4124f80

istreeter authored and benjben committed
Cluster by event_name when creating new table (#402)
Clustering is a BigQuery table feature in which the data in storage blocks is sorted by one or more user-defined columns. On the advice of our Analytics Engineers, `event_name` is used as the default clustering column, because many typical queries filter on the event name. Snowplow users can change the clustering column at any time if they find that a different column better suits their own query patterns.
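To illustrate why sorting storage blocks by `event_name` helps queries that filter on the event name, here is a toy Scala model. It is an assumption for illustration only, not how BigQuery's storage engine actually works: rows are cut into fixed-size blocks, each block is summarized by the min and max value of a column, and a filter `column = value` skips any block whose range cannot contain the value. The `Event` class, the `app-a`/`app-b` values and all helper names are hypothetical.

```scala
// Toy model of clustering and block pruning. This is an illustrative
// assumption, not BigQuery's real storage engine: rows are cut into
// fixed-size "blocks", and a filter `column == value` skips every block whose
// min/max range for that column cannot contain `value`.
object ClusteringToyModel {

  // Hypothetical event row carrying two Snowplow-style columns.
  final case class Event(eventName: String, appId: String)

  type Block = Vector[Event]

  val byEventName: Event => String = _.eventName
  val byAppId: Event => String = _.appId

  // Cut rows into blocks; when a clustering column is given, sort by it first,
  // so rows with equal values land in the same or adjacent blocks.
  def toBlocks(rows: Vector[Event], blockSize: Int, clusterBy: Option[Event => String]): Vector[Block] = {
    val ordered = clusterBy.fold(rows)(column => rows.sortBy(column))
    ordered.grouped(blockSize).toVector
  }

  // Count the blocks a filter `column == value` must still read after
  // pruning on each block's min/max for that column.
  def blocksScanned(blocks: Vector[Block], column: Event => String, value: String): Int =
    blocks.count { block =>
      val values = block.map(column)
      values.min.compareTo(value) <= 0 && value.compareTo(values.max) <= 0
    }

  // 12 rows cycling through three event names, so every unsorted block of 3
  // holds one row of each name.
  val names = Vector("link_click", "page_ping", "page_view")
  val rows: Vector[Event] =
    Vector.tabulate(12)(i => Event(names(i % 3), if (i % 2 == 0) "app-a" else "app-b"))

  val unclustered: Vector[Block] = toBlocks(rows, blockSize = 3, clusterBy = None)
  val clustered: Vector[Block] = toBlocks(rows, blockSize = 3, clusterBy = Some(byEventName))

  def main(args: Array[String]): Unit = {
    println(s"event_name = page_view, unclustered: ${blocksScanned(unclustered, byEventName, "page_view")} of ${unclustered.size} blocks")
    println(s"event_name = page_view, clustered: ${blocksScanned(clustered, byEventName, "page_view")} of ${clustered.size} blocks")
    println(s"app_id = app-a, clustered by event_name: ${blocksScanned(clustered, byAppId, "app-a")} of ${clustered.size} blocks")
  }
}
```

In this toy, a filter on `event_name = 'page_view'` reads all 4 blocks of the unsorted table but only 2 of the 4 once the rows are sorted by event name, while a filter on the app ID still reads all 4 blocks of the event-name-clustered table. That second case mirrors the point above: users whose queries mostly filter on some other column may prefer to cluster by that column instead.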
1 parent 4fe4467 commit 4124f80

1 file changed: 7 additions, 0 deletions

modules/core/src/main/scala/com.snowplowanalytics.snowplow.bigquery/processing/TableManager.scala

@@ -17,6 +17,7 @@ import org.typelevel.log4cats.slf4j.Slf4jLogger
 import com.google.cloud.bigquery.{
   BigQuery,
   BigQueryOptions,
+  Clustering,
   FieldList,
   Schema,
   StandardTableDefinition,
@@ -201,6 +202,12 @@ object TableManager {
           .setField("load_tstamp")
           .build()
       }
+      .setClustering {
+        Clustering
+          .newBuilder()
+          .setFields(List("event_name").asJava)
+          .build()
+      }
       .build()
     TableInfo.of(BigQueryUtils.tableIdOf(config), tableDefinition)
   }
