articles/hdinsight-aks/trino/trino-sharded-sql-connector.md
Lines changed: 8 additions & 8 deletions
@@ -12,14 +12,14 @@ The sharded SQL connector allows queries to be executed over data distributed ac
## Prerequisites

-To connect to sharded SQL servers, you need the following:
+To connect to sharded SQL servers, you need:

- SQL Server 2012 or higher, or Azure SQL Database.
- Network access from the Trino coordinator and workers to SQL Server. Port 1433 is the default port.

### General configuration

-The connector can query multiple SQL servers as a single data source. Create a catalog properties file and use `connector.name=sharded-sql` to use sharded SQL connector.
+The connector can query multiple SQL servers as a single data source. Create a catalog properties file and use `connector.name=sharded-sql` to use sharded SQL connector.
The connector uses user-password authentication to query SQL servers. The same user specified in the configuration is expected to authenticate against all the SQL servers.
-## Schema Definition
+## Schema definition

Connector assumes a 2D partition/bucketed layout of the physical data across SQL servers. Schema definition describes this layout.
Currently, only file based sharding schema definition is supported.
@@ -57,13 +57,13 @@ The following JSON file describes the configuration for a Trino sharded SQL conn
- **tables**: An array of objects, each representing a table in the database. Each table object contains:
  - **schema**: The schema name of the table, which corresponds to the database in the SQL server.
  - **name**: The name of the table.
-  - **sharding_schema**: The name of the sharding schema associated with the table, this acts as a reference to the `sharding_schema` described in the next steps.
+  - **sharding_schema**: The name of the sharding schema associated with the table, which acts as a reference to the `sharding_schema` described in the next steps.

- **sharding_schema**: An array of objects, each representing a sharding schema. Each sharding schema object contains:
  - **name**: The name of the sharding schema.
  - **partitioned_by**: An array containing one or more columns by which the sharding schema is partitioned.
  - **bucket_count(optional)**: An integer representing the total number of buckets the table is distributed, which defaults to 1.
-  - **bucketed_by(optional)**: An array containing one or more columns by which the data is bucketed, note the partitioning and bucketing are hierarchical, i.e each partition is bucketed.
+  - **bucketed_by(optional)**: An array containing one or more columns by which the data is bucketed, note the partitioning and bucketing are hierarchical, which means each partition is bucketed.
  - **partition_map**: An array of objects, each representing a partition within the sharding schema. Each partition object contains:
    - **partition**: The partition value specified in the form `partition-key=partitionvalue`
    - **shards**: An array of objects, each representing a shard within the partition, each element of the array represents a replica, trino queries any one of them at random to fetch data for a partition/buckets. Each shard object contains:
@@ -137,13 +137,13 @@ This example describes:
- Shards are an array of `connectionUrl`. Each member of the array represents a replicaSet. During query execution, Trino selects a shard randomly from the array to query data.


-### Partition and Bucket Pruning
+### Partition and bucket pruning

Connector evaluates the query constraints during the planning and performs based on the provided query predicates. This helps speed-up query performance, and allows connector to query large amounts of data.

Bucketing formula to determine assignments using murmur hash function implementation described [here](https://commons.apache.org/proper/commons-codec/apidocs/src-html/org/apache/commons/codec/digest/MurmurHash3.html#line.388).

-### Type Mapping
+### Type mapping

Sharded SQL connector supports the same type mappings as SQL server connector [type mappings](https://trino.io/docs/current/connector/sqlserver.html#type-mapping).
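
Reviewer note: the partition pruning and random replica selection described in the changed file can be sketched roughly as follows. This is a minimal Python illustration and not the connector's implementation. The dictionary mirrors the documented field names (`partitioned_by`, `partition_map`, `partition`, `shards`); modelling each shard as an object with a `connectionUrl` key is an assumption, since the shard fields are cut off in this diff. The schema name, hosts, databases, and the helpers `prune_partitions` and `pick_replica` are all hypothetical. Bucket pruning is left out because the exact bucketing formula is not shown here.

```python
import random

# Hypothetical sharding schema: field names follow the documented description,
# but every value (names, hosts, databases) is invented for illustration.
SHARDING_SCHEMA = {
    "name": "orders_by_region",
    "partitioned_by": ["region"],
    "partition_map": [
        {
            "partition": "region=eu",
            "shards": [
                {"connectionUrl": "jdbc:sqlserver://eu-1.example.com:1433;database=orders"},
                {"connectionUrl": "jdbc:sqlserver://eu-2.example.com:1433;database=orders"},
            ],
        },
        {
            "partition": "region=us",
            "shards": [
                {"connectionUrl": "jdbc:sqlserver://us-1.example.com:1433;database=orders"},
            ],
        },
    ],
}


def prune_partitions(schema, equality_predicates):
    """Return the partition_map entries that can satisfy the predicates.

    equality_predicates maps a column name to the set of values the query
    allows for it; a column with no predicate does not prune anything.
    """
    kept = []
    for entry in schema["partition_map"]:
        # Entries use the documented `partition-key=partitionvalue` form.
        key, _, value = entry["partition"].partition("=")
        allowed = equality_predicates.get(key)
        if allowed is None or value in allowed:
            kept.append(entry)
    return kept


def pick_replica(entry, rng=random):
    """Choose one replica of a partition at random (assumed shard shape)."""
    return rng.choice(entry["shards"])["connectionUrl"]


if __name__ == "__main__":
    # A predicate such as WHERE region = 'eu' leaves one partition to query.
    for entry in prune_partitions(SHARDING_SCHEMA, {"region": {"eu"}}):
        print(entry["partition"], "->", pick_replica(entry))
```

With a predicate on `region`, only the matching partition's servers are contacted; without one, every partition in `partition_map` is kept and one replica is picked for each.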