70 | 70 | :::image type="content" source="media/gen2-migration/adx-log-analytics.png" alt-text="Screenshot of the Azure Data Explorer Log Analytics Workspace" lightbox="media/gen2-migration/adx-log-analytics.png"::: |
71 | 71 |
72 | 72 | 1. Data partitioning. |
73 | | - 1. For small size data, the default ADX partitioning is enough. For more complex scenario, with large datasets and right push rate custom ADX data partitioning is more appropriate. Data partitioning is beneficial for scenarios, as follows: |
74 | | - 1. Improving query latency in big data sets. |
75 | | - 1. When querying historical data. |
76 | | - 1. When ingesting out-of-order data. |
77 | | - 1. The custom data partitioning should include: |
78 | | - 1. The timestamp column, which results in time-based partitioning of extents. |
79 | | - 1. A string-based column, which corresponds to the Time Series ID with highest cardinality. |
80 | | - 1. An example of data partitioning containing a Time Series ID column and a timestamp column is: |
81 | | - |
82 | | -``` |
83 | | -.alter table events policy partitioning |
84 | | - { |
85 | | - "PartitionKeys": [ |
86 | | - { |
87 | | - "ColumnName": "timeSeriesId", |
88 | | - "Kind": "Hash", |
89 | | - "Properties": { |
90 | | - "Function": "XxHash64", |
91 | | - "MaxPartitionCount": 32, |
92 | | - "PartitionAssignmentMode": "Uniform" |
93 | | - } |
94 | | - }, |
95 | | - { |
96 | | - "ColumnName": "timestamp", |
97 | | - "Kind": "UniformRange", |
98 | | - "Properties": { |
99 | | - "Reference": "1970-01-01T00:00:00", |
100 | | - "RangeSize": "1.00:00:00", |
101 | | - "OverrideCreationTime": true |
102 | | - } |
103 | | - } |
104 | | - ] , |
105 | | - "EffectiveDateTime": "1970-01-01T00:00:00", |
106 | | - "MinRowCountPerOperation": 0, |
107 | | - "MaxRowCountPerOperation": 0, |
108 | | - "MaxOriginalSizePerOperation": 0 |
109 | | - } |
110 | | -``` |
111 | | -For more references, check [ADX Data Partitioning Policy](/azure/data-explorer/kusto/management/partitioningpolicy). |
| 73 | + 1. For most data sets, the default ADX partitioning is enough. |
| 74 | + 1. Custom data partitioning is beneficial only in a few specific scenarios and shouldn't be applied otherwise:
| 75 | + 1. Improving query latency in large datasets where most queries filter on a high-cardinality string column, such as a Time Series ID.
| 76 | + 1. Ingesting out-of-order data, for example when past events may be ingested days or weeks after they were generated at the source.
| 77 | + 1. For more information, see [ADX Data Partitioning Policy](/azure/data-explorer/kusto/management/partitioningpolicy).
112 | 78 |
113 | 79 | #### Prepare for Data Ingestion |
114 | 80 |