Commit ca6e2f0

Merge pull request #103681 from KingdomOfEnds/tsi-refresh
TSI ingress and storage improvements
2 parents ddec3aa + c1ba605 commit ca6e2f0

2 files changed

+89
-47
lines changed

articles/time-series-insights/time-series-insights-update-storage-ingress.md

Lines changed: 87 additions & 45 deletions
Original file line numberDiff line numberDiff line change
@@ -8,114 +8,153 @@ ms.workload: big-data
88
ms.service: time-series-insights
99
services: time-series-insights
1010
ms.topic: conceptual
11-
ms.date: 12/31/2019
11+
ms.date: 02/10/2020
1212
ms.custom: seodec18
1313
---
1414

1515
# Data storage and ingress in Azure Time Series Insights Preview
1616

17-
This article describes updates to data storage and ingress for Azure Time Series Insights Preview. It covers the underlying storage structure, file format, and Time Series ID property. It also discusses the underlying ingress process, best practices, and current preview limitations.
17+
This article describes updates to data storage and ingress for Azure Time Series Insights Preview. It describes the underlying storage structure, file format, and Time Series ID property. The underlying ingress process, best practices, and current preview limitations are also described.
1818

1919
## Data ingress
2020

21-
Your Azure Time Series Insights environment contains an Ingestion Engine to collect, process, and store time-series data. When planning your environment, there are some considerations to take into account in order to ensure that all incoming data is processed, and to achieve high ingress scale and minimize ingestion latency (the time taken by TSI to read and process data from the event source).
21+
Your Azure Time Series Insights environment contains an *ingestion engine* to collect, process, and store time-series data.
2222

23-
In Time Series Insights Preview, data ingress policies determine where data can be sourced from and what format the data should have.
23+
When [planning your environment](time-series-insights-update-plan.md), keep a few considerations in mind to ensure that all incoming data is processed, to achieve high ingress scale, and to minimize *ingestion latency* (the time taken by Time Series Insights to read and process data from the event source).
24+
25+
Time Series Insights Preview data ingress policies determine where data can be sourced from and what format the data should have.
2426

2527
### Ingress policies
2628

29+
*Data ingress* involves how data is sent to an Azure Time Series Insights Preview environment.
30+
31+
Key configuration, formatting, and best practices are summarized below.
32+
2733
#### Event Sources
2834

29-
Time Series Insights Preview supports the following event sources:
35+
Azure Time Series Insights Preview supports the following event sources:
3036

3137
- [Azure IoT Hub](../iot-hub/about-iot-hub.md)
3238
- [Azure Event Hubs](../event-hubs/event-hubs-about.md)
3339

34-
Time Series Insights Preview supports a maximum of two event sources per instance.
40+
Azure Time Series Insights Preview supports a maximum of two event sources per instance.
3541

36-
> [!WARNING]
42+
> [!IMPORTANT]
3743
> * You may experience high initial latency when attaching an event source to your Preview environment.
3844
> Event source latency depends on the number of events currently in your IoT Hub or Event Hub.
39-
> * High latency will subside after event source data is first ingested. Contact us by submitting a support ticket through the Azure portal if you experience continued high latency.
45+
> * High latency will subside after event source data is first ingested. Submit a support ticket through the Azure portal if you experience ongoing high latency.
4046
4147
#### Supported data format and types
4248

43-
Azure Time Series Insights supports UTF8 encoded JSON submitted through Azure IoT Hub or Azure Event Hubs.
49+
Azure Time Series Insights supports UTF-8 encoded JSON sent from Azure IoT Hub or Azure Event Hubs.
4450

45-
Below is the list of supported data types.
51+
The supported data types are:
4652

4753
| Data type | Description |
48-
|-----------|------------------|-------------|
49-
| bool | A data type having one of two states: true or false. |
50-
| dateTime | Represents an instant in time, typically expressed as a date and time of day. DateTimes should be in ISO 8601 format. |
51-
| double | A double-precision 64-bit IEEE 754 floating point
52-
| string | Text values, comprised of Unicode characters. |
54+
|---|---|
55+
| **bool** | A data type having one of two states: `true` or `false`. |
56+
| **dateTime** | Represents an instant in time, typically expressed as a date and time of day. Expressed in [ISO 8601](https://www.iso.org/iso-8601-date-and-time-format.html) format. |
57+
| **double** | A double-precision 64-bit [IEEE 754](https://ieeexplore.ieee.org/document/8766229) floating point. |
58+
| **string** | Text values, comprised of Unicode characters. |
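As an illustration of the supported format, the following minimal Python sketch builds one event that uses only the four supported data types and serializes it as UTF-8 encoded JSON. The property names (`deviceId`, `timestamp`, `temperature`, `isRunning`) and the `build_event` helper are hypothetical and are not part of any Time Series Insights API.

```python
import json
from datetime import datetime, timezone


def build_event(device_id: str, temperature: float, is_running: bool,
                when: datetime) -> bytes:
    """Serialize one telemetry event as UTF-8 encoded JSON.

    `when` is expected to be a timezone-aware datetime.
    All property names below are hypothetical examples.
    """
    event = {
        "deviceId": str(device_id),                              # string
        "timestamp": when.astimezone(timezone.utc).isoformat(),  # dateTime (ISO 8601)
        "temperature": float(temperature),                       # double
        "isRunning": bool(is_running),                           # bool
    }
    return json.dumps(event).encode("utf-8")
```

The resulting bytes would then be sent through an IoT Hub or Event Hub client of your choice; that step is outside the scope of this sketch.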
5359

5460
#### Objects and arrays
5561

56-
You can send complex types such as objects and arrays as part of your event payload, but your data will undergo a flattening process when stored. For more information on how to shape your JSON events as well as details on complex type and nested object flattening, see the page on [how to shape JSON for ingress and query](./time-series-insights-update-how-to-shape-events.md).
62+
You may send complex types such as objects and arrays as part of your event payload, but your data will undergo a flattening process when stored.
5763

64+
For details on shaping your JSON events, sending complex types, and how nested objects are flattened, see [How to shape JSON for ingress and query](./time-series-insights-update-how-to-shape-events.md), which can assist with planning and optimization.
5865

5966
### Ingress best practices
6067

6168
We recommend that you employ the following best practices:
6269

63-
* Configure Time Series Insights and your IoT Hub or Event Hub in the same region in order to reduce network incurred ingestion latency.
64-
* Plan for your scale needs by calculating your anticipated ingestion rate and verifying that it falls within the supported rate listed below
70+
* Configure Azure Time Series Insights and any IoT Hub or Event Hub in the same region to reduce potential latency.
71+
72+
* [Plan for your scale needs](time-series-insights-update-plan.md) by calculating your anticipated ingestion rate and verifying that it falls within the supported rate listed below.
73+
6574
* Understand how to optimize and shape your JSON data, as well as the current limitations in preview, by reading [how to shape JSON for ingress and query](./time-series-insights-update-how-to-shape-events.md).
6675

67-
### Ingress scale and limitations in preview
76+
### Ingress scale and Preview limitations
77+
78+
Azure Time Series Insights Preview ingress limitations are described below.
79+
80+
> [!TIP]
81+
> Read [Plan your Preview environment](https://docs.microsoft.com/azure/time-series-insights/time-series-insights-update-plan#review-preview-limits) for a comprehensive list of all Preview limits.
6882
6983
#### Per environment limitations
7084

7185
In general, the ingress rate is the product of the number of devices in your organization, the event emission frequency, and the size of each event:
7286

7387
* **Number of devices** × **Event emission frequency** × **Size of each event**.
7488

75-
By default, Time Series Insights preview can ingest incoming data at a rate of up to 1 megabyte per second (MBps) **per TSI environment**. Contact us if this does not meet your requirements, we can support up to 16 MBps for an environment by submitting a support ticket in the Azure portal.
76-
77-
Example 1: Contoso Shipping has 100,000 devices that emit an event three times per minute. The size of an event is 200 bytes. They’re using an Event Hub with 4 partitions as the TSI event source.
78-
The ingestion rate for their TSI environment would be: 100,000 devices * 200 bytes/event * (3/60 event/sec) = 1 MBps.
79-
The ingestion rate per partition would be 0.25 MBps.
80-
Contoso Shipping’s ingestion rate would be within the preview scale limitation.
89+
By default, Time Series Insights preview can ingest incoming data at a rate of **up to 1 megabyte per second (MBps) per Time Series Insights environment**.
90+
91+
> [!TIP]
92+
> * Environment support for ingesting speeds up to 16 MBps can be provided by request.
93+
> * Contact us if you require higher throughput by submitting a support ticket through Azure portal.
8194
82-
Example 2: Contoso Fleet Analytics has 60,000 devices that emit an event every second. They are using an IoT Hub 24 partition count of 4 as the TSI event source. The size of an event is 200 bytes.
83-
The environment ingestion rate would be: 20,000 devices * 200 bytes/event * 1 event/sec = 4 MBps.
84-
The per partition rate would be 1 MBps.
85-
Contoso Fleet Analytics would need to submit a request to TSI via the Azure portal for a dedicated environment to achieve this scale.
95+
* **Example 1:**
96+
97+
Contoso Shipping has 100,000 devices that emit an event three times per minute. The size of an event is 200 bytes. They’re using an Event Hub with four partitions as the Time Series Insights event source.
98+
99+
* The ingestion rate for their Time Series Insights environment would be: **100,000 devices * 200 bytes/event * (3/60 event/sec) = 1 MBps**.
100+
* The ingestion rate per partition would be 0.25 MBps.
101+
* Contoso Shipping’s ingestion rate would be within the preview scale limitation.
102+
103+
* **Example 2:**
104+
105+
    Contoso Fleet Analytics has 20,000 devices that emit an event every second. They are using an IoT Hub with a partition count of 4 as the Time Series Insights event source. The size of an event is 200 bytes.
86106

87-
#### Hub Partitions and Per Partition Limits
107+
* The environment ingestion rate would be: **20,000 devices * 200 bytes/event * 1 event/sec = 4 MBps**.
108+
* The per partition rate would be 1 MBps.
109+
* Contoso Fleet Analytics can submit a request to Time Series Insights through Azure portal to increase the ingestion rate for their environment.
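The arithmetic in the two examples can be sketched in Python as follows. This is a rough planning aid, not an official calculator: it assumes decimal megabytes (1 MB = 1,000,000 bytes, which matches the arithmetic in Example 1), assumes events are spread evenly across partitions, and uses hypothetical function names.

```python
BYTES_PER_MB = 1_000_000        # assumption: decimal megabytes, as in Example 1
DEFAULT_ENV_LIMIT_MBPS = 1.0    # default per-environment ingress rate
MAX_ENV_LIMIT_MBPS = 16.0       # highest rate available by support request


def ingestion_rate_mbps(devices: int, events_per_minute: int,
                        bytes_per_event: int) -> float:
    """Number of devices x event emission frequency x size of each event, in MBps."""
    return devices * events_per_minute * bytes_per_event / (60 * BYTES_PER_MB)


def per_partition_rate_mbps(env_rate_mbps: float, partitions: int) -> float:
    """Per-partition rate, assuming events are evenly balanced across partitions."""
    return env_rate_mbps / partitions


def classify_environment_rate(rate_mbps: float) -> str:
    """Compare an environment rate with the limits stated in this article."""
    if rate_mbps <= DEFAULT_ENV_LIMIT_MBPS:
        return "within default limit"
    if rate_mbps <= MAX_ENV_LIMIT_MBPS:
        return "needs support request"
    return "exceeds maximum"
```

With the values from the formula in Example 1, `ingestion_rate_mbps(100_000, 3, 200)` gives 1 MBps, or 0.25 MBps per partition across four partitions. With the values from the formula in Example 2, `ingestion_rate_mbps(20_000, 60, 200)` gives 4 MBps, which is above the default limit and so would need a support request.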
88110

89-
When planning your TSI environment, it's important to consider the configuration of the event source(s) that you'll be connecting to TSI. Both Azure IoT Hub and Event Hubs utilize partitions to enable horizontal scale for event processing. A partition is an ordered sequence of events that is held in a hub. The partition count is set during the IoT or Event Hubs’ creation phase, and is not changeable. For more information on determining the partition count, see the Event Hubs' FAQ How many partitions do I need? For TSI environments using IoT Hub, generally most IoT Hubs only need 4 partitions. Whether or not you're creating a new hub for your TSI environment, or using an existing one, you'll need to calculate your per partition ingestion rate to determine if it is within the preview limits. TSI preview currently has a **per partition** limit of 0.5 MB/s. Use the examples below as a reference, and please note the following IoT Hub-specific consideration if you're an IoT Hub user.
111+
#### Hub partitions and per partition limits
112+
113+
When planning your Time Series Insights environment, it's important to consider the configuration of the event source(s) that you'll be connecting to Time Series Insights. Both Azure IoT Hub and Event Hubs utilize partitions to enable horizontal scale for event processing.
114+
115+
A *partition* is an ordered sequence of events held in a hub. The partition count is set during the hub creation phase and cannot be changed.
116+
117+
For Event Hubs partitioning best practices, review [How many partitions do I need?](https://docs.microsoft.com/azure/event-hubs/event-hubs-faq#how-many-partitions-do-i-need)
118+
119+
> [!NOTE]
120+
> Most IoT Hubs used with Azure Time Series Insights only need four partitions.
121+
122+
Whether you're creating a new hub for your Time Series Insights environment or using an existing one, you'll need to calculate your per partition ingestion rate to determine if it's within the preview limits.
123+
124+
Azure Time Series Insights Preview currently has a general **per partition limit of 0.5 MBps**.
90125

91126
#### IoT Hub-specific considerations
92127

93-
When a device is created in IoT Hub it is assigned to a partition, and the partition assignment will not change. By doing so, IoT Hub is able to guarantee event ordering. However, this has implications for TSI as a downstream reader in certain scenarios. When messages from multiple devices are forwarded to the hub using the same gateway device ID they will arrive in the same partition, thus potentially exceeding the per partition scale limitation.
128+
When a device is created in IoT Hub, it's permanently assigned to a partition. In doing so, IoT Hub is able to guarantee event ordering (since the assignment never changes).
129+
130+
A fixed partition assignment also impacts Time Series Insights instances that ingest data sent from IoT Hub downstream. When messages from multiple devices are forwarded to the hub using the same gateway device ID, they may arrive in the same partition at the same time, potentially exceeding the per partition scale limits.
94131

95132
**Impact**:
96-
If a single partition experiences a sustained rate of ingestion over the preview limitation there is the potential that the TSI reader will not ever catch up before the IoT Hub data retention period has been exceeded. This would cause a loss of data.
97133

98-
We recommend the following:
134+
* If a single partition experiences a sustained rate of ingestion over the Preview limit, it's possible that Time Series Insights will not sync all device telemetry before the IoT Hub data retention period has been exceeded. As a result, sent data can be lost if the ingestion limits are consistently exceeded.
99135

100-
* Calculate your per environment and per partition ingestion rate before deploying your solution
101-
* Ensure that your IoT Hub devices (and thus partitions) are load-balanced to the furthest extend possible
136+
To mitigate that circumstance, we recommend the following best practices:
102137

103-
> [!WARNING]
138+
* Calculate your per environment and per partition ingestion rates before deploying your solution.
139+
* Ensure that your IoT Hub devices are load-balanced to the furthest extent possible.
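To check how balanced a planned deployment is, the following Python sketch sums the expected ingestion rate per partition and flags any partition above the 0.5 MBps limit. It assumes you already know, or can estimate, which partition each device or gateway ID maps to; the input tuples and function names are hypothetical, and 1 MB is taken as 1,000,000 bytes.

```python
from collections import defaultdict

BYTES_PER_MB = 1_000_000          # assumption: decimal megabytes
PER_PARTITION_LIMIT_MBPS = 0.5    # per partition limit in Preview


def partition_rates_mbps(senders):
    """Sum the expected rate per partition.

    `senders` is an iterable of (partition_id, events_per_second, bytes_per_event)
    tuples, one per device ID or gateway device ID (hypothetical input shape).
    """
    totals = defaultdict(float)
    for partition_id, events_per_second, bytes_per_event in senders:
        totals[partition_id] += events_per_second * bytes_per_event / BYTES_PER_MB
    return dict(totals)


def overloaded_partitions(rates, limit=PER_PARTITION_LIMIT_MBPS):
    """Return the sorted IDs of partitions whose rate exceeds the limit."""
    return sorted(pid for pid, rate in rates.items() if rate > limit)
```

For example, a single gateway device ID that forwards 3,000 events per second of 200 bytes each places 0.6 MBps on one partition, which is over the limit even if the environment total is modest.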
140+
141+
> [!IMPORTANT]
104142
> For environments using IoT Hub as an event source, calculate the ingestion rate using the number of hub devices in use to be sure that the rate falls below the 0.5 MBps per partition limitation in preview.
143+
> * Even if several events arrive simultaneously, the Preview limit will not be exceeded.
105144
106145
![IoT Hub Partition Diagram](media/concepts-ingress-overview/iot-hub-partiton-diagram.png)
107146

108-
Refer to the following links for more information on throughput units and partitions:
147+
Refer to the following resources to learn more about optimizing hub throughput and partitions:
109148

110149
* [IoT Hub Scale](https://docs.microsoft.com/azure/iot-hub/iot-hub-scaling)
111150
* [Event Hub Scale](https://docs.microsoft.com/azure/event-hubs/event-hubs-scalability#throughput-units)
112151
* [Event Hub Partitions](https://docs.microsoft.com/azure/event-hubs/event-hubs-features#partitions)
113152

114153
### Data storage
115154

116-
When you create a Time Series Insights Preview pay-as-you-go SKU environment, you create two Azure resources:
155+
When you create a Time Series Insights Preview *pay-as-you-go* (PAYG) SKU environment, you create two Azure resources:
117156

118-
* A Time Series Insights Preview environment that can optionally include warm store capabilities.
157+
* An Azure Time Series Insights Preview environment that can be configured for warm storage.
119158
* An Azure Storage general-purpose V1 blob account for cold data storage.
120159

121160
Data in your warm store is available only via [Time Series Query](./time-series-insights-update-tsq.md) and the [Azure Time Series Insights Preview explorer](./time-series-insights-update-explorer.md).
@@ -127,7 +166,7 @@ Time Series Insights Preview saves your cold store data to Azure Blob storage in
127166
128167
### Data availability
129168

130-
Time Series Insights Preview partitions and indexes data for optimum query performance. Data becomes available to query after it’s indexed. The amount of data that's being ingested can affect this availability.
169+
Azure Time Series Insights Preview partitions and indexes data for optimum query performance. Data becomes available to query after it’s indexed. The amount of data that's being ingested can affect this availability.
131170

132171
> [!IMPORTANT]
133172
> During the preview, you might experience a period of up to 60 seconds before data becomes available. If you experience significant latency beyond 60 seconds, please submit a support ticket through the Azure portal.
@@ -140,13 +179,16 @@ For a thorough description of Azure Blob storage, read the [Storage blobs introd
140179

141180
### Your storage account
142181

143-
When you create a Time Series Insights Preview pay-as-you-go environment, an Azure Storage general-purpose V1 blob account is created as your long-term cold store.
182+
When you create an Azure Time Series Insights Preview PAYG environment, an Azure Storage general-purpose V1 blob account is created as your long-term cold store.
183+
184+
Azure Time Series Insights Preview publishes up to two copies of each event in your Azure Storage account. The initial copy has events ordered by ingestion time. That event order is **always preserved** so other services can access your events without sequencing issues.
144185

145-
Time Series Insights Preview publishes up to two copies of each event in your Azure Storage account. The initial copy has events ordered by ingestion time and is always preserved, so you can use other services to access it. You can use Spark, Hadoop, and other familiar tools to process the raw Parquet files.
186+
> [!NOTE]
187+
> You can also use Spark, Hadoop, and other familiar tools to process the raw Parquet files.
146188
147-
Time Series Insights Preview repartitions the Parquet files to optimize for the Time Series Insights query. This repartitioned copy of the data is also saved.
189+
Time Series Insights Preview also repartitions the Parquet files to optimize for the Time Series Insights query. This repartitioned copy of the data is also saved.
148190

149-
During public preview, data is stored indefinitely in your Azure Storage account.
191+
During public Preview, data is stored indefinitely in your Azure Storage account.
150192

151193
#### Writing and editing Time Series Insights blobs
152194

includes/time-series-insights-preview-limits.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -4,7 +4,7 @@ description: include file
44
services: digital-twins
55
ms.service: digital-twins
66
ms.topic: include
7-
ms.date: 02/03/2020
7+
ms.date: 02/07/2020
88
author: deepakpalled
99
ms.author: dpalled
1010
manager: cshankar
@@ -45,7 +45,7 @@ A maximum of two event sources per instance is supported.
4545
* Learn how to [Add an event hub source](https://docs.microsoft.com/azure/time-series-insights/time-series-insights-how-to-add-an-event-source-eventhub).
4646
* Configure [an IoT hub source](https://docs.microsoft.com/azure/time-series-insights/time-series-insights-how-to-add-an-event-source-iothub).
4747

48-
By default, [Preview environments support ingress rates](https://docs.microsoft.com/azure/time-series-insights/time-series-insights-update-storage-ingress) up to **1 megabyte per second (MB/s) per environment**. Customers may scale their Preview environments up to **16 MB/s** throughput if necessary. There is also a per-partition limit of **0.5 MB/s**.
48+
By default, [Preview environments support ingress rates](https://docs.microsoft.com/azure/time-series-insights/time-series-insights-update-storage-ingress#ingress-scale-and-preview-limitations) up to **1 megabyte per second (MB/s) per environment**. Customers may scale their Preview environments up to **16 MB/s** throughput if necessary. There is also a per-partition limit of **0.5 MB/s**.
4949

5050
### API limits
5151
