Commit 38aa191

Merge pull request #99614 from lyrana/ingress-overview-updates
Ingress overview updates
2 parents 9431231 + 8bfc788 commit 38aa191

4 files changed: +71 -24 lines changed

articles/time-series-insights/TOC.yml

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -91,7 +91,7 @@
9191
href: time-series-insights-send-events.md
9292
- name: Scale your environment
9393
href: time-series-insights-how-to-scale-your-environment.md
94-
- name: Shape JSON for queries
94+
- name: Shape JSON for ingress and queries
9595
href: how-to-shape-query-json.md
9696
- name: Mitigate throttling
9797
href: time-series-insights-environment-mitigate-latency.md
@@ -107,7 +107,7 @@
107107
href: time-series-insights-update-how-to-troubleshoot.md
108108
- name: Data modeling
109109
href: time-series-insights-update-how-to-tsm.md
110-
- name: Shape JSON for queries
110+
- name: Shape JSON for ingress and queries
111111
href: time-series-insights-update-how-to-shape-events.md
112112
- name: Connect to Power BI
113113
href: how-to-connect-power-bi.md

articles/time-series-insights/index.yml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -64,7 +64,7 @@ landingContent:
6464
url: time-series-insights-update-how-to-manage.md
6565
- text: Grant data access
6666
url: time-series-insights-data-access.md
67-
- text: Shape JSON for queries
67+
- text: Shape JSON for ingress and queries
6868
url: time-series-insights-update-how-to-shape-events.md
6969

7070
- title: API Reference
articles/time-series-insights/time-series-insights-update-storage-ingress.md

Lines changed: 68 additions & 21 deletions
Original file line numberDiff line numberDiff line change
@@ -1,8 +1,8 @@
11
---
22
title: 'Data storage and ingress in Preview - Azure Time Series Insights | Microsoft Docs'
33
description: Learn about data storage and ingress in Azure Time Series Insights Preview.
4-
author: deepakpalled
5-
ms.author: dpalled
4+
author: lyrana
5+
ms.author: lyhughes
66
manager: cshankar
77
ms.workload: big-data
88
ms.service: time-series-insights
@@ -24,41 +24,88 @@ In Time Series Insights Preview, data ingress policies determine where data can
2424

2525
### Ingress policies
2626

27+
#### Event Sources
28+
2729
Time Series Insights Preview supports the following event sources:
2830

2931
- [Azure IoT Hub](../iot-hub/about-iot-hub.md)
3032
- [Azure Event Hubs](../event-hubs/event-hubs-about.md)
3133

32-
Time Series Insights Preview supports a maximum of two event sources per instance. Azure Time Series Insights supports JSON submitted through Azure IoT Hub or Azure Event Hubs.
34+
Time Series Insights Preview supports a maximum of two event sources per instance.
3335

3436
> [!WARNING]
3537
> * You may experience high initial latency when attaching an event source to your Preview environment.
3638
> Event source latency depends on the number of events currently in your IoT Hub or Event Hub.
37-
> * High latency will subside after event source data is first ingested. Please contact us by submitting a support ticket through the Azure portal if you experience continued high latency.
39+
> * High latency will subside after event source data is first ingested. Contact us by submitting a support ticket through the Azure portal if you experience continued high latency.
40+
41+
#### Supported data format and types
42+
43+
Azure Time Series Insights supports UTF-8 encoded JSON submitted through Azure IoT Hub or Azure Event Hubs.
44+
45+
Below is the list of supported data types.
46+
47+
| Data type | Description |
48+
|-----------|-------------|
49+
| bool | A data type having one of two states: true or false. |
50+
| dateTime | Represents an instant in time, typically expressed as a date and time of day. DateTimes should be in ISO 8601 format. |
51+
| double | A double-precision 64-bit IEEE 754 floating point number. |
52+
| string | Text values, comprised of Unicode characters. |
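As an illustration of the supported types, here is a minimal Python sketch that builds one event and serializes it as UTF-8 encoded JSON. The property names (`deviceId`, `isRunning`, `temperature`, `timestamp`) are hypothetical and not required by Time Series Insights; the sketch only shows one value of each supported type.

```python
import json
from datetime import datetime, timezone

# Hypothetical telemetry event; the property names are illustrative only.
event = {
    "deviceId": "sensor-001",   # string
    "isRunning": True,          # bool
    "temperature": 21.5,        # double
    # dateTime, rendered in ISO 8601 format
    "timestamp": datetime(2020, 1, 15, 8, 30, 0, tzinfo=timezone.utc).isoformat(),
}

# Serialize to JSON text, then encode the text as UTF-8 bytes.
payload = json.dumps(event).encode("utf-8")
```

The resulting `payload` bytes are what a sender would submit to the IoT Hub or Event Hub; the sending step itself is omitted here.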
3853

39-
## Ingress best practices
54+
#### Objects and arrays
55+
56+
You can send complex types such as objects and arrays as part of your event payload, but your data will undergo a flattening process when stored. For more information on how to shape your JSON events as well as details on complex type and nested object flattening, see the page on [how to shape JSON for ingress and query](./time-series-insights-update-how-to-shape-events.md).
57+
58+
59+
### Ingress best practices
4060

4161
We recommend that you employ the following best practices:
4262

43-
* Configure Time Series Insights and an IoT hub or event hub in the same region. This will reduce ingestion latency incurred due to the network.
63+
* Configure Time Series Insights and your IoT Hub or Event Hub in the same region to reduce network-incurred ingestion latency.
4464
* Plan for your scale needs by calculating your anticipated ingestion rate and verifying that it falls within the supported rate listed below.
4565
* Understand how to optimize and shape your JSON data, as well as the current limitations in preview, by reading [how to shape JSON for ingress and query](./time-series-insights-update-how-to-shape-events.md).
4666

4767
### Ingress scale and limitations in preview
4868

49-
By default, Preview environments support ingress rates up to **1 megabyte per second (MB/s) per environment**. Customers may scale their Preview environments up to **16 MB/s** throughput if necessary.
50-
There is also a per-partition limit of **0.5 MB/s**.
51-
52-
The per-partition limit has implications for customers using IoT Hub. Specifically, given the affinity between an IoT Hub device and a partition. In scenarios where one gateway device is forwarding messages to hub using its own device ID and connection string, there is the danger of reaching the 0.5 MB/s limit given that messages will arrive in a single partition, even if the event payload specifies different Time Series IDs.
69+
#### Per environment limitations
5370

5471
In general, the ingress rate is the product of the number of devices in your organization, the event emission frequency, and the size of each event:
5572

5673
* **Number of devices** × **Event emission frequency** × **Size of each event**.
5774

58-
> [!TIP]
59-
> For environments using IoT Hub as an event source, calculate the ingestion rate using the number of hub connections in use, rather than total devices in use or in the organization.
75+
By default, Time Series Insights Preview can ingest incoming data at a rate of up to 1 megabyte per second (MBps) **per TSI environment**. If this does not meet your requirements, contact us by submitting a support ticket in the Azure portal; we can support up to 16 MBps for an environment.
76+
77+
Example 1: Contoso Shipping has 100,000 devices that emit an event three times per minute. The size of an event is 200 bytes. They’re using an Event Hub with 4 partitions as the TSI event source.
78+
The ingestion rate for their TSI environment would be: 100,000 devices * 200 bytes/event * (3/60 event/sec) = 1 MBps.
79+
The ingestion rate per partition would be 0.25 MBps.
80+
Contoso Shipping’s ingestion rate would be within the preview scale limitation.
81+
82+
Example 2: Contoso Fleet Analytics has 20,000 devices that emit an event every second. They are using an IoT Hub with a partition count of 4 as the TSI event source. The size of an event is 200 bytes.
83+
The environment ingestion rate would be: 20,000 devices * 200 bytes/event * 1 event/sec = 4 MBps.
84+
The per partition rate would be 1 MBps.
85+
Contoso Fleet Analytics would need to submit a request to TSI via the Azure portal for a dedicated environment to achieve this scale.
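The arithmetic in the two examples can be sketched in Python as follows. This is a rough planning aid, not an official calculator: it assumes decimal megabytes (1 MB = 1,000,000 bytes) and an even spread of events across partitions, and the function names are hypothetical. The limits are the preview figures quoted in this article, and Example 2 uses the figures from its calculation line (20,000 devices).

```python
BYTES_PER_MB = 1_000_000          # assumption: decimal megabytes
DEFAULT_ENV_LIMIT_MBPS = 1.0      # default per-environment limit in preview
PER_PARTITION_LIMIT_MBPS = 0.5    # per-partition limit in preview


def ingestion_rate_mbps(devices, events_per_second, event_size_bytes):
    """Devices x event emission frequency x event size, in MBps."""
    return devices * events_per_second * event_size_bytes / BYTES_PER_MB


def per_partition_rate_mbps(environment_rate_mbps, partition_count):
    """Per-partition rate, assuming events are spread evenly over partitions."""
    return environment_rate_mbps / partition_count


# Example 1: 100,000 devices, 3 events per minute, 200 bytes, 4 partitions.
shipping_env = ingestion_rate_mbps(100_000, 3 / 60, 200)
shipping_partition = per_partition_rate_mbps(shipping_env, 4)

# Example 2 (calculation line): 20,000 devices, 1 event per second, 200 bytes.
fleet_env = ingestion_rate_mbps(20_000, 1, 200)
fleet_partition = per_partition_rate_mbps(fleet_env, 4)

for name, env, part in [
    ("Contoso Shipping", shipping_env, shipping_partition),
    ("Contoso Fleet Analytics", fleet_env, fleet_partition),
]:
    print(f"{name}: {env:.2f} MBps per environment, {part:.2f} MBps per partition")
```

Comparing the results against `DEFAULT_ENV_LIMIT_MBPS` and `PER_PARTITION_LIMIT_MBPS` shows whether a planned deployment fits within the preview limits.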
86+
87+
#### Hub Partitions and Per Partition Limits
88+
89+
When planning your TSI environment, it's important to consider the configuration of the event source(s) that you'll be connecting to TSI. Both Azure IoT Hub and Event Hubs use partitions to enable horizontal scale for event processing. A partition is an ordered sequence of events that is held in a hub. The partition count is set when the IoT Hub or Event Hub is created, and can't be changed afterward. For more information on determining the partition count, see the Event Hubs FAQ entry "How many partitions do I need?" For TSI environments using IoT Hub, most IoT Hubs need only 4 partitions. Whether you're creating a new hub for your TSI environment or using an existing one, you'll need to calculate your per partition ingestion rate to determine if it is within the preview limits. TSI preview currently has a **per partition** limit of 0.5 MBps. Use the examples above as a reference, and note the following IoT Hub-specific considerations if you're an IoT Hub user.
90+
91+
#### IoT Hub-specific considerations
92+
93+
When a device is created in IoT Hub, it is assigned to a partition, and the partition assignment will not change. This allows IoT Hub to guarantee event ordering. However, it has implications for TSI as a downstream reader in certain scenarios. When messages from multiple devices are forwarded to the hub using the same gateway device ID, they arrive in the same partition and can exceed the per partition scale limitation.
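The effect of the gateway scenario can be illustrated with a small Python simulation. The way IoT Hub actually assigns devices to partitions is not described here, so `assign_partition` below is a hypothetical stand-in (a stable hash of the device ID); the device names and per-device rates are also made up. The only property the sketch relies on is that a given device ID always maps to the same partition.

```python
import zlib
from collections import defaultdict

PARTITION_COUNT = 4
PER_PARTITION_LIMIT_MBPS = 0.5  # preview limit quoted in this article


def assign_partition(device_id, partition_count=PARTITION_COUNT):
    # Hypothetical stand-in for IoT Hub's assignment: a stable hash of the
    # device ID, so the same ID always lands in the same partition.
    return zlib.crc32(device_id.encode("utf-8")) % partition_count


def per_partition_rates(rate_mbps_by_device_id):
    """Sum the send rate (MBps) of every device ID into its partition."""
    rates = defaultdict(float)
    for device_id, rate in rate_mbps_by_device_id.items():
        rates[assign_partition(device_id)] += rate
    return dict(rates)


# 1,000 sensors, each sending 0.001 MBps under its own device ID.
direct = per_partition_rates({f"sensor-{i}": 0.001 for i in range(1000)})

# The same 1 MBps of traffic forwarded under a single gateway device ID.
gateway = per_partition_rates({"gateway-01": 1.0})

print("direct :", {p: round(r, 3) for p, r in sorted(direct.items())})
print("gateway:", gateway)
```

In the gateway case all traffic lands in one partition, so that partition carries the full 1 MBps and exceeds `PER_PARTITION_LIMIT_MBPS` even though the environment total is unchanged.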
94+
95+
**Impact**:
96+
If a single partition experiences a sustained ingestion rate above the preview limitation, the TSI reader might never catch up before the IoT Hub data retention period is exceeded. This would cause a loss of data.
97+
98+
We recommend the following:
99+
100+
* Calculate your per environment and per partition ingestion rate before deploying your solution.
101+
* Ensure that your IoT Hub devices (and thus partitions) are load-balanced to the furthest extent possible.
102+
103+
> [!WARNING]
104+
> For environments using IoT Hub as an event source, calculate the ingestion rate using the number of hub devices in use to be sure that the rate falls below the 0.5 MBps per partition limitation in preview.
105+
106+
![IoT Hub Partition Diagram](media/concepts-ingress-overview/iot-hub-partiton-diagram.png)
60107

61-
For more information about throughput units, limits, and partitions:
108+
Refer to the following links for more information on throughput units and partitions:
62109

63110
* [IoT Hub Scale](https://docs.microsoft.com/azure/iot-hub/iot-hub-scaling)
64111
* [Event Hub Scale](https://docs.microsoft.com/azure/event-hubs/event-hubs-scalability#throughput-units)
@@ -83,7 +130,7 @@ Time Series Insights Preview saves your cold store data to Azure Blob storage in
83130
Time Series Insights Preview partitions and indexes data for optimum query performance. Data becomes available to query after it’s indexed. The amount of data that's being ingested can affect this availability.
84131

85132
> [!IMPORTANT]
86-
> The upcoming general availability (GA) release of Time Series Insights will make data available in 60 seconds after it's read from the event source. During the preview, you might experience a longer period before data becomes available. If you experience significant latency beyond 60 seconds, please submit a support ticket through the Azure portal.
133+
> During the preview, you might experience a period of up to 60 seconds before data becomes available. If you experience significant latency beyond 60 seconds, please submit a support ticket through the Azure portal.
87134
88135
## Azure Storage
89136

@@ -101,25 +148,25 @@ Time Series Insights Preview repartitions the Parquet files to optimize for the
101148

102149
During public preview, data is stored indefinitely in your Azure Storage account.
103150

104-
### Writing and editing Time Series Insights blobs
151+
#### Writing and editing Time Series Insights blobs
105152

106153
To ensure query performance and data availability, don't edit or delete any blobs that Time Series Insights Preview creates.
107154

108-
### Accessing and exporting data from Time Series Insights Preview
155+
#### Accessing and exporting data from Time Series Insights Preview
109156

110157
You might want to access data viewed in the Time Series Insights Preview explorer to use in conjunction with other services. For example, you can use your data to build a report in Power BI or to train a machine learning model by using Azure Machine Learning Studio. Or, you can use your data to transform, visualize, and model in your Jupyter Notebooks.
111158

112159
You can access your data in three general ways:
113160

114161
* From the Time Series Insights Preview explorer. You can export data as a CSV file from the explorer. For more information, read [Time Series Insights Preview explorer](./time-series-insights-update-explorer.md).
115-
* From the Time Series Insights Preview API. You can reach the API endpoint at `/getRecorded`. To learn more about this API, read [Time Series Query](./time-series-insights-update-tsq.md).
162+
* From the Time Series Insights Preview API using Get Events Query. To learn more about this API, read [Time Series Query](./time-series-insights-update-tsq.md).
116163
* Directly from an Azure Storage account. You need read access to whatever account you're using to access your Time Series Insights Preview data. For more information, read [Manage access to your storage account resources](../storage/blobs/storage-manage-access-to-resources.md).
117164

118-
### Data deletion
165+
#### Data deletion
119166

120167
Don't delete your Time Series Insights Preview files. Manage related data from within Time Series Insights Preview only.
121168

122-
## Parquet file format and folder structure
169+
### Parquet file format and folder structure
123170

124171
Parquet is an open-source columnar file format that was designed for efficient storage and performance. Time Series Insights Preview uses Parquet for these reasons. It partitions data by Time Series ID for query performance at scale.
125172

@@ -146,12 +193,12 @@ Time Series Insights Preview events are mapped to Parquet file contents as follo
146193

147194
* Each event maps to a single row.
148195
* Every row includes the **timestamp** column with an event time stamp. The time-stamp property is never null. It defaults to **event enqueued time** if the time-stamp property isn't specified in the event source. The time stamp is always in UTC.
149-
* Every row includes the Time Series ID column as defined when the Time Series Insights environment is created. The property name includes the `_string` suffix.
196+
* Every row includes the Time Series ID column(s) as defined when the Time Series Insights environment is created. The property name includes the `_string` suffix.
150197
* All other properties sent as telemetry data are mapped to column names that end with `_string` (string), `_bool` (Boolean), `_datetime` (datetime), or `_double` (double), depending on the property type.
151198
* This mapping scheme applies to the first version of the file format, referenced as **V=1**. As this feature evolves, the name might be incremented.
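The suffix rules in the list above can be sketched as a small Python helper. The function name `parquet_column_name` is hypothetical, and treating Python integers as `_double` is an assumption made here because double is the only numeric type listed; this is an illustration of the naming scheme, not the service's implementation.

```python
from datetime import datetime


def parquet_column_name(property_name, value):
    """Return the V=1 column name for a telemetry property, per the list above."""
    # bool is checked first because Python's bool is a subclass of int.
    if isinstance(value, bool):
        suffix = "_bool"
    elif isinstance(value, datetime):
        suffix = "_datetime"
    elif isinstance(value, (int, float)):
        # Assumption: all numeric values are stored as double.
        suffix = "_double"
    elif isinstance(value, str):
        suffix = "_string"
    else:
        raise TypeError(f"unsupported property type: {type(value).__name__}")
    return property_name + suffix


print(parquet_column_name("temperature", 21.5))
print(parquet_column_name("isRunning", True))
```

For example, a string Time Series ID property named `deviceId` would map to the column `deviceId_string`.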
152199

153200
## Next steps
154201

155-
- Read [Azure Time Series Insights Preview storage and ingress](./time-series-insights-update-storage-ingress.md).
202+
- Read [how to shape JSON for ingress and query](./time-series-insights-update-how-to-shape-events.md).
156203

157204
- Read about the new [data modeling](./time-series-insights-update-tsm.md).
