You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: articles/time-series-insights/time-series-insights-update-storage-ingress.md
+13-22Lines changed: 13 additions & 22 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -154,10 +154,10 @@ Refer to the following resources to learn more about optimizing hub throughput a
154
154
155
155
When you create a Time Series Insights Preview *pay-as-you-go* (PAYG) SKU environment, you create two Azure resources:
156
156
157
-
* An Azure Time Series Insights Preview environment that can be configured for warm storage.
157
+
* An Azure Time Series Insights Preview environment that can be configured for warm data storage.
158
158
* An Azure Storage general-purpose V1 blob account for cold data storage.
159
159
160
-
Data in your warm store is available only via [Time Series Query](./time-series-insights-update-tsq.md) and the [Azure Time Series Insights Preview explorer](./time-series-insights-update-explorer.md).
160
+
Data in your warm store is available only via [Time Series Query](./time-series-insights-update-tsq.md) and the [Azure Time Series Insights Preview explorer](./time-series-insights-update-explorer.md). Your warm store will contain recent data within the [retention period](./time-series-insights-update-plan.md#the-preview-environment) selected when creating the Time Series Insights environment.
161
161
162
162
Time Series Insights Preview saves your cold store data to Azure Blob storage in the [Parquet file format](#parquet-file-format-and-folder-structure). Time Series Insights Preview manages this cold store data exclusively, but it's available for you to read directly as standard Parquet files.
163
163
@@ -181,50 +181,41 @@ For a thorough description of Azure Blob storage, read the [Storage blobs introd
181
181
182
182
When you create an Azure Time Series Insights Preview PAYG environment, an Azure Storage general-purpose V1 blob account is created as your long-term cold store.
183
183
184
-
Azure Time Series Insights Preview publishes up to two copies of each event in your Azure Storage account. The initial copy has events ordered by ingestion time. That event order is **always preserved** so other services can access your events without sequencing issues.
185
-
186
-
> [!NOTE]
187
-
> You can also use Spark, Hadoop, and other familiar tools to process the raw Parquet files.
188
-
189
-
Time Series Insights Preview also repartitions the Parquet files to optimize for the Time Series Insights query. This repartitioned copy of the data is also saved.
184
+
Azure Time Series Insights Preview retains up to two copies of each event in your Azure Storage account. One copy stores events ordered by ingestion time, always allowing access to events in a time-ordered sequence. Over time, Time Series Insights Preview also creates a repartitioned copy of the data to optimize for performant Time Series Insights query.
190
185
191
186
During public Preview, data is stored indefinitely in your Azure Storage account.
192
187
193
188
#### Writing and editing Time Series Insights blobs
194
189
195
190
To ensure query performance and data availability, don't edit or delete any blobs that Time Series Insights Preview creates.
196
191
197
-
#### Accessing and exporting data from Time Series Insights Preview
198
-
199
-
You might want to access data viewed in the Time Series Insights Preview explorer to use in conjunction with other services. For example, you can use your data to build a report in Power BI or to train a machine learning model by using Azure Machine Learning Studio. Or, you can use your data to transform, visualize, and model in your Jupyter Notebooks.
192
+
#### Accessing Time Series Insights Preview cold store data
200
193
201
-
You can access your data in three general ways:
194
+
In addition to accessing your data from the [Time Series Insights Preview explorer](./time-series-insights-update-explorer.md) and [Time Series Query](./time-series-insights-update-tsq.md), you may also want to access your data directly from the Parquet files stored in the cold store. For example, you can read, transform, and cleanse data in a Jupyter notebook, then use it to train your Azure Machine Learning model in the same Spark workflow.
202
195
203
-
* From the Time Series Insights Preview explorer. You can export data as a CSV file from the explorer. For more information, read [Time Series Insights Preview explorer](./time-series-insights-update-explorer.md).
204
-
* From the Time Series Insights Preview API using Get Events Query. To learn more about this API, read [Time Series Query](./time-series-insights-update-tsq.md).
205
-
* Directly from an Azure Storage account. You need read access to whatever account you're using to access your Time Series Insights Preview data. For more information, read [Manage access to your storage account resources](../storage/blobs/storage-manage-access-to-resources.md).
196
+
To access data directly from your Azure Storage account, you need read access to the account used to store your Time Series Insights Preview data. You can then read selected data based on the creation time of the Parquet file located in the `PT=Time` folder described below in the [Parquet file format](#parquet-file-format-and-folder-structure) section. For more information on enabling read access to your storage account, see [Manage access to your storage account resources](../storage/blobs/storage-manage-access-to-resources.md).
206
197
207
198
#### Data deletion
208
199
209
200
Don't delete your Time Series Insights Preview files. Manage related data from within Time Series Insights Preview only.
210
201
211
202
### Parquet file format and folder structure
212
203
213
-
Parquet is an open-source columnar file format that was designed for efficient storage and performance. Time Series Insights Preview uses Parquet for these reasons. It partitions data by Time Series ID for query performance at scale.
204
+
Parquet is an open-source columnar file format designed for efficient storage and performance. Time Series Insights Preview uses Parquet to enable Time Series ID-based query performance at scale.
214
205
215
206
For more information about the Parquet file type, read the [Parquet documentation](https://parquet.apache.org/documentation/latest/).
216
207
217
208
Time Series Insights Preview stores copies of your data as follows:
218
209
219
-
* The first, initial copy is partitioned by ingestion time and stores data roughly in order of arrival. The data resides in the `PT=Time` folder:
210
+
* The first, initial copy is partitioned by ingestion time and stores data roughly in order of arrival. This data resides in the `PT=Time` folder:
In both cases, the time values correspond to blob creation time. Data in the `PT=Time` folder is preserved. Data in the `PT=TsId` folder will be optimized for query over time and will not remain static.
218
+
In both cases, the time property of the Parquet file corresponds to blob creation time. Data in the `PT=Time` folder is preserved with no changes once it's written to the file. Data in the `PT=TsId` folder will be optimized for query over time and is not static.
228
219
229
220
> [!NOTE]
230
221
> *`<YYYY>` maps to a four-digit year representation.
@@ -234,10 +225,10 @@ In both cases, the time values correspond to blob creation time. Data in the `PT
234
225
Time Series Insights Preview events are mapped to Parquet file contents as follows:
235
226
236
227
* Each event maps to a single row.
237
-
* Every row includes the **timestamp** column with an event time stamp. The time-stamp property is never null. It defaults to **event enqueued time** if the time-stamp property isn't specified in the event source. The time stamp is always in UTC.
238
-
* Every row includes the Time Series ID column(s) as defined when the Time Series Insights environment is created. The property name includes the `_string` suffix.
228
+
* Every row includes the **timestamp** column with an event time stamp. The time-stamp property is never null. It defaults to the **event enqueued time** if the time-stamp property isn't specified in the event source. The stored time-stamp is always in UTC.
229
+
* Every row includes the Time Series ID (TSID) column(s) as defined when the Time Series Insights environment is created. The TSID property name includes the `_string` suffix.
239
230
* All other properties sent as telemetry data are mapped to column names that end with `_string` (string), `_bool` (Boolean), `_datetime` (datetime), or `_double` (double), depending on the property type.
240
-
* This mapping scheme applies to the first version of the file format, referenced as **V=1**. As this feature evolves, the name might be incremented.
231
+
* This mapping schema applies to the first version of the file format, referenced as **V=1** and stored in the base folder of the same name. As this feature evolves, this mapping schema might change and the reference name incremented.
0 commit comments