Commit 64f250a

Merge pull request #106054 from esung22/patch-1
Minor updates to the storage doc
2 parents d2f5ff6 + 0090c3d commit 64f250a

1 file changed: +13 -22 lines changed

articles/time-series-insights/time-series-insights-update-storage-ingress.md

Lines changed: 13 additions & 22 deletions
@@ -154,10 +154,10 @@ Refer to the following resources to learn more about optimizing hub throughput a
 
 When you create a Time Series Insights Preview *pay-as-you-go* (PAYG) SKU environment, you create two Azure resources:
 
-* An Azure Time Series Insights Preview environment that can be configured for warm storage.
+* An Azure Time Series Insights Preview environment that can be configured for warm data storage.
 * An Azure Storage general-purpose V1 blob account for cold data storage.
 
-Data in your warm store is available only via [Time Series Query](./time-series-insights-update-tsq.md) and the [Azure Time Series Insights Preview explorer](./time-series-insights-update-explorer.md).
+Data in your warm store is available only via [Time Series Query](./time-series-insights-update-tsq.md) and the [Azure Time Series Insights Preview explorer](./time-series-insights-update-explorer.md). Your warm store will contain recent data within the [retention period](./time-series-insights-update-plan.md#the-preview-environment) selected when creating the Time Series Insights environment.
 
 Time Series Insights Preview saves your cold store data to Azure Blob storage in the [Parquet file format](#parquet-file-format-and-folder-structure). Time Series Insights Preview manages this cold store data exclusively, but it's available for you to read directly as standard Parquet files.
 
@@ -181,50 +181,41 @@ For a thorough description of Azure Blob storage, read the [Storage blobs introd
 
 When you create an Azure Time Series Insights Preview PAYG environment, an Azure Storage general-purpose V1 blob account is created as your long-term cold store.
 
-Azure Time Series Insights Preview publishes up to two copies of each event in your Azure Storage account. The initial copy has events ordered by ingestion time. That event order is **always preserved** so other services can access your events without sequencing issues.
-
-> [!NOTE]
-> You can also use Spark, Hadoop, and other familiar tools to process the raw Parquet files.
-
-Time Series Insights Preview also repartitions the Parquet files to optimize for the Time Series Insights query. This repartitioned copy of the data is also saved.
+Azure Time Series Insights Preview retains up to two copies of each event in your Azure Storage account. One copy stores events ordered by ingestion time, always allowing access to events in a time-ordered sequence. Over time, Time Series Insights Preview also creates a repartitioned copy of the data to optimize for performant Time Series Insights query.
 
 During public Preview, data is stored indefinitely in your Azure Storage account.
 
 #### Writing and editing Time Series Insights blobs
 
 To ensure query performance and data availability, don't edit or delete any blobs that Time Series Insights Preview creates.
 
-#### Accessing and exporting data from Time Series Insights Preview
-
-You might want to access data viewed in the Time Series Insights Preview explorer to use in conjunction with other services. For example, you can use your data to build a report in Power BI or to train a machine learning model by using Azure Machine Learning Studio. Or, you can use your data to transform, visualize, and model in your Jupyter Notebooks.
+#### Accessing Time Series Insights Preview cold store data
 
-You can access your data in three general ways:
+In addition to accessing your data from the [Time Series Insights Preview explorer](./time-series-insights-update-explorer.md) and [Time Series Query](./time-series-insights-update-tsq.md), you may also want to access your data directly from the Parquet files stored in the cold store. For example, you can read, transform, and cleanse data in a Jupyter notebook, then use it to train your Azure Machine Learning model in the same Spark workflow.
 
-* From the Time Series Insights Preview explorer. You can export data as a CSV file from the explorer. For more information, read [Time Series Insights Preview explorer](./time-series-insights-update-explorer.md).
-* From the Time Series Insights Preview API using Get Events Query. To learn more about this API, read [Time Series Query](./time-series-insights-update-tsq.md).
-* Directly from an Azure Storage account. You need read access to whatever account you're using to access your Time Series Insights Preview data. For more information, read [Manage access to your storage account resources](../storage/blobs/storage-manage-access-to-resources.md).
+To access data directly from your Azure Storage account, you need read access to the account used to store your Time Series Insights Preview data. You can then read selected data based on the creation time of the Parquet file located in the `PT=Time` folder described below in the [Parquet file format](#parquet-file-format-and-folder-structure) section. For more information on enabling read access to your storage account, see [Manage access to your storage account resources](../storage/blobs/storage-manage-access-to-resources.md).
 
 #### Data deletion
 
 Don't delete your Time Series Insights Preview files. Manage related data from within Time Series Insights Preview only.
 
 ### Parquet file format and folder structure
 
-Parquet is an open-source columnar file format that was designed for efficient storage and performance. Time Series Insights Preview uses Parquet for these reasons. It partitions data by Time Series ID for query performance at scale.
+Parquet is an open-source columnar file format designed for efficient storage and performance. Time Series Insights Preview uses Parquet to enable Time Series ID-based query performance at scale.
 
 For more information about the Parquet file type, read the [Parquet documentation](https://parquet.apache.org/documentation/latest/).
 
 Time Series Insights Preview stores copies of your data as follows:
 
-* The first, initial copy is partitioned by ingestion time and stores data roughly in order of arrival. The data resides in the `PT=Time` folder:
+* The first, initial copy is partitioned by ingestion time and stores data roughly in order of arrival. This data resides in the `PT=Time` folder:
 
 `V=1/PT=Time/Y=<YYYY>/M=<MM>/<YYYYMMDDHHMMSSfff>_<TSI_INTERNAL_SUFFIX>.parquet`
 
-* The second, repartitioned copy is partitioned by a grouping of Time Series IDs and resides in the `PT=TsId` folder:
+* The second, repartitioned copy is grouped by Time Series IDs and resides in the `PT=TsId` folder:
 
 `V=1/PT=TsId/Y=<YYYY>/M=<MM>/<YYYYMMDDHHMMSSfff>_<TSI_INTERNAL_SUFFIX>.parquet`
 
-In both cases, the time values correspond to blob creation time. Data in the `PT=Time` folder is preserved. Data in the `PT=TsId` folder will be optimized for query over time and will not remain static.
+In both cases, the time property of the Parquet file corresponds to blob creation time. Data in the `PT=Time` folder is preserved with no changes once it's written to the file. Data in the `PT=TsId` folder will be optimized for query over time and is not static.
 
 > [!NOTE]
 > * `<YYYY>` maps to a four-digit year representation.
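
Review note on the hunk above: the new text says readers can select data by the creation time encoded in `PT=Time` file names. A minimal Python sketch of that selection follows, assuming the blob names match the quoted `V=1/...` pattern exactly. The names `parse_blob_path` and `created_between` are hypothetical helpers, the internal suffix is treated as an opaque string, and the returned creation time is left timezone-naive because the doc does not state a time zone for file names.

```python
import re
from datetime import datetime

# Regex for the folder layout quoted in the diff. Treating the
# <TSI_INTERNAL_SUFFIX> part as "anything up to .parquet" is an assumption.
BLOB_PATH = re.compile(
    r"^V=(?P<version>\d+)/PT=(?P<partition>Time|TsId)"
    r"/Y=(?P<year>\d{4})/M=(?P<month>\d{2})"
    r"/(?P<created>\d{17})_(?P<suffix>[^/]+)\.parquet$"
)


def parse_blob_path(path):
    """Split a blob path into its parts, or return None if it does not match."""
    m = BLOB_PATH.match(path)
    if m is None:
        return None
    s = m["created"]  # YYYYMMDDHHMMSSfff, where fff is milliseconds
    created = datetime(
        int(s[0:4]), int(s[4:6]), int(s[6:8]),
        int(s[8:10]), int(s[10:12]), int(s[12:14]),
        int(s[14:17]) * 1000,  # milliseconds to microseconds
    )
    return {
        "version": int(m["version"]),
        "partition": m["partition"],
        "created": created,
        "suffix": m["suffix"],
    }


def created_between(paths, start, end):
    """Keep PT=Time paths whose encoded creation time falls in [start, end)."""
    selected = []
    for path in paths:
        info = parse_blob_path(path)
        if info and info["partition"] == "Time" and start <= info["created"] < end:
            selected.append(path)
    return selected
```

The sketch only inspects names, so it can run over any listing of blob paths before a single Parquet file is downloaded. It filters on `PT=Time` only because the doc states that those files are not changed once written.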
@@ -234,10 +225,10 @@ In both cases, the time values correspond to blob creation time. Data in the `PT
 Time Series Insights Preview events are mapped to Parquet file contents as follows:
 
 * Each event maps to a single row.
-* Every row includes the **timestamp** column with an event time stamp. The time-stamp property is never null. It defaults to **event enqueued time** if the time-stamp property isn't specified in the event source. The time stamp is always in UTC.
-* Every row includes the Time Series ID column(s) as defined when the Time Series Insights environment is created. The property name includes the `_string` suffix.
+* Every row includes the **timestamp** column with an event time stamp. The time-stamp property is never null. It defaults to the **event enqueued time** if the time-stamp property isn't specified in the event source. The stored time-stamp is always in UTC.
+* Every row includes the Time Series ID (TSID) column(s) as defined when the Time Series Insights environment is created. The TSID property name includes the `_string` suffix.
 * All other properties sent as telemetry data are mapped to column names that end with `_string` (string), `_bool` (Boolean), `_datetime` (datetime), or `_double` (double), depending on the property type.
-* This mapping scheme applies to the first version of the file format, referenced as **V=1**. As this feature evolves, the name might be incremented.
+* This mapping schema applies to the first version of the file format, referenced as **V=1** and stored in the base folder of the same name. As this feature evolves, this mapping schema might change and the reference name incremented.
 
 ## Next steps
 
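
Review note on the mapping bullets in the last hunk: the column-name suffix rule can be sketched as a small Python helper, assuming column names follow the four suffixes listed there. `split_column` and `SUFFIX_TYPES` are hypothetical names used for illustration, and columns without a recognized suffix (such as `timestamp`) are returned with no type.

```python
# Suffixes and type names as listed in the mapping bullets for V=1.
SUFFIX_TYPES = {
    "string": "string",
    "bool": "Boolean",
    "datetime": "datetime",
    "double": "double",
}


def split_column(column):
    """Return (property_name, type_name) for a Parquet column name.

    The type is None when the name carries none of the documented suffixes.
    """
    base, sep, suffix = column.rpartition("_")
    if sep and base and suffix in SUFFIX_TYPES:
        return base, SUFFIX_TYPES[suffix]
    return column, None
```

Splitting on the last underscore keeps property names that themselves contain underscores intact, so `device_id_string` yields the property `device_id`.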
