articles/machine-learning/how-to-track-experiments.md (26 additions, 0 deletions)
@@ -48,6 +48,7 @@ The following metrics can be added to a run while training an experiment. To vie
If you want to track or monitor your experiment, you must add code to start logging when you submit the run. The following are ways to trigger the run submission:
* __Run.start_logging__ - Add logging functions to your training script and start an interactive logging session in the specified experiment. **start_logging** creates an interactive run for use in scenarios such as notebooks. Any metrics that are logged during the session are added to the run record in the experiment.
* __ScriptRunConfig__ - Add logging functions to your training script and load the entire script folder with the run. **ScriptRunConfig** is a class for setting up configurations for script runs. With this option, you can add monitoring code to be notified of completion or to get a visual widget to monitor.
+* __Designer logging__ - Add logging functions to a drag-&-drop designer pipeline by using the __Execute Python Script__ module. Add Python code to log designer experiments.
## Set up the workspace
Before adding logging and submitting an experiment, you must set up the workspace.
@@ -100,8 +101,33 @@ This example expands on the basic sklearn Ridge model from above. It does a simp
Use the __Execute Python Script__ module to add logging logic to your designer experiments. You can log any value using this workflow, but it's especially useful to log metrics from the __Evaluate Model__ module to track model performance across different runs.
+1. Connect an __Execute Python Script__ module to the output of your __Evaluate Model__ module.
+
+
+
+1. Paste the following code into the __Execute Python Script__ code editor to log the mean absolute error for your trained model:
+
+```python
+# dataframe1 contains the values from Evaluate Model
-title: Best practices for SQL on-demand (preview) in Azure Synapse Analytics
-description: Recommendations and best practices you should know as you work with SQL on-demand (preview).
+title: Best practices for SQL on-demand (preview)
+description: Recommendations and best practices you should know when you work with SQL on-demand (preview).
services: synapse-analytics
author: filippopovic
manager: craigg
@@ -14,60 +14,60 @@ ms.reviewer: jrasnick
# Best practices for SQL on-demand (preview) in Azure Synapse Analytics
-In this article, you'll find a collection of best practices for using SQL on-demand (preview). SQL on-demand is an additional resource within Azure Synapse Analytics.
+In this article, you'll find a collection of best practices for using SQL on-demand (preview). SQL on-demand is a resource in Azure Synapse Analytics.
## General considerations
-SQL on-demand allows you to query files in your Azure storage accounts. It doesn't have local storage or ingestion capabilities. As such, all files that the query targets are external to SQL on-demand. Everything related to reading files from storage might have an impact on query performance.
+SQL on-demand allows you to query files in your Azure storage accounts. It doesn't have local storage or ingestion capabilities. So all files that the query targets are external to SQL on-demand. Everything related to reading files from storage might have an impact on query performance.
-## Colocate Azure Storage account and SQL on-demand
+## Colocate your Azure storage account and SQL on-demand
To minimize latency, colocate your Azure storage account and your SQL on-demand endpoint. Storage accounts and endpoints provisioned during workspace creation are located in the same region.
-For optimal performance, if you access other storage accounts with SQL on-demand, make sure they are in the same region. If they aren't in the same region, there will be increased latency for the data's network transfer between the remote and endpoint's regions.
+For optimal performance, if you access other storage accounts with SQL on-demand, make sure they're in the same region. If they aren't in the same region, there will be increased latency for the data's network transfer between the remote region and the endpoint's region.
## Azure Storage throttling
-Multiple applications and services may access your storage account. Storage throttling occurs when the combined IOPS or throughput generated by applications, services, and SQL on-demand workload exceed the limits of the storage account. As a result, you'll experience a significant negative effect on query performance.
+Multiple applications and services might access your storage account. Storage throttling occurs when the combined IOPS or throughput generated by applications, services, and SQL on-demand workload exceed the limits of the storage account. As a result, you'll experience a significant negative effect on query performance.
-Once throttling is detected, SQL on-demand has built-in handling of this scenario. SQL on-demand will make requests to storage at a slower pace until throttling is resolved.
+When throttling is detected, SQL on-demand has built-in handling to resolve it. SQL on-demand will make requests to storage at a slower pace until throttling is resolved.
> [!TIP]
-> For optimal query execution, you shouldn't stress the storage account with other workloads during query execution.
+> For optimal query execution, don't stress the storage account with other workloads during query execution.
## Prepare files for querying
If possible, you can prepare files for better performance:
-- Convert CSV and JSON to Parquet - Parquet is columnar format. Since it's compressed, its file sizes are smaller than CSV or JSON files with the same data. SQL on-demand will need less time and storage requests to read it.
+- Convert CSV and JSON to Parquet. Parquet is a columnar format. Because it's compressed, its file sizes are smaller than CSV or JSON files that contain the same data. SQL on-demand will need less time and fewer storage requests to read it.
- If a query targets a single large file, you'll benefit from splitting it into multiple smaller files.
-- Try keeping your CSV file size below 10 GB.
+- Try to keep your CSV file size below 10 GB.
- It's better to have equally sized files for a single OPENROWSET path or an external table LOCATION.
-- Partition your data by storing partitions to different folders or file names - check [use filename and filepath functions to target specific partitions](#use-fileinfo-and-filepath-functions-to-target-specific-partitions).
+- Partition your data by storing partitions to different folders or file names. See [Use filename and filepath functions to target specific partitions](#use-filename-and-filepath-functions-to-target-specific-partitions).
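To make the conversion bullet in the list above concrete, here is a minimal sketch of one way to rewrite a folder of CSV files as Parquet with CETAS, which is covered later in this article. The table name, paths, external data source, file format, and column list are illustrative placeholders, not part of the original article.

```sql
-- Hypothetical names and columns; adjust the data source, paths, and schema to your files.
CREATE EXTERNAL TABLE taxi_trips_parquet
WITH (
    LOCATION = 'converted/taxi-trips/',   -- folder where the Parquet result files are written
    DATA_SOURCE = my_data_source,         -- existing external data source for the storage account
    FILE_FORMAT = my_parquet_format       -- existing external file format of type PARQUET
)
AS
SELECT *
FROM OPENROWSET(
    BULK 'csv/taxi-trips/*.csv',
    DATA_SOURCE = 'my_data_source',
    FORMAT = 'CSV',
    FIRSTROW = 2                          -- skip the header row
)
WITH (
    vendor_id varchar(4),
    pickup_datetime datetime2,
    passenger_count smallint
) AS rows;
```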
-## Push wildcards to lower levels in path
+## Push wildcards to lower levels in the path

-You can use wildcards in your path to [query multiple files and folders](develop-storage-files-overview.md#query-multiple-files-or-folders). SQL on-demand lists files in your storage account starting from first * using storage API and eliminates files that don't match specified path. Reducing initial list of files can improve performance if there are many files that match specified path up to first wildcard.
+You can use wildcards in your path to [query multiple files and folders](develop-storage-files-overview.md#query-multiple-files-or-folders). SQL on-demand lists files in your storage account, starting from the first * using storage API. It eliminates files that don't match the specified path. Reducing the initial list of files can improve performance if there are many files that match the specified path up to the first wildcard.
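As a rough illustration of pushing the wildcard lower, compare the two queries below. The storage URL and folder layout are assumed placeholders; the second query fixes the leading folders, so file listing starts from a much narrower prefix.

```sql
-- Broad wildcards: file listing starts at /data/, so every file under it is enumerated.
SELECT COUNT(*)
FROM OPENROWSET(
    BULK 'https://mystorageaccount.dfs.core.windows.net/mycontainer/data/*/*/*.parquet',
    FORMAT = 'PARQUET'
) AS rows;

-- Narrower wildcard: fixed folders up front, so listing starts at /data/year=2019/month=06/.
SELECT COUNT(*)
FROM OPENROWSET(
    BULK 'https://mystorageaccount.dfs.core.windows.net/mycontainer/data/year=2019/month=06/*.parquet',
    FORMAT = 'PARQUET'
) AS rows;
```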
## Use appropriate data types
-The data types you use in your query impact performance. You can get better performance if you:
+The data types you use in your query affect performance. You can get better performance if you follow these guidelines:
- Use the smallest data size that will accommodate the largest possible value.
-- If maximum character value length is 30 characters, use character data type of length 30.
-- If all character column values are of fixed size, use char or nchar. Otherwise, use varchar or nvarchar.
-- If maximum integer column value is 500, use smallint as it is smallest data type that can accommodate this value. You can find integer data type ranges [here](https://docs.microsoft.com/sql/t-sql/data-types/int-bigint-smallint-and-tinyint-transact-sql?view=sql-server-ver15).
-- If possible, use varchar and char instead of nvarchar and nchar.
-- Use integer-based data types if possible. Sort, join, and group by operations are performed faster on integers than on characters data.
-- If you're using schema inference, [check inferred data type](#check-inferred-data-types).
+- If the maximum character value length is 30 characters, use a character data type of length 30.
+- If all character column values are of fixed size, use **char** or **nchar**. Otherwise, use **varchar** or **nvarchar**.
+- If the maximum integer column value is 500, use **smallint** because it's the smallest data type that can accommodate this value. You can find integer data type ranges in [this article](https://docs.microsoft.com/sql/t-sql/data-types/int-bigint-smallint-and-tinyint-transact-sql?view=sql-server-ver15).
+- If possible, use **varchar** and **char** instead of **nvarchar** and **nchar**.
+- Use integer-based data types if possible. SORT, JOIN, and GROUP BY operations complete faster on integers than on character data.
+- If you're using schema inference, [check inferred data types](#check-inferred-data-types).
## Check inferred data types
-[Schema inference](query-parquet-files.md#automatic-schema-inference) helps you quickly write queries and explore data without knowing file schema. This comfort comes at the expense of inferred data types being larger than they actually are. It happens when there isn't enough information in source files to make sure appropriate data type is used. For example, Parquet files don't contain metadata about maximum character column length and SQL on-demand infers it as varchar(8000).
+[Schema inference](query-parquet-files.md#automatic-schema-inference) helps you quickly write queries and explore data without knowing file schemas. The cost of this convenience is that inferred data types are larger than the actual data types. This happens when there isn't enough information in the source files to make sure the appropriate data type is used. For example, Parquet files don't contain metadata about maximum character column length. So SQL on-demand infers it as varchar(8000).
-You can check resulting data types of your query using [sp_describe_first_results_set](https://docs.microsoft.com/sql/relational-databases/system-stored-procedures/sp-describe-first-result-set-transact-sql?view=sql-server-ver15).
+You can use [sp_describe_first_result_set](https://docs.microsoft.com/sql/relational-databases/system-stored-procedures/sp-describe-first-result-set-transact-sql?view=sql-server-ver15) to check the resulting data types of your query.
-The following example shows how you can optimize inferred data types. Procedure is used to show inferred data types.
+The following example shows how you can optimize inferred data types. This procedure is used to show the inferred data types:
```sql
EXEC sp_describe_first_result_set N'
SELECT
@@ -79,15 +79,15 @@ EXEC sp_describe_first_result_set N'
-Once we know inferred data types for query, we can specify appropriate data types:
+After you know the inferred data types for the query, you can specify appropriate data types:
```sql
SELECT
@@ -98,44 +98,44 @@ FROM
FORMAT='PARQUET'
)
WITH (
-vendor_id varchar(4), -- we used length of 4 instead of inferred 8000
+vendor_id varchar(4), -- we used length of 4 instead of the inferred 8000
pickup_datetime datetime2,
passenger_count int
) AS nyc;
```
-## Use fileinfo and filepath functions to target specific partitions
+## Use filename and filepath functions to target specific partitions
-Data is often organized in partitions. You can instruct SQL on-demand to query particular folders and files. This function will reduce the number of files and amount of data the query needs to read and process. An added bonus is that you'll achieve better performance.
+Data is often organized in partitions. You can instruct SQL on-demand to query particular folders and files. Doing so will reduce the number of files and the amount of data the query needs to read and process. An added bonus is that you'll achieve better performance.
-For more information, check [filename](develop-storage-files-overview.md#filename-function) and [filepath](develop-storage-files-overview.md#filepath-function) functions and examples on how to [query specific files](query-specific-files.md).
+For more information, read about the [filename](develop-storage-files-overview.md#filename-function) and [filepath](develop-storage-files-overview.md#filepath-function) functions and see the examples for [querying specific files](query-specific-files.md).
> [!TIP]
-> Always cast result of filepath and fileinfo functions to appropriate data types. If you use character data types, make sure appropriate length is used.
+> Always cast the results of the filepath and filename functions to appropriate data types. If you use character data types, be sure to use the appropriate length.
> [!NOTE]
-> Functions used for partition elimination, filepath and fileinfo, are not currently supported for external tables other than those created automatically for each table created in Apache Spark for Azure Synapse Analytics.
+> Functions used for partition elimination, filepath and filename, aren't currently supported for external tables, other than those created automatically for each table created in Apache Spark for Azure Synapse Analytics.
-If your stored data isn't partitioned, consider partitioning it so you can use these functions to optimize queries targeting those files. When [querying partitioned Apache Spark for Azure Synapse tables](develop-storage-files-spark-tables.md) from SQL on-demand, the query will automatically target only the files needed.
+If your stored data isn't partitioned, consider partitioning it. That way you can use these functions to optimize queries that target those files. When you [query partitioned Apache Spark for Azure Synapse tables](develop-storage-files-spark-tables.md) from SQL on-demand, the query will automatically target only the necessary files.
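A minimal sketch of the pattern, assuming a hypothetical `year=*/month=*` folder layout and a placeholder storage URL: `filepath(1)` and `filepath(2)` return the values matched by the first and second wildcards, and the results are cast to compact types as the tip above recommends.

```sql
SELECT TOP 10 *
FROM OPENROWSET(
    BULK 'https://mystorageaccount.dfs.core.windows.net/mycontainer/data/year=*/month=*/*.parquet',
    FORMAT = 'PARQUET'
) AS rows
WHERE CAST(rows.filepath(1) AS smallint) = 2019        -- value matched by the first wildcard (year)
  AND CAST(rows.filepath(2) AS tinyint) IN (1, 2, 3);  -- value matched by the second wildcard (month)
```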
-## Use PARSER_VERSION 2.0 for querying CSV files
+## Use PARSER_VERSION 2.0 to query CSV files
-You can use performance optimized parser when querying CSV files. Check [PARSER_VERSION](develop-openrowset.md) for details.
+You can use a performance-optimized parser when you query CSV files. For details, see [PARSER_VERSION](develop-openrowset.md).
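A minimal sketch that opts into the optimized parser, assuming a placeholder storage URL and CSV files that start with a header row; the `HEADER_ROW` option is assumed to be available alongside parser version 2.0.

```sql
SELECT TOP 100 *
FROM OPENROWSET(
    BULK 'https://mystorageaccount.dfs.core.windows.net/mycontainer/csv/*.csv',
    FORMAT = 'CSV',
    PARSER_VERSION = '2.0',   -- performance-optimized CSV parser
    HEADER_ROW = TRUE         -- read column names from the first row (assumed parser 2.0 option)
) AS rows;
```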
## Use CETAS to enhance query performance and joins
[CETAS](develop-tables-cetas.md) is one of the most important features available in SQL on-demand. CETAS is a parallel operation that creates external table metadata and exports the SELECT query results to a set of files in your storage account.
-You can use CETAS to store frequently used parts of queries, like joined reference tables, to a new set of files. Next, you can join to this single external table instead of repeating common joins in multiple queries.
+You can use CETAS to store frequently used parts of queries, like joined reference tables, to a new set of files. You can then join to this single external table instead of repeating common joins in multiple queries.
As CETAS generates Parquet files, statistics will be automatically created when the first query targets this external table, resulting in improved performance.
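For illustration, here is a minimal sketch that materializes a frequently used join once and then queries the result. The table names, external data source, and file format are placeholders, not from the original article.

```sql
-- Materialize the join once; the result is written as Parquet files.
CREATE EXTERNAL TABLE sales_with_customer
WITH (
    LOCATION = 'cetas/sales-with-customer/',  -- output folder for the result files
    DATA_SOURCE = my_data_source,             -- existing external data source
    FILE_FORMAT = my_parquet_format           -- existing external file format of type PARQUET
)
AS
SELECT s.sale_id, s.sale_date, s.amount, c.customer_name, c.region
FROM sales AS s
JOIN customers AS c ON c.customer_id = s.customer_id;

-- Later queries read the single external table instead of repeating the join.
SELECT region, SUM(amount) AS total_amount
FROM sales_with_customer
GROUP BY region;
```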
-## AAD pass-through performance
+## Azure AD Pass-through performance
-SQL on-demand allows you to access files in storage using AAD pass-through or SAS credential. You might experience slower performance with AAD pass-through comparing to SAS.
+SQL on-demand allows you to access files in storage by using Azure Active Directory (Azure AD) Pass-through or SAS credentials. You might experience slower performance with Azure AD Pass-through than you would with SAS.
-If you need better performance, try SAS credentials to access storage until AAD pass-through performance is improved.
+If you need better performance, try using SAS credentials to access storage until Azure AD Pass-through performance is improved.
## Next steps
-Review the [Troubleshooting](../sql-data-warehouse/sql-data-warehouse-troubleshoot.md?toc=/azure/synapse-analytics/toc.json&bc=/azure/synapse-analytics/breadcrumb/toc.json) article for common issues and solutions. If you're working with SQL pool rather than SQL on-demand, see the [Best Practices for SQL pool](best-practices-sql-pool.md) article for specific guidance.
+Review the [troubleshooting](../sql-data-warehouse/sql-data-warehouse-troubleshoot.md?toc=/azure/synapse-analytics/toc.json&bc=/azure/synapse-analytics/breadcrumb/toc.json) article for solutions to common problems. If you're working with SQL pools rather than SQL on-demand, see [Best practices for SQL pools](best-practices-sql-pool.md) for specific guidance.