
Commit 159e7ce

Merge commit with 2 parents: 0f5b3be + 878e106

File tree: 3 files changed (+66 −40 lines)


articles/machine-learning/how-to-track-experiments.md

Lines changed: 26 additions & 0 deletions
@@ -48,6 +48,7 @@ The following metrics can be added to a run while training an experiment. To vie
 If you want to track or monitor your experiment, you must add code to start logging when you submit the run. The following are ways to trigger the run submission:
 * __Run.start_logging__ - Add logging functions to your training script and start an interactive logging session in the specified experiment. **start_logging** creates an interactive run for use in scenarios such as notebooks. Any metrics that are logged during the session are added to the run record in the experiment.
 * __ScriptRunConfig__ - Add logging functions to your training script and load the entire script folder with the run. **ScriptRunConfig** is a class for setting up configurations for script runs. With this option, you can add monitoring code to be notified of completion or to get a visual widget to monitor.
+* __Designer logging__ - Add logging functions to a drag-&-drop designer pipeline by using the __Execute Python Script__ module. Add Python code to log designer experiments.

 ## Set up the workspace

 Before adding logging and submitting an experiment, you must set up the workspace.
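
To make the two existing submission options in this hunk concrete, here's a minimal sketch of both paths (not part of this commit). It assumes azureml-core (SDK v1), a workspace config.json in the working folder, and a local train.py script; the experiment and script names are illustrative.

```python
from azureml.core import Workspace, Experiment, ScriptRunConfig

ws = Workspace.from_config()                       # reads config.json
exp = Experiment(workspace=ws, name='track-demo')  # illustrative name

# Option 1: interactive logging (Run.start_logging), for example from a notebook.
run = exp.start_logging()
run.log('alpha', 0.03)   # log any metric name/value you want to track
run.complete()

# Option 2: submit a script folder with ScriptRunConfig and monitor the run.
config = ScriptRunConfig(source_directory='.', script='train.py')
script_run = exp.submit(config)
script_run.wait_for_completion(show_output=True)
```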
@@ -100,8 +101,33 @@ This example expands on the basic sklearn Ridge model from above. It does a simp
 [!notebook-python[] (~/MachineLearningNotebooks/how-to-use-azureml/training/train-on-local/train-on-local.ipynb?name=src)]
 [!notebook-python[] (~/MachineLearningNotebooks/how-to-use-azureml/training/train-on-local/train-on-local.ipynb?name=run)]

+## Option 3: Log designer experiments

+Use the __Execute Python Script__ module to add logging logic to your designer experiments. You can log any value using this workflow, but it's especially useful to log metrics from the __Evaluate Model__ module to track model performance across different runs.

+1. Connect an __Execute Python Script__ module to the output of your __Evaluate Model__ module.
+
+    ![Connect Execute Python Script module to Evaluate Model module](./media/how-to-track-experiments/designer-logging-pipeline.png)
+
+1. Paste the following code into the __Execute Python Script__ code editor to log the mean absolute error for your trained model:
+
+    ```python
+    # dataframe1 contains the values from Evaluate Model
+    def azureml_main(dataframe1 = None, dataframe2 = None):
+        print(f'Input pandas.DataFrame #1: {dataframe1}')
+
+        from azureml.core import Run
+
+        run = Run.get_context()
+
+        # Log the mean absolute error to the current run to see the metric in the module detail pane.
+        run.log(name='Mean_Absolute_Error', value=dataframe1['Mean_Absolute_Error'])
+
+        # Log the mean absolute error to the parent run to see the metric in the run details page.
+        run.parent.log(name='Mean_Absolute_Error', value=dataframe1['Mean_Absolute_Error'])
+
+        return dataframe1,
+    ```

 ## Manage a run

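A brief aside on the snippet added above: because the metric is logged to both the module run and its parent pipeline run, it can be read back from the parent run afterward. A rough sketch (not part of the commit), assuming a placeholder experiment name and that the latest run in the experiment is the designer pipeline run:

```python
from azureml.core import Workspace, Experiment

ws = Workspace.from_config()
exp = Experiment(workspace=ws, name='designer-experiment')  # placeholder name

# Assuming the most recent run is the designer pipeline (parent) run,
# the value logged with run.parent.log() shows up in its metrics.
latest_run = next(exp.get_runs())
print(latest_run.get_metrics().get('Mean_Absolute_Error'))
```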

media/how-to-track-experiments/designer-logging-pipeline.png (image, 16.7 KB)
Lines changed: 40 additions & 40 deletions
@@ -1,6 +1,6 @@
 ---
-title: Best practices for SQL on-demand (preview) in Azure Synapse Analytics
-description: Recommendations and best practices you should know as you work with SQL on-demand (preview).
+title: Best practices for SQL on-demand (preview)
+description: Recommendations and best practices you should know when you work with SQL on-demand (preview).
 services: synapse-analytics
 author: filippopovic
 manager: craigg
@@ -14,60 +14,60 @@ ms.reviewer: jrasnick

 # Best practices for SQL on-demand (preview) in Azure Synapse Analytics

-In this article, you'll find a collection of best practices for using SQL on-demand (preview). SQL on-demand is an additional resource within Azure Synapse Analytics.
+In this article, you'll find a collection of best practices for using SQL on-demand (preview). SQL on-demand is a resource in Azure Synapse Analytics.

 ## General considerations

-SQL on-demand allows you to query files in your Azure storage accounts. It doesn't have local storage or ingestion capabilities. As such, all files that the query targets are external to SQL on-demand. Everything related to reading files from storage might have an impact on query performance.
+SQL on-demand allows you to query files in your Azure storage accounts. It doesn't have local storage or ingestion capabilities. So all files that the query targets are external to SQL on-demand. Everything related to reading files from storage might have an impact on query performance.

-## Colocate Azure Storage account and SQL on-demand
+## Colocate your Azure storage account and SQL on-demand

 To minimize latency, colocate your Azure storage account and your SQL on-demand endpoint. Storage accounts and endpoints provisioned during workspace creation are located in the same region.

-For optimal performance, if you access other storage accounts with SQL on-demand, make sure they are in the same region. If they aren't in the same region, there will be increased latency for the data's network transfer between the remote and endpoint's regions.
+For optimal performance, if you access other storage accounts with SQL on-demand, make sure they're in the same region. If they aren't in the same region, there will be increased latency for the data's network transfer between the remote region and the endpoint's region.

 ## Azure Storage throttling

-Multiple applications and services may access your storage account. Storage throttling occurs when the combined IOPS or throughput generated by applications, services, and SQL on-demand workload exceed the limits of the storage account. As a result, you'll experience a significant negative effect on query performance.
+Multiple applications and services might access your storage account. Storage throttling occurs when the combined IOPS or throughput generated by applications, services, and SQL on-demand workload exceed the limits of the storage account. As a result, you'll experience a significant negative effect on query performance.

-Once throttling is detected, SQL on-demand has built-in handling of this scenario. SQL on-demand will make requests to storage at a slower pace until throttling is resolved.
+When throttling is detected, SQL on-demand has built-in handling to resolve it. SQL on-demand will make requests to storage at a slower pace until throttling is resolved.

 > [!TIP]
-> For optimal query execution, you shouldn't stress the storage account with other workloads during query execution.
+> For optimal query execution, don't stress the storage account with other workloads during query execution.

 ## Prepare files for querying

 If possible, you can prepare files for better performance:

-- Convert CSV and JSON to Parquet - Parquet is columnar format. Since it's compressed, its file sizes are smaller than CSV or JSON files with the same data. SQL on-demand will need less time and storage requests to read it.
+- Convert CSV and JSON to Parquet. Parquet is a columnar format. Because it's compressed, its file sizes are smaller than CSV or JSON files that contain the same data. SQL on-demand will need less time and fewer storage requests to read it.
 - If a query targets a single large file, you'll benefit from splitting it into multiple smaller files.
-- Try keeping your CSV file size below 10 GB.
+- Try to keep your CSV file size below 10 GB.
 - It's better to have equally sized files for a single OPENROWSET path or an external table LOCATION.
-- Partition your data by storing partitions to different folders or file names - check [use filename and filepath functions to target specific partitions](#use-fileinfo-and-filepath-functions-to-target-specific-partitions).
+- Partition your data by storing partitions to different folders or file names. See [Use filename and filepath functions to target specific partitions](#use-filename-and-filepath-functions-to-target-specific-partitions).

-## Push wildcards to lower levels in path
+## Push wildcards to lower levels in the path

-You can use wildcards in your path to [query multiple files and folders](develop-storage-files-overview.md#query-multiple-files-or-folders). SQL on-demand lists files in your storage account starting from first * using storage API and eliminates files that don't match specified path. Reducing initial list of files can improve performance if there are many files that match specified path up to first wildcard.
+You can use wildcards in your path to [query multiple files and folders](develop-storage-files-overview.md#query-multiple-files-or-folders). SQL on-demand lists files in your storage account, starting from the first * using storage API. It eliminates files that don't match the specified path. Reducing the initial list of files can improve performance if there are many files that match the specified path up to the first wildcard.

 ## Use appropriate data types

-The data types you use in your query impact performance. You can get better performance if you:
+The data types you use in your query affect performance. You can get better performance if you follow these guidelines:

 - Use the smallest data size that will accommodate the largest possible value.
-- If maximum character value length is 30 characters, use character data type of length 30.
-- If all character column values are of fixed size, use char or nchar. Otherwise, use varchar or nvarchar.
-- If maximum integer column value is 500, use smallint as it is smallest data type that can accommodate this value. You can find integer data type ranges [here](https://docs.microsoft.com/sql/t-sql/data-types/int-bigint-smallint-and-tinyint-transact-sql?view=sql-server-ver15).
-- If possible, use varchar and char instead of nvarchar and nchar.
-- Use integer-based data types if possible. Sort, join, and group by operations are performed faster on integers than on characters data.
-- If you're using schema inference, [check inferred data type](#check-inferred-data-types).
+- If the maximum character value length is 30 characters, use a character data type of length 30.
+- If all character column values are of fixed size, use **char** or **nchar**. Otherwise, use **varchar** or **nvarchar**.
+- If the maximum integer column value is 500, use **smallint** because it's the smallest data type that can accommodate this value. You can find integer data type ranges in [this article](https://docs.microsoft.com/sql/t-sql/data-types/int-bigint-smallint-and-tinyint-transact-sql?view=sql-server-ver15).
+- If possible, use **varchar** and **char** instead of **nvarchar** and **nchar**.
+- Use integer-based data types if possible. SORT, JOIN, and GROUP BY operations complete faster on integers than on character data.
+- If you're using schema inference, [check inferred data types](#check-inferred-data-types).

 ## Check inferred data types

-[Schema inference](query-parquet-files.md#automatic-schema-inference) helps you quickly write queries and explore data without knowing file schema. This comfort comes at the expense of inferred data types being larger than they actually are. It happens when there isn't enough information in source files to make sure appropriate data type is used. For example, Parquet files don't contain metadata about maximum character column length and SQL on-demand infers it as varchar(8000).
+[Schema inference](query-parquet-files.md#automatic-schema-inference) helps you quickly write queries and explore data without knowing file schemas. The cost of this convenience is that inferred data types are larger than the actual data types. This happens when there isn't enough information in the source files to make sure the appropriate data type is used. For example, Parquet files don't contain metadata about maximum character column length. So SQL on-demand infers it as varchar(8000).

-You can check resulting data types of your query using [sp_describe_first_results_set](https://docs.microsoft.com/sql/relational-databases/system-stored-procedures/sp-describe-first-result-set-transact-sql?view=sql-server-ver15).
+You can use [sp_describe_first_results_set](https://docs.microsoft.com/sql/relational-databases/system-stored-procedures/sp-describe-first-result-set-transact-sql?view=sql-server-ver15) to check the resulting data types of your query.

-The following example shows how you can optimize inferred data types. Procedure is used to show inferred data types.
+The following example shows how you can optimize inferred data types. This procedure is used to show the inferred data types:
 ```sql
 EXEC sp_describe_first_result_set N'
 SELECT
@@ -79,15 +79,15 @@ EXEC sp_describe_first_result_set N'
 ) AS nyc';
 ```

-Here is the result set.
+Here's the result set:

 |is_hidden|column_ordinal|name|system_type_name|max_length|
 |----------------|---------------------|----------|--------------------|-------------------||
 |0|1|vendor_id|varchar(8000)|8000|
 |0|2|pickup_datetime|datetime2(7)|8|
 |0|3|passenger_count|int|4|

-Once we know inferred data types for query, we can specify appropriate data types:
+After you know the inferred data types for the query, you can specify appropriate data types:

 ```sql
 SELECT
@@ -98,44 +98,44 @@ FROM
 FORMAT='PARQUET'
 )
 WITH (
-vendor_id varchar(4), -- we used length of 4 instead of inferred 8000
+vendor_id varchar(4), -- we used length of 4 instead of the inferred 8000
 pickup_datetime datetime2,
 passenger_count int
 ) AS nyc;
 ```

-## Use fileinfo and filepath functions to target specific partitions
+## Use filename and filepath functions to target specific partitions

-Data is often organized in partitions. You can instruct SQL on-demand to query particular folders and files. This function will reduce the number of files and amount of data the query needs to read and process. An added bonus is that you'll achieve better performance.
+Data is often organized in partitions. You can instruct SQL on-demand to query particular folders and files. Doing so will reduce the number of files and the amount of data the query needs to read and process. An added bonus is that you'll achieve better performance.

-For more information, check [filename](develop-storage-files-overview.md#filename-function) and [filepath](develop-storage-files-overview.md#filepath-function) functions and examples on how to [query specific files](query-specific-files.md).
+For more information, read about the [filename](develop-storage-files-overview.md#filename-function) and [filepath](develop-storage-files-overview.md#filepath-function) functions and see the examples for [querying specific files](query-specific-files.md).

 > [!TIP]
-> Always cast result of filepath and fileinfo functions to appropriate data types. If you use character data types, make sure appropriate length is used.
+> Always cast the results of the filepath and filename functions to appropriate data types. If you use character data types, be sure to use the appropriate length.

 > [!NOTE]
-> Functions used for partition elimination, filepath and fileinfo, are not currently supported for external tables other than those created automatically for each table created in Apache Spark for Azure Synapse Analytics.
+> Functions used for partition elimination, filepath and filename, aren't currently supported for external tables, other than those created automatically for each table created in Apache Spark for Azure Synapse Analytics.

-If your stored data isn't partitioned, consider partitioning it so you can use these functions to optimize queries targeting those files. When [querying partitioned Apache Spark for Azure Synapse tables](develop-storage-files-spark-tables.md) from SQL on-demand, the query will automatically target only the files needed.
+If your stored data isn't partitioned, consider partitioning it. That way you can use these functions to optimize queries that target those files. When you [query partitioned Apache Spark for Azure Synapse tables](develop-storage-files-spark-tables.md) from SQL on-demand, the query will automatically target only the necessary files.

-## Use PARSER_VERSION 2.0 for querying CSV files
+## Use PARSER_VERSION 2.0 to query CSV files

-You can use performance optimized parser when querying CSV files. Check [PARSER_VERSION](develop-openrowset.md) for details.
+You can use a performance-optimized parser when you query CSV files. For details, see [PARSER_VERSION](develop-openrowset.md).

 ## Use CETAS to enhance query performance and joins

 [CETAS](develop-tables-cetas.md) is one of the most important features available in SQL on-demand. CETAS is a parallel operation that creates external table metadata and exports the SELECT query results to a set of files in your storage account.

-You can use CETAS to store frequently used parts of queries, like joined reference tables, to a new set of files. Next, you can join to this single external table instead of repeating common joins in multiple queries.
+You can use CETAS to store frequently used parts of queries, like joined reference tables, to a new set of files. You can then join to this single external table instead of repeating common joins in multiple queries.

 As CETAS generates Parquet files, statistics will be automatically created when the first query targets this external table, resulting in improved performance.

-## AAD pass-through performance
+## Azure AD Pass-through performance

-SQL on-demand allows you to access files in storage using AAD pass-through or SAS credential. You might experience slower performance with AAD pass-through comparing to SAS.
+SQL on-demand allows you to access files in storage by using Azure Active Directory (Azure AD) Pass-through or SAS credentials. You might experience slower performance with Azure AD Pass-through than you would with SAS.

-If you need better performance, try SAS credentials to access storage until AAD pass-through performance is improved.
+If you need better performance, try using SAS credentials to access storage until Azure AD Pass-through performance is improved.

 ## Next steps

-Review the [Troubleshooting](../sql-data-warehouse/sql-data-warehouse-troubleshoot.md?toc=/azure/synapse-analytics/toc.json&bc=/azure/synapse-analytics/breadcrumb/toc.json) article for common issues and solutions. If you're working with SQL pool rather than SQL on-demand, see the [Best Practices for SQL pool](best-practices-sql-pool.md) article for specific guidance.
+Review the [troubleshooting](../sql-data-warehouse/sql-data-warehouse-troubleshoot.md?toc=/azure/synapse-analytics/toc.json&bc=/azure/synapse-analytics/breadcrumb/toc.json) article for solutions to common problems. If you're working with SQL pools rather than SQL on-demand, see [Best practices for SQL pools](best-practices-sql-pool.md) for specific guidance.
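
To tie several of the recommendations in this diff together, here's a rough sketch of querying SQL on-demand from Python with pyodbc (not part of the commit). It combines a wildcard pushed to a lower level of the path, an explicit WITH clause with small data types, PARSER_VERSION 2.0 for CSV, and filepath() to target a single partition. The endpoint, credentials, storage path, and column names are placeholders.

```python
import pyodbc

# Placeholder connection details; replace with your own SQL on-demand endpoint.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=<workspace>-ondemand.sql.azuresynapse.net;"
    "DATABASE=master;UID=<user>;PWD=<password>"
)

# filepath(1) returns the value matched by the first wildcard in the path,
# so the WHERE clause prunes every folder except year=2019.
query = """
SELECT COUNT(*) AS row_count
FROM OPENROWSET(
        BULK 'https://<storageaccount>.dfs.core.windows.net/data/year=*/month=*/*.csv',
        FORMAT = 'CSV',
        PARSER_VERSION = '2.0',
        HEADER_ROW = TRUE
    )
    WITH (
        vendor_id varchar(4),        -- smallest length that fits the data
        passenger_count smallint     -- smallest integer type that fits the data
    ) AS src
WHERE src.filepath(1) = '2019';
"""

cursor = conn.cursor()
cursor.execute(query)
print(cursor.fetchone()[0])
```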
