#Customer intent: As an experienced data scientist, I need to package my data into a consumable and reusable object to train my machine learning models.
---
> [!IMPORTANT]
> Items in this article marked as "preview" are currently in public preview.
> The preview version is provided without a service level agreement, and we don't recommend it for production workloads. Certain features might not be supported or might have constrained capabilities.
> For more information, see [Supplemental Terms of Use for Microsoft Azure Previews](https://azure.microsoft.com/support/legal/preview-supplemental-terms/).
## Prerequisites
> Some dataset classes have dependencies on the [azureml-dataprep](https://pypi.org/project/azureml-dataprep/) package, which is only compatible with 64-bit Python. If you develop on __Linux__, these classes rely on .NET Core 2.1, and only specific distributions support them. For more information about the supported distros, read the .NET Core 2.1 column in the [Install .NET on Linux](/dotnet/core/install/linux) article.
> [!IMPORTANT]
> While the package might work on older versions of Linux distros, we don't recommend using a distro that is out of mainstream support. Distros that are out of mainstream support might have security vulnerabilities, because they don't receive the latest updates. We recommend using the latest supported version of your distro that is compatible with .NET Core 2.1.
## Compute size guidance
There are two dataset types, based on how users consume datasets in training: FileDatasets and TabularDatasets.
### FileDataset
A [FileDataset](/python/api/azureml-core/azureml.data.file_dataset.filedataset) references single or multiple files in your datastores or public URLs. If you have cleaned data that is ready for use in training experiments, you can [download or mount](how-to-train-with-datasets.md#mount-vs-download) the files to your compute as a FileDataset object.
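For orientation, here's a minimal sketch of the download-or-mount choice, assuming a FileDataset is already registered under the hypothetical name `my_files`:

```python
from azureml.core import Workspace, Dataset

ws = Workspace.from_config()
file_ds = Dataset.get_by_name(ws, name="my_files")  # hypothetical registered FileDataset

# Option 1: download the referenced files to local disk on the compute.
file_ds.download(target_path="./data", overwrite=True)

# Option 2: mount the files (Unix/Linux compute only) and read them in place.
mount_context = file_ds.mount("./data_mount")
mount_context.start()   # files become visible under ./data_mount
# ... training code reads from ./data_mount ...
mount_context.stop()
```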
We recommend FileDatasets for your machine learning workflows, because the source files can be in any format. This enables a wider range of machine learning scenarios, including deep learning.
Create a TabularDataset with [the Python SDK](#create-a-tabulardataset) or Azure Machine Learning studio.
>[!NOTE]
> [Automated ML](../concept-automated-ml.md) workflows generated via the Azure Machine Learning studio currently only support TabularDatasets.
>
>Also, for TabularDatasets generated from [SQL query results](/python/api/azureml-core/azureml.data.dataset_factory.tabulardatasetfactory#azureml-data-dataset-factory-tabulardatasetfactory-from-sql-query), T-SQL (for example, a 'WITH' subquery) or duplicate column names aren't supported. Complex T-SQL queries can cause performance issues, and duplicate column names in a dataset can cause ambiguity issues.
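For context, a minimal sketch of creating a TabularDataset from a SQL query; the datastore name and query are placeholders, and the query deliberately avoids 'WITH' subqueries and duplicate column names:

```python
from azureml.core import Workspace, Datastore, Dataset

ws = Workspace.from_config()
sql_datastore = Datastore.get(ws, "my_sql_datastore")  # hypothetical Azure SQL datastore

# Keep the T-SQL simple: no 'WITH' subqueries, no duplicate column names.
query = (sql_datastore, "SELECT CustomerId, Amount, SaleDate FROM Sales")
tabular_ds = Dataset.Tabular.from_sql_query(query, query_timeout=10)
df = tabular_ds.to_pandas_dataframe()
```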
## Access datasets in a virtual network
To create datasets from a datastore with the Python SDK:
1. Create the dataset by referencing paths in the datastore. You can create a dataset from multiple paths in multiple datastores. There's no hard limit on the number of files or data size from which you can create a dataset.
> [!NOTE]
> For each data path, a few requests are sent to the storage service to check whether it points to a file or a folder. This overhead might lead to degraded performance or failure. A dataset that references one folder with 1,000 files inside is considered referencing one data path. For optimal performance, we recommend creating datasets that reference fewer than 100 paths in datastores.
### Create a FileDataset
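A minimal sketch of the pattern, with a hypothetical datastore name and placeholder paths:

```python
from azureml.core import Workspace, Datastore, Dataset

ws = Workspace.from_config()
datastore = Datastore.get(ws, "workspaceblobstore")  # the workspace's default blob datastore

# Reference a folder and a glob pattern in the datastore; the paths are placeholders.
datastore_paths = [
    (datastore, "animals/cats"),
    (datastore, "animals/dogs/*.jpg"),
]
file_ds = Dataset.File.from_files(path=datastore_paths)

# Optionally register it so the dataset can be reused by name.
file_ds = file_ds.register(workspace=ws, name="animal_images", create_new_version=True)
```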
Filtering capabilities depend on the type of dataset you have.
> [!IMPORTANT]
> Filtering datasets with the [`filter()`](/python/api/azureml-core/azureml.data.tabulardataset#azureml-data-tabulardataset-filter) preview method is an [experimental](/python/api/overview/azure/ml/#stable-vs-experimental) preview feature, and could change at any time.
>
For **TabularDatasets**, you can keep or remove columns with the [keep_columns()](/python/api/azureml-core/azureml.data.tabulardataset#azureml-data-tabulardataset-keep-columns) and [drop_columns()](/python/api/azureml-core/azureml.data.tabulardataset#azureml-data-tabulardataset-drop-columns) methods.
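For illustration, a short sketch that combines these methods on a hypothetical registered TabularDataset (the dataset and column names are assumptions):

```python
from azureml.core import Workspace, Dataset

ws = Workspace.from_config()
titanic_ds = Dataset.get_by_name(ws, name="titanic")  # hypothetical TabularDataset

# Keep only the columns needed for training, then drop one of them again.
subset_ds = titanic_ds.keep_columns(["Age", "Fare", "Survived"])
subset_ds = subset_ds.drop_columns(["Fare"])

# filter() is an experimental preview method and could change at any time.
adults_ds = subset_ds.filter(subset_ds["Age"] > 21)
df = adults_ds.to_pandas_dataframe()
```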
> Create and register a TabularDataset from an in-memory Spark dataframe or a Dask dataframe with the public preview methods [`register_spark_dataframe()`](/python/api/azureml-core/azureml.data.dataset_factory.tabulardatasetfactory#azureml-data-dataset-factory-tabulardatasetfactory-register-spark-dataframe) and [`register_dask_dataframe()`](/python/api/azureml-core/azureml.data.dataset_factory.tabulardatasetfactory#azureml-data-dataset-factory-tabulardatasetfactory-register-dask-dataframe). These methods are [experimental](/python/api/overview/azure/ml/#stable-vs-experimental) preview features, and might change at any time.
>
> These methods upload data to your underlying storage, and as a result incur storage costs.
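As a rough sketch of the Spark-dataframe variant (the workspace, target folder, and `spark_df` dataframe are assumed to exist already; the Dask method follows the same shape):

```python
from azureml.core import Workspace, Dataset

ws = Workspace.from_config()
datastore = ws.get_default_datastore()

# spark_df is an existing in-memory Spark dataframe (not created here).
# The data is written to the datastore, so storage costs apply.
prepared_ds = Dataset.Tabular.register_spark_dataframe(
    dataframe=spark_df,
    target=(datastore, "prepared-data"),  # placeholder folder in the datastore
    name="prepared_training_data",
    show_progress=True,
)
```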
#Customer intent: As a data scientist, I want to prepare my data at scale, and to train my machine learning models from a single notebook using Azure Machine Learning.
---
You can specify an [Azure Machine Learning environment](../concept-environments.md) to use during your Apache Spark session. Only Conda dependencies specified in the environment will take effect. Docker images aren't supported.
>[!WARNING]
> Python dependencies specified in environment Conda dependencies aren't supported in Apache Spark pools. Currently, only fixed Python versions are supported.
> Include `sys.version_info` in your script to check your Python version.
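For example, a quick check you could add at the top of your session script:

```python
import sys

# Print the interpreter version the Spark pool session is actually running.
print(sys.version_info)
```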
This code creates the `myenv` environment, which installs `azureml-core` version 1.20.0 and `numpy` version 1.17.0 before the session starts. You can then include this environment in your Apache Spark session `start` statement.
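A minimal sketch of such an environment definition, with the pinned versions from the paragraph above:

```python
from azureml.core import Environment

# Only the Conda dependencies take effect in the Spark session; Docker settings are ignored.
myenv = Environment(name="myenv")
myenv.python.conda_dependencies.add_pip_package("azureml-core==1.20.0")
myenv.python.conda_dependencies.add_pip_package("numpy==1.17.0")
```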
After you complete the data preparation, and you save your prepared data to storage, stop using your Apache Spark pool with the following command:

```python
%synapse stop
```
## Create a dataset to represent prepared data
When you're ready to consume your prepared data for model training, connect to your storage with an [Azure Machine Learning datastore](how-to-access-data.md), and specify the file or files you want to use with an [Azure Machine Learning dataset](how-to-create-register-datasets.md).
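A minimal sketch of that pattern; the datastore name and file path are placeholders:

```python
from azureml.core import Workspace, Datastore, Dataset

ws = Workspace.from_config()
datastore = Datastore.get(ws, "workspaceblobstore")  # placeholder datastore name

# Point a FileDataset at the prepared file written by the Spark session.
train_ds = Dataset.File.from_files(path=[(datastore, "prepared-data/train.csv")])
input1 = train_ds.as_mount()  # hand the mounted data to a training run as an input
```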
## Use a `ScriptRunConfig` to submit an experiment run to a Synapse Spark pool
If you're ready to automate and productionize your data wrangling tasks, you can submit an experiment run to [an attached Synapse Spark pool](how-to-link-synapse-ml-workspaces.md#attach-a-pool-with-the-python-sdk) with the [ScriptRunConfig](/python/api/azureml-core/azureml.core.scriptrunconfig) object. In a similar way, if you have an Azure Machine Learning pipeline, you can use the [SynapseSparkStep to specify your Synapse Spark pool as the compute target](how-to-use-synapsesparkstep.md) for your pipeline data preparation step. Availability of your data to the Synapse Spark pool depends on your dataset type.
* For a FileDataset, you can use the [`as_hdfs()`](/python/api/azureml-core/azureml.data.filedataset#as-hdfs--) method. When the run is submitted, the dataset is made available to the Synapse Spark pool as a Hadoop Distributed File System (HDFS).
* For a [TabularDataset](how-to-create-register-datasets.md#tabulardataset), you can use the [`as_named_input()`](/python/api/azureml-core/azureml.data.abstract_dataset.abstractdataset#as-named-input-name-) method, as the sketch after this list shows.
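A minimal sketch of submitting such a run; the pool name, script, and dataset names are assumptions rather than the article's exact code:

```python
from azureml.core import Dataset, Experiment, RunConfiguration, ScriptRunConfig, Workspace

ws = Workspace.from_config()
file_ds = Dataset.get_by_name(ws, "raw_files")      # hypothetical FileDataset
tabular_ds = Dataset.get_by_name(ws, "raw_table")   # hypothetical TabularDataset

run_config = RunConfiguration(framework="pyspark")
run_config.target = "synapse-pool"                  # name of the attached Synapse Spark pool

script_config = ScriptRunConfig(
    source_directory="./code",
    script="dataprep.py",
    arguments=[
        "--file_input", file_ds.as_hdfs(),                    # FileDataset exposed through HDFS
        "--tabular_input", tabular_ds.as_named_input("raw"),  # TabularDataset passed by name
    ],
    run_config=run_config,
)

run = Experiment(ws, "synapse-data-wrangling").submit(script_config)
```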
In this article, you learn how to transform and save datasets in the Azure Machine Learning designer, to prepare your own data for machine learning.
You'll use the sample [Adult Census Income Binary Classification](samples-designer.md) dataset to prepare two datasets. One dataset includes adult census information from only the United States, and another dataset includes census information from non-US adults.
In this article, you learn how to:
1. Transform a dataset to prepare it for training.
1. Export the resulting datasets to a datastore.
1. View the results.
This how-to is a prerequisite for the [how to retrain designer models](how-to-retrain-designer.md) article. In that article, you learn how to use the transformed datasets to train multiple models with pipeline parameters.
> [!IMPORTANT]
> If you don't observe the graphical elements mentioned in this document (for example, buttons in studio or designer), you might not have the correct level of permissions for the workspace. Contact your Azure subscription administrator to verify that you have the correct level of access. For more information, visit [Manage users and roles](../how-to-assign-roles.md).
## Transform a dataset
In this section, you learn how to import the sample dataset, and split the data into US and non-US datasets. Visit [how to import data](how-to-designer-import-data.md) for more information about how to import your own data into the designer.
### Import data
### Split the data
In this section, you use the [Split Data component](../algorithm-module-reference/split-data.md) to identify and split rows that contain "United-States" in the "native-country" column.
1. To the left of the canvas, in the component tab, expand the **Data Transformation** section, and find the **Split Data** component.
Now that you set up your pipeline to split the data, you must specify where to persist the datasets.
For the **Split Data** component, the output port order is important. The first output port contains the rows where the regular expression is true. In this case, the first port contains rows for US-based income, and the second port contains rows for non-US-based income.
1. In the component details pane to the right of the canvas, set the following options:
**Datastore type**: Azure Blob Storage
**Datastore**: Select an existing datastore, or select "New datastore" to create a new one
**File format**: csv
> [!NOTE]
> This article assumes that you have access to a datastore registered to the current Azure Machine Learning workspace. Visit [Connect to Azure storage services](how-to-connect-data-ui.md#create-datastores) for datastore setup instructions.
You can create a datastore if you don't have one. For example purposes, this article saves the datasets to the default blob storage account associated with the workspace. It saves the datasets into the `azureml` container, in a new folder named `data`.
1. Select the **Export Data** component connected to the *right*-most port of the **Split Data** component, to open the Export Data configuration pane.