Commit 834c575

Author: Larry Franks
Commit message: acrolinx
1 parent: 3a576ac

File tree: 4 files changed (+19 −18 lines)


articles/machine-learning/how-to-administrate-data-authentication.md

Lines changed: 3 additions & 3 deletions
@@ -1,21 +1,21 @@
 ---
 title: How to administrate data authentication
 titleSuffix: Azure Machine Learning
-description: Learn how to manage data access and how to anthenticate in Azure Machine Learning
+description: Learn how to manage data access and how to authenticate in Azure Machine Learning
 services: machine-learning
 ms.service: machine-learning
 ms.subservice: enterprise-readiness
 ms.topic: how-to
 ms.author: xunwan
 author: xunwan
 ms.reviewer: larryfr
-ms.date: 05/19/2022
+ms.date: 05/24/2022

 # Customer intent: As an administrator, I need to administrate data access and set up authentication method for data scientists.
 ---

 # How to authenticate data access
-Learn how to manage data access and how to anthenticate in Azure Machine Learning
+Learn how to manage data access and how to authenticate in Azure Machine Learning
 [!INCLUDE [sdk v2](../../includes/machine-learning-sdk-v2.md)]
 [!INCLUDE [cli v2](../../includes/machine-learning-cli-v2.md)]

articles/machine-learning/how-to-create-register-data-assets.md

Lines changed: 13 additions & 12 deletions
@@ -10,7 +10,7 @@ ms.custom: contperf-fy21q1, data4ml, sdkv1
 ms.author: xunwan
 author: xunwan
 ms.reviewer: nibaccam
-ms.date: 05/11/2022
+ms.date: 05/24/2022

 # Customer intent: As an experienced data scientist, I need to package my data into a consumable and reusable object to train my machine learning models.

@@ -343,7 +343,7 @@ transformations:
 header: all_files_same_headers
 ```

-The important part here is that the MLTable-artifact does have not have any absolute paths, hence it is self-contained and all that is needed is stored in that one folder; regardless of whether that folder is stored on your local drive or in your cloud drive or on a public http server.
+The important part here is that the MLTable artifact doesn't have any absolute paths, so it's self-contained: everything it needs is stored in that one folder, whether the folder is on your local drive, in your cloud storage, or on a public HTTP server.

 This artifact file can be consumed in a command job as follows:

@@ -390,7 +390,7 @@ command: |
 "
 ```

-You can also has an MLTable file stored on their *local machine*, but no data files. The underlying data is stored on the cloud. In this case, the MLTable should reference the underlying data by means of an **absolute expression (i.e. a URI)**:
+You can also have an MLTable file stored on your *local machine*, but no data files. The underlying data is stored in the cloud. In this case, the MLTable should reference the underlying data with an **absolute expression (that is, a URI)**:

 ```
 .
@@ -414,7 +414,7 @@ transformations:


 ### Supporting multiple files in a table
-While above scenarios are creating rectangular data, it is also possible to create an mltable-artifact that just contains files:
+While the above scenarios create rectangular data, it's also possible to create an mltable-artifact that just contains files:

 ```
 .
@@ -437,7 +437,7 @@ paths:
 - file: http://foo.com/5.csv
 ```

-As outlined above, mltable can be created from a URI or a local folder path:
+As outlined above, an MLTable can be created from a URI or a local folder path:

 ```yaml
 #source ../configs/types/22_input_mldataset_artifacts-PipelineJob.yaml
@@ -485,7 +485,7 @@ jobs:
 "
 ```

-MLTable-artifacts can yield files that are not necessarily located in the `mltable`'s storage. Or it can **subset or shuffle** the data that resides in the storage using the `take_random_sample` transform for example. That view is only visible if the MLTable file is actually evaluated by the engine. The user can do that as described above by using the MLTable SDK by running `mltable.load` -- but that requires python and the installation of the SDK.
+MLTable artifacts can yield files that aren't necessarily located in the `mltable`'s storage. They can also **subset or shuffle** the data in storage, for example with the `take_random_sample` transform. That view is only visible when the engine evaluates the MLTable file. The user can do that by running `mltable.load` from the MLTable SDK, but that requires Python and the installation of the SDK.

 ### Support globbing of files
Along with users being able to provide a `file` or `folder`, the MLTable artifact file will also allow customers to specify a *pattern* to do globbing of files:
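The *pattern* here uses familiar glob semantics. As a rough illustration only (file names are hypothetical, and this uses Python's stdlib matcher, not the MLTable engine):

```python
from fnmatch import fnmatch

# Hypothetical listing of files in storage; the MLTable engine would
# enumerate real paths from the datastore instead.
paths = [
    "colors/1.csv",
    "colors/2.csv",
    "colors/readme.txt",
    "shapes/3.csv",
]

# A glob pattern such as "colors/*.csv" selects only the CSVs under colors/.
matched = [p for p in paths if fnmatch(p, "colors/*.csv")]
print(matched)  # ['colors/1.csv', 'colors/2.csv']
```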
@@ -506,15 +506,15 @@ transformations:
 ### Delimited text: Transformations
 The following transformations are *specific to delimited text*.

-- `infer_column_types`: Boolean to infer column data types. Defaults to True. Type inference requires that the data source is accessible from current compute. Currently type inference will only pull first 200 rows. If the data contains multiple types of value, it is better to provide desired type as an override via `set_column_types` argument
+- `infer_column_types`: Boolean to infer column data types. Defaults to True. Type inference requires that the data source is accessible from the current compute. Currently, type inference pulls only the first 200 rows. If the data contains multiple types of value, it's better to provide the desired type as an override via the `set_column_types` argument.
 - `encoding`: Specify the file encoding. Supported encodings are 'utf8', 'iso88591', 'latin1', 'ascii', 'utf16', 'utf32', 'utf8bom' and 'windows1252'. Defaults to utf8.
 - `header`: The user can choose one of the following options:
   - `no_header`
   - `from_first_file`
   - `all_files_different_headers`
   - `all_files_same_headers` (default)
 - `delimiter`: The separator used to split columns.
-- `empty_as_string`: Specify if empty field values should be loaded as empty strings. The default (False) will read empty field values as nulls. Passing this as True will read empty field values as empty strings. If the values are converted to numeric or datetime then this has no effect, as empty values will be converted to nulls.
+- `empty_as_string`: Specify whether empty field values should be loaded as empty strings. The default (False) reads empty field values as nulls. Passing True reads empty field values as empty strings. If the values are converted to numeric or datetime, this has no effect, as empty values are converted to nulls.
 - `include_path_column`: Boolean to keep path information as a column in the table. Defaults to False. This is useful when you're reading multiple files and want to know which file a particular record originated from, or to keep useful information that's in the file path.
 - `support_multi_line`: By default (support_multi_line=False), all line breaks, including those in quoted field values, will be interpreted as a record break. Reading data this way is faster and more optimized for parallel execution on multiple CPU cores. However, it may result in silently producing more records with misaligned field values. This should be set to True when the delimited files are known to contain quoted line breaks.

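The `empty_as_string` distinction can be illustrated with Python's stdlib `csv` module (a sketch over invented data, not the Azure Machine Learning reader itself):

```python
import csv
import io

# Hypothetical delimited data; the second record has an empty 'score' field.
raw = "name,score\nalice,3\nbob,\n"

rows = list(csv.reader(io.StringIO(raw)))
data = rows[1:]  # drop the header row

# The csv module always yields empty strings -- the empty_as_string=True behavior.
as_strings = data
# Mapping '' to None mimics the default (empty_as_string=False) null behavior.
as_nulls = [[f if f != "" else None for f in row] for row in data]

print(as_strings)  # [['alice', '3'], ['bob', '']]
print(as_nulls)    # [['alice', '3'], ['bob', None]]
```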
@@ -531,7 +531,8 @@ Below are the supported transformations that are specific for json lines:
 - `encoding` Specify the file encoding. Supported encodings are `utf8`, `iso88591`, `latin1`, `ascii`, `utf16`, `utf32`, `utf8bom` and `windows1252`. Default is `utf8`.

 ## Global Transforms
-As well as having transforms specific to the delimited text, parquet, Delta. There are other transforms that mltable-artifact files support:
+
+In addition to the transforms specific to delimited text, parquet, and Delta, there are other transforms that mltable-artifact files support:

 - `take`: Takes the first *n* records of the table
 - `take_random_sample`: Takes a random sample of the table where each record has a *probability* of being selected. The user can also include a *seed*.
@@ -540,11 +541,11 @@ As well as having transforms specific to the delimited text, parquet, Delta. The
 - `keep_columns`: Keeps only the specified columns in the table. This transform supports regex so that users can keep columns matching a particular pattern.
 - `filter`: Filter the data, leaving only the records that match the specified expression. **NOTE: This will come post-GA as we need to define the filter query language**.
 - `extract_partition_format_into_columns`: Specify the partition format of the path. Defaults to None. The partition information of each path will be extracted into columns based on the specified format. The format part '{column_name}' creates a string column, and '{column_name:yyyy/MM/dd/HH/mm/ss}' creates a datetime column, where 'yyyy', 'MM', 'dd', 'HH', 'mm' and 'ss' are used to extract the year, month, day, hour, minute and second for the datetime type. The format should start from the position of the first partition key and continue to the end of the file path. For example, given the path '../Accounts/2019/01/01/data.csv', where the partition is by department name and time, partition_format='/{Department}/{PartitionDate:yyyy/MM/dd}/data.csv' creates a string column 'Department' with the value 'Accounts' and a datetime column 'PartitionDate' with the value '2019-01-01'.
-Our principle here is to support transforms *specific to data delivery* and not to get into wider feature engineering transforms.
+Our principle here is to support transforms *specific to data delivery*, not to get into wider feature engineering transforms.

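The `extract_partition_format_into_columns` example above can be sketched with a stdlib regex (a hypothetical helper, not the actual engine implementation):

```python
import re
from datetime import datetime

# Mimics partition_format='/{Department}/{PartitionDate:yyyy/MM/dd}/data.csv':
# '{Department}' captures a string column, '{PartitionDate:yyyy/MM/dd}' a datetime.
PATTERN = re.compile(
    r"/(?P<Department>[^/]+)/(?P<y>\d{4})/(?P<m>\d{2})/(?P<d>\d{2})/data\.csv$"
)

def extract_partition(path):
    m = PATTERN.search(path)
    if m is None:
        return None
    return {
        "Department": m.group("Department"),
        "PartitionDate": datetime(int(m.group("y")), int(m.group("m")), int(m.group("d"))),
    }

print(extract_partition("../Accounts/2019/01/01/data.csv"))
# {'Department': 'Accounts', 'PartitionDate': datetime.datetime(2019, 1, 1, 0, 0)}
```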


 ## Traits
-The keen eyed among you may have spotted that `mltable` type supports a `traits` section. Traits define fixed characteristics of the table (i.e. they are **not** freeform metadata that users can add) and they do not perform any transformations but can be used by the engine.
+The keen-eyed among you may have spotted that the `mltable` type supports a `traits` section. Traits define fixed characteristics of the table (that is, they are **not** freeform metadata that users can add). They don't perform any transformations, but they can be used by the engine.

 - `index_columns`: Set the table index using existing columns. This trait can be used by partition_by in the data plane to split data by the index.
- `timestamp_column`: Defines the timestamp column of the table. This trait can be used in filter transforms, or in other data plane operations (SDK) such as drift detection.
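As a sketch, these two traits might appear in an mltable-artifact file like this (a hypothetical fragment: the column names are invented and the exact schema may differ):

```yaml
# Hypothetical mltable-artifact fragment; traits add metadata, not transformations.
paths:
  - file: ./sensor_readings.csv
transformations:
  - delimiter: ","
  - header: all_files_same_headers
traits:
  index_columns:
    - device_id
  timestamp_column: event_time
```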
@@ -553,7 +554,7 @@ Moreover, *in the future* we can use traits to define RAI aspects of the data, f

 - `sensitive_columns`: Here the user can define certain columns that contain sensitive information.

-Again, this is not a transform but is informing the system of some additional properties in the data.
+Again, this isn't a transform; it informs the system of some extra properties in the data.

articles/machine-learning/how-to-identity-based-data-access.md

Lines changed: 2 additions & 2 deletions
@@ -144,7 +144,7 @@ If you're training a model on a remote compute target and want to access the dat

 By default, Azure Machine Learning can't communicate with a storage account that's behind a firewall or in a virtual network.

-You can configure storage accounts to allow access only from within specific virtual networks. This configuration requires additional steps to ensure data isn't leaked outside of the network. This behavior is the same for credential-based data access. For more information, see [How to configure virtual network scenarios](how-to-access-data.md#virtual-network).
+You can configure storage accounts to allow access only from within specific virtual networks. This configuration requires extra steps to ensure data isn't leaked outside of the network. This behavior is the same for credential-based data access. For more information, see [How to configure virtual network scenarios](how-to-access-data.md#virtual-network).

 If your storage account has virtual network settings, those settings dictate what identity type and permissions are needed. For example, for data preview and data profile, the virtual network settings determine what type of identity is used to authenticate data access.

@@ -163,7 +163,7 @@ We recommend that you use [Azure Machine Learning datasets](./v1/how-to-create-r

 Datasets package your data into a lazily evaluated consumable object for machine learning tasks like training. Also, with datasets you can [download or mount](how-to-train-with-datasets.md#mount-vs-download) files of any format from Azure storage services like Azure Blob Storage and Azure Data Lake Storage to a compute target.

-To create a dataset, you can reference paths from datastores that also use identity-based data access .
+To create a dataset, you can reference paths from datastores that also use identity-based data access.

 * If your underlying storage account type is Blob or ADLS Gen 2, your user identity needs the Blob Reader role.
 * If your underlying storage is ADLS Gen 1, permissions can be set via the storage's Access Control List (ACL).

articles/machine-learning/v1/how-to-create-register-datasets.md

Lines changed: 1 addition & 1 deletion
@@ -18,7 +18,7 @@ ms.date: 05/11/2022

 > [!div class="op_single_selector" title1="Select the version of Azure Machine Learning SDK you are using:"]
 > * [v1](how-to-create-register-datasets.md)
-> * [v2 (current version)](../how-to-create-register-datasets.md)
+> * [v2 (current version)](../how-to-create-register-data-assets.md)

 [!INCLUDE [sdk v1](../../../includes/machine-learning-sdk-v1.md)]
