Review main-docs/set_env_for_training_data_and_reference_doc.md
#72
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,55 +1,65 @@ | ||
# Set env variables for training data and reference doc for Pro mode | ||
Folders [document_training](../data/document_training/) and [field_extraction_pro_mode](../data/field_extraction_pro_mode) contain the manually labeled data for training and reference doc for Pro mode as a quick sample. Before using these knowledge source files, you need an Azure Storage blob container to store them. Let's follow below steps to prepare the data environment: | ||
|
||
1. *Create an Azure Storage Account:* If you don’t already have one, follow the guide to [create an Azure Storage Account](https://aka.ms/create-a-storage-account). | ||
> If you already have an account, you can skip this step. | ||
2. *Install Azure Storage Explorer:* Azure Storage Explorer is a tool which makes it easy to work with Azure Storage data. Install it and login with your credential, follow the [guide](https://aka.ms/download-and-install-Azure-Storage-Explorer). | ||
3. *Create or Choose a Blob Container:* Create a blob container from Azure Storage Explorer or use an existing one. | ||
<img src="./create-blob-container.png" width="600" /> | ||
4. *Set SAS URL Related Environment Variables in ".env" File:* Depending on the sample that you will run, you will need to set required environment variables in [.env](../notebooks/.env). There are two options to set up environment variables to utilize required Shared Access Signature (SAS) URL. | ||
- Option A - Generate a SAS URL manually on Azure Storage Explorer | ||
- Right-click on blob container and select the `Get Shared Access Signature...` in the menu. | ||
- Check the required permissions: `Read`, `Write` and `List` | ||
- We will need `Write` for uploading, modifying, or appending blobs | ||
- Click the `Create` button. | ||
<img src="./get-access-signature.png" height="600" /> <img src="./choose-signature-options.png" height="600" /> | ||
- *Copy the SAS URL:* After creating the SAS, click `Copy` to get the URL with token. This will be used as the value for **TRAINING_DATA_SAS_URL** or **REFERENCE_DOC_SAS_URL** when running the sample code. | ||
# Set Environment Variables for Training Data and Reference Documents in Pro Mode | ||
|
||
The folders [document_training](../data/document_training/) and [field_extraction_pro_mode](../data/field_extraction_pro_mode) contain manually labeled data used for training and reference documents in Pro mode as quick samples. Before using these knowledge source files, you need an Azure Storage blob container to store them. Follow the steps below to prepare your data environment: | ||
|
||
1. **Create an Azure Storage Account:** | ||
If you don’t already have one, follow the guide to [create an Azure Storage Account](https://aka.ms/create-a-storage-account). | ||
> If you already have an account, you can skip this step. | ||
|
||
2. **Install Azure Storage Explorer:** | ||
Azure Storage Explorer is a tool that simplifies working with Azure Storage data. Install it and log in with your credentials by following the [installation guide](https://aka.ms/download-and-install-Azure-Storage-Explorer). | ||
|
||
3. **Create or Choose a Blob Container:** | ||
Using Azure Storage Explorer, create a new blob container or select an existing one. | ||
<img src="./create-blob-container.png" width="600" /> | ||
|
||
4. **Set SAS URL-related Environment Variables in the `.env` File:** | ||
Depending on the sample you plan to run, configure the required environment variables in the [.env](../notebooks/.env) file. There are two options to set up environment variables that utilize the required Shared Access Signature (SAS) URL. | ||
|
||
- **Option A - Generate a SAS URL Manually via Azure Storage Explorer** | ||
- Right-click on the blob container and select **Get Shared Access Signature...** from the menu. | ||
- Select the permissions: **Read**, **Write**, and **List**. | ||
- Note: **Write** permission is required for uploading, modifying, or appending blobs. | ||
- Click the **Create** button. | ||
<img src="./get-access-signature.png" height="600" /> <img src="./choose-signature-options.png" height="600" /> | ||
- **Copy the SAS URL:** After creating the SAS, click **Copy** to get the URL with the token. This URL will be used as the value for either **TRAINING_DATA_SAS_URL** or **REFERENCE_DOC_SAS_URL** when running the sample code. | ||
<img src="./copy-access-signature.png" width="600" /> | ||
|
||
- Set the following in [.env](../notebooks/.env). | ||
> NOTE: **REFERENCE_DOC_SAS_URL** can be the same as the **TRAINING_DATA_SAS_URL** to re-use the same blob container | ||
- For [analyzer_training](../notebooks/analyzer_training.ipynb): Add the SAS URL as value of **TRAINIGN_DATA_SAS_URL**. | ||
- Set the following variables in the [.env](../notebooks/.env) file: | ||
> **Note:** The value for **REFERENCE_DOC_SAS_URL** can be the same as **TRAINING_DATA_SAS_URL** to reuse the same blob container. | ||
- For [analyzer_training](../notebooks/analyzer_training.ipynb): Add the SAS URL as the value of **TRAINING_DATA_SAS_URL**. | ||
```env | ||
TRAINING_DATA_SAS_URL=<Blob container SAS URL> | ||
``` | ||
- For [field_extraction_pro_mode](../notebooks/field_extraction_pro_mode.ipynb): Add the SAS URL as value of **REFERENCE_DOC_SAS_URL**. | ||
- For [field_extraction_pro_mode](../notebooks/field_extraction_pro_mode.ipynb): Add the SAS URL as the value of **REFERENCE_DOC_SAS_URL**. | ||
```env | ||
REFERENCE_DOC_SAS_URL=<Blob container SAS URL> | ||
``` | ||
- Option B - Auto-generate the SAS URL via code in sample notebooks | ||
- Instead of manually creating a SAS URL, you can set storage account and container information, and let the code generate a temporary SAS URL at runtime. | ||
> NOTE: **TRAINING_DATA_STORAGE_ACCOUNT_NAME** and **TRAINING_DATA_CONTAINER_NAME** can be the same as the **REFERENCE_DOC_STORAGE_ACCOUNT_NAME** and **REFERENCE_DOC_CONTAINER_NAME** to re-use the same blob container | ||
- For [analyzer_training](../notebooks/analyzer_training.ipynb): Add the storage account name as `TRAINING_DATA_STORAGE_ACCOUNT_NAME` and the container name under that storage account as `TRAINING_DATA_CONTAINER_NAME`. | ||
|
||
- **Option B - Auto-generate the SAS URL via Code in Sample Notebooks** | ||
- Instead of manually creating a SAS URL, you can specify the storage account and container information and let the code generate a temporary SAS URL at runtime. | ||
> **Note:** **TRAINING_DATA_STORAGE_ACCOUNT_NAME** and **TRAINING_DATA_CONTAINER_NAME** can be the same as **REFERENCE_DOC_STORAGE_ACCOUNT_NAME** and **REFERENCE_DOC_CONTAINER_NAME** to reuse the same blob container. | ||
- For [analyzer_training](../notebooks/analyzer_training.ipynb): Add the storage account name as `TRAINING_DATA_STORAGE_ACCOUNT_NAME` and the container name under that storage account as `TRAINING_DATA_CONTAINER_NAME`. | ||
```env | ||
TRAINING_DATA_STORAGE_ACCOUNT_NAME=<your-storage-account-name> | ||
TRAINING_DATA_CONTAINER_NAME=<your-container-name> | ||
``` | ||
- For [field_extraction_pro_mode](../notebooks/field_extraction_pro_mode.ipynb): Add the storage account name as `REFERENCE_DOC_STORAGE_ACCOUNT_NAME` and the container name under that storage account as `REFERENCE_DOC_CONTAINER_NAME`. | ||
- For [field_extraction_pro_mode](../notebooks/field_extraction_pro_mode.ipynb): Add the storage account name as `REFERENCE_DOC_STORAGE_ACCOUNT_NAME` and the container name under that storage account as `REFERENCE_DOC_CONTAINER_NAME`. | ||
```env | ||
REFERENCE_DOC_STORAGE_ACCOUNT_NAME=<your-storage-account-name> | ||
REFERENCE_DOC_CONTAINER_NAME=<your-container-name> | ||
``` | ||
|
||
5. *Set Folder Prefix in ".env" File:* Depending on the sample that you will run, you will need to set required environment variables in [.env](../notebooks/.env). | ||
- For [analyzer_training](../notebooks/analyzer_training.ipynb): Add a prefix for **TRAINING_DATA_PATH**. You can choose any folder name you like for **TRAINING_DATA_PATH**. For example, you could use "training_files". | ||
5. **Set Folder Prefixes in the `.env` File:** | ||
Depending on the sample you will run, set the required environment variables in the [.env](../notebooks/.env) file. | ||
|
||
- For [analyzer_training](../notebooks/analyzer_training.ipynb): Add a prefix for **TRAINING_DATA_PATH**. You can choose any folder name within the blob container. For example, use `training_files`. | ||
```env | ||
TRAINING_DATA_PATH=<Designated folder path under the blob container> | ||
``` | ||
- For [field_extraction_pro_mode](../notebooks/field_extraction_pro_mode.ipynb): Add a prefix for **REFERENCE_DOC_PATH**. You can choose any folder name you like for **REFERENCE_DOC_PATH**. For example, you could use "reference_docs". | ||
- For [field_extraction_pro_mode](../notebooks/field_extraction_pro_mode.ipynb): Add a prefix for **REFERENCE_DOC_PATH**. You can choose any folder name within the blob container. For example, use `reference_docs`. | ||
```env | ||
REFERENCE_DOC_PATH=<Designated folder path under the blob container> | ||
``` | ||
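The variables from steps 4 and 5 can be checked before running a notebook. The following is a minimal sketch, not code from the sample notebooks: the helper name `resolve_blob_settings` is hypothetical, and the rule that a SAS URL (Option A) takes precedence over account and container names (Option B) is an assumption of this sketch.

```python
import os


def resolve_blob_settings(prefix_name, env=None):
    """Work out which option is configured for one sample.

    prefix_name is "TRAINING_DATA" or "REFERENCE_DOC".
    Hypothetical helper; the precedence of Option A over Option B
    is an assumption, not documented behavior of the notebooks.
    """
    env = os.environ if env is None else env
    sas_url = env.get(f"{prefix_name}_SAS_URL", "").strip()
    account = env.get(f"{prefix_name}_STORAGE_ACCOUNT_NAME", "").strip()
    container = env.get(f"{prefix_name}_CONTAINER_NAME", "").strip()
    path = env.get(f"{prefix_name}_PATH", "").strip()

    # Step 5: a folder prefix is needed in both options.
    if not path:
        raise ValueError(f"{prefix_name}_PATH is not set")

    # Option A: a manually generated SAS URL.
    if sas_url:
        return {"mode": "sas_url", "sas_url": sas_url, "path": path}

    # Option B: account and container names, SAS generated at runtime.
    if account and container:
        return {
            "mode": "generate_sas",
            "account": account,
            "container": container,
            "path": path,
        }

    raise ValueError(
        f"Set {prefix_name}_SAS_URL, or both "
        f"{prefix_name}_STORAGE_ACCOUNT_NAME and {prefix_name}_CONTAINER_NAME"
    )
```

Passing a plain dictionary as `env` makes the check easy to try without touching the real environment; with no argument it reads `os.environ`, which is where a loaded `.env` file typically ends up.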
|
||
Now, we have completed the preparation of the data environment. Next, we could create an analyzer through code. | ||
|
||
|
||
Once these steps are completed, your data environment is ready. You can proceed to create an analyzer through code. | ||
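For Option A, it can help to confirm that a copied SAS URL actually carries the **Read**, **Write**, and **List** permissions. The sketch below assumes the standard SAS query parameter `sp` (signed permissions) with the letters `r`, `w`, and `l`; the function name `missing_sas_permissions` is hypothetical, and the check only inspects the URL text, so it does not verify the signature or the expiry time.

```python
from urllib.parse import parse_qs, urlsplit

# Permission letters expected in the `sp` parameter of the SAS token.
REQUIRED_PERMISSIONS = {"r": "Read", "w": "Write", "l": "List"}


def missing_sas_permissions(sas_url):
    """Return the names of required permissions absent from a SAS URL."""
    query = parse_qs(urlsplit(sas_url).query)
    # `sp` holds one letter per granted permission, for example "rwl".
    granted = set(query.get("sp", [""])[0])
    return [
        name
        for code, name in REQUIRED_PERMISSIONS.items()
        if code not in granted
    ]
```

An empty list means all three permissions are present; any names returned indicate that the SAS should be regenerated with those boxes checked.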
categories: [Grammar, Clarity, Consistency, Formatting]
categories: [Grammar, Clarity]
categories: [Formatting, Consistency]
categories: [Clarity]
categories: [Clarity, Formatting]
categories: [Typo Fix, Consistency]
categories: [Formatting]