# Set Environment Variables for Training Data and Reference Documents in Pro Mode

The folders [document_training](../data/document_training/) and [field_extraction_pro_mode](../data/field_extraction_pro_mode) contain manually labeled data used for training and reference documents in Pro mode as quick samples. Before using these knowledge source files, you need an Azure Storage blob container to store them. Follow the steps below to prepare your data environment:
1. **Create an Azure Storage Account:**
   If you don’t already have one, follow the guide to [create an Azure Storage Account](https://aka.ms/create-a-storage-account).
   > If you already have an account, you can skip this step.

2. **Install Azure Storage Explorer:**
   Azure Storage Explorer is a tool that simplifies working with Azure Storage data. Install it and log in with your credentials by following the [installation guide](https://aka.ms/download-and-install-Azure-Storage-Explorer).

3. **Create or Choose a Blob Container:**
   Using Azure Storage Explorer, create a new blob container or select an existing one.

   <img src="./create-blob-container.png" width="600" />

4. **Set SAS URL-related Environment Variables in the `.env` File:**
   Depending on the sample you plan to run, configure the required environment variables in the [.env](../notebooks/.env) file. There are two options for setting up the environment variables that provide the required Shared Access Signature (SAS) URL.
   - **Option A - Generate a SAS URL Manually via Azure Storage Explorer**
     - Right-click on the blob container and select **Get Shared Access Signature...** from the menu.
     - Select the permissions: **Read**, **Write**, and **List**.
       - Note: **Write** permission is required for uploading, modifying, or appending blobs.
     - Click the **Create** button.

       <img src="./get-access-signature.png" height="600" /> <img src="./choose-signature-options.png" height="600" />
     - **Copy the SAS URL:** After creating the SAS, click **Copy** to get the URL with the token. This URL will be used as the value for either **TRAINING_DATA_SAS_URL** or **REFERENCE_DOC_SAS_URL** when running the sample code.

       <img src="./copy-access-signature.png" width="600" />
     - Set the following variables in the [.env](../notebooks/.env) file:
       > **Note:** The value for **REFERENCE_DOC_SAS_URL** can be the same as **TRAINING_DATA_SAS_URL** to reuse the same blob container.
       - For [analyzer_training](../notebooks/analyzer_training.ipynb): Add the SAS URL as the value of **TRAINING_DATA_SAS_URL**.
         ```env
         TRAINING_DATA_SAS_URL=<Blob container SAS URL>
         ```
       - For [field_extraction_pro_mode](../notebooks/field_extraction_pro_mode.ipynb): Add the SAS URL as the value of **REFERENCE_DOC_SAS_URL**.
         ```env
         REFERENCE_DOC_SAS_URL=<Blob container SAS URL>
         ```
   - **Option B - Auto-generate the SAS URL via Code in Sample Notebooks**
     - Instead of manually creating a SAS URL, you can specify the storage account and container information and let the code generate a temporary SAS URL at runtime.
       > **Note:** **TRAINING_DATA_STORAGE_ACCOUNT_NAME** and **TRAINING_DATA_CONTAINER_NAME** can be the same as **REFERENCE_DOC_STORAGE_ACCOUNT_NAME** and **REFERENCE_DOC_CONTAINER_NAME** to reuse the same blob container.
     - For [analyzer_training](../notebooks/analyzer_training.ipynb): Add the storage account name as `TRAINING_DATA_STORAGE_ACCOUNT_NAME` and the container name under that storage account as `TRAINING_DATA_CONTAINER_NAME`.
       ```env
       TRAINING_DATA_STORAGE_ACCOUNT_NAME=<your-storage-account-name>
       TRAINING_DATA_CONTAINER_NAME=<your-container-name>
       ```
     - For [field_extraction_pro_mode](../notebooks/field_extraction_pro_mode.ipynb): Add the storage account name as `REFERENCE_DOC_STORAGE_ACCOUNT_NAME` and the container name under that storage account as `REFERENCE_DOC_CONTAINER_NAME`.
       ```env
       REFERENCE_DOC_STORAGE_ACCOUNT_NAME=<your-storage-account-name>
       REFERENCE_DOC_CONTAINER_NAME=<your-container-name>
       ```
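
The idea behind Option B can be sketched as follows. This snippet only illustrates the first half of that flow: it reads the two names from the environment and builds the container's base URL, to which a runtime-generated SAS token would be appended as a query string. The account and container names are placeholders, and the actual SAS-token generation performed by the notebook code is omitted.

```python
import os

# Placeholder values standing in for what you would set in .env (Option B);
# "mystorageacct" and "training-data" are not real resources.
os.environ["TRAINING_DATA_STORAGE_ACCOUNT_NAME"] = "mystorageacct"
os.environ["TRAINING_DATA_CONTAINER_NAME"] = "training-data"

def container_url(account: str, container: str) -> str:
    # Base URL of the blob container; a runtime-generated SAS token would be
    # appended to this URL as a query string.
    return f"https://{account}.blob.core.windows.net/{container}"

url = container_url(
    os.environ["TRAINING_DATA_STORAGE_ACCOUNT_NAME"],
    os.environ["TRAINING_DATA_CONTAINER_NAME"],
)
print(url)  # https://mystorageacct.blob.core.windows.net/training-data
```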

5. **Set Folder Prefixes in the `.env` File:**
   Depending on the sample you will run, set the required environment variables in the [.env](../notebooks/.env) file.

   - For [analyzer_training](../notebooks/analyzer_training.ipynb): Add a prefix for **TRAINING_DATA_PATH**. You can choose any folder name within the blob container. For example, use `training_files`.
     ```env
     TRAINING_DATA_PATH=<Designated folder path under the blob container>
     ```
   - For [field_extraction_pro_mode](../notebooks/field_extraction_pro_mode.ipynb): Add a prefix for **REFERENCE_DOC_PATH**. You can choose any folder name within the blob container. For example, use `reference_docs`.
     ```env
     REFERENCE_DOC_PATH=<Designated folder path under the blob container>
     ```
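
To illustrate how the folder prefix relates to the container SAS URL, the hypothetical helper below inserts the prefix and a blob name into the URL path while keeping the SAS token query string intact. The URL, prefix, and file name are made up for the example.

```python
from urllib.parse import urlsplit, urlunsplit

def blob_sas_url(container_sas_url: str, prefix: str, blob_name: str) -> str:
    # Insert "<prefix>/<blob_name>" into the container SAS URL's path while
    # preserving the SAS token carried in the query string.
    scheme, netloc, path, query, fragment = urlsplit(container_sas_url)
    path = path.rstrip("/") + "/" + prefix.strip("/") + "/" + blob_name
    return urlunsplit((scheme, netloc, path, query, fragment))

# Placeholder SAS URL and names, for illustration only.
example = blob_sas_url(
    "https://mystorageacct.blob.core.windows.net/training-data?sv=2024&sig=abc",
    "training_files",
    "doc1.pdf",
)
print(example)
# https://mystorageacct.blob.core.windows.net/training-data/training_files/doc1.pdf?sv=2024&sig=abc
```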

Once these steps are completed, your data environment is ready. You can proceed to create an analyzer through code.
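
As a minimal illustration of consuming these settings, the sketch below parses a throwaway `.env` file using only the standard library; the sample notebooks would typically rely on a dedicated loader such as python-dotenv instead, and the values here are placeholders.

```python
import tempfile
from pathlib import Path

def load_env(path: str) -> dict:
    # Minimal .env parser for illustration only: skips blanks, comments,
    # and lines without "=", and splits on the first "=".
    env = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env

# Write a throwaway .env file with placeholder values for the demo.
with tempfile.NamedTemporaryFile("w", suffix=".env", delete=False) as f:
    f.write("TRAINING_DATA_SAS_URL=https://example.invalid/container?sig=abc\n")
    f.write("TRAINING_DATA_PATH=training_files\n")

settings = load_env(f.name)
print(settings["TRAINING_DATA_PATH"])  # training_files
```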