|
1 | | -# PDFsLayouts-Processing-Fapp-DocIntelligence |
2 | | -Example of how to create to extract tables, check boxes, and text form forms and documents from an Azure Storage Account, process them using Azure Document Intelligence, and store the results in Cosmos DB for further analysis. |
| 1 | +# Demo: PDF Layout Extraction with Doc Intelligence (full-code approach) |
| 2 | + |
| 3 | +`Azure Storage + Document Intelligence + Function App + Cosmos DB` |
| 4 | + |
| 5 | +Costa Rica |
| 6 | + |
| 7 | +[](https://github.com) |
| 8 | +[](https://github.com/) |
| 9 | +[brown9804](https://github.com/brown9804) |
| 10 | + |
| 11 | +Last updated: 2025-06-03 |
| 12 | + |
| 13 | +---------- |
| 14 | + |
| 15 | +> [!IMPORTANT] |
| 16 | +> This example `uses public network access and is intended for demonstration purposes only`. It shows how several Azure resources can work together to extract and store document layout data. Before deploying, review the section below on [Important Considerations for Production Environment](#important-considerations-for-production-environment). Please note that `these demos are intended as a guide and are based on my personal experiences. For official guidance, support, or more detailed information, please refer to Microsoft's official documentation or contact Microsoft directly`: [Microsoft Sales and Support](https://support.microsoft.com/contactus?ContactUsExperienceEntryPointAssetId=S.HP.SMC-HOME) |
| 17 | +
|
| 18 | +<details> |
| 19 | +<summary><b>List of References</b> (Click to expand)</summary> |
| 20 | + |
| 21 | +- [Azure AI Document Intelligence documentation](https://learn.microsoft.com/en-us/azure/ai-services/document-intelligence/?view=doc-intel-4.0.0) |
| 22 | +- [Get started with the Document Intelligence Sample Labeling tool](https://learn.microsoft.com/en-us/azure/ai-services/document-intelligence/v21/try-sample-label-tool?view=doc-intel-2.1.0#prerequisites-for-training-a-custom-form-model) |
| 23 | +- [Document Intelligence Sample Labeling tool](https://fott-2-1.azurewebsites.net/) |
| 24 | +- [Assign an Azure role for access to blob data](https://learn.microsoft.com/en-us/azure/storage/blobs/assign-azure-role-data-access?tabs=portal) |
| 25 | +- [Build and train a custom extraction model](https://learn.microsoft.com/en-us/azure/ai-services/document-intelligence/how-to-guides/build-a-custom-model?view=doc-intel-2.1.0) |
| 26 | +- [Compose custom models - Document Intelligence](https://learn.microsoft.com/en-us/azure/ai-services/document-intelligence/how-to-guides/compose-custom-models?view=doc-intel-2.1.0&tabs=studio) |
| 27 | +- [Deploy the Sample Labeling tool](https://learn.microsoft.com/en-us/azure/ai-services/document-intelligence/v21/deploy-label-tool?view=doc-intel-2.1.0) |
| 28 | +- [Train a custom model using the Sample Labeling tool](https://learn.microsoft.com/en-us/azure/ai-services/document-intelligence/v21/label-tool?view=doc-intel-2.1.0) |
| 29 | +- [Train models with the sample-labeling tool](https://learn.microsoft.com/en-us/azure/ai-services/document-intelligence/v21/supervised-table-tags?view=doc-intel-2.1.0) |
| 30 | +- [Azure Cosmos DB - Database for the AI Era](https://learn.microsoft.com/en-us/azure/cosmos-db/introduction) |
| 31 | +- [Consistency levels in Azure Cosmos DB](https://learn.microsoft.com/en-us/azure/cosmos-db/consistency-levels) |
| 32 | +- [Azure Cosmos DB SQL API client library for Python](https://learn.microsoft.com/en-us/python/api/overview/azure/cosmos-readme?view=azure-python) |
| 33 | +- [CosmosClient class documentation](https://learn.microsoft.com/en-us/python/api/azure-cosmos/azure.cosmos.cosmos_client.cosmosclient?view=azure-python) |
| 34 | +- [Cosmos AAD Authentication](https://learn.microsoft.com/en-us/python/api/overview/azure/cosmos-readme?view=azure-python#aad-authentication) |
| 35 | +- [Cosmos python examples](https://learn.microsoft.com/en-us/python/api/overview/azure/cosmos-readme?view=azure-python#examples) |
| 36 | +- [Use control plane role-based access control with Azure Cosmos DB for NoSQL](https://learn.microsoft.com/en-us/azure/cosmos-db/nosql/security/how-to-grant-control-plane-role-based-access?tabs=built-in-definition%2Ccsharp&pivots=azure-interface-portal) |
| 37 | +- [Use data plane role-based access control with Azure Cosmos DB for NoSQL](https://learn.microsoft.com/en-us/azure/cosmos-db/nosql/security/how-to-grant-data-plane-role-based-access?tabs=built-in-definition%2Ccsharp&pivots=azure-interface-cli) |
| 38 | +- [Create or update Azure custom roles using Azure CLI](https://learn.microsoft.com/en-us/azure/role-based-access-control/custom-roles-cli) |
| 39 | + |
| 40 | +</details> |
| 41 | + |
| 42 | +> How to extract layout elements from PDFs stored in an Azure Storage Account, process them using Azure Document Intelligence, and store the results in Cosmos DB for further analysis. |
| 43 | +> |
| 44 | +> 1. Upload your PDFs to an Azure Blob Storage container. <br/> |
| 45 | +> 2. An Azure Function is triggered by the upload, which calls the Azure Document Intelligence Layout API to analyze the document structure. <br/> |
| 46 | +> 3. The extracted layout data (such as tables, checkboxes, and text) is parsed and stored in a Cosmos DB database, automating the workflow from document upload to data storage. |
| 47 | +
|
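
The parsing in step 3 can be sketched in plain Python. This is a minimal illustration, not the repository's implementation: the function names `table_to_rows` and `build_cosmos_item` and the output field names are hypothetical, and the input is assumed to be a dictionary shaped like the Layout model's JSON result (`pages` with `lines` and `selectionMarks`, plus `tables` with flat `cells`). The calls to Document Intelligence and Cosmos DB are left out on purpose.

```python
import json
import uuid


def table_to_rows(table):
    """Rebuild a 2-D grid of cell text from the flat list of table cells."""
    grid = [["" for _ in range(table["columnCount"])]
            for _ in range(table["rowCount"])]
    for cell in table["cells"]:
        grid[cell["rowIndex"]][cell["columnIndex"]] = cell.get("content", "")
    return grid


def build_cosmos_item(blob_name, analyze_result):
    """Flatten an (assumed) layout result into one JSON-serializable item."""
    pages = analyze_result.get("pages", [])
    return {
        # Cosmos DB items need a string "id"; a random UUID is one option.
        "id": str(uuid.uuid4()),
        "source": blob_name,
        "text": [line["content"]
                 for page in pages
                 for line in page.get("lines", [])],
        "checkboxes": [
            {"page": page.get("pageNumber"),
             "selected": mark.get("state") == "selected"}
            for page in pages
            for mark in page.get("selectionMarks", [])
        ],
        "tables": [table_to_rows(t) for t in analyze_result.get("tables", [])],
    }


if __name__ == "__main__":
    # Hand-written sample input, not real service output.
    sample = {
        "pages": [{
            "pageNumber": 1,
            "lines": [{"content": "Order form"}],
            "selectionMarks": [{"state": "selected"}],
        }],
        "tables": [{
            "rowCount": 1,
            "columnCount": 2,
            "cells": [
                {"rowIndex": 0, "columnIndex": 0, "content": "Item"},
                {"rowIndex": 0, "columnIndex": 1, "content": "Qty"},
            ],
        }],
    }
    print(json.dumps(build_cosmos_item("order.pdf", sample), indent=2))
```

Keeping this transformation free of SDK calls makes it easy to unit test; the Function App would pass the resulting dictionary to whichever Cosmos DB client call it uses to write items.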
| 48 | +> [!NOTE] |
| 49 | +> Advantages of Document Intelligence for organizations handling large volumes of documents: <br/> |
| 50 | +> |
| 51 | +> - Utilizes natural language processing, computer vision, deep learning, and machine learning. <br/> |
| 52 | +> - Handles structured, semi-structured, and unstructured documents. <br/> |
| 53 | +> - Automates the extraction and transformation of layout data into usable formats like JSON or CSV. |
| 54 | +
|
| 55 | +<div align="center"> |
| 56 | + <img src="https://github.com/user-attachments/assets/21ec5d04-1c9b-4273-ad98-7b46186de78e" alt="Centered Image" style="border: 2px solid #4CAF50; border-radius: 5px; padding: 5px;"/> |
| 57 | +</div> |
| 58 | + |
| 59 | + |
| 60 | +## Important Considerations for Production Environment |
| 61 | + |
| 62 | +<details> |
| 63 | + <summary>Private Network Configuration</summary> |
| 64 | + |
| 65 | + > For enhanced security, consider configuring your Azure resources to operate within a private network. This can be achieved using Azure Virtual Network (VNet) to isolate your resources and control inbound and outbound traffic. Implementing private endpoints for services like Azure Blob Storage and Azure Functions can further secure your data by restricting access to your VNet. |
| 66 | +
|
| 67 | +</details> |
| 68 | + |
| 69 | +<details> |
| 70 | + <summary>Security</summary> |
| 71 | + |
| 72 | + > Ensure that you implement appropriate security measures when deploying this solution in a production environment. This includes: <br/> |
| 73 | + > |
| 74 | + > - Securing Access: Use Microsoft Entra ID (formerly Azure Active Directory, or Azure AD) for authentication and role-based access control (RBAC) to manage permissions. <br/> |
| 75 | + > - Managing Secrets: Store sensitive information such as connection strings and API keys in Azure Key Vault. <br/> |
| 76 | + > - Data Encryption: Enable encryption for data at rest and in transit to protect sensitive information. |
| 77 | +
|
| 78 | +</details> |
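
As a rough illustration of the secrets guidance above: Function App application settings are exposed to the code as environment variables, and a setting can point at a Key Vault secret through a Key Vault reference. The sketch below validates such settings at startup. The setting names and the `load_settings` helper are hypothetical, and the check for an unresolved reference assumes the platform leaves the literal `@Microsoft.KeyVault(...)` string in place when resolution fails.

```python
import os

# Hypothetical setting names; use whatever your Function App defines.
REQUIRED_SETTINGS = ("COSMOS_ENDPOINT", "DOCINTEL_ENDPOINT", "STORAGE_ACCOUNT_URL")


def load_settings(env=None):
    """Return required settings, failing fast on missing or unresolved values."""
    env = os.environ if env is None else env
    settings = {}
    problems = []
    for name in REQUIRED_SETTINGS:
        value = env.get(name, "").strip()
        if not value:
            problems.append(f"{name} is not set")
        elif value.startswith("@Microsoft.KeyVault("):
            # Assumption: an unresolved reference shows up as its literal text.
            problems.append(f"{name} is an unresolved Key Vault reference")
        else:
            settings[name] = value
    if problems:
        raise RuntimeError("; ".join(problems))
    return settings
```

Failing at startup with a clear message is usually easier to diagnose than an authentication error surfacing later, in the middle of processing a document.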
| 79 | + |
| 80 | +<details> |
| 81 | + <summary>Scalability</summary> |
| 82 | + |
| 83 | + > While this example provides a basic setup, you may need to scale the resources based on your specific requirements. Azure services offer various scaling options to handle increased workloads. Consider using: <br/> |
| 84 | + > |
| 85 | + > - Auto-scaling: Configure auto-scaling for Azure Functions and other services to automatically adjust based on demand. <br/> |
| 86 | + > - Load Balancing: Use Azure Load Balancer or Application Gateway to distribute traffic and ensure high availability. |
| 87 | +
|
| 88 | +</details> |
| 89 | + |
| 90 | +<details> |
| 91 | + <summary>Cost Management</summary> |
| 92 | + |
| 93 | + > Monitor and manage the costs associated with your Azure resources. Use Azure Cost Management and Billing to track usage and optimize resource allocation. |
| 94 | +
|
| 95 | +</details> |
| 96 | + |
| 97 | +<details> |
| 98 | + <summary>Compliance</summary> |
| 99 | + |
| 100 | + > Ensure that your deployment complies with relevant regulations and standards. Use Azure Policy to enforce compliance and governance policies across your resources. |
| 101 | +</details> |
| 102 | +
|
| 103 | +<details> |
| 104 | + <summary>Disaster Recovery</summary> |
| 105 | + |
| 106 | + > Implement a disaster recovery plan to ensure business continuity in case of failures. Use Azure Site Recovery and backup solutions to protect your data and applications. |
| 107 | +
|
| 108 | +</details> |
| 109 | + |
| 110 | +<div align="center"> |
| 111 | + <h3 style="color: #4CAF50;">Total Visitors</h3> |
| 112 | + <img src="https://profile-counter.glitch.me/brown9804/count.svg" alt="Visitor Count" style="border: 2px solid #4CAF50; border-radius: 5px; padding: 5px;"/> |
| 113 | +</div> |