GenAI IDP Accelerator for AWS CDK

Compatible with version: 0.3.18

A modular AWS CDK implementation of the GenAI Intelligent Document Processing (IDP) Accelerator, designed to transform unstructured documents into structured data at scale using AWS's latest AI/ML services.

Overview

This project provides the GenAI Intelligent Document Processing Accelerator as a set of composable AWS CDK packages, allowing more flexible deployment, customization, and integration.

Repository Structure

Packages

  • @cdklabs/genai-idp - Core building blocks for document processing infrastructure
  • @cdklabs/genai-idp-bda-processor - Pattern 1 implementation using Amazon Bedrock Data Automation
  • @cdklabs/genai-idp-bedrock-llm-processor - Pattern 2 implementation for custom extraction using Amazon Bedrock models
  • @cdklabs/genai-idp-sagemaker-udop-processor - Pattern 3 implementation for specialized document processing using an Amazon SageMaker endpoint

Samples

  • sample-bda-lending - Complete Pattern 1 implementation for processing lending documents using Amazon Bedrock Data Automation
  • sample-bedrock - Pattern 2 demonstration using custom extraction with Amazon Bedrock foundation models
  • sample-sagemaker-udop-rvl-cdip - Pattern 3 implementation using a Hugging Face model fine-tuned on the RVL-CDIP dataset, served from Amazon SageMaker

Key Features

  • Modular CDK Architecture: Organized as reusable CDK constructs that can be composed into complete solutions
  • Multiple Processing Patterns: Pre-built document processing patterns for different use cases
  • Serverless Design: Built on AWS Lambda, Step Functions, SQS, and other serverless technologies
  • AI-Powered Document Processing: Leverages Amazon Bedrock, Textract, and other AWS AI services
  • Web User Interface: Optional secure web interface for document tracking and management
  • Document Knowledge Base: Query processed documents using natural language

Prerequisites

  • NVM (Node Version Manager)
  • Yarn for Node.js package management
  • Docker CLI (can be Docker Desktop or Rancher Desktop)
  • rsync for copying assets to packages
  • Python for building Python GenAI IDP distributable packages
  • .NET SDK for building .NET GenAI IDP distributable packages
  • AWS CLI configured with appropriate credentials
  • AWS CDK CLI (npm install -g aws-cdk)
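A quick way to confirm that the command-line prerequisites above are installed is a small POSIX shell check. This is a sketch, not a script shipped with the repository: the binary names (`node`, `python3`, `dotnet`, `aws`, `cdk`, and so on) are assumptions about how each tool is exposed on `PATH` and may differ on your system (for example `python` instead of `python3`). NVM is normally a shell function loaded from your shell profile rather than an executable, so it is not checked here; run `nvm --version` in an interactive shell instead.

```shell
#!/usr/bin/env sh
# Sketch: report which prerequisite tools are missing from PATH.
# The tool names below are assumptions; adjust them for your system.

# Succeed if the named command (binary, builtin, or function) is available.
have() {
  command -v "$1" >/dev/null 2>&1
}

# Print each missing tool and return how many were missing (0 = all present).
check_prereqs() {
  missing=0
  for tool in "$@"; do
    if ! have "$tool"; then
      echo "missing: $tool"
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}

# NVM is usually a shell function, so it is deliberately not in this list.
if check_prereqs node yarn docker rsync python3 dotnet aws cdk; then
  echo "All listed prerequisites were found."
else
  echo "Install the missing tools listed above."
fi
```

The check only confirms that each tool is on `PATH`; it does not verify versions, Docker daemon status, or AWS credentials.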

Getting Started

Environment Setup

  1. Set up the correct Node.js version using NVM:
     # Install the Node.js version specified in .nvmrc
     nvm install

     # Switch to the project's Node.js version
     nvm use
  2. Install Yarn globally (if it is not already installed):
     npm i -g yarn
  3. Install the project dependencies:
     yarn install

Project Setup

  1. Ensure Docker is running and rsync is available.
  2. (Re)scaffold the project:
     yarn projen
  3. Build the packages:
     yarn build

Note: The first build can take a while.
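Step 1 of the project setup can be scripted as a preflight check, so that a missing Docker daemon or rsync is reported before the slow first build starts. This is a sketch rather than part of the repository; it assumes `docker info` succeeds only when the `docker` CLI can reach a running engine (Docker Desktop, or Rancher Desktop with its Docker-compatible engine), and the file name `preflight.sh` and the function names are hypothetical.

```shell
#!/usr/bin/env sh
# Sketch: preflight check before "yarn projen" and "yarn build".
# Assumes "docker info" succeeds only when a Docker daemon is reachable.

# Succeed only if the Docker CLI exists and can reach a running daemon.
docker_running() {
  command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1
}

# Succeed only if rsync is on PATH.
rsync_available() {
  command -v rsync >/dev/null 2>&1
}

# Run both checks, report every failure on stderr, return non-zero if any failed.
preflight() {
  failed=0
  if ! docker_running; then
    echo "Docker is not running; start Docker Desktop or Rancher Desktop." >&2
    failed=1
  fi
  if ! rsync_available; then
    echo "rsync was not found on PATH." >&2
    failed=1
  fi
  return "$failed"
}
```

For example, from the repository root, `. ./preflight.sh && preflight && yarn projen && yarn build` starts the scaffold and build only after both checks pass.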

License

This project is licensed under the terms specified in the LICENSE file.

Contributing

We welcome contributions! Please see our Contributing Guidelines for details on how to get started, development workflow, and coding standards.
