Automated resume parsing using Amazon Bedrock Data Automation with hierarchical data extraction and real-time S3 event triggers.
Real-Time Event-Driven Processing:
- S3 Upload → Automatic trigger via S3 Event Notifications
- Lambda Processing → BDA hierarchical blueprint extraction
- Structured Output → Organized JSON results saved to S3
- Error Handling → DLQ + monitoring for failures
A more detailed description is available in the architecture/ directory.
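The trigger wiring itself is a plain S3-to-Lambda event notification with a dead letter queue attached. Below is a minimal CDK (Python) sketch of that wiring; the construct IDs, runtime, and timeout are illustrative assumptions, and the real definitions live in infrastructure/stacks/bda_stack.py:

```python
from aws_cdk import Duration, aws_lambda as _lambda, aws_s3 as s3, aws_s3_notifications as s3n, aws_sqs as sqs


def wire_resume_pipeline(scope) -> None:
    """Illustrative wiring only -- the actual resources are defined in stacks/bda_stack.py."""
    bucket = s3.Bucket(scope, "ResumeBucket")   # holds input/ and output/
    dlq = sqs.Queue(scope, "ProcessorDLQ")      # failed invocations end up here

    processor = _lambda.Function(
        scope, "BDAProcessorFunction",
        runtime=_lambda.Runtime.PYTHON_3_12,
        handler="handler.lambda_handler",
        code=_lambda.Code.from_asset("../lambda"),
        timeout=Duration.minutes(5),
        dead_letter_queue=dlq,
    )

    # Invoke the Lambda for every object created under input/
    bucket.add_event_notification(
        s3.EventType.OBJECT_CREATED,
        s3n.LambdaDestination(processor),
        s3.NotificationKeyFilter(prefix="input/"),
    )
```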
Key Benefits:
- ✅ Hierarchical Data Structure - Organized sections (Personal, Education, Experience, Skills)
- ✅ Sub-second triggers - Immediate processing on upload
- ✅ Auto-scaling - Handles multiple resumes concurrently
- ✅ Infrastructure as Code - Complete CDK deployment with blueprints
- ✅ Development Workflow - DEV → LIVE promotion for safe deployments
# Deploy everything with CDK (blueprint, project, S3, Lambda, etc.)
cd infrastructure
uv sync
uv run cdk bootstrap # One-time setup
uv run cdk deploy
# Note: This creates blueprint and project in DEVELOPMENT stage

# 🚀 Launch interactive menu
./scripts/bda_workflow.sh data/sample_resume.pdf
# Choose from menu:
# Option 3: Production processing with blueprint stage verification
# Option 5: Full DEV→LIVE workflow automation
# Option 1-2: Individual development and promotion steps
# Option 4: LIVE stage testing

bda_usecases/
├── infrastructure/ # CDK Infrastructure as Code
│ ├── app.py # CDK app entry point
│ ├── stacks/bda_stack.py # S3, Lambda, IAM resources
│ └── pyproject.toml # CDK dependencies
├── lambda/ # Lambda function code
│ ├── handler.py # S3 event processor
│ ├── bda_parser/ # Core BDA processing module
│ │ ├── bda_client.py # BDA client with blueprint processing
│ │ ├── blueprint_schema.json # Custom extraction schema
│ │ └── config.py # AWS configuration
│ └── pyproject.toml # Runtime dependencies
├── cli/ # Setup & testing tools
│ ├── promote_blueprint.py # Promote DEV → LIVE stage
├── scripts/ # Automation scripts
│ ├── bda_workflow.sh # 🚀 Unified processing & workflow automation
│ └── cleanup.sh # Infrastructure cleanup script
├── data/
│ └── sample_resume.pdf # Test document
└── architecture/ # Architecture diagrams and documentation
├── README.md # Detailed architecture overview
├── bda_architecture.png # System architecture diagram
└── bda_workflow.png # Development workflow diagram
Resume Upload → s3://bucket/input/resume.pdf
↓ (S3 Event Notification)
Lambda Function Invoked
# Lambda receives S3 event
{
"Records": [{
"s3": {
"bucket": {"name": "bda-resume-bucket"},
"object": {"key": "input/resume.pdf"}
}
}]
}
# Processes with BDA blueprint
# Saves results to output/ prefix

s3://bucket/
├── input/
│ └── 20240923_112801_resume.pdf # Uploaded resume
└── output/
└── 20240923_112801_resume/ # Processing results
└── 0/custom_output/0/
└── result.json # Structured data
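Inside the Lambda, the handler only needs to pull the bucket and key out of the S3 event and hand the document to BDA. The following is a rough boto3 sketch of that flow; the environment variable names and the exact invoke_data_automation_async parameters are assumptions (they may vary with boto3 version), and the repo's actual logic lives in lambda/handler.py and lambda/bda_parser/bda_client.py:

```python
import os
import urllib.parse

import boto3

bda_runtime = boto3.client("bedrock-data-automation-runtime")


def lambda_handler(event, context):
    """Triggered by ObjectCreated events on the bucket's input/ prefix."""
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])  # e.g. input/20240923_112801_resume.pdf
        stem = key.split("/")[-1].rsplit(".", 1)[0]

        # Hand the document to BDA; results land under output/<stem>/ as shown above
        response = bda_runtime.invoke_data_automation_async(
            inputConfiguration={"s3Uri": f"s3://{bucket}/{key}"},
            outputConfiguration={"s3Uri": f"s3://{bucket}/output/{stem}"},
            dataAutomationConfiguration={
                "dataAutomationProjectArn": os.environ["BDA_PROJECT_ARN"],  # assumed env var
                "stage": "LIVE",
            },
            dataAutomationProfileArn=os.environ["BDA_PROFILE_ARN"],  # assumed env var
        )
        print(f"Started BDA job {response['invocationArn']} for s3://{bucket}/{key}")

    return {"statusCode": 200}
```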
- DEVELOPMENT Stage - Blueprint and project created for testing
- Testing Phase - Validate extraction quality with sample resumes
- LIVE Promotion - Move to production when ready
- Monitoring - CloudWatch logs and error tracking
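For reference, here is a hedged sketch of what the DEV → LIVE promotion step could look like with the boto3 bedrock-data-automation (build-time) client; treat the call and response shapes as assumptions and see cli/promote_blueprint.py for the actual implementation:

```python
import boto3

# Build-time (control-plane) BDA client, separate from bedrock-data-automation-runtime
bda = boto3.client("bedrock-data-automation")


def promote_blueprint(blueprint_arn: str) -> str:
    """Snapshot the DEVELOPMENT blueprint as a new version so LIVE processing can use it."""
    resp = bda.create_blueprint_version(blueprintArn=blueprint_arn)
    blueprint = resp["blueprint"]
    print(f"Created blueprint version {blueprint.get('blueprintVersion')} "
          f"(stage: {blueprint.get('blueprintStage')})")
    # The real script also verifies that the project serves the LIVE stage.
    return blueprint["blueprintArn"]
```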
Extracts structured resume data in organized sections:
{
"matched_blueprint": {
"arn": "arn:aws:bedrock:ap-south-1:###:blueprint/###",
"name": "resume-parser-hierarchical-###",
"confidence": 1
},
"document_class": {
"type": "Resume"
},
"split_document": {
"page_indices": [0, 1]
},
"inference_result": {
"skills": {
"technical": "Programming Languages: Python, JavaScript, Java, Go, SQL Cloud Platforms: AWS, Azure, Google Cloud Platform Frameworks: React, Django, Flask, Express.js Databases: PostgreSQL, MongoDB, Redis, DynamoDB DevOps: Docker, Kubernetes, Jenkins, Terraform, Git",
"languages": "English (Native), Spanish (Conversational)",
"certifications": "AWS Solutions Architect Associate (AWS-SAA-123456), Certified Kubernetes Administrator (CKA-789012)",
"tools": "Python, AWS, Docker, Kubernetes, PostgreSQL, JavaScript, React, Django, Flask, Express.js, JavaScript, React, Node.js, MongoDB, PostgreSQL, Redis, DynamoDB, Jenkins, Terraform, Git",
"soft": "Leadership, Team Collaboration, Problem Solving, Communication, Project Management"
},
"personal_info": {
"full_name": "John Smith",
"address": "123 Main Street, Seattle, WA 98101",
"phone": "(555) 123-4567",
"linkedin": "linkedin.com/in/johnsmith",
"email": "[email protected]"
},
"educational_info": {
"institution": "University of Washington, Seattle, WA",
"graduation_year": "June 2020",
"degree": "Bachelor of Science",
"gpa": "3.8",
"field_of_study": "Computer Science"
},
"experience": {
"key_achievements": "Lead development of cloud-native applications using AWS services, Reduced system latency by 40% through performance optimization, Led team of 5 engineers on microservices migration project, Implemented CI/CD pipeline reducing deployment time by 60%, Developed full-stack web applications for e-commerce platform, Built payment processing system handling $1M+ in monthly transactions, Improved application performance by 50% through code optimization",
"current_position": "Senior Software Engineer",
"current_company": "Tech Corp",
"years_total": "4+",
"previous_roles": "Software Engineer, StartupXYZ, July 2020 - December 2021"
}
}
}

Benefits of Hierarchical Structure:
- ✅ Organized Output - Clear sections for downstream processing
- ✅ Easy Integration - Structured for databases and APIs
- ✅ Maintainable Schema - Reusable object definitions
- ✅ Type Safety - Consistent data types per section
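Because the sections are fixed, downstream code can read result.json without any fuzzy matching. A small illustrative snippet, assuming result.json has already been downloaded locally:

```python
import json

# Load a BDA custom-output result downloaded from s3://<bucket>/output/.../result.json
with open("result.json") as f:
    result = json.load(f)

fields = result["inference_result"]

# Each hierarchical section is a plain dict, ready for a database row or API payload
candidate = {
    "name": fields["personal_info"]["full_name"],
    "email": fields["personal_info"]["email"],
    "degree": fields["educational_info"]["degree"],
    "current_role": fields["experience"]["current_position"],
    "technical_skills": fields["skills"]["technical"],
}
print(json.dumps(candidate, indent=2))
```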
# Lambda function logs
aws logs tail /aws/lambda/BDAResumeStack-BDAProcessorFunction --follow
# DLQ messages (failed processing)
aws sqs receive-message --queue-url <dlq-url>
# S3 event notifications
aws s3api get-bucket-notification-configuration --bucket <bucket-name>

# Check Lambda logs (real-time)
aws logs tail /aws/lambda/BDAResumeStack-BDAProcessorFunction --follow
# List processed results
aws s3 ls s3://<bucket-name>/output/ --recursive
# Check CDK stack outputs
cd infrastructure && uv run cdk list --json

# Check DLQ for failed processing
aws sqs receive-message --queue-url <dlq-url>
# S3 event configuration
aws s3api get-bucket-notification-configuration --bucket <bucket-name>

# Interactive menu (recommended)
./scripts/bda_workflow.sh data/sample_resume.pdf
# Process multiple files (choose option 3 for each)
for file in resumes/*.pdf; do
echo "Processing $file..."
./scripts/bda_workflow.sh "$file"
# Select option 3 for production processing
done
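Since processing is event-driven, batches can also be handled by uploading files straight to the bucket's input/ prefix instead of invoking the script per file. A minimal boto3 sketch, where the bucket name is a placeholder to be replaced with the value from the CDK stack outputs:

```python
from pathlib import Path

import boto3

s3 = boto3.client("s3")
BUCKET = "bda-resume-bucket"  # placeholder -- use the bucket name from the CDK stack outputs

# Every object created under input/ triggers the Lambda processor automatically
for pdf in Path("resumes").glob("*.pdf"):
    s3.upload_file(str(pdf), BUCKET, f"input/{pdf.name}")
    print(f"Uploaded {pdf.name} -> s3://{BUCKET}/input/{pdf.name}")
```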
cd infrastructure

# Preview changes before deployment
uv run cdk diff
# Deploy updates
uv run cdk deploy
# View stack outputs (ARNs, bucket names, etc.)
uv run aws cloudformation describe-stacks --stack-name BDAResumeStack --query "Stacks[0].Outputs"
# Cleanup (removes all resources)
uv run cdk destroy

The project includes a comprehensive cleanup script that safely removes all AWS resources:
# Safe cleanup - shows what would be deleted but doesn't delete
./scripts/cleanup.sh
# Force cleanup - actually deletes all resources
./scripts/cleanup.sh true

Cleanup Script Features:
- ✅ Safe by default - Dry run mode shows resources without deleting
- ✅ CDK-aware - Reads actual resources from CloudFormation stack
- ✅ Comprehensive - Removes S3 buckets, Lambda functions, BDA resources, IAM roles
- ✅ Smart detection - Finds resources even if stack is partially deleted
- ✅ Force mode - Passing true as the first argument enables actual deletion
- ✅ Verification - Confirms cleanup completion and shows any remaining resources
What gets cleaned up:
- CloudFormation stack (CDK-managed resources)
- S3 bucket and all contents
- Lambda functions
- BDA blueprints and projects
- SQS dead letter queues
- IAM roles and policies
Usage Examples:
# Check what would be deleted (safe preview)
./scripts/cleanup.sh
# Actually delete everything (use with caution)
./scripts/cleanup.sh true
# Alternative: CDK-only cleanup (may leave some resources)
cd infrastructure && uv run cdk destroy
