This application processes documents using AWS Textract, restructures the content, and allows users to ask questions about the document using Claude, an AI model from Anthropic.
- Document upload and processing using AWS Textract
- Document content restructuring
- Question-answering capability using Claude AI
- User-friendly interface built with Streamlit
- Python 3.7+
- AWS account with access to S3 and Textract services
- Anthropic API access for Claude
-
Clone the repository:
git clone https://github.com/yourusername/document-qa-app.git cd document-qa-app
-
Install the required dependencies:
pip install -r requirements.txt
-
Set up your AWS credentials:
- Create a file named
~/.aws/credentials
(on Linux/Mac) orC:\Users\YourUsername\.aws\credentials
(on Windows) - Add your AWS access key and secret key:
[default] aws_access_key_id = YOUR_ACCESS_KEY aws_secret_access_key = YOUR_SECRET_KEY
- Create a file named
-
Run the Streamlit app:
streamlit run app.py
-
Open your web browser and go to
http://localhost:8501
-
Upload a document (PDF, PNG, JPG, or JPEG)
-
Wait for the document to be processed and restructured
-
Enter your question about the document and click "Submit"
-
View the AI-generated answer
app.py
: Main Streamlit applicationtextract_processing.py
: Handles document processing with AWS Textractdocument_restructuring.py
: Restructures the processed document contentclaude_qa.py
: Manages interaction with the Claude AI model for question-answeringrequirements.txt
: Lists all Python dependencies