[ADE-152] Add AWS Textract, AWS Bedrock services with medical information extraction & analysis functionalities #45
e57b4e4 to ce15609
Copilot reviewed 20 out of 22 changed files in this pull request and generated no comments.
Files not reviewed (2)
- backend/package.json: Language not supported
- package.json: Language not supported
Comments suppressed due to low confidence (3)
backend/src/utils/security.utils.ts:93
- The error message excludes 'application/pdf', which is defined in MAX_FILE_SIZES. Update the message to include PDF files or restrict allowed MIME types accordingly.
if (!ALLOWED_MIME_TYPES.has(mimeType)) { throw new BadRequestException('Only JPEG, PNG, and HEIC/HEIF images are allowed'); }
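One way to keep the message and the allow-list from drifting apart is to derive both from the size table. The following is a minimal sketch, not the code in security.utils.ts: the size limits, the `BadRequestError` stand-in (used in place of NestJS's BadRequestException so the example has no dependencies), and the `assertAllowedMimeType` helper are all assumptions made for illustration.

```typescript
// Hypothetical size limits in bytes; the real values live in security.utils.ts.
const MAX_FILE_SIZES: Record<string, number> = {
  "image/jpeg": 10 * 1024 * 1024,
  "image/png": 10 * 1024 * 1024,
  "image/heic": 10 * 1024 * 1024,
  "image/heif": 10 * 1024 * 1024,
  "application/pdf": 20 * 1024 * 1024,
};

// Deriving the allow-list from the size table keeps the two in sync.
const ALLOWED_MIME_TYPES = new Set<string>(Object.keys(MAX_FILE_SIZES));

// Stand-in for BadRequestException (assumption, keeps the sketch dependency-free).
class BadRequestError extends Error {}

// Throws when the MIME type is not allowed; the message lists every allowed type.
function assertAllowedMimeType(mimeType: string): void {
  if (!ALLOWED_MIME_TYPES.has(mimeType)) {
    const allowed = Array.from(ALLOWED_MIME_TYPES).join(", ");
    throw new BadRequestError(
      `Unsupported file type "${mimeType}". Allowed types: ${allowed}`,
    );
  }
}
```

Because the message is built from the set itself, adding or removing a type in MAX_FILE_SIZES updates the check and the error text together.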
backend/src/services/document-processor.service.spec.ts:222
- Since processBatch is an async function that returns a promise, use 'await expect(testService.processBatch([], userId)).rejects.toThrow(BadRequestException)' to correctly test for promise rejections.
expect(() => { testService.processBatch([], userId); }).toThrow(BadRequestException);
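The difference between the two assertion styles can be shown without a test framework. This is a sketch under stated assumptions: `processBatch` here is a simplified stand-in for the service method, `FakeBadRequestException` replaces the NestJS exception, and `throwsSynchronously` and `rejects` are hypothetical helpers that mimic the synchronous `toThrow` pattern and the `rejects` pattern respectively.

```typescript
// Stand-in for BadRequestException (assumption, keeps the sketch dependency-free).
class FakeBadRequestException extends Error {}

// Simplified stand-in for the service method: async, rejects on empty input.
async function processBatch(documents: string[], userId: string): Promise<string[]> {
  if (documents.length === 0) {
    throw new FakeBadRequestException(`No documents provided for user ${userId}`);
  }
  return documents;
}

// Mimics `expect(() => ...).toThrow()`: only sees errors thrown synchronously.
function throwsSynchronously(fn: () => void): boolean {
  try {
    fn();
    return false;
  } catch {
    return true;
  }
}

// Mimics `await expect(promise).rejects...`: awaits the promise and reports a rejection.
async function rejects(promise: Promise<unknown>): Promise<boolean> {
  try {
    await promise;
    return false;
  } catch {
    return true;
  }
}

// The synchronous wrapper never observes the error, because an async function
// reports failures through the returned promise rather than the call stack.
const sawSyncThrow = throwsSynchronously(() => {
  // The rejection is swallowed here only to avoid an unhandled rejection in the demo.
  processBatch([], "user-1").catch(() => undefined);
});

// Awaiting the promise is what actually surfaces the rejection.
const sawRejection: Promise<boolean> = rejects(processBatch([], "user-1"));
```

Here `sawSyncThrow` is false while `sawRejection` resolves to true, which is why the original synchronous assertion cannot pass for the reason the test intends.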
backend/src/config/configuration.ts:26
- The radix provided in parseInt is 20, which might cause unexpected results. It is recommended to use 10 as the radix for proper decimal conversion.
AWS_BEDROCK_REQUESTS_PER_MINUTE: parseInt(process.env.AWS_BEDROCK_REQUESTS_PER_MINUTE || '20', 20),
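The effect of the radix can be checked directly: in base 20 the string '20' means 2 * 20 + 0, so the default would silently become 40. The sketch below assumes a hypothetical `readIntSetting` helper and an `env` object standing in for process.env; neither is part of the PR.

```typescript
// Parses a decimal integer setting; falls back when the value is missing or not numeric.
function readIntSetting(raw: string | undefined, fallback: number): number {
  const parsed = parseInt(raw ?? "", 10);
  return isNaN(parsed) ? fallback : parsed;
}

// Hypothetical environment snapshot standing in for process.env.
const env: Record<string, string | undefined> = {};

// With radix 10 the default stays at 20 requests per minute.
const requestsPerMinute = readIntSetting(env["AWS_BEDROCK_REQUESTS_PER_MINUTE"], 20);

// With radix 20 the same default string is read as 2 * 20 + 0 = 40.
const misparsed = parseInt("20", 20);
```

Passing the fallback as a number rather than a string also avoids parsing the default at all.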
Copilot reviewed 18 out of 20 changed files in this pull request and generated no comments.
Files not reviewed (2)
- backend/package.json: Language not supported
- package.json: Language not supported
Copilot reviewed 18 out of 19 changed files in this pull request and generated no comments.
Files not reviewed (1)
- backend/package.json: Language not supported
…cal information extraction functionality
…erage-v8 and vitest, and adjust peer dependencies for compatibility with Node.js 18 and above.
…extraction, adding validation for medical reports and handling of missing information and low confidence scenarios in tests.
…nd consistency in test cases for medical information extraction
…e limiting, and enhance medical information extraction process with better error logging and validation checks.
…tting and consistency in error handling
…ce to use 'anthropic.claude-3-7-sonnet-20250219-v1:0'
… for better readability
…formation extraction, including validation for image types, improved error handling, and updated test cases for various image scenarios.
…upport HEIC/HEIF formats, update error messages, and add new test cases for JPEG and HEIC/HEIF images.
…d model handling and image processing capabilities
Add a controller for testing AwsBedrockService
… mock implementations, enhance image processing tests, and ensure better handling of medical information extraction scenarios.
…r improved readability and consistency, including adjustments to mock implementations and error handling in medical information extraction scenarios.
…orts
- Introduced AwsTextractService for handling interactions with AWS Textract API.
- Added TextractModule to encapsulate Textract service functionality.
- Implemented file validation and rate limiting for document processing.
- Created documentation for AWS Textract integration detailing implementation and error handling.
- Updated package.json and package-lock.json to include AWS Textract dependencies.
- Enhanced security utilities to support PDF file validation.
…uding associated DTOs, module, and tests. This cleanup eliminates unused components related to AWS Bedrock testing, streamlining the codebase.
…ns and improve code formatting. Consolidated controller array into a single line and adjusted middleware exclusion for better readability.
…on and image processing capabilities
- Refactored AwsBedrockService to remove unused dependencies and streamline the model invocation process.
- Updated the mock implementation in app.module.spec.ts to reflect changes in model response handling.
- Enhanced test coverage in aws-bedrock.service.spec.ts by removing outdated tests and improving mock setups for medical information extraction.
- Increased the maximum allowed file size for PDF uploads in security.utils.ts to accommodate larger documents.
…nality
- Increased the document requests per minute limit in backend/src/config/configuration.ts from 10 to 20.
- Imported RateLimiter from security.utils in backend/src/services/aws-textract.service.ts to enhance request management.
- Removed the RateLimiter class definition from aws-textract.service.ts as it is now imported from the utility module.
…e limiting
- Added requestsPerMinute configuration in backend/src/config/configuration.ts to manage API request limits.
- Refactored AwsBedrockService to include methods for initializing the Bedrock client, creating credentials, and configuring model ID and inference profile ARN.
- Implemented a rate limiter to control the number of requests sent to AWS Bedrock, ensuring compliance with usage limits.
- Improved error handling during Bedrock model invocation for better debugging and user feedback.
… ARN configuration
- Removed fallback values for model ID and inference profile ARN in backend/src/services/aws-bedrock.service.ts, ensuring that configuration values are directly retrieved from the config service.
- Updated logging to reflect the current configuration state without hardcoded defaults.
…response parsing
- Eliminated metadata properties such as documentType, pageCount, and isLabReport from the ExtractedTextResult interface in backend/src/services/aws-textract.service.ts.
- Updated the parseTextractResponse method to no longer require pageCount as a parameter and removed related logic for determining document type and lab report status.
- Adjusted unit tests in backend/src/services/aws-textract.service.spec.ts to reflect the removal of metadata checks, ensuring tests focus on essential response validation.
- Introduced a private method createTextractClient in backend/src/services/aws-textract.service.ts to streamline the initialization of the AWS Textract client.
- Removed redundant code from the constructor, enhancing readability and maintainability.
- Improved logging for client initialization without exposing sensitive credentials.
- Renamed configService to mockConfigService for clarity in backend/src/services/aws-textract.service.spec.ts.
- Simplified the setup of mock dependencies by directly creating the mockConfigService instance.
- Enhanced readability by removing unnecessary async/await in the beforeEach setup.
- Introduced MedicalDocumentAnalysis interface in backend/src/services/aws-bedrock.service.ts to define the structure of medical analysis results.
- Implemented analyzeMedicalDocument method to analyze medical documents and return structured data, including key medical terms, lab values, and diagnoses.
- Added comprehensive mock responses for various scenarios in backend/src/services/aws-bedrock.service.spec.ts to improve unit test coverage.
- Included validation for response structure and error handling for invalid or empty responses, ensuring robustness in medical document processing.
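The response validation described in this commit could look roughly like the sketch below. The field names (`keyMedicalTerms`, `labValues`, `diagnoses`), the `LabValue` shape, and the `parseAnalysis` function are assumptions chosen to match the commit's wording; the real MedicalDocumentAnalysis interface may differ.

```typescript
// Hypothetical shapes; the actual interface in aws-bedrock.service.ts may differ.
interface LabValue {
  name: string;
  value: string;
  unit: string;
}

interface MedicalDocumentAnalysis {
  keyMedicalTerms: string[];
  labValues: LabValue[];
  diagnoses: string[];
}

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((item) => typeof item === "string");
}

function isLabValue(value: unknown): value is LabValue {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return (
    typeof record["name"] === "string" &&
    typeof record["value"] === "string" &&
    typeof record["unit"] === "string"
  );
}

function isLabValueArray(value: unknown): value is LabValue[] {
  return Array.isArray(value) && value.every((item) => isLabValue(item));
}

// Parses the model's text output and rejects empty, non-JSON, or misshapen responses.
function parseAnalysis(responseText: string): MedicalDocumentAnalysis {
  if (responseText.trim() === "") {
    throw new Error("Empty model response");
  }
  let parsed: unknown;
  try {
    parsed = JSON.parse(responseText);
  } catch {
    throw new Error("Model response is not valid JSON");
  }
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("Model response is not a JSON object");
  }
  const record = parsed as Record<string, unknown>;
  const keyMedicalTerms = record["keyMedicalTerms"];
  const labValues = record["labValues"];
  const diagnoses = record["diagnoses"];
  if (!isStringArray(keyMedicalTerms) || !isLabValueArray(labValues) || !isStringArray(diagnoses)) {
    throw new Error("Model response does not match the expected structure");
  }
  return { keyMedicalTerms, labValues, diagnoses };
}
```

Validating at the boundary means the rest of the service can rely on the typed result instead of re-checking fields on every use.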
…ate limiting
- Updated AwsBedrockService to include user ID as a parameter in analyzeMedicalDocument and generateResponse methods for improved rate limiting.
- Refactored AwsTextractService to replace client IP with user ID in extractText and processBatch methods, ensuring consistent rate limiting across services.
- Enhanced unit tests in aws-bedrock.service.spec.ts and aws-textract.service.spec.ts to validate the new user ID-based rate limiting functionality, including handling of rate limit exceptions.
…king
- Added a cleanupOldEntries method in backend/src/utils/security.utils.ts to remove old entries from the requests map when it exceeds a defined threshold.
- Enhanced the RateLimiter class to maintain efficient tracking of user requests by cleaning up inactive user IDs, ensuring optimal memory usage and performance.
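A per-user sliding-window limiter with this kind of cleanup might be structured as follows. This is an illustrative sketch, not the RateLimiter in security.utils.ts: the class name `UserRateLimiter`, the `tryRequest` method, the `trackedUsers` getter, and the default window and threshold values are all assumptions.

```typescript
// Sliding-window limiter keyed by user ID (hypothetical sketch).
class UserRateLimiter {
  private readonly requests = new Map<string, number[]>();
  private readonly maxRequests: number;
  private readonly windowMs: number;
  private readonly maxTrackedUsers: number;

  constructor(maxRequests: number, windowMs: number = 60000, maxTrackedUsers: number = 10000) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
    this.maxTrackedUsers = maxTrackedUsers;
  }

  // Records a request and returns true if the user is under the limit, false otherwise.
  tryRequest(userId: string, now: number = Date.now()): boolean {
    const windowStart = now - this.windowMs;
    // Keep only timestamps that still fall inside the window.
    const recent = (this.requests.get(userId) ?? []).filter((t) => t > windowStart);
    if (recent.length >= this.maxRequests) {
      this.requests.set(userId, recent);
      return false;
    }
    recent.push(now);
    this.requests.set(userId, recent);
    // Only scan the whole map once it grows past the threshold.
    if (this.requests.size > this.maxTrackedUsers) {
      this.cleanupOldEntries(now);
    }
    return true;
  }

  // Drops users whose every recorded request has aged out of the window.
  private cleanupOldEntries(now: number): void {
    const windowStart = now - this.windowMs;
    this.requests.forEach((timestamps, userId) => {
      if (timestamps.every((t) => t <= windowStart)) {
        this.requests.delete(userId);
      }
    });
  }

  get trackedUsers(): number {
    return this.requests.size;
  }
}
```

Running the cleanup only above a size threshold keeps the common path cheap while still bounding memory when many distinct user IDs pass through.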
…document processing
- Introduced DocumentProcessorModule in backend/src/modules/document-processor.module.ts to encapsulate the document processing logic.
- Implemented DocumentProcessorService in backend/src/services/document-processor.service.ts, integrating AWS Textract for text extraction and AWS Bedrock for medical analysis.
- Added unit tests for DocumentProcessorService in backend/src/services/document-processor.service.spec.ts to ensure functionality and error handling.
- Updated app.module.ts to include DocumentProcessorModule, enhancing the application's capability to process medical documents efficiently.
- Consolidated PDF and image processing into a single method, processDocument, in backend/src/services/aws-textract.service.ts for improved maintainability.
- Updated logging to differentiate between PDF and image processing within the new method.
- Removed redundant code related to separate processing methods for images and PDFs, enhancing code clarity.
…t processing
- Introduced DocumentProcessorController in backend/src/controllers/document-processor.controller.ts to handle document upload and processing.
- Implemented endpoints for uploading documents and retrieving a test form, enhancing the document processing functionality.
- Updated backend/README.md to include detailed information about the new endpoints and usage instructions for the medical document processor.
…tractModule
- Removed TextractModule from backend/src/app.module.ts as it is no longer needed.
- Updated providers in app.module.ts to exclude AwsBedrockService.
- Enhanced document-processor.module.ts to export AwsTextractService and AwsBedrockService, ensuring proper service availability for document processing.
- Removed the AWS Textract integration documentation from backend/docs/aws-textract-integration.md as it is no longer relevant.
- Updated import paths in backend/src/app.module.ts and backend/src/app.module.spec.ts to reflect the new directory structure for document processing services.
- Introduced backend/src/document-processor/document-processor.module.ts to encapsulate document processing logic, including AWS Textract and Bedrock services.
- Added backend/src/document-processor/controllers/document-processor.controller.ts to handle document uploads and processing requests.
- Implemented backend/src/document-processor/services/aws-textract.service.ts and backend/src/document-processor/services/aws-bedrock.service.ts for text extraction and medical analysis, respectively.
- Enhanced unit tests for the new services and controller to ensure functionality and error handling.
…lified explanations
- Added PerplexityService to backend/src/document-processor/document-processor.module.ts for generating simplified explanations of medical documents.
- Updated DocumentProcessorService in backend/src/document-processor/services/document-processor.service.ts to include logic for generating simplified explanations during document processing.
- Modified DocumentProcessorController in backend/src/document-processor/controllers/document-processor.controller.ts to return simplified explanations alongside analysis results.
- Enhanced unit tests in backend/src/document-processor/services/document-processor.service.spec.ts to validate the integration of PerplexityService and the new simplified explanation feature.
…; update README.md to streamline document processing instructions.
Change
This pull request introduces a significant update to the backend module, focusing on adding a new document processing feature, updating dependencies, and enhancing configuration and testing.
New Document Processing Feature:
- backend/README.md: Added a new section detailing the Medical Document Processor Test Controller, including endpoints for the test form, document upload, and test status checks.
- backend/src/document-processor/document-processor.module.ts: Created a new DocumentProcessorModule with the necessary services and controllers for document processing.
Dependency Updates:
- backend/package.json: Added new dependencies for AWS SDK clients (@aws-sdk/client-bedrock, @aws-sdk/client-bedrock-runtime, @aws-sdk/client-textract) and updated existing dependencies (@types/multer, @vitest/coverage-v8, vitest).
Configuration Enhancements:
- backend/src/config/configuration.ts: Updated configuration to include new settings for AWS Bedrock and Textract services.
Testing Improvements:
- backend/src/app.module.spec.ts: Enhanced the test setup by adding mock services (AwsBedrockService, PerplexityService, AwsSecretsService, AwsTextractService) and loading configuration.
- backend/src/document-processor/services/aws-bedrock.service.spec.ts: Added comprehensive tests for the AwsBedrockService, including initialization, document analysis, error handling, and response validation.
Middleware Adjustments:
- backend/src/app.module.ts: Modified the AuthMiddleware to exclude the health check endpoint from authentication.
Does this PR introduce a breaking change?
No
What needs to be documented once your changes are merged?
Nothing
Additional Comments
No