Project SALEM is an advanced web application designed for the automatic review and validation of documents using cutting-edge machine learning techniques. The application leverages Convolutional Neural Networks (CNNs) and Faster R-CNN for object detection and classification tasks, ensuring high accuracy in document processing. By utilizing these algorithms, SALEM can efficiently extract and validate key information from various document types, such as invoices and service delivery notes.
The choice of CNNs, specifically ResNet50, allows the application to benefit from a deep learning model pre-trained on a vast dataset, providing robust feature extraction capabilities. Faster R-CNN is employed for its superior performance in object detection, enabling precise identification of document fields. Additionally, the application incorporates techniques like data augmentation and early stopping to enhance model generalization and prevent overfitting.
SALEM's architecture is built on the MVC (Model-View-Controller) pattern, ensuring a clean separation of concerns and facilitating maintainability. The integration of Tesseract OCR further enhances the system's ability to extract textual information from images, making it a comprehensive solution for document validation.
-
S for Smart Recognition: Implementing advanced AI algorithms for the automatic recognition of invoices, contracts, and reception reports, ensuring high accuracy in data extraction.
-
A for Automated Processing: Streamlining document processing workflows by automating data validation and classification, reducing manual effort and errors.
-
L for Learning Adaptability: Utilizing machine learning to improve recognition models over time, adapting to different document formats and languages.
-
E for Efficient Validation: Ensuring data consistency and correctness through intelligent validation mechanisms and error detection.
-
M for Management Integration: Seamlessly integrating with enterprise management systems to facilitate data synchronization and actionable insights.
By: Fabiana Liria & Mateo Ávila
- Python 3.8
- Node.js
- npm (Node Package Manager)
- Account in MongoDB Atlas or you must have install MongoDB on your machine
- Tesseract OCR
-
Clone the repository:
git clone https://github.com/fabiliria280802/SALEM.git
-
Open two terminals:
-
Terminal 1 (Backend setup): Navigate to the backend directory (adjust the path accordingly if needed):
cd ../SALEM/backendServer will run on port 5000
if you are using POSTMAN, you should copy and paste this url http://localhost:5000/api/auth/login
-
Terminal 2 (Frontend setup): In the second terminal, navigate to the frontend directory (adjust the path accordingly if needed):
cd ../SALEM/frontendOpen http://localhost:3000 to view it in your browser.
-
-
Install dependencies:
-
For Backend: In the backend terminal:
npm install
Note: For more information about backend see this page
-
For Frontend: In the frontend terminal:
npm install
-
-
Start the Backend: In the backend terminal, run:
npm start
-
Start the Frontend: In the frontend terminal, run:
npm start
The project uses a custom workflow for transitioning between branches through pull requests.
-
Transition from Development to QA: To transition from
developmentbranch to theqabranch, you must write the following in your commit message:QA transition: YOUR_MESSAGE_CONTENT
-
Transition from QA to Production: To transition from
qabranch to themainbranch, write the following in your commit message:Production transition: YOUR_MESSAGE_CONTENT
This will automate the process of creating pull requests for the respective branch transitions.
To ensure the quality of the code, the following scripts are used:
- Run Tests (Jest):
npm run test- Run Linter (ESLint with Auto-fix):
npm run lint- Run Code Formatter (Prettier):
npm run formatThese commands will help you maintain clean and consistent code throughout the project.