Caderneta is a Python-based financial transaction processing and classification system. It extracts, categorizes, and formats financial transaction data from textual messages, leveraging natural language processing (NLP) and machine learning techniques.
- Transaction Parsing: Extracts key details such as date, value, payment method, and category from financial messages.
- Text Classification: Uses a machine learning pipeline to classify messages into predefined categories.
- Data Persistence: Updates and saves classified data to a CSV file for future use.
- Customizable: Supports training and retraining of the classification model with new data.
- Invoice Image Processing: Allows users to upload an invoice image, which is stored in an S3 bucket. This triggers an AWS Lambda function that extracts text from the image and sends it to Amazon Bedrock for further processing.
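
The invoice flow (S3 upload, Lambda trigger, text extraction, Bedrock) might be wired roughly as in the sketch below. It only shows how a handler could read the bucket and object key from an S3 event notification. The `make_handler`, `extract_text`, and `send_to_model` names are hypothetical placeholders for illustration, not the project's actual functions, and the real text-extraction and Bedrock calls are left to the injected callables.

```python
from typing import Callable, List
from urllib.parse import unquote_plus


def make_handler(
    extract_text: Callable[[str, str], str],   # hypothetical: (bucket, key) -> text
    send_to_model: Callable[[str], str],       # hypothetical: text -> model response
):
    """Build a Lambda-style handler around two injected steps."""

    def handler(event: dict, context: object = None) -> List[str]:
        results = []
        for record in event.get("Records", []):
            bucket = record["s3"]["bucket"]["name"]
            # Object keys arrive URL-encoded in S3 event notifications.
            key = unquote_plus(record["s3"]["object"]["key"])
            text = extract_text(bucket, key)
            results.append(send_to_model(text))
        return results

    return handler
```

Injecting the two steps keeps the event-handling logic testable without AWS credentials; in a deployment they would wrap the actual AWS client calls.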
- Message Parsing:
  - The `ConstrutorTransacao` class processes financial messages to extract transaction details.
  - It identifies dates, monetary values, payment methods, and categories using regex patterns and predefined rules.
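
As a rough illustration of regex-based extraction, the sketch below pulls a date, a value, and a payment method out of a message. The patterns, the keyword list, and the `parse_message` function are assumptions made for this example (Brazilian-style `DD/MM/YYYY` dates and comma decimals); the actual rules inside `ConstrutorTransacao` may differ.

```python
import re
from datetime import date
from decimal import Decimal
from typing import Optional

# Hypothetical patterns, not the project's real ones.
DATE_RE = re.compile(r"\b(\d{2})/(\d{2})/(\d{4})\b")
VALUE_RE = re.compile(r"(\d{1,3}(?:\.\d{3})*|\d+),(\d{2})\b")
PAYMENT_METHODS = ("pix", "credito", "debito", "dinheiro")  # assumed keywords


def parse_message(message: str, today: Optional[date] = None) -> dict:
    """Extract date, value, and payment method from a free-text message."""
    text = message.lower()

    # Date: use the one in the message, otherwise fall back to today.
    date_match = DATE_RE.search(text)
    if date_match:
        day, month, year = map(int, date_match.groups())
        when = date(year, month, day)
    else:
        when = today or date.today()

    # Value: drop thousands separators and turn the decimal comma into a dot.
    value = None
    value_match = VALUE_RE.search(text)
    if value_match:
        integer_part = value_match.group(1).replace(".", "")
        value = Decimal(f"{integer_part}.{value_match.group(2)}")

    # Payment method: first known keyword that appears as a whole word.
    method = next(
        (m for m in PAYMENT_METHODS if re.search(rf"\b{m}\b", text)), None
    )
    return {"date": when, "value": value, "payment_method": method}
```

Using `Decimal` rather than `float` avoids rounding surprises when monetary values are later summed.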
- Text Classification:
  - The `ClassificadorTexto` class preprocesses messages using tokenization, lemmatization, and stopword removal.
  - A machine learning pipeline (TF-IDF vectorizer + classifier) predicts the category of the message.
  - If the model is not confident, fallback rules are applied to classify the message.
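
The confidence check with rule-based fallback could look something like the sketch below. The model is represented by any callable that returns per-category probabilities; the `classify` function, the `0.6` threshold, and the keyword rules are all assumptions for illustration rather than the values `ClassificadorTexto` actually uses.

```python
from typing import Callable, Dict

# Hypothetical keyword rules used when the model is unsure.
FALLBACK_RULES = {
    "mercado": "groceries",
    "uber": "transport",
    "aluguel": "housing",
}


def classify(
    message: str,
    predict_proba: Callable[[str], Dict[str, float]],
    threshold: float = 0.6,   # assumed cut-off, tune on real data
    default: str = "other",
) -> str:
    """Return the model's top category, or a rule-based one if confidence is low."""
    probabilities = predict_proba(message)
    if probabilities:
        category, confidence = max(probabilities.items(), key=lambda kv: kv[1])
        if confidence >= threshold:
            return category

    # Low confidence: fall back to the first keyword rule that matches.
    tokens = message.lower().split()
    for keyword, category in FALLBACK_RULES.items():
        if keyword in tokens:
            return category
    return default
```

With a trained pipeline, `predict_proba` would typically be a thin wrapper that maps the pipeline's class probabilities to category names.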
- Data Management:
  - Classified messages are stored in a CSV file for persistence.
  - The model and vectorizer are saved as `.joblib` files for reuse.
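
A minimal sketch of this persistence step, assuming a two-column CSV layout and the file names `model.joblib` and `vectorizer.joblib` (the column names, file names, and helper functions are hypothetical, not taken from the project):

```python
import csv
from pathlib import Path

FIELDNAMES = ["message", "category"]  # hypothetical column names


def append_classified(csv_path: Path, message: str, category: str) -> None:
    """Append one classified message, writing the header on first use."""
    is_new = not csv_path.exists() or csv_path.stat().st_size == 0
    with csv_path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
        if is_new:
            writer.writeheader()
        writer.writerow({"message": message, "category": category})


def load_classified(csv_path: Path) -> list:
    """Read all classified messages back as a list of dicts."""
    with csv_path.open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def save_model(model, vectorizer, directory: Path) -> None:
    """Persist the fitted model and vectorizer with joblib."""
    import joblib  # third-party; imported lazily so the CSV helpers work without it

    joblib.dump(model, directory / "model.joblib")
    joblib.dump(vectorizer, directory / "vectorizer.joblib")
```

The saved objects can later be restored with `joblib.load`, which avoids retraining on every run.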
- Transaction Formatting:
  - The `format_transaction` method formats transaction details into a human-readable string for display.
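
For illustration, a formatter in this spirit might look like the sketch below. It is written as a standalone function, and the field order, separators, and `R$` currency prefix are assumptions; the project's `format_transaction` method may produce a different layout.

```python
from datetime import date
from decimal import Decimal


def format_transaction(when: date, value: Decimal, method: str, category: str) -> str:
    """Render transaction details as a single display line (assumed layout)."""
    return f"{when:%d/%m/%Y} | {category} | R$ {value:.2f} | {method}"
```

For example, `format_transaction(date(2024, 3, 12), Decimal("25.5"), "pix", "groceries")` returns `12/03/2024 | groceries | R$ 25.50 | pix`.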
⚠ Under Construction ⚠