Caderneta

Caderneta is a Python-based financial transaction processing and classification system. It extracts, categorizes, and formats financial transaction data from textual messages, leveraging natural language processing (NLP) and machine learning techniques.

Features

- Transaction Parsing: Extracts key details such as date, value, payment method, and category from financial messages.
- Text Classification: Uses a machine learning pipeline to classify messages into predefined categories.
- Data Persistence: Updates and saves classified data to a CSV file for future use.
- Customizable: Supports training and retraining of the classification model with new data.
- Invoice Image Processing: Allows users to upload an invoice image, which is stored in an S3 bucket. This triggers an AWS Lambda function that extracts text from the image and sends it to Amazon Bedrock for further processing.
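
The invoice flow above (S3 upload, Lambda trigger, text extraction, Bedrock) could be wired roughly as in the sketch below. It only shows how a handler might read the bucket and key out of an S3 event notification. The names uploaded_objects and process_upload are hypothetical, and extract_text and send_to_model are injected placeholders for the extraction and Bedrock steps, whose actual implementation is not described here.

```python
from urllib.parse import unquote_plus


def uploaded_objects(event):
    """Return (bucket, key) pairs from an S3 event notification payload."""
    pairs = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Object keys arrive URL-encoded in S3 events (spaces become "+").
        pairs.append((s3["bucket"]["name"], unquote_plus(s3["object"]["key"])))
    return pairs


def process_upload(event, extract_text, send_to_model):
    """Run the two downstream steps for every uploaded invoice image.

    extract_text(bucket, key) and send_to_model(text) are hypothetical
    callables supplied by the caller; they stand in for the text
    extraction and Amazon Bedrock steps.
    """
    results = []
    for bucket, key in uploaded_objects(event):
        text = extract_text(bucket, key)
        results.append(send_to_model(text))
    return results
```

Passing the two steps in as arguments keeps the event handling testable without AWS credentials.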

How It Works

Message Parsing:
- The ConstrutorTransacao class processes financial messages to extract transaction details.
- It identifies dates, monetary values, payment methods, and categories using regex patterns and predefined rules.
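
As an illustration of regex-based extraction, here is a minimal sketch. It is not the ConstrutorTransacao implementation: the patterns, the dd/mm/yyyy date format, the payment keywords, and the names ParsedTransaction and parse_message are all assumptions made for the example.

```python
import re
from dataclasses import dataclass
from datetime import date, datetime
from typing import Optional

# Hypothetical patterns and keywords; the project's real rules may differ.
DATE_RE = re.compile(r"\b\d{2}/\d{2}/\d{4}\b")
# Simple amounts only ("45,90" or "45.90"); thousands separators are not handled.
VALUE_RE = re.compile(r"\d+(?:[.,]\d{1,2})?")
PAYMENT_KEYWORDS = {"pix": "pix", "credit": "credit card", "debit": "debit card", "cash": "cash"}


@dataclass
class ParsedTransaction:
    when: Optional[date]
    value: Optional[float]
    payment_method: Optional[str]


def parse_message(message: str) -> ParsedTransaction:
    text = message.lower()

    when = None
    match = DATE_RE.search(text)
    if match:
        try:
            when = datetime.strptime(match.group(0), "%d/%m/%Y").date()
        except ValueError:
            when = None  # looks like a date but is not a valid one
        # Remove the date so its digits are not mistaken for the amount.
        text = text.replace(match.group(0), " ")

    value = None
    match = VALUE_RE.search(text)
    if match:
        value = float(match.group(0).replace(",", "."))

    method = next((name for kw, name in PAYMENT_KEYWORDS.items() if kw in text), None)
    return ParsedTransaction(when, value, method)
```

For example, parse_message("Lunch R$ 45,90 pix 12/03/2024") yields a date of 2024-03-12, a value of 45.9, and the payment method "pix".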
Text Classification:
- The ClassificadorTexto class preprocesses messages using tokenization, lemmatization, and stopword removal.
- A machine learning pipeline (TF-IDF vectorizer + classifier) predicts the category of the message.
- If the model's confidence in its prediction is low, rule-based fallbacks classify the message instead.
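
The classification steps above might look roughly like the following sketch, assuming scikit-learn. It is not the ClassificadorTexto code: the choice of LogisticRegression, the 0.6 threshold, the stopword set, the keyword rules, and the function names are assumptions, and lemmatization is left out to keep the example dependency-light.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical stopwords and fallback rules, for illustration only.
STOPWORDS = {"a", "the", "to", "at", "for", "of"}
FALLBACK_KEYWORDS = {"uber": "transport", "pizza": "food"}


def preprocess(text):
    """Lowercase, tokenize, and drop stopwords (no lemmatization here)."""
    tokens = re.findall(r"\w+", text.lower())
    return " ".join(t for t in tokens if t not in STOPWORDS)


def train(texts, labels):
    """Fit a TF-IDF + classifier pipeline on labelled messages."""
    model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    model.fit([preprocess(t) for t in texts], labels)
    return model


def classify(model, text, threshold=0.6):
    """Use the model when it is confident, otherwise fall back to keyword rules."""
    cleaned = preprocess(text)
    probs = model.predict_proba([cleaned])[0]
    best = probs.argmax()
    if probs[best] >= threshold:
        return str(model.classes_[best])
    tokens = cleaned.split()
    for keyword, category in FALLBACK_KEYWORDS.items():
        if keyword in tokens:
            return category
    return "uncategorized"


model = train(
    ["pizza at the mall", "lunch burger", "uber to work", "bus ticket"],
    ["food", "food", "transport", "transport"],
)
```

Retraining with new data amounts to calling train again on the enlarged set of labelled messages.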
Data Management:
- Classified messages are stored in a CSV file for persistence.
- The model and vectorizer are saved as .joblib files for reuse.
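
A minimal sketch of this persistence step, assuming the standard csv module and the joblib package. The column names, file names, and helper functions are hypothetical; the project's actual layout may differ.

```python
import csv
from pathlib import Path

import joblib

CSV_FIELDS = ["message", "category"]  # hypothetical column names


def append_classified(csv_path, message, category):
    """Append one classified message, writing the header on first use."""
    path = Path(csv_path)
    is_new = not path.exists()
    with path.open("a", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=CSV_FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow({"message": message, "category": category})


def load_classified(csv_path):
    """Read all stored rows back as dictionaries."""
    with Path(csv_path).open(newline="", encoding="utf-8") as fh:
        return list(csv.DictReader(fh))


def save_artifacts(model, vectorizer, directory):
    """Persist the fitted objects as .joblib files (file names are assumptions)."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    joblib.dump(model, directory / "model.joblib")
    joblib.dump(vectorizer, directory / "vectorizer.joblib")


def load_artifacts(directory):
    """Load the previously saved model and vectorizer."""
    directory = Path(directory)
    return (
        joblib.load(directory / "model.joblib"),
        joblib.load(directory / "vectorizer.joblib"),
    )
```

Loading the saved files at startup avoids retraining on every run, and the CSV doubles as training data for later retraining.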
Transaction Formatting:
- The format_transaction method formats transaction details into a human-readable string for display.
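
For a sense of what such formatting could produce, here is a standalone stand-in. The Transaction fields and the output layout are assumptions for illustration, not the project's actual format_transaction method.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Transaction:
    """Hypothetical container for the extracted details."""
    when: date
    value: float
    payment_method: str
    category: str


def format_transaction(tx):
    """Render a transaction as a single human-readable line."""
    return f"{tx.when:%d/%m/%Y} | {tx.category} | {tx.value:.2f} | {tx.payment_method}"


print(format_transaction(Transaction(date(2024, 3, 12), 45.9, "pix", "food")))
# prints: 12/03/2024 | food | 45.90 | pix
```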

Setup

⚠ Under Construction ⚠

