Extract and structure financial transaction data from unstructured text
txn_harvester is a Python package that parses and validates financial transaction data from raw text inputs (e.g., bank statements, transaction logs) and returns it in a structured format. It uses LLM7 (via langchain_llm7) by default and accepts any LangChain-compatible chat model.
- Extracts transaction details (amount, date, description, category) from unstructured text
- Validates output against predefined financial patterns using regex
- Supports custom LLMs (OpenAI, Anthropic, Google, etc.) via LangChain
- Lightweight and easy to integrate into financial workflows
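
The regex validation mentioned in the list above can be pictured with a minimal sketch. The patterns and the `is_valid_transaction` helper are hypothetical and shown only for illustration; the package's actual patterns may differ.

```python
import re

# Hypothetical patterns for illustration; the package's real patterns may differ.
AMOUNT_RE = re.compile(r"\$\d+(?:,\d{3})*(?:\.\d{2})?")  # e.g. $125.50 or $1,500.00
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")               # e.g. 2024-05-15


def is_valid_transaction(txn: dict) -> bool:
    """Return True if the amount and date fields match the expected shapes."""
    amount = str(txn.get("amount", ""))
    date = str(txn.get("date", ""))
    # fullmatch requires the entire string to fit the pattern
    return bool(AMOUNT_RE.fullmatch(amount)) and bool(DATE_RE.fullmatch(date))
```

A record such as `{"amount": "$125.50", "date": "2024-05-15"}` passes this check, while a date written as `15/05/2024` does not.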
```bash
pip install txn_harvester
```

Basic usage with the default LLM7 backend:

```python
from txn_harvester import txn_harvester

user_input = """
Paid for groceries at Whole Foods: $125.50 on 2024-05-15
Rent payment: $1500.00 (due 2024-05-20)
"""

response = txn_harvester(user_input)
print(response)
```

With OpenAI:

```python
from langchain_openai import ChatOpenAI
from txn_harvester import txn_harvester

llm = ChatOpenAI(model="gpt-4")
response = txn_harvester(user_input, llm=llm)
```

With Anthropic:

```python
from langchain_anthropic import ChatAnthropic
from txn_harvester import txn_harvester

llm = ChatAnthropic(model="claude-2")
response = txn_harvester(user_input, llm=llm)
```

With Google:

```python
from langchain_google_genai import ChatGoogleGenerativeAI
from txn_harvester import txn_harvester

llm = ChatGoogleGenerativeAI(model="gemini-pro")
response = txn_harvester(user_input, llm=llm)
```

- Default: Uses `LLM7_API_KEY` from environment variables.
- Manual: Pass via the `api_key` parameter or set `LLM7_API_KEY` in your shell: `export LLM7_API_KEY="your_api_key_here"`
- Free Tier: Sufficient for most use cases (rate limits apply).
- Get Key: Register at https://token.llm7.io
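
The key lookup described in the list above can be sketched as follows. The `resolve_api_key` helper is hypothetical, and the precedence (explicit argument first, then the environment) is an assumption about how the package behaves rather than a statement of its implementation.

```python
import os
from typing import Optional


def resolve_api_key(api_key: Optional[str] = None) -> Optional[str]:
    """Prefer an explicitly passed key, then fall back to LLM7_API_KEY."""
    if api_key:
        return api_key
    # Returns None when the variable is not set
    return os.environ.get("LLM7_API_KEY")
```

The result could then be passed along as `txn_harvester(user_input, api_key=resolve_api_key())`.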
| Parameter | Type | Description |
|---|---|---|
| `user_input` | `str` | Raw text containing financial transactions (required). |
| `api_key` | `Optional[str]` | LLM7 API key (optional; defaults to `LLM7_API_KEY`). |
| `llm` | `Optional[BaseChatModel]` | Custom LangChain LLM (optional; defaults to `ChatLLM7`). |
Returns a list of structured transaction data (e.g., [{"amount": "$125.50", "date": "2024-05-15", ...}]).
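
Because the example return value holds amounts and dates as strings, a small post-processing step may be useful before doing arithmetic. The sketch below assumes the `amount` and `date` keys from the example above; the `description` key, the `normalize` helper, and the sample data are illustrative and not real model output.

```python
from datetime import date
from decimal import Decimal


def normalize(txn: dict) -> dict:
    """Convert string fields into typed values for arithmetic and sorting."""
    amount = Decimal(txn["amount"].replace("$", "").replace(",", ""))
    return {**txn, "amount": amount, "date": date.fromisoformat(txn["date"])}


# Sample data shaped like the example return value (assumed, not real output).
sample = [
    {"amount": "$125.50", "date": "2024-05-15", "description": "Groceries at Whole Foods"},
    {"amount": "$1500.00", "date": "2024-05-20", "description": "Rent payment"},
]

records = [normalize(t) for t in sample]
total = sum(r["amount"] for r in records)
print(total)  # 1625.50
```

`Decimal` is used instead of `float` so that currency sums stay exact.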
- Modify regex patterns in `prompts.py` to adapt to specific transaction formats.
- Extend the package by subclassing `txn_harvester` for domain-specific parsing.
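
As one illustration of adapting a pattern to a different transaction format, the sketch below matches European-style amounts such as `1.500,00 €`. The pattern name and the `parse_euro_amount` helper are hypothetical; the names and structure actually used in `prompts.py` may differ.

```python
import re
from typing import Optional

# Hypothetical pattern for amounts like "1.500,00 €"; not taken from prompts.py.
EURO_AMOUNT_RE = re.compile(
    r"(?<!\d)(?P<whole>\d{1,3}(?:\.\d{3})*),(?P<cents>\d{2})\s?€"
)


def parse_euro_amount(text: str) -> Optional[str]:
    """Return the first euro amount in text as a plain decimal string."""
    m = EURO_AMOUNT_RE.search(text)
    if m is None:
        return None
    # Drop the thousands separators and switch to a decimal point
    whole = m.group("whole").replace(".", "")
    return f"{whole}.{m.group('cents')}"
```

For example, `parse_euro_amount("Rent: 1.500,00 € on 20.05.2024")` yields `"1500.00"`.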
MIT
- GitHub Issues: https://github.com/chigwell/txn-harvester/issues
- Author: Eugene Evstafev (hi@euegne.plus)