A chat application that uses a LangChain AI agent with MongoDB tools to query and analyze California's procurement data from Kaggle, featuring natural language queries on spending, suppliers, and orders via FastAPI and React.

romanyn36/california-procurement-agent

California Procurement Agent


LangChain MongoDB Google Gemini FastAPI Python Pandas Plotly React Vite SQLite JavaScript HTML5 CSS3 Kaggle VS Code Git Postman

About | Features | Technologies | Architecture | Requirements | Starting | License | Contact


🎯 About

California Procurement Agent is an intelligent analytical platform designed to explore and analyze California's state procurement data from the eSCPRS (Electronic California Procurement Reporting System). The system combines Retrieval-Augmented Generation (RAG) technology with advanced data visualization to provide insights into government spending patterns, supplier analysis, and procurement trends.

The platform features an AI-powered chat interface that allows users to ask natural language questions about procurement data and receive detailed text-based analysis, combined with comprehensive data exploration notebooks and interactive visualizations. The system processes real procurement data including purchase orders, supplier information, spending categories, and acquisition methods.

Key Data Insights:

  • Analysis of $X+ in state procurement spending
  • Examination of supplier diversity and qualification programs
  • Contract vs non-contract spending patterns
  • Department-wise spending analysis
  • CalCard (state credit card) usage patterns
  • Geographic distribution of procurement activities

Kaggle Notebook: For a detailed exploratory data analysis, check out our comprehensive notebook: California State Procurement EDA


✨ Features

AI-Powered Chat Interface:

  • Natural Language Queries: Ask questions about procurement data in plain English
  • Intelligent Agent: LangChain-powered agent with MongoDB query tools for complex data analysis
  • Intelligent Responses: Get contextual answers with proper formatting and data insights
  • Chat History: Save and manage conversation history with procurement analysis

Data Exploration & Visualization:

  • Interactive Charts: Plotly-powered visualizations for spending patterns, supplier analysis, and trends (available in Jupyter notebook)
  • Comprehensive EDA: Jupyter notebook with statistical analysis and data insights

Procurement Data Analysis:

  • Supplier Analysis: Identify top suppliers, qualification status, and diversity metrics
  • Spending Patterns: Analyze contract vs non-contract spending, CalCard usage
  • Department Insights: Compare spending across different state departments
  • Acquisition Methods: Understand procurement methods and their distribution

Technical Features:

  • MongoDB Integration: Efficient storage and querying of large procurement datasets
  • Data Normalization: Consistent field naming and data type handling
  • API Endpoints: RESTful APIs for data access and analysis
  • Modern UI: React-based frontend with responsive design

🚀 Technologies

The following tools and frameworks were used in this project:

  • Backend:

    • FastAPI - High-performance web framework for APIs
    • MongoDB - NoSQL database for procurement data storage
    • Google Gemini - AI model for natural language processing
    • OpenAI - AI model for text generation and understanding
    • LangChain - Framework for LLM applications and agent orchestration
    • Pandas - Data manipulation and analysis
    • Plotly - Interactive data visualization
    • Matplotlib - Static visualizations
    • Seaborn - Statistical data visualization
  • Frontend:

    • React - UI library for modern web applications
    • Vite - Fast build tool and development server
    • Axios - HTTP client for API communication
  • Data Analysis & Visualization:

    • Plotly - Interactive charts and graphs
    • Matplotlib - Static visualizations
    • Seaborn - Statistical data visualization
    • Jupyter - Interactive notebooks for data exploration
  • Data Processing:

    • Kaggle API - Dataset downloading and management
    • Custom data normalization and cleaning pipelines

Architecture

The application is organized into seven layers:

  1. Data Layer:

    • MongoDB for storing normalized procurement data, chat history, and user sessions
  2. Processing Layer:

    • Data loading and normalization from Kaggle datasets
    • Field standardization and type conversion
    • Indexing for efficient querying
  3. Analysis Layer:

    • Pandas-based data manipulation and analysis
    • Statistical computations and aggregations
    • Time-series analysis for procurement trends
  4. AI Layer:

    • LangChain-powered RAG system with intelligent agent
    • Google Gemini for natural language understanding
    • MongoDB query tools for data retrieval and analysis
    • Custom prompts for procurement-specific queries
  5. Visualization Layer:

    • Plotly for interactive web-based charts
    • Matplotlib/Seaborn for static analysis
    • Jupyter notebooks for exploratory analysis
  6. API Layer:

    • FastAPI for RESTful endpoints
    • Chat management and data querying APIs
  7. Frontend Layer:

    • React-based chat interface with text-based AI responses
    • Data visualization available through Jupyter notebooks
    • Responsive design for multiple devices
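
To make the AI layer's "MongoDB query tools" more concrete, here is a minimal sketch of the kind of aggregation pipeline such a tool might build for a question like "Who are the top 10 suppliers by total spending". It is an illustration, not the project's actual code: the function name and the field names `supplier_name` and `total_price` are assumptions about how the normalized documents are stored.

```python
from typing import Any, Dict, List


def top_suppliers_pipeline(limit: int = 10) -> List[Dict[str, Any]]:
    """Build an aggregation pipeline ranking suppliers by total spending.

    The field names `supplier_name` and `total_price` are assumed for
    illustration; adjust them to the real normalized schema.
    """
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    return [
        # Sum the order totals for each supplier.
        {"$group": {"_id": "$supplier_name", "total_spent": {"$sum": "$total_price"}}},
        # Largest totals first.
        {"$sort": {"total_spent": -1}},
        # Keep only the top N suppliers.
        {"$limit": limit},
    ]


if __name__ == "__main__":
    for stage in top_suppliers_pipeline(10):
        print(stage)
```

With PyMongo, a pipeline like this would be passed to `collection.aggregate(pipeline)`. Building the pipeline in a plain function keeps it testable without a running database.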

Pipeline

The procurement data analysis pipeline includes:

  1. Data Acquisition: Download and load California procurement datasets from Kaggle
  2. Data Normalization: Standardize field names, convert data types, handle missing values
  3. Database Storage: Store processed data in MongoDB for efficient querying
  4. Analysis Engine: Generate insights, statistics, and visualizations
  5. AI Integration: Enable natural language queries about procurement data
  6. User Interface: Provide chat interface and data exploration tools
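
The normalization step (step 2) can be sketched roughly as follows. This is a hedged example using only the standard library: the helper names and the sample column headers are hypothetical, and the real loader in `database.data_loader` may work differently.

```python
import re
from typing import Dict, Optional, Set


def normalize_field_name(name: str) -> str:
    """Convert a raw column header such as 'Purchase Order Number' to snake_case."""
    # Replace every run of non-alphanumeric characters with one underscore.
    cleaned = re.sub(r"[^0-9a-zA-Z]+", "_", name.strip())
    return cleaned.strip("_").lower()


def parse_price(value: Optional[str]) -> Optional[float]:
    """Convert a currency string such as '$1,234.50' to a float.

    Missing or blank values become None instead of raising.
    """
    if value is None or not value.strip():
        return None
    return float(value.replace("$", "").replace(",", "").strip())


def normalize_record(raw: Dict[str, Optional[str]], price_fields: Set[str]) -> Dict[str, object]:
    """Rename every field and convert the listed price fields to floats."""
    record: Dict[str, object] = {}
    for key, value in raw.items():
        field = normalize_field_name(key)
        record[field] = parse_price(value) if field in price_fields else value
    return record


if __name__ == "__main__":
    sample = {"Supplier Name": "Acme", "Total Price": "$1,234.50"}
    print(normalize_record(sample, {"total_price"}))
```

Normalizing names and types once, before storage, means every later query can rely on a single spelling and a numeric type for each field.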

✅ Requirements

Before starting, ensure you have the following installed:

  • Python 3.11+
  • Node.js 16+ and npm
  • MongoDB (local or cloud instance)
  • Google Gemini API key
  • Kaggle API credentials (optional, for data updates)

🏁 Starting

# Clone this project
$ git clone https://github.com/romanyn36/california-procurement-agent.git

# Navigate to the project directory
$ cd california-procurement-agent

# Create a virtual environment and install dependencies (this project uses the uv package manager)
$ uv sync

# Activate the virtual environment
$ source .venv/bin/activate  # For Linux/Mac
$ .\.venv\Scripts\activate    # For Windows

# Set up environment variables
$ cp .env.example .env
# Edit .env with your API keys:
# MONGODB_URI=mongodb://localhost:27017/
# OPENAI_API_KEY=your_openai_api_key
# GEMINI_API_KEY=your_gemini_api_key
# KAGGLE_USERNAME=your_kaggle_username
# KAGGLE_KEY=your_kaggle_key

# Load the procurement data
# and populate the MongoDB database
$ python -m database.data_loader

# Start the backend server
$ python -m uvicorn app:app --reload --host 0.0.0.0 --port 8000

# In a separate terminal, navigate to the frontend directory
$ cd agent-frontend

# Install frontend dependencies
$ npm install

# Start the development server
$ npm run dev

# Access the application:
# Frontend: http://localhost:5173
# Backend API: http://localhost:8000
# Data Exploration: Open data_exploring.ipynb in Jupyter
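
Before loading data or starting the server, it can help to confirm that the settings from `.env` are actually visible to Python. The following is a small, optional sketch; the variable names come from the example above, but treating only `MONGODB_URI` and `GEMINI_API_KEY` as required is an assumption, and the script only sees variables that are already exported in the current shell.

```python
import os
from typing import List, Mapping

# Assumption: these two settings are required; the others are treated as optional.
REQUIRED_VARS = ("MONGODB_URI", "GEMINI_API_KEY")


def missing_settings(env: Mapping[str, str]) -> List[str]:
    """Return the required variable names that are unset or blank in `env`."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_settings(os.environ)
    if missing:
        print("Missing required settings: " + ", ".join(missing))
    else:
        print("All required settings are present.")
```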

Data Exploration

For detailed data analysis, visit my Kaggle notebook: California State Procurement EDA

The notebook includes:

  • Data loading and preprocessing
  • Statistical analysis
  • Interactive visualizations
  • Procurement insights and trends

💬 Example Queries

  • Tell me about the order with purchase order number REQ0011118
  • Show me all orders for laptops or computers
  • Who are the top 10 suppliers by total spending?
  • How many purchase orders were created in 2013?
  • What is the total spending across all departments?
  • What are the top 5 departments by number of orders?
  • Which suppliers have DVBE certification?
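
As an illustration of how a question such as "How many purchase orders were created in 2013?" could translate into a database query, here is a minimal sketch of a date-range filter. The function name and the `creation_date` field are hypothetical, and the sketch assumes dates are stored as datetime values rather than strings.

```python
from datetime import datetime
from typing import Any, Dict


def orders_in_year_filter(year: int, date_field: str = "creation_date") -> Dict[str, Any]:
    """Build a MongoDB filter matching documents dated within one calendar year.

    `creation_date` is an assumed field name; pass the real one if it differs.
    """
    start = datetime(year, 1, 1)
    end = datetime(year + 1, 1, 1)
    # Half-open range: includes January 1 of `year`, excludes January 1 of the next year.
    return {date_field: {"$gte": start, "$lt": end}}


if __name__ == "__main__":
    print(orders_in_year_filter(2013))
```

With PyMongo, the count itself would be `collection.count_documents(orders_in_year_filter(2013))`. The half-open range avoids missing orders timestamped late on December 31.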

Configuration

Key configuration files:

  1. .env: API keys and database connections
  2. prompt_template.py: AI model prompts and field mappings
  3. database/mongodb_tools.py: Database query functions
  4. agent.py: LangChain agent configuration with MongoDB query tools
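
To give a feel for what "AI model prompts and field mappings" might look like, here is a hedged sketch of a prompt template combined with a term-to-field mapping. Everything in it (the mapping entries, the template wording, and the `render_prompt` helper) is hypothetical and is not taken from the project's `prompt_template.py`.

```python
from typing import Mapping

# Hypothetical mapping from user-facing terms to stored field names.
FIELD_MAPPINGS = {
    "purchase order number": "purchase_order_number",
    "supplier": "supplier_name",
    "department": "department_name",
    "total spending": "total_price",
}

# Hypothetical system prompt; the placeholders are filled in by render_prompt.
PROMPT_TEMPLATE = """You are an assistant that answers questions about California procurement data.
Use the database tools to look up facts; do not guess numbers.

Known fields:
{field_list}

Question: {question}
"""


def render_prompt(question: str, mappings: Mapping[str, str] = FIELD_MAPPINGS) -> str:
    """Fill the template with the field mappings and the user's question."""
    field_list = "\n".join(
        f"- {term}: {field}" for term, field in sorted(mappings.items())
    )
    return PROMPT_TEMPLATE.format(field_list=field_list, question=question.strip())


if __name__ == "__main__":
    print(render_prompt("Who are the top 10 suppliers by total spending?"))
```

Listing the field mappings in the prompt gives the model the exact field names to use when it calls the query tools, rather than leaving it to infer them.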

🚧 What's Next?

Future enhancements planned:

  • Chat Visualizations: Add interactive charts and graphs directly in the chat interface to visualize procurement data insights alongside text responses
  • Advanced Analytics: Machine learning models for spending prediction

📝 License

This project is licensed under the MIT License. For more details, see the LICENSE file.

❤️ Contact Me

  • Made by Romani – an AI Engineer and Backend Developer. Feel free to reach out for collaborations, questions, or new projects! You can contact me via email: contact@romaninasrat.com
