About | Features | Technologies | Architecture | Requirements | Starting | License | Contact
California Procurement Agent is an intelligent analytical platform designed to explore and analyze California's state procurement data from the eSCPRS (Electronic California Procurement Reporting System). The system combines Retrieval-Augmented Generation (RAG) technology with advanced data visualization to provide insights into government spending patterns, supplier analysis, and procurement trends.
The platform features an AI-powered chat interface that allows users to ask natural language questions about procurement data and receive detailed text-based analysis, combined with comprehensive data exploration notebooks and interactive visualizations. The system processes real procurement data including purchase orders, supplier information, spending categories, and acquisition methods.
Key Data Insights:
- Analysis of $X+ in state procurement spending
- Examination of supplier diversity and qualification programs
- Contract vs non-contract spending patterns
- Department-wise spending analysis
- CalCard (state credit card) usage patterns
- Geographic distribution of procurement activities
Kaggle Notebook: For a detailed exploratory data analysis, check out our comprehensive notebook: California State Procurement EDA
- Natural Language Queries: Ask questions about procurement data in plain English
- Intelligent Agent: LangChain-powered agent with MongoDB query tools for complex data analysis
- Intelligent Responses: Get contextual answers with proper formatting and data insights
- Chat History: Save and manage conversation history with procurement analysis
- Interactive Charts: Plotly-powered visualizations for spending patterns, supplier analysis, and trends (available in Jupyter notebook)
- Comprehensive EDA: Jupyter notebook with statistical analysis and data insights
- Supplier Analysis: Identify top suppliers, qualification status, and diversity metrics
- Spending Patterns: Analyze contract vs non-contract spending, CalCard usage
- Department Insights: Compare spending across different state departments
- Acquisition Methods: Understand procurement methods and their distribution
- MongoDB Integration: Efficient storage and querying of large procurement datasets
- Data Normalization: Consistent field naming and data type handling
- API Endpoints: RESTful APIs for data access and analysis
- Modern UI: React-based frontend with responsive design
The following tools and frameworks were used in this project:
Backend:
- FastAPI - High-performance web framework for APIs
- MongoDB - NoSQL database for procurement data storage
- Google Gemini - AI model for natural language processing
- OpenAI - AI model for text generation and understanding
- LangChain - Framework for LLM applications and agent orchestration
- Pandas - Data manipulation and analysis
- Plotly - Interactive data visualization
- Matplotlib - Static visualizations
- Seaborn - Statistical data visualization
Frontend:
- React - Component-based library powering the chat interface
Data Analysis & Visualization:
- Plotly - Interactive charts and graphs
- Matplotlib - Static visualizations
- Seaborn - Statistical data visualization
- Jupyter - Interactive notebooks for data exploration
Data Processing:
- Kaggle API - Dataset downloading and management
- Custom data normalization and cleaning pipelines
The application follows a modern data analysis architecture:
Data Layer:
- MongoDB for storing normalized procurement data, chat history, and user sessions
Processing Layer:
- Data loading and normalization from Kaggle datasets
- Field standardization and type conversion
- Indexing for efficient querying
Analysis Layer:
- Pandas-based data manipulation and analysis
- Statistical computations and aggregations
- Time-series analysis for procurement trends
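The aggregations in this layer can be sketched with a short pandas example. The DataFrame below is a hypothetical sample, and the column names (`supplier_name`, `department`, `total_price`) are illustrative assumptions about the normalized schema, not the exact eSCPRS fields:

```python
import pandas as pd

# Hypothetical sample of normalized purchase orders
orders = pd.DataFrame({
    "supplier_name": ["Acme Corp", "Acme Corp", "TechWorld", "OfficePlus"],
    "department": ["DMV", "DOT", "DMV", "DGS"],
    "total_price": [12000.0, 8000.0, 25000.0, 3000.0],
})

# Total spending per supplier, largest first
spend_by_supplier = (
    orders.groupby("supplier_name")["total_price"]
          .sum()
          .sort_values(ascending=False)
)
print(spend_by_supplier)
```

The same groupby/sum pattern extends to department-wise spending or acquisition-method breakdowns by changing the grouping column.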
AI Layer:
- LangChain-powered RAG system with intelligent agent
- Google Gemini for natural language understanding
- MongoDB query tools for data retrieval and analysis
- Custom prompts for procurement-specific queries
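One way the MongoDB query tools can be structured is as small functions that build aggregation pipelines for the agent to execute. The sketch below is a hypothetical illustration of that shape, not the project's actual `database/mongodb_tools.py`; the field names `creation_date`, `supplier_name`, and `total_price` are assumptions about the normalized schema:

```python
def top_suppliers_pipeline(year: int, limit: int = 10) -> list[dict]:
    """Build a MongoDB aggregation pipeline for top suppliers by spend
    in a given year. Field names are illustrative assumptions."""
    return [
        # Keep only orders created within the requested year
        {"$match": {"creation_date": {"$gte": f"{year}-01-01",
                                      "$lt": f"{year + 1}-01-01"}}},
        # Sum spending per supplier
        {"$group": {"_id": "$supplier_name",
                    "total_spend": {"$sum": "$total_price"}}},
        # Largest spenders first, truncated to the requested count
        {"$sort": {"total_spend": -1}},
        {"$limit": limit},
    ]

pipeline = top_suppliers_pipeline(2013, limit=5)
```

A LangChain agent can wrap functions like this as tools, run the resulting pipeline against the MongoDB collection, and hand the rows back to the LLM for summarization.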
Visualization Layer:
- Plotly for interactive web-based charts
- Matplotlib/Seaborn for static analysis
- Jupyter notebooks for exploratory analysis
API Layer:
- FastAPI for RESTful endpoints
- Chat management and data querying APIs
Frontend Layer:
- React-based chat interface with text-based AI responses
- Data visualization available through Jupyter notebooks
- Responsive design for multiple devices
The procurement data analysis pipeline includes:
- Data Acquisition: Download and load California procurement datasets from Kaggle
- Data Normalization: Standardize field names, convert data types, handle missing values
- Database Storage: Store processed data in MongoDB for efficient querying
- Analysis Engine: Generate insights, statistics, and visualizations
- AI Integration: Enable natural language queries about procurement data
- User Interface: Provide chat interface and data exploration tools
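The normalization step of the pipeline can be sketched as a small pandas function. This is a minimal illustration under assumed column names; the real eSCPRS columns and cleaning rules differ:

```python
import pandas as pd

def normalize(df: pd.DataFrame) -> pd.DataFrame:
    """Standardize column names, convert types, and drop unusable rows.

    A minimal sketch; column names here are illustrative assumptions.
    """
    # snake_case column names for consistent querying
    out = df.rename(columns=lambda c: c.strip().lower().replace(" ", "_"))
    # Coerce prices to numbers; unparseable values become NaN
    out["total_price"] = pd.to_numeric(out["total_price"], errors="coerce")
    # Parse dates; unparseable values become NaT
    out["creation_date"] = pd.to_datetime(out["creation_date"], errors="coerce")
    # Drop rows with no usable price
    return out.dropna(subset=["total_price"])

raw = pd.DataFrame({
    "Total Price": ["not-a-number", "1200.50"],
    "Creation Date": ["2013-05-01", "2013-06-15"],
})
clean = normalize(raw)
```

After normalization the DataFrame can be written to MongoDB in bulk (for example via `collection.insert_many(clean.to_dict("records"))`).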
Before starting, ensure you have the following installed:
- Python 3.11+
- Node.js 16+ and npm
- MongoDB (local or cloud instance)
- Google Gemini API key
- Kaggle API credentials (optional, for data updates)
# Clone this project
$ git clone https://github.com/romanyn36/california-procurement-agent.git
# Navigate to the project directory
$ cd california-procurement-agent
# Create a virtual environment and install dependencies (this project uses the uv package manager)
$ uv sync
# Activate the virtual environment
$ source .venv/bin/activate # For Linux/Mac
$ .\.venv\Scripts\activate # For Windows
# Set up environment variables
$ cp .env.example .env
# Edit .env with your API keys:
# MONGODB_URI=mongodb://localhost:27017/
# OPENAI_API_KEY=your_openai_api_key
# GEMINI_API_KEY=your_gemini_api_key
# KAGGLE_USERNAME=your_kaggle_username
# KAGGLE_KEY=your_kaggle_key
# Load the procurement data
# and populate the MongoDB database
$ python -m database.data_loader
# Start the backend server
$ python -m uvicorn app:app --reload --host 0.0.0.0 --port 8000
# In a separate terminal, navigate to the frontend directory
$ cd agent-frontend
# Install frontend dependencies
$ npm install
# Start the development server
$ npm run dev
# Access the application:
# Frontend: http://localhost:5173
# Backend API: http://localhost:8000
# Data Exploration: Open data_exploring.ipynb in Jupyter

For detailed data analysis, visit my Kaggle notebook: California State Procurement EDA
The notebook includes:
- Data loading and preprocessing
- Statistical analysis
- Interactive visualizations
- Procurement insights and trends
- Tell me about the order with purchase order number REQ0011118
- Show me all orders for laptops or computers
- Who are the top 10 suppliers by total spending?
- How many purchase orders were created in 2013?
- What is the total spending across all departments?
- What are the top 5 departments by number of orders?
- Which suppliers have DVBE certification?
Key configuration files:
- .env: API keys and database connections
- prompt_template.py: AI model prompts and field mappings
- database/mongodb_tools.py: Database query functions
- agent.py: LangChain agent configuration with MongoDB query tools
Future enhancements planned:
- Chat Visualizations: Add interactive charts and graphs directly in the chat interface to visualize procurement data insights alongside text responses
- Advanced Analytics: Machine learning models for spending prediction
This project is licensed under the MIT License. For more details, see the LICENSE file.
Made by Romani, an AI Engineer and Backend Developer. Feel free to reach out for collaborations, questions, or new projects! You can contact me via email: contact@romaninasrat.com
You can also find me on:


