Welcome to the E-commerce Data Simulator! This project is a powerful, configurable tool designed to generate realistic user interaction data for an e-commerce website. Inspired by platforms like Google Analytics, it provides a robust stream of events - from page views to purchases - that can be used to test data pipelines, build analytics dashboards, and train machine learning models.
The primary goal is to create a flexible and scalable simulation environment that mimics real-world user behavior, allowing developers and data analysts to work with high-quality, high-volume data without relying on actual production traffic.
-
Dynamic Event Simulation: Generate a wide variety of user events, including:
- Website Visits (new and returning users)
- Page Views (home, category, product pages)
- Product Impressions
- Adding/Removing Products from Cart
- Initiating and Completing Checkout
- Adding Products to Wishlists
- User Sign-ups and Logins
-
High Configurability: Easily customize the simulation parameters to fit your needs.
- Product Catalog: Define your own products, categories, and pricing via a simple configuration file (e.g.,
products.json). - User Personas: Simulate different types of users (e.g., "Bargain Hunter," "Brand Loyalist," "Casual Browser") with distinct behavioral patterns.
- Behavior Control: Adjust the rate of events, session durations, and conversion funnels.
- Product Catalog: Define your own products, categories, and pricing via a simple configuration file (e.g.,
-
Scalable Data Streaming: Built with Apache Kafka and Kafka Streams to handle high-throughput, real-time data flow, ensuring the system can scale to millions of events.
-
Database Integration: Persist the generated data into a relational database of your choice. Configure the database connection string and let the simulator populate your tables.
-
Modular Architecture: The project is divided into clear modules (e-g.,
data-generator,data-persister,config-manager) to make it easy to understand, extend, and maintain.
[ ] Real-time Dashboard: A web-based interface to visualize the simulated data as it's generated.
[ ] Advanced Analytics: Integration with analytics tools to perform cohort analysis, funnel analysis, and more.
[ ] Anomaly Detection: Simulate unusual user behavior or system events (e.g., sudden traffic spikes, payment failures) to test monitoring systems.
[ ] A/B Testing Simulation: Model A/B tests by directing different user segments to different product variations.
- Core: Java 21+
- Framework: Spring Boot 3
- Messaging: Apache Kafka / Kafka Streams
- Database: Support for any JPA-compatible database (PostgreSQL, MySQL, H2, etc.)
- Build Tool: Maven
Prerequisites
- Java 21
- Docker (for running Kafka and a database)
Installation & Running
- Clone the repository:
git clone https://github.com/uy-nguyen00/ecom-data-simulator.gitcd ecommerce-data-simulator- Start the infrastructure (Kafka & Database):
docker-compose up -d- Configure the application:
-
Edit src/main/resources/application.yml to set up your database connection.
-
Modify src/main/resources/products.json to define your product catalog.
-
Adjust src/main/resources/simulation.properties to control the simulation behavior.
- Build and run the application:
./mvnw spring-boot:runThe simulator will now start generating data and sending it to the configured Kafka topics and database.
The simulation's behavior is primarily controlled by the simulation.properties file. Here are some of the key parameters you can tweak:
# Number of concurrent users to simulate
simulation.users.concurrent=100
# Rate of new events per second
simulation.events.rate=50
# Conversion rate from viewing a product to adding it to the cart
simulation.funnel.addToCartRate=0.10
# Conversion rate from cart to completed checkout
simulation.funnel.purchaseRate=0.05For a full list of configurable parameters, please see the simulation.properties file.
Contributions are what make the open-source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement".
-
Fork the Project
-
Create your Feature Branch (git checkout -b feature/AmazingFeature)
-
Commit your Changes (git commit -m 'Add some AmazingFeature')
-
Push to the Branch (git push origin feature/AmazingFeature)
-
Open a Pull Request
Distributed under the MIT License. See LICENSE for more information.