A simple, powerful CLI tool to generate realistic fake data for testing, development, and demonstrations.
- What is This?
- Installation
- Getting Started
- How It Works
- Available Templates
- Step-by-Step Guide
- Export Options
- Troubleshooting
- FAQ
The Synthetic Data Generator is a command-line tool that creates realistic fake data for different scenarios:
β
Generate fake user profiles (names, emails, addresses)
β
Create e-commerce transaction records
β
Generate banking & financial data
β
Create healthcare patient records
β
Generate IoT sensor readings
β
Create product catalogs
β
Generate NLP text data
β
Create web analytics data
β
Generate image metadata
β
Create student academic records
- Test your application with realistic data without using real customer data
- Database training - populate your database for demos
- Development work - test features with varied data
- Privacy safe - all data is fake and synthetic
- Customize data - select only the fields you need
- Python 3.7+ installed on your computer
- pip (package manager for Python)
Open your terminal/command prompt and run:
pip install pandas fakerThis installs:
- pandas - for working with data tables
- faker - for generating realistic fake data
Make sure you have these files in the same folder:
CLI.py(the main program)DATA_ENGINE.py(all data templates)
Run this command to check if everything is set up:
python CLI.pyYou should see the menu appear. If yes, you're ready to go! π
# 1. Open terminal/command prompt
# 2. Go to your project folder
cd /path/to/your/project
# 3. Run the program
python CLI.py
# 4. Follow the menuThat's it! The program will guide you through everything.
Step 1: You choose a TEMPLATE (what kind of data you want)
β
Step 2: You choose SUBCATEGORIES (which fields from that template)
β
Step 3: You enter COUNT (how many rows of data)
β
Step 4: Program generates the fake data
β
Step 5: You save it as CSV or JSON
[INSERT SCREENSHOT: Full CLI menu here]
Generate fake user/customer profiles.
What data you get:
- Full name, username, email, phone
- Address (street, city, state, country)
- Account info (creation date, status)
- Preferences (language, currency)
- Device info (mobile/desktop/tablet)
Use cases: Customer databases, user profiles, testing auth systems
Generate fake online shopping transactions.
What data you get:
- Order ID, transaction date, order status
- Customer info (name, email, phone)
- Product info (name, category, quantity, price)
- Payment details (method, status, transaction ID)
- Shipping info (address, partner, status)
- Device info
Use cases: E-commerce websites, order management systems, payment testing
[INSERT SCREENSHOT: Sample generated e-commerce data]
Generate fake banking transactions.
What data you get:
- Account info (ID, type, bank name, IFSC)
- Transaction details (ID, date, type, amount, status)
- Customer info (name, email, PAN number, Aadhaar)
- Card details (type, network, last 4 digits)
- Loan info (type, amount, interest rate, EMI)
- Device info
Use cases: Banking apps, financial dashboards, transaction analytics
Generate fake medical records.
What data you get:
- Patient info (ID, name, age, gender, blood group)
- Medical records (diagnosis, symptoms, severity)
- Doctor info (name, specialization, hospital)
- Appointment details (date, status, type)
- Prescription info (medicine, dosage, duration)
- Billing info (consultation fee, medicine charges)
- Device info
Use cases: Hospital systems, telemedicine apps, medical records
[INSERT SCREENSHOT: Sample healthcare data]
Generate fake IoT sensor readings.
What data you get:
- Device info (ID, type, firmware version)
- Location data (latitude, longitude, altitude)
- Sensor readings (temperature, humidity, air quality)
- Network data (signal strength, connection type, IP)
- Battery info (level, health, charging status)
- Timestamp & maintenance info
Use cases: IoT dashboards, sensor data analysis, smart home testing
Generate fake text documents and NLP data.
What data you get:
- Text data (sentences, paragraphs, words, keywords)
- Document metadata (title, author, published year)
- NLP annotations (language, sentiment, emotion, toxicity)
- Named Entity Recognition (person, location, organization)
- Text statistics (word count, character count, avg word length)
- Timestamps
Use cases: NLP model testing, text analysis, chatbot training
Generate fake website analytics data.
What data you get:
- Session info (session ID, user ID, duration, engagement score)
- Page metrics (URL, time on page, scroll depth, interactions)
- Traffic source (source, medium, campaign, keyword)
- Device & browser info
- Geo data (IP, country, city, timezone)
- Performance data (page load time, DNS lookup, resource count)
Use cases: Analytics dashboards, website performance testing, traffic analysis
Generate fake image metadata.
What data you get:
- Basic info (filename, format, file size, color mode)
- Dimensions (width, height, aspect ratio, DPI)
- Camera EXIF (camera make, lens, focal length, aperture, ISO)
- Geolocation (latitude, longitude, city, country)
- Tags & labels (primary label, confidence score)
- Color stats (dominant color, brightness, contrast)
Use cases: Image database testing, photo library apps, image analysis
Generate fake student academic records.
What data you get:
- Student profile (ID, name, age, grade level)
- Academic scores (math, science, English, social science, computer)
- Attendance (total classes, classes attended, percentage)
- Behavior & activity (disciplinary actions, sports score, creativity)
- Performance metrics (study hours, homework completion, participation)
Use cases: School management systems, student performance dashboards
[INSERT SCREENSHOT: Sample student data]
Generate fake product listings.
What data you get:
- Basic info (product ID, name, category, brand, description)
- Variants (color, size, material, model number)
- Pricing (price, discount, final price, currency)
- Inventory (stock status, quantity, warehouse location)
- Ratings & reviews (average rating, total reviews, review text)
- Timestamps (release date, last updated)
Use cases: E-commerce catalogs, product databases, inventory management
python CLI.pyYou'll see:
-----------------------------
π₯ SYNTHETIC DATA GENERATOR π₯
-----------------------------
Available Templates:
1. USER TEMPLATE
2. ECOMMERCE TRANSACTION TEMPLATE
3. FINANCIAL BANKING TEMPLATE
... (and more)
[INSERT SCREENSHOT: Initial menu]
When asked:
Select a Template (number):
Type the number of the template you want. For example:
- Type
1for User Template - Type
2for E-commerce Template
Example:
Select a Template (number): 2
[INSERT SCREENSHOT: After selecting template]
After selecting, you'll see available subcategories:
Available Subcategories:
1. Order Info
2. Customer Info
3. Product Info
4. Payment Info
5. Shipping Info
6. Device Info
Enter Subcategories (comma-separated numbers):
Simple explanation: Each template has groups of related data fields.
Example:
Enter Subcategories (comma-separated numbers): 1,2,3
This means: "I want Order Info, Customer Info, and Product Info"
π‘ Pro Tip:
- Type
1,2,3,4,5,6for ALL fields - Type
1,3,5for only some fields - Pick only what you need!
[INSERT SCREENSHOT: Subcategory selection]
Enter number of rows to generate:
Simple: How many fake records do you want?
Examples:
Enter number of rows to generate: 100
Creates 100 rows of fake data.
Timing guide:
- 100 rows = ~1-2 seconds
- 1,000 rows = ~5-10 seconds
- 10,000 rows = ~30-60 seconds
Enter Seed (press Enter to skip):
What is a seed? It makes the data reproducible (same fake data every time).
Simple usage:
- Skip it (press Enter) = Different data each time
- Enter a number (like
42) = Same data every time with that number
Example:
Enter Seed (press Enter to skip): 42
[INSERT SCREENSHOT: Seed option]
You'll see:
β
DATA GENERATED SUCCESSFULLY!
==============================
SUMMARY
* Template: ECOMMERCE TRANSACTION TEMPLATE
β± Time Taken: 2.45 seconds
* Rows Generated: 100
* Columns Generated: 18
* Seed: 42
==============================
Great! Your data is ready.
Select what you want to do next:
1. Preview data (first 5 rows)
2. Save as CSV
3. Save as JSON
4. Exit
Enter options (comma-separated):
You can pick multiple options!
What it does: Shows you the first 5 rows in your terminal.
Use case: Quick check before saving.
Example output:
S.no. Order ID Customer Name Customer Email Total Amount
1 a1b2c3d4 John Doe john@example.com 15,500
2 e5f6g7h8 Jane Smith jane@example.com 8,200
3 i9j0k1l2 Bob Johnson bob@example.com 25,000
...
What it does: Saves data as an Excel-compatible CSV file.
How to use:
Enter CSV file name: my_orders
This creates: my_orders.csv
Open it:
- Double-click to open in Excel/Sheets
- Use in Python with:
pd.read_csv('my_orders.csv')
When to use:
- β For Excel/spreadsheet work
- β For uploading to most tools
- β For data analysis in Excel
[INSERT SCREENSHOT: CSV file in Excel]
What it does: Saves data as JSON format (web-friendly).
How to use:
Enter JSON file name: my_orders
This creates: my_orders.json
Example JSON output:
[
{
"S.no.": 1,
"Order ID": "a1b2c3d4",
"Customer Name": "John Doe",
"Customer Email": "john@example.com",
"Total Amount": 15500
},
{
"S.no.": 2,
"Order ID": "e5f6g7h8",
...
}
]When to use:
- β For APIs and web services
- β For Python/JavaScript applications
- β For database imports
Just closes the program. Simple!
Goal: Create 50 fake user profiles for testing
Steps:
1. Run: python CLI.py
2. Select Template: 1 (USER TEMPLATE)
3. Select Subcategories: 1,2,3,4 (Personal, Address, Account, Preferences)
4. Count: 50
5. Seed: (skip)
6. Export: 2 (Save as CSV)
7. Filename: user_profiles
Result: user_profiles.csv with 50 fake users β
[INSERT SCREENSHOT: User data sample]
Goal: Create 100 fake orders for testing checkout system
Steps:
1. Run: python CLI.py
2. Select Template: 2 (ECOMMERCE TRANSACTION TEMPLATE)
3. Select Subcategories: 1,2,3,4,5 (Order, Customer, Product, Payment, Shipping)
4. Count: 100
5. Seed: 123 (reproducible)
6. Export: 1,2 (Preview + Save as CSV)
7. Filename: test_orders
Result: Preview in terminal + test_orders.csv β
Goal: Generate 200 patient records for testing medical software
Steps:
1. Run: python CLI.py
2. Select Template: 4 (HEALTHCARE TEMPLATE)
3. Select Subcategories: 1,2,3,4 (Patient, Medical, Doctor, Appointment)
4. Count: 200
5. Seed: (skip)
6. Export: 3 (Save as JSON)
7. Filename: patient_records
Result: patient_records.json with 200 patients β
Solution:
pip install pandas fakerThen try again.
Solution:
- You might need to use
python3instead:
python3 CLI.pyOr on Windows, use full path to Python.
Reason: You entered a wrong number or non-existent template.
Solution:
- Look at the menu carefully
- Enter a number that's listed (usually 1-10)
- Don't enter letters or special characters
Why: You're generating too many rows at once.
Solution:
- Try smaller numbers first: 10, 50, 100
- Large datasets (100,000+) will take time
- Be patient! It's still much faster than manual entry
Reason: File might be corrupted or encoding issue.
Solution:
# Regenerate the data
python CLI.py
# Or check if file was saved correctly
# Try opening with a text editor firstReason: You entered numbers that don't exist or wrong format.
Solution:
- Separate numbers with commas:
1,2,3β - Don't use spaces:
1, 2, 3β - Check max number shown in menu
No! All data is 100% fake and synthetic. It's safe to use anywhere.
No. This is for testing and development only. Never use fake data in real production systems for customers.
Yes! The templates are in DATA_ENGINE.py. You can edit them to add custom fields.
(Advanced topic - ask me if interested!)
Using a library called Faker which creates realistic-looking but fake data using algorithms.
CSV: Better for Excel, spreadsheets, simple analysis JSON: Better for web APIs, web applications, databases
Technically yes, but:
- Will take time (depends on your computer)
- Might use lots of memory
- Better to generate in batches (100K at a time)
You have two options:
Option 1: Use only the subcategories you need (select specific fields)
Option 2: Modify DATA_ENGINE.py to customize templates
Yes! No internet needed once installed. Everything runs on your computer.
- Did I install pandas and faker?
- Are
CLI.pyandDATA_ENGINE.pyin same folder? - Did I enter valid template number?
- Did I enter subcategories correctly (comma-separated)?
- Did I enter a valid count (number)?
- Did I check the error message for clues?
-
Test with small data first
# Generate 10 rows first to test # If it works, generate 1000
-
Use seeds for consistent testing
# Same seed = same data every time # Good for reproducible tests
-
Preview before saving large files
# Select option 1 first # Check if data looks correct # Then save as CSV/JSON
-
Combine templates
# Generate users (Template 1) # Generate orders (Template 2) # Connect them manually for realistic data
-
Clean up old files
# Delete old CSV/JSON files before regenerating # Keeps your folder organized
(For experienced Python users)
To add custom fields to a template, edit DATA_ENGINE.py:
"New Field": lambda: fake.company(), # Uses Faker
"Age": lambda: random.randint(18, 65), # Random number
"ID": lambda: str(uuid.uuid4()), # Unique IDThen select it in CLI!
βββββββββββββββββββββββββββββββββββββββββββ
β CLI.py (Main Interface) β
β - Menu display β
β - User input β
β - Export options β
ββββββββββββββββββββ¬βββββββββββββββββββββββ
β
βΌ
βββββββββββββββββββββββββββββββββββββββββββ
β DATA_ENGINE.py (All Templates) β
β - USER_TEMPLATE β
β - ECOM_TEMPLATE β
β - FINANCIAL_TEMPLATE β
β - HEALTHCARE_TEMPLATE β
β - IOT_SENSOR_TEMPLATE β
β - ... (7 more templates) β
β - generate_from_template() function β
ββββββββββββββββββββ¬βββββββββββββββββββββββ
β
βΌ
βββββββββββββββββββββββββββββββββββββββββββ
β External Libraries β
β - Faker (generates fake data) β
β - Pandas (data processing) β
β - Random (random selections) β
β - UUID (unique IDs) β
βββββββββββββββββββββββββββββββββββββββββββ
your_project_folder/
βββ CLI.py β Run this file
βββ DATA_ENGINE.py β All templates here
βββ user_profiles.csv β Generated file (example)
βββ test_orders.json β Generated file (example)
βββ README.md β This documentation
Current Version: 1.0
Last Updated: December 2025
Python Required: 3.7+
Having issues?
- Check troubleshooting section
- Check FAQ
- Review examples
- Check file structure
This tool is free to use for personal, educational, and testing purposes.
Happy data generating! π
Made with β€οΈ for developers, testers, and data enthusiasts.