Commit 3559963

Update README.md
1 parent da1dcf6 commit 3559963

1 file changed: +94 -13 lines changed

README.md

Lines changed: 94 additions & 13 deletions
@@ -1,15 +1,96 @@
11
# AI-DataScience-Lab
22

3-
A web application for data science workflows, allowing users to upload datasets, clean and analyze them using Pandas and scikit-learn (with future TensorFlow integration), powered by a Flask backend.
4-
5-
## Features
6-
- File upload interface
7-
- Data cleaning with Pandas
8-
- Initial analysis and predictions using scikit-learn
9-
- Ready for AI model integration via OpenAI API or TensorFlow
10-
11-
## Tech Stack:
12-
- Front-end: HTML, Bootstrap (optional)
13-
- Back-end: Python, Flask
14-
- Data processing: Pandas, scikit-learn
15-
- Deployment: Render / GitHub Pages
3+
**AI-DataScience-Lab** is an end-to-end forecasting web application that lets users upload CSV datasets, cleans and analyzes them with Python libraries, generates visualizations and predictive models with `scikit-learn`, and summarizes the dataset using OpenAI’s GPT-3.5 API.
4+
5+
The frontend is hosted on **GitHub Pages**, and the backend is deployed on **Azure App Service**, creating a scalable and professional architecture suitable for real-world AI and data science workflows.
6+
7+
---
8+
9+
## 🌐 Live Demo
10+
11+
- **Frontend (GitHub Pages):** [https://hariprashad-ravikumar.github.io/AI-DataScience-Lab/](https://hariprashad-ravikumar.github.io/AI-DataScience-Lab/)
12+
- **Backend (Azure):** [https://ai-dslab-backend-cpf2feachnetbbck.westus-01.azurewebsites.net/](https://ai-dslab-backend-cpf2feachnetbbck.westus-01.azurewebsites.net/)
13+
14+
---
15+
16+
## ⚙️ Features
17+
18+
- Upload CSV files with two columns: `X` (dates) and `Y` (numerical values)
19+
- Cleans data with `pandas`, dropping invalid entries
20+
- Generates a scatter plot using `matplotlib`
21+
- Converts date strings to ordinal format and trains a `LinearRegression` model with `scikit-learn`
22+
- Uses **OpenAI API** (GPT-3.5-turbo) to summarize the uploaded dataset
23+
- Predicts future `Y` values for user-supplied future `X` (date) values
24+
- Secure HTTPS communication across GitHub and Azure (CORS-enabled)
25+
- Temporary file storage using Python's `tempfile`, cleaned up automatically when the app shuts down
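
As a rough illustration of the date-to-ordinal and prediction features listed above, the sketch below fits a `LinearRegression` on ordinal dates and predicts a value for a future date. The helper name `fit_and_predict`, the ISO date format, and the sample data are assumptions made for this example; they are not taken from the project's backend.

```python
from datetime import date

import numpy as np
from sklearn.linear_model import LinearRegression


def fit_and_predict(dates, values, future_dates):
    """Fit Y against ordinal dates and predict Y for future dates.

    `dates` and `future_dates` are ISO strings (YYYY-MM-DD); this input
    format is an assumption made for the example.
    """
    # date.toordinal() maps a calendar date to an integer day count.
    x_train = np.array([[date.fromisoformat(d).toordinal()] for d in dates])
    model = LinearRegression().fit(x_train, np.array(values, dtype=float))

    x_future = np.array(
        [[date.fromisoformat(d).toordinal()] for d in future_dates]
    )
    return model.predict(x_future).tolist()


# Made-up data in which Y grows by 2 per day.
predictions = fit_and_predict(
    ["2024-01-01", "2024-01-02", "2024-01-03"],
    [10.0, 12.0, 14.0],
    ["2024-01-05"],
)
print(predictions)
```

Because ordinals are plain integers, the fitted slope is expressed in units of `Y` per day.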
26+
27+
---
28+
29+
## 📊 Technical Workflow
30+
31+
### 1. **Frontend (GitHub Pages)**
32+
33+
- HTML + JavaScript app with forms to:
34+
- Upload CSV data
35+
- Request future predictions
36+
- Communicates with the backend via `fetch()` using HTTPS POST requests
37+
- Displays:
38+
- Processing log
39+
- OpenAI-generated summary
40+
- Forecast output
41+
- Auto-generated plot image
42+
43+
### 2. **Backend (Azure App Service - Python Flask)**
44+
45+
- **Routes:**
46+
- `POST /upload`: Handles file uploads, data cleaning, modeling, summary generation
47+
- `POST /predict`: Accepts future dates, returns predictions
48+
- `GET /plot.png`: Serves saved scatter plot image
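
The three routes above could be wired up roughly as follows. This is only a skeleton under stated assumptions: the form field name `file`, the JSON payload shape for `/predict`, and the placeholder responses are hypothetical, since the README does not specify them.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


@app.route("/upload", methods=["POST"])
def upload():
    # Hypothetical form field name "file"; the README does not name it.
    uploaded = request.files.get("file")
    if uploaded is None:
        return jsonify(error="no file provided"), 400
    # Cleaning, modeling, and summarization would run here.
    return jsonify(log="file received", filename=uploaded.filename)


@app.route("/predict", methods=["POST"])
def predict():
    # Hypothetical JSON payload: {"dates": ["2025-01-01", ...]}
    payload = request.get_json(silent=True) or {}
    dates = payload.get("dates", [])
    # A fitted model would turn each date into a prediction here.
    return jsonify(dates=dates, predictions=[])


@app.route("/plot.png", methods=["GET"])
def plot():
    # Placeholder: a real handler would return the saved image,
    # for example with flask.send_file.
    return jsonify(error="plot not generated yet"), 404
```

Flask's built-in test client (`app.test_client()`) can exercise these handlers without starting a server.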
49+
50+
### 3. **Processing Pipeline**
51+
52+
- **Step 1: Data Cleaning**
53+
- Reads CSV using `pandas`
54+
- Drops NA values and converts `X` to datetime format
55+
56+
- **Step 2: Visualization**
57+
- Uses `matplotlib` to generate scatter plot
58+
- Plot saved to a temporary directory and served on request
59+
60+
- **Step 3: Modeling**
61+
- Uses `scikit-learn` `LinearRegression` to fit `X` (date ordinal) → `Y`
62+
- Model used to predict future values based on user input
63+
64+
- **Step 4: Summarization**
65+
- Sends the first 10 rows of the cleaned dataset (via `.head(10).to_csv()`) to the OpenAI GPT-3.5 API
66+
- Summary generated and returned to frontend
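
The cleaning, visualization, and summarization steps above might look like the sketch below. The function names, the use of `errors="coerce"` to turn invalid entries into missing values, and the prompt wording are assumptions for illustration. The call to the OpenAI API itself is deliberately left out; only the text that would be sent is built.

```python
import io

import matplotlib

matplotlib.use("Agg")  # headless backend, suitable for a server process
import matplotlib.pyplot as plt
import pandas as pd


def clean(csv_text):
    """Read CSV text, parse X as dates, and drop unusable rows."""
    df = pd.read_csv(io.StringIO(csv_text))
    # Invalid dates or numbers become NaT/NaN so dropna() can remove them.
    df["X"] = pd.to_datetime(df["X"], errors="coerce")
    df["Y"] = pd.to_numeric(df["Y"], errors="coerce")
    return df.dropna(subset=["X", "Y"]).reset_index(drop=True)


def save_scatter(df, target):
    """Write a scatter plot of Y against X to a path or file-like object."""
    fig, ax = plt.subplots()
    ax.scatter(df["X"], df["Y"])
    ax.set_xlabel("X")
    ax.set_ylabel("Y")
    fig.savefig(target)
    plt.close(fig)


def summary_prompt(df):
    """Build the text for the summarizer from the first ten rows only."""
    # Hypothetical prompt wording; the README only states that
    # .head(10).to_csv() is what gets sent.
    return "Summarize this dataset:\n" + df.head(10).to_csv()


# Made-up input with one bad date and one missing value.
sample = "X,Y\n2024-01-01,10\nnot-a-date,11\n2024-01-03,\n2024-01-04,16\n"
cleaned = clean(sample)

buffer = io.BytesIO()
save_scatter(cleaned, buffer)

prompt = summary_prompt(cleaned)
print(len(cleaned), "rows kept")
```

Sending only the first ten rows keeps the request small, at the cost of the summary reflecting just the start of the dataset.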
67+
68+
---
69+
70+
## 🛠️ Tech Stack
71+
72+
| Layer | Technology |
73+
|-----------|-------------------------------------------|
74+
| Frontend | HTML, JavaScript, GitHub Pages |
75+
| Backend | Flask, Azure App Service |
76+
| ML Tools | `pandas`, `scikit-learn`, `matplotlib` |
77+
| AI | OpenAI GPT-3.5 (`openai` Python SDK) |
78+
| Storage | Python `tempfile` for secure cleanup |
79+
| Deployment| Gunicorn + Azure Linux App Container |
80+
81+
---
82+
83+
## 🔐 Security and Performance
84+
85+
- Uses `flask-cors` to securely allow cross-origin requests from GitHub Pages
86+
- All requests are served over HTTPS
87+
- Files and plots are saved temporarily and deleted automatically on app shutdown using `tempfile.TemporaryDirectory` and `atexit`
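
The temporary-storage behavior described above can be sketched with the standard library alone. The on-disk file name `plot.png` and the placeholder bytes are assumptions for this example.

```python
import atexit
import os
import tempfile

# Directory that lives for the lifetime of the process.
workdir = tempfile.TemporaryDirectory()

# Remove the directory and everything in it at interpreter exit.
atexit.register(workdir.cleanup)

# Hypothetical file name; uploads and plots would be written here.
plot_path = os.path.join(workdir.name, "plot.png")
with open(plot_path, "wb") as handle:
    handle.write(b"placeholder bytes")

print(os.path.exists(plot_path))
```

Note that `atexit` handlers run on normal interpreter exit; if the process is killed abruptly, cleanup falls to the operating system's handling of its temporary directory.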
88+
89+
---
90+
91+
## 🚀 How to Run Locally
92+
93+
1. **Clone the repo**:
94+
```bash
95+
git clone https://github.com/Hariprashad-Ravikumar/AI-DataScience-Lab.git
96+
cd AI-DataScience-Lab/backend
