1 | 1 | # AI-DataScience-Lab |
2 | 2 |
3 | | -A web application for data science workflows, allowing users to upload datasets, clean and analyze them using Pandas and scikit-learn (with future TensorFlow integration), powered by a Flask backend. |
4 | | - |
5 | | -## Features |
6 | | -- File upload interface |
7 | | -- Data cleaning with Pandas |
8 | | -- Initial analysis and predictions using scikit-learn |
9 | | -- Ready for AI model integration via OpenAI API or TensorFlow |
10 | | - |
11 | | -## Tech Stack: |
12 | | -- Front-end: HTML, Bootstrap (optional) |
13 | | -- Back-end: Python, Flask |
14 | | -- Data processing: Pandas, scikit-learn |
15 | | -- Deployment: Render / GitHub Pages |
| 3 | +**AI-DataScience-Lab** is an end-to-end forecasting web application: users upload a CSV dataset, and the app cleans it with `pandas`, plots it with `matplotlib`, fits a predictive model with `scikit-learn`, and summarizes it using OpenAI’s GPT-3.5 API. |
| 4 | + |
| 5 | +The frontend is hosted on **GitHub Pages** and the backend is deployed on **Azure App Service**, which keeps the static UI separate from the Python API that does the data processing. |
| 6 | + |
| 7 | +--- |
| 8 | + |
| 9 | +## 🌐 Live Demo |
| 10 | + |
| 11 | +- **Frontend (GitHub Pages):** [https://hariprashad-ravikumar.github.io/AI-DataScience-Lab/](https://hariprashad-ravikumar.github.io/AI-DataScience-Lab/) |
| 12 | +- **Backend (Azure):** [https://ai-dslab-backend-cpf2feachnetbbck.westus-01.azurewebsites.net/](https://ai-dslab-backend-cpf2feachnetbbck.westus-01.azurewebsites.net/) |
| 13 | + |
| 14 | +--- |
| 15 | + |
| 16 | +## ⚙️ Features |
| 17 | + |
| 18 | +- Accepts CSV files with two columns: `X` (dates) and `Y` (numerical values) |
| 19 | +- Cleans data using `pandas` and removes invalid entries |
| 20 | +- Generates a scatter plot using `matplotlib` |
| 21 | +- Converts date strings to ordinal format and trains a `LinearRegression` model with `scikit-learn` |
| 22 | +- Uses **OpenAI API** (GPT-3.5-turbo) to summarize the uploaded dataset |
| 23 | +- Predicts future `Y` values for user-supplied future `X` (date) values |
| 24 | +- HTTPS communication between the GitHub Pages frontend and the Azure backend, with CORS enabled |
| 25 | +- Temporary file storage using Python's `tempfile`, removed automatically when the app shuts down |
| 26 | + |
| 27 | +--- |
| 28 | + |
| 29 | +## 📊 Technical Workflow |
| 30 | + |
| 31 | +### 1. **Frontend (GitHub Pages)** |
| 32 | + |
| 33 | +- HTML + JavaScript app with forms to: |
| 34 | + - Upload CSV data |
| 35 | + - Request future predictions |
| 36 | +- Communicates with the backend via `fetch()` using HTTPS POST requests |
| 37 | +- Displays: |
| 38 | + - Processing log |
| 39 | + - OpenAI-generated summary |
| 40 | + - Forecast output |
| 41 | + - Auto-generated plot image |
| 42 | + |
| 43 | +### 2. **Backend (Azure App Service - Python Flask)** |
| 44 | + |
| 45 | +- **Routes:** |
| 46 | + - `POST /upload`: Handles file upload, data cleaning, modeling, and summary generation |
| 47 | + - `POST /predict`: Accepts future dates and returns predictions |
| 48 | + - `GET /plot.png`: Serves the saved scatter plot image |
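
The three routes listed above could be laid out roughly as follows. This is a minimal sketch, not the project's actual backend: the form field names `file` and `future_x`, the JSON keys, and the in-memory `STATE` dict are all hypothetical, and the cleaning, modeling, and summarization steps are replaced by placeholders.

```python
import io

from flask import Flask, jsonify, request, send_file

app = Flask(__name__)

# Hypothetical in-memory state; a real backend would hold a fitted model
# and a plot file on disk instead.
STATE = {"rows": 0, "plot": b""}


@app.route("/upload", methods=["POST"])
def upload():
    # "file" is an assumed form field name.
    uploaded = request.files.get("file")
    if uploaded is None:
        return jsonify(error="no file provided"), 400
    text = uploaded.read().decode("utf-8")
    # Placeholder for cleaning, modeling, and summarization:
    # count the data rows (every line after the header).
    STATE["rows"] = max(len(text.strip().splitlines()) - 1, 0)
    return jsonify(log=f"received {STATE['rows']} data rows", summary="")


@app.route("/predict", methods=["POST"])
def predict():
    # "future_x" is an assumed field holding comma-separated dates.
    raw = request.form.get("future_x", "")
    dates = [d.strip() for d in raw.split(",") if d.strip()]
    # Placeholder: a real handler would call the fitted model here.
    return jsonify(forecast={d: None for d in dates})


@app.route("/plot.png", methods=["GET"])
def plot():
    if not STATE["plot"]:
        return jsonify(error="no plot yet"), 404
    return send_file(io.BytesIO(STATE["plot"]), mimetype="image/png")
```

Flask's built-in `app.test_client()` can exercise these handlers without starting a server, which is convenient for checking the request and response shapes before wiring in the real pipeline.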
| 49 | + |
| 50 | +### 3. **Processing Pipeline** |
| 51 | + |
| 52 | +- **Step 1: Data Cleaning** |
| 53 | + - Reads CSV using `pandas` |
| 54 | + - Drops NA values and converts `X` to datetime format |
| 55 | + |
| 56 | +- **Step 2: Visualization** |
| 57 | + - Uses `matplotlib` to generate scatter plot |
| 58 | + - Plot saved to a temporary directory and served on request |
| 59 | + |
| 60 | +- **Step 3: Modeling** |
| 61 | + - Uses `scikit-learn` `LinearRegression` to fit `X` (date ordinal) → `Y` |
| 62 | + - Uses the fitted model to predict `Y` for user-supplied future dates |
| 63 | + |
| 64 | +- **Step 4: Summarization** |
| 65 | + - Sends the first 10 rows of the cleaned dataset (via `.head(10).to_csv()`) to the OpenAI GPT-3.5 API |
| 66 | + - Returns the generated summary to the frontend |
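
The cleaning, modeling, and summarization steps above can be sketched as below. This is an illustrative sketch under assumptions, not the repository's code: the function names (`clean`, `fit`, `forecast`, `summary_prompt`) and the prompt wording are hypothetical, the coercion of `Y` to numeric is an assumed reading of "removes invalid entries", and the plotting step and the actual OpenAI call are left out.

```python
import io

import pandas as pd
from sklearn.linear_model import LinearRegression


def clean(csv_text: str) -> pd.DataFrame:
    """Parse the CSV, coerce both columns, and drop rows that fail to parse."""
    df = pd.read_csv(io.StringIO(csv_text))
    df["X"] = pd.to_datetime(df["X"], errors="coerce")  # bad dates become NaT
    df["Y"] = pd.to_numeric(df["Y"], errors="coerce")   # bad numbers become NaN
    return df.dropna(subset=["X", "Y"]).reset_index(drop=True)


def fit(df: pd.DataFrame) -> LinearRegression:
    """Fit Y against the ordinal (day count) of each date."""
    x = df["X"].map(lambda ts: ts.toordinal()).to_numpy().reshape(-1, 1)
    return LinearRegression().fit(x, df["Y"].to_numpy())


def forecast(model: LinearRegression, dates: list) -> dict:
    """Predict Y for each future date string."""
    x = [[pd.Timestamp(d).toordinal()] for d in dates]
    return dict(zip(dates, model.predict(x).tolist()))


def summary_prompt(df: pd.DataFrame) -> str:
    """Build the text that would be sent to the chat model (wording assumed)."""
    return "Summarize this dataset:\n" + df.head(10).to_csv(index=False)


sample = """X,Y
2024-01-01,10
2024-01-02,12
not-a-date,13
2024-01-03,14
2024-01-04,
"""
df = clean(sample)              # the two invalid rows are dropped
model = fit(df)
prediction = forecast(model, ["2024-01-05"])
prompt = summary_prompt(df)
```

Because the dates are mapped to consecutive integers, the fitted slope is the average change in `Y` per day, so a straight-line model of this kind only extrapolates a constant daily trend.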
| 67 | + |
| 68 | +--- |
| 69 | + |
| 70 | +## 🛠️ Tech Stack |
| 71 | + |
| 72 | +| Layer | Technology | |
| 73 | +|-----------|-------------------------------------------| |
| 74 | +| Frontend | HTML, JavaScript, GitHub Pages | |
| 75 | +| Backend | Flask, Azure App Service | |
| 76 | +| ML Tools | `pandas`, `scikit-learn`, `matplotlib` | |
| 77 | +| AI | OpenAI GPT-3.5 (`openai` Python SDK) | |
| 78 | +| Storage | Python `tempfile` for temporary files with automatic cleanup | |
| 79 | +| Deployment | Gunicorn + Azure Linux App Container | |
| 80 | + |
| 81 | +--- |
| 82 | + |
| 83 | +## 🔐 Security and Performance |
| 84 | + |
| 85 | +- Uses `flask-cors` to allow cross-origin requests from the GitHub Pages frontend |
| 86 | +- All requests are served over HTTPS |
| 87 | +- Files and plots are saved temporarily and deleted automatically on app shutdown using `tempfile.TemporaryDirectory` and `atexit` |
| 88 | + |
| 89 | +--- |
| 90 | + |
| 91 | +## 🚀 How to Run Locally |
| 92 | + |
| 93 | +1. **Clone the repo**: |
| 94 | + ```bash |
| 95 | + git clone https://github.com/Hariprashad-Ravikumar/AI-DataScience-Lab.git |
| 96 | + cd AI-DataScience-Lab/backend |