Merge pull request #7 from MicrosoftCloudEssentials-LearningHub/step4-inprogress

brown9804 · web-flow · commit c19494f10b4f · 2025-05-06T12:12:04.000-06:00
finishing model
diff --git a/README.md b/README.md
@@ -5,7 +5,7 @@ Costa Rica
 [![GitHub](https://img.shields.io/badge/--181717?logo=github&logoColor=ffffff)](https://github.com/)
 [brown9804](https://github.com/brown9804)
 
-Last updated: 2025-04-29
+Last updated: 2025-05-06
 
 ------------------------------------------
 
@@ -15,7 +15,7 @@ Last updated: 2025-04-29
 - Terraform [Demonstration: Deploying Azure Resources for a Data Platform (Microsoft Fabric)](./infrastructure/msFabric/)
 - Terraform [Demonstration: Deploying Azure Resources for an ML Platform](./infrastructure/azMachineLearning/)
 - [Demonstration: How to integrate AI in Microsoft Fabric](./msFabric-AI_integration/)
-- [Demonstration: Creating a Machine Learning Model](./azML-modelcreation/)
+- [Demonstration: Creating a Machine Learning Model](./azML-modelcreation/) - in progress
 
 > Azure Machine Learning (PaaS) is a cloud-based platform from Microsoft designed to help `data scientists and machine learning engineers build, train, deploy, and manage machine learning models at scale`. It supports the `entire machine learning lifecycle, from data preparation and experimentation to deployment and monitoring.` It provides powerful tools for `both code-first and low-code users`, including Jupyter notebooks, drag-and-drop interfaces, and automated machine learning (AutoML). `Azure ML integrates seamlessly with other Azure services and supports popular frameworks like TensorFlow, PyTorch, and Scikit-learn.`
 
@@ -284,9 +284,6 @@ Read more about [Endpoints for inference in production](https://learn.microsoft.
 </details>
 
 
-
-
-
 <div align="center">
   <h3 style="color: #4CAF50;">Total Visitors</h3>
   <img src="https://profile-counter.glitch.me/brown9804/count.svg" alt="Visitor Count" style="border: 2px solid #4CAF50; border-radius: 5px; padding: 5px;"/>
diff --git a/azML-modelcreation/README.md b/azML-modelcreation/README.md
@@ -5,19 +5,34 @@ Costa Rica
 [![GitHub](https://img.shields.io/badge/--181717?logo=github&logoColor=ffffff)](https://github.com/)
 [brown9804](https://github.com/brown9804)
 
-Last updated: 2025-04-29
+Last updated: 2025-05-06
 
 ------------------------------------------
 
 
 <details>
 <summary><b>List of References </b> (Click to expand)</summary>
 
+- [AutoML Regression](https://learn.microsoft.com/en-us/azure/machine-learning/component-reference-v2/regression?view=azureml-api-2)
+- [Evaluate automated machine learning experiment results](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-understand-automated-ml?view=azureml-api-2)
+- [Evaluate Model component](https://learn.microsoft.com/en-us/azure/machine-learning/component-reference/evaluate-model?view=azureml-api-2)
+
 </details>
 
 <details>
 <summary><b>Table of Contents </b> (Click to expand)</summary>
 
+- [Step 1: Set Up Your Azure ML Workspace](#step-1-set-up-your-azure-ml-workspace)
+- [Step 2: Create a Compute Instance](#step-2-create-a-compute-instance)
+- [Step 3: Prepare Your Data](#step-3-prepare-your-data)
+- [Step 4: Create a New Notebook or Script](#step-4-create-a-new-notebook-or-script)
+- [Step 5: Load and Explore the Data](#step-5-load-and-explore-the-data)
+- [Step 6: Train Your Model](#step-6-train-your-model)
+- [Step 7: Evaluate the Model](#step-7-evaluate-the-model)
+- [Step 8: Register the Model](#step-8-register-the-model)
+- [Step 9: Deploy the Model](#step-9-deploy-the-model)
+- [Step 10: Test the Endpoint](#step-10-test-the-endpoint)
+
 </details>
 
 ## Step 1: Set Up Your Azure ML Workspace
@@ -69,86 +84,239 @@ https://github.com/user-attachments/assets/c199156f-96cf-4ed0-a8b5-c88db3e7a552
 
 https://github.com/user-attachments/assets/f8cbd32c-94fc-43d3-a7a8-00f63cdc543d
 
+## Step 4: Create a New Notebook or Script
 
-### **4. Create a New Notebook or Script**
 - Use the compute instance to open a **Jupyter notebook** or create a Python script.
 - Import necessary libraries:
+
   ```python
   import pandas as pd
   from sklearn.model_selection import train_test_split
   from sklearn.ensemble import RandomForestClassifier
   from sklearn.metrics import accuracy_score
   ```
 
----
+  https://github.com/user-attachments/assets/16650584-11cb-48fb-928d-c032e519c14b
+
+## Step 5: Load and Explore the Data
+
+> Load the dataset and perform basic EDA (exploratory data analysis):
 
-### **5. Load and Explore the Data**
-- Load the dataset and perform basic EDA (exploratory data analysis):
   ```python
-  data = pd.read_csv('your_dataset.csv')
-  print(data.head())
+  import mltable
+  from azure.ai.ml import MLClient
+  from azure.identity import DefaultAzureCredential
+  
+  ml_client = MLClient.from_config(credential=DefaultAzureCredential())
+  data_asset = ml_client.data.get("employee_data", version="1")
+  
+  tbl = mltable.load(f'azureml:/{data_asset.id}')
+  
+  df = tbl.to_pandas_dataframe()
+  df
   ```
 
----
+  https://github.com/user-attachments/assets/5fa65d95-8502-4ab7-ba0d-dfda66378cc2
 
-### **6. Train Your Model**
-- Split the data and train a model:
-  ```python
-  X = data.drop('target', axis=1)
-  y = data['target']
-  X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
+## Step 6: Train Your Model
+
+> Split the data and train a model:
 
-  model = RandomForestClassifier()
+  ```python
+  # Step 1: Preprocessing
+  from sklearn.preprocessing import LabelEncoder, StandardScaler
+  
+  # Encode categorical columns
+  label_encoder = LabelEncoder()
+  df['Department'] = label_encoder.fit_transform(df['Department'])
+  
+  # Drop non-informative or high-cardinality columns
+  if 'Name' in df.columns:
+      df = df.drop(columns=['Name'])  # 'Name' is likely not predictive
+  
+  # Optional: Check for missing values
+  if df.isnull().sum().any():
+      df = df.dropna()  # or use df.ffill() to forward-fill missing values instead
+  
+  # Step 2: Define Features and Target
+  X = df.drop('Salary', axis=1)  # Features: Age and Department
+  y = df['Salary']               # Target: Salary
+  
+  # Step 3: Split the Data
+  from sklearn.model_selection import train_test_split
+  
+  X_train, X_test, y_train, y_test = train_test_split(
+      X, y, test_size=0.2, random_state=42
+  )
+  
+  # Optional: Feature Scaling (useful for scale-sensitive models; tree ensembles do not require it)
+  # Fit the scaler on the training split only so test data does not leak into preprocessing
+  scaler = StandardScaler()
+  X_train = scaler.fit_transform(X_train)
+  X_test = scaler.transform(X_test)
+  
+  # Step 4: Train a Regression Model
+  from sklearn.ensemble import RandomForestRegressor
+  
+  model = RandomForestRegressor(
+      n_estimators=100,
+      max_depth=None,
+      random_state=42,
+      n_jobs=-1  # Use all available cores
+  )
   model.fit(X_train, y_train)
   ```
 
----
+  https://github.com/user-attachments/assets/2176c795-5fda-4746-93c7-8b137b526a09
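
Scaling statistics are usually estimated from the training rows only and then reused for held-out rows, so that test data does not influence preprocessing. The following is a minimal pure-Python sketch of that idea. The `fit_standardizer` and `standardize` helpers and the age values are hypothetical and only illustrate the kind of calculation a standard scaler performs (population standard deviation is assumed).

```python
from statistics import fmean, pstdev


def fit_standardizer(column):
    """Return the mean and population standard deviation of a training column."""
    return fmean(column), pstdev(column)


def standardize(column, mean, std):
    """Apply previously fitted statistics to any column (train or test)."""
    return [(value - mean) / std for value in column]


# Made-up ages, used only for illustration
train_ages = [20.0, 30.0, 40.0]
test_ages = [50.0]

# Fit on the training split only, then reuse the same statistics for the test split
mean, std = fit_standardizer(train_ages)
train_scaled = standardize(train_ages, mean, std)
test_scaled = standardize(test_ages, mean, std)

print(train_scaled)
print(test_scaled)
```

Because the test value is scaled with the training mean and deviation, it can fall outside the range seen during training, which is expected.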
+
+## Step 7: Evaluate the Model
+
+> Check performance:
 
-### **7. Evaluate the Model**
-- Check performance:
   ```python
+  # Step 5: Make Predictions
   predictions = model.predict(X_test)
-  print("Accuracy:", accuracy_score(y_test, predictions))
+  
+  # Step 6: Evaluate the Model
+  from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
+  import numpy as np
+  
+  mae = mean_absolute_error(y_test, predictions)
+  mse = mean_squared_error(y_test, predictions)
+  rmse = np.sqrt(mse)
+  r2 = r2_score(y_test, predictions)
+  
+  print("Model Evaluation Metrics")
+  print(f"Mean Absolute Error (MAE): {mae:.2f}")
+  print(f"Mean Squared Error (MSE): {mse:.2f}")
+  print(f"Root Mean Squared Error (RMSE): {rmse:.2f}")
+  print(f"R² Score: {r2:.2f}")
   ```
 
----
+  <img width="550" alt="image" src="https://github.com/user-attachments/assets/6aa19680-cadb-4fe4-a419-a626942e15f9" />
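
To make the metrics above concrete, here is a small sketch that computes them by hand without scikit-learn. The `regression_metrics` helper and the salary values are hypothetical and not taken from the employee dataset.

```python
import math


def regression_metrics(actual, predicted):
    """Compute MAE, MSE, RMSE and R² for two equally sized lists of numbers."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]

    mae = sum(abs(e) for e in errors) / n   # average absolute error
    mse = sum(e * e for e in errors) / n    # average squared error
    rmse = math.sqrt(mse)                   # same units as the target

    # R² compares the model's squared error with that of always predicting the mean
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1 - ss_res / ss_tot

    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}


# Made-up values, used only for illustration
actual = [50000.0, 60000.0, 70000.0, 80000.0]
predicted = [52000.0, 58000.0, 71000.0, 79000.0]

metrics = regression_metrics(actual, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

For these values the MAE is 1500 and the R² is 0.98. RMSE is slightly larger than MAE because squaring weights the larger errors more heavily.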
+
+> Plot the distribution of prediction errors and the predicted vs. actual values:
+
+```python
+import matplotlib.pyplot as plt
+
+# Plot 1: Distribution of prediction errors
+errors = y_test - predictions
+plt.figure(figsize=(10, 6))
+plt.hist(errors, bins=30, color='skyblue', edgecolor='black')
+plt.title('Distribution of Prediction Errors')
+plt.xlabel('Prediction Error')
+plt.ylabel('Frequency')
+plt.grid(True)
+plt.show()
+
+# Plot 2: Predicted vs Actual values
+plt.figure(figsize=(10, 6))
+plt.scatter(y_test, predictions, alpha=0.3, color='darkorange')
+plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'k--', lw=2)
+plt.title('Predicted vs Actual Salary')
+plt.xlabel('Actual Salary')
+plt.ylabel('Predicted Salary')
+plt.grid(True)
+plt.show()
+```
+
+<img width="550" alt="image" src="https://github.com/user-attachments/assets/d8ec1f2c-eb97-4106-9cee-809849d02796">
+
+## Step 8: Register the Model
+
+> Save and register the model in Azure ML:
 
-### **8. Register the Model**
-- Save and register the model in Azure ML:
   ```python
   import joblib
   joblib.dump(model, 'model.pkl')
-
+  
   from azureml.core import Workspace, Model
   ws = Workspace.from_config()
-  Model.register(workspace=ws, model_path="model.pkl", model_name="my_model")
+  Model.register(workspace=ws, model_path="model.pkl", model_name="my_model_RegressionModel")
   ```
 
----
+https://github.com/user-attachments/assets/a82ff03e-437c-41bc-85fa-8b9903384a5b
+
+
+> [!TIP]
+> Click [here](./src/0_ml-model-creation.ipynb) to read the script used.
+
+## Step 9: Deploy the Model
+
+> Create the Scoring Script:
+
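+```python
+import json
+
+import joblib
+import numpy as np
+from azureml.core.model import Model
+
+def init():
+    # Load the registered model once, when the service starts
+    global model
+    model_path = Model.get_model_path("my_model_RegressionModel")
+    model = joblib.load(model_path)
+
+def run(data):
+    try:
+        # The request body arrives as a JSON string, so parse it before indexing.
+        # Rows must be preprocessed the same way as the training features.
+        input_data = np.array(json.loads(data)["data"])
+        result = model.predict(input_data)
+        return result.tolist()
+    except Exception as e:
+        return str(e)
+```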
+
+https://github.com/user-attachments/assets/cdc64857-3bde-4ec9-957d-5399d9447813
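
The request and response contract of a scoring script can be exercised locally before deploying. The sketch below assumes the service passes the request body to `run` as a JSON string with a `data` key. `StubModel` is a hypothetical stand-in for the trained regressor, so no Azure ML or scikit-learn dependency is needed.

```python
import json


class StubModel:
    """Hypothetical stand-in for the trained model; predicts the sum of each row."""

    def predict(self, rows):
        return [float(sum(row)) for row in rows]


model = StubModel()


def run(raw_data):
    """Mirror the shape of a scoring function: parse JSON, predict, return a list."""
    try:
        rows = json.loads(raw_data)["data"]
        return list(model.predict(rows))
    except Exception as e:
        # Errors are returned as text so the caller can see what went wrong
        return str(e)


# A well-formed payload: one inner list per row of features
payload = json.dumps({"data": [[30, 2], [45, 0]]})
result = run(payload)

# A malformed payload is reported as an error string rather than raising
bad = run("not json")

print(result)
print(bad)
```

Running a check like this first helps catch payload-shape mistakes that would otherwise only surface after the container is deployed.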
+
+> Create the Environment File (env.yml):
+
+https://github.com/user-attachments/assets/8e7c37a2-e32b-4630-8516-f95926c374c0
+
+> Create a new notebook:
+
+https://github.com/user-attachments/assets/1b3e5602-dc64-4c39-be72-ed1cbd74361e
+
+> Create an **inference configuration** and deploy to a web service:
 
-### **9. Deploy the Model**
-- Create an **inference configuration** and deploy to a web service:
   ```python
+  from azureml.core import Workspace
   from azureml.core.environment import Environment
-  from azureml.core.model import InferenceConfig
+  from azureml.core.model import InferenceConfig, Model
   from azureml.core.webservice import AciWebservice
-
-  env = Environment.from_conda_specification(name="myenv", file_path="env.yml")
+  
+  # Load the workspace
+  ws = Workspace.from_config()
+  
+  # Get the registered model
+  registered_model = Model(ws, name="my_model_RegressionModel")
+  
+  # Create environment from requirements.txt (no conda)
+  env = Environment.from_pip_requirements(
+      name="regression-env",
+      file_path="requirements.txt"  # Make sure this file exists in your working directory
+  )
+  
+  # Define inference configuration
   inference_config = InferenceConfig(entry_script="score.py", environment=env)
-
+  
+  # Define deployment configuration
   deployment_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)
-  service = Model.deploy(workspace=ws,
-                         name="my-service",
-                         models=[model],
-                         inference_config=inference_config,
-                         deployment_config=deployment_config)
+  
+  # Deploy the model
+  service = Model.deploy(
+      workspace=ws,
+      name="regression-model-service",
+      models=[registered_model],
+      inference_config=inference_config,
+      deployment_config=deployment_config
+  )
+  
   service.wait_for_deployment(show_output=True)
+  print(f"Scoring URI: {service.scoring_uri}")
   ```
 
----
 
-### **10. Test the Endpoint**
-- Once deployed, you can send HTTP requests to the endpoint to get predictions.
+
+## Step 10: Test the Endpoint
+
+> Once deployed, you can send HTTP requests to the endpoint to get predictions.
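
A request to the endpoint could be assembled as sketched below, using only the standard library. The URI is a placeholder, the `build_scoring_request` helper is hypothetical, and the `{"data": [[...]]}` payload shape is an assumption based on the `data` key read by the scoring script. Replace the URI with the scoring URI printed after deployment.

```python
import json
import urllib.request


def build_scoring_request(scoring_uri, rows):
    """Build a JSON POST request carrying feature rows under the "data" key."""
    body = json.dumps({"data": rows}).encode("utf-8")
    return urllib.request.Request(
        scoring_uri,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder URI and made-up feature row (for example: age, encoded department)
request = build_scoring_request("http://example.invalid/score", [[30, 2]])
print(request.get_method(), request.full_url)

# To call a deployed service, send the request and decode the response:
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```

The feature values sent to the endpoint need the same encoding and ordering as the columns used for training.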
 
 
 
diff --git a/azML-modelcreation/src/0_ml-model-creation.ipynb b/azML-modelcreation/src/0_ml-model-creation.ipynb