33 changes: 33 additions & 0 deletions .github/workflows/python-app.yml
@@ -0,0 +1,33 @@
name: Python application

on:
push:
branches: [ main ]
pull_request:
branches: [ main ]

jobs:
build:

runs-on: ubuntu-latest

steps:
- uses: actions/checkout@v2
- name: Set up Python 3.8
uses: actions/setup-python@v2
with:
python-version: 3.8
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install flake8 pytest
if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
- name: Lint with flake8
run: |
# stop the build if there are Python syntax errors or undefined names
flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
# exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
- name: Test with pytest
run: |
pytest
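For context, the first flake8 pass above fails the build only on syntax errors (`E9`) and undefined names (`F63`, `F7`, `F82`); everything else is reported as warnings by the `--exit-zero` pass. A quick illustration of the kind of bug the strict pass catches — referencing a name that was never assigned is flagged as F821 by flake8 and raises `NameError` at runtime:

```python
# flake8 --select=F82 flags uses of names that were never defined (F821).
# At runtime, the same mistake surfaces as a NameError:
try:
    print(total)  # 'total' is never assigned anywhere
except NameError as err:
    print(f"caught: {err}")
```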
2 changes: 2 additions & 0 deletions README.md
@@ -1,5 +1,7 @@
Working in a command line environment is recommended for ease of use with git and dvc. If on Windows, WSL1 or 2 is recommended.

Link: https://github.com/chavelei/Deploying-a-Scalable-ML-Pipeline-with-FastAPI

# Environment Set up (pip or conda)
* Option 1: use the supplied file `environment.yml` to create a new environment with conda
* Option 2: use the supplied file `requirements.txt` to create a new environment with pip
12 changes: 6 additions & 6 deletions local_api.py
@@ -3,12 +3,12 @@
import requests

# TODO: send a GET using the URL http://127.0.0.1:8000
-r = None # Your code here
+r = requests.get("http://127.0.0.1:8000")

# TODO: print the status code
-# print()
+print(r.status_code)
# TODO: print the welcome message
-# print()
+print(r.text)



@@ -30,9 +30,9 @@
}

# TODO: send a POST using the data above
-r = None # Your code here
+r = requests.post("http://127.0.0.1:8000/data/", json=data)

# TODO: print the status code
-# print()
+print(r.status_code)
# TODO: print the result
-# print()
+print(r.text)
17 changes: 11 additions & 6 deletions main.py
@@ -26,21 +26,21 @@ class Data(BaseModel):
hours_per_week: int = Field(..., example=40, alias="hours-per-week")
native_country: str = Field(..., example="United-States", alias="native-country")

-path = None # TODO: enter the path for the saved encoder
+path = "../Deploying-a-Scalable-ML-Pipeline-with-FastAPI/model/encoder.pkl"
encoder = load_model(path)

-path = None # TODO: enter the path for the saved model
+path = "../Deploying-a-Scalable-ML-Pipeline-with-FastAPI/model/model.pkl"
model = load_model(path)

# TODO: create a RESTful API using FastAPI
-app = None # your code here
+app = FastAPI()

# TODO: create a GET on the root giving a welcome message
@app.get("/")
async def get_root():
    """ Say hello!"""
-    # your code here
-    pass
+    return {"message": "Welcome to the ML model API!"}


# TODO: create a POST on a different path that does model inference
@@ -69,6 +69,11 @@ async def post_inference(data: Data):
-    # use data as data input
-    # use training = False
-    # do not need to pass lb as input
-)
-_inference = None # your code here to predict the result using data_processed
+    data,
+    categorical_features=cat_features,
+    encoder=encoder,
+    training=False,
+    label=None
+)
+_inference = inference(model, data_processed)
return {"result": apply_label(_inference)}
24 changes: 18 additions & 6 deletions ml/model.py
@@ -1,6 +1,8 @@
import pickle
from sklearn.metrics import fbeta_score, precision_score, recall_score
from ml.data import process_data
+from sklearn.ensemble import RandomForestClassifier # Example model, can be replaced with any other model
+import pandas as pd
# TODO: add necessary import

# Optional: implement hyperparameter tuning.
@@ -20,8 +22,9 @@ def train_model(X_train, y_train):
Trained machine learning model.
"""
# TODO: implement the function
-pass
+model = RandomForestClassifier(n_estimators=100, max_depth=10, random_state=42)
+model.fit(X_train, y_train)
+return model

def compute_model_metrics(y, preds):
"""
@@ -60,7 +63,7 @@ def inference(model, X):
Predictions from the model.
"""
# TODO: implement the function
-pass
+return model.predict(X)

def save_model(model, path):
""" Serializes model to a file.
@@ -73,12 +76,14 @@ def save_model(model, path):
Path to save pickle file.
"""
# TODO: implement the function
-pass
+with open(path, 'wb') as f:
+    pickle.dump(model, f)

def load_model(path):
""" Loads pickle file from `path` and returns it."""
# TODO: implement the function
-pass
+with open(path, 'rb') as f:
+    return pickle.load(f)


def performance_on_categorical_slice(
@@ -118,11 +123,18 @@ def performance_on_categorical_slice(

"""
# TODO: implement the function
+data_slice = data[data[column_name] == slice_value]
X_slice, y_slice, _, _ = process_data(
-    # your code here
-    # for input data, use data in column given as "column_name", with the slice_value
-    # use training = False
+    data_slice,
+    categorical_features=categorical_features,
+    label=label,
+    training=False,
+    encoder=encoder,
+    lb=lb
)
-preds = None # your code here to get prediction on X_slice using the inference function
+preds = inference(model, X_slice)
precision, recall, fbeta = compute_model_metrics(y_slice, preds)
return precision, recall, fbeta
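The `compute_model_metrics` function (unchanged in this diff) wraps scikit-learn's `precision_score`, `recall_score`, and `fbeta_score` with beta=1. For intuition, here are the same three quantities computed by hand on toy labels — a dependency-free sketch with made-up data, not the project's actual evaluation:

```python
# Toy illustration of precision/recall/F1 on hand-made binary labels,
# mirroring what compute_model_metrics returns via scikit-learn.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # true positives
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # false negatives

precision = tp / (tp + fp)                          # 3 / 4 = 0.75
recall = tp / (tp + fn)                             # 3 / 4 = 0.75
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean = 0.75
print(precision, recall, f1)
```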
Binary file added model/encoder.pkl
Binary file not shown.
Binary file added model/model.pkl
Binary file not shown.
26 changes: 25 additions & 1 deletion model_card_template.md
@@ -3,16 +3,40 @@
For additional information see the Model Card paper: https://arxiv.org/pdf/1810.03993.pdf

## Model Details
This classification model was trained using the 1994 Census Bureau dataset from the UCI Machine Learning Repository (https://archive.ics.uci.edu/dataset/20/census+income). The goal is to predict whether an individual’s annual income exceeds $50,000 based on a set of demographic and socio-economic features, including:
* Sex
* Race
* Marital status
* Age
* Native country
* Education
* Relationship status
* Occupation
* Hours worked per week
* Work class
* Capital gain
* Capital loss
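Of the features above, the non-numeric columns are the ones `process_data` one-hot encodes. The categorical list in this project typically looks like the following — an assumed reconstruction (column names follow the UCI dataset's hyphenated convention; verify against the actual training script):

```python
# Categorical features passed to process_data for one-hot encoding.
# Assumed list; column names use the UCI census dataset's hyphenated form.
cat_features = [
    "workclass",
    "education",
    "marital-status",
    "occupation",
    "relationship",
    "race",
    "sex",
    "native-country",
]
print(len(cat_features))  # 8 categorical columns
```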

## Intended Use
Primary use: Predicting income category for individuals based on demographic and economic data.
Not intended for: Making real-world financial, hiring, or legal decisions without thorough fairness and bias evaluation.

## Training Data
Source: 1994 Census Bureau dataset (UCI Machine Learning Repository).
Size: 48,842 records after preprocessing.

## Evaluation Data

## Metrics
-_Please include the metrics used and your model's performance on those metrics._
+Precision: 0.7807 | Recall: 0.5379 | F1: 0.6369
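As a sanity check, the reported F1 is consistent with the precision and recall above, since F1 is their harmonic mean:

```python
# F1 = 2 * P * R / (P + R), the harmonic mean of precision and recall.
precision = 0.7807
recall = 0.5379
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 4))  # 0.6369, matching the reported value
```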

## Ethical Considerations
The dataset reflects social and economic patterns from 1994 and may not represent current demographics or labor markets.
Predictions may carry bias related to sensitive attributes such as race, sex, or marital status.
Misuse could perpetuate existing inequalities if the model is deployed in sensitive decision-making contexts.

## Caveats and Recommendations
Model performance may degrade on modern census or employment datasets without retraining.
Bias analysis should be conducted before deployment.
Should not be the sole decision-making tool in critical domains such as hiring or lending.

Binary file added screenshots/continuous_integration.png
Binary file added screenshots/local_api.png
Binary file added screenshots/unit_test.png