33 changes: 33 additions & 0 deletions .github/workflows/python-app.yml
@@ -0,0 +1,33 @@
name: Python application

on:
push:
branches: [ main ]
pull_request:
branches: [ main ]

jobs:
build:

runs-on: ubuntu-latest

steps:
- uses: actions/checkout@v2
- name: Set up Python 3.8
uses: actions/setup-python@v2
with:
python-version: 3.8
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install flake8 pytest
if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
- name: Lint with flake8
run: |
# stop the build if there are Python syntax errors or undefined names
flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
# exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
- name: Test with pytest
run: |
pytest
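For context, the first flake8 pass above fails the build only on syntax errors (`E9`) and undefined names (`F63`, `F7`, `F82`); everything else is reported as warnings by the `--exit-zero` pass. A quick illustration of the kind of bug the strict pass catches — referencing a name that was never assigned is flagged as F821 by flake8 and raises `NameError` at runtime:

```python
# flake8 --select=F82 flags uses of names that were never defined (F821).
# At runtime, the same mistake surfaces as a NameError:
try:
    print(total)  # 'total' is never assigned anywhere
except NameError as err:
    print(f"caught: {err}")
```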
2 changes: 2 additions & 0 deletions README.md
@@ -1,5 +1,7 @@
Working in a command line environment is recommended for ease of use with git and dvc. If on Windows, WSL1 or 2 is recommended.

Link: https://github.com/chavelei/Deploying-a-Scalable-ML-Pipeline-with-FastAPI

# Environment Set up (pip or conda)
* Option 1: use the supplied file `environment.yml` to create a new environment with conda
* Option 2: use the supplied file `requirements.txt` to create a new environment with pip
12 changes: 6 additions & 6 deletions local_api.py
@@ -3,12 +3,12 @@
import requests

# TODO: send a GET using the URL http://127.0.0.1:8000
-r = None # Your code here
+r = requests.get("http://127.0.0.1:8000")

# TODO: print the status code
-# print()
+print(r.status_code)
# TODO: print the welcome message
-# print()
+print(r.text)



@@ -30,9 +30,9 @@
}

# TODO: send a POST using the data above
-r = None # Your code here
+r = requests.post("http://127.0.0.1:8000/data/", json=data)

# TODO: print the status code
-# print()
+print(r.status_code)
# TODO: print the result
-# print()
+print(r.text)
17 changes: 11 additions & 6 deletions main.py
@@ -26,21 +26,21 @@ class Data(BaseModel):
hours_per_week: int = Field(..., example=40, alias="hours-per-week")
native_country: str = Field(..., example="United-States", alias="native-country")

-path = None # TODO: enter the path for the saved encoder
+path = "../Deploying-a-Scalable-ML-Pipeline-with-FastAPI/model/encoder.pkl"
encoder = load_model(path)

-path = None # TODO: enter the path for the saved model
+path = "../Deploying-a-Scalable-ML-Pipeline-with-FastAPI/model/model.pkl"
model = load_model(path)

# TODO: create a RESTful API using FastAPI
-app = None # your code here
+app = FastAPI()

# TODO: create a GET on the root giving a welcome message
@app.get("/")
async def get_root():
    """ Say hello!"""
-    # your code here
-    pass
+    return {"message": "Welcome to the ML model API!"}


# TODO: create a POST on a different path that does model inference
@@ -69,6 +69,11 @@ async def post_inference(data: Data):
-    # use data as data input
-    # use training = False
-    # do not need to pass lb as input
-)
-_inference = None # your code here to predict the result using data_processed
+    data,
+    categorical_features=cat_features,
+    encoder=encoder,
+    training=False,
+    label=None
+)
+_inference = inference(model, data_processed)
return {"result": apply_label(_inference)}
24 changes: 18 additions & 6 deletions ml/model.py
@@ -1,6 +1,8 @@
import pickle
from sklearn.metrics import fbeta_score, precision_score, recall_score
from ml.data import process_data
+from sklearn.ensemble import RandomForestClassifier # Example model, can be replaced with any other model
+import pandas as pd
# TODO: add necessary import

# Optional: implement hyperparameter tuning.
@@ -20,8 +22,9 @@ def train_model(X_train, y_train):
Trained machine learning model.
"""
# TODO: implement the function
-pass
+model = RandomForestClassifier(n_estimators=100, max_depth=10, random_state=42)
+model.fit(X_train, y_train)
+return model

def compute_model_metrics(y, preds):
"""
@@ -60,7 +63,7 @@ def inference(model, X):
Predictions from the model.
"""
# TODO: implement the function
-pass
+return model.predict(X)

def save_model(model, path):
""" Serializes model to a file.
@@ -73,12 +76,14 @@ def save_model(model, path):
Path to save pickle file.
"""
# TODO: implement the function
-pass
+with open(path, 'wb') as f:
+    pickle.dump(model, f)

def load_model(path):
""" Loads pickle file from `path` and returns it."""
# TODO: implement the function
-pass
+with open(path, 'rb') as f:
+    return pickle.load(f)


def performance_on_categorical_slice(
@@ -118,11 +123,18 @@ def performance_on_categorical_slice(

"""
# TODO: implement the function
+data_slice = data[data[column_name] == slice_value]
X_slice, y_slice, _, _ = process_data(
-    # your code here
-    # for input data, use data in column given as "column_name", with the slice_value
-    # use training = False
+    data_slice,
+    categorical_features=categorical_features,
+    label=label,
+    training=False,
+    encoder=encoder,
+    lb=lb
)
-preds = None # your code here to get prediction on X_slice using the inference function
+preds = inference(model, X_slice)
precision, recall, fbeta = compute_model_metrics(y_slice, preds)
return precision, recall, fbeta
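The `compute_model_metrics` function (unchanged in this diff) wraps scikit-learn's `precision_score`, `recall_score`, and `fbeta_score` with beta=1. For intuition, here are the same three quantities computed by hand on toy labels — a dependency-free sketch with made-up data, not the project's actual evaluation:

```python
# Toy illustration of precision/recall/F1 on hand-made binary labels,
# mirroring what compute_model_metrics returns via scikit-learn.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # true positives
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # false negatives

precision = tp / (tp + fp)                          # 3 / 4 = 0.75
recall = tp / (tp + fn)                             # 3 / 4 = 0.75
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean = 0.75
print(precision, recall, f1)
```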
Binary file added model/encoder.pkl
Binary file not shown.
Binary file added model/model.pkl
Binary file not shown.
26 changes: 25 additions & 1 deletion model_card_template.md
@@ -3,16 +3,40 @@
For additional information see the Model Card paper: https://arxiv.org/pdf/1810.03993.pdf

## Model Details
This classification model was trained using the 1994 Census Bureau dataset from the UCI Machine Learning Repository (https://archive.ics.uci.edu/dataset/20/census+income). The goal is to predict whether an individual’s annual income exceeds $50,000 based on a set of demographic and socio-economic features, including:
* Sex
* Race
* Marital status
* Age
* Native country
* Education
* Relationship status
* Occupation
* Hours worked per week
* Work class
* Capital gain
* Capital loss
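Of the features above, the non-numeric columns are the ones `process_data` one-hot encodes. The categorical list in this project typically looks like the following — an assumed reconstruction (column names follow the UCI dataset's hyphenated convention; verify against the actual training script):

```python
# Categorical features passed to process_data for one-hot encoding.
# Assumed list; column names use the UCI census dataset's hyphenated form.
cat_features = [
    "workclass",
    "education",
    "marital-status",
    "occupation",
    "relationship",
    "race",
    "sex",
    "native-country",
]
print(len(cat_features))  # 8 categorical columns
```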

## Intended Use
Primary use: Predicting income category for individuals based on demographic and economic data.
Not intended for: Making real-world financial, hiring, or legal decisions without thorough fairness and bias evaluation.

## Training Data
Source: 1994 Census Bureau dataset (UCI Machine Learning Repository).
Size: 48,842 records after preprocessing.

## Evaluation Data

## Metrics
-_Please include the metrics used and your model's performance on those metrics._
+Precision: 0.7807 | Recall: 0.5379 | F1: 0.6369
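As a sanity check, the reported F1 is consistent with the precision and recall above, since F1 is their harmonic mean:

```python
# F1 = 2 * P * R / (P + R), the harmonic mean of precision and recall.
precision = 0.7807
recall = 0.5379
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 4))  # 0.6369, matching the reported value
```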

## Ethical Considerations
The dataset reflects social and economic patterns from 1994 and may not represent current demographics or labor markets.
Predictions may carry bias related to sensitive attributes such as race, sex, or marital status.
Misuse could perpetuate existing inequalities if the model is deployed in sensitive decision-making contexts.

## Caveats and Recommendations
Model performance may degrade on modern census or employment datasets without retraining.
Bias analysis should be conducted before deployment.
Should not be the sole decision-making tool in critical domains such as hiring or lending.

Binary file added screenshots/continuous_integration.png
Binary file added screenshots/local_api.png
Binary file added screenshots/unit_test.png