Commit 3c493ed

Merge pull request #88 from ciaran28/main: Bug Fix
2 parents: 14825db + 4632b2d

File tree: 2 files changed (+11, -22 lines)

README.md

Lines changed: 4 additions & 21 deletions
@@ -168,36 +168,19 @@ Secrets in GitHub should look exactly like below. The secrets are case sensitive
 
 <img width="893" alt="image" src="https://user-images.githubusercontent.com/108273509/205954210-c123c407-4c83-4952-ab4b-cd6c485efc2f.png">
 
-- Azure Resources created (Production Environment snapshot)
+- Azure Resources created (Production Environment snapshot. For speed, all environment deployments except Sandbox are commented out; update onDeploy.yaml to deploy every environment.)
 
 <img width="1175" alt="image" src="https://user-images.githubusercontent.com/108273509/194638664-fa6e1809-809e-45b2-9655-9312f32f24bb.png">
 
 
 ---
 ---
-
 
-# Repo Guidance
+## Running Pipelines
 
-## Databricks as Infrastructure
-<details close>
-<summary>Click Dropdown... </summary>
+- The end-to-end machine learning pipeline is pre-configured in the "Workflows" section in Databricks. It uses a Job Cluster, which automatically installs the necessary dependencies contained within a Python wheel file (see the Jobs API sketch after this diff).
 
-<br>
-There are many ways that a User may create Databricks Jobs, Notebooks, Clusters, Secret Scopes etc. <br>
-<br>
-For example, they may interact with the Databricks API/CLI by using: <br>
-<br>
-i. VS Code on their local machine, <br>
-ii. the Databricks GUI online; or <br>
-iii. a YAML Pipeline deployment on a DevOps Agent (e.g. GitHub Actions or Azure DevOps etc). <br>
-<br>
-
-The programmatic way in which the first two scenarios allow us to interact with the Databricks API is akin to "Continuous **Development**", as opposed to "Continuous **Deployment**". The former is strong on flexibility, however, it is somewhat weak on governance, accountability and reproducibility. <br>
-
-In a nutshell, Continuous **Development** _is a partly manual process where developers can deploy any changes to customers by simply clicking a button, while continuous **Deployment** emphasizes automating the entire process_.
-
-</details>
+- If you wish to run the machine learning scripts from the Notebook instead, first upload the dependencies (automatic upload is in development): navigate to the Python wheel file contained within the dist/ folder and manually upload it to the cluster you wish to use for the Notebook (see the install sketch after this diff).
 
 ---
 ---
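The first bullet above describes a Databricks job whose Job Cluster installs the project wheel before the task runs. As a minimal sketch of how such a job could be registered, assuming the Databricks Jobs API 2.1 and placeholder values for the host, token, wheel path, script path, and cluster spec (none of these values come from this repo):

```python
# Hypothetical sketch: create a job whose Job Cluster installs the project
# wheel as a library before running the task (Databricks Jobs API 2.1).
# host, token, and both dbfs:/ paths are placeholders, not repo values.
import requests

host = "https://<your-databricks-instance>"
token = "<personal-access-token>"

job_spec = {
    "name": "nyc-taxi-train-register",
    "tasks": [
        {
            "task_key": "train_register",
            "new_cluster": {
                "spark_version": "11.3.x-cpu-ml-scala2.12",
                "node_type_id": "Standard_DS3_v2",
                "num_workers": 1,
            },
            # the Job Cluster installs the wheel before the task starts
            "libraries": [{"whl": "dbfs:/FileStore/wheels/<project>.whl"}],
            "spark_python_task": {"python_file": "dbfs:/scripts/train_register.py"},
        }
    ],
}

resp = requests.post(
    f"{host}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {token}"},
    json=job_spec,
)
resp.raise_for_status()
print(resp.json())  # e.g. {"job_id": 123}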
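For the second bullet, an alternative to uploading the wheel through the cluster Libraries UI is a `%pip` cell at the top of the Notebook. This is a sketch only; the DBFS path is a placeholder for wherever you copy the wheel built into dist/:

```python
# Hypothetical notebook cell: install the project wheel on the attached
# cluster before importing from it. The path below is a placeholder;
# upload the wheel from dist/ to DBFS first.
%pip install /dbfs/FileStore/wheels/<project>-0.1.0-py3-none-any.whl
```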

mlOps/modelOps/data_science/nyc_taxi/train_register.py

Lines changed: 7 additions & 1 deletion
@@ -8,6 +8,9 @@
 # Install pypi packages azureml-sdk[databricks], lightgbm, uszipcode
 # The above will be automated in due course
 
+
+# https://learn.microsoft.com/en-us/azure/databricks/_extras/notebooks/source/machine-learning/automl-feature-store-example.html
+
 # COMMAND ----------
 
 from pyspark.sql import *
@@ -154,7 +157,7 @@ def __init__(self, spark: SparkSession, experiment_name: str, namespace: str, wo
         self.track_in_azure_ml = False
         self.namespace = namespace
         self.ws = workspace
-        self.model_folder = "outputs"
+        self.model_folder = "cached_models"
         self.dbutils = SparkRunner().get_dbutils()
 
 
@@ -337,6 +340,9 @@ def train_model(
         )
 
         #Save The Model
+
+        self.create_model_folder()
+
         model_file_path = self.get_model_file_path("taxi_example_fare_packaged")
         print(f"ModelFilePath: {model_file_path}")
         joblib.dump(
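
The second hunk is the bug fix itself: joblib.dump raises FileNotFoundError when the target folder (now "cached_models") does not exist, so the folder is created before saving. A minimal sketch of what a create_model_folder helper can look like, assuming it only needs to guarantee the directory exists; this is an illustration, not the repo's exact implementation:

```python
import os


class TrainRegisterSketch:
    """Illustrative stand-in for the trainer class touched in this commit."""

    def __init__(self) -> None:
        self.model_folder = "cached_models"

    def create_model_folder(self) -> None:
        # exist_ok=True makes this safe to call on every run, including
        # repeated runs on a cluster where the folder already exists
        os.makedirs(self.model_folder, exist_ok=True)
```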
