This repository was archived by the owner on Nov 16, 2023. It is now read-only.

Commit 60a3c01

Merge pull request #91 from panchul/alekp_kubeflow_install
Adding demo of using TensorBoard with PyTorch and TensorFlow on Azure Stack
2 parents 2e4ebdc + 89aff18 commit 60a3c01

14 files changed, 450 additions and 57 deletions

Research/kubeflow-on-azure-stack/Readme.md

Lines changed: 55 additions & 56 deletions
@@ -8,7 +8,6 @@ This module demonstrates how to create and use a Kubeflow cluster on Azure Stack
88
- [Prerequisites](#prerequisites)
99
- [Installing Kubernetes manually](installing_kubernetes.md)
1010
- [Kubernetes Dashboard](#kubernetes-dashboard)
11-
- [Tensorboard](#tensorboard)
1211
- [Persistence on AzureStack](#persistence-on-azure-stack)
1312
- [Install Kubeflow](#install-kubeflow)
1413
- [Kubeflow dashboard](#preparing-kubeflow-dashboard) (preparing and using)
@@ -114,61 +113,6 @@ you imported them, you should be able to see the Kubernetes Dashboard in a brows
114113

115114
![pics/kubernetes_dashboard_intro.png](pics/kubernetes_dashboard_intro.png)
116115

117-
## Tensorboard
118-
119-
You can skip this chapter for now. There is another useful tool to monitor some ML applications if
120-
they support it. We provided a sample file to start it in your Kubernetes cluster, `tensorboard.yaml`.
121-
You might contact your cloud administrator to help you establish network access, or you can
122-
use ssh port forwarding to see it via your desktop's `localhost` address and port 6006.
123-
124-
It will look something like this(for a different app, outside the scope of this demo):
125-
126-
![pics/tensorboard_graph.png](pics/tensorboard_graph.png)
127-
128-
Here is how you would connect your Tensorboard with the persistence we discuss next:
129-
130-
    $ cat tb.yaml
131-
    apiVersion: extensions/v1beta1
132-
    kind: Deployment
133-
    metadata:
134-
      labels:
135-
        app: tensorboard
136-
      name: tensorboard
137-
    spec:
138-
      replicas: 1
139-
      selector:
140-
        matchLabels:
141-
          app: tensorboard
142-
      template:
143-
        metadata:
144-
          labels:
145-
            app: tensorboard
146-
        spec:
147-
          volumes:
148-
          - name: samba-share-volume2
149-
            persistentVolumeClaim:
150-
              # claimName: azurefile
151-
              claimName: samba-share-claim
152-
          containers:
153-
          - name: tensorboard
154-
            image: tensorflow/tensorflow:1.10.0
155-
            imagePullPolicy: Always
156-
            command:
157-
            - /usr/local/bin/tensorboard
158-
            args:
159-
            - --logdir
160-
            - /tmp/tensorflow/logs
161-
            volumeMounts:
162-
            - mountPath: /tmp/tensorflow
163-
              #subPath: somedemo55
164-
              name: samba-share-volume2
165-
            ports:
166-
            - containerPort: 6006
167-
              protocol: TCP
168-
          dnsPolicy: ClusterFirst
169-
          restartPolicy: Always
170-
171-
172116
## Persistence on Azure Stack
173117

174118
Most real-life applications need data storage. The Azure Stack team is actively working on making
@@ -450,6 +394,56 @@ it will look like so:
450394

451395
You can now re-install it if you would like.
452396

397+
## Tensorboard
398+
399+
You can skip this chapter for now. TensorBoard is another useful tool for monitoring ML applications that
400+
support it. We provide a sample file, `tensorboard.yaml`, to start it in your Kubernetes cluster.
401+
You might contact your cloud administrator to help you establish network access, or you can
402+
use ssh port forwarding to see it via your desktop's `localhost` address and port 6006.
403+
404+
Here is how you would connect your Tensorboard with the persistence we discussed earlier:
405+
406+
    $ cat tb.yaml
407+
    apiVersion: extensions/v1beta1
408+
    kind: Deployment
409+
    metadata:
410+
      labels:
411+
        app: tensorboard
412+
      name: tensorboard
413+
    spec:
414+
      replicas: 1
415+
      selector:
416+
        matchLabels:
417+
          app: tensorboard
418+
      template:
419+
        metadata:
420+
          labels:
421+
            app: tensorboard
422+
        spec:
423+
          volumes:
424+
          - name: samba-share-volume2
425+
            persistentVolumeClaim:
426+
              # claimName: azurefile
427+
              claimName: samba-share-claim
428+
          containers:
429+
          - name: tensorboard
430+
            image: tensorflow/tensorflow:1.10.0
431+
            imagePullPolicy: Always
432+
            command:
433+
            - /usr/local/bin/tensorboard
434+
            args:
435+
            - --logdir
436+
            - /tmp/tensorflow/logs
437+
            volumeMounts:
438+
            - mountPath: /tmp/tensorflow
439+
              #subPath: somedemo55
440+
              name: samba-share-volume2
441+
            ports:
442+
            - containerPort: 6006
443+
              protocol: TCP
444+
          dnsPolicy: ClusterFirst
445+
          restartPolicy: Always
446+
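A note on API versions: `extensions/v1beta1` stopped serving `Deployment` objects in Kubernetes 1.16, so on newer clusters the manifest above would need `apiVersion: apps/v1` (the `spec.selector` it already contains is required there). Below is a minimal sketch of that one-line rewrite, assuming a single-document manifest named `tb.yaml`. The helper name `migrate_deployment_api` is hypothetical and not part of this repository.

```shell
#!/bin/sh
# Sketch only: rewrite the Deployment apiVersion for Kubernetes 1.16+ clusters.
# Assumption: the manifest holds a single Deployment whose first-level
# apiVersion line reads exactly "apiVersion: extensions/v1beta1".

# migrate_deployment_api is a hypothetical helper name used for illustration.
# $1: path to the manifest. The result goes to stdout; the file is not modified.
migrate_deployment_api() {
    sed 's|^apiVersion: extensions/v1beta1$|apiVersion: apps/v1|' "$1"
}
```

For example, `migrate_deployment_api tb.yaml | kubectl apply -f -` would apply the rewritten manifest while leaving `tb.yaml` itself untouched.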
453447
## Next Steps
454448

455449
Proceed to [TensorFlow on Kubeflow Tutorial](tensorflow-on-kubeflow/Readme.md#tensorflow-on-kubeflow-on-azure-stack)
@@ -458,6 +452,11 @@ to learn how to execute `TFJob`s on Kubeflow, in the environment that we just cr
458452
And then run the [PyTorch on Kubeflow Tutorial](pytorch-on-kubeflow/Readme.md#pytorch-on-kubeflow-on-azure-stack) to learn how to run
459453
`PyTorchJob`s.
460454

455+
The PyTorch example we run logs data for TensorBoard; you will see something like this:
456+
457+
![pytorch-on-kubeflow/images/tensorboard_scalars.png](pytorch-on-kubeflow/images/tensorboard_scalars.png)
458+
459+
461460
# Links
462461

463462
The following resources might help during troubleshooting or modifications:

Research/kubeflow-on-azure-stack/pytorch-on-kubeflow/Readme.md

Lines changed: 7 additions & 1 deletion
@@ -139,7 +139,7 @@ Shortly after, you should see the pods up-and-running:
139139

140140
In the standard Kubernetes Dashboard you can see the pods:
141141

142-
![](images/pytorch_cluster_pods.png)
142+
![images/pytorch_cluster_pods.png](images/pytorch_cluster_pods.png)
143143

144144

145145
To get the information about the running PyTorch workload:
@@ -321,6 +321,12 @@ Look at the logs to see the progress:
321321
accuracy=0.9872
322322
saving model to 'mnist_cnn.pt'...
323323

324+
325+
Since our example uses tensorboardX to write summaries, if you connected the `Tensorboard` logs properly,
326+
you will see your data in TensorBoard:
327+
328+
![images/tensorboard_scalars.png](images/tensorboard_scalars.png)
329+
324330
See more on PyTorch at [https://github.com/pytorch/pytorch](https://github.com/pytorch/pytorch)
325331

326332
# Links
Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
1+
#!/bin/sh
2+
#
3+
# extracting TensorBoard's pod name and forwarding 6006 to the outside
4+
#
5+
export PODNAME=$(kubectl get pod -l app=tensorboard -o jsonpath='{.items[0].metadata.name}')
6+
kubectl port-forward ${PODNAME} 6006:6006
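
The script above gives little guidance when no pod carries the `app=tensorboard` label: the `.items[0]` lookup can fail and `PODNAME` ends up empty. Below is a more defensive sketch, assuming the same label as in `tb.yaml`. The function names `find_tb_pod` and `forward_tb` are hypothetical and used only for illustration.

```shell
#!/bin/sh
# Sketch only: a defensive variant of the TensorBoard port-forward helper.
# Assumption: the TensorBoard pod is labelled app=tensorboard, as in tb.yaml.

# Print the name of the first matching pod; print nothing if none match.
# {.items[*]...} yields a space-separated list, so an empty result is not an error.
find_tb_pod() {
    kubectl get pod -l app=tensorboard \
        -o jsonpath='{.items[*].metadata.name}' 2>/dev/null | awk '{print $1}'
}

# Forward local port 6006 to the pod, or report a clear error if no pod exists.
forward_tb() {
    podname=$(find_tb_pod)
    if [ -z "$podname" ]; then
        echo "no pod with label app=tensorboard found" >&2
        return 1
    fi
    kubectl port-forward "$podname" 6006:6006
}
```

Calling `forward_tb` at the end of the script starts the forwarding, and a non-zero exit status signals that the Deployment has not been created yet.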

Research/kubeflow-on-azure-stack/tensorflow-on-kubeflow/Readme.md

Lines changed: 41 additions & 0 deletions
@@ -257,6 +257,46 @@ And soon after, you should see the serialized model in your folder:
257257

258258
For more tutorials and How-Tos, see TensorFlow's [save_and_load.ipynb](https://github.com/tensorflow/docs/blob/master/site/en/tutorials/distribute/save_and_load.ipynb) example notebook.
259259

260+
## Tensorboard
261+
262+
TensorBoard is another useful tool for monitoring ML applications that support it. We provide a sample file, `tensorboard.yaml`, to start it in your Kubernetes cluster.
263+
264+
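A concrete example of a script that uses TensorBoard is in the folder `mnist-w-tb`. You will need a
265+
container registry account (an image name like the one below is pushed to Docker Hub by default) to build and push the image (substitute `rollingstone` with your own account name), then run: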
266+
267+
    $ cd mnist-w-tb
268+
    $ docker build -t rollingstone/tf-mnist-w-tb:latest .
269+
    $ docker push rollingstone/tf-mnist-w-tb:latest
270+
    $ kubectl create -f tb.yaml
271+
    $ kubectl create -f tfjob.yaml
272+
273+
You can ask your cloud administrator to help you establish network access, or you can
274+
use port forwarding, for example with `kubectl port-forward`, to see it at your desktop's `localhost` address on port 6006:
275+
276+
    $ export PODNAME=$(kubectl get pod -l app=tensorboard -o jsonpath='{.items[0].metadata.name}')
277+
    $ kubectl port-forward ${PODNAME} 6006:6006
278+
279+
It will look something like this:
280+
281+
![../pics/tensorboard_graph.png](../pics/tensorboard_graph.png)
282+
283+
Another tab shows the input images (train/test):
284+
285+
![../pics/tensorboard_images.png](../pics/tensorboard_images.png)
286+
287+
There are scalars the model logged:
288+
289+
![../pics/tensorboard_scalars.png](../pics/tensorboard_scalars.png)
290+
291+
And histograms for different parameters:
292+
293+
![../pics/tensorboard_histograms.png](../pics/tensorboard_histograms.png)
294+
295+
The projector tab animates the points in the dimensions of the NN layer:
296+
297+
![../pics/tensorboard_projector.png](../pics/tensorboard_projector.png)
298+
299+
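If the port forward from this section is started before the TensorBoard pod is ready, the connection fails. Below is a hedged sketch of two small helpers, assuming the pod label `app=tensorboard` from `tb.yaml` and that `curl` is installed on your desktop. The function names `wait_for_tb` and `tb_http_status` are hypothetical.

```shell
#!/bin/sh
# Sketch only: wait for the TensorBoard pod, then probe the forwarded port.
# Assumptions: pod label app=tensorboard (from tb.yaml); curl is available locally.

# Block until the pod reports Ready, or fail after the given timeout.
# $1: optional timeout, default 120s.
wait_for_tb() {
    kubectl wait --for=condition=Ready pod -l app=tensorboard --timeout="${1:-120s}"
}

# Print the HTTP status code returned by the forwarded TensorBoard endpoint.
# $1: optional local port, default 6006.
tb_http_status() {
    curl -s -o /dev/null -w '%{http_code}' "http://localhost:${1:-6006}/"
}
```

A typical order would be `wait_for_tb`, then the `kubectl port-forward` command in one terminal, and `tb_http_status` in another to confirm that the page is being served.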
260300
## Next Steps
261301

262302
Proceed to the [PyTorch on Kubeflow Tutorial](../pytorch-on-kubeflow/Readme.md) to learn how to run `PyTorchJob`s.
@@ -267,3 +307,4 @@ For further information:
267307

268308
- https://www.kubeflow.org/docs/components/training/tftraining/
269309
- https://www.tensorflow.org/
310+
- https://github.com/Azure/kubeflow-labs
