pages/platform/ai/deploy_guide_07_troubleshooting/guide.en-gb.md
Lines changed: 40 additions & 41 deletions
@@ -1,7 +1,7 @@
 ---
 title: AI Deploy - Troubleshooting
 slug: deploy/debug-apps
-excerpt: Most popular questions and answer to troubleshoot your issues
+excerpt: Find here all the most popular questions and answers to troubleshoot your issues
 section: AI Deploy - Guides
 order: 05
 updated: 2023-03-30
@@ -16,26 +16,26 @@ This page gives you a few hints on how to debug your apps if you encounter some
 ## Requirements
 
 - Access to the [OVHcloud Control Panel](https://www.ovh.com/auth/?action=gotomanager&from=https://www.ovh.co.uk/&ovhSubsidiary=GB)
-- A **Public Cloud** project
+- A [**Public Cloud** project](https://docs.ovh.com/gb/en/public-cloud/create_a_public_cloud_project/)
 
 ## Building your app
 
 ### Best practices and mandatory guidelines to build your app
 
-When you are deploying your own applications and models, some guidelines must be followed. We detail them on the guide [AI Deploy - Build & use custom Docker image](https://docs.ovh.com/gb/en/publiccloud/ai/deploy/build-use-custom-image/).
-Especially, be cautious about image requirements such as OVHcloud user and Docker architecture used. Otherwise, your deployment will end in `FAILED` status.
+When you are deploying your own applications and models, some guidelines must be followed. We detail them in the guide [AI Deploy - Build & use custom Docker image](https://docs.ovh.com/gb/en/publiccloud/ai/deploy/build-use-custom-image/).
+Be particularly cautious about image requirements such as OVHcloud user and Docker architecture used. Otherwise, your deployment will end in `FAILED` status.
 
 ### Apps examples to follow
 
-If you need some official examples, please follow this guide, where we share the source code: [AI Deploy - Apps portfolio](https://docs.ovh.com/gb/en/publiccloud/ai/deploy/apps-portfolio/).
+If you need some official examples, please follow this guide where we share the source code: [AI Deploy - Apps portfolio](https://docs.ovh.com/gb/en/publiccloud/ai/deploy/apps-portfolio/).
 
 ### Test your app locally and in the cloud
 
 Before paying for cloud resources, feel free to test locally your Docker image. For that, simply install Docker on your local environment.
 
 For the building step, as explained in the mandatory guidelines linked in the previous section, your Docker image has to support at least `linux/amd64` platform to be deployed correctly. Otherwise deployment will fail.
 
-Then perform a `docker run` as follow:
+Then perform a `docker run` as follows:
 
 ```
 # Build your Docker image for at least linux/amd64 architecture
 docker run --rm -it --user=42420:42420 <image-identifier>
 ```
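The build command itself is not visible in the block above. As a minimal sketch, assuming Docker Buildx is installed and the Dockerfile sits in the current directory, the build and the local run could be wrapped as below. The helper names `run`, `build_amd64` and `run_as_ovhcloud_user`, as well as the `DRY_RUN` variable, are hypothetical and only serve this illustration.

```shell
# Hypothetical helper: print the command instead of executing it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Build the image from the current directory for the linux/amd64 platform.
# $1 is the image identifier (name:tag).
build_amd64() {
  run docker buildx build --platform linux/amd64 -t "$1" .
}

# Start the image with UID:GID 42420:42420, as in the command above.
run_as_ovhcloud_user() {
  run docker run --rm -it --user=42420:42420 "$1"
}
```

For example, `DRY_RUN=1 build_amd64 my-app:1.0` only prints the build command, which is a cheap way to check it before running it for real.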
 
-This way, we will imitate the OVHcloud user. Once validated locally, you can deploy your app first with CPUs, who are cheaper compared to GPUs.
-
+This way, you will imitate the OVHcloud user. Once validated locally, you can deploy your app first with CPUs, which are cheaper than GPUs.
 
 ## Deployments
 
 ### My deployment has failed
 
-An AI Deploy app has a workflow in multiple steps, and the `FAILED` status is one of them. This state happens when OVHcloud is unable to deploy your app, meaning the infrastructure side (backend) is working fine but something is broken on the image side. You can find more details about AI Deploy workflow in [AI Deploy - Billing and lifecycle](https://docs.ovh.com/gb/en/publiccloud/ai/deploy/billing/)
+An AI Deploy app has a workflow in multiple steps, the `FAILED` status being one of them. This state happens when OVHcloud is unable to deploy your app, meaning the infrastructure side (backend) is working fine but something is broken on the image side. You can find more details about the AI Deploy workflow on the [AI Deploy - Billing and lifecycle](https://docs.ovh.com/gb/en/publiccloud/ai/deploy/billing/) page.
 
 Main items to troubleshoot:
 
-- Typography in your repository name, image or version name. Test to deploy your image locally first.
-- Your Docker image is not following mandatory guidelines, such as OVhcloud user. See [AI Deploy - Build & use custom Docker image](https://docs.ovh.com/gb/en/publiccloud/ai/deploy/build-use-custom-image/)
-- Your Docker image is in a private registry, and you did not authorize OVHcloud to access it.
--Your have reached your quotas in terms of CPUs or GPUs. You can check them via the control panel (Project Management / Quotas) or via the `ovhai CLI` command `ovhai me`.
+- Typos in your repository name, image or version name. Test deploying your image locally first.
+- Your Docker image is not following mandatory guidelines, such as the OVHcloud user. See [AI Deploy - Build & use custom Docker image](https://docs.ovh.com/gb/en/publiccloud/ai/deploy/build-use-custom-image/).
+- Your Docker image is in a private registry and you did not authorize OVHcloud to access it.
+- You have reached your quotas in terms of CPUs or GPUs. You can check them via the OVHcloud Control Panel (Project Management / Quotas) or via the `ovhai CLI` command `ovhai me`.
 
-If you are using `ovhai CLI`, you can learn more information with the `ovhai debug` command, which will give you more details about your command, and `ovhai app logs <app_ID>` to download logs history.
+If you are using `ovhai CLI`, you can get more details about your command with the `ovhai debug` command, and use `ovhai app logs <app_ID>` to download the logs history.
 
 ### My deployment is in error
 
 While a deployment in `FAILED` state is due to a problem on the image, repository, etc., an app in `ERROR` state can occur when AI Deploy is encountering an issue.
 
-Try to redeploy your app, and modify the targeted datacenter for example.
-As the previous question, when using our CLI, you can learn more information with the `ovhai debug` command, which will give you more details about your command, and `ovhai app logs <app_ID>` to download logs history.
+Try redeploying your app, for example by targeting a different datacenter.
+As in the previous answer, when using our CLI you can get more details about your command with the `ovhai debug` command, and use `ovhai app logs <app_ID>` to download the logs history.
 
-If the issue persists, please contact our support.
+If the issue persists, please contact our support teams.
 
 ### My Deployment seems very long
 
-When AI Deploy initialize your app, the Docker image is pulled (downloaded) in our infrastructure and replicated over the replicas, if any.
-The larger the Docker image is, the longer it will take to be deployed on AI Deploy side.
+When AI Deploy initializes your app, the Docker image is pulled (downloaded) in our infrastructure and replicated over the replicas, if any.
+The larger the Docker image is, the longer it will take to be deployed on AI Deploy side.
 
-Also, since we pull the data from a registry of your choice, if this particular registry is experiencing some issue or is restricted in terms of bandwidth or throughput, it may cause some slowness.
+Also, since we pull the data from a registry of your choice, if this particular registry is experiencing some issues or is restricted in terms of bandwidth or throughput, it may cause some slowness.
 
-In an ideal situation, for a Docker image sized approximately 1GB, without external data linked, it should take less than 10 minutes.
+In an ideal situation, for a Docker image of approximately 1GB, without external data linked, it should take less than 10 minutes.
 
 ### My deployed app does not scale
 
 AI Deploy provides manual scaling and autoscaling, allowing you to scale up or down based on triggers such as CPU or RAM usages.
-More information on the official documentation about [scaling strategies](https://docs.ovh.com/gb/en/publiccloud/ai/deploy/apps-deployments/).
+Find more information in the official documentation about [scaling strategies](https://docs.ovh.com/gb/en/publiccloud/ai/deploy/apps-deployments/).
 
 If your app does not scale:
 
 - Check if you deployed your app with manual or autoscaling.
 - Verify triggers (CPU or RAM usage) and their value. By default the value is at 75%.
 - Open the Monitoring dashboard of your app (Grafana dashboard is provided for each app) and check if the threshold has been reached.
--For load-testing tutorial and dashboard example to follow your scaling, you can refer to this tutorial: [AI Deploy - How to load test your application with Locust](https://docs.ovh.com/gb/en/publiccloud/ai/deploy/load-test-app/).
+- Refer to the following load-testing tutorial, which also provides a dashboard example to follow your scaling: [AI Deploy - How to load test your application with Locust](https://docs.ovh.com/gb/en/publiccloud/ai/deploy/load-test-app/).
 
 
 ### My deployed app is very slow
@@ -100,29 +99,29 @@ Slowness may find its roots in multiple reasons. Indeed, each deployed app is th
 
 If you are experiencing slowness, here are some actions to investigate:
 
-- Open the Monitoring dashboard for your app (Grafana dashboard is provided for each app) and check if you are reaching some resources to 90/100%, such as RAM, CPU, GPU or network. You can also check the overall latency.
-- If nothing is visible, it can be an issue between the client (where the query comes) and the deployed app. As an example, if you are contacting your apps from a geographically distant point, it will add latency. Try to reduce the distances in your architecture.
-- Your Docker image itself may be the root cause. Try to run your Docker image locally, and query your app locally. Some apps might be heavy to run or not well optimized.
+- Open the Monitoring dashboard for your app (Grafana dashboard is provided for each app) and check if some resources are reaching 90/100%, such as RAM, CPU, GPU or network. You can also check the overall latency.
+- If nothing is visible, it can be an issue between the client (where the query comes from) and the deployed app. As an example, if you are contacting your apps from a geographically distant point, it will add latency. Try reducing the distances in your architecture.
+- Your Docker image itself may be the root cause. Try running your Docker image locally, and query your app locally. Some apps might be heavy to run or not well optimized.
 
 ### My deployment has crashed
 
-Like any cloud product, AI Deploy might experience hardware or software failures over time. To mitigate the risk on your side, please deploy you app on at least two replicas, allowing us to provide high availability. At this time, all replicas are in the same region, but it will prevent from a physical server failure.
+Like any cloud product, AI Deploy might experience hardware or software failures over time. To mitigate the risk on your side, please deploy your app on at least two replicas, allowing us to provide high availability. At this time, all replicas are in the same region, but this still protects your app against a physical server failure.
 
-Another root cause may be your own Docker image, for example by writing uncontrolled amount of data into your working directory.
+Another root cause may be your own Docker image, for example by writing an uncontrolled amount of data into your working directory.
 
-Also, we recommend to orchestrate your workflow with third party tools such as Airflow, Prefect, Dagster or Kestra, allowing you to relaunch an app once it has crashed.
+We also recommend orchestrating your workflow with third party tools such as Airflow, Prefect, Dagster or Kestra, allowing you to relaunch an app once it has crashed.
 
-If your app crashed and you are using `ovhai CLI`, you can learn more information with `ovhai app logs <app_ID>` to download logs history.
+If your app crashed and you are using `ovhai CLI`, you can get more information by downloading the logs history with `ovhai app logs <app_ID>`.
 
 ### My data is not synchronized back
 
-AI Deploy does not synchronize back your remote data. Please follow [official guideline to build & use custom Docker image](https://docs.ovh.com/gb/en/publiccloud/ai/deploy/build-use-custom-image/).
+AI Deploy does not synchronize back your remote data. Please follow [official guidelines to build & use custom Docker image](https://docs.ovh.com/gb/en/publiccloud/ai/deploy/build-use-custom-image/).
 
 ## Connectivity
 
 ### I don't understand how I can connect to my app
 
-AI Deploy provides an HTTP endpoint for each deployed app. You can find your endpoint via OVHcloud control panel (*Public Cloud / AI Deploy / My app / Access URL*), API or CLI.
+AI Deploy provides an HTTP endpoint for each deployed app. You can find your endpoint via the OVHcloud Control Panel (*Public Cloud / AI Deploy / My app / Access URL*), API or CLI.
 
 An HTTP endpoint will look like this: `https://<unique_id>.app.gra.ai.cloud.ovh.net`
 
@@ -132,9 +131,9 @@ Depending on what you deployed, you then just have an API endpoint or a web inte
 
 ### I'm unable to connect (unauthorized)
 
-When you deploy an app, you can opt for unrestricted access (open to the internet) or secured access.
+When you deploy an app, you can opt for unrestricted access (open to the internet) or secured access.
 
-While unrestricted access means that everyone is authorized, a secured access will require credentials. Two ways are available:
+While unrestricted access means that everyone is authorized, a secured access will require credentials. Two options are available:
 
 - An AI user. It can be seen as a user and password restriction. Quite simple but not a lot of granularity.
 - An AI token (preferred solution). A token is very effective since you can link them with labels. For example, a token for a specific app ID, for a team, ...
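As a sketch of the token option, and assuming the AI token is sent as an HTTP Bearer token in the `Authorization` header (an assumption to verify against the token documentation), a secured app could be queried with `curl` as shown below. `query_app` is a hypothetical helper name.

```shell
# Hypothetical helper: query a secured app endpoint with an AI token.
# Assumption: the token is passed as "Authorization: Bearer <token>".
# $1 is the app endpoint URL, $2 is the token.
query_app() {
  local url="$1" token="$2"
  # --fail makes curl exit with a non-zero status on HTTP errors such as 401.
  curl --silent --show-error --fail \
    --header "Authorization: Bearer ${token}" \
    "${url}"
}
```

A non-zero exit status with an HTTP 401 error then points to a missing, expired or mismatched token rather than to a broken app.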
@@ -143,22 +142,22 @@ If you selected a restricted access, don't forget to [generate an applicative to
 
 ### I need more than one port to be exposed
 
-By design, AI Deploy links your app to one HTTP endpoint and one port (default is 8080). If you need more than one port, best practice is to split you deployment in multiple apps.
-If you cannot afford it, you can tweak your HTTP endpoint as follow: `https://<unique_id>-<specific_port>.app.<region>.ai.cloud.ovh.net`.
+By design, AI Deploy links your app to one HTTP endpoint and one port (default is 8080). If you need more than one port, best practice is to split your deployment into multiple apps.
+If you cannot afford it, you can tweak your HTTP endpoint as follows: `https://<unique_id>-<specific_port>.app.<region>.ai.cloud.ovh.net`.
 
 For example, just add `-8000` after your unique ID and you will be routed to this specific port.
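The URL pattern above can be sketched as a small shell function. `app_port_url` is a hypothetical helper name, and the default region `gra` is only an assumption taken from the endpoint example earlier in this guide.

```shell
# Hypothetical helper: build the endpoint URL for a specific port.
# $1 is the app unique ID, $2 the port, $3 the region (assumed default: gra).
app_port_url() {
  local unique_id="$1" port="$2" region="${3:-gra}"
  printf 'https://%s-%s.app.%s.ai.cloud.ovh.net\n' "$unique_id" "$port" "$region"
}
```

For example, `app_port_url abc123 8000` prints `https://abc123-8000.app.gra.ai.cloud.ovh.net`.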
 
 ## Billing
 
-### I don't understand how it will cost to deploy an app
+### I don't understand how much it will cost to deploy an app
 
-AI Deploy pricing model is quite simple compared to competitors. You pay for the compute resources (CPUs/GPUs) during the lap of time you will use them.
+The AI Deploy pricing model is quite simple compared to competitors. You pay for the compute resources (CPUs/GPUs) for the length of time you use them.
 
-Basic example : If you deploy one app with 2 x GPU at 1 euro each for 6 hours, you will pay 12 euros at the end. (2 x 1€ x 6h), whatever the amount of calls or users received.
+- Basic example: if you deploy one app with 2 x GPU at 1 euro each per hour for 6 hours, you will pay 12 euros at the end (2 x 1€ x 6h), whatever the number of calls or users received.
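The arithmetic of this example can be sketched as follows. `estimate_cost` is a hypothetical helper, and the hourly price is an input you provide, not an official rate.

```shell
# Hypothetical helper: estimate a deployment cost.
# $1 is the number of resources (CPUs or GPUs), $2 the hourly price
# per resource, $3 the duration in hours.
estimate_cost() {
  awk -v n="$1" -v price="$2" -v hours="$3" \
    'BEGIN { printf "%.2f\n", n * price * hours }'
}
```

With the figures above, `estimate_cost 2 1 6` prints `12.00`. The number of calls or users does not appear in the formula.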
 
-Prices are shown statically in our [official website](http://www.ovhcloud.com), inside our Public Cloud section. For dynamic estimation, use our control panel. An estimation will be available before launching a deployment.
+Prices are shown statically on our [official website](http://www.ovhcloud.com), inside our Public Cloud section. For a dynamic estimation, use the OVHcloud Control Panel. An estimation will be available before launching a deployment.
 
-Also, for more detailed information, please refer to [AI Deploy - Billing and lifecycle](https://docs.ovh.com/gb/en/publiccloud/ai/deploy/billing/).
+Also, for more detailed information, please refer to our [AI Deploy - Billing and lifecycle](https://docs.ovh.com/gb/en/publiccloud/ai/deploy/billing/) page.