---
title: AI Deploy - Troubleshooting
slug: deploy/debug-apps
excerpt: Find here all the most popular questions and answers to troubleshoot your issues
section: AI Deploy - Guides
order: 05
updated: 2023-03-30
routes:
    canonical: 'https://docs.ovh.com/gb/en/publiccloud/ai/deploy/debug-apps/'
---

**Last updated 30th March, 2023.**

## Objective

This page gives you a few hints on how to debug your AI Deploy apps if you encounter issues.

## Requirements

- Access to the [OVHcloud Control Panel](https://www.ovh.com/auth/?action=gotomanager&from=https://www.ovh.de/&ovhSubsidiary=de)
- A [**Public Cloud** project](https://docs.ovh.com/de/public-cloud/create_a_public_cloud_project/)

## Building your app

### Best practices and mandatory guidelines to build your app

When you deploy your own applications and models, you must follow some guidelines. They are detailed in the guide [AI Deploy - Build & use custom Docker image](https://docs.ovh.com/de/publiccloud/ai/deploy/build-use-custom-image/).
Pay particular attention to image requirements such as the OVHcloud user and the Docker architecture. If they are not met, your deployment will end in `FAILED` status.

### App examples to follow

For official examples, including their source code, see the [AI Deploy - Apps portfolio](https://docs.ovh.com/de/publiccloud/ai/deploy/apps-portfolio/) guide.

### Test your app locally and in the cloud

Before paying for cloud resources, feel free to test your Docker image locally. To do so, install Docker on your local machine.

When building the image, as explained in the mandatory guidelines linked in the previous section, make sure it supports at least the `linux/amd64` platform. Otherwise, the deployment will fail.

Then build your image and run it with `docker run` as follows:

```
# Build your Docker image for at least linux/amd64 architecture
docker buildx build --platform linux/amd64,linux/arm64 ...

# Run your Docker image as OVHcloud user
docker run --rm -it --user=42420:42420 <image-identifier>
```

This way, you imitate the OVHcloud user. Once your image works locally, you can first deploy your app on CPUs, which are cheaper than GPUs.
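If you want to double-check, from inside the container, that the process really runs with the OVHcloud user and group IDs, a small helper like the one below can do it. This is a minimal sketch: `check_ovhcloud_user` is a hypothetical name, and it only compares the numeric IDs returned by `id` with the expected values (42420:42420 by default).

```shell
# Hypothetical helper (not part of any OVHcloud tooling): checks that the
# current process runs with the expected numeric user and group IDs.
# Defaults to 42420:42420, the OVHcloud user used in the docker run example.
check_ovhcloud_user() {
  expected_uid="${1:-42420}"
  expected_gid="${2:-42420}"
  current_uid="$(id -u)"
  current_gid="$(id -g)"

  if [ "$current_uid" = "$expected_uid" ] && [ "$current_gid" = "$expected_gid" ]; then
    echo "OK: running as ${current_uid}:${current_gid}"
  else
    echo "Unexpected user ${current_uid}:${current_gid} (expected ${expected_uid}:${expected_gid})" >&2
    return 1
  fi
}
```

For example, you can open a shell in the container with `docker run --rm -it --user=42420:42420 --entrypoint sh <image-identifier>` (assuming your image ships `sh`), paste the function and call `check_ovhcloud_user`.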

## Deployments

### My deployment has failed

An AI Deploy app goes through several lifecycle states, and `FAILED` is one of them. This state means that OVHcloud was unable to deploy your app: the infrastructure side (backend) is working fine, but something is wrong on the image side. You can find more details about the AI Deploy lifecycle on the [AI Deploy - Billing and lifecycle](https://docs.ovh.com/de/publiccloud/ai/deploy/billing/) page.

Main items to troubleshoot:

- Typos in your repository name, image name or tag (version). Test your image locally first.
- Your Docker image does not follow the mandatory guidelines, such as running as the OVHcloud user. See [AI Deploy - Build & use custom Docker image](https://docs.ovh.com/de/publiccloud/ai/deploy/build-use-custom-image/).
- Your Docker image is in a private registry and you did not authorize OVHcloud to access it.
- You have reached your CPU or GPU quotas. You can check them in the OVHcloud Control Panel (*Project Management / Quotas*) or with the `ovhai` CLI command `ovhai me`.

If you are using the `ovhai` CLI, you can get more details about your command with `ovhai debug`, and download the log history with `ovhai app logs <app_ID>`.
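For the first item of the list above, a quick syntax check on the image reference can catch typos before you launch a deployment. The function below is a hypothetical helper: it tests a simplified subset of the Docker image reference grammar (optional lowercase registry host and port, lowercase repository path, optional tag), ignores digests, and may reject some unusual but valid names. It does not check that the image exists, so pulling or running the image locally remains the real test.

```shell
# Hypothetical pre-flight check: does the string look like
#   [registry[:port]/]repository[/path...][:tag] ?
# Simplified subset of the Docker reference grammar (no digests).
# LC_ALL=C keeps [a-z] strictly lowercase whatever the user's locale.
looks_like_image_ref() {
  printf '%s\n' "$1" | LC_ALL=C grep -Eq \
    '^([a-z0-9]([a-z0-9.-]*[a-z0-9])?(:[0-9]+)?/)?[a-z0-9]+(([._]|__|-+)[a-z0-9]+)*(/[a-z0-9]+(([._]|__|-+)[a-z0-9]+)*)*(:[A-Za-z0-9_][A-Za-z0-9_.-]{0,127})?$'
}
```

For example, `looks_like_image_ref "$IMAGE" || echo "Check the image reference for typos"`, where `IMAGE` is a hypothetical variable holding your image reference.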

### My deployment is in error

While the `FAILED` state is caused by a problem with the image, the repository, etc., the `ERROR` state occurs when AI Deploy itself encounters an issue.

Try redeploying your app, for example in another datacenter.
As in the previous answer, if you are using the `ovhai` CLI, you can get more details about your command with `ovhai debug`, and download the log history with `ovhai app logs <app_ID>`.

If the issue persists, please contact our support teams.

### My deployment takes a long time

When AI Deploy initializes your app, the Docker image is pulled (downloaded) into our infrastructure and onto each replica, if you have several.
The larger the Docker image, the longer the deployment takes on the AI Deploy side.

Also, since the image is pulled from the registry of your choice, any issue on that registry, or any bandwidth or throughput restriction, may slow the deployment down.

In ideal conditions, a Docker image of approximately 1 GB, without linked external data, should be deployed in less than 10 minutes.
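To compare your own image with this order of magnitude, you can read its size from your local Docker daemon with `docker image inspect --format '{{.Size}}' <image-identifier>`, which prints a number of bytes. The helper below is a hypothetical convenience that converts this number to gigabytes (1 GB = 10^9 bytes). Keep in mind that this local size is the uncompressed one, while registries transfer compressed layers, so treat it as a rough indicator only.

```shell
# Hypothetical helper: convert a size in bytes to gigabytes
# (1 GB = 10^9 bytes), rounded to two decimals.
bytes_to_gb() {
  LC_ALL=C awk -v bytes="$1" 'BEGIN { printf "%.2f\n", bytes / 1000000000 }'
}

# Example usage (requires a local Docker daemon and a built image):
# bytes_to_gb "$(docker image inspect --format '{{.Size}}' <image-identifier>)"
```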

### My deployed app does not scale

AI Deploy provides manual scaling and autoscaling. Autoscaling lets you scale up or down based on triggers such as CPU or RAM usage.
You can find more information about [scaling strategies](https://docs.ovh.com/de/publiccloud/ai/deploy/apps-deployments/) in the official documentation.

If your app does not scale:

- Check whether you deployed your app with manual scaling or autoscaling.
- Verify the trigger (CPU or RAM usage) and its threshold value. By default, the threshold is 75%.
- Open the Monitoring dashboard of your app (a Grafana dashboard is provided for each app) and check whether the threshold has been reached.
- Refer to the following load-testing tutorial, which also provides a dashboard example to follow your scaling: [AI Deploy - How to load test your application with Locust](https://docs.ovh.com/de/publiccloud/ai/deploy/load-test-app/).
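When reading the dashboard, you can reproduce the threshold comparison with a quick calculation. The sketch below assumes that the trigger compares the average usage across replicas with the threshold, as Kubernetes-style autoscalers typically do; this is an assumption, not a description of how AI Deploy computes it internally. `above_threshold` is a hypothetical name, and the usage values are percentages you read from your dashboard.

```shell
# Hypothetical helper. Assumption: the autoscaling trigger compares the
# AVERAGE usage (CPU or RAM, in %) across replicas with the threshold.
# Usage: above_threshold <threshold_percent> <usage_percent>...
# Returns 0 if the average is above the threshold, 1 otherwise.
above_threshold() {
  if [ "$#" -lt 2 ]; then
    echo "usage: above_threshold <threshold_percent> <usage_percent>..." >&2
    return 2
  fi
  threshold="$1"
  shift
  # All work happens in BEGIN, so awk never tries to read ARGV as files.
  LC_ALL=C awk -v threshold="$threshold" 'BEGIN {
    for (i = 1; i < ARGC; i++) sum += ARGV[i]
    avg = sum / (ARGC - 1)
    printf "average usage: %.1f%% (threshold: %.1f%%)\n", avg, threshold
    if (avg > threshold + 0) exit 0
    exit 1
  }' "$@"
}
```

For example, `above_threshold 75 80 90 60` prints `average usage: 76.7% (threshold: 75.0%)` and succeeds, since the average is above 75%.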

### My deployed app is very slow

Slowness can have multiple causes, since each deployed app combines software code with resources such as compute and network.

If you are experiencing slowness, here are some actions to investigate:

- Open the Monitoring dashboard of your app (a Grafana dashboard is provided for each app) and check whether some resources, such as RAM, CPU, GPU or network, are reaching 90 to 100% usage. You can also check the overall latency.
- If nothing stands out, the issue may lie between the client (where the queries come from) and the deployed app. For example, querying your app from a geographically distant location adds latency. Try reducing these distances in your architecture.
- Your Docker image itself may be the root cause. Try running your Docker image locally and querying your app locally: some apps are heavy to run or not well optimized.
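To compare the latency of your local container with that of the deployed app, you can time the same request against both with curl's `--write-out '%{time_total}'` variable (total time in seconds). The helpers below are a minimal sketch: `average_seconds` and `measure_latency` are hypothetical names, `http://localhost:8080/` assumes you published the container port locally (for example by adding `-p 8080:8080` to `docker run`), and a secured app also needs its credentials added to the curl call.

```shell
# Average a list of durations (one number per line on stdin), 3 decimals.
# Prints nothing if the input is empty.
average_seconds() {
  LC_ALL=C awk '{ sum += $1; n++ } END { if (n > 0) printf "%.3f\n", sum / n }'
}

# Time <count> GET requests to <url> (10 by default) and print the
# average total time in seconds.
measure_latency() {
  url="$1"
  count="${2:-10}"
  i=0
  while [ "$i" -lt "$count" ]; do
    curl --silent --output /dev/null --write-out '%{time_total}\n' "$url"
    i=$((i + 1))
  done | average_seconds
}

# Example usage:
# measure_latency http://localhost:8080/ 20
# measure_latency https://<unique_id>.app.gra.ai.cloud.ovh.net/ 20
```

Failed requests are timed as well, so make sure both URLs actually respond before comparing the averages.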

### My deployment has crashed

Like any cloud product, AI Deploy may experience hardware or software failures over time. To mitigate this risk on your side, deploy your app on at least two replicas, which allows us to provide high availability. At this time, all replicas are in the same region, but this still protects your app against the failure of a single physical server.

Another root cause may be your own Docker image, for example if it writes an uncontrolled amount of data into your working directory.
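If you suspect your app of filling up its working directory, a periodic size check can help you spot the problem before it causes a crash. The function below is a hypothetical sketch based on `du -sk`, which reports disk usage in units of 1024 bytes; the directory and the limit are values you choose, not AI Deploy settings.

```shell
# Hypothetical guard: warn when a directory grows beyond a limit.
# Usage: check_dir_size <directory> <limit_in_KiB>
check_dir_size() {
  dir="$1"
  limit_kib="$2"
  if [ ! -d "$dir" ]; then
    echo "No such directory: $dir" >&2
    return 2
  fi
  # du -sk prints "<size_in_KiB><TAB><path>"; keep only the size.
  used_kib="$(du -sk "$dir" | awk '{ print $1 }')"
  if [ "$used_kib" -gt "$limit_kib" ]; then
    echo "WARNING: $dir uses ${used_kib} KiB (limit: ${limit_kib} KiB)" >&2
    return 1
  fi
  echo "$dir uses ${used_kib} KiB"
}
```

For example, assuming your app writes to `/workspace`, `check_dir_size /workspace 5242880` warns when that directory exceeds 5 GiB (5 x 1024 x 1024 KiB).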

We also recommend orchestrating your workflow with third-party tools such as Airflow, Prefect, Dagster or Kestra, which allow you to relaunch an app after it has crashed.

If your app crashed and you are using the `ovhai` CLI, you can download the log history with `ovhai app logs <app_ID>` to get more information.

### My data is not synchronized back

AI Deploy does not synchronize your data back to your remote storage. Please follow the [official guidelines to build & use a custom Docker image](https://docs.ovh.com/de/publiccloud/ai/deploy/build-use-custom-image/).

## Connectivity

### I don't understand how I can connect to my app

AI Deploy provides an HTTP endpoint for each deployed app. You can find your endpoint in the OVHcloud Control Panel (*Public Cloud / AI Deploy / My app / Access URL*), through the API or with the CLI.

An HTTP endpoint looks like this (`gra` being the region): `https://<unique_id>.app.gra.ai.cloud.ovh.net`

Your app is exposed directly on this HTTP endpoint, which is linked to a port of your app (8080 by default).

Depending on what you deployed, this endpoint serves either a REST API or a web interface. Refer to our [Getting Started guide](https://docs.ovh.com/de/publiccloud/ai/deploy/getting-started/) for full explanations.

### I'm unable to connect (unauthorized)

When you deploy an app, you can opt for unrestricted access (open to the internet) or secured access.

While unrestricted access means that anyone is authorized, secured access requires credentials. Two options are available:

- An AI user: a username and password restriction. It is simple but offers little granularity.
- An AI token (preferred solution): tokens are more flexible, since you can associate them with labels, for example a token for a specific app ID or for a team.

If you selected secured access, don't forget to [generate an application token](https://docs.ovh.com/de/publiccloud/ai/deploy/tokens/).
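Once you have a token, it must be sent with each request to the app. The sketch below assumes that the application token is accepted as a standard `Authorization: Bearer` header; check the tokens guide linked above for the exact usage. `AI_TOKEN` and `query_app` are hypothetical names.

```shell
# Hypothetical wrapper: send a request to a secured AI Deploy app.
# Assumption: the application token is accepted as a Bearer token.
# Usage: AI_TOKEN=<token> query_app <url> [extra curl options...]
query_app() {
  if [ "$#" -lt 1 ]; then
    echo "usage: query_app <url> [curl options...]" >&2
    return 2
  fi
  if [ -z "${AI_TOKEN:-}" ]; then
    echo "AI_TOKEN is not set: generate an application token first" >&2
    return 1
  fi
  url="$1"
  shift
  # --fail makes curl exit non-zero on HTTP errors such as 401 or 403.
  curl --silent --show-error --fail \
    --header "Authorization: Bearer ${AI_TOKEN}" \
    "$@" "$url"
}

# Example usage:
# export AI_TOKEN=<token>
# query_app https://<unique_id>.app.gra.ai.cloud.ovh.net/
```

Because of `--fail`, a missing or wrongly scoped token shows up as a non-zero exit code, which is easy to catch in scripts.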

### I need more than one port to be exposed

By design, AI Deploy links your app to one HTTP endpoint and one port (8080 by default). If you need more than one port, the best practice is to split your deployment into multiple apps.
If that is not an option for you, you can adapt your HTTP endpoint as follows: `https://<unique_id>-<specific_port>.app.<region>.ai.cloud.ovh.net`.

For example, add `-8000` after your unique ID to be routed to port 8000.
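The URL rule above can be expressed as a small string builder. `app_url` is a hypothetical helper that only assembles strings following the documented patterns; the ID in the usage example is a placeholder.

```shell
# Hypothetical helper: build the URL of an AI Deploy app, optionally
# targeting a specific port instead of the default one (8080).
# Usage: app_url <unique_id> <region> [port]
app_url() {
  unique_id="$1"
  region="$2"
  port="${3:-}"
  if [ -n "$port" ]; then
    printf 'https://%s-%s.app.%s.ai.cloud.ovh.net\n' "$unique_id" "$port" "$region"
  else
    printf 'https://%s.app.%s.ai.cloud.ovh.net\n' "$unique_id" "$region"
  fi
}

# Example usage (placeholder ID):
# app_url 0123abcd gra        # → https://0123abcd.app.gra.ai.cloud.ovh.net
# app_url 0123abcd gra 8000   # → https://0123abcd-8000.app.gra.ai.cloud.ovh.net
```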

## Billing

### I don't understand how much it will cost to deploy an app

The AI Deploy pricing model is quite simple compared to competitors: you pay for the compute resources (CPUs/GPUs) for as long as you use them.

- Basic example: if you deploy one app with 2 GPUs at 1 euro per GPU per hour for 6 hours, you will pay 12 euros in total (2 x 1 € x 6 h), whatever the number of calls or users received.
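The same calculation as a hypothetical helper: multiply the number of compute units by the hourly price per unit and by the number of hours. The price used below is the illustrative 1 euro from the example, not a real rate; check the website or the Control Panel for actual prices.

```shell
# Hypothetical cost estimate: units x hourly price per unit x hours.
# Usage: estimate_cost <units> <price_per_unit_per_hour> <hours>
estimate_cost() {
  LC_ALL=C awk -v units="$1" -v price="$2" -v hours="$3" \
    'BEGIN { printf "%.2f\n", units * price * hours }'
}

# Example from the text: 2 GPUs at 1 euro per GPU per hour for 6 hours
# estimate_cost 2 1 6   # → 12.00
```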

Prices are listed on our [official website](http://www.ovhcloud.com), in the Public Cloud section. For a dynamic estimate, use the OVHcloud Control Panel: an estimate is displayed before you launch a deployment.

For more detailed information, please refer to our [AI Deploy - Billing and lifecycle](https://docs.ovh.com/de/publiccloud/ai/deploy/billing/) page.

### I'm unable to get a "pay per call" deployment

So far, only a "pay per minute" model is available. We share the ambition of offering a "pay per call" model, but it is not available yet.

## Feedback

Please send us your questions, feedback and suggestions to improve the service:

- On the OVHcloud [Discord server](https://discord.gg/ovhcloud)