| 1 | +--- |
| 2 | +title: Data Processing Apache Spark Notebooks - Getting started |
| 3 | +slug: apache-spark-notebooks |
| 4 | +excerpt: Learn how to create an Apache Spark notebook |
| 5 | +section: Getting started |
| 6 | +order: 03 |
| 7 | +updated: 2023-04-20 |
| 8 | +routes: |
| 9 | + canonical: 'https://help.ovhcloud.com/csm/en-gb-public-cloud-data-processing-apache-spark-notebooks?id=kb_article_view&sysparm_article=KB0057682' |
| 10 | +--- |
| 11 | + |
| 12 | +**Last updated April 20th, 2023.** |
| 13 | + |
| 14 | +> [!primary] |
| 15 | +> |
| 16 | +> The Notebooks for Apache Spark feature is in `alpha`. During the alpha-testing phase, the infrastructure’s availability and data longevity are not guaranteed. Do not use this service for production applications until this phase is complete. |
| 17 | +> |
| 18 | + |
| 19 | +## Objective |
| 20 | + |
| 21 | +The OVHcloud Data Processing Notebooks service provides you with Jupyter notebooks linked to a fully configured Apache Spark environment that can be propagated to all nodes and executors without installation. |
| 22 | + |
| 23 | +This guide will cover the creation of a new Apache Spark notebook from the OVHcloud Control Panel and the OVHcloud APIv6. |
| 24 | + |
| 25 | +## Requirements |
| 26 | + |
| 27 | +For the OVHcloud Control Panel: |
| 28 | + |
| 29 | +- A [Public Cloud project](https://www.ovhcloud.com/de/public-cloud/) in your OVHcloud account |
| 30 | +- Access to the [OVHcloud Control Panel](https://www.ovh.com/auth/?action=gotomanager&from=https://www.ovh.de/&ovhSubsidiary=de) |
| 31 | +- A Public Cloud user with the `administrator` role |
| 32 | +- Data Processing activated (see [How to activate the Data Processing service](https://docs.ovh.com/de/data-processing/activation/) for details) |
| 33 | + |
| 34 | +For the OVHcloud APIv6: |
| 35 | + |
| 36 | +- [OVHcloud API credentials](https://docs.ovh.com/de/data-processing/use-api/) |
| 37 | +- An OVHcloud account |
| 38 | +- An activated Public Cloud project in your OVHcloud account (see [How to create a project](https://docs.ovh.com/de/public-cloud/create_a_public_cloud_project/) and [How to activate the Data Processing service](https://docs.ovh.com/de/data-processing/activation/) for details) |
| 39 | + |
| 40 | +## Definition |
| 41 | + |
| 42 | +**Notebooks** are files which contain both computer code (e.g. Python) and rich text elements (paragraphs, equations, figures, links, etc.). They are both human-readable documents containing the analysis description and the results (figures, tables, etc.) and executable files which can be run to perform data analysis. Notebooks are widely used across the developer world, especially in the data and artificial intelligence fields. |
| 43 | + |
| 44 | +Compared to setting up your own environment, everything is already installed for you, and you pay for compute resources only while your notebooks are running. |
| 45 | + |
| 46 | +Each notebook is linked to a **Public Cloud** project and specific hardware resources. |
| 47 | + |
| 48 | +You can create notebooks in the [OVHcloud Control Panel](#controlpanel) or use the [OVHcloud APIv6](#apiv6). |
| 49 | + |
| 50 | +## Instructions |
| 51 | + |
| 52 | +### OVHcloud Control Panel <a name="controlpanel"></a> |
| 53 | + |
| 54 | +Log in to the [OVHcloud Control Panel](https://www.ovh.com/auth/?action=gotomanager&from=https://www.ovh.de/&ovhSubsidiary=de), go to the `Public Cloud`{.action} section and select the Public Cloud project concerned. |
| 55 | + |
| 56 | +Access the administration UI for your OVHcloud Data Processing by clicking on `Data Processing`{.action} (1) in the left-hand menu and then click on `Create a notebook`{.action} (2). |
| 57 | + |
| 58 | +{.thumbnail} |
| 59 | + |
| 60 | +In the Control Panel, the notebook name is generated automatically. Make a note of it for later. Select a framework (1) and a region (2). |
| 61 | + |
| 62 | +Then choose the notebook privacy setting (3). If you choose public access, be careful with any sensitive data. |
| 63 | + |
| 64 | +{.thumbnail} |
| 65 | + |
| 66 | +Now select the sizing required for the job: |
| 67 | + |
| 68 | +(1) Select your notebook. |
| 69 | + |
| 70 | +(2) Select the cluster size. The cluster size proposed here is indicative and becomes final only when you select the kernel in JupyterLab. |
| 71 | + |
| 72 | +(3) After configuring your notebook, check in the summary box at the top right that all information is correct. |
| 73 | + |
| 74 | +(4) Then click on `Create your notebook`{.action} to create your notebook. |
| 75 | + |
| 76 | +{.thumbnail} |
| 77 | + |
| 78 | +You will be redirected to the notebook dashboard. There you will find information such as the notebook life cycle (1) and the notebook ID (2). |
| 79 | +To access your notebook, click on `JupyterLab`{.action} (3). |
| 80 | + |
| 81 | +{.thumbnail} |
| 82 | + |
| 83 | +Once in the notebook, you can choose the size of the cluster. To check the cluster costs, refer to the dashboard of your notebook as explained in the previous step. |
| 84 | + |
| 85 | +{.thumbnail} |
| 86 | + |
| 87 | +Now you can start to enter your code in the code section (1): |
| 88 | + |
| 89 | +```python |
| 90 | +print("Hello World") |
| 91 | +``` |
| 92 | + |
| 93 | +Run the code by pressing the `▶️`{.action} button (2): |
| 94 | + |
| 95 | +```bash |
| 96 | +Hello World |
| 97 | +``` |
| 98 | + |
| 99 | +Your code is executed in your browser. You can save your example by clicking `Save`{.action} in the `File` menu. |
| 100 | + |
| 101 | +{.thumbnail} |
| 102 | + |
| 103 | +To see all your active kernels, click on the "Terminals and running kernels" menu (3). |
| 104 | + |
| 105 | +To change kernels, click on `Select kernel`{.action} (4) and select a new kernel (changing the cluster may incur additional costs). |
| 106 | + |
| 107 | +At the bottom left, a small summary (5) shows how many kernels are used with which cluster. |
| 108 | + |
| 109 | +#### Stopping the Data Processing notebook |
| 110 | + |
| 111 | +Go back to the OVHcloud Control Panel. In the `Data Processing`{.action} panel you can stop each notebook by clicking on `...`{.action}. |
| 112 | + |
| 113 | +{.thumbnail} |
| 114 | + |
| 115 | + |
| 116 | +### OVHcloud APIv6 <a name="apiv6"></a> |
| 117 | + |
| 118 | +In the [OVHcloud APIv6](https://api.ovh.com/console/) you can find all the Data Processing endpoints in the `cloud` section. |
| 119 | + |
| 120 | +{.thumbnail} |
| 121 | + |
| 122 | +Scroll down inside the `cloud` section until you reach the `/cloud/project/{serviceName}/dataProcessing/notebooks/...` endpoints. |
| 123 | + |
| 124 | +Once you have expanded the section, you can try out the endpoints directly in the UI by clicking on them. |
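The endpoint pattern above can be sketched as a small path builder. This is only an illustration: the helper names are hypothetical, and the per-notebook path with a notebook ID appended is an assumption that should be checked against the API console. Only the collection path is taken from this guide.

```python
from urllib.parse import quote

# Collection path as shown in the API console for the `cloud` section.
NOTEBOOKS_PATH = "/cloud/project/{serviceName}/dataProcessing/notebooks"


def notebooks_path(service_name: str) -> str:
    """Return the notebooks collection path for a Public Cloud project ID."""
    if not service_name:
        raise ValueError("service_name (the Public Cloud project ID) is required")
    # Percent-encode the ID so it cannot alter the path structure.
    return NOTEBOOKS_PATH.format(serviceName=quote(service_name, safe=""))


def notebook_path(service_name: str, notebook_id: str) -> str:
    """Return the path of one notebook.

    Assumption: a single notebook is addressed by appending its ID to the
    collection path. Verify this in the API console before relying on it.
    """
    if not notebook_id:
        raise ValueError("notebook_id is required")
    return f"{notebooks_path(service_name)}/{quote(notebook_id, safe='')}"


if __name__ == "__main__":
    # "my-project-id" and "my-notebook-id" are placeholder values.
    print(notebooks_path("my-project-id"))
    print(notebook_path("my-project-id", "my-notebook-id"))
```

Keeping the path construction in one place makes it harder to forget that `serviceName` is the Public Cloud project ID rather than a notebook name.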
| 125 | + |
| 126 | +>[!primary] |
| 127 | +> |
| 128 | +> The "serviceName" parameter for each endpoint of the `cloud` section requires your Public Cloud project ID. |
| 129 | + |
| 130 | +For further information about an endpoint, the `Response Class` tab under the `Execute`{.action} button shows what the API response will look like. Switch the tabs to display wrapper code examples. |
| 131 | + |
| 132 | +{.thumbnail} |
| 133 | + |
| 134 | +Before creating a notebook, you can list all the existing notebooks linked to a Public Cloud project by entering your Public Cloud project ID. This way, you can access all the notebook IDs related to your Public Cloud project. |
| 135 | + |
| 136 | +{.thumbnail} |
| 137 | + |
| 138 | +To create a new notebook, use the endpoint `/cloud/project/{serviceName}/dataProcessing/notebooks`. Specify the Public Cloud project ID (0) and fill in the other fields. |
| 139 | +Define your `spark` notebook and choose a version (1). Define a name and region for your notebook (2). |
| 140 | + |
| 141 | +Then click `Execute` to generate the notebook (3). |
| 142 | + |
| 143 | +{.thumbnail} |
| 144 | + |
| 145 | +The response contains several pieces of information, including the ID and URL of the notebook. To access the notebook, click on the URL and you will be redirected to it, as described in the Control Panel section above. |
| 146 | + |
| 147 | +{.thumbnail} |
| 148 | + |
| 149 | +You can get notebook information at any time from the endpoint `Get notebook information` by entering the Public Cloud project ID. |
| 150 | + |
| 151 | +{.thumbnail} |
| 152 | + |
| 153 | +You can also start or stop your notebook from the APIv6. This allows you to free up resources when you don't need them. |
| 154 | + |
| 155 | +{.thumbnail} |
| 156 | + |
| 157 | +When you are done with your notebook, you can delete it using its ID to free the resources it uses. |
| 158 | + |
| 159 | +{.thumbnail} |
| 160 | + |
| 161 | + |
| 162 | +### Considerations |
| 163 | + |
| 164 | +- A notebook runs until it is manually stopped, and it is billed for the entire runtime. |
| 165 | +- When you stop an Apache Spark notebook, you release the compute resources, but we keep the data from your workspace. This data is billed at the price of OVHcloud Object Storage. |
| 166 | +- Billing is per minute. Each started minute is billed in full. |
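The per-minute rule can be illustrated with a short calculation. This is a minimal sketch that assumes "each started minute" means rounding the runtime up to the next whole minute. The function names and the price used in the example are hypothetical, and workspace storage costs are not included.

```python
import math


def billed_minutes(runtime_seconds: float) -> int:
    """Round a runtime up to whole minutes: every started minute counts."""
    if runtime_seconds < 0:
        raise ValueError("runtime_seconds must not be negative")
    return math.ceil(runtime_seconds / 60)


def compute_cost(runtime_seconds: float, price_per_minute: float) -> float:
    """Estimate the compute cost for a runtime.

    price_per_minute is a hypothetical input; take the real price from
    your notebook dashboard. Storage of the workspace data is billed
    separately and is not covered here.
    """
    return billed_minutes(runtime_seconds) * price_per_minute


if __name__ == "__main__":
    # A notebook that ran for 61 seconds is billed for 2 minutes.
    print(billed_minutes(61))
    print(compute_cost(61, 0.5))
```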
| 167 | + |
| 168 | +### Notebook lifecycle |
| 169 | + |
| 170 | +During its lifetime, an Apache Spark notebook transitions between the following statuses: |
| 171 | + |
| 172 | +> [!primary] |
| 173 | +> |
| 174 | +> - Billing starts once a notebook is `Pending` and ends when its status switches to `Cancelling`. |
| 175 | +> - Only notebooks in the states `Pending` and `In service` are included in the resource quota computation. |
| 176 | +> |
| 177 | + |
| 178 | + |
| 179 | +- `Pending`: The notebook is starting. |
| 180 | +- `In service`: The notebook is running and can be accessed from your browser. |
| 181 | +- `Cancelling`: The notebook is still running, but an interruption order was received. |
| 182 | +- `Stopped`: The notebook is stopped. Compute resources are released. |
| 183 | +- `Deleted`: The notebook data is fully deleted and no further payment is due. |
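The statuses and the billing and quota rules above can be modeled as a small enumeration. This is a sketch for client-side bookkeeping only: the class and function names are hypothetical, and the status strings are copied from this guide rather than from an API response.

```python
from enum import Enum


class NotebookStatus(Enum):
    """Notebook statuses as named in this guide."""
    PENDING = "Pending"
    IN_SERVICE = "In service"
    CANCELLING = "Cancelling"
    STOPPED = "Stopped"
    DELETED = "Deleted"


# Compute billing starts at Pending and ends on the switch to Cancelling,
# so only Pending and In service are billed. The same two statuses count
# toward the resource quota.
_ACTIVE = frozenset({NotebookStatus.PENDING, NotebookStatus.IN_SERVICE})


def is_compute_billed(status: NotebookStatus) -> bool:
    """Return True while compute runtime is being billed."""
    return status in _ACTIVE


def counts_toward_quota(status: NotebookStatus) -> bool:
    """Return True if the notebook is included in the quota computation."""
    return status in _ACTIVE


if __name__ == "__main__":
    for status in NotebookStatus:
        print(status.value, is_compute_billed(status), counts_toward_quota(status))
```

Note that a `Stopped` notebook is not billed for compute here, but its workspace data is still billed as storage until the notebook is deleted.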
| 184 | + |
| 185 | +## Go further |
| 186 | + |
| 187 | +To go further and use the Apache Spark notebook, you can follow our tutorials: |
| 188 | + |
| 189 | +- [wordcount-spark](https://docs.ovh.com/de/data-processing/wordcount-spark/#objective) |
| 190 | +- [Calculating π number with Apache Spark](https://docs.ovh.com/de/data-processing/pi-spark/#objective) |
| 191 | + |
| 192 | +Join our community of users on <https://community.ovh.com/en/>. |
| 193 | + |
| 194 | +## Feedback |
| 195 | + |
| 196 | +Please send us your questions, feedback and suggestions to improve the service: |
| 197 | + |
| 198 | +- On the OVHcloud [Discord server](https://discord.com/invite/vXVurFfwe9) |