category: containers
product: kubernetes
---

**Centralized monitoring is now available**, allowing you to send Kubernetes container logs to Cockpit for streamlined monitoring. Setup is easy with **one-click deployment** via Easy Deploy using Promtail. This feature captures **all container logs**, including Pod stdout/stderr and systemd journal. Additionally, you can control ingestion costs with **customizable filtering options**.

Learn more in our dedicated documentation: [Monitor Data Plane with Cockpit](https://www.scaleway.com/en/docs/kubernetes/how-to/monitor-data-plane-with-cockpit/)
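As a sketch of what the "customizable filtering" can look like with Promtail, the snippet below drops noisy lines before they are ingested. The job name, drop expression, and namespace are illustrative assumptions, not Scaleway defaults:

```yaml
# Hypothetical Promtail pipeline illustrating log filtering before ingestion.
scrape_configs:
  - job_name: kubernetes-pods
    pipeline_stages:
      # Drop health-check lines so they never count toward ingestion costs.
      - drop:
          expression: ".*GET /healthz.*"
      # Drop logs from a namespace you do not want to ingest.
      - drop:
          source: namespace
          value: "kube-system"
```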
pages/cockpit/how-to/configure-alerts-for-scw-resources.mdx
Data source managed alert rules allow you to configure alerts managed by the data source.

## Define your metric and alert conditions

Switch between the tabs below to create alerts for a Scaleway Instance, an Object Storage bucket, a Kubernetes cluster Pod, or Cockpit logs.

<Tabs id="install">
<TabsTab label="Scaleway Instance">
6. Click **Save rule and exit** in the top right corner of your screen to save and activate your alert.
7. Optionally, check that your configuration works by temporarily lowering the threshold. This will trigger the alert and notify your [contacts](/cockpit/concepts/#contact-points).
</TabsTab>
<TabsTab label="Kubernetes Pod">
The steps below explain how to create the metric selection and configure an alert condition that triggers when **no new Pod activity occurs, which could mean your cluster is stuck or unresponsive.**

1. In the query field next to the **Loading metrics... >** button, paste the following query. Make sure that the values for the labels you have selected (for example, `resource_name`) correspond to those of the target resource.
```
rate(kubernetes_cluster_k8s_shoot_nodes_pods_usage_total{resource_name="k8s-par-quizzical-chatelet"}[15m]) == 0
```
<Message type="tip">
The `kubernetes_cluster_k8s_shoot_nodes_pods_usage_total` metric represents the total number of Pods currently running across all nodes in your Kubernetes cluster. It is helpful for monitoring current Pod consumption per node pool or cluster, and for tracking resource saturation or unexpected workload spikes.
</Message>
2. In the **Set alert evaluation behavior** field, specify how long the condition must be true before triggering the alert.
3. Enter a name in the **Namespace** and **Group** fields to categorize and manage your alert rules. Rules that share the same group will use the same configuration, including the evaluation interval which determines how often the rule is evaluated (by default: every 1 minute). You can modify this interval later in the group settings.
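The alert condition above can be sketched in Python. This is a rough model of PromQL `rate()` semantics (a simple first-to-last delta over the window, ignoring the counter resets and extrapolation that real `rate()` handles), and the sample values are made up for illustration:

```python
# Model of rate(<counter>[15m]) == 0 over a 15-minute window.
# Samples are (timestamp_in_seconds, counter_value) pairs; a flat counter
# means no new Pod activity was recorded in the window.
def prom_rate(samples):
    """Per-second increase between the first and last sample (naive rate())."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)

flat = [(0, 42), (300, 42), (600, 42), (900, 42)]    # no Pod activity
active = [(0, 42), (300, 45), (600, 48), (900, 51)]  # Pods being scheduled

print(prom_rate(flat) == 0)    # True: the alert condition would fire
print(prom_rate(active) == 0)  # False: the alert stays silent
```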
pages/cockpit/how-to/send-logs-from-k8s-to-cockpit.mdx
---
title: How to send logs from your Kubernetes cluster to your Cockpit
description: Learn how to send your Pod logs to your Cockpit using Scaleway's comprehensive guide. This tutorial covers sending Kubernetes Pod logs to Scaleway's Cockpit for centralized monitoring and analysis using Grafana, ensuring efficient monitoring and log analysis in your infrastructure.
tags: kubernetes cockpit logs observability monitoring cluster
dates:
validation: 2025-08-20
nodeLogs:
enabled: true
destinations: ["my-cockpit-logs"]
# -- Pod logs.
podLogs:
enabled: true
destinations: ["my-cockpit-logs"]
volumeGatherSettings:
Once you have configured your `values.yml` file, you can use Helm to deploy the `k8s-monitoring` chart.
<Message type="iam">
The `-f` flag specifies the path to your `values.yml` file, which contains the configuration for the Helm chart. <br /><br />
Helm installs the `k8s-monitoring` chart, which includes the Alloy DaemonSet configured to collect logs from your Kubernetes cluster. <br /><br />
The DaemonSet ensures that a Pod is running on each node in your cluster, which collects logs and forwards them to the specified Loki endpoint in your Cockpit.
</Message>
3. Optionally, run the following command to check the status of the release and ensure it was installed:

pages/cockpit/how-to/send-metrics-from-k8s-to-cockpit.mdx
---
title: How to send metrics from your Kubernetes cluster to your Cockpit
description: Learn how to send your Pod metrics to your Cockpit using Scaleway's comprehensive guide. This tutorial covers sending Kubernetes Pod metrics to Scaleway's Cockpit for centralized monitoring and analysis using Grafana, ensuring efficient monitoring and metrics analysis in your infrastructure.
tags: kubernetes cockpit metrics observability monitoring cluster
dates:
validation: 2025-08-20
## Add annotations for auto-discovery
Annotations in Kubernetes provide a way to attach metadata to your resources. For `k8s-monitoring`, these annotations signal which Pods should be scraped for metrics, and what port to use. In this documentation, we add annotations to specify that we want `k8s-monitoring` to scrape the Pods from our deployment. Make sure that you replace `$METRICS_PORT` with the port where your application exposes Prometheus metrics.

### Kubernetes deployment template
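As a sketch, the Pod template metadata in a Deployment can carry the scrape annotations like this. The annotation keys shown (`k8s.grafana.com/scrape`, `k8s.grafana.com/metrics.portNumber`) and the application names are assumptions based on the Grafana `k8s-monitoring` chart's conventions; check the chart's documentation for the exact keys it expects:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                    # hypothetical application name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
      annotations:
        k8s.grafana.com/scrape: "true"               # assumed opt-in key
        k8s.grafana.com/metrics.portNumber: "$METRICS_PORT"
    spec:
      containers:
        - name: my-app
          image: my-registry/my-app:latest           # hypothetical image
          ports:
            - containerPort: 8080
```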

Once you have configured your `values.yml` file, you can use Helm to deploy the `k8s-monitoring` chart.
<Message type="iam">
The `-f` flag specifies the path to your `values.yml` file, which contains the configuration for the Helm chart. <br /><br />
Helm installs the `k8s-monitoring` chart, which includes the Alloy DaemonSet configured to collect metrics from your Kubernetes cluster. <br /><br />
The DaemonSet ensures that a Pod is running on each node in your cluster, which collects metrics and forwards them to the specified Prometheus endpoint in your Cockpit.
</Message>
3. Optionally, check the status of the release to ensure it was installed:

pages/data-lab/concepts.mdx

## Apache Spark cluster

An Apache Spark cluster is an orchestrated set of machines across which distributed Big Data computations are processed. In the case of Scaleway Data Lab, the Apache Spark cluster is a Kubernetes cluster with Apache Spark installed in each Pod. For more details, check out the [Apache Spark documentation](https://spark.apache.org/documentation.html).

## Data Lab

A notebook for an Apache Spark cluster is an interactive, web-based tool that allows you to write and run code against the cluster.

## Persistent volume

A Persistent Volume (PV) is a cluster-wide storage resource that ensures data persistence beyond the lifecycle of individual Pods. Persistent volumes abstract the underlying storage details, allowing administrators to use various storage solutions.

Apache Spark® executors require storage space for various operations, particularly to shuffle data during wide operations such as sorting, grouping, and aggregation. Wide operations are transformations that require data from different partitions to be combined, often resulting in data movement across the cluster. During the map phase, executors write data to shuffle storage, which is then read by reducers.
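As a brief illustration of the concept, storage of this kind is typically requested through a PersistentVolumeClaim. The sketch below is generic Kubernetes, with a hypothetical name and size rather than Data Lab specifics:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: spark-shuffle-data        # hypothetical claim name
spec:
  accessModes:
    - ReadWriteOnce               # a single node mounts the volume read-write
  resources:
    requests:
      storage: 100Gi              # example shuffle space for wide operations
```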

pages/gpu/how-to/use-mig-with-kubernetes.mdx
In this guide, we will explore the capabilities of NVIDIA MIG within a Kubernetes cluster.

## Configure MIG partitions inside a Kubernetes cluster

1. Find the names of the Pods running the NVIDIA driver:
```
% kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
```

## Deploy containers that use NVIDIA MIG technology partitions

1. Write a deployment file to deploy eight Pods running `nvidia-smi`.
Open a text editor of your choice and create a deployment file `deploy-mig.yaml`, then paste the following content into the file, save it, and exit the editor:
```yaml
apiVersion: v1
nvidia.com/gpu.product : NVIDIA-H100-PCIe-MIG-1g.10gb
```

2. Deploy the Pods:
```
% kubectl create -f deploy-mig.yaml
pod/test-1 created
pod/test-8 created
```

3. Display the logs of the Pods. The Pods print their UUID with the `nvidia-smi` command:
```
% kubectl get -f deploy-mig.yaml -o name | xargs -I{} kubectl logs {}
GPU 0: NVIDIA H100 PCIe (UUID: GPU-717ef73c-2d43-4fdc-76d2-1cddef4863bb)
GPU 0: NVIDIA H100 PCIe (UUID: GPU-717ef73c-2d43-4fdc-76d2-1cddef4863bb)
MIG 1g.10gb Device 0: (UUID: MIG-fdfd2afa-5cbd-5d1d-b1ae-6f0e13cc0ff8)
```
As you can see, seven Pods ran on different MIG partitions, while the eighth Pod had to wait for one of the seven partitions to become available before it could run.

4. Clean up the deployment:
```
% kubectl delete -f deploy-mig.yaml
pod "test-1" deleted
pod "test-2" deleted
pod "test-3" deleted
pod "test-4" deleted
pod "test-5" deleted
pod "test-6" deleted
pod "test-7" deleted
pod "test-8" deleted
```
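As a sketch of what one Pod in such a deployment can look like when a node exposes MIG devices as extended resources: the resource name `nvidia.com/mig-1g.10gb` and the image tag below are assumptions tied to the MIG strategy configured in the device plugin, so adjust them to the resources your nodes actually advertise:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-mig                   # hypothetical Pod name
spec:
  restartPolicy: Never
  containers:
    - name: smi
      image: nvidia/cuda:12.2.0-base-ubuntu22.04   # assumed CUDA base image
      command: ["nvidia-smi", "-L"]                # prints the GPU/MIG UUIDs
      resources:
        limits:
          nvidia.com/mig-1g.10gb: 1                # request one 1g.10gb MIG slice
```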

## Disable MIG inside a Kubernetes cluster
```
node/scw-k8s-jovial-dubinsky-pool-h100-93a072191d38 labeled
```

2. Check the GPU status with `nvidia-smi` in the driver Pod:
```
% kubectl exec nvidia-driver-daemonset-8t89m -t -n kube-system -- nvidia-smi -L
GPU 0: NVIDIA H100 PCIe (UUID: GPU-717ef73c-2d43-4fdc-76d2-1cddef4863bb)
Below, you will find a guide to help you make an informed decision:
* Up to 2 PCIe GPUs with [H100 Instances](https://www.scaleway.com/en/h100-pcie-try-it-now/) or 8 PCIe GPUs with [L4](https://www.scaleway.com/en/l4-gpu-instance/) or [L40S](https://www.scaleway.com/en/contact-l40s/) Instances.
* Or, better, an HGX-based server setup with up to 8x NVLink GPUs with [H100-SXM Instances](/gpu/reference-content/choosing-gpu-instance-type/)
* A [supercomputer architecture](https://www.scaleway.com/en/ai-supercomputers/) for a larger setup for workload-intensive tasks
* Another way to scale your workload is to use [Kubernetes and MIG](/gpu/how-to/use-nvidia-mig-technology/): you can divide a single H100 or H100-SXM GPU into as many as seven MIG partitions. This means that instead of using seven P100 GPUs to run seven Kubernetes Pods, you could use a single H100 GPU with MIG to deploy all seven Pods.
* **Online resources:** Check for online resources, forums, and community discussions related to the specific GPU type you are considering. This can provide insights into common issues, best practices, and optimizations.

Remember that there is no one-size-fits-all answer, and the right GPU Instance type will depend on your workload’s unique requirements and budget. It is important that you regularly reassess your choice as your workload evolves. Depending on which type best fits your evolving tasks, you can easily migrate from one GPU Instance type to another.