Merged
92 commits
f51cf78
- installed otel libraries
prajwalvathreya Nov 19, 2024
bd6e4ee
- added tracing to the grpc and https metrics endpoint process
prajwalvathreya Nov 19, 2024
6581860
- created new tracker.go file in package metrics for tracing
prajwalvathreya Nov 19, 2024
093b7b0
- added tracing elements to create volume function for testing
prajwalvathreya Nov 19, 2024
1f520fb
- fixed lint errors
prajwalvathreya Nov 20, 2024
af9fd1d
- fixed import error from IDE
prajwalvathreya Nov 20, 2024
d5b3a39
- updated tracing to store into otel-collector backend
prajwalvathreya Nov 20, 2024
baf165c
- updated libraries
prajwalvathreya Nov 25, 2024
0bc2eb9
- added check for pre-existing span
prajwalvathreya Nov 25, 2024
b359270
- upgraded InitOtelTracing function to incorporate resources
prajwalvathreya Nov 25, 2024
d8966d4
- initilizeing tracer in driver.go
prajwalvathreya Nov 25, 2024
d59fbbd
- updated createvolume functions with new span records
prajwalvathreya Nov 25, 2024
eedcc6d
- added WithInsecure() flag to allow connection without TLS handshake…
prajwalvathreya Nov 25, 2024
e3e036f
Merge branch 'main' into otel-tracing
prajwalvathreya Nov 25, 2024
4a9b038
- installed otel libraries
prajwalvathreya Nov 19, 2024
287b1c9
- added tracing to the grpc and https metrics endpoint process
prajwalvathreya Nov 19, 2024
3e23b60
- created new tracker.go file in package metrics for tracing
prajwalvathreya Nov 19, 2024
0bd34d6
- added tracing elements to create volume function for testing
prajwalvathreya Nov 19, 2024
13b86ca
- fixed lint errors
prajwalvathreya Nov 20, 2024
f86a5c9
- fixed import error from IDE
prajwalvathreya Nov 20, 2024
0c3fe85
- updated tracing to store into otel-collector backend
prajwalvathreya Nov 20, 2024
d9609f9
- updated libraries
prajwalvathreya Nov 25, 2024
f82fb39
- added check for pre-existing span
prajwalvathreya Nov 25, 2024
38ce8b8
- upgraded InitOtelTracing function to incorporate resources
prajwalvathreya Nov 25, 2024
d552f05
- initilizeing tracer in driver.go
prajwalvathreya Nov 25, 2024
287caaf
- updated createvolume functions with new span records
prajwalvathreya Nov 25, 2024
21dcb86
- added WithInsecure() flag to allow connection without TLS handshake…
prajwalvathreya Nov 25, 2024
1d6b42d
Merge remote-tracking branch 'origin/otel-tracing' into otel-tracing
prajwalvathreya Nov 25, 2024
934b7c5
- fixed lint error
prajwalvathreya Nov 25, 2024
52d880b
- switched form grpc endpoint to http endpoint
prajwalvathreya Nov 25, 2024
ff14322
- removed http from the url while setting endpoint
prajwalvathreya Nov 25, 2024
8a81f5a
- updated service name
prajwalvathreya Nov 25, 2024
fc6f9b3
- added sub function calls to be stored in trace
prajwalvathreya Nov 26, 2024
b801c38
- added child span to record function to ensure all subfunction calls…
prajwalvathreya Nov 26, 2024
9d76759
- moved subfunction call tracing to within function
prajwalvathreya Nov 27, 2024
85eb934
- updated charts
prajwalvathreya Nov 27, 2024
efded8a
- added entry point for tracing in main.go
prajwalvathreya Nov 27, 2024
c7eb310
- utilized tracing variables to conditionally enable tracing for func…
prajwalvathreya Nov 27, 2024
81cd3f7
- refactored to move everything to tracker.go package
prajwalvathreya Nov 27, 2024
fbda586
- added more resource attributes
prajwalvathreya Nov 27, 2024
d2ec23e
- added tracing for delete volume
prajwalvathreya Nov 27, 2024
7a7da77
- removed finish child span processes
prajwalvathreya Nov 27, 2024
b53cd02
Merge branch 'main' into otel-tracing
prajwalvathreya Nov 27, 2024
c8c278a
- fixed pointer dereferencing
prajwalvathreya Nov 27, 2024
93f3294
Merge remote-tracking branch 'origin/otel-tracing' into otel-tracing
prajwalvathreya Nov 27, 2024
e3b76ca
- fixed pointer dereferencing v2
prajwalvathreya Nov 27, 2024
3128595
- added tracing for publish and unpublish volume
prajwalvathreya Nov 28, 2024
ecf4953
- added tracing to remainign functions in controller and nodeserver
prajwalvathreya Dec 2, 2024
44df3af
- testing span depth
prajwalvathreya Dec 2, 2024
e4b7da8
- removed duplicate parent spans
prajwalvathreya Dec 2, 2024
8d41ec8
- removed duplicate parent spans
prajwalvathreya Dec 2, 2024
5d1cc70
- added tracing to functions with linodego calls
prajwalvathreya Dec 2, 2024
6e3ea37
- added more subfunction tracing
prajwalvathreya Dec 3, 2024
737261e
Merge branch 'main' into otel-tracing
prajwalvathreya Dec 3, 2024
de32e9e
- updated go mod
prajwalvathreya Dec 3, 2024
72ed48e
- added check for duplicate spans
prajwalvathreya Dec 3, 2024
f1eaf5f
- update check for duplicate spans
prajwalvathreya Dec 3, 2024
6fa824f
- updated package name from metrics to observability
prajwalvathreya Dec 3, 2024
f543746
- manually creating a parent span
prajwalvathreya Dec 3, 2024
81e5f5d
- manually creating a parent span
prajwalvathreya Dec 3, 2024
ec5c12d
- updated formatting
prajwalvathreya Dec 4, 2024
0568a43
- added SkipObservability flag for tests to isolate testing
prajwalvathreya Dec 4, 2024
cc86eef
- test to check is base span is created
prajwalvathreya Dec 4, 2024
bc21af4
- refactored span creation to manual spans
prajwalvathreya Dec 5, 2024
0516a72
- added else spans
prajwalvathreya Dec 5, 2024
b2ad1db
- updated new spans for each function call
prajwalvathreya Dec 5, 2024
7b73740
- updated serialzerequest to seralize obj
prajwalvathreya Dec 5, 2024
d09f579
- removing context
prajwalvathreya Dec 5, 2024
ded76e2
- updated spans
prajwalvathreya Dec 5, 2024
2e7b7f0
- reverted nodeserver.go to original file, with change from metrics p…
prajwalvathreya Dec 5, 2024
d0db8fe
- working span handling in createvolume function in controllerserver.go
prajwalvathreya Dec 5, 2024
f7e8401
- updated validate create volume request
prajwalvathreya Dec 10, 2024
70f44d5
- reverted files to main branch
prajwalvathreya Dec 10, 2024
e295030
- updated package tracker to store observability data more effectively
prajwalvathreya Dec 10, 2024
793c150
- added custom grpc function parameter storage
prajwalvathreya Dec 10, 2024
5201306
- added spans to level 2 functions
prajwalvathreya Dec 11, 2024
1505cd1
Merge branch 'main' into otel-tracing
prajwalvathreya Dec 11, 2024
4376ae5
- updaed go mod
prajwalvathreya Dec 11, 2024
8fc5b09
- added skip observability to avoid nil pointer span creations
prajwalvathreya Dec 11, 2024
a6f3624
- added files required to setup otel-collector and jaeger
prajwalvathreya Dec 11, 2024
5327167
- added make target for setting up tracing
prajwalvathreya Dec 12, 2024
ff71691
- added documentation for tracing
prajwalvathreya Dec 12, 2024
a3fbff1
Merge branch 'main' into otel-tracing
prajwalvathreya Dec 12, 2024
a14057e
- added links to README.md for tracing and metrics
prajwalvathreya Dec 13, 2024
3c43cd1
- added option for port-forwarding
prajwalvathreya Dec 13, 2024
8630729
- added meaningful lines to improve doc
prajwalvathreya Dec 13, 2024
777287d
Merge branch 'main' into otel-tracing
prajwalvathreya Dec 13, 2024
cdfcd08
- go vet fix
prajwalvathreya Dec 13, 2024
42ec7d3
- updated script with retry logic
prajwalvathreya Dec 16, 2024
ca7d995
Merge branch 'main' into otel-tracing
prajwalvathreya Dec 16, 2024
4dce97b
- fixed go vet error
prajwalvathreya Dec 16, 2024
a4cd2e6
- updated doc
prajwalvathreya Dec 16, 2024
3 changes: 2 additions & 1 deletion .golangci.yml
@@ -119,7 +119,8 @@ linters:
- usestdlibvars
- varnamelen
- whitespace

disable:
- spancheck
presets:
- bugs
- unused
4 changes: 4 additions & 0 deletions Makefile
@@ -215,3 +215,7 @@ install-grafana:
.PHONY: setup-dashboard
setup-dashboard:
KUBECONFIG=test-cluster-kubeconfig.yaml ./hack/setup-dashboard.sh --namespace=monitoring --dashboard-file=observability/metrics/dashboard.json

.PHONY: setup-tracing
setup-tracing:
KUBECONFIG=test-cluster-kubeconfig.yaml ./hack/setup-tracing.sh
3 changes: 3 additions & 0 deletions README.md
@@ -29,6 +29,9 @@
- [Contributing](docs/contributing.md)
- [Observability](docs/observability.md)
- [Metrics](docs/metrics-documentation.md)
- [How to opt-in for Metrics](docs/observability.md#steps-to-opt-in-for-the-csi-driver-metrics)
- [Tracing](docs/tracing-documentation.md)
- [How to opt-in for Tracing](docs/observability.md#steps-to-opt-in-for-tracing-in-the-csi-driver)
- [License](#license)
- [Disclaimers](#-disclaimers)
- [Community](#-join-us-on-slack)
Binary file added docs/example-images/tracing/create-volume.jpg
Binary file added docs/example-images/tracing/landing-page.jpg
79 changes: 75 additions & 4 deletions docs/observability.md
@@ -1,6 +1,6 @@
# Observability with Grafana Dashboard
# Observability for CSI Driver

This document explains how to use the `grafana-dashboard` make target to install and configure observability tools, including Prometheus and Grafana, on your Kubernetes cluster. The setup uses Helm charts to install Prometheus and Grafana, provides a Prometheus data source, and applies a Grafana dashboard configuration.
This document explains how to use the `grafana-dashboard` and `setup-tracing` make targets to install and configure observability tools.

## Prerequisites

@@ -32,7 +32,7 @@ helm template linode-csi-driver \
helm-chart/csi-driver --namespace kube-system > csi.yaml
```

### 2. Delete the Existing Release of the CSI Driver
### 2. Delete the Existing Release of the CSI Driver (Needed only if the CSI driver is already installed on your cluster)

Before applying the new configuration, you need to delete the current release of the Linode CSI driver. This step is necessary because the default CSI driver installation does not have metrics enabled, and Helm doesn’t handle changes to some components gracefully without a clean reinstall.

@@ -183,4 +183,75 @@ kubectl logs <grafana-pod-name> -n monitoring

This setup provides a quick and easy way to enable observability using Grafana dashboards, ensuring that you have visibility into your Kubernetes cluster and CSI driver operations.

---
---

## Steps to Opt-In for Tracing in the CSI Driver

To enable tracing for the Linode CSI driver, follow the steps below. They involve exporting a new Helm template with tracing enabled, deleting the current CSI driver release, and applying the newly generated configuration.

### 1. Export the Helm Template for the CSI Driver with Tracing Enabled

First, generate a new Helm template for the Linode CSI driver with the `enableTracing` flag set to `true`. You also need to set `tracingPort` to a port that is not already in use for the otel server to run on. By default, the port is `4318`.

```bash
helm template linode-csi-driver \
--set apiToken="${LINODE_API_TOKEN}" \
--set region="${REGION}" \
--set enableTracing="true" \
--set tracingPort="4318" \
helm-chart/csi-driver --namespace kube-system > csi.yaml
```
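
The two values above are plain strings by the time they reach a process, so they need parsing and a default. The sketch below shows one way to do that. It is not the driver's actual code: `tracingConfig` and `parseTracingConfig` are hypothetical names, and how the chart passes these values to the driver is not covered in this document. Only the default port `4318` comes from the text above.

```go
package main

import (
	"fmt"
	"strconv"
)

// tracingConfig is a hypothetical holder for the two settings; it is not a
// type from the driver.
type tracingConfig struct {
	Enabled bool
	Port    int
}

// parseTracingConfig interprets the raw string values of enableTracing and
// tracingPort. Empty strings fall back to "disabled" and port 4318.
func parseTracingConfig(enable, port string) (tracingConfig, error) {
	cfg := tracingConfig{Port: 4318}
	if enable != "" {
		enabled, err := strconv.ParseBool(enable)
		if err != nil {
			return tracingConfig{}, fmt.Errorf("invalid enableTracing value %q: %w", enable, err)
		}
		cfg.Enabled = enabled
	}
	if port != "" {
		p, err := strconv.Atoi(port)
		if err != nil || p < 1 || p > 65535 {
			return tracingConfig{}, fmt.Errorf("invalid tracingPort value %q", port)
		}
		cfg.Port = p
	}
	return cfg, nil
}

func main() {
	// Tracing enabled, port left at its default.
	cfg, err := parseTracingConfig("true", "")
	fmt.Println(cfg, err)
}
```

Rejecting an out-of-range port early keeps a typo in `--set tracingPort=...` from surfacing later as a connection failure.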

### 2. Delete the Existing Release of the CSI Driver (Needed only if the CSI driver is already installed on your cluster)

Before applying the new configuration, you need to delete the current release of the Linode CSI driver. This step is necessary because the default CSI driver installation does not have tracing enabled, and Helm doesn’t handle changes to some components gracefully without a clean reinstall.

```bash
kubectl delete -f csi.yaml --namespace kube-system
```

### 3. Apply the Newly Generated Template

Once the old CSI driver installation is deleted, you can apply the newly generated template that includes the tracing configuration.

```bash
kubectl apply -f csi.yaml
```

Now that the configuration is applied, install otel-collector and Jaeger to visualize the traces.

## Steps to Install otel and Jaeger for Visualizing Traces

### 1. Run the Tracing Setup

The make target `setup-tracing` installs `otel-collector` and `jaeger` for visualizing the traces.

```bash
make setup-tracing
```

### 2. Access the Jaeger Dashboard

Once the setup is complete, you can access the Jaeger dashboard through the configured LoadBalancer service. The setup script prints the external IP of the LoadBalancer when it finishes. Open the following URL in your browser:

```
http://<LoadBalancer-EXTERNAL-IP>:16686
```

### 3. Development Setup (Optional)

If you want to use Jaeger in a development environment, run the following port-forward command:

```bash
kubectl port-forward svc/jaeger-collector 16686:16686 -n kube-system
```

You can now access Jaeger by opening the following URL in your browser:

```
http://localhost:16686
```

Note: If you changed the port, use the updated port when running this command.

---
237 changes: 237 additions & 0 deletions docs/tracing-documentation.md
@@ -0,0 +1,237 @@
# Using the Jaeger Dashboard for Linode CSI Driver

This guide provides a step-by-step explanation of how to use the Jaeger dashboard to analyze traces in the Linode CSI Driver. It includes visual examples for both the **landing page** and an example trace for the `createvolume` operation.

---

## 1. Accessing the Jaeger Dashboard

To access the Jaeger dashboard:
1. Open the Jaeger dashboard in your browser using the external IP (e.g., `http://<external-ip>:16686`).
2. The landing page will appear, providing options to search and analyze traces.

---

## 2. Landing Page Overview

The landing page is the first screen you see upon accessing the Jaeger dashboard. Here's an example:

**Example Landing Page Screenshot**:
![Landing Page](example-images/tracing/landing-page.jpg)

### Key Features of the Landing Page:
- **Search Panel**:
  - **Service**: Select the service you want to analyze (e.g., `linode-csi-driver`).
  - **Operation**: Choose a specific operation to filter traces, such as `createvolume` or `listvolumes`. By default, all operations are shown.
  - **Tags**: Filter traces by tags like `http.status_code=200` or other metadata.
  - **Lookback**: Select a time range for trace results (e.g., "Last Hour").
  - **Max/Min Duration**: Specify duration filters for traces to focus on slow or fast requests.
  - **Limit Results**: Set the maximum number of traces to display.

- **Results Table**:
  - Lists all traces matching the search criteria.
  - Displays the following information:
    - **Service and Operation**: The service (e.g., `linode-csi-driver`) and the operation (e.g., `createvolume` or `listvolumes`).
    - **Duration**: Total time taken by the trace.
    - **Spans**: Number of sub-operations (spans) in the trace.
    - **Timestamp**: The time the trace started.

### Example Analysis:
From the landing page example:
- Two traces are displayed:
  1. **Trace ID: 042abeb**:
     - **Operation**: `csi.v1.controller/createvolume`.
     - **Duration**: `3.37s`.
     - **Spans**: `9`.
  2. **Trace ID: a039cb1**:
     - **Operation**: `csi.v1.controller/listvolumes`.
     - **Duration**: `77.35ms`.
     - **Spans**: `1`.

To analyze a trace in detail, click on its row (e.g., `042abeb` for `createvolume`).

---

## 3. Viewing a Trace for `createvolume`

Clicking on a trace opens a detailed view of all operations (spans) involved in the request. Here's an example trace for `createvolume`:

**Example `createvolume` Trace Screenshot**:
![Create Volume Trace](example-images/tracing/create-volume.jpg)
![Create Volume Trace Continued](example-images/tracing/create-volume-continued.jpg)

### Trace View Key Features:
1. **Trace Timeline**:
   - Visualizes the entire flow of the request as a timeline.
   - Horizontal bars represent spans, showing the relative time and duration of each operation.
   - The black line represents the critical path of the selected operation.
   - Total trace duration is displayed at the top (e.g., `3.37s`).

2. **Service & Operation Breakdown**:
   - Displays a hierarchical list of operations executed during the trace.
   - **Parent Span**: Represents the top-level operation (e.g., `csi.v1.controller/createvolume`).
   - **Child Spans**: Nested operations under the parent span.

### Example Breakdown:
For the `createvolume` trace:
- **Parent Span**:
  - **Operation**: `csi.v1.controller/createvolume`.
  - **Duration**: `3.37s`.
  - Includes the following sub-operations:
    1. **`validatecreatevolumerequest`**:
       - **Duration**: `2µs`.
       - **Purpose**: Validates the incoming request for required parameters.
    2. **`preparevolumeparams`**:
       - **Duration**: `2µs`.
       - **Purpose**: Prepares necessary parameters for volume creation.
    3. **`getcontentsourcevolume`**:
       - **Duration**: `1µs`.
       - **Purpose**: Retrieves existing content sources (if applicable).
    4. **`createandwaitforvolume`**:
       - **Duration**: `3.37s`.
       - **Purpose**: Creates the volume in Linode and waits for the operation to complete.
       - Sub-operations include:
         - **`attemptcreatelinodevolume`**:
           - **Duration**: `232.64ms`.
           - **Purpose**: Checks for existing volumes with the same label and either returns the existing volume or creates a new one, optionally cloning from a source volume.
         - **`createLinodeVolume`**:
           - **Duration**: `155.47ms`.
           - **Purpose**: Creates a new Linode volume with the specified label, size, and tags. It returns the created volume or an error if the creation fails.
    5. **`createvolumecontext`**:
       - **Duration**: `4µs`.
       - **Purpose**: Prepares the context for the created volume and adds necessary attributes.
    6. **`preparecreatevolumeresponse`**:
       - **Duration**: `4µs`.
       - **Purpose**: Prepares the response to return to the caller.
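
One way to read a breakdown like this is to ask which direct child accounts for most of the parent's time. The sketch below models the example trace as a small tree and finds the slowest child at each level. The `spanNode` type and the `slowestChild` and `exampleTrace` helpers are illustrative names, not part of the driver or of Jaeger. The span names and durations are copied from the example above.

```go
package main

import (
	"fmt"
	"time"
)

// spanNode is an illustrative model of one row in the Jaeger trace view.
type spanNode struct {
	Name     string
	Duration time.Duration
	Children []spanNode
}

// slowestChild returns the direct child with the longest duration.
// The boolean is false when the span has no children.
func slowestChild(parent spanNode) (spanNode, bool) {
	if len(parent.Children) == 0 {
		return spanNode{}, false
	}
	slowest := parent.Children[0]
	for _, c := range parent.Children[1:] {
		if c.Duration > slowest.Duration {
			slowest = c
		}
	}
	return slowest, true
}

// exampleTrace rebuilds the createvolume trace from the documented durations.
func exampleTrace() spanNode {
	return spanNode{
		Name:     "csi.v1.controller/createvolume",
		Duration: 3370 * time.Millisecond,
		Children: []spanNode{
			{Name: "validatecreatevolumerequest", Duration: 2 * time.Microsecond},
			{Name: "preparevolumeparams", Duration: 2 * time.Microsecond},
			{Name: "getcontentsourcevolume", Duration: 1 * time.Microsecond},
			{
				Name:     "createandwaitforvolume",
				Duration: 3370 * time.Millisecond,
				Children: []spanNode{
					{Name: "attemptcreatelinodevolume", Duration: 232640 * time.Microsecond},
					{Name: "createLinodeVolume", Duration: 155470 * time.Microsecond},
				},
			},
			{Name: "createvolumecontext", Duration: 4 * time.Microsecond},
			{Name: "preparecreatevolumeresponse", Duration: 4 * time.Microsecond},
		},
	}
}

func main() {
	root := exampleTrace()
	if s, ok := slowestChild(root); ok {
		fmt.Printf("%s takes %v of %v in %s\n", s.Name, s.Duration, root.Duration, root.Name)
	}
}
```

In this example, nearly all of the request time is spent in `createandwaitforvolume`, while the validation and preparation steps take only microseconds.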

---

# Updating spans to provide additional information

If you want to track additional information in a span, use the `TraceFunctionData` and `SerializeObject` functions in `pkg/observability/tracker.go`.

## 1. `TraceFunctionData`: Tracing Function Calls

The `TraceFunctionData` function simplifies the process of tracing the behavior of your functions. It captures key information about function execution, including parameters, success or error status, and error details (if any).

### **Function Signature**

```go
func TraceFunctionData(span tracer.Span, operationName string, params map[string]string, err error) error
```

### **Key Features**
- **Span Attributes**:
  - Adds key-value pairs from the `params` map as attributes to the span for better trace details.
- **Success or Error Handling**:
  - Sets the span status to `codes.Ok` for successful execution or `codes.Error` for failures.
  - Logs the result (`success` or `error`) along with the `operationName` and `params`.
- **Error Recording**:
  - Captures error details in the span using `span.RecordError`.

### **Example Usage**

You can use `TraceFunctionData` in any function to add tracing with custom parameters:

```go
observability.TraceFunctionData(span, "ValidateCreateVolumeRequest", map[string]string{
	"volume_name": req.GetName(),
	"requestBody": observability.SerializeObject(req),
}, err)
```

Here:
- `span`: The current tracing span.
- `"ValidateCreateVolumeRequest"`: The name of the operation being traced.
- `map[string]string`: A map of custom parameters to include in the trace. Add any details you want to capture, like volume names, request IDs, or serialized objects returned by API calls.
- `err`: The error object (if any) from the function being traced.

---

## 2. `SerializeObject`: Serializing Objects for Tracing

The `SerializeObject` function converts complex objects into JSON strings, making it easier to include them in trace parameters or logs.

### **Function Signature**

```go
func SerializeObject(obj interface{}) string
```

### **Key Features**
- Converts any object (`struct`, `map`, etc.) into a JSON string.
- Handles serialization errors gracefully and logs the issue.
- Useful for including large or complex objects in the trace parameters.
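
A function with this signature can be built on `encoding/json` from the standard library. The sketch below is one possible shape, not the implementation in `tracker.go`: `serializeObject` and `volumeRequest` are illustrative names, and returning an empty string on failure is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// serializeObject is a hypothetical stand-in for observability.SerializeObject.
// It marshals obj to JSON; on failure it logs the problem and returns an
// empty string (the real function's fallback value may differ).
func serializeObject(obj interface{}) string {
	data, err := json.Marshal(obj)
	if err != nil {
		log.Printf("failed to serialize object: %v", err)
		return ""
	}
	return string(data)
}

// volumeRequest is an illustrative type, not part of the CSI API.
type volumeRequest struct {
	Name   string `json:"name"`
	SizeGB int    `json:"size_gb"`
}

func main() {
	fmt.Println(serializeObject(volumeRequest{Name: "pvc-example", SizeGB: 10}))

	// Channels cannot be encoded as JSON, so this call takes the error path.
	fmt.Printf("%q\n", serializeObject(make(chan int)))
}
```

Returning a string instead of `(string, error)` lets the call sit inline in a `map[string]string` literal, at the cost of hiding serialization failures from the caller.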

### **Example Usage**

You can serialize objects like a request body and append them to the `params` map:

```go
observability.TraceFunctionData(span, "CreateVolume", map[string]string{
	"requestBody": observability.SerializeObject(req),
	"volume_type": "block-storage",
}, nil)
```
Here:
- The request object `req` is serialized into a JSON string using `SerializeObject`.
- The serialized string is added to the `params` map as `"requestBody"`.

---

## 3. Adding tracing to a function

To integrate `TraceFunctionData` and `SerializeObject` into your function:

1. Create a Span:
   - Use the `StartFunctionSpan` function from `tracker.go` to create a span at the beginning of your function.
2. Capture Parameters:
   - Use a `map[string]string` to include parameters you want to capture.
   - Serialize objects using `SerializeObject` if needed.
3. Call `TraceFunctionData`:
   - Pass the span, operation name, parameters, and any error to `TraceFunctionData` wherever necessary.

### **Example**

```go
func CreateVolumeRequest(ctx context.Context, req *csi.CreateVolumeRequest) error {
	// Step 1: Create a Span
	_, span := observability.StartFunctionSpan(ctx)
	defer span.End() // Ensure the span ends when the function exits

	// Step 2: Capture Parameters
	// Initialize a map to hold custom trace parameters
	params := map[string]string{
		"volume_name":    req.GetName(),
		"capacity_range": observability.SerializeObject(req.GetCapacityRange()),
		"parameters":     observability.SerializeObject(req.GetParameters()),
	}

	// Simulate parameter validation
	if req.GetName() == "" {
		err := fmt.Errorf("volume name is missing")

		// Step 3: Call TraceFunctionData with error
		observability.TraceFunctionData(span, "ValidateCreateVolumeRequest", params, err)
		return err
	}

	// On success
	// Step 3: Call TraceFunctionData with no error
	observability.TraceFunctionData(span, "CreateVolumeRequest", params, nil)
	return nil
}
```

---

## Benefits of Using This Approach

- **Detailed Traces**:
  - Include all relevant details about function execution, making it easier to debug issues.
- **Error Visibility**:
  - Automatically records errors and logs them with context.
- **Flexibility**:
  - Add or modify parameters dynamically based on your function's needs.
- **Serialization**:
  - Handles complex objects seamlessly without additional manual string conversion.
---