Warp 10 output plugin: retry storms on token expiration cause DDoS-like behavior #18118

### Relevant telegraf.conf

```toml
[agent]
debug = false
flush_interval = "45s"
hostname = "28349e03-4898-4d12-bb35-85cc48a1efc4"
interval = "60s"
metric_batch_size = 1000
metric_buffer_limit = 10000
quiet = true
round_interval = false

[global_tags]
datacenter = "par8"
deployment_id = "deployment_570baf0d-4f67-4e41-b252-22018943050d"
flavor_name = "M"
hypervisor = "hv-par8-018"
image_type = "rust"
image_variant = "rust"
instance_source = "apps"
vm_type = "volatile"
zone = "par"
[[inputs.conntrack]]

[[inputs.cpu]]
fieldexclude = ["time_*", "usage_idle"]
percpu = false
totalcpu = true

[[inputs.cpu]]
fieldinclude = ["usage_idle"]
interval = "45s"
percpu = false
totalcpu = true

[[inputs.disk]]
ignore_fs = ["tmpfs", "devtmpfs"]
interval = "5m"

[[inputs.exec]]
commands = ["echo 'series,application_name=lbaccess-logsarchiver user_application_name=\"lbaccess-logsarchiver\"'"]
data_format = "influx"
interval = "20m"
timeout = "5s"

[[inputs.exec]]
alias = "oomd"
commands = ["oomd -d"]
data_format = "json"
fieldexclude = ["oomd.dropin.*"]
interval = "20m"
name_override = "oomd"
timeout = "30s"

[[inputs.filestat]]
fieldinclude = ["md5_sum"]
files = ["/etc/passwd"]
interval = "1h"
md5 = true

[[inputs.http_response]]
follow_redirects = false
method = "GET"
response_timeout = "5s"
urls = ["http://127.0.0.1:8080"]

[inputs.http_response.headers]
Forwarded = "proto=https"
X-CleverCloud-Monitoring = "telegraf"
X-Forwarded-Proto = "https"

[[inputs.kernel]]
interval = "5m"

[[inputs.linux_sysctl_fs]]
fieldinclude = ["file-max"]
interval = "1h"

[[inputs.linux_sysctl_fs]]
fieldinclude = ["file-nr", "inode-nr", "inode-free-nr"]

[[inputs.mem]]
fieldexclude = ["available", "used_percent"]

[[inputs.mem]]
fieldinclude = ["available", "used_percent"]
interval = "45s"

[[inputs.net]]
fieldexclude = ["icmp_*", "icmpmsg_*", "ip_*", "tcp_*", "udp_*", "udplite_*"]

[[inputs.net_response]]
address = "127.0.0.1:8080"
protocol = "tcp"
timeout = "5s"

[[inputs.netstat]]

[[inputs.processes]]

[[inputs.procstat]]
cgroup = "system.slice/bas-deploy.service"
fieldinclude = ["pid_count"]
interval = "45s"

[[inputs.prometheus]]
metric_version = 2
response_timeout = "30s"
urls = ["http://localhost:8080/metrics"]

[[inputs.statsd]]
allowed_pending_messages = 100
delete_counters = false
name_prefix = "statsd."
service_address = "127.0.0.1:8125"

[[inputs.system]]
fieldinclude = ["load1", "uptime"]

[[inputs.system]]
fieldinclude = ["load1_per_cpu"]
interval = "45s"
[[outputs.warp10]]
print_error_body = true
token = "xxxx"
warp_url = "https://xxx"
```

### Logs from Telegraf

```text
I am sorry, but I did not save them.
```

### System info

Telegraf 1.36.5-r500 on Exherbo Linux 

### Docker

_No response_

### Steps to reproduce

1. Set up a standalone Warp 10 instance with Docker.
2. Create a very short-lived token and paste it into the configuration above.
3. Update `warp_url` to point at that instance.
4. Start Telegraf, let the token expire, and watch the write requests that keep arriving at the endpoint.
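
To confirm independently of Telegraf that the expired token is rejected, and to capture the status code and body that Warp 10 sends back, a small probe along these lines may help. This is only a sketch: it assumes the write endpoint is `<warp_url>/api/v0/update` with the token passed in the `X-Warp10-Token` header, and the metric name `telegraf.repro` is made up for illustration.

```python
import urllib.error
import urllib.request
from typing import Tuple


def build_update_request(warp_url: str, token: str) -> urllib.request.Request:
    """Build a single-datapoint write against the Warp 10 update endpoint."""
    # Assumption: writes go to <warp_url>/api/v0/update, authenticated
    # with the X-Warp10-Token header.
    url = warp_url.rstrip("/") + "/api/v0/update"
    # GTS input format: TS/LAT:LON/ELEV NAME{LABELS} VALUE (empty TS means "now").
    # The metric name and label are hypothetical.
    body = b"// telegraf.repro{source=probe} 1\n"
    return urllib.request.Request(
        url, data=body, headers={"X-Warp10-Token": token}, method="POST"
    )


def probe(warp_url: str, token: str) -> Tuple[int, str]:
    """Send the probe and return (HTTP status, response body)."""
    request = build_update_request(warp_url, token)
    try:
        with urllib.request.urlopen(request, timeout=5) as response:
            return response.status, response.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # Rejected writes end up here; the body carries the error message.
        return err.code, err.read().decode("utf-8", "replace")
```

Calling `probe("http://localhost:8080", "<expired token>")` against the Docker instance should return the same status and body that the plugin receives on each failed flush.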


### Expected behavior

The plugin should stop retrying indefinitely, as other output plugins already do.

### Actual behavior

It retries indefinitely, accumulating telemetry in the buffer and putting a heavy load on bandwidth and request handling.

### Additional info

Problem

  When a Warp10 API token expires or gets revoked, the warp10 output plugin retries indefinitely, creating a DDoS-like effect against the Warp10 platform.

  Impact

  At Clever Cloud, we experienced this issue when token renewal failed. The Telegraf agents kept re-sending metrics to the Warp10 endpoint, which was returning authentication errors (Invalid token, Token Expired, Token revoked). This created excessive load on the Warp10 platform, as thousands of agents simultaneously hammered the API with requests that could never succeed.

  Root Cause

  The warp10 plugin treats all errors as retryable. When the API returns authentication failures, metrics are kept in the retry buffer and continuously re-sent instead of being dropped.
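
The effect of keeping rejected metrics in the buffer can be illustrated with a deliberately simplified model. It assumes one write attempt per flush, every write rejected, and a made-up rate of 300 new metrics per flush; `metric_batch_size` and `metric_buffer_limit` are taken from the configuration above. It is not a description of Telegraf's actual buffer implementation.

```python
def simulate_flushes(flushes, new_per_flush, batch_size, buffer_limit, drop_on_reject):
    """Return how many metrics are sent on each flush while every write is rejected."""
    buffered = 0
    sent_per_flush = []
    for _ in range(flushes):
        # New metrics arrive; anything past the buffer limit is discarded.
        buffered = min(buffered + new_per_flush, buffer_limit)
        # One write attempt per flush, capped at the batch size.
        batch = min(buffered, batch_size)
        sent_per_flush.append(batch)
        # The write is rejected (expired token): drop the batch or keep it for retry.
        if drop_on_reject:
            buffered -= batch
    return sent_per_flush


# Hypothetical load of 300 new metrics per flush, limits from the config above.
retry_all = simulate_flushes(6, 300, 1000, 10000, drop_on_reject=False)
drop_rejected = simulate_flushes(6, 300, 1000, 10000, drop_on_reject=True)
print(retry_all)      # [300, 600, 900, 1000, 1000, 1000]
print(drop_rejected)  # [300, 300, 300, 300, 300, 300]
```

In this model the number of requests is the same either way, but retrying everything quickly pushes every request to the full batch size, so the wasted bandwidth per agent grows until it hits the cap.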

  Unlike REST APIs that use HTTP 401/403 status codes, Warp10 returns HTTP 500 with the error details in the response body. The plugin does not parse these responses to determine whether an error is retryable.

  Affected Error Types

  Non-retryable errors (should drop metrics immediately):
  - Invalid token
  - Token Expired
  - Token revoked
  - Write token missing
  - Application suspended or closed

  Retryable errors (should keep in buffer):
  - Exceeded Monthly Active Data Streams limit
  - Exceeded Daily Data Points limit
  - broken pipe
  - Server unavailable (503)
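
The two lists above could be turned into a classification step roughly like the following. This is a language-neutral sketch written in Python, not the plugin's Go code; the function name `classify` is hypothetical, and it assumes the messages appear in the response body with the wording listed above (matched case-insensitively).

```python
# Messages from the "non-retryable" list; a write failing with one of
# these cannot succeed until the token or account is fixed.
NON_RETRYABLE_MARKERS = (
    "invalid token",
    "token expired",
    "token revoked",
    "write token missing",
    "application suspended or closed",
)


def classify(status_code: int, body: str) -> str:
    """Return "ok", "drop" or "retry" for a Warp 10 write response."""
    if 200 <= status_code < 300:
        return "ok"
    # The status code alone is not enough (auth failures arrive as HTTP 500),
    # so the decision is based on the message in the body.
    text = body.lower()
    if any(marker in text for marker in NON_RETRYABLE_MARKERS):
        return "drop"
    # Quota limits, broken pipe, 503 and unknown errors stay in the buffer.
    return "retry"
```

Defaulting unknown errors to "retry" keeps the current behavior for anything not explicitly listed, so only the known permanent failures cause metrics to be dropped.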

  Workaround

  If you experience this issue, the only workarounds are to restart Telegraf after fixing the token, or to disable the warp10 output until the token is valid again.
