103 changes: 58 additions & 45 deletions docs/configuration.md
@@ -13,23 +13,23 @@ Command-line flags are used to configure settings of the exporter which cannot b

All flags may be prefixed with either one hyphen or two (i.e., both `-config.file` and `--config.file` are valid).

| Flag | Description | Default value |
| --- | --- | --- |
| `-listen-address` | Network address to listen to | `127.0.0.1:5000` |
| `-config.file` | Path to the configuration file | `config.yml` |
| `-log.format` | Output format of log messages. One of: [logfmt, json] | `json` |
| `-debug` | Log at debug level | `false` |
| `-fips` | Use FIPS compliant AWS API | `false` |
| `-cloudwatch-concurrency` | Maximum number of concurrent requests to CloudWatch API | `5` |
| `-cloudwatch-concurrency.per-api-limit-enabled` | Enables a concurrency limiter, that has a specific limit per CloudWatch API call. | `false` |
| `-cloudwatch-concurrency.list-metrics-limit` | Maximum number of concurrent requests to CloudWatch `ListMetrics` API. Only applicable if `per-api-limit-enabled` is `true`. | `5` |
| `-cloudwatch-concurrency.get-metric-data-limit` | Maximum number of concurrent requests to CloudWatch `GetMetricsData` API. Only applicable if `per-api-limit-enabled` is `true`. | `5` |
| `-cloudwatch-concurrency.get-metric-statistics-limit` | Maximum number of concurrent requests to CloudWatch `GetMetricStatistics` API. Only applicable if `per-api-limit-enabled` is `true`. | `5` |
| `-tag-concurrency` | Maximum number of concurrent requests to Resource Tagging API | `5` |
| `-scraping-interval` | Seconds to wait between scraping the AWS metrics | `300` |
| `-metrics-per-query` | Number of metrics made in a single GetMetricsData request | `500` |
| `-labels-snake-case` | Output labels on metrics in snake case instead of camel case | `false` |
| `-profiling.enabled` | Enable the /debug/pprof endpoints for profiling | `false` |
| Flag | Description | Default value |
Contributor: Do you mind keeping the changes here to the minimum please?

| ----------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------ | ---------------- |
| `-listen-address` | Network address to listen to | `127.0.0.1:5000` |
| `-config.file` | Path to the configuration file | `config.yml` |
| `-log.format` | Output format of log messages. One of: [logfmt, json] | `json` |
| `-debug` | Log at debug level | `false` |
| `-fips` | Use FIPS compliant AWS API | `false` |
| `-cloudwatch-concurrency` | Maximum number of concurrent requests to CloudWatch API | `5` |
| `-cloudwatch-concurrency.per-api-limit-enabled` | Enables a concurrency limiter that has a specific limit per CloudWatch API call. | `false` |
| `-cloudwatch-concurrency.list-metrics-limit` | Maximum number of concurrent requests to CloudWatch `ListMetrics` API. Only applicable if `per-api-limit-enabled` is `true`. | `5` |
| `-cloudwatch-concurrency.get-metric-data-limit` | Maximum number of concurrent requests to CloudWatch `GetMetricData` API. Only applicable if `per-api-limit-enabled` is `true`. | `5` |
| `-cloudwatch-concurrency.get-metric-statistics-limit` | Maximum number of concurrent requests to CloudWatch `GetMetricStatistics` API. Only applicable if `per-api-limit-enabled` is `true`. | `5` |
| `-tag-concurrency` | Maximum number of concurrent requests to Resource Tagging API | `5` |
| `-scraping-interval` | Seconds to wait between scraping the AWS metrics | `300` |
| `-metrics-per-query` | Number of metrics requested in a single `GetMetricData` call | `500` |
| `-labels-snake-case` | Output labels on metrics in snake case instead of camel case | `false` |
| `-profiling.enabled` | Enable the /debug/pprof endpoints for profiling | `false` |

## YAML configuration file

@@ -91,8 +91,8 @@ type: <string>
roles:
[ - <role_config> ... ]

# List of Key/Value pairs to use for tag filtering (all must match).
# The key is the AWS Tag key and is case-sensitive
# List of Key/Value pairs to use for tag filtering (all must match).
# The key is the AWS Tag key and is case-sensitive
# The value will be treated as a regex
searchTags:
[ - <search_tags_config> ... ]
@@ -114,7 +114,7 @@ dimensionNameRequirements:
# This is useful for reducing the number of metrics returned by CloudWatch, which can be very large for some services. See AWS Cloudwatch API docs for [ListMetrics](https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_ListMetrics.html) for more details.
[ recentlyActiveOnly: <boolean> ]

# Can be used to include contextual information (account_id, region, and customTags) on info metrics and cloudwatch metrics. This can be particularly
# Can be used to include contextual information (account_id, region, and customTags) on info metrics and cloudwatch metrics. This can be particularly
# useful when cloudwatch metrics might not be present or when using info metrics to understand where your resources exist
[ includeContextOnInfoMetrics: <boolean> ]

@@ -137,6 +137,14 @@ statistics:
# Export the metric with the original CloudWatch timestamp (General Setting for all metrics in this job)
[ addCloudwatchTimestamp: <boolean> ]

# Enables the inclusion of past metric data points from the CloudWatch response if available.
# This is useful when a metric is configured with a 60-second period and a 300-second duration, ensuring that all
# five data points are exposed at the metrics endpoint instead of only the latest one.
# Note: This option requires `addCloudwatchTimestamp` to be enabled.
# The metric destination must support out-of-order timestamps, see https://prometheus.io/docs/prometheus/latest/configuration/configuration/#tsdb
# (General Setting for all metrics in this job)
[ exportAllDataPoints: <boolean> ]

# List of metric definitions
metrics:
[ - <metric_config> ... ]
@@ -152,18 +160,18 @@ discovery:
kafka:
- Name
jobs:
- type: kafka
regions:
- eu-west-1
searchTags:
- key: env
value: dev
metrics:
- name: BytesOutPerSec
statistics:
- Average
period: 600
length: 600
- type: kafka
regions:
- eu-west-1
searchTags:
- key: env
value: dev
metrics:
- name: BytesOutPerSec
statistics:
- Average
period: 600
length: 600
```

### `static_job_config`
@@ -178,23 +186,19 @@ name: <string>
namespace: <string>

# List of AWS regions
regions:
[ - <string> ...]
regions: [- <string> ...]

# List of IAM roles to assume (optional)
roles:
[ - <role_config> ... ]
roles: [- <role_config> ...]

# Custom tags to be added as a list of Key/Value pairs
customTags:
[ - <custom_tags_config> ... ]
customTags: [- <custom_tags_config> ...]

# CloudWatch metric dimensions as a list of Name/Value pairs
dimensions: [ <dimensions_config> ]
dimensions: [<dimensions_config>]

# List of metric definitions
metrics:
[ - <metric_config> ... ]
metrics: [- <metric_config> ...]
```

Example config file:
@@ -208,15 +212,15 @@ static:
regions:
- eu-west-1
dimensions:
- name: AutoScalingGroupName
value: MyGroup
- name: AutoScalingGroupName
value: MyGroup
customTags:
- key: CustomTag
value: CustomValue
metrics:
- name: GroupInServiceInstances
statistics:
- Minimum
- Minimum
period: 60
length: 300
```
@@ -276,6 +280,14 @@ statistics:
# Export the metric with the original CloudWatch timestamp (General Setting for all metrics in this job)
[ addCloudwatchTimestamp: <boolean> ]

# Enables the inclusion of past metric data points from the CloudWatch response if available.
# This is useful when a metric is configured with a 60-second period and a 300-second duration, ensuring that all
# five data points are exposed at the metrics endpoint instead of only the latest one.
# Note: This option requires `addCloudwatchTimestamp` to be enabled.
# The metric destination must support out-of-order timestamps, see https://prometheus.io/docs/prometheus/latest/configuration/configuration/#tsdb
# (General Setting for all metrics in this job)
[ exportAllDataPoints: <boolean> ]

# List of metric definitions
metrics:
[ - <metric_config> ... ]
@@ -336,9 +348,10 @@ statistics:
```

Notes:

- Available statistics: `Maximum`, `Minimum`, `Sum`, `SampleCount`, `Average`, `pXX` (e.g. `p90`).

- Watch out using `addCloudwatchTimestamp` for sparse metrics, e.g from S3, since Prometheus won't scrape metrics containing timestamps older than 2-3 hours.
- Watch out when using `addCloudwatchTimestamp` for sparse metrics, e.g. from S3, since Prometheus won't scrape metrics containing timestamps older than 2-3 hours. The same applies when `exportAllDataPoints` is enabled for any metric.

### `exported_tags_config`

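The out-of-order requirement mentioned in the note above applies on the Prometheus side: when `exportAllDataPoints` exposes several timestamped samples per scrape, the receiving TSDB needs out-of-order ingestion enabled. A minimal `prometheus.yml` sketch (the 1h window is an assumption; size it to comfortably cover your job's `length`):

```yaml
# Sketch: accept samples up to 1h older than the newest sample in the TSDB
# (requires a Prometheus version with out-of-order ingestion support).
storage:
  tsdb:
    out_of_order_time_window: 1h
```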
29 changes: 29 additions & 0 deletions examples/historic-data.yml
@@ -0,0 +1,29 @@
apiVersion: v1alpha1
discovery:
jobs:
- type: AWS/SQS
regions:
- us-east-1
period: 60
length: 300
addCloudwatchTimestamp: true
exportAllDataPoints: true
metrics:
- name: NumberOfMessagesSent
statistics: [Sum]
- name: NumberOfMessagesReceived
statistics: [Sum]
- name: NumberOfMessagesDeleted
statistics: [Sum]
- name: ApproximateAgeOfOldestMessage
statistics: [Average]
- name: NumberOfEmptyReceives
statistics: [Sum]
- name: SentMessageSize
statistics: [Average]
- name: ApproximateNumberOfMessagesNotVisible
statistics: [Sum]
- name: ApproximateNumberOfMessagesDelayed
statistics: [Sum]
- name: ApproximateNumberOfMessagesVisible
statistics: [Sum]
18 changes: 16 additions & 2 deletions pkg/clients/cloudwatch/client.go
@@ -55,12 +55,26 @@ type ConcurrencyLimiter interface {
}

type MetricDataResult struct {
ID string
// A nil datapoint is a marker for no datapoint being found
ID string
Datapoints []DatapointWithTimestamp
}

type DatapointWithTimestamp struct {
Contributor: DatapointWithTimestamp seems redundant, can we drop the WithTimestamp suffix?

Also, please pick either DataPoint or Datapoint and change it consistently through the PR (including the config option).

Contributor Author: It's a data point with a timestamp; there is already a data point without a timestamp. I'm happy with any suggestion for a better name, but I think this is the most expressive one and easy to understand. What do you suggest?

Contributor: I personally think something like this is more readable:

type Datapoint struct {
	Value     *float64
	Timestamp time.Time
}

Datapoint *float64
Timestamp time.Time
}

func NewDataPoint(datapoint *float64, timestamp time.Time) DatapointWithTimestamp {
return DatapointWithTimestamp{
Timestamp: timestamp,
Datapoint: datapoint,
}
}

func SingleDataPoint(datapoint *float64, timestamp time.Time) []DatapointWithTimestamp {
return []DatapointWithTimestamp{NewDataPoint(datapoint, timestamp)}
}
Contributor: These helpers seem a bit of an overkill for such a small struct, can we avoid them?

Contributor Author: I don't mind; I believe this improves readability, especially in the tests.


type limitedConcurrencyClient struct {
client Client
limiter ConcurrencyLimiter
9 changes: 5 additions & 4 deletions pkg/clients/cloudwatch/v1/client.go
@@ -143,10 +143,11 @@ func (c client) GetMetricData(ctx context.Context, getMetricData []*model.Cloudw
func toMetricDataResult(resp cloudwatch.GetMetricDataOutput) []cloudwatch_client.MetricDataResult {
output := make([]cloudwatch_client.MetricDataResult, 0, len(resp.MetricDataResults))
for _, metricDataResult := range resp.MetricDataResults {
mappedResult := cloudwatch_client.MetricDataResult{ID: *metricDataResult.Id}
if len(metricDataResult.Values) > 0 {
mappedResult.Datapoint = metricDataResult.Values[0]
mappedResult.Timestamp = *metricDataResult.Timestamps[0]
mappedResult := cloudwatch_client.MetricDataResult{
ID: *metricDataResult.Id,
Datapoints: make([]cloudwatch_client.DatapointWithTimestamp, 0, len(metricDataResult.Timestamps))}
for i := 0; i < len(metricDataResult.Timestamps); i++ {
mappedResult.Datapoints = append(mappedResult.Datapoints, cloudwatch_client.NewDataPoint(metricDataResult.Values[i], *metricDataResult.Timestamps[i]))
}
output = append(output, mappedResult)
}
18 changes: 14 additions & 4 deletions pkg/clients/cloudwatch/v1/client_test.go
@@ -67,8 +67,13 @@ func Test_toMetricDataResult(t *testing.T) {
},
},
expectedMetricDataResults: []cloudwatch_client.MetricDataResult{
{ID: "metric-1", Datapoint: aws.Float64(1.0), Timestamp: ts.Add(10 * time.Minute)},
{ID: "metric-2", Datapoint: aws.Float64(2.0), Timestamp: ts},
{ID: "metric-1", Datapoints: []cloudwatch_client.DatapointWithTimestamp{
cloudwatch_client.NewDataPoint(aws.Float64(1.0), ts.Add(10*time.Minute)),
cloudwatch_client.NewDataPoint(aws.Float64(2.0), ts.Add(5*time.Minute)),
cloudwatch_client.NewDataPoint(aws.Float64(3.0), ts),
},
},
{ID: "metric-2", Datapoints: cloudwatch_client.SingleDataPoint(aws.Float64(2.0), ts)},
},
},
{
@@ -88,8 +93,13 @@ func Test_toMetricDataResult(t *testing.T) {
},
},
expectedMetricDataResults: []cloudwatch_client.MetricDataResult{
{ID: "metric-1", Datapoint: aws.Float64(1.0), Timestamp: ts.Add(10 * time.Minute)},
{ID: "metric-2", Datapoint: nil, Timestamp: time.Time{}},
{ID: "metric-1", Datapoints: []cloudwatch_client.DatapointWithTimestamp{
cloudwatch_client.NewDataPoint(aws.Float64(1.0), ts.Add(10*time.Minute)),
cloudwatch_client.NewDataPoint(aws.Float64(2.0), ts.Add(5*time.Minute)),
cloudwatch_client.NewDataPoint(aws.Float64(3.0), ts)},
},
{ID: "metric-2",
Datapoints: []cloudwatch_client.DatapointWithTimestamp{}},
},
},
}
10 changes: 6 additions & 4 deletions pkg/clients/cloudwatch/v2/client.go
@@ -149,10 +149,12 @@ func (c client) GetMetricData(ctx context.Context, getMetricData []*model.Cloudw
func toMetricDataResult(resp cloudwatch.GetMetricDataOutput) []cloudwatch_client.MetricDataResult {
output := make([]cloudwatch_client.MetricDataResult, 0, len(resp.MetricDataResults))
for _, metricDataResult := range resp.MetricDataResults {
mappedResult := cloudwatch_client.MetricDataResult{ID: *metricDataResult.Id}
if len(metricDataResult.Values) > 0 {
mappedResult.Datapoint = &metricDataResult.Values[0]
mappedResult.Timestamp = metricDataResult.Timestamps[0]
mappedResult := cloudwatch_client.MetricDataResult{
ID: *metricDataResult.Id,
Datapoints: make([]cloudwatch_client.DatapointWithTimestamp, 0, len(metricDataResult.Timestamps)),
}
for i := 0; i < len(metricDataResult.Timestamps); i++ {
Contributor: I feel this new behaviour should be used only if exportAllDataPoints is enabled. If not, we should keep the previous behaviour and avoid copying around the additional values/timestamps from the CW response.

Contributor Author: This is required because this function is not aware of any configuration. This is, in my opinion, the cleanest implementation. See comments from @kgeckhart in the linked PR.

Contributor: I think Kyle's comment was mainly around allowing the client to return a slice of Datapoint + Timestamp, but he can confirm.

My concern is around useless mem allocations when exportAllDataPoints is not enabled. I think the option can be made available to the aws client, e.g. via CloudwatchData or GetMetricDataProcessingParams.

Contributor Author (@woehrl01, Mar 29, 2025): I'm pretty sure there is no significant useless memory allocation. All the values are already in memory because they are part of the API response. We just allocate the slice a bit bigger; since we use a slice of structs and not a slice of struct pointers, there is a single memory alloc for the underlying array, and everything else is just copying over values. So this only keeps the already returned data in memory for a few function calls longer, to unify the struct across v1 and v2.

Passing those parameters around and adding additional tests to validate that behaviour feels like an unnecessary addition of complexity.

Contributor Author: Following up on my earlier comment: even if we assume returning a slice of 5 structs increases memory usage, the overhead is still reasonable, around ~128 MB for 1 million metrics. That comes from storing 4 additional data points per metric, each at 32 bytes (a time.Time value + a *float64 pointer). It's just a larger backing array and a few more pointers to scan during GC. This only becomes relevant if we're allocating and retaining millions of these slices, which we're not.

And if we were dealing with that many (querying 1 million CloudWatch metrics), the real issue wouldn't be memory. Even with 5 data points per request, that's 200K metric fetches, costing ~$2 per query. At that scale, CloudWatch costs and API limits are a far bigger concern than a ~100 MB memory difference.

Contributor: We run YACE in a stateless multi-tenant fashion, so querying a very large amount of metrics in a short period of time often happens. I was concerned about the potential memory overhead for this change and had shared that with Cristian.

This wasn't the area I was concerned about initially, as we keep our length + period setup to only produce a single datapoint as much as possible. I do agree with Cristian: if we stick to only mapping a single datapoint when the setting is disabled, it will ensure there's minimal overhead for those who upgrade to latest without using the feature.

I was primarily concerned about the overhead of switching from a single datapoint to a slice for our use case. I don't think we should do anything about it now. I wrote a OneOrMany[T] that benchmarks nicely vs a single-entry slice and would PR it separately if needed.

Contributor Author (@woehrl01, Jul 19, 2025): @kgeckhart I just blindly merged your suggested changes yesterday, to get this PR off my plate. Still, I had a look today at what you changed around your memory concerns. Looking at the current changes, I don't see any change which would reduce memory usage at all. Despite your suggestion, it still creates a slice every time, and it always creates it with full size for all data points. There is now just added complexity around passing a flag to stop the loop early, which you could argue reduces CPU overhead, but that is likely negligible at the scale where the CloudWatch costs would explode.

Before we move into premature optimisations, have you actually measured and proved that this is an issue? (See my comments above for why I doubt that, based on the numbers.) You mention that you are running this at large scale. Maybe you can run the initial PR side by side for 1-5% of your workload (depending on the scale) and share some real-world memory impact?

Contributor: @woehrl01 thanks for taking a look at it and providing feedback.

> Despite your suggestion, it still creates a slice every time, and it always creates it with full size for all data points.

This was an oversight on my part, as I was hastily making the change before I needed to go to an event yesterday 😅. My intention was to make sure to only allocate exactly what is necessary and nothing more, which I can do with another minor change.

> There is now just added complexity around passing a flag to stop the loop early, which you could argue reduces CPU overhead, but that is likely negligible at the scale where the CloudWatch costs would explode.

IMO the added complexity is rather minimal and easily testable, which is why I went through with the change.

> Before we move into premature optimisations, have you actually measured and proved that this is an issue?

When Cristian brought it up I didn't quite get it at first, because we do run it at scale, but we do it in such a way that we should only ever get one data point back. We do this intentionally for performance reasons: why ask for data you won't use? I don't know if it was Cristian's intent, but my realization was that introducing this feature should not present a noticeable negative impact on the larger community just by upgrading. This exporter is embedded into places like https://grafana.com/docs/alloy/latest/reference/components/prometheus/prometheus.exporter.cloudwatch/ which I know is used by customers at a scale large enough to incur some rather hefty CloudWatch costs.

Going from a single data point to a slice presents some memory increase that is unlikely to be noticeable for most. We don't know how many people are setting up their configs with a length that is larger than their period (length > period = more than 1 data point) and to what degree they are doing it (2x, 3x, 10x?). This is part of the complexity of building the configs: CloudWatch has some incredibly odd behaviors for different metrics, so the configs get cargo-culted in a way that is not optimal but works.

Could this guard be unnecessary? Yes, but the complexity of adding it feels acceptable as a means to provide a stable experience to existing users.

Since I messed up the DCO and need another change, we will have to figure out what to do with this PR. But first it would probably be good to have @cristiangreco take a look to make sure there's nothing further.

Comment: @kgeckhart @cristiangreco what's the plan with this PR? Don't get me wrong, I understand we're all busy and this is OSS after all, so no pressure here please... ;) I would just highly appreciate a short indication of if and when you plan to move ahead with this PR, so we can make plans on our end as well.

From what I see, this PR seems to be waiting on some action from your side, and there is nothing really left the community could support with? If that is wrong and there is still something to do, please let me know, I'm happy to contribute as well.

mappedResult.Datapoints = append(mappedResult.Datapoints, cloudwatch_client.NewDataPoint(&metricDataResult.Values[i], metricDataResult.Timestamps[i]))
}
output = append(output, mappedResult)
}
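To put a rough number on the memory discussion above, here is a standalone sketch (not part of the PR; the struct merely mirrors `DatapointWithTimestamp` for illustration) that reproduces the per-datapoint size estimate quoted in the thread:

```go
package main

import (
	"fmt"
	"time"
	"unsafe"
)

// Mirrors cloudwatch_client.DatapointWithTimestamp, for illustration only.
type datapointWithTimestamp struct {
	Datapoint *float64
	Timestamp time.Time
}

func main() {
	size := unsafe.Sizeof(datapointWithTimestamp{}) // 32 bytes on 64-bit: 8-byte pointer + 24-byte time.Time
	extraPerMetric := 4 * size                      // four extra data points per metric (60s period over a 300s window)
	totalBytes := extraPerMetric * 1_000_000        // held briefly for one million metrics
	fmt.Printf("per datapoint: %d B, extra per metric: %d B, total: ~%d MB\n",
		size, extraPerMetric, totalBytes/1_000_000)
}
```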
17 changes: 13 additions & 4 deletions pkg/clients/cloudwatch/v2/client_test.go
@@ -52,8 +52,13 @@ func Test_toMetricDataResult(t *testing.T) {
},
},
expectedMetricDataResults: []cloudwatch_client.MetricDataResult{
{ID: "metric-1", Datapoint: aws.Float64(1.0), Timestamp: ts.Add(10 * time.Minute)},
{ID: "metric-2", Datapoint: aws.Float64(2.0), Timestamp: ts},
{ID: "metric-1", Datapoints: []cloudwatch_client.DatapointWithTimestamp{
cloudwatch_client.NewDataPoint(aws.Float64(1.0), ts.Add(10*time.Minute)),
cloudwatch_client.NewDataPoint(aws.Float64(2.0), ts.Add(5*time.Minute)),
cloudwatch_client.NewDataPoint(aws.Float64(3.0), ts),
},
},
{ID: "metric-2", Datapoints: cloudwatch_client.SingleDataPoint(aws.Float64(2.0), ts)},
},
},
{
@@ -73,8 +78,12 @@ func Test_toMetricDataResult(t *testing.T) {
},
},
expectedMetricDataResults: []cloudwatch_client.MetricDataResult{
{ID: "metric-1", Datapoint: aws.Float64(1.0), Timestamp: ts.Add(10 * time.Minute)},
{ID: "metric-2", Datapoint: nil, Timestamp: time.Time{}},
{ID: "metric-1", Datapoints: []cloudwatch_client.DatapointWithTimestamp{
cloudwatch_client.NewDataPoint(aws.Float64(1.0), ts.Add(10*time.Minute)),
cloudwatch_client.NewDataPoint(aws.Float64(2.0), ts.Add(5*time.Minute)),
cloudwatch_client.NewDataPoint(aws.Float64(3.0), ts),
}},
{ID: "metric-2", Datapoints: []cloudwatch_client.DatapointWithTimestamp{}},
},
},
}
19 changes: 18 additions & 1 deletion pkg/config/config.go
@@ -52,6 +52,7 @@ type JobLevelMetricFields struct {
Delay int64 `yaml:"delay"`
NilToZero *bool `yaml:"nilToZero"`
AddCloudwatchTimestamp *bool `yaml:"addCloudwatchTimestamp"`
ExportAllDataPoints *bool `yaml:"exportAllDataPoints"`
}

type Job struct {
@@ -99,6 +100,7 @@ type Metric struct {
Delay int64 `yaml:"delay"`
NilToZero *bool `yaml:"nilToZero"`
AddCloudwatchTimestamp *bool `yaml:"addCloudwatchTimestamp"`
ExportAllDataPoints *bool `yaml:"exportAllDataPoints"`
}

type Dimension struct {
@@ -154,7 +156,7 @@ func (c *ScrapeConf) Load(file string, logger *slog.Logger) (model.JobsConfig, e

func (c *ScrapeConf) Validate(logger *slog.Logger) (model.JobsConfig, error) {
if c.Discovery.Jobs == nil && c.Static == nil && c.CustomNamespace == nil {
return model.JobsConfig{}, fmt.Errorf("At least 1 Discovery job, 1 Static or one CustomNamespace must be defined")
return model.JobsConfig{}, fmt.Errorf("at least 1 Discovery job, 1 Static or one CustomNamespace must be defined")
Contributor: Nitpick: this change is unrelated to the PR.

Contributor Author: It's unrelated, but it is a lint error. Should I revert?

Contributor: What lint error is this solving?

Contributor Author: The standard golang lint error that an error text should start with a lowercase letter.

}

if c.Discovery.Jobs != nil {
@@ -387,6 +389,19 @@ func (m *Metric) validateMetric(logger *slog.Logger, metricIdx int, parent strin
}
}

mExportAllDataPoints := m.ExportAllDataPoints
if mExportAllDataPoints == nil {
if discovery != nil && discovery.ExportAllDataPoints != nil {
mExportAllDataPoints = discovery.ExportAllDataPoints
} else {
mExportAllDataPoints = aws.Bool(false)
}
}

if aws.BoolValue(mExportAllDataPoints) && !aws.BoolValue(mAddCloudwatchTimestamp) {
return fmt.Errorf("Metric [%s/%d] in %v: ExportAllDataPoints can only be enabled if AddCloudwatchTimestamp is enabled", m.Name, metricIdx, parent)
}

if mLength < mPeriod {
return fmt.Errorf(
"Metric [%s/%d] in %v: length(%d) is smaller than period(%d). This can cause that the data requested is not ready and generate data gaps",
@@ -398,6 +413,7 @@ func (m *Metric) validateMetric(logger *slog.Logger, metricIdx int, parent strin
m.Delay = mDelay
m.NilToZero = mNilToZero
m.AddCloudwatchTimestamp = mAddCloudwatchTimestamp
m.ExportAllDataPoints = mExportAllDataPoints
m.Statistics = mStatistics

return nil
@@ -519,6 +535,7 @@ func toModelMetricConfig(metrics []*Metric) []*model.MetricConfig {
Delay: m.Delay,
NilToZero: aws.BoolValue(m.NilToZero),
AddCloudwatchTimestamp: aws.BoolValue(m.AddCloudwatchTimestamp),
ExportAllDataPoints: aws.BoolValue(m.ExportAllDataPoints),
})
}
return ret
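As a usage note for the validation above: a config like the following sketch (the SQS job and metric are illustrative) is rejected at startup with the new error, because `exportAllDataPoints` is enabled while `addCloudwatchTimestamp` is left at its default of `false`:

```yaml
apiVersion: v1alpha1
discovery:
  jobs:
    - type: AWS/SQS
      regions:
        - us-east-1
      period: 60
      length: 300
      # addCloudwatchTimestamp is not set and therefore defaults to false,
      # so the following line fails validation.
      exportAllDataPoints: true
      metrics:
        - name: NumberOfMessagesSent
          statistics: [Sum]
```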