@moko-poi
Contributor

Summary

This PR addresses issue #8497 by adding comprehensive visibility into how Karpenter calculates kubeReserved and allocatable resources for instance types. This helps users, especially those using Custom AMI families, understand and troubleshoot capacity-related issues.

Changes

Code Changes

  • Added detailed V(1) logging in NewInstanceType() to track resource calculations
    • Logs instance type, AMI family, and pod configuration
    • Shows effective pods count used for kubeReserved calculation
    • Displays capacity, reserved, and allocatable values for CPU and memory
  • Refactored calculation logic for better readability and debuggability
    • Extracted effectivePods and kubeReservedResources as variables
    • Makes the calculation process more transparent

Documentation Changes

  • Added "Debugging kubeReserved Calculations" section to troubleshooting docs
    • Step-by-step guide for enabling and viewing verbose logs
    • Detailed explanation of each log field
    • Documentation of default kubeReserved calculation formulas
    • Practical example for Custom AMI users
  • Fixed metric name in existing documentation

Problem Solved

Issue #8497 - Problem 3: Lack of visibility

"Some way to know what Karpenter thinks the allocable an instance type has."

Users previously had no way to understand:

  • How Karpenter calculates kubeReserved for different instance types
  • Why their configured maxPods might not match the effective pod count
  • How Custom AMI configurations affect resource calculations

Impact

Testing

  • Code compiles successfully
  • Existing tests pass
  • Log output verified with verbose mode

Example Log Output

{
  "level": "info",
  "ts": "2025-12-23T19:30:52Z",
  "msg": "calculated instance type resources",
  "instance-type": "m5.large",
  "ami-family": "Custom",
  "max-pods-configured": 110,
  "effective-pods": 737,
  "uses-eni-limited-overhead": true,
  "capacity-memory": "8192Mi",
  "capacity-cpu": "2",
  "kube-reserved-memory": "8362Mi",
  "kube-reserved-cpu": "80m",
  "system-reserved-memory": "0",
  "system-reserved-cpu": "0",
  "allocatable-memory": "6.5Gi",
  "allocatable-cpu": "1920m"
}
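
The kube-reserved-memory value in the example above matches the EKS bootstrap convention of 255Mi plus 11Mi per pod, evaluated at the effective pod count rather than the configured one. The following is a minimal sketch of that arithmetic only. It assumes that convention applies here, and `kubeReservedMemoryMi` is a hypothetical helper name, not a function from the Karpenter codebase.

```go
package main

import "fmt"

// kubeReservedMemoryMi is a hypothetical helper (not a Karpenter function).
// It assumes the EKS bootstrap convention: 255Mi base plus 11Mi per pod.
func kubeReservedMemoryMi(pods int64) int64 {
	return 11*pods + 255
}

func main() {
	// Pod counts taken from the example log: configured (110) vs. effective (737).
	for _, pods := range []int64{110, 737} {
		fmt.Printf("pods=%d kube-reserved-memory=%dMi\n", pods, kubeReservedMemoryMi(pods))
	}
}
```

With the configured value of 110 pods the formula yields 1465Mi, while the effective value of 737 pods yields the 8362Mi shown in the log. That gap is why the log surfaces both pod counts.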

Related Issues

Checklist

  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works (N/A - observability only)
  • New and existing unit tests pass locally with my changes

Screenshots/Recordings

N/A - This is a logging and documentation enhancement

- Add detailed V(1) logging for instance type resource calculations
- Log effective pod count, kube-reserved, and allocatable values
- Add documentation section for debugging kubeReserved calculations
- Include examples for Custom AMI configuration

This change provides visibility into how Karpenter calculates
allocatable resources, helping users troubleshoot capacity issues
especially with Custom AMI configurations.

Relates to aws#8497
@moko-poi requested a review from a team as a code owner on December 23, 2025 10:33
@moko-poi requested a review from bwagner5 on December 23, 2025 10:33
Contributor

@DerekFrank left a comment

This doesn't really address #8497, and I am not sure that the amount of logs is worth it.


// Log kubeReserved calculation details for troubleshooting
allocatable := it.Allocatable()
log.FromContext(ctx).V(1).Info("calculated instance type resources",
Contributor

Won't this log a line for every single instance type on startup? I'm not sure that kind of spam is useful.

Contributor Author

Good catch! I've updated the code to only log for Custom AMI families where this troubleshooting info is most valuable. This avoids the log spam from hundreds of instance types at startup.

@DerekFrank
Contributor

For non-Custom AMIs, we could instead put this information on the website. Once we figure out the override behavior for #8497, I think we can log the result for Custom AMIs only.

@moko-poi
Contributor Author

moko-poi commented Jan 6, 2026

@DerekFrank Thank you for the valuable feedback! You're absolutely right about the log volume concern and the relationship with #8497.

After reviewing the situation, I understand that:

  1. PR #8705 ("fix(kubereserved): allow custom AMI bypass and floor kubeReserved to …") addresses Problems 1 & 2 (calculation logic fixes)
  2. This PR should focus on Problem 3 (visibility) as a complementary solution

I'll update this PR to:

Rationale:

  • For EKS-optimized AMIs (AL2, AL2023, Bottlerocket): Static formulas → documentation is sufficient
  • For Custom AMIs: Dynamic calculations based on user config → runtime logging is valuable for troubleshooting

@moko-poi
Contributor Author

moko-poi commented Jan 6, 2026

Fixed in 27e46d3! Now only logs for Custom AMI families to avoid the spam:

if amiFamilyType == v1.AMIFamilyCustom {
    log.FromContext(ctx).V(1).Info("calculated instance type resources for Custom AMI", ...)
}

For EKS-optimized AMIs, I've added detailed documentation with formulas and examples instead.

@jigisha620
Contributor

@moko-poi,
Even if this is specific to Custom AMIs, I think it will still print a log line for every instance type on startup. Can we move this to trace-level verbosity?

@moko-poi
Contributor Author

@jigisha620 Good point! I've updated the log level to V(2) in 4a49e45.

This ensures the logs are only visible when users explicitly need detailed troubleshooting with --v=2, avoiding log spam during normal operations even with Custom AMI.
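
The combined gating described in this thread (Custom AMI family only, and only at verbosity 2 or higher) can be sketched as follows. This is an illustration under stated assumptions, not the code from the commit. `verbosityLogger` is a hypothetical standard-library stand-in for a logr-style logger, and `shouldLogResources` and `amiFamilyCustom` are illustrative names.

```go
package main

import "fmt"

// amiFamilyCustom mirrors the "Custom" AMI family value shown in the example log.
const amiFamilyCustom = "Custom"

// verbosityLogger is a hypothetical stand-in for a logr-style logger:
// a message at verbosity v is emitted only when v <= threshold.
type verbosityLogger struct {
	threshold int
}

// enabled reports whether a message at verbosity v would be emitted.
func (l verbosityLogger) enabled(v int) bool { return v <= l.threshold }

// shouldLogResources reports whether the resource-calculation line would be
// emitted: only for the Custom AMI family, and only when V(2) is enabled.
func shouldLogResources(l verbosityLogger, amiFamily string) bool {
	return amiFamily == amiFamilyCustom && l.enabled(2)
}

func main() {
	for _, threshold := range []int{0, 2} {
		l := verbosityLogger{threshold: threshold}
		for _, family := range []string{"AL2023", amiFamilyCustom} {
			fmt.Printf("verbosity=%d ami-family=%s logged=%t\n",
				threshold, family, shouldLogResources(l, family))
		}
	}
}
```

Only the combination of a Custom AMI family and a verbosity threshold of at least 2 produces output in this sketch, so default-verbosity startups stay quiet.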
