improve the vram calculator #38

Open
Yuyz0112 wants to merge 1 commit into main from yz-calc

Conversation

@Yuyz0112 (Contributor)

Please check the internal design doc.

Copilot AI left a comment

Pull Request Overview

This PR centralizes inference and vLLM parameters into configuration constants, refactors core VRAM calculation functions to use those constants, and removes the CPU resources UI and related CPU calculations.

  • Introduce INFERENCE_CONFIG and VLLM_CONFIG to manage empirical parameters
  • Refactor calculateKVCache, calculateActivations, calculateFrameworkOverhead, calculateMultiDeviceOverhead, and estimateGenerationSpeed to use new configs
  • Remove CPU memory/cores calculation logic and the corresponding UI section
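The centralized constants and the refactored KV-cache math could look roughly like the sketch below. Only ACTIVE_USER_RATIO and MEMORY_POOL_OVERHEAD are named in the review; every other field name, the function signature, and all numeric values are illustrative assumptions, not the PR's actual code.

```typescript
// Sketch of the centralized config constants (placeholder values).
const VLLM_CONFIG = {
  ACTIVE_USER_RATIO: 0.3,    // fraction of concurrent users assumed active at once
  MEMORY_POOL_OVERHEAD: 1.1, // multiplier for vLLM's preallocated KV memory pool
} as const;

const INFERENCE_CONFIG = {
  BYTES_PER_ELEMENT_FP16: 2, // bytes per tensor element at fp16/bf16 precision
} as const;

// KV cache in GB: 2 tensors (K and V) per layer, hiddenSize elements per
// token, scaled by sequence length and the effective batch derived from
// concurrent users, then inflated by the memory-pool overhead.
function calculateKVCache(
  numLayers: number,
  hiddenSize: number,
  seqLen: number,
  concurrentUsers: number,
): number {
  const effectiveBatch = Math.max(
    1,
    Math.ceil(concurrentUsers * VLLM_CONFIG.ACTIVE_USER_RATIO),
  );
  const cachePerToken =
    2 * numLayers * hiddenSize * INFERENCE_CONFIG.BYTES_PER_ELEMENT_FP16;
  const bytes =
    cachePerToken * seqLen * effectiveBatch * VLLM_CONFIG.MEMORY_POOL_OVERHEAD;
  return bytes / 1024 ** 3; // GB
}
```

With these placeholder values, a Llama-7B-like shape (32 layers, hidden size 4096, 2048-token context, 10 users) yields an effective batch of 3 and roughly 3.3 GB of KV cache.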

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File descriptions:

  • src/pages/vram-calculator/VramCalculatorPage.tsx: Remove CPU resources block; drop unused type & Label imports
  • src/lib/vram-calculator.ts: Add config constants, refactor calculation functions, remove CPU logic
Comments suppressed due to low confidence (3)

src/lib/vram-calculator.ts:122

  • Consider adding unit tests for calculateKVCache to cover scenarios with different effective batch sizes based on VLLM_CONFIG.ACTIVE_USER_RATIO and MEMORY_POOL_OVERHEAD.
function calculateKVCache(

src/lib/vram-calculator.ts:136

  • [nitpick] The variable name cachePerToken is somewhat generic; consider renaming it to kvCachePerToken to clarify that it refers to KV cache byte size per token.
  const cachePerToken =

src/pages/vram-calculator/VramCalculatorPage.tsx:557

  • Since the CPU resources section has been removed, consider cleaning up related translation keys (e.g., pages.vramCalculator.sections.cpuResources) to avoid unused localization entries.
            <div className="border border-border rounded-lg p-6">

Comment on lines 201 to 207

  if (availableVramPerGpu >= 80) {
-   baseOverhead = 2.0;
+   baseOverhead = 2.0; // H100, A100 80GB
  } else if (availableVramPerGpu >= 40) {
-   baseOverhead = 1.5;
+   baseOverhead = 1.5; // A100 40GB, A6000
  } else if (availableVramPerGpu >= 24) {
-   baseOverhead = 1.2;
+   baseOverhead = 1.2; // RTX 4090, RTX 3090
  } else if (availableVramPerGpu >= 16) {
Copilot AI (Jul 17, 2025)
[nitpick] Consider extracting the GPU memory tier thresholds (80, 40, 24, 16) into a configuration object or enum to improve maintainability and avoid magic numbers.
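One sketch of that extraction. The 80/40/24 GB thresholds and their overhead values come from the diff above; the 16 GB tier's overhead, the sub-tier fallback, the example card names for those two, and the helper name are assumptions.

```typescript
interface GpuTier {
  minVramGb: number;      // lower bound for this tier (inclusive)
  baseOverheadGb: number; // framework overhead budgeted for this tier
  examples: string;       // representative cards
}

// Ordered high-to-low so the first matching tier wins.
const GPU_TIERS: readonly GpuTier[] = [
  { minVramGb: 80, baseOverheadGb: 2.0, examples: "H100, A100 80GB" },
  { minVramGb: 40, baseOverheadGb: 1.5, examples: "A100 40GB, A6000" },
  { minVramGb: 24, baseOverheadGb: 1.2, examples: "RTX 4090, RTX 3090" },
  { minVramGb: 16, baseOverheadGb: 1.0, examples: "RTX 4080 (assumed)" },
];

const FALLBACK_OVERHEAD_GB = 0.8; // below the smallest tier (assumed)

function baseOverheadFor(availableVramPerGpu: number): number {
  const tier = GPU_TIERS.find((t) => availableVramPerGpu >= t.minVramGb);
  return tier?.baseOverheadGb ?? FALLBACK_OVERHEAD_GB;
}
```

This keeps the tier table in one place, so adding a new GPU class means adding one row instead of another else-if branch.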

@samuel032khoury (Contributor) left a comment
OK


3 participants