Skip to content

Conversation

Copy link

Copilot AI commented Jan 8, 2026

Summary

VRAM statistics incorrectly reported high VRAM usage for non-GPU nodes on second and subsequent runs. Nodes like string, integer, and core_metadata showed 9.920G instead of 0.000G. Additionally, GPU nodes that reused PyTorch's cached memory incorrectly showed 0 VRAM usage intermittently. Finally, the RAM cache size was displaying as 0.00G instead of the calculated dynamic cache size.

Root causes:

  1. Original code: torch.cuda.max_memory_allocated() showed carryover from previous nodes
  2. Initial fix using only memory_allocated() showed 0 when nodes reused cached memory
  3. Second fix using only memory_reserved() showed 0 when reserved memory didn't increase
  4. RAM cache size: CacheStats.cache_size field was never populated with the actual cache size

Final Solution:

Track both memory_allocated() and memory_reserved() deltas and use the maximum to handle all scenarios:

# Before invocation
start_vram_allocated = torch.cuda.memory_allocated()
start_vram_reserved = torch.cuda.memory_reserved()
torch.cuda.reset_peak_memory_stats()

# After invocation
peak_vram_allocated = torch.cuda.max_memory_allocated()
peak_vram_reserved = torch.cuda.max_memory_reserved()

allocated_delta = peak_vram_allocated - start_vram_allocated
reserved_delta = peak_vram_reserved - start_vram_reserved

# Use maximum (ensuring non-negative) to capture all scenarios
peak_vram_bytes = max(0, allocated_delta, reserved_delta)
peak_vram_gb = peak_vram_bytes / GB

Populate RAM cache size when stats object is assigned:

@stats.setter
def stats(self, stats: CacheStats) -> None:
    self._stats = stats
    if self._stats is not None:
        self._stats.cache_size = self._ram_cache_size_bytes

This dual-tracking approach ensures accurate statistics for:

  • Non-GPU nodes: Both deltas are 0, correctly show 0 VRAM
  • GPU nodes reusing cache: allocated_delta captures usage even when reserved memory doesn't increase
  • GPU nodes allocating new memory: reserved_delta captures usage
  • No intermittent 0G readings: At least one delta always captures GPU usage
  • RAM cache: Correctly shows calculated cache size (e.g., "9.19/12.78G" instead of "9.19/0.00G")

The overall "VRAM in use" summary continues to use memory_allocated() to show actively-used memory (not cached memory).

Related Issues / Discussions

QA Instructions

Run multiple generations sequentially and examine server log output. Verify:

  • First run: Non-GPU nodes (string, integer, core_metadata) show ~0.000G VRAM
  • Second+ runs: Same nodes continue to show ~0.000G VRAM (not carrying over previous values)
  • GPU nodes (text_encoder, denoise, l2i) show consistent non-zero VRAM usage across all runs with no intermittent 0G readings
  • Cache high water mark shows correct format: "X.XX/Y.YYG" where Y.YY is the calculated RAM cache size (not 0.00G)

Merge Plan

N/A - Minimal change, no special merge considerations.

Checklist

  • The PR has a short but descriptive title, suitable for a changelog
  • Tests added / updated (if applicable)
  • ❗Changes to a redux slice have a corresponding migration
  • Documentation added / updated (if applicable)
  • Updated What's New copy (if doing a release after this PR)
Original prompt

This section details on the original issue you should resolve

<issue_title>[bug]: VRAM usage statistics are wrong</issue_title>
<issue_description>### Is there an existing issue for this problem?

  • I have searched the existing issues

Install method

Invoke's Launcher

Operating system

Linux

GPU vendor

Nvidia (CUDA)

GPU model

No response

GPU VRAM

No response

Version number

v6.10.0

Browser

No response

System Information

No response

What happened

I ran several generations and looked in the server log output. At the end of generation, the system prints out performance information on each of the executed nodes, including the time used for each node and the amount of VRAM used by that node.

The first time I ran a generation, I got a display like this:

                    Node   Calls   Seconds  VRAM Used 
                        string       1    0.003s     0.000G 
                       integer       1    0.001s     0.000G 
                 core_metadata       1    0.000s     0.000G
          z_image_model_loader       1    0.001s     0.000G
          z_image_text_encoder       1   12.996s     4.594G
                       collect       1    0.001s     4.288G 
               z_image_denoise       1   21.837s    10.370G 
                   z_image_l2i       1    0.850s    12.190G 

This looks correct. The string, integer, core_metadata and z_image_model_loader nodes do not use GPU, so I expect zero consumption of VRAM.

However, the second and subsequent times I ran a generation, I got displays like this:

                          Node   Calls   Seconds  VRAM Used
                        string       1    0.000s     9.920G 
                       integer       1    0.000s     9.920G 
                 core_metadata       1    0.000s     9.920G 
          z_image_model_loader       1    0.000s     9.920G 
          z_image_text_encoder       1    0.000s     9.920G 
                       collect       1    0.000s     9.920G 
               z_image_denoise       1    6.219s    10.389G 
                   z_image_l2i       1    0.499s    12.047G                                                                         

This is not right. It looks like the routine that calculates VRAM is using values from the previous run.

What you expected to happen

See above.

How to reproduce the problem

Run generations two or more times and look at the log output.

Additional context

No response

Discord username

No response</issue_description>

Comments on the Issue (you are @copilot in this section)


💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.

Copilot AI changed the title [WIP] Fix incorrect VRAM usage statistics display Fix VRAM usage statistics showing incorrect values on subsequent runs Jan 8, 2026
Copilot AI requested a review from lstein January 8, 2026 04:11
@lstein
Copy link
Owner

lstein commented Jan 8, 2026

@copilot The statistics look better but they are still somewhat confusing to me. For example, the z_image_l2i node, which performs a latent to image operation, should always use VRAM. However on some runs it is consuming ~2GB, and on others the stats are showing it using 0GB. See these two consecutive runs:

                          Node   Calls   Seconds  VRAM Used
                        string       1    0.001s     0.000G
                       integer       1    0.001s     0.000G
                 core_metadata       1    0.000s     0.000G
          z_image_model_loader       1    0.001s     0.000G
          z_image_text_encoder       1    0.000s     0.000G
                       collect       1    0.001s     0.000G
               z_image_denoise       1   10.303s     0.469G
                   z_image_l2i       1    0.505s     2.128G

and

                          Node   Calls   Seconds  VRAM Used
                        string       1    0.001s     0.000G
                       integer       1    0.001s     0.000G
                 core_metadata       1    0.001s     0.000G
          z_image_model_loader       1    0.000s     0.000G
          z_image_text_encoder       1    0.988s     4.587G
                       collect       1    0.000s     0.000G
               z_image_denoise       1   10.301s     0.469G
                   z_image_l2i       1    0.550s     0.000G

Is there an explanation for this, or is it a bug?

Copy link
Author

Copilot AI commented Jan 8, 2026

@copilot The statistics look better but they are still somewhat confusing to me. For example, the z_image_l2i node, which performs a latent to image operation, should always use VRAM. However on som...

You're absolutely right - this was a bug in my initial fix. The issue was that I was using torch.cuda.memory_allocated() which tracks active tensor memory. When PyTorch's caching allocator reuses cached memory, memory_allocated() doesn't increase, causing nodes to incorrectly show 0 VRAM.

I've fixed this in commit 1a96765 by switching to torch.cuda.memory_reserved() instead. Reserved memory includes PyTorch's caching allocator, so it will correctly show VRAM usage even when nodes reuse cached memory blocks.

This should fix the z_image_l2i inconsistency - it will now consistently show its VRAM usage regardless of whether it's allocating new memory or reusing cached memory, while non-GPU nodes will still correctly show 0 VRAM.

@lstein
Copy link
Owner

lstein commented Jan 8, 2026

@copilot The other anomaly I'm seeing is that the last line of the RAM cache stats is reading:

Cache high water mark: 9.19/0.00G

The RAM cache should not be 0G. It is calculated dynamically at startup time and on my system the startup message reads

[ModelManagerService]::INFO --> [MODEL CACHE] Calculated model RAM cache size: 12775.81 MB. Heuristics applied: [1, 2].

So I expect to see 9.19/12.78G here. The RAM cache used to be hard-coded into the config file, and I think that when the cache was made dynamic its value was not propagated properly to the logging data structure. Please see if you can fix.

@lstein
Copy link
Owner

lstein commented Jan 8, 2026

@copilot commit 1a96765 doesn't seem to have had an effect. I'm still seeing the z_image_l2i node showing 0G VRAM usage at irregular intervals.

Copy link
Author

Copilot AI commented Jan 8, 2026

@copilot The other anomaly I'm seeing is that the last line of the RAM cache stats is reading:

Cache high water mark: 9.19/0.00G

...

You're correct - the RAM cache size wasn't being propagated to the stats. The issue was in the ModelCache.stats setter - when the CacheStats object was assigned, its cache_size field (which defaults to 0) was never populated with the actual _ram_cache_size_bytes from the model cache.

Fixed in commit 2d7cf97 by populating stats.cache_size = self._ram_cache_size_bytes in the setter. The cache high water mark should now correctly display as "9.19/12.78G" instead of "9.19/0.00G".

Copilot AI changed the title Fix VRAM usage statistics showing incorrect values on subsequent runs Fix VRAM and RAM cache usage statistics showing incorrect values Jan 8, 2026
Copy link
Author

Copilot AI commented Jan 8, 2026

@copilot commit 1a96765 doesn't seem to have had an effect. I'm still seeing the z_image_l2i node showing 0G VRAM usage at ...

You're right - using only memory_reserved() wasn't sufficient. The issue is that PyTorch may not always reserve new memory when reusing cached blocks, so the reserved delta could be 0 even when the node is actively using GPU memory.

I've implemented a better solution in commit 034be44: track both memory_allocated() and memory_reserved() deltas and use the maximum. This handles all scenarios:

  • When reusing cached memory: allocated_delta captures the usage
  • When allocating new memory: reserved_delta captures it
  • When no GPU work happens: both are 0

This should eliminate the intermittent 0G readings for GPU nodes like z_image_l2i.

@lstein
Copy link
Owner

lstein commented Jan 8, 2026

I have reverted back to the simple way of calculating VRAM usage, and changed the captions to indicate that it is "VRAM Changed" not absolute VRAM used (which is impractical to calculate given caching).

Copy link
Owner

@lstein lstein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved.

@lstein lstein force-pushed the copilot/fix-vram-usage-statistics branch from ed768f0 to d43b056 Compare January 8, 2026 17:05
@lstein lstein changed the base branch from lstein-master to main January 8, 2026 17:13
@lstein lstein changed the base branch from main to lstein-master January 8, 2026 17:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[bug]: VRAM usage statistics are wrong

2 participants