Skip to content

GS: Backup texture to upper 8 bits of Z during format conversion (WIP)#13133

Draft
TJnotJT wants to merge 1 commit intoPCSX2:masterfrom
TJnotJT:gs-keepz32bits
Draft

GS: Backup texture to upper 8 bits of Z during format conversion (WIP)#13133
TJnotJT wants to merge 1 commit intoPCSX2:masterfrom
TJnotJT:gs-keepz32bits

Conversation

@TJnotJT
Copy link
Copy Markdown
Contributor

@TJnotJT TJnotJT commented Aug 13, 2025

Status: Waiting on this PR #13159 (refactors to make implmentations cleaner)

Draft PR to address issue noted here: #13037. Only OpenGL has been implemented and some other things are also unfinished (see below). Any feedback/criticism on whether this is a feasible approach is much appreciated.

All renderers except OpenGL are currently broken in the PR (including software since its uses Vulkan shaders).

Description of Changes

Add a new member to GSTexture to keep track of 32 bit Z when Z32 is converted to Z24, and the code needed to make sure that the upper 8 bits of Z are preserved when converting from Z24 back to Z32 or to C32/24.

Unfinished things:

  1. Conversions from Z24 -> Z32 to a 16 (rather than 32) bit format.
  2. Not sure yet how to handle clears, whether the upper 8 bits should be preserved or cleared also when Z buf is in Z24.
  3. Handling other conversions like Z32 -> Z24 then using upper 8 bits as P8H, etc. (not sure if games actually do this).

Rationale behind Changes

Allows preserving the upper 8 bits of Z that might help the issue above.

Suggested Testing Steps

Might affects a lot of code so testing any game for bugs (there are probably a few) would help. Haven't yet done a fully dump run.

Army Men - Major Malfunction_SLES-53996_20250806151204.zip

Master (OpenGL):
00102_f00001_fr-1_02800_C_16S

PR (OpenGL):
00102_f00001_fr-1_02800_C_16S

Did you use AI to help find, test, or implement this issue or feature?

GitHub copilot.

Copy link
Copy Markdown
Contributor

@lightningterror lightningterror left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really a fan how the shader copy is handled, is there a more cleaner way?

ds->m_alpha_min = std::min(static_cast<u32>(ds->m_alpha_min), static_cast<u32>(m_vt.m_min.p.z) >> 24);
// Clamp to prevent overflow with float -> int conversion.
ds->m_alpha_max = std::max(static_cast<u32>(ds->m_alpha_max), std::clamp(static_cast<u32>(m_vt.m_max.p.z * 0x1.0fp-24), 0u, 255u));
ds->m_alpha_min = std::min(static_cast<u32>(ds->m_alpha_min), std::clamp(static_cast<u32>(m_vt.m_min.p.z * 0x1.0fp-24), 0u, 255u));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any need to convert this to a float operation? the original method worked fine.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly it doesn't make a difference in games but I was having issues with a homebrew. I used Z = 0xFFFFFFFFu on a sprite so the vertex trace had (float)0xFFFFFFFFu, which is outside the range of u32 so casting back to (u32)(float)0xFFFFFFFFu is undefined (iirc, 0 on my machine). I can revert it back though if that is preferable.

Copy link
Copy Markdown
Member

@refractionpcsx2 refractionpcsx2 Aug 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how is 0xFFFFFFFF out of range of u32? it's literally UINT_MAX. Also the values come in as unsigned integer. But why are you converting it to float? You didn't answer that, you could just shift it right 24 and cast it to u8 instead of u32, if you only want the top 8bits.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0xFFFFFFFFu is not representable as a float so (float)0xFFFFFFFFu get rounded to 2^32, which is out of range.

All the XYZ values in the vertex trace like m_vt.m_max.p.z are floats. It would probably have been better if Z was stored as u32, but that's how it currently is.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Z is stored as float as a consequence of XY being stored as float and using GSVector4 and maybe aids normalization? idk, but it's really not representative of what really gets sent to the GS which is what we probably should be storing, maybe that needs changing to GSVector2 and z splitting out as a uint, but the struct would need updating if we didn't want to have to change every reference.

Copy link
Copy Markdown
Contributor Author

@TJnotJT TJnotJT Aug 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm not really sure why it was done that way. I did a search for m_vt.m_min.p and it looks like it shouldn't be too hard to split Z into its own uint as Z seems to always be accessed separately from XY in the GSVector4. Might have to leave XY as a GSVector4 though because there are a few uses of swizzling, upld(), etc.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if that's also true for software.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These were the only ones I could find in software:

GSVector4i bbox = GSVector4i(m_vt.m_min.p.floor().upld(m_vt.m_max.p.ceil()));
gd.sel.zoverflow = (u32)GSVector4i(m_vt.m_max.p).z == 0x80000000U;
gd.sel.zclamp = (u32)GSVector4i(m_vt.m_max.p).z > z_max;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So software was relying on overflow conversion failure, cool. lol

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lol, yeah seems like it.

// Convert a GL_FLOAT32 depth texture into a RGBA color texture
uint d = uint(sample_c().r * exp2(32.0f));
uint d2 = uint(sample_c2().r * exp2(8.0f));
SV_Target0 = vec4(uvec4((d & 0xFFu), ((d >> 8) & 0xFFu), ((d >> 16) & 0xFFu), clamp(d2, 0, 0xFFu))) / vec4(255.0);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you storing the old alpha as depth? It would make more sense to store it in color since PSMT8H which is just the alpha channel of RGBA is an actual GS format, it would be more usable in the future.

Copy link
Copy Markdown
Contributor Author

@TJnotJT TJnotJT Aug 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It allows the backup of the upper 8 bits of depth to be lazier in terms of texture copies. This way when Z32 is cast to Z24, there is a single texture copy to remove the upper 8 bits (as currently done), but I just keep the old Z32 texture as a backup instead of recycling it. If the texture is ever read back as C32, C24, etc., then we use both the upper 8 bits from the old Z32 and the lower 24 from the Z24 to get all the channels. I haven't yet added the code to convert to PSMT8H, but the same thing could be done in that case.

We could definitely save the upper 8 bits as a color texture though it would take an additional texture copy when casting Z32 to Z24. Let me know if this is preferable.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's quite likely that the upper 8 bits may get used for something else and may get combined later from 8H, so it would make more sense to convert to/from that, than depth of upper 8 bits.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, I will change it to that.

@TJnotJT
Copy link
Copy Markdown
Contributor Author

TJnotJT commented Aug 17, 2025

Not really a fan how the shader copy is handled, is there a more cleaner way?

Yea, the shader code and the StretchRect() code is a bit messy right now. I'll try to clean it up a bit. Anything specific you had in mind? One option is to add a single #define to the shader in GetShaderSource() to switch between single depth and double depth to avoid having defined(...) || defined (....) ... etc.

@lightningterror
Copy link
Copy Markdown
Contributor

I was thinking if possible to avoid touching the StretchRects since it looks messy now, or try to minimize it, could also be just on the final StretchRect.

@TJnotJT
Copy link
Copy Markdown
Contributor Author

TJnotJT commented Aug 17, 2025

I was thinking if possible to avoid touching the StretchRects since it looks messy now, or try to minimize it, could also be just on the final StretchRect.

Yea, some things can be localized to a single StretchRect(), preferably in GSDevice so that it doesn't have to be duplicated in all renderers.

I can also remove the extra argument to GSDeviceOGL::StretchRect() for using the upper-8-bit backup since that info is passed in the texture anyway.

@TJnotJT
Copy link
Copy Markdown
Contributor Author

TJnotJT commented Aug 17, 2025

Fixed the shader compilation issue with the int/uint overloading ambiguity so should hopefully compile on all drivers.

Still have to do the other fixes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants