Skip to content

Conversation

@softcookiepp
Copy link

As the title says, the creation of vk::ShaderModule objects has been moved from kp::Algorithm kp::Module, a new class.
This will make it easier to implement caching of vk::ShaderModule objects, as well as add aditional features for shader module metadata handling in the future.

Copy link
Member

@axsaucedo axsaucedo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have done an initial review, seems like still a way to go, there's a few inconsistencies that need to be addressed, as well as caveats on existing approach and coding style. There also doesn't seem to be a fully complete implementation as this suggests an optionally owned resource but currently the PR does not introduce the functionaly to make it optionally owned. We also need to ensure new functionality is tested, especially in contexts where these resources are owned, and particularly assessing the memory management / lifecycle aspect.

@@ -0,0 +1,27 @@
#pragma once
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is quite random, what is this here for - remove

/*
* getter for mShaderModule
*/
vk::ShaderModule& getShaderModule() { return mShaderModule; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be const and should not return a mutable reference

public:

/*
* Constructor accepting a device and a SPIR-V binary
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment conventions should align to rest of codebase - see manager.hpp

// the vulkan device; not owned by this object
std::weak_ptr<vk::Device> mDevice;

// the shader module handle
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Superfluous comment, see other classes for examples

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • needs clarity on ownerhsip as this is sometimes owned resource

* Wrapper for Vulkan's shader modules.
* The purpose of this is to manage the module lifetime, while
* building the groundwork for easily integrating things like
* SPIR-V reflection and multiple entry points in the future.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rewrite as description of class as of now not for future / groundwork

* building the groundwork for easily integrating things like
* SPIR-V reflection and multiple entry points in the future.
*/
class Module : public std::enable_shared_from_this<Module>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming convention here is not consistent, should be Shader as per file - module is a generic keyword

bool mFreeDescriptorSet = false;
std::shared_ptr<vk::ShaderModule> mShaderModule;
bool mFreeShaderModule = false;
std::shared_ptr<Module> mModule = nullptr;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is suggested as optionally owned resource but it's not handled as such, follow the same convention as per the rest of the optionally owned resources

src/Shader.cpp Outdated
spv.size());
vk::ShaderModuleCreateInfo shaderModuleInfo(vk::ShaderModuleCreateFlags(),
sizeof(uint32_t) * spv.size(), spv.data());
this->mDevice.lock()->createShaderModule(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently we don't use device.lock throughout the codebase, let's ensure consitency

class Module : public std::enable_shared_from_this<Module>
{
// the vulkan device; not owned by this object
std::weak_ptr<vk::Device> mDevice;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Device is not a weakptr as we need to ensure that this resource doesnt outlive the lifecylce by storing as strong reference and/or ensuring this is handled correctly

src/Shader.cpp Outdated
{
KP_LOG_DEBUG("Kompute Module destructor started");
KP_LOG_DEBUG("Kompute Module Destroying shader module");
if (!mDevice.expired() )
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ensure consistency with other classes, including coding style

@softcookiepp
Copy link
Author

Alright, that should cover everything.

@softcookiepp softcookiepp requested a review from axsaucedo January 8, 2026 19:43
Copy link
Member

@axsaucedo axsaucedo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @softcookiepp - all tests are failing, let's make sure tests pass locally before re-submitting

@softcookiepp
Copy link
Author

Right now I cannot even get the tests to run on my system due to this issue: #445
I will likely have to fix this as well before merging; is that alright?

@axsaucedo
Copy link
Member

Oh right I see, you could also try running it with act https://github.com/nektos/act, as the CI runs with SwiftShader (https://github.com/KomputeProject/kompute/blob/master/docker-builders/Swiftshader.Dockerfile); or you can just set up SwiftShader, which allows you to run against CPU instead.

@softcookiepp
Copy link
Author

Thanks! Will do.

@softcookiepp
Copy link
Author

softcookiepp commented Jan 12, 2026

Alright, I got all the tests passing. I also changed the vk::Device queue family selection criteria from just vk::QueueFlagBits::eCompute to (vk::QueueFlagBits::eCompute | vk::QueueFlagBits::eTransfer). In practice most queue families supporting vk::QueueFlagBits::eCompute, will also support vk::QueueFlagBits::eTransfer, but there are always edge cases. It is never a good idea to assume the driver will take care of everything in Vulkan; the only reason the test with multiple queues passed at all is because you got lucky with the way your drivers handle it!

The queue family selection in general could use some reworking to prevent invalid API usage (see here: #445 (comment)) but I will leave that for another day.

Copy link
Member

@axsaucedo axsaucedo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Further changes requested - some strange changes that don't have anything to do with the PR, let's avoid this. Let's also make sure that this code is tested by evaluating an Algo rebuild as this would trigger the full cycle.

Copy link
Member

@axsaucedo axsaucedo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a few more comments but seems almost there. Are there any tests that we can add as well? Seems to me this segfault piece would benefit from one. We also need to make sure that the documentation is updated. Once these are resolved we should be able to merge. Thank you.

@axsaucedo
Copy link
Member

Thanks @softcookiepp - almost there, I just added a follow up reply to claify the tests

@axsaucedo
Copy link
Member

@softcookiepp just reviewed the last outsanding comment - looks good, nice one. Last remaining ones are: 1) ensure DCO is updated (if you click the job it will tell you how). 2) Review to assess if there's any further tests that need to be added, if none all good. 3) Ensure documentation is updated (currently there's automated docstrings for classes so needs to be added respectively - we can leave diagrams as a followup as these classes are surfaced for reuse. We should then be good to merge.

@softcookiepp
Copy link
Author

Sounds good. I don't think new tests are necessary yet. The issue with the destructor should already be implicitly tested everywhere else (since it no longer crashes.) I will update the docs, etc. soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants