Implementing streamed tiles system for big georeferenced images #205

Open: wants to merge 12 commits into base: master
192 changes: 168 additions & 24 deletions 62_CAD/DrawResourcesFiller.cpp
@@ -631,20 +631,19 @@ bool DrawResourcesFiller::ensureMultipleStaticImagesAvailability(std::span<Stati
return true;
}

bool DrawResourcesFiller::ensureGeoreferencedImageAvailability_AllocateIfNeeded(image_id imageID, const GeoreferencedImageParams& params, SIntendedSubmitInfo& intendedNextSubmit)
bool DrawResourcesFiller::ensureGeoreferencedImageAvailability_AllocateIfNeeded(StreamedImageManager& manager, SIntendedSubmitInfo& intendedNextSubmit)
Erfan-Ahmadi (Contributor), Jul 30, 2025:

OK, so I have a few requests about this.
This function should only care about the GPU allocation and the image itself, and decide which resolution to create it with (i.e. decide whether to go full-res or streamed with tiles).

You can determine this from just three inputs: 1. OriginalImageResolution, 2. ViewportResolution, 3. TileSize, and nothing more.

I don't see any need for the manager to be passed through here.
If this function is doing more than figuring out what resolution to allocate the image in VRAM with, that's a sign something is wrong.

Then, when you're in the addGeorefImage function, you can look up the cached image and see whether it was decided to be "Streamed" or "FullRes". Depending on that, you either request loading the whole image from the loader OR you request tiles (through your tile manager or whatever).
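A minimal sketch of what such a manager-free decision could look like, assuming only the three inputs named above. The names `GeorefStorage`, `Extent2D`, `GeorefAllocationDecision`, `roundUpToMultiple` and `decideGeorefAllocation` are hypothetical and not part of the PR; the pixel-count heuristic and the two extra tiles of padding simply mirror what the diff currently does in `determineGeoreferencedImageCreationParams`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical types for illustration only; none of these exist in the PR.
enum class GeorefStorage { FullResolution, Streamed };

struct Extent2D { uint32_t width; uint32_t height; };

struct GeorefAllocationDecision
{
    GeorefStorage storage; // how the image will live in VRAM
    Extent2D gpuExtent;    // extent of the GPU image to allocate
};

// Round `value` up to the next multiple of `multiple`.
constexpr uint32_t roundUpToMultiple(uint32_t value, uint32_t multiple)
{
    return ((value + multiple - 1u) / multiple) * multiple;
}

// Pure function: depends only on the original resolution, the viewport resolution and the tile size.
inline GeorefAllocationDecision decideGeorefAllocation(Extent2D original, Extent2D viewport, uint32_t tileSize)
{
    assert(tileSize > 0u);
    const uint64_t imagePixels = uint64_t(original.width) * original.height;
    const uint64_t viewportPixels = uint64_t(viewport.width) * viewport.height;

    // Same heuristic as the diff: keep the image fully resident if it has no more pixels than the viewport.
    if (imagePixels <= viewportPixels)
        return { GeorefStorage::FullResolution, original };

    // Streamed: viewport padded to a tile multiple, plus two extra tiles per axis (as in the diff).
    return { GeorefStorage::Streamed,
             { roundUpToMultiple(viewport.width, tileSize) + 2u * tileSize,
               roundUpToMultiple(viewport.height, tileSize) + 2u * tileSize } };
}
```

With a 1920x1080 viewport and 128-texel tiles, an 8192x8192 source would come out as streamed with a 2176x1408 GPU image, while a 512x512 source would stay full-res.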

{
auto* device = m_utilities->getLogicalDevice();
auto* physDev = m_utilities->getLogicalDevice()->getPhysicalDevice();

// Try inserting or updating the image usage in the cache.
// If the image is already present, updates its semaphore value.
auto evictCallback = [&](image_id imageID, const CachedImageRecord& evicted) { evictImage_SubmitIfNeeded(imageID, evicted, intendedNextSubmit); };
CachedImageRecord* cachedImageRecord = imagesCache->insert(imageID, intendedNextSubmit.getFutureScratchSemaphore().value, evictCallback);
CachedImageRecord* cachedImageRecord = imagesCache->insert(manager.georeferencedImageParams.imageID, intendedNextSubmit.getFutureScratchSemaphore().value, evictCallback);

// TODO: Function call that gets you image creaation params based on georeferencedImageParams (extents and mips and whatever), it will also get you the GEOREFERENED TYPE
// TODO: Function call that gets you image creaation params based on georeferencedImageParams (extents and mips and whatever), it will also get you the GEOREFERENCED TYPE
IGPUImage::SCreationParams imageCreationParams = {};
ImageType georeferenceImageType;
determineGeoreferencedImageCreationParams(imageCreationParams, georeferenceImageType, params);
determineGeoreferencedImageCreationParams(imageCreationParams, manager);

// imageParams = cpuImage->getCreationParameters();
imageCreationParams.usage |= IGPUImage::EUF_TRANSFER_DST_BIT|IGPUImage::EUF_SAMPLED_BIT;
@@ -671,11 +670,11 @@ bool DrawResourcesFiller::ensureGeoreferencedImageAvailability_AllocateIfNeeded(
const auto cachedImageType = cachedImageRecord->type;
// image type and creation params (most importantly extent and format) should match, otherwise we evict, recreate and re-pus
const auto currentParams = static_cast<asset::IImage::SCreationParams>(imageCreationParams);
const bool needsRecreation = cachedImageType != georeferenceImageType || cachedParams != currentParams;
const bool needsRecreation = cachedImageType != manager.imageType || cachedParams != currentParams;
if (needsRecreation)
{
// call the eviction callback so the currently cached imageID gets eventually deallocated from memory arena.
evictCallback(imageID, *cachedImageRecord);
evictCallback(manager.georeferencedImageParams.imageID, *cachedImageRecord);

// instead of erasing and inserting the imageID into the cache, we just reset it, so the next block of code goes into array index allocation + creating our new image
*cachedImageRecord = CachedImageRecord(currentFrameIndex);
@@ -705,17 +704,17 @@ bool DrawResourcesFiller::ensureGeoreferencedImageAvailability_AllocateIfNeeded(
if (cachedImageRecord->arrayIndex != video::SubAllocatedDescriptorSet::AddressAllocator::invalid_address)
{
// Attempt to create a GPU image and image view for this texture.
ImageAllocateResults allocResults = tryCreateAndAllocateImage_SubmitIfNeeded(imageCreationParams, asset::E_FORMAT::EF_COUNT, intendedNextSubmit, std::to_string(imageID));
ImageAllocateResults allocResults = tryCreateAndAllocateImage_SubmitIfNeeded(imageCreationParams, asset::E_FORMAT::EF_COUNT, intendedNextSubmit, std::to_string(manager.georeferencedImageParams.imageID));

if (allocResults.isValid())
{
cachedImageRecord->type = georeferenceImageType;
cachedImageRecord->type = manager.imageType;
cachedImageRecord->state = ImageState::CREATED_AND_MEMORY_BOUND;
cachedImageRecord->lastUsedFrameIndex = currentFrameIndex; // there was an eviction + auto-submit, we need to update AGAIN
cachedImageRecord->allocationOffset = allocResults.allocationOffset;
cachedImageRecord->allocationSize = allocResults.allocationSize;
cachedImageRecord->gpuImageView = allocResults.gpuImageView;
cachedImageRecord->staticCPUImage = nullptr;
cachedImageRecord->staticCPUImage = manager.georeferencedImageParams.geoReferencedImage;
Contributor:

This should stay nullptr; georeferenced images should have nothing to do with staticCPUImage. The CachedImageRecord only holds the status of the GPU-allocated georef image.
Then you just copy your regions with buffers into it.

Contributor:

I.e. the CPU image should only be part of the loader, not the draw resources filler OR the tile calculator.

}
else
{
@@ -743,7 +742,7 @@ bool DrawResourcesFiller::ensureGeoreferencedImageAvailability_AllocateIfNeeded(
}

// erase the entry we failed to fill, no need for `evictImage_SubmitIfNeeded`, because it didn't get to be used in any submit to defer it's memory and index deallocation
imagesCache->erase(imageID);
imagesCache->erase(manager.georeferencedImageParams.imageID);
}
}
else
@@ -867,7 +866,7 @@ void DrawResourcesFiller::addImageObject(image_id imageID, const OrientedBoundin
endMainObject();
}

void DrawResourcesFiller::addGeoreferencedImage(image_id imageID, const GeoreferencedImageParams& params, SIntendedSubmitInfo& intendedNextSubmit)
void DrawResourcesFiller::addGeoreferencedImage(StreamedImageManager& manager, const float64_t3x3& NDCToWorld, SIntendedSubmitInfo& intendedNextSubmit)
{
beginMainObject(MainObjectType::STREAMED_IMAGE);

@@ -879,11 +878,21 @@ void DrawResourcesFiller::addGeoreferencedImage(image_id imageID, const Georefer
return;
}

// Generate upload data
auto uploadData = manager.generateTileUploadData(NDCToWorld);
Contributor:

Correct/good decision to calculate the tiling and which sections to load here in addGeoreferencedImage.
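For readers following the tiling math, here is a minimal standalone sketch of the world-to-tile-lattice mapping that `generateTileUploadData` relies on (displacement from `topLeft`, change of basis to `{dirU, dirV}`, then scaling to tile units). It uses plain structs instead of the Nabla HLSL math types; `Vec2`, `ImagePlacement`, `worldToTileLattice` and `clampTileIndex` are hypothetical names, and it assumes `dirU` and `dirV` are orthogonal (no shearing), as the PR's own comments do.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

struct Vec2 { double x; double y; };

// Hypothetical stand-in for the PR's georeferenced image description.
struct ImagePlacement
{
    Vec2 topLeft;       // world position of the image's top-left corner
    Vec2 dirU;          // world vector spanning the image's full width
    double aspectRatio; // |dirV| / |dirU|
    uint32_t widthPx;
    uint32_t heightPx;
};

// Maps a world-space point to fractional tile coordinates (tile (0,0) starts at the image's top-left corner).
inline Vec2 worldToTileLattice(const ImagePlacement& img, Vec2 p, uint32_t tileSize)
{
    assert(tileSize > 0u);
    // dirV is dirU rotated by 90 degrees and scaled by the aspect ratio, same convention as the diff.
    const Vec2 dirV{ img.dirU.y * img.aspectRatio, -img.dirU.x * img.aspectRatio };
    // 1. Displacement from the image's top-left corner.
    const Vec2 d{ p.x - img.topLeft.x, p.y - img.topLeft.y };
    // 2. Change of basis: project onto dirU and dirV (valid because they are orthogonal), giving uv in [0,1]^2 inside the image.
    const double u = (d.x * img.dirU.x + d.y * img.dirU.y) / (img.dirU.x * img.dirU.x + img.dirU.y * img.dirU.y);
    const double v = (d.x * dirV.x + d.y * dirV.y) / (dirV.x * dirV.x + dirV.y * dirV.y);
    // 3. Scale uv to (fractional) tile units.
    return { u * img.widthPx / tileSize, v * img.heightPx / tileSize };
}

// Floors a lattice coordinate and clamps it to the valid tile index range [0, maxIndex].
inline int32_t clampTileIndex(double latticeCoord, int32_t maxIndex)
{
    const int32_t floored = int32_t(std::floor(latticeCoord));
    return floored < 0 ? 0 : (floored > maxIndex ? maxIndex : floored);
}
```

Applying `worldToTileLattice` to the four viewport corners, taking the per-axis min and max, and passing those through `clampTileIndex` gives the range of tiles to request, which is the same sequence of steps the diff performs with `offsetCoBScaleMatrix`.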


// Queue image uploads - if necessary
if (manager.imageType == ImageType::GEOREFERENCED_STREAMED)
{
for (const auto& imageCopy : uploadData.tiles)
queueGeoreferencedImageCopy_Internal(manager.georeferencedImageParams.imageID, imageCopy);
}

GeoreferencedImageInfo info = {};
info.topLeft = params.worldspaceOBB.topLeft;
info.dirU = params.worldspaceOBB.dirU;
info.aspectRatio = params.worldspaceOBB.aspectRatio;
info.textureID = getImageIndexFromID(imageID, intendedNextSubmit); // for this to be valid and safe, this function needs to be called immediately after `addStaticImage` function to make sure image is in memory
info.topLeft = uploadData.worldspaceOBB.topLeft;
info.dirU = uploadData.worldspaceOBB.dirU;
info.aspectRatio = uploadData.worldspaceOBB.aspectRatio;
info.textureID = getImageIndexFromID(manager.georeferencedImageParams.imageID, intendedNextSubmit); // for this to be valid and safe, this function needs to be called immediately after `addStaticImage` function to make sure image is in memory
if (!addGeoreferencedImageInfo_Internal(info, mainObjIdx))
{
// single image object couldn't fit into memory to push to gpu, so we submit rendering current objects and reset geometry buffer and draw objects
@@ -1370,7 +1379,7 @@ bool DrawResourcesFiller::pushStaticImagesUploads(SIntendedSubmitInfo& intendedN
std::vector<CachedImageRecord*> nonResidentImageRecords;
for (auto& [id, record] : imagesCache)
{
if (record.staticCPUImage && record.type == ImageType::STATIC && record.state < ImageState::GPU_RESIDENT_WITH_VALID_STATIC_DATA)
if (record.staticCPUImage && (record.type == ImageType::STATIC || record.type == ImageType::GEOREFERENCED_FULL_RESOLUTION) && record.state < ImageState::GPU_RESIDENT_WITH_VALID_STATIC_DATA)
nonResidentImageRecords.push_back(&record);
}

@@ -1557,7 +1566,7 @@ bool DrawResourcesFiller::pushStreamedImagesUploads(SIntendedSubmitInfo& intende
std::vector<IGPUCommandBuffer::SPipelineBarrierDependencyInfo::image_barrier_t> afterCopyImageBarriers;
afterCopyImageBarriers.reserve(streamedImageCopies.size());

// Pipeline Barriers before imageCopy
// Pipeline Barriers after imageCopy
for (auto& [imageID, imageCopies] : streamedImageCopies)
{
auto* imageRecord = imagesCache->peek(imageID);
@@ -2461,30 +2470,43 @@ DrawResourcesFiller::ImageAllocateResults DrawResourcesFiller::tryCreateAndAlloc
return ret;
}

void DrawResourcesFiller::determineGeoreferencedImageCreationParams(nbl::asset::IImage::SCreationParams& outImageParams, ImageType& outImageType, const GeoreferencedImageParams& georeferencedImageParams)
void DrawResourcesFiller::determineGeoreferencedImageCreationParams(nbl::asset::IImage::SCreationParams& outImageParams, StreamedImageManager& manager)
{
auto& georeferencedImageParams = manager.georeferencedImageParams;
// Decide whether the image can reside fully into memory rather than get streamed.
// TODO: Improve logic, currently just a simple check to see if the full-screen image has more pixels than the viewport or not
// TODO: add criterion that the size of the full-res image shouldn't consume more than 30% of the total memory arena for images (if we allowed larger than viewport extents)
const bool betterToResideFullyInMem = georeferencedImageParams.imageExtents.x * georeferencedImageParams.imageExtents.y <= georeferencedImageParams.viewportExtents.x * georeferencedImageParams.viewportExtents.y;

if (betterToResideFullyInMem)
outImageType = ImageType::GEOREFERENCED_FULL_RESOLUTION;
manager.imageType = ImageType::GEOREFERENCED_FULL_RESOLUTION;
else
outImageType = ImageType::GEOREFERENCED_STREAMED;
manager.imageType = ImageType::GEOREFERENCED_STREAMED;

outImageParams.type = asset::IImage::ET_2D;
outImageParams.samples = asset::IImage::ESCF_1_BIT;
outImageParams.format = georeferencedImageParams.format;

if (outImageType == ImageType::GEOREFERENCED_FULL_RESOLUTION)
if (manager.imageType == ImageType::GEOREFERENCED_FULL_RESOLUTION)
{
outImageParams.extent = { georeferencedImageParams.imageExtents.x, georeferencedImageParams.imageExtents.y, 1u };
}
else
{
// TODO: Better Logic, area around the view, etc...
outImageParams.extent = { georeferencedImageParams.viewportExtents.x, georeferencedImageParams.viewportExtents.y, 1u };
// Pad sides to multiple of tileSize. Even after rounding up, we might still need to add an extra tile to cover both sides.
Contributor:

Continuing from https://github.com/Devsh-Graphics-Programming/Nabla-Examples-and-Tests/pull/205/files#r2242153374
I believe these calculations are better done inside the addGeorefImage function.
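As a rough illustration of how the "sliding window" OBB computation could be pulled out into a standalone helper callable from addGeorefImage, here is a sketch assuming square tiles and a uniform pixel-to-world scale (the same assumption the diff's comments make about aspect ratio). `Vec2`, `OBB2D` and `makeSlidingWindowOBB` are hypothetical names, not the PR's types.

```cpp
#include <cassert>
#include <cstdint>

struct Vec2 { double x; double y; };

// Hypothetical stand-in for OrientedBoundingBox2D.
struct OBB2D
{
    Vec2 topLeft;
    Vec2 dirU;          // world vector spanning the box's full width
    double aspectRatio; // height / width
};

// Builds an OBB anchored at the image's top-left corner that covers exactly
// residentTilesX x residentTilesY tiles, by shrinking dirU to the window's share of the image width.
inline OBB2D makeSlidingWindowOBB(const OBB2D& imageOBB, uint32_t imageWidthPx, uint32_t tileSize,
                                  uint32_t residentTilesX, uint32_t residentTilesY)
{
    assert(imageWidthPx > 0u && residentTilesX > 0u);
    // Fraction of the image width covered by the resident window.
    const double scale = double(tileSize) * residentTilesX / imageWidthPx;
    return { imageOBB.topLeft,
             { imageOBB.dirU.x * scale, imageOBB.dirU.y * scale },
             double(residentTilesY) / residentTilesX };
}
```

Keeping this a pure function of the image OBB, image width, tile size and resident tile counts means the allocation function no longer needs to write into the manager.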

// I added two to be safe and to avoid issues at the borders.
const auto xExtent = core::roundUp(georeferencedImageParams.viewportExtents.x, manager.TileSize) + 2 * manager.TileSize;
const auto yExtent = core::roundUp(georeferencedImageParams.viewportExtents.y, manager.TileSize) + 2 * manager.TileSize;
outImageParams.extent = { xExtent, yExtent, 1u };
manager.maxResidentTiles.x = xExtent / manager.TileSize;
manager.maxResidentTiles.y = yExtent / manager.TileSize;
// Create a "sliding window OBB" that we use to offset tiles
manager.fromTopLeftOBB.topLeft = georeferencedImageParams.worldspaceOBB.topLeft;
manager.fromTopLeftOBB.dirU = georeferencedImageParams.worldspaceOBB.dirU * float32_t(manager.TileSize * manager.maxResidentTiles.x) / float32_t(georeferencedImageParams.imageExtents.x);
manager.fromTopLeftOBB.aspectRatio = float32_t(manager.maxResidentTiles.y) / float32_t(manager.maxResidentTiles.x);
// I think aspect ratio can stay the same since worldspace OBB and imageExtents should have same aspect ratio.
// If the image can be stretched/sheared and not simply rotated, then the aspect ratio *might* have to change, although I think that's covered by
// the OBB's aspect ratio
}


@@ -2624,4 +2646,126 @@ void DrawResourcesFiller::flushDrawObjects()
drawCalls.push_back(drawCall);
drawObjectsFlushedToDrawCalls = resourcesCollection.drawObjects.getCount();
}
}

DrawResourcesFiller::StreamedImageManager::StreamedImageManager(GeoreferencedImageParams&& _georeferencedImageParams)
: georeferencedImageParams(std::move(_georeferencedImageParams))
{
maxImageTileIndices = georeferencedImageParams.imageExtents / uint32_t2(TileSize, TileSize);
// If it fits perfectly along any dimension, we need one less tile with this scheme
maxImageTileIndices -= uint32_t2(maxImageTileIndices.x * TileSize == georeferencedImageParams.imageExtents.x, maxImageTileIndices.y * TileSize == georeferencedImageParams.imageExtents.y);

// R^2 can be covered with a lattice of image tiles. Real tiles (those actually covered by the image) are indexed in the range [0, maxImageTileIndices.x] x [0, maxImageTileIndices.y],
// but part of the algorithm to figure out which tiles need to be resident for a draw involves figuring out the coordinates in this lattice of each of the viewport corners.
// To that end, we devise an algorithm that maps a point in worldspace to its coordinates in this tile lattice:
// 1. Get the displacement (will be an offset vector in world coords and world units) from the `topLeft` corner of the image to the point
// 2. Transform this displacement vector into a displacement into the coordinates spanned by the basis {dirU, dirV}. Notice that these vectors are still in world units
// 3. Map world units to tile units. This scaling is generally nonuniform, since it depends on the ratio of pixels to world units per coordinate.
// The name of the `offsetCoBScaleMatrix` follows by what is computed at each step

// 1. Displacement. The following matrix computes the offset for an input point `p` with homogenous worldspace coordinates.
// By foregoing the homogenous coordinate we can keep only the vector part, that's why it's `2x3` and not `3x3`
float64_t2 topLeftWorld = georeferencedImageParams.worldspaceOBB.topLeft;
float64_t2x3 displacementMatrix(1., 0., - topLeftWorld.x, 0., 1., - topLeftWorld.y);

// 2. Change of Basis. Since {dirU, dirV} are orthogonal, the matrix to change from world coords to `span{dirU, dirV}` coords has a quite nice expression
// Non-uniform scaling doesn't affect this, but this has to change if we allow for shearing (basis vectors stop being orthogonal)
float64_t2 dirU = georeferencedImageParams.worldspaceOBB.dirU;
float64_t2 dirV = float32_t2(dirU.y, -dirU.x) * georeferencedImageParams.worldspaceOBB.aspectRatio;
float64_t dirULengthSquared = nbl::hlsl::dot(dirU, dirU);
float64_t dirVLengthSquared = nbl::hlsl::dot(dirV, dirV);
float64_t2 firstRow = dirU / dirULengthSquared;
float64_t2 secondRow = dirV / dirVLengthSquared;
float64_t2x2 changeOfBasisMatrix(firstRow, secondRow);

// 3. Scaling. The vector obtained by doing `CoB * displacement * p` are now the coordinates in the `span{dirU, dirV}`, which would be `uv` coordinates in [0,1]^2
// (or outside this range for points not in the image). To get tile lattice coordinates, we need to scale this number by an nTiles vector which counts
// (fractionally) how many tiles fit in the image along each axis
float32_t2 nTiles = float32_t2(georeferencedImageParams.imageExtents) / float32_t2(TileSize, TileSize);
float64_t2x2 scaleMatrix(nTiles.x, 0., 0., nTiles.y);

// Put them all together
offsetCoBScaleMatrix = nbl::hlsl::mul(scaleMatrix, nbl::hlsl::mul(changeOfBasisMatrix, displacementMatrix));
}

DrawResourcesFiller::StreamedImageManager::TileUploadData DrawResourcesFiller::StreamedImageManager::generateTileUploadData(const float64_t3x3& NDCToWorld)
{
if (imageType == ImageType::GEOREFERENCED_FULL_RESOLUTION)
return TileUploadData{ {}, georeferencedImageParams.worldspaceOBB };

// Following need only be done if image is actually streamed

// Using Vulkan NDC, the viewport has coordinates in the range [-1, -1] x [1,1]. First we get the world coordinates of the viewport corners, in homogenous
const float64_t3 topLeftNDCH(-1., -1., 1.);
const float64_t3 topRightNDCH(1., -1., 1.);
const float64_t3 bottomLeftNDCH(-1., 1., 1.);
const float64_t3 bottomRightNDCH(1., 1., 1.);

const float64_t3 topLeftWorldH = nbl::hlsl::mul(NDCToWorld, topLeftNDCH);
const float64_t3 topRightWorldH = nbl::hlsl::mul(NDCToWorld, topRightNDCH);
const float64_t3 bottomLeftWorldH = nbl::hlsl::mul(NDCToWorld, bottomLeftNDCH);
const float64_t3 bottomRightWorldH = nbl::hlsl::mul(NDCToWorld, bottomRightNDCH);

// We can use `offsetCoBScaleMatrix` to get tile lattice coordinates for each of these points
const float64_t2 topLeftTileLattice = nbl::hlsl::mul(offsetCoBScaleMatrix, topLeftWorldH);
const float64_t2 topRightTileLattice = nbl::hlsl::mul(offsetCoBScaleMatrix, topRightWorldH);
const float64_t2 bottomLeftTileLattice = nbl::hlsl::mul(offsetCoBScaleMatrix, bottomLeftWorldH);
const float64_t2 bottomRightTileLattice = nbl::hlsl::mul(offsetCoBScaleMatrix, bottomRightWorldH);

// Get the min and max of each lattice coordinate
const float64_t2 minTop = nbl::hlsl::min(topLeftTileLattice, topRightTileLattice);
const float64_t2 minBottom = nbl::hlsl::min(bottomLeftTileLattice, bottomRightTileLattice);
const float64_t2 minAll = nbl::hlsl::min(minTop, minBottom);

const float64_t2 maxTop = nbl::hlsl::max(topLeftTileLattice, topRightTileLattice);
const float64_t2 maxBottom = nbl::hlsl::max(bottomLeftTileLattice, bottomRightTileLattice);
const float64_t2 maxAll = nbl::hlsl::max(maxTop, maxBottom);

// Floor them to get an integer for the tiles they're in
const int32_t2 minAllFloored = nbl::hlsl::floor(minAll);
const int32_t2 maxAllFloored = nbl::hlsl::floor(maxAll);

// Clamp them to reasonable tile indices
minLoadedTileIndices = nbl::hlsl::clamp(minAllFloored, int32_t2(0, 0), int32_t2(maxImageTileIndices));
maxLoadedTileIndices = nbl::hlsl::clamp(maxAllFloored, int32_t2(0, 0), nbl::hlsl::min(int32_t2(maxImageTileIndices), int32_t2(minLoadedTileIndices + maxResidentTiles - uint32_t2(1,1))));

// Now we have the indices of the tiles we want to upload, so create the vector of `StreamedImageCopies` - 1 per tile.
core::vector<StreamedImageCopy> tiles;
tiles.reserve((maxLoadedTileIndices.x - minLoadedTileIndices.x + 1) * (maxLoadedTileIndices.y - minLoadedTileIndices.y + 1));

// Assuming a 1 pixel per block format - otherwise math here gets a bit trickier
auto bytesPerPixel = getTexelOrBlockBytesize(georeferencedImageParams.format);
const size_t bytesPerSide = bytesPerPixel * TileSize;

// Dangerous code - assumes image can be perfectly covered with tiles. Otherwise will need to handle edge cases
for (uint32_t tileX = minLoadedTileIndices.x; tileX <= maxLoadedTileIndices.x; tileX++)
{
for (uint32_t tileY = minLoadedTileIndices.y; tileY <= maxLoadedTileIndices.y; tileY++)
{
asset::IImage::SBufferCopy bufCopy;
bufCopy.bufferOffset = (tileY * (maxImageTileIndices.x + 1) * TileSize + tileX) * bytesPerSide;
bufCopy.bufferRowLength = georeferencedImageParams.imageExtents.x;
bufCopy.bufferImageHeight = 0;
bufCopy.imageSubresource.aspectMask = IImage::EAF_COLOR_BIT;
bufCopy.imageSubresource.mipLevel = 0u;
bufCopy.imageSubresource.baseArrayLayer = 0u;
bufCopy.imageSubresource.layerCount = 1u;
bufCopy.imageOffset = { (tileX - minLoadedTileIndices.x) * TileSize, (tileY - minLoadedTileIndices.y) * TileSize, 0u };
bufCopy.imageExtent.width = TileSize;
bufCopy.imageExtent.height = TileSize;
bufCopy.imageExtent.depth = 1;

tiles.emplace_back(georeferencedImageParams.format, georeferencedImageParams.geoReferencedImage->getBuffer(), std::move(bufCopy));
}
}

// Last, we need to figure out an obb that covers only the currently loaded tiles
// By shifting the `fromTopLeftOBB` an appropriate number of tiles in each direction, we get an obb that covers at least the uploaded tiles
// It might cover more tiles, possible some that are not even loaded into VRAM, but since those fall outside of the viewport we don't really care about them
OrientedBoundingBox2D worldspaceOBB = fromTopLeftOBB;
const float32_t2 dirV = float32_t2(worldspaceOBB.dirU.y, -worldspaceOBB.dirU.x) * worldspaceOBB.aspectRatio;
worldspaceOBB.topLeft += worldspaceOBB.dirU * float32_t(minLoadedTileIndices.x) / float32_t(maxResidentTiles.x);
worldspaceOBB.topLeft += dirV * float32_t(minLoadedTileIndices.y) / float32_t(maxResidentTiles.y);
return TileUploadData{ std::move(tiles), worldspaceOBB };

}
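
The tile loop above is marked as assuming the image is perfectly covered by tiles. A minimal sketch of one way to lift that assumption, for a row-major source buffer with one texel per block: compute the byte offset from the real image width and clamp the copy extent at the right and bottom edges. `TileRegion` and `tileRegion` are hypothetical names, not part of the PR.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper result: where a tile starts in the source buffer and how many texels to copy.
struct TileRegion
{
    size_t bufferOffset; // byte offset of the tile's first texel in the row-major source buffer
    uint32_t width;      // texels to copy along x (smaller than tileSize on the right edge)
    uint32_t height;     // texels to copy along y (smaller than tileSize on the bottom edge)
};

inline TileRegion tileRegion(uint32_t tileX, uint32_t tileY, uint32_t tileSize,
                             uint32_t imageWidth, uint32_t imageHeight, size_t bytesPerTexel)
{
    const uint32_t x0 = tileX * tileSize;
    const uint32_t y0 = tileY * tileSize;
    assert(x0 < imageWidth && y0 < imageHeight);
    return { (size_t(y0) * imageWidth + x0) * bytesPerTexel,
             std::min(tileSize, imageWidth - x0),
             std::min(tileSize, imageHeight - y0) };
}
```

When the image width is an exact multiple of the tile size, the offset reduces to the expression used for `bufCopy.bufferOffset` in the diff; for a 1000x600 image with 128-texel tiles, the bottom-right tile (7, 4) would be copied as 104x88 texels.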