Commit 49fcc35

committed
Updates
1 parent a0c4524 commit 49fcc35

2 files changed

+2
-37
lines changed

content/learning-paths/mobile-graphics-and-gaming/android_halide/android.md

Lines changed: 1 addition & 36 deletions
@@ -346,42 +346,7 @@ The code defines three utility methods:
 2. extractGrayScaleBytes - converts a Bitmap into a grayscale byte array suitable for native processing.
 3. createBitmapFromGrayBytes - converts a grayscale byte array back into a Bitmap for display purposes.
 
-Note that performing the grayscale conversion in Halide allows us to exploit operator fusion, further improving performance by avoiding intermediate memory accesses. This could be done as follows:
-```cpp
-// Halide variables
-Halide::Var x("x"), y("y"), c("c");
-
-// Original RGB input buffer (interleaved RGB)
-Halide::Buffer<uint8_t> inputBuffer(inputRgbData, width, height, 3);
-
-// Convert RGB to grayscale directly in Halide pipeline
-Halide::Func grayscale("grayscale");
-grayscale(x, y) = Halide::cast<uint8_t>(
-    0.299f * inputBuffer(x, y, 0) +
-    0.587f * inputBuffer(x, y, 1) +
-    0.114f * inputBuffer(x, y, 2)
-);
-
-// Continue pipeline: Gaussian blur (example)
-Halide::Func blur("blur");
-Halide::RDom r(-1, 3, -1, 3);
-Halide::Expr kernel[3][3] = {
-    {1, 2, 1},
-    {2, 4, 2},
-    {1, 2, 1}
-};
-
-Halide::Expr blurSum = 0;
-for (int i = 0; i < 3; ++i) {
-    for (int j = 0; j < 3; ++j) {
-        blurSum += grayscale(x + r.x, y + r.y) * kernel[i][j];
-    }
-}
-blur(x, y) = Halide::cast<uint8_t>(blurSum / 16);
-
-// Fuse grayscale and blur operations
-grayscale.compute_at(blur, x);
-```
+Note that performing the grayscale conversion in Halide allows us to exploit operator fusion, further improving performance by avoiding intermediate memory accesses. This could be done as in our examples before (processing-workflow).
 
 The JNI integration occurs through an external method declaration, blurThresholdImage, loaded via the companion object at app startup. The native library (armhalideandroiddemo) containing this function is compiled separately and integrated into the application (native-lib.cpp).
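For readers following the grayscale note in this hunk outside of Halide, the per-pixel conversion can be sketched in plain C++. This is not code from the commit: the function name `rgbToGray` and the interleaved RGB layout are assumptions made for illustration, and the integer weights 299/587/114 are a rounded stand-in for the 0.299/0.587/0.114 coefficients in the removed snippet.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of the commit): converts an interleaved
// RGB buffer (R,G,B,R,G,B,...) into one grayscale byte per pixel.
std::vector<uint8_t> rgbToGray(const std::vector<uint8_t>& rgb, int width, int height) {
    const std::size_t pixels = static_cast<std::size_t>(width) * height;
    assert(rgb.size() == pixels * 3);  // three channels per pixel

    std::vector<uint8_t> gray(pixels);
    for (std::size_t i = 0; i < pixels; ++i) {
        const int r = rgb[3 * i];
        const int g = rgb[3 * i + 1];
        const int b = rgb[3 * i + 2];
        // Integer form of 0.299 R + 0.587 G + 0.114 B, rounded to nearest.
        gray[i] = static_cast<uint8_t>((299 * r + 587 * g + 114 * b + 500) / 1000);
    }
    return gray;
}
```

Integer arithmetic is used here only to keep the rounding predictable; a floating-point version with the original coefficients would serve the same purpose.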

content/learning-paths/mobile-graphics-and-gaming/android_halide/fusion.md

Lines changed: 1 addition & 1 deletion
@@ -528,4 +528,4 @@ Fusion isn’t always best. You’ll want to materialize an intermediate (comput
 The fastest way to check whether fusion helps is to measure it. Our demo prints timing and throughput per frame, but Halide also includes a built-in profiler that reports per-stage runtimes. To learn how to enable and interpret the profiler, see the official [Halide profiling tutorial](https://halide-lang.org/tutorials/tutorial_lesson_21_auto_scheduler_generate.html#profiling).
 
 ## Summary
-In this lesson, we learned about operation fusion in Halide, a powerful technique to reduce memory bandwidth and improve computational efficiency. We explored why fusion matters, identified scenarios where fusion is most effective, and demonstrated how Halide’s scheduling constructs (compute_at, store_at, fuse) enable you to apply fusion easily and effectively. By fusing the Gaussian blur and thresholding stages, we improved the performance of our real-time image processing pipeline.
+In this lesson, we learned about operator fusion in Halide, a powerful technique for reducing memory bandwidth and improving computational efficiency. We explored why fusion matters, looked at scenarios where it is most effective, and saw how Halide’s scheduling constructs such as compute_root() and compute_at() let us control whether stages are fused or materialized. By experimenting with different schedules, including fusing the Gaussian blur and thresholding stages, we observed how fusion can significantly improve the performance of a real-time image processing pipeline.
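
The fused-versus-materialized distinction in the updated summary can be illustrated with a plain C++ sketch. This is not Halide and not code from the commit: the function names, the clamped border handling, and the threshold parameter are assumptions chosen for illustration. The first function writes the blurred image to an intermediate buffer before thresholding (roughly what materializing a stage means), while the second thresholds each blurred value immediately, so no intermediate image is stored.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// 3x3 Gaussian weights (sum 16), matching the kernel in the removed snippet.
static const int kKernel[3][3] = {{1, 2, 1}, {2, 4, 2}, {1, 2, 1}};

// Blurred value at (x, y); coordinates are clamped at the border (an assumption).
static uint8_t blurAt(const std::vector<uint8_t>& in, int w, int h, int x, int y) {
    int sum = 0;
    for (int dy = -1; dy <= 1; ++dy) {
        for (int dx = -1; dx <= 1; ++dx) {
            const int cx = std::max(0, std::min(x + dx, w - 1));
            const int cy = std::max(0, std::min(y + dy, h - 1));
            sum += in[cy * w + cx] * kKernel[dy + 1][dx + 1];
        }
    }
    return static_cast<uint8_t>(sum / 16);
}

// Materialized: the whole blurred image is stored, then thresholded in a second pass.
std::vector<uint8_t> blurThresholdMaterialized(const std::vector<uint8_t>& in,
                                               int w, int h, uint8_t threshold) {
    assert(in.size() == static_cast<std::size_t>(w) * h);
    std::vector<uint8_t> blurred(in.size());
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            blurred[y * w + x] = blurAt(in, w, h, x, y);

    std::vector<uint8_t> out(in.size());
    for (std::size_t i = 0; i < out.size(); ++i)
        out[i] = blurred[i] > threshold ? 255 : 0;
    return out;
}

// Fused: each blurred value is thresholded right away; no intermediate buffer.
std::vector<uint8_t> blurThresholdFused(const std::vector<uint8_t>& in,
                                        int w, int h, uint8_t threshold) {
    assert(in.size() == static_cast<std::size_t>(w) * h);
    std::vector<uint8_t> out(in.size());
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            out[y * w + x] = blurAt(in, w, h, x, y) > threshold ? 255 : 0;
    return out;
}
```

Both functions produce the same output; they differ only in whether the blurred image is written to memory in between, which is the trade-off a Halide schedule lets you change without rewriting the algorithm.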
