Commit 3638bf6

Fixing notebook in Hristiyan's blog (#369)
* Delete images/blog/MergeSortTest.png
* Update 2026-03-01-Creating teaching materials for C++ and CUDA with xeus-cpp.md
* Fixed typos
* Update terms.txt
1 parent 96ce261 commit 3638bf6

3 files changed: +142 -6 lines changed

.github/actions/spelling/allow/terms.txt

Lines changed: 6 additions & 0 deletions
@@ -48,14 +48,17 @@ backpropagation
 biodynamo
 bioinformatics
 blogs
+chrono
 cms
 codegen
 consteval
+cout
 cplusplus
 cppyy
 cytokine
 cytokines
 doxygen
+endl
 gitlab
 gpu
 gridlay
@@ -70,6 +73,7 @@ llm
 llvm
 meetinglist
 microenvironments
+nomarkdown
 omp
 openmp
 oop
@@ -84,6 +88,7 @@ rntuple
 samtools
 samtoramntuple
 sbo
+setprecision
 sitemap
 softsusy
 superbuilds
@@ -131,6 +136,7 @@ MAMODE
 meetup
 metaprogramming
 Miapb
+milli
 multilanguage
 omnidisciplinary
 optimisation

_posts/2026-03-01-Creating teaching materials for C++ and CUDA with xeus-cpp.md

Lines changed: 136 additions & 6 deletions
@@ -6,7 +6,8 @@ sitemap: false
 author: Hristiyan Shterev
 permalink: blogs/xeus_cpp_Hristiyan_Shterev_blog/
 date: 2026-03-01
-tags: xeus-cpp cuda jupyter c++ xeus
+tags: xeus-cpp cuda jupyter c++ xeus internship high-school systems-programming
+custom_css: jupyter
 ---
 
 {% include dual-banner.html
@@ -56,11 +57,140 @@ More specific goals include:
 
 ## Example
 
-**CPU - std::sort vs GPU - Merge sort speed test**
-
-The example below shows a C++ benchmark comparing the performance of sorting a large array on a CPU versus a GPU. It provides a clear visual of how parallel processing can drastically outperform traditional sequential execution for data-heavy tasks.
-
-<img src="/images/blog/MergeSortTest.png"/>
+{::nomarkdown}
+
+<div tabindex="-1" id="notebook" class="border-box-sizing">
+<div class="container" id="notebook-container">
+<div class="cell border-box-sizing text_cell rendered">
+<div class="prompt input_prompt"></div>
+<div class="inner_cell">
+<div class="text_cell_render border-box-sizing rendered_html">
+<h1 id="CPU - std::sort vs GPU - Merge sort speed test">CPU - std::sort vs GPU - Merge sort speed test<a class="anchor-link" href="#CPU - std::sort vs GPU - Merge sort speed test">&#182;</a></h1>
+<p>
+The example below shows a C++ benchmark comparing the performance of sorting a large array on a CPU versus a GPU. It provides a clear visual of how parallel processing can drastically outperform traditional sequential execution for data-heavy tasks.
+</p>
+
+<p>
+In the first cell we create the unsorted data that is going to be sorted by the CPU and GPU. We have loaded a compiled CUDA .so file beforehand.
+</p>
+</div>
+</div>
+</div>
+
+<div class="cell border-box-sizing code_cell rendered">
+<div class="input">
+<div class="prompt input_prompt">In&nbsp;[1]:</div>
+<div class="inner_cell">
+<div class="input_area">
+<div class=" highlight hl-c++">
+<pre>
+<span class="kt">unsigned int</span> <span class="n">N_bench = <span class="mi">1048576</span>;</span>
+<span class="n">std<span class="o">::</span>vector<</span><span class="kt">unsigned int</span><span class="n">> data_cpu(N_bench);</span>
+<span class="n">std<span class="o">::</span>vector<</span><span class="kt">unsigned int</span><span class="n">> data_gpu(N_bench);</span>
+
+<span class="k">for</span> <span class="p">(</span><span class=
+"kt">unsigned int</span> <span class="n">i</span> <span class="o">=</span> <span class=
+"mi">0</span><span class="n">;</span> <span class="n">i</span> <span class=
+"o">&lt;</span> <span class="n">N_bench;</span> <span class="n">i</span><span class="o">++</span><span class=
+"n">)</span> <span class="p">{</span>
+<span class="kt">unsigned int </span><span class="n">val = N_bench - i;</span>
+<span class="n">data_cpu[i] <span class="o">=</span> val;</span>
+<span class="n">data_gpu[i] <span class="o">=</span> val;</span>
+<span class="p">}</span>
+</pre>
+</div>
+</div>
+</div>
+</div>
+</div>
+<div class="cell border-box-sizing text_cell rendered">
+<div class="prompt input_prompt"></div>
+<div class="inner_cell">
+<div class="text_cell_render border-box-sizing rendered_html">
+<h1 id="CPU and GPU sorting">CPU and GPU sorting<a class="anchor-link" href="#CPU and GPU sorting">&#182;</a></h1>
+<p>
+Next we use std::sort and the merge_sort_gpu_full function from the loaded CUDA code, and measure how long the CPU and the GPU each take to sort the data.
+</p>
+</div>
+</div>
+</div>
+<div class="cell border-box-sizing code_cell rendered">
+<div class="input">
+<div class="prompt input_prompt">In&nbsp;[2]:</div>
+<div class="inner_cell">
+<div class="input_area">
+<div class=" highlight hl-c++">
+<pre>
+<span class="k">auto</span> <span class="n">start_cpu = std<span class="o">::</span>chrono<span class="o">::</span>high_resolution_clock<span class="o">::</span>now();</span>
+<span class="n">std<span class="o">::</span>sort(data_cpu.begin(), data_cpu.end());</span>
+<span class="k">auto</span> <span class="n">end_cpu = std<span class="o">::</span>chrono<span class="o">::</span>high_resolution_clock<span class="o">::</span>now();</span>
+
+<span class="n">std<span class="o">::</span>chrono<span class="o">::</span>duration&lt;double, std<span class="o">::</span>milli&gt; cpu_ms <span class="o">=</span> end_cpu - start_cpu;</span>
+
+<span class="k">auto</span> <span class="n">start_gpu = std::chrono::high_resolution_clock::now();</span>
+<span class="n">merge_sort_gpu_full(data_gpu.data(), N_bench);</span>
+<span class="k">auto</span> <span class="n">end_gpu = std::chrono::high_resolution_clock::now();</span>
+
+<span class="n">std<span class="o">::</span>chrono<span class="o">::</span>duration&lt;double, std<span class="o">::</span>milli&gt; gpu_ms <span class="o">=</span> end_gpu - start_gpu;</span>
+</pre>
+</div>
+</div>
+</div>
+</div>
+</div>
+<div class="cell border-box-sizing text_cell rendered">
+<div class="prompt input_prompt"></div>
+<div class="inner_cell">
+<div class="text_cell_render border-box-sizing rendered_html">
+<h1 id="Printing the times and comparing them">Printing the times and comparing them<a class="anchor-link" href="#Printing the times and comparing them">&#182;</a></h1>
+<p>
+Finally we print both times and compare them to see how much faster parallel processing is.
+</p>
+</div>
+</div>
+</div>
+<div class="cell border-box-sizing code_cell rendered">
+<div class="input">
+<div class="prompt input_prompt">In&nbsp;[3]:</div>
+<div class="inner_cell">
+<div class="input_area">
+<div class=" highlight hl-c++">
+<pre>
+<span class="n"><span class="kt">double</span> speedup = cpu_ms.count() / gpu_ms.count();</span>
+
+<span class="n">std<span class="o">::</span>cout << <span class="s">"CPU (std::sort) took: "</span> << std<span class="o">::</span>fixed << std<span class="o">::</span>setprecision(<span class="mi">4</span>) << cpu_ms.count() << <span class="s">" ms"</span> << std<span class="o">::</span>endl;</span>
+<span class="n">std<span class="o">::</span>cout << <span class="s">"GPU (Merge Sort) took: "</span> << gpu_ms.count() << <span class="s">" ms"</span> << std<span class="o">::</span>endl;</span>
+
+<span class="n">std<span class="o">::</span>cout << std<span class="o">::</span>endl;</span>
+
+<span class="n">std<span class="o">::</span>cout << <span class="s">"GPU Speedup: "</span> << speedup << <span class="s">" times faster than CPU"</span> << std<span class="o">::</span>endl;</span>
+</pre>
+</div>
+</div>
+</div>
+</div>
+</div>
+<div class="output_wrapper">
+<div class="output">
+<div class="output_area">
+<div class="prompt output_prompt">Out[3]:</div>
+<div class="output_text output_subarea output_execute_result">
+<pre>
+CPU (std<span class="o">::</span>sort) took: 145.3539 ms
+GPU (Merge Sort) took: 9.6199 ms
+
+GPU Speedup: 15.1097 times faster than CPU
+</pre>
+</div>
+</div>
+</div>
+</div>
+</div>
+</div>
+
+<br /> <br /> <br />
+
+{:/}
 
 ## Related links
 
images/blog/MergeSortTest.png

-118 KB
Binary file not shown.
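
The timing pattern used in the notebook cells of this commit can be sketched as a small reusable helper. This is a minimal sketch under stated assumptions, not part of the commit: `time_ms`, `SortTimings`, and `run_sort_benchmark` are hypothetical names, `std::stable_sort` stands in for `merge_sort_gpu_full` (which comes from the CUDA `.so` that is not shown here), and `std::chrono::steady_clock` is swapped in for `high_resolution_clock` because it is guaranteed to be monotonic.

```cpp
#include <algorithm>
#include <chrono>
#include <vector>

// Hypothetical helper (not from the post): runs `fn` once and returns the
// elapsed wall-clock time in milliseconds, measured with a monotonic clock.
template <typename Fn>
double time_ms(Fn&& fn) {
    const auto start = std::chrono::steady_clock::now();
    fn();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}

// Hypothetical result type for this sketch.
struct SortTimings {
    double std_sort_ms;     // time taken by std::sort
    double stable_sort_ms;  // time taken by the CPU stand-in for the GPU sort
    bool results_match;     // true if both outputs are sorted and identical
};

// Fills two vectors with the same descending values as the notebook (n - i),
// sorts one with std::sort and the other with std::stable_sort (an assumed
// stand-in for merge_sort_gpu_full), and checks that the results agree.
SortTimings run_sort_benchmark(unsigned int n) {
    std::vector<unsigned int> a(n), b(n);
    for (unsigned int i = 0; i < n; i++) {
        a[i] = n - i;
        b[i] = n - i;
    }
    const double t1 = time_ms([&] { std::sort(a.begin(), a.end()); });
    const double t2 = time_ms([&] { std::stable_sort(b.begin(), b.end()); });
    const bool ok = std::is_sorted(a.begin(), a.end()) && a == b;
    return {t1, t2, ok};
}
```

In a xeus-cpp cell or inside `main`, `run_sort_benchmark(1048576)` returns both timings for the same input size as the notebook. Checking `results_match` before reading the timings guards against comparing a fast but incorrect sort.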
