
Commit 4967d52

Preliminary re-haul of application design unit (last day).
1 parent 57cc2c6 commit 4967d52

30 files changed: +617 / -593 lines changed

README.md

Lines changed: 1 addition & 1 deletion
@@ -167,7 +167,7 @@ The slides are available [here](https://csc-training.github.io/summerschool/).
 | Time | Topic |
 | ---- | ----- |
 | 08:00 | Breakfast
-| 09:00 | [Application design](application-design)
+| 09:00 | [Putting It All Together: HPC Applications at Scale](application-design-deployment)
 | 11:00 | [Closing](https://csc-training.github.io/summerschool/)
 | 11:30 | Hotel check-out
 | 12:00 | Lunch

about.yml

Lines changed: 1 addition & 1 deletion
@@ -8,6 +8,6 @@ modules:
 - parallel-io
 - gpu
 - hpc-ai
-- application-design
+- application-design-deployment
 - application-performance
 - closing

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
# Putting It All Together: HPC Applications at Scale

This section has no exercises.

@@ -1,3 +1,3 @@
 # This file is used in the generation of the web page
-title: Application Design & HPC Deployment
+title: Putting It All Together: HPC Applications at Scale
 slidesdir: docs

Lines changed: 307 additions & 0 deletions
@@ -0,0 +1,307 @@
---
title: Application design
event: CSC Summer School in High-Performance Computing 2026
lang: en
---

# Why develop software?

<div class=column>

- **Do science**
    - Answer questions, solve problems
    - Scientific method:
        - Observations/measurements
        - Theoretical model
        - **Simulation**
            - Speeds up scientific procedures and analysis, also reducing their costs
</div>

<div class=column>

- **Code as a product**
    - Prestige and fame
    - Gateway into projects and collaborations
    - Citations, co-authorships
    - Work on the bleeding edge
</div>


# Starting position

- New code, or an existing project / a rewrite of old code?
- How much effort do you have at your disposal?
    - The number of developers may grow

- **Question**: What is your software project?


# Design model

- Development is not only about physics and numerics
    - It is also about **how** you do it
- Instead of "just coding", it pays to **plan** a little too!
    - Also think about possible future extensions!
- **Software engineering** has come up with many different development models
    - Waterfall, V-model, Agile models (Scrum etc.), ...
- Scientific software may also benefit from formal development models


# Design considerations

- Parallelisation strategies
- Data design
- Programming languages
- Modularity
- I/O formats
- Documentation
- Testing


# Parallelisation strategies

- Planning includes thinking about the target platform
    - **Target machines**: laptops, small clusters, supercomputers
    - OpenMP, MPI, MPI+OpenMP, GPUs
- From shared memory to distributed memory machines
    - Keep in mind that most large machines are distributed memory systems &rarr; MPI
- Moving from <1000 cores to >10k cores
    - Parallelisation strategies need to be reconsidered
    - Non-blocking communication, avoiding global (collective) calls, ...
- **Accelerators**
    - GPUs have their own tricks and quirks

# Parallelisation strategies

- Going **BIG** &rarr; GPUs are practically mandatory these days
    - But not all HPC needs to be exascale
    - Size is not a goal in itself


# Programming languages

- Selection of languages
    - **Performance**-oriented languages (low level)
    - **Programmability**-oriented languages (high level)
- Mix
    - Best of both worlds
    - Low-level languages for computationally costly functions
    - High-level languages for the main program logic


# Low level languages

- Direct control over memory
- Most common are **C, C++, Fortran**
- Better support for GPU programming in C/C++: HIP does not support kernels written in Fortran

<div class=column>

- C++
    - `std` library for data structures
    - Low-level memory management (data ownership, move semantics, ...)
    - Metaprogramming

</div>

<div class=column>

- Fortran
    - Good for number crunching
    - Good array syntax
    - Language semantics (e.g. no aliasing of dummy arguments) make optimisation easier for compilers

</div>


# High level languages

- Python/Julia
    - Faster development cycle and less error-prone
    - Testing, debugging, and prototyping are much easier
    - Built on top of high-performance libraries (NumPy, TensorFlow, ...)

- Combinations/suggestions
    - Python & C++ (pybind11) for object-oriented programming
    - Julia & Fortran (native) for functional programming


# GPU programming approaches

- **Directive**-based approaches: OpenACC and OpenMP
    - "Standard" and "portable"
- **Native** low-level languages: CUDA (NVIDIA) and HIP (AMD)
    - HIP can in principle also target NVIDIA devices
    - With HIP, Fortran needs wrappers via C bindings
- **Performance portability** frameworks: SYCL, Kokkos
    - Support only C++
- **Standard language features**: parallel C++ algorithms, Fortran `do concurrent`
    - Rely on implicit data movement
    - Compiler support is incomplete


# Modular code design: programming

- Good code is **modular**
    - Encapsulation
    - Self-contained functions
    - No global variables: pass in what you need
- Modular code takes more time to design but is **a lot** easier to extend and understand


# Break for questions and breathing


# Version control

- Version control is the **single most important software development tool**
- Git is nowadays ubiquitous; Subversion (SVN) used to be more common but is less popular now
- Web services (GitHub, GitLab, Bitbucket) add further tools:
    - Forking
    - Issue tracking
    - Review of pull/merge requests
    - Wikis
    - Integrations
- Vlasiator: public [GitHub repository](https://github.com/fmihpc/vlasiator)


# Code design: tools

<div class=column>

- Avoid the **not invented here** syndrome
- Leverage existing software and **libraries**
    - Numerical (BLAS, solvers, ...)
    - I/O
    - Parallelisation
    - Profiling/monitoring
</div>

<div class=column>

- Caveats:
    - Is the library still supported/updated?
    - Do you trust the source? Is it widely used?
    - Is there documentation?
    - Does it support all the features you need? Can you extend it if needed?
</div>


# Code design: development tools

- Software development is time-consuming; many **tools** exist to help you in the process
- Build systems automate **configuring and compiling**
    - CMake
    - GNU Autotools
    - Make, Ninja
- The bigger your project, the more it pays to rely on these **automatic** tools
    - Setup can be painful


# Code design: development tools

- Debuggers
- Compilers
    - Compilers are not all the same, and compiler bugs are real!
    - Test your code with different compilers (GNU, Clang, Intel, Cray, ...)
- Linters (check coding style and common mistakes)

- **Questions**: What development tools do you use? Do they make your work easier?


# Data design

- Data has to be "designed" too
    - Use **structures**!
    - Note the possible performance difference between structures of arrays and arrays of structures
- Think about the **flow**
    - How to distribute the data
- GPUs introduce more data-related problems and opportunities:
    - Memory copies between host and device
    - Preallocation, prefetching, overlapping computation with copies
    - GPU-aware MPI


# I/O data formats

- **Data** formats
    - Not just plain text files/binary files
    - Platform-independent formats (HDF5, NetCDF, ...)
    - Store metadata together with the data?
- **Log** files
- Standard formats
    - Your field might have established data standards
- Remember also that large simulations produce lots of data
    - Storing "big data" is an issue
    - A global climate simulation can produce one PB in a day


# Coding style

- Code **readability** comes first
- **Consistency** helps readability
    - Indentation, how/when to split instructions over several lines, ...
    - Many editors have tools to help
- There are exceptions!


# Documentation – TBD use Diátaxis framework

- In-code:
    - **Explanation** of what files, classes, and functions are doing
    - Text and e.g. ASCII-art explanations of complex parts
    - Can be formatted to build external documentation (e.g. `Doxygen`, `Sphinx`)
- Alongside the code (wiki, manual):
    - **How to** contribute
    - How to install and use
    - How to analyse
    - How to cite


# Documentation

- **For whom** am I writing documentation? Think of:
    - You after a vacation (did *I* write this?)
    - Whoever comes after your PhD (they *must* have had a good reason for writing it like this?!)
    - Future contributors (where do I start? how do I contribute my optimised kernel to their repo?)
    - Future users (I could use this for my research, how does it work?)
- **What tools** do I use to support writing and deploying good documentation?


# Documentation – Diátaxis framework

![](images/diataxis_axes-of-needs.png){width=70%}

D. Procida, [Diátaxis documentation framework](https://diataxis.fr/) (CC-BY-SA 4.0)


# Documentation – Diátaxis framework

- What are the documentation users' needs?
    - Practical knowledge vs. theoretical knowledge
    - Acquiring knowledge vs. applying knowledge
- The two axes give four kinds of documentation: tutorials, how-to guides, reference, and explanation


# Testing

- **Unit** testing (does this function/solver/module work?)
- **Integration** testing (does my new feature break everything else?)
- **Verification** (does my code do what I designed it to do?)
- **Validation** (does my code behave as expected compared to theory/data?)

**Use automated tools to streamline as much testing as you can! Ensure your test coverage is adequate!**


# Conclusions

- Software design is all about planning
- Productivity
    - Modular design
    - Use existing libraries
    - Use and integrate design, community, and collaboration tools
    - Programming language and design selection
- Re-/usability
    - Not only for a single developer
    - Automation/standardisation where possible
    - Adopt practices and tools that ease the burden on any single person
