# Introduction
The term Research Software Engineer (RSE) has become a label
for specific job roles in universities and research institutions.
It relates to people doing work that includes elements of traditional software engineering
but also rests on a broad spectrum of skills from neighbouring domains
such as scientific computing, data science, research data management, and the open science movement in general,
including skills that have not yet been packaged into established curricula in formal education.
Although researchers have used computers in scientific work since the dawn of the computing era, the formal role of Research Software Engineer has only recently been recognised [@baxter2012research].
To this day, the RSE community consists largely of individuals without formal education in software engineering.
However, it has been realised that the growing importance of computational science,
the increasing complexity, demands, team sizes, and number of collaboration partners,
and the general need for more ambitious software projects
call for a better and more systematic reuse of knowledge and skills from software engineering.
This is especially true for natural sciences,
where RSEs' expertise often focuses on very specialised technical
problem-solving in close conjunction with a specific research question
at the expense of the mastery of more general concepts of software engineering.
A disconnect between scientific computing and software engineering
practitioners and communities has already been observed by others, including @Kelly2007.
We aim to improve the situation in the spirit of her conclusion,
which urges the scientific community to better apply the current solutions
offered by the software engineering community and the latter to better cater
to the sciences as an application domain.
Thus, the idea of introducing a master's curriculum for research software engineering for natural sciences was born.
Establishing this formal training in Research Software Engineering (RSEng) will produce specialists who can help alleviate the shortage of computational expertise in the sciences.
On the one hand, these RSEs should be firmly grounded in software engineering and adopt the practices and mindset of a software engineer.
On the other hand, they should develop a deep understanding of computational scientific work and the relevant methods and practices.
They should assume a specific professional identity that also includes the self-image of a scientist with the associated values and practices, and they should be recognised as equal members of a domain research team.
We then have to ask: what identity do students need in order to be accepted as, and to feel like, qualified software engineers while also being intellectually connected to the natural sciences?
In other words: what is the special relationship between natural sciences and computing historically, and how does it relate to software engineering foundations?
Historically, natural sciences and software engineering share common threads in the fabric of their identities,
but different challenges in the past decades have led to a divergence of the fields.
Today, students of one field cannot easily cross over into the other field and contribute their knowledge,
due to the different domain identities that have been forged.
We argue that it is not enough to augment a software engineering curriculum with a specialisation in a natural science
or to improve the computer programming modules in a science curriculum.
We propose that it is necessary to create a set of new disciplines that more fundamentally fuse software engineering with a natural science.
A discipline of this kind, e.g. "Research Software Engineering in Geoscience", should be more than a combination of both fields.
It should rather assume its own distinct identity
and generate new research questions, new methods to address them,
and a new type of expert to substantially re-shape and advance relevant scientific sub-domains, e.g. climate system modelling.
The paper is structured as follows: we will first trace back the history of the natural sciences and computing. After that we will give a view on the history of software engineering as it relates to research.
Arriving at the present, we shine a light on how software is currently developed in academia,
before considering larger trends and challenges that will shape software development for research in the near future.
From this, we synthesise a common core that should be the identity of a research software engineer in science, before we detail how this ideal identity of a research software engineer shapes the planned curriculum for research software engineering.
# Related Work
The German Informatics Society (Gesellschaft für Informatik, GI) has a long track record in establishing computer science programmes. In this
context, the Guidelines for University Education in Computer Science play an important role
in streamlining computer science curricula [@zukunft2016empfehlungen].
Examples of GI-led concepts are the recommendations for Business & Information Systems Engineering [@mertens_rahmenempfehlung_2002]
or data science [@abedjan_empfehlungen_2021]. A possible RSE master's programme would be similar to the first in that it combines two fields (e.g. computer science and physics) but also
has similarities to the latter as it represents a cross-cutting computational field.
Currently, we envision two tracks within said curriculum depending on the bachelor's degree/domain of the students.
The first track, in line with this paper, will be RSE for Natural Science, which is suited for students from natural science domains.
The second track will be for students with a background in computer science and a larger focus on software engineering, but is secondary to the paper at hand.
According to the categorisation of the Gesellschaft für Informatik (German Informatics Society) [@zukunft2016empfehlungen],
the first track will be of type 3 with an equal share of domain science and computer science
and the second track will be of type 1.
The current state of the development process and the corresponding discussions are documented in [@RSECurriculums2021].
The planned master's programme will enable students to build high-quality research software in line with the recommendations for research software [@leitlinien2025] and the FAIR principles [@FAIR].
Similar efforts were carried out in allied communities and have yielded tailored curricula,
such as the RSE-HPC curriculum [@Filinger2025] by the UNIVERSE-HPC project,
the RSE curriculum track for computational scientists and engineers [@Chourdakis2025] at the Technical University of Munich,
the Simulation Software Engineering course [@Uekermann2021] at the University of Stuttgart,
or the bachelor's and master's programme "Simulation Technology" from the Cluster
of Excellence "Data-Integrated Simulation Science (SimTech)" [@SimTech2019]
for computer science students. In addition, there are experiences from teaching Research Software Engineering courses at four German universities in different master's programmes described by Bertrand et al. [@Bertrand2025].
# Science and Computing: Common History
Ever since early astronomy, humans have used computation to quantify and predict nature.
To help with these tasks, humans also invented practical tools like water clocks and the abacus
as well as theoretical tools and concepts,
like the Hindu-Arabic decimal system or binary numbers.
Some of these theoretical concepts became complete subfields such as algebra, which was introduced as al-jabr by the Persian astronomer al-Khwarizmi,
whose name incidentally is also the origin of the word algorithm.
In a sort of virtuous cycle, advancements in science led to advancements in engineering and manufacturing,
allowing for the construction of increasingly complex computational machines,
which in turn enabled further scientific discoveries.
These computing machines range from mechanical devices like the Pascaline or the Analytical Engine [@bromley2008charles], through electro-mechanical machines like Zuse's Z3 [@Zuse1986] and early valve-based electronic machines like the British Colossus [@Copeland2004], to the fully electronic machines that are ubiquitous today.
Starting with the first widely used high-level programming language,
FORTRAN by John Backus and his team at IBM [@Backus1978],
scientists have been able to express
algorithms largely independently of the hardware,
which transformed how they interact with computation.
The rapid increase in computing capabilities,
together with advancements in mathematical modelling and scientific software,
has led to a high fidelity of simulation results.
Consequently, so-called in silico experiments are becoming a valid alternative for many in vivo or in vitro experiments.
Computation, in the modern sense of computer-aided work, has since permeated nearly every field of natural science,
often giving rise to entire sub-disciplines, such as computational astrophysics, bioinformatics, computer-aided pharmacy, and earth system simulation.
Today, at least some computational skills are needed in most fields of natural sciences.
The development of computational tools is not just a support activity;
it defines how science is practised, scaled, and extended.
# Software Engineering and Research Applications: Divergence
The close relationship between computation and science
did not make science the primary application domain for the discipline concerned with the professional production of software
that is called Software Engineering today.
In the early days of transistor- and integrated-circuit-based computers,
programming as an activity was dominated by the mathematics, science and engineering communities.
Starting in the late 1950s, large computers became available to universities and research institutions
and were mainly used in engineering and the natural sciences [@wirth2008].
As its name suggests, FORTRAN (FORmula TRANslator)
was designed for scientific and engineering applications [@oregan2012].
The appearance of COBOL (COmmon Business-Oriented Language) in 1960
marks the adoption of computing for business applications.
The term "software engineering" is attributed to Margaret Hamilton,
who wanted to stress the legitimacy of her work "as part of the overall systems engineering process"
when developing the guidance and navigation system for the Apollo missions [@cameron2018].
The ever-increasing size and complexity of software systems in military and business applications in the 1960s
led to a gap between ambitions and achievements in software development regarding performance, reliability and cost.
To address this "software crisis", the first software engineering conferences were held in 1968 and 1969 [@Randell1979; @Buxton1970].
These conferences are frequently cited as the beginning of the field "software engineering".
The decoupling of software engineering as the "application of engineering to software, and the study of such approaches" [@ieee610.12-990]
from the root of software development in scientific programming
was a necessary and foundational step for the development of the field.
In an early article defining the term "software engineering", @boehm1976 separates two problem-areas that very well map to the difference between software development in science and general software engineering:
"*Area 1: detailed design and coding of systems software by experts in a relatively economics-independent context.*
Unfortunately, the most pressing software development problems are in an area we shall call *Area 2: requirements analysis design, test, and maintenance of applications software by technicians in an economics-driven context.*"
He continues: "And in Area 2, our scientific foundations are so slight that one can seriously question whether our current techniques deserve to be called 'software engineering'."
By now, the academic community has answered this question in the affirmative.
However, it took well into the 21st century for software engineering to fully emancipate itself from computer science (see e.g. @parnas1999).
Today, software engineering has developed into a mature academic discipline in its own right,
has an extensive body of knowledge (see @swebok2024), continually increases its number of sub-disciplines,
is taught in numerous degree programmes, and produces highly sought-after professionals.
The bulk of this development, however, took place in what @boehm1976 identified as "Area 2"
and catered to problems that have not been pressing in scientific applications.
A lot of the development of software engineering is driven by the needs of businesses and business applications.
Software engineers "need to learn the key engineering skills to enable them to build products that are safe for the public to use" [@oregan2012]
and consider the "challenges and constraints of 'industrial-strength' software in a competitive market" [@mahoney2004].
Moreover, software engineering was influenced by the aim of serving management needs
such as analysis and design matching hierarchical division of projects, hierarchical assignment of tasks and methods for cost accounting and estimation [@mahoney2004].
A direct application of SE techniques in the research context is often neither feasible nor necessary.
If the developer is also the user, requirements analysis, for example, might be neglected.
Scientific software development frequently aims exclusively at producing code that supports a specific investigation, where speed is often of the essence.
Hence, the whole software construction process might best be characterised as "rapid prototyping",
which forgoes the level of planning that would be required by SE best practices.
In a business context, the externalisation of knowledge about the software in the form of documentation
(of the development process, of the internal functioning of the software, and of the user-facing functionality) is very important,
but it hardly plays a role in many scientific applications, where users are domain experts who can read and modify the source code.
For these reasons, the community of software developers in science has lost touch with mainstream software engineering,
which for a long time had little to offer to a domain where state-of-the-art software projects are written in Fortran (e.g. [@powers2017]).
There are efforts from the software engineering community to identify and describe the lack of adoption of software engineering principles in computational science [@Kelly2007; @Johanson2018].
# Research Software Engineering for Natural Sciences: Convergence
There is much that the RSE community can learn from software engineering, including common practices it can adopt. Conversely, research software engineering opens up a new field of research for the SE community, nowadays referred to as RSE research [@Felderer2025; @leitlinien2025].
There are certain influencing factors that have developed differently over time, causing
classical software engineering and computing for the sciences to drift apart.
These differences can be illustrated with (at least three) structural dichotomies:
- fast prototyping (short term) vs. long-term software growth
- internalised knowledge vs. externalised knowledge
- mathematically driven modelling vs. human driven modelling
We will now discuss these formerly dividing points and how they are becoming less polarised due to the changing academic landscape.
Much research in academia is driven by individuals who conduct it
in order to achieve a personal qualification goal, such as a PhD.
This leads to a set of intermingling goals. On the one hand, there is the personal qualification goal, which has to be achieved within a given time frame
in order to advance one's career. On the other hand, the institution has its own long-term goals and must cope with a high turnover of researchers.
In addition, academic culture in general is characterised by a rapidly changing environment and constant publication pressure.
This web of goals leads to certain aspects that are particular to software development in academia, such as frequent turnover of developers, limited long-term maintenance, and a focus on producing quick results for publications rather than building robust, reusable software.
In particular, software is often developed and maintained only during a single PhD project. Hence, the development is mostly research-oriented. As a result, the software often remains at the prototype stage, rather than the prototype serving as a blueprint that a larger team of engineers can use to build a well-designed and sound product.
Often, this results in a duplication of effort or "reinventing the wheel" [@Smith2024].
To counteract this trend, funding agencies have created incentives to consolidate fragmented domain codes by prioritising research software that is easily extensible and re-usable in different contexts with minimally-invasive adaptations [@DFG2022RSE].
Academic software development operates in rapidly evolving environments, demanding flexible processes. In principle, this fits well with agile approaches. However, agile methods such as Scrum or Kanban are usually designed and taught for larger, industry-style teams. In research, teams are often small, short-lived, or even composed of a single developer, making direct adoption impractical. Instead, these practices require thoughtful adaptation. Embedding lightweight, research-appropriate agile workflows, guided by professional software engineering expertise, should be a core component of an RSE curriculum for the Natural Sciences. More broadly, this highlights that software engineering techniques cannot simply be transplanted into research contexts without adjustment, and it points to an important area for RSE-focused research.
In addition, the ongoing digitisation of society, and of science in particular, has reshaped the scientific ecosystem, making a reevaluation of this separation necessary. The increased complexity of research, and the way science is conducted and organised, have created the need to incorporate practices from software engineering that have hitherto been neglected.
In the past, it was considered "good enough" for software to be developed by a single PhD student. However, as software projects have become more ambitious and research projects have become more complex, there is now a need for more systematic approaches to workflows and team organisation. Previously, software was often developed as a by-product of scientific work and was rarely made explicit in grant proposals. However, with the increasing reliance of science on software, funding agencies have changed their requirements: software development must now be transparent, documented, and tracked to justify funding and quantify development costs. Research software itself is recognised to be a valuable asset that can and should be reused and built upon.
A recent trend is the growing awareness of the environmental impact of scientific computing, which is now being considered in research planning [@lannelongue2023]. Moreover, the lack of established processes to ensure software quality contributed to the so-called "reproducibility crisis" [@Pashler2012]. In response, there have been increasing demands for reproducible research, leading to the formulation of the FAIR principles [@Hutton2016; @Stagge2019; @Stodden2018].
While initially much research could be conducted using standard commercial software packages with few dependencies, an ever-increasing amount of research software now relies on other research software. This means that software must provide stable APIs and behave consistently across multiple versions, which places greater demands on its reliability. There have also been cases where faulty software led to incorrect scientific conclusions, resulting in what has been called the "credibility crisis" [@Miller2006Xray; @Smart2018; @Miller2016fMRI].
Additionally, the trend toward transdisciplinary projects has introduced more complex requirements due to a more heterogeneous set of stakeholders. Also, the emerging use of large language models (LLMs) or quantum computing in software development processes further requires that the latest results from software engineering research are taken into account [@Farshidi2025].
While the historical separation between computing in science and in software engineering may seem logical at first glance, recent developments increasingly challenge this divide.
Scientific work now faces growing demands that align closely with established software engineering principles.
These include the need for transparent and auditable software contributions to justify funding, standardised development processes driven by the reproducibility crisis, and an emphasis on energy efficiency in response to the climate crisis.
Furthermore, the growing complexity of research, the rise of transdisciplinary collaboration, and the integration of emerging technologies such as large language models have introduced intricate software pipelines across disciplines.
Together, these trends highlight the convergence of scientific computing and software engineering practices.
# An RSE Master's Curriculum
As Research Software Engineering starts to combine software engineering and science again, thereby reversing the historical trend, the question arises which disciplinary identity a research software engineer should have.
We argue that neither additional training in a natural science for software engineers nor vice versa would be suited for the development of an adequate professional identity. For example, an SE curriculum augmented with physics courses within the typical time frame allotted for a master's degree will not be sufficient for a student to accumulate the exposure, knowledge and experience necessary to develop a physicist's identity in addition to her SE identity. As @gomez2025 observe, a new discipline is required for a new professional identity acquired through formal academic education. For this reason, a natural science track (type 3 [@zukunft2016empfehlungen]) is planned in addition to the generalist RSE master (type 1 [@zukunft2016empfehlungen]).
Based on the above discussion and workshops with the RSE community [@derseev], the natural science track in the curriculum should contain these parts:
1. identity-building modules that address key RSE issues, such as
    - history of software engineering in science [@leroy_when_2021]
    - research software science [@Felderer2025; @heroux_research_2022]
    - interpersonal competencies such as management and communication in research
    - ethical implications of research software (energy consumption, social engineering etc.)
    - software requirements in research
    - the open source software community of practice
2. software engineering foundation and required computer science modules, e.g.
    - software architecture and design
    - database and information systems
    - computer architecture
    - distributed systems
    - software engineering operations
    - software construction
    - practical software engineering project
3. science-specific computation modules
    - numerical methods and high performance computing
    - statistics and machine learning methods
    - the role of simulation modelling in scientific practice
    - distributed research infrastructure technologies
4. specialisation in their original natural science domain
    a. exemplary specialisation in theoretical particle physics
        - quantum field theory
        - elementary particle physics
        - particle detector design
        - further lab work (e.g. electronics labs)
        - an elective course from string theory, group theory, or general relativity
    b. exemplary specialisation in computational biology
        - applied cell and molecular biology
        - quantitative genetics
        - quantitative cell and molecular biology laboratory
        - genomics
        - statistical genetics
5. neighbouring cross-cutting computational fields
    - data science
    - research data management
This set of skills makes the RSE a professional who is able to take a set of
equations from a modern theory, implement them reliably and reusably, and lead an interdisciplinary team in creating and maintaining a successful software project for the surrounding academic community.
# Conclusion
The availability of ever more computing resources makes it necessary to control
the increasing complexity of ever deeper technology stacks. It also requires larger forms
of organisation in order to drive progress in the natural sciences forward.
Combined with external pressure to make research software FAIR, we expect a transition in the natural sciences to more structured development processes guided by software engineering principles.
In order to be prepared for these changes we argue that a new job profile is necessary.
While the generic Research Software Engineer is already meant to work at this intersection of research and software engineering, successful work in the natural sciences poses additional challenges. The inherent complexity of the domain requires deep domain knowledge. Furthermore, collaboration in teams of scientists is only possible with adequate knowledge of the domain's culture.
We argue that in order to address this challenge, a unique identity is needed for these professionals. This identity can be established through a specialised master's degree that builds upon a domain bachelor's degree.
The RSE master's programme for the natural sciences combines a domain science with software engineering training and modules that address key RSE issues, enabling its graduates to work in or lead highly specialised teams of scientists.
RSEs qualified in this way can operate at the intersection of natural sciences and software engineering, which positions them uniquely to advance scientific discovery while developing and sustaining complex research software.