Commit 7dc1fa4

Proper ordering for publications
1 parent 3c1b635 commit 7dc1fa4


_data/publications.yml

Lines changed: 19 additions & 18 deletions
@@ -2283,6 +2283,25 @@
   year: 2025
   doi:

+- id: FGCS2025
+  id_iris: 364408
+  title: "A comparative benchmark study of LLM-based threat elicitation tools"
+  authors:
+    - DimitriVanLanduyt
+    - MajidMollaeefar
+    - MarioRaciti
+    - StefVerreydt
+    - AbdulazizKalash
+    - AndreaBissoli
+    - DavyPreuveneers
+    - GiampaoloBella
+    - SilvioRanise
+  abstract: >
+    Threat modeling refers to the software design activity that involves the proactive identification, evaluation, and mitigation of specific potential threat scenarios. Recently, attention has been growing around the potential to automate the threat elicitation process using Large Language Models (LLMs), and different tools have emerged that are capable of generating threats based on system models and other descriptive system documentation. This paper presents the outcomes of an experimental evaluation study of LLM-based threat elicitation tools, which we apply to two complex and contemporary application cases that involve biometric authentication. The comparative benchmark is based on a grounded approach to establish four distinct baselines that are representative of the results of human threat modelers, both novices and experts. In support of scale and reproducibility, the evaluation approach itself is maximally automated, using sentence transformer models to perform threat mapping. Our study evaluates 56 distinct threat models generated by 6 LLM-based threat elicitation tools. While the generated threats are somewhat similar to the threats documented by human threat modelers, relative performance is low. The evaluated LLM-based threat elicitation tools prove particularly inefficient at eliciting threats at the expert level. Furthermore, we show that performance differences between these tools can be attributed in similar measure to both the prompting approach (e.g., multi-shot, knowledge pre-prompting, role prompting) and the actual reasoning capabilities of the underlying LLMs.
+  destination: FGCS
+  year: 2025
+  doi: 10.1016/j.future.2025.108243
+
 - id: IWBF2025
   id_iris: 362127
   title: "Spotting Tell-Tale Visual Artifacts in Face Swapping Videos: Strengths and Pitfalls of CNN Detectors"
@@ -2384,22 +2403,4 @@
   year: 2025
   doi:

-- id: FGCS2025
-  id_iris: 364408
-  title: "A comparative benchmark study of LLM-based threat elicitation tools"
-  authors:
-    - DimitriVanLanduyt
-    - MajidMollaeefar
-    - MarioRaciti
-    - StefVerreydt
-    - AbdulazizKalash
-    - AndreaBissoli
-    - DavyPreuveneers
-    - GiampaoloBella
-    - SilvioRanise
-  abstract: >
-    Threat modeling refers to the software design activity that involves the proactive identification, evaluation, and mitigation of specific potential threat scenarios. Recently, attention has been growing around the potential to automate the threat elicitation process using Large Language Models (LLMs), and different tools have emerged that are capable of generating threats based on system models and other descriptive system documentation. This paper presents the outcomes of an experimental evaluation study of LLM-based threat elicitation tools, which we apply to two complex and contemporary application cases that involve biometric authentication. The comparative benchmark is based on a grounded approach to establish four distinct baselines that are representative of the results of human threat modelers, both novices and experts. In support of scale and reproducibility, the evaluation approach itself is maximally automated, using sentence transformer models to perform threat mapping. Our study evaluates 56 distinct threat models generated by 6 LLM-based threat elicitation tools. While the generated threats are somewhat similar to the threats documented by human threat modelers, relative performance is low. The evaluated LLM-based threat elicitation tools prove particularly inefficient at eliciting threats at the expert level. Furthermore, we show that performance differences between these tools can be attributed in similar measure to both the prompting approach (e.g., multi-shot, knowledge pre-prompting, role prompting) and the actual reasoning capabilities of the underlying LLMs.
-  destination: FGCS
-  year: 2025
-  doi: 10.1016/j.future.2025.108243
 # PLEASE KEEP ALPHABETICAL ORDER BY ID WITHIN YEARS
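The ordering rule in that trailing comment, which this commit enforces by moving FGCS2025 ahead of IWBF2025, can be checked mechanically. Below is a minimal sketch of such a check, assuming _data/publications.yml is a top-level YAML list of mappings that each carry id and year keys (as the entries in this diff do); the script name and the consecutive-pair comparison are illustrative choices, not part of this repository.

# check_order.py -- illustrative sketch: verify "alphabetical order by id
# within years" by comparing each entry with the previous one of the same year.
import sys
import yaml  # PyYAML

with open("_data/publications.yml") as f:
    pubs = yaml.safe_load(f)

ok = True
prev_year, prev_id = None, None
for pub in pubs:
    year, pub_id = pub.get("year"), pub.get("id", "")
    # Only ids within the same year need to be alphabetical.
    if year == prev_year and prev_id is not None and pub_id.lower() < prev_id.lower():
        print(f"out of order within {year}: {pub_id!r} after {prev_id!r}")
        ok = False
    prev_year, prev_id = year, pub_id

sys.exit(0 if ok else 1)

On this diff the check passes for the new placement, since "FGCS2025" sorts before "IWBF2025"; a sketch like this would have flagged the old placement at the end of the 2025 block.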

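The abstract added above notes that the paper's evaluation automates threat mapping with sentence transformer models. The paper's actual pipeline is not reproduced here; the following is only a minimal sketch of that general technique, using the sentence-transformers library with an assumed model name, an assumed similarity threshold, and toy threat lists.

# Illustrative sketch: map generated threats onto baseline threats by
# embedding similarity, in the spirit of the sentence-transformer mapping
# the abstract describes (model, threshold, and data are assumptions).
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model, not the paper's

baseline = [
    "An attacker replays a captured biometric sample to the verification service.",
    "Stored biometric templates are exfiltrated from the database.",
]
generated = [
    "Replay of recorded face data against the authentication endpoint.",
    "Tampering with audit logs to hide unauthorized access.",
]

# Encode both threat lists and compute pairwise cosine similarity.
sim = util.cos_sim(model.encode(generated, convert_to_tensor=True),
                   model.encode(baseline, convert_to_tensor=True))

THRESHOLD = 0.5  # assumed cut-off; the paper's threshold is not given here
for i, threat in enumerate(generated):
    score, j = sim[i].max().item(), sim[i].argmax().item()
    match = baseline[j] if score >= THRESHOLD else None
    print(f"{threat!r} -> {match!r} (score={score:.2f})")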