| 1 | +--- |
| 2 | +layout: publication |
| 3 | +permalink: /c/aire25 |
| 4 | +title: "Beyond Retrieval: A Study of Using LLM Ensembles for Candidate Filtering in Requirements Traceability" |
| 5 | +description: |
| 6 | +publication: fuchss_beyond_2025 |
| 7 | +authors: |
| 8 | + - dominik_fuchss |
| 9 | + - stefan_schwedt |
| 10 | + - jan_keim |
| 11 | + - tobias_hey |
| 12 | +--- |
| 13 | + |
| 14 | +To be published at the [33rd International Requirements Engineering Conference Workshops (REW)](https://aire-ws.github.io/aire25/). |
| 15 | + |
| 16 | +{:width="100%" style="background-color: white; border-radius: 8px; padding: 10px; display: block; margin: 0 auto;"} |
| 17 | + |
| 18 | +## Abstract |
| 19 | + |
| 20 | +**[Introduction]** |
| 21 | +Requirements traceability is essential in software development, supporting tasks such as system understanding and change impact analysis. |
| 22 | +Traceability link recovery (TLR) methods, including those using large language models (LLMs) or retrieval-augmented generation (RAG), often rely on information retrieval (IR) to identify candidate artifact pairs. |
| 23 | +This IR-based candidate selection is sensitive to hyperparameters (e.g., top-k or similarity thresholds) that require extensive, project-specific tuning. |
| 24 | + |
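The following minimal Python sketch shows why top-k and similarity thresholds matter. It is not the IR setup evaluated in the paper. It assumes a plain term-frequency cosine similarity as a stand-in for an IR model. The functions `cosine` and `ir_candidates` and the toy requirements are hypothetical.

```python
import math
from collections import Counter


def cosine(a: str, b: str) -> float:
    """Term-frequency cosine similarity (a simple stand-in for an IR model)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(count * vb[term] for term, count in va.items())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def ir_candidates(sources: dict[str, str], targets: dict[str, str],
                  top_k: int, threshold: float) -> set[tuple[str, str]]:
    """For each source, keep at most top_k targets whose similarity reaches the threshold."""
    candidates = set()
    for s_id, s_text in sources.items():
        # Rank all targets by similarity to this source, best first.
        ranked = sorted(((cosine(s_text, t_text), t_id) for t_id, t_text in targets.items()),
                        reverse=True)
        for score, t_id in ranked[:top_k]:
            if score >= threshold:
                candidates.add((s_id, t_id))
    return candidates


# Hypothetical toy requirements.
sources = {"HL1": "the system shall log every user login"}
targets = {
    "LL1": "log user login events",
    "LL2": "encrypt stored passwords",
    "LL3": "the system shall display the login page",
}

for k, thr in [(1, 0.0), (2, 0.5), (2, 0.6)]:
    print(k, thr, sorted(ir_candidates(sources, targets, top_k=k, threshold=thr)))
```

With `top_k=1`, only `LL3` survives. Raising `top_k` to 2 brings in `LL1` at a threshold of 0.5 but not at 0.6. If `LL1` were the true link, it would be kept or lost purely through these two settings.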
| 25 | +**[Methods]** |
| 26 | +We propose an inter-requirements TLR approach that uses an ensemble of small LLMs, also called small language models (SLMs), to incrementally reduce the search space, aiming to replace IR-based candidate selection. |
| 27 | +We first analyze the sensitivity of IR methods to hyperparameters, then evaluate how well small LLMs filter out unrelated requirement pairs, and finally compare their performance with that of IR methods when both are integrated into a TLR approach. |
| 28 | +The evaluation includes five projects from the requirements engineering community. |
| 29 | + |
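The sketch below illustrates how an ensemble can incrementally shrink the candidate space. It rests on several assumptions and is not the authors' implementation. Each "judge" is a hypothetical stand-in for one small LLM asked whether two requirements might be related. The `min_votes` voting rule and the sequence of stages are also assumed for illustration, and the keyword heuristics replace real LLM calls.

```python
from collections.abc import Callable
from itertools import product

# A judge stands in for one small LLM answering "could these two requirements be related?"
Judge = Callable[[str, str], bool]
Pair = tuple[str, str]


def ensemble_filter(pairs: list[Pair], judges: list[Judge], min_votes: int) -> list[Pair]:
    """Keep a pair only if at least min_votes judges consider it possibly related."""
    return [(s, t) for s, t in pairs if sum(judge(s, t) for judge in judges) >= min_votes]


def incremental_filter(pairs: list[Pair], stages: list[tuple[list[Judge], int]]) -> list[Pair]:
    """Apply the stages in order; each stage only sees the pairs that survived the previous one."""
    for judges, min_votes in stages:
        pairs = ensemble_filter(pairs, judges, min_votes)
    return pairs


# Hypothetical keyword heuristics used in place of real LLM calls.
def shares_word(s: str, t: str) -> bool:
    return bool(set(s.lower().split()) & set(t.lower().split()))


def both_mention(word: str) -> Judge:
    return lambda s, t: word in s.lower().split() and word in t.lower().split()


def permissive(s: str, t: str) -> bool:
    return True  # a judge that never rejects a pair


sources = ["the system shall log every user login"]
targets = ["log user login events", "encrypt stored passwords", "the system shall display the login page"]
pairs = list(product(sources, targets))

stages = [
    ([shares_word, permissive, both_mention("login")], 2),  # majority of three judges
    ([both_mention("log")], 1),                              # a single, stricter judge
]
print(incremental_filter(pairs, stages))
```

Each stage only sees the survivors of the previous stage. As a result, later and possibly more expensive judges are queried on fewer pairs.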
| 30 | +**[Results]** |
| 31 | +We find that IR performance heavily depends on project-specific hyperparameter tuning. |
| 32 | +Furthermore, small LLMs effectively reduce the candidate space with minimal loss of recall. |
| 33 | +While our LLM-based ensemble approach achieves F2-scores comparable to those of IR methods, it lags behind them in precision. |
| 34 | + |
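The F2-score is the F-beta measure with beta = 2: `F_beta = (1 + beta^2) * P * R / (beta^2 * P + R)`. It attaches twice as much importance to recall as to precision. The small Python helper below shows how an approach with lower precision but high recall can still reach a competitive F2-score. The name `f_beta` and the precision/recall values are illustrative and are not results from the paper.

```python
def f_beta(precision: float, recall: float, beta: float = 2.0) -> float:
    """F-beta score; beta = 2 treats recall as twice as important as precision."""
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when both are zero
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Illustrative values only (not taken from the paper):
high_recall = f_beta(precision=0.5, recall=1.0)     # 2.5 / 3.0
high_precision = f_beta(precision=1.0, recall=0.5)  # 2.5 / 4.5
print(f"F2 (P=0.5, R=1.0) = {high_recall:.3f}")
print(f"F2 (P=1.0, R=0.5) = {high_precision:.3f}")
```

Swapping precision and recall changes F2 noticeably. Passing `beta=1.0` gives the familiar F1-score, which treats both symmetrically.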
| 35 | +**[Conclusion]** |
| 36 | +This work provides insights into the capabilities of small LLMs as filters in inter-requirements TLR. |
| 37 | +It also sheds light on how traditional IR techniques perform in TLR and how strongly they depend on hyperparameters. |
| 38 | + |
| 39 | +## Links |
| 40 | + |
| 41 | +- Paper on [KITopen](https://publikationen.bibliothek.kit.edu/1000183058) |
| 42 | +- Replication Package on [Zenodo](https://doi.org/10.5281/zenodo.15837231) and the corresponding [GitHub repository](https://github.com/ArDoCo/Replication-Package-AIRE25_Beyond-Retrieval-Using-LLM-Ensembles-for-Candidate-Filtering-in-Req-TLR) |