This repository demonstrates the use of Retrieval-Augmented Generation (RAG) and Prompt Engineering to classify software requirements as functional requirements (F) or non-functional requirements (NFRs). Both techniques rely on Large Language Models (LLMs); RAG additionally combines retrieval with context-aware generation to improve classification accuracy.
Prompt Engineering uses optimized, task-specific prompts with few-shot learning to guide the LLM in classifying requirements.
Highlights:
- Leverages representative examples for accurate predictions.
- Processes LLM responses with a parser for consistent output.
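The two highlights above (few-shot examples in the prompt, a parser on the reply) can be sketched as follows. This is a minimal illustration, not the repository's actual prompt: the label set, the example requirements, and the names `build_few_shot_prompt` and `parse_label` are hypothetical, and the LLM call itself is left out.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical label set: binary F vs. NFR. The repository may also use the
# finer NFR categories (A, FT, L, ...); adjust LABELS accordingly.
LABELS = {"F", "NFR"}

# Hypothetical "representative examples" used as few-shot demonstrations.
FEW_SHOT_EXAMPLES: List[Tuple[str, str]] = [
    ("The system shall allow users to reset their password via email.", "F"),
    ("The system shall respond to search queries within 2 seconds.", "NFR"),
]


def build_few_shot_prompt(requirement: str) -> str:
    """Build a prompt: task instruction, labeled examples, then the query."""
    lines = [
        "Classify the software requirement as functional (F) or "
        "non-functional (NFR). Answer with the label only.",
        "",
    ]
    for text, label in FEW_SHOT_EXAMPLES:
        lines += [f"Requirement: {text}", f"Label: {label}", ""]
    lines += [f"Requirement: {requirement}", "Label:"]
    return "\n".join(lines)


def parse_label(response: str) -> Optional[str]:
    """Return the first word of the LLM reply that is a known label, else None."""
    for token in re.findall(r"[A-Z]+", response.upper()):
        if token in LABELS:
            return token
    return None


if __name__ == "__main__":
    prompt = build_few_shot_prompt("The system shall encrypt all stored passwords.")
    print(prompt)
    # The prompt would be sent to an LLM; here we parse a sample reply instead.
    print(parse_label("Label: NFR"))
```

Returning `None` for an unrecognized reply lets the caller decide whether to retry the request or flag the requirement, rather than silently guessing a label.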
RAG improves classification by retrieving contextually relevant examples from a semantic vector database and integrating them into the prompt.
Workflow:
- Retrieve: Use embeddings to find semantically similar examples from a database (e.g., Pinecone).
- Generate: Append retrieved examples to the prompt and process through an LLM.
- Parse: Ensure output aligns with predefined categories.
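The Retrieve and Generate steps above can be sketched with a small in-memory store and cosine similarity. This is a hedged illustration under stated assumptions: `EXAMPLE_STORE`, its hand-made 3-dimensional vectors, and the helper names are hypothetical placeholders for a real embedding model and a vector database such as Pinecone, whose client API is not used here.

```python
import math
from typing import List, Sequence, Tuple

# Toy in-memory stand-in for a vector database such as Pinecone:
# (requirement text, label, embedding). The 3-d embeddings are hand-made
# placeholders; a real pipeline would compute them with an embedding model.
EXAMPLE_STORE: List[Tuple[str, str, List[float]]] = [
    ("Users shall be able to export reports as PDF.", "F", [0.9, 0.1, 0.0]),
    ("The system shall be available 99.9% of the time.", "NFR", [0.1, 0.9, 0.2]),
    ("All data in transit shall be encrypted with TLS.", "NFR", [0.0, 0.3, 0.9]),
]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: Sequence[float], k: int = 2) -> List[Tuple[str, str]]:
    """Retrieve step: the k stored examples most similar to the query."""
    ranked = sorted(EXAMPLE_STORE, key=lambda ex: cosine(query_vec, ex[2]), reverse=True)
    return [(text, label) for text, label, _ in ranked[:k]]


def build_rag_prompt(requirement: str, query_vec: Sequence[float], k: int = 2) -> str:
    """Generate step (prompt side): put retrieved examples ahead of the query."""
    parts = [
        "Classify the software requirement as functional (F) or non-functional (NFR).",
        "",
    ]
    for text, label in retrieve(query_vec, k):
        parts += [f"Requirement: {text}", f"Label: {label}", ""]
    parts += [f"Requirement: {requirement}", "Label:"]
    return "\n".join(parts)


if __name__ == "__main__":
    # Placeholder embedding for a security-related query requirement.
    query_vec = [0.05, 0.35, 0.85]
    print(build_rag_prompt("Passwords shall be stored as salted hashes.", query_vec))
```

Unlike a fixed few-shot prompt, the demonstrations here change per query: a security-flavored requirement pulls in the TLS and availability examples and leaves out the PDF-export one. The assembled prompt is then sent to the LLM, and its reply goes through the Parse step.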
Results:
| Method | Model | F1 (%) | Accuracy (%) |
|---|---|---|---|
| Prompt Engineering | GPT-3.5-Turbo | 96.03 | 74.74 |
| RAG | GPT-3.5-Turbo | 96.63 | 79.79 |
Prerequisites:
- Python 3.x
- Jupyter Notebook
- Libraries: `transformers`, `pinecone`, `openai`, etc. (see `requirements.txt`)
The classification is performed using the PROMISE and PROMISE_exp datasets:
- PROMISE Dataset:
  - Contains 625 natural language software requirements.
  - Includes:
    - 255 functional requirements (F).
    - 370 non-functional requirements (NFRs) distributed across categories such as security, usability, and performance.
- PROMISE_exp Dataset:
  - An extended version of PROMISE with 969 requirements.
  - Includes:
    - 444 functional requirements (F).
    - 525 non-functional requirements (NFRs) distributed across the categories listed below, including maintainability, scalability, and fault tolerance.
Dataset Split:
- Training: 80%
- Testing: 20%
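The README gives an 80/20 split but not how it was drawn. The sketch below assumes a stratified split (each label keeps roughly the same share in training and testing) with a fixed seed; `stratified_split` and the seed value are hypothetical. scikit-learn's `train_test_split(..., test_size=0.2, stratify=labels)` achieves the same in one call.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Example = Tuple[str, str]  # (requirement text, label)


def stratified_split(
    examples: List[Example], train_frac: float = 0.8, seed: int = 42
) -> Tuple[List[Example], List[Example]]:
    """Split examples into train/test, splitting each label group separately."""
    by_label: Dict[str, List[Example]] = defaultdict(list)
    for ex in examples:
        by_label[ex[1]].append(ex)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train: List[Example] = []
    test: List[Example] = []
    for label in sorted(by_label):  # sorted for a deterministic order
        group = by_label[label][:]
        rng.shuffle(group)
        cut = round(len(group) * train_frac)
        train.extend(group[:cut])
        test.extend(group[cut:])
    return train, test


if __name__ == "__main__":
    # Placeholder data with PROMISE_exp's label counts (444 F, 525 NFR).
    data = [(f"req {i}", "F") for i in range(444)]
    data += [(f"req {i}", "NFR") for i in range(444, 969)]
    train, test = stratified_split(data)
    print(len(train), len(test))
```

With PROMISE_exp's 444 F and 525 NFR requirements, this sketch yields 775 training and 194 test requirements (355/89 F, 420/105 NFR). Stratifying matters most for small NFR categories such as Portability, which a purely random 20% sample could leave nearly empty.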
Non-Functional Categories in PROMISE_exp:
| Category | Count |
|---|---|
| Availability (A) | 31 |
| Fault Tolerance (FT) | 18 |
| Legal (L) | 15 |
| Look & Feel (LF) | 49 |
| Maintainability (MN) | 24 |
| Operability (O) | 77 |
| Performance (PE) | 67 |
| Scalability (SC) | 22 |
| Security (SE) | 125 |
| Usability (US) | 85 |
| Portability (PO) | 12 |
| Total (NFRs) | 525 |
Together, the datasets cover functional requirements and 11 non-functional categories, providing a varied basis for training and testing.
This project is licensed under the MIT License. See the LICENSE file for details.
