
Commit fc630f4

Merge pull request #25 from Bobbins228/add-pre-commit
Add pre-commit
2 parents 38256fa + 8493c54 commit fc630f4


48 files changed: +1561 additions, −821 deletions

.github/workflows/pre-commit.yaml

Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
name: Pre-commit

on:
  pull_request:
  push:
    branches: [main]

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  pre-commit:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

      - name: Set up Python
        uses: actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065 # v5.6.0
        with:
          python-version: '3.12'
          cache: pip
          cache-dependency-path: |
            **/requirements*.txt
            .pre-commit-config.yaml

      - uses: pre-commit/action@2c7b3805fd2a0fd8c1884dcaebf91fc102a13ecd # v3.0.1
        env:
          SKIP: no-commit-to-branch
          RUFF_OUTPUT_FORMAT: github

      - name: Verify if there are any diff files after pre-commit
        run: |
          git diff --exit-code || (echo "There are uncommitted changes, run pre-commit locally and commit again" && exit 1)

      - name: Verify if there are any new files after pre-commit
        run: |
          unstaged_files=$(git ls-files --others --exclude-standard)
          if [ -n "$unstaged_files" ]; then
            echo "There are uncommitted new files, run pre-commit locally and commit again"
            echo "$unstaged_files"
            exit 1
          fi
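For contributors who want to reproduce the workflow's two verification steps locally before pushing, a minimal sketch (assuming pre-commit is already installed) could look like this:

``` bash
# Run every configured hook against the whole tree
pre-commit run --all-files

# Fail if the hooks modified any tracked file, mirroring the first CI check
git diff --exit-code || echo "Hooks changed tracked files; commit the fixes"

# Fail if the hooks created any new, untracked file, mirroring the second check
untracked=$(git ls-files --others --exclude-standard)
[ -z "$untracked" ] || { echo "Hooks created new files:"; echo "$untracked"; }
```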

.pre-commit-config.yaml

Lines changed: 55 additions & 0 deletions
@@ -0,0 +1,55 @@
exclude: 'build/'

default_language_version:
  python: python3.12

repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v5.0.0  # Latest stable version
    hooks:
      - id: check-merge-conflict
        args: ['--assume-in-merge']
      - id: trailing-whitespace
        exclude: '\.py$'  # Exclude Python files as Ruff already handles them
      - id: check-added-large-files
        args: ['--maxkb=1000']
      - id: end-of-file-fixer
        exclude: '^(.*\.svg|.*\.md)$'
      - id: no-commit-to-branch
      - id: check-yaml
        args: ["--unsafe"]
      - id: detect-private-key
      - id: requirements-txt-fixer
      - id: mixed-line-ending
        args: [--fix=lf]  # Force line endings to LF (line feed)
      - id: check-executables-have-shebangs
      - id: check-json
      - id: check-shebang-scripts-are-executable

  - repo: https://github.com/Lucas-C/pre-commit-hooks
    rev: v1.5.4
    hooks:
      - id: insert-license
        files: \.py$|\.sh$
        args:
          - --license-filepath
          - docs/license_header.txt

  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.9.4
    hooks:
      - id: ruff
        args: [ --fix ]
        exclude: ^llama_stack/strong_typing/.*$
      - id: ruff-format

  - repo: https://github.com/adamchainz/blacken-docs
    rev: 1.19.0
    hooks:
      - id: blacken-docs
        additional_dependencies:
          - black==24.3.0

ci:
  autofix_commit_msg: 🎨 [pre-commit.ci] Auto format from pre-commit.com hooks
  autoupdate_commit_msg: ⬆ [pre-commit.ci] pre-commit autoupdate
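With this config in place, individual hooks can also be run on demand; for example (standard pre-commit CLI usage, with hook ids taken from the config above):

``` bash
# Lint the whole repository with only the Ruff hook
pre-commit run ruff --all-files

# Validate YAML files only
pre-commit run check-yaml --all-files
```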

CONTRIBUTING.md

Lines changed: 19 additions & 0 deletions
@@ -92,3 +92,22 @@ When contributing to the `stack/` directory:
 - **Document any new overlays** or configurations in the [DEPLOYMENT.md](DEPLOYMENT.md) guide
 - **Test deployments** on both OpenShift and Kubernetes when possible
 - **Include resource requirements** in documentation
+
+### pre-commit
+
+This project is configured to run pre-commit on every new PR.
+You can find instructions for installing pre-commit [here](https://pre-commit.com/#installation).
+
+### Setup pre-commit for the RAG project
+
+Run the following command to allow pre-commit to run before each commit:
+
+``` bash
+pre-commit install
+```
+
+To run pre-commit without committing, run:
+
+``` bash
+pre-commit run --all-files
+```
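To keep the pinned hook versions current over time, the standard pre-commit update flow applies (ordinary CLI usage, not part of this PR):

``` bash
# Bump each hook's rev to its latest tagged release
pre-commit autoupdate

# Re-run all hooks to confirm the updated versions still pass
pre-commit run --all-files
```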

DEPLOYMENT.md

Lines changed: 5 additions & 5 deletions
All four hunks below are whitespace-only fixes from the new hooks; each removed line differs from its replacement only by trailing spaces.

@@ -47,7 +47,7 @@ oc patch secret hf-token-secret --type='merge' -p='{"data":{"HF_TOKEN":"'$(echo

 ```bash
 # Create secret llama-stack-inference-model-secret providing model info
-# Important: 
+# Important:
 # - Make sure that the value for INFERENCE_MODEL is correct (it doesn't have points)
 # - In VLLM_URL you can use internal or external endpoints for the model. Add /v1 at the end
 # - NEVER set VLLM_TLS_VERIFY=false in production
@@ -60,8 +60,8 @@ oc create secret generic llama-stack-inference-model-secret \
   --from-literal INFERENCE_MODEL="$INFERENCE_MODEL" \
   --from-literal VLLM_URL="$VLLM_URL" \
   --from-literal VLLM_TLS_VERIFY="$VLLM_TLS_VERIFY" \
-  --from-literal VLLM_API_TOKEN="$VLLM_API_TOKEN" 
-
+  --from-literal VLLM_API_TOKEN="$VLLM_API_TOKEN"
+
 # Deploy the LlamaStackDistribution
 oc apply -k stack/overlays/vllm-remote-inference-model
 ```
@@ -267,7 +267,7 @@ To completely remove the project and all its resources from OpenShift, follow th
 ```bash
 # Check for processes using port 8080
 lsof -i :8080
-
+
 # Kill the process if found (replace PID with the actual process ID)
 kill <PID>
 ```
@@ -278,4 +278,4 @@ After completing these steps, all resources associated with the RAG stack will b

 - [OpenShift Documentation](https://docs.openshift.com/)
 - [KServe Documentation](https://kserve.github.io/website/)
-- [vLLM Documentation](https://vllm.readthedocs.io/)
+- [vLLM Documentation](https://vllm.readthedocs.io/)
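As a quick sanity check after creating the inference-model secret shown in the first hunk, the stored values can be decoded with standard oc and base64 usage (a sketch, not part of this diff):

``` bash
# Inspect one key of the secret to confirm it was stored correctly
oc get secret llama-stack-inference-model-secret \
  -o jsonpath='{.data.INFERENCE_MODEL}' | base64 -d
```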

README.md

Lines changed: 1 addition & 1 deletion
A whitespace-only change; the overview sentence is otherwise identical:

@@ -1,7 +1,7 @@
 # RAG

 ## Project Overview
-The RAG project serves as a repository of comprehensive demonstrations, benchmarking scripts, and deployment guides for the RAG Stack on Kubernetes/OpenShift. 
+The RAG project serves as a repository of comprehensive demonstrations, benchmarking scripts, and deployment guides for the RAG Stack on Kubernetes/OpenShift.

 ## Getting Started
 ### Deployment

benchmarks/embedding-models-with-beir/benchmark_beir_embedding_models.py

Lines changed: 28 additions & 4 deletions
@@ -1,3 +1,17 @@
+# Copyright 2025 IBM, Red Hat
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 import argparse
 import os
 import uuid
@@ -92,7 +106,11 @@ def load_beir_dataset(dataset_name: str, custom_datasets_pairs: dict):


 def inject_documents(
-    llama_stack_client: LlamaStackAsLibraryClient, corpus: dict, batch_size: int, vector_db_provider_id: str, embedding_model: str
+    llama_stack_client: LlamaStackAsLibraryClient,
+    corpus: dict,
+    batch_size: int,
+    vector_db_provider_id: str,
+    embedding_model: str,
 ) -> str:
     vector_db_id = f"beir-rag-eval-{embedding_model}-{uuid.uuid4().hex}"

@@ -137,7 +155,9 @@ def inject_documents(

 # LlamaStack RAG Retriever
 class LlamaStackRAGRetriever:
-    def __init__(self, vector_db_id: str, query_config: RAGQueryConfig, top_k: int = 10):
+    def __init__(
+        self, vector_db_id: str, query_config: RAGQueryConfig, top_k: int = 10
+    ):
         self.llama_stack_client = llama_stack_client
         self.vector_db_id = vector_db_id
         self.query_config = query_config
@@ -163,7 +183,9 @@ def retrieve(self, queries, top_k=None):


 # Adapted from https://github.com/opendatahub-io/llama-stack-demos/blob/main/demos/rag_eval/Agentic_RAG_with_reference_eval.ipynb
-def permutation_test_for_paired_samples(scores_a: list, scores_b: list, iterations: int = 10_000):
+def permutation_test_for_paired_samples(
+    scores_a: list, scores_b: list, iterations: int = 10_000
+):
     """
     Performs a permutation test of a given statistic on provided data.
     """
@@ -184,7 +206,9 @@ def _statistic(x, y, axis):


 # Adapted from https://github.com/opendatahub-io/llama-stack-demos/blob/main/demos/rag_eval/Agentic_RAG_with_reference_eval.ipynb
-def print_stats_significance(scores_a: list, scores_b: list, overview_label: str, label_a: str, label_b: str):
+def print_stats_significance(
+    scores_a: list, scores_b: list, overview_label: str, label_a: str, label_b: str
+):
     mean_score_a = np.mean(scores_a)
     mean_score_b = np.mean(scores_b)

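The added header and re-wrapped signatures above are consistent with what the new insert-license and ruff-format hooks produce. To apply just those hooks to a single file locally, something like this should work (standard pre-commit CLI usage, a sketch):

``` bash
# Insert the license header, then format, for one file only
pre-commit run insert-license --files benchmarks/embedding-models-with-beir/benchmark_beir_embedding_models.py
pre-commit run ruff-format --files benchmarks/embedding-models-with-beir/benchmark_beir_embedding_models.py
```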

benchmarks/llama-stack-rag-with-beir/README.md

Lines changed: 2 additions & 2 deletions
Both hunks are trailing-whitespace fixes; the removed and added lines are otherwise identical.

@@ -25,7 +25,7 @@ llama stack build --template ollama --image-type venv
 ```

 ### About the run.yaml file
-* The run.yaml file makes use of Milvus inline as its vector database. 
+* The run.yaml file makes use of Milvus inline as its vector database.
 * There are 3 default embedding models: `ibm-granite/granite-embedding-125m-english`, `ibm-granite/granite-embedding-30m-english`, and `all-MiniLM-L6-v2`.

 To add your own embedding models you can update the `models` section of the `run.yaml` file.
@@ -116,7 +116,7 @@ INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" uv run python benchmark_beir_

 ``` text
 dataset-name.zip/
-├── qrels/ 
+├── qrels/
 │   └── test.tsv        # Relevance judgments mapping query IDs to document IDs with relevance scores
 ├── corpus.jsonl        # Document collection with document IDs, titles, and text content
 └── queries.jsonl       # Test queries with query IDs and question text for retrieval evaluation
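Given that layout, a custom dataset archive can be assembled so the three entries sit at the zip root (the my_*.tsv and my_*.jsonl input names below are hypothetical, shown only to illustrate the expected structure):

``` bash
# Stage the expected layout, then zip it with qrels/ at the archive root
mkdir -p dataset-name/qrels
cp my_qrels.tsv dataset-name/qrels/test.tsv
cp my_corpus.jsonl dataset-name/corpus.jsonl
cp my_queries.jsonl dataset-name/queries.jsonl
(cd dataset-name && zip -r ../dataset-name.zip qrels corpus.jsonl queries.jsonl)
```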
