Commit 26dd94c

updated readme
1 parent 0514a0c

2 files changed, +8 -36 lines changed

image/benchmark_result.jpg

268 KB

readme.md

Lines changed: 8 additions & 36 deletions
@@ -19,42 +19,14 @@ You need the following to run the QueryCraft pipeline:
 ---
 ## Benchmark Results
 
-### 1. QueryCraft pipeline experiment results using various LLMs
-
-| Base Model | Eval Dataset | Accuracy |
-| -------------------------------- | ------------ | -------- |
-| DB-chat-sql model/Codellama-13b | Spider Dev | 63.60% |
-| Granite 20B code Instruct | Spider Dev | 63% |
-| CodeLlama 34B instruct | Spider Dev | 62.11% |
-| CodeLlama 13B Instruct | Spider Dev | 57.98% |
-| CodeLlama 7B Instruct | Spider Dev | 56% |
-| CodeLlama 7B Instruct -finetune | Spider Dev | 55.62% |
-| CodeLlama 13B Instruct -finetune | Spider Dev | 53.87% |
-| Llama-2-70B | Spider Dev | 53.44% |
-| Llama-2-70B-Chat | Spider Dev | 52.00% |
-| Defog-sqlcoder-34b-alpha | Spider Dev | 51.93% |
-| sqlcoder-34b-alpha | Spider Dev | 51.21% |
-| Llama-2-7B-Chat -finetune | Spider Dev | 27.00% |
-
-
-### 2. Enhanced outcomes are evident with the query correction service, as demonstrated by its post-processing accuracy.
-
-| Base Model | Eval Dataset | Accuracy | Post Processed Accuracy |
-| -------------------------------- | ------------ | -------- | ----------------------- |
-| DB-chat-sql model/Codellama-13b | Spider Dev | 63.60% | 70.57% |
-| Granite 20B code Instruct | Spider Dev | 63% | 69% |
-| CodeLlama 34B instruct | Spider Dev | 62.11% | 62.11% |
-| CodeLlama 13B Instruct | Spider Dev | 57.98% | 64.04% |
-| CodeLlama 7B Instruct | Spider Dev | 56% | 57.00% |
-| CodeLlama 7B Instruct -finetune | Spider Dev | 55.62% | 61.24% |
-| CodeLlama 13B Instruct -finetune | Spider Dev | 53.87% | 59.96% |
-| Llama-2-70B | Spider Dev | 53.44% | 53.44% |
-| Llama-2-70B-Chat | Spider Dev | 52.00% | 53.04% |
-| Defog-sqlcoder-34b-alpha | Spider Dev | 51.93% | 62.92% |
-| sqlcoder-34b-alpha | Spider Dev | 51.21% | 56.05% |
-| Llama-2-7B-Chat -finetune | Spider Dev | 27.00% | 32% |
-
-### 3. Smaller fine-tuned models outperform larger ones
+We conducted various experiments on the Spider development dataset using different pretrained and fine-tuned large language model (LLM) architectures. The figure below shows the best result achieved with each LLM. The highest result, <strong>70.57%</strong>, was obtained with the DB-Chat SQL model, compared to <strong>69%</strong> achieved by the Granite 20B Code Instruct model; both figures are post-processed accuracy, i.e. measured after the query correction service is applied.
+
+<img src="image/benchmark_result.jpg">
+
+#### Outcomes:
+1. The query correction service improves results, as shown by the higher post-processed accuracy.
+2. Smaller fine-tuned models outperform some larger pretrained models.
+
 
 
 
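The two outcomes listed in the new README can be checked against the per-model figures in the tables this commit removes. The Python sketch below is illustrative only and is not part of QueryCraft. It assumes the following: the numbers are copied from those tables, with model names normalized; parameter counts in billions are read from the model names; and "fine-tuned" means a row carrying the `-finetune` suffix. The helper names `post_processing_gains` and `smaller_finetuned_wins` are hypothetical. The script reports the post-processing gain in percentage points for each model. It also lists every pair where a smaller `-finetune` model scores higher than a strictly larger model without that suffix.

```python
"""Illustrative check of the benchmark outcomes (hypothetical, not part of QueryCraft).

Figures are Spider Dev accuracy percentages copied from the README tables
removed in this commit; parameter counts (billions) are read off model names.
"""

SIZE, ACCURACY, POST_PROCESSED = 0, 1, 2  # indices into each RESULTS tuple

# model name -> (size in billions, accuracy %, post-processed accuracy %)
RESULTS = {
    "DB-chat-sql model/Codellama-13b": (13, 63.60, 70.57),
    "Granite 20B code Instruct": (20, 63.00, 69.00),
    "CodeLlama 34B instruct": (34, 62.11, 62.11),
    "CodeLlama 13B Instruct": (13, 57.98, 64.04),
    "CodeLlama 7B Instruct": (7, 56.00, 57.00),
    "CodeLlama 7B Instruct -finetune": (7, 55.62, 61.24),
    "CodeLlama 13B Instruct -finetune": (13, 53.87, 59.96),
    "Llama-2-70B": (70, 53.44, 53.44),
    "Llama-2-70B-Chat": (70, 52.00, 53.04),
    "Defog-sqlcoder-34b-alpha": (34, 51.93, 62.92),
    "sqlcoder-34b-alpha": (34, 51.21, 56.05),
    "Llama-2-7B-Chat -finetune": (7, 27.00, 32.00),
}


def post_processing_gains(results):
    """Return (model, gain in percentage points) pairs, largest gain first."""
    gains = [
        (name, round(row[POST_PROCESSED] - row[ACCURACY], 2))
        for name, row in results.items()
    ]
    # sorted() is stable (also with reverse=True), so ties keep table order.
    return sorted(gains, key=lambda item: item[1], reverse=True)


def smaller_finetuned_wins(results, column=ACCURACY):
    """Return (fine-tuned model, larger model) pairs where the smaller model,
    marked by the "-finetune" suffix, scores strictly higher in `column`."""
    wins = []
    for small, small_row in results.items():
        if not small.endswith("-finetune"):
            continue
        for large, large_row in results.items():
            # Only compare against strictly larger rows without the suffix.
            if large.endswith("-finetune") or large_row[SIZE] <= small_row[SIZE]:
                continue
            if small_row[column] > large_row[column]:
                wins.append((small, large))
    return wins


if __name__ == "__main__":
    for name, gain in post_processing_gains(RESULTS):
        print(f"{name:34} +{gain:.2f} pp")
    for column, label in ((ACCURACY, "accuracy"), (POST_PROCESSED, "post-processed accuracy")):
        print(f"\nSmaller -finetune models ahead of larger models on {label}:")
        for small, large in smaller_finetuned_wins(RESULTS, column):
            print(f"  {small} > {large}")
```

On base accuracy, both CodeLlama `-finetune` rows come out ahead of Llama-2-70B, Llama-2-70B-Chat and both 34B sqlcoder variants. After post-processing, Defog-sqlcoder-34b-alpha (62.92%) overtakes them both, while the other three larger models stay behind. Defog-sqlcoder-34b-alpha also shows the largest post-processing gain, +10.99 percentage points.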
0 commit comments
