Commit 2ff772e

[None][feat] Add benchmark to DeepConf (#8776)
Signed-off-by: Dong Cao <[email protected]>
1 parent 497a070 commit 2ff772e

5 files changed: 320 additions and 75 deletions.
Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
{"question": "One hundred concentric circles are labelled $C_{1}, C_{2}, C_{3}, \\ldots, C_{100}$. Each circle $C_{n}$ is inscribed within an equilateral triangle whose vertices are points on $C_{n+1}$. Given $C_{1}$ has a radius of $1$, what is the radius of $C_{100}$ ?\n", "answer": "2^{99}"}
{"question": "An infinite geometric sequence with common ratio $r$ sums to $91$. A new sequence starting with the same term has common ratio $r^{3}$. The sum of the new sequence produced is $81$. What was the common ratio of the original sequence?\n", "answer": "\\frac{1}{9}"}
{"question": "Let $A, B, C, D$, and $E$ be five equally spaced points on a line in that order. Let $F, G, H$, and $I$ all be on the same side of line $A E$ such that triangles $A F B, B G C, C H D$, and $D I E$ are equilateral with side length $1$. Let $S$ be the region consisting of the interiors of all four triangles. Compute the length of segment $A I$ that is contained in $S$.", "answer": "\\frac{\\sqrt{13}}{2}"}
{"question": "If $5 f(x)-x f\\left(\\frac{1}{x}\\right)=\\frac{1}{17} x^{2}$, determine $f(3)$.", "answer": "\\frac{1}{9}"}
{"question": "How many ways are there to arrange $1,2,3,4,5,6$ such that no two consecutive numbers have the same remainder when divided by $3$ ?", "answer": "240"}
{"question": "Joshua is playing with his number cards. He has $9$ cards of $9$ lined up in a row. He puts a multiplication sign between two of the $9 \\mathrm{~s}$ and calculates the product of the two strings of $9 \\mathrm{~s}$. For example, one possible result is $999 \\times 999999=998999001$. Let $S$ be the sum of all possible distinct results (note that $999 \\times 999999$ yields the same result as $999999 \\times 999$ ). What is the sum of digits of $S$ ?", "answer": "72"}
{"question": "Bruno the Bear is tasked to organize $16$ identical brown balls into $7$ bins labeled 1-7. He must distribute the balls among the bins so that each odd-labeled bin contains an odd number of balls, and each even-labeled bin contains an even number of balls (with $0$ considered even). In how many ways can Bruno do this?", "answer": "924"}
{"question": "Let $f(n)$ be the number obtained by increasing every prime factor in $f$ by one. For instance, $f(12)=(2+1)^{2}(3+1)=36$. What is the lowest $n$ such that $6^{2025}$ divides $f^{(n)}(2025)$, where $f^{(n)}$ denotes the $n$th iteration of $f$ ?", "answer": "20"}
{"question": "How many positive integer divisors of $63^{10}$ do not end in a $1$ ?", "answer": "173"}
{"question": "Bruno is throwing a party and invites $n$ guests. Each pair of party guests are either friends or enemies. Each guest has exactly $12$ enemies. All guests believe the following: the friend of an enemy is an enemy. Calculate the sum of all possible values of $n$. (Please note: Bruno is not a guest at his own party)", "answer": "100"}
{"question": "In acute $\\triangle A B C$, let $D$ be the foot of the altitude from $A$ to $B C$ and $O$ be the circumcenter. Suppose that the area of $\\triangle A B D$ is equal to the area of $\\triangle A O C$. Given that $O D=2$ and $B D=3$, compute $A D$.", "answer": "3+2\\sqrt{2}"}
{"question": "Alice has $10$ gifts $g_{1}, g_{2}, \\ldots, g_{10}$ and $10$ friends $f_{1}, f_{2}, \\ldots, f_{10}$. Gift $g_{i}$ can be given to friend $f_{j}$ if\n\n$$\ni-j=-1,0, \\text { or } 1 \\quad(\\bmod 10)\n$$\n\nHow many ways are there for Alice to pair the $10$ gifts with the $10$ friends such that each friend receives one gift?", "answer": "125"}
{"question": "Let $\\triangle A B C$ be an equilateral triangle with side length $1$. A real number $d$ is selected uniformly at random from the open interval $(0,0.5)$. Points $E$ and $F$ lie on sides $A C$ and $A B$, respectively, such that $A E=d$ and $A F=1-d$. Let $D$ be the intersection of lines $B E$ and $C F$.\n\nConsider line $\\ell$ passing through both points of intersection of the circumcircles of triangles $\\triangle D E F$ and $\\triangle D B C . O$ is the circumcenter of $\\triangle D E F$. Line $\\ell$ intersects line $\\overleftrightarrow{B C}$ at point $P$, and point $Q$ lies on $A P$ such that $\\angle A Q B=120^{\\circ}$. What is the probability that the line segment $\\overline{Q O}$ has length less than $\\frac{1}{3}$ ?", "answer": "\\frac{1}{3}"}
{"question": "Define sequence $\\left\\{a_{n}\\right\\}_{n=1}^{\\infty}$ such that $a_{1}=\\frac{\\pi}{3}$ and $a_{n+1}=\\cot ^{-1}\\left(\\csc \\left(a_{n}\\right)\\right)$ for all positive integers $n$. Find the value of\n\n$$\n\\frac{1}{\\cos \\left(a_{1}\\right) \\cos \\left(a_{2}\\right) \\cos \\left(a_{3}\\right) \\cdots \\cos \\left(a_{16}\\right)}\n$$", "answer": "7"}
{"question": "Define $\\{x\\}$ to be the fractional part of $x$. For example, $\\{20.25\\}=0.25$ and $\\{\\pi\\}=\\pi-3$. Let $A=\\sum_{a=1}^{96} \\sum_{n=1}^{96}\\left\\{\\frac{a^{n}}{97}\\right\\}$, where $\\{x\\}$ denotes the fractional part of $x$. Compute $A$ rounded to the nearest integer.", "answer": "4529"}
{"question": "Find the smallest positive integer $n$ such that $n$ is divisible by exactly $25$ different positive integers.", "answer": "1296"}
{"question": "Two squares, $A B C D$ and $A E F G$, have equal side length $x$. They intersect at $A$ and $O$. Given that $C O=2$ and $O A=2 \\sqrt{2}$, what is $x$ ?", "answer": "1+\\sqrt{3}"}
{"question": "Bruno and Brutus are running on a circular track with a $20$ foot radius. Bruno completes $5$ laps every hour, while Brutus completes $7$ laps every hour. If they start at the same point but run in opposite directions, how far along the track's circumference (in feet) from the starting point are they when they meet for the sixth time? Note: Do not count the moment they start running as a meeting point.", "answer": "20\\pi"}
{"question": "What is the smallest positive integer $n$ such that $z^{n}-1$ and $(z-\\sqrt{3})^{n}-1$ share a common complex root?", "answer": "12"}
{"question": "Consider a pond with lily pads numbered from $1$ to $12$ arranged in a circle. Bruno the frog starts on lily pad 1. Each turn, Bruno has an equal probability of making one of three moves: jumping $4$ lily pads clockwise, jumping $2$ lily pads clockwise, or jumping $1$ lily pad counterclockwise. What is the expected number of turns for Bruno to return to lily pad $1$ for the first time?", "answer": "12"}
{"question": "$4$ bears - Aruno, Bruno, Cruno and Druno - are each given a card with a positive integer and are told that the sum of their $4$ numbers is $17$. They cannot show each other their cards, but discuss a series of observations in the following order:\n\nAruno: \"I think it is possible that the other three bears all have the same card.\"\nBruno: \"At first, I thought it was possible for the other three bears to have the same card. Now I know it is impossible for them to have the same card.\"\nCruno: \"I think it is still possible that the other three bears have the same card.\"\nDruno: \"I now know what card everyone has.\"\nWhat is the product of their four card values?", "answer": "160"}
{"question": "Digits $1$ through $9$ are placed on a $3 x 3$ square such that all rows and columns sum to the same value. Please note that diagonals do not need to sum to the same value. How many ways can this be done?", "answer": "72"}
{"question": "Define the operation $\\oplus$ by\n\n$$\nx \\oplus y=x y-2 x-2 y+6 .\n$$\n\nCompute all complex numbers $a$ such that\n\n$$\na \\oplus(a \\oplus(a \\oplus a))=a .\n$$", "answer": "2,3,\\frac{3+i\\sqrt{3}}{2},\\frac{3-i\\sqrt{3}}{2}"}
{"question": "Define the function $f$ on positive integers\n\n$$\nf(n)= \\begin{cases}\\frac{n}{2} & \\text { if } n \\text { is even } \\\\ n+1 & \\text { if } n \\text { is odd }\\end{cases}\n$$\n\nLet $S(n)$ equal the smallest positive integer $k$ such that $f^{k}(n)=1$. How many positive integers satisfy $S(n)=11$ ?", "answer": "89"}
{"question": "Let $A B C D E F$ be a convex cyclic hexagon. Suppose that $A B=D E=\\sqrt{5}, B C=E F=3$, and $C D=F A=\\sqrt{20}$. Compute the circumradius of $A B C D E F$.", "answer": "\\frac{1+\\sqrt{31}}{2}"}
{"question": "A repetend is the infinitely repeated digit sequence of a repeating decimal. What are the last three digits of the repetend of the decimal representation of $\\frac{1}{727}$, given that the repetend has a length of $726$ ? Express the answer as a three-digit number. Include preceding zeros if there are any.", "answer": "337"}
{"question": "Consider a $54$-deck of cards, i.e. a standard $52$-card deck together with two jokers. Ada draws cards from the deck until Ada has drawn an ace, a king, and a queen. How many cards does Ada pick up on average?", "answer": "\\frac{737}{39}"}
{"question": "Let $\\omega$ be a circle, and let a line $\\ell$ intersect $\\omega$ at two points, $P$ and $Q$. Circles $\\omega_{1}$ and $\\omega_{2}$ are internally tangent to $\\omega$ at points $X$ and $Y$, respectively, and both are tangent to $\\ell$ at a common point $D$. Similarly, circles $\\omega_{3}$ and $\\omega_{4}$ are externally tangent to $\\omega$ at $X$ and $Y$, respectively, and are tangent to $\\ell$ at points $E$ and $F$, respectively.\nGiven that the radius of $\\omega$ is $13$, the segment $\\overline{P Q}=24$, and $\\overline{Y D}=\\overline{Y E}$, find the length of segment $\\overline{Y F}$.", "answer": "5\\sqrt{2}"}
{"question": "Let $f$ be a degree $7$ polynomial satisfying\n$$\nf(k)=\\frac{1}{k^{2}}\n$$\n\nfor $k \\in\\{1 \\cdot 2,2 \\cdot 3, \\ldots, 8 \\cdot 9\\}$. Find $f(90)-\\frac{1}{90^{2}}$.", "answer": "-\\frac{2431}{50}"}
{"question": "Let $\\triangle A B C$ be an isosceles triangle with $A B=A C$. Let $D$ be a point on the circumcircle of $\\triangle A B C$ on minor arc $A B$. Let $\\overline{A D}$ intersect the extension of $\\overline{B C}$ at $E$. Let $F$ be the midpoint of segment $A C$, and let $G$ be the intersection of $\\overline{E F}$ and $\\overline{A B}$. Let the extension of $\\overline{D G}$ intersect $\\overline{A C}$ and the circumcircle of $\\triangle A B C$ at $H$ and $I$, respectively. Given that $D G=3, G H=5$, and $H I=1$, compute the length of $A E$.", "answer": "\\frac{9\\sqrt{30}}{4}"}
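
Note: a minimal sketch of how a JSONL file of this shape can be split into questions and ground-truth answers, loosely mirroring the loading logic this commit adds to run_generation.py. The helper name `load_questions` and the two sample records are hypothetical and are not part of the commit.

```python
import json
from typing import Dict, List, Tuple


def load_questions(lines: List[str], qid: int = -1) -> Tuple[List[str], List[str]]:
    """Parse JSONL records into parallel lists of questions and answers.

    `qid` follows the convention of the --qid flag in this commit:
    -1 keeps every record, any other value selects a single record by index.
    """
    records: List[Dict[str, str]] = [
        json.loads(line) for line in lines if line.strip()
    ]
    if qid != -1:
        records = [records[qid]]
    questions = [rec["question"] for rec in records]
    # Answers are normalized to stripped strings, as in the commit.
    answers = [str(rec.get("answer", "")).strip() for rec in records]
    return questions, answers


# Hypothetical sample records, not taken from the dataset above.
sample = [
    '{"question": "What is 1 + 1?", "answer": "2"}',
    '{"question": "What is 2 * 3?", "answer": " 6 "}',
]
questions, answers = load_questions(sample)
one_question, one_answer = load_questions(sample, qid=1)
```

Using a distinct loop variable (`rec`) rather than reusing the list's name keeps the comprehension easier to read than the `for question_data in question_data` form in the diff.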

examples/scaffolding/contrib/DeepConf/run_generation.py

Lines changed: 106 additions & 24 deletions
@@ -1,8 +1,16 @@
 import argparse
+import json
 import time
+from dataclasses import dataclass
+from pathlib import Path
+from typing import Dict
+
+import numpy as np
+from utils import equal_func, prepare_prompt

 from tensorrt_llm.scaffolding import (NativeGenerationController,
-                                      ScaffoldingLlm, TRTLLMWorker)
+                                      ScaffoldingLlm, TRTLLMWorker,
+                                      extract_answer_from_boxed)
 from tensorrt_llm.scaffolding.contrib.DeepConf import (
     DeepConfOfflineController, DeepConfOfflineMajorityVoteController,
     DeepConfOnlineController, DeepConfOnlineMajorityVoteController)
@@ -28,35 +36,91 @@ def parse_arguments():
                         required=True,
                         choices=list(_RUN_TYPE_TO_IMPL.keys()),
                         help="Type of the run. Available choices: %(choices)s")
-    parser.add_argument('--sample_num', type=int, default=20)
-    parser.add_argument('--conf_group_size', type=int, default=128)
+    parser.add_argument('--warmup_sample_num', type=int, default=16)
+    parser.add_argument('--sample_num', type=int, default=256)
+    parser.add_argument('--conf_group_size', type=int, default=2048)
     parser.add_argument('--conf_threshold', type=float, default=0.5)
     parser.add_argument('--vote_policy',
                         type=str,
                         default="top10_bottom_window_filtered")
-    parser.add_argument('--warmup_sample_num', type=int, default=5)
-    parser.add_argument('--confidence_percentile', type=int, default=90)
+    parser.add_argument('--confidence_percentile', type=int, default=10)
     parser.add_argument('--logprobs_topk', type=int, default=20)
-    parser.add_argument('--max_tokens', type=int, default=8192)
+    parser.add_argument('--max_tokens', type=int, default=64000)
     parser.add_argument('--temperature', type=float, default=0.6)
     parser.add_argument('--top_p', type=float, default=0.95)
+    parser.add_argument('--top_k', type=int, default=0)
+    parser.add_argument('--qid', type=int, default=-1)
+    parser.add_argument('--dataset', type=str, default="brumo_2025.jsonl")
+    parser.add_argument('--repeat_times', type=int, default=1)
+    parser.add_argument('--tensor_parallel_size', type=int, default=1)
     args = parser.parse_args()
     return args


-def run_scaffolding_llm(prompts, proposer_worker, controller):
+@dataclass
+class BenchResult:
+    right_answer_count: int = 0
+    total_answer_count: int = 0
+    accuracy: float = 0.0
+    generated_tokens: int = 0
+
+
+def run_scaffolding_llm(prompts,
+                        proposer_worker,
+                        controller,
+                        repeat_times=1,
+                        ground_truth=None,
+                        **kwargs):
     llm = ScaffoldingLlm(
         controller,
         {
             NativeGenerationController.WorkerTag.GENERATION: proposer_worker,
         },
     )
-    time_start = time.time()
-    results = llm.generate(prompts)
-    time_end = time.time()
-    print(f"time cost: {time_end - time_start} seconds")
-    for i, result in enumerate(results):
-        print(f"result {i}:\n{result.outputs[0].text}")
+
+    is_majority_vote = isinstance(
+        controller, DeepConfOnlineMajorityVoteController) or isinstance(
+            controller, DeepConfOfflineMajorityVoteController)
+    vote_policy_to_bench_result: Dict[str, BenchResult] = {}
+    times = []
+    for i in range(repeat_times):
+        print(f"=========== round {i} ===========")
+        start_time = time.time()
+        results = llm.generate(prompts)
+        times.append(time.time() - start_time)
+
+        for j, result in enumerate(results):
+            print(
+                f"result {j}: {extract_answer_from_boxed(result.outputs[0].text)}"
+            )
+
+            if is_majority_vote and ground_truth is not None:
+                vote_policy_to_voted_task = result.cur_output.vote_policy_to_voted_task
+                for vote_policy, voted_task in vote_policy_to_voted_task.items(
+                ):
+                    bench_result = vote_policy_to_bench_result.get(
+                        vote_policy, BenchResult())
+
+                    voted_answer = voted_task.customized_result_fields[
+                        'extracted_answer']
+                    if equal_func(voted_answer, ground_truth[j]):
+                        bench_result.right_answer_count += 1
+                    bench_result.total_answer_count += 1
+                    bench_result.generated_tokens += result.cur_output.output_token_num
+
+                    vote_policy_to_bench_result[vote_policy] = bench_result
+
+    print(f"e2e inference median time cost: {np.median(times):.2f} seconds")
+
+    if is_majority_vote:
+        for vote_policy, bench_result in vote_policy_to_bench_result.items():
+            bench_result.accuracy = bench_result.right_answer_count / bench_result.total_answer_count
+            print(
+                f"vote_policy: {vote_policy}, accuracy: {bench_result.accuracy}"
+            )
+
+        print(f"generated tokens: {bench_result.generated_tokens}")
+
     llm.shutdown(shutdown_workers=True)


@@ -83,7 +147,8 @@ def test_single_vote_controller(prompts,
         conf_group_size=conf_group_size,
         conf_threshold=conf_threshold,
     )
-    run_scaffolding_llm(prompts, proposer_worker, prototype_controller)
+    run_scaffolding_llm(prompts, proposer_worker, prototype_controller,
+                        **kwargs)


 def test_majority_vote_controller(prompts,
def test_majority_vote_controller(prompts,
@@ -94,6 +159,7 @@ def test_majority_vote_controller(prompts,
                                   temperature,
                                   max_tokens,
                                   top_p,
+                                  top_k,
                                   sample_num,
                                   warmup_sample_num,
                                   vote_policy,
@@ -106,6 +172,7 @@ def test_majority_vote_controller(prompts,
             "max_tokens": max_tokens,
             "num_logprobs": logprobs_topk,
             "top_p": top_p,
+            "top_k": top_k,
         })
     DeepConfControllerKwargs = {
         "generation_controller": generation_controller,
@@ -125,7 +192,8 @@ def test_majority_vote_controller(prompts,
         vote_policy=vote_policy,
         warmup_sample_num=warmup_sample_num,
         confidence_percentile=confidence_percentile)
-    run_scaffolding_llm(prompts, proposer_worker, majority_vote_controller)
+    run_scaffolding_llm(prompts, proposer_worker, majority_vote_controller,
+                        **kwargs)


 def main():
def main():
@@ -138,25 +206,39 @@ def main():
         "warmup_sample_num": args.warmup_sample_num,
         "confidence_percentile": args.confidence_percentile,
         "logprobs_topk": args.logprobs_topk,
-        "max_tokens": args.max_tokens,
         "temperature": args.temperature,
         "top_p": args.top_p,
+        "top_k": args.top_k,
+        "repeat_times": args.repeat_times,
+        "max_tokens": args.max_tokens,
     }

-    prompts = [
-        "Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?\r\n\r\n",
-        "There exist real numbers $x$ and $y$, both greater than 1, such that $\\log_x\\left(y^x\\right)=\\log_y\\left(x^{4y}\\right)=10$. Find $xy$.",
-        "Find the largest possible real part of \\[(75+117i)z+\\frac{96+144i}{z}\\]where $z$ is a complex number with $|z|=4$.",
-    ]
-
     llm_worker = TRTLLMWorker.init_with_new_llm(
         args.model_dir,
         backend="pytorch",
-        max_batch_size=32,
-        max_num_tokens=kwargs.get("max_tokens"),
+        max_batch_size=2048,
+        max_num_tokens=args.max_tokens,
     )
     print(f"init llm worker done")

+    dataset_path = Path(__file__).parent / args.dataset
+    with open(dataset_path, 'r', encoding='utf-8') as file:
+        question_data = [json.loads(line.strip()) for line in file]
+
+    if args.qid != -1:
+        question_data = [question_data[args.qid]]
+    prompts = [
+        prepare_prompt(question_data['question'], llm_worker.tokenizer)
+        for question_data in question_data
+    ]
+    ground_truth = [
+        str(question_data.get('answer', '')).strip()
+        for question_data in question_data
+    ]
+    kwargs["ground_truth"] = ground_truth
+
+    print(f"has {len(prompts)} prompts")
+
     if args.run_type == "offline" or args.run_type == "online":
         test_single_vote_controller(prompts,
                                     llm_worker,
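
Note: a minimal, self-contained sketch of the per-vote-policy accuracy tally that `run_scaffolding_llm` performs in this commit. It is an illustration under stated assumptions, not the commit's code: the `tally` helper and the sample data are hypothetical, and plain stripped-string equality stands in for `equal_func` from `utils`, whose actual comparison logic is not shown in this diff.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class BenchResult:
    right_answer_count: int = 0
    total_answer_count: int = 0
    accuracy: float = 0.0


def tally(voted: List[Dict[str, str]],
          ground_truth: List[str]) -> Dict[str, BenchResult]:
    """Accumulate right/total counts per vote policy, then derive accuracy.

    `voted[j]` maps a vote-policy name to the answer that policy selected
    for question j (a hypothetical stand-in for vote_policy_to_voted_task).
    """
    results: Dict[str, BenchResult] = {}
    for per_question, truth in zip(voted, ground_truth):
        for policy, answer in per_question.items():
            bench = results.setdefault(policy, BenchResult())
            # Assumption: exact string match replaces equal_func here.
            if answer.strip() == truth.strip():
                bench.right_answer_count += 1
            bench.total_answer_count += 1
    for bench in results.values():
        bench.accuracy = bench.right_answer_count / bench.total_answer_count
    return results


# Hypothetical voted answers for two questions under two policies.
voted = [
    {"majority": "2^{99}", "top10": "2^{99}"},
    {"majority": "1/9", "top10": "\\frac{1}{9}"},
]
truth = ["2^{99}", "\\frac{1}{9}"]
summary = tally(voted, truth)
for policy, bench in summary.items():
    print(f"vote_policy: {policy}, accuracy: {bench.accuracy}")
```

Exact string matching treats `1/9` and `\frac{1}{9}` as different answers, which is presumably why the benchmark script delegates the comparison to a dedicated `equal_func`.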
