Hi and thank you for your great work on ReProver.
I have built my own tactic-generation model, inspired by ReProver, but due to limited computational resources, I have trained it only on a subset of leandojo_benchmark_4, specifically the theorems from the linear_algebra domain. I have also created a matching subset of the corpus with only the premises needed for those theorems.
To evaluate my model fairly, I would like to test it against the same test dataset that ReProver uses—just the part corresponding to the linear_algebra subset. To do this, I am using your released retriever and generator models (kaiyuy/leandojo-lean4-retriever-byt5-small and kaiyuy/leandojo-lean4-retriever-tacgen-byt5-small), but only on my restricted data.
Could you please confirm whether the following evaluation pipeline is a proper way to use your retriever and generator models in this context?
Corpus:

```python
import json

CORPUS_PATH = "./leandojo_benchmark_4/corpus_linear.jsonl"

corpus_premises: list[str] = []
with open(CORPUS_PATH, "r", encoding="utf-8") as f:
    for line in f:
        entry = json.loads(line)
        for p in entry.get("premises", []):
            name = p.get("full_name", "")
            code = p.get("code", "")
            if isinstance(code, str) and name:
                # Prepend the premise name to its code
                corpus_premises.append(f"{name} {code}")
```
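To check my parsing assumptions, here is a self-contained sanity check on a fabricated corpus line. The field names `premises`, `full_name`, and `code` are what I believe the leandojo_benchmark_4 corpus uses; the sample entry itself is made up:

```python
import json

# A fabricated line mimicking the structure I assume corpus_linear.jsonl
# has: one file per line, each with a "premises" list.
sample_line = json.dumps({
    "path": "Mathlib/LinearAlgebra/Basic.lean",
    "premises": [
        {"full_name": "LinearMap.comp", "code": "def LinearMap.comp ..."},
    ],
})

entry = json.loads(sample_line)
premises = [
    f"{p['full_name']} {p['code']}"
    for p in entry.get("premises", [])
    if p.get("full_name") and isinstance(p.get("code"), str)
]
print(premises)  # ['LinearMap.comp def LinearMap.comp ...']
```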
Load the retriever tokenizer & model:

```python
import torch
from transformers import AutoTokenizer, AutoModelForTextEncoding

RETRIEVER_NAME = "kaiyuy/leandojo-lean4-retriever-byt5-small"
GENERATOR_NAME = "kaiyuy/leandojo-lean4-retriever-tacgen-byt5-small"
DEVICE = "cuda"

tok_ret = AutoTokenizer.from_pretrained(RETRIEVER_NAME)
mod_ret = AutoModelForTextEncoding.from_pretrained(RETRIEVER_NAME).to(DEVICE)
```
```python
@torch.no_grad()
def encode_texts(texts: list[str], batch_size: int = 32) -> torch.Tensor:
    embs = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i : i + batch_size]
        toks = tok_ret(batch, return_tensors="pt", padding=True, truncation=True).to(DEVICE)
        hs = mod_ret(toks.input_ids, attention_mask=toks.attention_mask).last_hidden_state
        mask = toks.attention_mask.unsqueeze(-1)
        # Mean-pool over non-padding tokens only
        feat = (hs * mask).sum(dim=1) / mask.sum(dim=1)
        embs.append(feat.cpu())
    return torch.cat(embs, dim=0)  # [N, D]
```
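As a sanity check that the masked mean-pool excludes padding, here is the same arithmetic on a toy batch in plain Python (no model needed; the numbers are made up):

```python
# One sequence, 4 token positions, 2-dim hidden states.
# The last position is padding (mask = 0) and must not affect the mean.
hidden = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [100.0, 100.0]]
mask = [1, 1, 1, 0]

n_real = sum(mask)  # number of non-padding tokens
pooled = [
    sum(h[d] * m for h, m in zip(hidden, mask)) / n_real
    for d in range(2)
]
print(pooled)  # [3.0, 4.0] — the padded position is ignored
```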
Precompute the corpus embeddings:

```python
corpus_embs = encode_texts(corpus_premises, batch_size=64)
```
Retrieval function (dot product):

```python
def retrieve_topk(state: str, k: int = 100) -> list[str]:
    state_emb = encode_texts([state])[0]             # [D]
    scores = torch.matmul(state_emb, corpus_embs.T)  # [N]
    # My restricted corpus may have fewer than k premises, so cap k.
    idxs = scores.topk(min(k, len(corpus_premises))).indices.tolist()
    return [corpus_premises[i] for i in idxs]
```
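And a tiny check of the dot-product ranking itself, with hypothetical 2-dim embeddings standing in for the retriever's features:

```python
corpus = ["premise_a", "premise_b", "premise_c"]
corpus_vecs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
query = [0.9, 0.1]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Rank premises by dot-product score, highest first.
scores = [dot(query, v) for v in corpus_vecs]
ranked = sorted(range(len(corpus)), key=lambda i: -scores[i])
top2 = [corpus[i] for i in ranked[:2]]
print(top2)  # ['premise_a', 'premise_c']
```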
Load the retrieval-augmented generator:

```python
from transformers import AutoModelForSeq2SeqLM, pipeline

tok_gen = AutoTokenizer.from_pretrained(GENERATOR_NAME)
mod_gen = AutoModelForSeq2SeqLM.from_pretrained(
    GENERATOR_NAME, device_map="auto", load_in_8bit=True
)
# The model is already placed by device_map above, so no device argument
# is passed to the pipeline.
generator = pipeline(
    "text2text-generation", model=mod_gen, tokenizer=tok_gen,
    max_length=1024,
)
```
```python
def get_tactic_with_retrieval(state: str, k: int = 20) -> str:
    # `state` is the pretty-printed proof state (the "pp" field in LeanDojo).
    topk = retrieve_topk(state, k=k)
    prompt = (
        "Premises:\n" + "\n".join(f"- {p}" for p in topk)
        + "\n\nGoal:\n" + state
    )
    out = generator(prompt)
    return out[0]["generated_text"]
```

I am especially unsure whether this prompt format matches the one the retrieval-augmented generator was trained on, i.e. how ReProver concatenates retrieved premises with the state before generation.
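Finally, the evaluation loop I have in mind, sketched with a stubbed tactic function so the control flow is visible. Everything here is hypothetical: the real run would replace `fake_get_tactic` with `get_tactic_with_retrieval` and apply the generated tactic through a LeanDojo environment.

```python
# Hypothetical test split: one record per theorem, with the initial
# pretty-printed goal under "state". This mirrors how I store my subset.
test_records = [
    {"theorem": "thm_a", "state": "⊢ x + 0 = x"},
    {"theorem": "thm_b", "state": "⊢ 0 + x = x"},
]

def fake_get_tactic(state: str) -> str:
    # Stand-in for get_tactic_with_retrieval(state).
    return "simp"

results = {}
for rec in test_records:
    results[rec["theorem"]] = fake_get_tactic(rec["state"])

print(results)  # {'thm_a': 'simp', 'thm_b': 'simp'}
```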