
Conversation

@ilmarinen
Collaborator

No description provided.

@ilmarinen ilmarinen requested review from nushib and safooray July 8, 2025 20:51
AzureOpenAIModel,
{
"model_name": "gpt-4o",
"url": "https://eurekaevals.openai.azure.com/",
Contributor

Let's replace this with a placeholder URL.
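For illustration, the suggested change could look like the following sketch; the placeholder host name is hypothetical, and each user would substitute their own Azure OpenAI resource:

```python
# Hypothetical sketch of the suggested fix: the hard-coded endpoint is
# replaced with a placeholder that each user fills in with their own
# Azure OpenAI resource name.
model_config = {
    "model_name": "gpt-4o",
    "url": "https://<your-resource-name>.openai.azure.com/",
}
```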



@dataclass
class ARCAGI_CleanCOTAnswer(DFTransformBase):
Contributor

Can we put this in the general transforms so others can reuse it for other benchmarks if they need to? Cleaning CoTs is not necessarily ARC-AGI specific. Also, here is another way we did it for other benchmarks, if useful:

self.evalreporting_comp.data_reader_config.init_args["transform"].transforms.append(
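As a rough illustration of what a general, reusable CoT-cleaning transform could look like, here is a minimal sketch. The class name and method are hypothetical (the real version would subclass DFTransformBase and operate on dataframe columns):

```python
import re
from dataclasses import dataclass


@dataclass
class CleanCOTAnswer:
    """Minimal sketch of a benchmark-agnostic CoT cleaner: strips the
    <think>...</think> reasoning block so only the final answer remains."""

    def clean(self, response: str) -> str:
        # Remove the reasoning block (including newlines), then trim whitespace.
        return re.sub(r"<think>.*?</think>", "", response, flags=re.DOTALL).strip()
```

For example, CleanCOTAnswer().clean("<think>steps...</think>42") yields "42".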


With that in mind, do your best to solve the question below.

{{ prompt }}
Contributor

So the prompt itself does not ask the model to format the answer in <output> tags, yet the answer extraction sort of assumes this. Is this because reasoning models are expected to produce this format? What if the model uses other tags, or no tags at all (e.g. if it is a conventional model with no thinking block)? For other reasoning tasks, we ask the model to clearly mark the final answer according to some format, for example:

Final Answer:

and then extract what comes after that.
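A sketch of that marker-based extraction, with a fallback to the raw response when the model ignored the requested format (the function name is illustrative):

```python
import re


def extract_final_answer(response: str) -> str:
    # Take everything after the "Final Answer:" marker; if the model
    # did not produce the marker, fall back to the whole response.
    match = re.search(r"Final Answer:\s*(.*)", response, flags=re.DOTALL)
    return (match.group(1) if match else response).strip()
```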

max_concurrent=1,
)

if resume_logdir:
Contributor

Is this to overwrite results in the same directory? If so, this is somewhat unexpected behavior, because the other pipelines do not do this by default.

"path": os.path.join(self.inference_comp.output_dir, "inference_result.jsonl"),
"format": ".jsonl",
"transform": SequenceTransform(
[]
Contributor

Was there supposed to be a transformation here? If not, maybe remove the whole component altogether?

Collaborator

I read your comment asking why there is an empty post-processing step here, and I would like to point out that this is exactly why we have another data reader in the eval reporting component; that is where these potential transforms would go.

return pipeline


class COT_ARC_AGI_v1_PIPELINE_5Run(ARC_AGI_v1_PIPELINE):
Contributor

Should this inherit from COT_ARC_AGI_v1_PIPELINE instead?



class ARC_AGI_v1_PIPELINE_5Run(ARC_AGI_v1_PIPELINE):
"""This class specifies the config for running the GPQA benchmark 5 repeated times"""
Contributor

Stale comment. Maybe search the whole file for "gpqa".

"path": os.path.join(self.inference_comp.output_dir, "inference_result.jsonl"),
"format": ".jsonl",
"transform": SequenceTransform(
[]
Contributor

Empty transform here as well.

data_reader_config=DataSetConfig(
HFDataReader,
{
"path": "pxferna/ARC-AGI-v1-5050",
Contributor

Is this pipeline the same as the previous one, except that the HF file changes? If so, it could just inherit from the previous one and only override the path of the HF data reader. I did notice, however, that this one also has majority vote and worst-of-n, while the other one does not.
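The inherit-and-override pattern the comment suggests could be sketched as follows. The class names and the base dataset path are hypothetical stand-ins for the repo's pipeline configs:

```python
class ARC_AGI_Pipeline:
    """Stand-in for the existing pipeline config class."""

    def __init__(self):
        # Hypothetical base dataset path; the real config has many more fields.
        self.data_reader_config = {"path": "pxferna/ARC-AGI-v1"}


class ARC_AGI_5050_Pipeline(ARC_AGI_Pipeline):
    """Same pipeline; only the HF dataset path is overridden."""

    def __init__(self):
        super().__init__()
        self.data_reader_config["path"] = "pxferna/ARC-AGI-v1-5050"
```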

nushib
nushib previously approved these changes Jul 14, 2025
@@ -0,0 +1,347 @@
import os
Collaborator

Please add a pipeline test for this new pipeline under the tests folder.

Parameters:
response (str): Input string containing answer X in the form of "<output>final answer string</output>".
Returns:
answer (str): The final answer string with leading and training spaces stripped.
Collaborator

training -> trailing

"""

model_output_column: str
model_answer_column: str
Collaborator

@safooray Jul 14, 2025

Could you please make the think tag an argument, since this is a general transform now?
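Making the tag configurable might look like this sketch; the field name think_tag and the class name are suggestions, not the repo's actual API:

```python
import re
from dataclasses import dataclass


@dataclass
class CleanCOTAnswerConfigurable:
    """Sketch: same CoT cleanup, but the reasoning tag is an argument."""

    think_tag: str = "think"  # tag name without angle brackets, e.g. "reasoning"

    def clean(self, response: str) -> str:
        # Build the open/close tag pair from the configured tag name.
        pattern = rf"<{self.think_tag}>.*?</{self.think_tag}>"
        return re.sub(pattern, "", response, flags=re.DOTALL).strip()
```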

ColumnRename,
CopyColumn,
ExtractUsageTransform,
MajorityVoteTransform,
Collaborator

It looks like there are unused imports here. Please run our formatting commands to clean up the code. See "how to contribute" in the README. TL;DR: run make format-inplace.
