Skip to content

Commit 11ed758

Browse files
committed
98 halted questions results
1 parent b174d2f commit 11ed758

File tree

7 files changed

+418
-281
lines changed

7 files changed

+418
-281
lines changed

agent/notebooks/20250521_halted_questions/analyze_model_consensus.py

Lines changed: 43 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -5,21 +5,53 @@
55
# Set working directory to the datapoints folder
66
datapoints_dir = Path("datapoints")
77

8-
# Read the evaluation results
9-
file_path = datapoints_dir / "ddf--datapoints--evaluation_result--by--question--model_configuration--prompt_variation.csv"
10-
df = pl.read_csv(file_path)
8+
# Read the rate files
9+
correct_rate_df = pl.read_csv(datapoints_dir / "ddf--datapoints--correct_rate--by--question--model_configuration.csv")
10+
wrong_rate_df = pl.read_csv(datapoints_dir / "ddf--datapoints--wrong_rate--by--question--model_configuration.csv")
11+
very_wrong_rate_df = pl.read_csv(datapoints_dir / "ddf--datapoints--very_wrong_rate--by--question--model_configuration.csv")
12+
indecisive_rate_df = pl.read_csv(datapoints_dir / "ddf--datapoints--indecisive_rate--by--question--model_configuration.csv")
1113

12-
# Group by question and model_configuration to get the most common evaluation result for each model on each question
13-
# This aggregates across different prompt variations
14+
# Join the dataframes
1415
model_results = (
15-
df.group_by(["question", "model_configuration", "evaluation_result"])
16-
.count()
17-
.sort(["question", "model_configuration", "count"], descending=[False, False, True])
18-
.group_by(["question", "model_configuration"])
19-
.first()
20-
.select(["question", "model_configuration", "evaluation_result"])
16+
correct_rate_df
17+
.join(wrong_rate_df, on=["question", "model_configuration"])
18+
.join(very_wrong_rate_df, on=["question", "model_configuration"])
19+
.join(indecisive_rate_df, on=["question", "model_configuration"])
2120
)
2221

22+
# Determine the model result based on the highest rate
23+
def determine_result(row):
24+
rates = {
25+
"correct": row["correct_rate"],
26+
"wrong": row["wrong_rate"],
27+
"very_wrong": row["very_wrong_rate"],
28+
"indecisive": row["indecisive_rate"]
29+
}
30+
31+
# Find the result with the highest rate
32+
max_result = max(rates.items(), key=lambda x: x[1])
33+
result_type, max_rate = max_result
34+
35+
# If the highest rate is greater than 85%, use that result
36+
# Otherwise, mark as "indecisive"
37+
threshold = 90.0
38+
if max_rate > threshold:
39+
return result_type
40+
else:
41+
return "indecisive"
42+
43+
# Apply the function to determine the model result for each row
44+
results = []
45+
for row in model_results.to_dicts():
46+
results.append({
47+
"question": row["question"],
48+
"model_configuration": row["model_configuration"],
49+
"evaluation_result": determine_result(row)
50+
})
51+
52+
# Convert to Polars DataFrame
53+
model_results = pl.DataFrame(results)
54+
2355
# Get the distinct model configurations to count how many we have
2456
model_configs = model_results.select("model_configuration").unique().sort("model_configuration")
2557
num_models = len(model_configs)
Lines changed: 27 additions & 34 deletions
Original file line numberDiff line numberDiff line change
@@ -1,43 +1,36 @@
11
question,most_common_result,consensus_count,total_models,consensus_percentage,question_text
2-
61,correct,5,5,100.0,What Happens To The Global Average Temperature After A Large Volcanic Eruption?
2+
12,indecisive,5,5,100.0,"In The Early 2000'S, More Than 80% Of The World'S Population Received The Full Dose Of The Measles, Mumps And Rubella Vaccine. What Do You Think Happened To This Percentage In The Last 10 Years?"
33
15,correct,5,5,100.0,What Share Of The Adult Population (15-74 Years) In High-Income Countries Are Currently On Medication Against Depression?
4-
60,correct,5,5,100.0,How Much Must The Total Co2 Emissions In The World Change Before 2030 To Avoid A Global Warming Of More Than 1.5 Degrees Celsius?
5-
42,correct,5,5,100.0,"Around 820,000 Patent Applications Are Registered Yearly In High-Income Countries. What Is The Number In Low- And Middle-Income Countries?"
6-
28,correct,5,5,100.0,How Many People In The World Have Jobs But Are Still Considered To Be Poor (Working Poor)?
7-
47,correct,5,5,100.0,What Kind Of Conutries Have In General More Equal Incomes?
8-
31,correct,5,5,100.0,"Worldwide, What Is The Unemployment Rate Among People Aged 25 Years And Older?"
9-
65,correct,5,5,100.0,"The Atmosphere Is Part Of The Lower Layer Around The Planet. When This Lower Layer Warms Up Due To Climate Change, What Happens To The Layer Above It?"
10-
9,correct,5,5,100.0,"Of The Poorest People Who Managed To Escape Extreme Poverty Only To Then Fall Back, What Was The Most Common Reason They Became Poorer Again (Before The Pandemic)?"
114
19,correct,5,5,100.0,"In 1980, Around 20% Of The World'S Population Lived In Areas Where Water Was Scarce. What Is That Figure Today?"
12-
59,very_wrong,5,5,100.0,What Share Of The Global Population Have Already Experienced A Temperature Increase Of 1.5 Degrees Celsius In At Least One Season?
13-
96,correct,5,5,100.0,The World Population Is Currently 7.7 Billion People. When Will The World Population Be Double Of That?
14-
1,wrong,5,5,100.0,"If Economic Trends Continue In The Three African Countries With Most People In Extreme Poverty, What Will Happen To The Number Of Extremely Poor In The Next 10 Years?"
15-
4,correct,5,5,100.0,What Share Of The World'S Population Is Undernourished?
16-
10,very_wrong,5,5,100.0,What'S The Main Reason Why Overweight Is More Common In Richer Countries?
17-
67,correct,5,5,100.0,Which Of The Following Gases Does Not Contribute To Global Warming?
5+
2,indecisive,5,5,100.0,"What Share Of The World'S Population Has Access To Food, Shelter, Water, Vaccination, Education And Mobile Phones?"
6+
20,correct,5,5,100.0,The Number Of Liters Of Water It Takes To Produce One T-Shirt Is The Same Amount Of Drinking Water One Person Needs For How Long?
187
21,correct,5,5,100.0,Which Country Is The Largest Exporter Of Coal In The World Today?
19-
92,correct,5,5,100.0,"In The Past Ten Years, The Share Of The World’S Population Living In Democratic Countries Has:"
8+
29,correct,5,5,100.0,"Worldwide, How Many People With Jobs Live In Extreme Poverty?"
9+
31,correct,5,5,100.0,"Worldwide, What Is The Unemployment Rate Among People Aged 25 Years And Older?"
10+
33,indecisive,5,5,100.0,"Since 2010, What Has Happened To The Number Of Atms Per Person Worldwide?"
11+
34,indecisive,5,5,100.0,What Has Grown The Fastest Since The Year 2000?
12+
36,correct,5,5,100.0,What Is The Global Unemployment Rate?
13+
37,correct,5,5,100.0,How Can The World Best Be Divided?
14+
4,correct,5,5,100.0,What Share Of The World'S Population Is Undernourished?
15+
40,indecisive,5,5,100.0,What Do Large Firms (100 Employees Or More) Find The Biggest Obstacle To Doing Business In Africa?
16+
47,correct,5,5,100.0,What Kind Of Conutries Have In General More Equal Incomes?
17+
48,correct,5,5,100.0,"In 2019, The Un General Assembly Voted To Declare September 18Th Each Year As The International Day Of What?"
2018
49,correct,5,5,100.0,Did A Majority Of The World'S Population Get Richer Or Poorer In The Last 30 Years?
19+
58,indecisive,5,5,100.0,"If Current Trends Continue, When Are Fossil Fuels Expected To Be Used Up?"
20+
60,correct,5,5,100.0,How Much Must The Total Co2 Emissions In The World Change Before 2030 To Avoid A Global Warming Of More Than 1.5 Degrees Celsius?
21+
61,correct,5,5,100.0,What Happens To The Global Average Temperature After A Large Volcanic Eruption?
22+
65,correct,5,5,100.0,"The Atmosphere Is Part Of The Lower Layer Around The Planet. When This Lower Layer Warms Up Due To Climate Change, What Happens To The Layer Above It?"
23+
67,correct,5,5,100.0,Which Of The Following Gases Does Not Contribute To Global Warming?
2124
68,correct,5,5,100.0,Which Of The Following Planets Has The Coldest Surfaces Temperatures?
22-
45,very_wrong,5,5,100.0,"If You Line Up All Roads In The World, How Many Times Can You Go Around The World?"
23-
94,correct,5,5,100.0,"Thirty Years Ago, Democratic Countries Produced Around 75% Of All The World’S Income. What Is That Figure Today?"
24-
71,correct,5,5,100.0,Which Among The Following Three Sectors Produces The Largest Share Of Greenhouse Gas Emissions?
25-
13,correct,5,5,100.0,"Ten Years Ago, 80% Of The World Received At Least One Dose Of The Measles, Mumps And Rubella Vaccine. What Do You Think Happened To This Percentage In The Last 10 Years?"
2625
70,correct,5,5,100.0,The Levels Of Co2 In The Air Are Increasing. What Happens To Vegetation As A Result Of This?
27-
22,correct,5,5,100.0,"Since 2016, The Number Of People Who Cook Using Stoves That Don’T Produce Smoke Has Remained At Around 3 Billion People. Why Is That?"
28-
37,correct,5,5,100.0,How Can The World Best Be Divided?
29-
55,correct,5,5,100.0,"What Share Of Water That Is Used By Households, Businesses And Industry Goes Back Into The Environment Without Being Cleaned Properly?"
30-
36,correct,5,5,100.0,What Is The Global Unemployment Rate?
31-
81,correct,5,5,100.0,Agricultural Output In Africa Is Mainly Determined By:
32-
20,correct,5,5,100.0,The Number Of Liters Of Water It Takes To Produce One T-Shirt Is The Same Amount Of Drinking Water One Person Needs For How Long?
33-
29,correct,5,5,100.0,"Worldwide, How Many People With Jobs Live In Extreme Poverty?"
34-
85,correct,5,5,100.0,Which Countries Have The Highest Levels Of Corruption?
35-
72,very_wrong,5,5,100.0,What Share Of Coral Reefs Is Currently Not Threatened By Ocean Warming And Acidification?
36-
5,correct,5,5,100.0,What Share Of All Obese Children Under 5 In The World Lives In Africa?
37-
27,wrong,5,5,100.0,"Globally, Exported Goods Make Up 24% Of The Total World Economy. What Is That Number In Africa?"
38-
54,wrong,5,5,100.0,"On Average, How Much Land In Cities In The World Is Dedicated To Open Public Spaces?"
39-
14,wrong,5,5,100.0,In Which Region Will You Find The Largest Share Of People With Cardiovascular Disease?
40-
64,correct,5,5,100.0,"Agricultural Productivity Is How Efficient All The Farming Land, Labor, Money And Machinery Is At Producing Food. How Has That Productivity Been Affected By Climate Change Since 1960?"
26+
71,correct,5,5,100.0,Which Among The Following Three Sectors Produces The Largest Share Of Greenhouse Gas Emissions?
27+
75,indecisive,5,5,100.0,What Share Of 220 Coastal Regions Around The World Improved Their Water Quality Between 2012 And 2018?
28+
8,indecisive,5,5,100.0,Africa Can'T Feed Its Entire Population Yet. This Is Mainly Because Of:
4129
80,correct,5,5,100.0,What Share Of All Forests In The World Is Natural Forest?
42-
53,correct,5,5,100.0,"Of All People Living In Cities In Sub-Saharan Africa, How Many Live Within A Few Hundred Meters Of Public Transport?"
30+
83,indecisive,5,5,100.0,"Worldwide, How Many Firms Feel They Are Expected To Give Gifts To Secure Government Contracts?"
4331
84,correct,5,5,100.0,"In 2019, Which Countries Hosted The Fewest Refugees?"
32+
85,correct,5,5,100.0,Which Countries Have The Highest Levels Of Corruption?
33+
86,indecisive,5,5,100.0,What Is The Most Imported Product To Sub-Saharan Africa?
34+
89,indecisive,5,5,100.0,What Share Of Development Frameworks Designed In Countries Donating Aid Are Drawn Up Using Priorities And Objectives Drawn Up By The Recipient Country?
35+
95,indecisive,5,5,100.0,"In 1990, There Were 4 Countries Where Populists Were In Power. In How Many Countries Are Populists In Power Today?"
36+
96,correct,5,5,100.0,The World Population Is Currently 7.7 Billion People. When Will The World Population Be Double Of That?

0 commit comments

Comments
 (0)