Commit 653179b

task diversity study scripts added.
1 parent df4860b commit 653179b

10 files changed: 1233 additions, 0 deletions
Lines changed: 89 additions & 0 deletions
@@ -0,0 +1,89 @@
# Configuration for Diverse Task Generator

# Model settings
model:
  name: gpt-4o  # OpenAI model to use
  temperature: 1.0  # Temperature for all steps
  max_tokens: 8192  # Max tokens for all steps

# Task generation settings
generation:
  tasks_per_blueprint: 3  # Number of tasks to generate per blueprint
  min_subtopics: 3  # Suggested minimum number of sub-topics
  max_subtopics: 8  # Suggested maximum number of sub-topics

# Output settings
output:
  base_dir: diverse_task_outputs
  save_intermediate_steps: true  # Save each step's output
  pretty_print_json: true  # Indent JSON files

# Input settings
input:
  capability_json_path: capability.json  # Default capability JSON file path

# Bloom's Taxonomy definitions
# Source: Revised Bloom's Taxonomy (Anderson & Krathwohl, 2001)
blooms_taxonomy:
  Remember:
    description: "Retrieving relevant knowledge from long-term memory. Involves recognizing and recalling facts, terms, basic concepts, or answers."
    keywords: ["define", "list", "identify", "recall", "name", "state"]

  Understand:
    description: "Constructing meaning from instructional messages. Involves interpreting, exemplifying, classifying, summarizing, inferring, comparing, and explaining."
    keywords: ["explain", "describe", "interpret", "summarize", "compare", "contrast"]

  Apply:
    description: "Carrying out or using a procedure in a given situation. Involves executing or implementing a method, technique, or process."
    keywords: ["apply", "use", "implement", "execute", "solve", "demonstrate"]

  Analyze:
    description: "Breaking material into constituent parts and determining how parts relate to one another and to an overall structure. Involves differentiating, organizing, and attributing."
    keywords: ["analyze", "differentiate", "organize", "distinguish", "examine", "compare"]

  Evaluate:
    description: "Making judgments based on criteria and standards. Involves checking for internal consistency or logical fallacies, and critiquing based on external criteria."
    keywords: ["evaluate", "judge", "critique", "assess", "justify", "argue"]

  Create:
    description: "Putting elements together to form a novel, coherent whole or make an original product. Involves generating, planning, and producing."
    keywords: ["create", "design", "construct", "develop", "formulate", "generate"]

# Difficulty level definitions
difficulty_levels:
  easy:
    description: "Basic, straightforward problems requiring minimal steps and fundamental knowledge."
    characteristics:
      - "Single concept application"
      - "Direct recall or simple calculation"
      - "Clear and unambiguous"
      - "Minimal prerequisite knowledge"

  medium:
    description: "Moderate complexity requiring multiple steps, integration of concepts, or non-trivial reasoning."
    characteristics:
      - "Multiple concept integration"
      - "Multi-step solution required"
      - "Some prerequisite knowledge needed"
      - "May involve edge cases"

  hard:
    description: "Complex, challenging problems requiring deep understanding, multiple concepts, edge cases, or sophisticated reasoning."
    characteristics:
      - "Complex multi-concept integration"
      - "Multiple challenging steps"
      - "Deep domain knowledge required"
      - "Edge cases and exceptions"
      - "May require insight or creative approach"

# Verification criteria
verification:
  pass_threshold: 0.8  # Minimum pass rate to consider successful
  strict_mode: false  # If true, all alignment criteria must pass

# Example capability for quick testing
example_capability:
  name: "compound_interest_calculations"
  description: "The ability to calculate compound interest for various scenarios, including different compounding frequencies (annually, semi-annually, quarterly, monthly), different time periods, and understanding how changes in principal, rate, or time affect the final amount."
  domain: "personal_finance"
  area: "investing_and_savings"
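The config names six Bloom's levels and three difficulty levels; crossed with a list of sub-topics, they span the space of (content, difficulty, reasoning) triples that the pipeline draws from. The sketch below enumerates that full candidate grid. It is only an illustration and not the commit's actual selection logic: the `candidate_grid` helper and the sub-topic names are hypothetical, and the real pipeline presumably keeps only the combinations it judges valid.

```python
from itertools import product

# Level names copied from the config's blooms_taxonomy and difficulty_levels keys.
BLOOMS_LEVELS = ["Remember", "Understand", "Apply", "Analyze", "Evaluate", "Create"]
DIFFICULTY_LEVELS = ["easy", "medium", "hard"]


def candidate_grid(subtopics):
    """Return every (content, difficulty, reasoning) triple for the given sub-topics.

    Hypothetical helper: the commit does not show how combinations are chosen.
    """
    return list(product(subtopics, DIFFICULTY_LEVELS, BLOOMS_LEVELS))


# Hypothetical sub-topics for the example capability (not taken from the commit).
subtopics = ["annual compounding", "monthly compounding", "effect of rate changes"]
grid = candidate_grid(subtopics)

print(len(grid))  # 3 sub-topics x 3 difficulties x 6 Bloom's levels = 54
print(grid[0])    # ('annual compounding', 'easy', 'Remember')
```

With `tasks_per_blueprint: 3`, keeping all 54 triples would yield 162 tasks, which is one reason a filtering step before blueprint generation seems likely.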
Lines changed: 77 additions & 0 deletions
@@ -0,0 +1,77 @@
"""Dataclasses for the diverse task generation pipeline."""

from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Capability:
    """Represents a capability to be tested."""

    name: str
    description: str
    domain: str
    area: Optional[str] = None
    example_tasks: List[Dict] = field(default_factory=list)


@dataclass
class SubTopic:
    """Represents a sub-topic within a capability."""

    name: str
    description: Optional[str] = None


@dataclass
class Combination:
    """Represents a valid (content, difficulty, reasoning) combination."""

    content: str
    difficulty: str
    reasoning: str
    rationale: Optional[str] = None


@dataclass
class Blueprint:
    """Represents a task blueprint for a specific combination."""

    combination_id: int
    subtopic: str
    difficulty: str
    reasoning: str
    blueprint: str
    key_characteristics: List[str] = field(default_factory=list)
    example_question_outline: Optional[str] = None
    rationale: Optional[str] = None


@dataclass
class Task:
    """Represents a generated multiple-choice task."""

    task_id: str
    blueprint_id: int
    subtopic: str
    difficulty: str
    reasoning: str
    question: str
    choices: Dict[str, str]
    correct_answer: str
    explanation: Optional[str] = None
    alignment_notes: Optional[str] = None


@dataclass
class VerificationResult:
    """Represents the verification result for a task."""

    task_id: str
    subtopic_aligned: bool
    difficulty_aligned: bool
    reasoning_aligned: bool
    choices_appropriate: bool
    overall_aligned: bool
    feedback: str
    suggested_improvements: Optional[str] = None
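The config's `verification` block (`pass_threshold: 0.8`, `strict_mode: false`) suggests that per-task `VerificationResult` records are aggregated into a pass rate. The commit excerpt does not show that aggregation, so the following is a minimal sketch under stated assumptions: `task_passes` and `run_succeeds` are hypothetical helpers, strict mode is assumed to require all four per-criterion flags, and non-strict mode is assumed to rely on `overall_aligned` alone. The dataclass is repeated so the block runs on its own.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VerificationResult:
    """Copy of the dataclass from the diff, repeated so this sketch is self-contained."""

    task_id: str
    subtopic_aligned: bool
    difficulty_aligned: bool
    reasoning_aligned: bool
    choices_appropriate: bool
    overall_aligned: bool
    feedback: str
    suggested_improvements: Optional[str] = None


def task_passes(result: VerificationResult, strict_mode: bool = False) -> bool:
    """Assumed rule: strict mode needs every criterion; otherwise use overall_aligned."""
    if strict_mode:
        return all([
            result.subtopic_aligned,
            result.difficulty_aligned,
            result.reasoning_aligned,
            result.choices_appropriate,
        ])
    return result.overall_aligned


def run_succeeds(results: List[VerificationResult],
                 pass_threshold: float = 0.8,
                 strict_mode: bool = False) -> bool:
    """Compare the fraction of passing tasks with pass_threshold (empty runs fail)."""
    if not results:
        return False
    passed = sum(task_passes(r, strict_mode) for r in results)
    return passed / len(results) >= pass_threshold


# Made-up verification records for illustration only.
results = [
    VerificationResult("t1", True, True, True, True, True, "ok"),
    VerificationResult("t2", True, True, True, True, True, "ok"),
    VerificationResult("t3", True, True, True, True, True, "ok"),
    VerificationResult("t4", True, True, True, False, True, "distractors too easy"),
    VerificationResult("t5", False, True, True, True, False, "off-topic"),
]

print(run_succeeds(results))                    # 4 of 5 pass: meets the 0.8 threshold
print(run_succeeds(results, strict_mode=True))  # 3 of 5 pass: below the threshold
```

Treating an empty result list as a failure is a design choice in this sketch; it avoids a division by zero and keeps a run with no verified tasks from counting as successful.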
