Merge pull request #4 from mgb45/Handling-preferences

mgb45 · web-flow · commit 8a3b9964b713 · 2025-07-03T20:24:26.000+10:00
Add preference handling, graceful behaviour with constraint subsets.
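
The preference parsing added in this PR can be sketched without pandas as below. This is a minimal illustration, not the code in the diff: `parse_preference_pairs` and the `rows` list of dicts are hypothetical stand-ins for `parse_preferences` and the DataFrame, and the sketch additionally skips self-references, which the PR code does not do.

```python
def parse_preference_pairs(rows, column):
    """Return (student_index, target_index) pairs for one preference column.

    `rows` is a list of dicts standing in for spreadsheet rows
    (a hypothetical substitute for the DataFrame used in the PR).
    """
    # Map each cleaned Student_ID to its row position.
    id_to_index = {
        str(row["Student_ID"]).strip(): idx for idx, row in enumerate(rows)
    }

    pairs = []
    for idx, row in enumerate(rows):
        raw = row.get(column)
        if not isinstance(raw, str):  # missing or empty cell
            continue
        for target in (part.strip() for part in raw.split(",")):
            # Blank entries and unknown IDs are not in the map and are dropped;
            # self-references are dropped explicitly (an assumption of this sketch).
            if target in id_to_index and id_to_index[target] != idx:
                pairs.append((idx, id_to_index[target]))
    return pairs


rows = [
    {"Student_ID": "S1", "Prefer_With": "S2, S3", "Prefer_Not_With": "S4"},
    {"Student_ID": "S2"},
    {"Student_ID": "S3", "Prefer_With": "S1", "Prefer_Not_With": "S3"},
    {"Student_ID": "S4", "Prefer_With": None},
]

positive = parse_preference_pairs(rows, "Prefer_With")
negative = parse_preference_pairs(rows, "Prefer_Not_With")
```

Pairs are directional: a preference listed by one student is counted once, and a mutual preference is counted twice.
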
diff --git a/.gitignore b/.gitignore
@@ -35,3 +35,6 @@ var/
 .installed.cfg
 *.egg
 MANIFES
+
+# Test files
+*.xlsx
diff --git a/README.md b/README.md
@@ -1,53 +1,84 @@
 # Teamformer
 
-
-Teamformer builds student teams for you. The primary objective is to form as few teams as are needed while ensuring constraints are met, and encouraging WAM (weighted average mark/gpa) balance across teams. The system is basically a wrapper around a CP-SAT solver using Google OR-Tools.
+Teamformer builds student teams for you. The primary objective is to form as few teams as needed while ensuring constraints are met and encouraging WAM (weighted average mark/GPA) balance across teams. The system is a wrapper around a CP-SAT solver using Google OR-Tools.
 
 Constraint handling includes:
 
-✅ Each student is assigned to exactly one team 
+✅ Each student is assigned to exactly one team
 
 ✅ Each team has between min and max students
 
-✅ No team has only one student of a given gender (current only M/F, if other self-report categories these are ignored and not balanced, but does not break anything)
+✅ No team has only one student of a given gender (currently only M/F; other self-reported categories are ignored for balancing, but won't break anything)
 
 ✅ The number of teams used is minimized
 
 ✅ Students are only assigned to teams in the same lab as them
 
-✅ Deviation from average WAM across class is penalised
+✅ Deviation from average WAM across the class is penalised
+
+✅ **Student preferences are favoured (positive and negative preferences now supported!)**
 
-❌ Student preferences are favoured (Not yet implemented)
+The output is an Excel sheet with students and teams. Team numbers may not be consecutive; they are drawn from the range 1 to `max_teams`.
 
-The output is an excel sheet with students and teams. Team numbers may not be sequential (drawn from 1:max_teams).
+---
 
 ### Data structure
-Team former assumes data is in a spreadsheet that looks something like (fake data):
 
-|    | first_name   | last_name   | email                     | gender   |   wam |   lab |
-|---:|:-------------|:------------|:--------------------------|:---------|------:|------:|
-|  0 | Mark         | Johnson     | ...  | M        | 51.13 |     3 |
-|  1 | Donald       | Walker      | ...  | M        | 60.04 |     1 |
-|  2 | Sarah        | Rhodes      | ...  | F        | 76.57 |     1 |
-|  3 | Steven       | Miller      | ...  | M        | 54.22 |     2 |
-|  4 | Javier       | Johnson     | ... | M        | 75.26 |     4 |
+Teamformer expects data in a spreadsheet like this (fake example):
+
+|   | Student\_ID | first\_name | last\_name | email | gender | wam  | lab | Prefer\_With | Prefer\_Not\_With |
+| - | ----------- | ----------- | ---------- | ----- | ------ | ---- | --- | ------------ | ----------------- |
+| 0 | S1          | Mark        | Johnson    | ...   | M      | 51.1 | 3   | S2, S3       | S4                |
+| 1 | S2          | Donald      | Walker     | ...   | M      | 60.0 | 1   |              |                   |
+| 2 | S3          | Sarah       | Rhodes     | ...   | F      | 76.6 | 1   | S1           | S3                |
+| 3 | S4          | Steven      | Miller     | ...   | M      | 54.2 | 2   |              |                   |
+| 4 | S5          | Javier      | Johnson    | ...   | M      | 75.3 | 4   |              |                   |
+
+**Columns used:**
+
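+* `Student_ID`: unique identifier, referenced by the preference columns
+* `gender` (optional): used for the gender-balance constraint
+* `wam` (optional): used for WAM balancing
+* `lab` (optional): used to keep each team within a single lab
+* `Prefer_With` (optional): comma-separated list of `Student_ID` values the student wants to work with
+* `Prefer_Not_With` (optional): comma-separated list of `Student_ID` values the student prefers not to work with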
 
-**Only the gender, wam and lab columns are used.**
+---
 
 ### Install
 
-```
+```bash
 pip install -e .
 ```
+
+---
+
 ### Run
 
-```
-team_former --input_file=students.xlsx --sheet_name=0 --output_file=teams.xlsx --wam_weight=0.05 --min_team_size=3 --max_team_size=5 --max_solve_time=30
+```bash
+team_former --input_file=students.xlsx --sheet_name=0 --output_file=teams.xlsx --wam_weight=0.05 --pos_pref_weight=0.8 --neg_pref_weight=0.8 --min_team_size=3 --max_team_size=5 --max_solve_time=30
 ```
 
+---
+
 ### How to get a good solution
 
-Depending on your class sizes, demographics and lab distribution, you may struggle to find a feasible solution. Options to address this include:
-* Increase the max solve time, it may just be a matter of waiting a bit longer
-* Reduce or remove the wam weight penalty
-* Reduce the minimimun team size, it may be that the balance of students is infeasible.
+Depending on your class sizes, demographics, and lab distribution, you may struggle to find a feasible solution. Options to address this include:
+
+* Increase the max solve time; the solver may simply need longer to find a feasible solution
+* Reduce or remove the WAM weight penalty
+* Reduce the minimum team size; the mix of students may make the current size infeasible
+* Adjust the positive or negative preference weights (e.g., pass `--pos_pref_weight=0.5` if you want preferences to have less influence)
+
+---
+
+### Preference handling
+
+When using preference columns, Teamformer will attempt to:
+
+* **Keep students together** if listed in `Prefer_With`, unless it conflicts with other constraints.
+* **Avoid assigning students together** if listed in `Prefer_Not_With`.
+
+Preferences are not strictly enforced (they are "soft" constraints), but they strongly influence the solution when weights are set high.
+
+---
diff --git a/team_former/make_teams.py b/team_former/make_teams.py
@@ -7,24 +7,67 @@
 from ortools.sat.python import cp_model
 
 
+def parse_preferences(df):
+    """Parse positive and negative preferences from the DataFrame columns."""
+    id_to_index = {str(row["Student_ID"]).strip(): idx for idx, row in df.iterrows()}
+
+    positive_prefs = []
+    negative_prefs = []
+
+    has_pos = "Prefer_With" in df.columns
+    has_neg = "Prefer_Not_With" in df.columns
+
+    for _, row in df.iterrows():
+        student = str(row["Student_ID"]).strip()
+
+        # Positive preferences
+        if has_pos and pd.notna(row["Prefer_With"]) and row["Prefer_With"].strip():
+            preferred = [s.strip() for s in row["Prefer_With"].split(",") if s.strip()]
+            for target in preferred:
+                if target in id_to_index:
+                    positive_prefs.append((student, target))
+
+        # Negative preferences
+        if (
+            has_neg
+            and pd.notna(row["Prefer_Not_With"])
+            and row["Prefer_Not_With"].strip()
+        ):
+            not_preferred = [
+                s.strip() for s in row["Prefer_Not_With"].split(",") if s.strip()
+            ]
+            for target in not_preferred:
+                if target in id_to_index:
+                    negative_prefs.append((student, target))
+
+    positive_prefs = [(id_to_index[a], id_to_index[b]) for (a, b) in positive_prefs]
+    negative_prefs = [(id_to_index[a], id_to_index[b]) for (a, b) in negative_prefs]
+
+    return positive_prefs, negative_prefs
+
+
 def allocate_teams(
     *,
     input_file="students.xlsx",
     sheet_name=0,
     output_file="class_teams.xlsx",
     wam_weight=0.05,
+    pos_pref_weight=0.05,
+    neg_pref_weight=0.1,
     min_team_size=4,
     max_team_size=5,
     max_solve_time=60,
 ):
     """
-    Allocate students into balanced teams based on WAM, gender, and lab constraints.
+    Allocate students into teams using whichever of the WAM, gender, lab, and preference columns are present.
 
     Args:
         input_file (str): Path to the Excel file with student data.
         sheet_name (int or str): Sheet index or name.
         output_file (str): Output Excel file with team assignments.
         wam_weight (float): Weight for WAM balancing in the objective.
+        pos_pref_weight (float): Reward per Prefer_With pair placed on the same team.
+        neg_pref_weight (float): Penalty per Prefer_Not_With pair placed on the same team.
         min_team_size (int): Minimum number of students per team.
         max_team_size (int): Maximum number of students per team.
         max_solve_time (int): Solver timeout in seconds.
@@ -34,13 +77,25 @@ def allocate_teams(
 
     students = student_df.to_dict(orient="index")
     num_students = len(students)
-    genders = student_df["gender"]
-    wams = student_df["wam"].astype(int).values
-    lab_ids = sorted(set(student_df["lab"].astype(int).values))
-    student_labs = student_df["lab"].astype(int).values
-    global_avg_wam = sum(wams) // len(wams)
     max_teams = num_students // min_team_size
 
+    has_wam = "wam" in student_df.columns
+    has_lab = "lab" in student_df.columns
+    has_gender = "gender" in student_df.columns
+
+    if has_wam:
+        wams = student_df["wam"].astype(int).values
+        global_avg_wam = int(sum(wams) / len(wams))
+
+    if has_lab:
+        lab_ids = sorted(set(student_df["lab"].astype(int).values))
+        student_labs = student_df["lab"].astype(int).values
+
+    if has_gender:
+        genders = student_df["gender"].values
+
+    pos_preferences, neg_preferences = parse_preferences(student_df)
+
     model = cp_model.CpModel()
 
     # Variables
@@ -51,11 +106,13 @@ def allocate_teams(
     }
 
     team_used = [model.NewBoolVar(f"team_used_{team}") for team in range(max_teams)]
-    lab_team = {
-        (team, lab): model.NewBoolVar(f"team_{team}_lab_{lab}")
-        for team in range(max_teams)
-        for lab in lab_ids
-    }
+
+    if has_lab:
+        lab_team = {
+            (team, lab): model.NewBoolVar(f"team_{team}_lab_{lab}")
+            for team in range(max_teams)
+            for lab in lab_ids
+        }
 
     # Constraints
     for i in range(num_students):
@@ -67,45 +124,76 @@ def allocate_teams(
         model.Add(team_size >= min_team_size).OnlyEnforceIf(team_used[team])
         model.Add(team_size == 0).OnlyEnforceIf(team_used[team].Not())
 
-    for team in range(max_teams):
-        model.AddExactlyOne(lab_team[team, lab] for lab in lab_ids)
-
-    for i in range(num_students):
+    if has_lab:
         for team in range(max_teams):
-            model.Add(lab_team[team, student_labs[i]] == 1).OnlyEnforceIf(
-                assign[i, team]
-            )
+            model.AddExactlyOne(lab_team[team, lab] for lab in lab_ids)
 
-    for team in range(max_teams):
-        male_students = [
-            assign[i, team] for i in range(num_students) if genders[i] == "M"
-        ]
-        female_students = [
-            assign[i, team] for i in range(num_students) if genders[i] == "F"
-        ]
-        if male_students:
-            model.Add(sum(male_students) != 1)
-        if female_students:
-            model.Add(sum(female_students) != 1)
-
-    # Objective: minimize number of teams + balance WAM
+        for i in range(num_students):
+            for team in range(max_teams):
+                model.Add(lab_team[team, student_labs[i]] == 1).OnlyEnforceIf(
+                    assign[i, team]
+                )
+
+    if has_gender:
+        for team in range(max_teams):
+            male_students = [
+                assign[i, team] for i in range(num_students) if genders[i] == "M"
+            ]
+            female_students = [
+                assign[i, team] for i in range(num_students) if genders[i] == "F"
+            ]
+            if male_students:
+                model.Add(sum(male_students) != 1)
+            if female_students:
+                model.Add(sum(female_students) != 1)
+
+    # Objective terms
     squared_deviation_terms = []
-    for team in range(max_teams):
-        wam_sum = model.NewIntVar(0, 100 * max_team_size, f"wam_sum_{team}")
-        size_var = model.NewIntVar(0, max_team_size, f"team_size_{team}")
-        model.Add(size_var == sum(assign[i, team] for i in range(num_students)))
-        model.Add(
-            wam_sum == sum(wams[i] * assign[i, team] for i in range(num_students))
-        )
-        diff = model.NewIntVar(-500, 500, f"wam_diff_{team}")
-        model.Add(diff == wam_sum - size_var * global_avg_wam)
-        squared_diff = model.NewIntVar(0, 250000, f"squared_diff_{team}")
-        model.AddMultiplicationEquality(squared_diff, [diff, diff])
-        squared_deviation_terms.append(squared_diff)
-
-    model.Minimize(
-        sum(team_used) + int(wam_weight * 1000) * sum(squared_deviation_terms)
-    )
+    if has_wam:
+        for team in range(max_teams):
+            wam_sum = model.NewIntVar(0, 100 * max_team_size, f"wam_sum_{team}")
+            size_var = model.NewIntVar(0, max_team_size, f"team_size_{team}")
+            model.Add(size_var == sum(assign[i, team] for i in range(num_students)))
+            model.Add(
+                wam_sum == sum(wams[i] * assign[i, team] for i in range(num_students))
+            )
+            diff = model.NewIntVar(-500, 500, f"wam_diff_{team}")
+            model.Add(diff == wam_sum - size_var * global_avg_wam)
+            squared_diff = model.NewIntVar(0, 250000, f"squared_diff_{team}")
+            model.AddMultiplicationEquality(squared_diff, [diff, diff])
+            squared_deviation_terms.append(squared_diff)
+
+    pref_bonus_terms = []
+    for i, j in pos_preferences:
+        for team in range(max_teams):
+            together = model.NewBoolVar(f"prefer_{i}_{j}_team_{team}")
+            model.AddBoolAnd([assign[i, team], assign[j, team]]).OnlyEnforceIf(together)
+            model.AddBoolOr(
+                [assign[i, team].Not(), assign[j, team].Not()]
+            ).OnlyEnforceIf(together.Not())
+            pref_bonus_terms.append(together)
+
+    negative_terms = []
+    for i, j in neg_preferences:
+        for team in range(max_teams):
+            both = model.NewBoolVar(f"neg_pref_{i}_{j}_team_{team}")
+            model.AddBoolAnd([assign[i, team], assign[j, team]]).OnlyEnforceIf(both)
+            model.AddBoolOr(
+                [assign[i, team].Not(), assign[j, team].Not()]
+            ).OnlyEnforceIf(both.Not())
+            negative_terms.append(both)
+
+    # Objective
+    objective_terms = [sum(team_used)]
+
+    if has_wam and wam_weight > 0:
+        objective_terms.append(int(wam_weight * 1000) * sum(squared_deviation_terms))
+    if pos_pref_weight > 0:
+        objective_terms.append(-pos_pref_weight * sum(pref_bonus_terms))
+    if neg_pref_weight > 0:
+        objective_terms.append(neg_pref_weight * sum(negative_terms))
+
+    model.Minimize(sum(objective_terms))
 
     # Solve
     solver = cp_model.CpSolver()
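
Review note: the objective assembled in the hunk above can be evaluated by hand for a fixed assignment, which helps when choosing weights. The sketch below is plain Python, not the CP-SAT model; `objective_value` and its arguments are hypothetical names, and it assumes `teams[i]` holds the team number of student `i`.

```python
def objective_value(
    teams,
    wams,
    pos_pairs,
    neg_pairs,
    wam_weight=0.05,
    pos_pref_weight=0.05,
    neg_pref_weight=0.1,
):
    """Evaluate the objective from the diff for one fixed assignment."""
    used = sorted(set(teams))
    avg = int(sum(wams) / len(wams))  # truncated class average, as in the diff

    # Squared deviation of each team's WAM total from size * class average.
    squared = 0
    for team in used:
        members = [i for i, t in enumerate(teams) if t == team]
        diff = sum(wams[i] for i in members) - len(members) * avg
        squared += diff * diff

    # Count preference pairs whose two students share a team.
    together = sum(1 for i, j in pos_pairs if teams[i] == teams[j])
    clashes = sum(1 for i, j in neg_pairs if teams[i] == teams[j])

    return (
        len(used)
        + int(wam_weight * 1000) * squared
        - pos_pref_weight * together
        + neg_pref_weight * clashes
    )


wams = [50, 60, 70, 60]
pos_pairs = [(0, 1)]  # student 0 wants to work with student 1
neg_pairs = [(2, 3)]  # student 2 prefers not to work with student 3

balanced = objective_value([0, 1, 0, 1], wams, pos_pairs, neg_pairs)
preferred = objective_value([0, 0, 1, 1], wams, pos_pairs, neg_pairs)
```

In this toy case the WAM-balanced split scores far lower than the split that satisfies the positive preference, because the squared WAM term is scaled by `int(wam_weight * 1000)` while each preference pair contributes only its raw weight. Under these assumptions, preferences mostly act as tie-breakers unless `wam_weight` is reduced or the preference weights are raised substantially.
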
diff --git a/tests/test_teams.py b/tests/test_teams.py