This repository contains the code for the paper "Towards Human-AI Complementarity in Matching Tasks" by Adrian Arnaiz-Rodriguez, Nina Corvelo Benz, Suhas Thejaswi, Nuria Oliver, Manuel Gomez Rodriguez.
Accepted for oral presentation at the HLDM'25 workshop at ECMLPKDD2025 (the Third Workshop on Hybrid Human-Machine Learning and Decision Making at the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases 2025).
The folder data/20250217-160341-results.zip contains the outputs from the Prolific user study in the following folders:
Each participant's answer to each assigned matching task. It is organized as submissions/pool{xx}/B{b}/matching-B_{b}-pool_pool{xx}-PID_{userID}.json, where, for each pool and deferral size b, there is one JSON file per participant submission.
Each JSON file contains:
- `matching`: Partial assignment `{person_id: slot or null}`
- `utility`: Achieved utility
- `optimalUtility`: Best possible utility for the remaining people
- `time_taken`: Time in seconds
- `time_expired`: Whether the time limit was reached
- `timestamp`: Submission timestamp
- `probabilitiesFile`, `capacitiesFile`: Paths of the probabilities and capacities files (under `20250217-160341.zip`)
- `prolific_pid`, `pool_id`, `problem_id`: Submission metadata
- `sequence_idx`: Index of the assigned problem sequence
- `problem_number`: Order of the problem in the sequence
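As an illustration, a submission with these fields can be scored against the best achievable utility. The values below are made up, and `normalized_score` is a hypothetical helper, not part of the repository:

```python
import io
import json

# Hypothetical example mirroring the submission fields listed above
# (values are illustrative, not from the study).
example = {
    "matching": {"3": "Mo-am", "7": None},  # person_id -> slot or null
    "utility": 1.2,
    "optimalUtility": 1.6,
    "time_taken": 95.3,
    "time_expired": False,
}

def normalized_score(sub):
    """Achieved utility relative to the best utility for the remaining people."""
    if sub["optimalUtility"] == 0:
        return 1.0
    return sub["utility"] / sub["optimalUtility"]

# Round-trip through JSON, as if reading one submissions/ file.
sub = json.load(io.StringIO(json.dumps(example)))
print(round(normalized_score(sub), 2))  # -> 0.75
```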
Each participant has an assigned metadata file organized as:
user\_data/U\_{userID}-S\_{sequence}/
└── userdata-U\_{userID}-S\_{sequence}.json
Each JSON file contains:
- `sequence`: List of 10 problems assigned to the user (CSV pairs: probabilities + capacities)
- `sequence_idx`: Index of the assigned problem sequence
- `prolific_pid`: Prolific participant ID
- `study_id`: Prolific internal study session ID
- `problem_number`: Number of tasks completed
- `attention`: Whether attention checks were passed
- `returned`: Whether the study was returned instead of submitted
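The last two flags can be used to filter valid participants. A small sketch, assuming only the field names listed above (the `is_valid` helper and the sample records are hypothetical):

```python
# Keep only participants who passed the quality flags described above
# (data is illustrative, not from the study).
users = [
    {"prolific_pid": "U01", "attention": True,  "returned": False},
    {"prolific_pid": "U02", "attention": False, "returned": False},
    {"prolific_pid": "U03", "attention": True,  "returned": True},
]

def is_valid(meta):
    """Passed the attention checks and actually submitted (did not return)."""
    return meta.get("attention", False) and not meta.get("returned", False)

valid_ids = [u["prolific_pid"] for u in users if is_valid(u)]
print(valid_ids)  # -> ['U01']
```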
The data from our stylized human-subject study is included in data/20250217-160341. The data structure is as follows:
data/{timestamp}/
│ prob_x.npy // general data generation information: generation probabilities for x as defined in the script
│ prob_y_per_x.npy // general data generation information: generation probabilities for y given x as defined in the script
│ capacities.npy // general data generation information: capacities for each type as defined in the script
│ user_problem_matrix.csv // general data generation information: precomputed problem sequences to show to Prolific users (each user does half of the arms from random pools in random order)
├─ pool0/ // Pool 0 (first pool generated)
│ pool.pkl // Pool Object of individuals - Details in `MWBM.data_genetrion.Pool`
│ g.npy // expected success `individuals` X `resources` (ground-truth success probabilities)
│ g_alg.npy // confidence scores `individuals` X `resources` (biased success probabilities)
│ matching_opt_real_prob.npy // optimal matching with real probabilities (g.npy)
│ ├─ B{b}/ // Sub-folders `B{b}` for every deferral size `b = 0,...,n`
│ │ alg_matching.csv // algorithmic matching (using biased confidence scores `g_alg`) of n-b individuals (only available if b<n)
│ │ opt_rem_match.csv // optimal remaining matching (using real probabilities `g`) of the b remaining individuals i.e. best human can do
│ │ remaining_capacities.csv // remaining capacities after algorithmic matching - one row per resource
│ │ remaining_g.csv // expected utilities of remaining `individuals` X `resources`
│ │ remaining_people.csv // remaining ID of individuals (after algorithmic matching)
│ └─
├─ pool1/
└─ …
See the section Data Generation at the bottom of this file for details on how this data was generated.
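For intuition about how `g.npy` relates to a matching's utility, here is a toy sketch with a small synthetic matrix (the numbers and the `matching_utility` helper are illustrative, not repository code):

```python
import numpy as np

# Toy stand-in for one pool's g.npy: 4 individuals x 2 resources of
# ground-truth success probabilities (numbers are illustrative).
g = np.array([[0.9, 0.1],
              [0.2, 0.8],
              [0.6, 0.5],
              [0.4, 0.7]])

def matching_utility(g, assignment):
    """Expected number of successes; assignment[i] is person i's resource index."""
    return float(sum(g[i, r] for i, r in enumerate(assignment)))

print(round(matching_utility(g, [0, 1, 0, 1]), 2))  # -> 3.0
```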
To run the bandit system, use the script `run_bandit_parallel.py`. It simulates the bandit system on the data from the user study and can be run in parallel (`run_bandit_parallel.py`) or sequentially (`run_bandit_human_data.py`).

```
python run_bandit_parallel.py  # or python run_bandit_human_data.py for sequential execution
    --pools-folder data/20250217-160341            # Path to the folder with pool data.
    --results-folder data/20250217-160341-results  # Path to the folder with the human answers, also used to store results (preferred: {pools-folder}-results).
    --num_sims 100                  # Number of simulation runs.
    --horizon 2000                  # Number of rounds per simulation.
    --min-arm 5                     # Lowest arm; below this, all human matchings are considered optimal.
    --problems pool_assignment      # Name of the tensor in {results-folder}/stats, with dimensions n_pools x arms x answers per pool-arm, holding the IDs of the users that solved each problem.
    --allowed-users all             # 'all', or the path to a JSON file with the list of selected users in {results-folder}/bandit/users/*.json.
    --random-assign                 # If enabled, fill incomplete human matchings randomly.
    --title all_random              # Identifier for this experiment, used in the results folder.
    --seed 0                        # Base seed for reproducibility.
    --assignment-strategy human     # Strategy for assigning the remaining b individuals to the remaining slots; 'human' uses the answers from the user study.
```

The argument `--problems {tensor}` loads a 3-D tensor of shape `n_pools` x `arms` x `answers per pool-arm` indicating which users solved which problems.
It is already computed and saved in `data/20250217-160341-results.zip/stats/pool_user_assignment/pool_assignment.npy`, and can be used directly by the bandit system.
However, if the system is used with other data, the tensor must be computed before running the bandit and saved in `{results-folder}/results/stats/pool_user_assignment/{name}.npy`.
It is computed by the script `UserStudy-Analysis/run_analysis.bat`, which processes the user study results and saves the necessary statistics, including:
- The required `n_pools` x `arms` x `answers per pool-arm` tensor indicating which users solved which problems.
- Additional statistics saved in `{results-folder}/results/stats`: utilities of each problem, normalized utilities (scores with respect to the optimal score) for each user and arm, time taken for each problem, etc.
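For illustration, a hypothetical sketch of how such a pool-assignment tensor could be assembled from (pool, arm, user) records; the shapes, records, and padding convention below are illustrative, not the script's actual implementation:

```python
import numpy as np

# Build an n_pools x n_arms x answers tensor from (pool, arm, user) records,
# padding unused answer slots with -1 (all values here are illustrative).
n_pools, n_arms, answers_per_cell = 2, 3, 2
records = [(0, 1, "U01"), (0, 1, "U07"), (1, 2, "U03")]

users = sorted({u for _, _, u in records})
uid = {u: i for i, u in enumerate(users)}       # user ID -> integer index

tensor = np.full((n_pools, n_arms, answers_per_cell), -1, dtype=int)
fill = np.zeros((n_pools, n_arms), dtype=int)   # next free answer slot per cell
for pool, arm, user in records:
    tensor[pool, arm, fill[pool, arm]] = uid[user]
    fill[pool, arm] += 1

print(tensor[0, 1].tolist())  # -> [0, 2]  (indices of U01 and U07)
```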
The variables used in `UserStudy-Analysis/run_analysis.bat` are defined inside the script as follows:
set poolsfolder=data/20250217-160341 # Path to folder with pool and matching problem data.
set resultsfolder=%poolsfolder%-results # Path to store results (preferred `{pools-folder}-results`).
set npools=40 # Number of different pools that users solved.
set narms=21 # Max arm + 1, because arms = [0, max_arm].
set minarm=5 # Minimum b collected in the study (optimal matchings are used as human matchings for lower arms).
set totalproblems=10 # Total problems per user, including the attention checks (i.e., arms done + 2).
set userlist=users_valid
set problemlist=pool_assignment
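For intuition about what the bandit simulation does, here is a minimal stand-in: UCB1 over deferral sizes b (the arms), with rewards drawn from synthetic pools of "human" utilities. The actual algorithm and reward model in `run_bandit_parallel.py` may differ; everything below is illustrative:

```python
import math
import random

# Synthetic per-arm pools of recorded "human" utilities for arms b = 5..10
# (the study's minimum arm is 5; numbers are made up).
random.seed(0)
human_utils = {b: [0.5 + 0.02 * b + random.gauss(0, 0.05) for _ in range(50)]
               for b in range(5, 11)}

arms = sorted(human_utils)
counts = {b: 0 for b in arms}
sums = {b: 0.0 for b in arms}

for t in range(1, 2001):                       # horizon of 2000 rounds
    untried = [b for b in arms if counts[b] == 0]
    if untried:
        b = untried[0]                         # play each arm once first
    else:                                      # then maximize the UCB1 index
        b = max(arms, key=lambda a: sums[a] / counts[a]
                + math.sqrt(2 * math.log(t) / counts[a]))
    reward = random.choice(human_utils[b])     # sample a recorded answer
    counts[b] += 1
    sums[b] += reward

print(sum(counts.values()))  # -> 2000
```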
In addition, to personalize the data generation for human-assisted matching, run the script:
python generate_prolific_matchings.py

This script generates synthetic data for human-assisted matching scenarios, as follows:
Given (as internal variables in the script):
- `N = 20`: Number of people per pool
- `M = 10`: Number of appointment slots per pool
- `N_POOLS = 40`: Total number of pools to generate
- `capacities_types = [2]`: Each slot can host 2 people
- `prob_x`: Prior distribution over person types (categorical over K groups)
- `prob_y_per_x`: Success probability matrix for each person type × slot (shape: K × M)
- `slot_keys`: Fixed 10 appointment keys (`Mo-am`, `Mo-pm`, ..., `Fr-pm`)
Creates:
- All generated data and results are saved under a `data/` directory, including NumPy arrays, pickled objects, and CSV tables for each pool and matching scenario.
- It includes general files with the details of the generation, one folder for each created pool, and sub-folders for each deferral size `b` (from 0 to `n`), where `n` is the number of individuals in the pool.
- The resulting `data/{timestamp}/` structure is the same as the tree shown in the data section above.
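The generation step described above can be sketched with the quantities named in the "Given" list. The value of K, the prior, and the success matrix below are illustrative, not the script's actual parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

# Minimal sketch of the generation step (values are illustrative).
K, M, N = 3, 10, 20                      # person types, slots, people per pool
prob_x = np.array([0.5, 0.3, 0.2])       # prior over the K person types
prob_y_per_x = rng.uniform(size=(K, M))  # success probability per type x slot

x = rng.choice(K, size=N, p=prob_x)      # sample a type for each person
g = prob_y_per_x[x]                      # ground-truth success matrix (g.npy)

print(g.shape)  # -> (20, 10)
```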
```
pip install -r requirements.txt
```
or
```
conda env create -f environment.yml
```
IMPORTANT: a Gurobi license is required. Instructions at "How do I install Gurobi for Python?".
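For intuition about what the Gurobi-based code optimizes, here is a tiny brute-force illustration of a capacitated max-weight matching; the numbers are made up, and real instances are solved with Gurobi rather than enumeration:

```python
from itertools import product

# 3 people x 2 slots of success probabilities; each slot hosts up to 2 people
# (numbers are illustrative).
g = [[0.9, 0.1],
     [0.2, 0.8],
     [0.6, 0.5]]
capacity = [2, 2]

best_util, best_assign = -1.0, None
for assign in product(range(2), repeat=3):            # every person -> slot map
    if any(assign.count(s) > capacity[s] for s in range(2)):
        continue                                      # capacity violated
    util = sum(g[i][assign[i]] for i in range(3))
    if util > best_util:
        best_util, best_assign = util, assign

print(best_assign, round(best_util, 2))  # -> (0, 1, 0) 2.3
```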