2026, COIN, A Probabilistic Greedy Attempt to be Fair in Neural Team Recommendation.
Just Accepted
2023, BIAS-ECIR, Bootless Application of Greedy Re-ranking Algorithms in Fair Neural Team Formation.
2026, XXXX, Graph Neural Team Recommendation: An Integrated Approach.
Under Review
2025, SIGIR, Translative Neural Team Recommendation.
2025, WSDM, Adaptive Loss-based Curricula for Neural Team Recommendation.
2024, ECIR, A Streaming Approach to Neural Team Formation Training.
2024, WISE, Skill Vector Representation Learning for Collaborative Team Recommendation: A Comparative Study.
2022, CIKM, Effective Neural Team Formation via Negative Samples.
2022, CIKM, OpeNTF: A Benchmark Library for Neural Team Formation.
2026, WWW, Learning Collaborative Teams via Social Information Retrieval.
2025, CIKM, Neural Shifts in Collaborative Team Recommendation.
2025, WSDM, Bridging Subgraph Optimization and Graph Neural Network in Team Recommendations.
2024, SIGIR-AP, Paradigm Shifts in Team Recommendation: From Subgraph Optimization to Graph Neural Network.
2024, UMAP, Collaborative Team Recommendation for Skilled Users: Objectives, Techniques, and Perspectives.
Team formation (recommendation) involves selecting a team of skillful experts who will, more likely than not, accomplish a task. Researchers have proposed a rich body of computational methods to automate the traditionally tedious and error-prone manual process. We release OpeNTF, an open-source neural team formation framework hosting canonical neural models as the cutting-edge class of approaches, along with large-scale training datasets from varying domains. It further includes temporal training strategy for neural models’ training to capture the evolution of experts’ skills and collaboration ties over time, as opposed to randomly shuffled training datasets. OpeNTF also integrates debiasing reranking algorithms at its last step to mitigate the popularity and gender disparities in the neural models’ team recommendations based on two alternative notions of fairness: equal opportunity and demographic parity. OpeNTF is a forward-looking effort to automate team formation via fairness-aware and time-sensitive methods. AI-ML-based solutions are increasingly impacting how resources are allocated to various groups in society, and ensuring fairness and time are systematically considered is key.
1. Setup
OpeNTF needs Python >= 3.8 and installs required packages lazily and on demand, i.e., as it goes through the steps of the pipeline, it installs a package only if the package, or the correct version of it, is not available in the environment. For further details, refer to requirements.txt and pkgmgr.py. To set up an environment locally:
#python3.8
python -m venv opentf_venv
source opentf_venv/bin/activate #non-windows
#opentf_venv\Scripts\activate #windows
pip install --upgrade pip
pip install -r requirements.txt
To install a specific version of a Python package, e.g., for CUDA version compatibility, edit requirements.txt like:
#$ torch==2.4.1 --index-url https://download.pytorch.org/whl/cu118
To run in a container, a docker image can be built and run by customizing the Dockerfile.
cd src
python main.py "cmd=[prep, train, test, eval]" \
"models.instances=[mdl.rnd.Rnd, mdl.fnn.Fnn, mdl.bnn.Bnn]" \
data.domain=cmn.publication.Publication data.source=../data/dblp/toy.dblp.v12.json data.output=../output/dblp/toy.dblp.v12.json \
~data.filter \
train.train_test_ratio=0.85 train.nfolds=3 train.save_per_epoch=3 \
test.per_epoch=True test.topK=100 \
eval.topk=\'2,5,10\'
The above run loads and preprocesses a tiny toy example dataset toy.dblp.v12.json from dblp with no filtering, followed by 3-fold cross-validation on the training split and a final test on the test set for the feedforward and Bayesian neural models as well as a random baseline, using default hyperparameters from ./src/mdl/__config__.yaml. For a step-by-step guide and output trace, see our colab script.
Each model is defined in ./src/mdl/ under an inheritance hierarchy. Models override abstract functions for the train, test, eval, and plot steps. For example, our feedforward baseline fnn is implemented in ./src/mdl/fnn.py. A model's hyperparameters such as the learning rate (lr) or the number of epochs (e) can be set in ./src/mdl/__config__.yaml, or overridden on the command line as shown in the quickstart script.
Currently, from ./src/mdl/, we support
Neural multilabel classifiers, including the non-Bayesian feedforward fnn and Bayesian bnn, where each expert candidate is a label and team recommendation is a multilabel classification task. See A Variational Neural Architecture for Skill-based Team Formation, TOIS23, for details and results.
Seq-to-seq and transformer-based models via the nmt wrapper over OpenNMT, where the required subset of skills is mapped to the optimum subset of experts. See Translative Neural Team Recommendation, SIGIR25, for details and results.
Gnn-based models via gnn using PyG, where the optimum subset of experts is predicted via link prediction between expert and team nodes in an expert graph, as shown in our colab script.
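The multilabel framing behind fnn and bnn can be sketched as follows. This is a minimal stdlib-only illustration, not OpeNTF's actual code; the toy skill and expert vocabularies are made up:

```python
# Team recommendation as multilabel classification:
# input  = multi-hot vector over skills, target = multi-hot vector over experts.
skills = ["nlp", "ir", "ml"]          # hypothetical skill vocabulary
experts = ["alice", "bob", "carol"]   # hypothetical expert pool

def multi_hot(items, vocab):
    """Occurrence (boolean) vector over a fixed vocabulary."""
    return [1 if v in items else 0 for v in vocab]

# One training team: required skills -> its (successful) members.
x = multi_hot({"nlp", "ml"}, skills)        # model input
y = multi_hot({"alice", "carol"}, experts)  # multilabel target
print(x, y)  # [1, 0, 1] [1, 0, 1]
```

A neural classifier then learns the mapping from skill vectors x to expert vectors y, and at inference time the top-k highest-scoring labels form the recommended team.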
From ./src/mdl/emb/, we also support dense vector representation learning methods for skills to be fed into the neural multilabel classifiers:
d2v: Inspired by paragraph vectors by Le and Mikolov, we consider a team as a document and its skills as the document's words, and embed skills using gensim.
gnn: A graph neural network can be used to embed skills in an expert graph (transfer-based). Via PyG, we implemented random-walk-based methods like node2vec and metapath2vec, message-passing-based methods like graphsage, and many more.
See Skill Vector Representation Learning for Collaborative Team Recommendation: A Comparative Study, WISE24 for details and results.
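As a rough intuition for the signal these methods compress into dense vectors, skills that co-occur in the same teams end up with similar profiles. The stdlib-only sketch below is neither d2v nor gnn; it just builds raw co-occurrence counts over a made-up toy corpus:

```python
from collections import Counter

# Each team is a set of required skills; skills that co-occur across teams
# get similar co-occurrence profiles, which embedding methods compress
# into low-dimensional dense vectors.
teams = [{"nlp", "ir"}, {"nlp", "ml"}, {"ir", "ml"}, {"nlp", "ir"}]  # toy data
vocab = sorted({s for t in teams for s in t})  # ['ir', 'ml', 'nlp']

def cooc_vector(skill):
    """Co-occurrence counts of `skill` with every vocabulary skill."""
    c = Counter()
    for t in teams:
        if skill in t:
            c.update(t - {skill})
    return [c[v] for v in vocab]

print(cooc_vector("nlp"))  # [2, 1, 0]
```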
3.2. Adila: Fairness-aware Team Formation
Neural team formation methods largely ignore fairness in the recommended teams of experts. We study the application of fairness-aware re-ranking algorithms to mitigate potential popularity or gender biases in Adila and integrate it as a submodule in OpeNTF. We support the fairness notions of equal opportunity and demographic parity. To achieve fairness, we utilize post-hoc reranking algorithms (det_greedy, det_cons, det_relaxed, fa*ir). Fairness criteria can be set in ./src/__config__.yaml#L95.
See Bootless Application of Greedy Re-ranking Algorithms in Fair Neural Team Formation. BIAS-ECIR23 for details and results.
Team formation models generally assume that teams are i.i.d. and follow the bag-of-teams approach during training (a shuffled dataset of teams). With temporal training, we instead aim to predict future teams of experts: we sort the teams by time intervals and train a neural model incrementally through the ordered collection of teams. To run models under the temporal training strategy, set models.instances to [mdl.tntf.tNtf_mdl.fnn.Fnn], [mdl.tntf.tNtf_mdl.bnn.Bnn], or both [mdl.tntf.tNtf_mdl.fnn.Fnn, mdl.tntf.tNtf_mdl.bnn.Bnn] in src/__config__.yaml.
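The training order under the temporal strategy can be sketched as follows (a stdlib-only illustration with made-up timestamps; the real incremental training lives in OpeNTF's tNtf wrapper):

```python
# Temporal (streaming) training: sort teams by time and fit incrementally,
# instead of shuffling the whole dataset ("bag of teams").
teams = [  # (year, team id) -- toy data
    (2019, "t3"), (2015, "t1"), (2021, "t4"), (2017, "t2"),
]

trained_order = []
for year, team in sorted(teams):  # chronological order
    trained_order.append(team)    # stand-in for one incremental training step

print(trained_order)  # ['t1', 't2', 't3', 't4']
```

Each step continues training the same model on the next time interval, so the model sees collaborations in the order they actually happened.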
See A Streaming Approach to Neural Team Formation Training, ECIR24 for details and results.
Unsuccessful teams convey complementary negative signals to a model. However, most real-world training datasets in the team formation domain have no explicit unsuccessful teams (e.g., collections of rejected papers). In the absence of unsuccessful training instances, we proposed negative sampling strategies based on the closed-world assumption, where any group of experts not currently known to be successful for the required skills is assumed to be unsuccessful. We study the effect of three negative sampling strategies:
Uniform distribution (uniform), where subsets of experts are randomly chosen as unsuccessful teams, each with the same probability, from the uniform distribution over all subsets of experts.
Unigram distribution (unigram), where subsets of experts are chosen according to their frequency in all previous successful teams. Intuitively, experts that have been more successful, but for other skill subsets, are given a higher probability and chosen more frequently as negative samples to dampen the effect of popularity bias.
Smoothed unigram distribution in each training minibatch (unigram_b), where we employ add-1 (Laplace) smoothing when computing the unigram distribution of the experts, but within each training minibatch.
To include a negative sampling strategy, set nsd and the number of negative samples to draw, ns, in src/mdl/__config__.yaml.
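The three strategies amount to different sampling weights over experts. The stdlib-only sketch below illustrates this with toy frequency counts; it is not OpeNTF's implementation, and draw_negative_team is a hypothetical helper:

```python
import random
from collections import Counter

random.seed(0)
experts = ["e1", "e2", "e3"]
# How often each expert appeared in previous successful teams (toy counts).
freq = Counter({"e1": 8, "e2": 1, "e3": 1})

def weights(strategy):
    if strategy == "uniform":    # every expert equally likely
        return [1] * len(experts)
    if strategy == "unigram":    # proportional to past success frequency
        return [freq[e] for e in experts]
    if strategy == "unigram_b":  # add-1 (Laplace) smoothed; in OpeNTF the
        return [freq[e] + 1 for e in experts]  # counts come from a minibatch
    raise ValueError(strategy)

def draw_negative_team(strategy, size=2):
    """Draw a (presumed unsuccessful) subset of experts."""
    return random.choices(experts, weights=weights(strategy), k=size)

print(weights("uniform"), weights("unigram"), weights("unigram_b"))
# [1, 1, 1] [8, 1, 1] [9, 2, 2]
```

Note how unigram deliberately over-samples the popular expert e1 as a negative, which is what dampens popularity bias.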
See Effective Neural Team Formation via Negative Samples, CIKM22 for details and results.
Raw datasets, e.g., scholarly papers from AMiner's citation network dataset of dblp, movies from imdb, or US patents from uspt, are assumed to be populated in data. For integration testing, tiny toy example datasets toy.dblp.v12.json from dblp, [toy.title.basics.tsv, toy.title.principals.tsv, toy.name.basics.tsv] from imdb, toy.repos.csv for github, and toy.patent.tsv from US patents have already been provided.
Raw data will be preprocessed into sparse matrices, each row of which represents:
teamsvecs['member']: occurrence (boolean) vector representation for members of a team, e.g., authors of a paper or crew members of a movie,
teamsvecs['skill']: occurrence (boolean) vector representation for required skills for a team, e.g., keywords of a paper or genre of a movie.
teamsvecs['loc']: occurrence (boolean) vector representation for a team's location, e.g., conference/journal of a paper.
Also, indexes will be created to map the vector's indexes to members', skills', and locations' names, i.e., i2c, c2i, i2s, s2i.
The sparse matrices and the indexes will be persisted in output/{dblp,imdb,uspt}/{name of dataset} as pickles teamsvecs.pkl and indexes.pkl. For example, the preprocessed data for our dblp toy example are output/dblp/toy.dblp.v12.json/teamsvecs.pkl and output/dblp/toy.dblp.v12.json/indexes.pkl.
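The shape of the preprocessed output can be sketched as follows. This stdlib-only illustration uses plain nested lists and toy teams in place of OpeNTF's sparse matrices:

```python
# Each row of teamsvecs['member'] / teamsvecs['skill'] is one team's
# occurrence (boolean) vector; indexes map vector positions to names.
raw_teams = [  # toy data: (members, skills)
    ({"alice", "bob"}, {"nlp"}),
    ({"bob"}, {"ir", "ml"}),
]
members = sorted({m for ms, _ in raw_teams for m in ms})
skills = sorted({s for _, ss in raw_teams for s in ss})

c2i = {m: i for i, m in enumerate(members)}  # candidate (member) -> index
i2c = {i: m for m, i in c2i.items()}         # index -> candidate
s2i = {s: i for i, s in enumerate(skills)}   # skill -> index

teamsvecs = {
    "member": [[1 if m in ms else 0 for m in members] for ms, _ in raw_teams],
    "skill": [[1 if s in ss else 0 for s in skills] for _, ss in raw_teams],
}
print(teamsvecs["member"])  # [[1, 1], [0, 1]]
print(teamsvecs["skill"])   # [[0, 0, 1], [1, 1, 0]]
```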
Our pipeline benefits from parallel generation of the sparse matrices for teams, which significantly reduces the preprocessing time.
Please note that the preprocessing step is executed only once; subsequent runs load the persisted pickle files. To regenerate them, simply delete the pickles.
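This run-once-then-load behavior can be sketched with a small load-or-build pattern (stdlib-only; load_or_build and the temporary path below are hypothetical, not OpeNTF's actual function names):

```python
import os
import pickle
import tempfile

def load_or_build(path, build):
    """Run preprocessing once; later calls load the persisted pickle."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    obj = build()
    with open(path, "wb") as f:
        pickle.dump(obj, f)
    return obj

path = os.path.join(tempfile.mkdtemp(), "teamsvecs.pkl")
first = load_or_build(path, lambda: {"member": [[1, 0]]})  # builds + persists
second = load_or_build(path, lambda: {"never": "called"})  # loads the pickle
print(second)  # {'member': [[1, 0]]}
```

Deleting the pickle at path is what forces the build step to run again on the next call.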
We used pytrec_eval to evaluate the performance of models on the test set as well as on their own train sets (where they should overfit) and validation sets. We report the predictions, evaluation metrics on each test instance, and the average over all test instances in ./output/{domain}/{dataset}/{split}/{model}.{setting}/. For example, see output/dblp/toy.dblp.v12.json/splits.f3.r0.85/gcn.b1000.e100.ns5.lr0.001.es5.spe10.d128.add.stm.h128.nn30-20, where:
f0.test.pred is the predictions per test instance for the model trained on folds [1,2,3,4] and validated on fold [0],
f0.test.pred.eval.csv is the values of the evaluation metrics for the predictions per test instance,
f0.test.pred.eval.mean.csv is the average of the evaluation metrics over all test instances,
test.pred.eval.mean.csv is the average of the evaluation metrics over all n fold models.
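The aggregation into the *.eval.mean.csv files can be sketched as follows (stdlib-only, with toy per-instance metric values in the nested-dict shape pytrec_eval returns; not OpeNTF's actual evaluation code):

```python
from statistics import mean

# Per-test-instance metric values, keyed by instance id (toy values).
per_instance = {
    "q1": {"map": 0.50, "ndcg_cut_10": 0.75},
    "q2": {"map": 0.25, "ndcg_cut_10": 0.25},
}

# Average each metric over all test instances (-> *.eval.mean.csv).
metrics = sorted(next(iter(per_instance.values())))
means = {m: mean(row[m] for row in per_instance.values()) for m in metrics}
print(means)  # {'map': 0.375, 'ndcg_cut_10': 0.5}
```

Averaging the fold-level means over all n folds then yields test.pred.eval.mean.csv.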
We benefit from bayesian-torch, OpenNMT-py, reranking and fairsearchcore, pytrec_eval, and other libraries. We would like to thank the authors of these libraries and helpful resources.
©2025. This work is licensed under a CC BY-NC-SA 4.0 license.
CAD$300, Best Research, Demo Day, School of Computer Science, University of Windsor, 2022.






