ml-product-matching

This repository contains the code and data to reproduce the experiments from the paper "Analysis and Performance Evaluation of Machine Learning Techniques for Product Matching." The study investigates various machine learning techniques applied to product matching through a systematic literature review and experiments using datasets from the WDC Product Data Corpus and Magellan Data Repository.

Overview

The repository provides implementations of methods from the six key studies reviewed in the paper. These include fine-tuning of pre-trained language models and additional optimizations.

Contents

  • Code: Implementations of machine learning techniques from the reviewed studies.
  • Data: Links to the datasets used in the experiments, including subsets from the WDC Product Data Corpus and Magellan Data Repository.
  • Experiments: Scripts and configurations to replicate the experiments and results presented in the study.

Methods and Implementations

The evaluated methods are implemented from the following reviewed studies:

  1. Deep Entity Matching with Pre-Trained Language Models (2020)
  2. Intermediate Training of BERT for Product Matching (2020)
  3. Dual-Objective Fine-Tuning of BERT for Entity Matching (2021)
  4. Multilingual Transformers for Product Matching – Experiments and a New Benchmark in Polish (2022)
  5. Supervised Contrastive Learning for Product Matching (2022)
  6. Entity Resolution with Hierarchical Graph Attention Networks (2022)
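
Approaches like those listed above are commonly framed as binary classification over pairs of product offers: each pair is labelled as a match or a non-match. The following is a minimal sketch of preparing one such labelled pair. It is not code from this repository; the `Offer` class, its fields, the `serialize` format, and the example products are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    """A hypothetical product offer; the field names are illustrative only."""
    title: str
    brand: str = ""


def serialize(offer: Offer) -> str:
    """Flatten an offer's non-empty attributes into a single string."""
    parts = [f"{name}: {value}" for name, value in vars(offer).items() if value]
    return " | ".join(parts)


def make_pair_example(left: Offer, right: Offer, is_match: bool) -> dict:
    """Build one labelled example for a pair classifier (1 = match, 0 = non-match)."""
    return {
        "text_a": serialize(left),
        "text_b": serialize(right),
        "label": int(is_match),
    }


# Made-up offers, used only to show the shape of one example.
example = make_pair_example(
    Offer(title="Acme USB-C cable 1m", brand="Acme"),
    Offer(title="ACME 1 m USB C cable"),
    is_match=True,
)
print(example)
```

A pair in this shape could then be passed to a sequence-pair tokenizer. The actual input formats used by each implementation are documented in its subdirectory.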

Usage

Cloning the Repository

To clone this repository, run:

git clone https://github.com/edfvalim/ml-product-matching

Running the Experiments

To run the experiments for an implementation, navigate to its subdirectory. The README file in that subdirectory gives detailed instructions for setting up dependencies, downloading datasets, and executing scripts.
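
As a convenience, the per-implementation README files can be located with a short script. This is a sketch under two assumptions that the text above does not confirm: that the repository was cloned into `./ml-product-matching`, and that each implementation sits one directory level below the repository root. The helper name `find_readmes` is hypothetical.

```python
from pathlib import Path


def find_readmes(repo_root: str) -> list[Path]:
    """Return README files located exactly one level below the repository root."""
    root = Path(repo_root)
    return sorted(
        path
        for path in root.glob("*/*")
        if path.is_file() and path.name.lower().startswith("readme")
    )


if __name__ == "__main__":
    # Assumed clone location; adjust if the repository lives elsewhere.
    for readme in find_readmes("ml-product-matching"):
        print(readme)
```

If the implementations are nested more deeply, the glob pattern would need to be adjusted accordingly.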

License

The code in this repository is licensed under the MIT License. Some subdirectories, however, contain code under other licenses, such as the BSD License and the Apache License 2.0. Refer to the license files in those subdirectories for details.

Acknowledgements

This repository includes code and data from various studies. We gratefully acknowledge the original authors, whose work is used here under its respective licenses.
