This repository contains the code and data to reproduce the experiments from the paper "Analysis and Performance Evaluation of Machine Learning Techniques for Product Matching." The study investigates various machine learning techniques applied to product matching through a systematic literature review and experiments using datasets from the WDC Product Data Corpus and Magellan Data Repository.
The repository offers implementations of methods from six key studies reviewed in the paper, including fine-tuning pre-trained language models and additional optimizations.
- Code: Implementations of machine learning techniques from the reviewed studies.
- Data: Links to the datasets used in the experiments, including subsets from the WDC Product Data Corpus and Magellan Data Repository.
- Experiments: Scripts and configurations to replicate the experiments and results presented in the study.
The evaluated methods are implemented based on the reviewed studies:
- Deep Entity Matching with Pre-Trained Language Models (2020)
- Intermediate Training of BERT for Product Matching (2020)
- Dual-Objective Fine-Tuning of BERT for Entity Matching (2021)
- Multilingual Transformers for Product Matching – Experiments and a New Benchmark in Polish (2022)
- Supervised Contrastive Learning for Product Matching (2022)
- Entity Resolution with Hierarchical Graph Attention Networks (2022)
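Several of the studies above (e.g., Deep Entity Matching with Pre-Trained Language Models) cast product matching as sequence-pair classification: each record is serialized into a token sequence, and a language model is fine-tuned to classify a pair of serialized records as match or non-match. The sketch below illustrates only the serialization step in plain Python; the helper names and example records are illustrative, not taken from this repository or its datasets.

```python
def serialize(record: dict) -> str:
    """Flatten a product record into a [COL]/[VAL] token sequence,
    the kind of input format used by entity matchers built on
    pre-trained language models."""
    return " ".join(f"[COL] {attr} [VAL] {value}" for attr, value in record.items())

def serialize_pair(left: dict, right: dict) -> str:
    """Join two serialized records with [SEP]; a language model is
    then fine-tuned to classify the pair as match / non-match."""
    return f"{serialize(left)} [SEP] {serialize(right)}"

# Illustrative records (not from the actual experiment datasets)
a = {"title": "Logitech MX Master 3 Mouse", "brand": "Logitech"}
b = {"title": "MX Master 3 Wireless Mouse", "brand": "Logitech"}
print(serialize_pair(a, b))
```

The serialized pair is what would be tokenized and passed to the fine-tuned model; the actual implementations in each subdirectory handle tokenization, training, and evaluation.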
To clone this repository, run:
```
git clone https://github.com/edfvalim/ml-product-matching
```
To run the experiments, navigate to the subdirectory for the implementation of interest. Detailed instructions for installing dependencies, downloading datasets, and executing the scripts are provided in each subdirectory's README file.
The code in this repository is licensed under the MIT License. However, some subdirectories contain code distributed under other licenses, such as the BSD License and the Apache License 2.0. Please refer to the license files in those subdirectories for details.
This repository includes code and data from various studies. We acknowledge the original authors for their contributions and licenses.