FedPS: Federated data Preprocessing via aggregated Statistics

The workflow consists of five steps: ① Compute local statistics; ② Share and aggregate statistics; ③ Derive preprocessing parameters; ④ Broadcast parameters to clients; ⑤ Apply preprocessing locally.
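
The five steps can be sketched for min-max scaling in the horizontal setting. This is a single-process NumPy simulation for illustration only: it does not use the FedPS API or any network channel, and the variable names (`client_data`, `local_stats`, `scale`, `offset`) are assumptions made for this sketch, not names from the library.

```python
import numpy as np

# Two clients holding different samples of the same two features
# (the same toy data as in the Usage section of this README).
client_data = [
    np.array([[-1.0, 2.0], [-0.5, 6.0]]),
    np.array([[0.0, 10.0], [1.0, 18.0]]),
]

# Step 1: each client computes local statistics (per-feature min and max).
local_stats = [(X.min(axis=0), X.max(axis=0)) for X in client_data]

# Step 2: the statistics are shared and aggregated on the server.
global_min = np.min([lo for lo, _ in local_stats], axis=0)
global_max = np.max([hi for _, hi in local_stats], axis=0)

# Step 3: the server derives the preprocessing parameters.
scale = 1.0 / (global_max - global_min)
offset = -global_min * scale

# Step 4: the parameters are broadcast to the clients (simulated here by
# simply reusing the variables).
# Step 5: each client applies the transformation to its own data.
transformed = [X * scale + offset for X in client_data]

print(transformed[0])  # [[0.   0.  ] [0.25 0.25]]
print(transformed[1])  # [[0.5 0.5] [1.  1. ]]
```

Only the per-feature minima and maxima leave each client in this sketch; the raw rows stay local, and the result matches what a single scaler fitted on the pooled data would produce.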

Installation

Dependencies

Python (>= 3.10)
Scikit-learn (~= 1.7)
NumPy (>= 1.20)
DataSketches (<= 4.1.0)
PyZMQ

Building from source

Create a Python environment

conda create --name fedps python=3.10
conda activate fedps

Clone this project

git clone https://github.com/xuefeng-xu/fedps.git && cd fedps

Build the project

pip install -e .

Usage

Set up communication channels

# Client1 channel
from fedps.channel import ClientChannel

channel = ClientChannel(
    local_ip="127.0.0.1", local_port=5556,
    remote_ip="127.0.0.1", remote_port=5555,
)

# Client2 channel
from fedps.channel import ClientChannel

channel = ClientChannel(
    local_ip="127.0.0.1", local_port=5557,
    remote_ip="127.0.0.1", remote_port=5555,
)

# Server channel
from fedps.channel import ServerChannel

channel = ServerChannel(
    local_ip="127.0.0.1", local_port=5555,
    remote_ip=["127.0.0.1", "127.0.0.1"],
    remote_port=[5556, 5557],
)

Specify FL_type and role in the preprocessor

FL_type: "H" (Horizontal) or "V" (Vertical)
role: "client" or "server"
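
The two FL_type values correspond to the usual data partitioning schemes in federated learning: horizontal clients hold different samples of the same features, and vertical clients hold different features of the same samples. A minimal NumPy sketch of the two splits, with a toy array and variable names that are assumptions for illustration only:

```python
import numpy as np

# A toy dataset: 4 samples (rows) and 3 features (columns).
X = np.arange(12).reshape(4, 3)

# Horizontal ("H"): clients hold different samples of the same features,
# so the data is split by rows.
X_h_client1, X_h_client2 = X[:2, :], X[2:, :]

# Vertical ("V"): clients hold different features of the same samples,
# so the data is split by columns.
X_v_client1, X_v_client2 = X[:, :2], X[:, 2:]

print(X_h_client1.shape, X_h_client2.shape)  # (2, 3) (2, 3)
print(X_v_client1.shape, X_v_client2.shape)  # (4, 2) (4, 1)
```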

# Client1 code example
from fedps.preprocessing import MinMaxScaler

X = [[-1, 2], [-0.5, 6]]
est = MinMaxScaler(FL_type="H", role="client", channel=channel)
Xt = est.fit_transform(X)
print(Xt)

# Client2 code example
from fedps.preprocessing import MinMaxScaler

X = [[0, 10], [1, 18]]
est = MinMaxScaler(FL_type="H", role="client", channel=channel)
Xt = est.fit_transform(X)
print(Xt)

# Server code example
from fedps.preprocessing import MinMaxScaler

est = MinMaxScaler(FL_type="H", role="server", channel=channel)
est.fit()

Run the script

# Run in three terminals
python client1.py
python client2.py
python server.py

See the example folder for more cases.

Available preprocessing modules

Discretization
- KBinsDiscretizer
Encoding
Scaling
Transformation
Imputation
- IterativeImputer (experimental)
- KNNImputer
- SimpleImputer

Differences from Scikit-learn

Currently, this library does not support sparse data.
KBinsDiscretizer, StandardScaler, and SplineTransformer do not accept the sample_weight parameter in their fit methods.
KBinsDiscretizer does not support the quantile_method parameter.
IterativeImputer does not support the sample_posterior and n_nearest_features parameters.
KNNImputer does not support custom weight functions or distance metrics.
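
Since sparse input is not supported, one possible workaround is to convert a SciPy sparse matrix to a dense array before passing it to a preprocessor. The sketch below uses only SciPy and NumPy; that the resulting array is then accepted by a FedPS preprocessor is an assumption based on the library being built on Scikit-learn, and is not shown here.

```python
import numpy as np
from scipy import sparse

# A small sparse matrix in CSR format (2 samples, 3 features).
X_sparse = sparse.csr_matrix(np.array([[0.0, 2.0, 0.0], [1.0, 0.0, 0.0]]))

# Convert to a dense NumPy array before handing it to a preprocessor.
# This materializes every zero, so it only suits data that fits in memory.
X_dense = X_sparse.toarray()

print(type(X_dense).__name__, X_dense.shape)  # ndarray (2, 3)
```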

Acknowledgement

This project is built on Scikit-learn.
