Population Synthesis

Population Synthesis is a modular framework for generating synthetic populations using various statistical and machine learning methods. It is designed for flexibility, reproducibility, and extensibility, supporting multiple synthesis algorithms and comprehensive benchmarking.

Project Structure

PopSynthesis/: Main package containing all core modules.
- cli/: Command-line interface (CLI), built with click, for running synthesis tasks.
- DataProcessor/: Classes and utilities for processing raw and intermediate data, including DataProcessorGeneric for unified data handling.
- Methods/: Implementation of synthesis methods (e.g., IPU, IPSF/SAA, BN, CSP, GAN, RL, naive, IPF). Each method is in its own subfolder.
- Benchmark/: Tools and scripts for comparing synthetic results to census or ground truth data.
- Generator_data/: Scripts and notebooks for generating and preparing input data.
- Analyse/: Jupyter notebooks for analysis and visualization of results.
- tests/: Unit and integration tests for methods and utilities.
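Several of the methods listed under Methods/ are fitting-based, IPF being the simplest. As a rough illustration of that idea, here is a minimal pure-Python sketch that fits a two-way seed table to row and column totals. It is not the package's implementation (which may build on the ipfn dependency), and the function name ipf_2d is hypothetical.

```python
def ipf_2d(seed, row_targets, col_targets, max_iter=100, tol=1e-9):
    """Fit a 2-D seed table to row and column marginal targets with IPF.

    Assumes row_targets and col_targets share the same grand total.
    """
    table = [[float(v) for v in row] for row in seed]
    for _ in range(max_iter):
        # Scale each row so its sum matches the row target.
        for i, target in enumerate(row_targets):
            s = sum(table[i])
            if s > 0:
                table[i] = [v * target / s for v in table[i]]
        # Scale each column so its sum matches the column target.
        for j, target in enumerate(col_targets):
            s = sum(row[j] for row in table)
            if s > 0:
                for row in table:
                    row[j] *= target / s
        # Columns are now exact; stop once the rows still match too.
        max_err = max(abs(sum(row) - t) for row, t in zip(table, row_targets))
        if max_err < tol:
            break
    return table
```

With a uniform seed, ipf_2d([[1, 1], [1, 1]], [30, 70], [40, 60]) converges to the independence table [[12, 18], [28, 42]]; a non-uniform seed preserves the seed's interaction structure instead.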

Installation

Requirements are managed in pyproject.toml. Main dependencies include:

Python >= 3.9
pandas, polars, numpy, geopandas, scikit-learn, scipy, torch, click, pyyaml, pulp, pgmpy, ipfn, pyarrow

Install with:

pip install .

Or, for development:

pip install -e .

The project uses uv, so it can also be installed on another machine directly from GitHub (for example, when you only need to run the CLI):

uv pip install git+https://github.com/bobkatla/PopSyn_Monash.git

or upgraded to the latest development version with:

uv pip install --upgrade git+https://github.com/bobkatla/PopSyn_Monash.git

Usage

Command Line Interface (CLI)

The main entry point is the psim command (see PopSynthesis/cli/main.py). Example:

psim synthesize --runs-yml IO/configs/runs.yml --output-path IO/output/

--runs-yml: Path to a YAML file specifying synthesis runs and parameters.
--output-path: Directory for output files.
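If you want to drive one or more runs from a Python script, a small wrapper around subprocess is enough. This is a sketch that uses only the two documented flags and assumes psim is on your PATH; the helper names build_psim_command and run_psim are hypothetical.

```python
import subprocess
from pathlib import Path


def build_psim_command(runs_yml, output_path):
    """Build the argument list for one `psim synthesize` invocation."""
    return [
        "psim",
        "synthesize",
        "--runs-yml",
        str(runs_yml),
        "--output-path",
        str(output_path),
    ]


def run_psim(runs_yml, output_path):
    """Run psim for one runs file; raise CalledProcessError on a non-zero exit."""
    # Create the output directory in case it does not exist yet (harmless if it does).
    Path(output_path).mkdir(parents=True, exist_ok=True)
    cmd = build_psim_command(runs_yml, output_path)
    return subprocess.run(cmd, check=True)
```

For example, run_psim("IO/configs/runs.yml", "IO/output/") is equivalent to the command shown above. Passing an argument list (rather than a shell string) avoids quoting problems with paths that contain spaces.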

Programmatic Usage

You can also use the main classes and methods in your own scripts. For example, to run the IPU method:

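from PopSynthesis.Methods.IPU.run import run_ipu
# Placeholders: household/person marginal files, household/person seed sample files,
# the output directory, and output names for synthetic households, persons, and statistics.
run_ipu(hh_marg_file, p_marg_file, hh_sample_file, p_sample_file, output_path, hh_syn_name, pp_syn_name, stats_name)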

Or to process data:

from PopSynthesis.DataProcessor.DataProcessor import DataProcessorGeneric
# raw_data_dir, processed_data_dir, and output_dir are placeholders for your own directories.
processor = DataProcessorGeneric(raw_data_dir, processed_data_dir, output_dir)
processor.process_all_seed()
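The run_ipu call above hides the fitting loop. For intuition, iterative proportional updating (IPU) adjusts household weights so that household-level and person-level totals are matched at the same time: for each constraint in turn, the weights of households that contribute to it are multiplied by target / current weighted total. The sketch below is a minimal pure-Python illustration of that description, not PopSynthesis's run_ipu; the function name ipu, the incidence-matrix input, and the worst-case relative-deviation stopping rule are assumptions.

```python
def ipu(incidence, targets, max_iter=10_000, tol=1e-10):
    """Iterative proportional updating over household weights.

    incidence[i][j]: count of constraint j (a household or person attribute)
    in household i. targets[j]: marginal total for constraint j.
    Returns one weight per household.
    """
    weights = [1.0] * len(incidence)
    for _ in range(max_iter):
        for j, target in enumerate(targets):
            # Weighted total for constraint j under the current weights.
            total = sum(row[j] * w for row, w in zip(incidence, weights))
            if total == 0:
                continue  # no household carries this attribute; nothing to scale
            ratio = target / total
            # Only households that contribute to constraint j are rescaled.
            for i, row in enumerate(incidence):
                if row[j] > 0:
                    weights[i] *= ratio
        # Goodness of fit: worst relative deviation across all constraints.
        worst = max(
            (
                abs(sum(row[j] * w for row, w in zip(incidence, weights)) - t) / t
                for j, t in enumerate(targets)
                if t > 0
            ),
            default=0.0,
        )
        if worst < tol:
            break
    return weights
```

For three households with incidence rows [1, 0, 1, 0], [1, 0, 1, 1], and [0, 1, 0, 1] (columns: household type A, household type B, person type P1, person type P2) and targets [20, 10, 20, 25], the weights converge to approximately [5, 15, 10]. Because IPU only rescales, households with identical constraint patterns keep their relative weights, which is one reason real implementations add extra checks.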

Data and Configuration

Input data and configuration files are in IO/configs/ and PopSynthesis/DataProcessor/data/.
Output and processed data are in IO/output/ and PopSynthesis/DataProcessor/output/.

Testing

Tests are in PopSynthesis/tests/ and can be run with your preferred test runner (e.g., pytest):

pytest PopSynthesis/tests/
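A useful pattern for new tests is asserting that a synthetic population reproduces its target marginals. The minimal pytest-style sketch below uses toy data; the helper marginal_counts, the attribute names, and the idea of placing it in a file such as PopSynthesis/tests/test_marginals.py are hypothetical, not part of the package.

```python
from collections import Counter


def marginal_counts(records, attribute):
    """Count how many synthetic records fall into each category of `attribute`."""
    return Counter(record[attribute] for record in records)


def test_synthetic_population_matches_marginals():
    # Toy synthetic households and the census-style target they should reproduce.
    synthetic = [
        {"hh_size": 1, "tenure": "rent"},
        {"hh_size": 2, "tenure": "own"},
        {"hh_size": 2, "tenure": "rent"},
    ]
    target = {1: 1, 2: 2}
    assert marginal_counts(synthetic, "hh_size") == Counter(target)
```

pytest collects any function whose name starts with test_, so this runs under the pytest command above once saved in the tests folder.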

Notebooks

Analysis and visualization notebooks are in PopSynthesis/Analyse/ and PopSynthesis/DataProcessor/utils/.

Extending

Add new synthesis methods in PopSynthesis/Methods/.
Utilities and shared code go in PopSynthesis/DataProcessor/utils/.
Benchmarking tools are in PopSynthesis/Benchmark/.
Add tests in PopSynthesis/tests/.
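Since each method lives in its own subfolder of PopSynthesis/Methods/, a new method can start as a single entry-point function (for example, in a hypothetical Methods/my_method/run.py, mirroring Methods/IPU/run.py). The sketch below is a minimal weighted-resampling baseline in the spirit of a naive method; the function name run_naive, the list-of-dicts input, and the "weight" key are assumptions, not the package's interface.

```python
import random


def run_naive(seed_records, n_households, random_state=None):
    """Draw a synthetic population by resampling seed households with replacement.

    seed_records: list of dicts, one per seed household, optionally with a "weight" key.
    n_households: number of synthetic households to generate.
    random_state: seed for the random generator, so runs are reproducible.
    """
    rng = random.Random(random_state)
    # Missing weights default to 1.0, i.e. uniform resampling.
    weights = [record.get("weight", 1.0) for record in seed_records]
    picks = rng.choices(seed_records, weights=weights, k=n_households)
    # Copy each pick so synthetic records can be edited without touching the seed.
    return [dict(record) for record in picks]
```

Using a local random.Random instance rather than the global generator keeps runs reproducible even when other code also draws random numbers.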

For more details, see the code and docstrings in each module. Contributions and suggestions are welcome!
