Human3DSeg

Paper Project Page License

Human3DSEG: Part Segmentation of Human Meshes via Multi-View Human Parsing
James Dickens, Kamyar Hamad
2025

(Figure: example scan from THuman2.1, Release 301-600, subject 0303)

Abstract

Human3DSeg is an open-source framework for 3D human body part segmentation that leverages multi-view human parsing techniques. This repository provides a complete pipeline for automatic annotation and segmentation of 3D human point clouds and meshes. The framework consists of two main components: a data preprocessing module that handles automatic 3D annotation through 2D projections, and a model training pipeline that implements a Point Transformer architecture for accurate point cloud segmentation. Human3DSeg achieves accurate human body part segmentation while requiring minimal manual annotation effort. The framework is designed to be modular, allowing researchers and practitioners to adapt individual components for their specific 3D human analysis tasks.

Repository Structure

This repository contains components for automatic 3D human point cloud annotation and segmentation, organized into two main parts:

Human3DSeg/
├── Data_Processing/      # Data preparation and preprocessing scripts
├── Model/                # Model training, evaluation, and inference code
├── data/                 
├── data_fps.py           
├── utils/                # Utility functions and helper modules
├── README.md             # Project documentation

Preprocessed data is available at: https://drive.google.com/drive/folders/1JU8f-IQLJxt3Gsuu4AEq61xD8vanAfXF?usp=sharing
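The `data_fps.py` script name suggests the point clouds are downsampled with farthest point sampling (FPS) before training. As an illustration only (this is not the repository's implementation), a minimal NumPy version of FPS looks like:

```python
import numpy as np

def farthest_point_sample(points: np.ndarray, n_samples: int) -> np.ndarray:
    """Greedily pick n_samples indices so the selected points are maximally spread out."""
    n = points.shape[0]
    selected = np.zeros(n_samples, dtype=np.int64)
    # Distance from every point to its nearest already-selected point.
    dist = np.full(n, np.inf)
    selected[0] = 0  # arbitrary seed point
    for i in range(1, n_samples):
        d = np.linalg.norm(points - points[selected[i - 1]], axis=1)
        dist = np.minimum(dist, d)
        selected[i] = int(np.argmax(dist))  # farthest from the current selection
    return selected

# Usage: downsample a random cloud of 1000 points to 64.
cloud = np.random.default_rng(0).normal(size=(1000, 3))
idx = farthest_point_sample(cloud, 64)
sampled = cloud[idx]
```

FPS preserves the overall shape of the body better than uniform random sampling, which is why it is a common preprocessing step for point cloud segmentation.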

Installation

# Clone the repository
git clone git@github.com:kamyarothmanhamad/Human3DSegmentation.git
cd Human3DSegmentation

# Create conda environment
conda create -n human3dseg python=3.9
conda activate human3dseg

# Install dependencies
pip install -r requirements.txt

Data Preparation (Data_Preprocessing/ folder)

The data preparation component provides tools and scripts for automatic 3D human point cloud annotation. This component:

  • Operates independently from the training pipeline
  • Includes comprehensive data preparation utilities
  • Handles automatic annotation of 3D human point clouds
  • Can be used as a standalone module for data preprocessing
Data_Preprocessing/
├── src/                   # Data loading utilities and scripts
│   ├── PointTransformerV3
│   ├── PyOpenGL
│   ├── Sapiens
│   ├── m2fp
│   └── Yolov8
├── data_processing/       # Evaluation scripts and metrics  
├── __init__.py 
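At its core, annotation via 2D projections means projecting each 3D point into every rendered view, reading off the 2D human-parsing label at that pixel, and aggregating the per-view labels by majority vote. The sketch below is a hedged illustration of this idea with hypothetical names (`project`, `multi_view_labels`), not the repository's actual code:

```python
import numpy as np

def project(points, K, R, t):
    """Project Nx3 world points to pixel coordinates with a pinhole camera (K, R, t)."""
    cam = points @ R.T + t            # world -> camera frame
    uv = cam[:, :2] / cam[:, 2:3]     # perspective divide
    return uv @ K[:2, :2].T + K[:2, 2]

def multi_view_labels(points, views, label_maps):
    """Sample a 2D parsing label per view for each point, then majority-vote across views."""
    votes = []
    for (K, R, t), labels in zip(views, label_maps):
        pix = np.round(project(points, K, R, t)).astype(int)
        h, w = labels.shape
        u = np.clip(pix[:, 0], 0, w - 1)
        v = np.clip(pix[:, 1], 0, h - 1)
        votes.append(labels[v, u])            # label under each projected point
    votes = np.stack(votes, axis=1)           # shape (N, num_views)
    return np.array([np.bincount(row).argmax() for row in votes])

# Toy usage: one frontal view whose parsing map is class 0 on the left, class 1 on the right.
K = np.array([[100., 0., 50.], [0., 100., 50.], [0., 0., 1.]])
R, t = np.eye(3), np.zeros(3)
labels = np.zeros((100, 100), dtype=int)
labels[:, 50:] = 1
pts = np.array([[-0.2, 0.0, 1.0], [0.2, 0.0, 1.0]])
point_labels = multi_view_labels(pts, [(K, R, t)], [labels])  # → [0, 1]
```

A real pipeline additionally needs visibility/occlusion handling (e.g. depth testing) so that points hidden in a given view do not vote with a wrong label.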

To use M2FP as the 2D segmentation model, clone the official implementation into the src/ directory (see https://github.com/soeaver/M2FP). This requires Detectron2 and the CUDA toolkit to be installed.

The Sapiens TorchScript model files are available at: https://huggingface.co/facebook/sapiens-pose-1b-torchscript

Usage for data preparation:

cd Data_Preprocessing
python Data_Processing/data_processing.py 

Training Pipeline (Model/ folder)

The training pipeline processes the prepared data from the data preparation stage. This component:

  • Works with output from the data preparation module
  • Implements the main segmentation model training
  • Includes evaluation and inference capabilities
Model/
├── Dataloaders/      # Data loading utilities and scripts
├── Evaluation/       # Evaluation scripts and metrics
├── Training/         # Training scripts and model definitions
├── __init__.py    
├── run.py            # Main entry point for running training/evaluation

To use Point Transformer v1 as a backbone, clone the official implementation into the Model/ directory.

You can do this by running:

cd Model
git clone https://github.com/POSTECH-CVLab/point-transformer.git

Training:

python Model/run.py pointtransformer_cihp_seg.yml human_seg_3d_cihp_PT.yml
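The Evaluation/ folder computes segmentation metrics; part segmentation is conventionally reported as per-class IoU and its mean (mIoU). As a hedged sketch of how such a metric can be computed (not the repository's exact evaluation code):

```python
import numpy as np

def per_class_iou(pred: np.ndarray, gt: np.ndarray, num_classes: int) -> np.ndarray:
    """IoU per part class from flat predicted / ground-truth label arrays."""
    ious = []
    for c in range(num_classes):
        inter = np.sum((pred == c) & (gt == c))
        union = np.sum((pred == c) | (gt == c))
        # Classes absent from both prediction and ground truth are skipped via NaN.
        ious.append(inter / union if union > 0 else float("nan"))
    return np.array(ious)

# Toy usage with 6 points and 3 part classes.
pred = np.array([0, 0, 1, 1, 2, 2])
gt   = np.array([0, 1, 1, 1, 2, 0])
iou = per_class_iou(pred, gt, 3)   # → [1/3, 2/3, 1/2]
miou = np.nanmean(iou)             # → 0.5
```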

Citation

To cite this work, please use:

@misc{dickens2025segmentationhumanmeshesmultiview,
      title={Part Segmentation of Human Meshes via Multi-View Human Parsing}, 
      author={James Dickens and Kamyar Hamad},
      year={2025},
      eprint={2507.18655},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2507.18655}, 
}

Please also cite the following works:

@article{yang2023humanparsing,
  title={Deep Learning Technique for Human Parsing: A Survey and Outlook},
  author={Lu Yang and Wenhe Jia and Shan Li and Qing Song},
  journal={arXiv preprint arXiv:2301.00394},
  year={2023}
}

@article{khirodkar2024sapiens,
  title={Sapiens: Foundation for Human Vision Models},
  author={Khirodkar, Rawal and Bagautdinov, Timur and Martinez, Julieta and Zhaoen, Su and James, Austin and Selednik, Peter and Anderson, Stuart and Saito, Shunsuke},
  journal={arXiv preprint arXiv:2408.12569},
  year={2024}
}

@inproceedings{zhao2021point,
  title={Point transformer},
  author={Zhao, Hengshuang and Jiang, Li and Jia, Jiaya and Torr, Philip HS and Koltun, Vladlen},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={16259--16268},
  year={2021}
}

License

MIT © 2025 James Dickens, Kamyar O. Hamad

Acknowledgments

We thank the creators of the THuman2.0 dataset for providing the human scans used in this work; please cite the dataset if you use it.