UCL-CHME0039-24-25/blender_colonoscopy

Project 3: Using synthetic surgical data from Blender to train a polyp detection model

Project owner: Ruaridh Gollifer

Blender Randomiser Add-on tutorials on Moodle

These are the areas to focus on for the Blender part of the project:

Example of how to render data using Blender by running the run_RandomiserBlender.sh script:

bash run_RandomiserBlender.sh -blend ../../datasets/blend_files/colon.blend -json_in ../input_files/input_bounds_hackathon.json -json_out ../output_files/test_docker.json -seed 32 -frame 10 -basename ../output_files/docker_output_frame -render_main
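For batch rendering, the invocation above can be wrapped in a small Python helper. The flags mirror the example command; the helper functions themselves (`build_render_command`, `render`) are an illustrative sketch and assume the script is run from the same directory as `run_RandomiserBlender.sh`, with Blender 3.4.1 installed.

```python
import subprocess

def build_render_command(blend_file, json_in, json_out, basename,
                         seed=32, frame=10):
    # Assemble the argument list for run_RandomiserBlender.sh
    # (flags taken from the example invocation above).
    return [
        "bash", "run_RandomiserBlender.sh",
        "-blend", blend_file,
        "-json_in", json_in,
        "-json_out", json_out,
        "-seed", str(seed),
        "-frame", str(frame),
        "-basename", basename,
        "-render_main",
    ]

def render(blend_file, json_in, json_out, basename, seed=32, frame=10):
    # Run the render headlessly; raises CalledProcessError if Blender fails.
    subprocess.run(
        build_render_command(blend_file, json_in, json_out, basename,
                             seed, frame),
        check=True,
    )
```

Looping `render` over a range of seeds and frames is one way to produce a dataset of varied renders from a single blend file.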

Project details

Image Guided Surgery (IGS) researchers use machine learning in some form (registration, segmentation, stereo reconstruction, classification, etc.); polyp detection in image-guided colonoscopy (project 3) is one example. However, there is often limited real data available to train machine learning models in IGS research, and any large dataset would require extensive, time-consuming manual labelling by a trained clinician [1].

One potential solution to this clinical challenge is synthetic data generation, which can provide large amounts of varied, realistic ground truth data that would be very challenging to produce from real data, or would require extensive, time-consuming manual labelling. Blender is free and open-source software for 3D geometry modelling and rendering, and one of its uses is generating synthetic datasets. By creating large amounts of synthetic but realistic data, we can improve model performance on tasks such as polyp detection in image-guided colonoscopy. Synthetic data generation has further advantages: tools like Blender give us more control, and we can generate a variety of labelled ground truth data, from segmentation masks to optic flow fields. We can also often scale up synthetic datasets easily by randomising parameters of the modelled 3D geometry [2].
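The randomisation idea can be sketched in plain Python: sample each scene parameter from a bounded range and emit one configuration per dataset sample. The parameter names and bounds below are hypothetical; the real schema expected by the Randomiser add-on (e.g. input_bounds_hackathon.json) may differ.

```python
import json
import random

# Hypothetical parameter bounds for illustration only; the add-on's
# actual input JSON schema may use different names and structure.
BOUNDS = {
    "camera_focal_length_mm": (18.0, 35.0),
    "light_energy_w": (50.0, 200.0),
    "polyp_scale": (0.5, 1.5),
}

def sample_scene_parameters(seed):
    """Draw one reproducible, randomised set of scene parameters."""
    rng = random.Random(seed)
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in BOUNDS.items()}

if __name__ == "__main__":
    # One config per seed; a renderer could consume each in turn.
    print(json.dumps(sample_scene_parameters(seed=32), indent=2))
```

Seeding the generator makes every sample reproducible, which matters when you need to regenerate a specific frame or debug a rendering artefact.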

The synthetic data provided may need additional modification (geometry, texture, lighting, and camera settings) to make it more realistic. The challenge is to generate synthetic data that is realistic enough to be useful for training models, while also generating enough data to train the models effectively. A Blender add-on (or plug-in) developed at UCL in 2023, in a collaboration between ARC and WEISS, for the purpose of data generation will be the starting point for the project, with the aim of checking data quality and pre-training models for polyp detection, evaluated on open-source datasets.

Learning objectives

  1. Learn to practice agile software development in research.
  2. Learn to apply ML techniques using synthetic data.

Learning components

  1. Facilitating collaborative research through agile methodologies.
  2. Using GitHub workflows to follow best practices in research software engineering.
  3. Using Python for data manipulation, machine learning, and AI applications.
  4. Using Python for generating synthetic data with Blender.

Expected outcomes

  • Deliver a project consisting of four stages, and gain experience working with both synthetic data and open-source real-world data. The model evaluation phase will incorporate benchmarks for pre-trained models (YOLOv7), and hence ways to improve models with synthetic data.

Project stages

The project stages are: prerequisites (data, software, and hardware), model selection, training and evaluation, interface development, and presentation.

Prerequisites

  • Skills: Students need to be comfortable with (or willing to self-teach) the following: Git for version control and Python for data manipulation and machine learning. Blender may also be required for generating synthetic data, which can be done through the Blender interface or through scripting in Python.

  • Data request requirements: This project requires access to publicly available synthetic surgical data generated with Blender, including synthetic colonoscopy and laparoscopic data that can be modified in Blender to produce more realistic data. The datasets and their sizes are:

  • blender.zip (~61 MB) contains colon.blend files that can be loaded into Blender and modified
  • polyps.zip (~11.56 GB) contains training data as .png images
  • examples.blend (~15.67 MB) contains .avi videos of colonoscopy and laparoscopy data

Open-source real world datasets are also required for evaluation.

  • Data preparation and cleaning:
    The synthetic data provided by Blender needs to be rendered in .png format, and for the polyp detection task two files must be generated per video frame:
  1. The full rendered colonoscopy image with polyps
  2. The segmentation mask of the polyps
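Once frames are rendered, it is worth verifying that every image has its matching mask before training. The sketch below pairs the two files per frame; the suffix naming convention (`_img.png` / `_mask.png`) is an assumption, so adapt it to however your renders are named.

```python
from pathlib import Path

def pair_frames(render_dir, image_suffix="_img.png", mask_suffix="_mask.png"):
    """Match each rendered colonoscopy image with its polyp segmentation mask.

    Returns (pairs, missing): pairs where both files exist, and image/mask
    tuples where the mask is absent. Suffixes are assumed, not prescribed.
    """
    render_dir = Path(render_dir)
    pairs, missing = [], []
    for img in sorted(render_dir.glob(f"*{image_suffix}")):
        mask = img.with_name(img.name.replace(image_suffix, mask_suffix))
        (pairs if mask.exists() else missing).append((img, mask))
    return pairs, missing
```

Frames that land in `missing` should be re-rendered or excluded, since a detection model cannot learn from an image without its ground-truth mask.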

Blender version 3.4.1 needs to be downloaded and installed on your laptop according to your operating system, i.e. Windows 8.1, 10, or 11 (64-bit required), macOS 10.13 or later (macos-arm64.dmg for Apple Silicon M1/M2/M3 and macos-x64.dmg for Intel Macs), or any modern 64-bit Linux distribution.

You may want to download the latest long-term support version of Blender (3.6 LTS) or the latest release (4.3). However, these versions may not be compatible with the Blender Randomiser add-on and the baseline synthetic data provided. If students want to test these more up-to-date versions, they can report any issues in the Blender Randomiser add-on repository.

Additionally, the Blender Randomiser plug-in should be installed if you need to modify data; it also requires Blender version 3.4.1.

NOTE: Here are some notebooks and instructions that serve as a great starting point for preparing model fine-tuning and evaluation for polyp detection.

  • Hardware and infrastructure specifications:
    • For laptops without a GPU, consider using Google Colab's free service as an alternative. Change runtime by going to Edit > Notebook settings > T4 GPU, which provides a Tesla T4 GPU with 15.0 GB of memory, 12.7 GB of system RAM, and 112.6 GB of disk. See details using !nvidia-smi. NOTE: "In the version of Colab that is free of charge notebooks can run for at most 12 hours, depending on availability and your usage patterns." (Google Colab documentation)
    • For laptops equipped with a GPU and CUDA drivers:
      • Ensure you have sufficient hard drive space to store data and models.
      • Confirm that your GPU has the necessary CUDA drivers installed.

Hardware Requirements for Blender:

  • Minimum Requirements:
    • CPU: Dual-core 64-bit processor (Intel or AMD).
    • RAM: 4 GB (8 GB recommended).
    • GPU: Integrated graphics or a discrete GPU with 1 GB VRAM.
    • Disk Space: Approximately 500 MB for installation; additional space required for projects and assets.
  • Recommended Requirements for Smooth Performance:
    • CPU: Quad-core 64-bit processor.
    • RAM: 16 GB or more.
    • CPU rendering (no suitable GPU): Blender can use your CPU for rendering, but it will be slower.
    • GPU (GPU rendering):
      • NVIDIA: GeForce GTX 10xx series or newer, CUDA compute capability 3.0 or higher.
      • AMD: RDNA architecture or newer.
      • macOS: Apple Silicon (M1/M2) or newer with Metal support.
    • Disk Space: 20+ GB for large projects and libraries.
  • High-End Recommendations (Heavy Workloads like Simulations or Cycles Rendering):
    • 32+ GB RAM, NVIDIA RTX-series GPUs, and NVMe SSD for storage.

Model training and model evaluation

  1. Pre-train a YOLOv7 model using synthetic data
  2. Evaluate the metrics reported by YOLOv7, i.e. precision, recall, and mean average precision (mAP), on real-world datasets
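To make the evaluation metrics concrete, here is a minimal sketch of precision, recall, and average precision from detection counts. This is a simplified single-class illustration, not YOLOv7's actual evaluation code (which matches predictions to ground truth at IoU thresholds before computing these quantities).

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from true/false positive and false negative counts
    at a fixed confidence threshold."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def average_precision(recalls, precisions):
    """Area under the monotone-interpolated precision-recall curve.
    mAP is the mean of this value over classes (here: just 'polyp')."""
    pts = sorted(zip(recalls, precisions))
    rs = [0.0] + [r for r, _ in pts]
    ps = [0.0] + [p for _, p in pts]
    # Precision envelope: replace each precision with the max to its right.
    for i in range(len(ps) - 2, -1, -1):
        ps[i] = max(ps[i], ps[i + 1])
    # Integrate precision over recall.
    return sum((rs[i] - rs[i - 1]) * ps[i] for i in range(1, len(rs)))
```

Sweeping the confidence threshold yields one (recall, precision) point per threshold, and `average_precision` integrates over those points.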

Interface development

To present reporting results, we recommend developing a Python-based interface using either Streamlit for a web-based solution or a simple command-line interface with Click or another suitable tool.
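As one concrete starting point, the command-line route can be sketched with the standard library's argparse (standing in for Click here); the metrics-file format is an assumption for illustration.

```python
import argparse
import json

def build_parser():
    """CLI for reporting results; argparse is used as the 'other suitable
    tool' mentioned above, and is easily swapped for Click or Streamlit."""
    parser = argparse.ArgumentParser(
        description="Report polyp-detection metrics.")
    parser.add_argument("results_json",
                        help='JSON file of metrics, e.g. {"mAP": 0.62}')
    parser.add_argument("--metric", default="mAP",
                        help="which metric to print (default: mAP)")
    return parser

def main(argv=None):
    args = build_parser().parse_args(argv)
    with open(args.results_json) as f:
        metrics = json.load(f)
    print(f"{args.metric}: {metrics[args.metric]:.3f}")

if __name__ == "__main__":
    main()
```

The same `metrics` dictionary could equally be rendered in a Streamlit page for the web-based option.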

Team members and their roles

Roles for each stage will rotate based on each student's expertise, with students encouraged to propose their own roles and rotation schedule before starting the project. This will allow students to take on the positions of Project Manager, Clinical Lead / Principal Investigator (PI), Data Scientist(s), and AI/ML Engineer(s).

Presentation and report

Each group will make a group presentation and each student will write an individual report.

Group work and presentation (20%)

At the end of the project, you will be asked to deliver a 15-minute group presentation assessing the project.

Written Report (80%)

A 1,500-word individual report documenting the project, reporting the results and the individual's contribution.

Assignment submission deadline for the Written Report: 30 April 2025, 16:00 BST.

Allocation of marks

As a general guide, marks for the presentation will be allocated according to the following weighting:

Clear statement of goals 4%

Appropriate use of algorithms and tools 4%

Clear statement of results 4%

Evidence of professional approach 4%

Clarity of presentation 4%

As a general guide, marks for the report will be allocated according to the following weighting:

Problem Statement: Background and review 15%

Detailed Project Plan 20%

Summary of Testing, Results 20%

Conclusions & key lessons 15%

Presentation 10%

References

  1. Dowrick, Thomas, Long Chen, João Ramalhinho, Juana González-Bueno Puyal, and Matthew J. Clarkson. "Procedurally generated colonoscopy and laparoscopy data for improved model training performance." In MICCAI Workshop on Data Engineering in Medical Imaging, pp. 67-77. Cham: Springer Nature Switzerland, 2023.

  2. Gollifer, Ruaridh and Minano, Sofia. "Randomising Blender scene properties for semi-automated data generation." Centre for Advanced Research Computing Blogpost, 2023

  3. Blender

  4. Blender Randomiser add-on

  5. Blender synthetic datasets
