Project owner: Ruaridh Gollifer
These are the areas to focus on for the Blender part of the project:
- Blender Scripting Tab with UI
- Blender Materials panel/tab walkthrough
- Blender Geometry Panel Basic
- Blender Geometry advanced recursive nodes
Example of how to render data using Blender by running the run_RandomiserBlender.sh script:
bash run_RandomiserBlender.sh -blend ../../datasets/blend_files/colon.blend -json_in ../input_files/input_bounds_hackathon.json -json_out ../output_files/test_docker.json -seed 32 -frame 10 -basename ../output_files/docker_output_frame -render_main
Image Guided Surgery (IGS) researchers use machine learning in some form (registration, segmentation, stereo reconstruction, classification, etc.), with polyp detection in image-guided colonoscopy surgery (project 3) being one example. However, there is often limited real data available to train machine learning models in IGS research, and moreover any large dataset would involve extensive, time-consuming manual labelling by a trained clinician [1].
One potential solution to this clinical challenge is synthetic data generation, which can provide large amounts of varied, realistic ground-truth data that would be very challenging to produce from real data, or would involve extensive, time-consuming manual labelling. Blender is free and open-source software that can be used for 3D geometry rendering, including the generation of synthetic datasets. By creating large amounts of synthetic but realistic data, we can improve the performance of models on tasks such as polyp detection in image-guided colonoscopy surgery. Synthetic data generation has further advantages: tools like Blender give us more control, and we can generate a variety of labelled ground-truth data, from segmentation masks to optic flow fields. We can also often scale up our synthetic datasets easily by randomising parameters of the modelled 3D geometry [2].
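The randomisation idea above can be sketched in a few lines. This is a minimal illustration only, assuming hypothetical parameter names and ranges (they are not taken from the actual add-on): each seed yields one reproducible scene configuration, so the dataset scales simply by iterating over seeds.

```python
import random

# Illustrative parameter ranges (assumed values, not from the add-on itself).
PARAM_RANGES = {
    "camera_focal_mm": (18.0, 50.0),
    "light_energy_w": (10.0, 100.0),
    "polyp_scale": (0.5, 1.5),
}

def sample_scene_params(seed):
    """Draw one reproducible set of scene parameters from the ranges above."""
    rng = random.Random(seed)
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in PARAM_RANGES.items()}

# Each seed gives a distinct but reproducible configuration.
params = [sample_scene_params(seed) for seed in range(3)]
```

In the real pipeline these sampled values would be applied to the Blender scene (camera, lights, geometry) before rendering each frame.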
The synthetic data provided may need additional modification, e.g. to geometry, texture, lighting, and camera settings, to make it more realistic. The challenge will be to generate synthetic data that is realistic enough to be useful for training models, while also generating enough data to train the models effectively. A Blender add-on (or plug-in) for data generation was developed at UCL in 2023 through a collaboration between ARC and WEISS; it will serve as the starting point for the project, with the aim of checking data quality and pre-training models for polyp detection, with performance evaluated on open-source datasets.
- Learn to practice agile software development in research.
- Learn to apply ML techniques using synthetic data.
- Facilitate collaborative research through agile methodologies.
- Use GitHub workflows to follow best practices in research software engineering.
- Utilise Python for data manipulation, machine learning, and AI applications.
- Utilise Python for generating synthetic data using Blender.
- Deliver a project consisting of four stages, and gain experience working with both synthetic data and open-source real-world data. The model evaluation phase will incorporate benchmarks for pre-trained models (YoloV7) and therefore ways to improve models with synthetic data.
The project stages include: pre-requisites (data, software, and hardware), model selection, training and evaluation, interface development, and presentation.
- Skills: Students need to be comfortable with (or be willing to self-teach) the following: git for version control and Python for data manipulation and machine learning. Blender may also be required for generating synthetic data, which can be done through the Blender interface or through scripting in Python.
- Data request requirements: This project requires access to publicly available synthetic surgical data generated with Blender, including synthetic colonoscopy and laparoscopic data, which can be modified using Blender to generate more realistic data. The list and size of data includes:
- blender.zip (~61MB) contains colon.blend files that can be loaded into Blender and modified
- polyps.zip (~11.56GB) contains training data as .png files
- examples.blend (~15.67MB) contains .avi videos of colonoscopy and laparoscopy data
Open-source real world datasets are also required for evaluation.
- Data preparation and cleaning:
The synthetic data provided by Blender needs to be rendered in .png format, and for the polyp detection task two files need to be generated per video frame:
- The full rendered colonoscopy image with polyps
- The segmentation mask of the polyps
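For detector training, each segmentation mask can be reduced to a bounding-box label in the normalised (x_center, y_center, width, height) format that YOLO-family models expect. The following is a minimal sketch, assuming the mask arrives as a binary 2D array and that each frame contains at most one polyp; the helper name is hypothetical, not part of any existing tool:

```python
def mask_to_yolo_bbox(mask):
    """Convert a binary mask (list of rows of 0/1) to a normalised
    YOLO box (x_center, y_center, width, height), or None if empty."""
    h, w = len(mask), len(mask[0])
    xs = [x for row in mask for x, v in enumerate(row) if v]
    ys = [y for y, row in enumerate(mask) if any(row)]
    if not xs:
        return None  # no polyp visible in this frame
    x0, x1, y0, y1 = min(xs), max(xs), min(ys), max(ys)
    # Pixel extents are inclusive, hence the +1; divide by image size
    # to get coordinates in [0, 1] as YOLO labels require.
    return ((x0 + x1 + 1) / (2 * w), (y0 + y1 + 1) / (2 * h),
            (x1 - x0 + 1) / w, (y1 - y0 + 1) / h)

# A 4x4 mask with a 2x2 polyp in the centre maps to a centred half-size box.
example = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
box = mask_to_yolo_bbox(example)
```

Frames with multiple polyps would need connected-component labelling first (e.g. with scipy), which is beyond this sketch.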
- Software requirements and dependencies: A laptop with a Python virtual environment configured, including the following libraries: Pandas and PyTorch for YoloV7
Blender version 3.4.1 needs to be downloaded and installed on your laptop for your operating system, i.e. Windows 8.1, 10, or 11 (64-bit required); macOS 10.13 or later (macos-arm64.dmg for Mac M1/M2/M3, macos-x64.dmg for Mac Intel); or any modern Linux distribution (64-bit).
You may want to download the latest stable version of Blender, Blender 3.6 LTS, or the latest version, Blender 4.3. However, these versions may not be compatible with the Blender Randomiser add-on and the baseline synthetic data provided. Students who want to test these more up-to-date versions can report any issues in the Blender Randomiser add-on repository.
Additional dependencies: the Blender Randomiser plug-in must be installed if you need to modify the data, and it also requires Blender version 3.4.1.
NOTE: Here are some notebooks and instructions that serve as a great starting point for preparing model fine-tuning and evaluation for polyp detection.
- Hardware and infrastructure specifications:
- For laptops without a GPU, consider using Google Colab's free service as an alternative. Change the runtime via Edit > Notebook settings > T4 GPU, which provides a Tesla T4 GPU with 15.0 GB of memory, 12.7 GB of system RAM, and a 112.6 GB disk. See details using !nvidia-smi. NOTE: "In the version of Colab that is free of charge notebooks can run for at most 12 hours, depending on availability and your usage patterns." (source)
- For laptops equipped with a GPU and CUDA drivers:
- Ensure you have sufficient hard drive space to store data and models.
- Confirm that your GPU has the necessary CUDA drivers installed.
Hardware Requirements for Blender:
- Minimum Requirements:
- CPU: Dual-core 64-bit processor (Intel or AMD).
- RAM: 4 GB (8 GB recommended).
- GPU: Integrated graphics or a discrete GPU with 1 GB VRAM.
- Disk Space: Approximately 500 MB for installation; additional space required for projects and assets.
- Recommended Requirements for Smooth Performance:
- CPU: Quad-core 64-bit processor.
- RAM: 16 GB or more.
- CPU rendering (no GPU): Blender can use your CPU for rendering, but it will be slower.
- GPU (GPU rendering):
- NVIDIA: GeForce GTX 10xx series or newer, CUDA compute capability 3.0 or higher.
- AMD: RDNA architecture or newer.
- macOS: Apple Silicon (M1/M2) or newer with Metal support.
- Disk Space: 20+ GB for large projects and libraries.
- High-End Recommendations (Heavy Workloads like Simulations or Cycles Rendering):
- 32+ GB RAM, NVIDIA RTX-series GPUs, and NVMe SSD for storage.
- Pre-train YoloV7 model using synthetic data
- Evaluate the metrics reported by YoloV7 i.e. precision, recall and mean average precision (mAP) on real-world datasets
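As a minimal illustration of these metrics (YoloV7 computes them internally during evaluation; this sketch only shows the definitions, and the AP approximation here uses a simple rectangle rule rather than YoloV7's interpolation scheme):

```python
def precision_recall(tp, fp, fn):
    """Precision = TP/(TP+FP); recall = TP/(TP+FN). Returns (p, r)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def average_precision(points):
    """Approximate AP as the area under a (recall, precision) curve,
    given points sorted by increasing recall (rectangle rule)."""
    ap, prev_recall = 0.0, 0.0
    for recall, precision in points:
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap

# mAP is then the mean of the per-class AP values; with a single
# class (polyp), mAP equals that class's AP.
```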
To present reporting results, we recommend developing a Python-based interface using either Streamlit for a web-based solution or a simple command-line interface with Click or another suitable tool.
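A minimal command-line sketch of such an interface, using the standard-library argparse module in place of Click (the flag names and metric choices below are illustrative, not from any existing tool):

```python
import argparse

def build_parser():
    """Hypothetical CLI for reporting evaluation results; the flag
    names are illustrative, not part of any existing tool."""
    parser = argparse.ArgumentParser(
        description="Report polyp-detection evaluation metrics")
    parser.add_argument("--results", required=True,
                        help="Path to a metrics file produced by evaluation")
    parser.add_argument("--metric", choices=["precision", "recall", "map"],
                        default="map", help="Which metric to display")
    return parser

# Parsing an explicit argv list keeps the sketch testable without a shell.
args = build_parser().parse_args(["--results", "metrics.json",
                                  "--metric", "recall"])
```

A Streamlit version would expose the same choices as widgets (e.g. a select box for the metric) instead of flags.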
Roles for each stage will rotate based on each student's expertise, with students encouraged to propose their own roles and rotation schedule before starting the project. This will allow students to take on the positions of Project Manager, Clinical Lead / Principal Investigator (PI), Data Scientist(s), and AI/ML Engineer(s).
Each group will make a group presentation and each student will write an individual report.
Group work and presentation (20%)
You will be asked to deliver a 15-minute group presentation on the project at the end of the project.
Written Report (80%): A 1,500-word individual report documenting the project, reporting the results, and describing the individual's contribution.
Assignment submission deadline for the Written Report: 30 April 2025, 16:00 BST.
Allocation of marks
As a general guide, marks for the presentation will be allocated according to the following weighting:
Clear statement of goals 4%
Appropriate use of algorithms and tools 4%
Clear statement of results 4%
Evidence of professional approach 4%
Clarity of presentation 4%
As a general guide, marks for the report will be allocated according to the following weighting:
Problem Statement: Background and review 15%
Detailed Project Plan 20%
Summary of Testing, Results 20%
Conclusions & key lessons 15%
Presentation 10%