
aliasgar-saria/IDEAS-Voxel51_FiftyOne-Multiple_Computer_Vision_Datasets_Exploration_Analysis_Training_Inference_-

Exploring the Voxel51 FiftyOne Computer Vision Toolkit as a Centralized Platform for Cross-Disciplinary Data Analysis, Modeling, Embedding and Visualization

This repository contains the collective work of Group 18 (2025) for the Autumn Advance Data Science Internship conducted by the IDEAS – Institute of Data Engineering, Analytics and Science Foundation, ISI Kolkata.

The project demonstrates the versatility of the Voxel51 FiftyOne toolkit as a centralized platform for data analysis, modeling, embedding, and visualization across diverse scientific domains.

About Voxel51 FiftyOne

FiftyOne is an open-source toolkit from Voxel51 for building higher-quality datasets and computer vision models. It provides an interactive environment for visualizing, curating, and analyzing complex data, bridging the gap between raw data and machine learning models.


Project Chapters

This repository is structured into five chapters, each exploring a unique application of the FiftyOne toolkit:

  1. Chapter 1: Medical Image Analysis of IDRiD Dataset - Aliasgar Saria
    • Analysis of the Indian Diabetic Retinopathy Image Dataset (IDRiD), demonstrating dataset management, disease severity prediction (96% accuracy), and the use of the FiftyOne Brain for advanced error analysis.
  2. Chapter 2: Astronomical Morphology Classification - Sk Salman Parbhage
    • A novel multi-representation ensemble learning methodology on the Galaxy10 DECals dataset, achieving 99.66% classification accuracy and using FiftyOne for comprehensive misclassification analysis.
  3. Chapter 3: Image Deduplication - Mukesh G
    • A systematic workflow for data cleaning using perceptual hashing to efficiently identify and manage duplicate images, showcasing FiftyOne's capabilities in data curation.
  4. Chapter 4: Physical Sciences Data Analysis - Arja Banerjee
    • A novel application of FiftyOne to a numerical dataset of specific heat capacities, successfully visualizing and validating the classical Dulong–Petit Law.
  5. Chapter 5: Foundational Computer Vision Exploration - Sritoma Roy
    • A core demonstration on the Caltech101 dataset, showing how to generate and visualize image embeddings to uncover semantic structures, clusters, and anomalies.
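The perceptual-hashing idea behind Chapter 3 can be sketched with an average hash (aHash). This is a minimal, dependency-free illustration, not the chapter's code: it assumes each image has already been decoded, converted to grayscale, and shrunk to a small square grid (8×8 is a common choice), and the helper names `average_hash`, `hamming`, and `group_duplicates`, along with the distance threshold of 5 bits, are hypothetical choices made for this example.

```python
# Illustrative sketch with hypothetical helpers, not the project's implementation.
from typing import Dict, List, Sequence

Grid = Sequence[Sequence[int]]  # grayscale pixel values, e.g. an 8x8 thumbnail


def average_hash(pixels: Grid) -> int:
    """Average hash: one bit per pixel, set when the pixel is brighter than the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def group_duplicates(hashes: Dict[str, int], max_distance: int = 5) -> List[List[str]]:
    """Greedily group image ids whose hash is within max_distance of a group's first member.

    Only groups with more than one member (i.e. suspected duplicates) are returned.
    """
    groups: List[List[str]] = []
    representatives: List[int] = []
    for image_id, h in hashes.items():
        for group, rep in zip(groups, representatives):
            if hamming(h, rep) <= max_distance:
                group.append(image_id)
                break
        else:
            groups.append([image_id])
            representatives.append(h)
    return [g for g in groups if len(g) > 1]


# Usage on tiny synthetic grids: a gradient, a uniformly brightened copy, and a checkerboard.
gradient = [[r * 8 + c for c in range(8)] for r in range(8)]
brighter = [[p + 10 for p in row] for row in gradient]
checker = [[255 if (r + c) % 2 else 0 for c in range(8)] for r in range(8)]
hashes = {
    "a.jpg": average_hash(gradient),
    "b.jpg": average_hash(brighter),
    "c.jpg": average_hash(checker),
}
print(group_duplicates(hashes))  # [['a.jpg', 'b.jpg']]
```

Brightening every pixel by the same amount shifts the mean too, so the hash is unchanged, which is why aHash tolerates simple exposure changes. Comparing each hash only against a group's first member keeps the sketch short but is not a transitive clustering; a workflow over many images would more likely use an index such as a BK-tree instead of this pairwise loop.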

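For Chapter 4, the Dulong–Petit law states that the molar heat capacity of many solid elements near room temperature is approximately 3R ≈ 24.9 J/(mol·K), where R is the molar gas constant. Checking it only requires converting specific heat to molar heat capacity. The sketch below is a minimal illustration assuming specific heats in J/(g·K) and molar masses in g/mol; the two entries are approximate textbook room-temperature values for copper and aluminium, not the project's dataset, and the helper names and the 5% tolerance are illustrative choices.

```python
# Illustrative sketch with hypothetical helpers, not the project's implementation.
R = 8.314462618  # molar gas constant, J/(mol*K)
DULONG_PETIT = 3 * R  # predicted molar heat capacity, about 24.94 J/(mol*K)


def molar_heat_capacity(specific_heat: float, molar_mass: float) -> float:
    """Convert specific heat in J/(g*K) to molar heat capacity in J/(mol*K)."""
    return specific_heat * molar_mass


def dulong_petit_ratio(specific_heat: float, molar_mass: float) -> float:
    """Ratio of the molar heat capacity to the 3R prediction (1.0 = exact agreement)."""
    return molar_heat_capacity(specific_heat, molar_mass) / DULONG_PETIT


# Approximate room-temperature values: symbol -> (specific heat J/(g*K), molar mass g/mol)
elements = {"Cu": (0.385, 63.55), "Al": (0.897, 26.98)}

for symbol, (c, m) in elements.items():
    ratio = dulong_petit_ratio(c, m)
    within = abs(ratio - 1) < 0.05  # illustrative 5% tolerance
    print(f"{symbol}: {molar_heat_capacity(c, m):.2f} J/(mol*K), ratio to 3R = {ratio:.3f}, within 5%: {within}")
```

Light elements with high Debye temperatures, such as carbon in diamond form, fall well below 3R at room temperature, so plotting this ratio per element is one way to make the law's exceptions visible.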
Core Technologies

  • Primary Framework: Voxel51 FiftyOne
  • Programming Language: Python
  • Machine Learning: Scikit-learn, XGBoost, PyTorch
  • Key Concepts: Deep Learning Embeddings (ViT, EfficientNetV2, CLIP), Perceptual Hashing, Dimensionality Reduction (UMAP, t-SNE)
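Several chapters rely on deep-learning embeddings (ViT, EfficientNetV2, CLIP). Once each image is a vector, a similarity search amounts to ranking the other vectors by cosine similarity to a query. The sketch below assumes the embeddings were already extracted by some model and uses tiny hand-written 3-dimensional vectors; the sample ids and helper names are hypothetical, and this brute-force scan only stands in for the indexed search that FiftyOne Brain's similarity tools provide.

```python
# Illustrative sketch with hypothetical helpers and toy data.
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_neighbors(
    query_id: str, embeddings: Dict[str, Sequence[float]], k: int = 2
) -> List[Tuple[str, float]]:
    """Return the k samples most similar to query_id, excluding the query itself."""
    query = embeddings[query_id]
    scores = [
        (other_id, cosine_similarity(query, vec))
        for other_id, vec in embeddings.items()
        if other_id != query_id
    ]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:k]


# Toy 3-d "embeddings"; real ViT/CLIP vectors have hundreds of dimensions.
embeddings = {
    "cat_1": [0.9, 0.1, 0.0],
    "cat_2": [0.8, 0.2, 0.1],
    "car_1": [0.0, 0.1, 0.9],
    "car_2": [0.1, 0.0, 0.8],
}
print(nearest_neighbors("cat_1", embeddings, k=1))
```

Cosine similarity ignores vector length, which is why it is a common default for comparing embeddings; images whose nearest neighbors all belong to another class are natural candidates for the cluster and anomaly inspection described in Chapter 5.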

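Chapter 2 describes a multi-representation ensemble. Its exact combination scheme is not given in this README; one common option, assumed here, is soft voting: average the class-probability vectors produced by each model (for example, one model per representation), optionally with per-model weights, and predict the class with the highest average. The function names and probability values below are hypothetical.

```python
# Illustrative soft-voting sketch; the project's actual ensemble may differ.
from typing import List, Optional, Sequence


def soft_vote(
    model_probs: Sequence[Sequence[float]], weights: Optional[Sequence[float]] = None
) -> List[float]:
    """Weighted average of per-model class-probability vectors for a single sample."""
    if weights is None:
        weights = [1.0] * len(model_probs)
    total = sum(weights)
    n_classes = len(model_probs[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, model_probs)) / total
        for c in range(n_classes)
    ]


def predict(
    model_probs: Sequence[Sequence[float]], weights: Optional[Sequence[float]] = None
) -> int:
    """Index of the class with the highest averaged probability."""
    avg = soft_vote(model_probs, weights)
    return max(range(len(avg)), key=avg.__getitem__)


# Hypothetical outputs of three models for one image over three classes.
probs = [
    [0.6, 0.3, 0.1],  # e.g. a model trained on raw pixels
    [0.2, 0.7, 0.1],  # e.g. a model trained on deep embeddings
    [0.3, 0.6, 0.1],  # e.g. a model trained on handcrafted features
]
print(predict(probs))  # 1: two of three models favour class 1
print(predict(probs, weights=[3.0, 1.0, 1.0]))  # 0: the first model now dominates
```

Averaging probabilities rather than counting hard votes lets a confident model outweigh uncertain ones, and the weights give a simple knob for trusting one representation more than another.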
Key FiftyOne Features Explored

  • Interactive Data Visualization: Using the FiftyOne App to explore datasets, visualize complex labels, and analyze model predictions.
  • Advanced Analytics with FiftyOne Brain: Leveraging tools for similarity searches, uniqueness detection, and identifying labeling mistakes.
  • Comprehensive Dataset Management: Efficiently organizing, querying, and manipulating large and diverse datasets.
  • In-depth Model Evaluation: Analyzing model performance through interactive confusion matrices and precision-recall curves.

Documentation and Presentations: https://tinyurl.com/y66nx2y5
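The confusion matrices and accuracy figures used for model evaluation come down to counting (ground truth, prediction) pairs. FiftyOne's evaluation API (for example `evaluate_classifications()`) computes these over a dataset; the sketch below only shows the underlying counting, with hypothetical helper names and made-up severity labels rather than any chapter's results.

```python
# Illustrative sketch with hypothetical helpers and toy labels.
from collections import Counter
from typing import List, Sequence, Tuple


def confusion_matrix(
    y_true: Sequence[str], y_pred: Sequence[str], labels: Sequence[str]
) -> List[List[int]]:
    """Rows are ground-truth labels, columns are predicted labels, in the order of `labels`."""
    counts: Counter = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of samples whose prediction matches the ground truth."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def top_confusions(
    y_true: Sequence[str], y_pred: Sequence[str], n: int = 3
) -> List[Tuple[Tuple[str, str], int]]:
    """Most frequent (ground truth, prediction) pairs among the misclassified samples."""
    errors = Counter((t, p) for t, p in zip(y_true, y_pred) if t != p)
    return errors.most_common(n)


labels = ["none", "mild", "severe"]  # hypothetical classes
y_true = ["none", "none", "mild", "mild", "severe", "severe"]
y_pred = ["none", "mild", "mild", "mild", "severe", "mild"]

for label, row in zip(labels, confusion_matrix(y_true, y_pred, labels)):
    print(f"{label:>7}: {row}")
print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
print(top_confusions(y_true, y_pred))
```

The off-diagonal cells and `top_confusions` point to the specific class pairs worth inspecting sample by sample, which is the kind of misclassification analysis the chapters carry out in the FiftyOne App.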

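Identifying labeling mistakes usually starts from disagreement between a trained model and the annotations. FiftyOne Brain offers a mistakenness score for this (`compute_mistakenness()`), whose exact formula is not reproduced here. The sketch below is a simpler, hypothetical heuristic: flag samples whose model prediction confidently disagrees with the annotated class, and rank them by one minus the probability the model assigns to that class. The sample ids, probabilities, and the 0.8 confidence threshold are illustrative assumptions.

```python
# Hypothetical mislabel heuristic; not FiftyOne's mistakenness algorithm.
from typing import Dict, List, Sequence, Tuple


def flag_possible_mislabels(
    samples: Dict[str, Tuple[int, Sequence[float]]], min_confidence: float = 0.8
) -> List[Tuple[str, float]]:
    """Return (sample_id, score) for samples whose confident prediction disagrees with the label.

    `samples` maps an id to (annotated class index, predicted class probabilities).
    score = 1 - probability assigned to the annotated class, so higher means the
    annotation looks more suspicious to the model. Results are sorted by score.
    """
    flagged = []
    for sample_id, (label, probs) in samples.items():
        predicted = max(range(len(probs)), key=probs.__getitem__)
        if predicted != label and probs[predicted] >= min_confidence:
            flagged.append((sample_id, 1.0 - probs[label]))
    flagged.sort(key=lambda item: item[1], reverse=True)
    return flagged


samples = {
    "img_001": (0, [0.95, 0.03, 0.02]),  # agrees with its label
    "img_002": (1, [0.90, 0.05, 0.05]),  # confidently disagrees: review
    "img_003": (2, [0.40, 0.35, 0.25]),  # disagrees, but with low confidence
    "img_004": (0, [0.02, 0.10, 0.88]),  # confidently disagrees: review first
}
print(flag_possible_mislabels(samples))
```

A flagged sample is only a candidate for review: the model may be wrong rather than the annotation, so the ranked list is meant to be inspected, for instance as a sorted view in the FiftyOne App, rather than applied automatically.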
This project serves as a practical guide and a testament to FiftyOne's capability as a unifying platform for data-centric AI workflows across various scientific and technical fields.
