GLEAM (Galaxy Learning and Modeling) is a suite of machine learning tools for the Galaxy platform. Developed by the Goecks Lab, GLEAM lets researchers train models, generate predictions, and produce reproducible reports, all from a user-friendly interface and without writing code.
- Modern best practices for machine learning
- Reproducible and scalable workflows
- Machine learning support for diverse data types: tabular, image, text, categorical, and more
- Deep learning via Ludwig and automated ML via PyCaret
- Easy installation in Galaxy via XML wrappers
- Auto-generated visual reports
Machine learning for structured tabular datasets using PyCaret.
- Train classification and regression models
- Evaluate performance and extract feature importance
- Generate predictions on new datasets
- Create interactive HTML reports
Deep learning-based image classification using Ludwig.
- Input files: a ZIP archive of images and a CSV file with metadata
- Tasks: classification
- Models available: ResNet, EfficientNet, VGG, ShuffleNet, ViT, AlexNet, and more
- Output: a Ludwig model file, an HTML report (with learning curves, confusion matrices, etc.), and a collection of CSV, JSON, and PNG files containing the predictions, experiment statistics, and visualizations
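As a rough illustration of the input pairing described above (a ZIP archive of images plus a CSV of metadata), the sketch below builds both with the Python standard library and checks that every path in the metadata has a matching file in the archive. The column names `image_path` and `label` and all three helper functions are assumptions made for this example; check the tool's own help text for the column names it actually expects.

```python
import csv
import io
import zipfile


def build_image_archive(images):
    """Pack a {filename: bytes} mapping into an in-memory ZIP archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, data in images.items():
            archive.writestr(name, data)
    return buffer.getvalue()


def build_metadata_csv(labels, path_column="image_path", label_column="label"):
    """Write one CSV row per image: its path inside the ZIP and its class label.

    The column names are hypothetical defaults, not names required by GLEAM.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([path_column, label_column])
    for name, label in sorted(labels.items()):
        writer.writerow([name, label])
    return out.getvalue()


def missing_images(zip_bytes, metadata_csv, path_column="image_path"):
    """Return the metadata paths that have no matching file in the archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        names = set(archive.namelist())
    rows = csv.DictReader(io.StringIO(metadata_csv))
    return [row[path_column] for row in rows if row[path_column] not in names]
```

Running `missing_images` before uploading is a cheap way to catch a metadata row that points at a file left out of the archive.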
General-purpose interface to Ludwig's full machine learning capabilities.
- Train and evaluate models on structured input (tabular, image, text, etc.)
- Expose Ludwig’s flexible configuration system
- Ideal for users needing advanced model customization
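To give a sense of what a Ludwig configuration looks like, here is a minimal sketch that assembles one as a Python dictionary. The top-level keys (`input_features`, `output_features`, `trainer`) follow Ludwig's declarative config layout; the helper `make_ludwig_config` and the column names `age`, `notes`, and `diagnosis` are hypothetical. Feature type names have changed between Ludwig releases, so verify them against the version bundled with the tool.

```python
import json


def make_ludwig_config(input_columns, target, target_type="category", epochs=10):
    """Build a minimal Ludwig-style config dictionary.

    input_columns maps a column name to a Ludwig feature type
    (for example "number", "text", or "image").
    """
    return {
        "input_features": [
            {"name": name, "type": feature_type}
            for name, feature_type in input_columns.items()
        ],
        "output_features": [{"name": target, "type": target_type}],
        "trainer": {"epochs": epochs},
    }


# Hypothetical dataset: two input columns and one categorical target.
config = make_ludwig_config({"age": "number", "notes": "text"}, target="diagnosis")
print(json.dumps(config, indent=2))
```

Anything not specified in such a config is left to Ludwig's defaults, which is why a short config is often enough as a starting point for further customization.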
A set of three specialized tools that transform large, raw pathology images into a structured format, so that best practices for model development can be applied and the data is ready for robust and efficient training.
- Image Tiler: Accepts the .svs image format, one of the most common proprietary formats for digital pathology whole slide images.
- Embedding Extractor: Uses pre-trained TorchVision models for feature extraction (for example, ResNet50, EfficientNet_B0, DenseNet121).
- Multiple Instance Learning (MIL) Bag Processor: Facilitates the aggregation of embeddings from individual image tiles into "bags" using various pooling techniques (such as Max Pooling or Attention Pooling).
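The tiling and pooling ideas above can be sketched in plain Python. This is a simplified illustration, not the tools' implementation: `tile_coordinates` assumes a non-overlapping grid that drops partial edge tiles, embeddings are plain lists of floats, and the `scoring_vector` passed to `attention_pool` stands in for attention parameters that would normally be learned during training.

```python
import math


def tile_coordinates(width, height, tile_size):
    """Top-left (x, y) of each full tile in a non-overlapping grid.

    Partial tiles at the right and bottom edges are dropped (an assumption).
    """
    return [
        (x, y)
        for y in range(0, height - tile_size + 1, tile_size)
        for x in range(0, width - tile_size + 1, tile_size)
    ]


def max_pool(bag):
    """Element-wise maximum across all tile embeddings in a bag."""
    return [max(values) for values in zip(*bag)]


def attention_pool(bag, scoring_vector):
    """Softmax-weighted average of the tile embeddings in a bag.

    Each embedding is scored by its dot product with scoring_vector,
    a fixed stand-in for learned attention weights.
    """
    scores = [sum(e * w for e, w in zip(emb, scoring_vector)) for emb in bag]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    weights = [value / total for value in exps]
    return [
        sum(weight * emb[i] for weight, emb in zip(weights, bag))
        for i in range(len(bag[0]))
    ]
```

Max pooling keeps the strongest response per embedding dimension, while attention pooling lets more informative tiles contribute more to the bag-level vector. With an all-zero scoring vector, attention pooling reduces to a plain mean.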
- Clone the repository:
  git clone https://github.com/goeckslab/gleam.git
- Add an entry for each tool to the tool_conf.xml of your Galaxy instance:
  <tool file="<path-to-your-local-tabularlearner/tabular_learner.xml>" />
  <tool file="<path-to-your-local-imagelearner/image_learner_train.xml>" />
  <tool file="<path-to-your-local-galaxy-ludwig/ludwig_train.xml>" />
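If you prefer to script the tool_conf.xml edit rather than make it by hand, a minimal sketch with Python's standard `xml.etree.ElementTree` could look like the following. The helper `add_tool_entries` is hypothetical, and it assumes the file's root element is `<toolbox>` and that tools are added directly under it. Note that ElementTree discards XML comments when parsing, so back up the original file first.

```python
import xml.etree.ElementTree as ET


def add_tool_entries(tool_conf_xml, tool_paths):
    """Return tool_conf XML text with a <tool file="..."/> entry per path.

    Paths that are already registered are skipped, so the function
    can be run more than once without creating duplicates.
    """
    root = ET.fromstring(tool_conf_xml)
    existing = {tool.get("file") for tool in root.iter("tool")}
    for path in tool_paths:
        if path not in existing:
            ET.SubElement(root, "tool", {"file": path})
            existing.add(path)
    return ET.tostring(root, encoding="unicode")
```

The function works on strings rather than files, so reading the current tool_conf.xml and writing the result back is left to the caller, using whatever local paths apply to your checkout.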
We welcome contributions. To propose new tools, report bugs, or suggest improvements:
- Fork the repository
- Create a feature branch
- Commit and test your changes
- Submit a pull request