BulutHamali/README.md

Bulut Hamali

Turning Biological Complexity Into Computational Clarity



The Mission

The gap between raw genomic data and actionable clinical insight is vast — and mostly unautomated.

My work sits precisely in that gap. I design and build AI-driven systems that transform high-dimensional biological data (single-cell transcriptomics, variant calls, clinical trial criteria) into structured, auditable intelligence. Whether that means orchestrating a multi-agent LLM pipeline to verify patient eligibility or engineering a high-performance variant detection tool in Python, the goal is the same: make the biology computable and the computation trustworthy.


Currently Building

| Status | Project | Focus |
| --- | --- | --- |
| 🛠 Ongoing | ClinPilot | Expanding the FDA Guardrail agent with real-time ClinicalTrials.gov API integration |
| 🛠 Ongoing | Spatial Transcriptomics Pipeline | Adding cell-cell communication inference via CellChat |
| 🛠 Ongoing | MERN Clinical Dashboard | Building a patient-facing eligibility interface for ClinPilot |

Featured Projects


ClinPilot — Multi-Agent Clinical Trial Orchestrator

How do you reliably match a real patient to the right clinical trial when the eligibility criteria span 40+ pages of regulatory language?

**Problem:** Clinical trial eligibility verification is time-intensive, error-prone, and bottlenecked by human reviewers who must reconcile unstructured patient records against complex inclusion/exclusion criteria. A single missed criterion can disqualify a patient or expose a site to compliance risk.

**Solution:** A 5-agent LLM pipeline where each agent owns a distinct epistemic role, running in sequence with structured handoffs:

| Agent | Role |
| --- | --- |
| Researcher | Retrieves and parses relevant trial criteria via RAG over ChromaDB |
| Advocate | Constructs the case for patient eligibility |
| Critic | Constructs the case against, stress-testing edge cases |
| Auditor | Reconciles the Advocate/Critic debate into a structured verdict |
| FDA Guardrail | Final pass for regulatory compliance and citation integrity |
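
The sequential handoff between the five agents can be sketched roughly as follows. This is an illustration only, not ClinPilot's actual code: the `Handoff` dataclass, its field names, and the stubbed agent logic are all hypothetical, and each stub stands in for what would be an LLM call.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Handoff:
    """Hypothetical structured record passed from one agent to the next."""
    patient_summary: str
    criteria: List[str] = field(default_factory=list)
    arguments_for: List[str] = field(default_factory=list)
    arguments_against: List[str] = field(default_factory=list)
    verdict: str = ""
    compliant: bool = False
    trace: List[str] = field(default_factory=list)  # which agents have run


def researcher(h: Handoff) -> Handoff:
    # Stub: a real Researcher would retrieve criteria via RAG.
    h.criteria = ["age >= 18", "no prior chemotherapy"]
    return h


def advocate(h: Handoff) -> Handoff:
    # Stub: argue for eligibility, one point per criterion.
    h.arguments_for = [f"patient appears to meet: {c}" for c in h.criteria]
    return h


def critic(h: Handoff) -> Handoff:
    # Stub: raise one objection to stress-test the case.
    h.arguments_against = ["chemotherapy history is not documented"]
    return h


def auditor(h: Handoff) -> Handoff:
    # Stub rule: any open objection yields a cautious verdict.
    h.verdict = "needs_review" if h.arguments_against else "eligible"
    return h


def fda_guardrail(h: Handoff) -> Handoff:
    # Stub: only checks that a verdict exists and cites some criteria.
    h.compliant = bool(h.criteria) and h.verdict != ""
    return h


PIPELINE: List[Callable[[Handoff], Handoff]] = [
    researcher, advocate, critic, auditor, fda_guardrail,
]


def run_pipeline(patient_summary: str) -> Handoff:
    """Run every agent in order, recording each step in the trace."""
    h = Handoff(patient_summary=patient_summary)
    for agent in PIPELINE:
        h = agent(h)
        h.trace.append(agent.__name__)
    return h


result = run_pipeline("hypothetical patient summary")
print(result.verdict, result.trace)
```

Passing one typed record through the whole chain keeps every intermediate argument inspectable, which is one plausible way to support the audit trail the pipeline aims for.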

The Granville Strategy underpins performance: a SQLite-based semantic caching layer intercepts repeated or near-duplicate queries before they reach the LLM, reducing latency and inference cost by short-circuiting redundant reasoning chains.
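
A minimal sketch of a SQLite-backed semantic cache is shown below. The internals of the Granville Strategy are not described here, so everything in the sketch is an assumption: the `SemanticCache` class, the table layout, the word-count "embedding" (a stand-in for a learned embedding model), and the 0.8 similarity threshold are illustrative choices only.

```python
import json
import math
import re
import sqlite3
from collections import Counter
from typing import Callable, Dict, Optional


def toy_embed(text: str) -> Dict[str, int]:
    """Stand-in embedding: lowercase word counts (a real cache would use a model)."""
    return dict(Counter(re.findall(r"[a-z0-9]+", text.lower())))


def cosine(a: Dict[str, int], b: Dict[str, int]) -> float:
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class SemanticCache:
    def __init__(self, path: str = ":memory:", threshold: float = 0.8):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS cache "
            "(query TEXT, embedding TEXT, response TEXT)"
        )
        self.threshold = threshold  # assumed value, tune per workload

    def lookup(self, query: str) -> Optional[str]:
        q = toy_embed(query)
        best_score, best_response = 0.0, None
        # Linear scan keeps the sketch short; a large cache would need an index.
        rows = self.db.execute("SELECT embedding, response FROM cache")
        for emb_json, response in rows:
            score = cosine(q, json.loads(emb_json))
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None

    def store(self, query: str, response: str) -> None:
        self.db.execute(
            "INSERT INTO cache VALUES (?, ?, ?)",
            (query, json.dumps(toy_embed(query)), response),
        )
        self.db.commit()


def cached_call(cache: SemanticCache, query: str, llm: Callable[[str], str]) -> str:
    """Return a cached response for near-duplicate queries, else call the LLM."""
    hit = cache.lookup(query)
    if hit is not None:
        return hit
    response = llm(query)
    cache.store(query, response)
    return response


calls = []


def fake_llm(q: str) -> str:
    # Placeholder for the real model call; counts how often it is invoked.
    calls.append(q)
    return f"answer #{len(calls)}"


cache = SemanticCache()
first = cached_call(cache, "Is the patient eligible for the example trial?", fake_llm)
second = cached_call(cache, "is the patient eligible for the example trial", fake_llm)
third = cached_call(cache, "List exclusion criteria for hepatic impairment", fake_llm)
print(len(calls))  # the near-duplicate second query never reaches fake_llm
```

In a clinical setting the threshold matters: set too loosely, two queries that differ in one decisive detail could share a cached answer, so a conservative value is the safer default.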

Tech Stack

Python Llama ChromaDB SQLite LangChain

Architecture: RAG → Multi-Agent Deliberation → Regulatory Guardrail → Cached Response
Caching:      Granville Strategy (SQLite semantic cache) — reduces redundant LLM calls
Model:        Llama 3.3 70B (instruction-tuned) via local inference

Bioinformatics Research Portfolio

Spatial Transcriptomics — Colon Cancer Tumor Microenvironment

**Problem:** Bulk RNA-seq averages over all cells in a tissue, obscuring the spatial organization of tumor, immune, and stromal compartments critical to understanding cancer progression.

**Solution:** End-to-end spatial transcriptomics pipeline processing 10x Visium data: spot deconvolution, spatially-variable gene detection, and ligand-receptor interaction mapping to characterize the colon cancer tumor microenvironment at tissue resolution.
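
To make the idea of spatially-variable gene detection concrete, here is a small pure-Python sketch of Moran's I, a common spatial autocorrelation statistic. Whether this pipeline uses Moran's I specifically is an assumption. The square 4-neighbor grid is also a simplification (Visium spots sit on a hexagonal lattice), and the expression values are toy data.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int]


def grid_neighbors(coords: List[Coord]) -> Dict[Coord, List[Coord]]:
    """4-neighbor adjacency on a square grid (simplifying assumption)."""
    present = set(coords)
    offsets = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    return {
        (r, c): [(r + dr, c + dc) for dr, dc in offsets if (r + dr, c + dc) in present]
        for r, c in coords
    }


def morans_i(expr: Dict[Coord, float]) -> float:
    """Moran's I with binary neighbor weights for one gene across spots."""
    coords = list(expr)
    n = len(coords)
    mean = sum(expr.values()) / n
    dev = {k: v - mean for k, v in expr.items()}
    denom = sum(d * d for d in dev.values())
    nbrs = grid_neighbors(coords)
    w_total = sum(len(v) for v in nbrs.values())
    if denom == 0 or w_total == 0:
        return 0.0  # constant expression or no neighbors: no signal
    num = sum(dev[i] * dev[j] for i in coords for j in nbrs[i])
    return (n / w_total) * (num / denom)


spots = [(r, c) for r in range(4) for c in range(4)]
# Toy gene expressed only in the left half of the tissue (spatially clustered).
clustered = morans_i({(r, c): 1.0 if c < 2 else 0.0 for r, c in spots})
# Toy gene alternating between adjacent spots (checkerboard pattern).
checker = morans_i({(r, c): float((r + c) % 2) for r, c in spots})
print(clustered > 0, checker < 0)
```

Genes with strongly positive scores form contiguous expression domains, which is the signal a spatially-variable gene screen is looking for.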

Tech Stack

R Python Seurat Squidpy


scRNA-seq Analysis — Gastric Cancer Cell Atlas

**Problem:** Gastric cancer subtypes are clinically heterogeneous, and bulk profiling fails to resolve the malignant, immune, and fibroblast populations driving treatment resistance.

**Solution:** Single-cell RNA-seq pipeline from raw count matrices through clustering, cell-type annotation, differential expression, and trajectory inference, reconstructing the cellular landscape of gastric tumor samples.
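
As a small illustration of the first step after loading raw counts, the sketch below applies library-size normalization followed by a log transform. It mirrors in spirit what Scanpy's `normalize_total` and `log1p` do, but it is a plain-Python stand-in, and both the function name and the `target_sum` of 10,000 are assumptions rather than settings taken from this pipeline.

```python
import math
from typing import List


def normalize_log1p(counts: List[List[int]], target_sum: float = 1e4) -> List[List[float]]:
    """Scale each cell (row) to target_sum total counts, then apply log(1 + x)."""
    out = []
    for cell in counts:
        total = sum(cell)
        scale = target_sum / total if total else 0.0  # leave empty cells at zero
        out.append([math.log1p(c * scale) for c in cell])
    return out


# Two toy cells with the same expression proportions but 10x different depth.
raw = [[10, 0, 90], [1, 0, 9]]
norm = normalize_log1p(raw)
print(norm[0] == norm[1])
```

After normalization the two cells become comparable despite the difference in sequencing depth, which is what downstream clustering relies on.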

Tech Stack

Python Scanpy R


GATK Variant Calling — INDELseek

**Problem:** Standard short-read variant callers underperform on insertion/deletion (INDEL) detection in low-coverage or complex genomic regions, producing false negatives that matter clinically.

**Solution:** High-performance Python implementation of a targeted INDEL detection pipeline built on GATK Best Practices, with custom filtering logic and optimized I/O for large-cohort processing.
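
The custom filtering step could look something like the sketch below, which streams VCF records and keeps only INDELs that pass quality and depth cutoffs. This is plain-Python post-filtering, not GATK's own filtering tools. The function names, the `QUAL >= 30` and `DP >= 10` thresholds, and the example records are all hypothetical.

```python
from typing import Iterable, Iterator


def is_indel(ref: str, alt: str) -> bool:
    """True if any ALT allele differs in length from REF (symbolic alleles skipped)."""
    return any(
        len(a) != len(ref)
        for a in alt.split(",")
        if not a.startswith("<") and a not in ("*", ".")
    )


def info_depth(info: str) -> int:
    """Read the DP value from the INFO column, or 0 if it is absent."""
    for entry in info.split(";"):
        if entry.startswith("DP="):
            return int(entry[3:])
    return 0


def filter_indels(
    lines: Iterable[str], min_qual: float = 30.0, min_depth: int = 10
) -> Iterator[str]:
    """Yield VCF data lines that are INDELs and pass the assumed thresholds."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # malformed record
        ref, alt, qual, info = fields[3], fields[4], fields[5], fields[7]
        if not is_indel(ref, alt):
            continue
        if qual == "." or float(qual) < min_qual:
            continue
        if info_depth(info) < min_depth:
            continue
        yield line


# Made-up records for illustration only.
vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr17\t100\t.\tA\tG\t50\tPASS\tDP=40",     # SNV, dropped
    "chr17\t200\t.\tAT\tA\t60\tPASS\tDP=35",    # deletion, kept
    "chr17\t300\t.\tC\tCGG\t12\tPASS\tDP=30",   # insertion, low QUAL
    "chr17\t400\t.\tG\tGA\t45\tPASS\tDP=4",     # insertion, low depth
]
kept = list(filter_indels(vcf))
print(len(kept))
```

Writing the filter as a generator means records are processed one at a time, so memory use stays flat even on large cohort files.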

Tech Stack

Python GATK Bash Bioconductor


Full-Stack Engineering

**Problem:** Biological insights trapped in notebooks and scripts are invisible to clinicians, collaborators, and stakeholders who need them most.

**Solution:** MERN-stack applications that surface genomic and clinical data through accessible, authenticated interfaces, from gene expression visualizers to role-based clinical dashboards with RESTful APIs and JWT-secured endpoints.

Tech Stack

React Node.js Express MongoDB JavaScript


Skills Matrix

Bioinformatics & Data Science

| Domain | Tools & Methods |
| --- | --- |
| Single-Cell Genomics | Scanpy, Seurat, scRNA-seq clustering, trajectory inference |
| Spatial Transcriptomics | 10x Visium, Squidpy, spatially-variable gene analysis |
| Variant Analysis | GATK, INDEL detection, VCF filtering, cohort pipelines |
| Statistical Analysis | R/Bioconductor, differential expression, survival analysis |
| Data Engineering | Pandas, NumPy, high-performance Python, HPC/SLURM |

AI & LLM Orchestration

| Domain | Tools & Methods |
| --- | --- |
| Multi-Agent Systems | LangChain, custom agent graphs, deliberation frameworks |
| Retrieval-Augmented Generation | ChromaDB, vector embeddings, semantic search |
| LLM Integration | Llama 3.3 70B, OpenAI API, prompt engineering |
| Caching & Optimization | SQLite semantic cache (Granville Strategy), inference cost reduction |
| Regulatory AI | FDA-aware guardrail agents, citation integrity, audit trails |

Full-Stack Engineering

| Domain | Tools & Methods |
| --- | --- |
| Frontend | React, JavaScript (ES6+), responsive UI |
| Backend | Node.js, Express, RESTful API design |
| Database | MongoDB, SQLite, data modeling |
| Auth & Security | JWT, role-based access control |
| DevOps | Git, GitHub Actions, Linux/Bash, Docker |

Coding Activity

From: 25 February 2026 - To: 04 March 2026

Markdown   26 mins               █████████████████████░░░░   83.67 %
YAML       5 mins                ████░░░░░░░░░░░░░░░░░░░░░   16.33 %



"The most consequential code running today is the code that interprets biology."

Pinned

  1. scRNAseq-Pipeline-GastricCancer (Public)

    Reconstruction and augmentation of the Wang B. et al. (2021) paper's computational framework for dissecting metastatic gastric cancer through their provided single-cell transcriptomics dataset.

    Jupyter Notebook

  2. GATK-Based-Variant-Calling-and-Analysis-on-Chromosome-17 (Public)

    This repository contains a comprehensive pipeline for germline variant calling and analysis using the Genome Analysis Toolkit (GATK).

  3. interactive-registration-form (Public)

    A fully interactive and responsive registration form built with HTML, CSS, and JavaScript. Features live validation, user feedback, and accessibility enhancements.

    JavaScript

  4. personal-blog-sba (Public)

    A clean and responsive personal blog layout with structured posts and smooth navigation, built with HTML, CSS, and JavaScript.

    JavaScript

  5. url-shortening-api (Public)

    A responsive URL shortening web app using Tailwind CSS and vanilla JavaScript. Features CleanURI API integration, copy-to-clipboard functionality, and mobile-first design.

    HTML

  6. solubility-prediction-machine-learning (Public)

    Jupyter Notebook