
Omar Arbi

🧠 AI Research · ENS Paris-Saclay (MVA) & CentraleSupélec · Generative Models · Representation Learning ⚙️

MSc in Mathematics, Vision & Learning (MVA) with a background in applied mathematics and data science.

I work on generative models and representation learning, with a bias toward building systems that actually run: clean experiments, reproducible pipelines, honest benchmarks.

Currently finishing my master's and looking for Applied Scientist / Research Engineering roles in foundation models or generative AI 🚀

🔬 Research Interests

🌊 Diffusion models & noise geometry

Replacing IID Gaussian noise with structured processes (Simplex, Matérn, rank-based Gaussianization) to control denoising difficulty and improve anomaly separability.

Building principled diagnostics to understand why noise geometry shifts AUROC, not just that it does.
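To make "rank-based Gaussianization" concrete, here is a minimal, dependency-free sketch. It is an illustration under stated assumptions, not the project's implementation: the moving-average noise is only a hypothetical stand-in for a structured process such as Simplex or Matérn noise, and the names `rank_gaussianize` and `smoothed_noise` are made up for this example.

```python
import random
from statistics import NormalDist


def rank_gaussianize(values):
    """Map each value to a standard-normal quantile based on its rank.

    The ordering of the samples is preserved; only the marginal
    distribution is reshaped to be approximately N(0, 1).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    std_normal = NormalDist()
    out = [0.0] * n
    for rank, idx in enumerate(order):
        # (rank + 0.5) / n keeps the quantile strictly inside (0, 1).
        out[idx] = std_normal.inv_cdf((rank + 0.5) / n)
    return out


def smoothed_noise(n, window=5, seed=0):
    """Hypothetical stand-in for structured noise: a moving average of
    IID Gaussian samples, which is correlated and has variance below 1."""
    rng = random.Random(seed)
    raw = [rng.gauss(0.0, 1.0) for _ in range(n + window - 1)]
    return [sum(raw[i:i + window]) / window for i in range(n)]


noise = smoothed_noise(256)
gaussianized = rank_gaussianize(noise)
```

The point of the rank transform is that the correlation structure (which sample is larger than which) is kept, while the marginal is pulled back toward a unit Gaussian, so a denoiser trained with Gaussian marginals still sees a familiar scale.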


🧩 Representation learning

Probing internal feature geometry in Transformers and GNNs via t-SNE, PPCA, KL trajectories.

Interested in when and how representations become interpretable across layers.
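As a rough illustration of what a "KL trajectory" measures, the sketch below computes, for each layer, the KL divergence between the final layer's next-token distribution and the distribution decoded at that layer. The logits are hard-coded toy values over a four-token vocabulary (an assumption for this example); in a real lens analysis they would come from decoding each layer's hidden state. The direction KL(final || layer) is one common convention, chosen here for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def kl_trajectory(layer_logits):
    """KL(final || layer) for every layer, ordered from first to last."""
    final = softmax(layer_logits[-1])
    return [kl_divergence(final, softmax(logits)) for logits in layer_logits]


# Toy per-layer logits (hypothetical values, not model outputs).
layer_logits = [
    [0.0, 0.0, 0.0, 0.0],
    [0.5, 0.2, 0.0, -0.1],
    [1.5, 0.3, -0.2, -0.5],
    [3.0, 0.5, -1.0, -1.5],
]
trajectory = kl_trajectory(layer_logits)
```

A trajectory that falls smoothly toward zero suggests the prediction is refined gradually across layers; abrupt jumps are the kind of signal one might inspect further.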

🛠 Selected Work

| Project | Core idea |
|---|---|
| 🦷 Structured-noise diffusion for anomaly detection | Simplex/Matérn noise → +11.6% detection on CBCT dental pathology; validated on brain MRI |
| 🔍 GPT-2 interpretability: Tuned Lens vs Logit Lens | KL trajectory analysis for prompt injection detection; Tuned Lens consistently more stable |
| GRPO fine-tuning on Ministral-3B | 4-stage curriculum (Mate-in-1 → Full Game) to stabilize sparse-reward RL for chess |
| 🧾 Knowledge distillation: Qwen2.5 14B → 1.5B | TF-IDF + NMF corpus curation + LoRA distillation for Arabic summarization |
| 📊 GPU-accelerated PPCA with missing data | Fully vectorized EM loop; benchmarked against PCA/mini-batch variants at scale |
| 📈 Online NMF for financial time series | Sliding-window factorization with stabilized dictionary evolution |

🧰 Stack

Frameworks: PyTorch · PyTorch Lightning · MONAI · Transformers · vLLM · PyG

Training: LoRA/QLoRA · DDP · Slurm · CUDA profiling · gradient checkpointing

Math: variational inference · ELBO · spectral methods · optimal transport

🎯 Now

Finishing my MVA master's.

Actively looking for research engineering or applied scientist roles focused on:

  • 🏗 Foundation models
  • 🎨 Generative modeling
  • 🔎 Interpretability

📎 LinkedIn: https://linkedin.com/in/omararbi

Pinned

  1. onmf-timeseries

    Jupyter Notebook 3

  2. graph-conv-networks

    Graph Convolutional Networks Made Transparent

    Jupyter Notebook

  3. LLava_for_Radiographic_Images

    Jupyter Notebook

  4. monotone-gradient-networks

    PyTorch implementation of Monotone Gradient Networks (MGN) for Optimal Transport and Generative Modeling (Chaudhari et al. 2023).

    Jupyter Notebook