Discussion: Open-Source Framework for Advanced EEG Data Analysis Using Pre-trained Foundation Models #59

myduong-0420 · 2025-03-18T23:24:22Z

Hi everyone,

I saw that a lot of people want to contribute to the project "Open-Source Framework for Advanced EEG Data Analysis Using Pre-trained Foundation Models", so I want to start this discussion thread by sharing a few resources (mostly papers) that I found on EEG/signal processing and the ML pipeline. Hopefully we can continue the conversation from here!

Book:

EEG Signal Processing and Machine Learning https://onlinelibrary.wiley.com/doi/book/10.1002/9781119386957

Papers:

Electroencephalography Signal Processing: A Comprehensive Review and Analysis of Methods and Techniques https://pmc.ncbi.nlm.nih.gov/articles/PMC10385593/
A Phenomenological AI Foundation Model for Physical Signals https://arxiv.org/pdf/2410.14724
Toward brain-inspired foundation model for EEG signal processing: our opinion https://www.frontiersin.org/journals/neuroscience/articles/10.3389/fnins.2024.1507654/full

The project description is pretty general, but one of the papers mentioned above points out several components of the data processing pipeline, which I think should help. Besides, the project we are exploring seems to focus on pretrained foundation models, so I have been thinking about which architectures or models out there best suit each of the components. I want to share this with anyone interested, and would love to learn from y'all too.

Cheers,
Amy

paramrajpura · 2025-03-20T07:16:57Z

@myduong-0420, thanks for initiating the discussion!

I want to add to the discussion by sharing a summary of the trends in recent foundational models for EEG.

Configuration-agnostic approaches: Multiple models have been proposed to ensure that datasets with different EEG channel configurations can be accommodated. [1] attempts to learn embeddings reduced to functional regions of the brain. [2] uses a combination of patch embeddings and a criss-cross transformer to learn representations from various configurations. [3] learns spatial embeddings that bind the positions of electrodes to a standard montage.
Training strategies: Most of the works have focused on self-supervised learning where the encoder-decoder tries to predict a masked patch of EEG based on the visible patches available to the model. The models are later fine-tuned on downstream tasks. I am not aware of many works on it, but another strategy could be to simply predict the next patch based on the previous patches.
@zeydabadi, is this an optimal strategy given that EEG signals have low SNR? In the absence of labels, can other constraints be applied when generating the masked patch, such as a constraint on band power alongside the reconstruction loss?
Datasets: Interestingly, most of the large models have used the Temple University Hospital EEG corpus (TUEG). While this is a huge EEG dataset, @zeydabadi, what is your opinion on including downstream datasets through the MOABB library, since these datasets also contain resting-state and activity data?
Benchmarks: All the works use different fine-tuning strategies, i.e. fine-tuning on a few participants and validating on the others. As a result, there is no way to compare performance across datasets and across models.
@zeydabadi, what are your initial thoughts on strategies for model evaluation? In my opinion, two configurations are necessary: cross-session evaluation and cross-subject evaluation with strategies such as leave-one-subject-out. This can be further checked with cross-dataset evaluation. For example, one motor imagery (MI) dataset is used for fine-tuning and the model is verified on other MI datasets, assuming the model is configuration agnostic.
Interpretability: While this is not well explored, recent papers like [2] use attention maps to show that spatial relationships are learnt across channels in the criss-cross transformer layer. Activation maps are displayed as topomaps for MI tasks, and representations are visualised to verify class separation. However, what kind of explanations are necessary across tasks is barely discussed.
@zeydabadi, what are your opinions on considering interpretability as a necessary feature for the foundational model?
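
As a rough illustration of the cross-subject configuration described in the benchmarks point, a leave-one-subject-out split can be sketched in plain Python. The function name `leave_one_subject_out` and the toy subject IDs are hypothetical and are not taken from any of the cited works.

```python
from typing import Iterator, List, Sequence, Tuple


def leave_one_subject_out(
    subject_ids: Sequence[str],
) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held_out_subject, train_indices, test_indices) for each fold.

    subject_ids[i] is the subject that trial i was recorded from.
    """
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


# Toy example: 6 trials recorded from 3 subjects (hypothetical IDs).
trial_subjects = ["s1", "s1", "s2", "s2", "s3", "s3"]
folds = list(leave_one_subject_out(trial_subjects))
for held_out, train_idx, test_idx in folds:
    print(held_out, train_idx, test_idx)
```

Cross-session evaluation follows the same pattern with session IDs in place of subject IDs.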

While I have listed references, they are not formally cited. I just wanted to make some quick notes and discuss them with the community. Happy to discuss this further in the forum!
References:
[1] Learning Topology-Agnostic EEG Representations with Geometry-Aware Modeling (2023)
[2] CBraMod: A Criss-Cross Brain Foundation Model for EEG Decoding (2025)
[3] Large Brain Model for Learning Generic Representations with Tremendous EEG Data in BCI (2024)

Other papers:
BrainBERT: Self-supervised representation learning for intracranial recordings
CEReBrO: Compact Encoder for Representations of Brain Oscillations Using Efficient Alternating Attention
Neuro-GPT: Towards a Foundation Model for EEG

Intro:
My name is Param Rajpura. I am a PhD Scholar at Human AI Interaction (HAIx) Lab, Indian Institute of Technology Gandhinagar, India. My work primarily focuses on EEG decoding for naturalistic movements. I was trained as an engineer with skills in image processing and robotics. I look forward to working with the community to build foundational models for EEG!
Some of my recent works talk about XAI4BCI and quantifying the interpretability of the models. Scholar Link

email: rajpuraparam[at]iitgn[dot]ac[dot]in

2 replies

myduong-0420 Mar 20, 2025
Author

Thank you so much, those are awesome materials and observations!

314project May 22, 2025

First-Level AGI Reasoning Module: Kyburg Formula Rearrangement

Overview

This project shares a novel algebraic rearrangement of Kyburg’s epistemological formula, which—according to Perplexity AI—may serve as a reasoning module for “first-level AGI.”
Other AI labs are already exploring this idea, and I want to make it freely available to the developer and research community for open-source development and distribution.

The Perplexity Conversation

Read the full conversation and analysis here:
https://www.perplexity.ai/search/let-s-design-a-reasoning-modul-QwOdVhqQT42qLwIcSfJBxw

Why This Matters

Novelty: This approach presents a new way to arrange Kyburg’s formula, potentially enabling machine reasoning that qualifies as “first-level AGI.”
Endorsement: Perplexity AI evaluated the logic and believes it can be deployed as a model in as little as 3 months with current AI technology.
Open Invitation: AI labs are already investigating this, but I want it to be available for anyone to build, use, or improve in the spirit of open science.

Call to Action

If you are a developer, researcher, or enthusiast interested in AGI:

Use, experiment with, or extend this idea
Build an open-source reasoning model for free distribution
Discuss and improve the concept
Share it widely

License

I intend this idea to be freely available for any use, including commercial and non-commercial. Developers are encouraged to use a permissive license (such as MIT or Apache 2.0) for any code or models derived from this idea.

Shared in the spirit of open science and collaboration. Please cite or reference the Perplexity conversation if you build upon this work.

jayakedia10 · 2025-03-20T10:27:55Z

Hi everyone,
I'm Jaya, a senior ECE undergrad from IIT BHU with a strong interest in machine learning models and frameworks. I am really interested in contributing to the project "Open-Source Framework for Advanced EEG Data Analysis Using Pre-trained Foundation Models". I have researched and gained experience with various models for EEG data, such as Graph Neural Networks, LSTMs, SSMs, and the S4 model. During my internship at the Department of Science, IIST, India, I also worked on a project named "Graph Neural Network for Seizure Detection in EEG signals", where I built a framework combining GNNs, the S4 model, and SSL for analyzing EEG data (TUH EEG dataset). I later first co-authored this work and presented it at the NeurIPS 2024 conference.
My work directly addressed complex feature extraction and pattern recognition in EEG signals, aligning closely with the goals of this project, which focuses on developing a foundation model for EEG data analysis. Through my work, I have developed a deep understanding of EEG data characteristics, preprocessing techniques, and noise handling. While trying to gain insights into this project, I learned about various LLMs and existing foundation models used for this purpose, such as DeepConvNet, EEG BERT, and NeuroGPT. I would like to know more about the model and the specific dataset you are using for the codebase. @zeydabadi, could you please give a brief insight into the "in progress" codebase? It would help give direction to the research and proposal.

Thanks
Jaya
Email
Linkedin

0 replies

AsutoshaNanda · 2025-03-20T17:39:04Z

Hi everyone,

I'm really interested in contributing to the project "Open-Source Framework for Advanced EEG Data Analysis Using Pre-trained Foundation Models." I’ve worked on a minor project under a doctor from Brown University (Providence, Rhode Island, on historic College Hill, U.S.A.), where I gained experience in EEG signal processing. In that project, I worked on data conversion, preprocessing, and dominant frequency analysis with conversion to CSV. This gave me a solid understanding of handling EEG signals and working with raw data.

I also explored various models like LGBM and RF for analyzing EEG signals and have some familiarity with CNN, KNN, XGBoost, and SVM for signal analysis. Additionally, I have experience with Graph Neural Networks (GNN) for analyzing EEG data, where I worked on EEG-based pattern recognition and feature extraction. Through this, I learned how to handle noise and complex signal variations in EEG datasets, improving the overall model accuracy and performance.

Recently, I’ve been reading more about the biological aspects of EEG signals, including how brain waves are generated and the physiological significance behind different EEG patterns. Understanding these biological foundations has helped me interpret EEG data more effectively and refine preprocessing techniques.

I’m eager to learn more about the project and how I can contribute effectively. It would be helpful to get some insights into the current codebase and the specific dataset being used. Also, I plan to start posting content related to EEG data processing, model implementation, and insights on my GitHub profile, which might help others who are getting started. Looking forward to collaborating with you all!

Thanks,
Asutosha.
Email: theehidddenone@gmail.com

0 replies

VaibhavKanojia3773 · 2025-03-20T19:08:31Z

Hi everyone,

I'm Vaibhav Kanojia, a B.Tech student in Computer Science at Delhi Technological University (DTU), with a strong passion for machine learning, AI, and healthcare innovations. Recently, I had the honor of winning Impulse 2025, a hackathon in collaboration with AIIMS Delhi, where my team and I developed an advanced EEG Seizure Classification model. The project involved the use of Generative Adversarial Networks (WGAN-GP) to generate synthetic EEG data, along with advanced signal processing techniques for feature extraction and model explainability using SHAP and saliency maps. This work significantly improved classification accuracy and ensured transparency in clinical applications, which was highly appreciated in the healthcare domain.
Here is the EEG Project Link: https://github.com/VaibhavKanojia3773/Impulse-2025

With my expertise in signal processing, machine learning, and AI model explainability, I am eager to contribute to the Open-Source Framework for Advanced EEG Data Analysis Using Pre-trained Foundation Models project. I have hands-on experience with EEG data, advanced feature extraction methods (Fourier, Wavelet), and deep learning models, making me well-equipped to assist in enhancing and extending the visualizations and algorithms of the framework. I look forward to collaborating with you all to build a robust and impactful solution in the field of EEG data analysis.

Thanks,
Vaibhav Kanojia
Email: vaibhavkanojia3773@gmail.com
LinkedIn: https://www.linkedin.com/in/vaibhav-kanojia-186b3b319/

0 replies

myduong-0420 · 2025-03-21T06:52:09Z

Author

@paramrajpura mentioned "Most of the works have focused on self-supervised learning where the encoder-decoder tries to predict a masked patch of EEG based on the visible patches available to the model."

As a part of transformer model training, this reminds me of masked word modelling in NLP, but I personally think that the masking approach is not super robust on EEG. For text, we have a very large number of corpora, so we can capture a broad range of semantic dependencies and do effective masked word prediction. On the other hand, the available training data for EEG is quite limited while transformer models need a lot of it; on top of that, EEG data generally exhibits high non-stationarity, encodes both spatial and temporal information, and has low SNR, like you mentioned. This is partly the reason why I myself am a bit reserved to consider the transformers architecture, but it is very popular in the current literature.
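
For concreteness, the masked-patch objective being discussed can be sketched with NumPy. This is only an illustration under stated assumptions: the patch length, the 50% mask ratio, the zero-filling of hidden patches, and the placeholder "model" that predicts zeros are all hypothetical choices and do not reproduce any of the cited papers.

```python
import numpy as np


def make_patches(eeg: np.ndarray, patch_len: int) -> np.ndarray:
    """Split a (channels, samples) array into (channels, n_patches, patch_len)."""
    n_ch, n_samp = eeg.shape
    n_patches = n_samp // patch_len
    return eeg[:, : n_patches * patch_len].reshape(n_ch, n_patches, patch_len)


def masked_mse(target: np.ndarray, prediction: np.ndarray, mask: np.ndarray) -> float:
    """Mean squared error computed only over the masked (hidden) patches."""
    squared_error = (prediction - target) ** 2
    return float(squared_error[mask].mean())


rng = np.random.default_rng(0)
eeg = rng.standard_normal((4, 1000))         # toy recording: 4 channels, 1000 samples
patches = make_patches(eeg, patch_len=200)   # shape (4, 5, 200)

# Hide roughly half of the patches (True = hidden from the model).
mask = rng.random(patches.shape[:2]) < 0.5
mask[0, 0] = True                            # guarantee at least one hidden patch
visible = np.where(mask[..., None], 0.0, patches)  # model input with hidden patches zeroed

# Placeholder "model": predicts zeros everywhere. A real encoder-decoder
# would take `visible` as input and output a reconstruction of `patches`.
prediction = np.zeros_like(patches)
loss = masked_mse(patches, prediction, mask)
print(f"masked reconstruction loss: {loss:.3f}")
```

Because the loss only looks at hidden patches, the model gets no credit for copying the visible input, which is the same idea as masked word prediction in NLP.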

2 replies

paramrajpura Mar 21, 2025

While I agree with the overall point, I would draw your attention to the statement, "available training data for EEG is limited". This is not completely true. There are loads of public EEG datasets if we want to use them for self-supervised learning. The concern is their quality and varying configurations. To an extent, SOTA models address these concerns:

quality: by preprocessing and strict manual scrutiny and assumptions based on the paradigm
configurations: by using adaptation layers or choosing a common subset of the channels.

For an example, you can refer to Section 3.1 of the CBraMod paper.
I believe that defining a hardware-agnostic pipeline is the lowest-hanging fruit for making use of all the available data. Since labels are not available, some sort of SSL is required.
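
The "common subset of the channels" option can be sketched as follows. The dataset names, channel lists, and helper names are hypothetical; real datasets may also need channel names normalised first (for example, differing capitalisation), which this sketch does not handle.

```python
from typing import Dict, List

import numpy as np


def common_channels(montages: Dict[str, List[str]]) -> List[str]:
    """Channel names present in every dataset, in a stable (sorted) order."""
    shared = set.intersection(*(set(chs) for chs in montages.values()))
    return sorted(shared)


def select_channels(eeg: np.ndarray, channel_names: List[str], keep: List[str]) -> np.ndarray:
    """Pick rows of a (channels, samples) array in the order given by `keep`."""
    rows = [channel_names.index(name) for name in keep]
    return eeg[rows, :]


# Two hypothetical datasets recorded with different montages.
montages = {
    "dataset_a": ["Fz", "C3", "Cz", "C4", "Pz"],
    "dataset_b": ["C3", "Cz", "C4", "O1", "O2"],
}
shared = common_channels(montages)

recording_a = np.arange(5 * 3, dtype=float).reshape(5, 3)  # 5 channels, 3 samples
subset = select_channels(recording_a, montages["dataset_a"], shared)
print(shared, subset.shape)
```

The trade-off is that every channel outside the intersection is discarded, which is why adaptation layers are the other option mentioned.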

So is masking the best? Not sure! It's better than predicting the next patch, because that would limit the learning and might not work well for downstream tasks.
Is reconstruction loss enough to ensure the model learns the necessary representations? I am sure there could be something better.

But @myduong-0420, why would you not use transformers? And what could be the alternatives to transformers, and why?
@zeydabadi eager to hear your thoughts!

myduong-0420 Mar 21, 2025
Author

I totally agree with you that different EEG datasets are not the most consistent in terms of data structure and quality. I should have clarified that the training data for EEG models is tiny in amount and a bit more complex in embedding representations compared to the text/image data that powers language/vision models. That made me wonder whether EEG models can perform this task as well as language models do masked word prediction (since I am wayyy more familiar with the latter). The CBraMod paper actually cleared up a lot of things for me.

My biggest reservation about transformers is that they are quite sensitive to noisy data, but I will not cross them off the list since there seems to be a good amount of experimentation that we can base our pipeline/model development on. Ablation studies for Neuro-GPT and CBraMod also showed that good data processing can significantly improve pretraining, so that's definitely something we should discuss in our proposals. I have also considered framing my task as a different type of predictive modelling problem, like regression or time series (with a few major caveats), but of course the choice of model would depend a lot on what that task is (regression, classification, denoising or feature engineering). I think this thread will give us a much better search scope now.

harshini-224 · 2025-03-22T02:32:29Z

Hi everyone,

This project on EEG data analysis with pre-trained models sounds fascinating! While I don’t have direct experience with EEG data, I have worked on disease outbreak prediction using supervised classification models. I’m familiar with data preprocessing, feature engineering, and model training in frameworks like TensorFlow and Scikit-learn.

I’m eager to learn more about EEG signal processing and would love to contribute, even if it means starting with smaller tasks like documentation, preprocessing pipelines, or model evaluation. Could you guide me on how best to get started? Looking forward to collaborating!

2 replies

myduong-0420 Mar 22, 2025
Author

Hi there,

I am pretty much a beginner in EEG data analysis too (I have only mostly worked with time series and language modelling), but I would say checking out Kaggle notebooks and referring to the papers that some of us have shared above are gonna give you a good start :)

AsutoshaNanda Mar 22, 2025

Hey there,

I have a little experience in EEG data analysis, as I have been working on it for my project. You can check my GitHub; I recently uploaded the introductory theory part and a Power Spectral Density (PSD) analysis, and I would love to know your opinion on it. I now plan to use the PSD data of all the patients to find the dominant frequency. What else would you all suggest for frequency feature extraction in EEG data analysis?
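
For anyone following along, one way to get from a PSD to a dominant frequency is sketched below using Welch's method from SciPy. The sampling rate, the 1-40 Hz search band, the 2-second window, and the synthetic 10 Hz test signal are assumptions made for the example, not details of the project mentioned above.

```python
import numpy as np
from scipy.signal import welch


def dominant_frequency(signal: np.ndarray, fs: float,
                       fmin: float = 1.0, fmax: float = 40.0) -> float:
    """Frequency (Hz) with the highest Welch PSD inside [fmin, fmax]."""
    # 2-second windows give a frequency resolution of 0.5 Hz.
    freqs, psd = welch(signal, fs=fs, nperseg=int(2 * fs))
    band = (freqs >= fmin) & (freqs <= fmax)
    return float(freqs[band][np.argmax(psd[band])])


fs = 250.0                                   # assumed sampling rate in Hz
t = np.arange(2500) / fs                     # 10 seconds of samples
rng = np.random.default_rng(0)
# Synthetic single-channel signal: a 10 Hz rhythm buried in white noise.
signal = np.sin(2 * np.pi * 10 * t) + 0.5 * rng.standard_normal(t.size)

peak = dominant_frequency(signal, fs)
print(f"dominant frequency: {peak:.1f} Hz")
```

Restricting the search band keeps slow drifts and line noise from being picked as the peak.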

zeydabadi · 2025-03-22T16:35:44Z

Maintainer

Given the technical complexity of the projects and accelerated project timelines (12 weeks), we require contributors submitting proposals to possess:

Demonstrated expertise in EEG signal processing

Prior hands-on experience with neurophysiological data analysis workflows

This eligibility criterion minimizes onboarding delays.

2 replies

paramrajpura Mar 24, 2025

It would be very helpful if you could comment on the following:

Does the architecture need novelty? Or can it be built upon existing ideas or a combination of them already available and open-sourced?
Is it fine to propose multiple approaches for dealing with specific challenges and then discuss and converge on one during the bonding phase?
What is the feasibility of access to GPUs during training? Please share some details.

@zeydabadi I understand that the project overview is purposefully broad, but knowing these would help build a focused proposal. Thank you!

jeevikapawar Mar 25, 2025

I was trying to submit my proposal, and there's a question asking for the size of the project. The size is set to medium, do we need to make any sort of changes to the size? Is medium the correct response to it? Thank you.

ShubhamAXS19 · 2025-03-23T17:46:19Z

Hi everyone,

I'm Shubham Vishwakarma, a recent ECE graduate from DJSCE, currently working as a Data Scientist at a private company that provides data-driven solutions to clients. I have a strong interest in the intersection of AI and Neurology and am eager to contribute to the project "Open-Source Framework for Advanced EEG Data Analysis Using Pre-trained Foundation Models."

Previously, I worked as a research intern at IIT Patna, where I focused on finding the optimal rank for fine-tuning LLMs in a federated setting and implemented a research paper using PyTorch. I have published a research paper related to Stable Diffusion and am currently working on another paper focused on the early prediction of Alzheimer's disease using EEG and LLMs. I am one of the co-authors of this paper, along with my university professor.

My work has directly tackled complex feature extraction and pattern recognition in raw EEG signals, aligning closely with the objectives of this project. I have developed a strong understanding of EEG data characteristics, preprocessing techniques, and noise handling. Additionally, my capstone project involved using ultrasound imaging for the rapid diagnosis of ACL injuries using deep learning.

@zeydabadi, could you please provide a brief overview of any pre-tasks or additional steps I should complete before submitting my application? I would really appreciate any guidance on how to best prepare for this opportunity.

Thank you!
Shubham
Email
LinkedIn

1 reply

AsutoshaNanda Apr 7, 2025

Hey @ShubhamAXS19 ,
What features have you extracted from raw EEG signals? Can you categorize them into time-domain and frequency-domain features? I am just curious to know which features you have worked on. Also, which methods did you use for noise handling? I only know of deliberately adding a Gaussian noise filter, then adjusting the noise, and finally removing the filter.
Would love to hear your thoughts.

RamK24 · 2025-03-24T20:28:23Z

Is the data labeled?

3 replies

AsutoshaNanda Mar 24, 2025

Yeah, it probably would be, as the dataset mentioned is a preprocessed dataset. In my view, the labeling will at most be based on the patient ID and the patient's file number.

RamK24 Mar 24, 2025

In that case, since Amy said "This is partly the reason why I myself am a bit reserved to consider the transformers architecture" (due to non-stationarity etc.), one suggestion would be ConvNets with multitask learning: e.g., a 1D ResNet-101 or a bigger model trained on EEG data for different tasks could effectively encode information into its parameters.

myduong-0420 Mar 24, 2025
Author

I think the data is labelled for classification

itsokaypiyush · 2025-03-27T06:09:10Z

How can Explainable AI (XAI) be integrated into EEG-based models to make predictions more interpretable for neuroscientists?

1 reply

AsutoshaNanda Mar 27, 2025

@itsokaypiyush What I know of XAI is that it is used alongside model training. One of its methods that I am using in my own model training is SHAP, which analyzes how each feature contributes to the prediction, both for time-domain features (mean, std, min, max, energy, Hurst, etc.) and for frequency-domain features (PSD, dominant frequency, etc.).
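
To make the feature list above concrete, here is a minimal NumPy sketch of a per-channel feature table of the kind an attribution method would then be applied to. The band edges are commonly used approximate values that vary between sources, the test signal is synthetic, and the Hurst exponent and the SHAP step itself are left out; all function names are hypothetical.

```python
import numpy as np


def time_features(x: np.ndarray) -> dict:
    """Simple time-domain summary statistics of a single-channel signal."""
    return {
        "mean": float(np.mean(x)),
        "std": float(np.std(x)),
        "min": float(np.min(x)),
        "max": float(np.max(x)),
        "energy": float(np.sum(x ** 2)),
    }


def band_power(x: np.ndarray, fs: float, low: float, high: float) -> float:
    """Sum of periodogram power for frequencies in [low, high) Hz."""
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    power = np.abs(np.fft.rfft(x)) ** 2 / x.size
    band = (freqs >= low) & (freqs < high)
    return float(power[band].sum())


# Approximate band edges in Hz (an assumption; definitions differ across papers).
BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}

fs = 250.0                          # assumed sampling rate in Hz
t = np.arange(1000) / fs            # 4 seconds of samples
x = np.sin(2 * np.pi * 10 * t)      # synthetic 10 Hz (alpha-range) signal

features = time_features(x)
features.update(
    {f"{name}_power": band_power(x, fs, lo, hi) for name, (lo, hi) in BANDS.items()}
)
print(features)
```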

IshaanSharma0529 · 2025-03-27T21:57:19Z

Hello everyone,

I’m Ishaan Sharma, a B.Tech student in Computer Science and Biosciences at Manipal University Jaipur. I’m passionate about integrating AI into biosciences, especially in neural signal analysis. I previously worked on an EEG-based attention detection model, where I leveraged deep learning to classify attention levels from EEG data. The project involved preprocessing EEG signals, feature extraction, and building a CNN for classification.

Beyond this, I have also worked on multiple AI-driven projects, such as: Multiple Lung Disease Detection, using deep learning to classify lung diseases; Yield Prediction from the aqueous phase of bio-oils, using machine learning and artificial neural networks (ANNs); Robustness Analysis of Pretrained CNNs and ViTs, using AdvGANs and generative AI to evaluate model vulnerability against adversarial attacks; and Cell Classification on the basis of genome expression, using the 10x Genomics dataset.

The Open-Source Framework for Advanced EEG Data Analysis Using Pre-trained Foundation Models immediately caught my interest as it aligns with my expertise and passion. I’m excited about the opportunity to contribute to this project by applying my experience in AI-driven neuro-signal analysis and learning from the incredible community behind it!

email: ishaan.23fe10csb00014@muj.manipal.edu

0 replies

maumau-syd · 2025-03-30T15:20:10Z

Hi, I'm Maureen, a 500L undergraduate student at the University of Nigeria, Nsukka.

I'm interested in the Open-Source Framework for Advanced EEG Data Analysis Using Pre-trained Foundation Models Project.
Nice meeting you all 😊

0 replies

itsokaypiyush · 2025-03-31T06:41:52Z

What kind of testing and validation strategy do we want to implement to ensure the robustness and reliability of the framework?

0 replies

Anyadike22 · 2025-04-04T23:26:52Z

I am Nnaemeka, a pharmacology major and machine learning engineer interested in applied ML/AI in healthcare. I am interested in contributing to and developing the project "Open-Source Framework for Advanced EEG Data Analysis Using Pre-trained Foundation Models".

0 replies

AbhayZ1 · 2025-04-05T20:24:12Z

This is exactly the kind of initiative the EEG research community needs right now. Open accessibility, combined with cutting-edge foundation models, can transform how we process, analyze, and interpret neural signals—especially in data-scarce clinical settings.

From my end, I’ve been actively working on seizure prediction models using EEG, where I’ve implemented CNN-LSTM architectures to learn spatial-temporal patterns from raw recordings. I've also used Butterworth band-pass filtering to clean and isolate frequency bands relevant to seizure dynamics, which has improved model performance significantly. I have been working on implementing this on the CHB-MIT dataset available on Kaggle.

I’m especially intrigued by this framework's potential to:

Incorporate pre-trained foundation models for transfer learning and cross-subject generalization, tackling one of the biggest challenges in EEG analysis.

Enable temporal anomaly detection and support real-world conditions like multi-channel synchronization and domain adaptation.

Provide support for Spiking Neural Networks (SNNs)—a powerful addition for temporal precision and energy-efficient modeling in EEG applications.

Include explainability tools critical for clinical relevance, along with modular, interpretable pipelines for flexible experimentation.

Offer streamlined model export (e.g., ONNX, TensorFlow Lite) for real-time deployment on edge devices.

While I’m continuing to deepen my understanding of advanced neural decoding techniques and SNNs, I bring hands-on experience in designing ML workflows for biosignals and evaluating models on EEG-based seizure datasets. I’m looking forward to contributing insights, testing cross-dataset generalizability, and learning from this project’s development.

This initiative has the potential to standardize and democratize advanced EEG analytics for both researchers and clinicians—excited to be part of the community around it!

0 replies

ShutingXie · 2025-04-07T08:45:35Z

Hi everyone! I'm Shuting Xie

I'm excited to join this community as part of Google Summer of Code 2025, working on the project: Open-Source Framework for Advanced EEG Data Analysis Using Pre-trained Foundation Models. I've submitted my proposal, and I would sincerely appreciate any feedback or suggestions.

About Me

MSc in Applied Computing @ University of Toronto
Vector AI Scholar | MITACS Fellow | NSERC/FCS Undergraduate Student Research
Research focus: AI for Healthcare and Biology

I'm currently working as an ML Researcher at the Krembil Brain Institute, University Health Network, where I:

Developed U-Net-based deep learning models to segment the human brain
Addressed class imbalance using Tversky and focal loss
Built automated pipelines and visualization tools for analyzing structural brain regions

This experience sparked my passion for building generalizable neural representations from biomedical signals like EEG, which aligns closely with this GSoC project.

Skills & Tools

ML/DL: PyTorch, TensorFlow, HuggingFace, JAX, LLMs
Biomedical: MNE-Python, Braindecode, FSL, FreeSurfer, MONAI, OpenCV
Infra: HPC, CUDA, SLURM, Docker, AWS, Linux
Languages: Python, Java, C, R, Bash
Data: EEG, MRI, Genomics

Thoughts on the EEG Foundation Model Project

This project aims to build an open-source EEG foundation model trained with self-supervised learning on large-scale unlabeled EEG data. I’ve compiled a summary table comparing key properties of existing EEG foundation models, please check the link:
EEG Foundation Model Summary (Google Drive)

My draft for the key components of the EEG foundation model includes:

Hybrid objectives: Masked Autoencoder + Autoregressive Prediction
EEG-to-token discretization + LLM integration
Instruction tuning for multi-task EEG understanding
Open-source code, models, and evaluation across downstream tasks

Questions

I am considering building a vector quantization module inspired by NeuroLM. What are some effective approaches for aligning EEG tokens with text embeddings when integrating a vector quantization module into GPT-style models? Are there any recommended strategies for bridging the gap between continuous EEG representations and discrete language model embedding spaces?
In terms of downstream applications, I wonder which EEG-related downstream tasks are considered most impactful or underexplored for foundation model support? For example, how promising are tasks like sleep staging, seizure detection, or real-time neurofeedback in driving future research directions?
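
For readers less familiar with the term, the core lookup step of vector quantization (turning continuous EEG patch embeddings into discrete token IDs) can be sketched as below. This is only the nearest-neighbour assignment with a random, untrained codebook; the sizes and names are hypothetical, it is not NeuroLM's implementation, and it does not address the alignment with text embeddings that the question asks about.

```python
from typing import Tuple

import numpy as np


def quantize(embeddings: np.ndarray, codebook: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    """Map each embedding (n, d) to its nearest codebook row (k, d).

    Returns (token_ids, quantized_vectors).
    """
    # Squared Euclidean distance between every embedding and every code: (n, k).
    dists = ((embeddings[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    token_ids = dists.argmin(axis=1)
    return token_ids, codebook[token_ids]


rng = np.random.default_rng(0)
codebook = rng.standard_normal((8, 16))   # 8 hypothetical code vectors of dimension 16

# Toy "EEG patch embeddings": slightly noisy copies of codes 3, 0 and 5.
embeddings = codebook[[3, 0, 5]] + 0.01 * rng.standard_normal((3, 16))

token_ids, quantized = quantize(embeddings, codebook)
print(token_ids)
```

In a trained module the codebook is learned jointly with the encoder, and the resulting token IDs are what a GPT-style model would consume.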

Contact

Shuting Xie
🌐 GitHub: shutingxie
📧 Email: rowan.xie1011@gmail.com | shuting.xie@utoronto.ca
🔗 LinkedIn: shuting-xie-793997226

2 replies

AsutoshaNanda Apr 7, 2025

Hey @ShutingXie ,
What future scope do you see that can be integrated with the EEG signal framework?

ShutingXie Apr 7, 2025

My thoughts on future scope that can be integrated with the EEG signal framework are as follows:

Seamless Fusion: Combine EEG signals with pre-trained language models (e.g., GPT) using vector quantization and EEG-to-token techniques to bridge the gap between continuous EEG representations and discrete language embeddings.
Multi-Task Learning: Leverage both Masked Autoencoder and autoregressive prediction techniques to capture different features and structures within EEG signals, ensuring a comprehensive multi-layer information capture.
LLM Integration: Integrate with an LLM framework and leverage an instruction tuning strategy:
- Guided Learning: By incorporating explicit task instructions during pre-training, the model learns to differentiate between various task requirements, enabling it to rapidly adapt to different EEG-related applications.
- Transfer Learning: Instruction tuning boosts the model's efficiency in transferring from unsupervised pre-training to supervised tasks, which is especially beneficial for scenarios with limited labeled data.
- Robustness Enhancement: Utilizing multiple objectives can alleviate the biases of a single loss function, thus enhancing the model's robustness and generalization across various downstream tasks, including sleep staging and seizure detection.
Downstream Clinical Applications: Focus on downstream applications such as sleep staging, seizure detection, and real-time neurofeedback to drive advances in clinical diagnostics and personalized medicine.
Open-Source and Benchmarking: Open-source code, share models, and establish evaluation benchmarks to accelerate research and community collaboration.

12natanael · 2025-04-07T10:19:40Z

Hello Everyone,
My name is NDJEBAYI PATRICK NATANAEL, a student passionate about Artificial Intelligence, Cybersecurity, and open-source applications in healthcare. I am very interested in the GSoC project titled "Open-source Framework for Advanced EEG Data Analysis using Pre-trained Foundation Models", and I would love to contribute to its development.

At 22 years old, I am actively learning machine learning while also working on hands-on projects and teaching support classes. My goal is to become a professional in AI applied to cybersecurity and healthcare. This project would be a great opportunity to strengthen both my practical and scientific skills.

Project Summary (as I understand it):

This project aims to build an open-source framework for advanced analysis of EEG signals using pre-trained foundation models. The idea is to leverage publicly available EEG datasets to train a deep model capable of automatically extracting reusable representations for various downstream tasks such as brain signal classification or anomaly detection.

The project includes signal preprocessing, feature extraction, development of a deep model (such as an autoencoder or transformer), and integration into a usable open-source package.

Expected Contributions:

- Develop a complete EEG signal processing pipeline.
- Implement feature extraction algorithms (FFT, STFT, spectrograms).
- Pre-train a foundation model (autoencoder or transformer) on open EEG datasets.
- Integrate the model into a ready-to-use open-source package.
- Create example notebooks and write clear documentation.
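
As a rough illustration of the feature extraction item above, the sketch below computes FFT band powers and a simple STFT magnitude spectrogram with NumPy. The band edges, window length, and hop size are assumptions chosen for the example (band boundaries in particular vary between papers), and the input is a synthetic sine wave, not real EEG.

```python
import numpy as np

# Conventional band edges in Hz; exact boundaries vary between papers.
BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}

def band_powers(x, fs):
    """Return mean spectral power per band for a 1-D signal x sampled at fs Hz."""
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    power = np.abs(np.fft.rfft(x * np.hanning(len(x)))) ** 2
    return {name: power[(freqs >= lo) & (freqs < hi)].mean()
            for name, (lo, hi) in BANDS.items()}

def spectrogram(x, fs, win=256, hop=128):
    """Simple STFT magnitude: rows are time frames, columns are frequency bins."""
    window = np.hanning(win)
    frames = [x[i:i + win] * window for i in range(0, len(x) - win + 1, hop)]
    return np.abs(np.fft.rfft(np.array(frames), axis=1))

fs = 256
t = np.arange(4 * fs) / fs
x = np.sin(2 * np.pi * 10 * t)  # synthetic 10 Hz oscillation in the alpha range
powers = band_powers(x, fs)
spec = spectrogram(x, fs)
```

On this synthetic input the alpha band should carry the most power, and each spectrogram row should peak at the 10 Hz bin.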

Technical Skills:

- Languages: Python (advanced), Bash (intermediate)
- AI/ML: PyTorch, TensorFlow, Scikit-learn, Pandas, NumPy
- Bio-signal processing: Basic knowledge of EEG signal processing (band-pass filters, ICA)
- Git/GitHub: Confident with open-source collaboration and Git workflows

Even though I’m still building my experience, I am highly motivated, a fast learner, and truly excited about this opportunity.
I would be honored to contribute to your project and would be happy to discuss my approach to ensure it aligns with your expectations.

Thank you for your time and for working on such an exciting and impactful topic!

Best regards,
NDJEBAYI Patrick Natanael
📧 Email: natanaelndjebayi@gmail.com
🔗 LinkedIn: [Natanael NDJEBAYI]

4 replies

AsutoshaNanda Apr 7, 2025

Hey @12natanael ,
For band-pass filtering, which type of filter are you thinking of using? Notch filters, or high-pass/low-pass filters with customised cutoff ranges?

12natanael Apr 8, 2025

I plan to use a standard band-pass filter, for example between 1 Hz and 50 Hz, to preserve useful brain waves (delta, theta, etc.). In addition, I will include a notch filter at 50 Hz to remove electrical noise caused by the power supply. The filter will likely be a Butterworth or FIR design, with customizable cutoff frequencies and order to experiment with what works best.
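
A minimal sketch of the filtering described above, assuming SciPy is available. The function name `clean_eeg`, the filter order, and the notch quality factor are hypothetical choices for illustration; the 1 to 50 Hz band and the 50 Hz notch come from the reply. Note that 50 Hz mains applies to some regions only (others use 60 Hz), and that here the notch sits right at the upper band edge.

```python
import numpy as np
from scipy import signal

def clean_eeg(x, fs, band=(1.0, 50.0), notch_hz=50.0, order=4, notch_q=30.0):
    """Zero-phase Butterworth band-pass followed by a notch at the mains frequency."""
    # Second-order sections are numerically safer than (b, a) for band-pass designs.
    sos = signal.butter(order, band, btype="bandpass", fs=fs, output="sos")
    x = signal.sosfiltfilt(sos, x)
    # Narrow IIR notch centred on the power-line frequency.
    b, a = signal.iirnotch(notch_hz, notch_q, fs=fs)
    return signal.filtfilt(b, a, x)

# Synthetic test signal: a 10 Hz rhythm plus 50 Hz power-line interference.
fs = 250.0
t = np.arange(2500) / fs
raw = np.sin(2 * np.pi * 10 * t) + np.sin(2 * np.pi * 50 * t)
clean = clean_eeg(raw, fs)
```

Forward-backward filtering (`sosfiltfilt`, `filtfilt`) avoids phase distortion, which matters if later analysis depends on timing; it is only suitable for offline processing.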

AsutoshaNanda Apr 8, 2025

Oh yeah, I know about FIR filters, and it is fine to separate the signals into alpha, delta, and theta waves based on their frequency ranges, but I believe all of this comes under preprocessing and cleaning of data. But we are working on pre-trained data? What do you think of that?

12natanael Apr 9, 2025

Yes, indeed… since we are working with pre-trained data, I believe that this cleaning or preprocessing (like FIR filtering, etc.) has already been done. That said, I think it's still important to understand how this preprocessing was carried out, especially if we want to adapt or fine-tune the model on new raw EEG data.

AmirAM03 · 2025-04-07T14:14:35Z

AmirAM03
Apr 7, 2025

Hi everyone
I'm AmirMohammad Ahmadi, a graduate Computer Science student.
I have about 2 years of experience in Computational Neuroscience research programs and tasks.
I also have a background in software development and programming.

Technical Qualifications:

Programming & Tools:
- Python:
  - 4+ years of advanced Python programming for machine learning and signal processing.
  - Libraries: PyTorch (model architecture design, custom layers), NumPy (large-scale EEG data manipulation), Pandas (dataset preprocessing), Scikit-learn (feature engineering), MNE-Python (EEG signal filtering, ICA, epoch extraction).
- MATLAB: 1 year of EEG signal preprocessing (wavelet transforms, spectral analysis) and toolbox integration (EEGLAB).
- C/C++: Low-level memory optimization for signal processing algorithms (FFT implementations).
- Java: 2 years of multi-threaded application development (GUI tools for data annotation).
Machine Learning & Signal Processing:
- Deep Learning: Built and trained PyTorch-based models (Transformers, CNNs, RNNs) for EEG tasks:
  - Seizure prediction (CHB-MIT dataset: 88% precision).
  - Sleep stage classification (Sleep-EDF: 85% F1-score).
- Self-Supervised Learning: Implemented contrastive learning (SimCLR) for synthetic EEG data to classify artifacts.
- Feature Engineering: Expertise in time-frequency analysis (Morlet wavelets), artifact removal (ICA), and noise reduction (Kalman filtering).
Neuroscience & Data Experience:
- EEG Signal Processing: Processed 10,000+ hours of raw EEG data across 4 datasets (TUH EEG Corpus, BCI Competition IV).
- Research Lab Work: Developed a Transformer-CNN model to predict ADHD biomarkers from EEG at Sharif University’s Computational Neuroscience Lab.
Certifications:
- I have earned various course-completion and award certificates in relevant sub-topics.

Alignment with Project Requirements:

Self-Supervised Learning: Proven ability to design masking-based pretraining workflows (Neuromatch projects).
EEG Foundation Models: Prior work on hybrid architectures (Transformer + CNN) for multi-task EEG analysis.
Open-Source: Active GitHub contributions and reproducible code documentation.

Deliverables Commitment:

Code, model weights, and tutorials will be released on GitHub under MIT License.
Benchmarks against SOTA models (e.g., EEG-BERT) will be published with metrics (accuracy, inference latency).

Availability:

Full-time (35h/week) with no absences.

Project Contributions Ideas

Architecture Design: Propose a multi-scale Transformer with CNN encoders to capture local/global EEG features.
Pretraining Pipeline: Implementing self-supervised objectives (e.g., masked signal prediction) on TUH EEG Corpus and BCI Competition IV datasets.
Benchmarking: Evaluation on 3+ downstream tasks (seizure detection, sleep staging) against SOTA models like EEG-BERT.
Documentation: I can help create tutorials for model fine-tuning and deploy interactive demos via Google Colab or another relevant environment.
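
For readers unfamiliar with the masked signal prediction objective mentioned above, here is a minimal NumPy sketch of the masking step and the loss. It is not the proposed architecture: the helper names, patch length, and mask ratio are assumptions, random noise stands in for EEG, and the all-zero "prediction" is only a placeholder for a real model's output.

```python
import numpy as np

def mask_patches(x, patch_len, mask_ratio, rng):
    """Zero out a random subset of non-overlapping patches of a 1-D signal.

    Returns the patched, masked copy and a boolean array marking masked patches.
    """
    n_patches = len(x) // patch_len
    patches = x[: n_patches * patch_len].reshape(n_patches, patch_len).copy()
    n_masked = int(round(mask_ratio * n_patches))
    masked = np.zeros(n_patches, dtype=bool)
    masked[rng.choice(n_patches, size=n_masked, replace=False)] = True
    patches[masked] = 0.0
    return patches, masked

def masked_mse(pred, target, masked):
    """Reconstruction error computed only on the masked patches."""
    return float(np.mean((pred[masked] - target[masked]) ** 2))

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)          # stand-in for one EEG channel
inputs, masked = mask_patches(x, patch_len=50, mask_ratio=0.5, rng=rng)
target = x.reshape(20, 50)
# A model would predict the masked patches from the visible ones; here the
# masked input itself stands in as a trivial all-zero "prediction".
loss = masked_mse(inputs, target, masked)
```

Scoring only the masked patches forces the model to infer missing segments from context instead of copying its input.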

Project Timeline (Full-Time: 35h/week)

Phase	Activities
Community Bonding	Refine architecture with mentors; Standardize EEG dataset preprocessing.
Weeks 1–4	Build scalable data pipeline (PyTorch Dataset/DataLoader).
Weeks 5–8	Develop Transformer-CNN backbone; Implement masking pretraining.
Weeks 9–12	Pretrain on TUH EEG (10,000h); Optimize memory efficiency (gradient checkpointing).
Weeks 13–16	Fine-tune on seizure/sleep staging tasks; Benchmark against EEG-BERT.
Weeks 17–18	Write documentation; Release model weights on Hugging Face Hub.

Email: 0amam3@gmail.com
LinkedIn: https://www.linkedin.com/in/0am3/

0 replies

Dhandeep10 · 2025-04-08T11:03:27Z

Dhandeep10
Apr 8, 2025

Motivation
As a computer vision enthusiast deeply passionate about using AI in medicine, I’m eager to contribute to this impactful EEG project. Its fusion of healthcare, open-source, and ML aligns perfectly with my goals.

Relevant Background
Completed the Deep Learning Specialization and Machine Learning Specialization by DeepLearning.AI.
Skilled in Python, NumPy, Pandas, and PyTorch.
Background in signal processing and time-series data via academic projects.
Currently exploring neuroimaging and brain signal decoding using deep learning.

Understanding of the Project
This project focuses on building a comprehensive, user-friendly framework for EEG data processing, enabling researchers and clinicians to preprocess, analyze, and visualize brainwave data. I’m particularly interested in contributing to signal preprocessing, artifact removal, and building DL-based classification modules.

Proposed Contributions
Implement modular preprocessing pipelines (filtering, normalization, ICA, etc.)
Help develop DL models for cognitive state classification
Create clean visualizations and interactive dashboards (Plotly/Streamlit)
Write clean documentation and example notebooks

Timeline
I can commit 10–12 hours/week.
Week 1–2: Study codebase, suggest improvements
Week 3–4: Start contributing to preprocessing & data modules
Week 5+: Model integration, testing, documentation

Best Regards,
Dhandeep Singh
dd11singh10@gmail.com

0 replies

SahithiMadas · 2025-04-08T14:46:50Z

SahithiMadas
Apr 8, 2025

Hi, I'm Sahithi Madas, a graduate student in Data Science at SUNY Albany and a Data Engineer Intern at Albany County Local Government, NY. I've been exploring for a while now, looking for projects where AI and data analytics meet the healthcare space. Your project on "Adaptive Closed-loop Neuromodulation" immediately grabbed my attention. The idea of using reinforcement learning in a closed-loop system to personalize brain stimulation, especially for Parkinson's disease, really resonates with me.

I've worked on public health-related data analysis with the New York State Department of Health as a Data Analyst Intern and built dashboards and machine learning models that involved large-scale health and environmental datasets. Additionally, I've developed predictive models for "Disaster Risk" and "Resale Car Price Predictor" and have experience in Python, PyTorch, and machine learning libraries by building clean ML pipelines. While I haven't worked directly with EEG data yet, I've done real-time signal processing using OpenCV, and I'm confident in my ability to apply similar principles here.

I'm particularly excited by the idea of contributing to the data preprocessing and model training stages in reinforcement learning (RL). This project feels like a meaningful step in bringing AI and healthcare together, something I have always talked about and am deeply passionate about contributing to.

Best,
Sahithi Madas
LinkedIn: https://www.linkedin.com/in/sahithi-madas-24s/

0 replies

aarchisave · 2025-04-08T15:09:04Z

aarchisave
Apr 8, 2025

Hi everyone! 👋
I'm Aarchi Save, a Computer Science and Engineering (Data Science) undergrad with a keen interest in biomedical applications of AI.

I’ve been exploring the “Open-Source Framework for Advanced EEG Data Analysis Using Pre-trained Foundation Models” project and I’m really excited about its potential. I’ve worked on AI/ML-based mental health tools in hackathons and have experience with Python, PyTorch, and NLP models.

I’m currently preparing my GSoC proposal and would love any guidance on:
– Preferred EEG datasets for this project
– Suggestions on model architectures or pretraining strategies you’d like explored
– Any recent progress or open discussions I should be aware of

Looking forward to learning from and contributing to this incredible project! 🙌

Best,
Aarchi
[Email: saveaarchi19@gmail.com]
[GitHub: https://github.com/aarchisave ]
[LinkedIn: www.linkedin.com/in/aarchi-save-3b0ba5281 ]

0 replies

tutuponnekanty · 2025-05-01T19:37:22Z

tutuponnekanty
May 1, 2025

Hi, I’m P. Y. Rajkamal Tutu, an M.Tech Artificial Intelligence student at NIT Silchar, also pursuing a B.S. in Data Science and Applications from IIT Madras in parallel. I’m passionate about brain-computer interfaces, cognitive modeling, and the intersection of AI, neuroscience, and generative modeling. I’ve previously worked on explainable AI for Indic-language spam classification and a few other Kaggle projects.

I enjoy working with LLMs, visual-language models, and models that combine structure and reasoning in biological contexts. I believe GSoC 2025 is an ideal opportunity for me to contribute to high-impact open-source research, collaborate with domain experts, and grow technically and intellectually through mentorship and community engagement.

E-mail: tutuponnekanty@gmail.com
GitHub: https://github.com/tutuponnekanty
LinkedIn: https://linkedin.com/in/pyrkt007

Thank you!

0 replies

IDev11 · 2025-05-03T18:24:13Z

IDev11
May 3, 2025

Hello Everyone,

My name is Abdeldjalil Lamara, and I’m currently pursuing a Master’s in Bioinformatics with a strong foundation in data engineering and AI. I’ve been deeply involved in projects that combine data processing, machine learning, and real-world impact, which is why I’m particularly excited about Project 2: Open-Source Framework for Advanced EEG Data Analysis Using Pre-trained Foundation Models.

I’ve worked with Python, PyTorch, and MNE for EEG data exploration and am especially interested in the intersection of neuroscience and deep learning. The opportunity to contribute to a foundation-model-based EEG framework aligns perfectly with my skills and aspirations.

I’d love to get involved whether it’s through early contributions, discussions, or exploring relevant repositories. Could you please share how I can begin engaging with the project or if there are any resources you'd recommend reviewing?

Looking forward to learning from and contributing to the community.

Best regards,
Abdeldjalil Lamara
Github Account
LinkedIn Account
email: Abdeldjalil.lamara@etu.usthb.dz

0 replies

AmaanArif25 · 2025-05-09T21:10:21Z

AmaanArif25
May 9, 2025

Hello Everyone,

My name is Amaan Arif, and I am an undergraduate student in Bioinformatics with a growing interest in AI and machine learning, particularly in the application of these technologies to biological data analysis. I’ve worked with Python and have a foundational understanding of data engineering, machine learning, and deep learning algorithms.

I am excited about the opportunity to contribute to the Open-Source Framework for Advanced EEG Data Analysis Using Pre-trained Foundation Models project, as it aligns perfectly with my academic background and passion for neuroscience and AI.

I’m eager to get involved, whether through discussions, code contributions, or by exploring relevant research and repositories. I would appreciate any guidance on how I can begin engaging with the project.

Looking forward to collaborating and learning from the community.

Best regards,
Amaan Arif

0 replies

Replies: 24 comments · 19 replies
