Google Summer of Code 2025: Cloudcasting ML Discussion Thread #25
Replies: 24 comments 20 replies
-
|
Hello @dfulu I have opened an issue regarding training models on CPU-only machines. I wanted to ask whether GPU acceleration is strictly necessary for training, or if the models can be trained efficiently on a CPU as well. Are there any recommended optimizations or modifications to make training feasible on CPU-only systems? |
-
|
Hey @dfulu, My name is Syed Mahmood, and I am a CSE-DS professional with a strong background in machine learning. The Cloudcasting ML project really interests me, and I would love to contribute to it as part of GSoC '25. I wanted to ask if there are any subject-specific resources or key areas I should focus on to better understand this project and align my contributions effectively. I have experience working with PyTorch and ML model development, so I’d love to know if there are any particular frameworks, datasets, or methodologies I should get familiar with beforehand. Looking forward to your guidance! |
-
|
Hi James, I recently came across Open Climate Fix and wanted to give a huge round of applause for building much-needed technology to tackle climate change. I have previously worked with image and video data, but mainly medical data. Before I commit, I wanted to know whether subject expertise such as GIS or climate science is needed. You also mentioned that this project is in its early stages, but is there any similar published work out there? The repo (and the other linked repos) you shared don't have any detailed information on the models or results. Thanks! |
-
|
Hi James, I'm Vinay Palakurthy. I hope you are doing great! I've gone through Open Climate Fix's website and read the story of OCF. I was happy to read about the origin of Open Climate Fix, which develops AI-driven solutions to improve efficiency in the energy sector and reduce greenhouse gas emissions. I was particularly impressed by the 5% accuracy improvement in UK solar generation forecasts from AI cloud forecasting, which outperformed both UK and European Met Office weather services when measuring the impact on short-term solar generation forecasting. It's amazing what Open Climate Fix has achieved in 6 years! I'm excited for Cloudcasting to go live in summer 2025. I'm eager to contribute to the Cloudcasting project, especially given my background in data science and my expertise in time series forecasting. I believe my skills and knowledge in these areas could be valuable to your team. I have a couple of questions for you, if you don't mind answering:
Best, |
-
|
Hey @dfulu First off, thanks for making this discussion space available! I’ve been going through the repo and had a few questions:
|
This comment was marked as off-topic.
-
|
Hi @dfulu, I am a 3rd-year student, pursuing an Integrated M.Tech in Mathematics and Computing. I have a strong background in AI/ML and GPU computing, and I am very interested in contributing to this project. |
-
|
Hello @dfulu, I would love to be part of the Cloudcasting ML project. Because of my skills in Python, PyTorch, and ML, as well as my working knowledge of Generative Adversarial Networks, I am certain I can contribute positively to the satellite forecast model. I've been digging through the project description and related discussions, and I'm intrigued by the potential to explore different video prediction models like ConvLSTM or Temporal Convolutional Networks (TCNs), and AI weather models. Plus, integrating diffusion models could be a game-changer for reducing blurriness in the forecasts. To get a better sense of the project's technical scope and challenges, I had a few questions:
Model Architecture: Have you explored using U-Net or ResNet architectures as the generator in a GAN framework for satellite image prediction? How do these models perform compared to simpler architectures?
Evaluation Metrics: Are you using metrics like Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index Measure (SSIM), or Mean Squared Error (MSE) to evaluate the model's performance? Are there any specific benchmarks or baselines you're comparing against?
Scalability and Deployment: Do you have plans to deploy the model using tools like Docker and Kubernetes for scalability and cost efficiency? How are you handling hyperparameter tuning and model serving in a production environment?
I'm really looking forward to contributing to this project and learning more about your approach. Thanks for your time! |
-
|
Hi, my name is Catherine. I am from China and am currently a second-year master's student in the Cognitive Science Lab. My research focuses on deep learning models for EEG signals, and most of my work is based on PyTorch. I am familiar with Python, Transformers, TCN, and LSTM (I have replicated these models as baselines for algorithm comparison). I am very interested in the Cloudcasting ML project, and I would like to contribute to it as part of GSoC '25. I would like to know if there are any specific frameworks, datasets, or methods that I should be familiar with in advance. Since I am currently doing related research, I think I can run experiments on a local GPU unless the amount of data reaches LLM scale. In the sat_pred repository, I saw that train.py calls a pre-trained model. This project is mainly based on video prediction, and I think such practical work can help me gain more experience before graduation. In GSoC 2025, will this task focus on improving the model? Will more data be involved? Looking forward to your reply. I am very eager to join you and contribute to this great work, thank you. |
-
|
Hi @dfulu and team, This project really caught my attention! The idea of improving short-term satellite image prediction to enhance solar forecasting sounds both challenging and impactful. I have a solid background in machine learning, PyTorch, and Python, and I'm especially interested in applying deep learning to real-world environmental challenges. I'm curious—are you leaning towards any specific architectures for the next iteration? For example, ConvLSTMs, transformers, or diffusion models, as you mentioned? Also, will the evaluation focus more on visual quality (reducing blur) or on downstream tasks like solar forecast accuracy? Excited to learn more about the project and how I can contribute! |
-
|
Hi James,
Thank you for the clarification! I am very interested in the *Cloudcasting
ML* project and confident that my AI/ML and PyTorch experience will allow
me to contribute effectively.
I’m excited to work on improving simVP and exploring alternative
spatiotemporal models. Would it be possible to share a draft of my GSoC
proposal with you early on for feedback?
Thanks again, and I look forward to collaborating with the OCF team!
Best regards,
Safal Singh
On Sat, Mar 22, 2025 at 1:19 AM James Fulton wrote:
Hi @CaiRuinhan <https://github.com/CaiRuinhan>, happy to hear you are
interested in the project.
You are right that this project will focus on trying to beat our current
best performing model (simVP). That could be by changing the model
architecture or by finding a more promising model from the literature. In
terms of frameworks, good knowledge of PyTorch is required and knowing
pytorch-lightning would be useful, but Lightning would be fairly easy to
pick up. We have our own custom dataset we are working with, so there is no
need for prior knowledge of it. And we will be creating and training
spatiotemporal models, so some knowledge of those would be useful.
The full dataset is 1TB of data, so not much compared to state-of-the-art
LLMs. But since this is a compute-intensive project, we will likely set up
the student on OCF's internal compute server.
The train.py script doesn't always use a pretrained model; that is only an
option. We have been training models from scratch most of the time, and the
option to start from a pretrained model was added so we could fine-tune
models we had already trained.
|
-
|
Hello @dfulu I've been exploring various architectures for this task, and I believe there's potential to enhance the current model by implementing a Hybrid Diffusion-Transformer Model. The idea is to: I'm enthusiastic about contributing to this project, and I would love to discuss my ideas further with you. Additionally, I would appreciate your guidance on how I can effectively contribute to the project. Are there specific areas or issues you would recommend I start with? Also, would you be open to me implementing the Hybrid Diffusion-Transformer architecture to compare its performance with the existing models? |
-
|
Hello @dfulu and everyone, I’m Ashish Kumar, a second-year undergraduate student at IIT Kharagpur with a strong interest in Machine Learning and Deep Learning. I’ve been exploring the Cloudcasting ML project and find the idea of improving satellite image forecasting with AI really exciting. I’ve worked on ML projects involving PyTorch, computer vision, and time-series forecasting, and I’m particularly interested in exploring a Hybrid Diffusion-Transformer Model for this project. My idea is to:
Use Diffusion Models to improve image clarity and reduce blurriness in satellite predictions.
Integrate a Transformer-based approach (such as Swin Transformer) to better capture short- and long-term dependencies in satellite image sequences.
Utilize Perceptual Loss functions along with MSE/MAE to enhance visual quality.
I’d love to discuss my ideas further and get your insights on how I can contribute effectively. Are there specific areas you’d suggest I start with? Also, would it be valuable to experiment with the Hybrid Diffusion-Transformer architecture to compare its performance with existing models? Looking forward to learning and contributing! |
-
|
Hello @dfulu , I'm a graduate student pursuing my MS in Applied Machine Learning at UMD. I'm very interested in the Cloudcasting ML project for GSoC 2025 and believe my background makes me a strong candidate. My research experience includes extensive work with U-Net architectures for remote sensing applications, where I analyzed 60+ research papers on U-Net variants for satellite imagery. I also have experience with PyTorch, TensorFlow, and model optimization techniques - I recently compressed a BERT model by 80% while maintaining 91.5% of its performance. After reviewing the repo and discussions, I have a few questions: I see SimVP is currently your best performing model. Would you be interested in exploring modifications to address its shortcomings, particularly around the blurriness issue? I'm curious about potentially implementing a hybrid approach that maintains SimVP's strengths while incorporating diffusion techniques for sharper predictions. You mentioned not having fully explored papers that follow and improve upon SimVP. Are there specific aspects of these follow-up architectures you'd like to prioritize investigating? For evaluation, you mentioned MAE as the primary metric since direct PV forecast accuracy measurement is computationally intensive. Are there any other proxy metrics you've found correlate well with downstream PV forecast improvements? I'm excited about the potential to contribute to this project and appreciate your time in answering these questions. |
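For readers following the evaluation question above, here is a minimal sketch of what a per-lead-time MAE could look like. It is not taken from the OCF repositories: the function name `mae_per_step` and the nested-list frame layout are assumptions made for illustration, and a real pipeline would more likely operate on PyTorch tensors.

```python
def mae_per_step(pred, target):
    """Mean absolute error for each forecast step.

    Hypothetical layout: `pred` and `target` are lists of frames,
    one frame per forecast step; each frame is a list of pixel rows.
    """
    scores = []
    for pred_frame, target_frame in zip(pred, target):
        # Flatten the frame into a list of per-pixel absolute errors.
        abs_errors = [
            abs(p - t)
            for pred_row, target_row in zip(pred_frame, target_frame)
            for p, t in zip(pred_row, target_row)
        ]
        scores.append(sum(abs_errors) / len(abs_errors))
    return scores


# Toy 2x2 frames for two forecast steps (illustrative values only).
target = [[[0.0, 1.0], [1.0, 0.0]], [[1.0, 1.0], [0.0, 0.0]]]
pred = [[[0.0, 1.0], [1.0, 0.0]], [[0.5, 0.5], [0.5, 0.5]]]

step_mae = mae_per_step(pred, target)
print(step_mae)
```

Reporting the error per step rather than as a single average makes it visible how quickly a forecast degrades with lead time.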
-
|
Hey @dfulu and everyone, I am Sathvik, a Data Scientist. Enjoying reading through the thread. One thing that caught my attention was the curiosity around whether improving the sharpness of predictions (with diffusion models or otherwise) actually leads to better solar forecast performance. That got me thinking, maybe we can test that connection on a smaller scale? Like comparing visual metrics like SSIM or perceptual loss with downstream PV accuracy. During my time working with satellite imagery in NASA’s Transform to Open Science workshop, I came across a similar challenge. Our models generated visually stunning results, but they didn’t always translate to better outcomes for the downstream task. That experience made this kind of trade-off really stick in my mind. Open to any feedback in the thread. Let’s build something awesome together :) Cheers, |
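One way the small-scale test suggested above could be set up is a rank correlation between a visual metric and downstream PV error across several candidate models. The sketch below is an assumption-laden illustration, not something from the project: `spearman` is a hand-rolled helper that ignores ties, and the numbers are placeholders invented purely to exercise the function, not measured results.

```python
def ranks(values):
    """Rank positions (0 = smallest). Assumes there are no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order):
        result[index] = rank
    return result


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Placeholder values for four hypothetical models (NOT real results).
ssim_scores = [0.60, 0.70, 0.80, 0.90]
pv_errors = [5.0, 4.0, 4.5, 3.0]

rho = spearman(ssim_scores, pv_errors)
print(round(rho, 3))
```

A strongly negative value would suggest that sharper-looking forecasts tend to go with lower PV error; a value near zero would suggest the visual metric is a poor proxy. With only a handful of models, either reading would be tentative.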
-
|
Hi @dfulu and the Open Climate Fix community, My interest in applying technology to environmental challenges, specifically wildfires, grew significantly due to the severe events we face annually in Bolivia. This concern led me to get more involved starting in 2022. In 2023, I developed an early wildfire detection model which I managed to deploy on a small scale. While it was a helpful contribution, it also made me realize the effectiveness of such solutions is tied to complex temporal, social, and political factors. Unfortunately, the 2024 wildfires in my region were even more devastating, burning over 10 million hectares and blanketing cities in smoke for weeks. This spurred a group of friends and me to seek more proactive ways to help. We decided to focus on mass reforestation using drones. We developed a platform and a small ML model that analyzes satellite imagery to identify soils with a higher probability of success for the germination of pelletized seeds. We had the chance to present this solution at a local smart city-themed hackathon, which we won, aiming to raise awareness of such alternatives among relevant authorities. Early in 2025, we connected with the organization "Bosque Vidas," who shared valuable insights into the real-world challenges of bringing such projects into production and other issues surrounding fire management. Researching further into the connection between wildfires, climate change, and technology led me to discover Open Climate Fix and the GSoC program. I admire OCF's focus on open-source solutions to combat climate change and I'm excited by the possibility of contributing my skills in ML (PyTorch, Computer Vision with satellite data) to this innovative project. 
Considering that SimVP is the current best-performing model, and a key objective is tackling the blurriness issue (potentially with diffusion or transformers), while the ultimate measure of success is the downstream PV forecast improvement (which is computationally expensive to evaluate frequently), how does OCF/the mentor envision guiding the GSoC contributor? Specifically, how will the project balance exploring approaches that directly improve proxy metrics (like MAE, SSIM, visual sharpness) versus potentially different approaches that might initially score lower on these proxies but could better capture the underlying physical phenomena relevant to solar irradiance, thus potentially leading to a greater final impact on the PV forecast? Thank you very much for creating this space and for your time. I look forward to the possibility of collaborating! |
-
|
Hi, @dfulu I’ve been exploring architecture ideas for improving the Cloudcasting model and wanted to share two promising directions grounded in recent research. Would love your thoughts on which aligns better with OCF’s goals! Option 1: SimVP++-FNO Hybrid
Pros: Retains SimVP’s efficiency, adds physical consistency via FNO, T4-friendly. Option 2: Optimized Diffusion
Pros: Superior visual fidelity, sharper outputs. Some clarifying questions
|
-
|
Hi @dfulu, I am Vishwajit Sarnobat, currently working as an AI-ML intern at ISRO (Indian Space Research Organisation) and pursuing my B.Tech. My partner and I are working on a very similar problem, utilizing precipitation satellite imagery available at 30-minute intervals to predict the next six frames here at ISRO. ISRO currently employs Pysteps (which uses the Optical Flow or Lucas-Kanade method) for precipitation prediction over the Indian Subcontinent. However, it offers just enough accuracy, and previous neural network models implemented by other interns produce blurry predictions (to reduce the metrics, models smooth out the predictions over pixels, which is not practically helpful). |
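The smoothing effect described above can be illustrated with a toy example. This is a sketch under simple assumptions, not a description of any model in the thread: a 1-D "image" has a cloud edge that is equally likely to end up in one of two positions, and we compare the expected MSE of committing to either sharp outcome against predicting their pixel-wise average.

```python
def mse(pred, target):
    """Mean squared error between two equally sized pixel lists."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


# Two equally likely futures: the cloud edge sits at index 2 or index 4.
future_a = [1.0, 1.0, 0.0, 0.0, 0.0, 0.0]
future_b = [1.0, 1.0, 1.0, 1.0, 0.0, 0.0]
futures = [future_a, future_b]


def expected_mse(pred):
    """Average MSE of one prediction over the possible futures."""
    return sum(mse(pred, future) for future in futures) / len(futures)


# The pixel-wise mean is a blurred edge: [1, 1, 0.5, 0.5, 0, 0].
blurry_mean = [(a + b) / 2 for a, b in zip(future_a, future_b)]

expected = {
    "sharp_a": expected_mse(future_a),
    "sharp_b": expected_mse(future_b),
    "blurry_mean": expected_mse(blurry_mean),
}
print(expected)
```

The blurred average gets half the expected MSE of either sharp guess, even though it matches neither possible outcome. That is the incentive a pixel-wise loss creates when the future is uncertain.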
-
|
Hello @dfulu! I'm excited about the project focused on improving satellite predictions with machine learning. My background includes experience with ML through the Amazon ML Summer School 2024 and practical implementations like an LSTM for time-series data. I'm currently working on enhancing hourly temperature predictions using RNNs, which aligns well with the challenges of satellite data analysis. I'm passionate about leveraging ML for environmental applications and look forward to submitting a proposal. One question I have, particularly regarding the satellite data, is: What are the primary sources and formats of the satellite data we'll be working with, and are there any known challenges or biases within this data that we should be aware of from the outset? |
-
|
Hello @dfulu! I'm excited about the project focused on improving satellite predictions with machine learning—it's a fascinating and timely application. I'm currently pursuing an MS in Artificial Intelligence at Yeshiva University, where my coursework includes deep learning, reinforcement learning, NLP, and predictive modeling. One question I have is: How is model performance currently evaluated in the project—are the focus areas metrics like SSIM or MSE on the predicted satellite frames, or is there a stronger emphasis on downstream performance, such as improvements in solar energy forecasting? |
-
|
Hello @dfulu, I'm excited about the opportunity to contribute to the Cloudcasting ML project. With my prior experience in Python, PyTorch, machine learning, deep learning, and computer vision, along with a solid understanding of Generative Adversarial Networks and diffusion models, I’m confident in my ability to make a meaningful impact on the satellite forecasting model. I've been closely reviewing the project description and related discussions, and I'm particularly fascinated by the potential to explore various video prediction models. |
-
|
Hello @dfulu, I am Satyam Sinha. I am actively exploring the sat_pred repository and am currently having trouble finding the .zarr files needed to run train.py. I'd really appreciate your help in resolving this.
I also went through the project description and have mostly worked out what needs to be done to improve on SimVP. I would also like to try other models, such as Earthformer or U-Net variants, and evaluate them on MAE and the visual consistency of predictions. Further, I plan to work on blurriness mitigation using GANs or perceptual losses and to compare SimVP with newer models. Please let me know if this approach is appropriate for getting started with the project. Thanks and regards, |
-
Google Summer of Code 2025 applications are now closed. We are currently reviewing all applications. Contributors will be announced 8 May 2025. Thank you! |
-
|
I'm closing this discussion now as GSoC 2025 is nearly over. Thank you for everyone's input and help. We hope to take part next year, and we'll be posting info here. |
-
Cloudcasting ML Discussion Thread
This space is for you to ask any questions you have about this project. We're here to provide clarifications and help you understand the project's goals, scope, and requirements. Feel free to ask about anything that interests you!
Please note that this discussion is for questions and clarifications, not for formal applications.
Project Description
Traditionally, forecasting tools are trained using historical satellite data. As part of an innovative new project, we have been training a model to predict satellite images up to 3 hours ahead over the UK. This work is in its early stages, but we have already shown that a satellite forecast using a very simple ML model can improve our solar energy forecast. There is a lot of opportunity to improve on this new and unique satellite forecast, from trying different video prediction and AI weather model architectures to training a diffusion-based model to stop the satellite forecast from being blurry.
Expected Outcome
An improved satellite forecast model
Other Key Information