Hi everyone, I hope you're well!
Here's a quick guide to stuff we'll do:
I need every one of you to make an account here: https://cds.climate.copernicus.eu/ and retrieve your API key. Once you have it, create a .cdsapirc file in your home directory that looks like this:
url: https://cds.climate.copernicus.eu/api
key: <your-api-key>
That way you'll be able to query data from the Copernicus Climate Data Store (CDS), not to be confused with the Copernicus Climate Change Service (C3S). To see how to query data from the CDS, check the playground.ipynb notebook, where I'm trying a bunch of stuff.
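Here's a minimal sketch of what a CDS query looks like with the cdsapi package (the dataset name and variable names follow the CDS web form for "ERA5 hourly data on single levels"; the helper function, dates, and region below are just illustrative):

```python
# Sketch: requesting ERA5 10m wind components from the CDS.
# Assumes cdsapi is installed and ~/.cdsapirc is configured as above.

def build_request(year, month, days, area):
    """Build a CDS request dict. `area` is [North, West, South, East] in degrees."""
    return {
        "product_type": "reanalysis",
        "variable": ["10m_u_component_of_wind", "10m_v_component_of_wind"],
        "year": str(year),
        "month": f"{month:02d}",
        "day": [f"{d:02d}" for d in days],
        "time": [f"{h:02d}:00" for h in range(24)],
        "area": area,          # subsetting a region keeps downloads small
        "format": "netcdf",
    }

# Example: one week over (roughly) Western Europe
request = build_request(2020, 1, range(1, 8), area=[55, -10, 35, 5])

# Uncomment to actually download (needs a valid API key and network access):
# import cdsapi
# client = cdsapi.Client()
# client.retrieve("reanalysis-era5-single-levels", request, "era5_wind.nc")
```

Restricting "area" (and optionally "grid") in the request is also the easiest lever for the compute-cost question below.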
Questions:
- How do we keep the compute from getting super expensive? Should we focus on a smaller, specific geography?
Concerns:
- Make sure one method is novel: a non-standard training loss, transfer learning, regularization, or such
- Cite packages
- I'm scared we won't get any meaningful results... Could we still get a full grade? Wind speed prediction might be too chaotic...
TO DO:
- Check the literature, the current state of the art, etc.
- Pre-process data.
- Build 5 models
- Build common evaluation metrics on which to test each model, plus figures: e.g. a 5-subplot comparison of the losses, results, etc.
Maybe as step one, I can build an RNN from start to finish, including data pre- and post-processing, then reuse that pipeline for the other 4 models.
TO DO: Define the task formally: "We predict 10m wind speed using the past 24h of ERA5 at a 1-hour horizon." Maybe do autoregressive rollouts?
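That windowed formulation (24h of history in, 1 hour ahead out) can be sketched with numpy; the function name and toy series here are made up for illustration:

```python
import numpy as np

def make_windows(series, lookback=24, horizon=1):
    """Turn a 1-D hourly wind-speed series into supervised (X, y) pairs:
    X[i] = the `lookback` hours before time t, y[i] = value `horizon` hours ahead."""
    X, y = [], []
    for t in range(lookback, len(series) - horizon + 1):
        X.append(series[t - lookback:t])
        y.append(series[t + horizon - 1])
    return np.array(X), np.array(y)

speeds = np.arange(30, dtype=float)  # toy hourly series, 30 hours
X, y = make_windows(speeds)
# X.shape == (6, 24), y.shape == (6,)
```

For autoregressive rollouts, you'd feed each prediction back in as the newest element of the window and repeat.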
Data: TO DO:
- What we could do later is build a grid around the globe at a higher resolution; it doesn't have to be ALL the data points that ERA5 has.
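One cheap way to avoid pulling every ERA5 grid point is to subsample the lat/lon grid by strided slicing; a toy sketch (the array here is random, but the 721x1440 shape matches ERA5's 0.25-degree global grid):

```python
import numpy as np

# Toy full-resolution field: 721 x 1440 matches ERA5's 0.25-degree global grid.
field = np.random.rand(721, 1440)

# Keep every 4th point in each direction -> roughly a 1-degree grid,
# i.e. ~16x fewer points to store and train on.
coarse = field[::4, ::4]
# coarse.shape == (181, 360)
```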
Eval: TO DO:
- Figure out what evaluation metrics you want per model
- Create a standardized evaluation 'score card' for all models
- Then write some code to show the results side by side
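The three Eval items above can share one structure; here's a sketch of a possible scorecard (the metric choices RMSE/MAE and the function names are my suggestion, not decided yet):

```python
import numpy as np
import pandas as pd

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(y_true - y_pred)))

def scorecard(y_true, predictions):
    """predictions: {model_name: y_pred array}. Returns one row per model,
    one column per metric -- a side-by-side table for all 5 models."""
    rows = {name: {"RMSE": rmse(y_true, p), "MAE": mae(y_true, p)}
            for name, p in predictions.items()}
    return pd.DataFrame(rows).T

y = np.array([1.0, 2.0, 3.0])
card = scorecard(y, {"persistence": np.array([1.0, 1.0, 2.0]),
                     "linear": np.array([1.1, 2.1, 3.1])})
print(card)
```

Every model then only has to produce a y_pred array, and the table (and any subplot loop over its rows) comes for free.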
Models: TO DO:
- Figure out which 5 models we want (RNN? Ensemble?)
- Figure out what the baseline model is
Hence, the next steps are:
- Define the task formally in your notebook: “We predict next-hour 10m wind speed using past 24h ERA5 variables at location X.”
- Build the dataset (single DataFrame used by all models).
- Implement Baseline (Persistence) and evaluate it.
- Implement Linear Regression + Random Forest + XGBoost → plug into shared eval.
- Implement MLP.
- (If time) Implement LSTM or 1D CNN as the 5th model.
- Create the scorecard table + 1–2 plots.
- Mirror this into Overleaf sections (Models + Evaluation).
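The persistence baseline in the steps above is nearly a one-liner; a sketch, assuming an hourly numpy series of wind speeds:

```python
import numpy as np

def persistence_forecast(series, horizon=1):
    """Predict that wind speed `horizon` hours ahead equals the current value."""
    return series[:-horizon]           # aligned with series[horizon:]

speeds = np.array([5.0, 6.0, 5.5, 7.0, 6.5])  # toy hourly data
y_true = speeds[1:]                     # next-hour truth
y_pred = persistence_forecast(speeds)   # [5.0, 6.0, 5.5, 7.0]
rmse = float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```

Any learned model that can't beat this number isn't adding value, which is exactly why it's worth evaluating first.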
Latest:
- Make a requirements.txt file listing the dependencies (dask, etc.)