Conversation
|
The error that I get is copied below. |
|
what if you use the CPU and start julia with |
|
Thanks @simone-silvestri for the suggestion. Will try it now. |
|
Things go a lot further, but there is a problem with the lines that define It seems this is with |
|
@simone-silvestri , any advice on what is going wrong here? |
|
It looks like there is a bug in the |
|
Thanks @simone-silvestri , I will give that a try! |
Make changes so that it runs
|
@simone-silvestri : I tried it and it seems like a function is not defined. I added this at the beginning and now it seems to be running! |
|
Ah nice. I think we can export that type. |
Codecov Report: All modified and coverable lines are covered by tests ✅
Additional details and impacted files:
@@          Coverage Diff          @@
##            main    #142    +/-  ##
=====================================
  Coverage   0.00%   0.00%
=====================================
  Files         34      34
  Lines       1962    1983    +21
=====================================
- Misses      1962    1983    +21
View full report in Codecov by Sentry. |
|
It's running on a CPU (i.e. slow) and still on the initial time step. I made all these changes on the branch and can revert back to what we had previously as other fixes come along. Maybe I'll have something to share tomorrow. |
|
I started the job yesterday and it hasn't updated the output files in over 24 hours. I think something has gone wrong. Below is the current display that I have. It hasn't stopped and is still running on a CPU. Maybe we need to try it on a GPU or have more output to see what has gone wrong? Any suggestions? |
|
Correction: it is still running on one CPU. It is at day 4 of the simulation after 7 days of computing. Not a great ratio. What do we need to do to run this on a GPU? @simone-silvestri |
|
Wow, that seems quite slow! What if you move it to the GPU?
Sorry @simone-silvestri for the late reply. I am happy to try it again on a GPU but last time there was an error. I can try it again and let you know what the error is. |
|
@simone-silvestri |
|
I stand corrected, there is more information. To me this actually looks very different. |
|
Can you make an MWE for this and open an issue? |
I will certainly give it a try and see what part of it is causing the issue. This will likely take me a day or two to get to. |
|
@simone-silvestri , I realize it's been a few months but I am still keen to get this example up and running. I can try this all again this week, but if you had time to meet for an hour, I wonder if that would help? |
|
Sure, I'll text on Slack. |
|
Interestingly, when the new example is completely removed (thank you again @glwagner), the docs still fail. Sadly, I still can't see the errors. I am trying to build the docs on a server and I will let you know if I come across errors of any kind. |
|
When I tried running the Question: why does it fail on these two checks on GitHub? |
|
It's an out-of-memory error on the GPU. To display the Buildkite output publicly, we probably need to fiddle with the Buildkite settings.
Let's try a coarser grid to see if this allows us to avoid the memory errors.
|
Ah, good to know! I'm trying a coarser grid, 720x120x40, to see if this avoids the error. |
|
Funny that when I reduce the resolution, and don't do anything else differently, we now have more errors. See below. I'll restore the original parameters since this clearly did not help. |
Returning to the desired resolution since more errors happened with a coarser grid. Hmm...
|
Thanks @navidcy for updating it. I hope we get this pushed sooner rather than later. I tried to look at the errors, but when I do I get a login page for Buildkite. Is that expected? Should I have a login to see those errors? |
|
I pushed some updates; let's see! |
|
Looks like we are getting different errors from before, so maybe that's progress. The part that seems to be failing is with the ecco4 dataset; in particular, I think it's this line in |
|
The error message that we get in the test is copied below. What's strange is that when I try running this test on a server all the tests in this script. On this server I get a problem with the package |
|
Can someone explain to me why, when I click on the first two failed tests, I can see exactly what has been returned, but when I click on the last two, I get a login screen for Buildkite? |
|
It's the same for me re: login screen. @cmbengue any ideas? |
This error makes me think that there are corrupted jld2 files. It might be that both the CPU and GPU tests are trying to download and inpaint data and they conflict with each other. We should specify different download directories for the CPU and GPU tests, I think. |
|
Thanks @simone-silvestri for the suggestion. I'm giving that a try now for this one test. If it passes, it might be useful to do it for all the tests I suppose. We will see. |
|
I tried making the suggested change. Maybe someone can double-check what I changed to see if it makes sense? With these changes the tests still pass on my server, so it might be working correctly. However, we are still getting an error with the same function, but now it says we are using too much memory. Using 4.078 KiB does not seem like a lot to me. Is this a limit we can change to facilitate testing? |
|
Any idea why we get this error when running distributed tests? Also, can anyone see the error from the docs? I get a login page but can't log in. (@cmbengue ?) |
|
Thanks @navidcy for the update! Things are better in that the previously failing test now passes. Yay! I still can't see the error with the building of the example. Can anyone else see this? I also see this error with the downloading. I wonder if this is a fluke and might go away on future attempts? |
|
Hi, @glwagner and @francispoulin, I don't know why you see the login page for some tests on Buildkite, as I neither actively manage nor have admin access to Buildkite. |
|
Thanks for your reply and sorry to have bothered you with this. |
Following up on #106, this is a first attempt to create a regional model with ECCO-derived restoring at the boundaries. We decided to focus on the ACC in the Southern Ocean.
It does not run yet, but once it does, it would be good to know whether people agree this is a good example to include. If so, we then need to turn this into an example.