blocked miner process/thread - NFS issue - 1.9.0+ - failing wdPosts #6456
-
wdPost fails. Symptoms:
The syslog shows:
after approx. 120 s. Stopping the miner process leaves it as a defunct process. Facts:
Lotus Version: stock v1.9.0 release tag. The problem is confirmed to exist in 1.11-master checkouts (not by me, but by reliable sources).
To Reproduce
Logs
Additional context
We have been running v1.9.0 since it was released. The issue started occurring roughly on Monday. I assumed network problems, faulty drives, or other issues not related to lotus. Today a Slack thread (linked below) mentioned one of the symptoms described above, and over the day we found at least one other miner with the same issue (the one running the 1.11 master). Others reported further NFS syslog errors (same error, different message), and it became apparent that I am not alone with this, so I rule out hardware issues for now.
My best guess is: a change in the lotus code between 1.8.0 and 1.9.0 triggers something in an Ubuntu update released this week, resulting in the NFS errors we see and ultimately in the failing wdPost.
Slack threads as reference
minerX2 chan, non-public: https://filecoinproject.slack.com/archives/C022ZR4JA1M/p1623366279455700
I link the Slack conversations here since I do not have access to the facts other miners might present. This issue looks like it occurs in different forms and with different error messages for different miners and setups/OSes.
Replies: 5 comments 3 replies
-
log excerpt from another miner
also resolved via 1.8.0 downgrade
-
@f8-ptrk could you please share your NFS setup/configuration? cc @magik6k
-
1.9.0 worked fine for ~18 days before the issues occurred.
1.8.0 passed another round of wdPosts today; it seems to be stable and is at least a quick fix for the problem.
If there are any questions about the setup/configuration, please let me know soon: we will get rid of Ubuntu in the next 4 days. We cannot avoid running a version past 1.8.0 forever, so Ubuntu must go.
-
Others say 1.8.0 does not seem to be the solution.
-
The network gets shaky when running the vanilla proofs. It made it through, but it looked scary watching it live. f'n ubuntu desktop hardware