28 June incident report #1827
Closed
helinanever announced in Announcements
What happened
Early today, around 1 AM UTC, one of our server nodes lost its connection to the Internet and the automatic recovery logic failed. This set off a chain of events in which unprocessed tasks piled up and increased the load on our services. While our services are designed to handle a much higher load, this time we ran into an issue with Google Cloud Platform (GCP), where we host our Linux and Windows machines. Our continuous requests to create new machines in GCP caused us to exceed the rate limits, after which our Linux and Windows build machines became unavailable.
By 4 AM UTC, the issue had snowballed to the point where it started affecting our macOS pool as well, leaving macOS builds stuck in the queue.
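The failure mode described above, where continuous creation requests push a client over a provider's rate limit, is commonly mitigated with capped exponential backoff and jitter. The following is a minimal Python sketch of that general technique, not the logic used in our infrastructure. `create_machine` and `RateLimitError` are hypothetical placeholders, and the `base`, `cap`, and `attempts` values are arbitrary assumptions.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical error: the provider rejected a request for exceeding a rate limit."""


def create_with_backoff(create_machine, sleep=time.sleep, base=1.0, cap=60.0,
                        attempts=6, rng=random.random):
    """Call create_machine(), waiting longer after each rate-limit rejection.

    create_machine is a hypothetical callable standing in for a cloud API request.
    The wait before retry n is drawn uniformly from [0, min(cap, base * 2**n)),
    so the upper bound doubles each time but never exceeds cap.
    """
    for attempt in range(attempts):
        try:
            return create_machine()
        except RateLimitError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error instead of retrying forever
            # Back off instead of retrying immediately; jitter spreads out
            # retries from many workers so they do not arrive in lockstep.
            sleep(rng() * min(cap, base * 2 ** attempt))
```

The `sleep` and `rng` parameters are injectable only so the behavior can be checked without real waiting; a caller would normally leave them at their defaults.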
Current state
We resolved the issue around 8 AM UTC, and all services are now operational. We are continuing to monitor the situation to ensure that everything works as expected. If you still see builds remaining queued, please refresh the page to see the current status, as the build may have already started. You can also try canceling the queued build and starting a new one.
Planned improvements
We are still investigating the logs to understand the full picture of why our existing logic did not handle the situation properly. In the meantime, we are already planning a series of improvements to prevent such incidents from recurring.
Some of you have asked for a status page so that you can learn about outages faster. We are working on it and plan to make the status page available soon.
We sincerely apologize for any inconvenience this outage may have caused and will continue to work hard to improve the reliability of our infrastructure.