Service Outage Report - 20230914
Cause
The authentication daemon was overloaded by abnormal connections from multiple IP addresses. As a result, the operations handled by the authentication daemon (for example, authentication, fetching Mudfish node information, and assigning IP addresses) did not work properly.
Use of the Mudfish nodes themselves should not have been affected, but connections to all Mudfish web services were unstable.
Failure Duration
- 2023-09-14 7:00 AM ~ 2023-09-14 10:00 AM (PST)
- The authentication daemon was overloaded for about 3 hours.
Action and Progress
- Patched the authentication daemon to 1) add debug messages for tracing the root cause and 2) apply a short timeout to slow-write connections.
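
For readers unfamiliar with the second item, the following is a minimal sketch of what a short timeout for slow-write connections can look like. It is not the actual Mudfish patch: the use of Python's asyncio, the newline-terminated request format, the names read_request and handle_client, and the 5-second value are all assumptions made for illustration.

```python
import asyncio
from typing import Optional

# Hypothetical value; the report does not state the timeout that was chosen.
REQUEST_TIMEOUT_SECONDS = 5.0


async def read_request(reader: asyncio.StreamReader,
                       timeout: float = REQUEST_TIMEOUT_SECONDS) -> Optional[bytes]:
    """Read one newline-terminated request, or return None if it takes too long."""
    try:
        # The deadline covers the whole request, so a client that sends its
        # bytes very slowly cannot hold the connection open indefinitely.
        return await asyncio.wait_for(reader.readline(), timeout)
    except asyncio.TimeoutError:
        return None


async def handle_client(reader: asyncio.StreamReader,
                        writer: asyncio.StreamWriter) -> None:
    """Hypothetical per-connection handler for use with asyncio.start_server."""
    request = await read_request(reader)
    if request is None:
        # Debug message to help trace which peers are sending slowly.
        print("debug: closing slow-write connection from",
              writer.get_extra_info("peername"))
    elif request.endswith(b"\n"):
        writer.write(b"OK\n")  # placeholder for the real request handling
        await writer.drain()
    writer.close()
    await writer.wait_closed()
```

With a handler like this, a connection that never finishes sending its request is closed after the timeout instead of occupying the daemon, and the debug line records the peer address for later analysis.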