Revision 1 as of 2018-03-21 14:13:46

Service Outage Report - 20180321

Summary

This outage was caused by too many authentication requests to the Mudfish Authentication Server. It looked like a DoS (Denial of Service) attack, but I think one of the mudfish users wrote a program that uses the HTTP Proxy feature and handled its threads incorrectly. :-( When I checked the mudfish servers, several issues were mixed together in this outage. The DB server and the authentication server were the main causes, because many new users have started using our service recently and the number of users keeps increasing. The DB server was so overloaded yesterday that it began to delay request handling, which in turn led to TIME-WAIT issues on the authentication server. :-(

Outage Time

  • 2018-03-21 7:30 PM ~ 2018-03-21 9:00 PM (KST)
    • It lasted around one and a half hours.

Done

  • Left a warning(?) message for the user, but I don't know whether it will reach them. ;-)

  • The auth server daemon now rejects a request if too many auth requests arrive within seconds. The daemon is fixed now.
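
The rejection rule in the last item could look roughly like the sketch below. This is not the actual Mudfish auth daemon code. It is a minimal sliding-window limiter, and the class name `AuthRateLimiter`, the per-client key, and the default limit of 10 requests per second are all assumptions made for illustration.

```python
import time
from collections import defaultdict, deque


class AuthRateLimiter:
    """Hypothetical sliding-window limiter (not the real Mudfish code).

    Allows at most max_requests per window_seconds for each client.
    """

    def __init__(self, max_requests=10, window_seconds=1.0, clock=time.monotonic):
        # The default limit and window are assumed values, not Mudfish settings.
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock
        self._hits = defaultdict(deque)  # client id -> recent request timestamps

    def allow(self, client_id):
        now = self.clock()
        hits = self._hits[client_id]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # too many auth requests: reject without touching the DB
        hits.append(now)
        return True
```

Rejecting before any DB lookup is the point of the design: a runaway client then costs the auth server only a cheap in-memory check instead of adding load to the already overloaded DB server.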