• Diff for "Service Outage Report - 20190306"
Last updated at 2019-03-06 15:25:25
Differences between revisions 2 and 3
Revision 2 as of 2019-03-06 15:22:40
Size: 830
Editor: loxch
Comment:
Revision 3 as of 2019-03-06 15:24:25
Size: 969
Editor: loxch
Comment:

Service Outage Report - 20190306

Summary

This outage was caused by a DoS attack on Mudfish nodes. The previous outage targeted the Mudfish web servers; this time the target was the Mudfish nodes. The problem first showed up as overloaded DB servers, but we found that the direct cause was the internal log collector daemons: the DoS attack generated too many logs, and all of them were inserted into the DB server.

Once the DB server hit very high CPU usage, all DB queries started to hang. In the end, most services became unresponsive.

Outage Time

  • 2019-03-06 8:30 PM ~ 2019-03-06 11:00 PM (KST)
    • For 2 ~ 3 hours, Mudfish services, including the web server and authentication services, were hard to reach and very laggy.

Current Status

  • As a temporary workaround, I disabled the tracing of user / node events, which is only used internally, to reduce the load on the DB.
  • All Mudfish services should work fine now.
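
The workaround above switches event tracing off completely. As an illustration only, the sketch below shows one way a log collector could combine such an on/off switch with a per-second cap, so that a log flood is dropped before it reaches the DB. This is not Mudfish's actual code: `EventLogGate`, `max_per_sec`, and `tracing_enabled` are hypothetical names, and the fixed one-second window is an assumption chosen for simplicity.

```python
import time


class EventLogGate:
    """Hypothetical gate deciding whether an event log row may go to the DB."""

    def __init__(self, max_per_sec, tracing_enabled=True, clock=time.monotonic):
        self.max_per_sec = max_per_sec          # assumed cap on inserts per second
        self.tracing_enabled = tracing_enabled  # kill switch, like the workaround
        self._clock = clock
        self._window_start = clock()
        self._count = 0
        self.dropped = 0                        # rows rejected so far

    def allow(self):
        """Return True if one more log row may be inserted right now."""
        if not self.tracing_enabled:
            self.dropped += 1
            return False
        now = self._clock()
        # Start a new one-second window once the current one has elapsed.
        if now - self._window_start >= 1.0:
            self._window_start = now
            self._count = 0
        if self._count >= self.max_per_sec:
            self.dropped += 1
            return False
        self._count += 1
        return True
```

A collector would call `gate.allow()` before each insert and skip the row when it returns `False`. Setting `gate.tracing_enabled = False` reproduces the temporary workaround, while the cap keeps the DB load bounded if tracing is turned back on.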

Service Outage Report - 20190306 (last edited 2019-03-06 15:25:25 by loxch)