Last updated at 2019-03-06 15:25:25

Service Outage Report - 20190306

Summary

This outage was caused by a DoS attack on Mudfish nodes. In the previous outage the target was the Mudfish web servers; this time the target was the Mudfish nodes. The problem first appeared as overloaded DB servers, but we found that the overload came from the internal log collector daemons: the DoS attack generated too many logs, all of which were inserted into the DB server.

Once the DB server reached very high CPU usage, all DB queries started to hang. In the end, most services became unresponsive.

Outage Time

  • 2019-03-06 8:30 PM ~ 2019-03-06 11:00 PM (KST)
    • For 2 to 3 hours, Mudfish services, including the web server and authentication services, were hard to access and very laggy.

Current Status

  • As a temporary workaround, I have disabled the tracing of user / node events, which is used internally, to reduce the load on the DB. To solve this issue properly, we need to schedule some PM time and fix those pieces along with DB optimization.
  • All Mudfish services should work fine now.

Service Outage Report - 20190306 (last edited 2019-03-06 15:25:25 by loxch)