[ROOT CAUSE ANALYSIS] Service Interruption 05/26/2014

At 12:04am PST on May 26th, Yammer on-call engineers began investigating platform performance issues. At 12:55am PST, a batch of servers that handles background posting of Yammer messages was identified as having database connectivity issues.

At this point the engineering team followed standard procedures to restart the worker processes on each of these servers. Following these restarts, database connectivity was re-established on all servers, and queued messages started to post at 1:35am PST. By 6:06am PST the backlog of messages was cleared, and message delivery times had returned to normal.

To prevent similar performance degradation in the future, we are adding monitoring for database connection issues.
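
As a rough illustration of what such monitoring could look like, the sketch below counts consecutive failed database connection checks on a worker and signals an alert once a threshold is reached. This is not the monitoring Yammer deployed; the `ConnectionMonitor` class, the `check` callable, and the threshold of 3 are all hypothetical names and values chosen for the example.

```python
from typing import Callable


class ConnectionMonitor:
    """Tracks consecutive failed database connection checks for one worker.

    Hypothetical sketch: `check` is any callable that raises an exception
    when the database cannot be reached (for example, a trivial query).
    """

    def __init__(self, check: Callable[[], None], failure_threshold: int = 3):
        self.check = check
        self.failure_threshold = failure_threshold  # assumed value, tune per service
        self.consecutive_failures = 0

    def probe(self) -> bool:
        """Run one check and return True if an alert should fire."""
        try:
            self.check()
        except Exception:
            # Any failure to reach the database counts toward the threshold.
            self.consecutive_failures += 1
        else:
            # A successful check resets the counter.
            self.consecutive_failures = 0
        return self.consecutive_failures >= self.failure_threshold
```

Calling `probe()` on a fixed interval from each worker would surface a sustained loss of connectivity after a few failed checks, while a single transient failure would not trigger an alert.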
