Elevated API Latency

Incident Report for Robin

Resolved

We're marking the issue as resolved after monitoring the service for the last hour.

In summary, there were two outages of roughly 15 to 20 minutes each, caused by our database becoming unhealthy and failing over to a backup too slowly. This, in turn, delayed our syncing queue, which added load to the system when the database came back online.

We're working with our cloud provider to address the failover issue and hope to make additional upgrades to our hardware during the next maintenance period to avoid similar outages in the future.

Posted Aug 14, 2018 - 15:43 EDT

Update

To keep the service stable while we continue investigating the issues from today, we've throttled calendar syncing. This may result in remote calendar changes taking longer than usual to appear in Robin.

Posted Aug 14, 2018 - 13:04 EDT

Update

Things are once again stable, though we continue to monitor closely. The outage appears to stem from unusual behavior in our primary database, and we're looking into the issue with our cloud provider, AWS.

Posted Aug 14, 2018 - 12:44 EDT

Update

We are continuing to monitor for any further issues.

Posted Aug 14, 2018 - 12:31 EDT

Update

We're seeing elevated latency again and are addressing the issue.

Posted Aug 14, 2018 - 12:25 EDT

Monitoring

A fix has been implemented and we are monitoring the results.

Posted Aug 14, 2018 - 10:54 EDT

Identified

We've noticed elevated latency in the API, which is causing longer page load times. We're adding resources to help fix the issue.

Posted Aug 14, 2018 - 10:51 EDT

This incident affected: Platform (API), Dashboard, and Calendar Syncing (Syncing Service).