We're marking the issue as resolved after monitoring the service for the last hour.
In summary, there were two outages of about 15-20 minutes each, caused by our database becoming unhealthy and failing over to a backup too slowly. This, in turn, delayed our syncing queue, which added load to the system when the database came back online.
We're working with our cloud provider to address the failover issue and hope to make additional upgrades to our hardware during the next maintenance period to avoid similar outages in the future.
Posted Aug 14, 2018 - 15:43 EDT
To keep the service stable while we continue investigating the issues from today, we've throttled calendar syncing. This may result in remote calendar changes taking longer than usual to appear in Robin.
Posted Aug 14, 2018 - 13:04 EDT
Things are once again stable, though we continue to monitor closely. The outage appears to stem from unusual behavior in our primary database, and we're looking into the issue with our cloud provider, AWS.
Posted Aug 14, 2018 - 12:44 EDT
We are continuing to monitor for any further issues.
Posted Aug 14, 2018 - 12:31 EDT
We're seeing elevated latency again and are addressing the issue.
Posted Aug 14, 2018 - 12:25 EDT
A fix has been implemented and we are monitoring the results.
Posted Aug 14, 2018 - 10:54 EDT
We've noticed elevated latency in the API, which is causing longer page load times. We're adding resources to help fix the issue.
Posted Aug 14, 2018 - 10:51 EDT
This incident affected: Calendar Syncing (Syncing Service) and Dashboard, API.