Between 11:00 UTC on June 10 and 02:00 UTC on June 13, a small subset of our Office365 and Exchange calendar events (~2% globally) was affected by a bug that prevented users from booking or syncing the associated spaces. The bug was caused by scheduled database maintenance (https://status.robinpowered.com/incidents/frybm33l0r46), during which a handful of our table columns were inadvertently converted to a case-insensitive collation.
At 21:00 UTC (June 12) a database replica was spun up and we began correcting the affected tables on it. The replica was promoted to master and service was fully restored to the small set of affected customers by 02:00 UTC (June 13).
After several customers reported difficulty booking events and rooms that failed to sync, we quickly determined that all of the errors were confined to a handful of Office365 and Microsoft Exchange users. The errors all indicated that events were failing a uniqueness validation within the database, which caused room syncing to fail and prevented some spaces from being booked at specific times. When we investigated the database, we found that querying for an event by its identifier returned multiple entries.
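To illustrate how a case-insensitive collation can produce both symptoms (multiple rows returned for one identifier, and a failed uniqueness check), here is a minimal sketch. It uses an in-memory SQLite database purely as a stand-in, since it ships with Python; the table names, column name, and identifiers are hypothetical and do not reflect our production schema or database engine.

```python
import sqlite3

# Hypothetical identifiers that differ only in letter case.
ID_A = "AAMkAGI2TQ="
ID_B = "AAMkAGI2tq="


def demo(collation):
    """Return (rows matched when looking up ID_A, whether a UNIQUE column rejects ID_B).

    `collation` is a built-in SQLite collation name: "BINARY" (case-sensitive)
    or "NOCASE" (case-insensitive for ASCII letters).
    """
    conn = sqlite3.connect(":memory:")

    # Symptom 1: a lookup by identifier returns more than one row.
    conn.execute(f"CREATE TABLE events (remote_id TEXT COLLATE {collation})")
    conn.executemany("INSERT INTO events VALUES (?)", [(ID_A,), (ID_B,)])
    (matches,) = conn.execute(
        "SELECT COUNT(*) FROM events WHERE remote_id = ?", (ID_A,)
    ).fetchone()

    # Symptom 2: a uniqueness constraint treats the two identifiers as duplicates.
    conn.execute(
        f"CREATE TABLE unique_events (remote_id TEXT COLLATE {collation} UNIQUE)"
    )
    conn.execute("INSERT INTO unique_events VALUES (?)", (ID_A,))
    try:
        conn.execute("INSERT INTO unique_events VALUES (?)", (ID_B,))
        rejected = False
    except sqlite3.IntegrityError:
        rejected = True

    conn.close()
    return matches, rejected


print(demo("BINARY"))  # (1, False): identifiers stay distinct
print(demo("NOCASE"))  # (2, True): identifiers collide
```

With the case-sensitive collation each identifier matches only itself; with the case-insensitive one, the same lookup matches both rows and the second insert is rejected as a duplicate.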
As noted in Microsoft’s official documentation, the identifiers for calendar events should be stored as case-sensitive values (https://docs.microsoft.com/en-us/exchange/client-developer/exchange-web-services/ews-identifiers-in-exchange#working-with-identifiers). Our database has accounted for this specification ever since Microsoft calendar support was introduced, but the accidental change to the table’s collation resulted in case-insensitive queries being performed on the database. To fix the affected tables, the following steps were taken:
1. A new read replica of the existing database was spun up.
2. Schema changes were performed on the read replica to remediate the incorrect collations.
3. The replica was promoted to the new master database and all DNS entries were resolved.
4. API, Dashboard, and Analytic services were all tested to ensure the bug was resolved.
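As an aid to the kind of verification described in step 4, the sketch below shows one way to list identifiers that differ only in letter case, which are the values at risk under a case-insensitive collation. It is an illustrative helper rather than the check we actually ran; the function name and sample identifiers are hypothetical.

```python
from collections import defaultdict


def find_case_collisions(identifiers):
    """Group identifiers that are distinct as written but equal when case is ignored."""
    groups = defaultdict(set)
    for identifier in identifiers:
        # Lower-casing approximates a case-insensitive comparison for ASCII identifiers.
        groups[identifier.lower()].add(identifier)
    # Keep only groups with more than one distinct spelling.
    return [sorted(group) for group in groups.values() if len(group) > 1]


sample = ["AAMkAGI2TQ=", "AAMkAGI2tq=", "BBxyz123=", "BBxyz123="]
print(find_case_collisions(sample))  # [['AAMkAGI2TQ=', 'AAMkAGI2tq=']]
```

Exact duplicates are not reported, because they would collide under any collation; only values that a case-sensitive comparison treats as different appear in the result.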