Between 20:45 UTC (Jan. 5) and 03:30 UTC (Jan. 6), the majority of tablets experienced API request errors. The errors were caused by an extraneous 301 redirect introduced into the production environment. Clients that contacted the affected server cached the redirect and kept using an incorrect API server indefinitely, even though the redirect was removed almost immediately.
At 03:15 UTC an update was rolled out to tablet devices, repointing them back to the correct servers. Service was restored to the majority of affected devices by 03:30 UTC.
Upon reports of misbehaving tablets, we quickly realized that many clients were receiving erroneous 404 responses. This type of response suggested that these clients were making requests to the wrong locations.
We noticed that the number of requests to our application servers had nearly halved. We soon found that an HTTP to HTTPS permanent redirect was pointing clients to the wrong server cluster. Because these redirects were 301 responses, the HTTP clients on the devices treated them as permanent and stopped using the previous, correct endpoint. Even after the 301 redirect was removed, heavy caching meant these devices continued to use the incorrect location.
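The caching behavior described above can be illustrated with a toy model. This is only a sketch: the `CachingClient` class, the `example.test` URLs, and the rule table are hypothetical stand-ins, not the actual tablet client or our servers.

```python
class CachingClient:
    """Toy client that remembers 301 redirects (hypothetical, not the tablet code)."""

    def __init__(self, server_rules):
        # server_rules maps a URL to the redirect target the server currently
        # returns for it; a URL with no entry is served directly.
        self.server_rules = server_rules
        self.permanent_redirects = {}

    def resolve(self, url):
        """Return the URL that ends up serving a request for `url`."""
        # Cached 301s are applied first, without asking the original server.
        while url in self.permanent_redirects:
            url = self.permanent_redirects[url]
        target = self.server_rules.get(url)
        if target is None:
            return url
        self.permanent_redirects[url] = target  # remember the 301
        return self.resolve(target)


OLD = "http://api.example.test/v1"               # hypothetical endpoint
WRONG = "https://wrong-cluster.example.test/v1"  # hypothetical bad target

rules = {OLD: WRONG}
client = CachingClient(rules)
first = client.resolve(OLD)    # follows the faulty 301
del rules[OLD]                 # the server-side redirect is removed
second = client.resolve(OLD)   # the cached redirect is still applied
fresh = CachingClient(rules).resolve(OLD)  # a client with no cache is unaffected
```

In this model, removing the rule on the server changes nothing for a client that has already cached the redirect, which matches what we observed on the devices.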
The faulty redirect had already been removed from the original cluster by the time the issue was identified. However, further steps were needed to mitigate the problem on the large number of devices that had already been affected.
A proxy server was deployed to the cluster that the 301 had redirected the clients to, which would point clients back to a correct URL.
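A minimal sketch of such a redirect-back proxy is shown below, using Python's standard `http.server`. The details are assumptions for illustration only: the `CORRECT_BASE` URL is hypothetical, and the choice of a 307 status with `Cache-Control: no-store` is ours for the sketch, not a description of what was actually deployed.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

CORRECT_BASE = "https://api.example.test"  # hypothetical correct API base URL


def build_redirect(path):
    """Return (status, headers) sending a client back to the correct server."""
    headers = {
        "Location": CORRECT_BASE + path,
        # Assumption: this redirect should never be cached by the client.
        "Cache-Control": "no-store",
        "Content-Length": "0",
    }
    # 307 is a temporary redirect that preserves the request method.
    return 307, headers


class RedirectBackHandler(BaseHTTPRequestHandler):
    def _redirect(self):
        status, headers = build_redirect(self.path)
        self.send_response(status)
        for name, value in headers.items():
            self.send_header(name, value)
        self.end_headers()

    # Every common method gets the same redirect response.
    do_GET = do_POST = do_PUT = do_DELETE = _redirect


def make_server(host="127.0.0.1", port=8080):
    """Create the server; call serve_forever() on the result to run it."""
    return HTTPServer((host, port), RedirectBackHandler)
```

Using a temporary, uncacheable redirect here avoids repeating the original problem: clients are steered back without being told to remember the new mapping.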
All tablets were remotely updated to alter their API endpoint slightly, so that clients would no longer recognize it as the endpoint that had been 301 redirected. This allowed those tablets to come back online immediately.
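The reason a small change is enough can be sketched as follows, assuming the client keys its cached redirects by exact URL. Both URLs and the form of the alteration are hypothetical; the actual change made to the tablets is not described here.

```python
# Hypothetical cache of permanent redirects on a device, keyed by exact URL.
redirect_cache = {
    "http://api.example.test/v1": "https://wrong-cluster.example.test/v1",
}


def effective_url(url, cache):
    """Return the URL the client would actually contact for `url`."""
    return cache.get(url, url)


old_endpoint = "http://api.example.test/v1"
new_endpoint = "http://api-2.example.test/v1"  # hypothetical altered endpoint

stale = effective_url(old_endpoint, redirect_cache)      # hits the cached 301
recovered = effective_url(new_endpoint, redirect_cache)  # no cache entry matches
```

Because the altered endpoint has no entry in the cache, the request goes where the configuration says it should.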
We are evaluating our deployment process to identify how the 301 redirect made it into the application cluster, and we are adjusting our internal processes to protect against a similar issue in the future.
Our clients and servers will be updated with safer caching headers to help mitigate any future accidental 301 redirects.
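As a rough sketch of what safer caching could look like, the snippet below bounds how long a permanent redirect may be remembered on both sides. The function names, the one-hour default, and the client-side cap are assumptions for illustration, not the headers or policy we will necessarily ship.

```python
import re


def redirect_headers(location, max_age=3600):
    """Server side: headers for a 301 whose cached lifetime is bounded."""
    return {"Location": location, "Cache-Control": f"max-age={max_age}"}


def redirect_cache_seconds(status, headers, cap=3600):
    """Client side: how long a redirect may be remembered, in seconds."""
    if status not in (301, 308):
        return 0  # only permanent redirects are remembered in this sketch
    cache_control = headers.get("Cache-Control", "").lower()
    if "no-store" in cache_control or "no-cache" in cache_control:
        return 0
    match = re.search(r"max-age=(\d+)", cache_control)
    if match:
        return min(int(match.group(1)), cap)
    return cap  # no explicit lifetime: fall back to the cap, not "forever"
```

With a bound like this, an accidental 301 would expire from clients on its own after the configured lifetime instead of persisting until devices are updated.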