What Is Wrong With Facebook Today

Earlier today Facebook was down or unreachable for most of you for approximately 2.5 hours. This is the worst outage we've had in over four years, and we wanted first of all to apologize for it. We also wanted to provide more technical detail on what happened and share one big lesson learned.


The key flaw that caused this outage to be so severe was an unfortunate handling of an error condition. An automated system for verifying configuration values ended up causing much more damage than it fixed.

The intent of the automated system is to check for configuration values that are invalid in the cache and replace them with updated values from the persistent store. This works well for a transient problem with the cache, but it doesn't work when the persistent store itself is invalid.
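To make the check-and-replace behavior concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the actual system: the `ConfigClient` class, the `is_valid` rule, and the dictionary-backed cache and store are all hypothetical stand-ins.

```python
from typing import Dict, Optional


def is_valid(value: Optional[str]) -> bool:
    # Hypothetical validity rule, for illustration only:
    # a value is valid if it is non-empty and does not start with "!".
    return bool(value) and not value.startswith("!")


class ConfigClient:
    """Toy client that repairs invalid cached values from a persistent store."""

    def __init__(self, cache: Dict[str, str], store: Dict[str, str]) -> None:
        self.cache = cache            # stands in for the cache tier
        self.store = store            # stands in for the persistent store
        self.store_queries = 0        # how many times the store was queried

    def get(self, key: str) -> str:
        value = self.cache.get(key)
        if is_valid(value):
            return value
        # The cached value looks invalid: fetch the persistent copy
        # and overwrite the cache entry with it.
        self.store_queries += 1
        fresh = self.store[key]
        self.cache[key] = fresh
        return fresh


def count_store_queries(cached: str, stored: str, reads: int) -> int:
    """Perform `reads` lookups and report how many reached the store."""
    client = ConfigClient({"limit": cached}, {"limit": stored})
    for _ in range(reads):
        client.get("limit")
    return client.store_queries
```

With a corrupt cache entry and a good persistent copy, `count_store_queries("!corrupt", "100", 3)` reaches the store once, because the first read repairs the cache. With an invalid persistent copy, `count_store_queries("!bad", "!bad", 3)` reaches the store on every read, because the "repair" writes the invalid value straight back.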

Today we made a change to the persistent copy of a configuration value that was interpreted as invalid. This meant that every single client saw the invalid value and attempted to fix it. Because the fix involves making a query to a cluster of databases, that cluster was quickly overwhelmed by hundreds of thousands of queries a second.

To make matters worse, every time a client got an error while attempting to query one of the databases, it interpreted the error as an invalid value and deleted the corresponding cache key. This meant that even after the original problem had been fixed, the stream of queries continued. As long as the databases failed to service some of the requests, they were causing even more requests to themselves. We had entered a feedback loop that didn't allow the databases to recover.
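The loop can be illustrated with a toy simulation. This is a simplified model, not a reconstruction of the real system: it assumes a single shared cache key, that all clients see a cache miss at the same moment, that the database cluster answers at most `db_capacity` queries per tick, and that failed queries report back after successful ones. All names are hypothetical.

```python
from typing import Dict, List


def simulate(clients: int, db_capacity: int, ticks: int,
             delete_on_error: bool) -> List[int]:
    """Return the number of database queries issued in each tick.

    The persistent store is assumed to be already fixed, so every
    query that the database manages to answer returns a valid value.
    """
    cache: Dict[str, str] = {}        # shared cache, starts without the key
    queries_per_tick: List[int] = []

    for _ in range(ticks):
        if "config" in cache:
            # Cache hit for everyone: no database traffic this tick.
            queries_per_tick.append(0)
            continue

        # Every client misses at once and queries the database.
        queries = clients
        succeeded = min(queries, db_capacity)
        failed = queries - succeeded

        if succeeded:
            cache["config"] = "valid"          # a success repopulates the cache
        if failed and delete_on_error:
            cache.pop("config", None)          # an error is treated as "invalid"

        queries_per_tick.append(queries)

    return queries_per_tick
```

In this model, `simulate(1000, 100, 5, delete_on_error=True)` issues 1000 queries in every tick and never recovers, even though the store is healthy. Treating errors as errors, `simulate(1000, 100, 5, delete_on_error=False)`, recovers after the first tick. Reducing load below capacity, as in `simulate(50, 100, 3, delete_on_error=True)`, also recovers after one tick.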

The way to stop the feedback cycle was quite painful: we had to stop all traffic to this database cluster, which meant turning off the site. Once the databases had recovered and the root cause had been fixed, we slowly allowed more people back onto the site.

This got the site back up and running today, and for now we've turned off the system that attempts to correct configuration values. We're exploring new designs for this configuration system, following design patterns of other systems at Facebook that deal more gracefully with feedback loops and transient spikes.
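As an illustration only, one common pattern for damping this kind of loop is to keep the cached value when the store cannot answer and to wait exponentially longer between repair attempts. The sketch below is an assumption about what such a design might look like, not a description of any design Facebook adopted. `StoreError`, `BackoffClient`, and the caller-supplied `fetch` function are hypothetical names, and time is passed in explicitly to keep the example deterministic.

```python
from typing import Callable, Dict, Optional


class StoreError(Exception):
    """Raised when the (hypothetical) persistent store cannot answer."""


class BackoffClient:
    """Toy client that backs off after failed repairs instead of retrying at once."""

    def __init__(self, fetch: Callable[[str], str],
                 base_delay: float = 1.0, max_delay: float = 60.0) -> None:
        self.fetch = fetch                  # caller-supplied store lookup
        self.base_delay = base_delay
        self.max_delay = max_delay
        self.cache: Dict[str, str] = {}
        self.failures = 0                   # consecutive failed repairs
        self.retry_at = 0.0                 # earliest time of the next attempt

    def refresh(self, key: str, now: float) -> Optional[str]:
        """Try to repair `key` from the store, honoring the backoff window."""
        if now < self.retry_at:
            # Still backing off: serve whatever is cached, send no query.
            return self.cache.get(key)
        try:
            value = self.fetch(key)
        except StoreError:
            # A query error says nothing about the value, so the cache
            # entry is kept, and the next attempt is pushed further out.
            delay = min(self.max_delay, self.base_delay * 2 ** self.failures)
            self.failures += 1
            self.retry_at = now + delay
            return self.cache.get(key)
        self.failures = 0
        self.cache[key] = value
        return value
```

The two design choices here are that a failed query never deletes the cached value, and that each consecutive failure doubles the wait before the next query, up to `max_delay`. A production design would likely also add random jitter so that many clients do not retry in lockstep.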

We apologize again for the site outage, and we want you to know that we take the performance and reliability of Facebook very seriously.