What is Wrong with Facebook
The key flaw that made this outage so severe was an unfortunate handling of an error condition. An automated system for verifying configuration values ended up causing much more damage than it fixed.
The intent of the automated system is to check for configuration values that are invalid in the cache and replace them with updated values from the persistent store. This works well for a transient problem with the cache, but it does not work when the value in the persistent store is itself invalid.
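As a rough illustration of that repair logic, here is a minimal Python sketch. The class name `ConfigClient`, the dict-based cache and store, and the `is_valid` predicate are all assumptions made for the example, not the actual system.

```python
class ConfigClient:
    """Hypothetical client that repairs invalid cached config values."""

    def __init__(self, cache, store, is_valid):
        self.cache = cache          # dict standing in for the cache (assumed)
        self.store = store          # dict standing in for the persistent store (assumed)
        self.is_valid = is_valid    # validation predicate (assumed)
        self.store_queries = 0      # counts queries sent to the persistent store

    def get(self, key):
        value = self.cache.get(key)
        if value is not None and self.is_valid(value):
            return value
        # Cached value is missing or invalid: refetch from the persistent store.
        self.store_queries += 1
        fresh = self.store[key]
        self.cache[key] = fresh
        return fresh
```

With a bad cache entry and a good store, one query repairs the cache and later reads are served from it. With a bad value in the store, every read sends another query, because the refetched value fails validation again.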
Today we made a change to the persistent copy of a configuration value that was interpreted as invalid. This meant that every single client saw the invalid value and attempted to fix it. Because the fix involves making a query to a cluster of databases, that cluster was quickly overwhelmed by hundreds of thousands of queries a second.
To make matters worse, every time a client got an error while querying one of the databases, it interpreted the error as an invalid value and deleted the corresponding cache key. This meant that even after the original problem had been fixed, the stream of queries continued. As long as the databases failed to service some of the requests, they were causing even more requests to themselves. We had entered a feedback loop that did not allow the databases to recover.
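The loop can be reproduced in a toy model. The sketch below is purely illustrative: it assumes a single shared cache key, a database that answers at most `capacity` queries per tick and fails the rest, and clients that all query whenever the key is missing. None of these numbers or names come from the real system.

```python
def simulate(ticks, clients, capacity, delete_on_error):
    """Return the number of database queries sent in each tick (toy model)."""
    key_cached = False   # the cache key starts out deleted
    history = []
    for _ in range(ticks):
        # Every client queries the database when the shared key is missing.
        queries = 0 if key_cached else clients
        history.append(queries)
        if queries == 0:
            continue
        succeeded = min(queries, capacity)
        failed = queries - succeeded
        if succeeded > 0:
            key_cached = True    # a successful query repopulates the cache
        if failed > 0 and delete_on_error:
            key_cached = False   # an error is mistaken for an invalid value
    return history


print(simulate(4, clients=1000, capacity=100, delete_on_error=True))
print(simulate(4, clients=1000, capacity=100, delete_on_error=False))
print(simulate(4, clients=50, capacity=100, delete_on_error=True))
```

In this model the first call stays at 1000 queries on every tick, the second drops to zero after the first tick, and the third also recovers because the load is below what the database can serve.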
The way to stop the feedback cycle was quite painful: we had to stop all traffic to this database cluster, which meant turning off the site. Once the databases had recovered and the root cause had been fixed, we slowly allowed more people back onto the site.
This got the site back up and running today, and for now we have turned off the system that attempts to correct configuration values. We are exploring new designs for this configuration system, following design patterns of other systems at Facebook that deal more gracefully with feedback loops and transient spikes.
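The post does not say what the new design will look like. As one common pattern, and only as an assumption about what "more gracefully" could mean, a client can treat a failed query as distinct from an invalid value, keep whatever is cached, and space out retries with capped exponential backoff. The names `StoreError`, `read_config`, and `backoff_delays` are hypothetical.

```python
class StoreError(Exception):
    """Raised when the persistent store cannot answer (hypothetical)."""


def backoff_delays(base_ms, cap_ms, attempts):
    """Retry delays in milliseconds: base, 2*base, 4*base, ... capped at cap_ms."""
    return [min(cap_ms, base_ms * 2 ** n) for n in range(attempts)]


def read_config(key, cache, fetch, is_valid):
    """Read a config value without amplifying load on the store (sketch)."""
    value = cache.get(key)
    if value is not None and is_valid(value):
        return value
    try:
        fresh = fetch(key)
    except StoreError:
        # A failed query says nothing about the cached value: keep it.
        return value
    if is_valid(fresh):
        cache[key] = fresh
        return fresh
    # The store itself holds an invalid value, so leave the cache untouched.
    return value
```

The important difference from the failure described above is that an error never deletes the cache key, so a struggling database does not generate extra load for itself.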
We apologize again for the site outage, and we want you to know that we take the performance and reliability of Facebook very seriously.