The company blamed routine maintenance work for the outage: Its engineers had issued a command that inadvertently disconnected Facebook’s data centers from the general internet. About 827,000 people responded to Zuckerberg’s apology.
The messages ranged from the funny (“It was terrible, I had to talk to my family,” commented one Italian user) to the confused (“I took my phone to the workshop thinking it was broken,” wrote someone from Namibia).
And of course, some users were very upset and angry: “You cannot make everything close at the same time. The impact is unprecedented,” posted a Nigerian businessman. Another user, from India, asked for compensation for the disruption to his business.
What is clear now, if it wasn’t already obvious, is how dependent billions of people have become on these services, not just for fun, but for essential communication and commerce as well.
What’s also clear is that this is far from unique: Experts suggest that widespread outages are becoming more frequent and more disruptive.
“One of the things we’ve seen in recent years is a greater reliance on a small number of networks and businesses to deliver large chunks of internet content,” says Luke Deryckx, Technical Director at Down Detector.
“When one of those, or more than one, has a problem, it affects not only them, but hundreds of thousands of other services,” he adds.
Facebook, for example, is now used to log in to a variety of different services and devices, such as smart TVs.
“And so, you know, we have these kinds of ‘snow closures’ on the internet that happen now,” says Deryckx. “Something happens [and] we all look at each other like ‘well, what are we going to do?’”
Deryckx and his team at Down Detector monitor web services and websites for disruptions. He says that widespread outages affecting major services are becoming more frequent and more severe.
“When Facebook has a problem, it creates such a huge impact on the internet, but also on the economy and, you know … on society. Millions, or potentially hundreds of millions, of people are just sitting around waiting for a small team in California to fix something. It’s an interesting phenomenon that has grown in recent years.”
October 2021: A “configuration error” brought down Facebook, Instagram and WhatsApp for almost six hours. Other sites such as Twitter were also disrupted by the surge in new visitors to their apps.
July 2021: More than 48 services, including Airbnb, Expedia, Home Depot and Salesforce, were down for about an hour after a domain name system (DNS) bug at content delivery company Akamai. It followed a similar outage a month earlier.
June 2021: Amazon, Reddit, Twitch, GitHub, Shopify, Spotify and various news sites went down for about an hour after a customer of the cloud computing service provider Fastly accidentally triggered a previously unknown bug.
December 2020: Gmail, YouTube, Google Drive and other Google services went down simultaneously for about 90 minutes after the company said it encountered an “internal storage quota issue.”
November 2020: A technical issue with one of the Amazon Web Services facilities in Virginia, USA, affected thousands of third-party online services for several hours, mostly in North America.
March 2019: Facebook, Instagram and WhatsApp were down or severely disrupted for about 14 hours after a “server configuration change.” Some other sites that use Facebook to log in, including Tinder and Spotify, were also affected.
Inevitably, at some point during a major outage, people are concerned that the crash is the result of some kind of cyberattack.
But experts suggest that, most of the time, the cause is a more mundane case of human error, compounded, they say, by the way the internet is held together by a complicated set of outdated systems.
During the Facebook outage, experts joked on Twitter that some of the usual suspects behind outages are “older than the Spice Girls” and were “designed on the back of a napkin.”
Internet scientist Professor Bill Buchanan agrees with this characterization: “The internet is not the large-scale distributed network that DARPA (Defense Advanced Research Projects Agency), the original architects of the internet, attempted to create, one that could withstand a nuclear strike.”
“The protocols that it uses are basically the ones that were written when we connected to mainframes from dumb terminals. A single failure in its central infrastructure can cause everything to crash.”
Professor Buchanan says that improvements can be made to make the internet more resilient, but that many of the fundamentals of the web are here to stay for better or for worse.
“In general, the systems work and you cannot ‘turn off’ certain internet protocols for a day to try to remake them,” he says. Rather than trying to rebuild the systems and fabric of the internet, Professor Buchanan believes that we must improve the way we use it to store and share data, or risk more massive disruptions in the future.
He argues that the internet has become too centralized, meaning that too much data comes from a single source. That trend must be reversed with systems that have multiple nodes, he explains, so that no single fault can stop a service from operating.
There is a silver lining here. Although major internet outages affect the lives of users and businesses, ultimately they can also help improve the resilience of the internet and the web services connected to it.
For example, Forbes estimates that Facebook lost $66 million during the six-hour outage due to the suspension or exodus of advertisers on the site.
That kind of loss is likely to focus the minds of top executives on preventing it from happening again. “They lost a great deal of money that day, not just in their stock price, but also in their operating income,” according to Deryckx.
“And if you look at the disruptions caused by content delivery networks like Fastly and Cloudflare, they also lost a lot of customers to the competition.”
“So I think these operators are doing everything they can to keep things online.”