
Slack outages raise reliability concerns

Slack outages are becoming frequent enough to make some analysts wonder whether companies will trust the collaboration startup for business-critical communications.

Slack's more than three-hour outage this week was the latest in a string of crashes that have left some analysts wondering whether the site's uptime problems could scare away enterprise customers.

Analysts warned that if the Slack outages continue, rivals Microsoft, Cisco and Atlassian could cite them as a reason for companies to avoid the fast-growing startup.

"I think the recent outages certainly open up an avenue of attack by Slack's competitors and will raise red flags for customers and prospects as to whether they can rely on the app for business-critical communications," said Irwin Lazar, an analyst at Nemertes Research, based in Mokena, Ill.

Whether Slack is suffering only growing pains, as opposed to a severe flaw in its technology, is difficult to determine without insight into the exact cause of each incident. Slack said this week's outage was due to "a bug included in an offline batch process of data."

Last month, Slack said it had suffered only four significant disruptions since May 2017. But when contacted again on Thursday, a Slack spokesperson said the company had done another review and identified eight major Slack outages in that time frame.

That list includes outages of more than two hours last Oct. 31, almost an hour on Jan. 9, about 2.5 hours on May 21 and roughly 20 minutes on May 23. Despite the troubles, Slack has yet to announce the leader of a safety engineering team it formed in March to improve its uptime.

In an interview last month, a Slack representative acknowledged the company's rapid growth has been challenging to keep up with at times. Since January 2015, the company has grown from 1.1 million daily users to 8 million regular users today.

"To be frank, we're still learning as we go," said Julia Grace, senior director of infrastructure engineering at Slack, based in San Francisco. "This is such a complex piece of software. We're operating at a global scale. We're learning and evolving and growing and making the service better along the way."

Some analysts pointed out that Slack's performance was much worse when it was starting out.

"Once upon a time, in the very, very, very early days of Slack, they were built on a model that couldn't scale," said Michael Facemire, an analyst at Forrester Research. "You remember those outages; you remember the old days when [Slack] would be down, and it would be down for very perceivable amounts of time."

Nevertheless, with tech powerhouses Cisco and Microsoft as competitors, Slack can no longer afford to look weak. Companies are unlikely to standardize on a collaboration vendor whose uptime record is significantly worse than its rivals'.

"I'm sure Slack is well-aware of the criticality of any downtime," said Larry Cannell, an analyst at Gartner. "Nevertheless, if you are trying to use Slack, then this is a big deal to you. These types of collaboration tools have become primary communication channels."

Slack recently updated its status page to make it simpler and less confusing, a company spokesperson said. Now, the only outages listed there reflect instances when no one was able to connect to the service. That appears to have led the company to redefine some connectivity troubles as "incidents" or "notices."

Under the previous classification scheme, the number of Slack-reported outages had jumped from nine in 2016 to 38 in 2017. This year, the vendor had been on pace to reach 24. However, the recent reporting changes have made it difficult to compare this year's numbers with those of past years.

"Delivering a reliable service is our primary commitment to our customers, and we take these types of incidents very seriously," the Slack spokesperson said. "We continue to learn from outages through rigorous postmortem processes so we can improve the availability, reliability, and stability of our service moving forward."

Slack publishes the percentage of time its app was online every month. An uptime of 99.75% in October 2017 was the worst of the last 12 months. So far in June, uptime stands at 99.82%, marking the fourth time since 2013 that the app's monthly uptime has fallen below 99.9%.

Microsoft Office 365's worst worldwide uptime percentage since early 2016 was 99.97% during the second quarter of 2017.

"As we get more reliant on the cloud, even 99.8% uptime just isn't enough for us," said Wayne Kurtzman, an analyst at IDC. "Slack's base is very active on social media and Reddit, which makes any Slack outage stand out."
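
To put those percentages in perspective, an uptime figure can be converted into minutes of downtime. The following is a rough sketch in Python, not either vendor's method: the function name `downtime_minutes` is illustrative, and it assumes uptime is measured against every minute of the calendar period, which may differ from how Slack or Microsoft actually calculate their published numbers.

```python
def downtime_minutes(uptime_percent: float, days: int) -> float:
    """Minutes of downtime implied by an uptime percentage over `days` days.

    Assumption: the percentage is measured against all minutes in the period.
    """
    total_minutes = days * 24 * 60
    return total_minutes * (100.0 - uptime_percent) / 100.0


# Uptime figures quoted in the article; period lengths are calendar days.
examples = [
    ("Slack, October 2017", 99.75, 31),
    ("Slack, June rate applied to a full 30-day month", 99.82, 30),
    ("99.9% threshold, 30-day month", 99.9, 30),
    ("Office 365, second quarter of 2017", 99.97, 91),
]

for label, pct, days in examples:
    minutes = downtime_minutes(pct, days)
    print(f"{label}: {pct}% uptime -> about {minutes:.0f} minutes of downtime")
```

Under that assumption, 99.75% uptime in a 31-day month works out to a little under two hours of downtime, while 99.97% across a 91-day quarter comes to roughly 40 minutes.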
