November 8, 2024 13 min read @Scott Nelson Windels There’s No Such Thing as a Free Lunch! Incident Management takes time. Incidents need responders that are trained and experienced. At Slack, training…
August 18, 2022 19 min read @Frank Chen Slowing Down to Speed Up – Circuit Breakers for Slack’s CI/CD What happens when your distributed service has challenges with stampeding herds of internal requests? How do you…
August 23, 2022 14 min read @Carlos Valdez @Frank Chen Balancing Safety and Velocity in CI/CD at Slack In 2021, we changed developer testing workflows for Webapp, Slack’s main monorepo, from predominantly testing…
July 14, 2020 9 min read @Ryan Katkov All Hands on Deck This story speaks to the process behind incident response at Slack and uses the May 12th, 2020 outage as an…
July 14, 2020 8 min read @Laura Nolan A Terrible, Horrible, No-Good, Very Bad Day at Slack This story describes the technical details of the problems that caused the Slack downtime on May 12th, 2020. To…