In this article, development environments refer to sandboxes where you can test your code changes before deploying, and should not be confused with integrated development environments (IDEs) like Eclipse or Microsoft Visual Studio.
Dev environments have always been a mystery to me. Despite learning about them on my first day at Slack, and using them almost every day for the last three years, I have never understood how they truly worked.
Half a year ago, I set a goal to understand dev environments inside and out. I interviewed some of the most senior engineers at Slack, studied countless pages of documentation, and sifted through years of Slack conversations. What I discovered was a fascinating journey of how our dev environments evolved over time.
Why do we need dev environments?
Dev environments are copies of our application that we can modify freely. Their main value is allowing us to test changes to our application without impacting real users or putting real data at risk.
They enable us to iterate rapidly because it’s quick and easy to test changes on them. They also make it possible for us to easily share our changes with others for review.
Altogether, dev environments drastically increase development speed and safety.
What lies under the hood?
Slack’s dev environments are copies of our application that live on remote servers — Amazon EC2 instances to be exact. These instances are configured to run the Slack application and the many services it depends on.
Each dev environment hosts its own Slack subdomain, which we can navigate to in our browser to see the changes we make.
No changes in dev environments can impact real users because they use their own set of infrastructure (e.g. databases) isolated from production.
Developing remotely vs. locally
At Slack, we develop remotely, meaning our dev environments live on servers. Another option is developing locally on our personal computers. Local development is great for smaller projects because it’s fast and doesn’t require an internet connection. However, for larger projects, remote dev environments offer significant advantages.
First, we don’t have to set up the Slack application locally. Given that Slack has a very complex architecture that depends on many different services, not having to set things up locally is immensely valuable.
Second, if a change works in dev, it’ll most likely work in production, because our dev environments are configured to mirror production. Some level of drift may still happen with especially long-lived dev environments, but the likelihood and magnitude are much smaller than when developing locally with unique machines that often end up with inconsistent configurations.
Third, remote dev environments don’t rely on a personal computer, which may crash or lag. Cloud hardware is much more affordable, resilient, and scalable. Further, remote environments let us easily develop on multiple machines and share our work with teammates for review.
As internet connections become faster and more reliable, developing remotely makes more and more sense.
How we use dev environments at Slack
The best way to illustrate Slack’s dev environment workflow is with an example.
Let’s say for some reason we wanted to test a version of Slack’s homepage with all-caps, purple, Comic Sans text.
We first create a feature branch, and then attach it to a dev environment using a command line tool called slack sync-dev. It reserves a random dev environment and then syncs our local changes to it, so whatever local edits we save automatically transfer to the dev environment.
At its core, slack sync-dev is simply a wrapper around two well-known utilities: fswatch (detects changes) and rsync (transfers changes).
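To make the fswatch-plus-rsync idea concrete, here is a minimal sketch of what such a wrapper could look like. It is not the actual slack sync-dev implementation, which this article does not show. The function names, the remote target, and the excluded paths are all assumptions made for illustration; only the fswatch and rsync flags are real.

```python
import subprocess
from typing import Iterable, List


def rsync_command(local_dir: str, remote: str,
                  excludes: Iterable[str] = (".git",)) -> List[str]:
    """Build an rsync invocation that mirrors local_dir onto remote.

    The trailing slash on the source makes rsync copy the directory's
    contents rather than the directory itself.
    """
    cmd = ["rsync", "-az", "--delete"]  # archive mode, compress, remove stale files
    for pattern in excludes:
        cmd += ["--exclude", pattern]
    cmd += [local_dir.rstrip("/") + "/", remote]
    return cmd


def fswatch_command(local_dir: str) -> List[str]:
    """Build an fswatch invocation that prints one line per batch of changes."""
    return ["fswatch", "-o", "-r", local_dir]


def sync_forever(local_dir: str, remote: str) -> None:
    """Run an initial sync, then re-sync every time fswatch reports a change."""
    subprocess.run(rsync_command(local_dir, remote), check=True)
    with subprocess.Popen(fswatch_command(local_dir),
                          stdout=subprocess.PIPE, text=True) as watcher:
        for _ in watcher.stdout:  # each line signals a batch of file events
            subprocess.run(rsync_command(local_dir, remote), check=True)
```

Calling sync_forever("webapp", "dev575:/path/to/app") with a hypothetical host and path would keep the remote copy up to date until interrupted. Re-running rsync over the whole tree on every batch is the simplest design, and rsync only transfers files that actually changed.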
If we make any frontend changes, we have to build them locally using webpack, an open source tool we adopted as our build system. The command slack run buildy:watch builds our frontend assets and serves them to our dev environment over localhost.
When we’re done, we can navigate to the subdomain of the dev environment we reserved (dev575 in this example), and voila! Behold our violet masterpiece.
Now, we can poke around on the page for bugs, fine-tune our change, and share it with others for review.
Remember that our frontend changes are built and served from our personal computer? If we want others to be able to see them after we close our computer, we have to generate a static build, which builds our frontend assets on our dev environment instead of locally.
Why do we build frontend changes locally?
When we first introduced webpack in 2017, we built frontend changes remotely in our dev environments. Whenever someone made a frontend change while synced, the dev environment automatically rebuilt their assets.
However, as our codebase grew, webpack became more resource-intensive. One build would consume the memory of the entire instance. At the time, multiple developers worked on the same instance, and they were constantly interrupted. So, we moved webpack onto our local machines.
With just one dev environment per instance today, and with more advanced instances, it would be entirely possible to move webpack back onto our dev environments, which would make the developer experience a little smoother. But the current system is fast and scalable, so we don’t feel the need to fix what ain’t broke.
Improving our command line tools
Let’s talk command line tools for a sec. We’ve already covered some of them, like slack sync-dev. We can’t live without them at Slack because they make developing so much faster and easier.
Early on, when we didn’t have slack sync-dev, we had to manually copy our changes over to the dev environment, which was slow and error-prone. Now, we boast over 60 command line tools that simplify many mundane tasks like this one.
Other examples include slack bot-me, which creates a bot user on the current dev environment, and slack tail-dev, which tails remote logs from our current dev environment. If you’d like to read more about our dev tools, check out our blog post from 2016.
Scaling our dev environments
Back in 2014, we only had one dev environment that everyone shared. If one person broke it, nobody else would be able to test their changes. That wasn’t a big issue then, but as Slack grew, we had to add more. By the end of 2019, we were maintaining 550 dev environments, enough for every Slack engineer to attach to a different one.
However, this increasing trend did not continue, and in fact completely reversed in 2020. Before we talk about why, let’s first take a look at another interesting metric that changed over time: the number of dev environments per EC2 instance.
This number fell over time because we wanted to isolate dev environments from each other. When multiple environments share the same instance, one developer running a memory-heavy process on one environment would slow down all of the others.
There is a tradeoff though — fewer dev environments per instance means we have to pay for more EC2 instances. Also, these instances were statically managed, so lots of engineering hours were required to provision new ones and deprovision corrupt ones. To make matters worse, long-lived instances got gunked up over time and would stop behaving reliably.
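The cost side of this tradeoff is simple arithmetic, sketched below. The 550 figure comes from this article. The densities passed in are hypothetical values chosen for illustration, not Slack's historical numbers.

```python
import math


def instances_needed(environments: int, envs_per_instance: int) -> int:
    """Return how many instances are required to host the given environments."""
    if envs_per_instance < 1:
        raise ValueError("envs_per_instance must be at least 1")
    return math.ceil(environments / envs_per_instance)


# Hypothetical densities: sharing instances saves money, isolating costs more.
print(instances_needed(550, 10))  # 55 instances when ten environments share one
print(instances_needed(550, 1))   # 550 instances when each environment is isolated
```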
To resolve these issues, we created a new system to provision dev instances based on demand. This caused a drastic reversal in the increasing trend shown in the first chart. Instead of keeping hundreds of instances running concurrently, we provision new instances when needed. Once developers are finished testing, their instances are automatically deprovisioned. With this system, we’re able to use our dev environments much more efficiently. We’ll be diving into these scaling evolutions in an upcoming post, so stay tuned!
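As a rough illustration of demand-based provisioning, the toy model below creates an instance only when a developer attaches and removes it when they release it. The class, method names, and naming scheme are assumptions for this sketch. It does not call any cloud API and is not Slack's actual system.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, Iterator


@dataclass
class OnDemandPool:
    """Toy model: an instance exists only while a developer is using it."""
    _ids: Iterator[int] = field(default_factory=lambda: itertools.count(1))
    active: Dict[str, str] = field(default_factory=dict)  # developer -> instance name

    def attach(self, developer: str) -> str:
        """Provision a fresh instance on first use, then reuse it."""
        if developer not in self.active:
            self.active[developer] = f"dev{next(self._ids)}"
        return self.active[developer]

    def release(self, developer: str) -> None:
        """Deprovision the developer's instance once they finish testing."""
        self.active.pop(developer, None)
```

In this model the number of running instances tracks the number of developers currently testing, rather than a fixed fleet size that has to be provisioned and cleaned up by hand.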
Well-designed dev environments are critical to the success of any technology company. Here at Slack, our dev environments, and our tooling infrastructure at large, are entering an exciting stage of scale and automation — join us to build out this future.
A huge thank you to Ross Harmes, Felix Rieseberg, Tito Sandoval, Harrison Page, Melissa Khuat, Kefan Xie, Shannon Burns, Nolan Caudill, Matt Haughey, and many others for helping with research and editing.
Author
Michael is a Software Engineer on the Platform team at Slack. He has been at Slack for almost 3 years, and in that time, he has worked on a wide variety of projects, including user-facing features, API infrastructure, and growth experiments.