Remote Development at Slack

9 minutes • Written 2 years ago

In this article, “remote development environments” refer to AWS EC2 instances where engineers make code changes and can see a running Slack application with those changes.

For years, engineers at Slack isolated and tested their changes by running microcosms of the Slack application on their local computers. This was difficult for many reasons: it involved installing and maintaining local dependencies, handling resource-intensive software, and writing custom scripts that had to work across different operating systems.

As a developer productivity team, we noticed this pain through our user surveys and metrics, and desperately wanted to solve the problem. We explored moving the entire development experience to remote environments, from code intelligence to type-checking and builds. Engineers would no longer have to maintain code or dependencies on their local laptops. They could get a fresh isolated environment on demand, ready to be used within a couple of minutes.

Before we dive into the details of remote development environments, let’s look at where we began.

Status quo

Slack’s backend is written in Hack, which runs on HHVM. The codebase has grown significantly over the past few years, and today a fresh clone takes nearly 30 minutes. Owing to this size, running HHVM has become resource-intensive and is infamous for making engineers’ laptops sound like a power generator, in addition to impacting the performance of other applications.

Though there’s no mandated single development environment, the majority of webapp engineers use VSCode for their work. VSCode provides a wide range of extensions to support multiple languages, including Hack, with features like syntax highlighting, formatters, and linters. Over the years Slack engineers have invested a lot of time in coming up with different strategies to improve our development experience. And while it’s worked fairly well, it’s always been brittle.

The initial development workflow involved running HHVM directly on macOS, which came with its own set of common problems: Homebrew breaking installs, upgrade failures, and having to play catch-up whenever a new version of HHVM was released. Because HHVM is updated frequently, this became a major pain point, causing overhead and frequent disruptions for engineers.

Given the limitations of the initial workflow, the next iteration moved all the HHVM dependencies to local Docker instead, making it easier for engineers to stay in sync. This required a lot of custom scripts and services to build the Docker images used for running webapp with HHVM. Even though this solution ironed out a lot of inconsistencies, it came with the overhead of new dependencies such as Docker, maintaining HHVM build images, and automating the scripts that put everything together. While this was a significant improvement over the previous workflow, it was apparent from the high traffic in our internal Docker self-help channels that more needed to be done to make for a smooth development experience.

If you’d like to learn more about the prior development workflow, check out this blog post.

Remote development

At the beginning of 2021, a new team was formed inside the Developer Productivity org with the express purpose of improving the consistency and speed at which engineers could write code. Based on the information gathered from user interviews, engineer feedback, and existing infrastructure, there was a clear need to evolve how we write code without the overhead of local setup and the maintenance associated with it. This led to the inception of the Remote Development Environments project.

Slack’s web application requires a lot of setup to run, and it accounts for 60% of code changes made across our whole engineering organization. We focused remote development environments on eliminating the limitations of our existing workflows and improving developer productivity. The proposed solution was shared to gather feedback across engineering orgs, and went through multiple iterations before being finalized. Through this process many engineers shared their pain points with the existing development workflow, which further informed the proposal.

Once the proposal was finalized, we prototyped it. Slack has rich CLI tools used for all sorts of tasks, such as interacting with development environments, opening pull requests, and updating local dependencies. This was the obvious place to add our prototype. A new command was added to kick off a remote environment:

slack remote-dev -b <branch_name>

This command lets engineers create a new branch, reserve a development environment, attach their branch to that environment, install any required dependencies, and open a local VSCode instance with required settings and extensions installed. The result: a fresh new instance, ready for development work, in under 90 seconds. It even eliminates the need to clone the webapp repository locally.
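The steps the command chains together can be pictured as a small pipeline. The sketch below is a minimal illustration in Python, not Slack's implementation: the `RemoteDevSession` type, the step functions, and the `env-001` identifier are all hypothetical, and each step only records what it would do.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RemoteDevSession:
    """State carried through the steps (hypothetical structure)."""
    branch: str
    environment: str = ""
    log: List[str] = field(default_factory=list)


def create_branch(session: RemoteDevSession) -> None:
    session.log.append(f"create branch {session.branch}")


def reserve_environment(session: RemoteDevSession) -> None:
    # A real tool would ask a reservation service; this ID is made up.
    session.environment = "env-001"
    session.log.append(f"reserve {session.environment}")


def attach_branch(session: RemoteDevSession) -> None:
    session.log.append(f"attach {session.branch} to {session.environment}")


def install_dependencies(session: RemoteDevSession) -> None:
    session.log.append("install dependencies")


def open_editor(session: RemoteDevSession) -> None:
    session.log.append("open VSCode with shared settings and extensions")


# The order mirrors the sequence described in the paragraph above.
STEPS: List[Callable[[RemoteDevSession], None]] = [
    create_branch,
    reserve_environment,
    attach_branch,
    install_dependencies,
    open_editor,
]


def remote_dev(branch: str) -> RemoteDevSession:
    """Run every step in order and return the resulting session."""
    session = RemoteDevSession(branch=branch)
    for step in STEPS:
        step(session)
    return session
```

Calling `remote_dev("my-feature")` returns a session whose `log` lists the five steps in order.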

How does it all fit together?

A set number of development environments are always kept available for engineers to reserve, managed by an AWS Auto Scaling group (ASG). These environments periodically sync themselves with master, pulling down new commits and dependencies to minimize setup time when an engineer requests one. Once an environment is reserved, its periodic jobs are disabled and the instance is detached from the ASG. The ASG then provisions a new instance to maintain the number of available environments. Keeping this pool warm saves about 12 minutes of bootstrap time every time an engineer reserves an environment for work.
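The reserve-and-replenish behavior can be modeled in a few lines. This is a minimal in-memory sketch, assuming a fixed target size; it makes no AWS calls, and the `WarmPool` class, its method names, and the `env-N` identifiers are hypothetical stand-ins for the ASG and its instances.

```python
from collections import deque
from itertools import count
from typing import Deque, Dict, Set


class WarmPool:
    """In-memory stand-in for a group that keeps `target` environments ready."""

    def __init__(self, target: int) -> None:
        if target < 1:
            raise ValueError("target must be at least 1")
        self.target = target
        self._ids = count(1)
        self.available: Deque[str] = deque()
        self.reserved: Dict[str, str] = {}  # instance id -> engineer
        self.syncing: Set[str] = set()      # instances with periodic sync on
        self._replenish()

    def _replenish(self) -> None:
        # Provision instances until the pool is back at its target size.
        while len(self.available) < self.target:
            instance = f"env-{next(self._ids)}"
            self.available.append(instance)
            self.syncing.add(instance)

    def reserve(self, engineer: str) -> str:
        # Hand out the oldest ready instance, stop its periodic sync,
        # remove it from the pool, and let the pool refill itself.
        instance = self.available.popleft()
        self.syncing.discard(instance)
        self.reserved[instance] = engineer
        self._replenish()
        return instance
```

With `pool = WarmPool(3)`, a call to `pool.reserve("alice")` hands out one instance and the pool immediately grows back to three available environments.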

With slack remote-dev -b <branch_name>, engineers reserve a remote environment with their branch checked out in the workspace. The command then configures the remote development environment with the dependencies required to use VSCode’s Remote SSH extension and develop remotely. Every VSCode version comes with a vscode-server, which gets installed on the remote environment as part of this setup. The setup also applies the standardized VSCode settings.json and launch.json files required for efficient webapp development. The result is a local-quality development experience, including full IntelliSense, code navigation, and debugging out of the box.

Early on, we recognised that both editor choice and terminal configuration are personal preferences. To this end, we did our best to enable as much customization as possible. Engineers can configure custom extensions, scripts, bash profiles, aliases, git config, and VSCode settings based on their preferences. By creating a local directory ~/.remote-dev-home, engineers can sync everything they need to their remote development environment during setup. This provides engineers the flexibility to use and configure editors like Vim or Emacs based on their preference. Git operations are also supported in the same way you’d expect them to work locally.
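One way to picture the sync is as a mapping from every file under the local directory to the same relative path on the remote machine. The sketch below only plans that mapping and copies nothing; the `plan_sync` function and the `/home/dev` remote home are assumptions made for illustration, not details from Slack's tooling.

```python
from pathlib import Path, PurePosixPath
from typing import List, Tuple


def plan_sync(
    local_home: Path, remote_home: PurePosixPath
) -> List[Tuple[Path, PurePosixPath]]:
    """Pair each local file with its destination under the remote home."""
    pairs = []
    for path in sorted(local_home.rglob("*")):
        if path.is_file():
            # Keep the same relative layout on the remote side.
            relative = path.relative_to(local_home)
            pairs.append((path, remote_home / relative.as_posix()))
    return pairs


if __name__ == "__main__":
    # Hypothetical remote home directory; adjust to the real environment.
    local = Path.home() / ".remote-dev-home"
    for src, dst in plan_sync(local, PurePosixPath("/home/dev")):
        print(f"{src} -> {dst}")
```

A transfer tool of the reader's choice could then consume the planned pairs.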

Once a branch has served its purpose, engineers detach from their environment, which automatically terminates the underlying instance. Environments are therefore short-lived. Alternatively, if the branch associated with a development environment gets merged, the environment is automatically detached and terminated. This keeps our fleet lean and cost-effective: we run as few instances as possible at any given time.
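The two termination triggers described above reduce to a simple rule. The following is a minimal sketch of that rule, assuming the branch state is known for each environment; the `Environment` and `BranchState` types and the function name are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable, List


class BranchState(Enum):
    OPEN = "open"
    MERGED = "merged"


@dataclass
class Environment:
    instance_id: str
    branch: str
    attached: bool = True  # False once the engineer has detached


def environments_to_terminate(
    envs: Iterable[Environment], branch_states: Dict[str, BranchState]
) -> List[str]:
    """Return IDs of environments that were detached or whose branch merged."""
    doomed = []
    for env in envs:
        # Branches with no recorded state are treated as still open.
        state = branch_states.get(env.branch, BranchState.OPEN)
        if not env.attached or state is BranchState.MERGED:
            doomed.append(env.instance_id)
    return doomed
```

An environment that is still attached to an open branch is left running; everything else is returned for termination.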

Improved development experience

Remote development environments standardized the development requirements across all webapp engineers in Slack and eliminated a great deal of setup overhead. They enabled engineers to work on multiple branches in parallel, a difficult task when all we had was local development. Engineers can request a fresh, isolated environment for each branch they work on, significantly reducing the cost of context switching. They can also collaborate with other engineers on the same remote environments by accessing and working on an environment already reserved for a branch.

Crucially, this also made it easier to control and upscale machine specs as the need arose. Remote development environments were initially configured to use c5.2xlarge (8 core, 16 GB RAM) instances, which were sufficient for running our backend services. As support for frontend builds and other tools was added, we observed a decline in performance. This was easily resolved by upgrading remote environments to a bigger, badder instance size, c5.4xlarge (16 core, 32 GB RAM), with a single configuration change, where previously a costly and time-consuming laptop rollout would’ve been needed. While this doubled the cost of our development environments, it was justified by the increased efficiency of our engineering team.

For new hires, getting started with webapp development became a breeze. Remote development environments eliminated many of the prerequisites for webapp development, bringing down the setup time from about an hour to mere minutes.

Switching to remote development environments also resulted in improved performance for many common tasks, such as running unit tests, running the debugger, installing dependencies, and building the frontend. The metrics below compare frontend build duration between remote and local development environments.

Adoption

We released a minimal remote development workflow in August 2021. This allowed the onboarding of a few regular webapp engineers who helped identify bugs, use cases, and existing limitations.

We spent the next couple of months smoothing the experience and addressing the limitations reported by our beta users. While in beta, 30% of webapp engineers had completely switched to a remote development workflow. It was released for General Availability in October 2021 and by the end of January 2022 over 90% of our engineers had completely switched to the remote development workflow.

A few weeks after the general release, we conducted an internal survey to get feedback on the new workflow. The response was overwhelmingly positive, and feedback from this wider audience helped us to continue improving it.

Reception

Convincing engineers to change a workflow they’ve used for years is a hard sell. We knew the story had to be compelling if it was going to convince the team at large to switch. But the overwhelming benefits of remote dev environments made this easy. Word of mouth spread from our core group of beta testers quickly, and before long the majority of the team had switched of their own accord. It was worth it to them.

Here’s what a few engineers from around our organization had to say after their experience working with remote development environments:

“It’s a game changer! We’re going from checkers to chess here folks.”

“I think remote dev is a dramatically better experience. It allows working on multiple branches at once and frees up tons of resources on our laptops.”

“I’m a huge fan – this work has greatly improved my productivity and my quiet laptop fans really thank you.”

“This is one of the very few projects I remember us doing that so directly impacted the productivity of product development with almost no trade offs.”

“I can do webapp development on battery power for, like, 2-3 hours straight before my laptop dies thanks to your work. Thank you so much!!!”

An efficient, reliable, and scalable development workflow is critical to the success of fast-paced technology companies like Slack. The Dev Infrastructure team is continuously working towards making the development experience seamless for all engineers by leveraging cloud resources. Join us to build out the future of development at Slack.

A huge thank you to Rowan Oulton for helping with reviewing and editing this blog post.

Author

Sylvestor is a Staff Software Engineer on the Internal Tools team at Slack. He has been at Slack for a little over 3.5 years, and in that time, he has worked on a wide variety of projects to improve developer productivity.