Growing Pains: Migrating Slack’s Desktop App to BrowserView

12 minutes • Written 6 years ago

Recently Slack on the desktop has been going through an awkward adolescence. Instead of flailing limbs and pitch squeaks, ours has manifested in ways rather more grim: inexplicably failing to render content, reloading during common operations, and error screens that aren’t actionable. The only silver lining has been being on the receiving end of some absolutely savage burns:

Kinda seems like that something is “writing a desktop chat app in JavaScript”. pic.twitter.com/bOls7WS8n8

— Matt Diephouse (@mdiep) August 11, 2017

In all seriousness, the experience some customers have had leaves us with a pit in our stomach, and we’ve been working tirelessly towards a more mature version of the app, dubbed Slack 3.0. The good news is that it’s available on our beta channel now. Before we dig into the specifics of 3.0 — why it was necessary and how we got there — we need to cover a little bit of Slack history.

Web vs Desktop: An Illustrated Primer

You’ll sometimes see us refer to “the webapp” vs “the desktop app;” what does this mean? How do they relate to one another? A picture might clear this up:

It’s like a browser that only takes you to Slack.com

The desktop app is a host for some number of guest pages. The guest pages are like browser tabs pointed at slack.com, which we call the webapp. Although the webapp is on its own quest for modernity, this post is about the Electron container around it.

You might think there’s not much to embedding a web page, but like the 1990 classic Tremors, there’s a lot happening underground. Support for multiple workspaces is the main customer-facing feature, but much of our codebase is devoted to creating a layer of native integration that most folks don’t notice until it’s gone:

Support for notifications across all platforms — even Windows 7
Support for spell-check and language detection
App, tray, taskbar, and context menus
Support for deep-linking, launch on login, taskbar and dock badging
Installers for all platforms — mainly Windows

With that distinction made, let’s talk about why we needed an overhaul.

I Can’t Believe It’s Not BrowserView

We host pages using an Electron feature called webview. You can think of the webview as a specialized iframe with concessions made for security: it runs out-of-process and lets you avoid polluting the guest page with Node. Although we (and others in the Electron community) have found it to be a spawn point for bugs, until recently it was the only secure way to embed content. Since it’s implemented in Chromium and imported wholesale into Electron, we can’t tinker with it as easily as other APIs. And since it’s used only by Chrome extensions — not the tabs themselves — issues filed against it can languish. Besides renderer crashes during drag & drop and a litany of focus issues, the worst problem we faced was that sometimes, after a webview was hidden, it would not render content the next time it was shown.

Unfortunately for Slack, the webview was the linchpin of the app. There’s a view for each workspace and switching between them is a visibility toggle.

Should be fine, right…?

**Narrator:** **It wasn’t.** (Photo in background by barackobamadotcom licensed under Creative Commons)

In hindsight, we spent more time than we should have trying to work around the problem on our end. We considered trade-offs no responsible engineer should face: should users sometimes see a blank page or always have idle CPU usage?

Together, Slack and Sketch only take up 100.000% of my CPU. I have no complaints.

— daniel.pizza 🍕 (@dvdwinden) September 5, 2017

While we were exploring the boundaries of our creativity, the folks at Figma had already abandoned ship and begun on a new strategy for embedding web content. Enter BrowserView. Their post goes into more detail, but in a nutshell:

It behaves more like a Chrome tab than the webview does
It’s used more like a native window than a DOM element

What we mean by that is — unlike the webview — you can’t drop a BrowserView into the DOM and manipulate it with CSS. Similar to top-level windows, these views can only be created from the background Node process. Since our app was written as a set of React components that wrapped the webview, and — being React — those components lived in the DOM, this looked to be a full rewrite. But we needed to make haste, since users were encountering problems on a daily basis. So, how did we manage to pull the rug out from under our furniture without moving it first? Were there any design decisions that helped us out?

It turns out there were, or this would be a very short post. There are three parts of our client stack worth mentioning:

How we manage Redux stores
How we manage side-effects / async actions
How we refactor code rapidly

Sync About It

Like every webapp written circa 2017, Slack uses Redux. But unlike most Redux apps, Slack sometimes has to synchronize data between stores. That’s because instead of one tidy little process, we’ve got oodles of them.

All Electron apps have a main process that runs Node, and some number of renderer processes that are old-fashioned, card-carrying web pages, complete with a document, a body, and stifling inconsistencies between Mac and Windows.

**Tag yourself** (Photo in background by Steve Hopson licensed under Creative Commons)

“How could you possibly need that many processes?” — every Slack customer, to us

Not only do we have one process per workspace, but we might also have a process for the modal dialog you’re interacting with, a process working quietly in the background, or a process to show you a notification when you’re on a platform that doesn’t support them (here’s to you, Windows 7). All these disparate processes often need access to the same state, so in a leap of faith, they each create a Redux store and set it up with a clever middleware called electron-redux. It uses Electron’s IPC to bounce actions between processes, like so:

If an action is dispatched in a renderer process, that renderer ignores it and forwards it to the main process
If an action is dispatched in the main process, it is handled there first, then replayed in the renderers

This makes the main process’ store the One True Store, and ensures that the others are eventually consistent. With this strategy, there’s no need to shuttle state or get into the serialization game. The only things that cross a process boundary are your actions, and hopefully those are already FSA-compliant. For us this means that our Redux code is virtually process-agnostic: the actions can come from any process; the reducers can live in any process; the work gets done all the same.
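
The two forwarding rules can be sketched in plain JavaScript. This is a toy model rather than electron-redux's actual API: direct function calls stand in for Electron's IPC, and `createStore`, `createProcessGroup`, `unreadReducer`, and the `MESSAGE_RECEIVED` action are all hypothetical names made up for the illustration.

```javascript
// Minimal stand-in for a Redux store: holds state and applies a reducer.
function createStore(reducer) {
  let state = reducer(undefined, { type: '@@INIT' });
  return {
    getState: () => state,
    applyAction: (action) => { state = reducer(state, action); },
  };
}

// Hypothetical reducer used only for this sketch.
function unreadReducer(state = { unread: 0 }, action) {
  return action.type === 'MESSAGE_RECEIVED'
    ? { unread: state.unread + 1 }
    : state;
}

// Simulates one main process and its renderers. Direct function calls
// stand in for IPC messages crossing the process boundary.
function createProcessGroup(reducer, rendererCount) {
  const main = createStore(reducer);
  const renderers = Array.from({ length: rendererCount }, () => createStore(reducer));

  // Rule 2: the main process handles the action first, then replays it
  // in every renderer.
  function dispatchInMain(action) {
    main.applyAction(action);
    renderers.forEach((renderer) => renderer.applyAction(action));
  }

  // Rule 1: a renderer does not reduce its own action; it only forwards it.
  function dispatchInRenderer(_rendererIndex, action) {
    dispatchInMain(action);
  }

  return { main, renderers, dispatchInMain, dispatchInRenderer };
}

const group = createProcessGroup(unreadReducer, 2);
group.dispatchInRenderer(0, { type: 'MESSAGE_RECEIVED' });
group.dispatchInMain({ type: 'MESSAGE_RECEIVED' });

console.log(group.main.getState());         // { unread: 2 }
console.log(group.renderers[1].getState()); // { unread: 2 }
```

Because every action passes through the main store before reaching any renderer, each store applies the same actions in the same order, which is what lets the renderers converge on the main process' state.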

It Was Super (Side-)Effective!

One oft-expressed critique of Redux is that asynchronous actions — and their side-effects — are a bit of an afterthought. There are dozens of solutions out there and since none of them are included in Redux, it’s up to you to choose what best fits your app. Slack’s desktop app preaches the gospel of Observable, so redux-observable was a natural fit for us. If you’re acquainted with Observables, you may have heard the mantra Everything is a Stream. And lo, what is a store but a stream of actions?

In redux-observable, you’re given that stream of actions as an input, and you write “epics” (like a saga but more Homeric) that manipulate it. It’s worth noting that the values emitted by this stream are the actions, not the reduced state. Here’s a contrived example, where we show a sign-in BrowserWindow on startup, if we’re not signed into any workspaces:

import { BrowserWindow } from 'electron';
import { REHYDRATE } from 'redux-persist/constants';
import { getWorkspacesCount } from '../reducers/workspaces';

const signInWindowEpic = (action$, store) => {
  // Rehydrate is just a $10 word for "we loaded saved state from a file"
  // Since we're a redux-persist app, it's one of the first actions that occurs 
  return action$.ofType(REHYDRATE)
    .filter(() => getWorkspacesCount(store) === 0)
    .map(() => createSignInWindow(store))
    .do((browserWindow) => browserWindow.show());
};

function createSignInWindow(store) {
  const browserWindow = new BrowserWindow( /* ... you get the idea */ );
  // Return the window so the epic can call show() on it
  return browserWindow;
}

This lets us compose sequences of actions, which is sometimes more valuable than looking at the byproduct of the actions (the state). Any objects returned from the stream are automatically dispatched as actions, but nothing says you have to emit an action. Oftentimes we just want to call an Electron API. Redux refers to this as a “side-effect,” but I refer to it as “doing my job.” It becomes really powerful when combined with a Redux store in each process, because now we can kick off main process side-effects from a renderer and vice-versa, in a decoupled way. It’s similar to an event bus or pub-sub, but across Chromium processes.

How about a more involved example — what if we needed to keep a running total of time spent in different workspaces, to determine which ones were most and least used? This could grow into a mess of timeouts and booleans, but since the stream of actions is an Observable, let’s leverage the suite of operators that come with it:

/**
 * Keep a running total of time spent on each workspace and, once the app is quit,
 * fire an action that updates the usage property in the store.
 */
const tallyWorkspaceUsageEpic = (action$, store, scheduler) => {
  return selectionChangedObservable(action$, store)
    .timeInterval(scheduler)
    .pairwise()
    .reduce(usagePayloadFromIntervals, {})
    .map((payload) => ({
      type: WORKSPACE.UPDATE_USAGE,
      payload
    }));
};

/**
 * An Observable that emits any time the selected workspace might change.
 */
function selectionChangedObservable(action$, store) {
  /**
   * We need to terminate the stream when the app is quit, otherwise reduce
   * won't kick in.
   */
  return action$.ofType(
    WORKSPACE.ADDED,
    WORKSPACE.REMOVED,
    WORKSPACE.SELECTION_CHANGED
  )
  .takeUntil(action$.ofType(APP.QUIT))
  .filter(() => getWorkspacesCount(store) > 1);
}

/**
 * An interval pair here represents a workspace changed event. Given a pair like:
 *
 * [ { value: 'Workspace 1', interval: 5000 },
 *   { value: 'Workspace 2', interval: 10000 } ]
 *
 * The value from the first corresponds to the workspace that was selected, and
 * the interval from the second represents the amount of time it was selected for.
 */
function usagePayloadFromIntervals(payload, intervals) {
  const [ first, second ] = intervals;
  const existingTime = payload[first.value] || 0;
  payload[first.value] = existingTime + second.interval;
  return payload;
}
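
To make the interval bookkeeping concrete, here is a plain-JavaScript walk-through with no RxJS involved. It is only a sketch: `withIntervals` and `toPairs` are hypothetical helpers that imitate what `timeInterval` and `pairwise` do to a finished list of events, and the timestamps are invented.

```javascript
// Hypothetical timestamped selection events (ms since the app started).
const selections = [
  { at: 0, workspace: 'Workspace 1' },
  { at: 5000, workspace: 'Workspace 2' },
  { at: 15000, workspace: 'Workspace 1' },
];

// Like timeInterval: pair each value with the time since the previous event.
function withIntervals(events) {
  return events.map((event, i) => ({
    value: event.workspace,
    interval: i === 0 ? event.at : event.at - events[i - 1].at,
  }));
}

// Like pairwise: produce [previous, current] for each consecutive pair.
function toPairs(items) {
  return items.slice(1).map((item, i) => [items[i], item]);
}

// Same accumulation as usagePayloadFromIntervals above.
function tally(payload, [first, second]) {
  payload[first.value] = (payload[first.value] || 0) + second.interval;
  return payload;
}

const usage = toPairs(withIntervals(selections)).reduce(tally, {});
console.log(usage); // { 'Workspace 1': 5000, 'Workspace 2': 10000 }
```

In this sketch a selection's time is only counted once a later event closes its interval, which is why the final switch back to Workspace 1 adds nothing to the total.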

You might be like, “Charlie, that sure looks fancy, but aren’t Observables impossible to debug?” And you’d have been mostly right less than a year ago. But this is JavaScript and in JavaScript, the only const is change. rxjs-spy makes debugging (i.e. logging and visualizing) streams as simple as adding a tag. A tagged stream can be monitored, paused, and replayed, right from the console. Testing Observables is a delight too, with the help of the utilities in RxSandbox (by our own OJ Kwon):

import { rxSandbox } from 'rx-sandbox';
import { tallyWorkspaceUsageEpic } from '../epics/tally-workspace-usage';

describe('tallyWorkspaceUsageEpic', () => {

  it('should accumulate time until the app is quit', () => {
    const { hot, flush, getMessages, e } = rxSandbox.create();
    
    const w = { type: WORKSPACE.SELECTION_CHANGED, payload: 'Hiro' };
    const x = { type: WORKSPACE.SELECTION_CHANGED, payload: 'Fiona' };
    const y = { type: WORKSPACE.SELECTION_CHANGED, payload: 'Da5id' };
    const z = { type: APP.QUIT };
    
    const usageAction = {
      type: WORKSPACE.UPDATE_USAGE,
      payload: { 'Hiro': 60, 'Fiona': 40, 'Da5id': 20 }
    };
    
    const action$ = hot('w-----x---y-z', { w, x, y, z });
    const expected = e( '------------(x|)', { x: usageAction });
    
    const result = getMessages(
      tallyWorkspaceUsageEpic(action$, store)
    );
    
    flush();
    expect(result).to.deep.equal(expected);
  });
});

What we’re doing here is creating a mock stream of actions, and providing it as the input to the epic. We define the stream with a marble string, which looks odd but is quite simple: any letter represents an action, and a - represents 10ms of virtual time. We can make assertions about the action we expect from the epic, and there’s no need for async or callbacks here, because flushing the scheduler runs the clock to completion.
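
If marble strings are new to you, it can help to decode one by hand. The sketch below is not how RxSandbox is implemented; `parseMarbles` is a hypothetical helper that assumes, as described above, that every character occupies one 10ms frame, and it only understands letters and dashes.

```javascript
// Assumption taken from the text: each character is one 10ms frame.
const FRAME_MS = 10;

// Turn a simple marble string into a list of timed emissions.
// Only letters and '-' are handled; grouping and completion markers
// such as '(', ')' and '|' are deliberately out of scope here.
function parseMarbles(marbles, values) {
  const events = [];
  [...marbles].forEach((char, frame) => {
    if (char !== '-') {
      events.push({ time: frame * FRAME_MS, value: values[char] });
    }
  });
  return events;
}

const events = parseMarbles('w-----x---y-z', {
  w: 'Hiro', x: 'Fiona', y: 'Da5id', z: 'quit',
});

console.log(events.map((e) => `${e.value}@${e.time}ms`).join(', '));
// Hiro@0ms, Fiona@60ms, Da5id@100ms, quit@120ms
```

The gaps between consecutive events (60, 40, and 20ms) line up with the usage numbers in the expected payload of the test.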

Refactor feat. TypeScript (Club Mix)

With the exception of LSD, there’s no shorter path to questioning your reality than embarking on a large refactor in a JavaScript codebase. Here a linter is like an overeager sidekick: it means well, but is mostly a distraction. “Don’t put parentheses there,” it chides, while the bug slips away. A type-checker, particularly when integrated with an editor, is the Watson you deserve.

Of course there’s an upfront cost — one that we had already paid — but that investment saw major returns throughout this project. Much of the work involved rearranging existing features, and a type-checker helped us avoid what would have typically been a long tail of bug fixes. It also makes working with Observables more natural. Never again will you ponder over the output of a flatMap (do I get the array or just one item?), the argument order for a reduce, or the name of that one operator that’s like throttle but starts with a D… (it’s debounce). When coupled with autocomplete in VS Code, writing JavaScript feels a lot like writing, say, C#. And I mean that in the nicest way possible.

No Main (Process), No Gain

Sometimes, when you haven’t used a workspace in a long time, we take the same approach as Chrome and unload that “tab’s” webContents to save memory. We still need to show notifications and badges for that workspace, so previously we would navigate to a slim-Slack page that responds to a handful of web-socket messages. Once the workspace was selected again, we stealthily disposed of the intermediate page and spun up the full webapp in its place.

Somewhere along the way, we had a realization: why not run all of the slim-Slacks in the main process, instead of each having their own page (and incurring the overhead of a renderer process)? This dovetailed nicely with our effort to make Redux actions process-agnostic: we could just as easily dispatch actions from the main process to update badges or show notifications. All we needed to do was connect to the web-socket from Node, something our colleagues down the hall knew a thing or two about.

With this change, customers signed in to a lot of workspaces will see a drop in both memory usage and number of processes:

Four workspaces = seven processes? That’s just bad math.

TL;DR

So, to wrap it all up: we rewrote most of our Electron app to move from the janky webview to the new-fangled BrowserView. We managed to do it in a relatively short timeframe, thanks to a combination of elbow grease and reasonable choices in our client stack, like:

Redux + electron-redux: Means we don’t have to think about where reducers live or where actions are dispatched
Rx + redux-observable: Turns our store into an interprocess event bus with functional superpowers
TypeScript: Helps us refactor code quickly and correctly

While it can be tempting to scrap a codebase and go back to green(field) pastures, particularly when faced with a mountain of bugs, this rarely works out for customers. When all was said and done, we reused more than 70% of our code, fixed most, if not all, of the webview’s shortcomings, doubled our test coverage, and substantially reduced memory usage. We think it’ll show in the user experience, but you, dear reader, can be the judge of that. ✌

If any of this sounds interesting and/or terrifying to you, come work with us!
