On average, 20% of a knowledge worker’s day is spent looking for the information they need to get their work done. If you think about a typical work week, that means an entire day is dedicated to this task!

To help our users find more time in their day, the Search, Learning, and Intelligence team set out to improve the quality of Slack’s search results. We built a new personalized relevance sort and a new section in search called Top Results, which presents both personalized and recent results in one view.

A Unique Search Problem

Search inside Slack is very different from web search. Each Slack user has access to a unique set of documents, and what’s relevant at any given time frequently changes. In web search, queries for “Prince,” “Powerball,” or “Pokémon Go” can be issued millions of times per day; queries within a Slack team, by contrast, are rarely repeated.

Even though Slack search lacks the aggregate search data that has been put to such effective use in web search engines, it does benefit from some other advantages:

  • We know more about the user’s interaction history with other users, channels, messages, and UI elements within Slack.
  • Unlike web search engines, we don’t have to deal with spam or SEO gaming.
  • Although the total size of the text corpus is large, each team’s corpus is relatively small and thus allows us to devote more computational resources to each message during ranking.
  • We control not only the search interface but also the presentation and structure of the target documents.
Recent and Relevant toggles in Slack

Relevant and Recent Search

Slack provides two strategies for searching: Recent and Relevant. Recent search finds the messages that match all terms, and presents them in reverse chronological order. If a user is trying to recall something that just happened, Recent is a useful presentation of the results.
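As a toy illustration of the Recent strategy, the sketch below filters an in-memory list for messages that contain every query term and sorts the matches newest first. In production this work is done by Solr; the `Message` record, the `tokens` helper, and the simple lowercase word tokenizer are hypothetical stand-ins chosen for this example.

```python
import re
from dataclasses import dataclass


@dataclass
class Message:
    # Hypothetical minimal message record for this sketch.
    text: str
    ts: float  # Unix timestamp when the message was posted


def tokens(text: str) -> set[str]:
    # Lowercased word tokens; a real search analyzer (stemming, etc.) does more.
    return set(re.findall(r"\w+", text.lower()))


def recent_search(messages: list[Message], query: str) -> list[Message]:
    """Return messages that contain every query term, newest first."""
    terms = tokens(query)
    # "Match all terms": the query's terms must be a subset of the message's.
    matches = [m for m in messages if terms <= tokens(m.text)]
    # Reverse chronological order: the most recent match comes first.
    return sorted(matches, key=lambda m: m.ts, reverse=True)
```

For example, searching for "roadmap" over messages posted at timestamps 100 and 200 that both mention the word returns the message from 200 first, while "platform roadmap" returns only messages containing both words.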

Relevant search relaxes the age constraint and takes into account the Lucene score of each document, which measures how well it matches the query terms (Solr powers search at Slack). Relevant search, used about 17% of the time, performed slightly worse than Recent on the search quality metrics we measured: the number of clicks per search and the click-through rate of the results in the top several positions. We recognized that Relevant search could benefit from using the searcher’s interaction history with channels and other users, which we call their “work graph.”

As a motivating example, suppose you are searching for “roadmap”. You’re most likely looking for your team’s roadmap. If a member of your team shared a document containing the word “roadmap” in a channel that you frequently read and write messages in, that result should rank above another team’s 2017 roadmap.

By incorporating a user’s work graph into Relevant search, we saw a 9% increase in searches that resulted in clicks and a 27% increase in clicks at position 1.

Learning to Rank

Our team was confident that incorporating additional features, such as the searcher’s affinity to the author of the result, their engagement in the channel, and certain characteristics of the message itself, would improve the ranking of results for Relevant search. To achieve this, we settled on a two-stage approach: first, use Solr’s custom sorting functionality to retrieve a set of messages ranked by only the few features that are easy for Solr to compute; then, re-rank those messages in the application layer according to the full set of features, weighted appropriately.
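The two-stage idea can be sketched in a few lines, with the caveat that stage one here is simulated in Python rather than run inside Solr. The `Candidate` record, the `cheap_score` field, the feature names, and the cutoff `k` are all hypothetical; the point is only the shape of the pipeline: a cheap score narrows the candidate set, and a weighted sum over the full feature set orders what survives.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class Candidate:
    # Hypothetical candidate record; field and feature names are illustrative.
    message_id: str
    cheap_score: float  # stage-1 score the index can compute cheaply
    features: dict[str, float] = field(default_factory=dict)  # full stage-2 features


def stage_one(candidates: list[Candidate], k: int) -> list[Candidate]:
    """Keep the top-k candidates by the cheap, index-side score."""
    return heapq.nlargest(k, candidates, key=lambda c: c.cheap_score)


def stage_two(candidates: list[Candidate], weights: dict[str, float]) -> list[Candidate]:
    """Re-rank survivors by a weighted sum over the full feature set."""
    def full_score(c: Candidate) -> float:
        return sum(weights.get(name, 0.0) * value for name, value in c.features.items())

    return sorted(candidates, key=full_score, reverse=True)


def two_stage_rank(candidates: list[Candidate], weights: dict[str, float], k: int = 50) -> list[Candidate]:
    return stage_two(stage_one(candidates, k), weights)
```

One consequence of this design is visible with a tiny example: a message with a very high author affinity but a low cheap score can be cut in stage one and never reach the re-ranker, so the stage-one cutoff bounds how much the richer model can help.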

In building a model to determine these weights, our first task was to build a labeled training set. Because of the unique nature of search in Slack, we could not rely on techniques that judge relevance from the click performance of results for repeated queries. We also knew that the relevance of a document would drift over time. So we focused on a labeling strategy, known as the pairwise transform, that uses clicks to judge the relative relevance of documents within a single search. Here’s an example:

Illustration of the pairwise transform

If a query shows messages M1, M2, and M3, and the searcher clicks M2, we assume something about M2 made it a better result than M1. The difference between their feature vectors, F2 - F1, should then capture what made M2 better, so that difference vector is given a positive label; conversely, F1 - F2 is given a negative label. There are several strategies for choosing which pairs to include in the pairwise transform, and we tried a few before settling on one: we paired each click at position n with the message at position n-1 and the message at position n+1.
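A minimal sketch of this pairing rule, assuming each search is logged as a list of feature vectors (index 0 is the top result) plus the set of clicked positions. Skipping a neighbor that was itself clicked is an assumption of this sketch, added so that two clicked messages never produce contradictory labels; the function name is hypothetical.

```python
def pairwise_transform(features: list[list[float]], clicked: set[int]) -> list[tuple[list[float], int]]:
    """Turn one search's results into labeled difference vectors.

    features: feature vectors in display order (index 0 = top result).
    clicked:  0-based positions the searcher clicked.
    Returns (difference_vector, label) pairs with label +1 or -1.
    """
    examples = []
    for n in sorted(clicked):
        # Pair the click at position n with its neighbors at n-1 and n+1.
        for neighbor in (n - 1, n + 1):
            # Skip neighbors outside the list and (an assumption of this
            # sketch) neighbors that were themselves clicked.
            if neighbor < 0 or neighbor >= len(features) or neighbor in clicked:
                continue
            diff = [a - b for a, b in zip(features[n], features[neighbor])]
            examples.append((diff, +1))                # F_clicked - F_other
            examples.append(([-d for d in diff], -1))  # F_other - F_clicked
    return examples
```

For the M1, M2, M3 example with a click on M2 (position 1, zero-based), this yields four examples: F2 - F1 and F2 - F3 labeled +1, and their negations labeled -1.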

Overcoming Click-Position Bias

One issue we struggled with was people’s tendency to click on messages at the top of the search results list: a message at position n is on average 30% more likely to be clicked than a message at position n+1. Because position is such a strong predictor of whether a message was clicked, our initial models learned to reconstruct the original order of the list. To counteract this, we oversampled clicks on results lower in the list so that clicks were distributed more evenly across positions.
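The sketch below shows one simple way to even out clicks by position: group training examples by click position and resample each smaller group with replacement until it matches the largest. Resampling up to the maximum is an assumption here; the post does not say what target distribution was used. The function name and the `(position, example)` tuple format are hypothetical.

```python
import random
from collections import defaultdict


def balance_by_position(examples, seed=0):
    """Oversample training examples so every click position is equally represented.

    examples: list of (click_position, example) tuples.
    Positions with fewer clicks are resampled with replacement until they
    match the count of the most-clicked position.
    """
    if not examples:
        return []
    rng = random.Random(seed)  # seeded so the resampling is reproducible
    by_position = defaultdict(list)
    for position, example in examples:
        by_position[position].append(example)
    target = max(len(group) for group in by_position.values())
    balanced = []
    for position in sorted(by_position):
        group = by_position[position]
        extra = [rng.choice(group) for _ in range(target - len(group))]
        balanced.extend((position, ex) for ex in group + extra)
    return balanced
```

Duplicating examples is the bluntest form of rebalancing; weighting each example by the inverse of its position's click count changes the training loss in a similar way without copying data.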

In addition, pairing each click at position n with the messages at positions n-1 and n+1 ensured that our training examples contained equal numbers of pairs in which the clicked message sat in the lower position and in the upper position. This is somewhat unsound, since the user may never have seen the message at position n+1. For now, it is an acceptable tradeoff, and we are actively pursuing other approaches to this problem.

Training the Model and Results

Using this dataset, we trained a model with SparkML’s built-in SVM algorithm. The model found the following signals to be the most significant:

  • The age of the message
  • The Lucene score of the message with respect to the query
  • The searcher’s affinity to the author of the message (we defined affinity of one user for another as the propensity of that user to read the other’s messages — a subject for another post!)
  • The priority score of the searcher’s DM channel with the message author
  • The searcher’s priority score for the channel the message appeared in
  • Whether the message author is the same as the searcher
  • Whether the message was pinned, starred or had emoji reactions
  • The propensity of searchers to click on other messages from the channel the message appeared in
  • Aspects of the content of the message, such as word count and the presence of line breaks, emoji, and formatting
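To make the training step concrete, here is a self-contained sketch of fitting a linear SVM on pairwise difference vectors and then scoring messages with the learned weights. It swaps SparkML for a pure-Python Pegasos-style subgradient solver, which optimizes the same kind of objective (hinge loss plus L2 regularization) but is not Spark's implementation. The hyperparameters are illustrative, and the sketch omits an intercept because each difference vector appears alongside its negation. A feature vector could be ordered, hypothetically, as `[message_age, lucene_score, author_affinity, ...]`, one entry per signal listed above.

```python
import random


def train_pairwise_svm(examples, dim, lam=0.01, epochs=200, seed=0):
    """Fit a linear SVM (no intercept) with Pegasos-style subgradient descent.

    examples: (difference_vector, label) pairs with label in {+1, -1},
              e.g. the output of a pairwise transform.
    Returns a weight vector w; a message is then scored as dot(w, features).
    """
    rng = random.Random(seed)
    w = [0.0] * dim
    data = list(examples)
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size schedule
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            # Shrink toward zero for the L2 regularizer...
            w = [(1 - eta * lam) * wi for wi in w]
            # ...then step toward the example if it violates the margin.
            if margin < 1:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def score(w, features):
    """Linear relevance score used to re-rank messages."""
    return sum(wi * fi for wi, fi in zip(w, features))
```

Because the model is trained on differences, a higher `score` for one message than another is exactly the model's prediction that the first would be clicked over the second, so sorting candidates by `score` yields the re-ranked list.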

Notably, aside from the Lucene “match” score, we have not yet incorporated any other semantic features of the message itself in our models.

Measuring our Progress

We released our machine-learned re-ranker for Relevant sort to 50% of users on November 30th. For our top-line metrics, we looked at sessions per user, searches per session, clicks per search, and click-through rate among the top 1, 2, and 5 search results. As mentioned earlier, we saw significant gains over the existing Relevant search: a 9% increase in searches that received at least one click and, among those clicked searches, a 27% increase in clicks at position 1.
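The metrics above can be computed from per-search logs along these lines. The `SearchLog` schema is hypothetical, and defining "click-through rate among the top k" as the share of searches with at least one click in the top k positions is an assumption, since the post does not spell out the exact definition. `relative_lift` shows how a figure like "a 9% increase" compares a treatment arm against control.

```python
from dataclasses import dataclass


@dataclass
class SearchLog:
    # Hypothetical per-search log record.
    user_id: str
    session_id: str
    clicked_positions: list[int]  # 1-based positions of clicked results


def search_metrics(searches: list[SearchLog], ks=(1, 2, 5)) -> dict[str, float]:
    n = len(searches)
    sessions = {(s.user_id, s.session_id) for s in searches}
    users = {s.user_id for s in searches}
    metrics = {
        "sessions_per_user": len(sessions) / len(users),
        "searches_per_session": n / len(sessions),
        "clicks_per_search": sum(len(s.clicked_positions) for s in searches) / n,
        # Share of searches with at least one click ("clicked searches").
        "clicked_search_rate": sum(1 for s in searches if s.clicked_positions) / n,
    }
    for k in ks:
        # Assumed definition: share of searches with a click in the top k.
        metrics[f"ctr_top_{k}"] = sum(
            1 for s in searches if any(p <= k for p in s.clicked_positions)
        ) / n
    return metrics


def relative_lift(treatment: float, control: float) -> float:
    """Relative change of a metric in the treatment arm versus control."""
    return (treatment - control) / control
```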

Top Results for the query “platform roadmap”

Putting it all together

We wanted to make sure our users could find what they’re looking for more quickly without having to worry about sorting by relevancy or recency. The new Top Results module solves this problem by showing the top 3 messages from Relevant search above the normal Recent results.

Every time you run a Recent search, we also run a Relevant search in parallel. We decide whether to show the Top Results section based on simple heuristics such as result diversity and quantity. Our initial experiment results show a significant increase in search sessions per user, an increase in clicks per search, and a reduction in searches per session, indicating that the Top Results module is helping users find what they are looking for faster.
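A rough sketch of this flow, assuming both searches are I/O-bound calls to a search backend so that a thread pool can run them concurrently. The search callables are passed in as parameters, the result shape (dicts with a `"channel"` key) is hypothetical, and the two thresholds (`min_results`, `min_channels`) are invented stand-ins for the "result diversity and quantity" heuristics, not the values Slack uses.

```python
from concurrent.futures import ThreadPoolExecutor


def search_with_top_results(query, recent_search, relevant_search,
                            min_results=3, min_channels=2, top_n=3):
    """Run Recent and Relevant searches concurrently; decide on Top Results.

    recent_search / relevant_search: callables taking the query and
    returning lists of dicts with at least a "channel" key (hypothetical).
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        recent_future = pool.submit(recent_search, query)
        relevant_future = pool.submit(relevant_search, query)
        recent, relevant = recent_future.result(), relevant_future.result()

    top = relevant[:top_n]
    # Hypothetical heuristics standing in for "result diversity and quantity".
    enough = len(relevant) >= min_results
    diverse = len({r["channel"] for r in top}) >= min_channels
    # Top Results (if shown) are rendered above the normal Recent results.
    return {"top_results": top if enough and diverse else [], "recent": recent}
```

Running the Relevant query alongside every Recent query roughly doubles search load, which is the price of being able to show Top Results without adding latency.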

Future Work

Top Results is just the beginning of the opportunities we see for making search in Slack better. Imagine searching for “401k matching” and, instead of receiving only relevant messages or files, also getting a list of people in HR who can answer your question, a list of channels where you might find the information you are looking for, or even a list of commonly asked questions on that topic with links to the channels where each one was answered. We still have a lot of work to do to reduce that 20% of information-seeking time and give Slack users a more pleasant, productive experience.

If you want to help us tackle data-driven product development challenges and help make the working life of millions of people simpler, more pleasant and more productive, check out our SLI job openings.