Have you ever been given a relatively inactive project and asked to fix a bug? What about having to update code that’s used by thousands of projects without the guidance of the original author?

I stepped into a circumstance like that when I joined the Developer Relations Tools Team at Slack. At the start of 2019, my manager tasked me with taking over as the lead maintainer of the Slack Python SDK: a set of open-source tools, documentation, and code samples that aid in the creation of Slack apps written in Python.

This blog post is a view into my journey throughout the last few months: what motivated me to make big changes, the process I took to implement them, and how getting feedback from the community changed my original plans.

The desire for change

The Slack Python SDK, originally created in 2016, provides a simple HTTP and WebSocket client to help communicate with Slack’s Platform APIs. The goal was to make it easy to get started building a Slack app in Python, as well as to ensure that no matter how complex your app becomes, it’s scalable and maintainable.

As our platform grew over the years, the Slack Python SDK had challenges keeping up. Adding a new feature felt akin to making a late-stage move in Jenga.

Pagination, a feature available on the platform since July 2018, was difficult to support due to the complexity around how requests were made and responses were parsed. There was also code that existed for deprecated features, such as workspace token apps.

Ultimately though, my attempts to resolve existing issues in GitHub became the biggest motivator to make major changes. Many of the reported errors were difficult to reproduce, and the most common ones were all related to our WebSocket interface. Proxy and SSL issues became the bane of my existence, largely because we were at the mercy of the client libraries we depended on internally. This prompted me to dig deep into the structure of our code and the decisions that were previously made.

Assessing the situation: refactor or rewrite?

Code structure

Before diving into specific challenges, I wanted to make sure I understood the structure of the project. I spent a week evaluating our current state, the existing issues and PRs, and the new features we wanted to add. I questioned the purpose of each function, the state it needed, and in which object it lived. I also loosely mapped how data flows when interacting with our Web API versus interacting with our RTM API.

SDK design

At a high level, all Slack apps can be broken down into a few primary functions:

  1. Query for information from Slack (Web API).
  2. Enact change to the state of Slack (Web API).
  3. Listen for state changes inside of Slack (RTM & Events API).

The Slack Python SDK should enable these functions with the least amount of effort. So I walked through the implementation to check if this was true.

Querying for information and enacting change

To query for information from and enact change in a Slack workspace you’d utilize the Web API. The Web API is a collection of HTTP RPC-style methods, all with URLs in the form of https://slack.com/api/METHOD_FAMILY.method.

To use the Web API, you instantiate a SlackClient and execute a function called api_call while passing the name of the Web API method, along with any data the method required.

Here’s an example of sending a message to a channel:

import os

from slackclient import SlackClient

slack_token = os.environ["SLACK_API_TOKEN"]
client = SlackClient(slack_token)

client.api_call(
   "chat.postMessage",
   channel="C0XXXXXX",
   text="Hello from Python! :tada:"
)

What worked well?

  1. Accepting the Web API method as a parameter kept maintenance costs low when methods were added or removed.
  2. Accepting keyword arguments enabled flexibility with what users could send to the API.

What didn’t work so well?

  1. Performance could be improved if we reused sessions on subsequent API requests.
  2. Pagination was complex to add due to the structure of how requests were being made. The same would be true if we decided to add direct support for a Block Kit message builder class.
  3. When uploading files to Slack, every app had to implement basic file handling.
  4. There was no context about the Web API. Every time a developer wanted to interact with the API they were required to visit the documentation site for information. This killed productivity.
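To make the pagination point concrete, here is a minimal sketch of cursor-based pagination. It assumes a response that carries a `response_metadata.next_cursor` field, which is empty on the last page. The `fetch_page` function and the `FAKE_PAGES` data are hypothetical stand-ins for a real HTTP request, not part of the SDK.

```python
from typing import Callable, Dict, Iterator

# Hypothetical in-memory "API": each cursor maps to one page of results.
FAKE_PAGES = {
    "": {"ok": True, "members": ["U1", "U2"],
         "response_metadata": {"next_cursor": "page2"}},
    "page2": {"ok": True, "members": ["U3"],
              "response_metadata": {"next_cursor": ""}},
}


def fetch_page(cursor: str = "") -> Dict:
    """Stand-in for an HTTP request to a paginated Web API method."""
    return FAKE_PAGES[cursor]


def paginate(fetch: Callable[[str], Dict]) -> Iterator[Dict]:
    """Yield pages until the response stops providing a next cursor."""
    cursor = ""
    while True:
        page = fetch(cursor)
        yield page
        cursor = page.get("response_metadata", {}).get("next_cursor", "")
        if not cursor:
            break


all_members = [member for page in paginate(fetch_page) for member in page["members"]]
```

With a single generic `api_call` entry point, every app had to write a loop like this itself; a client that owns request construction can offer it once.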

Listening for state changes inside of Slack

State changes in Slack, such as a message being posted to a channel, are represented by what we call “events”. Events are JSON payloads containing details about the change that occurred.

There are two ways to receive events:

  1. Allow Slack to send you HTTP requests through our Events API (preferred).
  2. Stream events through a WebSocket connection with our RTM API.

A quick note about the Events API: It allows you to subscribe only to the event types you need, and is governed by OAuth permission scopes. It requires a running web server. The Events API is not yet supported in our SDK, but it’s on our backlog and will be available soon. You can, however, use our Flask-based extension slackeventsapi to get access to the Events API.

The RTM API solves a major problem for our customers who do not allow for incoming HTTP requests from 3rd parties. It is a WebSocket-based API that allows you to stream events from Slack in real time from an outgoing connection that you initiate.

To use the RTM API, you instantiate a SlackClient and establish a connection with the rtm_connect() method. While connected you then read from the stream as events come in by calling rtm_read().

Below is a typical example of what this looked like:

import os

from slackclient import SlackClient

slack_token = os.environ["SLACK_API_TOKEN"]
client = SlackClient(slack_token)

if client.rtm_connect():
    while client.server.connected is True:
        for data in client.rtm_read():
            if "type" in data and data["type"] == "message":
                pass  # ...
else:
    print("Connection Failed")

What worked well?

  1. Developers interacted through one main interface, i.e. SlackClient.

What didn’t work so well?

  1. Every app built on this client was required to implement a loop to read messages.
  2. All apps had to inspect internal state to determine the connection status.
  3. SSL certification errors started to occur more frequently due to a change in our internal dependency on websocket-client.
  4. HTTPS proxy support was limited also due to constraints with the same websocket-client dependency.
  5. Each type of event that your app needs to respond to was appended to an if statement. This increased complexity and reduced readability.

Opening an RFC

After assessing the code, it was clear that using the existing client to make HTTP requests didn’t provide as much value as it could to developers. Further, using the client to listen for state changes inside of Slack with the RTM API was more complex than it needed to be. To best leverage Python’s features for creating clean, efficient code, it was time to restructure our codebase. However, since I was new to the project, I didn’t want to impose this decision unilaterally. Instead I opened issue #384, “RFC: Python Slack Client v2.0”, to break down how I thought we could improve our client.

Client decomposition

In the v1 implementation, SlackClient lacked a separation of concerns. It’s a single class that contains both an HTTP client to support the Web API and a WebSocket client to support the RTM API. These objects, however, do not have a mutually beneficial relationship. So the first thing I proposed was splitting them into two separate classes. This would allow us to encapsulate all relevant information for both clients behind well-defined interfaces which would improve the Slack Python SDK’s maintainability.

Web client

In the RFC, the proposed WebClient would be a thin HTTP client wrapper that focused solely on preparing requests and parsing responses from Slack’s Web API. At the time, this was built using the requests library.

I saw the opportunity to reduce the boilerplate code by implementing the Web API methods directly. This would also afford us the ability to provide more information about the API methods while they were being used, as well as handle common functionality such as opening a file from a specified path. Here’s an example of what this would look like if implemented as suggested:
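As a rough sketch of the idea, and not the SDK’s actual code, explicit methods could look something like the following. The class name `SketchWebClient`, the `_send` helper, and the `sent` list are hypothetical; `_send` records requests instead of making HTTP calls so the example runs offline.

```python
from typing import Any, Dict, List, Tuple


class SketchWebClient:
    """Hypothetical illustration only; not the SDK's actual WebClient."""

    def __init__(self, token: str):
        self.token = token
        # Records requests instead of sending them over HTTP.
        self.sent: List[Tuple[str, Dict[str, Any]]] = []

    def _send(self, api_method: str, **params: Any) -> Dict[str, Any]:
        # A real client would issue an HTTP request here.
        self.sent.append((api_method, params))
        return {"ok": True}

    def chat_postMessage(self, *, channel: str, text: str, **kwargs: Any) -> Dict[str, Any]:
        """Sends a message to a channel.

        Because this is a real method, editors can surface this docstring
        and the expected arguments while you type.
        """
        return self._send("chat.postMessage", channel=channel, text=text, **kwargs)

    def files_upload(self, *, file: str, **kwargs: Any) -> Dict[str, Any]:
        """Uploads a file, opening it from the given path on the caller's behalf."""
        with open(file, "rb") as file_handle:
            return self._send("files.upload", file=file_handle.read(), **kwargs)


client = SketchWebClient(token="xoxb-abc-123")
response = client.chat_postMessage(channel="C0XXXXXX", text="Hello from Python! :tada:")
```

Each Web API method becomes a thin wrapper, so the method name, its arguments, and its documentation all live where an editor can find them.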

RTM client

Inspired by the open-source Ruby Slack SDK by Daniel Doubrovkine, I wanted to abstract away the complex looping that was previously required to respond to events from the RTM API. I wanted to allow developers to focus more on their application and less on reading from the WebSocket stream. Regardless of the APIs used, structuring your app as an event-based system is the simplest and most efficient way to build apps and bots that respond to activities in Slack.

The proposed RTMClient would allow you to simply link your application’s callbacks to corresponding Slack events. Below is pseudo-code of what this would look like:

import slack

def message(**payload):
   data = payload["data"]
   web_client = payload["web_client"]

rtm_client = slack.RTMClient(token="xoxb-123")
rtm_client.on(event="message", callback=message)
rtm_client.start()

In v2 I also wanted to ensure that:

  • Every piece of state was critical to the object it was associated with, and anything that wasn’t was deprecated.
  • Internal methods and variables were made “private” (i.e. prefixed with an underscore).
  • The public API was defined explicitly and its intended usage made clear with documentation.

Getting feedback

The RFC was created and feedback from our open-source community started to roll in. I also received feedback on this proposal internally at Slack from a council of Platform engineers and Python developers. Initial feedback was overwhelmingly positive. People were excited that we were making a major update to the project.

In fact, most of the feedback from our community was from developers who wanted to take advantage of some of Python 3’s new features. Async support ended up being the top request. There was also interest in seeing Type Hints and Return Types added. So I posed the question internally at Slack…

“Can we drop support for Python 2 in this new version?”

Well it turns out this question is still one that ends friendships. I’m kidding, but seriously, it was a deeply debated decision.

On the one hand, we previously supported this version and it’s still in use. Deprecating it would be a major inconvenience to some of our developers. Let’s also not forget that not everyone has control of their environments, making it hard, or even impossible, to upgrade to Python 3.

On the other hand, Python 2 will reach its end of life at the end of 2019. The active Python 2.7 usage in our project is trending down, and has gone from about 50% to 35% over the last few months. Why hold back the majority of our community from being able to take advantage of new features in Python 3?

The settlement

As the lead of this project, I believe it’s important that we strive to follow and encourage the best practices. The Python community has decided to move away from 2.7, therefore so will our project. I acknowledge that this may not work for everyone, and for this population we’ll continue to support bug fixes in our previous version until Dec 31st, 2019 (Python 2’s EOL).

Redesigning for Python 3

Okay so now that we’re targeting Python 3, let’s go through some quick wins.

Keyword-only arguments

PEP 3102 allows us to enforce the use of keyword arguments. This makes usage of API methods and the arguments you’re passing in much more explicit. I decided to take advantage of this because it gives developers a much clearer picture of what each argument represents. This increases a project’s maintainability over time. If Slack’s API changes, their code remains flexible.
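Here is a small sketch of what keyword-only arguments look like in practice. The `post_message` function is a hypothetical helper, not an SDK method; the bare `*` in its signature is the PEP 3102 syntax that rejects positional calls.

```python
def post_message(*, channel, text):
    """Hypothetical helper; the bare * makes every argument keyword-only."""
    return {"channel": channel, "text": text}


# Keyword call: each value is labeled at the call site.
payload = post_message(channel="C0XXXXXX", text="Hello!")

# Positional call: Python raises a TypeError before the function body runs.
rejected = False
try:
    post_message("C0XXXXXX", "Hello!")
except TypeError:
    rejected = True
```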

Type hints

PEP 484 allows us to provide argument type hints. In the same spirit of optimizing the Slack Python SDK for developer experience, I’ve implemented type hints for every method. This allows us to be explicit about what our API expects when sending data.
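As an illustration, here is a hypothetical `build_message` function annotated with type hints. It is not an SDK method; it only shows how annotations document what each argument expects and how tools can read them back.

```python
from typing import Dict, List, Optional, get_type_hints


def build_message(
    *,
    channel: str,
    text: str,
    thread_ts: Optional[str] = None,
    blocks: Optional[List[dict]] = None,
) -> Dict[str, object]:
    """Hypothetical helper that assembles a message payload."""
    message: Dict[str, object] = {"channel": channel, "text": text}
    # Only include optional fields that were actually provided.
    if thread_ts is not None:
        message["thread_ts"] = thread_ts
    if blocks is not None:
        message["blocks"] = blocks
    return message


# Editors and type checkers read the same annotations exposed here.
hints = get_type_hints(build_message)
message = build_message(channel="C0XXXXXX", text="Hello!")
```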

Async support

The next thing added was async support. However before we jump into how we implemented async, let’s walk through some basics first.

What is async?

From a high level, asynchronous programming is a way to write code that allows other tasks to execute while it’s still performing some slow operation. It eventually regains control and resumes execution.

Okay, but why should you use it?

Async is not a one-size-fits-all solution and should be used only when it adds value. An app running CPU-intensive operations will not gain much from asynchronous programming.

Given the heavy I/O nature of Slack apps, taking advantage of async programming can help drastically improve the performance, responsiveness, and efficiency of your app. This is because your app will not sit idly by while sending data out to or waiting for a response back from Slack.

Asynchronous I/O

In Python, asyncio is a library that provides a foundation to write concurrent code using the async/await syntax. To add async support, I had to find an HTTP client and a WebSocket client that supported asyncio out of the box.
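To illustrate why I/O-heavy apps benefit, here is a minimal asyncio sketch. The `fake_api_call` coroutine is hypothetical, and `asyncio.sleep` stands in for waiting on a network response; no real requests are made.

```python
import asyncio


async def fake_api_call(name: str, delay: float) -> str:
    # asyncio.sleep stands in for waiting on a network response.
    await asyncio.sleep(delay)
    return name


async def main() -> list:
    # All three "requests" wait at the same time instead of one after another.
    return await asyncio.gather(
        fake_api_call("users.list", 0.1),
        fake_api_call("conversations.list", 0.1),
        fake_api_call("chat.postMessage", 0.1),
    )


results = asyncio.run(main())
```

Because the waits overlap, the whole batch should take roughly as long as the slowest call rather than the sum of all three.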

Enter AIOHTTP

After some research and experimentation I ultimately settled on AIOHTTP. AIOHTTP is an asynchronous client/server framework for asyncio and Python. AIOHTTP supports both HTTP and WebSocket protocols. It’s exactly what we need.

Building the WebClient

Building the WebClient class was pretty straightforward. Simply put, it’s a class that implements methods that mirror Slack’s Web API methods. When executed, they construct and submit an HTTP request using the internal HTTP client class provided by AIOHTTP. When a response is received, it’s parsed into a JSON-like object we call SlackResponse.

Here’s what usage looks like:

import slack
import asyncio

async def run_api_test():
   client = slack.WebClient(token="xoxb-abc-123")
   response = await client.api_test()
   assert response["ok"]

if __name__ == "__main__":
   event_loop = asyncio.get_event_loop()
   event_loop.run_until_complete(run_api_test())

While this works for projects that embrace asynchronous programming, it wouldn’t fit very well in those projects that don’t. In an effort to make our SDK more flexible, I pulled some of the async logic into the WebClient class. By default, async is turned “off”, which means that all API requests will run until completion.

Here’s what making the same API requests looks like run synchronously:

import slack

def run_api_test():
   client = slack.WebClient(token="xoxb-abc-123")
   response = client.api_test()
   assert response["ok"]

if __name__ == "__main__":
   run_api_test()

Now to enable async again, you simply set the client variable run_async to True. This will cause all API requests to return a Future just like before.

Here’s what async usage looks like now:

import slack
import asyncio

async def run_api_test():
   client = slack.WebClient(token="xoxb-abc-123", run_async=True)
   response = await client.api_test()
   assert response["ok"]

if __name__ == "__main__":
   event_loop = asyncio.get_event_loop()
   event_loop.run_until_complete(run_api_test())
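One way a switch like run_async could work internally is sketched below. This is an assumption for illustration, not the SDK’s implementation: `SketchClient` and `_request` are hypothetical names, and `asyncio.sleep(0)` stands in for awaiting an HTTP response.

```python
import asyncio


class SketchClient:
    """Hypothetical illustration of a run_async switch; not the SDK's code."""

    def __init__(self, run_async: bool = False):
        self.run_async = run_async

    async def _request(self, api_method: str) -> dict:
        await asyncio.sleep(0)  # Stand-in for awaiting an HTTP response.
        return {"ok": True, "method": api_method}

    def api_test(self):
        coroutine = self._request("api.test")
        if self.run_async:
            # Schedule on the running loop; the caller awaits the result.
            return asyncio.ensure_future(coroutine)
        # Block until the request completes and return the result directly.
        return asyncio.run(coroutine)


# Synchronous usage: the call returns the response itself.
sync_response = SketchClient().api_test()


# Asynchronous usage: the call returns a future to await.
async def main():
    client = SketchClient(run_async=True)
    return await client.api_test()


async_response = asyncio.run(main())
```

The same method serves both styles; only the return value changes between a finished response and an awaitable.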

Building the RTMClient

Building the RTMClient was a bit more complex. We still had to make it easy for everyone to build apps, whether or not they wanted to brave the new world of asyncio. I wanted our SDK to do the heavy lifting for you when you first get started, and to also get out of the way when you want to take control.

Linking callbacks to events

The first feature I implemented was the ability to link your application’s behaviors to events that occur in Slack. To do this, I created a private dictionary called _callbacks on the RTMClient class, as well as a class method called on() that’s responsible for updating the callbacks container. This was done at the class level to ensure that applications respond consistently to events regardless of the teams they’re connected to. I generally recommend that all logic that’s related to specific teams be handled in your application layer. Linking callbacks to events should be akin to mapping routes to controller actions in the MVC pattern.

from slack import RTMClient

def say_hello(**payload):
   data = payload["data"]
   web_client = payload["web_client"]
   if "Hello" in data["text"]:
       channel_id = data["channel"]
       thread_ts = data["ts"]
       user = data["user"]

       web_client.chat_postMessage(
           channel=channel_id,
           text=f"Hi <@{user}>!",
           thread_ts=thread_ts
       )

RTMClient.on(event="message", callback=say_hello)

I also created the run_on decorator to make it more convenient to link your callbacks to events.

from slack import RTMClient

@RTMClient.run_on(event="message")
def say_hello(**payload):
   # ...

It’s important to note that your callback must accept keyword-arguments or “**kwargs”. If it does not, an exception will be raised immediately. I made this decision to ensure we could always pass the following collection of objects: the corresponding RTMClient instance, a WebClient, and an object called data which will hold any related data to the occurring event.
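The registration pattern described above can be sketched as follows. This is a simplified illustration under assumptions, not the SDK’s source: `SketchRTMClient` and its `dispatch` method are hypothetical, and `web_client` is passed as `None` where a real client would pass a WebClient.

```python
import inspect
from collections import defaultdict
from typing import Callable


class SketchRTMClient:
    """Hypothetical illustration; not the SDK's RTMClient."""

    # Class-level container, so every instance shares the same callbacks.
    _callbacks = defaultdict(list)

    @classmethod
    def on(cls, *, event: str, callback: Callable) -> None:
        # Reject callbacks that cannot accept arbitrary keyword arguments.
        parameters = inspect.signature(callback).parameters.values()
        if not any(p.kind is inspect.Parameter.VAR_KEYWORD for p in parameters):
            raise TypeError(f"{callback.__name__} must accept **kwargs")
        cls._callbacks[event].append(callback)

    @classmethod
    def run_on(cls, *, event: str):
        # Decorator form of on(); returns the callback unchanged.
        def decorator(callback: Callable) -> Callable:
            cls.on(event=event, callback=callback)
            return callback
        return decorator

    def dispatch(self, event: str, data: dict) -> list:
        # A real client would pass a WebClient instead of None.
        return [
            callback(rtm_client=self, web_client=None, data=data)
            for callback in self._callbacks[event]
        ]


@SketchRTMClient.run_on(event="message")
def say_hello(**payload):
    return f"Hi <@{payload['data']['user']}>!"


replies = SketchRTMClient().dispatch("message", {"user": "U123"})
```

Checking the signature at registration time means a badly shaped callback fails immediately, rather than when the first event arrives.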

Establishing an RTM WebSocket connection

To connect to Slack’s RTM server, users simply call the start() function:

from slack import RTMClient

rtm_client = RTMClient(token="slack_token")
rtm_client.start()

This function is responsible for retrieving the WebSocket URL from Slack’s Web API and establishing a connection to the message server.

Dispatching events and executing callbacks

Almost everything that happens in Slack will result in an event being sent to all connected clients. Throughout a connection’s lifetime, the RTMClient is responsible for dispatching these events as they occur. This means we execute any related callbacks when the specified trigger occurs.

There are also 3 WebSocket-related events that occur when the state of the WebSocket connection changes: open, close, and error.

In my current approach, I forked the execution of callbacks into two separate private functions. One was responsible for executing callbacks asynchronously and the other for executing callbacks synchronously. The full implementation can be found in the RTMClient method _dispatch_event().

The function _execute_callback_async runs the following steps:

  1. Ensure the callback is a coroutine, manually converting it if it isn’t.
  2. Schedule its execution on the event loop.

The function _execute_callback runs the following steps:

  1. Create a ThreadPoolExecutor.
  2. Execute the callback with the executor.
  3. Wait until it’s complete.
  4. Return the results.

This worked pretty well initially when testing, but I’ve started to see that this approach is flawed. Manually changing a callback into a coroutine is a misguided attempt to optimize something that really didn’t need to be optimized. In hindsight, I also see that there are a number of places where executing code with ThreadPoolExecutor is necessary. Creating this object over and over again is wasteful and unnecessary. I’m currently working on an update to resolve these issues. I’ll likely handle the execution of non-async callbacks by using the asyncio event loop method run_in_executor().
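A minimal sketch of that direction, assuming nothing about the eventual SDK code, might look like this. The `dispatch` and `blocking_callback` names are hypothetical. Passing `None` as the executor tells the event loop to use its default executor, so no new ThreadPoolExecutor is created per call.

```python
import asyncio
import functools
import time


def blocking_callback(**payload):
    time.sleep(0.05)  # Simulates blocking work in a non-async callback.
    return payload["data"]["text"].upper()


async def async_callback(**payload):
    return payload["data"]["text"].lower()


async def dispatch(callback, **payload):
    if asyncio.iscoroutinefunction(callback):
        # Coroutine callbacks are awaited directly on the event loop.
        return await callback(**payload)
    loop = asyncio.get_running_loop()
    # run_in_executor takes positional arguments only, so bind kwargs first.
    # None selects the loop's default executor, which is reused across calls.
    return await loop.run_in_executor(None, functools.partial(callback, **payload))


sync_result = asyncio.run(dispatch(blocking_callback, data={"text": "hello"}))
async_result = asyncio.run(dispatch(async_callback, data={"text": "HELLO"}))
```

The blocking callback runs on a worker thread, so the event loop stays free to keep reading from the WebSocket while it executes.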

If you’d like to share some advice, suggestions, or have any additional ideas for how this could be handled, please let me know on this GitHub issue. I’d love your feedback!

Implementing auto-reconnect for the WebSocket connection

The last piece of this update was ensuring that we automatically reconnected to RTM if the connection was dropped for any reason other than the user stopping it. The initial implementation focused on ensuring that the wait time increased exponentially if the exceptions we specified continued to occur. We also added a random number of milliseconds to avoid coincidental synchronized client retries.

while not self._stopped:
    try:
        # Connect and read from RTM...
    except (SlackClientNotConnectedError, SlackApiError):
        if self.auto_reconnect:
            # Back off exponentially, with random jitter, up to max_wait_time.
            wait_time = min((2 ** self._connection_attempts) + random.random(), max_wait_time)
            await asyncio.sleep(float(wait_time))
            continue
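To see how the wait time grows, the backoff formula can be pulled into a standalone function. The function name `reconnect_wait_time` and the 60-second default cap are assumptions for illustration; the excerpt above does not state the actual value of max_wait_time.

```python
import random


def reconnect_wait_time(attempts: int, max_wait_time: float = 60.0) -> float:
    """Exponential backoff with jitter, capped at max_wait_time seconds."""
    # random.random() adds a fraction of a second in the range [0.0, 1.0).
    return min((2 ** attempts) + random.random(), max_wait_time)


# Attempts 1 through 7 wait roughly 2, 4, 8, 16, and 32 seconds,
# then hit the cap for every later attempt.
waits = [reconnect_wait_time(attempt) for attempt in range(1, 8)]
```

The jitter keeps many clients that disconnected at the same moment from all retrying at the same moment.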

Conclusion

My initial aim with v2 of this SDK was to address a few goals:

  • Improve developer productivity by providing more context about Slack APIs in your editor,
  • Improve the performance of apps,
  • Make it simpler to fix bugs, and
  • Add features as our platform continues to grow.

While most of this impact is reflected internally in the project, there are a couple of things we can highlight that improve the experience for our developers.

I’ve taken basic Web API requests such as this:

sc = SlackClient("slack_token")
sc.api_call("chat.postMessage", channel="C0XXXXXX", text="Hello from Python! :tada:")

And have made them simpler, while providing more value in your editor:

sc = WebClient("slack_token")
sc.chat_postMessage(channel="C0XXXXXX", text="Hello from Python! :tada:")

For those using the RTM API, previously you were required to loop over the stream of messages:

client = SlackClient("slack_token")

if client.rtm_connect():
    while client.server.connected is True:
        for data in client.rtm_read():
            if "type" in data and data["type"] == "message":
                pass  # ...
else:
    print("Connection Failed")

This is now slightly easier to scale since you simply link your callbacks to events:

@RTMClient.run_on(event="message")
def say_hello(**payload):
    # ...

rtm_client = RTMClient(token="slack_token")
rtm_client.start()

To those who’ve struggled with SSL and Proxy issues, I believe you’ll be very satisfied with AIOHTTP’s advanced configuration options.

In our current state with v2, I’d say that we’ve made some pretty good progress! We’ve redesigned the project to improve the developer experience, to make it more maintainable, and to take advantage of the new features in Python 3. However, we’re just getting started.

What’s coming next?

Performance and quality improvements

For the next release of the Slack Python SDK, I’ll be focusing on improving the performance of both the RTMClient and the WebClient. I’ll also be improving the quality of the code by fixing any missed bugs and increasing our test coverage.

HTTP Event Server

I’ll also be working to build a Web Server capable of supporting the Slack Events API. This will most likely take advantage of the HTTP Server that comes built into the AIOHTTP library. It seems wise to utilize the same dependency to provide everything our customers need to build a Slack app.

More tutorials and sample code

Finally, I’ll be working to add more code and documentation to help get our developers up and running quickly. Do you have an idea for a Slack app you’d like to see built? Add a comment to this GitHub issue. If it’s useful for others, I’ll consider building and open-sourcing it.

Try it yourself!

Interested in trying the new SDK? Want to build a quick app yourself? Take a look at the getting started tutorial in GitHub. You should be able to create a running app in under 10 minutes.

P.S. Interested in working on open-source tools? My team is hiring!

P.P.S. You can dig through the Slack Python SDK source code in the GitHub Repository.