Calls: Is it you or is it me?

6 minutes • Written 7 years ago

Slack Calls are now in beta, on Mac, Windows, iOS, Android and Chrome. If you haven’t given it a try yet, please do (and let us know how it goes)! Our help center article on Calls has more details on the feature.

We wanted to answer the age-old question that we have all asked each other during a voice call when quality degrades: “Is it my connection or yours?” We also wanted a UI that users can trust, so we decided to build in the logic to answer that question for you. That is much more useful than a generic “Having network issues” warning.

We use a selective forwarding architecture to better support group calls. In this architecture, participants communicate only with the media server and are unaware of the state of each other’s network. In this post, we’ll talk about how we analyze RTP Control Protocol (RTCP) feedback from each user to give every user accurate feedback about their individual uplink and downlink quality.

We use Janus as our selective forwarding unit (SFU) and WebRTC for voice chat. WebRTC provides us with standardized RTCP feedback but Janus doesn’t analyze it out of the box. This post is about us adding this support. Most code excerpts that you see are from Janus alongside some of our customizations.

SFU in operation. Notice that Susy’s voice (in yellow) is broadcast as-is to Mike and Raja through the server

A little about SFUs

An SFU’s job is to broadcast the audio it receives on the Real-time Transport Protocol (RTP) layer from each participant to all other participants (this type of architecture is also known as a router server). This is in contrast to a Multipoint Control Unit, or MCU, which mixes the audio from each participant to maintain a single client/server stream. More details on the differences between these approaches here.

In addition to RTP, there is also the RTCP layer (RTP Control Protocol), which is used to relay information about the connection back to the other end. The topology offers us an advantage: since the server sits in the middle, it can separate out the two legs of the call (upload and download). We can use this to pinpoint a problematic link.

How we do it (TL;DR version)

We analyze the RTCP Receiver Reports (RR) that all the receivers of a particular stream send back (these report total loss) while simultaneously calculating the upload loss of that stream from the sender. From these two numbers, we figure out the realistic loss (loss caused only by the downlink from the server) on each of the downloads. We gather the realistic losses for all receivers and convey them as custom Slack messages called download_link_quality and upload_link_quality.

Now we’ll show you how to analyze the RTCP RR packets that WebRTC receivers send back and use them to figure out the quality of each leg of the call.

Analyzing RTCP RR Packets

According to the RFC, the RTCP RR packet looks like this:

For simplicity, we assume that all RTCP-RR packets only have a single report block within them (called non-compound RTCP). Generally, they can have many of these RR Report blocks (for multiple streams) but for this example we assume there is only one. Also we assume you are familiar with packet parsing and that code is removed for brevity. To learn more about RTP/RTCP parsing check out section A-2 of the RFC.
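For readers who want to see the parsing step that was left out, here is a minimal sketch of reading a single report block from raw bytes. It follows the report block layout in RFC 3550 (24 bytes, network byte order), but the names rr_block, read_be32 and parse_rr_block are hypothetical and are not taken from Janus or from our code. It assumes the caller has already skipped the 8-byte RTCP header and sender SSRC.

```c
#include <stddef.h>
#include <stdint.h>

/* One RTCP RR report block as laid out in RFC 3550 (24 bytes on the wire).
   The struct and function names are hypothetical, for illustration only. */
typedef struct {
    uint32_t ssrc;            /* source this block reports on */
    uint8_t  fraction_lost;   /* 8-bit loss fraction since the previous report */
    int32_t  cumulative_lost; /* 24-bit signed total of packets lost */
    uint32_t ehsnr;           /* extended highest sequence number received */
    uint32_t jitter;          /* interarrival jitter */
    uint32_t lsr;             /* last SR timestamp */
    uint32_t dlsr;            /* delay since last SR */
} rr_block;

/* Reads a 32-bit big-endian (network order) value. */
static uint32_t read_be32(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Parses one report block. Returns 0 on success, -1 if the input is
   missing or shorter than 24 bytes. */
int parse_rr_block(const uint8_t *buf, size_t len, rr_block *out) {
    if (buf == NULL || out == NULL || len < 24)
        return -1;
    out->ssrc = read_be32(buf);
    out->fraction_lost = buf[4];
    /* The cumulative loss is a 24-bit signed field: sign-extend it. */
    uint32_t lost24 = ((uint32_t)buf[5] << 16) | ((uint32_t)buf[6] << 8) | buf[7];
    out->cumulative_lost = (lost24 & 0x800000u)
        ? (int32_t)lost24 - 0x1000000
        : (int32_t)lost24;
    out->ehsnr  = read_be32(buf + 8);
    out->jitter = read_be32(buf + 12);
    out->lsr    = read_be32(buf + 16);
    out->dlsr   = read_be32(buf + 20);
    return 0;
}
```

Only fraction_lost and ehsnr are needed for the calculations in the rest of this post; the other fields are parsed for completeness.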

The important part here is “fraction lost”, an 8-bit value that expresses, as a ratio out of 255, the number of packets lost divided by the number of packets expected in this reporting period.

Okay enough talk! Time to see some code.

Download loss

To calculate the actual number lost, we need to know how many packets the receiver should have seen since its previous report. The ehsnr (extended highest sequence number) holds the 32-bit sequence number that the receiver had last seen when this report was sent.

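/* Packets the receiver should have seen since its previous RR */
uint32_t packets_received = ehsnr - receiver->last_reported_seq_num;
receiver->last_reported_seq_num = ehsnr;
/* fraction_lost is the 8-bit value taken from the report block */
double packets_seemed_missing = (double)packets_received *
    fraction_lost / 255;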

The above gets us the actual number of missing packets. We must remember that we have just counted the total amount that the receiver has seen missing. It’s unclear whether this is due to download loss on the receiver’s side or due to the sender’s upload loss. Since we maintain the sequence number spacing from the original sender’s stream, the loss reported here is total loss (sender’s upload loss + receiver’s download loss). We must go deeper.

We calculate pkts_missed_due_to_upload_loss_since_last_rr by calculating how many sequence numbers have been missed on the upload side of this stream (code not shown for simplicity).

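/* Remove the sender's upload loss from the total the receiver reported */
double loss = packets_seemed_missing -
    sender->pkts_missed_due_to_upload_loss_since_last_rr;
double realistic_loss = 0;
if (loss > 0)
    realistic_loss = loss;
/* Rescale to 0-255; skip the division when no packets were expected */
slack_download_loss = packets_received > 0 ?
    realistic_loss / packets_received * 255.0f : 0;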
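To make the arithmetic concrete, here is a self-contained sketch that folds the two snippets above into one function. The receiver_state struct and the download_loss_255 name are hypothetical, and the upload loss is passed in as a plain number because that code is not shown in this post. It keeps the 0 to 255 scale used above.

```c
#include <stdint.h>

/* Hypothetical per-receiver state; only the field used here is shown. */
typedef struct {
    uint32_t last_reported_seq_num; /* ehsnr from the previous RR */
} receiver_state;

/* Returns the downlink-only ("realistic") loss on a 0..255 scale.
   upload_missed is the number of packets the sender's uplink dropped
   during the same reporting period (assumed to be computed elsewhere). */
double download_loss_255(receiver_state *rx, uint32_t ehsnr,
                         uint8_t fraction_lost, double upload_missed) {
    /* Unsigned subtraction stays correct if the counter wraps. */
    uint32_t expected = ehsnr - rx->last_reported_seq_num;
    rx->last_reported_seq_num = ehsnr;
    if (expected == 0)
        return 0.0; /* nothing new was reported, so nothing to divide by */

    /* Total packets the receiver believes are missing. */
    double seemed_missing = (double)expected * fraction_lost / 255.0;

    /* Whatever the uplink already dropped is not the downlink's fault. */
    double loss = seemed_missing - upload_missed;
    if (loss < 0)
        loss = 0;
    return loss / expected * 255.0;
}
```

For example, with 100 packets expected, a reported fraction_lost of 51 (20%) and 5 packets already lost on the sender's uplink, 20 packets seem missing, 15 of them are attributed to the downlink, and the function returns about 38 on the 255 scale (15%).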

For download loss > 20%, we transmit an event to the receiver, which makes their UI show the appropriate warning. For example, if Faraz is calling Pavel and is suffering from download loss > 20%, he’d see the following:

Faraz’s side indicating to him that his download is bad.

This state is broadcast to all other users, so everyone knows that Faraz’s download is bad and that he might not be able to understand what is being said. So during this call Pavel’s side would correctly say:

Why 20%? For two reasons. Firstly, the audio codec WebRTC uses (Opus) is able to deal with packet loss < 20% gracefully, so people may not even notice it. Opus does this using forward error correction (FEC), which is an ingenious way of encoding information about the previous packets into current ones (so they can be recovered if lost). Secondly, there’s evidence that user perception of audio quality does not degrade significantly at loss levels of up to 20%.
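The threshold check itself is small. Here is a sketch of how it might look for a loss value on the 0 to 255 scale used above; the function name link_quality_is_bad and the constant LOSS_WARN_PERCENT are hypothetical.

```c
#include <stdbool.h>

/* Warning threshold from the post: loss strictly above 20%. */
#define LOSS_WARN_PERCENT 20.0

/* loss_255 is a loss fraction on the 0..255 scale.
   Cross-multiplying avoids a division and keeps the comparison exact
   for whole-number inputs. */
bool link_quality_is_bad(double loss_255) {
    return loss_255 * 100.0 > LOSS_WARN_PERCENT * 255.0;
}
```

On this scale 20% corresponds to 51, so a value of 51 does not trigger the warning and 52 does.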

Upload loss

This is the easy one and is done without using RTCP. Every RTP packet that you receive has a sequence number. We simply maintain a list of all sequence numbers received to date. Every 500ms, we run a loop that calculates fraction loss as:

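int num_packets = g_list_length(sender->seq_list_since_last_rr);
/* uint16_t arithmetic wraps correctly when the sequence number rolls over */
uint16_t exp_since_last = (packet->seq_number -
    sender->seq_number_at_last_rr);
int missing = exp_since_last - num_packets;
/* Scale to 8 bits; guard the empty window, duplicates, and 100% loss */
int scaled = (exp_since_last > 0 && missing > 0) ?
    256 * missing / exp_since_last : 0;
upload_frac_lost = (uint8_t)(scaled > 255 ? 255 : scaled);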
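The same windowed calculation can be sketched without GLib. This version is a simplification and not our production code: it keeps a counter instead of a list of sequence numbers (so duplicate packets are counted as received), and the sender_window struct and both function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical per-sender window state. */
typedef struct {
    uint16_t seq_at_last_report;  /* highest sequence number at the last report */
    uint16_t highest_seq;         /* highest sequence number seen so far */
    uint32_t received_since_last; /* packets counted in the current window */
} sender_window;

/* Call for every RTP packet received from the sender. */
void on_rtp_packet(sender_window *w, uint16_t seq) {
    w->received_since_last++;
    /* A packet is "newer" if it is less than 2^15 ahead, which handles
       both 16-bit wraparound and late, reordered packets. */
    if ((uint16_t)(seq - w->highest_seq) < 0x8000)
        w->highest_seq = seq;
}

/* Call on a timer (every 500ms in the post). Returns the fraction lost
   on a 0..255 scale and starts a new window. */
uint8_t upload_fraction_lost(sender_window *w) {
    uint16_t expected = (uint16_t)(w->highest_seq - w->seq_at_last_report);
    uint32_t received = w->received_since_last;
    w->seq_at_last_report = w->highest_seq;
    w->received_since_last = 0;
    if (expected == 0 || received >= expected)
        return 0; /* empty window, or nothing missing */
    uint32_t missing = expected - received;
    uint32_t scaled = (256u * missing) / expected;
    return scaled > 255 ? 255 : (uint8_t)scaled;
}
```

For example, if the window spans sequence numbers 65531 through 4 (10 packets, crossing the 16-bit rollover) and 8 of them arrive, the result is 256 * 2 / 10 = 51, or roughly 20%.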

As with download loss, this is communicated to the sender, who is warned when the loss rises above 20%. On the same call, if Faraz is suffering from upload loss instead of download loss this time around, Pavel would see:

Pavel sees me as having upload problems on his side

So how much does all this help?

We do slightly more with the fraction lost that we have calculated here (for example, we pack it into an RTCP RR message back to the sender to increase FEC), but sometimes there is little you can do to help. So we do the best we can and put up accurate UI warnings that tell users why they are experiencing the problem. We have found that this helps users understand the problem far better than a generic “Having network issues” message, which would leave them in the dark.

Combating the variability of the internet is a never-ending battle in real-time communication. We can’t guarantee perfect audio quality every time, but we hope that changes such as these make your call experience a bit more pleasant, and your conversations more productive.

Learning more

We could not have built Slack Calls without amazing open source software, standardized protocols, and freely available research. Here are some resources to learn more about the technologies mentioned in this post:

We’re working hard to build a good team here at Slack, and, as you can see, have many big, interesting challenges! If they sound exciting to you, join us! Apply now