Microsoft Teams issues are one of the most common complaints facing IT operations and the help desk. Over 75% of Microsoft Teams issues are caused by network problems between users and the Microsoft Teams media servers that stream video, screen sharing and call audio between participants. Microsoft doesn’t offer monitoring capabilities to detect, diagnose and resolve Microsoft Teams issues caused by poor internet, SD WAN, private line or VPN connectivity, leaving IT without a clear path to isolating the cause of Teams quality issues.
Fix Microsoft Teams Performance Issues
Watch this Livestream LinkedIn event to learn how to fix Microsoft Teams issues using automated network path and performance testing. Learn how to verify “is Microsoft Teams down?”, and how to optimize your employees’ MS Teams digital experience.
We will show you how to resolve:
- Microsoft Teams connection issues
- Microsoft Teams video not working complaints
- Microsoft Teams call quality problems
Kadiska’s experts will explain how MS Teams works: its media server architecture and how the performance of Teams calls is defined by when and how users connect to a Teams Meeting. We will use live traffic to show how connectivity impacts Teams performance, and will demonstrate how to detect and resolve common issues encountered by onsite and work from home users accessing Microsoft Teams.
Webinar Transcript
Thanks for joining us. My name’s Scott Sumner. I’m the CMO at Kadiska, and I’ll be joined here today by Thierry Notman, who’s our Chief Product Officer. We hope to illuminate some of the problems that you guys have with Teams and give you an actionable path to solving those in this webinar today. I’ll go through a little bit of the background of Teams, and then we’ll hop into live diagnostics of Teams issues, why they happen and how you can fix them.
Is Teams actually one of the single biggest issues you guys have? We’ve had a lot of communication online on LinkedIn, and I’ve actually had this conversation with one of the executives at Salesforce, not something you can make up, right?
And he said, “We actually paid $27 billion to eliminate our Teams problems.” Now that’s probably exceptional, and it’s a little bit unusual because, of course, they bought Slack just a little while back. And they’ve actually replaced Teams completely by using Slack conferencing instead.
But for most of us who don’t have an extra $20B lying around to solve this problem, or don’t want to change all of our infrastructure, there is definitely a better way to solve these problems. So when we talk to IT operations guys and network operations people at large enterprises, they tell us this is their single biggest performance issue.
And of course working from home, and a lot of remote and hybrid working, is causing this problem to become much more critical for the enterprise. When you look at Downdetector and what people report in the end as the root cause of the problem, more than three quarters of the problems with Microsoft Teams call quality and connectivity come from the network side of things. That’s not a big surprise, but the way Microsoft Teams demands the network to perform is actually extremely different from some of the other conferencing solutions.
And this is what exacerbates the issue and causes Microsoft Teams to behave, in some cases, quite a bit worse than other conferencing solutions. When you look at the majority of these issues coming from the network side, the question is: why is the architecture of Teams dependent on a specific kind of network behavior to make things work well?
So we’ll take a look at that live in a little bit, but I wanna show you that behind Teams, whether you’re logging in through a web client or the desktop client, there are a lot of different host names and servers behind a Microsoft Teams session.
It’s not just Teams.Live.com or one server. There are all of these different servers, and there are many more than this. There are more like 14 different kinds of servers you have to connect to in order to join, to stream the chat side, to do the whiteboard. But most important from your users’ perspective are the media servers. The media servers are the ones that are really responsible for streaming the video and sharing your screens between participants, as well as the audio.
So the media servers are the critical component. Once the Teams session is up and running, they define pretty much everything that’s gonna happen to you, and probably has happened in your experience with Microsoft Teams: call drops, call quality issues, one-way audio, all the things that can go wrong.
That all comes down to the media servers’ performance. And I’m not actually sure that it was Bill Gates who said this, but it could have been, because this is a really important piece of information for IT guys. If you want Teams to perform well, you have to connect as quickly as possible to the Microsoft core network, and the round-trip latency between yourself and these media servers needs to be a hundred milliseconds or less or, as Bill is saying here, the Teams calls are going to suck. There are a lot of reasons why this latency causes an issue. But it’s something you see quite often, and it’s particularly acute in Microsoft Teams connectivity. And that’s because the Microsoft media server architecture is different from what you get with other conferencing solutions.
They have media servers all over the world. There’s one near you for sure. And if you’re having a good Teams call, it’s because you’re close to your server that you’re connecting to. But when you’re working on a call that has people across multiple continents, sometimes things can go really bad.
And that’s because all the users on a single Teams call have to join the same media server. So unlike, say, Google Meet or WebEx, where you join the closest media server and it then streams between them, all the Teams call participants have to join the same media server.
And how does Microsoft decide where the media server that’s gonna host your call will be? It’s actually almost random. It’s based on wherever the person is who joins the call first. Yes, you’ve read that right. If one person joins from a remote location like Australia, everybody else on the Teams call has to connect through the same media server in Australia.
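As a rough illustration of the first-joiner behavior described above, here is a minimal Python sketch. The participant names, server regions and round-trip times are all made up for the example, and the selection rule (the call lands on the server closest to whoever joins first) is simply the behavior as described in this webinar, not a documented Microsoft algorithm.

```python
from typing import Dict

LATENCY_BUDGET_MS = 100  # round-trip target quoted in the webinar

# Hypothetical round-trip times in ms: participant -> {media server region -> RTT}
RTT_MS: Dict[str, Dict[str, float]] = {
    "sydney_user":   {"australia": 15,  "us_east": 210, "west_europe": 280},
    "montreal_user": {"australia": 230, "us_east": 30,  "west_europe": 95},
    "brussels_user": {"australia": 290, "us_east": 100, "west_europe": 12},
}


def hosting_server(first_joiner: str) -> str:
    """Assume the call is hosted on the server closest to the first joiner."""
    rtts = RTT_MS[first_joiner]
    return min(rtts, key=rtts.get)


def over_budget(first_joiner: str) -> Dict[str, float]:
    """Return participants whose RTT to the shared server exceeds the budget."""
    server = hosting_server(first_joiner)
    return {
        user: rtts[server]
        for user, rtts in RTT_MS.items()
        if rtts[server] > LATENCY_BUDGET_MS
    }
```

With these invented numbers, `over_budget("sydney_user")` flags both the Montreal and Brussels participants, while `over_budget("montreal_user")` flags only the Sydney participant. The same three people get a very different call depending on who happens to join first.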
And when you’re talking about these latency issues, a hundred milliseconds, you’ve just broken the call. And of course we all have experience with this. Sometimes you can’t control this. What if the person is an external person joining from a different company, and you’re having a call with someone in India or Latin America or anywhere that’s outside your region?
You’re going to be suffering from this quite badly. So we know from the stats before that if Teams issues happen, it’s most likely the network, and this architecture choice that Microsoft has made almost guarantees you’re gonna see this in international calls. But the problem in this scenario, in terms of debugging it, troubleshooting and trying to fix it, is that there’s not just one network between you and these media servers, right?
There’s a really complex matrix of connections when you look at people working from home. Gartner just reported a couple of weeks ago that 58% of employees are predominantly working in jobs that support that, across very highly distributed sites. So whether you’re at a retail location or manufacturing site or headquarters, people are all over the place, and you have a lot of different connections that are now brokered through things like SASE or a cloud access security broker, the CASB, and VPNs. There are SD-WANs, there are cloud networks out there, there are private connections and there’s CDN streaming traffic as well. So you end up in a situation where, with this hybrid connectivity matrix, it is very difficult to understand exactly where the problems that can cause long latency originate.
So that’s what Kadiska does, and we’ll show you that in a few seconds. We’ve set up test stations around the world that are managed by us, and anyone can use those to test connectivity from all the main ISPs and countries in the world. You can also test from your own sites by spinning up a simple, container-based tracer station. From there you can understand the full route between yourself and a Teams server or any other SaaS or web-based application. You can see all the different service providers between you and the Teams server, and also which one is breaking the call.
And you’d be surprised how you get redirected all over the place depending on your ISP, your providers or your SD-WAN’s configuration. So this is what we’re gonna take a look at. I’m gonna pass this on to Thierry to look at how to find problems with Teams and how to fix them using our application.
Thanks, Scott. I will try to give a sense of what we can do in just 10 minutes. As a short wrap-up of what Scott explained, before showing you live: there are two really important things you want to monitor when dealing with Microsoft Teams connectivity performance. First, you want to monitor the network latency between your users and the Microsoft media servers they may connect to, as it should remain below 100 milliseconds.
Next, to achieve this, Microsoft recommends connecting as quickly as possible to its backbone, which is optimized to transport streaming traffic. So this is another aspect you want to measure and understand: how do your users connect to the Microsoft backbone through the local ISP?
And you have to keep one important aspect in mind, which is quite specific to Microsoft compared to other players. As Scott already mentioned, all attendees will connect to the same media server, and the first attendee to join the session will determine the choice of the server. Let’s now take a first example.
So for this example, let’s assume that Scott and I have a session together. Scott resides in Canada and I’m living in Belgium. So by now, you know that we both have to connect to the same Microsoft media server. In this case, let’s assume that Scott is the first to connect to the Microsoft Teams session.
It means that Scott will probably connect to the closest Microsoft media server, and this is exactly what is shown here. This dashboard shows the situation of Scott connecting from Montreal to the different Microsoft media servers he could potentially use. We have here a panel of 14 media servers that we monitor for the purpose of this demo, and you’ll see from the list that the smallest round-trip time Scott will have is when he connects to the Microsoft media server located in the USA.
So this is this one: 31 milliseconds when connecting to an East Coast, US-based Microsoft media server. But what about me? What will my situation be? Because I will have to connect to the same server. So I will change my viewing angle by choosing another source, and I will now focus on Belgium.
This is where I live, and obviously my situation is different from Scott’s. I’m located in Belgium, so I’d better connect to media servers located in Europe. But in this case, I will have to connect to this one, and this one has a round-trip time of 101 milliseconds, which is really at the limit of the value recommended by Microsoft.
But it’s not so bad, because don’t forget that the traffic has to cross the ocean between Belgium and the United States. I can quickly understand why the situation is not so bad by going to the path performance view.
The path performance view shows me all the network traffic paths between my location and the Microsoft media servers in the US. And I can understand why the network latency is not so bad in my case. Remember the recommendation from Microsoft, which is to peer, to connect as quickly as possible to its backbone.
And this is exactly what happens. I’m connected on my local Proximus ISP network, which peers directly with the Microsoft backbone, even here in Belgium, way before crossing the ocean. I can check that by highlighting the delays. And you see here that most of the delays are within the Microsoft backbone.
This is where the traffic has to cross the ocean, and this is why I have a not-so-bad experience here. Let’s now take another example and focus on a user who is located in Argentina. So let’s focus on Argentina and have a look at the user’s situation. When you are located in Argentina, you’d better connect to the closest Microsoft media server, which is located in Brazil.
But in this case, the round-trip time is quite significant: it is 57 milliseconds on average, which you can argue is quite a long time for a connection between Argentina and Brazil. So let’s have a look at what happened from a network connectivity and network path point of view. I simply select the media server in Brazil, and again I go to the path performance view so that I will discover all the network paths that relate to this connection.
And here you see, in fact, what happened. The users in Argentina are connected locally on the Level 3 network backbone, on autonomous system 3549, before reaching another autonomous system, 3356, from the same Level 3 Communications operator. Then, finally, the traffic reaches the Microsoft backbone. And in fact, traversing the whole Level 3 Communications network before reaching Microsoft Corporation causes a lot of delay.
I can check that by highlighting the delays here, and you see indeed that most of the delays are introduced when the traffic is traversing the Level 3 Communications backbone. So one of my conclusions, as an organization: if I have a lot of users located in Argentina using the Microsoft Teams platform, I should potentially use an alternative local operator to connect my users to the Microsoft backbone more efficiently.
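To make the idea of "highlighting the delays" per operator concrete, here is a small Python sketch that attributes round-trip delay to each autonomous system along a traced path. It is only an illustration of the reasoning, not how the Kadiska product computes it. The AS numbers 3549 and 3356 come from the example above and 8075 is Microsoft's AS number; the per-hop round-trip values are invented so that the path ends at the 57 ms mentioned earlier.

```python
from typing import Dict, List, Tuple

# Each hop: (AS number, cumulative round-trip time in ms as seen from the source).
# The RTT values are hypothetical; only the AS numbers come from the example.
HOPS: List[Tuple[int, float]] = [
    (3549, 2.0),
    (3549, 6.0),
    (3356, 21.0),
    (3356, 49.0),
    (8075, 54.0),  # Microsoft backbone
    (8075, 57.0),
]


def delay_by_as(hops: List[Tuple[int, float]]) -> Dict[int, float]:
    """Sum the RTT increase contributed by the hops inside each AS."""
    totals: Dict[int, float] = {}
    highest_seen = 0.0
    for asn, rtt in hops:
        # Traceroute-style RTTs can dip below an earlier hop; count only increases.
        totals[asn] = totals.get(asn, 0.0) + max(rtt - highest_seen, 0.0)
        highest_seen = max(highest_seen, rtt)
    return totals


def slowest_as(hops: List[Tuple[int, float]]) -> int:
    """Return the AS number that adds the most delay along the path."""
    totals = delay_by_as(hops)
    return max(totals, key=totals.get)
```

With these sample values, the two Level 3 autonomous systems account for 49 of the 57 ms and the Microsoft backbone for only 8 ms, which is the pattern described for the Argentina path.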
The very last thing I want to show you is how you can also use the Watcher technology in the context of network connectivity performance monitoring. The Watcher provides performance metrics directly from the user’s browser. The experience of a user dealing with Microsoft Teams does not start when they are in a session.
It starts with joining a session, which involves performing API calls that we are able to identify and analyze from a performance standpoint. So let’s take an example here. I will go to one of the Watcher monitoring dashboards, and I will select a lot more data from the past. I will take data going back to August, for example.
Okay, let’s focus on the Microsoft Teams application and, more specifically, I want to filter on all joining events, that is, when users join Microsoft Teams sessions. For that, I can simply filter here on the join event. Okay, so I do not have time to go into all the details of the different metrics we provide, but what I want to show you is how we can easily identify the users and their respective locations.
So for example, we identify remote users, typically connecting from home. We can automatically identify which ISPs they are locally connected to, and we can also locate them by country, region, city and continent. And then we can distinguish these remote users from the onsite users who are connected from corporate sites.
And here, for each section, you see that we provide the main performance drivers for the application, and you see the yellow part, which is related to network performance. So I can tell you that joining a Microsoft Teams session was quite a problem, from a network point of view, for users connected to the Proximus network.
It took nearly 400 milliseconds to initiate a session, to join a Microsoft Teams session, when a user was connected to the Proximus network. Okay, so here I can easily identify who was impacted by simply filtering on the Proximus ISP name, and I can see that it was me, in fact, who was impacted.
Then I’m probably interested in knowing more about the Proximus ISP network performance in general. For that, I can navigate to the network dashboard and get rid of some filters to have an overview of how Proximus behaves over time in terms of DNS resolution time, connectivity time and TLS handshake time.
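For readers who want a feel for the three phases just mentioned (DNS resolution, connection setup and TLS handshake), here is a minimal Python sketch that times each one for a single HTTPS connection using only the standard library. It is an approximation run from a script, not the browser-side measurement the Watcher performs, and the function and key names are illustrative.

```python
import socket
import ssl
import time
from typing import Dict


def phase_durations_ms(t0: float, t1: float, t2: float, t3: float) -> Dict[str, float]:
    """Turn four timestamps (in seconds) into per-phase durations in ms."""
    return {
        "dns_ms": (t1 - t0) * 1000.0,
        "tcp_connect_ms": (t2 - t1) * 1000.0,
        "tls_handshake_ms": (t3 - t2) * 1000.0,
    }


def connection_timings_ms(host: str, port: int = 443, timeout: float = 5.0) -> Dict[str, float]:
    """Time DNS lookup, TCP connect and TLS handshake for one connection."""
    t0 = time.perf_counter()
    family, socktype, proto, _, sockaddr = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    )[0]
    t1 = time.perf_counter()

    sock = socket.socket(family, socktype, proto)
    sock.settimeout(timeout)
    try:
        sock.connect(sockaddr)
        t2 = time.perf_counter()
        context = ssl.create_default_context()
        # wrap_socket performs the TLS handshake before returning (default behavior).
        with context.wrap_socket(sock, server_hostname=host):
            t3 = time.perf_counter()
    finally:
        sock.close()
    return phase_durations_ms(t0, t1, t2, t3)
```

Calling `connection_timings_ms("example.com")` (a placeholder host) from a machine with internet access returns the three durations for that one attempt. A single sample is noisy, so in practice you would repeat the measurement and compare medians across ISPs or locations.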
You can easily benchmark how Proximus behaves compared to other ISPs, and you can, for example, easily identify the users that are connected on this Proximus local ISP. So I see most of the users are connecting from Brussels and from [unclear], for example. But you see that users connecting from [unclear] on the Proximus local operator did experience a very bad TLS handshake setup time, so here I can easily focus on this location and then, for example, identify the application that was degraded. And in this case, the users did have some TLS handshake problems when connecting to the Kadiska website.
And by the way, you can see the impact from a loading point of view, for example, the impact the bad network setup has had on this user.
Thanks, everybody, for watching. I just wanted to let you know that if you are interested in trying to solve your own Teams issues, we’re offering a free trial.
So Kadiska is very easy to spin up. It’s software as a service, so you can deploy it almost instantly to trace out to your different sites, and from our different stations around the world, to see how people in your company are experiencing Teams and what the problems might be. So if you fill in the form at learn.com/try, we’ll spin up an instance for you and we can give you a quick one-on-one tutorial to get you set up and started.
Let us know.