Best Practices for SD-WAN Monitoring

by | Feb 1, 2022 | Network Performance

Boris Rogier

Co-founder

SD-WAN is now being adopted at scale by enterprise network teams. Enterprise IT departments use SD-WAN to rationalize the cost of WAN, cloud and SaaS access and to optimize user performance to internal, cloud-hosted and SaaS resources. They initially trust their SD-WAN vendor to provide a comprehensive, out-of-the-box set of monitoring capabilities. Are these monitoring resources sufficient to manage the day-to-day operation of an SD-WAN deployment? What are the best practices for implementing SD-WAN performance monitoring?

SD-WAN is here to stay! 

Some of us questioned the adoption level SD-WAN would reach as remote work and WFH became the new norm during the pandemic.
The COVID crisis temporarily pulled IT network teams away from their SD-WAN projects and deployments, as they had to quickly adapt their infrastructure to the surging need for secure remote work.

Nevertheless, the drivers of SD-WAN adoption are still present: cloud and SaaS adoption. While remote work has reached a new scale, offices, stores and production units still need access to internal, cloud-based and SaaS applications to run.
SD-WAN offers a great opportunity to reduce the cost of that access and improve performance for users: it makes it possible to securely decentralize cloud, internet and SaaS traffic, so that remote offices consume cloud resources and SaaS apps directly (without backhauling to data centers through expensive private circuits).

SASE functionalities cover the security gaps that until now prevented enterprise teams from moving away from backhauling internet, SaaS and cloud traffic through their on-premises secure gateways.

Native SD-WAN monitoring gaps and weaknesses

SD-WAN solutions offer some monitoring and visibility functions: what can you expect out of them? Where are the main gaps? 

Any SD-WAN solution will provide: 

  • Device based metrics (availability, resources through SNMP)
  • Some form of traffic/bandwidth usage breakdown
  • For some vendors, edge-to-edge latency monitoring through an overlay tunnel (i.e. the network response time from one SD-WAN device to another)

Most SD-WAN monitoring shows two common limitations: a small metric set and short or no historical retention.

 

In the end, the native coverage of SD-WAN instrumentation leaves the following gaps: 

Visibility offered by SD-WAN solutions:

  • Bandwidth
  • Overlay
  • Infrastructure device monitoring
  • Edge to edge
  • Live data

Visibility needed:

  • Overlay and underlay
  • User experience
  • Complete delivery chain
  • Historical data to troubleshoot degradations after the fact

 

Traditional network performance monitoring (NPM) is unfit to monitor SD-WAN performance

NPM stands for Network Performance Monitoring and groups solutions that offer the following capabilities:

  • Device monitoring (SNMP)
  • xFlow collection (bandwidth analysis using NetFlow or other types of flow data)
  • Traffic analysis 
  • Synthetic monitoring 

While the device monitoring is often covered by SD-WAN solutions, the other NPM capabilities fail to adapt to the context of Cloud and SaaS access through SD-WAN for the following reasons: 

  • Advanced encryption is a major obstacle for flow collection and traffic analysis. It makes it impossible for most solutions to provide insight into both user-level and network-level performance, and sometimes even to recognize the different applications running through the network. The adoption of TLS 1.3 and later versions, combined with the newer versions of HTTP (HTTP/2 and HTTP/3 over QUIC), reduces the visibility these techniques produce to almost nothing.
  • Flow and traffic data collection in a distributed network like an SD-WAN environment is far too costly for the visibility it can still provide (the cost covers the capture devices, the centralization of the data and the management of the resulting data).
  • SD-WAN relies on third-party networks whose devices cannot be polled.
  • Traditional synthetic NPM capabilities measure end-to-end circuits and often require instrumentation at both ends. They fail to deliver because:
    • SaaS/Cloud side instrumentation is often impossible. 
    • End to end measurement provides no indication of the origin of the degradation on the network path and the actions that can be taken to fix it.
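
To illustrate the last point, here is a minimal Python sketch of what a hop-by-hop view adds over a single end-to-end figure. It is not taken from any SD-WAN or NPM product: the `Hop` structure, the `first_degraded_hop` helper, the thresholds and the sample path (documentation-range IP addresses) are all hypothetical, and the measurements are assumed to come from some traceroute-like collection that is not shown.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hop:
    """One hop on a measured path (all values below are hypothetical)."""
    index: int          # position on the path, starting at 1
    address: str        # IP address reported by the hop
    latency_ms: float   # round-trip time measured to this hop
    loss_pct: float     # packet loss measured to this hop


def first_degraded_hop(path: List[Hop],
                       latency_jump_ms: float = 50.0,
                       loss_threshold_pct: float = 2.0) -> Optional[Hop]:
    """Return the first hop where latency jumps or loss appears, else None.

    The thresholds are illustrative assumptions, not recommended values.
    """
    previous_latency = 0.0
    for hop in path:
        jump = hop.latency_ms - previous_latency
        if jump >= latency_jump_ms or hop.loss_pct >= loss_threshold_pct:
            return hop
        previous_latency = hop.latency_ms
    return None


# Hypothetical path from a branch office to a SaaS front end.
path = [
    Hop(1, "192.0.2.1", 1.2, 0.0),      # branch SD-WAN edge
    Hop(2, "198.51.100.7", 9.8, 0.0),   # ISP access network
    Hop(3, "203.0.113.20", 96.4, 0.0),  # transit network: latency jumps here
    Hop(4, "203.0.113.99", 98.1, 0.0),  # SaaS front end
]

suspect = first_degraded_hop(path)
if suspect is not None:
    print(f"Degradation starts at hop {suspect.index} ({suspect.address})")
```

An end-to-end test on this path would only report roughly 98 ms to the SaaS front end. The per-hop view points at the transit segment, which tells the team whether the fix is internal or has to be raised with a provider. In practice intermediate routers may rate-limit their replies, so a real implementation would need more careful heuristics than this sketch.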

 

Finally, traditional NPM solutions do not align with the context of cloud-based services and SaaS application infrastructure either: SaaS apps are by nature distributed to fit the needs of their global customer base. Traditional NPM measures the performance from users to servers located in a limited number of locations (usually data centers), and struggles to adapt to the structure of SaaS applications.

SaaS applications rely on:  

  • A massive variety of hostnames: most SaaS applications use a highly distributed cloud platform (CDNs, third-party services, API servers, app servers) as well as other services plugged in by their customers (for example, integrations with other SaaS applications or API calls to in-house applications).

  • Users are redirected in many different ways depending on their geolocation. The DNS resolution for the same hostname is handled differently based on the location of users, to redirect them to the closest host for that service. This can of course be a source of errors (e.g. the IP address of the user is not correctly interpreted and the user ends up connected to a host far from their actual location). Providers also offer very different coverage levels depending on the users’ location.

  • Many network routes between users and the different parts of the SaaS platform 
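
The effect of location-dependent DNS resolution can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a monitoring implementation: `resolve_ipv4` uses the standard library resolver of whichever machine runs it, while `hostnames_resolved_differently`, the site names, the hostnames and the addresses in `answers_by_site` are hypothetical. Collecting the answers from each site (for example from an agent in each office) is assumed and not shown.

```python
import socket
from typing import Dict, List, Set


def resolve_ipv4(hostname: str) -> Set[str]:
    """Resolve a hostname to its IPv4 addresses with the local resolver."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    return {info[4][0] for info in infos}


def hostnames_resolved_differently(
        answers_by_site: Dict[str, Dict[str, Set[str]]]) -> List[str]:
    """List hostnames whose DNS answers are not identical across sites."""
    hostnames: Set[str] = set()
    for answers in answers_by_site.values():
        hostnames.update(answers)

    differing = []
    for hostname in sorted(hostnames):
        distinct_answers = {
            frozenset(answers.get(hostname, set()))
            for answers in answers_by_site.values()
        }
        if len(distinct_answers) > 1:
            differing.append(hostname)
    return differing


# Hypothetical answers, as each site would obtain them by calling
# resolve_ipv4() locally for every hostname of the SaaS application.
answers_by_site = {
    "paris": {
        "app.example.com": {"203.0.113.10"},
        "cdn.example.com": {"198.51.100.5"},
    },
    "singapore": {
        "app.example.com": {"203.0.113.10"},
        "cdn.example.com": {"192.0.2.80"},
    },
}

differing = hostnames_resolved_differently(answers_by_site)
print(differing)
```

A differing answer is not an error in itself, since it is how SaaS providers steer users to a nearby host. It does show which hostnames reach different hosts over different routes from each site, and therefore need to be tested from each site rather than from one central location.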

Best practices: what monitoring techniques contribute to a complete SD-WAN monitoring solution? 

 

| Monitoring technique | Visibility provided | Use case | Native SD-WAN monitoring |
| --- | --- | --- | --- |
| App synthetic testing | End-to-end application monitoring; baseline vs abnormal behavior | Proactive monitoring | No |
| Network synthetic testing | End-to-end network monitoring | Proactive monitoring | Yes |
| Path monitoring | Measure end-to-end network performance; analyze DNS redirections; hop-by-hop path visualization and metrics; find routing, internal vs external / internet / SaaS / cloud network issues | Proactive monitoring; troubleshooting | No |
| End point derived | Device performance; WiFi behavior; application transaction performance | Troubleshooting | No |
| Traffic analysis | See app utilization | Proactive monitoring; troubleshooting | Yes |
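
As an illustration of the "baseline vs abnormal behavior" idea behind synthetic testing, the following Python sketch times a TCP handshake from the user side only and compares a new sample to a baseline built from earlier samples. It is one possible approach under stated assumptions: the function names, the median and MAD (median absolute deviation) baseline, the tolerance of 3.0, the 1 ms floor and the sample history are all hypothetical choices, not values from this article or from any product.

```python
import socket
import statistics
import time
from typing import Sequence


def tcp_connect_ms(host: str, port: int = 443, timeout: float = 3.0) -> float:
    """Time a TCP handshake to host:port in milliseconds.

    Nothing has to be installed on the far end, so this also works
    against a SaaS or cloud service.
    """
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return (time.perf_counter() - start) * 1000.0


def is_abnormal(history_ms: Sequence[float], sample_ms: float,
                tolerance: float = 3.0) -> bool:
    """Flag a sample that exceeds the baseline median by tolerance x MAD."""
    baseline = statistics.median(history_ms)
    deviations = [abs(value - baseline) for value in history_ms]
    # Floor the spread at 1 ms so a perfectly flat history does not
    # turn every tiny variation into an alert (illustrative choice).
    spread = max(statistics.median(deviations), 1.0)
    return sample_ms > baseline + tolerance * spread


# Hypothetical history of connect times (ms) kept from earlier test runs.
history = [42.0, 44.5, 41.8, 43.2, 45.1, 42.7, 44.0]

print(is_abnormal(history, 44.9))   # within the usual range
print(is_abnormal(history, 130.0))  # far above the baseline
```

In a real deployment the new sample would come from a call such as `tcp_connect_ms("app.example.com")` (a placeholder hostname) run on a schedule from each site. The sketch also shows why historical retention matters: without stored history there is no baseline to compare against.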

 

Monitoring SD-WAN performance requires a combination of monitoring techniques. Whichever vendor you pick, you will need to complement their native monitoring capabilities and apply these best practices to keep user performance at its best level. Traditional NPM solutions will not be the right complement to your native SD-WAN monitoring capabilities. Discover the details of the capabilities required to monitor your SD-WAN in this article.
