These attacks rely on networks of bots (commonly thousands of compromised machines connected to the Internet). Each of these bots contributes to saturating your digital platform (at the network, load balancer and server levels) and making your service unavailable. The fact that these attacks are distributed across thousands of devices spread across a multitude of operator networks makes their mitigation a difficult task (if you want to dig further into DDoS mitigation: https://en.wikipedia.org/wiki/DDoS_mitigation). Most organizations relying on digital services / web applications to run their business subscribe to DDoS protection services. The question is how these DDoS protections influence the network path and how fast users can access the digital platform. Every customer of these services needs to evaluate the performance impact of DDoS protection on their users.
Most world-scale cloud / CDN providers have their own DDoS mitigation offerings:
- Google → Projectshield
- CloudFlare → CloudFlare DDoS
- Amazon Web Services → AWS Shield
- Microsoft Azure → Azure DDoS Protection
- Verisign → Verisign DDoS Protection
If you are looking for a good comparison of these services, please take a look at this article.
How DDoS protections work
All of the DDoS protection services listed above rely on massive-scale cloud infrastructure and, especially, a distributed network presence. This enables them to remain unaffected by the massive quantities of DDoS traffic they have to filter out.
Within this list, the first distinction to make is between:
- the cloud service providers offering DDoS protection for the workloads hosted in their cloud platform,
- the providers who offer that service for workloads independently of their location (e.g. in your datacenter).
Finally, they all aim to filter out botnet traffic using a series of heuristics powered by network inspection, proxies, etc. They perform that filtering through different means:
- Acting as a reverse proxy like Projectshield
- Redirecting traffic to the DDoS protection platform by changing your DNS records
- Having the DDoS protection provider announce your IP prefixes in its BGP policy so that your traffic is redirected through it.
All of these methods reroute traffic that would normally flow directly from your users’ networks / autonomous systems to your platform, so that it passes through your provider’s platform first. The provider then delivers “clean” traffic to your platform.
This DDoS protection may be always on (365 days a year) or activated on demand.
What’s the performance impact of a DDoS protection?
In the event of a DDoS attack, your performance will be protected, but what is the performance impact under normal conditions?
The very first thing to understand is that the path from your users to your platform will be affected, either for all users or for certain users at certain times.
Let’s take the example of DDoS protection provided by CloudFlare through BGP routing. In this case, CloudFlare announces its customer’s prefixes in its BGP policy. This is done 365 days a year, and the CloudFlare route may be preferred in case of DDoS to avoid flooding the customer’s infrastructure with DDoS traffic.
In the screenshot below, we see the route taken from one of Kadiska’s Stations to the customer’s platform. We can identify multiple paths to go from one end to another.
We see two main routes from our Station in Tokyo to this customer’s platform:
- The route leaves through a first provider, which decides to route all the traffic through iD3.net.
- iD3.net then, in most cases, routes the traffic through its own network straight to the customer’s autonomous system (most likely that AS peers directly with the customer’s AS), but in some cases it can decide to route through CloudFlare’s infrastructure.
The most important point to take away from this diagram is that the routing decisions are driven by the BGP policy of the operator / AS located on your users’ end (left), which makes its routing decisions based on performance (shortest path) and economics (cost of transit / peering arrangements). If you need some clarification on how BGP manages this, please refer to this article.
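To make the interplay between "shortest path" and "economics" concrete, here is a minimal sketch of how an operator's router might choose between two candidate routes. It is a deliberately simplified model, not the full BGP decision process: it only compares local preference (the operator's own policy knob, often set for cost reasons) and then AS path length. The `Route` class, the labels and AS 64500 (a documentation-range AS number standing in for the customer) are assumptions made for illustration; 13335 is CloudFlare's AS number.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Route:
    label: str                # human-readable name, for illustration only
    as_path: Tuple[int, ...]  # AS numbers traversed, origin AS last
    local_pref: int = 100     # operator-assigned preference; higher wins


def best_route(routes: Iterable[Route]) -> Route:
    """Pick a route using two simplified criteria.

    1. Highest local preference (policy / economics).
    2. Shortest AS path (a rough proxy for performance).
    Real BGP applies further tie-breakers, which are omitted here.
    """
    return max(routes, key=lambda r: (r.local_pref, -len(r.as_path)))


CUSTOMER_AS = 64500  # hypothetical customer AS (documentation range)

direct = Route("direct peering", (CUSTOMER_AS,))
via_provider = Route("via DDoS provider", (13335, CUSTOMER_AS))

# With equal preference, the shorter AS path wins.
print(best_route([direct, via_provider]).label)  # direct peering

# If the operator prefers the provider's route (for example because that
# peering is cheaper), policy overrides path length.
cheaper = Route("via DDoS provider", (13335, CUSTOMER_AS), local_pref=200)
print(best_route([direct, cheaper]).label)  # via DDoS provider
```

The second case is the one that matters for performance: the longer route is selected purely because of a policy set on the users' side, which you do not control.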
What are the key performance questions to answer?
- First of all, you need to understand where your users are located and through which operator or AS they connect.
- Second question on your list: where are your points of presence (or those of your cloud / hosting providers)? How far are they from your users’ operators in terms of latency and number of hops?
- Third question on your list: where are your DDoS vendor’s points of presence and how close are they to your users?
- Finally, how do the most important ASes on your users’ end route the traffic to the different elements in your platform?
This will tell you whether, under normal conditions, your traffic is routed directly to your AS or through your DDoS protection vendor, and whether this translates into a performance loss or gain (more or less latency, packet loss and hops).
Depending on the route taken, latency will be higher or lower, the number of hops larger or smaller. Packet loss will also vary.
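One way to quantify this comparison is to split path measurements into those that transit the DDoS vendor's AS and those that do not, then summarize each group. The sketch below assumes each measurement is a dictionary with hypothetical keys `as_path`, `latency_ms`, `loss_pct` and `hops`; the sample values are invented purely to show the mechanics and are not real measurements.

```python
from collections import defaultdict
from statistics import median

PROVIDER_ASN = 13335  # CloudFlare's AS number; replace with your vendor's


def summarize_by_route(measurements, provider_asn=PROVIDER_ASN):
    """Group measurements by route type and compute simple statistics."""
    groups = defaultdict(list)
    for m in measurements:
        kind = "via_provider" if provider_asn in m["as_path"] else "direct"
        groups[kind].append(m)

    summary = {}
    for kind, rows in groups.items():
        summary[kind] = {
            "samples": len(rows),
            "median_latency_ms": median(r["latency_ms"] for r in rows),
            "mean_loss_pct": sum(r["loss_pct"] for r in rows) / len(rows),
            "median_hops": median(r["hops"] for r in rows),
        }
    return summary


# Invented sample data; AS 64496 and AS 64500 are documentation-range ASNs.
samples = [
    {"as_path": (64496, 64500), "latency_ms": 40, "loss_pct": 0.0, "hops": 9},
    {"as_path": (64496, 64500), "latency_ms": 44, "loss_pct": 0.0, "hops": 9},
    {"as_path": (64496, 64500), "latency_ms": 42, "loss_pct": 1.5, "hops": 10},
    {"as_path": (64496, 13335, 64500), "latency_ms": 55, "loss_pct": 1.0, "hops": 12},
    {"as_path": (64496, 13335, 64500), "latency_ms": 61, "loss_pct": 2.0, "hops": 13},
]

report = summarize_by_route(samples)
for kind, stats in sorted(report.items()):
    print(kind, stats)
```

Medians are used for latency and hops so that a single outlier probe does not dominate the comparison; with real data you would probably also want to break the results down per user AS.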
What we see above is the evolution over 2 weeks, but keep in mind that all of this is dynamic, driven by multiple factors and subject to frequent change:
- Each operator’s BGP policy on the left-hand side (your users’ end)
- Changes along the network path (unavailability, congestion, events) affecting all the candidate routes being evaluated.
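Because routes change over time, it can help to record when the observed AS path changes between consecutive measurements. The following is a minimal sketch under the assumption that you already collect time-ordered `(timestamp, as_path)` samples; the function name, timestamps and AS paths are illustrative only.

```python
def path_changes(samples):
    """Return (timestamp, old_path, new_path) for each observed change.

    `samples` is an iterable of (timestamp, as_path) pairs sorted by time.
    """
    changes = []
    previous = None
    for timestamp, as_path in samples:
        current = tuple(as_path)
        if previous is not None and current != previous:
            changes.append((timestamp, previous, current))
        previous = current
    return changes


# Hypothetical observations; AS 64496 and AS 64500 are documentation ASNs,
# 13335 is CloudFlare's AS number.
observations = [
    ("day 1", (64496, 64500)),
    ("day 2", (64496, 64500)),
    ("day 3", (64496, 13335, 64500)),  # traffic now transits the provider
    ("day 4", (64496, 64500)),         # and then reverts to the direct route
]

for timestamp, old, new in path_changes(observations):
    print(timestamp, old, "->", new)
```

Correlating these change events with latency and packet loss over the same period shows whether a shift in routing explains a shift in performance.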
What can you do to optimize the reachability of your platform?
Monitoring data is only as useful as it is actionable! Obviously, your BGP policy impacts how you route traffic from your autonomous system to the rest of the internet.
As we focus on incoming traffic, the key question here is how you can affect the route taken to reach your own platform. The further an AS sits from your own AS, the smaller your ability to act on it.
The only actions you can take are:
- Amending peering and transit arrangements (to avoid advertising routes from / to your AS through ASes that deliver poor performance).
- Asking your transit providers to act in the same way and avoid poorly performing routes.
As an example (see figure below), you may want to avoid the route from Cogent to CloudFlare (leading to the customer’s AS) as it shows high packet loss rates and prefer other routes (including through the same tier 1 but through a different path).
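To find candidates like this in your own data, a rough approach is to attribute each path's measured packet loss to every AS-to-AS link on that path and flag the links whose average exceeds a threshold. This is a crude heuristic and only a sketch: end-to-end loss does not prove which link is at fault, so flagged links should be confirmed with hop-level measurements. The function name, the threshold and the sample values are assumptions; 174 and 13335 are the AS numbers of Cogent and CloudFlare, and the other ASNs come from the documentation range.

```python
from collections import defaultdict


def lossy_links(measurements, threshold_pct=3.0):
    """Return {(as_a, as_b): average_loss_pct} for links above the threshold.

    `measurements` is an iterable of (as_path, loss_pct) pairs. The loss of
    a path is attributed to every adjacent AS pair on that path.
    """
    per_link = defaultdict(list)
    for as_path, loss_pct in measurements:
        for link in zip(as_path, as_path[1:]):
            per_link[link].append(loss_pct)

    flagged = {}
    for link, losses in per_link.items():
        average = sum(losses) / len(losses)
        if average > threshold_pct:
            flagged[link] = average
    return flagged


# Invented measurements: user AS 64496, customer AS 64500.
data = [
    ((64496, 174, 13335, 64500), 6.0),
    ((64496, 174, 13335, 64500), 4.0),
    ((64496, 174, 64500), 0.0),
    ((64496, 174, 64500), 0.0),
    ((64496, 174, 64500), 0.0),
    ((64496, 64497, 13335, 64500), 0.0),
    ((64496, 64497, 13335, 64500), 0.0),
]

print(lossy_links(data))  # {(174, 13335): 5.0}
```

In this invented data set only the Cogent-to-CloudFlare link stands out, because the other links it shares a path with also appear on loss-free paths that pull their averages down.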
If you would like to take action and start optimizing the reachability of your platform (going through a DDoS protection service or not), I suggest you take a look at this article.