How we handled a 40K IP DDoS attack on a tight budget



One fine morning we were notified that “The site is down”. It was about the application of one of our long-standing partners, an organization whose technical solution we are fully responsible for. We immediately started investigating, and looking at our AWS infrastructure we saw that our servers were scaling up like crazy. Things had been going well lately and traffic had been growing day after day, so we initially thought the scale-up was due to normal usage and assumed we had made an error in our scaling group setup (not that this was the first time we were scaling up, but you know, you always assume the error is yours).

However, a couple of minutes into the downtime we looked at the incoming traffic and it immediately became obvious that the volume was far beyond anything expected: it was up about a hundred times. We started considering the possibility that we were the target of a distributed denial-of-service (DDoS) attack.

Up to that moment we had been relying on AWS Shield Standard, which is included with the ALB offering, to protect us against DoS attacks. This seemed perfect, as we always find ourselves in projects with a restricted infrastructure budget. To be frank, we had never experienced a DDoS before and did not know that Shield Standard simply does not protect you against this kind of thing (it covers common network- and transport-layer attacks, not an application-layer flood like ours). We checked quickly and it turned out that Shield has an Advanced offering that does protect against DDoS, but it costs 3,000 USD per month, far beyond our client’s budget. We also looked for alternative solutions. Obviously CloudFlare came up fast. CloudFlare has a free offering for website protection, but CloudFlare DNS turned out not to support alias A records, something we needed in order to point at our load balancer. No direct solution came out of our quick search, so we had to act differently.

It took us some time and analysis to determine conclusively that the attack traffic was coming from abroad (something we expected anyway). We also knew that our partner’s application specifically targets Bulgarian customers, so we did the easiest thing: we completely blocked international traffic to our system.
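For those wondering what that looks like in practice: it boils down to a single geo-match rule on the web ACL in front of the ALB. Below is a minimal sketch of one way to express it with the Ruby AWS SDK (aws-sdk-wafv2), assuming a WAFv2 web ACL is already associated with the load balancer; the region, web ACL name and id, rule name, and priority are placeholders, not our real setup.

```ruby
require "aws-sdk-wafv2"

waf = Aws::WAFV2::Client.new(region: "eu-central-1") # placeholder region

# Fetch the web ACL attached to the ALB (name/id are placeholders).
acl = waf.get_web_acl(name: "partner-web-acl", scope: "REGIONAL", id: "WEB_ACL_ID")

# Block every request whose source country is not Bulgaria.
geo_block_rule = {
  name: "block-non-bg-traffic",
  priority: 10,
  statement: {
    not_statement: {
      statement: { geo_match_statement: { country_codes: ["BG"] } }
    }
  },
  action: { block: {} },
  visibility_config: {
    sampled_requests_enabled: true,
    cloud_watch_metrics_enabled: true,
    metric_name: "BlockNonBgTraffic"
  }
}

# Re-submit the ACL with the new rule appended.
waf.update_web_acl(
  name: acl.web_acl.name,
  scope: "REGIONAL",
  id: acl.web_acl.id,
  default_action: acl.web_acl.default_action.to_h,
  rules: acl.web_acl.rules.map(&:to_h) + [geo_block_rule],
  visibility_config: acl.web_acl.visibility_config.to_h,
  lock_token: acl.lock_token
)
```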

Yay! The first breakthrough! The application was up! Or was it? This application is a B2C solution with a strong focus on user experience, so it has long used server-side rendering (SSR). However, when we stopped all international traffic we inadvertently blocked the SSR’s requests to the other system components, so the SSR was not really working. Once we identified this, we whitelisted our own machines so they could still pass through the firewall (AWS WAF).
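Concretely, the whitelist is just an IP set holding our own egress IPs, referenced by an Allow rule that is evaluated before the geo rule. A rough sketch in the same spirit as the previous snippet; the addresses and names below are made up for illustration:

```ruby
# Hypothetical egress (NAT gateway) IPs of our SSR machines.
ssr_ips = waf.create_ip_set(
  name: "ssr-egress-ips",
  scope: "REGIONAL",
  ip_address_version: "IPV4",
  addresses: ["203.0.113.10/32", "203.0.113.11/32"]
)

# Allow rule with a lower priority number, so it is evaluated
# before the geo-block rule from the previous sketch.
ssr_allow_rule = {
  name: "allow-ssr-egress",
  priority: 0,
  statement: { ip_set_reference_statement: { arn: ssr_ips.summary.arn } },
  action: { allow: {} },
  visibility_config: {
    sampled_requests_enabled: true,
    cloud_watch_metrics_enabled: true,
    metric_name: "AllowSsrEgress"
  }
}
# Appended to the web ACL's rules with update_web_acl, exactly as above.
```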

Now the application was really up and running, and we had a bit of air to breathe and analyze what had actually happened. We quickly identified when the attack had actually started. Then we began analyzing our ALB logs (with Ruby scripts, as these tend to be very fast to write) and looking for patterns in the attack. It turned out the attack targeted one specific URI of the system, one that was even unique to a single user, so it was extremely easy to identify the attacking requests. We considered that URI an unstable identifier (the attacker could change the attack vector, something they did in later stages of the attack), so we did not want it to be our sole firewall rule. When we analyzed the traffic, however, we identified 159 IP subnetworks the attack was conducted from. We blacklisted them all at the firewall level, which immediately blocked all adversary traffic. With this in place we were able to allow international traffic again (initially with rate limits, to guard against the attacker switching networks).
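To give an idea of the kind of throwaway Ruby script we mean: the sketch below walks a directory of ALB access logs downloaded from S3, keeps only the requests hitting the attacked URI (the path and directory here are placeholders), and counts them per /24 source subnet.

```ruby
require "zlib"

# Placeholder, not the real attacked URI.
ATTACKED_PATH = "/some/attacked/uri"

counts = Hash.new(0)

# Gzipped ALB access logs downloaded from S3 into ./alb-logs.
Dir.glob("alb-logs/**/*.log.gz") do |file|
  Zlib::GzipReader.open(file) do |gz|
    gz.each_line do |line|
      next unless line.include?(ATTACKED_PATH)

      # The 4th space-separated field of an ALB log entry is client:port.
      client_ip = line.split(" ")[3].to_s.split(":").first
      next if client_ip.nil? || client_ip.empty?

      subnet = client_ip.split(".").first(3).join(".") + ".0/24"
      counts[subnet] += 1
    end
  end
end

# Print the busiest source subnets first.
counts.sort_by { |_, n| -n }.each do |subnet, n|
  puts format("%-20s %10d", subnet, n)
end
```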

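The subnet list produced this way went into a block list, and the rate limits were a plain WAF rate-based rule. A hedged sketch of both, again with made-up names, addresses, and limits, following the same update_web_acl pattern as before:

```ruby
# Block list with the attacking subnets found during the log analysis
# (the addresses below are placeholders for the real 159-subnet list).
attack_subnets = waf.create_ip_set(
  name: "ddos-attack-subnets",
  scope: "REGIONAL",
  ip_address_version: "IPV4",
  addresses: ["198.51.100.0/24", "192.0.2.0/24"]
)

block_attackers_rule = {
  name: "block-attack-subnets",
  priority: 1,
  statement: { ip_set_reference_statement: { arn: attack_subnets.summary.arn } },
  action: { block: {} },
  visibility_config: {
    sampled_requests_enabled: true,
    cloud_watch_metrics_enabled: true,
    metric_name: "BlockAttackSubnets"
  }
}

# Safety net once international traffic is re-opened: cap any single
# source IP at a fixed number of requests per five-minute window
# (the limit here is arbitrary).
rate_limit_rule = {
  name: "per-ip-rate-limit",
  priority: 20,
  statement: { rate_based_statement: { limit: 1000, aggregate_key_type: "IP" } },
  action: { block: {} },
  visibility_config: {
    sampled_requests_enabled: true,
    cloud_watch_metrics_enabled: true,
    metric_name: "PerIpRateLimit"
  }
}
# Both rules are appended to the web ACL's rules via update_web_acl, as before.
```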
The interesting thing is that the attacker did not call off the attack for three more days. Over roughly 72 hours the application was hit by more than 205 million requests from over 40,000 unique IPs. It took us some time to find the right approach to the situation, but we are happy that we eventually found a way to keep the website up without spending thousands of dollars per month, thus keeping the infrastructure costs within limits.

We know that the attack was conducted by someone hostile to our solution, and we believe this will not be their last attempt to bring the site down. However, with some reading and the experience we have gained, we believe we are better prepared.

The only sad conclusion from the whole exercise is that a DDoS is much like a real-world war: all sides waste resources of every kind (in the case of war, far worse: lives) without bringing any value to humanity. We would love it if our attacker realized this and focused their energy on devising ways to improve something around them rather than destroy it.