Ban Autonomous Systems


More people have been working on blocking whole ranges of IP numbers, since that catches hosting providers that give bots access to the whole range they control. The bots switch IP numbers all the time so a filter based on IP numbers won’t catch them. But if we can determine their autonomous system number (ASN), we can not only block an IP number range, we can block all the IP number ranges the ASN controls.

Now, since these hosting providers also host nice things like other fediverse instances, I don’t want to block them forever. I want to block them for 10min, and if they continue after a few of these shorter blocks, I want to block them for a week. Hopefully, their clients have ended their Internet slurping and things are back to normal. This is how fail2ban works, but only for individual IP numbers.

I want code that bridges this gap.

#Butlerian Jihad

Where to start

fail2ban-bloc tries to guess (!) IP ranges and bans those using fail2ban. I need to investigate more.

I’m still fascinated by asncounter. It might even work without logfiles, using tcpdump! For now, it generates an interesting Top 10 list.

Working with asncounter

Here’s me looking at the last Apache log file, excluding my fedi instance:

awk '!/^social/ {print $2}' /var/log/apache2/access.log | asncounter
INFO: using datfile ipasn_20250616.1200.dat.gz
INFO: collecting addresses from <stdin>
INFO: loading datfile /root/.cache/pyasn/ipasn_20250616.1200.dat.gz...
INFO: finished reading data
INFO: loading /root/.cache/pyasn/asnames.json
count percent ASN AS
9264 9.49 29691 NINE, CH
6776 6.94 45899 VNPT-AS-VN VNPT Corp, VN
4207 4.31 7922 COMCAST-7922, US
3728 3.82 7018 ATT-INTERNET4, US
2193 2.25 24940 HETZNER-AS, DE
2015 2.06 13030 INIT7, CH
1802 1.85 396982 GOOGLE-CLOUD-PLATFORM, US
1470 1.51 701 UUNET, US
1364 1.4 136907 HWCLOUDS-AS-AP HUAWEI CLOUDS, HK
1257 1.29 32934 FACEBOOK, US
total: 97657
count percent prefix ASN AS
9260 9.48 178.209.32.0/19 29691 NINE, CH
1761 1.8 2601::/20 7922 COMCAST-7922, US
1542 1.58 212.51.144.0/20 13030 INIT7, CH
1305 1.34 2001:ee0:4f00::/42 45899 VNPT-AS-VN VNPT Corp, VN
1092 1.12 73.0.0.0/8 7922 COMCAST-7922, US
1080 1.11 57.141.6.0/24 32934 FACEBOOK, US
1079 1.1 99.88.0.0/13 7018 ATT-INTERNET4, US
1058 1.08 114.119.128.0/19 136907 HWCLOUDS-AS-AP HUAWEI CLOUDS, HK
953 0.98 2600:1700::/28 7018 ATT-INTERNET4, US
938 0.96 2a01:4f9::/32 24940 HETZNER-AS, DE
total: 97657

INIT7 is my Internet service provider at home and NINE is my hosting provider for the server. Better not ban those! 😅

So what is VNPT-AS-VN VNPT Corp doing? This could use better tool support!

grep '2001:ee0:4f' /var/log/apache2/access.log | awk '{print $8}' | sort | uniq -c | head
2 /c2-search?url=http%3A%2F%2Fwiki.c2.com%2F%3Fsearch%3D%22OpenSourceSecondLife%22
1 /cgi-bin/wiki.pl?ErcReplace
1 /cw-fr/BarneySock
1 /edit/2011-06-16_Session_Reports_Are_Read_Just_Once,_If_At_All
1 /edit/2019-03-15_Dungeon_Master%E2%80%99s_Handbook
1 /emacs/AcrobatReader
1 /emacs?action=admin;id=AssociationList
1 /emacs?action=admin&id=Comments_on_AdamShand
1 /emacs?action=admin&id=Comments_on_Categor%C3%ADaRegi%C3%B3n
1 /emacs?action=admin&id=Comments_on_nickat

OK, this is bots. Useless random URLs.

Ban all the networks managed by an ASN

I’m going to use ipset to use two lists, banlist and banlist6. I use these two for ban-cidr, too.

# Use hash:net because of the CIDR stuff
ipset create banlist hash:net
iptables -I INPUT -m set --match-set banlist src -j DROP
iptables -I FORWARD -m set --match-set banlist src -j DROP
ipset create banlist6 hash:net family inet6
ip6tables -I INPUT -m set --match-set banlist6 src -j DROP
ip6tables -I FORWARD -m set --match-set banlist6 src -j DROP

To ban all the IP ranges an ASN manages, I created the following little fish function using ip.guide:

function asn-ban
    for asn in $argv
        for cidr in (curl -sL "https://ip.guide/as$asn" | jq --raw-output '.routes.v4[]')
            echo ipset add banlist $cidr
        end
        for cidr in (curl -sL "https://ip.guide/as$asn" | jq --raw-output '.routes.v6[]')
            echo ipset add banlist6 $cidr
        end
    end
end

Let’s try it with the ASN 45899!

asn-ban 45899 | sh
netfilter-persistent save

For more about netfilter-persistent save see the comments on 2025-01-23 The bots are at it again.
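The split into a v4 and a v6 set can also be sketched in Python. This is only an illustration: the JSON shape is inferred from the jq filters above (`.routes.v4[]` and `.routes.v6[]`), the sample payload is made up rather than a real ip.guide response, and `ipset_commands` is a hypothetical helper name.

```python
import json

# Illustrative payload only: the shape mirrors the jq filters used by asn-ban;
# the prefixes are examples and not an actual ip.guide response.
SAMPLE = '{"routes": {"v4": ["14.187.96.0/20"], "v6": ["2001:ee0:4f00::/42"]}}'


def ipset_commands(payload, v4_set="banlist", v6_set="banlist6"):
    """Return one 'ipset add' command line per prefix in the payload."""
    routes = json.loads(payload).get("routes", {})
    commands = [f"ipset add {v4_set} {cidr}" for cidr in routes.get("v4", [])]
    commands += [f"ipset add {v6_set} {cidr}" for cidr in routes.get("v6", [])]
    return commands


if __name__ == "__main__":
    # Like the fish function, this only prints the commands; nothing is executed.
    print("\n".join(ipset_commands(SAMPLE)))
```

As with the fish function, the commands are only printed, so they can be inspected before being piped to a shell.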

When I ran the asn-ban command above, I noticed that I got a single “it’s already added” response. Before adding the same numbers to my shell script, therefore:

for cidr in (asn-ban 45899|awk '{print $4}'); if grep -q $cidr bin/admin/ban-cidr; echo $cidr; end; end

That told me I had to remove 14.187.96.0/20 from my script. Once this is done:

echo (echo "#"; date --iso) >> bin/admin/ban-cidr
asn-ban 45899 >> bin/admin/ban-cidr

I really need to figure out how to manage this smartly. And I need to figure out a way to unban the whole list!

Integration with fail2ban

Let’s start with fail2ban. I need a jail! Every jail needs a filter!

In /etc/fail2ban/jail.d/alex.conf (this is where I maintain all my jails) I added:

[butlerian-jihad]
enabled = true
bantime = 1d

Note that this jail doesn’t define log paths. I hope that works as intended.

I created a matching filter with no definition in /etc/fail2ban/filter.d/butlerian-jihad.conf:

# Author: Alex Schroeder <[email protected]>
[Definition]

Reload it all, and check:

fail2ban-client reload
OK
fail2ban-client status
Status
|- Number of jail: 6
`- Jail list: alex-apache, alex-bots, butlerian-jihad, ngircd, recidive, sshd

Nice! So now I have a new jail.

Undo the banlist

asn-ban 45899 | sed 's/ipset add/ipset del/' | sh

I also manually edited my ban-cidr file to remove the lines I added above. Let’s have fail2ban handle this!

Switch from ipset to fail2ban-client

function asn-ban
    for asn in $argv
        set --local cidr (curl -sL "https://ip.guide/as$asn" | jq --raw-output '.routes.v4[],.routes.v6[]')
        echo fail2ban-client set butlerian-jihad banip $cidr
    end
end

Examine it:

asn-ban 45899 | less

Run it:

asn-ban 45899 | sh
3640

If you messed up, clear the jail:

fail2ban-client reload --unban butlerian-jihad

Check the jail:

fail2ban-client get butlerian-jihad banned

Count the entries in the jail:

fail2ban-client get butlerian-jihad banned | sed 's/\'/"/g' | jq length
3640
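The sed step is needed because the list is printed with single quotes, which jq does not accept as JSON. A minimal Python sketch of the same count, assuming the output really is a Python-style list literal (which the quote replacement above suggests); `count_banned` is a hypothetical helper name:

```python
import ast


def count_banned(output):
    """Count the entries in the list printed by `fail2ban-client get <jail> banned`.

    Assumes the output is a Python-style list literal such as
    ['85.208.96.0/24', '85.208.97.0/24'].
    """
    banned = ast.literal_eval(output.strip())
    return len(banned)
```

`ast.literal_eval` only evaluates literals, so it is safe to feed it the command output.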

What do we have?

With asncounter we have a tool to quickly discover if an ASN is providing services to a bot.

With asn-ban we have a tool to quickly add all the IP networks the ASN is managing to a jail for fail2ban.

The jail which we called butlerian-jihad bans the IP networks for a day.

What’s left to do?

I should check whether this actually works! Let’s see whether the ban gets lifted after 24h. That’s the main point of this exercise!

asn-ban uses the ip.guide site for the data. This should be rewritten such that it uses the same data as asncounter. I guess that would be pyasn. See below!

I need a cron job that runs every 10 minutes, takes the last ten minutes worth of Apache access log files, ignores the fedi subdomain, identifies all the ASNs, ignores my own ASNs and bans the rest.

Some bans

Wow, some of the autonomous systems are big. These are the ones I banned yesterday and today:

# AMAZON-02, US (18772!)
asn-ban 16509 | sh
# VNPT-AS-VN VNPT Corp, VN (3640!)
asn-ban 45899 | sh
# TENCENT-NET-AP Shenzhen Tencent Computer Systems Company Limited, CN (2278!)
asn-ban 45090 | sh
# ALIBABA-CN-NET Alibaba US Technology Co., Ltd., CN (852!)
asn-ban 45102 | sh
# FACEBOOK, US (541!)
asn-ban 32934 | sh
# SEMRUSH-AS, CY (5!)
asn-ban 209366 | sh

Using pyasn data files from the command-line

How to determine the name of an autonomous system number:

jq --raw-output '.["32934"]' .cache/pyasn/asnames.json
FACEBOOK, US

How to determine the networks for an ASN:

zgrep '209366$' .cache/pyasn/ipasn_20250616.1200.dat.gz | awk '{print $1}'
85.208.96.0/24
85.208.97.0/24
85.208.99.0/24
185.170.167.0/24
185.191.171.0/24

How to determine the ASN of a CIDR:

zgrep '^85\.208\.96\.0/24' .cache/pyasn/ipasn_20250616.1200.dat.gz | awk '{print $2}'
209366

ASN networks without an external service

asn-networks is a tiny script with a bunch of lines taken from asncounter to print the IP ranges managed by one or more autonomous systems.

python3 asn-networks 209366
185.170.167.0/24
185.191.171.0/24
85.208.96.0/24
85.208.97.0/24
85.208.99.0/24

It uses the pyasn datafiles that a regular run of asncounter has downloaded. That is to say, asn-networks does not download or refresh these files. I’m assuming that you have run asncounter just moments earlier.
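The core of such a lookup can be sketched in a few lines of Python. This is not the actual asn-networks script. It assumes the layout implied by the zgrep examples above (prefix in the first field, ASN in the last) and that lines starting with `;` are comments; `networks_for_asns` and `read_datfile` are hypothetical names.

```python
import gzip


def networks_for_asns(lines, asns):
    """Yield the prefixes announced by any of the given ASNs.

    `lines` is an iterable of text lines in the assumed dat-file layout:
    prefix first, ASN last, whitespace-separated.
    """
    wanted = {str(asn) for asn in asns}
    for line in lines:
        fields = line.split()
        if not fields or line.startswith(";"):
            continue  # skip blank lines and (assumed) comment lines
        if fields[-1] in wanted:
            yield fields[0]


def read_datfile(path):
    """Return all lines of a gzip-compressed dat file as text."""
    with gzip.open(path, "rt") as handle:
        return handle.readlines()
```

A call such as `networks_for_asns(read_datfile(path), [209366])` would then produce the same kind of list as above, in file order rather than sorted.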

Given this script, we can now call fail2ban-client as follows (I use fish) to ban all the networks:

fail2ban-client set butlerian-jihad banip (asn-networks 209366)
5

Unbanning works the same way:

fail2ban-client set butlerian-jihad unbanip (asn-networks 209366)
5

Remember that fail2ban-client prints the number of IP numbers or ranges added or removed.
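In fish, the command substitution turns each output line into a separate argument. Outside of fish, the same call can be assembled as an argument list. A small sketch, with `fail2ban_argv` and `run` as hypothetical helpers and a dry run as the default:

```python
import subprocess


def fail2ban_argv(jail, networks, unban=False):
    """Build the argument list for banning or unbanning networks in a jail."""
    action = "unbanip" if unban else "banip"
    return ["fail2ban-client", "set", jail, action, *networks]


def run(argv, dry_run=True):
    """Print the command in a dry run; otherwise execute it and return its output."""
    if dry_run:
        print(" ".join(argv))
        return None
    result = subprocess.run(argv, check=True, capture_output=True, text=True)
    return result.stdout.strip()
```

Because banning and unbanning differ only in the action word, one helper covers both directions.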

Identifying suspicious ASN

What is suspicious activity? How about this: In a 2h window, no ASN should send more than 1000 requests? So we need a script that filters the log files and prints a 2h window, skipping the lines we want to ignore: 2h-access-log. Then pass the IP numbers to asncounter, throw away all the things we don’t care about and just print the appropriate lines:

bin/2h-access-log !^social \
 | awk '{print $2}' \
 | bin/asncounter --no-prefixes 2>/dev/null \
 | awk '/^[0-9]/ && $1>1000 { print }'
3062 31.93 24940 HETZNER-AS, DE
1642 17.12 16276 OVH, FR

So do I dare ban those numbers?? I’m not sure! I should figure out a way to find those 3062 requests made by services hosted on Hetzner.

asn-access-log does just that. You pass it an ASN, it determines all the networks it manages and then it filters standard input, assuming that it consists of Apache access log lines (what counts is that the second field is an IP number).

bin/2h-access-log !^social | bin/asn-access-log 24940

I see a lot of RSS services (NewsBlur, fiperbot, MyNewspaper Agent, FreshRSS), git, some bot (from the 159.69.0.0/16 range, for example), and on and on. Ugh. It’s not easy to know what to do!
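The filtering that asn-access-log performs can be sketched with Python’s ipaddress module. This is an illustration rather than the actual script; `filter_by_networks` is a hypothetical name, and the only assumption about the log format is the one stated above: the second field is an IP number.

```python
import ipaddress


def filter_by_networks(lines, cidrs):
    """Yield the log lines whose second field is an address inside one of cidrs."""
    networks = [ipaddress.ip_network(cidr) for cidr in cidrs]
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # not a log line in the expected layout
        try:
            address = ipaddress.ip_address(fields[1])
        except ValueError:
            continue  # second field is not an IP number
        # Compare only networks of the same IP version as the address.
        if any(address.version == net.version and address in net for net in networks):
            yield line
```

With thousands of networks per ASN a linear scan gets slow; a radix tree, like the one pyasn uses, would be the better data structure for a real tool.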

I think the best answer would be to lower the stakes but also ban for shorter amounts of time and let fail2ban handle the rest. The only thing I need to consider is whether I find the current amount of resources spent OK. Do I? Let’s look at the latest numbers.

This here shows that fedi traffic is 60% Hetzner and OVH. This makes it hard for me to block these autonomous systems.

bin/2h-access-log ^social !178.209.50.237 \
 | awk '{print $2}' \
 | bin/asncounter --no-prefixes
INFO: using datfile ipasn_20250616.1200.dat.gz
INFO: collecting addresses from <stdin>
INFO: loading datfile /root/.cache/pyasn/ipasn_20250616.1200.dat.gz...
INFO: finished reading data
INFO: loading /root/.cache/pyasn/asnames.json
count percent ASN AS
2148 45.36 24940 HETZNER-AS, DE
738 15.59 16276 OVH, FR
273 5.77 14061 DIGITALOCEAN-ASN, US
202 4.27 14361 HOPONE-GLOBAL, US
195 4.12 15796 SALT-, CH
105 2.22 214640 HOSTUP HOSTUP, SE
102 2.15 63949 AKAMAI-LINODE-AP Akamai Connected Cloud, SG
62 1.31 47692 NESSUS, AT
59 1.25 197540 NETCUP-AS netcup GmbH, DE
50 1.06 44684 MYTHIC Mythic Beasts Ltd, GB
total: 4735

What’s the situation without fedi traffic, keeping in mind that I will most likely not be able to block fedi hosters?

bin/2h-access-log !^social !178.209.50.237 \
 | awk '{print $2}' \
 | bin/asncounter --no-prefixes
INFO: using datfile ipasn_20250616.1200.dat.gz
INFO: collecting addresses from <stdin>
INFO: loading datfile /root/.cache/pyasn/ipasn_20250616.1200.dat.gz...
INFO: finished reading data
INFO: loading /root/.cache/pyasn/asnames.json
count percent ASN AS
249 5.47 7922 COMCAST-7922, US
189 4.16 9808 CHINAMOBILE-CN China Mobile Communications Group Co., Ltd., CN
129 2.84 7018 ATT-INTERNET4, US
122 2.68 396982 GOOGLE-CLOUD-PLATFORM, US
118 2.59 24940 HETZNER-AS, DE
96 2.11 55836 RELIANCEJIO-IN Reliance Jio Infocomm Limited, IN
96 2.11 56046 CMNET-JIANGSU-AP China Mobile communications corporation, CN
75 1.65 140061 CHINANET-QINGHAI-AS-AP Qinghai Telecom, CN
73 1.61 4837 CHINA169-BACKBONE CHINA UNICOM China169 Backbone, CN
70 1.54 701 UUNET, US
total: 4548

The autonomous systems that show up in the second list but not in the first list are my prime candidates, like COMCAST and CHINAMOBILE-CN.

So how about going after the autonomous systems on the second list that produce more than 1000 hits in a 2h period?

Something like this? I’m going to put this into /etc/cron.hourly/butlerian-jihad, since it should run every hour:

#!/bin/sh
bin/2h-access-log !^social !178.209.50.237 \
 | awk '{print $2}' \
 | bin/asncounter --no-prefixes 2>/dev/null \
 | awk '/^[0-9]/ && $1>1000 { print $3 }' \
 | xargs bin/asn-networks \
 | ifne xargs echo fail2ban-client set butlerian-jihad banip

I use ifne to prevent the execution of the last command if there is no output. Thanks, @acdw!
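The selection step of this pipeline (count requests per ASN, keep those above the threshold) can be sketched in Python. This is an illustration, not a replacement for asncounter: the prefix table is passed in as a plain list, `asn_for` and `abusive_asns` are hypothetical names, and the `ignore` parameter is an assumption standing in for the idea of skipping my own ASNs.

```python
from collections import Counter
import ipaddress


def asn_for(ip, table):
    """Return the ASN of the longest matching prefix in table, or None.

    `table` is a list of (ip_network, asn) pairs.
    """
    address = ipaddress.ip_address(ip)
    best = None
    for network, asn in table:
        if address.version == network.version and address in network:
            if best is None or network.prefixlen > best[0].prefixlen:
                best = (network, asn)
    return best[1] if best else None


def abusive_asns(ips, table, threshold=1000, ignore=()):
    """Return the sorted ASNs with more than `threshold` requests."""
    counts = Counter(asn_for(ip, table) for ip in ips)
    return sorted(
        asn
        for asn, hits in counts.items()
        if asn is not None and asn not in ignore and hits > threshold
    )
```

The comparison is strictly greater than, matching `$1>1000` in the awk filter, so an ASN with exactly 1000 requests is left alone.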

Summary

/etc/cron.hourly/butlerian-jihad runs every hour and checks if there have been any abusive autonomous systems in the last two hours. If so, they are banned.

2h-access-log prints the last two hours worth of log lines from /var/log/apache2/access.log (and access.log.1 if necessary).

The !^social argument ensures that connecting to my fedi server doesn’t trigger the ban hammer.

The !178.209.50.237 argument ensures that I don’t ban the server itself as it monitors stuff and as I test things on the server. I might have to add my home IP numbers. We’ll see!

asncounter finds the autonomous system numbers for all the IP numbers in the web server log file and prints a report.

asn-networks then takes the selected autonomous system numbers and returns the IP ranges they manage.

These are then banned by fail2ban-client using the butlerian-jihad jail.
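The time-window part of the 2h-access-log step described above can be sketched as follows. This is not the actual script: it assumes the usual Apache timestamp in square brackets (for example `[16/Jun/2025:11:30:00 +0200]`), leaves out the pattern arguments such as `!^social`, and `recent_lines` is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone
import re

# Assumed Apache timestamp layout, e.g. [16/Jun/2025:11:30:00 +0200]
STAMP = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]")


def recent_lines(lines, now, window=timedelta(hours=2)):
    """Yield the lines whose timestamp lies within `window` before `now`."""
    for line in lines:
        match = STAMP.search(line)
        if not match:
            continue  # no parsable timestamp, skip the line
        when = datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")
        if now - window <= when <= now:
            yield line
```

In real use `now` would be `datetime.now(timezone.utc)`; passing it in as a parameter makes the function easy to test against old log files.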

The butlerian-jihad jail is enabled via a config file in /etc/fail2ban/jail.d/. In my case, the file is called alex.conf and for this jail, it says:

[butlerian-jihad]
enabled = true
bantime = 1h

The jail also needs a filter definition even though no filtering happens as no logfile is checked. My /etc/fail2ban/filter.d/butlerian-jihad.conf contains just this:

# Author: Alex Schroeder <[email protected]>
[Definition]

What this means is that every hour, an autonomous system can get banned. If it is banned, it is banned for 1h. If it is banned for activity in the last hour leading up to the ban, the script will find the same log entries and ban it “again”. This results in no changes in the jail, since all the networks are already in the butlerian-jihad jail.

The bans themselves are reported in /var/log/fail2ban.log.

I’ve also enabled the recidive jail. That is, in the same file where I defined my butlerian-jihad jail, I have:

[recidive]
enabled = true

The defaults are in /etc/fail2ban/jail.conf:

[recidive]
logpath = /var/log/fail2ban.log
banaction = %(banaction_allports)s
bantime = 1w
findtime = 1d

So if some network is banned for more than five times in a day, it is banned for a week. I say five times because maxretry is set to 5 in /etc/fail2ban/jail.conf.

Let’s assume a scraper is started from some network managed by an autonomous system. It starts using IP numbers from all its ranges. It sends 600 requests per hour, more than a human could read and more than a feed reader should need, etc.

after the first hour, nothing happens, as 600 is less than the 1000 needed to trigger the system
after the second hour, the ASN is banned because the sum total for the last two hours is 1200
after the third hour, the ASN is unbanned and not banned again because it only made 600 requests in the second hour
after the fourth hour, the ASN is banned again (1200 requests)
after the fifth hour, the ASN is unbanned
after the sixth hour, the ASN is banned for the third time (1200 requests)
after the seventh hour, the ASN is unbanned
after the eighth hour, the ASN is banned for the fourth time (1200 requests)
after the ninth hour, the ASN is unbanned
after the tenth hour, the ASN is banned for the fifth time (1200 requests)
after the eleventh hour, the ASN is unbanned
after the twelfth hour, the ASN is banned for the sixth time, the recidive filter kicks in and the networks belonging to the ASN are banned for a week

This escalation takes twelve hours. The ASN was already banned for half this time.

Assuming the scraper starts again as soon as the week-long ban ends, the pattern repeats every 7½ days (12h of escalation plus a one-week ban), and the abusive ASN still gets service on 6h out of 180h, or about 3% of the time. For my taste, that is still way too nice.
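The arithmetic behind that estimate, taking the twelve-hour timeline and the six hours of service from the text as given, can be checked in a few lines:

```python
ESCALATION_HOURS = 12   # time until the recidive jail kicks in (from the timeline)
SERVED_HOURS = 6        # hours of service during the escalation (from the text)
WEEK_HOURS = 7 * 24     # length of the recidive ban

cycle_hours = ESCALATION_HOURS + WEEK_HOURS    # one full cycle: 180 hours
cycle_days = cycle_hours / 24                  # 7.5 days
served_fraction = SERVED_HOURS / cycle_hours   # share of time the ASN gets service

print(f"{cycle_hours}h = {cycle_days} days, served {served_fraction:.1%} of the time")
```

Changing the constants shows how the share of served time reacts to a different bantime or threshold.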

Let’s see how this goes for a while.

I’m already looking forward to dropping my banlist and banlist6 sets I created for ban-cidr.
