More people have been working on blocking whole ranges of IP numbers, since that catches hosting providers that give bots access to the whole range they control. The bots switch IP numbers all the time, so a filter based on individual IP numbers won’t catch them. But if we can determine their autonomous system number (ASN), we can block not just one IP number range but all the IP number ranges the ASN controls.
Now, since these hosting providers also host nice things like other fediverse instances, I don’t want to block them forever. I want to block them for 10 minutes, and if they continue after a few of these shorter blocks, I want to block them for a week. Hopefully, by then their clients have ended their Internet slurping and things are back to normal. This is how fail2ban works, but only for individual IP numbers.
I want code that bridges this gap.
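The escalation described above, short bans first and a long ban for repeat offenders, can be sketched as a small policy object keyed by something like an ASN. This is a hypothetical Python illustration, not how fail2ban is implemented: the 10 minute and one week durations come from the text, while the one-day look-back and the limit of five short bans are assumptions modelled on fail2ban's recidive defaults.

```python
from collections import defaultdict, deque

SHORT_BAN = 10 * 60            # 10 minutes, in seconds
LONG_BAN = 7 * 24 * 60 * 60    # one week, in seconds
LOOKBACK = 24 * 60 * 60        # assumption: count short bans from the last day
MAX_SHORT_BANS = 5             # assumption: the fifth ban within a day escalates


class Escalator:
    """Pick a ban duration for a key (e.g. an ASN), escalating repeat offenders."""

    def __init__(self):
        self.history = defaultdict(deque)  # key -> timestamps of earlier bans

    def ban(self, key, now):
        """Record a ban at time `now` (seconds) and return its duration."""
        past = self.history[key]
        # Forget bans that fall outside the look-back window.
        while past and now - past[0] > LOOKBACK:
            past.popleft()
        past.append(now)
        return LONG_BAN if len(past) >= MAX_SHORT_BANS else SHORT_BAN
```

Keyed by ASN instead of by IP number, this is the piece fail2ban does not offer out of the box; timestamps are assumed to increase monotonically.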
Where to start
fail2ban-bloc tries to guess (!) IP ranges and bans those using fail2ban. I need to investigate more.
I’m still fascinated by asncounter. It might even work without logfiles, using tcpdump! For now, it generates an interesting Top 10 list.
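Mapping an address to its ASN is a longest-prefix match: among all announced prefixes that contain the address, the most specific one wins. Here is a minimal Python sketch of that idea, assuming a tiny hard-coded table (the three prefix/ASN pairs are taken from outputs further down in this post); asncounter itself relies on pyasn and its data files, not on this code, and `lookup_asn` and `prefixes_of` are hypothetical names.

```python
import ipaddress

# Hypothetical excerpt of a routing table: announced prefix -> origin ASN.
# A real lookup would use a full pyasn datfile instead.
PREFIXES = {
    "178.209.32.0/19": 29691,     # NINE, CH
    "85.208.96.0/24": 209366,     # SEMRUSH-AS, CY
    "2001:ee0:4f00::/42": 45899,  # VNPT-AS-VN VNPT Corp, VN
}


def lookup_asn(ip, table=PREFIXES):
    """Return the ASN of the most specific prefix containing ip, or None."""
    addr = ipaddress.ip_address(ip)
    best_len, best_asn = -1, None
    for prefix, asn in table.items():
        net = ipaddress.ip_network(prefix)
        # Membership is simply False when address and network families differ.
        if addr in net and net.prefixlen > best_len:
            best_len, best_asn = net.prefixlen, asn
    return best_asn


def prefixes_of(asn, table=PREFIXES):
    """Return every prefix the table attributes to asn."""
    return [prefix for prefix, owner in table.items() if owner == asn]
```

The first function answers "who is this?", the second "what else do they own?", which is the step from blocking one range to blocking all of an ASN's ranges. A linear scan is fine for a handful of prefixes; pyasn uses a radix tree for the full table.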
Working with asncounter
Here’s me looking at the last Apache log file, excluding my fedi instance:
```
awk '!/^social/ {print $2}' /var/log/apache2/access.log | asncounter
INFO: using datfile ipasn_20250616.1200.dat.gz
INFO: collecting addresses from <stdin>
INFO: loading datfile /root/.cache/pyasn/ipasn_20250616.1200.dat.gz...
INFO: finished reading data
INFO: loading /root/.cache/pyasn/asnames.json
count percent ASN AS
9264 9.49 29691 NINE, CH
6776 6.94 45899 VNPT-AS-VN VNPT Corp, VN
4207 4.31 7922 COMCAST-7922, US
3728 3.82 7018 ATT-INTERNET4, US
2193 2.25 24940 HETZNER-AS, DE
2015 2.06 13030 INIT7, CH
1802 1.85 396982 GOOGLE-CLOUD-PLATFORM, US
1470 1.51 701 UUNET, US
1364 1.4 136907 HWCLOUDS-AS-AP HUAWEI CLOUDS, HK
1257 1.29 32934 FACEBOOK, US
total: 97657
count percent prefix ASN AS
9260 9.48 178.209.32.0/19 29691 NINE, CH
1761 1.8 2601::/20 7922 COMCAST-7922, US
1542 1.58 212.51.144.0/20 13030 INIT7, CH
1305 1.34 2001:ee0:4f00::/42 45899 VNPT-AS-VN VNPT Corp, VN
1092 1.12 73.0.0.0/8 7922 COMCAST-7922, US
1080 1.11 57.141.6.0/24 32934 FACEBOOK, US
1079 1.1 99.88.0.0/13 7018 ATT-INTERNET4, US
1058 1.08 114.119.128.0/19 136907 HWCLOUDS-AS-AP HUAWEI CLOUDS, HK
953 0.98 2600:1700::/28 7018 ATT-INTERNET4, US
938 0.96 2a01:4f9::/32 24940 HETZNER-AS, DE
total: 97657
```

INIT7 is my Internet service provider at home and NINE is my hosting provider for the server. Better not ban those! 😅
So what is VNPT-AS-VN VNPT Corp doing? This could use better tool support!
```
grep '2001:ee0:4f' /var/log/apache2/access.log | awk '{print $8}' | sort | uniq -c | head
      2 /c2-search?url=http%3A%2F%2Fwiki.c2.com%2F%3Fsearch%3D%22OpenSourceSecondLife%22
      1 /cgi-bin/wiki.pl?ErcReplace
      1 /cw-fr/BarneySock
      1 /edit/2011-06-16_Session_Reports_Are_Read_Just_Once,_If_At_All
      1 /edit/2019-03-15_Dungeon_Master%E2%80%99s_Handbook
      1 /emacs/AcrobatReader
      1 /emacs?action=admin;id=AssociationList
      1 /emacs?action=admin&id=Comments_on_AdamShand
      1 /emacs?action=admin&id=Comments_on_Categor%C3%ADaRegi%C3%B3n
      1 /emacs?action=admin&id=Comments_on_nickat
```

OK, these are bots. Useless random URLs.
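The grep above is only an approximation: the string 2001:ee0:4f also matches addresses outside 2001:ee0:4f00::/42 (2001:ee0:4fff::1, for example) and could match elsewhere in a log line. A hypothetical Python sketch of the same report with a real prefix test, assuming the log layout used throughout this post (client IP number in the second field, request path in the eighth) and sorting by count instead of by path:

```python
import ipaddress
from collections import Counter


def paths_from(lines, network):
    """Count request paths of log lines whose client IP lies in network."""
    net = ipaddress.ip_network(network)
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 8:
            continue  # not a complete access log line
        try:
            addr = ipaddress.ip_address(fields[1])
        except ValueError:
            continue  # second field is not an IP number
        if addr in net:
            counts[fields[7]] += 1
    return counts.most_common()  # busiest paths first
```

Passing an open /var/log/apache2/access.log and "2001:ee0:4f00::/42" and taking the first ten entries gives the equivalent of the list above.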
Ban all the networks managed by an ASN
I’m going to use ipset with two sets, banlist and banlist6. I use these two for ban-cidr, too.
```
# Use hash:net because of the CIDR stuff
ipset create banlist hash:net
iptables -I INPUT -m set --match-set banlist src -j DROP
iptables -I FORWARD -m set --match-set banlist src -j DROP
ipset create banlist6 hash:net family inet6
ip6tables -I INPUT -m set --match-set banlist6 src -j DROP
ip6tables -I FORWARD -m set --match-set banlist6 src -j DROP
```

To ban all the IP ranges an ASN manages, I created the following little fish function using ip.guide:
```
function asn-ban
    for asn in $argv
        for cidr in (curl -sL "https://ip.guide/as$asn" | jq --raw-output '.routes.v4[]')
            echo ipset add banlist $cidr
        end
        for cidr in (curl -sL "https://ip.guide/as$asn" | jq --raw-output '.routes.v6[]')
            echo ipset add banlist6 $cidr
        end
    end
end
```

Let’s try it with the ASN 45899!
```
asn-ban 45899 | sh
netfilter-persistent save
```

For more about netfilter-persistent save, see the comments on “2025-01-23 The bots are at it again”.
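What the fish function does with the ip.guide response can also be written in Python. This sketch assumes, as the jq filters do, that the JSON document has a routes object containing v4 and v6 lists; `ipset_commands` is a hypothetical helper that only turns a JSON string into ipset commands and leaves fetching the document (and running the commands) to the caller. Reading both lists from a single document also saves the second request the fish function makes per ASN.

```python
import json


def ipset_commands(payload, v4_set="banlist", v6_set="banlist6"):
    """Turn an ip.guide-style AS document into 'ipset add' commands."""
    routes = json.loads(payload).get("routes", {})
    commands = [f"ipset add {v4_set} {cidr}" for cidr in routes.get("v4", [])]
    commands += [f"ipset add {v6_set} {cidr}" for cidr in routes.get("v6", [])]
    return commands
```

Printed one per line, the result can be piped to sh just like the output of asn-ban.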
When I ran the asn-ban command above, I noticed that I got a single “it’s already added” response. Therefore, before adding the same numbers to my shell script, I checked which of them it already contained:
```
# -F: match the CIDR literally; -w: don't match it inside a longer CIDR
for cidr in (asn-ban 45899 | awk '{print $4}')
    if grep -qwF -- $cidr bin/admin/ban-cidr
        echo $cidr
    end
end
```

That told me I had to remove 14.187.96.0/20 from my script. Once this is done:
```
echo (echo "#"; date --iso) >> bin/admin/ban-cidr
asn-ban 45899 >> bin/admin/ban-cidr
```

I really need to figure out how to manage this smartly. And I need to figure out a way to unban the whole list!
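String matching only finds CIDRs that are literally repeated. A hypothetical Python sketch that reads the ipset add lines of a script like ban-cidr and reports every new CIDR that equals, contains or sits inside an existing one; the function names are made up for illustration:

```python
import ipaddress


def existing_networks(script_lines):
    """Collect the CIDRs from lines such as 'ipset add banlist 14.187.96.0/20'."""
    networks = []
    for line in script_lines:
        fields = line.split()
        if len(fields) == 4 and fields[:2] == ["ipset", "add"]:
            networks.append(ipaddress.ip_network(fields[3], strict=False))
    return networks


def overlapping(new_cidrs, existing):
    """Return the new CIDRs that equal, contain or sit inside an existing one."""
    hits = []
    for cidr in new_cidrs:
        net = ipaddress.ip_network(cidr, strict=False)
        if any(net.version == old.version and net.overlaps(old) for old in existing):
            hits.append(cidr)
    return hits
```

Comment lines like the date stamps written above are skipped because they don't have the four fields of an ipset add command.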
Integration with fail2ban
Let’s start with fail2ban. I need a jail! Every jail needs a filter!
In /etc/fail2ban/jail.d/alex.conf (this is where I maintain all my jails) I added:
```
[butlerian-jihad]
enabled = true
bantime = 1d
```

Note that this jail doesn’t define log paths. I hope that works as intended.
I created a matching filter with no definition in /etc/fail2ban/filter.d/butlerian-jihad.conf:
```
# Author: Alex Schroeder <[email protected]>
[Definition]
```

Reload it all, and check:
```
fail2ban-client reload
OK
fail2ban-client status
Status
|- Number of jail: 6
`- Jail list: alex-apache, alex-bots, butlerian-jihad, ngircd, recidive, sshd
```

Nice! So now I have a new jail.
Undo the banlist
```
asn-ban 45899 | sed 's/ipset add/ipset del/' | sh
```

I also manually edited my ban-cidr file to remove the lines I added above. Let’s have fail2ban handle this!
Switch from ipset to fail2ban-client
```
function asn-ban
    for asn in $argv
        set --local cidr (curl -sL "https://ip.guide/as$asn" | jq --raw-output '.routes.v4[],.routes.v6[]')
        echo fail2ban-client set butlerian-jihad banip $cidr
    end
end
```

Examine it:
```
asn-ban 45899 | less
```

Run it:
```
asn-ban 45899 | sh
3640
```

If you messed up, clear the jail:
```
fail2ban-client reload --unban butlerian-jihad
```

Check the jail:
```
fail2ban-client get butlerian-jihad banned
```

Count the entries in the jail:
```
fail2ban-client get butlerian-jihad banned | sed 's/\'/"/g' | jq length
3640
```

What do we have?
With asncounter we have a tool to quickly discover if an ASN is providing services to a bot.
With asn-ban we have a tool to quickly add all the IP networks the ASN is managing to a jail for fail2ban.
The jail which we called butlerian-jihad bans the IP networks for a day.
What’s left to do?
I should check whether this actually works! Let’s see whether the ban gets lifted after 24h. That’s the main point of this exercise!
asn-ban uses the ip.guide site for the data. This should be rewritten such that it uses the same data as asncounter. I guess that would be pyasn. See below!
I need a cron job that runs every 10 minutes, takes the last ten minutes worth of Apache access log files, ignores the fedi subdomain, identifies all the ASNs, ignores my own ASNs and bans the rest.
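The first step of such a job, selecting recent log lines while skipping some of them, can be sketched in Python. This is a hypothetical stand-in, not the author's script: it assumes Apache timestamps like [16/Jun/2025:12:00:00 +0200], and it borrows the pattern convention of the 2h-access-log script used later in this post as I read it from its use (a plain pattern must match, a pattern prefixed with ! must not).

```python
import re
from datetime import datetime, timedelta, timezone

# Apache's default time format, e.g. [16/Jun/2025:12:00:00 +0200]
STAMP = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]")


def wanted(line, specs):
    """True if line matches every plain pattern and no !-prefixed pattern."""
    for spec in specs:
        negate = spec.startswith("!")
        found = re.search(spec[1:] if negate else spec, line) is not None
        if found == negate:
            return False
    return True


def recent_lines(lines, now=None, window=timedelta(minutes=10), specs=("!^social",)):
    """Yield the lines logged within `window` before `now` (an aware datetime)."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - window
    for line in lines:
        m = STAMP.search(line)
        if not m or not wanted(line, specs):
            continue
        when = datetime.strptime(m.group(1), "%d/%b/%Y:%H:%M:%S %z")
        if when >= cutoff:
            yield line
```

Comparing timezone-aware datetimes means log lines written with a different UTC offset are still placed correctly in the window.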
Some bans
Wow, some of the autonomous systems are big. These are the ones I banned yesterday and today:
```
# AMAZON-02, US (18772!)
asn-ban 16509 | sh
# VNPT-AS-VN VNPT Corp, VN (3640!)
asn-ban 45899 | sh
# TENCENT-NET-AP Shenzhen Tencent Computer Systems Company Limited, CN (2278!)
asn-ban 45090 | sh
# ALIBABA-CN-NET Alibaba US Technology Co., Ltd., CN (852!)
asn-ban 45102 | sh
# FACEBOOK, US (541!)
asn-ban 32934 | sh
# SEMRUSH-AS, CY (5!)
asn-ban 209366 | sh
```

Using pyasn data files from the command-line
How to determine the name of an autonomous system number:
```
jq --raw-output '.["32934"]' .cache/pyasn/asnames.json
FACEBOOK, US
```

How to determine the networks for an ASN:
```
zgrep '[[:space:]]209366$' .cache/pyasn/ipasn_20250616.1200.dat.gz | awk '{print $1}'
85.208.96.0/24
85.208.97.0/24
85.208.99.0/24
185.170.167.0/24
185.191.171.0/24
```

How to determine the ASN of a CIDR:
```
zgrep '^85\.208\.96\.0/24' .cache/pyasn/ipasn_20250616.1200.dat.gz | awk '{print $2}'
209366
```

ASN networks without an external service
asn-networks is a tiny script with a bunch of lines taken from asncounter to print the IP ranges managed by one or more autonomous systems.
```
python3 asn-networks 209366
185.170.167.0/24
185.191.171.0/24
85.208.96.0/24
85.208.97.0/24
85.208.99.0/24
```

It uses the pyasn data files that a regular run of asncounter has downloaded. That is to say, asn-networks does not download or refresh these files. I’m assuming that you have run asncounter just moments earlier.
Given this script, we can now call fail2ban-client as follows (I use fish) to ban all the networks:
```
fail2ban-client set butlerian-jihad banip (asn-networks 209366)
5
```

Unbanning works the same way:
```
fail2ban-client set butlerian-jihad unbanip (asn-networks 209366)
5
```

Remember that fail2ban-client prints the number of IP numbers or ranges added or removed.
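The command-line queries above, and the core of asn-networks, come down to reading the pyasn data files. Here is a hypothetical Python sketch, not the actual asn-networks code, assuming what those commands rely on: data lines hold a prefix and an ASN separated by whitespace, lines starting with ; are comments, and asnames.json maps ASN strings to names. Comparing the whole ASN field means a longer ASN that merely ends in the same digits can never match.

```python
import gzip
import ipaddress
import json


def parse_datfile(lines):
    """Yield (prefix, asn) string pairs, skipping ';' comment lines."""
    for line in lines:
        if line.startswith(";"):
            continue
        fields = line.split()
        if len(fields) == 2:
            yield fields[0], fields[1]


def load_datfile(path):
    """Read a gzipped datfile such as ipasn_20250616.1200.dat.gz."""
    with gzip.open(path, "rt") as f:
        return list(parse_datfile(f))


def load_asnames(path):
    """Read asnames.json, which maps ASN strings to names."""
    with open(path) as f:
        return json.load(f)


def networks_for(pairs, asns):
    """All prefixes announced by any of the given ASNs, sorted by address."""
    wanted = {str(asn) for asn in asns}
    nets = [ipaddress.ip_network(p, strict=False) for p, asn in pairs if asn in wanted]
    return [str(n) for n in sorted(nets, key=lambda n: (n.version, n))]


def asn_for(pairs, cidr):
    """The ASN recorded for exactly this prefix, or None."""
    return next((asn for prefix, asn in pairs if prefix == cidr), None)
```

Keeping the parsing separate from the file handling makes it easy to feed the functions either the real datfile via `load_datfile` or a few sample lines.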
Identifying suspicious ASN
What is suspicious activity? How about this: In a 2h window, no ASN should send more than 1000 requests? So we need a script that filters the log files and prints a 2h window, skipping the lines we want to ignore: 2h-access-log. Then pass the IP numbers to asncounter, throw away all the things we don’t care about and just print the appropriate lines:
```
bin/2h-access-log !^social \
  | awk '{print $2}' \
  | bin/asncounter --no-prefixes 2>/dev/null \
  | awk '/^[0-9]/ && $1>1000 { print }'
3062 31.93 24940 HETZNER-AS, DE
1642 17.12 16276 OVH, FR
```

So do I dare ban those numbers?? I’m not sure! I should figure out a way to find those 3062 requests made by services hosted on Hetzner.
asn-access-log does just that. You pass it an ASN, it determines all the networks it manages and then it filters standard input, assuming that it consists of Apache access log lines (what counts is that the second field is an IP number).
```
bin/2h-access-log !^social | bin/asn-access-log 24940
```

I see a lot of RSS services (NewsBlur, fiperbot, MyNewspaper Agent, FreshRSS), git, some bot (from the 159.69.0.0/16 range, for example), and on and on. Ugh. It’s not easy to know what to do!
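With thousands of networks per ASN (AMAZON-02 alone came to 18772), testing every log line against every network gets slow. Here is a hypothetical sketch of the filtering step asn-access-log performs, not its actual code: merge the networks into sorted, disjoint address ranges per IP version, then find the candidate range for each address with a binary search.

```python
import bisect
import ipaddress


class NetworkSet:
    """Fast membership test for many CIDRs using sorted, disjoint ranges."""

    def __init__(self, cidrs):
        by_version = {4: [], 6: []}
        for cidr in cidrs:
            net = ipaddress.ip_network(cidr, strict=False)
            by_version[net.version].append(net)
        self.starts, self.ends = {}, {}
        for version, nets in by_version.items():
            # collapse_addresses drops nested networks and merges siblings;
            # CIDR blocks never overlap partially, so the result is disjoint.
            merged = sorted(ipaddress.collapse_addresses(nets))
            self.starts[version] = [int(n.network_address) for n in merged]
            self.ends[version] = [int(n.broadcast_address) for n in merged]

    def __contains__(self, ip):
        addr = ipaddress.ip_address(ip)
        value = int(addr)
        # Index of the last range that starts at or below the address.
        i = bisect.bisect_right(self.starts[addr.version], value) - 1
        return i >= 0 and value <= self.ends[addr.version][i]


def lines_from(lines, networks):
    """Yield access log lines whose second field is an IP in networks."""
    for line in lines:
        fields = line.split()
        try:
            if len(fields) > 1 and fields[1] in networks:
                yield line
        except ValueError:
            continue  # second field is not an IP number
```

Building the set costs one sort per IP version; after that, each log line needs a single binary search instead of thousands of comparisons.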
I think the best answer would be to lower the stakes: ban for shorter amounts of time and let fail2ban handle the rest. The only thing I need to consider is whether I find the current amount of resources spent OK. Do I? Let’s look at the latest numbers.
This shows that about 60% of the fedi traffic comes from Hetzner and OVH, which makes it hard for me to block these autonomous systems.
```
bin/2h-access-log ^social !178.209.50.237 \
  | awk '{print $2}' \
  | bin/asncounter --no-prefixes
INFO: using datfile ipasn_20250616.1200.dat.gz
INFO: collecting addresses from <stdin>
INFO: loading datfile /root/.cache/pyasn/ipasn_20250616.1200.dat.gz...
INFO: finished reading data
INFO: loading /root/.cache/pyasn/asnames.json
count percent ASN AS
2148 45.36 24940 HETZNER-AS, DE
738 15.59 16276 OVH, FR
273 5.77 14061 DIGITALOCEAN-ASN, US
202 4.27 14361 HOPONE-GLOBAL, US
195 4.12 15796 SALT-, CH
105 2.22 214640 HOSTUP HOSTUP, SE
102 2.15 63949 AKAMAI-LINODE-AP Akamai Connected Cloud, SG
62 1.31 47692 NESSUS, AT
59 1.25 197540 NETCUP-AS netcup GmbH, DE
50 1.06 44684 MYTHIC Mythic Beasts Ltd, GB
total: 4735
```

What’s the situation without fedi traffic, keeping in mind that I will most likely not be able to block fedi hosters?
```
bin/2h-access-log !^social !178.209.50.237 \
  | awk '{print $2}' \
  | bin/asncounter --no-prefixes
INFO: using datfile ipasn_20250616.1200.dat.gz
INFO: collecting addresses from <stdin>
INFO: loading datfile /root/.cache/pyasn/ipasn_20250616.1200.dat.gz...
INFO: finished reading data
INFO: loading /root/.cache/pyasn/asnames.json
count percent ASN AS
249 5.47 7922 COMCAST-7922, US
189 4.16 9808 CHINAMOBILE-CN China Mobile Communications Group Co., Ltd., CN
129 2.84 7018 ATT-INTERNET4, US
122 2.68 396982 GOOGLE-CLOUD-PLATFORM, US
118 2.59 24940 HETZNER-AS, DE
96 2.11 55836 RELIANCEJIO-IN Reliance Jio Infocomm Limited, IN
96 2.11 56046 CMNET-JIANGSU-AP China Mobile communications corporation, CN
75 1.65 140061 CHINANET-QINGHAI-AS-AP Qinghai Telecom, CN
73 1.61 4837 CHINA169-BACKBONE CHINA UNICOM China169 Backbone, CN
70 1.54 701 UUNET, US
total: 4548
```

The autonomous systems that show up in the second list but not in the first list are my prime candidates, like COMCAST and CHINAMOBILE-CN.
So how about going after the autonomous systems on the second list that produce more than 1000 hits in a 2h period?

Something like this? I’m going to put this into /etc/cron.hourly/butlerian-jihad:
```
#!/bin/sh
bin/2h-access-log !^social !178.209.50.237 \
  | awk '{print $2}' \
  | bin/asncounter --no-prefixes 2>/dev/null \
  | awk '/^[0-9]/ && $1>1000 { print $3 }' \
  | xargs bin/asn-networks \
  | ifne xargs echo fail2ban-client set butlerian-jihad banip
```

I use ifne to prevent the execution of the last command if there is no output. Thanks, @acdw!
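The `$1>1000` filter keeps only ASNs with more than 1000 requests. A hypothetical Python sketch of that selection also shows where the earlier plan of ignoring my own ASNs could fit: NINE (29691) and INIT7 (13030) come from the first asncounter report, whereas the script itself only skips the server's IP number. The input is assumed to be one ASN per request, with None for addresses that have no route.

```python
from collections import Counter

# Hypothetical allow list: NINE (hosting) and INIT7 (home ISP).
NEVER_BAN = {"29691", "13030"}


def over_threshold(asns, threshold=1000, ignore=NEVER_BAN):
    """Return the ASNs seen more than `threshold` times, busiest first.

    `asns` holds one entry per request; None marks unrouted addresses.
    """
    counts = Counter(a for a in asns if a is not None and a not in ignore)
    return [asn for asn, n in counts.most_common() if n > threshold]
```

Ignoring whole ASNs is coarser than ignoring one IP number, but it also protects requests from home and from other machines at the same hoster.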
Summary
/etc/cron.hourly/butlerian-jihad runs every hour and checks whether any autonomous systems have been abusive in the last two hours. If so, they are banned.
```
#!/bin/sh
# cron runs this from /, so switch to the directory that contains bin/
cd "$HOME" || exit 1
bin/2h-access-log !^social !178.209.50.237 \
  | awk '{print $2}' \
  | bin/asncounter --no-prefixes 2>/dev/null \
  | awk '/^[0-9]/ && $1>1000 { print $3 }' \
  | xargs bin/asn-networks \
  | ifne xargs fail2ban-client set butlerian-jihad banip
```

2h-access-log prints the last two hours’ worth of log lines from /var/log/apache2/access.log (and access.log.1 if necessary).
The !^social argument ensures that connecting to my fedi server doesn’t trigger the ban hammer.
The !178.209.50.237 argument ensures that I don’t ban the server itself as it monitors stuff and as I test things on the server. I might have to add my home IP numbers. We’ll see!
asncounter finds the autonomous system numbers for all the IP numbers in the web server log file and prints a report.
asn-networks then takes the selected autonomous system numbers and prints the IP ranges they manage.
These are then banned by fail2ban-client using the butlerian-jihad jail.
The butlerian-jihad jail is enabled via a config file in /etc/fail2ban/jail.d/. In my case, the file is called alex.conf and for this jail, it says:
```
[butlerian-jihad]
enabled = true
bantime = 1h
```

The jail also needs a filter definition, even though no filtering happens, as no logfile is checked. My /etc/fail2ban/filter.d/butlerian-jihad.conf contains just this:
```
# Author: Alex Schroeder <[email protected]>
[Definition]
```

What this means is that every hour, an autonomous system can get banned. If it is banned, it is banned for 1h. If the ban was triggered by activity in the last hour leading up to it, the next run will find the same log entries and ban it “again”. This results in no changes in the jail, since all the networks are already in the butlerian-jihad jail.
The bans themselves are reported in /var/log/fail2ban.log.
I’ve also enabled the recidive jail. That is, in the same file where I defined my butlerian-jihad jail, I have:
```
[recidive]
enabled = true
```

The defaults are in /etc/fail2ban/jail.conf:
```
[recidive]
logpath = /var/log/fail2ban.log
banaction = %(banaction_allports)s
bantime = 1w
findtime = 1d
```

So if some network is banned five times within a day, it is banned for a week. I say five times because maxretry is set to 5 in /etc/fail2ban/jail.conf, and fail2ban acts as soon as the count reaches maxretry.

Let’s assume a scraper is started from some network managed by an autonomous system. It starts using IP numbers from all its ranges. It sends 600 requests per hour, more than a human could read and more than a feed reader should need, etc. While the ASN is banned, its requests are dropped by the firewall and never reach the Apache log.

- after the first hour, nothing happens, as 600 is less than the 1000 needed to trigger the system
- after the second hour, the ASN is banned because the sum total for the last two hours is 1200
- after the third hour, the ASN is unbanned and not banned again, because the last two hours only contain the 600 requests from the second hour
- after the fourth hour, the ASN is still not banned, because the last two hours contain 0 requests from the third hour and 600 from the fourth
- after the fifth hour, the ASN is banned for the second time (1200 requests)
- after the eighth hour, the ASN is banned for the third time, following the same three-hour cycle
- after the eleventh hour, the ASN is banned for the fourth time
- after the fourteenth hour, the ASN is banned for the fifth time, the recidive filter kicks in and the networks belonging to the ASN are banned for a week

This escalation takes fourteen hours. The ASN was already banned for four of them (the third, sixth, ninth and twelfth hour).

Assuming the scraper keeps going, the pattern repeats every 182h (14h of escalation plus a week of 168h), that is roughly every 7½ days, and the abusive ASN still gets service on 10h out of 182h, or about 5% of the time. For my taste, that is still way too nice.
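Timelines like this are easy to get wrong by hand, so here is a small hour-by-hour model in Python. It is a sketch under explicit assumptions, not a description of fail2ban internals: the cron job runs once at the end of every hour, a short ban covers exactly the following hour, requests sent while banned are dropped by the firewall and never logged, re-banning during a ban changes nothing, and recidive fires when the number of bans within 24 hours reaches maxretry. The function name `simulate` and its parameters are made up for illustration.

```python
def simulate(rate=600, threshold=1000, maxretry=5, week=168, limit=10_000):
    """Model one escalation cycle, hour by hour.

    Returns (hours until the week-long ban, hours served, hours banned),
    where the last two cover the whole cycle including the week, or None
    if the traffic never escalates to the week-long ban.
    """
    logged = {0: 0}       # hour -> requests that reached the Apache log
    short_bans = set()    # hours covered by a one-hour ban
    bans = []             # hours at whose end a ban was issued
    for hour in range(1, limit + 1):
        banned = hour in short_bans
        logged[hour] = 0 if banned else rate  # dropped requests are not logged
        # The hourly cron job sums up the last two hours of the log.
        if not banned and logged[hour] + logged[hour - 1] > threshold:
            bans.append(hour)
            short_bans.add(hour + 1)
            # recidive: count the bans issued within the last 24 hours
            if sum(1 for b in bans if hour - b < 24) >= maxretry:
                served = hour - sum(1 for h in short_bans if h <= hour)
                return hour, served, hour + week - served
    return None
```

Varying rate, threshold and maxretry shows how the escalation time and the share of hours served depend on each setting; a rate at or below half the threshold never triggers a ban at all, in which case the function returns None.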
Let’s see how this goes for a while.
I’m already looking forward to dropping my banlist and banlist6 sets I created for ban-cidr.
