Botnet attack against https://fossil-scm.org
(1) By Richard Hipp (drh) on 2022-08-28 10:45:42 [link] [source]
Over the past few hours, there have been over 25,000 HTTP requests for https://fossil-scm.org/home/vinfo/9d3560b885c2a091. It takes Fossil over 1 second to render that page, so this is putting a serious load on the (wimpy) server that hosts this forum.
Over 150 diverse IP addresses are involved in this attack. Over 22,000 distinct User-Agent strings are used, all intended to look like an ordinary browser. So, we know that the robot is lying about who it is. Often two or three requests for the same page will occur from the same IP address within a second or two. Each will have a different User-Agent string.
Fossil makes no attempt to cache the page in question. I'm not sure why. Maybe that is a bug. I'll have to look into it. But because the "no-cache" header is included in the reply, using something like CloudFlare will not help to mitigate this attack, I don't believe.
So because of the large number of IP addresses involved, I cannot easily null-route the attacker. And I cannot filter based on User-Agent. And I cannot reduce the CPU load by caching (at least not until I figure out why the page is marked as "no-cache").
Any other suggestions on how I might mitigate this attack?
(2) By anonymous on 2022-08-28 11:20:44 in reply to 1 [link] [source]
Could frequently requested pages be converted into a lower response time (e.g. by caching them once they are requested often)? Could that help?
(3) By bohwaz on 2022-08-28 12:09:02 in reply to 1 [link] [source]
Maybe just return a 403 (or other error message) unless the client is logged in as anonymous?
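A minimal sketch of that idea as a CGI front end (a hedged illustration, not Fossil's own mechanism), assuming the site is served through CGI and that Fossil's login cookie name begins with "fossil-"; worth verifying against a real session before relying on it:

#!/bin/sh
# Refuse the expensive page unless the request carries some Fossil login
# cookie (anonymous logins set one too). The "fossil-" cookie-name prefix
# is an assumption; check what your own browser actually sends.
case "$PATH_INFO" in
  */vinfo/9d3560b885c2a091)
    case "$HTTP_COOKIE" in
      *fossil-*) ;;   # a login cookie is present: let the request through
      *) printf "Status: 403 Forbidden\r\n\r\nPlease log in first.\n"
         exit 0 ;;
    esac ;;
esac
# Everything else (and logged-in requests) goes to the real fossil CGI,
# renamed out of the way.
exec "$0.real.exec"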
(10) By sean (jungleboogie) on 2022-08-28 23:58:11 in reply to 3 [link] [source]
This seems like the simplest option to implement and enable. It could even be a CLI setting, so if there are problems logging into the web interface, it could be done over ssh.
Additionally, if the CPU usage exceeds X over Y period of time, automatically enable this setting; see the sketch below.
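One way to approximate the automatic trigger without changing Fossil itself: a cron job that watches the load average and raises a flag file which a CGI wrapper (or an admin over ssh) keys off. A rough sketch; the flag path and threshold are made up:

#!/bin/sh
# Run from cron every minute: raise a flag while the 5-minute load
# average is above MAX_LOAD, clear it when the load drops again.
MAX_LOAD=4.0
FLAG=/var/run/fossil-defend     # hypothetical flag file a wrapper checks

# The second field of /proc/loadavg is the 5-minute load average (Linux).
load=$(cut -d' ' -f2 /proc/loadavg)

if awk -v l="$load" -v m="$MAX_LOAD" 'BEGIN { exit !(l > m) }'; then
  touch "$FLAG"    # defensive mode on: start returning 403s or static copies
else
  rm -f "$FLAG"
fi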
(4.1) By John Rouillard (rouilj) on 2022-08-28 16:34:59 edited from 4.0 in reply to 1 [link] [source]
Is fossil running under althttpd, or is it in server mode?
If under althttpd, can you redirect /home/vinfo/9d3560b885c2a091 to a static file?
Since my reading of the althttpd documentation says that althttpd just takes a root directory and has no explicit path filtering/handling, maybe replacing the fossil CGI file with a shell script that looks at PATH_INFO and returns static file content (or a 404/403/429 error) for that URL path would help?
Replacing the fossil cgi with something like this (lightly tested):
#!/bin/sh
# Set these to prevent mischief.
PATH=/bin:/usr/bin
unset IFS
unset LD_LIBRARY_PATH

## CONFIGURE
# Set to the length of the static file for vinfo/9d3560b885c2a091.
# (Deliberately NOT named CONTENT_LENGTH: that CGI variable carries the
# request body length for POSTs and must pass through to fossil untouched.)
STATIC_LENGTH=2976691
# Get the nonce from the saved HTML file.
STATIC_NONCE=nonce-9291fc6a32bc6f5161859d0d9c099635c0ad094c1ef0e6a6

case "$PATH_INFO" in
  */vinfo/9d3560b885c2a091)
    ## Output headers, then a blank line.
    # Disabled, but change to a long cache time if needed:
    # printf "Cache-control: no-cache\r\n"
    # Or just make it all go away:
    # printf "Status: 404 Not Found\r\n\r\nNot Found\n"; exit 0
    # NOTE: set the nonce to match the saved page's script nonce,
    # otherwise the page won't work right.
    printf "Content-Security-Policy: default-src 'self' data:; script-src 'self' '%s'; style-src 'self' 'unsafe-inline'; img-src * data:\r\n" "$STATIC_NONCE"
    printf "X-Frame-Options: SAMEORIGIN\r\n"
    printf "Content-Type: text/html; charset=utf-8\r\n"
    printf "Content-Length: %s\r\n" "$STATIC_LENGTH"
    # End of headers: blank line.
    printf "\r\n"
    # Tarpit if you want with:
    # sleep 10
    # Output the contents - cat is 100X faster than using a shell built-in
    # "while read" loop.
    exec cat /tmp/9d3560b885c2a091
    # Or use shell built-ins only:
    # read strips the newline, so we (slowly) add it back in printf.
    #while IFS= read -r line ; do
    #  printf "%s\n" "$line"
    #done < /tmp/9d3560b885c2a091
    exit 0 ;;
  *) exec "$0.real.exec" ;;
esac

# Not reached - generate an error if we get here.
exit 1
Basically, if the path doesn't match the problem path, exec the real fossil CGI, renamed by appending .real.exec (so fossil.cgi becomes fossil.cgi.real.exec, matching the $0.real.exec in the script). If it does match the problem path, send the file captured using curl -o /tmp/9d3560b885c2a091 .../vinfo/... .
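For example, the static copy and the two CONFIGURE values could be captured roughly like this (the grep pattern is a guess at how the nonce appears in the saved HTML; adjust it to whatever the file actually contains):

# Save a static copy of the hot page.
curl -o /tmp/9d3560b885c2a091 \
  'https://fossil-scm.org/home/vinfo/9d3560b885c2a091'

# Its size, for the Content-Length header in the wrapper.
wc -c < /tmp/9d3560b885c2a091

# The script nonce embedded in the saved HTML; prefix the hex value with
# "nonce-" when setting STATIC_NONCE in the wrapper.
grep -Eo 'nonce=.[0-9a-f]+' /tmp/9d3560b885c2a091 | head -n 1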
IIRC you run chrooted. So I gave an example using shell built-ins for everything. Be forewarned it's slow.
Hopefully you have a simple shell (ash, dash) available in the chroot, since you can run CGI scripts under Fossil.
The shell wrapper will slow down all fossil requests, but AFAICT this would be the easiest way to handle the issue.
You could also sleep 10 before returning the file contents to try to tarpit the requests a little. This will increase the number of processes in the process table (and the load average), which can become an issue.
(5.1) By KIT.james (kjames3411) on 2022-08-28 14:26:13 edited from 5.0 in reply to 1 [link] [source]
Are you using OpenBSD? pf should help a lot, I believe. The power/cost (time and money) ratio is tremendous.
But I really believe dangerous pages should be accessible only to logged-in users.
If you have time, you could also implement a simple system that bans (even temporarily) clients spamming the server, and ideally one that also checks that a single account does not use too many IPs too quickly.
People using the site heavily should really work locally, so I believe it's okay to send an error page to users in the grey zone.
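A sketch of the pf approach, assuming pf is available and that offending clients can be picked out of the web log (the table name, log path, field position, and threshold are all made up for illustration):

#!/bin/sh
# Add the noisiest client addresses from the access log to a pf table.
# pf.conf is assumed to already contain:
#   table <badbots> persist
#   block in quick from <badbots>
LOG=/var/log/althttpd.log     # hypothetical log location
THRESHOLD=500                 # requests before an address gets banned

# Assumes the client address is the first field of each log line.
awk '{ print $1 }' "$LOG" | sort | uniq -c | sort -rn |
  awk -v t="$THRESHOLD" '$1 > t { print $2 }' |
  while read -r ip; do
    pfctl -t badbots -T add "$ip"
  done

pf's max-src-conn-rate/overload rules can do something similar in-kernel, but they only count new TCP connections, so deciding from the HTTP log is probably more useful against traffic shaped like ordinary browsing.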
(6) By Roy Keene (rkeene) on 2022-08-28 16:07:14 in reply to 1 [link] [source]
One option might be, under high-load conditions, to require a cryptographically unforgeable HTTP cookie that is valid for 30 minutes, and to get that cookie you must solve a human-verification test. For Fossil-based connections this could be omitted, or the cookie could be required to be passed in the URL or something.
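A minimal sketch of the unforgeable-cookie part, using openssl for the HMAC; the secret's location and the cookie format are made up for illustration:

#!/bin/sh
# Issue and verify a time-limited, HMAC-signed cookie value of the form
# <expiry-unix-time>.<hmac-sha256(expiry, secret)>.
SECRET_FILE=/etc/fossil-cookie.key    # hypothetical random server-side secret

sign() {
  # $1 = expiry timestamp
  printf '%s' "$1" | openssl dgst -sha256 -hmac "$(cat "$SECRET_FILE")" -r |
    cut -d' ' -f1
}

issue_cookie() {
  expires=$(( $(date +%s) + 1800 ))   # valid for 30 minutes
  printf '%s.%s\n' "$expires" "$(sign "$expires")"
}

verify_cookie() {
  # $1 = cookie value in the <expiry>.<mac> format above
  expires=${1%%.*}
  mac=${1#*.}
  [ "$(date +%s)" -lt "$expires" ] && [ "$mac" = "$(sign "$expires")" ]
}

The server would hand out issue_cookie's value after the visitor passes the human-verification step and check it with verify_cookie on later requests; without the secret a client cannot mint a valid value, only replay one for its 30-minute lifetime.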
Cloudflare has a mechanism to do something similar: https://developers.cloudflare.com/fundamentals/get-started/concepts/cloudflare-challenges/
(7.1) By Kirill M (Kirill) on 2022-08-28 16:09:29 edited from 7.0 in reply to 1 [link] [source]
> [...] But because the "no-cache" header is included in the reply, using something like CloudFlare will not help to mitigate this attack, I don't believe.
CloudFlare should be good at detecting that the user agent is an evil bot, and so either block the request or serve it a captcha or whatnot. But CloudFlare would want to have authoritative DNS for the domain name. Maybe they don't for some enterpri$se plan, but the free service requires delegating the domain name to them.
I'd probably consider using a DNS blacklist of some kind, maybe XBL? That would work by adding a check to see if the remote IP address is listed in the DNS blacklist before going on with processing the request. I'm mostly familiar with DNSBLs for mail, but XBL seems to be generic enough. Also, in the FAQ Spamhaus mentions that "[t]he SBL and XBL can be queried to prevent things such as blog-comment and guestbook spamming, click-fraud, and automated email address harvesting" and also shows how to do the queries.
For this to work, the majority of the offending IPs should be listed...
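A sketch of what the check could look like at the top of a CGI wrapper, assuming the XBL is queried the way mail servers query it (reverse the IPv4 octets, look the name up under xbl.spamhaus.org, and treat any 127.0.0.x answer as "listed"):

#!/bin/sh
# Return success (listed) or failure (not listed) for an IPv4 address.
# Note: Spamhaus may refuse queries relayed through big public resolvers,
# so use a local resolver; also cache the verdict rather than doing a DNS
# lookup on every request.
ip_is_listed() {
  # Reverse the octets: 192.0.2.10 -> 10.2.0.192
  rev=$(printf '%s\n' "$1" | awk -F. '{ print $4"."$3"."$2"."$1 }')
  dig +short "$rev.xbl.spamhaus.org" A | grep -q '^127\.'
}

if ip_is_listed "$REMOTE_ADDR"; then
  printf "Status: 403 Forbidden\r\n\r\nCome back later.\n"
  exit 0
fi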
(11) By John Rouillard (rouilj) on 2022-08-29 02:06:23 in reply to 7.1 [link] [source]
I have been burned by people using email blacklists to stop web traffic. Many ISPs (including mine) intentionally put entire client IP blocks into the list.
So my IP would show up in the email blacklists by design since I am not supposed to be sending emails from my dynamic address space.
I discovered this when I was demoing something to a friend at school and got blocked by the IT system web proxy using email blacklists.
For SMTP the lists are very useful, but I do not recommend using these lists for restricting HTTP/S access.
(14) By Kirill M (Kirill) on 2022-08-29 08:59:39 in reply to 11 [link] [source]
> I have been burned by people using email blacklists to stop web traffic. Many ISPs (including mine) intentionally put entire client IP blocks into the list.
There are multiple email blacklists. ISPs will put their entire client ranges in lists such as the PBL (Policy Block List); those should certainly not be used outside of email delivery. The XBL, which I was suggesting, is a "realtime database of IP addresses of hijacked PCs infected by illegal 3rd party exploits, including open proxies (HTTP, socks, AnalogX, wingate, etc), worms/viruses with built-in spam engines, and other types of trojan-horse exploit". Botnet members will typically be infected with something.
Also, there are even some email blocklists which shouldn't be used for blocking email (such as SORBS and UCEPROTECT).
(8) By woji (adam.wojkowski) on 2022-08-28 19:04:54 in reply to 1 [link] [source]
Hello Richard,
I just asked our sysadmin and he suggests asking your ISP to reroute the malign traffic as part of DDoS protection (we use this). Or, a dumber solution since only one URI is being attacked: it might work to put a temporary proxy in front (e.g. dockerized Traefik) where you can limit IP ranges, user agents, whatever... I use Traefik in front of Fossil too, with IP range filtering.
b.r.
AW
(9) By stevel on 2022-08-28 23:04:12 in reply to 1 [source]
This is not directly related but might help going forward.
The Tclers wiki sits behind Cloudflare, which is configured to cache all pages indefinitely. The wiki uses the Cloudflare API to clear the cache whenever a page changes. Usually this means multiple page caches are cleared (e.g. Recent Changes and any backlinks). The net effect is that the wiki is fully cached and protected. We actually encourage crawling because it fills the cache.
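For reference, the cache-clearing step is roughly one call to Cloudflare's purge_cache endpoint; the zone ID, token, and URL below are placeholders:

# Purge the cached copies of the pages affected by an edit.
curl -X POST \
  "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/purge_cache" \
  -H "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
  -H "Content-Type: application/json" \
  --data '{"files":["https://example.org/page-that-changed"]}'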
Pages that can’t be cached (like the differences between two versions in the history) have heightened Cloudflare security checks enabled via page rules, which is why you sometimes see “checking browser” messages.
Cloudflare have generously donated an enterprise license to the Tcl community as part of their open source initiative. I would be surprised if they did not respond positively to an approach from the Fossil or SQLite communities.
I can’t see any other way of protecting the wiki from botnets given the resources we have at our disposal.
--Steve
(12.4) By John Rouillard (rouilj) on 2022-08-29 02:33:32 edited from 12.3 in reply to 1 [link] [source]
Is this page using enough CPU to be considered an expensive page? (Edit: a quick glance through the code shows calls to cache_write only in tar.c and zip.c (which also handles sqlar archives), so you might not even write to or consult the cache for /vinfo....)
If so, shouldn't it be in the page cache? (As you note, Fossil is stopping the client and any proxies from caching the page. I also assume from your comment that it's not using the fossil cache either.)
In reading about the fossil cache, I am not sure what the criterion for caching is. Is it pure generation time? Frequency of use (cumulative generation time over some window)? Maybe a GCRA with a threshold of 30 requests in a 5-minute period could trigger caching that URL; see the sketch after this post.
Is there a manual way to force the URL to be served from the page cache (some SQL on the cache database perhaps)?
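A sketch of that frequency trigger, kept deliberately simple (a flat file of timestamps per URL rather than a real GCRA; every name and path here is made up, and this is not how Fossil's cache currently decides anything):

#!/bin/sh
# Record one request for a URL and report whether it crossed the
# "30 requests in the last 5 minutes" threshold.
THRESHOLD=30
WINDOW=300                     # seconds
STATE_DIR=/tmp/req-rate        # hypothetical state directory

url_is_hot() {
  key=$(printf '%s' "$1" | tr -c 'A-Za-z0-9' '_')
  f="$STATE_DIR/$key"
  now=$(date +%s)
  mkdir -p "$STATE_DIR"
  # Keep only the timestamps still inside the window, then add this one.
  # (Not concurrency-safe; good enough as a sketch.)
  awk -v now="$now" -v win="$WINDOW" 'now - $1 <= win' "$f" 2>/dev/null > "$f.new"
  printf '%s\n' "$now" >> "$f.new"
  mv "$f.new" "$f"
  [ "$(wc -l < "$f")" -ge "$THRESHOLD" ]
}

# Example: decide whether to start serving a cached/static copy of this path.
if url_is_hot "$PATH_INFO"; then
  : # serve the cached copy here
fi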
(13.1) By Stephan Beal (stephan) on 2022-08-29 10:49:22 edited from 13.0 in reply to 1 [link] [source]
> Fossil makes no attempt to cache the page in question. I'm not sure why. Maybe that is a bug. I'll have to look into it.
A cursory analysis suggests that the cache used by zip/tar is only useful for blobs which are independent of the site skin. It wouldn't work as-is for vdiff but should be okay for vpatch (reachable via the "patch" link in the vdiff view).
Edit: OTOH, the actual diff part of the page is independent of the skin and could hypothetically be cached.
(15) By Kevin (KevinYouren) on 2022-08-29 09:25:38 in reply to 1 [link] [source]
Have you got a simple firewall (UFW) active on the machine?
You could block the IP addresses.
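For example (the addresses are placeholders):

# Block a single offending address, block a whole range on the HTTPS port,
# then review the rules.
ufw deny from 203.0.113.45
ufw deny from 198.51.100.0/24 to any port 443
ufw status numbered

With 150+ rotating addresses this gets tedious by hand, which is why the log-driven approaches mentioned elsewhere in the thread tend to scale better.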
(16) By Richard Hipp (drh) on 2022-08-29 11:29:55 in reply to 1 [link] [source]
The webserver used by fossil-scm.org did not record the query strings on HTTP requests in its log.
All these tens of thousands of requests to a single webpage - I think the attacker was sending the same request many times but adding new query parameters to each request. The query parameters contained attempts at SQL injection. So (I'm guessing) the purpose of the robot attack was to try to gain control of machines using SQL injection.
I have now updated the web server so that it does record the query strings. If the attack happens again, this will hopefully give me a better view of what is going on.
Depending on what I find, I might enhance Fossil so that if it sees unknown query parameters that contain likely SQL injection attacks, it will return the honeypot page rather than go through the lengthy computation of the diff of a 5-year-old check-in.
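In the spirit of the CGI wrapper suggested earlier in the thread, a hedged sketch of such a screen; the pattern list is illustrative only and /honeypot stands in for wherever the honeypot page actually lives:

#!/bin/sh
# Crude screen for query strings that look like SQL injection attempts.
case "$QUERY_STRING" in
  *[Uu][Nn][Ii][Oo][Nn]*[Ss][Ee][Ll][Ee][Cc][Tt]* | *%27* | *"'"* )
    # Looks hostile: send it to the cheap honeypot page instead of a diff.
    printf "Status: 302 Found\r\nLocation: /honeypot\r\n\r\n"
    exit 0 ;;
esac
# Otherwise fall through to the real fossil CGI (renamed as before).
exec "$0.real.exec"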
(17) By anonymous on 2022-08-29 23:11:19 in reply to 16 [link] [source]
For what it's worth, if our web server is hit with a request with a query that does not match one of our known end points, we assume a bad actor and the server does not even send a reply.
(18) By Richard Hipp (drh) on 2022-08-30 00:08:38 in reply to 17 [link] [source]
> known end points
OT: I've seen this term "end point" used a lot recently. Can somebody please explain to me what it means in the context of its current trendy usage? Is "end point" a real thing or is it just a new buzzword?
The web interface for Fossil identifies lots of "methods". For example, the /timeline method returns HTML that displays a timeline. The /info method shows information about an artifact. And so forth. Are these methods the "end points" of Fossil?
(19) By stevel on 2022-08-30 00:12:39 in reply to 18 [link] [source]
I can give an example: if the URL contains .php you can assume it isn't a valid end point :)
(20) By stevel on 2022-08-30 00:18:45 in reply to 18 [link] [source]
And you might consider using fail2ban https://en.wikipedia.org/wiki/Fail2ban (although I outsource pretty much all of the mitigation to Cloudflare because I simply don't have the time or energy to do it myself).
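fail2ban normally needs a jail plus a filter regex matched against the web server's log (not shown here). Once a jail exists, bans can also be managed by hand; the jail name and address below are placeholders:

# Inspect the jail, then ban and later unban an address manually.
fail2ban-client status fossil-flood
fail2ban-client set fossil-flood banip 203.0.113.45
fail2ban-client set fossil-flood unbanip 203.0.113.45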
(21.1) By Andy Bradford (andybradford) on 2022-08-30 03:01:59 edited from 21.0 in reply to 18 [link] [source]
> Is "end point" a real thing or is it just a new buzzword?

It's just an industry term that has been gaining a lot of traction, or usage, lately (within the last few years). Some might just call it a buzzword. Basically it's just a path that is defined in a web API (and maybe even other non-URL contexts). /xfer might be considered an "end point". I suppose /timeline might also be considered an "end point".

Andy