Fossil User Forum

Latest anti-robot defenses

(1) By Florian Balmer (florian.balmer) on 2025-08-16 08:49:23 [link] [source]

I've been experimenting with the latest anti-robot defenses.

It's nice to see the simplifications, both for users (no more captcha) and for admins (fewer settings).

Do you think it's even possible to set the keyboard focus to the Ok button as soon as it's ready? This would be particularly useful for people who want to work without a mouse as much as possible!

At least in Chrome, the autofocus attribute doesn't seem to work for the (initially) disabled button, but calling focus() does the trick:

Baseline: Fossil [54afc94ce0]
Index: src/robot.c
==================================================================
--- src/robot.c
+++ src/robot.c
@@ -98,10 +98,11 @@
   @ function Nhtot1520(x){return document.getElementById(x);}
   @ function Aoxlxzajv(h){\
   @ Nhtot1520("vx").value=h;\
   @ Nhtot1520("cx").value="Ok";\
   @ Nhtot1520("cx").disabled=false;\
+  @ Nhtot1520("cx").focus();\
   @ }
   @ function Vhcnyarsm(h,a){\
   @ if(a>0){setTimeout(Vhcnyarsm,1,h+a,a-1);}else{Aoxlxzajv(h);}\
   @ }
   k = 200 + h2%99;

And this line seems to query a value from the config table that never gets added (if I'm right -- I'm aware that it's work in progress).

Also, I'm happy the RSS feed is back! Thanks!

(2) By Richard Hipp (drh) on 2025-08-16 14:10:03 in reply to 1 [link] [source]

Please test the latest code and report back whether you think it is an improvement or a regression.

(3) By Florian Balmer (florian.balmer) on 2025-08-16 14:30:24 in reply to 2 [link] [source]

I think it's an improvement!

I have no idea whether the duration matters, i.e. if nested setTimeout() calls with ultra-short delays would already outsmart the bots, or if something perceptible by humans is necessary so the bots get bored.
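For what it's worth, the nested-setTimeout countdown from the patch above can be modeled synchronously (a sketch, not Fossil's actual code; the 4 ms figure assumes the HTML-spec clamp that applies to timeouts nested more than five levels deep, so even "ultra-short" 1 ms delays add up to something humans can perceive):

```javascript
// Model of the Vhcnyarsm(h, a) chain in src/robot.c: each step
// schedules the next with a 1 ms delay, passing (h + a, a - 1),
// until a reaches 0 and the final hash is handed to the Ok button.

// The chain ends by calling the completion function with
// h + a + (a-1) + ... + 1, i.e. h plus the a-th triangular number.
function finalHash(h, a) {
  while (a > 0) { h += a; a -= 1; }
  return h;
}

// Rough lower bound on the total wall-clock time for n chained 1 ms
// timeouts, assuming browsers clamp to a 4 ms minimum once the
// nesting level exceeds 5 (per the HTML spec).
function minTotalDelayMs(n) {
  const unclamped = Math.min(n, 5);
  return unclamped * 1 + Math.max(n - 5, 0) * 4;
}

console.log(finalHash(0, 4));       // 0 + 4 + 3 + 2 + 1 = 10
console.log(minTotalDelayMs(298));  // worst case for k = 200 + h2%99
```

So with k capped at 298 iterations, the clamp alone already forces a delay on the order of a second, regardless of how short the nominal timeouts are.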

Anyway, I feel that waiting a little longer just for once is much better than having to reach out for the mouse and aim for a small button.

Thanks!

PS: I'm starting to understand why setTimeout() is your favorite function :)

PPS: I'm still surprised the bots got through [0d41eb4790]. But maybe if the requests originated from a web browser plugin, they had full support for any compression.

(4) By Florian Balmer (florian.balmer) on 2025-08-16 14:33:27 in reply to 3 [link] [source]

The correct link is: [0d41eb4790].

(5) By Andy Bradford (andybradford) on 2025-08-16 16:24:47 in reply to 2 [link] [source]

> Please test the latest code and report back whether you think it is an
> improvement or a regression.

I often  use "fossil server"  and connect via  http://localhost:8080/ to
view  the repository  (I only  use "fossil  ui" when  I have  deliberate
intentions  to make  changes  via the  UI). Is  it  intended that  these
anti-robot defenses also apply to localhost?

It  seems to  take an  extremely  long time  before I  can view  certain
content now.

Thanks,

Andy

(12) By Florian Balmer (florian.balmer) on 2025-08-18 06:48:00 in reply to 5 [link] [source]

You either have to set the new robot-restrict setting to a value that doesn't match any web UI pages (it looks like an empty value always falls back to the default), or apply the following patch:

Baseline: Fossil [e5991efb68]
Index: src/robot.c
==================================================================
--- src/robot.c
+++ src/robot.c
@@ -171,10 +171,11 @@
 */
 int robot_restrict(const char *zPage){
   const char *zGlob;
   const char *zToken;
   static int bKnownPass = 0;
+  if( cgi_is_loopback(g.zIpAddr) ) return 0;
   if( g.zLogin ) return 0;    /* Logged in users always get through */
   if( bKnownPass ) return 0;  /* Already known to pass robot restrictions */
   zGlob = db_get("robot-restrict",robot_restrict_default());
   if( zGlob==0 || zGlob[0]==0 ){ bKnownPass = 1;  return 0; }
   if( !glob_multi_match(zGlob, zPage) ) return 0;

(13) By Stephan Beal (stephan) on 2025-08-18 11:02:24 in reply to 12 [link] [source]

  • if( cgi_is_loopback(g.zIpAddr) ) return 0;

See /chat (#23987) for more info.

(14) By Richard Hipp (drh) on 2025-08-18 11:02:38 in reply to 12 [link] [source]

Andy already has two good choices to fix his problem:

  1. Run "fossil ui" so that he has full permissions - in which case robot-restrict does not apply

  2. Change his "robot-restrict" setting to an empty string.

I do not see the need to add a new "does not apply to loopback IPs" wrinkle to the robot-restrict logic specification. That just makes robot-restrict more complicated. Simpler is better.

(16) By Florian Balmer (florian.balmer) on 2025-08-18 13:24:00 in reply to 14 [link] [source]

As somebody lacking the resources (both regarding brain capacity and spare time) to fight the bots on my own, I'm always thoroughly impressed by your clever ideas and the fast implementation! However, from the perspective of a user, I'm also a bit annoyed about every new "obstacle" on my own repositories and on the fossil-scm.org and sqlite.org websites. I'm relying on a hosting company with servers located in my country to take care of the bots, and my understanding of "simplicity" is to have as much freedom as possible for browsing and scripting.

One thing I'd really LOVE to have is that a token gets you a fossil-client-ok cookie right from the server, without having to wait for the timeouts involved in client-side calculations. So I can append token=XXXX to the links in my bookmarks, and when using them as entry-points to the Fossil website, I'm free to browse around without having to wait for the proof-of-work the first time I happen to click a "deep" link. Please, can we have this? :)

(17) By Richard Hipp (drh) on 2025-08-18 15:56:44 in reply to 16 [link] [source]

I have set https://fossil-scm.org/home so that anonymous logins now last for 30 days. So if you will log in as "anonymous" and click the "Remember Me" button, and as long as you don't upgrade your web browser, you shouldn't be troubled by any anti-robot defenses again for a month.

(18) By Florian Balmer (florian.balmer) on 2025-08-18 16:15:03 in reply to 17 [link] [source]

Thanks!

My computer usage is very similar to the processes described here:

So my entire browser history and caches are cleared several times a day.

Starting from a bookmark with &token=XXXX at the end would be the ultimate comfort for me. But I understand that you may hesitate, as it's a somewhat unusual feature. I don't think it's any more harmful than using a token in a script (which could also access many links with the same token), or using a browser plugin to append the token to any URLs matching a certain pattern (like http://fossil-scm.org/* -- but plugins are strictly banned from all my browsers!). As tokens are connected to users, the risk of abuse seems small to me.

(20) By Andy Bradford (andybradford) on 2025-08-19 03:28:05 in reply to 18 [link] [source]

> My computer usage is very similar to the processes described here:

Oddly enough, so is mine, however, I haven't broken the habit of relying
on HISTFILE, but my habits overlap  with many of the others described in
that document.

Thanks for sharing.

Andy

(34) By Florian Balmer (florian.balmer) on 2025-08-22 15:15:00 in reply to 16 [link] [source]

Whether this code is executed depends on the value of the robot-restrict setting: client_might_be_a_robot() is only called if robot_restrict() finds a restriction.

I'm very happy about this feature, and I'm using it with my bookmarks, but I have to tweak them to include some restricted query parameters.

For example, the code linked above is not executed with:

  • https://fossil-scm.org/home/timeline?token=abcdef0123456789

But it is executed with (note the trailing query parameter):

  • https://fossil-scm.org/home/timeline?token=abcdef0123456789&c=

Is this intentional, or should the code path linked above be run unconditionally with each web request?

(19) By Andy Bradford (andybradford) on 2025-08-19 02:48:36 in reply to 14 [link] [source]

> 2. Change his "robot-restrict" setting to an empty string.

Yes, this  is the only  option that makes sense  to me since  it doesn't
make sense to run "fossil ui" when I don't need privileges.

Regarding setting "robot-restrict" to an empty string; I tried that, but
it  doesn't work,  presumably because  "empty string"  has some  special
meaning with respect to settings:

$ fossil settings robot-restrict -g ""
$ fossil settings robot-restrict
robot-restrict          

When I run  "fossil server" it still trammels the  request. Maybe "empty
string" is an alias for "fossil unset"? Should I set it to a string with
a single space instead?

Thanks,

Andy

(21) By Andy Bradford (andybradford) on 2025-08-19 03:32:20 in reply to 19 [link] [source]

> Should I set it to a string with a single space instead?

I did try that  and it seems to work (e.g. I  don't get anti-robotted on
localhost):

$ fossil settings robot-restrict --global " "
$ fossil settings robot-restrict             
robot-restrict           (global)     
$ fossil server

Now things work smoothly.

Thanks,

Andy

(22) By Richard Hipp (drh) on 2025-08-19 10:30:47 in reply to 19 [link] [source]

Make the setting "off" to disable all robot restrictions.

(27) By Andy Bradford (andybradford) on 2025-08-19 17:17:16 in reply to 22 [link] [source]

> Make the setting "off" to disable all robot restrictions.

Thanks, tested and it's working.

Andy

(29) By Florian Balmer (florian.balmer) on 2025-08-21 15:58:03 in reply to 22 [link] [source]

I noticed I'm no longer able to freely browse my repositories (as "nobody") even if robot_restrict is set to off: the diffs are always hidden by default and only appear when clicking a link that contains a diff= query parameter.

Is this intentional? If not, should client_might_be_a_robot() return false if robot_restrict is off? Or should preferred_diff_type() handle this case?

(30) By Richard Hipp (drh) on 2025-08-21 16:02:39 in reply to 29 [link] [source]

When robot-restrict (with a "-" not a "_") is off, then robot_restrict_has_tag() should return 0 regardless of what tag is passed in. That should cause robot_restrict() to always return 0, and hence should always let your requests go through.

(31) By Florian Balmer (florian.balmer) on 2025-08-21 16:14:35 in reply to 30 [link] [source]

But preferred_diff_type() directly calls client_might_be_a_robot(), which returns true for "nobody".

I don't see a built-in way to disable this. It's easy for me to patch this, but I wonder if this is intentional?

(32) By Richard Hipp (drh) on 2025-08-21 16:22:35 in reply to 31 [link] [source]

Try again after the latest check-in, please.

(33) By Florian Balmer (florian.balmer) on 2025-08-21 16:29:13 in reply to 32 [link] [source]

Yes, that works, if robot-restrict is off, user "nobody" can see the diffs. Thanks!

(15) By spindrift on 2025-08-18 12:40:52 in reply to 5 [link] [source]

If fossil is running behind a reverse proxy, then all connections may come from localhost, no?

Probably not sensible to whitelist all connections from 127.0.0.1 when running as "fossil server".

(6) By Florian Balmer (florian.balmer) on 2025-08-17 03:32:58 in reply to 1 [link] [source]

Now I see what the 'token-*' config entries are intended for!

Right now, my script to download the SQLite source code RSS feed works fine.

In case it's necessary to put the RSS feed behind the robot barrier again, can arbitrary people like me apply for a read-only account on the SQLite source code repository so they are able to generate their access tokens? (If I get it right, only registered users can generate access tokens?)

(7) By Florian Balmer (florian.balmer) on 2025-08-17 04:03:45 in reply to 1 [link] [source]

What I would love to have: if viewing a Fossil page with an access token in the browser would return you a fossil-client-ok cookie.

Then I could keep the link to Fossil with &token=XXX appended in my bookmarks to bypass the robots without having to wait and without having to login.

I would LOVE that, as my browser cookies are rather transient.

While this request may seem naive, I don't see why keeping a personal bookmark with an access token is different from keeping the token in a script.

(8) By spindrift on 2025-08-17 06:56:38 in reply to 7 [link] [source]

If you haven't seen this response, it sounds like it may be of interest in that case.

(9) By Florian Balmer (florian.balmer) on 2025-08-17 07:51:19 in reply to 8 [link] [source]

I've seen the reply. I'm suggesting a new feature: allow tokens to bypass bot-tests when using a browser, not a script, i.e. generate the client-ok cookie when a page is viewed with a browser. That would be awesome!

(10) By spindrift on 2025-08-17 08:05:33 in reply to 9 [link] [source]

I had assumed that you could just do exactly that.

Present the token as a query param at the end of your browser requested URI.

I admit I haven't tried it, as I assume it would just work.

If it doesn't, then I agree, seems beneficial in your example.

(11) By spindrift on 2025-08-17 08:09:18 in reply to 10 [link] [source]

I've just tried it, and while it doesn't seem to break anything, I'm struggling to even activate the bot detection at the moment, so I can't tell whether it's unneeded on the pages I've tried it on, or whether the presented token is being accepted for anonymous access.

(23) By Florian Balmer (florian.balmer) on 2025-08-19 16:21:32 in reply to 1 [source]

BTW: The getComputedStyle() test (now also used in src/href.js) doesn't work with IE, which stores the z-index style as a number, instead of as a string -- not sure if anybody cares but me :) A modified version of the test still works with IE (see below).

Also, this is one of the rare legitimate use cases for the === operator. Opinions on this differ, but I'm having a hard time reading JS sources full of === when it's not necessary, because I always feel dumb for missing the author's intent.
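A type-tolerant variant of the check can be sketched like this (zIndexIsZero is a hypothetical helper name, not the one used in Fossil; coercing with String() makes the strict comparison pass whether the browser reports zIndex as a number, like IE, or as a string, like Chromium):

```javascript
// Sketch of an IE-tolerant z-index probe.  Chromium's
// getComputedStyle() returns zIndex as the string "0", while old IE
// returns the number 0, so a strict === against either type fails in
// the other browser.  Coercing to a string first handles both.
function zIndexIsZero(zIndex) {
  return String(zIndex) === "0";
}

console.log(zIndexIsZero("0"));  // Chromium-style value -> true
console.log(zIndexIsZero(0));    // IE-style value -> true
console.log(zIndexIsZero("1"));  // -> false
```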

Test HTML File

<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <title>Test: Fossil getComputedStyle() Anti-Bot Defense</title>
  <style>
    body {
      z-index: 0;
    }
  </style>
</head>
<body>
  <ul id="results"></ul>
  <script>
    var ul = document.getElementById('results'), li, tn;
    li = document.createElement('li');
    tn = document.createTextNode(
      'Test 1: window.getComputedStyle(document.body).zIndex → ' +
      'type: ' + typeof window.getComputedStyle(document.body).zIndex +
      ', value:' + window.getComputedStyle(document.body).zIndex
    );
    li.appendChild(tn);
    ul.appendChild(li);
    li = document.createElement('li');
    tn = document.createTextNode(
      'Test 2: window.getComputedStyle(document.body).zIndex==="0" → ' +
      'type: ' + typeof( window.getComputedStyle(document.body).zIndex==="0" ) +
      ', value: ' + ( window.getComputedStyle(document.body).zIndex==="0" )
    );
    li.appendChild(tn);
    ul.appendChild(li);
    li = document.createElement('li');
    tn = document.createTextNode(
      'Test 3: window.getComputedStyle(document.body).zIndex===0 → ' +
      'type: ' + typeof( window.getComputedStyle(document.body).zIndex===0 ) +
      ', value: ' + ( window.getComputedStyle(document.body).zIndex===0 )
    );
    li.appendChild(tn);
    ul.appendChild(li);
  </script>
</body>
</html>

Output from Chromium

• Test 1: window.getComputedStyle(document.body).zIndex → type: string, value:0
• Test 2: window.getComputedStyle(document.body).zIndex==="0" → type: boolean, value: true
• Test 3: window.getComputedStyle(document.body).zIndex===0 → type: boolean, value: false

Output from IE

• Test 1: window.getComputedStyle(document.body).zIndex → type: number, value:0
• Test 2: window.getComputedStyle(document.body).zIndex==="0" → type: boolean, value: false
• Test 3: window.getComputedStyle(document.body).zIndex===0 → type: boolean, value: true

(24) By Richard Hipp (drh) on 2025-08-19 16:38:35 in reply to 23 [link] [source]

Does it work for IE now?

(25) By Florian Balmer (florian.balmer) on 2025-08-19 16:50:57 in reply to 24 [link] [source]

Yes, it works, thank you!

I'd like to emphasize that a lot of the more recent JS code based on the fossil.dom framework doesn't run in IE, so a few pieces of the UI are defunct for an out-of-the-box build, but href.js is required (if enabled), and the latest version now works fine!

(26) By Richard Hipp (drh) on 2025-08-19 17:03:51 in reply to 25 [link] [source]

Microsoft says that "Internet Explorer (IE) 11 is the last major version of Internet Explorer. On June 15, 2022, the Internet Explorer 11 desktop application ended support...."

So IE has been unsupported for more than 3 years. It probably contains unfixed, critical security bugs. You might want to think about picking a new web browser.

(28) By Florian Balmer (florian.balmer) on 2025-08-19 17:35:18 in reply to 26 [link] [source]

IE is not my main browser, but it's super-handy to use it on my old (offline) boxes and VMs, where I'm doing some of my development.