Latest anti-robot defenses
(1) By Florian Balmer (florian.balmer) on 2025-08-16 08:49:23 [link] [source]
I've been experimenting with the latest anti-robot defenses.
It's nice to see the simplifications, both for users (no more captcha) and for admins (fewer settings).
Do you think it's even possible to set the keyboard focus to the Ok button as
soon as it's ready? This would be particularly useful for people who want to
work without a mouse as much as possible!
At least in Chrome, the autofocus attribute doesn't seem to work for the
(initially) disabled button, but calling focus() does the trick:
Baseline: Fossil [54afc94ce0]
Index: src/robot.c
==================================================================
--- src/robot.c
+++ src/robot.c
@@ -98,10 +98,11 @@
@ function Nhtot1520(x){return document.getElementById(x);}
@ function Aoxlxzajv(h){\
@ Nhtot1520("vx").value=h;\
@ Nhtot1520("cx").value="Ok";\
@ Nhtot1520("cx").disabled=false;\
+ @ Nhtot1520("cx").focus();\
@ }
@ function Vhcnyarsm(h,a){\
@ if(a>0){setTimeout(Vhcnyarsm,1,h+a,a-1);}else{Aoxlxzajv(h);}\
@ }
k = 200 + h2%99;
And this line seems to query a value from the config table that never gets added (if I'm right; I'm aware it's a work in progress).
Also, I'm happy the RSS feed is back! Thanks!
(2) By Richard Hipp (drh) on 2025-08-16 14:10:03 in reply to 1 [link] [source]
Please test the latest code and report back whether you think it is an improvement or a regression.
(3) By Florian Balmer (florian.balmer) on 2025-08-16 14:30:24 in reply to 2 [link] [source]
I think it's an improvement!
I have no idea whether the duration matters, i.e. if nested setTimeout() calls
with ultra-short delays would already outsmart the bots, or if something
perceptible by humans is necessary so the bots get bored.
Anyway, I feel that waiting a little longer just for once is much better than having to reach out for the mouse and aim for a small button.
Thanks!
PS: I'm starting to understand why setTimeout() is your favorite function :)
PPS: I'm still surprised the bots got through [0d41eb4790]. But maybe if the requests originated from a web browser plugin, they had full support for any compression.
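For reference, the nested Vhcnyarsm() countdown from the patch in the first post can be modeled synchronously (the setTimeout() timing is elided; only the arithmetic is shown):

```javascript
// Synchronous model of Vhcnyarsm(h, a) from the patch: each nested
// setTimeout() step adds the remaining counter to h, so starting from
// (h, a) the callback finally receives h + a + (a-1) + ... + 1.
function countdown(h, a){
  while (a > 0){
    h += a;
    a -= 1;
  }
  return h; // value eventually passed to Aoxlxzajv()
}
console.log(countdown(100, 4)); // 110 = 100 + 4 + 3 + 2 + 1
```

So with a = 200 + h2%99 delay steps, the delay a bot has to sit through grows with the step count, while a human just sees the Ok button light up.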
(4) By Florian Balmer (florian.balmer) on 2025-08-16 14:33:27 in reply to 3 [link] [source]
The correct link is: [0d41eb4790].
(5) By Andy Bradford (andybradford) on 2025-08-16 16:24:47 in reply to 2 [link] [source]
> Please test the latest code and report back whether you think it is an improvement or a regression.

I often use "fossil server" and connect via http://localhost:8080/ to view the repository (I only use "fossil ui" when I have deliberate intentions to make changes via the UI).

Is it intended that these anti-robot defenses also apply to localhost? It seems to take an extremely long time before I can view certain content now.

Thanks,
Andy
(12) By Florian Balmer (florian.balmer) on 2025-08-18 06:48:00 in reply to 5 [link] [source]
You either have to set the new robot-restrict setting to a value that
doesn't match any web UI pages (it looks like an empty value always falls back
to the default), or apply the following patch:
Baseline: Fossil [e5991efb68]
Index: src/robot.c
==================================================================
--- src/robot.c
+++ src/robot.c
@@ -171,10 +171,11 @@
*/
int robot_restrict(const char *zPage){
const char *zGlob;
const char *zToken;
static int bKnownPass = 0;
+ if( cgi_is_loopback(g.zIpAddr) ) return 0;
if( g.zLogin ) return 0; /* Logged in users always get through */
if( bKnownPass ) return 0; /* Already known to pass robot restrictions */
zGlob = db_get("robot-restrict",robot_restrict_default());
if( zGlob==0 || zGlob[0]==0 ){ bKnownPass = 1; return 0; }
if( !glob_multi_match(zGlob, zPage) ) return 0;
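For illustration, the gating logic around the robot-restrict setting can be modeled as a toy sketch in JS (an assumption for clarity only; Fossil's real glob syntax and default fallback are richer than shown here):

```javascript
// Toy model of the glob gate in robot_restrict() above.
function globToRegExp(glob){
  // Escape regex metacharacters, then translate glob "*" and "?".
  const esc = glob.replace(/[.+^${}()|[\]\\]/g, "\\$&")
                  .replace(/\*/g, ".*")
                  .replace(/\?/g, ".");
  return new RegExp("^" + esc + "$");
}
// Returns true if the page is subject to robot restrictions.
function robotRestrict(setting, page){
  if (!setting) return false;          // empty glob list: everything passes
  if (setting === "off") return false; // "off" disables all restrictions
  return setting.split(",")
                .map(s => s.trim())
                .some(g => globToRegExp(g).test(page));
}
console.log(robotRestrict("timeline,vdiff*", "timeline")); // restricted
console.log(robotRestrict("off", "timeline"));             // not restricted
```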
(13) By Stephan Beal (stephan) on 2025-08-18 11:02:24 in reply to 12 [link] [source]
- if( cgi_is_loopback(g.zIpAddr) ) return 0;
See /chat (#23987) for more info.
(14) By Richard Hipp (drh) on 2025-08-18 11:02:38 in reply to 12 [link] [source]
Andy already has two good choices to fix his problem:
1. Run "fossil ui" so that he has full permissions, in which case robot-restrict does not apply.
2. Change his "robot-restrict" setting to an empty string.
I do not see the need to add a new "does not apply to loopback IPs" wrinkle to the robot-restrict logic specification. That just makes robot-restrict more complicated. Simpler is better.
(16) By Florian Balmer (florian.balmer) on 2025-08-18 13:24:00 in reply to 14 [link] [source]
As somebody lacking the resources (both regarding brain capacity and spare time) to fight the bots on my own, I'm always thoroughly impressed by your clever ideas and the fast implementation!

However, from the perspective of a user, I'm also a bit annoyed about every new "obstacle" on my own repositories and on the fossil-scm.org and sqlite.org websites.

I'm relying on a hosting company with servers located in my country to take care of the bots, and my understanding of "simplicity" is to have as much freedom as possible for browsing and scripting.
One thing I'd really LOVE to have is that a token gets you a fossil-client-ok
cookie right from the server, without having to wait for the timeouts involved
in client-side calculations. So I can append token=XXXX to the links in my
bookmarks, and when using them as entry-points to the Fossil website, I'm free
to browse around without having to wait for the proof-of-work the first time I
happen to click a "deep" link. Please, can we have this? :)
(17) By Richard Hipp (drh) on 2025-08-18 15:56:44 in reply to 16 [link] [source]
I have set https://fossil-scm.org/home so that anonymous logins now last for 30 days. So if you will log in as "anonymous" and click the "Remember Me" button, and as long as you don't upgrade your web browser, you shouldn't be troubled by any anti-robot defenses again for a month.
(18) By Florian Balmer (florian.balmer) on 2025-08-18 16:15:03 in reply to 17 [link] [source]
Thanks!
My computer usage is very similar to the processes described here:
So my entire browser history and caches are cleared several times a day.
Starting from a bookmark with &token=XXXX at the end would be the ultimate
comfort for me. But I understand that you may hesitate, as it's a somewhat
unusual feature. I don't think it's any more harmful than using a token in a
script (which could also access many links with the same token), or using a
browser plugin to append the token to any URLs matching a certain pattern (like
http://fossil-scm.org/* -- but plugins are strictly banned from all my
browsers!). As tokens are connected to users, the risk of abuse seems small, to
me.
(20) By Andy Bradford (andybradford) on 2025-08-19 03:28:05 in reply to 18 [link] [source]
> My computer usage is very similar to the processes described here:

Oddly enough, so is mine. However, I haven't broken the habit of relying on HISTFILE, but my habits overlap with many of the others described in that document.

Thanks for sharing.

Andy
(34) By Florian Balmer (florian.balmer) on 2025-08-22 15:15:00 in reply to 16 [link] [source]
Whether this code is executed depends on the value of the
robot-restrict setting, i.e. client_might_be_a_robot() is only
called if robot_restrict() finds a restriction.
I'm very happy about this feature, and I'm using it with my bookmarks, but I have to tweak them to include some restricted query parameters.
For example, the code linked above is not executed with:
https://fossil-scm.org/home/timeline?token=abcdef0123456789
But it is executed with (note the trailing query parameter):
https://fossil-scm.org/home/timeline?token=abcdef0123456789&c=
Is this intentional, or should the code path linked above be run unconditionally with each web request?
(19) By Andy Bradford (andybradford) on 2025-08-19 02:48:36 in reply to 14 [link] [source]
> 2. Change his "robot-restrict" setting to an empty string.

Yes, this is the only option that makes sense to me, since it doesn't make sense to run "fossil ui" when I don't need privileges.

Regarding setting "robot-restrict" to an empty string: I tried that, but it doesn't work, presumably because "empty string" has some special meaning with respect to settings:

$ fossil settings robot-restrict -g ""
$ fossil settings robot-restrict
robot-restrict

When I run "fossil server" it still trammels the request. Maybe "empty string" is an alias for "fossil unset"? Should I set it to a string with a single space instead?

Thanks,
Andy
(21) By Andy Bradford (andybradford) on 2025-08-19 03:32:20 in reply to 19 [link] [source]
> Should I set it to a string with a single space instead?

I did try that and it seems to work (e.g. I don't get anti-robotted on localhost):

$ fossil settings robot-restrict --global " "
$ fossil settings robot-restrict
robot-restrict       (global)
$ fossil server

Now things work smoothly.

Thanks,
Andy
(22) By Richard Hipp (drh) on 2025-08-19 10:30:47 in reply to 19 [link] [source]
Make the setting "off" to disable all robot restrictions.
(27) By Andy Bradford (andybradford) on 2025-08-19 17:17:16 in reply to 22 [link] [source]
> Make the setting "off" to disable all robot restrictions.

Thanks, tested and it's working.

Andy
(29) By Florian Balmer (florian.balmer) on 2025-08-21 15:58:03 in reply to 22 [link] [source]
I noticed I'm no longer able to freely browse my repositories (as "nobody") even
if robot_restrict is set to off: the diffs are always hidden by default and
only appear when clicking a link that contains a diff= query parameter.
Is this intentional? If no, should client_might_be_a_robot() return false
if robot_restrict is off? Or should preferred_diff_type() handle this
case?
(30) By Richard Hipp (drh) on 2025-08-21 16:02:39 in reply to 29 [link] [source]
When robot-restrict (with a "-" not a "_") is off, then robot_restrict_has_tag() should return 0
regardless of what tag is passed in. That should cause robot_restrict() to always return 0,
and hence should always let your requests go through.
(31) By Florian Balmer (florian.balmer) on 2025-08-21 16:14:35 in reply to 30 [link] [source]
But preferred_diff_type() directly calls client_might_be_a_robot(), which
returns true for "nobody".
I don't see a built-in way to disable this. It's easy for me to patch this, but I wonder if this is intentional?
(32) By Richard Hipp (drh) on 2025-08-21 16:22:35 in reply to 31 [link] [source]
Try again after the latest check-in, please.
(33) By Florian Balmer (florian.balmer) on 2025-08-21 16:29:13 in reply to 32 [link] [source]
Yes, that works, if robot-restrict is off, user "nobody" can see the diffs.
Thanks!
(15) By spindrift on 2025-08-18 12:40:52 in reply to 5 [link] [source]
If fossil is running behind a reverse proxy, then all connections may come from localhost, no?
Probably not sensible to whitelist all connections from 127.0.0.1 when run as "fossil server".
(6) By Florian Balmer (florian.balmer) on 2025-08-17 03:32:58 in reply to 1 [link] [source]
Now I see what the 'token-*' config entries are intended for!
Right now, my script to download the SQLite source code RSS feed works fine.
In case it's necessary to put the RSS feed behind the robot barrier again, can arbitrary people like me apply for a read-only account on the SQLite source code repository so they are able to generate their access tokens? (If I understand correctly, only registered users can generate access tokens?)
(7) By Florian Balmer (florian.balmer) on 2025-08-17 04:03:45 in reply to 1 [link] [source]
What I would love to have: if viewing a Fossil page with an access token in the
browser would return you a fossil-client-ok cookie.
Then I could keep the link to Fossil with &token=XXX appended in my bookmarks
to bypass the robots without having to wait and without having to login.
I would LOVE that, as my browser cookies are rather transient.
While this request may seem naive, I don't see why keeping a personal bookmark with an access token is different from keeping the token in a script.
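The requested flow could be sketched like this (hypothetical only; this is not how Fossil currently behaves, and the token store shown is an assumption -- only the fossil-client-ok cookie name comes from this thread):

```javascript
// Hypothetical sketch of the requested feature: if a valid access
// token arrives as a "token" query parameter, issue the anti-robot
// bypass cookie so subsequent clicks skip the proof-of-work.
function handleRequest(url, validTokens){
  const token = new URL(url).searchParams.get("token");
  if (token && validTokens.has(token)){
    return { setCookie: "fossil-client-ok=1", robotCheck: false };
  }
  return { setCookie: null, robotCheck: true };
}

const tokens = new Set(["abcdef0123456789"]);
console.log(handleRequest(
  "https://fossil-scm.org/home/timeline?token=abcdef0123456789", tokens));
```

With something like this, a bookmark ending in &token=XXXX would seed the cookie on the first click, and the rest of the browsing session would proceed without waiting.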
(8) By spindrift on 2025-08-17 06:56:38 in reply to 7 [link] [source]
If you haven't seen this response, it sounds like it may be of interest in that case.
(9) By Florian Balmer (florian.balmer) on 2025-08-17 07:51:19 in reply to 8 [link] [source]
I've seen the reply. I'm suggesting a new feature: allow tokens to bypass bot-tests when using a browser, not a script, i.e. generate the client-ok cookie when a page is viewed with a browser. That would be awesome!
(10) By spindrift on 2025-08-17 08:05:33 in reply to 9 [link] [source]
I had assumed that you could just do exactly that.
Present the token as a query param at the end of your browser requested URI.
I admit I haven't tried it, as I assume it would just work.
If it doesn't, then I agree, seems beneficial in your example.
(11) By spindrift on 2025-08-17 08:09:18 in reply to 10 [link] [source]
I've just tried it, and while it doesn't not work, I'm struggling to even activate the bot detection at the moment, so I can't tell whether it's unneeded on the pages I've tried it on, or whether the presented token is being accepted for "anonymous access".
(23) By Florian Balmer (florian.balmer) on 2025-08-19 16:21:32 in reply to 1 [source]
BTW: The getComputedStyle() test (now also used in src/href.js) doesn't
work with IE, which stores the z-index style as a number, instead of as a
string -- not sure if anybody cares but me :) A modified version of the test
still works with IE (see below).
Also, this is one of the rare legitimate use cases for the === operator.
Opinions on this differ, but I'm having a hard time reading JS sources full
of === when it's not necessary, because I always feel dumb for missing the
author's intent.
Test HTML File
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Test: Fossil getComputedStyle() Anti-Bot Defense</title>
<style>
body {
z-index: 0;
}
</style>
</head>
<body>
<ul id="results"></ul>
<script>
var ul = document.getElementById('results'), li, tn;
li = document.createElement('li');
tn = document.createTextNode(
'Test 1: window.getComputedStyle(document.body).zIndex → ' +
'type: ' + typeof window.getComputedStyle(document.body).zIndex +
', value:' + window.getComputedStyle(document.body).zIndex
);
li.appendChild(tn);
ul.appendChild(li);
li = document.createElement('li');
tn = document.createTextNode(
'Test 2: window.getComputedStyle(document.body).zIndex==="0" → ' +
'type: ' + typeof( window.getComputedStyle(document.body).zIndex==="0" ) +
', value: ' + ( window.getComputedStyle(document.body).zIndex==="0" )
);
li.appendChild(tn);
ul.appendChild(li);
li = document.createElement('li');
tn = document.createTextNode(
'Test 3: window.getComputedStyle(document.body).zIndex===0 → ' +
'type: ' + typeof( window.getComputedStyle(document.body).zIndex===0 ) +
', value: ' + ( window.getComputedStyle(document.body).zIndex===0 )
);
li.appendChild(tn);
ul.appendChild(li);
</script>
</body>
</html>
Output from Chromium
• Test 1: window.getComputedStyle(document.body).zIndex → type: string, value:0
• Test 2: window.getComputedStyle(document.body).zIndex==="0" → type: boolean, value: true
• Test 3: window.getComputedStyle(document.body).zIndex===0 → type: boolean, value: false
Output from IE
• Test 1: window.getComputedStyle(document.body).zIndex → type: number, value:0
• Test 2: window.getComputedStyle(document.body).zIndex==="0" → type: boolean, value: false
• Test 3: window.getComputedStyle(document.body).zIndex===0 → type: boolean, value: true
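One type-tolerant way to write such a check (a sketch of the idea only; the thread doesn't show the exact modified test that landed) is loose equality, which accepts both forms while still rejecting other computed values:

```javascript
// Sketch of a type-tolerant z-index check (an assumption, not the
// exact fix): loose equality accepts both the string "0" (Chromium
// and most browsers) and the number 0 (IE), but rejects "auto".
function isZeroZIndex(z){
  return z == "0";
}
console.log(isZeroZIndex("0"));    // true  (string, most browsers)
console.log(isZeroZIndex(0));      // true  (number, IE)
console.log(isZeroZIndex("auto")); // false
```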
(24) By Richard Hipp (drh) on 2025-08-19 16:38:35 in reply to 23 [link] [source]
Does it work for IE now?
(25) By Florian Balmer (florian.balmer) on 2025-08-19 16:50:57 in reply to 24 [link] [source]
Yes, it works, thank you!
I'd like to emphasize that a lot of the more recent JS code based on the
fossil.dom framework doesn't run in IE, so a few pieces of the UI are defunct
for an out-of-the-box build, but href.js is required (if enabled), and the
latest version now works fine!
(26) By Richard Hipp (drh) on 2025-08-19 17:03:51 in reply to 25 [link] [source]
Microsoft says that "Internet Explorer (IE) 11 is the last major version of Internet Explorer. On June 15, 2022, the Internet Explorer 11 desktop application ended support...."
So IE has been unsupported for more than 3 years. It probably contains unfixed, critical security bugs. You might want to think about picking a new web browser.
(28) By Florian Balmer (florian.balmer) on 2025-08-19 17:35:18 in reply to 26 [link] [source]
IE is not my main browser, but it's super-handy to use it on my old (offline) boxes and VMs, where I'm doing some of my development.