test failed failed on riscv64 with 2.25
(1) By anonymous on 2025-03-04 17:00:43 [link] [source]
Hi,
There is one test failed on Debian riscv64:
The "sha1" package is not available.
Skipping 'th1-tcl-2', SQLite package for Tcl not available
test th1-expr-3 FAILED!
RESULT: -0./,),(-*,(
Skipping th1-anycap-*-1 perm tests: not in Fossil repo checkout.
I test the th1-expr-3
on the same riscv64 hardware with 2.24, but it passed.
Could you have a look at this? TIA
(2) By vimer (vimerbf) on 2025-03-05 02:09:31 in reply to 1 [link] [source]
I tested more cases, which are below:
rv@-01:~/build/fossil/fossil-2.25$ fossil test-th-eval "expr (1<<31)"
-0./,),(-*,(
rv@01:~/build/fossil/fossil-2.25$ fossil test-th-eval "catch {expr (1<<31)}"
0
(3) By Stephan Beal (stephan) on 2025-03-05 03:40:48 in reply to 1 [link] [source]
Could you have a look at this?
Donation of appropriate hardware would go a long way towards making that happen :).
(4) By vimer (vimerbf) on 2025-03-05 07:05:08 in reply to 3 [link] [source]
Would it make sense to provide remote access?
If so, you can try it with riscv64 hardware: https://github.com/plctlab/riscv-lab-access
(5) By Stephan Beal (stephan) on 2025-03-05 07:36:01 in reply to 4 [link] [source]
Would it make sense to provide remote access?
Though i can't speak for anyone else on the project, providing cool new hardware would be a lot more compelling ;).
If so, you can try it with riscv64 hardware:
This is not me volunteering - my proverbial plate is slightly over its capacity1 - but that is an interesting service, despite the irony of requiring a github account to debug a fossil oddity. i do hope that someone will take that on, as riscv is a seriously interesting I.T. development and i look forward to eventually owning a riscv system (donated or not!).
- ^ That said: receiving a riscv dev board to play with would compel me to return the favor by expanding that proverbial plate!
(6) By vimer (vimerbf) on 2025-03-05 07:55:11 in reply to 5 [link] [source]
Though i can't speak for anyone else on the project, providing cool new hardware would be a lot more compelling ;).
Okay.:)
but that is an interesting service, despite the irony of requiring a github account to debug a fossil oddity
hmm, here the Github account is just to be used as adding pubkey then you can get one ssh config entry that allows you to build/debug code. Oh, I got it, you mean you are one fossil
user.:-)
development and i look forward to eventually owning a riscv system
It is not hard to get one RISC-V system today, you can get one donated from RVI board program0 if you are interested. Now I think we have to focus on how to fix the issue, could you give me your pubkey if this works(tsu.yubo@gmail.com)?
(7) By Barak A. Pearlmutter (barak_pearlmutter) on 2025-04-09 08:59:22 in reply to 1 [link] [source]
I used the riscv64 porter box ricci.debian.org.
Fossil takes like six thousand years to compile on ricci. Not sure if it's GCC-on-riscv64-is-a-pig vs riscv64-is-a-pig vs porter-box-posessed-by-pig-demon, but something is a pig!
I'm trying to figure out if the test failure issue is in the calculations vs the printing routines. So here are some commands, with the output on amd64, then on riscv64 if that differs. It looks to me like the issue is some problem with the most negative 32-bit twos-complement integer in specific. But it's hard to distinguish printing from arithmetic, because it seems to use a string as the representation of the intermediate value. At least, that's my best guess.
$ ./fossil version
This is fossil version 2.25 [1fba3e4aa9] 2025-04-07 19:33:35 UTC
$ ./fossil test-th-eval "expr (1<<30)"
1073741824
$ ./fossil test-th-eval "expr (1<<31)"
-2147483648
-0./,),(-*,(
$ ./fossil test-th-eval "expr (1<<31)+1"
-2147483647
TH_ERROR: expected number, got: "-0./,),(-*,("
$ ./fossil test-th-eval "expr 2147483647"
2147483647
2147483647
$ ./fossil test-th-eval "expr 2147483647+1"
-2147483648
-0./,),(-*,(
$ ./fossil test-th-eval "expr -2147483648"
-2147483648
-0./,),(-*,(
$ ./fossil test-th-eval "expr -2147483647"
-2147483647
-2147483647
$ ./fossil test-th-eval "expr -2147483647-1"
-2147483648
-0./,),(-*,(
$ ./fossil test-th-eval "expr -2147483647-2"
2147483647
2147483647
$ ./fossil test-th-eval "expr -2147483647-1-1"
2147483647
TH_ERROR: expected number, got: "-0./,),(-*,("
(8) By Barak A. Pearlmutter (barak_pearlmutter) on 2025-04-09 15:27:41 in reply to 7 [link] [source]
Okay, I have good news and I have bad news.
The problem is also exhibited when I compile 2.24 from sources, so this is not due to some change from 2.24 to 2.25. But the problem is not exhibited by a binary compiled from the very same 2.24 sources in April 2024.
The good news is this seems due to a change in the riscv64 toolchain introducing a new bug. And the fossil test suite caught it. THANK YOU!!!
The bad news is that this seems due to a change in the riscv64 toolchain introducing a new bug. Which is (a) going to be hard to track down and fix, and (b) will probably hold up the transition of fossil into Debian testing and might even keep it out of the next stable release.
# FRESHLY COMPILED from 1:2.24-6 sources:
$ ./fossil version
This is fossil version 2.24 [8be0372c10] 2024-04-23 13:25:26 UTC
$ ./fossil test-th-eval "expr (1<<31)"
-0./,),(-*,(
# Using the fossil 1:2.24-6 package compiled in April 2024:
$ /usr/bin/fossil version
This is fossil version 2.24 [5fe7dddc6f] 2024-04-30 14:34:12 UTC
$ /usr/bin/fossil test-th-eval "expr (1<<31)"
-2147483648
(9) By Barak A. Pearlmutter (barak_pearlmutter) on 2025-04-09 20:05:36 in reply to 8 [link] [source]
I tried again using clang 19.1.7 to build fossil 2.25 on riscv64, and it works fine.
So either there's a bug in GCC 14.2.0. Or maybe GCC is following some language spec literally, and the spec says (1<<31) on a signed 32-bit integer is indeterminate, so it's doing something indeterminate.
In any case, I'll fix the Debian package issue by using clang on riscv64. But it might be a good idea to make sure that, somewhere in the bowels of the fossil system, there isn't an assumption being made about some C construct which is actually, according to the spec, undefined. Because GCC has been getting more and more aggressive about optimizing away that sort of thing, breaking code left and right in the process.
(10) By Richard Hipp (drh) on 2025-04-09 21:15:00 in reply to 9 [link] [source]
Can you please test the latest trunk check-in (2025-04-09T21:12Z or later) using GCC 14.2.0 on ricci.debian.org and let us know if that clears your problem?
(11.1) By Barak A. Pearlmutter (barak_pearlmutter) on 2025-04-10 12:31:17 edited from 11.0 in reply to 10 [link] [source]
Thanks!
I just tested the that, using the trunk branch with one more commit after (takes hours because riscv64 ricci takes two orders of magnitude longer to compile than my desktop) and ... the fix did not work. 😖.
Can you point me to the exact places where <<
is done, and where the 32-bit integers are converted to strings?
(12) By Richard Hipp (drh) on 2025-04-10 12:41:30 in reply to 11.1 [link] [source]
The <<
is done at th.c:2166.
Integer-to-text conversion is done at th.c:2876-2897.
(13.1) By Barak A. Pearlmutter (barak_pearlmutter) on 2025-04-10 13:11:03 edited from 13.0 in reply to 11.1 [link] [source]
I looked at the fix. You're tweaking the shifting code that implements user-level <<
. I don't see how that could be the problem: at worst that would result in a wrong number, not a weird non-numeric string.
Here is a simpler test case that doesn't use <<
which breaks it.
$ ./fossil test-th-eval "expr -2147483648"
-0./,),(-*,(
If I had to guess, it looks like an issue rendering the most negative 32-bit integer into a string.
Here I try to construct the same number or nearby ones in other ways.
$ ./fossil test-th-eval "expr -2147483647"
-2147483647
$ ./fossil test-th-eval "expr -2147483647-1"
-0./,),(-*,(
$ ./fossil test-th-eval "expr -2147483647-2"
2147483647
$ ./fossil test-th-eval "expr 2*-1073741824"
-0./,),(-*,(
$ ./fossil test-th-eval "expr 2147483647"
2147483647
$ ./fossil test-th-eval "expr 2147483647+1"
-0./,),(-*,(
And here is something that blew my intuition, because I did not think it would be able to represent a number one greater than the maximum positive 32-big 2s-complement integer:
$ ./fossil test-th-eval "expr 2147483648"
2147483648
$ ./fossil test-th-eval "expr 2147483648+0"
-0./,),(-*,(
(14) By Barak A. Pearlmutter (barak_pearlmutter) on 2025-04-10 14:50:42 in reply to 13.1 [link] [source]
Just a guess, but I suspect:
The
<<
code was fine, can revert that.The problem is in
Th_SetResultInt
in the filesrc/th.c
In particular, the multiplication by -1
here
if( iVal<0 ){
isNegative = 1;
uVal = iVal * -1;
}
looks suspect to me.
If uVal
somehow is not being treated as unsigned, then (char)(48+(uVal%10))
could result in a non-digit. That may be a GCC bug. (Although maybe not; the iVal*-1
may violate some invariant if iVal
is negative but the product cannot be represented as a signed 32-bit int so GCC infers it can use signed instructions even though uVal
is unsigned because it can prove blah blah blah.) In any case I'd suggest working around it with triple guardrails.
Maybe just use sprintf
?
Or, define and use
char todigit(int i) {
int j = i%10;
char c = '0'+j;
if (! (0 <= j && j <= 9)) error("urk");
if (! isdigit(c)) error("gurk");
return c;
}
(15) By Richard Hipp (drh) on 2025-04-10 16:51:09 in reply to 14 [link] [source]
Please try check-in 2025-04-10T16:49Z or later.
(16) By Barak A. Pearlmutter (barak_pearlmutter) on 2025-04-10 17:17:27 in reply to 15 [link] [source]
This fixes it:
int Th_SetResultInt(Th_Interp *interp, int iVal){
char zBuf[32];
snprintf(zBuf, sizeof(zBuf), "%d", iVal);
return Th_SetResult(interp, zBuf, -1);
}
(17.1) By Barak A. Pearlmutter (barak_pearlmutter) on 2025-05-01 14:07:03 edited from 17.0 in reply to 16 [link] [source]
Further updates.
I've uploaded 2.26 to Debian. Since snprintf is in C99, I'm keeping the patch to src/th.c
to use snprintf
rather than the explicit base10 conversion loop.
Apparently this signed instruction issue is a case of following-the-C-standard-looks-like-a-bug, because with int s
, writing -s
, or abs(s)
, or s*-1
, when s
is the most negative integer, yields undefined behavior! In any context, including unsigned u=-s
, because the intermediate value on the right hand side of that expression is signed.
I spoke with someone in the standards loop who is reviewing a draft copy of next C standard, and they are going to try to get
extern unsigned uabs(int i);
added to the standard C library. It's actually a bit pressing, since this issue may create security problems with crypto code, and there's basically no efficient way to code around it right now.
(18) By Stephan Beal (stephan) on 2025-05-06 17:25:02 in reply to 17.1 [link] [source]
I've uploaded 2.26 to Debian. Since snprintf is in C99, I'm keeping the patch to src/th.c to use snprintf rather than the explicit base10 conversion loop.
The C99 dependency could be eliminated by using sqlite3_snprintf() instead. If you can confirm that that resolves the issue i'll get that checked in.
(19) By Barak A. Pearlmutter (barak_pearlmutter) on 2025-05-17 20:24:57 in reply to 18 [source]
Go for it. If sqlite3_snprintf() gets an error on this particular edge case, that's arguably a more important issue anyway.