Fossil Forum

Forum
Login

Fossil gdiff with winmergeu "argument contains...character... security risk"

By majka on 2019-01-31 17:13:58

I am using fossil on Win 10 (fossil version 2.7 [9aa9ba8bd2]). Following error comes up when running fossil gdiff --from prev FILE.vba:

the [FILE.vba~3] argument to the "winmergeu" command contains a character (ascii 0xffffffc5) that is a security risk

It is only a one file from about 15 that has the problem. The files are generated from Word documents containing macros, using olevba to extract these to a text file. This is then converted to Unicode encoding from Windows CP 1250, should it make any difference.

Simply running the winmergeu FILE.vba FILE.vba~3 or fossil diff --from prev FILE.vba works fine and there seems nothing suspect in the resulting diff.

Has anybody an idea what could be the reason for this error?

By florian.balmer on 2019-02-01 17:18:29 [link]

The error message seems to origin from these checks:

FILE: src/blob.c, VERSION: 03104c701e, LINES: 1267-1278

I can reproduce the same error with a file named "[.txt", for example (without the quotes).

What is different that I get a full path with the error message, and not just a file name, but this may be due to some Fossil setting.

... the [C:/path/to/check-out/[.txt~1] argument ...

But I don't see any offending characters in your example.

The output of 0xffffffc5 for the format string "0x%02x" and the 8-bit argument char c somehow looks like an integer overflow of the char argument, promoted to a 32-bit signed integer? A size or signed/unsigned problem with the native char type?

In two's complement notation:

0xffffffc5 ← 11111111 11111111 11111111 11000101 → -0x3b == -59

Can somebody shed some light on this?

Or, is it possible that the file name "FILE.vba" contains additional non-printable characters, as they can often be found in Word and Excel documents, in case the file names were also generated from the contents of the document? But then the non-printable characters would at least have been stripped when pasting to the Fossil Forum post editor.

By anonymous on 2019-02-01 17:51:28 [link]

The printf format type x (and also X) expects unsigned integer.

Normally, I would code:

printf("0x02x", ((uint32_t)c) & 0x000000FFU);

However, for C89, I think that would be expressed as:

printf("0x02x", ((unsigned int)c) & (unsigned int)255);

By majka on 2019-02-01 18:00:55 [link]

I get the full path as well, I stripped it from the message.

The path and filename in the first example does contain non-ascii but valid UTF-8 characters (ŽádostČSOB/žádostČS) in the file name. The system encoding is win cp1250, in the initial case, these characters were changed to UTF-8 (cygwin/iconv)

I have even tried another method to export the separate .bas files instead of single text file with all of the content. The problem stays the same. This time, the encoding stays in cp1250, the problematic characters are in the path C:/path/to/check-out/žádostČSOB/file1.bas etc.

The files inside are perfectly valid cp1250 encoded files with no extra characters, in the first case converted to UTF8 lf, in the second case left as cp1250 with crlf.

fossil gdiff is throwing the error in both cases.

By florian.balmer on 2019-02-02 12:23:30 [link]

printf("0x02x", ((unsigned int)c) & (unsigned int)255);

The explicit cast to unsigned types should suppress sign extension when promoting from char to int, and remove the left-padding bits (the ffff...) from the character code printed with the error message.

I get the full path as well, I stripped it from the message.

The path and filename in the first example does contain non-ascii but valid UTF-8 characters (ŽádostČSOB/žádostČS) in the file name.

Okay, so this was the missing piece of information.

Now I can reproduce your example:

the [C:/path/to/checkout/žádostČSOB/file.txt] argument to the "winmerge" command contains a character (ascii 0xffffffc5) that is a security risk

Fossil internally stores path names as UTF-8. The path name from your example:

žádostČSOB

Corresponds to the following byte sequence in UTF-8:

C5 BE C3 A1 64 6F 73 74 C4 8C 53 4F 42

C5 (or 11000101 in binary) is a lead byte indicating one continuation byte. Together with the trail byte BE (or 10111110 in binary) it encodes the character ž.

The linked code snippet (line 1268) tests for the following (single-byte) characters (on Windows) to detect invalid arguments:

"  \  ;  *  ?  [
22 5C 3B 2A 3F 5B

Plus anything that is smaller than the space character, which is 20 in hexadecimal notation, or 32 in decimal notation.

None of the tested characters is matching C5, which seems to trigger the error.

The only explanation I have is that the Fossil builds we both are using have been compiled with the signed char type, which seems to be the default for most compilers. Then the following test for control characters equal to or lower than space would flag an invalid character:

c<' ' → (char)0xC5 < (char)0x20

With the unsigned char type:

(unsigned char)0xC5 < (unsigned char)0x20 == false

With the signed char type:

(signed char)0xC5 < (signed char)0x20 → -59 < 32 == true

Considering the notation c<' ', I would guess that this was not intented?

So maybe the check for equal or lower than space could be patched to include a lower bounds:

c<' ' && c>0

Or a cast to unsigned char, or something equivalent?

I'd be happy to create, test and submit a patch. Maybe I can find the time to do so tonight, or tomorrow.

However, as this affects a Fossil security policy, I'd like to have feedback from the Fossil architects, whether this is the right thing to do, here?

Moreover, I'm not sure what is the best way to patch this, so that there won't be any compiler warnings with systems defaulting to signed char? The C89-way outlined by anonymous above?

Also, I can imagine that there's more locations where Fossil printf's a hex byte, would they all need to be patched to suppress sign extension?