Fossil

View Ticket

Ticket Hash: e399bc1edfe45b2edb56eb037f63a6bf4cdbc211
Title: Non-ASCII characters in file and folder names are not handled correctly
Status: Fixed
Type: Code_Defect
Severity: Critical
Priority:
Subsystem:
Resolution: Fixed
Last Modified: 2012-11-29 14:50:48
Version Found In: 1.24
Description:
With the latest version (1.24), the issue occurs after the following steps:

A. On the first machine, create the folder and file:

  1. Create a folder named "MañósosCarácteres"
  2. Create a file named "cönáñón" inside the new folder
  3. fossil addremove
  4. fossil commit -m "Añadí una carpeta y un archivo con tildes"

B. Then, on a different machine:

  1. fossil update. The new folder named "MañósosCarácteres" is now available.
  2. Add a new file called "éstedesdemác.rst" to the folder "MañósosCarácteres"
  3. fossil commit -m "Añadí nuevo archivo con tildes desde mi mac"

C. Back on the first machine:

  1. fossil update. A second folder named "MañósosCarácteres" has been added. I do not understand how the file system can accept a second folder with exactly the same name. This happens on both Ubuntu 12.04 LTS and OS X 10.7.

One folder contains both files, and the other contains only the first file ("cönáñón").
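One way to check whether two such folders really have "exactly the same name" is to compare the names after Unicode normalization. The following is a minimal Python sketch, not part of Fossil; the function name `find_equivalent_duplicates` and the sample listing are hypothetical, and it assumes the duplicates differ only in composed versus decomposed accents.

```python
import unicodedata
from collections import defaultdict


def find_equivalent_duplicates(names):
    """Group names that differ as strings but are equal after NFC normalization."""
    groups = defaultdict(list)
    for name in names:
        groups[unicodedata.normalize("NFC", name)].append(name)
    # Keep only groups that hold more than one distinct spelling.
    return [sorted(set(g)) for g in groups.values() if len(set(g)) > 1]


# Hypothetical directory listing: the same visible name stored once in
# composed form (NFC) and once in decomposed form (NFD).
composed = unicodedata.normalize("NFC", "MañósosCarácteres")
decomposed = unicodedata.normalize("NFD", "MañósosCarácteres")
listing = [composed, decomposed, "README"]

duplicates = find_equivalent_duplicates(listing)
for group in duplicates:
    # Print the code points so the difference becomes visible.
    for name in group:
        print([hex(ord(ch)) for ch in name])
```

For a real checkout, the listing could come from `os.listdir(path)` instead of the hard-coded sample.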

In a batch of files I was working on for a customer, which contained many non-ASCII characters, the file names would sometimes suddenly duplicate with inverted accents (going from á to à). If you need more examples, let me know and I will do a similar run with all the non-ASCII characters that we use in Spanish.

User Comments:
drh added on 2012-11-29 00:54:47:
Konstantin Khomoutov writes on the fossil-users mailing list:

I'm just handwaving, but Git's code base recently received some
modifications to specifically deal with issues a native Mac OS X
filesystem has with regard to UTF-8.  AFAIK the deal was about that
filesystem performing one of the standard UTF-8 normalizations either when
writing or when reading (or both) so that when you create a directory
entry and then read it back, you might get an octet string different
from the one you wrote.

See the extensive commit message in [1] and [2] in general.

  1.  [https://github.com/git/git/commit/76759c7dff53e8c84e975b88cb8245587c14c7ba]
  2.  [http://en.wikipedia.org/wiki/HFS_Plus]
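The "different octet string" effect described above can be illustrated without a Mac. This is a minimal Python sketch that only simulates the round trip: it assumes the application writes the name in composed form (NFC) and the filesystem hands back a decomposed form (plain NFD here, which approximates but is not identical to what HFS+ does).

```python
import unicodedata

name = "cönáñón"

# Form an application might pass to the filesystem (assumption: NFC).
written = unicodedata.normalize("NFC", name)
# Form a decomposing filesystem might return when the entry is listed
# (assumption: plain NFD stands in for the filesystem's own variant).
read_back = unicodedata.normalize("NFD", name)

written_bytes = written.encode("utf-8")
read_back_bytes = read_back.encode("utf-8")

# The two names render identically but compare unequal byte for byte.
print(written_bytes == read_back_bytes)
print(len(written_bytes), len(read_back_bytes))

# Normalizing both sides to one form before comparing makes them match.
same_after_normalizing = unicodedata.normalize("NFC", read_back) == written
```

A tool that compares file names as raw bytes would treat `written` and `read_back` as two different files, which matches the duplicated folder reported in this ticket.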

anonymous added on 2012-11-29 14:05:10:
I checked this a bit further to include a Windows 7 environment. Windows 7 and OS X 10.7 respond differently.

A. On the Ubuntu 12.04LTS machine
  #  add the folders "withcapitals", "WiThCaPiTals" and "WiThCaPiTalsó"
  #  In both folders "withcapitals" and "WiThCaPiTalsó" add the file "FilEWithCapItals.rst"
  #  In the folder "withcapitals" add the file "filewithcapitals.rst"

B. Update the repo on OSX 10.7
  #  fossil: changes 3 files modified
  #  fossil: WARNING: 1 unmanaged files were overwritten
  #  the result is one folder: "WiThCaPiTalsó" containing the file "FilEWithCapItals.rst"
  #  Two folders and two files disappeared.

C. Update the repo on Windows 7
  #  fossil: changes 2 files modified
  #  fossil: WARNING: 1 unmanaged files were overwritten
  #  the result is two folders: "WiThCaPiTals" and "WiThCaPiTalsó" both containing the file "FilEWithCapItals.rst"
  #  One folder and one file disappeared.

It does look as if this is caused by how the underlying OS deals with capitalization and non-ASCII characters. I used Fossil for a workshop on version control, and some participants had issues such as the accent changing direction, but I have not been able to replicate those. Nor have I been able to replicate the issues we saw in the wiki, which is why I have not reported them separately.
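The files that vanish on update can be predicted by looking for paths that a case-insensitive filesystem would treat as the same entry. Below is a minimal Python sketch, not Fossil's actual logic: `collision_key` and `find_collisions` are hypothetical helpers, and NFC plus `str.casefold()` is only an approximation of the folding rules that OS X and Windows really apply.

```python
import unicodedata
from collections import defaultdict


def collision_key(path):
    # Assumption: approximate a case-insensitive, normalization-insensitive
    # filesystem by normalizing to NFC and then case-folding.
    return unicodedata.normalize("NFC", path).casefold()


def find_collisions(paths):
    """Return groups of paths that map to the same approximate key."""
    groups = defaultdict(list)
    for path in paths:
        groups[collision_key(path)].append(path)
    return [group for group in groups.values() if len(group) > 1]


# The three files committed on the Ubuntu machine in the steps above.
paths = [
    "withcapitals/FilEWithCapItals.rst",
    "withcapitals/filewithcapitals.rst",
    "WiThCaPiTalsó/FilEWithCapItals.rst",
]

collisions = find_collisions(paths)
for group in collisions:
    print("would collide:", group)
```

Running such a check before a commit on a case-sensitive system would flag names that cannot coexist in a checkout on OS X or Windows.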

drh added on 2012-11-29 14:50:48:

See also: http://en.wikipedia.org/wiki/Unicode_equivalence#Errors_due_to_normalization_differences