
Is there an undocumented way to work with case-sensitive files in Windows? (If not, is this a bug? It doesn't appear to match OS X or Linux behavior.)

First enable case sensitivity in Windows. Then create two files, named file and File, using Cygwin.

echo file > file
echo File > File

Now run

FilePrint["File"]
> file

Notice that it doesn't refer to the correct file. I'm mainly interested in Read, Write, and Import.

William
  • I'm interested in this question, but I really don't want to mess up with the registry on my only computer :( – Silvia Oct 26 '15 at 09:20
  • @Silvia It is just changing one value but I understand your point. – William Oct 26 '15 at 19:08
  • It's a property of my company, not good if "anything" happened to it.. BTW agree with @OleksandrR. that this is a qualified question with clear description. – Silvia Oct 27 '15 at 03:00
  • I would suggest using win API functions through NETLink. – Silvia Oct 27 '15 at 03:07
  • @Silvia I added a small bounty in the hope that someone (maybe you) might add at least a partial answer. It should also put an end to the close votes, at least temporarily. – William Oct 27 '15 at 03:29
  • Is it not possible to close questions with bounties? The close button is still there. I was tempted to click it just to see if it would work ;-) – Szabolcs Oct 27 '15 at 11:42
  • @Szabolcs http://meta.stackexchange.com/questions/14591/how-can-we-close-questions-with-bounties Yes it is confusing because it is still there but doesn't work. Basically where would the bounty go if you vote to close it? – William Oct 27 '15 at 15:51
  • What does FileNames[] give you? You may want/need the option IgnoreCase -> False. – Eric Towers Nov 01 '15 at 06:11
  • @EricTowers Assuming I'm using the option correctly, it appears not to affect the output of FileNames[IgnoreCase -> True]. Surprisingly, both files are listed. – William Nov 01 '15 at 15:08
  • Could you be more specific about what you really want to achieve by working with these files? Surely one can accomplish various operations using non-Mathematica code, but if your question is whether we can modify Mathematica to support this while still using FilePrint et al., I would guess the answer is no. – Oleksandr R. Nov 02 '15 at 02:20
  • @OleksandrR. I'm mainly interested in Read, Write, and Import. I kind of figured it wouldn't be Mathematica code but a .NET patch. – William Nov 02 '15 at 03:07
  • Oddly, FileByteCount and FileHash do recognize the case-sensitive names. – george2079 Nov 03 '15 at 19:40
  • Heh, it seems that FindFile uses Internal`PacletFindFile internally, only to decide that it is not necessary to consider the path to be a paclet. – Jacob Akkerboom Nov 03 '15 at 21:17
  • @george2079 What version of M? – William Nov 04 '15 at 02:54
  • 10.1. If it matters, I'm looking at a Linux (Samba server) file system mounted under Windows (without any Windows registry mods). Windows shows the two files in list views but can only access one of them. – george2079 Nov 04 '15 at 17:15
  • @george2079 It appears FileByteCount doesn't work as you described on an NTFS hard drive with the option enabled. I'm not sure why you are experiencing something different on Samba. I'm on version 10, so that might be the issue. – William Nov 04 '15 at 17:33

2 Answers

This is decidedly not an answer, so please do not give me the bounty. It is also not a solution to the problem, for the simple reason that my C++, Win32, and LibraryLink skills are virtually nonexistent, and certainly not sufficient to write this robustly. Rather, my intent is simply to show that the answer given by Eric Towers contains some serious misconceptions about how Win32 and NT work, and is not a solution either.*

As I mentioned in my comments under his answer, the only reason why some (although not all) of Mathematica's file-handling functions fail to respect case sensitivity is that they call the Win32 API function CreateFile without supplying the flag FILE_FLAG_POSIX_SEMANTICS. The Microsoft documentation describes the flag as follows:

Access will occur according to POSIX rules. This includes allowing multiple files with names differing only in case, for file systems that support that naming. Use care when using this option, because files created with this flag may not be accessible by applications that are written for MS-DOS or 16-bit Windows.

(Note that it is not entirely clear whether there is any other semantic distinction apart from case-sensitivity. POSIX also does not normally lock files that are opened, whereas Windows does, which seems to me to fall under the category of semantics.)

Microsoft is absolutely correct that this capability (and creating and using case-distinguished files in general) should be approached with some care. Most Win32 applications do not supply this flag, even though there is nothing preventing them from doing so. This includes Explorer and many other utilities provided with Windows itself. These programs will thus not be able to distinguish between filenames that differ only in their case, and in fact they normally behave as if you had specified all of the applicable files. This can be a serious problem when the operation one wants to perform is e.g. deletion.

With that caveat, let us proceed to the program. For this I will use some Microsoft example code available on MSDN.

Let us note explicitly:

  • This is Win32 code. Not POSIX. Not NT. (Most of the NT API is undocumented anyway.)
  • The function called is CreateFile, part of the Win32 API. Not NtCreateFile (which is case-sensitive by default, as NT is in general).
  • It is a user-mode application, not a driver. It does not run in kernel mode.

First we create the files. I used Cygwin, consistent with the question:

Olek@core2 /mnt/c/Users/Olek/Desktop/fileprint
$ echo file > file

Olek@core2 /mnt/c/Users/Olek/Desktop/fileprint
$ echo File > File

Now compile the program (I used MinGW g++):

C:\Users\Olek\Desktop\fileprint>g++ -D__in= -o fileprint.exe fileprint.cpp

Try it out:

C:\Users\Olek\Desktop\fileprint>fileprint.exe File

Error code:     0
Number of bytes:        4
Data read from File (4 bytes):
file

What's that? It doesn't work? We must have forgotten to specify FILE_FLAG_POSIX_SEMANTICS! Let's add it (on line 53):

FILE_ATTRIBUTE_NORMAL | FILE_FLAG_OVERLAPPED, // normal file

becomes

FILE_ATTRIBUTE_NORMAL | FILE_FLAG_OVERLAPPED | FILE_FLAG_POSIX_SEMANTICS,

After recompiling:

C:\Users\Olek\Desktop\fileprint>fileprint.exe File

Error code:     0
Number of bytes:        4
Data read from File (4 bytes):
File

Well, look at that! Apparently the presence or absence of FILE_FLAG_POSIX_SEMANTICS really does make all the difference. Who'd have guessed?

But still, one can reasonably ask, how does this relate to Mathematica? The answer is that it is obviously possible to write the same C++/Win32 code as a DLL and load it into Mathematica using LibraryLink. With the aid of the (new-in-9) functions for defining stream methods, it would be relatively straightforward to manipulate files with arbitrarily-cased names as Mathematica streams. Relatively straightforward, that is, for someone who is more confident of their ability to write correct Win32 code than I am of mine.
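As a rough illustration of what the core of such a DLL might look like, here is a minimal sketch of a synchronous helper that opens a file with FILE_FLAG_POSIX_SEMANTICS and returns its contents. The function name readCaseSensitive is hypothetical, the LibraryLink wrapper and the stream-method definitions are deliberately left out, and the non-Windows branch is only a fallback added so that the sketch also builds on systems where file names are already case-sensitive. It has not been tested against Mathematica, and it assumes that case sensitivity has already been enabled in Windows as described in the question.

```cpp
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper (not part of the Microsoft sample): read a whole file
// whose name is matched case-sensitively. Throws std::runtime_error on failure.
std::string readCaseSensitive(const std::string& path)
{
#ifdef _WIN32
    // Same CreateFile call as in the sample, but synchronous (no
    // FILE_FLAG_OVERLAPPED) and with FILE_FLAG_POSIX_SEMANTICS added.
    HANDLE h = CreateFileA(path.c_str(),
                           GENERIC_READ,
                           FILE_SHARE_READ,
                           NULL,
                           OPEN_EXISTING,
                           FILE_ATTRIBUTE_NORMAL | FILE_FLAG_POSIX_SEMANTICS,
                           NULL);
    if (h == INVALID_HANDLE_VALUE)
        throw std::runtime_error("cannot open " + path);

    std::string data;
    char buf[4096];
    for (;;)
    {
        DWORD n = 0;
        if (!ReadFile(h, buf, sizeof buf, &n, NULL))
        {
            CloseHandle(h);
            throw std::runtime_error("cannot read " + path);
        }
        if (n == 0)          // end of file
            break;
        data.append(buf, n);
    }
    CloseHandle(h);
    return data;
#else
    // Fallback for non-Windows builds: ordinary stream I/O, which is
    // case-sensitive wherever the underlying file system is.
    std::ifstream in(path, std::ios::binary);
    if (!in)
        throw std::runtime_error("cannot open " + path);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
#endif
}
```

With the two files from the question in the current directory, readCaseSensitive("File") should return the contents of File rather than those of file. Exposing this to Mathematica would still require a LibraryLink entry point and the stream-method glue mentioned above.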

Finally, the Microsoft example code is given below for reference, in case the page on which it currently exists is changed or moved in future:

#include <windows.h>
#include <tchar.h>
#include <stdio.h>
#include <strsafe.h>

#define BUFFERSIZE 5
DWORD g_BytesTransferred = 0;

void DisplayError(LPTSTR lpszFunction);

VOID CALLBACK FileIOCompletionRoutine(
  __in  DWORD dwErrorCode,
  __in  DWORD dwNumberOfBytesTransfered,
  __in  LPOVERLAPPED lpOverlapped
);

VOID CALLBACK FileIOCompletionRoutine(
  __in  DWORD dwErrorCode,
  __in  DWORD dwNumberOfBytesTransfered,
  __in  LPOVERLAPPED lpOverlapped )
 {
  _tprintf(TEXT("Error code:\t%x\n"), dwErrorCode);
  _tprintf(TEXT("Number of bytes:\t%x\n"), dwNumberOfBytesTransfered);
  g_BytesTransferred = dwNumberOfBytesTransfered;
 }

//
// Note: this simplified sample assumes the file to read is an ANSI text file
// only for the purposes of output to the screen. CreateFile and ReadFile
// do not use parameters to differentiate between text and binary file types.
//

void __cdecl _tmain(int argc, TCHAR *argv[])
{
    HANDLE hFile; 
    DWORD  dwBytesRead = 0;
    char   ReadBuffer[BUFFERSIZE] = {0};
    OVERLAPPED ol = {0};

    printf("\n");
    if( argc != 2 )
    {
        printf("Usage Error: Incorrect number of arguments\n\n");
        _tprintf(TEXT("Usage:\n\t%s <text_file_name>\n"), argv[0]);
        return;
    }

    hFile = CreateFile(argv[1],               // file to open
                       GENERIC_READ,          // open for reading
                       FILE_SHARE_READ,       // share for reading
                       NULL,                  // default security
                       OPEN_EXISTING,         // existing file only
                       FILE_ATTRIBUTE_NORMAL | FILE_FLAG_OVERLAPPED, // normal file
                       NULL);                 // no attr. template

    if (hFile == INVALID_HANDLE_VALUE) 
    { 
        DisplayError(TEXT("CreateFile"));
        _tprintf(TEXT("Terminal failure: unable to open file \"%s\" for read.\n"), argv[1]);
        return; 
    }

    // Read one character less than the buffer size to save room for
    // the terminating NULL character. 

    if( FALSE == ReadFileEx(hFile, ReadBuffer, BUFFERSIZE-1, &ol, FileIOCompletionRoutine) )
    {
        DisplayError(TEXT("ReadFile"));
        printf("Terminal failure: Unable to read from file.\n GetLastError=%08x\n", GetLastError());
        CloseHandle(hFile);
        return;
    }
    SleepEx(5000, TRUE);
    dwBytesRead = g_BytesTransferred;
    // This is the section of code that assumes the file is ANSI text. 
    // Modify this block for other data types if needed.

    if (dwBytesRead > 0 && dwBytesRead <= BUFFERSIZE-1)
    {
        ReadBuffer[dwBytesRead]='\0'; // NULL character

        _tprintf(TEXT("Data read from %s (%d bytes): \n"), argv[1], dwBytesRead);
        printf("%s\n", ReadBuffer);
    }
    else if (dwBytesRead == 0)
    {
        _tprintf(TEXT("No data read from file %s\n"), argv[1]);
    }
    else
    {
        printf("\n ** Unexpected value for dwBytesRead ** \n");
    }

    // It is always good practice to close the open file handles even though
    // the app will exit here and clean up open handles anyway.

    CloseHandle(hFile);
}

void DisplayError(LPTSTR lpszFunction) 
// Routine Description:
// Retrieve and output the system error message for the last-error code
{ 
    LPVOID lpMsgBuf;
    LPVOID lpDisplayBuf;
    DWORD dw = GetLastError(); 

    FormatMessage(
        FORMAT_MESSAGE_ALLOCATE_BUFFER | 
        FORMAT_MESSAGE_FROM_SYSTEM |
        FORMAT_MESSAGE_IGNORE_INSERTS,
        NULL,
        dw,
        MAKELANGID(LANG_NEUTRAL, SUBLANG_DEFAULT),
        (LPTSTR) &lpMsgBuf,
        0, 
        NULL );

    lpDisplayBuf = 
        (LPVOID)LocalAlloc( LMEM_ZEROINIT, 
                            ( lstrlen((LPCTSTR)lpMsgBuf)
                              + lstrlen((LPCTSTR)lpszFunction)
                              + 40) // account for format string
                            * sizeof(TCHAR) );

    if (FAILED( StringCchPrintf((LPTSTR)lpDisplayBuf, 
                     LocalSize(lpDisplayBuf) / sizeof(TCHAR),
                     TEXT("%s failed with error code %d as follows:\n%s"), 
                     lpszFunction, 
                     dw, 
                     lpMsgBuf)))
    {
        printf("FATAL ERROR: Unable to output error code.\n");
    }

    _tprintf(TEXT("ERROR: %s\n"), (LPCTSTR)lpDisplayBuf);

    LocalFree(lpMsgBuf);
    LocalFree(lpDisplayBuf);
}

* Eric's workaround of using the short file names was necessary because, in the case he describes, he had obviously handled the drag-and-drop operation incorrectly and did not retrieve the file names (which include the extensions) but rather the text displayed by Explorer, which is merely part of the user interface. Handling drag-and-drop input is not trivial in Win32, but even so, Microsoft cannot really be blamed if it is done wrongly, because all of the documentation and ample examples are provided. And why should we not rely on short file names? Because they don't necessarily even exist! Short file names are completely optional, and implemented as hard links, on NTFS volumes.

Glorfindel
Oleksandr R.
  • Normal Win32 applications do not behave as if you have selected all the case-distinct-only files. They behave as if you have selected the first one returned by FindFirstFile{|Ex|Transaction}() or FindNextFile{|Ex}() (which need FIND_FIRST_EX_CASE_SENSITIVE to ensure OBJ_CASE_INSENSITIVE is not set in subsequent calls in their call trees). This can be observed in, for instance, Notepad, which always opens one file in a case-equivalence-class of files, the one that is found first (which is entirely deterministic and is based on the order of the files in the directory on disk). – Eric Towers Nov 04 '15 at 13:21
  • @EricTowers it depends on the application, but Explorer and cmd.exe treat it as if both file and File were specified. Notepad is not capable of opening multiple files from the Open dialog, so it is not too surprising that it chooses only one of the two. If drag and drop doesn't work correctly for a batch file, it is a good reason either not to use drag and drop or not to use the batch language, which is so limited and painful to use that it is not a good choice for any but the very simplest operations anyway. Windows Script Host or PowerShell scripting would have been better choices. – Oleksandr R. Nov 04 '15 at 14:10
  • With limits to the extent of changes allowable on client machines, using anything other than batch was disallowed. No full languages, no .NET runtimes, ... The list of restrictions was onerous. This left batch or a completely compiled solution. One of those could be whacked up in the time frame (a day, since neither time nor money was budgeted for the data analysis, because people who haven't done it think it's trivial...) – Eric Towers Nov 05 '15 at 17:44