Characters allowed in short filenames
Posted by Norman Diamond on April 15th, 2008


My partial understanding is that short filenames are stored using the OEM
code page of the system default locale at the time that the file (or
directory) is created.

For complicated code pages this is pretty simple, for example code page 932
is both ANSI and OEM, so each ANSI codepoint maps onto the exact same OEM
codepoint.

For simple code pages this isn't so simple. For example for several Western
European languages the default ANSI code page is 1252 but the default OEM
code page isn't 1252. I thought I read that the default OEM code page for
US Windows would be 437, but experiments indicate otherwise.

As far as I can tell, code page 437 doesn't contain a ß character. So if
the current default OEM code page is 437 and I create a new file then the
short filename cannot contain a ß.

Code page 850 contains a ß character. So if the current default OEM code
page is 850 then we are halfway towards allowing a short filename to contain
a ß. We shouldn't get more than halfway because lowercase letters aren't
allowed in short filenames, but let's proceed.
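A quick way to cross-check the code page claims above is to ask a codec table which byte, if any, it assigns to each character. This is only a sketch: it relies on Python's bundled codec tables, which are assumed (not verified here) to match the tables Windows uses, and `oem_byte` is a hypothetical helper name.

```python
# Sketch: which byte, if any, a code page assigns to a character.
# Assumption: Python's codec tables match Windows' own conversion tables.

def oem_byte(ch, codepage):
    """Return the encoded bytes for ch in codepage, or None if unmappable."""
    try:
        return ch.encode(codepage)
    except UnicodeEncodeError:
        return None

for cp in ("cp437", "cp850", "cp1252"):
    for ch in ("\u00df", "\u03b2"):  # German sharp s, Greek small beta
        encoded = oem_byte(ch, cp)
        label = encoded.hex() if encoded is not None else "not representable"
        print(f"{cp} U+{ord(ch):04X} {label}")
```

With Python's tables, ß maps to byte 0xE1 in both 437 and 850 and to 0xDF in 1252, while Greek β is not representable in any of the three. That would be consistent with one stored byte 0xE1 being displayed with either glyph, though the sketch cannot confirm what Windows 98 actually wrote to disk.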

I installed US Windows 98 in a virtual PC. I left all its language settings
as defaults; I didn't even install the options for limited amounts of
multilingualism. In a command prompt window I tried the MODE CON command,
but it gave an error instead of telling what code page it was using.

I did install the character map utility, and copied a ß character into the
command prompt. US Windows 98 let me create file SßT.TXT. Well, this is OK
so far, since long filenames are stored in Unicode.

Oops. The DIR command said that the short filename is also SßT.TXT. So
does this mean that US Windows defaults its OEM code page to 850 instead of
437?

The next problem is that fatgen103.doc says that short filenames are always
converted to uppercase. So how could a short filename be SßT.TXT instead of
SSST.TXT? No problem for the long filename to be SßT.TXT, but how could the
short filename contain a lowercase letter?
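The uppercase question can be illustrated with a small sketch. Full Unicode case mapping expands ß to SS, but a one-byte-to-one-byte uppercase table cannot do that. The per-byte rule below (`ascii_only_upper`) is purely hypothetical and is not claimed to be what Windows or fastfat does; it only shows how a byte such as 0xE1 could survive an "uppercase" pass unchanged.

```python
# Sketch: full Unicode uppercasing versus a hypothetical per-byte rule.
name = "S\u00dfT.TXT"  # SßT.TXT

# Python's str.upper() applies full case mapping, which expands ß to SS.
full = name.upper()

def ascii_only_upper(raw: bytes) -> bytes:
    """Hypothetical rule: uppercase ASCII a-z only, leave other bytes alone."""
    return bytes(b - 0x20 if 0x61 <= b <= 0x7A else b for b in raw)

raw = name.encode("cp850")        # assumes the OEM code page is 850
per_byte = ascii_only_upper(raw)  # the 0xE1 byte is left untouched

print(full)
print(per_byte)
```

Here `full` is SSST.TXT, the name the specification's wording would suggest, while `per_byte` still contains 0xE1 between S and T.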

Other letters are going to be more troublesome, and I guess
ntfsgen103.doc[*] is going to say even less than fatgen103.doc says, but if
anyone knows the real rules, could someone please say?

[* I assume there's no such document, which is the reason it's not even
going to say how to determine what characters are allowed in short names in
NTFS.]

Posted by David Craig on April 15th, 2008


For whichever version of Windows was current when fatgen103.doc was
written, it is possibly true. The sources to fastfat are in the WDK. There
are also some things going on in the Win32 subsystem, but if you call
NtCreateFile() directly you can bypass those and see whether there is any
difference.

"Norman Diamond" <ndiamond@newsgroup.nospam> wrote in message
news:e1iF$PrnIHA.2352@TK2MSFTNGP05.phx.gbl...


Posted by Norman Diamond on April 15th, 2008


Unfortunately my requirements at the moment don't involve what the OS is
doing (aside from what its system locale's OEM code page might be from time
to time). My requirements involve storing allowable filenames. Long
filenames seem to be pretty simple ... mostly[*]. Short filenames are
yielding all these questions.

Please, do you know what the real rules are for what characters are allowed
on disk in short filenames?

[* The Posix subsystem clouds up the question of what's allowed in long
filenames. | is allowed but " isn't, even though Posix itself allows both.
Oops, this comes from experiments, not from documentation.]


"David Craig" <drivers@nowhere.us> wrote in message
news:%2356XeVrnIHA.1164@TK2MSFTNGP02.phx.gbl...

Posted by Norman Diamond on April 15th, 2008


US Windows 98 stored the German lower-case letter ß as the Greek lower-case
letter β. Code page 437 has β. So it is halfway OK: US Windows 98 defaults
to OEM code page 437, not 850, and we are halfway towards being able to
store β (but not ß).

Now, fatgen103.doc is very clear in prohibiting lower-case letters from
being stored in a short name. So still, how did β get into a short name?

Does anyone know the real rules on what is allowed in a short name?
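For reference, the byte-level rules that fatgen103.doc gives for the 11-byte DIR_Name field can be sketched as follows. The illegal-byte list is reproduced from memory of that document and should be verified against it; `legal_dir_name` is a hypothetical helper. The sketch deliberately accepts every byte from 0x80 up, because which of those count as "lowercase" depends on the code page, which is exactly the open question in this thread.

```python
# Byte values listed as illegal in DIR_Name (recalled from fatgen103.doc;
# verify against the document before relying on this list).
ILLEGAL = set(range(0x20)) | set(b'"*+,./:;<=>?[\\]|')

def legal_dir_name(field: bytes) -> bool:
    """Check an 11-byte, space-padded DIR_Name field (hypothetical helper)."""
    if len(field) != 11 or field[0] == 0x20:
        return False  # wrong size, or name starts with a space
    for i, b in enumerate(field):
        if i == 0 and b == 0x05:
            continue  # 0x05 in the first byte stands in for a real 0xE5
        if b in ILLEGAL or 0x61 <= b <= 0x7A:
            return False  # illegal punctuation/control byte or ASCII lowercase
    return True

print(legal_dir_name(b"SST     TXT"))
print(legal_dir_name(b"S\xe1T     TXT"))
print(legal_dir_name(b"sst     TXT"))
```

Under these assumptions a name containing byte 0xE1 passes, since the ASCII-only lowercase test never sees it as a letter.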


"Norman Diamond" <ndiamond@newsgroup.nospam> wrote in message
news:e1iF$PrnIHA.2352@TK2MSFTNGP05.phx.gbl...

Posted by IQDave on April 15th, 2008


BJ Rollins seems to have a great handle on this info on his IMTesty Blog. You
might be able to get an answer from him.

IQDave

"Norman Diamond" wrote:

Posted by Norman Diamond on April 16th, 2008


BJ Rollins has a great blog with an average proportion of bugs.
fatgen103.doc seems to have a slightly below average proportion of bugs, but
Windows doesn't completely agree with it. I need the actual correct rules.


"IQDave" <IQDave@discussions.microsoft.com> wrote in message
news:5AB0E88D-7FD3-4242-97E2-5E1CC1D808A6@microsoft.com...

Posted by Norman Diamond on April 16th, 2008


Also by the way, BJ Rollins' articles (at least the ones I saw) don't even
attempt to discuss what is valid in any OEM code page in the short name
stored on the device. Sometimes they discuss what is valid in one selected
ANSI code page in a client application, but don't even attempt to discuss
dozens of other valid ANSI code pages.


"Norman Diamond" <ndiamond@newsgroup.nospam> wrote in message
news:%23NVDwV1nIHA.5096@TK2MSFTNGP02.phx.gbl...

Posted by m on April 16th, 2008


The short answer is: it depends. The rules for file names, especially short
names, are extremely complex and depend on the environment.



BTW: you are not going to get what you want out of this group by going off
on a tirade like this.



"Norman Diamond" <ndiamond@newsgroup.nospam> wrote in message
news:udqEoW2nIHA.4292@TK2MSFTNGP04.phx.gbl...


Posted by Norman Diamond on April 17th, 2008


It wasn't intended to be a tirade, just a statement of fact. Mr. Rollins'
articles had a different purpose from my needs, and the two do not overlap.
I tried to explain to IQDave why the overlap is zero.

Yes I know that the rules are extremely complex, and I was hoping someone
could say where to find them.


"m" <m@b.c> wrote in message news:uD9FJg9nIHA.4616@TK2MSFTNGP05.phx.gbl...

Posted by Alexander Grigoriev on April 17th, 2008


Jumping in late to this.
Have you checked CheckNameLegalDOS8Dot3? This must be the function you're
looking for.
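A minimal sketch of calling that function from Python through ctypes follows. `check_8dot3` is a hypothetical wrapper name, and the sketch returns None on anything other than Windows. The result reflects the OEM code page of the machine it runs on, so it answers the question only for that environment.

```python
import ctypes
import sys

def check_8dot3(name: str):
    """Return (legal, has_spaces, oem_bytes), or None when not on Windows."""
    if sys.platform != "win32":
        return None
    from ctypes import wintypes
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    fn = kernel32.CheckNameLegalDOS8Dot3W
    fn.argtypes = [wintypes.LPCWSTR, wintypes.LPSTR, wintypes.DWORD,
                   ctypes.POINTER(wintypes.BOOL),
                   ctypes.POINTER(wintypes.BOOL)]
    fn.restype = wintypes.BOOL
    oem = ctypes.create_string_buffer(13)  # 8 + '.' + 3 + terminating NUL
    has_spaces = wintypes.BOOL()
    legal = wintypes.BOOL()
    if not fn(name, oem, len(oem),
              ctypes.byref(has_spaces), ctypes.byref(legal)):
        raise ctypes.WinError(ctypes.get_last_error())
    # oem holds the name converted to the current OEM code page
    return bool(legal.value), bool(has_spaces.value), oem.value

print(check_8dot3("S\u00dfT.TXT"))
```

Comparing the returned OEM bytes for SßT.TXT under different system locales would show how the conversion varies, though the function reports only the verdict and not the rules behind it.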

"Norman Diamond" <ndiamond@newsgroup.nospam> wrote in message
news:uYuYx4BoIHA.1772@TK2MSFTNGP03.phx.gbl...


Posted by Norman Diamond on April 22nd, 2008


OK, I think I understand now. A common occurrence in open source land is
that the source code is the documentation. In the situation here, the
source code of the CheckNameLegalDOS8Dot3 API is the documentation of what
characters are allowed in short filenames. So, is the source code of
CheckNameLegalDOS8Dot3 publicly available?


"Alexander Grigoriev" <alegr@earthlink.net> wrote in message
news:Opw0nnJoIHA.1236@TK2MSFTNGP02.phx.gbl...

Posted by IQDave on May 12th, 2008


Norman,
You are correct about the blog not containing all the rules. But if
you re-read my post, it did not say that the full list of _rules_ was on
BJ Rollins' blog, but that, "You might be able to get an answer from _him_."
What I was trying to convey (but apparently did not fully accomplish) was
that he seems to have a good grasp on the domain and that you should contact
him (not read his blog).

To put it more explicitly, if you can't find the source that you are
looking for, you may want to try emailing him. Either he or someone he knows
will likely have access to the information you seek.

Thanks, and good luck.
IQDave

P.S.
I understand well the frustration of seeking precise details on a complex
issue, and I did not take your response as a tirade. I just ask that you
read what is there and not what you think is there.
Also, I'd be highly surprised if there were actually "zero overlap" between
the testing article and your pursuit. But then, I'm an analyst and could find
overlap almost anywhere. Best wishes.

"Norman Diamond" wrote:

