C:\RETRO\DOS>type vfatto~1.htm
VFAT to 8.3: The Shortname Rules Behind the Curtain
The second story begins with a floppy label that looked harmless:
RELEASE_NOTES_FINAL_REALLY_FINAL.TXT
By itself, that filename is only mildly annoying. Inside a mixed DOS/Windows pipeline built on 1990s tooling, it can become a release blocker.
Our fictional team learned this in one long weekend. The packager ran on a VFAT-capable machine. The installer verifier ran in a strict DOS context. The build ledger expected 8.3 aliases. Nobody had documented the shortname translation rules completely. Everybody thought they “basically knew” them.
“Basically” lasted until the audit script flagged twelve mismatches that were all technically valid and operationally catastrophic.
This article is the deep dive we wish we had then: how long names become 8.3 aliases, how collisions are resolved, and how to build deterministic tooling around those rules.
First principle: translate per path component
The most important rule is easy to miss:
Translation happens per single path component, not on the full path string.
That means each directory name and final file name is handled independently. If you normalize the entire path in one pass, you will eventually generate aliases that cannot exist in real directory contexts.
In practical terms:
- C:\SRC\Very Long Directory\My Program Source.pas is translated component by component (SRC, then Very Long Directory, then My Program Source.pas), each with its own collision scope
That “collision scope” phrase matters. Uniqueness is enforced within a directory, not globally across the volume.
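A minimal Python sketch of per-component translation follows. It assumes each parent directory, identified by its already-translated short path, is one collision scope, and it remembers aliases it has already handed out so the same directory is not re-aliased on a second visit. `translate_path` and `toy_alias` are hypothetical names. `toy_alias` is a deliberately simplified stand-in that ignores the character rules covered later, so only the scoping behavior is on display.

```python
import ntpath
from typing import Callable

# alias_fn(long_name, short_names_already_used_in_that_directory) -> short name
AliasFn = Callable[[str, set[str]], str]

def translate_path(path: str, alias_fn: AliasFn, scopes: dict[str, dict[str, str]]) -> str:
    """Translate a DOS-style path one component at a time.

    `scopes` maps each parent directory (by its short path) to a dict of
    {uppercased long name: short alias}; every directory is its own collision scope.
    """
    drive, rest = ntpath.splitdrive(path)
    parent = drive.upper() + "\\"
    for component in (c for c in rest.split("\\") if c):
        scope = scopes.setdefault(parent, {})
        key = component.upper()
        if key not in scope:  # only new names get an alias; known ones are reused
            scope[key] = alias_fn(component, set(scope.values()))
        parent = ntpath.join(parent, scope[key])
    return parent

def toy_alias(name: str, used: set[str]) -> str:
    """Hypothetical stand-in: drops spaces and inner dots, no character legality rules."""
    upper = name.upper()
    stem, dot, ext = upper.rpartition(".")
    if not dot:
        stem, ext = ext, ""
    stem = stem.replace(" ", "").replace(".", "")
    ext = ext.replace(" ", "")[:3]
    suffix = f".{ext}" if ext else ""
    if stem and len(stem) <= 8 and stem + suffix == upper:
        return upper  # nothing was lost, so the name stays as-is
    tail = 1
    while True:
        candidate = f"{stem[:7 - len(str(tail))]}~{tail}{suffix}"
        if candidate not in used:
            return candidate
        tail += 1

scopes: dict[str, dict[str, str]] = {}
for p in (r"C:\SRC\Very Long Directory\My Program Source.pas",
          r"C:\SRC\Very Long Directory\My Program Backup.pas",
          r"C:\SRC\Other Long Directory\My Program Source.pas"):
    print(translate_path(p, toy_alias, scopes))
# C:\SRC\VERYLO~1\MYPROG~1.PAS
# C:\SRC\VERYLO~1\MYPROG~2.PAS
# C:\SRC\OTHERL~1\MYPROG~1.PAS
```

The second file collides with the first because both live in VERYLO~1. The third does not, even though its long name matches the first, because OTHERL~1 is a different scope.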
Fast path: already legal 8.3 names stay as-is
If the input, once uppercased under the active OEM code page, is already a legal 8.3 name, use that uppercase form directly.
This avoids unnecessary alias churn and preserves operator expectations. A file named CONFIG.SYS should not become something novel just because your algorithm always builds FIRST6~1.
Teams that skip this rule create avoidable incompatibilities.
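The fast-path check can be sketched in a few lines of Python. This is a sketch under stated assumptions: `str.upper()` stands in for OEM-code-page uppercasing (reliable for plain ASCII only), and the forbidden set is the article's disallowed list plus the space and the dot, which may not appear inside either half of a short name. `is_legal_83` and `fast_path` are hypothetical names.

```python
from typing import Optional

# Not allowed inside either part of a short name: the article's disallowed list,
# plus the space and the "." separator itself. Control characters are checked separately.
BAD_CHARS = set('"*+,/:;<=>?[\\]| .')

def _legal_part(part: str, max_len: int) -> bool:
    """One side of the dot: 1..max_len characters, nothing forbidden, no lowercase."""
    return (
        1 <= len(part) <= max_len
        and all(ord(ch) >= 0x20 and ch not in BAD_CHARS for ch in part)
        and part == part.upper()
    )

def is_legal_83(name: str) -> bool:
    """True if `name` is already a legal 8.3 short name (BASE or BASE.EXT)."""
    base, dot, ext = name.partition(".")
    if not _legal_part(base, 8):
        return False
    return not dot or _legal_part(ext, 3)  # a second dot lands in ext and fails there

def fast_path(long_name: str) -> Optional[str]:
    """Return the uppercased name if it is already legal 8.3, else None.

    Assumption: str.upper() approximates OEM uppercasing (ASCII-accurate only).
    """
    candidate = long_name.upper()
    return candidate if is_legal_83(candidate) else None

print(fast_path("config.sys"))   # CONFIG.SYS
print(fast_path("My File.txt"))  # None (space in the basename)
print(fast_path("readme.text"))  # None (four-character extension)
```

A name that fails the fast path falls through to alias generation.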
When alias generation is required
If the name is not already legal 8.3, generate alias candidates using strict steps.
The baseline candidate pattern is:
FIRST6~1.EXT
Where:
- FIRST6 is the normalized basename, truncated to its first six characters
- ~1 is the initial numeric tail
- .EXT is the extension, if one exists, truncated to at most three characters
No extension? Then no trailing dot/extension segment.
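As a small Python illustration, the baseline candidate is just truncation plus composition, assuming the basename and extension have already been normalized. `compose_83` and `baseline_candidate` are hypothetical helper names.

```python
def compose_83(base: str, ext: str) -> str:
    """Join basename and extension; with no extension there is no trailing dot."""
    return f"{base}.{ext}" if ext else base

def baseline_candidate(norm_base: str, norm_ext: str) -> str:
    """FIRST6~1.EXT from an already-normalized basename and extension."""
    return compose_83(norm_base[:6] + "~1", norm_ext[:3])

print(baseline_candidate("RELEASE_NOTES_FINAL_REALLY_FINAL", "TXT"))  # RELEAS~1.TXT
print(baseline_candidate("USERMANUAL", "HTML"))                       # USERMA~1.HTM
print(baseline_candidate("MAKEFILEBACKUP", ""))                       # MAKEFI~1
```

The underscores in the first example survive because _ is a legal short-name character; only the length rule bites.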
Dot handling is where most bugs hide
Real filenames can contain multiple dots, trailing dots, and decorative punctuation. The rules must be explicit:
- skip leading . characters
- allow only one basename/extension separator in 8.3
- prefer the last dot that has valid non-space characters after it
- if the name ends with a dot, ignore that trailing dot and use a previous valid dot if present
This is the difference between deterministic behavior and parser folklore.
Example intuition:
- report.final.v3.txt -> the extension comes from the last meaningful dot, the one before txt
- archive. -> the trailing dot is ignored; the extension may end up empty
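The dot rules translate into a short Python sketch. It assumes "valid characters after the dot" can be approximated as "at least one non-space character", leaving full character legality to the later normalization step, which also strips any extra dots left in the basename. `split_using_dot_rules` mirrors the helper name used in the pseudocode skeleton but is illustrative only.

```python
def split_using_dot_rules(name: str) -> tuple[str, str]:
    """Split a long name into (basename, extension) per the dot rules.

    - leading dots are skipped
    - trailing dots are ignored
    - the separator is the last dot followed by at least one non-space character
    Extra dots stay in the basename here; normalization removes them later.
    """
    body = name.lstrip(".").rstrip(".")
    for i in range(len(body) - 1, -1, -1):
        if body[i] == "." and body[i + 1:].strip(" "):
            return body[:i], body[i + 1:]
    return body, ""

print(split_using_dot_rules("report.final.v3.txt"))  # ('report.final.v3', 'txt')
print(split_using_dot_rules("archive."))             # ('archive', '')
print(split_using_dot_rules(".profile"))             # ('profile', '')
```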
Character legality and normalization
Normalization from the spec includes:
- remove spaces and extra dots
- uppercase letters using active OEM code page semantics
- drop characters that are not representable/legal for short names
Disallowed characters include control chars and:
" * + , / : ; < = > ? [ \ ] |
A critical note from the rules:
- Microsoft-documented NT behavior: [ ] + = , : ; are replaced with _ during short-name generation
- other illegal or superfluous characters are removed
If your toolchain mixes “replace” and “remove” without policy, you will drift from expected aliases.
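One way to keep that policy explicit is to make the replace set and the remove set two separate, named tables. In this Python sketch the replace set is the NT list quoted above. The remove set is the remaining disallowed characters plus spaces and dots, on the assumption that the basename/extension split has already happened. `str.upper()` again stands in for OEM uppercasing, and `normalize_chars` is a hypothetical name.

```python
REPLACED = set("[]+=,:;")                    # NT behavior: replaced with "_"
REMOVED = set('"*/<>?\\|') | {" ", "."}      # other illegal or superfluous characters

def normalize_chars(raw: str) -> str:
    """Uppercase, then apply the replace-vs-remove policy one character at a time.

    Assumption: str.upper() approximates OEM-code-page uppercasing (ASCII-accurate only).
    """
    out = []
    for ch in raw.upper():
        if ch in REPLACED:
            out.append("_")
        elif ch in REMOVED or ord(ch) < 0x20:  # control characters are dropped too
            continue
        else:
            out.append(ch)
    return "".join(out)

print(normalize_chars("v1+beta [draft]"))  # V1_BETA_DRAFT_
print(normalize_chars("report.final.v3"))  # REPORTFINALV3
```

Because the two sets are disjoint data rather than scattered `if` branches, a policy change shows up as a one-line diff that fixture tests can catch.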
Collision handling is an algorithm, not a guess
The collision rule set is precise:
- try ~1
- if occupied, try ~2, ~3, …
- as tail digits grow, shrink the basename prefix so basename plus tail stays within 8 characters
- continue until the name is unique in the directory
That means ~10 and ~100 are not formatting quirks: ~10 leaves room for only five prefix characters and ~100 for only four, so each forces a basename compaction decision.
A common implementation failure is forgetting to shrink prefix when suffix width grows. The result is invalid aliases or silent truncation.
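The following Python sketch shows the tail loop with the prefix shrink folded into the candidate construction, so the basename-plus-tail width can never exceed eight. It assumes `base` and `ext` are already normalized and that `used` holds the short names present in one directory. `next_free_alias` is a hypothetical name.

```python
def next_free_alias(base: str, ext: str, used: set[str]) -> str:
    """First free BASE~N(.EXT) in one directory, shrinking the prefix as N widens."""
    suffix = f".{ext}" if ext else ""
    tail = 1
    while True:
        tail_text = f"~{tail}"
        if len(tail_text) > 7:  # eight tail characters would leave no prefix at all
            raise ValueError("directory has no free numeric tail left")
        candidate = base[: 8 - len(tail_text)] + tail_text + suffix
        if candidate not in used:
            return candidate
        tail += 1

used: set[str] = set()
for _ in range(100):  # occupy ~1 through ~100 in one directory
    used.add(next_free_alias("REPORTFINALV3", "TXT", used))
print("REPORT~9.TXT" in used, "REPOR~10.TXT" in used, "REPO~100.TXT" in used)  # True True True
```

Filling a directory like this is exactly the "collision pressure" fixture: without the `8 - len(tail_text)` slice, the tenth file would come out as the nine-character REPORT~10.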
A deterministic translator skeleton
The following Pascal-style pseudocode keeps policy explicit:
function MakeShortAlias(const LongName: string; const Existing: TStringSet): string;
var
  BaseRaw, ExtRaw, BaseNorm, ExtNorm: string;
  Tail, PrefixLen: Integer;
  Candidate: string;
begin
  { Fast path: test the ORIGINAL name after OEM uppercasing. Testing the normalized
    name instead would let lossy inputs such as "my file.txt" skip the numeric tail.
    UpperOEM and IsLegal83Name are hypothetical helpers, like the ones below. }
  if IsLegal83Name(UpperOEM(LongName)) then
  begin
    MakeShortAlias := UpperOEM(LongName);  { a legal 8.3 name is its own alias }
    Exit;
  end;

  SplitUsingDotRules(LongName, BaseRaw, ExtRaw);  { skip leading dots, last valid dot logic }
  BaseNorm := NormalizeBase(BaseRaw);  { remove spaces/extra dots, uppercase, legality policy }
  ExtNorm := NormalizeExt(ExtRaw);     { uppercase, legality policy, truncate to 3 }

  Tail := 1;
  repeat
    PrefixLen := 8 - (1 + Length(IntToStr(Tail)));  { room for "~" + digits }
    if PrefixLen < 1 then PrefixLen := 1;
    Candidate := Copy(BaseNorm, 1, PrefixLen) + '~' + IntToStr(Tail);
    Candidate := Compose83(Candidate, ExtNorm);     { no extension -> no dot }
    Inc(Tail);
  until not Existing.Contains(Candidate);
  MakeShortAlias := Candidate;
end;

This intentionally leaves NormalizeBase, NormalizeExt, and SplitUsingDotRules as separate units so policy stays testable.
Table-driven tests beat intuition
Our fictional team fixed its pipeline by building a test corpus, not by debating memory.
The exact alias strings can vary with existing collisions and code-page/legality policy details, but the algorithmic behavior should not vary.
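Here is a compact Python port of the skeleton, driven by a small fixture table. It follows the rules as this article states them (ASCII stand-in for OEM uppercasing, the NT replace set, sequential ~N tails), not necessarily what a particular Windows version emits. The expected values below are what this sketch produces, not a record of the fictional team's corpus, and all names are hypothetical.

```python
REPLACED = set("[]+=,:;")   # replaced with "_"
REMOVED = set('"*/<>?\\|. ')  # removed (includes space and extra dots)

def split_dots(name: str) -> tuple[str, str]:
    """Skip leading dots, ignore trailing dots, split at the last dot with text after it."""
    body = name.lstrip(".").rstrip(".")
    for i in range(len(body) - 1, -1, -1):
        if body[i] == "." and body[i + 1:].strip(" "):
            return body[:i], body[i + 1:]
    return body, ""

def norm(raw: str) -> str:
    """Uppercase (ASCII stand-in for OEM), then replace or remove characters per policy."""
    return "".join("_" if ch in REPLACED else ch
                   for ch in raw.upper() if ch not in REMOVED and ord(ch) >= 0x20)

def legal_part(part: str, max_len: int) -> bool:
    # Legal if it fits and normalization would leave it unchanged.
    return 1 <= len(part) <= max_len and norm(part) == part

def is_legal_83(name: str) -> bool:
    base, dot, ext = name.partition(".")
    return legal_part(base, 8) and (not dot or legal_part(ext, 3))

def make_short_alias(long_name: str, existing: set[str]) -> str:
    upper = long_name.upper()
    if is_legal_83(upper):  # fast path on the original name
        return upper
    base_raw, ext_raw = split_dots(long_name)
    base, ext = norm(base_raw), norm(ext_raw)[:3]
    suffix = f".{ext}" if ext else ""
    tail = 1
    while True:
        tail_text = f"~{tail}"
        if len(tail_text) > 7:
            raise ValueError("no free numeric tail left")
        candidate = base[: 8 - len(tail_text)] + tail_text + suffix  # prefix shrinks as tail grows
        if candidate not in existing:
            return candidate
        tail += 1

CASES = [
    # (long name, short names already in the directory, expected alias)
    ("CONFIG.SYS", set(), "CONFIG.SYS"),
    ("config.sys", set(), "CONFIG.SYS"),
    ("RELEASE_NOTES_FINAL_REALLY_FINAL.TXT", set(), "RELEAS~1.TXT"),
    ("RELEASE_NOTES_FINAL_REALLY_FINAL.TXT", {"RELEAS~1.TXT"}, "RELEAS~2.TXT"),
    ("report.final.v3.txt", set(), "REPORT~1.TXT"),
    ("report.final.v3.txt", {f"REPORT~{i}.TXT" for i in range(1, 10)}, "REPOR~10.TXT"),
    ("My Program Source.pas", set(), "MYPROG~1.PAS"),
    ("v1+beta.txt", set(), "V1_BET~1.TXT"),
    ("index.html", set(), "INDEX~1.HTM"),
]

for long_name, existing, expected in CASES:
    got = make_short_alias(long_name, existing)
    print("ok  " if got == expected else "FAIL", f"{long_name!r} -> {got}")
```

Note that two rows share a long name and differ only in directory contents. That is the point of the table: the alias is a function of the name *and* the occupied directory.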
Why this matters in operational pipelines
Shortname translation touches many workflows:
- installer scripts that reference legacy names
- backup/restore verification against manifests
- cross-tool compatibility between VFAT-aware and strict 8.3 utilities
- reproducible release artifacts
If alias generation is non-deterministic, two developers can build “same version” media with different effective filenames.
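Even a perfectly deterministic translator has one more input besides the name: the order in which files land in a directory. The Python sketch below uses a deliberately simplified alias function (uppercase, drop spaces, split at the last dot, sequential tail with prefix shrink, no fast path) to isolate that effect. `simple_alias` and `assign_aliases` are hypothetical names.

```python
def simple_alias(long_name: str, used: set[str]) -> str:
    """Simplified translator (assumption): no fast path, no character legality rules."""
    stem, dot, ext = long_name.upper().replace(" ", "").rpartition(".")
    if not dot:
        stem, ext = ext, ""
    suffix = f".{ext[:3]}" if ext else ""
    tail = 1
    while True:
        tail_text = f"~{tail}"
        candidate = stem.replace(".", "")[: 8 - len(tail_text)] + tail_text + suffix
        if candidate not in used:
            return candidate
        tail += 1

def assign_aliases(creation_order: list[str]) -> dict[str, str]:
    """Create files in the given order inside one initially empty directory."""
    used: set[str] = set()
    aliases: dict[str, str] = {}
    for name in creation_order:
        aliases[name] = simple_alias(name, used)
        used.add(aliases[name])
    return aliases

files = ["Release Notes Draft.txt", "Release Notes Final.txt"]
forward = assign_aliases(files)
backward = assign_aliases(list(reversed(files)))
print(forward["Release Notes Final.txt"])   # RELEAS~2.TXT
print(backward["Release Notes Final.txt"])  # RELEAS~1.TXT
```

Same files, same rules, different aliases. One way to make builds reproducible is to fix the creation order (for example, by sorting names before copying) and record it alongside the manifest.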
That is a release-management nightmare.
The fictional incident response
In our story, the break happened during a Friday packaging run. By Saturday morning, three teams had three conflicting explanations:
- “the verifier is wrong”
- “Windows generated weird aliases”
- “someone copied files manually”
By Saturday afternoon, a tiny deterministic translator plus collision-aware tests cut through all three theories. The verifier was correct, alias generation differed between tools, and manual copies had introduced namespace collisions in one directory.
Nobody needed blame. We needed rules.
Subtle rule: legality depends on OEM code page
One more important caveat from the spec:
Uppercasing and character validity are evaluated in active OEM code page context.
That means “works on my machine” can still fail if code-page assumptions differ. For strict reproducibility, pin the environment and test corpus together.
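A quick Python sketch shows how the same long name can normalize differently under two OEM code pages. It applies the article's "drop characters that are not representable" rule literally, using Python's `cp437` and `cp850` codecs as the representability test. That is an assumption: real systems may apply country-specific uppercase tables or best-fit mappings instead of dropping, so treat this as an illustration of code-page dependence, not a model of any particular DOS or Windows release. `oem_representable_upper` is a hypothetical name.

```python
def oem_representable_upper(name: str, codepage: str) -> str:
    """Uppercase `name`, then drop characters the given OEM code page cannot encode."""
    kept = []
    for ch in name.upper():
        try:
            ch.encode(codepage)
        except UnicodeEncodeError:
            continue  # not representable in this code page: dropped
        kept.append(ch)
    return "".join(kept)

print(oem_representable_upper("voilà.txt", "cp437"))  # VOIL.TXT  (no uppercase À in 437)
print(oem_representable_upper("voilà.txt", "cp850"))  # VOILÀ.TXT (850 has À)
```

Pinning the code page next to the fixture corpus turns this from a surprise into a test case.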
Practical implementation checklist
For a robust translator:
- process one path component at a time
- implement legal-8.3 fast path first
- codify dot-selection/trailing-dot behavior exactly
- separate remove-vs-replace character policy clearly
- enforce extension max length 3
- implement collision tail growth with dynamic prefix shrink
- ship fixture tests with occupied-directory scenarios
That last point is non-negotiable. Most alias bugs only appear under collision pressure.
Closing scene
Our weekend story ends around 01:03 on Sunday. The final verification pass prints green across every directory. The whiteboard still looks chaotic. The room still smells like old plastic and instant coffee. But now the behavior is explainable.
Long names can still be expressive. Short names can still be strict. The bridge between them does not need magic. It needs documented rules and testable translation.
In DOS-era engineering, that is usually the whole game: reduce mystery, increase repeatability, and let simple tools carry serious work.