VFAT to 8.3: The Shortname Rules Behind the Curtain

The second story begins with a floppy label that looked harmless:

RELEASE_NOTES_FINAL_REALLY_FINAL.TXT

By itself, that filename is only mildly annoying. Inside a mixed DOS/Windows pipeline in 1990s tooling, it can become a release blocker.

Our fictional team learned this in one long weekend. The packager ran on a VFAT-capable machine. The installer verifier ran in a strict DOS context. The build ledger expected 8.3 aliases. Nobody had documented the shortname translation rules completely. Everybody thought they “basically knew” them.

“Basically” lasted until the audit script flagged twelve mismatches that were all technically valid and operationally catastrophic.

This article is the deep dive we wish we had then: how long names become 8.3 aliases, how collisions are resolved, and how to build deterministic tooling around those rules.

First principle: translate per path component

The most important rule is easy to miss:

Translation happens per single path component, not on the full path string.

That means each directory name and final file name is handled independently. If you normalize the entire path in one pass, you will eventually generate aliases that cannot exist in real directory contexts.

In practical terms:

  • C:\SRC\Very Long Directory\My Program Source.pas is translated component by component, each with its own collision scope

That “collision scope” phrase matters. Uniqueness is enforced within a directory, not globally across the volume.
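
The per-directory scope can be sketched in Python as follows. This is a minimal illustration, not a full translator: translate_components, toy_alias, and the occupied mapping are hypothetical names, and toy_alias is a deliberately simplified stand-in that skips the legal-8.3 fast path and most of the rules described later.

```python
from typing import Callable, Dict, List, Set

def toy_alias(name: str, taken: Set[str]) -> str:
    """Hypothetical stand-in: uppercase, drop spaces, FIRST6~N. Not the full rules."""
    stem, dot, ext = name.upper().replace(" ", "").rpartition(".")
    if not dot:                      # no dot at all: the whole name is the stem
        stem, ext = ext, ""
    for n in range(1, 10):           # single-digit tails are enough for a demo
        candidate = stem[:6] + "~" + str(n) + ("." + ext[:3] if ext else "")
        if candidate not in taken:
            return candidate
    raise RuntimeError("toy_alias ran out of single-digit tails")

def translate_components(components: List[str],
                         occupied: Dict[str, Set[str]],
                         make_alias: Callable[[str, Set[str]], str]) -> List[str]:
    """Translate each component on its own, checking only its parent directory."""
    short_parts: List[str] = []
    parent = ""                      # key for the directory holding the current component
    for name in components:
        taken = occupied.setdefault(parent, set())   # aliases used in this directory only
        alias = make_alias(name, taken)
        taken.add(alias)
        short_parts.append(alias)
        parent = parent + "\\" + alias               # the next component lives inside this one
    return short_parts

# The top-level directory already holds VERYLO~1, so the first component gets ~2.
# The second component starts again at ~1 because its directory is empty.
demo = translate_components(
    ["Very Long Directory", "My Program Source.pas"],
    {"": {"VERYLO~1"}},
    toy_alias,
)
print(demo)  # ['VERYLO~2', 'MYPROG~1.PAS']
```

The occupied mapping is keyed by parent directory, which is what keeps a collision in one directory from changing aliases in another.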

If the input is already a legal short name after OEM uppercase normalization, use that 8.3 form directly (uppercase).

This avoids unnecessary alias churn and preserves operator expectations. A file named CONFIG.SYS should not become something novel just because your algorithm always builds FIRST6~1.

Teams that skip this rule create avoidable incompatibilities.
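
A minimal sketch of that fast path, under stated assumptions: is_legal_83 and fast_path are hypothetical names, plain ASCII uppercasing stands in for OEM code page uppercasing, and names containing spaces or non-ASCII characters are treated as not legal so that they fall through to alias generation.

```python
ILLEGAL = set('"*+,/:;<=>?[\\]|')

def is_legal_83(name: str) -> bool:
    """True if an already-uppercased name fits 8.3 with no changes at all."""
    base, dot, ext = name.partition(".")
    if not base or len(base) > 8 or len(ext) > 3:
        return False                 # empty or oversized basename, or oversized extension
    if dot and not ext:
        return False                 # trailing dot
    if "." in ext:
        return False                 # more than one dot
    for ch in base + ext:
        # Assumption: only printable ASCII without spaces counts as legal here.
        if ch in ILLEGAL or not ("!" <= ch <= "~") or ch != ch.upper():
            return False
    return True

def fast_path(name: str):
    """Return the uppercased name if it is already legal 8.3, else None."""
    upper = name.upper()             # ASCII stand-in for OEM uppercasing
    return upper if is_legal_83(upper) else None

print(fast_path("config.sys"))              # CONFIG.SYS
print(fast_path("very long filename.txt"))  # None
```

A None result means the name must go through alias generation; anything else is used as-is, so CONFIG.SYS stays CONFIG.SYS.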

When alias generation is required

If the name is not already legal 8.3, generate alias candidates using strict steps.

The baseline candidate pattern is:

FIRST6~1.EXT

Where:

  • FIRST6 is the normalized basename, truncated to its first six characters
  • ~1 is the initial numeric tail
  • .EXT is the normalized extension, if one exists, truncated to at most three characters

No extension? Then no trailing dot/extension segment.

Dot handling is where most bugs hide

Real filenames can contain multiple dots, trailing dots, and decorative punctuation. The rules must be explicit:

  • skip leading . characters
  • allow only one basename/extension separator in 8.3
  • prefer the last dot that has valid non-space characters after it
  • if name ends with a dot, ignore that trailing dot and use a previous valid dot if present

This is the difference between deterministic behavior and parser folklore.

Example intuition:

  • report.final.v3.txt -> the last dot is the separator, so the extension comes from txt; the earlier dots are dropped from the basename
  • archive. -> the trailing dot is ignored; no earlier dot exists, so the extension is empty
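
The dot rules above can be sketched as a small splitter. This is one possible reading, not a reference implementation: split_using_dot_rules is named after the helper in the pseudocode later in this article, and stripping trailing spaces together with trailing dots is an assumption made so that the chosen dot always has a non-space character after it.

```python
from typing import Tuple

def split_using_dot_rules(name: str) -> Tuple[str, str]:
    """Return (raw_base, raw_ext) for a single path component."""
    body = name.lstrip(".")      # skip leading dots
    body = body.rstrip(". ")     # ignore trailing dots (and, by assumption, trailing spaces)
    idx = body.rfind(".")        # last remaining dot; something non-blank follows it
    if idx == -1:
        return body, ""          # no separator left: everything is basename
    return body[:idx], body[idx + 1:]

print(split_using_dot_rules("report.final.v3.txt"))  # ('report.final.v3', 'txt')
print(split_using_dot_rules("archive."))             # ('archive', '')
print(split_using_dot_rules("...hiddenprofile"))     # ('hiddenprofile', '')
```

The raw basename may still contain dots and spaces at this point; removing them is left to the normalization step so that each policy can be tested on its own.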

Character legality and normalization

Normalization from the spec includes:

  • remove spaces and extra dots
  • uppercase letters using active OEM code page semantics
  • drop characters that are not representable/legal for short names

Disallowed characters include control chars and:

" * + , / : ; < = > ? [ \ ] |

A critical note from the rules:

  • Microsoft-documented NT behavior: [ ] + = , : ; are replaced with _ during short-name generation
  • other illegal/superfluous characters are removed

If your toolchain mixes “replace” and “remove” without policy, you will drift from expected aliases.
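
One way to keep the replace-versus-remove policy explicit is to hold it in two separate sets. The sketch below assumes the split described above; normalize_base and normalize_ext are named after the pseudocode helpers later in this article, and dropping every non-ASCII character is a simplification, since real behavior depends on the active OEM code page.

```python
REPLACE_WITH_UNDERSCORE = set("[]+=,:;")   # replaced with "_" per the rule quoted above
REMOVE = set('"*/<>?\\|. ')                # illegal or superfluous: dropped entirely

def _normalize(raw: str) -> str:
    out = []
    for ch in raw:
        if ch in REPLACE_WITH_UNDERSCORE:
            out.append("_")
        elif ch in REMOVE or not (" " < ch <= "~"):
            continue                       # control chars, spaces, dots, non-ASCII (assumption)
        else:
            out.append(ch.upper())         # ASCII stand-in for OEM uppercasing
    return "".join(out)

def normalize_base(raw_base: str) -> str:
    """Normalize a basename; truncation is left to the alias step."""
    return _normalize(raw_base)

def normalize_ext(raw_ext: str) -> str:
    """Normalize an extension and truncate it to three characters."""
    return _normalize(raw_ext)[:3]

print(normalize_base("name with spaces.and.dots.."))  # NAMEWITHSPACESANDDOTS
print(normalize_base("a+b=c"))                        # A_B_C
print(normalize_ext("html"))                          # HTM
```

Changing policy then means moving a character from one set to the other, which is easy to review and easy to cover with a test.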

Collision handling is an algorithm, not a guess

The collision rule set is precise:

  1. try ~1
  2. if occupied, try ~2, ~3, …
  3. as tail digits grow, shrink basename prefix so total basename+tail stays within 8 chars
  4. continue until unique in the directory

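That means ~10 and ~100 are not formatting quirks. They force basename compaction decisions: a two-digit tail leaves room for five prefix characters, and a three-digit tail leaves room for four.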

A common implementation failure is forgetting to shrink prefix when suffix width grows. The result is invalid aliases or silent truncation.
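
The prefix shrink can be sketched in a few lines of Python. The function name alias_with_tail and the max_tail limit are assumptions made for this illustration; the inputs are expected to be an already-normalized basename and extension.

```python
from typing import Set

def alias_with_tail(base: str, ext: str, taken: Set[str], max_tail: int = 999999) -> str:
    """Try ~1, ~2, ... and shrink the prefix so prefix + tail never exceeds 8 chars."""
    for tail in range(1, min(max_tail, 999999) + 1):
        suffix = "~" + str(tail)
        prefix = base[: 8 - len(suffix)]     # 6 chars for ~1, 5 for ~10, 4 for ~100
        candidate = prefix + suffix + ("." + ext if ext else "")
        if candidate not in taken:
            return candidate
    raise RuntimeError("no free short name in this directory")

# Simulate a directory where tails 1..99 are already occupied.
taken = {"VERYLO~%d.TXT" % n for n in range(1, 10)}
taken |= {"VERYL~%d.TXT" % n for n in range(10, 100)}

print(alias_with_tail("VERYLONGFILENAME", "TXT", set()))  # VERYLO~1.TXT
print(alias_with_tail("VERYLONGFILENAME", "TXT", taken))  # VERY~100.TXT
```

The limit of 999999 is the largest tail that still leaves a one-character prefix; beyond that the sketch raises instead of emitting an oversized basename.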

A deterministic translator skeleton

The following Pascal-style pseudocode keeps policy explicit:

function MakeShortAlias(const LongName: string; const Existing: TStringSet): string;
{ OemUpperCase and IsLegal83Name are placeholder helpers, like the other routines used here. }
const
  MaxTail = 999999;  { "~999999" still leaves a one-character prefix }
var
  Upper, BaseRaw, ExtRaw, BaseNorm, ExtNorm: string;
  Tail, PrefixLen: Integer;
  Candidate: string;
begin
  { Fast path: test the uppercased input itself, not the lossy normalized form, }
  { so that "My File.txt" is not passed through as MYFILE.TXT without a tail.   }
  Upper := OemUpperCase(LongName);
  if IsLegal83Name(Upper) and (not Existing.Contains(Upper)) then
  begin
    MakeShortAlias := Upper;
    Exit;
  end;

  SplitUsingDotRules(LongName, BaseRaw, ExtRaw);   { skip leading dots, last valid dot logic }
  BaseNorm := NormalizeBase(BaseRaw);              { remove spaces/extra dots, uppercase, legality policy }
  ExtNorm  := NormalizeExt(ExtRaw);                { uppercase, legality policy, truncate to 3 }

  for Tail := 1 to MaxTail do
  begin
    PrefixLen := 8 - (1 + Length(IntToStr(Tail))); { room for "~" + digits }
    Candidate := Compose83(Copy(BaseNorm, 1, PrefixLen) + '~' + IntToStr(Tail), ExtNorm);
    if not Existing.Contains(Candidate) then
    begin
      MakeShortAlias := Candidate;
      Exit;
    end;
  end;

  MakeShortAlias := '';  { no free alias in this directory: the caller must treat this as an error }
end;

This intentionally leaves NormalizeBase, NormalizeExt, and SplitUsingDotRules as separate units so policy stays testable.

Table-driven tests beat intuition

Our fictional team fixed its pipeline by building a test corpus, not by debating memory:

Input Component                         Expected Shape
--------------------------------------  ------------------------
README.TXT                              README.TXT
very long filename.txt                  VERYLO~1.TXT
archive.final.build.log                 ARCHIV~1.LOG
...hiddenprofile                        HIDDEN~1
name with spaces.and.dots...cfg         NAMEWI~1.CFG

The exact alias strings can vary with existing collisions and code-page/legality policy details, but the algorithmic behavior should not vary.
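
A table like this translates directly into a fixture-driven check. The sketch below is a compact, simplified translator written only to exercise the corpus: make_short_alias and CASES are hypothetical names, uppercasing is ASCII-only, the legal character set is the conventional DOS one, and non-ASCII characters are dropped rather than handled per code page.

```python
import re

LEGAL = r"[A-Z0-9!#$%&'()@^_`{}~-]"
LEGAL_83 = re.compile(LEGAL + r"{1,8}(\." + LEGAL + r"{1,3})?")
REPLACED = set("[]+=,:;")
REMOVED = set('"*/<>?\\|. ')

def _normalize(raw):
    out = []
    for ch in raw:
        if ch in REPLACED:
            out.append("_")
        elif ch in REMOVED or not (" " < ch <= "~"):
            continue                              # dropped: illegal, blank, or non-ASCII
        else:
            out.append(ch.upper())
    return "".join(out)

def make_short_alias(long_name, taken):
    upper = long_name.upper()
    if LEGAL_83.fullmatch(upper) and upper not in taken:
        return upper                              # already legal 8.3: use as-is
    body = long_name.lstrip(".").rstrip(". ")     # leading and trailing dot rules
    base, dot, ext = body.rpartition(".")
    if not dot:
        base, ext = ext, ""
    base, ext = _normalize(base), _normalize(ext)[:3]
    for tail in range(1, 1000000):
        suffix = "~" + str(tail)
        cand = base[: 8 - len(suffix)] + suffix + ("." + ext if ext else "")
        if cand not in taken:
            return cand
    raise RuntimeError("directory has no free short name")

# (input component, aliases already in the directory, expected alias)
CASES = [
    ("README.TXT", set(), "README.TXT"),
    ("very long filename.txt", set(), "VERYLO~1.TXT"),
    ("archive.final.build.log", set(), "ARCHIV~1.LOG"),
    ("...hiddenprofile", set(), "HIDDEN~1"),
    ("name with spaces.and.dots...cfg", set(), "NAMEWI~1.CFG"),
    ("very long filename.txt", {"VERYLO~1.TXT"}, "VERYLO~2.TXT"),  # occupied directory
]

failures = [(name, make_short_alias(name, taken), want)
            for name, taken, want in CASES
            if make_short_alias(name, taken) != want]
print("failures:", failures)  # failures: []
```

The last row is the kind of case that matters most in practice: the same input with a pre-occupied directory must yield the next tail, not the first one.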

Why this matters in operational pipelines

Shortname translation touches many workflows:

  • installer scripts that reference legacy names
  • backup/restore verification against manifests
  • cross-tool compatibility between VFAT-aware and strict 8.3 utilities
  • reproducible release artifacts

If alias generation is non-deterministic, two developers can build “same version” media with different effective filenames.

That is a release-management nightmare.

The fictional incident response

In our story, the break happened during a Friday packaging run. By Saturday morning, three teams had three conflicting explanations:

  • “the verifier is wrong”
  • “Windows generated weird aliases”
  • “someone copied files manually”

By Saturday afternoon, a tiny deterministic translator plus collision-aware tests cut through all three theories. The verifier was correct, alias generation differed between tools, and manual copies had introduced namespace collisions in one directory.

Nobody needed blame. We needed rules.

Subtle rule: legality depends on OEM code page

One more important caveat from the spec:

Uppercasing and character validity are evaluated in active OEM code page context.

That means “works on my machine” can still fail if code-page assumptions differ. For strict reproducibility, pin the environment and test corpus together.
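
A small Python check makes the dependency visible. This sketch uses Python's built-in cp437 and cp850 codecs as stand-ins for two OEM code pages; upper_in_code_page is a hypothetical helper, and it relies on Unicode uppercasing rather than a code page's own uppercase table, which real DOS systems may define differently.

```python
def representable(ch: str, code_page: str) -> bool:
    """True if the character can be encoded in the given code page."""
    try:
        ch.encode(code_page)
        return True
    except UnicodeEncodeError:
        return False

def upper_in_code_page(ch: str, code_page: str):
    """Uppercase one character, or return None if the result does not exist in the code page."""
    up = ch.upper()              # Unicode uppercasing: an assumption, not the OEM table
    if len(up) == 1 and representable(up, code_page):
        return up
    return None

print(upper_in_code_page("é", "cp437"))  # É
print(upper_in_code_page("ê", "cp850"))  # Ê
print(upper_in_code_page("ê", "cp437"))  # None
```

The same input character survives under one code page and not under the other, which is why the test corpus and the code page need to be pinned together.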

Practical implementation checklist

For a robust translator:

  1. process one path component at a time
  2. implement legal-8.3 fast path first
  3. codify dot-selection/trailing-dot behavior exactly
  4. separate remove-vs-replace character policy clearly
  5. enforce extension max length 3
  6. implement collision tail growth with dynamic prefix shrink
  7. ship fixture tests with occupied-directory scenarios

That last point is non-negotiable. Most alias bugs only appear under collision pressure.

Closing scene

Our weekend story ends around 01:03 on Sunday. The final verification pass prints green across every directory. The whiteboard still looks chaotic. The room still smells like old plastic and instant coffee. But now the behavior is explainable.

Long names can still be expressive. Short names can still be strict. The bridge between them does not need magic. It needs documented rules and testable translation.

In DOS-era engineering, that is usually the whole game: reduce mystery, increase repeatability, and let simple tools carry serious work.

2026-03-10