C:\RETRO\DOS>type determ~1.htm
Deterministic DIR Output as an Operational Contract
The story starts at 23:14 in a room with two beige towers, one half-dead fluorescent tube, and a whiteboard covered in hand-written file counts. We had one mission: rebuild a damaged release set from mixed backup disks and compare it against a known-good manifest.
On paper, that sounds easy. In practice, it meant parsing DIR output across different machines, each configured slightly differently, each with enough personality to make automation fail at the worst moment.
By 23:42 we had already hit the first trap. One machine produced DIR output that looked “normal” to a human and ambiguous to a parser. Another printed dates in a different shape. A third had enough local customization that every assumption broke after line three. We were not failing because DOS was bad. We were failing because we had not written down what “correct output” meant.
That night we stopped treating DIR as a casual command and started treating it as an API contract.
This article is that deep dive: why a deterministic profile matters, how to structure it, and how to parse it without superstitions.
The turning point: formatting is behavior
In modern systems, people accept that JSON schemas and protocol contracts are architecture. In DOS-era workflows, plain text command output played that same role. If your automation consumed command output, formatting was behavior.
Our internal profile locked one specific command shape:
DIR [drive:][path][filespec]- default long listing
- no
/W, no/B, no formatting switches - fixed US date/time rendering (
MM-DD-YY,h:mma/h:mmp)
That scoping decision solved half the problem. We stopped pretending one parser should support every possible switch/locale and instead declared a strict operating envelope.
A canonical listing is worth hours of debugging
The profile included a canonical example and we used it as a fixture:
|
|
Why include this in a spec? Because examples settle debates that prose cannot. When two engineers disagree, the fixture wins.
The 38-column row discipline
The core entry template was fixed-width:
|
|
That yields exactly 38 columns:
- columns
1..8: basename (left-aligned) - column
9: space - columns
10..12: extension (left-aligned) - columns
13..14: spaces - columns
15..22: size-or-dir (right-aligned) - column
23: space - columns
24..31: date - column
32: space - columns
33..38: time (right-aligned)
Once you adopt positional parsing instead of regex guesswork, DIR lines become boring in the best way.
Why this works even on noisy nights
Fixed-width parsing has practical advantages under pressure:
- no locale-sensitive token splitting for date/time columns
- no ambiguity between
<DIR>and size values - deterministic handling of one-digit vs two-digit hour
- easy visual validation during manual triage
At 01:12, when you are diffing listings by eye and caffeine alone, “column 15 starts the size field” is operational mercy.
Header and footer are part of the protocol
Many parsers only parse entry rows and ignore header/footer. That is a missed opportunity.
Our profile explicitly fixed header sequence:
- volume label line (
is <LABEL>orhas no label) - serial line (
XXXX-XXXX, uppercase hex) - blank line
Directory of <PATH>- blank line
And footer sequence:
- file totals:
%8u File(s) %11s bytes - dir/free totals:
%8u Dir(s) %11s bytes free
Those two footer lines are not decoration. They are integrity checks. If parsed file count says 127 and footer says 126, stop and investigate before touching production disks.
Parsing algorithm we actually trusted
This is the skeleton we converged on in Turbo Pascal style:
type
TDirEntry = record
BaseName: string[8];
Ext: string[3];
IsDir: Boolean;
SizeBytes: LongInt;
DateText: string[8]; { MM-DD-YY }
TimeText: string[6]; { right-aligned h:mma/h:mmp }
end;
function TrimRight(const S: string): string;
var
I: Integer;
begin
I := Length(S);
while (I > 0) and (S[I] = ' ') do Dec(I);
TrimRight := Copy(S, 1, I);
end;
function ParseEntryLine(const L: string; var E: TDirEntry): Boolean;
var
NameField, ExtField, SizeField, DateField, TimeField: string;
Code: Integer;
begin
ParseEntryLine := False;
if Length(L) < 38 then Exit;
NameField := Copy(L, 1, 8);
ExtField := Copy(L, 10, 3);
SizeField := Copy(L, 15, 8);
DateField := Copy(L, 24, 8);
TimeField := Copy(L, 33, 6);
E.BaseName := TrimRight(NameField);
E.Ext := TrimRight(ExtField);
E.DateText := DateField;
E.TimeText := TimeField;
if TrimRight(SizeField) = '<DIR>' then
begin
E.IsDir := True;
E.SizeBytes := 0;
end
else
begin
E.IsDir := False;
Val(TrimRight(SizeField), E.SizeBytes, Code);
if Code <> 0 then Exit;
end;
ParseEntryLine := True;
end;This parser is intentionally plain. No hidden assumptions, no dynamic heuristics, no “best effort.” It either matches the profile or fails loudly.
Edge cases that must be explicit
The spec was strict about awkward but common cases:
- extensionless files: extension field is blank (three spaces in raw row)
- short names/exts: right-padding in fixed fields
- directories always use
<DIR>in size field - if value exceeds width, allow rightward overflow; never truncate data
The overflow rule is subtle and important. Truncation creates false data, and false data is worse than ugly formatting.
Counting bytes: grouped vs ungrouped is not random
A detail teams often forget:
- entry
SIZE_OR_DIRfile size is decimal without grouping - footer byte totals are grouped with US commas in this profile
That split looks cosmetic until a parser accidentally strips commas in one place but not the other. If totals are part of your acceptance gate, normalize once and test it with fixtures.
The fictional incident that made it real
At 02:07 in our story, we finally had a clean parse on machine A. We ran the same process on machine B, then compared manifests. Everything looked perfect except one tiny mismatch: file count agreed, byte count differed by 1,024.
Old us would have guessed corruption and started copying disks again.
Spec-driven us inspected footer math first, then entry parse, then source listing capture. The issue was not corruption. One listing had accidentally included a generated staging file from a side directory because the operator typed a wildcard path incorrectly.
The deterministic header (Directory of ...) and footer checks caught it in minutes.
No drama. Just protocol discipline.
What this teaches beyond DOS
The strongest lesson is not “DOS output is neat.” The lesson is operational:
- any text output consumed by tools should be treated as a contract
- contracts need explicit scope and out-of-scope declarations
- examples + field widths + sequence rules beat vague descriptions
- integrity lines (counts/totals) should be first-class validation points
That mindset scales from floppy-era rebuild scripts to modern CI logs and telemetry processors.
Implementation checklist for your own parser
If you want a stable implementation from this profile:
- enforce command profile (no unsupported switches)
- parse header in strict order
- parse entry rows by fixed columns, not token split
- parse footer totals and cross-check with computed values
- fail explicitly on profile deviation
- keep canonical fixture listings in version control
This gives you deterministic behavior and debuggable failures.
Closing scene
At 03:18 we printed two manifests, one from recovered media and one from archive baseline, and compared them line by line. For the first time that night, we trusted the result.
Not because the room got quieter.
Not because the disks got newer.
Because the contract got clearer.
The old DOS prompt did what old prompts always do: it reflected our discipline back at us.
Related reading: