<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Internet on TurboVision</title>
    <link>https://turbovision.in6-addr.net/tags/internet/</link>
    <description>Recent content in Internet on TurboVision</description>
    <generator>Hugo</generator>
    <language>en</language>
    <lastBuildDate>Tue, 21 Apr 2026 14:06:12 +0000</lastBuildDate>
    <atom:link href="https://turbovision.in6-addr.net/tags/internet/index.xml" rel="self" type="application/rss&#43;xml" />
    
    
    
    <item>
      <title>From Mailboxes to Everything Internet, Part 4: Perimeter, Proxies, and the Operations Upgrade</title>
      <link>https://turbovision.in6-addr.net/linux/migrations/from-mailboxes-to-everything-internet-part-4-perimeter-proxies-and-the-operations-upgrade/</link>
      <pubDate>Fri, 21 May 2010 00:00:00 +0000</pubDate>
      <guid>https://turbovision.in6-addr.net/linux/migrations/from-mailboxes-to-everything-internet-part-4-perimeter-proxies-and-the-operations-upgrade/</guid>
      <description>&lt;p&gt;The final phase of the migration story starts when internet access stops being &amp;ldquo;useful&amp;rdquo; and becomes &amp;ldquo;required for normal business.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;That is the moment architecture changes character. You are no longer adding online capabilities to an offline-first world. You are operating an internet-dependent environment where outages hurt immediately, security posture matters daily, and latency becomes political.&lt;/p&gt;
&lt;p&gt;If Part 1 taught us gateways, Part 2 taught policy discipline, and Part 3 taught identity realism, Part 4 teaches operational maturity: perimeter control, proxy strategy, and observability that is good enough to act on.&lt;/p&gt;
&lt;h2 id=&#34;the-perimeter-timeline-everyone-lived&#34;&gt;The perimeter timeline everyone lived&lt;/h2&gt;
&lt;p&gt;In the late 90s and early 2000s, many of us moved through the same progression:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;permissive edge with ad-hoc rules&lt;/li&gt;
&lt;li&gt;basic packet filtering&lt;/li&gt;
&lt;li&gt;NAT as default containment and address strategy&lt;/li&gt;
&lt;li&gt;explicit service publishing with stricter inbound policy&lt;/li&gt;
&lt;li&gt;recurring audits and documented rule ownership&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Tool names changed over time. The operating truth stayed constant:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;If nobody can explain why a firewall rule exists, that rule is debt.&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id=&#34;rule-sets-as-executable-policy&#34;&gt;Rule sets as executable policy&lt;/h2&gt;
&lt;p&gt;The biggest jump in reliability came when we stopped treating firewall config as wizard output and started treating it like policy code with comments, ownership, and change history.&lt;/p&gt;
&lt;p&gt;A conceptual baseline:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;default INPUT   = DROP
default FORWARD = DROP
default OUTPUT  = ACCEPT

allow established,related
allow loopback
allow admin-ssh from mgmt-net
allow smtp to mail-gateway
allow web to reverse-proxy
log+drop everything else&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This is not about minimalism for style points. It is about creating a rulebase an operator can reason about quickly during incidents.&lt;/p&gt;
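&lt;p&gt;The conceptual baseline above can be written down as a real policy file, for example in iptables-restore format. This is a sketch only; the interface names, management network, and internal addresses are illustrative, not from the original setup:&lt;/p&gt;

```text
# /etc/iptables.rules -- sketch; networks and addresses are illustrative
*filter
:INPUT DROP [0:0]
:FORWARD DROP [0:0]
:OUTPUT ACCEPT [0:0]
# stateful return traffic and loopback first
-A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A FORWARD -m state --state ESTABLISHED,RELATED -j ACCEPT
# admin ssh only from the management network
-A INPUT -p tcp -s 10.0.9.0/24 --dport 22 -j ACCEPT
# intentionally published services
-A FORWARD -p tcp -d 10.0.1.10 --dport 25 -j ACCEPT
-A FORWARD -p tcp -d 10.0.1.20 --dport 80 -j ACCEPT
# log, then drop, everything else
-A INPUT -j LOG --log-prefix "edge-drop: "
-A INPUT -j DROP
-A FORWARD -j LOG --log-prefix "edge-drop: "
-A FORWARD -j DROP
COMMIT
```

&lt;p&gt;The point is that every line carries intent a reviewer can read, comment on, and version.&lt;/p&gt;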
&lt;h2 id=&#34;nat-convenience-and-trap-in-one-box&#34;&gt;NAT: convenience and trap in one box&lt;/h2&gt;
&lt;p&gt;NAT solved practical problems:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;private address reuse&lt;/li&gt;
&lt;li&gt;easy outbound internet for many hosts&lt;/li&gt;
&lt;li&gt;accidental reduction of direct inbound exposure&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It also created recurring confusion:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&amp;ldquo;works outbound, fails inbound&amp;rdquo;&lt;/li&gt;
&lt;li&gt;protocol edge cases under state tracking&lt;/li&gt;
&lt;li&gt;poor assumptions that NAT equals security policy&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We learned to separate concerns explicitly:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;NAT handles address translation&lt;/li&gt;
&lt;li&gt;firewall handles policy&lt;/li&gt;
&lt;li&gt;service publishing handles intentional exposure&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Combining them mentally is how outages hide.&lt;/p&gt;
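&lt;p&gt;One way to keep the separation visible in practice: translation and policy live in different tables, so publishing a service requires an explicit entry in each. A sketch in iptables terms, with illustrative interfaces and addresses:&lt;/p&gt;

```text
# translation only: rewrite addresses (nat table)
iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 80 -j DNAT --to-destination 10.0.1.20:80

# policy only: the DNAT above forwards nothing until the firewall also permits it
iptables -A FORWARD -i eth0 -p tcp -d 10.0.1.20 --dport 80 -j ACCEPT
```

&lt;p&gt;When the DNAT rule exists but the FORWARD rule does not, you get exactly the &amp;ldquo;works outbound, fails inbound&amp;rdquo; symptom from the list above.&lt;/p&gt;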
&lt;h2 id=&#34;proxy-and-cache-operations-bandwidth-as-architecture&#34;&gt;Proxy and cache operations: bandwidth as architecture&lt;/h2&gt;
&lt;p&gt;Web access volume and software update traffic make proxy/cache design a real budget topic, especially on constrained links.&lt;/p&gt;
&lt;p&gt;A disciplined proxy setup gave us:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;reduced repeated downloads&lt;/li&gt;
&lt;li&gt;controllable egress behavior&lt;/li&gt;
&lt;li&gt;clearer audit path for outbound traffic&lt;/li&gt;
&lt;li&gt;policy enforcement point for categories and exceptions&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It also gave us politics:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;who gets exceptions&lt;/li&gt;
&lt;li&gt;what to log and for how long&lt;/li&gt;
&lt;li&gt;how to communicate policy without creating a revolt&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The winning pattern was transparent policy with named ownership and periodic review, not silent filtering.&lt;/p&gt;
&lt;h2 id=&#34;monitoring-matured-from-nice-graph-to-first-responder&#34;&gt;Monitoring matured from &amp;ldquo;nice graph&amp;rdquo; to &amp;ldquo;first responder&amp;rdquo;&lt;/h2&gt;
&lt;p&gt;Early graphing projects were often visual hobbies. Around 2008-2010, monitoring became core operations:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;service availability checks&lt;/li&gt;
&lt;li&gt;latency and packet-loss visibility&lt;/li&gt;
&lt;li&gt;queue and disk saturation alerts&lt;/li&gt;
&lt;li&gt;trend analysis for capacity planning&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;A minimal useful stack in that era looked like:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;polling/graphing for interfaces and host metrics&lt;/li&gt;
&lt;li&gt;active checks for critical services&lt;/li&gt;
&lt;li&gt;alert routing by severity and schedule&lt;/li&gt;
&lt;li&gt;daily review of top recurring warnings&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Most teams fail not from missing tools, but from alert noise without ownership.&lt;/p&gt;
&lt;h2 id=&#34;alert-hygiene-less-noise-more-truth&#34;&gt;Alert hygiene: less noise, more truth&lt;/h2&gt;
&lt;p&gt;We adopted three rules that changed everything:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;every alert must map to a concrete action&lt;/li&gt;
&lt;li&gt;every noisy alert must be tuned or removed&lt;/li&gt;
&lt;li&gt;every major incident must produce one monitoring improvement&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Without these rules, monitoring becomes background anxiety.
With them, monitoring becomes a decision system.&lt;/p&gt;
&lt;h2 id=&#34;web-went-from-optional-to-default-workload&#34;&gt;Web went from optional to default workload&lt;/h2&gt;
&lt;p&gt;In the &amp;ldquo;everything internet&amp;rdquo; phase, internal services increasingly depended on external web APIs, update endpoints, and browser-based tooling. Outbound failures became as disruptive as inbound failures.&lt;/p&gt;
&lt;p&gt;That pushed us to monitor the whole path:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;local DNS health&lt;/li&gt;
&lt;li&gt;upstream DNS responsiveness&lt;/li&gt;
&lt;li&gt;default route and failover behavior&lt;/li&gt;
&lt;li&gt;proxy health&lt;/li&gt;
&lt;li&gt;selected external endpoint reachability&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;When users say &amp;ldquo;internet is slow,&amp;rdquo; they mean any one of twelve potential bottlenecks.&lt;/p&gt;
&lt;h2 id=&#34;incident-story-the-half-outage-that-taught-path-thinking&#34;&gt;Incident story: the half-outage that taught path thinking&lt;/h2&gt;
&lt;p&gt;One of our most educational incidents looked like this:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;internal DNS resolved fine&lt;/li&gt;
&lt;li&gt;external name resolution intermittently failed&lt;/li&gt;
&lt;li&gt;some websites loaded, others timed out&lt;/li&gt;
&lt;li&gt;mail queues started deferring to specific domains&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Initial blame went to firewall changes. Real cause was upstream DNS flapping plus a local resolver timeout setting that turned transient upstream latency into user-visible failure bursts.&lt;/p&gt;
&lt;p&gt;Fixes:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;tune resolver timeout/retry behavior&lt;/li&gt;
&lt;li&gt;add secondary upstream resolvers with health checks&lt;/li&gt;
&lt;li&gt;monitor DNS query latency as first-class metric&lt;/li&gt;
&lt;li&gt;add runbook step: test path by stage, not by &amp;ldquo;internet yes/no&amp;rdquo;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The lesson: binary status checks are comforting and often wrong.&lt;/p&gt;
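&lt;p&gt;Fixes 1 and 2 amounted to a few lines of glibc resolver configuration. The server addresses below are illustrative:&lt;/p&gt;

```text
# /etc/resolv.conf
nameserver 192.0.2.53
nameserver 198.51.100.53
# fail over after 2s instead of the 5s default, retry the list twice,
# and rotate queries across servers instead of hammering the first one
options timeout:2 attempts:2 rotate
```

&lt;p&gt;Small numbers, large effect: the difference between a transient upstream hiccup and a visible outage was mostly these defaults.&lt;/p&gt;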
&lt;h2 id=&#34;operational-runbooks-became-mandatory&#34;&gt;Operational runbooks became mandatory&lt;/h2&gt;
&lt;p&gt;As dependency increased, we formalized runbooks for common internet-era failures:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;high packet loss on WAN edge&lt;/li&gt;
&lt;li&gt;DNS partial outage&lt;/li&gt;
&lt;li&gt;proxy saturation&lt;/li&gt;
&lt;li&gt;firewall deploy regression&lt;/li&gt;
&lt;li&gt;certificate expiry risk (yes, this became real quickly)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;A useful runbook page had:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;symptom signatures&lt;/li&gt;
&lt;li&gt;first 5 commands/checks&lt;/li&gt;
&lt;li&gt;containment action&lt;/li&gt;
&lt;li&gt;escalation threshold&lt;/li&gt;
&lt;li&gt;known false signals&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Good runbooks are written by people who have been paged, not by people who enjoy templates.&lt;/p&gt;
&lt;h2 id=&#34;capacity-planning-by-trend-not-by-optimism&#34;&gt;Capacity planning by trend, not by optimism&lt;/h2&gt;
&lt;p&gt;The 2005-2010 period punished optimistic capacity assumptions. We moved to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;weekly trend snapshots&lt;/li&gt;
&lt;li&gt;monthly peak reports&lt;/li&gt;
&lt;li&gt;explicit growth assumptions tied to user counts/services&lt;/li&gt;
&lt;li&gt;trigger thresholds for upgrade planning&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Bandwidth, disk, queue depth, and backup windows all needed trend visibility.&lt;/p&gt;
&lt;p&gt;The cheapest way to buy reliability is to stop being surprised.&lt;/p&gt;
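&lt;p&gt;Trigger thresholds can be as simple as a filter over the weekly snapshot. A sketch, assuming a hypothetical two-column &amp;ldquo;resource peak-percent&amp;rdquo; report file:&lt;/p&gt;

```shell
# flag anything whose weekly peak crossed the planning threshold (70% here)
# weekly-peaks.txt format (illustrative): "resource-name peak-percent"
awk '$2 > 70 { print "plan upgrade: " $1 " peaked at " $2 "%" }' weekly-peaks.txt
```

&lt;p&gt;The value is not the one-liner; it is that the threshold is written down and runs every week whether anyone feels worried or not.&lt;/p&gt;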
&lt;h2 id=&#34;security-posture-in-the-broadband-normal&#34;&gt;Security posture in the broadband normal&lt;/h2&gt;
&lt;p&gt;Always-on connectivity changed attack surface and incident frequency. Sensible baseline hardening became routine:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;minimize exposed services&lt;/li&gt;
&lt;li&gt;patch regularly with rollback plan&lt;/li&gt;
&lt;li&gt;enforce admin access boundaries&lt;/li&gt;
&lt;li&gt;log denied traffic with retention policy&lt;/li&gt;
&lt;li&gt;periodically validate external exposure with independent scans&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;No single control solved this. Layered boring controls did.&lt;/p&gt;
&lt;h2 id=&#34;documentation-as-operational-memory&#34;&gt;Documentation as operational memory&lt;/h2&gt;
&lt;p&gt;The largest hidden risk in these years was tacit knowledge. One expert could still keep a network alive, but one expert could not scale resilience.&lt;/p&gt;
&lt;p&gt;We wrote concise docs for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;edge topology&lt;/li&gt;
&lt;li&gt;rule ownership&lt;/li&gt;
&lt;li&gt;proxy exceptions&lt;/li&gt;
&lt;li&gt;monitoring map&lt;/li&gt;
&lt;li&gt;escalation contacts&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Then we tested docs by having another operator run routine tasks from them. If they failed, doc quality was failing, not operator quality.&lt;/p&gt;
&lt;h2 id=&#34;the-mindset-shift-that-completed-migration&#34;&gt;The mindset shift that completed migration&lt;/h2&gt;
&lt;p&gt;By 2010, the real completion signal was not &amp;ldquo;all services on Linux.&amp;rdquo;&lt;br&gt;
The completion signal was:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;we can explain the system&lt;/li&gt;
&lt;li&gt;we can detect drift early&lt;/li&gt;
&lt;li&gt;we can recover predictably&lt;/li&gt;
&lt;li&gt;we can hand operations across people&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That is the shift from clever setup to resilient operations.&lt;/p&gt;
&lt;h2 id=&#34;final-lessons-from-the-full-series&#34;&gt;Final lessons from the full series&lt;/h2&gt;
&lt;p&gt;Across all four parts, the durable lessons are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;bridge systems first, replace systems second&lt;/li&gt;
&lt;li&gt;treat policy as explicit artifacts&lt;/li&gt;
&lt;li&gt;migrate identities and habits with as much care as services&lt;/li&gt;
&lt;li&gt;design monitoring and runbooks for tired humans&lt;/li&gt;
&lt;li&gt;prefer incremental certainty over dramatic cutovers&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;None of this sounds fashionable. All of it works.&lt;/p&gt;
&lt;h2 id=&#34;what-comes-next&#34;&gt;What comes next&lt;/h2&gt;
&lt;p&gt;Outside this series, two adjacent topics deserve their own deep dives:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;storage reliability on budget hardware (where most silent disasters begin)&lt;/li&gt;
&lt;li&gt;early virtualization in small Linux shops (where consolidation and experimentation finally met)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Both changed how we thought about failure domains and recovery.&lt;/p&gt;
&lt;h2 id=&#34;one-quarterly-drill-that-paid-off-every-time&#34;&gt;One quarterly drill that paid off every time&lt;/h2&gt;
&lt;p&gt;By the end of this migration era, we added a quarterly &amp;ldquo;internet dependency drill.&amp;rdquo; It was intentionally small and practical: simulate one realistic edge failure and walk the runbook with the current on-call rotation.&lt;/p&gt;
&lt;p&gt;Typical drill themes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;upstream DNS degraded but not fully down&lt;/li&gt;
&lt;li&gt;accidental firewall regression after policy deploy&lt;/li&gt;
&lt;li&gt;proxy saturation during patch rollout day&lt;/li&gt;
&lt;li&gt;WAN packet loss spike during business hours&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The rule was simple: no blame, no theater, and one concrete improvement item must come out of each drill.&lt;/p&gt;
&lt;p&gt;This practice changed behavior in a measurable way. Operators started recognizing symptoms earlier, escalation happened with better context, and runbooks stayed alive instead of rotting into documentation archives.&lt;/p&gt;
&lt;p&gt;Most importantly, drills exposed stale assumptions before real incidents did. In internet-dependent systems, stale assumptions are often the first domino.&lt;/p&gt;
&lt;p&gt;One side effect we did not expect: these drills improved cross-team language. Network admins, service admins, and helpdesk staff started describing incidents with the same terms and sequence. That alone reduced triage delay, because every handoff no longer restarted the investigation from zero.&lt;/p&gt;
&lt;p&gt;Shared language is not a soft benefit; in outages, it is response-time infrastructure.
It prevents expensive confusion.&lt;/p&gt;
&lt;p&gt;Related reading:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://turbovision.in6-addr.net/retro/linux/migrations/from-mailboxes-to-everything-internet-part-1-the-gateway-years/&#34;&gt;From Mailboxes to Everything Internet, Part 1: The Gateway Years&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://turbovision.in6-addr.net/retro/linux/migrations/from-mailboxes-to-everything-internet-part-2-mail-migration-under-real-traffic/&#34;&gt;From Mailboxes to Everything Internet, Part 2: Mail Migration Under Real Traffic&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://turbovision.in6-addr.net/retro/linux/migrations/from-mailboxes-to-everything-internet-part-3-identity-file-services-and-mixed-networks/&#34;&gt;From Mailboxes to Everything Internet, Part 3: Identity, File Services, and Mixed Networks&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://turbovision.in6-addr.net/retro/latency-budgeting-on-old-machines/&#34;&gt;Latency Budgeting on Old Machines&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
    </item>
    
    <item>
      <title>From Mailboxes to Everything Internet, Part 3: Identity, File Services, and Mixed Networks</title>
      <link>https://turbovision.in6-addr.net/linux/migrations/from-mailboxes-to-everything-internet-part-3-identity-file-services-and-mixed-networks/</link>
      <pubDate>Thu, 18 Sep 2008 00:00:00 +0000</pubDate>
      <guid>https://turbovision.in6-addr.net/linux/migrations/from-mailboxes-to-everything-internet-part-3-identity-file-services-and-mixed-networks/</guid>
      <description>&lt;p&gt;By the time mail became stable, the next migration pressure arrived exactly where everyone knew it would: file shares, printers, and user identity.&lt;/p&gt;
&lt;p&gt;In theory this is straightforward. In reality, this is where organizations discover the true complexity of their own history. Shared drives are business process. Printer queues are department politics. User accounts are unwritten social contracts. You are not migrating servers. You are migrating habits.&lt;/p&gt;
&lt;p&gt;In the 1995-2010 arc, Linux earned trust in this space because it solved practical problems at sane cost. But it only worked when we treated mixed environments as first-class architecture, not temporary embarrassment.&lt;/p&gt;
&lt;h2 id=&#34;the-mixed-network-reality-we-actually-had&#34;&gt;The mixed-network reality we actually had&lt;/h2&gt;
&lt;p&gt;Our baseline looked familiar to many geeks in 2008:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;some old Windows clients&lt;/li&gt;
&lt;li&gt;a few newer Windows clients&lt;/li&gt;
&lt;li&gt;Linux workstations in technical teams&lt;/li&gt;
&lt;li&gt;legacy scripts depending on share paths nobody wanted to rename&lt;/li&gt;
&lt;li&gt;printers with &amp;ldquo;special driver behavior&amp;rdquo; that existed only in rumor&lt;/li&gt;
&lt;li&gt;user account sprawl with inconsistent naming conventions&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;No greenfield, no clean slate.&lt;/p&gt;
&lt;p&gt;The migration target was equally practical:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;centralize file and print services on Linux&lt;/li&gt;
&lt;li&gt;standardize authentication path as much as feasible&lt;/li&gt;
&lt;li&gt;keep client disruption low&lt;/li&gt;
&lt;li&gt;preserve existing share semantics long enough for staged cleanup&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;why-samba-became-a-migration-weapon&#34;&gt;Why Samba became a migration weapon&lt;/h2&gt;
&lt;p&gt;Samba was not exciting in a conference-slide way. It was exciting in a &amp;ldquo;we can migrate without breaking payroll&amp;rdquo; way.&lt;/p&gt;
&lt;p&gt;It gave us leverage:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;speak SMB to existing clients&lt;/li&gt;
&lt;li&gt;keep Unix-native storage and tooling under the hood&lt;/li&gt;
&lt;li&gt;centralize access control in files we could version&lt;/li&gt;
&lt;li&gt;run on hardware we could afford and replace&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The strongest outcome was operational consistency. We could finally inspect and manage share policy as code-like config, not opaque GUI state.&lt;/p&gt;
&lt;p&gt;A conceptual share policy looked like:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-ini&#34; data-lang=&#34;ini&#34;&gt;[finance]
path = /srv/shares/finance
read only = no
valid users = @finance
create mask = 0660
directory mask = 0770

[public]
path = /srv/shares/public
read only = no
guest ok = yes&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The syntax is less important than explicitness: who can access what, with which defaults.&lt;/p&gt;
&lt;h2 id=&#34;naming-and-identity-cleanup-the-hard-part-nobody-budgets&#34;&gt;Naming and identity cleanup: the hard part nobody budgets&lt;/h2&gt;
&lt;p&gt;The technical install was rarely the blocker. Identity cleanup was.&lt;/p&gt;
&lt;p&gt;We inherited user namespaces like this:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;initials on one system&lt;/li&gt;
&lt;li&gt;full names elsewhere&lt;/li&gt;
&lt;li&gt;legacy aliases kept alive by scripts&lt;/li&gt;
&lt;li&gt;contractor accounts with no lifecycle policy&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;A migration that ignores identity normalization creates permanent complexity debt.&lt;/p&gt;
&lt;p&gt;We built a mapping file and treated it as a controlled artifact:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;legacy_id   canonical_uid   display_name
jd          jdoe            John Doe
finance1    finance.ops     Finance Operations
svcprint    svc.print       Print Service Account&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Then we staged migrations by team, not by technology component. That one decision reduced support calls dramatically.&lt;/p&gt;
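&lt;p&gt;Because the mapping was a controlled artifact, it could also be checked mechanically before every migration wave. A small sketch that rejects duplicate canonical IDs; the filename is illustrative:&lt;/p&gt;

```shell
# fail (exit 1) if any canonical_uid appears twice in the mapping file;
# column 2 is canonical_uid, line 1 is the header
awk 'NR > 1 { seen[$2]++ }
     END { bad = 0
           for (u in seen) if (seen[u] > 1) { print "duplicate canonical_uid: " u; bad = 1 }
           exit bad }' identity-map.txt
```

&lt;p&gt;Wiring a check like this into the cutover checklist catches copy-paste errors before they become help-desk tickets.&lt;/p&gt;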
&lt;h2 id=&#34;directory-services-useful-but-only-with-boundaries&#34;&gt;Directory services: useful, but only with boundaries&lt;/h2&gt;
&lt;p&gt;NIS, LDAP, local files, and domain-style approaches all appeared in real deployments. The important mistake to avoid was trying to force full centralization in one leap.&lt;/p&gt;
&lt;p&gt;Our pattern:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;centralize high-value user groups first&lt;/li&gt;
&lt;li&gt;keep local emergency admin path on each critical server&lt;/li&gt;
&lt;li&gt;document source-of-truth per account class&lt;/li&gt;
&lt;li&gt;automate consistency checks&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;A central directory without local break-glass access is an outage multiplier.&lt;/p&gt;
&lt;h2 id=&#34;file-migration-strategy-that-survived-reality&#34;&gt;File migration strategy that survived reality&lt;/h2&gt;
&lt;p&gt;The best sequence we found:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;classify shares by business criticality&lt;/li&gt;
&lt;li&gt;migrate low-risk shares first&lt;/li&gt;
&lt;li&gt;preserve path compatibility through aliases/symlinks where possible&lt;/li&gt;
&lt;li&gt;run side-by-side read validation&lt;/li&gt;
&lt;li&gt;migrate write ownership after validation window&lt;/li&gt;
&lt;li&gt;freeze and archive old share with explicit retention date&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This gave users confidence because rollbacks remained feasible.&lt;/p&gt;
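&lt;p&gt;Step 4, the side-by-side read validation, needs nothing fancier than checksums over both trees. A sketch under the assumption of one old and one new copy of a share; the paths are illustrative:&lt;/p&gt;

```shell
#!/bin/sh
# compare content checksums of the old and new copy of one share
old=/srv/shares-old/finance
new=/srv/shares/finance
(cd "$old"; find . -type f -exec cksum {} +) | sort > /tmp/old.sums
(cd "$new"; find . -type f -exec cksum {} +) | sort > /tmp/new.sums
if diff /tmp/old.sums /tmp/new.sums > /dev/null; then
    echo "read validation passed"
else
    echo "MISMATCH - keep writes on the old share"
fi
```

&lt;p&gt;Running this daily during the validation window is what made &amp;ldquo;rollback remains feasible&amp;rdquo; a checkable claim rather than a hope.&lt;/p&gt;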
&lt;p&gt;We also learned to publish &amp;ldquo;what changed this week&amp;rdquo; notes with plain language and exact examples:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;old path&lt;/li&gt;
&lt;li&gt;new path&lt;/li&gt;
&lt;li&gt;unchanged behavior&lt;/li&gt;
&lt;li&gt;changed behavior&lt;/li&gt;
&lt;li&gt;support contact&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Silence is interpreted as instability.&lt;/p&gt;
&lt;h2 id=&#34;printers-where-migrations-go-to-get-humbled&#34;&gt;Printers: where migrations go to get humbled&lt;/h2&gt;
&lt;p&gt;Print migration seems trivial until one department uses a bizarre tray/font/duplex combination that only one driver profile handles.&lt;/p&gt;
&lt;p&gt;We created printer profile inventories before cutover:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;model + firmware revision&lt;/li&gt;
&lt;li&gt;required driver mode&lt;/li&gt;
&lt;li&gt;known paper/duplex quirks&lt;/li&gt;
&lt;li&gt;department-specific defaults&lt;/li&gt;
&lt;li&gt;fallback queue&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Then we tested with actual user documents, not vendor test pages.&lt;/p&gt;
&lt;p&gt;An immaculate test page proves nothing about accounting reports with embedded fonts.&lt;/p&gt;
&lt;h2 id=&#34;permissions-model-deny-ambiguity-early&#34;&gt;Permissions model: deny ambiguity early&lt;/h2&gt;
&lt;p&gt;Permission bugs are expensive because they damage trust from both sides:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;too permissive -&amp;gt; security concern&lt;/li&gt;
&lt;li&gt;too restrictive -&amp;gt; productivity concern&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We moved to group-based share ownership and banned ad-hoc one-off user ACL edits in production without change notes. This felt strict and paid off quickly.&lt;/p&gt;
&lt;p&gt;The rule was simple:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;if access need is recurring, represent it as group policy&lt;/li&gt;
&lt;li&gt;if access need is temporary, represent it with explicit expiry&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Temporary exceptions without expiry become permanent architecture by accident.&lt;/p&gt;
&lt;h2 id=&#34;migration-observability-for-fileidentity-services&#34;&gt;Migration observability for file/identity services&lt;/h2&gt;
&lt;p&gt;For this phase, useful metrics were:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;auth failures per source host&lt;/li&gt;
&lt;li&gt;file server latency during peak office windows&lt;/li&gt;
&lt;li&gt;share-level error rates&lt;/li&gt;
&lt;li&gt;print queue backlog and failure codes&lt;/li&gt;
&lt;li&gt;top denied access paths&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The &amp;ldquo;top denied paths&amp;rdquo; report became our best policy feedback loop. It showed where documentation was wrong, where group membership drifted, and where users still followed old habits.&lt;/p&gt;
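&lt;p&gt;The report itself was a one-pipeline job. A sketch, assuming a hypothetical audit log with one &amp;ldquo;DENIED user path&amp;rdquo; line per event; the format and filename are illustrative:&lt;/p&gt;

```shell
# top 10 denied paths this week, most-hit first
grep '^DENIED' access-audit.log | awk '{ print $3 }' | sort | uniq -c | sort -rn | head -10
```

&lt;p&gt;Cheap to run, and the output reads like a to-do list: each line is either a doc fix, a group-membership fix, or a user habit to retrain.&lt;/p&gt;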
&lt;h2 id=&#34;incident-story-the-phantom-permission-outage&#34;&gt;Incident story: the phantom permission outage&lt;/h2&gt;
&lt;p&gt;We once lost half a day to what looked like widespread permission corruption after a migration wave. Root cause was not ACL damage. Root cause was client-side credential caching from old identities on a batch of desktops that were never fully logged out after account mapping changes.&lt;/p&gt;
&lt;p&gt;Fix:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;clear cached credentials&lt;/li&gt;
&lt;li&gt;force re-auth&lt;/li&gt;
&lt;li&gt;re-test representative access matrix&lt;/li&gt;
&lt;li&gt;update runbook with pre-cutover &amp;ldquo;credential cache reset&amp;rdquo; step&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The lesson: mixed-network incidents often come from boundary behavior, not core service logic.&lt;/p&gt;
&lt;h2 id=&#34;change-control-without-bureaucracy-theater&#34;&gt;Change control without bureaucracy theater&lt;/h2&gt;
&lt;p&gt;By 2008, we had enough scars to adopt lightweight but real change control:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;one-page change intent&lt;/li&gt;
&lt;li&gt;explicit rollback&lt;/li&gt;
&lt;li&gt;affected services/users&lt;/li&gt;
&lt;li&gt;pre/post validation checklist&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Not a ticketing cathedral. Just enough structure to prevent repeat mistakes.&lt;/p&gt;
&lt;p&gt;Migration work tempts improvisation. Improvisation is useful during investigation, dangerous during production rollout.&lt;/p&gt;
&lt;h2 id=&#34;the-cultural-upgrade-hidden-inside-technical-migration&#34;&gt;The cultural upgrade hidden inside technical migration&lt;/h2&gt;
&lt;p&gt;The largest win from this phase was cultural:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;infrastructure became more legible&lt;/li&gt;
&lt;li&gt;ownership became less tribal&lt;/li&gt;
&lt;li&gt;junior operators could contribute safely&lt;/li&gt;
&lt;li&gt;users got clearer communication&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Linux did not magically deliver this. Clear boundaries and documented policy delivered it.&lt;/p&gt;
&lt;p&gt;Samba, directory services, and Unix tooling gave us the implementation path.&lt;/p&gt;
&lt;h2 id=&#34;if-you-are-planning-this-now&#34;&gt;If you are planning this now&lt;/h2&gt;
&lt;p&gt;If you are a small or mid-size team in 2010 planning a mixed-network migration, here is the short list that matters:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;inventory identities before touching auth backends&lt;/li&gt;
&lt;li&gt;migrate by team/business workflow, not by software component&lt;/li&gt;
&lt;li&gt;use group policy over user-by-user exceptions&lt;/li&gt;
&lt;li&gt;keep local emergency admin access&lt;/li&gt;
&lt;li&gt;test printers with real documents&lt;/li&gt;
&lt;li&gt;track top denied paths and act on them weekly&lt;/li&gt;
&lt;li&gt;publish plain-language migration notes users can forward internally&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;If these are in place, tooling choice becomes manageable.
If these are missing, tooling choice will not save you.&lt;/p&gt;
&lt;h2 id=&#34;what-we-documented-after-every-team-migration&#34;&gt;What we documented after every team migration&lt;/h2&gt;
&lt;p&gt;A useful discipline in this phase was writing a short &amp;ldquo;migration memo&amp;rdquo; after each department cutover. Not a giant postmortem deck. One page, same headings every time:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;what changed&lt;/li&gt;
&lt;li&gt;what broke&lt;/li&gt;
&lt;li&gt;what surprised us&lt;/li&gt;
&lt;li&gt;what to do differently next wave&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Patterns appeared quickly. We discovered, for example, that teams with the fewest technical customizations still generated many support requests if communications were vague, while highly customized teams generated fewer tickets when we sent exact path/credential examples ahead of time.&lt;/p&gt;
&lt;p&gt;The lesson was uncomfortable and valuable: support volume was often a documentation quality metric, not a complexity metric.&lt;/p&gt;
&lt;h2 id=&#34;decommissioning-old-services-without-creating-panic&#34;&gt;Decommissioning old services without creating panic&lt;/h2&gt;
&lt;p&gt;One more operational gap deserves mention: graceful decommissioning. Teams often migrate to new shares and auth paths, then leave old services half-alive &amp;ldquo;just in case.&amp;rdquo; Six months later those half-alive systems become shadow dependencies nobody can explain.&lt;/p&gt;
&lt;p&gt;We fixed this by adding an explicit retirement protocol:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;announce decommission date in advance&lt;/li&gt;
&lt;li&gt;publish list of known remaining users/scripts&lt;/li&gt;
&lt;li&gt;provide one final migration clinic window&lt;/li&gt;
&lt;li&gt;switch old service to read-only for a short grace period&lt;/li&gt;
&lt;li&gt;archive and remove with signed-off checklist&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Read-only grace periods were particularly effective. They surfaced hidden dependencies safely without encouraging indefinite delay.&lt;/p&gt;
&lt;p&gt;Another small but effective trick was publishing a &amp;ldquo;last-seen usage&amp;rdquo; report for legacy shares during the retirement window. Seeing concrete timestamps and hostnames moved conversations from fear to evidence. Teams could decide with confidence instead of intuition, and decommission dates stopped slipping for emotional reasons.&lt;/p&gt;
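&lt;p&gt;The last-seen report was equally boring to produce. A sketch, assuming hypothetical access-log lines of the form &amp;ldquo;timestamp host share&amp;rdquo; with sortable timestamps:&lt;/p&gt;

```shell
# Most recent access per share, with the host that made it, from stdin.
# Assumed (hypothetical) line format: "TIMESTAMP HOST SHARE".
last_seen() {
  awk '{ if ($1 > seen[$3]) { seen[$3] = $1; who[$3] = $2 } }
       END { for (s in who) print s, seen[s], who[s] }' |
    sort
}

printf '%s\n' \
  '2010-04-01T08:00 pc-accounting OLD_SHARE' \
  '2010-04-20T17:45 pc-warehouse OLD_SHARE' \
  '2010-03-12T09:30 pc-frontdesk LEGACY_SCANS' |
  last_seen
```

&lt;p&gt;Publishing that output verbatim during the retirement window was usually enough.&lt;/p&gt;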
&lt;p&gt;Related reading:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://turbovision.in6-addr.net/retro/linux/migrations/from-mailboxes-to-everything-internet-part-2-mail-migration-under-real-traffic/&#34;&gt;From Mailboxes to Everything Internet, Part 2: Mail Migration Under Real Traffic&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://turbovision.in6-addr.net/musings/clarity-is-an-operational-advantage/&#34;&gt;Clarity Is an Operational Advantage&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
    </item>
    
    <item>
      <title>From Mailboxes to Everything Internet, Part 2: Mail Migration Under Real Traffic</title>
      <link>https://turbovision.in6-addr.net/linux/migrations/from-mailboxes-to-everything-internet-part-2-mail-migration-under-real-traffic/</link>
      <pubDate>Tue, 27 Feb 2007 00:00:00 +0000</pubDate>
      <lastBuildDate>Tue, 27 Feb 2007 00:00:00 +0000</lastBuildDate>
      <guid>https://turbovision.in6-addr.net/linux/migrations/from-mailboxes-to-everything-internet-part-2-mail-migration-under-real-traffic/</guid>
      <description>&lt;p&gt;If Part 1 was about building a bridge, Part 2 is about learning to drive trucks across it in bad weather.&lt;/p&gt;
&lt;p&gt;Once mail leaves &amp;ldquo;small local utility&amp;rdquo; territory and becomes a central service, the conversation changes. You stop asking &amp;ldquo;can it send and receive?&amp;rdquo; and start asking:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;can it survive hostile traffic?&lt;/li&gt;
&lt;li&gt;can it be operated by more than one person?&lt;/li&gt;
&lt;li&gt;can policy changes be rolled out without accidental outages?&lt;/li&gt;
&lt;li&gt;can users trust it on weekdays when everyone is overloaded?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In our case, that transition happened between 2001 and 2007. By then, Linux mail infrastructure was no longer an experiment confined to geek circles. It was production, with all the consequences.&lt;/p&gt;
&lt;h2 id=&#34;why-we-moved-away-from-wizard-level-config-only&#34;&gt;Why we moved away from &amp;ldquo;wizard-level config only&amp;rdquo;&lt;/h2&gt;
&lt;p&gt;Many older setups depended on one person who understood every macro, alias map, and legacy hack in a mail config. That worked until that person got sick, changed jobs, or simply slept through a pager alert.&lt;/p&gt;
&lt;p&gt;Our first explicit migration goal in this phase was organizational, not technical:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;A competent operator should be able to reason about mail behavior from plain files and runbooks.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;That goal pushed us toward simpler policy expression and clearer service boundaries. Whether your final stack was sendmail, postfix, qmail, or exim mattered less than whether your team could operate it calmly.&lt;/p&gt;
&lt;h2 id=&#34;the-stack-boundary-model-that-reduced-incidents&#34;&gt;The stack boundary model that reduced incidents&lt;/h2&gt;
&lt;p&gt;We separated the pipeline into explicit layers:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;SMTP ingress/egress policy&lt;/li&gt;
&lt;li&gt;queue and routing&lt;/li&gt;
&lt;li&gt;content filtering (spam/virus)&lt;/li&gt;
&lt;li&gt;mailbox delivery and retrieval (POP/IMAP)&lt;/li&gt;
&lt;li&gt;user/admin observability&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The key idea: one layer should fail in ways visible to the next, not silently mutate behavior.&lt;/p&gt;
&lt;p&gt;When all logic is crammed into one giant config, failure states become ambiguous. Ambiguity is expensive in incidents.&lt;/p&gt;
&lt;h2 id=&#34;real-world-migration-pattern-parallel-path-then-cutover&#34;&gt;Real-world migration pattern: parallel path, then cutover&lt;/h2&gt;
&lt;p&gt;Our cutovers got safer once we standardized this pattern:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;deploy new MTA host in parallel&lt;/li&gt;
&lt;li&gt;mirror relevant policy maps and aliases&lt;/li&gt;
&lt;li&gt;run shadow traffic tests (submission + delivery + bounce paths)&lt;/li&gt;
&lt;li&gt;cut one low-risk domain first&lt;/li&gt;
&lt;li&gt;watch queue/error behavior for a week&lt;/li&gt;
&lt;li&gt;migrate high-volume domains next&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This sounds slow. It is fast compared to cleaning up one bad all-at-once switch.&lt;/p&gt;
&lt;h2 id=&#34;the-anti-spam-era-changed-architecture&#34;&gt;The anti-spam era changed architecture&lt;/h2&gt;
&lt;p&gt;By 2005-2007, spam pressure made &amp;ldquo;mail server&amp;rdquo; and &amp;ldquo;mail security&amp;rdquo; inseparable. A useful configuration had to combine:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;connection-level checks (HELO sanity, rate controls)&lt;/li&gt;
&lt;li&gt;policy checks (relay restrictions, recipient validation)&lt;/li&gt;
&lt;li&gt;reputation checks (RBLs)&lt;/li&gt;
&lt;li&gt;content scoring (SpamAssassin-like layer)&lt;/li&gt;
&lt;li&gt;malware scanning&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;A typical policy layout in that era looked conceptually like:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt; 1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 5
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 6
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 7
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 8
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 9
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;10
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;11
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;12
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;ingress:
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  reject_non_fqdn_sender
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  reject_non_fqdn_recipient
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  reject_unknown_sender_domain
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  reject_unauth_destination
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  check_rbl zen.example-rbl.net
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  pass_to_content_filter
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;content_filter:
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  spam_score_threshold = 6.0
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  quarantine_threshold = 12.0
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  antivirus = enabled&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;The exact knobs differed by implementation. The architecture of staged decision points did not.&lt;/p&gt;
&lt;h2 id=&#34;false-positives-the-quiet-business-outage&#34;&gt;False positives: the quiet business outage&lt;/h2&gt;
&lt;p&gt;Most teams fear spam floods. We learned to fear false positives just as much. Aggressive filtering can silently break legitimate workflows, especially for smaller orgs where one supplier&amp;rsquo;s odd mail setup is still mission-critical.&lt;/p&gt;
&lt;p&gt;We moved to a tiered posture:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;reject only on high-confidence transport policy violations&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;tag/quarantine for uncertain content cases&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;teach users to report false positives with full headers&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This reduced support friction and preserved trust.&lt;/p&gt;
&lt;p&gt;A service users trust imperfectly is a service they route around with private inboxes, and then governance fails quietly.&lt;/p&gt;
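&lt;p&gt;The tiered posture reduces to a small decision function. A sketch using the illustrative thresholds from the policy layout above (integer scores for simplicity; real content filters emit fractional scores):&lt;/p&gt;

```shell
# Tiered spam decision mirroring the thresholds in the policy layout
# above (illustrative values; real scanners use fractional scores).
classify() {
  if [ "$1" -ge 12 ]; then
    echo quarantine   # high confidence: hold it, never silently drop
  elif [ "$1" -ge 6 ]; then
    echo tag          # uncertain: deliver with a header users can filter on
  else
    echo deliver
  fi
}

classify 3    # deliver
classify 8    # tag
classify 15   # quarantine
```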
&lt;h2 id=&#34;queue-operations-numbers-that-actually-mattered&#34;&gt;Queue operations: numbers that actually mattered&lt;/h2&gt;
&lt;p&gt;People love total queue size graphs. Useful, but incomplete. We tracked a more operational set:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;queue age percentile (P50/P95)&lt;/li&gt;
&lt;li&gt;deferred reasons by top code/domain&lt;/li&gt;
&lt;li&gt;bounce class distribution&lt;/li&gt;
&lt;li&gt;local disk growth vs queue growth&lt;/li&gt;
&lt;li&gt;retry success after first deferral&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Why queue age percentile? Because a small queue with very old entries is often more dangerous than a large queue of fresh retries.&lt;/p&gt;
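&lt;p&gt;Queue age percentiles are cheap once you can extract per-message ages. A sketch, assuming one integer age in minutes per line on stdin (how you extract the ages depends on your MTA&amp;rsquo;s queue listing):&lt;/p&gt;

```shell
# P50/P95 of queue-entry ages, one integer age (minutes) per line.
# Nearest-rank percentile: index = ceiling of N * p.
queue_age_pcts() {
  sort -n | awk '
    { a[NR] = $1 }
    function pct(p,   i) { i = int(NR * p); if (i != NR * p) i = i + 1; return a[i] }
    END {
      if (NR == 0) { print "empty queue"; exit }
      print "p50=" pct(0.50), "p95=" pct(0.95)
    }'
}

# Nine fresh retries plus one three-day-old entry:
printf '%s\n' 5 6 6 7 7 8 8 9 9 4320 | queue_age_pcts
```

&lt;p&gt;In the example, the median age is seven minutes while P95 is three days: exactly the &amp;ldquo;small queue with very old entries&amp;rdquo; pattern that a total-size graph hides.&lt;/p&gt;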
&lt;h2 id=&#34;submission-and-auth-became-first-class&#34;&gt;Submission and auth became first-class&lt;/h2&gt;
&lt;p&gt;As users moved from fixed office networks to mixed environments, authenticated submission stopped being optional. We separated trusted relay from authenticated submission explicitly and documented it in end-user instructions.&lt;/p&gt;
&lt;p&gt;A minimal policy split looked like:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;relay without auth only from managed LAN ranges&lt;/li&gt;
&lt;li&gt;require auth for all remote submission&lt;/li&gt;
&lt;li&gt;enforce TLS where practical&lt;/li&gt;
&lt;li&gt;disable legacy insecure paths gradually with communication windows&lt;/li&gt;
&lt;/ul&gt;
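&lt;p&gt;Written out conceptually (as before, not any particular MTA&amp;rsquo;s syntax), the split was small:&lt;/p&gt;

```text
# conceptual policy, not distro-specific syntax
relay_without_auth_from = 192.168.0.0/24      # managed LAN ranges only
submission_port         = 587
submission_requires     = auth
submission_tls          = required_where_practical
legacy_port25_auth      = deprecated, removed after comms window
```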
&lt;p&gt;People remember technical changes. They forget user communication. In migrations, communication is part of uptime.&lt;/p&gt;
&lt;h2 id=&#34;logging-from-forensic-artifact-to-daily-dashboard&#34;&gt;Logging: from forensic artifact to daily dashboard&lt;/h2&gt;
&lt;p&gt;Early on, logs were mostly used after incidents. By mid-migration, we treated them as daily control instruments. We built tiny scripts that summarized:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;top rejected senders&lt;/li&gt;
&lt;li&gt;top deferred recipient domains&lt;/li&gt;
&lt;li&gt;top local auth failures&lt;/li&gt;
&lt;li&gt;per-hour inbound/outbound volume&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Even crude summaries built operator intuition fast. If Tuesday looks unlike every previous Tuesday, investigate before users notice.&lt;/p&gt;
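&lt;p&gt;None of those summaries needed more than a pipeline over the day&amp;rsquo;s log. A sketch for the first one, assuming a hypothetical format where reject lines contain &amp;ldquo;reject&amp;rdquo; and a &amp;ldquo;from=&amp;rdquo; token (substitute your MTA&amp;rsquo;s real log fields):&lt;/p&gt;

```shell
# Top rejected senders from mail-log lines on stdin.
# Assumed (hypothetical) format: lines with "reject" and "from=ADDRESS".
top_rejected() {
  awk '/reject/ {
         if (match($0, /from=[^ ]+/)) {
           s = substr($0, RSTART + 5, RLENGTH - 5)
           count[s]++
         }
       }
       END { for (s in count) print count[s], s }' |
    sort -rn
}

printf '%s\n' \
  'May 12 10:01 mx1 reject from=spam@bulk.example to=joe' \
  'May 12 10:02 mx1 reject from=spam@bulk.example to=ann' \
  'May 12 10:07 mx1 accept from=ok@partner.example to=joe' |
  top_rejected
```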
&lt;h2 id=&#34;dns-and-reputation-maintenance-discipline&#34;&gt;DNS and reputation maintenance discipline&lt;/h2&gt;
&lt;p&gt;Mail reliability in 2007 is tightly coupled to DNS hygiene and sending reputation. We added recurring checks for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;forward/reverse consistency&lt;/li&gt;
&lt;li&gt;MX consistency after planned changes&lt;/li&gt;
&lt;li&gt;SPF correctness&lt;/li&gt;
&lt;li&gt;stale secondary records&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;A single stale record can cause &amp;ldquo;works for most people&amp;rdquo; failures that consume days.&lt;/p&gt;
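&lt;p&gt;The forward/reverse check is the easiest to script. In the sketch below the comparison is a plain function so it can be exercised without the network; the actual lookups (commented out) assume &lt;code&gt;dig&lt;/code&gt; is available:&lt;/p&gt;

```shell
# Compare a hostname with the name a reverse lookup returned.
# PTR answers usually carry a trailing dot, so accept both forms.
check_pair() {
  if [ "$2" = "$1" ] || [ "$2" = "$1." ]; then
    echo "OK $1"
  else
    echo "MISMATCH $1 ptr=$2"
  fi
}

# Real use, network required:
#   addr=$(dig +short mail1.example.net)
#   check_pair mail1.example.net "$(dig +short -x "$addr")"

check_pair mail1.example.net mail1.example.net.
check_pair mail2.example.net oldname.example.net.
```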
&lt;h2 id=&#34;incident-story-the-day-policy-order-bit-us&#34;&gt;Incident story: the day policy order bit us&lt;/h2&gt;
&lt;p&gt;One outage class recurred until we fixed our process: policy ordering mistakes.&lt;/p&gt;
&lt;p&gt;A config reload in which one rule has moved above another can flip behavior from permissive to catastrophic. We had one deploy where recipient validation executed before a required local map was loaded in the new process context. The external effect: valid local recipients were temporarily rejected with permanent 5xx errors.&lt;/p&gt;
&lt;p&gt;The post-incident fix was procedural:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;stage config in syntax check mode&lt;/li&gt;
&lt;li&gt;run policy simulation against known-good/known-bad test cases&lt;/li&gt;
&lt;li&gt;reload in maintenance window&lt;/li&gt;
&lt;li&gt;verify with live probes&lt;/li&gt;
&lt;li&gt;keep rollback snippet ready&lt;/li&gt;
&lt;/ol&gt;
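&lt;p&gt;Step 2 is the one teams skip, and it needs no special tooling: a table of known-good and known-bad probes run against whatever decision point you can expose on a test instance. A sketch with a hypothetical stand-in policy function:&lt;/p&gt;

```shell
# Table-driven policy simulation. policy_decision is a hypothetical
# stand-in; wire it to your real policy check on a test instance.
policy_decision() {
  case "$1" in
    *@example.net) echo accept ;;
    *)             echo reject ;;
  esac
}

expect() {
  got=$(policy_decision "$1")
  if [ "$got" = "$2" ]; then
    echo "PASS $1 $got"
  else
    echo "FAIL $1 expected=$2 got=$got"
  fi
}

# Known-good and known-bad cases, run before every reload:
expect alice@example.net accept
expect bulk@spamhost.example reject
```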
&lt;p&gt;The technical fix was small. The process fix prevented repeats.&lt;/p&gt;
&lt;h2 id=&#34;the-human-layer-runbooks-and-ownership&#34;&gt;The human layer: runbooks and ownership&lt;/h2&gt;
&lt;p&gt;Mail operations improved when we wrote short, explicit runbooks and attached clear ownership:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&amp;ldquo;high queue depth but low queue age&amp;rdquo;&lt;/li&gt;
&lt;li&gt;&amp;ldquo;low queue depth but high queue age&amp;rdquo;&lt;/li&gt;
&lt;li&gt;&amp;ldquo;sudden outbound spike&amp;rdquo;&lt;/li&gt;
&lt;li&gt;&amp;ldquo;auth failure burst&amp;rdquo;&lt;/li&gt;
&lt;li&gt;&amp;ldquo;upstream DNS inconsistency&amp;rdquo;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Each runbook had:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;first checks&lt;/li&gt;
&lt;li&gt;known bad patterns&lt;/li&gt;
&lt;li&gt;escalation condition&lt;/li&gt;
&lt;li&gt;rollback or containment action&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The format matters less than consistency. Under stress, consistency wins.&lt;/p&gt;
&lt;h2 id=&#34;migration-economics-why-smaller-steps-are-cheaper&#34;&gt;Migration economics: why smaller steps are cheaper&lt;/h2&gt;
&lt;p&gt;A common argument was &amp;ldquo;let&amp;rsquo;s wait and migrate everything when we also redo identity and web hosting.&amp;rdquo; We tried that once and regretted it. Bundling too many moving parts creates coupled risk and unclear root causes.&lt;/p&gt;
&lt;p&gt;Mail migration became tractable when we treated it as its own program with clear acceptance gates:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;transport reliability&lt;/li&gt;
&lt;li&gt;policy correctness&lt;/li&gt;
&lt;li&gt;abuse resilience&lt;/li&gt;
&lt;li&gt;operator clarity&lt;/li&gt;
&lt;li&gt;user communication quality&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Only after those stabilized did we stack adjacent migrations.&lt;/p&gt;
&lt;h2 id=&#34;what-changes-in-2007-operations&#34;&gt;What changes in 2007 operations&lt;/h2&gt;
&lt;p&gt;Compared with 2001, a 2007 Linux mail setup in our environment looked less romantic and much more professional:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;explicit relay boundaries&lt;/li&gt;
&lt;li&gt;documented policy layers&lt;/li&gt;
&lt;li&gt;operational dashboards from logs&lt;/li&gt;
&lt;li&gt;recurring DNS/reputation checks&lt;/li&gt;
&lt;li&gt;reproducible deployment and rollback&lt;/li&gt;
&lt;li&gt;practical abuse handling without user-hostile defaults&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We did not eliminate incidents. We made incidents legible.&lt;/p&gt;
&lt;p&gt;That is the difference between hobby administration and service operations.&lt;/p&gt;
&lt;h2 id=&#34;practical-checklist-if-you-are-migrating-this-year&#34;&gt;Practical checklist: if you are migrating this year&lt;/h2&gt;
&lt;p&gt;If you are planning a migration this year, this is the condensed list I would tape above the rack:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;define policy boundaries before touching software packages&lt;/li&gt;
&lt;li&gt;build and test in parallel, then cut over domain-by-domain&lt;/li&gt;
&lt;li&gt;implement anti-spam as layered decisions, not one giant hammer&lt;/li&gt;
&lt;li&gt;measure queue age, not just queue size&lt;/li&gt;
&lt;li&gt;separate LAN relay from authenticated submission&lt;/li&gt;
&lt;li&gt;automate log summaries your operators will actually read&lt;/li&gt;
&lt;li&gt;simulate policy before reload&lt;/li&gt;
&lt;li&gt;treat user comms as part of the rollout, not afterthought&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;If you do only four of these, do 1, 3, 4, and 7.&lt;/p&gt;
&lt;h2 id=&#34;weekly-review-ritual-that-kept-us-honest&#34;&gt;Weekly review ritual that kept us honest&lt;/h2&gt;
&lt;p&gt;One habit improved this migration more than any single package choice: a short weekly mail operations review with evidence, not opinions.&lt;/p&gt;
&lt;p&gt;The agenda stayed fixed:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;queue age trend over last seven days&lt;/li&gt;
&lt;li&gt;top five defer reasons and whether each is improving&lt;/li&gt;
&lt;li&gt;false-positive reports with root-cause category&lt;/li&gt;
&lt;li&gt;auth failure clusters by source network&lt;/li&gt;
&lt;li&gt;one policy/rule cleanup item&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;We kept the meeting to thirty minutes and required one concrete action at the end. If there was no action, we were probably admiring graphs instead of improving service.&lt;/p&gt;
&lt;p&gt;This ritual sounds simple because it is simple. The impact came from repetition. It turned scattered incidents into a feedback loop and gradually removed &amp;ldquo;mystery behavior&amp;rdquo; from the system.&lt;/p&gt;
&lt;p&gt;Related reading:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://turbovision.in6-addr.net/retro/linux/migrations/from-mailboxes-to-everything-internet-part-1-the-gateway-years/&#34;&gt;From Mailboxes to Everything Internet, Part 1: The Gateway Years&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://turbovision.in6-addr.net/hacking/tools/terminal-kits-for-incident-triage/&#34;&gt;Terminal Kits for Incident Triage&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
    </item>
    
    <item>
      <title>From Mailboxes to Everything Internet, Part 1: The Gateway Years</title>
      <link>https://turbovision.in6-addr.net/linux/migrations/from-mailboxes-to-everything-internet-part-1-the-gateway-years/</link>
      <pubDate>Tue, 14 Mar 2006 00:00:00 +0000</pubDate>
      <lastBuildDate>Tue, 14 Mar 2006 00:00:00 +0000</lastBuildDate>
      <guid>https://turbovision.in6-addr.net/linux/migrations/from-mailboxes-to-everything-internet-part-1-the-gateway-years/</guid>
      <description>&lt;p&gt;By the time people started saying &amp;ldquo;everything is online now,&amp;rdquo; many of us had already lived through two different worlds that barely spoke the same language.&lt;/p&gt;
&lt;p&gt;The first world was mailbox culture: dial-up nodes, message bases, Crosspoint setups, nightly rituals, packet exchanges, and local sysops who could fix a broken feed with a modem command and a pot of coffee. The second world was internet service culture: DNS, MX records, SMTP relays, POP boxes, always-on links, and users asking why the web was &amp;ldquo;slow today&amp;rdquo; as if bandwidth was weather.&lt;/p&gt;
&lt;p&gt;This series is about that crossing.&lt;/p&gt;
&lt;p&gt;Part 1 is the beginning of the crossing: the gateway years, when we still had one foot in mailbox software and one foot in Linux services, and we built bridges because nothing else existed yet.&lt;/p&gt;
&lt;h2 id=&#34;the-room-where-migration-began&#34;&gt;The room where migration began&lt;/h2&gt;
&lt;p&gt;Our first Linux gateway did not arrive as strategy. It arrived as a beige box rescued from an office upgrade pile, with a noisy fan and a disk that sounded like it was counting down to failure. We installed a small distribution, gave it a static IP, and told ourselves this was &amp;ldquo;temporary.&amp;rdquo; It stayed in production for three years.&lt;/p&gt;
&lt;p&gt;The old world was stable in the way old systems become stable: every sharp edge had already cut someone, so everyone knew where not to touch. Crosspoint was doing its job. Message exchange windows were predictable. Users knew when lines were busy and when downloads would be faster. Nothing was modern, but everything had shape.&lt;/p&gt;
&lt;p&gt;The new world was not stable. It was fast and constantly changing. Protocol expectations moved. User behavior moved. Threat models moved. Providers moved. The migration problem was not &amp;ldquo;install Linux and done.&amp;rdquo; The migration problem was preserving trust while replacing almost every layer under that trust.&lt;/p&gt;
&lt;p&gt;That is why gateways mattered. They let us migrate behavior first and infrastructure second.&lt;/p&gt;
&lt;h2 id=&#34;why-gateways-beat-big-bang-migrations&#34;&gt;Why gateways beat big-bang migrations&lt;/h2&gt;
&lt;p&gt;The smartest decision we made was refusing the heroic rewrite mindset. We did not announce one switch date and burn the old stack. We inserted a Linux gateway between known systems and unknown systems, then moved one concern at a time:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;forwarding paths&lt;/li&gt;
&lt;li&gt;addressing and aliases&lt;/li&gt;
&lt;li&gt;queue behavior&lt;/li&gt;
&lt;li&gt;retries and failure visibility&lt;/li&gt;
&lt;li&gt;user-facing tooling&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;That ordering was not glamorous, but it protected operations.&lt;/p&gt;
&lt;p&gt;Big-bang migrations look fast on whiteboards and expensive in real life. Gateways look slow on whiteboards and fast in incident response.&lt;/p&gt;
&lt;h2 id=&#34;the-first-practical-bridge-message-transport&#34;&gt;The first practical bridge: message transport&lt;/h2&gt;
&lt;p&gt;The earliest bridge usually looked like this:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;mailbox network traffic continues as before&lt;/li&gt;
&lt;li&gt;internet-bound traffic exits through Linux SMTP path&lt;/li&gt;
&lt;li&gt;incoming internet mail lands on Linux first&lt;/li&gt;
&lt;li&gt;local translation/forwarding rules feed legacy mailboxes where needed&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This gave us one powerful property: we could debug internet path issues without disrupting internal mailbox flows that users depended on daily.&lt;/p&gt;
&lt;p&gt;A minimal relay policy draft from that era often looked like:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;5
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;6
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;7
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;# conceptual policy, not distro-specific syntax
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;allow_relay_from = 127.0.0.1, 192.168.0.0/24
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;default_action   = reject
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;local_domains    = example.net, bbs.example.net
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;smart_host       = isp-relay.example.net
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;queue_retry      = 15m
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;max_queue_age    = 3d&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;You can replace every keyword above with your preferred MTA syntax. The architectural point is invariant: explicit relay boundaries, explicit domains, explicit queue policy.&lt;/p&gt;
&lt;h2 id=&#34;addressing-drift-the-hidden-migration-tax&#34;&gt;Addressing drift: the hidden migration tax&lt;/h2&gt;
&lt;p&gt;The first operational pain was not modem scripts or DNS records. It was naming drift.&lt;/p&gt;
&lt;p&gt;Mailbox-era naming conventions and internet-era address conventions were often related but not identical. We had aliases in user muscle memory that did not map cleanly to internet address rules. People had decades of habit in some cases:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;old handles&lt;/li&gt;
&lt;li&gt;area-specific routing assumptions&lt;/li&gt;
&lt;li&gt;implicit local-domain shortcuts&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The migration trick was to preserve familiar entry points while moving canonical identity to internet-safe forms.&lt;/p&gt;
&lt;p&gt;We ended up with translation tables that looked boring and saved us hundreds of support mails:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;4
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;old_alias      -&amp;gt; canonical_mailbox
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;sysop          -&amp;gt; admin@example.net
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;support-local  -&amp;gt; helpdesk@example.net
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;john.d         -&amp;gt; john.doe@example.net&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Most migration failures are identity failures dressed as transport failures.&lt;/p&gt;
&lt;h2 id=&#34;dns-is-where-we-stopped-improvising&#34;&gt;DNS is where we stopped improvising&lt;/h2&gt;
&lt;p&gt;In mailbox culture, many routing assumptions lived in operator knowledge. In internet culture, that same routing intent must be represented in DNS records that other systems can query and trust.&lt;/p&gt;
&lt;p&gt;The day we moved MX handling from ad-hoc provider defaults to explicit records was the day incident triage got easier.&lt;/p&gt;
&lt;p&gt;A tiny zone fragment captured more operational truth than many meetings:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;4
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-dns&#34; data-lang=&#34;dns&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nc&#34;&gt;@&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;      &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;IN&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;  &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;MX&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;sc&#34;&gt;10&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;py&#34;&gt;mail1.example.net.&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nc&#34;&gt;@&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;      &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;IN&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;  &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;MX&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;sc&#34;&gt;20&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;py&#34;&gt;mail2.example.net.&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nc&#34;&gt;mail1&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;  &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;IN&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;  &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;A&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;  &lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;203.0.113.15&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nc&#34;&gt;mail2&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;  &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;IN&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;  &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;A&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;  &lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;203.0.113.16&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;The key is not the syntax. The key is declaring fallback behavior intentionally: if the primary host is down, we already know what should happen next.&lt;/p&gt;
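&lt;p&gt;One habit that made the intent checkable: verifying the published fallback from a host outside the network. A minimal sketch, assuming a POSIX shell and dig on the auditing host; example.net is a placeholder domain:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-sh&#34; data-lang=&#34;sh&#34;&gt;#!/bin/sh
# Warn when a zone publishes fewer than two MX hosts, i.e. no fallback.
# Pass it the output of: dig +short MX example.net
# Arguments arrive as pairs: preference host preference host ...
mx_fallback_ok() {
    hosts=$(( $# / 2 ))
    if [ $hosts -lt 2 ]; then
        echo WARN: $hosts MX host published, no fallback
        return 1
    fi
    echo OK: $hosts MX hosts published
}
# Typical call from the audit host:
# mx_fallback_ok $(dig +short MX example.net)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Running this from a second network, not from the gateway itself, is the point: the fallback only exists if the outside world can see it.&lt;/p&gt;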
&lt;h2 id=&#34;queue-literacy-as-survival-skill&#34;&gt;Queue literacy as survival skill&lt;/h2&gt;
&lt;p&gt;Every sysadmin migrating to internet mail learns this eventually: queue behavior is where confidence is either built or destroyed.&lt;/p&gt;
&lt;p&gt;Users do not care that a remote host gave a transient 4xx. They care whether their message disappeared.&lt;/p&gt;
&lt;p&gt;So we trained ourselves and junior operators to answer three questions fast:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Is the message queued?&lt;/li&gt;
&lt;li&gt;Why is it queued?&lt;/li&gt;
&lt;li&gt;When is the next retry?&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Those three answers turn panic into process.&lt;/p&gt;
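&lt;p&gt;The three questions map directly onto queue inspection. A sketch, assuming a sendmail-compatible mailq whose listing has been saved to a file first; queue formats vary by MTA, so the matching here is an assumption to adapt locally:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-sh&#34; data-lang=&#34;sh&#34;&gt;#!/bin/sh
# Answer the triage questions for one recipient from a saved queue listing.
# Usage: mailq | tee /tmp/q.txt, then: queue_triage user@example.net /tmp/q.txt
queue_triage() {
    addr=$1
    q=$2
    # 1) Is the message queued at all?
    if grep -q $addr $q; then
        echo QUEUED: yes
        # 2) Why? The deferral reason usually sits next to the recipient line.
        grep -B1 -A1 $addr $q
    else
        echo QUEUED: no -- check delivery logs instead
    fi
    # 3) Exact retry timing is MTA-specific; listing depth and entry age
    # give the operator enough context to estimate the next attempt.
    echo DEPTH: $(grep -c ^ $q) listing lines
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;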
&lt;p&gt;During the gateway years, we posted a laminated &amp;ldquo;mail panic checklist&amp;rdquo; near the rack:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;check queue depth&lt;/li&gt;
&lt;li&gt;sample queue reasons&lt;/li&gt;
&lt;li&gt;verify DNS and upstream reachability&lt;/li&gt;
&lt;li&gt;confirm local disk not full&lt;/li&gt;
&lt;li&gt;verify daemon alive and accepting local submission&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It looked primitive. It prevented chaos.&lt;/p&gt;
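&lt;p&gt;The laminated list translates almost line for line into shell. A sketch, where the spool path, threshold, and port are assumptions for a sendmail-era gateway, not a universal recipe:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-sh&#34; data-lang=&#34;sh&#34;&gt;#!/bin/sh
# Mail panic checklist as functions. Adapt paths and thresholds to the
# local MTA before trusting any of this.
check_queue_depth() {
    # 1) queue depth: count entries in the spool, e.g. /var/spool/mqueue
    ls $1 | wc -l
}
check_dns() {
    # 3) resolver health: count answers for a known-good name
    dig +short $1 | grep -c .
}
check_disk() {
    # 4) local disk not full: fail when the spool filesystem is 95% or more
    pct=$(df -P $1 | awk 'NR==2 { print $5+0 }')
    [ $pct -lt 95 ]
}
check_daemon() {
    # 5) daemon alive and answering on the local SMTP port
    printf 'QUIT\r\n' | nc -w 3 127.0.0.1 25 | grep -q 220
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Step 2, sampling queue reasons, stays manual on purpose: the reasons are free text, and the operator&amp;rsquo;s judgment is the point of the exercise.&lt;/p&gt;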
&lt;h2 id=&#34;security-changed-the-social-contract&#34;&gt;Security changed the social contract&lt;/h2&gt;
&lt;p&gt;Mailbox systems had abuse, but internet-facing SMTP changed the abuse economics overnight. An open-relay misconfiguration could turn your server into a spam cannon before breakfast.&lt;/p&gt;
&lt;p&gt;Our first open relay incident lasted forty minutes and felt like forty days.&lt;/p&gt;
&lt;p&gt;We fixed it by moving from permissive defaults to a deny-by-default relay policy, and by testing from outside networks before every major config change. We also added tiny audit scripts that checked the banner, open ports, and relay behavior from a second host. Nothing fancy. Just enough automation to avoid repeating avoidable mistakes.&lt;/p&gt;
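&lt;p&gt;The relay audit itself is mechanical once you have the probe transcript: connect from a second host with nc or telnet, offer a sender and recipient that are both outside your domains, and read the reply code. A sketch of the verdict step, with the classification based on the standard SMTP reply classes:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-sh&#34; data-lang=&#34;sh&#34;&gt;#!/bin/sh
# Classify the server reply to a foreign RCPT during an outside-in audit.
# A 2xx reply for a recipient you do not host means the relay is open.
relay_verdict() {
    case $1 in
        2*) echo OPEN RELAY: foreign recipient accepted; return 1 ;;
        5*) echo OK: relay denied ;;
        4*) echo INCONCLUSIVE: deferred, rerun the probe ;;
        *)  echo UNKNOWN reply: $1 ;;
    esac
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Wiring this into a cron job on a second host is what turns a one-time fix into a standing guarantee.&lt;/p&gt;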
&lt;p&gt;The cultural shift was bigger than the technical shift: &amp;ldquo;it works&amp;rdquo; was no longer sufficient. &amp;ldquo;It works safely under hostile traffic&amp;rdquo; became baseline.&lt;/p&gt;
&lt;h2 id=&#34;going-online-changed-support-load&#34;&gt;Going online changed support load&lt;/h2&gt;
&lt;p&gt;A mailbox user asking for help usually came with local context: software version, dialing behavior, known node, known timing window.&lt;/p&gt;
&lt;p&gt;An internet user asking for help often came with &amp;ldquo;mail is broken&amp;rdquo; and no context.&lt;/p&gt;
&lt;p&gt;So we created what we now call structured support intake, long before that phrase became common:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;sender address&lt;/li&gt;
&lt;li&gt;recipient address&lt;/li&gt;
&lt;li&gt;timestamp and timezone&lt;/li&gt;
&lt;li&gt;exact error text&lt;/li&gt;
&lt;li&gt;one reproduction attempt with command output&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This cut mean-time-to-triage massively.&lt;/p&gt;
&lt;p&gt;In other words, migration forced us to formalize operations.&lt;/p&gt;
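&lt;p&gt;The intake list worked best printed as a fill-in form. A minimal sketch along those lines:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;MAIL SUPPORT INTAKE
Sender address:
Recipient address:
Timestamp and timezone:
Exact error text (copied, not paraphrased):
Reproduction attempt made (command and full output attached): yes / no
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;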
&lt;h2 id=&#34;the-tooling-stack-we-trusted-by-2001&#34;&gt;The tooling stack we trusted by 2001&lt;/h2&gt;
&lt;p&gt;By the end of the earliest gateway phase, a reliable small-site stack often included:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Linux host with disciplined package baseline&lt;/li&gt;
&lt;li&gt;DNS under our control&lt;/li&gt;
&lt;li&gt;SMTP relay with strict policy&lt;/li&gt;
&lt;li&gt;basic POP/IMAP service for user retrieval&lt;/li&gt;
&lt;li&gt;log rotation and disk-space monitoring&lt;/li&gt;
&lt;li&gt;scripted daily backup of configs and queue metadata&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We did not call this &amp;ldquo;platform engineering.&amp;rdquo; It was just survival with documentation.&lt;/p&gt;
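&lt;p&gt;The last bullet was the one most often skipped, so it deserves a sketch. Every path here is an illustration, not a prescription:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-sh&#34; data-lang=&#34;sh&#34;&gt;#!/bin/sh
# Nightly backup of gateway configs and queue metadata.
# First argument: destination directory; remaining arguments: paths to save.
backup_configs() {
    dest=$1
    shift
    stamp=$(date +%Y%m%d)
    tar -czf $dest/gw-configs-$stamp.tar.gz $*
}
# Called nightly from a cron-driven wrapper, e.g.:
# backup_configs /var/backups /etc/mail /etc/named.conf
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Dated archives matter more than clever tooling: when a config change goes wrong, yesterday&amp;rsquo;s tarball is the fastest rollback there is.&lt;/p&gt;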
&lt;h2 id=&#34;why-these-gateway-lessons-matter-in-2006-operations&#34;&gt;Why these gateway lessons matter in 2006 operations&lt;/h2&gt;
&lt;p&gt;In 2006 operations, the web moves fast. Broadband is common in many places. Users assume immediacy. People discuss hosted services seriously. Yet the gateway lessons still hold:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;preserve behavior during infrastructure changes&lt;/li&gt;
&lt;li&gt;migrate one boundary at a time&lt;/li&gt;
&lt;li&gt;make routing intent explicit&lt;/li&gt;
&lt;li&gt;treat queues as first-class observability&lt;/li&gt;
&lt;li&gt;never ship mail infrastructure without hostile-traffic assumptions&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These are not legacy lessons. They are durable operations lessons.&lt;/p&gt;
&lt;h2 id=&#34;field-note-the-migration-metric-that-mattered-most&#34;&gt;Field note: the migration metric that mattered most&lt;/h2&gt;
&lt;p&gt;We tried to track many metrics during those years: queue depth, retries, bounce rates, uptime percentages. Useful, all of them. But the metric that predicted success best was simpler:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;How many issues can a tired operator diagnose correctly in ten minutes at 02:00?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;If your architecture makes that easy, your migration is healthy.
If your architecture requires one heroic expert, your migration is brittle.&lt;/p&gt;
&lt;p&gt;Gateways made 02:00 diagnosis easier. That is why they were the right choice.&lt;/p&gt;
&lt;h2 id=&#34;current-migration-focus-areas&#34;&gt;Current migration focus areas&lt;/h2&gt;
&lt;p&gt;The same gateway discipline applies immediately to the next pressure zones:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;mail stack policy and anti-spam layering without open-relay mistakes&lt;/li&gt;
&lt;li&gt;file/print and identity migration in mixed Windows-Linux environments&lt;/li&gt;
&lt;li&gt;perimeter/proxy/monitoring runbooks that keep incident handling predictable&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;appendix-the-one-page-gateway-notebook&#34;&gt;Appendix: the one-page gateway notebook&lt;/h2&gt;
&lt;p&gt;One practical artifact from these years deserves to be copied directly: a one-page gateway notebook entry that every on-call operator could read in under two minutes.&lt;/p&gt;
&lt;p&gt;Ours looked like this:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt; 1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 5
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 6
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 7
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 8
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 9
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;10
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;11
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;12
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;13
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;14
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;15
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;Gateway host: gw1
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;Critical services: smtp, dns-cache, queue-runner
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;Known upstreams: isp-relay-a, isp-relay-b
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;If mail delayed:
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  1) check queue depth + oldest queued age
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  2) check DNS resolution for target domains
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  3) check upstream reachability and local disk free
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  4) sample 5 queued messages for common reason
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  5) decide: wait/retry, reroute, or escalate
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;Escalate immediately if:
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  - queue age &amp;gt; 2h for priority domains
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  - repeated local write errors
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  - resolver timeout &amp;gt; threshold for 15m&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;That page did not make us smarter. It made us consistent. In migration work, consistency under pressure is often the difference between a bad hour and a bad weekend.&lt;/p&gt;
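&lt;p&gt;The first escalation trigger in that notebook is also the easiest to automate. A sketch, with the spool path and threshold as assumptions; find -mmin is available on GNU and BSD find, though not guaranteed by POSIX:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-sh&#34; data-lang=&#34;sh&#34;&gt;#!/bin/sh
# Print queue entries older than a threshold, matching the notebook rule
# of escalating past two hours. The spool path varies by MTA.
old_queue_entries() {
    # $1 = queue directory, $2 = age threshold in minutes
    find $1 -type f -mmin +$2
}
# Escalate if this prints anything:
# old_queue_entries /var/spool/mqueue 120
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;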
&lt;p&gt;Related reading:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://turbovision.in6-addr.net/retro/dos/batch-file-wizardry/&#34;&gt;Batch File Wizardry&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://turbovision.in6-addr.net/retro/dos/config-sys-as-architecture/&#34;&gt;CONFIG.SYS as Architecture&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
    </item>
    
  </channel>
</rss>
