TL;DR
With six months of prep and ~30 days of tightly scripted change windows, our team moved an entire data center with no unplanned downtime. The keys: early circuit orders, a 50 Gbps low-latency link to trunk VLANs and replicate data, Pure Storage ActiveCluster (stretched volumes), VMware cross-vCenter migrations, Zerto for the largest SQL workloads, Cohesity for backup mobility, NetBox as our labeling source of truth, and disciplined runbooks. Below is the exact timeline, what went right, what almost didn't, and who helped get us there.
Team Win: 300+ VMs migrated in weeks, Internet/DMZ cutovers completed at night, SQL cluster failovers executed cleanly, backup estate moved and replicating — all without users noticing. Couldn’t be prouder of this crew.
Why We Moved Datacenters
Our previous datacenter was in a leased building, originally built out ~20 years ago (I helped with that move, too). Major components — the generator and UPS — had reached their end of life; replacement alone would have run $500k–$750k. Add lease costs, power, and fire-suppression maintenance, and staying no longer made economic sense. With our lease ending in September 2025, it was time to move.
Planning the Move (May 2024 → Sept 2025)
We’d moved once three years earlier, so we had strong lessons learned — what to do and what not to do. Budget prep began in May 2024, and we quickly realized that most of our network gear, storage, and servers were EOL. That turned into a bonus: we could stand up parallel capacity at the new site while the old site continued to run.
We scheduled walkthroughs of three colocation sites near the office, selected one, and entered contract negotiations (which, at our firm, can take time). While negotiating, we specified a private cage with biometric access and 48U, extra-deep racks to fit PDUs and keep cabling clean. I highly recommend both — they make running cables and accessing gear far easier.
Pro tip: If a provider says a circuit takes 90 days, plan for 180–270. Order early, build a schedule in Slack, and assume something will slip.
Circuits & Connectivity Strategy
We targeted a 50 Gbps, low-latency circuit so we could live-migrate VMs and trunk VLANs between datacenters (avoiding mass IP renumbering). We signed ~12 months in advance so the carrier could ready facilities and equipment.
We also lined up MPLS, Point-to-Point (P2P), and VPLS circuits, plus a new Internet circuit at the Colo.
How it played out:
- Spectrum Enterprise (50 Gbps): Delivered ~15 days late due to equipment issues — their sales/PM team mitigated quickly and still hit our window.
- MPLS: Marked “delivered” in February but wasn’t actually installed; after escalation, it became active in July, and credits were issued.
- P2P/VPLS: Minor snags; both landed within the needed window.
- Internet: We pivoted from moving an existing circuit to using the Colo’s carrier. They allocated an IP block and turned it up on time, with BGP default for automatic failover.
Bottom line: Order early. Document delivery dates. Plan at least one month of cushion ahead of your first cutover. And explicitly note “service move” in contracts to avoid early-termination fees.
Build-Out, Labeling & Rack Strategy
Our PM began bi-weekly coordination in January 2025. We started with what sounds trivial but isn’t: power/fiber standards and a single source of truth for labels.
- Source of truth: We standardized on NetBox (NetBox Labs) for labeling. Racks, devices, patch panels, interfaces, and cables all live in NetBox. Every physical label maps 1:1 to a NetBox object, so field work and troubleshooting stay consistent.
- Cable standards: Distinct fiber colors for iSCSI, front-side LAN, DMZ, and inter-switch; custom red/white power cables to balance PDUs and spot issues at a glance.
- Fiber management: We chose fiber patch panels in each rack (vs. direct server-to-switch runs). Each fiber is labeled and recorded in NetBox (panel/port ↔ server ↔ switch port). Troubleshooting is now fast and sane.
- Rack hardware: We used a PATCHBOX-style cage-nut replacement system to avoid bloody fingers and speed up one-person installs — a bigger quality-of-life win than you’d think.
Our label schema (examples):
R12-PP01-P24 ⇄ CO1-LEAF-A-Eth1/15 # Patch panel port to switch port
R10-PDU-A-C13 (20A) → SRV-APP01-PSU1 # PDU circuit to server PSU
CIR-SPECTRUM-50G-PRI # Primary 50G circuit (provider-tagged)
DMZ-VLAN310-GW@COLO # Gateway label for DMZ VLAN 310 at colo
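One way to keep the printed labels honest is a quick audit against the NetBox API. Here's a minimal sketch using the pynetbox client; the URL, token, and device name are placeholders, and the check simply flags interfaces that have no cable record in NetBox:

```python
# Audit sketch: flag device interfaces with no cable recorded in NetBox,
# so the printed labels in the rack never drift from the database.
# The URL, token, and device name below are placeholders.
import pynetbox

nb = pynetbox.api("https://netbox.example.internal", token="REDACTED")

device_name = "SRV-APP01"  # hypothetical device from the label schema above
device = nb.dcim.devices.get(name=device_name)
if device is None:
    raise SystemExit(f"{device_name} is not in NetBox -- label it before racking it")

for iface in nb.dcim.interfaces.filter(device=device_name):
    status = "cabled" if iface.cable else "NO CABLE RECORD"
    print(f"{device_name} {iface.name}: {status}")
```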
Network design:
We retained a collapsed core with separate iSCSI switching. All production links home-run to redundant core switches. Top-of-rack gear is used for out-of-band management (1 GbE copper). Be mindful of spanning-tree roots, HSRP/VRRP priorities, and BGP routes when you trunk VLANs over the inter-DC link: initially, traffic will hairpin across the 50 Gbps path until you move gateways.
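Moving a gateway usually comes down to shifting HSRP/VRRP priority to the colo-side core once you're ready to stop the hairpin. Below is a hedged sketch using netmiko against a Cisco-style core; the device details, VLAN, group number, and priority values are assumptions to adapt to your own platform and runbook:

```python
# Sketch: promote the colo core to HSRP primary for VLAN 310 so its gateway
# moves across the 50 Gbps link and traffic stops hairpinning to the old DC.
# Device type, credentials, VLAN, group, and priority are placeholders.
from netmiko import ConnectHandler

colo_core = {
    "device_type": "cisco_ios",
    "host": "core-a.colo.example.internal",
    "username": "netops",
    "password": "REDACTED",
}

commands = [
    "interface Vlan310",
    "standby 310 priority 150",  # higher than the old-DC core's priority
    "standby 310 preempt",       # take over as the active gateway immediately
]

with ConnectHandler(**colo_core) as conn:
    output = conn.send_config_set(commands)
    print(output)
```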
Internet & DMZ Cutovers
- Turn up the Internet at the Colo with a BGP default route.
- Trunk DMZ VLANs over the 50 Gbps backbone (for clustered reverse proxies/LBs).
- Move the secondary load balancer to the new DC → fail over to make it primary.
- Align firewall DMZ interfaces to avoid default-gateway conflicts.
- Prepare DNS: lower TTLs, scrub stale records, update IP-bound services (see the TTL check sketch after this list).
- Nightly change window (9:00 PM): shift default gateways and DMZ firewall IPs.
- Result: One missed service prerequisite was corrected; otherwise, smooth.
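For the DNS prep step above, it helps to verify the lowered TTLs against the authoritative servers rather than a recursive cache. A small sketch with dnspython follows; the nameserver IP and record names are placeholders:

```python
# Sketch: confirm lowered TTLs directly on an authoritative server (not a
# recursive cache) before cutover night. Names and the NS IP are placeholders.
# Requires dnspython 2.x (pip install dnspython).
import dns.resolver

AUTHORITATIVE_NS = "203.0.113.53"  # placeholder authoritative server
RECORDS = ["www.example.com", "portal.example.com"]
MAX_TTL = 300  # target TTL ahead of cutover

resolver = dns.resolver.Resolver(configure=False)
resolver.nameservers = [AUTHORITATIVE_NS]

for name in RECORDS:
    answer = resolver.resolve(name, "A")
    ttl = answer.rrset.ttl
    flag = "OK" if ttl <= MAX_TTL else "STILL HIGH"
    print(f"{name}: TTL={ttl} ({flag})")
```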
Cutover tracking: Circuit handoffs, provider contacts, LOA/CFA notes, and public IP allocations were tracked in a shared Excel spreadsheet to keep the cutover plan and the on-the-ground reality perfectly aligned. NetBox remained our inventory/labeling source of truth.
VMware & Storage Migrations
- Storage: New Pure Storage arrays configured with ActiveCluster (stretched volumes) for synchronous replication; vVols where appropriate.
- vCenter: New vCenter stood up in the Colo; new clusters joined and mapped to the arrays.
- Method: Cross-vCenter VM migrations validated with test VMs first, then scaled in batches of 10 → 20 per night (see the batching sketch after this list).
- Throughput: ~300 VMs moved over a few weeks with no user impact.
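The nightly batching referenced above is mostly scheduling discipline. Here's a rough sketch of how an inventory might be carved into 10–20 VM change windows; migrate_vm() is a hypothetical stand-in for whatever actually drives the cross-vCenter move (PowerCLI, pyVmomi, or your orchestration tool):

```python
# Planning sketch: split the VM inventory into nightly batches and hand each
# batch to a migration step. Names and batch size are placeholders.
from typing import Iterable, List

def nightly_batches(vms: List[str], batch_size: int) -> Iterable[List[str]]:
    """Yield successive batches of VMs, one per change window."""
    for start in range(0, len(vms), batch_size):
        yield vms[start:start + batch_size]

def migrate_vm(name: str) -> None:
    # Hypothetical placeholder -- replace with the real cross-vCenter call.
    print(f"  migrating {name} ...")

if __name__ == "__main__":
    inventory = [f"vm-{i:03d}" for i in range(1, 301)]  # ~300 placeholder VMs
    for night, batch in enumerate(nightly_batches(inventory, 20), start=1):
        print(f"Night {night}: {len(batch)} VMs")
        for vm in batch:
            migrate_vm(vm)
```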
Backups:
Our Cohesity estate (six chassis) was pre-cabled and moved in an afternoon. We cut replication over to a new, higher-bandwidth link to the DR site, and replication times effectively halved. The 50 Gbps backbone kept backup traffic from interfering with live VM migrations.
Financial reporting (SQL + snapshots):
Morning processes mount Pure snapshots of large SQL volumes. vVols aren’t compatible with the stretched-volume approach, so we replicated those datasets and scheduled the change for a weekend, with the business accepting that numbers might lag by a day. We moved four large SQL servers via Zerto, updated snapshot scripts for the new volume IDs, and the financial dashboards were accurate by Monday.
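Those snapshot-script updates boiled down to remapping old volume identifiers to the new ones before the morning mount step. A simplified sketch follows; the volume IDs and the mount_snapshot() helper are hypothetical placeholders for the real array automation:

```python
# Sketch: the morning reporting job keys off volume IDs, so after the Zerto
# move the old IDs are remapped before snapshots are mounted.
# Volume IDs and mount_snapshot() are hypothetical placeholders.
VOLUME_ID_MAP = {
    # old-DC volume ID    -> new-DC volume ID
    "sql-fin-data-old-01": "sql-fin-data-colo-01",
    "sql-fin-logs-old-01": "sql-fin-logs-colo-01",
}

def mount_snapshot(volume_id: str) -> None:
    # Hypothetical stand-in for the actual snapshot clone/mount step.
    print(f"mounting latest snapshot of {volume_id}")

def morning_reporting_run(configured_volumes: list) -> None:
    for old_id in configured_volumes:
        new_id = VOLUME_ID_MAP.get(old_id, old_id)  # fall back if unmapped
        mount_snapshot(new_id)

morning_reporting_run(["sql-fin-data-old-01", "sql-fin-logs-old-01"])
```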
Citrix PVS/VDI & SQL Cluster Weekend
- We split the existing VMware hosts: half moved the first week of July, the remainder the following weekend.
- PVS & DHCP were built redundant across sites; half moved first without issue.
- VDI came up with one minor snag (a missing VLAN on switches); fixed quickly.
- SQL cluster failover: Primary physical nodes were shut down; instances failed over to secondary nodes as planned.
- Racking: ~13 servers were unracked, transported, and re-racked starting at 8 AM.
- Gotcha: One VDI stack initially overloaded a power circuit; we rebalanced and proceeded.
- Outcome: All services returned without an outage.
Final steps that night:
We migrated the primary default gateways for all VLANs to the Colo. Later the same evening, we cut MPLS and VPLS over to the new site.
What Made the Difference
- 50 Gbps low-latency link to trunk VLANs (no mass IP changes) and carry replication.
- Pure Storage ActiveCluster for stretched volumes and clean failover.
- VMware cross-vCenter migrations after thorough dress rehearsals.
- Zerto to replicate the heaviest SQL workloads with minimal downtime risk.
- Cohesity backups moved early, then accelerated on a bigger replication pipe.
- NetBox as the authoritative inventory and labeling system (DCIM/IPAM), keeping circuits, ports, cables, and power feeds consistent with what was physically labeled.
- Runbooks + labeling + standards (cables, power, fiber, NetBox records) that paid off under pressure.
- Early, realistic circuit planning, including contingency for slips.
Vendor Shout-Outs (Thank You)
- Spectrum Enterprise — delivered the 50 Gbps service and worked through last-mile constraints to hit our window.
- Pure Storage — ActiveCluster and snapshot workflows made the migration safe and fast.
- VMware — cross-vCenter migrations and vVols where appropriate.
- Zerto — for the large SQL replication and clean cutover weekend.
- Cohesity — backup mobility and accelerated replication after the move.
- NetBox (NetBox Labs) — our DCIM/IPAM and labeling source of truth for racks, devices, interfaces, cables, power, and circuits.
- PATCHBOX — cage-nut replacement hardware that turned racking into a painless, one-person job.
- Our colocation partner — on-time cage build, biometric access, power delivery, and IP allocation/BGP.
We also made a deliberate decision to exit a chronically unreliable Internet provider before the move, after repeated extended outages. Professional lesson: track provider MTBF/MTTR, set clear SLOs, and be willing to switch providers when those targets aren't met.
Lessons Learned (Steal These)
- Order circuits 6–12 months out. Build a schedule cushion and track every milestone.
- Lower DNS TTLs a week before cutover. Verify on authoritative servers, not just recursive caches.
- Trunk first, flip later. Expect traffic to hairpin across the inter-DC link until you move HSRP/VRRP priorities and gateways.
- Label everything in one system. Use NetBox as the source of truth and mirror labels in the rack; add QR codes to speed audits.
- Standardize power. Custom-color cords + PDU balance checks avoided surprises (except that first VDI rack… which we fixed fast).
- Practice the failover. Tabletop the runbooks, then do live dress rehearsals with test VMs and non-critical services.
- Have a rollback. Keep old routes dark-but-ready for 24–48 hours.
Results
- Zero unplanned downtime. Nightly windows only.
- 300+ VMs moved in a few weeks.
- ~50% faster replication after the move (bigger DR pipe).
- Internet/DMZ cleanly cut over with BGP failover.
- Financial reporting on time on Monday after a planned weekend switch.
- De-risked the aging generator/UPS/power exposure and exited the facility before lease end.
Thank You, Team
Network, storage, virtualization, DBAs, PMO, help desk — you all executed with precision and calm. From the labeling party to the 8 AM racking sprint, this was a masterclass in preparation and teamwork. Huge appreciation.
Appendix — Quick Specs & Choices
- Core design: Collapsed core + separate iSCSI stack; OOB via 1 GbE ToR.
- Routing: BGP for Internet default, HSRP/VRRP for gateways; careful STP root selection.
- Storage: Pure ActiveCluster (stretched volumes); vVols where appropriate.
- Backups: Cohesity, replication cut to higher-bandwidth DR link.
- VMs: Cross-vCenter migration in batches after validation.
- Citrix: PVS/VDI split-move across consecutive weekends with SQL cluster failover.