South Korea learns the importance of backups the hard way: a fire destroys 858 TB of un-backed-up data on the government's G-Drive

A battery fire at the National Information Resources Service (NIRS) in Daejeon on September 26 caused one of the most severe data losses ever recorded in a modern administration: 858 TB of information on G-Drive, the "online disk" for officials, was completely destroyed, with no backup to restore from. The outage knocked out 96 government systems; 95 of them had backups. The sole exception, G-Drive, stored work material for 125,000 public employees across 74 ministries, an estimated 17% of central government staff.

The fire originated in a room housing 384 battery packs, ravaged much of one floor, and knocked critical services offline: official email, the online postal service, ministry websites, the complaints and petitions system, and even the 119 emergency line, according to the initial assessment. Six days after the incident, only about 16–18% of the 647 systems hosted at the site had been restored; G-Drive remained unrecoverable because there were simply no backups.

A “Government Disk” Without a Backup Plan

Created in 2017 to share documents (30 GB per user) and improve security, G-Drive gradually replaced local storage: a directive from the Ministry of the Interior and Safety urged officials not to store work material on office PCs and to centralize it on the platform instead. The irony is stark: the very system recommended for safeguarding documentation had no backups. A ministry source even justified the omission by saying the service "could not have copies due to its large capacity." In data center terms, 858 TB is not exceptional: petabytes are routinely managed today with replication and snapshots.

The repercussions are significant in areas like the Ministry of Personnel Management, a heavy G-Drive user: eight years' worth of material may have been lost, including internal minutes, documentation prepared for the National Assembly, and potentially personal data (verifications, disciplinary records). Although transactional systems such as e-Person are hosted elsewhere (in Gwangju), there is no exact accounting of what was destroyed: teams are combing through PCs, emails, and physical records to rebuild at least the minimum operational data. This month's National Assembly audit could be compromised if departments are unable to provide the requested documentation.

Backups Yes… but Not Always, and Not Up-to-Date

The NIRS reported that, before the fire, 62% of the 647 systems were backed up daily, while the remaining 38% were backed up monthly. In some cases the last backup dated from August 31, meaning those services lost everything generated in September. Worse still, an adjacent storage room near the seat of the fire, which housed repositories critical to restoring operations, became inaccessible due to dust and ash, stalling recovery efforts. The official plan is to migrate the 96 affected systems to the Daegu facility; initial estimates spoke of four weeks, but experts anticipate longer delays.

“Too Big to Copy”: False Economy

The argument that sheer capacity makes copying impossible is technically indefensible in 2025. Robust, proven strategies for large-scale data include:

  • 3-2-1 rule (three copies, two media, one off-site), now extended to 3-2-1-1-0 (adding one offline or immutable copy and zero errors on restore verification).
  • Frequent versioning and snapshots at the file-system or storage level (ZFS, Ceph, object storage with object lock); see the sketch after this list.
  • Synchronous / active-active replication between two data centers for near-zero RPO/RTO, complemented by delayed backups at a third site (protects against logical corruption or ransomware).
  • Segmentation by criticality and SLA: not all repositories require the same RPO, but none should be without a backup plan.
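
To make the "immutable copy" idea concrete, here is a minimal sketch that uploads a backup archive to an S3-compatible bucket with Object Lock in compliance mode, so the object cannot be deleted or overwritten before its retention date. It is an illustration only, not a description of the NIRS setup; the bucket name, key, file path, and retention window are assumptions.

```python
# Minimal sketch: write an immutable off-site copy using S3 Object Lock (boto3).
# Bucket name, key, file path and retention window are illustrative assumptions.
from datetime import datetime, timedelta, timezone

import boto3

s3 = boto3.client("s3")

BUCKET = "gov-backup-immutable"   # hypothetical bucket, created with Object Lock enabled
KEY = "g-drive/2025-09-25/archive-000.tar.zst"
RETENTION_DAYS = 90               # assumed retention policy

with open("archive-000.tar.zst", "rb") as payload:
    s3.put_object(
        Bucket=BUCKET,
        Key=KEY,
        Body=payload,
        ObjectLockMode="COMPLIANCE",  # retention cannot be shortened or removed
        ObjectLockRetainUntilDate=datetime.now(timezone.utc)
        + timedelta(days=RETENTION_DAYS),
    )
```

In compliance mode not even an administrator with full credentials can shorten the retention period, which is precisely what protects the copy from ransomware and from operator error.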

With storage tiering, deduplication, and compression, backing up hundreds of terabytes is no longer an insurmountable challenge. What remains prohibitive is rebuilding eight years of work.
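
A back-of-the-envelope calculation puts 858 TB in perspective. The figures below (link speed, change rate, tape capacity) are illustrative assumptions, not measurements from the NIRS environment:

```python
# Rough feasibility numbers for protecting ~858 TB; all parameters are assumptions.
TOTAL_TB = 858
LINK_GBPS = 10          # one dedicated 10 Gbit/s replication link
EFFICIENCY = 0.7        # protocol overhead and contention
LTO9_NATIVE_TB = 18     # LTO-9 native (uncompressed) cartridge capacity

total_bits = TOTAL_TB * 8 * 10**12
seconds = total_bits / (LINK_GBPS * 10**9 * EFFICIENCY)
print(f"Initial full copy over the link: ~{seconds / 86400:.1f} days")

print(f"Full copy on tape: ~{TOTAL_TB / LTO9_NATIVE_TB:.0f} LTO-9 cartridges (native)")

# Daily incrementals are far smaller; assuming 1% of the data changes per day:
daily_tb = TOTAL_TB * 0.01
daily_seconds = daily_tb * 8 * 10**12 / (LINK_GBPS * 10**9 * EFFICIENCY)
print(f"Daily incremental (~{daily_tb:.0f} TB): ~{daily_seconds / 3600:.1f} hours")
```

Roughly eleven days for the initial seed, about fifty tape cartridges, and under three hours of replication per day afterwards: inconvenient, but a long way from impossible.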

Chain Failures: Technology, Governance, and Culture

The incident was not merely an electrical accident; it exposes systemic gaps:

  1. Design: a centralized service without geographic redundancy or immutable copies.
  2. Governance: policies that mandated the use of G-Drive but did not require equivalent backups.
  3. Operations: inconsistent backup frequencies, with monthly windows even for systems holding fast-changing information.
  4. Risk management: no regular disaster-recovery testing (a backup only truly exists once it has been restored).
  5. Physical resilience: co-location of batteries, storage, and network amplified the damage.

Human Impact and Responsibilities

While the fire caused no direct casualties, a staff member involved in the recovery took his own life on October 3 in Sejong, a tragic reminder of the human toll that accompanies technological disasters. Four people have been detained on suspicion of criminal negligence. Politically, opposition parties are criticizing the lack of manuals and audits and demanding accountability.

What Should Change Starting Today

  • Mapping and classifying data (who stores what, where, and with what RPO/RTO).
  • Immutable backups (WORM / object lock) and tiered retention (daily, weekly, monthly), with documented restoration tests; a retention-pruning sketch follows this list.
  • Dual active-active data centers for core platforms and a third site for copies; physical separation of power/storage.
  • Integrity telemetry (block verification, scrubbing) and compliance monitoring with executive alerts.
  • Crisis plan: runbooks, contacts, table-top exercises, and quarterly failover tests.
  • Culture: resilience metrics (not just uptime), with protected budget for business continuity.
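
Tiered retention is mostly bookkeeping. The sketch below shows one way to decide which snapshots to keep under a daily/weekly/monthly policy; the retention counts are assumptions chosen for illustration.

```python
# Minimal sketch of a daily/weekly/monthly retention policy; counts are assumptions.
from datetime import date, timedelta


def snapshots_to_keep(snapshot_dates, daily=7, weekly=4, monthly=12):
    """Return the snapshot dates retained under a tiered policy: the newest
    snapshot of the last `daily` days, `weekly` ISO weeks and `monthly` months."""
    keep = set()
    newest_first = sorted(snapshot_dates, reverse=True)

    days_seen, weeks_seen, months_seen = [], [], []
    for d in newest_first:
        if len(days_seen) < daily and d not in days_seen:
            days_seen.append(d)
            keep.add(d)
        week = d.isocalendar()[:2]
        if len(weeks_seen) < weekly and week not in weeks_seen:
            weeks_seen.append(week)
            keep.add(d)          # newest snapshot of that ISO week
        month = (d.year, d.month)
        if len(months_seen) < monthly and month not in months_seen:
            months_seen.append(month)
            keep.add(d)          # newest snapshot of that month
    return keep


# Example: two years of daily snapshots collapse to a few dozen retained copies.
history = [date(2025, 9, 26) - timedelta(days=i) for i in range(730)]
print(f"{len(snapshots_to_keep(history))} of {len(history)} snapshots retained")
```

Everything not in the returned set can be pruned; the point is that the policy is explicit, versioned, and testable, rather than an informal habit of whoever operates the storage.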

Lessons for Any Organization

  • If data exists only in one place, it essentially doesn’t exist.
  • Infinite RPO and unpredictable RTO are inevitable consequences of not copying.
  • Battery fires and simultaneous power/storage failures do happen; the question is when and what will remain standing.
  • Operational simplicity (versioning + replication + immutable copies) beats a perfect design that never gets implemented.

Frequently Asked Questions

Why couldn’t the 858 TB “fit” into a backup plan?
With current technologies (deduplication, compression, object storage, LTO-9/10 tape, sovereign clouds), it fits comfortably. The issue was not technical; it was one of prioritization and design.

Is replicating to another data center enough?
No. Replication protects against physical failures; immutable copies protect against deletion, corruption, and ransomware. Both are necessary.

How often should one test recovery?
At minimum, full drills quarterly for critical systems, plus monthly spot restores of representative data sets. Without restoration drills, backups are just assumptions.
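
A restoration drill does not have to be elaborate to be useful. The sketch below checks a random sample of restored files against the checksums recorded at backup time; the manifest format, paths, and sample size are assumptions for illustration.

```python
# Minimal restore drill: verify a random sample of restored files against a manifest.
# Manifest format ("relative/path<TAB>sha256") and paths are illustrative assumptions.
import hashlib
import random
from pathlib import Path

MANIFEST = Path("backup-manifest.tsv")    # written at backup time
RESTORE_DIR = Path("/tmp/restore-drill")  # where the sampled files were restored
SAMPLE_SIZE = 100


def sha256(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


entries = [line.split("\t") for line in MANIFEST.read_text().splitlines() if line]
sample = random.sample(entries, min(SAMPLE_SIZE, len(entries)))

failures = [rel for rel, digest in sample
            if not (RESTORE_DIR / rel).exists()
            or sha256(RESTORE_DIR / rel) != digest]

print(f"{len(sample) - len(failures)}/{len(sample)} sampled files verified")
if failures:
    raise SystemExit(f"Restore drill FAILED for: {failures[:5]} ...")
```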

What RPO/RTO are reasonable for a government “Drive”?
For a cross-functional repository, an RPO ≤ 24 hours with hourly snapshots and phased RTOs (read-only access restored within hours, full read-write in under 48 hours) is achievable with the right architecture.
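
To make the RPO figure concrete: the worst-case data loss window is roughly the snapshot interval plus the lag in shipping that snapshot off-site. A tiny sketch, with assumed numbers:

```python
# Worst-case data loss (RPO) = snapshot interval + replication lag to the off-site copy.
# Both intervals below are assumptions used only to illustrate the arithmetic.
SNAPSHOT_INTERVAL_MIN = 60   # hourly snapshots
REPLICATION_LAG_MIN = 15     # time to ship each snapshot to the secondary site

worst_case_rpo_min = SNAPSHOT_INTERVAL_MIN + REPLICATION_LAG_MIN
print(f"Worst-case RPO: {worst_case_rpo_min} minutes "
      f"({worst_case_rpo_min / 60:.2f} h), well inside a 24-hour target")
```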


Note: The data and timeline referenced are based on official reports and local coverage following the September 26 fire at Daejeon’s NIRS, including estimates of affected systems, volumes, and recovery percentages published in subsequent days.

via: Chosun and Donga
