
Monad Testnet Outage Follow-Up — Cable Fault Confirmed, TrieDB Disk Replaced, DC Outage Noted
Overview
This report is a continuation of the initial incident write-up, "Monad Testnet Outage Report — Host Connectivity Loss and Recovery".
Following the original outage and recovery, datacenter diagnostics confirmed the root cause of the initial loss of communications: a faulty cable prevented reliable remote access to the server and blocked the normal remote reset paths. After the cable was replaced and access was restored, BitCtrl performed a deeper inspection of the host to rule out any underlying faults contributing to the instability.
Context
During post-cable diagnostics, the node's TrieDB disk began reporting I/O errors, indicating storage degradation rather than a purely network-layer incident. After escalation to datacenter support, we agreed to perform a disk swap using on-site remote hands to eliminate the failing component and stabilize the node.
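As a minimal sketch of the kind of check that surfaces this evidence: kernel logs (e.g. `dmesg` or `journalctl -k` output) typically contain lines matching a small set of I/O-error patterns when a disk degrades. The helper below is hypothetical, not part of the actual diagnostic tooling used here; the sample log line is illustrative.

```python
import re

# Patterns that commonly appear in kernel logs when a block device degrades.
# This is an illustrative sketch, not the diagnostic procedure actually run.
IO_ERROR_RE = re.compile(
    r"(I/O error|critical medium error|uncorrectable error)", re.IGNORECASE
)

def find_io_errors(log_lines):
    """Return the log lines that look like disk I/O errors."""
    return [line for line in log_lines if IO_ERROR_RE.search(line)]

# Hypothetical sample: one failing-disk line, one unrelated line.
sample = [
    "kernel: nvme0n1: I/O error, dev nvme0n1, sector 123456",
    "kernel: eth0: link up",
]
print(find_io_errors(sample))
```

In practice one would feed this real `journalctl -k` output (or use SMART data via `smartctl`) rather than a hard-coded sample.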
The disk replacement was executed successfully today. The new disk has been installed and the node brought back into service. Because TrieDB is a critical state component, the operation required controlled downtime and validation steps, resulting in approximately 2 hours and 30 minutes of downtime for the testnet validator.
Provider Outage Context (MEVSpace, Feb 27, 2026)
In addition to the above, BitCtrl also reports an unrelated datacenter-wide outage at MEVSpace on February 27, 2026, lasting roughly 30 minutes. This event was provider-side and separate from the host hardware fault, but the timing was unfavorable: multiple infrastructure disruptions occurred within the same week. The impact was visible across both testnet and mainnet validators hosted in the same MEVSpace datacenter footprint.
Operational Impact
Downtime Summary (past week)
- Host hardware fault (cable + disk remediation): ~2h 30m downtime
- Provider outage (MEVSpace DC): ~30m downtime
- Total reported downtime (combined): ~3h 00m
Key Findings
- The initial outage root cause was confirmed as a faulty cable that blocked remote access and remote reset paths
- Follow-up diagnostics revealed TrieDB disk I/O errors, requiring an on-site disk replacement
- The disk swap succeeded but incurred ~2h 30m of downtime because the testnet validator runs on a single host
- A separate datacenter-wide MEVSpace outage (~30 minutes) affected both testnet and mainnet validators in the same DC
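The downtime figures above can be sanity-checked with a short calculation; the weekly availability percentage below is derived from the reported numbers, not a separately measured figure.

```python
from datetime import timedelta

# Downtime figures as reported in the summary above (approximate).
host_fault = timedelta(hours=2, minutes=30)  # cable + disk remediation
dc_outage = timedelta(minutes=30)            # MEVSpace datacenter outage

total = host_fault + dc_outage
week = timedelta(days=7)
uptime_pct = (1 - total / week) * 100

print(total)                 # 3:00:00
print(f"{uptime_pct:.2f}%")  # 98.21%
```

Roughly 3 hours of downtime in a 168-hour week works out to about 98.2% weekly availability for the affected validators.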
