The conference room went silent. Dead silent. The kind of silence that makes your stomach drop.
It was 9:47 on a Monday morning in 2020, and the COO of a fast-growing SaaS company had just asked a simple question during their SOC 2 readiness assessment: "How long would it take us to recover if our production database was completely destroyed right now?"
The CTO's face went pale. "I... I'm not sure. Maybe a week? We have backups somewhere..."
"Somewhere?" I asked quietly.
That's when we discovered that their "backup strategy" consisted of automated snapshots that hadn't been tested in 18 months, stored in the same region as their production data, with no documented recovery procedures. If their primary AWS region had gone down, they would have been finished.
Three months later, we found out just how close they'd come to disaster. A configuration error corrupted their primary database. Because we'd implemented proper SOC 2-aligned backup and recovery procedures, they were back online in 2 hours and 17 minutes instead of facing potential bankruptcy.
After fifteen years of implementing business continuity programs, I can tell you this with absolute certainty: your backup strategy is worthless until you prove it works. And SOC 2 doesn't just require you to have backups—it requires you to prove they actually protect your business.
Why SOC 2 Takes Backup and Recovery So Seriously
Let me share something that might surprise you: in the SOC 2 Trust Services Criteria, backup and recovery isn't just one control—it touches multiple criteria across Security, Availability, and even Processing Integrity.
Here's what SOC 2 auditors are actually evaluating:
Trust Services Criteria | Backup & Recovery Requirements | What Auditors Look For |
|---|---|---|
Availability (A1.2) | System availability commitments | Recovery time objectives (RTO) meeting SLA commitments |
Availability (A1.3) | System recovery procedures | Documented, tested recovery procedures |
Common Criteria (CC6.1) | Logical and physical security | Backup data encryption and access controls |
Common Criteria (CC7.2) | System monitoring | Backup success/failure monitoring and alerting |
Common Criteria (CC9.1) | Risk mitigation | Business impact analysis and continuity planning |
I learned this the hard way during my first SOC 2 audit back in 2016. We thought having automated backups was enough. The auditor smiled politely and asked: "When was the last time you performed a full restoration test?"
We hadn't. Ever.
She failed us on four different controls. That failure cost the company a $3.2 million customer contract and taught me a lesson I've never forgotten.
"A backup you haven't tested is just a placebo. It makes you feel better, but it won't save you when things go wrong."
The Real Cost of Backup Failures (Stories from the Trenches)
Let me tell you about three companies I've worked with, and what their backup situations taught me:
Case Study 1: The "We Have Backups" Company
In 2019, I consulted for a healthcare technology startup. Impressive team, great product, solid revenue growth. They were confident about their SOC 2 audit because they had "comprehensive backups."
During our assessment, we attempted a test recovery. Here's what we found:
Day 1: Initiated restore from backup. Discovered backup files were corrupted.
Day 2: Tried older backup. Different corruption issue.
Day 3: Found a backup that worked. Started restore process.
Day 4: Realized the backup was 6 weeks old, missing critical customer data.
Day 5: Attempted to piece together data from multiple sources.
Day 8: Finally achieved partial recovery with significant data loss.
If this had been a real disaster, they would have lost 40% of their customers and faced millions in HIPAA violation fines.
The fix? We implemented a proper backup and recovery program. Total cost: $87,000. Estimated cost of the disaster they avoided: north of $12 million.
Case Study 2: The Ransomware Wake-Up Call
A financial services company got hit with ransomware in 2021. The attackers encrypted everything—including their backups. Why? Because the backups were accessible from the production network with the same compromised credentials.
They paid $340,000 in ransom. Then spent another $890,000 on forensics, remediation, and recovery. Then lost their cyber insurance coverage. Then failed their SOC 2 audit.
Total damage: $4.7 million in direct costs, plus immeasurable reputation damage.
The lesson? Backups must be isolated and immutable. If attackers can reach your backups, they're not backups—they're additional targets.
Case Study 3: The Success Story
Now for a happier tale. In 2022, I worked with an e-commerce platform that experienced a catastrophic database failure during Black Friday. Not ransomware, not an attack—just bad luck and a failed drive controller that took down their entire database cluster.
Here's their timeline:
9:14 AM: Database failure detected
9:18 AM: Incident response team activated
9:22 AM: Recovery procedures initiated
9:47 AM: Restore from backup completed
10:03 AM: Data validation finished
10:11 AM: Services fully operational
Total downtime: 57 minutes. During Black Friday. Their estimated revenue loss: $127,000 (painful but survivable). A competitor who suffered a similar failure without proper backups? They were down for 4 days and lost an estimated $8.3 million.
What made the difference? A SOC 2-compliant backup and recovery program that was documented, automated, monitored, and tested monthly.
"The quality of your disaster recovery plan is measured in minutes, not good intentions."
The SOC 2 Backup and Recovery Framework
Let me break down what SOC 2 actually requires. This isn't theoretical—this is what auditors will test:
1. Backup Strategy Documentation
SOC 2 requires documented backup strategies that define:
Component | What You Need | Why It Matters |
|---|---|---|
Backup Scope | What data/systems are backed up | Ensures critical assets are protected |
Backup Frequency | How often backups occur | Defines acceptable data loss (RPO) |
Retention Periods | How long backups are kept | Meets recovery and compliance needs |
Backup Types | Full, incremental, differential | Balances storage costs with recovery speed |
Storage Locations | Where backups are stored | Protects against regional failures |
Recovery Objectives | RTO and RPO targets | Sets measurable recovery expectations |
I've seen companies fail audits simply because they couldn't produce this documentation. One client told me, "But everyone on the team knows how it works!"
The auditor's response was perfect: "What happens when your team isn't available during a disaster?"
2. Recovery Time Objective (RTO) and Recovery Point Objective (RPO)
These aren't just acronyms—they're the foundation of your entire backup strategy.
RTO (Recovery Time Objective): How long can your business survive without this system?
RPO (Recovery Point Objective): How much data can you afford to lose?
Here's a real-world example from a client I worked with:
System | Business Impact | RTO | RPO | Backup Frequency | Recovery Method |
|---|---|---|---|---|---|
Production Database | Critical - Revenue generating | 1 hour | 15 minutes | Continuous replication + 15-min snapshots | Automated failover |
Customer Portal | High - Customer experience | 4 hours | 1 hour | Hourly snapshots | Manual restore with automation |
Internal Wiki | Medium - Productivity | 24 hours | 24 hours | Daily backups | Manual restore |
Development Environment | Low - Can rebuild | 1 week | 1 week | Weekly backups | Manual rebuild |
Email Archives | Low - Historical | 1 week | 24 hours | Daily incremental | Manual restore |
Notice how backup strategies vary based on business impact? That's the key insight most organizations miss.
I once worked with a startup that backed up everything with the same frequency—daily. Sounds reasonable, right? Wrong. Their customer database was down for 23 hours because they had no recent recovery point. Meanwhile, they were spending $18,000 monthly to back up development environments that could have been rebuilt from source control.
3. The 3-2-1 Rule (And Why SOC 2 Loves It)
Every backup strategy I implement follows the 3-2-1 rule:
3 copies of your data (production + 2 backups)
2 different media types (local disk + cloud, for example)
1 offsite backup (different geographic region)
But here's where I add my own twist based on modern threats—I call it the 3-2-1-1-0 rule:
3 copies of your data
2 different media types
1 offsite backup
1 offline/immutable backup (protection against ransomware)
0 errors after verification (tested and validated)
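If you track your backup copies as structured data, the 3-2-1-1-0 check can be automated. Here's a minimal Python sketch—the `BackupCopy` fields and function name are my own illustrations, not from any particular tool:

```python
from dataclasses import dataclass

# Hypothetical representation of one backup copy; adapt fields to your inventory.
@dataclass
class BackupCopy:
    location: str    # e.g. "aws-us-east-1"
    media: str       # e.g. "disk", "object-storage", "tape"
    immutable: bool  # write-once / object-locked
    verified: bool   # last integrity check passed

def satisfies_3_2_1_1_0(production_site: str, copies: list[BackupCopy]) -> bool:
    """Check a backup plan against the 3-2-1-1-0 rule described above."""
    total_copies = 1 + len(copies)  # production + backups
    media_types = {c.media for c in copies}
    offsite = [c for c in copies if c.location != production_site]
    immutable = [c for c in copies if c.immutable]
    return (
        total_copies >= 3                    # 3 copies of your data
        and len(media_types) >= 2            # 2 different media types
        and len(offsite) >= 1                # 1 offsite backup
        and len(immutable) >= 1              # 1 offline/immutable backup
        and all(c.verified for c in copies)  # 0 errors after verification
    )

plan = [
    BackupCopy("aws-us-east-1", "disk", immutable=False, verified=True),
    BackupCopy("aws-us-west-2", "object-storage", immutable=True, verified=True),
]
compliant = satisfies_3_2_1_1_0("aws-us-east-1", plan)
```

A check like this can run in CI or as a scheduled job, turning the rule from a slide-deck principle into a failing alert when someone quietly drops a backup tier.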
Here's how this looks in practice:
Backup Copy | Location | Type | Purpose | Update Frequency |
|---|---|---|---|---|
Primary Data | AWS us-east-1 | Live production | Active operations | Real-time |
First Backup | AWS us-east-1 (separate account) | Automated snapshots | Fast recovery | Every 15 minutes |
Second Backup | AWS us-west-2 | Replicated snapshots | Regional failure protection | Hourly |
Offsite Backup | Azure (different cloud) | Archived backups | Cloud provider failure | Daily |
Immutable Backup | Write-once storage | Locked archives | Ransomware protection | Weekly |
This might seem excessive, but I've seen every single one of these layers save a company from disaster.
The Recovery Procedures SOC 2 Auditors Want to See
Here's something that surprises most people: having backups isn't enough for SOC 2. You need documented, tested recovery procedures.
I learned this during a particularly challenging audit in 2018. The company had beautiful backup systems—automated, monitored, encrypted, geographically distributed. The auditor was impressed.
Then she asked: "Walk me through your recovery procedures for your production database."
The team looked at each other. Shrugged. "We'd figure it out if we needed to?"
Instant failure.
What SOC 2 Auditors Actually Test
Based on conducting and supporting over 40 SOC 2 audits, here's what auditors will examine:
Audit Area | What They Review | What They Test | Common Failures |
|---|---|---|---|
Documentation | Recovery procedure documents | Step-by-step accuracy | Outdated procedures, missing steps |
Access Controls | Who can access backups | Permission testing | Over-permissioned access, no MFA |
Backup Monitoring | Alerting for failures | Alert response evidence | Alerts ignored, no response procedures |
Encryption | Data protection at rest/transit | Encryption validation | Unencrypted backups, weak encryption |
Testing Evidence | Recovery test results | Test completeness | No tests, or incomplete documentation |
Retention Compliance | Backup retention policies | Retention enforcement | Inconsistent retention, no validation |
Let me share a recovery procedure template that's passed every audit I've conducted:
Recovery Procedure: Production Database
Last Updated: [Date]
Last Tested: [Date]
Owner: [Name/Role]
RTO: 1 hour
RPO: 15 minutes

"The difference between a disaster and an inconvenience is having a runbook you've actually tested."
Testing: The Part Everyone Skips (And Why That's Dangerous)
Here's a confession: early in my career, I was guilty of the "set it and forget it" mentality with backups. Backups ran automatically, monitoring showed green checkmarks, and I assumed everything was fine.
Then came the phone call.
A client's primary database had failed, and they needed to restore from backup. The most recent backup was corrupted. And the one before that. And the one before that. Turns out, a configuration change six months earlier had broken the backup process, but the monitoring only checked that backups ran, not that they actually worked.
We had 47 backup files. Zero were usable. The company lost three days of data and nearly went bankrupt.
That was the day I became obsessed with testing.
The Testing Schedule That Actually Works
Based on 15 years of experience, here's the testing approach I implement for every SOC 2 client:
Test Type | Frequency | What's Tested | Success Criteria | Time Required |
|---|---|---|---|---|
Automated Validation | Every backup | File integrity, completion status | Backup completes with zero errors | Automated |
Partial Recovery | Weekly | Single database/file restore | Restore completes within RTO, data validates | 30-60 min |
Full System Recovery | Monthly | Complete system restore to test environment | Full functionality restored within RTO | 2-4 hours |
Disaster Simulation | Quarterly | Full recovery in production-like environment | All services operational, all data validated | 4-8 hours |
Tabletop Exercise | Semi-annually | Team walks through disaster scenario | Team follows procedures correctly | 2 hours |
Full Disaster Drill | Annually | Complete recovery including all stakeholders | Organization recovers within documented RTOs | Full day |
The most important thing I've learned? Document everything. SOC 2 auditors want evidence that testing occurred and that any issues were resolved.
Here's a testing log template that's worked for dozens of audits:
Test Date | Test Type | System Tested | RTO Target | Actual Time | RPO Target | Actual Data Loss | Issues Found | Resolution | Tested By |
|---|---|---|---|---|---|---|---|---|---|
2024-01-15 | Full System | Production DB | 1 hour | 43 minutes | 15 min | 12 minutes | DNS failover delay | Updated automation | J. Smith |
2024-01-22 | Partial | Customer files | 2 hours | 1.5 hours | 1 hour | 45 minutes | None | N/A | M. Jones |
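Evidence like this log can also be checked mechanically. Here's a minimal sketch—the field names are my own, not from a specific tool—that flags a test entry as passing only when actual recovery time and data loss land inside the RTO/RPO targets:

```python
from datetime import timedelta

def meets_targets(entry: dict) -> bool:
    """A recovery test passes only when actual recovery time is within the
    RTO target and actual data loss is within the RPO target."""
    return (entry["actual_time"] <= entry["rto_target"]
            and entry["actual_data_loss"] <= entry["rpo_target"])

# Mirrors the 2024-01-15 row of the log above.
entry = {
    "system": "Production DB",
    "rto_target": timedelta(hours=1),
    "actual_time": timedelta(minutes=43),
    "rpo_target": timedelta(minutes=15),
    "actual_data_loss": timedelta(minutes=12),
}
```

Run this over every logged test and you get an at-a-glance compliance report for the auditor instead of a spreadsheet someone eyeballs quarterly.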
Common Backup and Recovery Failures (And How to Avoid Them)
After implementing backup systems for over 50 organizations, I've seen the same mistakes repeated. Let me save you the pain:
Mistake #1: The "Backup Singularity"
Storing all your backups in one location, one account, or one cloud provider.
Real Example: A company I consulted for in 2021 had backups in their production AWS account. An intern with over-permissioned access accidentally deleted the production account. Gone. Everything. Production servers, backups, snapshots, everything.
Solution: Separate AWS accounts for production and backups, with different credentials and strict access controls.
Mistake #2: The "Accessible Backup" Problem
Making backups easily accessible from production networks.
Real Example: Ransomware encrypted a company's production data AND all their backups because the backup storage was mapped as a network drive with the same credentials.
Solution: Implement immutable backups, air-gapped storage, or write-once storage that even administrators can't delete.
Mistake #3: The "Trust but Don't Verify" Approach
Assuming backups work without testing them.
Real Example: A SaaS company discovered during a disaster that their backup process had been failing silently for 8 months due to a permissions issue. The monitoring only checked that the job started, not that it completed successfully.
Solution: Automated integrity checking, hash verification, and regular restoration testing.
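Hash verification, one of the fixes named above, is straightforward to sketch. This example streams a file through SHA-256 and compares it against the hash recorded at backup time; the temporary file here is just a stand-in for a real backup:

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Stream the file through SHA-256 so large backups never load into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_backup(path: Path, recorded_hash: str) -> bool:
    """Compare a fresh hash against the one recorded when the backup was taken."""
    return sha256_of(path) == recorded_hash

# Demo on a throwaway file standing in for a real backup.
with tempfile.TemporaryDirectory() as d:
    backup = Path(d) / "db.dump"
    backup.write_bytes(b"backup payload")
    recorded = sha256_of(backup)        # hash saved at backup time
    ok_before = verify_backup(backup, recorded)
    backup.write_bytes(b"corrupted")    # simulate silent corruption
    ok_after = verify_backup(backup, recorded)
```

The catch: hash verification proves the file hasn't changed since the backup ran, not that the backup was good in the first place. That's why restoration testing stays on the schedule.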
Mistake #4: The "Documentation Gap"
Having backups but no recovery procedures.
Real Example: During a critical incident, the only person who knew how to restore the database was on vacation in the Maldives with spotty internet. The recovery that should have taken 2 hours took 14 hours.
Solution: Documented, step-by-step recovery procedures that any qualified team member can follow.
Mistake #5: The "All or Nothing" Strategy
Treating all data the same regardless of criticality.
Real Example: A company was spending $47,000 monthly backing up everything with the same aggressive strategy. Meanwhile, their critical customer database had the same backup frequency as their test environments.
Solution: Tiered backup strategy based on business impact and recovery requirements.
Building a SOC 2-Compliant Backup System: A Practical Roadmap
Let me give you the exact roadmap I follow with clients. This has successfully passed every SOC 2 audit I've supported:
Phase 1: Assessment and Planning (Week 1-2)
Step 1: Business Impact Analysis
Identify and categorize all systems:
Priority Tier | Definition | RTO | RPO | Examples |
|---|---|---|---|---|
Critical | Revenue loss >$10K/hour | < 1 hour | < 15 min | Production databases, payment systems |
High | Significant customer impact | < 4 hours | < 1 hour | Customer portals, support systems |
Medium | Internal productivity impact | < 24 hours | < 24 hours | Internal tools, collaboration platforms |
Low | Minimal immediate impact | < 1 week | < 1 week | Archives, development environments |
Step 2: Current State Assessment
Document what you have:
Current backup systems and configurations
Backup frequency and retention
Storage locations and redundancy
Recovery procedures (if any exist)
Last successful recovery test (if any)
I use a simple assessment checklist:
[ ] All critical systems have backups
[ ] Backup frequency meets RPO requirements
[ ] Backups stored in multiple locations
[ ] Backups are encrypted
[ ] Backup access requires MFA
[ ] Automated backup monitoring exists
[ ] Recovery procedures are documented
[ ] Recovery has been tested in last 30 days
[ ] Team is trained on recovery procedures
[ ] Backup costs are tracked and optimized
Phase 2: Implementation (Week 3-8)
Week 3-4: Infrastructure Setup
Set up your backup infrastructure:
Primary Backup System: Same region as production, fast recovery
Secondary Backup: Different region, disaster protection
Tertiary Backup: Different cloud provider or on-premises
Immutable Storage: Write-once-read-many protection
Week 5-6: Automation and Monitoring
Implement automated systems:
```python
# Example automated backup validation. The check_* / verify_* helpers,
# log_success, and alert_team are placeholders for your own tooling.
def validate_backup(backup_file):
    checks = {
        'file_exists': check_file_exists(backup_file),
        'size_reasonable': check_file_size(backup_file),
        'integrity_verified': verify_checksum(backup_file),
        'encryption_confirmed': verify_encryption(backup_file),
        'timestamp_recent': check_timestamp(backup_file),
    }
    if all(checks.values()):
        log_success(backup_file)
        return True
    # Include the full check results so responders see exactly what failed.
    alert_team(checks)
    return False
```
Week 7-8: Documentation and Training
Create comprehensive documentation:
Backup configuration documentation
Recovery procedures for each system
Escalation procedures
Team training materials
Phase 3: Testing and Validation (Week 9-12)
Week 9: Initial recovery tests on non-critical systems
Week 10: Full recovery test on production-like environment
Week 11: Disaster simulation with full team
Week 12: Documentation refinement and final validation
The Monitoring and Alerting Setup That Saves Lives
Let me share the monitoring setup that's prevented countless disasters:
Critical Alerts (Immediate Response Required)
Alert | Trigger | Response Time | Action Required |
|---|---|---|---|
Backup Failed | Any backup job fails | 5 minutes | Investigate and remediate immediately |
Backup Validation Failed | Integrity check fails | 5 minutes | Test restore, escalate if necessary |
Storage Near Capacity | >85% storage used | 30 minutes | Provision additional storage |
Backup Older Than Expected | Last successful backup exceeded RPO | 15 minutes | Force manual backup, investigate |
Replication Lag Exceeded | Cross-region replication delayed | 15 minutes | Check network, verify replication |
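The "Backup Older Than Expected" trigger above boils down to a timestamp comparison against the system's RPO. A minimal sketch, with the function name and signature being my own:

```python
from datetime import datetime, timedelta, timezone

def backup_stale(last_success: datetime, rpo: timedelta, now: datetime) -> bool:
    """The 'Backup Older Than Expected' condition: time since the last
    successful backup has exceeded the system's RPO."""
    return now - last_success > rpo

# Example: a 15-minute RPO, evaluated at a fixed point in time.
now = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)
fresh = backup_stale(now - timedelta(minutes=10), timedelta(minutes=15), now)
stale = backup_stale(now - timedelta(minutes=40), timedelta(minutes=15), now)
```

In production you'd pass `datetime.now(timezone.utc)` for `now`; taking it as a parameter keeps the check deterministic and testable. The key design point is that this alerts on *staleness*, not on job failure—so a job that silently stops running still trips the alarm.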
Warning Alerts (Address Within Business Hours)
Alert | Trigger | Response Time | Action Required |
|---|---|---|---|
Backup Duration Increased | Backup took 2x normal time | 4 hours | Review performance, optimize if needed |
Storage Growth Anomaly | Unusual storage consumption | 4 hours | Investigate data growth pattern |
Incomplete Test | Monthly test not completed | 24 hours | Schedule and complete test |
Informational Alerts (Track and Review)
Successful backup completions
Test recovery completions
Storage utilization trends
Cost trending
I set up a Slack channel for one client that automatically posts backup status. Green checkmark for success, red X for failures, yellow warning for issues. The team sees the health of their backups 20+ times per day, creating a culture of backup awareness.
"What gets monitored gets managed. What gets alerted gets fixed. What gets ignored creates disasters."
Real-World Backup Architecture Examples
Let me show you three actual architectures I've implemented that passed SOC 2 audits:
Architecture 1: Small SaaS Company (Series A, 50 employees)
Environment: AWS-based, PostgreSQL database, 2TB data
Component | Solution | Cost/Month | RPO | RTO |
|---|---|---|---|---|
Primary Backup | AWS RDS automated snapshots (15-min) | Included | 15 min | 30 min |
Secondary Backup | AWS Backup to S3 (different region) | $340 | 1 hour | 2 hours |
Immutable Backup | S3 Glacier with vault lock | $180 | 24 hours | 4 hours |
Monitoring | CloudWatch + PagerDuty | $120 | N/A | N/A |
Total | | $640/month | | |
This setup costs less than $8,000 annually but provides enterprise-grade protection.
Architecture 2: Mid-Size FinTech (Series C, 300 employees)
Environment: Multi-cloud, MySQL clusters, 50TB data
Component | Solution | Cost/Month | RPO | RTO |
|---|---|---|---|---|
Primary Backup | AWS RDS continuous backup | Included | 5 min | 15 min |
Secondary Backup | Cross-region replication | $4,200 | 5 min | 15 min |
Tertiary Backup | Azure blob storage (different cloud) | $2,800 | 1 hour | 4 hours |
Immutable Backup | AWS S3 Glacier with object lock | $1,200 | 24 hours | 8 hours |
Testing Environment | Automated weekly full restore | $1,800 | N/A | N/A |
Monitoring | Datadog + PagerDuty enterprise | $850 | N/A | N/A |
Total | | $10,850/month | | |
Architecture 3: Enterprise Healthcare (500+ employees)
Environment: Hybrid cloud, HIPAA-compliant, 200TB data
Component | Solution | Cost/Month | RPO | RTO |
|---|---|---|---|---|
Primary Backup | On-premises appliance + cloud sync | $8,500 | 15 min | 30 min |
Secondary Backup | AWS with HIPAA BAA | $12,000 | 15 min | 1 hour |
Tertiary Backup | Azure with HIPAA BAA | $11,000 | 1 hour | 4 hours |
Immutable Backup | Tape library (air-gapped) | $3,200 | 24 hours | 12 hours |
DR Site | Hot standby environment | $18,000 | 15 min | 30 min |
Testing | Monthly full DR drills | $4,500 | N/A | N/A |
Monitoring | Splunk + ServiceNow | $2,800 | N/A | N/A |
Total | | $60,000/month | | |
Notice the pattern? Investment scales with business criticality and data volume, but the principles remain the same.
The Backup Retention Strategy That Balances Cost and Compliance
One of the most common questions I get: "How long should we keep backups?"
The answer depends on several factors:
Consideration | Typical Requirement | Example Retention |
|---|---|---|
SOC 2 Requirements | Evidence of controls over audit period | Minimum 12 months |
Legal/Regulatory | Industry-specific requirements | 7 years (financial), 6 years (healthcare) |
Business Recovery | Ability to restore from various points | 30 days (daily), 12 months (weekly) |
Cost Optimization | Balance storage costs vs. utility | Tiered storage (hot to cold to archive) |
Ransomware Protection | Recovery before infection | 90 days minimum |
Here's a retention strategy I implemented for a fintech company:
Backup Type | Retention Period | Storage Tier | Estimated Cost/TB/Month |
|---|---|---|---|
Continuous snapshots | 24 hours | Hot (SSD) | $100 |
Hourly snapshots | 7 days | Warm (SSD) | $50 |
Daily snapshots | 30 days | Cool (HDD) | $20 |
Weekly snapshots | 12 months | Cold (S3 Standard) | $8
Monthly snapshots | 7 years | Archive (Glacier) | $1 |
This tiered approach reduced their backup costs by 67% while actually improving their recovery capabilities.
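The tiering above reduces to a retention window per snapshot type. Here's a minimal sketch—the windows mirror the table, and the names are illustrative rather than from any backup product:

```python
from datetime import timedelta

# Retention windows mirroring the table above (snapshot type -> keep for).
RETENTION = {
    "continuous": timedelta(hours=24),
    "hourly":     timedelta(days=7),
    "daily":      timedelta(days=30),
    "weekly":     timedelta(days=365),
    "monthly":    timedelta(days=7 * 365),
}

def should_retain(snapshot_type: str, age: timedelta) -> bool:
    """Keep a snapshot while its age is inside its tier's retention window;
    anything older is eligible for deletion (or demotion to a colder tier)."""
    return age <= RETENTION[snapshot_type]
```

A nightly job applying this policy—and logging what it deleted and why—is exactly the kind of retention-enforcement evidence auditors ask for.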
Common SOC 2 Audit Findings and How to Address Them
After supporting dozens of SOC 2 audits, here are the most common findings related to backup and recovery:
Finding #1: "Backup restoration procedures not documented"
Auditor's Concern: Without documented procedures, recovery is dependent on individual knowledge.
Remediation:
Create step-by-step recovery procedures for each critical system
Include screenshots and command examples
Store procedures in accessible location (wiki, SharePoint, etc.)
Review and update quarterly
Timeline: 2-4 weeks
Cost: Internal time only
Finding #2: "No evidence of backup restoration testing"
Auditor's Concern: Untested backups may not work when needed.
Remediation:
Implement monthly recovery testing schedule
Document test results with screenshots and validation
Create remediation plans for any issues found
Store evidence in organized fashion for audit
Timeline: Ongoing (monthly)
Cost: 4-8 hours/month of team time
Finding #3: "Backup monitoring alerts not configured or not responded to"
Auditor's Concern: Backup failures may go unnoticed.
Remediation:
Configure alerts for all backup failures
Establish response procedures and timeframes
Document alert investigations and resolutions
Implement escalation for unresolved alerts
Timeline: 1-2 weeks
Cost: $100-500/month for monitoring tools
Finding #4: "Inadequate backup encryption"
Auditor's Concern: Sensitive data not properly protected in backups.
Remediation:
Enable encryption at rest for all backup storage
Implement encryption in transit for backup transfers
Use strong encryption (AES-256 minimum)
Document encryption methods and key management
Timeline: 1-2 weeks
Cost: Often included in cloud services
Finding #5: "Insufficient geographic redundancy"
Auditor's Concern: Single location failure could eliminate all backups.
Remediation:
Implement backups in at least two geographic regions
Consider multi-cloud strategy for critical systems
Document disaster scenarios and recovery from each backup location
Test cross-region recovery
Timeline: 2-4 weeks
Cost: Varies ($500-5,000/month depending on data volume)
Lessons from 15 Years of Disasters and Recoveries
Let me close with some hard-won wisdom:
Lesson 1: Speed of recovery beats perfection of backup
I've seen companies with beautiful, complex backup systems take days to recover because the process was too complicated. Keep it simple. Keep it fast.
Lesson 2: The best backup system is the one you'll actually test
I'd rather have a simpler backup system that gets tested monthly than a sophisticated system that never gets validated.
Lesson 3: Automate everything, but trust nothing
Automation is essential, but automated validation is even more critical. Never assume automated processes work without verification.
Lesson 4: Culture matters more than technology
The organizations with the best recovery capabilities aren't necessarily the ones with the most expensive tools—they're the ones where everyone understands and values backup and recovery.
Lesson 5: The disaster you prepare for isn't the one you'll face
I've never seen a disaster unfold exactly as planned in tabletop exercises. But organizations that practice recovery do infinitely better than those that don't.
"Your backup strategy should be boring and reliable, not innovative and exciting. Save innovation for your products. Make backups predictable."
Your Action Plan: Getting SOC 2-Ready
Here's what to do this week:
Monday:
Inventory all systems and data
Identify current backup status
List gaps in coverage
Tuesday-Wednesday:
Calculate RTO and RPO for each critical system
Document current recovery procedures (if any)
Identify missing documentation
Thursday:
Test recovery of one non-critical system
Document the process and time required
Note any issues or improvements needed
Friday:
Review findings with leadership
Create prioritized remediation plan
Schedule follow-up meeting for next steps
Final Thoughts: The Backup Strategy That Saved a Company
I'll end where I started—with that SaaS company whose COO asked about recovery time.
After implementing a proper SOC 2-aligned backup and recovery program, they experienced that database corruption I mentioned. Two hours and seventeen minutes from detection to full recovery.
But here's the part that still gives me chills: their biggest competitor suffered a similar incident three months later. Same type of database corruption. Similar company size.
The competitor was down for six days. They lost 28% of their customers. They had to lay off 40% of their staff. They're still recovering financially three years later.
My client? They sent a status update to customers explaining the brief interruption and offering a service credit for the downtime. Most customers didn't even notice. They actually gained customers during the incident because of their transparent communication and quick recovery.
That's the difference between having backups and having a SOC 2-compliant backup and recovery program.
Your backups aren't protecting your business. Your tested, documented, validated, monitored recovery capability is protecting your business.
Invest accordingly.