
How to write an RFP for disaster recovery

Requirements, questions, and evaluation criteria specific to disaster recovery procurement


Disaster recovery RFPs matter because they address a hard problem: keeping the business running through cyberattacks and infrastructure failures, both of which are increasing. A well-crafted RFP helps your organization select a solution that minimizes downtime and data loss, protecting both its reputation and its financial stability.

What makes disaster recovery RFPs different

Disaster recovery RFPs are unique due to the high stakes involved and the intricate technical requirements. Unlike other software procurements where a poor choice may lead to operational friction, an inadequate disaster recovery solution can result in permanent business failure.

This calls for a rigorous evaluation process that goes beyond basic feature comparisons.

The complexity stems from the need to protect diverse workloads across on-premises, cloud, and hybrid environments while also addressing evolving cyber threats such as ransomware. Regulatory mandates such as the EU's Digital Operational Resilience Act (DORA) add further pressure, requiring financial entities to manage third-party IT risk with unprecedented rigor.

A successful RFP must therefore delve into specific technical capabilities like immutable storage, orchestrated failover, and identity resilience.

  • Defining clear Recovery Time Objectives (RTOs) and Recovery Point Objectives (RPOs) for different application tiers.
  • Ensuring the proposed solution offers immutable storage to protect backups from ransomware.
  • Verifying the vendor's ability to orchestrate failover and failback of complex multi-tier applications.
  • Assessing the vendor's experience and expertise in recovering from cyber incidents, not just traditional disasters.
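The first point, tiered RTOs and RPOs, is easier to evaluate when the targets are written down as data and each vendor's claims are checked against them. Below is a minimal Python sketch, assuming three hypothetical tiers with illustrative minute values (not recommendations); substitute figures from your own business impact analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryTarget:
    """Recovery objectives for one application tier, in minutes."""
    rto_minutes: float  # maximum tolerable downtime
    rpo_minutes: float  # maximum tolerable window of data loss


# Hypothetical tiers and targets: replace with your own business impact analysis.
TIER_TARGETS = {
    "tier-1": RecoveryTarget(rto_minutes=60, rpo_minutes=5),
    "tier-2": RecoveryTarget(rto_minutes=240, rpo_minutes=60),
    "tier-3": RecoveryTarget(rto_minutes=1440, rpo_minutes=1440),
}


def tiers_not_met(claims: dict[str, RecoveryTarget]) -> list[str]:
    """Return the tiers where a vendor's claim is missing or exceeds the target."""
    gaps = []
    for tier, target in TIER_TARGETS.items():
        claim = claims.get(tier)
        if (
            claim is None
            or claim.rto_minutes > target.rto_minutes
            or claim.rpo_minutes > target.rpo_minutes
        ):
            gaps.append(tier)
    return gaps


# Hypothetical vendor response: no claim for tier-3, and tier-2 RTO is too slow.
vendor_claims = {
    "tier-1": RecoveryTarget(rto_minutes=30, rpo_minutes=1),
    "tier-2": RecoveryTarget(rto_minutes=300, rpo_minutes=15),
}
print(tiers_not_met(vendor_claims))  # prints ['tier-2', 'tier-3']
```

Treating a missing claim as a gap is deliberate: a vendor that does not state an RTO or RPO for a tier has not committed to meeting it.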

RFP vs RFI vs RFQ

Here's when to use each document type when procuring disaster recovery software.

RFI

Request for Information

Use early in your search to understand what vendors offer and narrow your list. Gather general capabilities, company background, and high-level pricing ranges.

RFP

Request for Proposal

Use when you know your requirements and want detailed vendor solutions and pricing. This is your main evaluation document for shortlisted vendors.

RFQ

Request for Quote

Use when requirements are fixed and you just need final pricing. Often used after RFP when you're ready to negotiate with finalists.

When procuring disaster recovery solutions, an RFI is helpful for initial market research into available technologies and vendor capabilities. An RFP is essential for a detailed evaluation of specific solutions against defined requirements. An RFQ on its own is generally unsuitable because of the complexity and customization involved, though it can follow the RFP to obtain final pricing from finalists.

Technical requirements checklist

Use this checklist when defining your RFP scope.

Data Protection

  • Continuous Data Protection (CDP) with sub-minute RPOs
  • Immutable storage (WORM) to prevent ransomware from encrypting or deleting backups
  • Air-gapped backups for offline protection
  • Support for various data sources (on-premises, cloud, SaaS)

Recovery Capabilities

  • Automated failover and failback orchestration
  • Granular point-in-time recovery
  • Application-aware recovery
  • Identity resilience (Active Directory/Entra ID protection)

Security

  • Anomaly detection using AI/ML
  • Integration with SIEM platforms
  • Role-based access control (RBAC)
  • Encryption at rest and in transit

Deployment & Management

  • Support for hybrid cloud environments
  • Unified management console
  • Automated DR testing capabilities
  • Compliance reporting dashboards

Compliance

  • Alignment with regulatory requirements (DORA, HIPAA, GDPR)
  • Data residency options
  • Audit logging and reporting
  • Third-party certifications (SOC 2, ISO 27001)

Questions to include in your RFP

Architecture & Deployment

  • Describe your solution's architecture, including data storage, replication, and failover mechanisms.
    Understanding the architecture is critical for assessing scalability and resilience.
  • What deployment options are available (cloud, on-premises, hybrid), and what are the pros and cons of each?
    Ensures alignment with your organization's infrastructure strategy.
  • How does your solution handle data sovereignty and compliance requirements in different geographic regions?
    Crucial for organizations operating internationally or with specific data residency needs.
  • Explain your solution's approach to minimizing latency and ensuring data consistency during replication.
    Impacts recovery time and data integrity.

Data Protection & Immutability

  • Describe your solution's approach to data immutability and how it prevents ransomware from encrypting backups.
    Immutable storage is a critical defense against ransomware attacks.
  • What specific administrative controls prevent an attacker with compromised root credentials from deleting or altering backups?
    Verifies the robustness of the immutability implementation.
  • How does your solution ensure the integrity and recoverability of backups in the event of a storage system failure?
    Protects against data loss due to hardware malfunctions.
  • What data encryption methods are used at rest and in transit, and what key management practices are employed?
    Ensures data confidentiality and security.

Orchestration & Automation

  • Describe your solution's orchestration capabilities for automating failover and failback of multi-tier applications.
    Automated orchestration reduces recovery time and minimizes human error.
  • Can you demonstrate a "clean recovery" of our specific multi-tier application stack within our RTO, including the restoration of associated Identity (AD) services?
    Validates the solution's ability to recover complex applications in a real-world scenario.
  • How does your solution manage application dependencies and ensure that systems are brought online in the correct sequence?
    Critical for maintaining application functionality after a failover.
  • What level of customization and scripting is required to adapt your orchestration workflows to our specific environment?
    Impacts implementation time and ongoing maintenance effort.
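When scoring answers to the dependency question, it helps to know what correct sequencing looks like. The sketch below uses Python's standard-library `graphlib` module (Python 3.9+) to group systems into startup waves, where each wave starts only after everything it depends on is up. The system names and dependency map are hypothetical, and a vendor's orchestration engine will have its own way of expressing the same idea.

```python
from graphlib import TopologicalSorter

# Hypothetical stack: each system maps to the systems it depends on.
DEPENDENCIES = {
    "active-directory": set(),
    "dns": {"active-directory"},
    "file-share": {"active-directory"},
    "database": {"active-directory", "dns"},
    "app-server": {"database", "file-share"},
    "web-frontend": {"app-server", "dns"},
}


def startup_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group systems into waves; a system starts only after all its dependencies."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if dependencies are circular
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose dependencies are up
        waves.append(ready)
        sorter.done(*ready)  # in a real failover, mark done only after health checks pass
    return waves


for number, wave in enumerate(startup_waves(DEPENDENCIES), start=1):
    print(f"wave {number}: {', '.join(wave)}")
```

Systems in the same wave (here `dns` and `file-share`) can be started in parallel, and identity services land in the first wave, which is why the clean-recovery question above asks about Active Directory explicitly.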

Testing & Reporting

  • Describe your solution's automated DR testing capabilities and how they help us validate our recovery readiness.
    Regular testing is essential for verifying the effectiveness of the DR plan.
  • What types of reports are available to document test results and compliance with regulatory requirements?
    Facilitates auditing and compliance reporting.
  • How frequently should we perform DR tests, and what resources are required from our team?
    Helps plan for ongoing maintenance and testing activities.
  • Can your solution simulate ransomware attacks to test our recovery procedures and identify vulnerabilities?
    Proactively identifies weaknesses in the DR plan.

Pricing & Licensing

  • Describe your pricing model, including all licensing fees, implementation costs, and ongoing maintenance charges.
    Transparency in pricing is essential for accurate budgeting.
  • How does your pricing model account for "compute burst" and "data egress" costs during a full-scale failover and failback exercise?
    Cloud-based DR solutions can become prohibitively expensive during actual disasters or tests if these costs are not factored in.
  • Are there any hidden costs or usage-based fees that we should be aware of?
    Avoids unexpected budget overruns.
  • Do you offer discounts for long-term contracts or volume purchases?
    Negotiating favorable pricing terms can significantly reduce TCO.
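A rough cost model makes vendors' answers on compute burst and egress comparable. The sketch assumes simple linear pricing; the rates are hypothetical placeholders rather than any provider's actual prices, and real cloud pricing often adds tiers, minimums, and storage charges that this ignores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailoverScenario:
    vm_count: int              # VMs running at the DR site during failover
    vm_hourly_rate: float      # hypothetical compute price per VM-hour
    hours_at_dr_site: float    # how long production runs at the DR site
    failback_gb: float         # data changed at the DR site that must be copied back
    egress_rate_per_gb: float  # hypothetical data egress price per GB


def exercise_cost(s: FailoverScenario) -> dict[str, float]:
    """Estimate compute-burst and egress costs for one failover/failback cycle."""
    compute = s.vm_count * s.vm_hourly_rate * s.hours_at_dr_site
    egress = s.failback_gb * s.egress_rate_per_gb
    return {
        "compute_burst": round(compute, 2),
        "egress": round(egress, 2),
        "total": round(compute + egress, 2),
    }


# Hypothetical full-scale exercise: 40 VMs for 72 hours, 5,000 GB copied back.
scenario = FailoverScenario(
    vm_count=40, vm_hourly_rate=0.50, hours_at_dr_site=72,
    failback_gb=5_000, egress_rate_per_gb=0.09,
)
per_exercise = exercise_cost(scenario)
print(per_exercise)
print("annual, 2 full exercises:", round(2 * per_exercise["total"], 2))
```

Multiplying by the number of full exercises per year shows why these costs matter for routine testing, not only for real disasters.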

Vendor Experience & Support

  • How many years have you been providing disaster recovery solutions, and what is your track record of successful recoveries?
    Experience is a key indicator of vendor reliability.
  • Can you provide customer references in our industry who have successfully recovered from a disaster using your solution?
    Relevant references demonstrate the vendor's ability to meet your specific needs.
  • What is your average "Recovery Time Actual" (RTA) observed across your customer base for a ransomware event, and how does this differ from theoretical RTO?
    RTA reflects the practical friction of human decision-making and data movement, providing a more realistic measure of recovery performance.
  • What level of technical support is included with your solution, and what are your service level agreements (SLAs) for response and resolution times?
    Ensures timely assistance during critical recovery events.
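To compare a vendor's reported RTA with its theoretical RTO, it can help to ask for per-incident timestamps rather than a single summary figure. A minimal sketch, assuming RTA runs from disaster declaration to verified restoration of service (definitions vary, so confirm which clock the vendor uses); the timestamps are invented for illustration.

```python
from datetime import datetime, timedelta


def recovery_time_actual(declared: datetime, restored: datetime) -> timedelta:
    """RTA, assumed here to run from disaster declaration to verified restoration."""
    if restored < declared:
        raise ValueError("restoration cannot precede declaration")
    return restored - declared


def average_rta(incidents: list[tuple[datetime, datetime]]) -> timedelta:
    """Mean RTA over (declared, restored) pairs."""
    if not incidents:
        raise ValueError("no incidents supplied")
    durations = [recovery_time_actual(d, r) for d, r in incidents]
    return sum(durations, timedelta(0)) / len(durations)


# Invented incident timestamps for illustration only.
incidents = [
    (datetime(2024, 3, 1, 2, 0), datetime(2024, 3, 1, 9, 30)),    # 7.5 hours
    (datetime(2024, 6, 12, 10, 0), datetime(2024, 6, 12, 14, 0)),  # 4 hours
]
claimed_rto = timedelta(hours=4)
rta = average_rta(incidents)
print(f"average RTA {rta}, gap to RTO {max(rta - claimed_rto, timedelta(0))}")
```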

Compliance and security requirements

Depending on your industry, you may need vendors to provide evidence of compliance with the following regulations and standards.

HIPAA

Required if you handle protected health information (PHI). If applicable, request a Business Associate Agreement (BAA) and documentation of HIPAA compliance measures.

GDPR

Required if you process personal data of individuals in the EU. If applicable, request documentation of GDPR compliance, including data residency options and data subject rights management.

SOC 2 Type II

Increasingly a baseline requirement for cloud-based services. Request a current SOC 2 Type II report to verify the vendor's security controls.

DORA (Digital Operational Resilience Act)

Required for financial entities operating in the EU. If applicable, request documentation of how the solution helps meet DORA requirements for third-party IT risk management and resilience testing.

ISO 27001

Demonstrates adherence to an international standard for information security management. Request a copy of the vendor's ISO 27001 certificate and its scope, since certification may cover only part of the vendor's operations.

Evaluation criteria

Here is the suggested weighting for disaster recovery RFPs.

  • Functionality Fit (25%): How well the solution meets the defined requirements for data protection, recovery orchestration, and security.
  • Technical Architecture (20%): The scalability, resilience, and security of the solution's architecture.
  • Total Cost of Ownership (20%): The overall cost of the solution, including licensing, implementation, maintenance, and usage-based fees.
  • Vendor Experience & Support (15%): The vendor's experience, reputation, and the quality of their technical support.
  • Compliance & Security (10%): The solution's ability to meet relevant compliance requirements and protect data from cyber threats.
  • Ease of Use & Management (10%): The simplicity and efficiency of the solution's management interface and automation capabilities.
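Applying these weights is simple arithmetic: score each criterion on a common scale, multiply by its weight, and sum. The sketch assumes a hypothetical 1-to-5 scoring scale and invented evaluator scores for two vendors.

```python
import math

# Suggested weights for each criterion, as fractions of 1.
WEIGHTS = {
    "Functionality Fit": 0.25,
    "Technical Architecture": 0.20,
    "Total Cost of Ownership": 0.20,
    "Vendor Experience & Support": 0.15,
    "Compliance & Security": 0.10,
    "Ease of Use & Management": 0.10,
}


def weighted_score(scores: dict[str, float], weights: dict[str, float] = WEIGHTS) -> float:
    """Combine per-criterion scores (hypothetical 1-5 scale) into one weighted total."""
    if set(scores) != set(weights):
        raise ValueError("score every criterion exactly once")
    if not math.isclose(sum(weights.values()), 1.0):
        raise ValueError("weights must sum to 100%")
    return sum(weights[c] * scores[c] for c in weights)


# Invented evaluator scores for two shortlisted vendors, in WEIGHTS order.
vendors = {
    "Vendor A": dict(zip(WEIGHTS, [4, 3, 5, 4, 4, 3])),
    "Vendor B": dict(zip(WEIGHTS, [5, 4, 2, 3, 5, 4])),
}
for name, scores in sorted(vendors.items(), key=lambda kv: -weighted_score(kv[1])):
    print(f"{name}: {weighted_score(scores):.2f}")
```

Here Vendor B scores higher on functionality, architecture, and compliance, but its weak cost score leaves it slightly behind Vendor A (3.80 vs 3.90), which shows how heavily the 20% TCO weight can swing a close comparison.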

Adjust the weights to reflect your priorities:

  • Functionality Fit: increase if the solution offers unique or advanced features.
  • Technical Architecture: increase if the solution must support complex hybrid cloud environments.
  • Total Cost of Ownership: decrease if the solution offers significant cost savings compared to alternatives.
  • Vendor Experience & Support: increase if the vendor has a proven track record of successful recoveries in your industry.
  • Compliance & Security: increase if your organization operates in a highly regulated industry.
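When you change one weight, the others have to shift so the total stays at 100%. One simple approach, assumed here rather than prescribed by any standard, is to apply the adjustments in percentage points and then rescale everything proportionally. The +10 point adjustment for Compliance & Security is a hypothetical example for a highly regulated organization.

```python
# Suggested base weights in percentage points (they total 100).
BASE_WEIGHTS = {
    "Functionality Fit": 25,
    "Technical Architecture": 20,
    "Total Cost of Ownership": 20,
    "Vendor Experience & Support": 15,
    "Compliance & Security": 10,
    "Ease of Use & Management": 10,
}


def adjust_weights(base: dict[str, float], deltas: dict[str, float]) -> dict[str, float]:
    """Apply point adjustments, then rescale proportionally so weights total 100."""
    unknown = set(deltas) - set(base)
    if unknown:
        raise ValueError(f"unknown criteria: {sorted(unknown)}")
    adjusted = {c: base[c] + deltas.get(c, 0) for c in base}
    if any(w < 0 for w in adjusted.values()):
        raise ValueError("an adjustment drove a weight below zero")
    total = sum(adjusted.values())
    # Rounded to one decimal place, so the shown total may differ from 100 slightly.
    return {c: round(100 * w / total, 1) for c, w in adjusted.items()}


# Hypothetical: a highly regulated organization adds 10 points to compliance.
for criterion, weight in adjust_weights(BASE_WEIGHTS, {"Compliance & Security": 10}).items():
    print(f"{criterion}: {weight}%")
```

Proportional rescaling keeps the relative order of the untouched criteria intact; if you prefer to take points from one specific criterion instead, pass a matching negative adjustment for it.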

Red flags to watch

  • Inability to provide customer references in your industry

    Lack of relevant references suggests limited experience with your specific requirements and use cases. It may also indicate customer dissatisfaction.

  • Vague pricing responses or hidden fees

    Vendors who can't provide clear pricing often have complex fee structures that inflate TCO. Watch out for unexpected charges for data egress, compute bursts, or support services.

  • Reliance on manual processes for failover and failback

    Manual processes are prone to errors and delays, increasing recovery time and business disruption. Look for solutions with automated orchestration capabilities.

  • Untested implementation or lack of DR testing capabilities

    A DR plan is only as good as its last successful test. Vendors who cannot demonstrate a history of successful, unannounced test results are a significant risk.

  • Limited support for immutable storage or air-gapped backups

    Without these features, your backups are vulnerable to ransomware attacks. Ensure the vendor offers robust protection against data encryption.

Key metrics to request

Ask vendors to provide benchmarks from similar customers.

Average recovery time and recovery point actually achieved for similar customers, compared with their Recovery Time Objective (RTO) and Recovery Point Objective (RPO) targets

Provides a benchmark for expected recovery performance.

Implementation timeline for similar customers

Helps set realistic expectations and identify potential delays.

Number of successful disaster recovery tests performed annually

Demonstrates the vendor's commitment to ongoing testing and validation.

Customer satisfaction rating and Net Promoter Score (NPS)

Indicates overall customer experience and satisfaction with the vendor's services.

Percentage of backups successfully recovered during tests

Measures the reliability and effectiveness of the backup and recovery process.
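Asking for raw test records rather than summary figures lets you compute the recovery success rate and achieved RPO yourself. A minimal sketch, assuming a hypothetical per-test record with a pass/fail flag and the observed data-loss window; failed restores are excluded from the RPO figures because no recovery point was achieved.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class RestoreTest:
    backup_id: str
    restored_ok: bool         # restore completed and passed integrity checks
    data_loss_minutes: float  # gap to newest recoverable data; ignored if restore failed


def summarize(tests: list[RestoreTest]) -> dict[str, float]:
    """Recovery success rate plus achieved-RPO figures over successful restores."""
    if not tests:
        raise ValueError("no test results supplied")
    ok = [t.data_loss_minutes for t in tests if t.restored_ok]
    return {
        "success_rate_pct": 100 * len(ok) / len(tests),
        "mean_rpo_minutes": mean(ok) if ok else float("nan"),
        "worst_rpo_minutes": max(ok) if ok else float("nan"),
    }


# Invented results from four restore tests.
results = [
    RestoreTest("bk-001", True, 2),
    RestoreTest("bk-002", True, 5),
    RestoreTest("bk-003", False, 0),
    RestoreTest("bk-004", True, 14),
]
print(summarize(results))
```

Reporting the worst case alongside the mean matters here: a single 14-minute gap would break a 5-minute RPO target even though the average looks acceptable.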