Backup and recovery are among the oldest and most frequently performed data center operations. Yet they remain frustrating, as the need for effective recovery continues to challenge storage and data center managers. Gartner's client inquiry call volume regarding backup remains high year over year. We see logjams in the data backup and recovery process, such as:
- Amount and proliferation of data
- More-stringent SLAs
- Antiquated backup models (that is, scale issues)
The Three Cs of Backup Concerns
Cost
While the upfront acquisition cost remains a concern, the annual maintenance fee associated with backup software is the larger issue for big enterprises. As more applications have been deployed, and more machines (physical and virtual) have been installed, the costs of backup have risen. Often coupled with this is the desire to add protection for remote offices, and sometimes for desktops and laptops. Combined, these factors can result in a larger backup bill, and much larger annual maintenance and service costs. In some situations, newer pricing models, such as pricing based on capacity (in terabytes), have helped; in other cases, they have become an increasing concern.
Capability
From end-user inquiries, conference polling and worldwide surveys, Gartner clients' top complaints about backup capability include not meeting the backup window, not being able to restore data fast enough to meet SLAs, point solutions not being able to fully protect all data types or locations, daunting administrative interfaces, and incomplete reporting. In particular, they have concerns that their vendors or products are not keeping pace with the level of innovation of competing solutions.
Complexity
Organizations also complain about the complexity of their backup solutions. The feeling is that too much time, expertise and effort are spent keeping the current recovery systems afloat. Organizations would like backup to be a process that requires far less supervision and administrative attention, and for higher backup and restore success rates to be easier to achieve.
The Fourth and Fifth Cs of Backup Concerns
Over the past three years, Gartner clients have mentioned additional backup frustrations, such as completeness and scale, and customer support.
Completeness and Scale
This is a variation on the capability issue above. Although a particular function may be available, organizations have concerns about how robust it is, how much additional effort is required to employ it, and especially, how well the overall solution performs at broad scale. An example is server virtualization support, which may have been recently improved and deemed workable initially, but which has exhibited issues at scale.
Customer Support
Many organizations are basing their vendor renewal considerations on the quality of support that they receive. Breakdowns in support systems can lead to lost confidence in the product or vendor. Organizations increasingly do not want to rely too much on the heroics of their in-house teams to deliver an effective backup practice.
1. Fully Implement Current Backup Solution Capabilities
For a variety of reasons, some organizations have yet to embrace and deploy data protection approaches and techniques that are already available. The expanded use of disks for backup — while a marketing slogan for some time — is now the norm for the industry (see "Organizations Leverage Hybrid Backup and Recovery to Take Advantage of Speed and Low Cost"). Client- and target-side data deduplication solutions are offered by many providers. Most backup suppliers have delivered significant improvements in server virtualization and SharePoint recovery in their most recent product releases. Support for endpoint backup and cloud applications is becoming more robust.
New capabilities can take three to five years or more to gain widespread adoption, as most organizations are risk-averse. However, many companies delay implementation because they do not know their vendor options and capabilities.
Action Item:
- Before making plans to jettison your current backup product, ensure that you have deployed the feature updates your vendor has provided over the past three years, so that you fully leverage the investment.
2. Implement Archiving and Improved Data Management Practices
The vast majority of backup methodologies use a model whereby frequent full backups of all data are taken. While it is often standard procedure to configure the backup application for nightly incremental backups on six consecutive days, followed by a weekly full backup (usually on a weekend day), many organizations opt for a full nightly backup of email and critical databases, as that practice can minimize the amount of restore processing that needs to occur. While this approach has worked well in the past, many organizations now find that they cannot contain backup activity within the available time (the backup window). Newer backup approaches can help address this, but many backup implementations still rely on the "full plus incremental" concept.
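To show the difference these schedules make, a back-of-the-envelope calculation follows. The figures (10TB of protected data, a 5% daily change rate) are hypothetical and serve only to illustrate the arithmetic.

```python
# Back-of-the-envelope comparison of weekly backup volume (illustrative figures only).
# Assumptions: 10 TB of protected data and a 5% daily incremental change rate.

protected_tb = 10.0       # primary data under protection (TB)
daily_change_rate = 0.05  # share of data changed each day

# Model A: a full backup every night
full_nightly_tb = protected_tb * 7

# Model B: one weekly full plus six nightly incrementals
full_plus_incremental_tb = protected_tb + (protected_tb * daily_change_rate * 6)

print(f"Full nightly:               {full_nightly_tb:.1f} TB moved per week")
print(f"Weekly full + incrementals: {full_plus_incremental_tb:.1f} TB moved per week")
```

Under these assumptions, the full-nightly model moves roughly five times as much data per week, which is why it strains the backup window even though it simplifies restores.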
Removing files, email and application data from primary storage can drastically reduce the amount of processing required during each full backup; this is known as reducing the "working store." Organizations could perform a garbage collection process using a storage resource management (SRM) tool. However, most use data identification tools in an archive solution, or emerging stand-alone file analysis solutions, to identify archive candidates, or they outright delete unneeded and duplicate data.
Implementing an archive solution that moves data to lower-cost storage devices can reduce the backup window. Archiving also can:
- Provide faster restore times for a complete recovery of all backed-up data (since there is less data to bring back)
- Reduce the cost of storing and potentially transporting the backup media off-site
- Decrease backup retention periods
Gartner typically recommends a 90-day backup retention (see "Modify Your Storage Backup Plan to Improve Data Management and Reduce Cost"). This results in lower e-discovery exposure during litigation, and can help contain the significant labor involved in scanning backup tapes for required legal materials.
Action Items:
- Implement an archiving solution as part of an overall information governance strategy to improve backup and restore times, and reduce primary and backup storage costs.
- Evaluate SRM and especially file analysis tools as a way to implement a "defensible deletion" policy in your organization.
- If possible, reduce backup retention to 90 days to reduce costs, accelerate overall backup or recovery processing, and reduce e-discovery exposures.
3. Evaluate Newer Backup Pricing Models
Backup products are traditionally priced on a per-server basis, with add-on costs for advanced features and newer capabilities commanding a premium price. Over time, most vendors continue to collapse the number of separately charged items into their base products, or add features into an expanded, extended or enterprise offering. As a result, current deduplication charges should be expected to decline, a trend reinforced by competitive pressures.
Nearly every backup vendor has introduced capacity-based pricing. For organizations that deploy server virtualization (which was sometimes a separately charged backup option), many application agents or advanced disk features, the capacity-based bundle can represent a more attractive overall initial cost. For organizations that have fewer servers and a larger amount of data, capacity-based pricing may cost more than traditional server-based licensing; this is especially true when a single server acts as a proxy for many terabytes of network-attached storage (NAS) backed up via the Network Data Management Protocol (NDMP), or when few advanced features have been implemented.
For larger enterprises, the maintenance costs of the typical three-year backup software purchase and renewal agreements loom as a greater concern than the initial acquisition cost, as these represent a future spending commitment. This future-oriented concern is amplified by capacity-based models since the growth rate of data is faster than the growth rates of physical and virtual servers. Vendors can differ as to where the capacity is measured — on the "front end" for the data being backed up, or the "back end," measuring the amount of data that the backup solution generates, which is typically after compression and deduplication.
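The trade-off between the two models can be sanity-checked with simple arithmetic before negotiations. The sketch below is illustrative only; the server count, capacity and unit prices are hypothetical placeholders, not actual vendor list prices.

```python
# Illustrative comparison of per-server vs. capacity-based licensing.
# All figures are hypothetical placeholders, not actual vendor list prices.

servers = 40                  # physical and virtual servers to protect
front_end_tb = 120.0          # front-end (source) capacity to protect, in TB
per_server_license = 1_500    # cost per protected server
agent_addons = 20 * 800       # separately charged application agents and options
per_tb_license = 900          # capacity-based price per front-end TB

server_based_cost = servers * per_server_license + agent_addons
capacity_based_cost = front_end_tb * per_tb_license

print(f"Server-based licensing:   ${server_based_cost:,.0f}")
print(f"Capacity-based licensing: ${capacity_based_cost:,.0f}")
# Many servers with many add-ons tend to favor the capacity bundle; a few servers
# fronting large NAS/NDMP volumes usually favor traditional per-server licensing.
```

Also confirm whether the vendor measures capacity on the front end (data being protected) or the back end (data written after compression and deduplication), as that choice changes the result materially.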
Clients tell us they are concerned about capacity-based pricing, noting that some leading vendors are expensive when deployed at scale. In July 2013, Asigra announced a new pricing model that has a smaller base charge, but charges for successful recoveries performed.
Gartner expects that the industry will begin to pressure backup software vendors for lower costs at larger backup volumes, to avoid a situation similar to that of backup appliances, which have come to be viewed as cost-prohibitive when broadly deployed (see "Storage Appliances May Transcend IT Silos and Incur High Cost at Scale").
Action Items:
- If you are under a maintenance agreement, investigate whether you are entitled to free upgrades. Consider newer versions when the vendor bundles previously charged-for features.
- When evaluating new backup solutions or extending a maintenance agreement with your current backup vendor, look for new pricing and packaging plans, such as capacity-based licensing, product versions with a collapsed parts list that include additional features in the base product, and current and upcoming bundles that combine features at a lower overall cost.
- Before negotiating with vendors, understand which features carry additional charges and which capabilities are included at no additional expense in each product on your shortlist, so that you can calculate an accurate TCO and use it in pricing negotiations.
4. Fully Implement Data Reduction
The value of data reduction technologies, such as deduplication, cannot be overstated. Deduplication improves the economics of disk-based backup and recovery approaches by reducing data — resulting in significantly lower disk requirements and cost, and providing more-efficient and faster replication of backup data to off-site locations. Gartner believes that data reduction, such as compression and deduplication, is a "must have" capability for backup solutions.
The benefits of deduplication are in resource savings. Potential savings can occur on many levels. The primary benefit is in substantially decreasing the amount of disk space required to store a given amount of data. Gartner clients typically report deduplication ratios of 7-to-1 to 20-to-1. Actual ratios vary depending on the amount of data redundancy, the type of data (encrypted and previously compressed files often do not further compress or deduplicate with backup deduplication), the data change rate, and the backup methodology (for example, full, full plus incremental, full plus differential or incremental forever).
The more often that full backups are conducted, the higher the deduplication ratio will be. Depending on the deduplication implementation, there can also be bandwidth savings in the amount of data transferred over the network. Because deduplicated data consumes less physical capacity, deduplication can also decrease power and cooling needs, the physical footprint of storage devices, and acquisition costs. The benefits of data reduction increase as more data is processed and stored, and as a larger history becomes available for comparison and reduction.
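A simple worked example shows how the deduplication ratio translates into physical disk requirements; the 500TB of logical backup data is a hypothetical figure.

```python
# Illustrative effect of deduplication on back-end disk requirements.
# Figures are hypothetical; actual ratios depend on data type, change rate
# and backup methodology (full, full plus incremental, incremental forever).

logical_backup_tb = 500.0          # total backup data written over the retention period

for dedupe_ratio in (7, 12, 20):   # within the range Gartner clients typically report
    physical_tb = logical_backup_tb / dedupe_ratio
    saving_pct = 100 * (1 - 1 / dedupe_ratio)
    print(f"{dedupe_ratio:>2}:1 ratio -> {physical_tb:6.1f} TB of physical disk "
          f"({saving_pct:.0f}% capacity saving)")
```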
Action Items:
- When evaluating backup software or disk-based hardware solutions, consider data reduction (such as compression and data deduplication) a must-have feature and an essential part of assessment criteria. Many backup vendors have released new or expanded deduplication features. Understand the latest capabilities so as not to be incorrectly swayed by a vendor's positioning of the competitive capabilities of other solutions.
- Re-evaluate the applicability of deduplication to more workloads. The use of solid-state drives (SSDs) for portions of the backup infrastructure (especially for the deduplication index), or of more-refined algorithms that split processing requirements across backup clients and the media or master servers, has made deduplication appropriate for more data types.
- Deduplication, in particular, offers the potential for many cost savings (decreased disk, bandwidth, power and cooling requirements). However, ensure that any premium paid for the capability does not offset the economic savings. Even if deduplication costs more than a nondeduplicated solution, consider the performance and operational benefits.
5. Implement Unified Recovery Management (Snapshot and Replication Support)
The environment that needs to be protected is expanding. There are not only newer types of applications and an increasing amount of virtualization deployed in the data center, but also new workloads, such as test and development, remote office and branch office (ROBO), and endpoints (laptops, tablets and smartphones), generating data that needs to be protected. At the same time, hypervisor solutions, and snapshot and replication solutions based on servers, storage arrays and networks, are becoming pervasive at attractive cost points. Leading enterprise backup vendors are responding by adding capabilities to protect more data center and non-data-center workloads at the file, application, virtual machine and volume levels.
This has led to the notion of a single administrative console, catalog and reporting engine for all data capture mechanisms. This can enable application and virtualization administrators to perform backups and restores by utilizing the primary backup application. Backup vendors are integrating with storage systems to catalog snapshots (such as, in order of the breadth of support, CommVault, HP, IBM, EMC, Symantec, Asigra and Veeam), and some products offer integration with replication engines (see "The Future of Backup May Not Be Backup").
Traditional backup products will eventually transform into recovery management solutions that may not own all of the data capture and data transfer techniques. In addition to traditional backup and recovery (application-, file- and image-based, and so on), stronger support for server-based replication, storage-array-based replication, and intelligent-switch-based or network-based replication will become more important.
The notion of copy data management — which reduces the proliferation of secondary copies of data for backup, disaster recovery, testing and reporting — is becoming increasingly important to contain costs and to improve infrastructure agility.
There will also be a "manager of managers," a common and established concept in the networking and system management domains. A hierarchy of federated management tools will feed into one another, becoming a unified recovery manager. This allows for simplified implementation of several tiers and service levels, offering centralized monitoring, reporting and control.
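To make the manager-of-managers idea concrete, the following is a minimal sketch of how a unified recovery manager might federate the catalogs of separate backup, snapshot and replication engines behind one interface. The class and method names are hypothetical illustrations, not any vendor's actual API.

```python
# Hypothetical sketch of a "manager of managers" for recovery: a thin layer that
# federates catalogs from backup, snapshot and replication engines behind one
# interface. Names are illustrative, not any vendor's API.
from dataclasses import dataclass
from typing import List


@dataclass
class RecoveryPoint:
    workload: str     # e.g., "exchange-01", "vm-web-12"
    source: str       # "backup", "array-snapshot", "replication"
    timestamp: str    # ISO 8601 point in time
    location: str     # media server, array, or replica target


class DataMover:
    """Interface each underlying engine (backup, snapshot, replica) implements."""
    def list_recovery_points(self, workload: str) -> List[RecoveryPoint]:
        raise NotImplementedError

    def restore(self, point: RecoveryPoint, target: str) -> None:
        raise NotImplementedError


class UnifiedRecoveryManager:
    """Single catalog, reporting and restore entry point across all movers."""
    def __init__(self, movers: List[DataMover]):
        self.movers = movers

    def catalog(self, workload: str) -> List[RecoveryPoint]:
        points: List[RecoveryPoint] = []
        for mover in self.movers:
            points.extend(mover.list_recovery_points(workload))
        # Present one consolidated, time-ordered view regardless of capture method.
        return sorted(points, key=lambda p: p.timestamp, reverse=True)

    def restore_latest(self, workload: str, target: str) -> None:
        points = self.catalog(workload)
        if points:
            # Route the restore to whichever engine owns the newest copy.
            for mover in self.movers:
                if points[0] in mover.list_recovery_points(workload):
                    mover.restore(points[0], target)
                    break
```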
Action Item:
- Before making additional investments in backup software, push incumbent and prospective recovery vendors for current and committed road maps for their manager-of-managers support — especially snapshot and replication management and integration, copy data management, and single-console capabilities.
6. Implement Tiered Recovery
Directionally, disk usage, data deduplication, replication for electronically vaulting off-site copies of data, and snapshots for taking more-frequent copies of data are all on the rise. Yet the same tools, technologies and backup techniques from decades ago are also typically still implemented. This expanded menu of options, techniques, cost structures and service levels has changed the way that organizations deliver backup services.
In the past, backup was very much a "one size fits all" endeavor. Often, the only major backup decisions being made were whether something should be backed up and, if so, how long it should be retained. Today, new techniques and technologies have led to an expanded menu of choices, and one current or emerging recovery approach does not always win out over another. Administrators will have more flexibility, including differentiated levels of cost and service, in providing recovery solutions. Just as the concept of tiered storage provides a variety of cost and performance levels for storage capacity, tiered recovery provides differentiated levels of backup and recovery services. Unlike tiered storage, the tiered recovery model may be additive, with an organization using multiple techniques together to achieve the needed overall level of data availability and recovery characteristics, and to ensure that business risk and continuity requirements are met.
To implement tiered recovery, organizations should conduct a business impact assessment (BIA) to categorize the criticality of the IT services. Any recovery architecture must include an understanding of the IT services supporting business processes and associated service levels. Service levels affect the capabilities, cost, architecture and complexity of the backup solution. Gartner recommends performing a BIA for backup data to contain cost and to deliver the most appropriate service levels. Most organizations specify three to five tiers of criticality, with the highest tier having the most stringent service levels.
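As a starting point, the tiers and their service levels can be captured in a simple structure that maps BIA criticality to RTO, RPO, retention and off-site copies. The tier names and values below are examples only; derive your own from the BIA results.

```python
# Illustrative tier definitions derived from a business impact assessment (BIA).
# Tier names and service levels are examples only, not recommended values.

recovery_tiers = {
    "Tier 1 (mission-critical)":  {"rto_hours": 1,  "rpo_minutes": 15,   "retention_days": 90, "offsite_copies": 2},
    "Tier 2 (business-critical)": {"rto_hours": 8,  "rpo_minutes": 240,  "retention_days": 90, "offsite_copies": 1},
    "Tier 3 (standard)":          {"rto_hours": 72, "rpo_minutes": 1440, "retention_days": 90, "offsite_copies": 1},
}

def tier_for(system: str, bia_criticality: int) -> str:
    """Map a BIA criticality score (1 = highest) to a recovery tier."""
    names = list(recovery_tiers)
    return names[min(bia_criticality, len(names)) - 1]

print(tier_for("payroll-db", bia_criticality=1))  # -> Tier 1 (mission-critical)
```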
Action Items:
- Implement tiered recovery to optimize the balance between cost and recovery service levels, recognizing that storage device snapshot and recovery techniques are likely to be leveraged to reduce backup windows and improve restore times.
- Conduct a BIA, and review it annually to determine the criticality of your business systems and their data. Implement tiered recovery by using the BIA results, and devise three to five tiers. Associate recovery service levels to each tier, including recovery time objective (RTO), recovery point objective (RPO), retention and off-site copies.
7. Perform Regular Restore Testing
Backups may be unrecoverable for many reasons. Some of the more common issues are server configuration and application deployment updates, user or operator error in the backup process, and hardware and software failures. In most organizations, backups are initially set up and then automatically run. Backup verification tends to be only a review of the backup logs, with a quick scan for failures and error messages.
This process may be acceptable for determining whether data was successfully written to the backup media. However, it does not indicate whether the data is recoverable, validate that the data is logically consistent and usable by the application, or confirm that the right information was backed up. Some businesses have deployed backup reporting tools to better understand how backups trend over time, and to gain more visibility into backup successes and failures.
Actual recovery is the only way that a data center can be certain that data is fully recoverable. Backup or restore testing has become a dying practice in most data centers, and as a result, organizations could be far less resilient than they believe they are.
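Where restore testing is automated, it can be as simple as restoring a sample file set to a staging area and comparing checksums against the source. The sketch below assumes a hypothetical "restore-tool" command; substitute your backup product's CLI or API, and note that comparing against the live source is valid only for files that have not changed since the backup was taken.

```python
# Minimal sketch of a periodic restore test: restore a sample file set to a
# staging area and compare checksums against the source. The restore command
# is a placeholder; substitute your backup product's CLI or API call.
import hashlib
import subprocess
from pathlib import Path
from typing import List


def sha256(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def test_restore(sample_files: List[Path], staging_dir: Path) -> bool:
    ok = True
    for src in sample_files:
        target = staging_dir / src.name
        # Placeholder restore step; replace with your product's restore command.
        subprocess.run(["restore-tool", "--file", str(src), "--to", str(target)], check=True)
        # Valid only if the source file is static since the last backup.
        if sha256(src) != sha256(target):
            print(f"MISMATCH: {src}")
            ok = False
    return ok
```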
Action Item:
- Perform data recovery testing at least once a year on a subset of data to ensure that the backup strategy can effectively meet the stated protection SLAs. More-frequent testing of the most mission-critical data may be warranted.
8. Ensure That the Cloud Backup Has a Local Copy
Public cloud options are increasingly being considered for server workloads, especially for remote-office and departmental computing environments. The best practices discussed above apply equally well to a cloud backup deployment. While most organizations cite security as their top cloud concern, the greater issue is often latency, as data encryption and key management are well-established methods for protecting off-site backup data. This means that keeping an on-premises, local copy of the data, or at least the most recent backup of the most critical data, is a best practice. Thus, a disk-to-disk-to-cloud (D2D2C) model is emerging (for a deeper look into best practices for cloud backup, see "Is Cloud Backup Right for Your Servers?").
Action Item:
- Ensure that all servers with a restore data payload of 50GB or more, and an RTO of one day or less, have local disk copies of the backup data for first-level operational restore requirements (see the illustrative sketch below). The local copy would be used only for protection against logical errors, such as accidental deletion and data corruption, and a limited number of physical threats; the cloud copy of the data would be used for larger disaster recovery remediation.
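The following sketch applies the 50GB/one-day-RTO rule to a server inventory to decide which servers warrant a local disk copy in addition to the cloud copy; the inventory values are hypothetical.

```python
# Illustrative check of the local-copy rule above: servers with at least 50 GB of
# restore payload and an RTO of one day or less keep a local disk copy (D2D2C).
# The server inventory values are hypothetical.

servers = [
    {"name": "fileserver-01", "restore_payload_gb": 400, "rto_hours": 8},
    {"name": "robo-print-02", "restore_payload_gb": 20,  "rto_hours": 72},
    {"name": "erp-db-01",     "restore_payload_gb": 900, "rto_hours": 4},
]

def needs_local_copy(server: dict) -> bool:
    return server["restore_payload_gb"] >= 50 and server["rto_hours"] <= 24

for s in servers:
    plan = "local disk copy + cloud copy (D2D2C)" if needs_local_copy(s) else "cloud copy only"
    print(f"{s['name']}: {plan}")
```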