
There’s no question that Active Directory is a mission-critical part of your infrastructure. Because it enables authentication for services like e-mail, collaboration, databases, applications, file sharing, and more, it’s crucial that Active Directory remain online and operational. That means Active Directory backups are always going to be an important part of your life. Yet many organizations continue to rely on traditional backup approaches that, when closely examined, actually do very little to meet the most common business needs for restoration and recovery. As a result, restore and recovery operations take longer, require more manual effort and administrative overhead, and generally create a drag on the organization’s efficiency and effectiveness.
In this paper, we’ll explore some of the tradeoffs that traditional backups have typically required organizations to make. We’ll also look at the underlying business needs for a variety of scenarios, so that you can consider other, more modern backup approaches with the goal of finding one that more precisely fits your business needs.
First, you should always keep in mind that any kind of backup is essentially an insurance policy. You figure out how much risk you’re willing to accept, and you buy a policy to cover the rest. That policy costs money, and you typically hope you never have to make a claim under it, even though you pay the premiums on time, every time.
With that in mind, consider that an AD backup plan must always have four components: attribute-level, object-level, domain-level, and forest-level recovery.
These components are listed in order of decreasing granularity, and in most organizations this list also corresponds to the frequency with which each component is needed. In other words, attribute-level backups provide very granular recovery capability, and in many organizations represent the most commonly performed kind of recovery. A forest-level backup is the least granular, involving as it does the entire forest, and for most organizations it’s the rarest form of recovery. As you assemble your backup plan, you’ll have to determine the point on this list where you won’t provide a specific recovery capability.
For example, forest recovery is actually a very complex and detailed operation; Microsoft product support will only support recovered forests if they walk you through the recovery. That doesn’t mean that you can’t use tools to speed up the steps they’ll ask you to take, but it does mean that forest-level recoveries are a major undertaking. Some organizations decide that a forest-level failure is a serious enough risk, offering significant enough damage, that they want to have whatever tools they can get to speed that process up and reduce the impact as much as possible. Other organizations decide that the domain-level failure scenario is the most serious they’ll plan to deal with, and if a forest-level failure does occur, they choose to accept the risk and impact rather than making a plan to mitigate it.
The point is that it doesn’t matter what kind of failure scenarios you choose to deal with and provide mitigation for, so long as you’re deciding, based on your organization’s specific needs, to take a particular path.
Many organizations rely on point-in-time backups of a domain controller to meet their primary recovery needs. This approach has some significant disadvantages, which often aren’t apparent unless the organization makes an effort to track the amount of time and manual labor spent using those backups in actual recovery scenarios.
A full backup of the domain is useful primarily for recovering the entire domain – a scenario that, thanks to Active Directory’s distributed, multi-master nature, is pretty unusual for an organization to encounter. In other words, because each domain controller acts as a backup for all of the others, it’s reasonably rare that you need a full domain backup to recover the entire domain. If a single domain controller fails, you repair it and let it re-replicate data from its partner domain controllers.
Instead, full backups are most often used to perform more granular recovery: object-level and even attribute-level recovery. A full backup can absolutely be used for object-level recovery. In practice, however, attribute-level recovery using a full backup typically means restoring the entire object, not just one or two attributes. Full backups can also be difficult to use. For example, using Windows’ native backup utility to perform a recovery entails taking a domain controller completely offline and into recovery mode, where administrators perform a somewhat complex command-line-driven authoritative restore. It’s time-consuming, to say the least, and requires a good amount of specialized expertise. It also, obviously, involves taking a domain controller offline.
Many third-party backup solutions can restore individual objects from a full backup without taking a domain controller offline, which is at least a performance improvement. However, these traditional backups still have a major weak point that most organizations tend to accept only because they haven’t thought of a better approach. That weakness? Point in time. Traditional backups are made on a regular schedule, and any data that changes between backups is “at risk,” meaning that if the data is lost, it can’t be recovered because it hasn’t yet been backed up. New user accounts, changes to user passwords, changes to group memberships, and so forth can all be lost if a failure occurs between backups. Your most granular point of recovery is the most recent backup, which in many organizations puts up to 24 hours of data at risk.
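To make the point-in-time weakness concrete, the following minimal Python sketch counts the changes that fall between the last scheduled backup and a failure. The nightly 01:00 schedule, the timestamps, and the function name `changes_at_risk` are all illustrative assumptions, not values from any real environment.

```python
from datetime import datetime, timedelta


def changes_at_risk(change_times, last_backup, failure_time):
    """Return the changes made after the last backup, up to the failure."""
    return [t for t in change_times if last_backup < t <= failure_time]


# Hypothetical schedule: one backup per night at 01:00.
last_backup = datetime(2024, 1, 10, 1, 0)
failure_time = datetime(2024, 1, 10, 16, 30)

# Hypothetical change timestamps (new accounts, password resets, group edits).
change_times = [
    datetime(2024, 1, 9, 22, 15),   # captured by the 01:00 backup
    datetime(2024, 1, 10, 9, 5),    # made after the backup: at risk
    datetime(2024, 1, 10, 14, 40),  # made after the backup: at risk
]

lost = changes_at_risk(change_times, last_backup, failure_time)
exposure = failure_time - last_backup
print(f"{len(lost)} change(s) unrecoverable; exposure window {exposure}")
```

With a once-a-day schedule the exposure window can approach a full 24 hours. Shortening the interval between backups shrinks the window but never closes it.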
Organizations should demand more granularity and less at-risk data:
Introduced in Windows Server 2008 R2, the “Active Directory Recycle Bin” offers a very minimal level of native support for object-level restores. This feature comes with a number of caveats, and it’s important to understand what this feature does and does not do before your organization chooses to rely upon it.
Once enabled in a Windows Server 2008 R2-level domain (meaning all domain controllers run that or a later version of Windows, and both the domain’s and forest’s functional levels have been formally upgraded), the “Recycle Bin” stores a copy of all deleted objects. Using a series of command-line utilities in Windows PowerShell, objects can then be copied back from the “Recycle Bin” and into the production directory (despite the name, there is no graphical Recycle Bin element to work with).
This feature cannot be used for attribute-level restoration, because it requires that the entire object be deleted in order for the object to get moved to the “Recycle Bin.” For obvious reasons, this feature is also useless for domain-level restoration, since it depends upon the domain itself being functional.
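The limitation can be illustrated with a toy model. The Python sketch below is not how Active Directory is implemented. It is a deliberately simplified, hypothetical `ToyDirectory` class in which only deletions are retained, which is the one behavior the paragraph above describes. The distinguished names and attribute values are made up for illustration.

```python
class ToyDirectory:
    """Toy model: deleted objects are retained, overwritten values are not."""

    def __init__(self):
        self.live = {}         # distinguished name -> attribute dict
        self.recycle_bin = {}  # distinguished name -> attributes at deletion

    def add(self, dn, **attrs):
        self.live[dn] = dict(attrs)

    def modify(self, dn, **attrs):
        # Previous values are overwritten; nothing is copied to the bin.
        self.live[dn].update(attrs)

    def delete(self, dn):
        # Only a whole-object deletion places anything in the bin.
        self.recycle_bin[dn] = self.live.pop(dn)

    def restore_deleted(self, dn):
        self.live[dn] = self.recycle_bin.pop(dn)


jane = "CN=Jane Doe,OU=Staff,DC=example,DC=com"
temp = "CN=Temp,OU=Staff,DC=example,DC=com"

d = ToyDirectory()
d.add(jane, title="Analyst")
d.add(temp, title="Contractor")

# Object-level case: the deleted object can be brought back.
d.delete(temp)
d.restore_deleted(temp)

# Attribute-level case: the old value is gone and the bin holds nothing.
d.modify(jane, title="Wrong Title")
print(jane in d.recycle_bin)
```

In this model the unwanted title change leaves no trace in the bin, so there is nothing to restore from. That is the gap an attribute-level recovery capability has to fill.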
One reason that traditional backups can be difficult to use, especially for common attribute-level and object-level restoration, is that administrators typically have to use several different tools to complete the entire process.
For example, an administrator must first discover the need for a restoration. This may come as an alert from an auditing solution, where administrators receive an e-mail when changes are made to a highly-sensitive group such as the Enterprise Admins group. A restoration request might also originate from the IT department’s help desk, such as when a user calls about a specific problem. The need for a restoration might also come up when an administrator or auditor is checking the Active Directory change log, and notices an improper or unwanted change.
From there, the administrator must switch to their recovery tool, which may also involve pulling a recent backup file from magnetic tape. The administrator must, within the tool, locate the desired object or attribute, and then initiate the restoration.
Organizations tend to accept this general workflow, as it’s been the one in use for computer backup and recovery since time immemorial. But in an Active Directory environment in particular, it’s a cumbersome process that doesn’t need to exist. Most organizations already conduct change auditing for Active Directory; why can’t a change log entry – which is often the starting point for discovering that a recovery will be necessary – offer a way to immediately recover from, or roll back, that particular change?
Even third-party tools that offer both backup and change auditing capabilities often forgo this level of integration, requiring administrators to see the change in one area of the tool, and then switch to a different area of the tool to perform the recovery or roll back. If a product were to tightly integrate those two pieces of functionality into a single tool, the overall workflow would not only be much more efficient, but also less open to human error.
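One way to picture that integration is a change log whose entries carry enough information to undo themselves. The Python sketch below is a minimal illustration under that assumption. `ChangeEntry`, `apply_change`, and `roll_back` are hypothetical names, and the dictionary stands in for a directory. It does not describe any particular product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeEntry:
    dn: str
    attribute: str
    old_value: object  # None means the attribute did not exist before
    new_value: object
    changed_by: str


def apply_change(directory, log, dn, attribute, new_value, changed_by):
    """Apply one attribute change and record it, including the prior value."""
    old_value = directory[dn].get(attribute)
    if new_value is None:
        directory[dn].pop(attribute, None)
    else:
        directory[dn][attribute] = new_value
    entry = ChangeEntry(dn, attribute, old_value, new_value, changed_by)
    log.append(entry)
    return entry


def roll_back(directory, log, entry, operator):
    """Undo a logged change; the undo itself is logged as a new change."""
    return apply_change(directory, log, entry.dn, entry.attribute,
                        entry.old_value, operator)


ea = "CN=Enterprise Admins,CN=Users,DC=example,DC=com"
directory = {ea: {"member": frozenset({"alice"})}}
log = []

# An unwanted membership change shows up in the log...
bad = apply_change(directory, log, ea, "member",
                   frozenset({"alice", "mallory"}), "mallory")

# ...and is reversed directly from that same log entry.
roll_back(directory, log, bad, operator="admin")
print(sorted(directory[ea]["member"]))
```

Because the entry stores the previous value, the administrator never has to leave the audit view or locate the object in a separate recovery tool. The rollback is also audited like any other change.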
Organizations can help drive a more efficient workflow by identifying that as a requirement. For example:
Although the terms are often used somewhat interchangeably, there is a distinct difference between recovery and rollback (or roll back) in Active Directory. With a recovery, you’re generally bringing back an entire object that was removed from the directory; with a roll back, you’re typically restoring attribute values to a previous state, without having to copy the entire object back into the directory (because it already exists, just not in the desired state).
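The distinction can be expressed as a simple decision. The Python sketch below assumes a hypothetical earlier snapshot and a live directory, both modeled as plain dictionaries. The function name `plan_restore` and the sample data are illustrative only.

```python
def plan_restore(live, snapshot, dn):
    """Decide whether bringing dn back to its snapshot state is a
    recovery (object missing) or a rollback (object present but changed)."""
    wanted = snapshot[dn]
    if dn not in live:
        # The object is gone: the whole object must be brought back.
        return "recovery", dict(wanted)
    current = live[dn]
    # The object exists: restore only the attributes that differ.
    # (Attributes added since the snapshot are ignored in this sketch.)
    diff = {k: v for k, v in wanted.items() if current.get(k) != v}
    return ("rollback", diff) if diff else ("none", {})


jane = "CN=Jane Doe,OU=Staff,DC=example,DC=com"
temp = "CN=Temp,OU=Staff,DC=example,DC=com"

snapshot = {
    jane: {"title": "Analyst", "department": "Finance"},
    temp: {"title": "Contractor"},
}
live = {
    jane: {"title": "Wrong Title", "department": "Finance"},
    # temp has been deleted from the live directory
}

print(plan_restore(live, snapshot, temp))
print(plan_restore(live, snapshot, jane))
```

For the deleted object the plan contains every attribute. For the modified object it contains only the single changed attribute, which is what makes a rollback less invasive than a full object recovery.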