Auditing is one of the most complex and contentious subjects in the world of Windows network management. Organizations are demanding more and more in terms of auditing requirements, while technical professionals are continuously running up against limitations that seem to require endless compromises. In this paper, we’ll look at some of those realities, and examine why Microsoft seems comfortable, for the most part, letting them stand.

 

Computing Involves Compromise

Like any other form of work, computing involves compromise. You want to haul freight for your customers, but you only have a Ford Ranger small pickup truck? Well, you’re going to have to make some compromises: Haul less freight or buy a bigger truck. Computers are the same way. If you’re going to ask a computer to do more work, then it’s going to hit its maximum capacity sooner, and you’re going to find yourself either performing less work or buying more (or bigger) servers. The frustration with auditing is often due to the fact that it is a significant workload that can’t just be done “in the background,” yet it isn’t production workload. It’s overhead, and it can definitely be frustrating to have to engineer additional capacity to handle what’s seen as non-value-added work.

Let’s dig into that a little. Maybe there’s a better answer, or at least a more complete one.

 

Auditing is Hard Work

There’s no question that security event auditing in Windows is an intensive task. If you crank up auditing to full-blast, you’re asking Windows to write a log message every time someone successfully authenticates. Or touches a file, whether for reading or writing. Or reads attributes of a directory object. Or any of a hundred other tasks that your users are performing hundreds of times a day. A fully-audited file server on a busy network can generate thousands of security audit events per minute. Sustained across an average workday, that can add up to literally millions of events. That’s a lot of work, and it’s no surprise that organizations that attempt it often discover that Windows can’t keep up with both the auditing and the workload that triggered the auditing. In other words, a fully-audited, busy file server becomes less of a file server and more of an auditing server. That means the file server workload has to be shifted somewhere else, or you’re going to have to find a different solution.
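To put rough numbers on that, here is a back-of-the-envelope sketch. The per-minute rates, the eight-hour workday, and the five-day week are illustrative assumptions rather than measurements from any real server.

```python
# Back-of-the-envelope estimate of audit event volume.
# The rates, the 8-hour workday, and the 5-day week are illustrative
# assumptions, not measurements from any particular server.

WORKDAY_MINUTES = 8 * 60   # assumed 8-hour workday
WORKDAYS_PER_WEEK = 5      # assumed 5-day week


def events_per_workday(events_per_minute: int) -> int:
    """Return the number of events generated over one assumed workday."""
    return events_per_minute * WORKDAY_MINUTES


def events_per_week(events_per_minute: int) -> int:
    """Return the number of events generated over one assumed work week."""
    return events_per_workday(events_per_minute) * WORKDAYS_PER_WEEK


for rate in (1_000, 2_500, 5_000):
    daily = events_per_workday(rate)
    weekly = events_per_week(rate)
    print(f"{rate:>5} events/min -> {daily:>9,} per workday, {weekly:>10,} per week")
```

Under these assumptions, a server sustaining a few thousand events per minute passes a million events per workday, and a single server can reach eight figures in a week.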

 

Auditing Creates Work, Too

Let’s say you buy yourself the biggest Windows server machine ever seen. Gigabytes of RAM. Terabytes of storage. Multicore processors crammed into every cubic inch. This thing can keep up with whatever you throw at it. You crank up the native auditing functions to full-blast, finally collecting all of the information your organization’s security plan requires you to.

Now you’re sitting on tens of millions of audit messages per week – per server. Sorry, were you actually planning to use any of that data? Because each server, when it comes to auditing, is literally a world unto itself. Your first task is going to be consolidating all of that information into a secure repository where it can be searched, filtered, collated, and reported on. Yes, newer versions of Windows offer event forwarding, which would seem to solve at least the problem of consolidation. But if generating those millions of log events created a lot of workload, what do you think forwarding them is going to do? It just gets out of hand too quickly. 

So what do most organizations end up doing?

 

Native Windows Auditing: An Exercise in Compromise

Companies everywhere are simply compromising. At a “birds of a feather” discussion on Windows security auditing at Microsoft’s TechEd 2011 event, attendees made it clear that fully enabling auditing in their organizations wasn’t an option. IT specialists, auditors, and management had to sit down and essentially assign a cost to every piece of information that could be captured, so that they could start to make decisions on which events they’d capture, and which they’d simply have to miss out on.

Some attendees’ organizations had spent a decade or more developing custom auditing tools to try and overcome these limits. Imagine that: Undertaking a massive, decade-long software development project just to capture auditing information was probably not in the original business plan, but that’s what organizations often feel they’re driven to.

One attendee said it best: “If you’re using the native event logs, you simply can’t expect to capture every single activity that takes place on a busy server. It just isn’t practical, and even if you did, you’d have so much data you wouldn’t be able to manage it.”

Even if you decide to go with native security auditing, simply setting it up and enforcing that setup (ensuring that auditing is actually happening) can be difficult. Active Directory and the file system have different auditing settings, each of which must be consistently configured and enforced. Even with Group Policy, those settings can be tedious to configure initially and difficult to actually enforce. Gaining central, non-overridable control over auditing settings is a crucial requirement.

It’s painful when technical limitations force us to do with less than we know our organization needs and deserves, but that’s the reality of the situation. If you’re relying solely on Windows’ native event logs, you’re going to have to be comfortable with gaps in your audit log, because you’re not going to be able to catch it all.

 

Actually Using That Log Data

Let’s suppose that you sit down and make the hard choices, and compromise on your auditing plan. You accept that it’ll have some gaps, but perhaps you’re able to come up with a set of events that meet your major business needs without making the server melt into a puddle of goo. Now you just have to utilize that data.

This is where it’s critical to identify what you actually need those audit logs for, because Windows’ native tools are fairly basic. They weren’t really designed for modern security auditing requirements in the age of Sarbanes-Oxley, GLB, HIPAA, PCI DSS, and so on; the native Event Viewer was primarily intended for use by technical professionals as a troubleshooting tool. Nobody really ever envisioned the tool having to deal with millions of records, and so it offers fairly primitive capabilities for filtering and searching, and it offers zero capability for generating reports of any kind. Looking for a report that shows all access to a specific set of files over a given period of time? Good luck with that.

The event log entries are, unfortunately, in a format that’s decidedly unfriendly to many business-level needs. For example, assuming you’ve somehow consolidated all of the event logs into a single log, you could use the native tools to pull up all of the events that relate to an Active Directory object attribute change. There’s a specific event ID for that, so you’d simply filter on that and wait while the computer churned through the log (which is a flat file, not a relational database, so it’ll take a while) to find the matching entries. What you can’t easily do is narrow it down any further, to find (for example) entries where a particular administrator changed a particular attribute on a particular user account. That level of detailed filtering is going to require you to somehow get the log data into a normalized database of some kind, and you’ll have to come up with a way of breaking the individual bits of data in each event out into separate database fields.
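As a rough illustration of what “normalized” buys you, the sketch below loads a few attribute-change records into an in-memory SQLite database and runs exactly that kind of narrow query. It assumes the events have already been extracted from the log and split into fields. The table layout, account names, and timestamps are all hypothetical.

```python
import sqlite3

# Hypothetical, already-parsed attribute-change events. The field names and
# values are invented for illustration; real events would first have to be
# extracted from the consolidated log and split into these fields.
events = [
    ("2011-05-16T09:14:00", "COMPANY\\admin1", "jsmith", "department"),
    ("2011-05-16T09:20:00", "COMPANY\\admin2", "jsmith", "title"),
    ("2011-05-16T10:02:00", "COMPANY\\admin1", "mjones", "department"),
    ("2011-05-17T08:45:00", "COMPANY\\admin1", "jsmith", "department"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE attribute_changes (
        occurred_at TEXT NOT NULL,
        changed_by  TEXT NOT NULL,
        target_user TEXT NOT NULL,
        attribute   TEXT NOT NULL
    )
    """
)
conn.executemany("INSERT INTO attribute_changes VALUES (?, ?, ?, ?)", events)

# An index on the columns we filter by avoids scanning every row.
conn.execute(
    "CREATE INDEX idx_changes "
    "ON attribute_changes (changed_by, target_user, attribute)"
)


def changes_by(conn, admin, target, attribute):
    """Return timestamps where `admin` changed `attribute` on `target`."""
    rows = conn.execute(
        "SELECT occurred_at FROM attribute_changes "
        "WHERE changed_by = ? AND target_user = ? AND attribute = ? "
        "ORDER BY occurred_at",
        (admin, target, attribute),
    )
    return [row[0] for row in rows]


matches = changes_by(conn, "COMPANY\\admin1", "jsmith", "department")
print(matches)
```

The query itself is trivial once each piece of data has its own column. The hard part, which this sketch skips entirely, is getting the data out of the event text and into that shape.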

But let’s say that you do decide to deal with the limitations. You’re going to collect what you reasonably can, accept that there will be holes in your auditing coverage, and you’re going to deal with the native tools for viewing the log data. Perhaps you’ve got some amazing scripters on your IT team, and you just know they can extract data from the native logs and perhaps dump the data into a SQL Server database. So you’ll manually manage that process and set up a mini software development shop in-house to get the data consolidated into some useful repository. You’ll invest the time and effort needed to build the reports you need, perhaps using SQL Server Reporting Services. What other challenges will you be facing?

 

Technical Logs aren’t Always Business-Friendly

Another problem that organizations face with the native event logs is that they were, first and foremost, designed as troubleshooting tools. They’re very technical. Here’s a typical audit event from the Security log of a file server:

Object Open:
Object Server: File
Object Type: Security
Object Name: C:\Payroll.xls
New Handle ID: 1368
Operation ID: {0,130582}
Process ID: 684
Primary User Name: Administrator
Primary Domain: COMPANY
Primary Logon ID: (0x0,0x67C7)
Client User Name: -
Client Domain: -
Client Logon ID: -
Accesses: DELETE
    SYNCHRONIZE
    ReadAttributes
Privileges: -

Clear as a bell, right? You know what that Handle ID is all about, and what that Operation ID signifies, correct? You can tell whether the file was opened, deleted, read, moved, copied, or something else? That Logon ID information makes sense to you?

What about this one:
Special groups have been assigned to a new logon.

Subject: 
Security ID: S-1-5-18
Account Name: DC1-W2K8$
Account Domain: COMPANY
Logon ID: 0x3E7
Logon GUID: 947829587-8374-2837-3747-82716378

Does that work for your business needs? You see, the problem is that many of these event log entries – perhaps even most of them – are extremely cryptic and highly technical. There’s no translation facility built in; you’re getting raw computer data. Microsoft has actually made great strides in the past decade or so, making more and more messages clearer and easier to read – if you’re a technical professional. But turning this raw data into something that’s useful for, say, a security audit or a forensic investigation is still incredibly difficult.

Another challenge is pulling out specific auditing events related to specific object- or attribute-level queries, such as “show me everyone who’s modified this particular object, or who’s been changing a particular attribute.” The data is in the log, but it’s embedded in a text field, meaning complex pattern-matching, regular expressions, and other searches are necessary – and those kinds of searches aren’t directly supported in the native event viewer application. Further, some attribute-level changes aren’t even decoded into English, meaning you wind up having to figure out internal ID information and search on that. The end result is that you’ll have gaps in your audit trail.
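To give a sense of the pattern-matching involved, here is a minimal sketch that pulls “Name: value” pairs out of an event’s message text with a regular expression and then tests them. The sample message is modeled on the file-access event shown earlier. The assumption that every field sits on its own line in “Name: value” form is ours, and real message layouts vary by event and by Windows version. The helper names are hypothetical.

```python
import re

# Sample message text modeled on the file-access event shown earlier.
# Assumption: each field sits on its own line as "Name: value".
MESSAGE = """Object Open:
Object Name: C:\\Payroll.xls
New Handle ID: 1368
Process ID: 684
Primary User Name: Administrator
Primary Domain: COMPANY
Accesses: DELETE
"""

# Field name up to the first colon, then a non-empty value on the same line.
# Header lines such as "Object Open:" have no value and are skipped.
FIELD_PATTERN = re.compile(
    r"^(?P<name>[^:\n]+):[ \t]*(?P<value>\S.*)$", re.MULTILINE
)


def parse_fields(message):
    """Split an event message into a dict of field name -> value."""
    return {
        m.group("name").strip(): m.group("value").strip()
        for m in FIELD_PATTERN.finditer(message)
    }


def touched_by(message, user, path):
    """True if the event records `user` acting on `path` (case-insensitive)."""
    fields = parse_fields(message)
    return (
        fields.get("Primary User Name", "").lower() == user.lower()
        and fields.get("Object Name", "").lower() == path.lower()
    )


fields = parse_fields(MESSAGE)
print(fields["Object Name"], fields["Primary User Name"])
print(touched_by(MESSAGE, "administrator", "c:\\payroll.xls"))
```

Even this toy version has to guess at the layout of the text, and it would need to be repeated, with different patterns, for every event type you care about.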

What’s more, the data you’re able to retrieve isn’t always terribly useful. For example, one of the much-appreciated new features in the latest versions of Windows Server is the inclusion of “before and after” information in the event log for certain Active Directory operations. Now, when someone changes a user’s Department attribute (for example) from “Finance” to “Janitorial,” there’s a record of the change that shows both pieces of information. So you’ll know what changed, and you’ll know what it changed from – if the event is one of the few that includes this information. Not every operation inside Active Directory does, and outside Active Directory that “before and after” information isn’t included at all. Not having that information makes it difficult to actually use the events. “Okay,” you might say to yourself, “I can see in the log that the permissions on this file were changed. Should I be concerned about that?” Well, you’ll have to go find the file and look at its permissions manually to see if the new permissions warrant concern, and you won’t have any way – short of restoring the file from backup – to see what the permissions were. 

Not having “before and after” information can actually make it difficult to assemble a pattern of events, which is what auditing is really all about. When an administrator, for example, gives themselves additional permission on a file, opens the file, prints it, and then removes their extra permissions – that’s the type of behavior your audit log is meant to reveal and even highlight. But without the “before and after” information in the log, it’s tough to see.
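To make that pattern concrete, the sketch below scans a list of already-normalized events for a grant, open, revoke sequence by the same user on the same file within a time window. It assumes precisely what the native log often can’t give you: events that say clearly who was granted or lost which permission. The action names, record layout, and one-hour window are hypothetical.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class AuditEvent(NamedTuple):
    when: datetime
    user: str
    action: str  # hypothetical normalized action names
    path: str


# The behavior to flag: grant yourself access, use it, then remove it.
SUSPICIOUS_SEQUENCE = ("grant_permission", "open_file", "revoke_permission")


def find_grant_use_revoke(events, window=timedelta(hours=1)):
    """Return (user, path) pairs that complete the sequence within `window`."""
    hits = []
    progress = {}  # (user, path) -> (index of next expected step, start time)
    for ev in sorted(events, key=lambda e: e.when):
        key = (ev.user, ev.path)
        step, started = progress.get(key, (0, None))
        # Abandon a partial match once the window has expired.
        if step > 0 and ev.when - started > window:
            step, started = 0, None
        if ev.action == SUSPICIOUS_SEQUENCE[step]:
            if step == 0:
                started = ev.when
            step += 1
            if step == len(SUSPICIOUS_SEQUENCE):
                hits.append(key)
                step, started = 0, None
        progress[key] = (step, started)
    return hits


t0 = datetime(2011, 5, 16, 9, 0)
sample = [
    AuditEvent(t0, "admin1", "grant_permission", r"C:\Payroll.xls"),
    AuditEvent(t0 + timedelta(minutes=2), "admin1", "open_file", r"C:\Payroll.xls"),
    AuditEvent(t0 + timedelta(minutes=3), "jsmith", "open_file", r"C:\Payroll.xls"),
    AuditEvent(t0 + timedelta(minutes=5), "admin1", "revoke_permission", r"C:\Payroll.xls"),
]

hits = find_grant_use_revoke(sample)
print(hits)
```

The detection logic is short. What makes it impractical against native logs is the input: without “before and after” detail, a permission-change event can’t reliably be classified as a grant or a revoke in the first place.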

 

Caution: Caveats Ahead

Depending on your organization’s specific needs, you also need to be a bit careful with not only home-grown solutions, but also solutions from ISVs. There are a lot of differences in approaches.

For example, the most simplistic solutions simply consolidate the native event logs. That’s great, but it doesn’t change the underlying fact that the native event log data is cryptic and difficult to use, not to mention resource-intensive to collect in the first place; merely consolidating it and reporting on the data doesn’t really buy you much of an advantage. You’ll still be compromising.

How consolidation happens is important, too. For example, one concern shared by many organizations is the ability of administrators to clear the native event logs. While doing so writes a log entry, so the deed doesn’t go unnoticed, you still lose everything else that was in the audit log. It’s a way for an administrator, or anyone else with the necessary permissions, to cover their tracks. Sure, you’ll know they did it, but they can often plead guilty to a simple mistake, or to some activity that’s less severe than what they actually engaged in. What those organizations look for is real-time consolidation, meaning audit data that’s sent off-server, to a secured repository, in the very same instant that it happens. Sure, an admin can still clear the native log, but it won’t matter, because a copy of the data is already saved in a repository that the admin doesn’t have permission to access. This separation of duties is a crucial requirement for many organizations, and it’s effectively impossible to achieve with native logs alone or with consolidation tools that don’t work in real time.

 

Taking the Event Log Further

Why rely on the event log to just be an audit trail? While we’re discussing the limitations of the native log, let’s also discuss what it should be able to do – things that perhaps you haven’t even imagined.




Download this white paper: The Tradeoffs and Risks of Traditional Windows Auditing