Vegas Conference – Data Quality

This past week I attended the Information and Data Quality Conference at the Palms Hotel in Las Vegas. The conference material was great, and because data quality is an emerging discipline, just as data governance is, there were many different approaches and methodologies for implementing data excellence in your organization. Last week I offered to let you steer my data quality conference experience by emailing me the sessions you’d like notes from… One person took me up on the offer, and later this week I’ll provide my notes from the session he requested, “Managing Data Quality in an ERP Environment” by Danette McGilvray.

I’d like to start, though, with key takeaways from my half-day session (on the 1st day): “Using Data Profiling for Proactive Data Quality Improvement” by David Plotkin of Wells Fargo Bank. As you probably know by now, I maintain that a good data quality program is a key part of making Data Governance successful (and vice-versa). This session included some great tips for starting and sustaining data quality. Read on for notes from the session…

David Plotkin is the Data Quality Manager for the Wells Fargo Consumer Credit Group. He began his presentation by outlining the importance of proactive and reactive data quality and the differences between them (note that he refers to the business units as his ‘customer’):

Reactive Data Quality

  • The customer tells us about data quality issues
  • The customer tells us what they think the issue is
  • Issues are logged and worked on as time permits
  • Users uncertain about who to report issues to or status of issues

Proactive Data Quality

  • Review existing elements for data quality
  • Review new elements as they are brought in
  • Publish results and trends in Data Quality (scorecards)
  • Solicit feedback from key business users on results

There are many reasons that proactive data quality is better than reactive. I think both are necessary, but the more you can catch upfront, the better! A few of the benefits noted for proactive DQ were: making our jobs easier and less frustrating, increasing customer confidence, saving money, and catching things before they break systems.

To start, David recommends prioritizing issues via a cost/benefit matrix. The projects with the lowest cost and the highest benefit are the best ones to start with. He then recommends collecting business rules via a standard template that you will use for all rules going forward. He didn’t provide a template, but I would suggest that it include, at a minimum: data field, systems impacted, business units involved and known issues. The rest of the fields would depend on your organization.
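
To make the cost/benefit idea concrete, here is a minimal sketch in Python. David didn't give a formula, so ranking by a simple benefit-to-cost ratio is my own assumption, and the issue names and scores below are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    name: str
    cost: float     # estimated effort to fix (hypothetical 1-10 scale)
    benefit: float  # estimated value of fixing it (hypothetical 1-10 scale)


def prioritize(issues):
    """Return issues sorted by benefit-to-cost ratio, best candidates first."""
    for issue in issues:
        if issue.cost <= 0:
            raise ValueError(f"cost must be positive for {issue.name!r}")
    return sorted(issues, key=lambda i: i.benefit / i.cost, reverse=True)


# Made-up example issues, not from the session.
issues = [
    Issue("Missing postal codes", cost=2, benefit=8),
    Issue("Duplicate customer records", cost=8, benefit=9),
    Issue("Inconsistent date formats", cost=1, benefit=3),
]

ranked = prioritize(issues)
for issue in ranked:
    print(f"{issue.name}: ratio {issue.benefit / issue.cost:.2f}")
```

A plain ratio is only one way to read a cost/benefit matrix; you may prefer to bucket issues into quadrants (low cost/high benefit first) if your estimates are rough.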

I think the key is to provide a nice ‘snapshot’ of the data field for the business to view, so that they can easily capture (or provide) the business rules.
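
As a rough illustration of what such a template could look like, here is a small Python sketch built on the minimum fields I suggested above. This is my own guess at a structure, not something David presented; the class name, field names and sample values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BusinessRuleTemplate:
    """One record per data field; extend with whatever your organization needs."""
    data_field: str
    systems_impacted: List[str] = field(default_factory=list)
    business_units: List[str] = field(default_factory=list)
    known_issues: List[str] = field(default_factory=list)
    business_rule: str = ""  # left blank until the business supplies it

    def snapshot(self) -> str:
        """Render a short plain-text summary for business users to review."""
        return "\n".join([
            f"Data field: {self.data_field}",
            f"Systems impacted: {', '.join(self.systems_impacted) or 'none listed'}",
            f"Business units: {', '.join(self.business_units) or 'none listed'}",
            f"Known issues: {', '.join(self.known_issues) or 'none listed'}",
            f"Business rule: {self.business_rule or '(to be provided)'}",
        ])


# Hypothetical example entry.
rule = BusinessRuleTemplate(
    data_field="customer_email",
    systems_impacted=["CRM", "Billing"],
    business_units=["Marketing"],
    known_issues=["blank values"],
)
text = rule.snapshot()
print(text)
```

Using the same structure for every rule means the snapshots can be collected, compared and reported on later without reworking each one.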

The remainder of the session focused on data profiling tools and the great opportunities they provide to quickly and easily identify issues. David acknowledged that data quality is not tool dependent, and that “big things can be done with small tools”, but did concede that tools certainly allow us to be more productive because a ton of manual processing is reduced or eliminated. Tools can review all of the values in a specific field and tell you the mean, median, mode, outliers, trends, and tons of other statistical information that you can use to make good decisions.
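
In the spirit of “big things can be done with small tools”, here is a minimal sketch of the kind of summary a profiling tool produces for a single numeric field, using only Python's standard library. It is not how any particular product works; the 1.5 × IQR outlier rule and the sample data are my own assumptions.

```python
import statistics


def profile(values):
    """Summarize a numeric field; outliers are flagged with the 1.5 * IQR rule."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "outliers": [v for v in values if v < low or v > high],
    }


# Made-up 'age' column with one suspicious value.
ages = [34, 29, 41, 34, 38, 27, 34, 45, 31, 240]
report = profile(ages)
for key, value in report.items():
    print(f"{key}: {value}")
```

Here the value 240 is flagged as an outlier, which is exactly the sort of thing you would want to take back to the business and ask about before it breaks something downstream.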

You can see many tools being advertised on this site, or Google “data profiling” for more data profiling products than you can shake a stick at. The sponsors’ room also had a bunch of companies whose tools did data profiling in one way or another. Rather than rehash my notes on how you can use these tools in a proactive manner, I suggest you do some research on data profiling tools with the companies themselves. Trust me, their salespeople will be more than happy to show you what they have to offer :)