Ending Data Co-Dependence, Chapter 2 – Enrichment

The IQVIA Healthcare Center of Excellence is part consulting firm, part advisory service and part relationship counselor.  We help organizations realize that they don’t have to be in an endless cycle of fights with their critical data.  We can help you learn to harmoniously acquiRe, Enrich, goVern, intEgrate, pRovision and visualizE (REVERE – get it?) your data in a way that will not only change how you think about data management, but make you love your data again.

Over the next month, we will cover this evolution of data management through a series of blog posts that share some of our learnings from 4,000+ projects and 1,000+ customers.

Don’t break up with your data. Learn how to love your data, again. Let IQVIA and our Healthcare Center of Excellence help you make something beautiful out of your data.


Enrichment – “there’s nothing quite like quality”

In this chapter, it’s time to discuss data enrichment and the cost of quality.

Let’s be clear – healthcare data quality stinks.  It stinks like hot garbage – yeah, I said it – like hot garbage on the day before garbage pickup in the hottest part of August!

Let’s also be clear – that doesn’t have to ruin your day, nor make healthcare data management as difficult as it often is. Data enrichment can’t turn garbage into gold, but it can sure make it smell a lot better!

Data enrichment comes in many forms: standardization, cleansing, manual correction, synthetic data and third-party data.

We don’t necessarily leverage all these forms in an intentional and deliberate way:

  • We fix problems with data standardization but often defer the governance tasks of establishing those standards until a critical problem arises
  • We cleanse data where necessary, often with spot routines and procedures, occasional quality rules and similar fixes, but we lack a true data quality program that seeks to improve data quality more holistically
  • We ask end users and Subject Matter Experts (SMEs) to correct instances of data but don’t drive a process that asks why these errors are injected into the source systems and data provisioning cycle to begin with
  • Generally, the industry does a poor job of creating and leveraging synthetic data (manufactured and inferred data) due to a lack of understanding about how this data is useful and the right processes for creating synthetic data
  • Third-party data is an industry unto itself. While there are many good sources of outside data that can offer insight and add value to institutional data, most organizations have not taken full advantage of the data they already own, and operationalizing outside data is often the hardest part of the integration.

Data quality

If we are going to talk about data quality, then we must talk about the ‘cost of quality’ and Philip B. Crosby. Crosby was the quality control manager for the Pershing missile program and is credited with coining the phrase “quality is free.” He helped create the roots of some well-known quality processes with his ‘Zero Defects’ ideology. According to Wikipedia, Crosby basically stated the following:

Crosby’s response to the quality crisis was the principle of “doing it right the first time” (DIRFT). He also included four principles:

  1. The definition of quality is conformance to requirements (requirements meaning both the product and the customer’s requirements)
  2. The system of quality is prevention
  3. The performance standard is zero defects (relative to requirements)
  4. The measurement of quality is the price of nonconformance

In Crosby’s world, the cost of quality was “free” – meaning it is cheaper to do it right the first time than to resolve the quality issues later. Even when dealing with the disaster that healthcare data is, quality can absolutely save resource time, improve customer satisfaction and, ultimately, save money over the long term.

Data quality comes in at least two forms: positive and negative. Negative data quality is the one often talked about: a required field is null, an alpha value where a numeric one belongs – this is usually what people mean when they say data quality.

We can, however, talk about data quality in existential ways – positive data quality. Like the powers of positive thought – positive data quality makes a good t-shirt design slogan: “my data is happier than yours” ™ (yes, I am going to trademark that slogan!), but it also offers a pragmatic process to enrich data and create impact.

Positive data quality offers us a way to score, grade and qualify data based on its overall profile. We can add elements to the data stream – metadata, additional fields, processing instructions – that allow for intelligent routing, data valuation, and other critical operations that can derive more value from the data you own and reduce the amount of re-work and cleanup necessary. Let’s get out of the theoretical and put some of our new data relationship skills to work.

Get started with metrics

In healthcare, we have clear quality metrics/measures we can baseline and discuss in real dollar value, and show Return on Investment (ROI). Provider reimbursements (submitting claims) and claims processing (paying claims) are two scenarios where each record is tied to an actual dollar amount and delay can be calculated in deferred revenue and revenue recognition. It is a practical place to start a bona fide quality program in its first iteration.

Start by creating a small number of metrics to track quality issues. But what metrics do you start with?

The industry standards are a good place to start: Accuracy, Consistency, Completeness, Integrity and Timeliness. However, establishing internal agreement about what these mean can create delay and complexity.

So, while everyone is busy noodling over how to codify these, start with things that are easy to define (because you have already defined them):

  • Turnaround time (TAT) – this is already an established metric in claims processing (inbound and outbound)
  • PCT (%)/Count of Null values in required fields (PCT_Null) – we know what fields must be filled to create and process a claim
  • PCT (%)/Count of Unvalidated values (PCT_Unvalidated) – there are established standards for values in many fields in claims records and simple quality validations will identify these issues

We can build these as simple quality rules in tools like Informatica Data Quality (IDQ), or, if a dedicated quality tool doesn’t exist, stored procedures will do to start. The idea is to baseline these metrics with an initial quality run and track them over time to show improvement and value.
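If you have neither a quality tool nor a stored procedure handy, the three starter metrics can be sketched in a few lines of Python. This is only an illustration: the field names (claim_id, provider_id, place_of_service and so on), the list of required fields and the set of valid codes are all assumptions made up for the example, not a real claim layout.

```python
from datetime import datetime

# Hypothetical field names and code set; a real claim layout will differ.
REQUIRED_FIELDS = ("claim_id", "provider_id", "member_id", "billed_amount")
VALID_PLACE_OF_SERVICE = {"11", "21", "22", "23"}  # illustrative subset only


def is_null(value):
    """Treat None and blank strings as null."""
    return value is None or str(value).strip() == ""


def pct_null(claims):
    """Percent of required-field slots that are null across all claims."""
    slots = len(claims) * len(REQUIRED_FIELDS)
    if slots == 0:
        return 0.0
    nulls = sum(is_null(c.get(f)) for c in claims for f in REQUIRED_FIELDS)
    return 100.0 * nulls / slots


def pct_unvalidated(claims):
    """Percent of populated place_of_service values outside the code set."""
    populated = [c.get("place_of_service") for c in claims
                 if not is_null(c.get("place_of_service"))]
    if not populated:
        return 0.0
    bad = sum(v not in VALID_PLACE_OF_SERVICE for v in populated)
    return 100.0 * bad / len(populated)


def avg_tat_days(claims):
    """Mean turnaround time in days for claims with both dates present."""
    spans = [(c["processed"] - c["received"]).days
             for c in claims if c.get("received") and c.get("processed")]
    return sum(spans) / len(spans) if spans else None


# Two made-up claims: the second has two null required fields and a bad code.
claims = [
    {"claim_id": "C1", "provider_id": "P9", "member_id": "M1",
     "billed_amount": 120.0, "place_of_service": "11",
     "received": datetime(2024, 1, 2), "processed": datetime(2024, 1, 6)},
    {"claim_id": "C2", "provider_id": "", "member_id": "M2",
     "billed_amount": None, "place_of_service": "ZZ",
     "received": datetime(2024, 1, 3), "processed": datetime(2024, 1, 5)},
]

baseline = {
    "PCT_Null": pct_null(claims),
    "PCT_Unvalidated": pct_unvalidated(claims),
    "TAT_days": avg_tat_days(claims),
}
print(baseline)
```

Store the baseline from the first run, rerun on each new batch, and the trend line becomes your improvement story.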

Enrich the output (let’s get in the weeds a bit)

Data can be enriched with what we like to call ‘additive data quality’ – metadata (in the form of additional fields, reference data or documentation) that embeds the quality output into each record for further processing.
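As a rough sketch of what "embedding the quality output into each record" might look like, the function below copies a record and appends a few quality fields to it. The dq_ field names, the score formula and the pass/review statuses are assumptions invented for this illustration.

```python
def annotate(record, required_fields):
    """Return a copy of the record with quality metadata appended."""
    missing = [
        f for f in required_fields
        if record.get(f) is None or str(record.get(f)).strip() == ""
    ]
    enriched = dict(record)  # leave the source record untouched
    enriched["dq_missing_fields"] = missing
    # Hypothetical score: share of required fields that are populated.
    populated_share = 1 - len(missing) / len(required_fields)
    enriched["dq_score"] = round(100.0 * populated_share, 1)
    enriched["dq_status"] = "pass" if not missing else "review"
    return enriched


required = ("claim_id", "provider_id", "member_id", "billed_amount")
claim = {"claim_id": "C1", "provider_id": "", "member_id": "M1",
         "billed_amount": 10.0}
enriched_claim = annotate(claim, required)
print(enriched_claim)
```

Because the quality fields travel with the record, any downstream step can route or filter on them without rerunning the checks.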

For example, if we were to set a threshold of 70% on our PCT_Null metric above (I am using an arbitrary threshold in this example), our claims file ingestion process can leverage a rule that pauses the processing of a claims file that exceeds this threshold.

This process can then route the file to a more discerning quality routine that determines the cost of the claim delay and makes a real-time calculation of loss/risk and, based on established rules, chooses to process the file anyway, flagging each offending record for cost, risk, exclusion and rerouting.
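The pause-then-decide routing described above could be sketched roughly as follows. The 70% threshold is the arbitrary one from the example; the daily cost rate, the expected delay, the cost limit and the field names are all hypothetical placeholders for whatever rules your organization establishes.

```python
REQUIRED_FIELDS = ("provider_id", "member_id")  # hypothetical required fields
PCT_NULL_THRESHOLD = 70.0   # the arbitrary threshold from the example
DAILY_COST_RATE = 0.0005    # hypothetical daily cost of deferred revenue
EXPECTED_DELAY_DAYS = 10    # hypothetical delay if the file is held
MAX_DELAY_COST = 50.0       # hypothetical limit before we process anyway


def missing_fields(claim):
    """List the required fields that are empty on this claim."""
    return [f for f in REQUIRED_FIELDS if not claim.get(f)]


def route_file(claims):
    """Return (decision, claims) for one inbound claims file."""
    slots = len(claims) * len(REQUIRED_FIELDS)
    nulls = sum(len(missing_fields(c)) for c in claims)
    pct_null = 100.0 * nulls / slots if slots else 0.0

    if pct_null <= PCT_NULL_THRESHOLD:
        return "process", claims

    # Over threshold: estimate what holding the whole file would cost.
    billed = sum(c.get("billed_amount") or 0.0 for c in claims)
    delay_cost = billed * DAILY_COST_RATE * EXPECTED_DELAY_DAYS

    if delay_cost <= MAX_DELAY_COST:
        return "hold", claims

    # Too expensive to hold: process anyway, flagging offending records.
    flagged = [
        dict(c, dq_flagged=bool(missing_fields(c)),
             dq_missing_fields=missing_fields(c))
        for c in claims
    ]
    return "process_with_flags", flagged


# Made-up file: three of four claims are missing both required fields (75%).
claims_file = [
    {"claim_id": "C1", "provider_id": None, "member_id": None, "billed_amount": 60000.0},
    {"claim_id": "C2", "provider_id": None, "member_id": None, "billed_amount": 50000.0},
    {"claim_id": "C3", "provider_id": None, "member_id": None, "billed_amount": 40000.0},
    {"claim_id": "C4", "provider_id": "P1", "member_id": "M4", "billed_amount": 1000.0},
]
decision, routed = route_file(claims_file)
print(decision, [c["dq_flagged"] for c in routed])
```

The clean record (C4) passes through with dq_flagged set to False, so it is never held up by its noisy neighbors.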

The process can also manufacture missing fields (synthetic data) by leveraging the balance of the record. If a provider identifier is missing, can it be inferred through surrounding records, historical records, a reference table lookup, a master data management (MDM) system or a third-party validation store? This intelligent processing allows for:

  • Field-level validation
  • Row-level processing
  • File-level routing
  • Real-time quality improvement
  • Meta-data enrichment
  • Source vendor/system scoring

All of this happens without increasing turnaround time (TAT) for non-offending records, and it lets us isolate poor-quality or high-risk records before they end up in the transactional system or, worse, the warehouse, where the cost of backing data out is even higher.

Illustrating value

A process such as the one described above can illustrate value across the enterprise and create an easy discussion with executives, IT folks, and business stakeholders alike. It’s like family therapy – if your family was made up of machines, data and people from work (ok, I will work on a better metaphor starting now).

Ultimately, you have much more value in your data than you are currently utilizing. Data enrichment can not only drive better insight into the problems that exist in your provisioning stream, but also provide actionable metrics for the value of resolving those issues and for the improved value of your data that results.

thanks for reading

— adam



Next: Governance – data, not unlike young children, really craves structure

Next time we’ll talk about data governance and how this practice doesn’t have to be painful, frustrating or the reason you think about going back to school to change careers.