Real World Value of Health Data De-Identification Tools

By Sam Wehbe, Marketing Director, Privacy Analytics
Twitter: @privacyanalytic

The amount of health data is staggering, and it grows every day. Today, 30 percent of all data storage worldwide is health data, according to a 2012 study by the Ponemon Institute. Health data is increasing 48 percent annually, according to the research firm IDC, which predicts that the total will reach 2,314 exabytes by 2020. By comparison, all the words ever spoken by human beings, in any language, would fill just five exabytes, a mere drop in this ocean, according to Fortrust, a data center.

Any way you slice or dice it, we are awash in data. And it comes from many different sources, including hospital emergency rooms, physicians’ notes, medical claims forms, pharmacies, drug trials, and more. Every day these data are filed away in data centers, data warehouses and data farms.

We are entering a critical time for medical research because of all this data, much of it representing real world, real life situations. But a lot of it is locked away because of legitimate concerns that individual patient identities may be exposed. In fact, it’s against the law for those in possession of personal health information to release data without taking steps to protect patient anonymity. Sharing data without proper safeguards in place is risky, so unless the risks can be reduced, much of this data will remain locked in silos.

But what if there were ways to safely share these valuable exabytes of medical information? Risk-based de-identification solutions are already on the market, allowing healthcare organizations to put useful data to work without compromising privacy. This approach evaluates each dataset according to the risk that patient identities could be exposed. No technique reduces the risk to zero, but risk can be greatly reduced without sacrificing the value of the data to medical researchers. When the right balance between privacy protection and data sharing is struck, good things happen: research that can make very sick people well again and improve the delivery of healthcare. Here are some examples.
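The article does not say how any particular product measures risk. As a minimal sketch, assuming a simple equivalence-class model (the idea behind k-anonymity), the Python below groups records by quasi-identifiers (fields such as age, sex and partial ZIP code that could single out a patient in combination) and treats each record's re-identification risk as one divided by the size of its group. The field names, the sample records and the 0.5 threshold are all hypothetical, chosen only to show how generalizing a field lowers measured risk.

```python
from collections import Counter

# Hypothetical quasi-identifiers: not direct identifiers like names,
# but fields that could single a patient out when combined.
QUASI_IDENTIFIERS = ("age", "sex", "zip3")

# Hypothetical acceptable-risk threshold, for illustration only.
RISK_THRESHOLD = 0.5


def equivalence_class_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def reidentification_risk(records, quasi_identifiers):
    """Return (max_risk, avg_risk).

    A record's risk is 1 / (size of its equivalence class), so the maximum
    risk comes from the smallest class, and the average risk equals the
    number of classes divided by the number of records.
    """
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    max_risk = 1 / min(sizes.values())
    avg_risk = len(sizes) / len(records)
    return max_risk, avg_risk


def generalize_age(records, band=10):
    """Replace exact ages with bands such as '30-39' to enlarge equivalence classes."""
    generalized = []
    for r in records:
        low = (r["age"] // band) * band
        generalized.append({**r, "age": f"{low}-{low + band - 1}"})
    return generalized


# Hypothetical sample records.
RECORDS = [
    {"age": 34, "sex": "F", "zip3": "021"},
    {"age": 37, "sex": "F", "zip3": "021"},
    {"age": 31, "sex": "F", "zip3": "021"},
    {"age": 52, "sex": "M", "zip3": "100"},
    {"age": 58, "sex": "M", "zip3": "100"},
    {"age": 55, "sex": "M", "zip3": "100"},
]

for label, data in (("raw", RECORDS), ("generalized", generalize_age(RECORDS))):
    max_risk, avg_risk = reidentification_risk(data, QUASI_IDENTIFIERS)
    verdict = "acceptable" if max_risk <= RISK_THRESHOLD else "too risky"
    print(f"{label}: max={max_risk:.2f} avg={avg_risk:.2f} -> {verdict}")
```

With exact ages, every record is unique (risk 1.00); banding ages into decades puts each record in a group of three (risk 0.33) while keeping sex and region intact for analysis. Real risk-based tools use more elaborate models, but the trade-off is the same: transform only as much as needed to bring measured risk under the chosen threshold.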

Spinal cord injuries
Committed to a world without paralysis after spinal cord injury, one research institute turned to risk-based de-identification to advance its research. Spinal cord injuries happen every day, often as a result of vehicle accidents, falls, impacts from objects, sports injuries, or violence. Over 250,000 Americans live with such injuries, and approximately 11,000 new injuries occur each year. The price paid in economic loss and personal suffering is heavy.

The institute has formed a nationwide registry of spinal cord injury patients; it aggregates and de-identifies the registry data, then uses them to push toward a cure. The data sets are also used to improve and standardize the delivery of care internationally, to increase the development and commercialization of innovations, and to create opportunities for consumers to participate in research and make smarter health decisions.

Cancer and diabetes therapies
A leader in developing innovative oncology and endocrinology therapeutics, this major biopharmaceutical organization assists specialty pharmacies and other healthcare providers by extracting patient-level data on drug dispensing and the clinical management of specific patient populations. The data it uses for these secondary purposes come from specialty pharmacies and include information on patients, payers, claims, diagnoses, prescriptions, and dispensing. Through a partnership with a data aggregator, the organization regularly performs risk-based de-identification on the data through a secure process. It then uses the de-identified data to help providers improve clinical care for their patients and to support its business operations.

Multiple Sclerosis
Looking to perform a longitudinal study of Multiple Sclerosis (MS) patients, a well-respected data analytics organization knew it faced an uphill battle. Collecting longitudinal data from scratch is a slow, expensive process with many logistical difficulties. The organization knew the information its researchers wanted already existed at various clinics, but access to the raw information for secondary purposes was not possible. So it partnered with these clinics to receive de-identified data. It still needed assurance that the data sets were sufficiently de-identified, so a consultant performed a risk assessment and judged the data sets safe to use.

Without this assurance, the clinical data on MS patients would not have been used. Patients would have needed to be recruited independently and the corresponding tests duplicated. The study would have been significantly more difficult to conduct, if it could be done at all. Fortunately, the data are available just as hope is growing for treatments that will not only halt the progress of this devastating disease, which affects millions of people around the world, but actually reverse the nerve damage that leads to paralysis.


Improving the service provided by health insurers
An insurance association processes a majority of all claims from 96 percent of hospitals nationwide. But while this volume of data represents terabytes of rich information for analysis, the association couldn't fully leverage this critical asset to let its members benchmark service delivery. The main obstacle to sharing more granular, patient-level data was that traditional de-identification methods protected patient privacy only by stripping away that granularity.

The association was uncomfortable with the level of risk associated with the Safe Harbor de-identification method, as well as with the loss of the data granularity needed to improve service delivery and realize cost savings. The association's chief legal counsel recognized that although its datasets were HIPAA compliant, the data that remained after Safe Harbor de-identification were not useful to its members.
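To show why Safe Harbor output can lose the granularity benchmarking needs, here is a partial sketch of three of the HIPAA Safe Harbor rules: ages over 89 are collapsed into a single "90 or older" category, ZIP codes are cut to their first three digits (or "000" where that three-digit area has 20,000 or fewer people), and dates directly related to the patient are reduced to the year. The record layout, the function name and the restricted-prefix set are hypothetical placeholders; real Safe Harbor de-identification also removes names and the other listed identifiers, and the restricted ZIP list comes from Census data.

```python
from datetime import date

# Placeholder values for illustration only: Safe Harbor requires replacing
# three-digit ZIP prefixes covering 20,000 or fewer people with "000"; the
# real list is derived from Census data and is not reproduced here.
RESTRICTED_ZIP3 = {"036", "059"}


def safe_harbor_sketch(record):
    """Apply three Safe Harbor-style rules to one hypothetical claim record.

    Covers only ages over 89, ZIP codes, and dates; it is not a complete
    Safe Harbor implementation.
    """
    zip3 = record["zip"][:3]
    return {
        # Ages over 89 are collapsed into a single "90+" category.
        "age": "90+" if record["age"] > 89 else record["age"],
        # Keep only the first three ZIP digits, or "000" for restricted areas.
        "zip3": "000" if zip3 in RESTRICTED_ZIP3 else zip3,
        # Dates directly related to the patient are reduced to the year.
        "admit_year": record["admit_date"].year,
        "discharge_year": record["discharge_date"].year,
    }


# Hypothetical claim record.
claim = {
    "age": 92,
    "zip": "02139",
    "admit_date": date(2013, 3, 14),
    "discharge_date": date(2013, 3, 19),
}

safe = safe_harbor_sketch(claim)
length_of_stay = (claim["discharge_date"] - claim["admit_date"]).days
print(safe)
print("length of stay before de-identification:", length_of_stay, "days")
```

The raw claim supports a five-day length-of-stay calculation, a typical service-delivery benchmark. After these rules, both dates collapse to 2013, so length of stay can no longer be computed, which is the kind of granularity loss described above.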

By using commercially available software built on the Expert Determination methodology, the association was able to de-identify its claims data in a way that still enabled analytics. It could also measure the re-identification risk and provide an audit trail of its process. The rich insights gleaned from the de-identified data allowed for comparative benchmarking while keeping patients anonymous.

These four examples illustrate ways health data can be put to productive use while safeguarding patient privacy. These uses are made possible by sophisticated tools available to data custodians today. Evaluating new tools and solutions is especially crucial as the volume of health data and health costs skyrocket. The ability to access and share the data being collected for cutting-edge analysis and sophisticated informatics will help improve patient outcomes and the management of the healthcare system.
