First Anonymization Solution for Structured and Unstructured Data for Secondary Use

Privacy Analytics Launches Solution for Structured and Unstructured Data for Secondary Use

Press Release

WASHINGTON – Building on its award-winning anonymization software, PARAT, Privacy Analytics Inc. announced the availability of PARAT 5.3, which extends its de-identification and masking capabilities to unstructured data. According to one organization, 80 percent of medical record data will be unstructured within two years, making it a critical source of new insights, innovation and knowledge for research hospitals and organizations, medical device companies, and insurance and medical claims providers, among others.1 The challenge for many organizations is accessing and analyzing both unstructured and structured data while ensuring that the personal information they contain is robustly protected under HIPAA and other legal requirements.

The Biomedical Translational Research Information System (BTRIS) at the National Institutes of Health (NIH) Clinical Center, a biomedical research facility and an agency of the United States Department of Health and Human Services (DHHS), recently purchased (through a competitive bidding process) PARAT Text software, the standalone module of PARAT 5.3. The BTRIS trans-NIH clinical research data repository plans to anonymize unstructured text data from more than 400,000 patients for research purposes using PARAT Text. The government agency selected PARAT Text to augment the data currently available in “de-identified format” within the BTRIS repository. The addition of unstructured text data without personal identifiers to the repository will allow researchers access to NIH Clinical Center clinical documentation from 1976 to the present. Access to clinical documentation in addition to structured data in de-identified form allows researchers to test hypotheses for new research, confirm potential sample sizes for proposed research and find collaborators for cross-disciplinary research studies.

“PARAT 5.3 allows organizations to safeguard their data, while at the same time enabling them to gain richer analysis from an integrated solution that marries structured and unstructured anonymization,” said Khaled El Emam, CEO, Privacy Analytics Inc. “The software matches structured and unstructured data values, to ensure the consistency and integrity of data, while also tying masked personal information to corresponding anonymized unstructured text for richer analysis. This allows privacy officers to safeguard personal information across their enterprise and statisticians and data analytic professionals to leverage it for secondary use.”

Using a risk-based methodology to anonymize personal information in accordance with HIPAA and other legal requirements, PARAT 5.3 automates the masking and de-identification of data in standard database tables and in text or XML-based documents. Unstructured data very often resides in text-based formats, for example in:

  1. Electronic health records, where personal information resides in free-form text, is often exported in XML format, and must be anonymized for analysis;
  2. Medical devices, where unstructured data or free-form text from machine “dumps” or downloads (e.g., from x-ray machines or CAT scanners) is sent to one or more databases for analysis; and
  3. Online forums, where patients or providers discuss their conditions or cases, and this narrative needs to be anonymized to facilitate sentiment analysis and other forms of information extraction.
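
As a rough illustration of what masking identifiers in free-form text can involve, the following Python sketch replaces phone numbers and MRNs in a clinical note with placeholder tags. It is not PARAT's method: the patterns, the `mask_text` function and the sample note are all hypothetical, and real identifier formats vary far more widely than these two expressions cover.

```python
import re

# Hypothetical patterns for illustration only; real identifier formats vary.
PHONE = re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b")
MRN = re.compile(r"\bMRN[:\s]*\d{6,10}\b", re.IGNORECASE)


def mask_text(note: str) -> str:
    """Replace phone numbers and MRNs in free-form text with placeholder tags."""
    note = PHONE.sub("[PHONE]", note)
    note = MRN.sub("[MRN]", note)
    return note


# Made-up sample note; no real patient data.
note = "Patient (MRN: 00123456) can be reached at 613-555-0199 for follow-up."
masked = mask_text(note)
print(masked)
```

Pattern matching of this kind only catches identifiers with a predictable shape. Names and other free-text identifiers generally need more than regular expressions.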

PARAT masks personal information such as names, phone numbers and medical record numbers (MRNs), rendering it unrecognizable; the resulting obfuscation prevents analysis of those fields. In the same database, however, PARAT 5.3 can de-identify or alter indirect personal identifiers, such as date of birth, medical facility name, and ZIP or postal code, enabling high-quality aggregate and individual-level analysis while still protecting personal information.
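
The difference between masking and de-identification described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not PARAT's risk-based algorithm: the field names, the `anonymize` function and the fixed generalization rules (birth date reduced to year, ZIP code reduced to its first three digits) are all hypothetical choices made for the example.

```python
from datetime import date

# Assumed field names for direct identifiers; purely illustrative.
DIRECT_IDENTIFIERS = {"name", "phone", "mrn"}


def anonymize(record: dict) -> dict:
    """Mask direct identifiers and generalize indirect ones in a single record."""
    out = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIERS:
            # Masking: the value is no longer usable for analysis.
            out[field] = "MASKED"
        elif field == "date_of_birth":
            # Generalization: keep only the year so age-based analysis remains possible.
            out["birth_year"] = value.year
        elif field == "zip":
            # Generalization: keep only the first three digits of the ZIP code.
            out["zip3"] = value[:3]
        else:
            out[field] = value
    return out


# Made-up record; no real patient data.
record = {
    "name": "Jane Doe",
    "phone": "613-555-0199",
    "mrn": "00123456",
    "date_of_birth": date(1976, 4, 12),
    "zip": "20892",
    "diagnosis": "asthma",
}
result = anonymize(record)
print(result)
```

A risk-based approach would choose how far to generalize each field from a measured re-identification risk rather than from fixed rules like these.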