Data Sharing Across Borders: The Vital Role of a Trusted Research Environment

By Stavros Papadopoulos, Founder and CEO, TileDB

An estimated 30 percent of the world’s data is generated by the healthcare industry, and it’s increasing at a faster rate than even finance and media. Increasingly, seamless data sharing between hospitals and other types of organizations is proving critical to improving both population and individual health, as well as promoting health equity.

We’re seeing examples of this collaborative information sharing more and more every day. In an increasingly interconnected world, where infectious diseases cross borders with ease, global collaboration played a vital role in responding to the COVID-19 pandemic. Specifically, a global collaboration of hospitals used this kind of approach to train an AI model to identify COVID-19 patient oxygen needs.

Another example involves Rady Children’s Hospital in San Diego, and its subsidiary, Rady Children’s Institute for Genomic Medicine (RCIGM). RCIGM screens newborn infants for genome variants, in order to diagnose and treat them as early as possible – often while there’s still a chance to select a treatment path.

RCIGM recently expanded its pool of available data to include two variant databases built from sequencing healthy adult populations in the UK and Mexico City. This lets RCIGM compare infants’ genome variants against those of healthy adults and determine which variants are present in healthy adults and therefore likely harmless. As a result, RCIGM has been able to reduce false positives in affected newborns from 97 percent to 3 percent, resulting in far fewer anxious families.
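The matching step described above can be sketched in a few lines of Python. This is a minimal illustration, not RCIGM's actual pipeline: the function name, the `chrom:pos:ref:alt` identifier format, and the cohort contents are all hypothetical, and a real screening workflow would typically weigh allele frequencies and clinical evidence rather than rely on simple presence in a healthy cohort.

```python
from typing import Iterable


def filter_likely_benign(
    candidate_variants: Iterable[str],
    healthy_cohorts: dict[str, set[str]],
) -> tuple[list[str], list[str]]:
    """Split an infant's candidate variants into (still_suspicious, likely_benign).

    A variant is treated as likely benign if it appears in ANY healthy adult
    cohort. This presence-only rule is a simplifying assumption.
    """
    # Union of every variant observed in any healthy adult cohort.
    seen_in_healthy = set().union(*healthy_cohorts.values())

    still_suspicious: list[str] = []
    likely_benign: list[str] = []
    for variant in candidate_variants:
        if variant in seen_in_healthy:
            likely_benign.append(variant)
        else:
            still_suspicious.append(variant)
    return still_suspicious, likely_benign


# Hypothetical data: variant IDs and cohort names are made up for illustration.
cohorts = {
    "uk_adults": {"chr1:1000:A:G"},
    "mexico_city_adults": {"chr3:3000:G:A"},
}
infant_variants = ["chr1:1000:A:G", "chr2:2000:C:T", "chr3:3000:G:A"]
suspicious, benign = filter_likely_benign(infant_variants, cohorts)
```

In this toy run, only the variant that appears in neither cohort stays on the suspicious list. Adding more healthy cohorts to the lookup gives more chances to rule a variant out, which is the intuition behind the drop in false positives.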

In this type of scenario, speed and accuracy of diagnosis are critical – which means you need to get access to the genomic data conveniently and quickly. All too often, data-driven decisions in healthcare may fall short, not because data is unavailable, but because it’s inaccessible. If newborn screening projects could better share genomic data from across the planet, this could have a huge positive impact on the health outcomes for infants everywhere.

Such sharing sounds great in theory, but it can be really difficult in practice. While it would be great to have all the data ever needed wrapped up neatly in one database, that’s just not reality, especially in healthcare. As organizations grow and their data infrastructure becomes more distributed, data engineers, scientists and analysts are increasingly being required to join data that might reside in different data sources or might not even exist in the same datacenter or be controlled by the same entity.

This kind of cross-silo query is referred to as a ‘federated query’ and it is emerging as an increasingly critical capability for hospitals, as data volumes, demand for analytics, and the cross-industry need for greater collaboration grow. It behooves hospitals to implement database technology that supports the federated query model. In particular, such technology should offer:

  • Multimodal data support – At the most basic level, database technology should include support for multimodal data, or information composed of different types of data sources and formats. In the context of healthcare, multimodal data might include data collected from a person’s electronic health record; an image captured by an X-ray; genomic data; and more.
    It is this ability to combine data from multiple sources in multiple formats that can produce more accurate results in reaching a diagnosis. Databases in healthcare should allow hospitals to query multiple heterogeneous data sources in real time, without the need for traditional extract, transform and load (ETL) processes, which can delay insights by “batching” data.
  • Data Remains at Rest – Secondly, the technology should support the querying of diverse data sources without requiring that the data be physically moved or duplicated. Redundant data storage is avoided by retrieving only the necessary information from the source system.
  • Simplified Governance – Thirdly, and relatedly, keeping data in place should drastically simplify regulatory compliance with strict data residency laws. Instead of moving U.S. data to a European-based analytics system, for example, the technology should support running the analysis on U.S. servers, which helps maintain HIPAA compliance.

In a federated query, different organizations return only anonymized results to a central query platform, where researchers can then analyze the combined data. In this way, the database technology supports the dynamic sharing of necessary information without breaching patient privacy and while also adhering to the localized data governance rules concerning the storage and sharing of healthcare data.
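To make the flow concrete, here is a minimal Python sketch of the pattern described above. It is an illustration under stated assumptions, not any particular product's API: the `Site` class, `run_local_query`, `federated_count`, and the `MIN_CELL_SIZE` threshold are all hypothetical names. Anonymization is approximated here by returning only aggregate counts and suppressing small groups, which is one common safeguard but not a complete privacy guarantee.

```python
from collections import Counter
from dataclasses import dataclass

# Assumed threshold: groups smaller than this are withheld so that rare
# values cannot single out individual patients.
MIN_CELL_SIZE = 5


@dataclass
class Site:
    """One participating organization. Patient-level rows stay inside it."""
    name: str
    records: list[dict]

    def run_local_query(self, field: str) -> dict[str, int]:
        # The query executes where the data lives; only aggregate counts
        # that meet the minimum group size are returned.
        counts = Counter(row[field] for row in self.records)
        return {value: n for value, n in counts.items() if n >= MIN_CELL_SIZE}


def federated_count(sites: list[Site], field: str) -> dict[str, int]:
    """Central platform: combine the aggregates returned by each site."""
    combined: Counter = Counter()
    for site in sites:
        combined.update(site.run_local_query(field))
    return dict(combined)


# Hypothetical data for illustration only.
hospital_a = Site("hospital_a", [{"dx": "asthma"}] * 6 + [{"dx": "rare_condition"}] * 2)
hospital_b = Site("hospital_b", [{"dx": "asthma"}] * 5 + [{"dx": "diabetes"}] * 7)
totals = federated_count([hospital_a, hospital_b], "dx")
```

The central platform only ever sees per-site totals. The two `rare_condition` rows never leave `hospital_a` because the group falls below the assumed threshold.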

Federated queries are the future of healthcare data – allowing researchers to access and analyze heterogeneous data sets either within an individual hospital setting or across multiple organizations (like hospitals or research organizations) without physically sharing or transferring that data. As healthcare data volumes continue to expand in the coming years, federated query functionality will grow more critical, and modern database technologies must be up for the job, delivering a trusted shared research environment for disease diagnosis and treatment.