Optimizing Life Sciences Unstructured Data in the Cloud

By Krishna Subramanian, President & COO, Komprise
Twitter: @Komprise

The digitization of healthcare, driven by the explosive growth of medical devices and point-of-care software in recent years, has been a game changer. Effective COVID-19 vaccines, for example, were developed at unprecedented speed, saving millions of lives and allowing the global economy to recover faster.

More than at any other time in medical history, new technologies combined with industry collaboration are delivering groundbreaking opportunities for more accurate clinical decision-making and faster development of treatments for life-threatening conditions:

  • Major research institutions came together to develop and release COVID-19 vaccines at record speed, which wouldn't have been possible without cloud and digital tools for rapid research, testing, analysis, and communications.
  • Digital microscopy gives technicians the ability to view an image of the sample immediately on a monitor for real-time analysis.
  • The Internet of Medical Things (IoMT) enables remote monitoring and diagnosis through wearable biosensors that monitor medication use and deliver alerts on chronic conditions such as emphysema, multiple sclerosis, and diabetes.

The flip side of medical technology innovation is that these devices and sensors generate massive amounts of unstructured data, adding to the overall healthcare data deluge. Roughly 30% of the world's data volume is generated by the healthcare industry, according to RBC Capital Markets. Collecting and analyzing quality data is vital to healthcare, but the rising volume of unstructured data (data that doesn't fit neatly into the rows and columns of spreadsheets and databases) is stretching IT budgets. It's imperative to implement the right strategies for enterprise data storage, data governance, data security and data analytics initiatives. In this article, we address the unique unstructured data management challenges facing life sciences organizations today and outline some tactics for tackling them.

Life sciences: patient care innovation and competitive gain in the cloud

One outcome of unstructured data growth in life sciences is accelerating data center consolidation and migration to the public cloud. Sixty percent of pharma executives have already made changes or have a plan in place to invest in cloud-based services to support their digital transformation efforts, according to PwC.

The lion's share of clinical research data is unstructured (images, sensor data, documents), and managing it appropriately requires new thinking, both because of its size and because it doesn't work well with traditional data analytics tools. This calls for abandoning one-size-fits-all strategies for storing data and avoiding lift-and-shift migrations to the cloud. By doing so, organizations can be cost-effective and strategic with cloud migrations and discover new value in the data they already have.

Leading life-sciences companies are investing more in the cloud with an eye to efficiency and economic gain. McKinsey reports: "When analyzing cloud migrations in Fortune 500 pharma and medtech companies, McKinsey found that, on average, companies stood to gain about $10 billion to $15 billion in improved 2030 EBITDA run rates from the rejuvenation of IT, but roughly twice as much—about $25 billion to $30 billion—from business innovation." Moderna's Director of Informatics has described how running ML processes in AWS is speeding time to market for life-saving therapeutics.

Cloud-based artificial intelligence (AI) is particularly exciting. A Deloitte report outlined several use cases, including integrating data and improving workflows for clinical trials: "They can even use AI to generate insights from past and current trials to inform and improve future trials."

The major cloud providers are responding by investing in healthcare-specific analytics platforms such as Amazon HealthLake and Azure Health Data Services, which will play a significant role both in delivering new therapies to market faster and in driving down the cost of research.

Addressing unstructured data management needs in life sciences

Yet the question is: which data sets should move to the cloud and how? Answering this starts with taking a closer look at the traditional life-sciences data infrastructure. Pharmaceutical companies, biotechs and research institutions frequently face the following data management issues, which drive up costs and impede R&D activities:

  • Data silos hampering collaboration across teams and departments as well as audits.
  • Data visibility issues with data spread across many different hybrid IT environments and disparate applications.
  • Difficulty searching, securely accessing and using data exported into cloud-based data lakes and other new data platforms.
  • Continually changing regulations that affect data practices.
  • Too much time (at least 50%, according to IDC) spent on data preparation and deployment.

Here are the leading considerations for managing life sciences data in the cloud:

Analytics and segmentation: Before moving data to the cloud or buying more storage, organizations should understand their data across all hybrid storage, along with its usage and access patterns, so they can place it optimally. For example, a company may want to move clinical trial data to the cloud after the trial has concluded. This meets compliance requirements to retain the data for the required period without clogging up expensive on-premises storage, which should be reserved for active, regularly accessed data. Organizations may also need to locate files containing PII and move them to highly secure, immutable storage such as object storage in the cloud. Analyzing and right-placing data can save significantly on storage costs, freeing up budget for R&D projects while ensuring that critical workloads have the appropriate protection and performance. The Komprise 2021 State of Unstructured Data Management Report found that investing in analytics tools was the highest priority (45%), ahead of buying more cloud or on-premises storage or modernizing backups.
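To make "understanding usage and access patterns" concrete, here is a minimal Python sketch, assuming the data sits on a locally mounted file system and that last-access time (`st_atime`) is a usable proxy for activity. The 30- and 365-day thresholds and the hot/warm/cold tier names are hypothetical; this is not how Komprise or any particular product performs its analysis.

```python
import os
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical tier thresholds, in days since last access.
HOT_DAYS = 30
WARM_DAYS = 365
SECONDS_PER_DAY = 86400


def classify_age(days_since_access: float) -> str:
    """Map days since last access to a hypothetical tier name."""
    if days_since_access <= HOT_DAYS:
        return "hot"
    if days_since_access <= WARM_DAYS:
        return "warm"
    return "cold"


@dataclass
class TierReport:
    """Total bytes and file paths per access-age tier."""
    bytes_by_tier: Dict[str, int] = field(
        default_factory=lambda: {"hot": 0, "warm": 0, "cold": 0})
    files_by_tier: Dict[str, List[str]] = field(
        default_factory=lambda: {"hot": [], "warm": [], "cold": []})


def analyze_tree(root: str, now: Optional[float] = None) -> TierReport:
    """Walk a directory tree and bucket every file by last-access age."""
    now = time.time() if now is None else now
    report = TierReport()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                # Broken symlinks or files removed mid-walk are skipped.
                continue
            # st_atime can be stale on volumes mounted with noatime or
            # relatime; st_mtime is a common, more conservative fallback.
            days = (now - st.st_atime) / SECONDS_PER_DAY
            tier = classify_age(days)
            report.bytes_by_tier[tier] += st.st_size
            report.files_by_tier[tier].append(path)
    return report
```

Summing bytes per tier before a migration gives a rough sense of how much capacity could move off primary storage. The weak point is the access-time signal itself, which is why the choice between `st_atime` and `st_mtime` deserves a deliberate decision per volume.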

Data tagging for context and search: With scalable data lake and data warehouse technologies now commonplace, IT teams can move data into cloud services where data scientists can run machine learning and other processes on it. The trick is finding the right data. When data moves from clinical applications and instruments into cloud storage, its context is lost, which makes culling the right data for a research study painstakingly slow. A data management platform that supports tagging can apply metadata such as project, disease type, instrument type and demographics to the files. That way, when files are moved into the cloud, researchers can search on keywords and find what they need without manual digging. Tags also make it efficient to search and enrich data for queries and to automate workflows. According to the Komprise survey, tagging data for future use and enabling data lakes was a top goal for unstructured data management.
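The tagging idea can be sketched as a toy in-memory catalog that attaches key/value metadata to file paths and answers keyword queries through an inverted index. The class name `TagCatalog` and tag keys such as `project`, `disease` and `instrument` are hypothetical illustrations; a real platform would persist tags and keep them attached as files move into cloud storage.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple


class TagCatalog:
    """Toy in-memory metadata catalog (hypothetical, for illustration).

    Maps each file path to key/value tags and keeps an inverted index
    so keyword queries don't have to scan every file.
    """

    def __init__(self) -> None:
        self._tags: Dict[str, Dict[str, str]] = {}
        # (tag key, lowercased value) -> paths carrying that tag
        self._index: Dict[Tuple[str, str], Set[str]] = defaultdict(set)

    def tag(self, path: str, **tags: str) -> None:
        """Attach tags to a file, replacing any previous value per key."""
        current = self._tags.setdefault(path, {})
        for key, value in tags.items():
            old = current.get(key)
            if old is not None:
                # Drop the stale index entry before recording the new value.
                self._index[(key, old.lower())].discard(path)
            current[key] = value
            self._index[(key, value.lower())].add(path)

    def search(self, **criteria: str) -> List[str]:
        """Return sorted paths matching ALL criteria; values match case-insensitively."""
        result: Optional[Set[str]] = None
        for key, value in criteria.items():
            hits = self._index.get((key, value.lower()), set())
            result = set(hits) if result is None else result & hits
        # No criteria means "everything in the catalog".
        return sorted(self._tags if result is None else result)
```

A researcher could then call something like `catalog.search(project="trial42", instrument="microscope")` instead of browsing folders. Matching values case-insensitively is a deliberate choice here, since tags entered by different teams rarely agree on capitalization.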

Driving collaboration between research data scientists and central IT: Research data specialists and IT directors will benefit from working more closely together. The former know what scientists are looking for, while the latter understand cloud infrastructure and data analytics implementations and can therefore build the best technical foundation to meet these research requirements. The end goal is to make it easier for everyday analysts to search across distributed data sets and find what they need, so that IT can continually move the right data to the right platforms for analysis through policy-based automation.
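The article doesn't specify how policy-based automation works. One common pattern is an ordered list of rules evaluated first-match-wins, and the sketch below assumes that pattern. The rule names, the `contains_pii` and `trial_status` tags, and the destination labels (`cloud-object-locked`, `cloud-archive`, and so on) are all hypothetical, not any real product's configuration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class FileInfo:
    """Minimal per-file facts a placement policy might look at."""
    path: str
    days_since_access: float
    tags: Dict[str, str] = field(default_factory=dict)


@dataclass
class Policy:
    """A named rule: if `matches` returns True, send the file to `target`."""
    name: str
    matches: Callable[[FileInfo], bool]
    target: str  # hypothetical destination label


# Hypothetical rules, evaluated in order; the first match wins.
POLICIES: List[Policy] = [
    Policy("pii-to-locked-storage",
           lambda f: f.tags.get("contains_pii") == "yes", "cloud-object-locked"),
    Policy("closed-trials-to-archive",
           lambda f: f.tags.get("trial_status") == "closed", "cloud-archive"),
    Policy("cold-to-object", lambda f: f.days_since_access > 365, "cloud-object"),
]


def plan_moves(files: Iterable[FileInfo], policies: List[Policy],
               default_target: str = "on-prem-primary") -> Dict[str, str]:
    """Return {path: target}; files matching no rule stay on the default tier."""
    plan: Dict[str, str] = {}
    for f in files:
        # next() over a generator stops at the first matching policy.
        plan[f.path] = next(
            (p.target for p in policies if p.matches(f)), default_target)
    return plan
```

Re-running `plan_moves` on a schedule and handing the resulting plan to a data mover is one way to make placement continual. Putting the PII rule first ensures sensitive files land in locked storage even when a later rule would also match.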

Ransomware defense: Privacy and security are high priorities for life sciences organizations, yet fewer than half (45%) have active ransomware protections in place to prevent breaches and data loss, according to research by Egnyte. Running trials, conducting research and managing compliance activities require a keen understanding of security capabilities in the cloud. One strategy is to move cold data to immutable object storage, such as Amazon S3 with Object Lock enabled, and remove it from active storage and backups. This allows organizations to create a logically isolated recovery copy while cutting storage and backup costs by up to 80%.
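As a sketch of the cold-data-to-object-lock strategy, the function below uploads one file with an S3 Object Lock retention date and optionally deletes the local copy afterward. It assumes a boto3-style S3 client, passed in as a parameter so the sketch runs without AWS access, and a bucket that was created with Object Lock enabled. The function name and any bucket or key names are hypothetical; `ObjectLockMode`, `ObjectLockRetainUntilDate` and `ChecksumAlgorithm` are parameters of boto3's `put_object`. Nothing here models the cost savings cited above.

```python
import os
from datetime import datetime, timedelta, timezone
from typing import Any, Optional

VALID_MODES = ("GOVERNANCE", "COMPLIANCE")


def archive_with_object_lock(s3_client: Any, bucket: str, path: str, key: str,
                             retention_days: int, mode: str = "COMPLIANCE",
                             delete_local: bool = False,
                             now: Optional[datetime] = None) -> datetime:
    """Upload a cold file to an Object Lock-enabled S3 bucket with a retention date.

    `s3_client` is expected to behave like boto3.client("s3"); the bucket must
    have been created with Object Lock enabled. Returns the retain-until date.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    now = datetime.now(timezone.utc) if now is None else now
    retain_until = now + timedelta(days=retention_days)
    with open(path, "rb") as body:
        # Object Lock uploads need an integrity checksum (Content-MD5 or an
        # SDK checksum); ChecksumAlgorithm asks the SDK to compute one.
        s3_client.put_object(
            Bucket=bucket,
            Key=key,
            Body=body,
            ObjectLockMode=mode,
            ObjectLockRetainUntilDate=retain_until,
            ChecksumAlgorithm="SHA256",
        )
    # put_object raises on failure, so reaching this line means the upload
    # succeeded and the local copy can leave active storage.
    if delete_local:
        os.remove(path)
    return retain_until
```

With real credentials you would pass `boto3.client("s3")` as `s3_client`. In `COMPLIANCE` mode no user, including the account root user, can delete the locked object version or shorten its retention before the date passes. `GOVERNANCE` mode lets users holding the `s3:BypassGovernanceRetention` permission override the lock, which is more flexible but weaker against a compromised administrator account.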

Life sciences organizations today have the tools and technologies to bring new and better products to market faster than ever before. In lockstep with adopting cloud and machine learning platforms, there's a pressing need to modernize data management practices. The difficulty of wrangling unstructured data, in particular, is slowing efforts to leverage new types of clinical data for research and is creating a blind spot in data analytics. Rethinking traditional data management practices to fully leverage clinical and laboratory images, patient files including telehealth video, sensor data from wearables, research files and more is a must for life sciences leaders.