Unstructured Data Holds the Key to Better Understanding Mental Health

By David Talby, CEO, John Snow Labs

Mental health remains one of the most complex and under-diagnosed aspects of patient care. Although digital health tools and wellness apps have flooded the market since the COVID-19 pandemic, the state of mental health in the US has quietly worsened, despite what public discussion of the topic might suggest.

Nearly 60 million (23%) American adults experienced a mental illness in the past year (Mental Health America). With Mental Health Awareness Month in full swing, it's time to confront an uncomfortable truth about this population: we're missing the health data needed to make important diagnoses and deliver appropriate care. Fortunately, artificial intelligence (AI) may be able to change this.

The Data We’re Not Seeing

A new study conducted by Oracle Health, John Snow Labs, and the Children's Hospital of Orange County analyzed data from more than 109,000 patients using Oracle's Real-World Data platform, linked to national claims databases. The research compared how well neuropsychiatric symptoms (such as anxiety, memory loss, agitation, and mood disorders) were detected using structured electronic health record (EHR) data alone versus structured data augmented with unstructured clinical notes.

The findings were sobering. Key mental health events were routinely overlooked when researchers relied solely on structured data. In fact, the number of identified suicide and self-harm events doubled once unstructured notes were included. Symptoms like irritability and hallucinations were often recorded only in narrative form, highlighting how much is lost when we rely on codes alone.

Structured Data Isn’t Telling the Whole Story

Structured EHR fields—think diagnosis codes or medication lists—serve as the skeleton of a patient’s health history. But when it comes to mental health, it’s often the nuances documented in unstructured notes that reveal the bigger picture.

These clinical narratives capture patient behavior, family input, mood fluctuations, and cognitive concerns that may never make it into a billing code. Pediatricians and general practitioners are often reluctant to assign a definitive mental health diagnosis, whether because of stigma, uncertainty, or scope of practice, so key symptoms frequently remain in the margins of the record.

If we want to truly understand the state of mental health, we must capture the full patient journey. Structured data is a necessary part of that, but on its own it is insufficient. The details that clinicians jot down during encounters, and the way they describe subtle symptom progression, contain vital insights we can't afford to ignore.

AI to the Rescue

Natural Language Processing (NLP) and Large Language Models (LLMs) are two areas that hold major promise for the mental health space. These AI technologies can analyze unstructured text at scale, surfacing mental health indicators that traditional tools miss. In the study described above, incorporating unstructured notes increased the number of identified outcome events by 20%, a significant gain.

That said, building these systems is no small feat. Training AI to extract clinically relevant insights requires annotated data, clinician oversight, and significant compute resources. And the reality is, many health systems lack the internal capacity to manage such initiatives on their own.

The challenge is compounded by the complexity of mental health coding itself. Unlike a broken bone, which comes with a clear diagnostic code and treatment path, mental health symptoms often fall into gray areas. Without precise language or billing incentives to document these events consistently, it’s no surprise we’re not capturing the full spectrum.

Hybrid Models Are the Key

The most promising approach lies in hybrid AI architectures that combine the precision of rule-based NLP with the flexibility and reasoning capabilities of LLMs. These pipelines can extract both explicitly stated symptoms and more subtly implied conditions, achieving near-expert-level accuracy without relying solely on structured fields.

In practice, that means better detection of at-risk patients, faster insights for researchers, and more informed care pathways. LLMs fine-tuned on medical language can flag early signs of distress even when words like "depression" or "anxiety" are never explicitly mentioned. This approach allows AI to scale across healthcare settings, from academic medical centers to community clinics, and to adapt to new data with minimal retraining.

What Now?

One out of every two people in the world is expected to develop a mental health disorder in their lifetime, according to a large-scale study co-led by researchers from Harvard Medical School and the University of Queensland. While awareness is increasing, so are the number and severity of mental health events. With half of the world's population likely to be affected, we can't keep treating mental health the way we always have.

Fortunately, we now have the tools to close the mental health data gap. By combining structured and unstructured data via advances in AI, we can improve the accuracy and timeliness of diagnosis, implement more effective interventions, and ultimately, support better patient outcomes.
