By Vish Srivastava, Co-Founder and CEO, Century Health
Specialty practices are engines of clinical insight. Every patient encounter generates information that cannot be found in clinical trials: real-world treatment patterns, longitudinal outcomes across diverse populations, and responses to therapies in patients who don’t qualify for the studies that brought those therapies to market.
Clinical trials, by design, enroll relatively homogeneous patient populations under tightly controlled conditions. The patients seen in routine care are often more complex, with comorbidities, varied disease trajectories, and inconsistent treatment adherence that fall outside those parameters. In one Stanford observational study of treatment-naive patients with neovascular age-related macular degeneration, nearly 45% would not have met the eligibility criteria for the clinical trial whose findings were being applied to their care. This gap shapes how treatments are understood in real-world care, and specialty practices already hold the data to close it.
The data needed to advance the field is already sitting in clinical records, locked in a form that cannot be queried.
The clinical variables that matter most for research and benchmarking, including imaging findings, treatment sequences, and outcomes tracked across months or years, rarely live in clean, queryable fields. They are embedded in free-text notes, PDF imaging reports, and documentation systems designed for specialized clinical workflows rather than for analysis. Structured diagnosis codes capture only a fraction of the picture. Extracting the rest manually is not feasible at scale for practices focused on delivering care.
Ophthalmology makes this challenge especially clear. It is a specialty that generates rich longitudinal data: optical coherence tomography (OCT) measurements, visual acuity tracked over time, and detailed treatment histories. Yet a 2023 review published in Ophthalmology Science identified the lack of harmonized data structures and inconsistent data labeling across ophthalmology EHR systems as a primary barrier to large-scale research in the specialty, noting that even within a single institution, the same clinical data may be stored in different locations and at different levels of detail. The same pattern holds across many specialties.
Advances in AI are starting to address this. Natural language processing (NLP) and machine learning (ML) applied to clinical records can now extract and structure the variables buried in unstructured data at a scale and speed that were not previously achievable. Before these tools existed, practices that wanted structured insight from their records faced a difficult choice: invest in costly manual chart abstraction, where trained staff reviewed notes one by one to pull relevant variables, or simply leave the data dormant. Manual abstraction is slow, prone to inconsistency, and impossible to sustain at the scale needed for meaningful analysis. A practice seeing hundreds of patients per week could spend months extracting data from a single year of records, and the output would still reflect the judgment and attention of whoever did the reviewing. For most specialty practices, it was not a realistic path.
What changed is the adaptation of AI models to the specific demands of clinical data. Healthcare records carry requirements that general-purpose language models were not designed to meet: terminology varies by specialty, abbreviations are dense and context-dependent, and the stakes of misclassification are high. Recent advances in clinical NLP have produced models trained on large volumes of medical text that can recognize disease-specific variables such as progression markers, treatment response indicators, and clinically meaningful changes in a condition over time. AI platforms built on these models work with the records clinicians already create, parsing imaging reports, abstracting clinical notes, normalizing measurements, and linking encounters over time to reconstruct what actually happened in care, all without requiring clinicians to change how they document. Quality assurance (QA) and validation matter just as much: establishing that extracted data matches what a clinician would extract, and building audit trails that allow practices and research partners to verify what was extracted and how.
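As an illustration of those steps, the following minimal Python sketch extracts visual acuity and central subfield thickness (CST) from free-text notes, normalizes Snellen fractions to logMAR, links encounters into a per-patient timeline, and scores field-level agreement against a clinician's manual abstraction. It is a sketch under stated assumptions, not how any particular platform works: simple regular expressions stand in for a trained clinical NLP model, and the note phrasing, field names, and helper functions (`extract_encounter`, `link_encounters`, `field_agreement`) are hypothetical.

```python
import math
import re
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional

# Hypothetical regex patterns standing in for a trained clinical NLP model.
# They assume notes contain phrases like "VA OD 20/40" and "CST 312 um".
VA_PATTERN = re.compile(r"\bVA\s*(?:OD|OS)?\s*:?\s*(\d+)\s*/\s*(\d+)", re.IGNORECASE)
CST_PATTERN = re.compile(r"\bCST\s*:?\s*(\d+)\s*(?:um|µm|microns?)\b", re.IGNORECASE)


@dataclass
class Encounter:
    patient_id: str
    visit_date: date
    logmar: Optional[float]  # visual acuity on the logMAR scale (lower is better)
    cst_um: Optional[int]    # central subfield thickness in micrometres


def snellen_to_logmar(numerator: float, denominator: float) -> float:
    """Convert a Snellen fraction to logMAR, e.g. 20/20 -> 0.0, 20/200 -> 1.0."""
    return math.log10(denominator / numerator)


def extract_encounter(patient_id: str, visit_date: date, note: str) -> Encounter:
    """Pull visual acuity and CST out of one free-text note; missing values stay None."""
    va = VA_PATTERN.search(note)
    cst = CST_PATTERN.search(note)
    logmar = round(snellen_to_logmar(float(va.group(1)), float(va.group(2))), 2) if va else None
    cst_um = int(cst.group(1)) if cst else None
    return Encounter(patient_id, visit_date, logmar, cst_um)


def link_encounters(encounters: List[Encounter]) -> Dict[str, List[Encounter]]:
    """Group encounters by patient and order each timeline by visit date."""
    timelines: Dict[str, List[Encounter]] = defaultdict(list)
    for enc in encounters:
        timelines[enc.patient_id].append(enc)
    for visits in timelines.values():
        visits.sort(key=lambda e: e.visit_date)
    return dict(timelines)


def field_agreement(extracted: List[Encounter], abstracted: List[dict], fields: List[str]) -> float:
    """Share of fields where automated extraction exactly matches clinician abstraction."""
    pairs = [(getattr(auto, f), manual.get(f))
             for auto, manual in zip(extracted, abstracted) for f in fields]
    if not pairs:
        return float("nan")
    return sum(a == m for a, m in pairs) / len(pairs)


# Hypothetical example notes for one patient, deliberately out of date order.
notes = [
    ("pt-001", date(2024, 3, 1), "VA OD 20/40. OCT: CST 312 um, intraretinal fluid present."),
    ("pt-001", date(2024, 1, 15), "VA OD 20/80. CST 405 um."),
]
encounters = [extract_encounter(*n) for n in notes]
timeline = link_encounters(encounters)["pt-001"]
print([(e.visit_date.isoformat(), e.logmar, e.cst_um) for e in timeline])
# -> [('2024-01-15', 0.6, 405), ('2024-03-01', 0.3, 312)]

# Compare against a (hypothetical) clinician's manual abstraction of the same notes.
manual = [{"logmar": 0.3, "cst_um": 312}, {"logmar": 0.6, "cst_um": 400}]
print(field_agreement(encounters, manual, ["logmar", "cst_um"]))
# -> 0.75
```

Exact matching is the strictest agreement measure; a real validation effort would likely set tolerances for numeric fields and report agreement separately for each variable.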
For practices, organizing this data creates a clearer view of their own patient populations: how patients progress, how they respond to treatment, and how outcomes vary across different clinical profiles. In practical terms, this can mean identifying which patient subgroups respond best to a given therapy, tracking how outcomes shift when treatment intervals change, or benchmarking adherence and follow-up rates against broader peer data. For practice leadership, it can surface patterns that are difficult to see in day-to-day operations: which conditions are being managed according to guidelines, where care gaps exist, and how the practice's outcomes compare to published standards. It also allows for more meaningful comparisons with clinical trial data, helping to identify where real-world populations diverge and what that may mean for treatment decisions.
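To make the subgroup comparison concrete, here is a minimal Python sketch that computes the mean change in visual acuity (logMAR) from first to last visit for each subgroup, such as patients grouped by treatment interval. The input shape, the group labels (`q4w`, `q8w`), and the values are hypothetical. The result is a purely descriptive summary; comparing groups to inform treatment decisions would also require accounting for differences in baseline characteristics.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def mean_change_by_group(patients: Iterable[Tuple[str, List[float]]]) -> Dict[str, float]:
    """Mean change in logMAR from first to last visit, per subgroup.

    Each patient is (group_label, logMAR values in visit order). Lower logMAR
    means better vision, so a negative change indicates improvement. Patients
    with fewer than two visits are skipped because they have no follow-up.
    """
    changes: Dict[str, List[float]] = defaultdict(list)
    for group, values in patients:
        if len(values) >= 2:
            changes[group].append(values[-1] - values[0])
    return {group: round(mean(deltas), 3) for group, deltas in changes.items()}


# Hypothetical patients grouped by hypothetical injection-interval labels.
patients = [
    ("q4w", [0.6, 0.4, 0.3]),
    ("q4w", [0.5, 0.5]),
    ("q8w", [0.7, 0.6]),
    ("q8w", [0.4]),  # single visit: excluded from the summary
]
print(mean_change_by_group(patients))
# -> {'q4w': -0.15, 'q8w': -0.1}
```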
The value of that data extends beyond the practice itself. It enables practices to contribute to life sciences research with real-world evidence that reflects how care is actually delivered. Such evidence is increasingly used to understand treatment effectiveness, monitor safety after approval, and inform the design of future studies. It can also support participation in funded research and provide the outcomes-based evidence that value-based care models require.
Regulatory direction reinforces why this matters. In July 2024, the FDA finalized guidance on the use of EHR and medical claims data to support regulatory decision-making for drugs and biological products, a signal that real-world evidence generated from routine clinical care is increasingly central to drug development. The data produced in specialty practices aligns closely with this approach.
Practices that invest in structuring and activating their clinical data will be better positioned to understand their own outcomes, attract research partnerships, and shape how treatments are evaluated across their field. Practices generate this data every day. The only question is whether they make it usable.