By Bhavish Lekh, Co-founder & CEO

Something I think about a lot is whether the tools we're building at Aurora Analytica are genuinely adding value — or whether the world has moved on and someone can now get the same results from ChatGPT or Claude.

It's a question I take seriously. AI is advancing at an extraordinary pace, and what wasn't possible six months ago often is today. I regularly check where the publicly available platforms have got to, because my goal has always been to build something that fills a real gap — not to add noise to an already saturated market.

So when I noticed that AI chat agents like ChatGPT, Claude, and Gemini now offer connectors to ClinicalTrials.gov, I didn't just take note and move on. I wanted to do the work properly and understand what these tools can genuinely deliver for the people I used to sit alongside — feasibility teams at CROs working under real time pressure.

The question in the back of my mind was a personal one. If I was back working in feasibility and had 5 RFPs land on my desk at once, could I use one of these AI chat agents to increase my productivity? Could it help me save my company money on software subscriptions? Could it genuinely do the heavy lifting that feasibility teams need during a compressed RFP cycle?

I spent time working through a real scenario, documented everything, and wrote up what I found — what worked well and where the limitations are.

The experiment

We connected an AI chat agent to ClinicalTrials.gov and set it a task that any feasibility analyst would recognise: retrieve and analyse all completed NSCLC trials started from January 2020.

The parameters were straightforward:

  • Indication: Non-Small Cell Lung Cancer (NSCLC)
  • Status: Completed
  • Start date: January 2020 onwards
  • Total trials identified: 255

The agent found all 255 trials in about 5–7 minutes of guided conversation. So far, so good.

We then asked it to retrieve the full record for SKYSCRAPER-03 (NCT04513925) — a Phase 3 Hoffmann-La Roche study with 177 sites across 25+ countries. It pulled the full detail, including a table of all 25 US sites with facility names, cities, states, and zip codes. That's a genuine capability, and it's useful.

Then we tried to scale it.

Where it breaks down

Every trial record had to be retrieved individually. One API call per trial. Each call required a prompt from the user, processing time from the agent, and a review step to check the output. There is no bulk retrieval, no batch processing, and no way to parallelise the work.
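For contrast, this is what retrieval looks like when you work against the registry directly rather than through a chat interface. A minimal Python sketch — the endpoint and parameter names reflect my reading of the ClinicalTrials.gov v2 API and should be checked against its documentation before use; the page fetcher is injected so the paging logic stands on its own:

```python
from typing import Callable, Dict, Iterator, Optional

# Public v2 endpoint (assumed; verify against the official API docs).
API_URL = "https://clinicaltrials.gov/api/v2/studies"

def build_params(page_token: Optional[str] = None) -> Dict[str, str]:
    """Query parameters for completed NSCLC trials started Jan 2020 onwards."""
    params = {
        "query.cond": "Non-Small Cell Lung Cancer",
        "filter.overallStatus": "COMPLETED",
        # Start-date range filter (assumed Essie expression syntax).
        "filter.advanced": "AREA[StartDate]RANGE[2020-01-01,MAX]",
        "pageSize": "100",  # many records per request, not one call per trial
    }
    if page_token:
        params["pageToken"] = page_token
    return params

def iter_studies(fetch_page: Callable[[Dict[str, str]], dict]) -> Iterator[dict]:
    """Page through every matching study.

    `fetch_page` performs the HTTP GET and returns the parsed JSON body,
    shaped like {"studies": [...], "nextPageToken": "..."}.
    """
    token: Optional[str] = None
    while True:
        body = fetch_page(build_params(token))
        yield from body.get("studies", [])
        token = body.get("nextPageToken")
        if not token:
            return
```

Wired to a real HTTP client this becomes something like `fetch_page = lambda p: requests.get(API_URL, params=p, timeout=30).json()` — at 100 records per page, 255 trials is roughly three requests, versus 255 individual conversational retrievals.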

We mapped the realistic time cost:

| Activity | Time |
| --- | --- |
| Agent API call per trial | 15–20 seconds |
| User crafting / refining the prompt | 2–5 minutes |
| Agent processing and presenting results | 30–60 seconds |
| User reviewing output for accuracy | 3–5 minutes |
| Combining records into a consolidated list (per 10 trials) | 2–5 minutes |
| **Total per trial (end-to-end)** | **8–12 minutes** |
| **Total for 255 trials** | **34–51 hours** |

That's 34–51 hours of combined user and agent time — assuming no session breaks, no errors, and no re-prompting. All of which happen regularly in practice.

For context, an experienced analyst working directly with the ClinicalTrials.gov data can calculate a patients-per-site-per-month enrolment rate in 2–3 minutes per trial. The AI agent took 8–12 minutes for the same calculation. That's 3–5x slower, not faster.
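The calculation itself is elementary once the inputs are in hand — actual enrolment, the number of recruiting sites, and the enrolment duration. A minimal sketch:

```python
def enrolment_rate(patients: int, sites: int, months: float) -> float:
    """Patients-per-site-per-month (p/s/m), the benchmark feasibility teams quote.

    patients: actual enrolment for the trial
    sites:    number of sites that recruited
    months:   enrolment duration in months
    """
    if sites <= 0 or months <= 0:
        raise ValueError("sites and months must be positive")
    return patients / (sites * months)

# e.g. 240 patients across 20 sites over 12 months -> 1.0 p/s/m
```

The hard part isn't the division; it's getting clean values for those three inputs out of 255 records — which is exactly the step that took 8–12 minutes per trial through the chat interface.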

The context window problem

AI chat agents have a finite context window — the amount of information they can hold in a single session. Each trial record retrieved fills up that window. Large Phase 3 trials with hundreds of sites consume far more context than small Phase 1 studies.

In our testing, the context window was effectively exhausted after 10–20 trials, depending on trial complexity. At that point, the session had to be terminated and restarted from scratch. There is no state persistence between sessions — everything must be manually re-established.

For a 255-trial dataset at 10–20 trials per session, that means somewhere between 13 and 26 separate sessions, with no continuity between them. Any cross-trial aggregate — like a median enrolment rate across the full dataset — cannot be reliably computed because no single session ever sees the complete data.
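The arithmetic behind the exhaustion point is easy to sketch. The figures below are illustrative assumptions, not measurements — a ~128k-token window, and per-record sizes of roughly 6k tokens for a small Phase 1 study up to 12k tokens for a large Phase 3 with a long site list:

```python
def trials_per_session(window_tokens: int, tokens_per_trial: int) -> int:
    """Whole trial records that fit before the context window is exhausted."""
    return window_tokens // tokens_per_trial

WINDOW = 128_000  # assumed context budget, for illustration only

print(trials_per_session(WINDOW, 12_000))  # large Phase 3 records -> 10
print(trials_per_session(WINDOW, 6_000))   # small Phase 1 records -> 21
```

Those assumed sizes land in the same 10–20 trials-per-session range we observed — the exact numbers will vary by model and by how verbose the records are.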

Something else that caught me off guard: I couldn't even export the 255 trials into a simple Excel file through the chat agent. The data existed only within the conversation — there was no way to get it out into a structured format that I could work with, share with colleagues, or attach to a proposal. For anyone who's worked in feasibility, you'll know that a dataset you can't export is a dataset you can't use.
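With direct access to the records — say, fetched straight from the registry API — the export step the chat agent couldn't do is a few lines. A sketch using Python's standard `csv` module, with hypothetical column names rather than the registry's actual schema:

```python
import csv
from typing import Dict, List

def export_trials(records: List[Dict], path: str) -> None:
    """Write trial records to a CSV that opens cleanly in Excel."""
    # Illustrative column names, not the registry's actual field schema.
    fieldnames = ["nct_id", "title", "phase", "enrolment", "sites", "start_date"]
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(records)
```

The point isn't that this code is clever — it's that a structured, shareable file is table stakes for feasibility work, and the conversational interface gave us no route to one.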

The trust gap

There's another limitation that matters a great deal in our industry. AI chat agents operate without a native audit trail. There is no query log, no structured output schema, and no traceable decision record. The user cannot easily verify what query logic the agent applied, what fields it selected, or how it interpreted the data at each step.

For clinical and regulatory contexts — where data provenance and decision traceability directly affect site selection, sponsor proposals, and regulatory submissions — this is a meaningful gap. Any output from an AI chat agent requires independent verification against primary source data before it can be relied upon.

What AI chat agents are good at

This report is not a dismissal of AI chat agents. They are genuinely useful for:

  • Quick trial lookups — finding a specific trial by NCT ID, sponsor, or condition
  • Single trial deep dives — pulling full eligibility criteria, endpoints, and site lists for one study
  • Eligibility matching — checking whether a specific patient profile fits a trial's criteria
  • Sponsor analysis — identifying what a specific company is working on

For targeted, question-driven research tasks, they deliver real value. The limitation is specific: they cannot reliably perform structured analytical workloads at dataset scale.

Where Trial Core™ fits in

I should be upfront — I obviously have a perspective here, because we build a product in this space. But that's also exactly why I wanted to do this assessment properly. If AI chat agents could deliver what feasibility teams need at scale, I'd want to know that sooner rather than later.

What I came away with is that the limitations I found aren't really about the AI models themselves — those are genuinely impressive and getting better all the time. The constraints are architectural. A chat agent is a conversational interface built on top of a registry API. Trial Core™ is a purpose-built data pipeline. They're designed for different things:

  • Direct data pipeline connecting 7+ clinical trial registries and data sources beyond ClinicalTrials.gov
  • Aggregations and analytics built in at the infrastructure level — not reconstructed conversationally from individual API calls
  • Distributed AI architecture that processes 550K+ trial records at scale, with proprietary enrichment layers including enrolment benchmarks, site performance scores, and investigator track records
  • Complete, auditable datasets with full traceability — every output can be traced to its source

Where a conversational AI chat agent requires 34–51 hours to retrieve and validate data for 255 trials, Trial Core™ delivers the same analytical output in minutes. Not because it's faster at the same task — but because the task itself is architected differently.

| Capability | AI Chat Agents | Trial Core™ |
| --- | --- | --- |
| Data retrieval for 255 trials | 34–51 hours (sequential) | Minutes (pipeline) |
| Aggregation capability | Manual, per-session | Built-in, native |
| Cross-trial analytics (e.g. p/s/m) | 8–12 min per trial | Instant, dataset-wide |
| Session state persistence | None | Persistent |
| Audit trail | None | Full traceability |
| Multi-registry data sources | Single registry per connector | Multiple registries integrated |
| Scalability | 10–20 trials per session | Unlimited |

Back to the 5 RFPs on my desk

I keep coming back to that question. If I was sitting in a feasibility team tomorrow, with 5 RFPs landing at once and a week to turn them around, what would I actually reach for?

For quick lookups — finding a specific trial, checking eligibility criteria, getting a sense of what a sponsor is running — I'd absolutely use a chat agent. They're a genuine time saver for that kind of work, and I think every feasibility team should have access to them.

But for the core analytical work — pulling enrolment rates across hundreds of trials, building site shortlists, calculating p/s/m benchmarks, producing the data that actually goes into the proposal — I'd want something built for that scale. Not because the chat agents aren't impressive, but because the work itself needs a different kind of infrastructure underneath it.

That's really why we built Trial Core™. Not to compete with chat agents, but to handle the part of the workflow they weren't designed for. I hope this report is useful to anyone thinking through the same questions — and as always, I'm happy to talk through it if you'd like to compare notes.

For further information or a demonstration of Aurora Analytica's clinical trial intelligence capabilities, contact bhav@aurora-analytica.com or book a demo.

Bhavish Lekh is the Co-founder and CEO of Aurora Analytica. Connect with him on LinkedIn.