Technical Feasibility & AI Architecture
Clinical Trial Navigator - Product Viability Report
Technical Achievability
The Clinical Trial Navigator is highly achievable using modern low-code and API-first architectures. The core dependency, the ClinicalTrials.gov API (with the AACT database as an aggregated mirror), is public and well-documented. The primary technical challenge is not "can we build it" but "can we ensure medical accuracy and data freshness" at scale.
Leveraging Large Language Models (LLMs) for "Plain Language Summaries" and "Eligibility Parsing" is a proven use case; precedent exists in tools like Antidote.me and various MedTech startups. The main complexity lies in the asynchronous data processing required to index 450,000+ studies daily and in running vector similarity searches efficiently without incurring exorbitant cloud costs.
A functional prototype can be built in 4-6 weeks by a single developer using Supabase and OpenAI. The architecture is standard (CRUD + Vector Search + AI wrapper), meaning there is low R&D risk.
Key Risks:
- Medical Hallucination Risk: AI may misinterpret complex exclusion criteria.
- Data Latency: Syncing 450k+ records daily requires robust background job management.

Mitigations:
- Implement "human-in-the-loop" review for the first 1,000 trial summaries.
- Use pgvector (Postgres) instead of a separate vector DB to reduce complexity.
Recommended Technology Stack
| Layer | Technology | Rationale |
|---|---|---|
| Frontend | Next.js 14 + Tailwind CSS | Next.js offers Server-Side Rendering (crucial for SEO of trial pages) and API routes. Tailwind enables rapid UI development. PWA capabilities are native. |
| Backend | Python (FastAPI) or Node.js | Python is preferred for the AI processing pipeline (better library support). FastAPI is performant and async-ready for handling concurrent API requests. |
| Database | Supabase (Postgres + pgvector) | Supabase handles Auth, DB, and Storage in one platform. pgvector extension allows semantic search without a separate Pinecone bill. |
| AI Layer | OpenAI GPT-4o + LangChain | GPT-4o offers the best reasoning for medical text simplification. LangChain manages the prompt chains for eligibility parsing efficiently. |
| Infrastructure | Vercel (Web) + Railway (Worker) | Vercel provides optimal frontend performance. Railway hosts the Python background worker for data ingestion/processing (avoiding Vercel timeouts). |
System Architecture
Feature Implementation Complexity
| Feature | Complexity | Effort | Dependencies | Notes |
|---|---|---|---|---|
| User Authentication (HIPAA) | Low | 2 days | Supabase Auth | Enable 2FA, strict session management |
| ClinicalTrials.gov Ingestion | Medium | 5 days | Python, Background Jobs | Must handle large XML/JSON dumps efficiently |
| Smart Matching Engine | High | 10 days | OpenAI Embeddings, pgvector | Hybrid search: Semantic (AI) + Metadata (Location/Phase) |
| Plain Language Summaries | Medium | 4 days | OpenAI API | Cache results to save costs; prompt engineering critical |
| FHIR Health Record Import | High | 8 days | Smart/FHIR App Launch | Complex OAuth flows; defer to Phase 2 if needed |
| Logistics & Maps | Low | 3 days | Google Maps API | Standard geolocation + distance calc |
| Trial Tracker Dashboard | Low | 3 days | Frontend State | CRUD operations with status filtering |
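The "Smart Matching Engine" row above calls for a hybrid search: a cheap metadata filter first, then semantic ranking over the survivors. A minimal sketch of that ordering in plain Python (the field names `status`, `states`, and `embedding` are assumptions; in production the filtering and ranking would happen in Postgres/pgvector, not in application code):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def match_trials(patient_embedding, patient_state, trials, top_k=3):
    """Hybrid search: hard metadata filter first, then semantic ranking."""
    # 1. Metadata filter: recruiting trials with a site in the patient's state.
    candidates = [t for t in trials
                  if t["status"] == "RECRUITING" and patient_state in t["states"]]
    # 2. Semantic ranking over the (much smaller) candidate set.
    scored = [(cosine_similarity(patient_embedding, t["embedding"]), t["nct_id"])
              for t in candidates]
    scored.sort(reverse=True)
    return [nct for _, nct in scored[:top_k]]

trials = [
    {"nct_id": "NCT001", "status": "RECRUITING", "states": ["CA"], "embedding": [1.0, 0.0]},
    {"nct_id": "NCT002", "status": "COMPLETED",  "states": ["CA"], "embedding": [1.0, 0.0]},
    {"nct_id": "NCT003", "status": "RECRUITING", "states": ["CA"], "embedding": [0.2, 0.9]},
]
print(match_trials([0.9, 0.1], "CA", trials))  # ['NCT001', 'NCT003']
```

Note that NCT002 never reaches the similarity step despite an identical embedding: filtering first is what keeps the vector search cheap at 450k+ studies.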
AI/ML Implementation Strategy
Core Use Cases
1. Eligibility Parsing: Raw Criteria Text → GPT-4o (Structured Extraction) → JSON (Inclusion/Exclusion Arrays)
2. Semantic Matching: Patient Profile → Text-Embedding-3-Large → Vector Search → Ranked Trial List
3. Patient Briefs: Full Protocol → GPT-4o (Summarization Prompt) → 8th-Grade Reading Level Summary
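For the eligibility-parsing pipeline, the model's JSON output should be validated before it reaches the matcher, so a malformed generation is retried or flagged for human review rather than silently corrupting match data. A minimal sketch of that validation step (the `{"inclusion": [...], "exclusion": [...]}` shape is an assumed contract with the prompt, not an official format; the GPT-4o call itself is omitted):

```python
import json

def validate_criteria(raw: str) -> dict:
    """Validate the LLM's structured-extraction output.

    Raises ValueError on anything malformed, so the caller can retry
    the generation or route the trial to human review.
    """
    data = json.loads(raw)
    for key in ("inclusion", "exclusion"):
        if key not in data or not isinstance(data[key], list):
            raise ValueError(f"missing or non-list field: {key}")
        if not all(isinstance(item, str) and item.strip() for item in data[key]):
            raise ValueError(f"empty or non-string entry in: {key}")
    return {"inclusion": data["inclusion"], "exclusion": data["exclusion"]}

good = '{"inclusion": ["Age 18-65"], "exclusion": ["History of stroke"]}'
print(validate_criteria(good)["exclusion"])  # ['History of stroke']
```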
Quality & Cost Control
Use "Grounding" techniques. Force the AI to answer only based on the provided trial text. If information is missing, instruct it to say "Not specified" rather than inventing details.
Est. Cost: $0.05 - $0.15 per active user/month (heavily dependent on usage).
Strategy: Cache all trial summaries (generate once, serve to millions). Use cheaper models (GPT-3.5-Turbo) for initial filtering, GPT-4o only for final summarization.
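The "generate once, serve to millions" strategy can be sketched as a cache keyed on the trial ID, prompt version, and a hash of the protocol text, so an amended protocol or a revised prompt triggers regeneration while everything else is served for free. (Function and table names here are illustrative; `generate` stands in for the actual OpenAI call, and a production cache would live in a DB table, not a dict.)

```python
import hashlib

_summary_cache = {}  # in production: a Postgres table keyed the same way

def cached_summary(nct_id, protocol_text, generate, prompt_version="v1"):
    """Generate a plain-language summary at most once per (trial, prompt, text)."""
    digest = hashlib.sha256(protocol_text.encode()).hexdigest()[:16]
    key = (nct_id, prompt_version, digest)
    if key not in _summary_cache:
        _summary_cache[key] = generate(protocol_text)  # the expensive LLM call
    return _summary_cache[key]

calls = []
def fake_llm(text):
    calls.append(text)
    return "Plain summary of: " + text[:20]

cached_summary("NCT001", "Protocol text...", fake_llm)
cached_summary("NCT001", "Protocol text...", fake_llm)  # served from cache
print(len(calls))  # 1
```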
Data Requirements & Strategy
Data Sources
- Primary: ClinicalTrials.gov API (AACT Database).
- Volume: ~450k studies, ~2GB raw data.
- Update Freq: Daily (Delta updates).
- Secondary: User Input (Questionnaires), FHIR exports.
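The daily delta update above reduces to a paginated "fetch everything modified since the last sync" loop. A sketch with the HTTP call injected as a callable, which keeps the loop testable offline (the ClinicalTrials.gov v2 API paginates with a page token, but the exact parameter names wrapped inside `fetch_page` are assumptions):

```python
def sync_deltas(fetch_page, upsert, since):
    """Pull studies updated since the last sync, one page at a time.

    `fetch_page(since, page_token)` wraps the HTTP request and returns
    (list_of_studies, next_page_token_or_None); `upsert` writes a batch
    to the database. Returns the number of studies synced.
    """
    count, token = 0, None
    while True:
        studies, token = fetch_page(since, token)
        upsert(studies)
        count += len(studies)
        if token is None:
            return count

# Fake two-page fetcher standing in for the real HTTP client:
pages = {None: ([{"nct": "NCT001"}, {"nct": "NCT002"}], "p2"),
         "p2": ([{"nct": "NCT003"}], None)}
stored = []
total = sync_deltas(lambda since, tok: pages[tok], stored.extend, "2024-01-01")
print(total, len(stored))  # 3 3
```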
Data Schema
- Users: ID, Health Profile (JSONB), Preferences.
- Trials: NCT_ID, Criteria (Structured), Status, Locations.
- Matches: User_ID, Trial_ID, Score, Match_Reasoning.
- Subscriptions: User_ID, Query_Filters, Notification_Status.
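The Trials and Matches entities above can be sketched as application-level models; storing `match_reasoning` alongside the score is what lets the UI explain *why* a trial matched. (Field names follow the schema list; defaults and types are assumptions.)

```python
from dataclasses import dataclass, field

@dataclass
class Trial:
    nct_id: str
    status: str
    inclusion: list[str] = field(default_factory=list)  # structured criteria
    exclusion: list[str] = field(default_factory=list)
    locations: list[str] = field(default_factory=list)

@dataclass
class Match:
    user_id: str
    trial_id: str
    score: float
    match_reasoning: str  # surfaced in the UI so matches are explainable

m = Match(user_id="u1", trial_id="NCT001", score=0.92,
          match_reasoning="Condition and age range align; site within 50 miles.")
print(m.score)  # 0.92
```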
Privacy (HIPAA)
- PII: Encrypt names/email at rest.
- Health Data: Isolate in separate schema; strict access controls.
- BAA: Required for Supabase, Vercel, OpenAI, SendGrid.
- Retention: Allow immediate account/data export/deletion.
Third-Party Integrations
| Service | Purpose | Complexity | Cost | Criticality |
|---|---|---|---|---|
| Supabase | Auth, DB, Storage | Medium | Free → $25/mo | Must-have |
| OpenAI | LLM Processing | Simple API | Usage-based | Must-have |
| ClinicalTrials.gov | Trial Data Source | Complex Parsing | Free | Must-have |
| SendGrid | Transactional Email | Simple API | Free → $20/mo | Must-have |
| Stripe | Payments (Premium) | Medium | 2.9% + 30¢ | High |
| Google Maps API | Logistics/Distance | Simple API | $200 free credit | Nice-to-have |
Scalability Analysis
Bottlenecks
- AI Latency: Generating summaries takes 3-5s. Must be async (background job) to avoid blocking UI.
- Vector Search: As the trial count grows, similarity search slows. Solution: Index optimization (HNSW) and filtering by location first to reduce search space.
- API Rate Limits: ClinicalTrials.gov has rate limits. Must use bulk download (XML) for initial sync, API for daily deltas.
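The HNSW and filter-first points above translate into a pgvector index plus a query that narrows on metadata before ordering by vector distance. A sketch of both statements (table and column names are assumptions; `<=>` is pgvector's cosine-distance operator):

```python
# DDL for an HNSW index on the trial embeddings (pgvector >= 0.5).
CREATE_INDEX = """
CREATE INDEX trials_embedding_hnsw
    ON trials USING hnsw (embedding vector_cosine_ops);
"""

# Filter on cheap metadata first so the nearest-neighbor scan touches fewer rows.
MATCH_QUERY = """
SELECT nct_id, embedding <=> %(patient_vec)s AS distance
FROM trials
WHERE status = 'RECRUITING' AND state = %(state)s
ORDER BY distance
LIMIT 20;
"""

print("USING hnsw" in CREATE_INDEX, "ORDER BY distance" in MATCH_QUERY)  # True True
```

One caveat worth load-testing: highly selective WHERE clauses can cause Postgres to skip the HNSW index, so the filter/index interplay should be checked with `EXPLAIN` on realistic data.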
Cost at Scale
| Users | 1,000 | 10,000 | 100,000 |
|---|---|---|---|
| Hosting (DB/Web) | $50 | $200 | $1,500 |
| AI Processing | $100 | $800 | $6,000 |
| Total Est. Monthly | $150 | $1,000 | $7,500 |
Technology Risks & Mitigations
AI Hallucination / Medical Inaccuracy
HIGH SEVERITY: The LLM might incorrectly interpret an exclusion criterion (e.g., misreading "history of" as "current") or hallucinate a benefit not listed in the protocol. This could lead patients to pursue trials they are ineligible for, endangering health and exposing the company to liability.
Use strict JSON schema validation for AI outputs. Implement a "Confidence Score" for AI matches. Always display the original eligibility criteria alongside the AI summary. Add a "Verify with Doctor" call-to-action on every match.
ClinicalTrials.gov API Changes
MED SEVERITY: The product depends on a government API. Changes in data structure, unscheduled downtime, or rate limit reductions could break the matching engine or leave the database stale.
Build an abstraction layer for data ingestion. Monitor the AACT database (Aggregate Analysis of ClinicalTrials.gov) as a backup source. Implement robust error logging for the daily sync job to catch schema drift immediately.
HIPAA Compliance Violation
MED SEVERITY: Accidental exposure of Protected Health Information (PHI) via logs, unencrypted storage, or sending PII to non-HIPAA-compliant AI models.
Sign BAAs with all vendors before processing data. Use PII stripping (e.g., replace names with "Patient") before sending text to OpenAI. Enable audit logging on all database access.
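The PII-stripping mitigation can be sketched as a scrub pass applied to any text before an LLM call. The regexes below are illustrative only; a production system should use a vetted de-identification tool, since patterns like these will miss names, addresses, and many other PII forms:

```python
import re

# Illustrative patterns only: email, US-style phone, SSN.
PII_PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[EMAIL]"),
    (re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"), "[PHONE]"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
]

def scrub(text: str) -> str:
    """Replace obvious PII with placeholders before any LLM call."""
    for pattern, placeholder in PII_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text

print(scrub("Contact jane.doe@example.com or 555-123-4567."))
# Contact [EMAIL] or [PHONE].
```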
Development Timeline (10 Weeks)
Phase 1: Foundation (Weeks 1-2)
- Setup Supabase project (Auth, Postgres with pgvector).
- Deploy Next.js frontend to Vercel.
- Implement "Questionnaire" UI for user health profile.
- Deliverable: User can sign up and save a health condition.
Phase 2: Data & AI Engine (Weeks 3-5)
- Build Python worker to ingest ClinicalTrials.gov data.
- Implement OpenAI embedding generation for trial descriptions.
- Build "Smart Match" API endpoint (Vector search + Filters).
- Deliverable: System returns relevant trials for a condition.
Phase 3: Features & Polish (Weeks 6-8)
- Develop "Plain Language Summary" generation pipeline.
- Build Trial Tracker Dashboard (Save/Status features).
- Integrate Maps API for logistics.
- Deliverable: Functional MVP with core workflows.
Phase 4: Launch Prep (Weeks 9-10)
- Security audit (HIPAA check).
- Load testing (simulate 100 concurrent users).
- Setup Analytics (PostHog/Mixpanel).
- Deliverable: Production-Ready v1.0.
Team Composition
Solo Founder Feasibility
Possible, but challenging. A solo technical founder can build the MVP using the low-code stack (Supabase + Vercel). However, prompt engineering and medical data validation require a significant time investment.
Required Skills: React/Next.js, Python (for data worker), SQL, Prompt Engineering.