06. MVP Roadmap & Feature Prioritization
Strategic execution plan for BenchmarkHub: From prototype to platform.
MVP Definition: The "GitHub for LLM Evaluations"
Core Value Proposition: A web-based tool allowing engineers to define a specific task, run it against 3-5 top models instantly using their own API keys, and visualize the winner based on correctness and cost.
Must-Have Features
- Custom Benchmark Builder (Input/Output pairs)
- Multi-model Runner (via OpenRouter integration)
- "LLM-as-a-Judge" Evaluation Logic
- Public Benchmark Library (Read/Fork)
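The Custom Benchmark Builder and the evaluation flow above could be sketched as a minimal data model. The type names, the `forkedFrom` field, and the exact-match scorer are illustrative assumptions, not a committed schema:

```typescript
// Hypothetical shape of a custom benchmark: input/output pairs plus
// metadata supporting the public library's read/fork workflow.
interface BenchmarkCase {
  input: string;    // prompt sent to each model
  expected: string; // reference answer for scoring
}

interface Benchmark {
  name: string;
  forkedFrom?: string; // slug of the public benchmark this was forked from
  cases: BenchmarkCase[];
}

// Cheap exact-match scorer, usable before the LLM judge is invoked.
// Returns the fraction of cases where the model output matches exactly.
function exactMatchScore(bench: Benchmark, outputs: string[]): number {
  if (bench.cases.length === 0) return 0;
  const hits = bench.cases.filter(
    (c, i) => outputs[i]?.trim() === c.expected.trim(),
  ).length;
  return hits / bench.cases.length;
}
```

In practice the exact-match pass can filter obvious wins/losses so the LLM judge only sees ambiguous cases.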
NOT in MVP
- Native billing/credit system (BYO API Key only)
- CI/CD Pipeline integrations
- Team workspaces & RBAC
- Human-in-the-loop rating interface
Feature Prioritization Matrix
Strategic plotting of features to maximize ROI in Phase 1.
Phased Development Roadmap
Timeline & Milestones
| Phase | Timeline |
| --- | --- |
| Setup | Wk 1-2 |
| Core Build | Wk 3-6 |
| Beta/Launch | Wk 7-8 |
- M1 (Wk 2): Runner API connects to OpenRouter.
- M2 (Wk 6): 20 internal benchmarks running.
- M3 (Wk 8): Public Beta Launch.
"Do More With Less" Stack
Leveraging existing APIs to cut engineering time by 60%.
| Layer | Service | Time Saved |
| --- | --- | --- |
| LLM Aggregation | OpenRouter | Saves 3 wks |
| Auth & User Mgmt | Clerk | Saves 1 wk |
| Vector DB | Supabase | Saves 5 days |
| UI Components | shadcn/ui | Saves 2 wks |
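The Multi-model Runner reduces to fanning a prompt out over OpenRouter's OpenAI-compatible chat completions endpoint. A rough sketch, assuming the user's BYO key is passed in and the model ids are illustrative:

```typescript
// Sketch of a multi-model run via OpenRouter's OpenAI-compatible API.
const OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions";

// Builds the OpenAI-style request body for one model.
function buildRequest(model: string, prompt: string) {
  return {
    model, // e.g. "openai/gpt-4o-mini", "anthropic/claude-3.5-sonnet"
    messages: [{ role: "user", content: prompt }],
  };
}

// Runs the same prompt against several models in parallel using the
// user-supplied key; returns each model's first completion.
async function runAcrossModels(apiKey: string, models: string[], prompt: string) {
  return Promise.all(
    models.map(async (model) => {
      const res = await fetch(OPENROUTER_URL, {
        method: "POST",
        headers: {
          Authorization: `Bearer ${apiKey}`, // BYO key, never stored server-side
          "Content-Type": "application/json",
        },
        body: JSON.stringify(buildRequest(model, prompt)),
      });
      const data = await res.json();
      return { model, output: data.choices?.[0]?.message?.content ?? "" };
    }),
  );
}
```

Because OpenRouter mirrors the OpenAI request shape, one client covers every aggregated provider, which is where most of the saved three weeks comes from.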
⚠️ Critical Roadmap Risks
Risk: API Key Liability
Users may hesitate to paste API keys into a third-party tool for fear of leakage.
Mitigation: Keys stored only in browser local storage for MVP (Client-side execution where possible).
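The client-side mitigation could look like the sketch below. The storage key name is illustrative, and the `KeyStore` interface (shaped like the browser's `Storage`) is an assumption introduced to keep the logic testable outside a browser:

```typescript
// Sketch: the user's API key lives only in browser local storage and is
// attached to outgoing requests client-side; it is never sent to our backend.
const KEY_NAME = "benchmarkhub_openrouter_key"; // illustrative name

// Storage-shaped interface so this works with window.localStorage in the
// browser and with an in-memory stub in tests.
interface KeyStore {
  getItem(name: string): string | null;
  setItem(name: string, value: string): void;
  removeItem(name: string): void;
}

function saveApiKey(store: KeyStore, key: string): void {
  store.setItem(KEY_NAME, key);
}

function loadApiKey(store: KeyStore): string | null {
  return store.getItem(KEY_NAME);
}

function clearApiKey(store: KeyStore): void {
  store.removeItem(KEY_NAME);
}
```

In the app this would be called with `window.localStorage`, e.g. `saveApiKey(window.localStorage, userKey)`.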
Risk: Cost of "Judge"
GPT-4 evaluation is expensive.
Mitigation: Default to a cheaper judge (e.g., GPT-4o mini) for draft runs; reserve the premium judge for final scored runs.
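The two-tier judge policy could be as simple as the sketch below. The model ids and the PASS/FAIL prompt format are assumptions for illustration:

```typescript
// Sketch of the cost mitigation: cheap judge for draft runs, stronger
// judge only for the final scored run. Model ids are illustrative.
type RunKind = "draft" | "final";

function judgeModelFor(kind: RunKind): string {
  return kind === "draft" ? "openai/gpt-4o-mini" : "openai/gpt-4o";
}

// Hypothetical judge prompt: constrained to a binary verdict so parsing
// the judge's reply stays trivial and cheap.
function buildJudgePrompt(input: string, expected: string, actual: string): string {
  return [
    "You are grading a model answer against a reference answer.",
    `Task input: ${input}`,
    `Reference answer: ${expected}`,
    `Candidate answer: ${actual}`,
    'Reply with exactly "PASS" or "FAIL".',
  ].join("\n");
}
```

Constraining the judge to a single-token verdict also keeps output-token costs near zero, which matters when a benchmark has hundreds of cases.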