06. Validation Experiments & Hypotheses
A systematic approach to de-risking PromptVault by testing critical assumptions regarding prompt chaos, version control needs, and willingness to pay before full-scale engineering.
1. Core Hypotheses
We have identified 4 critical risk areas that require immediate validation.
Hypothesis #1: The "Chaos" Problem
🔴 Critical Risk
"We believe that AI Engineers & Product Teams
Will actively switch from Notion/Spreadsheets to a dedicated tool
If we provide structured organization and searchability
We will know this is true when 60% of interviewed engineers cite 'finding the right version' as a top-3 weekly pain point."
Current Evidence: High search volume for "prompt engineering tools"; frequent complaints in r/LocalLLaMA about losing prompts.
Validation Metric: >60% pain confirmation in interviews; >5% CTR on "Stop losing prompts" ads.
Hypothesis #2: Version Control Utility
🔴 Critical Risk
"We believe that developers building with LLMs
Will adopt a 'Git-for-Prompts' workflow
If we provide diff views and instant rollback capabilities
We will know this is true when users in usability tests revert changes 3+ times during a simulation session."
Current Evidence: Developers love Git; LangChain adoption suggests desire for code-like structure.
Validation Metric: >70% of testers rate "Diff View" as the most valuable feature.
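As a concrete illustration of the "Diff View" being tested in this hypothesis, a prompt diff can be sketched with Python's standard difflib. The prompt texts and version labels below are hypothetical, not PromptVault's actual data model.

```python
import difflib

# Two hypothetical versions of the same stored prompt.
v1 = "You are a helpful assistant.\nAnswer concisely.\n"
v2 = "You are a helpful assistant.\nAnswer in bullet points.\nCite sources.\n"

# Unified diff, the same format Git uses for its diff view.
diff = list(difflib.unified_diff(
    v1.splitlines(keepends=True),
    v2.splitlines(keepends=True),
    fromfile="prompt@v1",
    tofile="prompt@v2",
))
print("".join(diff))
```

"Instant rollback" in this framing is just re-selecting `prompt@v1` as the active version; the diff shows the user exactly what reverting would undo.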
Hypothesis #3: Team Monetization
🟡 High Risk
"We believe that Engineering Managers
Will pay $49/user/month for team features
If we provide centralized governance and multi-model cost tracking
We will know this is true when we secure 5 Letters of Intent (LOI) or pre-orders from teams >5 people."
Current Evidence: B2B SaaS standard pricing; competitors like Dust.tt charge premium prices.
Validation Metric: 5 signed LOIs or 20% conversion on pricing page "Contact Sales" button.
Hypothesis #4: Multi-Model Testing
🟢 Medium Risk
"We believe that Prompt Engineers
Will prefer PromptVault over provider playgrounds
If we allow simultaneous testing of OpenAI, Anthropic, and Llama
We will know this is true when users connect two or more API keys during onboarding in the prototype."
Current Evidence: Model fragmentation is increasing (Claude 3 vs GPT-4).
Validation Metric: >50% of beta users connect secondary model provider.
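The simultaneous-testing workflow in this hypothesis can be sketched as a parallel fan-out of one prompt to several providers. The `call_model` stub below is hypothetical; a real implementation would call each provider's SDK (OpenAI, Anthropic, a hosted Llama endpoint) in its place.

```python
from concurrent.futures import ThreadPoolExecutor

PROVIDERS = ["openai", "anthropic", "llama"]

def call_model(provider: str, prompt: str) -> str:
    # Placeholder: swap in the provider's real client call here.
    return f"[{provider}] response to: {prompt}"

def run_everywhere(prompt: str) -> dict:
    """Send the same prompt to every configured provider in parallel."""
    with ThreadPoolExecutor(max_workers=len(PROVIDERS)) as pool:
        futures = {p: pool.submit(call_model, p, prompt) for p in PROVIDERS}
        return {p: f.result() for p, f in futures.items()}

results = run_everywhere("Summarize this ticket in one sentence.")
```

Side-by-side rendering of `results` is the core comparison view a provider playground cannot offer, since each playground is locked to its own models.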
2. Experiment Catalog
We will execute these experiments to validate the hypotheses above, ranked by impact.
3. Prioritization Matrix
[2×2 matrix: axes are Impact (low → high) and Effort (low → high)]
- Exp 01: Interviews
- Exp 02: Landing Page
- Exp 03: VS Code Ext
- Exp 04: Prototype
- Exp 05: Concierge
We prioritize High Impact / Low Effort (Top Left) first.
4. Validation Schedule
Phase 1: Problem Discovery
Weeks 1-2
- Execute Exp 01 (Interviews)
- Launch Exp 02 (Landing Page Ads)
Phase 2: Solution Fit
Weeks 3-5
- Build & Test Exp 04 (Figma Prototype)
- Build Exp 03 (VS Code Fake Door)
Phase 3: Business Viability
Weeks 6-8
- Execute Exp 06 (Pricing Survey)
- Launch VS Code Ext & Measure Retention
- Final Go/No-Go Decision
5. Decision Framework
✅ Criteria for "GO"
- 🎯 Problem: >60% of interviewees confirm "Prompt Chaos" is a major pain.
- 📈 Interest: >5% Landing Page Signup Rate (Waitlist).
- 🛠️ Utility: >70% Prototype users successfully use "Diff View".
- 💰 WTP: Willingness to pay validated at a $19/mo price floor via survey or purchase intent.
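The "GO" criteria above can be encoded as a simple threshold check so the Go/No-Go call in Week 8 is mechanical rather than debatable. The thresholds are the ones listed in this section; the metric names are illustrative.

```python
# Thresholds taken from the GO criteria in this section.
GO_CRITERIA = {
    "pain_confirmation_rate": 0.60,  # interviewees confirming "Prompt Chaos"
    "landing_signup_rate": 0.05,     # waitlist signup rate
    "diff_view_success_rate": 0.70,  # prototype users completing Diff View
}

def go_decision(observed: dict) -> bool:
    """GO only if every observed metric meets or beats its threshold."""
    return all(observed.get(k, 0.0) >= v for k, v in GO_CRITERIA.items())

print(go_decision({"pain_confirmation_rate": 0.65,
                   "landing_signup_rate": 0.06,
                   "diff_view_success_rate": 0.72}))  # → True
```

An `all()` gate is deliberately strict: a single miss (e.g. a 3.2% signup rate) forces a Pivot discussion instead of a GO.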
⚠️ Pivot Triggers
- Trigger: Users say Notion is "good enough." Pivot: Shift from "Standalone App" to "Notion Plugin" or "VS Code Extension" only (meet them where they are).
- Trigger: Individual devs won't pay; only teams will. Pivot: Drop the $19 tier, go purely B2B sales ($499/mo minimum), focus entirely on Governance/Security features.
- Trigger: Users only care about Testing, not Organization. Pivot: Drop "Library" features; build an "LLM CI/CD Pipeline" tool exclusively.
Experiment Documentation Standard
## Experiment Log: [Name]
**Date:** [MM/DD/YYYY] | **Owner:** [Name]
**Hypothesis:** [Link to H1-H4]
**Results:**
- Target Metric: [e.g., 5% CTR]
- Actual Result: [e.g., 3.2% CTR]
- Outcome: [PASS / FAIL / INCONCLUSIVE]
**Key Insight:**
[One sentence summary of what we learned about the user behavior]
**Action Item:**
[Iterate / Kill / Scale]