Project Overview
This project explores how AI can enhance the Software Development Life Cycle (SDLC) by collaborating with human teams through a structured, role-based workflow.
Our approach mirrors the way real product teams operate — humans lead discovery, AI supports synthesis. For instance, interviews and workshops are conducted by researchers and designers, while AI assists in analyzing data, clustering insights, and framing problem statements.
We are benchmarking AI outputs against existing human research, aiming for at least 70% accuracy and interpretive alignment before integrating them into the live workflow.
This phase focuses on Problem Framing, a critical step in the SDLC where ambiguity is transformed into clarity.
As the AI Champion leading the squad, I guided the team in developing deterministic prompts that make problem framing measurable, repeatable, and transparent, so that AI becomes a reliable partner in design reasoning.
ROLE & DURATION
Lead, AI Enablement | EPAM
UX Research, Design Strategy, Problem Space
November 2025
Table of Contents
- Project Overview
- Introduction
- Why Problem Framing Matters in the SDLC
- The Four Entry Points to Problem Framing
- Determinism vs Variety: How We Control It
- Lens-Based Evaluation
- Evaluation Loop
- Prompt: Workshop Interpretation and Prioritization
- Prompt: Quantitative Insight Interpretation
- Prompt: Making Sense of Vague Requirements
- Prompt: Synthesizing User Interviews
- Phase 2: Wrap Into an Agent
- Outcomes & Learnings
- Using the OpenAI Prompt Optimizer tool
Introduction
Problem framing defines how teams understand and prioritize opportunities. It is the stage where ambiguity turns into clarity, where “what users said” and “what the business wants” must reconcile into a shared definition of the problem worth solving.
Artificial Intelligence, when prompted systematically, can enhance this process.
It can normalize, structure, and cross-check the raw, messy inputs designers usually receive, from sticky notes and transcripts to Jira tickets and dashboards.
Rather than generating ideas, the AI’s role here is sensemaking: turning unstructured inputs into evidence-based hypotheses.
This case presentation shows how we can systematize problem framing across four input types within the SDLC:
- Workshops
- Data
- Vague Requirements
- User Interviews
Each input has its own deterministic prompt design that controls for randomness, ensures consistency, and produces structured outputs ready for prioritization.
Why Problem Framing Matters in the SDLC
In every SDLC, misframed problems lead to wasted design and development cycles. Teams often jump into building solutions without fully aligning on what needs solving or why.
By introducing a structured problem framing step, teams can:
- Identify contradictions and missing evidence early.
- Align design intent with business goals.
- Anchor every “How might we” in traceable inputs.
In short, problem framing is where discovery meets definition and where clarity protects time, cost, and credibility downstream.
The Four Entry Points to Problem Framing
| Category | Input Type | Objective |
|---|---|---|
| Workshops | Facilitated sessions or brainstorming transcripts | Convert open-ended ideas into structured opportunity spaces and ranked HMWs. |
| Data | Behavioral, product, or operational datasets | Extract pattern-based hypotheses and measurable signals from quantitative evidence. |
| Vague Requirements | PRDs, Jira tickets, meeting notes | Normalize ambiguous requirements and flag measurable rewrites. |
| User Interviews | Interview transcripts | Extract pain/gain themes, consolidate across users, and generate prioritized HMWs. |
Determinism vs Variety: How We Control It
Generative AI is probabilistic by nature, meaning identical inputs can yield different outputs.
To use it reliably for analytical work, we need deterministic prompt structures designed to minimize interpretation drift. Every prompt in this framework follows a fixed structure that encodes the reasoning it should perform.
A deterministic prompt clearly defines:
- Role – defines expertise and framing context
- Context – product, objective, and constraints
- Definitions – key terms and evidence rules
- Limits & Constraints – caps, word limits, or max items
- Ranking & Sorting Logic – to enforce order
- Acceptance Criteria – schema-based validation
- Fixed Output Format – ensures structured reusability
This standardization makes problem framing with AI repeatable by transforming what used to depend on “gut feel” into a process with measurable consistency.
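To make this concrete, here is a minimal sketch of how a deterministic prompt run can be enforced in code. The `call_model` wrapper is hypothetical (wire it to whatever LLM client you use, with temperature 0–0.2 and a fixed seed where supported), and the schema is a truncated stand-in for the full acceptance criteria; validation uses the `jsonschema` package.

```python
import json
from jsonschema import validate  # pip install jsonschema

# Hypothetical wrapper around your LLM client of choice.
# Low temperature (0-0.2) and a fixed seed, where supported, reduce output drift.
def call_model(prompt: str) -> str:
    raise NotImplementedError("wire this to your LLM client with temperature=0")

# Acceptance criteria expressed as a (truncated) JSON Schema.
HMW_SCHEMA = {
    "type": "object",
    "required": ["themes", "top_n"],
    "properties": {
        "themes": {"type": "array", "maxItems": 8},  # MAX_THEMES cap
        "top_n": {"type": "array", "maxItems": 5},   # TOP_N cap
    },
}

def run_deterministic(prompt: str) -> dict:
    raw = call_model(prompt)
    data = json.loads(raw)       # Fixed Output Format: must parse as JSON
    validate(data, HMW_SCHEMA)   # Acceptance Criteria: schema-based validation
    return data
```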
Lens-Based Evaluation
The lens framework helps teams assess the completeness and systemic balance of their insights.
Not every problem is a design problem. This step helps the AI situate and frame the problem within the right context before any solution is proposed.
| Lens | Guiding Question | Why It Matters |
|---|---|---|
| People | Who benefits or is excluded? | Shapes user personas and accessibility considerations. |
| Process | How does it integrate operationally? | Ensures workflows are feasible and smooth. |
| Technology | What’s feasible within the tech stack? | Prevents overpromising. |
| Policy | What rules or compliance apply? | Safeguards against legal or governance risk. |
| Data & Evidence | What’s proven vs assumed? | Grounds design in verifiable signals. |
| Temporal | How will this evolve? | Supports scalability and adaptability. |
| Resources | What’s realistically available? | Keeps ambition feasible. |
| Value & Incentives | What motivates adoption? | Ensures sustainable engagement. |
| Culture | How does it fit organizationally or socially? | Predicts resistance or advocacy. |
Each HMW is scored across four factors:
| Factor | 0 | 1 | 2 |
|---|---|---|---|
| evidence_fit | No direct evidence | Partial/indirect | Direct, multi-source |
| impact | Marginal | Meaningful | High-leverage |
| effort_inverse | Heavy | Moderate | Light |
| risk_inverse | High risk | Manageable | Low risk |
This scoring creates a consistent rubric for prioritization, reducing the tendency to favor ideas based on intuition or popularity.
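As an illustration, the rubric can be applied mechanically. A sketch in Python; the unweighted sum and the alphabetical tie-break are assumptions for illustration, and a team might weight the factors differently:

```python
from typing import TypedDict

class HMWScore(TypedDict):
    id: str
    evidence_fit: int    # 0 = no direct evidence ... 2 = direct, multi-source
    impact: int          # 0 = marginal ... 2 = high-leverage
    effort_inverse: int  # 0 = heavy ... 2 = light
    risk_inverse: int    # 0 = high risk ... 2 = low risk

def total(s: HMWScore) -> int:
    # Unweighted sum of the four 0-2 factors (max 8).
    return s["evidence_fit"] + s["impact"] + s["effort_inverse"] + s["risk_inverse"]

def prioritize(scores: list[HMWScore]) -> list[HMWScore]:
    # Highest total first; alphabetical ID as a stable tie-break.
    return sorted(scores, key=lambda s: (-total(s), s["id"]))
```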
Evaluation Loop
The evaluation loop ensures that outputs are consistent, replicable, and measurable, not subjective.
- Reproducibility Check: Run the same prompt multiple times to test stability.
- Inter-Rater Match: Compare AI and human interpretations; track ≥70% accuracy as a reliability target.
- Error Categorization: Identify failure modes (e.g., missed ambiguity, weak clustering) and adjust prompt parameters.
By automating schema validation and scoring logic, the loop ensures speed and consistency while keeping human oversight meaningful.
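The two quantitative checks can be scripted. A sketch, assuming each run returns a JSON string and that AI and human labels are aligned item-by-item:

```python
import json
from collections import Counter

def canonical(output: str) -> str:
    # Normalize JSON so key order and whitespace don't count as drift.
    return json.dumps(json.loads(output), sort_keys=True)

def reproducibility(outputs: list[str]) -> float:
    # Reproducibility Check: share of runs matching the most common output.
    counts = Counter(canonical(o) for o in outputs)
    return counts.most_common(1)[0][1] / len(outputs)

def inter_rater_match(ai_labels: list[str], human_labels: list[str]) -> float:
    # Inter-Rater Match: fraction of items where AI and human agree (target >= 0.70).
    agree = sum(a == h for a, h in zip(ai_labels, human_labels))
    return agree / len(human_labels)
```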
Prompt: Workshop Interpretation and Prioritization

Workshops are where ideas flow freely, but translating that energy into structured insight is difficult.
This prompt captures, clusters, and ranks workshop notes into evidence-backed themes and HMWs.
What it does:
Processes raw workshop notes or transcripts, extracts idea fragments, groups them into themes, and generates ranked HMW (How Might We) statements with traceable rationale.
Input:
Free-form workshop notes, transcripts, or sticky-note exports.
Output:
JSON (themes + ranked HMWs) and Markdown (ranked summaries + assumptions/gaps).
Use Case Example:
When a workshop produces hundreds of sticky notes or chat messages, this prompt can automatically surface clusters like “Data quality issues,” “Slow approvals,” or “Onboarding confusion,” and turn them into actionable HMWs tied to evidence.
ROLE
You are a neutral facilitator-analyst converting workshop artifacts (empathy maps, stickies, votes) into evidence-backed problem statements and HMWs.
CONTEXT
Organization / Product / Domain: {org_or_product}
Workshop purpose (why now): {purpose_or_goal}
Participants (roles & counts): {roles_counts}
Known constraints/biases (optional): {constraints_or_biases}
Lens tagging (optional): {people | process | technology | policy | data | incentives | culture | temporal}
HMW track: {both | design | nondesign} # choose one; default = both
DEFINITIONS
Theme = a semantic cluster of related evidence (not a single sticky).
Pain Point = friction, unmet need, negative outcome; Gain Point = benefit, enabler, positive outcome.
Tie-break (Pain vs Gain): if net user outcome is negative → Pain; otherwise → Gain.
Evidence = verbatim sticky/quote + source marker (e.g., “empathy:says#S12”, “board:cluster#A3”); vote_count optional.
support_ids = IDs of evidence items linked to a theme (e.g., ["P1","P3"]).
evidence_strength = integer count of distinct support_ids.
severity_impact = High | Medium | Low | N/A.
business_value = High | Medium | Low | null.
Design tags (if design track used) = {ia, interaction, feedback, copy, a11y, perf}.
LIMITS
MAX_THEMES={8}
MAX_HMW_PER_THEME={3}
TOP_N={5}
Quote length (if any new quotes used) ≤ 20 words.
PRIORITIZATION / RANKING RULE (apply in order; deterministic)
1) evidence_strength (desc)
2) severity_impact (High > Med > Low > N/A)
3) business_value (High > Med > Low > null)
4) label A→Z (stable tie-break)
TASK
1) Extract & Classify
- From INPUT, pull Pain Points and Gain Points with verbatim evidence and source markers; give each an ID (P#/G#).
2) Group & Score Themes
- Cluster Pain Points into Themes (short label + 1-line rationale).
- For each theme: set support_ids, compute evidence_strength, set severity_impact and business_value, and (optionally) lenses.
3) Generate HMWs (per selected track)
- Pattern: “How might we [action] for [user/segment] so that [desired outcome]?”
- If HMW track = both → output hmw_design[] (with design tags) AND hmw_nondesign[] (policy/process/tech/data-gov; no UI).
- Each HMW must cite evidence_ref (1–2 IDs ⊆ support_ids). Max MAX_HMW_PER_THEME per array.
4) Rank Top N
- Produce top_n HMW IDs using the ranking rule.
5) Assumptions and conflicts
- List any missing evidence, imbalances (vote skew, role dominance), or contradictions.
CONSTRAINTS
- Strict evidence only; do NOT invent quotes, sources, or counts. Use “N/A” if missing.
- Preserve verbatim inside evidence; do not paraphrase quotes.
- HMWs must be problem-focused (no baked solutions) and feasibly testable in ≤ 6–8 weeks.
- Determinism: use low temperature (0–0.2) if available.
SORTING & DETERMINISM
- Sort themes by computed rank; ties → label A→Z.
- Sort HMWs by the same rule; ties → label A→Z.
ACCEPTANCE CRITERIA (must pass)
- Schema valid; required keys present; caps respected (MAX_THEMES, MAX_HMW_PER_THEME, TOP_N).
- Each HMW cites 1–2 evidence_ref IDs that exist in its theme’s support_ids.
- Rankings follow the rule; ties resolved by label A→Z.
- No fabricated data; missing values marked “N/A”.
VALIDATOR (run before finalizing; auto-fix where trivial, else list violations)
Return either PASS or JSON list of {where, field, issue, suggestion}. Ensure:
- All IDs resolve; evidence_ref ⊆ support_ids.
- Quotes ≤ 20 words; sources present.
- HMWs are problem-focused, actor/context clear, scope ≤ 6–8 weeks.
OUTPUT FORMAT (use exactly this structure)
Output (Part 1 — JSON ONLY)
{
  "pain_points": [
    {"id":"P1","verbatim":"≤20w","source":"empathy:says|thinks|does|feels|board:…","vote_count":null}
  ],
  "gain_points": [
    {"id":"G1","verbatim":"≤20w","source":"empathy:says|thinks|does|feels|board:…","vote_count":null}
  ],
  "themes": [
    {
      "label":"A",
      "name":"{short theme title}",
      "rationale":"{1-line why these items cluster}",
      "support_ids":["P1","P3"],
      "evidence":[
        {"quote":"verbatim","source":"empathy:says#S12","vote_count":2}
      ],
      "scores":{
        "evidence_strength":2,
        "severity_impact":"High",
        "business_value":"Med"
      },
      "lenses":["people","process"],
      "hmws":{
        "hmw_design":[
          {"id":"A-1","text":"How might we … for … so that …?","tags":["ia","interaction"],"evidence_ref":["P1"]}
        ],
        "hmw_nondesign":[
          {"id":"A-2","text":"How might we … for … so that …?","evidence_ref":["P3"]}
        ]
      }
    }
  ],
  "top_n":["A-1","B-1","C-1","D-1","E-1"],
  "assumptions_or_conflicts": ["List missing evidence, vote/role imbalances, and explicit contradictions."]
}
Output (Part 2 — MARKDOWN ONLY)
A. Themes (Grouped from Findings)
Theme Name: {short name}
- Supporting findings: • {source / quote / votes or counts}
- Why this theme matters: {1–2 sentences grounded in evidence}
B. Prioritization Table (frequency → intensity)
Rank | Theme | Frequency/Evidence Strength | Impact (Severity) | Business Value | Rationale
C. HMW Statements (2–3 per Top Theme)
HMW 1: How might we [action] for [target/user] so that [desired outcome]?
Evidence: {P#/G#,… from support_ids; e.g., empathy:says#S12 (2 votes)}
Lens/Tags (optional): {people/process/technology/policy/data/incentives/culture/temporal} • {ia/interaction/feedback/copy/a11y/perf}
HMW 2: …
Evidence: …
Lens/Tags: …
D. Assumptions, Gaps & Contradictions
- {bullets}
E. Appendix — Raw Evidence Index (optional)
- {ID → exact source markers and any counts}
INPUT (paste raw workshop exports below this line)
Empathy Map
Says: {…}
Thinks: {…}
Does: {…}
Feels: {…}
Participants & Roles: {…}
Votes / Tallies (if any): {…}
Known Biases/Constraints: {…}
Additional Artifacts (links/IDs): {…}
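Because the ranking rule is fully deterministic, it can be re-applied outside the prompt to audit the model's JSON output. A sketch; the score field names follow the schema above:

```python
SEVERITY_ORDER = {"High": 0, "Medium": 1, "Med": 1, "Low": 2, "N/A": 3}
VALUE_ORDER = {"High": 0, "Medium": 1, "Med": 1, "Low": 2, None: 3, "null": 3}

def theme_rank_key(theme: dict):
    # 1) evidence_strength desc  2) severity  3) business_value  4) label A->Z
    s = theme["scores"]
    return (
        -s["evidence_strength"],
        SEVERITY_ORDER.get(s["severity_impact"], 3),
        VALUE_ORDER.get(s.get("business_value"), 3),
        theme["label"],
    )

def rank_themes(themes: list[dict]) -> list[dict]:
    return sorted(themes, key=theme_rank_key)
```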
Prompt: Quantitative Insight Interpretation
Data is an anchor for evidence-based framing, but most teams don’t interpret metrics systematically.
This prompt helps structure numeric or categorical data summaries into hypotheses and potential opportunity spaces.
What it does:
Converts structured data summaries into interpretable hypotheses, detects anomalies or outliers, and frames top opportunities or risks through HMWs.
Input:
CSV summaries, dashboard text exports, or metric observations.
Output:
JSON (findings + hypothesis clusters + HMWs) and Markdown (ranked table of insights).
Example Behavior:
If user drop-off at checkout rose 12%, the AI identifies patterns (e.g., “payment failure” + “promo code errors”), suggests evidence-backed hypotheses, and converts them into structured HMWs like:
“How might we improve checkout reliability for returning users so that they complete transactions seamlessly?”

ROLE
You are a senior UX research analyst converting survey/log analytics into evidence-backed problem statements and HMWs.
CONTEXT
Organization / Product / Domain: {org_or_product}
Purpose (why now): {purpose_or_goal}
Audience / Segments (optional): {personas_or_segments}
Lens tagging (one per HMW): {people | process | technology | policy | data | incentives | culture | temporal}
HMW track (optional): {single | both} # if both, keep text the same; the lens differentiates
Constraint: Use only the data provided; if anything is missing, write “N/A”. Do not infer beyond INPUT.
DEFINITIONS
Finding = a data-backed observation strictly derived from INPUT.
Metric tuple = {q:"Q#", n:<count>, d:<denominator>, pct:<percent>, note:"short label"}.
Theme = semantic cluster of related findings; short noun phrase + 1-line rationale.
support_ids = IDs of findings attached to a theme (e.g., ["F1","F3"]).
evidence_strength = integer count of distinct support_ids.
severity_impact = High | Medium | Low | N/A (label only if discernible from data).
business_value = High | Medium | Low | null (use if provided; else null).
Design tags (if you later split design/non-design elsewhere) = {ia, interaction, feedback, copy, a11y, perf}.
LIMITS
MAX_THEMES={8}
MAX_HMW_PER_THEME={3}
TOP_N={5}
PRIORITIZATION / RANKING RULE (apply in order; deterministic)
1) evidence_strength (desc)
2) severity_impact (High > Med > Low > N/A)
3) business_value (High > Med > Low > null)
4) label A→Z (stable tie-break)
TASK
1) Extract Findings
- From INPUT, pull pains/opportunities/patterns as findings.
- Attach exact metric tuples (q#, n, d, pct, note). If any field is missing, use “N/A”.
2) Group → Themes & Score
- Cluster findings into themes (short label + 1-line rationale).
- For each theme, set support_ids and compute evidence_strength; set severity_impact; set business_value if available.
3) Generate HMWs (2–3 per top theme)
- Pattern: “How might we [action] for [target/user] so that [desired outcome]?”
- Add an Evidence line citing concrete q#/n/d/% from that theme.
- Tag each HMW with exactly one lens from the unified set.
4) Rank Top-N
- Produce top_n HMW IDs via the ranking rule.
5) Assumptions and conflicts
- Note unavailable intensity/impact, missing segments/time windows, caveats, data quality issues or contradictions.
CONSTRAINTS
- Evidence-only; no invented numbers, quotes, or sources. Missing → “N/A”.
- HMWs are solution-neutral, user+outcome oriented, and feasibly testable in ≤ 6–8 weeks.
- Determinism: use low temperature (0–0.2) if available.
- Respect caps (MAX_THEMES, MAX_HMW_PER_THEME, TOP_N).
SORTING & DETERMINISM
- Sort themes by computed rank (1..N). Sort HMWs using the same rule.
- Break ties deterministically by label A→Z.
ACCEPTANCE CRITERIA (must pass)
- JSON schema valid; all required keys present; caps respected.
- Every HMW cites ≥1 evidence_ref that exists in its theme’s support_ids (or cites specific metric tuples from that theme).
- Rankings follow the rule; tie-break applied.
- No fabricated data; missing values marked “N/A”.
VALIDATOR (run before finalizing; auto-fix where trivial, else list violations)
Return either PASS or a JSON list of {where, field, issue, suggestion}. Ensure:
- Metric tuples have valid fields (q, n, d, pct) or “N/A”.
- HMWs follow the pattern and include an Evidence line with real q#/n/d/% from the same theme.
- IDs resolve; evidence_ref ⊆ support_ids (when IDs used).
- Scope: HMWs are problem-focused with clear actor/context and ≤ 6–8 week feasibility.
OUTPUT FORMAT (use exactly this structure)
Output (Part 1 — JSON ONLY)
{
  "findings": [
    {
      "id": "F1",
      "type": "pain|opportunity|pattern",
      "verbatim": "short label derived from data (no invention)",
      "metrics": [
        {"q":"Q6","n":43,"d":54,"pct":79.63,"note":"discover via email"}
      ]
    }
  ],
  "themes": [
    {
      "label": "A",
      "name": "{short theme title}",
      "rationale": "{1-line why these findings cluster}",
      "support_ids": ["F1","F3"],
      "evidence": [
        {"q":"Q6","n":43,"d":54,"pct":79.63,"quote_or_note":"email is primary discovery"}
      ],
      "scores": {
        "evidence_strength": 2,
        "severity_impact": "High|Med|Low|N/A",
        "business_value": "High|Med|Low|null"
      },
      "hmws": [
        {
          "id": "A-1",
          "text": "How might we … for … so that …?",
          "lens": "people|process|technology|policy|data|incentives|culture|temporal",
          "evidence_ref": ["F1"] // optional if you cite explicit metric tuples above
        }
      ]
    }
  ],
  "top_n": ["A-1","B-1","C-1","D-1","E-1"],
  "assumptions_or_conflicts": ["Conflicting metrics/segments/time windows, missing intensity, data quality issues."]
}
Output (Part 2 — MARKDOWN ONLY)
A. Themes (Grouped from Findings)
Theme Name: {short name}
- Supporting findings: • {Q6: 43/54 (79.6%) discover via email} • {Q7: 33/54 revisit via email}
- Why this theme matters: {1–2 sentences grounded in evidence}
B. Prioritization Table (frequency → intensity)
Rank | Theme | Frequency/Evidence Strength | Impact (Severity) | Business Value | Rationale
C. HMW Statements (2–3 per Top Theme)
HMW 1: How might we [action] for [target/user] so that [desired outcome]?
Evidence: {Q6: 43/54; Q7: 33/54 … or F-IDs from the theme}
Lens: {people|process|technology|policy|data|incentives|culture|temporal}
HMW 2: …
Evidence: …
Lens: …
D. Assumptions, Gaps & Contradictions
- {bullets}
E. Appendix — Raw Evidence Index (optional)
- {Theme/HMW → Q#, n/d, %, time window, segment if applicable}
INPUT (paste raw survey/log data below this line)
Dataset/context: {e.g., WAC survey wave, dates}
Key tables or excerpts:
Q# | Question | n/d | % | Notes
{paste here}
Segments/time windows (optional): {…}
Caveats / data quality (optional): {…}
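One check from this prompt's validator step is easy to automate outside the model: confirm every metric tuple is complete and internally consistent (pct really equals n/d). A sketch against the schema above:

```python
def check_metric_tuple(m: dict) -> list[str]:
    issues = []
    # All four fields must be present (or explicitly "N/A").
    for field in ("q", "n", "d", "pct"):
        if field not in m:
            issues.append(f"missing field: {field}")
    n, d, pct = m.get("n"), m.get("d"), m.get("pct")
    # If numeric, pct should agree with n/d to two decimals (e.g., 43/54 -> 79.63).
    if all(isinstance(v, (int, float)) for v in (n, d, pct)) and d:
        expected = round(100 * n / d, 2)
        if abs(pct - expected) > 0.05:
            issues.append(f"pct {pct} does not match n/d ({expected})")
    return issues

print(check_metric_tuple({"q": "Q6", "n": 43, "d": 54, "pct": 79.63, "note": "email"}))  # []
```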
Prompt: Making Sense of Vague Requirements

Vague or conflicting requirements are a frequent bottleneck in SDLCs.
These two prompts normalize requirements and convert them into measurable HMWs.
Prompt 1: Normalize + Ambiguity Lint
What it does:
Extracts discrete requirement items and identifies ambiguous or under-specified language.
Input:
PRDs, meeting notes, Jira tickets.
Output:
JSON (items + lint) and Markdown tables (normalized items + ambiguity list).
ROLE
You are a senior requirements analyst.
CONTEXT
Organization / Product: {org_or_product}
Objective: Convert vague requirements into discrete, traceable items and flag ambiguity with measurable rewrites.
Audience / Segments: {personas_or_segments}
Strict evidence: TRUE (no invented quotes/sources).
Max items to extract (optional): {e.g., 80}
DEFINITIONS
Item = a discrete requirement statement (≤25 words, verbatim from input).
Type ∈ Goal | Need | Constraint | Assumption | OpenQ.
Source = section/heading/URL slug inferred from where the quote appears.
Ambiguity types:
- vague_term (intuitive, seamless, robust, fast…)
- pronoun_ambiguity (it/this/they w/o referent)
- under_specified_quantifier (many/few/some; numbers w/o owner/list)
- TBD/deferred
Severity ∈ High | Medium | Low (High blocks implementation/testing).
LIMITS
Items must be ≤25 words; do not merge distinct ideas.
PRIORITIZATION / RANKING RULE
N/A for this stage (extraction + lint only).
TASK
1) Normalize
- Extract items verbatim (≤25w), assign Type, attach Source. ID as R-###.
2) Ambiguity Lint
- For each item, detect 0..n issues, set Severity, and propose a measurable rewrite that preserves intent.
CONSTRAINTS
- Do not invent quotes, sources, or KPIs. If Source is unclear, write “Unknown”.
- Each item must express a single intent.
SORTING & DETERMINISM
- Sort items by source order (top→down), then by Type A→Z, then by ID.
ACCEPTANCE CRITERIA (must pass)
- JSON matches schema; all items ≤25 words; valid types; sources present/Unknown.
- Lint issues use allowed ambiguity types; suggested_fix is specific/measurable.
VALIDATOR (return PASS or violations JSON)
Check: id/type/text/source present; text length ≤25w; ambiguity types valid; suggested_fix specific.
OUTPUT FORMAT (use exactly this structure)
Output (Part 1 — JSON ONLY)
{
  "items": [
    {
      "id": "R-001",
      "type": "Goal|Need|Constraint|Assumption|OpenQ",
      "text": "≤25-word exact quote",
      "source": "Section or URL slug|Unknown"
    }
  ],
  "lint": [
    {
      "id": "R-001",
      "issues": [
        {"type":"vague_term","term":"intuitive","reason":"not measurable"},
        {"type":"pronoun_ambiguity","reason":"unclear referent"}
      ],
      "severity": "High|Medium|Low",
      "suggested_fix": "Rewrite with KPI / explicit referent / owner+deadline / acceptance criteria"
    }
  ]
}
Output (Part 2 — MARKDOWN ONLY)
A) Normalized Items
ID | Type | Exact Quote (≤25w) | Source
B) Ambiguity Linter
Group by item ID. If no issues, write “None” under that ID.
R-###
Issue: {ambiguity_type} (“{term/fragment}”) → Severity: {High|Medium|Low}
Fix: {measurable rewrite / explicit referent / owner + deadline / acceptance criteria}
C) Assumptions, Gaps & Contradictions
Note unresolved TBDs, conflicting items (e.g., two goals with opposing KPIs), or missing sources.
INPUT (raw text pasted below)
{Paste meeting notes / PRD / Jira / Confluence extract here}
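Parts of the ambiguity lint can even be pre-screened with plain string matching before the model runs, so the most obvious flags never depend on the LLM. A rough sketch; the term lists are illustrative, not exhaustive, and the pronoun check is deliberately naive (it flags candidates rather than judging referents):

```python
import re

VAGUE_TERMS = {"intuitive", "seamless", "robust", "fast", "easy"}  # from DEFINITIONS
QUANTIFIERS = {"many", "few", "some", "several"}
PRONOUNS = re.compile(r"\b(it|this|they)\b", re.IGNORECASE)

def pre_lint(item_text: str) -> list[dict]:
    words = {w.strip(".,;:").lower() for w in item_text.split()}
    issues = []
    for term in sorted(VAGUE_TERMS & words):
        issues.append({"type": "vague_term", "term": term, "reason": "not measurable"})
    for term in sorted(QUANTIFIERS & words):
        issues.append({"type": "under_specified_quantifier", "term": term,
                       "reason": "no owner/list attached"})
    if PRONOUNS.search(item_text):
        issues.append({"type": "pronoun_ambiguity", "reason": "verify the referent"})
    if "tbd" in words:
        issues.append({"type": "TBD/deferred", "reason": "unresolved decision"})
    return issues

print(pre_lint("The flow should feel intuitive and support many users. TBD."))
```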
Prompt 2: Cluster/Interpret & Prioritize HMWs
What it does:
Clusters normalized items into themes, generates solution-neutral HMWs, and ranks them deterministically using evidence strength and severity.
Input:
JSON output from Prompt 1.
Output:
JSON (themes + ranked HMWs + rationale) and Markdown tables (ranked themes + summaries).
ROLE
You are a senior UX research analyst and design strategist.
CONTEXT
Organization / Product: {org_or_product}
Research Goal: {research_goal}
Audience / Segments: {personas_or_segments}
Lens set (≤3 per theme): {people | process | technology | policy | data | incentives | culture | temporal}
HMW_TRACK: {both | design | nondesign} # default both
Design tags (if design track used): {ia, interaction, feedback, copy, a11y, perf}
DEFINITIONS
support_ids = IDs from Prompt-1 items that support a theme (e.g., ["R-001","R-014"]).
evidence_strength = integer count of distinct support_ids.
severity_impact ∈ High | Medium | Low.
business_value ∈ High | Medium | Low | null (carry if present; else null).
Theme = semantic cluster (≤MAX_THEMES) with short label + 2–3 line summary (problem/need).
LIMITS
MAX_THEMES={10}
MAX_HMW_PER_THEME={3}
TOP_N={5}
PRIORITIZATION / RANKING RULE (apply in order; deterministic)
1) evidence_strength (desc)
2) severity_impact (High > Med > Low)
3) business_value (High > Med > Low > null)
4) label A→Z (stable tie-break)
TASK
(C) Cluster & Interpret
- Cluster Prompt-1 items into ≤MAX_THEMES themes; label + 2–3 line summary.
- Assign ≤3 lenses per theme.
- Set support_ids; compute evidence_strength; set severity_impact; carry business_value if present.
(D) Generate HMWs (solution-neutral)
- Pattern: “How might we [action] for [user/segment] so that [desired outcome]?”
- Every HMW must include evidence_ref (1–2 IDs ⊆ theme.support_ids).
- Emit per HMW_TRACK:
• both → hmw_design[] (with design tags) AND hmw_nondesign[] (policy/process/tech/data-gov; no UI specifics)
• design → only hmw_design[]
• nondesign → only hmw_nondesign[]
- Limit each array to ≤MAX_HMW_PER_THEME, prioritizing alignment to evidence_strength.
(E) Prioritize
- Rank all themes strictly by the rule; embed rank (1..N) + 1-line rationale.
- Return only TOP_N themes.
CONSTRAINTS
- Do not invent IDs/quotes/sources; evidence_ref ⊆ support_ids.
- HMWs must be solution-neutral, user+outcome oriented; feasible in ≤6–8 weeks.
- Determinism: low temperature (0–0.2) if available.
SORTING & DETERMINISM
- Sort themes by ascending rank (1..N); break ties by label A→Z.
- Ensure every hmw.text begins with “How might we ” and contains “ for ” and “ so that ”.
ACCEPTANCE CRITERIA (must pass)
- JSON schema valid; required keys present; caps respected.
- Each HMW cites 1–2 evidence_ref IDs within its theme’s support_ids; tags (if present) ∈ allowed set.
- Rankings follow the rule; ties resolved with label A→Z.
- No fabricated data; gaps marked as “N/A” in rationales if needed.
VALIDATOR (return PASS or violations JSON)
- Check schema, caps, ID resolution, evidence_ref ⊆ support_ids, HMW pattern & scope, lenses/tags allowed.
OUTPUT FORMAT (use exactly this structure)
Output (Part 1 — JSON ONLY)
{
  "themes": [
    {
      "label": "A",
      "summary": "2–3 lines",
      "lenses": ["people","process"], // ≤3
      "support_ids": ["R-001","R-014"],
      "evidence_strength": 2,
      "severity_impact": "High|Medium|Low",
      "business_value": "High|Medium|Low|null",
      "rank": 1,
      "rationale": "why ranked here per rule",
      "hmw_design": [
        {
          "id": "A-1",
          "text": "How might we … for … so that …?",
          "evidence_ref": ["R-001","R-014"],
          "tags": ["ia","interaction"]
        }
      ],
      "hmw_nondesign": [
        {
          "id": "A-2",
          "text": "How might we … for … so that …?",
          "evidence_ref": ["R-001"]
        }
      ]
    }
  ],
  "assumptions_or_conflicts": ["Conflicting requirements, unresolved TBDs, or evidence gaps affecting ranking."]
}
Output (Part 2 — MARKDOWN ONLY)
Themes (Sorted by Rank)
Rank | Label | Summary (2–3 lines) | Lenses | Evidence (IDs) | Severity | Business Value | Support IDs
HMWs — Design (per theme)
HMW: How might we … for … so that … ?
Evidence: R-###, R-### • Tags: ia, interaction (≤MAX_HMW_PER_THEME)
HMWs — Non-Design (per theme)
HMW: How might we … for … so that … ?
Evidence: R-###, R-### (≤MAX_HMW_PER_THEME)
Assumptions, Gaps & Contradictions
- {bullets}
INPUT (paste Prompt-1 output below this line)
Preferred: JSON from Prompt-1 (items + lint).
Alternative: Markdown tables (Items + Linter) clearly labeled.
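The structural acceptance checks above (HMW pattern, evidence_ref ⊆ support_ids) are mechanical enough to run as a post-hoc validator in code. A sketch, assuming the Part 1 JSON schema from this prompt:

```python
def valid_hmw_text(text: str) -> bool:
    # Pattern rule from SORTING & DETERMINISM.
    return text.startswith("How might we ") and " for " in text and " so that " in text

def validate_theme(theme: dict) -> list[str]:
    violations = []
    support = set(theme.get("support_ids", []))
    for hmw in theme.get("hmw_design", []) + theme.get("hmw_nondesign", []):
        if not valid_hmw_text(hmw["text"]):
            violations.append(f'{hmw["id"]}: text breaks the HMW pattern')
        refs = set(hmw.get("evidence_ref", []))
        if not refs or not refs <= support:
            violations.append(f'{hmw["id"]}: evidence_ref must be IDs within support_ids')
        if len(refs) > 2:
            violations.append(f'{hmw["id"]}: more than 2 evidence_ref IDs')
    return violations
```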
Prompt: Synthesizing User Interviews

Interview data represents the voice of the user, but interpreting it consistently is hard.
This pair of prompts makes the process structured and evidence-driven.
Prompt 1: Extract Pain & Gain per Interview
What it does:
Extracts themes from a single transcript, categorizes them into Pain vs Gain, and attaches supporting quotes.
Input:
Speaker-labeled interview transcript.
Output:
JSON (pain_points, gain_points) and Markdown tables.
ROLE
You are a UX researcher performing thematic analysis on a single interview transcript.
CONTEXT
Organization / Domain: {org_or_product}
Interview purpose (why now): {research_goal}
Participant type (persona/role): {persona_or_role}
Study mode (optional): {generative|evaluative}; Key tasks/flows (optional): {…}
DEFINITIONS
Theme = recurring idea that affects goals/behavior/outcomes; collapse paraphrases under one label.
Umbrellas = Pain Points (frictions, unmet needs, negative outcomes) vs Gain Points (benefits, enablers, positive outcomes).
Tie-break rule = If the net user outcome is negative → classify as Pain; otherwise → Gain.
Frequency = count of distinct mentions of the SAME theme within THIS interview (nearby paraphrases count as one).
Evidence = ≤20-word verbatim quote + timestamp “mm:ss–mm:ss”; if unknown → “unknown”.
Bucket/Category = short higher-level grouping label (e.g., “Onboarding Clarity”, “Performance & Errors”).
LIMITS
Quotes ≤ 20 words each.
Every theme must include ≥1 evidence quote.
Each theme belongs to EXACTLY one umbrella (Pain OR Gain).
PRIORITIZATION / RANKING RULE (within each umbrella)
1) frequency (desc) → 2) perceived impact (High>Med>Low) → 3) theme (A→Z)
TASK
1) Extract themes and assign each to Pain or Gain (use tie-break if needed).
2) Assign each theme to a Bucket/Category (short label).
3) Count frequency; attach up to 3 representative verbatim quotes with timestamps.
4) Rank themes within each umbrella using the rule above.
5) Note bucket definitions and any contradictions/outliers.
CONSTRAINTS
Do NOT invent quotes or timestamps. Use “unknown” if a timestamp is missing.
No superficial notes (e.g., “they used the app”); include only insight-bearing themes.
Keep output concise and scannable.
Determinism: use low temperature (0–0.2) if available.
SORTING & DETERMINISM
Pre-sort Pain themes and Gain themes independently using the ranking rule.
Use theme A→Z as a stable tie-break.
ACCEPTANCE CRITERIA (must pass)
- JSON matches schema; each theme has bucket, frequency ≥1, impact label, and ≥1 quote (≤20w) with timestamp or “unknown”.
- No theme appears in both umbrellas.
- Sorting follows the ranking rule; tie-break A→Z applied.
VALIDATOR (return PASS or violations JSON)
Ensure: required keys present; quote length ≤20w; timestamps valid or “unknown”; umbrellas exclusive; sorting correct.
OUTPUT FORMAT (use exactly this structure)
Output (Part 1 — JSON ONLY)
{
  "pain_points": [
    {
      "bucket": "string",
      "theme": "string",
      "frequency": 0,
      "impact": "High|Medium|Low",
      "evidence": [
        {"quote": "≤20 words", "timestamp": "mm:ss-mm:ss|unknown"}
      ]
    }
  ],
  "gain_points": [
    {
      "bucket": "string",
      "theme": "string",
      "frequency": 0,
      "impact": "High|Medium|Low",
      "evidence": [
        {"quote": "≤20 words", "timestamp": "mm:ss-mm:ss|unknown"}
      ]
    }
  ],
  "notes": {
    "bucket_definitions": [{"bucket": "string", "definition": "1 line"}],
    "contradictions_or_outliers": ["string"]
  }
}
Output (Part 2 — MARKDOWN ONLY)
Pain Points (table)
Bucket/Category | Theme | Frequency | Impact | Quotes (Time)
Gain Points (table)
Bucket/Category | Theme | Frequency | Impact | Quotes (Time)
Notes (bullets)
- Bucket definitions (1 line each)
- Contradictions or outliers (1–2 bullets)
INPUT (paste transcript below this line)
Transcript (speaker-labeled if available):
{paste_transcript_here}
Prompt 2: Consolidate Multiple Interviews into Prioritized HMWs
What it does:
Merges multiple interview outputs, ranks themes by frequency and severity, and generates prioritized HMWs for synthesis.
Input:
Array of Prompt 1 outputs.
Output:
JSON (consolidated themes + HMWs + assumptions/conflicts) and Markdown (ranked consolidated themes + HMWs).
ROLE
You are a senior UX research analyst consolidating multiple interview analyses into prioritized, evidence-backed HMWs.
CONTEXT
Organization / Domain: {org_or_product}
Research Goal: {research_goal}
Participant types / segments: {personas_or_segments}
HMW track (optional): {both | design | nondesign} # default both
Lens mapping (optional): {people | process | technology | policy | data | incentives | culture | temporal}
DEFINITIONS
Theme equivalence = merge themes with highly similar meaning; keep the clearest label.
frequency_interviews = number of distinct interviews mentioning the theme (max 1 per interview).
severity = High | Medium | Low (strong negative affect/critical failure/business risk → High).
Evidence cap = ≤3 representative quotes per consolidated theme, each “{interview_id}:{timestamp} ‘≤20w quote’”.
Design tags (if design track used) = {ia, interaction, feedback, copy, a11y, perf}.
LIMITS
MAX_THEMES={10}
MAX_HMW_PER_THEME={3}
TOP_N={5}
Quotes per consolidated theme ≤3; each quote ≤20 words.
PRIORITIZATION / RANKING RULE (apply in order; deterministic)
1) frequency_interviews (desc)
2) severity (High > Med > Low)
3) business_value (High > Med > Low > null)
4) label A→Z (stable tie-break)
TASK
1) Merge & Recount
- Merge semantically equivalent themes across interviews; recount frequency_interviews (1 per interview max).
2) Aggregate Evidence
- Keep ≤3 representative quotes per theme, formatted “ID:Time ‘≤20w’”.
3) Rank Themes
- Apply the ranking rule; add a one-line rationale per theme.
4) Generate HMWs (per track)
- Pattern: “How might we [action] for [user/segment] so that [desired outcome]?”
- For design track, add tags from {ia, interaction, feedback, copy, a11y, perf}; no UI specifics in text.
- For nondesign track, focus on policy/process/technology/data-governance; no UI specifics.
- Max {MAX_HMW_PER_THEME} per theme.
5) Output Top N
- Return only TOP_N themes and their HMWs.
6) Assumptions or Conflicts
- Note gaps, contradictions, or merging ambiguities.
CONSTRAINTS
- Do NOT invent quotes, timestamps, interviews, or IDs.
- HMWs must be solution-neutral, user+outcome oriented, and feasibly testable in ≤6–8 weeks.
- Determinism: low temperature (0–0.2) if available.
SORTING & DETERMINISM
- Sort consolidated_themes by computed rank (1..N); ties → label A→Z.
- If HMWs are globally ranked, use the same rule; else keep per-theme order.
ACCEPTANCE CRITERIA (must pass)
- JSON schema valid; caps respected.
- frequency_interviews ≤ total interviews; quotes ≤3 and ≤20w, each with ID:timestamp.
- Each HMW tied to a top theme; tags (if present) ∈ allowed set.
- No fabricated data; gaps marked as “N/A” where applicable.
VALIDATOR (return PASS or violations JSON)
Ensure: keys present; caps obeyed; quotes/timestamps valid; theme merges consistent; ranking correct.
OUTPUT FORMAT (use exactly this structure)
Output (Part 1 — JSON ONLY)
{
  "consolidated_themes": [
    {
      "label": "string",
      "umbrella": "pain|gain",
      "frequency_interviews": 0,
      "severity": "High|Medium|Low",
      "evidence": [
        {"interview_id": "INT-001", "quote": "≤20 words", "timestamp": "mm:ss-mm:ss"}
      ],
      "lenses": ["people","process"] // optional
    }
  ],
  "top_themes": [
    {
      "label": "string",
      "rationale": "1 line per ranking rule",
      "hows_might_we": [
        {"id":"H-1","text":"How might we … for … so that …?"}
      ],
      "design_tags": [["ia","interaction"]], // if track includes design (one array per HMW)
      "lenses": ["people","process"] // optional
    }
  ],
  "assumptions_or_conflicts": ["string"]
}
Output (Part 2 — MARKDOWN ONLY)
Consolidated Themes (ranked)
Umbrella | Theme | Freq (# interviews) | Severity | Representative Evidence (ID:Time “Quote”) | Lenses
HMWs (for Top {N} themes)
{Theme Label}
- HMW 1: How might we … for … so that … ?
Evidence: {ID:Time “Quote”}
Tags/Lenses (optional): {ia, interaction, …} • {people, process, …}
- HMW 2: …
Assumptions or Conflicts
- {bullets}
INPUT (paste below this line)
Preferred JSON: array of Interview Prompt-1 outputs, e.g.:
[
  {
    "interview_id": "INT-001",
    "pain_points": [{"bucket": "…", "theme": "…", "frequency": 3, "impact":"High", "evidence": [{"quote":"…", "timestamp":".."}]}],
    "gain_points": [{"bucket": "…", "theme": "…", "frequency": 2, "impact":"Medium", "evidence": [{"quote":"…", "timestamp":".."}]}],
    "notes": {…}
  }
  // more interviews…
]
Alternative (Markdown): paste clearly labeled Pain/Gain tables per interview, each preceded by “Interview ID: INT-###”.
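The recount rule (one hit per interview, no matter how often a theme recurs within a transcript) is easy to get wrong by hand. A sketch of the counting step, assuming themes with identical labels have already been merged; real consolidation needs semantic matching, which is the model's job:

```python
from collections import defaultdict

def recount_frequency(interviews: list[dict]) -> dict[str, int]:
    # frequency_interviews = number of DISTINCT interviews mentioning a theme.
    seen: dict[str, set[str]] = defaultdict(set)
    for interview in interviews:
        iid = interview["interview_id"]
        for point in interview.get("pain_points", []) + interview.get("gain_points", []):
            seen[point["theme"]].add(iid)  # a set caps each interview at 1
    return {theme: len(ids) for theme, ids in seen.items()}
```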
Phase 2: Wrap Into an Agent
Agentize only once prompt outputs are consistent, benchmarked, and validated. The prompts can then be operationalized into agents for long-term reliability through governance, monitoring, and continuous improvement; a minimal routing sketch follows the workflow list below.
Agent = orchestration + memory + routing + validations.
Explore the Agentic workflows below:
- Workshop → Problem Framing
- Data → Problem Framing
- Vague Requirements → Problem Framing
- User Interviews → Problem Framing
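In code, the first agent layer is little more than routing plus the validation loop shown earlier. A sketch; the prompt constants and `run_deterministic` are the hypothetical pieces introduced above:

```python
# Hypothetical constants holding the four deterministic prompt templates above.
PROMPTS: dict[str, str] = {
    "workshop": "...",            # Workshop Interpretation and Prioritization
    "data": "...",                # Quantitative Insight Interpretation
    "vague_requirements": "...",  # Normalize + Ambiguity Lint (then Prompt 2)
    "interviews": "...",          # Extract Pain & Gain (then consolidation)
}

def run_deterministic(prompt: str) -> dict:
    # Placeholder: see the validation sketch in "Determinism vs Variety".
    raise NotImplementedError

def frame_problem(input_type: str, raw_input: str) -> dict:
    # Routing: pick the prompt for this entry point; schema validation and
    # caps are enforced inside run_deterministic before anything ships.
    template = PROMPTS[input_type]
    return run_deterministic(template + "\n\nINPUT:\n" + raw_input)
```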
Outcomes & Learnings
| Typical HMW practice (Before) | With AI Problem Framing Bureau (After) |
|---|---|
| HMWs aren’t traceable | Every HMW is backed by a common evidence spine – quotes, timestamps, survey metrics – all with IDs that you can trace back in seconds. |
| Votes & opinions > data | Themes are scored and ranked using explicit criteria: evidence strength, impact, business value – so prioritization is defensible. |
| Inputs are siloed (workshops vs surveys vs interviews) | All inputs are normalized into a shared structure (context + evidence + themes + HMWs), so you can compare across workshops, data, requirements, and interviews. |
| HMWs default to UI tweaks | Every theme is tagged with strategy lenses (people, process, technology, policy, data, resources, incentives, culture, temporal) before HMWs are framed – forcing a system-level view first. |
| Non-design problems get dumped on UX | HMWs are split into Design (IA, interaction, copy, feedback, a11y, perf) and Non-design (process, policy, staffing, tooling, incentives), so responsibilities are clear upfront. |
| Every project starts from scratch | The Bureau creates reusable reports and JSON – clean problem framing docs that can feed reviews, case studies, and future AI agents without rework. |
From multiple test runs:
- Deterministic structures reduced ambiguity and variance by 60%.
- Prompt reasoning transparency improved explainability for non-design stakeholders.
- Multi-lens scoring helped balance human empathy with organizational feasibility.
AI emerged not as a creative shortcut, but as a clarity multiplier, transforming noise into structure.
By embedding clarity, evidence, and determinism into our prompts, we teach machines how to reason like disciplined strategists.
When every insight is traceable and every decision grounded in evidence, teams design with confidence and problem framing becomes a repeatable craft.
Using the OpenAI Prompt Optimizer tool
OpenAI offers a dedicated Prompt Optimizer tool. Curiously, it is not built into the ChatGPT interface but lives on the OpenAI developer platform as a separate utility:
🔗 https://platform.openai.com/chat/edit?models=gpt-5&optimize=true
The tool is straightforward: you paste your prompt into an input field, and the optimizer refines it into a clearer, more effective version.