JAI Agent Assist

Real-time AI assistance for customer service agents

This demo showcases a real-time agent assist system that listens to customer conversations and surfaces relevant knowledge, recommended phrasing, and follow-up points — all within an embeddable widget that integrates into any agent desktop (Genesys Cloud, Amazon Connect, etc.). The demo uses a fictional credit card company ("Sakura Card") as the domain.

Architecture

The demo is built from four components, connected D → B → C → Worker:

• D (Customer Simulator): AI-powered customer; Gemini generates the customer's turns
• B (Agent Workstation): simulated agent desktop where the agent types responses
• C (JAI Agent Assist): the embeddable widget, JAI's actual deliverable
• Worker (Backend API): retrieval + AI orchestration on Cloudflare KV + Gemini 2.5 Flash

D pushes customer utterances to B via postMessage, B embeds C in an iframe, and C calls the Worker with POST /api/search.

Live URLs

Quick Start

2. Configure a persona + scenario, then click 通話開始 (Start Call)
3. When the customer speaks, type a Japanese response in the Workstation's "エージェントの返答" (agent reply) box and click 返答する (Reply)

Test Scenarios

Pre-built test cases with suggested agent responses. Click "Copy" to copy a line to clipboard, then paste into the Workstation.

Product Thinking & Roadmap

Open questions, scenario taxonomy, and next-phase ideas surfaced during MVP development.

Customer Utterance Taxonomy

Not every customer utterance is a question with a factual answer. Treating all input the same way is the most common failure mode of FAQ-based assistants. The system needs to classify what kind of utterance it just received before deciding what to surface.

Utterance type | Example (Japanese) | Agent / AI should…
Factual query | 海外で使えますか? 手数料はいくらですか? | Retrieve FAQ → present the specific answer
Instruction | カードを止めてください | Trigger the operational flow
Complaint | 全然つながらなくて、ずっと待たされました | Empathize + apologize + transition to the main concern
Emotional expression | 1〜2週間も…困るんです。現金もあまり持たない方なので… | Acknowledge the feeling + suggest a concrete next step. NO FAQ.
Small talk | 今日は寒いですね | Polite acknowledgment, redirect to the main topic

MVP currently handles Factual query and Instruction well. Emotional expression and Complaint are the highest-value next investments — they're the moments where human agents prove their worth, and where AI assistance can either help or actively get in the way.

Open Design Questions

These are questions the demo surfaced but does not resolve. Each represents a real product decision a deploying customer will need to make.

Q1: Strict retrieval vs. generative empathy — where's the line?

Some customers (and regulators, especially in banking) want the system to ONLY surface verified manual content. Others want flexible empathetic responses even when no FAQ matches. JAI should treat this as a configurable mode per deployment, not a fixed product decision.

Q2: How much conversation context should the AI see?

Single-turn input is cheap but misses pronouns and follow-ups. Full history is expensive and risks privacy concerns when sent to external LLMs. A rolling window of the last 3 turns plus a running summary is likely the right balance — but this needs validation with real call data.
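As a sketch of what that could look like when the widget builds the /api/search payload (the field names here are illustrative, not the demo's actual schema):

```js
// Bounded context for /api/search: last N turns verbatim, older context as a summary.
// `turns` is assumed to be an array of { role: "customer" | "agent", text } objects.
function buildSearchPayload(utterance, turns, runningSummary, windowSize = 3) {
  return {
    query: utterance,
    history: turns.slice(-windowSize), // only the recent turns reach the external LLM
    summary: runningSummary,           // compact stand-in for everything older
  };
}
```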

Q3: How does a deploying customer turn their existing knowledge into the knowledge base?

Three tiers of customer maturity:
• Tier 1 — customer has Word/PDF manuals only → upload → auto-chunk → auto-index
• Tier 2 — customer has structured FAQ in Confluence/SharePoint → connector → scheduled sync
• Tier 3 — customer has dynamic data (rates, policies that change weekly) → API push → version control
The platform needs to support all three, and the UX for "how do I know the AI is using the right version?" is critical.

Q4: What is the unit of feedback?

The 👍/👎 buttons collect agent satisfaction with the suggestion. But what does a 👎 mean? "Wrong FAQ"? "Wrong phrasing"? "Wrong timing"? Each requires different fixes. The platform needs structured feedback categories, not a single thumbs-down.

Q5: When should a card NOT appear?

Showing a low-confidence card is worse than showing nothing — it trains agents to ignore the panel. The system needs a confidence threshold below which it stays silent. Where to set that threshold is a per-deployment calibration problem.
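A sketch of that gate on the widget side. The score < 4 low-confidence cut comes from the v0.1.12 fix in the diary; the fully silent band below it is hypothetical and would need per-deployment calibration:

```js
// Thresholds are calibration values, not product constants.
const LOW_CONFIDENCE_THRESHOLD = 4; // below this, the MVP shows a 低信頼度 card instead of a FAQ
const SILENT_THRESHOLD = 2;         // hypothetical: below this, show nothing at all

function decideCard(topResult) {
  if (!topResult || topResult.score < SILENT_THRESHOLD) return { mode: "silent" };
  if (topResult.score < LOW_CONFIDENCE_THRESHOLD) return { mode: "low_confidence", card: topResult };
  return { mode: "recommend", card: topResult };
}
```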

Q6: What is the product boundary — does the AI generate specific phrases for the agent to say, or does it guide which procedural step the agent should be on?

Initial demo design had the AI generate verbatim agent phrasing (推奨応答). On reflection, this conflates two different products:

A) Phrase generation — AI produces the actual words the agent says. Requires deep industry-specific knowledge (banking compliance, medical liability, legal disclaimers) and per-customer customization of allowed/forbidden language. JAI would need to encode every customer's SOP into prompt logic.

B) Procedural navigation — AI identifies which step of the customer's existing SOP the conversation is on (e.g., "customer is in identity verification phase of card-loss workflow") and surfaces relevant resources. The customer's own SOP, authored in JAI's configuration backend, defines what each step looks like.

(B) is the cleaner boundary. It means: JAI doesn't need to know each industry's compliance rules — the customer encodes them once in their SOP. The same engine works across banking, telecom, insurance, healthcare — only the SOP changes. Liability for incorrect phrasing stays with the customer who authored the SOP, not with JAI. Customers retain control over their core asset: how their agents talk.

This reframing has significant implications for Phase 3 (configuration backend): the primary artifact customers manage is not a flat FAQ list, but a structured SOP — workflow nodes with associated knowledge resources, recommended actions, and step transitions. The FAQ-style retrieval becomes one piece of a broader SOP orchestration system.
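To make that concrete, one possible shape for a customer-authored SOP in the Phase 3 backend. Every field name below is hypothetical; the point is only that steps, transitions, and attached knowledge are the customer's data, not JAI's prompt logic:

```js
// Hypothetical SOP data for a card-loss workflow, authored by the deploying customer.
const cardLossSop = {
  id: "card-loss",
  steps: [
    {
      id: "identity-verification",
      label: "本人確認",                                   // the customer's own wording
      knowledge: ["faq-001"],                               // resources surfaced on this step
      recommendedActions: ["confirm registered name and date of birth"],
      transitions: [
        { to: "suspend-card", when: "identity confirmed" },
        { to: "escalate", when: "identity cannot be confirmed" },
      ],
    },
    // ...further steps: suspend-card, reissue-card, escalate
  ],
};
```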

Phased Roadmap

MVP demonstrates the core loop. The phases below outline what's needed to move from demo to real customer deployment.

Phase 1 — Current MVP (current)

End-to-end loop: customer utterance → AI intent classification → FAQ retrieval → AI-rewritten agent suggestion → display in embedded widget. Four components: Simulator (D), Workstation (B), Assist (C), Worker backend. Single fictional knowledge base (Sakura Card). Mock customer driven by AI persona. Utterance type classification with empathy and clarification modes.
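In Worker terms the loop looks roughly like this. extractIntent and generateAgentInstruction are the two Gemini integration points named in the diary; the surrounding glue (buildSpecialCard, loadFaqs, rankFaqs) is a sketch of helpers, not the deployed code:

```js
// Sketch of the /api/search orchestration inside the Worker.
async function handleSearch(request, env) {
  const { query, history } = await request.json();

  // 1. Classify the utterance, extract keywords and urgency (Gemini call #1).
  const intent = await extractIntent(query, history, env);

  // 2. Non-factual utterances short-circuit to empathy or clarification cards.
  if (intent.intent_type === "emotional_expression" || intent.intent_type === "unclear") {
    return Response.json({ cards: [buildSpecialCard(intent)] });
  }

  // 3. Load the FAQ set from KV, score it, keep the top hits.
  const faqs = await loadFaqs(env);
  const cards = rankFaqs(faqs, intent).slice(0, 3);

  // 4. Rewrite only the top card into spoken phrasing (Gemini call #2).
  cards[0].suggestion = await generateAgentInstruction(cards[0], intent, env);

  return Response.json({ cards });
}
```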

Phase 2 — Production-readiness (planned · 2-3 months engineering)

Confidence threshold with silent mode. Conversation history with rolling context window. Real audio input via JAI Speech (replace mock text input). Agent feedback dashboard with structured categories. Multi-knowledge-base support (one customer = multiple verticals). Complaint-specific handling with escalation detection.

Phase 3 — Customer-facing platform (future · 4-6 months including non-engineering)

Configuration backend: customer admins upload, edit, and version knowledge. Three-tier ingestion (upload / connector / API). Approval workflows and permissions. Audit logs for compliance review. Real Genesys integration via AudioHook + Interaction Widget. Customer trial deployment with UX iteration from real agents. Compliance review for regulated industries (financial services, insurance, healthcare).

Past Conversations

Recent demo call logs saved automatically when calls end. Click to view full transcript.

Tech Stack

Frontend Cloudflare Pages, vanilla HTML/CSS/JS
Backend Cloudflare Workers
Storage Cloudflare KV
AI Google Gemini 2.5 Flash
Version Control GitHub (private)

Project Status

Current version: MVP v0.1. End-to-end flow is working: AI customer simulation → FAQ retrieval with AI intent extraction → recommended spoken phrasing for agents with follow-up points.

Next steps: visual polish, feedback dashboard, multi-knowledge-base support, Genesys Cloud integration, analytics.

Project Diary

Granular log of what happened, what broke, what got changed. Continuously updated.

v0.1.17
Hub review pass: relabeled Worker, fixed duplicate period, manual mode now saves logs
  • Worker was labeled "A" but A is reserved for the Phase 3 configuration backend → relabeled as plain "Worker" (no letter)
  • Low-confidence card showed a duplicate period (です。。例えば) — fixed by stripping the trailing period from the AI fragment before template concatenation (see the sketch after this entry)
  • Manual mode in Simulator was saving call logs but missing "mode" field → added mode: "ai" | "manual" to saved logs with proper defaults for missing persona/scenario fields
  • Past Conversations: bumped cache-bust to v=3, updated empty state message to Japanese
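The duplicate-period fix above amounts to something like this (names illustrative):

```js
// Strip a trailing 。 or . from the AI fragment so the template's own 。 doesn't double up.
const stripTrailingPeriod = (fragment) => fragment.replace(/[。.]\s*$/, "");

const aiFragment = "お客様の契約状況の確認が必要です。";        // example AI output ending in its own period
const cardText = `${stripTrailingPeriod(aiFragment)}。例えば…`; // previously rendered as "…です。。例えば"
```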
v0.1.16
Reframed product boundary: procedural navigation, not phrase generation
  • Spent earlier discussion debating what specific phrasing AI should suggest (e.g., should agent ask for card number? for date of birth?). Realized this debate had no end — every customer company has different SOPs, JAI can't enumerate them
  • Pivot: AI's job is to identify which SOP step the conversation is on, not to author specific agent phrases
  • Implications for Phase 3 platform design: configuration backend manages structured SOPs (workflow + knowledge), not flat FAQs
  • Implications for C widget: should display "current step + next step" navigation, not "verbatim phrase to say"
  • This is a more defensible product boundary — JAI doesn't need industry-specific compliance knowledge, doesn't generate liable phrasing
v0.1.15
Added Open Design Q on confidence disclosure
  • Recent low-confidence card safety bug surfaced a deeper design question: how should the system communicate AI uncertainty without either being unsafe (silent confident wrongness) or breaking trust (constant disclaimers)
  • Added as Q6 in Open Design Questions section
v0.1.14
Fixed popup blocker + low-confidence card safety
  • Manual Mode "Workstationを開く" (Open Workstation) button was being blocked by the browser popup blocker → switched to having the user open the Workstation tab manually, with a BroadcastChannel connection and a status indicator (see the sketch after this entry)
  • Low-confidence card was visually identical to recommended response cards → agent could mistakenly read AI's meta-hint to customer on live call
  • Redesigned warning cards: amber dashed border, "⚠️ 確認が必要です" (confirmation needed) header, italic disclaimer "このメッセージは案内ではなく、エージェントへのヒントです" (this is a hint for the agent, not wording to read to the customer), plus AI-generated example clarification questions
  • Same redesign applied to both low_confidence and unclear/clarification card types
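A sketch of the BroadcastChannel wiring, assuming a channel name and message shapes chosen for illustration (setStatusIndicator and renderCustomerMessage are hypothetical UI helpers):

```js
// Simulator tab (Manual Mode): publish exact customer messages to whoever is listening.
const sim = new BroadcastChannel("jai-sim");          // channel name is illustrative
sim.onmessage = (e) => {
  if (e.data.type === "WORKSTATION_READY") setStatusIndicator("connected");
};
sim.postMessage({ type: "CUSTOMER_MESSAGE", text: "カードをなくしてしまって…" });

// Workstation tab: announce itself, then render whatever the Simulator sends.
const ws = new BroadcastChannel("jai-sim");
ws.postMessage({ type: "WORKSTATION_READY" });
ws.onmessage = (e) => {
  if (e.data.type === "CUSTOMER_MESSAGE") renderCustomerMessage(e.data.text);
};
```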
v0.1.13
Added manual mode to Simulator for controlled testing
  • AI-driven Simulator can't reproduce the same utterance twice, making bug regression testing hard
  • Workstation had its manual input removed earlier (to avoid two overlapping input boxes), so there was no way to inject precise customer messages
  • Decision: keep Workstation clean (agent-side only); add a mode toggle in Simulator (AI Mode / Manual Mode)
  • Manual Mode bypasses AI generation and lets me type exact customer messages — useful for debugging context bugs and rehearsing demos
v0.1.12
Wrong FAQ surfaced due to missing conversation context
  • 70代女性 card-loss scenario, turn 3 customer asks: "あの、すぐに、ですか?情報とか必要ですよね?"
  • AI surfaced ETCカード FAQ with recommended response about card application — completely wrong
  • Root cause: /api/search saw only current utterance, not conversation history. Keywords すぐに / 必要 / 情報 matched ETC FAQ by accident
  • Fixes: pass last 3 turns of history to /api/search and extractIntent prompt; add confidence threshold (score < 4 → show 低信頼度 card instead of wrong FAQ); add expandable "AIが使った文脈" debug link on cards
  • After fix: same query correctly identified as "カード停止手続きの即時性と必要情報に関する確認", no ETC card surfaced
v0.1.11
Discovered call log save is silently failing
  • Past Conversations section exists in Hub UI but stays empty after running calls
  • Investigating: save endpoint may not be wired, may not trigger on all end paths, or KV namespace may be missing
  • Diagnosed: Worker API and KV both working correctly — 3 logs existed. Root cause was browser caching a stale app.js that predated the loadCallLogs code
  • Fixed by adding cache-bust query string to script tag and redeploying
v0.1.10
Customer utterance taxonomy surfaced
  • Tested Sakura Card 70代女性 scenario: agent told customer card reissue takes 1-2 weeks
  • Customer responded with anxious self-talk: "現金もあまり持たない方なので…"
  • JAI matched keywords (買い物 / 支払い / 現金不要) and surfaced リボ払い FAQ — totally wrong
  • Realized: the system treats all utterances as questions. Identified 5 utterance types: factual query / instruction / complaint / emotional expression / small talk
  • Implemented intent_type classification in extractIntent. emotional_expression now returns empathy card instead of FAQ. unclear returns clarification card.
v0.1.9
Redesigned card content from action verbs to spoken phrasing
  • Initial AI output for agent suggestions used verbs: "利用停止を案内する", "海外利用可否を確認する"
  • During testing, realized agent has nothing to actually SAY — only a list of actions
  • Changed prompt to produce: (1) "推奨応答" — one full Japanese sentence agent can read aloud, (2) "補足ポイント" — short noun phrases for follow-up topics
  • Card UI restructured: 推奨応答 dominant at top, 補足 below in smaller font, FAQ source collapsed
v0.1.8
Gemini spend cap hit, investigated cost
  • Spend dashboard showed over-cap, all Gemini calls returning 429
  • Two root causes found: (1) Simulator was auto-generating customer messages on a timer even with no agent response → runaway loop, (2) each /api/search call triggered 4 Gemini requests (1 intent + 3 instruction rewrites)
  • Fixes: disabled auto-mode in Simulator; reduced to 2 Gemini calls per turn (intent + top card only); cards 2-3 show raw FAQ without AI rewrite
  • Added KV-based response caching with 1-hour TTL and wrangler tail logging for every Gemini call
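A sketch of the KV response cache, assuming a bound CACHE namespace; the key scheme and the runSearch helper are illustrative:

```js
// Cache Gemini-backed search responses in Workers KV for one hour.
async function cachedSearch(query, env) {
  const key = `search:${query}`;                      // illustrative key scheme
  const cached = await env.CACHE.get(key, { type: "json" });
  if (cached) return cached;

  const result = await runSearch(query, env);         // the uncached /api/search path
  await env.CACHE.put(key, JSON.stringify(result), { expirationTtl: 3600 }); // 1-hour TTL
  return result;
}
```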
v0.1.7
Made Simulator conversational
  • Initial Simulator generated one customer message per click, no awareness of agent replies
  • Added bidirectional messaging: agent types in B → posted back to D
  • D maintains full conversation history; passes it to Gemini on each new turn
  • Updated Gemini prompt to track history and decide call_should_end naturally
  • AI customer now ends call with ありがとうございました when concern is addressed
v0.1.6
Confirmed real Genesys not available, switched strategy
  • Tried Genesys free trial signup — confirmation email never arrived
  • Investigated: Genesys uses sales-qualified trial flow, not self-serve developer signup
  • Confirmed real developer sandboxes require enterprise subscription or AppFoundry Partner status
  • Decided: build complete simulation environment instead of waiting for real Genesys access
v0.1.5
Split frontend into Workstation (B) + Assist iframe (C)
  • Original setup: single page with everything mixed together
  • Refactored to two independently deployed Cloudflare Pages: workstation/ and assist/
  • Communication via window.postMessage (CUSTOMER_MESSAGE events)
  • Reason: future Genesys deployment will embed C directly into Interaction Widget; B is replaced by Genesys itself
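The cross-frame wiring is plain window.postMessage; a sketch with illustrative element ids, origins, and a hypothetical searchFaqs trigger:

```js
// Workstation (B): forward a customer utterance into the embedded Assist widget (C).
const assistFrame = document.getElementById("assist-iframe");
assistFrame.contentWindow.postMessage(
  { type: "CUSTOMER_MESSAGE", text: "海外で使えますか?" },
  "https://assist.example.pages.dev"                  // pin C's real origin in production
);

// Assist widget (C): receive the event and kick off /api/search.
window.addEventListener("message", (event) => {
  if (event.data?.type !== "CUSTOMER_MESSAGE") return;
  searchFaqs(event.data.text);
});
```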
v0.1.4
Knowledge base finalized as Sakura Card
  • Iterations: started with Seven Bank (too narrow) → NTT docomo (broader) → Yamato delivery (not phone-driven) → settled on credit card (high phone-support volume, urgent scenarios)
  • Final source: real Rakuten Card public FAQ, anonymized as "Sakura Card" / さくらポイント / Sakura App
  • 10 FAQs uploaded to Cloudflare KV: 紛失盗難, 不正利用, 限度額, 引き落とし, リボ, ポイント, 海外利用, ETC, 家族カード, 明細
v0.1.3
Added AI layer (Gemini)
  • Two integration points: extractIntent() for keywords/urgency, generateAgentInstruction() for agent phrasing
  • Both call Gemini gemini-2.5-flash; API key managed via wrangler secret
  • Test: "財布をどこかに置いてきちゃって、止めたいんですが" (no カード keyword) → correctly retrieved faq-001 via AI-extracted keywords
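Both integration points boil down to a call like the one below against the Generative Language API, with GEMINI_API_KEY as the wrangler secret mentioned above; the prompt construction and response parsing are simplified:

```js
// Minimal Gemini call from the Worker (shared by extractIntent / generateAgentInstruction).
async function callGemini(prompt, env) {
  const url =
    "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" +
    `?key=${env.GEMINI_API_KEY}`;
  const resp = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ contents: [{ parts: [{ text: prompt }] }] }),
  });
  const data = await resp.json();
  return data.candidates?.[0]?.content?.parts?.[0]?.text ?? "";
}
```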
v0.1.2
Built basic /api/search endpoint
  • Worker fetches all 10 FAQs from KV, scores each against query
  • Scoring: keyword hit +3, question hit +2, category hit +1, simple substring matching (see the sketch after this entry)
  • Returns top 3 results with CORS headers
  • Frontend renders results as cards: category badge, inferred question, answer, source, feedback buttons
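A sketch of that scoring rule; the FAQ field names (keywords, question, category) match the seed data and the weights are the ones listed above, while the term-matching details are simplified:

```js
// Score one FAQ against the extracted query terms: keyword hit +3, question hit +2, category hit +1.
function scoreFaq(faq, queryTerms) {
  let score = 0;
  for (const term of queryTerms) {
    if (faq.keywords.some((k) => k.includes(term) || term.includes(k))) score += 3;
    if (faq.question.includes(term)) score += 2;
    if (faq.category.includes(term)) score += 1;
  }
  return score;
}

// Rank all FAQs and keep the top 3, as /api/search returns.
const topThree = (faqs, terms) =>
  faqs
    .map((f) => ({ ...f, score: scoreFaq(f, terms) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, 3);
```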
v0.1.1
Knowledge base seeded
  • Created data/faqs.json with 10 entries: id, category, question (spoken Japanese), answer, source, keywords, source_url
  • KV namespace FAQS created; all 10 keys uploaded via scripts/upload-faqs.js
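For reference, one entry in data/faqs.json would look like this; the field names follow the schema above, but the values are invented, not copied from the real seed data:

```json
{
  "id": "faq-001",
  "category": "紛失盗難",
  "question": "カードをなくしてしまったのですが、どうすればいいですか?",
  "answer": "すぐにカードの利用停止手続きを行い、再発行についてご案内します。",
  "source": "Sakura Card 会員サポートFAQ",
  "keywords": ["紛失", "盗難", "止める", "利用停止"],
  "source_url": "https://example.com/faq/lost-card"
}
```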
v0.1.0
Project scaffolding deployed
  • npm create cloudflare@latest with Hello World Worker template, JavaScript, git enabled
  • Pushed to GitHub: yiwenshi-cmyk/agent-assist-demo (private)
  • Worker deployed to agent-assist-demo.shiyw1027.workers.dev
v0.0.3
Workflow setup
  • Started using Claude Code as executor; main Claude as planner/PM partner
  • Established pattern: Claude gives instructions in copyable blocks → user pastes to Claude Code → reports results back
v0.0.2
Genesys account signup attempted
  • Tried Genesys Developer Center — not a signup page, just docs
  • Tried free trial form — submitted but no confirmation email arrived
  • Found Genesys Cloud CX pricing is enterprise only — not suitable for personal demo
v0.0.1
Initial framing
  • Started from Channel squad RFC: Agent Assist for call centers, modeled on Google Agent Assist x Genesys
  • Decided to build a personal-account demo to validate the core hypothesis end-to-end before pushing for company resources
  • Initial scope: real Genesys integration → quickly revised after seeing access constraints