JAI Agent Assist

Real-time AI assistance for customer service agents

This demo showcases a real-time agent assist system that listens to customer conversations and surfaces relevant knowledge, recommended phrasing, and follow-up points — all within an embeddable widget that integrates into any agent desktop (Genesys Cloud, Amazon Connect, etc.). The demo uses a fictional credit card company ("Sakura Card") as the domain.

Architecture

The demo is built from four components, connected D → B → C → Worker:

• D (Customer Simulator): AI-powered customer; Gemini generates the customer's turns
• B (Agent Workstation): simulated agent desktop where the agent types responses
• C (JAI Agent Assist): the embeddable widget, JAI's actual deliverable
• Worker (Backend API): retrieval + AI orchestration on Cloudflare KV + Gemini 2.5 Flash

D pushes customer utterances to B via postMessage, B embeds C in an iframe, and C calls the Worker with POST /api/search.

Live URLs

Quick Start

2. Configure a persona + scenario, then click 通話開始 (Start Call)
3. When the customer speaks, type a Japanese response in the Workstation's "エージェントの返答" (agent reply) box and click 返答する (Reply)

Test Scenarios

Pre-built test cases with suggested agent responses. Click "Copy" to copy a line to clipboard, then paste into the Workstation.

Product Thinking & Roadmap

Open questions, scenario taxonomy, and next-phase ideas surfaced during MVP development.

Customer Utterance Taxonomy

Not every customer utterance is a question with a factual answer. Treating all input the same way is the most common failure mode of FAQ-based assistants. The system needs to classify what kind of utterance it just received before deciding what to surface.

Utterance type | Example (Japanese) | Agent / AI should…
Factual query | 海外で使えますか? 手数料はいくらですか? | Retrieve FAQ → present the specific answer
Instruction | カードを止めてください | Trigger the operational flow
Complaint | 全然つながらなくて、ずっと待たされました | Empathize + apologize + transition to the main concern
Emotional expression | 1〜2週間も…困るんです。現金もあまり持たない方なので… | Acknowledge the feeling + suggest a concrete next step. NO FAQ.
Small talk | 今日は寒いですね | Polite acknowledgment, redirect to the main topic

MVP currently handles Factual query and Instruction well. Emotional expression and Complaint are the highest-value next investments — they're the moments where human agents prove their worth, and where AI assistance can either help or actively get in the way.

Open Design Questions

These are questions the demo surfaced but does not resolve. Each represents a real product decision a deploying customer will need to make.

Q1: Strict retrieval vs. generative empathy — where's the line?

Some customers (and regulators, especially in banking) want the system to ONLY surface verified manual content. Others want flexible empathetic responses even when no FAQ matches. JAI should treat this as a configurable mode per deployment, not a fixed product decision.

Q2: How much conversation context should the AI see?

Single-turn input is cheap but misses pronouns and follow-ups. Full history is expensive and risks privacy concerns when sent to external LLMs. A rolling window of the last 3 turns plus a running summary is likely the right balance — but this needs validation with real call data.
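As a sketch of what that could look like when the widget builds the /api/search payload (the field names here are illustrative, not the demo's actual schema):

```js
// Bounded context for /api/search: last N turns verbatim, older context as a summary.
// `turns` is assumed to be an array of { role: "customer" | "agent", text } objects.
function buildSearchPayload(utterance, turns, runningSummary, windowSize = 3) {
  return {
    query: utterance,
    history: turns.slice(-windowSize), // only the recent turns reach the external LLM
    summary: runningSummary,           // compact stand-in for everything older
  };
}
```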

Q3: How does a deploying customer turn their existing knowledge into the knowledge base?

Three tiers of customer maturity:
• Tier 1 — customer has Word/PDF manuals only → upload → auto-chunk → auto-index
• Tier 2 — customer has structured FAQ in Confluence/SharePoint → connector → scheduled sync
• Tier 3 — customer has dynamic data (rates, policies that change weekly) → API push → version control
The platform needs to support all three, and the UX for "how do I know the AI is using the right version?" is critical.

Q4: What is the unit of feedback?

The 👍/👎 buttons collect agent satisfaction with the suggestion. But what does a 👎 mean? "Wrong FAQ"? "Wrong phrasing"? "Wrong timing"? Each requires different fixes. The platform needs structured feedback categories, not a single thumbs-down.

Q5: When should a card NOT appear?

Showing a low-confidence card is worse than showing nothing — it trains agents to ignore the panel. The system needs a confidence threshold below which it stays silent. Where to set that threshold is a per-deployment calibration problem.
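A sketch of that gate on the widget side. The score < 4 low-confidence cut comes from the v0.1.12 fix in the diary; the fully silent band below it is hypothetical and would need per-deployment calibration:

```js
// Thresholds are calibration values, not product constants.
const LOW_CONFIDENCE_THRESHOLD = 4; // below this, the MVP shows a 低信頼度 card instead of a FAQ
const SILENT_THRESHOLD = 2;         // hypothetical: below this, show nothing at all

function decideCard(topResult) {
  if (!topResult || topResult.score < SILENT_THRESHOLD) return { mode: "silent" };
  if (topResult.score < LOW_CONFIDENCE_THRESHOLD) return { mode: "low_confidence", card: topResult };
  return { mode: "recommend", card: topResult };
}
```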

Q6: What is the product boundary — does the AI generate specific phrases for the agent to say, or does it guide which procedural step the agent should be on?

Initial demo design had the AI generate verbatim agent phrasing (推奨応答). On reflection, this conflates two different products:

A) Phrase generation — AI produces the actual words the agent says. Requires deep industry-specific knowledge (banking compliance, medical liability, legal disclaimers) and per-customer customization of allowed/forbidden language. JAI would need to encode every customer's SOP into prompt logic.

B) Procedural navigation — AI identifies which step of the customer's existing SOP the conversation is on (e.g., "customer is in identity verification phase of card-loss workflow") and surfaces relevant resources. The customer's own SOP, authored in JAI's configuration backend, defines what each step looks like.

(B) is the cleaner boundary. It means: JAI doesn't need to know each industry's compliance rules — the customer encodes them once in their SOP. The same engine works across banking, telecom, insurance, healthcare — only the SOP changes. Liability for incorrect phrasing stays with the customer who authored the SOP, not with JAI. Customers retain control over their core asset: how their agents talk.

This reframing has significant implications for Phase 3 (configuration backend): the primary artifact customers manage is not a flat FAQ list, but a structured SOP — workflow nodes with associated knowledge resources, recommended actions, and step transitions. The FAQ-style retrieval becomes one piece of a broader SOP orchestration system.
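To make that concrete, one possible shape for a customer-authored SOP in the Phase 3 backend. Every field name below is hypothetical; the point is only that steps, transitions, and attached knowledge are the customer's data, not JAI's prompt logic:

```js
// Hypothetical SOP data for a card-loss workflow, authored by the deploying customer.
const cardLossSop = {
  id: "card-loss",
  steps: [
    {
      id: "identity-verification",
      label: "本人確認",                                   // the customer's own wording
      knowledge: ["faq-001"],                               // resources surfaced on this step
      recommendedActions: ["confirm registered name and date of birth"],
      transitions: [
        { to: "suspend-card", when: "identity confirmed" },
        { to: "escalate", when: "identity cannot be confirmed" },
      ],
    },
    // ...further steps: suspend-card, reissue-card, escalate
  ],
};
```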

Phased Roadmap

MVP demonstrates the core loop. The phases below outline what's needed to move from demo to real customer deployment.

Phase 1 — Current MVP (current)

End-to-end loop: customer utterance → AI intent classification → FAQ retrieval → AI-rewritten agent suggestion → display in embedded widget. Four components: Simulator (D), Workstation (B), Assist (C), Worker backend. Single fictional knowledge base (Sakura Card). Mock customer driven by AI persona. Utterance type classification with empathy and clarification modes.
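In Worker terms the loop looks roughly like this. extractIntent and generateAgentInstruction are the two Gemini integration points named in the diary; the surrounding glue (buildSpecialCard, loadFaqs, rankFaqs) is a sketch of helpers, not the deployed code:

```js
// Sketch of the /api/search orchestration inside the Worker.
async function handleSearch(request, env) {
  const { query, history } = await request.json();

  // 1. Classify the utterance, extract keywords and urgency (Gemini call #1).
  const intent = await extractIntent(query, history, env);

  // 2. Non-factual utterances short-circuit to empathy or clarification cards.
  if (intent.intent_type === "emotional_expression" || intent.intent_type === "unclear") {
    return Response.json({ cards: [buildSpecialCard(intent)] });
  }

  // 3. Load the FAQ set from KV, score it, keep the top hits.
  const faqs = await loadFaqs(env);
  const cards = rankFaqs(faqs, intent).slice(0, 3);

  // 4. Rewrite only the top card into spoken phrasing (Gemini call #2).
  cards[0].suggestion = await generateAgentInstruction(cards[0], intent, env);

  return Response.json({ cards });
}
```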

Phase 2 — Production-readiness (planned · 2-3 months engineering)

Confidence threshold with silent mode. Conversation history with rolling context window. Real audio input via JAI Speech (replace mock text input). Agent feedback dashboard with structured categories. Multi-knowledge-base support (one customer = multiple verticals). Complaint-specific handling with escalation detection.

Phase 3 — Customer-facing platform (future · 4-6 months including non-engineering)

Configuration backend: customer admins upload, edit, and version knowledge. Three-tier ingestion (upload / connector / API). Approval workflows and permissions. Audit logs for compliance review. Real Genesys integration via AudioHook + Interaction Widget. Customer trial deployment with UX iteration from real agents. Compliance review for regulated industries (financial services, insurance, healthcare).

Past Conversations

Recent demo call logs saved automatically when calls end. Click to view full transcript.

Tech Stack

Frontend Cloudflare Pages, vanilla HTML/CSS/JS
Backend Cloudflare Workers
Storage Cloudflare KV
AI Google Gemini 2.5 Flash
Version Control GitHub (private)

Project Status

Current version: MVP v0.1. End-to-end flow is working: AI customer simulation → FAQ retrieval with AI intent extraction → recommended spoken phrasing for agents with follow-up points.

Next steps: visual polish, feedback dashboard, multi-knowledge-base support, Genesys Cloud integration, analytics.

Project Diary

Granular log of what happened, what broke, what got changed. Continuously updated.

v0.1.17
Hub review pass: relabeled Worker, fixed duplicate period, manual mode now saves logs
  • Worker was labeled "A" but A is reserved for the Phase 3 configuration backend → relabeled as plain "Worker" (no letter)
  • Low-confidence card showed a duplicate period (です。。例えば) — fixed by stripping the trailing period from the AI fragment before template concatenation (see the sketch after this entry)
  • Manual mode in Simulator was saving call logs but missing "mode" field → added mode: "ai" | "manual" to saved logs with proper defaults for missing persona/scenario fields
  • Past Conversations: bumped cache-bust to v=3, updated empty state message to Japanese
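The duplicate-period fix above amounts to something like this (names illustrative):

```js
// Strip a trailing 。 or . from the AI fragment so the template's own 。 doesn't double up.
const stripTrailingPeriod = (fragment) => fragment.replace(/[。.]\s*$/, "");

const aiFragment = "お客様の契約状況の確認が必要です。";        // example AI output ending in its own period
const cardText = `${stripTrailingPeriod(aiFragment)}。例えば…`; // previously rendered as "…です。。例えば"
```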
v0.1.16
Reframed product boundary: procedural navigation, not phrase generation
  • Spent earlier discussion debating what specific phrasing AI should suggest (e.g., should agent ask for card number? for date of birth?). Realized this debate had no end — every customer company has different SOPs, JAI can't enumerate them
  • Pivot: AI's job is to identify which SOP step the conversation is on, not to author specific agent phrases
  • Implications for Phase 3 platform design: configuration backend manages structured SOPs (workflow + knowledge), not flat FAQs
  • Implications for C widget: should display "current step + next step" navigation, not "verbatim phrase to say"
  • This is a more defensible product boundary — JAI doesn't need industry-specific compliance knowledge, doesn't generate liable phrasing
v0.1.15
Added Open Design Q on confidence disclosure
  • Recent low-confidence card safety bug surfaced a deeper design question: how should the system communicate AI uncertainty without either being unsafe (silent confident wrongness) or breaking trust (constant disclaimers)
  • Added as Q6 in Open Design Questions section
v0.1.14
Fixed popup blocker + low-confidence card safety
  • Manual Mode "Workstationを開く" (Open Workstation) button was being blocked by the browser popup blocker → switched to having the user open the Workstation tab manually, with a BroadcastChannel connection and a status indicator (see the sketch after this entry)
  • Low-confidence card was visually identical to recommended response cards → agent could mistakenly read AI's meta-hint to customer on live call
  • Redesigned warning cards: amber dashed border, "⚠️ 確認が必要です" (confirmation needed) header, italic disclaimer "このメッセージは案内ではなく、エージェントへのヒントです" (this is a hint for the agent, not wording to read to the customer), plus AI-generated example clarification questions
  • Same redesign applied to both low_confidence and unclear/clarification card types
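A sketch of the BroadcastChannel wiring, assuming a channel name and message shapes chosen for illustration (setStatusIndicator and renderCustomerMessage are hypothetical UI helpers):

```js
// Simulator tab (Manual Mode): publish exact customer messages to whoever is listening.
const sim = new BroadcastChannel("jai-sim");          // channel name is illustrative
sim.onmessage = (e) => {
  if (e.data.type === "WORKSTATION_READY") setStatusIndicator("connected");
};
sim.postMessage({ type: "CUSTOMER_MESSAGE", text: "カードをなくしてしまって…" });

// Workstation tab: announce itself, then render whatever the Simulator sends.
const ws = new BroadcastChannel("jai-sim");
ws.postMessage({ type: "WORKSTATION_READY" });
ws.onmessage = (e) => {
  if (e.data.type === "CUSTOMER_MESSAGE") renderCustomerMessage(e.data.text);
};
```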
v0.1.13
Added manual mode to Simulator for controlled testing
  • AI-driven Simulator can't reproduce the same utterance twice, making bug regression testing hard
  • Workstation had its manual input removed earlier (to avoid two overlapping input boxes), so there was no way to inject precise customer messages
  • Decision: keep Workstation clean (agent-side only); add a mode toggle in Simulator (AI Mode / Manual Mode)
  • Manual Mode bypasses AI generation and lets me type exact customer messages — useful for debugging context bugs and rehearsing demos
v0.1.12
Wrong FAQ surfaced due to missing conversation context
  • 70代女性 card-loss scenario, turn 3 customer asks: "あの、すぐに、ですか?情報とか必要ですよね?"
  • AI surfaced ETCカード FAQ with recommended response about card application — completely wrong
  • Root cause: /api/search saw only current utterance, not conversation history. Keywords すぐに / 必要 / 情報 matched ETC FAQ by accident
  • Fixes: pass last 3 turns of history to /api/search and extractIntent prompt; add confidence threshold (score < 4 → show 低信頼度 card instead of wrong FAQ); add expandable "AIが使った文脈" debug link on cards
  • After fix: same query correctly identified as "カード停止手続きの即時性と必要情報に関する確認", no ETC card surfaced
v0.1.11
Discovered call log save is silently failing
  • Past Conversations section exists in Hub UI but stays empty after running calls
  • Investigating: save endpoint may not be wired, may not trigger on all end paths, or KV namespace may be missing
  • Diagnosed: Worker API and KV both working correctly — 3 logs existed. Root cause was browser caching a stale app.js that predated the loadCallLogs code
  • Fixed by adding cache-bust query string to script tag and redeploying
v0.1.10
Customer utterance taxonomy surfaced
  • Tested Sakura Card 70代女性 scenario: agent told customer card reissue takes 1-2 weeks
  • Customer responded with anxious self-talk: "現金もあまり持たない方なので…"
  • JAI matched keywords (買い物 / 支払い / 現金不要) and surfaced リボ払い FAQ — totally wrong
  • Realized: the system treats all utterances as questions. Identified 5 utterance types: factual query / instruction / complaint / emotional expression / small talk
  • Implemented intent_type classification in extractIntent. emotional_expression now returns empathy card instead of FAQ. unclear returns clarification card.
v0.1.9
Redesigned card content from action verbs to spoken phrasing
  • Initial AI output for agent suggestions used verbs: "利用停止を案内する", "海外利用可否を確認する"
  • During testing, realized agent has nothing to actually SAY — only a list of actions
  • Changed prompt to produce: (1) "推奨応答" — one full Japanese sentence agent can read aloud, (2) "補足ポイント" — short noun phrases for follow-up topics
  • Card UI restructured: 推奨応答 dominant at top, 補足 below in smaller font, FAQ source collapsed
v0.1.8
Gemini spend cap hit, investigated cost
  • Spend dashboard showed over-cap, all Gemini calls returning 429
  • Two root causes found: (1) Simulator was auto-generating customer messages on a timer even with no agent response → runaway loop, (2) each /api/search call triggered 4 Gemini requests (1 intent + 3 instruction rewrites)
  • Fixes: disabled auto-mode in Simulator; reduced to 2 Gemini calls per turn (intent + top card only); cards 2-3 show raw FAQ without AI rewrite
  • Added KV-based response caching with 1-hour TTL and wrangler tail logging for every Gemini call
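A sketch of the KV response cache, assuming a bound CACHE namespace; the key scheme and the runSearch helper are illustrative:

```js
// Cache Gemini-backed search responses in Workers KV for one hour.
async function cachedSearch(query, env) {
  const key = `search:${query}`;                      // illustrative key scheme
  const cached = await env.CACHE.get(key, { type: "json" });
  if (cached) return cached;

  const result = await runSearch(query, env);         // the uncached /api/search path
  await env.CACHE.put(key, JSON.stringify(result), { expirationTtl: 3600 }); // 1-hour TTL
  return result;
}
```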
v0.1.7
Made Simulator conversational
  • Initial Simulator generated one customer message per click, no awareness of agent replies
  • Added bidirectional messaging: agent types in B → posted back to D
  • D maintains full conversation history; passes it to Gemini on each new turn
  • Updated Gemini prompt to track history and decide call_should_end naturally
  • AI customer now ends call with ありがとうございました when concern is addressed
v0.1.6
Confirmed real Genesys not available, switched strategy
  • Tried Genesys free trial signup — confirmation email never arrived
  • Investigated: Genesys uses sales-qualified trial flow, not self-serve developer signup
  • Confirmed real developer sandboxes require enterprise subscription or AppFoundry Partner status
  • Decided: build complete simulation environment instead of waiting for real Genesys access
v0.1.5
Split frontend into Workstation (B) + Assist iframe (C)
  • Original setup: single page with everything mixed together
  • Refactored to two independently deployed Cloudflare Pages: workstation/ and assist/
  • Communication via window.postMessage (CUSTOMER_MESSAGE events)
  • Reason: future Genesys deployment will embed C directly into Interaction Widget; B is replaced by Genesys itself
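The cross-frame wiring is plain window.postMessage; a sketch with illustrative element ids, origins, and a hypothetical searchFaqs trigger:

```js
// Workstation (B): forward a customer utterance into the embedded Assist widget (C).
const assistFrame = document.getElementById("assist-iframe");
assistFrame.contentWindow.postMessage(
  { type: "CUSTOMER_MESSAGE", text: "海外で使えますか?" },
  "https://assist.example.pages.dev"                  // pin C's real origin in production
);

// Assist widget (C): receive the event and kick off /api/search.
window.addEventListener("message", (event) => {
  if (event.data?.type !== "CUSTOMER_MESSAGE") return;
  searchFaqs(event.data.text);
});
```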
v0.1.4
Knowledge base finalized as Sakura Card
  • Iterations: started with Seven Bank (too narrow) → NTT docomo (broader) → Yamato delivery (not phone-driven) → settled on credit card (high phone-support volume, urgent scenarios)
  • Final source: real Rakuten Card public FAQ, anonymized as "Sakura Card" / さくらポイント / Sakura App
  • 10 FAQs uploaded to Cloudflare KV: 紛失盗難, 不正利用, 限度額, 引き落とし, リボ, ポイント, 海外利用, ETC, 家族カード, 明細
v0.1.3
Added AI layer (Gemini)
  • Two integration points: extractIntent() for keywords/urgency, generateAgentInstruction() for agent phrasing
  • Both call Gemini gemini-2.5-flash; API key managed via wrangler secret
  • Test: "財布をどこかに置いてきちゃって、止めたいんですが" (no カード keyword) → correctly retrieved faq-001 via AI-extracted keywords
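Both integration points boil down to a call like the one below against the Generative Language API, with GEMINI_API_KEY as the wrangler secret mentioned above; the prompt construction and response parsing are simplified:

```js
// Minimal Gemini call from the Worker (shared by extractIntent / generateAgentInstruction).
async function callGemini(prompt, env) {
  const url =
    "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" +
    `?key=${env.GEMINI_API_KEY}`;
  const resp = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ contents: [{ parts: [{ text: prompt }] }] }),
  });
  const data = await resp.json();
  return data.candidates?.[0]?.content?.parts?.[0]?.text ?? "";
}
```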
v0.1.2
Built basic /api/search endpoint
  • Worker fetches all 10 FAQs from KV, scores each against query
  • Scoring: keyword hit +3, question hit +2, category hit +1, simple substring matching (see the sketch after this entry)
  • Returns top 3 results with CORS headers
  • Frontend renders results as cards: category badge, inferred question, answer, source, feedback buttons
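A sketch of that scoring rule; the FAQ field names (keywords, question, category) match the seed data and the weights are the ones listed above, while the term-matching details are simplified:

```js
// Score one FAQ against the extracted query terms: keyword hit +3, question hit +2, category hit +1.
function scoreFaq(faq, queryTerms) {
  let score = 0;
  for (const term of queryTerms) {
    if (faq.keywords.some((k) => k.includes(term) || term.includes(k))) score += 3;
    if (faq.question.includes(term)) score += 2;
    if (faq.category.includes(term)) score += 1;
  }
  return score;
}

// Rank all FAQs and keep the top 3, as /api/search returns.
const topThree = (faqs, terms) =>
  faqs
    .map((f) => ({ ...f, score: scoreFaq(f, terms) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, 3);
```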
v0.1.1
Knowledge base seeded
  • Created data/faqs.json with 10 entries: id, category, question (spoken Japanese), answer, source, keywords, source_url
  • KV namespace FAQS created; all 10 keys uploaded via scripts/upload-faqs.js
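For reference, one entry in data/faqs.json would look like this; the field names follow the schema above, but the values are invented, not copied from the real seed data:

```json
{
  "id": "faq-001",
  "category": "紛失盗難",
  "question": "カードをなくしてしまったのですが、どうすればいいですか?",
  "answer": "すぐにカードの利用停止手続きを行い、再発行についてご案内します。",
  "source": "Sakura Card 会員サポートFAQ",
  "keywords": ["紛失", "盗難", "止める", "利用停止"],
  "source_url": "https://example.com/faq/lost-card"
}
```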
v0.1.0
Project scaffolding deployed
  • npm create cloudflare@latest with Hello World Worker template, JavaScript, git enabled
  • Pushed to GitHub: yiwenshi-cmyk/agent-assist-demo (private)
  • Worker deployed to agent-assist-demo.shiyw1027.workers.dev
v0.0.3
Workflow setup
  • Started using Claude Code as executor; main Claude as planner/PM partner
  • Established pattern: Claude gives instructions in copyable blocks → user pastes to Claude Code → reports results back
v0.0.2
Genesys account signup attempted
  • Tried Genesys Developer Center — not a signup page, just docs
  • Tried free trial form — submitted but no confirmation email arrived
  • Found Genesys Cloud CX pricing is enterprise only — not suitable for personal demo
v0.0.1
Initial framing
  • Started from Channel squad RFC: Agent Assist for call centers, modeled on Google Agent Assist x Genesys
  • Decided to build a personal-account demo to validate the core hypothesis end-to-end before pushing for company resources
  • Initial scope: real Genesys integration → quickly revised after seeing access constraints