Real-time AI assistance for customer service agents
This demo showcases a real-time agent assist system that listens to customer conversations and surfaces relevant knowledge, recommended phrasing, and follow-up points — all within an embeddable widget that integrates into any agent desktop (Genesys Cloud, Amazon Connect, etc.). The demo uses a fictional credit card company ("Sakura Card") as the domain.
Pre-built test cases with suggested agent responses. Click "Copy" to copy a line to clipboard, then paste into the Workstation.
Open questions, scenario taxonomy, and next-phase ideas surfaced during MVP development.
Not every customer utterance is a question with a factual answer. Treating all input the same way is the most common failure mode of FAQ-based assistants. The system needs to classify what kind of utterance it just received before deciding what to surface.
| Utterance Type | Example (Japanese) | Agent / AI should… |
|---|---|---|
| Factual query | 海外で使えますか?手数料はいくらですか? ("Can I use it overseas? What are the fees?") | Retrieve FAQ → present specific answer |
| Instruction | カードを止めてください ("Please freeze my card") | Trigger operational flow |
| Complaint | 全然つながらなくて、ずっと待たされました ("I couldn't get through at all and was kept waiting forever") | Empathize + apologize + transition to main concern |
| Emotional expression | 1〜2週間も…困るんです。現金もあまり持たない方なので… ("One or two weeks…? That's a real problem. I don't usually carry much cash…") | Acknowledge feeling + suggest concrete next step. NO FAQ. |
| Small talk | 今日は寒いですね ("Cold today, isn't it?") | Polite acknowledgment, redirect to main topic |
MVP currently handles Factual query and Instruction well. Emotional expression and Complaint are the highest-value next investments — they're the moments where human agents prove their worth, and where AI assistance can either help or actively get in the way.
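The taxonomy above amounts to a routing table: classify first, then pick the assist behavior. A minimal sketch in Python — every name here (`UtteranceType`, `route`, the action strings) is illustrative, not taken from the demo's codebase:

```python
from enum import Enum

class UtteranceType(Enum):
    FACTUAL_QUERY = "factual_query"
    INSTRUCTION = "instruction"
    COMPLAINT = "complaint"
    EMOTIONAL = "emotional"
    SMALL_TALK = "small_talk"

def route(utterance_type: UtteranceType) -> str:
    """Map a classified utterance to the action the assist widget should take."""
    actions = {
        UtteranceType.FACTUAL_QUERY: "retrieve_faq",
        UtteranceType.INSTRUCTION: "trigger_operational_flow",
        UtteranceType.COMPLAINT: "show_empathy_then_clarify",
        # Deliberately no FAQ retrieval for emotional expressions.
        UtteranceType.EMOTIONAL: "acknowledge_and_suggest_next_step",
        UtteranceType.SMALL_TALK: "stay_silent",
    }
    return actions[utterance_type]
```

The point of the indirection is that "what to surface" becomes a per-type policy that can be tuned without touching the classifier.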
These are questions the demo surfaced but does not resolve. Each represents a real product decision a deploying customer will need to make.
Some customers (and regulators, especially in banking) want the system to ONLY surface verified manual content. Others want flexible empathetic responses even when no FAQ matches. JAI should treat this as a configurable mode per deployment, not a fixed product decision.
Single-turn input is cheap but misses pronouns and follow-ups. Full history is expensive and risks privacy concerns when sent to external LLMs. A rolling window of the last 3 turns plus a running summary is likely the right balance — but this needs validation with real call data.
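The rolling-window-plus-summary idea can be sketched as follows; the `summarize` callback (which would wrap an LLM call in practice) and all other names are assumptions for illustration:

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class RollingContext:
    """Keep the last N turns verbatim plus a running summary of older turns."""
    max_turns: int = 3
    summary: str = ""
    turns: deque = field(default_factory=deque)

    def add_turn(self, speaker: str, text: str, summarize) -> None:
        self.turns.append((speaker, text))
        while len(self.turns) > self.max_turns:
            evicted = self.turns.popleft()
            # Fold evicted turns into the summary instead of dropping them,
            # so pronoun referents from earlier turns survive in compressed form.
            self.summary = summarize(self.summary, evicted)

    def prompt_context(self) -> str:
        recent = "\n".join(f"{s}: {t}" for s, t in self.turns)
        return f"Summary so far: {self.summary}\nRecent turns:\n{recent}"
```

Only `prompt_context()` is sent to the external LLM, which bounds both cost and the amount of raw transcript leaving the deployment.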
Three tiers of customer maturity:

- Tier 1 — customer has Word/PDF manuals only → upload → auto-chunk → auto-index.
- Tier 2 — customer has structured FAQ in Confluence/SharePoint → connector → scheduled sync.
- Tier 3 — customer has dynamic data (rates, policies that change weekly) → API push → version control.

The platform needs to support all three, and the UX for "how do I know the AI is using the right version?" is critical.
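A data-model sketch of the three tiers, with the source version carried alongside the content so it can be surfaced in the UI; the type and field names are hypothetical:

```python
from dataclasses import dataclass
from enum import Enum

class IngestionTier(Enum):
    UPLOAD = "upload"        # Tier 1: Word/PDF -> auto-chunk -> auto-index
    CONNECTOR = "connector"  # Tier 2: Confluence/SharePoint -> scheduled sync
    API_PUSH = "api_push"    # Tier 3: dynamic data -> push + version control

@dataclass
class KnowledgeSource:
    name: str
    tier: IngestionTier
    version: str  # shown next to every suggestion, answering
                  # "which version is the AI actually using?"
```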
The 👍/👎 buttons collect agent satisfaction with the suggestion. But what does a 👎 mean? "Wrong FAQ"? "Wrong phrasing"? "Wrong timing"? Each requires different fixes. The platform needs structured feedback categories, not a single thumbs-down.
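A structured feedback payload makes the 👎 actionable by forcing a reason; the categories and names below are illustrative, not the demo's schema:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class FeedbackReason(Enum):
    WRONG_FAQ = "wrong_faq"            # retrieval picked the wrong article
    WRONG_PHRASING = "wrong_phrasing"  # right content, bad wording
    WRONG_TIMING = "wrong_timing"      # right card, surfaced too early or late
    OTHER = "other"

@dataclass
class SuggestionFeedback:
    suggestion_id: str
    helpful: bool
    reason: Optional[FeedbackReason] = None  # required when helpful is False
    comment: str = ""

def validate(fb: SuggestionFeedback) -> bool:
    # A thumbs-down without a reason is not actionable; reject it.
    return fb.helpful or fb.reason is not None
```

Each reason maps to a different fix owner: `WRONG_FAQ` goes to the knowledge team, `WRONG_PHRASING` to prompt tuning, `WRONG_TIMING` to the trigger logic.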
Showing a low-confidence card is worse than showing nothing — it trains agents to ignore the panel. The system needs a confidence threshold below which it stays silent. Where to set that threshold is a per-deployment calibration problem.
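The silent-mode gate is a one-liner in spirit: surface the best candidate only if it clears the deployment's threshold. A sketch (function name and default threshold are assumptions):

```python
from typing import Optional

def choose_card(candidates: list[tuple[str, float]],
                threshold: float = 0.7) -> Optional[str]:
    """Return the top card only if its confidence clears the threshold.

    Returning None means the panel stays silent — better than showing a
    low-confidence card that trains agents to ignore the widget.
    """
    if not candidates:
        return None
    card, score = max(candidates, key=lambda c: c[1])
    return card if score >= threshold else None
```

The threshold itself should be a per-deployment config value, calibrated against agent feedback rather than hard-coded.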
Initial demo design had the AI generate verbatim agent phrasing (推奨応答, "recommended responses"). On reflection, this conflates two different products:
A) Phrase generation — AI produces the actual words the agent says. Requires deep industry-specific knowledge (banking compliance, medical liability, legal disclaimers) and per-customer customization of allowed/forbidden language. JAI would need to encode every customer's SOP into prompt logic.
B) Procedural navigation — AI identifies which step of the customer's existing SOP the conversation is on (e.g., "customer is in identity verification phase of card-loss workflow") and surfaces relevant resources. The customer's own SOP, authored in JAI's configuration backend, defines what each step looks like.
(B) is the cleaner boundary. It means:

- JAI doesn't need to know each industry's compliance rules — the customer encodes them once in their SOP.
- The same engine works across banking, telecom, insurance, healthcare — only the SOP changes.
- Liability for incorrect phrasing stays with the customer who authored the SOP, not with JAI.
- Customers retain control over their core asset: how their agents talk.
This reframing has significant implications for Phase 3 (configuration backend): the primary artifact customers manage is not a flat FAQ list, but a structured SOP — workflow nodes with associated knowledge resources, recommended actions, and step transitions. The FAQ-style retrieval becomes one piece of a broader SOP orchestration system.
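A minimal sketch of what an SOP-as-data-structure could look like — workflow nodes with resources, actions, and event-driven transitions. All names are hypothetical, not the configuration backend's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class SOPStep:
    """One node in a customer-authored SOP workflow."""
    step_id: str
    title: str
    resources: list[str] = field(default_factory=list)   # knowledge articles to surface
    actions: list[str] = field(default_factory=list)     # recommended agent actions
    transitions: dict[str, str] = field(default_factory=dict)  # event -> next step_id

def next_step(sop: dict[str, SOPStep], current: str, event: str) -> str:
    """Advance through the SOP; stay on the current step if the event is unmapped."""
    return sop[current].transitions.get(event, current)
```

Under this model the AI's job shrinks to emitting events ("identity verified", "customer reports loss"), while the customer's SOP — not JAI's prompts — decides what the agent sees next.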
MVP demonstrates the core loop. The phases below outline what's needed to move from demo to real customer deployment.
- End-to-end loop: customer utterance → AI intent classification → FAQ retrieval → AI-rewritten agent suggestion → display in embedded widget.
- Four components: Simulator (D), Workstation (B), Assist (C), Worker backend.
- Single fictional knowledge base (Sakura Card).
- Mock customer driven by AI persona.
- Utterance type classification with empathy and clarification modes.
- Confidence threshold with silent mode.
- Conversation history with rolling context window.
- Real audio input via JAI Speech (replacing mock text input).
- Agent feedback dashboard with structured categories.
- Multi-knowledge-base support (one customer = multiple verticals).
- Complaint-specific handling with escalation detection.
- Configuration backend: customer admins upload, edit, and version knowledge.
- Three-tier ingestion (upload / connector / API).
- Approval workflows and permissions.
- Audit logs for compliance review.
- Real Genesys integration via AudioHook + Interaction Widget.
- Customer trial deployment with UX iteration from real agents.
- Compliance review for regulated industries (financial services, insurance, healthcare).
Recent demo call logs saved automatically when calls end. Click to view full transcript.
Current version: MVP v0.1. End-to-end flow is working: AI customer simulation → FAQ retrieval with AI intent extraction → recommended spoken phrasing for agents with follow-up points.
Granular log of what happened, what broke, what got changed. Continuously updated.