A client came to us with a simple brief: their support team was drowning. 200+ tickets a day, 80% of them asking the same 15 questions. They'd tried a basic FAQ bot before — the kind that pattern-matches on keywords — and users hated it. They wanted something smarter. We had 3 weeks. Here's what we shipped and how we built it.
The Constraint That Shaped Everything
The client's engineering team was two people maintaining a legacy PHP monolith. There was zero appetite for a new backend service, a new database, or any infrastructure they'd have to own. The deliverable had to be a single script tag — drop it into the site, it works. That constraint killed several ideas immediately: no server-side session storage, no custom authentication flow, no webhooks the client would need to handle. Everything had to live in a system we controlled.
The Stack
- Next.js API routes for the widget backend — serverless, no infra to maintain
- Supabase for conversation history and escalation state (pgvector for embeddings)
- OpenAI text-embedding-3-small for document vectorization
- Claude (claude-sonnet-4-6) as the reasoning model — better instruction-following than GPT-4o on our eval set
- Vanilla JS widget bundle (~18KB gzipped) — no React in the client embed
- Cloudflare for edge caching of the widget script itself
Knowledge Ingestion: The Part Nobody Talks About
The AI is only as good as what it knows. We built a lightweight ingestion pipeline: the client pastes a list of URLs (their docs site, FAQ pages, product pages) and we crawl, chunk, and embed them. After testing, we settled on 512-token chunks with 10% overlap — smaller chunks made retrieval more precise, but below that size the retrieved context was often too incomplete to answer from. We also added a manual override layer: a simple CMS where the client can write 'golden answers' to their top 20 questions. These get pinned to the top of the retrieval results regardless of vector similarity score.
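The chunking step is mechanical enough to sketch. Here a plain token array stands in for real model tokens (the production pipeline counts tokenizer tokens, not words), and the constants mirror the 512-token / 10%-overlap settings above:

```typescript
const CHUNK_SIZE = 512;                        // tokens per chunk
const OVERLAP = Math.floor(CHUNK_SIZE * 0.1);  // 10% overlap between neighbors

// Split a token array into fixed-size chunks where each chunk repeats the
// last OVERLAP tokens of the previous one, so no sentence is cut off cold
// at a chunk boundary. Illustrative sketch, not the production crawler.
function chunkTokens(tokens: string[]): string[][] {
  const chunks: string[][] = [];
  const step = CHUNK_SIZE - OVERLAP; // advance less than a full chunk
  for (let start = 0; start < tokens.length; start += step) {
    chunks.push(tokens.slice(start, start + CHUNK_SIZE));
    if (start + CHUNK_SIZE >= tokens.length) break; // final chunk emitted
  }
  return chunks;
}
```

Each chunk then gets embedded with text-embedding-3-small and stored in the pgvector column alongside its source URL.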
“RAG pipelines fail silently. The model confidently answers from slightly wrong context and the user gets plausible-sounding misinformation. The golden answers layer was the most important thing we built.”
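The pinning itself is a few lines. In this sketch the `golden` flag and the chunk shape are illustrative, not the production schema — the point is only that a golden answer outranks any vector hit:

```typescript
interface RetrievedChunk {
  text: string;
  score: number;    // cosine similarity from the vector search
  golden?: boolean; // true if this came from the client's golden-answer CMS
}

// Sort golden answers ahead of everything, then fall back to similarity.
function rankChunks(chunks: RetrievedChunk[], topK = 3): RetrievedChunk[] {
  return [...chunks]
    .sort((a, b) => {
      if (!!a.golden !== !!b.golden) return a.golden ? -1 : 1; // pin golden first
      return b.score - a.score;                                // else by similarity
    })
    .slice(0, topK);
}
```

A golden answer with a weak similarity score still wins, which is exactly the behavior you want when the vector index has confidently retrieved the wrong page.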
The Escalation Logic
This was the hard part. 'Smart escalation' sounds simple — if the bot can't answer, hand off to a human. In practice: how do you know when the bot can't answer? A low retrieval similarity score is one signal, but not enough. A user expressing frustration is another. A question touching billing, cancellations, or anything with legal exposure should always escalate regardless of confidence.
We ended up with a three-signal system: (1) retrieval confidence below threshold, (2) a secondary LLM classifier call that checks if the user message matches a blocklist of high-stakes topics, and (3) sentiment analysis on the last 3 messages. If two of three signals fire, we escalate. On escalation, the widget creates a support ticket pre-populated with the full conversation transcript and the user's contact info — zero re-typing for the human agent.
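The decision itself reduces to a small vote. This sketch combines the two-of-three rule with the hard override for high-stakes topics described above; the threshold value and field names are illustrative, not the tuned production values:

```typescript
interface EscalationSignals {
  retrievalConfidence: number; // best similarity among retrieved chunks, 0..1
  highStakesTopic: boolean;    // classifier flagged billing/cancellation/legal
  negativeSentiment: boolean;  // sentiment check over the last 3 messages
}

const CONFIDENCE_THRESHOLD = 0.75; // hypothetical cutoff

function shouldEscalate(s: EscalationSignals): boolean {
  // High-stakes topics always go to a human, regardless of the vote.
  if (s.highStakesTopic) return true;
  const votes =
    (s.retrievalConfidence < CONFIDENCE_THRESHOLD ? 1 : 0) +
    (s.highStakesTopic ? 1 : 0) +
    (s.negativeSentiment ? 1 : 0);
  return votes >= 2; // two of three signals fire
}
```

Keeping the signals independent made tuning easy: each one could be adjusted (or logged and inspected) without touching the others.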
What Broke First
Token costs. In testing, we were streaming full page content into context without thinking. A single conversation was hitting 8K tokens of context within four turns. At production volume — 200 tickets/day — that was a real budget problem. The fix: strict context trimming that keeps the 3 most recent exchanges plus the top-3 retrieved chunks, never more. We also added response caching for common questions at the Redis layer. After tuning, cost per conversation landed at $0.003 — under the client's budget target.
The Numbers After 6 Weeks
- 73% of tickets fully resolved by the AI — no human needed
- Average first-response time dropped from 4 hours to under 3 seconds
- Human support tickets down from 200/day to 54/day
- CSAT score held at 4.2/5 — the same as the previous human-only baseline
The widget is now one of our live demos at widget.easydevs.xyz — fully interactive, no signup. If you want to see the escalation flow, try asking it something it can't answer. The handoff is seamless.