A look at the architecture, tech choices, and design thinking behind the Demo Script Generator.
This tool helps sales engineers and pre-sales professionals create structured demo scripts for their products. You provide product context -- a website URL, uploaded files, or a description -- and the LLM generates a script following proven demo frameworks: limbic openings, the 3 Key Ideas structure, and Tell-Show-Tell delivery.
There are actually two LLMs at work here, not one. The agent you chat with handles discovery -- asking questions, running web searches, figuring out your product and audience. When it has enough context, it calls a tool called write_script, which spins up a separate LLM call with its own system prompt specialized purely for script writing.
Why bother? Context window isolation. The script writer gets a clean, focused prompt with just the structured context it needs -- no conversation history, no back-and-forth noise. It produces a better script because it isn't distracted by twelve messages of discovery chat. The parent agent then reviews the output and presents it to you.
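The isolation boils down to what the script writer's prompt contains -- and what it doesn't. Here's a minimal sketch of that idea; the `ScriptContext` fields, system prompt text, and function names are illustrative, not the tool's actual schema:

```python
from dataclasses import dataclass


# Hypothetical structured context the discovery agent assembles
# before calling write_script; field names are illustrative.
@dataclass
class ScriptContext:
    product: str
    audience: str
    key_ideas: list[str]


SCRIPT_WRITER_SYSTEM = (
    "You are a demo script writer. Use a limbic opening, the 3 Key Ideas "
    "structure, and Tell-Show-Tell delivery."
)


def build_script_messages(ctx: ScriptContext) -> list[dict]:
    """Build the isolated prompt for the script-writer LLM call.

    Note what is absent: none of the discovery conversation's
    history reaches this call -- only the distilled context.
    """
    user = (
        f"Product: {ctx.product}\n"
        f"Audience: {ctx.audience}\n"
        f"Key ideas: {', '.join(ctx.key_ideas)}"
    )
    return [
        {"role": "system", "content": SCRIPT_WRITER_SYSTEM},
        {"role": "user", "content": user},
    ]
```

The real tool would pass these messages to a fresh LLM call and return the result for the parent agent to review.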
The agent uses three middleware layers, and each one exists because something can go wrong without it:
Human-in-the-loop middleware intercepts the write_script tool call before it executes. You see what the agent is about to generate and can approve, edit the parameters, or reject with feedback. This isn't just a nice UX touch -- it means the expensive script generation LLM call only happens when you actually want it to.
Tool call limit middleware caps write_script at 3 calls per session. Without this, a confused agent could enter a revision loop -- writing, deciding it's not good enough, rewriting, repeat. Three is enough for an initial draft plus a couple of refinements.
Model call limit middleware is the bluntest instrument: 12 LLM calls per thread, then the agent wraps up. I'm funding this as a free demo, so runaway conversations would be a problem. This is the safety net that keeps my OpenAI bill free of surprises.
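The two limit layers are just counters with caps, checked before each call. The real project uses LangChain's middleware for this; the class below is a simplified stand-in showing the counting logic, with made-up names:

```python
class CallLimitExceeded(Exception):
    """Raised when a session exceeds its tool-call or model-call budget."""


class CallLimiter:
    """Simplified stand-in for the two limit middleware layers:
    count invocations per session and stop hard at each cap."""

    def __init__(self, tool_limit: int = 3, model_limit: int = 12):
        self.tool_limit = tool_limit
        self.model_limit = model_limit
        self.tool_calls = 0
        self.model_calls = 0

    def before_tool_call(self, tool_name: str) -> None:
        # Only the expensive script-writing tool is capped.
        if tool_name == "write_script":
            self.tool_calls += 1
            if self.tool_calls > self.tool_limit:
                raise CallLimitExceeded("write_script cap reached")

    def before_model_call(self) -> None:
        # Every LLM call in the thread counts against this budget.
        self.model_calls += 1
        if self.model_calls > self.model_limit:
            raise CallLimitExceeded("model call cap reached")
```

In the agent loop, the hooks run before dispatching each tool or model call, so a runaway revision loop dies on call four rather than on my billing dashboard.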
The data flow during generation is one-directional -- server streams tokens to the client. SSE handles that natively over standard HTTP, reconnects automatically, and doesn't fight proxies or load balancers. WebSockets would add complexity for no benefit here.
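Part of SSE's appeal is how little wire format there is: each message is a few `field: value` lines ending in a blank line, sent over a plain HTTP response. A rough sketch of the server side (helper names are mine, not a library API):

```python
from typing import Iterable, Iterator, Optional


def sse_frame(data: str, event: Optional[str] = None) -> str:
    """Format one message as a Server-Sent Events frame.

    SSE is plain text: optional 'event:' line, one or more
    'data:' lines, and a blank line terminating the frame.
    """
    lines = []
    if event:
        lines.append(f"event: {event}")
    lines.append(f"data: {data}")
    return "\n".join(lines) + "\n\n"


def stream_script(tokens: Iterable[str]) -> Iterator[str]:
    """Yield one SSE frame per generated token, then a done event.

    A web framework would write these frames to a response with
    Content-Type: text/event-stream; the browser's EventSource
    handles parsing and automatic reconnection.
    """
    for token in tokens:
        yield sse_frame(token)
    yield sse_frame("[DONE]", event="done")
```

Since it's just HTTP, intermediaries that would choke on a WebSocket upgrade pass it through untouched.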
LangChain and LangGraph are Python-first. I could have squeezed the agent logic into Next.js API routes, but I'd be fighting the ecosystem instead of using it. Python handles orchestration; Next.js handles the UI. They talk over REST + SSE. Simple.
No auth, no database, no user accounts. This is a demo tool -- it should be frictionless. Rate limiting is IP-based to keep it fair without requiring signups. The discovery agent can run Tavily web searches on its own to research your product and industry, so you don't have to spoon-feed it everything.
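IP-based rate limiting can be as small as a sliding window of timestamps per address. This is a minimal sketch of that approach, assuming in-memory state and made-up limits, not the project's actual configuration:

```python
import time
from collections import defaultdict, deque


class IPRateLimiter:
    """Sliding-window rate limiter keyed by client IP.

    Allows max_requests per window_seconds per address -- fair use
    without accounts or signups. State is in-memory, which is fine
    for a single-process demo deployment.
    """

    def __init__(self, max_requests: int = 10, window_seconds: float = 60.0):
        self.max_requests = max_requests
        self.window = window_seconds
        self.hits: dict[str, deque] = defaultdict(deque)

    def allow(self, ip: str, now: float = None) -> bool:
        """Return True and record the hit if the IP is under its limit."""
        now = time.monotonic() if now is None else now
        q = self.hits[ip]
        # Evict timestamps that have aged out of the window.
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) >= self.max_requests:
            return False
        q.append(now)
        return True
```

Each request checks `allow(client_ip)` before reaching the agent; a `False` maps to an HTTP 429.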