A look at the architecture, tech choices, and design thinking behind the Demo Script Generator.
This tool helps sales engineers and pre-sales professionals create structured demo scripts for their products. You provide product context -- a website URL, uploaded files, or a description -- and the LLM generates a script following proven demo frameworks: limbic openings, the 3 Key Ideas structure, and Tell-Show-Tell delivery.
There are actually two LLMs at work here, not one. The agent you chat with handles discovery -- asking questions, running web searches, figuring out your product and audience. When it has enough context, it calls a tool called write_script, which spins up a separate LLM call with its own system prompt specialized purely for script writing.
Why bother? Context window isolation. The script writer gets a clean, focused prompt with just the structured context it needs -- no conversation history, no back-and-forth noise. It produces a better script because it isn't distracted by twelve messages of discovery chat. The parent agent then reviews the output and presents it to you.
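The isolation boils down to what the script writer's prompt contains -- and what it doesn't. Here's a minimal sketch of that idea; the `ScriptContext` fields, system prompt text, and function names are illustrative, not the tool's actual schema:

```python
from dataclasses import dataclass


# Hypothetical structured context the discovery agent assembles
# before calling write_script; field names are illustrative.
@dataclass
class ScriptContext:
    product: str
    audience: str
    key_ideas: list[str]


SCRIPT_WRITER_SYSTEM = (
    "You are a demo script writer. Use a limbic opening, the 3 Key Ideas "
    "structure, and Tell-Show-Tell delivery."
)


def build_script_messages(ctx: ScriptContext) -> list[dict]:
    """Build the isolated prompt for the script-writer LLM call.

    Note what is absent: none of the discovery conversation's
    history reaches this call -- only the distilled context.
    """
    user = (
        f"Product: {ctx.product}\n"
        f"Audience: {ctx.audience}\n"
        f"Key ideas: {', '.join(ctx.key_ideas)}"
    )
    return [
        {"role": "system", "content": SCRIPT_WRITER_SYSTEM},
        {"role": "user", "content": user},
    ]
```

The real tool would pass these messages to a fresh LLM call and return the result for the parent agent to review.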
The agent uses three middleware layers, and each one exists because something can go wrong without it:
Human-in-the-loop middleware intercepts the write_script tool call before it executes. You see what the agent is about to generate and can approve, edit the parameters, or reject with feedback. This isn't just a nice UX touch -- it means the expensive script generation LLM call only happens when you actually want it to.
Tool call limit middleware caps write_script at 3 calls per session. Without this, a confused agent could enter a revision loop -- writing, deciding it's not good enough, rewriting, repeat. Three is enough for an initial draft plus a couple of refinements.
Model call limit middleware is the bluntest instrument: 12 LLM calls per thread, then the agent wraps up. I'm funding this as a free demo, so runaway conversations would be a problem. This is the safety net that keeps my OpenAI bill free of surprises.
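The two limit layers are just counters with caps, checked before each call. The real project uses LangChain's middleware for this; the class below is a simplified stand-in showing the counting logic, with made-up names:

```python
class CallLimitExceeded(Exception):
    """Raised when a session exceeds its tool-call or model-call budget."""


class CallLimiter:
    """Simplified stand-in for the two limit middleware layers:
    count invocations per session and stop hard at each cap."""

    def __init__(self, tool_limit: int = 3, model_limit: int = 12):
        self.tool_limit = tool_limit
        self.model_limit = model_limit
        self.tool_calls = 0
        self.model_calls = 0

    def before_tool_call(self, tool_name: str) -> None:
        # Only the expensive script-writing tool is capped.
        if tool_name == "write_script":
            self.tool_calls += 1
            if self.tool_calls > self.tool_limit:
                raise CallLimitExceeded("write_script cap reached")

    def before_model_call(self) -> None:
        # Every LLM call in the thread counts against this budget.
        self.model_calls += 1
        if self.model_calls > self.model_limit:
            raise CallLimitExceeded("model call cap reached")
```

In the agent loop, the hooks run before dispatching each tool or model call, so a runaway revision loop dies on call four rather than on my billing dashboard.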
The data flow during generation is one-directional -- server streams tokens to the client. SSE handles that natively over standard HTTP, reconnects automatically, and doesn't fight proxies or load balancers. WebSockets would add complexity for no benefit here.
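Part of SSE's appeal is how little wire format there is: each message is a few `field: value` lines ending in a blank line, sent over a plain HTTP response. A rough sketch of the server side (helper names are mine, not a library API):

```python
from typing import Iterable, Iterator, Optional


def sse_frame(data: str, event: Optional[str] = None) -> str:
    """Format one message as a Server-Sent Events frame.

    SSE is plain text: optional 'event:' line, one or more
    'data:' lines, and a blank line terminating the frame.
    """
    lines = []
    if event:
        lines.append(f"event: {event}")
    lines.append(f"data: {data}")
    return "\n".join(lines) + "\n\n"


def stream_script(tokens: Iterable[str]) -> Iterator[str]:
    """Yield one SSE frame per generated token, then a done event.

    A web framework would write these frames to a response with
    Content-Type: text/event-stream; the browser's EventSource
    handles parsing and automatic reconnection.
    """
    for token in tokens:
        yield sse_frame(token)
    yield sse_frame("[DONE]", event="done")
```

Since it's just HTTP, intermediaries that would choke on a WebSocket upgrade pass it through untouched.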
LangChain and LangGraph are Python-first. I could have squeezed the agent logic into Next.js API routes, but I'd be fighting the ecosystem instead of using it. Python handles orchestration; Next.js handles the UI. They talk over REST + SSE. Simple.
No auth, no database, no user accounts. This is a demo tool -- it should be frictionless. Rate limiting is IP-based to keep it fair without requiring signups. The discovery agent can run Tavily web searches on its own to research your product and industry, so you don't have to spoon-feed it everything.
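IP-based rate limiting can be as small as a sliding window of timestamps per address. This is a minimal sketch of that approach, assuming in-memory state and made-up limits, not the project's actual configuration:

```python
import time
from collections import defaultdict, deque


class IPRateLimiter:
    """Sliding-window rate limiter keyed by client IP.

    Allows max_requests per window_seconds per address -- fair use
    without accounts or signups. State is in-memory, which is fine
    for a single-process demo deployment.
    """

    def __init__(self, max_requests: int = 10, window_seconds: float = 60.0):
        self.max_requests = max_requests
        self.window = window_seconds
        self.hits: dict[str, deque] = defaultdict(deque)

    def allow(self, ip: str, now: float = None) -> bool:
        """Return True and record the hit if the IP is under its limit."""
        now = time.monotonic() if now is None else now
        q = self.hits[ip]
        # Evict timestamps that have aged out of the window.
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) >= self.max_requests:
            return False
        q.append(now)
        return True
```

Each request checks `allow(client_ip)` before reaching the agent; a `False` maps to an HTTP 429.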