
Letting AI Answer the Phone

Branching dialogue trees broke under real player behaviour. Here is how a Cloudflare Worker, a streaming LLM, and some careful prompt design turned that into a conversation system that actually works.

For a while I was trying to build a system of scripted conversations for a simulation game I am working on. You know the type: branching dialogue, conditional responses, maybe a few state flags depending on what the player asked.

It worked until it did not.

The problem with simulated conversations is that players do not behave like your dialogue tree expects. They ask strange questions, repeat themselves, jump between topics, or misunderstand what is happening entirely. Once you start trying to handle that with traditional branching dialogue, the complexity explodes.

After a few days of wrestling with it I had a thought that felt both obvious and slightly dangerous:

What if I just let an AI handle the conversation instead?

That idea eventually turned into a system that now powers part of a project I have been quietly building behind the scenes. I am not quite ready to talk about the game itself yet, but I can talk about the architecture that made the AI side of it work.

Step one: do not give the internet your API key

My first attempt was the naive one. The client talked directly to the model API.

This works great right up until you remember one small detail: games can be decompiled. If your app ships with an API key baked into it, someone will extract it and start sending requests on your behalf.

So that design lasted roughly five minutes.

The fix was putting a Cloudflare Worker in the middle:

Game Client → Cloudflare Worker → LLM Provider

The worker does three important things.

First, it stores the API keys safely using Worker secrets, so the client never sees them.

Second, it validates and sanitises incoming requests. Since the game client is ultimately controlled by the player, the worker acts as a gatekeeper, ensuring requests are well-formed and safe.

Third, it streams the AI response back to the client using server-sent events. Instead of waiting for a full response, the dialogue appears gradually on screen as it is generated. That small detail ended up making conversations feel dramatically more natural.
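To make the gatekeeper role concrete, here is a minimal validation sketch in TypeScript. The request shape, field names, and length limit are my own illustrative choices, not the project's actual schema:

```typescript
// Hypothetical shape of what the game client sends per dialogue turn.
// Field names are illustrative, not the project's real schema.
interface DialogueRequest {
  sessionId: string;
  playerLine: string;
}

// The worker rejects anything that is not a well-formed, reasonably
// sized request before any tokens are spent on it.
function validateRequest(body: unknown): DialogueRequest | null {
  if (typeof body !== "object" || body === null) return null;
  const b = body as Record<string, unknown>;
  if (typeof b.sessionId !== "string" || b.sessionId.length === 0) return null;
  if (typeof b.playerLine !== "string") return null;
  const line = b.playerLine.trim();
  // Cap length so a decompiled client cannot submit enormous prompts.
  if (line.length === 0 || line.length > 500) return null;
  return { sessionId: b.sessionId, playerLine: line };
}
```

Anything that fails this check gets a 400 before the worker ever touches the provider, which keeps both costs and abuse surface down.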

Cloudflare Workers are a great fit for this because they are cheap, fast, and run right at the edge.

Choosing a model is more complicated than it looks

The next decision was which model to actually run this on.

My initial testing used GPT-5.4-mini. It sits in a really nice spot where it is cheap enough to run continuously in a game loop but still smart enough to follow fairly strict prompt instructions.

For most cases it worked well, but I wanted a bit more elasticity in the responses: slightly less rigid dialogue, something that felt more natural in conversation.

So I experimented with Kimi K2.5, which is priced in roughly the same range and has a reputation for producing more expressive responses. In practice the results were interesting. Sometimes the conversations felt more dynamic, and it definitely had a different tone.

But I quickly ran into a practical issue that had nothing to do with the model quality: debugging.

When you are building an AI gameplay system, being able to inspect exactly what went into a prompt is incredibly important. With the OpenAI platform I can open the dashboard and see the full request history, the prompt, the structured payload, and the model's response. That visibility makes it much easier to track down things like hallucinations, prompt conflicts, or mistakes in the game state being passed to the model.

With Kimi, that kind of observability is not really built in. To get the same level of insight you would need to build a logging proxy that records every request before forwarding it to the provider. At that point the extra infrastructure started to outweigh the benefits of switching models.

So I ended up back where I started: GPT-5.4-mini. Not because it was necessarily the most expressive model, but because the tooling around it made development dramatically easier. When you are iterating on prompts constantly, being able to see exactly what the model saw turns out to be incredibly valuable.

Getting the AI to behave

A raw LLM is basically a polite assistant. That is great for writing emails. It is terrible for simulation gameplay.

Out of the box it tends to answer too clearly, volunteer too much information, remain overly calm, and try to be helpful in ways that break the game.

So most of the real work ended up being prompt design. Instead of a simple instruction, each interaction sends a structured context payload describing the current state of the situation. Things like emotional tone, known facts, what information is allowed to be revealed, and what the model absolutely should not invent.
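A sketch of what such a context payload might look like, with illustrative field names (the actual schema is not shown here):

```typescript
// Illustrative context payload; the real project's fields may differ.
interface ScenarioContext {
  emotionalTone: string; // e.g. "nervous", "evasive"
  knownFacts: string[];  // facts the character may reference
  revealable: string[];  // facts allowed to surface this turn
  forbidden: string[];   // things the model must never invent or mention
}

// Turn the structured game state into strict prompt instructions.
function buildSystemPrompt(ctx: ScenarioContext): string {
  return [
    `Tone: ${ctx.emotionalTone}.`,
    `You know only these facts: ${ctx.knownFacts.join("; ")}.`,
    `You may reveal this turn: ${ctx.revealable.join("; ") || "nothing new"}.`,
    `Never mention or invent: ${ctx.forbidden.join("; ")}.`,
  ].join("\n");
}
```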

Structured tags

One trick that worked particularly well was embedding structured tags inside the AI's dialogue. When the model mentions something important, it wraps that information in a tag the game can parse. The player just sees natural dialogue, but under the hood the game quietly extracts those tagged pieces of information and feeds them into the gameplay systems.

That ended up being the bridge between free-form AI conversation and structured game logic.
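To illustrate how that bridge could work, here is one possible extraction pass, assuming an XML-style tag format of my own invention (the real syntax is not shown here):

```typescript
// Assumed tag format for illustration: <reveal id="...">visible text</reveal>
const TAG_RE = /<reveal id="([^"]+)">([^<]*)<\/reveal>/g;

interface ParsedLine {
  displayText: string;                     // what the player sees
  reveals: { id: string; text: string }[]; // what the game systems get
}

function parseDialogue(raw: string): ParsedLine {
  const reveals: { id: string; text: string }[] = [];
  // Strip each tag, keeping its inner text for display and logging
  // the tagged payload for the gameplay systems.
  const displayText = raw.replace(TAG_RE, (_m, id: string, text: string) => {
    reveals.push({ id, text });
    return text; // the player just sees the natural wording
  });
  return { displayText, reveals };
}
```

The same parser can run incrementally on a streamed response, as long as partial tags are buffered until their closing tag arrives.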

The two biggest problems

Latency

Even a couple of seconds feels slow during a conversation. Waiting for a full AI response completely breaks the flow. Switching everything to streaming responses helped enormously. Instead of waiting for the full message, text appears as the model generates it, which feels much closer to a real conversation.
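On the client side, the incoming SSE chunks have to be split into text deltas before they can be appended to the dialogue box. A minimal parser sketch, assuming a `{"delta": "..."}` payload shape of my own choosing:

```typescript
// Minimal SSE chunk parser for the client. Assumes each event is a line
// like `data: {"delta":"..."}`; the payload shape is illustrative.
function extractDeltas(sseChunk: string): string[] {
  const deltas: string[] = [];
  for (const line of sseChunk.split("\n")) {
    if (!line.startsWith("data: ")) continue;
    const payload = line.slice("data: ".length).trim();
    if (payload === "[DONE]") break; // end-of-stream sentinel
    try {
      const parsed = JSON.parse(payload) as { delta?: string };
      if (typeof parsed.delta === "string") deltas.push(parsed.delta);
    } catch {
      // Ignore partial or garbled lines; a real client would buffer
      // incomplete events until the rest of the chunk arrives.
    }
  }
  return deltas;
}
```

Each delta gets appended to the on-screen dialogue as it arrives, which is what produces the typing-out effect.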

Hallucinations

LLMs love inventing details. Early versions of the system would casually introduce things that were never part of the scenario. Entertaining, but disastrous for game logic.

The fix was adding a strict fact-validation layer inside the prompt instructions. The model only receives a list of allowed facts and is explicitly instructed to avoid referencing anything outside that list. It is not perfect, but it reduced hallucinations dramatically.
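The exact wording of those instructions is not something I will reproduce here, but the shape of the fact-validation block might look something like this (wording invented for illustration):

```typescript
// Render the allowed-fact list as an explicit, numbered constraint block
// appended to the system prompt. Phrasing here is illustrative only.
function factGuard(allowedFacts: string[]): string {
  const list = allowedFacts.map((f, i) => `${i + 1}. ${f}`).join("\n");
  return [
    "FACTS - the only things you know:",
    list,
    "If the player asks about anything not in this list, say you do not know.",
    "Never invent names, places, events, or details beyond these facts.",
  ].join("\n");
}
```

Numbering the facts also makes it easier to spot, in the request logs, exactly which fact a hallucinated response drifted away from.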


Where this is going

The conversation system is now stable enough to build gameplay around, which is exciting. More importantly, it creates a kind of simulation that would be incredibly difficult to achieve with traditional scripted dialogue.

This AI architecture is the backbone of that project. I am not quite ready to show the full game yet, but the goal is to build simulations that feel much less scripted and much more reactive to the player.

If it works the way I hope it will, the result should be something that feels far more unpredictable than traditional game dialogue systems.

More on that soon.