Reconsider llm.Conversation in favor of allowing prompts to be in reply to responses #938

Open · simonw opened this issue Apr 20, 2025 · 10 comments


simonw commented Apr 20, 2025

I was originally planning on implementing tools (#898) as part of llm.Conversation but I've been having second thoughts about that.

There's a lot to be said for allowing prompts to reply to other responses, and having those form a chain.

Two big advantages:

  • The in-memory (no SQLite database at all) story for the Python client library is simplified. Users don't have to remember to instantiate an llm.Conversation if they are going to send prompts that follow up on other prompts - particularly useful for implementing tools.
  • Means I can support branching conversations. Maybe you start a conversation with a huge document (which gets into the prompt cache) and then ask multiple questions of that document as separate branches of replies to that initial prompt. Also good for a UI that lets power users really take control of the context in their ongoing conversations.
simonw added the design label Apr 20, 2025
simonw commented Apr 20, 2025

I'm trying to figure out if this is a blocker for tools or not.

simonw commented Apr 20, 2025

If I were to do this, here's one potential design:

  • Response objects gain a .reply(prompt) method, which can be used to reply to that response with a fresh prompt and get back a fresh response (a rough sketch follows this list).
  • The responses database table gains a new nullable reply_to_id column which can be used to build a tree of responses. SQLite has had recursive CTEs for ages, so the SQL for working with these won't be too hard.
  • I could keep the concept of a conversation around to avoid breaking the existing CLI and databases, but a conversation could actually be a tree rather than a linear chain.
  • Thankfully llm -r is not yet taken, so I can use llm -r/--reply to reply to the most recent message (similar to llm -c/--continue at the moment), and have llm --reply-to ID as the way of replying to a specific response ID.
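
A minimal sketch of how that might feel from the Python API, assuming the .reply() method and reply_to attribute proposed above (none of this exists yet):

import llm

model = llm.get_model("gpt-4o-mini")

# First prompt loads the big document into context (and the prompt cache)
root = model.prompt("Here is a huge document: ...")

# Hypothetical .reply() - two separate branches replying to the same response
summary = root.reply("Summarize the key points")
questions = root.reply("What questions does this document leave unanswered?")

# Each new prompt would remember which response it replied to
assert summary.prompt.reply_to is root
assert questions.prompt.reply_to is root

Both replies would share the cached document context while forming independent branches of the response tree.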

simonw commented Apr 20, 2025

I'm going to do a research spike on this in a branch.

simonw added the tools label Apr 20, 2025
simonw commented Apr 20, 2025

I think this is the migration:

@migration
def m018_replies(db):
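    # Nullable column: existing rows stay NULL, new replies will store the id
    # of the response they are replying to (self-referential foreign key below)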
    db["responses"].add_column("reply_to_id", str)
    db["responses"].add_foreign_key("reply_to_id", "responses", "id")
    db["responses"].transform(
        column_order=(
            "id",
            "reply_to_id",
            "model",
            "prompt",
            "system",
            "prompt_id",
            "system_id",
            "schema_id",
            "prompt_json",
            "options_json",
            "response",
            "response_json",
            "conversation_id",
            "first_token_ms",
            "duration_ms",
            "datetime_utc",
            "input_tokens",
            "output_tokens",
            "token_details",
        ),
    )
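
For what it's worth, rebuilding a reply chain from that column should be a single recursive CTE. A rough illustration (not code from LLM) using the sqlite-utils db object the migrations already work against:

def response_chain(db, response_id):
    # Walk reply_to_id from the given response back up to the root,
    # then return the rows oldest-first
    sql = """
    WITH RECURSIVE chain(id, reply_to_id, prompt, response, depth) AS (
        SELECT id, reply_to_id, prompt, response, 0
          FROM responses WHERE id = :id
        UNION ALL
        SELECT r.id, r.reply_to_id, r.prompt, r.response, chain.depth + 1
          FROM responses r JOIN chain ON chain.reply_to_id = r.id
    )
    SELECT * FROM chain ORDER BY depth DESC
    """
    return list(db.query(sql, {"id": response_id}))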

simonw commented Apr 20, 2025

I guess this means the Prompt() class constructor needs to be able to take a reply_to_id= argument, or a reply_to=Response() argument, or both?

It would be neater to just take reply_to=response (which usually gets populated by calling response.reply(...)) and have the CLI database-handling code deal with IDs if necessary.
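
Something along these lines, as a rough sketch of that division of labour (the attribute and method names are guesses at how the existing classes might grow, not the actual implementation):

class Prompt:
    def __init__(self, prompt, model, *, system=None, reply_to=None, **kwargs):
        self.prompt = prompt
        self.model = model
        self.system = system
        # reply_to is a Response object (or None), never a bare ID
        self.reply_to = reply_to


class Response:
    def reply(self, prompt, **kwargs):
        # The usual way reply_to gets populated
        return self.model.prompt(prompt, reply_to=self, **kwargs)


def log_response(db, response):
    # Only the CLI/database layer translates the object into an ID
    reply_to = response.prompt.reply_to
    db["responses"].insert({
        # ... existing columns ...
        "reply_to_id": reply_to.id if reply_to else None,
    })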

simonw commented Apr 20, 2025

Also interesting: currently the .execute() method (including in all the plugins) has this signature:

def execute(self, prompt, stream, response, conversation=None, key=None):
    if prompt.system and not self.allows_system_prompt:
        raise NotImplementedError("Model does not support system prompts")
    messages = self.build_messages(prompt, conversation)
    kwargs = self.build_kwargs(prompt, stream)
    client = self.get_client(key)

A lot of those then have methods like this one:

def build_messages(self, prompt, conversation):
    messages = []
    current_system = None
    if conversation is not None:
        for prev_response in conversation.responses:
            if (
                prev_response.prompt.system
                and prev_response.prompt.system != current_system
            ):
                messages.append(
                    {"role": "system", "content": prev_response.prompt.system}
                )
                current_system = prev_response.prompt.system

Note how conversation is a thing passed directly to .execute() which is used to build up that previous messages= array.

In the new reply-to world that won't be necessary. Just being passed the prompt will be enough, since the code can then follow the prompt.reply_to.prompt.reply_to chain all the way to the top in order to rebuild the messages.

This is great news for implementing tools, because it helps solve the thorny problem of keeping the ID from tool call requests so it can be matched up with the IDs in the tool call replies.
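
As a sketch, a build_messages() in that world might walk the chain itself, with no conversation argument needed (assuming the prompt.reply_to attribute proposed above and the existing Response.text() method):

def build_messages(self, prompt):
    # Follow prompt.reply_to -> Response -> .prompt.reply_to -> ... to the root,
    # then replay the exchanges oldest-first
    chain = []
    current = prompt.reply_to
    while current is not None:
        chain.append(current)
        current = current.prompt.reply_to
    chain.reverse()

    messages = []
    current_system = None
    for prev_response in chain:
        if (
            prev_response.prompt.system
            and prev_response.prompt.system != current_system
        ):
            messages.append(
                {"role": "system", "content": prev_response.prompt.system}
            )
            current_system = prev_response.prompt.system
        messages.append({"role": "user", "content": prev_response.prompt.prompt})
        messages.append({"role": "assistant", "content": prev_response.text()})
    if prompt.system and prompt.system != current_system:
        messages.append({"role": "system", "content": prompt.system})
    messages.append({"role": "user", "content": prompt.prompt})
    return messages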

simonw commented Apr 20, 2025

This change could be a breaking change for existing plugins. That's worth thinking about - it may be possible to keep them working by detecting if their execute() method takes a conversation parameter and behaving differently for plugin models that don't do that.
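
One possible shape for that compatibility shim - purely a sketch, inspecting the plugin's execute() signature and only passing the arguments it declares:

import inspect


def call_execute(model, prompt, stream, response, conversation=None, key=None):
    params = inspect.signature(model.execute).parameters
    kwargs = {}
    if "conversation" in params:
        # Legacy plugin: still expects to rebuild history from a Conversation
        kwargs["conversation"] = conversation
    if "key" in params:
        kwargs["key"] = key
    return model.execute(prompt, stream, response, **kwargs)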

simonw commented Apr 20, 2025

The .execute() signature is a bit of a mess already - perhaps I should rename that method to allow for an entirely fresh design?

Current docs:

## Understanding execute()

The full signature of the execute() method is:

def execute(self, prompt, stream, response, conversation):

The prompt argument is a Prompt object that contains the text that the user provided, the system prompt and the provided options.

stream is a boolean that says if the model is being run in streaming mode.

response is the Response object that is being created by the model. This is provided so you can write additional information to response.response_json, which may be logged to the database.

conversation is the Conversation that the prompt is a part of - or None if no conversation was provided. Some models may use conversation.responses to access previous prompts and responses in the conversation and use them to construct a call to the LLM that includes previous context.

simonw commented Apr 20, 2025

I'm already reconsidering what .execute() does a bit for annotations in:

simonw commented Apr 20, 2025

Now that I've built this:

I can try this:

llm -f github:simonw/llm \
  -f issue:simonw/llm/938 \
  -m gemini-2.5-pro-exp-03-25 \
  --system 'muse on this issue, then propose a whole bunch of code to help implement it'

Gemini 2.5 Pro came up with a whole bunch of suggestions, and charged me 66.36 cents.

https://gist.github.com/simonw/a5f0c1e8184f4ddc8b71b30890fe690c
