LiteLLM extension crashes with run_streamed #587

Closed
theimpostor opened this issue Apr 24, 2025 · 3 comments
Labels
bug Something isn't working

@theimpostor

Describe the question

Running the LiteLLM sample from the docs works. However, modifying the sample to use the run_streamed API to log responses, tool calls, etc. results in an exception.

Debug information

  • Agents SDK version: v0.0.12
  • Python version: 3.13

Repro steps

#!/usr/bin/env -S uv run --script

# /// script
# requires-python = ">=3.13"
# dependencies = [
#     "openai-agents[litellm]",
# ]
# ///

from __future__ import annotations

import asyncio

from agents import Agent, ItemHelpers, Runner, function_tool, set_tracing_disabled
from agents.extensions.models.litellm_model import LitellmModel

set_tracing_disabled(disabled=True)

@function_tool
def get_weather(city: str):
    print(f"[debug] getting weather for {city}")
    return f"The weather in {city} is sunny."


async def main(model: str, api_key: str):
    agent = Agent(
        name="Assistant",
        instructions="You only respond in haikus.",
        model=LitellmModel(model=model, api_key=api_key),
        tools=[get_weather],
    )

    result = Runner.run_streamed(agent, "What's the weather in Tokyo?")
    async for event in result.stream_events():
        # We'll ignore the raw responses event deltas
        if event.type == "raw_response_event":
            continue
        # When the agent updates, print that
        elif event.type == "agent_updated_stream_event":
            print(f"Agent updated: {event.new_agent.name}")
            continue
        # When items are generated, print them
        elif event.type == "run_item_stream_event":
            if event.item.type == "tool_call_item":
                print(f"-- Tool was called: {event.item.raw_item}")
            elif event.item.type == "tool_call_output_item":
                print(f"-- Tool output: {event.item.output}")
            elif event.item.type == "message_output_item":
                print(f"-- Message output:\n {ItemHelpers.text_message_output(event.item)}")
            else:
                pass  # Ignore other event types

    print(result.final_output)


if __name__ == "__main__":
    # First try to get model/api key from args
    import argparse

    parser = argparse.ArgumentParser()
    parser.add_argument("--model", type=str, required=False)
    parser.add_argument("--api-key", type=str, required=False)
    args = parser.parse_args()

    model = args.model
    if not model:
        model = input("Enter a model name for Litellm: ")

    api_key = args.api_key
    if not api_key:
        api_key = input("Enter an API key for Litellm: ")

    asyncio.run(main(model, api_key))

❯ ./demo-streaming.py --api-key=... --model=together_ai/meta-llama/Llama-3.3-70B-Instruct-Turbo
Agent updated: Assistant
Traceback (most recent call last):
  File "/Users/shoda/.cache/uv/environments-v2/demo-streaming-44c9a8b0431f886f/lib/python3.13/site-packages/pydantic/main.py", line 986, in __getattr__
    return pydantic_extra[item]
           ~~~~~~~~~~~~~~^^^^^^
KeyError: 'usage'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/private/var/folders/ct/x2gct7yn2bxfqs891n8h1dxr0000gn/T/tmp.FRPiXhVbRG/./demo-streaming.py", line 73, in <module>
    asyncio.run(main(model, api_key))
    ~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/[email protected]/3.13.3/Frameworks/Python.framework/Versions/3.13/lib/python3.13/asyncio/runners.py", line 195, in run
    return runner.run(main)
           ~~~~~~~~~~^^^^^^
  File "/usr/local/Cellar/[email protected]/3.13.3/Frameworks/Python.framework/Versions/3.13/lib/python3.13/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^
  File "/usr/local/Cellar/[email protected]/3.13.3/Frameworks/Python.framework/Versions/3.13/lib/python3.13/asyncio/base_events.py", line 719, in run_until_complete
    return future.result()
           ~~~~~~~~~~~~~^^
  File "/private/var/folders/ct/x2gct7yn2bxfqs891n8h1dxr0000gn/T/tmp.FRPiXhVbRG/./demo-streaming.py", line 34, in main
    async for event in result.stream_events():
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ...<16 lines>...
                pass  # Ignore other event types
                ^^^^
  File "/Users/shoda/.cache/uv/environments-v2/demo-streaming-44c9a8b0431f886f/lib/python3.13/site-packages/agents/result.py", line 191, in stream_events
    raise self._stored_exception
  File "/Users/shoda/.cache/uv/environments-v2/demo-streaming-44c9a8b0431f886f/lib/python3.13/site-packages/agents/run.py", line 560, in _run_streamed_impl
    turn_result = await cls._run_single_turn_streamed(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ...<9 lines>...
    )
    ^
  File "/Users/shoda/.cache/uv/environments-v2/demo-streaming-44c9a8b0431f886f/lib/python3.13/site-packages/agents/run.py", line 671, in _run_single_turn_streamed
    async for event in model.stream_response(
    ...<28 lines>...
        streamed_result._event_queue.put_nowait(RawResponsesStreamEvent(data=event))
  File "/Users/shoda/.cache/uv/environments-v2/demo-streaming-44c9a8b0431f886f/lib/python3.13/site-packages/agents/extensions/models/litellm_model.py", line 167, in stream_response
    async for chunk in ChatCmplStreamHandler.handle_stream(response, stream):
    ...<3 lines>...
            final_response = chunk.response
  File "/Users/shoda/.cache/uv/environments-v2/demo-streaming-44c9a8b0431f886f/lib/python3.13/site-packages/agents/models/chatcmpl_stream_handler.py", line 59, in handle_stream
    usage = chunk.usage
            ^^^^^^^^^^^
  File "/Users/shoda/.cache/uv/environments-v2/demo-streaming-44c9a8b0431f886f/lib/python3.13/site-packages/pydantic/main.py", line 988, in __getattr__
    raise AttributeError(f'{type(self).__name__!r} object has no attribute {item!r}') from exc
AttributeError: 'ModelResponseStream' object has no attribute 'usage'

The sample given in the docs works:

❯ ./demo.py --api-key=... --model=together_ai/meta-llama/Llama-3.3-70B-Instruct-Turbo
[debug] getting weather for Tokyo
Sakura blooms dance
 Gentle Tokyo spring breeze
Warmth on skin so sweet

Expected behavior

run_streamed should work, or there should be some way of streaming events while using LiteLLM models.
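
For context, the AttributeError in the traceback above is standard pydantic behavior for models that allow extra fields: reading a field that was never set raises instead of returning None. A minimal, self-contained sketch of that behavior (FakeStreamChunk is a made-up stand-in for LiteLLM's ModelResponseStream; this is not SDK or LiteLLM code):

from pydantic import BaseModel, ConfigDict

class FakeStreamChunk(BaseModel):
    # Hypothetical stand-in for a streaming chunk type that allows extra fields.
    model_config = ConfigDict(extra="allow")
    id: str

chunk = FakeStreamChunk(id="chunk-1")  # `usage` was never set on this chunk

try:
    _ = chunk.usage  # pydantic's __getattr__ raises AttributeError (with KeyError as the cause)
except AttributeError as err:
    print(f"direct access fails: {err}")

print(getattr(chunk, "usage", None))  # guarded access returns None instead of raising

Guarded access along these lines is what the fix discussed below relies on.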

@theimpostor theimpostor added the bug Something isn't working label Apr 24, 2025
@DanieleMorotti
Contributor

Yes, I noticed that ChatCmplStreamHandler tries to read the usage attribute on every chunk, but if I'm not mistaken it should only be present on the last one. And even if you correct that, it then fails on delta.refusal.

I corrected the code by checking whether the attributes are present in the delta, and then it worked. @rm-openai, let me know if you have time to do it; otherwise I can open a PR with the corrections and write a simple test for the streaming (if this solution works for you).
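
A minimal sketch of that kind of check, assuming guarded attribute access is enough (read_optional_stream_fields is a hypothetical helper, not the actual ChatCmplStreamHandler patch):

from typing import Any, Optional

def read_optional_stream_fields(chunk: Any) -> tuple[Optional[Any], Optional[Any]]:
    # Read `usage` and `delta.refusal` defensively: LiteLLM chunks may omit fields
    # that OpenAI's ChatCompletionChunk always carries, so plain attribute access
    # raises AttributeError (as in the traceback above).
    usage = getattr(chunk, "usage", None)
    choices = getattr(chunk, "choices", None)
    delta = choices[0].delta if choices else None
    refusal = getattr(delta, "refusal", None) if delta is not None else None
    return usage, refusal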

@rm-openai
Collaborator

@DanieleMorotti would love your PR

@devv-shayan

Thank you all, waiting for this.

rm-openai added a commit that referenced this issue Apr 24, 2025
In response to issue #587, I implemented a solution to first check if
`refusal` and `usage` attributes exist in the `delta` object.

I added a unit test similar to `test_openai_chatcompletions_stream.py`.

Let me know if I should change something.

---------

Co-authored-by: Rohan Mehta <[email protected]>