LiteLLM extension crashes with run_streamed #587

Closed
theimpostor opened this issue Apr 24, 2025 · 3 comments
Labels
bug Something isn't working

@theimpostor

Describe the question

Running the LiteLLM sample from the docs works. However, modifying the sample to use the run_streamed API to log responses, tool calls, etc. results in an exception.

Debug information

  • Agents SDK version: v0.0.12
  • Python version: 3.13

Repro steps

#!/usr/bin/env -S uv run --script

# /// script
# requires-python = ">=3.13"
# dependencies = [
#     "openai-agents[litellm]",
# ]
# ///

from __future__ import annotations

import asyncio

from agents import Agent, ItemHelpers, Runner, function_tool, set_tracing_disabled
from agents.extensions.models.litellm_model import LitellmModel

set_tracing_disabled(disabled=True)

@function_tool
def get_weather(city: str):
    print(f"[debug] getting weather for {city}")
    return f"The weather in {city} is sunny."


async def main(model: str, api_key: str):
    agent = Agent(
        name="Assistant",
        instructions="You only respond in haikus.",
        model=LitellmModel(model=model, api_key=api_key),
        tools=[get_weather],
    )

    result = Runner.run_streamed(agent, "What's the weather in Tokyo?")
    async for event in result.stream_events():
        # We'll ignore the raw responses event deltas
        if event.type == "raw_response_event":
            continue
        # When the agent updates, print that
        elif event.type == "agent_updated_stream_event":
            print(f"Agent updated: {event.new_agent.name}")
            continue
        # When items are generated, print them
        elif event.type == "run_item_stream_event":
            if event.item.type == "tool_call_item":
                print(f"-- Tool was called: {event.item.raw_item}")
            elif event.item.type == "tool_call_output_item":
                print(f"-- Tool output: {event.item.output}")
            elif event.item.type == "message_output_item":
                print(f"-- Message output:\n {ItemHelpers.text_message_output(event.item)}")
            else:
                pass  # Ignore other event types

    print(result.final_output)


if __name__ == "__main__":
    # First try to get model/api key from args
    import argparse

    parser = argparse.ArgumentParser()
    parser.add_argument("--model", type=str, required=False)
    parser.add_argument("--api-key", type=str, required=False)
    args = parser.parse_args()

    model = args.model
    if not model:
        model = input("Enter a model name for Litellm: ")

    api_key = args.api_key
    if not api_key:
        api_key = input("Enter an API key for Litellm: ")

    asyncio.run(main(model, api_key))

❯ ./demo-streaming.py --api-key=... --model=together_ai/meta-llama/Llama-3.3-70B-Instruct-Turbo
Agent updated: Assistant
Traceback (most recent call last):
  File "/Users/shoda/.cache/uv/environments-v2/demo-streaming-44c9a8b0431f886f/lib/python3.13/site-packages/pydantic/main.py", line 986, in __getattr__
    return pydantic_extra[item]
           ~~~~~~~~~~~~~~^^^^^^
KeyError: 'usage'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/private/var/folders/ct/x2gct7yn2bxfqs891n8h1dxr0000gn/T/tmp.FRPiXhVbRG/./demo-streaming.py", line 73, in <module>
    asyncio.run(main(model, api_key))
    ~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/[email protected]/3.13.3/Frameworks/Python.framework/Versions/3.13/lib/python3.13/asyncio/runners.py", line 195, in run
    return runner.run(main)
           ~~~~~~~~~~^^^^^^
  File "/usr/local/Cellar/[email protected]/3.13.3/Frameworks/Python.framework/Versions/3.13/lib/python3.13/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^
  File "/usr/local/Cellar/[email protected]/3.13.3/Frameworks/Python.framework/Versions/3.13/lib/python3.13/asyncio/base_events.py", line 719, in run_until_complete
    return future.result()
           ~~~~~~~~~~~~~^^
  File "/private/var/folders/ct/x2gct7yn2bxfqs891n8h1dxr0000gn/T/tmp.FRPiXhVbRG/./demo-streaming.py", line 34, in main
    async for event in result.stream_events():
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ...<16 lines>...
                pass  # Ignore other event types
                ^^^^
  File "/Users/shoda/.cache/uv/environments-v2/demo-streaming-44c9a8b0431f886f/lib/python3.13/site-packages/agents/result.py", line 191, in stream_events
    raise self._stored_exception
  File "/Users/shoda/.cache/uv/environments-v2/demo-streaming-44c9a8b0431f886f/lib/python3.13/site-packages/agents/run.py", line 560, in _run_streamed_impl
    turn_result = await cls._run_single_turn_streamed(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ...<9 lines>...
    )
    ^
  File "/Users/shoda/.cache/uv/environments-v2/demo-streaming-44c9a8b0431f886f/lib/python3.13/site-packages/agents/run.py", line 671, in _run_single_turn_streamed
    async for event in model.stream_response(
    ...<28 lines>...
        streamed_result._event_queue.put_nowait(RawResponsesStreamEvent(data=event))
  File "/Users/shoda/.cache/uv/environments-v2/demo-streaming-44c9a8b0431f886f/lib/python3.13/site-packages/agents/extensions/models/litellm_model.py", line 167, in stream_response
    async for chunk in ChatCmplStreamHandler.handle_stream(response, stream):
    ...<3 lines>...
            final_response = chunk.response
  File "/Users/shoda/.cache/uv/environments-v2/demo-streaming-44c9a8b0431f886f/lib/python3.13/site-packages/agents/models/chatcmpl_stream_handler.py", line 59, in handle_stream
    usage = chunk.usage
            ^^^^^^^^^^^
  File "/Users/shoda/.cache/uv/environments-v2/demo-streaming-44c9a8b0431f886f/lib/python3.13/site-packages/pydantic/main.py", line 988, in __getattr__
    raise AttributeError(f'{type(self).__name__!r} object has no attribute {item!r}') from exc
AttributeError: 'ModelResponseStream' object has no attribute 'usage'

The sample given in the docs works:

❯ ./demo.py --api-key=... --model=together_ai/meta-llama/Llama-3.3-70B-Instruct-Turbo
[debug] getting weather for Tokyo
Sakura blooms dance
 Gentle Tokyo spring breeze
Warmth on skin so sweet

Expected behavior

run_streamed should work, or there should be some way of streaming events while using LiteLLM models.
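
For context, the AttributeError in the traceback above is standard pydantic behavior for models that allow extra fields: reading a field that was never set raises instead of returning None. A minimal, self-contained sketch of that behavior (FakeStreamChunk is a made-up stand-in for LiteLLM's ModelResponseStream; this is not SDK or LiteLLM code):

from pydantic import BaseModel, ConfigDict

class FakeStreamChunk(BaseModel):
    # Hypothetical stand-in for a streaming chunk type that allows extra fields.
    model_config = ConfigDict(extra="allow")
    id: str

chunk = FakeStreamChunk(id="chunk-1")  # `usage` was never set on this chunk

try:
    _ = chunk.usage  # pydantic's __getattr__ raises AttributeError (with KeyError as the cause)
except AttributeError as err:
    print(f"direct access fails: {err}")

print(getattr(chunk, "usage", None))  # guarded access returns None instead of raising

Guarded access along these lines is what the fix discussed below relies on.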

@theimpostor theimpostor added the bug Something isn't working label Apr 24, 2025
@DanieleMorotti
Contributor

Yes, I noticed that ChatCmplStreamHandler tries to read the usage attribute on every chunk, but if I'm not mistaken it should only be present on the last one. And even if you correct that, it then fails on delta.refusal.

I corrected the code by checking whether the attributes are present in the delta, and then it worked. @rm-openai, let me know if you have time to do it; otherwise I can open a PR with the corrections and write a simple test for the streaming (if this solution works for you).
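
A minimal sketch of that kind of check, assuming guarded attribute access is enough (read_optional_stream_fields is a hypothetical helper, not the actual ChatCmplStreamHandler patch):

from typing import Any, Optional

def read_optional_stream_fields(chunk: Any) -> tuple[Optional[Any], Optional[Any]]:
    # Read `usage` and `delta.refusal` defensively: LiteLLM chunks may omit fields
    # that OpenAI's ChatCompletionChunk always carries, so plain attribute access
    # raises AttributeError (as in the traceback above).
    usage = getattr(chunk, "usage", None)
    choices = getattr(chunk, "choices", None)
    delta = choices[0].delta if choices else None
    refusal = getattr(delta, "refusal", None) if delta is not None else None
    return usage, refusal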

@rm-openai
Collaborator

@DanieleMorotti would love your PR

@devv-shayan

Thank you all, waiting for this.

rm-openai added a commit that referenced this issue Apr 24, 2025
In response to issue #587, I implemented a solution to first check if
`refusal` and `usage` attributes exist in the `delta` object.

I added a unit test similar to `test_openai_chatcompletions_stream.py`.

Let me know if I should change something.

---------

Co-authored-by: Rohan Mehta <[email protected]>