Optimize Latency for Parallel Agent Runs with Streaming #498

Closed · adhishthite opened this issue Apr 14, 2025 · 5 comments
Labels: question (Question about using the SDK), stale

Comments

@adhishthite

Please read this first

  • Have you read the docs? Agents SDK docs -> Yes
  • Have you searched for related issues? Others may have had similar requests -> Yes

Question

I'm implementing the parallel translation pattern from the examples, where multiple agents generate translations simultaneously and a selection agent chooses the best one. While this approach improves quality, it introduces significant latency in the user experience, especially when streaming responses.

Current Implementation

The current implementation follows this pattern:

import asyncio

from agents import ItemHelpers, Runner, trace

# spanish_agent and translation_picker are Agent instances defined elsewhere.


async def main():
    msg = input("Enter message for translation: ")

    with trace("Parallel translation"):
        # Run 3 translation agents in parallel (takes ~10s total)
        res_1, res_2, res_3 = await asyncio.gather(
            Runner.run(spanish_agent, msg),
            Runner.run(spanish_agent, msg),
            Runner.run(spanish_agent, msg),
        )

        # Collect outputs and combine
        outputs = [
            ItemHelpers.text_message_outputs(res_1.new_items),
            ItemHelpers.text_message_outputs(res_2.new_items),
            ItemHelpers.text_message_outputs(res_3.new_items),
        ]
        translations = "\n\n".join(outputs)

        # Run selection agent to pick best (adds more latency)
        best_translation = await Runner.run(
            translation_picker,
            f"Input: {msg}\n\nTranslations:\n{translations}",
        )
        print(best_translation.final_output)

Problem Statement

The current workflow creates a significant latency problem in streaming scenarios:

  1. All translation agents must complete execution (taking ~10s in parallel)
  2. Only after all translations finish can the selection agent begin processing
  3. The UI shows no output until the selection agent starts streaming
  4. The latency compounds when this step is part of a longer agent chain

In effect, the time to the first visible token is the slowest translation's duration plus the picker's own time to first token. This leads to a poor user experience with long periods of no feedback, especially in complex workflows where subsequent agents depend on the translation output.

What's the recommended approach to optimize this pattern for streaming scenarios while maintaining the quality benefits of parallel execution and selection?

adhishthite added the question label on Apr 14, 2025
@rm-openai (Collaborator)

I think this is more of a product question than a technical one. The common pattern I've seen is to stream updates to the user as the other agents run: i.e., run your parallel agents in a streaming fashion, use the streaming events to deliver progress updates, then show a final response when done.
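
A minimal sketch of that pattern, assuming the spanish_agent and translation_picker agents from the snippet above. It uses Runner.run_streamed and stream_events() from the Agents SDK: each translation reports progress as its run items arrive, asyncio.as_completed announces translations as they finish, and the picker's final answer is streamed token by token.

import asyncio

from agents import ItemHelpers, Runner

# spanish_agent and translation_picker: the Agent instances from the snippet above.


async def stream_one(agent, msg: str, idx: int) -> str:
    # run_streamed returns immediately; iterating stream_events()
    # drives the run and yields events as they happen.
    result = Runner.run_streamed(agent, msg)
    async for event in result.stream_events():
        if event.type == "run_item_stream_event":
            print(f"[translator {idx}] {event.item.type}")
    return ItemHelpers.text_message_outputs(result.new_items)


async def main():
    msg = input("Enter message for translation: ")
    tasks = [asyncio.create_task(stream_one(spanish_agent, msg, i)) for i in range(3)]

    outputs = []
    # Surface each translation as soon as it finishes, instead of
    # waiting silently for all three the way asyncio.gather does.
    for done in asyncio.as_completed(tasks):
        outputs.append(await done)
        print(f"{len(outputs)}/3 translations ready...")

    translations = "\n\n".join(outputs)

    # Stream the picker's answer token by token so the user sees the
    # final response as soon as the first delta arrives.
    picker = Runner.run_streamed(
        translation_picker,
        f"Input: {msg}\n\nTranslations:\n{translations}",
    )
    async for event in picker.stream_events():
        if event.type == "raw_response_event" and hasattr(event.data, "delta"):
            print(event.data.delta, end="", flush=True)


if __name__ == "__main__":
    asyncio.run(main())

The as_completed loop is the key design choice: the user sees "1/3 translations ready..." within a few seconds instead of staring at a blank screen until gather returns, and only the picker's output needs to stream in full.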

@adhishthite (Author)

Thanks @rm-openai.

Do we have an example of such streaming?

@rm-openai (Collaborator)

github-actions (bot)

This issue is stale because it has been open for 7 days with no activity.

github-actions bot added the stale label on Apr 23, 2025
github-actions (bot)

This issue was closed because it has been inactive for 3 days since being marked as stale.

github-actions bot closed this as not planned (stale) on Apr 27, 2025