
DRAFT for Feedback - Support for token streaming for more dynamic UX #4443

Draft · wants to merge 32 commits into main

Conversation

@jspv (Contributor) commented Dec 1, 2024

Why are these changes needed?

ChatCompletionClient nicely supports token-level streaming via create_stream, but this method is currently not accessible from the AssistantAgent. This proposed change adds an option to pass a token_callback when instantiating AssistantAgent; if one is provided:

  1. create_stream will be used instead of create when calling on_messages_stream.
  2. The provided callback will be called with each returned token as its argument.

This gives the calling application access to the returned tokens in real time. Nothing else is changed; the normal return values of on_messages_stream are not affected.

Example: streaming_tokens (animated demo attached to the original PR)
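A minimal usage sketch of the proposed option (the token_callback argument is what this PR adds; the other arguments and import paths follow the 0.4 preview API and may differ slightly between releases):

```python
import asyncio

from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.messages import TextMessage
from autogen_core import CancellationToken
from autogen_ext.models.openai import OpenAIChatCompletionClient


async def on_token(token: str) -> None:
    # Called once per streamed token; here we simply echo it to the console.
    print(token, end="", flush=True)


async def main() -> None:
    agent = AssistantAgent(
        name="assistant",
        model_client=OpenAIChatCompletionClient(model="gpt-4o"),
        token_callback=on_token,  # the option proposed in this PR
    )
    # on_messages behaves as before: the full Response is still returned,
    # while tokens additionally arrive through the callback in real time.
    response = await agent.on_messages(
        [TextMessage(content="Write a haiku about streaming.", source="user")],
        CancellationToken(),
    )
    print("\n\nFinal message:", response.chat_message.content)


asyncio.run(main())
```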

If folks feel this is a good idea, I will make the appropriate updates to documentation and tests.

Related issue number

Checks

@jspv (Contributor, Author) commented Dec 1, 2024 via email

@ekzhu (Collaborator) commented Dec 2, 2024

@jspv thanks for the PR! We have an issue to track streaming output from agents: #3862 and #3983. The general idea is to stream partial messages through the async iterator from on_messages_stream. Do you think that approach can meet your need? We haven't started working on that yet.

@jspv (Contributor, Author) commented Dec 4, 2024

> @jspv thanks for the PR! We have an issue to track streaming output from agents: #3862 and #3983. The general idea is to stream partial messages through the async iterator from on_messages_stream. Do you think that approach can meet your need? We haven't started working on that yet.

Yes, that would work. I considered that approach but, as it would be a potentially breaking change, I avoided it for my testing. Happy to take a stab at it. Callers pulling from the iterator would need to be able to differentiate between tokens and the other types of messages coming back (tool calls, etc.); it could be as simple as type str for tokens and Response for non-tokens, or are you thinking of a different response type for tokens?

Since callers of on_messages_stream may want the full results (tool calls, etc.) streamed back but not the tokens (which is how it currently works, with the underlying client call being model_client.create() rather than model_client.create_stream()), there would need to be a way to signal to on_messages_stream that token streaming is desired, e.g. passing stream_tokens=True to on_messages_stream.
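For illustration, a rough sketch of how a caller might consume on_messages_stream under this suggestion (the stream_tokens flag and the plain-str token convention are proposals from this comment, not an existing API; import locations are as of the 0.4 previews):

```python
from autogen_agentchat.base import Response
from autogen_core import CancellationToken


async def consume_stream(agent, task_messages) -> None:
    # stream_tokens is the flag suggested above; it does not exist today.
    async for item in agent.on_messages_stream(
        task_messages, CancellationToken(), stream_tokens=True
    ):
        if isinstance(item, str):
            # A raw token: forward it straight to the UI as it arrives.
            print(item, end="", flush=True)
        elif isinstance(item, Response):
            # The final Response: complete message plus any tool-call results.
            print("\nFinal:", item.chat_message.content)
        else:
            # Other inner events (tool calls, handoffs, ...) pass through unchanged.
            print(f"\n[{type(item).__name__}]")
```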

@jspv (Contributor, Author) commented Dec 4, 2024

Thinking more on this: an advantage of the callback model over the async iterator is that it works perfectly when invoking group chats, e.g. RoundRobinGroupChat and await self.agent_team.run(task=message). The desire for streaming is indicated as part of the agent's instantiation, so I can choose which agents I wish to receive streamed tokens from and which ones I do not. When I call agent_team.run(task=message) I get only the tokens I'm interested in via the callbacks. This already works with my minimal code; using the on_messages_stream iterator would require a lot of rework for team/group chats.

What I think would make sense is to accept a list of callbacks on agent instantiation (LangChain does this), or alternatively to provide methods for registering and removing callbacks on the agent. If any callbacks are registered, the agent will use create_stream instead of create in on_messages_stream, and will call the callbacks on returned tokens with a structure containing the token and the calling agent.
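A rough sketch of the registration idea (all names here are hypothetical, not part of the current AssistantAgent):

```python
from typing import Awaitable, Callable

# Hypothetical callback signature: receives the producing agent's name and one token.
TokenCallback = Callable[[str, str], Awaitable[None]]


class TokenStreamingMixin:
    """Callback registry an agent could expose (names are illustrative only)."""

    def __init__(self) -> None:
        self._token_callbacks: list[TokenCallback] = []

    def register_token_callback(self, callback: TokenCallback) -> None:
        self._token_callbacks.append(callback)

    def remove_token_callback(self, callback: TokenCallback) -> None:
        self._token_callbacks.remove(callback)

    @property
    def streams_tokens(self) -> bool:
        # With at least one callback registered, the agent would call
        # create_stream() instead of create() and fan tokens out below.
        return bool(self._token_callbacks)

    async def _emit_token(self, agent_name: str, token: str) -> None:
        for callback in self._token_callbacks:
            await callback(agent_name, token)
```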

Effectively, this cleanly separates streamed tokens into their own path to the UI for those who want them (I really think this is only a UI need), and leaves all the 'normal' paths for group chats and inter-agent communication unchanged.

Thoughts?

@ekzhu requested a review from Copilot on December 6, 2024 07:58


Copilot reviewed 1 out of 1 changed files in this pull request and generated no suggestions.

Comments skipped due to low confidence (4)

python/packages/autogen-agentchat/src/autogen_agentchat/agents/_assistant_agent.py:260

  • Ensure that the token_callback is an async function before using await. Add a check to verify if the token_callback is an async function.
if self._token_callback is not None:

python/packages/autogen-agentchat/src/autogen_agentchat/agents/_assistant_agent.py:188

  • [nitpick] The error message should be more informative. Suggestion: 'The model does not support function calling, which is required for the provided tools.'
raise ValueError("The model does not support function calling.")

python/packages/autogen-agentchat/src/autogen_agentchat/agents/_assistant_agent.py:271

  • [nitpick] The error message should be more informative. Suggestion: 'Unsupported tool type provided. Expected Tool or callable, but got {type(tool)}.'
raise ValueError(f"Unsupported tool type: {type(tool)}")

python/packages/autogen-agentchat/src/autogen_agentchat/agents/_assistant_agent.py:330

  • [nitpick] The error message should be more informative. Suggestion: 'Unsupported handoff type provided. Expected HandoffBase or str, but got {type(handoff)}.'
raise ValueError(f"Unsupported handoff type: {type(handoff)}")
@ekzhu (Collaborator) commented Dec 9, 2024

@jspv since this feature is targeting 0.4.1, do you want to join our Discord channel so we can discuss? https://aka.ms/autogen-discord

@ekzhu (Collaborator) commented Dec 18, 2024

@jspv would you like to join our community office hours to discuss the changes you proposed here? See #4059

@jackgerrits (Member) commented Dec 19, 2024

@jspv Agent output should go via the runtime (message publishing). The reason this is important is so that cross-process communication works as expected. The callback approach will only work in a single process.

While agentchat is currently single-process only, we are expanding it to work with the same distributed expectations as core in an upcoming release. So we will likely get to tackling partial message streaming in 0.4.1. Because of this, we don't want to add callbacks to the AssistantAgent included by default in agentchat.

However, having said all this, if callbacks work well for you and the constraints I mentioned above don't apply to you, then I would encourage you to use them! Given the modular architecture of 0.4 and its support for custom agents, it should be really easy for you to do this. Essentially you'd just copy/paste AssistantAgent, make your changes, and use it with all of the agentchat classes without modification.
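For anyone taking the custom-agent route, a rough sketch of the shape this could take (the token_callback parameter is hypothetical; the real change would live in a copied-and-modified on_messages_stream, along the lines of this PR's diff):

```python
from typing import Awaitable, Callable, Optional

from autogen_agentchat.agents import AssistantAgent


class StreamingAssistantAgent(AssistantAgent):
    """Drop-in replacement for AssistantAgent that also streams tokens to a callback.

    Because it keeps the AssistantAgent interface, it can be used with
    RoundRobinGroupChat and the other agentchat classes without modification.
    """

    def __init__(
        self,
        *args,
        token_callback: Optional[Callable[[str], Awaitable[None]]] = None,
        **kwargs,
    ) -> None:
        super().__init__(*args, **kwargs)
        self._token_callback = token_callback

    # on_messages_stream would be copied from AssistantAgent and modified to call
    # model_client.create_stream(...) and await self._token_callback(chunk) for
    # each streamed chunk, mirroring the change proposed in this PR.
```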

@jspv (Contributor, Author) commented Dec 19, 2024

> @jspv Agent output should go via the runtime (message publishing). The reason this is important is so that cross-process communication works as expected. The callback approach will only work in a single process.
>
> While agentchat is currently single-process only, we are expanding it to work with the same distributed expectations as core in an upcoming release. So we will likely get to tackling partial message streaming in 0.4.1. Because of this, we don't want to add callbacks to the AssistantAgent included by default in agentchat.
>
> However, having said all this, if callbacks work well for you and the constraints I mentioned above don't apply to you, then I would encourage you to use them! Given the modular architecture of 0.4 and its support for custom agents, it should be really easy for you to do this. Essentially you'd just copy/paste AssistantAgent, make your changes, and use it with all of the agentchat classes without modification.

Understood. Thanks for the feedback; happy to assist where I can. My thinking on high-level requirements so far:

  • Token streaming should be enabled/disabled as an option on the agent, not the team, as some agents' TextMessages may not be suitable for streaming (e.g. large blocks of text, structured non-conversational output, etc.).
  • Agent token streaming should be a toggleable property and not permanently set when agents are instantiated
  • Token streaming is primarily a UI feature; the streamed tokens are not relevant to chat history, model context, saved/loaded state, inter-agent messages, etc. All the information relevant to those is captured in existing messages. E.g. after any stream of tokens, once the completion is finished, the standard TextMessage containing the entire response should be published to all the agents; other agents don't need to receive the token-by-token messages, really just the UI does.
    • This implies that a somewhat different message mechanism would be needed for streamed tokens, one that doesn't necessarily publish to all agents but is still awaitable by the calling application (e.g. exposed through the awaitable team.run_stream or agent.on_messages_stream, perhaps with an identifiable StreamedTokenMessage type or similar if that is the method of choice). A rough sketch follows this list.
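A rough sketch of how that could surface to the calling application (StreamedTokenMessage and its fields are hypothetical, not an existing type):

```python
from dataclasses import dataclass


@dataclass
class StreamedTokenMessage:
    """Hypothetical UI-only message carrying a single streamed token."""

    source: str   # name of the agent that produced the token
    content: str  # the token text


async def drive_ui(team, task: str) -> None:
    # Tokens surface through the same awaitable stream (team.run_stream here),
    # but never enter chat history, model context, or saved/loaded state.
    async for item in team.run_stream(task=task):
        if isinstance(item, StreamedTokenMessage):
            print(item.content, end="", flush=True)
        else:
            print(f"\n[{type(item).__name__}]")
```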

Does this seem reasonable? Is there a natural approach to modifying the message structure to support this? - I'm happy to prototype the change.

@jspv marked this pull request as draft on December 25, 2024 17:10