
DRAFT for Feedback - Support for token streaming for more dynamic UX #4443

Draft · wants to merge 32 commits into main

Conversation

@jspv (Contributor) commented Dec 1, 2024

Why are these changes needed?

ChatCompletionClient nicely supports token-level streaming via create_stream, but this method is currently not accessible from the AssistantAgent. This proposed change adds an option to pass a token_callback when instantiating AssistantAgent; if one is provided:

  1. create_stream will be used instead of create when calling on_messages_stream.
  2. The provided callback will be called with each returned token as its argument.

This gives the calling application access to the returned tokens in real time. Nothing else is changed; the normal return values of on_messages_stream are not affected.

Example: streaming_tokens (animated demo attached to the original PR)
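A minimal usage sketch of the proposed option (the token_callback argument is what this PR adds; the other arguments and import paths follow the 0.4 preview API and may differ slightly between releases):

```python
import asyncio

from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.messages import TextMessage
from autogen_core import CancellationToken
from autogen_ext.models.openai import OpenAIChatCompletionClient


async def on_token(token: str) -> None:
    # Called once per streamed token; here we simply echo it to the console.
    print(token, end="", flush=True)


async def main() -> None:
    agent = AssistantAgent(
        name="assistant",
        model_client=OpenAIChatCompletionClient(model="gpt-4o"),
        token_callback=on_token,  # the option proposed in this PR
    )
    # on_messages behaves as before: the full Response is still returned,
    # while tokens additionally arrive through the callback in real time.
    response = await agent.on_messages(
        [TextMessage(content="Write a haiku about streaming.", source="user")],
        CancellationToken(),
    )
    print("\n\nFinal message:", response.chat_message.content)


asyncio.run(main())
```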

If folks feel this is a good idea, I will make the appropriate updates to documentation and tests.

Related issue number

Checks

@jspv (Contributor, Author) commented Dec 1, 2024 via email

@ekzhu (Collaborator) commented Dec 2, 2024

@jspv thanks for the PR! We have an issue to track streaming output from agents: #3862 and #3983. The general idea is to stream partial messages through the async iterator from on_messages_stream. Do you think that approach can meet your need? We haven't started working on that yet.

@jspv (Contributor, Author) commented Dec 4, 2024

> @jspv thanks for the PR! We have an issue to track streaming output from agents: #3862 and #3983. The general idea is to stream partial messages through the async iterator from on_messages_stream. Do you think that approach can meet your need? We haven't started working on that yet.

Yes, that would work. I considered that approach but, as it would be a potentially breaking change, I avoided it for my testing. Happy to take a stab at it. Callers pulling from the iterator would need to be able to differentiate between tokens and the other types of messages coming back (tool calls, etc.); it could be as simple as type str for tokens and Response for non-tokens, or are you thinking of a different response type for tokens?

Since callers of on_messages_stream may want the full results (tool calls, etc.) streamed back but not the tokens (which is how it currently works, with the underlying client call being model_client.create() rather than model_client.create_stream()), there would need to be a way to signal to on_messages_stream that token streaming is desired, e.g. passing stream_tokens=True to on_messages_stream.
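For illustration, a rough sketch of how a caller might consume on_messages_stream under this suggestion (the stream_tokens flag and the plain-str token convention are proposals from this comment, not an existing API; import locations are as of the 0.4 previews):

```python
from autogen_agentchat.base import Response
from autogen_core import CancellationToken


async def consume_stream(agent, task_messages) -> None:
    # stream_tokens is the flag suggested above; it does not exist today.
    async for item in agent.on_messages_stream(
        task_messages, CancellationToken(), stream_tokens=True
    ):
        if isinstance(item, str):
            # A raw token: forward it straight to the UI as it arrives.
            print(item, end="", flush=True)
        elif isinstance(item, Response):
            # The final Response: complete message plus any tool-call results.
            print("\nFinal:", item.chat_message.content)
        else:
            # Other inner events (tool calls, handoffs, ...) pass through unchanged.
            print(f"\n[{type(item).__name__}]")
```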

@jspv (Contributor, Author) commented Dec 4, 2024

Thinking more on this: an advantage of the callback model over the async iterator is that it works perfectly when invoking group chats, e.g. RoundRobinGroupChat and await self.agent_team.run(task=message). The desire for streaming is indicated as part of the agent's instantiation, so I can choose which agents I wish to receive streamed tokens from and which ones I do not. When I call agent_team.run(task=message) I get only the tokens I'm interested in via the callbacks. This already works with my minimal code; using the on_messages_stream iterator would require a lot of rework for team/group chats.

What I think would make sense is to accept a list of callbacks on agent instantiation (LangChain does this), or alternatively to provide methods for registering and removing callbacks on the agent. If any callbacks are registered, the agent will use create_stream instead of create in on_messages_stream, and will call the callbacks on returned tokens with a structure containing the token and the calling agent.
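A rough sketch of the registration idea (all names here are hypothetical, not part of the current AssistantAgent):

```python
from typing import Awaitable, Callable

# Hypothetical callback signature: receives the producing agent's name and one token.
TokenCallback = Callable[[str, str], Awaitable[None]]


class TokenStreamingMixin:
    """Callback registry an agent could expose (names are illustrative only)."""

    def __init__(self) -> None:
        self._token_callbacks: list[TokenCallback] = []

    def register_token_callback(self, callback: TokenCallback) -> None:
        self._token_callbacks.append(callback)

    def remove_token_callback(self, callback: TokenCallback) -> None:
        self._token_callbacks.remove(callback)

    @property
    def streams_tokens(self) -> bool:
        # With at least one callback registered, the agent would call
        # create_stream() instead of create() and fan tokens out below.
        return bool(self._token_callbacks)

    async def _emit_token(self, agent_name: str, token: str) -> None:
        for callback in self._token_callbacks:
            await callback(agent_name, token)
```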

Effectively, this cleanly separates streamed tokens into their own path to the UI for those who want them (I really think this is only a UI need), and leaves all the 'normal' paths for group chats and inter-agent communication unchanged.

Thoughts?

@ekzhu requested a review from Copilot on December 6, 2024 07:58


Copilot reviewed 1 out of 1 changed files in this pull request and generated no suggestions.

Comments skipped due to low confidence (4)

python/packages/autogen-agentchat/src/autogen_agentchat/agents/_assistant_agent.py:260

  • Ensure that the token_callback is an async function before using await. Add a check to verify if the token_callback is an async function.
if self._token_callback is not None:

python/packages/autogen-agentchat/src/autogen_agentchat/agents/_assistant_agent.py:188

  • [nitpick] The error message should be more informative. Suggestion: 'The model does not support function calling, which is required for the provided tools.'
raise ValueError("The model does not support function calling.")

python/packages/autogen-agentchat/src/autogen_agentchat/agents/_assistant_agent.py:271

  • [nitpick] The error message should be more informative. Suggestion: 'Unsupported tool type provided. Expected Tool or callable, but got {type(tool)}.'
raise ValueError(f"Unsupported tool type: {type(tool)}")

python/packages/autogen-agentchat/src/autogen_agentchat/agents/_assistant_agent.py:330

  • [nitpick] The error message should be more informative. Suggestion: 'Unsupported handoff type provided. Expected HandoffBase or str, but got {type(handoff)}.'
raise ValueError(f"Unsupported handoff type: {type(handoff)}")
@ekzhu (Collaborator) commented Dec 9, 2024

@jspv since this feature is targeting 0.4.1, do you want to join our Discord channel so we can discuss? https://aka.ms/autogen-discord

@ekzhu (Collaborator) commented Dec 18, 2024

@jspv would you like to join our community office hours to discuss the changes you proposed here? See #4059

@jackgerrits (Member) commented Dec 19, 2024

@jspv Agent output should go via the runtime (message publishing). The reason this is important is so that cross-process communication works as expected. The callback approach will only work in a single process.

While agentchat is currently single-process only, we are expanding it to work with the same distributed expectations as core in an upcoming release. So we will likely get to tackling partial message streaming in 0.4.1. Because of this, we don't want to add callbacks to the AssistantAgent included by default in agentchat.

However, having said all this, if callbacks work well for you and the constraints I mentioned above don't apply to you, then I would encourage you to use them! Given the modular architecture of 0.4 and its support for custom agents, it should be really easy for you to do this. Essentially you'd just copy/paste AssistantAgent, make your changes, and use it with all of the agentchat classes without modification.
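For anyone taking the custom-agent route, a rough sketch of the shape this could take (the token_callback parameter is hypothetical; the real change would live in a copied-and-modified on_messages_stream, along the lines of this PR's diff):

```python
from typing import Awaitable, Callable, Optional

from autogen_agentchat.agents import AssistantAgent


class StreamingAssistantAgent(AssistantAgent):
    """Drop-in replacement for AssistantAgent that also streams tokens to a callback.

    Because it keeps the AssistantAgent interface, it can be used with
    RoundRobinGroupChat and the other agentchat classes without modification.
    """

    def __init__(
        self,
        *args,
        token_callback: Optional[Callable[[str], Awaitable[None]]] = None,
        **kwargs,
    ) -> None:
        super().__init__(*args, **kwargs)
        self._token_callback = token_callback

    # on_messages_stream would be copied from AssistantAgent and modified to call
    # model_client.create_stream(...) and await self._token_callback(chunk) for
    # each streamed chunk, mirroring the change proposed in this PR.
```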

@jspv (Contributor, Author) commented Dec 19, 2024

> @jspv Agent output should go via the runtime (message publishing). The reason this is important is so that cross-process communication works as expected. The callback approach will only work in a single process.
>
> While agentchat is currently single-process only, we are expanding it to work with the same distributed expectations as core in an upcoming release. So we will likely get to tackling partial message streaming in 0.4.1. Because of this, we don't want to add callbacks to the AssistantAgent included by default in agentchat.
>
> However, having said all this, if callbacks work well for you and the constraints I mentioned above don't apply to you, then I would encourage you to use them! Given the modular architecture of 0.4 and its support for custom agents, it should be really easy for you to do this. Essentially you'd just copy/paste AssistantAgent, make your changes, and use it with all of the agentchat classes without modification.

Understood. Thanks for the feedback; happy to assist where I can. My thinking on high-level requirements so far:

  • Token streaming should be enabled/disabled as an option on the agent, not the team, as some agents' TextMessages may not be suitable for streaming (e.g. large blocks of text, structured non-conversational output, etc.).
  • Agent token streaming should be a toggleable property and not permanently set when agents are instantiated
  • Token streaming is primarily a UI feature; the streamed tokens are not relevant to chat history, model context, saved/loaded state, inter-agent messages, etc. All the information relevant to those is captured in existing messages. E.g. after any stream of tokens, once the completion is finished, the standard TextMessage containing the entire response should be published to all the agents; other agents don't need to receive the token-by-token messages, really just the UI does.
    • This implies that a somewhat different message mechanism would be needed for streamed tokens, one that doesn't necessarily publish to all agents but is still awaitable by the calling application (e.g. exposed through the awaitable team.run_stream or agent.on_messages_stream, perhaps with an identifiable StreamedTokenMessage type or similar if that is the method of choice). A rough sketch follows this list.
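A rough sketch of how that could surface to the calling application (StreamedTokenMessage and its fields are hypothetical, not an existing type):

```python
from dataclasses import dataclass


@dataclass
class StreamedTokenMessage:
    """Hypothetical UI-only message carrying a single streamed token."""

    source: str   # name of the agent that produced the token
    content: str  # the token text


async def drive_ui(team, task: str) -> None:
    # Tokens surface through the same awaitable stream (team.run_stream here),
    # but never enter chat history, model context, or saved/loaded state.
    async for item in team.run_stream(task=task):
        if isinstance(item, StreamedTokenMessage):
            print(item.content, end="", flush=True)
        else:
            print(f"\n[{type(item).__name__}]")
```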

Does this seem reasonable? Is there a natural approach to modifying the message structure to support this? - I'm happy to prototype the change.

@jspv marked this pull request as draft on December 25, 2024 17:10