Before I open an issue I thought I would ask: is agbenchmark with agent protocol expected to work with third-party (non Auto-GPT) agents that obey the protocol?
I kept stripping out more and more of the agent until I had a nearly empty mock implementation that might pass the WriteFile test, and agbenchmark is still throwing pydantic validation errors on the artifacts.
My question is: am I doing something wrong and this "should" work, or is agbenchmark expected to have rough edges when testing via the agent protocol? Are there different versions of the agent protocol?
Here is my agent:
from agent_protocol import Agent, Step, Task
import os
from pathlib import Path

async def _execute(step: Step):
    task = await Agent.db.get_task(step.task_id)
    wsfolder = Agent.get_workspace(step.task_id)
    os.makedirs(wsfolder, exist_ok=True)
    file_name = "my_file.txt"
    # try cheating and see if the tester is broken
    with open(os.path.join(wsfolder, file_name), "w") as outf:
        outf.write("Washington")
    path = Path("./" + file_name)
    await Agent.db.create_artifact(
        task_id=step.task_id,
        step_id=step.step_id,
        relative_path=str(path.parent),
        file_name=path.name,
    )
    # the protocol handler test (test.sh) apparently expects at least two steps in some test cases
    await Agent.db.create_step(task.task_id, "finish", is_last=True)
    return step

async def _finish(step: Step):
    print("Here is where we finish")
    return step

async def task_handler(task: Task) -> None:
    if not task.input:
        raise Exception("No task prompt")
    await Agent.db.create_step(task.task_id, "execute")

async def step_handler(step: Step) -> Step:
    if step.name == "execute":
        return await _execute(step)
    elif step.name == "finish":
        return await _finish(step)
    else:
        raise Exception("Unrecognized step name")

Agent.setup_agent(task_handler, step_handler).start()
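One detail worth noting about the artifact fields above (a guess on my part, not a confirmed cause of the error): `Path("./" + file_name)` normalizes away the leading `./`, so `str(path.parent)` comes out as `"."` rather than `""`. If agbenchmark's pydantic `Artifact` model expects something else (e.g. an empty string or `None`) for a file at the workspace root, that could trigger a validation error. The normalization itself is easy to check in isolation:

```python
from pathlib import Path

# Same construction as in _execute above.
file_name = "my_file.txt"
path = Path("./" + file_name)

# pathlib drops the leading "./", so the values passed to
# create_artifact(relative_path=..., file_name=...) are:
print(repr(str(path.parent)))  # prints '.'
print(repr(path.name))         # prints 'my_file.txt'
```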
For context: I started with the Python SDK from https://agentprotocol.ai/sdks/python, and the agent passes validation according to https://agentprotocol.ai/test.sh