Before I open an issue I thought I would ask: is agbenchmark with agent protocol expected to work with third-party (non Auto-GPT) agents that obey the protocol?
I kept stripping out more and more of the agent until I had a nearly empty mock implementation that might pass the WriteFile test, and agbenchmark is still throwing pydantic validation errors on the artifacts.
My question is: am I doing something wrong and this "should" work, or is agbenchmark expected to have rough edges when testing via the agent protocol? Are there different versions of the agent protocol?
Here is my agent:
from agent_protocol import Agent, Step, Task
import os
from pathlib import Path

async def _execute(step: Step):
    task = await Agent.db.get_task(step.task_id)
    wsfolder = Agent.get_workspace(step.task_id)
    os.makedirs(wsfolder, exist_ok=True)
    file_name = "my_file.txt"
    # try cheating and see if the tester is broken
    with open(os.path.join(wsfolder, file_name), "w") as outf:
        outf.write("Washington")
    path = Path("./" + file_name)
    await Agent.db.create_artifact(
        task_id=step.task_id,
        step_id=step.step_id,
        relative_path=str(path.parent),
        file_name=path.name,
    )
    # the protocol handler test (test.sh) apparently expects at least two steps in some test cases
    await Agent.db.create_step(task.task_id, "finish", is_last=True)
    return step

async def _finish(step: Step):
    print("Here is where we finish")
    return step

async def task_handler(task: Task) -> None:
    if not task.input:
        raise Exception("No task prompt")
    await Agent.db.create_step(task.task_id, "execute")

async def step_handler(step: Step) -> Step:
    if step.name == "execute":
        return await _execute(step)
    elif step.name == "finish":
        return await _finish(step)
    else:
        raise Exception("Unrecognized step name")

Agent.setup_agent(task_handler, step_handler).start()
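One detail worth noting about the artifact fields above (a guess on my part, not a confirmed cause of the error): `Path("./" + file_name)` normalizes away the leading `./`, so `str(path.parent)` comes out as `"."` rather than `""`. If agbenchmark's pydantic `Artifact` model expects something else (e.g. an empty string or `None`) for a file at the workspace root, that could trigger a validation error. The normalization itself is easy to check in isolation:

```python
from pathlib import Path

# Same construction as in _execute above.
file_name = "my_file.txt"
path = Path("./" + file_name)

# pathlib drops the leading "./", so the values passed to
# create_artifact(relative_path=..., file_name=...) are:
print(repr(str(path.parent)))  # prints '.'
print(repr(path.name))         # prints 'my_file.txt'
```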
For context: I started with the Python SDK from https://agentprotocol.ai/sdks/python, and the agent passes validation according to https://agentprotocol.ai/test.sh