[core][dashboard] Dashboard head modules as Actors. #49432

Open: 6 commits to merge into base branch master

@rynewang (Contributor) commented Dec 25, 2024

A DashboardHeadModule defines a dashboard module that lives in the dashboard itself. This causes scalability issues and gives every module a whole-server blast radius. This PR introduces another type of module, DashboardHeadActorModule. The dashboard creates a Ray Actor from each module class and makes a Ray method call on each inbound request. Because the dashboard only does await obj_ref, it is not blocked by any module-internal stalls.
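To illustrate why awaiting a per-module result keeps the dashboard responsive, here is a stdlib-only analogy, not the PR's code and not Ray. Each module gets its own single-thread executor as a stand-in for its actor, and the dispatcher awaits the resulting future the way the dashboard would await obj_ref. The names SlowModule, FastModule and Dispatcher are hypothetical.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


class SlowModule:
    """Hypothetical module whose handler stalls on a blocking call."""

    def handle(self, request: str) -> str:
        time.sleep(0.2)  # simulates a module-internal stall
        return f"slow:{request}"


class FastModule:
    """Hypothetical module that answers immediately."""

    def handle(self, request: str) -> str:
        return f"fast:{request}"


class Dispatcher:
    """Stand-in for the dashboard: one isolated worker per module."""

    def __init__(self, modules):
        self._modules = modules
        # One single-thread executor per module plays the role of one actor.
        self._pools = {name: ThreadPoolExecutor(max_workers=1) for name in modules}

    async def dispatch(self, name: str, request: str) -> str:
        loop = asyncio.get_running_loop()
        # Analogous to `await obj_ref`: the event loop stays free while
        # the module works in its own worker.
        return await loop.run_in_executor(
            self._pools[name], self._modules[name].handle, request
        )

    def shutdown(self) -> None:
        for pool in self._pools.values():
            pool.shutdown()


async def demo() -> list:
    dispatcher = Dispatcher({"slow": SlowModule(), "fast": FastModule()})
    finished = []

    async def call(name: str, request: str) -> None:
        finished.append(await dispatcher.dispatch(name, request))

    # The slow request is issued first, yet it does not delay the fast one.
    await asyncio.gather(call("slow", "a"), call("fast", "b"))
    dispatcher.shutdown()
    return finished
```

Running asyncio.run(demo()) finishes the fast request before the slow one, which is the isolation property the description is after. In the PR itself that role is played by Ray Actors rather than threads.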

Converted healthz_head into an Actor Module. The plan is to convert all modules gradually.

Limitations: HTTPException is not supported. A handler needs to return Response(status=403, text=xxx) instead of raising. Existing code is mostly already written that way; about 2 raises remain and can be updated.
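The change asked of module authors could look roughly like the sketch below. To keep it dependency-free, Response and HTTPForbidden are minimal stand-ins defined here, not the real web framework classes, and both handlers and the token check are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Response:
    """Stand-in for the framework's Response class (assumption)."""

    status: int = 200
    text: str = ""


class HTTPForbidden(Exception):
    """Stand-in for an HTTPException subclass (assumption)."""


def handler_raising(token: str) -> Response:
    # Style that an actor module cannot use: the error is raised.
    if token != "secret":
        raise HTTPForbidden("forbidden")
    return Response(text="ok")


def handler_returning(token: str) -> Response:
    # Style that works in an actor module: the error is an ordinary
    # return value carrying the status code.
    if token != "secret":
        return Response(status=403, text="forbidden")
    return Response(text="ok")
```

The second form reports the error as data, so the caller receives a regular result either way.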

On ray start --head: the old start order was gcs_server -> dashboard -> head node raylet. Now the dashboard needs to call ray.init(), so the new order is gcs_server -> head node raylet -> dashboard. This confirms that the dashboard is not part of the "Ray Control Plane" but is more like a Ray application, and that Ray should be able to run without a dashboard up and running.
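The new start order can be read as a dependency chain. The sketch below derives it with the standard library's graphlib. The dependency map is an assumption made for illustration (the description only states the order, not why each step depends on the previous one), and the component names are just labels, not real process names.

```python
from graphlib import TopologicalSorter

# Assumed dependencies, for illustration only:
# the dashboard calls ray.init(), so it needs a running raylet,
# and the raylet is started after the GCS server.
NEW_DEPENDENCIES = {
    "gcs_server": set(),
    "head_node_raylet": {"gcs_server"},
    "dashboard": {"head_node_raylet"},
}


def start_order(dependencies: dict) -> list:
    """Return the components in an order that respects their dependencies."""
    return list(TopologicalSorter(dependencies).static_order())
```

With this map, start_order(NEW_DEPENDENCIES) puts the dashboard last, after the head node raylet, so nothing else waits on it.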

On DataSource: the plan is to remove all data dependencies between modules. A module that needs data should query it itself via GcsClient. Notably, StateHead queries all of DataSource, and the plan is to move each state API into the corresponding Head. For example, move /api/v0/cluster_events to EventHead via #49380.

For a detailed "how it works" document, see this.

@@ -212,6 +215,8 @@ async def run(self):
backup_count=args.logging_rotate_backup_count,
)

logger.error(" ".join(sys.argv))
Contributor Author

Debug prints, will remove

Signed-off-by: Ruiyang Wang <[email protected]>
@rynewang rynewang added the go add ONLY when ready to merge, run all tests label Dec 25, 2024
Signed-off-by: Ruiyang Wang <[email protected]>

4 participants