[core][dashboard] Dashboard head modules as Actors. #49432
+341
−60
A `DashboardHeadModule` defines a dashboard module that lives in the dashboard itself. This causes scalability issues and a whole-server blast radius. This PR introduces another type of module, `DashboardHeadActorModule`. The dashboard creates a Ray Actor from each module class and makes a Ray method call on each inbound request. Because the dashboard uses `await obj_ref`, it is not blocked by any module-internal stalls.

Converted `healthz_head` into an Actor Module. The plan is to convert all modules gradually.
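The non-blocking property can be illustrated without Ray. The sketch below is an analogy under stated assumptions, not the PR's code: a worker thread stands in for the module's actor, the executor future stands in for `obj_ref`, and `SlowModule`, `dispatch`, and `run_demo` are hypothetical names. A heartbeat task keeps ticking while the module handler stalls, which is the behavior the dashboard event loop is meant to get from awaiting an actor call.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


class SlowModule:
    """Hypothetical module whose handler stalls; stands in for an actor."""

    def handle(self, request: str) -> str:
        time.sleep(0.2)  # simulated module-internal stall
        return f"handled {request}"


async def dispatch(executor, module, request):
    # Analogous to `obj_ref = actor.handle.remote(request); await obj_ref`:
    # the work runs outside the event loop, which only awaits the result.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(executor, module.handle, request)


async def main():
    ticks = 0

    async def heartbeat():
        # Represents other dashboard work sharing the same event loop.
        nonlocal ticks
        while True:
            await asyncio.sleep(0.01)
            ticks += 1

    with ThreadPoolExecutor(max_workers=1) as executor:
        hb = asyncio.create_task(heartbeat())
        result = await dispatch(executor, SlowModule(), "ping")
        hb.cancel()
    return result, ticks


def run_demo():
    return asyncio.run(main())


if __name__ == "__main__":
    print(run_demo())
```

If `handle` were called directly inside the coroutine instead, the heartbeat would not advance during the stall, which corresponds to the blast radius described above for in-process modules.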
Limitations: `HTTPException` is not supported. Handlers need to return `Response(status=403, text=xxx)` instead. Existing code mostly does this already; there are about 2 raises that can be updated.
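A minimal sketch of that conversion, assuming a handler that previously raised on a failed check. The `Response` dataclass here is a hypothetical stand-in for the web framework's response type, and `ALLOWED_TOKENS` and `handle_request` are illustrative names only.

```python
from dataclasses import dataclass


@dataclass
class Response:
    """Hypothetical stand-in for the web framework's response type."""

    status: int = 200
    text: str = ""


ALLOWED_TOKENS = {"secret"}  # illustrative data, not from the PR


def handle_request(token: str) -> Response:
    # Before: the handler would raise an HTTP exception here.
    # After: the error is returned as an ordinary response value.
    if token not in ALLOWED_TOKENS:
        return Response(status=403, text="forbidden")
    return Response(status=200, text="ok")
```

Callers then inspect `status` on the returned value rather than catching an exception.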
On `ray start --head`: the old start order is `gcs_server -> dashboard -> head node raylet`. But now the dashboard needs to do `ray.init()`, so the new order is `gcs_server -> head node raylet -> dashboard`. This is to fully confirm that `dashboard` is not a "Ray Control Plane" but rather more like a Ray application, and that Ray should be able to run without a dashboard up and running.

On DataSource: the plan is to remove all data dependencies between modules. If a module wants data, it should query for it itself via GcsClient. Notably, `StateHead` queries all DataSource, and the plan is to move each state API into the corresponding Head. For example, move `/api/v0/cluster_events` to EventHead via #49380.

For a detailed "how it works" document, see this.