A lesser-known feature of the FnAPI protocol is that the SDK must set GRPC metadata on all RPCs before a runner should acknowledge the worker.
This metadata distinguishes between pipeline workers, so a new port isn't needed for each worker instance within a job, and it also identifies which job the worker belongs to. That saves ports as well as GRPC-related goroutines, which in extreme cases could cause efficiency issues in thread scheduling.
The proposal is to have a single "multiplexer" layer within prism that routes to the handlers for the given job and worker. This should be on the same single port as JobManagement, since GRPC should allow different services to share a port. Otherwise, a single port can be assigned and known at prism startup time for worker endpoint use.
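As a rough sketch of that port sharing (the generated package paths and Register* names below are assumed from the Beam Go model protos; `serve` and `fnWorkerServer` are illustrative):

```go
package prism

import (
	"net"

	fnpb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/fnexecution_v1"
	jobpb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/jobmanagement_v1"
	"google.golang.org/grpc"
)

// fnWorkerServer is whatever implements the FnAPI services for all workers,
// e.g. the multiplexer described below.
type fnWorkerServer interface {
	fnpb.BeamFnControlServer
	fnpb.BeamFnDataServer
	fnpb.BeamFnStateServer
	fnpb.BeamFnLoggingServer
}

// serve registers JobManagement and the FnAPI worker endpoints on one
// grpc.Server, so they share a single known port.
func serve(addr string, jobSrv jobpb.JobServiceServer, workerMux fnWorkerServer) error {
	lis, err := net.Listen("tcp", addr)
	if err != nil {
		return err
	}
	g := grpc.NewServer()
	jobpb.RegisterJobServiceServer(g, jobSrv)
	fnpb.RegisterBeamFnControlServer(g, workerMux)
	fnpb.RegisterBeamFnDataServer(g, workerMux)
	fnpb.RegisterBeamFnStateServer(g, workerMux)
	fnpb.RegisterBeamFnLoggingServer(g, workerMux)
	return g.Serve(lis)
}
```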
The "worker_id" metadata can be looked up from a GRPC message's context. See grpcx.ReadWorkerID for how to do that: https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/util/grpcx/metadata.go
This multiplexer would likely need to be started up by the jobservices Server as well, adding a dependency between the worker and jobservices packages. If that's a problem, we can have whatever starts up the jobservice also start up the worker multiplexer, and provide a way of registering workers for the job on the jobservices.Job type.
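One hypothetical shape for that decoupling (none of these names exist today): the prism entry point constructs both pieces and hands the job service a small registration interface, so jobservices never imports worker.

```go
// WorkerPool is a hypothetical interface the job service could be handed at
// startup; the worker multiplexer would satisfy it. Names are illustrative.
type WorkerPool interface {
	// RegisterWorker attaches a worker for the given job to the shared FnAPI
	// endpoint and returns the worker_id it will be addressed by.
	RegisterWorker(jobID, envID string) (workerID string)
	// UnregisterWorkers drops all of a job's workers, e.g. on termination.
	UnregisterWorkers(jobID string)
}
```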
The multiplexer would implement the Beam FnAPI but otherwise delegate to the existing implementations of those methods on the worker.W type, looking up the appropriate worker.W instance by the jobID.
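A sketch of that delegation, assuming worker.W keeps its current FnAPI stream handler methods and that grpcx.ReadWorkerID has a (string, error) shape; the MultiplexW type and its pool field are illustrative, not existing prism API:

```go
package worker

import (
	"context"
	"fmt"
	"sync"

	fnpb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/fnexecution_v1"
	"github.com/apache/beam/sdks/v2/go/pkg/beam/util/grpcx"
)

// MultiplexW serves the FnAPI once for all jobs and routes each RPC to the
// owning worker.W via its worker_id metadata. Illustrative sketch.
type MultiplexW struct {
	fnpb.UnimplementedBeamFnControlServer
	fnpb.UnimplementedBeamFnDataServer
	fnpb.UnimplementedBeamFnStateServer
	fnpb.UnimplementedBeamFnLoggingServer

	mu   sync.Mutex
	pool map[string]*W // keyed by worker_id, which also identifies the job
}

// workerFor resolves the worker.W that owns this RPC.
func (mx *MultiplexW) workerFor(ctx context.Context) (*W, error) {
	id, err := grpcx.ReadWorkerID(ctx)
	if err != nil {
		return nil, err
	}
	mx.mu.Lock()
	defer mx.mu.Unlock()
	w, ok := mx.pool[id]
	if !ok {
		return nil, fmt.Errorf("no worker registered for worker_id %q", id)
	}
	return w, nil
}

// Control delegates the control stream to the owning worker; Data, State, and
// Logging would follow the same shape.
func (mx *MultiplexW) Control(stream fnpb.BeamFnControl_ControlServer) error {
	w, err := mx.workerFor(stream.Context())
	if err != nil {
		return err
	}
	return w.Control(stream)
}
```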
Workers would need to be unregistered on job termination to keep things tidy, but that can be handled via context cancellation on the job's Root context (the RootCtx field).
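That cleanup could be as small as a goroutine watching the root context (sketch only; WorkerPool is the hypothetical interface above):

```go
// watchJob unregisters a job's workers once its RootCtx is cancelled, which
// happens on job termination.
func watchJob(root context.Context, jobID string, pool WorkerPool) {
	go func() {
		<-root.Done()
		pool.UnregisterWorkers(jobID)
	}()
}
```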
Aside: apparently it is possible in Go to also serve the web pages on the same port: https://stackoverflow.com/questions/63668447/why-grpc-go-can-run-grpc-server-and-http-server-at-the-same-address-and-port-bu. Might be worthwhile to avoid spending ports.
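For reference, the usual shape of that pattern (a sketch; whether prism wants this is open), using golang.org/x/net/http2/h2c so the GRPC side still gets HTTP/2 without TLS:

```go
package prism

import (
	"net/http"
	"strings"

	"golang.org/x/net/http2"
	"golang.org/x/net/http2/h2c"
	"google.golang.org/grpc"
)

// serveMixed serves GRPC and plain HTTP (e.g. status web pages) on one port by
// sniffing each request: HTTP/2 with a grpc content type goes to the GRPC
// server, everything else goes to the web handler.
func serveMixed(addr string, g *grpc.Server, web http.Handler) error {
	h := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.ProtoMajor == 2 && strings.HasPrefix(r.Header.Get("Content-Type"), "application/grpc") {
			g.ServeHTTP(w, r)
			return
		}
		web.ServeHTTP(w, r)
	})
	srv := &http.Server{Addr: addr, Handler: h2c.NewHandler(h, &http2.Server{})}
	return srv.ListenAndServe()
}
```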
Note that this would consolidate the GRPC-internal per-worker Goroutines and structures into the single server. Each worker in a job would still have ~9 Goroutines to manage communication for that physical worker.