A lesser-known feature of the FnAPI protocol is that the SDK must set GRPC metadata on all RPCs before a runner should acknowledge the worker.
This metadata distinguishes between pipeline workers, so a new port isn't needed for each worker instance within a job, and it also identifies which job the worker belongs to. That saves ports as well as GRPC-related goroutines, which in extreme cases could cause efficiency issues in thread scheduling.
The proposal is to have a single "multiplexer" layer within prism that routes to the handlers for the given job and worker. This should be on the same single port as JobManagement, since GRPC should allow different services to share a port. Otherwise, a single port can be assigned and known at prism startup time for worker endpoint use.
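As a rough sketch of that port sharing (the generated package paths and Register* names below are assumed from the Beam Go model protos; `serve` and `fnWorkerServer` are illustrative):

```go
package prism

import (
	"net"

	fnpb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/fnexecution_v1"
	jobpb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/jobmanagement_v1"
	"google.golang.org/grpc"
)

// fnWorkerServer is whatever implements the FnAPI services for all workers,
// e.g. the multiplexer described below.
type fnWorkerServer interface {
	fnpb.BeamFnControlServer
	fnpb.BeamFnDataServer
	fnpb.BeamFnStateServer
	fnpb.BeamFnLoggingServer
}

// serve registers JobManagement and the FnAPI worker endpoints on one
// grpc.Server, so they share a single known port.
func serve(addr string, jobSrv jobpb.JobServiceServer, workerMux fnWorkerServer) error {
	lis, err := net.Listen("tcp", addr)
	if err != nil {
		return err
	}
	g := grpc.NewServer()
	jobpb.RegisterJobServiceServer(g, jobSrv)
	fnpb.RegisterBeamFnControlServer(g, workerMux)
	fnpb.RegisterBeamFnDataServer(g, workerMux)
	fnpb.RegisterBeamFnStateServer(g, workerMux)
	fnpb.RegisterBeamFnLoggingServer(g, workerMux)
	return g.Serve(lis)
}
```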
The "worker_id" metadata can be looked up from a GRPC message's context. See grpcx.ReadWorkerID for how to do that: https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/util/grpcx/metadata.go
This multiplexer would likely need to be started up by the jobservices Server as well, adding a dependency between the worker and jobservices packages. If that's a problem, we can have whatever starts up the jobservice also start up the worker multiplexer, and provide a way of registering workers for the job on the jobservices.Job type.
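One hypothetical shape for that decoupling (none of these names exist today): the prism entry point constructs both pieces and hands the job service a small registration interface, so jobservices never imports worker.

```go
// WorkerPool is a hypothetical interface the job service could be handed at
// startup; the worker multiplexer would satisfy it. Names are illustrative.
type WorkerPool interface {
	// RegisterWorker attaches a worker for the given job to the shared FnAPI
	// endpoint and returns the worker_id it will be addressed by.
	RegisterWorker(jobID, envID string) (workerID string)
	// UnregisterWorkers drops all of a job's workers, e.g. on termination.
	UnregisterWorkers(jobID string)
}
```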
The multiplexer would implement the Beam FnAPI but otherwise delegate to the existing implementations of those methods on the worker.W type, looking up the appropriate worker.W instance by the jobID.
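A sketch of that delegation, assuming worker.W keeps its current FnAPI stream handler methods and that grpcx.ReadWorkerID has a (string, error) shape; the MultiplexW type and its pool field are illustrative, not existing prism API:

```go
package worker

import (
	"context"
	"fmt"
	"sync"

	fnpb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/fnexecution_v1"
	"github.com/apache/beam/sdks/v2/go/pkg/beam/util/grpcx"
)

// MultiplexW serves the FnAPI once for all jobs and routes each RPC to the
// owning worker.W via its worker_id metadata. Illustrative sketch.
type MultiplexW struct {
	fnpb.UnimplementedBeamFnControlServer
	fnpb.UnimplementedBeamFnDataServer
	fnpb.UnimplementedBeamFnStateServer
	fnpb.UnimplementedBeamFnLoggingServer

	mu   sync.Mutex
	pool map[string]*W // keyed by worker_id, which also identifies the job
}

// workerFor resolves the worker.W that owns this RPC.
func (mx *MultiplexW) workerFor(ctx context.Context) (*W, error) {
	id, err := grpcx.ReadWorkerID(ctx)
	if err != nil {
		return nil, err
	}
	mx.mu.Lock()
	defer mx.mu.Unlock()
	w, ok := mx.pool[id]
	if !ok {
		return nil, fmt.Errorf("no worker registered for worker_id %q", id)
	}
	return w, nil
}

// Control delegates the control stream to the owning worker; Data, State, and
// Logging would follow the same shape.
func (mx *MultiplexW) Control(stream fnpb.BeamFnControl_ControlServer) error {
	w, err := mx.workerFor(stream.Context())
	if err != nil {
		return err
	}
	return w.Control(stream)
}
```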
Workers would need to be unregistered on job termination to keep things tidy, but that can be handled via context cancellation on the job's Root context (the RootCtx field).
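That cleanup could be as small as a goroutine watching the root context (sketch only; WorkerPool is the hypothetical interface above):

```go
// watchJob unregisters a job's workers once its RootCtx is cancelled, which
// happens on job termination.
func watchJob(root context.Context, jobID string, pool WorkerPool) {
	go func() {
		<-root.Done()
		pool.UnregisterWorkers(jobID)
	}()
}
```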
Aside: apparently it is possible in Go to also serve the web pages on the same port: https://stackoverflow.com/questions/63668447/why-grpc-go-can-run-grpc-server-and-http-server-at-the-same-address-and-port-bu. Might be worthwhile to avoid spending ports.
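For reference, the usual shape of that pattern (a sketch; whether prism wants this is open), using golang.org/x/net/http2/h2c so the GRPC side still gets HTTP/2 without TLS:

```go
package prism

import (
	"net/http"
	"strings"

	"golang.org/x/net/http2"
	"golang.org/x/net/http2/h2c"
	"google.golang.org/grpc"
)

// serveMixed serves GRPC and plain HTTP (e.g. status web pages) on one port by
// sniffing each request: HTTP/2 with a grpc content type goes to the GRPC
// server, everything else goes to the web handler.
func serveMixed(addr string, g *grpc.Server, web http.Handler) error {
	h := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.ProtoMajor == 2 && strings.HasPrefix(r.Header.Get("Content-Type"), "application/grpc") {
			g.ServeHTTP(w, r)
			return
		}
		web.ServeHTTP(w, r)
	})
	srv := &http.Server{Addr: addr, Handler: h2c.NewHandler(h, &http2.Server{})}
	return srv.ListenAndServe()
}
```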
Note that this would consolidate the GRPC-internal per-worker Goroutines and structures into the single server. Each worker in a job would still have ~9 Goroutines to manage communication for that physical worker.