Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add WaitForSidecar check in Workflow SDK #1403

Open
WhitWaldo opened this issue Nov 6, 2024 · 5 comments
Open

Add WaitForSidecar check in Workflow SDK #1403

WhitWaldo opened this issue Nov 6, 2024 · 5 comments
Labels
area/workflow kind/enhancement New feature or request

Comments

@WhitWaldo
Copy link
Contributor

Describe the feature

The DaprClient offers a method that enables the developer to wait for the sidecar to be ready before starting any operations, but Dapr.Workflow doesn't have any such method.

Perhaps it's worth looking at adding it during startup both as a method exposed to developers, but also which happens internally so that any attempt to do an actor operation is preempted until the app channel is live and then allowed to proceed.

Release Note

RELEASE NOTE: UPDATE Dapr workflows wait for the sidecar to be ready before performing operations.

@WhitWaldo WhitWaldo added kind/enhancement New feature or request area/workflow labels Nov 6, 2024
@WhitWaldo
Copy link
Contributor Author

@olitomlinson Just verifying - should this just wait for sidecar readiness on /v1.0/healthz/outbound or wait for app channel as well on /v1.0/healthz?

@oising
Copy link

oising commented Nov 6, 2024

I implore you to revisit what consitudes a "ready" sidecar. The sidecar should really be using early parsing information from components to figure out what services comprise the readiness check. i.e. is there an actorStateStore: true configured, is there an app port configured, and then wait for the actor runtime to be ready, and for the app port probe to complete (for example.) Without this, race conditions abound and the waitforsidecarasync call is meaningless.

@WhitWaldo
Copy link
Contributor Author

That might be more of a larger ask from the Dapr runtime than what the SDK itself has access to. We've got the Health API with just shows the sidecar is up and running (or not) and that the app channel is set up.

I believe the app channel on the Health API only returns true if the app port is accessible (catching app port configured and waiting for the app port probe to complete). It'd be nice if it'd also fail to return true if the actor state store isn't configured (as I think the only services that rely on the app channel are actors and workflows), but it should ideally catch if actors aren't configured right or the actor state store not configured on the outbound check).

So while I get that it'd be neat for each separate service to potentially report liveness checks, I think most of them would be duplicative to the two we already have. I think as far as the SDK goes, we've got all the information we need.

It'd be nice if the health checks would report precisely what isn't validated as that could simplify debugging (e.g. the error you shared in Discord: The state store is not configured to use the actor runtime. Have you set the - name: actorStateStore value : "true" in your state store component file? instead of just in the runtime logs, but I don't know that there's necessarily a race condition involved by the SDKs waiting on app channel (where necessary) and outbound to indicate readiness.

@oising
Copy link

oising commented Nov 6, 2024

There's 100% a race condition whereby you can cause workflows and actor invocations to fail, as the sidecar initializes in parallel to the app, and the WaitForSidecarAsync does NOT act as a readiness check. It's more of a liveness check, to use k8s parlance.

@WhitWaldo
Copy link
Contributor Author

@oising Just following up here after having had a discussion with the Dapr runtime developers about it - the goal following the release of 1.15 is this should not be "handled" by the SDK at all. Rather, all requests to the runtime from the SDK will be blocked and queued until the runtime itself has determined readiness at which point it'll suddenly get responsive and chatty.

As such, don't expect any changes to the SDK behavior here as we'll plan to lean entirely on that effort.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/workflow kind/enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants