-
Notifications
You must be signed in to change notification settings - Fork 582
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prevent potentially misleading error message about "no heartbeat" #2180
base: main
Are you sure you want to change the base?
Conversation
In the context of github.com/os-autoinst/openQA we observed error messages coming from Mojo::Server::Prefork when otherwise there is no harmful effect observed. As the underlying prefork module should only inform about a potentially erroneous situation but can not know about the effect on any component that uses the prefork module the log message instead should not claim an error but rather only inform. Hence this commit reduces the log level from error to info accordingly. Related progress issue: https://progress.opensuse.org/issues/138536
Is there a reason it's not possible for y'all to extended the timeout it with the attribute "heartbeat_timeout" in Mojo::Server::Prefork since your application requires it? |
yes, we already use 900s. As far as I understood the evaluation in https://progress.opensuse.org/issues/106759 and https://progress.opensuse.org/issues/138536 regarding how this is used within the context of github.com/os-autoinst/openQA/ we will run into the timeout regardless. Probably something in the implementation is sub-optimal but anyway I still think the message with a log level of "error" is too strict |
Why |
I would say same reasoning because I think this component can not know if the condition is harmful to the caller but if you guys prefer "warn" that would also help us. |
Am I misunderstanding what the log message means? Something blocked the worker for more than the heartbeat timeout and the supervisor has decided to kill that worker, causing anybody with a pending request to that worker to never receive any more bytes in response... Is that the situation this message warns about? |
Yes, it's also in the FAQ. https://docs.mojolicious.org/Mojolicious/Guides/FAQ#What-does-Worker-31842-has-no-heartbeat-50-seconds-restarting-mean |
@kraih Cool, though I don't see how "I'm about to TERM a blocking worker" could ever be part of normal/safe operation? Is there any way for the dying worker to log the requests it was stuck on when the TERM arrived? The OpenQA ticket looks like it would be a much shorter investigation if that info were in the logs... Also, would it be worth suggesting moving the blocking code into a |
You're welcome to try and come up with a proposal for that. |
In the context of
github.com/os-autoinst/openQA
we observed error messages coming from Mojo::Server::Prefork
when otherwise there is no harmful effect observed. As the underlying
prefork module should only inform about a potentially erroneous
situation but can not know about the effect on any component that uses
the prefork module the log message instead should not claim an error but
rather only inform. Hence this commit reduces the log level from error
to info accordingly.
Related progress issue: https://progress.opensuse.org/issues/138536