[Question] Who will update the NodeCondition to "False" after the problem recovery #936

Open
Congrool opened this issue Aug 10, 2024 · 2 comments

Congrool commented Aug 10, 2024

Hi, I'm using the system-log-monitor to detect node problems and update the corresponding NodeCondition by matching logs in /dev/kmsg against specific patterns.

I noticed that NPD sets the NodeCondition status to True when it sees the target log. But after digging into the systemlogmonitor code, I couldn't find anything that resets the NodeCondition to False after recovery.

What is the best practice here: who is responsible for setting the NodeCondition back to False after recovery? Is that the job of the remedy system?

Congrool changed the title from `[QuestionWho will update the NodeCondition to "False"` to `[Question] Who will update the NodeCondition to "False" after the problem recovery` on Aug 10, 2024

severloh commented Nov 5, 2024

I also couldn't find this feature. Currently, my solution is to write a program to reset the status of the nodes manually.

googs1025 (Contributor) commented Dec 26, 2024

IIUC, this needs to be done manually. We currently use a custom script to do this internally. NPD is an inspection system: it is responsible for reporting problems, not for recovering from them. Both the recovery itself and resetting the condition status have to be done manually.
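
For illustration only, here is a minimal sketch of what such a reset script might build. It is not the script mentioned above and not part of NPD. It assumes the condition is reset by patching the node's `status.conditions`, and the helper name `build_condition_reset_patch` as well as the `KernelDeadlock` / `KernelHasNoDeadlock` values are placeholders picked for the example.

```python
import json
from datetime import datetime, timezone


def build_condition_reset_patch(condition_type, reason, message, now=None):
    """Build a patch body that sets one NodeCondition back to "False".

    Hypothetical helper: the name and arguments are assumptions made for
    this sketch, not an NPD or Kubernetes API.
    """
    # Normalize to UTC so the trailing "Z" in the timestamp is accurate.
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    timestamp = now.strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "status": {
            "conditions": [
                {
                    # Conditions are identified by "type", so only the
                    # named condition is targeted by this entry.
                    "type": condition_type,
                    "status": "False",
                    "reason": reason,
                    "message": message,
                    "lastHeartbeatTime": timestamp,
                    "lastTransitionTime": timestamp,
                }
            ]
        }
    }


# Example values are placeholders; use the condition type your rule sets.
patch = build_condition_reset_patch(
    "KernelDeadlock",
    "KernelHasNoDeadlock",
    "kernel has no deadlock",
    now=datetime(2024, 12, 26, tzinfo=timezone.utc),
)
print(json.dumps(patch, indent=2))
```

The resulting body would then be sent to the node's status subresource, for example with `CoreV1Api().patch_node_status(node_name, patch)` from the Kubernetes Python client. One caveat to verify in your own setup: a running NPD may re-report the condition from its in-memory state on its next sync, so the reset might not stick unless NPD is restarted as well.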
