[Feature] Emit metrics of cluster creation and other related metrics for observability #2681

Open
2 tasks done
ujjawal-khare-27 opened this issue Dec 23, 2024 · 0 comments
Labels
enhancement (New feature or request), triage

Search before asking

  • I searched the issues and found no similar feature request.

Description

In production, Ray cluster creation often has to meet strict SLAs. Unexpected delays can come from several sources, such as slow image pulls, slow node creation, or suboptimal KubeRay settings. Today, tracking cluster creation time requires manual monitoring, which is cumbersome and can miss critical events and edge cases. Emitting metrics for cluster creation and related events would allow this monitoring to be automated, improving reliability and efficiency.
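
To make the request more concrete, here is a minimal sketch of the kind of metric this could produce: a histogram of cluster provisioning time, rendered in the Prometheus text exposition format. Everything named here is an assumption for illustration only. The metric name `ray_cluster_provision_seconds`, the bucket bounds, and the helpers `ProvisionHistogram` and `provision_seconds` are hypothetical and are not existing KubeRay APIs. Python is used only to keep the sketch short; an actual implementation would presumably live in the operator itself.

```python
from bisect import bisect_left
from datetime import datetime, timezone

# Hypothetical bucket upper bounds in seconds; tune to the SLA being tracked.
BUCKETS = (30.0, 60.0, 120.0, 300.0, 600.0)


class ProvisionHistogram:
    """Histogram of cluster provisioning durations, in seconds."""

    def __init__(self, name, buckets=BUCKETS):
        self.name = name
        self.buckets = tuple(sorted(buckets))
        # One counter per finite bucket, plus a final slot for +Inf.
        self.counts = [0] * (len(self.buckets) + 1)
        self.total = 0.0

    def observe(self, seconds):
        # bisect_left picks the first bucket whose upper bound is >= seconds.
        self.counts[bisect_left(self.buckets, seconds)] += 1
        self.total += seconds

    def render(self):
        """Return the histogram in Prometheus text exposition format."""
        lines = [f"# TYPE {self.name} histogram"]
        running = 0  # Prometheus buckets are cumulative.
        for bound, count in zip(self.buckets, self.counts):
            running += count
            lines.append(f'{self.name}_bucket{{le="{bound}"}} {running}')
        running += self.counts[-1]
        lines.append(f'{self.name}_bucket{{le="+Inf"}} {running}')
        lines.append(f"{self.name}_sum {self.total}")
        lines.append(f"{self.name}_count {running}")
        return "\n".join(lines)


def provision_seconds(created_at, ready_at):
    """Seconds between two UTC timestamps such as 2024-12-23T10:00:00Z."""
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    start = datetime.strptime(created_at, fmt).replace(tzinfo=timezone.utc)
    end = datetime.strptime(ready_at, fmt).replace(tzinfo=timezone.utc)
    return (end - start).total_seconds()


if __name__ == "__main__":
    hist = ProvisionHistogram("ray_cluster_provision_seconds")
    # Assumed inputs: when the cluster object was created and when it became ready.
    hist.observe(provision_seconds("2024-12-23T10:00:00Z", "2024-12-23T10:01:35Z"))
    print(hist.render())
```

With a histogram like this, an alert on SLA breaches becomes a query over the bucket counts instead of a manual check. Separate metrics for image pull time and node creation time could follow the same pattern.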

Use case

Deeper insights and observability

Related issues

No response

Are you willing to submit a PR?

  • Yes, I am willing to submit a PR!
ujjawal-khare-27 added the enhancement and triage labels on Dec 23, 2024