Search before asking
I searched the issues and found no similar issues.
KubeRay Component
ray-operator
What happened + What you expected to happen
I deployed KubeRay v1.2.2 with three ray-operator replicas and --enable-leader-election turned on. Monitoring shows that when a large number of RayJobs are submitted at the same time, the ray-operator queue becomes blocked, which affects other RayClusters.
I want to confirm whether ray-operator is stateless. If I do not set --enable-leader-election and run more replicas, would that alleviate the problem?
Reproduction script
Submit 100 RayJobs to KubeRay at the same time with a script.
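The original script was not attached. A minimal sketch of such a script might look like the following. It only generates 100 RayJob manifests and prints them as one JSON list. The job names (`load-test-NNN`), the namespace, the image `rayproject/ray:2.9.0`, the Ray version, and the entrypoint are all placeholder assumptions and not the values used in this report.

```python
import json
import sys


def build_rayjob(index, namespace="default", image="rayproject/ray:2.9.0"):
    """Return a minimal RayJob manifest as a dict.

    The name pattern, namespace, image, and entrypoint are placeholders
    chosen for illustration; adjust them to match the real workload.
    """
    return {
        "apiVersion": "ray.io/v1",
        "kind": "RayJob",
        "metadata": {"name": f"load-test-{index:03d}", "namespace": namespace},
        "spec": {
            # Trivial entrypoint so each job finishes quickly.
            "entrypoint": "python -c \"import ray; ray.init(); print('ok')\"",
            "shutdownAfterJobFinishes": True,
            "rayClusterSpec": {
                "rayVersion": "2.9.0",  # assumed; should match the image
                "headGroupSpec": {
                    "rayStartParams": {},
                    "template": {
                        "spec": {
                            "containers": [{"name": "ray-head", "image": image}]
                        }
                    },
                },
            },
        },
    }


def build_batch(count=100):
    """Wrap `count` RayJob manifests in a single v1 List object."""
    return {
        "apiVersion": "v1",
        "kind": "List",
        "items": [build_rayjob(i) for i in range(count)],
    }


if __name__ == "__main__":
    # Print the whole batch as JSON so it can be piped into kubectl.
    json.dump(build_batch(100), sys.stdout, indent=2)
```

If saved as a hypothetical `gen_rayjobs.py`, the output could be piped into `kubectl apply -f -` to create all the jobs in one call, which approximates submitting them at the same time.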
Anything else
No response
Are you willing to submit a PR?
Yes I am willing to submit a PR!
Can you benchmark the queuing delay with 100 RayClusters (no RayJob) vs. 100 RayJobs? If 100 RayClusters are much faster than 100 RayJobs, I think I can locate the root cause and provide a fix.
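One rough way to put numbers on such a comparison, offered only as a sketch: collect a pair of timestamps per resource and summarize the differences. Which two timestamps to compare is an assumption here (for example, the resource's creation time and the time it was first observed as reconciled), and the helper names `parse_ts` and `summarize_delays` are hypothetical.

```python
from datetime import datetime, timezone
from statistics import median


def parse_ts(value):
    """Parse a Kubernetes-style UTC timestamp such as 2024-01-01T00:00:05Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def summarize_delays(pairs):
    """Summarize delays in seconds for (created, reconciled) timestamp pairs."""
    delays = sorted(
        (parse_ts(done) - parse_ts(created)).total_seconds()
        for created, done in pairs
    )
    return {"count": len(delays), "median_s": median(delays), "max_s": delays[-1]}
```

Running the same summary over the RayCluster batch and the RayJob batch would give a like-for-like view of median and worst-case delay, although it does not isolate time spent in the operator's work queue from other reconciliation work.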
@kevin85421 I tested 256 RayClusters (no RayJob) vs. 100 RayJobs. Indeed, the latency for RayClusters is much better than for RayJobs.
[Screenshots: RayJob, RayCluster]
But I still want to know whether adding more ray-operator replicas scales linearly. I think it is difficult to draw reliable conclusions from black-box testing alone.