Set the `--num-cpus` rayStartParam and/or the CPU resource limit for the Ray container.
The Ray head is ready. Starting the autoscaler.
Traceback (most recent call last):
File "/home/ray/anaconda3/bin/ray", line 8, in <module>
sys.exit(main())
File "/home/ray/anaconda3/lib/python3.9/site-packages/ray/scripts/scripts.py", line 2614, in main
return cli()
File "/home/ray/anaconda3/lib/python3.9/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/home/ray/anaconda3/lib/python3.9/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
File "/home/ray/anaconda3/lib/python3.9/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/ray/anaconda3/lib/python3.9/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/ray/anaconda3/lib/python3.9/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/home/ray/anaconda3/lib/python3.9/site-packages/ray/scripts/scripts.py", line 2342, in kuberay_autoscaler
run_kuberay_autoscaler(cluster_name, cluster_namespace)
File "/home/ray/anaconda3/lib/python3.9/site-packages/ray/autoscaler/_private/kuberay/run_autoscaler.py", line 76, in run_kuberay_autoscaler
Monitor(
File "/home/ray/anaconda3/lib/python3.9/site-packages/ray/autoscaler/_private/monitor.py", line 583, in run
self._initialize_autoscaler()
File "/home/ray/anaconda3/lib/python3.9/site-packages/ray/autoscaler/_private/monitor.py", line 231, in _initialize_autoscaler
self.autoscaler = StandardAutoscaler(
File "/home/ray/anaconda3/lib/python3.9/site-packages/ray/autoscaler/_private/autoscaler.py", line 251, in __init__
self.reset(errors_fatal=True)
File "/home/ray/anaconda3/lib/python3.9/site-packages/ray/autoscaler/_private/autoscaler.py", line 1122, in reset
raise e
File "/home/ray/anaconda3/lib/python3.9/site-packages/ray/autoscaler/_private/autoscaler.py", line 1035, in reset
new_config = self.config_reader()
File "/home/ray/anaconda3/lib/python3.9/site-packages/ray/autoscaler/_private/kuberay/autoscaling_config.py", line 59, in __call__
autoscaling_config = _derive_autoscaling_config_from_ray_cr(ray_cr)
File "/home/ray/anaconda3/lib/python3.9/site-packages/ray/autoscaler/_private/kuberay/autoscaling_config.py", line 96, in _derive_autoscaling_config_from_ray_cr
available_node_types = _generate_available_node_types_from_ray_cr_spec(
File "/home/ray/anaconda3/lib/python3.9/site-packages/ray/autoscaler/_private/kuberay/autoscaling_config.py", line 195, in _generate_available_node_types_from_ray_cr_spec
_HEAD_GROUP_NAME: _node_type_from_group_spec(headGroupSpec, is_head=True),
File "/home/ray/anaconda3/lib/python3.9/site-packages/ray/autoscaler/_private/kuberay/autoscaling_config.py", line 217, in _node_type_from_group_spec
resources = _get_ray_resources_from_group_spec(group_spec, is_head)
File "/home/ray/anaconda3/lib/python3.9/site-packages/ray/autoscaler/_private/kuberay/autoscaling_config.py", line 249, in _get_ray_resources_from_group_spec
num_cpus = _get_num_cpus(ray_start_params, k8s_resource_limits, group_name)
File "/home/ray/anaconda3/lib/python3.9/site-packages/ray/autoscaler/_private/kuberay/autoscaling_config.py", line 316, in _get_num_cpus
raise ValueError(
ValueError: Autoscaler failed to detect `CPU` resources for group head-group.
Set the `--num-cpus` rayStartParam and/or the CPU resource limit for the Ray container.
Reproduction script
Create a RayCluster with autoscaling enabled and without setting `resources.limits` on the Ray head container.
Anything else
No response
Are you willing to submit a PR?
Yes I am willing to submit a PR!
I believe #2365 should fix this, since we will set `--num-cpus` based on requests when limits are not set. However, this will only be available starting from KubeRay v1.3. In the meantime, you need to set `num-cpus` in `rayStartParams` to match what you specified in the CPU requests.
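To illustrate why the workaround helps, here is a simplified Python sketch of the lookup order that the error message and traceback suggest: an explicit `num-cpus` start parameter first, then the container CPU limit, and otherwise a `ValueError`. It is not the actual Ray source. The names `parse_cpu_quantity`, `detect_num_cpus`, and `with_num_cpus_from_requests` are hypothetical, and the group spec is modeled as a plain dict purely for illustration.

```python
import math


def parse_cpu_quantity(quantity: str) -> int:
    """Round a Kubernetes CPU quantity such as "500m" or "2" up to whole CPUs."""
    if quantity.endswith("m"):
        return math.ceil(int(quantity[:-1]) / 1000)
    return math.ceil(float(quantity))


def detect_num_cpus(ray_start_params: dict, resource_limits: dict, group_name: str) -> int:
    """Hypothetical stand-in for the autoscaler's CPU detection."""
    # An explicit num-cpus start parameter takes priority.
    if "num-cpus" in ray_start_params:
        return int(ray_start_params["num-cpus"])
    # Otherwise fall back to the container's CPU limit.
    if "cpu" in resource_limits:
        return parse_cpu_quantity(resource_limits["cpu"])
    # Neither is set: this is the failure reported in this issue.
    raise ValueError(
        f"Autoscaler failed to detect `CPU` resources for group {group_name}."
    )


def with_num_cpus_from_requests(group: dict) -> dict:
    """Workaround: copy the CPU request into rayStartParams as num-cpus."""
    cpu_request = group["resources"]["requests"]["cpu"]
    params = dict(group["rayStartParams"])
    # rayStartParams values are strings in the RayCluster spec.
    params["num-cpus"] = str(parse_cpu_quantity(cpu_request))
    return {**group, "rayStartParams": params}


# A head group with CPU requests but no limits, as in the reproduction.
head_group = {
    "rayStartParams": {},
    "resources": {"requests": {"cpu": "2"}},
}

fixed = with_num_cpus_from_requests(head_group)
num_cpus = detect_num_cpus(
    fixed["rayStartParams"], fixed["resources"].get("limits", {}), "head-group"
)
```

In a real RayCluster manifest, the equivalent change is adding `num-cpus` under the head group's `rayStartParams`, with a value matching the CPU request of the Ray container.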
Search before asking
KubeRay Component
ray-operator
What happened + What you expected to happen
ray head logs:
autoscaler log: