DRA: attach devices to nodes #5007
/assign @KobayashiD27

As discussed in kubernetes/kubernetes#124042 (comment).

/sig scheduling
@pohly: GitHub didn't allow me to assign the following users: KobayashiD27.

Note that only kubernetes members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Thank you for creating the issue. I will post a draft KEP as soon as possible.
To facilitate the discussion on the KEP, we would like to share the design of the composable controller we are considering as a component utilizing the fabric-oriented scheduler function. By sharing this, we believe we can deepen the discussion on the optimal implementation of the scheduler function. Additionally, we would like to verify whether the controller design matches the DRA design.

Background

Our controller's philosophy is to efficiently utilize fabric devices. Therefore, we prefer to allocate devices directly connected to the node over attached fabric devices (e.g., node-local devices > attached fabric devices > pre-attached fabric devices).

Design Overview

This design aims to efficiently utilize fabric devices, prioritizing node-local devices to improve performance. The composable controller manages fabric devices that can be attached and detached. Therefore, it publishes the list of fabric devices as ResourceSlices. The structure we are considering is as follows:

```yaml
# composable controller publishes this pool
kind: ResourceSlice
pool: composable-device
driver: gpu.nvidia.com
nodeSelector: fabric1
devices:
- name: device1
...
- name: device2
...
```

The vendor's DRA kubelet plugin will also publish the devices managed by the vendor as ResourceSlices:

```yaml
# vendor DRA kubelet plugin publishes this pool
kind: ResourceSlice
pool: Node1
driver: gpu.nvidia.com
nodeName: Node1
devices:
- name: device3
...
```
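For reference, the pool examples in this comment are shorthand. In the actual resource.k8s.io API, these fields are nested under spec, with the pool expressed as a structured object. Below is a minimal sketch of how the composable-device pool above might look, assuming the v1beta1 ResourceSlice schema; the metadata name, node label, and device attributes are invented for illustration:

```yaml
# Sketch only: the shorthand "composable-device" pool mapped onto a
# resource.k8s.io/v1beta1 ResourceSlice. The label key and attribute names
# are assumptions, not part of the original proposal.
apiVersion: resource.k8s.io/v1beta1
kind: ResourceSlice
metadata:
  name: composable-device-0          # hypothetical slice name
spec:
  driver: gpu.nvidia.com
  pool:
    name: composable-device
    generation: 1
    resourceSliceCount: 1
  nodeSelector:                      # fabric devices are reachable from several nodes
    nodeSelectorTerms:
    - matchExpressions:
      - key: example.com/fabric      # hypothetical node label identifying fabric1
        operator: In
        values: ["fabric1"]
  devices:
  - name: device1
    basic:
      attributes:
        productName:                 # illustrative attribute
          string: "GPU"
  - name: device2
    basic:
      attributes:
        productName:
          string: "GPU"
```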
Here, when the scheduler selects a fabric device (for example, device1) for a pod, the composable controller attaches that device to the pod's node. We are considering the following two methods for handling ResourceSlices upon completion of the attachment. We would like to hear your opinions on these two composable controller proposals and on their feasibility.

Proposal 1: The composable controller publishes ResourceSlices with NodeName set within the pool

Multiple ResourceSlices are published with the same pool name. One indicates the devices included in the fabric, and the other indicates the devices attached to the node.

```yaml
# composable controller publishes this pool
kind: ResourceSlice
pool: composable-device
driver: gpu.nvidia.com
nodeSelector: fabric1
devices:
- name: device2
...
---
kind: ResourceSlice
pool: composable-device
driver: gpu.nvidia.com
nodeName: Node1
devices:
- name: device1
...
```

If the vendor's plugin supports hotplug, it will detect the newly attached device1 and also publish it in its own pool:

```yaml
# vendor DRA kubelet plugin publishes this pool
kind: ResourceSlice
pool: Node1
driver: gpu.nvidia.com
nodeName: Node1
devices:
- name: device3
...
- name: device1
...
```

This may cause device duplication issues between ResourceSlices: device1 now appears in both the composable-device pool and the vendor's Node1 pool. To prevent multiple ResourceSlices from publishing duplicate devices, we plan to define a deny list and standardize it with DRA (a hedged sketch of such a deny list follows below).

Advantages
Disadvantages
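To make the deny-list idea concrete, here is a hedged sketch of what such an object could look like. No such API exists in DRA today; the API group, kind, and every field name below are assumptions made purely for illustration:

```yaml
# Hypothetical deny list (not an existing DRA API). The intent sketched here:
# a device that appears in more than one ResourceSlice is marked as denied in
# all but one pool, so the scheduler never allocates the same physical device
# twice.
apiVersion: example.com/v1alpha1     # hypothetical API group/version
kind: DeviceDenyList                 # hypothetical kind
metadata:
  name: fabric1-denylist
spec:
  entries:
  - driver: gpu.nvidia.com
    pool: Node1        # deny device1 in one of the duplicating pools;
    device: device1    # which pool stays authoritative is an open design question
```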
Proposal 2: Attached devices are published by the vendor's plugin

In this case, devices are removed from the composable-device pool.

```yaml
# composable controller publishes this pool
kind: ResourceSlice
pool: composable-device
driver: gpu.nvidia.com
nodeSelector: fabric1
devices:
- name: device2
...
```

If the vendor's plugin supports hotplug, it will publish the attached device1 in its own pool:

```yaml
# vendor DRA kubelet plugin publishes this pool
kind: ResourceSlice
pool: Node1
driver: gpu.nvidia.com
nodeName: Node1
devices:
- name: device3
...
- name: device1
...
```

This breaks the linkage between the ResourceClaim and the ResourceSlice: the allocation recorded in the claim still references device1 in the composable-device pool, while the device is now published in the vendor's Node1 pool. Therefore, it is necessary to modify the AllocationResult of the ResourceClaim (a sketch of the affected fields follows below).

Advantages
Disadvantages
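To illustrate the linkage that breaks, here is a hedged sketch of the allocation a ResourceClaim carries after scheduling, assuming the v1beta1 DRA API; the claim and request names are invented for the example:

```yaml
# Sketch only: the scheduler records the chosen device by (driver, pool, device)
# in the claim status. If device1 is later republished under the vendor's Node1
# pool and removed from composable-device, this reference goes stale unless the
# AllocationResult is rewritten, which is what Proposal 2 requires.
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaim
metadata:
  name: gpu-claim                    # hypothetical claim name
status:
  allocation:
    devices:
      results:
      - request: gpu                 # hypothetical request name from spec.devices.requests
        driver: gpu.nvidia.com
        pool: composable-device      # the pool at allocation time
        device: device1
```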
We would appreciate your feedback and insights on these proposals to ensure the optimal implementation of the scheduler function and alignment with the DRA design.
Let's keep the discussion in this issue shorter. You can now put all of this, including the alternatives, into the KEP document.
Enhancement Description

/sig scheduling
/wg device-management

- KEP (k/enhancements) update PR(s):
- Code (k/k) update PR(s):
- Docs (k/website) update PR(s):

Please keep this description up to date. This will help the Enhancement Team to track the evolution of the enhancement efficiently.