Add optimum.quanto as supported load-time quantization_config #10328
Recent additions to diffusers added `BitsAndBytesConfig` as well as `TorchAoConfig` options that can be used as `quantization_config` when loading model components using `from_pretrained`, for example:
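A minimal sketch of the existing load-time path, assuming the Flux transformer checkpoint purely as an illustration:

```python
import torch
from diffusers import BitsAndBytesConfig, FluxTransformer2DModel

# quantization is applied automatically while the weights are being loaded
quant_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)
```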
The ask is to also support Hugging Face's own Optimum Quanto. Right now it is possible to use it, but only as post-load, on-demand quantization; there is no option to use it like BnB or TorchAO and have quantization applied automatically during load itself, as sketched below.
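A sketch of the current post-load approach, again assuming the Flux transformer as the example model:

```python
import torch
from diffusers import FluxTransformer2DModel
from optimum.quanto import freeze, qint8, quantize

# the full-precision weights must be loaded first...
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    torch_dtype=torch.bfloat16,
)
# ...and quantization can only be applied afterwards, on demand
quantize(transformer, weights=qint8)
freeze(transformer)
```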
@yiyixuxu @sayakpaul @DN6 @asomoza