Hey! We are currently looking into other quantization approaches, both to improve inference speed and LM quality. How good is exl2's 2.4 quantization? 2.4 bits per parameter sounds like it would hurt perplexity quite a bit. Could you provide any links, so we can look into it?
@dvmazurm I made this example for you https://gist.github.com/eramax/b6fc0b472372037648df7f0019ab0e78
One note: a Colab T4 with 15 GB of VRAM is not enough for the context of Mixtral-8x7B. If it had 16 GB it would work fine, since we need some VRAM for the context besides the model, and the 2.4-bpw model loads in about 14.7 GB.
Using exl2 2.4 you can run Mixtral on Colab. Did you give it a try?
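For reference, here is a minimal sketch of loading a 2.4-bpw exl2 quant with the exllamav2 library, roughly following its example scripts. The model directory, context length, and sampling settings are placeholders, and the exact API may vary between exllamav2 versions:

```python
# Minimal sketch: assumes exllamav2 is installed and a 2.4-bpw exl2 quant of
# Mixtral-8x7B is already downloaded to MODEL_DIR (placeholder path).
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

MODEL_DIR = "/content/Mixtral-8x7B-exl2-2.4bpw"  # placeholder

config = ExLlamaV2Config()
config.model_dir = MODEL_DIR
config.prepare()
# Keep the context modest so the KV cache fits next to the ~14.7 GB of weights on a 15 GB T4.
config.max_seq_len = 4096

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache)  # load weights, allocating the cache as layers are placed

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.7  # placeholder sampling settings

print(generator.generate_simple("Mixtral on a single T4:", settings, 128))
```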