Hey! We are currently looking into other quantization approaches, both to improve inference speed and LM quality. How good is exl2's 2.4 quantization? 2.4 bits per parameter sounds like it would hurt perplexity quite a bit. Could you provide any links, so we can look into it?
@dvmazurm I made this example for you https://gist.github.com/eramax/b6fc0b472372037648df7f0019ab0e78
One note: a Colab T4 with 15 GB of VRAM is not enough for the context of Mixtral-8x7B. If it had 16 GB it would work fine, since we need some VRAM for the context besides the model, and the 2.4-bpw model loads in about 14.7 GB.
Using exl2 2.4 you can run Mixtral on Colab. Did you give it a try?
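For reference, here is a minimal sketch of loading a 2.4-bpw exl2 quant with the exllamav2 library, roughly following its example scripts. The model directory, context length, and sampling settings are placeholders, and the exact API may vary between exllamav2 versions:

```python
# Minimal sketch: assumes exllamav2 is installed and a 2.4-bpw exl2 quant of
# Mixtral-8x7B is already downloaded to MODEL_DIR (placeholder path).
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

MODEL_DIR = "/content/Mixtral-8x7B-exl2-2.4bpw"  # placeholder

config = ExLlamaV2Config()
config.model_dir = MODEL_DIR
config.prepare()
# Keep the context modest so the KV cache fits next to the ~14.7 GB of weights on a 15 GB T4.
config.max_seq_len = 4096

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache)  # load weights, allocating the cache as layers are placed

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.7  # placeholder sampling settings

print(generator.generate_simple("Mixtral on a single T4:", settings, 128))
```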