Is it possible to finetune this on a custom dataset? #17
Hi! Full fine-tuning won't work because the model is quantized, but you could try fine-tuning it with PEFT techniques that support quantized base models. Check out QLoRA, for example. Hope this is helpful.
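For reference, a minimal sketch of what the suggested PEFT/QLoRA route looks like on a standard bitsandbytes-quantized checkpoint (not the offloaded model from this repo). The LoRA hyperparameters and target_modules below are illustrative assumptions, not a recipe from this thread:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load the base model in 4-bit (regular transformers + bitsandbytes path).
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mixtral-8x7B-v0.1",
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# Attach LoRA adapters to the attention projections only.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()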
The structure of the loaded model is:
When I try to train with
I get that peft is not compatible with HQQLinearTritonSavable, evidently:

ValueError: Target module HQQLinearTritonSavable() is not supported. Currently, only the following modules are supported:
Hey, @nmarafo and @complete-dope! It looks like using huggingface's peft for fine-tuning the offloaded model is a bit tricky (mostly due to the custom layers), but I haven't looked into it myself. A LoRA fine-tuning setup similar to the original paper can be hacked together quite simply:

import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers.models.mixtral.modeling_mixtral import MixtralAttention


class LoRALayer(nn.Module):
    def __init__(self, module: nn.Linear, rank: int):
        super().__init__()
        self.module = module  # the frozen base projection
        # A is initialized like a regular linear weight; B starts at zero so the
        # adapter is a no-op at the start of training.
        self.adapter_A = nn.Parameter(torch.empty(module.in_features, rank, device=module.weight.device))
        nn.init.kaiming_uniform_(self.adapter_A, a=5 ** 0.5)
        self.adapter_B = nn.Parameter(torch.zeros(rank, module.out_features, device=module.weight.device))

    def forward(self, input):
        bottleneck = F.linear(input, self.adapter_A.T)
        residual = F.linear(bottleneck, self.adapter_B.T)
        return self.module(input) + residual


def custom_get_peft_model(model, rank):
    for _, module in model.named_modules():
        if not isinstance(module, MixtralAttention):
            continue
        module.q_proj = LoRALayer(module.q_proj, rank)
        # TODO: {k, v, o}_proj
    return model

Note that this example only applies LoRA to the attention parameters. Doing the same for the expert layers is trickier, as it might break the ExpertCache (haven't looked into that myself yet).
Thank you very much for the answer. Sorry for my inexperience, I'm trying to implement it like this:
and I get this error:
Perhaps it is solved with this:
I'm not sure whether that solves it, but you can get the projection shapes from the model config:

from transformers import AutoConfig

config = AutoConfig.from_pretrained("mistralai/Mixtral-8x7B-v0.1")
head_dim = config.hidden_size // config.num_attention_heads

# (in_features, out_features)
q_proj_shape = (config.hidden_size, config.num_attention_heads * head_dim)
k_proj_shape = (config.hidden_size, config.num_key_value_heads * head_dim)
v_proj_shape = (config.hidden_size, config.num_key_value_heads * head_dim)
o_proj_shape = (config.num_attention_heads * head_dim, config.hidden_size)

Haven't checked whether these shapes are correct, but they should be. If this snippet doesn't work, you could try reconstructing the original shapes from here.
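If the error comes from the wrapped quantized layers not exposing in_features and out_features (which is a guess), one untested workaround is a variant of the LoRALayer above that takes the shapes computed from the config explicitly. Everything below, including the ShapedLoRALayer name, is an assumption for illustration, not part of the repo:

import torch
import torch.nn as nn
import torch.nn.functional as F

class ShapedLoRALayer(nn.Module):
    def __init__(self, module: nn.Module, in_features: int, out_features: int,
                 rank: int, device="cuda"):
        super().__init__()
        self.module = module  # the original (quantized) projection, kept frozen
        self.adapter_A = nn.Parameter(torch.empty(in_features, rank, device=device))
        nn.init.kaiming_uniform_(self.adapter_A, a=5 ** 0.5)
        self.adapter_B = nn.Parameter(torch.zeros(rank, out_features, device=device))

    def forward(self, input):
        bottleneck = F.linear(input, self.adapter_A.T)    # (..., rank)
        residual = F.linear(bottleneck, self.adapter_B.T)  # (..., out_features)
        return self.module(input) + residual

# Example: module.q_proj = ShapedLoRALayer(module.q_proj, *q_proj_shape, rank=16)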
Please, have you found any solution?
Hi there,
Just wondering, is it possible to fine-tune this model on a custom dataset? If so, are there any examples/code?
Many thanks for any help, and for this amazing model; I'm finding it works really well!