model.config.to_diff_dict() delivers different result to model.save_pretrained() #35426

Open
umarbutler opened this issue Dec 27, 2024 · 0 comments
System Info

  • transformers version: 4.48.0.dev0
  • Platform: Linux-5.15.167.4-microsoft-standard-WSL2-x86_64-with-glibc2.35
  • Python version: 3.12.5
  • Huggingface_hub version: 0.25.1
  • Safetensors version: 0.4.5
  • Accelerate version: 0.34.2
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.5.1+cu124 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?:
  • Using GPU in script?:
  • GPU type: NVIDIA GeForce RTX 4090

Who can help?

@ArthurZucker

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

I have a use case that requires model weights to always be encrypted in local storage and only ever decrypted in memory. As a result, using model.from_pretrained(dir) is not an option.

Instead, my workaround has been to do:

import msgspec
from pyfakefs.fake_filesystem_unittest import Patcher as ffspatcher
from transformers import AutoConfig, AutoModelForSequenceClassification, PreTrainedModel

weights = {...} # Deserialized to `dict` from an encrypted file elsewhere.
config = {...} # Deserialized to `dict` from an encrypted file elsewhere.

json_encoder = msgspec.json.encode

# Write the config to a fake in-memory file system so that
# `AutoConfig.from_pretrained()` can "load" it without it ever touching disk.
with ffspatcher() as patcher:
    fakepath = 'FAKE_FILE_SYSTEM://config.json'
    patcher.fs.create_file(fakepath, contents = json_encoder(config))
    config = AutoConfig.from_pretrained(fakepath)

# Build the model from the config alone, then load the decrypted weights.
model: PreTrainedModel = AutoModelForSequenceClassification.from_config(config)
model.load_state_dict(weights)

The problem I've noticed, however, is that when I serialize my config like so:

config = model.config.to_diff_dict()

The resulting config includes the key _attn_implementation_autoset set to True, whereas the model's actual config does not include that key. As a result, when I load the config with AutoConfig.from_pretrained(), it no longer uses the default attention implementation for my model (SDPA), effectively delivering a different model with different logits.
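
For reference, here is a minimal sketch of how I'm observing the discrepancy (the model name is just an example; any model whose attention implementation gets auto-set should do):

import json
import tempfile
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained('prajjwal1/bert-tiny')

# `to_diff_dict()` includes the auto-set flag...
print('_attn_implementation_autoset' in model.config.to_diff_dict())  # True on my setup

# ...but the config written by `save_pretrained()` does not.
with tempfile.TemporaryDirectory() as tmpdir:
    model.save_pretrained(tmpdir)
    with open(f'{tmpdir}/config.json') as f:
        saved_config = json.load(f)
print('_attn_implementation_autoset' in saved_config)  # False on my setup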

My current hotfix is to simply delete the key _attn_implementation_autoset from all of my configs. But is it really necessary for to_diff_dict() to add that key when save_pretrained() does not?
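
For completeness, the hotfix is just popping the key before serializing:

config_dict = model.config.to_diff_dict()
config_dict.pop('_attn_implementation_autoset', None)  # strip the stray key so the config round-trips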

Expected behavior

Saving the config with to_diff_dict() should reproducibly yield the same model as saving it with save_pretrained().

umarbutler added the bug label Dec 27, 2024