model.config.to_diff_dict() delivers different result to model.save_pretrained() #35426

Open
umarbutler opened this issue Dec 27, 2024 · 0 comments
System Info

  • transformers version: 4.48.0.dev0
  • Platform: Linux-5.15.167.4-microsoft-standard-WSL2-x86_64-with-glibc2.35
  • Python version: 3.12.5
  • Huggingface_hub version: 0.25.1
  • Safetensors version: 0.4.5
  • Accelerate version: 0.34.2
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.5.1+cu124 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?:
  • Using GPU in script?:
  • GPU type: NVIDIA GeForce RTX 4090

Who can help?

@ArthurZucker

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

I have a use case that requires model weights to always be encrypted in local storage and only ever decrypted in memory. As a result, using model.from_pretrained(dir) is not an option.

Instead, my workaround has been to do:

import msgspec
from pyfakefs.fake_filesystem_unittest import Patcher as ffspatcher
from transformers import AutoConfig, AutoModelForSequenceClassification, PreTrainedModel

weights = {...} # Deserialized to `dict` from an encrypted file elsewhere.
config = {...} # Deserialized to `dict` from an encrypted file elsewhere.

json_encoder = msgspec.json.encode

# Write the config to a fake in-memory file system so that
# `AutoConfig.from_pretrained()` can "load" it without it ever touching disk.
with ffspatcher() as patcher:
    fakepath = 'FAKE_FILE_SYSTEM://config.json'
    patcher.fs.create_file(fakepath, contents = json_encoder(config))
    config = AutoConfig.from_pretrained(fakepath)

# Build the model from the config alone, then load the decrypted weights.
model: PreTrainedModel = AutoModelForSequenceClassification.from_config(config)
model.load_state_dict(weights)

The problem I've noticed, however, is that when I serialize my config like so:

config = model.config.to_diff_dict()

The resulting config includes the key _attn_implementation_autoset set to True, whereas the model's actual config does not include that key. As a result, when I load the config with AutoConfig.from_pretrained(), it no longer uses the default attention implementation for my model (SDPA), effectively delivering a different model with different logits.
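
For reference, here is a minimal sketch of how I'm observing the discrepancy (the model name is just an example; any model whose attention implementation gets auto-set should do):

import json
import tempfile
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained('prajjwal1/bert-tiny')

# `to_diff_dict()` includes the auto-set flag...
print('_attn_implementation_autoset' in model.config.to_diff_dict())  # True on my setup

# ...but the config written by `save_pretrained()` does not.
with tempfile.TemporaryDirectory() as tmpdir:
    model.save_pretrained(tmpdir)
    with open(f'{tmpdir}/config.json') as f:
        saved_config = json.load(f)
print('_attn_implementation_autoset' in saved_config)  # False on my setup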

My current hotfix is to simply delete the key _attn_implementation_autoset from all of my configs. But is it really necessary for to_diff_dict() to add that key when save_pretrained() does not?
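
For completeness, the hotfix is just popping the key before serializing:

config_dict = model.config.to_diff_dict()
config_dict.pop('_attn_implementation_autoset', None)  # strip the stray key so the config round-trips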

Expected behavior

Saving the config with to_diff_dict() should reproducibly yield the same model as saving it with save_pretrained().

umarbutler added the bug label Dec 27, 2024