v1.78 Context size is bigger than configured #1242

Open
PeterPeet opened this issue Nov 30, 2024 · 1 comment

Comments

@PeterPeet

If I set the context size to any value (it doesn't matter whether I use the GUI or set it manually), n_ctx ends up bigger than the configured value. The difference is always 256. As far as I can tell this is new behaviour since I updated; I was on 1.76 before.

Is just the warning new, or is the context actually calculated/set the wrong way? My model is optimized for 8192, but I use a trained version of it that gives me 16384. I tried other models and they behave the same.

Here is an example for context_size: 16384, but it happens with every value I set.

Automatic RoPE Scaling: Using (scale:1.000, base:10000.0).
llama_new_context_with_model: n_seq_max = 1
llama_new_context_with_model: n_ctx = 16640
llama_new_context_with_model: n_ctx_per_seq = 16640
llama_new_context_with_model: n_batch = 512
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: flash_attn = 1
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_new_context_with_model: n_ctx_pre_seq (16640) > n_ctx_train (16384) -- possible training context overflow
llama_kv_cache_init: CUDA0 KV buffer size = 2080.00 MiB
llama_new_context_with_model: KV self size = 2080.00 MiB, K (f16): 1040.00 MiB, V (f16): 1040.00 MiB
llama_new_context_with_model: CUDA_Host output buffer size = 0.12 MiB
llama_new_context_with_model: CUDA0 compute buffer size = 108.75 MiB
llama_new_context_with_model: CUDA_Host compute buffer size = 40.51 MiB
llama_new_context_with_model: graph nodes = 903
llama_new_context_with_model: graph splits = 2

My configuration for this example is:

{"model": "", "model_param": "D:/SillyTavern/Modelle/Text-Creation/daybreak-kunoichi-2dpo-7b-q4_k_m.gguf", "port": 5001, "port_param": 5001, "host": "", "launch": false, "config": null, "threads": 7, "usecublas": ["normal", "0", "mmq"], "usevulkan": null, "useclblast": null, "usecpu": false, "contextsize": 16384, "gpulayers": 35, "tensor_split": null, "ropeconfig": [0.0, 10000.0], "blasbatchsize": 512, "blasthreads": null, "lora": null, "noshift": false, "nofastforward": false, "nommap": false, "usemlock": false, "noavx2": false, "debugmode": 0, "onready": "", "benchmark": null, "prompt": "", "promptlimit": 100, "multiuser": 1, "remotetunnel": false, "highpriority": false, "foreground": false, "preloadstory": null, "quiet": false, "ssl": null, "nocertify": false, "mmproj": null, "password": null, "ignoremissing": false, "chatcompletionsadapter": null, "flashattention": true, "quantkv": 0, "forceversion": 0, "smartcontext": false, "unpack": "", "nomodel": false, "showgui": false, "skiplauncher": false, "hordemodelname": "", "hordeworkername": "", "hordekey": "", "hordemaxctx": 0, "hordegenlen": 0, "sdmodel": "", "sdthreads": 7, "sdclamped": 0, "sdt5xxl": "", "sdclipl": "", "sdclipg": "", "sdvae": "", "sdvaeauto": false, "sdquant": false, "sdlora": "", "sdloramult": 1.0, "whispermodel": "", "hordeconfig": null, "sdconfig": null, "noblas": false}
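For what it's worth, the reported KV cache size matches the padded context (16640), not the configured 16384, so the extra 256 really is allocated. A rough sanity check, assuming Mistral-7B-class dimensions for this model (32 layers, 8 KV heads, head dim 128; these values are an assumption, not read from the GGUF):

# Rough KV-cache size check. The model dimensions are assumed
# (Mistral-7B-class: 32 layers, 8 KV heads, head dim 128), not read from the GGUF.
n_layer, n_kv_heads, head_dim = 32, 8, 128
bytes_per_value = 2  # f16 K and V, as in the log

def kv_cache_mib(n_ctx: int) -> float:
    # K and V each hold n_kv_heads * head_dim values per layer per token.
    per_token = 2 * n_layer * n_kv_heads * head_dim * bytes_per_value
    return n_ctx * per_token / (1024 * 1024)

print(kv_cache_mib(16640))  # 2080.0 -- matches "KV self size = 2080.00 MiB"
print(kv_cache_mib(16384))  # 2048.0 -- what the configured size alone would give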

@askmyteapot

It has always done that; it's just the warning that's new.
The additional 256 context is for a buffer, I believe. It won't actually impact usage, as you'd still only be requesting at most 16384 context when actually interacting with the model.

Safe to ignore in this circumstance.
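To illustrate what that means in practice (a hypothetical sketch of the behaviour described above, not koboldcpp's actual source), the backend allocates a small buffer on top of the configured context, while generation requests stay capped at the configured value:

# Hypothetical illustration of the padding behaviour described above --
# not koboldcpp's actual source code.
CTX_BUFFER = 256  # assumed buffer size, matching the observed 16640 - 16384

def backend_n_ctx(configured_ctx: int) -> int:
    # Context the backend actually allocates (what the log reports as n_ctx).
    return configured_ctx + CTX_BUFFER

def request_ctx_cap(configured_ctx: int) -> int:
    # Maximum context a generation request will actually use.
    return configured_ctx

print(backend_n_ctx(16384))    # 16640, as seen in the log
print(request_ctx_cap(16384))  # 16384, so the training context is not exceeded in practice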
