If I set the context size to any value (whether through the GUI or manually), n_ctx ends up bigger than the configured value. As far as I can tell, this behaviour is new since I updated. The difference is always 256. I used 1.76 before.
Is only the warning new, or is the value being calculated or set incorrectly? My model is optimized for 8192, but I use a trained version that gives me 16384. I tried other models and they behave the same way.
Here is an example for context_size: 16384, but it happens with every value I set.
It has always done that. It's just the warning that's new.
The additional 256 tokens of context are a buffer, I believe. It won't actually affect usage, since you'd still only be requesting a maximum of 16384 context when interacting with the model.
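To make the numbers in this report concrete, here is a minimal sketch of the arithmetic. The values come from the reporter's configuration ("contextsize") and log (n_ctx, n_ctx_train). The `would_warn` helper is hypothetical; it only mirrors the comparison printed in the warning text and is not taken from the actual source code.

```python
# Values taken from the configuration and log in this issue.
CONFIGURED_CONTEXT = 16384  # "contextsize" in the configuration
REPORTED_N_CTX = 16640      # n_ctx printed in the log
N_CTX_TRAIN = 16384         # n_ctx_train printed in the warning

# Observed difference between the reported and the configured value.
extra_tokens = REPORTED_N_CTX - CONFIGURED_CONTEXT


def would_warn(n_ctx_per_seq: int, n_ctx_train: int) -> bool:
    """Hypothetical helper: mirrors the comparison shown in the warning text."""
    return n_ctx_per_seq > n_ctx_train


print(extra_tokens)                                 # 256
print(would_warn(REPORTED_N_CTX, N_CTX_TRAIN))      # True
print(would_warn(CONFIGURED_CONTEXT, N_CTX_TRAIN))  # False
```

Under this reading, the warning fires only because the padded value exceeds the training context, not because the configured value does.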
Automatic RoPE Scaling: Using (scale:1.000, base:10000.0).
llama_new_context_with_model: n_seq_max = 1
llama_new_context_with_model: n_ctx = 16640
llama_new_context_with_model: n_ctx_per_seq = 16640
llama_new_context_with_model: n_batch = 512
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: flash_attn = 1
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_new_context_with_model: n_ctx_pre_seq (16640) > n_ctx_train (16384) -- possible training context overflow
llama_kv_cache_init: CUDA0 KV buffer size = 2080.00 MiB
llama_new_context_with_model: KV self size = 2080.00 MiB, K (f16): 1040.00 MiB, V (f16): 1040.00 MiB
llama_new_context_with_model: CUDA_Host output buffer size = 0.12 MiB
llama_new_context_with_model: CUDA0 compute buffer size = 108.75 MiB
llama_new_context_with_model: CUDA_Host compute buffer size = 40.51 MiB
llama_new_context_with_model: graph nodes = 903
llama_new_context_with_model: graph splits = 2
My configuration for this example is:
{"model": "", "model_param": "D:/SillyTavern/Modelle/Text-Creation/daybreak-kunoichi-2dpo-7b-q4_k_m.gguf", "port": 5001, "port_param": 5001, "host": "", "launch": false, "config": null, "threads": 7, "usecublas": ["normal", "0", "mmq"], "usevulkan": null, "useclblast": null, "usecpu": false, "contextsize": 16384, "gpulayers": 35, "tensor_split": null, "ropeconfig": [0.0, 10000.0], "blasbatchsize": 512, "blasthreads": null, "lora": null, "noshift": false, "nofastforward": false, "nommap": false, "usemlock": false, "noavx2": false, "debugmode": 0, "onready": "", "benchmark": null, "prompt": "", "promptlimit": 100, "multiuser": 1, "remotetunnel": false, "highpriority": false, "foreground": false, "preloadstory": null, "quiet": false, "ssl": null, "nocertify": false, "mmproj": null, "password": null, "ignoremissing": false, "chatcompletionsadapter": null, "flashattention": true, "quantkv": 0, "forceversion": 0, "smartcontext": false, "unpack": "", "nomodel": false, "showgui": false, "skiplauncher": false, "hordemodelname": "", "hordeworkername": "", "hordekey": "", "hordemaxctx": 0, "hordegenlen": 0, "sdmodel": "", "sdthreads": 7, "sdclamped": 0, "sdt5xxl": "", "sdclipl": "", "sdclipg": "", "sdvae": "", "sdvaeauto": false, "sdquant": false, "sdlora": "", "sdloramult": 1.0, "whispermodel": "", "hordeconfig": null, "sdconfig": null, "noblas": false}
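For a rough idea of what the extra 256 tokens cost in memory, the "KV self size" figure in the log can be divided by the reported n_ctx. This is only a sketch: it assumes the KV cache grows linearly with n_ctx for this model and cache type (f16), which the log itself does not state.

```python
# Values taken from the log in this issue.
KV_SELF_SIZE_MIB = 2080.0   # "KV self size" (K + V, f16)
REPORTED_N_CTX = 16640      # n_ctx printed in the log
CONFIGURED_CONTEXT = 16384  # "contextsize" in the configuration

# Assumption: KV cache size scales linearly with the number of context tokens.
mib_per_token = KV_SELF_SIZE_MIB / REPORTED_N_CTX

# Memory attributable to the 256 extra tokens, and the size without them.
extra_mib = (REPORTED_N_CTX - CONFIGURED_CONTEXT) * mib_per_token
kv_without_padding_mib = CONFIGURED_CONTEXT * mib_per_token

print(mib_per_token)           # 0.125
print(extra_mib)               # 32.0
print(kv_without_padding_mib)  # 2048.0
```

Under that assumption, the padding accounts for about 32 MiB of the 2080 MiB KV cache in this example.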