Commit graph

62 commits

Author  SHA1  Message  Date
oobabooga  0ef1b8f8b4  Use ExLlamaV2 (instead of the HF one) for EXL2 models for now  2025-04-17 05:47:40 -07:00
    It doesn't seem to have the "OverflowError" bug
oobabooga  682c78ea42  Add back detection of GPTQ models (closes #6841)  2025-04-11 21:00:42 -07:00
oobabooga  8b8d39ec4e  Add ExLlamaV3 support (#6832)  2025-04-09 00:07:08 -03:00
oobabooga  a5855c345c  Set context lengths to at most 8192 by default (to prevent out of memory errors) (#6835)  2025-04-07 21:42:33 -03:00
oobabooga  7157257c3f  Remove the AutoGPTQ loader (#6641)  2025-01-08 19:28:56 -03:00
oobabooga  e6181e834a  Remove AutoAWQ as a standalone loader  2024-07-23 15:31:17 -07:00
    (it works better through transformers)
oobabooga  907137a13d  Automatically set bf16 & use_eager_attention for Gemma-2  2024-07-01 21:46:35 -07:00
mefich  a85749dcbe  Update models_settings.py: add default alpha_value, add proper compress_pos_emb for newer GGUFs (#6111)  2024-06-26 22:17:56 -03:00
oobabooga  577a8cd3ee  Add TensorRT-LLM support (#5715)  2024-06-24 02:30:03 -03:00
Forkoz  1576227f16  Fix GGUFs with no BOS token present, mainly qwen2 models. (#6119)  2024-06-14 13:51:01 -03:00
    Co-authored-by: oobabooga <112222186+oobabooga@users.noreply.github.com>
oobabooga  2d196ed2fe  Remove obsolete pre_layer parameter  2024-06-12 18:56:44 -07:00
oobabooga  9e189947d1  Minor fix after bd7cc4234d (thanks @belladoreai)  2024-05-21 10:37:30 -07:00
oobabooga  bd7cc4234d  Backend cleanup (#6025)  2024-05-21 13:32:02 -03:00
oobabooga  a38a37b3b3  llama.cpp: default n_gpu_layers to the maximum value for the model automatically  2024-05-19 10:57:42 -07:00
oobabooga  64e2a9a0a7  Fix the Phi-3 template when used in the UI  2024-04-24 01:34:11 -07:00
oobabooga  f27e1ba302  Add a /v1/internal/chat-prompt endpoint (#5879)  2024-04-19 00:24:46 -03:00
oobabooga  17c4319e2d  Fix loading command-r context length metadata  2024-04-10 21:39:59 -07:00
oobabooga  64a76856bd  Metadata: Fix loading Command R+ template with multiple options  2024-04-06 07:32:17 -07:00
oobabooga  d423021a48  Remove CTransformers support (#5807)  2024-04-04 20:23:58 -03:00
oobabooga  9ab7365b56  Read rope_theta for DBRX model (thanks turboderp)  2024-04-01 20:25:31 -07:00
oobabooga  db5f6cd1d8  Fix ExLlamaV2 loaders using unnecessary "bits" metadata  2024-03-30 21:51:39 -07:00
oobabooga  624faa1438  Fix ExLlamaV2 context length setting (closes #5750)  2024-03-30 21:33:16 -07:00
oobabooga  4039999be5  Autodetect llamacpp_HF loader when tokenizer exists  2024-02-16 09:29:26 -08:00
oobabooga  76d28eaa9e  Add a menu for customizing the instruction template for the model (#5521)  2024-02-16 14:21:17 -03:00
oobabooga  2a1063eff5  Revert "Remove non-HF ExLlamaV2 loader (#5431)"  2024-02-06 06:21:36 -08:00
    This reverts commit cde000d478.
oobabooga  cde000d478  Remove non-HF ExLlamaV2 loader (#5431)  2024-02-04 01:15:51 -03:00
oobabooga  2734ce3e4c  Remove RWKV loader (#5130)  2023-12-31 02:01:40 -03:00
oobabooga  0e54a09bcb  Remove exllamav1 loaders (#5128)  2023-12-31 01:57:06 -03:00
B611  b7dd1f9542  Specify utf-8 encoding for model metadata file open (#5125)  2023-12-31 01:34:32 -03:00
Water  674be9a09a  Add HQQ quant loader (#4888)  2023-12-18 21:23:16 -03:00
    Co-authored-by: oobabooga <112222186+oobabooga@users.noreply.github.com>
oobabooga  f0d6ead877  llama.cpp: read instruction template from GGUF metadata (#4975)  2023-12-18 01:51:58 -03:00
oobabooga  39d2fe1ed9  Jinja templates for Instruct and Chat (#4874)  2023-12-12 17:23:14 -03:00
oobabooga  98361af4d5  Add QuIP# support (#4803)  2023-12-06 00:01:01 -03:00
    It has to be installed manually for now.
oobabooga  e05d8fd441  Style changes  2023-11-15 15:51:37 -08:00
oobabooga  09f807af83  Use ExLlama_HF for GPTQ models by default  2023-10-21 20:45:38 -07:00
oobabooga  e14bde4946  Minor improvements to evaluation logs  2023-10-15 20:51:43 -07:00
oobabooga  9fab9a1ca6  Minor fix  2023-10-10 14:08:11 -07:00
oobabooga  a49cc69a4a  Ignore rope_freq_base if value is 10000  2023-10-10 13:57:40 -07:00
oobabooga  7ffb424c7b  Add AutoAWQ to README  2023-10-05 09:22:37 -07:00
cal066  cc632c3f33  AutoAWQ: initial support (#3999)  2023-10-05 13:19:18 -03:00
oobabooga  96da2e1c0d  Read more metadata (config.json & quantize_config.json)  2023-09-29 06:14:16 -07:00
oobabooga  f8e9733412  Minor syntax change  2023-09-28 19:32:35 -07:00
oobabooga  1dd13e4643  Read Transformers config.json metadata  2023-09-28 19:19:47 -07:00
oobabooga  7a3ca2c68f  Better detect EXL2 models  2023-09-23 13:05:55 -07:00
oobabooga  5075087461  Fix command-line arguments being ignored  2023-09-19 13:11:46 -07:00
oobabooga  3d1c0f173d  User config precedence over GGUF metadata  2023-09-14 12:15:52 -07:00
Gennadij  460c40d8ab  Read more GGUF metadata (scale_linear and freq_base) (#3877)  2023-09-12 17:02:42 -03:00
oobabooga  16e1696071  Minor qol change  2023-09-12 10:44:26 -07:00
oobabooga  9331ab4798  Read GGUF metadata (#3873)  2023-09-11 18:49:30 -03:00
oobabooga  ed86878f02  Remove GGML support  2023-09-11 07:44:00 -07:00