simcop2387/text-generation-webui-mirror

mirror of https://github.com/oobabooga/text-generation-webui.git synced 2025-06-07 14:17:09 -04:00

Author	SHA1	Message	Date
oobabooga	4925c307cf	Auto-adjust GPU layers on context size and cache type changes + many fixes	2025-05-16 09:07:38 -07:00
oobabooga	5534d01da0	Estimate the VRAM for GGUF models + autoset `gpu-layers` (#6980)	2025-05-16 00:07:37 -03:00
oobabooga	3fa1a899ae	UI: Fix gpu-layers being ignored (closes #6973)	2025-05-13 12:07:59 -07:00
oobabooga	d9de14d1f7	Restructure the repository (#6904)	2025-04-26 08:56:54 -03:00
oobabooga	d4b1e31c49	Use `--ctx-size` to specify the context size for all loaders. Old flags are still recognized as alternatives.	2025-04-25 16:59:03 -07:00
oobabooga	78aeabca89	Fix the transformers loader	2025-04-21 18:33:14 -07:00
oobabooga	ae02ffc605	Refactor the transformers loader (#6859)	2025-04-20 13:33:47 -03:00
oobabooga	ae54d8faaa	New llama.cpp loader (#6846)	2025-04-18 09:59:37 -03:00
Googolplexed	d78abe480b	Allow for model subfolder organization for GGUF files (#6686). Co-authored-by: oobabooga <112222186+oobabooga@users.noreply.github.com>	2025-04-18 02:53:59 -03:00
oobabooga	2c2d453c8c	Revert "Use ExLlamaV2 (instead of the HF one) for EXL2 models for now". This reverts commit `0ef1b8f8b4`.	2025-04-17 21:31:32 -07:00
oobabooga	0ef1b8f8b4	Use ExLlamaV2 (instead of the HF one) for EXL2 models for now. It doesn't seem to have the "OverflowError" bug.	2025-04-17 05:47:40 -07:00
oobabooga	682c78ea42	Add back detection of GPTQ models (closes #6841)	2025-04-11 21:00:42 -07:00
oobabooga	8b8d39ec4e	Add ExLlamaV3 support (#6832)	2025-04-09 00:07:08 -03:00
oobabooga	a5855c345c	Set context lengths to at most 8192 by default (to prevent out of memory errors) (#6835)	2025-04-07 21:42:33 -03:00
oobabooga	7157257c3f	Remove the AutoGPTQ loader (#6641)	2025-01-08 19:28:56 -03:00
oobabooga	e6181e834a	Remove AutoAWQ as a standalone loader (it works better through transformers)	2024-07-23 15:31:17 -07:00
oobabooga	907137a13d	Automatically set bf16 & use_eager_attention for Gemma-2	2024-07-01 21:46:35 -07:00
mefich	a85749dcbe	Update models_settings.py: add default alpha_value, add proper compress_pos_emb for newer GGUFs (#6111)	2024-06-26 22:17:56 -03:00
oobabooga	577a8cd3ee	Add TensorRT-LLM support (#5715)	2024-06-24 02:30:03 -03:00
Forkoz	1576227f16	Fix GGUFs with no BOS token present, mainly qwen2 models (#6119). Co-authored-by: oobabooga <112222186+oobabooga@users.noreply.github.com>	2024-06-14 13:51:01 -03:00
oobabooga	2d196ed2fe	Remove obsolete pre_layer parameter	2024-06-12 18:56:44 -07:00
oobabooga	9e189947d1	Minor fix after `bd7cc4234d` (thanks @belladoreai)	2024-05-21 10:37:30 -07:00
oobabooga	bd7cc4234d	Backend cleanup (#6025)	2024-05-21 13:32:02 -03:00
oobabooga	a38a37b3b3	llama.cpp: default n_gpu_layers to the maximum value for the model automatically	2024-05-19 10:57:42 -07:00
oobabooga	64e2a9a0a7	Fix the Phi-3 template when used in the UI	2024-04-24 01:34:11 -07:00
oobabooga	f27e1ba302	Add a /v1/internal/chat-prompt endpoint (#5879)	2024-04-19 00:24:46 -03:00
oobabooga	17c4319e2d	Fix loading command-r context length metadata	2024-04-10 21:39:59 -07:00
oobabooga	64a76856bd	Metadata: Fix loading Command R+ template with multiple options	2024-04-06 07:32:17 -07:00
oobabooga	d423021a48	Remove CTransformers support (#5807)	2024-04-04 20:23:58 -03:00
oobabooga	9ab7365b56	Read rope_theta for DBRX model (thanks turboderp)	2024-04-01 20:25:31 -07:00
oobabooga	db5f6cd1d8	Fix ExLlamaV2 loaders using unnecessary "bits" metadata	2024-03-30 21:51:39 -07:00
oobabooga	624faa1438	Fix ExLlamaV2 context length setting (closes #5750)	2024-03-30 21:33:16 -07:00
oobabooga	4039999be5	Autodetect llamacpp_HF loader when tokenizer exists	2024-02-16 09:29:26 -08:00
oobabooga	76d28eaa9e	Add a menu for customizing the instruction template for the model (#5521)	2024-02-16 14:21:17 -03:00
oobabooga	2a1063eff5	Revert "Remove non-HF ExLlamaV2 loader (#5431)". This reverts commit `cde000d478`.	2024-02-06 06:21:36 -08:00
oobabooga	cde000d478	Remove non-HF ExLlamaV2 loader (#5431)	2024-02-04 01:15:51 -03:00
oobabooga	2734ce3e4c	Remove RWKV loader (#5130)	2023-12-31 02:01:40 -03:00
oobabooga	0e54a09bcb	Remove exllamav1 loaders (#5128)	2023-12-31 01:57:06 -03:00
B611	b7dd1f9542	Specify utf-8 encoding for model metadata file open (#5125)	2023-12-31 01:34:32 -03:00
Water	674be9a09a	Add HQQ quant loader (#4888). Co-authored-by: oobabooga <112222186+oobabooga@users.noreply.github.com>	2023-12-18 21:23:16 -03:00
oobabooga	f0d6ead877	llama.cpp: read instruction template from GGUF metadata (#4975)	2023-12-18 01:51:58 -03:00
oobabooga	39d2fe1ed9	Jinja templates for Instruct and Chat (#4874)	2023-12-12 17:23:14 -03:00
oobabooga	98361af4d5	Add QuIP# support (#4803). It has to be installed manually for now.	2023-12-06 00:01:01 -03:00
oobabooga	e05d8fd441	Style changes	2023-11-15 15:51:37 -08:00
oobabooga	09f807af83	Use ExLlama_HF for GPTQ models by default	2023-10-21 20:45:38 -07:00
oobabooga	e14bde4946	Minor improvements to evaluation logs	2023-10-15 20:51:43 -07:00
oobabooga	9fab9a1ca6	Minor fix	2023-10-10 14:08:11 -07:00
oobabooga	a49cc69a4a	Ignore rope_freq_base if value is 10000	2023-10-10 13:57:40 -07:00
oobabooga	7ffb424c7b	Add AutoAWQ to README	2023-10-05 09:22:37 -07:00
cal066	cc632c3f33	AutoAWQ: initial support (#3999)	2023-10-05 13:19:18 -03:00


72 commits