Commit graph

138 commits

Author SHA1 Message Date
oobabooga
0ef1b8f8b4 Use ExLlamaV2 (instead of the HF one) for EXL2 models for now 2025-04-17 05:47:40 -07:00
    It doesn't seem to have the "OverflowError" bug
oobabooga
bf48ec8c44 Remove an unnecessary UI message 2025-04-07 17:43:41 -07:00
oobabooga
a5855c345c Set context lengths to at most 8192 by default (to prevent out of memory errors) (#6835) 2025-04-07 21:42:33 -03:00
oobabooga
75ff3f3815 UI: Mention common context length values 2025-01-25 08:22:23 -08:00
oobabooga
3020f2e5ec UI: improve the info message about --tensorcores 2025-01-09 12:44:03 -08:00
oobabooga
7157257c3f Remove the AutoGPTQ loader (#6641) 2025-01-08 19:28:56 -03:00
oobabooga
c0f600c887 Add a --torch-compile flag for transformers 2025-01-05 05:47:00 -08:00
oobabooga
39a5c9a49c UI organization (#6618) 2024-12-29 11:16:17 -03:00
oobabooga
ddccc0d657 UI: minor change to log messages 2024-12-17 19:39:00 -08:00
oobabooga
3030c79e8c UI: show progress while loading a model 2024-12-17 19:37:43 -08:00
Diner Burger
addad3c63e Allow more granular KV cache settings (#6561) 2024-12-17 17:43:48 -03:00
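For context, the KV cache is what the granular cache settings control, and it dominates VRAM use at long contexts. A minimal sketch of the usual sizing arithmetic (the model dimensions below are illustrative assumptions, not taken from this commit):

```python
def kv_cache_bytes(n_ctx: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_element: int) -> int:
    # K and V each hold n_ctx * n_kv_heads * head_dim elements per layer,
    # hence the leading factor of 2.
    return 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_element

# Illustrative 8B-class shape (32 layers, 8 KV heads, head_dim 128) at an
# 8192-token context: fp16 needs 1 GiB; an 8-bit cache would halve that,
# a 4-bit cache quarter it.
print(kv_cache_bytes(8192, 32, 8, 128, 2))  # 1073741824 bytes = 1 GiB
```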
mefich
1c937dad72 Filter whitespaces in downloader fields in model tab (#6518) 2024-11-18 12:01:40 -03:00
oobabooga
93c250b9b6 Add a UI element for enable_tp 2024-10-01 11:16:15 -07:00
oobabooga
7050bb880e UI: make n_ctx/max_seq_len/truncation_length numbers rather than sliders 2024-07-27 23:11:53 -07:00
Harry
078e8c8969 Make compress_pos_emb float (#6276) 2024-07-28 03:03:19 -03:00
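compress_pos_emb is the linear RoPE scaling factor: position indices are divided by it before the rotary embedding is applied, so a fractional value (hence the change to float) stretches the usable context proportionally. A minimal sketch of the idea, not the project's actual code:

```python
def compress_positions(positions: list[float],
                       compress_pos_emb: float) -> list[float]:
    # Linear position interpolation: dividing indices by the factor makes a
    # context of n * compress_pos_emb tokens look like n tokens to RoPE.
    return [p / compress_pos_emb for p in positions]

print(compress_positions([0, 1, 2, 3], 2.0))  # [0.0, 0.5, 1.0, 1.5]
```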
oobabooga
e6181e834a Remove AutoAWQ as a standalone loader 2024-07-23 15:31:17 -07:00
    (it works better through transformers)
oobabooga
f18c947a86 Update the tensorcores description 2024-07-22 18:06:41 -07:00
oobabooga
aa809e420e Bump llama-cpp-python to 0.2.83, add back tensorcore wheels 2024-07-22 18:05:11 -07:00
    Also add back the progress bar patch
oobabooga
11bbf71aa5 Bump back llama-cpp-python (#6257) 2024-07-22 16:19:41 -03:00
oobabooga
0f53a736c1 Revert the llama-cpp-python update 2024-07-22 12:02:25 -07:00
oobabooga
a687f950ba Remove the tensorcores llama.cpp wheels 2024-07-22 11:54:35 -07:00
    They are not faster than the default wheels anymore and they use a lot of space.
oobabooga
e9d4bff7d0 Update the --tensor_split description 2024-07-20 22:04:48 -07:00
oobabooga
564d8c8c0d Make alpha_value a float number 2024-07-20 20:02:54 -07:00
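alpha_value drives NTK-aware RoPE scaling, which raises the rotary base frequency rather than compressing positions; allowing a float lets users pick fractional scaling. A sketch of the commonly cited conversion formula (the head_dim / (head_dim - 2) exponent, i.e. 64/63 for head_dim = 128, is an assumption here, not quoted from the project):

```python
def alpha_to_rope_freq_base(alpha_value: float, base: float = 10000.0,
                            head_dim: int = 128) -> float:
    # NTK-aware scaling: raising the base frequency by alpha^(d/(d-2))
    # extends the context with less perplexity loss than linear compression.
    return base * alpha_value ** (head_dim / (head_dim - 2))

print(alpha_to_rope_freq_base(1.0))  # 10000.0 (alpha = 1 leaves the base unchanged)
```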
oobabooga
79c4d3da3d Optimize the UI (#6251) 2024-07-21 00:01:42 -03:00
Vhallo
a9a6d72d8c Use gr.Number for RoPE scaling parameters (#6233) 2024-07-20 18:57:09 -03:00
    Co-authored-by: oobabooga <112222186+oobabooga@users.noreply.github.com>
oobabooga
e436d69e2b Add --no_xformers and --no_sdpa flags for ExllamaV2 2024-07-11 15:47:37 -07:00
oobabooga
c176244327 UI: Move cache_8bit/cache_4bit further up 2024-07-05 12:16:21 -07:00
oobabooga
e79e7b90dc UI: Move the cache_8bit and cache_4bit elements up 2024-07-04 20:21:28 -07:00
oobabooga
8b44d7b12a Lint 2024-07-04 20:16:44 -07:00
GralchemOz
8a39f579d8 transformers: Add eager attention option to make Gemma-2 work properly (#6188) 2024-07-01 12:08:08 -03:00
oobabooga
577a8cd3ee Add TensorRT-LLM support (#5715) 2024-06-24 02:30:03 -03:00
oobabooga
b48ab482f8 Remove obsolete "gptq_for_llama_info" message 2024-06-23 22:05:19 -07:00
GodEmperor785
2c5a9eb597 Change limits of RoPE scaling sliders in UI (#6142) 2024-06-19 21:42:17 -03:00
Forkoz
1d79aa67cf Fix flash-attn UI parameter to actually store true. (#6076) 2024-06-13 00:34:54 -03:00
oobabooga
2d196ed2fe Remove obsolete pre_layer parameter 2024-06-12 18:56:44 -07:00
oobabooga
4f1e96b9e3 Downloader: Add --model-dir argument, respect --model-dir in the UI 2024-05-23 20:42:46 -07:00
oobabooga
9e189947d1 Minor fix after bd7cc4234d (thanks @belladoreai) 2024-05-21 10:37:30 -07:00
oobabooga
bd7cc4234d Backend cleanup (#6025) 2024-05-21 13:32:02 -03:00
Samuel Wein
b63dc4e325 UI: Warn user if they are trying to load a model from no path (#6006) 2024-05-19 20:05:17 -03:00
chr
6b546a2c8b llama.cpp: increase the max threads from 32 to 256 (#5889) 2024-05-19 20:02:19 -03:00
oobabooga
e61055253c Bump llama-cpp-python to 0.2.69, add --flash-attn option 2024-05-03 04:31:22 -07:00
oobabooga
51fb766bea Add back my llama-cpp-python wheels, bump to 0.2.65 (#5964) 2024-04-30 09:11:31 -03:00
oobabooga
f0538efb99 Remove obsolete --tensorcores references 2024-04-24 00:31:28 -07:00
wangshuai09
fd4e46bce2 Add Ascend NPU support (basic) (#5541) 2024-04-11 18:42:20 -03:00
Alex O'Connell
b94cd6754e UI: Respect model and lora directory settings when downloading files (#5842) 2024-04-11 01:55:02 -03:00
oobabooga
d423021a48 Remove CTransformers support (#5807) 2024-04-04 20:23:58 -03:00
oobabooga
2a92a842ce Bump gradio to 4.23 (#5758) 2024-03-26 16:32:20 -03:00
oobabooga
49b111e2dd Lint 2024-03-17 08:33:23 -07:00
oobabooga
40a60e0297 Convert attention_sink_size to int (closes #5696) 2024-03-13 08:15:49 -07:00
oobabooga
63701f59cf UI: mention that n_gpu_layers > 0 is necessary for the GPU to be used 2024-03-11 18:54:15 -07:00