225 commits

Author SHA1 Message Date
oobabooga
ad6d0218ae Fix after 219f0a7731 2025-06-01 19:27:14 -07:00
oobabooga
219f0a7731 Fix exllamav3_hf models failing to unload (closes #7031) 2025-05-30 12:05:49 -07:00
oobabooga
9ec46b8c44 Remove the HQQ loader (HQQ models can be loaded through Transformers) 2025-05-19 09:23:24 -07:00
oobabooga
5534d01da0 Estimate the VRAM for GGUF models + autoset gpu-layers (#6980) 2025-05-16 00:07:37 -03:00
oobabooga
d4b1e31c49 Use --ctx-size to specify the context size for all loaders
Old flags are still recognized as alternatives.
2025-04-25 16:59:03 -07:00
oobabooga
86c3ed3218 Small change to the unload_model() function 2025-04-20 20:00:56 -07:00
oobabooga
b3bf7a885d Fix ExLlamaV2_HF and ExLlamaV3_HF after ae02ffc605 2025-04-20 11:32:48 -07:00
oobabooga
ae02ffc605 Refactor the transformers loader (#6859) 2025-04-20 13:33:47 -03:00
oobabooga
ae54d8faaa New llama.cpp loader (#6846) 2025-04-18 09:59:37 -03:00
oobabooga
8b8d39ec4e Add ExLlamaV3 support (#6832) 2025-04-09 00:07:08 -03:00
SamAcctX
f28f39792d update deprecated deepspeed import for transformers 4.46+ (#6725) 2025-02-02 20:41:36 -03:00
oobabooga
c08d87b78d Make the huggingface loader more readable 2025-01-09 12:23:38 -08:00
oobabooga
7157257c3f Remove the AutoGPTQ loader (#6641) 2025-01-08 19:28:56 -03:00
oobabooga
c0f600c887 Add a --torch-compile flag for transformers 2025-01-05 05:47:00 -08:00
Petr Korolev
13c033c745 Fix CUDA error on MPS backend during API request (#6572)
---------

Co-authored-by: oobabooga <oobabooga4@gmail.com>
2025-01-02 00:06:11 -03:00
oobabooga
7b88724711 Make responses start faster by removing unnecessary cleanup calls (#6625) 2025-01-01 18:33:38 -03:00
oobabooga
b92d7fd43e Add warnings for when AutoGPTQ, TensorRT-LLM, or HQQ are missing 2024-09-28 20:30:24 -07:00
oobabooga
e926c03b3d Add a --tokenizer-dir command-line flag for llamacpp_HF 2024-08-06 19:41:18 -07:00
oobabooga
9dcff21da9 Remove unnecessary shared.previous_model_name variable 2024-07-28 18:35:11 -07:00
oobabooga
514fb2e451 Fix UI error caused by --idle-timeout 2024-07-28 18:30:06 -07:00
oobabooga
e6181e834a Remove AutoAWQ as a standalone loader
(it works better through transformers)
2024-07-23 15:31:17 -07:00
oobabooga
8b44d7b12a Lint 2024-07-04 20:16:44 -07:00
GralchemOz
8a39f579d8 transformers: Add eager attention option to make Gemma-2 work properly (#6188) 2024-07-01 12:08:08 -03:00
oobabooga
577a8cd3ee Add TensorRT-LLM support (#5715) 2024-06-24 02:30:03 -03:00
oobabooga
536f8d58d4 Do not expose alpha_value to llama.cpp & rope_freq_base to transformers
To avoid confusion
2024-06-23 22:09:24 -07:00
oobabooga
a36fa73071 Lint 2024-06-12 19:00:21 -07:00
oobabooga
bd7cc4234d Backend cleanup (#6025) 2024-05-21 13:32:02 -03:00
oobabooga
9f77ed1b98 --idle-timeout flag to unload the model if unused for N minutes (#6026) 2024-05-19 23:29:39 -03:00
Tisjwlf
907702c204 Fix gguf multipart file loading (#5857) 2024-05-19 20:22:09 -03:00
oobabooga
e9c9483171 Improve the logging messages while loading models 2024-05-03 08:10:44 -07:00
oobabooga
dfdb6fee22 Set llm_int8_enable_fp32_cpu_offload=True for --load-in-4bit
To allow for 32-bit CPU offloading (it's very slow).
2024-04-26 09:39:27 -07:00
oobabooga
4094813f8d Lint 2024-04-24 09:53:41 -07:00
Colin
f3c9103e04 Revert walrus operator for params['max_memory'] (#5878) 2024-04-24 01:09:14 -03:00
wangshuai09
fd4e46bce2 Add Ascend NPU support (basic) (#5541) 2024-04-11 18:42:20 -03:00
oobabooga
d02744282b Minor logging change 2024-04-06 18:56:58 -07:00
oobabooga
1bdceea2d4 UI: Focus on the chat input after starting a new chat 2024-04-06 12:57:57 -07:00
oobabooga
1b87844928 Minor fix 2024-04-05 18:43:43 -07:00
oobabooga
6b7f7555fc Logging message to make transformers loader a bit more transparent 2024-04-05 18:40:02 -07:00
oobabooga
308452b783 Bitsandbytes: load preconverted 4bit models without additional flags 2024-04-04 18:10:24 -07:00
oobabooga
d423021a48 Remove CTransformers support (#5807) 2024-04-04 20:23:58 -03:00
oobabooga
13fe38eb27 Remove specialized code for gpt-4chan 2024-04-04 16:11:47 -07:00
oobabooga
4039999be5 Autodetect llamacpp_HF loader when tokenizer exists 2024-02-16 09:29:26 -08:00
oobabooga
b2b74c83a6 Fix Qwen1.5 in llamacpp_HF 2024-02-15 19:04:19 -08:00
oobabooga
d47182d9d1 llamacpp_HF: do not use oobabooga/llama-tokenizer (#5499) 2024-02-14 00:28:51 -03:00
oobabooga
4e34ae0587 Minor logging improvements 2024-02-06 08:22:08 -08:00
oobabooga
8ee3cea7cb Improve some log messages 2024-02-06 06:31:27 -08:00
oobabooga
2a1063eff5 Revert "Remove non-HF ExLlamaV2 loader (#5431)"
This reverts commit cde000d478.
2024-02-06 06:21:36 -08:00
oobabooga
cde000d478 Remove non-HF ExLlamaV2 loader (#5431) 2024-02-04 01:15:51 -03:00
sam-ngu
c0bdcee646 added trust_remote_code to deepspeed init loaderClass (#5237) 2024-01-26 11:10:57 -03:00
oobabooga
89e7e107fc Lint 2024-01-09 16:27:50 -08:00