Commit graph

349 commits

Author SHA1 Message Date
oobabooga
b28fa86db6 Default --gpu-layers to 256 2025-05-06 17:51:55 -07:00
Downtown-Case
5ef564a22e Fix model config loading in shared.py for Python 3.13 (#6961) 2025-05-06 17:03:33 -03:00
mamei16
8137eb8ef4 Dynamic Chat Message UI Update Speed (#6952) 2025-05-05 18:05:23 -03:00
oobabooga
df7bb0db1f Rename --n-gpu-layers to --gpu-layers 2025-05-04 20:03:55 -07:00
oobabooga
4cea720da8 UI: Remove the "Autoload the model" feature 2025-05-02 16:38:28 -07:00
oobabooga
905afced1c Add a --portable flag to hide things in portable mode 2025-05-02 16:34:29 -07:00
oobabooga
b46ca01340 UI: Set max_updates_second to 12 by default
When the tokens/second is at ~50 and the model is a thinking model,
the markdown rendering for the streaming message becomes a CPU
bottleneck.
2025-04-30 14:53:15 -07:00
oobabooga
d10bded7f8 UI: Add an enable_thinking option to enable/disable Qwen3 thinking 2025-04-28 22:37:01 -07:00
oobabooga
7b80acd524 Fix parsing --extra-flags 2025-04-26 18:40:03 -07:00
oobabooga
0fe3b033d0 Fix parsing of --n_ctx and --max_seq_len (2nd attempt) 2025-04-26 17:52:21 -07:00
oobabooga
c4afc0421d Fix parsing of --n_ctx and --max_seq_len 2025-04-26 17:43:53 -07:00
oobabooga
4ff91b6588 Better default settings for Speculative Decoding 2025-04-26 17:24:40 -07:00
oobabooga
3a207e7a57 Improve the --help formatting a bit 2025-04-26 07:31:04 -07:00
oobabooga
cbd4d967cc Update a --help message 2025-04-26 05:09:52 -07:00
oobabooga
d9de14d1f7 Restructure the repository (#6904) 2025-04-26 08:56:54 -03:00
oobabooga
d4017fbb6d ExLlamaV3: Add kv cache quantization (#6903) 2025-04-25 21:32:00 -03:00
oobabooga
d4b1e31c49 Use --ctx-size to specify the context size for all loaders
Old flags are still recognized as alternatives.
2025-04-25 16:59:03 -07:00
oobabooga
877cf44c08 llama.cpp: Add StreamingLLM (--streaming-llm) 2025-04-25 16:21:41 -07:00
oobabooga
d35818f4e1 UI: Add a collapsible thinking block to messages with <think> steps (#6902) 2025-04-25 18:02:02 -03:00
oobabooga
98f4c694b9 llama.cpp: Add --extra-flags parameter for passing additional flags to llama-server 2025-04-25 07:32:51 -07:00
Matthew Jenkins
8f2493cc60 Prevent llamacpp defaults from locking up consumer hardware (#6870) 2025-04-24 23:38:57 -03:00
oobabooga
93fd4ad25d llama.cpp: Document the --device-draft syntax 2025-04-24 09:20:11 -07:00
oobabooga
c71a2af5ab Handle CMD_FLAGS.txt in the main code (closes #6896) 2025-04-24 08:21:06 -07:00
oobabooga
bfbde73409 Make 'instruct' the default chat mode 2025-04-24 07:08:49 -07:00
oobabooga
e99c20bcb0 llama.cpp: Add speculative decoding (#6891) 2025-04-23 20:10:16 -03:00
oobabooga
8cfd7f976b Revert "Remove the old --model-menu flag"
This reverts commit 109de34e3b.
2025-04-20 13:35:42 -07:00
oobabooga
ae02ffc605 Refactor the transformers loader (#6859) 2025-04-20 13:33:47 -03:00
oobabooga
d68f0fbdf7 Remove obsolete references to llamacpp_HF 2025-04-18 07:46:04 -07:00
oobabooga
c6901aba9f Remove deprecation warning code 2025-04-18 06:05:47 -07:00
oobabooga
8144e1031e Remove deprecated command-line flags 2025-04-18 06:02:28 -07:00
oobabooga
ae54d8faaa New llama.cpp loader (#6846) 2025-04-18 09:59:37 -03:00
oobabooga
4ed0da74a8 Remove the obsolete 'multimodal' extension 2025-04-09 20:09:48 -07:00
oobabooga
8b8d39ec4e Add ExLlamaV3 support (#6832) 2025-04-09 00:07:08 -03:00
oobabooga
a5855c345c Set context lengths to at most 8192 by default (to prevent out of memory errors) (#6835) 2025-04-07 21:42:33 -03:00
oobabooga
109de34e3b Remove the old --model-menu flag 2025-03-31 09:24:03 -07:00
oobabooga
0360f54ae8 UI: add a "Show after" parameter (to use with DeepSeek </think>) 2025-02-02 15:30:09 -08:00
oobabooga
c832953ff7 UI: Activate auto_max_new_tokens by default 2025-01-14 05:59:55 -08:00
oobabooga
d2f6c0f65f Update README 2025-01-10 13:25:40 -08:00
oobabooga
c393f7650d Update settings-template.yaml, organize modules/shared.py 2025-01-10 13:22:18 -08:00
oobabooga
83c426e96b Organize internals (#6646) 2025-01-10 18:04:32 -03:00
oobabooga
7fe46764fb Improve the --help message about --tensorcores as well 2025-01-10 07:07:41 -08:00
oobabooga
da6d868f58 Remove old deprecated flags (~6 months or more) 2025-01-09 16:11:46 -08:00
BPplays
619265b32c add ipv6 support to the API (#6559) 2025-01-09 10:23:44 -03:00
oobabooga
91a8a87887 Remove obsolete code 2025-01-08 15:07:21 -08:00
oobabooga
7157257c3f Remove the AutoGPTQ loader (#6641) 2025-01-08 19:28:56 -03:00
oobabooga
c0f600c887 Add a --torch-compile flag for transformers 2025-01-05 05:47:00 -08:00
oobabooga
11af199aff Add a "Static KV cache" option for transformers 2025-01-04 17:52:57 -08:00
oobabooga
60c93e0c66 UI: Set cache_type to fp16 by default 2024-12-17 19:44:20 -08:00
Diner Burger
addad3c63e Allow more granular KV cache settings (#6561) 2024-12-17 17:43:48 -03:00
oobabooga
d769618591 Improved UI (#6575) 2024-12-17 00:47:41 -03:00