Hash       | Author          | Date                       | Message
142c4eb1a6 | Leszek Hanusz   | 2025-06-03 00:57:55 +02:00 | Try to stop the model if it was loaded when ctrl-c received
2db7745cbd | oobabooga       | 2025-06-01 22:12:24 -07:00 | Show llama.cpp prompt processing on one line instead of many lines
e4d3f4449d | oobabooga       | 2025-05-16 13:02:27 -07:00 | API: Fix a regression
5534d01da0 | oobabooga       | 2025-05-16 00:07:37 -03:00 | Estimate the VRAM for GGUF models + autoset gpu-layers (#6980)
62c774bf24 | oobabooga       | 2025-05-13 06:42:25 -07:00 | Revert "New attempt" (reverts commit e7ac06c169)
e7ac06c169 | oobabooga       | 2025-05-10 19:20:04 -07:00 | New attempt
9ea2a69210 | oobabooga       | 2025-05-08 10:41:25 -07:00 | llama.cpp: Add --no-webui to the llama-server command
c4f36db0d8 | oobabooga       | 2025-05-06 08:41:13 -07:00 | llama.cpp: remove tfs (it doesn't get used)
d1c0154d66 | oobabooga       | 2025-05-06 06:38:39 -07:00 | llama.cpp: Add top_n_sigma, fix typical_p in sampler priority
b817bb33fd | oobabooga       | 2025-05-05 05:00:20 -07:00 | Minor fix after df7bb0db1f
b7a5c7db8d | oobabooga       | 2025-05-04 07:14:42 -07:00 | llama.cpp: Handle short arguments in --extra-flags
4c2e3b168b | oobabooga       | 2025-05-03 06:51:20 -07:00 | llama.cpp: Add a retry mechanism when getting the logits (sometimes it fails)
b950a0c6db | oobabooga       | 2025-04-30 20:02:10 -07:00 | Lint
a6c3ec2299 | oobabooga       | 2025-04-30 15:24:07 -07:00 | llama.cpp: Explicitly send cache_prompt = True
1ee0acc852 | oobabooga       | 2025-04-28 15:56:25 -07:00 | llama.cpp: Make --verbose print the llama-server command
c6c2855c80 | oobabooga       | 2025-04-27 21:22:21 -07:00 | llama.cpp: Remove the timeout while loading models (closes #6907)
7b80acd524 | oobabooga       | 2025-04-26 18:40:03 -07:00 | Fix parsing --extra-flags
234aba1c50 | oobabooga       | 2025-04-26 17:33:47 -07:00 | llama.cpp: Simplify the prompt processing progress indicator (the progress bar was unreliable)
d4b1e31c49 | oobabooga       | 2025-04-25 16:59:03 -07:00 | Use --ctx-size to specify the context size for all loaders (old flags are still recognized as alternatives)
faababc4ea | oobabooga       | 2025-04-25 16:42:30 -07:00 | llama.cpp: Add a prompt processing progress bar
877cf44c08 | oobabooga       | 2025-04-25 16:21:41 -07:00 | llama.cpp: Add StreamingLLM (--streaming-llm)
98f4c694b9 | oobabooga       | 2025-04-25 07:32:51 -07:00 | llama.cpp: Add --extra-flags parameter for passing additional flags to llama-server
e99c20bcb0 | oobabooga       | 2025-04-23 20:10:16 -03:00 | llama.cpp: Add speculative decoding (#6891)
d3e7c655e5 | Matthew Jenkins | 2025-04-20 23:06:24 -03:00 | Add support for llama-cpp builds from https://github.com/ggml-org/llama.cpp (#6862)
5ab069786b | oobabooga       | 2025-04-19 17:38:36 -07:00 | llama.cpp: add back the two encode calls (they are harmless now)
b9da5c7e3a | oobabooga       | 2025-04-19 17:36:04 -07:00 | Use 127.0.0.1 instead of localhost for faster llama.cpp on Windows
9c9df2063f | oobabooga       | 2025-04-19 16:38:15 -07:00 | llama.cpp: fix unicode decoding (closes #6856)
ba976d1390 | oobabooga       | 2025-04-19 16:35:01 -07:00 | llama.cpp: avoid two 'encode' calls
ed42154c78 | oobabooga       | 2025-04-19 05:32:36 -07:00 | Revert "llama.cpp: close the connection immediately on 'Stop'" (reverts commit 5fdebc554b)
5fdebc554b | oobabooga       | 2025-04-19 04:59:24 -07:00 | llama.cpp: close the connection immediately on 'Stop'
6589ebeca8 | oobabooga       | 2025-04-18 21:16:21 -07:00 | Revert "llama.cpp: new optimization attempt" (reverts commit e2e73ed22f)
e2e73ed22f | oobabooga       | 2025-04-18 21:05:08 -07:00 | llama.cpp: new optimization attempt
e2e90af6cd | oobabooga       | 2025-04-18 20:51:18 -07:00 | llama.cpp: don't include --rope-freq-base in the launch command if null
9f07a1f5d7 | oobabooga       | 2025-04-18 19:30:53 -07:00 | llama.cpp: new attempt at optimizing the llama-server connection
f727b4a2cc | oobabooga       | 2025-04-18 19:01:39 -07:00 | llama.cpp: close the connection properly when generation is cancelled
b3342b8dd8 | oobabooga       | 2025-04-18 18:46:36 -07:00 | llama.cpp: optimize the llama-server connection
2002590536 | oobabooga       | 2025-04-18 18:13:54 -07:00 | Revert "Attempt at making the llama-server streaming more efficient." (reverts commit 5ad080ff25)
71ae05e0a4 | oobabooga       | 2025-04-18 18:06:36 -07:00 | llama.cpp: Fix the sampler priority handling
5ad080ff25 | oobabooga       | 2025-04-18 18:04:49 -07:00 | Attempt at making the llama-server streaming more efficient.
4fabd729c9 | oobabooga       | 2025-04-18 17:25:22 -07:00 | Fix the API without streaming or without 'sampler_priority' (closes #6851)
5135523429 | oobabooga       | 2025-04-18 17:10:26 -07:00 | Fix the new llama.cpp loader failing to unload models
caa6afc88b | oobabooga       | 2025-04-18 09:57:57 -07:00 | Only show 'GENERATE_PARAMS=...' in the logits endpoint if use_logits is True
d00d713ace | oobabooga       | 2025-04-18 08:14:15 -07:00 | Rename get_max_context_length to get_vocabulary_size in the new llama.cpp loader
c1cc65e82e | oobabooga       | 2025-04-18 08:06:51 -07:00 | Lint
a0abf93425 | oobabooga       | 2025-04-18 06:53:51 -07:00 | Connect --rope-freq-base to the new llama.cpp loader
ae54d8faaa | oobabooga       | 2025-04-18 09:59:37 -03:00 | New llama.cpp loader (#6846)