Compare commits

...

45 commits

Author SHA1 Message Date
oobabooga
ee25cefce2
Merge 0783f5c891 into d47c8eb956 2025-06-05 18:42:24 +00:00
oobabooga
0783f5c891 Use the latest format 2025-06-05 11:42:12 -07:00
oobabooga
7f7909be54 Merge branch 'dev' into multimodal_gguf 2025-06-05 10:52:06 -07:00
oobabooga
7366ff5dfa Change a class name 2025-06-05 10:49:02 -07:00
oobabooga
27affa9db7 Pre-merge dev branch 2025-06-05 10:47:07 -07:00
oobabooga
d47c8eb956 Remove quotes from LLM-generated websearch query (closes #7045).
Fix by @Quiet-Joker
2025-06-05 06:57:59 -07:00
oobabooga
977ec801b7 Improve table colors in instruct mode 2025-06-05 06:33:45 -07:00
Hanusz Leszek
3829507d0f
Stop model during graceful shutdown (#7042) 2025-06-04 15:13:36 -03:00
oobabooga
3d676cd50f Optimize syntax highlighting 2025-06-04 11:02:04 -07:00
oobabooga
66a75c899a Improve the scrollbars in code blocks 2025-06-04 10:59:43 -07:00
oobabooga
9bd7359ffa Scroll the textarea into view when editing a message 2025-06-04 10:47:14 -07:00
oobabooga
93b3752cdf Revert "Remove the "Is typing..." yield by default"
This reverts commit b30a73016d.
2025-06-04 09:40:30 -07:00
oobabooga
b38ec0ec38 Update llama.cpp 2025-06-02 11:33:17 -07:00
oobabooga
b30a73016d Remove the "Is typing..." yield by default 2025-06-02 07:49:22 -07:00
oobabooga
7278548cd1
Simplify the one-click installer (#7039) 2025-06-02 09:57:55 -03:00
oobabooga
bb409c926e
Update only the last message during streaming + add back dynamic UI update speed (#7038) 2025-06-02 09:50:17 -03:00
oobabooga
45c9ae312c Use the flash-attention wheels in https://github.com/kingbri1/flash-attention 2025-06-01 22:17:22 -07:00
oobabooga
2db7745cbd Show llama.cpp prompt processing on one line instead of many lines 2025-06-01 22:12:24 -07:00
oobabooga
ad6d0218ae Fix after 219f0a7731 2025-06-01 19:27:14 -07:00
oobabooga
92adceb7b5 UI: Fix the model downloader progress bar 2025-06-01 19:22:21 -07:00
oobabooga
7a81beb0c1 Turn long pasted text into an attachment automatically 2025-06-01 18:26:14 -07:00
oobabooga
bf42b2c3a1 Fix thinking blocks sometimes showing a white outline 2025-06-01 11:02:04 -07:00
oobabooga
83849336d8 Improve how Show controls looks in the hover menu 2025-06-01 10:58:49 -07:00
oobabooga
3e3746283c Improve the typing dots position 2025-06-01 10:55:31 -07:00
oobabooga
88ff3e6ad8 CSS fixes after 98a7508a99 2025-06-01 08:04:35 -07:00
oobabooga
9e80193008 Add the model name to each message's metadata 2025-05-31 22:41:35 -07:00
oobabooga
0816ecedb7 Lint 2025-05-31 22:25:09 -07:00
oobabooga
98a7508a99 UI: Move 'Show controls' inside the hover menu 2025-05-31 22:22:13 -07:00
oobabooga
85f2f01a3a UI: Fix extra gaps on the right sidebar 2025-05-31 21:29:57 -07:00
oobabooga
f8d220c1e6 Add a tooltip to the web search checkbox 2025-05-31 21:22:36 -07:00
oobabooga
4a2727b71d Add a tooltip to the file upload button 2025-05-31 20:24:31 -07:00
oobabooga
1d88456659 Add support for .docx attachments 2025-05-31 20:15:07 -07:00
oobabooga
dc8ed6dbe7 Bump exllamav3 to 0.0.3 2025-05-31 14:27:33 -07:00
oobabooga
c55d3c61c6 Bump exllamav2 to 0.3.1 2025-05-31 14:21:42 -07:00
oobabooga
15f466ca3f Update README 2025-05-30 15:49:57 -07:00
oobabooga
219f0a7731 Fix exllamav3_hf models failing to unload (closes #7031) 2025-05-30 12:05:49 -07:00
oobabooga
298d4719c6 Multiple small style improvements 2025-05-30 11:32:24 -07:00
oobabooga
7c29879e79 Fix 'Start reply with' (closes #7033) 2025-05-30 11:17:47 -07:00
oobabooga
1f3b1a1b94 Simplify things 2025-05-28 12:18:17 -07:00
oobabooga
d702a2a962 Lint 2025-05-28 11:51:05 -07:00
oobabooga
9d7894a13f Organize 2025-05-28 10:10:26 -07:00
oobabooga
c6d0de8538 Better image positioning in prompts 2025-05-28 09:28:20 -07:00
oobabooga
c1a47a0b60 Better request header 2025-05-28 09:17:02 -07:00
oobabooga
2e21b1f5e3 Integrate with the API 2025-05-28 09:14:26 -07:00
oobabooga
f92e1f44a0 Add multimodal support (llama.cpp) 2025-05-28 05:52:07 -07:00
39 changed files with 990 additions and 525 deletions

View file

@ -14,18 +14,18 @@ Its goal is to become the [AUTOMATIC1111/stable-diffusion-webui](https://github.
  - Supports multiple text generation backends in one UI/API, including [llama.cpp](https://github.com/ggerganov/llama.cpp), [Transformers](https://github.com/huggingface/transformers), [ExLlamaV3](https://github.com/turboderp-org/exllamav3), [ExLlamaV2](https://github.com/turboderp-org/exllamav2), and [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM) (the latter via its own [Dockerfile](https://github.com/oobabooga/text-generation-webui/blob/main/docker/TensorRT-LLM/Dockerfile)).
  - Easy setup: Choose between **portable builds** (zero setup, just unzip and run) for GGUF models on Windows/Linux/macOS, or the one-click installer that creates a self-contained `installer_files` directory.
- - **File attachments**: Upload text files and PDF documents directly in conversations to talk about their contents.
- - **Web search**: Optionally search the internet with LLM-generated queries based on your input to add context to the conversation.
- - Advanced chat management: Edit messages, navigate between message versions, and branch conversations at any point.
+ - 100% offline and private, with zero telemetry, external resources, or remote update requests.
  - Automatic prompt formatting using Jinja2 templates. You don't need to ever worry about prompt formats.
- - Automatic GPU layers for GGUF models (on NVIDIA GPUs).
- - UI that resembles the original ChatGPT style.
- - Three chat modes: `instruct`, `chat-instruct`, and `chat`, with automatic prompt templates in `chat-instruct`.
- - Free-form text generation in the Default/Notebook tabs without being limited to chat turns. You can send formatted conversations from the Chat tab to these.
+ - **File attachments**: Upload text files, PDF documents, and .docx documents to talk about their contents.
+ - **Web search**: Optionally search the internet with LLM-generated queries to add context to the conversation.
+ - Aesthetic UI with dark and light themes.
+ - `instruct` mode for instruction-following (like ChatGPT), and `chat-instruct`/`chat` modes for talking to custom characters.
+ - Edit messages, navigate between message versions, and branch conversations at any point.
  - Multiple sampling parameters and generation options for sophisticated text generation control.
- - Switch between different models easily in the UI without restarting, with fine control over settings.
+ - Switch between different models in the UI without restarting.
+ - Automatic GPU layers for GGUF models (on NVIDIA GPUs).
+ - Free-form text generation in the Default/Notebook tabs without being limited to chat turns.
  - OpenAI-compatible API with Chat and Completions endpoints, including tool-calling support see [examples](https://github.com/oobabooga/text-generation-webui/wiki/12-%E2%80%90-OpenAI-API#examples).
- - 100% offline and private, with zero telemetry, external resources, or remote update requests. Web search is optional and user-controlled.
  - Extension support, with numerous built-in and user-contributed extensions available. See the [wiki](https://github.com/oobabooga/text-generation-webui/wiki/07-%E2%80%90-Extensions) and [extensions directory](https://github.com/oobabooga/text-generation-webui-extensions) for details.

  ## How to install
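
The feature list above mentions the OpenAI-compatible API with Chat and Completions endpoints. As a rough illustration of what that compatibility means in practice, the sketch below sends a plain chat request to a locally running instance; the address, port, and the need to start the server with the API enabled are assumptions based on the project's documentation, so check the linked wiki page for the authoritative details.

```python
# Minimal sketch: a plain chat completion against a local text-generation-webui
# instance with the API enabled. The endpoint address is an assumption; adjust
# it to your own setup.
import requests

url = "http://127.0.0.1:5000/v1/chat/completions"  # assumed default API address
payload = {
    "messages": [
        {"role": "user", "content": "Give me a one-sentence summary of what a GGUF file is."}
    ],
    "max_tokens": 200,
}

response = requests.post(url, json=payload, timeout=60)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```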

View file

@ -17,6 +17,14 @@
      color: #d1d5db !important;
  }

+ .chat .message-body :is(th, td) {
+     border-color: #40404096 !important;
+ }
+
+ .dark .chat .message-body :is(th, td) {
+     border-color: #ffffff75 !important;
+ }
+
  .chat .message-body :is(p, ul, ol) {
      margin: 1.25em 0 !important;
  }

View file

@ -582,7 +582,6 @@ div.svelte-362y77>*, div.svelte-362y77>.form>* {
  #chat-input {
      padding: 0;
-     padding-top: 18px;
      background: transparent;
      border: none;
  }
@ -661,37 +660,12 @@ div.svelte-362y77>*, div.svelte-362y77>.form>* {
      }
  }

- #show-controls {
-     position: absolute;
-     background-color: transparent;
-     border: 0 !important;
-     border-radius: 0;
- }
-
- #show-controls label {
-     z-index: 1000;
-     position: absolute;
-     right: 30px;
-     top: 10px;
-     white-space: nowrap;
-     overflow: hidden;
-     text-overflow: ellipsis;
- }
-
- .dark #show-controls span {
-     color: var(--neutral-400);
- }
-
- #show-controls span {
-     color: var(--neutral-600);
- }
-
  #typing-container {
      display: none;
      position: absolute;
      background-color: transparent;
      left: -2px;
-     top: 4px;
+     top: -5px;
      padding: var(--block-padding);
  }
@ -785,6 +759,33 @@ div.svelte-362y77>*, div.svelte-362y77>.form>* {
      background: var(--selected-item-color-dark) !important;
  }

+ #show-controls {
+     height: 36px;
+     border-top: 1px solid var(--border-color-dark) !important;
+     border-left: 1px solid var(--border-color-dark) !important;
+     border-right: 1px solid var(--border-color-dark) !important;
+     border-radius: 0;
+     border-bottom: 0 !important;
+     background-color: var(--darker-gray);
+     padding-top: 3px;
+     padding-left: 4px;
+     display: flex;
+ }
+
+ #show-controls label {
+     display: flex;
+     flex-direction: row-reverse;
+     font-weight: bold;
+     justify-content: start;
+     width: 100%;
+     padding-right: 12px;
+     gap: 10px;
+ }
+
+ #show-controls label input {
+     margin-top: 4px;
+ }
+
  .transparent-substring {
      opacity: 0.333;
  }
@ -1326,6 +1327,10 @@ div.svelte-362y77>*, div.svelte-362y77>.form>* {
      overflow: hidden;
  }

+ .thinking-content:focus, .thinking-header:focus {
+     outline: 0 !important;
+ }
+
  .dark .thinking-block {
      background-color: var(--darker-gray);
  }
@ -1551,3 +1556,25 @@ strong {
      color: var(--body-text-color-subdued);
      margin-top: 4px;
  }
+
+ .image-attachment {
+     flex-direction: column;
+ }
+
+ .image-preview {
+     border-radius: 16px;
+     margin-bottom: 5px;
+     object-fit: cover;
+     object-position: center;
+     border: 2px solid var(--border-color-primary);
+     aspect-ratio: 1 / 1;
+ }
+
+ button:focus {
+     outline: none;
+ }
+
+ /* Fix extra gaps for hidden elements on the right sidebar */
+ .svelte-sa48pu.stretch:has(> .hidden:only-child) {
+     display: none;
+ }

View file

@ -32,6 +32,7 @@ class ModelDownloader:
          self.max_retries = max_retries
          self.session = self.get_session()
          self._progress_bar_slots = None
+         self.progress_queue = None

      def get_session(self):
          session = requests.Session()
@ -218,33 +219,45 @@ class ModelDownloader:
          max_retries = self.max_retries
          attempt = 0
+         file_downloaded_count_for_progress = 0

          try:
              while attempt < max_retries:
                  attempt += 1
                  session = self.session
                  headers = {}
                  mode = 'wb'
+                 current_file_size_on_disk = 0

                  try:
                      if output_path.exists() and not start_from_scratch:
-                         # Resume download
-                         r = session.get(url, stream=True, timeout=20)
-                         total_size = int(r.headers.get('content-length', 0))
-                         if output_path.stat().st_size >= total_size:
+                         current_file_size_on_disk = output_path.stat().st_size
+                         r_head = session.head(url, timeout=20)
+                         r_head.raise_for_status()
+                         total_size = int(r_head.headers.get('content-length', 0))
+                         if current_file_size_on_disk >= total_size and total_size > 0:
+                             if self.progress_queue is not None and total_size > 0:
+                                 self.progress_queue.put((1.0, str(filename)))
                              return

-                         headers = {'Range': f'bytes={output_path.stat().st_size}-'}
+                         headers = {'Range': f'bytes={current_file_size_on_disk}-'}
                          mode = 'ab'

                      with session.get(url, stream=True, headers=headers, timeout=30) as r:
-                         r.raise_for_status()  # If status is not 2xx, raise an error
-                         total_size = int(r.headers.get('content-length', 0))
-                         block_size = 1024 * 1024  # 1MB
-
-                         filename_str = str(filename)  # Convert PosixPath to string if necessary
+                         r.raise_for_status()
+                         total_size_from_stream = int(r.headers.get('content-length', 0))
+                         if mode == 'ab':
+                             effective_total_size = current_file_size_on_disk + total_size_from_stream
+                         else:
+                             effective_total_size = total_size_from_stream
+
+                         block_size = 1024 * 1024
+                         filename_str = str(filename)

                          tqdm_kwargs = {
-                             'total': total_size,
+                             'total': effective_total_size,
+                             'initial': current_file_size_on_disk if mode == 'ab' else 0,
                              'unit': 'B',
                              'unit_scale': True,
                              'unit_divisor': 1024,
@ -261,16 +274,20 @@ class ModelDownloader:
                          })

                          with open(output_path, mode) as f:
+                             if mode == 'ab':
+                                 f.seek(current_file_size_on_disk)
+
                              with tqdm.tqdm(**tqdm_kwargs) as t:
-                                 count = 0
+                                 file_downloaded_count_for_progress = current_file_size_on_disk
                                  for data in r.iter_content(block_size):
                                      f.write(data)
                                      t.update(len(data))
-                                     if total_size != 0 and self.progress_bar is not None:
-                                         count += len(data)
-                                         self.progress_bar(float(count) / float(total_size), f"{filename_str}")
+                                     if effective_total_size != 0 and self.progress_queue is not None:
+                                         file_downloaded_count_for_progress += len(data)
+                                         progress_fraction = float(file_downloaded_count_for_progress) / float(effective_total_size)
+                                         self.progress_queue.put((progress_fraction, filename_str))

-                     break  # Exit loop if successful
+                     break

                  except (RequestException, ConnectionError, Timeout) as e:
                      print(f"Error downloading {filename}: {e}.")
                      print(f"That was attempt {attempt}/{max_retries}.", end=' ')
@ -295,10 +312,9 @@ class ModelDownloader:
          finally:
              print(f"\nDownload of {len(file_list)} files to {output_folder} completed.")

-     def download_model_files(self, model, branch, links, sha256, output_folder, progress_bar=None, start_from_scratch=False, threads=4, specific_file=None, is_llamacpp=False):
-         self.progress_bar = progress_bar
+     def download_model_files(self, model, branch, links, sha256, output_folder, progress_queue=None, start_from_scratch=False, threads=4, specific_file=None, is_llamacpp=False):
+         self.progress_queue = progress_queue

-         # Create the folder and writing the metadata
          output_folder.mkdir(parents=True, exist_ok=True)

          if not is_llamacpp:
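
The diff above replaces the old callback-style progress bar with a queue and resumes partial downloads through an HTTP `Range` request. The standalone sketch below shows the same resume-plus-progress-queue pattern in isolation; the function and variable names are illustrative examples, not the repository's API.

```python
# Illustrative sketch of the resume + progress-queue pattern used in the diff
# above. Names such as resume_download and progress_queue are assumptions for
# the example, not the project's own interface.
import queue
from pathlib import Path

import requests


def resume_download(url: str, output_path: Path, progress_queue: "queue.Queue | None" = None):
    headers = {}
    mode = "wb"
    already_have = 0

    if output_path.exists():
        already_have = output_path.stat().st_size
        total = int(requests.head(url, timeout=20).headers.get("content-length", 0))
        if total and already_have >= total:
            return  # file is already complete
        headers["Range"] = f"bytes={already_have}-"  # ask the server only for the missing tail
        mode = "ab"

    with requests.get(url, stream=True, headers=headers, timeout=30) as r:
        r.raise_for_status()
        total = already_have + int(r.headers.get("content-length", 0))
        done = already_have
        with open(output_path, mode) as f:
            for chunk in r.iter_content(1024 * 1024):
                f.write(chunk)
                done += len(chunk)
                if progress_queue is not None and total:
                    # A UI thread can drain this queue to drive a progress bar
                    progress_queue.put((done / total, output_path.name))
```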

View file

@ -1,8 +1,10 @@
+ import base64
  import copy
  import json
  import time
  from collections import deque

+ import requests
  import tiktoken
  from pydantic import ValidationError
@ -16,6 +18,7 @@ from modules.chat import (
      load_character_memoized,
      load_instruction_template_memoized
  )
+ from modules.logging_colors import logger
  from modules.presets import load_preset_memoized
  from modules.text_generation import decode, encode, generate_reply
@ -82,6 +85,50 @@ def process_parameters(body, is_legacy=False):
      return generate_params


+ def process_image_url(url, image_id):
+     """Process an image URL and return attachment data for llama.cpp"""
+     try:
+         if url.startswith("data:"):
+             if "base64," in url:
+                 image_data = url.split("base64,", 1)[1]
+             else:
+                 raise ValueError("Unsupported data URL format")
+         else:
+             headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'}
+             response = requests.get(url, timeout=10, headers=headers)
+             response.raise_for_status()
+             image_data = base64.b64encode(response.content).decode('utf-8')
+
+         return {"image_data": image_data, "image_id": image_id}
+     except Exception as e:
+         logger.error(f"Error processing image URL {url}: {e}")
+         return None
+
+
+ def process_multimodal_content(content):
+     """Extract text and images from OpenAI multimodal format"""
+     if isinstance(content, str):
+         return content, []
+
+     if isinstance(content, list):
+         text_content = ""
+         images = []
+
+         for item in content:
+             if item.get("type") == "text":
+                 text_content += item.get("text", "")
+             elif item.get("type") == "image_url":
+                 image_url = item.get("image_url", {}).get("url", "")
+                 if image_url:
+                     image = process_image_url(image_url, len(images) + 1)
+                     if image:
+                         images.append(image)
+
+         return text_content, images
+
+     return str(content), []
+
+
  def convert_history(history):
      '''
      Chat histories in this program are in the format [message, reply].
@ -93,19 +140,29 @@ def convert_history(history):
      user_input = ""
      user_input_last = True
      system_message = ""
+     all_images = []  # Simple list to collect all images

      for entry in history:
          content = entry["content"]
          role = entry["role"]

          if role == "user":
-             user_input = content
+             # Process multimodal content
+             processed_content, images = process_multimodal_content(content)
+             if images:
+                 image_refs = "".join("<__media__>" for img in images)
+                 processed_content = f"{processed_content} {image_refs}"
+
+             user_input = processed_content
              user_input_last = True
+             all_images.extend(images)  # Add any images to our collection
              if current_message:
                  chat_dialogue.append([current_message, '', ''])
                  current_message = ""

-             current_message = content
+             current_message = processed_content
          elif role == "assistant":
              if "tool_calls" in entry and isinstance(entry["tool_calls"], list) and len(entry["tool_calls"]) > 0 and content.strip() == "":
                  continue  # skip tool calls
@ -126,7 +183,11 @@ def convert_history(history):
      if not user_input_last:
          user_input = ""

-     return user_input, system_message, {'internal': chat_dialogue, 'visible': copy.deepcopy(chat_dialogue)}
+     return user_input, system_message, {
+         'internal': chat_dialogue,
+         'visible': copy.deepcopy(chat_dialogue),
+         'images': all_images  # Simple list of all images from the conversation
+     }


  def chat_completions_common(body: dict, is_legacy: bool = False, stream=False, prompt_only=False) -> dict:
@ -150,9 +211,23 @@ def chat_completions_common(body: dict, is_legacy: bool = False, stream=False, p
          elif m['role'] == 'function':
              raise InvalidRequestError(message="role: function is not supported.", param='messages')

-         if 'content' not in m and "image_url" not in m:
+         # Handle multimodal content validation
+         content = m.get('content')
+         if content is None:
              raise InvalidRequestError(message="messages: missing content", param='messages')

+         # Validate multimodal content structure
+         if isinstance(content, list):
+             for item in content:
+                 if not isinstance(item, dict) or 'type' not in item:
+                     raise InvalidRequestError(message="messages: invalid content item format", param='messages')
+                 if item['type'] not in ['text', 'image_url']:
+                     raise InvalidRequestError(message="messages: unsupported content type", param='messages')
+                 if item['type'] == 'text' and 'text' not in item:
+                     raise InvalidRequestError(message="messages: missing text in content item", param='messages')
+                 if item['type'] == 'image_url' and ('image_url' not in item or 'url' not in item['image_url']):
+                     raise InvalidRequestError(message="messages: missing image_url in content item", param='messages')
+
      # Chat Completions
      object_type = 'chat.completion' if not stream else 'chat.completion.chunk'
      created_time = int(time.time())
@ -205,6 +280,10 @@ def chat_completions_common(body: dict, is_legacy: bool = False, stream=False, p
          'stream': stream
      })

+     # Add images to state for llama.cpp multimodal support
+     if history.get('images'):
+         generate_params['image_attachments'] = history['images']
+
      max_tokens = generate_params['max_new_tokens']
      if max_tokens in [None, 0]:
          generate_params['max_new_tokens'] = 512
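
The new `process_multimodal_content` path accepts the OpenAI-style list content mixing `text` and `image_url` items, including base64 data URLs, and the validation added above checks that structure. Below is a hedged client-side sketch of such a request; the local endpoint address is an assumption about a typical setup, while the message structure follows the format validated in the diff.

```python
# Sketch of a multimodal request in the OpenAI-style format the diff validates:
# list content mixing "text" and "image_url" items (here a base64 data URL).
# The endpoint address is an assumption; adjust it to your own setup.
import base64

import requests

with open("photo.png", "rb") as f:
    b64 = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }
    ],
    "max_tokens": 300,
}

r = requests.post("http://127.0.0.1:5000/v1/chat/completions", json=payload, timeout=120)
r.raise_for_status()
print(r.json()["choices"][0]["message"]["content"])
```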

View file

@ -95,6 +95,12 @@ function startEditing(messageElement, messageBody, isUserMessage) {
      editingInterface.textarea.focus();
      editingInterface.textarea.setSelectionRange(rawText.length, rawText.length);

+     // Scroll the textarea into view
+     editingInterface.textarea.scrollIntoView({
+         behavior: "smooth",
+         block: "center"
+     });
+
      // Setup event handlers
      setupEditingHandlers(editingInterface.textarea, messageElement, originalHTML, messageBody, isUserMessage);
  }
@ -229,10 +235,23 @@ function removeLastClick() {
      document.getElementById("Remove-last").click();
  }

- function handleMorphdomUpdate(text) {
+ function handleMorphdomUpdate(data) {
+     // Determine target element and use it as query scope
+     var target_element, target_html;
+
+     if (data.last_message_only) {
+         const childNodes = document.getElementsByClassName("messages")[0].childNodes;
+         target_element = childNodes[childNodes.length - 1];
+         target_html = data.html;
+     } else {
+         target_element = document.getElementById("chat").parentNode;
+         target_html = "<div class=\"prose svelte-1ybaih5\">" + data.html + "</div>";
+     }
+
+     const queryScope = target_element;
+
      // Track open blocks
      const openBlocks = new Set();
-     document.querySelectorAll(".thinking-block").forEach(block => {
+     queryScope.querySelectorAll(".thinking-block").forEach(block => {
          const blockId = block.getAttribute("data-block-id");
          // If block exists and is open, add to open set
          if (blockId && block.hasAttribute("open")) {
@ -242,7 +261,7 @@ function handleMorphdomUpdate(text) {
      // Store scroll positions for any open blocks
      const scrollPositions = {};
-     document.querySelectorAll(".thinking-block[open]").forEach(block => {
+     queryScope.querySelectorAll(".thinking-block[open]").forEach(block => {
          const content = block.querySelector(".thinking-content");
          const blockId = block.getAttribute("data-block-id");
          if (content && blockId) {
@ -255,8 +274,8 @@ function handleMorphdomUpdate(text) {
      });

      morphdom(
-         document.getElementById("chat").parentNode,
-         "<div class=\"prose svelte-1ybaih5\">" + text + "</div>",
+         target_element,
+         target_html,
          {
              onBeforeElUpdated: function(fromEl, toEl) {
                  // Preserve code highlighting
@ -307,7 +326,7 @@ function handleMorphdomUpdate(text) {
      );

      // Add toggle listeners for new blocks
-     document.querySelectorAll(".thinking-block").forEach(block => {
+     queryScope.querySelectorAll(".thinking-block").forEach(block => {
          if (!block._hasToggleListener) {
              block.addEventListener("toggle", function(e) {
                  if (this.open) {

View file

@ -184,7 +184,7 @@ const observer = new MutationObserver(function(mutations) {
      const prevSibling = lastChild?.previousElementSibling;

      if (lastChild && prevSibling) {
          lastChild.style.setProperty("margin-bottom",
-             `max(0px, calc(max(70vh, 100vh - ${prevSibling.offsetHeight}px - 102px) - ${lastChild.offsetHeight}px))`,
+             `max(0px, calc(max(70vh, 100vh - ${prevSibling.offsetHeight}px - 84px) - ${lastChild.offsetHeight}px))`,
              "important"
          );
      }
@ -217,7 +217,7 @@ function isElementVisibleOnScreen(element) {
  }

  function doSyntaxHighlighting() {
-     const messageBodies = document.querySelectorAll(".message-body");
+     const messageBodies = document.getElementById("chat").querySelectorAll(".message-body");

      if (messageBodies.length > 0) {
          observer.disconnect();
@ -229,6 +229,7 @@ function doSyntaxHighlighting() {
              codeBlocks.forEach((codeBlock) => {
                  hljs.highlightElement(codeBlock);
                  codeBlock.setAttribute("data-highlighted", "true");
+                 codeBlock.classList.add("pretty_scrollbar");
              });

              renderMathInElement(messageBody, {
@ -277,7 +278,7 @@ for (i = 0; i < slimDropdownElements.length; i++) {
  // The show/hide events were adapted from:
  // https://github.com/SillyTavern/SillyTavern/blob/6c8bd06308c69d51e2eb174541792a870a83d2d6/public/script.js
  //------------------------------------------------
- var buttonsInChat = document.querySelectorAll("#chat-tab #chat-buttons button");
+ var buttonsInChat = document.querySelectorAll("#chat-tab #chat-buttons button, #chat-tab #chat-buttons #show-controls");
  var button = document.getElementById("hover-element-button");
  var menu = document.getElementById("hover-menu");
  var istouchscreen = (navigator.maxTouchPoints > 0) || "ontouchstart" in document.documentElement;
@ -298,18 +299,21 @@ if (buttonsInChat.length > 0) {
      const thisButton = buttonsInChat[i];
      menu.appendChild(thisButton);

-     thisButton.addEventListener("click", () => {
-         hideMenu();
-     });
+     // Only apply transformations to button elements
+     if (thisButton.tagName.toLowerCase() === "button") {
+         thisButton.addEventListener("click", () => {
+             hideMenu();
+         });

          const buttonText = thisButton.textContent;
          const matches = buttonText.match(/(\(.*?\))/);

          if (matches && matches.length > 1) {
              // Apply the transparent-substring class to the matched substring
              const substring = matches[1];
              const newText = buttonText.replace(substring, `&nbsp;<span class="transparent-substring">${substring.slice(1, -1)}</span>`);
              thisButton.innerHTML = newText;
          }
+     }
  }
  }
  }
@ -382,21 +386,10 @@ document.addEventListener("click", function (event) {
      }
  });

- //------------------------------------------------
- // Relocate the "Show controls" checkbox
- //------------------------------------------------
- var elementToMove = document.getElementById("show-controls");
- var parent = elementToMove.parentNode;
- for (var i = 0; i < 2; i++) {
-     parent = parent.parentNode;
- }
-
- parent.insertBefore(elementToMove, parent.firstChild);
-
  //------------------------------------------------
  // Position the chat input
  //------------------------------------------------
- document.getElementById("show-controls").parentNode.classList.add("chat-input-positioned");
+ document.getElementById("chat-input-row").classList.add("chat-input-positioned");

  //------------------------------------------------
  // Focus on the chat input
@ -872,3 +865,53 @@ function navigateLastAssistantMessage(direction) {
      return false;
  }
+
+ //------------------------------------------------
+ // Paste Handler for Long Text
+ //------------------------------------------------
+ const MAX_PLAIN_TEXT_LENGTH = 2500;
+
+ function setupPasteHandler() {
+     const textbox = document.querySelector("#chat-input textarea[data-testid=\"textbox\"]");
+     const fileInput = document.querySelector("#chat-input input[data-testid=\"file-upload\"]");
+
+     if (!textbox || !fileInput) {
+         setTimeout(setupPasteHandler, 500);
+         return;
+     }
+
+     textbox.addEventListener("paste", async (event) => {
+         const text = event.clipboardData?.getData("text");
+
+         if (text && text.length > MAX_PLAIN_TEXT_LENGTH) {
+             event.preventDefault();
+
+             const file = new File([text], "pasted_text.txt", {
+                 type: "text/plain",
+                 lastModified: Date.now()
+             });
+
+             const dataTransfer = new DataTransfer();
+             dataTransfer.items.add(file);
+             fileInput.files = dataTransfer.files;
+             fileInput.dispatchEvent(new Event("change", { bubbles: true }));
+         }
+     });
+ }
+
+ if (document.readyState === "loading") {
+     document.addEventListener("DOMContentLoaded", setupPasteHandler);
+ } else {
+     setupPasteHandler();
+ }
+
+ //------------------------------------------------
+ // Tooltips
+ //------------------------------------------------
+
+ // File upload button
+ document.querySelector("#chat-input .upload-button").title = "Upload text files, PDFs, and DOCX documents";
+
+ // Activate web search
+ document.getElementById("web-search").title = "Search the internet with DuckDuckGo";

View file

@ -220,13 +220,22 @@ def generate_chat_prompt(user_input, state, **kwargs):
          # Add attachment content if present
          if user_key in metadata and "attachments" in metadata[user_key]:
              attachments_text = ""
-             for attachment in metadata[user_key]["attachments"]:
-                 filename = attachment.get("name", "file")
-                 content = attachment.get("content", "")
-                 attachments_text += f"\nName: {filename}\nContents:\n\n=====\n{content}\n=====\n\n"
+             image_refs = ""
+
+             for attachment in metadata[user_key]["attachments"]:
+                 if attachment.get("type") == "image":
+                     # Add image reference for multimodal models
+                     image_refs += "<__media__>"
+                 else:
+                     # Handle text/PDF attachments as before
+                     filename = attachment.get("name", "file")
+                     content = attachment.get("content", "")
+                     attachments_text += f"\nName: {filename}\nContents:\n\n=====\n{content}\n=====\n\n"

-             if attachments_text:
-                 enhanced_user_msg = f"{user_msg}\n\nATTACHMENTS:\n{attachments_text}"
+             if image_refs or attachments_text:
+                 enhanced_user_msg = f"{user_msg} {image_refs}"
+                 if attachments_text:
+                     enhanced_user_msg += f"\n\nATTACHMENTS:\n{attachments_text}"

              messages.insert(insert_pos, {"role": "user", "content": enhanced_user_msg})
@ -240,22 +249,29 @@ def generate_chat_prompt(user_input, state, **kwargs):
      has_attachments = user_key in metadata and "attachments" in metadata[user_key]

      if (user_input or has_attachments) and not impersonate and not _continue:
-         # For the current user input being processed, check if we need to add attachments
-         if not impersonate and not _continue and len(history_data.get('metadata', {})) > 0:
-             current_row_idx = len(history)
-             user_key = f"user_{current_row_idx}"
+         current_row_idx = len(history)
+         user_key = f"user_{current_row_idx}"

-             if user_key in metadata and "attachments" in metadata[user_key]:
-                 attachments_text = ""
-                 for attachment in metadata[user_key]["attachments"]:
+         enhanced_user_input = user_input
+
+         if user_key in metadata and "attachments" in metadata[user_key]:
+             attachments_text = ""
+             image_refs = ""
+
+             for attachment in metadata[user_key]["attachments"]:
+                 if attachment.get("type") == "image":
+                     image_refs += "<__media__>"
+                 else:
                      filename = attachment.get("name", "file")
                      content = attachment.get("content", "")
                      attachments_text += f"\nName: {filename}\nContents:\n\n=====\n{content}\n=====\n\n"

+             if image_refs or attachments_text:
+                 enhanced_user_input = f"{user_input} {image_refs}"
                  if attachments_text:
-                     user_input = f"{user_input}\n\nATTACHMENTS:\n{attachments_text}"
+                     enhanced_user_input += f"\n\nATTACHMENTS:\n{attachments_text}"

-         messages.append({"role": "user", "content": user_input})
+         messages.append({"role": "user", "content": enhanced_user_input})
def make_prompt(messages): def make_prompt(messages):
if state['mode'] == 'chat-instruct' and _continue: if state['mode'] == 'chat-instruct' and _continue:
@ -495,26 +511,63 @@ def add_message_attachment(history, row_idx, file_path, is_user=True):
      file_extension = path.suffix.lower()

      try:
-         # Handle different file types
-         if file_extension == '.pdf':
+         # Handle image files
+         if file_extension in ['.jpg', '.jpeg', '.png', '.webp', '.bmp', '.gif']:
+             # Convert image to base64
+             with open(path, 'rb') as f:
+                 image_data = base64.b64encode(f.read()).decode('utf-8')
+
+             # Determine MIME type from extension
+             mime_type_map = {
+                 '.jpg': 'image/jpeg',
+                 '.jpeg': 'image/jpeg',
+                 '.png': 'image/png',
+                 '.webp': 'image/webp',
+                 '.bmp': 'image/bmp',
+                 '.gif': 'image/gif'
+             }
+             mime_type = mime_type_map.get(file_extension, 'image/jpeg')
+
+             # Format as data URL
+             data_url = f"data:{mime_type};base64,{image_data}"
+
+             # Generate unique image ID
+             image_id = len([att for att in history['metadata'][key]["attachments"] if att.get("type") == "image"]) + 1
+
+             attachment = {
+                 "name": filename,
+                 "type": "image",
+                 "image_data": data_url,
+                 "image_id": image_id,
+                 "file_path": str(path)  # For UI preview
+             }
+         elif file_extension == '.pdf':
              # Process PDF file
              content = extract_pdf_text(path)
-             file_type = "application/pdf"
+             attachment = {
+                 "name": filename,
+                 "type": "application/pdf",
+                 "content": content,
+             }
+         elif file_extension == '.docx':
+             content = extract_docx_text(path)
+             attachment = {
+                 "name": filename,
+                 "type": "application/docx",
+                 "content": content,
+             }
          else:
              # Default handling for text files
              with open(path, 'r', encoding='utf-8') as f:
                  content = f.read()
-             file_type = "text/plain"
-
-         # Add attachment
-         attachment = {
-             "name": filename,
-             "type": file_type,
-             "content": content,
-         }
+             attachment = {
+                 "name": filename,
+                 "type": "text/plain",
+                 "content": content,
+             }

          history['metadata'][key]["attachments"].append(attachment)
-         return content  # Return the content for reuse
+         return attachment  # Return the attachment for reuse

      except Exception as e:
          logger.error(f"Error processing attachment {filename}: {e}")
          return None
@ -538,6 +591,53 @@ def extract_pdf_text(pdf_path):
          return f"[Error extracting PDF text: {str(e)}]"


+ def extract_docx_text(docx_path):
+     """
+     Extract text from a .docx file, including headers,
+     body (paragraphs and tables), and footers.
+     """
+     try:
+         import docx
+
+         doc = docx.Document(docx_path)
+         parts = []
+
+         # 1) Extract non-empty header paragraphs from each section
+         for section in doc.sections:
+             for para in section.header.paragraphs:
+                 text = para.text.strip()
+                 if text:
+                     parts.append(text)
+
+         # 2) Extract body blocks (paragraphs and tables) in document order
+         parent_elm = doc.element.body
+         for child in parent_elm.iterchildren():
+             if isinstance(child, docx.oxml.text.paragraph.CT_P):
+                 para = docx.text.paragraph.Paragraph(child, doc)
+                 text = para.text.strip()
+                 if text:
+                     parts.append(text)
+             elif isinstance(child, docx.oxml.table.CT_Tbl):
+                 table = docx.table.Table(child, doc)
+                 for row in table.rows:
+                     cells = [cell.text.strip() for cell in row.cells]
+                     parts.append("\t".join(cells))
+
+         # 3) Extract non-empty footer paragraphs from each section
+         for section in doc.sections:
+             for para in section.footer.paragraphs:
+                 text = para.text.strip()
+                 if text:
+                     parts.append(text)
+
+         return "\n".join(parts)
+
+     except Exception as e:
+         logger.error(f"Error extracting text from DOCX: {e}")
+         return f"[Error extracting DOCX text: {str(e)}]"
+
+
  def generate_search_query(user_message, state):
      """Generate a search query from user message using the LLM"""
      # Augment the user message with search instruction
@ -554,7 +654,12 @@ def generate_search_query(user_message, state):
      query = ""
      for reply in generate_reply(formatted_prompt, search_state, stopping_strings=[], is_chat=True):
-         query = reply.strip()
+         query = reply

+     # Strip and remove surrounding quotes if present
+     query = query.strip()
+     if len(query) >= 2 and query.startswith('"') and query.endswith('"'):
+         query = query[1:-1]

      return query
@ -590,6 +695,19 @@ def chatbot_wrapper(text, state, regenerate=False, _continue=False, loading_mess
          for file_path in files:
              add_message_attachment(output, row_idx, file_path, is_user=True)

+         # Collect image attachments for llama.cpp
+         image_attachments = []
+         if 'metadata' in output:
+             user_key = f"user_{row_idx}"
+             if user_key in output['metadata'] and "attachments" in output['metadata'][user_key]:
+                 for attachment in output['metadata'][user_key]["attachments"]:
+                     if attachment.get("type") == "image":
+                         image_attachments.append(attachment)
+
+         # Add image attachments to state for the generation
+         if image_attachments:
+             state['image_attachments'] = image_attachments
+
          # Add web search results as attachments if enabled
          if state.get('enable_web_search', False):
              search_query = generate_search_query(text, state)
@ -660,7 +778,7 @@ def chatbot_wrapper(text, state, regenerate=False, _continue=False, loading_mess
      # Add timestamp for assistant's response at the start of generation
      row_idx = len(output['internal']) - 1
-     update_message_metadata(output['metadata'], "assistant", row_idx, timestamp=get_current_timestamp())
+     update_message_metadata(output['metadata'], "assistant", row_idx, timestamp=get_current_timestamp(), model_name=shared.model_name)

      # Generate
      reply = None
@ -775,7 +893,9 @@ def generate_chat_reply_wrapper(text, state, regenerate=False, _continue=False):
      last_save_time = time.monotonic()
      save_interval = 8
      for i, history in enumerate(generate_chat_reply(text, state, regenerate, _continue, loading_message=True, for_ui=True)):
-         yield chat_html_wrapper(history, state['name1'], state['name2'], state['mode'], state['chat_style'], state['character_menu']), history
+         yield chat_html_wrapper(history, state['name1'], state['name2'], state['mode'], state['chat_style'], state['character_menu'], last_message_only=(i > 0)), history
+         if i == 0:
+             time.sleep(0.125)  # We need this to make sure the first update goes through

          current_time = time.monotonic()
          # Save on first iteration or if save_interval seconds have passed
@ -806,9 +926,12 @@ def remove_last_message(history):
      return html.unescape(last[0]), history


- def send_dummy_message(textbox, state):
+ def send_dummy_message(text, state):
      history = state['history']
-     text = textbox['text']
+
+     # Handle both dict and string inputs
+     if isinstance(text, dict):
+         text = text['text']

      # Initialize metadata if not present
      if 'metadata' not in history:
@ -822,9 +945,12 @@ def send_dummy_message(textbox, state):
      return history


- def send_dummy_reply(textbox, state):
+ def send_dummy_reply(text, state):
      history = state['history']
-     text = textbox['text']
+
+     # Handle both dict and string inputs
+     if isinstance(text, dict):
+         text = text['text']

      # Initialize metadata if not present
      if 'metadata' not in history:
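
To make the prompt-side effect of these changes concrete, here is a small worked example of how a user message with one image and one text attachment ends up being formatted. It follows the string templates shown in the diff; the filenames and contents are made up for illustration.

```python
# Worked example of the enhanced user message built in the diff above.
# The attachment data here is invented; only the string template mirrors the diff.
attachments = [
    {"type": "image", "name": "cat.png"},
    {"type": "text/plain", "name": "notes.txt", "content": "Buy more cat food."},
]

user_msg = "Summarize my notes and describe the picture."
image_refs = ""
attachments_text = ""

for attachment in attachments:
    if attachment.get("type") == "image":
        image_refs += "<__media__>"  # placeholder later paired with the image data
    else:
        name = attachment.get("name", "file")
        content = attachment.get("content", "")
        attachments_text += f"\nName: {name}\nContents:\n\n=====\n{content}\n=====\n\n"

enhanced_user_msg = f"{user_msg} {image_refs}"
if attachments_text:
    enhanced_user_msg += f"\n\nATTACHMENTS:\n{attachments_text}"

print(enhanced_user_msg)
```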

View file

@ -245,3 +245,20 @@ class Exllamav3HF(PreTrainedModel, GenerationMixin):
          pretrained_model_name_or_path = Path(f'{shared.args.model_dir}') / Path(pretrained_model_name_or_path)
          return Exllamav3HF(pretrained_model_name_or_path)
+
+     def unload(self):
+         """Properly unload the ExllamaV3 model and free GPU memory."""
+         if hasattr(self, 'ex_model') and self.ex_model is not None:
+             self.ex_model.unload()
+             self.ex_model = None
+
+         if hasattr(self, 'ex_cache') and self.ex_cache is not None:
+             self.ex_cache = None
+
+         # Clean up any additional ExllamaV3 resources
+         if hasattr(self, 'past_seq'):
+             self.past_seq = None
+         if hasattr(self, 'past_seq_negative'):
+             self.past_seq_negative = None
+         if hasattr(self, 'ex_cache_negative'):
+             self.ex_cache_negative = None
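
The new `unload()` only drops Python references to the ExLlamaV3 objects. As a hedged sketch, a caller that wants to actually reclaim VRAM might follow it with explicit garbage collection; the gc and `torch.cuda.empty_cache()` steps below are assumptions about the surrounding unload path, not something shown in this diff.

```python
# Hedged sketch: combining the new unload() with explicit garbage collection to
# release VRAM. The gc/empty_cache steps are assumptions, not part of the diff.
import gc

import torch


def free_exllamav3_model(model):
    if hasattr(model, "unload"):
        model.unload()   # drop ex_model / ex_cache references as added in the diff
    del model            # remove the last Python reference
    gc.collect()         # let Python reclaim the wrapper objects
    if torch.cuda.is_available():
        torch.cuda.empty_cache()  # return freed blocks to the CUDA driver
```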

View file

@ -350,12 +350,14 @@ remove_button = f'<button class="footer-button footer-remove-button" title="Remo
  info_button = f'<button class="footer-button footer-info-button" title="message">{info_svg}</button>'


- def format_message_timestamp(history, role, index):
+ def format_message_timestamp(history, role, index, tooltip_include_timestamp=True):
      """Get a formatted timestamp HTML span for a message if available"""
      key = f"{role}_{index}"
      if 'metadata' in history and key in history['metadata'] and history['metadata'][key].get('timestamp'):
          timestamp = history['metadata'][key]['timestamp']
-         return f"<span class='timestamp'>{timestamp}</span>"
+         tooltip_text = get_message_tooltip(history, role, index, include_timestamp=tooltip_include_timestamp)
+         title_attr = f' title="{html.escape(tooltip_text)}"' if tooltip_text else ''
+         return f"<span class='timestamp'{title_attr}>{timestamp}</span>"

      return ""
@ -372,22 +374,50 @@ def format_message_attachments(history, role, index):
          for attachment in attachments:
              name = html.escape(attachment["name"])

-             # Make clickable if URL exists
-             if "url" in attachment:
-                 name = f'<a href="{html.escape(attachment["url"])}" target="_blank" rel="noopener noreferrer">{name}</a>'
+             if attachment.get("type") == "image":
+                 # Show image preview
+                 file_path = attachment.get("file_path", "")
+                 attachments_html += (
+                     f'<div class="attachment-box image-attachment">'
+                     f'<img src="file/{file_path}" alt="{name}" class="image-preview" />'
+                     f'<div class="attachment-name">{name}</div>'
+                     f'</div>'
+                 )
+             else:
+                 # Make clickable if URL exists (web search)
+                 if "url" in attachment:
+                     name = f'<a href="{html.escape(attachment["url"])}" target="_blank" rel="noopener noreferrer">{name}</a>'

-             attachments_html += (
-                 f'<div class="attachment-box">'
-                 f'<div class="attachment-icon">{attachment_svg}</div>'
-                 f'<div class="attachment-name">{name}</div>'
-                 f'</div>'
-             )
+                 attachments_html += (
+                     f'<div class="attachment-box">'
+                     f'<div class="attachment-icon">{attachment_svg}</div>'
+                     f'<div class="attachment-name">{name}</div>'
+                     f'</div>'
+                 )

          attachments_html += '</div>'
          return attachments_html

      return ""


+ def get_message_tooltip(history, role, index, include_timestamp=True):
+     """Get tooltip text combining timestamp and model name for a message"""
+     key = f"{role}_{index}"
+     if 'metadata' not in history or key not in history['metadata']:
+         return ""
+
+     meta = history['metadata'][key]
+     tooltip_parts = []
+
+     if include_timestamp and meta.get('timestamp'):
+         tooltip_parts.append(meta['timestamp'])
+
+     if meta.get('model_name'):
+         tooltip_parts.append(f"Model: {meta['model_name']}")
+
+     return " | ".join(tooltip_parts)
+
+
  def get_version_navigation_html(history, i, role):
      """Generate simple navigation arrows for message versions"""
      key = f"{role}_{i}"
@ -443,66 +473,69 @@ def actions_html(history, i, role, info_message=""):
f'{version_nav_html}') f'{version_nav_html}')
def generate_instruct_html(history): def generate_instruct_html(history, last_message_only=False):
output = f'<style>{instruct_css}</style><div class="chat" id="chat" data-mode="instruct"><div class="messages">' if not last_message_only:
output = f'<style>{instruct_css}</style><div class="chat" id="chat" data-mode="instruct"><div class="messages">'
else:
output = ""
for i in range(len(history['visible'])): def create_message(role, content, raw_content):
row_visible = history['visible'][i] """Inner function that captures variables from outer scope."""
row_internal = history['internal'][i] class_name = "user-message" if role == "user" else "assistant-message"
converted_visible = [convert_to_markdown_wrapped(entry, message_id=i, use_cache=i != len(history['visible']) - 1) for entry in row_visible]
# Get timestamps # Get role-specific data
user_timestamp = format_message_timestamp(history, "user", i) timestamp = format_message_timestamp(history, role, i)
assistant_timestamp = format_message_timestamp(history, "assistant", i) attachments = format_message_attachments(history, role, i)
# Get attachments # Create info button if timestamp exists
user_attachments = format_message_attachments(history, "user", i) info_message = ""
assistant_attachments = format_message_attachments(history, "assistant", i) if timestamp:
tooltip_text = get_message_tooltip(history, role, i)
info_message = info_button.replace('title="message"', f'title="{html.escape(tooltip_text)}"')
# Create info buttons for timestamps if they exist return (
info_message_user = "" f'<div class="{class_name}" '
if user_timestamp != "": f'data-raw="{html.escape(raw_content, quote=True)}"'
# Extract the timestamp value from the span
user_timestamp_value = user_timestamp.split('>', 1)[1].split('<', 1)[0]
info_message_user = info_button.replace("message", user_timestamp_value)
info_message_assistant = ""
if assistant_timestamp != "":
# Extract the timestamp value from the span
assistant_timestamp_value = assistant_timestamp.split('>', 1)[1].split('<', 1)[0]
info_message_assistant = info_button.replace("message", assistant_timestamp_value)
if converted_visible[0]: # Don't display empty user messages
output += (
f'<div class="user-message" '
f'data-raw="{html.escape(row_internal[0], quote=True)}"'
f'data-index={i}>'
f'<div class="text">'
f'<div class="message-body">{converted_visible[0]}</div>'
f'{user_attachments}'
f'{actions_html(history, i, "user", info_message_user)}'
f'</div>'
f'</div>'
)
output += (
f'<div class="assistant-message" '
f'data-raw="{html.escape(row_internal[1], quote=True)}"'
f'data-index={i}>' f'data-index={i}>'
f'<div class="text">' f'<div class="text">'
f'<div class="message-body">{converted_visible[1]}</div>' f'<div class="message-body">{content}</div>'
f'{assistant_attachments}' f'{attachments}'
f'{actions_html(history, i, "assistant", info_message_assistant)}' f'{actions_html(history, i, role, info_message)}'
f'</div>' f'</div>'
f'</div>' f'</div>'
) )
output += "</div></div>" # Determine range
start_idx = len(history['visible']) - 1 if last_message_only else 0
end_idx = len(history['visible'])
for i in range(start_idx, end_idx):
row_visible = history['visible'][i]
row_internal = history['internal'][i]
# Convert content
if last_message_only:
converted_visible = [None, convert_to_markdown_wrapped(row_visible[1], message_id=i, use_cache=i != len(history['visible']) - 1)]
else:
converted_visible = [convert_to_markdown_wrapped(entry, message_id=i, use_cache=i != len(history['visible']) - 1) for entry in row_visible]
# Generate messages
if not last_message_only and converted_visible[0]:
output += create_message("user", converted_visible[0], row_internal[0])
output += create_message("assistant", converted_visible[1], row_internal[1])
if not last_message_only:
output += "</div></div>"
return output return output
def generate_cai_chat_html(history, name1, name2, style, character, reset_cache=False): def generate_cai_chat_html(history, name1, name2, style, character, reset_cache=False, last_message_only=False):
output = f'<style>{chat_styles[style]}</style><div class="chat" id="chat"><div class="messages">' if not last_message_only:
output = f'<style>{chat_styles[style]}</style><div class="chat" id="chat"><div class="messages">'
else:
output = ""
# We use ?character and ?time.time() to force the browser to reset caches # We use ?character and ?time.time() to force the browser to reset caches
img_bot = ( img_bot = (
@ -510,112 +543,117 @@ def generate_cai_chat_html(history, name1, name2, style, character, reset_cache=
if Path("user_data/cache/pfp_character_thumb.png").exists() else '' if Path("user_data/cache/pfp_character_thumb.png").exists() else ''
) )
img_me = ( def create_message(role, content, raw_content):
f'<img src="file/user_data/cache/pfp_me.png?{time.time() if reset_cache else ""}">' """Inner function for CAI-style messages."""
if Path("user_data/cache/pfp_me.png").exists() else '' circle_class = "circle-you" if role == "user" else "circle-bot"
) name = name1 if role == "user" else name2
for i in range(len(history['visible'])): # Get role-specific data
row_visible = history['visible'][i] timestamp = format_message_timestamp(history, role, i, tooltip_include_timestamp=False)
row_internal = history['internal'][i] attachments = format_message_attachments(history, role, i)
converted_visible = [convert_to_markdown_wrapped(entry, message_id=i, use_cache=i != len(history['visible']) - 1) for entry in row_visible]
# Get timestamps # Get appropriate image
user_timestamp = format_message_timestamp(history, "user", i) if role == "user":
assistant_timestamp = format_message_timestamp(history, "assistant", i) img = (f'<img src="file/user_data/cache/pfp_me.png?{time.time() if reset_cache else ""}">'
if Path("user_data/cache/pfp_me.png").exists() else '')
else:
img = img_bot
# Get attachments return (
user_attachments = format_message_attachments(history, "user", i)
assistant_attachments = format_message_attachments(history, "assistant", i)
if converted_visible[0]: # Don't display empty user messages
output += (
f'<div class="message" '
f'data-raw="{html.escape(row_internal[0], quote=True)}"'
f'data-index={i}>'
f'<div class="circle-you">{img_me}</div>'
f'<div class="text">'
f'<div class="username">{name1}{user_timestamp}</div>'
f'<div class="message-body">{converted_visible[0]}</div>'
f'{user_attachments}'
f'{actions_html(history, i, "user")}'
f'</div>'
f'</div>'
)
output += (
f'<div class="message" ' f'<div class="message" '
f'data-raw="{html.escape(row_internal[1], quote=True)}"' f'data-raw="{html.escape(raw_content, quote=True)}"'
f'data-index={i}>' f'data-index={i}>'
f'<div class="circle-bot">{img_bot}</div>' f'<div class="{circle_class}">{img}</div>'
f'<div class="text">' f'<div class="text">'
f'<div class="username">{name2}{assistant_timestamp}</div>' f'<div class="username">{name}{timestamp}</div>'
f'<div class="message-body">{converted_visible[1]}</div>' f'<div class="message-body">{content}</div>'
f'{assistant_attachments}' f'{attachments}'
f'{actions_html(history, i, "assistant")}' f'{actions_html(history, i, role)}'
f'</div>' f'</div>'
f'</div>' f'</div>'
) )
output += "</div></div>" # Determine range
start_idx = len(history['visible']) - 1 if last_message_only else 0
end_idx = len(history['visible'])
for i in range(start_idx, end_idx):
row_visible = history['visible'][i]
row_internal = history['internal'][i]
# Convert content
if last_message_only:
converted_visible = [None, convert_to_markdown_wrapped(row_visible[1], message_id=i, use_cache=i != len(history['visible']) - 1)]
else:
converted_visible = [convert_to_markdown_wrapped(entry, message_id=i, use_cache=i != len(history['visible']) - 1) for entry in row_visible]
# Generate messages
if not last_message_only and converted_visible[0]:
output += create_message("user", converted_visible[0], row_internal[0])
output += create_message("assistant", converted_visible[1], row_internal[1])
if not last_message_only:
output += "</div></div>"
return output return output
-def generate_chat_html(history, name1, name2, reset_cache=False):
-    output = f'<style>{chat_styles["wpp"]}</style><div class="chat" id="chat"><div class="messages">'
-
-    for i in range(len(history['visible'])):
-        row_visible = history['visible'][i]
-        row_internal = history['internal'][i]
-        converted_visible = [convert_to_markdown_wrapped(entry, message_id=i, use_cache=i != len(history['visible']) - 1) for entry in row_visible]
-
-        # Get timestamps
-        user_timestamp = format_message_timestamp(history, "user", i)
-        assistant_timestamp = format_message_timestamp(history, "assistant", i)
-
-        # Get attachments
-        user_attachments = format_message_attachments(history, "user", i)
-        assistant_attachments = format_message_attachments(history, "assistant", i)
-
-        # Create info buttons for timestamps if they exist
-        info_message_user = ""
-        if user_timestamp != "":
-            # Extract the timestamp value from the span
-            user_timestamp_value = user_timestamp.split('>', 1)[1].split('<', 1)[0]
-            info_message_user = info_button.replace("message", user_timestamp_value)
-
-        info_message_assistant = ""
-        if assistant_timestamp != "":
-            # Extract the timestamp value from the span
-            assistant_timestamp_value = assistant_timestamp.split('>', 1)[1].split('<', 1)[0]
-            info_message_assistant = info_button.replace("message", assistant_timestamp_value)
-
-        if converted_visible[0]:  # Don't display empty user messages
-            output += (
-                f'<div class="message" '
-                f'data-raw="{html.escape(row_internal[0], quote=True)}"'
-                f'data-index={i}>'
-                f'<div class="text-you">'
-                f'<div class="message-body">{converted_visible[0]}</div>'
-                f'{user_attachments}'
-                f'{actions_html(history, i, "user", info_message_user)}'
-                f'</div>'
-                f'</div>'
-            )
-
-        output += (
-            f'<div class="message" '
-            f'data-raw="{html.escape(row_internal[1], quote=True)}"'
-            f'data-index={i}>'
-            f'<div class="text-bot">'
-            f'<div class="message-body">{converted_visible[1]}</div>'
-            f'{assistant_attachments}'
-            f'{actions_html(history, i, "assistant", info_message_assistant)}'
-            f'</div>'
-            f'</div>'
-        )
-
-    output += "</div></div>"
+def generate_chat_html(history, name1, name2, reset_cache=False, last_message_only=False):
+    if not last_message_only:
+        output = f'<style>{chat_styles["wpp"]}</style><div class="chat" id="chat"><div class="messages">'
+    else:
+        output = ""
+
+    def create_message(role, content, raw_content):
+        """Inner function for WPP-style messages."""
+        text_class = "text-you" if role == "user" else "text-bot"
+
+        # Get role-specific data
+        timestamp = format_message_timestamp(history, role, i)
+        attachments = format_message_attachments(history, role, i)
+
+        # Create info button if timestamp exists
+        info_message = ""
+        if timestamp:
+            tooltip_text = get_message_tooltip(history, role, i)
+            info_message = info_button.replace('title="message"', f'title="{html.escape(tooltip_text)}"')
+
+        return (
+            f'<div class="message" '
+            f'data-raw="{html.escape(raw_content, quote=True)}"'
+            f'data-index={i}>'
+            f'<div class="{text_class}">'
+            f'<div class="message-body">{content}</div>'
+            f'{attachments}'
+            f'{actions_html(history, i, role, info_message)}'
+            f'</div>'
+            f'</div>'
+        )
+
+    # Determine range
+    start_idx = len(history['visible']) - 1 if last_message_only else 0
+    end_idx = len(history['visible'])
+
+    for i in range(start_idx, end_idx):
+        row_visible = history['visible'][i]
+        row_internal = history['internal'][i]
+
+        # Convert content
+        if last_message_only:
+            converted_visible = [None, convert_to_markdown_wrapped(row_visible[1], message_id=i, use_cache=i != len(history['visible']) - 1)]
+        else:
+            converted_visible = [convert_to_markdown_wrapped(entry, message_id=i, use_cache=i != len(history['visible']) - 1) for entry in row_visible]
+
+        # Generate messages
+        if not last_message_only and converted_visible[0]:
+            output += create_message("user", converted_visible[0], row_internal[0])
+
+        output += create_message("assistant", converted_visible[1], row_internal[1])
+
+    if not last_message_only:
+        output += "</div></div>"

    return output
@@ -629,15 +667,15 @@ def time_greeting():
    return "Good evening!"

-def chat_html_wrapper(history, name1, name2, mode, style, character, reset_cache=False):
+def chat_html_wrapper(history, name1, name2, mode, style, character, reset_cache=False, last_message_only=False):
    if len(history['visible']) == 0:
        greeting = f"<div class=\"welcome-greeting\">{time_greeting()} How can I help you today?</div>"
        result = f'<div class="chat" id="chat">{greeting}</div>'
    elif mode == 'instruct':
-        result = generate_instruct_html(history)
+        result = generate_instruct_html(history, last_message_only=last_message_only)
    elif style == 'wpp':
-        result = generate_chat_html(history, name1, name2)
+        result = generate_chat_html(history, name1, name2, last_message_only=last_message_only)
    else:
-        result = generate_cai_chat_html(history, name1, name2, style, character, reset_cache)
+        result = generate_cai_chat_html(history, name1, name2, style, character, reset_cache=reset_cache, last_message_only=last_message_only)

-    return {'html': result}
+    return {'html': result, 'last_message_only': last_message_only}

View file

@@ -121,6 +121,18 @@ class LlamaServer:
            to_ban = [[int(token_id), False] for token_id in state['custom_token_bans'].split(',')]
            payload["logit_bias"] = to_ban

+        # Add image data if present
+        if 'image_attachments' in state:
+            medias = []
+            for attachment in state['image_attachments']:
+                medias.append({
+                    "type": "image",
+                    "data": attachment['image_data']
+                })
+
+            if medias:
+                payload["medias"] = medias
+
        return payload

    def generate_with_streaming(self, prompt, state):
@@ -142,7 +154,7 @@ class LlamaServer:
        if shared.args.verbose:
            logger.info("GENERATE_PARAMS=")
-            printable_payload = {k: v for k, v in payload.items() if k != "prompt"}
+            printable_payload = {k: v for k, v in payload.items() if k not in ["prompt", "image_data"]}
            pprint.PrettyPrinter(indent=4, sort_dicts=False).pprint(printable_payload)
            print()
@@ -409,14 +421,31 @@ class LlamaServer:
def filter_stderr_with_progress(process_stderr):
    progress_pattern = re.compile(r'slot update_slots: id.*progress = (\d+\.\d+)')
+    last_was_progress = False
+
    try:
        for line in iter(process_stderr.readline, ''):
+            line = line.rstrip('\n\r')  # Remove existing newlines
            progress_match = progress_pattern.search(line)
+
            if progress_match:
-                sys.stderr.write(line)
+                if last_was_progress:
+                    # Overwrite the previous progress line using carriage return
+                    sys.stderr.write(f'\r{line}')
+                else:
+                    # First progress line - print normally
+                    sys.stderr.write(line)
+
                sys.stderr.flush()
+                last_was_progress = True
            elif not line.startswith(('srv ', 'slot ')) and 'log_server_r: request: GET /health' not in line:
-                sys.stderr.write(line)
+                if last_was_progress:
+                    # Finish the progress line with a newline, then print the new line
+                    sys.stderr.write(f'\n{line}\n')
+                else:
+                    # Normal line - print with newline
+                    sys.stderr.write(f'{line}\n')
+
                sys.stderr.flush()
+                last_was_progress = False
+            # For filtered lines, don't change last_was_progress state

    except (ValueError, IOError):
        pass
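The change above uses the classic carriage-return trick to keep llama.cpp's prompt-processing progress on a single terminal line. A standalone illustration of the same idea (an assumed example, not the repo's code):

```python
import sys
import time


def report_progress(steps):
    """steps is a list of (message, is_progress) pairs."""
    last_was_progress = False
    for msg, is_progress in steps:
        if is_progress:
            # Consecutive progress lines overwrite each other via '\r'
            sys.stderr.write(('\r' if last_was_progress else '') + msg)
        else:
            # A normal line first terminates any pending progress line
            sys.stderr.write(('\n' if last_was_progress else '') + msg + '\n')
        sys.stderr.flush()
        last_was_progress = is_progress
        time.sleep(0.1)  # only so the overwrite is visible when run by hand
    if last_was_progress:
        sys.stderr.write('\n')


if __name__ == "__main__":
    report_progress([(f"progress = {p:.2f}", True) for p in (0.25, 0.5, 0.75, 1.0)]
                    + [("prompt processing done", False)])
```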

View file

@@ -116,10 +116,13 @@ def unload_model(keep_model_name=False):
        return

    is_llamacpp = (shared.model.__class__.__name__ == 'LlamaServer')
+    if shared.model.__class__.__name__ == 'Exllamav3HF':
+        shared.model.unload()
+
    shared.model = shared.tokenizer = None
    shared.lora_names = []
    shared.model_dirty_from_training = False

    if not is_llamacpp:
        from modules.torch_utils import clear_torch_cache
        clear_torch_cache()

View file

@@ -21,7 +21,7 @@ lora_names = []
# Generation variables
stop_everything = False
generation_lock = None
-processing_message = '*Is typing...*'
+processing_message = ''

# UI variables
gradio = {}
@@ -47,7 +47,6 @@ settings = {
    'max_new_tokens_max': 4096,
    'prompt_lookup_num_tokens': 0,
    'max_tokens_second': 0,
-    'max_updates_second': 12,
    'auto_max_new_tokens': True,
    'ban_eos_token': False,
    'add_bos_token': True,

View file

@@ -65,41 +65,39 @@ def _generate_reply(question, state, stopping_strings=None, is_chat=False, escap
            all_stop_strings += st

    shared.stop_everything = False
-    last_update = -1
    reply = ''
    is_stream = state['stream']
    if len(all_stop_strings) > 0 and not state['stream']:
        state = copy.deepcopy(state)
        state['stream'] = True

-    min_update_interval = 0
-    if state.get('max_updates_second', 0) > 0:
-        min_update_interval = 1 / state['max_updates_second']
-
    # Generate
+    last_update = -1
+    latency_threshold = 1 / 1000
    for reply in generate_func(question, original_question, state, stopping_strings, is_chat=is_chat):
+        cur_time = time.monotonic()
        reply, stop_found = apply_stopping_strings(reply, all_stop_strings)
        if escape_html:
            reply = html.escape(reply)

        if is_stream:
-            cur_time = time.time()
-
            # Limit number of tokens/second to make text readable in real time
            if state['max_tokens_second'] > 0:
                diff = 1 / state['max_tokens_second'] - (cur_time - last_update)
                if diff > 0:
                    time.sleep(diff)

-                last_update = time.time()
+                last_update = time.monotonic()
                yield reply

            # Limit updates to avoid lag in the Gradio UI
            # API updates are not limited
            else:
-                if cur_time - last_update > min_update_interval:
-                    last_update = cur_time
-
+                # If 'generate_func' takes less than 0.001 seconds to yield the next token
+                # (equivalent to more than 1000 tok/s), assume that the UI is lagging behind and skip yielding
+                if (cur_time - last_update) > latency_threshold:
                    yield reply
+                    last_update = time.monotonic()

        if stop_found or (state['max_tokens_second'] > 0 and shared.stop_everything):
            break
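The latency-threshold logic above can be read as a generic throttle: if the producer yields faster than the threshold, intermediate updates are skipped and only the latest text is emitted. A small illustrative sketch of that idea, not the project's code (`throttle_updates` is an assumed name):

```python
import time


def throttle_updates(token_stream, latency_threshold=1 / 1000):
    """Yield accumulated text only when at least `latency_threshold` seconds passed since the last emit."""
    last_update = -1.0
    text = ""
    for token in token_stream:
        cur_time = time.monotonic()
        text += token
        if (cur_time - last_update) > latency_threshold:
            yield text
            last_update = time.monotonic()
    yield text  # always emit the final state


if __name__ == "__main__":
    fast_stream = (f"tok{i} " for i in range(10000))
    updates = list(throttle_updates(fast_stream))
    print(f"{len(updates)} UI updates for 10000 tokens")
```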

View file

@@ -6,6 +6,7 @@ import yaml
import extensions
from modules import shared
+from modules.chat import load_history

with open(Path(__file__).resolve().parent / '../css/NotoSans/stylesheet.css', 'r') as f:
    css = f.read()
@@ -71,6 +72,7 @@ if not shared.args.old_colors:
        block_background_fill_dark='transparent',
        block_border_color_dark='transparent',
        input_border_color_dark='var(--border-color-dark)',
+        input_border_color_focus_dark='var(--border-color-dark)',
        checkbox_border_color_dark='var(--border-color-dark)',
        border_color_primary_dark='var(--border-color-dark)',
        button_secondary_border_color_dark='var(--border-color-dark)',
@@ -89,6 +91,8 @@ if not shared.args.old_colors:
        checkbox_label_shadow='none',
        block_shadow='none',
        block_shadow_dark='none',
+        input_shadow_focus='none',
+        input_shadow_focus_dark='none',
        button_large_radius='0.375rem',
        button_large_padding='6px 12px',
        input_radius='0.375rem',
@@ -191,7 +195,6 @@ def list_interface_input_elements():
        'max_new_tokens',
        'prompt_lookup_num_tokens',
        'max_tokens_second',
-        'max_updates_second',
        'do_sample',
        'dynamic_temperature',
        'temperature_last',
@@ -267,6 +270,10 @@ def gather_interface_values(*args):
    if not shared.args.multi_user:
        shared.persistent_interface_state = output

+    # Prevent history loss if backend is restarted but UI is not refreshed
+    if output['history'] is None and output['unique_id'] is not None:
+        output['history'] = load_history(output['unique_id'], output['character_menu'], output['mode'])
+
    return output
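The recovery added to `gather_interface_values` follows a simple pattern: if the in-memory state lost its history (for example after a backend restart) but the UI still knows which chat was open, reload it from disk instead of silently starting empty. A condensed sketch of that pattern with a hypothetical storage layout and helper names (not the project's actual ones):

```python
import json
from pathlib import Path


def load_history_from_disk(unique_id, base_dir="user_data/logs"):
    # Hypothetical layout: one JSON file per chat id
    path = Path(base_dir) / f"{unique_id}.json"
    if path.exists():
        return json.loads(path.read_text())
    return {'internal': [], 'visible': [], 'metadata': {}}


def recover_state(state):
    """Fill in `history` from disk if the server lost it but the UI kept the chat id."""
    if state.get('history') is None and state.get('unique_id') is not None:
        state['history'] = load_history_from_disk(state['unique_id'])
    return state


if __name__ == "__main__":
    print(recover_state({'history': None, 'unique_id': 'example-chat'}))
```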

View file

@@ -18,7 +18,7 @@ def create_ui():
    mu = shared.args.multi_user

    shared.gradio['Chat input'] = gr.State()
-    shared.gradio['history'] = gr.JSON(visible=False)
+    shared.gradio['history'] = gr.State({'internal': [], 'visible': [], 'metadata': {}})

    with gr.Tab('Chat', id='Chat', elem_id='chat-tab'):
        with gr.Row(elem_id='past-chats-row', elem_classes=['pretty_scrollbar']):
@@ -55,7 +55,6 @@ def create_ui():
                with gr.Column(scale=10, elem_id='chat-input-container'):
                    shared.gradio['textbox'] = gr.MultimodalTextbox(label='', placeholder='Send a message', file_types=['text', '.pdf'], file_count="multiple", elem_id='chat-input', elem_classes=['add_scrollbar'])
-                    shared.gradio['show_controls'] = gr.Checkbox(value=shared.settings['show_controls'], label='Show controls (Ctrl+S)', elem_id='show-controls')
                    shared.gradio['typing-dots'] = gr.HTML(value='<div class="typing"><span></span><span class="dot1"></span><span class="dot2"></span></div>', label='typing', elem_id='typing-container')

                with gr.Column(scale=1, elem_id='generate-stop-container'):
@@ -65,21 +64,15 @@ def create_ui():
        # Hover menu buttons
        with gr.Column(elem_id='chat-buttons'):
-            with gr.Row():
-                shared.gradio['Regenerate'] = gr.Button('Regenerate (Ctrl + Enter)', elem_id='Regenerate')
-                shared.gradio['Continue'] = gr.Button('Continue (Alt + Enter)', elem_id='Continue')
-                shared.gradio['Remove last'] = gr.Button('Remove last reply (Ctrl + Shift + Backspace)', elem_id='Remove-last')
-
-            with gr.Row():
-                shared.gradio['Impersonate'] = gr.Button('Impersonate (Ctrl + Shift + M)', elem_id='Impersonate')
-
-            with gr.Row():
-                shared.gradio['Send dummy message'] = gr.Button('Send dummy message')
-                shared.gradio['Send dummy reply'] = gr.Button('Send dummy reply')
-
-            with gr.Row():
-                shared.gradio['send-chat-to-default'] = gr.Button('Send to Default')
-                shared.gradio['send-chat-to-notebook'] = gr.Button('Send to Notebook')
+            shared.gradio['Regenerate'] = gr.Button('Regenerate (Ctrl + Enter)', elem_id='Regenerate')
+            shared.gradio['Continue'] = gr.Button('Continue (Alt + Enter)', elem_id='Continue')
+            shared.gradio['Remove last'] = gr.Button('Remove last reply (Ctrl + Shift + Backspace)', elem_id='Remove-last')
+            shared.gradio['Impersonate'] = gr.Button('Impersonate (Ctrl + Shift + M)', elem_id='Impersonate')
+            shared.gradio['Send dummy message'] = gr.Button('Send dummy message')
+            shared.gradio['Send dummy reply'] = gr.Button('Send dummy reply')
+            shared.gradio['send-chat-to-default'] = gr.Button('Send to Default')
+            shared.gradio['send-chat-to-notebook'] = gr.Button('Send to Notebook')
+            shared.gradio['show_controls'] = gr.Checkbox(value=shared.settings['show_controls'], label='Show controls (Ctrl+S)', elem_id='show-controls')

        with gr.Row(elem_id='chat-controls', elem_classes=['pretty_scrollbar']):
            with gr.Column():
@@ -87,7 +80,7 @@ def create_ui():
                shared.gradio['start_with'] = gr.Textbox(label='Start reply with', placeholder='Sure thing!', value=shared.settings['start_with'], elem_classes=['add_scrollbar'])

                with gr.Row():
-                    shared.gradio['enable_web_search'] = gr.Checkbox(value=shared.settings.get('enable_web_search', False), label='Activate web search')
+                    shared.gradio['enable_web_search'] = gr.Checkbox(value=shared.settings.get('enable_web_search', False), label='Activate web search', elem_id='web-search')

                with gr.Row(visible=shared.settings.get('enable_web_search', False)) as shared.gradio['web_search_row']:
                    shared.gradio['web_search_pages'] = gr.Number(value=shared.settings.get('web_search_pages', 3), precision=0, label='Number of pages to download', minimum=1, maximum=10)
@@ -202,7 +195,7 @@ def create_event_handlers():
    shared.reload_inputs = gradio(reload_arr)

    # Morph HTML updates instead of updating everything
-    shared.gradio['display'].change(None, gradio('display'), None, js="(data) => handleMorphdomUpdate(data.html)")
+    shared.gradio['display'].change(None, gradio('display'), None, js="(data) => handleMorphdomUpdate(data)")

    shared.gradio['Generate'].click(
        ui.gather_interface_values, gradio(shared.input_elements), gradio('interface_state')).then(

View file

@@ -1,4 +1,6 @@
import importlib
+import queue
+import threading
import traceback
from functools import partial
from pathlib import Path
@@ -205,48 +207,51 @@ def load_lora_wrapper(selected_loras):

def download_model_wrapper(repo_id, specific_file, progress=gr.Progress(), return_links=False, check=False):
+    downloader_module = importlib.import_module("download-model")
+    downloader = downloader_module.ModelDownloader()
+    update_queue = queue.Queue()
    try:
        # Handle direct GGUF URLs
        if repo_id.startswith("https://") and ("huggingface.co" in repo_id) and (repo_id.endswith(".gguf") or repo_id.endswith(".gguf?download=true")):
            try:
                path = repo_id.split("huggingface.co/")[1]
-
-                # Extract the repository ID (first two parts of the path)
                parts = path.split("/")
                if len(parts) >= 2:
                    extracted_repo_id = f"{parts[0]}/{parts[1]}"
-
-                    # Extract the filename (last part of the path)
-                    filename = repo_id.split("/")[-1]
-                    if "?download=true" in filename:
-                        filename = filename.replace("?download=true", "")
-
+                    filename = repo_id.split("/")[-1].replace("?download=true", "")
                    repo_id = extracted_repo_id
                    specific_file = filename
-            except:
-                pass
+            except Exception as e:
+                yield f"Error parsing GGUF URL: {e}"
+                progress(0.0)
+                return

-        if repo_id == "":
-            yield ("Please enter a model path")
+        if not repo_id:
+            yield "Please enter a model path."
+            progress(0.0)
            return

        repo_id = repo_id.strip()
        specific_file = specific_file.strip()
-        downloader = importlib.import_module("download-model").ModelDownloader()

-        progress(0.0)
+        progress(0.0, "Preparing download...")
        model, branch = downloader.sanitize_model_and_branch_names(repo_id, None)
+        yield "Getting download links from Hugging Face..."

-        yield ("Getting the download links from Hugging Face")
        links, sha256, is_lora, is_llamacpp = downloader.get_download_links_from_huggingface(model, branch, text_only=False, specific_file=specific_file)
+        if not links:
+            yield "No files found to download for the given model/criteria."
+            progress(0.0)
+            return

        # Check for multiple GGUF files
        gguf_files = [link for link in links if link.lower().endswith('.gguf')]
        if len(gguf_files) > 1 and not specific_file:
            output = "Multiple GGUF files found. Please copy one of the following filenames to the 'File name' field:\n\n```\n"
            for link in gguf_files:
                output += f"{Path(link).name}\n"

            output += "```"
            yield output
            return
@@ -255,17 +260,13 @@ def download_model_wrapper(repo_id, specific_file, progress=gr.Progress(), retur
            output = "```\n"
            for link in links:
                output += f"{Path(link).name}" + "\n"

            output += "```"
            yield output
            return

-        yield ("Getting the output folder")
+        yield "Determining output folder..."
        output_folder = downloader.get_output_folder(
-            model,
-            branch,
-            is_lora,
-            is_llamacpp=is_llamacpp,
+            model, branch, is_lora, is_llamacpp=is_llamacpp,
            model_dir=shared.args.model_dir if shared.args.model_dir != shared.args_defaults.model_dir else None
        )
@@ -275,19 +276,65 @@ def download_model_wrapper(repo_id, specific_file, progress=gr.Progress(), retur
            output_folder = Path(shared.args.lora_dir)

        if check:
-            progress(0.5)
-            yield ("Checking previously downloaded files")
+            yield "Checking previously downloaded files..."
+            progress(0.5, "Verifying files...")
            downloader.check_model_files(model, branch, links, sha256, output_folder)
-            progress(1.0)
-        else:
-            yield (f"Downloading file{'s' if len(links) > 1 else ''} to `{output_folder}/`")
-            downloader.download_model_files(model, branch, links, sha256, output_folder, progress_bar=progress, threads=4, is_llamacpp=is_llamacpp)
-            yield (f"Model successfully saved to `{output_folder}/`.")
-    except:
-        progress(1.0)
-        yield traceback.format_exc().replace('\n', '\n\n')
+            progress(1.0, "Verification complete.")
+            yield "File check complete."
+            return
+
+        yield ""
+        progress(0.0, "Download starting...")
+
+        def downloader_thread_target():
+            try:
+                downloader.download_model_files(
+                    model, branch, links, sha256, output_folder,
+                    progress_queue=update_queue,
+                    threads=4,
+                    is_llamacpp=is_llamacpp,
+                    specific_file=specific_file
+                )
+                update_queue.put(("COMPLETED", f"Model successfully saved to `{output_folder}/`."))
+            except Exception as e:
+                tb_str = traceback.format_exc().replace('\n', '\n\n')
+                update_queue.put(("ERROR", tb_str))
+
+        download_thread = threading.Thread(target=downloader_thread_target)
+        download_thread.start()
+
+        while True:
+            try:
+                message = update_queue.get(timeout=0.2)
+                if not isinstance(message, tuple) or len(message) != 2:
+                    continue
+
+                msg_identifier, data = message
+                if msg_identifier == "COMPLETED":
+                    progress(1.0, "Download complete!")
+                    yield data
+                    break
+                elif msg_identifier == "ERROR":
+                    progress(0.0, "Error occurred")
+                    yield data
+                    break
+                elif isinstance(msg_identifier, float):
+                    progress_value = msg_identifier
+                    description_str = data
+                    progress(progress_value, f"Downloading: {description_str}")
+            except queue.Empty:
+                if not download_thread.is_alive():
+                    yield "Download process finished."
+                    break
+
+        download_thread.join()
+    except Exception as e:
+        progress(0.0)
+        tb_str = traceback.format_exc().replace('\n', '\n\n')
+        yield tb_str


def update_truncation_length(current_length, state):
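The reworked downloader runs the blocking file transfer in a worker thread and reads progress tuples from a queue on the generator side. A generic, self-contained sketch of that pattern, assuming the worker reports `(fraction, text)` tuples (this is not the project's downloader API):

```python
import queue
import threading
import time


def worker(update_queue):
    try:
        for step in range(1, 6):
            time.sleep(0.1)  # stand-in for downloading a chunk
            update_queue.put((step / 5, f"part {step}/5"))
        update_queue.put(("COMPLETED", "all files saved"))
    except Exception as exc:
        update_queue.put(("ERROR", str(exc)))


def run_with_progress():
    update_queue = queue.Queue()
    thread = threading.Thread(target=worker, args=(update_queue,))
    thread.start()
    while True:
        try:
            tag, data = update_queue.get(timeout=0.2)
        except queue.Empty:
            if not thread.is_alive():
                break
            continue
        if tag in ("COMPLETED", "ERROR"):
            print(tag, data)
            break
        print(f"progress {tag:.0%}: {data}")
    thread.join()


if __name__ == "__main__":
    run_with_progress()
```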

View file

@@ -71,8 +71,6 @@ def create_ui(default_preset):
                shared.gradio['max_new_tokens'] = gr.Slider(minimum=shared.settings['max_new_tokens_min'], maximum=shared.settings['max_new_tokens_max'], value=shared.settings['max_new_tokens'], step=1, label='max_new_tokens', info='⚠️ Setting this too high can cause prompt truncation.')
                shared.gradio['prompt_lookup_num_tokens'] = gr.Slider(value=shared.settings['prompt_lookup_num_tokens'], minimum=0, maximum=10, step=1, label='prompt_lookup_num_tokens', info='Activates Prompt Lookup Decoding.')
                shared.gradio['max_tokens_second'] = gr.Slider(value=shared.settings['max_tokens_second'], minimum=0, maximum=20, step=1, label='Maximum tokens/second', info='To make text readable in real time.')
-                shared.gradio['max_updates_second'] = gr.Slider(value=shared.settings['max_updates_second'], minimum=0, maximum=24, step=1, label='Maximum UI updates/second', info='Set this if you experience lag in the UI during streaming.')

            with gr.Column():
                with gr.Row():
                    with gr.Column():

View file

@@ -70,12 +70,8 @@ def is_installed():
def cpu_has_avx2():
    try:
        import cpuinfo
        info = cpuinfo.get_cpu_info()
-        if 'avx2' in info['flags']:
-            return True
-        else:
-            return False
+        return 'avx2' in info['flags']
    except:
        return True
@@ -83,30 +79,112 @@ def cpu_has_avx2():
def cpu_has_amx():
    try:
        import cpuinfo
        info = cpuinfo.get_cpu_info()
-        if 'amx' in info['flags']:
-            return True
-        else:
-            return False
+        return 'amx' in info['flags']
    except:
        return True

-def torch_version():
-    site_packages_path = None
-    for sitedir in site.getsitepackages():
-        if "site-packages" in sitedir and conda_env_path in sitedir:
-            site_packages_path = sitedir
-            break
-
-    if site_packages_path:
-        torch_version_file = open(os.path.join(site_packages_path, 'torch', 'version.py')).read().splitlines()
-        torver = [line for line in torch_version_file if line.startswith('__version__')][0].split('__version__ = ')[1].strip("'")
-    else:
-        from torch import __version__ as torver
-
-    return torver
+def load_state():
+    """Load installer state from JSON file"""
+    if os.path.exists(state_file):
+        try:
+            with open(state_file, 'r') as f:
+                return json.load(f)
+        except:
+            return {}
+
+    return {}
+
+
+def save_state(state):
+    """Save installer state to JSON file"""
+    with open(state_file, 'w') as f:
+        json.dump(state, f)
+
+
+def get_gpu_choice():
+    """Get GPU choice from state file or ask user"""
+    state = load_state()
+    gpu_choice = state.get('gpu_choice')
+
+    if not gpu_choice:
+        if "GPU_CHOICE" in os.environ:
+            choice = os.environ["GPU_CHOICE"].upper()
+            print_big_message(f"Selected GPU choice \"{choice}\" based on the GPU_CHOICE environment variable.")
+        else:
+            choice = get_user_choice(
+                "What is your GPU?",
+                {
+                    'A': 'NVIDIA - CUDA 12.4',
+                    'B': 'AMD - Linux/macOS only, requires ROCm 6.2.4',
+                    'C': 'Apple M Series',
+                    'D': 'Intel Arc (beta)',
+                    'N': 'CPU mode'
+                },
+            )
+
+        # Convert choice to GPU name
+        gpu_choice = {"A": "NVIDIA", "B": "AMD", "C": "APPLE", "D": "INTEL", "N": "NONE"}[choice]
+
+        # Save choice to state
+        state['gpu_choice'] = gpu_choice
+        save_state(state)
+
+    return gpu_choice
+
+
+def get_pytorch_install_command(gpu_choice):
+    """Get PyTorch installation command based on GPU choice"""
+    base_cmd = f"python -m pip install torch=={TORCH_VERSION} torchvision=={TORCHVISION_VERSION} torchaudio=={TORCHAUDIO_VERSION} "
+
+    if gpu_choice == "NVIDIA":
+        return base_cmd + "--index-url https://download.pytorch.org/whl/cu124"
+    elif gpu_choice == "AMD":
+        return base_cmd + "--index-url https://download.pytorch.org/whl/rocm6.2.4"
+    elif gpu_choice in ["APPLE", "NONE"]:
+        return base_cmd + "--index-url https://download.pytorch.org/whl/cpu"
+    elif gpu_choice == "INTEL":
+        if is_linux():
+            return "python -m pip install torch==2.1.0a0 torchvision==0.16.0a0 torchaudio==2.1.0a0 intel-extension-for-pytorch==2.1.10+xpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/"
+        else:
+            return "python -m pip install torch==2.1.0a0 torchvision==0.16.0a0 torchaudio==2.1.0a0 intel-extension-for-pytorch==2.1.10 --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/"
+    else:
+        return base_cmd
+
+
+def get_pytorch_update_command(gpu_choice):
+    """Get PyTorch update command based on GPU choice"""
+    base_cmd = f"python -m pip install --upgrade torch=={TORCH_VERSION} torchvision=={TORCHVISION_VERSION} torchaudio=={TORCHAUDIO_VERSION}"
+
+    if gpu_choice == "NVIDIA":
+        return f"{base_cmd} --index-url https://download.pytorch.org/whl/cu124"
+    elif gpu_choice == "AMD":
+        return f"{base_cmd} --index-url https://download.pytorch.org/whl/rocm6.2.4"
+    elif gpu_choice in ["APPLE", "NONE"]:
+        return f"{base_cmd} --index-url https://download.pytorch.org/whl/cpu"
+    elif gpu_choice == "INTEL":
+        intel_extension = "intel-extension-for-pytorch==2.1.10+xpu" if is_linux() else "intel-extension-for-pytorch==2.1.10"
+        return f"{base_cmd} {intel_extension} --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/"
+    else:
+        return base_cmd
+
+
+def get_requirements_file(gpu_choice):
+    """Get requirements file path based on GPU choice"""
+    requirements_base = os.path.join("requirements", "full")
+
+    if gpu_choice == "AMD":
+        file_name = f"requirements_amd{'_noavx2' if not cpu_has_avx2() else ''}.txt"
+    elif gpu_choice == "APPLE":
+        file_name = f"requirements_apple_{'intel' if is_x86_64() else 'silicon'}.txt"
+    elif gpu_choice in ["INTEL", "NONE"]:
+        file_name = f"requirements_cpu_only{'_noavx2' if not cpu_has_avx2() else ''}.txt"
+    elif gpu_choice == "NVIDIA":
+        file_name = f"requirements{'_noavx2' if not cpu_has_avx2() else ''}.txt"
+    else:
+        raise ValueError(f"Unknown GPU choice: {gpu_choice}")
+
+    return os.path.join(requirements_base, file_name)


def get_current_commit():
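The new installer helpers boil down to a get-or-ask pattern backed by a small JSON state file. A condensed illustration, with a placeholder file name and prompt function rather than the installer's actual ones:

```python
import json
import os

STATE_FILE = "installer_state.json"  # hypothetical location


def load_state():
    if os.path.exists(STATE_FILE):
        try:
            with open(STATE_FILE, "r") as f:
                return json.load(f)
        except (json.JSONDecodeError, OSError):
            return {}
    return {}


def save_state(state):
    with open(STATE_FILE, "w") as f:
        json.dump(state, f)


def get_or_ask(key, ask):
    """Ask for a value once, cache it in the state file, reuse it on later runs."""
    state = load_state()
    if key not in state:
        state[key] = ask()
        save_state(state)
    return state[key]


if __name__ == "__main__":
    gpu = get_or_ask("gpu_choice", lambda: os.environ.get("GPU_CHOICE", "NVIDIA").upper())
    print(f"Using GPU choice: {gpu}")
```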
@@ -209,28 +287,8 @@ def get_user_choice(question, options_dict):
def update_pytorch_and_python():
    print_big_message("Checking for PyTorch updates.")
-
-    # Update the Python version. Left here for future reference in case this becomes necessary.
-    # print_big_message("Checking for PyTorch and Python updates.")
-    # current_python_version = f"{sys.version_info.major}.{sys.version_info.minor}"
-    # if current_python_version != PYTHON_VERSION:
-    #     run_cmd(f"conda install -y python={PYTHON_VERSION}", assert_success=True, environment=True)
-
-    torver = torch_version()
-    base_cmd = f"python -m pip install --upgrade torch=={TORCH_VERSION} torchvision=={TORCHVISION_VERSION} torchaudio=={TORCHAUDIO_VERSION}"
-    if "+cu" in torver:
-        install_cmd = f"{base_cmd} --index-url https://download.pytorch.org/whl/cu124"
-    elif "+rocm" in torver:
-        install_cmd = f"{base_cmd} --index-url https://download.pytorch.org/whl/rocm6.2.4"
-    elif "+cpu" in torver:
-        install_cmd = f"{base_cmd} --index-url https://download.pytorch.org/whl/cpu"
-    elif "+cxx11" in torver:
-        intel_extension = "intel-extension-for-pytorch==2.1.10+xpu" if is_linux() else "intel-extension-for-pytorch==2.1.10"
-        install_cmd = f"{base_cmd} {intel_extension} --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/"
-    else:
-        install_cmd = base_cmd
+    gpu_choice = get_gpu_choice()
+    install_cmd = get_pytorch_update_command(gpu_choice)

    run_cmd(install_cmd, assert_success=True, environment=True)
@@ -256,43 +314,11 @@ def install_webui():
    if os.path.isfile(state_file):
        os.remove(state_file)

-    # Ask the user for the GPU vendor
-    if "GPU_CHOICE" in os.environ:
-        choice = os.environ["GPU_CHOICE"].upper()
-        print_big_message(f"Selected GPU choice \"{choice}\" based on the GPU_CHOICE environment variable.")
-
-        # Warn about changed meanings and handle old choices
-        if choice == "B":
-            print_big_message("Warning: GPU_CHOICE='B' now means 'AMD' in the new version.")
-        elif choice == "C":
-            print_big_message("Warning: GPU_CHOICE='C' now means 'Apple M Series' in the new version.")
-        elif choice == "D":
-            print_big_message("Warning: GPU_CHOICE='D' now means 'Intel Arc' in the new version.")
-    else:
-        choice = get_user_choice(
-            "What is your GPU?",
-            {
-                'A': 'NVIDIA - CUDA 12.4',
-                'B': 'AMD - Linux/macOS only, requires ROCm 6.2.4',
-                'C': 'Apple M Series',
-                'D': 'Intel Arc (beta)',
-                'N': 'CPU mode'
-            },
-        )
-
-    # Convert choices to GPU names for compatibility
-    gpu_choice_to_name = {
-        "A": "NVIDIA",
-        "B": "AMD",
-        "C": "APPLE",
-        "D": "INTEL",
-        "N": "NONE"
-    }
-
-    selected_gpu = gpu_choice_to_name[choice]
+    # Get GPU choice and save it to state
+    gpu_choice = get_gpu_choice()

    # Write a flag to CMD_FLAGS.txt for CPU mode
-    if selected_gpu == "NONE":
+    if gpu_choice == "NONE":
        cmd_flags_path = os.path.join(script_dir, "user_data", "CMD_FLAGS.txt")
        with open(cmd_flags_path, 'r+') as cmd_flags_file:
            if "--cpu" not in cmd_flags_file.read():
@@ -300,34 +326,20 @@ def install_webui():
                cmd_flags_file.write("\n--cpu\n")

    # Handle CUDA version display
-    elif any((is_windows(), is_linux())) and selected_gpu == "NVIDIA":
+    elif any((is_windows(), is_linux())) and gpu_choice == "NVIDIA":
        print("CUDA: 12.4")

    # No PyTorch for AMD on Windows (?)
-    elif is_windows() and selected_gpu == "AMD":
+    elif is_windows() and gpu_choice == "AMD":
        print("PyTorch setup on Windows is not implemented yet. Exiting...")
        sys.exit(1)

-    # Find the Pytorch installation command
-    install_pytorch = f"python -m pip install torch=={TORCH_VERSION} torchvision=={TORCHVISION_VERSION} torchaudio=={TORCHAUDIO_VERSION} "
-
-    if selected_gpu == "NVIDIA":
-        install_pytorch += "--index-url https://download.pytorch.org/whl/cu124"
-    elif selected_gpu == "AMD":
-        install_pytorch += "--index-url https://download.pytorch.org/whl/rocm6.2.4"
-    elif selected_gpu in ["APPLE", "NONE"]:
-        install_pytorch += "--index-url https://download.pytorch.org/whl/cpu"
-    elif selected_gpu == "INTEL":
-        if is_linux():
-            install_pytorch = "python -m pip install torch==2.1.0a0 torchvision==0.16.0a0 torchaudio==2.1.0a0 intel-extension-for-pytorch==2.1.10+xpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/"
-        else:
-            install_pytorch = "python -m pip install torch==2.1.0a0 torchvision==0.16.0a0 torchaudio==2.1.0a0 intel-extension-for-pytorch==2.1.10 --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/"
-
    # Install Git and then Pytorch
    print_big_message("Installing PyTorch.")
+    install_pytorch = get_pytorch_install_command(gpu_choice)
    run_cmd(f"conda install -y ninja git && {install_pytorch} && python -m pip install py-cpuinfo==9.0.0", assert_success=True, environment=True)

-    if selected_gpu == "INTEL":
+    if gpu_choice == "INTEL":
        # Install oneAPI dependencies via conda
        print_big_message("Installing Intel oneAPI runtime libraries.")
        run_cmd("conda install -y -c https://software.repos.intel.com/python/conda/ -c conda-forge dpcpp-cpp-rt=2024.0 mkl-dpcpp=2024.0", environment=True)
@@ -349,31 +361,15 @@ def update_requirements(initial_installation=False, pull=True):
            assert_success=True
        )

-    torver = torch_version()
-    requirements_base = os.path.join("requirements", "full")
-
-    if "+rocm" in torver:
-        file_name = f"requirements_amd{'_noavx2' if not cpu_has_avx2() else ''}.txt"
-    elif "+cpu" in torver or "+cxx11" in torver:
-        file_name = f"requirements_cpu_only{'_noavx2' if not cpu_has_avx2() else ''}.txt"
-    elif is_macos():
-        file_name = f"requirements_apple_{'intel' if is_x86_64() else 'silicon'}.txt"
-    else:
-        file_name = f"requirements{'_noavx2' if not cpu_has_avx2() else ''}.txt"
-
-    requirements_file = os.path.join(requirements_base, file_name)
-
-    # Load state from JSON file
    current_commit = get_current_commit()
-    wheels_changed = False
-    if os.path.exists(state_file):
-        with open(state_file, 'r') as f:
-            last_state = json.load(f)
-
-        if 'wheels_changed' in last_state or last_state.get('last_installed_commit') != current_commit:
-            wheels_changed = True
-    else:
-        wheels_changed = True
+    wheels_changed = not os.path.exists(state_file)
+    if not wheels_changed:
+        state = load_state()
+        if 'wheels_changed' in state or state.get('last_installed_commit') != current_commit:
+            wheels_changed = True
+
+    gpu_choice = get_gpu_choice()
+    requirements_file = get_requirements_file(gpu_choice)

    if pull:
        # Read .whl lines before pulling
@@ -409,19 +405,17 @@ def update_requirements(initial_installation=False, pull=True):
                print_big_message(f"File '{file}' was updated during 'git pull'. Please run the script again.")

                # Save state before exiting
-                current_state = {}
+                state = load_state()
                if wheels_changed:
-                    current_state['wheels_changed'] = True
-
-                with open(state_file, 'w') as f:
-                    json.dump(current_state, f)
+                    state['wheels_changed'] = True
+                save_state(state)

                sys.exit(1)

    # Save current state
-    current_state = {'last_installed_commit': current_commit}
-    with open(state_file, 'w') as f:
-        json.dump(current_state, f)
+    state = load_state()
+    state['last_installed_commit'] = current_commit
+    state.pop('wheels_changed', None)  # Remove wheels_changed flag
+    save_state(state)

    if os.environ.get("INSTALL_EXTENSIONS", "").lower() in ("yes", "y", "true", "1", "t", "on"):
        install_extensions_requirements()
@@ -432,11 +426,10 @@ def update_requirements(initial_installation=False, pull=True):
    # Update PyTorch
    if not initial_installation:
        update_pytorch_and_python()
-        torver = torch_version()
        clean_outdated_pytorch_cuda_dependencies()

    print_big_message(f"Installing webui requirements from file: {requirements_file}")
-    print(f"TORCH: {torver}\n")
+    print(f"GPU Choice: {gpu_choice}\n")

    # Prepare the requirements file
    textgen_requirements = open(requirements_file).read().splitlines()

View file

@@ -16,6 +16,7 @@ Pillow>=9.5.0
psutil
pydantic==2.8.2
PyPDF2==3.0.1
+python-docx==1.1.2
pyyaml
requests
rich
@@ -33,12 +34,12 @@ sse-starlette==1.6.5
tiktoken

# CUDA wheels
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0+cu124-py3-none-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0+cu124-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
-https://github.com/oobabooga/exllamav3/releases/download/v0.0.1a9/exllamav3-0.0.1a9+cu124.torch2.6.0-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
-https://github.com/oobabooga/exllamav3/releases/download/v0.0.1a9/exllamav3-0.0.1a9+cu124.torch2.6.0-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
-https://github.com/turboderp-org/exllamav2/releases/download/v0.2.9/exllamav2-0.2.9+cu124.torch2.6.0-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
-https://github.com/turboderp-org/exllamav2/releases/download/v0.2.9/exllamav2-0.2.9+cu124.torch2.6.0-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
-https://github.com/turboderp-org/exllamav2/releases/download/v0.2.9/exllamav2-0.2.9-py3-none-any.whl; platform_system == "Linux" and platform_machine != "x86_64"
-https://github.com/oobabooga/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu124torch2.6.0cxx11abiFALSE-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.16.0/llama_cpp_binaries-0.16.0+cu124-py3-none-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.16.0/llama_cpp_binaries-0.16.0+cu124-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
+https://github.com/oobabooga/exllamav3/releases/download/v0.0.3/exllamav3-0.0.3+cu124.torch2.6.0-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
+https://github.com/oobabooga/exllamav3/releases/download/v0.0.3/exllamav3-0.0.3+cu124.torch2.6.0-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
+https://github.com/turboderp-org/exllamav2/releases/download/v0.3.1/exllamav2-0.3.1+cu124.torch2.6.0-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
+https://github.com/turboderp-org/exllamav2/releases/download/v0.3.1/exllamav2-0.3.1+cu124.torch2.6.0-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
+https://github.com/turboderp-org/exllamav2/releases/download/v0.3.1/exllamav2-0.3.1-py3-none-any.whl; platform_system == "Linux" and platform_machine != "x86_64"
+https://github.com/kingbri1/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu124torch2.6.0cxx11abiFALSE-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"

View file

@@ -15,6 +15,7 @@ Pillow>=9.5.0
psutil
pydantic==2.8.2
PyPDF2==3.0.1
+python-docx==1.1.2
pyyaml
requests
rich
@@ -32,7 +33,7 @@ sse-starlette==1.6.5
tiktoken

# AMD wheels
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0+vulkan-py3-none-win_amd64.whl; platform_system == "Windows"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0+vulkan-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
-https://github.com/turboderp-org/exllamav2/releases/download/v0.2.9/exllamav2-0.2.9+rocm6.2.4.torch2.6.0-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
-https://github.com/turboderp-org/exllamav2/releases/download/v0.2.9/exllamav2-0.2.9-py3-none-any.whl; platform_system != "Darwin" and platform_machine != "x86_64"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.16.0/llama_cpp_binaries-0.16.0+vulkan-py3-none-win_amd64.whl; platform_system == "Windows"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.16.0/llama_cpp_binaries-0.16.0+vulkan-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
+https://github.com/turboderp-org/exllamav2/releases/download/v0.3.1/exllamav2-0.3.1+rocm6.2.4.torch2.6.0-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
+https://github.com/turboderp-org/exllamav2/releases/download/v0.3.1/exllamav2-0.3.1-py3-none-any.whl; platform_system != "Darwin" and platform_machine != "x86_64"

View file

@@ -15,6 +15,7 @@ Pillow>=9.5.0
psutil
pydantic==2.8.2
PyPDF2==3.0.1
+python-docx==1.1.2
pyyaml
requests
rich
@@ -32,7 +33,7 @@ sse-starlette==1.6.5
tiktoken

# AMD wheels
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0+vulkanavx-py3-none-win_amd64.whl; platform_system == "Windows"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0+vulkanavx-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
-https://github.com/turboderp-org/exllamav2/releases/download/v0.2.9/exllamav2-0.2.9+rocm6.2.4.torch2.6.0-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
-https://github.com/turboderp-org/exllamav2/releases/download/v0.2.9/exllamav2-0.2.9-py3-none-any.whl; platform_system != "Darwin" and platform_machine != "x86_64"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.16.0/llama_cpp_binaries-0.16.0+vulkanavx-py3-none-win_amd64.whl; platform_system == "Windows"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.16.0/llama_cpp_binaries-0.16.0+vulkanavx-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
+https://github.com/turboderp-org/exllamav2/releases/download/v0.3.1/exllamav2-0.3.1+rocm6.2.4.torch2.6.0-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
+https://github.com/turboderp-org/exllamav2/releases/download/v0.3.1/exllamav2-0.3.1-py3-none-any.whl; platform_system != "Darwin" and platform_machine != "x86_64"

View file

@@ -15,6 +15,7 @@ Pillow>=9.5.0
psutil
pydantic==2.8.2
PyPDF2==3.0.1
+python-docx==1.1.2
pyyaml
requests
rich
@@ -32,7 +33,7 @@ sse-starlette==1.6.5
tiktoken

# Mac wheels
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0-py3-none-macosx_15_0_x86_64.whl; platform_system == "Darwin" and platform_release >= "24.0.0" and platform_release < "25.0.0" and python_version == "3.11"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0-py3-none-macosx_14_0_x86_64.whl; platform_system == "Darwin" and platform_release >= "23.0.0" and platform_release < "24.0.0" and python_version == "3.11"
-https://github.com/oobabooga/exllamav3/releases/download/v0.0.1a9/exllamav3-0.0.1a9-py3-none-any.whl
-https://github.com/turboderp-org/exllamav2/releases/download/v0.2.9/exllamav2-0.2.9-py3-none-any.whl
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.16.0/llama_cpp_binaries-0.16.0-py3-none-macosx_15_0_x86_64.whl; platform_system == "Darwin" and platform_release >= "24.0.0" and platform_release < "25.0.0" and python_version == "3.11"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.16.0/llama_cpp_binaries-0.16.0-py3-none-macosx_14_0_x86_64.whl; platform_system == "Darwin" and platform_release >= "23.0.0" and platform_release < "24.0.0" and python_version == "3.11"
+https://github.com/oobabooga/exllamav3/releases/download/v0.0.3/exllamav3-0.0.3-py3-none-any.whl
+https://github.com/turboderp-org/exllamav2/releases/download/v0.3.1/exllamav2-0.3.1-py3-none-any.whl

View file

@@ -15,6 +15,7 @@ Pillow>=9.5.0
psutil
pydantic==2.8.2
PyPDF2==3.0.1
+python-docx==1.1.2
pyyaml
requests
rich
@@ -32,8 +33,8 @@ sse-starlette==1.6.5
tiktoken

# Mac wheels
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0-py3-none-macosx_15_0_arm64.whl; platform_system == "Darwin" and platform_release >= "24.0.0" and platform_release < "25.0.0" and python_version == "3.11"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0-py3-none-macosx_14_0_arm64.whl; platform_system == "Darwin" and platform_release >= "23.0.0" and platform_release < "24.0.0" and python_version == "3.11"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0-py3-none-macosx_13_0_arm64.whl; platform_system == "Darwin" and platform_release >= "22.0.0" and platform_release < "23.0.0" and python_version == "3.11"
-https://github.com/oobabooga/exllamav3/releases/download/v0.0.1a9/exllamav3-0.0.1a9-py3-none-any.whl
-https://github.com/turboderp-org/exllamav2/releases/download/v0.2.9/exllamav2-0.2.9-py3-none-any.whl
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.16.0/llama_cpp_binaries-0.16.0-py3-none-macosx_15_0_arm64.whl; platform_system == "Darwin" and platform_release >= "24.0.0" and platform_release < "25.0.0" and python_version == "3.11"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.16.0/llama_cpp_binaries-0.16.0-py3-none-macosx_14_0_arm64.whl; platform_system == "Darwin" and platform_release >= "23.0.0" and platform_release < "24.0.0" and python_version == "3.11"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.16.0/llama_cpp_binaries-0.16.0-py3-none-macosx_13_0_arm64.whl; platform_system == "Darwin" and platform_release >= "22.0.0" and platform_release < "23.0.0" and python_version == "3.11"
+https://github.com/oobabooga/exllamav3/releases/download/v0.0.3/exllamav3-0.0.3-py3-none-any.whl
+https://github.com/turboderp-org/exllamav2/releases/download/v0.3.1/exllamav2-0.3.1-py3-none-any.whl

View file

@@ -15,6 +15,7 @@ Pillow>=9.5.0
psutil
pydantic==2.8.2
PyPDF2==3.0.1
+python-docx==1.1.2
pyyaml
requests
rich
@@ -32,5 +33,5 @@ sse-starlette==1.6.5
tiktoken

# llama.cpp (CPU only, AVX2)
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0+cpuavx2-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0+cpuavx2-py3-none-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.16.0/llama_cpp_binaries-0.16.0+cpuavx2-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.16.0/llama_cpp_binaries-0.16.0+cpuavx2-py3-none-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"

View file

@@ -15,6 +15,7 @@ Pillow>=9.5.0
psutil
pydantic==2.8.2
PyPDF2==3.0.1
+python-docx==1.1.2
pyyaml
requests
rich
@@ -32,5 +33,5 @@ sse-starlette==1.6.5
tiktoken

# llama.cpp (CPU only, no AVX2)
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0+cpuavx-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0+cpuavx-py3-none-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.16.0/llama_cpp_binaries-0.16.0+cpuavx-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.16.0/llama_cpp_binaries-0.16.0+cpuavx-py3-none-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"

View file

@@ -16,6 +16,7 @@ Pillow>=9.5.0
 psutil
 pydantic==2.8.2
 PyPDF2==3.0.1
+python-docx==1.1.2
 pyyaml
 requests
 rich
@@ -33,12 +34,12 @@ sse-starlette==1.6.5
 tiktoken
 # CUDA wheels
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0+cu124avx-py3-none-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.16.0/llama_cpp_binaries-0.16.0+cu124avx-py3-none-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0+cu124avx-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.16.0/llama_cpp_binaries-0.16.0+cu124avx-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
-https://github.com/oobabooga/exllamav3/releases/download/v0.0.1a9/exllamav3-0.0.1a9+cu124.torch2.6.0-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
+https://github.com/oobabooga/exllamav3/releases/download/v0.0.3/exllamav3-0.0.3+cu124.torch2.6.0-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
-https://github.com/oobabooga/exllamav3/releases/download/v0.0.1a9/exllamav3-0.0.1a9+cu124.torch2.6.0-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
+https://github.com/oobabooga/exllamav3/releases/download/v0.0.3/exllamav3-0.0.3+cu124.torch2.6.0-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
-https://github.com/turboderp-org/exllamav2/releases/download/v0.2.9/exllamav2-0.2.9+cu124.torch2.6.0-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
+https://github.com/turboderp-org/exllamav2/releases/download/v0.3.1/exllamav2-0.3.1+cu124.torch2.6.0-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
-https://github.com/turboderp-org/exllamav2/releases/download/v0.2.9/exllamav2-0.2.9+cu124.torch2.6.0-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
+https://github.com/turboderp-org/exllamav2/releases/download/v0.3.1/exllamav2-0.3.1+cu124.torch2.6.0-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
-https://github.com/turboderp-org/exllamav2/releases/download/v0.2.9/exllamav2-0.2.9-py3-none-any.whl; platform_system == "Linux" and platform_machine != "x86_64"
+https://github.com/turboderp-org/exllamav2/releases/download/v0.3.1/exllamav2-0.3.1-py3-none-any.whl; platform_system == "Linux" and platform_machine != "x86_64"
-https://github.com/oobabooga/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu124torch2.6.0cxx11abiFALSE-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
+https://github.com/kingbri1/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu124torch2.6.0cxx11abiFALSE-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
 https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
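
The only new dependency in these requirements files is python-docx==1.1.2, which provides .docx parsing. A minimal sketch of how text could be pulled out of a .docx attachment with that package is shown below; the file name is just a placeholder, not something taken from the diff.

from docx import Document  # provided by the python-docx package added above

doc = Document("attachment.docx")  # placeholder path
text = "\n".join(paragraph.text for paragraph in doc.paragraphs)
print(text)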

@@ -15,6 +15,7 @@ Pillow>=9.5.0
 psutil
 pydantic==2.8.2
 PyPDF2==3.0.1
+python-docx==1.1.2
 pyyaml
 requests
 rich

@@ -7,6 +7,7 @@ markdown
 numpy==1.26.*
 pydantic==2.8.2
 PyPDF2==3.0.1
+python-docx==1.1.2
 pyyaml
 requests
 rich
@@ -18,5 +19,5 @@ sse-starlette==1.6.5
 tiktoken
 # CUDA wheels
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0+cu124-py3-none-win_amd64.whl; platform_system == "Windows"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.16.0/llama_cpp_binaries-0.16.0+cu124-py3-none-win_amd64.whl; platform_system == "Windows"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0+cu124-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.16.0/llama_cpp_binaries-0.16.0+cu124-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"

@@ -7,6 +7,7 @@ markdown
 numpy==1.26.*
 pydantic==2.8.2
 PyPDF2==3.0.1
+python-docx==1.1.2
 pyyaml
 requests
 rich
@@ -18,5 +19,5 @@ sse-starlette==1.6.5
 tiktoken
 # Mac wheels
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0-py3-none-macosx_15_0_x86_64.whl; platform_system == "Darwin" and platform_release >= "24.0.0" and platform_release < "25.0.0"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.16.0/llama_cpp_binaries-0.16.0-py3-none-macosx_15_0_x86_64.whl; platform_system == "Darwin" and platform_release >= "24.0.0" and platform_release < "25.0.0"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0-py3-none-macosx_14_0_x86_64.whl; platform_system == "Darwin" and platform_release >= "23.0.0" and platform_release < "24.0.0"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.16.0/llama_cpp_binaries-0.16.0-py3-none-macosx_14_0_x86_64.whl; platform_system == "Darwin" and platform_release >= "23.0.0" and platform_release < "24.0.0"

@@ -7,6 +7,7 @@ markdown
 numpy==1.26.*
 pydantic==2.8.2
 PyPDF2==3.0.1
+python-docx==1.1.2
 pyyaml
 requests
 rich
@@ -18,6 +19,6 @@ sse-starlette==1.6.5
 tiktoken
 # Mac wheels
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0-py3-none-macosx_15_0_arm64.whl; platform_system == "Darwin" and platform_release >= "24.0.0" and platform_release < "25.0.0"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.16.0/llama_cpp_binaries-0.16.0-py3-none-macosx_15_0_arm64.whl; platform_system == "Darwin" and platform_release >= "24.0.0" and platform_release < "25.0.0"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0-py3-none-macosx_14_0_arm64.whl; platform_system == "Darwin" and platform_release >= "23.0.0" and platform_release < "24.0.0"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.16.0/llama_cpp_binaries-0.16.0-py3-none-macosx_14_0_arm64.whl; platform_system == "Darwin" and platform_release >= "23.0.0" and platform_release < "24.0.0"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0-py3-none-macosx_13_0_arm64.whl; platform_system == "Darwin" and platform_release >= "22.0.0" and platform_release < "23.0.0"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.16.0/llama_cpp_binaries-0.16.0-py3-none-macosx_13_0_arm64.whl; platform_system == "Darwin" and platform_release >= "22.0.0" and platform_release < "23.0.0"

@@ -7,6 +7,7 @@ markdown
 numpy==1.26.*
 pydantic==2.8.2
 PyPDF2==3.0.1
+python-docx==1.1.2
 pyyaml
 requests
 rich
@@ -18,5 +19,5 @@ sse-starlette==1.6.5
 tiktoken
 # llama.cpp (CPU only, AVX2)
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0+cpuavx2-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.16.0/llama_cpp_binaries-0.16.0+cpuavx2-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0+cpuavx2-py3-none-win_amd64.whl; platform_system == "Windows"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.16.0/llama_cpp_binaries-0.16.0+cpuavx2-py3-none-win_amd64.whl; platform_system == "Windows"

@@ -7,6 +7,7 @@ markdown
 numpy==1.26.*
 pydantic==2.8.2
 PyPDF2==3.0.1
+python-docx==1.1.2
 pyyaml
 requests
 rich
@@ -18,5 +19,5 @@ sse-starlette==1.6.5
 tiktoken
 # llama.cpp (CPU only, no AVX2)
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0+cpuavx-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.16.0/llama_cpp_binaries-0.16.0+cpuavx-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0+cpuavx-py3-none-win_amd64.whl; platform_system == "Windows"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.16.0/llama_cpp_binaries-0.16.0+cpuavx-py3-none-win_amd64.whl; platform_system == "Windows"

@@ -7,6 +7,7 @@ markdown
 numpy==1.26.*
 pydantic==2.8.2
 PyPDF2==3.0.1
+python-docx==1.1.2
 pyyaml
 requests
 rich
@@ -18,5 +19,5 @@ sse-starlette==1.6.5
 tiktoken
 # CUDA wheels
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0+cu124avx-py3-none-win_amd64.whl; platform_system == "Windows"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.16.0/llama_cpp_binaries-0.16.0+cu124avx-py3-none-win_amd64.whl; platform_system == "Windows"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0+cu124avx-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.16.0/llama_cpp_binaries-0.16.0+cu124avx-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"

@@ -7,6 +7,7 @@ markdown
 numpy==1.26.*
 pydantic==2.8.2
 PyPDF2==3.0.1
+python-docx==1.1.2
 pyyaml
 requests
 rich

@@ -7,6 +7,7 @@ markdown
 numpy==1.26.*
 pydantic==2.8.2
 PyPDF2==3.0.1
+python-docx==1.1.2
 pyyaml
 requests
 rich
@@ -18,5 +19,5 @@ sse-starlette==1.6.5
 tiktoken
 # CUDA wheels
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0+vulkan-py3-none-win_amd64.whl; platform_system == "Windows"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.16.0/llama_cpp_binaries-0.16.0+vulkan-py3-none-win_amd64.whl; platform_system == "Windows"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0+vulkan-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.16.0/llama_cpp_binaries-0.16.0+vulkan-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"

@@ -7,6 +7,7 @@ markdown
 numpy==1.26.*
 pydantic==2.8.2
 PyPDF2==3.0.1
+python-docx==1.1.2
 pyyaml
 requests
 rich
@@ -18,5 +19,5 @@ sse-starlette==1.6.5
 tiktoken
 # CUDA wheels
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0+vulkanavx-py3-none-win_amd64.whl; platform_system == "Windows"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.16.0/llama_cpp_binaries-0.16.0+vulkanavx-py3-none-win_amd64.whl; platform_system == "Windows"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0+vulkanavx-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.16.0/llama_cpp_binaries-0.16.0+vulkanavx-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"

@@ -60,6 +60,14 @@ from modules.utils import gradio
 def signal_handler(sig, frame):
     logger.info("Received Ctrl+C. Shutting down Text generation web UI gracefully.")
+    # Explicitly stop LlamaServer to avoid __del__ cleanup issues during shutdown
+    if shared.model and shared.model.__class__.__name__ == 'LlamaServer':
+        try:
+            shared.model.stop()
+        except:
+            pass
     sys.exit(0)
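
For context, a handler like the one above only takes effect once it is registered for SIGINT. A minimal standalone sketch of the same pattern follows; the FakeServer class and its stop() method are hypothetical stand-ins, not the project's actual wiring.

import signal
import sys

class FakeServer:
    """Hypothetical stand-in for a model backend with a stop() method."""
    def stop(self):
        print("stopping backend subprocess")

model = FakeServer()

def signal_handler(sig, frame):
    # Stop the backend explicitly instead of relying on __del__ at interpreter exit
    try:
        model.stop()
    except Exception:
        pass
    sys.exit(0)

signal.signal(signal.SIGINT, signal_handler)  # Ctrl+C now triggers the handler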

@@ -18,7 +18,6 @@ max_new_tokens_min: 1
 max_new_tokens_max: 4096
 prompt_lookup_num_tokens: 0
 max_tokens_second: 0
-max_updates_second: 12
 auto_max_new_tokens: true
 ban_eos_token: false
 add_bos_token: true
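
Since max_updates_second is removed from this template, code that reads the settings should treat the key as optional. A small sketch of loading such a YAML settings file with pyyaml; the file name here is an assumption for illustration, not taken from the diff.

import yaml

with open("settings-template.yaml") as f:  # assumed file name
    settings = yaml.safe_load(f)

# Fall back to a default since the key may no longer be present in the template
max_updates_second = settings.get("max_updates_second", 0)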