Mirror of https://github.com/oobabooga/text-generation-webui.git (synced 2025-06-07 06:06:20 -04:00)

Compare commits: ed6ecf1df4 ... ee25cefce2 (45 commits)

Commits in this range:
ee25cefce2, 0783f5c891, 7f7909be54, 7366ff5dfa, 27affa9db7, d47c8eb956, 977ec801b7, 3829507d0f, 3d676cd50f, 66a75c899a, 9bd7359ffa, 93b3752cdf, b38ec0ec38, b30a73016d, 7278548cd1, bb409c926e, 45c9ae312c, 2db7745cbd, ad6d0218ae, 92adceb7b5, 7a81beb0c1, bf42b2c3a1, 83849336d8, 3e3746283c, 88ff3e6ad8, 9e80193008, 0816ecedb7, 98a7508a99, 85f2f01a3a, f8d220c1e6, 4a2727b71d, 1d88456659, dc8ed6dbe7, c55d3c61c6, 15f466ca3f, 219f0a7731, 298d4719c6, 7c29879e79, 1f3b1a1b94, d702a2a962, 9d7894a13f, c6d0de8538, c1a47a0b60, 2e21b1f5e3, f92e1f44a0
39 changed files with 990 additions and 525 deletions
README.md (18 changed lines)

@@ -14,18 +14,18 @@ Its goal is to become the [AUTOMATIC1111/stable-diffusion-webui](https://github.
 
 - Supports multiple text generation backends in one UI/API, including [llama.cpp](https://github.com/ggerganov/llama.cpp), [Transformers](https://github.com/huggingface/transformers), [ExLlamaV3](https://github.com/turboderp-org/exllamav3), [ExLlamaV2](https://github.com/turboderp-org/exllamav2), and [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM) (the latter via its own [Dockerfile](https://github.com/oobabooga/text-generation-webui/blob/main/docker/TensorRT-LLM/Dockerfile)).
 - Easy setup: Choose between **portable builds** (zero setup, just unzip and run) for GGUF models on Windows/Linux/macOS, or the one-click installer that creates a self-contained `installer_files` directory.
-- **File attachments**: Upload text files and PDF documents directly in conversations to talk about their contents.
-- **Web search**: Optionally search the internet with LLM-generated queries based on your input to add context to the conversation.
-- Advanced chat management: Edit messages, navigate between message versions, and branch conversations at any point.
+- 100% offline and private, with zero telemetry, external resources, or remote update requests.
 - Automatic prompt formatting using Jinja2 templates. You don't need to ever worry about prompt formats.
-- Automatic GPU layers for GGUF models (on NVIDIA GPUs).
-- UI that resembles the original ChatGPT style.
-- Three chat modes: `instruct`, `chat-instruct`, and `chat`, with automatic prompt templates in `chat-instruct`.
-- Free-form text generation in the Default/Notebook tabs without being limited to chat turns. You can send formatted conversations from the Chat tab to these.
+- **File attachments**: Upload text files, PDF documents, and .docx documents to talk about their contents.
+- **Web search**: Optionally search the internet with LLM-generated queries to add context to the conversation.
+- Aesthetic UI with dark and light themes.
+- `instruct` mode for instruction-following (like ChatGPT), and `chat-instruct`/`chat` modes for talking to custom characters.
+- Edit messages, navigate between message versions, and branch conversations at any point.
 - Multiple sampling parameters and generation options for sophisticated text generation control.
-- Switch between different models easily in the UI without restarting, with fine control over settings.
+- Switch between different models in the UI without restarting.
+- Automatic GPU layers for GGUF models (on NVIDIA GPUs).
+- Free-form text generation in the Default/Notebook tabs without being limited to chat turns.
 - OpenAI-compatible API with Chat and Completions endpoints, including tool-calling support – see [examples](https://github.com/oobabooga/text-generation-webui/wiki/12-%E2%80%90-OpenAI-API#examples).
-- 100% offline and private, with zero telemetry, external resources, or remote update requests. Web search is optional and user-controlled.
 - Extension support, with numerous built-in and user-contributed extensions available. See the [wiki](https://github.com/oobabooga/text-generation-webui/wiki/07-%E2%80%90-Extensions) and [extensions directory](https://github.com/oobabooga/text-generation-webui-extensions) for details.
 
 ## How to install
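The rewritten feature list keeps the bullet about the OpenAI-compatible API with Chat and Completions endpoints. As a minimal sketch of what a call against such an endpoint looks like (the local host and port below are an assumption about a default local deployment, not something stated in this diff):

    import requests

    # Hypothetical local endpoint; the actual host/port depend on how the server was started.
    url = "http://127.0.0.1:5000/v1/chat/completions"

    payload = {
        "messages": [
            {"role": "user", "content": "Give me a one-sentence summary of what a GGUF file is."}
        ],
        "max_tokens": 200,
    }

    response = requests.post(url, json=payload, timeout=60)
    print(response.json()["choices"][0]["message"]["content"])

The same payload shape works with any OpenAI-compatible client library pointed at the local server.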
@@ -17,6 +17,14 @@
   color: #d1d5db !important;
 }
 
+.chat .message-body :is(th, td) {
+  border-color: #40404096 !important;
+}
+
+.dark .chat .message-body :is(th, td) {
+  border-color: #ffffff75 !important;
+}
+
 .chat .message-body :is(p, ul, ol) {
   margin: 1.25em 0 !important;
 }
css/main.css (81 changed lines)

@@ -582,7 +582,6 @@ div.svelte-362y77>*, div.svelte-362y77>.form>* {
 
 #chat-input {
   padding: 0;
-  padding-top: 18px;
   background: transparent;
   border: none;
 }

@@ -661,37 +660,12 @@ div.svelte-362y77>*, div.svelte-362y77>.form>* {
   }
 }
 
-#show-controls {
-  position: absolute;
-  background-color: transparent;
-  border: 0 !important;
-  border-radius: 0;
-}
-
-#show-controls label {
-  z-index: 1000;
-  position: absolute;
-  right: 30px;
-  top: 10px;
-  white-space: nowrap;
-  overflow: hidden;
-  text-overflow: ellipsis;
-}
-
-.dark #show-controls span {
-  color: var(--neutral-400);
-}
-
-#show-controls span {
-  color: var(--neutral-600);
-}
-
 #typing-container {
   display: none;
   position: absolute;
   background-color: transparent;
   left: -2px;
-  top: 4px;
+  top: -5px;
   padding: var(--block-padding);
 }
 

@@ -785,6 +759,33 @@ div.svelte-362y77>*, div.svelte-362y77>.form>* {
   background: var(--selected-item-color-dark) !important;
 }
 
+#show-controls {
+  height: 36px;
+  border-top: 1px solid var(--border-color-dark) !important;
+  border-left: 1px solid var(--border-color-dark) !important;
+  border-right: 1px solid var(--border-color-dark) !important;
+  border-radius: 0;
+  border-bottom: 0 !important;
+  background-color: var(--darker-gray);
+  padding-top: 3px;
+  padding-left: 4px;
+  display: flex;
+}
+
+#show-controls label {
+  display: flex;
+  flex-direction: row-reverse;
+  font-weight: bold;
+  justify-content: start;
+  width: 100%;
+  padding-right: 12px;
+  gap: 10px;
+}
+
+#show-controls label input {
+  margin-top: 4px;
+}
+
 .transparent-substring {
   opacity: 0.333;
 }

@@ -1326,6 +1327,10 @@ div.svelte-362y77>*, div.svelte-362y77>.form>* {
   overflow: hidden;
 }
 
+.thinking-content:focus, .thinking-header:focus {
+  outline: 0 !important;
+}
+
 .dark .thinking-block {
   background-color: var(--darker-gray);
 }

@@ -1551,3 +1556,25 @@ strong {
   color: var(--body-text-color-subdued);
   margin-top: 4px;
 }
+
+.image-attachment {
+  flex-direction: column;
+}
+
+.image-preview {
+  border-radius: 16px;
+  margin-bottom: 5px;
+  object-fit: cover;
+  object-position: center;
+  border: 2px solid var(--border-color-primary);
+  aspect-ratio: 1 / 1;
+}
+
+button:focus {
+  outline: none;
+}
+
+/* Fix extra gaps for hidden elements on the right sidebar */
+.svelte-sa48pu.stretch:has(> .hidden:only-child) {
+  display: none;
+}
@@ -32,6 +32,7 @@ class ModelDownloader:
         self.max_retries = max_retries
         self.session = self.get_session()
         self._progress_bar_slots = None
+        self.progress_queue = None
 
     def get_session(self):
         session = requests.Session()

@@ -218,33 +219,45 @@ class ModelDownloader:
 
         max_retries = self.max_retries
         attempt = 0
+        file_downloaded_count_for_progress = 0
+
         try:
             while attempt < max_retries:
                 attempt += 1
                 session = self.session
                 headers = {}
                 mode = 'wb'
+                current_file_size_on_disk = 0
 
                 try:
                     if output_path.exists() and not start_from_scratch:
-                        # Resume download
-                        r = session.get(url, stream=True, timeout=20)
-                        total_size = int(r.headers.get('content-length', 0))
-                        if output_path.stat().st_size >= total_size:
+                        current_file_size_on_disk = output_path.stat().st_size
+                        r_head = session.head(url, timeout=20)
+                        r_head.raise_for_status()
+                        total_size = int(r_head.headers.get('content-length', 0))
+
+                        if current_file_size_on_disk >= total_size and total_size > 0:
+                            if self.progress_queue is not None and total_size > 0:
+                                self.progress_queue.put((1.0, str(filename)))
                             return
 
-                        headers = {'Range': f'bytes={output_path.stat().st_size}-'}
+                        headers = {'Range': f'bytes={current_file_size_on_disk}-'}
                         mode = 'ab'
 
                     with session.get(url, stream=True, headers=headers, timeout=30) as r:
-                        r.raise_for_status()  # If status is not 2xx, raise an error
-                        total_size = int(r.headers.get('content-length', 0))
-                        block_size = 1024 * 1024  # 1MB
+                        r.raise_for_status()
+                        total_size_from_stream = int(r.headers.get('content-length', 0))
+                        if mode == 'ab':
+                            effective_total_size = current_file_size_on_disk + total_size_from_stream
+                        else:
+                            effective_total_size = total_size_from_stream
 
-                        filename_str = str(filename)  # Convert PosixPath to string if necessary
+                        block_size = 1024 * 1024
+                        filename_str = str(filename)
 
                         tqdm_kwargs = {
-                            'total': total_size,
+                            'total': effective_total_size,
+                            'initial': current_file_size_on_disk if mode == 'ab' else 0,
                             'unit': 'B',
                             'unit_scale': True,
                             'unit_divisor': 1024,

@@ -261,16 +274,20 @@ class ModelDownloader:
                         })
 
                         with open(output_path, mode) as f:
+                            if mode == 'ab':
+                                f.seek(current_file_size_on_disk)
+
                             with tqdm.tqdm(**tqdm_kwargs) as t:
-                                count = 0
+                                file_downloaded_count_for_progress = current_file_size_on_disk
                                 for data in r.iter_content(block_size):
                                     f.write(data)
                                     t.update(len(data))
-                                    if total_size != 0 and self.progress_bar is not None:
-                                        count += len(data)
-                                        self.progress_bar(float(count) / float(total_size), f"{filename_str}")
+                                    if effective_total_size != 0 and self.progress_queue is not None:
+                                        file_downloaded_count_for_progress += len(data)
+                                        progress_fraction = float(file_downloaded_count_for_progress) / float(effective_total_size)
+                                        self.progress_queue.put((progress_fraction, filename_str))
+                    break
 
-                    break  # Exit loop if successful
                 except (RequestException, ConnectionError, Timeout) as e:
                     print(f"Error downloading {filename}: {e}.")
                     print(f"That was attempt {attempt}/{max_retries}.", end=' ')

@@ -295,10 +312,9 @@ class ModelDownloader:
         finally:
             print(f"\nDownload of {len(file_list)} files to {output_folder} completed.")
 
-    def download_model_files(self, model, branch, links, sha256, output_folder, progress_bar=None, start_from_scratch=False, threads=4, specific_file=None, is_llamacpp=False):
-        self.progress_bar = progress_bar
+    def download_model_files(self, model, branch, links, sha256, output_folder, progress_queue=None, start_from_scratch=False, threads=4, specific_file=None, is_llamacpp=False):
+        self.progress_queue = progress_queue
 
-        # Create the folder and writing the metadata
        output_folder.mkdir(parents=True, exist_ok=True)
 
         if not is_llamacpp:
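The download changes above boil down to three HTTP details: a HEAD request to learn the file's total size, a Range header asking the server for only the missing tail, and append-mode writes, with progress now reported through a queue instead of a callback. A minimal standalone sketch of the same resume idea (the function and variable names here are illustrative, not the project's API):

    import os
    import requests

    def resume_download(url, output_path, chunk_size=1024 * 1024):
        # Bytes already on disk, if any
        existing = os.path.getsize(output_path) if os.path.exists(output_path) else 0

        # Ask the server for the total size without downloading the body
        total = int(requests.head(url, timeout=20).headers.get("content-length", 0))
        if total and existing >= total:
            return  # Nothing left to fetch

        # Request only the missing byte range and append to the file
        headers = {"Range": f"bytes={existing}-"} if existing else {}
        with requests.get(url, stream=True, headers=headers, timeout=30) as r:
            r.raise_for_status()
            with open(output_path, "ab" if existing else "wb") as f:
                for chunk in r.iter_content(chunk_size):
                    f.write(chunk)

A more careful version would also confirm that the server answered 206 Partial Content before appending, since a server that ignores the Range header returns the whole file again.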
@@ -1,8 +1,10 @@
+import base64
 import copy
 import json
 import time
 from collections import deque
 
+import requests
 import tiktoken
 from pydantic import ValidationError
 

@@ -16,6 +18,7 @@ from modules.chat import (
     load_character_memoized,
     load_instruction_template_memoized
 )
+from modules.logging_colors import logger
 from modules.presets import load_preset_memoized
 from modules.text_generation import decode, encode, generate_reply
 

@@ -82,6 +85,50 @@ def process_parameters(body, is_legacy=False):
     return generate_params
 
 
+def process_image_url(url, image_id):
+    """Process an image URL and return attachment data for llama.cpp"""
+    try:
+        if url.startswith("data:"):
+            if "base64," in url:
+                image_data = url.split("base64,", 1)[1]
+            else:
+                raise ValueError("Unsupported data URL format")
+        else:
+            headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'}
+            response = requests.get(url, timeout=10, headers=headers)
+            response.raise_for_status()
+            image_data = base64.b64encode(response.content).decode('utf-8')
+
+        return {"image_data": image_data, "image_id": image_id}
+    except Exception as e:
+        logger.error(f"Error processing image URL {url}: {e}")
+        return None
+
+
+def process_multimodal_content(content):
+    """Extract text and images from OpenAI multimodal format"""
+    if isinstance(content, str):
+        return content, []
+
+    if isinstance(content, list):
+        text_content = ""
+        images = []
+
+        for item in content:
+            if item.get("type") == "text":
+                text_content += item.get("text", "")
+            elif item.get("type") == "image_url":
+                image_url = item.get("image_url", {}).get("url", "")
+                if image_url:
+                    image = process_image_url(image_url, len(images) + 1)
+                    if image:
+                        images.append(image)
+
+        return text_content, images
+
+    return str(content), []
+
+
 def convert_history(history):
     '''
     Chat histories in this program are in the format [message, reply].

@@ -93,19 +140,29 @@ def convert_history(history):
     user_input = ""
     user_input_last = True
     system_message = ""
+    all_images = []  # Simple list to collect all images
 
     for entry in history:
         content = entry["content"]
         role = entry["role"]
 
         if role == "user":
-            user_input = content
+            # Process multimodal content
+            processed_content, images = process_multimodal_content(content)
+            if images:
+                image_refs = "".join("<__media__>" for img in images)
+                processed_content = f"{processed_content} {image_refs}"
+
+            user_input = processed_content
             user_input_last = True
+            all_images.extend(images)  # Add any images to our collection
+
             if current_message:
                 chat_dialogue.append([current_message, '', ''])
                 current_message = ""
 
-            current_message = content
+            current_message = processed_content
+
         elif role == "assistant":
             if "tool_calls" in entry and isinstance(entry["tool_calls"], list) and len(entry["tool_calls"]) > 0 and content.strip() == "":
                 continue  # skip tool calls

@@ -126,7 +183,11 @@ def convert_history(history):
     if not user_input_last:
         user_input = ""
 
-    return user_input, system_message, {'internal': chat_dialogue, 'visible': copy.deepcopy(chat_dialogue)}
+    return user_input, system_message, {
+        'internal': chat_dialogue,
+        'visible': copy.deepcopy(chat_dialogue),
+        'images': all_images  # Simple list of all images from the conversation
+    }
 
 
 def chat_completions_common(body: dict, is_legacy: bool = False, stream=False, prompt_only=False) -> dict:

@@ -150,9 +211,23 @@ def chat_completions_common(body: dict, is_legacy: bool = False, stream=False, prompt_only=False) -> dict:
         elif m['role'] == 'function':
             raise InvalidRequestError(message="role: function is not supported.", param='messages')
 
-        if 'content' not in m and "image_url" not in m:
+        # Handle multimodal content validation
+        content = m.get('content')
+        if content is None:
             raise InvalidRequestError(message="messages: missing content", param='messages')
 
+        # Validate multimodal content structure
+        if isinstance(content, list):
+            for item in content:
+                if not isinstance(item, dict) or 'type' not in item:
+                    raise InvalidRequestError(message="messages: invalid content item format", param='messages')
+                if item['type'] not in ['text', 'image_url']:
+                    raise InvalidRequestError(message="messages: unsupported content type", param='messages')
+                if item['type'] == 'text' and 'text' not in item:
+                    raise InvalidRequestError(message="messages: missing text in content item", param='messages')
+                if item['type'] == 'image_url' and ('image_url' not in item or 'url' not in item['image_url']):
+                    raise InvalidRequestError(message="messages: missing image_url in content item", param='messages')
+
         # Chat Completions
         object_type = 'chat.completion' if not stream else 'chat.completion.chunk'
         created_time = int(time.time())

@@ -205,6 +280,10 @@ def chat_completions_common(body: dict, is_legacy: bool = False, stream=False, prompt_only=False) -> dict:
         'stream': stream
     })
 
+    # Add images to state for llama.cpp multimodal support
+    if history.get('images'):
+        generate_params['image_attachments'] = history['images']
+
     max_tokens = generate_params['max_new_tokens']
     if max_tokens in [None, 0]:
         generate_params['max_new_tokens'] = 512
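The helpers added above parse the OpenAI-style multimodal message format, where content is a list of typed items rather than a plain string. A request body in that shape, using a base64 data URL as the new code prefers, looks roughly like this (file name and prompt are illustrative only):

    import base64

    with open("photo.png", "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")

    payload = {
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "What is in this picture?"},
                    {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
                ],
            }
        ],
        "max_tokens": 200,
    }
    # POST this to the server's /v1/chat/completions endpoint, e.g. with requests.post(url, json=payload).

Each image item becomes a <__media__> placeholder in the prompt text, while the decoded image data travels alongside the history for the llama.cpp backend.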
@@ -95,6 +95,12 @@ function startEditing(messageElement, messageBody, isUserMessage) {
   editingInterface.textarea.focus();
   editingInterface.textarea.setSelectionRange(rawText.length, rawText.length);
 
+  // Scroll the textarea into view
+  editingInterface.textarea.scrollIntoView({
+    behavior: "smooth",
+    block: "center"
+  });
+
   // Setup event handlers
   setupEditingHandlers(editingInterface.textarea, messageElement, originalHTML, messageBody, isUserMessage);
 }

@@ -229,10 +235,23 @@ function removeLastClick() {
   document.getElementById("Remove-last").click();
 }
 
-function handleMorphdomUpdate(text) {
+function handleMorphdomUpdate(data) {
+  // Determine target element and use it as query scope
+  var target_element, target_html;
+  if (data.last_message_only) {
+    const childNodes = document.getElementsByClassName("messages")[0].childNodes;
+    target_element = childNodes[childNodes.length - 1];
+    target_html = data.html;
+  } else {
+    target_element = document.getElementById("chat").parentNode;
+    target_html = "<div class=\"prose svelte-1ybaih5\">" + data.html + "</div>";
+  }
+
+  const queryScope = target_element;
+
   // Track open blocks
   const openBlocks = new Set();
-  document.querySelectorAll(".thinking-block").forEach(block => {
+  queryScope.querySelectorAll(".thinking-block").forEach(block => {
     const blockId = block.getAttribute("data-block-id");
     // If block exists and is open, add to open set
     if (blockId && block.hasAttribute("open")) {

@@ -242,7 +261,7 @@ function handleMorphdomUpdate(text) {
 
   // Store scroll positions for any open blocks
   const scrollPositions = {};
-  document.querySelectorAll(".thinking-block[open]").forEach(block => {
+  queryScope.querySelectorAll(".thinking-block[open]").forEach(block => {
     const content = block.querySelector(".thinking-content");
     const blockId = block.getAttribute("data-block-id");
     if (content && blockId) {

@@ -255,8 +274,8 @@ function handleMorphdomUpdate(text) {
   });
 
   morphdom(
-    document.getElementById("chat").parentNode,
-    "<div class=\"prose svelte-1ybaih5\">" + text + "</div>",
+    target_element,
+    target_html,
     {
       onBeforeElUpdated: function(fromEl, toEl) {
         // Preserve code highlighting

@@ -307,7 +326,7 @@ function handleMorphdomUpdate(text) {
   );
 
   // Add toggle listeners for new blocks
-  document.querySelectorAll(".thinking-block").forEach(block => {
+  queryScope.querySelectorAll(".thinking-block").forEach(block => {
     if (!block._hasToggleListener) {
       block.addEventListener("toggle", function(e) {
         if (this.open) {
js/main.js (93 changed lines)

@@ -184,7 +184,7 @@ const observer = new MutationObserver(function(mutations) {
   const prevSibling = lastChild?.previousElementSibling;
   if (lastChild && prevSibling) {
     lastChild.style.setProperty("margin-bottom",
-      `max(0px, calc(max(70vh, 100vh - ${prevSibling.offsetHeight}px - 102px) - ${lastChild.offsetHeight}px))`,
+      `max(0px, calc(max(70vh, 100vh - ${prevSibling.offsetHeight}px - 84px) - ${lastChild.offsetHeight}px))`,
       "important"
     );
   }

@@ -217,7 +217,7 @@ function isElementVisibleOnScreen(element) {
 }
 
 function doSyntaxHighlighting() {
-  const messageBodies = document.querySelectorAll(".message-body");
+  const messageBodies = document.getElementById("chat").querySelectorAll(".message-body");
 
   if (messageBodies.length > 0) {
     observer.disconnect();

@@ -229,6 +229,7 @@ function doSyntaxHighlighting() {
       codeBlocks.forEach((codeBlock) => {
         hljs.highlightElement(codeBlock);
         codeBlock.setAttribute("data-highlighted", "true");
+        codeBlock.classList.add("pretty_scrollbar");
       });
 
       renderMathInElement(messageBody, {

@@ -277,7 +278,7 @@ for (i = 0; i < slimDropdownElements.length; i++) {
 // The show/hide events were adapted from:
 // https://github.com/SillyTavern/SillyTavern/blob/6c8bd06308c69d51e2eb174541792a870a83d2d6/public/script.js
 //------------------------------------------------
-var buttonsInChat = document.querySelectorAll("#chat-tab #chat-buttons button");
+var buttonsInChat = document.querySelectorAll("#chat-tab #chat-buttons button, #chat-tab #chat-buttons #show-controls");
 var button = document.getElementById("hover-element-button");
 var menu = document.getElementById("hover-menu");
 var istouchscreen = (navigator.maxTouchPoints > 0) || "ontouchstart" in document.documentElement;

@@ -298,18 +299,21 @@ if (buttonsInChat.length > 0) {
     const thisButton = buttonsInChat[i];
     menu.appendChild(thisButton);
 
-    thisButton.addEventListener("click", () => {
-      hideMenu();
-    });
+    // Only apply transformations to button elements
+    if (thisButton.tagName.toLowerCase() === "button") {
+      thisButton.addEventListener("click", () => {
+        hideMenu();
+      });
 
       const buttonText = thisButton.textContent;
       const matches = buttonText.match(/(\(.*?\))/);
 
       if (matches && matches.length > 1) {
         // Apply the transparent-substring class to the matched substring
         const substring = matches[1];
         const newText = buttonText.replace(substring, ` <span class="transparent-substring">${substring.slice(1, -1)}</span>`);
         thisButton.innerHTML = newText;
       }
+    }
   }
 }

@@ -382,21 +386,10 @@ document.addEventListener("click", function (event) {
   }
 });
 
-//------------------------------------------------
-// Relocate the "Show controls" checkbox
-//------------------------------------------------
-var elementToMove = document.getElementById("show-controls");
-var parent = elementToMove.parentNode;
-for (var i = 0; i < 2; i++) {
-  parent = parent.parentNode;
-}
-
-parent.insertBefore(elementToMove, parent.firstChild);
-
 //------------------------------------------------
 // Position the chat input
 //------------------------------------------------
-document.getElementById("show-controls").parentNode.classList.add("chat-input-positioned");
+document.getElementById("chat-input-row").classList.add("chat-input-positioned");
 
 //------------------------------------------------
 // Focus on the chat input

@@ -872,3 +865,53 @@ function navigateLastAssistantMessage(direction) {
 
   return false;
 }
+
+//------------------------------------------------
+// Paste Handler for Long Text
+//------------------------------------------------
+
+const MAX_PLAIN_TEXT_LENGTH = 2500;
+
+function setupPasteHandler() {
+  const textbox = document.querySelector("#chat-input textarea[data-testid=\"textbox\"]");
+  const fileInput = document.querySelector("#chat-input input[data-testid=\"file-upload\"]");
+
+  if (!textbox || !fileInput) {
+    setTimeout(setupPasteHandler, 500);
+    return;
+  }
+
+  textbox.addEventListener("paste", async (event) => {
+    const text = event.clipboardData?.getData("text");
+
+    if (text && text.length > MAX_PLAIN_TEXT_LENGTH) {
+      event.preventDefault();
+
+      const file = new File([text], "pasted_text.txt", {
+        type: "text/plain",
+        lastModified: Date.now()
+      });
+
+      const dataTransfer = new DataTransfer();
+      dataTransfer.items.add(file);
+      fileInput.files = dataTransfer.files;
+      fileInput.dispatchEvent(new Event("change", { bubbles: true }));
+    }
+  });
+}
+
+if (document.readyState === "loading") {
+  document.addEventListener("DOMContentLoaded", setupPasteHandler);
+} else {
+  setupPasteHandler();
+}
+
+//------------------------------------------------
+// Tooltips
+//------------------------------------------------
+
+// File upload button
+document.querySelector("#chat-input .upload-button").title = "Upload text files, PDFs, and DOCX documents";
+
+// Activate web search
+document.getElementById("web-search").title = "Search the internet with DuckDuckGo";
modules/chat.py (194 changed lines)

@@ -220,13 +220,22 @@ def generate_chat_prompt(user_input, state, **kwargs):
             # Add attachment content if present
             if user_key in metadata and "attachments" in metadata[user_key]:
                 attachments_text = ""
-                for attachment in metadata[user_key]["attachments"]:
-                    filename = attachment.get("name", "file")
-                    content = attachment.get("content", "")
-                    attachments_text += f"\nName: {filename}\nContents:\n\n=====\n{content}\n=====\n\n"
+                image_refs = ""
 
-                if attachments_text:
-                    enhanced_user_msg = f"{user_msg}\n\nATTACHMENTS:\n{attachments_text}"
+                for attachment in metadata[user_key]["attachments"]:
+                    if attachment.get("type") == "image":
+                        # Add image reference for multimodal models
+                        image_refs += "<__media__>"
+                    else:
+                        # Handle text/PDF attachments as before
+                        filename = attachment.get("name", "file")
+                        content = attachment.get("content", "")
+                        attachments_text += f"\nName: {filename}\nContents:\n\n=====\n{content}\n=====\n\n"
+
+                if image_refs or attachments_text:
+                    enhanced_user_msg = f"{user_msg} {image_refs}"
+                    if attachments_text:
+                        enhanced_user_msg += f"\n\nATTACHMENTS:\n{attachments_text}"
 
                 messages.insert(insert_pos, {"role": "user", "content": enhanced_user_msg})
 

@@ -240,22 +249,29 @@ def generate_chat_prompt(user_input, state, **kwargs):
         has_attachments = user_key in metadata and "attachments" in metadata[user_key]
 
         if (user_input or has_attachments) and not impersonate and not _continue:
-            # For the current user input being processed, check if we need to add attachments
-            if not impersonate and not _continue and len(history_data.get('metadata', {})) > 0:
-                current_row_idx = len(history)
-                user_key = f"user_{current_row_idx}"
+            current_row_idx = len(history)
+            user_key = f"user_{current_row_idx}"
 
-                if user_key in metadata and "attachments" in metadata[user_key]:
-                    attachments_text = ""
-                    for attachment in metadata[user_key]["attachments"]:
+            enhanced_user_input = user_input
+
+            if user_key in metadata and "attachments" in metadata[user_key]:
+                attachments_text = ""
+                image_refs = ""
+
+                for attachment in metadata[user_key]["attachments"]:
+                    if attachment.get("type") == "image":
+                        image_refs += "<__media__>"
+                    else:
                         filename = attachment.get("name", "file")
                         content = attachment.get("content", "")
                         attachments_text += f"\nName: {filename}\nContents:\n\n=====\n{content}\n=====\n\n"
 
+                if image_refs or attachments_text:
+                    enhanced_user_input = f"{user_input} {image_refs}"
                     if attachments_text:
-                        user_input = f"{user_input}\n\nATTACHMENTS:\n{attachments_text}"
+                        enhanced_user_input += f"\n\nATTACHMENTS:\n{attachments_text}"
 
-            messages.append({"role": "user", "content": user_input})
+            messages.append({"role": "user", "content": enhanced_user_input})
 
         def make_prompt(messages):
             if state['mode'] == 'chat-instruct' and _continue:

@@ -495,26 +511,63 @@ def add_message_attachment(history, row_idx, file_path, is_user=True):
         file_extension = path.suffix.lower()
 
         try:
-            # Handle different file types
-            if file_extension == '.pdf':
+            # Handle image files
+            if file_extension in ['.jpg', '.jpeg', '.png', '.webp', '.bmp', '.gif']:
+                # Convert image to base64
+                with open(path, 'rb') as f:
+                    image_data = base64.b64encode(f.read()).decode('utf-8')
+
+                # Determine MIME type from extension
+                mime_type_map = {
+                    '.jpg': 'image/jpeg',
+                    '.jpeg': 'image/jpeg',
+                    '.png': 'image/png',
+                    '.webp': 'image/webp',
+                    '.bmp': 'image/bmp',
+                    '.gif': 'image/gif'
+                }
+                mime_type = mime_type_map.get(file_extension, 'image/jpeg')
+
+                # Format as data URL
+                data_url = f"data:{mime_type};base64,{image_data}"
+
+                # Generate unique image ID
+                image_id = len([att for att in history['metadata'][key]["attachments"] if att.get("type") == "image"]) + 1
+
+                attachment = {
+                    "name": filename,
+                    "type": "image",
+                    "image_data": data_url,
+                    "image_id": image_id,
+                    "file_path": str(path)  # For UI preview
+                }
+            elif file_extension == '.pdf':
                 # Process PDF file
                 content = extract_pdf_text(path)
-                file_type = "application/pdf"
+                attachment = {
+                    "name": filename,
+                    "type": "application/pdf",
+                    "content": content,
+                }
+            elif file_extension == '.docx':
+                content = extract_docx_text(path)
+                attachment = {
+                    "name": filename,
+                    "type": "application/docx",
+                    "content": content,
+                }
             else:
                 # Default handling for text files
                 with open(path, 'r', encoding='utf-8') as f:
                     content = f.read()
-                file_type = "text/plain"
-
-            # Add attachment
-            attachment = {
-                "name": filename,
-                "type": file_type,
-                "content": content,
-            }
+                attachment = {
+                    "name": filename,
+                    "type": "text/plain",
+                    "content": content,
+                }
 
             history['metadata'][key]["attachments"].append(attachment)
-            return content  # Return the content for reuse
+            return attachment  # Return the attachment for reuse
         except Exception as e:
             logger.error(f"Error processing attachment {filename}: {e}")
             return None

@@ -538,6 +591,53 @@ def extract_pdf_text(pdf_path):
         return f"[Error extracting PDF text: {str(e)}]"
 
 
+def extract_docx_text(docx_path):
+    """
+    Extract text from a .docx file, including headers,
+    body (paragraphs and tables), and footers.
+    """
+    try:
+        import docx
+
+        doc = docx.Document(docx_path)
+        parts = []
+
+        # 1) Extract non-empty header paragraphs from each section
+        for section in doc.sections:
+            for para in section.header.paragraphs:
+                text = para.text.strip()
+                if text:
+                    parts.append(text)
+
+        # 2) Extract body blocks (paragraphs and tables) in document order
+        parent_elm = doc.element.body
+        for child in parent_elm.iterchildren():
+            if isinstance(child, docx.oxml.text.paragraph.CT_P):
+                para = docx.text.paragraph.Paragraph(child, doc)
+                text = para.text.strip()
+                if text:
+                    parts.append(text)
+
+            elif isinstance(child, docx.oxml.table.CT_Tbl):
+                table = docx.table.Table(child, doc)
+                for row in table.rows:
+                    cells = [cell.text.strip() for cell in row.cells]
+                    parts.append("\t".join(cells))
+
+        # 3) Extract non-empty footer paragraphs from each section
+        for section in doc.sections:
+            for para in section.footer.paragraphs:
+                text = para.text.strip()
+                if text:
+                    parts.append(text)
+
+        return "\n".join(parts)
+
+    except Exception as e:
+        logger.error(f"Error extracting text from DOCX: {e}")
+        return f"[Error extracting DOCX text: {str(e)}]"
+
+
 def generate_search_query(user_message, state):
     """Generate a search query from user message using the LLM"""
     # Augment the user message with search instruction

@@ -554,7 +654,12 @@ def generate_search_query(user_message, state):
 
     query = ""
     for reply in generate_reply(formatted_prompt, search_state, stopping_strings=[], is_chat=True):
-        query = reply.strip()
+        query = reply
+
+    # Strip and remove surrounding quotes if present
+    query = query.strip()
+    if len(query) >= 2 and query.startswith('"') and query.endswith('"'):
+        query = query[1:-1]
 
     return query
 

@@ -590,6 +695,19 @@ def chatbot_wrapper(text, state, regenerate=False, _continue=False, loading_mess
             for file_path in files:
                 add_message_attachment(output, row_idx, file_path, is_user=True)
 
+            # Collect image attachments for llama.cpp
+            image_attachments = []
+            if 'metadata' in output:
+                user_key = f"user_{row_idx}"
+                if user_key in output['metadata'] and "attachments" in output['metadata'][user_key]:
+                    for attachment in output['metadata'][user_key]["attachments"]:
+                        if attachment.get("type") == "image":
+                            image_attachments.append(attachment)
+
+            # Add image attachments to state for the generation
+            if image_attachments:
+                state['image_attachments'] = image_attachments
+
         # Add web search results as attachments if enabled
         if state.get('enable_web_search', False):
             search_query = generate_search_query(text, state)

@@ -660,7 +778,7 @@ def chatbot_wrapper(text, state, regenerate=False, _continue=False, loading_mess
 
     # Add timestamp for assistant's response at the start of generation
     row_idx = len(output['internal']) - 1
-    update_message_metadata(output['metadata'], "assistant", row_idx, timestamp=get_current_timestamp())
+    update_message_metadata(output['metadata'], "assistant", row_idx, timestamp=get_current_timestamp(), model_name=shared.model_name)
 
     # Generate
     reply = None

@@ -775,7 +893,9 @@ def generate_chat_reply_wrapper(text, state, regenerate=False, _continue=False):
     last_save_time = time.monotonic()
     save_interval = 8
     for i, history in enumerate(generate_chat_reply(text, state, regenerate, _continue, loading_message=True, for_ui=True)):
-        yield chat_html_wrapper(history, state['name1'], state['name2'], state['mode'], state['chat_style'], state['character_menu']), history
+        yield chat_html_wrapper(history, state['name1'], state['name2'], state['mode'], state['chat_style'], state['character_menu'], last_message_only=(i > 0)), history
+        if i == 0:
+            time.sleep(0.125)  # We need this to make sure the first update goes through
 
         current_time = time.monotonic()
         # Save on first iteration or if save_interval seconds have passed

@@ -806,9 +926,12 @@ def remove_last_message(history):
     return html.unescape(last[0]), history
 
 
-def send_dummy_message(textbox, state):
+def send_dummy_message(text, state):
     history = state['history']
-    text = textbox['text']
+
+    # Handle both dict and string inputs
+    if isinstance(text, dict):
+        text = text['text']
 
     # Initialize metadata if not present
     if 'metadata' not in history:

@@ -822,9 +945,12 @@ def send_dummy_message(textbox, state):
     return history
 
 
-def send_dummy_reply(textbox, state):
+def send_dummy_reply(text, state):
     history = state['history']
-    text = textbox['text']
+
+    # Handle both dict and string inputs
+    if isinstance(text, dict):
+        text = text['text']
 
     # Initialize metadata if not present
    if 'metadata' not in history:
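For reference, the image branch added to add_message_attachment() above stores each image as a data URL inside the chat metadata rather than as extracted text. Based on the fields assigned in that hunk, an image attachment entry has roughly this shape (values abbreviated and hypothetical):

    attachment = {
        "name": "photo.png",
        "type": "image",
        "image_data": "data:image/png;base64,iVBORw0KGgo...",  # full base64 payload in practice
        "image_id": 1,
        "file_path": "/path/to/uploads/photo.png",  # used for the UI preview
    }

Text, PDF, and .docx attachments instead carry a "content" field with the extracted text, which generate_chat_prompt() splices into the prompt under an ATTACHMENTS: heading, while images only contribute a <__media__> placeholder to the prompt text.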
@@ -245,3 +245,20 @@ class Exllamav3HF(PreTrainedModel, GenerationMixin):
             pretrained_model_name_or_path = Path(f'{shared.args.model_dir}') / Path(pretrained_model_name_or_path)
 
         return Exllamav3HF(pretrained_model_name_or_path)
+
+    def unload(self):
+        """Properly unload the ExllamaV3 model and free GPU memory."""
+        if hasattr(self, 'ex_model') and self.ex_model is not None:
+            self.ex_model.unload()
+            self.ex_model = None
+
+        if hasattr(self, 'ex_cache') and self.ex_cache is not None:
+            self.ex_cache = None
+
+        # Clean up any additional ExllamaV3 resources
+        if hasattr(self, 'past_seq'):
+            self.past_seq = None
+        if hasattr(self, 'past_seq_negative'):
+            self.past_seq_negative = None
+        if hasattr(self, 'ex_cache_negative'):
+            self.ex_cache_negative = None
@ -350,12 +350,14 @@ remove_button = f'<button class="footer-button footer-remove-button" title="Remo
|
||||||
info_button = f'<button class="footer-button footer-info-button" title="message">{info_svg}</button>'
|
info_button = f'<button class="footer-button footer-info-button" title="message">{info_svg}</button>'
|
||||||
|
|
||||||
|
|
||||||
def format_message_timestamp(history, role, index):
|
def format_message_timestamp(history, role, index, tooltip_include_timestamp=True):
|
||||||
"""Get a formatted timestamp HTML span for a message if available"""
|
"""Get a formatted timestamp HTML span for a message if available"""
|
||||||
key = f"{role}_{index}"
|
key = f"{role}_{index}"
|
||||||
if 'metadata' in history and key in history['metadata'] and history['metadata'][key].get('timestamp'):
|
if 'metadata' in history and key in history['metadata'] and history['metadata'][key].get('timestamp'):
|
||||||
timestamp = history['metadata'][key]['timestamp']
|
timestamp = history['metadata'][key]['timestamp']
|
||||||
return f"<span class='timestamp'>{timestamp}</span>"
|
tooltip_text = get_message_tooltip(history, role, index, include_timestamp=tooltip_include_timestamp)
|
||||||
|
title_attr = f' title="{html.escape(tooltip_text)}"' if tooltip_text else ''
|
||||||
|
return f"<span class='timestamp'{title_attr}>{timestamp}</span>"
|
||||||
|
|
||||||
return ""
|
return ""
|
||||||
|
|
||||||
|
@ -372,22 +374,50 @@ def format_message_attachments(history, role, index):
|
||||||
for attachment in attachments:
|
for attachment in attachments:
|
||||||
name = html.escape(attachment["name"])
|
name = html.escape(attachment["name"])
|
||||||
|
|
||||||
# Make clickable if URL exists
|
if attachment.get("type") == "image":
|
||||||
if "url" in attachment:
|
# Show image preview
|
||||||
name = f'<a href="{html.escape(attachment["url"])}" target="_blank" rel="noopener noreferrer">{name}</a>'
|
file_path = attachment.get("file_path", "")
|
||||||
|
attachments_html += (
|
||||||
|
f'<div class="attachment-box image-attachment">'
|
||||||
|
f'<img src="file/{file_path}" alt="{name}" class="image-preview" />'
|
||||||
|
f'<div class="attachment-name">{name}</div>'
|
||||||
|
f'</div>'
|
||||||
|
)
|
||||||
|
else:
|
||||||
|
# Make clickable if URL exists (web search)
|
||||||
|
if "url" in attachment:
|
||||||
|
name = f'<a href="{html.escape(attachment["url"])}" target="_blank" rel="noopener noreferrer">{name}</a>'
|
||||||
|
|
||||||
|
attachments_html += (
|
||||||
|
f'<div class="attachment-box">'
|
||||||
|
f'<div class="attachment-icon">{attachment_svg}</div>'
|
||||||
|
f'<div class="attachment-name">{name}</div>'
|
||||||
|
f'</div>'
|
||||||
|
)
|
||||||
|
|
||||||
attachments_html += (
|
|
||||||
f'<div class="attachment-box">'
|
|
||||||
f'<div class="attachment-icon">{attachment_svg}</div>'
|
|
||||||
f'<div class="attachment-name">{name}</div>'
|
|
||||||
f'</div>'
|
|
||||||
)
|
|
||||||
attachments_html += '</div>'
|
attachments_html += '</div>'
|
||||||
return attachments_html
|
return attachments_html
|
||||||
|
|
||||||
return ""
|
return ""
|
||||||
|
|
||||||
|
|
||||||
|
def get_message_tooltip(history, role, index, include_timestamp=True):
|
||||||
|
"""Get tooltip text combining timestamp and model name for a message"""
|
||||||
|
key = f"{role}_{index}"
|
||||||
|
if 'metadata' not in history or key not in history['metadata']:
|
||||||
|
return ""
|
||||||
|
|
||||||
|
meta = history['metadata'][key]
|
||||||
|
tooltip_parts = []
|
||||||
|
|
||||||
|
if include_timestamp and meta.get('timestamp'):
|
||||||
|
tooltip_parts.append(meta['timestamp'])
|
||||||
|
if meta.get('model_name'):
|
||||||
|
tooltip_parts.append(f"Model: {meta['model_name']}")
|
||||||
|
|
||||||
|
return " | ".join(tooltip_parts)
|
||||||
|
|
||||||
|
|
||||||
def get_version_navigation_html(history, i, role):
|
def get_version_navigation_html(history, i, role):
|
||||||
"""Generate simple navigation arrows for message versions"""
|
"""Generate simple navigation arrows for message versions"""
|
||||||
key = f"{role}_{i}"
|
key = f"{role}_{i}"
|
||||||
|
@ -443,66 +473,69 @@ def actions_html(history, i, role, info_message=""):
|
||||||
f'{version_nav_html}')
|
f'{version_nav_html}')
|
||||||
|
|
||||||
|
|
||||||
-def generate_instruct_html(history):
-    output = f'<style>{instruct_css}</style><div class="chat" id="chat" data-mode="instruct"><div class="messages">'
-
-    for i in range(len(history['visible'])):
-        row_visible = history['visible'][i]
-        row_internal = history['internal'][i]
-        converted_visible = [convert_to_markdown_wrapped(entry, message_id=i, use_cache=i != len(history['visible']) - 1) for entry in row_visible]
-
-        # Get timestamps
-        user_timestamp = format_message_timestamp(history, "user", i)
-        assistant_timestamp = format_message_timestamp(history, "assistant", i)
-
-        # Get attachments
-        user_attachments = format_message_attachments(history, "user", i)
-        assistant_attachments = format_message_attachments(history, "assistant", i)
-
-        # Create info buttons for timestamps if they exist
-        info_message_user = ""
-        if user_timestamp != "":
-            # Extract the timestamp value from the span
-            user_timestamp_value = user_timestamp.split('>', 1)[1].split('<', 1)[0]
-            info_message_user = info_button.replace("message", user_timestamp_value)
-
-        info_message_assistant = ""
-        if assistant_timestamp != "":
-            # Extract the timestamp value from the span
-            assistant_timestamp_value = assistant_timestamp.split('>', 1)[1].split('<', 1)[0]
-            info_message_assistant = info_button.replace("message", assistant_timestamp_value)
-
-        if converted_visible[0]:  # Don't display empty user messages
-            output += (
-                f'<div class="user-message" '
-                f'data-raw="{html.escape(row_internal[0], quote=True)}"'
-                f'data-index={i}>'
-                f'<div class="text">'
-                f'<div class="message-body">{converted_visible[0]}</div>'
-                f'{user_attachments}'
-                f'{actions_html(history, i, "user", info_message_user)}'
-                f'</div>'
-                f'</div>'
-            )
-
-        output += (
-            f'<div class="assistant-message" '
-            f'data-raw="{html.escape(row_internal[1], quote=True)}"'
-            f'data-index={i}>'
-            f'<div class="text">'
-            f'<div class="message-body">{converted_visible[1]}</div>'
-            f'{assistant_attachments}'
-            f'{actions_html(history, i, "assistant", info_message_assistant)}'
-            f'</div>'
-            f'</div>'
-        )
-
-    output += "</div></div>"
+def generate_instruct_html(history, last_message_only=False):
+    if not last_message_only:
+        output = f'<style>{instruct_css}</style><div class="chat" id="chat" data-mode="instruct"><div class="messages">'
+    else:
+        output = ""
+
+    def create_message(role, content, raw_content):
+        """Inner function that captures variables from outer scope."""
+        class_name = "user-message" if role == "user" else "assistant-message"
+
+        # Get role-specific data
+        timestamp = format_message_timestamp(history, role, i)
+        attachments = format_message_attachments(history, role, i)
+
+        # Create info button if timestamp exists
+        info_message = ""
+        if timestamp:
+            tooltip_text = get_message_tooltip(history, role, i)
+            info_message = info_button.replace('title="message"', f'title="{html.escape(tooltip_text)}"')
+
+        return (
+            f'<div class="{class_name}" '
+            f'data-raw="{html.escape(raw_content, quote=True)}"'
+            f'data-index={i}>'
+            f'<div class="text">'
+            f'<div class="message-body">{content}</div>'
+            f'{attachments}'
+            f'{actions_html(history, i, role, info_message)}'
+            f'</div>'
+            f'</div>'
+        )
+
+    # Determine range
+    start_idx = len(history['visible']) - 1 if last_message_only else 0
+    end_idx = len(history['visible'])
+
+    for i in range(start_idx, end_idx):
+        row_visible = history['visible'][i]
+        row_internal = history['internal'][i]
+
+        # Convert content
+        if last_message_only:
+            converted_visible = [None, convert_to_markdown_wrapped(row_visible[1], message_id=i, use_cache=i != len(history['visible']) - 1)]
+        else:
+            converted_visible = [convert_to_markdown_wrapped(entry, message_id=i, use_cache=i != len(history['visible']) - 1) for entry in row_visible]
+
+        # Generate messages
+        if not last_message_only and converted_visible[0]:
+            output += create_message("user", converted_visible[0], row_internal[0])
+
+        output += create_message("assistant", converted_visible[1], row_internal[1])
+
+    if not last_message_only:
+        output += "</div></div>"
+
    return output
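Editorial note on the pattern above: create_message() is defined before the rendering loop but reads the loop variable i from the enclosing scope, so each call formats the row currently being processed. A small self-contained sketch of that closure behaviour (illustrative only, not taken from the diff):

def outer():
    def show():
        return i  # resolved when show() is called, not when it is defined
    results = []
    for i in range(3):
        results.append(show())
    return results

assert outer() == [0, 1, 2]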
-def generate_cai_chat_html(history, name1, name2, style, character, reset_cache=False):
-    output = f'<style>{chat_styles[style]}</style><div class="chat" id="chat"><div class="messages">'
+def generate_cai_chat_html(history, name1, name2, style, character, reset_cache=False, last_message_only=False):
+    if not last_message_only:
+        output = f'<style>{chat_styles[style]}</style><div class="chat" id="chat"><div class="messages">'
+    else:
+        output = ""

    # We use ?character and ?time.time() to force the browser to reset caches
    img_bot = (

@@ -510,112 +543,117 @@ def generate_cai_chat_html(history, name1, name2, style, character, reset_cache=
        if Path("user_data/cache/pfp_character_thumb.png").exists() else ''
    )

-    img_me = (
-        f'<img src="file/user_data/cache/pfp_me.png?{time.time() if reset_cache else ""}">'
-        if Path("user_data/cache/pfp_me.png").exists() else ''
-    )
-
-    for i in range(len(history['visible'])):
-        row_visible = history['visible'][i]
-        row_internal = history['internal'][i]
-        converted_visible = [convert_to_markdown_wrapped(entry, message_id=i, use_cache=i != len(history['visible']) - 1) for entry in row_visible]
-
-        # Get timestamps
-        user_timestamp = format_message_timestamp(history, "user", i)
-        assistant_timestamp = format_message_timestamp(history, "assistant", i)
-
-        # Get attachments
-        user_attachments = format_message_attachments(history, "user", i)
-        assistant_attachments = format_message_attachments(history, "assistant", i)
-
-        if converted_visible[0]:  # Don't display empty user messages
-            output += (
-                f'<div class="message" '
-                f'data-raw="{html.escape(row_internal[0], quote=True)}"'
-                f'data-index={i}>'
-                f'<div class="circle-you">{img_me}</div>'
-                f'<div class="text">'
-                f'<div class="username">{name1}{user_timestamp}</div>'
-                f'<div class="message-body">{converted_visible[0]}</div>'
-                f'{user_attachments}'
-                f'{actions_html(history, i, "user")}'
-                f'</div>'
-                f'</div>'
-            )
-
-        output += (
-            f'<div class="message" '
-            f'data-raw="{html.escape(row_internal[1], quote=True)}"'
-            f'data-index={i}>'
-            f'<div class="circle-bot">{img_bot}</div>'
-            f'<div class="text">'
-            f'<div class="username">{name2}{assistant_timestamp}</div>'
-            f'<div class="message-body">{converted_visible[1]}</div>'
-            f'{assistant_attachments}'
-            f'{actions_html(history, i, "assistant")}'
-            f'</div>'
-            f'</div>'
-        )
-
-    output += "</div></div>"
+    def create_message(role, content, raw_content):
+        """Inner function for CAI-style messages."""
+        circle_class = "circle-you" if role == "user" else "circle-bot"
+        name = name1 if role == "user" else name2
+
+        # Get role-specific data
+        timestamp = format_message_timestamp(history, role, i, tooltip_include_timestamp=False)
+        attachments = format_message_attachments(history, role, i)
+
+        # Get appropriate image
+        if role == "user":
+            img = (f'<img src="file/user_data/cache/pfp_me.png?{time.time() if reset_cache else ""}">'
+                   if Path("user_data/cache/pfp_me.png").exists() else '')
+        else:
+            img = img_bot
+
+        return (
+            f'<div class="message" '
+            f'data-raw="{html.escape(raw_content, quote=True)}"'
+            f'data-index={i}>'
+            f'<div class="{circle_class}">{img}</div>'
+            f'<div class="text">'
+            f'<div class="username">{name}{timestamp}</div>'
+            f'<div class="message-body">{content}</div>'
+            f'{attachments}'
+            f'{actions_html(history, i, role)}'
+            f'</div>'
+            f'</div>'
+        )
+
+    # Determine range
+    start_idx = len(history['visible']) - 1 if last_message_only else 0
+    end_idx = len(history['visible'])
+
+    for i in range(start_idx, end_idx):
+        row_visible = history['visible'][i]
+        row_internal = history['internal'][i]
+
+        # Convert content
+        if last_message_only:
+            converted_visible = [None, convert_to_markdown_wrapped(row_visible[1], message_id=i, use_cache=i != len(history['visible']) - 1)]
+        else:
+            converted_visible = [convert_to_markdown_wrapped(entry, message_id=i, use_cache=i != len(history['visible']) - 1) for entry in row_visible]
+
+        # Generate messages
+        if not last_message_only and converted_visible[0]:
+            output += create_message("user", converted_visible[0], row_internal[0])
+
+        output += create_message("assistant", converted_visible[1], row_internal[1])
+
+    if not last_message_only:
+        output += "</div></div>"
+
    return output
-def generate_chat_html(history, name1, name2, reset_cache=False):
-    output = f'<style>{chat_styles["wpp"]}</style><div class="chat" id="chat"><div class="messages">'
-
-    for i in range(len(history['visible'])):
-        row_visible = history['visible'][i]
-        row_internal = history['internal'][i]
-        converted_visible = [convert_to_markdown_wrapped(entry, message_id=i, use_cache=i != len(history['visible']) - 1) for entry in row_visible]
-
-        # Get timestamps
-        user_timestamp = format_message_timestamp(history, "user", i)
-        assistant_timestamp = format_message_timestamp(history, "assistant", i)
-
-        # Get attachments
-        user_attachments = format_message_attachments(history, "user", i)
-        assistant_attachments = format_message_attachments(history, "assistant", i)
-
-        # Create info buttons for timestamps if they exist
-        info_message_user = ""
-        if user_timestamp != "":
-            # Extract the timestamp value from the span
-            user_timestamp_value = user_timestamp.split('>', 1)[1].split('<', 1)[0]
-            info_message_user = info_button.replace("message", user_timestamp_value)
-
-        info_message_assistant = ""
-        if assistant_timestamp != "":
-            # Extract the timestamp value from the span
-            assistant_timestamp_value = assistant_timestamp.split('>', 1)[1].split('<', 1)[0]
-            info_message_assistant = info_button.replace("message", assistant_timestamp_value)
-
-        if converted_visible[0]:  # Don't display empty user messages
-            output += (
-                f'<div class="message" '
-                f'data-raw="{html.escape(row_internal[0], quote=True)}"'
-                f'data-index={i}>'
-                f'<div class="text-you">'
-                f'<div class="message-body">{converted_visible[0]}</div>'
-                f'{user_attachments}'
-                f'{actions_html(history, i, "user", info_message_user)}'
-                f'</div>'
-                f'</div>'
-            )
-
-        output += (
-            f'<div class="message" '
-            f'data-raw="{html.escape(row_internal[1], quote=True)}"'
-            f'data-index={i}>'
-            f'<div class="text-bot">'
-            f'<div class="message-body">{converted_visible[1]}</div>'
-            f'{assistant_attachments}'
-            f'{actions_html(history, i, "assistant", info_message_assistant)}'
-            f'</div>'
-            f'</div>'
-        )
-
-    output += "</div></div>"
+def generate_chat_html(history, name1, name2, reset_cache=False, last_message_only=False):
+    if not last_message_only:
+        output = f'<style>{chat_styles["wpp"]}</style><div class="chat" id="chat"><div class="messages">'
+    else:
+        output = ""
+
+    def create_message(role, content, raw_content):
+        """Inner function for WPP-style messages."""
+        text_class = "text-you" if role == "user" else "text-bot"
+
+        # Get role-specific data
+        timestamp = format_message_timestamp(history, role, i)
+        attachments = format_message_attachments(history, role, i)
+
+        # Create info button if timestamp exists
+        info_message = ""
+        if timestamp:
+            tooltip_text = get_message_tooltip(history, role, i)
+            info_message = info_button.replace('title="message"', f'title="{html.escape(tooltip_text)}"')
+
+        return (
+            f'<div class="message" '
+            f'data-raw="{html.escape(raw_content, quote=True)}"'
+            f'data-index={i}>'
+            f'<div class="{text_class}">'
+            f'<div class="message-body">{content}</div>'
+            f'{attachments}'
+            f'{actions_html(history, i, role, info_message)}'
+            f'</div>'
+            f'</div>'
+        )
+
+    # Determine range
+    start_idx = len(history['visible']) - 1 if last_message_only else 0
+    end_idx = len(history['visible'])
+
+    for i in range(start_idx, end_idx):
+        row_visible = history['visible'][i]
+        row_internal = history['internal'][i]
+
+        # Convert content
+        if last_message_only:
+            converted_visible = [None, convert_to_markdown_wrapped(row_visible[1], message_id=i, use_cache=i != len(history['visible']) - 1)]
+        else:
+            converted_visible = [convert_to_markdown_wrapped(entry, message_id=i, use_cache=i != len(history['visible']) - 1) for entry in row_visible]
+
+        # Generate messages
+        if not last_message_only and converted_visible[0]:
+            output += create_message("user", converted_visible[0], row_internal[0])
+
+        output += create_message("assistant", converted_visible[1], row_internal[1])
+
+    if not last_message_only:
+        output += "</div></div>"
+
    return output
@@ -629,15 +667,15 @@ def time_greeting():
        return "Good evening!"


-def chat_html_wrapper(history, name1, name2, mode, style, character, reset_cache=False):
+def chat_html_wrapper(history, name1, name2, mode, style, character, reset_cache=False, last_message_only=False):
    if len(history['visible']) == 0:
        greeting = f"<div class=\"welcome-greeting\">{time_greeting()} How can I help you today?</div>"
        result = f'<div class="chat" id="chat">{greeting}</div>'
    elif mode == 'instruct':
-        result = generate_instruct_html(history)
+        result = generate_instruct_html(history, last_message_only=last_message_only)
    elif style == 'wpp':
-        result = generate_chat_html(history, name1, name2)
+        result = generate_chat_html(history, name1, name2, last_message_only=last_message_only)
    else:
-        result = generate_cai_chat_html(history, name1, name2, style, character, reset_cache)
+        result = generate_cai_chat_html(history, name1, name2, style, character, reset_cache=reset_cache, last_message_only=last_message_only)

-    return {'html': result}
+    return {'html': result, 'last_message_only': last_message_only}
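A minimal, hypothetical sketch of how a caller inside the webui codebase might use the new last_message_only flag (the history and name values below are placeholders; only chat_html_wrapper's signature is taken from the diff):

from modules.html_generator import chat_html_wrapper

history = {'internal': [['hi', 'hello!']], 'visible': [['hi', 'hello!']], 'metadata': {}}

full = chat_html_wrapper(history, 'You', 'Assistant', mode='instruct', style='wpp', character='')
partial = chat_html_wrapper(history, 'You', 'Assistant', mode='instruct', style='wpp',
                            character='', last_message_only=True)

# full['html'] holds the whole transcript; partial['html'] holds only the newest
# assistant message, and partial['last_message_only'] tells the frontend to morph
# that fragment into the existing DOM instead of replacing everything.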
@@ -121,6 +121,18 @@ class LlamaServer:
            to_ban = [[int(token_id), False] for token_id in state['custom_token_bans'].split(',')]
            payload["logit_bias"] = to_ban

+        # Add image data if present
+        if 'image_attachments' in state:
+            medias = []
+            for attachment in state['image_attachments']:
+                medias.append({
+                    "type": "image",
+                    "data": attachment['image_data']
+                })
+
+            if medias:
+                payload["medias"] = medias
+
        return payload

    def generate_with_streaming(self, prompt, state):
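For orientation, a hypothetical sketch of what the assembled request body could look like once an image attachment is present. Only the "medias" structure is taken from the code above; the other keys and all values are placeholders, not an authoritative llama.cpp server schema:

payload = {
    "prompt": "Describe the attached image.",
    "stream": True,
    "medias": [
        {"type": "image", "data": "<base64-encoded image bytes>"},
    ],
}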
@@ -142,7 +154,7 @@ class LlamaServer:

        if shared.args.verbose:
            logger.info("GENERATE_PARAMS=")
-            printable_payload = {k: v for k, v in payload.items() if k != "prompt"}
+            printable_payload = {k: v for k, v in payload.items() if k not in ["prompt", "image_data"]}
            pprint.PrettyPrinter(indent=4, sort_dicts=False).pprint(printable_payload)
            print()
@@ -409,14 +421,31 @@ class LlamaServer:

def filter_stderr_with_progress(process_stderr):
    progress_pattern = re.compile(r'slot update_slots: id.*progress = (\d+\.\d+)')
+    last_was_progress = False

    try:
        for line in iter(process_stderr.readline, ''):
+            line = line.rstrip('\n\r')  # Remove existing newlines
            progress_match = progress_pattern.search(line)

            if progress_match:
-                sys.stderr.write(line)
+                if last_was_progress:
+                    # Overwrite the previous progress line using carriage return
+                    sys.stderr.write(f'\r{line}')
+                else:
+                    # First progress line - print normally
+                    sys.stderr.write(line)
                sys.stderr.flush()
+                last_was_progress = True
            elif not line.startswith(('srv ', 'slot ')) and 'log_server_r: request: GET /health' not in line:
-                sys.stderr.write(line)
+                if last_was_progress:
+                    # Finish the progress line with a newline, then print the new line
+                    sys.stderr.write(f'\n{line}\n')
+                else:
+                    # Normal line - print with newline
+                    sys.stderr.write(f'{line}\n')
                sys.stderr.flush()
+                last_was_progress = False
+            # For filtered lines, don't change last_was_progress state
    except (ValueError, IOError):
        pass
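The overwrite trick above relies on "\r" returning the cursor to the start of the current terminal line so successive progress reports reuse one line. A tiny standalone sketch of the same idea (illustrative only, with fake progress values):

import sys
import time

for pct in range(0, 101, 20):
    sys.stderr.write(f'\rprompt processing progress = {pct / 100:.2f}')
    sys.stderr.flush()
    time.sleep(0.1)
sys.stderr.write('\n')  # finish the line before printing anything else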
@@ -116,10 +116,13 @@ def unload_model(keep_model_name=False):
        return

    is_llamacpp = (shared.model.__class__.__name__ == 'LlamaServer')
+    if shared.model.__class__.__name__ == 'Exllamav3HF':
+        shared.model.unload()
+
    shared.model = shared.tokenizer = None
    shared.lora_names = []
    shared.model_dirty_from_training = False

    if not is_llamacpp:
        from modules.torch_utils import clear_torch_cache
        clear_torch_cache()
@@ -21,7 +21,7 @@ lora_names = []
# Generation variables
stop_everything = False
generation_lock = None
-processing_message = '*Is typing...*'
+processing_message = ''

# UI variables
gradio = {}
@@ -47,7 +47,6 @@ settings = {
    'max_new_tokens_max': 4096,
    'prompt_lookup_num_tokens': 0,
    'max_tokens_second': 0,
-    'max_updates_second': 12,
    'auto_max_new_tokens': True,
    'ban_eos_token': False,
    'add_bos_token': True,
@@ -65,41 +65,39 @@ def _generate_reply(question, state, stopping_strings=None, is_chat=False, escap
        all_stop_strings += st

    shared.stop_everything = False
-    last_update = -1
    reply = ''
    is_stream = state['stream']
    if len(all_stop_strings) > 0 and not state['stream']:
        state = copy.deepcopy(state)
        state['stream'] = True

-    min_update_interval = 0
-    if state.get('max_updates_second', 0) > 0:
-        min_update_interval = 1 / state['max_updates_second']
-
    # Generate
+    last_update = -1
+    latency_threshold = 1 / 1000
    for reply in generate_func(question, original_question, state, stopping_strings, is_chat=is_chat):
+        cur_time = time.monotonic()
        reply, stop_found = apply_stopping_strings(reply, all_stop_strings)
        if escape_html:
            reply = html.escape(reply)

        if is_stream:
-            cur_time = time.time()
-
            # Limit number of tokens/second to make text readable in real time
            if state['max_tokens_second'] > 0:
                diff = 1 / state['max_tokens_second'] - (cur_time - last_update)
                if diff > 0:
                    time.sleep(diff)

-                last_update = time.time()
+                last_update = time.monotonic()
                yield reply

            # Limit updates to avoid lag in the Gradio UI
            # API updates are not limited
            else:
-                if cur_time - last_update > min_update_interval:
-                    last_update = cur_time
+                # If 'generate_func' takes less than 0.001 seconds to yield the next token
+                # (equivalent to more than 1000 tok/s), assume that the UI is lagging behind and skip yielding
+                if (cur_time - last_update) > latency_threshold:
                    yield reply
+                    last_update = time.monotonic()

        if stop_found or (state['max_tokens_second'] > 0 and shared.stop_everything):
            break
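The new throttle can be read as: only forward a chunk to the UI if more than roughly one millisecond has passed since the last forwarded chunk. A standalone sketch of that pattern with a made-up token generator (illustrative, not from the diff):

import time

def tokens():
    # Hypothetical stand-in for generate_func(): yields growing partial replies very fast.
    text = ""
    for _ in range(10_000):
        text += "x"
        yield text

latency_threshold = 1 / 1000  # same 1 ms constant as in the diff
last_update = -1
for reply in tokens():
    cur_time = time.monotonic()
    if (cur_time - last_update) > latency_threshold:
        print(len(reply))          # stand-in for yielding to the Gradio UI
        last_update = time.monotonic()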
@@ -6,6 +6,7 @@ import yaml

import extensions
from modules import shared
+from modules.chat import load_history

with open(Path(__file__).resolve().parent / '../css/NotoSans/stylesheet.css', 'r') as f:
    css = f.read()
@@ -71,6 +72,7 @@ if not shared.args.old_colors:
        block_background_fill_dark='transparent',
        block_border_color_dark='transparent',
        input_border_color_dark='var(--border-color-dark)',
+        input_border_color_focus_dark='var(--border-color-dark)',
        checkbox_border_color_dark='var(--border-color-dark)',
        border_color_primary_dark='var(--border-color-dark)',
        button_secondary_border_color_dark='var(--border-color-dark)',
@@ -89,6 +91,8 @@ if not shared.args.old_colors:
        checkbox_label_shadow='none',
        block_shadow='none',
        block_shadow_dark='none',
+        input_shadow_focus='none',
+        input_shadow_focus_dark='none',
        button_large_radius='0.375rem',
        button_large_padding='6px 12px',
        input_radius='0.375rem',
@@ -191,7 +195,6 @@ def list_interface_input_elements():
        'max_new_tokens',
        'prompt_lookup_num_tokens',
        'max_tokens_second',
-        'max_updates_second',
        'do_sample',
        'dynamic_temperature',
        'temperature_last',
@@ -267,6 +270,10 @@ def gather_interface_values(*args):
    if not shared.args.multi_user:
        shared.persistent_interface_state = output

+    # Prevent history loss if backend is restarted but UI is not refreshed
+    if output['history'] is None and output['unique_id'] is not None:
+        output['history'] = load_history(output['unique_id'], output['character_menu'], output['mode'])
+
    return output
@@ -18,7 +18,7 @@ def create_ui():
    mu = shared.args.multi_user

    shared.gradio['Chat input'] = gr.State()
-    shared.gradio['history'] = gr.JSON(visible=False)
+    shared.gradio['history'] = gr.State({'internal': [], 'visible': [], 'metadata': {}})

    with gr.Tab('Chat', id='Chat', elem_id='chat-tab'):
        with gr.Row(elem_id='past-chats-row', elem_classes=['pretty_scrollbar']):
@@ -55,7 +55,6 @@ def create_ui():

            with gr.Column(scale=10, elem_id='chat-input-container'):
                shared.gradio['textbox'] = gr.MultimodalTextbox(label='', placeholder='Send a message', file_types=['text', '.pdf'], file_count="multiple", elem_id='chat-input', elem_classes=['add_scrollbar'])
-                shared.gradio['show_controls'] = gr.Checkbox(value=shared.settings['show_controls'], label='Show controls (Ctrl+S)', elem_id='show-controls')
                shared.gradio['typing-dots'] = gr.HTML(value='<div class="typing"><span></span><span class="dot1"></span><span class="dot2"></span></div>', label='typing', elem_id='typing-container')

            with gr.Column(scale=1, elem_id='generate-stop-container'):
@@ -65,21 +64,15 @@ def create_ui():

            # Hover menu buttons
            with gr.Column(elem_id='chat-buttons'):
-                with gr.Row():
-                    shared.gradio['Regenerate'] = gr.Button('Regenerate (Ctrl + Enter)', elem_id='Regenerate')
-                    shared.gradio['Continue'] = gr.Button('Continue (Alt + Enter)', elem_id='Continue')
-                    shared.gradio['Remove last'] = gr.Button('Remove last reply (Ctrl + Shift + Backspace)', elem_id='Remove-last')
-
-                with gr.Row():
-                    shared.gradio['Impersonate'] = gr.Button('Impersonate (Ctrl + Shift + M)', elem_id='Impersonate')
-                    shared.gradio['Send dummy message'] = gr.Button('Send dummy message')
-                    shared.gradio['Send dummy reply'] = gr.Button('Send dummy reply')
-
-                with gr.Row():
-                    shared.gradio['send-chat-to-default'] = gr.Button('Send to Default')
-                    shared.gradio['send-chat-to-notebook'] = gr.Button('Send to Notebook')
+                shared.gradio['Regenerate'] = gr.Button('Regenerate (Ctrl + Enter)', elem_id='Regenerate')
+                shared.gradio['Continue'] = gr.Button('Continue (Alt + Enter)', elem_id='Continue')
+                shared.gradio['Remove last'] = gr.Button('Remove last reply (Ctrl + Shift + Backspace)', elem_id='Remove-last')
+                shared.gradio['Impersonate'] = gr.Button('Impersonate (Ctrl + Shift + M)', elem_id='Impersonate')
+                shared.gradio['Send dummy message'] = gr.Button('Send dummy message')
+                shared.gradio['Send dummy reply'] = gr.Button('Send dummy reply')
+                shared.gradio['send-chat-to-default'] = gr.Button('Send to Default')
+                shared.gradio['send-chat-to-notebook'] = gr.Button('Send to Notebook')
+                shared.gradio['show_controls'] = gr.Checkbox(value=shared.settings['show_controls'], label='Show controls (Ctrl+S)', elem_id='show-controls')

        with gr.Row(elem_id='chat-controls', elem_classes=['pretty_scrollbar']):
            with gr.Column():
@@ -87,7 +80,7 @@ def create_ui():
                shared.gradio['start_with'] = gr.Textbox(label='Start reply with', placeholder='Sure thing!', value=shared.settings['start_with'], elem_classes=['add_scrollbar'])

                with gr.Row():
-                    shared.gradio['enable_web_search'] = gr.Checkbox(value=shared.settings.get('enable_web_search', False), label='Activate web search')
+                    shared.gradio['enable_web_search'] = gr.Checkbox(value=shared.settings.get('enable_web_search', False), label='Activate web search', elem_id='web-search')

                with gr.Row(visible=shared.settings.get('enable_web_search', False)) as shared.gradio['web_search_row']:
                    shared.gradio['web_search_pages'] = gr.Number(value=shared.settings.get('web_search_pages', 3), precision=0, label='Number of pages to download', minimum=1, maximum=10)
@@ -202,7 +195,7 @@ def create_event_handlers():
    shared.reload_inputs = gradio(reload_arr)

    # Morph HTML updates instead of updating everything
-    shared.gradio['display'].change(None, gradio('display'), None, js="(data) => handleMorphdomUpdate(data.html)")
+    shared.gradio['display'].change(None, gradio('display'), None, js="(data) => handleMorphdomUpdate(data)")

    shared.gradio['Generate'].click(
        ui.gather_interface_values, gradio(shared.input_elements), gradio('interface_state')).then(
@@ -1,4 +1,6 @@
import importlib
+import queue
+import threading
import traceback
from functools import partial
from pathlib import Path
@@ -205,48 +207,51 @@ def load_lora_wrapper(selected_loras):


def download_model_wrapper(repo_id, specific_file, progress=gr.Progress(), return_links=False, check=False):
+    downloader_module = importlib.import_module("download-model")
+    downloader = downloader_module.ModelDownloader()
+    update_queue = queue.Queue()
+
    try:
        # Handle direct GGUF URLs
        if repo_id.startswith("https://") and ("huggingface.co" in repo_id) and (repo_id.endswith(".gguf") or repo_id.endswith(".gguf?download=true")):
            try:
                path = repo_id.split("huggingface.co/")[1]
-
-                # Extract the repository ID (first two parts of the path)
                parts = path.split("/")
                if len(parts) >= 2:
                    extracted_repo_id = f"{parts[0]}/{parts[1]}"
-
-                    # Extract the filename (last part of the path)
-                    filename = repo_id.split("/")[-1]
-                    if "?download=true" in filename:
-                        filename = filename.replace("?download=true", "")
+                    filename = repo_id.split("/")[-1].replace("?download=true", "")

                    repo_id = extracted_repo_id
                    specific_file = filename
-            except:
-                pass
+            except Exception as e:
+                yield f"Error parsing GGUF URL: {e}"
+                progress(0.0)
+                return

-        if repo_id == "":
-            yield ("Please enter a model path")
+        if not repo_id:
+            yield "Please enter a model path."
+            progress(0.0)
            return

        repo_id = repo_id.strip()
        specific_file = specific_file.strip()
-        downloader = importlib.import_module("download-model").ModelDownloader()

-        progress(0.0)
+        progress(0.0, "Preparing download...")

        model, branch = downloader.sanitize_model_and_branch_names(repo_id, None)
-        yield ("Getting the download links from Hugging Face")
+        yield "Getting download links from Hugging Face..."
        links, sha256, is_lora, is_llamacpp = downloader.get_download_links_from_huggingface(model, branch, text_only=False, specific_file=specific_file)

+        if not links:
+            yield "No files found to download for the given model/criteria."
+            progress(0.0)
+            return
+
        # Check for multiple GGUF files
        gguf_files = [link for link in links if link.lower().endswith('.gguf')]
        if len(gguf_files) > 1 and not specific_file:
            output = "Multiple GGUF files found. Please copy one of the following filenames to the 'File name' field:\n\n```\n"
            for link in gguf_files:
                output += f"{Path(link).name}\n"

            output += "```"
            yield output
            return
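As a concrete illustration of the URL handling above (all values are made up), a direct Hugging Face GGUF link is split into a repository ID and a specific file name:

url = "https://huggingface.co/org/repo/resolve/main/model.Q4_K_M.gguf?download=true"
path = url.split("huggingface.co/")[1]
parts = path.split("/")
repo_id = f"{parts[0]}/{parts[1]}"                                 # "org/repo"
specific_file = url.split("/")[-1].replace("?download=true", "")   # "model.Q4_K_M.gguf"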
@@ -255,17 +260,13 @@ def download_model_wrapper(repo_id, specific_file, progress=gr.Progress(), retur
            output = "```\n"
            for link in links:
                output += f"{Path(link).name}" + "\n"

            output += "```"
            yield output
            return

-        yield ("Getting the output folder")
+        yield "Determining output folder..."
        output_folder = downloader.get_output_folder(
-            model,
-            branch,
-            is_lora,
-            is_llamacpp=is_llamacpp,
+            model, branch, is_lora, is_llamacpp=is_llamacpp,
            model_dir=shared.args.model_dir if shared.args.model_dir != shared.args_defaults.model_dir else None
        )
@@ -275,19 +276,65 @@ def download_model_wrapper(repo_id, specific_file, progress=gr.Progress(), retur
            output_folder = Path(shared.args.lora_dir)

        if check:
-            progress(0.5)
-            yield ("Checking previously downloaded files")
+            yield "Checking previously downloaded files..."
+            progress(0.5, "Verifying files...")
            downloader.check_model_files(model, branch, links, sha256, output_folder)
-            progress(1.0)
-        else:
-            yield (f"Downloading file{'s' if len(links) > 1 else ''} to `{output_folder}/`")
-            downloader.download_model_files(model, branch, links, sha256, output_folder, progress_bar=progress, threads=4, is_llamacpp=is_llamacpp)
-
-            yield (f"Model successfully saved to `{output_folder}/`.")
-    except:
-        progress(1.0)
-        yield traceback.format_exc().replace('\n', '\n\n')
+            progress(1.0, "Verification complete.")
+            yield "File check complete."
+            return
+
+        yield ""
+        progress(0.0, "Download starting...")
+
+        def downloader_thread_target():
+            try:
+                downloader.download_model_files(
+                    model, branch, links, sha256, output_folder,
+                    progress_queue=update_queue,
+                    threads=4,
+                    is_llamacpp=is_llamacpp,
+                    specific_file=specific_file
+                )
+                update_queue.put(("COMPLETED", f"Model successfully saved to `{output_folder}/`."))
+            except Exception as e:
+                tb_str = traceback.format_exc().replace('\n', '\n\n')
+                update_queue.put(("ERROR", tb_str))
+
+        download_thread = threading.Thread(target=downloader_thread_target)
+        download_thread.start()
+
+        while True:
+            try:
+                message = update_queue.get(timeout=0.2)
+                if not isinstance(message, tuple) or len(message) != 2:
+                    continue
+
+                msg_identifier, data = message
+
+                if msg_identifier == "COMPLETED":
+                    progress(1.0, "Download complete!")
+                    yield data
+                    break
+                elif msg_identifier == "ERROR":
+                    progress(0.0, "Error occurred")
+                    yield data
+                    break
+                elif isinstance(msg_identifier, float):
+                    progress_value = msg_identifier
+                    description_str = data
+                    progress(progress_value, f"Downloading: {description_str}")
+
+            except queue.Empty:
+                if not download_thread.is_alive():
+                    yield "Download process finished."
+                    break
+
+        download_thread.join()
+
+    except Exception as e:
+        progress(0.0)
+        tb_str = traceback.format_exc().replace('\n', '\n\n')
+        yield tb_str


def update_truncation_length(current_length, state):
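The loop above implements a simple producer/consumer protocol: the worker thread pushes either a (float, description) progress tuple or a ("COMPLETED"/"ERROR", message) tuple onto the queue, and the UI-facing generator drains it. A self-contained sketch of the same pattern with hypothetical progress values and no real downloader:

import queue
import threading
import time

update_queue = queue.Queue()

def worker():
    # Stand-in for download_model_files(): report progress, then completion.
    for done in (0.25, 0.5, 0.75, 1.0):
        time.sleep(0.1)
        update_queue.put((done, f"{int(done * 100)}% of 1 file"))
    update_queue.put(("COMPLETED", "Model successfully saved."))

thread = threading.Thread(target=worker)
thread.start()

while True:
    try:
        msg_identifier, data = update_queue.get(timeout=0.2)
        if msg_identifier in ("COMPLETED", "ERROR"):
            print(data)
            break
        print(f"Downloading: {data}")
    except queue.Empty:
        if not thread.is_alive():
            break

thread.join()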
@@ -71,8 +71,6 @@ def create_ui(default_preset):
            shared.gradio['max_new_tokens'] = gr.Slider(minimum=shared.settings['max_new_tokens_min'], maximum=shared.settings['max_new_tokens_max'], value=shared.settings['max_new_tokens'], step=1, label='max_new_tokens', info='⚠️ Setting this too high can cause prompt truncation.')
            shared.gradio['prompt_lookup_num_tokens'] = gr.Slider(value=shared.settings['prompt_lookup_num_tokens'], minimum=0, maximum=10, step=1, label='prompt_lookup_num_tokens', info='Activates Prompt Lookup Decoding.')
            shared.gradio['max_tokens_second'] = gr.Slider(value=shared.settings['max_tokens_second'], minimum=0, maximum=20, step=1, label='Maximum tokens/second', info='To make text readable in real time.')
-            shared.gradio['max_updates_second'] = gr.Slider(value=shared.settings['max_updates_second'], minimum=0, maximum=24, step=1, label='Maximum UI updates/second', info='Set this if you experience lag in the UI during streaming.')

        with gr.Column():
            with gr.Row():
                with gr.Column():
253
one_click.py
@@ -70,12 +70,8 @@ def is_installed():
def cpu_has_avx2():
    try:
        import cpuinfo

        info = cpuinfo.get_cpu_info()
-        if 'avx2' in info['flags']:
-            return True
-        else:
-            return False
+        return 'avx2' in info['flags']
    except:
        return True
@@ -83,30 +79,112 @@
def cpu_has_amx():
    try:
        import cpuinfo

        info = cpuinfo.get_cpu_info()
-        if 'amx' in info['flags']:
-            return True
-        else:
-            return False
+        return 'amx' in info['flags']
    except:
        return True


-def torch_version():
-    site_packages_path = None
-    for sitedir in site.getsitepackages():
-        if "site-packages" in sitedir and conda_env_path in sitedir:
-            site_packages_path = sitedir
-            break
-
-    if site_packages_path:
-        torch_version_file = open(os.path.join(site_packages_path, 'torch', 'version.py')).read().splitlines()
-        torver = [line for line in torch_version_file if line.startswith('__version__')][0].split('__version__ = ')[1].strip("'")
-    else:
-        from torch import __version__ as torver
-
-    return torver
+def load_state():
+    """Load installer state from JSON file"""
+    if os.path.exists(state_file):
+        try:
+            with open(state_file, 'r') as f:
+                return json.load(f)
+        except:
+            return {}
+    return {}
+
+
+def save_state(state):
+    """Save installer state to JSON file"""
+    with open(state_file, 'w') as f:
+        json.dump(state, f)
+
+
+def get_gpu_choice():
+    """Get GPU choice from state file or ask user"""
+    state = load_state()
+    gpu_choice = state.get('gpu_choice')
+
+    if not gpu_choice:
+        if "GPU_CHOICE" in os.environ:
+            choice = os.environ["GPU_CHOICE"].upper()
+            print_big_message(f"Selected GPU choice \"{choice}\" based on the GPU_CHOICE environment variable.")
+        else:
+            choice = get_user_choice(
+                "What is your GPU?",
+                {
+                    'A': 'NVIDIA - CUDA 12.4',
+                    'B': 'AMD - Linux/macOS only, requires ROCm 6.2.4',
+                    'C': 'Apple M Series',
+                    'D': 'Intel Arc (beta)',
+                    'N': 'CPU mode'
+                },
+            )
+
+        # Convert choice to GPU name
+        gpu_choice = {"A": "NVIDIA", "B": "AMD", "C": "APPLE", "D": "INTEL", "N": "NONE"}[choice]
+
+        # Save choice to state
+        state['gpu_choice'] = gpu_choice
+        save_state(state)
+
+    return gpu_choice
+
+
+def get_pytorch_install_command(gpu_choice):
+    """Get PyTorch installation command based on GPU choice"""
+    base_cmd = f"python -m pip install torch=={TORCH_VERSION} torchvision=={TORCHVISION_VERSION} torchaudio=={TORCHAUDIO_VERSION} "
+
+    if gpu_choice == "NVIDIA":
+        return base_cmd + "--index-url https://download.pytorch.org/whl/cu124"
+    elif gpu_choice == "AMD":
+        return base_cmd + "--index-url https://download.pytorch.org/whl/rocm6.2.4"
+    elif gpu_choice in ["APPLE", "NONE"]:
+        return base_cmd + "--index-url https://download.pytorch.org/whl/cpu"
+    elif gpu_choice == "INTEL":
+        if is_linux():
+            return "python -m pip install torch==2.1.0a0 torchvision==0.16.0a0 torchaudio==2.1.0a0 intel-extension-for-pytorch==2.1.10+xpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/"
+        else:
+            return "python -m pip install torch==2.1.0a0 torchvision==0.16.0a0 torchaudio==2.1.0a0 intel-extension-for-pytorch==2.1.10 --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/"
+    else:
+        return base_cmd
+
+
+def get_pytorch_update_command(gpu_choice):
+    """Get PyTorch update command based on GPU choice"""
+    base_cmd = f"python -m pip install --upgrade torch=={TORCH_VERSION} torchvision=={TORCHVISION_VERSION} torchaudio=={TORCHAUDIO_VERSION}"
+
+    if gpu_choice == "NVIDIA":
+        return f"{base_cmd} --index-url https://download.pytorch.org/whl/cu124"
+    elif gpu_choice == "AMD":
+        return f"{base_cmd} --index-url https://download.pytorch.org/whl/rocm6.2.4"
+    elif gpu_choice in ["APPLE", "NONE"]:
+        return f"{base_cmd} --index-url https://download.pytorch.org/whl/cpu"
+    elif gpu_choice == "INTEL":
+        intel_extension = "intel-extension-for-pytorch==2.1.10+xpu" if is_linux() else "intel-extension-for-pytorch==2.1.10"
+        return f"{base_cmd} {intel_extension} --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/"
+    else:
+        return base_cmd
+
+
+def get_requirements_file(gpu_choice):
+    """Get requirements file path based on GPU choice"""
+    requirements_base = os.path.join("requirements", "full")
+
+    if gpu_choice == "AMD":
+        file_name = f"requirements_amd{'_noavx2' if not cpu_has_avx2() else ''}.txt"
+    elif gpu_choice == "APPLE":
+        file_name = f"requirements_apple_{'intel' if is_x86_64() else 'silicon'}.txt"
+    elif gpu_choice in ["INTEL", "NONE"]:
+        file_name = f"requirements_cpu_only{'_noavx2' if not cpu_has_avx2() else ''}.txt"
+    elif gpu_choice == "NVIDIA":
+        file_name = f"requirements{'_noavx2' if not cpu_has_avx2() else ''}.txt"
+    else:
+        raise ValueError(f"Unknown GPU choice: {gpu_choice}")
+
+    return os.path.join(requirements_base, file_name)


def get_current_commit():
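Taken together, load_state(), save_state() and get_gpu_choice() persist the installer's answers in a small JSON file so later runs can reuse them instead of prompting again. A hypothetical round trip (the file name and contents below are illustrative; the real path comes from the state_file variable in the surrounding script):

import json

state_file = "installer_state_example.json"

state = {"gpu_choice": "NVIDIA", "last_installed_commit": "abc1234"}
with open(state_file, 'w') as f:
    json.dump(state, f)

with open(state_file, 'r') as f:
    reloaded = json.load(f)

# Later runs branch on the saved choice instead of asking the user again.
assert reloaded["gpu_choice"] == "NVIDIA"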
@@ -209,28 +287,8 @@ def get_user_choice(question, options_dict):

def update_pytorch_and_python():
    print_big_message("Checking for PyTorch updates.")
-
-    # Update the Python version. Left here for future reference in case this becomes necessary.
-    # print_big_message("Checking for PyTorch and Python updates.")
-    # current_python_version = f"{sys.version_info.major}.{sys.version_info.minor}"
-    # if current_python_version != PYTHON_VERSION:
-    #     run_cmd(f"conda install -y python={PYTHON_VERSION}", assert_success=True, environment=True)
-
-    torver = torch_version()
-    base_cmd = f"python -m pip install --upgrade torch=={TORCH_VERSION} torchvision=={TORCHVISION_VERSION} torchaudio=={TORCHAUDIO_VERSION}"
-
-    if "+cu" in torver:
-        install_cmd = f"{base_cmd} --index-url https://download.pytorch.org/whl/cu124"
-    elif "+rocm" in torver:
-        install_cmd = f"{base_cmd} --index-url https://download.pytorch.org/whl/rocm6.2.4"
-    elif "+cpu" in torver:
-        install_cmd = f"{base_cmd} --index-url https://download.pytorch.org/whl/cpu"
-    elif "+cxx11" in torver:
-        intel_extension = "intel-extension-for-pytorch==2.1.10+xpu" if is_linux() else "intel-extension-for-pytorch==2.1.10"
-        install_cmd = f"{base_cmd} {intel_extension} --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/"
-    else:
-        install_cmd = base_cmd
-
+    gpu_choice = get_gpu_choice()
+    install_cmd = get_pytorch_update_command(gpu_choice)
    run_cmd(install_cmd, assert_success=True, environment=True)
@@ -256,43 +314,11 @@ def install_webui():
    if os.path.isfile(state_file):
        os.remove(state_file)

-    # Ask the user for the GPU vendor
-    if "GPU_CHOICE" in os.environ:
-        choice = os.environ["GPU_CHOICE"].upper()
-        print_big_message(f"Selected GPU choice \"{choice}\" based on the GPU_CHOICE environment variable.")
-
-        # Warn about changed meanings and handle old choices
-        if choice == "B":
-            print_big_message("Warning: GPU_CHOICE='B' now means 'AMD' in the new version.")
-        elif choice == "C":
-            print_big_message("Warning: GPU_CHOICE='C' now means 'Apple M Series' in the new version.")
-        elif choice == "D":
-            print_big_message("Warning: GPU_CHOICE='D' now means 'Intel Arc' in the new version.")
-    else:
-        choice = get_user_choice(
-            "What is your GPU?",
-            {
-                'A': 'NVIDIA - CUDA 12.4',
-                'B': 'AMD - Linux/macOS only, requires ROCm 6.2.4',
-                'C': 'Apple M Series',
-                'D': 'Intel Arc (beta)',
-                'N': 'CPU mode'
-            },
-        )
-
-    # Convert choices to GPU names for compatibility
-    gpu_choice_to_name = {
-        "A": "NVIDIA",
-        "B": "AMD",
-        "C": "APPLE",
-        "D": "INTEL",
-        "N": "NONE"
-    }
-
-    selected_gpu = gpu_choice_to_name[choice]
+    # Get GPU choice and save it to state
+    gpu_choice = get_gpu_choice()

    # Write a flag to CMD_FLAGS.txt for CPU mode
-    if selected_gpu == "NONE":
+    if gpu_choice == "NONE":
        cmd_flags_path = os.path.join(script_dir, "user_data", "CMD_FLAGS.txt")
        with open(cmd_flags_path, 'r+') as cmd_flags_file:
            if "--cpu" not in cmd_flags_file.read():
@@ -300,34 +326,20 @@ def install_webui():
                cmd_flags_file.write("\n--cpu\n")

    # Handle CUDA version display
-    elif any((is_windows(), is_linux())) and selected_gpu == "NVIDIA":
+    elif any((is_windows(), is_linux())) and gpu_choice == "NVIDIA":
        print("CUDA: 12.4")

    # No PyTorch for AMD on Windows (?)
-    elif is_windows() and selected_gpu == "AMD":
+    elif is_windows() and gpu_choice == "AMD":
        print("PyTorch setup on Windows is not implemented yet. Exiting...")
        sys.exit(1)

-    # Find the Pytorch installation command
-    install_pytorch = f"python -m pip install torch=={TORCH_VERSION} torchvision=={TORCHVISION_VERSION} torchaudio=={TORCHAUDIO_VERSION} "
-
-    if selected_gpu == "NVIDIA":
-        install_pytorch += "--index-url https://download.pytorch.org/whl/cu124"
-    elif selected_gpu == "AMD":
-        install_pytorch += "--index-url https://download.pytorch.org/whl/rocm6.2.4"
-    elif selected_gpu in ["APPLE", "NONE"]:
-        install_pytorch += "--index-url https://download.pytorch.org/whl/cpu"
-    elif selected_gpu == "INTEL":
-        if is_linux():
-            install_pytorch = "python -m pip install torch==2.1.0a0 torchvision==0.16.0a0 torchaudio==2.1.0a0 intel-extension-for-pytorch==2.1.10+xpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/"
-        else:
-            install_pytorch = "python -m pip install torch==2.1.0a0 torchvision==0.16.0a0 torchaudio==2.1.0a0 intel-extension-for-pytorch==2.1.10 --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/"
-
    # Install Git and then Pytorch
    print_big_message("Installing PyTorch.")
+    install_pytorch = get_pytorch_install_command(gpu_choice)
    run_cmd(f"conda install -y ninja git && {install_pytorch} && python -m pip install py-cpuinfo==9.0.0", assert_success=True, environment=True)

-    if selected_gpu == "INTEL":
+    if gpu_choice == "INTEL":
        # Install oneAPI dependencies via conda
        print_big_message("Installing Intel oneAPI runtime libraries.")
        run_cmd("conda install -y -c https://software.repos.intel.com/python/conda/ -c conda-forge dpcpp-cpp-rt=2024.0 mkl-dpcpp=2024.0", environment=True)
@@ -349,31 +361,15 @@ def update_requirements(initial_installation=False, pull=True):
        assert_success=True
    )

-    torver = torch_version()
-    requirements_base = os.path.join("requirements", "full")
-
-    if "+rocm" in torver:
-        file_name = f"requirements_amd{'_noavx2' if not cpu_has_avx2() else ''}.txt"
-    elif "+cpu" in torver or "+cxx11" in torver:
-        file_name = f"requirements_cpu_only{'_noavx2' if not cpu_has_avx2() else ''}.txt"
-    elif is_macos():
-        file_name = f"requirements_apple_{'intel' if is_x86_64() else 'silicon'}.txt"
-    else:
-        file_name = f"requirements{'_noavx2' if not cpu_has_avx2() else ''}.txt"
-
-    requirements_file = os.path.join(requirements_base, file_name)
-
-    # Load state from JSON file
    current_commit = get_current_commit()
-    wheels_changed = False
-    if os.path.exists(state_file):
-        with open(state_file, 'r') as f:
-            last_state = json.load(f)
-
-        if 'wheels_changed' in last_state or last_state.get('last_installed_commit') != current_commit:
+    wheels_changed = not os.path.exists(state_file)
+    if not wheels_changed:
+        state = load_state()
+        if 'wheels_changed' in state or state.get('last_installed_commit') != current_commit:
            wheels_changed = True
-    else:
-        wheels_changed = True
+
+    gpu_choice = get_gpu_choice()
+    requirements_file = get_requirements_file(gpu_choice)

    if pull:
        # Read .whl lines before pulling
@@ -409,19 +405,17 @@ def update_requirements(initial_installation=False, pull=True):
                 print_big_message(f"File '{file}' was updated during 'git pull'. Please run the script again.")

                 # Save state before exiting
-                current_state = {}
+                state = load_state()
                 if wheels_changed:
-                    current_state['wheels_changed'] = True
-
-                with open(state_file, 'w') as f:
-                    json.dump(current_state, f)
+                    state['wheels_changed'] = True
+                save_state(state)

                 sys.exit(1)

     # Save current state
-    current_state = {'last_installed_commit': current_commit}
-    with open(state_file, 'w') as f:
-        json.dump(current_state, f)
+    state = load_state()
+    state['last_installed_commit'] = current_commit
+    state.pop('wheels_changed', None)  # Remove wheels_changed flag
+    save_state(state)

     if os.environ.get("INSTALL_EXTENSIONS", "").lower() in ("yes", "y", "true", "1", "t", "on"):
         install_extensions_requirements()
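`load_state()` and `save_state()` replace the direct `json.load`/`json.dump` calls that the removed lines performed on `state_file`. A minimal sketch consistent with that usage (the actual definitions live elsewhere in the installer and are not shown in this diff):

```python
import json
import os

# Minimal JSON-backed state helpers mirroring the removed open()/json calls;
# `state_file` is the installer's state path, defined in the same module.
def load_state():
    if not os.path.exists(state_file):
        return {}
    with open(state_file, 'r') as f:
        return json.load(f)


def save_state(state):
    with open(state_file, 'w') as f:
        json.dump(state, f)
```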
@@ -432,11 +426,10 @@ def update_requirements(initial_installation=False, pull=True):
     # Update PyTorch
     if not initial_installation:
         update_pytorch_and_python()
-        torver = torch_version()
         clean_outdated_pytorch_cuda_dependencies()

    print_big_message(f"Installing webui requirements from file: {requirements_file}")
-    print(f"TORCH: {torver}\n")
+    print(f"GPU Choice: {gpu_choice}\n")

     # Prepare the requirements file
     textgen_requirements = open(requirements_file).read().splitlines()
@@ -16,6 +16,7 @@ Pillow>=9.5.0
 psutil
 pydantic==2.8.2
 PyPDF2==3.0.1
+python-docx==1.1.2
 pyyaml
 requests
 rich
@@ -33,12 +34,12 @@ sse-starlette==1.6.5
 tiktoken

 # CUDA wheels
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0+cu124-py3-none-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.16.0/llama_cpp_binaries-0.16.0+cu124-py3-none-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0+cu124-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.16.0/llama_cpp_binaries-0.16.0+cu124-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
-https://github.com/oobabooga/exllamav3/releases/download/v0.0.1a9/exllamav3-0.0.1a9+cu124.torch2.6.0-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
+https://github.com/oobabooga/exllamav3/releases/download/v0.0.3/exllamav3-0.0.3+cu124.torch2.6.0-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
-https://github.com/oobabooga/exllamav3/releases/download/v0.0.1a9/exllamav3-0.0.1a9+cu124.torch2.6.0-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
+https://github.com/oobabooga/exllamav3/releases/download/v0.0.3/exllamav3-0.0.3+cu124.torch2.6.0-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
-https://github.com/turboderp-org/exllamav2/releases/download/v0.2.9/exllamav2-0.2.9+cu124.torch2.6.0-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
+https://github.com/turboderp-org/exllamav2/releases/download/v0.3.1/exllamav2-0.3.1+cu124.torch2.6.0-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
-https://github.com/turboderp-org/exllamav2/releases/download/v0.2.9/exllamav2-0.2.9+cu124.torch2.6.0-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
+https://github.com/turboderp-org/exllamav2/releases/download/v0.3.1/exllamav2-0.3.1+cu124.torch2.6.0-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
-https://github.com/turboderp-org/exllamav2/releases/download/v0.2.9/exllamav2-0.2.9-py3-none-any.whl; platform_system == "Linux" and platform_machine != "x86_64"
+https://github.com/turboderp-org/exllamav2/releases/download/v0.3.1/exllamav2-0.3.1-py3-none-any.whl; platform_system == "Linux" and platform_machine != "x86_64"
-https://github.com/oobabooga/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu124torch2.6.0cxx11abiFALSE-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
+https://github.com/kingbri1/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu124torch2.6.0cxx11abiFALSE-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
 https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
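Each wheel URL above carries a PEP 508 environment marker after the semicolon, so pip installs only the lines that match the current OS, architecture, and Python version. For illustration, the same markers can be evaluated with the `packaging` library (the example marker is chosen arbitrarily):

```python
from packaging.markers import Marker

# Evaluates the marker against the running interpreter and platform,
# which is how pip decides whether a requirement line applies.
marker = Marker('platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"')
print(marker.evaluate())  # True only on x86_64 Linux running Python 3.11
```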
@@ -15,6 +15,7 @@ Pillow>=9.5.0
 psutil
 pydantic==2.8.2
 PyPDF2==3.0.1
+python-docx==1.1.2
 pyyaml
 requests
 rich
@@ -32,7 +33,7 @@ sse-starlette==1.6.5
 tiktoken

 # AMD wheels
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0+vulkan-py3-none-win_amd64.whl; platform_system == "Windows"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.16.0/llama_cpp_binaries-0.16.0+vulkan-py3-none-win_amd64.whl; platform_system == "Windows"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0+vulkan-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.16.0/llama_cpp_binaries-0.16.0+vulkan-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
-https://github.com/turboderp-org/exllamav2/releases/download/v0.2.9/exllamav2-0.2.9+rocm6.2.4.torch2.6.0-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
+https://github.com/turboderp-org/exllamav2/releases/download/v0.3.1/exllamav2-0.3.1+rocm6.2.4.torch2.6.0-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
-https://github.com/turboderp-org/exllamav2/releases/download/v0.2.9/exllamav2-0.2.9-py3-none-any.whl; platform_system != "Darwin" and platform_machine != "x86_64"
+https://github.com/turboderp-org/exllamav2/releases/download/v0.3.1/exllamav2-0.3.1-py3-none-any.whl; platform_system != "Darwin" and platform_machine != "x86_64"
@@ -15,6 +15,7 @@ Pillow>=9.5.0
 psutil
 pydantic==2.8.2
 PyPDF2==3.0.1
+python-docx==1.1.2
 pyyaml
 requests
 rich
@@ -32,7 +33,7 @@ sse-starlette==1.6.5
 tiktoken

 # AMD wheels
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0+vulkanavx-py3-none-win_amd64.whl; platform_system == "Windows"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.16.0/llama_cpp_binaries-0.16.0+vulkanavx-py3-none-win_amd64.whl; platform_system == "Windows"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0+vulkanavx-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.16.0/llama_cpp_binaries-0.16.0+vulkanavx-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
-https://github.com/turboderp-org/exllamav2/releases/download/v0.2.9/exllamav2-0.2.9+rocm6.2.4.torch2.6.0-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
+https://github.com/turboderp-org/exllamav2/releases/download/v0.3.1/exllamav2-0.3.1+rocm6.2.4.torch2.6.0-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
-https://github.com/turboderp-org/exllamav2/releases/download/v0.2.9/exllamav2-0.2.9-py3-none-any.whl; platform_system != "Darwin" and platform_machine != "x86_64"
+https://github.com/turboderp-org/exllamav2/releases/download/v0.3.1/exllamav2-0.3.1-py3-none-any.whl; platform_system != "Darwin" and platform_machine != "x86_64"
@@ -15,6 +15,7 @@ Pillow>=9.5.0
 psutil
 pydantic==2.8.2
 PyPDF2==3.0.1
+python-docx==1.1.2
 pyyaml
 requests
 rich
@@ -32,7 +33,7 @@ sse-starlette==1.6.5
 tiktoken

 # Mac wheels
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0-py3-none-macosx_15_0_x86_64.whl; platform_system == "Darwin" and platform_release >= "24.0.0" and platform_release < "25.0.0" and python_version == "3.11"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.16.0/llama_cpp_binaries-0.16.0-py3-none-macosx_15_0_x86_64.whl; platform_system == "Darwin" and platform_release >= "24.0.0" and platform_release < "25.0.0" and python_version == "3.11"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0-py3-none-macosx_14_0_x86_64.whl; platform_system == "Darwin" and platform_release >= "23.0.0" and platform_release < "24.0.0" and python_version == "3.11"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.16.0/llama_cpp_binaries-0.16.0-py3-none-macosx_14_0_x86_64.whl; platform_system == "Darwin" and platform_release >= "23.0.0" and platform_release < "24.0.0" and python_version == "3.11"
-https://github.com/oobabooga/exllamav3/releases/download/v0.0.1a9/exllamav3-0.0.1a9-py3-none-any.whl
+https://github.com/oobabooga/exllamav3/releases/download/v0.0.3/exllamav3-0.0.3-py3-none-any.whl
-https://github.com/turboderp-org/exllamav2/releases/download/v0.2.9/exllamav2-0.2.9-py3-none-any.whl
+https://github.com/turboderp-org/exllamav2/releases/download/v0.3.1/exllamav2-0.3.1-py3-none-any.whl
@@ -15,6 +15,7 @@ Pillow>=9.5.0
 psutil
 pydantic==2.8.2
 PyPDF2==3.0.1
+python-docx==1.1.2
 pyyaml
 requests
 rich
@@ -32,8 +33,8 @@ sse-starlette==1.6.5
 tiktoken

 # Mac wheels
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0-py3-none-macosx_15_0_arm64.whl; platform_system == "Darwin" and platform_release >= "24.0.0" and platform_release < "25.0.0" and python_version == "3.11"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.16.0/llama_cpp_binaries-0.16.0-py3-none-macosx_15_0_arm64.whl; platform_system == "Darwin" and platform_release >= "24.0.0" and platform_release < "25.0.0" and python_version == "3.11"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0-py3-none-macosx_14_0_arm64.whl; platform_system == "Darwin" and platform_release >= "23.0.0" and platform_release < "24.0.0" and python_version == "3.11"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.16.0/llama_cpp_binaries-0.16.0-py3-none-macosx_14_0_arm64.whl; platform_system == "Darwin" and platform_release >= "23.0.0" and platform_release < "24.0.0" and python_version == "3.11"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0-py3-none-macosx_13_0_arm64.whl; platform_system == "Darwin" and platform_release >= "22.0.0" and platform_release < "23.0.0" and python_version == "3.11"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.16.0/llama_cpp_binaries-0.16.0-py3-none-macosx_13_0_arm64.whl; platform_system == "Darwin" and platform_release >= "22.0.0" and platform_release < "23.0.0" and python_version == "3.11"
-https://github.com/oobabooga/exllamav3/releases/download/v0.0.1a9/exllamav3-0.0.1a9-py3-none-any.whl
+https://github.com/oobabooga/exllamav3/releases/download/v0.0.3/exllamav3-0.0.3-py3-none-any.whl
-https://github.com/turboderp-org/exllamav2/releases/download/v0.2.9/exllamav2-0.2.9-py3-none-any.whl
+https://github.com/turboderp-org/exllamav2/releases/download/v0.3.1/exllamav2-0.3.1-py3-none-any.whl
@@ -15,6 +15,7 @@ Pillow>=9.5.0
 psutil
 pydantic==2.8.2
 PyPDF2==3.0.1
+python-docx==1.1.2
 pyyaml
 requests
 rich
@@ -32,5 +33,5 @@ sse-starlette==1.6.5
 tiktoken

 # llama.cpp (CPU only, AVX2)
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0+cpuavx2-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.16.0/llama_cpp_binaries-0.16.0+cpuavx2-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0+cpuavx2-py3-none-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.16.0/llama_cpp_binaries-0.16.0+cpuavx2-py3-none-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
@@ -15,6 +15,7 @@ Pillow>=9.5.0
 psutil
 pydantic==2.8.2
 PyPDF2==3.0.1
+python-docx==1.1.2
 pyyaml
 requests
 rich
@@ -32,5 +33,5 @@ sse-starlette==1.6.5
 tiktoken

 # llama.cpp (CPU only, no AVX2)
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0+cpuavx-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.16.0/llama_cpp_binaries-0.16.0+cpuavx-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0+cpuavx-py3-none-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.16.0/llama_cpp_binaries-0.16.0+cpuavx-py3-none-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
@@ -16,6 +16,7 @@ Pillow>=9.5.0
 psutil
 pydantic==2.8.2
 PyPDF2==3.0.1
+python-docx==1.1.2
 pyyaml
 requests
 rich
@@ -33,12 +34,12 @@ sse-starlette==1.6.5
 tiktoken

 # CUDA wheels
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0+cu124avx-py3-none-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.16.0/llama_cpp_binaries-0.16.0+cu124avx-py3-none-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0+cu124avx-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.16.0/llama_cpp_binaries-0.16.0+cu124avx-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
-https://github.com/oobabooga/exllamav3/releases/download/v0.0.1a9/exllamav3-0.0.1a9+cu124.torch2.6.0-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
+https://github.com/oobabooga/exllamav3/releases/download/v0.0.3/exllamav3-0.0.3+cu124.torch2.6.0-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
-https://github.com/oobabooga/exllamav3/releases/download/v0.0.1a9/exllamav3-0.0.1a9+cu124.torch2.6.0-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
+https://github.com/oobabooga/exllamav3/releases/download/v0.0.3/exllamav3-0.0.3+cu124.torch2.6.0-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
-https://github.com/turboderp-org/exllamav2/releases/download/v0.2.9/exllamav2-0.2.9+cu124.torch2.6.0-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
+https://github.com/turboderp-org/exllamav2/releases/download/v0.3.1/exllamav2-0.3.1+cu124.torch2.6.0-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
-https://github.com/turboderp-org/exllamav2/releases/download/v0.2.9/exllamav2-0.2.9+cu124.torch2.6.0-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
+https://github.com/turboderp-org/exllamav2/releases/download/v0.3.1/exllamav2-0.3.1+cu124.torch2.6.0-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
-https://github.com/turboderp-org/exllamav2/releases/download/v0.2.9/exllamav2-0.2.9-py3-none-any.whl; platform_system == "Linux" and platform_machine != "x86_64"
+https://github.com/turboderp-org/exllamav2/releases/download/v0.3.1/exllamav2-0.3.1-py3-none-any.whl; platform_system == "Linux" and platform_machine != "x86_64"
-https://github.com/oobabooga/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu124torch2.6.0cxx11abiFALSE-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
+https://github.com/kingbri1/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu124torch2.6.0cxx11abiFALSE-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
 https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
@@ -15,6 +15,7 @@ Pillow>=9.5.0
 psutil
 pydantic==2.8.2
 PyPDF2==3.0.1
+python-docx==1.1.2
 pyyaml
 requests
 rich
@@ -7,6 +7,7 @@ markdown
 numpy==1.26.*
 pydantic==2.8.2
 PyPDF2==3.0.1
+python-docx==1.1.2
 pyyaml
 requests
 rich
@@ -18,5 +19,5 @@ sse-starlette==1.6.5
 tiktoken

 # CUDA wheels
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0+cu124-py3-none-win_amd64.whl; platform_system == "Windows"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.16.0/llama_cpp_binaries-0.16.0+cu124-py3-none-win_amd64.whl; platform_system == "Windows"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0+cu124-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.16.0/llama_cpp_binaries-0.16.0+cu124-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
@@ -7,6 +7,7 @@ markdown
 numpy==1.26.*
 pydantic==2.8.2
 PyPDF2==3.0.1
+python-docx==1.1.2
 pyyaml
 requests
 rich
@@ -18,5 +19,5 @@ sse-starlette==1.6.5
 tiktoken

 # Mac wheels
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0-py3-none-macosx_15_0_x86_64.whl; platform_system == "Darwin" and platform_release >= "24.0.0" and platform_release < "25.0.0"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.16.0/llama_cpp_binaries-0.16.0-py3-none-macosx_15_0_x86_64.whl; platform_system == "Darwin" and platform_release >= "24.0.0" and platform_release < "25.0.0"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0-py3-none-macosx_14_0_x86_64.whl; platform_system == "Darwin" and platform_release >= "23.0.0" and platform_release < "24.0.0"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.16.0/llama_cpp_binaries-0.16.0-py3-none-macosx_14_0_x86_64.whl; platform_system == "Darwin" and platform_release >= "23.0.0" and platform_release < "24.0.0"
@@ -7,6 +7,7 @@ markdown
 numpy==1.26.*
 pydantic==2.8.2
 PyPDF2==3.0.1
+python-docx==1.1.2
 pyyaml
 requests
 rich
@@ -18,6 +19,6 @@ sse-starlette==1.6.5
 tiktoken

 # Mac wheels
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0-py3-none-macosx_15_0_arm64.whl; platform_system == "Darwin" and platform_release >= "24.0.0" and platform_release < "25.0.0"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.16.0/llama_cpp_binaries-0.16.0-py3-none-macosx_15_0_arm64.whl; platform_system == "Darwin" and platform_release >= "24.0.0" and platform_release < "25.0.0"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0-py3-none-macosx_14_0_arm64.whl; platform_system == "Darwin" and platform_release >= "23.0.0" and platform_release < "24.0.0"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.16.0/llama_cpp_binaries-0.16.0-py3-none-macosx_14_0_arm64.whl; platform_system == "Darwin" and platform_release >= "23.0.0" and platform_release < "24.0.0"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0-py3-none-macosx_13_0_arm64.whl; platform_system == "Darwin" and platform_release >= "22.0.0" and platform_release < "23.0.0"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.16.0/llama_cpp_binaries-0.16.0-py3-none-macosx_13_0_arm64.whl; platform_system == "Darwin" and platform_release >= "22.0.0" and platform_release < "23.0.0"
@@ -7,6 +7,7 @@ markdown
 numpy==1.26.*
 pydantic==2.8.2
 PyPDF2==3.0.1
+python-docx==1.1.2
 pyyaml
 requests
 rich
@@ -18,5 +19,5 @@ sse-starlette==1.6.5
 tiktoken

 # llama.cpp (CPU only, AVX2)
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0+cpuavx2-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.16.0/llama_cpp_binaries-0.16.0+cpuavx2-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0+cpuavx2-py3-none-win_amd64.whl; platform_system == "Windows"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.16.0/llama_cpp_binaries-0.16.0+cpuavx2-py3-none-win_amd64.whl; platform_system == "Windows"
@@ -7,6 +7,7 @@ markdown
 numpy==1.26.*
 pydantic==2.8.2
 PyPDF2==3.0.1
+python-docx==1.1.2
 pyyaml
 requests
 rich
@@ -18,5 +19,5 @@ sse-starlette==1.6.5
 tiktoken

 # llama.cpp (CPU only, no AVX2)
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0+cpuavx-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.16.0/llama_cpp_binaries-0.16.0+cpuavx-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0+cpuavx-py3-none-win_amd64.whl; platform_system == "Windows"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.16.0/llama_cpp_binaries-0.16.0+cpuavx-py3-none-win_amd64.whl; platform_system == "Windows"
@@ -7,6 +7,7 @@ markdown
 numpy==1.26.*
 pydantic==2.8.2
 PyPDF2==3.0.1
+python-docx==1.1.2
 pyyaml
 requests
 rich
@@ -18,5 +19,5 @@ sse-starlette==1.6.5
 tiktoken

 # CUDA wheels
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0+cu124avx-py3-none-win_amd64.whl; platform_system == "Windows"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.16.0/llama_cpp_binaries-0.16.0+cu124avx-py3-none-win_amd64.whl; platform_system == "Windows"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0+cu124avx-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.16.0/llama_cpp_binaries-0.16.0+cu124avx-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
@@ -7,6 +7,7 @@ markdown
 numpy==1.26.*
 pydantic==2.8.2
 PyPDF2==3.0.1
+python-docx==1.1.2
 pyyaml
 requests
 rich
@@ -7,6 +7,7 @@ markdown
 numpy==1.26.*
 pydantic==2.8.2
 PyPDF2==3.0.1
+python-docx==1.1.2
 pyyaml
 requests
 rich
@@ -18,5 +19,5 @@ sse-starlette==1.6.5
 tiktoken

 # CUDA wheels
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0+vulkan-py3-none-win_amd64.whl; platform_system == "Windows"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.16.0/llama_cpp_binaries-0.16.0+vulkan-py3-none-win_amd64.whl; platform_system == "Windows"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0+vulkan-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.16.0/llama_cpp_binaries-0.16.0+vulkan-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
@@ -7,6 +7,7 @@ markdown
 numpy==1.26.*
 pydantic==2.8.2
 PyPDF2==3.0.1
+python-docx==1.1.2
 pyyaml
 requests
 rich
@@ -18,5 +19,5 @@ sse-starlette==1.6.5
 tiktoken

 # CUDA wheels
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0+vulkanavx-py3-none-win_amd64.whl; platform_system == "Windows"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.16.0/llama_cpp_binaries-0.16.0+vulkanavx-py3-none-win_amd64.whl; platform_system == "Windows"
-https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0+vulkanavx-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
+https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.16.0/llama_cpp_binaries-0.16.0+vulkanavx-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
@@ -60,6 +60,14 @@ from modules.utils import gradio

 def signal_handler(sig, frame):
     logger.info("Received Ctrl+C. Shutting down Text generation web UI gracefully.")
+
+    # Explicitly stop LlamaServer to avoid __del__ cleanup issues during shutdown
+    if shared.model and shared.model.__class__.__name__ == 'LlamaServer':
+        try:
+            shared.model.stop()
+        except:
+            pass
+
     sys.exit(0)
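For context, the handler above only takes effect once it is registered for SIGINT; a typical registration (assumed here, not shown in this hunk) looks like:

```python
import signal

# Route Ctrl+C (SIGINT) through the graceful-shutdown handler defined above.
signal.signal(signal.SIGINT, signal_handler)
```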
@@ -18,7 +18,6 @@ max_new_tokens_min: 1
 max_new_tokens_max: 4096
 prompt_lookup_num_tokens: 0
 max_tokens_second: 0
-max_updates_second: 12
 auto_max_new_tokens: true
 ban_eos_token: false
 add_bos_token: true