oobabooga 2025-05-29 21:49:26 +00:00 committed by GitHub
commit 98055ff6cf
41 changed files with 1472 additions and 313 deletions

View file

@ -12,18 +12,20 @@ Its goal is to become the [AUTOMATIC1111/stable-diffusion-webui](https://github.
## Features
- Supports multiple text generation backends in one UI/API, including [llama.cpp](https://github.com/ggerganov/llama.cpp), [Transformers](https://github.com/huggingface/transformers), [ExLlamaV3](https://github.com/turboderp-org/exllamav3), and [ExLlamaV2](https://github.com/turboderp-org/exllamav2).
- [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM) is also supported via its own [Dockerfile](https://github.com/oobabooga/text-generation-webui/blob/main/docker/TensorRT-LLM/Dockerfile).
- Additional quantization libraries like [AutoAWQ](https://github.com/casper-hansen/AutoAWQ), [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ), [HQQ](https://github.com/mobiusml/hqq), and [AQLM](https://github.com/Vahe1994/AQLM) can be used with the Transformers loader if you install them manually.
- Easy setup: Choose between **portable builds** (zero setup, just unzip and run) for llama.cpp GGUF models on Windows/Linux/macOS, or the one-click installer that creates a self-contained `installer_files` directory that doesn't interfere with your system environment.
- UI that resembles the original ChatGPT style.
- Supports multiple text generation backends in one UI/API, including [llama.cpp](https://github.com/ggerganov/llama.cpp), [Transformers](https://github.com/huggingface/transformers), [ExLlamaV3](https://github.com/turboderp-org/exllamav3), [ExLlamaV2](https://github.com/turboderp-org/exllamav2), and [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM) (the latter via its own [Dockerfile](https://github.com/oobabooga/text-generation-webui/blob/main/docker/TensorRT-LLM/Dockerfile)).
- Easy setup: Choose between **portable builds** (zero setup, just unzip and run) for GGUF models on Windows/Linux/macOS, or the one-click installer that creates a self-contained `installer_files` directory.
- **File attachments**: Upload text files and PDF documents directly in conversations to talk about their contents.
- **Web search**: Optionally search the internet with LLM-generated queries based on your input to add context to the conversation.
- Advanced chat management: Edit messages, navigate between message versions, and branch conversations at any point.
- Automatic prompt formatting using Jinja2 templates. You never need to worry about prompt formats.
- Automatic GPU layers for GGUF models (on NVIDIA GPUs).
- UI that resembles the original ChatGPT style.
- Three chat modes: `instruct`, `chat-instruct`, and `chat`, with automatic prompt templates in `chat-instruct`.
- Free-form text generation in the Default/Notebook tabs without being limited to chat turns. You can send formatted conversations from the Chat tab to these tabs.
- Multiple sampling parameters and generation options for sophisticated text generation control.
- Switch between different models easily in the UI without restarting, with fine control over settings.
- OpenAI-compatible API with Chat and Completions endpoints, including tool-calling support; see the [examples](https://github.com/oobabooga/text-generation-webui/wiki/12-%E2%80%90-OpenAI-API#examples) and the minimal request sketch after this list.
- 100% offline and private, with zero telemetry, external resources, or remote update requests.
- 100% offline and private, with zero telemetry, external resources, or remote update requests. Web search is optional and user-controlled.
- Extension support, with numerous built-in and user-contributed extensions available. See the [wiki](https://github.com/oobabooga/text-generation-webui/wiki/07-%E2%80%90-Extensions) and [extensions directory](https://github.com/oobabooga/text-generation-webui-extensions) for details.
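As a quick illustration of the API bullet above, here is a minimal Python sketch (using requests) that sends a chat completion request to a locally running instance. It assumes the UI was started with --api, a model is loaded, and the API is listening on the default port 5000; adjust the URL and parameters to your setup.

import requests

# Assumes --api is enabled and a model is loaded (default API port: 5000).
URL = "http://127.0.0.1:5000/v1/chat/completions"

payload = {
    "messages": [{"role": "user", "content": "Hello! Briefly introduce yourself."}],
    "max_tokens": 200,
}

response = requests.post(URL, json=payload, timeout=120)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])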
## How to install
@ -146,14 +148,14 @@ The `requirements*.txt` above contain various wheels precompiled through GitHub
For NVIDIA GPU:
ln -s docker/{nvidia/Dockerfile,nvidia/docker-compose.yml,.dockerignore} .
For AMD GPU:
ln -s docker/{amd/Dockerfile,intel/docker-compose.yml,.dockerignore} .
ln -s docker/{amd/Dockerfile,amd/docker-compose.yml,.dockerignore} .
For Intel GPU:
ln -s docker/{intel/Dockerfile,intel/docker-compose.yml,.dockerignore} .
For CPU only:
ln -s docker/{cpu/Dockerfile,cpu/docker-compose.yml,.dockerignore} .
cp docker/.env.example .env
# Create logs/cache dirs:
mkdir -p logs cache
mkdir -p user_data/logs user_data/cache
# Edit .env and set:
# TORCH_CUDA_ARCH_LIST based on your GPU model
# APP_RUNTIME_GID your host user's group id (run `id -g` in a terminal)
@ -187,13 +189,13 @@ usage: server.py [-h] [--multi-user] [--character CHARACTER] [--model MODEL] [--
[--extensions EXTENSIONS [EXTENSIONS ...]] [--verbose] [--idle-timeout IDLE_TIMEOUT] [--loader LOADER] [--cpu] [--cpu-memory CPU_MEMORY] [--disk] [--disk-cache-dir DISK_CACHE_DIR]
[--load-in-8bit] [--bf16] [--no-cache] [--trust-remote-code] [--force-safetensors] [--no_use_fast] [--use_flash_attention_2] [--use_eager_attention] [--torch-compile] [--load-in-4bit]
[--use_double_quant] [--compute_dtype COMPUTE_DTYPE] [--quant_type QUANT_TYPE] [--flash-attn] [--threads THREADS] [--threads-batch THREADS_BATCH] [--batch-size BATCH_SIZE] [--no-mmap]
[--mlock] [--n-gpu-layers N_GPU_LAYERS] [--tensor-split TENSOR_SPLIT] [--numa] [--no-kv-offload] [--row-split] [--extra-flags EXTRA_FLAGS] [--streaming-llm] [--ctx-size N]
[--mlock] [--gpu-layers N] [--tensor-split TENSOR_SPLIT] [--numa] [--no-kv-offload] [--row-split] [--extra-flags EXTRA_FLAGS] [--streaming-llm] [--ctx-size N] [--cache-type N]
[--model-draft MODEL_DRAFT] [--draft-max DRAFT_MAX] [--gpu-layers-draft GPU_LAYERS_DRAFT] [--device-draft DEVICE_DRAFT] [--ctx-size-draft CTX_SIZE_DRAFT] [--gpu-split GPU_SPLIT]
[--autosplit] [--cfg-cache] [--no_flash_attn] [--no_xformers] [--no_sdpa] [--num_experts_per_token N] [--enable_tp] [--hqq-backend HQQ_BACKEND] [--cpp-runner]
[--cache_type CACHE_TYPE] [--deepspeed] [--nvme-offload-dir NVME_OFFLOAD_DIR] [--local_rank LOCAL_RANK] [--alpha_value ALPHA_VALUE] [--rope_freq_base ROPE_FREQ_BASE]
[--compress_pos_emb COMPRESS_POS_EMB] [--listen] [--listen-port LISTEN_PORT] [--listen-host LISTEN_HOST] [--share] [--auto-launch] [--gradio-auth GRADIO_AUTH]
[--gradio-auth-path GRADIO_AUTH_PATH] [--ssl-keyfile SSL_KEYFILE] [--ssl-certfile SSL_CERTFILE] [--subpath SUBPATH] [--old-colors] [--api] [--public-api]
[--public-api-id PUBLIC_API_ID] [--api-port API_PORT] [--api-key API_KEY] [--admin-key ADMIN_KEY] [--api-enable-ipv6] [--api-disable-ipv4] [--nowebui]
[--autosplit] [--cfg-cache] [--no_flash_attn] [--no_xformers] [--no_sdpa] [--num_experts_per_token N] [--enable_tp] [--cpp-runner] [--deepspeed] [--nvme-offload-dir NVME_OFFLOAD_DIR]
[--local_rank LOCAL_RANK] [--alpha_value ALPHA_VALUE] [--rope_freq_base ROPE_FREQ_BASE] [--compress_pos_emb COMPRESS_POS_EMB] [--listen] [--listen-port LISTEN_PORT]
[--listen-host LISTEN_HOST] [--share] [--auto-launch] [--gradio-auth GRADIO_AUTH] [--gradio-auth-path GRADIO_AUTH_PATH] [--ssl-keyfile SSL_KEYFILE] [--ssl-certfile SSL_CERTFILE]
[--subpath SUBPATH] [--old-colors] [--portable] [--api] [--public-api] [--public-api-id PUBLIC_API_ID] [--api-port API_PORT] [--api-key API_KEY] [--admin-key ADMIN_KEY]
[--api-enable-ipv6] [--api-disable-ipv4] [--nowebui]
Text generation web UI
@ -215,7 +217,7 @@ Basic settings:
--idle-timeout IDLE_TIMEOUT Unload model after this many minutes of inactivity. It will be automatically reloaded when you try to use it again.
Model loader:
--loader LOADER Choose the model loader manually, otherwise, it will get autodetected. Valid options: Transformers, llama.cpp, ExLlamav3_HF, ExLlamav2_HF, ExLlamav2, HQQ,
--loader LOADER Choose the model loader manually, otherwise, it will get autodetected. Valid options: Transformers, llama.cpp, ExLlamav3_HF, ExLlamav2_HF, ExLlamav2,
TensorRT-LLM.
Transformers/Accelerate:
@ -246,16 +248,18 @@ llama.cpp:
--batch-size BATCH_SIZE Maximum number of prompt tokens to batch together when calling llama_eval.
--no-mmap Prevent mmap from being used.
--mlock Force the system to keep the model in RAM.
--n-gpu-layers N_GPU_LAYERS Number of layers to offload to the GPU.
--gpu-layers N, --n-gpu-layers N Number of layers to offload to the GPU.
--tensor-split TENSOR_SPLIT Split the model across multiple GPUs. Comma-separated list of proportions. Example: 60,40.
--numa Activate NUMA task allocation for llama.cpp.
--no-kv-offload Do not offload the K, Q, V to the GPU. This saves VRAM but reduces the performance.
--row-split Split the model by rows across GPUs. This may improve multi-gpu performance.
--extra-flags EXTRA_FLAGS Extra flags to pass to llama-server. Format: "flag1=value1;flag2;flag3=value3". Example: "override-tensor=exps=CPU"
--extra-flags EXTRA_FLAGS Extra flags to pass to llama-server. Format: "flag1=value1,flag2,flag3=value3". Example: "override-tensor=exps=CPU"
--streaming-llm Activate StreamingLLM to avoid re-evaluating the entire prompt when old messages are removed.
Context and cache management:
Context and cache:
--ctx-size N, --n_ctx N, --max_seq_len N Context size in tokens.
--cache-type N, --cache_type N KV cache type; valid options: llama.cpp - fp16, q8_0, q4_0; ExLlamaV2 - fp16, fp8, q8, q6, q4; ExLlamaV3 - fp16, q2 to q8 (can specify k_bits and v_bits
separately, e.g. q4_q8).
Speculative decoding:
--model-draft MODEL_DRAFT Path to the draft model for speculative decoding.
@ -274,15 +278,9 @@ ExLlamaV2:
--num_experts_per_token N Number of experts to use for generation. Applies to MoE models like Mixtral.
--enable_tp Enable Tensor Parallelism (TP) in ExLlamaV2.
HQQ:
--hqq-backend HQQ_BACKEND Backend for the HQQ loader. Valid options: PYTORCH, PYTORCH_COMPILE, ATEN.
TensorRT-LLM:
--cpp-runner Use the ModelRunnerCpp runner, which is faster than the default ModelRunner but doesn't support streaming yet.
Cache:
--cache_type CACHE_TYPE KV cache type; valid options: llama.cpp - fp16, q8_0, q4_0; ExLlamaV2 - fp16, fp8, q8, q6, q4.
DeepSpeed:
--deepspeed Enable the use of DeepSpeed ZeRO-3 for inference via the Transformers integration.
--nvme-offload-dir NVME_OFFLOAD_DIR DeepSpeed: Directory to use for ZeRO-3 NVME offloading.
@ -305,6 +303,7 @@ Gradio:
--ssl-certfile SSL_CERTFILE The path to the SSL certificate cert file.
--subpath SUBPATH Customize the subpath for gradio, use with reverse proxy
--old-colors Use the legacy Gradio colors, before the December/2024 update.
--portable Hide features not available in portable mode like training.
API:
--api Enable the API extension.

View file

@ -1,11 +1,11 @@
:root {
--darker-gray: #202123;
--dark-gray: #343541;
--light-gray: #444654;
--dark-gray: #2A2B32;
--light-gray: #373943;
--light-theme-gray: #f9fbff;
--border-color-dark: #525252;
--header-width: 112px;
--selected-item-color-dark: #32333e;
--selected-item-color-dark: #2E2F38;
}
@font-face {
@ -131,7 +131,7 @@ gradio-app > :first-child {
}
.header_bar {
box-shadow: 0 0 3px rgba(22 22 22 / 35%);
border-right: var(--input-border-width) solid var(--input-border-color);
margin-bottom: 0;
overflow-x: scroll;
text-wrap: nowrap;
@ -265,7 +265,7 @@ button {
.dark .pretty_scrollbar::-webkit-scrollbar-thumb,
.dark .pretty_scrollbar::-webkit-scrollbar-thumb:hover {
background: #ccc;
background: rgb(255 255 255 / 10%);
border-radius: 10px;
}
@ -419,6 +419,14 @@ div.svelte-362y77>*, div.svelte-362y77>.form>* {
padding-right: 1rem;
}
.chat .message .timestamp {
font-size: 0.7em;
display: inline-block;
font-weight: normal;
opacity: 0.7;
margin-left: 5px;
}
.chat-parent.bigchat {
flex: 1;
}
@ -584,6 +592,7 @@ div.svelte-362y77>*, div.svelte-362y77>.form>* {
padding: 0.65rem 2.5rem;
border: 0;
box-shadow: 0;
border-radius: 8px;
}
#chat-input textarea::placeholder {
@ -603,6 +612,16 @@ div.svelte-362y77>*, div.svelte-362y77>.form>* {
display: none;
}
#chat-input .submit-button {
display: none;
}
#chat-input .upload-button {
margin-right: 16px;
margin-bottom: 7px;
background: transparent;
}
.chat-input-positioned {
max-width: 54rem;
left: 50%;
@ -827,7 +846,7 @@ div.svelte-362y77>*, div.svelte-362y77>.form>* {
}
#chat-col.bigchat {
padding-bottom: 80px !important;
padding-bottom: 15px !important;
}
.message-body ol, .message-body ul {
@ -1171,11 +1190,11 @@ div.svelte-362y77>*, div.svelte-362y77>.form>* {
background-color: var(--light-theme-gray);
}
#chat-controls {
.dark #chat-controls {
border-left: 1px solid #d9d9d0;
}
#past-chats-row {
.dark #past-chats-row {
border-right: 1px solid #d9d9d0;
}
@ -1236,42 +1255,31 @@ div.svelte-362y77>*, div.svelte-362y77>.form>* {
position: relative;
}
.footer-button {
/* New container for the buttons */
.message-actions {
position: absolute;
bottom: -23px;
left: 0;
display: flex;
gap: 5px;
opacity: 0;
transition: opacity 0.2s;
}
.footer-button {
padding: 0;
margin: 0;
border: none;
border-radius: 3px;
cursor: pointer;
opacity: 0;
display: flex;
align-items: center;
transition: opacity 0.2s;
justify-content: center;
}
.footer-button.footer-copy-button {
bottom: -23px;
left: 0;
}
.footer-button.footer-refresh-button {
bottom: -23px;
left: 25px;
}
.footer-button.footer-continue-button {
bottom: -23px;
left: 50px;
}
.footer-button.footer-remove-button {
bottom: -23px;
left: 75px;
}
.message:hover .footer-button,
.user-message:hover .footer-button,
.assistant-message:hover .footer-button {
.message:hover .message-actions,
.user-message:hover .message-actions,
.assistant-message:hover .message-actions {
opacity: 1;
}
@ -1362,6 +1370,11 @@ div.svelte-362y77>*, div.svelte-362y77>.form>* {
contain: layout;
}
.chat .message-body .thinking-content p,
.chat .message-body .thinking-content li {
font-size: 15px !important;
}
/* Animation for opening thinking blocks */
@keyframes fadeIn {
from { opacity: 0; }
@ -1398,3 +1411,143 @@ strong {
.dark #vram-info .value {
color: #07ff07;
}
.message-attachments {
display: flex;
flex-wrap: wrap;
gap: 8px;
margin-top: 8px;
padding-bottom: 6px;
}
.attachment-box {
display: flex;
flex-direction: column;
align-items: center;
justify-content: center;
padding: 8px;
background: rgb(0 0 0 / 5%);
border-radius: 6px;
border: 1px solid rgb(0 0 0 / 10%);
min-width: 80px;
max-width: 120px;
}
.attachment-icon {
margin-bottom: 4px;
color: #555;
}
.attachment-name {
font-size: 0.8em;
text-align: center;
word-break: break-word;
overflow: hidden;
text-overflow: ellipsis;
display: -webkit-box;
-webkit-line-clamp: 2;
-webkit-box-orient: vertical;
}
.dark .attachment-box {
background: rgb(255 255 255 / 5%);
border: 1px solid rgb(255 255 255 / 10%);
}
.dark .attachment-icon {
color: #ccc;
}
/* Message Editing Styles */
.editing-textarea {
width: 100%;
min-height: 200px;
max-height: 65vh;
padding: 10px;
border-radius: 5px;
border: 1px solid #ccc;
background-color: var(--light-theme-gray);
font-family: inherit;
font-size: inherit;
resize: vertical;
}
.dark .editing-textarea {
border: 1px solid var(--border-color-dark);
background-color: var(--darker-gray);
}
.editing-textarea:focus {
outline: none;
border-color: var(--selected-item-color-dark);
}
.edit-controls-container {
margin-top: 0;
display: flex;
gap: 8px;
padding-bottom: 8px;
}
.edit-control-button {
padding: 6px 12px;
border: 1px solid #ccc;
border-radius: 4px;
cursor: pointer;
background-color: #f8f9fa;
color: #212529;
font-size: 12px;
margin: 0;
}
.dark .edit-control-button {
border: 1px solid var(--border-color-dark);
background-color: var(--light-gray);
color: #efefef;
}
/* --- Simple Version Navigation --- */
.version-navigation {
position: absolute;
bottom: -23px;
right: 0;
display: flex;
align-items: center;
gap: 5px;
opacity: 0;
transition: opacity 0.2s;
}
.message:hover .version-navigation,
.user-message:hover .version-navigation,
.assistant-message:hover .version-navigation {
opacity: 1;
}
.version-nav-button {
padding: 2px 6px;
font-size: 12px;
min-width: auto;
}
.version-nav-button[disabled] {
opacity: 0.3;
cursor: not-allowed;
}
.version-position {
font-size: 11px;
color: currentcolor;
font-family: monospace;
min-width: 35px;
text-align: center;
opacity: 0.8;
user-select: none;
}
.token-display {
font-family: monospace;
font-size: 13px;
color: var(--body-text-color-subdued);
margin-top: 4px;
}

View file

@ -14,7 +14,7 @@ WORKDIR /home/app/
RUN git clone https://github.com/oobabooga/text-generation-webui.git
WORKDIR /home/app/text-generation-webui
RUN GPU_CHOICE=B LAUNCH_AFTER_INSTALL=FALSE INSTALL_EXTENSIONS=TRUE ./start_linux.sh --verbose
COPY CMD_FLAGS.txt /home/app/text-generation-webui/
COPY /user_data/CMD_FLAGS.txt /home/app/text-generation-webui/user_data
EXPOSE ${CONTAINER_PORT:-7860} ${CONTAINER_API_PORT:-5000} ${CONTAINER_API_STREAM_PORT:-5005}
WORKDIR /home/app/text-generation-webui
# set umask to ensure group read / write at runtime

View file

@ -41,14 +41,4 @@ services:
security_opt:
- seccomp=unconfined
volumes:
- ./cache:/home/app/text-generation-webui/cache
- ./characters:/home/app/text-generation-webui/characters
- ./extensions:/home/app/text-generation-webui/extensions
- ./loras:/home/app/text-generation-webui/loras
- ./logs:/home/app/text-generation-webui/logs
- ./models:/home/app/text-generation-webui/models
- ./presets:/home/app/text-generation-webui/presets
- ./prompts:/home/app/text-generation-webui/prompts
- ./softprompts:/home/app/text-generation-webui/softprompts
- ./training:/home/app/text-generation-webui/training
- ./cloudflared:/etc/cloudflared
- ./user_data:/home/app/text-generation-webui/user_data

View file

@ -14,7 +14,7 @@ WORKDIR /home/app/
RUN git clone https://github.com/oobabooga/text-generation-webui.git
WORKDIR /home/app/text-generation-webui
RUN GPU_CHOICE=D LAUNCH_AFTER_INSTALL=FALSE INSTALL_EXTENSIONS=TRUE ./start_linux.sh --verbose
COPY CMD_FLAGS.txt /home/app/text-generation-webui/
COPY /user_data/CMD_FLAGS.txt /home/app/text-generation-webui/user_data
EXPOSE ${CONTAINER_PORT:-7860} ${CONTAINER_API_PORT:-5000} ${CONTAINER_API_STREAM_PORT:-5005}
# set umask to ensure group read / write at runtime
WORKDIR /home/app/text-generation-webui

View file

@ -41,12 +41,4 @@ services:
security_opt:
- seccomp=unconfined
volumes:
- ./characters:/home/app/text-generation-webui/characters
- ./extensions:/home/app/text-generation-webui/extensions
- ./loras:/home/app/text-generation-webui/loras
- ./models:/home/app/text-generation-webui/models
- ./presets:/home/app/text-generation-webui/presets
- ./prompts:/home/app/text-generation-webui/prompts
- ./softprompts:/home/app/text-generation-webui/softprompts
- ./training:/home/app/text-generation-webui/training
- ./cloudflared:/etc/cloudflared
- ./user_data:/home/app/text-generation-webui/user_data

View file

@ -115,6 +115,7 @@ async def openai_completions(request: Request, request_data: CompletionRequest):
if request_data.stream:
async def generator():
async with streaming_semaphore:
try:
response = OAIcompletions.stream_completions(to_dict(request_data), is_legacy=is_legacy)
async for resp in iterate_in_threadpool(response):
disconnected = await request.is_disconnected()
@ -122,6 +123,10 @@ async def openai_completions(request: Request, request_data: CompletionRequest):
break
yield {"data": json.dumps(resp)}
finally:
stop_everything_event()
response.close()
return
return EventSourceResponse(generator()) # SSE streaming
@ -143,6 +148,7 @@ async def openai_chat_completions(request: Request, request_data: ChatCompletion
if request_data.stream:
async def generator():
async with streaming_semaphore:
try:
response = OAIcompletions.stream_chat_completions(to_dict(request_data), is_legacy=is_legacy)
async for resp in iterate_in_threadpool(response):
disconnected = await request.is_disconnected()
@ -150,6 +156,10 @@ async def openai_chat_completions(request: Request, request_data: ChatCompletion
break
yield {"data": json.dumps(resp)}
finally:
stop_everything_event()
response.close()
return
return EventSourceResponse(generator()) # SSE streaming
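The try/finally added around both streaming generators ensures that when a client disconnects mid-stream, stop_everything_event() is called and the underlying generator is closed instead of the model generating on in the background. Below is a hedged client-side sketch of the same flow seen from the other end, assuming the default API port 5000 and OpenAI-style "delta" chunks; everything beyond the documented endpoint is illustrative.

import json

import requests

URL = "http://127.0.0.1:5000/v1/chat/completions"  # default --api-port is 5000

payload = {
    "messages": [{"role": "user", "content": "Write a very long story."}],
    "stream": True,
}

with requests.post(URL, json=payload, stream=True, timeout=120) as response:
    for raw_line in response.iter_lines():
        if not raw_line or not raw_line.startswith(b"data: "):
            continue
        chunk = raw_line[len(b"data: "):]
        if chunk == b"[DONE]":
            break
        choice = json.loads(chunk)["choices"][0]
        print(choice.get("delta", {}).get("content", ""), end="", flush=True)
        # Breaking out (or closing the connection) here behaves like a client
        # disconnect; the server-side finally block then stops generation.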

View file

@ -1,3 +1,7 @@
// -------------------------------------------------
// Event handlers
// -------------------------------------------------
function copyToClipboard(element) {
if (!element) return;
@ -18,6 +22,201 @@ function copyToClipboard(element) {
});
}
function branchHere(element) {
if (!element) return;
const messageElement = element.closest(".message, .user-message, .assistant-message");
if (!messageElement) return;
const index = messageElement.getAttribute("data-index");
if (!index) return;
const branchIndexInput = document.getElementById("Branch-index").querySelector("input");
if (!branchIndexInput) {
console.error("Element with ID 'Branch-index' not found.");
return;
}
const branchButton = document.getElementById("Branch");
if (!branchButton) {
console.error("Required element 'Branch' not found.");
return;
}
branchIndexInput.value = index;
// Trigger any 'change' or 'input' events Gradio might be listening for
const event = new Event("input", { bubbles: true });
branchIndexInput.dispatchEvent(event);
branchButton.click();
}
// -------------------------------------------------
// Message Editing Functions
// -------------------------------------------------
function editHere(buttonElement) {
if (!buttonElement) return;
const messageElement = buttonElement.closest(".message, .user-message, .assistant-message");
if (!messageElement) return;
const messageBody = messageElement.querySelector(".message-body");
if (!messageBody) return;
// If already editing, focus the textarea
const existingTextarea = messageBody.querySelector(".editing-textarea");
if (existingTextarea) {
existingTextarea.focus();
return;
}
// Determine role based on message element - handle different chat modes
const isUserMessage = messageElement.classList.contains("user-message") ||
messageElement.querySelector(".text-you") !== null ||
messageElement.querySelector(".circle-you") !== null;
startEditing(messageElement, messageBody, isUserMessage);
}
function startEditing(messageElement, messageBody, isUserMessage) {
const rawText = messageElement.getAttribute("data-raw") || messageBody.textContent;
const originalHTML = messageBody.innerHTML;
// Create editing interface
const editingInterface = createEditingInterface(rawText);
// Replace message content
messageBody.innerHTML = "";
messageBody.appendChild(editingInterface.textarea);
messageBody.appendChild(editingInterface.controls);
editingInterface.textarea.focus();
editingInterface.textarea.setSelectionRange(rawText.length, rawText.length);
// Setup event handlers
setupEditingHandlers(editingInterface.textarea, messageElement, originalHTML, messageBody, isUserMessage);
}
function createEditingInterface(text) {
const textarea = document.createElement("textarea");
textarea.value = text;
textarea.className = "editing-textarea";
textarea.rows = Math.max(3, text.split("\n").length);
const controls = document.createElement("div");
controls.className = "edit-controls-container";
const saveButton = document.createElement("button");
saveButton.textContent = "Save";
saveButton.className = "edit-control-button";
saveButton.type = "button";
const cancelButton = document.createElement("button");
cancelButton.textContent = "Cancel";
cancelButton.className = "edit-control-button edit-cancel-button";
cancelButton.type = "button";
controls.appendChild(saveButton);
controls.appendChild(cancelButton);
return { textarea, controls, saveButton, cancelButton };
}
function setupEditingHandlers(textarea, messageElement, originalHTML, messageBody, isUserMessage) {
const saveButton = messageBody.querySelector(".edit-control-button:not(.edit-cancel-button)");
const cancelButton = messageBody.querySelector(".edit-cancel-button");
const submitEdit = () => {
const index = messageElement.getAttribute("data-index");
if (!index || !submitMessageEdit(index, textarea.value, isUserMessage)) {
cancelEdit();
}
};
const cancelEdit = () => {
messageBody.innerHTML = originalHTML;
};
// Event handlers
saveButton.onclick = submitEdit;
cancelButton.onclick = cancelEdit;
textarea.onkeydown = (e) => {
if (e.key === "Enter" && !e.shiftKey) {
e.preventDefault();
submitEdit();
} else if (e.key === "Escape") {
e.preventDefault();
cancelEdit();
}
};
}
function submitMessageEdit(index, newText, isUserMessage) {
const editIndexInput = document.getElementById("Edit-message-index")?.querySelector("input");
const editTextInput = document.getElementById("Edit-message-text")?.querySelector("textarea");
const editRoleInput = document.getElementById("Edit-message-role")?.querySelector("textarea");
const editButton = document.getElementById("Edit-message");
if (!editIndexInput || !editTextInput || !editRoleInput || !editButton) {
console.error("Edit elements not found");
return false;
}
editIndexInput.value = index;
editTextInput.value = newText;
editRoleInput.value = isUserMessage ? "user" : "assistant";
editIndexInput.dispatchEvent(new Event("input", { bubbles: true }));
editTextInput.dispatchEvent(new Event("input", { bubbles: true }));
editRoleInput.dispatchEvent(new Event("input", { bubbles: true }));
editButton.click();
return true;
}
function navigateVersion(element, direction) {
if (!element) return;
const messageElement = element.closest(".message, .user-message, .assistant-message");
if (!messageElement) return;
const index = messageElement.getAttribute("data-index");
if (!index) return;
// Determine role based on message element classes
let role = "assistant"; // Default role
if (messageElement.classList.contains("user-message") ||
messageElement.querySelector(".text-you") ||
messageElement.querySelector(".circle-you")) {
role = "user";
}
const indexInput = document.getElementById("Navigate-message-index")?.querySelector("input");
const directionInput = document.getElementById("Navigate-direction")?.querySelector("textarea");
const roleInput = document.getElementById("Navigate-message-role")?.querySelector("textarea");
const navigateButton = document.getElementById("Navigate-version");
if (!indexInput || !directionInput || !roleInput || !navigateButton) {
console.error("Navigation control elements (index, direction, role, or button) not found.");
return;
}
indexInput.value = index;
directionInput.value = direction;
roleInput.value = role;
// Trigger 'input' events for Gradio to pick up changes
const event = new Event("input", { bubbles: true });
indexInput.dispatchEvent(event);
directionInput.dispatchEvent(event);
roleInput.dispatchEvent(event);
navigateButton.click();
}
function regenerateClick() {
document.getElementById("Regenerate").click();
}
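These handlers bridge the DOM to the Python backend through hidden Gradio components: the JavaScript writes a value into a hidden input (Branch-index, Edit-message-index, Navigate-message-index, ...), dispatches an input event so Gradio registers the change, and then clicks a hidden button (Branch, Edit-message, Navigate-version, Regenerate). The snippet below is a hypothetical, simplified sketch of how such a bridge can be wired on the Gradio side; only the elem_id values mirror what the JavaScript looks up, and the handler itself is illustrative rather than the project's actual UI code.

import gradio as gr

def handle_branch(history, branch_index):
    # Illustrative handler: keep the conversation up to the chosen message.
    cut = int(branch_index) + 1
    return {"internal": history["internal"][:cut], "visible": history["visible"][:cut]}

with gr.Blocks() as demo:
    history_state = gr.State({"internal": [], "visible": []})
    # branchHere() fills the hidden number input inside #Branch-index, fires an
    # "input" event on it, then clicks the hidden #Branch button.
    branch_index = gr.Number(value=-1, visible=False, elem_id="Branch-index", precision=0)
    branch_button = gr.Button("Branch", visible=False, elem_id="Branch")
    branch_button.click(handle_branch, [history_state, branch_index], history_state)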

View file

@ -1,3 +1,7 @@
// ------------------------------------------------
// Main
// ------------------------------------------------
let main_parent = document.getElementById("chat-tab").parentNode;
let extensions = document.getElementById("extensions");
@ -39,9 +43,24 @@ document.querySelector(".header_bar").addEventListener("click", function(event)
//------------------------------------------------
// Keyboard shortcuts
//------------------------------------------------
// --- Helper functions --- //
function isModifiedKeyboardEvent() {
return (event instanceof KeyboardEvent &&
event.shiftKey ||
event.ctrlKey ||
event.altKey ||
event.metaKey);
}
function isFocusedOnEditableTextbox() {
if (event.target.tagName === "INPUT" || event.target.tagName === "TEXTAREA") {
return !!event.target.value;
}
}
let previousTabId = "chat-tab-button";
document.addEventListener("keydown", function(event) {
// Stop generation on Esc pressed
if (event.key === "Escape") {
// Find the element with id 'stop' and click it
@ -49,10 +68,15 @@ document.addEventListener("keydown", function(event) {
if (stopButton) {
stopButton.click();
}
return;
}
if (!document.querySelector("#chat-tab").checkVisibility() ) {
return;
}
// Show chat controls on Ctrl + S
else if (event.ctrlKey && event.key == "s") {
if (event.ctrlKey && event.key == "s") {
event.preventDefault();
var showControlsElement = document.getElementById("show-controls");
@ -82,24 +106,29 @@ document.addEventListener("keydown", function(event) {
document.getElementById("Remove-last").click();
}
// Copy last on Ctrl + Shift + K
else if (event.ctrlKey && event.shiftKey && event.key === "K") {
event.preventDefault();
document.getElementById("Copy-last").click();
}
// Replace last on Ctrl + Shift + L
else if (event.ctrlKey && event.shiftKey && event.key === "L") {
event.preventDefault();
document.getElementById("Replace-last").click();
}
// Impersonate on Ctrl + Shift + M
else if (event.ctrlKey && event.shiftKey && event.key === "M") {
event.preventDefault();
document.getElementById("Impersonate").click();
}
// --- Simple version navigation --- //
if (!isFocusedOnEditableTextbox()) {
// Version navigation on Arrow keys (horizontal)
if (!isModifiedKeyboardEvent() && event.key === "ArrowLeft") {
event.preventDefault();
navigateLastAssistantMessage("left");
}
else if (!isModifiedKeyboardEvent() && event.key === "ArrowRight") {
event.preventDefault();
if (!navigateLastAssistantMessage("right")) {
// If can't navigate right (last version), regenerate
document.getElementById("Regenerate").click();
}
}
}
});
//------------------------------------------------
@ -132,8 +161,6 @@ targetElement.addEventListener("scroll", function() {
// Create a MutationObserver instance
const observer = new MutationObserver(function(mutations) {
updateCssProperties();
if (targetElement.classList.contains("_generating")) {
typing.parentNode.classList.add("visible-dots");
document.getElementById("stop").style.display = "flex";
@ -144,7 +171,6 @@ const observer = new MutationObserver(function(mutations) {
document.getElementById("Generate").style.display = "flex";
}
doSyntaxHighlighting();
if (!isScrolled && targetElement.scrollTop !== targetElement.scrollHeight) {
@ -157,7 +183,10 @@ const observer = new MutationObserver(function(mutations) {
const lastChild = messagesContainer?.lastElementChild;
const prevSibling = lastChild?.previousElementSibling;
if (lastChild && prevSibling) {
lastChild.style.minHeight = `calc(max(70vh, 100vh - ${prevSibling.offsetHeight}px - 102px))`;
lastChild.style.setProperty("margin-bottom",
`max(0px, calc(max(70vh, 100vh - ${prevSibling.offsetHeight}px - 102px) - ${lastChild.offsetHeight}px))`,
"important"
);
}
}
});
@ -446,32 +475,6 @@ const chatInput = document.querySelector("#chat-input textarea");
// Variables to store current dimensions
let currentChatInputHeight = chatInput.clientHeight;
// Update chat layout based on chat and input dimensions
function updateCssProperties() {
const chatInputHeight = chatInput.clientHeight;
// Check if the chat container is visible
if (chatContainer.clientHeight > 0) {
// Adjust scrollTop based on input height change
if (chatInputHeight !== currentChatInputHeight) {
const deltaHeight = chatInputHeight - currentChatInputHeight;
if (!isScrolled && deltaHeight < 0) {
chatContainer.scrollTop = chatContainer.scrollHeight;
} else {
chatContainer.scrollTop += deltaHeight;
}
currentChatInputHeight = chatInputHeight;
}
}
}
// Observe textarea size changes and call update function
new ResizeObserver(updateCssProperties).observe(document.querySelector("#chat-input textarea"));
// Handle changes in window size
window.addEventListener("resize", updateCssProperties);
//------------------------------------------------
// Focus on the rename text area when it becomes visible
//------------------------------------------------
@ -817,3 +820,55 @@ function createMobileTopBar() {
}
createMobileTopBar();
//------------------------------------------------
// Simple Navigation Functions
//------------------------------------------------
function navigateLastAssistantMessage(direction) {
const chat = document.querySelector("#chat");
if (!chat) return false;
const messages = chat.querySelectorAll("[data-index]");
if (messages.length === 0) return false;
// Find the last assistant message (starting from the end)
let lastAssistantMessage = null;
for (let i = messages.length - 1; i >= 0; i--) {
const msg = messages[i];
if (
msg.classList.contains("assistant-message") ||
msg.querySelector(".circle-bot") ||
msg.querySelector(".text-bot")
) {
lastAssistantMessage = msg;
break;
}
}
if (!lastAssistantMessage) return false;
const buttons = lastAssistantMessage.querySelectorAll(".version-nav-button");
for (let i = 0; i < buttons.length; i++) {
const button = buttons[i];
const onclick = button.getAttribute("onclick");
const disabled = button.hasAttribute("disabled");
const isLeft = onclick && onclick.includes("'left'");
const isRight = onclick && onclick.includes("'right'");
if (!disabled) {
if (direction === "left" && isLeft) {
navigateVersion(button, direction);
return true;
}
if (direction === "right" && isRight) {
navigateVersion(button, direction);
return true;
}
}
}
return false;
}

View file

@ -31,12 +31,37 @@ from modules.text_generation import (
get_max_prompt_length
)
from modules.utils import delete_file, get_available_characters, save_file
from modules.web_search import add_web_search_attachments
def strftime_now(format):
return datetime.now().strftime(format)
def get_current_timestamp():
"""Returns the current time in 24-hour format"""
return datetime.now().strftime('%b %d, %Y %H:%M')
def update_message_metadata(metadata_dict, role, index, **fields):
"""
Updates or adds metadata fields for a specific message.
Args:
metadata_dict: The metadata dictionary
role: The role (user, assistant, etc)
index: The message index
**fields: Arbitrary metadata fields to update/add
"""
key = f"{role}_{index}"
if key not in metadata_dict:
metadata_dict[key] = {}
# Update with provided fields
for field_name, field_value in fields.items():
metadata_dict[key][field_name] = field_value
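# Illustrative sketch (annotation, not part of this file): the metadata dict is
# keyed by "<role>_<row index>" and accumulates per-message fields. A populated
# entry looks roughly like this after a user message with one attachment and a
# regenerated assistant reply:
#
#     history['metadata'] == {
#         'user_0': {
#             'timestamp': 'May 29, 2025 21:49',
#             'attachments': [{'name': 'notes.txt', 'type': 'text/plain', 'content': '...'}],
#         },
#         'assistant_0': {
#             'timestamp': 'May 29, 2025 21:50',
#             'versions': [{'content': '...', 'visible_content': '...', 'timestamp': '...'}],
#             'current_version_index': 0,
#         },
#     }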
jinja_env = ImmutableSandboxedEnvironment(
trim_blocks=True,
lstrip_blocks=True,
@ -133,7 +158,9 @@ def generate_chat_prompt(user_input, state, **kwargs):
impersonate = kwargs.get('impersonate', False)
_continue = kwargs.get('_continue', False)
also_return_rows = kwargs.get('also_return_rows', False)
history = kwargs.get('history', state['history'])['internal']
history_data = kwargs.get('history', state['history'])
history = history_data['internal']
metadata = history_data.get('metadata', {})
# Templates
chat_template_str = state['chat_template_str']
@ -172,11 +199,13 @@ def generate_chat_prompt(user_input, state, **kwargs):
messages.append({"role": "system", "content": context})
insert_pos = len(messages)
for entry in reversed(history):
for i, entry in enumerate(reversed(history)):
user_msg = entry[0].strip()
assistant_msg = entry[1].strip()
tool_msg = entry[2].strip() if len(entry) > 2 else ''
row_idx = len(history) - i - 1
if tool_msg:
messages.insert(insert_pos, {"role": "tool", "content": tool_msg})
@ -184,10 +213,48 @@ def generate_chat_prompt(user_input, state, **kwargs):
messages.insert(insert_pos, {"role": "assistant", "content": assistant_msg})
if user_msg not in ['', '<|BEGIN-VISIBLE-CHAT|>']:
messages.insert(insert_pos, {"role": "user", "content": user_msg})
# Check for user message attachments in metadata
user_key = f"user_{row_idx}"
enhanced_user_msg = user_msg
# Add attachment content if present
if user_key in metadata and "attachments" in metadata[user_key]:
attachments_text = ""
for attachment in metadata[user_key]["attachments"]:
filename = attachment.get("name", "file")
content = attachment.get("content", "")
attachments_text += f"\nName: {filename}\nContents:\n\n=====\n{content}\n=====\n\n"
if attachments_text:
enhanced_user_msg = f"{user_msg}\n\nATTACHMENTS:\n{attachments_text}"
messages.insert(insert_pos, {"role": "user", "content": enhanced_user_msg})
user_input = user_input.strip()
if user_input and not impersonate and not _continue:
# Check if we have attachments even with empty input
has_attachments = False
if not impersonate and not _continue and len(history_data.get('metadata', {})) > 0:
current_row_idx = len(history)
user_key = f"user_{current_row_idx}"
has_attachments = user_key in metadata and "attachments" in metadata[user_key]
if (user_input or has_attachments) and not impersonate and not _continue:
# For the current user input being processed, check if we need to add attachments
if not impersonate and not _continue and len(history_data.get('metadata', {})) > 0:
current_row_idx = len(history)
user_key = f"user_{current_row_idx}"
if user_key in metadata and "attachments" in metadata[user_key]:
attachments_text = ""
for attachment in metadata[user_key]["attachments"]:
filename = attachment.get("name", "file")
content = attachment.get("content", "")
attachments_text += f"\nName: {filename}\nContents:\n\n=====\n{content}\n=====\n\n"
if attachments_text:
user_input = f"{user_input}\n\nATTACHMENTS:\n{attachments_text}"
messages.append({"role": "user", "content": user_input})
def make_prompt(messages):
@ -256,7 +323,6 @@ def generate_chat_prompt(user_input, state, **kwargs):
# Resort to truncating the user input
else:
user_message = messages[-1]['content']
# Bisect the truncation point
@ -293,6 +359,50 @@ def generate_chat_prompt(user_input, state, **kwargs):
return prompt
def count_prompt_tokens(text_input, state):
"""Count tokens for current history + input including attachments"""
if shared.tokenizer is None:
return "Tokenizer not available"
try:
# Handle dict format with text and files
files = []
if isinstance(text_input, dict):
files = text_input.get('files', [])
text = text_input.get('text', '')
else:
text = text_input
files = []
# Create temporary history copy to add attachments
temp_history = copy.deepcopy(state['history'])
if 'metadata' not in temp_history:
temp_history['metadata'] = {}
# Process attachments if any
if files:
row_idx = len(temp_history['internal'])
for file_path in files:
add_message_attachment(temp_history, row_idx, file_path, is_user=True)
# Create temp state with modified history
temp_state = copy.deepcopy(state)
temp_state['history'] = temp_history
# Build prompt using existing logic
prompt = generate_chat_prompt(text, temp_state)
current_tokens = get_encoded_length(prompt)
max_tokens = temp_state['truncation_length']
percentage = (current_tokens / max_tokens) * 100 if max_tokens > 0 else 0
return f"History + Input:<br/>{current_tokens:,} / {max_tokens:,} tokens ({percentage:.1f}%)"
except Exception as e:
logger.error(f"Error counting tokens: {e}")
return f"Error: {str(e)}"
def get_stopping_strings(state):
stopping_strings = []
renderers = []
@ -341,12 +451,130 @@ def get_stopping_strings(state):
return result
def add_message_version(history, role, row_idx, is_current=True):
key = f"{role}_{row_idx}"
if 'metadata' not in history:
history['metadata'] = {}
if key not in history['metadata']:
history['metadata'][key] = {}
if "versions" not in history['metadata'][key]:
history['metadata'][key]["versions"] = []
# Determine which index to use for content based on role
content_idx = 0 if role == 'user' else 1
current_content = history['internal'][row_idx][content_idx]
current_visible = history['visible'][row_idx][content_idx]
history['metadata'][key]["versions"].append({
"content": current_content,
"visible_content": current_visible,
"timestamp": get_current_timestamp()
})
if is_current:
# Set the current_version_index to the newly added version (which is now the last one).
history['metadata'][key]["current_version_index"] = len(history['metadata'][key]["versions"]) - 1
def add_message_attachment(history, row_idx, file_path, is_user=True):
"""Add a file attachment to a message in history metadata"""
if 'metadata' not in history:
history['metadata'] = {}
key = f"{'user' if is_user else 'assistant'}_{row_idx}"
if key not in history['metadata']:
history['metadata'][key] = {"timestamp": get_current_timestamp()}
if "attachments" not in history['metadata'][key]:
history['metadata'][key]["attachments"] = []
# Get file info using pathlib
path = Path(file_path)
filename = path.name
file_extension = path.suffix.lower()
try:
# Handle different file types
if file_extension == '.pdf':
# Process PDF file
content = extract_pdf_text(path)
file_type = "application/pdf"
else:
# Default handling for text files
with open(path, 'r', encoding='utf-8') as f:
content = f.read()
file_type = "text/plain"
# Add attachment
attachment = {
"name": filename,
"type": file_type,
"content": content,
}
history['metadata'][key]["attachments"].append(attachment)
return content # Return the content for reuse
except Exception as e:
logger.error(f"Error processing attachment {filename}: {e}")
return None
def extract_pdf_text(pdf_path):
"""Extract text from a PDF file"""
import PyPDF2
text = ""
try:
with open(pdf_path, 'rb') as file:
pdf_reader = PyPDF2.PdfReader(file)
for page_num in range(len(pdf_reader.pages)):
page = pdf_reader.pages[page_num]
text += page.extract_text() + "\n\n"
return text.strip()
except Exception as e:
logger.error(f"Error extracting text from PDF: {e}")
return f"[Error extracting PDF text: {str(e)}]"
def generate_search_query(user_message, state):
"""Generate a search query from user message using the LLM"""
# Augment the user message with search instruction
augmented_message = f"{user_message}\n\n=====\n\nPlease turn the message above into a short web search query in the same language as the message. Respond with only the search query, nothing else."
# Use a minimal state for search query generation but keep the full history
search_state = state.copy()
search_state['max_new_tokens'] = 64
search_state['auto_max_new_tokens'] = False
search_state['enable_thinking'] = False
# Generate the full prompt using existing history + augmented message
formatted_prompt = generate_chat_prompt(augmented_message, search_state)
query = ""
for reply in generate_reply(formatted_prompt, search_state, stopping_strings=[], is_chat=True):
query = reply.strip()
return query
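# Illustrative note (annotation, not part of this file): when enable_web_search
# is set, chatbot_wrapper() first calls generate_search_query(text, state) to
# turn the user message into a short query, then add_web_search_attachments(
# output, row_idx, text, search_query, state) stores the fetched results as
# attachments on the same row, so they reach the prompt through the same
# ATTACHMENTS mechanism as uploaded files.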
def chatbot_wrapper(text, state, regenerate=False, _continue=False, loading_message=True, for_ui=False):
# Handle dict format with text and files
files = []
if isinstance(text, dict):
files = text.get('files', [])
text = text.get('text', '')
history = state['history']
output = copy.deepcopy(history)
output = apply_extensions('history', output)
state = apply_extensions('state', state)
# Initialize metadata if not present
if 'metadata' not in output:
output['metadata'] = {}
visible_text = None
stopping_strings = get_stopping_strings(state)
is_stream = state['stream']
@ -355,44 +583,85 @@ def chatbot_wrapper(text, state, regenerate=False, _continue=False, loading_mess
if not (regenerate or _continue):
visible_text = html.escape(text)
# Process file attachments and store in metadata
row_idx = len(output['internal'])
# Add attachments to metadata only, not modifying the message text
for file_path in files:
add_message_attachment(output, row_idx, file_path, is_user=True)
# Add web search results as attachments if enabled
if state.get('enable_web_search', False):
search_query = generate_search_query(text, state)
add_web_search_attachments(output, row_idx, text, search_query, state)
# Apply extensions
text, visible_text = apply_extensions('chat_input', text, visible_text, state)
text = apply_extensions('input', text, state, is_chat=True)
# Current row index
output['internal'].append([text, ''])
output['visible'].append([visible_text, ''])
# Add metadata with timestamp
update_message_metadata(output['metadata'], "user", row_idx, timestamp=get_current_timestamp())
# *Is typing...*
if loading_message:
yield {
'visible': output['visible'][:-1] + [[output['visible'][-1][0], shared.processing_message]],
'internal': output['internal']
'internal': output['internal'],
'metadata': output['metadata']
}
else:
text, visible_text = output['internal'][-1][0], output['visible'][-1][0]
if regenerate:
row_idx = len(output['internal']) - 1
# Store the old response as a version before regenerating
if not output['metadata'].get(f"assistant_{row_idx}", {}).get('versions'):
add_message_version(output, "assistant", row_idx, is_current=False)
# Add new empty version (will be filled during streaming)
key = f"assistant_{row_idx}"
output['metadata'][key]["versions"].append({
"content": "",
"visible_content": "",
"timestamp": get_current_timestamp()
})
output['metadata'][key]["current_version_index"] = len(output['metadata'][key]["versions"]) - 1
if loading_message:
yield {
'visible': output['visible'][:-1] + [[visible_text, shared.processing_message]],
'internal': output['internal'][:-1] + [[text, '']]
'internal': output['internal'][:-1] + [[text, '']],
'metadata': output['metadata']
}
elif _continue:
last_reply = [output['internal'][-1][1], output['visible'][-1][1]]
if loading_message:
yield {
'visible': output['visible'][:-1] + [[visible_text, last_reply[1] + '...']],
'internal': output['internal']
'internal': output['internal'],
'metadata': output['metadata']
}
# Generate the prompt
kwargs = {
'_continue': _continue,
'history': output if _continue else {k: v[:-1] for k, v in output.items()}
'history': output if _continue else {
k: (v[:-1] if k in ['internal', 'visible'] else v)
for k, v in output.items()
}
}
prompt = apply_extensions('custom_generate_chat_prompt', text, state, **kwargs)
if prompt is None:
prompt = generate_chat_prompt(text, state, **kwargs)
# Add timestamp for assistant's response at the start of generation
row_idx = len(output['internal']) - 1
update_message_metadata(output['metadata'], "assistant", row_idx, timestamp=get_current_timestamp())
# Generate
reply = None
for j, reply in enumerate(generate_reply(prompt, state, stopping_strings=stopping_strings, is_chat=True, for_ui=for_ui)):
@ -413,28 +682,51 @@ def chatbot_wrapper(text, state, regenerate=False, _continue=False, loading_mess
if _continue:
output['internal'][-1] = [text, last_reply[0] + reply]
output['visible'][-1] = [visible_text, last_reply[1] + visible_reply]
if is_stream:
yield output
elif not (j == 0 and visible_reply.strip() == ''):
output['internal'][-1] = [text, reply.lstrip(' ')]
output['visible'][-1] = [visible_text, visible_reply.lstrip(' ')]
# Keep version metadata in sync during streaming (for regeneration)
if regenerate:
row_idx = len(output['internal']) - 1
key = f"assistant_{row_idx}"
current_idx = output['metadata'][key]['current_version_index']
output['metadata'][key]['versions'][current_idx].update({
'content': output['internal'][row_idx][1],
'visible_content': output['visible'][row_idx][1]
})
if is_stream:
yield output
output['visible'][-1][1] = apply_extensions('output', output['visible'][-1][1], state, is_chat=True)
# Final sync for version metadata (in case streaming was disabled)
if regenerate:
row_idx = len(output['internal']) - 1
key = f"assistant_{row_idx}"
current_idx = output['metadata'][key]['current_version_index']
output['metadata'][key]['versions'][current_idx].update({
'content': output['internal'][row_idx][1],
'visible_content': output['visible'][row_idx][1]
})
yield output
def impersonate_wrapper(text, state):
def impersonate_wrapper(textbox, state):
text = textbox['text']
static_output = chat_html_wrapper(state['history'], state['name1'], state['name2'], state['mode'], state['chat_style'], state['character_menu'])
prompt = generate_chat_prompt('', state, impersonate=True)
stopping_strings = get_stopping_strings(state)
yield text + '...', static_output
textbox['text'] = text + '...'
yield textbox, static_output
reply = None
for reply in generate_reply(prompt + text, state, stopping_strings=stopping_strings, is_chat=True):
yield (text + reply).lstrip(' '), static_output
textbox['text'] = (text + reply).lstrip(' ')
yield textbox, static_output
if shared.stop_everything:
return
@ -495,49 +787,60 @@ def generate_chat_reply_wrapper(text, state, regenerate=False, _continue=False):
def remove_last_message(history):
if 'metadata' not in history:
history['metadata'] = {}
if len(history['visible']) > 0 and history['internal'][-1][0] != '<|BEGIN-VISIBLE-CHAT|>':
row_idx = len(history['internal']) - 1
last = history['visible'].pop()
history['internal'].pop()
# Remove metadata directly by known keys
if f"user_{row_idx}" in history['metadata']:
del history['metadata'][f"user_{row_idx}"]
if f"assistant_{row_idx}" in history['metadata']:
del history['metadata'][f"assistant_{row_idx}"]
else:
last = ['', '']
return html.unescape(last[0]), history
def send_last_reply_to_input(history):
if len(history['visible']) > 0:
return html.unescape(history['visible'][-1][1])
else:
return ''
def replace_last_reply(text, state):
def send_dummy_message(textbox, state):
history = state['history']
text = textbox['text']
if len(text.strip()) == 0:
return history
elif len(history['visible']) > 0:
history['visible'][-1][1] = html.escape(text)
history['internal'][-1][1] = apply_extensions('input', text, state, is_chat=True)
# Initialize metadata if not present
if 'metadata' not in history:
history['metadata'] = {}
return history
def send_dummy_message(text, state):
history = state['history']
row_idx = len(history['internal'])
history['visible'].append([html.escape(text), ''])
history['internal'].append([apply_extensions('input', text, state, is_chat=True), ''])
update_message_metadata(history['metadata'], "user", row_idx, timestamp=get_current_timestamp())
return history
def send_dummy_reply(text, state):
def send_dummy_reply(textbox, state):
history = state['history']
text = textbox['text']
# Initialize metadata if not present
if 'metadata' not in history:
history['metadata'] = {}
if len(history['visible']) > 0 and not history['visible'][-1][1] == '':
row_idx = len(history['internal'])
history['visible'].append(['', ''])
history['internal'].append(['', ''])
# We don't need to add system metadata
row_idx = len(history['internal']) - 1
history['visible'][-1][1] = html.escape(text)
history['internal'][-1][1] = apply_extensions('input', text, state, is_chat=True)
update_message_metadata(history['metadata'], "assistant", row_idx, timestamp=get_current_timestamp())
return history
@ -547,7 +850,8 @@ def redraw_html(history, name1, name2, mode, style, character, reset_cache=False
def start_new_chat(state):
mode = state['mode']
history = {'internal': [], 'visible': []}
# Initialize with empty metadata dictionary
history = {'internal': [], 'visible': [], 'metadata': {}}
if mode != 'instruct':
greeting = replace_character_names(state['greeting'], state['name1'], state['name2'])
@ -555,6 +859,9 @@ def start_new_chat(state):
history['internal'] += [['<|BEGIN-VISIBLE-CHAT|>', greeting]]
history['visible'] += [['', apply_extensions('output', html.escape(greeting), state, is_chat=True)]]
# Add timestamp for assistant's greeting
update_message_metadata(history['metadata'], "assistant", 0, timestamp=get_current_timestamp())
unique_id = datetime.now().strftime('%Y%m%d-%H-%M-%S')
save_history(history, unique_id, state['character_menu'], state['mode'])
@ -735,6 +1042,16 @@ def load_history(unique_id, character, mode):
'visible': f['data_visible']
}
# Add metadata if it doesn't exist
if 'metadata' not in history:
history['metadata'] = {}
# Add placeholder timestamps for existing messages
for i, (user_msg, asst_msg) in enumerate(history['internal']):
if user_msg and user_msg != '<|BEGIN-VISIBLE-CHAT|>':
update_message_metadata(history['metadata'], "user", i, timestamp="")
if asst_msg:
update_message_metadata(history['metadata'], "assistant", i, timestamp="")
return history
@ -750,6 +1067,16 @@ def load_history_json(file, history):
'visible': f['data_visible']
}
# Add metadata if it doesn't exist
if 'metadata' not in history:
history['metadata'] = {}
# Add placeholder timestamps
for i, (user_msg, asst_msg) in enumerate(history['internal']):
if user_msg and user_msg != '<|BEGIN-VISIBLE-CHAT|>':
update_message_metadata(history['metadata'], "user", i, timestamp="")
if asst_msg:
update_message_metadata(history['metadata'], "assistant", i, timestamp="")
return history
except:
return history
@ -1071,20 +1398,12 @@ def my_yaml_output(data):
return result
def handle_replace_last_reply_click(text, state):
history = replace_last_reply(text, state)
save_history(history, state['unique_id'], state['character_menu'], state['mode'])
html = redraw_html(history, state['name1'], state['name2'], state['mode'], state['chat_style'], state['character_menu'])
return [history, html, ""]
def handle_send_dummy_message_click(text, state):
history = send_dummy_message(text, state)
save_history(history, state['unique_id'], state['character_menu'], state['mode'])
html = redraw_html(history, state['name1'], state['name2'], state['mode'], state['chat_style'], state['character_menu'])
return [history, html, ""]
return [history, html, {"text": "", "files": []}]
def handle_send_dummy_reply_click(text, state):
@ -1092,7 +1411,7 @@ def handle_send_dummy_reply_click(text, state):
save_history(history, state['unique_id'], state['character_menu'], state['mode'])
html = redraw_html(history, state['name1'], state['name2'], state['mode'], state['chat_style'], state['character_menu'])
return [history, html, ""]
return [history, html, {"text": "", "files": []}]
def handle_remove_last_click(state):
@ -1100,7 +1419,7 @@ def handle_remove_last_click(state):
save_history(history, state['unique_id'], state['character_menu'], state['mode'])
html = redraw_html(history, state['name1'], state['name2'], state['mode'], state['chat_style'], state['character_menu'])
return [history, html, last_input]
return [history, html, {"text": last_input, "files": []}]
def handle_unique_id_select(state):
@ -1146,7 +1465,13 @@ def handle_delete_chat_confirm_click(state):
def handle_branch_chat_click(state):
branch_from_index = state['branch_index']
if branch_from_index == -1:
history = state['history']
else:
history = state['history']
history['visible'] = history['visible'][:branch_from_index + 1]
history['internal'] = history['internal'][:branch_from_index + 1]
new_unique_id = datetime.now().strftime('%Y%m%d-%H-%M-%S')
save_history(history, new_unique_id, state['character_menu'], state['mode'])
@ -1157,7 +1482,93 @@ def handle_branch_chat_click(state):
past_chats_update = gr.update(choices=histories, value=new_unique_id)
return [history, html, past_chats_update]
return [history, html, past_chats_update, -1]
def handle_edit_message_click(state):
history = state['history']
message_index = int(state['edit_message_index'])
new_text = state['edit_message_text']
role = state['edit_message_role'] # "user" or "assistant"
if message_index >= len(history['internal']):
html_output = redraw_html(history, state['name1'], state['name2'], state['mode'], state['chat_style'], state['character_menu'])
return [history, html_output]
role_idx = 0 if role == "user" else 1
if 'metadata' not in history:
history['metadata'] = {}
key = f"{role}_{message_index}"
if key not in history['metadata']:
history['metadata'][key] = {}
# If no versions exist yet for this message, store the current (pre-edit) content as the first version.
if "versions" not in history['metadata'][key] or not history['metadata'][key]["versions"]:
original_content = history['internal'][message_index][role_idx]
original_visible = history['visible'][message_index][role_idx]
original_timestamp = history['metadata'][key].get('timestamp', get_current_timestamp())
history['metadata'][key]["versions"] = [{
"content": original_content,
"visible_content": original_visible,
"timestamp": original_timestamp
}]
history['internal'][message_index][role_idx] = apply_extensions('input', new_text, state, is_chat=True)
history['visible'][message_index][role_idx] = html.escape(new_text)
add_message_version(history, role, message_index, is_current=True)
save_history(history, state['unique_id'], state['character_menu'], state['mode'])
html_output = redraw_html(history, state['name1'], state['name2'], state['mode'], state['chat_style'], state['character_menu'])
return [history, html_output]
def handle_navigate_version_click(state):
history = state['history']
message_index = int(state['navigate_message_index'])
direction = state['navigate_direction']
role = state['navigate_message_role']
if not role:
logger.error("Role not provided for version navigation.")
html = redraw_html(history, state['name1'], state['name2'], state['mode'], state['chat_style'], state['character_menu'])
return [history, html]
key = f"{role}_{message_index}"
if 'metadata' not in history or key not in history['metadata'] or 'versions' not in history['metadata'][key]:
html = redraw_html(history, state['name1'], state['name2'], state['mode'], state['chat_style'], state['character_menu'])
return [history, html]
metadata = history['metadata'][key]
versions = metadata['versions']
# Default to the last version if current_version_index is not set
current_idx = metadata.get('current_version_index', len(versions) - 1 if versions else 0)
if direction == 'left':
new_idx = max(0, current_idx - 1)
else: # right
new_idx = min(len(versions) - 1, current_idx + 1)
if new_idx == current_idx:
html = redraw_html(history, state['name1'], state['name2'], state['mode'], state['chat_style'], state['character_menu'])
return [history, html]
msg_content_idx = 0 if role == 'user' else 1 # 0 for user content, 1 for assistant content in the pair
version_to_load = versions[new_idx]
history['internal'][message_index][msg_content_idx] = version_to_load['content']
history['visible'][message_index][msg_content_idx] = version_to_load['visible_content']
metadata['current_version_index'] = new_idx
update_message_metadata(history['metadata'], role, message_index, timestamp=version_to_load['timestamp'])
# Redraw and save
html = redraw_html(history, state['name1'], state['name2'], state['mode'], state['chat_style'], state['character_menu'])
save_history(history, state['unique_id'], state['character_menu'], state['mode'])
return [history, html]
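For reference, a minimal sketch (illustrative, not taken from the diff) of the per-message metadata layout that handle_edit_message_click and handle_navigate_version_click read and write; the keys follow the f"{role}_{index}" pattern used above, the concrete values are invented, and add_message_version() itself is not shown in this hunk.

# Illustrative only: the history['metadata'] shape implied by the two handlers above.
example_history = {
    'internal': [['hi there', 'Hello! How can I help?']],
    'visible': [['hi there', 'Hello! How can I help?']],
    'metadata': {
        'assistant_0': {
            'timestamp': 'May 29, 2025 21:49',
            'current_version_index': 1,
            'versions': [
                {'content': 'Hi.', 'visible_content': 'Hi.', 'timestamp': 'May 29, 2025 21:48'},
                {'content': 'Hello! How can I help?', 'visible_content': 'Hello! How can I help?',
                 'timestamp': 'May 29, 2025 21:49'},
            ],
        },
    },
}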
def handle_rename_chat_click():
@ -1299,7 +1710,7 @@ def handle_your_picture_change(picture, state):
def handle_send_instruction_click(state):
state['mode'] = 'instruct'
state['history'] = {'internal': [], 'visible': []}
state['history'] = {'internal': [], 'visible': [], 'metadata': {}}
output = generate_chat_prompt("Input", state)

View file

@ -169,11 +169,7 @@ def convert_to_markdown(string, message_id=None):
thinking_block = f'''
<details class="thinking-block" data-block-id="{block_id}" data-streaming="{str(is_streaming).lower()}">
<summary class="thinking-header">
<svg class="thinking-icon" width="16" height="16" viewBox="0 0 16 16" fill="none" xmlns="http://www.w3.org/2000/svg">
<path d="M8 1.33334C4.31868 1.33334 1.33334 4.31868 1.33334 8.00001C1.33334 11.6813 4.31868 14.6667 8 14.6667C11.6813 14.6667 14.6667 11.6813 14.6667 8.00001C14.6667 4.31868 11.6813 1.33334 8 1.33334Z" stroke="currentColor" stroke-width="1.33" stroke-linecap="round" stroke-linejoin="round"/>
<path d="M8 10.6667V8.00001" stroke="currentColor" stroke-width="1.33" stroke-linecap="round" stroke-linejoin="round"/>
<path d="M8 5.33334H8.00667" stroke="currentColor" stroke-width="1.33" stroke-linecap="round" stroke-linejoin="round"/>
</svg>
{info_svg_small}
<span class="thinking-title">{title_text}</span>
</summary>
<div class="thinking-content pretty_scrollbar">{thinking_html}</div>
@ -339,11 +335,112 @@ copy_svg = '''<svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" vie
refresh_svg = '''<svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="tabler-icon tabler-icon-repeat"><path d="M4 12v-3a3 3 0 0 1 3 -3h13m-3 -3l3 3l-3 3"></path><path d="M20 12v3a3 3 0 0 1 -3 3h-13m3 3l-3 -3l3 -3"></path></svg>'''
continue_svg = '''<svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="icon icon-tabler icons-tabler-outline icon-tabler-player-play"><path stroke="none" d="M0 0h24v24H0z" fill="none"/><path d="M7 4v16l13 -8z" /></svg>'''
remove_svg = '''<svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="icon icon-tabler icons-tabler-outline icon-tabler-trash"><path stroke="none" d="M0 0h24v24H0z" fill="none"/><path d="M4 7l16 0" /><path d="M10 11l0 6" /><path d="M14 11l0 6" /><path d="M5 7l1 12a2 2 0 0 0 2 2h8a2 2 0 0 0 2 -2l1 -12" /><path d="M9 7v-3a1 1 0 0 1 1 -1h4a1 1 0 0 1 1 1v3" /></svg>'''
branch_svg = '''<svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="icon icon-tabler icons-tabler-outline icon-tabler-git-branch"><path stroke="none" d="M0 0h24v24H0z" fill="none"/><path d="M7 18m-2 0a2 2 0 1 0 4 0a2 2 0 1 0 -4 0" /><path d="M7 6m-2 0a2 2 0 1 0 4 0a2 2 0 1 0 -4 0" /><path d="M17 6m-2 0a2 2 0 1 0 4 0a2 2 0 1 0 -4 0" /><path d="M7 8l0 8" /><path d="M9 18h6a2 2 0 0 0 2 -2v-5" /><path d="M14 14l3 -3l3 3" /></svg>'''
edit_svg = '''<svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="tabler-icon tabler-icon-pencil"><path d="M4 20h4l10.5 -10.5a2.828 2.828 0 1 0 -4 -4l-10.5 10.5v4"></path><path d="M13.5 6.5l4 4"></path></svg>'''
info_svg = '''<svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="thinking-icon tabler-icon tabler-icon-info-circle"><path stroke="none" d="M0 0h24v24H0z" fill="none"/><path d="M12 2a10 10 0 0 1 0 20a10 10 0 0 1 0 -20z" /><path d="M12 16v-4" /><path d="M12 8h.01" /></svg>'''
info_svg_small = '''<svg xmlns="http://www.w3.org/2000/svg" width="16" height="16" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="thinking-icon tabler-icon tabler-icon-info-circle"><path stroke="none" d="M0 0h24v24H0z" fill="none"/><path d="M12 2a10 10 0 0 1 0 20a10 10 0 0 1 0 -20z" /><path d="M12 16v-4" /><path d="M12 8h.01" /></svg>'''
attachment_svg = '''<svg xmlns="http://www.w3.org/2000/svg" width="16" height="16" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><path d="M21.44 11.05l-9.19 9.19a6 6 0 0 1-8.48-8.48l9.19-9.19a4 4 0 0 1 5.66 5.66l-9.2 9.19a2 2 0 0 1-2.83-2.83l8.49-8.48"></path></svg>'''
copy_button = f'<button class="footer-button footer-copy-button" title="Copy" onclick="copyToClipboard(this)">{copy_svg}</button>'
branch_button = f'<button class="footer-button footer-branch-button" title="Branch here" onclick="branchHere(this)">{branch_svg}</button>'
edit_button = f'<button class="footer-button footer-edit-button" title="Edit" onclick="editHere(this)">{edit_svg}</button>'
refresh_button = f'<button class="footer-button footer-refresh-button" title="Regenerate" onclick="regenerateClick()">{refresh_svg}</button>'
continue_button = f'<button class="footer-button footer-continue-button" title="Continue" onclick="continueClick()">{continue_svg}</button>'
remove_button = f'<button class="footer-button footer-remove-button" title="Remove last reply" onclick="removeLastClick()">{remove_svg}</button>'
info_button = f'<button class="footer-button footer-info-button" title="message">{info_svg}</button>'
def format_message_timestamp(history, role, index):
"""Get a formatted timestamp HTML span for a message if available"""
key = f"{role}_{index}"
if 'metadata' in history and key in history['metadata'] and history['metadata'][key].get('timestamp'):
timestamp = history['metadata'][key]['timestamp']
return f"<span class='timestamp'>{timestamp}</span>"
return ""
def format_message_attachments(history, role, index):
"""Get formatted HTML for message attachments if available"""
key = f"{role}_{index}"
if 'metadata' in history and key in history['metadata'] and 'attachments' in history['metadata'][key]:
attachments = history['metadata'][key]['attachments']
if not attachments:
return ""
attachments_html = '<div class="message-attachments">'
for attachment in attachments:
name = html.escape(attachment["name"])
# Make clickable if URL exists
if "url" in attachment:
name = f'<a href="{html.escape(attachment["url"])}" target="_blank" rel="noopener noreferrer">{name}</a>'
attachments_html += (
f'<div class="attachment-box">'
f'<div class="attachment-icon">{attachment_svg}</div>'
f'<div class="attachment-name">{name}</div>'
f'</div>'
)
attachments_html += '</div>'
return attachments_html
return ""
def get_version_navigation_html(history, i, role):
"""Generate simple navigation arrows for message versions"""
key = f"{role}_{i}"
metadata = history.get('metadata', {})
if key not in metadata or 'versions' not in metadata[key]:
return ""
versions = metadata[key]['versions']
# Default to the last version if current_version_index isn't set in metadata
current_idx = metadata[key].get('current_version_index', len(versions) - 1 if versions else 0)
if len(versions) <= 1:
return ""
left_disabled = ' disabled' if current_idx == 0 else ''
right_disabled = ' disabled' if current_idx >= len(versions) - 1 else ''
left_arrow = f'<button class="footer-button version-nav-button"{left_disabled} onclick="navigateVersion(this, \'left\')" title="Previous version">&lt;</button>'
right_arrow = f'<button class="footer-button version-nav-button"{right_disabled} onclick="navigateVersion(this, \'right\')" title="Next version">&gt;</button>'
position = f'<span class="version-position">{current_idx + 1}/{len(versions)}</span>'
return f'<div class="version-navigation">{left_arrow}{position}{right_arrow}</div>'
def actions_html(history, i, role, info_message=""):
action_buttons = ""
version_nav_html = ""
if role == "assistant":
action_buttons = (
f'{copy_button}'
f'{edit_button}'
f'{refresh_button if i == len(history["visible"]) - 1 else ""}'
f'{continue_button if i == len(history["visible"]) - 1 else ""}'
f'{remove_button if i == len(history["visible"]) - 1 else ""}'
f'{branch_button}'
)
version_nav_html = get_version_navigation_html(history, i, "assistant")
elif role == "user":
action_buttons = (
f'{copy_button}'
f'{edit_button}'
)
version_nav_html = get_version_navigation_html(history, i, "user")
return (f'<div class="message-actions">'
f'{action_buttons}'
f'{info_message}'
f'</div>'
f'{version_nav_html}')
def generate_instruct_html(history):
@ -354,26 +451,48 @@ def generate_instruct_html(history):
row_internal = history['internal'][i]
converted_visible = [convert_to_markdown_wrapped(entry, message_id=i, use_cache=i != len(history['visible']) - 1) for entry in row_visible]
# Get timestamps
user_timestamp = format_message_timestamp(history, "user", i)
assistant_timestamp = format_message_timestamp(history, "assistant", i)
# Get attachments
user_attachments = format_message_attachments(history, "user", i)
assistant_attachments = format_message_attachments(history, "assistant", i)
# Create info buttons for timestamps if they exist
info_message_user = ""
if user_timestamp != "":
# Extract the timestamp value from the span
user_timestamp_value = user_timestamp.split('>', 1)[1].split('<', 1)[0]
info_message_user = info_button.replace("message", user_timestamp_value)
info_message_assistant = ""
if assistant_timestamp != "":
# Extract the timestamp value from the span
assistant_timestamp_value = assistant_timestamp.split('>', 1)[1].split('<', 1)[0]
info_message_assistant = info_button.replace("message", assistant_timestamp_value)
if converted_visible[0]: # Don't display empty user messages
output += (
f'<div class="user-message" '
f'data-raw="{html.escape(row_internal[0], quote=True)}">'
f'data-raw="{html.escape(row_internal[0], quote=True)}"'
f'data-index={i}>'
f'<div class="text">'
f'<div class="message-body">{converted_visible[0]}</div>'
f'{copy_button}'
f'{user_attachments}'
f'{actions_html(history, i, "user", info_message_user)}'
f'</div>'
f'</div>'
)
output += (
f'<div class="assistant-message" '
f'data-raw="{html.escape(row_internal[1], quote=True)}">'
f'data-raw="{html.escape(row_internal[1], quote=True)}"'
f'data-index={i}>'
f'<div class="text">'
f'<div class="message-body">{converted_visible[1]}</div>'
f'{copy_button}'
f'{refresh_button if i == len(history["visible"]) - 1 else ""}'
f'{continue_button if i == len(history["visible"]) - 1 else ""}'
f'{remove_button if i == len(history["visible"]) - 1 else ""}'
f'{assistant_attachments}'
f'{actions_html(history, i, "assistant", info_message_assistant)}'
f'</div>'
f'</div>'
)
@ -401,30 +520,39 @@ def generate_cai_chat_html(history, name1, name2, style, character, reset_cache=
row_internal = history['internal'][i]
converted_visible = [convert_to_markdown_wrapped(entry, message_id=i, use_cache=i != len(history['visible']) - 1) for entry in row_visible]
# Get timestamps
user_timestamp = format_message_timestamp(history, "user", i)
assistant_timestamp = format_message_timestamp(history, "assistant", i)
# Get attachments
user_attachments = format_message_attachments(history, "user", i)
assistant_attachments = format_message_attachments(history, "assistant", i)
if converted_visible[0]: # Don't display empty user messages
output += (
f'<div class="message" '
f'data-raw="{html.escape(row_internal[0], quote=True)}">'
f'data-raw="{html.escape(row_internal[0], quote=True)}"'
f'data-index={i}>'
f'<div class="circle-you">{img_me}</div>'
f'<div class="text">'
f'<div class="username">{name1}</div>'
f'<div class="username">{name1}{user_timestamp}</div>'
f'<div class="message-body">{converted_visible[0]}</div>'
f'{copy_button}'
f'{user_attachments}'
f'{actions_html(history, i, "user")}'
f'</div>'
f'</div>'
)
output += (
f'<div class="message" '
f'data-raw="{html.escape(row_internal[1], quote=True)}">'
f'data-raw="{html.escape(row_internal[1], quote=True)}"'
f'data-index={i}>'
f'<div class="circle-bot">{img_bot}</div>'
f'<div class="text">'
f'<div class="username">{name2}</div>'
f'<div class="username">{name2}{assistant_timestamp}</div>'
f'<div class="message-body">{converted_visible[1]}</div>'
f'{copy_button}'
f'{refresh_button if i == len(history["visible"]) - 1 else ""}'
f'{continue_button if i == len(history["visible"]) - 1 else ""}'
f'{remove_button if i == len(history["visible"]) - 1 else ""}'
f'{assistant_attachments}'
f'{actions_html(history, i, "assistant")}'
f'</div>'
f'</div>'
)
@ -441,26 +569,48 @@ def generate_chat_html(history, name1, name2, reset_cache=False):
row_internal = history['internal'][i]
converted_visible = [convert_to_markdown_wrapped(entry, message_id=i, use_cache=i != len(history['visible']) - 1) for entry in row_visible]
# Get timestamps
user_timestamp = format_message_timestamp(history, "user", i)
assistant_timestamp = format_message_timestamp(history, "assistant", i)
# Get attachments
user_attachments = format_message_attachments(history, "user", i)
assistant_attachments = format_message_attachments(history, "assistant", i)
# Create info buttons for timestamps if they exist
info_message_user = ""
if user_timestamp != "":
# Extract the timestamp value from the span
user_timestamp_value = user_timestamp.split('>', 1)[1].split('<', 1)[0]
info_message_user = info_button.replace("message", user_timestamp_value)
info_message_assistant = ""
if assistant_timestamp != "":
# Extract the timestamp value from the span
assistant_timestamp_value = assistant_timestamp.split('>', 1)[1].split('<', 1)[0]
info_message_assistant = info_button.replace("message", assistant_timestamp_value)
if converted_visible[0]: # Don't display empty user messages
output += (
f'<div class="message" '
f'data-raw="{html.escape(row_internal[0], quote=True)}">'
f'data-raw="{html.escape(row_internal[0], quote=True)}"'
f'data-index={i}>'
f'<div class="text-you">'
f'<div class="message-body">{converted_visible[0]}</div>'
f'{copy_button}'
f'{user_attachments}'
f'{actions_html(history, i, "user", info_message_user)}'
f'</div>'
f'</div>'
)
output += (
f'<div class="message" '
f'data-raw="{html.escape(row_internal[1], quote=True)}">'
f'data-raw="{html.escape(row_internal[1], quote=True)}"'
f'data-index={i}>'
f'<div class="text-bot">'
f'<div class="message-body">{converted_visible[1]}</div>'
f'{copy_button}'
f'{refresh_button if i == len(history["visible"]) - 1 else ""}'
f'{continue_button if i == len(history["visible"]) - 1 else ""}'
f'{remove_button if i == len(history["visible"]) - 1 else ""}'
f'{assistant_attachments}'
f'{actions_html(history, i, "assistant", info_message_assistant)}'
f'</div>'
f'</div>'
)

View file

@ -90,11 +90,6 @@ loaders_and_params = OrderedDict({
'ctx_size_draft',
'speculative_decoding_accordion',
],
'HQQ': [
'hqq_backend',
'trust_remote_code',
'no_use_fast',
],
'TensorRT-LLM': [
'ctx_size',
'cpp_runner',
@ -158,7 +153,6 @@ def transformers_samplers():
loaders_samplers = {
'Transformers': transformers_samplers(),
'HQQ': transformers_samplers(),
'ExLlamav3_HF': {
'temperature',
'dynatemp_low',

View file

@ -21,7 +21,6 @@ def load_model(model_name, loader=None):
'ExLlamav3_HF': ExLlamav3_HF_loader,
'ExLlamav2_HF': ExLlamav2_HF_loader,
'ExLlamav2': ExLlamav2_loader,
'HQQ': HQQ_loader,
'TensorRT-LLM': TensorRT_LLM_loader,
}
@ -102,21 +101,6 @@ def ExLlamav2_loader(model_name):
return model, tokenizer
def HQQ_loader(model_name):
try:
from hqq.core.quantize import HQQBackend, HQQLinear
from hqq.models.hf.base import AutoHQQHFModel
except ModuleNotFoundError:
raise ModuleNotFoundError("Failed to import 'hqq'. Please install it manually following the instructions in the HQQ GitHub repository.")
logger.info(f"Loading HQQ model with backend: \"{shared.args.hqq_backend}\"")
model_dir = Path(f'{shared.args.model_dir}/{model_name}')
model = AutoHQQHFModel.from_quantized(str(model_dir))
HQQLinear.set_backend(getattr(HQQBackend, shared.args.hqq_backend))
return model
def TensorRT_LLM_loader(model_name):
try:
from modules.tensorrt_llm import TensorRTLLMModel

View file

@ -183,8 +183,6 @@ def infer_loader(model_name, model_settings, hf_quant_method=None):
loader = 'ExLlamav3_HF'
elif re.match(r'.*exl2', model_name.lower()):
loader = 'ExLlamav2_HF'
elif re.match(r'.*-hqq', model_name.lower()):
return 'HQQ'
else:
loader = 'Transformers'
@ -337,7 +335,7 @@ def estimate_vram(gguf_file, gpu_layers, ctx_size, cache_type):
if key.endswith('.block_count'):
n_layers = value
elif key.endswith('.attention.head_count_kv'):
n_kv_heads = value
n_kv_heads = max(value) if isinstance(value, list) else value
elif key.endswith('.embedding_length'):
embedding_dim = value
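A tiny illustration of the one-line change above, with an invented per-layer list:

# Illustrative only: handles GGUF files where attention.head_count_kv is stored as a
# per-layer list; the estimator now uses the largest entry instead of failing on the list.
value = [8, 8, 2, 2]
n_kv_heads = max(value) if isinstance(value, list) else value  # -> 8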
@ -440,7 +438,7 @@ def update_gpu_layers_and_vram(loader, model, gpu_layers, ctx_size, cache_type,
- If for_ui=False: (vram_usage, adjusted_layers) or just vram_usage
"""
if loader != 'llama.cpp' or model in ["None", None] or not model.endswith(".gguf"):
vram_info = "<div id=\"vram-info\"'>Estimated VRAM to load the model:</span>"
vram_info = "<div id=\"vram-info\"'>Estimated VRAM to load the model:</div>"
if for_ui:
return (vram_info, gr.update()) if auto_adjust else vram_info
else:
@ -482,7 +480,7 @@ def update_gpu_layers_and_vram(loader, model, gpu_layers, ctx_size, cache_type,
vram_usage = estimate_vram(model, current_layers, ctx_size, cache_type)
if for_ui:
vram_info = f"<div id=\"vram-info\"'>Estimated VRAM to load the model: <span class=\"value\">{vram_usage:.0f} MiB</span>"
vram_info = f"<div id=\"vram-info\"'>Estimated VRAM to load the model: <span class=\"value\">{vram_usage:.0f} MiB</span></div>"
if auto_adjust:
return vram_info, gr.update(value=current_layers, maximum=max_layers)
else:

View file

@ -47,6 +47,7 @@ settings = {
'max_new_tokens_max': 4096,
'prompt_lookup_num_tokens': 0,
'max_tokens_second': 0,
'max_updates_second': 12,
'auto_max_new_tokens': True,
'ban_eos_token': False,
'add_bos_token': True,
@ -86,7 +87,7 @@ group.add_argument('--idle-timeout', type=int, default=0, help='Unload model aft
# Model loader
group = parser.add_argument_group('Model loader')
group.add_argument('--loader', type=str, help='Choose the model loader manually, otherwise, it will get autodetected. Valid options: Transformers, llama.cpp, ExLlamav3_HF, ExLlamav2_HF, ExLlamav2, HQQ, TensorRT-LLM.')
group.add_argument('--loader', type=str, help='Choose the model loader manually, otherwise, it will get autodetected. Valid options: Transformers, llama.cpp, ExLlamav3_HF, ExLlamav2_HF, ExLlamav2, TensorRT-LLM.')
# Transformers/Accelerate
group = parser.add_argument_group('Transformers/Accelerate')
@ -151,10 +152,6 @@ group.add_argument('--no_sdpa', action='store_true', help='Force Torch SDPA to n
group.add_argument('--num_experts_per_token', type=int, default=2, metavar='N', help='Number of experts to use for generation. Applies to MoE models like Mixtral.')
group.add_argument('--enable_tp', action='store_true', help='Enable Tensor Parallelism (TP) in ExLlamaV2.')
# HQQ
group = parser.add_argument_group('HQQ')
group.add_argument('--hqq-backend', type=str, default='PYTORCH_COMPILE', help='Backend for the HQQ loader. Valid options: PYTORCH, PYTORCH_COMPILE, ATEN.')
# TensorRT-LLM
group = parser.add_argument_group('TensorRT-LLM')
group.add_argument('--cpp-runner', action='store_true', help='Use the ModelRunnerCpp runner, which is faster than the default ModelRunner but doesn\'t support streaming yet.')
@ -262,8 +259,6 @@ def fix_loader_name(name):
return 'ExLlamav2_HF'
elif name in ['exllamav3-hf', 'exllamav3_hf', 'exllama-v3-hf', 'exllama_v3_hf', 'exllama-v3_hf', 'exllama3-hf', 'exllama3_hf', 'exllama-3-hf', 'exllama_3_hf', 'exllama-3_hf']:
return 'ExLlamav3_HF'
elif name in ['hqq']:
return 'HQQ'
elif name in ['tensorrt', 'tensorrtllm', 'tensorrt_llm', 'tensorrt-llm', 'tensort', 'tensortllm']:
return 'TensorRT-LLM'

View file

@ -65,39 +65,41 @@ def _generate_reply(question, state, stopping_strings=None, is_chat=False, escap
all_stop_strings += st
shared.stop_everything = False
last_update = -1
reply = ''
is_stream = state['stream']
if len(all_stop_strings) > 0 and not state['stream']:
state = copy.deepcopy(state)
state['stream'] = True
min_update_interval = 0
if state.get('max_updates_second', 0) > 0:
min_update_interval = 1 / state['max_updates_second']
# Generate
last_update = -1
latency_threshold = 1 / 1000
for reply in generate_func(question, original_question, state, stopping_strings, is_chat=is_chat):
cur_time = time.monotonic()
reply, stop_found = apply_stopping_strings(reply, all_stop_strings)
if escape_html:
reply = html.escape(reply)
if is_stream:
cur_time = time.time()
# Limit number of tokens/second to make text readable in real time
if state['max_tokens_second'] > 0:
diff = 1 / state['max_tokens_second'] - (cur_time - last_update)
if diff > 0:
time.sleep(diff)
last_update = time.monotonic()
last_update = time.time()
yield reply
# Limit updates to avoid lag in the Gradio UI
# API updates are not limited
else:
# If 'generate_func' takes less than 0.001 seconds to yield the next token
# (equivalent to more than 1000 tok/s), assume that the UI is lagging behind and skip yielding
if (cur_time - last_update) > latency_threshold:
if cur_time - last_update > min_update_interval:
last_update = cur_time
yield reply
last_update = time.monotonic()
if stop_found or (state['max_tokens_second'] > 0 and shared.stop_everything):
break
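A self-contained sketch of the UI-throttling idea introduced above (the helper name and default are made up for illustration; the real logic lives inside _generate_reply):

import time

def throttle(chunks, max_updates_second=12):
    """Illustrative only: forward at most `max_updates_second` chunks per second,
    mirroring the min_update_interval check added to _generate_reply above."""
    min_update_interval = 1 / max_updates_second if max_updates_second > 0 else 0
    last_update = -1
    for chunk in chunks:
        now = time.time()
        if now - last_update > min_update_interval:
            last_update = now
            yield chunk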
@ -503,11 +505,11 @@ def generate_reply_custom(question, original_question, state, stopping_strings=N
return
def print_prompt(prompt, max_chars=2000):
def print_prompt(prompt, max_chars=-1):
DARK_YELLOW = "\033[38;5;3m"
RESET = "\033[0m"
if len(prompt) > max_chars:
if max_chars > 0 and len(prompt) > max_chars:
half_chars = max_chars // 2
hidden_len = len(prompt[half_chars:-half_chars])
hidden_msg = f"{DARK_YELLOW}[...{hidden_len} characters hidden...]{RESET}"
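Behaviour of the new default, assuming `prompt` holds some long string:

# Illustrative only:
prompt = "x" * 5000  # stand-in for a real prompt
print_prompt(prompt)                  # new default (max_chars=-1): prints the whole prompt
print_prompt(prompt, max_chars=2000)  # old behaviour: middle replaced by "[...N characters hidden...]"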

View file

@ -109,7 +109,6 @@ def list_model_elements():
'threads',
'threads_batch',
'batch_size',
'hqq_backend',
'ctx_size',
'cache_type',
'tensor_split',
@ -192,6 +191,7 @@ def list_interface_input_elements():
'max_new_tokens',
'prompt_lookup_num_tokens',
'max_tokens_second',
'max_updates_second',
'do_sample',
'dynamic_temperature',
'temperature_last',
@ -210,6 +210,15 @@ def list_interface_input_elements():
'negative_prompt',
'dry_sequence_breakers',
'grammar_string',
'navigate_message_index',
'navigate_direction',
'navigate_message_role',
'edit_message_index',
'edit_message_text',
'edit_message_role',
'branch_index',
'enable_web_search',
'web_search_pages',
]
# Chat elements

View file

@ -24,7 +24,8 @@ def create_ui():
with gr.Row(elem_id='past-chats-row', elem_classes=['pretty_scrollbar']):
with gr.Column():
with gr.Row(elem_id='past-chats-buttons'):
shared.gradio['branch_chat'] = gr.Button('Branch', elem_classes='refresh-button', interactive=not mu)
shared.gradio['branch_chat'] = gr.Button('Branch', elem_classes='refresh-button', elem_id='Branch', interactive=not mu)
shared.gradio['branch_index'] = gr.Number(value=-1, precision=0, visible=False, elem_id="Branch-index", interactive=True)
shared.gradio['rename_chat'] = gr.Button('Rename', elem_classes='refresh-button', interactive=not mu)
shared.gradio['delete_chat'] = gr.Button('🗑️', elem_classes='refresh-button', interactive=not mu)
shared.gradio['Start new chat'] = gr.Button('New chat', elem_classes=['refresh-button', 'focus-on-chat-input'])
@ -47,13 +48,13 @@ def create_ui():
with gr.Row():
with gr.Column(elem_id='chat-col'):
shared.gradio['display'] = gr.JSON(value={}, visible=False) # Hidden buffer
shared.gradio['html_display'] = gr.HTML(value=chat_html_wrapper({'internal': [], 'visible': []}, '', '', 'chat', 'cai-chat', '')['html'], visible=True)
shared.gradio['html_display'] = gr.HTML(value=chat_html_wrapper({'internal': [], 'visible': [], 'metadata': {}}, '', '', 'chat', 'cai-chat', '')['html'], visible=True)
with gr.Row(elem_id="chat-input-row"):
with gr.Column(scale=1, elem_id='gr-hover-container'):
gr.HTML(value='<div class="hover-element" onclick="void(0)"><span style="width: 100px; display: block" id="hover-element-button">&#9776;</span><div class="hover-menu" id="hover-menu"></div>', elem_id='gr-hover')
with gr.Column(scale=10, elem_id='chat-input-container'):
shared.gradio['textbox'] = gr.Textbox(label='', placeholder='Send a message', elem_id='chat-input', elem_classes=['add_scrollbar'])
shared.gradio['textbox'] = gr.MultimodalTextbox(label='', placeholder='Send a message', file_types=['text', '.pdf'], file_count="multiple", elem_id='chat-input', elem_classes=['add_scrollbar'])
shared.gradio['show_controls'] = gr.Checkbox(value=shared.settings['show_controls'], label='Show controls (Ctrl+S)', elem_id='show-controls')
shared.gradio['typing-dots'] = gr.HTML(value='<div class="typing"><span></span><span class="dot1"></span><span class="dot2"></span></div>', label='typing', elem_id='typing-container')
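For context, a sketch of the value the new multimodal textbox produces (a dict rather than a plain string; the file path is invented), which is why the Generate/submit handlers further down now clear it with {"text": "", "files": []}:

# Illustrative only: typical gr.MultimodalTextbox value.
example_value = {
    "text": "Summarize the attached report",
    "files": ["/tmp/gradio/abc123/report.pdf"],
}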
@ -70,8 +71,6 @@ def create_ui():
shared.gradio['Remove last'] = gr.Button('Remove last reply (Ctrl + Shift + Backspace)', elem_id='Remove-last')
with gr.Row():
shared.gradio['Replace last reply'] = gr.Button('Replace last reply (Ctrl + Shift + L)', elem_id='Replace-last')
shared.gradio['Copy last reply'] = gr.Button('Copy last reply (Ctrl + Shift + K)', elem_id='Copy-last')
shared.gradio['Impersonate'] = gr.Button('Impersonate (Ctrl + Shift + M)', elem_id='Impersonate')
with gr.Row():
@ -79,14 +78,20 @@ def create_ui():
shared.gradio['Send dummy reply'] = gr.Button('Send dummy reply')
with gr.Row():
shared.gradio['send-chat-to-default'] = gr.Button('Send to default')
shared.gradio['send-chat-to-notebook'] = gr.Button('Send to notebook')
shared.gradio['send-chat-to-default'] = gr.Button('Send to Default')
shared.gradio['send-chat-to-notebook'] = gr.Button('Send to Notebook')
with gr.Row(elem_id='chat-controls', elem_classes=['pretty_scrollbar']):
with gr.Column():
with gr.Row():
shared.gradio['start_with'] = gr.Textbox(label='Start reply with', placeholder='Sure thing!', value=shared.settings['start_with'], elem_classes=['add_scrollbar'])
with gr.Row():
shared.gradio['enable_web_search'] = gr.Checkbox(value=shared.settings.get('enable_web_search', False), label='Activate web search')
with gr.Row(visible=shared.settings.get('enable_web_search', False)) as shared.gradio['web_search_row']:
shared.gradio['web_search_pages'] = gr.Number(value=shared.settings.get('web_search_pages', 3), precision=0, label='Number of pages to download', minimum=1, maximum=10)
with gr.Row():
shared.gradio['mode'] = gr.Radio(choices=['instruct', 'chat-instruct', 'chat'], value=shared.settings['mode'] if shared.settings['mode'] in ['chat', 'chat-instruct'] else None, label='Mode', info='Defines how the chat prompt is generated. In instruct and chat-instruct modes, the instruction template Parameters > Instruction template is used.', elem_id='chat-mode')
@ -96,6 +101,22 @@ def create_ui():
with gr.Row():
shared.gradio['chat-instruct_command'] = gr.Textbox(value=shared.settings['chat-instruct_command'], lines=12, label='Command for chat-instruct mode', info='<|character|> and <|prompt|> get replaced with the bot name and the regular chat prompt respectively.', visible=shared.settings['mode'] == 'chat-instruct', elem_classes=['add_scrollbar'])
with gr.Row():
shared.gradio['count_tokens'] = gr.Button('Count tokens', size='sm')
shared.gradio['token_display'] = gr.HTML(value='', elem_classes='token-display')
# Hidden elements for version navigation and editing
with gr.Row(visible=False):
shared.gradio['navigate_message_index'] = gr.Number(value=-1, precision=0, elem_id="Navigate-message-index")
shared.gradio['navigate_direction'] = gr.Textbox(value="", elem_id="Navigate-direction")
shared.gradio['navigate_message_role'] = gr.Textbox(value="", elem_id="Navigate-message-role")
shared.gradio['navigate_version'] = gr.Button(elem_id="Navigate-version")
shared.gradio['edit_message_index'] = gr.Number(value=-1, precision=0, elem_id="Edit-message-index")
shared.gradio['edit_message_text'] = gr.Textbox(value="", elem_id="Edit-message-text")
shared.gradio['edit_message_role'] = gr.Textbox(value="", elem_id="Edit-message-role")
shared.gradio['edit_message'] = gr.Button(elem_id="Edit-message")
def create_chat_settings_ui():
mu = shared.args.multi_user
@ -185,7 +206,7 @@ def create_event_handlers():
shared.gradio['Generate'].click(
ui.gather_interface_values, gradio(shared.input_elements), gradio('interface_state')).then(
lambda x: (x, ''), gradio('textbox'), gradio('Chat input', 'textbox'), show_progress=False).then(
lambda x: (x, {"text": "", "files": []}), gradio('textbox'), gradio('Chat input', 'textbox'), show_progress=False).then(
lambda: None, None, None, js='() => document.getElementById("chat").parentNode.parentNode.parentNode.classList.add("_generating")').then(
chat.generate_chat_reply_wrapper, gradio(inputs), gradio('display', 'history'), show_progress=False).then(
None, None, None, js='() => document.getElementById("chat").parentNode.parentNode.parentNode.classList.remove("_generating")').then(
@ -193,7 +214,7 @@ def create_event_handlers():
shared.gradio['textbox'].submit(
ui.gather_interface_values, gradio(shared.input_elements), gradio('interface_state')).then(
lambda x: (x, ''), gradio('textbox'), gradio('Chat input', 'textbox'), show_progress=False).then(
lambda x: (x, {"text": "", "files": []}), gradio('textbox'), gradio('Chat input', 'textbox'), show_progress=False).then(
lambda: None, None, None, js='() => document.getElementById("chat").parentNode.parentNode.parentNode.classList.add("_generating")').then(
chat.generate_chat_reply_wrapper, gradio(inputs), gradio('display', 'history'), show_progress=False).then(
None, None, None, js='() => document.getElementById("chat").parentNode.parentNode.parentNode.classList.remove("_generating")').then(
@ -221,10 +242,6 @@ def create_event_handlers():
None, None, None, js='() => document.getElementById("chat").parentNode.parentNode.parentNode.classList.remove("_generating")').then(
None, None, None, js=f'() => {{{ui.audio_notification_js}}}')
shared.gradio['Replace last reply'].click(
ui.gather_interface_values, gradio(shared.input_elements), gradio('interface_state')).then(
chat.handle_replace_last_reply_click, gradio('textbox', 'interface_state'), gradio('history', 'display', 'textbox'), show_progress=False)
shared.gradio['Send dummy message'].click(
ui.gather_interface_values, gradio(shared.input_elements), gradio('interface_state')).then(
chat.handle_send_dummy_message_click, gradio('textbox', 'interface_state'), gradio('history', 'display', 'textbox'), show_progress=False)
@ -258,7 +275,7 @@ def create_event_handlers():
shared.gradio['branch_chat'].click(
ui.gather_interface_values, gradio(shared.input_elements), gradio('interface_state')).then(
chat.handle_branch_chat_click, gradio('interface_state'), gradio('history', 'display', 'unique_id'), show_progress=False)
chat.handle_branch_chat_click, gradio('interface_state'), gradio('history', 'display', 'unique_id', 'branch_index'), show_progress=False)
shared.gradio['rename_chat'].click(chat.handle_rename_chat_click, None, gradio('rename_to', 'rename-row'), show_progress=False)
shared.gradio['rename_to-cancel'].click(lambda: gr.update(visible=False), None, gradio('rename-row'), show_progress=False)
@ -290,7 +307,14 @@ def create_event_handlers():
None, gradio('mode'), None, js="(mode) => {mode === 'instruct' ? document.getElementById('character-menu').parentNode.parentNode.style.display = 'none' : document.getElementById('character-menu').parentNode.parentNode.style.display = ''}")
shared.gradio['chat_style'].change(chat.redraw_html, gradio(reload_arr), gradio('display'), show_progress=False)
shared.gradio['Copy last reply'].click(chat.send_last_reply_to_input, gradio('history'), gradio('textbox'), show_progress=False)
shared.gradio['navigate_version'].click(
ui.gather_interface_values, gradio(shared.input_elements), gradio('interface_state')).then(
chat.handle_navigate_version_click, gradio('interface_state'), gradio('history', 'display'), show_progress=False)
shared.gradio['edit_message'].click(
ui.gather_interface_values, gradio(shared.input_elements), gradio('interface_state')).then(
chat.handle_edit_message_click, gradio('interface_state'), gradio('history', 'display'), show_progress=False)
# Save/delete a character
shared.gradio['save_character'].click(chat.handle_save_character_click, gradio('name2'), gradio('save_character_filename', 'character_saver'), show_progress=False)
@ -347,3 +371,13 @@ def create_event_handlers():
None, None, None, js=f'() => {{{ui.switch_tabs_js}; switch_to_notebook()}}')
shared.gradio['show_controls'].change(None, gradio('show_controls'), None, js=f'(x) => {{{ui.show_controls_js}; toggle_controls(x)}}')
shared.gradio['count_tokens'].click(
ui.gather_interface_values, gradio(shared.input_elements), gradio('interface_state')).then(
chat.count_prompt_tokens, gradio('textbox', 'interface_state'), gradio('token_display'), show_progress=False)
shared.gradio['enable_web_search'].change(
lambda x: gr.update(visible=x),
gradio('enable_web_search'),
gradio('web_search_row')
)

View file

@ -39,11 +39,9 @@ def create_ui():
with gr.Row():
with gr.Column():
shared.gradio['gpu_layers'] = gr.Slider(label="gpu-layers", minimum=0, maximum=get_initial_gpu_layers_max(), step=1, value=shared.args.gpu_layers, info='Must be greater than 0 for the GPU to be used. ⚠️ Lower this value if you can\'t load the model.')
shared.gradio['ctx_size'] = gr.Slider(label='ctx-size', minimum=256, maximum=131072, step=256, value=shared.args.ctx_size, info='Context length. ⚠️ Lower this value if you can\'t load the model.')
shared.gradio['ctx_size'] = gr.Slider(label='ctx-size', minimum=256, maximum=131072, step=256, value=shared.args.ctx_size, info='Context length. Common values: 4096, 8192, 16384, 32768, 65536, 131072. ⚠️ Lower this value if you can\'t load the model.')
shared.gradio['gpu_split'] = gr.Textbox(label='gpu-split', info='Comma-separated list of VRAM (in GB) to use per GPU. Example: 20,7,7')
shared.gradio['cache_type'] = gr.Dropdown(label="cache-type", choices=['fp16', 'q8_0', 'q4_0', 'fp8', 'q8', 'q7', 'q6', 'q5', 'q4', 'q3', 'q2'], value=shared.args.cache_type, allow_custom_value=True, info='Valid options: llama.cpp - fp16, q8_0, q4_0; ExLlamaV2 - fp16, fp8, q8, q6, q4; ExLlamaV3 - fp16, q2 to q8. For ExLlamaV3, you can type custom combinations for separate k/v bits (e.g. q4_q8).')
shared.gradio['hqq_backend'] = gr.Dropdown(label="hqq_backend", choices=["PYTORCH", "PYTORCH_COMPILE", "ATEN"], value=shared.args.hqq_backend)
with gr.Column():
shared.gradio['vram_info'] = gr.HTML(value=get_initial_vram_info())
shared.gradio['flash_attn'] = gr.Checkbox(label="flash-attn", value=shared.args.flash_attn, info='Use flash-attention.')
@ -312,7 +310,7 @@ def get_initial_vram_info():
for_ui=True
)
return "<div id=\"vram-info\"'>Estimated VRAM to load the model:</span>"
return "<div id=\"vram-info\"'>Estimated VRAM to load the model:</div>"
def get_initial_gpu_layers_max():

View file

@ -71,6 +71,8 @@ def create_ui(default_preset):
shared.gradio['max_new_tokens'] = gr.Slider(minimum=shared.settings['max_new_tokens_min'], maximum=shared.settings['max_new_tokens_max'], value=shared.settings['max_new_tokens'], step=1, label='max_new_tokens', info='⚠️ Setting this too high can cause prompt truncation.')
shared.gradio['prompt_lookup_num_tokens'] = gr.Slider(value=shared.settings['prompt_lookup_num_tokens'], minimum=0, maximum=10, step=1, label='prompt_lookup_num_tokens', info='Activates Prompt Lookup Decoding.')
shared.gradio['max_tokens_second'] = gr.Slider(value=shared.settings['max_tokens_second'], minimum=0, maximum=20, step=1, label='Maximum tokens/second', info='To make text readable in real time.')
shared.gradio['max_updates_second'] = gr.Slider(value=shared.settings['max_updates_second'], minimum=0, maximum=24, step=1, label='Maximum UI updates/second', info='Set this if you experience lag in the UI during streaming.')
with gr.Column():
with gr.Row():
with gr.Column():

View file

@ -74,7 +74,7 @@ def natural_keys(text):
def check_model_loaded():
if shared.model_name == 'None' or shared.model is None:
if len(get_available_models()) <= 1:
if len(get_available_models()) == 0:
error_msg = "No model is loaded.\n\nTo get started:\n1) Place a GGUF file in your user_data/models folder\n2) Go to the Model tab and select it"
logger.error(error_msg)
return False, error_msg

modules/web_search.py (new file, 129 lines)
View file

@ -0,0 +1,129 @@
import concurrent.futures
from concurrent.futures import as_completed
from datetime import datetime
import requests
from bs4 import BeautifulSoup
from duckduckgo_search import DDGS
from modules.logging_colors import logger
def get_current_timestamp():
"""Returns the current time in 24-hour format"""
return datetime.now().strftime('%b %d, %Y %H:%M')
def download_web_page(url, timeout=5):
"""Download and extract text from a web page"""
try:
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
}
response = requests.get(url, headers=headers, timeout=timeout)
response.raise_for_status()
soup = BeautifulSoup(response.content, 'html.parser')
# Remove script and style elements
for script in soup(["script", "style"]):
script.decompose()
# Get text and clean it up
text = soup.get_text()
lines = (line.strip() for line in text.splitlines())
chunks = (phrase.strip() for line in lines for phrase in line.split(" "))
text = ' '.join(chunk for chunk in chunks if chunk)
return text
except Exception as e:
logger.error(f"Error downloading {url}: {e}")
return f"[Error downloading content from {url}: {str(e)}]"
def perform_web_search(query, num_pages=3, max_workers=5):
"""Perform web search and return results with content"""
try:
with DDGS() as ddgs:
results = list(ddgs.text(query, max_results=num_pages))
# Prepare download tasks
download_tasks = []
for i, result in enumerate(results):
url = result.get('href', '')
title = result.get('title', f'Search Result {i+1}')
download_tasks.append((url, title, i))
search_results = [None] * len(download_tasks) # Pre-allocate to maintain order
# Download pages in parallel
with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
# Submit all download tasks
future_to_task = {
executor.submit(download_web_page, task[0]): task
for task in download_tasks
}
# Collect results as they complete
for future in as_completed(future_to_task):
url, title, index = future_to_task[future]
try:
content = future.result()
search_results[index] = {
'title': title,
'url': url,
'content': content
}
except Exception as e:
logger.error(f"Error downloading {url}: {e}")
# Include failed downloads with empty content
search_results[index] = {
'title': title,
'url': url,
'content': ''
}
return search_results
except Exception as e:
logger.error(f"Error performing web search: {e}")
return []
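A minimal usage sketch (assumes the duckduckgo_search dependency and network access; the query is invented):

# Illustrative only: returns a list of {'title', 'url', 'content'} dicts, or [] on failure.
results = perform_web_search("text-generation-webui attachments", num_pages=2)
for r in results:
    print(r['title'], r['url'], len(r['content']))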
def add_web_search_attachments(history, row_idx, user_message, search_query, state):
"""Perform web search and add results as attachments"""
if not search_query:
logger.warning("No search query provided")
return
try:
logger.info(f"Using search query: {search_query}")
# Perform web search
num_pages = int(state.get('web_search_pages', 3))
search_results = perform_web_search(search_query, num_pages)
if not search_results:
logger.warning("No search results found")
return
# Add search results as attachments
key = f"user_{row_idx}"
if key not in history['metadata']:
history['metadata'][key] = {"timestamp": get_current_timestamp()}
if "attachments" not in history['metadata'][key]:
history['metadata'][key]["attachments"] = []
for result in search_results:
attachment = {
"name": result['title'],
"type": "text/html",
"url": result['url'],
"content": result['content']
}
history['metadata'][key]["attachments"].append(attachment)
logger.info(f"Added {len(search_results)} web search results as attachments")
except Exception as e:
logger.error(f"Error in web search: {e}")

View file

@ -1,7 +1,9 @@
accelerate==1.5.*
beautifulsoup4==4.13.4
bitsandbytes==0.45.*
colorama
datasets
duckduckgo_search==8.0.2
einops
fastapi==0.112.4
gradio==4.37.*
@ -13,6 +15,7 @@ peft==0.15.*
Pillow>=9.5.0
psutil
pydantic==2.8.2
PyPDF2==3.0.1
pyyaml
requests
rich
@ -30,8 +33,8 @@ sse-starlette==1.6.5
tiktoken
# CUDA wheels
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.12.0/llama_cpp_binaries-0.12.0+cu124-py3-none-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.12.0/llama_cpp_binaries-0.12.0+cu124-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0+cu124-py3-none-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0+cu124-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
https://github.com/oobabooga/exllamav3/releases/download/v0.0.1a9/exllamav3-0.0.1a9+cu124.torch2.6.0-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
https://github.com/oobabooga/exllamav3/releases/download/v0.0.1a9/exllamav3-0.0.1a9+cu124.torch2.6.0-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
https://github.com/turboderp-org/exllamav2/releases/download/v0.2.9/exllamav2-0.2.9+cu124.torch2.6.0-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"

View file

@ -1,6 +1,8 @@
accelerate==1.5.*
beautifulsoup4==4.13.4
colorama
datasets
duckduckgo_search==8.0.2
einops
fastapi==0.112.4
gradio==4.37.*
@ -12,6 +14,7 @@ peft==0.15.*
Pillow>=9.5.0
psutil
pydantic==2.8.2
PyPDF2==3.0.1
pyyaml
requests
rich
@ -29,7 +32,7 @@ sse-starlette==1.6.5
tiktoken
# AMD wheels
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.12.0/llama_cpp_binaries-0.12.0+vulkan-py3-none-win_amd64.whl; platform_system == "Windows"
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.12.0/llama_cpp_binaries-0.12.0+vulkan-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0+vulkan-py3-none-win_amd64.whl; platform_system == "Windows"
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0+vulkan-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
https://github.com/turboderp-org/exllamav2/releases/download/v0.2.9/exllamav2-0.2.9+rocm6.2.4.torch2.6.0-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
https://github.com/turboderp-org/exllamav2/releases/download/v0.2.9/exllamav2-0.2.9-py3-none-any.whl; platform_system != "Darwin" and platform_machine != "x86_64"

View file

@ -1,6 +1,8 @@
accelerate==1.5.*
beautifulsoup4==4.13.4
colorama
datasets
duckduckgo_search==8.0.2
einops
fastapi==0.112.4
gradio==4.37.*
@ -12,6 +14,7 @@ peft==0.15.*
Pillow>=9.5.0
psutil
pydantic==2.8.2
PyPDF2==3.0.1
pyyaml
requests
rich
@ -29,7 +32,7 @@ sse-starlette==1.6.5
tiktoken
# AMD wheels
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.12.0/llama_cpp_binaries-0.12.0+vulkanavx-py3-none-win_amd64.whl; platform_system == "Windows"
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.12.0/llama_cpp_binaries-0.12.0+vulkanavx-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0+vulkanavx-py3-none-win_amd64.whl; platform_system == "Windows"
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0+vulkanavx-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
https://github.com/turboderp-org/exllamav2/releases/download/v0.2.9/exllamav2-0.2.9+rocm6.2.4.torch2.6.0-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
https://github.com/turboderp-org/exllamav2/releases/download/v0.2.9/exllamav2-0.2.9-py3-none-any.whl; platform_system != "Darwin" and platform_machine != "x86_64"

View file

@ -1,6 +1,8 @@
accelerate==1.5.*
beautifulsoup4==4.13.4
colorama
datasets
duckduckgo_search==8.0.2
einops
fastapi==0.112.4
gradio==4.37.*
@ -12,6 +14,7 @@ peft==0.15.*
Pillow>=9.5.0
psutil
pydantic==2.8.2
PyPDF2==3.0.1
pyyaml
requests
rich
@ -29,7 +32,7 @@ sse-starlette==1.6.5
tiktoken
# Mac wheels
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.12.0/llama_cpp_binaries-0.12.0-py3-none-macosx_15_0_x86_64.whl; platform_system == "Darwin" and platform_release >= "24.0.0" and platform_release < "25.0.0" and python_version == "3.11"
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.12.0/llama_cpp_binaries-0.12.0-py3-none-macosx_14_0_x86_64.whl; platform_system == "Darwin" and platform_release >= "23.0.0" and platform_release < "24.0.0" and python_version == "3.11"
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0-py3-none-macosx_15_0_x86_64.whl; platform_system == "Darwin" and platform_release >= "24.0.0" and platform_release < "25.0.0" and python_version == "3.11"
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0-py3-none-macosx_14_0_x86_64.whl; platform_system == "Darwin" and platform_release >= "23.0.0" and platform_release < "24.0.0" and python_version == "3.11"
https://github.com/oobabooga/exllamav3/releases/download/v0.0.1a9/exllamav3-0.0.1a9-py3-none-any.whl
https://github.com/turboderp-org/exllamav2/releases/download/v0.2.9/exllamav2-0.2.9-py3-none-any.whl

View file

@ -1,6 +1,8 @@
accelerate==1.5.*
beautifulsoup4==4.13.4
colorama
datasets
duckduckgo_search==8.0.2
einops
fastapi==0.112.4
gradio==4.37.*
@ -12,6 +14,7 @@ peft==0.15.*
Pillow>=9.5.0
psutil
pydantic==2.8.2
PyPDF2==3.0.1
pyyaml
requests
rich
@ -29,8 +32,8 @@ sse-starlette==1.6.5
tiktoken
# Mac wheels
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.12.0/llama_cpp_binaries-0.12.0-py3-none-macosx_15_0_arm64.whl; platform_system == "Darwin" and platform_release >= "24.0.0" and platform_release < "25.0.0" and python_version == "3.11"
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.12.0/llama_cpp_binaries-0.12.0-py3-none-macosx_14_0_arm64.whl; platform_system == "Darwin" and platform_release >= "23.0.0" and platform_release < "24.0.0" and python_version == "3.11"
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.12.0/llama_cpp_binaries-0.12.0-py3-none-macosx_13_0_arm64.whl; platform_system == "Darwin" and platform_release >= "22.0.0" and platform_release < "23.0.0" and python_version == "3.11"
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0-py3-none-macosx_15_0_arm64.whl; platform_system == "Darwin" and platform_release >= "24.0.0" and platform_release < "25.0.0" and python_version == "3.11"
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0-py3-none-macosx_14_0_arm64.whl; platform_system == "Darwin" and platform_release >= "23.0.0" and platform_release < "24.0.0" and python_version == "3.11"
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0-py3-none-macosx_13_0_arm64.whl; platform_system == "Darwin" and platform_release >= "22.0.0" and platform_release < "23.0.0" and python_version == "3.11"
https://github.com/oobabooga/exllamav3/releases/download/v0.0.1a9/exllamav3-0.0.1a9-py3-none-any.whl
https://github.com/turboderp-org/exllamav2/releases/download/v0.2.9/exllamav2-0.2.9-py3-none-any.whl

View file

@ -1,6 +1,8 @@
accelerate==1.5.*
beautifulsoup4==4.13.4
colorama
datasets
duckduckgo_search==8.0.2
einops
fastapi==0.112.4
gradio==4.37.*
@ -12,6 +14,7 @@ peft==0.15.*
Pillow>=9.5.0
psutil
pydantic==2.8.2
PyPDF2==3.0.1
pyyaml
requests
rich
@ -29,5 +32,5 @@ sse-starlette==1.6.5
tiktoken
# llama.cpp (CPU only, AVX2)
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.12.0/llama_cpp_binaries-0.12.0+cpuavx2-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.12.0/llama_cpp_binaries-0.12.0+cpuavx2-py3-none-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0+cpuavx2-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0+cpuavx2-py3-none-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"

View file

@ -1,6 +1,8 @@
accelerate==1.5.*
beautifulsoup4==4.13.4
colorama
datasets
duckduckgo_search==8.0.2
einops
fastapi==0.112.4
gradio==4.37.*
@ -12,6 +14,7 @@ peft==0.15.*
Pillow>=9.5.0
psutil
pydantic==2.8.2
PyPDF2==3.0.1
pyyaml
requests
rich
@ -29,5 +32,5 @@ sse-starlette==1.6.5
tiktoken
# llama.cpp (CPU only, no AVX2)
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.12.0/llama_cpp_binaries-0.12.0+cpuavx-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.12.0/llama_cpp_binaries-0.12.0+cpuavx-py3-none-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0+cpuavx-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0+cpuavx-py3-none-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"

View file

@ -1,7 +1,9 @@
accelerate==1.5.*
beautifulsoup4==4.13.4
bitsandbytes==0.45.*
colorama
datasets
duckduckgo_search==8.0.2
einops
fastapi==0.112.4
gradio==4.37.*
@ -13,6 +15,7 @@ peft==0.15.*
Pillow>=9.5.0
psutil
pydantic==2.8.2
PyPDF2==3.0.1
pyyaml
requests
rich
@ -30,8 +33,8 @@ sse-starlette==1.6.5
tiktoken
# CUDA wheels
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.12.0/llama_cpp_binaries-0.12.0+cu124avx-py3-none-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.12.0/llama_cpp_binaries-0.12.0+cu124avx-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0+cu124avx-py3-none-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0+cu124avx-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
https://github.com/oobabooga/exllamav3/releases/download/v0.0.1a9/exllamav3-0.0.1a9+cu124.torch2.6.0-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
https://github.com/oobabooga/exllamav3/releases/download/v0.0.1a9/exllamav3-0.0.1a9+cu124.torch2.6.0-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
https://github.com/turboderp-org/exllamav2/releases/download/v0.2.9/exllamav2-0.2.9+cu124.torch2.6.0-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"

View file

@ -1,6 +1,8 @@
accelerate==1.5.*
beautifulsoup4==4.13.4
colorama
datasets
duckduckgo_search==8.0.2
einops
fastapi==0.112.4
gradio==4.37.*
@ -12,6 +14,7 @@ peft==0.15.*
Pillow>=9.5.0
psutil
pydantic==2.8.2
PyPDF2==3.0.1
pyyaml
requests
rich

View file

@ -1,9 +1,12 @@
beautifulsoup4==4.13.4
duckduckgo_search==8.0.2
fastapi==0.112.4
gradio==4.37.*
jinja2==3.1.6
markdown
numpy==1.26.*
pydantic==2.8.2
PyPDF2==3.0.1
pyyaml
requests
rich
@ -15,5 +18,5 @@ sse-starlette==1.6.5
tiktoken
# CUDA wheels
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.12.0/llama_cpp_binaries-0.12.0+cu124-py3-none-win_amd64.whl; platform_system == "Windows"
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.12.0/llama_cpp_binaries-0.12.0+cu124-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0+cu124-py3-none-win_amd64.whl; platform_system == "Windows"
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0+cu124-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"

View file

@ -1,9 +1,12 @@
beautifulsoup4==4.13.4
duckduckgo_search==8.0.2
fastapi==0.112.4
gradio==4.37.*
jinja2==3.1.6
markdown
numpy==1.26.*
pydantic==2.8.2
PyPDF2==3.0.1
pyyaml
requests
rich
@ -15,5 +18,5 @@ sse-starlette==1.6.5
tiktoken
# Mac wheels
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.12.0/llama_cpp_binaries-0.12.0-py3-none-macosx_15_0_x86_64.whl; platform_system == "Darwin" and platform_release >= "24.0.0" and platform_release < "25.0.0"
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.12.0/llama_cpp_binaries-0.12.0-py3-none-macosx_14_0_x86_64.whl; platform_system == "Darwin" and platform_release >= "23.0.0" and platform_release < "24.0.0"
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0-py3-none-macosx_15_0_x86_64.whl; platform_system == "Darwin" and platform_release >= "24.0.0" and platform_release < "25.0.0"
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0-py3-none-macosx_14_0_x86_64.whl; platform_system == "Darwin" and platform_release >= "23.0.0" and platform_release < "24.0.0"

View file

@ -1,9 +1,12 @@
beautifulsoup4==4.13.4
duckduckgo_search==8.0.2
fastapi==0.112.4
gradio==4.37.*
jinja2==3.1.6
markdown
numpy==1.26.*
pydantic==2.8.2
PyPDF2==3.0.1
pyyaml
requests
rich
@ -15,6 +18,6 @@ sse-starlette==1.6.5
tiktoken
# Mac wheels
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.12.0/llama_cpp_binaries-0.12.0-py3-none-macosx_15_0_arm64.whl; platform_system == "Darwin" and platform_release >= "24.0.0" and platform_release < "25.0.0"
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.12.0/llama_cpp_binaries-0.12.0-py3-none-macosx_14_0_arm64.whl; platform_system == "Darwin" and platform_release >= "23.0.0" and platform_release < "24.0.0"
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.12.0/llama_cpp_binaries-0.12.0-py3-none-macosx_13_0_arm64.whl; platform_system == "Darwin" and platform_release >= "22.0.0" and platform_release < "23.0.0"
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0-py3-none-macosx_15_0_arm64.whl; platform_system == "Darwin" and platform_release >= "24.0.0" and platform_release < "25.0.0"
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0-py3-none-macosx_14_0_arm64.whl; platform_system == "Darwin" and platform_release >= "23.0.0" and platform_release < "24.0.0"
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0-py3-none-macosx_13_0_arm64.whl; platform_system == "Darwin" and platform_release >= "22.0.0" and platform_release < "23.0.0"

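Note: the Mac markers key off `platform_release`, which is the Darwin kernel version rather than the macOS marketing version (Darwin 24.x corresponds to macOS 15, 23.x to macOS 14, 22.x to macOS 13). An illustrative check of which wheel range a given Mac falls into, assuming it is run on macOS:

```python
# Illustrative only: map the Darwin kernel release to the wheel tags above.
import platform

if platform.system() != "Darwin":
    raise SystemExit("not macOS; the Mac wheels above do not apply")

major = int(platform.release().split(".")[0])  # e.g. "23.5.0" -> 23
tag = {24: "macosx_15_0", 23: "macosx_14_0", 22: "macosx_13_0"}.get(major)
print("matching wheel tag:", tag or f"none for Darwin {major}")
```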
View file

@@ -1,9 +1,12 @@
beautifulsoup4==4.13.4
duckduckgo_search==8.0.2
fastapi==0.112.4
gradio==4.37.*
jinja2==3.1.6
markdown
numpy==1.26.*
pydantic==2.8.2
PyPDF2==3.0.1
pyyaml
requests
rich
@@ -15,5 +18,5 @@ sse-starlette==1.6.5
tiktoken
# llama.cpp (CPU only, AVX2)
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.12.0/llama_cpp_binaries-0.12.0+cpuavx2-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.12.0/llama_cpp_binaries-0.12.0+cpuavx2-py3-none-win_amd64.whl; platform_system == "Windows"
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0+cpuavx2-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0+cpuavx2-py3-none-win_amd64.whl; platform_system == "Windows"

View file

@@ -1,9 +1,12 @@
beautifulsoup4==4.13.4
duckduckgo_search==8.0.2
fastapi==0.112.4
gradio==4.37.*
jinja2==3.1.6
markdown
numpy==1.26.*
pydantic==2.8.2
PyPDF2==3.0.1
pyyaml
requests
rich
@@ -15,5 +18,5 @@ sse-starlette==1.6.5
tiktoken
# llama.cpp (CPU only, no AVX2)
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.12.0/llama_cpp_binaries-0.12.0+cpuavx-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.12.0/llama_cpp_binaries-0.12.0+cpuavx-py3-none-win_amd64.whl; platform_system == "Windows"
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0+cpuavx-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0+cpuavx-py3-none-win_amd64.whl; platform_system == "Windows"

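Note: the `+cpuavx2` and `+cpuavx` builds differ only in whether the binary assumes AVX2 support. A rough way to check which variant a Linux CPU can run (illustrative helper, not part of the project):

```python
# Rough check (Linux only): does this CPU advertise AVX2 in /proc/cpuinfo?
from pathlib import Path

flags = Path("/proc/cpuinfo").read_text().split()
variant = "+cpuavx2" if "avx2" in flags else "+cpuavx"
print("pick the llama.cpp wheels tagged", variant)
```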
View file

@@ -1,9 +1,12 @@
beautifulsoup4==4.13.4
duckduckgo_search==8.0.2
fastapi==0.112.4
gradio==4.37.*
jinja2==3.1.6
markdown
numpy==1.26.*
pydantic==2.8.2
PyPDF2==3.0.1
pyyaml
requests
rich
@@ -15,5 +18,5 @@ sse-starlette==1.6.5
tiktoken
# CUDA wheels
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.12.0/llama_cpp_binaries-0.12.0+cu124avx-py3-none-win_amd64.whl; platform_system == "Windows"
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.12.0/llama_cpp_binaries-0.12.0+cu124avx-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0+cu124avx-py3-none-win_amd64.whl; platform_system == "Windows"
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0+cu124avx-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"

View file

@@ -1,9 +1,12 @@
beautifulsoup4==4.13.4
duckduckgo_search==8.0.2
fastapi==0.112.4
gradio==4.37.*
jinja2==3.1.6
markdown
numpy==1.26.*
pydantic==2.8.2
PyPDF2==3.0.1
pyyaml
requests
rich

View file

@@ -1,9 +1,12 @@
beautifulsoup4==4.13.4
duckduckgo_search==8.0.2
fastapi==0.112.4
gradio==4.37.*
jinja2==3.1.6
markdown
numpy==1.26.*
pydantic==2.8.2
PyPDF2==3.0.1
pyyaml
requests
rich
@@ -15,5 +18,5 @@ sse-starlette==1.6.5
tiktoken
# CUDA wheels
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.12.0/llama_cpp_binaries-0.12.0+vulkan-py3-none-win_amd64.whl; platform_system == "Windows"
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.12.0/llama_cpp_binaries-0.12.0+vulkan-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0+vulkan-py3-none-win_amd64.whl; platform_system == "Windows"
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0+vulkan-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"

View file

@@ -1,9 +1,12 @@
beautifulsoup4==4.13.4
duckduckgo_search==8.0.2
fastapi==0.112.4
gradio==4.37.*
jinja2==3.1.6
markdown
numpy==1.26.*
pydantic==2.8.2
PyPDF2==3.0.1
pyyaml
requests
rich
@@ -15,5 +18,5 @@ sse-starlette==1.6.5
tiktoken
# CUDA wheels
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.12.0/llama_cpp_binaries-0.12.0+vulkanavx-py3-none-win_amd64.whl; platform_system == "Windows"
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.12.0/llama_cpp_binaries-0.12.0+vulkanavx-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0+vulkanavx-py3-none-win_amd64.whl; platform_system == "Windows"
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.14.0/llama_cpp_binaries-0.14.0+vulkanavx-py3-none-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"

View file

@@ -18,6 +18,7 @@ max_new_tokens_min: 1
max_new_tokens_max: 4096
prompt_lookup_num_tokens: 0
max_tokens_second: 0
max_updates_second: 12
auto_max_new_tokens: true
ban_eos_token: false
add_bos_token: true
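Note: the added `max_updates_second: 12` default caps how many times per second the UI redraws while streaming tokens. If a different cap is wanted, it should be overridable via a user settings YAML; a sketch, with the file name and loading mechanism assumed rather than confirmed by this diff:

```yaml
# Hypothetical user settings.yaml override (file name assumed; adjust to your setup).
# Lower values mean fewer UI redraws per second while streaming.
max_updates_second: 6
```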