AutoGPTQ/auto_gptq/modeling
Latest commit: 2023-05-05 14:44:16 +01:00
| File | Latest commit message | Date |
| --- | --- | --- |
| __init__.py | fix device mismatch when directly using model to inference after quantization | 2023-04-28 16:41:46 +08:00 |
| _base.py | Fix 'groupsize' -> 'group_size' in all other .py files. I haven't touched any CUDA kernels in case there's any complexity there I don't understand | 2023-05-05 14:44:16 +01:00 |
| _const.py | add gpt2 | 2023-04-28 09:13:22 +09:00 |
| _utils.py | Change 'groupsize' to 'group_size' everywhere. Turns out this is easier than 'groupsize' due to dependencies in other files. | 2023-05-05 13:36:00 +01:00 |
| auto.py | support faster and model load strict | 2023-05-04 09:07:34 +09:00 |
| bloom.py | support dispatch layers to different devices when loading pretrained model before quantization | 2023-04-27 02:24:08 +08:00 |
| gpt2.py | fix device mismatch when directly using model to inference after quantization | 2023-04-28 16:41:46 +08:00 |
| gpt_neox.py | support dispatch layers to different devices when loading pretrained model before quantization | 2023-04-27 02:24:08 +08:00 |
| gptj.py | support dispatch layers to different devices when loading pretrained model before quantization | 2023-04-27 02:24:08 +08:00 |
| llama.py | Fix 'groupsize' -> 'group_size' in all other .py files. I haven't touched any CUDA kernels in case there's any complexity there I don't understand | 2023-05-05 14:44:16 +01:00 |
| moss.py | remove non-parameters module from MOSSGPTQForCausalLM.outside_layer_modules | 2023-04-29 10:58:29 +08:00 |
| opt.py | remove override of _resize_attention_mask for llama and opt | 2023-04-28 23:08:42 +08:00 |
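The commit messages above reference the public entry points of this package: auto.py exposes AutoGPTQForCausalLM, _base.py holds the base quantization config (where the groupsize -> group_size rename landed), and the per-model files (opt.py, llama.py, gpt2.py, ...) provide model-specific subclasses. A minimal usage sketch follows, assuming the API at this point in the repo's history; the model name and calibration text are illustrative, not taken from the listing.

```python
# Minimal sketch: quantize an OPT model with this package, assuming the
# AutoGPTQForCausalLM / BaseQuantizeConfig API as of these commits.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

pretrained_model_name = "facebook/opt-125m"  # illustrative; OPT is handled by opt.py

tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name)

# Note the spelling: the 2023-05-05 commits renamed 'groupsize' to 'group_size'
# throughout these files.
quantize_config = BaseQuantizeConfig(bits=4, group_size=128)

model = AutoGPTQForCausalLM.from_pretrained(pretrained_model_name, quantize_config)

# Calibration examples: tokenized dicts with input_ids and attention_mask.
examples = [
    tokenizer(
        "auto-gptq is an easy-to-use model quantization library.",
        return_tensors="pt",
    )
]
model.quantize(examples)

model.save_quantized("opt-125m-4bit-128g")
```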