AutoGPTQ/auto_gptq/modeling (latest commit: 2023-05-24 11:32:45 +08:00)
File          Last commit message                                                                              Date
__init__.py   fix device mismatch when directly using model to inference after quantization                   2023-04-28 16:41:46 +08:00
_base.py      make_sure_not_tensor_in_meta_device before load checkpoint                                      2023-05-24 11:32:45 +08:00
_const.py     add library version comparison help functions                                                    2023-05-14 16:16:06 +08:00
_utils.py     fix meta device bug when use low_cpu_mem_usage                                                   2023-05-24 11:19:59 +08:00
auto.py       add options: 'low_cpu_mem_usage' and 'full_cpu_offload'                                         2023-05-23 22:51:00 +08:00
bloom.py      support dispatch layers to different devices when loading pretrained model before quantization   2023-04-27 02:24:08 +08:00
gpt2.py       fix device mismatch when directly using model to inference after quantization                   2023-04-28 16:41:46 +08:00
gpt_neox.py   support dispatch layers to different devices when loading pretrained model before quantization   2023-04-27 02:24:08 +08:00
gptj.py       add GPTJ fused attention module                                                                  2023-05-14 16:17:21 +08:00
llama.py      make compatible with older transformers version                                                  2023-05-15 13:26:18 +08:00
moss.py       remove non-parameters module from MOSSGPTQForCausalLM.outside_layer_modules                     2023-04-29 10:58:29 +08:00
opt.py        remove override of _resize_attention_mask for llama and opt                                     2023-04-28 23:08:42 +08:00