AutoGPTQ

PanQiWei b1c64d9269 add baichuan model attention fusion logic		2023-08-11 19:12:43 +08:00
__init__.py	support qwen	2023-08-08 19:27:43 +09:00
_base.py	extend to support qlinear_exllama's fusion	2023-08-11 14:52:26 +08:00
_const.py	Merge branch 'main' into xformers_integration	2023-08-10 15:27:11 +08:00
_utils.py	patch for transformers compatibility	2023-08-09 14:23:59 +00:00
auto.py	Merge branch 'main' into xformers_integration	2023-08-10 15:27:11 +08:00
baichuan.py	add baichuan model attention fusion logic	2023-08-11 19:12:43 +08:00
bloom.py	support dispatch layers to different devices when loading pretrained model before quantization	2023-04-27 02:24:08 +08:00
codegen.py	Add support for CodeGen/2	2023-05-08 17:34:00 +03:00
gpt2.py	fix device mismatch when directly using model to inference after quantization	2023-04-28 16:41:46 +08:00
gpt_bigcode.py	Add support for GPTBigCode	2023-05-08 12:28:29 +03:00
gpt_neox.py	support dispatch layers to different devices when loading pretrained model before quantization	2023-04-27 02:24:08 +08:00
gptj.py	using transformers gptj rope implementation	2023-08-11 18:26:23 +08:00
internlm.py	Add support for InternLM	2023-07-07 09:25:40 -07:00
llama.py	add baichuan model attention fusion logic	2023-08-11 19:12:43 +08:00
moss.py	remove non-parameters module from MOSSGPTQForCausalLM.outside_layer_modules	2023-04-29 10:58:29 +08:00
opt.py	remove override of _resize_attention_mask for llama and opt	2023-04-28 23:08:42 +08:00
qwen.py	support qwen	2023-08-08 19:27:43 +09:00
rw.py	support falcon	2023-05-27 07:53:39 +09:00