AutoGPTQ/auto_gptq/modeling
2023-09-27 10:37:31 +08:00
File | Last commit message | Last commit date
__init__.py | support qwen | 2023-08-08 19:27:43 +09:00
_base.py | Revert "fix bug(breaking change) remove (zeors -= 1)" | 2023-09-27 10:37:31 +08:00
_const.py | Add support for Falcon as part of Transformers 4.33.0, including new Falcon 180B | 2023-09-06 18:03:33 +01:00
_utils.py | exllamav2 integration | 2023-09-25 16:51:18 +00:00
auto.py | exllamav2 integration | 2023-09-25 16:51:18 +00:00
baichuan.py | Rename the class to match reference capitalisation | 2023-06-18 21:01:07 +03:00
bloom.py | support dispatch layers to different devices when loading pretrained model before quantization | 2023-04-27 02:24:08 +08:00
codegen.py | Add support for CodeGen/2 | 2023-05-08 17:34:00 +03:00
gpt2.py | fix device mismatch when directly using model to inference after quantization | 2023-04-28 16:41:46 +08:00
gpt_bigcode.py | Add support for GPTBigCode | 2023-05-08 12:28:29 +03:00
gpt_neox.py | support dispatch layers to different devices when loading pretrained model before quantization | 2023-04-27 02:24:08 +08:00
gptj.py | add GPTJ fused attention module | 2023-05-14 16:17:21 +08:00
internlm.py | Add support for InternLM | 2023-07-07 09:25:40 -07:00
llama.py | make compatible with older transformers version | 2023-05-15 13:26:18 +08:00
moss.py | remove non-parameters module from MOSSGPTQForCausalLM.outside_layer_modules | 2023-04-29 10:58:29 +08:00
opt.py | remove override of _resize_attention_mask for llama and opt | 2023-04-28 23:08:42 +08:00
qwen.py | Update qwen.py for Qwen-VL | 2023-08-30 16:29:55 +08:00
rw.py | support falcon | 2023-05-27 07:53:39 +09:00