AutoGPTQ/auto_gptq/modeling
qwopqwop200 1388acac94 fix bug 2023-05-02 19:13:13 +09:00
__init__.py fix device mismatch when directly using model to inference after quantization 2023-04-28 16:41:46 +08:00
_base.py add old cuda 2023-05-01 13:05:14 +09:00
_const.py add gpt2 2023-04-28 09:13:22 +09:00
_utils.py add qlinear_old 2023-05-01 13:04:47 +09:00
auto.py use the same Optional style as in other params 2023-04-29 09:52:11 +03:00
bloom.py support dispatch layers to different devices when loading pretrained model before quantization 2023-04-27 02:24:08 +08:00
gpt2.py fix device mismatch when directly using model to inference after quantization 2023-04-28 16:41:46 +08:00
gpt_neox.py support dispatch layers to different devices when loading pretrained model before quantization 2023-04-27 02:24:08 +08:00
gptj.py support dispatch layers to different devices when loading pretrained model before quantization 2023-04-27 02:24:08 +08:00
llama.py fix bug 2023-05-02 19:13:13 +09:00
moss.py remove non-parameters module from MOSSGPTQForCausalLM.outside_layer_modules 2023-04-29 10:58:29 +08:00
opt.py remove override of _resize_attention_mask for llama and opt 2023-04-28 23:08:42 +08:00